Fix PAN/OND spec ID parsing bug and add weather utilities, convenience APIs, and improved docs#9
Fix PAN/OND spec ID parsing bug and add weather utilities, convenience APIs, and improved docs#9
Conversation
iantsednv
commented
Apr 9, 2026
- Fix PAN/OND spec ID parsing bug and add weather utilities, convenience APIs, and improved docs
- Bug fix: Replace split(".")[0] with Path.stem for PAN/OND spec ID derivation in pvsystem.py, matching the existing validator in endpoint_modelchains.py — fixes silent mismatch for filenames with multiple dots (e.g. Trina_TSM-DEG19C.20-550_APP.PAN)
- Weather utilities: Add sf.from_dataframe(), sf.from_pvlib(), sf.validate_tsv_timestamps() for converting DataFrames to SolarFarmer TSV format, with automatic TMY year remapping and pvlib column mapping
- Client-side TMY validation: validate_tsv_timestamps() is called automatically before upload, replacing the opaque HTTP 400 with a clear ValueError
- Convenience properties: CalculationResults.net_energy_MWh, .performance_ratio, .energy_yield_kWh_per_kWp for year-1 metrics without dict traversal
- Enum exports: sf.MountingType, sf.InverterType, sf.OrientationType now available at the top-level namespace
- pan_files/ond_files accept lists: plant.pan_files = [Path("module.PAN")] now works in addition to dicts; dict keys documented as labels only (spec ID comes from filename)
- pandas as optional dep: [weather] extra; all pandas-dependent functions guard with ImportError pointing to install instructions
- Docs: Updated API reference, workflow guides, and FAQ to cover all of the above
…ence APIs, improve docs per SDK
| or to parse timeseries results as DataFrames, install the `weather` extra: | ||
|
|
||
| ```bash | ||
| pip install "dnv-solarfarmer[weather]" |
There was a problem hiding this comment.
I think weather will not age well. It's already awkward with how the response parsing also optionally uses pandas. One "kick the can down the road" solution is putting it into an all dependency while also moving the dev/test/docs dependencies into dependency groups
solarfarmer/weather.py
Outdated
| rename = {"ghi": "GHI", "dhi": "DHI", "temp_air": "TAmb", | ||
| "wind_speed": "WS", "pressure": "Pressure"} | ||
| df = pvlib_df.rename(columns=rename) | ||
| df["Pressure"] = df["Pressure"] / 100 # Pa → mbar |
There was a problem hiding this comment.
here and in the description above we need to be more generic about units. pvlib does not, to my knowledge, make any promises about units - only variable names. So let's simply state what SF needs for units (or refer to the dict spec) and state that user should ensure the various pvlib data sources are consistent with what SF needs. Then we can provide the NSRDB PSM example for pressure.
solarfarmer/weather.py
Outdated
| __all__ = ["TSV_COLUMNS", "validate_tsv_timestamps", "from_dataframe", "from_pvlib"] | ||
|
|
||
|
|
||
| def validate_tsv_timestamps(file_path: str | pathlib.Path) -> None: |
There was a problem hiding this comment.
This requirement is specific to tmy, the function name and documentation should be clear about that. It seems to me that the function is being used in the get_files function in a way that blocks non-tmy workflows that may rightly span a year turnover. Or am I misunderstanding something?
I'm inclined to say that we remove this function and rely on documentation. I think documentation has been improved significantly in this and previous PRs. What do you think?
There was a problem hiding this comment.
validate_tsv_timestamps was not correctly designed. The goal was to protected against shuffled-year TMYs provided as a tsv. We want to allow tsv of any duration as it is designed for, so refactored that function.
solarfarmer/weather.py
Outdated
|
|
||
| _PANDAS_INSTALL_MSG = ( | ||
| "pandas is required for this function. Install it with: pip install 'dnv-solarfarmer[weather]'" | ||
| ) |
There was a problem hiding this comment.
this isn't the only module that conditionally needs pandas so refactor into init or some other common place
solarfarmer/weather.py
Outdated
| out.index = out.index.map(lambda t: t.replace(year=year)) | ||
|
|
||
| out.index = out.index.map( | ||
| lambda t: t.strftime("%Y-%m-%dT%H:%M") + t.strftime("%z")[:3] + ":" + t.strftime("%z")[3:] |
There was a problem hiding this comment.
The %z → manual colon insertion ([:3] + ":" + [3:]) assumes %z returns exactly ±HHMM. On Python 3.12+ or some tz implementations, %z can return ±HH:MM or ±HH:MM:SS. No test with a non-UTC timezone (e.g., +05:30). Consider datetime.isoformat() or explicit formatting.
and if I recall correctly, the map pattern is less efficient than index.strftime
There was a problem hiding this comment.
The tradeoff:
index.strftime("%Y-%m-%dT%H:%M%z")would be fast but reproduces exactly the %z bug we just fixed (+0530 without colon). There's no vectorized pandas equivalent ofisoformat(timespec="minutes")that reliably produces ±HH:MM.
I had it do a profile:
Surprisingly,
.map(isoformat)is actually faster than vectorizedstrftimehere: 141 ms vs 228 ms.datetime.isoformat()is implemented in C and very lean, whilestrftimehas heavier format-parsing overhead per element even in the vectorized path.At 87,600 rows it's ~140 ms either way — not a bottleneck worth optimizing given the API call that follows will take orders of magnitude longer. Keep
isoformat.
| extension_met_data = pathlib.Path(meteorological_data_file_path).suffix.lower() | ||
|
|
||
| if extension_met_data in (".tsv", ".dat"): | ||
| validate_tsv_timestamps(meteorological_data_file_path) |
There was a problem hiding this comment.
this is in a block that checks .dat too? does that actually work? if we keep the function and if this works, then function should be renamed
| # ----- Convenience properties for common metrics (year 1) ----- | ||
|
|
||
| @property | ||
| def net_energy_MWh(self) -> float: |
There was a problem hiding this comment.
if these properties are specific to year 1 then they should be named as such. My sense is that we have some larger refactors to consider for how to return results. As such, I think less API surface is better for now and I recommend deleting these properties. The documentation example is good enough for now.
There was a problem hiding this comment.
Oh I see there there is a fallback indexing in get_performance that would break the assumption this is year 1. Hmmm probably the solution is to remove that fallback. Seems to me that it contradicts the stated documentation of the function.
| before the last dot), not from the dict key. | ||
| """ | ||
| self._pan_files.clear() | ||
| if isinstance(mapping, (list, tuple)): |
There was a problem hiding this comment.
Agent concern:
List setter doesn't validate empty lists or duplicate stems.
pan_files = [Path("a/module.PAN"), Path("b/module.PAN")] — second silently overwrites the first. No empty-list warning. The type annotation says Sequence[PathLike] but runtime check is isinstance(mapping, (list, tuple)) — other sequences fall through to .items() and fail cryptically.
My take: at end of method you can fail if len(mapping) != len(self._pan_files). Maybe you/your agent has a better idea.
…ulti-year and sub-year data