
Fix PAN/OND spec ID parsing bug and add weather utilities, convenience APIs, and improved docs#9

Open
iantsednv wants to merge 18 commits into `main` from `bugfix/filename_parsing_better_docs`
Conversation

@iantsednv
Collaborator

  • Bug fix: Replace `split(".")[0]` with `Path.stem` for PAN/OND spec ID derivation in `pvsystem.py`, matching the existing validator in `endpoint_modelchains.py` — fixes a silent mismatch for filenames with multiple dots (e.g. `Trina_TSM-DEG19C.20-550_APP.PAN`)
  • Weather utilities: Add `sf.from_dataframe()`, `sf.from_pvlib()`, `sf.validate_tsv_timestamps()` for converting DataFrames to SolarFarmer TSV format, with automatic TMY year remapping and pvlib column mapping
  • Client-side TMY validation: `validate_tsv_timestamps()` is called automatically before upload, replacing the opaque HTTP 400 with a clear `ValueError`
  • Convenience properties: `CalculationResults.net_energy_MWh`, `.performance_ratio`, `.energy_yield_kWh_per_kWp` for year-1 metrics without dict traversal
  • Enum exports: `sf.MountingType`, `sf.InverterType`, `sf.OrientationType` now available at the top-level namespace
  • `pan_files`/`ond_files` accept lists: `plant.pan_files = [Path("module.PAN")]` now works in addition to dicts; dict keys documented as labels only (spec ID comes from filename)
  • pandas as optional dep: `[weather]` extra; all pandas-dependent functions guard with an `ImportError` pointing to install instructions
  • Docs: Updated API reference, workflow guides, and FAQ to cover all of the above
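The parsing fix described in the first bullet can be reproduced directly; on the example filename, the two derivations disagree:

```python
from pathlib import Path

# Filename with multiple dots, as in the PR description.
name = "Trina_TSM-DEG19C.20-550_APP.PAN"

buggy_id = name.split(".")[0]  # truncates at the FIRST dot
fixed_id = Path(name).stem     # drops only the final extension

print(buggy_id)  # Trina_TSM-DEG19C
print(fixed_id)  # Trina_TSM-DEG19C.20-550_APP
```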

@iantsednv iantsednv marked this pull request as ready for review April 9, 2026 22:15
or to parse timeseries results as DataFrames, install the `weather` extra:

```bash
pip install "dnv-solarfarmer[weather]"
```

I think `weather` will not age well; it's already awkward that the response parsing also optionally uses pandas. One "kick the can down the road" solution is putting pandas into an `all` extra while also moving the dev/test/docs dependencies into dependency groups.
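A sketch of what that could look like in `pyproject.toml` (package names and group contents are illustrative, not this project's actual dependency set):

```toml
[project.optional-dependencies]
# user-facing: all optional runtime deps in one bucket
all = ["pandas"]

[dependency-groups]
# PEP 735 dependency groups: used by tooling, not published in package metadata
dev = ["pytest", "ruff"]
docs = ["sphinx"]
```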

```python
rename = {"ghi": "GHI", "dhi": "DHI", "temp_air": "TAmb",
          "wind_speed": "WS", "pressure": "Pressure"}
df = pvlib_df.rename(columns=rename)
df["Pressure"] = df["Pressure"] / 100  # Pa → mbar
```

Here and in the description above we need to be more generic about units. pvlib does not, to my knowledge, make any promises about units, only variable names. So let's simply state what SF needs for units (or refer to the dict spec) and say that the user should ensure the various pvlib data sources are consistent with what SF needs. Then we can provide the NSRDB PSM example for pressure.
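To make that guidance concrete, a magnitude-based sanity check before conversion could look like this (the helper and its thresholds are illustrative assumptions, not part of the SolarFarmer API):

```python
from statistics import median

def guess_pressure_units(values) -> str:
    """Heuristically guess pressure units from magnitude.

    Surface pressure near sea level is ~101,325 Pa or ~1013 mbar, so the
    two conventions are separated by two orders of magnitude.
    """
    m = median(values)
    if 80_000 < m < 110_000:
        return "Pa"    # pvlib's documented convention for pressure
    if 800 < m < 1_100:
        return "mbar"  # typical of raw NSRDB PSM source files
    raise ValueError(f"ambiguous pressure magnitude: {m}")
```

A check like this only guards against the most common Pa/mbar mix-up; the docs should still state the required units explicitly.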

```python
__all__ = ["TSV_COLUMNS", "validate_tsv_timestamps", "from_dataframe", "from_pvlib"]


def validate_tsv_timestamps(file_path: str | pathlib.Path) -> None:
```

This requirement is specific to TMY, so the function name and documentation should be clear about that. It seems to me that the function is used in the `get_files` function in a way that blocks non-TMY workflows that may rightly span a year turnover. Or am I misunderstanding something?

I'm inclined to say that we remove this function and rely on documentation. I think documentation has been improved significantly in this and previous PRs. What do you think?

Collaborator Author

`validate_tsv_timestamps` was not correctly designed. The goal was to protect against shuffled-year TMYs provided as a TSV. We want to allow TSVs of any duration, as the format is designed for, so I refactored that function.
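The refactored check itself isn't shown in this thread. As a sketch of the stated intent (allow any duration, reject shuffled source years), a simple monotonicity test on the raw timestamps could look like this (function name is illustrative):

```python
from datetime import datetime

def looks_like_shuffled_tmy(timestamps: list[datetime]) -> bool:
    """Return True if timestamps go backwards at any point.

    TMY files stitched from different source years often have
    non-monotonic raw timestamps, while an ordinary timeseries that
    spans a year turnover stays strictly increasing.
    """
    return any(b <= a for a, b in zip(timestamps, timestamps[1:]))
```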


```python
_PANDAS_INSTALL_MSG = (
    "pandas is required for this function. Install it with: pip install 'dnv-solarfarmer[weather]'"
)
```
This isn't the only module that conditionally needs pandas, so refactor this into `__init__` or some other common place.
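As a sketch of that refactor (helper name and location are assumptions, not the package's actual API), a single shared lazy-import guard could replace the per-module copies:

```python
import importlib

def require_module(name: str, install_msg: str):
    """Import a module lazily, raising a uniform ImportError with
    install instructions if it is absent."""
    try:
        return importlib.import_module(name)
    except ImportError as exc:
        raise ImportError(install_msg) from exc

# Each pandas-dependent function would then start with something like:
# pd = require_module(
#     "pandas",
#     "pandas is required for this function. "
#     "Install it with: pip install 'dnv-solarfarmer[weather]'",
# )
```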

```python
out.index = out.index.map(lambda t: t.replace(year=year))

out.index = out.index.map(
    lambda t: t.strftime("%Y-%m-%dT%H:%M") + t.strftime("%z")[:3] + ":" + t.strftime("%z")[3:]
)
```
The `%z` → manual colon insertion (`[:3] + ":" + [3:]`) assumes `%z` returns exactly `±HHMM`. On Python 3.12+ or some tz implementations, `%z` can return `±HH:MM` or `±HH:MM:SS`. There is no test with a non-UTC timezone (e.g. `+05:30`). Consider `datetime.isoformat()` or explicit formatting.

and if I recall correctly, the map pattern is less efficient than index.strftime
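The concern is easy to exercise with a concrete non-UTC offset:

```python
from datetime import datetime, timedelta, timezone

# A +05:30 offset (e.g. IST) exposes the colon-insertion assumption.
ist = timezone(timedelta(hours=5, minutes=30))
t = datetime(2026, 1, 1, 12, 0, tzinfo=ist)

# Manual splice: only correct while %z is exactly ±HHMM.
manual = t.strftime("%Y-%m-%dT%H:%M") + t.strftime("%z")[:3] + ":" + t.strftime("%z")[3:]

# isoformat produces ±HH:MM without depending on %z's shape.
clean = t.isoformat(timespec="minutes")

print(clean)  # 2026-01-01T12:00+05:30
```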

Collaborator Author

@iantsednv Apr 10, 2026

The tradeoff: `index.strftime("%Y-%m-%dT%H:%M%z")` would be fast but reproduces exactly the `%z` bug we just fixed (`+0530` without a colon). There's no vectorized pandas equivalent of `isoformat(timespec="minutes")` that reliably produces `±HH:MM`.

I profiled it:

Surprisingly, `.map(isoformat)` is actually faster than vectorized `strftime` here: 141 ms vs 228 ms. `datetime.isoformat()` is implemented in C and very lean, while `strftime` has heavier format-parsing overhead per element even in the vectorized path.

At 87,600 rows it's ~140 ms either way — not a bottleneck worth optimizing given that the API call that follows will take orders of magnitude longer. Keep `isoformat`.

```python
extension_met_data = pathlib.Path(meteorological_data_file_path).suffix.lower()

if extension_met_data in (".tsv", ".dat"):
    validate_tsv_timestamps(meteorological_data_file_path)
```

This is in a block that checks `.dat` too; does that actually work? If we keep the function and this works, then it should be renamed.

```python
# ----- Convenience properties for common metrics (year 1) -----

@property
def net_energy_MWh(self) -> float:
```

If these properties are specific to year 1, they should be named as such. My sense is that we have some larger refactors to consider for how to return results, so I think less API surface is better for now, and I recommend deleting these properties. The documentation example is good enough for now.


Oh, I see there is a fallback indexing in `get_performance` that would break the assumption that this is year 1. Hmm, probably the solution is to remove that fallback; it seems to me that it contradicts the stated documentation of the function.

```python
before the last dot), not from the dict key.
"""
self._pan_files.clear()
if isinstance(mapping, (list, tuple)):
```

Agent concern:

The list setter doesn't validate empty lists or duplicate stems: with `pan_files = [Path("a/module.PAN"), Path("b/module.PAN")]`, the second entry silently overwrites the first, and there is no empty-list warning. Also, the type annotation says `Sequence[PathLike]` but the runtime check is `isinstance(mapping, (list, tuple))`, so other sequences fall through to `.items()` and fail cryptically.

My take: at the end of the method you can fail if `len(mapping) != len(self._pan_files)`. Maybe you/your agent has a better idea.
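One shape the suggested validation could take (the helper name `build_pan_mapping` is hypothetical; the real code lives in the setter):

```python
from pathlib import Path
from typing import Dict, Sequence, Union

PathLike = Union[str, Path]

def build_pan_mapping(files: Sequence[PathLike]) -> Dict[str, Path]:
    """Build a stem -> path mapping, rejecting empty input and
    duplicate stems instead of silently overwriting entries."""
    paths = [Path(f) for f in files]
    if not paths:
        raise ValueError("pan_files list must not be empty")
    mapping = {p.stem: p for p in paths}
    if len(mapping) != len(paths):  # a duplicate stem collapsed an entry
        stems = [p.stem for p in paths]
        dupes = sorted({s for s in stems if stems.count(s) > 1})
        raise ValueError(f"duplicate spec IDs derived from filenames: {dupes}")
    return mapping
```

Accepting any `Sequence` (rather than only `list`/`tuple`) and then length-checking the result covers both the overwrite and the cryptic-failure cases.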

