Description
We are planning to update the current dataset_compliance method available on a Field (or its Domain) so that its output is a complete and general summary of the CF Conventions compliance of the Field, in line with the canonical Conformance document. The output would encode this information in an agreed, machine-readable and easily parsable structure, which could then be processed into a human-digestible report (or fancier views, such as graphs with nodes for families of variables and attributes leading to the ultimate reasons for non-compliance), for example served on a browser page as with the (lately, by necessity, unmaintained) NCAS/CEDA CF Checker.
This has been under discussion for the past month or so, with a PR in preparation for a bit longer than that, but I will use this Issue to register our plans and to house any discussion on converging towards a design for the final data structure for the output of the dataset_compliance method. (This saves needing to open the PR in preparation prematurely in draft form.)
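To make "machine-readable" a little more concrete: the structure proposed in this Issue uses only dicts, lists, strings, integers and None, so it could be serialised without any custom encoder. The sketch below assumes JSON as the interchange format, which is purely an illustration and not something we have agreed on; the `output` dict is a hand-written stand-in for what dataset_compliance might return.

```python
import json

# Hand-written stand-in for a dataset_compliance result (hypothetical values).
output = {
    "ta": {
        "CF_version": "1.12",
        "attributes": [
            {
                "standard_name": {
                    "code": 0,
                    "reason": "bad standard name",
                    "value": None,
                }
            }
        ],
        "dimension_sizes": {},
        "dimensions": [],
    }
}

# Serialise to JSON text and parse it back; None becomes null and back again.
encoded = json.dumps(output, indent=2)
decoded = json.loads(encoded)

print(encoded)
```

Because every key is a string and every value is a JSON-compatible type, the round trip should reproduce the original structure exactly.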
Data structure for output
After some discussions over the past month we've agreed (as I understand it; please correct me if anything seems amiss) on the structure shown below, illustrated with two field examples, one a non-UGRID field and the other a UGRID field (as per the EXPECT project focus). Notable features/points are:
- information about intermediate variables was previously not registered in the output - we want to include it to clarify the relationship between attributes which have problems noted;
- we address the above with a nesting structure whereby variables are keys against an objects_dict (see 'Code to generate' below) noting problems with attributes and/or dimensions, and either of the latter are keys against a reason_dict (see also below) having reasons registered either as a nested list of dicts or, eventually, a string value, along with the value of the attribute itself (and a code corresponding to the issue, only if the reason is a singular string case);
- we (sadly) have to use a list of dicts for reasons as per the above point, rather than a simple dict with multiple items as individual keys, because for (at least) the case of cell methods, there is no variable to register an issue under - so we'd need a way to register a reason_dict without anything to key it under, and a list seems most appropriate;
- the CF version checked for compliance against only needs to be noted once, at the top-level - not repeated throughout the structure as-is with the present prototype in main.
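The nesting described in the points above can be walked recursively to recover a flat list of problems, which is roughly what a human-readable report would need. This is only a sketch: `iter_problems` and `sample` are hypothetical names, and it assumes the shape used in the examples in this Issue, where a non-final 'reason' is a dict keyed by child variable names and a final 'reason' is a string.

```python
def iter_problems(objects, path=()):
    """Yield (path, code, reason) for every string-valued reason.

    `objects` maps variable names to dicts holding 'attributes' and
    'dimensions' lists of single-key dicts, as proposed in this Issue.
    """
    for var_name, info in objects.items():
        for kind in ("attributes", "dimensions"):
            for entry in info.get(kind, []):
                for obj_name, reason_dict in entry.items():
                    here = path + (var_name, obj_name)
                    reason = reason_dict["reason"]
                    if isinstance(reason, dict):
                        # Intermediate case: the reason is keyed by child
                        # variables, so recurse into them.
                        yield from iter_problems(reason, here)
                    else:
                        # Final case: a plain string reason with its code.
                        yield here, reason_dict.get("code"), reason


leaf = {"code": 0, "reason": "bad standard name", "value": None}
sample = {
    "the_field_variable": {
        "CF_version": "1.12",
        "attributes": [
            {"standard_name": leaf},
            {
                "an_attribute": {
                    "reason": {
                        "child_variable": {
                            "attributes": [{"standard_name": leaf}],
                            "dimension_sizes": {},
                            "dimensions": [],
                        }
                    },
                    "value": None,
                }
            },
        ],
        "dimension_sizes": {},
        "dimensions": [],
    }
}

problems = list(iter_problems(sample))
for path, code, reason in problems:
    print(" -> ".join(path), f"[code {code}]", reason)
```

Each yielded path records the chain of variables and attributes leading to the problem, which is the relationship information the intermediate variables are meant to preserve.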
General structure
If a field itself and one of the attributes defined on it had a bad standard name, the following would represent this, noting I have only populated the 'attributes' parts of the structure for demonstration, where the 'dimensions' keys would have similar structure to register dimensional issues:
{'the_field_variable': {
    'CF_version': '1.12',
    'attributes': [
        {'standard_name': {
            'code': 0,
            'reason': 'some string reason e.g. bad standard name',
            'value': None}},
        {'an_attribute': {
            'reason': {
                'child_variable': {
                    'attributes': [
                        {'standard_name': {
                            'code': 0,
                            'reason': 'some string reason e.g. bad standard name',
                            'value': None}}],
                    'dimension_sizes': {},
                    'dimensions': []}},
            'value': None}}],
    'dimension_sizes': {},
    'dimensions': []}}
Examples
Example UGRID field
{'pa': {
    'CF_version': '1.12',
    'attributes': [
        {'standard_name': {
            'code': 0,
            'reason': 'some string reason e.g. bad standard name',
            'value': None}},
        {'mesh': {
            'reason': {
                'Mesh2': {
                    'attributes': [
                        {'standard_name': {
                            'code': 0,
                            'reason': 'some string reason e.g. bad standard name',
                            'value': None}},
                        {'edge_node_connectivity': {
                            'reason': {
                                'Mesh2_edge_nodes': {
                                    'attributes': [
                                        {'standard_name': {
                                            'code': 0,
                                            'reason': 'some string reason e.g. bad standard name',
                                            'value': None}}],
                                    'dimension_sizes': {},
                                    'dimensions': []}},
                            'value': None}},
                        {'face_face_connectivity': {
                            'reason': {
                                'Mesh2_face_links': {
                                    'attributes': [
                                        {'standard_name': {
                                            'code': 0,
                                            'reason': 'some string reason e.g. bad standard name',
                                            'value': None}}],
                                    'dimension_sizes': {},
                                    'dimensions': []}},
                            'value': None}},
                        {'face_node_connectivity': {
                            'reason': {
                                'Mesh2_face_nodes': {
                                    'attributes': [
                                        {'standard_name': {
                                            'code': 0,
                                            'reason': 'some string reason e.g. bad standard name',
                                            'value': None}}],
                                    'dimension_sizes': {},
                                    'dimensions': []}},
                            'value': None}}],
                    'dimension_sizes': {},
                    'dimensions': []}},
            'value': None}}],
    'dimension_sizes': {},
    'dimensions': []}}
Example non-UGRID field
Say there were some bad standard names: on the variable corresponding to the field itself, something under the ancil variable, a cell measure, and some orography-related variables:
{'ta': {
    'CF_version': '1.12',
    'attributes': [
        {'standard_name': {
            'code': 0,
            'reason': 'some string reason e.g. bad standard name',
            'value': None}},
        {'ancillary_variables': {
            'reason': {
                'air_temperature_standard_error': {
                    'attributes': [
                        {'standard_name': {
                            'code': 0,
                            'reason': 'some string reason e.g. bad standard name',
                            'value': None}}],
                    'dimension_sizes': {},
                    'dimensions': []}},
            'value': None}},
        {'cell_measures': {
            'reason': {
                'cell_measure': {
                    'attributes': [
                        {'standard_name': {
                            'code': 0,
                            'reason': 'some string reason e.g. bad standard name',
                            'value': None}}],
                    'dimension_sizes': {},
                    'dimensions': []}},
            'value': None}},
        {'surface_altitude': {
            'reason': {
                'x': {
                    'attributes': [
                        {'standard_name': {
                            'code': 0,
                            'reason': 'some string reason e.g. bad standard name',
                            'value': None}}],
                    'dimension_sizes': {},
                    'dimensions': []},
                'y': {
                    'attributes': [
                        {'standard_name': {
                            'code': 0,
                            'reason': 'some string reason e.g. bad standard name',
                            'value': None}}],
                    'dimension_sizes': {},
                    'dimensions': []}},
            'value': None}}],
    'dimension_sizes': {},
    'dimensions': []}}
Code to generate data structure cases above for ease of editing
from pprint import pprint

reason_dict = {"reason": {}, "value": None}

# Use None and 0 as placeholders for actual code value and 'value' value
reason_dict_end = {
    "reason": "some string reason e.g. bad standard name",
    "code": 0,
    "value": None
}

objects_dict = {
    # Not showing in this demo, but basically dimension names as keys with
    # sizes as values, a simple non-nested dict structure.
    "dimension_sizes": {},
    # Lists of dicts of relevant object info. in a congruous way
    "dimensions": [],
    "attributes": [],
}


def populate_reason_dict(set_reason_info):
    d = reason_dict.copy()
    # Ignore values (singular) for this demo which are only to indicate the
    # structure
    d["reason"] = set_reason_info
    return d


def populate_objects_dict(
        set_attrs_info, set_dims_info=False, set_dims_sizes=False,
        is_top_level=False,
):
    d = objects_dict.copy()
    if is_top_level:
        d["CF_version"] = "1.12"  # placeholder for actual value
    if set_attrs_info:
        d["attributes"] = set_attrs_info
    if set_dims_info:
        d["dimensions"] = set_dims_info
    # Previously accepted but never used; store it so the argument has effect
    if set_dims_sizes:
        d["dimension_sizes"] = set_dims_sizes
    return d


general_idea = {
    "the_field_variable": populate_objects_dict(
        [
            {"standard_name": reason_dict_end},
            {
                "an_attribute": populate_reason_dict(
                    {
                        "child_variable": populate_objects_dict(
                            [
                                {"standard_name": reason_dict_end},
                            ]
                        )
                    }
                )
            },
        ], is_top_level=True,
    ),
}

non_ugrid_bad_names_example = {
    "ta": populate_objects_dict(
        [
            {"standard_name": reason_dict_end},
            {"ancillary_variables": populate_reason_dict(
                {
                    "air_temperature_standard_error": populate_objects_dict(
                        [{"standard_name": reason_dict_end}]
                    )
                }
            )},
            {"cell_measures": populate_reason_dict(
                {
                    "cell_measure": populate_objects_dict(
                        [{"standard_name": reason_dict_end}]
                    )
                }
            )},
            {"surface_altitude": populate_reason_dict(
                {
                    "x": populate_objects_dict(
                        [{"standard_name": reason_dict_end}]
                    ),
                    "y": populate_objects_dict(
                        [{"standard_name": reason_dict_end}]
                    )
                },
            )},
        ], is_top_level=True,
    ),
}

ugrid_bad_names_example = {
    "pa": populate_objects_dict(
        [
            {"standard_name": reason_dict_end},
            {
                "mesh": populate_reason_dict(
                    {
                        "Mesh2": populate_objects_dict(
                            [
                                {"standard_name": reason_dict_end},
                                {
                                    "edge_node_connectivity": populate_reason_dict(
                                        {
                                            "Mesh2_edge_nodes": populate_objects_dict(
                                                [{"standard_name": reason_dict_end}]
                                            )
                                        }
                                    )
                                },
                                {
                                    "face_face_connectivity": populate_reason_dict(
                                        {
                                            "Mesh2_face_links": populate_objects_dict(
                                                [{"standard_name": reason_dict_end}]
                                            )
                                        }
                                    )
                                },
                                {
                                    "face_node_connectivity": populate_reason_dict(
                                        {
                                            "Mesh2_face_nodes": populate_objects_dict(
                                                [{"standard_name": reason_dict_end}]
                                            )
                                        }
                                    )
                                },
                            ]
                        )
                    }
                )
            },
        ], is_top_level=True,
    ),
}

print("\nGeneral idea is:\n")
pprint(general_idea)
print(
    "\nResult, desired data structure, for non-UGRID bad named field "
    "case is:\n"
)
pprint(non_ugrid_bad_names_example)
print(
    "\nResult, desired data structure, for UGRID bad named field case "
    "is:\n"
)
pprint(ugrid_bad_names_example)
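As a rough idea of how the graph view mentioned in the Description might be derived from this structure, the sketch below turns the nesting into a list of (parent, child) edges linking variables to their problem attributes and on to child variables. The function name `graph_edges`, the `variable.attribute` node labels and the sample data are all hypothetical; it assumes the same shape as the examples above, with a non-final 'reason' being a dict keyed by child variable names.

```python
def graph_edges(objects, parent=None):
    """Return (parent, child) edges for variables and problem attributes."""
    edges = []
    for var_name, info in objects.items():
        if parent is not None:
            # Link the attribute that referenced this variable to it.
            edges.append((parent, var_name))
        for kind in ("attributes", "dimensions"):
            for entry in info.get(kind, []):
                for obj_name, reason_dict in entry.items():
                    # Label attribute nodes as 'variable.attribute' so that
                    # e.g. every standard_name gets a distinct node.
                    node = f"{var_name}.{obj_name}"
                    edges.append((var_name, node))
                    reason = reason_dict["reason"]
                    if isinstance(reason, dict):
                        edges.extend(graph_edges(reason, parent=node))
    return edges


leaf = {"code": 0, "reason": "bad standard name", "value": None}
sample = {
    "ta": {
        "CF_version": "1.12",
        "attributes": [
            {"standard_name": leaf},
            {
                "cell_measures": {
                    "reason": {
                        "cell_measure": {
                            "attributes": [{"standard_name": leaf}],
                            "dimension_sizes": {},
                            "dimensions": [],
                        }
                    },
                    "value": None,
                }
            },
        ],
        "dimension_sizes": {},
        "dimensions": [],
    }
}

edges = graph_edges(sample)
for source, target in edges:
    print(source, "->", target)
```

An edge list like this could be handed to any graph-drawing tool; nodes with no outgoing edges correspond to the ultimate reasons for non-compliance.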