Output for Field.dataset_compliance towards a CF Checker #365

@sadielbartholomew

Description

We are planning to update the current dataset_compliance method available on a Field (or its Domain) so that its output is a complete and general summary of the CF Conventions compliance of the Field, in line with the canonical Conformance document. The output would encode this information in some agreed, machine-readable and easily parsable structure. That structure could then be processed to produce a human-digestible report, for example one served on a browser page as with the (lately, necessarily, unmaintained) NCAS/CEDA CF Checker, or even fancier views such as graphs with nodes for families of variables and attributes leading to the ultimate reasons for non-compliance.

This has been in discussion over the past month or so, with a PR in preparation for a bit longer than that, but I will use this Issue to register our plans and to house any discussion on converging towards a design for the final data structure for the output of the dataset_compliance method. (This saves us from needing to open the PR in preparation prematurely in draft form.)

Data structure for output

After some discussions over the past month we've agreed (as I understand it, please correct me if anything seems amiss) on the structure illustrated below using two example fields, one a non-UGRID field and the other a UGRID field (as per the EXPECT project focus). Notable features/points are:

  • information about intermediate variables was previously not registered in the output - we want to include it to clarify the relationship between the attributes which have problems noted;
  • we address the above with a nesting structure whereby variables are keys against an objects_dict (see below 'Code to generate') noting problems with attributes and/or dimensions, and each of the latter is a key against a reason_dict (see also below) which has reasons registered either as a nested list of dicts or, eventually, as a string value, along with the value of the attribute itself (and a code corresponding to the issue, only in the case where the reason is a single string);
  • we (sadly) have to use a list of dicts for reasons as per the above point, rather than a simple dict with multiple items as individual keys, because for (at least) the case of cell methods there is no variable to register an issue under - so we'd need a way to register a reason_dict without anything to key it under, and a list seems most appropriate;
  • the CF version checked for compliance against only needs to be noted once, at the top level - not repeated throughout the structure as it is in the present prototype in main.
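To illustrate how a consumer might process the nesting described above into a human-digestible report, here is a minimal sketch. It assumes the shape shown in the examples in this issue (a nested 'reason' is a dict keyed by variable name, and a terminal 'reason' is a string). The function name iter_leaf_reasons and the sample data are hypothetical, and the unkeyed cell methods case is not handled.

```python
def iter_leaf_reasons(objects, path=()):
    """Yield (path, code, reason) for every string-valued (terminal) reason."""
    for variable, info in objects.items():
        for kind in ("attributes", "dimensions"):
            # Each entry is a single-key dict: {name: reason_dict}
            for entry in info.get(kind, []):
                for name, reason_dict in entry.items():
                    here = path + (variable, name)
                    reason = reason_dict["reason"]
                    if isinstance(reason, dict):
                        # Intermediate variable(s): descend a level
                        yield from iter_leaf_reasons(reason, here)
                    else:
                        yield here, reason_dict.get("code"), reason


# Hypothetical sample in the proposed shape (placeholder code and value)
sample = {
    "ta": {
        "CF_version": "1.12",
        "attributes": [
            {"standard_name": {
                "code": 0, "reason": "bad standard name", "value": None}},
            {"ancillary_variables": {
                "reason": {
                    "air_temperature_standard_error": {
                        "attributes": [
                            {"standard_name": {
                                "code": 0,
                                "reason": "bad standard name",
                                "value": None}},
                        ],
                        "dimension_sizes": {},
                        "dimensions": [],
                    },
                },
                "value": None,
            }},
        ],
        "dimension_sizes": {},
        "dimensions": [],
    },
}

report_lines = [
    f"{' -> '.join(path)}: {reason} (code {code})"
    for path, code, reason in iter_leaf_reasons(sample)
]
for line in report_lines:
    print(line)
```

Each line of the report carries the full chain of variables and attributes down to the ultimate reason, which is the relationship information the intermediate variables are there to preserve.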

General structure

If the field itself, and a child variable referenced via one of the attributes defined on it, each had a bad standard name, the following would represent this. Note that I have only populated the 'attributes' parts of the structure for demonstration; the 'dimensions' keys would have a similar structure to register dimensional issues:

{'the_field_variable': {'CF_version': '1.12',
                        'attributes': [{'standard_name': {'code': 0,
                                                          'reason': 'some '
                                                                    'string '
                                                                    'reason '
                                                                    'e.g. bad '
                                                                    'standard '
                                                                    'name',
                                                          'value': None}},
                                       {'an_attribute': {'reason': {'child_variable': {'attributes': [{'standard_name': {'code': 0,
                                                                                                                         'reason': 'some '
                                                                                                                                   'string '
                                                                                                                                   'reason '
                                                                                                                                   'e.g. '
                                                                                                                                   'bad '
                                                                                                                                   'standard '
                                                                                                                                   'name',
                                                                                                                         'value': None}}],
                                                                                       'dimension_sizes': {},
                                                                                       'dimensions': []}},
                                                         'value': None}}],
                        'dimension_sizes': {},
                        'dimensions': []}}
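Since the structure above uses only dicts, lists, strings, integers and None, it should be directly serialisable for machine consumption. The sketch below uses JSON from the Python standard library purely as an assumed example of an easily parsable encoding; the issue does not settle on a format, and the sample data is a cut-down placeholder.

```python
import json

# Cut-down placeholder structure in the proposed shape
structure = {
    "the_field_variable": {
        "CF_version": "1.12",
        "attributes": [
            {"standard_name": {
                "code": 0, "reason": "bad standard name", "value": None}},
        ],
        "dimension_sizes": {},
        "dimensions": [],
    },
}

# Encode to text, then decode again to check nothing is lost in transit
encoded = json.dumps(structure, indent=2, sort_keys=True)
decoded = json.loads(encoded)

print(encoded)
print("Round trip preserved the structure:", decoded == structure)
```

One caveat with this assumed choice: JSON object keys must be strings, so any non-string keys introduced into the structure later would not survive the round trip unchanged.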

Examples

Example UGRID field

{'pa': {'CF_version': '1.12',
        'attributes': [{'standard_name': {'code': 0,
                                          'reason': 'some string reason e.g. '
                                                    'bad standard name',
                                          'value': None}},
                       {'mesh': {'reason': {'Mesh2': {'attributes': [{'standard_name': {'code': 0,
                                                                                        'reason': 'some '
                                                                                                  'string '
                                                                                                  'reason '
                                                                                                  'e.g. '
                                                                                                  'bad '
                                                                                                  'standard '
                                                                                                  'name',
                                                                                        'value': None}},
                                                                     {'edge_node_connectivity': {'reason': {'Mesh2_edge_nodes': {'attributes': [{'standard_name': {'code': 0,
                                                                                                                                                                   'reason': 'some '
                                                                                                                                                                             'string '
                                                                                                                                                                             'reason '
                                                                                                                                                                             'e.g. '
                                                                                                                                                                             'bad '
                                                                                                                                                                             'standard '
                                                                                                                                                                             'name',
                                                                                                                                                                   'value': None}}],
                                                                                                                                 'dimension_sizes': {},
                                                                                                                                 'dimensions': []}},
                                                                                                 'value': None}},
                                                                     {'face_face_connectivity': {'reason': {'Mesh2_face_links': {'attributes': [{'standard_name': {'code': 0,
                                                                                                                                                                   'reason': 'some '
                                                                                                                                                                             'string '
                                                                                                                                                                             'reason '
                                                                                                                                                                             'e.g. '
                                                                                                                                                                             'bad '
                                                                                                                                                                             'standard '
                                                                                                                                                                             'name',
                                                                                                                                                                   'value': None}}],
                                                                                                                                 'dimension_sizes': {},
                                                                                                                                 'dimensions': []}},
                                                                                                 'value': None}},
                                                                     {'face_node_connectivity': {'reason': {'Mesh2_face_nodes': {'attributes': [{'standard_name': {'code': 0,
                                                                                                                                                                   'reason': 'some '
                                                                                                                                                                             'string '
                                                                                                                                                                             'reason '
                                                                                                                                                                             'e.g. '
                                                                                                                                                                             'bad '
                                                                                                                                                                             'standard '
                                                                                                                                                                             'name',
                                                                                                                                                                   'value': None}}],
                                                                                                                                 'dimension_sizes': {},
                                                                                                                                 'dimensions': []}},
                                                                                                 'value': None}}],
                                                      'dimension_sizes': {},
                                                      'dimensions': []}},
                                 'value': None}}],
        'dimension_sizes': {},
        'dimensions': []}}
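As a rough sketch of how the nesting could feed a graph view with nodes for variables and attributes, the following walks a cut-down version of the UGRID example and collects parent-to-child edges. The helper names (variable_entry, collect_edges) and the "variable.attribute" node labelling are hypothetical choices for illustration only.

```python
def variable_entry(attributes):
    """Build one per-variable dict in the proposed shape (hypothetical helper)."""
    return {"attributes": attributes, "dimension_sizes": {}, "dimensions": []}


def collect_edges(objects, parent=None):
    """Return (parent, child) edges linking variables and their attributes."""
    edges = []
    for variable, info in objects.items():
        if parent is not None:
            # Link the referencing attribute to the variable it names
            edges.append((parent, variable))
        for entry in info.get("attributes", []):
            for attribute, reason_dict in entry.items():
                node = f"{variable}.{attribute}"
                edges.append((variable, node))
                if isinstance(reason_dict["reason"], dict):
                    # Nested variables hang off this attribute node
                    edges.extend(
                        collect_edges(reason_dict["reason"], parent=node)
                    )
    return edges


leaf = {"code": 0, "reason": "bad standard name", "value": None}

ugrid_sample = {
    "pa": variable_entry([
        {"mesh": {
            "reason": {
                "Mesh2": variable_entry([
                    {"face_node_connectivity": {
                        "reason": {
                            "Mesh2_face_nodes": variable_entry(
                                [{"standard_name": dict(leaf)}]
                            ),
                        },
                        "value": None,
                    }},
                ]),
            },
            "value": None,
        }},
    ]),
}

edges = collect_edges(ugrid_sample)
for source, target in edges:
    print(f"{source} -> {target}")
```

The resulting edge list could be handed to any graph library for drawing; which library, if any, is left open here.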

Example non-UGRID field

Say there were some bad standard names: on the variable corresponding to the field itself, on something under the ancillary variable, on a cell measure, and on some orography-related variables:

{'ta': {'CF_version': '1.12',
        'attributes': [{'standard_name': {'code': 0,
                                          'reason': 'some string reason e.g. '
                                                    'bad standard name',
                                          'value': None}},
                       {'ancillary_variables': {'reason': {'air_temperature_standard_error': {'attributes': [{'standard_name': {'code': 0,
                                                                                                                                'reason': 'some '
                                                                                                                                          'string '
                                                                                                                                          'reason '
                                                                                                                                          'e.g. '
                                                                                                                                          'bad '
                                                                                                                                          'standard '
                                                                                                                                          'name',
                                                                                                                                'value': None}}],
                                                                                              'dimension_sizes': {},
                                                                                              'dimensions': []}},
                                                'value': None}},
                       {'cell_measures': {'reason': {'cell_measure': {'attributes': [{'standard_name': {'code': 0,
                                                                                                        'reason': 'some '
                                                                                                                  'string '
                                                                                                                  'reason '
                                                                                                                  'e.g. '
                                                                                                                  'bad '
                                                                                                                  'standard '
                                                                                                                  'name',
                                                                                                        'value': None}}],
                                                                      'dimension_sizes': {},
                                                                      'dimensions': []}},
                                          'value': None}},
                       {'surface_altitude': {'reason': {'x': {'attributes': [{'standard_name': {'code': 0,
                                                                                                'reason': 'some '
                                                                                                          'string '
                                                                                                          'reason '
                                                                                                          'e.g. '
                                                                                                          'bad '
                                                                                                          'standard '
                                                                                                          'name',
                                                                                                'value': None}}],
                                                              'dimension_sizes': {},
                                                              'dimensions': []},
                                                        'y': {'attributes': [{'standard_name': {'code': 0,
                                                                                                'reason': 'some '
                                                                                                          'string '
                                                                                                          'reason '
                                                                                                          'e.g. '
                                                                                                          'bad '
                                                                                                          'standard '
                                                                                                          'name',
                                                                                                'value': None}}],
                                                              'dimension_sizes': {},
                                                              'dimensions': []}},
                                             'value': None}}],
        'dimension_sizes': {},
        'dimensions': []}}

Code to generate data structure cases above for ease of editing

from pprint import pprint

reason_dict = {"reason": {}, "value": None}
# Use None and 0 as placeholders for actual code value and 'value' value
reason_dict_end = {
    "reason": "some string reason e.g. bad standard name",
    "code": 0,
    "value": None
}
objects_dict = {
    # Not showing in this demo, but basically dimension names as keys with
    # sizes as values, a simple non-nested dict structure.
    "dimension_sizes": {},
    # Lists of dicts of relevant object info. in a congruous way
    "dimensions": [],
    "attributes": [],
}


def populate_reason_dict(set_reason_info):
    d = reason_dict.copy()

    # Ignore values (singular) for this demo which are only to indicate the
    # structure
    d["reason"] = set_reason_info

    return d


def populate_objects_dict(
        set_attrs_info, set_dims_info=False, set_dims_sizes=False,
        is_top_level=False,
):
    d = objects_dict.copy()
    if is_top_level:
        d["CF_version"] = "1.12"  # placeholder for actual value

    # Only replace the empty defaults when some info is actually given
    if set_attrs_info:
        d["attributes"] = set_attrs_info
    if set_dims_info:
        d["dimensions"] = set_dims_info
    if set_dims_sizes:
        d["dimension_sizes"] = set_dims_sizes

    return d


general_idea = {
    "the_field_variable": populate_objects_dict(
        [
            {"standard_name": reason_dict_end},
            {
                "an_attribute": populate_reason_dict(
                    {
                        "child_variable": populate_objects_dict(
                            [
                                {"standard_name": reason_dict_end},
                            ]
                        )
                    }
                )
            },
        ], is_top_level=True,
    ),
}

non_ugrid_bad_names_example = {
    "ta": populate_objects_dict(
        [
            {"standard_name": reason_dict_end},
            {"ancillary_variables": populate_reason_dict(
                {
                "air_temperature_standard_error": populate_objects_dict(
                    [{"standard_name": reason_dict_end}]
                )
                }
            )},
            {"cell_measures": populate_reason_dict(
                {
                "cell_measure": populate_objects_dict(
                    [{"standard_name": reason_dict_end}]
                )
                }
            )},
            {"surface_altitude": populate_reason_dict(
                {
                "x": populate_objects_dict(
                    [{"standard_name": reason_dict_end}]
                ),
                "y": populate_objects_dict(
                    [{"standard_name": reason_dict_end}]
                )
                },
            )},
        ], is_top_level=True,
    ),
}

ugrid_bad_names_example = {
    "pa": populate_objects_dict(
        [
            {"standard_name": reason_dict_end},
            {
                "mesh": populate_reason_dict(
                    {
                        "Mesh2": populate_objects_dict(
                            [
                                {"standard_name": reason_dict_end},
                                {
                                    "edge_node_connectivity": populate_reason_dict(
                                        {
                                            "Mesh2_edge_nodes": populate_objects_dict(
                                                [{"standard_name": reason_dict_end}]
                                            )
                                        }
                                    )
                                },
                                {
                                    "face_face_connectivity": populate_reason_dict(
                                        {
                                            "Mesh2_face_links": populate_objects_dict(
                                                [{"standard_name": reason_dict_end}]
                                            )
                                        }
                                    )
                                },
                                {
                                    "face_node_connectivity": populate_reason_dict(
                                        {
                                            "Mesh2_face_nodes": populate_objects_dict(
                                                [{"standard_name": reason_dict_end}]
                                            )
                                        }
                                    )
                                },
                            ]
                        )
                    }
                )
            },
        ], is_top_level=True,
    ),
}

print("\nGeneral idea is:\n")
pprint(general_idea)

print(
    "\nResult, desired data structure, for non-UGRID bad named field "
    "case is:\n"
)
pprint(non_ugrid_bad_names_example)

print(
    "\nResult, desired data structure, for UGRID bad named field case "
    "is:\n"
)
pprint(ugrid_bad_names_example)
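One caveat when adapting the generating code: it reuses the single reason_dict_end object at every leaf. That is harmless for printing, but it means editing one leaf in place would change them all. The sketch below demonstrates this and one possible way around it using copy.deepcopy. The 'units' key and the "air_temp" value are hypothetical placeholders.

```python
from copy import deepcopy

template = {"reason": "some string reason", "code": 0, "value": None}

# Both entries point at the very same dict object
shared = [{"standard_name": template}, {"units": template}]
shared[0]["standard_name"]["value"] = "air_temp"
shared_second_value = shared[1]["units"]["value"]

fresh = {"reason": "some string reason", "code": 0, "value": None}

# Each entry gets its own independent copy of the template
independent = [
    {"standard_name": deepcopy(fresh)},
    {"units": deepcopy(fresh)},
]
independent[0]["standard_name"]["value"] = "air_temp"
independent_second_value = independent[1]["units"]["value"]

print("Shared leaf also changed to:", shared_second_value)
print("Independent leaf still holds:", independent_second_value)
```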

Labels: enhancement (New feature or request)