Skip to content

Commit

Permalink
Fix invalid standard names, add standard name checking script and git…
Browse files Browse the repository at this point in the history
…hub action (#52)

This PR introduces a Github Action to automatically check the validity of Standard Names against the list of criteria specified by the CF conventions: Must only contain lowercase letters, numerals, or underscores, and the first character must be a letter. This is achieved with a new script (tools/check_name_rules.py) that parses the given standard names xml file, and checks each name against these rules, returning an error if any invalid names are found. This action will run on all branches for any new commits to ensure any new names follow these rules.

This new script revealed several existing invalid names; those have been corrected as part of this PR.
  • Loading branch information
mkavulich authored Dec 4, 2023
1 parent d8b060d commit 67783d1
Show file tree
Hide file tree
Showing 6 changed files with 140 additions and 35 deletions.
30 changes: 28 additions & 2 deletions .github/workflows/pull_request_ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -2,12 +2,13 @@ name: Pull request checks

on:
pull_request:
workflow_dispatch:

jobs:
check-unique-standard-names:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4

- uses: actions/setup-python@v4
with:
Expand All @@ -21,10 +22,33 @@ jobs:
- name: Check for duplicate standard names
run: tools/check_xml_unique.py standard_names.xml

check-name-rules:
name: Check standard names against rules
runs-on: ubuntu-latest

steps:
- name: Checkout repository
uses: actions/checkout@v4

- uses: actions/setup-python@v4
with:
python-version: "3.x"

# Install dependencies
- name: Install dependencies
run: |
sudo apt-get update
sudo apt-get -y install libxml2-utils
- name: Checks standard names against character rules
run: |
python3 tools/check_name_rules.py -s standard_names.xml
check-rerendered-markdown:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v3
- uses: actions/checkout@v4

- uses: actions/setup-python@v4
with:
Expand All @@ -43,3 +67,5 @@ jobs:
checksum=$(sha256sum Metadata-standard-names.md)
tools/write_standard_name_table.py standard_names.xml
test "$checksum" = "$(sha256sum Metadata-standard-names.md)"
26 changes: 14 additions & 12 deletions Metadata-standard-names.md
Original file line number Diff line number Diff line change
Expand Up @@ -200,7 +200,7 @@ Note that appending '_on_previous_timestep' to standard_names in this section yi
* `real(kind=kind_phys)`: units = W m-2
* `cumulative_boundary_flux_of_total_water`: Cumulative boundary flux of total water
* `real(kind=kind_phys)`: units = W m-2
* `US_standard_air_pressure_at_sea_level`: US Standard Atmospheric pressure at sea level
* `us_standard_air_pressure_at_sea_level`: US Standard Atmospheric pressure at sea level
* `real(kind=kind_phys)`: units = Pa
* `reference_pressure`: reference pressure used in definition of potential temperature, Exner function, etc.
* `real(kind=kind_phys)`: units = Pa
Expand Down Expand Up @@ -428,9 +428,9 @@ Standard / required CCPP variables
* `real(kind=kind_phys)`: units = frac
* `critical_relative_humidity_at_toa`: Critical relative humidity at toa
* `real(kind=kind_phys)`: units = frac
* `date_and_time_at_model_initialization_in_ISO_order`: Date and time at model initialization in ISO order
* `date_and_time_at_model_initialization_in_iso_order`: Date and time at model initialization in iso order
* `integer(kind=)`: units = 1
* `date_and_time_at_model_initialization_in_United_States_order`: Date and time at model initialization in United States order
* `date_and_time_at_model_initialization_in_united_states_order`: Date and time at model initialization in united states order
* `integer(kind=)`: units = 1
* `decorrelation_length_used_by_overlap_method`: Decorrelation length used by overlap method
* `real(kind=kind_phys)`: units = km
Expand Down Expand Up @@ -496,13 +496,13 @@ Standard / required CCPP variables
* `logical(kind=)`: units = flag
* `do_diagnostics`: Do diagnostics
* `logical(kind=)`: units = flag
* `do_XYZ_dimensioned_diagnostics`: Do XYZ dimensioned diagnostics
* `do_xyz_dimensioned_diagnostics`: Do xyz dimensioned diagnostics
* `logical(kind=)`: units = flag
* `do_flip`: Do flip
* `logical(kind=)`: units = flag
* `control_for_flux_adjusting_surface_data_assimilation_system`: Control for flux adjusting surface data assimilation system
* `integer(kind=)`: units = 1
* `do_flux_form_in chikira_sugiyama_deep_convection_scheme`: Do flux form in chikira sugiyama deep convection scheme
* `do_flux_form_in_chikira_sugiyama_deep_convection_scheme`: Do flux form in chikira sugiyama deep convection scheme
* `logical(kind=)`: units = flag
* `do_nrl_2015_ozone_scheme`: Do nrl 2015 ozone scheme
* `logical(kind=)`: units = flag
Expand Down Expand Up @@ -560,7 +560,7 @@ Standard / required CCPP variables
* `logical(kind=)`: units = flag
* `control_for_land_surface_scheme_frozen_soil_permeability`: Control for land surface scheme frozen soil permeability
* `integer(kind=)`: units = 1
* ` do_cellular_automata_gaussian_spatial_filter`: do cellular automata gaussian spatial filter
* `do_cellular_automata_gaussian_spatial_filter`: Do cellular automata gaussian spatial filter
* `logical(kind=)`: units = flag
* `do_gcycle_surface_option`: Do gcycle surface option
* `logical(kind=)`: units = flag
Expand All @@ -584,7 +584,7 @@ Standard / required CCPP variables
* `logical(kind=)`: units = flag
* `do_global_cellular_automata_closure`: Do global cellular automata closure
* `logical(kind=)`: units = flag
* ` do_global_cellular_automata_deep_convective_entrainment`: do global cellular automata deep convective entrainment
* `do_global_cellular_automata_deep_convective_entrainment`: Do global cellular automata deep convective entrainment
* `logical(kind=)`: units = flag
* `do_global_cellular_automata_trigger`: Do global cellular automata trigger
* `logical(kind=)`: units = flag
Expand Down Expand Up @@ -826,11 +826,11 @@ Standard / required CCPP variables
* `logical(kind=)`: units = flag
* `do_longwave_scattering_in_cloud_optics`: Do longwave scattering in cloud optics
* `logical(kind=)`: units = flag
* `do_tracer_XYZ_dimensioned_diagnostics`: Do tracer XYZ dimensioned diagnostics
* `do_tracer_xyz_dimensioned_diagnostics`: Do tracer xyz dimensioned diagnostics
* `logical(kind=)`: units = flag
* `control_for_variable_bulk_richardson_number`: Control for variable bulk richardson number
* `real(kind=kind_phys)`: units = 1
* `date_and_time_of_forecast_in_United_States_order`: Date and time of forecast in United States order
* `date_and_time_of_forecast_in_united_states_order`: Date and time of forecast in united states order
* `integer(kind=)`: units = 1
* `forecast_utc_hour`: Forecast utc hour
* `real(kind=kind_phys)`: units = h
Expand Down Expand Up @@ -1048,11 +1048,11 @@ Standard / required CCPP variables
* `character(kind=len=64)`: units = none
* `filename_of_internal_namelist`: Filename of internal namelist
* `character(kind=len=256)`: units = none
* `number_of_XY_dimensioned_auxiliary_arrays`: Number of XY dimensioned auxiliary arrays
* `number_of_xy_dimensioned_auxiliary_arrays`: Number of xy dimensioned auxiliary arrays
* `integer(kind=)`: units = count
* `number_of_pdf_based_variables_in_xyz_dimensioned_restart_array`: Number of pdf based variables in xyz dimensioned restart array
* `integer(kind=)`: units = count
* `number_of_XYZ_dimensioned_auxiliary_arrays`: Number of XYZ dimensioned auxiliary arrays
* `number_of_xyz_dimensioned_auxiliary_arrays`: Number of xyz dimensioned auxiliary arrays
* `integer(kind=)`: units = count
* `number_of_radiatively_active_gases`: Number of radiatively active gases
* `integer(kind=)`: units = count
Expand Down Expand Up @@ -1192,7 +1192,7 @@ Standard / required CCPP variables
* `real(kind=kind_phys)`: units = 1
* `thickness_of_soil_layers_for_land_surface_model`: Thickness of soil layers for land surface model
* `real(kind=kind_phys)`: units = m
* ` cellular_automata_vertical_velocity_perturbation_threshold_for_deep_convection`: cellular automata vertical velocity perturbation threshold for deep convection
* `cellular_automata_vertical_velocity_perturbation_threshold_for_deep_convection`: Cellular automata vertical velocity perturbation threshold for deep convection
* `real(kind=kind_phys)`: units = m s-1
* `period_of_maximum_diagnostics_reset`: Period of maximum diagnostics reset
* `real(kind=kind_phys)`: units = s
Expand Down Expand Up @@ -1236,6 +1236,8 @@ Standard / required CCPP variables
* `integer(kind=)`: units = mm
* `index_of_water_vegetation_category`: Index of water vegetation category
* `integer(kind=)`: units = index
* `filename_of_micm_configuration`: Filename of micm configuration
* `character(kind=len=*)`: units = none
## GFS_typedefs_GFS_interstitial_type
* `cloud_ice_mixing_ratio_wrt_moist_air_interstitial`: Cloud ice mixing ratio wrt moist air interstitial
* `real(kind=kind_phys)`: units = kg kg-1
Expand Down
24 changes: 12 additions & 12 deletions standard_names.xml
Original file line number Diff line number Diff line change
Expand Up @@ -310,7 +310,7 @@
<standard_name name="cumulative_boundary_flux_of_total_water">
<type kind="kind_phys" units="W m-2">real</type>
</standard_name>
<standard_name name="US_standard_air_pressure_at_sea_level"
<standard_name name="us_standard_air_pressure_at_sea_level"
long_name="US Standard Atmospheric pressure at sea level">
<type kind="kind_phys" units="Pa">real</type>
</standard_name>
Expand Down Expand Up @@ -712,10 +712,10 @@
<standard_name name="critical_relative_humidity_at_toa">
<type kind="kind_phys" units="frac">real</type>
</standard_name>
<standard_name name="date_and_time_at_model_initialization_in_ISO_order">
<standard_name name="date_and_time_at_model_initialization_in_iso_order">
<type kind="" units="1">integer</type>
</standard_name>
<standard_name name="date_and_time_at_model_initialization_in_United_States_order">
<standard_name name="date_and_time_at_model_initialization_in_united_states_order">
<type kind="" units="1">integer</type>
</standard_name>
<standard_name name="decorrelation_length_used_by_overlap_method">
Expand Down Expand Up @@ -814,7 +814,7 @@
<standard_name name="do_diagnostics">
<type kind="" units="flag">logical</type>
</standard_name>
<standard_name name="do_XYZ_dimensioned_diagnostics">
<standard_name name="do_xyz_dimensioned_diagnostics">
<type kind="" units="flag">logical</type>
</standard_name>
<standard_name name="do_flip">
Expand All @@ -823,7 +823,7 @@
<standard_name name="control_for_flux_adjusting_surface_data_assimilation_system">
<type kind="" units="1">integer</type>
</standard_name>
<standard_name name="do_flux_form_in chikira_sugiyama_deep_convection_scheme">
<standard_name name="do_flux_form_in_chikira_sugiyama_deep_convection_scheme">
<type kind="" units="flag">logical</type>
</standard_name>
<standard_name name="do_nrl_2015_ozone_scheme">
Expand Down Expand Up @@ -910,7 +910,7 @@
<standard_name name="control_for_land_surface_scheme_frozen_soil_permeability">
<type kind="" units="1">integer</type>
</standard_name>
<standard_name name=" do_cellular_automata_gaussian_spatial_filter">
<standard_name name="do_cellular_automata_gaussian_spatial_filter">
<type kind="" units="flag">logical</type>
</standard_name>
<standard_name name="do_gcycle_surface_option">
Expand Down Expand Up @@ -946,7 +946,7 @@
<standard_name name="do_global_cellular_automata_closure">
<type kind="" units="flag">logical</type>
</standard_name>
<standard_name name=" do_global_cellular_automata_deep_convective_entrainment">
<standard_name name="do_global_cellular_automata_deep_convective_entrainment">
<type kind="" units="flag">logical</type>
</standard_name>
<standard_name name="do_global_cellular_automata_trigger">
Expand Down Expand Up @@ -1309,13 +1309,13 @@
<standard_name name="do_longwave_scattering_in_cloud_optics">
<type kind="" units="flag">logical</type>
</standard_name>
<standard_name name="do_tracer_XYZ_dimensioned_diagnostics">
<standard_name name="do_tracer_xyz_dimensioned_diagnostics">
<type kind="" units="flag">logical</type>
</standard_name>
<standard_name name="control_for_variable_bulk_richardson_number">
<type kind="kind_phys" units="1">real</type>
</standard_name>
<standard_name name="date_and_time_of_forecast_in_United_States_order">
<standard_name name="date_and_time_of_forecast_in_united_states_order">
<type kind="" units="1">integer</type>
</standard_name>
<standard_name name="forecast_utc_hour">
Expand Down Expand Up @@ -1642,13 +1642,13 @@
<standard_name name="filename_of_internal_namelist">
<type kind="len=256" units="none">character</type>
</standard_name>
<standard_name name="number_of_XY_dimensioned_auxiliary_arrays">
<standard_name name="number_of_xy_dimensioned_auxiliary_arrays">
<type kind="" units="count">integer</type>
</standard_name>
<standard_name name="number_of_pdf_based_variables_in_xyz_dimensioned_restart_array">
<type kind="" units="count">integer</type>
</standard_name>
<standard_name name="number_of_XYZ_dimensioned_auxiliary_arrays">
<standard_name name="number_of_xyz_dimensioned_auxiliary_arrays">
<type kind="" units="count">integer</type>
</standard_name>
<standard_name name="number_of_radiatively_active_gases">
Expand Down Expand Up @@ -1858,7 +1858,7 @@
<standard_name name="thickness_of_soil_layers_for_land_surface_model">
<type kind="kind_phys" units="m">real</type>
</standard_name>
<standard_name name=" cellular_automata_vertical_velocity_perturbation_threshold_for_deep_convection">
<standard_name name="cellular_automata_vertical_velocity_perturbation_threshold_for_deep_convection">
<type kind="kind_phys" units="m s-1">real</type>
</standard_name>
<standard_name name="period_of_maximum_diagnostics_reset">
Expand Down
79 changes: 79 additions & 0 deletions tools/check_name_rules.py
Original file line number Diff line number Diff line change
@@ -0,0 +1,79 @@
#!/usr/bin/env python3

"""
Check standard names database file for violations of standard name character rules
"""

import argparse
import sys
import os.path
import re

################################################
#Add CCPP framework (lib) modules to python path
################################################

_CURR_DIR = os.path.dirname(os.path.abspath(__file__))
sys.path.append(os.path.join(_CURR_DIR, "lib"))

#######################################
#Import needed framework python modules
#######################################

from xml_tools import find_schema_file, find_schema_version, validate_xml_file, read_xml_file

def main():
"""Parse the standard names database file and output a dictionary
where the keys are any standard names in violation of character rules,
and the values are lists of the specific rules violated
"""
#Parse arguments
parser = argparse.ArgumentParser(description=__doc__)

parser.add_argument("-s","--standard_name_file",
metavar='<standard names filename>',required=True,
type=str, help="XML file with standard name library")
args = parser.parse_args()

stdname_file = os.path.abspath(args.standard_name_file)
tree, root = read_xml_file(stdname_file)

# Validate the XML file
version = find_schema_version(root)
schema_name = os.path.basename(stdname_file)[0:-4]
schema_root = os.path.dirname(stdname_file)
schema_path = os.path.join(schema_root,schema_name)
schema_file = find_schema_file(schema_path, version)
if schema_file:
try:
validate_xml_file(stdname_file, schema_name, version, None,
schema_path=schema_root, error_on_noxmllint=True)
except ValueError:
raise ValueError(f"Invalid standard names file, {stdname_file}")
else:
raise ValueError(f'Cannot find schema file, {schema_name}, for {version=}')

#Parse list of standard names and see if any names violate one or more rules
violators = {}
legal_first_char = re.compile('[a-z]')
valid_chars = re.compile('[a-z0-9_]')
for name in root.findall('./section/standard_name'):
sname = name.attrib['name']
violations = []
if legal_first_char.sub('', sname[0]):
violations.append('First character is not a lowercase letter')
testchars = valid_chars.sub('', sname)
if testchars:
violations.append(f'Invalid characters are present: "{testchars}"')

# If any violations were detected, add an entry to "violators" dictionary
if violations:
violators[sname] = violations

if violators:
raise Exception(f"Violating standard names found:\n{violators}")
else:
print(f'Success! All standard names in {args.standard_name_file} follow the rules.')

if __name__ == "__main__":
main()
12 changes: 5 additions & 7 deletions tools/check_xml_unique.py
Original file line number Diff line number Diff line change
Expand Up @@ -59,11 +59,9 @@ def main_func():
validate_xml_file(stdname_file, schema_name, version, None,
schema_path=schema_root, error_on_noxmllint=True)
except ValueError:
raise ValueError("Invalid standard names file, {}".format(stdname_file))
raise ValueError(f"Invalid standard names file, {stdname_file}")
else:
raise ValueError(
'Cannot find schema file, {}, for version {}'.format(schema_name, version)
)
raise ValueError(f'Cannot find schema file, {schema_name}, for {version=}')

#get list of all standard names
all_std_names = []
Expand All @@ -87,8 +85,8 @@ def main_func():
print('The following duplicate standard names were found:')
for dup in dup_std_names:
rm_elements = root.findall('./section/standard_name[@name="%s"]'%dup)[1:]
print("{0}, ({1} duplicate(s))".format(dup, len(rm_elements)))
print('Removing duplicates and overwriting {}'.format(stdname_file))
print(f"{dup}, ({len(rm_elements)} duplicate(s))")
print(f'Removing duplicates and overwriting {stdname_file}')
for dup in dup_std_names:
first_use = True #Logical that indicates the first use of the duplicated name
rm_parents = root.findall('./section/standard_name[@name="%s"]..'%dup)
Expand All @@ -110,7 +108,7 @@ def main_func():
print('The following duplicate standard names were found:')
for dup in dup_std_names:
rm_elements = root.findall('./section/standard_name[@name="%s"]'%dup)[1:]
print("{0}, ({1} duplicate(s))".format(dup, len(rm_elements)))
print(f"{dup}, ({len(rm_elements)} duplicate(s))")
sys.exit(1)
else:
print('No duplicate standard names were found.')
Expand Down
4 changes: 2 additions & 2 deletions tools/lib/xml_tools.py
Original file line number Diff line number Diff line change
Expand Up @@ -11,10 +11,10 @@
import subprocess
import sys
import logging
from distutils.spawn import find_executable
from shutil import which
import xml.etree.ElementTree as ET
try:
_XMLLINT = find_executable('xmllint')
_XMLLINT = which('xmllint')
except ImportError:
_XMLLINT = None
# end try
Expand Down

0 comments on commit 67783d1

Please sign in to comment.