Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[New Atlas Check] AbbreviatedAddressStreetCheck #667

Merged
merged 6 commits into from
Apr 12, 2022

Conversation

vladlemberg
Copy link
Collaborator

@vladlemberg vladlemberg commented Mar 22, 2022

Description:

A check to validate that Address Street addr:street road type abbreviation is spelled out in full. Please reference official OSM good practice mapping conventions.

Corresponded Issues:
#597
#409

Potential Impact:

  • Name Standardization.
  • Improving Geocoding service.

Unit Test Approach:

Simulate unit tests covering most common address types and false positives.

Test Results:

Passed.

ISO Total Flags Sampled Sampling % TP FP False Positive Rate
USA 50,000 5960 12% 5912 48 0.8%

Copy link
Collaborator

@Bentleysb Bentleysb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great. It looks like there is just a bit of configuration and sonar clean up needed. Should this only be enabled in English speaking countries, for the moment?

@atiannicelli
Copy link
Collaborator

This looks great @vladlemberg. Thanks! I agree with @Bentleysb that we should add a line to the configuration file that restricts this to a set of predominantly English speaking countries.

    "countries": [
      "USA", "GRB", "AUS", "CAN", "JAM", "IRL", <others?>
    ],

@Bentleysb - Do you think that having the street name txt file is the best way to store and access that information? It seems fine with me, but I know that we were curious if that would be acceptable, architecture wise.

@sonarqubecloud
Copy link

Kudos, SonarCloud Quality Gate passed!    Quality Gate passed

Bug A 0 Bugs
Vulnerability A 0 Vulnerabilities
Security Hotspot A 0 Security Hotspots
Code Smell A 0 Code Smells

96.3% 96.3% Coverage
0.0% 0.0% Duplication

@vladlemberg vladlemberg marked this pull request as ready for review March 31, 2022 21:45
Copy link
Collaborator

@atiannicelli atiannicelli left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks great to me.

final String[] abbreviationsVariations = roadTypeAbbreviations.split("\\|");
final List<String> temp = new ArrayList<>(
Arrays.asList(abbreviationsVariations));
this.roadTypeAbbreviationsMap.put(roadType, temp);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Given we want to search on the values, would it make more sense to invert this map? It would be less memory efficient, as the road type would be duplicated for each abbreviation. However, it seems worth it for the increase in run time from being able to drop the double for loop in the flag method, given the scale of data this is expected to run on.

@pscrimshaw pscrimshaw merged commit bf167fc into osmlab:dev Apr 12, 2022
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants