Processing

Win Olario edited this page Feb 22, 2022 · 3 revisions

Pre-processing

Formatting and Style Guidelines (aka Style Manual)

Thanks to a user's comment and reference to the Chicago Manual of Style, this page serves as the formatting and style guidelines for this project.

This "Style Manual" compiles the rules followed when pre-processing the DPWH-RBI dataset.

Arabic Numerals are preferable to Roman Numerals

For example, a bridge called "Santa Monica Bridge Ⅳ" is preferably tagged as

  • name=Santa Monica Bridge № 4
  • alt_name=Santa Monica Bridge Ⅳ
  • short_name=Santa Monica Bridge 4
    • use the correct symbols, such as the "№" and "Ⅳ" characters shown above
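
The numeral rule above can be sketched in Python as follows. This is a minimal illustration, not the project's actual tooling: the helper names `roman_to_int` and `bridge_name_tags` are hypothetical, and the sketch assumes the numeral is the last word of the name. An all-caps word made only of the letters I, V, X, L, C, D, M would be misread as a numeral, so results still need a manual check.

```python
import re
import unicodedata

ROMAN_VALUES = {"I": 1, "V": 5, "X": 10, "L": 50, "C": 100, "D": 500, "M": 1000}

# Assumption: the numeral is the final word, written either with ASCII letters
# or with dedicated Unicode Roman numeral characters (U+2160 to U+217F).
NUMERAL_SUFFIX = re.compile(r"^(?P<base>.+?)\s+(?P<numeral>[IVXLCDM]+|[\u2160-\u217f]+)$")


def roman_to_int(numeral: str) -> int:
    """Convert a Roman numeral such as 'IV' or 'Ⅳ' to an integer."""
    # NFKC folds characters like 'Ⅳ' (U+2163) into the ASCII letters 'IV'.
    ascii_numeral = unicodedata.normalize("NFKC", numeral).upper()
    total = 0
    for current, following in zip(ascii_numeral, ascii_numeral[1:] + " "):
        value = ROMAN_VALUES[current]
        # Subtractive notation: a smaller value before a larger one is subtracted.
        if ROMAN_VALUES.get(following, 0) > value:
            total -= value
        else:
            total += value
    return total


def bridge_name_tags(raw_name: str) -> dict:
    """Build name, alt_name and short_name tags for a name ending in a Roman numeral."""
    raw_name = raw_name.strip()
    match = NUMERAL_SUFFIX.match(raw_name)
    if not match:
        return {"name": raw_name}  # no trailing numeral: leave the name as-is
    base = match.group("base")
    number = roman_to_int(match.group("numeral"))
    return {
        "name": f"{base} № {number}",
        "alt_name": raw_name,
        "short_name": f"{base} {number}",
    }
```

Calling `bridge_name_tags("Santa Monica Bridge Ⅳ")` yields the three tags listed in the example above.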

Dashes

Put a space on either side of the en dash ("–", U+2013 EN DASH) when it joins two nouns that both retain their original meaning. Such nouns are called "coordinate nouns".

For a DPWH highway called "Ozamis City-Oroquieta City Road", tag this in OSM as

  • nat_name=Ozamis – Oroquieta Road
  • drop "City"; see Toponyms, Econyms

A hyphen would instead create a compound noun, which cannot stand in for a pair of coordinate nouns.
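
The dash rule can be sketched in Python as follows. The helper name `coordinate_road_name` is hypothetical, and the sketch assumes the input is already known to join two place names: it converts every hyphen or dash it finds, so a genuine compound noun would be mangled and must be excluded beforehand. It also only drops a trailing "City", not forms such as "City of ...".

```python
import re

EN_DASH = "\u2013"


def coordinate_road_name(raw_name: str) -> str:
    """Join coordinate place names with a spaced en dash and drop 'City'."""
    # Split on a hyphen, en dash or em dash, ignoring any spaces around it.
    parts = re.split(r"\s*[-\u2013\u2014]\s*", raw_name.strip())
    # Drop the word "City" when it follows a place name (see Toponyms, Econyms).
    cleaned = [re.sub(r"\s+City\b", "", part).strip() for part in parts]
    return f" {EN_DASH} ".join(cleaned)
```

For the highway above, `coordinate_road_name("Ozamis City-Oroquieta City Road")` gives the value used for nat_name.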

Toponyms, Econyms

Use the name key to capture the common name (i.e. without "City", "Barangay", "Municipality of", etc.), with some exceptions. See the LoCo convention on toponyms.

  • tag "Barangay Poblacion" as name=Poblacion
  • tag "Island Garden City of Samal" as name=Samal + official_name=Island Garden City of Samal
  • tag "Barangay Ⅳ-A" as name=Barangay 4-A + alt_name=Barangay Ⅳ-A
  • tag "Sitio Mabuhay" as name=Mabuhay + loc_name=Sitio Mabuhay
  • tag "Barangay #23" as name=Barangay 23
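
A few of these cases can be sketched in Python. The function name `toponym_tags` is hypothetical, and the sketch covers only named barangays, numbered barangays and sitios. Roman-numeral barangays such as "Barangay Ⅳ-A" and exceptions such as "Island Garden City of Samal" are not handled and would need their own rules or manual tagging.

```python
import re


def toponym_tags(raw: str) -> dict:
    """Return OSM name tags for a barangay or sitio name (partial sketch)."""
    text = " ".join(raw.split())  # collapse repeated whitespace

    # Numbered barangays keep the word "Barangay"; only the "#" is dropped.
    numbered = re.fullmatch(r"Barangay\s+#?\s*(\d+(?:-[A-Z])?)", text)
    if numbered:
        return {"name": f"Barangay {numbered.group(1)}"}

    # Sitios keep the full form in loc_name.
    sitio = re.fullmatch(r"Sitio\s+(.+)", text)
    if sitio:
        return {"name": sitio.group(1), "loc_name": text}

    # Named barangays use the common name only.
    named = re.fullmatch(r"Barangay\s+(.+)", text)
    if named:
        return {"name": named.group(1)}

    return {"name": text}
```

Checking the more specific numbered pattern first keeps "Barangay #23" from being reduced to a bare number.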

is_in

The is_in=* tag is primarily used to index where a feature is located, and also to create a category for searching. The conventional reading order is smallest to largest (i.e. neighbourhood;village;city;region;country).

This is useful in the Philippines, where OSM administrative boundaries are inconsistently available.

The values in is_in are separated by semicolons without spaces, and may include multiple econyms where boundaries are shared or disputed.

  • A bridge located in San Pedro and San Juan, on a boundary disputed by San Nicolas, in an imaginary town called Salem's Lot may be tagged as is_in=San Pedro;San Juan;San Nicolas;Salem's Lot
  • In some records of the RBI dataset, multiple values in the BRGY and MUN fields are inconsistently separated by slashes, ampersands, commas, or descriptive text.
    • Some records use phrases like "intersection among barangays ...", "boundary of ..." or "Barangay X & Barangay Y" to describe bridges on boundaries.
    • Simplify these so that is_in contains only the common names of the toponyms involved.
    • The values found are not always barangay names; some are sitios, subdivisions, or misplaced town or province names.
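
The clean-up of the BRGY and MUN fields can be sketched in Python. Everything named here is an assumption for illustration: the function `build_is_in`, the list of filler phrases, and the choice to treat the word "and" as a separator. Values that are not barangay names (sitios, subdivisions, misplaced town names) pass through unchanged and still need review.

```python
import re

# Separators seen in the data: slashes, ampersands, commas (plus ";" and "and").
SEPARATORS = re.compile(r"\s*(?:/|&|,|;|\band\b)\s*", flags=re.IGNORECASE)

# Leading descriptive text to strip so that only the common name remains.
FILLER = re.compile(
    r"^(?:(?:intersection among barangays|boundary of|barangay|brgy\.?)\s+)+",
    flags=re.IGNORECASE,
)


def build_is_in(*fields: str) -> str:
    """Combine field values, ordered smaller to bigger, into one is_in value."""
    values = []
    for field in fields:  # e.g. the BRGY value first, then the MUN value
        for part in SEPARATORS.split(field):
            name = FILLER.sub("", part.strip()).strip()
            if name and name not in values:  # skip blanks and duplicates
                values.append(name)
    return ";".join(values)
```

For instance, `build_is_in("Barangay San Pedro & Barangay San Juan / San Nicolas", "Salem's Lot")` reproduces the is_in value from the Salem's Lot example.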

Transformations

  • Abbreviations and acronyms are spelled out in full.
  • Missing "ñ" characters are corrected (e.g. Doña not Dona, Santo Niño not Santo Nino).
  • Identifiable typographic and orthographic issues are fixed.
  • Technical terms are removed from object names:
    • JCT - Junction
    • NRJ - National Road Junction
    • (truss), (cantilever)
    • RCDG - Reinforced Concrete Deck Girder
    • RCBC - Reinforced Concrete Box Culvert
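
Two of these transformations can be sketched in Python. The function name `clean_object_name` and the two-entry "ñ" lookup table are hypothetical, and the sketch reads the list above as terms to drop outright from names; if JCT or NRJ should be expanded instead, they would move to a separate replacement table. Matching of "Dona" and "Nino" is case-sensitive, so all-caps input would need to be normalised first.

```python
import re

# Assumption: listed terms are dropped, whether bare or wrapped in parentheses;
# "truss" and "cantilever" are dropped only when parenthesised, as listed.
TECHNICAL_TERMS = re.compile(
    r"\(\s*(?:truss|cantilever|JCT|NRJ|RCDG|RCBC)\s*\)|\b(?:JCT|NRJ|RCDG|RCBC)\b",
    flags=re.IGNORECASE,
)

# Illustrative lookup of words that lost their "ñ"; extend as cases are found.
N_TILDE_FIXES = {"Dona": "Doña", "Nino": "Niño"}


def clean_object_name(raw: str) -> str:
    """Remove technical terms and restore missing 'ñ' in an object name."""
    text = TECHNICAL_TERMS.sub("", raw)
    for wrong, right in N_TILDE_FIXES.items():
        text = re.sub(rf"\b{wrong}\b", right, text)
    return " ".join(text.split())  # tidy the spaces left behind by removals
```

With these rules, `clean_object_name("Santo Nino Bridge (RCDG)")` returns "Santo Niño Bridge".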

Workflow