Version 0.6 #9

Open
wants to merge 1 commit into
base: master
2 changes: 1 addition & 1 deletion .nvmrc
@@ -1 +1 @@
16
18
16 changes: 8 additions & 8 deletions DATA.md
@@ -1,18 +1,18 @@
Static data files
=================
How to generate static data files
=================================


Load GNAF database
------------------

* Install PostgreSQL 14 and PostGIS
* Install PostgreSQL and PostGIS (tested against PostgreSQL versions 14 and 15)
* Create database (name assumed to be `gnaf` but can be anything)
* Enable PostGIS extension on database:
* `CREATE EXTENSION IF NOT EXISTS postgis;`
* Download database dumps from [gnaf-loader](https://github.com/minus34/gnaf-loader) (see Option 3)
* Download database dumps from [gnaf-loader](https://github.com/minus34/gnaf-loader) (see Option 3, GDA2020 version)
* Import database dumps using pg_restore
* `pg_restore -d gnaf gnaf-202111.dmp`
* `pg_restore -d gnaf admin-bdys-202111.dmp`
* `pg_restore -d gnaf gnaf-202208.dmp`
* `pg_restore -d gnaf admin-bdys-202208.dmp`


Generate data files
@@ -21,6 +21,6 @@ Generate data files
* Run bin/data.ts with PostgreSQL environment variables:
* `PGDATABASE=gnaf ./bin/data.ts`
* Optional: Pre-compress data files:
* `gzip -9 *.txt`
* Or ideally, if server handles brotli: `brotli --rm *.txt`
* `gzip -r -v -9 data/`
* Or for even smaller files: `pigz -r -v -11 data/`
* Upload files to static hosting service / S3 bucket / etc.
77 changes: 77 additions & 0 deletions INDEXES.md
@@ -0,0 +1,77 @@
Address index format
====================


How addresses are indexed and stored
------------------------------------

Address indexes are stored in static text files containing newline-delimited JSON.

Each file contains the subset of addresses that share an index key: the first letter and a metaphone prefix (a phonetic representation) of a word in the street name or suburb, optionally combined with the street number modulo 20.


JSON structure
--------------

Each line of JSON represents a "street" containing the street name, suburb, state/territory and postcode.

Within each street there is an array of "blocks" on that street. Each block has a street number and coordinates. Coordinates are stored as a 9-character [geohash](https://en.wikipedia.org/wiki/Geohash) to conserve space and improve compression.
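
To make the geohash encoding concrete, here is a minimal decoding sketch using the standard geohash algorithm (base-32 alphabet, bits alternating between longitude and latitude). The function name `decodeGeohash` is hypothetical and is not taken from this project's source.

```typescript
// Standard geohash base-32 alphabet (no "a", "i", "l" or "o").
const GEOHASH_ALPHABET = '0123456789bcdefghjkmnpqrstuvwxyz';

// Hypothetical helper: decode a geohash to the centre of its cell.
function decodeGeohash(hash: string): { lat: number; lon: number } {
  let latMin = -90, latMax = 90;
  let lonMin = -180, lonMax = 180;
  let isLon = true; // bits alternate, starting with longitude

  for (let i = 0; i < hash.length; i++) {
    const ch = hash.charAt(i).toLowerCase();
    const value = GEOHASH_ALPHABET.indexOf(ch);
    if (value < 0) throw new Error(`Invalid geohash character: ${ch}`);

    // Each character carries 5 bits, most significant first.
    for (let bit = 4; bit >= 0; bit--) {
      const isSet = ((value >> bit) & 1) === 1;
      if (isLon) {
        const mid = (lonMin + lonMax) / 2;
        if (isSet) lonMin = mid; else lonMax = mid;
      } else {
        const mid = (latMin + latMax) / 2;
        if (isSet) latMin = mid; else latMax = mid;
      }
      isLon = !isLon;
    }
  }
  return { lat: (latMin + latMax) / 2, lon: (lonMin + lonMax) / 2 };
}

const decoded = decodeGeohash('ezs42');
```

Each extra character narrows the cell by 5 bits, so a 9-character geohash identifies a cell of roughly 5 m by 5 m.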

Optionally, within each "block" there may be an object mapping unit types to an array of unit numbers.

For a TypeScript type definition, refer to the `Street` type in `src/types.ts`.
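
As a rough illustration of the structure described above, the sketch below uses hypothetical field names (`name`, `suburb`, `blocks`, `units` and so on). The authoritative definition is the `Street` type in `src/types.ts`, and its actual property names may differ.

```typescript
// Hypothetical shapes for illustration only; see src/types.ts for the real ones.
type Block = {
  number: string;                    // street number, e.g. "264-278"
  geohash: string;                   // 9-character geohash of the coordinates
  units?: Record<string, string[]>;  // optional: unit type -> unit numbers
};

type Street = {
  name: string;
  suburb: string;
  state: string;
  postcode: string;
  blocks: Block[];
};

// Parse newline-delimited JSON: one street per non-empty line.
function parseStreets(text: string): Street[] {
  return text
    .split('\n')
    .filter(line => line.trim() !== '')
    .map(line => JSON.parse(line) as Street);
}

// Made-up sample line; the geohash value is a placeholder, not real data.
const sampleFile = [
  JSON.stringify({
    name: 'GEORGE STREET', suburb: 'SYDNEY', state: 'NSW', postcode: '2000',
    blocks: [{ number: '264-278', geohash: 'r3gx2f9tt', units: { SHOP: ['17'] } }],
  }),
  '',
].join('\n');

const parsedStreets = parseStreets(sampleFile);
```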


Where addresses are indexed
---------------------------

Addresses are indexed by words in the street name and suburb. Additionally, addresses are indexed by the street number.

For each word in the street name and suburb, we take the first letter of the word and the metaphone (a phonetic representation) of the word. We then index by a combination of the first letter and first three characters of the metaphone. We also index by a combination of the first letter, the street number modulo 20 and the first two characters of the metaphone.

For example, this address:

__Shop 17, Australia Square, 264-278 George Street, Sydney NSW 2000__

...will appear in the following indexes:

* `G/4/JR` (G = first letter of GEORGE, 4 = 264 modulo 20, JR = 2-character metaphone of GEORGE)
* `G/_/JRJ` (G = first letter of GEORGE, _ = no street number, JRJ = 3-character metaphone of GEORGE)
* `S/4/ST` (S = first letter of SYDNEY, 4 = 264 modulo 20, ST = 2-character metaphone of SYDNEY)
* `S/_/STN` (S = first letter of SYDNEY, _ = no street number, STN = 3-character metaphone of SYDNEY)
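
The key construction above can be sketched as follows. The function name `indexKeysForWord` is hypothetical, and the metaphone is passed in as an argument because the project's metaphone implementation is not shown here. Following the example, the street number range 264-278 is assumed to be indexed by its first number, 264.

```typescript
// Hypothetical helper: build both index keys for one word of an address.
// `metaphone` is the full metaphone of `word`, computed elsewhere.
function indexKeysForWord(word: string, metaphone: string, streetNumber: number): string[] {
  const first = word.charAt(0).toUpperCase();
  return [
    // first letter / street number modulo 20 / 2-character metaphone
    `${first}/${streetNumber % 20}/${metaphone.slice(0, 2)}`,
    // first letter / no street number / 3-character metaphone
    `${first}/_/${metaphone.slice(0, 3)}`,
  ];
}

const georgeKeys = indexKeysForWord('GEORGE', 'JRJ', 264);
const sydneyKeys = indexKeysForWord('SYDNEY', 'STN', 264);
```

Here `georgeKeys` holds `G/4/JR` and `G/_/JRJ`, and `sydneyKeys` holds `S/4/ST` and `S/_/STN`, matching the list above.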


How an address lookup searches indexes
--------------------------------------

Address lookups are first cleaned up by removing anything that's not a letter, number or space.

We then analyse each word. Words that include a digit somewhere (e.g. "123", "123A") require an exact match in the address. Words that do not include a digit (e.g. "sydney", "sidney", "syd") will be searched using a "fuzzy" algorithm that allows for misspellings and prefixes.

For example, the following query:

`17/264 george st sidney`

...would be searched for in the following indexes:

* `G/17/JR` (G = first letter of GEORGE, 17 = 17 modulo 20, JR = 2-character metaphone of GEORGE)
* `G/4/JR` (G = first letter of GEORGE, 4 = 264 modulo 20, JR = 2-character metaphone of GEORGE)
* `S/17/ST` (S = first letter of ST, 17 = 17 modulo 20, ST = 2-character metaphone of ST)
* `S/4/ST` (S = first letter of ST, 4 = 264 modulo 20, ST = 2-character metaphone of ST)
* `S/17/ST` (S = first letter of SIDNEY, 17 = 17 modulo 20, ST = 2-character metaphone of SIDNEY)
* `S/4/ST` (S = first letter of SIDNEY, 4 = 264 modulo 20, ST = 2-character metaphone of SIDNEY)
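
The clean-up and word analysis steps could look roughly like the sketch below. The names `analyseQuery`, `exactWords` and `fuzzyWords` are hypothetical. It also assumes that removed characters are replaced with a space, since the example treats `17/264` as the two words `17` and `264`.

```typescript
type AnalysedQuery = { exactWords: string[]; fuzzyWords: string[] };

// Hypothetical helper: clean a query and split its words into two groups.
function analyseQuery(input: string): AnalysedQuery {
  // Assumption: anything that is not a letter, number or space becomes a space,
  // so that "17/264" yields "17" and "264" rather than "17264".
  const cleaned = input.replace(/[^A-Za-z0-9 ]/g, ' ').toUpperCase();
  const words = cleaned.split(' ').filter(w => w.length > 0);
  return {
    // Words containing a digit must match the address exactly.
    exactWords: words.filter(w => /[0-9]/.test(w)),
    // Words without a digit are matched with the fuzzy algorithm.
    fuzzyWords: words.filter(w => !/[0-9]/.test(w)),
  };
}

const analysed = analyseQuery('17/264 george st sidney');
```

For this query, `exactWords` is `17` and `264`, and `fuzzyWords` is `GEORGE`, `ST` and `SIDNEY`.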

If the query does not include a number then we have to widen the search to more potential addresses. To offset that increase, we search the 3-character metaphone indexes, which make the fuzzy match stricter.

For example, the following query:

`george st sidney`

...would be searched for in the following indexes:

* `G/_/JRJ` (G = first letter of GEORGE, _ = no number, JRJ = 3-character metaphone of GEORGE)
* `S/_/ST` (S = first letter of ST, _ = no number, ST = metaphone of ST, which has only two characters)
* `S/_/STN` (S = first letter of SIDNEY, _ = no number, STN = 3-character metaphone of SIDNEY)
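
Both lookup cases can be sketched in one function. The names `queryIndexKeys`, `metaphoneOf` and `metaphoneStub` are hypothetical; the stub only returns the metaphone values quoted in this document and stands in for a real metaphone implementation. Repeated keys are dropped so that each index file is listed once.

```typescript
// Hypothetical helper: list the index keys to search for an analysed query.
function queryIndexKeys(
  exactWords: string[],
  fuzzyWords: string[],
  metaphoneOf: (word: string) => string,
): string[] {
  // Leading digits of each exact word, e.g. "123A" -> 123.
  const numbers = exactWords.map(w => parseInt(w, 10)).filter(n => !isNaN(n));
  const keys: string[] = [];
  const add = (key: string) => { if (keys.indexOf(key) === -1) keys.push(key); };

  for (const word of fuzzyWords) {
    const first = word.charAt(0).toUpperCase();
    const phonetic = metaphoneOf(word);
    if (numbers.length === 0) {
      // No number in the query: use the 3-character metaphone indexes.
      add(`${first}/_/${phonetic.slice(0, 3)}`);
    } else {
      // One key per number, using the 2-character metaphone.
      for (const n of numbers) add(`${first}/${n % 20}/${phonetic.slice(0, 2)}`);
    }
  }
  return keys;
}

// Stand-in for a real metaphone function, limited to the words used here.
const knownMetaphones: Record<string, string> = { GEORGE: 'JRJ', ST: 'ST', SIDNEY: 'STN' };
const metaphoneStub = (word: string): string => knownMetaphones[word.toUpperCase()] ?? '';

const keysWithNumbers = queryIndexKeys(['17', '264'], ['george', 'st', 'sidney'], metaphoneStub);
const keysWithoutNumbers = queryIndexKeys([], ['george', 'st', 'sidney'], metaphoneStub);
```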
16 changes: 11 additions & 5 deletions README.md
@@ -7,7 +7,7 @@ Browser-based geocoder for Australian addresses.
Features
--------

* Based on GNAF (Geocoded National Address File of Australia) database
* Based on GNAF (Geocoded National Address File of Australia) database version 202208
* Very fast (e.g. fast enough for search-as-you-type)
* Fuzzy search (typo tolerant, word prefixes, etc.)
* Infinitely scalable when served via a CDN
@@ -17,11 +17,11 @@ Function and return type
------------------------

```typescript
function geocode(input: string, options?: { limit?: number }): Promise<GeocodeResult>
function geocode(input: string, options?: { limit?: number, abortPrevious?: boolean }): Promise<GeocodeResult>

type GeocodeResult = {
input: string,
group: string,
startTime: number,
duration: number,
results: Array<{
address: string,
@@ -33,14 +33,20 @@ type GeocodeResult = {
}
```

Options
-------

* `limit`: Set to a positive integer to limit results, or leave undefined for no limit. (Default: `undefined`)
* `abortPrevious`: Set to `true` to abort unresolved calls to geocode() before starting this one. Intended for use with search-as-you-type user interfaces. Will abort pending HTTP requests that are no longer needed, and abort unresolved calls to geocode() by throwing a `GeocodeAbortError`. (Default: `false`)
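
For a search-as-you-type interface, the options above might be combined as in the sketch below. The names `searchAsYouType`, `GeocodeFn` and `render` are hypothetical, and the geocoder is passed in as a parameter so the sketch stays self-contained. It assumes an aborted call rejects with an `Error` whose `name` is `'GeocodeAbortError'`; check the package's exports for the actual way to detect it.

```typescript
// Minimal subset of the geocode() signature needed for this sketch.
type GeocodeFn = (
  input: string,
  options?: { limit?: number; abortPrevious?: boolean },
) => Promise<{ results: Array<{ address: string }> }>;

// Hypothetical handler: call once per keystroke with the current input.
async function searchAsYouType(
  geocodeFn: GeocodeFn,
  input: string,
  render: (addresses: string[]) => void,
): Promise<void> {
  try {
    const result = await geocodeFn(input, { limit: 5, abortPrevious: true });
    render(result.results.map(r => r.address));
  } catch (e) {
    // Assumption: a superseded call rejects with name 'GeocodeAbortError'.
    // That is expected while the user keeps typing, so ignore it.
    if (e instanceof Error && e.name === 'GeocodeAbortError') return;
    throw e;
  }
}
```

Ignoring only the abort error keeps genuine failures, such as network errors, visible to the caller.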


Example
-------

```typescript
import geocode from 'geocoder';

geocode('114 grey', { limit: 5 }).then(result => console.log(result));
geocode('114 grey', { limit: 5, abortPrevious: true }).then(result => console.log(result)).catch(e => console.error(e));

/*
Returns addresses:
@@ -82,7 +88,7 @@ See [DATA.md](DATA.md)
Demo
----

https://www.abc.net.au/res/sites/news-projects/geocoder/0.1.0/
https://www.abc.net.au/res/sites/news-projects/geocoder/0.6.0/


Authors