Add new instructions for spatial join evaluation
1. Setting up PostgreSQL and PostGIS for Ubuntu 24.04
2. Create a new database in a separate directory
3. Get data from SPARQL endpoint and load it into the database
4. Create table with four columns (id, class, type, geometry)

TODO: Add instructions for the evaluation of variants of our spatial
join algorithm, using `scripts/spatialjoin-evaluation.py`.
Hannah Bast committed Jan 9, 2025
1 parent c229a5e commit 225ea14
Showing 1 changed file with 128 additions and 0 deletions.
128 changes: 128 additions & 0 deletions evaluation/README.md
@@ -0,0 +1,128 @@
# Evaluation instructions and results

We evaluated the performance of our spatial join and compared it against
PostgreSQL+PostGIS. In the following sections, we provide instructions and
results for the evaluation.

## Set up PostgreSQL and PostGIS on Ubuntu 24.04

We first install the required packages:


```
sudo apt update
sudo apt install postgresql postgresql-contrib postgis
sudo apt-get install gdal-bin
```

Next, we create a new database in a directory of our choice.

```
export POSTGIS_DIR=/local/data-ssd/postgis/spatialjoin
sudo mkdir -p ${POSTGIS_DIR} && sudo chown postgres:postgres ${POSTGIS_DIR}
sudo -u postgres /usr/lib/postgresql/16/bin/initdb -D ${POSTGIS_DIR}
sudo vim ${POSTGIS_DIR}/postgresql.conf
# In the editor, set the following values:
#   work_mem = 4MB
#   max_worker_processes = 8
#   max_parallel_workers_per_gather = 2
#   max_parallel_workers = 8
sudo su - postgres -c "/usr/lib/postgresql/16/bin/pg_ctl -D ${POSTGIS_DIR} -l logfile start"
psql -U postgres -c "SHOW data_directory;"
psql -U postgres -c "SHOW work_mem;"
```
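
The manual edit of `postgresql.conf` can also be scripted. The following is a minimal sketch, not part of this repository: `set_conf` is a hypothetical helper that replaces an existing (possibly commented-out) line for a setting, or appends the setting if no such line exists. It assumes GNU `sed`, as shipped with Ubuntu, and a setting value that does not contain `|`.

```bash
# Hypothetical helper (an assumption, not part of the repository): set
# `key = value` in a PostgreSQL-style config file. An existing line for the
# key, commented out or not, is replaced; otherwise the setting is appended.
set_conf() {
  local key="$1" value="$2" file="$3"
  local pattern="^[[:space:]]*#?[[:space:]]*${key}[[:space:]]*=.*"
  if grep -Eq "$pattern" "$file"; then
    sed -i -E "s|${pattern}|${key} = ${value}|" "$file"
  else
    printf '%s = %s\n' "$key" "$value" >> "$file"
  fi
}

# Usage sketch; run as a user that may write the file, for example `postgres`:
#   set_conf work_mem 4MB "${POSTGIS_DIR}/postgresql.conf"
#   set_conf max_parallel_workers 8 "${POSTGIS_DIR}/postgresql.conf"
```

The pattern is anchored at the start of the line, so `work_mem` does not match `maintenance_work_mem`, and `max_parallel_workers` does not match `max_parallel_workers_per_gather`. The server has to be restarted for changes to `max_worker_processes` to take effect.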

Optionally, set up the `postgres` user and group (only needed on machines
where the changes to `/etc/passwd` and `/etc/group` from the installation are
not persistent), and create a simple `.bashrc` file for the `postgres` user.

```
export POSTGRES_UID=$(stat -c %u ${POSTGIS_DIR})
sudo groupadd -g 114 postgres && sudo useradd -u 115 -g 114 -s /bin/bash -m -d /local/data-ssd/postgis postgres
echo -e 'export PS1="\u@\h:\W$ "\nalias ll="ls -alh"' > .bashrc
ln -s .bashrc .bash_profile
```

## Create a database (if not already created), and list existing tables and indexes

Create a database `spatialjoin_db`, enable PostGIS, and list all public (that
is, non-system) tables and indexes in the `spatialjoin_db` database (the `+`
provides additional information). If the database or the extension already
exists, the respective command fails with an error and changes nothing.


```
sudo su - postgres -c "createdb spatialjoin_db"
psql -U postgres -d spatialjoin_db -c "CREATE EXTENSION postgis;"
psql -U postgres -d spatialjoin_db -c "\dt+ public.*"
psql -U postgres -d spatialjoin_db -c "\di+ public.*"
```

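To remove all tables with a certain prefix (here `freiburg`), use the following
command. `DROP TABLE` does not accept wildcards, so the command loops over the
matching table names. Dropping a table also drops its indexes.

```
psql -U postgres -d spatialjoin_db -c "DO \$\$ DECLARE t text; BEGIN FOR t IN SELECT tablename FROM pg_tables WHERE schemaname = 'public' AND tablename LIKE 'freiburg%' LOOP EXECUTE format('DROP TABLE public.%I', t); END LOOP; END \$\$;"
```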

## Get data, load it into the database, and query it

The following `curl` command runs a SPARQL query for all geometries contained
in the region specified by the `osmrel:...` relation, and converts the CSV
result into a TSV file with two columns: `osm_id` and `geometry`.

```
export NAME=freiburg
curl -s https://qlever.cs.uni-freiburg.de/api/osm-planet -H "Accept: text/csv" -H "Content-type: application/sparql-query" --data "PREFIX geo: <http://www.opengis.net/ont/geosparql#> PREFIX ogc: <http://www.opengis.net/rdf#> PREFIX osmrel: <https://www.openstreetmap.org/relation/> SELECT ?osm_id ?geometry WHERE { osmrel:62768 ogc:sfContains ?osm_id . ?osm_id geo:hasGeometry/geo:asWKT ?geometry }" | sed 's/,/\t/;s|https://www.openstreetmap.org/|osm|;s|/|:|;s/"//g' > $NAME.tsv
```
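
To see what the `sed` program in the pipeline above does, it can be run on a small input. The sketch below wraps the same `sed` program in a function (the name `csv_to_tsv` is only for illustration) and applies it to two made-up lines in the shape the pipeline expects; the sample values are assumptions, not actual query results.

```bash
# The same sed program as in the curl pipeline above, wrapped in a function
# (the function name is hypothetical). Requires GNU sed for `\t`.
csv_to_tsv() {
  sed 's/,/\t/;s|https://www.openstreetmap.org/|osm|;s|/|:|;s/"//g'
}

# Made-up sample input: a header line and one line with a WKT geometry that
# itself contains a comma.
printf '%s\n' \
  'osm_id,geometry' \
  'https://www.openstreetmap.org/way/42,"LINESTRING(7.8 48.0,7.9 48.1)"' \
  | csv_to_tsv
```

The first substitution has no `g` flag, so only the first comma on each line (the column separator) becomes a tab, and the commas inside the WKT geometry are kept. The second line of the sample becomes `osmway:42`, a tab, and `LINESTRING(7.8 48.0,7.9 48.1)`.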

The following commands load the data into the database, and create a spatial index.
This produces one table called `${NAME}` and two indexes called `${NAME}_pkey`
and `${NAME}_geom_idx`.


```
psql -U postgres -d spatialjoin_db -c "CREATE TABLE ${NAME} (id VARCHAR PRIMARY KEY, geom GEOMETRY);"
psql -U postgres -d spatialjoin_db -c "CREATE TABLE ${NAME}_loader (id VARCHAR, geom_text VARCHAR);"
psql -U postgres -d spatialjoin_db -c "\copy ${NAME}_loader FROM '$(pwd)/${NAME}.tsv' WITH (FORMAT csv, DELIMITER E'\t', HEADER true);"
psql -U postgres -d spatialjoin_db -c "INSERT INTO ${NAME} (id, geom) SELECT id, ST_GeomFromText(geom_text, 4326) FROM ${NAME}_loader;"
psql -U postgres -d spatialjoin_db -c "DROP TABLE ${NAME}_loader;"
psql -U postgres -d spatialjoin_db -c "SELECT COUNT(*) FROM ${NAME};"
psql -U postgres -d spatialjoin_db -c "CREATE INDEX ${NAME}_geom_idx ON ${NAME} USING GIST (geom);"
psql -U postgres -d spatialjoin_db -c "\dt+ public.${NAME}*"
psql -U postgres -d spatialjoin_db -c "\di+ public.${NAME}*"
```
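
Before running the load commands above, it can help to check that every line of the TSV file has the expected number of columns, because a single malformed line aborts the whole `\copy`. A minimal sketch, where `check_tsv_columns` is a hypothetical helper and not part of this repository:

```bash
# Hypothetical helper: succeed only if every line of a TSV file has exactly
# the expected number of tab-separated fields. Reports the first bad line.
check_tsv_columns() {
  local file="$1" expected="$2"
  awk -F'\t' -v n="$expected" '
    NF != n { printf "line %d has %d fields, expected %d\n", NR, NF, n; bad = 1; exit }
    END { exit bad }
  ' "$file"
}

# Usage sketch:
#   check_tsv_columns "${NAME}.tsv" 2 && echo "column count ok"
```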

The following commands compute the complete spatial self-join and output its
size. The first command only computes the number of candidate pairs (that is,
the number of pairs of geometries that have a bounding box overlap). The second
command computes the exact number of pairs that intersect.

```
psql -U postgres -d spatialjoin_db -c "SELECT COUNT(*) FROM ${NAME} AS a, ${NAME} AS b WHERE a.geom && b.geom;"
psql -U postgres -d spatialjoin_db -c "SELECT COUNT(*) FROM ${NAME} AS a, ${NAME} AS b WHERE ST_Intersects(a.geom, b.geom);"
```
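
The difference between the two counts can be illustrated without a database. The `&&` operator only compares the 2D bounding boxes of the two geometries, whereas `ST_Intersects` checks the geometries themselves. The sketch below implements only the bounding-box test; `bbox_overlap` is a hypothetical helper for illustration, and each box is given as `"xmin ymin xmax ymax"`.

```bash
# Sketch of the bounding-box test behind the candidate count. Returns success
# if the two boxes overlap (touching boxes count as overlapping).
bbox_overlap() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, p, " "); split(b, q, " ")
    # Force numeric values so that the comparisons below are numeric.
    for (i = 1; i <= 4; i++) { p[i] += 0; q[i] += 0 }
    exit !(p[1] <= q[3] && q[1] <= p[3] && p[2] <= q[4] && q[2] <= p[4])
  }'
}

# A line from (0,0) to (10,10) and a point at (8,2): the boxes overlap, so the
# pair is a candidate, although the point does not lie on the line.
bbox_overlap "0 0 10 10" "8 2 8 2" && echo "candidate"

# Two boxes that are clearly apart are not a candidate pair.
bbox_overlap "0 0 1 1" "2 2 3 3" || echo "not a candidate"
```

The first pair shows why the candidate count is an upper bound for the exact count: every intersecting pair has overlapping bounding boxes, but not the other way around.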

## Create table for complete OSM data

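The following produces a TSV file for all OSM objects of a certain class. The
TSV file has four columns: the OSM id, the class name, the type, and the
geometry. Run it once per class (here: `building` and `highway`), since the
load commands expect the files `building.tsv` and `highway.tsv`.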

```
export CLASS=building
curl -s https://qlever.cs.uni-freiburg.de/api/osm-planet -H "Accept: text/tab-separated-values" -H "Content-type: application/sparql-query" --data "PREFIX osm: <https://www.openstreetmap.org/> PREFIX geo: <http://www.opengis.net/ont/geosparql#> PREFIX ogc: <http://www.opengis.net/rdf#> PREFIX osmrel: <https://www.openstreetmap.org/relation/> PREFIX osmkey: <https://www.openstreetmap.org/wiki/Key:> SELECT (REPLACE(REPLACE(STR(?osm_id_), STR(osm:), \"osm\"), \"/\", \":\") AS ?osm_id) (REPLACE(STR(osmkey:${CLASS}), STR(osmkey:), \"\") AS ?predicate) ?type ?geometry WHERE { { SELECT ?osm_id_ (SAMPLE(?type_) AS ?type) WHERE { ?osm_id_ osmkey:${CLASS} ?type_ } GROUP BY ?osm_id_ } ?osm_id_ geo:hasGeometry/geo:asWKT ?geometry }" | sed 's/"//g;s/\^\^<http[^\t]*>$//' > ${CLASS}.tsv
```

The following commands load the TSV files into a single table and create a
spatial index.

```
export NAME=osm_planet
psql -U postgres -d spatialjoin_db -c "CREATE TABLE ${NAME} (id VARCHAR PRIMARY KEY, class VARCHAR, type VARCHAR, geom GEOMETRY);"
psql -U postgres -d spatialjoin_db -c "CREATE TABLE ${NAME}_loader (id VARCHAR, class VARCHAR, type VARCHAR, geom_text VARCHAR);"
psql -U postgres -d spatialjoin_db -c "COPY ${NAME}_loader FROM '$(pwd)/building.tsv' WITH (FORMAT csv, DELIMITER E'\t', HEADER true);"
psql -U postgres -d spatialjoin_db -c "COPY ${NAME}_loader FROM '$(pwd)/highway.tsv' WITH (FORMAT csv, DELIMITER E'\t', HEADER true);"
psql -U postgres -d spatialjoin_db -c "INSERT INTO ${NAME} (id, class, type, geom) SELECT DISTINCT ON (id) id, class, type, ST_GeomFromText(geom_text, 4326) FROM ${NAME}_loader;"
psql -U postgres -d spatialjoin_db -c "DROP TABLE ${NAME}_loader;"
psql -U postgres -d spatialjoin_db -c "SELECT COUNT(*) FROM ${NAME};"
psql -U postgres -d spatialjoin_db -c "CREATE INDEX ${NAME}_geom_idx ON ${NAME} USING GIST (geom);"
psql -U postgres -d spatialjoin_db -c "\dt+ public.${NAME}*"
psql -U postgres -d spatialjoin_db -c "\di+ public.${NAME}*"
```
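
An object that has both a `building` and a `highway` key appears in both TSV files. `DISTINCT ON (id)` keeps only one row per id, and without an `ORDER BY` it is not specified which one, so such an object ends up with only one of its classes. To see how many rows are affected, the duplicate ids can be counted before loading. A minimal sketch, where `count_duplicate_ids` is a hypothetical helper and not part of this repository:

```bash
# Hypothetical helper: count the rows whose id (first column) already occurred
# in an earlier row, across all given TSV files. The header line of each file
# is skipped.
count_duplicate_ids() {
  awk -F'\t' 'FNR > 1 && seen[$1]++ { dup++ } END { print dup + 0 }' "$@"
}

# Usage sketch:
#   count_duplicate_ids building.tsv highway.tsv
```

The printed number is the difference between the total number of data rows in the TSV files and the `COUNT(*)` of the final table.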
