Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make approximate contains slack configurable #67

Merged
merged 2 commits into from
Jan 22, 2024

Conversation

patrickbr
Copy link
Member

@patrickbr patrickbr commented Jan 16, 2024

For the named area contains relation, we use an approximate contains relation. Its slack has so far been hardcoded. It makes sense to makes this configurable. The first commit in this PR adds a variable APPROX_CONTAINS_SLACK in GeometryHandler.h.

This is particularly important as we currently cannot build the OpenHistoricalMap dataset (see ad-freiburg/qlever#1220) with an approximate contains relation. Basically, the problem is that for some administrative areas, we have many versions which differ only slightly. If the difference is within the allowed slack, none of our existing speedup heuristics are effective (they notice that we are not contained, but do not abort as we might be approximately contained), and we must compute a full intersection of the polygons A, B to compute whether A is approximately contained in B. If APPROX_CONTAINS_SLACK is 0, then we can immediately decide that A is not in B in such cases by a simple floating point comparison (the geometric area of A vs. the geometric area of B).

As an example, see https://www.openhistoricalmap.org/relation/2704379 and https://www.openhistoricalmap.org/relation/2704296, which are two of the (dozens of) versions of the boundaries of San Jose. Because of cases like this, it took 24h hours on our standard build machine to process 3% (sic!) of the named areas in the latest OHM planet file.

@lehmann-4178656ch, it would be nice if you could add a configuration parameter for this.

@patrickbr
Copy link
Member Author

patrickbr commented Jan 16, 2024

Side note: we should somehow merge OSM and OHM and use this as a test dataset for our new geospatial-joins tool.

@lehmann-4178656ch
Copy link
Member

After this is merged I'll add a configuration option as separate PR - we should discuss the naming of the flag.

@patrickbr patrickbr merged commit 80f85e1 into master Jan 22, 2024
@patrickbr patrickbr deleted the configurable-approx-slack branch January 22, 2024 10:03
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants