Adding name to latitude and longitude in the hash generation #65

Open
wants to merge 2 commits into
base: master


edumucelli
Contributor

Latitude and longitude alone are not a reliable way to obtain a hash that uniquely identifies a station.
For instance, Sevici's stations "213_AVENIDA DE ANDALUCIA" and "214_AVENIDA DE ANDALUCIA" share the same latitude and longitude pair: (37.3871434869, -5.94841005455).
Furthermore, some systems do not report reliable coordinates, e.g., Cyclocity Stockholm reports (0.0, 0.0) for 4 out of 5 stations (refer to the extract below).
This PR aims to improve that by adding the station name to the information being hashed.

# Without fix
import pybikes

stockholm = pybikes.get('stockholm-cyclocity', 'KEY')
stockholm.update()
for station in stockholm.stations:
    print station.get_hash(), station.name, station.latitude, station.longitude
# Current output
fc3ce29e4cbee5e7185f3b528b4dd1bc VOLVO VAK 0.0 0.0
2cdf5487aa0d83466c8d2494408906fa DJURGÅRDEN 59.32419 18.10028
fc3ce29e4cbee5e7185f3b528b4dd1bc VOLVO PV 0.0 0.0
fc3ce29e4cbee5e7185f3b528b4dd1bc VOLVOCITY 0.0 0.0
fc3ce29e4cbee5e7185f3b528b4dd1bc VOLVO PVE 0.0 0.0
# Output with fix
292d521cf46282c8a776eab6af284974 VOLVO VAK 0.0 0.0
3d36932c64ac7e7ff8b6b8b55ec89a11 DJURGÅRDEN 59.32419 18.10028
e8f8d8498f7cf9dbe31fb4edf3410b5a VOLVO PV 0.0 0.0
cc8086b944e98b698e697f4c5c080249 VOLVOCITY 0.0 0.0
f9632d7b5edb16b789e1de331f881428 VOLVO PVE 0.0 0.0
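
To illustrate the collision described above, here is a minimal sketch. The functions `coord_hash` and `name_coord_hash` are hypothetical stand-ins written for this example; they are not necessarily what pybikes' `get_hash` does or what this PR's diff contains.

```python
import hashlib


def coord_hash(latitude, longitude):
    # Hypothetical: hash built from the coordinates only.
    key = "%d,%d" % (int(latitude * 1e6), int(longitude * 1e6))
    return hashlib.md5(key.encode("utf-8")).hexdigest()


def name_coord_hash(name, latitude, longitude):
    # Hypothetical: same key, with the station name prepended.
    key = "%s,%d,%d" % (name, int(latitude * 1e6), int(longitude * 1e6))
    return hashlib.md5(key.encode("utf-8")).hexdigest()


# The two Sevici stations mentioned above share one coordinate pair.
stations = [
    ("213_AVENIDA DE ANDALUCIA", 37.3871434869, -5.94841005455),
    ("214_AVENIDA DE ANDALUCIA", 37.3871434869, -5.94841005455),
]
coord_hashes = [coord_hash(lat, lng) for _, lat, lng in stations]
name_hashes = [name_coord_hash(name, lat, lng) for name, lat, lng in stations]

print(coord_hashes[0] == coord_hashes[1])  # True: coordinates alone collide
print(name_hashes[0] == name_hashes[1])    # False: the name separates them
```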

@eskerda
Owner

eskerda commented Sep 21, 2015

I've been thinking about this for a long time. We should totally drop the way we are currently generating hashes. Also, we must stop exposing them in the API as if they were ids. hash != id, and it's totally my fault; when I implemented this I did not know any better, I guess.

The things you mention are just the start. Most networks report lat / lng values that jump around and can change slightly over the day. So, precise lat / lng is not a good basis for hashes.

I think the safest would be to rely on a rounded lat / lng + station uid (if available). Names are unsafe too.
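
A rough sketch of that idea, under assumptions: `stable_hash`, its `uid` argument, and the `precision` default are all hypothetical names and values chosen for illustration, not an existing pybikes API.

```python
import hashlib


def stable_hash(latitude, longitude, uid=None, precision=3):
    # Round first so that small coordinate jitter maps to the same key.
    parts = ["%.*f" % (precision, latitude), "%.*f" % (precision, longitude)]
    # Append the station uid only when the network provides one.
    if uid is not None:
        parts.append(str(uid))
    return hashlib.md5(",".join(parts).encode("utf-8")).hexdigest()


# Slightly different readings of the same station give the same hash.
morning = stable_hash(59.32419, 18.10028, uid="12")
evening = stable_hash(59.32421, 18.10031, uid="12")
print(morning == evening)
```

Rounding only reduces the problem: two readings that straddle a rounding boundary would still produce different hashes, which is one reason to lean on the uid where it exists.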

@edumucelli
Contributor Author

I see your point. However, compared with the current implementation, adding the name is better than considering only lat and long.

The lat and long changing during the day is an actual problem, e.g., call-a-bike. Are there other networks doing the same?

One possibility would be to identify the networks that change lat and long during the day, and apply the current hash to their stations' rounded lat and long.

The name is not safe alone, but adding it helps to increase the differentiability among stations. I do not see a drawback on that.

A solution that works for all networks would be harder than checking inside get_hash which network the station belongs to and applying a network-specific solution where necessary. For instance, call-a-bike has the "standort_id", which could be used in its case. For the others, the name, lat, and long are enough.
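
The per-network approach could look roughly like the sketch below. Stations are plain dicts here to keep the example self-contained; `KEY_BY_NETWORK`, `station_hash`, and the assumption that "standort_id" is stored under `extra` are hypothetical.

```python
import hashlib


def default_key(station):
    # Fallback for most networks: name plus coordinates.
    return "%s,%s,%s" % (
        station["name"], station["latitude"], station["longitude"])


def callabike_key(station):
    # Assumes the parser kept the feed's "standort_id" under extra.
    return str(station["extra"]["standort_id"])


# Hypothetical mapping from network tag to the key function it needs.
KEY_BY_NETWORK = {"call-a-bike": callabike_key}


def station_hash(network_tag, station):
    key_fn = KEY_BY_NETWORK.get(network_tag, default_key)
    return hashlib.md5(key_fn(station).encode("utf-8")).hexdigest()


before = {"name": "X", "latitude": 50.1, "longitude": 8.6,
          "extra": {"standort_id": 123}}
after = dict(before, latitude=50.1002)

# Moving coordinates do not affect the call-a-bike hash in this sketch.
print(station_hash("call-a-bike", before) == station_hash("call-a-bike", after))
```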

@eskerda
Owner

eskerda commented Oct 4, 2015

> Cyclocity Stockholm reports (0.0, 0.0) for 4 out of 5 stations

Did not get that. I think these stations should be automatically discarded.
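
Discarding such stations could be as simple as the following sketch. `has_valid_position` is a hypothetical helper, and the rows reuse the Stockholm values from the extract above.

```python
def has_valid_position(latitude, longitude):
    # Treat the exact (0.0, 0.0) placeholder as "no position reported".
    return not (latitude == 0.0 and longitude == 0.0)


rows = [
    ("VOLVO VAK", 0.0, 0.0),
    ("DJURGÅRDEN", 59.32419, 18.10028),
    ("VOLVO PV", 0.0, 0.0),
    ("VOLVOCITY", 0.0, 0.0),
    ("VOLVO PVE", 0.0, 0.0),
]
kept = [row for row in rows if has_valid_position(row[1], row[2])]
print(kept)
```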

> The lat and long changing during the day is an actual problem, e.g., call-a-bike. Are there other networks doing the same?

Yes, the samba system does it too.

> The name is not safe alone, but adding it helps to increase the differentiability among stations. I do not see a drawback on that.

How can we be sure that names are not going to change? They do change, as city councils or providers see fit to change them.

I am going to play with stations having a uid plus a rounded latitude and longitude. We could even throw in the network_id too, but since slugs do change, that would not be future-proof.

@edumucelli
Contributor Author

I still get the 0s on Stockholm; it is strange that we are getting different results.

Indeed we cannot be sure that the name is not going to change.

Ok, then it might be a case of unifying the 'uid' key in the extra dict across all the parsers, for a future implementation of the hash. It may be worth looking at each of the parsers again, because many of them may expose a uid-like field. I confess that I have not looked very hard for a uid in the ones I've implemented.
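
One way the unification could be sketched: a helper that copies the first uid-like field it finds into `extra['uid']`. `normalize_uid` and the candidate list are hypothetical; only "standort_id" comes from this thread, and "number" and "id" are guesses at what other feeds might call it.

```python
# Hypothetical uid-like field names, checked in order.
UID_FIELDS = ("standort_id", "number", "id")


def normalize_uid(extra):
    # Work on a copy so the parser's original dict is left untouched.
    normalized = dict(extra)
    if "uid" in normalized:
        return normalized
    for field in UID_FIELDS:
        if normalized.get(field) is not None:
            normalized["uid"] = str(normalized[field])
            break
    return normalized


print(normalize_uid({"standort_id": 42}))
print(normalize_uid({"slots": 10}))
```

A station whose feed has no uid-like field simply ends up without a 'uid' key, so the hash would have to fall back on something else for it.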

@eskerda eskerda force-pushed the master branch 3 times, most recently from 3d97d79 to 2514dfb Compare March 31, 2016 03:16