Adding name to latitude and longitude in the hash generation #65
I've been thinking about this for a long time. We should drop the way we currently generate hashes entirely. We must also stop exposing them through the API as if they were ids. The things you mention are just the start. Most networks report lat / lng values that jitter and can change slightly over the day, so precise lat / lng is not a good basis for hashes. I think the safest option would be to rely on a rounded lat / lng plus the station uid (if available). Names are unsafe too.
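To make the "rounded lat / lng + station uid" idea concrete, here is a minimal sketch. The function name `rounded_hash`, the `precision` parameter, and the choice of MD5 are assumptions for illustration, not the project's actual code.

```python
import hashlib


def rounded_hash(latitude, longitude, uid=None, precision=4):
    """Hypothetical helper: hash rounded coordinates plus an optional uid."""
    # Format both coordinates with a fixed number of decimals so that
    # small jitter in the low-order digits does not change the string.
    parts = ["%.*f" % (precision, latitude), "%.*f" % (precision, longitude)]
    if uid is not None:
        parts.append(str(uid))
    return hashlib.md5(",".join(parts).encode("utf-8")).hexdigest()


# Two readings of the same station that differ only in the low-order digits.
morning = rounded_hash(37.38714348, -5.94841005, uid="213")
evening = rounded_hash(37.38714911, -5.94841302, uid="213")
```

With these inputs both readings produce the same digest. Rounding only reduces the problem: a coordinate that sits right on a rounding boundary can still flip between two values, which is one more reason to include a uid when the feed offers one.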
I see your point. However, compared with the current implementation, adding the name is better than considering only lat and long. Lat and long changing during the day is a real problem, e.g., call-a-bike. Are there other networks doing the same? One possibility would be to identify the networks whose lat and long change during the day and apply the current hash to the rounded lat and long of their stations. The name is not safe on its own, but adding it helps to differentiate stations. I do not see a drawback in that. A solution that works for all networks would be harder than checking inside get_hash which network it is and applying a network-specific solution where necessary. For instance, call-a-bike has the "standort_id", which could be used in its case. For the others, the name, lat, and long are enough.
Did not get that. I think these stations should be automatically discarded.
Yes, the samba system does it too.
How can we be sure that names are not going to change? They do change, as city councils or providers see fit to change them. I am going to play with stations having
I still get the 0 on stockholm; it is strange that we are getting different results. Indeed, we cannot be sure that the name is not going to change. Ok, then it might be a case of unifying the 'uid' key in the extra dict across all the parsers, for a future implementation of the hash. It may be worth looking at each of the parsers again, because many of them may expose a uid-like field. I confess that I have not looked very hard for a uid in the ones I've implemented.
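A rough sketch of what relying on a unified 'uid' key could look like. The function `station_key` and its arguments are hypothetical; it assumes every parser stores the provider's identifier under `extra['uid']` when one exists, and that keys only need to be unique within a single network.

```python
import hashlib


def station_key(name, latitude, longitude, extra):
    """Hypothetical helper: prefer extra['uid'], else fall back to name + coords."""
    uid = extra.get("uid")
    if uid is not None:
        # A provider-assigned id survives both renames and coordinate jitter.
        raw = "uid:%s" % uid
    else:
        # Fallback for feeds without an id: name plus rounded coordinates.
        raw = "%s,%.4f,%.4f" % (name, latitude, longitude)
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


# Same uid, renamed station, slightly shifted coordinates.
before = station_key("Old name", 59.3293, 18.0686, {"uid": 42})
after = station_key("New name", 59.3294, 18.0687, {"uid": 42})
```

Here `before` and `after` are equal, because the name and coordinates are ignored whenever a uid is present. Stations without a uid still depend on the name, so they remain exposed to renames.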
Latitude and longitude alone are not a reliable way to obtain a hash that uniquely represents a station.
For instance, consider Sevici's stations "213_AVENIDA DE ANDALUCIA" and "214_AVENIDA DE ANDALUCIA": both have the same latitude and longitude pair, (37.3871434869, -5.94841005455).
Furthermore, some systems do not report reliable latitude and longitude. For example, Cyclocity Stockholm reports (0.0, 0.0) for 4 out of 5 stations (refer to the extract below).
This PR aims to improve that by adding the station name to the information that is hashed.
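The collision can be reproduced with a small standalone sketch. It assumes the hash is an MD5 digest of a string built from the scaled coordinates; the functions `coords_hash` and `name_coords_hash` are illustrative and are not copied from the project's get_hash.

```python
import hashlib


def coords_hash(latitude, longitude):
    """Illustrative only: hash built from the coordinates alone."""
    raw = "%d,%d" % (int(latitude * 1e6), int(longitude * 1e6))
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


def name_coords_hash(name, latitude, longitude):
    """Illustrative only: same string, prefixed with the station name."""
    raw = "%s,%d,%d" % (name, int(latitude * 1e6), int(longitude * 1e6))
    return hashlib.md5(raw.encode("utf-8")).hexdigest()


lat, lng = 37.3871434869, -5.94841005455
names = ["213_AVENIDA DE ANDALUCIA", "214_AVENIDA DE ANDALUCIA"]

# Both stations share coordinates, so a coordinate-only hash collides.
collide = coords_hash(lat, lng) == coords_hash(lat, lng)
# Adding the name separates them.
distinct = name_coords_hash(names[0], lat, lng) != name_coords_hash(names[1], lat, lng)
```

Both `collide` and `distinct` evaluate to `True`: the coordinate-only digest cannot tell the two Sevici stations apart, while the name-prefixed digest can.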