Suggestions for additional datasets #135
Have you tried to contact them? Can you suggest other data sources we can use instead? Thanks.
No, I didn't try. There is an open issue there, and although the repo has been updated since that issue was opened, the updates don't seem to happen on a regular basis (i.e., in an automated fashion). I would suggest using this source, which looks more automated. But since I'm in no way affiliated with those people, I cannot guarantee it works better.
Well, it turned out that the main JHU repo has actual
Hi @kirienko. After spending some time on it, I was not able to validate the data from the other repo. Moreover, I'm afraid it may be discontinued as well in the future. I agree that the best choice would be JHU, since it seems there is no open government data for Russia (this is what they told us when we tried to contact them some months ago).
Hi @eguidotti. Thank you so much for your efforts! I really appreciate your work!
hi. i have been pulling and cleaning up the JHU data, more or less since the beginning. having just been pointed in your direction, i was thinking of converting to your data. but it's possible that the data i've been processing includes some things you want. if you want daily changes, you'd want the columns whose names end in "_Changes_1". my github repository is here, though i wouldn't want to be the first person other than me to actually try a build. if you're interested and have any questions, please let me know. cheers!
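As an illustration of picking out the daily-change columns mentioned above, here is a minimal sketch using only Python's standard `csv` module. It assumes the file is a plain CSV whose header contains columns ending in "_Changes_1"; the helper names and the other column names in the demo (`Combined_Key`, `Confirmed`, `Deaths`) are hypothetical, and the values are placeholders, not real data.

```python
import csv
import io

SUFFIX = "_Changes_1"  # suffix for daily-change columns, as described in the comment above

def daily_change_columns(header):
    """Return the names of the columns that hold daily changes."""
    return [name for name in header if name.endswith(SUFFIX)]

def read_daily_changes(stream, key_column):
    """Yield one dict per CSV row with the key column plus every daily-change column."""
    reader = csv.DictReader(stream)
    wanted = daily_change_columns(reader.fieldnames)
    for row in reader:
        yield {key_column: row[key_column], **{c: row[c] for c in wanted}}

# Hypothetical header and placeholder values, for illustration only.
demo = io.StringIO(
    "Combined_Key,Confirmed,Confirmed_Changes_1,Deaths,Deaths_Changes_1\n"
    "\"Region A, Country X\",10,2,1,0\n"
)
for record in read_daily_changes(demo, "Combined_Key"):
    print(record)
```

`csv.DictReader` handles the quoted commas inside the key, so a combined key such as "Region A, Country X" stays in one field.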
Hi @greg-minshall and thanks for your message.
btw: the README.org in my repo had old links; i've just changed that (in case you poke around), hopefully correctly. cheers.
@greg-minshall Well actually, it looks quite interesting! It seems to me the file I could easily integrate into this repository is the following: https://somenumbers.info/covid-19/csvs/coleaned.csv.gz Just a couple of questions:
Thanks!
@eguidotti coleaned is basically a cleaned-up version of the entire run of JHU "daily" files. yes, the files are updated daily. i think, in the past six months, there have been only a few glitches, maybe one time when something changed in the JHU data. i don't have a citation. you can just say "Greg Minshall", or some such, or give a pointer to the repo. if coleaned works, that's great, as it's the smallest file, and you'll be the closest to JHU (in terms of me messing with the data). the cleaning performed? you can look in then ( i also drop some JHU columns as i think i can derive those from the existing data. (though, in fact, i don't.) there's filtering: remove duplicates, take only the last observation ( i add FIPS and Iso3c columns. there are also some textual transformations ( that seems to be about it.
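The filtering step described above ("remove duplicates, take only the last observation") can be sketched roughly as follows. This is not the actual build code; it assumes each row is a dict with a `Combined_Key`, a hypothetical `Date` column (e.g. added when concatenating the daily files), and a `Last_Update` timestamp in a sortable ISO-like format, and it reads "last" as "greatest `Last_Update`". Keeping one row per (`Combined_Key`, `Date`) also discards exact duplicates.

```python
from typing import Dict, Iterable, List, Tuple

Row = Dict[str, str]

def keep_last_observation(rows: Iterable[Row]) -> List[Row]:
    """Keep one row per (Combined_Key, Date): the one with the latest Last_Update.

    Assumptions: "Date" is a hypothetical column; Last_Update values are
    ISO-like timestamps, so plain string comparison orders them correctly.
    On a tie, the row seen later wins, which also collapses exact duplicates.
    """
    latest: Dict[Tuple[str, str], Row] = {}
    for row in rows:
        key = (row["Combined_Key"], row["Date"])
        current = latest.get(key)
        if current is None or row["Last_Update"] >= current["Last_Update"]:
            latest[key] = row
    # Output order follows the first appearance of each key.
    return list(latest.values())
```

A dict keyed on the location and day keeps the pass linear in the number of rows and avoids sorting the whole file.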
i realized in my listing of transformations i missed some bits that used to be in the file
Thanks @greg-minshall for the information. I have integrated the data for Russia. Let's wait a couple of hours for the workflow to complete and see if we can close this long-standing issue. I was also interested in the As far as I understand, the other files are aggregating the numbers, e.g. computing the totals for Alabama by summing all entries that have Alabama as the upper level in the combined key. Is that correct? In an early stage of this project, I was also aggregating the data in this way, but then I noticed that it usually doesn't work. In my experience, the aggregates almost never matched the data provided directly for the upper level. For instance, if even one city is missing from the data, the aggregated state-level counts are biased downward. Moreover, the data released for the upper level may include travelers or cases whose exact location is not known. So unfortunately I won't be able to use the aggregated data.
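To make the aggregation technique and its downward bias concrete, here is a minimal sketch. It assumes combined keys are comma-separated from the most specific to the least specific level (e.g. "Autauga, Alabama, US" rolls up into "Alabama, US"); the function name and the placeholder counts are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

def aggregate_to_upper_level(counts: Iterable[Tuple[str, int]]) -> Dict[str, int]:
    """Sum counts into the parent of each combined key.

    Assumption: the first comma-separated component is the lowest level,
    so "County A, State X, US" contributes to "State X, US".
    """
    totals: Dict[str, int] = defaultdict(int)
    for combined_key, value in counts:
        parts = [p.strip() for p in combined_key.split(",")]
        if len(parts) < 2:
            continue  # already a top-level key; nothing to roll up into
        totals[", ".join(parts[1:])] += value
    return dict(totals)

# Placeholder numbers, for illustration only.
county_rows = [
    ("County A, State X, US", 10),
    ("County B, State X, US", 5),
    ("County C, State Y, US", 7),
]
print(aggregate_to_upper_level(county_rows))
# If "County A" were missing from the input, "State X, US" would drop from 15 to 5.
print(aggregate_to_upper_level(county_rows[1:]))
```

The second call shows the point made above: a sum over lower-level entries can only undercount when an entry is missing, and it never includes cases that the upper level reports without a known location.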
@eguidotti, you're welcome. i hope it helps. let me know. i only use JHU data. i think 'Recovered' comes from JHU's yes, you're right about the aggregation technique. i think when i originally did that work i did some verification. if i look at the JHU data now, for example, for California on 2021-04-04 ( were you looking at these "daily reports"? or at the more often used "time series" (a set of data i don't use, so i'm not familiar with it)?
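A verification of the kind mentioned above could compare each aggregated total against the figure reported directly for that upper level. The sketch below is hypothetical: it assumes both inputs are dicts mapping a region key to a count, and it flags regions whose relative difference exceeds a chosen tolerance. The placeholder numbers are not real data.

```python
from typing import Dict, List, Tuple

def compare_with_reported(
    aggregated: Dict[str, int],
    reported: Dict[str, int],
    tolerance: float = 0.0,
) -> List[Tuple[str, int, int]]:
    """Return (region, aggregated, reported) for regions present in both inputs
    whose relative difference exceeds `tolerance` (0.0 means any difference)."""
    mismatches = []
    for region in sorted(aggregated.keys() & reported.keys()):
        agg, rep = aggregated[region], reported[region]
        denom = max(abs(rep), 1)  # avoid dividing by zero when nothing is reported
        if abs(agg - rep) / denom > tolerance:
            mismatches.append((region, agg, rep))
    return mismatches

# Placeholder numbers, for illustration only.
aggregated = {"State X, US": 15, "State Y, US": 7}
reported = {"State X, US": 15, "State Y, US": 9, "State Z, US": 4}
print(compare_with_reported(aggregated, reported))
```

Regions that appear in only one input (like "State Z, US" here) are skipped rather than flagged, so a separate check would be needed to catch missing regions.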
@greg-minshall yes, it works and I'm going to close this issue. Thanks a lot!
Time series data
I guess that's the case for JHU. What I mean by "the data released for the upper level" is data released directly by the government for the upper level (not necessarily in the US, but around the world). In general, when I aggregated data from the lower levels, I never got the counts provided for the upper level. Also, in many cases JHU data (aggregated or not) do not match the data available from open government sources. That's basically the motivation behind this repo :) We try to pull the data from the official providers whenever possible. But in many cases it is not possible, and work like yours is very useful!
@eguidotti ah, "ground truth", or whatever the saying is. no, i decided early on that for me, btw, i've killed off the embarrassing also, if you ever wanted (as a backup, say, to my build process), probably producing a daily
OK, I downloaded your repo as a backup, but I hope everything goes smoothly. Thanks again!
is it legal, or useful, to post to a closed issue? anyway, @eguidotti, you might look at this issue on my site. i won't do anything about it soon, but that data set might also appeal to you (instead of my coleaned.csv). i'd be curious to hear your thoughts. cheers.
Hi @greg-minshall, thanks for posting this!
yes, i agree. cheers!
After months of work... it's done! The new version is available. Please see the changelog
Emanuele, congratulations. are you still pulling from my data (for states/provinces/oblasts)? just so i can feel un-guilty if/when my builds break... :)
Hi Greg, I have switched to the JHU unified dataset as you suggested. Many thanks for your package and your input; it has been very useful!
good -- enjoy!
It seems that data for Russia at `level=2` is coming from this repo, which itself is not updating. Of course it's easy to say «That's not our issue but theirs.» But no.