Pakistan (National Assembly): Fix official site import #53037
Replaces #53025
Mostly good, but there are a few loose ends here that are probably worth tidying up.
@@ -349,3 +347,4 @@ id,uuid
1053,c03a2279-1684-479a-8165-c6098ebb1174
1054,5ed669b9-d426-4b19-8298-53cc6598d9e9
1056,f67e27f4-7419-425c-879b-c89672659464
1057,47223c4f-9ca3-4d33-928a-258adee23f2d
This seems to have slipped into an earlier commit. It's not really part of archiving off the vanishing members.
Oops, I've tidied up that commit so it only includes archive-related changes now.
@@ -2958,10 +2954,10 @@
"scheme": "wikidata"
}
],
"image": "http://www.na.gov.pk/uploads/images/NA-300%20sMayryam.jpg",
"image": "http://www.na.gov.pk/uploads/images/final%20s01.JPG",
Not quite sure what's going on here, but this image doesn't exist.
This turned out to be a problem in the scraper which was causing lots of images to break. Fixed in everypolitician-scrapers/pakistan-national-assembly@51abd21.
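To catch this kind of breakage earlier, a check along these lines could flag image URLs that don't resolve. This is a minimal Python sketch, not the scraper fix in 51abd21; `http_status` and `broken_images` are hypothetical names, and it assumes na.gov.pk answers HEAD requests (some servers reply 405 to HEAD, in which case a GET fallback would be needed).

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status of a HEAD request to url.

    HTTP error responses (404, 500, ...) are returned as their status code;
    network-level failures such as DNS errors are left to propagate.
    """
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def broken_images(urls, status_of=http_status):
    """List the image URLs that do not come back with HTTP 200.

    status_of is injectable so the check can be exercised without a network.
    """
    return [url for url in urls if status_of(url) != 200]
```

Running `broken_images(...)` over the `image` values in the generated data would give a quick list of candidates to look at before merging.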
"url": "https://en.wikipedia.org/wiki/Usman_Badini"
}
],
"name": "Engineer Mohammad Usman Badaini",
Hmmm. We probably want to start trapping prefixes like this. It's a pre-existing problem, so it doesn't need to be done in this change, but we'd probably want to fix it before letting any new people in with the same problem, especially as this is now a well-solved issue with a solution that can be copied easily from other scrapers. It also seems like the sort of thing that's going to get in the way of our Wikidata prompts, which is another reason to fix it sooner rather than later.
I've opened #53128 to capture the problem with the name prefixes.
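As a rough illustration of what trapping these prefixes might involve, here is a Python sketch that peels known honorifics off the front of a name and returns them separately (for example, to fill a Popolo-style `honorific_prefix`). The prefix list is hypothetical, seeded only from names visible in this diff ("Engineer", "Mrs."), and this is not the code from the other scrapers mentioned above.

```python
import re

# Hypothetical prefix list, seeded from names seen in this diff
# ("Engineer ...", "Mrs. ..."); a real scraper would extend it as needed.
PREFIXES = ("Engineer", "Engr.", "Dr.", "Mr.", "Mrs.", "Ms.")

# One prefix at the very start of the name, followed by whitespace, so that
# e.g. "Engineering" or "Mrs.X" (no space) is not treated as a prefix.
_PREFIX_RE = re.compile(
    r"^(?:" + "|".join(re.escape(p) for p in PREFIXES) + r")\s+"
)


def split_prefixes(raw_name: str) -> tuple[str, str]:
    """Return (honorific_prefix, name), removing any leading prefixes."""
    name = raw_name.strip()
    found = []
    while True:
        match = _PREFIX_RE.match(name)
        if not match:
            break
        found.append(match.group(0).strip())
        name = name[match.end():]
    return " ".join(found), name
```

For example, `split_prefixes("Engineer Mohammad Usman Badaini")` gives `("Engineer", "Mohammad Usman Badaini")`, and names with no recognised prefix come back unchanged with an empty prefix.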
data/Pakistan/Assembly/term-14.csv
@@ -191,7 +192,7 @@ cc6f31bb-70cf-4517-824b-981b15d9efa6,Muhammad Junaid Anwar Chaudhary,Muhammad Ju
073ace5a-3f2e-43e3-a1a6-73b5e518384d,Muhammad Khan Daha,Muhammad Khan Daha,,,,PML(N),pmln,area/na-157_(khanewal-ii),NA-157 (Khanewal-II),National Assembly,14,,,http://www.na.gov.pk/uploads/images/Muhammad%20sKhan%20sDaha(1).jpg,male,Q18815970,Q799577,Q12812171
aa87aa59-451e-48a3-b755-15becf80df53,Muhammad Moeen Wattoo,Muhammad Moeen Wattoo,,,,PML(N),pmln,area/na-147_(okara-v),NA-147 (Okara-V),National Assembly,14,,,http://www.na.gov.pk/uploads/images/147.JPG,male,Q18815998,Q799577,Q12856786
0e2c4697-c9b4-4df4-a90a-98c7af328ed5,Muhammad Muzammil Qureshi,Muhammad Muzammil Qureshi,,,,MQM,mqm,area/na-253_(karachi-xv),NA-253 (Karachi-XV),National Assembly,14,,,http://www.na.gov.pk/uploads/images/253.jpg,male,Q19517997,Q1265113,Q12857475
7304486c-f12d-4009-b2db-ee47ec4f5771,"Muhammad Nawaz Sharif, Prime Minister Islamic Republic of Pakistan","Muhammad Nawaz Sharif, Prime Minister Islamic Republic of Pakistan",,,,PML(N),pmln,area/na-120_(lahore-iii),NA-120 (Lahore-III),National Assembly,14,,,http://www.na.gov.pk/uploads/images/120.jpg,male,Q134068,Q799577,Q12780067
7304486c-f12d-4009-b2db-ee47ec4f5771,"Muhammad Nawaz Sharif, Prime Minister Islamic Republic of Pakistan","Muhammad Nawaz Sharif, Prime Minister Islamic Republic of Pakistan",,,,PML(N),pmln,area/na-120_(lahore-iii),NA-120 (Lahore-III),National Assembly,14,,2017-07-28,http://www.na.gov.pk/uploads/images/120.jpg,male,Q134068,Q799577,Q12780067
If we're removing the Prime Minister suffix from names (which is definitely a good idea!), we should probably remove it from here too.
Good point, I've removed that from the archived source.
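A minimal Python sketch of the suffix removal discussed here: drop everything after the first comma when what follows starts with a known office title. `OFFICE_SUFFIXES` and `strip_office_suffix` are hypothetical; only "Prime Minister" actually appears in this diff, and the check on the title keeps names that legitimately contain a comma intact.

```python
# Hypothetical list of office titles the official site appends after a
# comma; only "Prime Minister" is actually seen in this pull request.
OFFICE_SUFFIXES = ("Prime Minister",)


def strip_office_suffix(raw_name: str) -> str:
    """Drop a trailing ', <office title> ...' from a member's name."""
    head, sep, tail = raw_name.partition(",")
    # Only cut when the text after the comma is a recognised office title.
    if sep and tail.strip().startswith(OFFICE_SUFFIXES):
        return head.strip()
    return raw_name.strip()
```

Applied to the name in the archived row, this yields `"Muhammad Nawaz Sharif"`.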
@@ -8,3 +8,5 @@ id,name,patronymic_name,address,phone,constituency,province,party,start_date,ima
910,Muhammad Rehan Hashmi,Muhammad Usman Hashmi,"H. No. A-141, Block H, North Nazimabad, Karachi",0302-8255444,NA-245 (Karachi-VII),Sindh,MQM,2013-06-01,http://www.na.gov.pk/uploads/images/245.jpg,14,http://www.na.gov.pk/en/profile.php?uid=910,mqm,,
922,Abdul Hakeem Baloch,Abdullah,"Dur Muhammad Goth Dersano Channa, Malir.","03005250789, [email protected]",NA-258 (Karachi-XX),Sindh,PML(N),2013-06-01,http://www.na.gov.pk/uploads/images/258.jpg,14,http://www.na.gov.pk/en/profile.php?uid=922,pmln,,2016-05-26
967,Mrs. Alizeh Iqbal Haider,D/o Syed Iqbal Haider,"D-25, Block-IV, Clifton, Karachi",,,Sindh,PPPP,2013-06-01,http://www.na.gov.pk/uploads/images/313.jpg,14,http://www.na.gov.pk/en/profile.php?uid=967,pppp,,
788,"Muhammad Nawaz Sharif, Prime Minister Islamic Republic of Pakistan",Mian Muhammad Sharif,"Shamim Farm, Jati Umra, Raiwind Road Lahore.","",NA-120 (Lahore-III),Punjab,PML(N),2013-06-01,http://www.na.gov.pk/uploads/images/120.jpg,14,http://www.na.gov.pk/en/profile.php?uid=788,pmln,male,2017-07-28
924,Abdul Rahim Mandokhail,Abdul Rehman Mandokhail,"House No. 196 Tehsil Road, Zhob",0346-5303971,NA-260 (QUETTA-CUM-CHANGAI-CUM-NUSHKI (OLD QUETTA-CUM-CHAGAI-CUM-MUSTANG)),Balochistan,PMAP,2013-06-01,http://www.na.gov.pk/uploads/images/260.jpg,14,http://www.na.gov.pk/en/profile.php?uid=924,pmap,,2017-05-20
These end dates don't seem to have been in the original file, so I'm assuming you found them from an external source? It would be useful to credit that, if so.
I've added a link to the source to the commit message 👍
These members have disappeared from the official source because they're no longer serving in the 14th term. I've rescued their rows from the previously scraped official-source data and archived them off, preserving their existing UUIDs. I got the end dates for these three members from their Wikipedia pages:
- https://en.wikipedia.org/wiki/Nawaz_Sharif
- https://en.wikipedia.org/wiki/Abdul_Rahim_Khan_Mandokhel
- https://en.wikipedia.org/wiki/Gulzar_Khan_(politician)
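The archiving step in this commit message could be sketched roughly as follows. This is a hedged illustration, not the code used for this change: `load_uuid_map` and `archive_vanished` are hypothetical helpers, and the sketch assumes rows are keyed by the official site's `id` and that reconciliation lives in a CSV with an `id,uuid` header like the one in the first diff of this review. Keeping the original `id` on each archived row is what lets the existing UUID keep resolving.

```python
import csv
import io


def load_uuid_map(csv_text):
    """Parse an id,uuid reconciliation CSV into a dict keyed by official id."""
    return {row["id"]: row["uuid"] for row in csv.DictReader(io.StringIO(csv_text))}


def archive_vanished(previous_rows, current_ids, uuid_by_id, end_dates):
    """Return copies of rows whose official id is missing from the new scrape.

    previous_rows: dicts from the earlier official-source data (with "id").
    current_ids:   set of ids present in the latest scrape.
    uuid_by_id:    existing id -> uuid reconciliation mapping.
    end_dates:     manually sourced id -> end_date (e.g. from Wikipedia).
    """
    archived = []
    for row in previous_rows:
        if row["id"] in current_ids:
            continue  # still listed on the official site; nothing to archive
        if row["id"] not in uuid_by_id:
            # Without an existing UUID the person would get a new identity.
            raise KeyError(f"no existing UUID for official id {row['id']}")
        kept = dict(row)  # copy so the caller's rows are left untouched
        if row["id"] in end_dates:
            kept["end_date"] = end_dates[row["id"]]
        archived.append(kept)
    return archived
```

Failing loudly on a missing UUID is a deliberate choice in this sketch: silently minting a new one is exactly what archiving is meant to avoid.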
Updated from a50af0c to 99f4834.
@tmtmtmtm I've addressed your comments, so I think this is ready for review again.
Updated from 99f4834 to cf69ec7.
The scraper has now run, so I've updated this pull request to use the output from morph, rather than the manually generated version.
Add a record for a new member that's appeared in the official source.
Updated from cf69ec7 to d16ff75.
Summary of changes

People
- Added
- Removed: no people removed
- Name changes
- Additional name changes
- Wikidata changes: no changes

Organizations
- Added: no organizations added
- Removed: no organizations removed

Memberships
- Added: term/14
- Removed: term/14
The commits are in a slightly funny order here, with the reconciliation change appearing after the commit that uses it. But 👍
This fixes the following issues with the official site import:
Because the scraper for the official site isn't in the everypolitician-scraper account I can't manually trigger a run, so instead I've run the scraper locally and then manually generated a CSV from it and committed that.
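For reference, converting a local run's output to CSV can be done with Python's standard library alone. This sketch assumes the scraper follows the usual morph.io convention of writing its results to `data.sqlite` with a `data` table (an assumption; check the scraper's own code for the actual file and table names). `export_table` is a hypothetical helper, and the table name is interpolated into SQL, so it must be a trusted value.

```python
import csv
import sqlite3
import sys


def export_table(db_path="data.sqlite", table="data", out=None):
    """Dump every row of a scraper's SQLite table to CSV, header row first.

    NULL values are written as empty fields by csv.writer.
    """
    out = out if out is not None else sys.stdout
    conn = sqlite3.connect(db_path)
    try:
        # Table name comes from the caller, not user input; quote it as an identifier.
        cursor = conn.execute(f'SELECT * FROM "{table}"')
        writer = csv.writer(out)
        writer.writerow(col[0] for col in cursor.description)
        writer.writerows(cursor)
    finally:
        conn.close()
```

Usage, after running the scraper locally: `export_table(out=open("official.csv", "w", newline=""))`, where the output filename is illustrative only.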
Part of https://github.com/everypolitician/everypolitician/issues/612