Pakistan (National Assembly): Fix official site import #53037

Merged: 3 commits merged into master from pakistan-assembly-fix-official-source on Sep 5, 2017

@chrismytton commented on Sep 4, 2017

This fixes the following issues with the official site import:

Because the scraper for the official site isn't in the everypolitician-scraper account, I can't manually trigger a run. Instead, I've run the scraper locally, manually generated a CSV from its output, and committed that.
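For anyone repeating this local-run workaround: morph.io scrapers conventionally write their output to a `data.sqlite` file, and turning one table of that into a CSV takes only a few lines of Python. This is a minimal sketch, not the exact steps used for this PR; the function name `export_table_to_csv` and the default table name `data` are assumptions, so check which table the scraper actually writes to.

```python
import csv
import io
import sqlite3


def export_table_to_csv(conn, table="data", out=None):
    """Write every row of `table` to `out` as CSV, header row first.

    Assumption: the scraper's rows live in a single table (default name "data").
    """
    if out is None:
        out = io.StringIO()
    # SQLite can't bind table names as parameters, so quote the identifier instead.
    quoted = '"' + table.replace('"', '""') + '"'
    cursor = conn.execute(f"SELECT * FROM {quoted}")
    writer = csv.writer(out)
    # cursor.description holds one tuple per column; the first item is the column name.
    writer.writerow([column[0] for column in cursor.description])
    writer.writerows(cursor)
    return out
```

Against a local run you would pass `sqlite3.connect("data.sqlite")` and an output file opened with `newline=""`, as the csv module documentation recommends. Fields containing commas (such as the long Prime Minister name in this data) are quoted automatically.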

Part of https://github.com/everypolitician/everypolitician/issues/612

@chrismytton commented:

Replaces #53025

@tmtmtmtm left a comment:

Mostly good, but there are a few loose ends here that are probably worth tidying up.

@@ -349,3 +347,4 @@ id,uuid
1053,c03a2279-1684-479a-8165-c6098ebb1174
1054,5ed669b9-d426-4b19-8298-53cc6598d9e9
1056,f67e27f4-7419-425c-879b-c89672659464
1057,47223c4f-9ca3-4d33-928a-258adee23f2d
@tmtmtmtm:

This seems to have slipped into an earlier commit. It's not really part of archiving off the vanishing members.

@chrismytton:

Oops, I've tidied up that commit so it only includes archive-related changes now.

@@ -2958,10 +2954,10 @@
"scheme": "wikidata"
}
],
"image": "http://www.na.gov.pk/uploads/images/NA-300%20sMayryam.jpg",
"image": "http://www.na.gov.pk/uploads/images/final%20s01.JPG",
@tmtmtmtm:

Not quite sure what's going on here, but this image doesn't exist.

@chrismytton:

This turned out to be a problem in the scraper which was causing lots of images to break. Fixed in everypolitician-scrapers/pakistan-national-assembly@51abd21.

"url": "https://en.wikipedia.org/wiki/Usman_Badini"
}
],
"name": "Engineer Mohammad Usman Badaini",
@tmtmtmtm:

Hmmm. We probably want to start trapping prefixes like this. It's a pre-existing problem, so it doesn't need to be fixed in this change, but we'd probably want to fix it before letting any new people in with the same problem, especially as this is now a well-solved issue with a solution that can be copied easily from other scrapers. It also seems like the sort of thing that will get in the way of our Wikidata prompts, which might make us want to fix it sooner.

@chrismytton:

I've opened #53128 to capture the problem with the name prefixes.
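Trapping prefixes mostly means splitting a known honorific off the front of the scraped name, so "Engineer Mohammad Usman Badaini" becomes "Mohammad Usman Badaini" with "Engineer" kept separately (Popolo has an `honorific_prefix` field for this). The sketch below is illustrative only: `split_prefix` and the `PREFIXES` list are hypothetical, not copied from any of the other scrapers mentioned above.

```python
import re

# Hypothetical honorifics list, seeded from names seen in this data
# ("Engineer ...", "Mrs. ..."); a real scraper would tune it to its source.
PREFIXES = ["Engineer", "Engr.", "Dr.", "Prof.", "Mr.", "Mrs.", "Ms.", "Miss"]

# One or more known prefixes at the start of the name, each followed by whitespace.
PREFIX_RE = re.compile(
    r"^(?:(?:" + "|".join(re.escape(p) for p in PREFIXES) + r")\s+)+"
)


def split_prefix(name):
    """Return (honorific_prefix, name) with any leading honorifics split off."""
    name = name.strip()
    match = PREFIX_RE.match(name)
    if match is None:
        return "", name
    return match.group(0).strip(), name[match.end():]
```

Requiring whitespace after each prefix keeps ordinary names that merely start with the same letters (for example "Muhammad ...") untouched.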

@@ -191,7 +192,7 @@ cc6f31bb-70cf-4517-824b-981b15d9efa6,Muhammad Junaid Anwar Chaudhary,Muhammad Ju
073ace5a-3f2e-43e3-a1a6-73b5e518384d,Muhammad Khan Daha,Muhammad Khan Daha,,,,PML(N),pmln,area/na-157_(khanewal-ii),NA-157 (Khanewal-II),National Assembly,14,,,http://www.na.gov.pk/uploads/images/Muhammad%20sKhan%20sDaha(1).jpg,male,Q18815970,Q799577,Q12812171
aa87aa59-451e-48a3-b755-15becf80df53,Muhammad Moeen Wattoo,Muhammad Moeen Wattoo,,,,PML(N),pmln,area/na-147_(okara-v),NA-147 (Okara-V),National Assembly,14,,,http://www.na.gov.pk/uploads/images/147.JPG,male,Q18815998,Q799577,Q12856786
0e2c4697-c9b4-4df4-a90a-98c7af328ed5,Muhammad Muzammil Qureshi,Muhammad Muzammil Qureshi,,,,MQM,mqm,area/na-253_(karachi-xv),NA-253 (Karachi-XV),National Assembly,14,,,http://www.na.gov.pk/uploads/images/253.jpg,male,Q19517997,Q1265113,Q12857475
7304486c-f12d-4009-b2db-ee47ec4f5771,"Muhammad Nawaz Sharif, Prime Minister Islamic Republic of Pakistan","Muhammad Nawaz Sharif, Prime Minister Islamic Republic of Pakistan",,,,PML(N),pmln,area/na-120_(lahore-iii),NA-120 (Lahore-III),National Assembly,14,,,http://www.na.gov.pk/uploads/images/120.jpg,male,Q134068,Q799577,Q12780067
7304486c-f12d-4009-b2db-ee47ec4f5771,"Muhammad Nawaz Sharif, Prime Minister Islamic Republic of Pakistan","Muhammad Nawaz Sharif, Prime Minister Islamic Republic of Pakistan",,,,PML(N),pmln,area/na-120_(lahore-iii),NA-120 (Lahore-III),National Assembly,14,,2017-07-28,http://www.na.gov.pk/uploads/images/120.jpg,male,Q134068,Q799577,Q12780067
@tmtmtmtm:

If we're removing the Prime Minister suffix from names (which is definitely a good idea!), we should probably remove it from here too.

@chrismytton:

Good point, I've removed that from the archived source.
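Dropping the office suffix could also be done at scrape time rather than by editing the archived file. A minimal sketch, assuming the suffix always follows a comma and starts with one of a small, hypothetical list of office titles; `strip_office_suffix` is an illustrative name, not a function in the scraper.

```python
import re

# Hypothetical pattern: an office title appended to the name after a comma,
# as in "Muhammad Nawaz Sharif, Prime Minister Islamic Republic of Pakistan".
OFFICE_SUFFIX_RE = re.compile(
    r",\s*(?:Prime Minister|Deputy Speaker|Speaker|Minister)\b.*$"
)


def strip_office_suffix(name):
    """Drop a trailing ', <office title> ...' suffix, leaving other names untouched."""
    return OFFICE_SUFFIX_RE.sub("", name).strip()
```

Matching only known titles, rather than cutting at the first comma, avoids mangling any name that legitimately contains a comma.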

@@ -8,3 +8,5 @@ id,name,patronymic_name,address,phone,constituency,province,party,start_date,ima
910,Muhammad Rehan Hashmi,Muhammad Usman Hashmi,"H. No. A-141, Block H, North Nazimabad, Karachi",0302-8255444,NA-245 (Karachi-VII),Sindh,MQM,2013-06-01,http://www.na.gov.pk/uploads/images/245.jpg,14,http://www.na.gov.pk/en/profile.php?uid=910,mqm,,
922,Abdul Hakeem Baloch,Abdullah,"Dur Muhammad Goth Dersano Channa, Malir.","03005250789, [email protected]",NA-258 (Karachi-XX),Sindh,PML(N),2013-06-01,http://www.na.gov.pk/uploads/images/258.jpg,14,http://www.na.gov.pk/en/profile.php?uid=922,pmln,,2016-05-26
967,Mrs. Alizeh Iqbal Haider,D/o Syed Iqbal Haider,"D-25, Block-IV, Clifton, Karachi",,,Sindh,PPPP,2013-06-01,http://www.na.gov.pk/uploads/images/313.jpg,14,http://www.na.gov.pk/en/profile.php?uid=967,pppp,,
788,"Muhammad Nawaz Sharif, Prime Minister Islamic Republic of Pakistan",Mian Muhammad Sharif,"Shamim Farm, Jati Umra, Raiwind Road Lahore.","",NA-120 (Lahore-III),Punjab,PML(N),2013-06-01,http://www.na.gov.pk/uploads/images/120.jpg,14,http://www.na.gov.pk/en/profile.php?uid=788,pmln,male,2017-07-28
924,Abdul Rahim Mandokhail,Abdul Rehman Mandokhail,"House No. 196 Tehsil Road, Zhob",0346-5303971,NA-260 (QUETTA-CUM-CHANGAI-CUM-NUSHKI (OLD QUETTA-CUM-CHAGAI-CUM-MUSTANG)),Balochistan,PMAP,2013-06-01,http://www.na.gov.pk/uploads/images/260.jpg,14,http://www.na.gov.pk/en/profile.php?uid=924,pmap,,2017-05-20
@tmtmtmtm:

These end dates don't seem to have been in the original file, so I'm assuming you found them from an external source? It would be useful to credit that, if so.

@chrismytton:

I've added a link to the source to the commit message 👍

@tmtmtmtm assigned chrismytton and unassigned tmtmtmtm on Sep 4, 2017
These members have disappeared from the official source because they're
no longer serving in the 14th term. I've rescued their rows from the
official source and archived them, preserving their existing UUIDs.

I got the end dates for these three members from their Wikipedia pages:

- https://en.wikipedia.org/wiki/Nawaz_Sharif
- https://en.wikipedia.org/wiki/Abdul_Rahim_Khan_Mandokhel
- https://en.wikipedia.org/wiki/Gulzar_Khan_(politician)
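The archiving step in the commit message boils down to reattaching each rescued row's existing UUID via the `id,uuid` reconciliation file (the format shown in the first diff in this review) and filling in any end dates found elsewhere. A rough sketch of that bookkeeping, assuming rows are dicts of strings and that the end date lives in a column called `end_date`; both function names are hypothetical.

```python
import csv
import io


def load_reconciliation(csv_text):
    """Map official-site `id` -> existing EveryPolitician `uuid` from an id,uuid CSV."""
    return {row["id"]: row["uuid"] for row in csv.DictReader(io.StringIO(csv_text))}


def archive_rows(rows, id_to_uuid, end_dates):
    """Copy rescued rows, attaching each member's existing UUID and any known end date.

    Assumption: `end_dates` maps official-site ids to ISO dates found externally
    (e.g. on Wikipedia); rows without an entry keep whatever end_date they had.
    """
    archived = []
    for row in rows:
        row = dict(row)  # don't mutate the caller's data
        # A KeyError here means the member was never reconciled, which is worth surfacing.
        row["uuid"] = id_to_uuid[row["id"]]
        if row["id"] in end_dates:
            row["end_date"] = end_dates[row["id"]]
        archived.append(row)
    return archived
```

Failing loudly on an unknown id is deliberate: silently minting a new UUID would break the link to the person's existing record.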
@chrismytton:

@tmtmtmtm I've addressed your comments, so I think this is ready for review again.

@chrismytton requested a review from tmtmtmtm on September 5, 2017 13:26
@chrismytton assigned tmtmtmtm and unassigned chrismytton on Sep 5, 2017
@chrismytton force-pushed the pakistan-assembly-fix-official-source branch from 99f4834 to cf69ec7 on September 5, 2017 16:36
@chrismytton:

The scraper has now run, so I've updated this pull request to use the output from morph, rather than the manually generated version.

@chrismytton force-pushed the pakistan-assembly-fix-official-source branch from cf69ec7 to d16ff75 on September 5, 2017 16:45
@everypoliticianbot:

Summary of changes in data/Pakistan/Assembly/ep-popolo-v1.0.json:

People

Added

  • 47223c4f-9ca3-4d33-928a-258adee23f2d - Engineer Mohammad Usman Badaini

Removed

No people removed

Name Changes

  • 7304486c-f12d-4009-b2db-ee47ec4f5771: Muhammad Nawaz Sharif, Prime Minister Islamic Republic of Pakistan → Muhammad Nawaz Sharif

Additional Name Changes

  • 7304486c-f12d-4009-b2db-ee47ec4f5771 (Muhammad Nawaz Sharif, Prime Minister Islamic Republic of Pakistan): Removed: Muhammad Nawaz Sharif, Prime Minister Islamic Republic of Pakistan. Added: Muhammad Nawaz Sharif.

Wikidata Changes

No changes

Organizations

Added

No organizations added

Removed

No organizations removed

Memberships

Added

term/14

  • Abdul Rahim Mandokhail ( - 2017-05-20)

  • Engineer Mohammad Usman Badaini (2017-08-01 - )

  • Gulzar Khan ( - 2017-08-28)

  • Muhammad Nawaz Sharif ( - 2017-07-28)

Removed

term/14

  • Abdul Rahim Mandokhail

  • Gulzar Khan

  • Muhammad Nawaz Sharif, Prime Minister Islamic Republic of Pakistan
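Several people appear under both Added and Removed. That is what you would expect if memberships are compared as whole records: once an end date (or a cleaned-up name) is filled in, the old record no longer matches and shows up as removed, while the new one shows up as added. A minimal sketch of that idea, assuming memberships are `(name, start_date, end_date)` tuples; this is not necessarily how everypoliticianbot builds its summary.

```python
def diff_memberships(old, new):
    """Return (added, removed) memberships, comparing whole records rather than people.

    Assumption: each membership is a hashable (name, start_date, end_date) tuple,
    with "" for an unknown date.
    """
    old_set, new_set = set(old), set(new)
    # Any change to a field (e.g. a new end date) makes the records unequal,
    # so the same person can land in both lists.
    return sorted(new_set - old_set), sorted(old_set - new_set)
```

Keying on the person alone would report these as a single "changed" entry instead; comparing full records is simpler and makes every field change visible.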

@tmtmtmtm left a comment:

The commits are in a slightly funny order here, with the reconciliation change appearing after the commit that uses it. But 👍

@tmtmtmtm merged commit 67c1e29 into master on Sep 5, 2017
@tmtmtmtm deleted the pakistan-assembly-fix-official-source branch on May 23, 2018 09:29