[validator] Add additional countries to adm1 #50
Comments
Discussed this with TRC on Slack but adding the salient points here:
@tomrconnor can you confirm that the sequences from these new country codes will not need any other special handling? i.e. is there a problem with automatically uploading genomes from these countries with the basic public metadata using the same system currently provided for the home nations, or with providing the consortium-level metadata to COG through the usual means?
|
As far as I am aware, deductive disclosure issues aside, there shouldn't be any need for special handling. I think the advice to the locations concerned will be to not upload other details, so we may need to think about what they put into adm2. I would expect that the upload of metadata would use the same system as the rest of COG-UK. The locations concerned will be signing the data access agreement and will basically be treated like other sites. |
Perfect. adm2 is only made public through sources like Microreact where
deductive disclosure concerns are already handled routinely.
DIPI meets tomorrow so I'll raise it with the various pipeline teams.
|
Flagged this to DIPI. RC will follow up with VH and AT on geography cleaning steps. AU confirms this should fit in to standard practice with Microreact. Test data might be useful. |
@tomrconnor Although adm2 is not controlled by Majora, it would be useful to get an idea of what the adm2 data for each of these countries may look like. Do you have any example sample-level metadata or aggregate adm2 counts that we might be able to feed back to geography teams so they can update their scripts? |
I have conducted a small-scale audit of my software directory and spotted the following:
I've realised there is a slightly wider problem here: we've never asked for adm0, because it has always been assumed to be United Kingdom. All samples have their adm0 set to UK in Majora, but we will need to automatically fill in and handle adm0 with something appropriate going forward. Note that adm0 cannot be blank, as it is a proxy for whether the biosample has been filled in (which is a different future issue). We will need to compile the accepted formats for these new countries at GISAID and ENA. Additionally, if PHE needs to receive the local lab ID field, then we will need to revisit the configuration of the agreement within Majora to expand the country list from UK-ENG.
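To make the adm0 point concrete, here is a minimal sketch of filling in adm0 automatically from adm1 so that it is never left blank. It is not Majora's actual code. The function name `derive_adm0`, the stored string "United Kingdom", the home-nation codes other than UK-ENG, and the three placeholder entries for new locations are all assumptions for illustration. The accepted country formats for GISAID and ENA still need compiling.

```python
# Home-nation adm1 codes (modified ISO 3166-2, UK over GB).
# Only UK-ENG appears in this thread; the other three are assumed.
HOME_NATION_ADM1 = frozenset({"UK-ENG", "UK-SCT", "UK-WLS", "UK-NIR"})

# Assumed value; the exact string stored by Majora is not confirmed here.
DEFAULT_ADM0 = "United Kingdom"

# Hypothetical placeholders: the real list of new adm1 codes and the
# country names to use for them are not reproduced in this thread.
NEW_ADM1_TO_ADM0 = {
    "JE": "Jersey",
    "GG": "Guernsey",
    "IM": "Isle of Man",
}


def derive_adm0(adm1: str) -> str:
    """Return a non-blank adm0 for a given adm1, or raise ValueError."""
    code = adm1.strip().upper()
    if code in HOME_NATION_ADM1:
        return DEFAULT_ADM0
    if code in NEW_ADM1_TO_ADM0:
        return NEW_ADM1_TO_ADM0[code]
    # adm0 doubles as a proxy for "biosample filled in", so fail loudly
    # instead of silently writing an empty value.
    raise ValueError(f"unrecognised adm1 {adm1!r}; cannot derive adm0")
```

Raising on unknown input, rather than defaulting to United Kingdom, avoids mislabelling a sample from a new location as a UK sample.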
|
Outbound lookups as follows:
|
Just to check, the proposed adm1s would mean that instead of e.g. |
I thought it was |
@rmcolq Yes; the proposed adm1 will be integrated as-is rather than prefixed (i.e. using the proper ISO 3166-2 codes). The home nations will continue to use the existing adm1 codes (which are modified ISO 3166-2, using UK over GB). |
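As a rough illustration of that rule, the validator check could look something like the sketch below: home nations keep their UK-prefixed codes, new codes are accepted exactly as given, and a UK-prefixed form of a new code is rejected. This is an assumption-laden sketch and not the validator's real implementation. `validate_adm1` is a hypothetical name, the new-code set holds placeholder values only, and the home-nation codes other than UK-ENG are assumed.

```python
# Existing home-nation codes: modified ISO 3166-2 with UK in place of GB.
# Only UK-ENG appears in this thread; the rest are assumed.
HOME_NATION_ADM1 = frozenset({"UK-ENG", "UK-SCT", "UK-WLS", "UK-NIR"})

# Placeholder values only; the proposed codes are not listed in this thread.
NEW_ADM1 = frozenset({"JE", "GG", "IM"})


def validate_adm1(value: str) -> str:
    """Return the adm1 unchanged if it is accepted, else raise ValueError."""
    if value in HOME_NATION_ADM1 or value in NEW_ADM1:
        return value
    if value.startswith("GB-"):
        # Home nations are submitted with the UK- prefix, not plain ISO GB-.
        raise ValueError(f"{value!r}: use the UK- prefix for home nations")
    if value.startswith("UK-") and value[3:] in NEW_ADM1:
        # New codes are integrated as-is, never prefixed.
        raise ValueError(f"{value!r}: submit {value[3:]!r} without a prefix")
    raise ValueError(f"{value!r} is not an accepted adm1")
```

For example, `validate_adm1("UK-WLS")` and `validate_adm1("JE")` both pass under these assumptions, while `validate_adm1("UK-JE")` raises with a hint to drop the prefix.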
Complication is none of those new places are technically part of the UK (but then Northern Ireland is not part of Great Britain but still has GB-NIR under ISO 3166-2). |
@rambaut Indeed - there is a bunch of hard coding assuming United Kingdom dotted around a few things on this end of Elan. I'm not familiar with the NUTS (hah) and bolts of the geography cleaning but presumably this is a bit of work to integrate. There won't be any changes in Majora until we're confident that everything downstream will accept the new codes without undefined behaviour. |
Verity tells me that |
Awesome. I see the changes in COG-UK/geography_cleaning@54f5c76 tracked by COG-UK/geography_cleaning#2. @ViralVerity from the look of the patch this will work even if users don't change to the "correct" |
Yes it will work even if you don't update the existing ones. Datapipe/Phylopipe2 changes made (largely just the publishing step), just testing now. |
Yes - it should propagate up to country level! There's been quite a lot of sequences from the new adm1s so far that just had postcodes, so it will pick those up. I'm also happy to take advisement on what the cleaned up country looks like in terms of underscores vs spaces and capitalisation. |
@rmcolq @ViralVerity Thanks both! Looks like it's mostly with me to sort out the outbound pipes and Majora itself now... The main thing I still need to sort out is whether these new country codes need to come under the same treatment as the @ViralVerity I'll be using the countries as in the table above but whatever is more consistent with the cleaning you do already should be fine I think. We could set up a page on the docs site about geography cleaning if it would be helpful for people? |
Great ok, yeah I've got those as inputs! |
Health Informatics group has asked for an update on this. It looks like we're ready downstream. Before we carry on I just want to chase up whether each of these adm1 are signed on to the agreement such that we can handle them as standard. |
I can confirm geography cleaning takes the countries exactly as above (i.e. no "UK" and in capitals) and will return prettier, human-readable versions, in line with how the current adm1s are treated. |
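For anyone wanting a feel for that step, the sketch below turns a capitalised country string into a human-readable one. It is not the geography_cleaning code: the function name `prettify_country`, the assumption that inputs may use underscores or spaces, and the small-word rule are all illustrative guesses.

```python
# Words kept lower-case when they are not the first word (assumed rule).
SMALL_WORDS = frozenset({"of", "and", "the"})


def prettify_country(raw: str) -> str:
    """Convert e.g. 'ISLE_OF_MAN' into a human-readable country name."""
    words = raw.replace("_", " ").split()
    pretty = []
    for index, word in enumerate(words):
        lower = word.lower()
        if index > 0 and lower in SMALL_WORDS:
            pretty.append(lower)
        else:
            pretty.append(lower.capitalize())
    return " ".join(pretty)


print(prettify_country("ISLE_OF_MAN"))  # Isle of Man
```

A fixed lookup table may be the safer choice in practice, since simple capitalisation rules will not handle every place name correctly.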
@tomrconnor Do you know if all these new adm1 have signed up to COG now? I don't want to inadvertently trigger an ethics meeting! |
They were in the process of it, I think. PHE are sequencing for some of these locations; I think the other sites won't be told they can have access to CLIMB until the HI group knows that this work has been done. So the downstream work being done is great, and I think the next thing is to pass this back to the HI group for them to manage the next actions. |
@tomrconnor Any update from HI? |
Bumping to backlog #62 |
Brief description
COG-UK may begin to receive samples from British Overseas Territories or Crown Dependencies. This will require an update to adm1 in order to provide the option for sequences from these locations to be correctly linked. These may require some form of suppression post-upload, for example in the generation of data for the public MicroReact.
Detail
Proposed by: Connor, T. R. PHWC