PDB fetch failures due to internet connectivity - should be handled better #58

Open
3 tasks done
knaegle opened this issue Jun 19, 2024 · 2 comments

Comments

@knaegle
Contributor

knaegle commented Jun 19, 2024

Description

We found spurious PDB fetch failures that likely result from lost internet connectivity during the fetch. We should annotate fetch problems more accurately (right now they are reported as invalid PDB IDs, which is not the case) and attempt refetching. It would also be good to consider an append mode, so that a set of fetched results can be added to later.

Files

A list of relevant files for this issue. This will help people navigate the project and offer some clues of where to start.

To Reproduce

Steps to reproduce the behavior:

  1. Go to a cafe with bad internet and try running PDB fetching. I kid, but we probably need to raise the probability of a failure by constraining the internet connection.

Expected behavior

We should capture the error type returned by the PDB fetch and re-attempt failed IDs if the failure is due to a web issue.
We should also support appending to output files: compare the list of IDs that were fetched against the list that was attempted, and add new lines where possible. This preserves the run time invested in the first fetch and allows extension later on.
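
As a rough illustration of the append idea, here is a minimal sketch, not code from this project. It assumes the output is a CSV file whose first column holds the PDB ID. The names `append_new_records`, `fetch_one`, and `fieldnames` are hypothetical, and `fetch_one` is assumed to return a dict for a successful fetch or `None` for a failed one.

```python
import csv
import os


def append_new_records(path, requested_ids, fetch_one, fieldnames):
    """Fetch only IDs not already in `path`, append them, return the IDs that failed.

    Hypothetical helper: `fetch_one(pdb_id)` returns a dict keyed by
    `fieldnames`, or None if the fetch failed. `fieldnames[0]` is the ID column.
    """
    has_file = os.path.exists(path) and os.path.getsize(path) > 0
    already = set()
    if has_file:
        # Collect the IDs written by earlier runs so they are not fetched again.
        with open(path, newline="") as handle:
            already = {row[fieldnames[0]] for row in csv.DictReader(handle)}

    failed = []
    with open(path, "a", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        if not has_file:
            writer.writeheader()  # only a brand-new file needs a header
        for pdb_id in requested_ids:
            if pdb_id in already:
                continue  # fetched in a previous run
            record = fetch_one(pdb_id)
            if record is None:
                failed.append(pdb_id)  # keep for a later re-attempt
                continue
            writer.writerow(record)
            already.add(pdb_id)
    return failed
```

Running the function a second time with a longer ID list only fetches the new IDs, and the returned list of failed IDs can be passed back in once connectivity is restored.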

Tasks

Include specific tasks in the order they need to be done in. Include links to specific lines of code where the task should happen at, if known

@knaegle
Contributor Author

knaegle commented Jun 21, 2024

Digging into this, I found some serious inefficiencies in the code. I increased the speed and added better error handling.

Structural changes to code:
PDB interface now operates on only one PDB. Added looping outside of construction.
Changed the annotation code, which used to have a number of if/elif to evaluate with function call to
Checked every URL request for a successful fetch: a status code of 200 counts as success, anything else as a failure.
Separated tolerable failures (like not being able to get the UniProt sequence) from intolerable errors (no PDB information).
If a fetch fails, it is attempted up to 5 times in case the failure was an internet connectivity issue.
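
The status check, the tolerable/intolerable split, and the retry limit can be sketched roughly as follows. This is an illustration under stated assumptions, not the code that was committed. `FetchError`, `fetch_with_retry`, and the `get` argument are hypothetical names. `get` stands in for whatever performs the HTTP request and is assumed to return an object with a `status_code` attribute.

```python
import time

MAX_ATTEMPTS = 5  # per the list above: up to 5 attempts


class FetchError(Exception):
    """Hypothetical: a required (intolerable) fetch failed on every attempt."""


def fetch_with_retry(url, get, required=True, max_attempts=MAX_ATTEMPTS, delay=0.0):
    """Call get(url) until it returns status 200 or the attempts run out.

    required=True  -> intolerable failure: raise FetchError (e.g. no PDB information).
    required=False -> tolerable failure: return None (e.g. no UniProt sequence).
    """
    last_problem = "no attempt made"
    for attempt in range(1, max_attempts + 1):
        try:
            response = get(url)
        except OSError as exc:
            # Built-in ConnectionError and TimeoutError are subclasses of OSError.
            last_problem = f"connection error: {exc}"
        else:
            if response.status_code == 200:
                return response
            last_problem = f"HTTP status {response.status_code}"
        if attempt < max_attempts and delay:
            time.sleep(delay)  # optional pause before the next attempt
    if required:
        raise FetchError(f"{url}: gave up after {max_attempts} attempts ({last_problem})")
    return None
```

Recording `last_problem` is what lets a caller report a connectivity failure as such, rather than as an invalid PDB ID.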

@knaegle
Contributor Author

knaegle commented Jun 21, 2024

Choosing not to add appending at this time, given the significant speed-up of about 5- to 20-fold.
