Commit

Merge remote-tracking branch 'origin/infra-recs-kl'
dpark01 committed Apr 12, 2024
2 parents d8f8641 + 6508290 commit dcc8f32
Showing 1 changed file with 6 additions and 6 deletions.
12 changes: 6 additions & 6 deletions docs/recommendations.md
@@ -28,7 +28,7 @@ ORCIDs? (good idea but probably depends on journal)

## Abstract

With climate change, habitat disruptions, an increase of antibiotic resistance and other anthropogenic and natural factors, combating infections both in humans and animals is again coming to the forefront. Sequencing technology has become an increasingly important part of the toolkit to track and characterise pathogens, and the use of pathogen genomic data in public health is growing. However, sequencing technologies have the potential to generate significant volumes of data, and this requires data management, analysis methods and interpretation tools that might be unfamiliar for the institutions tasked with this work. Here, the PHA4GE Infrastructure working group, consisting of practising bioinformaticians from both academia and healthcare/public health, present a set of recommendations on how institutions can manage this new technology in order to get full use of the data. These recommendations cover aspects including infrastructure, data management, analysis workflow tools and user management. In addition, non-technical considerations, such as legacy systems and regulatory factors are discussed. With these recommendations, the working group aims to provide institutions and working bioinformaticians with a set of best practice guidelines to guide decision making around computational environments used to employ sequencing data to combat disease.
With climate change, habitat disruptions, an increase of antibiotic resistance and other anthropogenic and natural factors, combating infections both in humans and animals is again coming to the forefront. Sequencing technology has become an increasingly important part of the toolkit to track and characterise pathogens, and the use of pathogen genomic data in public health is growing. However, sequencing technologies have the potential to generate significant volumes of data, and this requires data management, analysis methods and interpretation tools that might be unfamiliar for the institutions tasked with this work. Here, the PHA4GE Infrastructure working group, consisting of practising bioinformaticians from both academia and healthcare/public health, present a set of recommendations on how institutions can manage this new technology in order to get full use of the data. These recommendations cover aspects including infrastructure, data management, analysis workflow tools and user management. In addition, non-technical considerations, such as legacy systems and regulatory factors are discussed. With these recommendations, the working group aims to provide institutions and working bioinformaticians with a set of best practice guidelines to guide decision making around computational environments used to employ sequencing data to combat disease.

## Background and Motivation

@@ -126,9 +126,9 @@ Responsibilities for cyber incident management, how one detects and responds to

## Results

To illustrate how different questions (who, what, where) may be answered, we describe six real-world implementations (vignettes) of bioinformatics infrastructure to contrast the many benefits and constraints. To compare them, we have outlined eight dimensions based on these broader questions (see Methods). These dimensions include: **Future proofing**, **Ease of use** (for administrator), **Ease of use** (for user) (How the analysis is run); **Data provenance and management** (How data flows); **Access control** (Who has access); **External access requirements**, **Flexibility**, **Scalability** (Where the analysis is run).
To illustrate how different questions (who, what, where) may be answered, we describe six real-world implementations (see vignettes for more) of bioinformatics infrastructure to contrast the many benefits and constraints that come with different solutions. To compare them, we have outlined eight dimensions based on these broader questions (see Methods). These dimensions include: **Future proofing**, **Ease of use** (for administrator), **Ease of use** (for user) (How the analysis is run); **Data provenance and management** (How data flows); **Access control** (Who has access); **External access requirements**, **Flexibility**, **Scalability** (Where the analysis is run).

The six implementations are summarised in **Table 1** with details in Supplementary Materials, and a summary of their assessment is presented in **Figure 3**. The detailed scoring for each vignette is in **Supplementary Table 1**.
The six implementations are summarised in Table 1 with details in Supplementary Materials. Each solution was evaluated by competent practitioners with experience of the solution. The detailed scoring for each vignette is in Supplementary Table 1. A summary of the assessment of the solutions is presented in Figure 3.

> [!WARNING]
> TO DO: INSERT TABLE 1
@@ -140,9 +140,9 @@ The six implementations are summarised in **Table 1** with details in Supplement
> [!WARNING]
> CONFUSING - the following section lacks fluff to explain context
The INRB Laptop example is a local installation on a single device, with the single advantage that it does not require external resources to run, which was important as one of the key motivating constraints was slow or unreliable internet and power where it was deployed. INRB Laptop, in being self-contained and self-managed, shifted all management onto the operator and was limited to the resources on the physical device, which scored poorly for criteria such as scalability and flexibility.

In the survey of real world implementations, only the laptop example was consistently ranked the lowest with limited scalability, flexibility and structures (like access control).
As is evident from Figure 3, only the laptop example was consistently scored the lowest with limited scalability, flexibility and structures (like access control).
The INRB Laptop is an example of a local installation on a single device that has the single advantage that it does not require external resources to run. This independence was the key motivating factor for the people who chose to use this solution, due to slow or unreliable internet and power where it was deployed. The INRB Laptop, in being self-contained and self-managed, also shifted all management onto the operator and was limited to the resources on the physical device.
> [!current working site]
Centralised on-premises solutions (Nextflow - Ibadan, IRIDA, HPC) had different systems for data provenance and user control, but all were adequate. Adding a web front-end application (IRIDA) provided ease of use for users without additional complexity for administration. Indeed, the key complication for on-premises solutions was the provisioning of the job processing systems, which required expert knowledge. Notably, the HPC example uses a traditional HPC arrangement that was married to the HPC hardware, making it difficult to switch to other resources for data processing (such as cloud) to scale the resource up or down. There were different motivations for the different implementations: for instance, the HPC on-premises example was to utilise existing HPC resources, while the IRIDA NVI example was to respond to data privacy and protection requirements.
