šŸ“ Index updates (#390)
* šŸ“ Update index.md page Structure

* šŸ“ Update contact route / heading

* āœļø Fix typo

* šŸ“ Update Out of Scope

* šŸ“ Remove needless word

* šŸ“ Update Feature Request link/phrase

* āœļø Typo

* āœļø Typo

* āœļø Rephrase

* šŸ“ Update typos/docs
Gary-H9 authored Jan 18, 2024
1 parent d1e6996 commit 460de09
Showing 1 changed file with 50 additions and 39 deletions.
89 changes: 50 additions & 39 deletions source/documentation/index.md
@@ -1,57 +1,68 @@
# Overview

The Analytical Platform (AP) is a data analysis platform made up of tools, packages and datasets for creating applications that utilise data within the Ministry of Justice (MoJ). The AP provides development environments in both Python (JupyterLab) and R (RStudio), allowing you multiple ways to query, analyse and model data.
The Analytical Platform (AP) is a data analysis platform made up of tools, packages and datasets for creating applications that utilise data within the Ministry of Justice (MoJ). The Analytical Platform provides development environments in both Python (JupyterLab) and R (RStudio), allowing you to query, analyse and model data.

This site provides instructions on how to configure and use the AP.
This site provides instructions on how to configure and use the Analytical Platform.

## Intended users

Primarily intended for data analysts in the Data and Analytical Services Directorate, the Analytical Platform also hosts users from:
- Criminal Injury Claims (CICA)
- HM Courts & Tribunals Service (HMCTS)
- HM Prison and Probation Service (HMPPS)
- Legal Aid Agency (LAA)
- Office of the Public Guardian (OPG)
Primarily intended for Data Analysts, in the Data and Analytical Services Directorate, the Analytical Platform also hosts users from:
- Criminal Injury Claims (CICA)
- HM Courts & Tribunals Service (HMCTS)
- HM Prison and Probation Service (HMPPS)
- Legal Aid Agency (LAA)
- Office of the Public Guardian (OPG)

We can also host other MoJ organisations. [Contact us][contact] to discuss your options.
If you would like to use the Analytical Platform please contact us via the relevant (support)[https://github.com/ministryofjustice/data-platform-support/issues/new/choose] route.

### Knowledge requirements

The Analytical Platform incorporates a variety of technical tools and concepts. While our community provide basic training materials on how to use some of these, to use the platform, as a minimum we recommend you have working knowledge of the following:
The Analytical Platform incorporates a variety of technical tools and concepts. To use the platform, as a minimum we recommend you have a working knowledge of the following:

- Amazon Athena and S3: to create, manipulate and query data
- GitHub and GitHub actions: to manage your application code
- Python or R: to develop applications on the Analytical Platform
- SQL: to query and transform data
- Amazon Athena and S3 - to create, manipulate and query data
- GitHub and GitHub Actions - to manage your application code
- Python or R - to develop applications on the Analytical Platform
- SQL - to query and transform data

## Benefits
## Benefits of Using the Analytical Platform

In additional to Python and R compatibility, benefits of using the Analytical Platform include:

- **modern data tools and services**:
- the ability to freely install packages from CRAN and PyPI to perform advanced analytical techniques, such as text mining, predictive analytics and data visualisation
- compatiblity with current cloud data services, such as Amazon Athena, Glue and Redshift, offering scalability and a managed service at commodity pay-as-you-go prices
- **centralised data**:
- our Data Engineering team converts raw data from operational systems into structures and excerpts
- we hold data files in Amazon S3 for ease of use, to load into your code or run SQL queries directly using Amazon Athena
- users can also upload data to the AP from other sources and share them with granular access controls, subject to normal data protection processes; for more information, see [Information governance][information-governance.md]
- **reproducible analysis**: the AP provides tools to develop reproducible analytical pipelines (RAPs) to automate time–consuming and repetitive tasks, allowing you to focus on interpreting the results with the following elements:
- when datasets are imported into the AP, snapshots of them are taken and versioned
- standardised system libraries in GitHub
- a standardised virtual machine that can run R Studio or Jupyter, or code running in an explicitly defined Dockerfile
- **secure environments**: we host the Analytical Platform in a cloud-based ecosystem that is easy to access remotely from all MoJ IT systems. Designed for data at security classifications OFFICIAL and OFFICIAL-SENSITIVE, we follow NCSC Cloud Security Principles, implementing features such as:
- two-factor authentication
- data encryption at rest and in transit
- granular access control
- extensive tracking of user behaviour, user privilege requests/changes and data flows
- multiple isolation levels between users and system components
- resilience and high availability to provide optimal performance and uptime

> **Note**: The Analytical Platform does not currently provide the following:
- production apps at scale
### Modern Data Tools and Services

- the ability to freely install packages from CRAN and PyPI to perform advanced analytical techniques, such as text mining, predictive analytics and data visualisation
- compatibility with current cloud data services, such as Amazon Athena, Glue and Redshift, offering scalability and a managed service

### Centralised Data

- our Data Engineering team converts raw data from operational systems into structures and excerpts
- we hold data files in Amazon S3 for ease of use, to load into your code or run SQL queries directly using Amazon Athena
- users can also upload data to the Analytial Platform from other sources and share them with granular access controls, subject to normal data protection processes; for more information, see [Information governance][information-governance.md]

### Reproducible Analysis

The Analytical Platform provides tools to develop reproducible analytical pipelines (RAnalytical Platforms) to automate time–consuming and repetitive tasks, allowing you to focus on interpreting the results with the following elements:
- when datasets are imported into the Analytical Platform, snapshots of them are taken and versioned
- standardised system libraries in GitHub
- a standardised virtual machine that can run RStudio or Jupyter, or code running in an explicitly defined Dockerfile

### Secure Environments

The Analytical Platform in hosted in a cloud-based ecosystem that is easy to access remotely from all MoJ IT systems. Designed for data at security classifications OFFICIAL and OFFICIAL-SENSITIVE, we follow NCSC Cloud Security Principles, implementing features such as:
- two-factor authentication
- data encryption at rest and in transit
- granular access control
- extensive tracking of user behaviour, user privilege requests/changes and data flows
- multiple isolation levels between users and system components
- resilience and high availability to provide optimal performance and uptime

### Out of Scope Offerings of the Analytical Platform

The Analytical Platform does not _currently_ provide the following:
- production Analytical Platforms at scale
- management information
- real-time data; however, the Airflow tool can schedule data processing as frequently as every few minutes
- pure data archival: Amazon S3, which the AP uses for data storage, does not offer index or search facilities
- we can set up a custom bucket policy to archive data to S3-IA or Glacier but recommend exploring SaaS alternatives, such as SharePoint or Google Drive
- pure data archival: Amazon S3, which the Analytical Platform uses for data storage, does not offer index or search facilities
- we can set up a custom bucket policy to archive data to S3-IA or Glacier but recommend exploring SaaS alternatives, such as SharePoint or Google Drive

[contact]: mailto:[email protected]
If you would like to raise a feature request this can be done (here)[https://github.com/ministryofjustice/data-platform/issues/new/choose].
