GitHub - bastienllamas/BIOINF-3010-7150


BIOINF3010/7150: Genomics Applications


Semester 1 2021 - Provisional Timetable

Lecture and Practical Timetable

Week	Monday	Lecture	Practical
1	1/3	Introduction to sequencing; Molecular basis of polymer extension sequencing (Sanger); High throughput sequencing: Illumina; Sequence quality (Dave)	Introduction to Bash 1 (Dave) Introduction to Bash 2 (Dave)
2	8/3	Resequencing (Exome/WGS) (Dave)	Read Quality Control (Dave) SAMTools and alignments (Joe/Dave)
3	15/3	Short read assembly; approaches and issues (Dave)	SARS-CoV-2 Resequencing (Dave) SARS-CoV-2 Short Read Assembly (Dave)
4	22/3	Single molecule sequencing PacBio/Nanopore Uses/Characteristics/Error profiles (Dave)	Short and long read alignment (Dave) E. coli K-12 Hybrid Genome Assembly (Dave)
5	29/3	Friday public holiday, no lecture	[Tuesday open practical session] (Dave) [Friday public holiday no practical]
6	5/4	De novo assembly genome size estimate (k-mers), coverage. (Lloyd)	[Assembly practical pt 1] (Lloyd) [Assembly practical pt 2] (Lloyd)
-			MID-SEMESTER BREAK
7	26/4	Genome Graphs (Yassine)	HiC analysis (Callum) Genome graphs1 (Yassine)
8	3/5	Annotation - Gene finding, Repeat identification/classification/masking, comparative genomics (Dave)	Genome graphs2 (Yassine) Intro to BLAST (Dave)
9	10/5	Variant calling and high-throughput genotyping (Julien)	BLAST practical (Dave) Clinical genomics1 (Julien)
10	17/5	High-throughput genotyping technologies and applications (Rick)	Clinical genomics2 (Julien) Agricultural genomics (Rick)
11	24/5	Population genomics (Bastien)	Open Practical Session Tuesday (Dave) Population genetics1 (Bastien)
12	31/5	Wrap up lecture (Dave)	Population genetics2 (Bastien) [Open Prac session - Friday] (Dave)
13	7/6	TBD	TBD

Assessment

Assessment Tasks

Assessment	Subject
Assessment 0	Bash
Assessment 1	Genome sequencing
Assessment 2	CANU genome assembly
Assessment 3	Genome Annotation
Assessment 4	Ancient DNA (No Link)
Project (PG only)	Complete Dataset

Major Project (Post Grad only)

In this course, the following next-generation sequencing (NGS) datasets/protocols will be examined in detail:

Whole genome sequencing/Resequencing
SNV variant/structural variation analysis
Enrichment/Capture sequencing (Methyl-capture, ChIPseq, RIPseq)
Metagenomics/Microbial profiling

Each of these NGS approaches uses similar programs and analysis approaches, such as quality control (quality and sequencing adapter trimming), genome alignment, and downstream visualisation and statistical methods. For the project, you will take a published (or otherwise obtained) dataset and complete all the analysis tasks (from raw data to final results) during the course. You have the freedom to choose any dataset you would like. You will prepare a final report that will be due at the end of the semester. The report should be prepared using RStudio as an Rmd document including all code needed to perform the analysis, and will include the standard components of a scientific report:

Introduction (background on the study and identification of the research hypothesis)
Methods (analysis steps and programs used)
Results (what you found); and
Discussion (how the results relate to the research hypothesis and the published literature).

The Rmd document and a compiled, knitted HTML file will form the submission; marks will be awarded for the code and Rmd that you use.
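As an illustration of the quality-trimming step named among the shared analysis tasks above, here is a minimal Python sketch. It assumes Phred+33 encoded quality strings (the usual encoding for Illumina FASTQ files) and trims only low-quality bases from the 3' end of a read. The function names and the threshold of 20 are hypothetical choices for this example; the trimming tools used in the practicals apply more sophisticated algorithms and also handle adapters.

```python
def phred_scores(quality, offset=33):
    """Convert a FASTQ quality string into a list of Phred scores.

    Assumes Phred+33 encoding unless another offset is given.
    """
    return [ord(char) - offset for char in quality]


def trim_3prime(seq, quality, min_q=20, offset=33):
    """Remove bases from the 3' end while their quality is below min_q.

    This is a deliberately simple rule for illustration only.
    Returns the trimmed sequence and the matching quality string.
    """
    if len(seq) != len(quality):
        raise ValueError("sequence and quality must have the same length")
    scores = phred_scores(quality, offset)
    end = len(seq)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return seq[:end], quality[:end]


# Made-up read: 'I' encodes Q40 and '#' encodes Q2 under Phred+33.
trimmed_seq, trimmed_qual = trim_3prime("ACGTACGT", "IIIIII##")
print(trimmed_seq, trimmed_qual)
```

In the project report itself the equivalent step would normally be run with an established trimming tool and documented in the Methods section.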

Section	Mark
Abstract	5%
Introduction and hypothesis	10%
Methods	20%
Results and Discussion	30%
References	5%
Analysis scripts	30%

Major Project Data

For the project I was able to download a number of publicly available datasets from the Encyclopedia of DNA Elements (ENCODE) project, a large multi-national study that wrapped up a while ago. The purpose of the study was to identify any "functional" regions of the genome that may not be protein-coding, so the project generated a large amount of RNA-seq, ChIP-seq (transcription factor binding), DNA methylation sequencing and array data.

Sample Information

GM12878 is a human lymphoblastoid cell line (B lymphocytes immortalised with Epstein-Barr virus), taken in 1985 from a member of a large family from Utah with Northern and Western European ancestry. These cell lines are widely used in genomics as reference samples for large projects and are easy to obtain and use in a research setting.

RNA-seq

In the data directory you will find a range of RNA-seq and ChIP-seq data from the human cell line GM12878. The ENCODE datasets were produced back in 2012 by a number of labs in the US. They include RNA-seq from four different RNA fractions:

Long PolyA+ enriched RNA from Whole-cells
Long RNA from Whole-cells without PolyA enrichment
Short total RNA
Long total RNA

Short vs long refers to the size selection of the RNA before making the library. Short is generally less than 100 bp and long is more than 100 bp.

All of the library protocols are available already so you can have a look at the specifics.

For differential expression, there are six samples from the paper "Cis-Regulatory Circuits Regulating NEK6 Kinase Overexpression in Transformed B Cells Are Super-Enhancer-Independent" by Huang et al. 2017. These are the same GM12878 cells as above, with one group of three clones from normal cells and another group of three clones carrying a deleted region.
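To make the two-group design concrete, the following Python sketch computes a log2 fold change for a single gene between three control and three deletion clones. The counts are invented for illustration and are not taken from the Huang et al. data, and the function name and pseudocount are hypothetical. A real differential expression analysis also needs library-size normalisation and a statistical model, as provided by dedicated packages such as edgeR or DESeq2.

```python
import math


def log2_fold_change(control, treated, pseudocount=1.0):
    """Log2 ratio of mean expression (treated over control) for one gene.

    A pseudocount is added to both means so that a gene with zero
    counts in one group does not cause a division by zero.
    """
    if not control or not treated:
        raise ValueError("both groups need at least one sample")
    mean_control = sum(control) / len(control)
    mean_treated = sum(treated) / len(treated)
    return math.log2((mean_treated + pseudocount) / (mean_control + pseudocount))


# Invented counts for one gene: three normal clones, three deletion clones.
normal_clones = [7, 7, 7]
deletion_clones = [31, 31, 31]
print(log2_fold_change(normal_clones, deletion_clones))
```

A positive value indicates higher expression in the deletion clones, and a negative value indicates lower expression.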

ChIP-seq

If you would like to do something slightly different, I have also included two ChIP-seq datasets that enrich for CTCF transcription factor binding sites. CTCF is an important TF for the structural organisation of the chromosome and is used a lot in 3D chromosome structure analyses (3C/4C/5C/HiC-seq).

Each replicate was also generated from GM12878 cells.

All the data is available [here](https://universityofadelaide.box.com/v/mscProjectData)

Note 1: The data in this directory totals approximately 100GB, so you cannot download it all in one go. I would suggest choosing the specific libraries you would like to work on and downloading those separately onto your VM so you don't fill up the VM's allocated space.
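One way to plan the download is to add up the sizes of the libraries you want and compare the total with the free space on your VM. The Python sketch below does this with the standard library. The file sizes and the 10 GB safety reserve are made-up values, and the function names are hypothetical; read the actual sizes from the data directory listing.

```python
import shutil


def free_space_gb(path="."):
    """Free disk space, in gigabytes, on the filesystem containing path."""
    return shutil.disk_usage(path).free / 1e9


def fits_on_disk(file_sizes_gb, free_gb, reserve_gb=10.0):
    """Check whether the chosen files fit while leaving a reserve.

    The reserve leaves room for alignments and other intermediate
    files; 10 GB is an arbitrary value chosen for this example.
    """
    return sum(file_sizes_gb) + reserve_gb <= free_gb


# Made-up sizes (in GB) for two libraries you might choose.
chosen_libraries_gb = [12.0, 8.5]
available_gb = free_space_gb(".")
print(f"Free space: {available_gb:.1f} GB")
print("Selection fits:", fits_on_disk(chosen_libraries_gb, available_gb))
```

Intermediate files such as alignments can be larger than the raw reads, so you may want a bigger reserve than the one used here.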

Note 2: Some of the data is from 2012-2014, so some of the sequencing technology is quite old!

Good luck!

Assessment Checklist

Have you:

Answered all the questions?
Followed naming conventions for Assessments?
Checked that you have not breached the Academic Honesty Policy?
Identified the work as yours?
- Emails should have the course and assessment task names.
- Documents should be named with your name, the course name and the assessment task.
- Printed documents should have your name, the course name and the assessment task in the text/footer/header.
Used appropriate electronic communication with assessors?
- Emails should have a meaningful subject.
Handed in the assignment before the due time (see MyUni)?

Useful Links

How To Ask Questions The Smart Way

How to write a good bug report

This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.
