-
Notifications
You must be signed in to change notification settings - Fork 1
/
Copy pathBARCODES_CUTANA
67 lines (61 loc) · 4.55 KB
/
BARCODES_CUTANA
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
TTCGCGCGTAACGACGTACCGT unmodified
CGCGATACGACCGCGTTACGCG unmodified
CGACGTTAACGCGTTTCGTACG h3k4me1
CGCGACTATCGCGCGTAACGCG h3k4me1
CCGTACGTCGTGTCGAACGACG h3k4me2
CGATACGCGTTGGTACGCGTAA h3k4me2
TAGTTCGCGACACCGTTCGTCG h3k4me3
TCGACGCGTAAACGGTACGTCG h3k4me3
TTATCGCGTCGCGACGGACGTA h3k9me1
CGATCGTACGATAGCGTACCGA h3k9me1
CGCATATCGCGTCGTACGACCG h3k9me2
ACGTTCGACCGCGGTCGTACGA h3k9me2
ACGATTCGACGATCGTCGACGA h3k9me3
CGATAGTCGCGTCGCACGATCG h3k9me3
CGCCGATTACGTGTCGCGCGTA h3k27me1
ATCGTACCGCGCGTATCGGTCG h3k27me1
CGTTCGAACGTTCGTCGACGAT h3k27me2
TCGCGATTACGATGTCGCGCGA h3k27me2
ACGCGAATCGTCGACGCGTATA h3k27me3
CGCGATATCACTCGACGCGATA h3k27me3
CGCGAAATTCGTATACGCGTCG h3k36me1
CGCGATCGGTATCGGTACGCGC h3k36me1
GTGATATCGCGTTAACGTCGCG h3k36me2
TATCGCGCGAAACGACCGTTCG h3k36me2
CCGCGCGTAATGCGCGACGTTA h3k36me3
CCGCGATACGACTCGTTCGTCG h3k36me3
GTCGCGAACTATCGTCGATTCG h3k20me1
CCGCGCGTATAGTCCGAGCGTA h3k20me1
CGATACGCCGATCGATCGTCGG h3k20me2
CCGCGCGATAAGACGCGTAACG h3k20me2
CGATTCGACGGTCGCGACCGTA h3k20me3
TTTCGACGCGTCGATTCGGCGA h3k20me3
##########################################
# Written by Dr. Bryan Venters, EpiCypher Inc.
# Updated 29 OCT 2021
#
# Purpose of script: Use "grep -c" to count exact match to CUTANA spike-in nucleosome barcodes from unzipped paired-end (R1 & R2) fastqs. Both R1 and R2 are searched because the barcoded side of nucleosome may be ligated in either direction relative to the P5 or P7 adapters.
#
# Caution: avoid spaces in file paths and filenames because it is a frequent source of syntax errors.
##########################################
## Instructions ##
# In Finder (Mac):
# 1. Duplicate this template shell script to create an experiment-specific copy. Save to a desired folder.
# 2. Unzip (extract) all fastq.gz to fastq on Mac or Linux by double-clicking on *.gz files. Save these files to the same folder containing shell script.
# In TextEdit:
# 3. Copy/Paste the lines between "# template loop begin ##" and "# template loop end ##" (below; from the first "echo" to the last "done") in this shell script as many times as needed (1 template loop per paired-end R1 & R2 data set). This is the script needed to align samples to the spike-in barcodes.
# 4. Add sample filenames (eg: H3K4me3_dNuc_R1_100k.fastq) in place of "sample1_R1.fastq" in copied/pasted loops. Each loop should contain R1 and R2 files for the matched reaction.
# In Terminal:
# 5. In Terminal, cd (change directory) to directory (folder) containing unzipped_fastq files (eg: cd path_to_fastq). This can also be done in a Mac by dragging your file onto Terminal.
# 6. To execute shell script in the Terminal application type "sh" followed by the file path to your saved shell script, or by dragging shell script file into Terminal: sh path_to_shell.sh
#7: Press enter. The shell script will output the barcode counts in the order listed under "# Barcode identities" (below) and datasets will be annotated based on the filenames. For each annotated loop (R1 and R2 sample set), it will generate all R1 counts and then all R2 counts.
# In Excel:
# Note: The Excel template provides space to copy in read count data for the IgG negative control, the H3K4me3 positive control, and 6 additional samples. Copy and make additional analyses as needed.
# 8. Copy and paste R1 and R2 barcode counts for control reactions from Terminal to yellow highlighted cells.
# 9. For reactions using a K-methyl antibody represented in the panel, select the on-target name from the dropdown list in column B. Then copy the R1 and R2 barcode read counts generated from running the script and paste into the appropriate highlighted yellow cells. Make sure the R1 and R2 files are matched (i.e. from the same reaction).
# 10. The Excel file will automatically generate a heatmap visualization of results, where read counts are normalized relative to the on-target PTM specified in column B. Condensed results, summarizing antibody binding data for each PTM in panel, is provided to right.
# 11. Enter the total "Unique align reads" from your sequencing reaction at the bottom of each table. The total barcode reads will auto-fill. The target for "% total barcode reads" is 1-10%. Lower than 1% and there may be insufficient reads to determine specificity. Greater than 10% indicates spike-ins should be diluted further in future experiments.
# 12. Go to the "Output Table" sheet in the Excel Workbook for a combined heat map showing antibody specificity data for all 8 samples.
# Notes:
# EpiCypher considers an antibody with <20% binding to all off-target PTMs specific and suitable for downstream data analysis.
# For IgG, data is normalized to the sum of total barcode reads.