results fileter #5

whiteorchid · 2020-11-20T16:36:48Z

Hi,

May I know how to filter the results, as some item has few overlaps(eg. < 100), is there a threshold(eg. >20 bp) for the result to be considered as a truly detected ERV.

Thanks so much for your guidance!

JianGuoZhou3 · 2022-02-12T16:21:27Z

Hi @whiteorchid I have a problem with running ERV, could you help me to check this issue?
#7
Best,
Jian-Guo

whiteorchid · 2022-02-15T23:11:56Z

Thanks for your message!

I used ERVmap by installing it in the traditional way, not by the Docker method. Maybe you can check if the bed files and the ref genome version are matched.

Best,

JianGuoZhou3 · 2022-02-15T23:22:01Z

Hi @whiteorchid Thanks for your message. Could you please share your install code with me?
I just used conda, but some package is not available...

Best,

whiteorchid · 2022-02-16T12:00:01Z

Sorry, currently I do not have access to the previous server.

I find the trim script kindly given by Professor Kong. For all other processes, I just follow the instructions on the Github page.



#!/usr/bin/perl -w

#
# a simple program for post-processing paired-end reads after btrim trimming
#
# Yong Kong
# Yale University 
#

use strict;
 
my $f1 = shift;  # summary file 1
my $f2 = shift;  # summary file 2
my $s1 = shift;  # trim output file 1
my $s2 = shift;  # trim output file 2

my $check = shift || 0;

die "usage: $0 <summary file 1> <summary file 2> <trim output file 1> <trim output file 2>" unless ($f1 && $f2 && $s1 && $s2);

my ($fh1, $fh2);
my ($sh1, $sh2);
my ($oh1, $oh2);
my ($ph1, $ph2);

open($fh1, "<$f1") || die "cannot open $f1";
open($fh2, "<$f2") || die "cannot open $f2";

open($sh1, "<$s1") || die "cannot open $s1";
open($sh2, "<$s2") || die "cannot open $s2";

my $o1 = "${s1}.pe";  # reads in both ends passed
my $o2 = "${s2}.pe";

my $p1 = "${s1}.se";  # read in trim file1 passed, but the pair in file2 failed
my $p2 = "${s2}.se";  # read in trim file2 passed, but the pair in file1 failed

open($oh1, ">$o1") || die "cannot open $o1";
open($oh2, ">$o2") || die "cannot open $o2";

open($ph1, ">$p1") || die "cannot open $p1";
open($ph2, ">$p2") || die "cannot open $p2";

while (my $l1 = <$fh1>, my $l2 = <$fh2>) {
    chomp($l1);
    my @t1 = split(/\t/, $l1);
    my $n1 = $t1[0];
    my $s1 = $t1[1];

    chomp($l2);
    my @t2 = split(/\t/, $l2);
    my $n2 = $t2[0];
    my $s2 = $t2[1];
	
    if ($check) {
	my ($n11, $n12);
	my ($n21, $n22);
	my $dummy;
	($n11, $dummy, $n12) = ($n1 =~ /(.*)(\s+|\/)(1|2)/);
	($n21, $dummy, $n22) = ($n2 =~ /(.*)(\s+|\/)(1|2)/);
	
	die ">$n11< ne >$n21<" if ($n11 ne $n21);
	die ">$n12< == 1" unless ($n12 == 1);
	die ">$n22< == 2" unless ($n22 == 2);
    }


    if ($s1 eq 'Pass' && $s2 eq 'Pass') {
	&find_and_print($sh1, $oh1, $n1);
	&find_and_print($sh2, $oh2, $n2);
    } elsif ($s1 eq 'Pass') {
	&find_and_print($sh1, $ph1, $n1);
    } elsif ($s2 eq 'Pass') {
	&find_and_print($sh2, $ph2, $n2);
    }
}

close($fh1);
close($fh2);
close($sh1);
close($sh2);
close($oh1);
close($oh2);
close($ph1);
close($ph2);


print "Done!\n";


sub find_and_print {
    my ($ih, $oh, $name) = @_;

    while (defined(my $sl = <$ih>)) {
	chomp($sl);
	my @t = split(/\t/, $sl);
	my $n = $t[0];

	if (substr($n, 1) eq $name) {
	    print $oh "$sl\n";
	    $sl = <$ih>;
	    print $oh "$sl";
	    $sl = <$ih>;
	    print $oh "$sl";
	    $sl = <$ih>;
	    print $oh "$sl";
	    last;
	}
    }

}

JianGuoZhou3 · 2022-02-16T12:57:02Z

Hey @whiteorchid thanks for your reply.

This is my code.

dir=~/PRJNA82747 
index=~/t12a/reference/hg38_ek12/STAR_index_star_2.7.6a
erv_bed=/home/zhou/soft/ERVmap
mkdir -p ERV.map
cd ${dir}/ERV.map
mkdir -p results
cd ${dir}/ERV.map/results

    singularity exec \
    --bind ${dir}/raw:/data:ro \
    --bind ${index}:/genome:ro \
    --bind ${erv_bed}:/resources:ro \
    --bind ${dir}/ERV.map/results/${i}:/results \
    ~/singularity/ervmap.1.2.1.sif /scripts/ERVmapping.sh \
    --read1 ${dir}/raw/${i}_1.fastq.gz --read2 ${dir}/raw/${i}_2.fastq.gz \
    --output ${i} -m ALL \
    --cpus 12 --limit-ram 151290751290 -d

I just used GRCh38.primary_assembly.genome.fa and gencode.v34.annotation.gtf for STAR_index_star_2.7.6a.
Best,
Jian-Guo

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

results fileter #5

results fileter #5

whiteorchid commented Nov 20, 2020

JianGuoZhou3 commented Feb 12, 2022

whiteorchid commented Feb 15, 2022

JianGuoZhou3 commented Feb 15, 2022

whiteorchid commented Feb 16, 2022

JianGuoZhou3 commented Feb 16, 2022

results fileter #5

results fileter #5

Comments

whiteorchid commented Nov 20, 2020

JianGuoZhou3 commented Feb 12, 2022

whiteorchid commented Feb 15, 2022

JianGuoZhou3 commented Feb 15, 2022

whiteorchid commented Feb 16, 2022

JianGuoZhou3 commented Feb 16, 2022