Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

results fileter #5

Open
whiteorchid opened this issue Nov 20, 2020 · 5 comments
Open

results fileter #5

whiteorchid opened this issue Nov 20, 2020 · 5 comments

Comments

@whiteorchid
Copy link

Hi,

May I know how to filter the results, as some item has few overlaps(eg. < 100), is there a threshold(eg. >20 bp) for the result to be considered as a truly detected ERV.

Thanks so much for your guidance!

image

@JianGuoZhou3
Copy link

Hi @whiteorchid I have a problem with running ERV, could you help me to check this issue?
#7
Best,
Jian-Guo

@whiteorchid
Copy link
Author

Thanks for your message!

I used ERVmap by installing it in the traditional way, not by the Docker method. Maybe you can check if the bed files and the ref genome version are matched.

Best,

@JianGuoZhou3
Copy link

Hi @whiteorchid Thanks for your message. Could you please share your install code with me?
I just used conda, but some package is not available...

Best,

@whiteorchid
Copy link
Author

Sorry, currently I do not have access to the previous server.

I find the trim script kindly given by Professor Kong. For all other processes, I just follow the instructions on the Github page.



#!/usr/bin/perl -w

#
# a simple program for post-processing paired-end reads after btrim trimming
#
# Yong Kong
# Yale University 
#

use strict;
 
my $f1 = shift;  # summary file 1
my $f2 = shift;  # summary file 2
my $s1 = shift;  # trim output file 1
my $s2 = shift;  # trim output file 2

my $check = shift || 0;

die "usage: $0 <summary file 1> <summary file 2> <trim output file 1> <trim output file 2>" unless ($f1 && $f2 && $s1 && $s2);

my ($fh1, $fh2);
my ($sh1, $sh2);
my ($oh1, $oh2);
my ($ph1, $ph2);

open($fh1, "<$f1") || die "cannot open $f1";
open($fh2, "<$f2") || die "cannot open $f2";

open($sh1, "<$s1") || die "cannot open $s1";
open($sh2, "<$s2") || die "cannot open $s2";

my $o1 = "${s1}.pe";  # reads in both ends passed
my $o2 = "${s2}.pe";

my $p1 = "${s1}.se";  # read in trim file1 passed, but the pair in file2 failed
my $p2 = "${s2}.se";  # read in trim file2 passed, but the pair in file1 failed

open($oh1, ">$o1") || die "cannot open $o1";
open($oh2, ">$o2") || die "cannot open $o2";

open($ph1, ">$p1") || die "cannot open $p1";
open($ph2, ">$p2") || die "cannot open $p2";

while (my $l1 = <$fh1>, my $l2 = <$fh2>) {
    chomp($l1);
    my @t1 = split(/\t/, $l1);
    my $n1 = $t1[0];
    my $s1 = $t1[1];

    chomp($l2);
    my @t2 = split(/\t/, $l2);
    my $n2 = $t2[0];
    my $s2 = $t2[1];
	
    if ($check) {
	my ($n11, $n12);
	my ($n21, $n22);
	my $dummy;
	($n11, $dummy, $n12) = ($n1 =~ /(.*)(\s+|\/)(1|2)/);
	($n21, $dummy, $n22) = ($n2 =~ /(.*)(\s+|\/)(1|2)/);
	
	die ">$n11< ne >$n21<" if ($n11 ne $n21);
	die ">$n12< == 1" unless ($n12 == 1);
	die ">$n22< == 2" unless ($n22 == 2);
    }


    if ($s1 eq 'Pass' && $s2 eq 'Pass') {
	&find_and_print($sh1, $oh1, $n1);
	&find_and_print($sh2, $oh2, $n2);
    } elsif ($s1 eq 'Pass') {
	&find_and_print($sh1, $ph1, $n1);
    } elsif ($s2 eq 'Pass') {
	&find_and_print($sh2, $ph2, $n2);
    }
}

close($fh1);
close($fh2);
close($sh1);
close($sh2);
close($oh1);
close($oh2);
close($ph1);
close($ph2);


print "Done!\n";


sub find_and_print {
    my ($ih, $oh, $name) = @_;

    while (defined(my $sl = <$ih>)) {
	chomp($sl);
	my @t = split(/\t/, $sl);
	my $n = $t[0];

	if (substr($n, 1) eq $name) {
	    print $oh "$sl\n";
	    $sl = <$ih>;
	    print $oh "$sl";
	    $sl = <$ih>;
	    print $oh "$sl";
	    $sl = <$ih>;
	    print $oh "$sl";
	    last;
	}
    }

}

@JianGuoZhou3
Copy link

Hey @whiteorchid thanks for your reply.

This is my code.

dir=~/PRJNA82747 
index=~/t12a/reference/hg38_ek12/STAR_index_star_2.7.6a
erv_bed=/home/zhou/soft/ERVmap
mkdir -p ERV.map
cd ${dir}/ERV.map
mkdir -p results
cd ${dir}/ERV.map/results

    singularity exec \
    --bind ${dir}/raw:/data:ro \
    --bind ${index}:/genome:ro \
    --bind ${erv_bed}:/resources:ro \
    --bind ${dir}/ERV.map/results/${i}:/results \
    ~/singularity/ervmap.1.2.1.sif /scripts/ERVmapping.sh \
    --read1 ${dir}/raw/${i}_1.fastq.gz --read2 ${dir}/raw/${i}_2.fastq.gz \
    --output ${i} -m ALL \
    --cpus 12 --limit-ram 151290751290 -d

I just used GRCh38.primary_assembly.genome.fa and gencode.v34.annotation.gtf for STAR_index_star_2.7.6a.
Best,
Jian-Guo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants