Commit

update BUSCO and fix --miniprot parameter (#6153)
* update BUSCO and fix --miniprot parameter

* error version

* update

* fix miniprot parameter

* update test-data

* update

* small modification on busco db

* small modifications

* fix output errors

* small modification

* small modification

* fix test

* add test-data/genome_results_miniprot

* small change in busco.xml

* update test-data

* add assert

* fix test 7
rlibouba authored Aug 30, 2024
1 parent 3d970de commit 21578f9
Showing 32 changed files with 1,452 additions and 180 deletions.
185 changes: 112 additions & 73 deletions tools/busco/busco.xml

Large diffs are not rendered by default.

2 changes: 1 addition & 1 deletion tools/busco/macros.xml
@@ -1,6 +1,6 @@
 <?xml version="1.0"?>
 <macros>
-<token name="@TOOL_VERSION@">5.5.0</token>
+<token name="@TOOL_VERSION@">5.7.1</token>
 <token name="@VERSION_SUFFIX@">0</token>

 <xml name="citations">
2 changes: 1 addition & 1 deletion tools/busco/test-data/busco_database.loc
@@ -3,4 +3,4 @@
 # - name
 # - version
 # - /path/to/data
-busco-demo-db-20230328 BUSCO-DEMO-DB-20230328 5.4.6 ${__HERE__}/test-db/busco_downloads
+busco-demo-db-20230328 BUSCO-DEMO-DB-20230328 5.4.6 ${__HERE__}/test-db/busco_downloads
213 changes: 213 additions & 0 deletions tools/busco/test-data/busco_downloads/file_versions.tsv

Large diffs are not rendered by default.

6 changes: 3 additions & 3 deletions tools/busco/test-data/genome_results/full_table
@@ -1,5 +1,5 @@
-# BUSCO version is: 5.5.0
-# The lineage dataset is: arthropoda_odb10 (Creation date: 2020-09-10, number of genomes: 90, number of BUSCOs: 1013)
+# BUSCO version is: 5.7.1
+# The lineage dataset is: arthropoda_odb10 (Creation date: 2024-01-08, number of genomes: 90, number of BUSCOs: 1013)
 # Busco id Status Sequence Gene Start Gene End Strand Score Length OrthoDB url Description
 774at6656 Missing
 980at6656 Missing
@@ -509,7 +509,7 @@
 93535at6656 Missing
 93797at6656 Missing
 94054at6656 Missing
-94238at6656 Complete sample:34764-38486 34764 38486 - 60.7 116 https://www.orthodb.org/v10?query=94238at6656 checkpoint protein HUS1
+94238at6656 Complete sample 38486 34764 - 60.7 116 https://v10-1.orthodb.org/?query=94238at6656 Checkpoint protein HUS1
 94263at6656 Missing
 94304at6656 Missing
 94473at6656 Missing
2 changes: 1 addition & 1 deletion tools/busco/test-data/genome_results/full_table_cached
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
# BUSCO version is: 5.5.0
# BUSCO version is: 5.7.1
# The lineage dataset is: archaea_odb10 (Creation date: 2021-02-23, number of genomes: 404, number of BUSCOs: 194)
# Busco id Status Sequence Gene Start Gene End Strand Score Length
860at2157 Missing
Expand Down
4 changes: 2 additions & 2 deletions tools/busco/test-data/genome_results/missing_buscos_list
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# BUSCO version is: 5.5.0
# The lineage dataset is: arthropoda_odb10 (Creation date: 2020-09-10, number of genomes: 90, number of BUSCOs: 1013)
# BUSCO version is: 5.7.1
# The lineage dataset is: arthropoda_odb10 (Creation date: 2024-01-08, number of genomes: 90, number of BUSCOs: 1013)
# Busco id
100070at6656
100136at6656
Expand Down
39 changes: 20 additions & 19 deletions tools/busco/test-data/genome_results/short_summary
Original file line number Diff line number Diff line change
@@ -1,35 +1,36 @@
# BUSCO version is: 5\.5\.0
# The lineage dataset is: arthropoda_odb10 \(Creation date: [0-9]{4}-[0-9]{2}-[0-9]{2}, number of genomes: 90, number of BUSCOs: 1013\)
# Summarized benchmarking in BUSCO notation for file [a-z0-9_\-/\.]+
# BUSCO version is: 5.7.1
# The lineage dataset is: arthropoda_odb10 (Creation date: 2024-01-08, number of genomes: 90, number of BUSCOs: 1013)
# Summarized benchmarking in BUSCO notation for file /tmp/tmpl5l1blpe/files/7/a/3/dataset_7a33f452-1064-4b4a-943f-b0efef6a4a4a.dat
# BUSCO was run in mode: euk_genome_aug
# Gene predictor used: augustus

\*\*\*\*\* Results: \*\*\*\*\*
***** Results: *****

C:0\.1%\[S:0\.1%,D:0\.0%\],F:0\.0%,M:99\.9%,n:1013
1 Complete BUSCOs \(C\)
1 Complete and single-copy BUSCOs \(S\)
0 Complete and duplicated BUSCOs \(D\)
0 Fragmented BUSCOs \(F\)
1012 Missing BUSCOs \(M\)
C:0.1%[S:0.1%,D:0.0%],F:0.0%,M:99.9%,n:1013
1 Complete BUSCOs (C)
1 Complete and single-copy BUSCOs (S)
0 Complete and duplicated BUSCOs (D)
0 Fragmented BUSCOs (F)
1012 Missing BUSCOs (M)
1013 Total BUSCO groups searched

Assembly Statistics:
1 Number of scaffolds
1 Number of contigs
62370 Total length
0\.000% Percent gaps
0.000% Percent gaps
62 KB Scaffold N50
62 KB Contigs N50


Dependencies and versions:
hmmsearch: [0-9\.\+]+
bbtools: [0-9\.\+]+
makeblastdb: [0-9\.\+]+
tblastn: [0-9\.\+]+
augustus: [0-9\.\+]+
gff2gbSmallDNA\.pl: None
new_species\.pl: None
hmmsearch: 3.1
bbtools: 39.06
makeblastdb: 2.15.0+
tblastn: 2.15.0+
augustus: 3.5.0
gff2gbSmallDNA.pl: None
new_species.pl: None
etraining: None
busco: [0-9\.\+]+
python: sys.version_info(major=3, minor=9, micro=19, releaselevel='final', serial=0)
busco: 5.7.1
9 changes: 5 additions & 4 deletions tools/busco/test-data/genome_results/short_summary_cached
Original file line number Diff line number Diff line change
@@ -1,7 +1,7 @@
# BUSCO version is: 5.5.0
# BUSCO version is: 5.7.1
# The lineage dataset is: archaea_odb10 (Creation date: 2021-02-23, number of genomes: 404, number of BUSCOs: 194)
# Summarized benchmarking in BUSCO notation for file /tmp/tmpbmq1q2c6/files/3/3/0/dataset_330f49ed-7a6b-4380-870f-22494bb4b257.dat
# BUSCO was run in mode: prok_genome
# Summarized benchmarking in BUSCO notation for file /tmp/tmprlx1s52v/files/e/8/d/dataset_e8d136af-46c1-4ffa-abaf-e8899526d2b0.dat
# BUSCO was run in mode: prok_genome_prod
# Gene predictor used: prodigal

***** Results: *****
Expand All @@ -27,4 +27,5 @@ Dependencies and versions:
hmmsearch: 3.1
bbtools: 39.01
prodigal: 2.6.3
busco: 5.5.0
python: sys.version_info(major=3, minor=9, micro=19, releaselevel='final', serial=0)
busco: 5.7.1
8 changes: 4 additions & 4 deletions tools/busco/test-data/genome_results_metaeuk/full_table
Original file line number Diff line number Diff line change
@@ -1,5 +1,5 @@
# BUSCO version is: 5.5.0
# The lineage dataset is: arthropoda_odb10 (Creation date: 2020-09-10, number of genomes: 90, number of BUSCOs: 1013)
# BUSCO version is: 5.7.1
# The lineage dataset is: arthropoda_odb10 (Creation date: 2024-01-08, number of genomes: 90, number of BUSCOs: 1013)
# Busco id Status Sequence Gene Start Gene End Strand Score Length OrthoDB url Description
774at6656 Missing
980at6656 Missing
Expand Down Expand Up @@ -340,7 +340,7 @@
68939at6656 Missing
68961at6656 Missing
68981at6656 Missing
68987at6656 Complete sample:40255-42070 40255 42070 + 122.8 266 https://www.orthodb.org/v10?query=68987at6656 mannose-1-phosphate guanyltransferase alpha
68987at6656 Complete sample 40255 42070 + 122.8 266 https://v10-1.orthodb.org/?query=68987at6656 Nucleotidyl transferase domain
69201at6656 Missing
69238at6656 Missing
69284at6656 Missing
Expand Down Expand Up @@ -509,7 +509,7 @@
93535at6656 Missing
93797at6656 Missing
94054at6656 Missing
94238at6656 Complete sample:35678-34845 34845 35678 - 60.7 116 https://www.orthodb.org/v10?query=94238at6656 checkpoint protein HUS1
94238at6656 Complete sample 35693 34845 - 60.6 116 https://v10-1.orthodb.org/?query=94238at6656 Checkpoint protein HUS1
94263at6656 Missing
94304at6656 Missing
94473at6656 Missing
Expand Down
@@ -1,5 +1,5 @@
-# BUSCO version is: 5.5.0
-# The lineage dataset is: arthropoda_odb10 (Creation date: 2020-09-10, number of genomes: 90, number of BUSCOs: 1013)
+# BUSCO version is: 5.7.1
+# The lineage dataset is: arthropoda_odb10 (Creation date: 2024-01-08, number of genomes: 90, number of BUSCOs: 1013)
 # Busco id
 100070at6656
 100136at6656
3 changes: 3 additions & 0 deletions tools/busco/test-data/genome_results_metaeuk/out.gff3
Original file line number Diff line number Diff line change
@@ -0,0 +1,3 @@
##gff-version 3
sample MetaEuk gene 40256 42071 198 + . Target_ID=68987at6656_29053_0:00071c;TCS_ID=68987at6656_29053_0:00071c|sample|+|40255
sample MetaEuk gene 34846 35694 527 - . Target_ID=94238at6656_7245_0:00200b;TCS_ID=94238at6656_7245_0:00200b|sample|-|34845
36 changes: 19 additions & 17 deletions tools/busco/test-data/genome_results_metaeuk/short_summary
Original file line number Diff line number Diff line change
@@ -1,30 +1,32 @@
# BUSCO version is: 5\.5\.0
# The lineage dataset is: arthropoda_odb10 \(Creation date: [0-9]{4}-[0-9]{2}-[0-9]{2}, number of genomes: 90, number of BUSCOs: 1013\)
# Summarized benchmarking in BUSCO notation for file [a-z0-9_\-/\.]+
# BUSCO was run in mode: euk_genome_met
# Gene predictor used: metaeuk
# BUSCO version is: 5.7.1
# The lineage dataset is: arthropoda_odb10 (Creation date: 2024-01-08, number of genomes: 90, number of BUSCOs: 1013)
# Summarized benchmarking in BUSCO notation for file /tmp/tmpl5l1blpe/files/f/3/1/dataset_f31d44e3-c824-4cdf-92ba-99a2c26071d2.dat
# BUSCO was run in mode: euk_genome_min
# Gene predictor used: miniprot

\*\*\*\*\* Results: \*\*\*\*\*
***** Results: *****

C:0\.2%\[S:0\.2%,D:0\.0%\],F:0\.0%,M:99\.8%,n:1013
2 Complete BUSCOs \(C\)
2 Complete and single-copy BUSCOs \(S\)
0 Complete and duplicated BUSCOs \(D\)
0 Fragmented BUSCOs \(F\)
1011 Missing BUSCOs \(M\)
C:0.1%[S:0.1%,D:0.0%],F:0.0%,M:99.9%,n:1013
1 Complete BUSCOs (C)
1 Complete and single-copy BUSCOs (S)
0 Complete and duplicated BUSCOs (D)
0 Fragmented BUSCOs (F)
1012 Missing BUSCOs (M)
1013 Total BUSCO groups searched

Assembly Statistics:
1 Number of scaffolds
1 Number of contigs
62370 Total length
0\.000% Percent gaps
0.000% Percent gaps
62 KB Scaffold N50
62 KB Contigs N50


Dependencies and versions:
hmmsearch: [0-9\.\+]+
bbtools: [0-9\.\+]+
metaeuk: [0-9a-z\.\+]+
busco: [0-9\.\+]+
hmmsearch: 3.1
bbtools: 39.06
miniprot_index: 0.13-r248
miniprot_align: 0.13-r248
python: sys.version_info(major=3, minor=9, micro=19, releaselevel='final', serial=0)
busco: 5.7.1
Binary file modified tools/busco/test-data/genome_results_metaeuk/summary.png
8 changes: 4 additions & 4 deletions tools/busco/test-data/genome_results_metaeuk_auto/full_table
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# BUSCO version is: 5.5.0
# The lineage dataset is: eukaryota_odb10 (Creation date: 2020-09-10, number of genomes: 70, number of BUSCOs: 255)
# Busco id Status Sequence Gene Start Gene End Strand Score Length
# BUSCO version is: 5.7.1
# The lineage dataset is: eukaryota_odb10 (Creation date: 2024-01-08, number of genomes: 70, number of BUSCOs: 255)
# Busco id Status Sequence Gene Start Gene End Strand Score Length OrthoDB url Description
39650at2759 Missing
83779at2759 Missing
87842at2759 Missing
Expand Down Expand Up @@ -131,7 +131,7 @@
1041560at2759 Missing
1049599at2759 Missing
1051021at2759 Missing
1053181at2759 Complete sample:35678-34845 34845 35678 - 45.0 149
1053181at2759 Complete sample 35693 34845 - 44.7 149 https://v10-1.orthodb.org/?query=1053181at2759 Checkpoint protein Hus1/Mec3
1057950at2759 Missing
1065019at2759 Missing
1076134at2759 Missing
Expand Down
@@ -1,5 +1,5 @@
-# BUSCO version is: 5.5.0
-# The lineage dataset is: eukaryota_odb10 (Creation date: 2020-09-10, number of genomes: 70, number of BUSCOs: 255)
+# BUSCO version is: 5.7.1
+# The lineage dataset is: eukaryota_odb10 (Creation date: 2024-01-08, number of genomes: 70, number of BUSCOs: 255)
 # Busco id
 1001705at2759
 1003258at2759
3 changes: 0 additions & 3 deletions tools/busco/test-data/genome_results_metaeuk_auto/out.gff
Original file line number Diff line number Diff line change
@@ -1,5 +1,2 @@
##gff-version 3
sample MetaEuk gene 34846 35694 527 - . Target_ID=1053181at2759_7245_0:00200b;TCS_ID=1053181at2759_7245_0:00200b|sample|-|34845
sample MetaEuk mRNA 34846 35694 527 - . Target_ID=1053181at2759_7245_0:00200b;TCS_ID=1053181at2759_7245_0:00200b|sample|-|34845_mRNA;Parent=1053181at2759_7245_0:00200b|sample|-|34845
sample MetaEuk exon 34846 35694 527 - . Target_ID=1053181at2759_7245_0:00200b;TCS_ID=1053181at2759_7245_0:00200b|sample|-|34845_exon_0;Parent=1053181at2759_7245_0:00200b|sample|-|34845_mRNA
sample MetaEuk CDS 34846 35694 527 - . Target_ID=1053181at2759_7245_0:00200b;TCS_ID=1053181at2759_7245_0:00200b|sample|-|34845_CDS_0;Parent=1053181at2759_7245_0:00200b|sample|-|34845_exon_0
13 changes: 7 additions & 6 deletions tools/busco/test-data/genome_results_metaeuk_auto/short_summary
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# BUSCO version is: 5.5.0
# The lineage dataset is: eukaryota_odb10 (Creation date: 2020-09-10, number of genomes: 70, number of BUSCOs: 255)
# Summarized benchmarking in BUSCO notation for file /tmp/tmp5_syjrgy/files/a/5/a/dataset_a5adf25d-f667-41d6-9472-f8661403e128.dat
# BUSCO version is: 5.7.1
# The lineage dataset is: eukaryota_odb10 (Creation date: 2024-01-08, number of genomes: 70, number of BUSCOs: 255)
# Summarized benchmarking in BUSCO notation for file /tmp/tmpg41og118/files/f/5/e/dataset_f5e13834-6f57-4cb4-af82-b730a3e03fdb.dat
# BUSCO was run in mode: euk_genome_met
# Gene predictor used: metaeuk

Expand All @@ -25,7 +25,8 @@ Assembly Statistics:

Dependencies and versions:
hmmsearch: 3.1
bbtools: 39.01
bbtools: 39.06
prodigal: 2.6.3
busco: 5.5.0
metaeuk: 6.a5d39d9
python: sys.version_info(major=3, minor=9, micro=19, releaselevel='final', serial=0)
busco: 5.7.1
metaeuk: 7.bba0d80
Binary file modified tools/busco/test-data/genome_results_metaeuk_auto/summary.png