Question about --read_proba_threshold in m6anet inference #177

Open
cxy-26 opened this issue Oct 30, 2024 · 5 comments


cxy-26 commented Oct 30, 2024

Hi,

I am working with Nanopore direct RNA-seq data generated with the RNA004 chemistry. I have a question about --read_proba_threshold in the m6anet inference step. I tried the default threshold (0.033379376), 0.05, and 0.5, but there is no difference between the resulting data.site_proba.csv files.

m6anet inference --input_dir ./ --out_dir ./ --pretrained_model HEK293T_RNA004 --n_processes 16 --num_iterations 1000
m6anet inference --input_dir ./ --out_dir ./threshold_0.05/ --pretrained_model HEK293T_RNA004 --n_processes 16 --num_iterations 1000 --read_proba_threshold 0.05
m6anet inference --input_dir ./ --out_dir ./threshold_0.5/ --pretrained_model HEK293T_RNA004 --n_processes 16 --num_iterations 1000 --read_proba_threshold 0.5

I expected a decrease of mod_ratio as the threshold increases since the mod_ratio column is calculated by thresholding the probability_modified from data.indiv_proba.csv based on the --read_proba_threshold parameter during m6anet inference call.
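
As a sanity check on that expectation, here is a minimal sketch that recomputes a per-site ratio directly from read-level probabilities at each of the three thresholds. It assumes data.indiv_proba.csv has transcript_id, transcript_position, read_index, and probability_modified columns, and that a read counts as modified when its probability is at or above the threshold. The function name, the >= comparison, and the demo rows are my own illustration, not m6anet's code, so the numbers may not match m6anet's output exactly.

```python
import csv
import io
from collections import defaultdict


def mod_ratio_by_site(rows, threshold):
    """Fraction of reads per site whose probability_modified >= threshold.

    `rows` is an iterable of dicts, e.g. from csv.DictReader.
    The >= comparison is an assumption, not taken from m6anet's source.
    """
    n_reads = defaultdict(int)
    n_modified = defaultdict(int)
    for row in rows:
        site = (row["transcript_id"], row["transcript_position"])
        n_reads[site] += 1
        if float(row["probability_modified"]) >= threshold:
            n_modified[site] += 1
    return {site: n_modified[site] / n_reads[site] for site in n_reads}


# Hypothetical demo rows standing in for data.indiv_proba.csv.
DEMO_CSV = """transcript_id,transcript_position,read_index,probability_modified
tx1,100,0,0.02
tx1,100,1,0.04
tx1,100,2,0.20
tx1,100,3,0.90
"""

for threshold in (0.033379376, 0.05, 0.5):
    rows = csv.DictReader(io.StringIO(DEMO_CSV))
    print(threshold, mod_ratio_by_site(rows, threshold))
```

To run it on real output, replace io.StringIO(DEMO_CSV) with an open handle on data.indiv_proba.csv. If the ratio recomputed this way changes with the threshold but the mod_ratio column in data.site_proba.csv does not, the flag is probably not reaching the inference step.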

Summary of mod_ratio with default threshold:
image

Summary of mod_ratio with threshold=0.05:
image

Summary of mod_ratio with threshold=0.5:
image

I am wondering whether --read_proba_threshold is actually taking effect,
or whether I have misunderstood how mod_ratio is calculated.

@yuukiiwa
Collaborator

Hi @cxy-26,

mod_ratio is the proportion of reads at a site with read-level probabilities over the --read_proba_threshold. In our experience, changing --read_proba_threshold changes mod_ratio.

Thanks!

Best wishes,
Yuk Kei

@baibhav-bioinfo

Hi @cxy-26,
I have a question, and it would help me a lot if you could clarify it.

I also have DRS reads sequenced with the RNA004 chemistry, for a plant species (~15 million reads/sample).
Since the new RNA004-based m6Anet inference model is trained only on a human cell line, can I still use the RNA002-based Arabidopsis model for m6A prediction in my case (plant species)?

I did use the plant model (RNA002), but the results show a very low number of m6A sites per sample (~3000).
I ran it with the default threshold (0.0032978046219796 for plant).

Author

cxy-26 commented Nov 25, 2024

Hi @baibhav-bioinfo,

I am sorry, but I can't answer your question since all my samples are from human cell lines. Have you tried using the RNA004 human cell line model for your plant sample, or training your own model?

Also, I guess you tagged (@) the wrong person?

Best

Author

cxy-26 commented Nov 25, 2024

> Hi @cxy-26,
>
> mod_ratio is the number for read with read-level probabilities over the --read_proba_threshold. From our experience, changing the --read_proba_threshold changes the mod_ratio.
>
> Thanks!
>
> Best wishes, Yuk Kei

Thanks for your explanation. I'll double-check my code and results.
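
For the double-check, one option is to compare the mod_ratio column between two runs site by site, rather than comparing summary statistics. A minimal sketch, assuming data.site_proba.csv has transcript_id, transcript_position, and mod_ratio columns among others; the helper names and the demo rows are hypothetical and only illustrate the comparison.

```python
import csv
import io


def load_mod_ratio(handle):
    """Map (transcript_id, transcript_position) -> mod_ratio from an open CSV handle."""
    return {
        (row["transcript_id"], row["transcript_position"]): float(row["mod_ratio"])
        for row in csv.DictReader(handle)
    }


def changed_sites(ratios_a, ratios_b, tol=1e-12):
    """Sites present in both runs whose mod_ratio differs by more than tol."""
    shared = ratios_a.keys() & ratios_b.keys()
    return sorted(s for s in shared if abs(ratios_a[s] - ratios_b[s]) > tol)


# Hypothetical rows standing in for two data.site_proba.csv outputs.
RUN_DEFAULT = """transcript_id,transcript_position,n_reads,probability_modified,kmer,mod_ratio
tx1,100,40,0.91,GGACT,0.75
tx1,250,25,0.12,AGACA,0.20
"""
RUN_0_5 = """transcript_id,transcript_position,n_reads,probability_modified,kmer,mod_ratio
tx1,100,40,0.91,GGACT,0.25
tx1,250,25,0.12,AGACA,0.20
"""

default_run = load_mod_ratio(io.StringIO(RUN_DEFAULT))
strict_run = load_mod_ratio(io.StringIO(RUN_0_5))
print(changed_sites(default_run, strict_run))
```

With real data, pass open handles on ./data.site_proba.csv and ./threshold_0.5/data.site_proba.csv instead of the io.StringIO objects. An empty list across all sites would suggest the two runs really did use the same threshold.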

@baibhav-bioinfo

Thanks for the response, and apologies for bothering you.
I did ask the developers, but they might have missed my issue or not fully understood it.
So I am asking you, since you also work with DRS data first-hand.

What would you suggest is best in my case: the RNA004 human model or the RNA002-based plant model?
Is there an option to train my own model? How does that work?
