Improve the documentation for tidybulk
#282
tidybulk
Hi, I'd like to contribute to this one, but I am not sure how it should be done or when the deadline is. I am a medical doctor, currently doing a PhD as a computational biologist in my group. I know transcriptomics analysis with Bioconductor and I like the tidy paradigm. Since this is labeled as a "good first issue", I can give it a try if there is enough guidance. What do you think?
Thanks @aliosmancetin! Nice to meet you.
There is a soft deadline to commit and start working, so we can form the team for the upcoming paper. But there is no hard deadline for completing the task; we trust the community on that side. Of course, no challenge should drag on too long, so that we keep the ball rolling.
Then, other more challenging tasks are
Let me know if you want more guidance. @HelenaLC might be a good code reviewer, as she is an expert on R packages and standards.
Nice to meet you too! Thanks for your comments and outline. It seems doable for me; I'll start ASAP.
I couldn't figure out how to assign myself. Should ".take" work or not?
Hello @aliosmancetin, just to do some planning on our side: do you already have a timeline for working on this challenge? A note: for the tidy adapters, please follow the
Hi @stemangiola, if you mean a very structured work plan with a "timeline", then unfortunately no 😅 But I've already started reading the must-read documents. I am willing to commit some work to this issue, but I am not sure how long it will take.
I assigned you prematurely. I have now emptied the assignee field, so you can assign yourself when you are ready, with no rush :)
Okay, let me focus on this for one more week. Then we can discuss my progress and whether it's going to work or not.
@GrootJ welcome to this issue! As this issue is multifaceted, we can spread the work across other devs (potentially @aliosmancetin). Please have a look at the description for the various goals. Sure, we can have a short Zoom now or any other time. I will be in NY until tomorrow.
Hi @stemangiola, I saw you assigned me again. I think I have figured out how tidybulk and tidySummarizedExperiment generally work, but I have a couple of questions and need some confirmation that I got this right. Then I could start improving the documentation. Would you prefer to do this under this issue or by email (the questions may sound easy and basic, I don't know)?
I was reorganising things and figured it was more informative to list potential contributors anyway :) Feel free to ask here, so other contributors will be able to learn as well.
@stemangiola, I got more set up and am about to start on this. I will follow the suggestions you provided to @aliosmancetin. Cheers, Joost
Thanks @GrootJ and @aliosmancetin for the enthusiasm! tidybulk really needed this. Welcome to the team and to the publication ;) We can divide the tasks so @GrootJ and @aliosmancetin can pick them and work independently. If possible, do this from the perspective of a new user, who should find everything intuitive and clear and should not perceive tidybulk as a black box (something that more experienced people often criticise). That way, tidybulk can be not only convenient but also an educational tool.
You can put your name next to 1, 2, 3, 4 so we know who is working on what.
Now this task is more doable for us. Welcome, @GrootJ :) @stemangiola, at this point my question is: should we focus only on "methods.R", or on "methods_SE.R" as well? Another question: you refer to "analysis methods". Are those in the "methods.R" and "methods_SE.R" files, or in the "functions.R" and "functions_SE.R" files? I also have a suggestion about documentation: we could include a kind of "developer guide" in the documentation. Experienced users/developers may easily understand how tidybulk works (classes, generics, methods, backend implementation, etc.); however, it took me some time to understand exactly how it works (after I saw there are
Good point, I edited the to-do list to specify which files they apply to.
I would not say so. Still, we have to give equal importance to "methods.R" and "methods_SE.R" (I have now specified this in the to-do list).
Amazing point. This is definitely a new/different "#tidyomics open challenge". I see this being developed as a blog post and a vignette serving as a guide. You, @aliosmancetin and @GrootJ, are great judges of the most confusing, least documented and least intuitive aspects of the code base, on both the front end and the back end. You could open a challenge/issue and start listing all the confusing stuff that you are figuring out with sweat and tears, and that can be the material for a vignette/blog post.
No one has tackled the developer side, but it is a good idea.
@aliosmancetin and @GrootJ, when you make your first commit, please add your authorship details here: https://docs.google.com/spreadsheets/d/19XqhN3xAMekCJ-esAolzoWT6fttruSEermjIsrOFcoo/edit?usp=sharing
Hello fellas, @chilampoon is on their way to resolve the conflicts of … Is either of you (@aliosmancetin and @GrootJ) actively working on this? I don't want to create duplicates.
Hi @stemangiola, I am not working on solving the conflicts.
Hi @stemangiola and others, I am not working on resolving those conflicts either (item 1 of your list above). I am making some headway in getting a better grip on the ongoing conversations you are having (around tidybulk package development and the goals of tidiness in omics in general). I test-ran two tidybulk workflows (README.Rmd with the se_mini dataset, and differential feature abundance from the manuscript with the pasilla dataset), which gave me a better idea of the functions and their documentation. I think I see the opportunities for documentation improvement you mentioned (item 3 of your list above). For example, the adjust_abundance function documentation lists inputs and outputs but does not mention the batch correction it seems to perform (using sva, which it seems to auto-install and load) between single-end and paired-end reads (the type variable in the pasilla dataset). Is this an example of the documentation you would like to have clarified? How to start writing such an improvement also depends on which new user's perspective it should be made "easily" understandable from: new users with basic familiarity with RNA-Seq and the tidyverse?
It is probably good to touch base on this (virtual chat or Zoom?) and on how we will go about it (version control, who takes which functions, etc.). I still need to get more familiar with roxygen too (not sure whether that would fit the timeline for your publication?)
Yes, correct
Very basic knowledge. But we should not rely on any jargon or abbreviations. Basically, the English dictionary should be enough to understand tidybulk and the underlying methods.
Agree
Sure. Where are you based? I could do it two hours from now.
Check. This is clearly an educational goal as well, which can synergize with the laudable further standardization/tidying you are working on. I guess there is a trade-off in the level of summarization of the documentation (and functions), since advanced users may want more technical detail (in functions, I guess configuration options could deal with that). But from the sound of it, the aim would be the many "basic knowledge" users who are still learning about RNA-Seq in general. Some initial ideas:
Perhaps some of that sounds obvious; I am just trying to organize my thoughts and get some initial feedback. Here are two initial ideas for datasets (again, open to feedback/other suggestions).
I am on the US east coast. Are you back in Melbourne again?
Great initiative!
We don't necessarily want to suggest the best way to do an analysis. In this first instance, we just want to give transparency about what is happening underneath.
Although we are still supporting tibble input, we are not recommending it anymore, as the
Probably we don't want to go that basic. Let's approach this task like an onion, one layer at a time. We can first tackle the high-return part, which exposes the backend code (just the really relevant part of the method called) in the … Maybe let's start from the
Thanks. Let's tackle this as a later task. Having said all that, I think in the future, after
One step at a time, we will climb the mountain ;)
@GrootJ I have added
Hi guys @stemangiola @GrootJ, sorry for my late response. I have been very busy recently because I am moving to a new apartment. By the way, I am in Germany. As far as I understand, the first step we should take is to improve … @stemangiola, you previously mentioned that the documentation is based on methods.R. Isn't methods_SE.R included at all? This still confuses me because, as you said before, although tidybulk is still supported, it is not recommended. Maybe the default documentation should be switched to methods_SE.R. Tidybulk-related differences or notes could be mentioned under
Yes, I should have been clearer. We have three elements in tidybulk.
The documentation refers to (1). (2) and (3) do not need their own documentation. They use the same underlying analysis methods, so the documentation for (1) will apply to (2) and (3).
Any news, team?
Will update soon. Any new requests?
From my side, unfortunately, I don't have any.