-
Notifications
You must be signed in to change notification settings - Fork 35
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix dynamic <<>> runindex #213
Fix dynamic <<>> runindex #213
Conversation
I agree that this is probably less complicated and a good idea
I need to think about this a bit more, get a better understanding of the problem you are solving with the tsv file |
Thank you very much for looking into this. Bids mappings are used here as a way of storing run indexes so we know at the end of bidscoiner plugin run which runs use A TSV file containing BIDS mappings can be highly beneficial in validating the results produced by BIDScoiner. This file enables tracking of which original data inputs lead to specific final outputs (including runs, dcm2niix postfixes...). |
Could you perhaps split off the tsv part? I also think it's probably better to fix the run-index at the end, after all the files are there. |
Let's deal with the tsv option later, and first fix the more important run-indexing |
bidscoin/bidscoiner.py
Outdated
@@ -56,6 +56,12 @@ def bidscoiner(rawfolder: str, bidsfolder: str, subjects: list=(), force: bool=F | |||
# Create a code/bidscoin subfolder | |||
(bidsfolder/'code'/'bidscoin').mkdir(parents=True, exist_ok=True) | |||
|
|||
# Delete bids_mappings file if it exists | |||
bids_mappings_file = bidsfolder / 'code' / 'bidscoin' / 'bids_mappings.tsv' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to go wrong when you run bidscoiner multiple times (e.g. run it every time a new subject is acquired)
bidscoin/bids.py
Outdated
@@ -313,6 +313,32 @@ def dynamicvalue(self, value, cleanup: bool=True, runtime: bool=False): | |||
return value | |||
|
|||
|
|||
class BidsMapping: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wonder if a new class is needed, or whether this could simply be part of DataSource (that contains almost the same info, except run and targets)
bidscoin/plugins/dcm2niix2bids.py
Outdated
@@ -226,6 +227,7 @@ def bidscoiner_plugin(session: Path, bidsmap: dict, bidsses: Path) -> Union[None | |||
scans_table.index.name = 'filename' | |||
|
|||
# Process all the source files or run subfolders | |||
bids_mappings: List[BidsMapping] = [] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I see what you are doing but I generally try to keep the plugins as short and simple as possible (to encourage external contributors to write plugiuns), and move stuff over as much as possible to bidscoiner. Building a list of mappings does add to the complexity...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand :). BidsMappings can be deleted from the code.
Commits "Make increment_runindex to not retroactively add run1, delete run from bidsname if not needed with dynamic index" and the principle of method bids.rename_runless_to_run1 introduced in commit "Add rename_runles_to_run1" can be used.
In order to dynamically add run-1 to run-less files at the end, we could:
- save information which files were generated with <<>> setting, and for those run add_rename_runless_to_run1
Each plugin would need to run at the end of conversion bids.rename_runless_to_run1, which is still easier than BidsMappings. - Run rename_runless_to_run1 in bidscoiner after plugin. Files that need run-1 could be detected via the existence of run-2 and no run-1. Hmm, that defeats the purpose of fixed run-index. Then maybe we could get <<>> information from bidsmap, but we do not extract from it which files were conversion output of that entry, and matching them again is not a great option.
I will have time to look at it more closely tomorrow evening.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think with the bidsmapping tsv file you were basically making a provenance table. For tracking provenance there is the W3C standard, and there is even a BEP in the making to implement this in BIDS:
https://github.com/bids-standard/BEP028_BIDSprov
That adds even more complexity, but would come with the benefit of being a standard...
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, the idea was to quickly check after the bidscoiner run what was done - what source was converted to what BIDS.
Thank you for pointing me to the standards; it looks really useful.
bidscoin/bids.py
Outdated
new_entry = { | ||
"subject": target_subject, | ||
"session": target_session, | ||
'SeriesDescription': bids_mapping.run.get("attributes", {}).get("SeriesDescription"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This works only for DICOM, not for other modalities / plugins
…m bidsname if not needed with dynamic index
0cb2145
to
d29115d
Compare
bidscoin/plugins/dcm2niix2bids.py
Outdated
@@ -312,6 +314,7 @@ def bidscoiner_plugin(session: Path, bidsmap: dict, bidsses: Path) -> Union[None | |||
if not list(outfolder.glob(f"{bidsname}.*nii*")): continue | |||
|
|||
jsonfiles.update(outfolder.glob(f"{bidsname}.json")) # add existing created json files: bidsname.json | |||
run["targets"].update(outfolder.glob(f"{bidsname}.*[!json]")) # add files created using this bidsmap run-item (except sidecars) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This line is all that is need when targets are added to the datasource
bidscoin/bids.py
Outdated
else: | ||
run = copy.copy(run) # popping targets will not change original run | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks a bit like a hack, I suspect it is not needed if we add targets
to the datasource instead of to the run. The datasource as part of the run is already hacky, but at least then it is all in one place. Moreover, cleaning up the datasource from the run is already implemented (e.g. when saving the bidsmap). I will make some commits to your PR so you can see what I mean
I moved |
It looks fine to me now, what do you think, are you happy with my changes? |
Your provenance bidsmappings tsv file can now be easily generated in the bids library (most easily in |
The code looks really nice. The problem is that while The quick solution would be to instead of |
Is it? I didn't test it in practice but I did checked the logic for this in the code. The datasource instance is passed all the way up to get_run_? |
Plugin
|
Ok, I'll see if I can fix that in a clean way |
Adding a DataSource to run has always been a workaround I'm not very happy with. I have now cleaned things up by refactoring getting / creating runs from a data source. It passes the pytests, but that unfortunately doesn't properly cover plugin integration tests. Does it work for you now? |
Yes, it works! :) |
- Fix broken suffix handling - Remove the illogical logic for obsolete(?) run-2 names - Adjust the tests accordingly
- Revert renaming at the end, instead rename when encountering run-2 - Remove the datasource.targets bookkeeping - Adjust the tests accordingly
I want to release a new bidscoin version and started to do more integration tests. I then came across several issues related to this PR (e.g. broken suffix handling, run-index orphans, etc) and started looking deeper into your arguments for it. I decided to revert most of this PR for the following reasons:
I fail to see how that would change if you wait until the end
That's an excellent observation of a subtle bug! I now fixed that by simply resetting the run-index of
No, not with the above fix
That's a very complicated sentence, and I'm not sure if I follow it. But I think whatever you trying to say here, it is no longer relevant as the run-index of
Again, I don't see how doing more bookkeeping of outputs makes things easier. I think it is the other way around, by reverting the PR (well not all of it, I kept the good parts :-)), the code became simpler, with less overhead and better to maintain. Unfortunately, the problem of dealing with dcm2niix output is just very hard, as you have experienced, so renaming schemes remain complex but inevitable.
No, as soon as Anyhow, I greatly appreciated this PR and it has definitely led to bugfixes and improved design. I still need to do more testing and there are some (unrelated) bugs that I still need to fix. I sincerely hope the new version is working for you, let me know if it isn't, and I'll try to identify the cause and fix it! |
Hello, Not updating sidecars when run1 is added via increment_runindex. This means that newly renamed run1 file via
For the other problem, let me give a concrete example: The code you added is really nice. Renaming runless to run1 and back is not great, but will work. I tried to play a bit with bidsprov from the other issue, which in simplified version uses the aggregated variable matched_runs to generate bidsprov, and these changes mean bidsprov will be harder to generate (maybe via custom logger function?). But that is out of question here + I don’t know if other people would see as valuable a quick way to check what outputs were generated from what inputs + BEP028 is only proposed now. |
…of scan_table altogether (Github PR #213)
I just realized that that scans_table doesn't need any bookkeeping / renaming at all, it is sidecars (now further simplified and renamed once more to |
One other idea that is still in my mind is to use the acq times to verify (or even correct) the ordering of the run-indexes |
Every run-1, run-2, etc file should originate from the same scan protocol, meaning that the same output files are generated every time that protocol (= source) is encountered in the
I think you will find out that if you test with the latest code all these problems have disappeared :-) |
That is a great observation and the new code is with bidsnames greatly simplified :). I have one question, can it happen that we rename runless file to run1 file incorrectly? |
Yes, there is one edge case that I am working on right now. It's when dcm2niix outputs e.g.
Yes, I was also thinking of creating a bidsprov logger or a bidsprov class. I think p.s. Also note that the bids-validator gives an error if bidsignore scans (e.g. from |
Here's an example, perhaps you see something similar too: ll bids/sub-flatABRIM/extra_data
total 270M
-rw-r--r-- 1 marzwi mrphys 2.4K Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-1_part-phase_GR.json
-rw-r--r-- 1 marzwi mrphys 19M Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-1_part-phase_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-2_GR.json
-rw-r--r-- 1 marzwi mrphys 9.8M Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-2_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-2_part-phase_GR.json
-rw-r--r-- 1 marzwi mrphys 19M Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-2_part-phase_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-3_GR.json
-rw-r--r-- 1 marzwi mrphys 9.6M Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-3_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-3_part-phase_GR.json
-rw-r--r-- 1 marzwi mrphys 19M Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-3_part-phase_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-4_GR.json
-rw-r--r-- 1 marzwi mrphys 9.4M Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-4_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-4_part-phase_GR.json
-rw-r--r-- 1 marzwi mrphys 19M Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-4_part-phase_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-5_GR.json
-rw-r--r-- 1 marzwi mrphys 9.3M Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-5_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-5_part-phase_GR.json
-rw-r--r-- 1 marzwi mrphys 19M Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-5_part-phase_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-6_GR.json
-rw-r--r-- 1 marzwi mrphys 9.1M Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-6_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-6_part-phase_GR.json
-rw-r--r-- 1 marzwi mrphys 19M Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-6_part-phase_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-7_GR.json
-rw-r--r-- 1 marzwi mrphys 8.9M Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-7_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-7_part-phase_GR.json
-rw-r--r-- 1 marzwi mrphys 19M Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-7_part-phase_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-8_GR.json
-rw-r--r-- 1 marzwi mrphys 8.8M Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-8_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-8_part-phase_GR.json
-rw-r--r-- 1 marzwi mrphys 20M Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-8_part-phase_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-9_GR.json
-rw-r--r-- 1 marzwi mrphys 8.7M Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-9_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-9_part-phase_GR.json
-rw-r--r-- 1 marzwi mrphys 20M Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_mod-AlternativeGRE5echosPFshorter_echo-9_part-phase_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_run-1_mod-AlternativeGRE5echosPFshorter_echo-1_GR.json
-rw-r--r-- 1 marzwi mrphys 9.9M Feb 5 12:12 sub-flatABRIM_acq-AlternativeGRE5echosPFshorter_dir-LR_run-1_mod-AlternativeGRE5echosPFshorter_echo-1_GR.nii.gz
-rw-r--r-- 1 marzwi mrphys 2.4K Feb 5 12:12 sub-flatABRIM_acq-tsevfliso08mmLARGEFOV_T2w.json
-rw-r--r-- 1 marzwi mrphys 16M Feb 5 12:12 sub-flatABRIM_acq-tsevfliso08mmLARGEFOV_T2w.nii.gz |
Ok, here's a short update. What I said about echo numbers not in sync was not true, and the issue was caused by me bidsmapping data automatically and not setting the mag/phase part. As a remedy I added some handy logic to the dcm2niix2bids plugin to automatically set it for any non-magnitude image. I think it should work but the strange thing is that it doesn't and that there are unwanted aliases introduced (presumably making everything a phase image). I tried to get rid of all aliases when loading a bidsmap, but so far I have been unsuccessful in that. If you have anything to change for the bidsmap_dccn template, I'm happy to hear about it |
Yes, I think echo numbers are in sync, at least I haven't noticed for our data any differences. The new code looks ok to me. I checked bidsmap_dccn template with our dataset and it added part-phase correctly only to the files with 'P'. I will let you know if I find out something new. |
Ok, I fixed it, I had vague issues because dict.update() was making shallow copies (I think I had fixed this long ago, it was removed by me with this PR and now I re-fixed it). It was really hard to pinpoint, but from my side, my integration tests are now looking good again :-) |
Adding run-1 to files immediately when new run is detected introduces problems:
dcm2niix2bids: variable jsonfiles
Need to remove run-less file and add run-1 file
dcm2niix2bids: newbidsname via dcm2niix postfixes
I think this solution is not well maintainable since one needs to rely on the concrete order of functions called and increment_runindex is not standing on its own, since we need to later fix renamed files. With dcm2niix postfixes, files are being renamed back and forth, when run-1 is not needed in the first place, although we do not know it at the time of naming. Also, if in the future files are collected in new variables, these changes need to be reflected in all the variables and it is easy to forget to do that.
I suggest adding run-1 to files that need it (run-2 exists) and use <<>> run-index at the end of the plugin run, since at that time all the names of files are final and dcm2niix postfixes won’t change them. It also removes side effect from increment_runindex and makes it simpler.
I merged this fix with other suggestion I have: generate .tsv file with information what source was mapped to what bids files. I know that one can see in logs what was mapped to what, but having tsv file means one can quickly check the final names and that mappings are correct. We can see how the result mappings would look like also from bidsmap, but run-indexes and changes via dcm2niix postfixes are not there.
Maybe in the future, user could define what run data would he like to add as new column for easier check of mappings, for now I added SeriesDescription since a lot of files are mapped by that.
This bids_mappings.tsv file looks for example like this: