print_partial_datasets
Command-line tool to scan a dataset organised in structured directories and print a table to highlight gaps. This is useful for spotting missing data or broken analyses.
It relies on the wonderful fsl.utils.filetree tool from fslpy. Example output:
Complete datasets
participant session raw_T1 raw_bold raw_fmap_mag raw_fmap_ph
───────────────────────────────────────────────────────────────────────────────
01 01 x x x x
───────────────────────────────────────────────────────────────────────────────
01 02 x x x x
───────────────────────────────────────────────────────────────────────────────
Partial datasets
participant session raw_T1 raw_bold raw_fmap_mag raw_fmap_ph
───────────────────────────────────────────────────────────────────────────────
04 02 x x x
───────────────────────────────────────────────────────────────────────────────
07 02 x x x
───────────────────────────────────────────────────────────────────────────────
10 02 x x x
───────────────────────────────────────────────────────────────────────────────
Installation
pip install print-partial-datasets
Usage
Specify your own file tree in a text file, as described in the fsl.utils.filetree documentation. If your dataset is already organised in a standard structure such as BIDS, you may be able to use one of the preset trees instead. A custom tree can be as simple as this:
sub-{participant}
    ses-{session}
        anat (anat_dir)
            sub-{participant}_ses-{session}_T1w.nii.gz (anat_image)
            sub-{participant}_ses-{session}_T1w_brain.nii.gz (brain_extracted)
        dwi (dwi_dir)
            sub-{participant}_ses-{session}_dwi.nii.gz (dwi_image)
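To make the template concrete, here is a minimal sketch of how a short name such as anat_image turns into a path once participant and session are filled in. It uses plain str.format rather than fsl.utils.filetree, and the TEMPLATES dict and expand helper are hypothetical names written out by hand for illustration only.

```python
from pathlib import PurePosixPath

# Hypothetical hand-written equivalent of the tree above:
# each short name maps to its full path template.
TEMPLATES = {
    "anat_image": "sub-{participant}/ses-{session}/anat/"
                  "sub-{participant}_ses-{session}_T1w.nii.gz",
    "brain_extracted": "sub-{participant}/ses-{session}/anat/"
                       "sub-{participant}_ses-{session}_T1w_brain.nii.gz",
    "dwi_image": "sub-{participant}/ses-{session}/dwi/"
                 "sub-{participant}_ses-{session}_dwi.nii.gz",
}


def expand(short_name, **variables):
    """Fill the given variables into the template for one short name."""
    return PurePosixPath(TEMPLATES[short_name].format(**variables))


print(expand("anat_image", participant="01", session="02"))
# sub-01/ses-02/anat/sub-01_ses-02_T1w.nii.gz
```

The short names in parentheses in the tree are what you later pass to the tool, and the names in curly braces are the variables that define one row of the table.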
You can either call the script from the command line, or programmatically from a Python console or script.
Command line
print_partial_datasets -d /data/directory -f /path/to/file.tree -s anat_image brain_extracted dwi_image -v participant session
Python
Example Python usage:
from print_partial_datasets import print_partial_datasets
datadir = "/data/directory"
filetree = "/path/to/file.tree"
short_name = ["anat_image", "brain_extracted", "dwi_image"]
variables = ["participant", "session"]
print_partial_datasets(datadir, filetree, short_name, variables)
This should produce a nice printed summary of your data, with complete datasets followed by partial ones.
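The idea behind the complete/partial split can be sketched as follows. This is not the package's implementation: it does not use fsl.utils.filetree, the TEMPLATES dict and split_complete_partial function are hypothetical, and it marks missing files with "-" where the real table leaves a blank. It builds a throwaway dataset in a temporary directory so that it runs anywhere.

```python
import tempfile
from pathlib import Path

# Hypothetical templates, one per short name (not read from a .tree file here).
TEMPLATES = {
    "anat_image": "sub-{participant}/ses-{session}/anat/"
                  "sub-{participant}_ses-{session}_T1w.nii.gz",
    "dwi_image": "sub-{participant}/ses-{session}/dwi/"
                 "sub-{participant}_ses-{session}_dwi.nii.gz",
}


def split_complete_partial(datadir, combos):
    """Return (complete, partial) lists of (combo, found) pairs."""
    complete, partial = [], []
    for combo in combos:
        # Check each short name for this participant/session combination.
        found = {
            name: (Path(datadir) / template.format(**combo)).exists()
            for name, template in TEMPLATES.items()
        }
        (complete if all(found.values()) else partial).append((combo, found))
    return complete, partial


# Tiny throwaway dataset: session 01 has both files, session 02 lacks the DWI.
with tempfile.TemporaryDirectory() as datadir:
    layout = [("01", ["anat_image", "dwi_image"]), ("02", ["anat_image"])]
    for session, names in layout:
        for name in names:
            path = Path(datadir) / TEMPLATES[name].format(
                participant="01", session=session
            )
            path.parent.mkdir(parents=True, exist_ok=True)
            path.touch()
    combos = [{"participant": "01", "session": s} for s in ("01", "02")]
    complete, partial = split_complete_partial(datadir, combos)

for title, rows in [("Complete datasets", complete), ("Partial datasets", partial)]:
    print(title)
    for combo, found in rows:
        marks = " ".join("x" if found[name] else "-" for name in TEMPLATES)
        print(combo["participant"], combo["session"], marks)
```

In this sketch a combination counts as partial as soon as any requested file is missing, which is why the choice of short names passed to the tool determines what shows up as a gap.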
Free software: Apache Software License 2.0
Documentation: https://print-partial-datasets.readthedocs.io.
Credits
This is little more than a user-friendly wrapper around code written by Michiel Cottaar and Paul McCarthy.
This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.