babs init: Initialize a BABS project

Command-Line Arguments

Initialize a BABS project and bootstrap scripts that will be used later.

usage: babs init [-h] [--list_sub_file LIST_SUB_FILE] --container_ds
                 CONTAINER_DS --container_name CONTAINER_NAME
                 [--container_config CONTAINER_CONFIG] --processing_level
                 {subject,session} --queue {slurm} [--keep_if_failed]
                 PATH

Positional Arguments

PATH

Absolute path to the directory where the BABS project will be located. This folder will be automatically created.

Named Arguments

--list_sub_file, --list-sub-file

Path to the CSV file that lists the subjects (and sessions) to analyze. If there is no such file, do not specify this flag. Single-session data: a 'sub_id' column; multi-session data: 'sub_id' and 'ses_id' columns.

--container_ds, --container-ds

Path to the container DataLad dataset

--container_name, --container-name

The name of the BIDS App container, i.e., the <image NAME> used when running datalad containers-add <image NAME>. Importantly, this should include the BIDS App's name so that the bootstrap scripts are set up correctly; the version number should be included as well. babs init treats --container_name as case-insensitive. Example: toybidsapp-0-0-7 for toy BIDS App version 0.0.7.

--container_config, --container-config

Path to a YAML file that contains the configurations of how to run the BIDS App container

--processing_level, --processing-level

Possible choices: subject, session

Whether jobs should be run on a per-subject or per-session (within subject) basis.

--queue

Possible choices: slurm

The name of the job scheduling queue that you will use.

--keep_if_failed, --keep-if-failed

If babs init fails with an error, whether to keep the created BABS project. By default, you don't need to turn this option on. However, if babs init fails and you want to diagnose the problem with babs check-setup, rerun babs init with this option turned on, then run babs check-setup. See the section 'What if babs init fails?' below for details.

Default: False

Detailed description

How do I define the input dataset's name <name> in babs init --datasets?

General guideline: choose a string you find informative. Examples are BIDS and freesurfer.

Specific restrictions:

  1. If you have more than one input BIDS dataset (i.e., more than one --datasets), please make sure the <name> is different for each dataset;

  2. If an input BIDS dataset is a zipped dataset, i.e., its files are zip files, such as BIDS derivatives from another BABS project:

    1. You must name it with a pattern found in the zip filenames so that babs init knows which zip file to use for a subject or session. For example, suppose one of your input datasets is the BIDS derivatives of fMRIPrep, which include zip files sub-xx*_freesurfer*.zip and sub-xx*_fmriprep*.zip. If you'd like to feed the freesurfer results zip files into the current BABS project, you should name this input dataset freesurfer. If you give it an arbitrary name like BIDS_derivatives, babs init will fail, because that pattern is not found in these zip filenames.

    2. In addition, the zip files matching this pattern (e.g., *freesurfer*.zip) should contain a folder with the same name (e.g., a folder called freesurfer).

    3. For example, in multi-session, zipped fMRIPrep derivatives data (e.g., https://osf.io/k9zw2/):

      sub-01_ses-A_freesurfer-20.2.3.zip
      ├── freesurfer
      │   ├── fsaverage
      │   └── sub-01
      sub-01_ses-B_freesurfer-20.2.3.zip
      ├── freesurfer
      │   ├── fsaverage
      │   └── sub-01
      etc
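
The naming rule above can be illustrated with a small shell check. This is only a sketch: `name_matches_zip` is a hypothetical helper written for this example, not part of BABS, and it does not claim to reproduce how babs init matches names internally. It only tests whether a candidate dataset name occurs in a zip filename.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BABS): succeed if the chosen input
# dataset name appears somewhere in the zip filename.
name_matches_zip() {
    local ds_name="$1" zip_file="$2"
    case "$zip_file" in
        *"${ds_name}"*.zip) return 0 ;;
        *) return 1 ;;
    esac
}

# Compare a name taken from the filename with an arbitrary one.
for ds_name in freesurfer BIDS_derivatives; do
    if name_matches_zip "$ds_name" "sub-01_ses-A_freesurfer-20.2.3.zip"; then
        echo "$ds_name: matches"
    else
        echo "$ds_name: no match"
    fi
done
```

Here `freesurfer` matches and `BIDS_derivatives` does not, which mirrors the guidance that the dataset name should be a keyword from the zip filenames.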
      

How is the list of subjects (and sessions) determined?

A list of subjects (and sessions) will be determined when running babs init, and will be saved in a CSV file named processing_inclusion.csv located at /path/to/my_BABS_project/analysis/code.

To filter subjects and sessions, use babs init with --list_sub_file /path/to/subject/list/csv/file. Examples: Single-session example, Multi-session example.

See List of included subjects (and sessions) to process for how this list is determined.
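
As a rough sketch, a multi-session list file with the 'sub_id' and 'ses_id' columns could be written like this. The file name, the temporary location, and the subject and session labels are all made up for illustration; the labels are assumed to carry the sub- and ses- prefixes seen in the zip filenames above.

```shell
#!/usr/bin/env bash
# Write an illustrative multi-session list; all labels are placeholders.
list_csv="$(mktemp -d)/sub_ses_list.csv"
cat > "$list_csv" <<'EOF'
sub_id,ses_id
sub-01,ses-A
sub-01,ses-B
sub-02,ses-A
EOF

# Sanity-check the header before handing the file to babs init.
head -n 1 "$list_csv"

# Then pass it along with your other arguments, e.g.:
#   babs init --list_sub_file "$list_csv" ... /path/to/my_BABS_project
```

For single-session data, the same file would contain only the 'sub_id' column.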

What if babs init fails?

If babs init fails, by default it will remove ("clean up") the created, failed BABS project.

When this happens, if you want to use babs check-setup to debug what went wrong, you'll find that the failed BABS project has already been cleaned up, so babs check-setup cannot be run on it. What you need to do is as follows:

  1. Run babs init with --keep-if-failed turned on.

    • In this way, the failed BABS project will be kept.

  2. Then you can run babs check-setup for diagnosis.

  3. After you know what's wrong, please remove the failed BABS project with the following commands:

    cd <project_root>/analysis    # replace `<project_root>` with the path to your BABS project
    
    # Remove input dataset(s) one by one:
    datalad remove -d inputs/data/<input_ds_name>   # replace `<input_ds_name>` with each input dataset's name
    # repeat above step until all input datasets have been removed.
    # if above command leads to "drop impossible" due to modified content, add `--reckless modification` at the end
    
    git annex dead here
    datalad push --to input
    datalad push --to output
    
    cd ..
    pwd   # this prints `<project_root>`; you can copy it in case you forgot
    cd ..   # outside of `<project_root>`
    rm -rf <project_root>
    

    If you don't remove the failed BABS project, you cannot overwrite it by running babs init again.
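
Before rerunning babs init, it may help to confirm that the target path is absolute and that nothing is left over at that location. The following is a minimal sketch; `check_project_path` is a hypothetical helper for this example, not a BABS command.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BABS): check that a path is absolute
# and does not exist yet, as babs init will create the folder itself.
check_project_path() {
    local project_root="$1"
    case "$project_root" in
        /*) ;;  # absolute path, keep going
        *) echo "not an absolute path: $project_root"; return 1 ;;
    esac
    if [ -e "$project_root" ]; then
        echo "already exists: $project_root"
        return 1
    fi
    echo "ok: $project_root"
}

check_project_path "/path/to/a/folder/holding/BABS/project/my_BABS_project"
```

If the helper reports that the path already exists, remove the failed project with the steps above before running babs init again.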

Example commands

Example babs init command for toy BIDS App + multi-session data on a SLURM cluster:

babs init \
    --datasets BIDS=/path/to/BIDS_datalad_dataset \
    --container_ds /path/to/toybidsapp-container \
    --container_name toybidsapp-0-0-7 \
    --container_config /path/to/container_toybidsapp.yaml \
    --processing_level session \
    --queue slurm \
    /path/to/a/folder/holding/BABS/project/my_BABS_project

Example command if you have more than one input dataset, e.g., raw BIDS data, and fMRIPrep results with FreeSurfer results ingressed. The 2nd dataset is the result of another BABS project: a zipped dataset whose filenames follow the pattern 'sub-xx*_freesurfer*.zip'. Therefore, the 2nd input dataset should be named 'freesurfer', a keyword in the filenames:

babs init \
    ... \
    --datasets \
    BIDS=/path/to/BIDS_datalad_dataset \
    freesurfer=/path/to/freesurfer_results_datalad_dataset \
    ...

Debugging

Error when cloning an input dataset

What happened: After babs init prints out a message like this: Cloning input dataset #x: '/path/to/input_dataset', there was an error message that includes this information: err: 'fatal: repository '/path/to/input_dataset' does not exist'.

Diagnosis: This means that the specified path to this input dataset (i.e., in --datasets) was not valid; there is no DataLad dataset there.

How to solve the problem: Fix this path. To confirm the updated path is valid, you can try cloning it to a temporary directory with datalad clone /updated/path/to/input_dataset. If it is successful, you can go ahead and rerun babs init.
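
A quick pre-check can catch an obviously wrong path before trying the clone. This is a sketch only: `looks_like_datalad_dataset` is a hypothetical helper for this example, and it merely looks for the `.datalad` and `.git` entries that a DataLad dataset normally has at its root. Passing this check does not guarantee that cloning will succeed, so the datalad clone test described above remains the real confirmation.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of BABS or DataLad): check for the
# entries a DataLad dataset normally has at its root.
looks_like_datalad_dataset() {
    local ds_path="$1"
    [ -d "$ds_path/.datalad" ] && [ -e "$ds_path/.git" ]
}

input_ds="/updated/path/to/input_dataset"   # placeholder path from the text
if looks_like_datalad_dataset "$input_ds"; then
    echo "looks like a DataLad dataset: $input_ds"
    # Next, confirm by cloning into a temporary directory, e.g.:
    #   datalad clone "$input_ds" "$(mktemp -d)/test_clone"
else
    echo "no DataLad dataset found at: $input_ds"
fi
```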