babs init: Initialize a BABS project

Command-Line Arguments

Initialize a BABS project and bootstrap scripts that will be used later.

usage: babs init [-h] [--list_sub_file LIST_SUB_FILE] --container_ds
                 CONTAINER_DS --container_name CONTAINER_NAME
                 [--container_config CONTAINER_CONFIG] --processing_level
                 {subject,session} --queue {slurm} [--keep_if_failed]
                 [--throttle THROTTLE] [--shared_group SHARED_GROUP]
                 PATH

Positional Arguments

PATH

Absolute path to the directory where the BABS project will be located. This folder will be automatically created.

Named Arguments

--list_sub_file, --list-sub-file

Path to the CSV file that lists the subjects (and sessions) to analyze. If there is no such file, do not specify this flag. Single-session data: a 'sub_id' column; multi-session data: 'sub_id' and 'ses_id' columns.

--container_ds, --container-ds

Path to the container DataLad dataset

--container_name, --container-name

The name of the BIDS App container, i.e., the <image NAME> used when running datalad containers-add <image NAME>. Importantly, this should include the BIDS App's name so that the bootstrap scripts are set up correctly, and it should also include the version number. babs init treats --container_name case-insensitively. Example: toybidsapp-0-0-7 for toy BIDS App version 0.0.7.
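As an illustration of this naming convention, the sketch below builds a container name from an app name and a version string. The variables app_name and version are hypothetical and only mirror the toy BIDS App example above; babs init does not do this for you.

```shell
# Hypothetical values; replace with your BIDS App's name and version.
app_name="toybidsapp"
version="0.0.7"

# Replace the dots in the version with dashes, as in the example name above.
container_name="${app_name}-$(printf '%s' "$version" | tr '.' '-')"
echo "$container_name"   # prints: toybidsapp-0-0-7
```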

--container_config, --container-config

Path to a YAML file that configures how to run the BIDS App container

--processing_level, --processing-level

Possible choices: subject, session

Whether jobs should be run on a per-subject or per-session (within subject) basis.

--queue

Possible choices: slurm

The name of the job scheduling queue that you will use.

--keep_if_failed, --keep-if-failed

Whether to keep the created BABS project if babs init fails with an error. By default, you don't need to turn this option on. However, if babs init fails and you want to use babs check-setup to diagnose the problem, rerun babs init with this option turned on, then run babs check-setup. See the section 'What if babs init fails?' below for details.

Default: False

--throttle

Optional throttle value for SLURM array jobs. This limits the number of simultaneously running array tasks. The value will be added to the array specification as %<throttle>. Example: --throttle 10 will result in --array=1-${max_array}%10.

--shared_group, --shared-group

Unix group name for shared write access. If provided, analysis is initialized with git init --shared=group and RIA siblings are created with --shared group --group <GROUP>.

Detailed description

How do I define the input datasets in the YAML config file?

Please see the document Prepare a configuration YAML file for the BIDS App for how to define the input datasets in the YAML config file.

How is the list of subjects (and sessions) determined?

A list of subjects (and sessions) will be determined when running babs init, and will be saved in a CSV file named processing_inclusion.csv located at /path/to/my_BABS_project/analysis/code.

To filter subjects and sessions, use babs init with --list_sub_file /path/to/subject/list/csv/file. Examples: Single-session example, Multi-session example.

See List of included subjects (and sessions) to process for how this list is determined.
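A minimal sketch of such a subject list for multi-session data is shown below. The file location and the subject and session IDs are hypothetical placeholders; only the column names 'sub_id' and 'ses_id' come from the --list_sub_file description above. For single-session data, the file would contain only the 'sub_id' column.

```shell
# Hypothetical output location; any readable path works.
list_csv="$(mktemp -d)/sub_ses_list.csv"

# Multi-session data: header 'sub_id,ses_id', then one row per
# subject-session pair (the IDs below are made up for illustration).
cat > "$list_csv" <<'EOF'
sub_id,ses_id
sub-01,ses-A
sub-01,ses-B
sub-02,ses-A
EOF

# Check the header before handing the file to babs init.
head -n 1 "$list_csv"
```

The resulting file would then be passed to babs init as --list_sub_file "$list_csv".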

What if babs init fails?

If babs init fails, by default it will remove ("clean up") the created, failed BABS project.

When this happens, if you want to use babs check-setup to debug what went wrong, you'll find that the failed BABS project has already been cleaned up, so babs check-setup cannot be run on it. What you need to do is as follows:

  1. Run babs init with --keep-if-failed turned on.

    • In this way, the failed BABS project will be kept.

  2. Then you can run babs check-setup for diagnosis.

  3. After you know what's wrong, please remove the failed BABS project with the following commands:

    cd <project_root>/analysis    # replace `<project_root>` with the path to your BABS project
    
    # Remove input dataset(s) one by one:
    datalad remove -d inputs/data/<input_ds_name>   # replace `<input_ds_name>` with each input dataset's name
    # repeat above step until all input datasets have been removed.
    # if above command leads to "drop impossible" due to modified content, add `--reckless modification` at the end
    
    git annex dead here
    datalad push --to input
    datalad push --to output
    
    cd ..
    pwd   # this prints `<project_root>`; you can copy it in case you forgot
    cd ..   # outside of `<project_root>`
    rm -rf <project_root>
    

    If you don't remove the failed BABS project, you cannot overwrite it by running babs init again.

Example commands

Example babs init command for toy BIDS App + multi-session data on a SLURM cluster:

babs init \
    --container_ds /path/to/toybidsapp-container \
    --container_name toybidsapp-0-0-7 \
    --container_config /path/to/container_toybidsapp.yaml \
    --processing_level session \
    --queue slurm \
    /path/to/a/folder/holding/BABS/project/my_BABS_project

Note

Throttling SLURM array jobs: If you want to limit the number of simultaneously running array tasks, you can add the --throttle option. For example, --throttle 10 will limit SLURM to run at most 10 array tasks at the same time. This is useful when you have many jobs but want to avoid overwhelming the cluster or hitting resource limits. The throttle value will be added to the array specification as %<throttle>, e.g., --array=1-${max_array}%10.
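The sketch below shows how the array specification described in this note is composed. The values of max_array and throttle are hypothetical; in practice BABS fills in the array range itself, so this only illustrates what the %<throttle> suffix looks like.

```shell
# Hypothetical values: 200 array tasks, at most 10 running at once.
max_array=200
throttle=10

# SLURM's '%' suffix caps the number of simultaneously running array tasks.
array_spec="1-${max_array}%${throttle}"
echo "--array=${array_spec}"   # prints: --array=1-200%10
```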

Note

Shared group permissions: On multi-user shared filesystems:

  1. Set umask 002 in your shell startup file (for example ~/.bashrc). This is necessary so new files remain group-writable for collaborators.

  2. Pass --shared-group <GROUP> to babs init so Git and RIA stores are created with group-shared permissions.

With --shared-group, BABS writes generated scripts with group read, write, and execute permissions (equivalent to mode 770), and registers BABS repositories in Git safe.directory so different users in the same Unix group can run babs status and babs submit without ownership issues.
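To see what umask 002 does in practice, the following sketch creates a file in a throwaway directory and prints its permission string. The directory and file name are hypothetical, and the check is independent of BABS; it only confirms that newly created files stay group-writable.

```shell
# Work in a throwaway directory so nothing in your project is touched.
tmp_dir="$(mktemp -d)"

umask 002                        # new files: read/write for user and group
touch "$tmp_dir/example.txt"

# Keep only the 10-character permission string from the ls output.
perms="$(ls -l "$tmp_dir/example.txt" | cut -c1-10)"
echo "$perms"                    # prints: -rw-rw-r--

rm -rf "$tmp_dir"
```

If the output shows -rw-r--r-- instead, the umask is still at the common default of 022 and collaborators in your group will not be able to modify new files.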

Debugging

Error when cloning an input dataset

What happened: After babs init printed a message like this: Cloning input dataset #x: '/path/to/input_dataset', there was an error message that included this information: err: 'fatal: repository '/path/to/input_dataset' does not exist'.

Diagnosis: This means that the specified path to this input dataset (i.e., in origin_url) was not valid; there is no DataLad dataset there.

How to solve the problem: Fix this path. To confirm the updated path is valid, you can try cloning it to a temporary directory with datalad clone /updated/path/to/input_dataset. If it is successful, you can go ahead and rerun babs init.
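Before attempting the clone, a quick local check can catch obvious typos in the path. The sketch below is a heuristic, not part of BABS or DataLad: it assumes the input dataset is a directory on the local filesystem and relies on DataLad datasets having a .datalad/config file at their root. The function name looks_like_datalad_dataset is hypothetical. It does not apply to non-local sources such as RIA store URLs, and it does not replace the datalad clone test described above.

```shell
# Heuristic: a DataLad dataset on a local filesystem has a
# .datalad/config file at its root.
looks_like_datalad_dataset() {
    [ -f "$1/.datalad/config" ]
}

# Placeholder path from the text above; replace with your own.
input_ds="/updated/path/to/input_dataset"

if looks_like_datalad_dataset "$input_ds"; then
    echo "OK: $input_ds looks like a DataLad dataset"
else
    echo "Not a DataLad dataset (or not a local path): $input_ds"
fi
```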