Example walkthrough

In this example walkthrough, we will use toy BIDS data and a toy BIDS App to demonstrate how to use BABS. We use SGE clusters as examples here; adaptations to Slurm clusters will also be covered.

By following the installation page, you should have successfully installed BABS and its dependencies (DataLad, Git, git-annex, datalad-container) on the cluster, in a conda environment called babs. In addition, because the toy BIDS data you'll use is hosted on OSF, you also need to install datalad-osf.
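If datalad-osf is not installed yet, it is typically available from PyPI (a sketch; the exact install route may differ on your cluster, and conda-forge also provides it):

$ conda activate babs
$ pip install datalad-osf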

Here is the list of software versions we used to prepare this walkthrough. It is a good idea to use these versions or newer:

$ python --version
Python 3.9.16
$ datalad --version
datalad 0.18.3
$ git --version
git version 2.34.1
$ git-annex version
git-annex version: 10.20230215-gd24914f2a
$ datalad containers-add --version
datalad_container 1.1.9
$ datalad osf-credentials --version
datalad_osf 0.2.3.1

We used BABS version 0.0.3 to prepare this example walkthrough. We encourage you to use the latest BABS version available on PyPI. There might be minor differences in the printed messages or generated code; however, you can still follow the same steps described here. To check which BABS version you have, run:

$ pip show babs
Name: babs
Version: x.x.x   # e.g., 0.0.3
...

Let's create a folder called babs_demo in your home directory and use it as the working directory for this example walkthrough:

$ conda activate babs
$ mkdir -p ~/babs_demo
$ cd babs_demo

Step 0. Ensure dependencies and data access

Note: This Step 0 is only required for clusters whose compute nodes have no Internet connection; otherwise, you may skip it. However, we do recommend going through this step if this is your first time running this example walkthrough.

Before you start, you can test whether all the dependencies (including datalad-osf) are installed properly. Let's try installing the toy, multi-session BIDS dataset you'll use in this example walkthrough:

$ datalad clone https://osf.io/w2nu3/ raw_BIDS_multi-ses

The printed messages should look like the following. Note that the absolute path to babs_demo (i.e., /cbica/projects/BABS/babs_demo) will likely differ from yours because you're on a different cluster; that's fine:

install(ok): /cbica/projects/BABS/babs_demo/raw_BIDS_multi-ses (dataset)
Why do I also see [INFO] messages?

It's normal to see additional messages from DataLad like below:

[INFO   ] Remote origin uses a protocol not supported by git-annex; setting annex-ignore

There are two subjects (sub-01 and sub-02) and six sessions in this toy dataset. Now let's try getting a file's content:

$ cd raw_BIDS_multi-ses
$ datalad get sub-01/ses-A/anat/sub-01_ses-A_T1w.nii.gz

You should see:

get(ok): sub-01/ses-A/anat/sub-01_ses-A_T1w.nii.gz (file) [from osf-storage...]

You can now view this image in an image viewer. Note that the image intensities in this dataset have been zeroed out, so it's normal to see an all-black image.
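If you'd like to confirm the two subjects and six sessions yourself, you can list the cloned folders (a quick sketch; you're still inside raw_BIDS_multi-ses, and the exact listing format may differ):

$ ls sub-0*
sub-01:
ses-A  ses-B  ses-C

sub-02:
ses-A  ses-B  ses-D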

If there is no Internet connection on compute nodes

In later steps, jobs executing the BIDS App will run on compute nodes and will fetch the file contents of the input BIDS dataset. Because the input BIDS dataset used in this example walkthrough is hosted on OSF, jobs will by default fetch the file contents from OSF over the Internet. This is a problem for clusters whose compute nodes have no Internet connection.

If your cluster's compute nodes have no Internet connection, please fetch all the file contents now to avoid issues when running the jobs:

$ datalad get *

You should see these printed messages from datalad at the end:

action summary:
  get (notneeded: 1, ok: 47)

Then, please skip the step in the next code block below; i.e., do NOT drop file contents or remove the local copy of this dataset.

By now, you have verified that you can install this dataset and get its file contents. You can now drop the fetched file content and remove this local copy of the dataset, since you'll point BABS directly at its OSF link as the input dataset:

$ datalad drop sub-01/ses-A/anat/sub-01_ses-A_T1w.nii.gz
$ cd ..
$ datalad remove -d raw_BIDS_multi-ses
Printed messages you'll see
# from `datalad drop`:
drop(ok): sub-01/ses-A/anat/sub-01_ses-A_T1w.nii.gz (file)

# from `datalad remove`:
uninstall(ok): . (dataset)

Step 1. Get prepared

BABS requires three inputs:

  1. DataLad dataset of BIDS dataset(s);

  2. DataLad dataset of containerized BIDS App;

  3. A YAML file specifying how the BIDS App should be executed.

Step 1.1. Prepare DataLad dataset(s) of BIDS dataset(s)

As mentioned above, you will use a toy, multi-session BIDS dataset available on OSF: https://osf.io/w2nu3/. You'll use this link directly as the path to the input dataset, so no extra preparation is needed here.

If there is no Internet connection on compute nodes

When providing the path to the input BIDS dataset, please do not use the OSF link; instead, use the path to your local copy of the dataset. More guidance is provided when we reach that step.

Step 1.2. Prepare DataLad dataset of containerized BIDS App

For the BIDS App, we have prepared a toy BIDS App that performs a simple task: if the input dataset is a raw (unzipped) BIDS dataset, the toy BIDS App counts the non-hidden files in a subject's folder. Note that even if the input dataset is a multi-session dataset, it still counts at the subject level (not the session level).
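Conceptually, for a raw BIDS input the result is roughly what the following shell sketch would produce when run on a subject folder whose file contents have been fetched (illustrative only; the exact rules, e.g., how hidden files and annexed symlinks are treated, live inside the toy BIDS App itself):

$ find sub-01 -type f ! -name ".*" | wc -l    # rough count of non-hidden files under one subject's folder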

You now need to pull our toy BIDS App and build it as a Singularity image (the latest version is 0.0.7):

$ cd ~/babs_demo
$ singularity build \
    toybidsapp-0.0.7.sif \
    docker://pennlinc/toy_bids_app:0.0.7

Now you should see the file toybidsapp-0.0.7.sif in the current directory.

Having trouble building this Singularity image?

This might be because the version of Singularity you're using is too old. You can check your Singularity version via singularity --version. We've tested that these versions work fine: singularity-ce version 3.9.5 and apptainer version 1.1.8-1.el7.
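For reference, checking the version looks like this (example output from one of our test clusters; yours will differ):

$ singularity --version
singularity-ce version 3.9.5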

Then create a DataLad dataset of this container (i.e., let DataLad track this Singularity image):

I'm confused - why is the container another DataLad dataset?

Here, "DataLad dataset of container" means "a collection of container image(s) in a folder tracked by DataLad". Same as DataLad dataset of input BIDS dataset, it's tracked by DataLad; but different from input BIDS dataset, "DataLad dataset of container" contains container image(s), and it won't be processed.

$ datalad create -D "toy BIDS App" toybidsapp-container
$ cd toybidsapp-container
$ datalad containers-add \
    --url ${PWD}/../toybidsapp-0.0.7.sif \
    toybidsapp-0-0-7
Printed messages you'll see
# from `datalad create`:
create(ok): /cbica/projects/BABS/babs_demo/toybidsapp-container (dataset)

# from `datalad containers-add`:
[INFO   ] Copying local file /cbica/projects/BABS/babs_demo/toybidsapp-container/../toybidsapp-0.0.7.sif to /cbica/projects/BABS/babs_demo/toybidsapp-container/.datalad/environments/toybidsapp-0-0-7/image
add(ok): .datalad/environments/toybidsapp-0-0-7/image (file)
add(ok): .datalad/config (file)
save(ok): . (dataset)
action summary:
  add (ok: 2)
  save (ok: 1)
add(ok): .datalad/environments/toybidsapp-0-0-7/image (file)
add(ok): .datalad/config (file)
save(ok): . (dataset)
containers_add(ok): /cbica/projects/BABS/babs_demo/toybidsapp-container/.datalad/environments/toybidsapp-0-0-7/image (file)
action summary:
  add (ok: 2)
  containers_add (ok: 1)
  save (ok: 1)

Now, the DataLad dataset containing the toy BIDS App container toybidsapp-container is ready to use.

As the sif file has been copied into toybidsapp-container, you can remove the original sif file:

$ cd ..
$ rm toybidsapp-0.0.7.sif

Step 1.3. Prepare a YAML file for the BIDS App

Finally, you'll prepare a YAML file that tells BABS how to run the BIDS App. Below is an example YAML file for the toy BIDS App:

# Arguments in `singularity run`:
singularity_run:
    --no-zipped: ""
    --dummy: "2"
    -v: ""

# Output foldername(s) to be zipped, and the BIDS App version to be included in the zip filename(s):
zip_foldernames:
    toybidsapp: "0-0-7"

# How much cluster resources it needs:
cluster_resources:
    interpreting_shell: /bin/bash
    hard_memory_limit: 2G

# Necessary commands to be run first:
script_preamble: |
    source ${CONDA_PREFIX}/bin/activate babs    # for Penn Med CUBIC cluster

# Where to run the jobs:
job_compute_space: "${CBICA_TMPDIR}"   # for Penn Med CUBIC cluster tmp space

As you can see, there are several sections in this YAML file.

Here, in section singularity_run, both --dummy and -v are dummy arguments to this toy BIDS App: argument --dummy accepts any value, whereas argument -v takes no value. We use these arguments to show examples of:

  • how to add values after arguments: e.g., --dummy: "2";

  • how to add arguments without values: e.g., --no-zipped: "" and -v: "";

  • that it's fine to mix flags prefixed with -- and -.

Section zip_foldernames tells BABS to zip the output folder named toybidsapp into a zip file named ${sub-id}_${ses-id}_toybidsapp-0-0-7.zip for each session of each subject, where ${sub-id} is a subject ID and ${ses-id} is a session ID (e.g., sub-01_ses-A_toybidsapp-0-0-7.zip).

You can copy the above content and save it as config_toybidsapp_demo.yaml in the ~/babs_demo directory.

How to copy the above content using Vim with correct indentation?

After copying the above content and opening a new file in vim, enter:

:set paste

hit the Enter key, press i to start INSERT (paste) mode, then paste the content into the file. Otherwise, the indentation will be wrong. After pasting, hit the Escape key and enter:

:set nopaste

and hit Enter to turn off paste mode. You can now save the file by typing :w and close it by entering :q and hitting Enter.

Several lines in this YAML file require customization based on the cluster you are using:

  • Section cluster_resources:

    • Check whether the interpreting_shell line looks appropriate for your cluster. Some Slurm clusters may recommend adding -l at the end, i.e.:

      interpreting_shell: "/bin/bash -l"
      

      See Section cluster_resources for more explanations about this line.

    • For Slurm clusters, if you would like to use specific partition(s): since requesting a partition is not currently a pre-defined key in BABS, you can use customized_text instead and add the customized_text lines shown in the block below:

      cluster_resources:
          ...
          customized_text: |
              #SBATCH -p <partition_names>
      

      Please replace <partition_names> with the partition name(s) you would like to use, and replace ... with the other lines that use pre-defined BABS keys, such as interpreting_shell and hard_memory_limit.

    • If needed, you may add requests for other resources, e.g., a runtime limit of 20 min (hard_runtime_limit: "00:20:00"), temporary disk space of 20 GB (temporary_disk_space: 20G), or even resources without pre-defined keys in BABS. See Section cluster_resources for how to do so; a combined sketch appears after this list.

    • For Penn Medicine CUBIC cluster only:

      You may need to add the customized_text lines shown in the block below to avoid some compute nodes that currently have issues with file locks:

      cluster_resources:
          interpreting_shell: /bin/bash
          hard_memory_limit: 2G
          customized_text: |
              #$ -l hostname=!compute-fed*
      
  • Section script_preamble:

    • You might need to adjust the source command line based on your cluster and your conda environment name.

    • You might need to add another line to module load any necessary modules, such as singularity. This section will look like this after you add it:

      script_preamble: |
          source ${CONDA_PREFIX}/bin/activate babs
          module load xxxx
      
    • For more, please see: Section script_preamble.

  • Section job_compute_space:

    • You need to change "${CBICA_TMPDIR}" to the temporary compute space available on your cluster where you will be running jobs, e.g., "/path/to/some_temporary_compute_space". Here "${CBICA_TMPDIR}" is for Penn Medicine CUBIC cluster only.

    • For more, please see: Section job_compute_space.
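As promised in the cluster_resources bullet above, here is a combined sketch of that section with the optional extra requests filled in (key names are BABS's pre-defined keys mentioned above; the values are illustrative and should be adjusted for your cluster):

cluster_resources:
    interpreting_shell: /bin/bash
    hard_memory_limit: 2G
    hard_runtime_limit: "00:20:00"    # runtime limit of 20 min
    temporary_disk_space: 20G         # temporary disk space of 20 GB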

By now, you have prepared these in the ~/babs_demo folder:

config_toybidsapp_demo.yaml
toybidsapp-container/
If there is no Internet connection on compute nodes

In this folder, you should also see the local copy of the input BIDS dataset raw_BIDS_multi-ses.

Now you can start to use BABS for data analysis.

Step 2. Create a BABS project

Step 2.1. Use babs-init to create a BABS project

A BABS project is the place where all the inputs are cloned to, all scripts are generated, and results and provenance are saved. An example command of babs-init is as follows:

$ cd ~/babs_demo
$ babs-init \
    --where_project ${PWD} \
    --project_name my_BABS_project \
    --input BIDS https://osf.io/w2nu3/ \
    --container_ds ${PWD}/toybidsapp-container \
    --container_name toybidsapp-0-0-7 \
    --container_config_yaml_file ${PWD}/config_toybidsapp_demo.yaml \
    --type_session multi-ses \
    --type_system sge
If there is no Internet connection on compute nodes

Please replace the --input line with --input BIDS /path/to/cloned_input_BIDS_dataset, where /path/to/cloned_input_BIDS_dataset is the correct path to your local copy of the input BIDS dataset, e.g., ${PWD}/raw_BIDS_multi-ses.

Here you will create a BABS project called my_BABS_project in directory ~/babs_demo. The input dataset will be called BIDS, and you can simply provide the OSF link as its path (--input). For the container, you will use the DataLad-tracked toybidsapp-container and the YAML file you just prepared (--container_ds, --container_name, and --container_config_yaml_file). It is important to make sure the string toybidsapp-0-0-7 used in --container_name is consistent with the image name you specified when preparing the DataLad dataset of the container (datalad containers-add). As this input dataset is a multi-session dataset, you should specify --type_session multi-ses. Finally, please change the cluster system type --type_system to yours; currently BABS supports sge and slurm.

If babs-init succeeded, you should see this message at the end:

`babs-init` was successful!
Full printed messages from babs-init
DataLad version: 0.18.3

project_root of this BABS project: /cbica/projects/BABS/babs_demo/my_BABS_project
type of data of this BABS project: multi-ses
job scheduling system of this BABS project: sge


Creating `analysis` folder (also a datalad dataset)...
[INFO   ] Running procedure cfg_yoda 
[INFO   ] == Command start (output follows) ===== 
[INFO   ] == Command exit (modification check follows) =====                                                                                        
run(ok): /cbica/projects/BABS/babs_demo/my_BABS_project/analysis (dataset) [/cbica/projects/BABS/miniconda3/envs/bab...]                       
create(ok): /cbica/projects/BABS/babs_demo/my_BABS_project/analysis (dataset)
action summary:
  create (ok: 1)
  run (ok: 1)
add(ok): .gitignore (file)                                                                                                                          
save(ok): . (dataset)                                                                                                                               
action summary:                                                                                                                                     
  add (ok: 1)
  save (ok: 1)
Save configurations of BABS project in a yaml file ...
Path to this yaml file will be: 'analysis/code/babs_proj_config.yaml'
add(ok): code/babs_proj_config.yaml (file)                                                                                                          
save(ok): . (dataset)                                                                                                                               
action summary:                                                                                                                                     
  add (ok: 1)
  save (ok: 1)

Creating output and input RIA...
[INFO   ] create siblings 'output' and 'output-storage' ... 
[INFO   ] Fetching updates for Dataset(/cbica/projects/BABS/babs_demo/my_BABS_project/analysis) 
update(ok): . (dataset)
update(ok): . (dataset)
[INFO   ] Configure additional publication dependency on "output-storage" 
configure-sibling(ok): . (sibling)
create-sibling-ria(ok): /cbica/projects/BABS/babs_demo/my_BABS_project/analysis (dataset)
action summary:  
  configure-sibling (ok: 1)
  create-sibling-ria (ok: 1)
  update (ok: 1)
[INFO   ] create sibling 'input' ... 
[INFO   ] Fetching updates for Dataset(/cbica/projects/BABS/babs_demo/my_BABS_project/analysis) 
update(ok): . (dataset)
update(ok): . (dataset)
configure-sibling(ok): . (sibling)
create-sibling-ria(ok): /cbica/projects/BABS/babs_demo/my_BABS_project/analysis (dataset)
action summary:
  configure-sibling (ok: 1)
  create-sibling-ria (ok: 1)
  update (ok: 1)

Registering the input dataset(s)...
Cloning input dataset #1: 'BIDS'
[INFO   ] Remote origin uses a protocol not supported by git-annex; setting annex-ignore                                                            
install(ok): inputs/data/BIDS (dataset)
add(ok): inputs/data/BIDS (dataset)                                                                                                                 
add(ok): .gitmodules (file)                                                                                                                         
save(ok): . (dataset)                                                                                                                               
add(ok): .gitmodules (file)                                                                                                                         
save(ok): . (dataset)                                                                                                                               
action summary:                                                                                                                                     
  add (ok: 3)
  install (ok: 1)
  save (ok: 2)

Checking whether each input dataset is a zipped or unzipped dataset...
input dataset 'BIDS' is considered as an unzipped dataset.
Performing sanity check for any unzipped input dataset...
add(ok): code/babs_proj_config.yaml (file)                                                                                                          
save(ok): . (dataset)                                                                                                                               
action summary:                                                                                                                                     
  add (ok: 1)
  save (ok: 1)

Adding the container as a sub-dataset of `analysis` dataset...
install(ok): containers (dataset)                                                                                                                   
add(ok): containers (dataset)                                                                                                                       
add(ok): .gitmodules (file)                                                                                                                         
save(ok): . (dataset)                                                                                                                               
add(ok): .gitmodules (file)                                                                                                                         
save(ok): . (dataset)                                                                                                                               
action summary:                                                                                                                                     
  add (ok: 3)
  install (ok: 1)
  save (ok: 2)

Generating a bash script for running container and zipping the outputs...
This bash script will be named as `toybidsapp-0-0-7_zip.sh`

/cbica/projects/BABS/miniconda3/envs/babs/lib/python3.9/site-packages/babs/utils.py:440: UserWarning: Usually BIDS App depends on TemplateFlow, but environment variable `TEMPLATEFLOW_HOME` was not set up. Therefore, BABS will not bind its directory or inject this environment variable into the container when running the container. This may cause errors.
  warnings.warn("Usually BIDS App depends on TemplateFlow,"
Below is the generated `singularity run` command:
singularity run --cleanenv \
	-B ${PWD} \
	containers/.datalad/environments/toybidsapp-0-0-7/image \
	inputs/data/BIDS \
	outputs \
	participant \
	--no-zipped \
	--dummy 2 \
	-v \
	--participant-label "${subid}"
add(ok): code/toybidsapp-0-0-7_zip.sh (file)                                                                                                        
save(ok): . (dataset)                                                                                                                               
action summary:                                                                                                                                     
  add (ok: 1)
  save (ok: 1)

Generating a bash script for running jobs at participant (or session) level...
This bash script will be named as `participant_job.sh`
add(ok): code/check_setup/call_test_job.sh (file)                                                                                                   
add(ok): code/check_setup/test_job.py (file)                                                                                                        
add(ok): code/participant_job.sh (file)                                                                                                             
save(ok): . (dataset)                                                                                                                               
action summary:                                                                                                                                     
  add (ok: 3)
  save (ok: 1)

Determining the list of subjects (and sessions) to analyze...
Did not provide `list_sub_file`. Will look into the first input dataset to get the initial inclusion list.
Did not provide `required files` in `container_config_yaml_file`. Not to filter subjects (or sessions)...
The final list of included subjects and sessions has been saved to this CSV file: /cbica/projects/BABS/babs_demo/my_BABS_project/analysis/code/sub_ses_final_inclu.csv
add(ok): code/sub_ses_final_inclu.csv (file)                                                                                                        
save(ok): . (dataset)                                                                                                                               
action summary:                                                                                                                                     
  add (ok: 1)
  save (ok: 1)

Generating a template for job submission calls...
The template text file will be named as `submit_job_template.yaml`.
add(ok): code/check_setup/submit_test_job_template.yaml (file)                                                                                      
add(ok): code/submit_job_template.yaml (file)                                                                                                       
save(ok): . (dataset)                                                                                                                               
action summary:                                                                                                                                     
  add (ok: 2)
  save (ok: 1)
                                                                                                                                                    
Final steps...
DataLad dropping input dataset's contents...
action summary:
  drop (notneeded: 2)
Updating input and output RIA...
publish(ok): . (dataset) [refs/heads/master->input:refs/heads/master [new branch]]                                                                  
publish(ok): . (dataset) [refs/heads/git-annex->input:refs/heads/git-annex [new branch]]                                                            
action summary:                                                                                                                                     
  publish (ok: 2)
publish(ok): . (dataset) [refs/heads/master->output:refs/heads/master [new branch]]                                                                 
publish(ok): . (dataset) [refs/heads/git-annex->output:refs/heads/git-annex [new branch]]                                                           
action summary:
  publish (ok: 2)
Adding an alias 'data' to output RIA store...

`babs-init` was successful!
Warning regarding TemplateFlow? That's fine for the toy BIDS App!

You may receive this warning from babs-init if you did not set up the environment variable $TEMPLATEFLOW_HOME:

UserWarning: Usually BIDS App depends on TemplateFlow, but environment variable `TEMPLATEFLOW_HOME` was not set up.
Therefore, BABS will not bind its directory or inject this environment variable into the container when running the container. This may cause errors.

This is totally fine for the toy BIDS App, which does not use TemplateFlow. However, many BIDS Apps do use it; make sure you set $TEMPLATEFLOW_HOME up when you use those BIDS Apps.
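For those real BIDS Apps, a minimal sketch of setting this up before running babs-init would be (the path is a placeholder; point it to wherever your TemplateFlow templates live, or where you'd like them downloaded):

$ export TEMPLATEFLOW_HOME=/path/to/templateflow_home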

It's very important to check whether the generated singularity run command is what you want. This command can be found in the printed messages from babs-init:

singularity run --cleanenv \
    -B ${PWD} \
    containers/.datalad/environments/toybidsapp-0-0-7/image \
    inputs/data/BIDS \
    outputs \
    participant \
    --no-zipped \
    --dummy 2 \
    -v \
    --participant-label "${subid}"

As you can see, BABS has automatically handled the positional arguments of the BIDS App (i.e., input directory, output directory, and analysis level - 'participant'). The --participant-label argument is also handled by BABS.

It's also important to check whether the generated directives for job submission are what you want. You can view them via:

$ cd ~/babs_demo/my_BABS_project    # make sure you're in `my_BABS_project` folder
$ head analysis/code/participant_job.sh

The first several lines that start with # and appear before the line # Script preambles: are the directives for job submission. Note that different cluster systems (e.g., SGE, Slurm) produce different generated directives, and the directives also differ slightly across BABS versions. If you used the YAML file above without further modification, the generated directives would be:

If on a Slurm cluster + using BABS version >0.0.3, you'll see:
#!/bin/bash
#SBATCH --mem=2G
If on an SGE cluster + using BABS version >0.0.3, you'll see:
#!/bin/bash
#$ -l h_vmem=2G
If on an SGE cluster + using BABS version 0.0.3, you'll see:
#!/bin/bash
#$ -S /bin/bash
#$ -l h_vmem=2G
What's inside the created BABS project my_BABS_project?
.
├── analysis
│   ├── CHANGELOG.md
│   ├── code
│   │   ├── babs_proj_config.yaml
│   │   ├── babs_proj_config.yaml.lock
│   │   ├── check_setup
│   │   ├── participant_job.sh
│   │   ├── README.md
│   │   ├── submit_job_template.yaml
│   │   ├── sub_ses_final_inclu.csv
│   │   └── toybidsapp-0-0-7_zip.sh
│   ├── containers
│   ├── inputs
│   │   └── data
│   ├── logs
│   └── README.md
├── input_ria
└── output_ria

Here, analysis is a DataLad dataset that includes the generated scripts in code/, a cloned container DataLad dataset containers/, and a cloned input dataset in inputs/data. The input and output RIA stores (input_ria and output_ria) are DataLad siblings of the analysis dataset. When running jobs, inputs are cloned from the input RIA store, and results and provenance are pushed to the output RIA store.
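If you'd like to see these siblings yourself, you can list them with DataLad (optional; babs-check-setup in the next step prints the same information for you):

$ datalad siblings -d ~/babs_demo/my_BABS_project/analysis
# expect to see `input`, `output`, and `output-storage` among the listed siblings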

Step 2.2. Use babs-check-setup to make sure it's good to go

It's important to let BABS check to be sure that the project has been initialized correctly. In addition, it's often a good idea to run a test job to make sure that the environment and cluster resources specified in the YAML file are workable.

Note that from this step on, unless otherwise instructed, all BABS commands will be called from where the BABS project is located: ~/babs_demo/my_BABS_project. This is so that you can directly use ${PWD} for the --project-root argument. Therefore, please make sure you switch to this directory before calling them.

$ cd ~/babs_demo/my_BABS_project    # make sure you're in `my_BABS_project` folder
$ babs-check-setup \
    --project-root ${PWD} \
    --job-test

It might take a bit of time to finish, depending on how busy your cluster is and how many resources you requested in the YAML file - in this example, you requested only a minimal amount of resources.

You'll see this message at the end if babs-check-setup was successful:

`babs-check-setup` was successful!

Before moving on, please review the summarized information about the designated environment, especially the version numbers:

Below is the information of designated environment and temporary workspace:

workspace_writable: true
which_python: '/cbica/projects/BABS/miniconda3/envs/babs/bin/python'
version:
  datalad: 'datalad 0.18.3'
  git: 'git version 2.34.1'
  git-annex: 'git-annex version: 10.20230215-gd24914f2a'
  datalad_containers: 'datalad_container 1.1.9'
Full printed messages from babs-check-setup
Will check setups of BABS project located at: /cbica/projects/BABS/babs_demo/my_BABS_project
Will submit a test job for testing; will take longer time.
Below is the configuration information saved during `babs-init` in file 'analysis/code/babs_proj_config.yaml':

type_session: multi-ses
type_system: sge
input_ds:
  $INPUT_DATASET_#1:
    name: BIDS
    path_in: https://osf.io/w2nu3/
    path_data_rel: inputs/data/BIDS
    is_zipped: false
container:
  name: toybidsapp-0-0-7
  path_in: /cbica/projects/BABS/babs_demo/toybidsapp-container

Checking the BABS project itself...
✓ All good!

Check status of 'analysis' DataLad dataset...
nothing to save, working tree clean
✓ All good!

Checking input dataset(s)...
✓ All good!

Checking container datalad dataset...
✓ All good!

Checking `analysis/code/` folder...
✓ All good!

Checking input and output RIA...
	Datalad dataset `analysis`'s siblings:
.: here(+) [git]
.: input(-) [/cbica/projects/BABS/babs_demo/my_BABS_project/input_ria/d5f/7c9f2-1b55-4bc9-ada8-ca296b2c3268 (git)]
.: output-storage(+) [ora]
.: output(-) [/cbica/projects/BABS/babs_demo/my_BABS_project/output_ria/d5f/7c9f2-1b55-4bc9-ada8-ca296b2c3268 (git)]
✓ All good!

Submitting a test job, will take a while to finish...
Although the script will be submitted to the cluster to run, this job will not run the BIDS App; instead, this test job will gather setup information in the designated environment and will make sure jobs can finish successfully on current cluster.
Test job has been submitted (job ID: 4635991).
Will check the test job's status every 1 min...
2023-05-05 14:36:28.253463: Test job is pending (`qw`)...
2023-05-05 14:37:28.628330: Test job is pending (`qw`)...
2023-05-05 14:38:28.777482: Test job is running (`r`)...
2023-05-05 14:39:29.199464: Test job is successfully finished!
Below is the information of designated environment and temporary workspace:

workspace_writable: true
which_python: '/cbica/projects/BABS/miniconda3/envs/babs/bin/python'
version:
  datalad: 'datalad 0.18.3'
  git: 'git version 2.34.1'
  git-annex: 'git-annex version: 10.20230215-gd24914f2a'
  datalad_containers: 'datalad_container 1.1.9'

Please check if above versions are the ones you hope to use! If not, please change the version in the designated environment, or change the designated environment you hope to use in `--container-config-yaml-file` and rerun `babs-init`.
✓ All good in test job!

`babs-check-setup` was successful!

Now it's ready for job submissions.

Step 3. Submit jobs and check job status

We'll iteratively use babs-submit and babs-status to submit jobs and check job status.

We first use babs-status to check the number of jobs we expect to finish successfully. In this example walkthrough, as no initial inclusion list was provided, BABS determines this number from the number of sessions in the input BIDS dataset. We did not request extra filtering (based on required files) in our YAML file either, so BABS will submit one job per session.

$ cd ~/babs_demo/my_BABS_project    # make sure you're in `my_BABS_project` folder
$ babs-status --project-root $PWD

You'll see:

Did not request resubmit based on job states (no `--resubmit`).

Job status:
There are in total of 6 jobs to complete.
0 job(s) have been submitted; 6 job(s) haven't been submitted.
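This count of 6 comes from the inclusion list that babs-init generated, with one row per subject/session pair. If you're curious, you can peek at it (a sketch; the exact column layout may differ across BABS versions):

$ cat analysis/code/sub_ses_final_inclu.csv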

Let's use babs-submit to submit one job and see whether it finishes successfully. By default, babs-submit submits only one job; if you would like to submit all jobs, you can use the --all argument.

$ babs-submit --project-root $PWD

You'll see something like this (the job ID will probably be different):

Job for sub-01, ses-A has been submitted (job ID: 4639278).
sub_id ses_id  has_submitted   job_id  job_state_category  job_state_code  duration  is_done  is_failed
0  sub-01  ses-A           True  4639278                 NaN             NaN       NaN    False        NaN  \
1  sub-01  ses-B          False       -1                 NaN             NaN       NaN    False        NaN
2  sub-01  ses-C          False       -1                 NaN             NaN       NaN    False        NaN
3  sub-02  ses-A          False       -1                 NaN             NaN       NaN    False        NaN
4  sub-02  ses-B          False       -1                 NaN             NaN       NaN    False        NaN
5  sub-02  ses-D          False       -1                 NaN             NaN       NaN    False        NaN

                log_filename  last_line_stdout_file  alert_message  job_account
0  toy_sub-01_ses-A.*4639278                    NaN            NaN          NaN
1                        NaN                    NaN            NaN          NaN
2                        NaN                    NaN            NaN          NaN
3                        NaN                    NaN            NaN          NaN
4                        NaN                    NaN            NaN          NaN
5                        NaN                    NaN            NaN          NaN

You can check the job status via babs-status:

$ babs-status --project-root $PWD

If it's successfully finished, you'll see:

Did not request resubmit based on job states (no `--resubmit`).

Job status:
There are in total of 6 jobs to complete.
1 job(s) have been submitted; 5 job(s) haven't been submitted.
Among submitted jobs,
1 job(s) are successfully finished;
0 job(s) are pending;
0 job(s) are running;
0 job(s) are failed.

All log files are located in folder: /cbica/projects/BABS/babs_demo/my_BABS_project/analysis/logs

Now, you can submit all other jobs by specifying --all:

$ babs-submit --project-root $PWD --all

You can again call babs-status --project-root $PWD to check status. If those 5 jobs are pending (submitted but not yet run by the cluster), you'll see:

Did not request resubmit based on job states (no `--resubmit`).

Job status:
There are in total of 6 jobs to complete.
6 job(s) have been submitted; 0 job(s) haven't been submitted.
Among submitted jobs,
1 job(s) are successfully finished;
5 job(s) are pending;
0 job(s) are running;
0 job(s) are failed.

All log files are located in folder: /cbica/projects/BABS/babs_demo/my_BABS_project/analysis/logs

If some jobs are running or have failed, you'll see non-zero numbers on the 'running' or 'failed' lines.

If all jobs have finished successfully, you'll see:

Did not request resubmit based on job states (no `--resubmit`).

Job status:
There are in total of 6 jobs to complete.
6 job(s) have been submitted; 0 job(s) haven't been submitted.
Among submitted jobs,
6 job(s) are successfully finished;
All jobs are completed!

All log files are located in folder: /cbica/projects/BABS/babs_demo/my_BABS_project/analysis/logs

Step 4. After jobs have finished

Step 4.1. Use babs-merge to merge all results and provenance

After all jobs have finished successfully, we will merge all the results and provenance. Each job was executed on a different branch, so we must merge them together into the mainline branch.
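If you're curious what these per-job branches look like, one optional way to peek is to clone the output RIA to a scratch location and list its branches (a sketch; the /tmp path is just an example, and the `data` alias was created during babs-init):

$ datalad clone ria+file://${PWD}/output_ria#~data /tmp/peek_output_ria
$ git -C /tmp/peek_output_ria branch -a | grep job-   # branches are named like job-<jobID>-sub-XX-ses-X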

We now run babs-merge in the root directory of my_BABS_project:

$ babs-merge --project-root $PWD

If it was successful, you'll see this message at the end:

`babs-merge` was successful!
Full printed messages from babs-merge
Cloning output RIA to 'merge_ds'...
[INFO   ] Configure additional publication dependency on "output-storage"                                                                           
configure-sibling(ok): . (sibling)
install(ok): /cbica/projects/BABS/babs_demo/my_BABS_project/merge_ds (dataset)
action summary:
  configure-sibling (ok: 1)
  install (ok: 1)

Listing all branches in output RIA...

Finding all valid job branches to merge...
Git default branch's name of output RIA is: 'master'

Merging valid job branches chunk by chunk...
Total number of job branches to merge = 6
Chunk size (number of job branches per chunk) = 2000
--> Number of chunks = 1
Merging chunk #1 (total of 1 chunk[s] to merge)...
Fast-forwarding to: remotes/origin/job-4639278-sub-01-ses-A
Trying simple merge with remotes/origin/job-4648997-sub-01-ses-B
Trying simple merge with remotes/origin/job-4649000-sub-01-ses-C
Trying simple merge with remotes/origin/job-4649003-sub-02-ses-A
Trying simple merge with remotes/origin/job-4649006-sub-02-ses-B
Trying simple merge with remotes/origin/job-4649009-sub-02-ses-D
Merge made by the 'octopus' strategy.
 sub-01_ses-A_toybidsapp-0-0-7.zip | 1 +
 sub-01_ses-B_toybidsapp-0-0-7.zip | 1 +
 sub-01_ses-C_toybidsapp-0-0-7.zip | 1 +
 sub-02_ses-A_toybidsapp-0-0-7.zip | 1 +
 sub-02_ses-B_toybidsapp-0-0-7.zip | 1 +
 sub-02_ses-D_toybidsapp-0-0-7.zip | 1 +
 6 files changed, 6 insertions(+)
 create mode 120000 sub-01_ses-A_toybidsapp-0-0-7.zip
 create mode 120000 sub-01_ses-B_toybidsapp-0-0-7.zip
 create mode 120000 sub-01_ses-C_toybidsapp-0-0-7.zip
 create mode 120000 sub-02_ses-A_toybidsapp-0-0-7.zip
 create mode 120000 sub-02_ses-B_toybidsapp-0-0-7.zip
 create mode 120000 sub-02_ses-D_toybidsapp-0-0-7.zip


Pushing merging actions to output RIA...
Enumerating objects: 8, done.
Counting objects: 100% (8/8), done.
Delta compression using up to 40 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (2/2), 522 bytes | 522.00 KiB/s, done.
Total 2 (delta 1), reused 0 (delta 0), pack-reused 0
To /cbica/projects/BABS/babs_demo/my_BABS_project/output_ria/d5f/7c9f2-1b55-4bc9-ada8-ca296b2c3268
   5996f87..4d737d8  master -> master

dead here (recording state in git...)
ok
(recording state in git...)

[INFO] Determine push target 
[INFO] Push refspecs 
[INFO] Determine push target 
[INFO] Push refspecs 
[INFO] Update availability information 
[INFO] Start enumerating objects 
[INFO] Start counting objects 
[INFO] Start compressing objects 
[INFO] Start writing objects 
[INFO] Finished push of Dataset(/gpfs/fs001/cbica/projects/BABS/babs_demo/my_BABS_project/merge_ds) 
[INFO] Finished push of Dataset(/gpfs/fs001/cbica/projects/BABS/babs_demo/my_BABS_project/merge_ds) 
publish(ok): . (dataset) [refs/heads/git-annex->origin:refs/heads/git-annex 8fd488d..d8a14a5]
action summary:
  publish (notneeded: 1, ok: 1)


`babs-merge` was successful!

Now you're ready to consume the results.

Step 4.2. Consume results

To consume the results, you should not access the output RIA store or merge_ds directories inside the BABS project. Instead, clone the output RIA as another folder (e.g., called my_BABS_project_outputs) to a location external to the BABS project:

$ cd ..   # Now you should be in folder `babs_demo`, where `my_BABS_project` is located
$ datalad clone \
    ria+file://${PWD}/my_BABS_project/output_ria#~data \
    my_BABS_project_outputs

You'll see:

[INFO   ] Configure additional publication dependency on "output-storage"
configure-sibling(ok): . (sibling)
install(ok): /cbica/projects/BABS/babs_demo/my_BABS_project_outputs (dataset)
action summary:
  configure-sibling (ok: 1)
  install (ok: 1)

Let's go into this new folder and see what's inside:

$ cd my_BABS_project_outputs
$ ls

You'll see:

CHANGELOG.md                                sub-01_ses-B_toybidsapp-0-0-7.zip@
code/                                       sub-01_ses-C_toybidsapp-0-0-7.zip@
containers/                                 sub-02_ses-A_toybidsapp-0-0-7.zip@
inputs/                                     sub-02_ses-B_toybidsapp-0-0-7.zip@
README.md                                   sub-02_ses-D_toybidsapp-0-0-7.zip@
sub-01_ses-A_toybidsapp-0-0-7.zip@

As you can see, each session's results have been saved in a zip file. Before unzipping a zip file, you need to get its content first:

$ datalad get sub-01_ses-A_toybidsapp-0-0-7.zip
$ unzip sub-01_ses-A_toybidsapp-0-0-7.zip

You'll see printed messages like this:

# from `datalad get`:
get(ok): sub-01_ses-A_toybidsapp-0-0-7.zip (file) [from output-storage...]

# from unzip:
Archive:  sub-01_ses-A_toybidsapp-0-0-7.zip
   creating: toybidsapp/
 extracting: toybidsapp/num_nonhidden_files.txt

From the zip file, you get a folder called toybidsapp.

$ cd toybidsapp
$ ls

In this folder, there is a file called num_nonhidden_files.txt. This is the result from the toy BIDS App: the number of non-hidden files for this subject. Note that for a raw BIDS dataset, the toy BIDS App counts at the subject level, even though the current dataset is a multi-session dataset.

$ cat num_nonhidden_files.txt
67

Here, 67 is the expected number for sub-01 (which you're looking at), and 56 is the expected number for sub-02. This means the toy BIDS App and BABS ran as expected :).