Example walkthrough
In this example walkthrough, we will use toy BIDS data and a toy BIDS App to demonstrate how to use BABS. We use SGE clusters as examples here; adaptations to Slurm clusters will also be covered.
By following the installation page,
you should have successfully installed BABS and its dependent software
(DataLad, Git, git-annex, datalad-container)
on the cluster, in a conda environment called babs.
In addition, because the toy BIDS data you'll use is on OSF, you also need to install datalad-osf.
Here is the list of software versions we used to prepare this walkthrough. It is a good idea to use the versions at or above the versions listed:
$ python --version
Python 3.9.16
$ datalad --version
datalad 0.18.3
$ git --version
git version 2.34.1
$ git-annex version
git-annex version: 10.20230215-gd24914f2a
$ datalad containers-add --version
datalad_container 1.1.9
$ datalad osf-credentials --version
datalad_osf 0.2.3.1
We used BABS version 0.0.3 to prepare this example walkthrough.
We encourage you to use the latest BABS version available on PyPI.
There might be minor differences in the printed messages or generated code;
however, you can still follow the same steps described here.
To check your BABS version, you can run this command:
$ pip show babs
Name: babs
Version: x.x.x # e.g., 0.0.3
...
Let's create a folder called babs_demo
in your home directory
as the working directory for this example walkthrough:
$ conda activate babs
$ mkdir -p ~/babs_demo
$ cd ~/babs_demo
Step 0. Ensure dependencies and data access
Notes: This Step 0 is only required for clusters whose compute nodes have no Internet connection; otherwise, you may skip this step. However, we do recommend going through it if this is your first time running this example walkthrough.
Before you start, you can test whether you have all the dependencies
(including datalad-osf) installed properly.
Let's try installing the toy, multi-session BIDS dataset you'll use in this example walkthrough:
$ datalad clone https://osf.io/w2nu3/ raw_BIDS_multi-ses
The printed messages should look like below.
Note that the absolute path to babs_demo (i.e., /cbica/projects/BABS/babs_demo)
will probably differ from yours because you are on a different cluster, which is fine:
install(ok): /cbica/projects/BABS/babs_demo/raw_BIDS_multi-ses (dataset)
Why do I also see [INFO] messages?
It's normal to see additional messages from DataLad like below:
[INFO ] Remote origin uses a protocol not supported by git-annex; setting annex-ignore
There are two subjects (sub-01 and sub-02) and six sessions in this toy dataset.
Now let's try getting a file's content:
$ cd raw_BIDS_multi-ses
$ datalad get sub-01/ses-A/anat/sub-01_ses-A_T1w.nii.gz
You should see:
get(ok): sub-01/ses-A/anat/sub-01_ses-A_T1w.nii.gz (file) [from osf-storage...]
You can now view this image in an image viewer. Note that the intensities of the images in this dataset have been zeroed out, so it's normal to see all-black images.
If there is no Internet connection on compute nodes
In the later steps, jobs for executing the BIDS App will run on compute nodes and will fetch the file contents of the input BIDS dataset. Because the input BIDS dataset used in this example walkthrough is available on OSF, by default the jobs will fetch the file contents from OSF over the Internet. This is a problem for clusters whose compute nodes have no Internet connection.
If that is the case for your cluster, please fetch all the file contents now to avoid issues when running the jobs:
$ datalad get *
You should see these printed messages from datalad at the end:
action summary:
get (notneeded: 1, ok: 47)
Then, please skip the step in the next code block below, i.e., do NOT drop file content or remove the local copy of this dataset.
By now, you have made sure you can successfully install this dataset and get the file contents. Now you can drop the file contents and remove this local copy of the dataset, as you can directly use its OSF link as the input dataset for BABS:
$ datalad drop sub-01/ses-A/anat/sub-01_ses-A_T1w.nii.gz
$ cd ..
$ datalad remove -d raw_BIDS_multi-ses
Printed messages you'll see
# from `datalad drop`:
drop(ok): sub-01/ses-A/anat/sub-01_ses-A_T1w.nii.gz (file)
# from `datalad remove`:
uninstall(ok): . (dataset)
Step 1. Get prepared
There are three things required by BABS as input:
DataLad dataset of BIDS dataset(s);
DataLad dataset of containerized BIDS App;
A YAML file regarding how the BIDS App should be executed.
Step 1.1. Prepare DataLad dataset(s) of BIDS dataset(s)
As mentioned above, you will use a toy, multi-session BIDS dataset available on OSF: https://osf.io/w2nu3/. You'll directly copy this link as the path to the input dataset, so no extra work needs to be done here.
If there is no Internet connection on compute nodes
When providing the path to the input BIDS dataset, please do not use the OSF http link; instead, please use the path to the local copy of this dataset. We will provide more guides when we reach that step.
Step 1.2. Prepare DataLad dataset of containerized BIDS App
For the BIDS App, we have prepared a toy BIDS App that performs a simple task: if the input dataset is a raw (unzipped) BIDS dataset, the toy BIDS App counts the non-hidden files in a subject's folder. Note that even if the input dataset is a multi-session dataset, it will still count at the subject level (instead of the session level).
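To make "count non-hidden files" concrete, here is a minimal sketch of the same computation in plain shell. This is only an illustration against a hypothetical mock folder (demo_sub and its files are made up), not the App's actual code:

```shell
# Hypothetical mock subject folder, for illustration only:
mkdir -p demo_sub/ses-A/anat
touch demo_sub/ses-A/anat/T1w.nii.gz demo_sub/ses-A/scans.tsv demo_sub/.hidden

# Count regular files whose path has no component starting with ".":
find demo_sub -type f -not -path '*/.*' | wc -l   # prints 2
```

The count covers everything under the subject's folder, which is why a multi-session dataset is still counted at the subject level.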
You now need to pull our toy BIDS App as a Singularity image (the latest version is 0.0.7):
$ cd ~/babs_demo
$ singularity build \
toybidsapp-0.0.7.sif \
docker://pennlinc/toy_bids_app:0.0.7
Now you should see the file toybidsapp-0.0.7.sif in the current directory.
Having trouble building this Singularity image?
It might be because the version of Singularity you're using is too old.
You can check your Singularity version via singularity --version.
We've tested that these versions work fine: singularity-ce version 3.9.5 and apptainer version 1.1.8-1.el7.
Then create a DataLad dataset of this container (i.e., let DataLad track this Singularity image):
I'm confused - why is the container another DataLad dataset?
Here, "DataLad dataset of container" means "a collection of container image(s) in a folder tracked by DataLad". Like the DataLad dataset of the input BIDS dataset, it's tracked by DataLad; but unlike the input BIDS dataset, the "DataLad dataset of container" contains container image(s), and it won't be processed.
$ datalad create -D "toy BIDS App" toybidsapp-container
$ cd toybidsapp-container
$ datalad containers-add \
--url ${PWD}/../toybidsapp-0.0.7.sif \
toybidsapp-0-0-7
Printed messages you'll see
# from `datalad create`:
create(ok): /cbica/projects/BABS/babs_demo/toybidsapp-container (dataset)
# from `datalad containers-add`:
[INFO ] Copying local file /cbica/projects/BABS/babs_demo/toybidsapp-container/../toybidsapp-0.0.7.sif to /cbica/projects/BABS/babs_demo/toybidsapp-container/.datalad/environments/toybidsapp-0-0-7/image
add(ok): .datalad/environments/toybidsapp-0-0-7/image (file)
add(ok): .datalad/config (file)
save(ok): . (dataset)
action summary:
add (ok: 2)
save (ok: 1)
add(ok): .datalad/environments/toybidsapp-0-0-7/image (file)
add(ok): .datalad/config (file)
save(ok): . (dataset)
containers_add(ok): /cbica/projects/BABS/babs_demo/toybidsapp-container/.datalad/environments/toybidsapp-0-0-7/image (file)
action summary:
add (ok: 2)
containers_add (ok: 1)
save (ok: 1)
Now, the DataLad dataset containing the toy BIDS App container, toybidsapp-container, is ready to use.
As the sif file has been copied into toybidsapp-container,
you can remove the original sif file:
$ cd ..
$ rm toybidsapp-0.0.7.sif
Step 1.3. Prepare a YAML file for the BIDS App
Finally, you'll prepare a YAML file that instructs BABS on how to run the BIDS App. Below is an example YAML file for the toy BIDS App:
 1  # Arguments in `singularity run`:
 2  singularity_run:
 3      --no-zipped: ""
 4      --dummy: "2"
 5      -v: ""
 6
 7  # Output foldername(s) to be zipped, and the BIDS App version to be included in the zip filename(s):
 8  zip_foldernames:
 9      toybidsapp: "0-0-7"
10
11  # How much cluster resources it needs:
12  cluster_resources:
13      interpreting_shell: /bin/bash
14      hard_memory_limit: 2G
15
16  # Necessary commands to be run first:
17  script_preamble: |
18      source ${CONDA_PREFIX}/bin/activate babs    # for Penn Med CUBIC cluster
19
20  # Where to run the jobs:
21  job_compute_space: "${CBICA_TMPDIR}"    # for Penn Med CUBIC cluster tmp space
As you can see, there are several sections in this YAML file.
Here, in section singularity_run,
both --dummy and -v are dummy arguments to this toy BIDS App:
argument --dummy can take any value after it, whereas argument -v does not take a value.
Here we use these arguments to show examples of:
how to add values after arguments: e.g., --dummy: "2";
how to add arguments without values: e.g., --no-zipped: "" and -v: "";
and that it's totally fine to mix flags with the prefix -- and flags with the prefix -.
Section zip_foldernames tells BABS to zip the output folder named toybidsapp
into a zip file named ${sub-id}_${ses-id}_toybidsapp-0-0-7.zip
for each session of each subject,
where ${sub-id} is a subject ID and ${ses-id} is a session ID.
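To see how that filename is composed, here is a quick sketch; the shell variables subid and sesid are only for illustration (BABS fills these in per job):

```shell
# The App name and version come from the `zip_foldernames` entry above;
# the subject/session IDs vary per job (illustrative values here):
subid=sub-01
sesid=ses-A
zipname="${subid}_${sesid}_toybidsapp-0-0-7.zip"
echo "${zipname}"   # prints sub-01_ses-A_toybidsapp-0-0-7.zip
```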
You can copy the above content and save it as a file called config_toybidsapp_demo.yaml in the ~/babs_demo directory.
How to copy the above content using Vim with correct indentation?
After copying the above content and initializing a new file using vim, you need to enter:
:set paste
hit the Enter key, hit i to start INSERT (paste) mode, then paste the above content into the file. Otherwise, you'll get wrong indentation.
After pasting, hit the Escape key and enter:
:set nopaste
and hit the Enter key to turn off pasting.
You can now save this file by typing :w. Close the file by entering :q and hitting the Enter key.
There are several lines (highlighted above) that require customization based on the cluster you are using:
Section cluster_resources:
Check whether line #13, interpreting_shell, looks appropriate for your cluster. Some Slurm clusters may recommend adding -l at the end, i.e.: interpreting_shell: "/bin/bash -l". See Section cluster_resources for more explanations about this line.
For Slurm clusters, if you would like to use specific partition(s): as requesting a partition is currently not a pre-defined key in BABS, you can use customized_text instead, and add lines #3-4 highlighted in the block below:
1  cluster_resources:
2      ...
3      customized_text: |
4          #SBATCH -p <partition_names>
Please replace <partition_names> with the partition name(s) you would like to use, and please replace ... with other lines with pre-defined keys from BABS, such as interpreting_shell and hard_memory_limit.
If needed, you may add requests for other resources, e.g., a runtime limit of 20 min (hard_runtime_limit: "00:20:00") or temporary disk space of 20 GB (temporary_disk_space: 20G), or even resources without pre-defined keys from BABS. See Section cluster_resources for how to do so.
For Penn Medicine CUBIC cluster only:
You may need to add lines #4-5 highlighted in the block below to avoid some compute nodes that currently have issues with file locks:
1  cluster_resources:
2      interpreting_shell: /bin/bash
3      hard_memory_limit: 2G
4      customized_text: |
5          #$ -l hostname=!compute-fed*
Section script_preamble:
You might need to adjust the highlighted line #18, the source command, based on your cluster and conda environment name.
You might need to add another line to module load any necessary modules, such as singularity. This section will look like this after you add it:
script_preamble: |
    source ${CONDA_PREFIX}/bin/activate babs
    module load xxxx
For more, please see: Section script_preamble.
Section job_compute_space:
You need to change "${CBICA_TMPDIR}" to the temporary compute space available on your cluster where you will run the jobs, e.g., "/path/to/some_temporary_compute_space". Here "${CBICA_TMPDIR}" is for the Penn Medicine CUBIC cluster only.
For more, please see: Section job_compute_space.
By now, you have prepared these in the ~/babs_demo folder:
config_toybidsapp_demo.yaml
toybidsapp-container/
If there is no Internet connection on compute nodes
In this folder, you should also see the local copy of the input BIDS dataset, raw_BIDS_multi-ses.
Now you can start to use BABS for data analysis.
Step 2. Create a BABS project
Step 2.1. Use babs-init to create a BABS project
A BABS project is the place where all the inputs are cloned, all scripts are generated, and results and provenance are saved. An example babs-init command is as follows:
 1  $ cd ~/babs_demo
 2  $ babs-init \
 3      --where_project ${PWD} \
 4      --project_name my_BABS_project \
 5      --input BIDS https://osf.io/w2nu3/ \
 6      --container_ds ${PWD}/toybidsapp-container \
 7      --container_name toybidsapp-0-0-7 \
 8      --container_config_yaml_file ${PWD}/config_toybidsapp_demo.yaml \
 9      --type_session multi-ses \
10      --type_system sge
If there is no Internet connection on compute nodes
Please replace line #5 with --input BIDS /path/to/cloned_input_BIDS_dataset,
replacing /path/to/cloned_input_BIDS_dataset with the correct path
to the local copy of the input BIDS dataset,
e.g., ${PWD}/raw_BIDS_multi-ses.
Here you will create a BABS project called my_BABS_project in directory ~/babs_demo.
The input dataset will be called BIDS, and you can simply provide the OSF link as its path (line #5).
For the container, you will use the DataLad-tracked toybidsapp-container and the YAML file you just prepared (lines #6-8).
It is important to make sure the string toybidsapp-0-0-7 used in --container_name (line #7) is consistent with the image name you specified when preparing the DataLad dataset of the container (datalad containers-add).
As this input dataset is a multi-session dataset, you should specify this as --type_session multi-ses (line #9).
Finally, please change the cluster system type --type_system (highlighted line #10) to yours; currently BABS supports sge and slurm.
If babs-init succeeded, you should see this message at the end:
`babs-init` was successful!
Full printed messages from babs-init
DataLad version: 0.18.3
project_root of this BABS project: /cbica/projects/BABS/babs_demo/my_BABS_project
type of data of this BABS project: multi-ses
job scheduling system of this BABS project: sge
Creating `analysis` folder (also a datalad dataset)...
[INFO ] Running procedure cfg_yoda
[INFO ] == Command start (output follows) =====
[INFO ] == Command exit (modification check follows) =====
run(ok): /cbica/projects/BABS/babs_demo/my_BABS_project/analysis (dataset) [/cbica/projects/BABS/miniconda3/envs/bab...]
create(ok): /cbica/projects/BABS/babs_demo/my_BABS_project/analysis (dataset)
action summary:
create (ok: 1)
run (ok: 1)
add(ok): .gitignore (file)
save(ok): . (dataset)
action summary:
add (ok: 1)
save (ok: 1)
Save configurations of BABS project in a yaml file ...
Path to this yaml file will be: 'analysis/code/babs_proj_config.yaml'
add(ok): code/babs_proj_config.yaml (file)
save(ok): . (dataset)
action summary:
add (ok: 1)
save (ok: 1)
Creating output and input RIA...
[INFO ] create siblings 'output' and 'output-storage' ...
[INFO ] Fetching updates for Dataset(/cbica/projects/BABS/babs_demo/my_BABS_project/analysis)
update(ok): . (dataset)
update(ok): . (dataset)
[INFO ] Configure additional publication dependency on "output-storage"
configure-sibling(ok): . (sibling)
create-sibling-ria(ok): /cbica/projects/BABS/babs_demo/my_BABS_project/analysis (dataset)
action summary:
configure-sibling (ok: 1)
create-sibling-ria (ok: 1)
update (ok: 1)
[INFO ] create sibling 'input' ...
[INFO ] Fetching updates for Dataset(/cbica/projects/BABS/babs_demo/my_BABS_project/analysis)
update(ok): . (dataset)
update(ok): . (dataset)
configure-sibling(ok): . (sibling)
create-sibling-ria(ok): /cbica/projects/BABS/babs_demo/my_BABS_project/analysis (dataset)
action summary:
configure-sibling (ok: 1)
create-sibling-ria (ok: 1)
update (ok: 1)
Registering the input dataset(s)...
Cloning input dataset #1: 'BIDS'
[INFO ] Remote origin uses a protocol not supported by git-annex; setting annex-ignore
install(ok): inputs/data/BIDS (dataset)
add(ok): inputs/data/BIDS (dataset)
add(ok): .gitmodules (file)
save(ok): . (dataset)
add(ok): .gitmodules (file)
save(ok): . (dataset)
action summary:
add (ok: 3)
install (ok: 1)
save (ok: 2)
Checking whether each input dataset is a zipped or unzipped dataset...
input dataset 'BIDS' is considered as an unzipped dataset.
Performing sanity check for any unzipped input dataset...
add(ok): code/babs_proj_config.yaml (file)
save(ok): . (dataset)
action summary:
add (ok: 1)
save (ok: 1)
Adding the container as a sub-dataset of `analysis` dataset...
install(ok): containers (dataset)
add(ok): containers (dataset)
add(ok): .gitmodules (file)
save(ok): . (dataset)
add(ok): .gitmodules (file)
save(ok): . (dataset)
action summary:
add (ok: 3)
install (ok: 1)
save (ok: 2)
Generating a bash script for running container and zipping the outputs...
This bash script will be named as `toybidsapp-0-0-7_zip.sh`
/cbica/projects/BABS/miniconda3/envs/babs/lib/python3.9/site-packages/babs/utils.py:440: UserWarning: Usually BIDS App depends on TemplateFlow, but environment variable `TEMPLATEFLOW_HOME` was not set up. Therefore, BABS will not bind its directory or inject this environment variable into the container when running the container. This may cause errors.
warnings.warn("Usually BIDS App depends on TemplateFlow,"
Below is the generated `singularity run` command:
singularity run --cleanenv \
-B ${PWD} \
containers/.datalad/environments/toybidsapp-0-0-7/image \
inputs/data/BIDS \
outputs \
participant \
--no-zipped \
--dummy 2 \
-v \
--participant-label "${subid}"
add(ok): code/toybidsapp-0-0-7_zip.sh (file)
save(ok): . (dataset)
action summary:
add (ok: 1)
save (ok: 1)
Generating a bash script for running jobs at participant (or session) level...
This bash script will be named as `participant_job.sh`
add(ok): code/check_setup/call_test_job.sh (file)
add(ok): code/check_setup/test_job.py (file)
add(ok): code/participant_job.sh (file)
save(ok): . (dataset)
action summary:
add (ok: 3)
save (ok: 1)
Determining the list of subjects (and sessions) to analyze...
Did not provide `list_sub_file`. Will look into the first input dataset to get the initial inclusion list.
Did not provide `required files` in `container_config_yaml_file`. Not to filter subjects (or sessions)...
The final list of included subjects and sessions has been saved to this CSV file: /cbica/projects/BABS/babs_demo/my_BABS_project/analysis/code/sub_ses_final_inclu.csv
add(ok): code/sub_ses_final_inclu.csv (file)
save(ok): . (dataset)
action summary:
add (ok: 1)
save (ok: 1)
Generating a template for job submission calls...
The template text file will be named as `submit_job_template.yaml`.
add(ok): code/check_setup/submit_test_job_template.yaml (file)
add(ok): code/submit_job_template.yaml (file)
save(ok): . (dataset)
action summary:
add (ok: 2)
save (ok: 1)
Final steps...
DataLad dropping input dataset's contents...
action summary:
drop (notneeded: 2)
Updating input and output RIA...
publish(ok): . (dataset) [refs/heads/master->input:refs/heads/master [new branch]]
publish(ok): . (dataset) [refs/heads/git-annex->input:refs/heads/git-annex [new branch]]
action summary:
publish (ok: 2)
publish(ok): . (dataset) [refs/heads/master->output:refs/heads/master [new branch]]
publish(ok): . (dataset) [refs/heads/git-annex->output:refs/heads/git-annex [new branch]]
action summary:
publish (ok: 2)
Adding an alias 'data' to output RIA store...
`babs-init` was successful!
Warning regarding TemplateFlow? Fine for the toy BIDS App!
You may receive this warning from babs-init if you did not set up the environment variable $TEMPLATEFLOW_HOME:
UserWarning: Usually BIDS App depends on TemplateFlow, but environment variable `TEMPLATEFLOW_HOME` was not set up.
Therefore, BABS will not bind its directory or inject this environment variable into the container when running the container. This may cause errors.
This is totally fine for the toy BIDS App, which doesn't use TemplateFlow. However, a lot of BIDS Apps do use it, so make sure you set it up when you use those BIDS Apps.
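If you do need TemplateFlow later, one way to set this up is shown below; this is a sketch under the assumption that you want the TemplateFlow data under your home directory (the exact path is your choice):

```shell
# Hypothetical setup - choose any writable path you like:
export TEMPLATEFLOW_HOME="${HOME}/templateflow"
mkdir -p "${TEMPLATEFLOW_HOME}"
```

Put the export line somewhere it takes effect before you rerun babs-init (e.g., your shell profile), so that BABS can see the variable.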
It's very important to check whether the generated singularity run command is what you desire.
The command below can be found in the printed messages from babs-init:
singularity run --cleanenv \
-B ${PWD} \
containers/.datalad/environments/toybidsapp-0-0-7/image \
inputs/data/BIDS \
outputs \
participant \
--no-zipped \
--dummy 2 \
-v \
--participant-label "${subid}"
As you can see, BABS has automatically handled the positional arguments of the BIDS App (i.e., input directory, output directory, and analysis level - 'participant'). --participant-label is covered by BABS, too.
It's also important to check if the generated directives for job submission are what you desire. You can get them via:
$ cd ~/babs_demo/my_BABS_project # make sure you're in `my_BABS_project` folder
$ head analysis/code/participant_job.sh
The first several lines, which start with # and come before the line # Script preambles:, are directives for job submission.
Note that different types of cluster systems (e.g., SGE, Slurm) produce different generated directives.
In addition, depending on the BABS version, you'll see slightly different directives, too.
If you used the YAML file above without further modification, the generated directives would be:
If on a Slurm cluster + using BABS version >0.0.3, you'll see:
#!/bin/bash
#SBATCH --mem=2G
If on an SGE cluster + using BABS version >0.0.3, you'll see:
#!/bin/bash
#$ -l h_vmem=2G
If on an SGE cluster + using BABS version 0.0.3, you'll see:
#!/bin/bash
#$ -S /bin/bash
#$ -l h_vmem=2G
What's inside the created BABS project my_BABS_project?
.
├── analysis
│ ├── CHANGELOG.md
│ ├── code
│ │ ├── babs_proj_config.yaml
│ │ ├── babs_proj_config.yaml.lock
│ │ ├── check_setup
│ │ ├── participant_job.sh
│ │ ├── README.md
│ │ ├── submit_job_template.yaml
│ │ ├── sub_ses_final_inclu.csv
│ │ └── toybidsapp-0-0-7_zip.sh
│ ├── containers
│ ├── inputs
│ │ └── data
│ ├── logs
│ └── README.md
├── input_ria
└── output_ria
Here, analysis is a DataLad dataset that includes generated scripts in code/,
a cloned container DataLad dataset containers/, and a cloned input dataset in inputs/data.
The input and output RIA stores (input_ria and output_ria) are DataLad siblings of the analysis dataset.
When running jobs, inputs are cloned from the input RIA store,
and results and provenance are pushed to the output RIA store.
Step 2.2. Use babs-check-setup to make sure it's good to go
It's important to let BABS check that the project has been initialized correctly. In addition, it's often a good idea to run a test job to make sure that the environment and cluster resources specified in the YAML file are workable.
Note that from this step on in this example walkthrough, unless instructed otherwise,
all BABS functions will be called from where the BABS project is located: ~/babs_demo/my_BABS_project.
This is so that you can directly use ${PWD} for the argument --project-root.
Therefore, please make sure you switch to this directory before calling them.
$ cd ~/babs_demo/my_BABS_project # make sure you're in `my_BABS_project` folder
$ babs-check-setup \
--project-root ${PWD} \
--job-test
It might take a bit of time to finish, depending on how busy your cluster is and how many resources you requested in the YAML file - in this example, you only requested a minimal amount of resources.
You'll see this message at the end if babs-check-setup was successful:
`babs-check-setup` was successful!
Before moving on, please make sure you review the summarized information about the designated environment, especially the version numbers:
Below is the information of designated environment and temporary workspace:
workspace_writable: true
which_python: '/cbica/projects/BABS/miniconda3/envs/babs/bin/python'
version:
datalad: 'datalad 0.18.3'
git: 'git version 2.34.1'
git-annex: 'git-annex version: 10.20230215-gd24914f2a'
datalad_containers: 'datalad_container 1.1.9'
Full printed messages from babs-check-setup
Will check setups of BABS project located at: /cbica/projects/BABS/babs_demo/my_BABS_project
Will submit a test job for testing; will take longer time.
Below is the configuration information saved during `babs-init` in file 'analysis/code/babs_proj_config.yaml':
type_session: multi-ses
type_system: sge
input_ds:
$INPUT_DATASET_#1:
name: BIDS
path_in: https://osf.io/w2nu3/
path_data_rel: inputs/data/BIDS
is_zipped: false
container:
name: toybidsapp-0-0-7
path_in: /cbica/projects/BABS/babs_demo/toybidsapp-container
Checking the BABS project itself...
✓ All good!
Check status of 'analysis' DataLad dataset...
nothing to save, working tree clean
✓ All good!
Checking input dataset(s)...
✓ All good!
Checking container datalad dataset...
✓ All good!
Checking `analysis/code/` folder...
✓ All good!
Checking input and output RIA...
Datalad dataset `analysis`'s siblings:
.: here(+) [git]
.: input(-) [/cbica/projects/BABS/babs_demo/my_BABS_project/input_ria/d5f/7c9f2-1b55-4bc9-ada8-ca296b2c3268 (git)]
.: output-storage(+) [ora]
.: output(-) [/cbica/projects/BABS/babs_demo/my_BABS_project/output_ria/d5f/7c9f2-1b55-4bc9-ada8-ca296b2c3268 (git)]
✓ All good!
Submitting a test job, will take a while to finish...
Although the script will be submitted to the cluster to run, this job will not run the BIDS App; instead, this test job will gather setup information in the designated environment and will make sure jobs can finish successfully on current cluster.
Test job has been submitted (job ID: 4635991).
Will check the test job's status every 1 min...
2023-05-05 14:36:28.253463: Test job is pending (`qw`)...
2023-05-05 14:37:28.628330: Test job is pending (`qw`)...
2023-05-05 14:38:28.777482: Test job is running (`r`)...
2023-05-05 14:39:29.199464: Test job is successfully finished!
Below is the information of designated environment and temporary workspace:
workspace_writable: true
which_python: '/cbica/projects/BABS/miniconda3/envs/babs/bin/python'
version:
datalad: 'datalad 0.18.3'
git: 'git version 2.34.1'
git-annex: 'git-annex version: 10.20230215-gd24914f2a'
datalad_containers: 'datalad_container 1.1.9'
Please check if above versions are the ones you hope to use! If not, please change the version in the designated environment, or change the designated environment you hope to use in `--container-config-yaml-file` and rerun `babs-init`.
✓ All good in test job!
`babs-check-setup` was successful!
Now it's ready for job submissions.
Step 3. Submit jobs and check job status
We'll iteratively use babs-submit and babs-status to submit jobs and check job status.
We first use babs-status to check the number of jobs we initially expect to finish successfully.
In this example walkthrough, as no initial list was provided,
BABS determines this number based on the number of sessions in the input BIDS dataset.
We did not request extra filtering (based on required files) in our YAML file either,
so BABS will submit one job for each session.
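As a quick cross-check, that count is simply the number of rows in the inclusion list saved by babs-init (analysis/code/sub_ses_final_inclu.csv). The sketch below uses an inline copy of this walkthrough's subjects and sessions; the filename sub_ses_demo.csv and the exact column names are assumptions for illustration:

```shell
# Inline copy of this walkthrough's subject/session list (column names assumed):
cat > sub_ses_demo.csv <<'EOF'
sub_id,ses_id
sub-01,ses-A
sub-01,ses-B
sub-01,ses-C
sub-02,ses-A
sub-02,ses-B
sub-02,ses-D
EOF

# One job per row, header excluded:
tail -n +2 sub_ses_demo.csv | wc -l   # prints 6
```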
$ cd ~/babs_demo/my_BABS_project # make sure you're in `my_BABS_project` folder
$ babs-status --project-root $PWD
You'll see:
Did not request resubmit based on job states (no `--resubmit`).
Job status:
There are in total of 6 jobs to complete.
0 job(s) have been submitted; 6 job(s) haven't been submitted.
Let's use babs-submit to submit one job and see if it finishes successfully.
By default, babs-submit only submits one job.
If you would like to submit all jobs, you can use the --all argument.
$ babs-submit --project-root $PWD
You'll see something like this (the job ID will probably be different):
Job for sub-01, ses-A has been submitted (job ID: 4639278).
sub_id ses_id has_submitted job_id job_state_category job_state_code duration is_done is_failed
0 sub-01 ses-A True 4639278 NaN NaN NaN False NaN \
1 sub-01 ses-B False -1 NaN NaN NaN False NaN
2 sub-01 ses-C False -1 NaN NaN NaN False NaN
3 sub-02 ses-A False -1 NaN NaN NaN False NaN
4 sub-02 ses-B False -1 NaN NaN NaN False NaN
5 sub-02 ses-D False -1 NaN NaN NaN False NaN
log_filename last_line_stdout_file alert_message job_account
0 toy_sub-01_ses-A.*4639278 NaN NaN NaN
1 NaN NaN NaN NaN
2 NaN NaN NaN NaN
3 NaN NaN NaN NaN
4 NaN NaN NaN NaN
5 NaN NaN NaN NaN
You can check the job status via babs-status:
$ babs-status --project-root $PWD
If it's successfully finished, you'll see:
Did not request resubmit based on job states (no `--resubmit`).
Job status:
There are in total of 6 jobs to complete.
1 job(s) have been submitted; 5 job(s) haven't been submitted.
Among submitted jobs,
1 job(s) are successfully finished;
0 job(s) are pending;
0 job(s) are running;
0 job(s) are failed.
All log files are located in folder: /cbica/projects/BABS/babs_demo/my_BABS_project/analysis/logs
Now, you can submit all the other jobs by specifying --all:
$ babs-submit --project-root $PWD --all
You can again call babs-status --project-root $PWD to check the status.
If those 5 jobs are pending (submitted but not yet run by the cluster), you'll see:
 1  Did not request resubmit based on job states (no `--resubmit`).
 2
 3  Job status:
 4  There are in total of 6 jobs to complete.
 5  6 job(s) have been submitted; 0 job(s) haven't been submitted.
 6  Among submitted jobs,
 7  1 job(s) are successfully finished;
 8  5 job(s) are pending;
 9  0 job(s) are running;
10  0 job(s) are failed.
11
12  All log files are located in folder: /cbica/projects/BABS/babs_demo/my_BABS_project/analysis/logs
If some jobs are running or have failed, you'll see non-zero numbers in line #9 or #10.
If all jobs have finished successfully, you'll see:
Did not request resubmit based on job states (no `--resubmit`).
Job status:
There are in total of 6 jobs to complete.
6 job(s) have been submitted; 0 job(s) haven't been submitted.
Among submitted jobs,
6 job(s) are successfully finished;
All jobs are completed!
All log files are located in folder: /cbica/projects/BABS/babs_demo/my_BABS_project/analysis/logs
Step 4. After jobs have finished
Step 4.1. Use babs-merge to merge all results and provenance
After all jobs have finished successfully, we will merge all the results and provenance. Each job was executed on a different branch, so we must merge them together into the mainline branch.
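Conceptually, this is an ordinary git merge of many job branches into the mainline. The sketch below reproduces the idea with plain git on a throwaway repository; the branch and file names are made up, and babs-merge itself does more than this (e.g., merging in chunks and pushing to the output RIA):

```shell
# Throwaway demo repository (illustration only):
git init -q merge_demo && cd merge_demo
git config user.email demo@example.com && git config user.name demo
git commit -q --allow-empty -m "init"
base=$(git rev-parse --abbrev-ref HEAD)   # mainline branch name

# Each "job" commits its result zip on its own branch:
for job in job-1-sub-01-ses-A job-2-sub-01-ses-B; do
    git checkout -q -b "$job" "$base"
    touch "result_${job}.zip"
    git add -A && git commit -q -m "$job"
done

# Merge all job branches back into the mainline at once:
git checkout -q "$base"
git merge -q -m "merge all job branches" job-1-sub-01-ses-A job-2-sub-01-ses-B
ls    # both result files are now on the mainline
cd ..
```

This mirrors the "Fast-forwarding to: ... / Trying simple merge with ..." lines you'll see in the babs-merge output below.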
We now run babs-merge in the root directory of my_BABS_project:
$ babs-merge --project-root $PWD
If it was successful, you'll see this message at the end:
`babs-merge` was successful!
Full printed messages from babs-merge
Cloning output RIA to 'merge_ds'...
[INFO ] Configure additional publication dependency on "output-storage"
configure-sibling(ok): . (sibling)
install(ok): /cbica/projects/BABS/babs_demo/my_BABS_project/merge_ds (dataset)
action summary:
configure-sibling (ok: 1)
install (ok: 1)
Listing all branches in output RIA...
Finding all valid job branches to merge...
Git default branch's name of output RIA is: 'master'
Merging valid job branches chunk by chunk...
Total number of job branches to merge = 6
Chunk size (number of job branches per chunk) = 2000
--> Number of chunks = 1
Merging chunk #1 (total of 1 chunk[s] to merge)...
Fast-forwarding to: remotes/origin/job-4639278-sub-01-ses-A
Trying simple merge with remotes/origin/job-4648997-sub-01-ses-B
Trying simple merge with remotes/origin/job-4649000-sub-01-ses-C
Trying simple merge with remotes/origin/job-4649003-sub-02-ses-A
Trying simple merge with remotes/origin/job-4649006-sub-02-ses-B
Trying simple merge with remotes/origin/job-4649009-sub-02-ses-D
Merge made by the 'octopus' strategy.
sub-01_ses-A_toybidsapp-0-0-7.zip | 1 +
sub-01_ses-B_toybidsapp-0-0-7.zip | 1 +
sub-01_ses-C_toybidsapp-0-0-7.zip | 1 +
sub-02_ses-A_toybidsapp-0-0-7.zip | 1 +
sub-02_ses-B_toybidsapp-0-0-7.zip | 1 +
sub-02_ses-D_toybidsapp-0-0-7.zip | 1 +
6 files changed, 6 insertions(+)
create mode 120000 sub-01_ses-A_toybidsapp-0-0-7.zip
create mode 120000 sub-01_ses-B_toybidsapp-0-0-7.zip
create mode 120000 sub-01_ses-C_toybidsapp-0-0-7.zip
create mode 120000 sub-02_ses-A_toybidsapp-0-0-7.zip
create mode 120000 sub-02_ses-B_toybidsapp-0-0-7.zip
create mode 120000 sub-02_ses-D_toybidsapp-0-0-7.zip
Pushing merging actions to output RIA...
Enumerating objects: 8, done.
Counting objects: 100% (8/8), done.
Delta compression using up to 40 threads
Compressing objects: 100% (2/2), done.
Writing objects: 100% (2/2), 522 bytes | 522.00 KiB/s, done.
Total 2 (delta 1), reused 0 (delta 0), pack-reused 0
To /cbica/projects/BABS/babs_demo/my_BABS_project/output_ria/d5f/7c9f2-1b55-4bc9-ada8-ca296b2c3268
5996f87..4d737d8 master -> master
dead here (recording state in git...)
ok
(recording state in git...)
[INFO] Determine push target
[INFO] Push refspecs
[INFO] Determine push target
[INFO] Push refspecs
[INFO] Update availability information
[INFO] Start enumerating objects
[INFO] Start counting objects
[INFO] Start compressing objects
[INFO] Start writing objects
[INFO] Finished push of Dataset(/gpfs/fs001/cbica/projects/BABS/babs_demo/my_BABS_project/merge_ds)
[INFO] Finished push of Dataset(/gpfs/fs001/cbica/projects/BABS/babs_demo/my_BABS_project/merge_ds)
publish(ok): . (dataset) [refs/heads/git-annex->origin:refs/heads/git-annex 8fd488d..d8a14a5]
action summary:
publish (notneeded: 1, ok: 1)
`babs-merge` was successful!
Now you're ready to consume the results.
Step 4.2. Consume results
To consume the results, you should not access the output RIA store or merge_ds directories inside the BABS project.
Instead, clone the output RIA as another folder (e.g., called my_BABS_project_outputs) to a location external to the BABS project:
$ cd ..   # Now you should be in the folder `babs_demo`, where `my_BABS_project` is located
$ datalad clone \
ria+file://${PWD}/my_BABS_project/output_ria#~data \
my_BABS_project_outputs
You'll see:
[INFO ] Configure additional publication dependency on "output-storage"
configure-sibling(ok): . (sibling)
install(ok): /cbica/projects/BABS/babs_demo/my_BABS_project_outputs (dataset)
action summary:
configure-sibling (ok: 1)
install (ok: 1)
Let's go into this new folder and see what's inside:
$ cd my_BABS_project_outputs
$ ls
You'll see:
CHANGELOG.md sub-01_ses-B_toybidsapp-0-0-7.zip@
code/ sub-01_ses-C_toybidsapp-0-0-7.zip@
containers/ sub-02_ses-A_toybidsapp-0-0-7.zip@
inputs/ sub-02_ses-B_toybidsapp-0-0-7.zip@
README.md sub-02_ses-D_toybidsapp-0-0-7.zip@
sub-01_ses-A_toybidsapp-0-0-7.zip@
As you can see, each session's results have been saved in a zip file. Before unzipping a zip file, you need to get its content first:
$ datalad get sub-01_ses-A_toybidsapp-0-0-7.zip
$ unzip sub-01_ses-A_toybidsapp-0-0-7.zip
You'll see printed messages like this:
# from `datalad get`:
get(ok): sub-01_ses-A_toybidsapp-0-0-7.zip (file) [from output-storage...]
# from unzip:
Archive: sub-01_ses-A_toybidsapp-0-0-7.zip
creating: toybidsapp/
extracting: toybidsapp/num_nonhidden_files.txt
From the zip file, you get a folder called toybidsapp:
$ cd toybidsapp
$ ls
In this folder, there is a file called num_nonhidden_files.txt.
This is the result from the toy BIDS App: the number of non-hidden files for this subject.
Note that for a raw BIDS dataset, the toy BIDS App counts at the subject level, even though the current dataset is a multi-session dataset.
$ cat num_nonhidden_files.txt
67
Here, 67 is the expected number for sub-01 (which you're looking at),
and 56 is the expected number for sub-02.
This means that the toy BIDS App and BABS ran as expected :).