Docker Diaries: Docker → Singularity on HPC Systems
I’ve been writing about the advantages of using Docker for reproducibility, given its portability and immutability, but I’ve come to realize that HPC sites typically don’t allow users to run Docker containers for various security-related reasons. As a result, I’ve decided to dip my toes into Singularity. Again I’ll use the example of containerizing the age prediction model that I’m currently using for my research on brain aging. What I’ve figured out so far:
- How to convert a Docker image to a Singularity Image
- How to run a Python script (`run_inference.py`) that launches the aforementioned ML model inside a Singularity container.
Docker vs. Singularity
It might help to understand the differences between Docker and Singularity, to provide some context and motivation for why Singularity is the default for running containers on HPC systems.
| Feature or Concern | Docker | Singularity |
| --- | --- | --- |
| Security | Runs as root by default; potential security risks | Runs as user; designed to mitigate privilege escalation |
| Resource Sharing | Root access can be a risk on shared resources | Designed for non-root access; safer for shared resources |
| HPC Architecture Support | Limited native support for HPC hardware | Native support for HPC hardware like GPUs and InfiniBand |
| Integration with HPC Tools | Not designed for HPC job schedulers | Integrates well with HPC job schedulers like Slurm |
| Resource Assumptions | Assumes local resources and control | Can work efficiently with distributed HPC resources |
| Networking | Oriented around TCP/IP networking | Supports HPC-optimized networking like InfiniBand |
| Container Image Format | Layered images managed by a daemon | Single-file image for easy transport and reproducibility |
Glossary:
- HPC-Optimized Networking: HPC environments often use specialized networking hardware such as InfiniBand. InfiniBand has emerged as a preferred interconnect technology for HPCs due to its low-latency characteristics; it provides high data transfer rates and low communication overhead.
- Schedulers: HPC systems use job schedulers, such as Slurm, to manage the allocation of computational resources to various user-submitted jobs. These schedulers help in queuing and executing jobs efficiently on the HPC cluster.
- TCP/IP: TCP/IP is the standard suite of communication protocols used across the internet. Its process is similar to sending a message written on a puzzle, where each piece travels through different postal routes to reach the destination; the internet uses this method to send data in packets, ensuring they can reroute if a path is congested or down, and then reassemble them upon arrival. It's a robust and efficient system that allows for reliable communication, even in the face of network issues.
- Root Access: refers to having the highest level of user permissions on a system, which can pose security risks on shared machines.
- Daemon: a background process that manages system services. Docker uses a daemon that traditionally requires root privileges to run.
Building a Singularity Image
Prerequisites
- The Python script (`run_inference.py`) and its dependencies inside the container.
- Required data files and directories available on your host system. We will bind-mount these two folders/files as volumes into the container:
  - T1 MRI files in a folder on your local machine, in this case the folder `T1`
  - `subject_id_file_path.csv`, containing the paths to the T1 MRI files
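Before launching the container, it is worth confirming that every T1 path listed in the CSV actually exists on the host; a missing file would otherwise surface as a confusing error mid-inference. Here is a minimal sketch of such a check. The column name `file_path` is an assumption on my part — adjust it to whatever header the CSV actually uses.

```python
# Sanity-check the inputs: verify that every T1 path listed in the CSV
# exists on the host. The column name "file_path" is assumed; change it
# to match the actual header in subject_id_file_path.csv.
import csv
from pathlib import Path

def missing_t1_files(csv_path, path_column="file_path"):
    """Return the list of T1 file paths in the CSV that do not exist on disk."""
    missing = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            if not Path(row[path_column]).exists():
                missing.append(row[path_column])
    return missing

# usage (on the host, before starting the container):
#   missing = missing_t1_files("subject_id_file_path.csv")
#   if missing: print("not found:", missing)
```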
Directory Tree for context
```
nghuixin@HXN~/
├── MidiBrainAge
│   ├── BrainAge
│   │   ├── ANTsPy
│   │   ├── Data
│   │   ├── Dockerfile
│   │   ├── LICENCE
│   │   ├── Models
│   │   ├── README.md
│   │   ├── T1
│   │   ├── Utils
│   │   ├── __pycache__
│   │   ├── fine_tune.py
│   │   ├── new_requirements.txt
│   │   ├── pre_process.py
│   │   ├── requirements.txt
│   │   ├── run_inference.py
│   │   ├── midi_brain_age_2.0.sif
│   │   └── subject_id_file_path.csv
│   ├── Dockerfile
│   ├── bin
│   ├── include
│   ├── lib
│   ├── lib64
│   └── pyvenv.cfg
├── go
├── hd-bet_params
├── index.html
└── singularity
```
Steps
- Preparing the Environment:
  - Navigate to your data folder (in my case, this is the `BrainAge` folder) and create a Singularity image file (`.sif`) from the Docker image by pulling it from Docker Hub:

```shell
singularity pull docker://nghuixin/midi_brain_age:2.0
```
- Starting the Singularity Container:
  - Use the `singularity shell` command with the `--writable-tmpfs` option to create a temporary writable filesystem. This allows us to create an output folder like `test_results` from inside the Singularity container. Bind the local directories (`T1`, `subject_id_file_path.csv`) to the corresponding paths inside the container (`/app/T1`, `/app/subject_id_file_path.csv`).
- Navigating Inside the Container:
  - Once inside the container, change to the `/app` directory (because we created the `app` directory at build time in the Dockerfile; see the article on Docker), where `run_inference.py` is located.
- Running the Python Script:
  - Execute the `run_inference.py` script with the necessary arguments. You should get the output folder `test_results` with the results of the predictions as a CSV file.

```shell
cd MidiBrainAge
singularity shell --writable-tmpfs \
  --bind $(pwd)/BrainAge/T1:/app/T1 \
  --bind $(pwd)/BrainAge/subject_id_file_path.csv:/app/subject_id_file_path.csv \
  $(pwd)/BrainAge/midi_brain_age_2.0.sif
cd /app
python run_inference.py --csv_file subject_id_file_path.csv --project_name test_result --skull_strip --sequence t1 --ensemble
```
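An interactive `singularity shell` session is handy for debugging, but on an HPC you would normally wrap the same run in a batch job submitted to the scheduler. A hypothetical Slurm script might look like the sketch below — the job name, resource requests, and `$HOME`-relative paths are placeholders, not values from an actual cluster, and `singularity exec` replaces the interactive shell with a single non-interactive command.

```shell
#!/bin/bash
#SBATCH --job-name=brain_age       # placeholder job name
#SBATCH --time=01:00:00            # adjust to the model's actual runtime
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G

cd $HOME/MidiBrainAge

# `singularity exec` runs one command and exits, replacing the interactive
# `singularity shell` + `cd /app` + `python ...` sequence shown above.
singularity exec --writable-tmpfs \
  --bind $(pwd)/BrainAge/T1:/app/T1 \
  --bind $(pwd)/BrainAge/subject_id_file_path.csv:/app/subject_id_file_path.csv \
  $(pwd)/BrainAge/midi_brain_age_2.0.sif \
  bash -c "cd /app && python run_inference.py --csv_file subject_id_file_path.csv --project_name test_result --skull_strip --sequence t1 --ensemble"
```

Submit it with `sbatch` and the scheduler queues it alongside other users' jobs, which is exactly the workflow the schedulers glossary entry describes.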
Notes
- The `--writable-tmpfs` option provides a temporary writable filesystem, and changes inside the container are not persistent after you exit. There are other options such as `--writable`, but it may not work on an HPC.
- I need to look deeper into whether `--writable-tmpfs` or `--writable` makes more sense in the context of HPCs. One of the issues I ran into, which is specific to this ML model, is this error that appeared when I included `--bind $(pwd)/BrainAge/test_results:/app/test_results` in the singularity command:
```
Traceback (most recent call last):
  File "run_inference.py", line 51, in <module>
    raise ValueError('project name {} already exists')
```
Initially, I opted to mount an empty folder from the local machine into the container, attempting to save the results from the ML model to that folder, `test_results`. However, as we have seen above, the error appears — this error stems from the Python script written by the author of the ML model. (Edit: I eventually edited the Python script so this is no longer an issue.)
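To make the failure mode concrete, here is a sketch of the kind of guard that produces the traceback above, and the one-line relaxation that tolerates a pre-existing (bind-mounted, empty) folder. The function names and arguments are my own illustration — the actual `run_inference.py` is the model author's code, which I have not reproduced here.

```python
# Sketch of a project-directory guard like the one that raises
# "project name {} already exists", plus a relaxed variant that accepts
# an already-present (e.g. bind-mounted) empty folder. Names are illustrative.
import os

def make_project_dir_strict(project_name, base="."):
    """Original-style behaviour: refuse to reuse an existing project folder."""
    path = os.path.join(base, project_name)
    if os.path.exists(path):
        raise ValueError('project name {} already exists'.format(project_name))
    os.makedirs(path)
    return path

def make_project_dir_relaxed(project_name, base="."):
    """Relaxed behaviour: reuse the folder if it is already there."""
    path = os.path.join(base, project_name)
    os.makedirs(path, exist_ok=True)  # tolerates a pre-existing mount point
    return path
```

The relaxed variant is essentially what my edit to the script amounted to: letting the output directory already exist instead of treating that as an error.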
Yet if I did not create an empty folder on the local machine, this is the error that appeared:

```
OSError: [Errno 30] Read-only file system
```

Typically, a Singularity container's filesystem is mounted as read-only. This is a common default setting in Singularity for security and reproducibility reasons. It means that while I can read files (from my local machine) and execute programs within the container, I am not allowed to write to or modify the container's filesystem (i.e., create a new folder `test_results` inside the container).
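This read-only error can be anticipated rather than discovered at the end of a run: probe whether an output location is writable before launching inference, and fall back to a bind-mounted (or tmpfs-backed) path when it is not. A minimal sketch, with purely illustrative paths rather than the container's actual layout:

```python
# Probe output locations for writability before running inference, so a
# read-only container filesystem fails fast instead of mid-run. Paths in
# the usage comment are illustrative, not the real container layout.
import os
import tempfile

def is_writable(directory):
    """Return True if a file can be created inside `directory`."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            return True
    except OSError:  # e.g. [Errno 30] Read-only file system
        return False

def pick_output_dir(candidates):
    """Return the first existing, writable directory from `candidates`."""
    for d in candidates:
        if os.path.isdir(d) and is_writable(d):
            return d
    raise OSError("no writable output directory available")

# usage (inside the container):
#   out = pick_output_dir(["/app/test_results", "/tmp/test_results_fallback"])
```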