Docker Diaries: Docker → Singularity on HPC Systems
I’ve been writing about the advantages of using Docker for reproducibility, given its portability and immutability, but I’ve come to realize that HPC sites typically don’t allow users to run Docker containers for various security-related reasons. As a result, I’ve decided to dip my toes into Singularity. Again I’ll use the example of containerizing the age prediction model that I’m currently using for my research on brain aging. What I’ve figured out so far:
- How to convert a Docker image to a Singularity Image
- How to run a Python script (`run_inference.py`) that launches the aforementioned ML model inside a Singularity container.
Docker vs. Singularity
It might help to understand the differences between Docker and Singularity, to provide some context and motivation for why Singularity is the default for running containers on HPC systems.
| Feature or Concern | Docker | Singularity |
| --- | --- | --- |
| Security | Runs as root by default; potential security risks | Runs as user; designed to mitigate privilege escalation |
| Resource Sharing | Root access can be a risk on shared resources | Designed for non-root access; safer for shared resources |
| HPC Architecture Support | Limited native support for HPC hardware | Native support for HPC hardware like GPUs and InfiniBand |
| Integration with HPC Tools | Not designed for HPC job schedulers | Integrates well with HPC job schedulers like Slurm |
| Resource Assumptions | Assumes local resources and control | Can work efficiently with distributed HPC resources |
| Networking | Oriented around TCP/IP networking | Supports HPC-optimized networking like InfiniBand |
| Container Image Format | Layered images managed by a daemon | Single-file image for easy transport and reproducibility |
Glossary:
- HPC-Optimized Networking: HPC environments often use specialized networking hardware such as InfiniBand. InfiniBand has emerged as a preferred interconnect technology for HPCs due to its low-latency characteristics; it provides high data transfer rates and low communication overhead.
- Schedulers: HPC systems use job schedulers, such as Slurm, to manage the allocation of computational resources to various user-submitted jobs. These schedulers help in queuing and executing jobs efficiently on the HPC cluster.
- TCP/IP: TCP/IP is the standard suite of communication protocols used across the internet. Its process is similar to sending a message written on a puzzle, where each piece travels through different postal routes to reach the destination; the internet uses this method to send data in packets, ensuring they can reroute if a path is congested or down, and then reassemble them upon arrival. It's a robust and efficient system that allows for reliable communication, even in the face of network issues.
- Root Access: refers to having the highest level of user permissions on a system, which can pose security risks on shared machines.
- Daemon: a background process that manages system services. Docker uses a daemon that traditionally requires root privileges to run.
Building a Singularity Image
Prerequisites
- The Python script (`run_inference.py`) and its dependencies inside the container.
- Required data files and directories available on your host system. We will bind-mount these two folders/files as volumes into the container:
  - T1 MRI files in a folder on your local machine, in this case the folder `T1`
  - `subject_id_file_path.csv`, containing the paths to the T1 MRI files
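Before launching the container, it is worth confirming that every T1 path listed in the CSV actually exists on the host; a missing file would otherwise surface as a confusing error mid-inference. Here is a minimal sketch of such a check. The column name `file_path` is an assumption on my part — adjust it to whatever header the CSV actually uses.

```python
# Sanity-check the inputs: verify that every T1 path listed in the CSV
# exists on the host. The column name "file_path" is assumed; change it
# to match the actual header in subject_id_file_path.csv.
import csv
from pathlib import Path

def missing_t1_files(csv_path, path_column="file_path"):
    """Return the list of T1 file paths in the CSV that do not exist on disk."""
    missing = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            if not Path(row[path_column]).exists():
                missing.append(row[path_column])
    return missing

# usage (on the host, before starting the container):
#   missing = missing_t1_files("subject_id_file_path.csv")
#   if missing: print("not found:", missing)
```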
Directory Tree for context
```
nghuixin@HXN~/
├── MidiBrainAge
│   ├── BrainAge
│   │   ├── ANTsPy
│   │   ├── Data
│   │   ├── Dockerfile
│   │   ├── LICENCE
│   │   ├── Models
│   │   ├── README.md
│   │   ├── T1
│   │   ├── Utils
│   │   ├── __pycache__
│   │   ├── fine_tune.py
│   │   ├── new_requirements.txt
│   │   ├── pre_process.py
│   │   ├── requirements.txt
│   │   ├── run_inference.py
│   │   ├── midi_brain_age_2.0.sif
│   │   └── subject_id_file_path.csv
│   ├── Dockerfile
│   ├── bin
│   ├── include
│   ├── lib
│   ├── lib64
│   └── pyvenv.cfg
├── go
├── hd-bet_params
├── index.html
└── singularity
```
Steps
- Preparing the Environment:
  - Navigate to your data folder (in my case, this is the `BrainAge` folder) and create a Singularity image file (`.sif`) from the Docker image by pulling it from Docker Hub:

```shell
singularity pull docker://nghuixin/midi_brain_age:2.0
```
- Starting the Singularity Container:
  - Use the `singularity shell` command with the `--writable-tmpfs` option to create a temporary writable filesystem. This allows us to create an output folder like `test_results` from inside the Singularity container. Bind the local directories (`T1`, `subject_id_file_path.csv`) to the corresponding paths inside the container (`/app/T1`, `/app/subject_id_file_path.csv`).
- Navigating Inside the Container:
  - Once inside the container, change to the `/app` directory (because we created the `app` directory at build time in the Dockerfile; see the article on Docker), where `run_inference.py` is located.
- Running the Python Script:
  - Execute the `run_inference.py` script with the necessary arguments. You should get the output folder `test_results` with the results of the predictions as a CSV file.

```shell
cd MidiBrainAge
singularity shell --writable-tmpfs \
  --bind $(pwd)/BrainAge/T1:/app/T1 \
  --bind $(pwd)/BrainAge/subject_id_file_path.csv:/app/subject_id_file_path.csv \
  $(pwd)/BrainAge/midi_brain_age_2.0.sif
cd /app
python run_inference.py --csv_file subject_id_file_path.csv --project_name test_result --skull_strip --sequence t1 --ensemble
```
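An interactive `singularity shell` session is handy for debugging, but on an HPC you would normally wrap the same run in a batch job submitted to the scheduler. A hypothetical Slurm script might look like the sketch below — the job name, resource requests, and `$HOME`-relative paths are placeholders, not values from an actual cluster, and `singularity exec` replaces the interactive shell with a single non-interactive command.

```shell
#!/bin/bash
#SBATCH --job-name=brain_age       # placeholder job name
#SBATCH --time=01:00:00            # adjust to the model's actual runtime
#SBATCH --cpus-per-task=4
#SBATCH --mem=16G

cd $HOME/MidiBrainAge

# `singularity exec` runs one command and exits, replacing the interactive
# `singularity shell` + `cd /app` + `python ...` sequence shown above.
singularity exec --writable-tmpfs \
  --bind $(pwd)/BrainAge/T1:/app/T1 \
  --bind $(pwd)/BrainAge/subject_id_file_path.csv:/app/subject_id_file_path.csv \
  $(pwd)/BrainAge/midi_brain_age_2.0.sif \
  bash -c "cd /app && python run_inference.py --csv_file subject_id_file_path.csv --project_name test_result --skull_strip --sequence t1 --ensemble"
```

Submit it with `sbatch` and the scheduler queues it alongside other users' jobs, which is exactly the workflow the schedulers glossary entry describes.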
Notes
- The `--writable-tmpfs` option provides a temporary writable filesystem, and changes inside the container are not persistent after you exit. There are other options such as `--writable`, but it may not work on an HPC.
- I need to look deeper into whether `--writable-tmpfs` or `--writable` makes more sense in the context of HPCs. One of the issues I ran into, which is specific to this ML model, is this error that appeared when I included `--bind $(pwd)/BrainAge/test_results:/app/test_results` in the singularity command:
```
Traceback (most recent call last):
  File "run_inference.py", line 51, in <module>
    raise ValueError('project name {} already exists')
```
Initially, I opted to mount an empty folder from the local machine into the container, attempting to save the results from the ML model to that folder, `test_results`. However, as we have seen above, the error appears — this error stems from the Python script written by the author of the ML model. (Edit: I eventually edited the Python script so this is no longer an issue.)
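To make the failure mode concrete, here is a sketch of the kind of guard that produces the traceback above, and the one-line relaxation that tolerates a pre-existing (bind-mounted, empty) folder. The function names and arguments are my own illustration — the actual `run_inference.py` is the model author's code, which I have not reproduced here.

```python
# Sketch of a project-directory guard like the one that raises
# "project name {} already exists", plus a relaxed variant that accepts
# an already-present (e.g. bind-mounted) empty folder. Names are illustrative.
import os

def make_project_dir_strict(project_name, base="."):
    """Original-style behaviour: refuse to reuse an existing project folder."""
    path = os.path.join(base, project_name)
    if os.path.exists(path):
        raise ValueError('project name {} already exists'.format(project_name))
    os.makedirs(path)
    return path

def make_project_dir_relaxed(project_name, base="."):
    """Relaxed behaviour: reuse the folder if it is already there."""
    path = os.path.join(base, project_name)
    os.makedirs(path, exist_ok=True)  # tolerates a pre-existing mount point
    return path
```

The relaxed variant is essentially what my edit to the script amounted to: letting the output directory already exist instead of treating that as an error.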
Yet if I did not create an empty folder on the local machine, this is the error that appeared:

```
OSError: [Errno 30] Read-only file system
```

Typically, a Singularity container's filesystem is mounted as read-only. This is a common default setting in Singularity for security and reproducibility reasons. It means that while I can read files (from my local machine) and execute programs within the container, I am not allowed to write to or modify the container's filesystem (i.e., create a new folder `test_results` inside the container).
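This read-only error can be anticipated rather than discovered at the end of a run: probe whether an output location is writable before launching inference, and fall back to a bind-mounted (or tmpfs-backed) path when it is not. A minimal sketch, with purely illustrative paths rather than the container's actual layout:

```python
# Probe output locations for writability before running inference, so a
# read-only container filesystem fails fast instead of mid-run. Paths in
# the usage comment are illustrative, not the real container layout.
import os
import tempfile

def is_writable(directory):
    """Return True if a file can be created inside `directory`."""
    try:
        with tempfile.NamedTemporaryFile(dir=directory):
            return True
    except OSError:  # e.g. [Errno 30] Read-only file system
        return False

def pick_output_dir(candidates):
    """Return the first existing, writable directory from `candidates`."""
    for d in candidates:
        if os.path.isdir(d) and is_writable(d):
            return d
    raise OSError("no writable output directory available")

# usage (inside the container):
#   out = pick_output_dir(["/app/test_results", "/tmp/test_results_fallback"])
```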