Quick start guide: running Nextflow on HPC
- Link to code on GitHub
- After learning Nextflow at one of the Seqera sessions, I decided to try running it on my university’s HPC and made some tweaks to the original tutorial as an extra ‘challenge’ to test my understanding.
Setting up the environment
- Load the Java module:
  - Check available Java versions:
    module avail
  - Load your desired Java version (module names vary between clusters):
    module load java-19
- Verify the Java installation:
  - Check the Java version:
    java -version
- Set JAVA_HOME:
  - After loading Java, set JAVA_HOME:
    export JAVA_HOME=$(dirname $(dirname $(readlink -f $(which java))))
  - Add the export line to ~/.bashrc or ~/.bash_profile to make it permanent. It must come after the module load line, otherwise java is not yet on the PATH when the line runs.
  - Honestly, I wasn’t sure whether this step was strictly necessary for Nextflow, but many Java-based applications and development tools look for JAVA_HOME, so I set it anyway.
- Autoload Java each session:
  - To auto-load Java in every session, append the module load command to ~/.bashrc or ~/.bash_profile, using the same module name you loaded above:
    echo "module load java-19" >> ~/.bashrc
  - Apply the changes:
    source ~/.bashrc
    or log out and back in.
- Install Nextflow by following the documentation
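The nested dirname calls in the JAVA_HOME step are easier to follow in isolation. The sketch below wraps them in a helper called java_home_from_bin (a hypothetical name, not part of any module system) and feeds it a made-up install path. It assumes the usual layout where the binary lives at <JAVA_HOME>/bin/java; your cluster's paths will differ.

```shell
# Hypothetical helper: derive a JAVA_HOME candidate from the path of a java binary.
java_home_from_bin() {
    # Going up two directory levels strips the trailing "/bin/java".
    dirname "$(dirname "$1")"
}

# Illustration with a made-up path; prints /opt/apps/java-19
java_home_from_bin /opt/apps/java-19/bin/java

# Real use: resolve symlinks first, and only if java is actually on the PATH.
if command -v java >/dev/null 2>&1; then
    JAVA_HOME="$(java_home_from_bin "$(readlink -f "$(command -v java)")")"
    export JAVA_HOME
    echo "JAVA_HOME=$JAVA_HOME"
fi
```

The guard around the last step means the snippet does nothing, rather than exporting a wrong value, when no Java module has been loaded yet.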
Kumusta Mundo, Selamat Pagi, God Morgen
At the training session, I learned that Nextflow uses a combination of Groovy and shell scripts. I followed the hello-world tutorial here but added some code to process two files by taking two inputs, --input_file and --lang_file, instead of just one as in the original tutorial.
In Nextflow, a process represents a single task or step within a larger workflow. Each process is a self-contained unit that performs a specific action, such as running a script, processing a file, or executing a command. A workflow defines how multiple processes are connected: it calls the processes and passes the output of one to the input of another, which determines the order in which they run. In this case, sayHello and toUpper are the two processes called inside the workflow.
Nextflow pipeline
// Sets a parameter for the output file name, defaulting to 'output.txt'
params.output_file = 'output.txt'

// Defines a process named 'sayHello'
process sayHello {
    input:
    // Takes two input values: 'greeting' and 'lang'
    val greeting
    val lang

    output:
    // Declares the output file, named from the input 'lang' and 'params.output_file'
    path "${lang}-${params.output_file}"

    script:
    // Shell command: writes the 'greeting' value to '<lang>-<output_file>'
    """
    echo '${greeting}' > '${lang}-${params.output_file}'
    """
}

// Defines another process named 'toUpper'
process toUpper {
    input:
    // Takes a file path as input
    path input_file

    output:
    // Declares the output file: the input file name prefixed with 'upper-'
    path "upper-${input_file}"

    script:
    // Shell command: reads 'input_file', converts its content to upper case, and writes a new file
    """
    cat ${input_file} | tr '[a-z]' '[A-Z]' > upper-${input_file}
    """
}

// Defines the workflow
workflow {
    // Creates a channel from the file given by 'params.input_file', emitting one trimmed line per item
    greeting_ch = Channel.fromPath(params.input_file).splitText() { it.trim() }

    // Creates another channel from the file given by 'params.lang_file', emitting one trimmed line per item
    lang_ch = Channel.fromPath(params.lang_file).splitText() { it.trim() }

    // Calls 'sayHello' with both channels; items are paired by position (first greeting with first language, and so on)
    sayHello(greeting_ch, lang_ch)

    // Calls 'toUpper' with the output of 'sayHello' as its input
    toUpper(sayHello.out)
}
greetings.txt:
selamat pagi
magandang umaga
god morgen

languages.txt:
malay
tagalog
norwegian
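A rough way to picture what the pipeline should produce is to mimic it in plain shell. The sketch below assumes the six lines above are the contents of greetings.txt and languages.txt (the file names used in the run command), that line N of one file belongs with line N of the other, and that the output name keeps its default of output.txt. It is only an emulation: Nextflow runs each task in its own work directory and can run them in parallel, which this loop does not.

```shell
# Work in a scratch directory so nothing in the current directory is overwritten.
scratch="$(mktemp -d)"
cd "$scratch"

# Recreate the two input files.
printf '%s\n' 'selamat pagi' 'magandang umaga' 'god morgen' > greetings.txt
printf '%s\n' 'malay' 'tagalog' 'norwegian' > languages.txt

# Pair line N of languages.txt with line N of greetings.txt (tab-separated),
# then mimic sayHello and toUpper for each pair.
tab="$(printf '\t')"
paste languages.txt greetings.txt | while IFS="$tab" read -r lang greeting; do
    echo "$greeting" > "${lang}-output.txt"
    tr '[:lower:]' '[:upper:]' < "${lang}-output.txt" > "upper-${lang}-output.txt"
done

# Three upper-case files are expected, one per language.
ls upper-*-output.txt
cat upper-tagalog-output.txt   # prints MAGANDANG UMAGA
```

If the two input files had different numbers of lines, the pairing would no longer line up, so it is worth checking that both files are the same length before running the real pipeline.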
Run the Nextflow pipeline
nextflow run test.nf --input_file "greetings.txt" --lang_file "languages.txt" -ansi-log false
Nextflow uses ANSI escape codes in its terminal logging to enhance readability with color and interactivity. However, in non-interactive contexts such as batch jobs these features are less useful, and the escape codes end up as clutter in plain-text logs. Nextflow allows disabling rich ANSI logging with the -ansi-log false option for cleaner, plain-text output, so we’ll do that.
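To see what that clutter looks like, here is a small stand-alone sketch that has nothing to do with Nextflow itself: it builds a green "done" string by hand, shows the raw escape bytes as they would appear in a log file, and then strips them with sed. The variable names are just for illustration.

```shell
# Build a line the way a color-aware program would: ESC[32m = green, ESC[0m = reset.
colored="$(printf '\033[32mdone\033[0m')"

# cat -v makes the escape bytes visible; prints ^[[32mdone^[[0m
printf '%s\n' "$colored" | cat -v

# Strip the escape sequences; ESC is built with printf to keep the sed call portable.
esc="$(printf '\033')"
printf '%s\n' "$colored" | sed "s/${esc}\[[0-9;]*[A-Za-z]//g"   # prints done
```

Stripping after the fact works for simple color codes, but turning the rich output off at the source is simpler for logs that are redrawn in place.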
The output from running the command above indicates that there were 6 tasks in total, which is correct: the sayHello process completed all of its tasks (3 of 3), and so did toUpper.
N E X T F L O W ~ version 23.10.1
Launching test.nf [prickly_gates] DSL2 - revision: bc2825233c
executor > local (6)
[39/0941f9] process > sayHello (2) [100%] 3 of 3 ✔
[5f/666175] process > toUpper (1) [100%] 3 of 3 ✔
One of the more mystifying things for me was that the numbers next to the process names, (2) and (1), do not add up to the total number of tasks. Nextflow runs independent tasks in parallel: each set of items arriving on a process's input channels becomes a separate task, and every task is counted in the executor total (6 here). In the condensed view shown above, each process gets a single line that is updated in place, so the hash and the number in parentheses identify only the task most recently reported for that process (for example, task 2 of sayHello, whose work directory starts with 39/0941f9). They are not counts. The per-process totals are the "3 of 3" figures at the end of each line.
Other tips:
Each task runs in its own directory under work/, and the hash shown in the log (for example 39/0941f9) is the start of that directory's path. The hidden files there are useful for debugging:
- .command.sh: This file contains the actual command executed by Nextflow. Review it to verify that the command was interpreted and executed as intended.
- .exitcode: This file holds the exit code of the command. An exit code of 0 indicates successful execution. Any other number suggests an error or issue occurred.
- .command.out: This file captures the output produced by the command. Check here to see what the command generated or returned during its execution.
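Checking these files one directory at a time gets tedious, so a small loop can help. The sketch below defines list_task_exit_codes (a hypothetical helper, not a Nextflow command) that prints every task directory together with the contents of its .exitcode file. It assumes the default layout of work/<two characters>/<rest of hash>/, which is what the hashes in the log suggest.

```shell
# Hypothetical helper: print each task directory under a Nextflow work dir with its exit code.
list_task_exit_codes() {
    workdir="${1:-work}"
    for f in "$workdir"/*/*/.exitcode; do
        [ -e "$f" ] || continue   # the glob matched nothing: no finished tasks here
        printf '%s exit=%s\n' "$(dirname "$f")" "$(cat "$f")"
    done
}

# Usage, from the directory where the pipeline was launched:
list_task_exit_codes work
```

Any line that does not end in exit=0 points at a task directory worth opening, starting with its .command.sh and .command.out files.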