How to save output after running new analyses in your Docker container
Recently I gave a brief presentation introducing Docker to fellow scientists, covering how to create a reproducible environment for completed analyses and how to facilitate sharing between lab members. Among the questions that came up were concerns over data privacy and security (which I have addressed here), and issues with persisting outputs and new analyses added to the original script that came with the image. For instance, say I’ve shared an image (pull it here) on Docker Hub, and my colleague pulls it and runs the container to reproduce my results. They then decide to add some new analyses to the script (see code comments), and now they would like to save the newly added code and its output.
Snippet of the R script, which you can find by pulling the Docker image I created:
# Load necessary libraries
library(readr) # For reading CSV files
library(plyr) # For data manipulation
library(tidyverse) # For data wrangling and visualization
library(lme4) # For linear mixed-effects models
library(car) # For diagnostic plots
library(ggplot2) # For visualization
library(nlme) # For fitting mixed-effects models
# Read the CSV file
data <- read_csv("data/infl_231010.csv")
# A bunch of data cleaning steps:
# ....
# ....
complete_data <- data[....]
# --- Analysis ---
# Fit the mixed-effects model using lme() from the nlme package
mod2 <- lme(lgvegf ~ time * (agem * dxgroup + gender) + dxgroup * (gender),
            random = ~ time | subnum,
            data = complete_data)
# Print model summary
summary(mod2)
# --- Plots ---
# Create a ggplot to plot the data and fitted values
ggplot(complete_data, aes(x = time, y = lgvegf, color = dxgroup)) +
  geom_point(size = 0.9) + # Add points for the observed data
  geom_smooth(method = "lm", se = FALSE) + # Add regression line without confidence interval
  labs(x = "Time", y = "lgvegf", color = "dxgroup") + # Labels
  theme_minimal() # Theme
#### ---- NEW analyses and plots that were NOT already part of the original image and container -------
# Fit the mixed-effects model using lme() from the nlme package
mod3 <- lme(lgvegf ~ time * (agem * dxgroup + gender),
            random = ~ time | subnum,
            data = complete_data)
# Print model summary
summary(mod3)
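# Create a ggplot to plot the data and fitted values
# (fitted_values is assumed to be a column of complete_data, e.g. added with fitted(mod3))
png('figures/plot3.png') # Open a PNG device so the format matches the .png extension
print(
  ggplot(complete_data, aes(x = time, y = fitted_values, color = gender, linetype = dxgroup)) +
    geom_point(size = 0.9) + # Add points for the fitted values
    geom_smooth(method = "lm", se = TRUE) + # Add regression line with confidence interval
    labs(x = "Time", y = "Predicted lgvegf", color = "gender", linetype = "dxgroup") + # Labels
    theme_minimal() # Theme
) # print() ensures the plot is drawn even when the script is run with source()
dev.off() # Close the device to finish writing the file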
print('analyses completed')
Option 1: Create a new Docker image
A common approach involves committing the changes made within a container to a new image. This can be done using the docker commit command, which creates a new image that includes the changes. Note that docker commit takes the ID (or name) of the container, not the name of the image it was started from. For example:
docker commit <container_id> <new_image_name>:<tag>
docker commit 5cb1dedcc204 soohyun/infl_marker_analysis:1.0.0
Here 5cb1dedcc204 is the ID of the running container, as reported by docker container ls. Upon running a new container from this image with docker run soohyun/infl_marker_analysis:1.0.0, the new code will be visible in the R script, with the same libraries and versions preserved, and will produce the expected outputs.
Option 2: Save the modified script and/or analysis results outside of the Docker container
Save the modified script to your local machine
You can do so by running the following commands:
docker container ls
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
5cb1dedcc204 nghuixin/infl_marker_analysis:1.0.0 "/init" About an hour ago Up About an hour 0.0.0.0:8787->8787/tcp eager_poincare
docker cp 5cb1dedcc204:/home/rstudio/analysis/infl_marker_huixin.R ./container_r.R
Here 5cb1dedcc204 is the container ID, which can be obtained by running docker container ls, and ./container_r.R is the local copy of the R script with the added lines of code. It is now saved in the root directory of the project on the host machine.
🚨 However, this is not recommended: once the file is saved outside of the Docker container, there is no guarantee that the results of the analyses will be replicable, given that the versions of R and the associated libraries on the local machine might not be the same.
Save the analysis results to your local machine (text output)
If for some reason you only wish to save output such as summary(mod3) above, you can redirect it to a text file using sink():
#### ---- New analyses that were NOT already part of the container -------
sink('new_analyses_output.txt') # Redirect console output to this file
# Fit the mixed-effects model using lme() from the nlme package
mod3 <- lme(lgvegf ~ time * (agem * dxgroup + gender), random = ~ time | subnum, data = complete_data)
# Print model summary
summary(mod3)
print('analyses completed')
sink() # Stop redirecting and return output to the console
Next, you can copy the txt file output from the container to your local machine. Note that there must be no space between the container ID and the colon:
docker cp <container_id>:/path/to/container/file /path/to/local/destination
docker cp 5cb1dedcc204:/home/rstudio/new_analyses_output.txt ./new_analyses_output.txt
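One drawback of sink() is that if the script stops with an error before the closing sink() call, console output keeps being redirected to the file. As an alternative, capture.output() collects the printed output as a character vector that can be written to a file in one step. Below is a minimal sketch, assuming the nlme package is available (it ships with standard R installations). The Orthodont dataset, the simplified model, and the temporary output path are stand-ins for illustration, not the analysis from this post:

```r
library(nlme) # Provides lme() and the built-in Orthodont dataset

# Stand-in model on a built-in dataset; NOT the model or data from this post
fit <- lme(distance ~ age, random = ~ 1 | Subject, data = Orthodont)

# Hypothetical output path; inside the container you might write to the
# working directory instead, e.g. 'new_analyses_output.txt'
out_file <- file.path(tempdir(), "new_analyses_output.txt")

# capture.output() returns the printed summary as a character vector,
# so nothing stays redirected if a later step fails
summary_lines <- capture.output(print(summary(fit)))
writeLines(summary_lines, out_file)

# Confirm that the file was written
file.exists(out_file)
```

The resulting text file can then be copied out of the container with docker cp in the same way as the sink() output.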
Save the analysis results to your local machine (image output)
You can also do the same for the plots you created. For instance, if there isn’t already a figures directory in this Docker container, you can create it manually just as you would in RStudio on your local machine, or run dir.create('figures').
Then run the code for creating a new plot:
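# Create a ggplot to plot the data and fitted values
# (fitted_values is assumed to be a column of complete_data, e.g. added with fitted(mod3))
png('figures/plot3.png') # Open a PNG device so the format matches the .png extension
print(
  ggplot(complete_data, aes(x = time, y = fitted_values, color = gender, linetype = dxgroup)) +
    geom_point(size = 0.9) + # Add points for the fitted values
    geom_smooth(method = "lm", se = TRUE) + # Add regression line with confidence interval
    labs(x = "Time", y = "Predicted lgvegf", color = "gender", linetype = "dxgroup") + # Labels
    theme_minimal() # Theme
) # print() ensures the plot is drawn even when the script is run with source()
dev.off() # Close the device to finish writing the file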
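As an alternative to opening and closing a graphics device by hand, ggplot2's ggsave() writes a plot object straight to a file and picks the image format from the file extension. Below is a minimal sketch, assuming ggplot2 is installed (as it is in the image described here). The mtcars data, the plot itself, and the temporary output folder are stand-ins for illustration only:

```r
library(ggplot2)

# Stand-in plot on a built-in dataset (mtcars); NOT the data from this post
p <- ggplot(mtcars, aes(x = wt, y = mpg, color = factor(cyl))) +
  geom_point(size = 0.9) +
  labs(x = "Weight", y = "Miles per gallon", color = "Cylinders") +
  theme_minimal()

# Hypothetical output folder; inside the container this could simply be 'figures'
fig_dir <- file.path(tempdir(), "figures")
dir.create(fig_dir, showWarnings = FALSE, recursive = TRUE)

# ggsave() infers the PNG format from the extension and closes the device itself
plot_file <- file.path(fig_dir, "plot3.png")
ggsave(plot_file, plot = p, width = 6, height = 4, dpi = 300)

# Confirm that the image was written
file.exists(plot_file)
```

Once the image exists inside the container, it can be copied to the host with the same docker cp pattern used for the text output, pointing at the figures path instead.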