7 Developer notes
7.1 Design
py-rocket-base is inspired by repo2docker and the Pangeo Docker stack design. py-rocket-base is built using repo2docker (via repo2docker-action) and thus lets repo2docker make the choices regarding the environment design: things like how the conda environment is set up and the base directory structure and permissions.
The Pangeo Docker stack does not use repo2docker, but mimics repo2docker’s environment design. The Pangeo base-image behaves similarly to repo2docker: using the base-image in the FROM line of a Dockerfile causes the build to look for files with the same names as repo2docker’s configuration files and then take the appropriate action with those files. This means that routine users do not need to know how to write Dockerfile code in order to extend the image with new packages or applications. The Pangeo base-image implements this design with ONBUILD commands in its Dockerfile, which trigger actions only when the image is used in the FROM line of another Dockerfile.
py-rocket-base does not include this ONBUILD behavior. Instead, it follows the rocker docker stack design and provides helper scripts for building on the base image. py-rocket-base includes a directory called /pyrocket_scripts with scripts that help you do common tasks for scientific Docker images. These scripts are not required: users who are familiar with writing Dockerfiles can write their own code. The helper scripts were adopted after feedback that the Pangeo ONBUILD behavior makes it harder to customize images that need a very specific structure or order of operations.
There are many ways to install R and RStudio into an image designed for JupyterHubs. The objective of py-rocket-base is not to install R and RStudio per se, and there are other leaner and faster ways to install R/RStudio if that is your goal[1]. The objective of py-rocket-base is to create a JupyterHub image such that when you click the RStudio button in the JupyterLab UI to enter the RStudio UI, you enter an environment that is the same as if you had used a Rocker image. If you are in the JupyterLab UI, the environment is the same as if you had used repo2docker (or the Pangeo base-image) to create the environment.
7.2 Documentation
To build the documentation book, clone the repo and then run:
cd book
quarto render .
Then set GitHub Pages to publish from the docs folder.
7.3 Building the images
The .github/workflows/build.yaml file is a GitHub Action that builds the image with repo2docker-action. The URL of the built image will look like one of these:
ghcr.io/nmfs-opensci/repo-name/image-name:latest
ghcr.io/nmfs-opensci/image-name:latest
For example, for this repo the image is ghcr.io/nmfs-opensci/py-rocket-base:latest.
7.4 repo2docker
repo2docker-action creates the image and publishes it to ghcr.io/nmfs-opensci/py-rocket-base. ghcr.io is the GitHub Container Registry, an image host like Docker Hub or quay.io.
repo2docker (a Python package) sets up the structure of the base environment. For example, it installs mamba for package solving, sets up environment variables and installs Linux packages. It looks for specific files (like apt.txt, environment.yml, postBuild) in the build context (the repo that the Dockerfile is in) and takes the appropriate action. repo2docker-action also allows you to include an appendix file to add more commands to the end of your Dockerfile.
repo2docker does a lot behind the scenes, and some things behave differently than in a plain Dockerfile build.
7.4.1 COPY
COPY <src> <dest> does not work in appendix because repo2docker changes the build context. The repository files are in src, not in . (the root of the build context). So in appendix you do not use lines like this to bring file1 into the build:
COPY file1 newlocation/file1
Instead, the files are already in ${REPO_DIR}/. To copy a file to a new location, use cp. Run it as root if jovyan does not have permission to write to newlocation:
RUN cp ${REPO_DIR}/file1 newlocation/file1
7.4.2 ENV and ARGS
repo2docker uses a number of ARGs in the build, for example ${NB_USER}, that you might expect to be in the image environment. These need to be converted to ENV to be available to child builds.
7.5 RStudio
jupyter-rsession-proxy allows us to launch RStudio from JupyterLab, but the RStudio environment is different from the environment in JupyterLab.
7.5.1 Environmental variables
- PATH is different: conda is not on the path.
- None of the environmental variables in the Dockerfile will be in the /rstudio environment. The start command affects /lab and /notebook but not /rstudio.
- The path in the terminal (in RStudio) can be different from the path in the R console. Expect unexpected behavior because of this. If you type bash, then .bashrc is run. That runs conda init, which adds the conda binaries to the path, and then really weird and unexpected things can happen.

If you need an environmental variable set, you will need to set it in $R_HOME/etc/Rprofile.site, which is run when R starts.
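A minimal sketch of how a build step run as root could append Sys.setenv() calls to Rprofile.site follows. It assumes a variable such as MY_VAR, a hypothetical name, that you want visible in RStudio sessions. The helper name add_r_env is also hypothetical. The sketch assumes values contain no double quotes or backslashes.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper: append one Sys.setenv() call per NAME=VALUE pair to an
# Rprofile.site file, so the variables are set whenever R (and RStudio) starts.
# Assumes values contain no double quotes or backslashes.
add_r_env() {
  local rprofile="$1"
  shift
  local pair name value
  for pair in "$@"; do
    name="${pair%%=*}"   # text before the first '='
    value="${pair#*=}"   # text after the first '='
    printf 'Sys.setenv(%s = "%s")\n' "$name" "$value" >> "$rprofile"
  done
}
```

Usage, from a build script that defines the function: add_r_env "$(R RHOME)/etc/Rprofile.site" "MY_VAR=hello". The call uses $(R RHOME) rather than $R_HOME because R_HOME is not necessarily set in later RUN steps.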
7.6 Basic structure of py-rocket-base
The py-rocket-base docker build has the following structure:
# base image and environment
repo2docker sets this up
repo2docker-action sets the directory where the build files
are put to /srv/repo (via $REPO_DIR in GitHub Action)
and ownership is set to jovyan (via $NB_USER)
# environment.yml
repo2docker adds these packages to the conda notebook environment
# start
repo2docker points the Docker image entrypoint (command run on start) to this file
${REPO_DIR}/start
# appendix
repo2docker-action adds the Docker commands here to the end of Dockerfile
most of the work in py-rocket-base is done here: appendix calls rocker.sh and installs the packages in apt2.txt
Each file is described below.
7.7 apt2.txt
This file is not named apt.txt because these packages need to be installed after R is installed: the R install scripts uninstall packages as part of their cleanup. apt2.txt includes packages that are required for the Desktop (/desktop) to operate correctly. Packages needed for building R and RStudio (/rstudio) are installed via the rocker install scripts.
7.8 environment.yml
These packages are added to the notebook conda environment. In py-rocket-base, only the basic packages needed for JupyterLab, RStudio and Desktop are added. Scientific packages are not added here. They will be added via child images that use py-rocket-base as the base image (in the FROM line).
7.9 appendix
This is a long file with many pieces. The numbered notes after the code explain, in order, what each block does.
USER root

# Set env variables
# This is the default env in repo2docker type images
ENV CONDA_ENV=notebook
# Tell applications where to open desktop apps
ENV DISPLAY=":1.0"

# Install R, RStudio via Rocker scripts
ENV R_VERSION="4.4.1"
ENV R_DOCKERFILE="verse_${R_VERSION}"
# This is in the rocker script but will not run since ${NB_USER} already exists
# Needed because rocker scripts set permissions based on the staff group
RUN usermod -a -G staff "${NB_USER}"
RUN PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin && \
    chmod +x ${REPO_DIR}/rocker.sh && \
    ${REPO_DIR}/rocker.sh

# Install linux packages after R installation since the R install scripts get rid of packages
# The package_list part is reading the file and doing clean-up to just have the list of packages
RUN package_list=$(grep -v '^\s*#' ${REPO_DIR}/apt2.txt | grep -v '^\s*$' | sed 's/\r//g; s/#.*//; s/^[[:space:]]*//; s/[[:space:]]*$//' | awk '{$1=$1};1') && \
    apt-get update && \
    apt-get install --yes --no-install-recommends $package_list && \
    apt-get autoremove --purge && \
    apt-get clean && \
    rm -rf /var/lib/apt/lists/*

# Re-enable man pages disabled in Ubuntu 18 minimal image
# https://wiki.ubuntu.com/Minimal
RUN yes | unminimize
# NOTE: $NB_PYTHON_PREFIX is the same as $CONDA_PREFIX at run-time.
# $CONDA_PREFIX isn't available in this context.
# NOTE: Prepending ensures a working path; if $MANPATH was previously empty,
# the trailing colon ensures that system paths are searched.
ENV MANPATH="${NB_PYTHON_PREFIX}/share/man:${MANPATH}"
RUN mandb

# Add custom jupyter config. You can also put config.py files in the same place
COPY custom_jupyter_server_config.json ${NB_PYTHON_PREFIX}/etc/jupyter/jupyter_server_config.d/
COPY custom_jupyter_server_config.json ${NB_PYTHON_PREFIX}/etc/jupyter/jupyter_notebook_config.d/

# Clean up extra files in ${REPO_DIR}
RUN rm -rf ${REPO_DIR}/book ${REPO_DIR}/docs

# Copy scripts into /pyrocket_scripts directory in the image
RUN mkdir -p /pyrocket_scripts && cp -r ${REPO_DIR}/scripts/* /pyrocket_scripts/

# Set ownership to root:staff and permissions to 775
RUN chown -R root:staff /pyrocket_scripts && \
    chmod -R 775 /pyrocket_scripts

# Convert NB_USER to ENV (from ARG) so that it passes to the child dockerfile
ENV NB_USER=${NB_USER}

# Revert to default user and home as pwd
USER ${NB_USER}
WORKDIR ${HOME}
1. Some commands need to be run as root, such as installing Linux packages with apt-get.
2. Set variables. CONDA_ENV is useful for child builds.
3. This section runs the script rocker.sh, which installs R and RStudio using the rocker scripts.
4. The rocker scripts build R from source and, as part of their cleanup, remove Linux packages that are not needed. repo2docker installs the packages in apt.txt automatically before the code in appendix runs. So the needed Linux packages, which include packages for the Xfce Desktop Environment in /desktop, are put in apt2.txt instead. repo2docker will not detect this file, so we can install the packages here after R is built. The grep -v and related code processes apt2.txt and removes comments and blank lines.
5. The Ubuntu minimal image does not include man pages by default. These lines re-enable them (unminimize) and rebuild the man page index (mandb) so users have the common help files.
6. This is some custom Jupyter config to allow hidden files to be listed in the folder browser.
7. book and docs are the documentation files and are not needed in the image.
8. Copy the pyrocket helper scripts to the /pyrocket_scripts directory and set them to executable.
9. The NB_USER environmental variable is not exported by repo2docker (it is an argument confined to the parent build) but is very useful for child builds. So it is converted to an environmental variable.
10. The parent Docker build completes by setting the user to jovyan and the working directory to ${HOME}. Within a JupyterHub deployment, ${HOME} will often be re-mapped to the user's persistent storage. So it is important not to write anything that needs to persist, for example configuration, to ${HOME} during the build. You can do this in the start script, since that runs after the user directory is mapped, or you can put configuration files somewhere other than ${HOME}.
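The package_list pipeline in appendix can be tried outside a Docker build. The function below is a minimal sketch that wraps the same grep/sed/awk pipeline in a hypothetical function, parse_apt_list, which prints one package name per line. It assumes GNU grep and GNU sed (as on Ubuntu), since \s and \r in these patterns are GNU extensions.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical wrapper around the package_list pipeline from appendix:
# drop comment-only and blank lines, strip CR characters and trailing
# comments, then trim surrounding whitespace so one package remains per line.
parse_apt_list() {
  grep -v '^\s*#' "$1" \
    | grep -v '^\s*$' \
    | sed 's/\r//g; s/#.*//; s/^[[:space:]]*//; s/[[:space:]]*$//' \
    | awk '{$1=$1};1'
}
```

Because this sketch uses set -o pipefail, a file with no package lines makes grep exit with status 1, so the function returns non-zero. In the appendix, the Dockerfile RUN shell does not set pipefail, so there the pipeline's status is that of awk.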
7.10 rocker.sh
This script downloads the rocker scripts from rocker-versioned2 into ${REPO_DIR} and uses them to install things. It reads in one of the rocker Dockerfiles, selected with R_DOCKERFILE as defined in the appendix file (which is inserted into the main Dockerfile). Variables defined here will only be available in this script. The numbered notes after the script explain what each section does.
#!/bin/bash
set -e

# Copy in the rocker files. Work in ${REPO_DIR} to make sure I don't clobber anything
cd ${REPO_DIR}
wget https://github.com/rocker-org/rocker-versioned2/archive/refs/tags/R${R_VERSION}.tar.gz
tar zxvf R${R_VERSION}.tar.gz && \
    mv rocker-versioned2-R${R_VERSION}/scripts /rocker_scripts && \
    ROCKER_DOCKERFILE_NAME="${R_DOCKERFILE}.Dockerfile"
mv rocker-versioned2-R${R_VERSION}/dockerfiles/${ROCKER_DOCKERFILE_NAME} /rocker_scripts/original.Dockerfile && \
    rm R${R_VERSION}.tar.gz && \
    rm -rf rocker-versioned2-R${R_VERSION}

cd /
# Read the Dockerfile and process each line
while IFS= read -r line; do
    # Check if the line starts with ENV or RUN
    if [[ "$line" == ENV* ]]; then
        # Assign variable
        var_assignment=$(echo "$line" | sed 's/^ENV //g')
        # Replace ENV DEFAULT_USER="jovyan"
        if [[ "$var_assignment" == DEFAULT_USER* ]]; then
            var_assignment="DEFAULT_USER=${NB_USER}"
        fi
        # Run this way eval "export ..." otherwise the " will get turned to %22
        eval "export $var_assignment"
        # Write the exported variable to env.txt
        echo "export $var_assignment" >> ${REPO_DIR}/env.txt
    elif [[ "$line" == RUN* ]]; then
        # Run the command from the RUN line
        cmd=$(echo "$line" | sed 's/^RUN //g')
        echo "Executing: $cmd"
        eval "$cmd" # || echo ${cmd}" encountered an error, but continuing..."
    fi
done < /rocker_scripts/original.Dockerfile

# Install extra tex packages that are not installed by default
if command -v tlmgr &> /dev/null; then
    echo "Installing texlive collection-latexrecommended..."
    tlmgr install collection-latexrecommended
    tlmgr install pdfcol tcolorbox eurosym upquote adjustbox titling enumitem ulem soul rsfs
fi
1. The rocker-versioned2 repository for a particular R version is downloaded into ${REPO_DIR} and unpacked. R_VERSION is defined in appendix.
2. The unpacked directory will be named rocker-versioned2-R${R_VERSION}. We move the scripts directory to /rocker_scripts (base level) because the rocker scripts expect the scripts to be there.
3. R_DOCKERFILE is defined as verse_${R_VERSION}. The Dockerfile we will process (finding its ENV and RUN lines) is called ROCKER_DOCKERFILE_NAME in the rocker files. We move it to /rocker_scripts/original.Dockerfile so we can refer to it later.
4. Clean up the rocker directories that we no longer need.
5. cd to the base level where /rocker_scripts is.
6. The big while loop processes /rocker_scripts/original.Dockerfile line by line. The input file is supplied with the input redirection < at the end of the while loop.
7. This checks whether the line starts with ENV. If it does, it strips off ENV and stores the variable assignment statement in $var_assignment.
8. The rocker Dockerfiles do not use the NB_USER environmental variable (defined in appendix). If the ENV line defines the default user, we need to change that assignment to the value of NB_USER. This part is specific to the rocker Dockerfiles.
9. We need to export any variables (ENV) found in the Dockerfile so they are available to the scripts that will run in the RUN statements. The variables must be exported as done here (with eval and export); otherwise they don't make it to the child scripts about to be run. Getting variables exported to child scripts called by a parent script is tricky, and this line required a lot of testing and debugging.
10. The export line only makes the variable available to the child scripts. We also want the variables available in the final image. To do that, we write them to a file (env.txt) that is sourced at startup by the start script. Each RUN step of a Docker build runs in its own ephemeral shell, so variables exported here do not persist into the image.
11. If the Dockerfile line starts with RUN, then run the command. This command should be a rocker script because that is how rocker Dockerfiles are organized. See the Dockerfiles in the rocker-versioned2 repository for examples.
12. Here the input file for the while loop is specified.
13. The rocker install_texlive.sh script (which is part of verse) provides a basic TeX Live installation. Here a few more packages are added so that the user is able to run vanilla Quarto to PDF and MyST to PDF. See the chapter on texlive.
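To check which ENV lines would be exported and which RUN commands would be executed for a given rocker Dockerfile, the loop in rocker.sh can be run in a dry-run form. This is a minimal sketch, not part of py-rocket-base. The function name process_dockerfile is hypothetical, the ENV/RUN prefixes are stripped with parameter expansion instead of sed, and RUN commands are printed rather than executed. Like the original loop, it assumes each ENV or RUN instruction fits on one line.

```shell
#!/bin/bash
set -e

# Hypothetical dry-run version of the rocker.sh loop.
# $1: Dockerfile to read; $2: file to append "export ..." lines to.
# Requires NB_USER to be set, as in the real build.
process_dockerfile() {
  local dockerfile="$1" envfile="$2"
  local line var_assignment cmd
  while IFS= read -r line; do
    if [[ "$line" == ENV* ]]; then
      var_assignment="${line#ENV }"
      # Swap rocker's default user for the repo2docker user
      if [[ "$var_assignment" == DEFAULT_USER* ]]; then
        var_assignment="DEFAULT_USER=${NB_USER}"
      fi
      # eval lets the shell parse the quotes inside the assignment
      eval "export $var_assignment"
      echo "export $var_assignment" >> "$envfile"
    elif [[ "$line" == RUN* ]]; then
      cmd="${line#RUN }"
      echo "Would execute: $cmd"
    fi
  done < "$dockerfile"
}
```

Lines such as FROM and COPY are skipped, just as in rocker.sh, because only ENV and RUN lines are acted on.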
7.11 start
Within a JupyterHub, the user home directory $HOME is typically re-mapped to the user's persistent home directory. That means that the image build process cannot put things into $HOME; they would just be lost when $HOME is re-mapped. If a process needs to have something in the home directory, e.g. some local user configuration, this must be done in the start script. The repo2docker Docker image specifies that the start script is ${REPO_DIR}/start. In py-rocket-base, the start scripts from child Dockerfiles are sourced by the py-rocket-base start script.
#!/bin/bash
set -euo pipefail

# Start - Set any environment variables here
# These are inherited by all processes, *except* RStudio
# USE export <parname>=value
# source this file to get the variables defined in the rocker Dockerfile
source ${REPO_DIR}/env.txt
# End - Set any environment variables here

# Run child start scripts in a subshell to contain its environment
# ${REPO_DIR}/childstart/ is created by setup-start.sh
if [ -d "${REPO_DIR}/childstart/" ]; then
    for script in ${REPO_DIR}/childstart/*; do
        if [ -f "$script" ]; then
            echo "Sourcing script: $script"
            source "$script" || {
                echo "Error: Failed to source $script. Moving on to the next script."
            }
        fi
    done
fi

exec "$@"
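1. A Dockerfile cannot set environmental variables dynamically from a build script, so rocker.sh writes the export <var>=<value> lines to env.txt, which is sourced at startup.
2. Source any child start scripts. Start scripts are moved into childstart by the setup-start.sh pyrocket script. Note that, as written, source runs each script in the start script's own shell rather than in a subshell. So set statements and exported variables in a child script are not contained; wrapping the call as ( source "$script" ) would contain them.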
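The script below illustrates the kind of work that belongs in a child start script. It is a minimal sketch that seeds a default configuration file into ${HOME} the first time the container starts. The function name seed_config and both paths are hypothetical. The py-rocket-base start script sources this file, so the sketch avoids calling exit, which would end the start script itself.

```shell
#!/bin/bash
# Hypothetical child start script (e.g. placed in ${REPO_DIR}/childstart/
# by setup-start.sh). It runs at container start, after ${HOME} is mapped,
# so files written to ${HOME} here persist.

# Copy a default config file into place only if the user does not already
# have one, so user edits survive restarts.
seed_config() {
  local src="$1" dest="$2"
  if [ -f "$src" ] && [ ! -e "$dest" ]; then
    mkdir -p "$(dirname "$dest")"
    cp "$src" "$dest"
  fi
}

# Illustrative paths; the default file would be added to the image at build time
seed_config /usr/share/myapp/default.conf "${HOME}/.config/myapp/config"
```

The check for an existing file is the key design choice: the image supplies a default, but the copy in the persistent home directory wins once it exists.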
7.12 desktop.sh
The default for XDG and xfce4 is for Desktop files to be in ~/Desktop, but this leads to a variety of problems. First, it alters the user directory, which seems rude. Second, orphan desktop files might be in ~/Desktop, so there is no telling what the user's Desktop experience would be. Here the Desktop directory is set to /usr/share/Desktop so that it is part of the image. Users who really want to customize the Desktop can change ~/.config/user-dirs.dirs, though py-rocket-base might not respect that. It is not clear why you would do that instead of just using a different image that does not have the py-rocket-base behavior.
#!/bin/bash
set -e
# Copy in the Desktop files
APPLICATIONS_DIR=/usr/share/applications
DESKTOP_DIR=/usr/share/Desktop
mkdir -p "${DESKTOP_DIR}"
chown :staff /usr/share/Desktop
chmod 775 /usr/share/Desktop
# set the Desktop dir default for XDG
# Double quotes so that ${DESKTOP_DIR} is expanded when the file is written
echo "XDG_DESKTOP_DIR=\"${DESKTOP_DIR}\"" > /etc/xdg/user-dirs.defaults
# The for loops will fail if they return null (no files). Set shell option nullglob
shopt -s nullglob
for desktop_file_path in "${REPO_DIR}"/Desktop/*.desktop; do
    cp "${desktop_file_path}" "${APPLICATIONS_DIR}/."
    # Symlink application to desktop and set execute permission so xfce (desktop) doesn't complain
    desktop_file_name="$(basename "${desktop_file_path}")"
    # Set execute permissions on the copied .desktop file
    chmod +x "${APPLICATIONS_DIR}/${desktop_file_name}"
    ln -sf "${APPLICATIONS_DIR}/${desktop_file_name}" "${DESKTOP_DIR}/${desktop_file_name}"
done
update-desktop-database "${APPLICATIONS_DIR}"

# Add MIME Type data from XML files to the MIME database.
MIME_DIR="/usr/share/mime"
MIME_PACKAGES_DIR="${MIME_DIR}/packages"
mkdir -p "${MIME_PACKAGES_DIR}"
for mime_file_path in "${REPO_DIR}"/Desktop/*.xml; do
    cp "${mime_file_path}" "${MIME_PACKAGES_DIR}/."
done
update-mime-database "${MIME_DIR}"

# Add icons
ICON_DIR="/usr/share/icons"
ICON_PACKAGES_DIR="${ICON_DIR}/packages"
mkdir -p "${ICON_PACKAGES_DIR}"
for icon_file_path in "${REPO_DIR}"/Desktop/*.png; do
    cp "${icon_file_path}" "${ICON_PACKAGES_DIR}/" || echo "Failed to copy ${icon_file_path}"
done
for icon_file_path in "${REPO_DIR}"/Desktop/*.svg; do
    cp "${icon_file_path}" "${ICON_PACKAGES_DIR}/" || echo "Failed to copy ${icon_file_path}"
done
gtk-update-icon-cache "${ICON_DIR}"
1. This is the default location for system applications.
2. Create the Desktop directory and make sure jovyan can put files there. This is mainly for debugging.
3. Set up the default XDG_DESKTOP_DIR value. This will be copied to ~/.config (by xinitrc).
4. Copy each .desktop file in the Desktop directory into the applications directory and make a symlink to it in the Desktop directory. The former means that the application will appear in the xfce4 desktop menu, and the latter means there will be a desktop icon.
5. Add any MIME XML files to the MIME folder and update the MIME database.
6. Add any png or svg icon files to the icon folder and update the icon cache.
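desktop.sh picks up any *.desktop files placed in the Desktop folder of the repo. To show what such a file contains, the function below is a minimal sketch that writes a basic entry using keys from the freedesktop.org Desktop Entry Specification. The function name write_desktop_entry and the application name, command and icon are hypothetical placeholders.

```shell
#!/bin/bash
set -euo pipefail

# Hypothetical helper that writes a minimal .desktop launcher.
# $1: output file, $2: display name, $3: command to run, $4: icon name or path
write_desktop_entry() {
  local out="$1" name="$2" exec_cmd="$3" icon="$4"
  cat > "$out" <<EOF
[Desktop Entry]
Type=Application
Name=${name}
Exec=${exec_cmd}
Icon=${icon}
Terminal=false
Categories=Science;
EOF
}
```

For example, write_desktop_entry Desktop/myapp.desktop "My App" myapp /usr/share/icons/packages/myapp.png would produce a launcher in the repo's Desktop folder. Per the specification, an absolute Icon path is used as-is, and /usr/share/icons/packages is where desktop.sh copies icon files.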