8 Developer notes
8.1 Design
py-rocket-base is inspired by repo2docker and the Pangeo Docker stack design. py-rocket-base is built using repo2docker (via repo2docker-action) and thus lets repo2docker make the choices regarding the environment design, such as how the conda environment is set up and the base directory structure and permissions.
The Pangeo Docker stack does not use repo2docker, but mimics repo2docker’s environment design. The Pangeo base-image behaves similarly to repo2docker: using the base-image in the FROM line of a Dockerfile causes the build to look for files with the same names as repo2docker’s configuration files and then take the appropriate action with those files. This means that routine users do not need to know how to write Dockerfile code in order to extend the image with new packages or applications. The py-rocket-base Docker image uses this Pangeo base-image design. The design is based on ONBUILD commands in the Dockerfile that trigger actions only when the image is used in the FROM line of another Dockerfile.
py-rocket-base does not include this ONBUILD behavior. Instead it follows the Rocker Docker stack design and provides helper scripts for building on the base image. py-rocket-base includes a directory called /pyrocket_scripts that will help you do common tasks for scientific Docker images. These scripts are not required. If users are familiar with writing Dockerfiles, they can write their own code. Helper scripts were adopted after feedback that the Pangeo ONBUILD behavior makes it harder to customize images that need a very specific structure or order of operations.
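As an illustration of the helper-script approach, the sketch below writes a hypothetical child Dockerfile that extends py-rocket-base. It assumes the child repository has its own environment.yml and that install-conda-packages.sh takes the path to that file, as it does in the py-rocket-base Dockerfile. The /tmp/environment.yml location and the USER switches are assumptions for illustration, not documented requirements.

```shell
#!/bin/bash
set -e

# Write a hypothetical child Dockerfile into a scratch directory.
workdir="$(mktemp -d)"
cat > "${workdir}/Dockerfile" <<'EOF'
FROM ghcr.io/nmfs-opensci/py-rocket-base:latest

# Assumption: the helper script is run as root, as in the parent Dockerfile
USER root
COPY environment.yml /tmp/environment.yml
RUN /pyrocket_scripts/install-conda-packages.sh /tmp/environment.yml

# Return to the default user
USER ${NB_USER}
EOF

echo "Wrote ${workdir}/Dockerfile"
```

Because nothing is triggered automatically, the order of the COPY and RUN steps is entirely under the control of the child Dockerfile.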
There are many ways to install R and RStudio into an image designed for JupyterHubs. The objective of py-rocket-base is not to install R and RStudio, per se, and there are other leaner and faster ways to install R/RStudio if that is your goal. The objective of py-rocket-base is to create a JupyterHub image such that when you click the RStudio button in the JupyterLab UI to enter the RStudio UI, you enter an environment that is the same as if you had used a Rocker image. If you are in the JupyterLab UI, the environment is the same as if you had used repo2docker (or the Pangeo base-image) to create the environment.
8.2 Documentation
To build the documentation book, clone the repo and then run
cd book
quarto render .
Set GitHub Pages to the docs folder.
8.3 Building the images
The .github/workflows/build.yaml file is a GitHub Action that builds the image. The image URL will look like one of these:
ghcr.io/nmfs-opensci/repo-name/image-name:latest
ghcr.io/nmfs-opensci/image-name:latest
For example, for this repo the image is ghcr.io/nmfs-opensci/py-rocket-base:latest.
8.4 base-image
The base-image directory contains the Pangeo base-image Dockerfile minus the ONBUILD statements. Thus the base-image for py-rocket-base is the same as the Pangeo base-image but doesn’t have the behavior of automatically processing files like environment.yml in child images (those that use the base image in the FROM line).
py-rocket-base uses base-image and adds the pangeo-notebook metapackage, which adds the basic JupyterHub and JupyterLab packages. py-rocket-base then adds R/RStudio, more conda packages and Desktop via install scripts.
8.5 py-rocket-base
The Dockerfile does the following in order:
- Move files into /srv/repo
- Move scripts into /pyrocket_scripts and /rocker_scripts
- Install conda packages with the pangeo-notebook metapackage as the main set of packages plus the extra server packages
- Install R and RStudio plus the verse set of packages with the rocker scripts
- Set up the Desktop environment and ensure that applications go into /etc/xdg/userconfig instead of $HOME.
- Move the start script to /srv/start.
The pieces of the Dockerfile are explained below. The numbered notes after the code describe what each block does.
FROM ghcr.io/nmfs-opensci/py-rocket-base/base-image:latest

USER root

# Define environment variables
# DISPLAY Tell applications where to open desktop apps - this allows notebooks to pop open GUIs
ENV REPO_DIR="/srv/repo" \
    DISPLAY=":1.0" \
    R_VERSION="4.4.1"

# Add NB_USER to staff group (required for rocker script)
# Ensure the staff group exists first
RUN groupadd -f staff && usermod -a -G staff "${NB_USER}"

# Copy files into REPO_DIR and make sure staff group can edit (use staff for rocker)
COPY --chown=${NB_USER}:${NB_USER} . ${REPO_DIR}
RUN chgrp -R staff ${REPO_DIR} && \
    chmod -R g+rwx ${REPO_DIR} && \
    rm -rf ${REPO_DIR}/book ${REPO_DIR}/docs

# Copy scripts to /pyrocket_scripts and set permissions
RUN mkdir -p /pyrocket_scripts && \
    cp -r ${REPO_DIR}/scripts/* /pyrocket_scripts/ && \
    chown -R root:staff /pyrocket_scripts && \
    chmod -R 775 /pyrocket_scripts

# Install conda packages (will switch to NB_USER in script)
RUN /pyrocket_scripts/install-conda-packages.sh ${REPO_DIR}/environment.yml

# Install R, RStudio via Rocker scripts. Requires the prefix for a rocker Dockerfile
RUN /pyrocket_scripts/install-rocker.sh "verse_${R_VERSION}"

# Install extra apt packages
# Install linux packages after R installation since the R install scripts get rid of packages
RUN /pyrocket_scripts/install-apt-packages.sh ${REPO_DIR}/apt.txt

# Install some basic VS Code extensions
RUN /pyrocket_scripts/install-vscode-extensions.sh ${REPO_DIR}/vscode-extensions.txt

# Re-enable man pages disabled in Ubuntu 18 minimal image
# https://wiki.ubuntu.com/Minimal
RUN yes | unminimize
ENV MANPATH="${NB_PYTHON_PREFIX}/share/man:${MANPATH}"
RUN mandb

# Add custom Jupyter server configurations
RUN mkdir -p ${NB_PYTHON_PREFIX}/etc/jupyter/jupyter_server_config.d/ && \
    mkdir -p ${NB_PYTHON_PREFIX}/etc/jupyter/jupyter_notebook_config.d/ && \
    cp ${REPO_DIR}/custom_jupyter_server_config.json ${NB_PYTHON_PREFIX}/etc/jupyter/jupyter_server_config.d/ && \
    cp ${REPO_DIR}/custom_jupyter_server_config.json ${NB_PYTHON_PREFIX}/etc/jupyter/jupyter_notebook_config.d/

# Set up the defaults for Desktop.
ENV XDG_CONFIG_HOME=/etc/xdg/userconfig
RUN mkdir -p ${XDG_CONFIG_HOME} && \
    chown -R ${NB_USER}:${NB_USER} ${XDG_CONFIG_HOME} && \
    chmod -R u+rwx,g+rwX,o+rX ${XDG_CONFIG_HOME} && \
    mv ${REPO_DIR}/user-dirs.dirs ${XDG_CONFIG_HOME} && \
    chmod +x ${REPO_DIR}/scripts/setup-desktop.sh && \
    ${REPO_DIR}/scripts/setup-desktop.sh

# Fix home permissions. Not needed in JupyterHub with persistent memory but needed if not used in that context
RUN /pyrocket_scripts/fix-home-permissions.sh

# Set up the start command
USER ${NB_USER}
RUN chmod +x ${REPO_DIR}/start \
    && cp ${REPO_DIR}/start /srv/start

# Revert to default user and home as pwd
USER ${NB_USER}
WORKDIR ${HOME}
- 1: Some commands need to be run as root, such as installing linux packages with apt-get.
- 2: Set variables. CONDA_ENV is useful for child builds.
- 3: Copy the py-rocket-base files into the /srv/repo directory. book and docs are the documentation files and are not needed in the image.
- 4: Copy the pyrocket scripts into the image and set the permissions so they can be executed by the staff group (which includes jovyan). The pyrocket scripts are used to do most of the installation tasks and these can also be used to extend py-rocket-base.
- 5: Use the pyrocket script to install the conda packages in environment.yml. The script does clean-up. The core package is the pangeo-notebook metapackage; to this are added some JupyterLab extensions and packages needed for RStudio and Desktop. Scientific packages are not added here. They will be added via child images that use py-rocket-base as the base image (in the FROM line).
- 6: This section runs the script install-rocker.sh, which installs R and RStudio using rocker scripts.
- 7: The linux packages are installed with the install-apt-packages script, which takes care of clean-up. These packages need to be installed after R is installed because the R scripts uninstall packages as part of cleanup.
- 8: The VSCode extensions are installed into the conda environment directory instead of the home directory, since the home directory is replaced by the user persistent home directory in a JupyterHub.
- 9: Ubuntu does not have man pages installed by default. These lines activate man so users have the common help files.
- 10: This is some custom Jupyter config to allow hidden files to be listed in the folder browser.
- 11: Setting up Desktop. Keep the config in /etc so it doesn’t trash the user environment (which they might want for other environments). Setting up the Desktop configuration is very poorly documented. The key is setting the environment variable XDG_CONFIG_HOME and then putting the file user-dirs.dirs within that directory. In that file, one can specify XDG_DESKTOP_DIR="/usr/share/Desktop", which says where application files are kept.
- 12: Ensure that none of the directories in /home are owned by root. When the image is used in a JupyterHub, this won’t matter if home is replaced by the user persistent directory, but in other applications having any directories in home owned by root will cause problems.
- 13: The start file mainly includes a subshell to run any start files used in extensions of the py-rocket-base image.
- 14: The parent docker build completes by setting the user to jovyan and the working directory to ${HOME}. Within a JupyterHub deployment, ${HOME} will often be re-mapped to the user’s persistent home directory, so it is important that the build does not write anything that needs to persist, for example configuration, to ${HOME}. You can write such files in the start script, since that runs after the user directory is mapped, or you can put configuration files in some place other than ${HOME}.
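The Desktop configuration described in note 11 can be sketched as follows. The snippet uses a scratch directory in place of /etc/xdg/userconfig so that it can be run without root; the XDG_DESKTOP_DIR value is the one quoted in the note, and everything else is illustrative.

```shell
#!/bin/bash
set -e

# Stand-in for /etc/xdg/userconfig (assumption: scratch directory for illustration)
export XDG_CONFIG_HOME="$(mktemp -d)"

# user-dirs.dirs tells XDG-aware desktops where the Desktop folder lives
cat > "${XDG_CONFIG_HOME}/user-dirs.dirs" <<'EOF'
XDG_DESKTOP_DIR="/usr/share/Desktop"
EOF

# The file uses shell syntax, so it can be sourced to inspect the value
source "${XDG_CONFIG_HOME}/user-dirs.dirs"
echo "Desktop directory: ${XDG_DESKTOP_DIR}"
```

Keeping this file under a system directory rather than ${HOME} means it survives the home directory being re-mapped in a JupyterHub.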
8.6 install-rocker.sh
This script will copy in the rocker scripts from rocker-versioned2 into ${REPO_DIR} to install things. It will read in one of the rocker docker files using R_DOCKERFILE defined in the appendix file (which is inserted into the main docker file). Variables defined here will only be available in this script. The numbered notes after the script explain what each section is doing.
#!/bin/bash
set -e

# Copy in the rocker files. Work in ${REPO_DIR} to make sure I don't clobber anything
cd ${REPO_DIR}
wget https://github.com/rocker-org/rocker-versioned2/archive/refs/tags/R${R_VERSION}.tar.gz
tar zxvf R${R_VERSION}.tar.gz && \
mv rocker-versioned2-R${R_VERSION}/scripts /rocker_scripts && \
ROCKER_DOCKERFILE_NAME="${R_DOCKERFILE}.Dockerfile"
mv rocker-versioned2-R${R_VERSION}/dockerfiles/${ROCKER_DOCKERFILE_NAME} /rocker_scripts/original.Dockerfile && \
rm R${R_VERSION}.tar.gz && \
rm -rf rocker-versioned2-R${R_VERSION}

cd /
# Read the Dockerfile and process each line
while IFS= read -r line; do
    # Check if the line starts with ENV or RUN
    if [[ "$line" == ENV* ]]; then
        # Assign variable
        var_assignment=$(echo "$line" | sed 's/^ENV //g')
        # Replace ENV DEFAULT_USER="jovyan"
        if [[ "$var_assignment" == DEFAULT_USER* ]]; then
            var_assignment="DEFAULT_USER=${NB_USER}"
        fi
        # Run this way eval "export ..." otherwise the " will get turned to %22
        eval "export $var_assignment"
        # Write the exported variable to env.txt
        echo "export $var_assignment" >> ${REPO_DIR}/env.txt
    elif [[ "$line" == RUN* ]]; then
        # Run the command from the RUN line
        cmd=$(echo "$line" | sed 's/^RUN //g')
        echo "Executing: $cmd"
        eval "$cmd" # || echo ${cmd}" encountered an error, but continuing..."
    fi
done < /rocker_scripts/original.Dockerfile

# Install extra tex packages that are not installed by default
if command -v tlmgr &> /dev/null; then
    echo "Installing texlive collection-latexrecommended..."
    tlmgr install collection-latexrecommended
    tlmgr install pdfcol tcolorbox eurosym upquote adjustbox titling enumitem ulem soul rsfs
fi
- 1: The rocker-versioned2 repository for a particular R version is copied into ${REPO_DIR} and unzipped. R_VERSION is defined in appendix.
- 2: The unzipped directory will be named rocker-versioned2-R${R_VERSION}. We move the scripts directory to /rocker_scripts (base level) because the rocker scripts expect the scripts to be there.
- 3: R_DOCKERFILE is defined as verse_${R_VERSION}. The docker file we will process (find ENV and RUN lines) is called ROCKER_DOCKERFILE_NAME in the rocker files. We move this to /rocker_scripts/original.Dockerfile so we can refer to it later.
- 4: Clean up the rocker directories that we no longer need.
- 5: cd to the base level where /rocker_scripts is.
- 6: The big while loop processes /rocker_scripts/original.Dockerfile. The code uses input redirection (<), and the input file is specified at the end of the while loop.
- 7: This checks whether the line starts with ENV and, if it does, strips off ENV and stores the variable assignment statement in $var_assignment.
- 8: The rocker docker files do not use the NB_USER environment variable (defined in appendix). If the ENV line is defining the default user, we need to change that assignment to the variable NB_USER. This part is specific to the rocker docker files.
- 9: We need to export any variables (ENV) found in the docker file so they are available to the scripts that will run in the RUN statements. We need to export the variables as done here (with eval and export), otherwise they don’t make it to the child scripts about to be run. Getting variables exported to child scripts called by a parent script is tricky, and this line required a lot of testing and debugging to get variables exported properly.
- 10: The export line will only make the variable available to the child scripts. We also want them available in the final image. To do that, we write them to a file that we will source from the docker file. Scripts are run in an ephemeral subshell during docker builds so we cannot define the variable here.
- 11: If the docker file line starts with RUN then run the command. This command should be a rocker script because that is how rocker docker files are organized. See a rocker docker file in rocker-versioned2 for an example.
- 12: Here the input file for the while loop is specified.
- 13: The rocker install_texlive.sh script (which is part of verse) will provide a basic texlive installation. Here a few more packages are added so that the user is able to run vanilla Quarto to PDF and Myst to PDF. See the chapter on texlive.
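The ENV/RUN processing loop can be tried in isolation with the sketch below. It is a minimal stand-in, not the real script: the sample Dockerfile contents, the OUT_FILE variable and the scratch directory are made up for illustration, NB_USER is assumed to be jovyan, and bash parameter expansion is used in place of sed to strip the ENV and RUN prefixes.

```shell
#!/bin/bash
set -e

workdir="$(mktemp -d)"

# A tiny stand-in for a rocker Dockerfile (contents are made up)
cat > "${workdir}/sample.Dockerfile" <<'EOF'
FROM ubuntu:latest
ENV GREETING="hello world"
ENV DEFAULT_USER="rstudio"
RUN echo "${GREETING} from ${DEFAULT_USER}" > "${OUT_FILE}"
EOF

export OUT_FILE="${workdir}/out.txt"
NB_USER="jovyan"   # assumed value; set by the base image in practice

while IFS= read -r line; do
    if [[ "$line" == ENV* ]]; then
        # Strip the leading "ENV " to get NAME=value
        var_assignment="${line#ENV }"
        # Swap the rocker default user for NB_USER
        if [[ "$var_assignment" == DEFAULT_USER* ]]; then
            var_assignment="DEFAULT_USER=${NB_USER}"
        fi
        # Export for commands run later in this loop
        eval "export $var_assignment"
        # Record the export so it can be sourced later
        echo "export $var_assignment" >> "${workdir}/env.txt"
    elif [[ "$line" == RUN* ]]; then
        # Strip the leading "RUN " and execute the command
        eval "${line#RUN }"
    fi
done < "${workdir}/sample.Dockerfile"

cat "${workdir}/out.txt"
```

The RUN line sees both the exported GREETING and the rewritten DEFAULT_USER, and env.txt ends up holding the same assignments as export statements.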
8.7 start
Within a JupyterHub, the user home directory $HOME is typically re-mapped to the user persistent home directory. That means that the image build process cannot put things into $HOME; they would just be lost when $HOME is re-mapped. If a process needs to have something in the home directory, e.g. some local user configuration, this must be done in the start script. The repo2docker docker image specifies that the start script is ${REPO_DIR}/start. In py-rocket-base, the start scripts in a child docker file are sourced in a subshell from the py-rocket-base start script.
#!/bin/bash
set -euo pipefail

# Start - Set any environment variables here
# These are inherited by all processes, *except* RStudio
# USE export <parname>=value
# source this file to get the variables defined in the rocker Dockerfile
source ${REPO_DIR}/env.txt
# End - Set any environment variables here

# Run child start scripts in a subshell to contain its environment
# ${REPO_DIR}/childstart/ is created by setup-start.sh
if [ -d "${REPO_DIR}/childstart/" ]; then
    for script in ${REPO_DIR}/childstart/*; do
        if [ -f "$script" ]; then
            echo "Sourcing script: $script"
            source "$script" || {
                echo "Error: Failed to source $script. Moving on to the next script."
            }
        fi
    done
fi
exec "$@"
- 1: There is no way to dynamically set environment variables in a Docker file, so the env.txt file with the export <var>=<value> lines is sourced at start up.
- 2: Run any child start script in a subshell. Run in a subshell to contain any set statements or similar. Start scripts are moved into childstart by the setup-start.sh pyrocket script.
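What it means to contain a script in a subshell can be seen in the generic bash sketch below. It is not an excerpt from the py-rocket-base start script; the file name child-start.sh and the variable CHILD_ONLY are hypothetical.

```shell
#!/bin/bash
set -e

workdir="$(mktemp -d)"

# Hypothetical child start script that changes shell state
cat > "${workdir}/child-start.sh" <<'EOF'
CHILD_ONLY="set by child"
set +e
EOF

# Sourcing inside ( ... ) runs the script in a subshell:
# its variables and shell options vanish when the subshell exits.
( source "${workdir}/child-start.sh"; echo "inside subshell: ${CHILD_ONLY}" )
after_subshell="${CHILD_ONLY:-unset}"
echo "after subshell: ${after_subshell}"

# Sourcing directly runs the script in the current shell, so changes persist.
source "${workdir}/child-start.sh"
after_source="${CHILD_ONLY:-unset}"
echo "after direct source: ${after_source}"
```

The variable is visible inside the parentheses, gone afterwards, and only persists once the script is sourced without them.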
8.8 setup-desktop.sh
The default for XDG and xfce4 is for Desktop files to be in ~/Desktop, but this leads to a variety of problems. First, we would be altering the user directory, which seems rude. Second, orphan desktop files might be in ~/Desktop, so who knows what the user Desktop experience will be. Here the Desktop directory is set to /usr/share/Desktop so that it is part of the image. Users who really want to customize Desktop can change ~/.config/user-dirs.dirs, though py-rocket-base might not respect that. Not sure why you’d do that instead of just using a different image that doesn’t have the py-rocket-base behavior.
#!/bin/bash
set -e

# Copy in the Desktop files
APPLICATIONS_DIR=/usr/share/applications
DESKTOP_DIR=/usr/share/Desktop
mkdir -p "${DESKTOP_DIR}"
chown :staff /usr/share/Desktop
chmod 775 /usr/share/Desktop
# set the Desktop dir default for XDG
echo 'XDG_DESKTOP_DIR="${DESKTOP_DIR}"' > /etc/xdg/user-dirs.defaults

# The for loops will fail if they return null (no files). Set shell option nullglob
shopt -s nullglob

for desktop_file_path in ${REPO_DIR}/Desktop/*.desktop; do
    cp "${desktop_file_path}" "${APPLICATIONS_DIR}/."
    # Symlink application to desktop and set execute permission so xfce (desktop) doesn't complain
    desktop_file_name="$(basename ${desktop_file_path})"
    # Set execute permissions on the copied .desktop file
    chmod +x "${APPLICATIONS_DIR}/${desktop_file_name}"
    ln -sf "${APPLICATIONS_DIR}/${desktop_file_name}" "${DESKTOP_DIR}/${desktop_file_name}"
done
update-desktop-database "${APPLICATIONS_DIR}"

# Add MIME Type data from XML files to the MIME database.
MIME_DIR="/usr/share/mime"
MIME_PACKAGES_DIR="${MIME_DIR}/packages"
mkdir -p "${MIME_PACKAGES_DIR}"
for mime_file_path in ${REPO_DIR}/Desktop/*.xml; do
    cp "${mime_file_path}" "${MIME_PACKAGES_DIR}/."
done
update-mime-database "${MIME_DIR}"

# Add icons
ICON_DIR="/usr/share/icons"
ICON_PACKAGES_DIR="${ICON_DIR}/packages"
mkdir -p "${ICON_PACKAGES_DIR}"
for icon_file_path in "${REPO_DIR}"/Desktop/*.png; do
    cp "${icon_file_path}" "${ICON_PACKAGES_DIR}/" || echo "Failed to copy ${icon_file_path}"
done
for icon_file_path in "${REPO_DIR}"/Desktop/*.svg; do
    cp "${icon_file_path}" "${ICON_PACKAGES_DIR}/" || echo "Failed to copy ${icon_file_path}"
done
gtk-update-icon-cache "${ICON_DIR}"
- 1: This is the default location for system applications.
- 2: Create the Desktop directory and make sure jovyan can put files there. This is mainly for debugging.
- 3: This is not needed. It is the user-dirs.dirs file that is used.
- 4: Copy the .desktop files in the Desktop directory into the applications directory and make a symlink in the Desktop directory. The former means that the applications will appear in the menu in the xfce4 desktop and the latter means there will be a desktop icon.
- 5: Add any mime xml files to the mime folder and update the mime database.
- 6: Add any png or svg icon files to the icon folder and update the icon database.
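The reason for the nullglob option in the script can be seen with a small experiment. The sketch below uses an empty scratch directory as a stand-in for ${REPO_DIR}/Desktop (an assumption for illustration) and counts loop iterations with and without the option. Without nullglob, the unmatched pattern is kept as a literal string, so a cp inside the loop would be handed a file that does not exist and, under set -e, abort the script.

```shell
#!/bin/bash
set -e

# Empty scratch directory standing in for ${REPO_DIR}/Desktop (assumption)
desktop_src="$(mktemp -d)"

# Without nullglob, an unmatched pattern is passed through literally,
# so the loop body runs once with a file name that does not exist.
shopt -u nullglob
count_default=0
for f in "${desktop_src}"/*.desktop; do
    count_default=$((count_default + 1))
done

# With nullglob, an unmatched pattern expands to nothing and the loop is skipped.
shopt -s nullglob
count_nullglob=0
for f in "${desktop_src}"/*.desktop; do
    count_nullglob=$((count_nullglob + 1))
done

echo "iterations without nullglob: ${count_default}"
echo "iterations with nullglob: ${count_nullglob}"
```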
8.9 Notes on the RStudio environment
jupyter-rsession-proxy allows us to launch RStudio from JupyterLab, but the environment is different from the environment in JupyterLab.
8.9.1 Environmental variables
- PATH is different. conda is not on the path.
- None of the environmental variables in the docker file will be in the /rstudio environment. The start command affects /lab and /notebook but not /rstudio.
- The path in the terminal (in RStudio) can be, and often is, different from the path in the R console. Expect weird unexpected behavior because of this. If you type bash, then .bashrc is run and that will run conda init and that will add conda binaries to the path. Then really weird and unexpected things can happen.
If you need some environmental variable set, you will need to set it in $R_HOME/etc/Rprofile.site, which is run when R starts.
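As a minimal sketch of this approach, the snippet below appends an R Sys.setenv() call to Rprofile.site. It writes to a scratch directory standing in for the real R_HOME so it can be run without root, and the variable name MY_DATA_DIR and its value are placeholders.

```shell
#!/bin/bash
set -e

# Scratch directory standing in for the real R_HOME (assumption, so no root is needed)
R_HOME="$(mktemp -d)"
mkdir -p "${R_HOME}/etc"

# Append an R call that sets the variable each time R starts.
# MY_DATA_DIR and its value are placeholders.
cat >> "${R_HOME}/etc/Rprofile.site" <<'EOF'
Sys.setenv(MY_DATA_DIR = "/srv/data")
EOF

cat "${R_HOME}/etc/Rprofile.site"
```

In an image build, the same append would target the real $R_HOME/etc/Rprofile.site and be run as a user with write access to that file.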