7  Developer notes

7.1 Design

py-rocket-base is inspired by repo2docker and the Pangeo Docker stack design. py-rocket-base is built using repo2docker (via repo2docker-action) and thus lets repo2docker make the choices regarding the environment design, such as how the conda environment is set up and the base directory structure and permissions.

The Pangeo Docker stack does not use repo2docker, but mimics repo2docker’s environment design. The Pangeo base-image behaves similarly to repo2docker: using the base-image in the FROM line of a Dockerfile causes the build to look for files with the same names as repo2docker’s configuration files and then take the appropriate action with those files. This means that routine users do not need to know how to write Dockerfile code in order to extend the image with new packages or applications. This behavior is based on ONBUILD commands in the Dockerfile, which trigger actions only when the image is used in the FROM line of another Dockerfile. The py-rocket-base Docker image uses the Pangeo base-image environment design.

py-rocket-base does not include this ONBUILD behavior. Instead it follows the Rocker Docker stack design and provides helper scripts for building on the base image. py-rocket-base includes a directory called /pyrocket_scripts with scripts that help you do common tasks for scientific Docker images. These scripts are not required. Users who are familiar with writing Dockerfiles can write their own code. Helper scripts were adopted after feedback that the Pangeo ONBUILD behavior makes it harder to customize images that need a very specific structure or order of operations.

There are many ways to install R and RStudio into an image designed for JupyterHubs. The objective of py-rocket-base is not to install R and RStudio, per se, and there are other leaner and faster ways to install R/RStudio if that is your goal. The objective of py-rocket-base is to create a JupyterHub image such that when you click the RStudio button in the JupyterLab UI to enter the RStudio UI, you enter an environment that is the same as if you had used a Rocker image. If you are in the JupyterLab UI, the environment is the same as if you had used repo2docker (or the Pangeo base-image) to create the environment.

7.2 Documentation

To build the documentation book, clone the repo and then run

cd book
quarto render .

Then set GitHub Pages to serve from the docs folder.

7.3 Building the images

The .github/workflows/build.yaml file is a GitHub Action that builds the image with repo2docker-action. The URL of the built image will look like one of these

ghcr.io/nmfs-opensci/repo-name/image-name:latest
ghcr.io/nmfs-opensci/image-name:latest

For example, for this repo the image is ghcr.io/nmfs-opensci/py-rocket-base:latest.

7.4 repo2docker

repo2docker-action creates the image and publishes it to ghcr.io/nmfs-opensci/py-rocket-base (an image registry, like Docker Hub or quay.io).

repo2docker (a Python package) sets up the structure of the base environment, e.g. it installs mamba for package solving, sets up environment variables, installs Linux packages, etc. It looks for specific files (like apt.txt, environment.yml, postBuild) in the build context (the repo that the Dockerfile is in) and takes the appropriate action. repo2docker-action also allows you to include an appendix file to add more commands to your Dockerfile.

repo2docker does a lot behind the scenes, and some of its behavior differs from what you would expect in a plain Docker build.

7.4.1 COPY

COPY <src> <dest> does not work in appendix because repo2docker changes the build context. The files are in src, not in the current directory (.). In appendix, you do not use lines like this

COPY file1 newlocation/file1

to bring file1 into the build. Instead, the files are already in ${REPO_DIR}/. If you want to copy a file to a new location, use the following (run it as root if jovyan does not have permission to write to newlocation).

RUN cp ${REPO_DIR}/file1 newlocation/file1

7.4.2 ENV and ARGS

repo2docker uses a number of ARGs in the build which you might expect to be in the image environment, for example ${NB_USER}. These need to be converted to ENV to be available to child builds.

7.5 RStudio

jupyter-rsession-proxy allows us to launch RStudio from Jupyter Lab, but the environment is different than the environment in Jupyter Lab.

7.5.1 Environmental variables

  • PATH is different. conda is not on the path.
  • None of the environmental variables in the Dockerfile will be in the /rstudio environment. The start command affects /lab and /notebook but not /rstudio.
  • The path in the terminal (in RStudio) can be different from the path in the R console. Expect unexpected behavior because of this. If you type bash, then .bashrc is run, which runs conda init and adds the conda binaries to the path. After that, very unexpected things can happen.

If you need an environmental variable set, set it in $R_HOME/etc/Rprofile.site, which is run when R starts.
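As a rough illustration of the Rprofile.site approach, the sketch below appends a Sys.setenv() call to that file. The helper name add_r_env_var and the variable MY_DATA_DIR are hypothetical and are not part of py-rocket-base. The demo points R_HOME at a temporary directory so that nothing on a real system is modified.

```shell
# Hypothetical helper (not part of py-rocket-base): append a Sys.setenv()
# call to Rprofile.site so the variable is set whenever R starts.
# Assumes the value contains no double quotes.
add_r_env_var() {
  name=$1
  value=$2
  rprofile="${R_HOME}/etc/Rprofile.site"
  mkdir -p "$(dirname "$rprofile")"
  printf 'Sys.setenv(%s = "%s")\n' "$name" "$value" >> "$rprofile"
}

# Demo against a throwaway R_HOME
R_HOME=$(mktemp -d)
add_r_env_var MY_DATA_DIR /srv/data
cat "${R_HOME}/etc/Rprofile.site"
```

In an image build, a line like this would need to run as root, since Rprofile.site is normally a system file.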

7.6 Basic structure of py-rocket-base

The py-rocket-base docker build has the following structure:

# base image and environment
repo2docker sets this up
repo2docker-action sets the directory where the build files 
  are put to /srv/repo (via $REPO_DIR in GitHub Action)
  and ownership is set to jovyan (via $NB_USER)
  
# environment.yml
repo2docker adds these packages to the conda notebook environment

# start
repo2docker points the Docker image entrypoint (command run on start) to this file
${REPO_DIR}/start

# appendix
repo2docker-action adds the Docker commands here to the end of Dockerfile
most of the work in py-rocket-base is done here. appendix calls rocker.sh and installs the packages in apt2.txt

Each file is described below.

7.7 apt2.txt

This is not named apt.txt because these packages need to be installed after R is installed: the R install scripts uninstall packages as part of cleanup. Some of these packages are required for Desktop (/desktop) to operate correctly. Packages needed for building R and RStudio (/rstudio) are installed via the rocker install scripts.
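The appendix file strips comments and blank lines from apt2.txt before handing the names to apt-get install. The sketch below tries a simplified, portable variant of that clean-up on a sample file. The helper name clean_package_list and the package names in the sample are illustrative only, and the pipeline is not the exact one used in appendix.

```shell
# Hypothetical helper: reduce a package list file to bare package names.
# Removes carriage returns, full-line and trailing comments, surrounding
# whitespace, and blank lines.
clean_package_list() {
  tr -d '\r' < "$1" |
    sed 's/#.*//; s/^[[:space:]]*//; s/[[:space:]]*$//' |
    awk 'NF { $1 = $1; print }'
}

# Sample file with a comment line, a blank line and a trailing comment
sample=$(mktemp)
printf '%s\n' '# Desktop packages' 'xfce4' '' '  dbus-x11   # needed by xfce' 'firefox' > "$sample"

package_list=$(clean_package_list "$sample")
# Left unquoted on purpose so the names are split into separate words,
# as they are when passed to apt-get install
echo $package_list
```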

7.8 environment.yml

These packages are added to the notebook conda environment. In py-rocket-base, only the basic packages needed for JupyterLab, RStudio and Desktop are added. Scientific packages are not added here. They will be added via child images that use py-rocket-base as the base image (in the FROM line).

7.9 appendix

This is a long file with many pieces. The numbered notes after the code explain what each code block does.

USER root

# Set env variables
# This is the default env in repo2docker type images
ENV CONDA_ENV=notebook
# Tell applications where to open desktop apps
ENV DISPLAY=":1.0"

# Install R, RStudio via Rocker scripts
ENV R_VERSION="4.4.1"
ENV R_DOCKERFILE="verse_${R_VERSION}"
# This is in the rocker script but will not run since ${NB_USER} already exists
# Needed because rocker scripts set permissions based on the staff group
RUN usermod -a -G staff "${NB_USER}"
RUN PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin && \
  chmod +x ${REPO_DIR}/rocker.sh && \
  ${REPO_DIR}/rocker.sh

# Install linux packages after R installation since the R install scripts get rid of packages
# The package_list part is reading the file and doing clean-up to just have the list of packages
RUN package_list=$(grep -v '^\s*#' ${REPO_DIR}/apt2.txt | grep -v '^\s*$' | sed 's/\r//g; s/#.*//; s/^[[:space:]]*//; s/[[:space:]]*$//' | awk '{$1=$1};1') && \
  apt-get update && \
  apt-get install --yes --no-install-recommends $package_list && \
  apt-get autoremove --purge && \
  apt-get clean && \
  rm -rf /var/lib/apt/lists/*
  
# Re-enable man pages disabled in Ubuntu 18 minimal image
# https://wiki.ubuntu.com/Minimal
RUN yes | unminimize
# NOTE: $NB_PYTHON_PREFIX is the same as $CONDA_PREFIX at run-time.
# $CONDA_PREFIX isn't available in this context.
# NOTE: Prepending ensures a working path; if $MANPATH was previously empty,
# the trailing colon ensures that system paths are searched.
ENV MANPATH="${NB_PYTHON_PREFIX}/share/man:${MANPATH}"
RUN mandb

# Add custom jupyter config. You can also put config.py files in the same place
COPY custom_jupyter_server_config.json ${NB_PYTHON_PREFIX}/etc/jupyter/jupyter_server_config.d/
COPY custom_jupyter_server_config.json ${NB_PYTHON_PREFIX}/etc/jupyter/jupyter_notebook_config.d/

# Clean up extra files in ${REPO_DIR}
RUN rm -rf ${REPO_DIR}/book ${REPO_DIR}/docs

# Copy scripts into /pyrocket_scripts directory in the image
RUN mkdir -p /pyrocket_scripts && cp -r ${REPO_DIR}/scripts/* /pyrocket_scripts/

# Set ownership to root:staff and permissions to 775
RUN chown -R root:staff /pyrocket_scripts && \
    chmod -R 775 /pyrocket_scripts

# Convert NB_USER to ENV (from ARG) so that it passes to the child dockerfile
ENV NB_USER=${NB_USER}

# Revert to default user and home as pwd
USER ${NB_USER}
WORKDIR ${HOME}
1. Some commands need to be run as root, such as installing Linux packages with apt-get.
2. Set variables. CONDA_ENV is useful for child builds.
3. This section runs the script rocker.sh, which installs R and RStudio using the rocker scripts.
4. The rocker scripts build R from source and, as part of clean-up, remove Linux packages that are no longer needed. repo2docker installs the packages in apt.txt automatically before the code in appendix runs, so the needed Linux packages (which include packages for the Xfce Desktop Environment in /desktop) are put in apt2.txt instead. repo2docker will not detect this file, and we can install the packages here after R is built. The grep -v etc. code processes apt2.txt, removing comments and blank lines.
5. The Ubuntu minimal image does not have man pages installed by default. These lines enable man so users have the common help files.
6. This is some custom Jupyter config to allow hidden files to be listed in the folder browser.
7. book and docs are the documentation files and are not needed in the image.
8. Copy the pyrocket helper scripts to the /pyrocket_scripts directory and make them executable.
9. The NB_USER environmental variable is not exported by repo2docker (it is an argument confined to the parent build) but is very useful for child builds, so it is converted to an environmental variable.
10. The parent Docker build completes by setting the user to jovyan and the working directory to ${HOME}. Within a JupyterHub deployment, ${HOME} will often be re-mapped to the user's persistent storage, so it is important not to write anything that needs to persist, for example configuration, to ${HOME} during the build. You can do this in the start script, since that runs after the user directory is mapped, or you can put configuration files somewhere other than ${HOME}.

7.10 rocker.sh

This script copies the rocker scripts from rocker-versioned2 into ${REPO_DIR} and uses them to install things. It reads one of the rocker Dockerfiles, selected by R_DOCKERFILE, which is defined in the appendix file (which is inserted into the main Dockerfile). Variables defined here are only available in this script. The numbered notes after the script explain what each section is doing.

#!/bin/bash
set -e

# Copy in the rocker files. Work in ${REPO_DIR} to make sure I don't clobber anything
cd ${REPO_DIR}
wget https://github.com/rocker-org/rocker-versioned2/archive/refs/tags/R${R_VERSION}.tar.gz
tar zxvf R${R_VERSION}.tar.gz && \
mv rocker-versioned2-R${R_VERSION}/scripts /rocker_scripts && \
ROCKER_DOCKERFILE_NAME="${R_DOCKERFILE}.Dockerfile"
mv rocker-versioned2-R${R_VERSION}/dockerfiles/${ROCKER_DOCKERFILE_NAME}  /rocker_scripts/original.Dockerfile && \
rm R${R_VERSION}.tar.gz && \
rm -rf rocker-versioned2-R${R_VERSION}

cd /
# Read the Dockerfile and process each line
while IFS= read -r line; do
    # Check if the line starts with ENV or RUN
    if [[ "$line" == ENV* ]]; then
        # Assign variable
        var_assignment=$(echo "$line" | sed 's/^ENV //g')
        # Replace ENV DEFAULT_USER="jovyan"
        if [[ "$var_assignment" == DEFAULT_USER* ]]; then
            var_assignment="DEFAULT_USER=${NB_USER}"
        fi
        # Run this way eval "export ..." otherwise the " will get turned to %22
        eval "export $var_assignment"
        # Write the exported variable to env.txt
        echo "export $var_assignment" >> ${REPO_DIR}/env.txt
    elif [[ "$line" == RUN* ]]; then
        # Run the command from the RUN line
        cmd=$(echo "$line" | sed 's/^RUN //g')
        echo "Executing: $cmd"
        eval "$cmd" # || echo ${cmd}" encountered an error, but continuing..."
    fi
done < /rocker_scripts/original.Dockerfile

# Install extra tex packages that are not installed by default
if command -v tlmgr &> /dev/null; then
    echo "Installing texlive collection-latexrecommended..."
    tlmgr install collection-latexrecommended
    tlmgr install pdfcol tcolorbox eurosym upquote adjustbox titling enumitem ulem soul rsfs
fi
1. The rocker-versioned2 repository for a particular R version is downloaded into ${REPO_DIR} and unpacked. R_VERSION is defined in appendix.
2. The unpacked directory will be named rocker-versioned2-R${R_VERSION}. We move the scripts directory to /rocker_scripts (base level) because the rocker scripts expect the scripts to be there.
3. R_DOCKERFILE is defined as verse_${R_VERSION}. The Dockerfile we will process (to find ENV and RUN lines) is named by ROCKER_DOCKERFILE_NAME and sits in the dockerfiles directory of the rocker files. We move it to /rocker_scripts/original.Dockerfile so we can refer to it later.
4. Clean up the rocker directories that we no longer need.
5. cd to the base level where /rocker_scripts is.
6. The big while loop processes /rocker_scripts/original.Dockerfile line by line. The file is fed to the loop with input redirection (<), which is specified at the end of the while loop code.
7. This checks whether the line starts with ENV and, if it does, strips off ENV and stores the variable assignment statement in $var_assignment.
8. The rocker Dockerfiles do not use the NB_USER environmental variable (defined in appendix). If the ENV line is defining the default user, we need to change that assignment to the value of NB_USER. This part is specific to the rocker Dockerfiles.
9. We need to export any variables (ENV) found in the Dockerfile so they are available to the scripts that will run in the RUN statements. We need to export the variables as done here (with eval and export), otherwise they don’t make it to the child scripts about to be run. Getting variables exported to child scripts called by a parent script is tricky, and this line required a lot of testing and debugging.
10. The export line will only make the variable available to the child scripts. We also want the variables available in the final image. To do that, we write them to a file (env.txt) that is sourced by the start script. Scripts are run in an ephemeral subshell during Docker builds, so we cannot define the variables for the image here.
11. If the Dockerfile line starts with RUN, then run the command. This command should be a rocker script because that is how rocker Dockerfiles are organized. See the dockerfiles directory in rocker-versioned2 for examples.
12. Here the input file for the while loop is specified.
13. The rocker install_texlive.sh script (which is part of verse) provides a basic texlive installation. Here a few more packages are added so that the user is able to run vanilla Quarto to PDF and MyST to PDF. See the chapter on texlive.
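The ENV/RUN processing described in the notes above can be tried on a toy file without building an image. The sketch below is a simplified variant, written with POSIX case statements instead of the [[ ]] tests used in rocker.sh. The sample Dockerfile, the LOG_FILE variable and the temporary directory are made up for illustration, and the RUN line only writes to a log file.

```shell
workdir=$(mktemp -d)

# A made-up Dockerfile in the style of the rocker ones
cat > "$workdir/sample.Dockerfile" <<'EOF'
FROM ubuntu:22.04
ENV R_VERSION="4.4.1"
ENV DEFAULT_USER="rstudio"
RUN echo "R ${R_VERSION} for ${DEFAULT_USER}" > "${LOG_FILE}"
CMD ["R"]
EOF

NB_USER=jovyan                 # stands in for the value set by repo2docker
LOG_FILE="$workdir/run.log"    # hypothetical, only used by the sample RUN line
export LOG_FILE

while IFS= read -r line; do
  case "$line" in
    "ENV "*)
      # Strip the ENV keyword, keeping the assignment
      var_assignment=${line#ENV }
      # Swap the default user for NB_USER
      case "$var_assignment" in
        DEFAULT_USER*) var_assignment="DEFAULT_USER=${NB_USER}" ;;
      esac
      # Export for the commands run below and record for later sourcing
      eval "export $var_assignment"
      echo "export $var_assignment" >> "$workdir/env.txt"
      ;;
    "RUN "*)
      # Strip the RUN keyword and execute the command
      cmd=${line#RUN }
      eval "$cmd"
      ;;
  esac
done < "$workdir/sample.Dockerfile"

cat "$workdir/run.log"
cat "$workdir/env.txt"
```

Lines that start with neither ENV nor RUN (FROM and CMD here) are skipped, which matches the behavior described for the rocker.sh loop.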

7.11 start

Within a JupyterHub, the user home directory $HOME is typically re-mapped to the user's persistent home directory. That means that the image build process cannot put things into $HOME, because they would be lost when $HOME is re-mapped. If a process needs to have something in the home directory, e.g. some local user configuration, this must be done in the start script. The repo2docker Docker image specifies that the start script is ${REPO_DIR}/start. In py-rocket-base, the start scripts from a child Dockerfile are sourced in a subshell from the py-rocket-base start script.

#!/bin/bash
set -euo pipefail

# Start - Set any environment variables here
# These are inherited by all processes, *except* RStudio
# USE export <parname>=value
# source this file to get the variables defined in the rocker Dockerfile
source ${REPO_DIR}/env.txt
# End - Set any environment variables here

# Run child start scripts in a subshell to contain its environment
# ${REPO_DIR}/childstart/ is created by setup-start.sh
if [ -d "${REPO_DIR}/childstart/" ]; then
    for script in ${REPO_DIR}/childstart/*; do
        if [ -f "$script" ]; then
            echo "Sourcing script: $script"
            ( source "$script" ) || {
                echo "Error: Failed to source $script. Moving on to the next script."
            }
        fi
    done
fi
exec "$@"
1. There is no way to dynamically set environmental variables in a Dockerfile, so the env.txt file, which contains the export <var>=<value> lines, is sourced at start-up.
2. Run any child start scripts in a subshell, to contain any set statements or similar. Child start scripts are moved into childstart by the setup-start.sh pyrocket script.
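The difference between sourcing in the current shell and sourcing in a subshell can be checked with a small experiment. In the sketch below, env.txt is sourced directly so its variables persist, while the child script is sourced inside parentheses, which run it in a subshell so that its variables and set options do not leak into the parent. The temporary directory, the file contents and the CHILD_ONLY variable are made up for illustration.

```shell
workdir=$(mktemp -d)
mkdir -p "$workdir/childstart"

# Stand-ins for env.txt and a child start script
printf '%s\n' 'export R_VERSION="4.4.1"' > "$workdir/env.txt"
printf '%s\n' 'export CHILD_ONLY=yes' 'set -e' > "$workdir/childstart/start"

# Sourced in the current shell: R_VERSION stays defined afterwards
. "$workdir/env.txt"

for script in "$workdir"/childstart/*; do
  [ -f "$script" ] || continue
  echo "Sourcing script: $script"
  # The parentheses create a subshell, so CHILD_ONLY and set -e
  # are discarded when the script finishes
  ( . "$script" ) || echo "Error: Failed to source $script. Moving on."
done

echo "R_VERSION=${R_VERSION}"
echo "CHILD_ONLY=${CHILD_ONLY:-<unset>}"
```

One consequence of this isolation is that a child start script cannot pass exported variables to the final command through the shell environment.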

7.12 desktop.sh

The default for XDG and xfce4 is for Desktop files to be in ~/Desktop, but this leads to a variety of problems. First, we would be altering the user directory, which seems rude. Second, orphan desktop files might be in ~/Desktop, so the user's Desktop experience would be unpredictable. Here the Desktop directory is set to /usr/share/Desktop so that it is part of the image. Users who really want to customize the Desktop can change ~/.config/user-dirs.dirs, though py-rocket-base might not respect that. It is not clear why you would do that instead of just using a different image that doesn’t have the py-rocket-base behavior.

#!/bin/bash
set -e

# Copy in the Desktop files
APPLICATIONS_DIR=/usr/share/applications
DESKTOP_DIR=/usr/share/Desktop
mkdir -p "${DESKTOP_DIR}"
chown :staff /usr/share/Desktop
chmod 775 /usr/share/Desktop
# set the Desktop dir default for XDG
# Double quotes so that ${DESKTOP_DIR} is expanded before the line is written
echo "XDG_DESKTOP_DIR=\"${DESKTOP_DIR}\"" > /etc/xdg/user-dirs.defaults

# The for loops will fail if they return null (no files). Set shell option nullglob
shopt -s nullglob

for desktop_file_path in ${REPO_DIR}/Desktop/*.desktop; do
    cp "${desktop_file_path}" "${APPLICATIONS_DIR}/."
    # Symlink application to desktop and set execute permission so xfce (desktop) doesn't complain
    desktop_file_name="$(basename ${desktop_file_path})"
    # Set execute permissions on the copied .desktop file
    chmod +x "${APPLICATIONS_DIR}/${desktop_file_name}"
    ln -sf "${APPLICATIONS_DIR}/${desktop_file_name}" "${DESKTOP_DIR}/${desktop_file_name}"
done
update-desktop-database "${APPLICATIONS_DIR}"

# Add MIME Type data from XML files  to the MIME database.
MIME_DIR="/usr/share/mime"
MIME_PACKAGES_DIR="${MIME_DIR}/packages"
mkdir -p "${MIME_PACKAGES_DIR}"
for mime_file_path in ${REPO_DIR}/Desktop/*.xml; do
    cp "${mime_file_path}" "${MIME_PACKAGES_DIR}/."
done
update-mime-database "${MIME_DIR}"

# Add icons
ICON_DIR="/usr/share/icons"
ICON_PACKAGES_DIR="${ICON_DIR}/packages"
mkdir -p "${ICON_PACKAGES_DIR}"
for icon_file_path in "${REPO_DIR}"/Desktop/*.png; do
    cp "${icon_file_path}" "${ICON_PACKAGES_DIR}/" || echo "Failed to copy ${icon_file_path}"
done
for icon_file_path in "${REPO_DIR}"/Desktop/*.svg; do
    cp "${icon_file_path}" "${ICON_PACKAGES_DIR}/" || echo "Failed to copy ${icon_file_path}"
done
gtk-update-icon-cache "${ICON_DIR}"
1. This is the default location for system applications.
2. Create the Desktop directory and make sure jovyan can put files there. This is mainly for debugging.
3. Set up the default XDG_DESKTOP_DIR value. This will be copied to ~/.config (by xinitrc).
4. Copy the .desktop files in the Desktop directory into the applications directory and make a symlink for each one in the Desktop directory. The former means that the applications will appear in the menu in the xfce4 desktop, and the latter means there will be a desktop icon.
5. Add any MIME xml files to the mime folder and update the MIME database.
6. Add any png or svg icon files to the icon folder and update the icon database.
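The copy-and-symlink step for .desktop files can be tried in temporary directories. The sketch below stands in for ${REPO_DIR}/Desktop, /usr/share/applications and /usr/share/Desktop with made-up paths under a temporary directory, and the example.desktop file is hypothetical. Instead of the bash-only nullglob option used in desktop.sh, it uses a portable existence check to skip a pattern that matches no files.

```shell
workdir=$(mktemp -d)
repo_desktop="$workdir/repo/Desktop"       # stands in for ${REPO_DIR}/Desktop
applications_dir="$workdir/applications"   # stands in for /usr/share/applications
desktop_dir="$workdir/Desktop"             # stands in for /usr/share/Desktop
mkdir -p "$repo_desktop" "$applications_dir" "$desktop_dir"

# A hypothetical .desktop file
printf '%s\n' '[Desktop Entry]' 'Type=Application' 'Name=Example' 'Exec=true' \
  > "$repo_desktop/example.desktop"

for desktop_file_path in "$repo_desktop"/*.desktop; do
  # Without nullglob an unmatched pattern is passed through literally, so skip it
  [ -e "$desktop_file_path" ] || continue
  desktop_file_name=$(basename "$desktop_file_path")
  # Copy into the applications directory (menu entry) and mark executable
  cp "$desktop_file_path" "$applications_dir/$desktop_file_name"
  chmod +x "$applications_dir/$desktop_file_name"
  # Symlink into the Desktop directory (desktop icon)
  ln -sf "$applications_dir/$desktop_file_name" "$desktop_dir/$desktop_file_name"
done

# No xml files exist here, so this loop body never does any work
xml_count=0
for mime_file_path in "$repo_desktop"/*.xml; do
  [ -e "$mime_file_path" ] || continue
  xml_count=$((xml_count + 1))
done

ls -l "$desktop_dir"
echo "xml files found: $xml_count"
```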