8 R in Jupyter Lab and Python in RStudio
py-rocket has separate R and Python installations because there are a variety of system packages linkages (GDAL and others depending whyat you are doing) that will break if you do not use the right system linkages. The way this is handled is via the the system PATH. This tells functions where to look for files it needs.
As long as you only use R or Python (don’t mix the two) in a notebook, you will be fine in py-rocket. When you activate R, the path will not have conda. When you activate Python, it will have use the conda “notebook” environment and have that on the path.
Try this in R (RStudio or the R kernel in Jupyter Lab):
Sys.getenv("PATH")
Try this in a Jupyter Notebook in Jupyter Lab:
import os
print(os.environ["PATH"])
8.1 Installing R packages
There is a user directory specified by default in the user’s home directory. If this is persistent, then packages installed using
install.packages()
will by default be installed there and will be persistent.
The 2nd and 3rd paths on .libPaths()
are in the /usr
directory and will be recreated each time the Jupyter Hub is restarted and thus any package installed there by the user will disappear.
However, this means that if you are installing R package in a Docker image, they will by default go to the /home/jovyan
user library and that will get wiped out in a Jupyter Hub where the user home is persistent since whatever is in /home
during the Docker build will be replaced by the user home directory. In a Docker build, make sure to use
install.packages(...., lib="${R_HOME}/site-library")
or use the helper script plus a install.R
file in your Docker file:
COPY . /tmp2/
RUN /pyrocket_scripts/install-r-packages.sh /tmp2/install.R
8.2 Using R in Jupyter Lab
In Jupyter Lab, you select a R kernel from the upper right. You can then use R code in the notebook. It will use the R installation in py-rocket with all the preloaded libraries.
8.3 Using Python in R (RStudio or Jupyter Lab with R kernel)
The following behavior is specific to R, not the GUI (RStudio or Jupyter Lab with R kernel) that you are using to interact with it.
8.3.1 py_require()
To use Python, you use the reticulate
library. If you only need a handful of Python packages, it will simplify things if you use py_require()
. Like this
library(reticulate)
py_require("xarray")
This will create an ephemeral environment with the packages you require and does not change the system PATH or put conda/envs/notebook
on the path. Everything should work fine though I have not tested dask.
One gotcha is that reticulate create a cache in ~/.cache/R/reticulate
and it might not be easy to change later to using a conda environment for your Python binary. I often had to do
rm ~/.cache/R/reticulate
in a terminal to get reticulate to allow me to use use_conda("notebook")
in another R session.
8.3.2 Using a conda environment
You can also use the conda environment with reticulate with all the pre-installed packages.
library(reticulate)
use_condaenv("notebook")
However this will prepend conda to the system path and that will persist until you RESTART R. In RStudio, it is not enough to close the script or notebook you are working in; you actually have to restart R. reticulate does not have a deactivate_conda()
function. In Jupyter Lab, your notebooks are isolated from each other and each has its own kernel, so whatever path changes you do in one notebook do not affect other notebooks. This is not the case for RStudio.
"/srv/conda/condabin:/srv/conda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/usr/sbin:/usr/bin:/usr/local/texlive/bin/x86_64-linux:$PATH:/usr/local/texlive/bin/x86_64-linux:/usr/lib/rstudio-server/bin/quarto/bin:/usr/lib/rstudio-server/bin/postback"
If you use use_condaenv()
in an R session and need to restore the normal path (to get R libraries that bind to system packages to work), you can do the following:
orig <- Sys.getenv("RSTUDIO_CLEAN_PATH", unset = NA)
orig # make sure it looks right
Sys.setenv(PATH = orig)
Note, the terminal in RStudio is not the same environment as R. So doing echo $PATH
in the terminal in RStudio will still show the original path without conda.
Why activating conda causes problems for R
When we use a conda environment, the PATH is altered so that the conda environment directory appears first on the PATH. Any R packages that need a particular system package that is also in conda (like GDAL) are likely to throw mis-match errors.
8.4 Dealing with SSL mismatch errors
When you use reticulate in R, use use_condaenv()
and call a function that needs to download data, you are liable to get a OpenSSL mismatch error. py-rocket solves this by adding this to
rsession-ld-library-path=/srv/conda/envs/notebook/lib
to /etc/rstudio/rserver.conf
. This let’s R know where to look for SSL links and hopefully doesn’t break R packages. Make sure that .Renviron
does not set LD_LIBRARY_PATH
or this solution will not work. I don’t know why but it breaks.
8.5 Developers
How is the R kernel created so that it shows up in Jupyter Lab? You don’t need to install R into the conda environment since it already is in the image. We just need to use IRkernel
R package to register the kernel with jupyter.
Rscript - <<-"EOF"
install.packages('IRkernel', lib = .Library) # install in system library
Sys.setenv(PATH = paste("/srv/conda/envs/notebook/bin", Sys.getenv("PATH"), sep = ":"))
IRkernel::installspec(name = "ir", displayname = "R ${R_VERSION}")
EOF