R and R Studio Server versions

R versioning is a difficult issue, especially now that many important single-cell packages are available only for newer R versions, while some older but still popular R packages are not available for those newer versions. This section describes the versioning issues in both the system R and the R Studio Server web application.

The current "default" system version of R on compute servers is R 3.4.4. This is the version that is invoked if you type R from the command line. We have also installed other R versions "side by side" (R 3.5.3 and R 3.6.1), which can be invoked by typing R-3.5.3 or R-3.6.1 on the command line.
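
To check which of these R commands are available on a particular compute server, a small shell sketch like the following may help. The function name which_r is hypothetical, and the sketch assumes the side-by-side versions are invoked as R-3.5.3 and R-3.6.1; adjust the names if your server differs.

```shell
# Report, for each command name given, whether it is found on the PATH.
# which_r is a hypothetical helper, not a tool provided on the servers.
which_r() {
    for cmd in "$@"; do
        if command -v "$cmd" >/dev/null 2>&1; then
            echo "$cmd: available"
        else
            echo "$cmd: not found"
        fi
    done
}

# Check the default R and the side-by-side versions (names assumed).
which_r R R-3.5.3 R-3.6.1
```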

We have also installed many popular add-on packages in the default R 3.4.4 environment (e.g. tidyverse, ggplot2, DESeq2), and most of them in the R 3.5.3 and R 3.6.1 environments as well.

R Studio Server, the web application, is currently configured to use R 3.4.4, the "default" system version of R. This R Studio Server R version setting can only be set to one value system-wide and cannot be specified per-user. If your POD owners agree, we can change the R Studio Server R version to, for example, R 3.6.1. If your POD has more than one compute server (as most PODs do), we can change the R Studio Server R version on just one of the compute servers, leaving the others at the default R 3.4.4 version, so that both can be used as needed. Please Contact Us if this is an option you would like to implement.

Note also that the Enterprise version of R Studio Server can set the R version per-user, but it is a licensed product and is quite expensive. If your teams wish to purchase a license, we are happy to install it.

Another option, one that provides maximum per-user flexibility (especially for single-compute-server PODs), is as follows. Use the R Studio Server web application for R 3.4-compatible workflows. For workflows requiring R 3.5 or 3.6, users can install the R Studio desktop application on their own desktop or laptop computers, with R 3.5 or 3.6 as the underlying R version. Users can then access files on shared storage by mounting their Work area file system via Samba (see Samba remote file system access for more information). The main drawback of this workflow is that typical personal computers do not have as much RAM as POD compute servers, and some R tasks can be memory intensive. In such cases, users can test their code in R Studio on their desktop computer, using smaller data sets if necessary, and then run the "full" workflow from the POD compute server command line using the appropriate R version.

Understanding R add-on packages

The libraries/add-on packages available in any given R version depend on the configured package installation directories, which can be listed in the R environment via the .libPaths() function. Typically, each user has a local package installation directory with packages they have installed. This local directory is searched first, followed by one or more system directories where we have installed add-on packages system-wide.

User local package installation directories are typically under the user's ~/R directory (e.g. /stor/home/<user_name>/R). If a user has installed packages under multiple versions of R, there will be sub-directories for the different versions (e.g. ~/R/x86_64-pc-linux-gnu-library/3.4, ~/R/x86_64-pc-linux-gnu-library/3.6). Users can list the contents of these directories to see what packages they have installed locally.
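
As a rough illustration of listing these directories, the sketch below prints the locally installed packages grouped by R version. The function name list_local_pkgs is hypothetical, and the default path assumes the ~/R/x86_64-pc-linux-gnu-library layout described above.

```shell
# List the packages installed in each per-version local library.
# list_local_pkgs is a hypothetical helper; the default root assumes
# the ~/R/x86_64-pc-linux-gnu-library layout.
list_local_pkgs() {
    root="${1:-$HOME/R/x86_64-pc-linux-gnu-library}"
    for vdir in "$root"/*/; do
        [ -d "$vdir" ] || continue      # skip if no local libraries exist
        echo "R $(basename "$vdir"):"
        ls -1 "$vdir" | sed 's/^/  /'   # indent package names
    done
}

list_local_pkgs
```

If no local library exists, the function prints nothing.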

To see what packages are installed system-wide for a given R version, users can look at the version's package installation directories:

  • R 3.4.4
    • /usr/lib/R/library
    • /usr/lib/R/site-library
  • R 3.5.3
    • /stor/system/opt/R/R-3.5.3/lib/R/library
  • R 3.6.1
    • /stor/system/opt/R/R-3.6.1/lib/R/library
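
To check whether a particular package is installed system-wide, and for which R version, something like the following sketch could be used. The function name find_pkg and the SYSTEM_LIBS variable are hypothetical; the directories are the ones listed above.

```shell
# System-wide library directories listed above, one per line.
SYSTEM_LIBS="/usr/lib/R/library
/usr/lib/R/site-library
/stor/system/opt/R/R-3.5.3/lib/R/library
/stor/system/opt/R/R-3.6.1/lib/R/library"

# Print every library directory (from a newline-separated list) that
# contains the named package. find_pkg is a hypothetical helper.
find_pkg() {
    pkg="$1"
    echo "${2:-$SYSTEM_LIBS}" | while IFS= read -r lib; do
        [ -d "$lib/$pkg" ] && echo "$lib/$pkg"
    done
    return 0
}

find_pkg DESeq2
```

No output means the package was not found in any of the listed directories.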

Local/Global package installation conflicts

Globally-installed R add-on packages may be updated during system maintenance. This can sometimes cause problems when users invoke R tools with many dependencies (e.g. DESeq2), some of which have been updated system-wide, but others of which have been locally installed and are not at a compatible level. The resulting error messages can be rather obscure, but typically show up after system maintenance has been performed. For example:

Error in checkSlotAssignment(object, name, value) : 
  assignment of an object of class “NULL” is not valid for slot ‘NAMES’ in an object of class “DESeqDataSet”; is(value, "characterORNULL") is not TRUE

To determine whether this is due to a Local/Global package conflict, users can make their local installation directory invisible to R, as follows, and see whether the error goes away:

mv ~/R ~/R.bak

If this resolves the issue, the user may later find that they need to re-install other packages that were previously installed locally (check the renamed ~/R.bak/x86_64-pc-linux-gnu-library/3.x directory, where 3.x is the R version being used, to see which packages were installed locally).

If this produces a different error indicating that one or more locally installed packages are missing, the user can re-install them and then see if the problem is resolved.

Finally, if renaming the local R installation directory does not resolve the issue, it may be an issue with the globally installed packages, so Contact Us.
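
Because the local directory is searched first, a locally installed package takes precedence over a system-wide copy of the same package. As a possible first check before renaming anything, the sketch below lists packages that exist in both a local and a system library. The function name shadowed_pkgs is hypothetical, and a match only identifies a candidate: it does not prove that the two copies are incompatible, and conflicts between different packages will not show up here.

```shell
# Print package names that exist in both a local and a system library.
# shadowed_pkgs is a hypothetical helper; pass the two directories to compare.
shadowed_pkgs() {
    local_lib="$1"
    system_lib="$2"
    for pkgdir in "$local_lib"/*/; do
        [ -d "$pkgdir" ] || continue
        pkg=$(basename "$pkgdir")
        [ -d "$system_lib/$pkg" ] && echo "$pkg"
    done
    return 0
}

# Example for the default R 3.4 libraries (paths as described on this page).
shadowed_pkgs "$HOME/R/x86_64-pc-linux-gnu-library/3.4" /usr/lib/R/library
```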

Troubleshooting other R issues

In addition to the Local/Global package conflict issue described above, other issues can arise involving R Studio Server (or less commonly, command-line R).

R Studio Server becomes unresponsive

One common problem is that R Studio Server may become unresponsive, even with repeated attempts to establish a new session. To troubleshoot this sort of issue, close the R Studio Server application and make some R-associated files and directories invisible to R like this:

mv ~/.rstudio ~/.rstudio.bak
mv ~/.RData ~/.RData.bak  

Note that .RData files may be in different directories. For example, if you are working in an R Project you have set up, there may be an .RData file in the project directory.

Large .RData files can be extremely slow to load from both R and R Studio Server. If you must save R data this way, consider renaming the .RData file to a different name so that it can be loaded explicitly only when needed, instead of always when R is invoked.
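
To locate .RData files and see which ones are large, a sketch along these lines could be used. The function name find_rdata is hypothetical; sizes are reported in kilobytes, largest first.

```shell
# List .RData files under a directory with their sizes in KB, largest first.
# find_rdata is a hypothetical helper; it defaults to the Home directory.
find_rdata() {
    find "${1:-$HOME}" -name .RData -type f -exec du -k {} + 2>/dev/null | sort -rn
}
```

Running find_rdata with no argument searches the whole Home directory, which may take a while; passing a project directory narrows the search.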

Disk quota exceeded

Another type of problem can arise when a user's 100 GB Home directory quota has been exceeded. This can produce errors when trying to start R Studio Server or R, perform work in R, or even install additional packages. Users will often (but not always) see an error like the following:

cannot create file '/stor/home/abattenh/output.tsv', reason 'Disk quota exceeded'

If this issue arises, you should Contact Us to help relocate some of your Home directory contents to your Work or Scratch area. Just moving them yourself does not resolve the problem because Home directories have frequent snapshots taken that preserve copies of deleted files, and it requires a systems administrator to remove these snapshots (see Home directories for more information).

This issue can arise because R's default input/output directory is the user's home directory – but large files should not be stored or created there due to the 100 GB quota. Instead, R processing of large files should take place in the user's Work or Scratch area (e.g. /stor/work/<user's group name> or /stor/scratch/<user's group name>; users can find out which group(s) they belong to by typing the groups command on the command line). Users can navigate to Work or Scratch area directories using R's setwd function or using R Studio Server's file browser (e.g. via "Session" menu → "Set Working Directory" → "Choose Directory", or when a new R Project is created). Note that R Studio Server's file browser dialog will default to the user's Home directory, and the full path of the desired Work or Scratch area must be typed in.
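
To see what is taking up space before contacting us, a sketch like the following lists the largest top-level items in a directory. The function name biggest_items is hypothetical; it only reports sizes and does not move or delete anything.

```shell
# Show the largest top-level items in a directory (sizes in KB),
# including hidden files and directories.
# biggest_items is a hypothetical helper; it defaults to the Home
# directory and to the ten largest entries.
biggest_items() {
    dir="${1:-$HOME}"
    count="${2:-10}"
    du -sk "$dir"/* "$dir"/.[!.]* 2>/dev/null | sort -rn | head -n "$count"
}

# Example usage:
#   biggest_items            # ten largest items in the Home directory
#   groups                   # find your group name(s)
```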
