Frequently Asked Questions

From ACCRE Wiki

Cluster Accounts

How do I change my ACCRE account password?

Your ACCRE login is your Vanderbilt University username and password. Verify that you can log in with ssh vunetid@login.accre.vu or through the Visualization Portal. If you are unable to log in to ACCRE with your VU Net ID and password, you can follow the steps on VUIT's support page, How to Change your password. Please submit requests for resetting your VU Net ID password to VUIT; ACCRE does not manage your VU Net ID password.

Can I use my Vanderbilt password to log in to ACCRE?

Yes! Your ACCRE login is your Vanderbilt University username and password.

ssh vunetid@login.accre.vu

Connectivity

I cannot connect to the cluster, am experiencing intermittent connectivity to the cluster, or the system hangs upon log on. What should I do?

If you are normally able to connect and suddenly cannot, let us know by submitting a helpdesk ticket. Please provide as much information about the issue as possible including any useful output to your screen. Besides occasional network problems, there are a number of possible causes for sluggish to zero connectivity. Please read the following to help self-diagnose before submitting a help desk ticket so we are better able to assist you:

  • You may connect to the cluster only via a Secure Shell (SSH) client. For more information go to [1].
  • If you can connect but your login “hangs”, it is possible we are experiencing a problem with the primary cluster filesystem (which hosts /home, /data, and /nobackup). Other symptoms of this include logging on but not being able to see, for example, your home directory. Sometimes the filesystem problem is temporary, lasting only a moment. Larger filesystem problems normally occur when the system is overloaded, which can happen for various reasons. If the problem is found to be caused by a particular user account or set of jobs, we immediately work with the user to resolve it.

Please notify us of any connectivity problems by submitting a helpdesk ticket. Include details such as a “cut and paste” of the information in your login window if you are able.

I cannot connect to login.accre.vu.

How can I make a scheduled downtime work for me?

As a scheduled downtime for the cluster approaches, more time becomes available for shorter jobs. Thus, if you have applications that take a few days or less to run, you will be able to execute more of these jobs as a scheduled downtime approaches, because applications requiring longer periods of time will not be running. It’s an excellent time to take advantage of extra computing cycles that would ordinarily not be available!

I can't SSH into the cluster. I'm getting an error message: "No matching cipher found"

That error means that we have updated our SSH servers to disable weak ciphers, and you are using an older SSH client that does not support strong ciphers. Updating your SSH client to a current version should resolve the issue.

Environment

How do I display graphics from the cluster to my local machine?

The easiest way to run graphical applications is by setting up a virtual desktop within the ACCRE Visualization Portal (after you log in, go to Interactive Apps > ACCRE Desktop).

The X Window Server lets you run applications on ACCRE that have a graphical user interface. While it may come in useful at times, applications running over an X connection are slower than when running locally. For this reason we neither support nor recommend using it (especially from outside campus) unless there are no alternatives.

If you must use the X Window Server, you should first check with your PI before installing the following software, since one or both of these may already be on your system, especially if you’re using a computer in your lab which is already configured to run on the cluster.

  1. Get X server support on your local machine: The graphics environment on the cluster is X11; therefore, you must install and run an X server on your local machine.
  2. Configure SSH tunneling: You must tell SSH on your local machine to allow the display of graphics from software running on the cluster.

Windows users: Xming X Server is one of the best X Window servers available for Windows. You can follow the instructions provided there to install and set up the server.

Mac OS X users: You can get a free X11 server from Apple. Mac OS X should already have SSH installed.

  1. Follow their directions to install and run the X11 server.
  2. Launch the X11 server.
  3. Run an xterm.
  4. When you log on to the cluster from the command line in the xterm, to activate SSH tunneling you can use the -X option, i.e., ssh -X user@login.accre.vu.
  5. Finally, to test that X11 forwarding is installed and working correctly, try typing xeyes on the command prompt and hitting Enter/Return. You should see a window appear with two eyes that follow your cursor.

Linux users: We assume you are already running an X server and have SSH installed.

  1. When you log on to the cluster, ssh -X will activate SSH tunneling, i.e., ssh -X user@login.accre.vu.
  2. To test that X11 forwarding is installed and working correctly, try typing xeyes on the command prompt and hitting Enter/Return. You should see a window appear with two eyes that follow your cursor.

I am running an X server, how do I fix X connection or .Xauthority file errors?

If you are getting error messages similar to these:

/usr/X11R6/bin/xauth: error in locking authority file /home/user/.Xauthority

X11 connection rejected because of wrong authentication. X connection to
local host:11.0 broken (explicit kill or server shutdown)

try removing the .Xauthority file in your home directory, then log out and back in. This file occasionally becomes corrupted. When you log back in and start X, it will recreate your .Xauthority file. Sometimes you have to do this a few times. If you continue to have problems, please submit a helpdesk ticket.

Linux

What command do I need to type in order to run an executable in Linux?

To execute a program in the current working directory, type:

./<file_name>

For files that are not in the current working directory, use the full path: /path/to/your/executable/file
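A quick illustration of both invocation styles (the script name here is hypothetical): create a small script, mark it executable, and run it from the current directory and by full path.

```shell
# create a tiny example script (hypothetical name)
printf '#!/bin/bash\necho hello\n' > hello.sh

# the executable bit is required before ./hello.sh will work
chmod +x hello.sh

./hello.sh          # run from the current working directory
"$PWD/hello.sh"     # equivalent: run via the full path
```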

How can I schedule a command using cron?

Crontab usage is denied to all ACCRE users by default, as required for security compliance according to Rocky 9 CIS Benchmark Level 1 Server, Section 2.4.1.8.

In the unusual case that a user needs to execute cronjobs on an ACCRE gateway node, that user must submit a ticket briefly describing the workflow and explaining the purpose for the security exception. Additionally, the user must state the ACCRE group corresponding to the project for which these tasks will be executed. Cluster users who have been provided an exception may install a crontab on the special gateway server cronibus.vm.accre.vu. This server is accessible via ssh from within the cluster and has no external access, so users must first ssh to login.accre.vu and from there ssh again to cronibus.vm.accre.vu.

Crontab usage is permitted by exception but not supported and users are responsible for understanding how to use and manage their crontab as well as for debugging any issues that may arise. We recommend that users back up their crontab file in a suitable location.

Cron may NOT be used for data analysis, large file transfers, or any memory- or CPU-intensive task. Acceptable uses include parsing small standard output files from submitted jobs, periodically submitting jobs or job arrays, or updating a small sqlite data file based on job results. Tasks requiring more than 30 seconds of CPU-core time or over 500MB of RAM are not suitable and should be performed via submitted jobs. ACCRE reserves the right to determine when cron usage is considered excessive and to remove user access to cron if the user is unable to promptly reduce excess resource usage after being alerted.
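As an illustration of an acceptable use within these limits, a crontab entry might periodically submit a prepared SLURM script (the paths and file names here are hypothetical):

```shell
# edit with: crontab -e
# run at 2:00 AM daily: submit a prepared job script, logging output
0 2 * * * /usr/bin/sbatch /home/vunetid/jobs/nightly.slurm >> /home/vunetid/cron.log 2>&1
```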

ACCRE may periodically email active users to determine if their cron access is still required and may discontinue access if no response is received within a reasonable time frame.


Jobs

What types of nodes are available?


How do I run test jobs?

We allow users to run very short (< 30 minutes) tests that have low memory usage (< 1 GB). Anything more should be submitted to the scheduler. We have a debug SLURM partition/queue available for running quick tests and prototyping.

What are the ACCRE cluster defined attributes I can use in my SBATCH scripts corresponding to the available node properties?

The properties of our compute nodes can be specified with combinations of available attributes (defined by us), e.g., haswell and sandy_bridge. Note that the haswell attribute requests the latest Intel processors, while sandy_bridge requests the previous generation. In your batch script you could specify: #SBATCH --constraint=haswell. This would instruct the scheduler to run the job only on a node with an Intel Xeon Haswell processor. Note that your job may take longer to start when these attributes are included, as you are limiting the pool of resources the scheduler can choose from. For a full list of available features, try running the sinfofeatures command while logged into the cluster.

Can I run on the gateway machines?

When you log on via login.accre.vanderbilt.edu, you are logged onto a gateway machine. From here you submit your jobs which are sent to the compute nodes by the scheduler. However, we do allow you to run very short (< 30 minutes) test jobs that have low memory usage (< 1 GB) on the gateway machines, as long as such jobs do not slow the gateway for other users. Anything longer than this should be submitted to the compute nodes using sbatch (see the tutorial).

What happens if my job uses more resources than requested?

The job scheduler will automatically kill most jobs which exceed the resources requested in the SBATCH script. For example, if you specify a walltime of 4 hours and your job runs over that, the scheduler will kill the job. The reason for this is that running jobs which use more resources than requested may affect the scheduling and running of other jobs. This is because the scheduler relies on SLURM specifications (among other parameters) to determine on which nodes to run jobs. Also read our job scheduler policies for more information on killing jobs which are interfering with other jobs or the system itself. When testing code or running code you are unfamiliar with, you should more diligently monitor the resource consumption to fine tune your SBATCH request. Specifying much more, e.g., walltime or mem, than your job requires may delay its start time if the requested resources are not immediately available. Therefore, you should start somewhat conservatively, then reduce your resource specifications once you’ve determined what you are really using, still always leaving a buffer to ensure the job is covered. Learn more about how to request resources and the SBATCH defaults when you submit a job. Learn how to monitor and check the status of a submitted job.

Why is my eligible job waiting so long in the PENDING state?

There are several things you should check to understand your wait time in the queue. See tips on checking the status of a submitted job.

  • Make sure you have requested an allowed set of resources. Check your SBATCH script against both the available nodes in the cluster and our job scheduler policies. You can also check the resources requested with the command: scontrol show job job_number
  • Check your group’s current usage by typing qSummary -g group_name. Compare that to your group’s bursting limits by running showLimits -g group_name. If your group’s current usage is close to or equal to its bursting limits, this could be causing delays. Details about both of these commands can be found in our SLURM groups documentation.
  • Check overall cluster utilization with the SlurmActive command.
  • Check the queue and current usage on the cluster. It could be the particular resources your jobs need may be heavily utilized, even if the entire cluster is not. You can check the total usage of the cluster with the command squeue. You can also see current and past utilization levels on this website.
  • Your account or group account may be running over its fairshare. This means when the cluster is very busy, other jobs from accounts which are under fairshare may be assigned higher priority and may jump ahead of your job in the eligible queue. Use the showLimits -g group_name command to check your fairshare.

If you still do not understand why your jobs are not starting more quickly, please submit a helpdesk ticket.

What does job status Deferred mean?

In SLURM there is no “deferred” state. However, jobs may ask for resources that cannot be provided, e.g., too much memory. In such cases, run squeue -u <username> and look for your job ID; Slurm will show a short explanation of why the job either cannot run or is not running.

What is the maximum number of jobs I can submit or have running at any one time?

“Active” Limits: Each user/group/account has a limit on the number of processors in use at any time. This number is summed from any combination of single- and multi-processor jobs. Additional limits are placed as necessary on groups running either medium (defined as 4 to 7 days) or long (over 7 days) jobs on a regular basis if their usage is impacting the ability of other groups to use their full fairshare. Individual groups may also request upper limits on their users. New guest users have upper job limits until they have attended the Introduction to the Cluster and Job Scheduler classes. Use the showLimits command to check your group’s limits. Please refer to the job scheduler policies for additional important details of these limits.

What is the maximum allowed “wall clock time” I may specify?

The maximum allowed walltime is 14 days, or in hh:mm:ss = 336:00:00. Your job will not start if you have specified a walltime greater than this. You may reduce the walltime of an already submitted job using scontrol (Slurm job control), e.g., scontrol update JobId=<jobid> TimeLimit=24:00:00. In addition, we ask that, except for a small number of test jobs, jobs run at least 30 minutes; over an hour in length is preferable. Our job scheduler policies explain more on this subject. Also see How to Submit Basic Jobs for other SBATCH specifications and how to deal with very short jobs.

How do I hold/release/delete a job?

A user may place a USER hold upon any job the user owns. To do that, type: scontrol hold <jobId>. To release the held job, type: scontrol release <jobId>. Note that you can only hold and release jobs that are pending (i.e., this will not work for running jobs). Users can also delete a queued/running job using the command: scancel <jobId>. To delete all the jobs owned by the user, type: scancel -u <userid>. To cancel a job by name, type: scancel --name <jobName>.

Where can I find detailed documentation on all SLURM commands?

Please visit SLURM for a complete and detailed list of all SLURM commands.

How can I delete many jobs at once?

If you are using bash, the following script shows how to delete all jobs between 10000 and 10010:

    for jobid in $(seq 10000 10010); do
        scancel ${jobid}
        echo "cancelling ${jobid}"
    done

How much memory is available on each node?

Because the OS and other system processes (e.g., GPFS management) already use a certain amount of memory, not all physical memory is available for running jobs. In general, ACCRE nodes have anywhere from 22GB to 248GB of memory available for jobs to use.

How do I request a node for exclusive usage?

To request a node for your private use, use something like the following:

#SBATCH --ntasks=12
#SBATCH --exclusive

in your job submission script. If the job does not require exclusive access to the node (it just needs 8 cores), you can still use:

#SBATCH --ntasks=8

Note that in this case, the job can be assigned to an 8-core, 12-core, or 16-core node that is shared with other jobs. Note that requesting exclusive access to a compute node may result in longer queue times.

Do ACCRE compute nodes support hyperthreading?

All ACCRE CPUs support 2-way simultaneous multithreading (SMT), such as Intel hyperthreading. If you request 2 tasks/cores in your SLURM job, SLURM will allocate 2 physical cores (or 4 logical cores) to your job. However, the user must decide whether to make use of hyperthreading or not, and instruct his/her program to do so. We leave hyperthreading enabled on all but our GPU nodes. Many multi-processor applications can take advantage of hyperthreading to run in significantly less time. Please see this link for more information on hyperthreading.

If I belong to multiple groups, how can I define the group name under which my job is to run on the cluster?

You can add the following line to your SBATCH script: #SBATCH --account=mygroup. Here, mygroup is the group name that you want the job to run under.

How do I checkpoint my job?

If your job runs more than a few hours, it is a good idea to periodically save output to disk in case of failure. We currently do not provide any checkpointing integration through SLURM, so any checkpointing must be performed directly from a user's application.

How do I use local storage on a node?

In some scenarios it may be advantageous to read or write data to a compute node’s local hard disk, rather than to/from our parallel filesystem (/home, /nobackup, and /data are all stored on the parallel filesystem). One common example is if you will be reading or writing to/from a file frequently. Each compute node has a world-readable/writeable directory at /tmp. If you want to move files to this local storage, we recommend creating a subdirectory at /tmp and then copying data to it before launching a program that will read these data. Note: a program must know where to find these data, so you generally must provide an absolute path to the file from within your program. Please be sure to clean up your data at the end of your job (using the mv or rm commands). Below is an example of how this might be done within a SLURM job:

#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --nodes=1
#SBATCH --mem=4G
#SBATCH --time=4:00:00
#SBATCH --output=myjob.txt

localdir=/tmp/myjob_${SLURM_JOBID}
tmp_cleaner()
{
    rm -rf ${localdir}
    exit 1
}
trap 'tmp_cleaner' TERM

mkdir ${localdir} # create unique directory on compute node
cp mydata.txt ${localdir} # copy data to node
./run_my_prog # run program that reads/writes to/from local disk
rm ${localdir}/mydata.txt # remove data from local disk
mv ${localdir}/output.txt ./ # move results to working directory on GPFS

We've added the tmp_cleaner() function above; otherwise the data would remain in /tmp if the job were cancelled. The function intercepts the SIGTERM signal and deletes the data before the job ends. The cleanup must complete within the 30-second grace period that Slurm gives jobs before forcibly killing all of a job's processes.
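The trap pattern above can also be tried outside of SLURM; here is a minimal generic-bash sketch (not ACCRE-specific) of creating a scratch directory that is removed on exit or termination:

```shell
# create a unique scratch directory
tmpdir=$(mktemp -d)

cleanup() {
    rm -rf "$tmpdir"   # remove scratch data
}
# run cleanup on normal exit and on SIGTERM (e.g. cancellation)
trap cleanup EXIT TERM

touch "$tmpdir/scratch.dat"   # stand-in for real temporary output
echo "working in $tmpdir"
```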

A SLURM command fails with a socket timeout message. What’s the problem?

Occasionally when you attempt to run a SLURM command (e.g. sbatch, salloc, squeue) the command may hang for an extended period of time before failing with the following message:

error: slurm_receive_msg: Socket timed out on send/recv operation
error: Batch job submission failed: Socket timed out on send/recv operation

This error results when the SLURM controller is under a high amount of stress. Avoiding this error requires all cluster users to play nice and follow cluster etiquette policies. Specifically, all cluster users are encouraged to

  1. submit large numbers (>100) of similar jobs as job arrays (see our SLURM documentation page for examples),
  2. avoid submitting large numbers (>100) of short jobs (< 30 minutes), and
  3. avoid frequently calling SLURM commands like squeue and scontrol for automated job submission and monitoring.

Job arrays reduce the load on the scheduler because SLURM only attempts to schedule the entire array once, rather than every element within the array independently. Short jobs produce more “churn” within the job scheduler as it works to allocate, de-allocate, and re-allocate resources at a rapid pace. If you are running a lot of short jobs, please try to bundle multiple jobs together into a single job to put less stress on the scheduler. Similarly, automated monitoring tools can abuse the scheduler by requesting data from SLURM too frequently. Please reach out to us if you need assistance developing alternate methods of monitoring and submitting jobs in an automated manner. The socket timeout error message is generally intermittent, so if you wait a few minutes and try your SLURM command again it may complete immediately.
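A job array submission, as recommended above, might look like the following sketch of a batch script (the program and input file names are hypothetical):

```shell
#!/bin/bash
#SBATCH --array=0-99          # 100 tasks submitted as one array
#SBATCH --time=00:30:00
#SBATCH --mem=1G
#SBATCH --output=array_%A_%a.out

# each array task processes its own input file
./process_input input_${SLURM_ARRAY_TASK_ID}.dat
```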

Disk Space

Determining Disk Space Usage and Quotas

As noted in the cluster disk policies, you have both soft and hard limits on both your home and nobackup directories. To help keep the system running smoothly, you should be in the habit of checking your usage level, especially since hard quota limits are definitive and, due to potential filesystem problems, we may have to either kill jobs or place temporary limits on accounts which exceed their soft limit. Please read our cluster disk policies to understand disk space quotas and the FAQ on how to increase your available diskspace by using nobackup space, requesting a possible temporary quota increase, or purchasing more diskspace. To view your current usage and quota levels, type the command:

accre_storage

Using /nobackup disk space

You have disk and file allocations available for your use on both your home directory (which is backed up) and on nobackup disk space (which is not backed up). To take advantage of your nobackup disk space, simply cd to that filesystem and create your personal directory. For example:

cd /nobackup
mkdir vunetid
chmod 700 vunetid
cd vunetid

where vunetid is your unique VUNetID, which is also your ACCRE user id. If you are unsure what your VUNetID is, simply type whoami while logged into the cluster to find out. Note that the chmod 700 command is needed to set the appropriate permissions on your nobackup directory so that only you can access it. Note that some ACCRE groups also have their own private shared group /nobackup directories.

Will ACCRE restore deleted or lost data?

Yes. Please refer to our policy regarding restoring from backup. Note that files in /nobackup are never backed up.

My network connection to ACCRE is really poor and I have a lot of data that I need to upload to ACCRE (or download from ACCRE). What are my options?

To transfer files between your local machine and ACCRE, we recommend installing and using FileZilla, a simple-to-use client that uses the SFTP protocol to upload and download files between systems. To install FileZilla, simply go to their website and download the client. The following is a beginner’s guide to FileZilla: https://www.ostraining.com/blog/coding/filezilla-beginner/ If you do not want to overwrite files each time you upload a directory to the cluster, go to Edit > Settings > Transfers > File exists action and change the Uploads setting to Overwrite file if source is newer. With this setting, only files that are newer than the copy on the remote system will be uploaded. Linux and Mac clients can also use the built-in rsync command.

How can I mount NFS, Samba, FTP, SSH, HTTP, and other remote mounts locally? (beta)

To configure this, edit your .bashrc file and add:

source /accre/usr/bin/gvfs_startup.sh

Edit your .bash_logout file and add:

source /accre/usr/bin/gvfs_cleanup.sh

Log out and log back in. Now you should be able to mount remote exports, for example:

gio mount smb://server.example.com/share

The remote folders will be mounted under ~/gvfs in your home directory. To unmount, use the same syntax as for mounting, but change gio mount to gio mount -u. All mounted folders are automatically unmounted when you log out.

Using local compute node temporary storage

ACCRE allows using the /tmp directory on a local compute or gateway node for the purpose of storing small to medium sized temporary files that will not be needed after a job completes. Local disk storage should be limited to no more than 10GB of space per CPU-core allocated to the job, in order to ensure sufficient space is available to all users. All data stored in /tmp will be deleted after 30 days and will also generally be considered unavailable once a job completes.

Users are responsible for removing all data from /tmp before job completion. You are welcome to write your jobs to remove their temporary files in the manner of your choice, but we do have a tool in place to make temporary file cleanup easier for users. If you run the command source setup_accre_runtime_dir at the beginning of your Slurm script, a secure temp directory will be created for you, and hooks will be set up to ensure that this space is removed when your job completes, even in the case of job failure. The path to your temporary space will then be set in the environment variable $ACCRE_RUNTIME_DIR which you can pass to your executables to store temporary data locally.

Note that storing temporary files on the local node may drastically improve performance over using the network filesystem in certain cases, especially when a job creates a large number of very small files or repeatedly opens and closes small files.

GPU

How do I request a GPU node?
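As a generic SLURM sketch (the partition name is a placeholder; check ACCRE's GPU documentation for the partition names and GPU types currently in use):

```shell
#!/bin/bash
#SBATCH --partition=<gpu_partition>   # placeholder: see ACCRE GPU docs
#SBATCH --gres=gpu:1                  # request one GPU
#SBATCH --time=01:00:00

nvidia-smi   # confirm the allocated GPU is visible to the job
```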

Software

What research software packages are available on the cluster?

Run module avail to see a comprehensive list of available software packages, which can be added to your environment using the module load command.

How do I make sure that my perl/python script is using the latest version available on the cluster?

First, add the appropriate package to your environment (e.g., at the command line or in your .bashrc/.cshrc file) with the command:

module load PKG1 PKG2 PKG3

Then, use the following line:

#!/usr/bin/env python (or perl)

as the first line of your script. This automatically detects the path to the loaded perl/python package and uses that version as the interpreter of your script.
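A minimal check of this behavior (using python3 as the example interpreter; the file name is hypothetical):

```shell
# write a script whose shebang defers to whatever interpreter is first in PATH
cat > hello.py <<'EOF'
#!/usr/bin/env python3
print("hello from the PATH python")
EOF
chmod +x hello.py

./hello.py   # env locates python3 via PATH, e.g. the module-loaded version
```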

I’d like to have some software installed on the cluster. How do I go about doing that?

As much as possible, ACCRE staff are glad to accommodate your needs for software. Of course, the software must be amenable to execution in the cluster environment and (if not open source) you are responsible for taking care of licensing arrangements prior to installation, as well as continued maintenance of the software license. If you’d like to explore the possibility of adding some software to our cluster environment, please submit a helpdesk ticket. Note that we in general recommend that users install software into their cluster home directories. In this way you have complete control over the version of the software, applying updates, and so on. ACCRE staff are more than happy to assist you during this process.

How do I install an R package from source code?

R users should take a look at our Software Page for details and best practices for using R on the ACCRE cluster. Here is an example that uses the nlme package. Login to the cluster, and, if you have not already done so, in your home directory create a directory for your R packages. Here is an example:

mkdir -p R/rlib 

You will also need a tmp directory in your home directory, so do this in your home directory:

mkdir tmp/ 

You will need to first put the appropriate version of R in your PATH environment variable using LMod:

module load GCC OpenMPI R 

Now change to your tmp directory, and download the source code:

cd tmp/
wget http://cran.r-project.org/src/contrib/nlme_3.1-104.tar.gz 

Generally, it will only take a few seconds to download the “tarball”, but sometimes it can take longer. Now start R:

R 

At the R-prompt (denoted by >) tell R where you will keep your packages:

> .libPaths("~/R/rlib") 

Next tell R to install the package:

> install.packages("nlme_3.1-104.tar.gz", repos = NULL, type="source") 

R will now compile and install nlme into your personal R library, ~/R/rlib. To test your install, quit R:

> quit()

Restart R and at the prompt

> .libPaths("~/R/rlib")
> library("nlme") 

You should see nlme loaded. Remember to add these two lines to any script you feed to R if you intend to use nlme. If you wind up installing many packages, you can put the .libPaths("~/R/rlib") command in your .Rprofile. You may now delete the source package:

rm nlme_3.1-104.tar.gz 

What happens if R says that there are needed dependencies? This sometimes happens, and you will need to download and install those packages before installing the one you wanted. Just follow the steps outlined above until you have downloaded and installed all the packages.

How do I download and install an R package from the internet?

R users should take a look at our R Software Page for details and best practices for using R on the ACCRE cluster. Here is an example that uses the Zelig package. Login to the cluster, and, if you have not already done so, in your home directory create a directory for your R packages. Here is an example:

mkdir -p R/rlib 

You will need to add R to your PATH environment variable using LMod:

module load GCC OpenMPI R 

Now start R:

R 

At the R-prompt (denoted by >) tell R where you will keep your packages:

> .libPaths("~/R/rlib") 

Next tell R to install the package:

> install.packages("Zelig") 

R will now give you a list of repositories to download from. Choosing the Tennessee repository seems good. That is choice 80. R will now download, compile and install Zelig into your personal R library, ~/R/rlib. Note that occasionally you may need to pass additional arguments to install.packages() if a package needs a library in a nonstandard location. For example, the hdf5r package may need a recent version of the HDF5 library that is available through LMod. In this case you might instead run a command like the following (the configure flag and path are illustrative; the exact values depend on the module):

> install.packages("hdf5r", configure.args = "--with-hdf5=/path/to/h5cc")

How do I install and load an R package from Bioconductor?

R users should take a look at our R Software Page for details and best practices for using R on the ACCRE cluster. Here is an example that uses the goseq package. Login to the cluster, and in your home directory create a directory for your R packages. Here is an example:

mkdir -p R/rlib 

You will need to add R to your PATH environment variable using LMod:

module load GCC OpenMPI R 

Now start R:

R

At the R-prompt “>” tell R where you will keep your packages:

> .libPaths("~/R/rlib") 

Next, point R to the Bioconductor site:

> source("http://bioconductor.org/biocLite.R") 

Next, ask R to get the package, compile and install it in your personal R library (~/R/rlib)

> biocLite("goseq") 

goseq and its dependencies will be downloaded, compiled, and installed. If everything succeeds you will see

* DONE (goseq)

After that, you may get a series of warnings about packages needing to be upgraded. You may ignore these warnings. To test your install, quit R:

> quit() 

Restart R and at the prompt

> .libPaths("~/R/rlib")
> library("goseq") 

You should see goseq and the two dependencies loaded. You need to remember to add these two lines to any script you feed to R if you intend to use goseq. If you wind up installing many packages from Bioconductor you can put the .libPaths("~/R/rlib") command in your .Rprofile.

R is taking longer to start than usual when using Rstudio

The RStudio server acts like a manager for your workflow. Any command or code you type is run by an rsession (worker process) that executes that code. The rsession is single-threaded, so it can only do one thing at a time (this is an important detail). If a line of your code is computationally heavy, keeping the rsession busy full time, the UI can become unresponsive.

The reason is the communication pattern between the rserver and the rsession. The rserver needs to ask the session about its state before it can show that state in the UI. However, since the session is busy and can only do one thing at a time, it may fail to respond to the rserver’s request. This is analogous to asking a kid mid-homework whether they are done: the kid is so preoccupied with finishing the homework (single-threaded) that they have no capacity to listen or respond to the question.

A good way to replicate this is to run the following in an RStudio console and then refresh the page; the UI will stay unresponsive for roughly the amount of time it sleeps:

 system("sleep 100") 

Thus, we recommend using RStudio only for development, quick checks, and tests on small data samples. For large datasets and/or long-running jobs, please submit your code as a SLURM job using Rscript. Please read more about submitting SLURM jobs with R.
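A sketch of such a submission script, reusing the module names shown in the R examples on this page (the R script name is hypothetical):

```shell
#!/bin/bash
#SBATCH --time=02:00:00
#SBATCH --mem=8G
#SBATCH --output=myanalysis.out

module load GCC OpenMPI R   # as in the R install examples
Rscript myanalysis.R        # run the analysis non-interactively
```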

How do I install a Perl module without root privilege?

You do not need root permission to install a module; you can install your Perl module locally in your home directory. Make a directory called, say, lib/ in your home directory like this:

# first navigate to your home directory
$ cd ~

# now make a directory called lib
$ mkdir lib 

Now you have a directory called ~/lib, where ~ represents the path to your home directory. All you need to do is add a modifier to your perl Makefile.PL command:

$ perl Makefile.PL PREFIX=~/lib LIB=~/lib 

This tells Make to install the files in the lib directory in your home directory. You then just make/nmake as before. To use the module you just need to add ~/lib to @INC. Next, you modify the top of your own scripts to look like this:

#!/usr/bin/perl -w
use strict;

# add your ~/lib dir to @INC
use lib "/usr/home/your_home_dir/lib/";

# proceed as usual
use Some::Module;

How do I run Matlab/SAS job on the cluster?

Matlab is free to use for all ACCRE users. More information is available here.

In order to run SAS jobs, you must first purchase a license from the ITS software store. Once ITS notifies us of your purchase, you will be added to the relevant group so that you have permission to run the software. A license may not be shared among different users; however, with one license you can run multiple jobs at the same time on the cluster.

When trying to install a package in R, I get ‘Warning: unable to access index for repository ’.

From an R interactive session, try:

 > install.packages('package_name', dependencies=TRUE, repos='http://cran.rstudio.com/') 

When trying to install a package in R, I get ‘Warning message: package ‘somepackage’ is not available (for R version 3.0.0)'.

Try using the latest version of R.