Minimizing job queuing delays on Compute-Canada / SLURM

If you have several thousand similar jobs to submit to a SLURM cluster such as Compute Canada's, one of your goals, aside from designing each job to run as quickly as possible, will be to reduce queuing delays: the time spent waiting for an idle node to accept your job. Scheduling delays can quickly overshadow the runtime of your job if you do not take care when requesting resources; you may wait hours, or even days. One way to minimize waiting, of course, is to schedule fewer jobs overall (e.g. by running more tasks within a given allocation if your allocation's time has not expired). If you can fit everything in a single job, perfect. But what if you could divide the experiment into two halves that run concurrently, getting close to a 200% speedup while also spending less wall-clock time waiting in the job queue?

When a compute task's execution parameters can be moved along several different resource axes (time, number of threads, memory), picking the right parameters can become difficult. Should I throw in more threads to save time? Should I use fewer threads to save RAM? Should I divide the inputs into smaller chunks to stay within a time limit? Wait times are also sometimes multiplied when jobs are resubmitted due to unforeseen errors (or a change of parameters).

It helps to understand how the SLURM scheduler makes its decisions when picking the next job to run on a node. The process is not exactly transparent. The good news is that it's not fully opaque either: there are hints available.

Selecting resources.

Once a job has been submitted via a command such as sbatch or srun, it enters the job queue along with thousands of other jobs submitted by fellow researchers. You can use squeue to see how far back you are in line, but that doesn't tell you much; for instance, it will not always provide estimated start times.

You most likely already know that the amount of resources you request for your job (e.g. number of nodes, number of CPUs, RAM, and wall-clock time limit) influences how soon your job becomes eligible to run. However, varying the requested resources can have surprising effects on the time spent in the scheduling state (aka state PENDING). The kicker:

In some cases, increasing requested job resources can lower overall queuing delays.

Compute nodes in the cluster have been statically partitioned by the admins. Varying the resource constraints of your job changes the set of nodes available to run it, somewhat like a step function. Each compute node on Compute Canada (cedar and graham) is placed in one or more partitions:

  • cpubase_bycore (for jobs wanting a number of cores, but not all cores on a node)
  • cpubase_bynode (for jobs requesting all the cores on a node)
  • cpubase_interac (for interactive jobs e.g., `salloc`)
  • gpubase_… (for jobs requesting gpus)
  • etc.

If you request fewer cores than are available on the hardware, then your job waits for an allocation in the _bycore partitions. If you request all of the available cores, then your job waits for an allocation in the _bynode partitions. So, depending on availability, it might help to ask for more threads and configure your jobs to process more work at once. The SLURM settings vary by cluster: the partitions above are for cedar and graham. Niagara, the new cluster, for instance, will only be doing _bynode allocations.

On Compute Canada, you can view the raw information about all available nodes and their partitions with the sinfo command. You can also view the current node availability by partition using partition-stats.
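For example, a quick way to see which node configurations exist in each partition (the format string below is just one illustrative choice of columns):

$ sinfo --format="%P %D %c %m %l"   # partition, node count, CPUs per node, memory per node, time limit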

The following command outputs a table showing the current number of queued jobs in each partition. If you look at the table closely, you can see that there is only one job in the queue for the regular-node _bynode partition (for jobs needing less than 3 hours). The corresponding _bycore partition, on the other hand, has a lot of jobs waiting patiently. If you tweak your job to make it eligible for the partition with the most availability (often the one with the strictest requirements), then you minimize your queuing time.

$ partition-stats -q 

     Number of Queued Jobs by partition Type (by node:by core) 

Node type | Max walltime 
          | 3 hr  | 12 hr   | 24 hr   | 72 hr  | 168 hr | 672 hr  |
----------|--------------------------------------------------------
Regular   | 1:263 | 429:889 | 138:444 | 91:3030| 14:127 | 122:138 |
Large Mem | 0:0   | 0:0     | 0:0     | 0:6    | 0:125  | 0:0     |
GPU       | 6:63  | 48:87   | 3:3     | 8:22   | 2:30   | 8:1     |
GPU Large | 1:-   | 0:-     | 0:-     | 0:-    | 1:-    | 0:-     |
----------|--------------------------------------------------------

“How do I request a _bynode allocation instead of a _bycore one?”

That is not obvious and quasi-undocumented. The answer is that you do so by asking for all the CPUs available on the node, with sbatch --cpus-per-task N. To find the right number of CPUs N, you have to dig a bit deeper and look at the inventory (this is where the sinfo command comes in handy); the next section covers this. It is also something that may change over time as the cluster gets upgrades and reconfigurations.

Also, if you request more than one node in your job, each with N CPUs (e.g. sbatch --nodes=3 --cpus-per-task=32 ...), then all of them will be allocated _bynode.
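For illustration, a whole-node request could look like the sketch below (the account name and program are placeholders, and the 32-core / 128 GB / 3-hour figures come from the rules of thumb in the next section):

#!/bin/bash
#SBATCH --account=def-someuser      # placeholder allocation account
#SBATCH --nodes=1
#SBATCH --cpus-per-task=32          # all cores on a base node, so the job is scheduled _bynode
#SBATCH --mem=120G                  # stays under the 128 GB of a base node
#SBATCH --time=2:50:00              # fits inside the 3-hour bracket
srun ./my_analysis --threads "$SLURM_CPUS_PER_TASK"   # my_analysis is a hypothetical program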

Rules of Thumb

Here are some quick rules of thumb that work well for the state of the cedar cluster as of April 2018. The info presented in this section was gathered with sinfo, through conversations with Compute Canada support, and through experience over the course of a few weeks. In other words, I haven't attempted to systematically study job wait times over months, but I will claim that these settings have worked best so far for my use cases (backed by recommendations from the support team).

  1. Most compute nodes have 32 CPUs installed.
    If you sbatch with --cpus-per-task=32, you are likely to get your job running faster than if you ask for, say, N=16 CPUs. If your job requires a low number of CPUs, it might be worth exploring options where 32 such jobs run in parallel on one node (see the sketch after this list). It's okay to ask for more than you strictly need, but try to use all the resources you request, since your account will be debited for them.
  2. Most compute nodes have 128 GB of RAM.
    If you keep your job's memory ceiling under that, you're hitting a sweet spot and will skip a lot of the queue. The next brackets up are 48 cores / 192 GB (half as many nodes as the 128 GB variety) and 32 cores / 256 GB (even fewer).
  3. Watch out for the --exclusive=user flag.
    The flag --exclusive=user tells the scheduler that your job may only be colocated with other jobs run by your user. Perhaps counter-intuitively, it does not impose a _bynode restriction. If your job is already _bynode (i.e., you request enough CPUs to take a whole node), the flag is redundant. If you don't ask for all the available cores (meaning your job needs a _bycore allocation), the flag will prevent other users' jobs from running on the remaining CPUs of that node. In that case it will likely hurt your progress (unless you have many such jobs that can fill the node).
  4. The time limit you pick matters. Try to batch your work in <= 3 h chunks.
    A considerably larger number of nodes is eligible to run your job if it can finish in under 3 hours of wall-clock time; many nodes will only execute (or will favor) such short tasks. The next brackets are 12 h, 24 h, 72 h, 168 h (1 wk), and 672 h (28 d). This suggests that there is no benefit to asking for 1 h rather than 3 h, or 18 h rather than 22 h, although only an intimate conversation with the scheduler's code would confirm that.
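Putting rules 1 and 4 together, here is a sketch (the module name, task script, and input files are hypothetical) of packing many small single-core tasks into one full-node, under-3-hour job with GNU parallel:

#!/bin/bash
#SBATCH --account=def-someuser
#SBATCH --cpus-per-task=32
#SBATCH --mem=120G
#SBATCH --time=2:50:00
module load gnu-parallel            # module name may differ on your cluster
# keep all 32 cores busy by running one single-core task per input chunk
parallel -j "$SLURM_CPUS_PER_TASK" bash process_chunk.sh {} ::: chunks/*.txt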

But, I want it now.

It should be mentioned that if only a single "last-minute" job needs to be run, salloc (which takes mostly the same arguments as sbatch) often provides the quickest turnaround: it will get you an interactive shell on a node of the requested size within a few minutes of asking. A separate partition, cpubase_interac, answers those requests. Again, it is worth looking at the available configurations. Keep salloc in your back pocket.
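For example (the account name and resource values are arbitrary; adjust to your needs), an interactive session can be requested with:

$ salloc --account=def-someuser --cpus-per-task=4 --mem=16G --time=1:00:00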

 

Purging GBS index switching


Considering the amount of sequencing coming out of the newer Illumina machines, we've started to combine GBS libraries with other samples. Due to how GBS libraries are made, when they are multiplexed with whole-genome sequencing (WGS) samples there is an appreciable amount of contamination from GBS to WGS; that means you will find GBS reads in your WGS data. I've quantified that in the following figure, which shows the percent of barcoded reads in WGS samples.

The left side is contamination from barcodes sequenced in different lanes (i.e. ones where they couldn't contaminate). The right side is barcodes from GBS samples sharing the same lane (i.e. ones that could contaminate). The take-home message is that between 1% and 15% of reads are GBS contamination. This is far above what we want to accept, so they should be removed.

I've written a script to purge barcoded reads from samples. You give it the set of possible barcodes, both forward and reverse (all current barcodes are listed here: GBS_2enzyme_barcodes). I've been conservative and given it all possible barcodes, but you could also trim the list to only the barcodes present in the lane. It looks for reads that start with a barcode (or any sequence 1 bp away from a barcode, to account for sequencing error) plus the cut site. If it finds a barcoded read, it removes both directions of the read pair. It outputs some stats to STDERR at the end. Note that this was written for 2-enzyme PstI-MspI GBS, although it could be rewritten for other combinations.

An example of how you could apply this:

Make a bash script to run the perl script:

#!/bin/bash
# Usage: bash .decontaminate.sh <sample_prefix>
# Expects <sample_prefix>_R1.fastq.gz and <sample_prefix>_R2.fastq.gz in the current directory.
input=$1
perl ../purge_GBS_contamination.pl /home/gowens/bin/GBS_2enzyme_barcodes.txt ${input}_R1.fastq.gz ${input}_R2.fastq.gz ${input}.tmp
gzip ${input}.tmp_R*.fastq

Run this bash script using GNU parallel:

ls | grep '\.gz$' | grep R1 | sed 's/_R1.fastq.gz//g' | parallel -j 10 bash .decontaminate.sh 2>> decontamination_log.txt


BRC wifi on the 3rd floor

Many people on the 3rd floor of the BRC building have told me that their wifi (ubcsecure) keeps getting disconnected. I had the same problem, so I asked UBC IT services for help.

IT services suggested the following steps:

1) Remove the ubcsecure network from your wifi options.
2) Connect to ubcvisitor and go to autoconnect.it.ubc.ca to run the tool.
3) ubcsecure will be re-created automatically.
4) Reconnect to ubcsecure.

Now it is working for me.
I hope it works for everyone else in the building.


SmartGit GUI Tool

NOTE: This is an old draft post from Thuy (last updated 17 Dec 2012). I’m publishing it because it seems useful and mainly complete. –Brook

What is Git?

Git is a distributed version control system. It allows multiple people to work on the same code simultaneously by keeping track of the changes made to files. It visualizes differences between file versions and merges changes from different authors. It also makes snapshots of file versions, so that you can go back to any version later. Because git is distributed, you store a copy of the code repository and its change history on your own local machine. When you are ready, you can sync your files to a remote repository server, such as BitBucket or GitHub. Syncing to the remote server shares the updated code with all the other users, and they can merge the changes into their own copies if they wish. Whether or not you use a remote repository server, git always stores your entire repository change history on your local machine.

Continue reading

Non-Batch Submissions to the SRA

Since many of you have smaller amounts of data (a handful of samples/runs per experiment), I wanted to provide some info on submitting that data to the SRA without the help of their staff. The SRA has recently revamped their submission process, and it is much easier and simpler than before.

It’s important that everyone submit their own data. The SRA provides a free off-site backup, and we have already replaced corrupted files on our computers with clean copies obtained from the SRA. The time you spend doing this will ensure that you always have a clean copy of your raw data.

Here is a brief overview of the submission process, as well as the info you’ll need.

There is also the NCBI’s “quick start guide”.

Here are the pages for submitting BioProjects and BioSamples:

The process now is fairly straightforward and well documented by the SRA. If anyone has trouble, you can ask others in the lab who have submitted their own data in the last year or so. I believe Dan Bock is one of them.

Here are some of the lab’s existing bioprojects. These can give you an idea of what kind of info to include in the abstract/project description, etc.:
http://www.ncbi.nlm.nih.gov/bioproject/PRJNA64989
http://www.ncbi.nlm.nih.gov/bioproject/PRJNA194568
http://www.ncbi.nlm.nih.gov/bioproject/PRJNA194569
http://www.ncbi.nlm.nih.gov/bioproject/PRJNA194570
http://www.ncbi.nlm.nih.gov/bioproject/PRJNA194446
http://www.ncbi.nlm.nih.gov/bioproject/PRJNA194445

Mounting the Moonrise NFS

Edit: As of February 2015, all our servers are running CentOS 7. The original instructions below are for Debian/Ubuntu linux, but here is a better set of generalized instructions for CentOS: https://www.howtoforge.com/nfs-server-and-client-on-centos-7

If you are mounting on an Ubuntu/Debian system, you can still use the instructions below. If you are mounting on a Red Hat derivative (Fedora, RHEL, CentOS, ScientificLinux, etc.), the link above should work.


 

I just had to re-learn how to do this today, so I thought it would be a good idea to write it up.

If any of you would like to mount the NFS on a computer in the building (Unix only, with a static IP; this means no wireless!), you can do so at your convenience using this guide.

First, install nfs-common with your package manager (Ubuntu: sudo apt-get install nfs-common).

Next, create a folder for the mount to bind to on your computer, and make sure the permissions are set to 777:

user@mycomputer: sudo mkdir -p /nameofyourNFSfolder
user@mycomputer: sudo chmod 777 /nameofyourNFSfolder

I think the whole tree needs the same permissions, so what I’ve done for all our machines (and what seems easiest) is to make a folder in the root directory, so that you don’t have to worry about the permissions in parent folders.

Next, the /etc/hosts.allowed and /etc/exports files on moonrise need to be modified. Chris, Frances, and I all have the necessary permissions to do this. Just ask one of us and we can do it.

root@moonrise: nano /etc/exports

Add the following to the line beginning with /data/raid5part1
137.82.4.XXX(rw,sync) (with XXX standing in for the static IP of your machine)

You could also do this with machines in other buildings/off-campus as long as their IPs are static.

root@moonrise: nano /etc/hosts.allow

Your IP has to be added to the end of each line in this file.

Now reload the /etc/exports file on moonrise. (A full NFS restart is not required, and it would unmount the NFS on other machines! Don't do that unless you know for sure that no one is using the NFS on any of our computers.)

root@moonrise: exportfs -a

Finally, mount the NFS on your machine:

user@mycomputer: sudo mount -v -o nolock -t nfs moonrise.zoology.ubc.ca:/data/raid5part1 /nameofyourNFSfolder

There are various options you can use with the mount command, but the above should work for just about anyone.

If you want it to auto-mount each time you boot your computer, you can add the following lines to your /etc/fstab file:
#moonriseNFS
137.82.4.123:/data/raid5part1 /nameofyourNFSfolder nfs auto 0 0
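You can then check the fstab entry without rebooting; mount -a mounts everything listed in /etc/fstab that is not already mounted:

user@mycomputer: sudo mount -a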

That’s it!

Zoology email forwarding

Hi All,
Here is a link to set your @zoology email to forward to another address (e.g. Gmail):
https://sun.zoology.ubc.ca:442/cgi-bin/admin/forward.cgi

Use Settings (the gear icon in the top right corner) -> Accounts -> make default to get Gmail to send as your @zoology address.

Also, you can use “biodiversity” as an alias for your zoology account. For example:

bob@zoology.ubc.ca is functionally the same as bob@biodiversity.ubc.ca

@botany accounts are completely different.

 

Sequencing Data Organization Update


I've created a skeleton directory structure and included a few example folders so that everyone can get a better idea of how our data will be organized on the new server. These are not set in stone. A few people have commented on the blog, in lab meeting, or to me in person, and I've taken all of your suggestions into account.

If you feel like the setup here isn’t optimal, please give some feedback. The better we do this now, the more smoothly things will run in the future!

Sequencing Data Curation Part 1

With our new data server (Moonrise) up and ready to store our sequences, it is time to start being more careful about where and when we move our most important data, and how we keep track of it. I've devised a system for storing our data for those of you who will be accessing it via the servers. Only Chris, Sariel, Frances, and I will have write access to the directories where data is stored. If you would like your data stored, you will have to fill out a form I've created, which gives us all the information we need to store it in the right place. Here is the form.

This inserts a little bureaucracy into our system, and it's going to be a pain, but in the long run it will make things much easier. We currently have data that was very difficult to find because its owner is no longer in the lab. With the new system, that will not happen.

We will store our WGS, RNASeq, and GBS data in separate folders. This will make finding your data easier in most cases.

Here are the directory structures for the three types of data:

WGS -> Species -> Cultivar -> Library -> Experiment -> file_name_including_library_size_and_depth_coverage.fq METADATA.txt

RNASeq -> Experiment  (if unnecessary, the following directories can be omitted) -> Species -> Cultivar/Population -> Library -> file_name_including_library_size_and_depth_coverage.fq METADATA.txt

GBS is a little more complex, and we will separate things in two ways:
GBS -> Cut Site/Enzyme (data with different cut sites might be incompatible) -> Data type (population genetics or mapping data) -> from here on things diverge:
Pop -> Group* -> files_with_descriptive_names.fq METADATA.txt
Map -> Experiment -> Species -> files_with_descriptive_names.fq METADATA.txt
*Groups are based loosely on clades, and on how much data we have for each species (annuus_wild, annuus_cult, argophyllus, bolanderi_exilus, petiolaris_clade, hybrids, perennials, tuberosus_cult)

Generally, file names should include information not encoded in the directory structure but important enough to be seen when perusing the data. Things like depth of coverage, library size, etc. seem appropriate for all three data types, but for types I'm not as familiar with (GBS), suggestions would be appreciated.
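As an illustration only (the species, cultivar, library, and file names below are made up), a WGS entry following this scheme might look like:

WGS/H_annuus/HA412/Lib01/Exp01/HA412_350bp_insert_30x.fq
WGS/H_annuus/HA412/Lib01/Exp01/METADATA.txt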

C/C++ Dependency Troubleshooting

Unix C/C++ programs are very finicky about compiler and library versions. Compiling is the process of translating human-readable code into a binary executable or library that contains machine-friendly instructions. Linking is the process of telling multiple executables or libraries how to talk to each other.

gcc is the GNU C compiler typically used on Unix systems, and g++ is the GNU C++ compiler. libstdc++ is the GNU standard C++ library, and glibc is the GNU standard C library. When you install GCC on your Unix machine, you are installing a package containing the aforementioned items. The gcc command is smart enough to invoke either the C or the C++ compiler depending on whether you pass it C or C++ code.

If you attempt to run your program against an older standard library than the one it was originally linked with, your program will crash and complain. Here are some tips to get around it.
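A quick way to diagnose this kind of mismatch (the binary name and library path are illustrative; the libstdc++ location varies by distro):

ldd ./myprogram                                                     # lists the shared libraries the binary needs and flags any that are missing
strings /usr/lib/x86_64-linux-gnu/libstdc++.so.6 | grep GLIBCXX     # shows which GLIBCXX versions the installed libstdc++ provides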

Continue reading

C / C++ Troubleshooting

IN PROGRESS

Most of the high-performance bioinformatics programs are written in C or C++. Unfortunately, C and C++ code is some of the hardest code to debug. If you have only programmed casually in Perl/Python, you will not have a good time. Here are some tips to help you out, but you will most likely need someone with C/C++ programming experience and knowledge of the code to get you through it.
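As a starting point (the program and input names are placeholders), rebuilding with debug symbols and running under gdb will at least tell you where a crash happens:

g++ -g -O0 -o myprog myprog.cpp      # compile with debug info and without optimization
gdb --args ./myprog input.txt        # inside gdb, type 'run'; after a crash, 'bt' prints a backtrace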

Continue reading

SmartmonTools & GSmartControl

Smartmontools is a command-line hard drive diagnostic tool that gives you clues about how long your disk has to live. You can run it manually, or you can configure it to periodically test your drives in the background and notify you about test failures via email.
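For a quick manual check (the device name is an example; yours may be /dev/sdb, a RAID member, etc.):

sudo smartctl -t short /dev/sda      # start a short self-test
sudo smartctl -a /dev/sda            # print the SMART attributes and self-test log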

GSmartControl is a GUI for Smartmontools and much easier to use.

Check out this Ubuntu SmartmonTools Tutorial on how to install and set them up.

Here are some tips that are not easily gleaned from the previous websites:

Continue reading

Perl Troubleshooting

This is a collection of fixes for various issues I’ve had with Perl.  Feel free to add any of your Perl tips here.  I will move this to a wiki page if it gets too big.

All Perl scripts fail with error message “Compilation failed in require at…*.pm did not return true at …”

Unable to install packages on Debian with error message “Perl may be unconfigured”

Continue reading

How to Upload Files to Bitbucket (commandline)

BitBucket is an external source code repository server that hosts our shared code. Our repositories use Git as the version control program to keep track of file changes. The idea is that you make changes to the code on your local machine, then share your code with everyone else by uploading it to BitBucket.

The instructions below guide you step-by-step through uploading files to BitBucket using the command line. Git is one of the most popular version control programs, but it is not easy for beginners. If you want to do something that deviates from these steps, consult a git reference. Once you understand the basics of the git workflow, you can use a GUI program, which can combine multiple steps into a single click.
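For orientation, the core of that workflow looks like this (the repository URL and file name are placeholders):

git clone https://bitbucket.org/yourteam/yourrepo.git      # copy the remote repository to your machine
cd yourrepo
git add analysis_script.pl                                 # stage a new or modified file
git commit -m "Add analysis script"                        # record the change in your local history
git push origin master                                     # upload your commits to BitBucket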

Continue reading

Newest Lab Member – Needs Name

Dear Colleagues,

It is my pleasure to introduce you to our newest lab member… actually it doesn’t have a name yet, but that’s the main reason for this post. This cute little guy/girl needs a name. I have been toying with TERMINATOR, HAL, iRob, and Rob(ot), but I’m sure you can do better. Could be male, female, or gender-neutral and whoever comes up with the best name gets a special treat.

Continue reading

CheapEasy DIY Barcodes in R

I couldn't believe how expensive the software for printing barcodes was, so I wrote a short program in R to do it for FREE. Frankly, it should be faster and easier if you already have your labels in an Excel file. You don't really need to understand the program, or even R functions, to use it, as long as you know how to run an R program.

Setup and Overview:

[UPDATED (see notes below)] – R code. Start with this (note: I could not upload a .R file, so this is a .txt file but still an R program).

Input – barcodes128.csv – You need this file to run the program; save it in your working directory (see the comments in the R code for how to set this). AND labels.csv – This is a sample file showing the format for your labels. Even though it's a .csv, it is a single column with each label as a separate row, so there are no actual commas.

Output – BarcodesOut.pdf – A sample output: a PDF file for the 0.5″x1.75″ Worth Poly Label WP0517 (Polyester Label Stock), currently in the lab.

That's really all you need to know; everything that follows is extraneous info. If you have any problems, check out the Detailed Instructions and Troubleshooting Tips, or add a comment below. Continue reading