R tips

These pages provide an introduction to R, emphasizing topics in data analysis that are covered in the course workshops. You are at the start page. Submenu items (above) link to further pages.

This start page outlines help available and introduces the basic use of vectors and other types of data objects in R. See the data submenu item for further information on input, management and analysis of full data sets.


Get R

Download R from the CRAN website.


Get add-on packages

R has a core set of command libraries (base, graphics, stats, etc.), but there is a wealth of add-on packages available.

Packages already included

The following are a few of the add-on packages already included with your standard R installation. 

boot – bootstrap resampling
foreign – read data from files in the format of other stats programs
lattice – multi-panel graphics
MASS – software and data associated with the book by Venables and Ripley, "Modern Applied Statistics with S-PLUS"
mgcv – generalized additive models

To use one of them you need to load it,

  library(packagename)


You'll have to do this again every time you run R.

To see all the libraries available on your computer enter

  library()


Example packages available for download

Most R packages are not included with the standard installation, and you need to download and install them before you can use them. Here are a few add-on packages that might be useful in ecology and evolution. The full list of available packages is here.

ape – phylogenetic comparative methods
BiodiversityR – statistical analysis of biodiversity patterns
leaps – all subsets regression
meta – meta-analysis
mra – analysis of mark-recapture data
multcomp – multiple comparisons for linear models
nlme – linear mixed-effects models, generalized least squares
popbio – analyzing matrix population models
pwr – power analysis
Rcmdr – graphical user interface (menus, buttons) for basic stats in R
qtl – QTL analysis
shapes – geometric morphometrics
vegan – ordination methods for community ecology

To install one of these packages use the menu bar in R. Select "Install packages" under the "Packages" menu item. You'll have to select a download site (Canada BC). Then select your package from the list provided.

Or, execute the following command instead of using the menu,

  install.packages("packagename",dependencies=TRUE)


To use a package once it is installed, load it by entering

  library(packagename)
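
The install and load steps can be combined so that a package is downloaded only when it is not already on your computer. The following is a minimal sketch of one way to do this, not the only way. The package name "splines" is a stand-in chosen only because it ships with R; replace it with the package you need.

```r
pkg <- "splines"   # stand-in package name (an assumption for illustration)

# Install only if the package is not already on this computer
if (!requireNamespace(pkg, quietly = TRUE)) {
  install.packages(pkg, dependencies = TRUE)
}

# character.only = TRUE tells library() to use the name stored in pkg
library(pkg, character.only = TRUE)
```

Putting these lines at the top of a script file makes the script easier to rerun on a different computer.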


R is under constant revision, and it is a good idea to install the latest version periodically. Once you have done so, you should also download and install the latest versions of all your add-on packages.


Get help

Built-in help

Use "?" in the R command window to get documentation of a specific command. For example, to get help on the "mean" function to calculate a sample mean, enter

  ?mean


You can also search the help documentation on a more general topic using "??" or "help.search". For example, use the following commands to find out what's available on anova and linear models. 

  ??anova

  ??"linear models"  # same as help.search("linear models")


A window will pop up that lists commands available and the packages that include them. To use a command indicated you might have to load the corresponding library. (See "Get add-on packages" above for help on how to load libraries.) Note that the "??" command will only search documentation in the R packages installed on your computer.

Interpreting a help page

As an example, here's how to interpret the help page for the sample mean, obtained by

  ?mean


In the pop-up help window, look under the title "Usage" and you will see something like this:

  mean(x, trim = 0, na.rm = FALSE, ...)


The items between the brackets "()" are called arguments.

Any argument without an "=" sign is required -- you must provide it for the command to work.  Any argument with an "=" sign represents an option, with the default value indicated. (Ignore the "..." for now.)

In this example, the argument "x" represents the data object you supply to the function. Look under "Arguments" on the help page to see what kind of object R needs. In the case of the mean almost any data object will do, but you will usually apply the function to a vector (representing a single variable).

If you are happy with the default settings, then you can use the command in its simplest form. If you want the mean of the elements in the variable "myvariable", enter

  mean(myvariable)


If the default values for the options don't meet your needs you can alter the values. The following example changes the "na.rm" option to TRUE. This instructs R to remove missing values from the data object before calculating the mean. (If you fail to do this and have missing values, R will return "NA".)

  mean(myvariable, na.rm=TRUE)


The following example changes the "trim" option to calculate a trimmed mean,

  mean(myvariable, trim=0.1)
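
To see the effect of these options, it may help to try them on a small vector. The values below are made up for illustration, and a trim of 0.25 is used instead of 0.1 so that the trimming is visible with only five non-missing values.

```r
myvariable <- c(2, 4, 6, 8, 100, NA)   # made-up data with one missing value

m1 <- mean(myvariable)                              # NA, because of the missing value
m2 <- mean(myvariable, na.rm = TRUE)                # 24
m3 <- mean(myvariable, trim = 0.25, na.rm = TRUE)   # 6: smallest and largest values dropped

print(c(m1, m2, m3))
```

With trim=0.25 and five values, R drops one value from each end (2 and 100) and averages the remaining three.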


Online help

Several excellent R books are available free to UBC students through the UBC library. See my links here.

Tom Short's R reference card
Venables and Smith's Introduction to R  (pdf file -- right-click and save to disk)
Kuhnert and Venables' An Introduction to R: Software for Statistical Modelling &
    Computing (large pdf file: right-click and save to disk)

Someone has solved your problem already

If you want to accomplish something in R and can't quite figure out how, and your books aren't helping, chances are that someone has already solved the problem and the answer is sitting on a web page somewhere on the internet. Google or the R project Search Engine might find it for you.



Keep a script file

Use a text file to write and edit your R commands. This keeps a record of your analyses for later use, and makes it easier to rerun and modify analyses as data collection continues. Add comments to the text file to help you remember how and why you did that particular analysis -- essential when reviewing it weeks (years?) later. R treats text lines beginning with a # symbol as comments.

R has a built-in editor that makes it easy to submit commands to the command line. To start a new text file, go to File on the menu and select "New Document" (Mac) or "New script" (PC). Save to a file with the ".R" extension. To open a preexisting file, choose "Open Document" or "Open script" from the File menu. Commands typed to this file can be passed to the command line by selecting and then pressing the keys <command><return> (Mac) or <control>R (PC).

(If R is not running and you double click a ".R" file later, R will start up but might not load the workspace properly. If this happens, enter load(".RData") in the command window.)


Start with vectors

A vector is a simple array of numbers or characters, such as the measurements of a single variable on a sample of individuals. It is the best way to store numbers and character strings (words). One of the great things about R is that mathematical operations and functions can be applied at once to all the values.

Enter measurements

Use the left arrow  "<-"  ("less than" sign followed by a dash) and the "c" function (for concatenate) to create a vector containing a set of measurements. 

  x <- c(11,42,-3,14,5)              # store 5 values in vector x

  x <- c(1:10)                       # store integers 1 to 10

  x <- c("Watson","Crick","Wilkins") # quotes for character data


Use the "seq" function to generate and store a sequence of numbers to a vector,

  x <- seq(0,10,by=0.1)    # 0, 0.1, 0.2, ... 9.9, 10

(note: seq results that include decimals may not be exact -- the result "0.2" may not be exactly equal to the number 0.2 unless rounded using the "round" command)
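
The following sketch illustrates the note above using the 4th element of the sequence, which prints as 0.3. The "all.equal" function, which compares numbers within a small tolerance, is one common way to handle this; it is offered here as a suggestion.

```r
x <- seq(0, 10, by = 0.1)

x[4]                           # prints as 0.3
x[4] == 0.3                    # typically FALSE because of rounding error
isTRUE(all.equal(x[4], 0.3))   # TRUE: equal within a small tolerance
round(x[4], 1) == 0.3          # TRUE after rounding to 1 decimal place
```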

Use "rep" to repeat values a specified number of times and store to a vector,

  x <- rep(c(1,2,3),c(2,1,4))        # 1 1 2 3 3 3 3
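
Two other forms of "rep" are often handy. This short sketch uses the "times" and "each" arguments, which repeat the whole vector or each element in turn.

```r
a <- rep(c(1,2,3), times = 2)      # 1 2 3 1 2 3
b <- rep(c(1,2,3), each = 2)       # 1 1 2 2 3 3
d <- rep(c("a","b"), c(3,1))       # "a" "a" "a" "b"
```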


To view the contents of any object, including a vector, type its name and press enter, or use the "print" command,

  x

  print(x)


Paste to a vector

You can also paste measurements into a vector from the clipboard. To demonstrate, copy the following 10 numbers to your clipboard: 76  75 -52 -70  52   8 -50  -6  57   5
(i.e., select the numbers with your mouse and then choose Edit -> Copy on your browser menu to copy to clipboard). Then execute the following command in your R command window:

  z <- scan("clipboard", what=numeric())             # on a PC

  z <- scan(pipe("pbpaste"), what=numeric())         # on a Mac


To paste characters instead of numbers, use the following,

  z <- scan("clipboard", what=character())           # PC

  z <- scan(pipe("pbpaste"), what=character())       # Mac


If characters or numbers of interest are separated by commas, use

  z <- scan("clipboard", what=character(), sep=",")      # PC

  z <- scan(pipe("pbpaste"), what=character(), sep=",")  # Mac


Access individual values

Use integers in square brackets to indicate specific elements of a vector. For example,  

  x[5]           # 5th value of the vector x

  x[2:6]         # 2nd through 6th elements

  x[2:length(x)] # everything but the first element

  x[-1]          # everything but the first element

  x[5] <- 4.2    # change 5th value to 4.2
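
The same square-bracket notation accepts a vector of positions, so several elements can be picked out or changed at once. Here is a small sketch using the first example vector from above.

```r
x <- c(11,42,-3,14,5)

x[c(1,3)]          # 11 -3     (1st and 3rd elements)
x[-c(1,2)]         # -3 14 5   (everything but the first two)
x[length(x)]       # 5         (last element)

x[c(1,3)] <- 0     # change 1st and 3rd values to 0
x                  # 0 42 0 14 5
```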


Carry out mathematical operations

The following are operations involving one vector

  x+1            # add 1 to each element of x

  x^2            # square each element of x

  x/2            # divide each element of x by 2

  10*x           # multiply each element of x by 10


Operations involving two vectors are easiest to handle when both are the same length (have the same number of elements). For example, if x and y are two numeric vectors of the same length n, then

  x*y

yields a new vector whose elements are  

  x[1]*y[1], x[2]*y[2], ... x[n]*y[n]


(If x and y are not the same length, then the shorter vector is elongated by starting again at the beginning.)
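
A small sketch of this behaviour, using made-up vectors of length 6 and 2:

```r
x <- c(1,2,3,4,5,6)
y <- c(10,100)

xy <- x*y          # y is reused as 10 100 10 100 10 100
xy                 # 10 200 30 400 50 600
```

If the length of the longer vector is not a multiple of the length of the shorter one, R still returns a result but prints a warning. It is worth checking vector lengths when you see it.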

Functions

A list of common vector functions is shown in a later section. Here I briefly explain what they do.

Some functions evaluate all the elements of a vector and return one number

  mean(x)   # arithmetic mean of numbers stored in x

  length(x) # number of values in a vector (includes missing)

  min(x)    # smallest value in the vector

  max(x)    # biggest value in the vector


Some functions return more than one evaluation. For example,

  range(x)  # returns min(x) and max(x) in a vector of length 2


Other functions evaluate each element separately and return a vector as long as the original

  log(x)    # natural log of each element


More complicated functions may bundle several different results into a list object. I introduce the list in a later section.
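
As one illustration of a function that returns a list, here is a sketch using "boxplot.stats", which calculates the quantities behind a box plot. The choice of function is mine, made only to show what a list result looks like.

```r
x <- c(2,-1,3,99,8)

res <- boxplot.stats(x)   # returns a list of results
names(res)                # "stats" "n" "conf" "out"
res$n                     # 5, the number of non-missing values
res$out                   # 99, flagged as an extreme value
```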

TRUE and FALSE

Vectors can be assigned logical measurements too, either directly or as the result of a logical operation. Here's an example of direct assignment.

  z <- c(TRUE, TRUE, FALSE)  # enter 3 logical values to vector z


Logical operations can identify and select vector elements meeting specified criteria. The logical operations are symbolized == (equal to), != (not equal to), < (less than), <= (less than or equal to), and so on. For example, if the vector z contains the following numbers,

  z <- c(2,-1,3,99,8)

then the following operations yield the results shown on the right

  z<=3                   # TRUE TRUE TRUE FALSE FALSE

  !(z<3)                 # FALSE FALSE TRUE TRUE TRUE

  z[z!=3]                # 2 -1 99  8

  which(z>=4)            # 4 5

  is.vector(z)           # TRUE

  is.character(z)        # FALSE

  is.numeric(z)          # TRUE

  is.na(z)               # FALSE FALSE FALSE FALSE FALSE

  any(z<0)               # TRUE
  all(z>0)               # FALSE


The logical operators "&" and "|" refer to AND and OR. For example, if

  z <- c(-10, -5, -1, 0, 3, 92)

then the following operations yield the results shown on the right

  z < 0 & abs(z) > 5     # TRUE FALSE FALSE FALSE FALSE FALSE

  z[z < 0 | abs(z) > 5]  # -10  -5  -1  92
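
Logical results can also be counted. When used in arithmetic, R treats TRUE as 1 and FALSE as 0, so "sum" gives the number of elements meeting a criterion and "mean" gives the proportion. A short sketch with the same vector:

```r
z <- c(-10, -5, -1, 0, 3, 92)

n.neg <- sum(z < 0)          # 3    (number of negative values)
p.neg <- mean(z < 0)         # 0.5  (proportion of negative values)
small <- z[z >= 0 & z < 10]  # 0 3
```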



Useful vector functions

Here is a selection of useful functions for data vectors. Many of the functions will also work on other data objects such as data frames, possibly with different effects.

Display data

See the display submenu tab for more information on graphing and tabulating

  hist(x)      # for numerical data

  boxplot(x)   # for numerical data

  table(x)     # for categorical data


Transform numerical data

The most common data transformations, illustrated using the single variable "x".

  sqrt(x)          # square root

  sqrt(x+0.5)      # modified square root transformation

  log(x)           # the natural log of x

  log10(x)         # log base 10 of x

  exp(x)           # exponential ("antilog") of x

  abs(x)           # absolute value of x

  asin(sqrt(x))    # arcsine square root (used for proportions)


Statistics

Here are a few basic statistical functions on a numeric vector named x. Most of them will require the "na.rm=TRUE" option if the vector includes one or more missing values.

  sum(x)                 # the sum of values in x

  length(x)              # number of elements (including missing)

  mean(x)                # sample mean

  var(x)                 # sample variance

  sd(x)                  # sample standard deviation

  min(x)                 # smallest element in x

  max(x)                 # largest element in x

  range(x)               # smallest and largest elements in x

  median(x)              # median of elements in x

  quantile(x)            # quantiles of x
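
Here is a brief sketch of the "na.rm=TRUE" option with some of these functions, using a made-up vector that includes one missing value. The "probs" argument of "quantile" selects which quantiles to calculate.

```r
x <- c(2,-1,3,99,8,NA)

s0 <- sum(x)                     # NA
s1 <- sum(x, na.rm = TRUE)       # 111
md <- median(x, na.rm = TRUE)    # 3

# first and third quartiles, which are 2 and 8 here
q <- quantile(x, probs = c(0.25,0.75), na.rm = TRUE)
q
```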


What am I?

These functions return TRUE or FALSE depending on the structure of x and its data type.

  is.vector(x)

  is.character(x)

  is.numeric(x)

  is.integer(x)

  is.factor(x)


Functions for character data

  casefold(x)            # convert to lower case

  casefold(x,upper=TRUE) # convert to upper case

  substr(x,2,4)          # extract 2nd to 4th characters

                         #   of each element of x

  paste(x,"ly",sep="")   # paste "ly" to end of each element

  nchar(x)               # no. of characters in each element of x

  grep("a",x)            # which elements contain letter "a" ?

  strsplit(x,"a")        # split x into pieces at the letter "a"
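
To make these concrete, here is what the functions return for the character vector used earlier on this page.

```r
x <- c("Watson","Crick","Wilkins")

casefold(x, upper = TRUE)   # "WATSON" "CRICK" "WILKINS"
substr(x, 2, 4)             # "ats" "ric" "ilk"
paste(x, "ly", sep = "")    # "Watsonly" "Crickly" "Wilkinsly"
nchar(x)                    # 6 5 7
grep("a", x)                # 1  (only "Watson" contains "a")
pieces <- strsplit(x, "a")  # a list: "W" "tson", then "Crick", then "Wilkins"
```

Note that "strsplit" returns a list with one component for each element of x, because the number of pieces can differ among elements.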


Other functions

  rm(x)                  # delete x from the R environment
  unique(x)              # unique values of x

  levels(x)              # treatment levels of x, if a factor

  sort(x)                # sort smallest to largest



Make a data frame

An R data frame is what you would usually think of as a data set, with columns representing variables and rows representing sampling units (e.g., subjects or plots). The data page (see submenu above) will say more about reading, managing and analyzing data frames. Here I show how to make them from vectors and to access their contents.

Combine vectors into a data frame

Make a data frame by combining vectors of the same length using the "data.frame" command. The vectors need not be of the same type -- you can keep numeric, character, and logical vectors in the same data frame. 

  quadrat <- c(1:7)

  site <- c(1,1,2,3,3,4,5)

  species <- c("a","b","b","a","c","b","a")

  mydata <- data.frame(quadrat,site,species,

                     stringsAsFactors=FALSE) # make a data frame

(The "stringsAsFactors=FALSE" is optional but recommended to preserve any character data; it is already the default from R version 4.0.0 onward -- see further explanation on the data page).

To see the data frame, enter its name in the command window

  mydata                                     # show mydata

  quadrat site species                       # output
1       1    1       a
2       2    1       b
3       3    2       b
4       4    3       a
5       5    3       c
6       6    4       b
7       7    5       a

Access variables in data frame

The columns of the data frame are the vectors (representing variables). Access them by name using the "$" symbol. 

  mydata$site        # the site vector
  mydata$quadrat     # the quadrat vector


Or, access variables using square brackets that include a comma. Integers before the comma refer to rows, integers after the comma indicate columns: [rows, columns].  

  mydata[ ,1]        # column 1, the quadrat vector

  mydata[ ,3]        # column 3, the species vector


Note that a single row of a data frame is not a vector. Rather, a single row of a data frame is still a data frame, so won't behave like a vector if a function is applied to it.

  mydata[2, ]        # row 2, still a data frame, not a vector


You can convert a single row of a data frame to a vector using "unlist". Be warned that this will convert all entries to the same data type (e.g., all to characters if at least one of the original variables is a character vector),

  unlist(mydata[2, ])  # row 2, converted to a vector
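
The following sketch checks this behaviour on the example data frame, which is recreated here so the commands can be run on their own.

```r
mydata <- data.frame(quadrat = 1:7,
                     site = c(1,1,2,3,3,4,5),
                     species = c("a","b","b","a","c","b","a"),
                     stringsAsFactors = FALSE)

row2 <- mydata[2, ]
is.data.frame(row2)    # TRUE: a single row is still a data frame

v <- unlist(row2)
is.character(v)        # TRUE: the numbers were converted to characters
v["site"]              # "1"
```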


Access individual values or subsets of a data frame

Use integers in square brackets to access subsets of the data frame. Within the bracket, integers before the comma refer to rows, whereas integers after the comma indicate columns:   mydata[rows, columns].

For example, all three of the following commands extract the species measurement from quadrat 2 of "mydata" (the measurement is "b"). This measurement is stored in the second row of the third column of the data frame.

  mydata[2,3]        # 2nd row, 3rd column contents of data frame

  mydata$species[2]  # 2nd element of species vector

  mydata[, 3][2]     # 2nd element of 3rd column vector


Use row and column indicators inside square brackets to access subsets of the data frame

  mydata[ ,c(2,3)]   # data frame containing columns 2 and 3 only

  mydata[ ,-1]       # data frame leaving out first column

  mydata[1:3,1:2]    # extract first 3 rows and first 2 columns
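
Rows can also be selected with a logical operation in place of integers, and columns can be selected by name. This goes a little beyond the integer indexing described above, so treat it as a sketch; the data frame is recreated here so the commands can be run on their own.

```r
mydata <- data.frame(quadrat = 1:7,
                     site = c(1,1,2,3,3,4,5),
                     species = c("a","b","b","a","c","b","a"),
                     stringsAsFactors = FALSE)

# rows in which species is "b" (quadrats 2, 3 and 6), all columns
b.only <- mydata[mydata$species == "b", ]

# rows from site 3 or higher, keeping two named columns
late <- mydata[mydata$site >= 3, c("quadrat","species")]
```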


Useful data frame functions and operations

  str(mydata)                  # summary of variables included
  is.data.frame(mydata)        # TRUE or FALSE

  ncol(mydata)                 # number of columns in data frame

  nrow(mydata)                 # number of rows

  names(mydata)                # variable names

  names(mydata)[1] <- "quad"   # change 1st variable name to quad

  rownames(mydata)             # optional row names


Some vector functions can be applied to data frames too, but with different outcomes:

  length(mydata)        # number of variables in data frame

  var(mydata[ ,1:2])    # covariance matrix of the numeric variables



Make a matrix

A matrix is a bit like a data frame in that it too has rows and columns of measurements, but it is less flexible and is not as easy to work with. For example, all columns of a matrix must be of the same data type (i.e., all numerical, or all character data). However, some functions in R require a matrix argument not a data frame. Also, some functions in R return a matrix as output. Below is just a bare introduction.
 

Convert a vector to a matrix

Use "matrix" to reshape a vector into a matrix. For example, if

  x <- c(1,2,3,4,5,6)

then

  xmat <- matrix(x,nrow=2)

yields the matrix

     [,1] [,2] [,3]

[1,]    1    3    5

[2,]    2    4    6


and

  xmat <- matrix(x,nrow=2, byrow=TRUE)

yields the matrix

     [,1] [,2] [,3]

[1,]    1    2    3

[2,]    4    5    6


Make a matrix by binding vectors

Use "cbind" to bind columns of equal length to form a matrix. For example, if

  x <- c(1,2,3)

  y <- c(4,5,6)

then

  xmat <- cbind(x,y)

yields the matrix

     x y

[1,] 1 4

[2,] 2 5

[3,] 3 6
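
The companion function "rbind" binds vectors as rows instead of columns. A short sketch with the same two vectors:

```r
x <- c(1,2,3)
y <- c(4,5,6)

xmat <- rbind(x,y)
xmat
#   [,1] [,2] [,3]
# x    1    2    3
# y    4    5    6

dim(xmat)   # 2 3
```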


Convert a matrix to a data.frame

  mydata <- as.data.frame(xmat, stringsAsFactors = FALSE)

(The "stringsAsFactors=FALSE" is optional but recommended to preserve character data. I explain further on the data page.)

Convert a data frame to a matrix

You will rarely want to do this. It will convert all variables in the data frame to the same data type (e.g., all to characters if there is at least one character variable).

  xmat <- as.matrix(mydata)
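
The sketch below shows the conversion of data types on a small made-up data frame. If you need a numeric matrix, one option is to convert only the numeric columns; the "drop=FALSE" argument keeps a single column as a data frame instead of a vector.

```r
mydata <- data.frame(quadrat = c(1,2,3),
                     species = c("a","b","b"),
                     stringsAsFactors = FALSE)

xmat <- as.matrix(mydata)
is.character(xmat)       # TRUE: every entry is now a character string
xmat[1, "quadrat"]       # "1", stored as text

nummat <- as.matrix(mydata[ , "quadrat", drop = FALSE])
is.numeric(nummat)       # TRUE
```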


Access subsets of a matrix

Use integers in square brackets to access subsets of a matrix. Within the bracket, integers before the comma refer to rows, whereas integers after the comma indicate columns: [rows, columns].

  xmat[2,3]       # value in the 2nd row, 3rd column of matrix

  xmat[, 2]       # 2nd column of matrix (result is a vector)

  xmat[2, ]       # 2nd row of matrix (result is a vector)

  xmat[ ,c(2,3)]  # matrix with columns 2 and 3 only

  xmat[ ,-1]      # matrix leaving out first column

  xmat[1:3,1:2]   # submatrix of first 3 rows and first 2 columns


Useful matrix functions

  dim(xmat)     # dimensions (rows & columns) of a matrix

  ncol(xmat)    # number of columns in matrix

  nrow(xmat)    # number of rows

  t(xmat)       # transpose a matrix



Make a list

A list is a collection of R objects bundled together. The individual objects can be vectors, matrices, data frames, and even other lists. The different objects needn't have the same number of rows or columns. Many functions return results as a list, and so it is useful to know how to work with them.

Create list

To create a list containing two vectors, use the list command. For example, if

  x <- c(1,2,3,4,5)

  y <- c("a","b","c","d","e")

then one of the following commands creates a list containing the two vectors x and y

  mylist <- list(x,y)              # components of list unnamed

  mylist <- list(name1=x,name2=y)  # names the list components


Entering "mylist" in the R command window shows the contents of the list, which is

[[1]]

[1] 1 2 3 4 5


[[2]]

[1] "a" "b" "c" "d" "e"

if the components were left unnamed, or

$name1

[1] 1 2 3 4 5


$name2

[1] "a" "b" "c" "d" "e"

if you named the list components.

Add an object to a preexisting list

Use the "$" symbol to name a new object in the list

  mylist$newvar <- z


Access list components

To grab one of the components of a list, use "$" if the components are named. 

  mylist$name2   # the 2nd list component (named), here a vector


Or, use an integer in double square brackets, [[i]], to indicate the ith component of the list (works whether or not the components are named),

  mylist[[2]]     # the 2nd list component, here a vector

  mylist[[1]][4]  # the 4th element of the 1st list component


Useful list functions

  names(mylist)              # NULL if components are unnamed

  unlist(mylist)             # collapse list to a single vector
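
One detail that often causes confusion is the difference between double and single square brackets. The sketch below, using the named list from above, shows that double brackets return the component itself whereas single brackets return a shorter list.

```r
x <- c(1,2,3,4,5)
y <- c("a","b","c","d","e")
mylist <- list(name1=x, name2=y)

comp <- mylist[["name2"]]   # same as mylist$name2, a character vector
sub  <- mylist["name2"]     # still a list, containing one component

is.list(comp)               # FALSE
is.list(sub)                # TRUE
mylist$name1[4]             # 4
```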



Cope with missing values

Missing values in R are indicated with NA.

  x[5]<- NA         # change the 5th element of x to missing

  x[x == -99] <- NA # change all instances of -99 in x to missing

  which(is.na(x))   # identify which element(s) are missing


Some functions will treat NA as valid entries. For example, the length of a vector (number of elements) includes missing values in the count.

  length(x)


In this case, if you want only non-missing values included,

  x <- na.omit(x)     # drop the missing values in x

  x <- x[!is.na(x)]   # select the non-missing values in x


Some functions won't work on variables with missing values unless default options are modified. For example, if you try to calculate the mean of numbers in a vector that contains missing values you will get NA as your result.

  x <- c(1,2,3,4,5,NA)  # a vector with one missing value

  mean(x)               # result is NA


To cope, specify that missing values first be removed

  mean(x, na.rm = TRUE)
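
Before analyzing a variable it is often useful to count how many values are missing. This is a small sketch using the same vector; it relies on TRUE being counted as 1 by "sum".

```r
x <- c(1,2,3,4,5,NA)

n.missing <- sum(is.na(x))    # 1  (number of missing values)
n.total   <- length(x)        # 6  (includes the missing value)
n.valid   <- sum(!is.na(x))   # 5  (number of non-missing values)
m <- mean(x, na.rm = TRUE)    # 3
```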



Write your own function

If R is missing a needed function, write your own. Here's an example of a function named "sep" that calculates the standard error of an estimate of a proportion. You would use it if you took a random sample of size "n" from a population and counted the number, "X", that are in a given state (e.g., the number that are female, or the number that have parasites).

  sep <- function(X, n){

        # This is a comment line, useful for keeping notes.

        # This function calculates a standard error of

        # a proportion using two quantities provided.
        # This function has two arguments, "X" and "n".

        # "n" is the number of trials (sample size).

        # "X" is the number of successes.

        # First, estimate the proportion of successes, p.

        p.hat <- X / n

        # The standard error of p.hat is then

        sep <- sqrt( p.hat*(1-p.hat)/(n-1) )
        # Return the standard error as the result:
        return(sep)

        }


To use the function,  copy it to your clipboard. Then paste it into your command window and hit the enter key. (On a Mac, you may need to use the R Edit menu to "Paste as Plain Text" to avoid formatting problems.)  The function "sep" will be stored in your R workspace so you only need to paste it once (if you save your workspace when you exit R it will remain there when you start up again -- otherwise you'll need to paste it in again).

To use the function on some data, for example n=20 and X=10, enter

  sep(X=10, n=20) # or

  sep(10,20)      # ok if X and n are given in correct order
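
A quick way to check a new function is to compare its result with the same calculation done by hand. The sketch below repeats a compact version of "sep" (comments removed) so that it can be run on its own. Because the arithmetic inside the function works on whole vectors, it also accepts several values of X at once.

```r
sep <- function(X, n){
        p.hat <- X / n                      # estimated proportion
        sqrt( p.hat*(1-p.hat)/(n-1) )       # its standard error
        }

se1 <- sep(X=10, n=20)                      # about 0.115
byhand <- sqrt( 0.5*(1-0.5)/(20-1) )        # same calculation by hand
all.equal(se1, byhand)                      # TRUE

se3 <- sep(X=c(5,10,15), n=20)              # three standard errors at once
```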



Write a loop to repeat a function

Loops are useful when you want to repeat a function or operation many times.

Here's a very simple loop that repeats the same command 5 times. The variable "i" is just a counter that starts at 1 and increases by 1 each time the commands between the brackets "{ }" are executed.

  for(i in 1:5){

    print("yes we can")

    }


This next example uses the counter to access a different element of a vector each time the loop is repeated. It prints the i'th element of the variable "x" on each iteration.

  x <- c(2,-1,3,99,8)

  for(i in 1:length(x)){

     print(x[i])    # use "print" to force printing inside loops

     }
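
Usually you will want to save the results of a loop, not just print them. One common approach, sketched here with a made-up calculation, is to create an empty vector of the right length before the loop starts and fill it in one element at a time. The name "result" is arbitrary.

```r
x <- c(2,-1,3,99,8)

result <- numeric(length(x))    # vector of zeros to hold the results

for(i in seq_along(x)){         # seq_along(x) is 1, 2, ..., length(x)

     result[i] <- x[i]^2        # store the square of the i'th element

     }

result                          # 4 1 9 9801 64
```

In this simple case the loop gives the same answer as "x^2", which is shorter and faster. Loops are most useful when each step involves something more complicated.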