R tips

These pages provide an introduction to R, emphasizing topics in the workshop. More complete help pages can be found on Dolph Schluter's web site.

This start page outlines help available and introduces the basic use of vectors and other types of data objects in R. See the data submenu item for further information on input, management and analysis of full data sets.


Keep a script file

Use a text file to write and edit your R commands. This keeps a record of your analyses for later use, and makes it easier to rerun and modify analyses as data collection continues. Add comments to the text file to help you remember how and why you did that particular analysis -- essential when reviewing it weeks (years?) later. R treats text lines beginning with a # symbol as comments.

R has a built-in editor that makes it easy to submit commands to the command line. To start a new text file, go to File on the menu and select "New Document" (Mac) or "New script" (PC). Save to a file with the ".R" extension. To open a preexisting file, choose "Open Document" or "Open script" from the File menu. Commands typed to this file can be passed to the command line by selecting and then pressing the keys <command><return> (Mac) or <control>R (PC).

(If R is not running and you double click a ".R" file later, R will start up but might not load the workspace properly. If this happens, enter load(".RData") in the command window.)


Start with vectors

A vector is a simple array of numbers or characters, such as the measurements of a single variable on a sample of individuals. It is the best way to store numbers and character strings (words). One of the great things about R is that mathematical operations and functions can be applied at once to all the values.

Enter measurements

Use the left arrow  "<-"  ("less than" sign followed by a dash) and the "c" function (for concatenate) to create a vector containing a set of measurements. 

  x <- c(11,42,-3,14,5)              # store 5 values in vector x

  x <- c(1:10)                       # store integers 1 to 10

  x <- c("Watson","Crick","Wilkins") # quotes for character data


Use the "seq" function to generate and store a sequence of numbers to a vector,

  x <- seq(0,10,by=0.1)    # 0, 0.1, 0.2, ... 9.9, 10

(note: seq results that include decimals may not be exact -- the result "0.2" may not be exactly equal to the number 0.2 unless rounded using the "round" command)
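The rounding issue can be seen in a small case. This is only a sketch: the name s and the choice of the 4th element are mine, and the result assumes the usual double-precision arithmetic. The 4th element prints as 0.3 but is not exactly equal to it.

```r
s <- seq(0, 1, by = 0.1)        # 0, 0.1, 0.2, ... 1
s[4]                            # prints 0.3
s[4] == 0.3                     # FALSE: the stored value differs slightly
round(s[4], 1) == 0.3           # TRUE after rounding
isTRUE(all.equal(s[4], 0.3))    # TRUE: comparison within a small tolerance
```

Comparing with "all.equal" is an alternative to rounding when you only need to test whether two decimal numbers are effectively the same.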

Use "rep" to repeat values a specified number of times and store to a vector,

  x <- rep(c(1,2,3),c(2,1,4))        # 1 1 2 3 3 3 3
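As a further illustration (not part of the original examples), "rep" also accepts the named arguments "times" and "each". The object names r1, r2 and r3 below are arbitrary.

```r
r1 <- rep(c(1, 2, 3), times = 2)          # 1 2 3 1 2 3  (whole vector repeated)
r2 <- rep(c(1, 2, 3), each = 2)           # 1 1 2 2 3 3  (each value repeated)
r3 <- rep(c("a", "b"), times = c(3, 1))   # "a" "a" "a" "b"
```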


To view the contents of any object, including a vector, type its name and press enter, or use the "print" command,

  x

  print(x)


Paste to a vector

You can also paste measurements into a vector from the clipboard. To demonstrate, copy the following 10 numbers to your clipboard: 76  75 -52 -70  52   8 -50  -6  57   5
(i.e., select the numbers with your mouse and then choose Edit -> Copy on your browser menu to copy to clipboard). Then execute the following command in your R command window:

  z <- scan("clipboard", what=numeric())             # on a PC

  z <- scan(pipe("pbpaste"), what=numeric())         # on a Mac


To paste characters instead of numbers, use the following,

  z <- scan("clipboard", what=character())           # PC

  z <- scan(pipe("pbpaste"), what=character())       # Mac


If characters or numbers of interest are separated by commas, use

  z <- scan("clipboard", what=character(), sep=",")      # PC

  z <- scan(pipe("pbpaste"), what=character(), sep=",")  # Mac
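If the clipboard commands give trouble on your system, one possible workaround is to paste the values directly into your script as a quoted string and read them with the "text" argument of "scan". This sketch is my suggestion rather than part of the workshop instructions; the names z and zc are arbitrary, and "quiet=TRUE" only suppresses the "Read 10 items" message.

```r
# read numbers from a character string instead of the clipboard
z <- scan(text = "76  75 -52 -70  52   8 -50  -6  57   5",
          what = numeric(), quiet = TRUE)
length(z)                                  # 10

# same idea for comma-separated character data
zc <- scan(text = "Watson,Crick,Wilkins",
           what = character(), sep = ",", quiet = TRUE)
zc                                         # "Watson" "Crick" "Wilkins"
```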


Access individual values

Use integers in square brackets to indicate specific elements of a vector. For example,  

  x[5]           # 5th value of the vector x

  x[2:6]         # 2nd through 6th elements

  x[2:length(x)] # everything but the first element

  x[-1]          # everything but the first element

  x[5] <- 4.2    # change 5th value to 4.2
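Here is a small worked example of the indexing commands above, using a made-up vector of six values. Indexing with a vector of positions, as in x[c(1,3)], is an extra case not listed above.

```r
x <- c(11, 42, -3, 14, 5, 8)
x[5]              # 5
x[2:4]            # 42 -3 14
x[-1]             # 42 -3 14 5 8   (first element dropped)
x[c(1, 3)]        # 11 -3          (1st and 3rd elements)
x[5] <- 4.2       # overwrite the 5th value
x                 # 11 42 -3 14 4.2 8
```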


Carry out mathematical operations

The following are operations involving one vector

  x+1            # add 1 to each element of x

  x^2            # square each element of x

  x/2            # divide each element of x by 2

  10*x           # multiply each element of x by 10


Operations involving two vectors are easiest to handle when both are the same length (have the same number of elements). For example, if x and y are two numeric vectors of the same length n, then

  x*y

yields a new vector whose elements are  

  x[1]*y[1], x[2]*y[2], ... x[n]*y[n]


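(If x and y are not the same length, then the shorter vector is recycled: R reuses its values, starting again at the beginning, until it matches the length of the longer vector. R issues a warning if the longer length is not a multiple of the shorter.)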
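A minimal sketch of what happens with unequal lengths, using made-up vectors. Here the shorter vector y is reused three times to match the six elements of x.

```r
x <- c(1, 2, 3, 4, 5, 6)
y <- c(10, 100)
xy <- x * y       # same as x * c(10, 100, 10, 100, 10, 100)
xy                # 10 200 30 400 50 600
# c(1, 2, 3) * y would also run, but with a warning, because
# 3 is not a multiple of 2
```

Because the result is returned without complaint when the lengths divide evenly, it is worth checking length(x) and length(y) if a result looks odd.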

Functions

A list of common vector functions is shown in a later section. Here I briefly explain what they do.

Some functions evaluate all the elements of a vector and return one number

  mean(x)   # arithmetic mean of numbers stored in x

  length(x) # number of values in a vector (includes missing)

  min(x)    # smallest value in the vector

  max(x)    # biggest value in the vector


Some functions return more than one evaluation. For example,

  range(x)  # returns min(x) and max(x) in a vector of length 2


Other functions evaluate each element separately and return a vector as long as the original

  log(x)    # natural log of each element


More complicated functions may bundle multiple different results into a list object. I introduce the list in a later section.

TRUE and FALSE

Vectors can be assigned logical measurements too, either directly or as the result of a logical operation. Here's an example of direct assignment.

  z <- c(TRUE, TRUE, FALSE)  # enter 3 logical values to vector z


Logical operations can identify and select vector elements meeting specified criteria. The logical operations are symbolized == (equal to), != (not equal to), < (less than), <= (less than or equal to), and so on. For example, if the vector z contains the following numbers,

  z <- c(2,-1,3,99,8)

then the following operations yield the results shown on the right

  z<=3                   # TRUE TRUE TRUE FALSE FALSE

  !(z<3)                 # FALSE FALSE TRUE TRUE TRUE

  z[z!=3]                # 2 -1 99  8

  which(z>=4)            # 4 5

  is.vector(z)           # TRUE

  is.character(z)        # FALSE

  is.numeric(z)          # TRUE

  is.na(z)               # FALSE FALSE FALSE FALSE FALSE

  any(z<0)               # TRUE
  all(z>0)               # FALSE


The logical operators "&" and "|" refer to AND and OR. For example, if

  z <- c(-10, -5, -1, 0, 3, 92)

then the following operations yield the results shown on the right

  z < 0 & abs(z) > 5     # TRUE FALSE FALSE FALSE FALSE FALSE

  z[z < 0 | abs(z) > 5]  # -10  -5  -1  92
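Logical results can also be counted. In arithmetic, R treats TRUE as 1 and FALSE as 0, so "sum" gives the number of elements meeting a criterion and "mean" gives the proportion. The sketch below reuses the same z; the particular criteria are only examples.

```r
z <- c(-10, -5, -1, 0, 3, 92)
sum(z < 0)              # 3    number of negative values
mean(z < 0)             # 0.5  proportion of negative values
z[z >= -5 & z <= 3]     # -5 -1 0 3   values between -5 and 3 inclusive
```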



Useful vector functions

Here is a selection of useful functions for data vectors. Many of the functions will also work on other data objects such as data frames, possibly with different effects.

Display data

See the display submenu tab for more information on graphing and tabulating

  hist(x)      # for numerical data

  boxplot(x)   # for numerical data

  table(x)     # for categorical data


Transform numerical data

The most common data transformations, illustrated using the single variable "x".

  sqrt(x)          # square root

  sqrt(x+0.5)      # modified square root transformation

  log(x)           # the natural log of x

  log10(x)         # log base 10 of x

  exp(x)           # exponential ("antilog") of x

  abs(x)           # absolute value of x

  asin(sqrt(x))    # arcsine square root (used for proportions)
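To see a few of these transformations on actual numbers, here is a short sketch with made-up values. The vector p stands for proportions (between 0 and 1), and the arcsine square root result is in radians.

```r
x <- c(1, 10, 100)
log10(x)                # 0 1 2
exp(log(x))             # 1 10 100   (exp undoes the natural log)

p <- c(0, 0.25, 0.5, 1) # proportions
a <- asin(sqrt(p))      # 0, pi/6, pi/4, pi/2 (about 0, 0.52, 0.79, 1.57)
```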


Statistics

Here are a few basic statistical functions on a numeric vector named x. Most of them will require the "na.rm=TRUE" option if the vector includes one or more missing values.

  sum(x)                 # the sum of values in x

  length(x)              # number of elements (including missing)

  mean(x)                # sample mean

  var(x)                 # sample variance

  sd(x)                  # sample standard deviation

  min(x)                 # smallest element in x

  max(x)                 # largest element in x

  range(x)               # smallest and largest elements in x

  median(x)              # median of elements in x

  quantile(x)            # quantiles of x
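The following sketch applies several of these functions to a small made-up vector so the results can be checked by hand. The "probs" argument of "quantile" is used here to request specific quantiles; by default, quantile(x) returns the minimum, quartiles and maximum.

```r
x <- c(1, 2, 3, 4, 5)
sum(x)                                   # 15
mean(x)                                  # 3
var(x)                                   # 2.5
sd(x)                                    # square root of 2.5, about 1.58
median(x)                                # 3
range(x)                                 # 1 5
q <- quantile(x, probs = c(0.25, 0.75))  # 2 and 4 (first and third quartiles)
sum(c(x, NA), na.rm = TRUE)              # 15, the missing value is ignored
```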


What am I?

These functions return TRUE or FALSE depending on the structure of x and its data type.

  is.vector(x)

  is.character(x)

  is.numeric(x)

  is.integer(x)

  is.factor(x)
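A few cases that sometimes surprise, shown as a sketch with made-up objects. Note in particular that numbers typed with "c" are stored as doubles rather than integers, and that a factor is not counted as a plain vector.

```r
is.numeric(c(1.5, 2, 3))     # TRUE
is.integer(c(1, 2, 3))       # FALSE: stored as doubles
is.integer(1:3)              # TRUE: the colon operator produces integers
is.character(c("a", "b"))    # TRUE

f <- factor(c("low", "high", "low"))
is.factor(f)                 # TRUE
is.character(f)              # FALSE
is.vector(f)                 # FALSE: a factor carries extra attributes
```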


Functions for character data

  casefold(x)            # convert to lower case

  casefold(x,upper=TRUE) # convert to upper case

  substr(x,2,4)          # extract 2nd to 4th characters

                         #   of each element of x

  paste(x,"ly",sep="")   # paste "ly" to end of each element

  nchar(x)               # no. of characters in each element of x

  grep("a",x)            # which elements contain letter "a" ?

  strsplit(x,"a")        # split x into pieces at the letter "a"
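Applied to the character vector entered earlier on this page, these functions give the results shown on the right. This is just a worked example of the commands listed above.

```r
x <- c("Watson", "Crick", "Wilkins")
casefold(x)                  # "watson" "crick" "wilkins"
casefold(x, upper = TRUE)    # "WATSON" "CRICK" "WILKINS"
substr(x, 2, 4)              # "ats" "ric" "ilk"
paste(x, "ly", sep = "")     # "Watsonly" "Crickly" "Wilkinsly"
nchar(x)                     # 6 5 7
grep("a", x)                 # 1   only the first element contains "a"
strsplit(x, "a")             # a list: c("W", "tson"), "Crick", "Wilkins"
```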


Other functions

  rm(x)                  # delete x from the R environment
  unique(x)              # unique values of x

  levels(x)              # treatment levels of x, if a factor

  sort(x)                # sort smallest to largest
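A short sketch of these functions on made-up data. The "decreasing" argument of "sort" is an extra option not listed above.

```r
u <- c(3, 1, 2, 3, 1)
unique(u)                    # 3 1 2   (in order of first appearance)
sort(u)                      # 1 1 2 3 3
sort(u, decreasing = TRUE)   # 3 3 2 1 1

g <- factor(c("b", "a", "b"))
levels(g)                    # "a" "b"
```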


Cope with missing values

Missing values in R are indicated with NA.

  x[5]<- NA         # change the 5th element of x to missing

  x[x == -99] <- NA # change all instances of -99 in x to missing

  which(is.na(x))   # identify which element(s) are missing


Some functions treat NA values as valid entries. For example, the length of a vector (number of elements) includes missing values in the count.

  length(x)


In this case, if you want only non-missing values included,

  x <- na.omit(x)     # drop the missing values in x

  x <- x[!is.na(x)]   # select the non-missing values in x


Some functions won't work on variables with missing values unless default options are modified. For example, if you try to calculate the mean of numbers in a vector that contains missing values you will get NA as your result.

  x <- c(1,2,3,4,5,NA)  # a vector with one missing value

  mean(x)               # result is NA


To cope, specify that missing values first be removed

  mean(x, na.rm = TRUE)
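Putting the pieces of this section together, here is a minimal sketch using made-up data in which -99 was entered as a code for missing values.

```r
x <- c(4, -99, 7, 10, -99)   # -99 used here as a missing-value code
x[x == -99] <- NA            # recode to NA
which(is.na(x))              # 2 5
length(x)                    # 5   NA values are counted
sum(!is.na(x))               # 3   number of non-missing values
mean(x)                      # NA
mean(x, na.rm = TRUE)        # 7
x[!is.na(x)]                 # 4 7 10
```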