R tips
These pages provide an introduction to R, emphasizing topics in the workshop. More complete help pages can be found on Dolph Schluter's web site.
This start page outlines the help available and introduces the basic use of vectors and other types of data objects in R. See the data submenu item for further information on input, management and analysis of full data sets.
Keep a script file
Use a text file to write and edit your R commands. This keeps a record of your analyses for later use, and makes it easier to rerun and modify analyses as data collection continues. Add comments to the text file to help you remember how and why you did that particular analysis -- essential when reviewing it weeks (years?) later. R treats text lines beginning with a # symbol as comments.
R has a built-in editor that makes it easy to submit commands to the command line. To start a new text file, go to File on the menu and select "New Document" (Mac) or "New script" (PC). Save to a file with the ".R" extension. To open a preexisting file, choose "Open Document" or "Open script" from the File menu. Commands typed in this file can be passed to the command line by selecting them and then pressing the keys <command><return> (Mac) or <control>R (PC).
(If R is not running and you double click a ".R" file later, R will start up but might not load the workspace properly. If this happens, enter load(".RData") in the command window.)
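As a rough illustration of the advice above, a commented script file might look something like the sketch below. The file name, the variable names and the measurements are all made up for the example.

```r
# growth_analysis.R  (hypothetical file name)
# Purpose: summarize plant heights from the first survey.
# The values below are invented illustration data.

height_cm <- c(11.2, 9.8, 12.5, 10.1)   # one measurement per plant

# Use the mean as a simple summary of the sample
mean_height <- mean(height_cm)
print(mean_height)                       # 10.9
```

Each comment records what the step does or why it was chosen, which is the part that is hardest to reconstruct later.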
Start with vectors
A vector is a simple array of numbers or characters, such as the measurements of a single variable on a sample of individuals. It is the best way to store numbers and character strings (words). One of the great things about R is that mathematical operations and functions can be applied to all the values at once.
Enter measurements
Use the left arrow "<-" ("less than" sign followed by a dash) and the "c" function (for concatenate) to create a vector containing a set of measurements.
x <- c(11,42,-3,14,5) # store 5 values in vector x
x <- c(1:10) # store integers 1 to 10
x <- c("Watson","Crick","Wilkins") # quotes for character data
Use the "seq" function to generate and store a sequence of numbers to a vector,
x <- seq(0,10,by=0.1) # 0, 0.1, 0.2, ... 9.9, 10
(Note: seq results that include decimals may not be exact -- the result "0.2" may not be exactly equal to the number 0.2 unless rounded using the "round" command.)
Use "rep" to repeat values a specified number of times and store to a vector,
x <- rep(c(1,2,3),c(2,1,4)) # 1 1 2 3 3 3 3
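The caution about inexact decimals in seq results can be checked directly. The sketch below uses the value displayed as 0.3, which is a standard floating-point example (which elements are affected depends on the sequence), and shows two common ways to compare safely. The use of all.equal is an addition here, not something the note above prescribes.

```r
x <- seq(0, 1, by = 0.1)       # 0, 0.1, 0.2, ... 1

x[4]                           # displayed as 0.3
x[4] == 0.3                    # FALSE: the stored value differs slightly from 0.3

# Two ways to compare safely
round(x[4], 1) == 0.3          # TRUE after rounding to 1 decimal place
isTRUE(all.equal(x[4], 0.3))   # TRUE: equal within a small tolerance
```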
To view the contents of any object, including a vector, type its name and press enter, or use the "print" command,
x
print(x)
Paste to a vector
You can also paste measurements into a vector from the clipboard. To demonstrate, copy the following 10 numbers to your clipboard:
76 75 -52 -70 52 8 -50 -6 57 5
(i.e., select the numbers with your mouse and then choose Edit -> Copy on your browser menu to copy to clipboard). Then execute the following command in your R command window:
z <- scan("clipboard", what=numeric()) # on a PC
z <- scan(pipe("pbpaste"), what=numeric()) # on a Mac
To paste characters instead of numbers, use the following,
z <- scan("clipboard", what=character()) # PC
z <- scan(pipe("pbpaste"), what=character()) # Mac
If characters or numbers of interest are separated by commas, use
z <- scan("clipboard", what=character(), sep=",") # PC
z <- scan(pipe("pbpaste"), what=character(), sep=",") # Mac
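If the clipboard is not available, the same behaviour can be tried out by handing scan the text directly through its "text" argument. This is only a stand-in for the clipboard commands above; the "quiet=TRUE" option just suppresses the "Read n items" message.

```r
# Numbers separated by spaces
z <- scan(text = "76 75 -52 -70 52 8 -50 -6 57 5", what = numeric(), quiet = TRUE)
length(z)    # 10
sum(z)       # 95

# Character strings separated by commas
w <- scan(text = "Watson,Crick,Wilkins", what = character(), sep = ",", quiet = TRUE)
w            # "Watson" "Crick" "Wilkins"
```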
Access individual values
Use integers in square brackets to indicate specific elements of a vector. For example,
x[5] # 5th value of the vector x
x[2:6] # 2nd through 6th elements
x[2:length(x)] # everything but the first element
x[-1] # everything but the first element
x[5] <- 4.2 # change 5th value to 4.2
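To see what these indexing commands return, here is a small sketch on a made-up vector of six values. The last indexing line uses a vector of positions, which is an extra not shown in the list above.

```r
x <- c(11, 42, -3, 14, 5, 8)   # made-up values

x[5]           # 5
x[2:4]         # 42 -3 14
x[-1]          # 42 -3 14 5 8
x[c(1, 3)]     # 11 -3  (a vector of positions also works)

x[5] <- 4.2    # replace the 5th value
x              # 11.0 42.0 -3.0 14.0 4.2 8.0
```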
Carry out mathematical operations
The following are operations involving one vector.
x+1 # add 1 to each element of x
x^2 # square each element of x
x/2 # divide each element of x by 2
10*x # multiply each element of x by 10
Operations involving two vectors are easiest to handle when both are the same length (have the same number of elements). For example, if x and y are two numeric vectors of the same length n, then
x*y
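yields a new vector whose elements are
x[1]*y[1], x[2]*y[2], ... x[n]*y[n]
(If x and y are not the same length, then the shorter vector is recycled, starting again at its beginning, until it matches the length of the longer one. R gives a warning if the longer length is not a multiple of the shorter.)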
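What happens with vectors of unequal length is easiest to see on small made-up vectors. The names y2 and y4 are chosen only for this sketch.

```r
x  <- c(1, 2, 3, 4, 5, 6)
y2 <- c(10, 100)

# y2 is reused as 10 100 10 100 10 100
x * y2                          # 10 200 30 400 50 600

# Length 6 is not a multiple of 4: R still reuses y4 (as 1 2 3 4 1 2)
# but issues a warning, suppressed here to keep the output clean
y4 <- c(1, 2, 3, 4)
z  <- suppressWarnings(x * y4)
z                               # 1 4 9 16 5 12
```

Silent reuse of the shorter vector is convenient for constants but can hide a data-entry mistake, so it is worth checking lengths with length() when two data vectors are combined.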
Functions
A list of common vector functions is shown in a later section. Here I briefly explain what they do.
Some functions evaluate all the elements of a vector and return one number
mean(x) # arithmetic mean of numbers stored in x
length(x) # number of values in a vector (includes missing)
min(x) # smallest value in the vector
max(x) # biggest value in the vector
Some functions return more than one evaluation. For example,
range(x) # returns min(x) and max(x) in a vector of length 2
Other functions evaluate each element separately and return a vector as long as the original
log(x) # natural log of each element
More complicated functions may bundle several different results into a list object. I introduce the list in a later section.
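Setting lists aside, the three kinds of return value described above can be compared on one small made-up vector:

```r
x <- c(2, 4, 6, 8)

mean(x)                        # 5    : one number for the whole vector
range(x)                       # 2 8  : a vector of length 2
log(x)                         # four values, one per element of x
length(log(x)) == length(x)    # TRUE
```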
TRUE and FALSE
Vectors can be assigned logical measurements too, either directly or as the result of a logical operation. Here's an example of direct assignment.
z <- c(TRUE, TRUE, FALSE) # enter 3 logical values to vector z
Logical operations can identify and select vector elements meeting specified criteria. The logical operations are symbolized == (equal to), != (not equal to), < (less than), <= (less than or equal to), and so on. For example, if the vector z contains the following numbers,
z <- c(2,-1,3,99,8)
then the following operations yield the results shown on the right.
z<=3 # TRUE TRUE TRUE FALSE FALSE
!(z<3) # FALSE FALSE TRUE TRUE TRUE
z[z!=3] # 2 -1 99 8
which(z>=4) # 4 5
is.vector(z) # TRUE
is.character(z) # FALSE
is.numeric(z) # TRUE
is.na(z) # FALSE FALSE FALSE FALSE FALSE
any(z<0) # TRUE
all(z>0) # FALSE
The logical operators "&" and "|" refer to AND and OR. For example, if
z <- c(-10, -5, -1, 0, 3, 92)
then the following operations yield the results shown on the right.
z < 0 & abs(z) > 5 # TRUE FALSE FALSE FALSE FALSE FALSE
z[z < 0 | abs(z) > 5] # -10 -5 -1 92
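A related trick, not covered above but built on the same logical vectors: R treats TRUE as 1 and FALSE as 0 in arithmetic, so sum and mean can count the elements that meet a criterion. A minimal sketch using the same z:

```r
z <- c(-10, -5, -1, 0, 3, 92)

sum(z < 0)                   # 3   : number of negative values
mean(z < 0)                  # 0.5 : proportion of negative values
which(z < 0 | abs(z) > 5)    # 1 2 3 6 : positions meeting either criterion
```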
Useful vector functions
Here is a selection of useful functions for data vectors. Many of the functions will also work on other data objects such as data frames, possibly with different effects.
Display data
See the display submenu tab for more information on graphing and tabulating.
hist(x) # for numerical data
boxplot(x) # for numerical data
table(x) # for categorical data
Transform numerical data
The most common data transformations, illustrated using the single variable "x".
sqrt(x) # square root
sqrt(x+0.5) # modified square root transformation
log(x) # the natural log of x
log10(x) # log base 10 of x
exp(x) # exponential ("antilog") of x
abs(x) # absolute value of x
asin(sqrt(x)) # arcsine square root (used for proportions)
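As a quick check of what a few of these transformations return, the sketch below applies them to made-up values. The vector names p and counts are chosen only for this illustration.

```r
p <- c(0, 0.25, 0.5, 1)      # made-up proportions
asin(sqrt(p))                # 0, pi/6, pi/4, pi/2 (in radians)

counts <- c(1, 10, 100)      # made-up positive values
log10(counts)                # 0 1 2
exp(log(counts))             # back to 1 10 100: exp undoes the natural log

log(0)                       # -Inf, so zeros need care before a log transformation
```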
Statistics
Here are a few basic statistical functions on a numeric vector named x. Most of them will require the "na.rm=TRUE" option if the vector includes one or more missing values.
sum(x) # the sum of values in x
length(x) # number of elements (including missing)
mean(x) # sample mean
var(x) # sample variance
sd(x) # sample standard deviation
min(x) # smallest element in x
max(x) # largest element in x
range(x) # smallest and largest elements in x
median(x) # median of elements in x
quantile(x) # quantiles of x
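Here is a small worked example of these functions on made-up data, including the effect of the "na.rm=TRUE" option on a copy of the data with one missing value added.

```r
x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # made-up data, no missing values

sum(x)       # 40
mean(x)      # 5
median(x)    # 4.5
var(x)       # 32/7, about 4.571 (divides by n - 1)
sd(x)        # square root of var(x), about 2.138

y <- c(x, NA)            # the same data plus one missing value
length(y)                # 9: the NA is counted
sd(y)                    # NA
sd(y, na.rm = TRUE)      # same as sd(x)
```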
What am I?
These functions return TRUE or FALSE depending on the structure of x and its data type.
is.vector(x)
is.character(x)
is.numeric(x)
is.integer(x)
is.factor(x)
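The answers are not always obvious, so here is a sketch on a few made-up objects. The object names are chosen only for this illustration.

```r
a <- c(1, 2, 3)          # numbers typed with c() are stored as doubles
b <- 1:3                 # the colon operator produces integers
s <- c("a", "b")         # character strings
f <- factor(s)           # the same strings converted to a factor

is.numeric(a)            # TRUE
is.integer(a)            # FALSE
is.integer(b)            # TRUE
is.character(s)          # TRUE
is.factor(f)             # TRUE
is.character(f)          # FALSE: a factor is not a character vector
is.vector(f)             # FALSE: a factor carries extra attributes
```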
Functions for character data
casefold(x) # convert to lower case
casefold(x,upper=TRUE) # convert to upper case
substr(x,2,4) # extract 2nd to 4th characters
# of each element of x
paste(x,"ly",sep="") # paste "ly" to end of each element
nchar(x) # no. of characters in each element of x
grep("a",x) # which elements contain letter "a" ?
strsplit(x,"a") # split x into pieces at the letter "a"
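Applied to the three names used earlier on this page, these functions give the results shown in the comments. Note that strsplit returns a list with one component per element of x.

```r
x <- c("Watson", "Crick", "Wilkins")

casefold(x, upper = TRUE)    # "WATSON" "CRICK" "WILKINS"
substr(x, 2, 4)              # "ats" "ric" "ilk"
paste(x, "ly", sep = "")     # "Watsonly" "Crickly" "Wilkinsly"
nchar(x)                     # 6 5 7
grep("a", x)                 # 1: only the first element contains "a"

pieces <- strsplit(x, "a")   # a list of 3 components
pieces[[1]]                  # "W" "tson"
```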
Other functions
rm(x) # delete x from the R environment
unique(x) # unique values of x
levels(x) # treatment levels of x, if a factor
sort(x) # sort smallest to largest
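A short sketch of these functions on made-up data. The names f and scratch are hypothetical, and the "decreasing=TRUE" option to sort is an extra not listed above.

```r
x <- c(3, 1, 2, 3, 1)
unique(x)                     # 3 1 2 (in order of first appearance)
sort(x)                       # 1 1 2 3 3
sort(x, decreasing = TRUE)    # 3 3 2 1 1

f <- factor(c("low", "high", "low"))
levels(f)                     # "high" "low" (alphabetical by default)

scratch <- c(1, 2, 3)         # a temporary object
rm(scratch)                   # delete it
exists("scratch", inherits = FALSE)   # FALSE: the object is gone
```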
Cope with missing values
Missing values in R are indicated with NA.
x[5] <- NA # change the 5th element of x to missing
x[x == -99] <- NA # change all instances of -99 in x to missing
which(is.na(x)) # identify which element(s) are missing
Some functions will treat NA as valid entries. For example, the length of a vector (number of elements) includes missing values in the count.
length(x)
In this case, if you want only non-missing values included,
x <- na.omit(x) # drop the missing values in x
x <- x[!is.na(x)] # select the non-missing values in x
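The steps above can be followed on a small made-up vector in which -99 was used as a missing-value code. The names kept and kept2 are chosen only for this sketch.

```r
x <- c(4, -99, 7, -99, 12)    # made-up data with -99 as a missing code

x[x == -99] <- NA             # recode -99 as missing
x                             # 4 NA 7 NA 12
which(is.na(x))               # 2 4
length(x)                     # 5: missing values are counted
sum(!is.na(x))                # 3: number of non-missing values

kept  <- x[!is.na(x)]         # 4 7 12
kept2 <- na.omit(x)           # same values, plus a record of what was dropped
as.vector(kept2)              # 4 7 12
```

Both approaches keep the same values. The result of na.omit also carries an attribute noting which positions were removed, which is why it prints with extra lines.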
Some functions won't work on variables with missing values unless default options are modified. For example, if you try to calculate the mean of numbers in a vector that contains missing values you will get NA as your result.
x <- c(1,2,3,4,5,NA) # a vector with one missing value
mean(x) # result is NA
To cope, specify that missing values first be removed