# Calculate with vectors

This page explains how to get started doing basic calculations and using simple vector functions.

## Introduction to vectors

A vector is a simple array of numbers or characters, such as the measurements of a single variable on a sample of individuals. R makes it easy to apply mathematical operations and functions to all the values in a vector at once.

### Enter measurements

Use the left arrow `<-` (a “less than” sign followed by a dash) and the `c` function (for concatenate) to create a vector containing a set of measurements.

    x <- c(11,42,-3,14,5)                 # store these 5 numbers in vector x
    x <- c(1:10)                          # store integers 1 to 10
    x <- c("Watson","Crick","Wilkins")    # use quotes for character data

Use the `seq` function to generate a sequence of numbers and store it in a vector.

    x <- seq(0, 10, by=0.1)    # 0, 0.1, 0.2, ... 9.9, 10

Note: `seq` results that include decimals may not be exact. For example, the result “0.3” may not be exactly equal to the number 0.3 unless rounded using the `round` command.
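The rounding caveat can be checked directly. The following is a small sketch rather than part of the original page: the names `s` and `fourth` are chosen here for illustration, and `all.equal` is base R's tolerance-based comparison, offered as one alternative to `round`.

```r
s <- seq(0, 1, by = 0.1)          # 0, 0.1, 0.2, ... 1 (hypothetical example vector)
fourth <- s[4]                    # displayed as 0.3

fourth == 0.3                     # exact comparison: FALSE with standard double precision
round(fourth, 1) == 0.3           # TRUE after rounding to 1 decimal place
isTRUE(all.equal(fourth, 0.3))    # TRUE: equal within a small tolerance
```

When comparing decimal results, rounding first or using `all.equal` is usually safer than `==`.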

Use `rep` to repeat values a specified number of times and store the result in a vector.

    x <- rep(c(1,2,3), c(2,1,4))    # 1 1 2 3 3 3 3
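As an illustration of how the second argument controls the repetition, here is a short sketch comparing three common forms of `rep`. The variable names `whole`, `paired`, and `custom` are made up for this example.

```r
whole  <- rep(c(1, 2, 3), times = 2)             # repeat the whole vector: 1 2 3 1 2 3
paired <- rep(c(1, 2, 3), each = 2)              # repeat each element:     1 1 2 2 3 3
custom <- rep(c(1, 2, 3), times = c(2, 1, 4))    # one count per element:   1 1 2 3 3 3 3
```

The last form is the same call as the one shown above, with the `times` argument named explicitly.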

To view the contents of any object, including a vector, type its name and press enter, or use the `print` command.

    x           # print "x" to the screen
    print(x)    # do the same

### Delete a vector

The following command removes the vector `x` from the local R environment.

    rm(x)

### Access elements of a vector

Use integers in square brackets to indicate specific elements of a vector. For example,

    x[5]              # 5th value of the vector x
    x[2:6]            # 2nd through 6th elements of x
    x[2:length(x)]    # everything but the first element
    x[-1]             # everything but the first element
    x[5] <- 4.2       # change the value of the 5th element to 4.2
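To make the indexing rules concrete, the following sketch applies them to a small vector with known values. The names `second`, `middle`, `rest`, and `ends` are introduced here only for illustration.

```r
x <- c(11, 42, -3, 14, 5)    # example vector with 5 elements

second <- x[2]               # 42
middle <- x[2:4]             # 42 -3 14
rest   <- x[-1]              # 42 -3 14 5 (a negative index drops that element)
ends   <- x[c(1, 5)]         # 11 5 (any vector of positions works)

x[5] <- 4.2                  # replace the 5th element in place
x                            # 11.0 42.0 -3.0 14.0 4.2
```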

### Math with vectors

These operations are carried out on every element of the vector.

    x + 1     # add 1 to each element of x
    x^2       # square each element of x
    x/2       # divide each element of x by 2
    10 * x    # multiply each element of x by 10

Operations on two vectors `x` and `y` work best when both are the same length (have the same number of elements). For example,

    x * y    # yields a new vector whose elements are
             # x[1]*y[1], x[2]*y[2], ... x[n]*y[n]

If `x` and `y` are not the same length, then the shorter vector is recycled: R extends it by starting again at the beginning until it matches the length of the longer vector.
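A short sketch of this behavior with vectors of unequal length follows. The values, and the names `w` and `warned`, are invented for illustration; the `tryCatch` call is only there to record whether R issued a warning.

```r
x <- c(1, 2, 3, 4)
y <- c(10, 100)
x * y                  # 10 200 30 400: y is reused as 10 100 10 100

w <- c(1, 2, 3)        # length 3 does not divide length 4 evenly
# x * w still returns a result, but R warns that the longer
# length is not a multiple of the shorter length
warned <- tryCatch({ x * w; FALSE }, warning = function(cond) TRUE)
warned                 # TRUE
```

Because the reuse happens silently when the lengths divide evenly, it is worth checking `length(x)` and `length(y)` before combining vectors.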

## Useful vector functions

Here is a selection of useful functions for data vectors. Many of the functions will also work on other data objects such as data frames, possibly with different effects.

### Transform numerical data

The most common data transformations, illustrated using the single variable `x`.

    sqrt(x)          # square root
    sqrt(x + 0.5)    # modified square root transformation
    log(x)           # the natural log of x
    log10(x)         # log base 10 of x
    exp(x)           # exponential ("antilog") of x
    abs(x)           # absolute value of x
    asin(sqrt(x))    # arcsine square root (used for proportions)

### Statistics

Here are a few basic statistical functions on a numeric vector named `x`. Most of them will require the `na.rm=TRUE` option if the vector includes one or more missing values.

    sum(x)                 # the sum of values in x
    length(x)              # number of elements (including missing)
    mean(x)                # sample mean
    var(x)                 # sample variance
    sd(x)                  # sample standard deviation
    min(x)                 # smallest element in x
    max(x)                 # largest element in x
    range(x)               # smallest and largest elements in x
    median(x)              # median of elements in x
    quantile(x)            # quantiles of x
    unique(x)              # extracts only the unique values of x
    sort(x)                # sort, smallest to largest
    weighted.mean(x, w)    # weighted mean, with weights in vector w
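As a worked example, the sketch below runs several of these functions on a small vector whose results are easy to verify by hand. The data, the weights `w`, and the vector `y` are made up for illustration.

```r
x <- c(2, 4, 4, 6, 9)         # hypothetical measurements

sum(x)                        # 25
mean(x)                       # 5
median(x)                     # 4
var(x)                        # 7 (squared deviations sum to 28, divided by n - 1 = 4)
range(x)                      # 2 9
unique(x)                     # 2 4 6 9

w <- c(1, 1, 1, 1, 0)         # hypothetical weights: ignore the last value
weighted.mean(x, w)           # 4

y <- c(x, NA)                 # same data with one missing value
mean(y)                       # NA
mean(y, na.rm = TRUE)         # 5
```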

### Functions for character data (strings)

    casefold(x)                # convert to lower case
    casefold(x, upper=TRUE)    # convert to upper case
    substr(x, 2, 4)            # extract 2nd to 4th characters of each element of x
    paste(x, "ly", sep="")     # paste "ly" to the end of each element in x
    nchar(x)                   # no. of characters in each element of x
    grep("a", x)               # which elements of x contain letter "a"?
    grep("a|b", x)             # which elements of x contain letter "a" or letter "b"?
    strsplit(x, "a")           # split x into pieces wherever the letter "a" occurs
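To show what these functions return, here is a sketch using the three names from the earlier character-vector example. The search patterns are chosen here for illustration.

```r
x <- c("Watson", "Crick", "Wilkins")

casefold(x, upper = TRUE)     # "WATSON" "CRICK" "WILKINS"
substr(x, 2, 4)               # "ats" "ric" "ilk"
paste(x, "ly", sep = "")      # "Watsonly" "Crickly" "Wilkinsly"
nchar(x)                      # 6 5 7
grep("a", x)                  # 1 (only "Watson" contains "a")
grep("a|k", x)                # 1 2 3 (every element contains "a" or "k")
strsplit(x[2], "i")[[1]]      # "Cr" "ck"
```

Note that `grep` returns the positions of the matching elements, not the elements themselves, and `strsplit` returns a list with one entry per element of `x`.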

### Functions for factors

A factor is like a character variable except that its unique values represent “levels” that have names but also have a numerical interpretation. The following commands are useful if `x` is a factor variable (a vector).

    levels(x)                      # show the unique values of a factor variable
    droplevels(x)                  # delete unused levels of a factor variable
    as.character(x)                # convert values of a factor to character strings instead
    as.numeric(as.character(x))    # convert numbers in "x" from factors to numeric type
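The two-step conversion `as.numeric(as.character(x))` matters because applying `as.numeric` directly to a factor returns the internal level codes, not the original numbers. The sketch below illustrates this with a made-up factor `f` and a subset `g`.

```r
f <- factor(c("10", "20", "10", "5"))    # numbers stored as a factor (hypothetical data)

levels(f)                      # "10" "20" "5" (levels are sorted as text)
as.numeric(f)                  # 1 2 1 3: the level codes, not the values
as.numeric(as.character(f))    # 10 20 10 5: the original numbers

g <- f[f != "5"]               # drop the observations equal to "5"
levels(g)                      # "10" "20" "5": the unused level is still listed
levels(droplevels(g))          # "10" "20"
```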

### TRUE and FALSE (logical) data

Vectors can be assigned logical measurements, directly or as the result of a logical operation. Here’s an example of direct assignment.

    z <- c(TRUE, TRUE, FALSE)    # put 3 logical values to a vector z

Logical operations can identify and select those vector elements for which a condition is TRUE. The comparison operations include

    ==      (equal to)
    !=      (not equal to)
    <       (less than)
    <=      (less than or equal to)
    %in%    (is an element of)

and so on.

For example, put the following numbers into a vector `z`.

    z <- c(2, -1, 3, 99, 8)

The following logical operations and functions yield the results shown on the right.

    z <= 3           # TRUE TRUE TRUE FALSE FALSE (for each element of z)
    !(z < 3)         # FALSE FALSE TRUE TRUE TRUE
    z[z != 3]        # 2 -1 99 8, the elements of z for which the condition is TRUE
    which(z >= 4)    # 4 5, the indices for elements of z satisfying the condition
    is.na(z)         # FALSE FALSE FALSE FALSE FALSE
    any(z < 0)       # TRUE
    all(z > 0)       # FALSE
    99 %in% z        # TRUE
    100 %in% z       # FALSE

The logical operators `&` and `|` refer to AND and OR. For example, put the following numbers into a vector `z`.

    z <- c(-10, -5, -1, 0, 3, 92)

The following operations yield the results shown on the right.

    z < 0 & abs(z) > 5       # TRUE FALSE FALSE FALSE FALSE FALSE
    z[z < 0 | abs(z) > 5]    # -10 -5 -1 92

### What am I?

These functions return TRUE or FALSE depending on the structure of `x` and its data type.

    is.vector(x)
    is.character(x)
    is.numeric(x)
    is.integer(x)
    is.factor(x)
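The answers depend on how the object was created, which the following sketch illustrates. The object names `a`, `b`, `s`, and `f` are invented for this example.

```r
a <- c(1, 2, 3)       # numbers typed with c() are stored as doubles
b <- 1:3              # the colon operator produces integers
s <- c("a", "b")      # character vector
f <- factor(s)        # factor built from the character vector

is.numeric(a)         # TRUE
is.integer(a)         # FALSE
is.integer(b)         # TRUE
is.numeric(b)         # TRUE (integers also count as numeric)
is.character(s)       # TRUE
is.factor(f)          # TRUE
is.character(f)       # FALSE
is.vector(f)          # FALSE (a factor carries extra attributes)
```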

## Combine vectors to make a data frame

Vectors representing different variables measured on the same unit can be made into columns of a data frame. A data frame is a spreadsheet-like object containing a data set. See the “Data” tab for tips on working with data frames. Here we show how to make a data frame by combining vectors of the same length. The vectors need not be of the same data type.

First, obtain your vectors. For example,

    quadrat <- c(1:7)
    site <- c(1,1,2,3,3,4,5)
    species <- c("a","b","b","a","c","b","a")

Now combine them into a data frame named `mydata`.

    mydata <- data.frame(quadrat = quadrat, site = site, species = species,
                         stringsAsFactors = FALSE)

The argument `stringsAsFactors = FALSE` is optional. It keeps character data as character strings; in R versions before 4.0.0, character variables are otherwise converted to factors by default. From R 4.0.0 onward, `FALSE` is already the default.

You can accomplish the same job using the `tibble` command, which is available after loading the `dplyr` package (you'll need to install the package using `install.packages()` if you have not already done so).

    library(dplyr)    # load package
    mydata <- tibble(quadrat = quadrat, site = site, species = species)    # dplyr method
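Once the data frame exists, it is worth confirming that the columns have the expected size and type. The sketch below rebuilds `mydata` with base R's `data.frame` so that it runs without extra packages; the name `at_site_3` is chosen here for illustration.

```r
quadrat <- c(1:7)
site <- c(1, 1, 2, 3, 3, 4, 5)
species <- c("a", "b", "b", "a", "c", "b", "a")

mydata <- data.frame(quadrat = quadrat, site = site, species = species,
                     stringsAsFactors = FALSE)

dim(mydata)                      # 7 3: seven rows, three columns
str(mydata)                      # compact summary of each column and its type
is.character(mydata$species)     # TRUE: species was kept as character data

# columns are vectors, so the logical indexing shown earlier still applies
at_site_3 <- mydata$species[mydata$site == 3]    # "a" "c"
```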

## Deal with missing values

Missing values in R are indicated with NA.

    x[5] <- NA             # assign "missing" to the 5th element of x
    x[x == -99] <- NA      # change all instances of -99 in x to missing
    which(is.na(x))        # identify which elements are missing

Some functions will treat NA as valid entries. For example, the length of a vector (number of elements) includes missing values in the count.

    length(x)

Some functions won't work on variables that include missing values unless default options are modified. For example, if you try to calculate the mean of a vector that contains missing values you will get `NA` as your result. Many functions have an option `na.rm` that ignores the missing values when calculating.

    x <- c(1,2,3,4,5,NA)     # a vector with one missing value
    mean(x)                  # result is NA
    mean(x, na.rm = TRUE)    # result is the mean of non-missing values of x

As usual, there's more than one way to solve the problem. For example, you can create a new variable that contains only the non-missing values, but this requires an extra step so it is not preferred:

    x1 <- na.omit(x)              # put the non-missing values of x into new vector x1
    x1 <- x[complete.cases(x)]    # same
    x1 <- x[!is.na(x)]            # same
    length(x1)                    # count the number of non-missing values

## Write your own function

If R is missing a needed function, write your own. Here's an example of a function named `sep()` that calculates the standard error of an estimate of a proportion. The argument `n` refers to sample size, and `X` is the number of "successes" (e.g., the number of females in the sample, the number of infected individuals, etc.).

    sep <- function(X, n){
      p.hat <- X / n                      # The proportion of "successes"
      sep <- sqrt(p.hat*(1-p.hat)/n)      # The standard error of p.hat
      return(sep)                         # Return the standard error as the result
    }

To use the function, copy it to your clipboard. Then paste it into your command window and hit the enter key. (On a Mac, you may need to use the R Edit menu to "Paste as Plain Text" to avoid formatting problems.) The function `sep()` will be stored in your R workspace so you only need to paste it once. If you save your workspace when you exit R, it will remain there when you start up again; otherwise you'll need to paste it in again.

To use the function on some data, for example `n` = 20 and `X` = 10, enter

    sep(X = 10, n = 20)    # yields the standard error
    sep(10,20)             # shorthand ok if X and n are given in correct order
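For `X` = 10 and `n` = 20 the proportion is 0.5, so the standard error is sqrt(0.5 × 0.5 / 20), or about 0.112. The sketch below repeats the function definition so that it runs on its own and checks that value. Passing a vector for `X` is an extra shown here for illustration; it works because the arithmetic inside the function is applied element by element.

```r
sep <- function(X, n){
  p.hat <- X / n                       # the proportion of "successes"
  sqrt(p.hat * (1 - p.hat) / n)        # the standard error of p.hat (returned)
}

sep(X = 10, n = 20)                    # about 0.112
sep(X = c(10, 5, 0), n = 20)           # about 0.112, 0.097, and 0
```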

## Paste clipboard contents to a vector

To demonstrate, select the following 10 numbers with your mouse and copy to your clipboard:

76 75 -52 -70 52 8 -50 -6 57 5

(choose Edit -> Copy on your browser menu to copy to clipboard)

Then execute the following command in your R command window:

    z <- scan("clipboard", what=numeric())        # on a PC
    z <- scan(pipe("pbpaste"), what=numeric())    # on a Mac

To paste characters instead of numbers, use the following,

    z <- scan("clipboard", what=character())        # PC
    z <- scan(pipe("pbpaste"), what=character())    # Mac

If characters or numbers of interest are separated by commas, use

    z <- scan("clipboard", what=character(), sep=",")        # PC
    z <- scan(pipe("pbpaste"), what=character(), sep=",")    # Mac
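If the clipboard is not available, for example when running a script, one platform-independent alternative is to hand the copied text to `scan` through its `text` argument. This is a sketch of that variation rather than the clipboard method itself; the name `people` and the use of `quiet = TRUE` to suppress the "Read N items" message are choices made for this example.

```r
# the same 10 numbers as above, supplied as a text string
z <- scan(text = "76 75 -52 -70 52 8 -50 -6 57 5", what = numeric(), quiet = TRUE)
length(z)    # 10
sum(z)       # 95

# comma-separated character data (hypothetical input)
people <- scan(text = "Watson,Crick,Wilkins", what = character(), sep = ",", quiet = TRUE)
people       # "Watson" "Crick" "Wilkins"
```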