# Calculate with vectors

This page explains how to get started doing basic calculations and using simple vector functions.

## Introduction to vectors

A vector is a simple array of numbers or characters, such as the measurements of a single variable on a sample of individuals. R makes it easy to carry out mathematical operations and functions to all the values in a vector at once.

### Enter measurements

Use the left arrow “<-” (“less than” sign followed by a dash) and the `c` function (for concatenate) to create a vector containing a set of measurements.

```x <- c(11,42,-3,14,5)              # store these 5 numbers in vector x
x <- c(1:10)                       # store integers 1 to 10
x <- c("Watson","Crick","Wilkins") # use quotes for character data```

Use the `seq` function to generate a sequence of numbers and store in a vector,

```x <- seq(0, 10, by=0.1)            # 0, 0.1, 0.2, ... 9.9, 10
```

(note: `seq` results that include decimals may not be exact — the result “0.2” may not be exactly equal to the number 0.2 unless rounded using the “round” command)

Use `rep` to repeat values a specified number of times and store to a vector,

```x <- rep(c(1,2,3), c(2,1,4))       # 1 1 2 3 3 3 3
```

To view contents of any object, including a vector, type its name and enter, or use the `print` command,

```x                  # print "x" to the screen
print(x)           # do the same
```

### Delete a vector

The following command removes the vector `x` from the local R environment.

```rm(x)
```

### Access elements of a vector

Use integers in square brackets to indicate specific elements of a vector. For example,

```x[5]           # 5th value of the vector x
x[2:6]         # 2nd through 6th elements of x
x[2:length(x)] # everything but the first element
x[-1]          # everything but the first element
x[5] <- 4.2    # change the value of the 5th element to 4.2```

### Math with vectors

These operations are carried out on every element of the vector

```x + 1          # add 1 to each element of x
x^2            # square each element of x
x/2            # divide each element of x by 2
10 * x         # multiply each element of x by 10```

Operations on two vectors x and y work best when both are the same length (have the same number of elements). For example

```x * y        # yields a new vector whose
# elements are x[1]*y[1], x[2]*y[2], ... x[n]*y[n]
```

If `x` and `y` are not the same length, then the shorter vector is elongated by starting again at the beginning.

## Useful vector functions

Here is a selection of useful functions for data vectors. Many of the functions will also work on other data objects such as data frames, possibly with different effects.

### Transform numerical data

The most common data transformations, illustrated using the single variable `x`.

```sqrt(x)          # square root
sqrt(x + 0.5)    # modified square root transformation
log(x)           # the natural log of x
log10(x)         # log base 10 of x
exp(x)           # exponential ("antilog") of x
abs(x)           # absolute value of x
asin(sqrt(x))    # arcsine square root (used for proportions)
```

### Statistics

Here are a few basic statistical functions on a numeric vector named `x`. Most of them will require the `na.rm=TRUE` option if the vector includes one or more missing values.

```sum(x)                 # the sum of values in x
length(x)              # number of elements (including missing)
mean(x)                # sample mean
var(x)                 # sample variance
sd(x)                  # sample standard deviation
min(x)                 # smallest element in x
max(x)                 # largest element in x
range(x)               # smallest and largest elements in x
median(x)              # median of elements in x
quantile(x)            # quantiles of x
unique(x)              # extracts only the unique values of x
sort(x)                # sort, smallest to largest
weighted.mean(x, w)    # weighted mean
```

### Functions for character data (strings)

```casefold(x)              # convert to lower case
casefold(x, upper=TRUE)  # convert to upper case
substr(x, 2, 4)          # extract 2nd to 4th characters of each element of x
paste(x, "ly", sep="")   # paste "ly" to the end of each element in x
nchar(x)                 # no. of characters in each element of x
grep("a", x)             # which elements of x contain letter "a" ?
grep("a|b", x)           # which elements of x contain letter "a" or letter "b"?
strsplit(x, "a")         # split x into pieces wherever the letter "a" occurs```

### Functions for factors

A factor is like a character variable except that its unique values represent “levels” that have names but also have a numerical interpretation. The following commands are useful if `x` is a factor variable (a vector).

```levels(x)                   # show the unique values of a factor variable
droplevels(x)               # delete unused levels of a factor variable
as.character(x)             # convert values of a factor to character strings instead
as.numeric(as.character(x)) # convert numbers in "x" from factors to numeric type
```

### TRUE and FALSE (logical) data

Vectors can be assigned logical measurements, directly or as the result of a logical operation. Here’s an example of direct assignment.

`z <- c(TRUE, TRUE, FALSE)  # put 3 logical values to a vector z`

Logical operations can identify and select those vector elements for which a condition is TRUE. The comparison operations include

``` ==  (equal to)
!=  (not equal to)
<   (less than)
<=  (less than or equal to)
%in% (is an element of)
```

and so on.

For example, put the following numbers into a vector `z`,

```z <- c(2, -1, 3, 99, 8 )
```

The following logical operations and functions yield the results shown on the right

```z <= 3             # TRUE TRUE TRUE FALSE FALSE (for each element of z)
!(z < 3)           # FALSE FALSE TRUE TRUE TRUE
z[z != 3]          # 2 -1 99  8, the elements of z for which the condition is TRUE
which(z >= 4)      # 4 5, the indices for elements of z satisfying the condition
is.na(z)           # FALSE FALSE FALSE FALSE FALSE
any(z < 0)         # TRUE
all(z > 0)         # FALSE
99 %in% z          # TRUE
100 %in% z         # FALSE
```

The logical operators “&” and “|” refer to AND and OR. For example, put the following numbers into a vector `z`,

```z <- c(-10, -5, -1, 0, 3, 92)
```

The following operations yield the results shown on the right

```z < 0 & abs(z) > 5     # TRUE FALSE FALSE FALSE FALSE FALSE
z[z < 0 | abs(z) > 5]  # -10  -5  -1  92
```

### What am I?

These functions return TRUE or FALSE depending on the structure of `x` and its data type.

```is.vector(x)
is.character(x)
is.numeric(x)
is.integer(x)
is.factor(x)```

## Combine vectors to make a data frame

Vectors representing different variables measured made on the same unit can be made into columns of a data frame. A data frame is a spreadsheet-like object containing a data set. See the “Data” tab for tips on working with data frames. Here we show how to make a data frame by combining vectors of the same length. The vectors need not be of the same data type.

First, obtain your vectors. For example,

```quadrat <- c(1:7)
site <- c(1,1,2,3,3,4,5)
species <- c("a","b","b","a","c","b","a")
```

Now combine them into a data frame named `mydata`.

```mydata <- data.frame(quadrat = quadrat, site = site, species = species,
stringsAsFactors = FALSE)
```

The argument `stringsAsFactors = FALSE` is optional but recommended to preserve character data (otherwise character variables are converted to factors).

You can accomplish the same job using the `tibble` command in the `dplyr` package (you'll need to install the package if you have not already done so using `install.packages()`).

```library(dplyr)                                                       # load package
mydata <- tibble(quadrat = quadrat, site = site, species = species)  # dplyr method
```

## Deal with missing values

Missing values in R are indicated with NA.

```x[5] <- NA        # assign "missing" to the 5th element of x
x[x == -99] <- NA # change all instances of -99 in x to missing
which(is.na(x))   # identify which element(s) is missing
```

Some functions will treat NA as valid entries. For example, the length of a vector (number of elements) includes missing values in the count.

```length(x)
```

Some functions won't work on variables that include missing values unless default options are modified. For example, if you try to calculate the mean of a vector that contains missing values you will get `NA` as your result. Most functions have an option "na.rm" that ignores the missing values when calculating.

```x <- c(1,2,3,4,5,NA)  # a vector with one missing value
mean(x)               # result is NA
mean(x, na.rm = TRUE) # result is the mean of non-missing values of x
```

As usual, there's more than one way to solve the problem. For example, you can create a new variable that contains only the non-missing values, but this requires an extra step so it not preferred:

```x1 <- na.omit(x)           # put the non-missing values of x into new vector x1
x1 <- x[complete.cases(x)] # same
x1 <- x[!is.na(x)])        # same
length(x1)                 # count the number of non-missing values
```

If R is missing a needed function write your own. Here's an example of a function named `sep()` that calculates the standard error of an estimate of a proportion. The argument `n` refers to sample size, and `X` is the number of "successes" (e.g., the number of females in the sample, the number of infected individuals, etc.).

```sep <- function(X, n){
p.hat <- X / n                  # The proportion of "successes"
sep <- sqrt(p.hat*(1-p.hat)/n)  # The standard error of p.hat
return(sep)                     # Return the standard error as the result
}
```

To use the function, copy it to your clipboard. Then paste it into your command window and hit the enter key. (On a Mac, you may need to use the R Edit menu to "Paste as Plain Text" to avoid formatting problems.) The function `sep()` will be stored in your R workspace so you only need to paste it once. If you save your workspace when you exit R it will remain there when you start up again -- otherwise you'll need to paste it in again.

To use the function on some data, for example `n`=20 and `X`=10, enter

```sep(X = 10, n = 20) # yields the standard error
sep(10,20)          # shorthand ok if X and n are given in correct order
```

## Paste clipboard contents to a vector

To demonstrate, select the following 10 numbers with your mouse and copy to your clipboard:
76 75 -52 -70 52 8 -50 -6 57 5
(choose Edit -> Copy on your browser menu to copy to clipboard)
Then execute the following command in your R command window:

```z <- scan("clipboard", what=numeric())             # on a PC
z <- scan(pipe("pbpaste"), what=numeric())         # on a Mac```

To paste characters instead of numbers, use the following,

```z <- scan("clipboard", what=character())           # PC
z <- scan(pipe("pbpaste"), what=character())       # Mac```

If characters or numbers of interest are separated by commas, use

```z <- scan("clipboard", what=character(), sep=",")      # PC
z <- scan(pipe("pbpaste"), what=character(), sep=",")  # Mac
```