Tips and recommendations for making graphs and tables in R.


Read me

Base R vs tidyverse

Base R has a lot of built-in plot functions that are easy to use. For example, to examine an association between variables x and y you can often just enter

plot(y ~ x)

to get an unfussy plot. But to make the plots beautiful a lot of options need to be added. Most plotting functions will accept the same arguments to control plot elements. For the full list of plotting options in base R, get help as follows.

?par

The tidyverse is a collection of contributed packages for R that share a common philosophy about data and syntax and work well together. Two of the most-used packages are ggplot2 for graphics and dplyr for data manipulation.

Building a graph using the ggplot() command involves adding components or “layers” including data, “aesthetics” that map variables to visuals, and “geoms” that create different kinds of plots. For example, a basic scatter plot of yvar against xvar has the following components.

ggplot(data = mydata, mapping = aes(x = xvar, y = yvar)) + 
    geom_point()

The default ggplot() style has too much chartjunk for this writer. Options to alter the theme and other graph components are shown in the tips for specific graphs below.

Hadley Wickham’s book (Wickam, H. 2016. ggplot2: Elegant graphics for data analysis. 2nd edition) is the standard ggplot2 reference, but plenty of introductory resources are available online (e.g., this one)


Packages used on this page

library(ggplot2)
library(dplyr)
library(tidyr)

# Single-use packages
library(ggmosaic) # mosaic plot, works with ggplot2
library(vioplot) # violin plot for base R


Data sets used

Examples on this page make use of the following data sets, which you can read using the commands below.

bbird <- read.csv("https://www.zoology.ubc.ca/~schluter/R/csv/BlackbirdTestosterone.csv")

BCfires <- read.csv("https://www.zoology.ubc.ca/~schluter/R/csv/BCfires.csv")
BCfires1 <- subset(BCfires, SIZE_HA >= 1.0)

BCfireFrequency1 <- read.csv("https://www.zoology.ubc.ca/~schluter/R/csv/BCfireFrequency1.csv", row.names=1)

birthDay <- read.csv("https://www.zoology.ubc.ca/~schluter/R/csv/birthDay2016.csv")

mink <- read.csv("https://www.zoology.ubc.ca/~schluter/R/csv/minkBrains.csv")

stickle <- read.csv("https://www.zoology.ubc.ca/~schluter/R/csv/schluterMcphail1992.csv")

titanic <- read.csv("https://www.zoology.ubc.ca/~schluter/R/csv/Titanic.csv")

bbird has data on immunocompetence of red-winged blackbirds before and after testosterone implants. From Hasselquist et al. (1999, Behavioral Ecology and Sociobiology).

BCfires contains data on all forest fires in the province of British Columbia, Canada, from 1950 to 2021 (downloaded from the Canadian National Fire DataBase and cleaned up a little). BCfires1 is the subset of all fires at least 1 ha in size, and BCfireFrequency1 is the number of fires greater than 1 ha in size per month and year. The data set is not perfect – small fires weren’t reported in the early days.

birthDay contains data on the week day of birth of a random sample of 350 babies born in the USA in 1999 (example from Whitlock & Schluter (2020, The Analysis of Biological Data).

mink is a data set on brain sizes of wild, farmed, and feral mink by Pohle et al (2023, Royal Society Open Science). I adjusted brain size for body size differences (condylobasal length, cbl) using common principal components (cpcbp package).

stickle contains population means of males and females from threespine stickeleback populations. The data are from Schluter and McPhail (1992, American Naturalist).

titanic contains survival and gender data of passengers on the disastrouas voyage of the Titanic.


Save graph to pdf file

In base R, you can send your plot to a pdf file instead of the screen as follows.

pdf(file = "mygraph.pdf") # opens the pdf device for plotting
plot(...)                 # Issue your R commands here to generate plot
dev.off()                 # closes the device when you are done

If you are using ggplot(), follow your ggplot() command with the following line and it will save it to the named pdf file.

ggsave("mygraph.pdf")

Frequency tables

For a categorical variable

This frequency table counts the number (frequency) of cases in each category of a categorical variable A. The optinal useNA argument adds the category NA in the table if one or more cases is missing.

# base R
table(mydata$A, useNA = "ifany")

# dplyr method
summarize(group_by(mydata, A), Frequency = n())

For example, the following data on gender and survival of people on the Titanic.

head(titanic)
##   gender survival
## 1   Male Survived
## 2   Male Survived
## 3   Male Survived
## 4   Male Survived
## 5   Male Survived
## 6   Male Survived

To generate a frequency table of the number of passengers who survived and the number that died, use either of the following two methods

# base R
table(titanic$survival, useNA = "ifany") # base R
## 
##     Died Survived 
##     1438      654
# dplyr method
summarize(group_by(titanic, survival), Frequency = n()) # dplyr
## # A tibble: 2 × 2
##   survival Frequency
##   <chr>        <int>
## 1 Died          1438
## 2 Survived       654


Two variables (contingency table)

To generate a two-way frequency table for two categorical variables, A and B, use the following command syntax.

table(mydata$A, mydata$B, useNA = "ifany")

For example, the Titanic data set contains survival information for both men and women.

table(titanic$survival, titanic$gender, useNA = "ifany")
##           
##            Female Male
##   Died        109 1329
##   Survived    316  338

To add the row and column sums to the table, use addmargins(). For example,

mytab <- table(titanic$survival, titanic$gender)
addmargins(mytab, margin = c(1,2), FUN = sum, quiet = TRUE)
##           
##            Female Male  sum
##   Died        109 1329 1438
##   Survived    316  338  654
##   sum         425 1667 2092

dplyr() can make a contingency table but needs the spread() command from the tidyr package.

For example, using the Titanic data set read above,

mytab <- summarize(group_by(titanic, survival, gender), n = n())
mytab <- spread(mytab, gender, n)
mytab
## # A tibble: 2 × 3
## # Groups:   survival [2]
##   survival Female  Male
##   <chr>     <int> <int>
## 1 Died        109  1329
## 2 Survived    316   338

Zero table counts are given as NA. There’s an optional fix for that:

mutate(mytab, across(everything(), ~replace_na(., 0)))


Flat frequency table

The following commands generate a “flat” frequency table for two categorical variables, A and B. In a flat table, A and B are separate columns of a table, and a third column tallies the frequencies of each combination. Start with the table() command but include the names of the variables as an argument.

mytab <- table(mydata$A, mydata$B, dnn = c("A", "B"))
data.frame(mytab)

For example, using the Titanic data,

mytab <- table(titanic$survival, titanic$gender, dnn = c("survival", "gender"))
data.frame(mytab)
##   survival gender Freq
## 1     Died Female  109
## 2 Survived Female  316
## 3     Died   Male 1329
## 4 Survived   Male  338

The count() method of dplyr can make a flat table but will tally only the combinations of categories that have a frequency greater than 0. Use the complete() command from the tidyr package to add the 0’s (shown here, but not actually needed in this example).

mytab <- count(titanic, survival, gender)
complete(mytab, survival, gender, fill = list(n = 0))
## # A tibble: 4 × 3
##   survival gender     n
##   <chr>    <chr>  <int>
## 1 Died     Female   109
## 2 Died     Male    1329
## 3 Survived Female   316
## 4 Survived Male     338

Tables of means, sums, etc

R has efficient ways to get means, medians, sums, and other descriptive statistics by column or by row of data frames.


Row & column means, etc

For an example, we’ll use data on the average number of fires (at least 1 ha in size) in BC each month. The mean number of fires in each month is averaged over the years 1950 to 2021.

head(BCfireFrequency1)
##      Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec
## 1950   0   0   4   3  36  87  72  34 103   3   0   1
## 1951   0   0   2  84  26  65 115 120  68   0   1   0
## 1952   0   0   4  17  94  16  60 107  69  89   1   0
## 1953   0   0   1  10  59  12  34  53  16   1   0   0
## 1954   0   0   8  16  62  13  30  10   2   1   0   1
## 1955   0   0   0   4  42  59  28  36  29   2   0   6

Use colMeans to obtain the mean number of fires in each month averaged over years. To view the resulting means, always arrange them vertically - data.frame() or cbind() will accomplish this. Here, I’ve also rounded the means to one decimal place using round().

meanNoFires <- colMeans(BCfireFrequency1)
data.frame(meanNoFires = round(meanNoFires, 1))
##     meanNoFires
## Jan         0.4
## Feb         0.6
## Mar         7.1
## Apr        49.9
## May        71.8
## Jun        49.4
## Jul        84.0
## Aug        78.9
## Sep        38.8
## Oct        20.5
## Nov         1.5
## Dec         0.3

Use rowSums to obtain the the number of fires (at least 1 ha in size) in June, July, and August of each year. I use head() and tail() to print the first 6 and the last 6 years.

nFiresJune_to_Aug <- rowSums(BCfireFrequency1[, c("Jun","Jul","Aug")])
head(data.frame(nFiresJune_to_Aug))
##      nFiresJune_to_Aug
## 1950               193
## 1951               300
## 1952               183
## 1953                99
## 1954                53
## 1955               123
tail(data.frame(nFiresJune_to_Aug))
##      nFiresJune_to_Aug
## 2016                96
## 2017               249
## 2018               564
## 2019                62
## 2020                70
## 2021               367

colSums() works the same way. Here is the total number of files at least 1 ha in size in each month from 1950 to 2021.

nFiresByMonth1950_to_2021 <- colSums(BCfireFrequency1, na.rm = TRUE)
data.frame(nFiresByMonth1950_to_2021)
##     nFiresByMonth1950_to_2021
## Jan                        29
## Feb                        43
## Mar                       514
## Apr                      3593
## May                      5170
## Jun                      3555
## Jul                      6051
## Aug                      5680
## Sep                      2794
## Oct                      1474
## Nov                       106
## Dec                        25


More row & column stats

More generally, use apply() with MARGIN = 1 to calculate other statistics by row of a data frame, and with MARGIN = 2 to get other statistics by column.

For example, to get the median number of fires in BC (at least 1 ha in size) per year from 1950 to 2021 for every month (i.e., columns of the data frame), use the following.

medianNoFires <- apply(BCfireFrequency1, MARGIN = 2, median)
data.frame(medianNoFires)
##     medianNoFires
## Jan           0.0
## Feb           0.0
## Mar           5.0
## Apr          44.5
## May          65.0
## Jun          34.0
## Jul          61.5
## Aug          59.5
## Sep          24.0
## Oct          12.0
## Nov           0.0
## Dec           0.0

Means, etc, by group

Several methods calculate descriptive statistics for each level or category of a categorical variable (i.e., “by group”).


Single numeric variable

Here is how to generate a table of group means for a single numeric variable y, where A is the categorical variable whose levels are the groups.

Arguments associated with the function used (e.g., na.rm = TRUE to the function mean()), are given right after FUN.

# result is a vector
tapply(mydata$y, INDEX = mydata$A, FUN = mean, na.rm = TRUE)

# result is a data frame
aggregate(y ~ A, data = mydata, FUN = mean, na.rm = TRUE)

# dplyr method; result is a data frame
summarize(group_by(mydata, A), ybar = mean(y, na.rm = TRUE)) 

I illustrate by calculating the total (summed) area burned each year the forest fires data set, BCfires1, which includes only fires at least 1 ha in size.

# base R
totAreaBurned <- tapply(BCfires1$SIZE_HA, INDEX = BCfires1$YEAR, FUN = sum, na.rm = TRUE)
head(data.frame(totAreaBurned))
##      totAreaBurned
## 1950      343038.4
## 1951      123688.2
## 1952       61372.3
## 1953       15397.5
## 1954        4126.2
## 1955       22748.4
totAreaBurned <- aggregate(SIZE_HA ~ YEAR, data = BCfires1, FUN = sum, na.rm = TRUE)
head(totAreaBurned)
##   YEAR  SIZE_HA
## 1 1950 343038.4
## 2 1951 123688.2
## 3 1952  61372.3
## 4 1953  15397.5
## 5 1954   4126.2
## 6 1955  22748.4
# dplyr method
totAreaBurned <- summarize(group_by(BCfires1, YEAR), totAreBurned = sum(SIZE_HA, na.rm = TRUE)) 
head(totAreaBurned)
## # A tibble: 6 × 2
##    YEAR totAreBurned
##   <int>        <dbl>
## 1  1950      343038.
## 2  1951      123688.
## 3  1952       61372.
## 4  1953       15398.
## 5  1954        4126.
## 6  1955       22748.


More variables at once

With the same data set, I show two different methods to calculate both the median fire size and median fire latitude by year with a single command.

# base R
z <- aggregate(cbind(SIZE_HA, LATITUDE) ~ YEAR, 
        FUN = median, na.rm = TRUE, data = BCfires1)
head(z)
##   YEAR SIZE_HA LATITUDE
## 1 1950     8.0  51.4340
## 2 1951    10.1  50.8470
## 3 1952    10.1  50.9630
## 4 1953     2.8  50.8035
## 5 1954     4.8  50.3760
## 6 1955     4.0  51.9340
# dplyr method
z <- summarize(group_by(BCfires1, YEAR), 
          medianFireSize = median(SIZE_HA, na.rm = TRUE),
          medianLatitude = median(LATITUDE, na.rm = TRUE)) 
head(z)
## # A tibble: 6 × 3
##    YEAR medianFireSize medianLatitude
##   <int>          <dbl>          <dbl>
## 1  1950            8             51.4
## 2  1951           10.1           50.8
## 3  1952           10.1           51.0
## 4  1953            2.8           50.8
## 5  1954            4.8           50.4
## 6  1955            4             51.9


Two or more grouping variables

Descriptive statistics can be grouped by more than one attribute at once.

For example, here are three ways to calculate median fire size in BC, in ha, grouped by year and month.

# NA if no fires in that year and month
z <- tapply(BCfires1$SIZE_HA, INDEX = list(BCfires1$YEAR, BCfires1$MONTH), 
        FUN = median, na.rm = TRUE)
head(z)
##       1  2      3     4    5    6    7    8    9  10   11    12
## 1950 NA NA   5.00  2.00  4.0 14.1  5.0  6.2 10.1 1.2   NA  4.00
## 1951 NA NA 285.15 11.90  5.2  7.2 14.1 12.1  8.0  NA 32.3    NA
## 1952 NA NA  10.65  4.80 14.9  2.0 13.1  8.0  8.0 8.0 26.3    NA
## 1953 NA NA   1.30 11.70  9.7  2.2  2.6  2.4  1.8 2.4   NA    NA
## 1954 NA NA  16.65  9.05  4.0 14.1  4.8  2.6  2.6 1.4   NA 12.10
## 1955 NA NA     NA 11.05  6.0  6.0  2.6  4.3  2.8 5.0   NA 60.65
z <- aggregate(SIZE_HA ~ MONTH + YEAR, data = BCfires1, 
        FUN = median, na.rm = TRUE)
head(z)
##   MONTH YEAR SIZE_HA
## 1     3 1950     5.0
## 2     4 1950     2.0
## 3     5 1950     4.0
## 4     6 1950    14.1
## 5     7 1950     5.0
## 6     8 1950     6.2
# dplyr method
BCfires1.grouped <- group_by(BCfires1, YEAR, MONTH)
z <- summarize(BCfires1.grouped, 
        totAreBurned = median(SIZE_HA, na.rm = TRUE))
head(z)
## # A tibble: 6 × 3
## # Groups:   YEAR [1]
##    YEAR MONTH totAreBurned
##   <int> <int>        <dbl>
## 1  1950     3          5  
## 2  1950     4          2  
## 3  1950     5          4  
## 4  1950     6         14.1
## 5  1950     7          5  
## 6  1950     8          6.2


More functions at once

The dplyr summarize() command can calculate more than one descriptive statistic at once. Here we calculate both total area burned and median fire size (ha) by year.

z <- summarize(group_by(BCfires1, YEAR), 
        totAreBurned = sum(SIZE_HA, na.rm = TRUE),
        medianFireSize = median(SIZE_HA, na.rm = TRUE)) 
head(z)
## # A tibble: 6 × 3
##    YEAR totAreBurned medianFireSize
##   <int>        <dbl>          <dbl>
## 1  1950      343038.            8  
## 2  1951      123688.           10.1
## 3  1952       61372.           10.1
## 4  1953       15398.            2.8
## 5  1954        4126.            4.8
## 6  1955       22748.            4



Bar graph

To show a frequency distribution for a single actegorical variable.


Basic bar graph

Here’s the basic method. A is a categorical variable identifying groups.

# base R
barplot(table(mydata$A), col = "firebrick", 
  space = 0.2, cex.names = 1.2)

# Using ggplot
ggplot(mydata, aes(x = A)) + 
    geom_bar(stat="count", fill = "firebrick")

For an example, I use the birthDay data set, which has the week day of birth of a random sample of 350 babies born in the USA in 1999. The option las = 2 sets all the axis labels perpendicular to the axes.

# Base R
barplot(table(birthDay$day), col = "firebrick", space = 0.2, 
    las = 2, ylab = "Frequency")

Using ggplot() instead:

ggplot(birthDay, aes(x = day)) + 
    geom_bar(stat="count", fill = "firebrick") +
    labs(x = "", y = "Frequency") +
    theme_classic() +
    theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.3))


Reorder the categories

R arranges the categories of a character variable in alphabetical order, which is rarely desired. Convert to a factor using the factor() command to specify a meaningful order for the categories using thelevels option. For example, put the weekdays in the proper consecutive order, then redraw.

birthDay$day <- factor(birthDay$day, levels = c("Monday","Tuesday",
        "Wednesday", "Thursday", "Friday", "Saturday", "Sunday"))
barplot(table(birthDay$day), col = "firebrick", space = 0.2, 
    las = 2, ylab = "Frequency")

Often the variable of interest might lack a natural ordering of categories and then the best option is to plot the bars in order of decreasing frequency.

# Using base R
barplot(sort(table(birthDay$day), decreasing = TRUE), 
    col = "firebrick", space = 0.2, las = 2, ylab = "Frequency")

# Using ggplot(), with additional text options
birthDay$day_ordered <- factor(birthDay$day, 
    levels = names(sort(table(birthDay$day), decreasing = TRUE)) )
ggplot(birthDay, aes(x = day_ordered)) + 
    geom_bar(stat="count", fill = "firebrick") +
    labs(x = "Weekday", y = "Frequency") +
    theme_classic() +
  theme(axis.text.x = element_text(angle = 90, hjust = 1, vjust = 0.3),
    text = element_text(size = 15), axis.text = element_text(size = 12))

If the frequencies are already tabulated in a data frame named mytab, then modify as follows. A is a variable in mytab that lists each named category exactly once and Freq is a variable containing the corresponding frequency of cases in each category.

mytab <- arrange(mytab, desc(Freq))
barplot(mytab$Freq, names.arg = mytab$A, col = "firebrick")

# or
mytab$A_ordered <- factor(mytab$A, levels = mytab$A[order(mytab$Freq, decreasing = TRUE)] )
ggplot(mytab, aes(x = A_ordered, y = Freq)) + 
    geom_bar(stat="identity", fill = "firebrick") +
    labs(x = "A group", y = "Frequency") +
    theme_classic() +
    theme(text = element_text(size = 15), 
    axis.text = element_text(size = 12), aspect.ratio = 0.8)


Grouped bar graph

This and the mosaic plot (next) compare the frequencies of a categorical variable by two (or more) groups.

The basic bar plot command is below A and B can be factors or character variables in a data frame mydata.

# Basic bar plot in base R
barplot(table(mydata$A, mydata$B), beside = TRUE)

I illustrate using the association between passenger survival and gender in the Titanic data.

# Order the survival categories so that "Survived" comes first
titanic$survival <- factor(titanic$survival, levels = c("Survived", "Died"))
barplot(table(titanic$survival, titanic$gender), beside = TRUE, 
    las = 1, col = c("goldenrod1", "firebrick"), cex.names = 0.8, space = c(0.2,0.8),
    xlab = "Gender", ylab = "Frequency", legend.text = TRUE)

Here’s the grouped bar graph using ggplot(). The argument position_dodge2(preserve="single") handles category combinations having a count of 0 (not an issue with this example).

ggplot(titanic, aes(x = gender, fill = survival)) + 
  geom_bar(stat = "count", position = position_dodge2(preserve="single")) +
  scale_fill_manual(values=c( "goldenrod1", "firebrick")) +
  labs(x = "Gender", y = "Frequency") +
  theme_classic() +
  theme(aspect.ratio = 0.80)



Mosaic plot

Base R

Here’s the basic command in base R.

mytable <- table(mydata$B, mydata$A)
mosaicplot(t(mytable), color = c("red","blue","yellow"), 
           las = 2, cex.axis = 0.8)

Construct the table with variables in the order shown above, so that B is the response (dependent) variable and A is the explanatory variable. Then use the t() function to transpose the table in the mosaicplot() function so that both the table and the mosaic plot have the explanatory variable on the x axis and the response variables on the y axis.

For example, the association between passenger survival and gender can be viewed in a mosaic plot using the following data.

# Order the categories so that "Survived" comes first so on top
titanic$survival <- factor(titanic$survival, levels = c("Survived", "Died"))
titanicTable <- table(titanic$survival, titanic$gender)
titanicTable
##           
##            Female Male
##   Survived    316  338
##   Died        109 1329
par(pty = "s") # makes a square plot
mosaicplot( t(titanicTable), col = c("goldenrod1", "firebrick"), cex.axis = 0.8, 
    sub = "Gender", ylab = "Relative frequency", main = "")


ggmosaic package

To draw a mosaic plot using ggplot, you’ll first need to install and load the ggmosaic package.

# Order the categories so that "Died" comes first so on bottom
titanic$survival <- factor(titanic$survival, levels = c("Died", "Survived"))
ggplot(titanic) + 
  geom_mosaic(aes(x = product(survival, gender), fill=survival)) +
  labs(x = "gender") +
  theme_classic() + 
  scale_fill_manual(values=c("firebrick", "goldenrod1")) +
  theme_classic() +
  theme(aspect.ratio = 1) + 
  guides(fill = guide_legend(title = "Survival", reverse = TRUE))



Histogram

Show the freqency distribution of a numeric variable.


Basic histogram

The basic command is

hist(mydata$x, right = FALSE)

The option right = FALSE causes all the histogram bins, except the last one to be left-closed. In other words, the value 1 would appear in the interval 1-2 rather than in the interval 0-1, which is the convention.

For example, here is a histogram of the frequency of forest fires in British Columbia, Canada, from 1950 to 2021, by month. Even using right = FALSE R has put December fires into the 11-12 month bin. To control this, use breaks = seq(0, 13, by = 1) instead, as in the subsequent plot below.

hist(BCfires1$MONTH, right = FALSE, col = "firebrick", breaks = 12, 
    xlab = "Month", ylab = "Forest fires 1950 - 2021", main = "", las = 1)

Or use ggplot()

ggplot(BCfires1, aes(x = MONTH)) + 
    geom_histogram(fill = "firebrick", col = "black",  
      breaks = seq(0, 13, 1), closed = "left") + 
    labs(x = "Month", y = "Forest fires 1950 - 2021") +
    theme_classic() + 
    theme(aspect.ratio = 0.80)


Probability density

To display probability density in base R instead of raw frequencies, add prob = TRUE to your arguments. I show this in the next section, where I also add a normal curve to the plot

# Not run
hist(BCfires1$MONTH, right = FALSE, col = "firebrick", breaks = 12, 
    xlab = "Month", ylab = "Forest fires 1950 - 2021", main = "", 
    las = 1, prob = TRUE)

If using ggplot(),

# Not run
ggplot(BCfires1, aes(x = MONTH)) + 
    geom_histogram(aes(y = after_stat(density)), fill = "firebrick",   
      col = "black", breaks = seq(0, 13, 1), closed = "left") + 
    labs(x = "Month", y = "Probability density") +
    theme_classic() + 
    theme(aspect.ratio = 0.80)


Add a normal curve

To superimpose a normal density curve on a probability density histogram, try the following commands. First, 101 evenly spaced points along the x axis, are made between the smallest and largest data value using seq. Then dnorm() generates the normal density at each x point, using the mean and standard deviation of the data.

hist(BCfires1$MONTH, right = FALSE, col = "firebrick", breaks = 12, 
    xlab = "Month", ylab = "Forest fires 1950 - 2021", main = "", 
    las = 1, prob = TRUE, ylim = c(0, 0.25))
  m <- mean(BCfires1$MONTH, na.rm = TRUE)
  s <- sd(BCfires1$MONTH, na.rm = TRUE)
  xpts <- seq(from = min(BCfires1$MONTH, na.rm=TRUE), 
    to = max(BCfires1$MONTH, na.rm = TRUE), 
    length.out = 101)
lines(dnorm(xpts, mean=m, sd=s) ~ xpts, col="red", lwd=2)

With ggplot(), add a stat_function() to get the same,

# Not run
ggplot(BCfires1, aes(x = MONTH)) + 
  geom_histogram(aes(y = after_stat(density)), fill = "firebrick",   
    col = "black", breaks = seq(0, 13, 1), closed = "left") + 
  stat_function(fun = dnorm, 
    args = list(mean = mean(BCfires1$MONTH, na.rm = TRUE), 
    sd = sd(BCfires1$MONTH, na.rm = TRUE))) +
  labs(x = "Month", y = "Probability density") +
  theme_classic() + 
  theme(aspect.ratio = 0.80)


Histograms by group

Use facet_wrap() with ggplot() to create a vertical stack of histograms, allowing visual comparison.

The basic command is

ggplot(mydata, aes(x = x)) + 
    geom_histogram(closed = "left") +
    facet_wrap( ~ A, ncol = 1, scales = "free_y")

The example below used the mink data set on brain sizes of wild, farmed, and feral mink. The trend shows that farmed mink have small brains and feral mink regain the larger brains of their wild ancestors.

head(mink)
##   id_number category       date sex age_months   cbl   vol     vol3 vol3.cpc
## 1         1     farm 2009-11-10   F          6 67.18 41.32 3.457165 3.427563
## 2         2     farm 2009-11-10   F          6 64.22 38.97 3.390342 3.441955
## 3         3     farm 2009-11-10   F          6 64.88 41.23 3.454653 3.476658
## 4         4     farm 2009-11-10   F         18 64.98 40.36 3.430181 3.455552
## 5         5     farm 2009-11-10   F          6 65.31 37.86 3.357842 3.392389
## 6         6     farm 2009-11-10   F          6 66.28 36.09 3.304677 3.330347
# brain volume of males by category of mink
ggplot(subset(mink, sex == "M"), aes(x = vol3.cpc)) + 
    geom_histogram(fill = "firebrick", col = "black", closed = "left", 
        boundary = 0, binwidth = 0.02) +
    labs(x = "Cube root brain volume (mm)", y = "Frequency of males") +
    theme_classic() + 
    theme(aspect.ratio = 0.4) + 
    facet_wrap( ~ category, ncol = 1, scales = "free_y")

Accomplishing the same in base R is more strenuous.

# Not run
par(mfrow=c(3,1), mar = c(4, 4, 2, 1), cex = 0.7) 
for( i in unique(mink$category) ){
    dat <- subset(mink, category == i & sex == "M")
    hist(dat$vol3.cpc, breaks = seq(3.3, 3.7, by = 0.02), right = FALSE, 
    xlim = range(mink$vol3.cpc, na.rm = TRUE), 
        col="firebrick", main = i, 
        xlab = "", ylab = "Frequency")
  }
mtext("Cube root brain volume (mm)", side = 1, padj = 3)



Normal quantile plot

Compare the distribution of the variable x to the normal distribution.

qqnorm(mydata$x)
qqline(mydata$x)  # adds line through first and third quartiles

The same can be accomplished in ggplot as follows.

ggplot(mydata, aes(sample = x)) +
    geom_qq() +
    geom_qq_line() +
    theme_classic()

Strip chart

A strip chart (a.k.a dot plot) is like a scatter plot but the horizontal (explanatory) variable is categorical.


Basic strip chart

The basic command is as follows. A is the name of the categorical grouping variable and y is the name of the numeric response variable. A small amount of horizontal “jitter” is added to reduce overlap of points.

# base R
stripchart(y ~ A, vertical = TRUE, data = mydata, 
  method = "jitter")

# ggplot
ggplot(mydata, aes(A, y)) + 
  geom_jitter()

For example, I’m using the brain size in mink data set used earlier.

The option jitter controls the amount of horizontal “jitter”.

# brain volume of females by category of mink
stripchart(vol3.cpc ~ category, vertical = TRUE, 
  data = subset(mink, sex == "F"), 
  method = "jitter", jitter = 0.1, pch = 16, col = "goldenrod1", 
  las = 1, xlab = "Category (females)", ylab = "Cube root brain volume (mm)")

A similar plot using ggplot() is below. width controls the amount of horizontal “jitter”.

# Not run
ggplot(subset(mink, sex == "F"), aes(category, vol3.cpc)) +
    geom_jitter(color = "goldenrod1", size = 3, width = 0.1, alpha = 0.5) +
    labs(x = "Category (females)", y = "Cube root brain volume (mm)") +
    theme_classic() + 
    theme(aspect.ratio = 0.80, text = element_text(size = 12), 
          axis.text = element_text(size = 10))


Strip chart with error bars

Use separate stat_summary() commands to overlay means and standard error bars. Use fun.data = mean_cl_normal to get 95% confidence intervals instead of standard error bars. Shift the position of the error bars in the x direction using position_nudge() to eliminate overlap with the raw data.

ggplot(subset(mink, sex == "F"), aes(category, vol3.cpc)) +
  geom_jitter(color = "goldenrod1", size = 3, width = 0.2, alpha = 0.5) +
  stat_summary(fun.data = mean_se, geom = "errorbar", 
    width = 0.1, position = position_nudge(x = 0.3)) +
  stat_summary(fun = mean, geom = "point", 
  size = 2, position=position_nudge(x = 0.3)) +
  labs(x = "Category (females)", y = "Cube root brain volume (mm)") +
  theme_classic() + 
  theme(aspect.ratio = 0.80, text = element_text(size = 12), 
    axis.text = element_text(size = 10))

In base R you can also add error bars stripchart() by using integer numbers to indicate position of categories along the x-axis. For example, to add means and standard errors to a strip chart for a numeric variable y and a categorical variable A having four categories, use

stripchart(y ~ A, data = mydata, vertical = TRUE, method = "jitter", pch = 16)
m <- tapply(mydata$y, mydata$A, mean, na.rm = TRUE)
se <- tapply(mydata$y, mydata$A, 
          function(y){ sd(y, na.rm=TRUE)/sqrt(length(na.omit(y))) })
points( m ~ c(1:4 + 0.2) + 0.2, pch=16, col = "firebrick")
segments(x0 = c(1:4 + 0.2), y0 = m - se, 
          x1 = c(1:4 + 0.2), y1 = m + se, col = "firebrick")


Strip charts by group

A grouped strip chart displays a numeric response variable yfor different levels of two categorical variables. This can be accomplished in base R by overlaying multiple strip charts, but it is also possible in ggplot() using position_jitterdodge to reduce overlap among points.

ggplot(data = mink, aes(y = vol3.cpc, x = category, fill = sex, color = sex)) + 
  geom_point(size = 3, position = position_jitterdodge(jitter.width = 0.2), 
    dodge.width = 0.4, alpha = 0.5) +
  labs(x = "Category", y = "Cube root brain volume (mm)") +
  scale_color_manual(values=c("goldenrod1", "firebrick")) +
  theme_classic() +
  theme(aspect.ratio = 0.80, text = element_text(size = 16), 
    axis.text = element_text(size = 14))



Strip charts for paired data

If data are paired (two measurements of the same individual, two attributes of the same population, then paired points should be connected with a line to indicate this connection.


Strip chart with line segments

The following commands create a strip chart with lines connection the two measurements of the same unit in the two treatments. The data frame mydata includes the response variable y, the treatment variable A, and an id variable indicating identity of individuals measured under both treatments.

The following method works in base R.

interaction.plot(response = mydata$y, x.factor = mydata$A, trace.factor = mydata$id,
    legend = FALSE, lty = 1, xlab = "Treatment", ylab = "Y variable", 
    type = "b", pch = 16, las = 1)

For example, the following are data on immunocompetence of red-winged blackbirds before and after testosterone implants (bbird data set). The treatment variable is converted to a factor to order the categories.

bbird$treatment <- factor(bbird$treatment, levels = c("before","after"))
head(bbird)
##   blackbird competence logCompetence treatment
## 1         1        105          4.65    before
## 2         2         50          3.91    before
## 3         3        136          4.91    before
## 4         4         90          4.50    before
## 5         5        122          4.80    before
## 6         6        132          4.88    before
interaction.plot(response = bbird$competence, 
  x.factor = bbird$treatment, trace.factor = bbird$blackbird,
    legend = FALSE, lty = 1, xlab = "Treatment", ylab = "Immunocompetence", 
    type = "b", pch = 16, las = 1, col = "firebrick")

The following does the same with ggplot(). alpha makes the points partly transparent so that overlapping points can be partly resolved.

ggplot(bbird, aes(y = competence, x = treatment)) +  
    geom_line(aes(group = blackbird)) +
    geom_point(size = 3, col = "firebrick", alpha = 0.5) + 
    labs(x = "Treatment", y = "Immunocompetence") +
    theme_classic() + 
    theme(text = element_text(size = 14))


Paired data by group

The easiest ggplot method for comparing paired data among groups is to make a separate paired-data plot for each group using facet_wrap() and tweak the theme() options to make it look like a single, integrated plot.

The following example uses the stickle data set, containing population means of males and females from threespine stickeleback populations, to show that limnetic and benthic ecotypes repeatedly have opposite sexual size dimorphism.

head(stickle)
##     Lake Population  Ecotype    Sex length raker.number
## 1  Emily    Emily-B  benthic female  54.85        18.92
## 2   Enos     Enos-B  benthic female  51.26        18.67
## 3 Hadley   Hadley-B  benthic female  56.71        19.42
## 4 Paxton   Paxton-B  benthic female  52.07        17.79
## 5 Priest   Priest-B  benthic female  55.16        18.71
## 6  Emily    Emily-L limnetic female  44.44        23.63
ggplot(data = stickle, aes(y = length, x = Sex, fill = Sex, color = Ecotype)) + 
    geom_line(aes(group = Population), col = "black") +
    geom_point(aes(shape = Sex), size = 2.5, 
    position = position_dodge(width = 0.2), show.legend = TRUE) +
    facet_wrap(~ Ecotype, nrow = 1, strip.position = "top") +
    labs(x = "Sex", y = "Body length (mm)") +
    scale_color_manual(values=c("indianred4", "lightcoral", "darkcyan")) +
    scale_shape_manual(values = c(16, 17)) +
    theme_classic() +
    theme(strip.background = element_blank(), strip.text.x = element_blank())



Box plot & violin plot

The following code generates a box plot for the numeric variable yvar separately for every group identified in the categorical variable A.


Basic box plot

The following shows use of the formula method in barplot(), written as “response_variable ~ explanatory_variable”. If varwidth = TRUE, box width is proportional to the square root of sample size — set to FALSE if you want all boxes to have the same width.

boxplot(yvar ~ A, data = mydata, varwidth = TRUE, 
    ylab="y value", col = "firebrick", cex.axis = 0.8, las = 1)

For example, here is a box plot of of brain sizes of female mink. Box width reflects sample size.

head(mink)
##   id_number category       date sex age_months   cbl   vol     vol3 vol3.cpc
## 1         1     farm 2009-11-10   F          6 67.18 41.32 3.457165 3.427563
## 2         2     farm 2009-11-10   F          6 64.22 38.97 3.390342 3.441955
## 3         3     farm 2009-11-10   F          6 64.88 41.23 3.454653 3.476658
## 4         4     farm 2009-11-10   F         18 64.98 40.36 3.430181 3.455552
## 5         5     farm 2009-11-10   F          6 65.31 37.86 3.357842 3.392389
## 6         6     farm 2009-11-10   F          6 66.28 36.09 3.304677 3.330347
boxplot(vol3.cpc ~ category, data = subset(mink, sex == "F"), varwidth = TRUE, 
    ylab="Cube root brain volume (mm)", col = "goldenrod1", cex.axis = 0.9, 
    xlab = "Category (females)", las = 1)

A similar box plot can be made using ggplot(). If varwidth = TRUE, box width is proportional to the square root of sample size (set this option to FALSE if you want all boxes to have the same width).

For example, using the mink data set,

# Not run
ggplot(subset(mink, sex == "F"), aes(x = category, y = vol3.cpc)) +
    geom_boxplot(fill = "goldenrod1", varwidth = TRUE) + 
  labs(x = "Category (females)", y = "Cube root brain volume (mm)") + 
    theme_classic() + 
    theme(text = element_text(size = 14))


Box plots by group

A grouped box plot displays a numeric response variable y for different levels of two categorical variables, A and B. This can be accomplished in base R by overlaying multiple box plots, but it is easier to produce the plot with ggplot().

ggplot(data = mydata, aes(y = y, x = A, fill = B)) + 
    geom_boxplot(width = 0.6, position = position_dodge(width = 0.7)) +
    labs(x = "A variable", y = "y variable") +
    theme_classic()

Box width is controlled by width, whereas position_dodge(width = ) controls the horizontal distance between the adjacent boxes depicting different categories of the B variable within groups of the A variable.

For example, here is the grouped box plot for the mink data set on (size-corrected) brain sizes in farmed, wild, and feral minks. Compare with the earlier strip chart of the same data.

ggplot(data = mink, aes(y = vol3.cpc, x = category, 
    fill = sex)) + 
    geom_boxplot(width = 0.6, position = position_dodge(width = 0.7)) +
    labs(x = "Category", y = "Cube root brain volume (mm)") +
  scale_fill_manual(values=c("goldenrod1", "firebrick")) +
    theme_classic() +
    theme(aspect.ratio = 0.8, text = element_text(size = 14))


Violin plot

A violin plot shows the frequency distribution for a numerical variable (and its mirror image) for several groups. The frequency distribution is a kernel density estimate, which “smooths” the distribution. Some find this graph more intuitive than the box plot.

The basic ggplot() violin plot is as follows. yvar is a numeric variable and A is a categorical variable. stat_summary() ads a dot for each category mean.

ggplot(mydata, aes(x = A, y = yvar)) + 
    geom_violin(fill = "firebrick") + 
  stat_summary(fun.y = mean,  geom = "point", color = "black", 

For example, the following code generates a grouped violin plot for the mink brain size data set.

ggplot(data = mink, aes(y = vol3.cpc, x = category, 
    fill = sex)) + 
    geom_violin(position = position_dodge(width = 0.8)) +
    stat_summary(fun.y = mean,  geom = "point", color = "black", 
    position = position_dodge(width = 0.8)) +
    labs(x = "Category", y = "Cube root brain volume (mm)") +
  scale_fill_manual(values=c("goldenrod1", "firebrick")) +
    theme_classic() +
    theme(aspect.ratio = 0.8, text = element_text(size = 14))

The job can be done in base R, but you’ll need to install the vioplot package first using install.packages(), and then load it. Here’s the code applied to the mink data for males. You’ll need to tweak the value for h to control the degree of smoothing.

# Not run
vioplot(vol3.cpc ~ category, data = subset(mink, sex == "M"),
    col="firebrick", drawRect = FALSE, las = 1, h = 0.03,
  xlab = "Category (males)", ylab = "Cube root brain volume (mm)")



Scatter plots

Basic scatter plot

Here’s how to to produce a scatter plot for two numeric variables, x and y.

# formula method, base R
plot(y ~ x, data = mydata, pch = 16, col = 2)

# basic scatter plot in ggplot
ggplot(mydata, aes(x, y)) + geom_point()

For example, here is a scatter plot of brain size (vol3, cube root of volume) against a measure of body size (condylobasal length, cbl) in the mink data set. The option alpha makes the dots partly transparent.

ggplot(mink, aes(x = cbl, y = vol3)) + 
    geom_point(size = 2, col = "firebrick", alpha = 0.5) + 
    labs(x = "Condylobasal length (mm)", y = "Cube root of brain volume (mm)") +
    theme_classic() + 
    theme(aspect.ratio = 0.80)


Add a lowess/loess curve

To add a smooth curve through the data, use locally weighted regression. “Local” here means that y is predicted for each x using only data in the vicinity of that x, rather than all the data.

In base R you can control the size of the vicinity using the option f, which is a proportion between 0 (very narrow vicinity) and 1 (uses all the data). Try different values of f to best capture the relationship. The default is f = 2/3. The method works best if the variables are ordered by x.

plot(y ~ x, data = mydata)
x1 <- mydata$x[order(mydata$x)]
y1 <- mydata$y[order(mydata$x)]
lines(lowess(x1, y1, f = 0.5))

Using ggplot(), you also get SE’s of predicted values (set to FALSE if not desired).

ggplot(mink, aes(x = cbl, y = vol3)) + 
    geom_point(size = 2, col = "firebrick", alpha = 0.5) + 
  geom_smooth(method = "loess", linewidth = 1, col = "black", se = TRUE) +
    labs(x = "Condylobasal length (mm)", y = "Cube root of brain volume (mm)") +
    theme_classic() + 
    theme(aspect.ratio = 0.80)


Add regression line

To add the least squares regression line to a plot, use the following.

# Not run

# base R
plot(y ~ x, data = mydata)
z <- lm(y ~ x, data = mydata)
abline(z)

# in ggplot
ggplot(mydata, aes(x, y)) + 
    geom_point(size = 2, col = "firebrick") + 
    labs(x = "x variable", y = "y variable") + 
    geom_smooth(method = "lm", linewidth = 1, col = "black") +
    theme_classic() +
    theme(aspect.ratio = 0.80)

In base R you can use the cursor to click data points on the graph to identify individuals. The following code prints the row number when you click a point (change that by setting the labels option to a character variable that labels the point instead).

plot(y ~ x, data = mydata)
identify(mydata$x, mydata$y, labels = seq_along(mydata$x))


Scatter plots by group

A single scatter plot of y on x can show multiple groups if you vary the color and shape of dots.

In base R, use pch to vary the symbol, col to vary the color, and both to vary color and shape at the same time. The legend command adds a legend identifying the groups—click on the plot (inside the plot region) with your cursor to place the legend.

# Not run

plot(y ~ x, data = mydata, pch = as.numeric(factor(mydata$A)),
  col = as.numeric(factor(mydata$A)))
  legend( locator(1), legend = as.character(levels(factor(mydata$A))), 
    pch = 1:length(levels(factor(mydata$A))), 
    col=1:length(levels(factor(mydata$A))) )

To accomplish the same with ggplot(), specify which variable to vary by color and by shape within aes() mapping. For example in the mink data we can use color and shape to differentiate the three categories. The geom_smooth() command is used to fit each group with a regression line.

ggplot(subset(mink, sex == "M"), 
    aes(x = cbl, y = vol3, colour = category, shape = category)) + 
  geom_point(size = 2, alpha = 0.7) + 
  geom_smooth(method = "lm", linewidth = 1, se = FALSE) +
    labs(x = "Condylobasal length (mm)", 
    y = "Cube root of brain volume (mm)") +
  ggtitle("Mink brain and body size - males") +
  theme_classic() + 
  theme(aspect.ratio = 0.80, plot.title = element_text(hjust = 0.5))

To constrain regression lines for all the groups to have the same slopes (but different intercepts), some pre-calculations are required.

femaledata <- subset(mink, sex == "F")
fitmink <- lm(log(vol3) ~ log(cbl) + category, data = femaledata, 
    na.action = na.exclude)
femaledata$predict <- predict(fitmink)
ggplot(femaledata, aes(x = log(cbl), y = log(vol3), 
    colour = category, shape = category)) + 
  geom_point(size = 2, alpha = 0.8) + 
    geom_line(aes(y = predict)) + 
    labs(x = "Condylobasal length (mm)", 
    y = "Cube root of brain volume (mm)") +
  theme_classic() + 
  theme(aspect.ratio = 0.80)

Another way to make a scatter plot for multiple groups is to plot them separately in panes of a panel of plots. Use facet_wrap() with ggplot() for this purpose, as follows. The lattice package can also be used to make panels of plots, as described briefly at the bottom of this page.

ggplot(subset(mink, sex == "F"), 
    aes(x = cbl, y = vol3, colour = category, shape = category)) + 
  geom_point(size = 2, alpha = 0.7) + 
  geom_smooth(method = "lm", linewidth = 1, se = FALSE) +
    facet_wrap(~ category, ncol = 2) + 
    labs(x = "Condylobasal length (mm)", 
    y = "Cube root of brain volume (mm)") +
  theme_classic() + 
  theme(aspect.ratio = 0.80)


All scatter plot combos

The following command creates a single graph with all pairwise scatter plots for variables x1 to x5 in the data frame, mydata. The option gap adjusts the spacing between separate plots,

pairs(~ x1 + x2 + x3 +x4 + x5, data = mydata)



Line plot

Here’s how to plot a sequence of (x, y) points and connect the dots with lines. This is especially useful when the x-variable represents a series of points in time or across a spatial gradient.

In base R:

plot(y ~ x, data = mydata, pch=16)
plot(y[order(x)] ~ x[order(x)], data = mydata, type = "l", 
    lty = 3, lwd = 2, col = "red")

For example, the following plots the total area burned in BC forest fires from 1950 to 2021.

# First calculate the cumulative area of all fires of at least 1 ha each year
z <- aggregate(SIZE_HA ~ YEAR, data = BCfires1, FUN = sum, na.rm = TRUE)

# Line plot of log area
plot(log10(SIZE_HA) ~ YEAR, data = z, type = "l", las = 1, cex.axis = 0.7,
  xlab = "Year", ylab = "log10 total area burned (ha)")

# Optionall addition of points
points(log10(SIZE_HA) ~ YEAR, data = z, add = TRUE, pch = 16, cex = 0.5)

To do the same in ggplot, use the following. I added geom_smooth() to show the trend.

# First calculate the cumulative area of all fires of at least 1 ha each year
z <- aggregate(SIZE_HA ~ YEAR, data = BCfires1, FUN = sum, na.rm = TRUE)

# Line plot
ggplot(z, aes(x = YEAR, y = log10(SIZE_HA))) + 
  geom_line() +
  geom_point(size = 0.5) + 
  labs(x = "Year", y = "log10 total area burned") + 
  geom_smooth(method = "loess", linewidth = 1, col = "firebrick", se = FALSE) +
  theme_classic() +
  theme(aspect.ratio = 0.80, text = element_text(size = 16), 
    axis.text = element_text(size = 14))



Interaction plot

The command interaction.plot is quick but rudimentary, as it fails to show the data.

Interaction plots display how the mean of a numeric response variable y changes between the levels of two categorical variables, A and B. The graph is especially useful for determine whether an interaction is present between two factors A and B in a factorial experiment, or between a factor A and a blocking variable B. If the lines are parallel then there is no interaction.

z <- aggregate(y ~ A + B, data = mydata, FUN = mean, na.rm = TRUE)
interaction.plot(z$A, z$B, response = z$y, type = "b")

For example, the mink data show roughly parallel lines between the sexes, indicating little interaction between sex and category.

# First calculate the means for each combination of the categorical variables
z <- aggregate(vol3.cpc ~ category + sex, data = mink, FUN = mean, na.rm = TRUE)

# Then draw the plot
interaction.plot(x.factor = z$category, trace.factor = z$sex, 
  response = z$vol3.cpc, type = "b", lty = 1, las = 1,
  legend = TRUE, pch = 16, col = c("goldenrod1", "firebrick"))


The lattice package

The lattice package makes it easy to draw a panel of plots separately by group. The basic plot is simple, but commands to add points and lines to the individual panes can be tricky.

The lattice package is included with the basic installation but you need to load the library. The graph types available in the lattice package include the standard ones found also in R’s basic graphics package, such as box plots, histograms, and so on. The table below lists the most types and the relevant command.

For example, to draw a histogram of a numeric variable x separately for four groups identified by the variable B in the data frame mydata, use

library(lattice)
histogram(~ x | B, data = mydata, layout = c(1,4), right = FALSE)

The layout option is special to lattice and draws the 4 panels in a grid with 1 column with 4 rows, so that the histograms are stacked and most easily compared visually. The right=FALSE option has the same meaning here as for the base R hist command.

To draw a bar graph showing the frequency distribution of a categorical variable A separately for each group identified by the variable B,

barchart( ~ table(A) | B, data = mydata)

This produces horizontal bar graphs, which leaves room for the category labels. To draw the bars vertically instead, while tilting the group labels on the x-axis by 45 degrees so that they fit,

barchart(table(A) ~ names(table(A)) | B, data = mydata, 
         scales = list(x=list(rot=45)))

As a third example, draw a scatter plot to show the relationship between the numeric variables x and x separately for each group in the variable B. The pch option in this example replaces the default plot symbol with a filled dot, and the aspect option sets the relative lengths of the vertical and horizontal axes.

xyplot(y ~ x | B, data = mydata, pch = 16, aspect = 0.7)

It is possible to add plot elements to individual panels, but the commands and options take some getting used to. For example, to fit a separate regression line to each scatter plot, one to each group, use the panel argument in xyplot to construct a function that applies built-in panel functions to each group, as follows.

xyplot(y ~ x | B, data = mydata, pch = 16, aspect = 0.7,
    panel=function(x, y){       # Use x and y here, not real variable names
        panel.xyplot(x, y)  # draws the scatter plot
        panel.lmline(x, y)  # fits the regression line
        }
    )

This doesn’t even begin to describe what’s possible using the lattice package. Crawley has a few more examples of trellis graphics in The R Book. Sarkar (2008) gives a complete description. See the links to these books on the “Books” tab of the Biology 501 course page.

The table below shows a few of the commonly used plotting commands in the lattice package. x and y are numeric variables, whereas A is a categorical variable (character or factor). B is a factor or character variable that will define the groups or subsets of the data frame to be plotted in separate panels. A separate plot in the graphics window will be made for each of the groups defined by the variable B.

Command Graph type Syntax (~ is the “tilde” symbol)
barchart bar graph barchart(~table(A) | B, data=mydata)
bwplot box plot bwplot(x ~ A | B, data=mydata)
histogram histogram histogram(~x | B, data=mydata, right=FALSE)
stripplot strip chart stripplot(x ~ A | B, data=mydata, jitter=TRUE)
xyplot scatter plot xyplot(y ~ x | B, data=mydata)
 

© 2009-2024 Dolph Schluter