Assignments will be posted here.
Try to write your report using R Markdown. The advantage is that R code chunks embedded in the document are executed when you render the HTML file.
An example using R Markdown is found here. Save example files to a folder on your hard disk and then open with RStudio. To render it, click “Knit” at the top of the Source file pane in RStudio.
Email the TA for the course if you have questions.
This assignment is due Feb 6 at 5 pm.
Find a bad graph drawn from data and published by your thesis supervisor. If your supervisor is flawless, pick another published graph, e.g., from a paper published by your lab or department. Come talk to us if you are having trouble with this step.
Students from the same lab: don’t choose the same or very similar graphs.
It is important that you choose a graph that requires significant improvement. Too little improvement means we can’t assign many marks.
In your report, explain the original study, what the data are and the goal of the study.
Analyze the bad graph. What is its goal? Explain what patterns the graph was intended to show. Explain why you think it is not successful. Explain the flaws in the graph. How does it fall short?
Make a new graph using principles of effective display. Try to obtain and make use of the raw data, otherwise extract them from the graph or simulate raw data. Use R to make the graph.
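If the raw data cannot be obtained, one possible approach is to simulate values that match summary statistics read off the original figure. The sketch below is only an illustration: the group names, means, standard deviations, and sample sizes are made-up placeholders, and a strip chart with overlaid means is just one of many reasonable displays.

```r
# Hypothetical summary values read off a published bar chart (NOT real data)
set.seed(1)
groups <- c("control", "treatment")
means  <- c(10, 13)
sds    <- c(2, 3)
n      <- c(20, 20)

# Simulate raw observations consistent with those summaries
sim <- data.frame(
  group = factor(rep(groups, times = n), levels = groups),
  y     = unlist(Map(function(m, s, k) rnorm(k, mean = m, sd = s), means, sds, n))
)

# Show every data point, then overlay the group means
stripchart(y ~ group, data = sim, vertical = TRUE, method = "jitter",
           pch = 16, col = "grey40", xlab = "Group", ylab = "Response (units)")
group_means <- tapply(sim$y, sim$group, mean)
points(seq_along(group_means), group_means, pch = "-", cex = 3, col = "firebrick")
```

If you simulate data, say so clearly in your report, since the simulated points only mimic the published summaries.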
While we recognize that coding is all-absorbing, don’t lose sight of the real aim of this assignment, which is to analyze and improve visual displays following the principles we have discussed. Review recommendations made in lectures and workshops.
Analyze your new graph. Why is it an improvement? Remind us of the goal of the graph. Explain how your improvements achieve the goal more effectively than the original. Explain why your graph succeeds.
Append your R script at the end (or submit your document in R Markdown, which combines text and R code).
If you used an AI, include an acknowledgement at the end that explains how you used it.
Email paper to both me and Lucia as a single .pdf file: LASTNAME.FIRSTNAME.ASSIGNMENT1.PDF
Grade will be based on: the quality of your analysis of the original graph; the magnitude of improvement of the new graph; your interpretation of it and explanation of how it is improved; the quality of your R script.
This assignment is due Friday, March 13 at 5 pm.
Obtain a data set and analyze it by fitting a linear, mixed, or generalized linear model in R.
Prepare a thorough report on the analysis and interpretation of the data. Below I suggest some of the things to include in your report, but note that the list might not be complete. Review the topics of lectures and discussion to date to remind yourself what is important when analyzing data.
Include all your writing and graphs in a single pdf file (titled LASTNAME.FIRSTNAME.ASSIGNMENT2.PDF) and email to me.
Explain (in a paragraph) the purpose of the study that yielded the data.
Explain the specific data set you are using. For example, say where the data are from, give the meaning of the variables, and so on.
Illustrate and describe the main patterns revealed in the data.
State what parameters (magnitudes) you will estimate with these data.
State what hypotheses you will test with these data.
Fit a linear (or mixed or generalized linear) model to the data in R. Explain in words the model you fit.
Interpret the output. To assess biological significance, explain the parameter estimates (magnitudes). What do they mean, and what are your conclusions based on these parameter estimates? To assess statistical significance, explain the null hypotheses and interpret the test results.
Visualize the model fit to the data. Explain what the graph is showing.
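As a rough illustration of fitting a model and visualizing the fit, here is a minimal sketch for a simple linear regression in base R. The data are simulated and the variable names `x` and `y` are placeholders; your own model may be a mixed or generalized linear model and will need a different display.

```r
set.seed(2)
# Hypothetical data: response y measured along a numeric predictor x
d <- data.frame(x = runif(50, 0, 10))
d$y <- 2 + 0.5 * d$x + rnorm(50, sd = 1)

fit <- lm(y ~ x, data = d)
coef(fit)      # parameter estimates (intercept and slope)
confint(fit)   # 95% confidence intervals for the parameters

# Plot the data with the fitted line and a 95% confidence band
newd <- data.frame(x = seq(min(d$x), max(d$x), length.out = 100))
pred <- predict(fit, newdata = newd, interval = "confidence")
plot(y ~ x, data = d, pch = 16, col = "grey40")
lines(newd$x, pred[, "fit"], lwd = 2)
lines(newd$x, pred[, "lwr"], lty = 2)
lines(newd$x, pred[, "upr"], lty = 2)
```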
Address how well the statistical assumptions of your analysis were met. How did you handle violations?
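One common way to start checking the assumptions of a linear model is with residual plots. The sketch below uses simulated placeholder data and the built-in diagnostic plots for `lm` objects; it is a starting point under those assumptions, not a complete assessment.

```r
set.seed(3)
# Hypothetical data, for illustration only
d <- data.frame(x = runif(50, 0, 10))
d$y <- 2 + 0.5 * d$x + rnorm(50, sd = 1)
fit <- lm(y ~ x, data = d)

op <- par(mfrow = c(1, 2))
plot(fit, which = 1)  # residuals vs fitted: look for curvature or unequal spread
plot(fit, which = 2)  # normal Q-Q plot of standardized residuals
par(op)
```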
State the overall conclusions reached from your analyses of biological and statistical significance.
If you used an AI, include an acknowledgement at the end that explains how you used it.
Include your clean R code in an appendix.
Email paper to both me and Lucia as a single .pdf file: LASTNAME.FIRSTNAME.ASSIGNMENT2.PDF
This assignment is due April 10, 2026.
Clues to the inheritance patterns of population differences can be gained by fitting linear models to measurements of traits in parents and hybrids. In this assignment you will use model selection methods to compare the fit of three alternative genetic models of divergence in soil arsenic tolerance in two populations of the grass Agrostis capillaris (Watkins and MacNair 1991, Genetics of arsenic tolerance in Agrostis capillaris. Heredity 66: 47-54). One population occurred on an abandoned, arsenic-contaminated mine; the other was from an edaphically similar, non-toxic site.
To accomplish this you will need to choose a criterion (AIC or BIC) for comparing the fit of the models to the data, based on which is best suited to your purposes. Defend your choice vigorously in your report, which will require some research: why did you decide to use it instead of the other criterion? Decide on the criterion before you analyze the data.
Height of plant tillers of different cross generations can be downloaded here.
Height is the cube root of tiller height (in mm) when grown on arsenic-laced soil. Line refers to the parent population from the contaminated site (“high” tolerance), the parent population from the uncontaminated site (“low” tolerance), their F1 and F2 hybrids (“f1”, “f2”), and the backcrosses between the F1 hybrid and each parent population (“bh” for high and “bl” for low tolerance). I’ll refer to these crosses as genotypes.
Analyze these data in R according to the following methods. Note that this is not a complete list of expectations. Fit linear models with fixed effects only. Assume that all the data for a given cross type are independent. Provide all necessary explanations in your report. No P-values are allowed in your report. Include your R commands in an appendix.
Visualize the data. What is the pattern in the data?
Create a table of means and standard deviations of genotypes. Design the table as you would if you were publishing it. Don’t worry too much about font.
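A table of group summaries can be assembled in several ways; one minimal sketch in base R follows. The data here are simulated stand-ins, and the column names `line` and `height` are assumptions about the downloaded file rather than confirmed names.

```r
set.seed(4)
# Simulated stand-in for the real data; column names are assumptions
genotypes <- c("low", "bl", "f1", "f2", "bh", "high")
x <- data.frame(line = factor(rep(genotypes, each = 10), levels = genotypes))
x$height <- rnorm(nrow(x), mean = 3, sd = 0.5)

# One row per genotype: sample size, mean, and standard deviation
tab <- data.frame(
  genotype = levels(x$line),
  n        = as.vector(tapply(x$height, x$line, length)),
  mean     = round(as.vector(tapply(x$height, x$line, mean)), 2),
  sd       = round(as.vector(tapply(x$height, x$line, sd)), 2)
)
print(tab, row.names = FALSE)
```

Setting the factor levels explicitly controls the row order, which is often more readable than the default alphabetical order.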
Add a numeric variable to the data set to represent the proportion of the genome inherited from the high-tolerance parent:
1 for the high-tolerance parent genotype
0 for the low-tolerance parent genotype
0.5 for the F1 and F2 hybrids
0.25 for the backcross to the low-tolerance population
0.75 for the backcross to the high-tolerance population
Make sure that the variable is numeric rather than a factor or character.
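One way to build such a variable is with a named lookup vector, sketched below. The column name `line` and the new variable name `additive` are assumptions for illustration; the toy data frame stands in for the downloaded data.

```r
# Toy stand-in for the real data; the column name `line` is an assumption
x <- data.frame(line = c("high", "low", "f1", "f2", "bl", "bh"))

# Proportion of the genome inherited from the high-tolerance parent
additive_code <- c(high = 1, low = 0, f1 = 0.5, f2 = 0.5, bl = 0.25, bh = 0.75)

# Index the lookup vector by genotype label; as.character() guards against factors
x$additive <- unname(additive_code[as.character(x$line)])

is.numeric(x$additive)  # confirm the variable is numeric
```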
Fit a linear model to the height data using the numeric variable you created in (3) as the explanatory variable. This is called the additive model, whereby tolerance increases linearly with the proportion of the genome inherited from the high-tolerance parent. Evaluate the model fit (Remember: no P-values!).
Add another numeric variable to the data set to represent dominance effects that might be present in the hybrids:
0 for both parent genotypes
1 for the F1 hybrid
0.5 for the remaining three hybrid genotypes
Make sure that the variable is numeric rather than a factor or character.
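The dominance variable can be coded with the same kind of lookup, as in this sketch. Again, the column name `line` and the variable name `dominance` are assumptions, and the data frame is a toy stand-in.

```r
# Toy stand-in for the real data; the column name `line` is an assumption
x <- data.frame(line = c("high", "low", "f1", "f2", "bl", "bh"))

# Dominance coding: 0 for parents, 1 for F1, 0.5 for F2 and both backcrosses
dominance_code <- c(high = 0, low = 0, f1 = 1, f2 = 0.5, bl = 0.5, bh = 0.5)
x$dominance <- unname(dominance_code[as.character(x$line)])

is.numeric(x$dominance)  # confirm the variable is numeric
```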
Fit a second model to the same data that includes both of the numeric variables created in (3) and (5). Leave out any interaction terms. This is the additive plus dominance model. Any dominance effects present will displace the mean value of the hybrids toward one or other of the parents relative to the values predicted by the additive model. Evaluate model fit.
Finally, fit a third model that has the original genotype variable as the only explanatory variable. The fit of this model will deviate from the model fitted in (6) if there is interaction (epistasis) between genes inherited from the two parents.
Present your results, comparing model fits. Which genetic model best fit the data? Explain and summarize.
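The mechanics of fitting the three models and tabulating information criteria might look like the sketch below. It runs on simulated data with assumed column names (`line`, `height`, `additive`, `dominance`), so the numbers it produces say nothing about the real plants. It reports both AIC and BIC only to show the calls; the choice of criterion, and its justification, remain yours.

```r
set.seed(5)
# Simulated stand-in data; column names and effect sizes are assumptions
genotypes <- c("low", "bl", "f1", "f2", "bh", "high")
add_code  <- c(low = 0, bl = 0.25, f1 = 0.5, f2 = 0.5, bh = 0.75, high = 1)
dom_code  <- c(low = 0, bl = 0.5,  f1 = 1,   f2 = 0.5, bh = 0.5,  high = 0)
x <- data.frame(line = factor(rep(genotypes, each = 10), levels = genotypes))
x$additive  <- unname(add_code[as.character(x$line)])
x$dominance <- unname(dom_code[as.character(x$line)])
x$height    <- 2 + 1 * x$additive + 0.3 * x$dominance + rnorm(nrow(x), sd = 0.3)

# The three candidate genetic models
models <- list(
  "additive"             = lm(height ~ additive, data = x),
  "additive + dominance" = lm(height ~ additive + dominance, data = x),
  "genotype"             = lm(height ~ line, data = x)
)

# Number of estimated parameters (including residual variance), AIC, and BIC
comparison <- data.frame(
  model = names(models),
  k     = sapply(models, function(m) as.integer(attr(logLik(m), "df"))),
  AIC   = sapply(models, AIC),
  BIC   = sapply(models, BIC),
  row.names = NULL
)
# Differences from the best (lowest) value of each criterion
comparison$dAIC <- comparison$AIC - min(comparison$AIC)
comparison$dBIC <- comparison$BIC - min(comparison$BIC)
print(comparison)
```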
Explain how the procedure you used above to analyze these data differs from that of conventional null hypothesis significance testing. In your view, would a null hypothesis significance testing approach be a poorer, equivalent, or superior approach to the one used above to decide between the three models? Explain.
If you used an AI, include an acknowledgement at the end that explains how you used it.
Include your clean R code in an appendix.
Email paper to both me and Lucia as a single .pdf file: LASTNAME.FIRSTNAME.ASSIGNMENT3.PDF
© 2009-2026 Dolph Schluter