Try to write your report using **R Markdown**. The
advantage is that R code chunks written inline are executed when you
make the html file.

An example R Markdown file is found here. Save the example files to a folder on your hard disk and then open them with RStudio. To render the file, click “Knit” at the top of the Source file pane in RStudio.

Email the TA for the course if you have questions.

This assignment is due Feb 9.

Find a graph drawn from data and published by your thesis supervisor. If your supervisor is flawless, pick a graph from your own thesis or from another paper published by your lab or department.

Choose a graph that has plenty of room for improvement. Too little improvement means we can’t assign many marks.

Students from the same lab: don’t choose the same or very similar graphs.

In your report, explain the study and analyze the graph. Explain what patterns the graph is intended to show, why you think it falls short of its potential, and what its flaws are.

Redraw the graph in R using principles of effective display. Try to obtain and use the raw data; otherwise, extract the values from the graph or simulate raw data.
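If only summary statistics can be read off the original figure, the R sketch below shows one way to simulate raw data from them. It is a minimal example under stated assumptions. The group names, means, SDs and sample sizes are hypothetical placeholders for values you would extract from your own graph. The values are drawn from a normal distribution and then rescaled so that each group's sample mean and SD match the published ones exactly. The plot shows every observation, with group means overlaid, instead of bars.

```r
# Hypothetical summary values read off a published figure; replace with your own
summ <- data.frame(
  group = c("Control", "Treatment A", "Treatment B"),
  mean  = c(12.1, 15.4, 14.0),
  sd    = c(2.0, 2.5, 1.8),
  n     = c(10, 10, 10)
)

set.seed(1)  # make the simulation reproducible
# For each group, draw normal values, then standardize and rescale them so the
# sample mean and SD equal the published mean and SD
sim <- do.call(rbind, lapply(seq_len(nrow(summ)), function(i) {
  z <- rnorm(summ$n[i])
  data.frame(group = summ$group[i],
             y = summ$mean[i] + summ$sd[i] * (z - mean(z)) / sd(z))
}))
sim$group <- factor(sim$group, levels = summ$group)

# Show all data points (jittered) instead of bars, then overlay the group means
stripchart(y ~ group, data = sim, vertical = TRUE, method = "jitter",
           pch = 16, col = "grey50", ylab = "Response (units)")
points(seq_len(nlevels(sim$group)), tapply(sim$y, sim$group, mean),
       pch = 18, cex = 2)
```

If you have the real raw data, skip the simulation step and plot the observations directly.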

Analyze your new graph according to principles of good graph design. Explain how your improvements display the patterns more effectively than the original. Why does your graph succeed where the original falls short?

Attach your R script at the end (or include it as inline code chunks if you are using R Markdown).

Email your report to me as a single PDF file named LASTNAME.FIRSTNAME.ASSIGNMENT1.PDF.

Your grade will be based on the quality of your analysis of the original graph, the magnitude of improvement of the new graph, your interpretation of it and explanation of how it is improved, and the quality of your R script.

This assignment is due Friday, March 15, 2024.

Obtain a data set and analyze it by fitting a linear, mixed, or generalized linear model in R.

Obtain a data set from your supervisor or from an online data repository (e.g., datadryad.org).

Include just one response variable.

For the explanatory variables, include at least one categorical fixed factor, such as an experimental or observational treatment.

Include at least one, and no more than two, additional explanatory variables (random or fixed factors, blocks, covariates, etc.).
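To show how a categorical fixed factor enters a linear model, the R sketch below builds the design matrices that `lm()` would use for two of the allowed designs: a treatment plus a covariate, and a treatment plus a fixed block. The variable names and levels are hypothetical. Under R's default treatment contrasts, a factor with k levels contributes k - 1 indicator columns alongside the intercept. A fixed block with four levels therefore costs three parameters, whereas a numeric covariate costs one. A random block would instead be fitted with a mixed-model function such as `lme4::lmer()` or `nlme::lme()`, which is not shown here.

```r
# Hypothetical design: a 3-level treatment in each of 4 blocks, plus a covariate
set.seed(3)
d <- expand.grid(treatment = c("control", "low", "high"),
                 block = paste0("B", 1:4))
d$treatment <- factor(d$treatment, levels = c("control", "low", "high"))
d$size <- runif(nrow(d), 10, 20)  # hypothetical numeric covariate

# Treatment + covariate: intercept, 2 treatment indicators, 1 slope
X1 <- model.matrix(~ treatment + size, data = d)
colnames(X1)

# Treatment + fixed block: intercept, 2 treatment indicators, 3 block indicators
X2 <- model.matrix(~ treatment + block, data = d)
colnames(X2)
```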

Prepare a thorough report on the analysis and interpretation of the data. Below I list some of the things to include in your report, but note that the list might not be complete.

Include all your writing and graphs in a single PDF file (titled LASTNAME.FIRSTNAME.ASSIGNMENT2.PDF) and email it to me.

- Explain (in a paragraph) the purpose of the study that yielded the data.
- Explain the specific data set you are using. For example, say where the data are from, give the meaning of the variables, and so on.
- Illustrate and describe the main patterns revealed in the data.
- State what parameters (magnitudes) you will estimate with these data.
- State what hypotheses you will test with these data.
- Fit a linear model to the data in R. Explain in words the model you fit.
- Interpret the output. To assess biological significance, explain the parameter estimates (magnitudes). What do they mean, and what conclusions do you draw from them? To assess statistical significance, explain the null hypotheses and interpret the test results.
- Visualize the model fit to the data. Explain what the graph is showing.
- Address how well the statistical assumptions of your analysis were met. How did you handle violations?
- State the overall conclusions reached from your analyses of biological and statistical significance.
- Include your clean R code in an appendix.
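The R sketch below walks through the fitting, estimation, testing, diagnostic and visualization steps listed above. It assumes a hypothetical data set with a three-level treatment factor and one numeric covariate (`size`). The variable names and simulated values are placeholders, not a recommended analysis for any particular data. `drop1()` gives *F*-tests of each term adjusted for the other term, which avoids the dependence on term order of the sequential tests reported by `anova()`.

```r
# Hypothetical data: 3-level treatment plus a numeric covariate (not real data)
set.seed(7)
dat <- data.frame(
  treatment = factor(rep(c("control", "low", "high"), each = 12),
                     levels = c("control", "low", "high")),
  size = runif(36, 10, 20)
)
effect <- c(control = 0, low = 1, high = 2.5)  # hypothetical treatment effects
dat$y <- 5 + unname(effect[as.character(dat$treatment)]) +
  0.3 * dat$size + rnorm(36, sd = 1)

fit <- lm(y ~ treatment + size, data = dat)

# Magnitudes: parameter estimates with 95% confidence intervals
cbind(estimate = coef(fit), confint(fit))

# Null hypotheses: F-test of each term, adjusted for the other term
drop1(fit, test = "F")

# Assumptions: residuals vs fitted, normal Q-Q, scale-location, leverage
op <- par(mfrow = c(2, 2))
plot(fit)
par(op)

# Model fit on the data: one fitted line per treatment (parallel slopes)
plot(y ~ size, data = dat, col = as.integer(dat$treatment), pch = 16)
for (i in seq_along(levels(dat$treatment))) {
  nd <- data.frame(treatment = levels(dat$treatment)[i], size = range(dat$size))
  lines(nd$size, predict(fit, newdata = nd), col = i)
}
legend("topleft", legend = levels(dat$treatment),
       col = seq_along(levels(dat$treatment)), pch = 16)
```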

This assignment is due April 12, 2024.

Clues to the inheritance patterns of population differences can be
gained by fitting linear models to measurements of traits in parents and
hybrids. In this assignment you will use model selection methods to
compare the fit of three alternative genetic models of divergence in
soil arsenic tolerance in two populations of the grass *Agrostis
capillaris* (Watkins and MacNair 1991, Genetics of arsenic tolerance
in *Agrostis capillaris*. Heredity 66: 47-54). One population
occurred on an abandoned, arsenic-contaminated mine; the other was from
an edaphically similar, non-toxic site.

To accomplish this, you will need to choose a criterion (AIC or BIC)
for comparing the fit of the models to the data, and determine which
criterion is best suited to your purposes. You need to defend your
choice vigorously in your report, which will require some research.
Why did you decide to use it instead of the other criterion? Decide on
the criterion *before* you analyze the data.
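The standard definitions of the two criteria are given below for reference. Here $\hat{L}$ is the maximized likelihood of a model, $k$ is the number of estimated parameters (for a linear model this includes the residual variance), and $n$ is the number of observations. For both criteria, lower values are preferred. The criteria differ only in the penalty per parameter: $2$ for AIC versus $\ln n$ for BIC. BIC therefore penalizes extra parameters more heavily whenever $n \ge 8$.

$$
\mathrm{AIC} = -2\ln\hat{L} + 2k, \qquad \mathrm{BIC} = -2\ln\hat{L} + k\ln n
$$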

Tiller heights of plants from the different cross generations can be downloaded here.

Height is the cube root of tiller height (in mm) when grown on arsenic-laced soil. Line refers to the parent population from the contaminated site (“high” tolerance), the parent population from the uncontaminated site (“low” tolerance), their F1 and F2 hybrids (“f1”, “f2”), and the backcrosses between the F1 hybrid and each parent population (“bh” for high and “bl” for low tolerance). I’ll refer to these crosses as genotypes.
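The six genotypes have a natural order, from low to high proportion of the high-tolerance genome. It can help to set that order explicitly before graphing or tabulating, rather than accepting R's alphabetical default. The R sketch below does this and computes the sample size, mean and standard deviation of height for each genotype. It assumes columns named `line` and `height` and uses simulated values in place of the downloaded file, so check the actual column names and replace the simulation with the real data.

```r
# Simulated stand-in for the downloaded file (placeholder values, not the real data);
# the column names 'line' and 'height' are assumptions
set.seed(1)
x <- data.frame(
  line   = rep(c("high", "low", "f1", "f2", "bh", "bl"),
               times = c(15, 15, 10, 20, 12, 12)),
  height = rnorm(84, mean = 3, sd = 0.5)
)

# Order genotypes by expected proportion of the high-tolerance genome,
# rather than R's default alphabetical order
x$line <- factor(x$line, levels = c("low", "bl", "f1", "f2", "bh", "high"))

# Sample size, mean and SD of (cube-root) height for each genotype
tab <- data.frame(
  genotype = levels(x$line),
  n        = as.vector(table(x$line)),
  mean     = as.vector(tapply(x$height, x$line, mean)),
  sd       = as.vector(tapply(x$height, x$line, sd))
)
print(tab, digits = 3, row.names = FALSE)
```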

Analyze these data in R according to the following methods. Note that
this is not a complete list of expectations. Fit linear models with
fixed effects only. Assume that all the data for a given cross type are
independent. Provide all necessary explanations in your report. No
*P*-values are allowed in your report. Include your R commands in
an appendix.

1. Graph the data. Explain your graph. What is the pattern in the data?
2. Create a table of means and standard deviations of genotypes. Design the table as you would if you were publishing it. Don’t worry too much about font.
3. Add a numeric variable to the data set to represent the proportion of the genome inherited from the high-tolerance parent:
   - 1 for the high-tolerance parent genotype
   - 0 for the low-tolerance parent genotype
   - 0.5 for the F1 and F2 hybrids
   - 0.25 for the backcross to the low-tolerance population
   - 0.75 for the backcross to the high-tolerance population

   Make sure that the variable is numeric rather than a factor or character.
4. Fit the numeric variable you created in (3) to the height data using a linear model. This is called the additive model, whereby tolerance increases linearly with the proportion of the genome inherited from the high-tolerance parent. Evaluate the model fit (remember: no *P*-values!).
5. Add another numeric variable to the data set to represent dominance effects that might be present in the hybrids:
   - 0 for both parent genotypes
   - 1 for the F1 hybrid
   - 0.5 for the remaining three hybrid genotypes

   Make sure that the variable is numeric rather than a factor or character.
6. Fit a second model to the same data that includes both of the numeric variables created in (3) and (5). Leave out any interaction terms. This is the additive plus dominance model. Any dominance effects present will displace the mean values of the hybrids toward one parent or the other, relative to the values predicted by the additive model. Evaluate the model fit.
7. Finally, fit a third model that has the original genotype variable as the only explanatory variable. The fit of this model will deviate from that of the model fitted in (6) if there is interaction (epistasis) between genes inherited from the two parents.
8. Present your results, comparing model fits. Which genetic model best fits the data? Explain and summarize.
9. Explain how the procedure you used above to analyze these data differs from conventional null hypothesis significance testing. In your view, would a null hypothesis significance testing approach be poorer than, equivalent to, or superior to the approach used above for deciding among the three models? Explain.
10. Include your clean R code in an appendix.
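The coding in steps 3 and 5 and the three model fits in steps 4, 6 and 7 can be sketched in R as follows. This is a minimal illustration under stated assumptions. The column names `line` and `height`, the genotype labels, and the simulated heights are placeholders for the downloaded data. The sketch stops at computing a criterion (AIC here; `BIC()` takes the same arguments) rather than interpreting it. A named lookup vector keeps the codes numeric. Index it with `as.character()`, because indexing with a factor would use the factor's integer codes instead of its labels.

```r
# Simulated stand-in for the downloaded data (placeholder, not the real measurements)
set.seed(42)
genotypes <- c("low", "bl", "f1", "f2", "bh", "high")
x <- data.frame(line = factor(rep(genotypes, each = 20), levels = genotypes))

# Step 3: proportion of the genome inherited from the high-tolerance parent
add_code <- c(low = 0, bl = 0.25, f1 = 0.5, f2 = 0.5, bh = 0.75, high = 1)
x$additive <- unname(add_code[as.character(x$line)])

# Step 5: dominance coding (parents 0, F1 1, other hybrids 0.5)
dom_code <- c(low = 0, bl = 0.5, f1 = 1, f2 = 0.5, bh = 0.5, high = 0)
x$dominance <- unname(dom_code[as.character(x$line)])

# Hypothetical cube-root heights, for illustration only
x$height <- 2 + 1.5 * x$additive + rnorm(nrow(x), sd = 0.4)

z1 <- lm(height ~ additive, data = x)              # additive model
z2 <- lm(height ~ additive + dominance, data = x)  # additive + dominance
z3 <- lm(height ~ line, data = x)                  # one mean per genotype

# Criterion table (df counts the residual variance as a parameter);
# report only the criterion you chose before looking at the data
AIC(z1, z2, z3)
```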

Email your report to me as a single PDF file named LASTNAME.FIRSTNAME.ASSIGNMENT3.PDF.

© 2009-2024 Dolph Schluter