## Homework assignments

The breakdown of marks for this course is as follows:

- Assignments (50%)
- Presenting a talk on a topic/paper (20%)
- Moderating a discussion on a topic/paper (20%)
- Overall participation in discussion (10%)

## Assignment 3: Select the best genetic model

This assignment is due Friday, December 1.

Clues to the genetic basis of species differences can be gained by fitting linear models to measurements of traits in parents and hybrids. In this assignment you will use model selection methods to fit alternative genetic models to oviposition preference data from two host races of the planthopper *Nilaparvata lugens* (Sezer and Butlin 1998, Proc. Roy. Soc. Lond. B 265: 2399-2405). One race occurs on cultivated rice. The other lives on the aquatic plant *Leersia*, which is probably the ancestral host.

To do this, you will need to choose a criterion (AIC or BIC) for evaluating the fit of each model to the data and for determining which model is best. You must defend your choice of criterion vigorously in your report, which will require some research: why did you decide to use it instead of the other criterion? Decide on the criterion before you analyze the data.
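
For reference, the standard definitions of the two criteria are given below. Here $\hat{L}$ is the maximized likelihood of a fitted model, $k$ is the number of estimated parameters, and $n$ is the number of observations. For both criteria, the model with the lower value is preferred. The two differ only in the penalty per parameter: 2 for AIC versus $\ln n$ for BIC, so BIC penalizes additional parameters more heavily whenever $n \ge 8$.

$$
\mathrm{AIC} = -2\ln\hat{L} + 2k, \qquad \mathrm{BIC} = -2\ln\hat{L} + k\ln n
$$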

Host oviposition preference data of females can be downloaded HERE.

Preference is the log-transformed ratio of the number of eggs laid on rice to the number laid on *Leersia*, when both plants were provided by the experimenters. Genotype refers to the parent race on rice (“rice”), the parent race on *Leersia* (“leer”), their F1 and F2 hybrids (“f1”, “f2”), and the backcrosses between the F1 hybrid and each parent race (“br” for rice and “bl” for *Leersia*).
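
Written as a formula, with $E_{\text{rice}}$ and $E_{\text{Leersia}}$ (symbols introduced here for illustration) denoting the number of eggs a female laid on each host plant, the response variable is

$$
\text{preference} = \log\!\left(\frac{E_{\text{rice}}}{E_{\text{Leersia}}}\right)
$$

Positive values indicate that more eggs were laid on rice, negative values that more were laid on *Leersia*, and zero that equal numbers were laid on both. The base of the logarithm is not stated here, but this sign interpretation does not depend on it.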

Analyze these data in R according to the following methods. Fit linear models with fixed effects only. Assume that all the data for a given cross type are independent. Provide all necessary explanations in your report. Always show your model fits graphically, as usual. No *P*-values are allowed in your results. Include your R commands in an appendix.

1. Create a graph to visualize the oviposition preference of the different genotypes. Explain the graph. What is the pattern in the data?
2. Create a table of means and standard deviations of the genotypes. Make this a high-quality table rather than simply computer output.
3. Add a numeric variable to the data set to represent the proportion of the genome inherited from the rice parent:

    - 1 for the rice parent genotype (“rice”)
    - 0 for the *Leersia* parent genotype (“leer”)
    - 0.5 for the F1 and F2 hybrids (“f1”, “f2”)
    - 0.25 for the backcross to the *Leersia* population (“bl”)
    - 0.75 for the backcross to the rice population (“br”)

    Make sure that the variable is numeric rather than a factor or character.
4. Fit the numeric variable you created in (3) to the preference data using a linear model. This is called the additive model, in which mean preference for rice increases linearly with the proportion of the genome inherited from the rice parent. Evaluate the model fit. (Remember: no *P*-values!)
5. Add another numeric variable to the data set to represent dominance effects that might be present in the hybrids:

    - 0 for both parental genotypes (“rice”, “leer”)
    - 1 for the F1 hybrid genotype (“f1”)
    - 0.5 for the remaining three hybrid genotypes (“f2”, “br”, “bl”)

    Make sure that the variable is numeric rather than a factor or character.
6. Fit a second model to the same preference data that includes both of the numeric variables created in (3) and (5). Leave out any interaction terms. This is the additive plus dominance model. Any dominance effects present will displace the mean values of the hybrids toward one parent or the other, relative to the values predicted by the additive model. Evaluate the model fit.
7. Finally, fit a third model that has the original genotype variable as the only explanatory variable. The fit of this model will deviate from that of the model fitted in (6) if there is interaction (epistasis) between genes inherited from the two parents.
8. Present your results, comparing model fits. Which genetic model best fits the data? Explain and summarize.
9. Explain how the procedure you used above to analyze these data differs from conventional null hypothesis significance testing. In your view, would a null hypothesis significance testing approach be poorer than, equivalent to, or superior to the approach used above for deciding among the three models? Explain.
10. Include your clean R code in an appendix.
11. Email your report to me as a PDF file named LASTNAME.FIRSTNAME.ASSIGNMENT3.PDF.
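
The three models described above can be summarized as follows. This is a sketch under the usual assumptions of a fixed-effects linear model (independent, normally distributed errors $\varepsilon_i$ with constant variance $\sigma^2$). Here $y_i$ is the preference of female $i$, $x_i$ is the proportion of her genome inherited from the rice parent, $h_i$ is the dominance code, and $g(i)$ is her genotype; these symbols are introduced for illustration and are not variable names in the data set.

$$
\begin{aligned}
\text{Additive:}\quad & y_i = \beta_0 + a\,x_i + \varepsilon_i \\
\text{Additive plus dominance:}\quad & y_i = \beta_0 + a\,x_i + d\,h_i + \varepsilon_i \\
\text{Genotype:}\quad & y_i = \mu_{g(i)} + \varepsilon_i, \qquad g(i) \in \{\text{rice, leer, f1, f2, br, bl}\}
\end{aligned}
$$

Substituting the codes for $x_i$ and $h_i$ into the additive plus dominance model gives the predicted mean of each genotype:

$$
\begin{aligned}
\text{rice:}\quad & \beta_0 + a \\
\text{leer:}\quad & \beta_0 \\
\text{f1:}\quad & \beta_0 + \tfrac{1}{2}a + d \\
\text{f2:}\quad & \beta_0 + \tfrac{1}{2}a + \tfrac{1}{2}d \\
\text{br:}\quad & \beta_0 + \tfrac{3}{4}a + \tfrac{1}{2}d \\
\text{bl:}\quad & \beta_0 + \tfrac{1}{4}a + \tfrac{1}{2}d
\end{aligned}
$$

Setting $d = 0$ recovers the additive model's predictions, so $d$ measures how far the F1 mean (and, by half as much, each of the other hybrid means) departs from the additive expectation. The genotype model estimates a separate mean for each of the six genotypes, so it can match any pattern of genotype means; the other two models are constrained special cases of it.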