Splits tree and iupac coding

I’ve been running splitstree to see the relationships between samples of H. bolanderi. I have coded my data using IUPAC so hets are a different symbol (Y, W, etc). The other way to code heterozygous sites into fasta is just pick one allele randomly.

By default, splitstree ignores all ambiguous sites, so if you use IUPAC coding, it will ignore all those sites. I switched it from ignoring, to using an average for all possible alleles. This made my tree much messier and had a weird smattering of samples pulled toward the outgroup. I’ve figured out that it has to do with the amount of missing data. Since Ns are ambiguous, when you average N it just sort of homogenizes the distances between samples. Thus, it can pull your samples into weird positions if they have different amounts of missing data.

My thoughts are that you should just ignore ambiguous data if you have enough sites to resolve your samples without them.