Building a tree involves a few key steps:
- Choose taxa to be analyzed. Taxa (singular: taxon) are named groups of organisms. Once the tree is built, these groups will appear at the tips of the tree, so biologists choose the taxa whose evolutionary relationships they want to uncover. The selected taxa may be very specific (e.g., each taxon might be a particular population or species) or more general (e.g., each taxon might be a group of closely related species, such as felines, rodents, or other major groups of mammals).
- Select outgroup(s). An outgroup is a taxon that lies outside the group of interest but is known from other evidence to be closely related to it. Once the tree is built, the outgroup is used to help determine which lineages on the tree are the oldest and which character states are ancestral. For example, when building a phylogeny of pine species, researchers used cedars and larches (conifers that are closely related to pines but are not pine trees) as outgroups.
- Choose characters. These characters are the basic evidence for building the tree and might take any of the forms discussed on previous pages (DNA sequences, morphology, etc.). And remember that only some characters are useful for building trees! As just discussed, biologists are looking for homologous traits that meet a certain set of criteria.
- Collect evidence. Once characters are selected, scientists determine the state of each character for each taxon. For example, which base (A, T, G, or C) does each species have in the first position of a particular gene? How many vascular bundles does each pine species have in its needles? This is the evidence that will be used to build the tree, and a lot of research goes into assembling a comparable set of valid characters for every taxon. These data are summarized in a table called a data matrix.
- Figure out which tree fits the evidence best. There are many different methods for evaluating trees, but the basic idea is the same for all of them. For seven or more taxa, the lineages could be related in any of thousands or even millions of ways. A biologist’s goal is to find the tree representing the evolutionary history that is most likely to have produced the evidence in the data matrix. To learn about some of the criteria for evaluating possible trees, read on.
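The data matrix and the number of candidate trees described in the steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the taxon names and sequences are made up, the function names `character_states` and `count_rooted_trees` are hypothetical, and the count assumes rooted, fully bifurcating trees with labeled tips, for which the number of possible trees for n taxa is (2n - 3)!! (the product of the odd numbers up to 2n - 3).

```python
from math import prod

# Hypothetical data matrix: one row per taxon, one column per character.
# Each character here is a position in an aligned DNA sequence (made-up data).
data_matrix = {
    "outgroup": "AAGT",
    "taxon_1": "AAGC",
    "taxon_2": "ACGC",
    "taxon_3": "ACTC",
}


def character_states(matrix, position):
    """Return the state of one character (one column) for every taxon."""
    return {taxon: states[position] for taxon, states in matrix.items()}


def count_rooted_trees(n_taxa):
    """Count rooted, bifurcating trees with n labeled tips: (2n - 3)!!."""
    if n_taxa < 1:
        raise ValueError("need at least one taxon")
    # Product of the odd numbers 1 * 3 * 5 * ... * (2n - 3);
    # the range is empty for a single taxon, so the product is 1.
    return prod(range(1, 2 * n_taxa - 2, 2))


if __name__ == "__main__":
    # State of the second character (index 1) in each taxon.
    print(character_states(data_matrix, 1))
    for n in (3, 5, 7, 10):
        print(n, "taxa:", count_rooted_trees(n), "possible rooted trees")
```

Under these assumptions, seven taxa give 10,395 candidate rooted trees and ten taxa give 34,459,425. That is why a tree is chosen by applying an explicit criterion to the data matrix rather than by inspecting every possibility by eye.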
5. Gernandt, D.S., G.G. López, S.O. Garcia, and A. Liston. 2005. Phylogeny and classification of Pinus. Taxon 54:29-42.