Understanding Evolution

Tracking COVID-19 outbreaks with evolution

May 2020, updated August 2020

3D shape of the coronavirus

The map shows the known locations of coronavirus cases by county. Credit: New York Times

Earlier this month, an autopsy revealed that COVID-19 began killing people in the United States weeks before we noticed. The earliest COVID-19 death now appears to be a Californian who died on February 6, 2020, not the Washington State residents who died 20 days later, the previous "first" deaths. The newly discovered coronavirus victims had not traveled recently and probably caught it from someone in the community. This suggests that the virus was spreading person-to-person in the U.S. earlier than previously believed. The revised timeline also fits well with new evidence collected by scientists who use evolutionary techniques to unveil the sources and course of COVID-19 outbreaks. 

Where's the evolution?

In real time, to prevent new infections, health professionals must use contact tracing to figure out who passed the virus to whom and who might have been exposed. However, after the fact, scientists can use viral genetics to reconstruct much of this information, revealing the source of outbreaks in different communities and even when they might have started.  

This is possible because viruses like COVID-19 evolve relatively quickly. As virus particles reproduce, their genomes acquire small copying errors, called mutations. As the virus jumps from one person to the next, it takes its mutations with it, and new mutations occur alongside the old ones. This is no different from the evolution of more familiar species (though it happens much more rapidly): just as one ancestral finch species can diversify into a multitude of species over millennia, one ancestral COVID-19 strain diversifies into a multitude of descendant strains over a few weeks of contagion. And scientists use the same techniques to study the two processes. To reconstruct the evolutionary history of finches, scientists collect data on their genetic sequences (and often other traits as well) and then use this information to reconstruct the evolutionary tree, or phylogeny, representing finch history and how all the modern species are related to one another. For viruses, this means sequencing genetic material (coronaviruses carry RNA, not DNA) from viral strains infecting different patients and building a viral family tree.
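The way mutations pile up along chains of infection can be sketched in a few lines of Python. This is a toy illustration, not a model of the real virus: the 300-letter genome, the rule of exactly one mutation per transmission, and the names `mutate` and `count_differences` are all assumptions made for the example.

```python
import random

BASES = "ACGU"  # coronaviruses carry RNA, so U appears instead of T


def mutate(genome, n_mutations, rng):
    """Return a copy of genome with n_mutations random single-letter changes."""
    letters = list(genome)
    for _ in range(n_mutations):
        pos = rng.randrange(len(letters))
        # Always switch to a different letter so every mutation is a real change.
        letters[pos] = rng.choice([b for b in BASES if b != letters[pos]])
    return "".join(letters)


def count_differences(a, b):
    """Number of positions at which two equal-length genomes differ."""
    return sum(x != y for x, y in zip(a, b))


rng = random.Random(1)
# Toy ancestral genome (assumption: far shorter than a real coronavirus genome).
ancestor = "".join(rng.choice(BASES) for _ in range(300))

# Two transmission chains that start from the same infected person.
# Assumption: exactly one new mutation arises with each transmission.
chain_a = [ancestor]
chain_b = [ancestor]
for _ in range(5):
    chain_a.append(mutate(chain_a[-1], 1, rng))
    chain_b.append(mutate(chain_b[-1], 1, rng))

for step, genome in enumerate(chain_a):
    print("chain A, transmission", step, "differences from ancestor:",
          count_differences(ancestor, genome))
print("end of chain A vs. end of chain B:",
      count_differences(chain_a[-1], chain_b[-1]))
```

Viruses further along a chain tend to differ more from the ancestor, and the two chain ends differ from each other by roughly the sum of the changes on both chains. Those differences are the raw material for building a viral family tree.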

When evolving entities like viruses pass heritable information from parent to offspring, we can depict their evolution in a tree shape, called a phylogeny. As variations are introduced and passed to offspring, new branches of the tree form. We can usually reconstruct the evolutionary history of a group by collecting information about the distribution of traits (genetic sequences, anatomical characteristics, behaviors, etc.) among lineages in that group. However, when new variations (e.g., mutations) are introduced in rapid succession, it is harder to reconstruct ancient evolutionary relationships. This is because those variations are evidence, and when a lot of them occur, new changes are likely to overwrite older ones, destroying evidence of older relationships. Since viruses mutate quickly, it is very difficult to reconstruct their deep evolutionary history. Nevertheless, scientists have made a lot of progress on this challenge, especially by including information about the structure of viral proteins (and not just genetic sequences) in their analyses.
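The overwriting problem described above can be seen in a small simulation. The sketch below is only illustrative: the 200-letter genome and the function name `observed_differences` are assumptions, and real analyses use statistical models to correct for hidden changes.

```python
import random

BASES = "ACGU"


def observed_differences(length, n_mutations, rng):
    """Apply n_mutations random changes to a random genome of the given length
    and return how many positions still look different from the original."""
    original = [rng.choice(BASES) for _ in range(length)]
    current = list(original)
    for _ in range(n_mutations):
        pos = rng.randrange(length)
        # A later mutation at the same position overwrites the earlier one,
        # and may even restore the original letter.
        current[pos] = rng.choice([b for b in BASES if b != current[pos]])
    return sum(a != b for a, b in zip(original, current))


rng = random.Random(0)
for n in (5, 50, 500, 5000):
    print(n, "mutations occurred;", observed_differences(200, n, rng),
          "differences are visible")
```

With only a few mutations, nearly every change remains visible. With thousands of mutations on a 200-letter genome, the count of visible differences can never exceed 200, so most of the history has been erased.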

Mutant viruses?!?!?

The idea that COVID-19 is mutating sounds a bit scary, as if it's suddenly going to acquire super-infecting powers. While the mutations of comic books and sci-fi are often of this sort, mutations in the real world are not. The vast majority of real-world mutations either have no effect on the individual who carries them or make that individual less functional (and in the case of COVID-19, a poorly functioning virus would be good news for us!). While it’s true that COVID-19 is experiencing mutations all the time, this is not cause for alarm. Scientists have uncovered no evidence of COVID-19 mutations that make the virus any more infectious or virulent, and mutations are key evidence that allow us to track the movement and history of the virus.

Biologists have studied COVID-19 outbreaks in several communities this way. For example, scientists collected virus samples from patients in New York City, sequenced their genetic material, and combined them with data on viruses from around the world to build an evolutionary tree. They announced their results this past month: New York City viruses occur in small clusters spread out over the tree.  This means that the New York outbreak is the result of many different introductions of the virus, not a single Patient Zero.  However, most of the New York viruses are not very closely related to viruses from patients in China, where COVID-19 first arose.  Instead, most of the introductions to NYC seem to have come from Europe and other places in North America.  We can't blame New York's overflowing emergency rooms on travelers from China.
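The reasoning from "clusters spread out over the tree" to "many introductions" can be sketched in code. The example below is a simplification under stated assumptions: the trees are made up, the tip labels are hypothetical, and counting all-local clusters (the function `count_local_clusters`) is only a rough proxy for the statistical methods real studies use.

```python
def tips(tree):
    """List all tip labels under a node.
    A tip is a string; an internal node is a tuple of child nodes."""
    if isinstance(tree, str):
        return [tree]
    return [label for child in tree for label in tips(child)]


def count_local_clusters(tree, is_local):
    """Count the largest subtrees whose tips are all local samples.
    Each such cluster is read here as one possible introduction."""
    if all(is_local(label) for label in tips(tree)):
        return 1
    if isinstance(tree, str):
        return 0  # a single non-local sample
    return sum(count_local_clusters(child, is_local) for child in tree)


# Hypothetical tree in which local samples are scattered among others.
scattered = (
    (("NYC-1", "NYC-2"), "Europe-1"),
    (("Europe-2", "NYC-3"), ("USA-1", ("NYC-4", "NYC-5"))),
)

# Hypothetical tree in which all local samples sit on one branch.
single = ("Other-1", (("Local-1", "Local-2"), ("Local-3", "Local-4")))

print(count_local_clusters(scattered, lambda s: s.startswith("NYC")))
print(count_local_clusters(single, lambda s: s.startswith("Local")))
```

In the first tree the local samples fall into three separate clusters, each nested among viruses from elsewhere, which points to at least three separate arrivals. In the second tree they form one cluster, which is what a single arrival followed by local spread would look like.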

The same is true in California: viruses from California are all over the evolutionary (and geographic!) map. Some are closely related to European strains, some to strains from China, and some to virus samples from elsewhere in the U.S. In Connecticut, coronavirus sequences suggest most cases there stem from other parts of the U.S., rather than from China or Europe. In fact, the latest data suggest that most of the coronavirus outbreaks in smaller communities in the U.S. were set off by travelers from New York City, not overseas.

However, the infections in Washington State showed a very different pattern. Most of the Washington cases (85%) formed a tight-knit clade, that is, a group of all the lineages descended from a single shared ancestral viral strain. Furthermore, the closest viral relatives of this clade were from patients in China. This suggests that most of the infections in Washington State can be traced to a single strain (perhaps even carried by a single person), likely acquired through travel to China, which was then passed on locally. Because mutations occur at a relatively predictable rate, biologists can use the number of mutations that distinguish two viruses to estimate how long they've been evolving as distinct lineages: more differences mean they've been evolving separately for a longer time. In the case of the Washington State clade, the diversity among the viral strains suggested that they had been diversifying locally since late January or early February 2020. This means that in Washington, the virus was probably passed from person to person for weeks, without anyone realizing it, before the first case of local transmission was finally identified on February 28.
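The arithmetic behind this kind of dating can be sketched as follows. Everything numeric here is an assumption for illustration: the rate of two mutations per genome per month, the sample date, and the count of six differences are not taken from the Washington study, and real analyses use statistical models rather than a single division.

```python
from datetime import date, timedelta

# Assumed rate, for illustration only: mutations per genome per month.
ASSUMED_MUTATIONS_PER_MONTH = 2.0
DAYS_PER_MONTH = 30.4  # average month length


def months_since_split(differences, rate=ASSUMED_MUTATIONS_PER_MONTH):
    """Estimate how long two lineages have been evolving separately.
    Both lineages collect mutations, so differences build up at twice the rate."""
    return differences / (2 * rate)


def estimated_split_date(sample_date, differences,
                         rate=ASSUMED_MUTATIONS_PER_MONTH):
    """Work backwards from the sampling date to the estimated date of the split."""
    days = months_since_split(differences, rate) * DAYS_PER_MONTH
    return sample_date - timedelta(days=round(days))


# Hypothetical example: two viruses sampled on the same day differ at 6 positions.
sampled = date(2020, 3, 15)
print(months_since_split(6), "months")
print("estimated split:", estimated_split_date(sampled, 6))
```

The key step is dividing by twice the rate: the two viruses have each been changing since their shared ancestor, so six differences represent about three mutations on each lineage.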

Evolutionary tree of COVID-19 from Washington State and around the world. Image loosely based on Bedford et al, 2020, but significantly simplified from the original for clarity.

Studies like these use COVID-19's own evolution to illuminate the sources, pattern, and timing of transmission in local communities.  In many cases, they reveal things that we couldn't figure out in any other way — like how outbreaks started and where the virus might have passed undetected under our radar.  Mutations are key to these insights. For fast-evolving entities like viruses, mutations work like breadcrumbs, allowing us to work backwards to reconstruct the path of evolutionary history.

News update, August 2020

It’s only been two months since we published this story, but with the breakneck pace of coronavirus research, it’s already time for an update! Above, we explained that viruses mutate all the time and that the vast majority of those mutations are harmful or neutral to the virus. It was no surprise, then, that a strain of COVID-19 carrying a particular mutation (known as D614G) became the most common. That was likely to happen even if the mutation had no effect at all, just as one person will win a game of Monopoly even if all the players are equally skilled. However, scientists are now debating whether that mutation might actually tip the odds in favor of COVID-19 strains that carry it, that is, give them a fitness advantage. The mutation tweaks a protein that helps form the distinctive spikes on the surface of coronaviruses, which allow the virus to invade cells. New evidence suggests that the D614G mutation may make it easier for the virus to invade cells, cause higher viral loads in patients, and help one strain outcompete others in regional outbreaks, a telltale sign that the mutation is favored by natural selection. But of course, this is still an area of active research and scientists are working hard to gather the additional evidence we’d need to know for certain. Stay tuned for more updates!
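The Monopoly point, that one strain can take over by chance alone, can be illustrated with a toy simulation. This is a sketch under simple assumptions (a small, constant number of infections, and every strain equally likely to be passed on); the function name `generations_until_one_strain` and all the numbers are made up for the example.

```python
import random
from collections import Counter


def generations_until_one_strain(n_strains, copies_each, rng):
    """Start with equal numbers of several equally fit strains and let chance
    decide which infections are passed on, until only one strain remains."""
    population = [s for s in range(n_strains) for _ in range(copies_each)]
    generations = 0
    while len(set(population)) > 1:
        # Each new infection copies a randomly chosen current infection.
        population = [rng.choice(population) for _ in population]
        generations += 1
    return population[0], generations


rng = random.Random(42)
winners = Counter()
for _ in range(200):
    winner, _generations = generations_until_one_strain(5, 8, rng)
    winners[winner] += 1

# Every run ends with a single strain, but which one wins varies from run to run.
print(dict(winners))
```

Because no strain has an advantage here, a strain becoming the most common is not, by itself, evidence of natural selection. That is why researchers look for additional signs, such as the same mutation repeatedly winning out in separate regional outbreaks.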

Primary literature

  • Bedford, T., Greninger, A. L., Roychoudhury, P., Starita, L. M., Famulare, M., Huang, M., ... and Jerome, K. R. (2020). Cryptic transmission of SARS-CoV-2 in Washington State. medRxiv preprint. DOI: https://doi.org/10.1101/2020.04.02.20051417.
  • Deng, X., Gu, W., Federman, S., du Plessis, L., Pybus, O. G., Faria, N., … and Chiu, C. Y. (2020). A genomic survey of SARS-CoV-2 reveals multiple introductions into Northern California without a predominant lineage. medRxiv preprint. DOI: https://doi.org/10.1101/2020.03.27.20044925.

Discussion and extension questions

  1. What aspects of evolution do scientists rely upon to understand the sources of local COVID-19 outbreaks?
  2. In your own words, describe how scientists reconstruct the evolutionary history of local COVID-19 outbreaks.
  3. Imagine that a friend listens to the news and writes in an email to you, "I heard that COVID-19 has mutated. Time to head for the hills! We don’t stand a chance now." How would you respond to correct this interpretation?
  4. The article above explains that in Washington State, many viruses infecting different people are closely related and form a clade. Is this consistent with many outside introductions of the virus to the community or with community spread? Explain the logic behind your answer.
  5. Imagine two different COVID-19 outbreaks in different communities. In each case, the viral strains form a clade. The two communities have reported roughly the same numbers of infections. In community A, the strains are all nearly identical to one another. In community B, the strains are much more diverse. How would you interpret this evidence in terms of community spread and when each outbreak likely started?
  6. Advanced: Discuss how a pathogen's rate of evolution impacts our ability to track it using phylogenetics. Consider the case of a pathogen that evolves much more quickly than COVID-19 and the case of a much slower evolving pathogen.


  • Bedford, T., Greninger, A. L., Roychoudhury, P., Starita, L. M., Famulare, M., Huang, M., ... and Jerome, K. R. (2020). Cryptic transmission of SARS-CoV-2 in Washington State. medRxiv preprint. DOI: https://doi.org/10.1101/2020.04.02.20051417.
  • Carey, B., and Glanz, J. (May 7, 2020). Travel from New York City seeded waves of U.S. outbreaks. The New York Times. Retrieved May 7, 2020 from https://www.nytimes.com/2020/05/07/us/new-york-city-coronavirus-outbreak.html
  • Fauver, J. R., Petrone, M. E., Hodcroft, E. B., Shioda, K., Ehrlich, H. Y., Watts, A. G., … and Grubaugh, N. D. (2020). Coast-to-coast spread of SARS-CoV-2 in the United States revealed by genomic epidemiology. medRxiv preprint. DOI: https://doi.org/10.1101/2020.03.25.20043828.
  • Fuller, T., and Baker, M. (April 22, 2020). Coronavirus death in California came weeks before first known U.S. death. The New York Times. Retrieved April 29, 2020 from https://www.nytimes.com/2020/04/22/us/coronavirus-first-united-states-death.html
  • Gonzalez-Reiche, A. S., Hernandez, M. W., Sullivan, M., Ciferri, B., Alshammary, H., Obla, A., ... and van Bakel, H. (2020). Introductions and early spread of SARS-CoV-2 in the New York City area. medRxiv preprint. DOI: https://doi.org/10.1101/2020.04.08.20056929.
  • Korber, B., Fischer, W. M., Gnanakaran, S., Yoon, H., Theiler, J., Abfalterer, W., … and Montefiori, D. C. (2020). Tracking changes in SARS-CoV-2 spike: evidence that D614G increases infectivity of the COVID-19 virus. Cell. DOI: https://doi.org/10.1016/j.cell.2020.06.043


Understanding Evolution © 2020 by The University of California Museum of Paleontology, Berkeley, and the Regents of the University of California