The Center for Data Innovation spoke with George Church, a professor of genetics at Harvard Medical School and director of the Personal Genome Project (PGP), which provides open-access information on human genomes, environmental conditions, and traits. The project, launched in 2005, aims to collect genomes and other information from 100,000 volunteers to fuel longitudinal research into genetic diseases, human biology, and precision medical treatments. In addition to being the project’s director, Dr. Church was one of its first participants.
This interview has been lightly edited.
Travis Korte: The goal of the project is to collect 100,000 genomes, but presumably research can occur long before that benchmark is reached. Is there a particular tipping point in the number of participants at which the dataset starts to become more useful to a wider range of applications?
George Church: Yes. The goal of 100,000 is not a magic number. Quite a bit can be done with one genome, including replacing correlative study with causality by changing specific base pairs in the genome and looking for changes in organ-level traits, or by using genetic mutation information to treat patients with particular drugs.
TK: Can you give an overview of areas of research where PGP data will be most useful? What are some of the first questions that will be answered using PGP data?
GC: One example is variants of unknown significance [variations in a genetic sequence that carry unknown associations with disease risk], which require subjects with known trait data, including medical and non-medical traits. Another example is carrier screening [testing to determine whether prospective parents carry genetic mutations that could cause disorders in a child]. Finally, PGP data can help the Food and Drug Administration and the National Institute of Standards and Technology establish genomic data standards.
TK: PGP asks participants to waive expectations of privacy. You and thousands of other participants have decided that it’s worth it. What is the pitch you use to convince people that the benefits outweigh the risks?
GC: I try not to pitch, since we don’t need to get everyone. Instead, I find it’s best to focus on volunteers who are highly motivated. They convince themselves based on a desire to expedite research on their family’s diseases or their exceptional health. Also, the risks have dropped a bit since we started the project, due to the passage of the Genetic Information Nondiscrimination Act in 2008.
TK: PGP aims to collaborate with participants over their lifetimes to track health information long after they submit their initial samples. This sounds like a complicated exercise in data management, since data formats, health records, and other things may change over time. How do you ensure that all this information remains meaningful to longitudinal researchers?
GC: We handle this problem the same way Wikipedia does. The most useful data will get translated and retained in a way that reflects utility. PGP also acts as a resource for comparing emerging data, interpretation, and user interface standards over time.
TK: Are there benefits to individuals participating in PGP other than the satisfaction of contributing to an important project? For example, will participating in PGP make it easier for patients to have their care personalized in the future?
GC: Hopefully yes. Some will benefit right away by finding known alleles which are highly predictive and medically actionable, while others will contribute to discoveries.
Photo: Steve Jurvetson, Creative Commons