The Center for Data Innovation spoke with Bastian Greshake, co-founder of openSNP, an online platform designed to host and share open source personal genetic data. Greshake discussed why people are interested in publicly and freely sharing their personal genomes, as well as the challenges involved in combining genetic information from a wide variety of different sources.
Joshua New: Personal genomics services, such as 23andMe and FamilyTreeDNA, are increasingly popular. How does openSNP fit into this market?
Bastian Greshake: With openSNP we are not offering any genetic testing in itself. So you can’t just send us your saliva and have us analyze it. But if you have done genetic testing with companies such as 23andMe or FamilyTreeDNA, you can send us the data they generated. By uploading your data to openSNP you are donating the results into the public domain, for everyone to see and use. And similarly people can share phenotypic information, like eye or hair color and even their Fitbit data. In return, we mine different databases for information about the impact of different genetic variations. By offering this service we are trying to fill the niche for open source genetics, which none of the companies doing direct-to-consumer genetic testing really fills right now.
New: Why would someone want to disclose their genetic data for everyone to see for free? How do they benefit?
Greshake: The reasons for doing it are somewhat similar to the reasons why people develop open source software and make their code publicly available: The first reason is largely altruistic. By going open with your genetic data you can make sure that as many people as possible can profit from it. Generating genetic data sets is still prohibitively expensive for many scientists, especially when doing so on large scales. And of course, many people don’t have any access to generating those data sets by themselves.
Secondly, you will have potentially lots of people who are looking at your genetic information—think of it as “code review for your genes”. People with rare genetic diseases are very motivated to find other people with similar symptoms and genetic variants. And you also have people who just think it’s fun and a cool thing to analyze genetic data and those people might contact you if they found something interesting in your data.
New: Users can upload data to openSNP from a variety of genetic testing services. Are there any challenges in ensuring that all these different data sources can be used together?
Greshake: Yes, there are even challenges within data coming from a single data provider. Even though the human genome was fully sequenced over 10 years ago, there are still frequent updates. And with those updates, the names of the genetic variants and their metadata change. Furthermore, the number of SNPs (single nucleotide polymorphisms) investigated by these testing companies sometimes changes. But all in all I would say that it’s not much worse than in any other data-driven field.
New: Who uses data from openSNP, and why is it so important for them that the platform is open source?
Greshake: The data is used by lots of different people right now: It’s used for research into genomic privacy and pharmacogenomics, as a teaching aid, and some people even use the data to generate art. Here it’s definitely a huge benefit that the data is as open as possible, because it enables people to get really creative. At the same time, being completely open with the source and the data is essentially a form of insurance for our users. This goes for users who upload data as well as for users who rely on openSNP as a data source. While other large players might promise to not turn evil, in the end you still have to trust them. For openSNP, we’re trusted to act morally, and I certainly hope we can live up to that. But even if we were to do something unsavory with the data, we couldn’t stop anybody from using the data and the code to start openSNP 2.0 and keep the open idea alive.
New: openSNP has only been around for a couple of years and, while growing, it is still somewhat small; I counted about 1,800 uploaded genotypes. What’s in store for the future of openSNP?
Greshake: Actually, we really were surprised by how many people are willing to share their data under open conditions, and we are really grateful to all the people who have participated so far. And the trend is really positive: every day people are uploading data sets. Our biggest problem so far is outreach. Running on a shoestring budget (we support the website with our salaries from our day jobs) means we can’t do any large advertising. And it also means that the development is rather slow. We would love to support more data formats and more wearables, make the phenotypic data more accessible and standardized, and so on. But all of this is slow and ongoing work, so we would love to get to know more people interested in helping out!