Press Release

February 26, 2014

Researchers worldwide will now have access to genetic data linked to medical information on a diverse group of more than 78,000 people, enabling investigations into many diseases and conditions. The data have just been made available to qualified researchers through the database of Genotypes and Phenotypes (dbGaP), an online database maintained by the National Institutes of Health (NIH). Richard Hodes, director of the National Institute on Aging (NIA), made the announcement today at a meeting of the National Advisory Council on Aging.


The data come from one of the nation’s largest and most diverse genomics projects — the Genetic Epidemiology Research on Adult Health and Aging (GERA) cohort — which was developed collaboratively by the Kaiser Permanente Research Program on Genes, Environment and Health (RPGEH) and the University of California, San Francisco (UCSF). The addition of the data to dbGaP was made possible with $24.9 million in support from the NIA and the National Institute of Mental Health at NIH, as well as from the Office of the NIH Director.


“Data from this immense and ethnically diverse population will be a tremendous resource for science,” said NIH Director Francis Collins. “It offers the opportunity to identify potential genetic risks and influences on a broad range of health conditions, particularly those related to aging.”


The GERA cohort is part of the RPGEH, which includes more than 430,000 adult members of the Kaiser Permanente Northern California health plan who volunteered to participate in the research program. Resources for this larger cohort include electronic medical records, behavioral and demographic information from surveys, and saliva or blood samples collected from 200,000 participants, with informed consent, for genomic and other analyses.


This work was made possible by an $8.6 million grant from the Robert Wood Johnson Foundation, which saw the potential to build a resource that would transform genomic research. “This massive influx of new, high quality data will help scientists discover bigger breakthroughs faster,” said Nancy Barrand, the foundation’s senior adviser for Program Development. “Researchers used to have to go through the painstaking process of collecting and studying genomic samples on their own. Now researchers worldwide can find valuable clues for improving health by studying the genetic information from a cohort of 78,000 diverse individuals in dbGaP.”

Additional support for development of the RPGEH resource was provided by the Wayne and Gladys Valley Foundation, the Ellison Medical Foundation, and Kaiser Permanente.


For the more than 78,000 individuals in the cohort, the genetic information amounts to over 55 billion pieces of genetic data. The researchers performed genome-wide genotyping on the newly developed Affymetrix Axiom GeneTitan system at the UCSF Institute for Human Genetics Genomics Core Facility, rapidly scanning the genomes of GERA participants for selected markers of genetic variation called single nucleotide polymorphisms (SNPs). The RPGEH then linked the genetic data to information from Kaiser Permanente’s comprehensive, longitudinal electronic medical records and to extensive survey data on participants’ health habits and backgrounds, creating an unparalleled research resource. These data support genome-wide association studies (GWAS), which examine hundreds of thousands to millions of SNPs at once for associations with many different health conditions.
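To make the GWAS idea more concrete, here is a minimal sketch of the kind of per-SNP test such a study repeats across every marker. It is a generic textbook illustration, not the GERA team's pipeline: it assumes a simple case-control design, genotypes coded as 0, 1 or 2 copies of the minor allele, a 2×2 allelic chi-square test, and a Bonferroni-corrected significance threshold. The function names and the toy SNPs are hypothetical.

```python
import math


def allelic_chi_square(case_genotypes, control_genotypes):
    """2x2 allelic chi-square statistic (1 degree of freedom) for one SNP.

    Genotypes are coded as the number of copies of the minor allele (0, 1 or 2).
    """
    # Count minor and major alleles among cases and controls.
    case_minor = sum(case_genotypes)
    case_major = 2 * len(case_genotypes) - case_minor
    ctrl_minor = sum(control_genotypes)
    ctrl_major = 2 * len(control_genotypes) - ctrl_minor
    table = [[case_minor, case_major], [ctrl_minor, ctrl_major]]

    total = sum(map(sum, table))
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]

    # Pearson chi-square: sum of (observed - expected)^2 / expected.
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            if expected > 0:  # skip empty cells, e.g. a monomorphic SNP
                stat += (table[i][j] - expected) ** 2 / expected
    return stat


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a 1-df chi-square: P(X > x) = erfc(sqrt(x / 2))."""
    return math.erfc(math.sqrt(stat / 2.0))


def genome_scan(snps, alpha=0.05):
    """Test every SNP; report those below the Bonferroni threshold alpha / m."""
    threshold = alpha / len(snps)
    hits = []
    for snp_id, (cases, controls) in snps.items():
        p = chi2_1df_pvalue(allelic_chi_square(cases, controls))
        if p < threshold:
            hits.append((snp_id, p))
    return threshold, hits


# Toy data (hypothetical): 200 cases and 200 controls at two made-up SNPs.
toy_snps = {
    "snp_A": ([2, 2, 1, 2, 1, 2, 2, 1, 2, 2] * 20, [0, 0, 1, 0, 0, 1, 0, 0, 0, 0] * 20),
    "snp_B": ([0, 1] * 100, [1, 0] * 100),
}
threshold, hits = genome_scan(toy_snps)
print(threshold, hits)
```

With real genome-wide data, m runs to hundreds of thousands or millions of SNPs, which is why the threshold becomes so stringent; the conventional genome-wide significance level of 5 × 10^-8 equals 0.05 / 1,000,000. Published analyses typically use regression models that also adjust for covariates such as ancestry, which this sketch leaves out.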


“The transfer of this data will greatly accelerate research on genetic influences on health, disease and aging,” said Catherine Schaefer, PhD, executive director of the Research Program on Genes, Environment and Health and co-principal investigator for GERA. “Making these data on such a large diverse cohort broadly available will enable many more scientists to work at a much greater scale that is likely to help answer important questions concerning health.”


“It’s all about time and money,” added Neil Risch, PhD, director of the UCSF Institute for Human Genetics and co-principal investigator for GERA. “Collecting large amounts of health data from people — and processing it — is labor intensive and expensive. With this data set, no one has to collect clinical information, take bio samples, safeguard and store them, or conduct genome-wide genotyping of their DNA. They can simply sit at a computer, ask questions of the data, and extract information.”


In addition to diseases and conditions traditionally associated with aging, such as cardiovascular disease, cancer and osteoarthritis, researchers can explore the potential genetic underpinnings of many diseases that affect people in adulthood, including depression, insomnia, diabetes, certain eye diseases and others across a range of disease domains. Researchers will also be able to use the database to confirm or refute findings from studies of relatively small numbers of people, and to increase the size and statistical power of their samples by adding GERA participants to meta-analyses. The large cohort will also serve as an excellent source of controls against which researchers can compare individuals with the conditions they study.
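One common way to pool results across studies in a meta-analysis is the fixed-effect inverse-variance method. The sketch below is a generic illustration under that assumption, not a description of how GERA data must be analyzed. It shows why adding a large cohort raises power: the pooled standard error shrinks as more precise studies are combined. The per-study effect sizes (log odds ratios) and standard errors are invented toy numbers, and the function name is hypothetical.

```python
import math


def inverse_variance_meta(estimates, std_errors):
    """Fixed-effect inverse-variance meta-analysis of one effect across studies.

    Each study's estimate (for example, a log odds ratio for one SNP) is
    weighted by 1 / SE^2. Returns the pooled estimate, its standard error,
    the z statistic and a two-sided normal p-value.
    """
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * b for w, b in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    z = pooled / pooled_se
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided: 2 * (1 - Phi(|z|))
    return pooled, pooled_se, z, p


# Toy numbers (hypothetical): a small study alone, then pooled with a larger one.
small_only = inverse_variance_meta([0.20], [0.10])
combined = inverse_variance_meta([0.20, 0.15], [0.10, 0.04])
print("small study:", small_only)
print("combined:   ", combined)
```

Because each study's weight is the inverse of its variance, a large, precisely measured cohort dominates the pooled estimate and narrows its confidence interval. A random-effects model would be the usual alternative when effects are expected to differ between studies.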


“The GERA cohort represents the largest number of people — of any age — with genetic, health and environmental data to be deposited in dbGaP,” said Hodes in his announcement. “New approaches to genomics were developed for this project and I’m pleased that it’s ready for researchers’ use in the dbGaP database. I look forward to new insights that such a unique resource might offer for better health with age.”


Investigators who are interested in applying for access to the database can find the procedures on the dbGaP website. Specific information on the data is also available online. Qualified researchers may also apply for access to RPGEH data and biospecimens at the RPGEH Research Portal.