Data • Stats Up AI data

This collection of online resources offers brief overviews and easy access to leading databases in the fields of genomics and cancer research. It covers various research topics, from gene expression profiles and cancer genomics to genotype-phenotype correlations. Users may need to create their own accounts to download and use the specific data sets they want.



Name (Link) Description
Gene Expression Omnibus
(GEO)

An international public repository that archives and freely distributes high-throughput gene expression, genomics, and epigenetics data submitted by the research community. GEO supports various functional genomics data types, including microarray, next-generation sequencing, and other forms of high-throughput genomics data.

cBioPortal for Cancer Genomics
(cBioPortal)

An open-access resource for interactive exploration of multidimensional cancer genomics datasets, with tools to visualize, analyze, and download large-scale data. cBioPortal is designed to help researchers identify driver genes and mutations, discover patterns of gene expression, and correlate genomic data with clinical outcomes.

Database of Genotypes and Phenotypes
(dbGaP)

A database that stores and distributes human genetic and phenotypic data from studies investigating the interaction of genotype and phenotype in humans. Funded by the National Institutes of Health (NIH), dbGaP facilitates sharing data from genetic and genomic research studies to improve our understanding of health and disease. Its holdings range from genome-wide association studies (GWAS) to clinical trials.

European Genome-phenome Archive
(EGA)

A secure and permanent repository for storing and sharing genetic and phenotypic data from biomedical research projects. Researchers can deposit and share datasets from high-throughput genomics and phenomics studies through the EGA under controlled access, which protects the privacy of participants.

International Cancer Genome Consortium Data Portal
(ICGC)

A global platform that provides access to a comprehensive dataset of genome sequences and related clinical information from various cancer projects. The ICGC Data Portal aims to facilitate the understanding of the genomic changes involved in cancer, offering tools for searching, visualizing, and downloading the data. The portal supports international efforts to standardize data reporting and provides a collaborative framework for sharing cancer genome information while ensuring that data access complies with participants' consent.

The National Cancer Institute’s Proteomic Data Commons
(PDC)

A comprehensive proteogenomic data repository managed by the National Cancer Institute (NCI), which offers mass spectrometry-based proteomic data organized by tumor type and study. The PDC serves as a critical resource for the cancer research community, providing open access to quality-controlled datasets, including those from the Clinical Proteomic Tumor Analysis Consortium (CPTAC) and International Cancer Proteogenome Consortium (ICPC), with the aim of advancing the understanding of cancer biology and improving the development of treatment strategies.

Tumor Immune Single-cell Hub 2
(TISCH2)

An online scRNA-seq database that focuses on the tumor microenvironment (TME). It provides detailed cell-type annotations at the single-cell level, which enables researchers to explore the TME across different cancer types. This comprehensive repository consists of 190 datasets and 6,297,320 cells related to various cancer types. TISCH2 supports researchers by providing access to a wealth of data for analysis, visualization, and downloading, aiming to foster discoveries in cancer research and immunotherapy.

Curated Cancer Cell Atlas
(3CA)

An online platform that compiles scRNA-seq data from global cancer research, aimed at mapping the diversity of cancer cell types and their microenvironments. It is known for its contribution to understanding tumor biology through meta-programs (Gavish et al. 2023, Nature). 3CA serves as a key resource for researchers exploring the cellular complexity of cancer, supporting advancements in targeted therapies and precision medicine.

UK Biobank
(UKB)

A large-scale biomedical database and research resource containing de-identified genetic, lifestyle and health information, and biological samples from half a million UK participants.

Trans-Omics for Precision Medicine
(TOPMed)

A comprehensive research program by the National Institutes of Health (NIH) aimed at enhancing our understanding of common and rare diseases through deep genomic and other "omic" data integration.

All of Us
(All of Us)

An initiative by the National Institutes of Health (NIH) to gather health data from one million or more people in the US to advance biomedical research and improve health.

Genome Sequencing Program Centers for Common Disease Genomics
(GSP CCDG)

A collaborative project aiming to generate and provide comprehensive genome sequencing data. The CCDG works alongside other genomic research initiatives to enhance our understanding of genetics, facilitate the discovery of novel genetic variants, and contribute to the advancement of personalized medicine.

Million Veteran Program
(MVP)

A research program conducted by the U.S. Department of Veterans Affairs (VA) that collects genetic, military exposure, lifestyle, and health information from veterans to study how genes affect health and illness. The goal is to improve disease screening, diagnosis, and treatment for veterans.

Alzheimer’s Disease Sequencing Project
(ADSP)

A national research project that aims to identify genetic variants that influence the risk for or protect against Alzheimer’s disease.

Functional Annotation of Variants Online Resource
(FAVOR)

An online platform designed to help researchers understand the biological impact of genetic variants. FAVOR integrates various genomic and proteomic data sources to provide comprehensive annotations and predictions on variant functions.

Global Lipids Genetics Consortium
(GLGC)

An international collaborative effort focused on identifying genetic factors that influence lipid traits and their relationship with cardiovascular diseases.

Human Genetics Amplifier
(HuGeAMP)

A tool and resource designed to amplify the findings from human genetic research. It integrates data from various genetic studies and databases.

University of Michigan PheWAS Portal
(PheWeb)

An online resource offering access to results from Phenome-Wide Association Studies (PheWAS), allowing researchers to explore associations between genetic variants and a wide range of phenotypes.

AstraZeneca PheWAS Portal
(AZPheWAS)

A portal that provides access to results from Phenome-Wide Association Studies (PheWAS) conducted using AstraZeneca's extensive genomic database.



