Statistical Bioinformatics Lab

Wenyi Wang received her PhD in Biostatistics from Johns Hopkins University in 2007 and completed joint postdoctoral training at the Stanford Genome Technology Center and UC Berkeley Statistics (2007-2010). In 2010, she joined the Department of Bioinformatics and Computational Biology at the University of Texas MD Anderson Cancer Center. Wenyi has contributed to statistical bioinformatics in cancer through methods such as MuSE for subclonal mutation calling, DeMixT for transcriptome deconvolution, and Famdenovo for de novo mutation identification, and, more recently, through a pan-cancer characterization of genetic intra-tumor heterogeneity and subclonal selection. Her group focuses on developing and applying computational methods to study the evolution of the human genome and the cancer genome, and on developing risk prediction models that accelerate the translation of biological findings into clinical practice.

Pre-doctoral and post-doctoral fellow positions are available (see the biostatistics position and the cancer genomics position). Please inquire with Dr. Wang.

Lab Event May 2024

Current Research Directions

Deconvolution and single-cell modeling for intra- and inter-tumor heterogeneity


Tissues contain many distinct cell types that together carry out their functions within an organism. The activities of these cell types are determined by their transcriptional patterns and, thus, can be characterized by studying RNA expression data. The same holds true for tumors; as such, there have been many attempts to use transcriptomic data to uncover new insights about cancer. Although highly useful for studying and characterizing tumors, single-cell RNA sequencing (scRNA-seq) data are costly and technically challenging to produce and, thus, often infeasible to generate at a large scale in clinical or research settings. Bulk RNA-seq therefore provides an attractive alternative, as it is significantly more cost-efficient than single-cell methods. However, bulk data cannot distinguish between signals from different cell types, which can make downstream analysis difficult or inaccurate. Deconvolution methods have been developed to separate these signals in bulk data, broadening the range of applications in which bulk data can be used. For example, by tracking cell-type proportions across disease progression, deconvolved bulk data can be used to better understand the mechanisms that may drive the disease [DeMixSC]. In cancer research specifically, deconvolution is often used to determine the proportions of tumor and non-tumor cells in a sample or the degree to which different cell types contribute to a tumor's characteristics. Such analyses have yielded valuable insights into cancer pathways, prognosis, treatment response, and intra- and inter-tumor heterogeneity [DeMixT, TmS] that can be used to improve treatment and outcomes for patients.
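To make the idea of deconvolution concrete, here is a minimal Python sketch of the simplest possible case: a bulk expression profile modeled as a linear mix of one tumor and one normal reference profile, with the tumor proportion estimated by least squares. This is a toy illustration under that two-component assumption, not DeMixT or DeMixSC, which handle more components, unobserved profiles, and technical noise; the function name and the expression values are hypothetical.

```python
def estimate_tumor_proportion(bulk, tumor, normal):
    """Least-squares estimate of pi in: bulk ~= pi * tumor + (1 - pi) * normal.

    Rewriting the model as (bulk - normal) ~= pi * (tumor - normal) gives a
    one-parameter regression with a closed-form solution. The estimate is
    clipped to [0, 1] so it remains a valid proportion.
    """
    diff = [t - n for t, n in zip(tumor, normal)]
    den = sum(d * d for d in diff)
    if den == 0:
        raise ValueError("tumor and normal profiles are identical; pi is not identifiable")
    num = sum((b - n) * d for b, n, d in zip(bulk, normal, diff))
    return min(1.0, max(0.0, num / den))


# Hypothetical expression values for five genes.
tumor = [12.0, 3.0, 8.0, 1.0, 6.0]
normal = [2.0, 9.0, 4.0, 5.0, 6.0]
# Simulated bulk sample containing 70% tumor cells.
bulk = [0.7 * t + 0.3 * n for t, n in zip(tumor, normal)]

print(round(estimate_tumor_proportion(bulk, tumor, normal), 3))  # prints 0.7
```

Real bulk data add noise and more cell types, which is why practical methods move to multi-component, statistically modeled versions of this regression.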

Mutation calling and subclonal reconstruction


Cancer is caused by genetic mutations, including single nucleotide variations (SNVs), copy number alterations (CNAs), and structural variations (SVs). These alterations determine the behavior of individual tumors, including their rate of proliferation, resistance to treatment, and potential for metastasis. The ability to identify and characterize mutations in tumors is therefore essential for cancer research and clinical translation. Whole-genome sequencing (WGS) and whole-exome sequencing (WES) data are commonly used for mutation calling: WGS because it provides the resolution to detect mutations genome-wide, and WES because its smaller data size lowers computational requirements. However, many current mutation calling methods take a long time to run, making large-scale analysis of genetic alterations difficult. Dr. Wang's lab is therefore developing computationally efficient mutation calling methods [MuSE2]. Such improvements enable the analysis of large volumes of data, which could in turn advance precision medicine and lead to new discoveries about cancer development. Similar advances are being made in reconstructing the subclonal architecture of tumors, that is, the number and characteristics of the subpopulations of cancer cells within individual tumors. Understanding subclonal architecture is critical, as the resulting genetic variation can confer treatment resistance and improved fitness on cancer cell populations. Dr. Wang's lab is therefore investigating new methods to reconstruct subclonal architecture and evolution more efficiently and with greater accuracy.
Specifically, her lab is addressing the weaknesses of previous methods, such as their lack of characterization of intra-tumor heterogeneity across cancer types or their reliance on extensive computational resources and prior knowledge [Characterizing ITH, CliPP]. The insights gained from mutation calling and subclonal reconstruction could help improve patient outcomes by giving researchers and clinicians a deeper understanding of the mechanisms that underlie cancer development and progression.
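As a rough illustration of one building block of subclonal reconstruction, the sketch below converts a mutation's variant allele frequency (VAF) into a cancer cell fraction (CCF) using the standard relationship VAF = purity × multiplicity × CCF / (purity × tumor copy number + 2 × (1 - purity)), then labels each mutation clonal or subclonal with a simple cutoff. It assumes a diploid normal genome, a known tumor purity, and a multiplicity of 1; the 0.9 threshold and all function names are hypothetical, and methods such as CliPP reconstruct subclones with far more careful statistical models than a fixed cutoff.

```python
def cancer_cell_fraction(vaf, purity, tumor_cn, multiplicity=1, normal_cn=2):
    """Convert a VAF to a CCF (assumes a diploid normal genome by default)."""
    # Average number of copies of the locus per cell in the bulk sample.
    expected_copies = purity * tumor_cn + normal_cn * (1.0 - purity)
    ccf = vaf * expected_copies / (purity * multiplicity)
    # Estimates slightly above 1 usually reflect sampling noise, so cap at 1.
    return min(ccf, 1.0)


def label_mutations(mutations, purity, clonal_threshold=0.9):
    """Label (vaf, tumor_cn) pairs using a hypothetical CCF cutoff."""
    labels = []
    for vaf, tumor_cn in mutations:
        ccf = cancer_cell_fraction(vaf, purity, tumor_cn)
        labels.append("clonal" if ccf >= clonal_threshold else "subclonal")
    return labels


# Hypothetical mutations at copy-neutral (CN = 2) loci in a sample with 80% purity.
muts = [(0.40, 2), (0.20, 2), (0.10, 2)]
print(label_mutations(muts, purity=0.8))  # ['clonal', 'subclonal', 'subclonal']
```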

Semi-parametric survival modeling for cancer risk prediction


Cancer survivors represent a fast-growing yet understudied population with regard to cancer risk. Second primary cancers, that is, new cancers that arise in cancer survivors, occur fairly often, particularly among survivors of breast or bladder cancer. However, because risk has not been accurately assessed among cancer survivors, previous cancers are often not considered in cancer prevention strategies. Further, conducting such assessments with existing data is difficult and may be biased, as large pan-cancer databases such as SEER do not account for covariates other than age and sex. To overcome these difficulties, Dr. Wang's lab examines data from patients with Li-Fraumeni syndrome (LFS), a heritable condition that increases one's risk of developing cancer. This population is particularly useful to study because patients can present with a wide variety of cancer types and are more likely than the general population to develop multiple primary cancers. Studying LFS patients also allows additional factors, such as mutation status, to be considered when evaluating cancer risk. Because the condition is heritable, family members of affected patients often undergo genetic screening as well, which allows cancer risk to be estimated from family members' characteristics along with individual data. Therefore, in addition to predicting cancer risk among survivors, Dr. Wang's lab is investigating the use of LFS data for risk prediction of first and second primary tumors in LFS families [LFSPRO]. A better understanding of cancer risk among cancer survivors and LFS patients can help physicians and genetic counselors make more personalized, data-driven decisions about patients' treatment and screening plans, enabling early detection of first or second primary cancers. [Personalized Risk Prediction]
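Risk estimates of the kind discussed above ultimately rest on a hazard function. A minimal sketch, assuming a piecewise-constant annual hazard over age bands (a common simplification, and not the LFSPRO model), shows how cumulative risk by a given age follows from integrating the hazard: risk = 1 - exp(-H(age)), where H is the cumulative hazard. The hazard values and the function name are hypothetical.

```python
import math


def cumulative_risk(hazards, band_years, age):
    """Probability of an event by `age` under a piecewise-constant hazard.

    hazards[i] is a hypothetical annual hazard for age band i, and each band
    spans `band_years` years starting from age 0.
    """
    if age > len(hazards) * band_years:
        raise ValueError("age extends beyond the supplied hazard bands")
    cum_hazard = 0.0
    remaining = age
    for h in hazards:
        step = min(band_years, remaining)  # years of exposure within this band
        if step <= 0:
            break
        cum_hazard += h * step
        remaining -= step
    # Survival is exp(-H), so the event probability is its complement.
    return 1.0 - math.exp(-cum_hazard)


# Hypothetical annual hazards for ages 0-10, 10-20, 20-30, and 30-40.
hazards = [0.002, 0.005, 0.010, 0.020]
print(round(cumulative_risk(hazards, 10, 35), 4))
```

Semi-parametric models estimate such a baseline hazard from data and let covariates (for example, mutation status or prior cancers) shift it, rather than fixing the values by hand as done here.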

PI: Wenyi Wang

Department of Bioinformatics and Computational Biology

Wenyi Wang (王文漪), Professor, Department of Bioinformatics and Computational Biology, Division of Basic Science Research, The University of Texas MD Anderson Cancer Center, Houston, Texas

Curriculum Vitae

Latest News

3 May 2024

Wang Lab Postdoc Ankita Paul Awarded MD Anderson IDSO Fellowship

Congratulations to our postdoc Ankita, who has been awarded the MD Anderson Institute for Data Science in Oncology (IDSO) Fellowship! This fellowship is a great opportunity that provides junior researchers with advanced training in applying data science to oncology. We are excited to see the impactful contributions she will make through this program!

3 May 2024

Wenyi Hooded PhD Graduates Nam and Yujie (Jeffery) at Rice University Commencement

Congratulations again to Dr. Nguyen and Dr. Jiang. We wish you all the best, and may your future endeavors be filled with success and fulfillment!

8 Apr 2024

The MuSE2.0 paper is online at Genome Research!

We are excited to officially introduce MuSE2.0, which reduces computing time by up to 50x compared to MuSE 1.0 and by 8-80x compared to other popular callers. Our benchmark study suggests that combining MuSE2.0 with the recently accelerated Strelka2 achieves high efficiency and accuracy in analyzing large cancer genomic datasets.


Free access here

3 Apr 2024

Journal of Clinical Oncology paper is online!

We conducted a validation study of our LFSPRO software suite, which was developed for risk predictions in families with Li-Fraumeni syndrome, on a clinical patient cohort collected as part of the Clinical Cancer Genetics program at MD Anderson Cancer Center. Unlike research datasets that are meticulously collected over decades for research purposes, our unique dataset closely resembles what genetic counselors observe in real counseling sessions. The validation results indicate that our risk prediction models have the potential to assist decision making in clinical settings, and further highlight the importance of such validation in bridging the gap between methodology research labs and clinics.


Free access at https://ascopubs.org/doi/10.1200/JCO.23.01926

27 Mar 2024

Congratulations to Dr. Nguyen and Dr. Jiang!


Congratulations to Nam H Nguyen and Yujie Jiang for successfully defending their PhD theses and earning their doctorates!

17 Mar 2024

Don't miss out! Join the Upcoming Webinar: "Empowering Statistics in the Era of AI - A Fireside Conversation"


Feb 2024

STATS UP AI - A community for Statistics and Biostatistics

Professors from multiple universities have launched StatsUpAI, aiming to elevate the role of statisticians in AI research. The initiative emphasizes empowering statisticians to lead and innovate in addressing real-world challenges through AI research. As one of the founding members, Wenyi is committed to advancing statistical methodologies in the era of artificial intelligence.

To learn more about StatsUpAI, visit their webpage at
https://statsupai.org

12 Feb 2024

LFSPROShiny paper is published online!

LFSPROShiny is an interactive R/Shiny application designed to perform risk prediction and visualization for Li-Fraumeni syndrome (LFS), a genetic disorder associated with TP53 mutations, enabling genetic counselors to assess patient risk profiles and support informed decision-making without the need for programming expertise.

Free access at https://ascopubs.org/doi/10.1200/CCI.23.00167.

16 Nov 2023

Join us at the 2023 Leading Edge of Cancer Research Symposium hosted by MD Anderson on Nov 16-17!

This in-person event provides an incredible no-cost opportunity to engage with and learn from national and international leaders in cancer research, including the chance to present new research at our poster session. Ten of the top posters will be selected for presentation as part of the symposium agenda and for monetary awards. The deadline to submit an abstract is Oct 20. Please bring your lab and program trainees to join us!
Click for more details.

9 Nov 2023

Congratulations on the launch of the Institute for Data Science in Oncology (IDSO) at MD Anderson!

IDSO integrates the most advanced computational and data science approaches with the institution’s extensive scientific and clinical expertise, aiming to profoundly enhance patient lives by revolutionizing oncological care and research.
Dr. Wang's lab is proudly affiliated with this pioneering initiative, dedicated to advancements in cancer care and research through innovative methodologies.

14 Oct 2023

DeMixSC paper is on bioRxiv!

The technological differences between bulk and sc/snRNA-seq data significantly diminish the accuracy of current deconvolution methods. To address this issue, we have introduced DeMixSC, an innovative deconvolution approach that overcomes this challenge using an improved weighted non-negative least squares (wNNLS) framework. DeMixSC is distinguished by its deconvolution accuracy and its generalizability to any large bulk cohort. All it requires is a small benchmark dataset that matches the tissue type of the target bulk cohort.
Click for software tool and paper preprint.
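The generic weighted non-negative least squares problem behind a wNNLS framework can be sketched in plain Python: minimize the sum over genes g of w_g (y_g - S_g p)^2 subject to p >= 0, where y is a bulk profile, S is a cell-type signature matrix, and p holds the cell-type proportions. The sketch below solves it by projected gradient descent after scaling each gene's row by sqrt(w_g). How DeMixSC actually derives its weights is not described here; the weights, signature matrix, and function name are arbitrary, hypothetical illustrations.

```python
import math


def weighted_nnls(S, y, w, n_iter=2000):
    """Projected-gradient solver for min sum_g w_g (y_g - S_g p)^2 with p >= 0."""
    # Scaling row g by sqrt(w_g) turns the weighted problem into plain NNLS.
    A = [[math.sqrt(wg) * s for s in row] for row, wg in zip(S, w)]
    b = [math.sqrt(wg) * yg for yg, wg in zip(y, w)]
    n_genes, k = len(A), len(A[0])
    # The squared Frobenius norm bounds the largest eigenvalue of A^T A,
    # so 1 / ||A||_F^2 is a safe (if conservative) step size.
    step = 1.0 / sum(a * a for row in A for a in row)
    p = [0.0] * k
    for _ in range(n_iter):
        resid = [sum(a * pj for a, pj in zip(row, p)) - bg for row, bg in zip(A, b)]
        grad = [sum(A[g][j] * resid[g] for g in range(n_genes)) for j in range(k)]
        # Gradient step, then project onto the non-negative orthant.
        p = [max(0.0, pj - step * gj) for pj, gj in zip(p, grad)]
    total = sum(p)
    # Rescale so the estimates read as proportions.
    return [pj / total for pj in p] if total > 0 else p


# Hypothetical signatures: two marker genes per cell type, three cell types.
S = [[10, 1, 1], [8, 1, 1], [1, 10, 1], [1, 8, 1], [1, 1, 10], [1, 1, 8]]
true_p = [0.6, 0.3, 0.1]
y = [sum(s * p for s, p in zip(row, true_p)) for row in S]
w = [1.0, 0.5, 1.0, 0.5, 1.0, 0.5]  # arbitrary per-gene weights
print([round(x, 3) for x in weighted_nnls(S, y, w)])  # [0.6, 0.3, 0.1]
```

With noise-free data any positive weights recover the same answer; weights matter when genes differ in how reliably they transfer between platforms, which is the kind of discrepancy the announcement describes.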

10 Aug 2023

Congratulations to our summer interns for their successful Poster Exhibition!


From left to right: Annabel Settle (Intern), Shuai Guo (PhD student), Armina Fani (Intern), Liyang Xie (Intern)

6 Jul 2023

The MuSE2.0 benchmark study is on bioRxiv!

We are excited to announce that we have benchmarked MuSE2.0 for somatic mutation calling. In computing time, it finishes one pair of WGS samples in under 1 hour; in accuracy, it achieves 99% recall of the PCAWG mutations. MuSE 2.0 employs a multithreaded producer-consumer model and the OpenMP library for parallel computing. Our preprint is on bioRxiv: https://doi.org/10.1101/2023.07.04.547569. We welcome user feedback.

MuSE2.0 is freely downloadable at
https://github.com/wwylab/MuSE.
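The producer-consumer pattern mentioned in the MuSE2.0 announcement can be illustrated with a short Python sketch: one producer splits the genome into regions and places them on a bounded queue, while several worker threads take regions off the queue and process them independently. MuSE2.0 itself uses OpenMP, so this only shows the pattern, not its implementation; `call_region` and `run_pipeline` are hypothetical names, and because of Python's global interpreter lock, pure-Python work like this would not gain CPU speedups from threads.

```python
import queue
import threading


def call_region(region):
    """Hypothetical stand-in for per-region work; returns the region length."""
    start, end = region
    return end - start


def run_pipeline(regions, n_workers=4, max_queued=8):
    tasks = queue.Queue(maxsize=max_queued)  # bounded buffer between the two sides
    results = []
    lock = threading.Lock()
    sentinel = None  # signals a consumer to stop

    def consumer():
        while True:
            region = tasks.get()
            if region is sentinel:
                break
            out = call_region(region)
            with lock:  # protect the shared results list
                results.append((region, out))

    workers = [threading.Thread(target=consumer) for _ in range(n_workers)]
    for t in workers:
        t.start()
    # The producer runs in the main thread; put() blocks while the buffer is full.
    for region in regions:
        tasks.put(region)
    for _ in range(n_workers):
        tasks.put(sentinel)  # one stop signal per consumer, queued after all work
    for t in workers:
        t.join()
    return sorted(results)


regions = [(i * 1000, (i + 1) * 1000) for i in range(20)]
print(len(run_pipeline(regions)))
```

The bounded queue keeps memory use flat: the producer cannot run far ahead of the consumers, which matters when each work item is a large chunk of sequencing data.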

1 May 2023

Join the 14th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM-BCB) in Houston, Sep 3-6, 2023!

Wenyi is co-chairing the program committee of the 14th ACM Conference on Bioinformatics, Computational Biology, and Health Informatics (ACM BCB 2023), the flagship conference of ACM SIGBio. It will be held in Houston, TX, September 3-6, 2023, following the previous 13 conferences held in Atlanta, Boston, Chicago, Newport Beach, Niagara Falls, online (due to COVID-19), Orlando, Seattle, and Washington DC. ACM BCB 2023 will continue to showcase leading-edge R&D on data collection, processing, analysis, and knowledge modeling for biological, clinical, and healthcare applications from bench to bedside.

8 Mar 2023

TmS Shiny app online!

To facilitate biological correlative analysis using tumor-specific total mRNA expression (TmS) across cancer types, we present a Shiny app that lets users without prior programming knowledge visually inspect sequencing data from ~6,500 cancer patients. We welcome user feedback:
https://wwylab.github.io/TmS/articles/shinyapp.html.

6 Mar 2023

Our new risk prediction modeling paper is on bioRxiv!

There will be more than 20 million cancer survivors in the US by 2026. To help characterize the risk trajectories of this understudied population, we propose a Bayesian semiparametric model that integrates competing cancer outcomes with a non-homogeneous Poisson process for recurring events. Check out our preprint for more of the results we achieved with this model: 10.1101/2023.02.28.530537v2.
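A non-homogeneous Poisson process (NHPP) generalizes the ordinary Poisson process by letting the event rate vary over time, here standing in for age. As a minimal sketch, recurring events from an NHPP can be simulated by Lewis-Shedler thinning: draw candidate times from a homogeneous process whose rate lambda_max bounds the intensity, and keep each candidate with probability lambda(t) / lambda_max. The linear intensity below is a hypothetical example; the Bayesian semiparametric model in the preprint, with its competing outcomes, is considerably richer.

```python
import random


def simulate_nhpp(intensity, t_max, lambda_max, rng):
    """Simulate event times on [0, t_max] by Lewis-Shedler thinning.

    Requires intensity(t) <= lambda_max for all t in [0, t_max].
    """
    times = []
    t = 0.0
    while True:
        # Next candidate from a homogeneous Poisson process with rate lambda_max.
        t += rng.expovariate(lambda_max)
        if t > t_max:
            return times
        lam = intensity(t)
        if lam > lambda_max:
            raise ValueError("intensity exceeds lambda_max")
        # Keep the candidate with probability lambda(t) / lambda_max.
        if rng.random() <= lam / lambda_max:
            times.append(t)


def intensity(t):
    """Hypothetical event rate rising linearly over a 10-unit window."""
    return 0.2 + 0.06 * t


events = simulate_nhpp(intensity, t_max=10.0, lambda_max=0.8, rng=random.Random(7))
print(len(events), [round(x, 2) for x in events])
```

The expected number of events equals the integral of the intensity over the window, which for this hypothetical rate is 5.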

31 Jan 2023

Congratulations to collaborator Di Zhao and our dream team for winning the PCF Challenge Award!

12 Jan 2023

Congrats: 4th-year PhD student Hoai Nam Nguyen for winning the ASA Section in Lifetime Data Science (LiDS) student paper award!

1 Dec 2022

TmS paper is now in print in Nature Biotechnology November issue.


https://www.nature.com/nbt/volumes/40/issues/11#Features

15 Jun 2022

Exciting TmS News in the media

Check out two recently published complementary reports on our work on tumor cell total mRNA expression from The Scientist magazine and GenomeWeb Precision Oncology! Click on this title to open both articles together!



Note: If you are only seeing one article, check your pop-up blocker

13 Jun 2022

TmS Blog is online

The TmS blog offers a "behind the paper" look at the motivation for and the work behind the study. Click the link below to check out the blog.
https://bioengineeringcommunity.nature.com/posts/tumor-specific-total-mrna-expression-a-robust-and-prognostic-feature-across-cancers

13 Jun 2022

Our TmS Metric is Published in Nature Biotechnology

We are very excited to share our paper in Nature Biotechnology today! This represents a huge amount of work by Shaolong Cao, Jennifer Wang, Shuangxi Ji, et al. It has been amazing to work with so many experts across cancers. Every cancer tells its own story, and our metric TmS can quantify it! Special thanks to Peter Van Loo's lab for a wonderful collaboration! Here is the link to the paper. The TmS data can be found at https://github.com/wwylab/TmS.
DOI: 10.1038/s41587-022-01342-x

28 Mar 2022

Wenyi will give a keynote talk at RECOMB May 22-25th 2022

The program of RECOMB 2022 is out, click here for more info. This is one of the two major annual conferences in the field of computational biology. Wenyi will talk about "Deciphering cancer cell evolution and ecology" at 8am on Wed May 25th!

20 Mar 2022

R01 grant achieved a 3rd percentile score!

Our methods R01 grant achieved a 3rd percentile priority score from the National Cancer Institute (NCI). This grant, titled "statistical methods for analysis of heterogeneous tumors," will support the development of integrative deconvolution models that unite the transcriptomic and genomic aspects of tumor heterogeneity and evolution.

27 Feb 2022

Lab receives DoD Prostate Cancer Research Program Data Science Award!

Wenyi's lab is awarded a CDMRP Department of Defense Prostate Cancer Research Program Data Science Award to develop an integrated genomic definition and therapeutic strategy for androgen-indifferent prostate cancer, in partnership with Dr. Ana Aparicio from the Department of Genitourinary Medical Oncology at MDACC.

27 Jan 2022

Congratulations to Nam for winning the ASA Statistical Genetics and Genomics Paper Award competition!

The title of the paper is Bayesian estimation of a joint semiparametric recurrent event model of multiple cancer types with application to the Li-Fraumeni Syndrome.

27 Oct 2021

Former PhD student Zeya Wang's paper on Bayesian Edge Regression in Undirected Graphical Models is accepted by JASA

He is now a Machine Learning Engineer at TikTok, California. Congrats Zeya!

28 Sep 2021

Wenyi will join the editorial board of JASA Applications and Case Studies as Associate Editor on Jan 1st 2022

A huge congratulations to Wenyi!

9 Sep 2021

We are proud to announce MuSE2.0

for somatic mutation calling, with a 50x speedup over MuSE1.0.

7 Apr 2021

Wang lab is in the MD Anderson News!

For our work on genetic diversity within tumors to understand cancer evolution

21 Jan 2021

Congrats: 3rd-year PhD student Yujie Jiang received the best paper award from ASA Section in Statistical Genetics and Genomics!

For his work on CliP: fast subclonal architecture reconstruction for cancer cells from genomic DNA sequencing data.

4 Oct 2020

We are excited to introduce a mathematical model to measure an essential RNA feature in tumor cells!

See how we use it to track tumor phenotypes at the link to the preprint below!

24 Aug 2020

Wenyi received a CPRIT grant for cancer prevention as co-PI. Congrats!!

Improving Risk Prediction for Li-Fraumeni Syndrome: A Practical Tool for Clinical Health Care Providers (Banu Arun/Wenyi Wang) - $896,896

18 Aug 2020

Famdenovo is published in the August issue of Genome Research

Congratulations to Fan, Elissa, Carlos and Matt!!

7 Jul 2020

A big congrats to Wenyi on her promotion to full professor, effective 9/1!

31 Jan 2020

DeMixT 1.2.3 released!

Available on Bioconductor and GitHub

11 Oct 2019

Our two companion papers on LFS are accepted at Cancer Research!

TP53 mutation-associated cancer-specific and multiple-primary-cancer onset penetrances.

18 Sep 2019

Welcome! New postdoctoral fellow Dr. Shuangxi Ji.

Dr. Shuangxi Ji received a PhD in Biological Sciences from University of Birmingham.

21 Aug 2019

Awarded! CZI Seed Network Fund for Human Cell Atlas

We are members of the Retina team.

23 May 2019

DeMixT is on Bioconductor.

Recommended download from Bioconductor

10 Jan 2019

MD/PhD student Carlos Vera Recio received an NLM fellowship. Congrats!

National Library of Medicine (NLM) Training Program in Biomedical Informatics and Data Science

9 Oct 2018

DeMixT paper accepted!

Transcriptome Deconvolution of Heterogeneous Tumor Samples with Immune Infiltration will be published by iScience!

9 Sep 2018

Our multiple primary model accepted by Biostatistics!

Our paper on Bayesian estimation of a semiparametric recurrent event model for multiple primary cancers has been accepted by Biostatistics!

5 May 2018

A new subclonal clustering method

Accompanying PCAWG11, our lab has developed CSR (pronounced "Caesar"), a new consensus clustering software for subclonal reconstruction.

5 May 2018

PCAWG Results

A huge endeavor of the Pan-Cancer Analysis Working Group (PCAWG): Portraits of genetic intra-tumour heterogeneity (ITH) and subclonal selection across cancer types.

29 Apr 2018

LFS statistical model accepted

Our first statistical modeling work for the Li-Fraumeni Syndrome (LFS) has been accepted by the Journal of the American Statistical Association after a 3-year journey! Congrats everyone!

29 Apr 2018

Postdoc opportunities

My TAMU collaborator Val Johnson and I are jointly recruiting postdocs in cancer bioinformatics through this 2-year training program at TAMU. Citizenship or green card required. Send us your CV if interested.

Group Members