Master Thesis: Modeling SNP-Conditioned Cellular Phenotypes via Flow Matching in the labs of Prof. Fabian Theis and Dr. Matthias Heinig

04.09.2025, Theses, Bachelor's and Master's theses

This Master's thesis aims to extend CellFlow to model how SNP-based genetic variation influences single-cell phenotypes and responses to perturbations. It integrates single-cell data with population genomics to predict genotype-dependent effects, focusing on epistasis and disease relevance. Candidates with machine learning and bioinformatics skills are encouraged to apply via email.

Context

Advances in high-throughput single-cell profiling and large-scale phenotypic screens have unlocked unprecedented insights into cellular heterogeneity and the effects of experimental perturbations. Tools like CellFlow (Klein et al. 2025), a generative framework based on flow matching, have demonstrated powerful capabilities in predicting cellular responses to perturbations such as drug treatments, gene knockouts, or cytokine stimulations.

In parallel, population-scale genomics projects such as OneK1K (Yazar et al. 2022), which combine genetic variation (e.g., SNPs) with single-cell RNA-seq data, provide a rich substrate for exploring how genetic variation modulates cellular phenotypes. This thesis will integrate these two directions, leveraging CellFlow to model how SNP-defined genetic backgrounds influence cellular responses to perturbations, with a particular focus on QTL-associated variants and epistatic interactions.

Thesis Goals and Research Questions

The primary objective of this thesis is to extend CellFlow to incorporate individual-specific genetic variation, particularly common SNPs, as conditioning variables. This enables predictive modeling of single-cell phenotypes under both genetic and experimental perturbation contexts, with applications ranging from fundamental questions in gene regulation to genetic risk prediction and genotype-informed response prediction. A longer-term goal is to position this framework as a tool for modeling cellular phenotypes relevant to disease predisposition, therapeutic response, and inter-individual variability. Specific questions include:

  • Can we model epistatic interactions between SNPs in terms of their effect on phenotypic response trajectories?
  • Can we identify causal variants and study gene-environment interactions?
  • How well can CellFlow predict phenotypes for unseen SNP combinations or rare alleles, particularly under perturbation conditions?
  • Can SNP-conditioned models reveal genotype-dependent variation in cellular responses that align with known or hypothesized disease mechanisms?
  • To what extent can SNP-conditioned models be generalized or transferred across genetically diverse populations or cell types?
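
As an illustration of the first question, pairwise epistasis is commonly quantified as the deviation of a joint effect from the sum of the individual effects. The sketch below is a minimal example under that assumption and is not part of CellFlow: `epistasis_score`, `additive_model`, and `interacting_model` are hypothetical names, and `predict` stands in for any model that maps a pair of genotype dosages to a scalar phenotype summary.

```python
# Hypothetical sketch: pairwise epistasis as the deviation from additivity.
# `predict` is a placeholder for any model mapping two genotype dosages
# to a scalar phenotype (e.g. predicted mean expression of one gene).

def epistasis_score(predict, geno_a, geno_b, ref_a=0, ref_b=0):
    """Interaction term: f(a, b) - f(a, ref) - f(ref, b) + f(ref, ref)."""
    return (
        predict(geno_a, geno_b)
        - predict(geno_a, ref_b)
        - predict(ref_a, geno_b)
        + predict(ref_a, ref_b)
    )


def additive_model(a, b):
    # Toy phenotype with purely additive SNP effects.
    return 1.0 + 0.5 * a + 0.25 * b


def interacting_model(a, b):
    # Same toy phenotype plus an explicit SNP-by-SNP interaction term.
    return additive_model(a, b) + 0.5 * a * b


additive_eps = epistasis_score(additive_model, 1, 1)
interacting_eps = epistasis_score(interacting_model, 1, 1)
print(additive_eps, interacting_eps)
```

A score of zero indicates that the two variants act additively on the chosen scale; a nonzero score indicates an interaction. In the thesis setting the same contrast would presumably be applied to model predictions rather than to toy functions.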

Methodology and Scope

  • Adapt the CellFlow architecture to ingest SNP-based sample covariates, including QTLs and epistatic variant pairs.
  • Use genotype-informed embeddings, for example one-hot encodings, gene embeddings inferred from ESM3 (Hayes et al. 2025), SNP embeddings derived from DNA language models (Nguyen et al. 2023; Hingerl et al. 2025), or known functional annotations (Zheng et al. 2024).
  • Evaluate on datasets such as OneK1K, eQTL Consortium, or sc-eQTLGen.
  • Compare against baseline models, such as:
    • Conditional VAEs
    • Epistasis prediction models
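
The simplest of the embeddings listed above, one-hot encoding, could look roughly as follows. This is a minimal sketch, assuming genotypes are given as alternate-allele dosages (0, 1, or 2) per SNP; `one_hot_genotypes` is a hypothetical helper and not part of the CellFlow API, and how the resulting vector is passed to the model as a condition is left open.

```python
# Hypothetical helper, not part of CellFlow: one-hot encode SNP genotypes
# given as alternate-allele dosages (0, 1, or 2) into a flat condition vector.

def one_hot_genotypes(dosages, n_classes=3):
    """Return a flat list with one n_classes-way one-hot block per SNP."""
    encoded = []
    for dosage in dosages:
        if not isinstance(dosage, int) or not 0 <= dosage < n_classes:
            raise ValueError(f"unexpected dosage: {dosage!r}")
        block = [0.0] * n_classes
        block[dosage] = 1.0
        encoded.extend(block)
    return encoded


# Toy donor with three SNPs: homozygous reference, heterozygous,
# homozygous alternate.
donor_condition = one_hot_genotypes([0, 1, 2])
print(donor_condition)
```

Each SNP contributes a block of three entries, so the vector grows linearly with the number of variants. That growth is one reason the learned or annotation-based embeddings in the list may be preferable for larger variant sets.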

Candidate Profile

  • Strong background in machine learning, computational biology, or bioinformatics
  • Familiarity with deep generative models
  • Experience with single-cell RNA-seq data and basic statistical genetics
  • Programming skills in Python, PyTorch or JAX, and data science tools

How to Apply

Send an email to matthias.heinig@tum.de, dominik.klein@helmholtz-munich.de, and lucas.arnoldt@tum.de with the following information:

  • Your CV
  • A brief introduction outlining your background and motivation
  • Your preferred start date
  • Academic transcripts

We look forward to receiving your email!

References

  • Klein et al. (2025). CellFlow enables generative single-cell phenotype modeling with flow matching. bioRxiv; DOI: 10.1101/2025.04.11.648220
  • Yazar et al. (2022). Single-cell eQTL mapping identifies cell type–specific genetic control of autoimmune disease. Science, 376 (6589); DOI: 10.1126/science.abf3041
  • Hayes et al. (2025). Simulating 500 million years of evolution with a language model. Science, 387 (6736); DOI: 10.1126/science.ads0018
  • Nguyen et al. (2023). HyenaDNA: Long-Range Genomic Sequence Modeling at Single Nucleotide Resolution. arXiv; DOI: 10.48550/arXiv.2306.15794
  • Hingerl et al. (2025). scooby: Modeling multi-modal genomic profiles from DNA sequence at single-cell resolution. bioRxiv; DOI: 10.1101/2024.09.19.613754
  • Zheng et al. (2024). Leveraging functional genomic annotations and genome coverage to improve polygenic prediction of complex traits within and between ancestries. Nature Genetics, 56; DOI: 10.1038/s41588-024-01704-y

Contact: lucas.arnoldt@tum.de

More information

https://www.helmholtz-munich.de/en/icb/research-groups/theis-lab