Current single-cell foundation models, including scGPT, can underperform simpler approaches at predicting cellular responses to perturbations, highlighting significant limitations in existing benchmark datasets:

🌲 Turbine's benchmarking showed scGPT lagging behind simpler approaches, such as averaging training samples and a Random Forest.

🐾 Models with biologically relevant features can significantly outperform scGPT.

🔻 Perturb-Seq datasets used for benchmarking are limited by low perturbation counts and a lack of intra-perturbation variance.

⛓ Pseudo-bulked expression profiles outperform foundation models, suggesting little-to-no advantage to single-cell-level modeling, especially in low-heterogeneity cell lines.

These findings suggest revisiting benchmarking practices for more effective evaluation of post-perturbation gene expression prediction. Read more in the preprint: https://lnkd.in/dKBGpYxr
computational systems biologist | principal bioinformatics scientist & research team lead at Turbine
Accurately predicting #cellular responses to #perturbations is crucial for understanding cell behavior in both healthy and diseased states. Recently, several large language model (LLM)-based single-cell #foundational models have been proposed for this task. But how well are they performing? At Turbine, we conducted #benchmarks on one of these models, scGPT, and uncovered some surprising results - see our new preprint: https://lnkd.in/dmHKbYQj

Even simple models, like averaging training samples, outperformed scGPT. A straightforward machine learning model, Random Forest, incorporating biologically meaningful features, outperformed it by a large margin.

We also discovered that current Perturb-Seq benchmark datasets generally contain a low number of perturbations and lack intra-perturbation variance, limiting their usefulness for robust benchmarking. Additionally, models using pseudo-bulked expression profiles outperformed foundation models, suggesting that single-cell level modeling may offer little advantage, especially in low-heterogeneity cell lines.

Our findings reveal important limitations in current benchmarking practices and offer new insights for more effective evaluation of post-perturbation gene expression prediction models. Thanks to the coauthors, Kristóf Szalay and especially Gerold Csendes, who led these efforts.
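To make the two simple baselines concrete, here is a minimal sketch (not Turbine's actual code, and using entirely synthetic data) of what "averaging training samples" and "pseudo-bulking" mean in practice: the first predicts every held-out response with one mean expression profile, the second averages single cells within each perturbation before any modeling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic single-cell data: 300 cells x 50 genes, 3 perturbation labels.
# (Purely illustrative; real Perturb-Seq matrices are far larger and sparser.)
n_cells, n_genes = 300, 50
expr = rng.poisson(5.0, size=(n_cells, n_genes)).astype(float)
perturbation = rng.integers(0, 3, size=n_cells)  # hypothetical labels

# Baseline 1: "averaging training samples" -- a single mean profile over all
# training cells, used as the prediction for every held-out perturbation.
mean_baseline = expr.mean(axis=0)  # shape: (n_genes,)

# Baseline 2: pseudo-bulking -- average cells within each perturbation,
# collapsing single-cell profiles into one bulk-like profile per label,
# which a downstream model (e.g. a Random Forest) can then be trained on.
pseudo_bulk = np.stack([
    expr[perturbation == p].mean(axis=0)
    for p in np.unique(perturbation)
])  # shape: (n_perturbations, n_genes)

print(mean_baseline.shape, pseudo_bulk.shape)
```

The point of the sketch is how little machinery these baselines need: any model that cannot beat a single averaged profile, or that loses to the same model trained on pseudo-bulked inputs, is not yet exploiting single-cell resolution.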
The billion-dollar question is how long this lag will last before the GPTs catch up and overtake. Experience shows it's typically not very long -> http://www.incompleteideas.net/IncIdeas/BitterLesson.html