Predict Modality
Predicting the profiles of one modality (e.g. protein abundance) from another (e.g. mRNA expression).
8 datasets · 5 methods · 4 control methods · 8 metrics
Experimental techniques for measuring multiple modalities within the same single cell are becoming increasingly available. The demand for these measurements is driven by their promise of deeper insight into the state of a cell. Yet the modalities are also intrinsically linked: DNA must be accessible (ATAC data) to produce mRNA (expression data), and mRNA in turn serves as a template to produce protein (protein abundance). These processes are often regulated by the same molecules that they produce; for example, a protein may bind DNA to prevent the production of more mRNA. Understanding these regulatory processes would be transformative for synthetic biology and drug target discovery. Any method that can predict one modality from another must have accounted for these regulatory processes, but the demand for multi-modal data shows that this is not trivial.
Results
Table of the scores per method, dataset, and metric (after scaling). Use the filters to select a custom subset of methods and datasets. The “Overall mean” dataset is the mean value across all datasets.
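The “Overall mean” aggregation can be sketched as follows. This is a minimal illustration, not the page's actual implementation: the record layout (`dataset_id`, `method_id`, `scaled_scores`) and the choice to skip missing scores are assumptions made for the example.

```python
from collections import defaultdict
from statistics import mean


def overall_mean(results):
    """Average each method's scaled score per metric across all datasets."""
    collected = defaultdict(list)
    for row in results:
        for metric_id, score in row["scaled_scores"].items():
            if score is not None:  # assumption: missing scores are skipped
                collected[(row["method_id"], metric_id)].append(score)
    return {key: mean(values) for key, values in collected.items()}


# Hypothetical records for two datasets and one method.
results = [
    {"dataset_id": "d1", "method_id": "knnr_py",
     "scaled_scores": {"rmse": 0.5, "mae": 0.25}},
    {"dataset_id": "d2", "method_id": "knnr_py",
     "scaled_scores": {"rmse": 0.75, "mae": None}},
]
summary = overall_mean(results)
```

A real pipeline might instead treat a missing score as the worst possible value, which would lower the mean for methods that failed on some datasets.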
Dataset info
NeurIPS2021 CITE-Seq (GEX2ADT)
Single-cell CITE-Seq (GEX+ADT) data collected from bone marrow mononuclear cells of 12 healthy human donors (Luecken et al. 2021).
Single-cell CITE-Seq data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10X 3 prime Single-Cell Gene Expression kit with Feature Barcoding in combination with the BioLegend TotalSeq B Universal Human Panel v1.0. The dataset was generated to support the Multimodal Single-Cell Data Integration Challenge at NeurIPS 2021. Samples were prepared using a standard protocol at four sites. The resulting data were then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout: some donor samples were measured at multiple sites, while others were measured at a single site.
NeurIPS2021 Multiome (GEX2ATAC)
Single-cell Multiome (GEX+ATAC) data collected from bone marrow mononuclear cells of 12 healthy human donors (Luecken et al. 2021).
Single-cell Multiome data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10X Multiome Gene Expression and Chromatin Accessibility kit. The dataset was generated to support the Multimodal Single-Cell Data Integration Challenge at NeurIPS 2021. Samples were prepared using a standard protocol at four sites. The resulting data were then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout: some donor samples were measured at multiple sites, while others were measured at a single site.
OpenProblems NeurIPS2022 Multiome (ATAC2GEX)
Single-cell Multiome (GEX+ATAC) data collected from bone marrow mononuclear cells of 12 healthy human donors (... 2024).
Single-cell Multiome data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10X Multiome Gene Expression and Chromatin Accessibility kit. The dataset was generated to support the Multimodal Single-Cell Data Integration Challenge at NeurIPS 2022. Samples were prepared using a standard protocol at four sites. The resulting data were then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout: some donor samples were measured at multiple sites, while others were measured at a single site.
NeurIPS2021 Multiome (ATAC2GEX)
Single-cell Multiome (GEX+ATAC) data collected from bone marrow mononuclear cells of 12 healthy human donors (Luecken et al. 2021).
Single-cell Multiome data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10X Multiome Gene Expression and Chromatin Accessibility kit. The dataset was generated to support the Multimodal Single-Cell Data Integration Challenge at NeurIPS 2021. Samples were prepared using a standard protocol at four sites. The resulting data were then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout: some donor samples were measured at multiple sites, while others were measured at a single site.
OpenProblems NeurIPS2022 Multiome (GEX2ATAC)
Single-cell Multiome (GEX+ATAC) data collected from bone marrow mononuclear cells of 12 healthy human donors (... 2024).
Single-cell Multiome data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10X Multiome Gene Expression and Chromatin Accessibility kit. The dataset was generated to support the Multimodal Single-Cell Data Integration Challenge at NeurIPS 2022. Samples were prepared using a standard protocol at four sites. The resulting data were then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout: some donor samples were measured at multiple sites, while others were measured at a single site.
OpenProblems NeurIPS2022 CITE-Seq (GEX2ADT)
Single-cell CITE-Seq (GEX+ADT) data collected from bone marrow mononuclear cells of 12 healthy human donors (... 2024).
Single-cell CITE-Seq data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10X 3 prime Single-Cell Gene Expression kit with Feature Barcoding in combination with the BioLegend TotalSeq B Universal Human Panel v1.0. The dataset was generated to support the Multimodal Single-Cell Data Integration Challenge at NeurIPS 2022. Samples were prepared using a standard protocol at four sites. The resulting data were then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout: some donor samples were measured at multiple sites, while others were measured at a single site.
OpenProblems NeurIPS2022 CITE-Seq (ADT2GEX)
Single-cell CITE-Seq (GEX+ADT) data collected from bone marrow mononuclear cells of 12 healthy human donors (... 2024).
Single-cell CITE-Seq data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10X 3 prime Single-Cell Gene Expression kit with Feature Barcoding in combination with the BioLegend TotalSeq B Universal Human Panel v1.0. The dataset was generated to support the Multimodal Single-Cell Data Integration Challenge at NeurIPS 2022. Samples were prepared using a standard protocol at four sites. The resulting data were then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout: some donor samples were measured at multiple sites, while others were measured at a single site.
NeurIPS2021 CITE-Seq (ADT2GEX)
Single-cell CITE-Seq (GEX+ADT) data collected from bone marrow mononuclear cells of 12 healthy human donors (Luecken et al. 2021).
Single-cell CITE-Seq data collected from bone marrow mononuclear cells of 12 healthy human donors using the 10X 3 prime Single-Cell Gene Expression kit with Feature Barcoding in combination with the BioLegend TotalSeq B Universal Human Panel v1.0. The dataset was generated to support the Multimodal Single-Cell Data Integration Challenge at NeurIPS 2021. Samples were prepared using a standard protocol at four sites. The resulting data were then annotated to identify cell types and remove doublets. The dataset was designed with a nested batch layout: some donor samples were measured at multiple sites, while others were measured at a single site.
Method info
KNNR (Py)
K-nearest neighbor regression in Python.
KNNR (R)
K-nearest neighbor regression in R.
Linear Model
A linear model regression method.
LMDS + IRLBA + RF
A random forest regression using LMDS of modality 1 to predict a PCA embedding of modality 2, which is then reversed to predict the original modality 2.
Guanlab-dengkw
A kernel ridge regression method with an RBF kernel.
This solution was developed by Team Guanlab-dengkw in the NeurIPS 2021 competition to predict one modality from another using kernel ridge regression (KRR) with an RBF kernel. Truncated SVD is applied to the combined training and test data from modality 1, followed by row-wise z-score normalization of the reduced matrix. A KRR model is trained on the normalized training matrix of modality 1 to predict the truncated SVD of modality 2. Predictions for the normalized test matrix are then mapped back to the modality 2 feature space via the right singular vectors.
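The steps described for the Guanlab-dengkw approach can be sketched with NumPy alone. This is an illustrative reconstruction under stated assumptions, not the team's code: the function names, the numbers of components, and the `alpha` and `gamma` values are hypothetical, and the kernel ridge regression is written in its closed form rather than with any particular library.

```python
import numpy as np


def rbf_kernel(a, b, gamma):
    """RBF kernel matrix between the rows of a and the rows of b."""
    sq = (a ** 2).sum(1)[:, None] + (b ** 2).sum(1)[None, :] - 2 * a @ b.T
    return np.exp(-gamma * np.maximum(sq, 0.0))


def zscore_rows(x):
    """Row-wise z-score; rows with zero spread are only centered."""
    mu = x.mean(axis=1, keepdims=True)
    sd = x.std(axis=1, keepdims=True)
    return (x - mu) / np.where(sd == 0, 1.0, sd)


def predict_modality(mod1_train, mod2_train, mod1_test,
                     n_comp1=5, n_comp2=3, alpha=1.0, gamma=0.1):
    n_train = len(mod1_train)
    # Truncated SVD on the combined training and test data of modality 1.
    combined = np.vstack([mod1_train, mod1_test])
    u, s, _ = np.linalg.svd(combined, full_matrices=False)
    reduced = zscore_rows(u[:, :n_comp1] * s[:n_comp1])
    x_train, x_test = reduced[:n_train], reduced[n_train:]
    # Truncated SVD of modality 2 gives the regression targets.
    u2, s2, vt2 = np.linalg.svd(mod2_train, full_matrices=False)
    y_train = u2[:, :n_comp2] * s2[:n_comp2]
    # Kernel ridge regression in closed form: (K + alpha * I) c = Y.
    k_train = rbf_kernel(x_train, x_train, gamma)
    dual = np.linalg.solve(k_train + alpha * np.eye(n_train), y_train)
    y_test = rbf_kernel(x_test, x_train, gamma) @ dual
    # Map back to modality 2 features via the right singular vectors.
    return y_test @ vt2[:n_comp2]


# Hypothetical toy data: 20 training cells, 5 test cells.
rng = np.random.default_rng(0)
pred = predict_modality(rng.random((20, 8)), rng.random((20, 6)),
                        rng.random((5, 8)))
```

The prediction has one row per test cell and one column per modality 2 feature. In practice the component counts and kernel parameters would be tuned per dataset.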
Control method info
Mean per gene
Returns the mean expression value per gene.
Random predictions
Returns random training profiles.
Zeros
Returns a prediction consisting of all zeros.
Solution
Returns the ground-truth solution.
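The four control methods are simple enough to sketch in plain Python. The function names and the list-of-lists layout (one row per cell, one column per gene) are assumptions for illustration, as is sampling the random profiles with replacement.

```python
import random


def mean_per_gene(train, n_test):
    """Predict, for every test cell, the per-gene mean of the training profiles."""
    n_train = len(train)
    means = [sum(column) / n_train for column in zip(*train)]
    return [list(means) for _ in range(n_test)]


def random_predictions(train, n_test, seed=0):
    """Predict each test cell as a randomly drawn training profile."""
    rng = random.Random(seed)
    return [list(rng.choice(train)) for _ in range(n_test)]


def zeros(train, n_test):
    """Predict all zeros for every test cell."""
    return [[0.0] * len(train[0]) for _ in range(n_test)]


def solution(truth):
    """Return the ground truth itself, as an upper-bound control."""
    return [list(row) for row in truth]


# Hypothetical modality 2 training profiles: 2 cells x 2 genes.
train = [[1.0, 2.0], [3.0, 6.0]]
mean_pred = mean_per_gene(train, 3)
random_pred = random_predictions(train, 4)
zero_pred = zeros(train, 2)
```

Controls like these bracket the scores that real methods can be expected to reach: the ground-truth solution marks the best achievable result, and the trivial predictors mark what a method with no real signal would score.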
Metric info
Mean pearson per cell
The mean of the Pearson correlations between per-cell expression value vectors (10.1098/rspl.1895.0041; 10.1093/biomet/30.1-2.81; 10.5194/gmdd-7-1525-2014).
Mean spearman per cell
The mean of the Spearman correlations between per-cell expression value vectors (10.1098/rspl.1895.0041; 10.1093/biomet/30.1-2.81; 10.5194/gmdd-7-1525-2014).
Mean pearson per gene
The mean of the Pearson correlations between per-gene expression value vectors (10.1098/rspl.1895.0041; 10.1093/biomet/30.1-2.81; 10.5194/gmdd-7-1525-2014).
Mean spearman per gene
The mean of the Spearman correlations between per-gene expression value vectors (10.1098/rspl.1895.0041; 10.1093/biomet/30.1-2.81; 10.5194/gmdd-7-1525-2014).
Overall pearson
The mean of the Pearson values of vectorized expression matrices (10.1098/rspl.1895.0041; 10.1093/biomet/30.1-2.81; 10.5194/gmdd-7-1525-2014).
Overall spearman
The mean of the Spearman values of vectorized expression matrices (10.1098/rspl.1895.0041; 10.1093/biomet/30.1-2.81; 10.5194/gmdd-7-1525-2014).
RMSE
The root mean squared error (10.1098/rspl.1895.0041; 10.1093/biomet/30.1-2.81; 10.5194/gmdd-7-1525-2014).
The square root of the mean of the squared errors between the observed and predicted expression values.
MAE
The mean absolute error (10.1098/rspl.1895.0041; 10.1093/biomet/30.1-2.81; 10.5194/gmdd-7-1525-2014).
The mean absolute difference between the observed and predicted expression values.
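The metrics listed above can be sketched in plain Python as follows. This is a minimal illustration rather than the benchmark's implementation: the function names and the list-of-lists layout (rows are cells, columns are genes) are assumptions, as is returning 0 for a correlation when one of the vectors is constant.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either vector is constant (assumption)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return 0.0 if vx == 0 or vy == 0 else cov / sqrt(vx * vy)


def ranks(values):
    """1-based ranks, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def mean_corr_per_cell(corr, truth, pred):
    """Mean correlation over rows (cells)."""
    return sum(corr(t, p) for t, p in zip(truth, pred)) / len(truth)


def mean_corr_per_gene(corr, truth, pred):
    """Mean correlation over columns (genes)."""
    return mean_corr_per_cell(corr, list(zip(*truth)), list(zip(*pred)))


def overall_corr(corr, truth, pred):
    """Correlation of the flattened (vectorized) matrices."""
    return corr([v for row in truth for v in row],
                [v for row in pred for v in row])


def rmse(truth, pred):
    errors = [(t - p) ** 2 for tr, pr in zip(truth, pred) for t, p in zip(tr, pr)]
    return sqrt(sum(errors) / len(errors))


def mae(truth, pred):
    errors = [abs(t - p) for tr, pr in zip(truth, pred) for t, p in zip(tr, pr)]
    return sum(errors) / len(errors)


# Hypothetical 2 cells x 3 genes example.
truth = [[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]]
pred = [[2.0, 3.0, 4.0], [1.0, 3.0, 5.0]]
per_cell = mean_corr_per_cell(pearson, truth, pred)
per_gene = mean_corr_per_gene(pearson, truth, pred)
```

In this toy example every cell's profile is predicted with the right shape, so the per-cell correlation is high even though every individual value is off by one; the per-gene correlation and the error metrics expose that difference.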
Quality control results
Category | Name | Value | Condition | Severity |
---|---|---|---|---|
Raw results | Dataset 'openproblems_neurips2022/pbmc_multiome/swap' %missing | 0.3611111 | pct_missing <= .1 | ✗✗✗ |
Dataset info | Pct 'task_id' missing | 1.0000000 | percent_missing(dataset_info, field) | ✗✗ |
Method info | Pct 'paper_reference' missing | 0.5555556 | percent_missing(method_info, field) | ✗✗ |
Raw results | Method 'guanlab_dengkw_pm' %missing | 0.2500000 | pct_missing <= .1 | ✗✗ |
Raw results | Method 'zeros' %missing | 0.2500000 | pct_missing <= .1 | ✗✗ |
Scaling | Worst score lmds_irlba_rf overall_pearson | -2.4102000 | worst_score >= -1 | ✗✗ |
Raw results | Metric 'overall_pearson' %missing | 0.1666667 | pct_missing <= .1 | ✗ |
Raw results | Metric 'overall_spearman' %missing | 0.1666667 | pct_missing <= .1 | ✗ |
Raw results | Dataset 'openproblems_neurips2022/pbmc_multiome/normal' %missing | 0.1388889 | pct_missing <= .1 | ✗ |
Raw results | Method 'knnr_py' %missing | 0.1250000 | pct_missing <= .1 | ✗ |
Raw results | Method 'lm' %missing | 0.1250000 | pct_missing <= .1 | ✗ |
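One way the `pct_missing <= .1` checks in the table could be computed is sketched below. The exact definition used by the benchmark is not given here, so this is an assumption: a score counts as missing when an expected (dataset, method, metric) combination has no value, and the fraction is taken per group. The function name and the record layout are hypothetical.

```python
from itertools import product


def pct_missing_by(results, dataset_ids, method_ids, metric_ids, group_field):
    """Fraction of expected (dataset, method, metric) scores missing per group."""
    observed = set()
    for row in results:
        for metric_id, score in row["scores"].items():
            if score is not None:
                observed.add((row["dataset_id"], row["method_id"], metric_id))
    index = {"dataset_id": 0, "method_id": 1, "metric_id": 2}[group_field]
    totals, missing = {}, {}
    for combo in product(dataset_ids, method_ids, metric_ids):
        group = combo[index]
        totals[group] = totals.get(group, 0) + 1
        if combo not in observed:
            missing[group] = missing.get(group, 0) + 1
    return {group: missing.get(group, 0) / totals[group] for group in totals}


# Hypothetical results: method "b" has one missing score and one missing run.
results = [
    {"dataset_id": "d1", "method_id": "a", "scores": {"m1": 0.9, "m2": 0.8}},
    {"dataset_id": "d2", "method_id": "a", "scores": {"m1": 0.7, "m2": 0.6}},
    {"dataset_id": "d1", "method_id": "b", "scores": {"m1": 0.5, "m2": None}},
]
by_method = pct_missing_by(results, ["d1", "d2"], ["a", "b"], ["m1", "m2"],
                           "method_id")
failing = [group for group, value in by_method.items() if value > 0.1]
```

Grouping by `dataset_id` or `metric_id` instead gives the per-dataset and per-metric rows of a table like the one above.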
References
... 2024. “Predicting Cellular Profiles Across Modalities in Longitudinal Single-Cell Data: An Open Problems Competition.” In Preparation.
Luecken, Malte, Daniel Burkhardt, Robrecht Cannoodt, Christopher Lance, Aditi Agrawal, Hananeh Aliee, Ann Chen, et al. 2021. “A Sandbox for Prediction and Integration of DNA, RNA, and Proteins in Single Cells.” In Proceedings of the Neural Information Processing Systems Track on Datasets and Benchmarks, edited by J. Vanschoren and S. Yeung. Vol. 1. Curran. https://datasets-benchmarks-proceedings.neurips.cc/paper_files/paper/2021/file/158f3069a435b314a80bdcb024f8e422-Paper-round2.pdf.