- Subedi, Sishir, and Yongjin P. Park. "Decomposing patient heterogeneity of single-cell cancer data by cross-attention neural networks." medRxiv 2025.06.04.25328900
The following packages are required:
- anndata==0.10.8
- annoy==1.17.0
- numpy==1.24.4
- pandas>=2.0.3
- scanpy==1.9.3
- torch==2.5.1
We highly recommend installing picasa from PyPI in a new conda environment:
```bash
conda create --name picasa_env "python>=3.9"
conda activate picasa_env
pip install picasa
```
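After installation, it can help to confirm that the installed versions match the pinned requirements listed above. The sketch below uses only the standard library's `importlib.metadata`. The helper names (`parse_requirement`, `satisfies`, `check_requirements`) are hypothetical and not part of picasa. The version comparison is a simplification that only understands numeric dotted versions and the `==`/`>=` operators; use the `packaging` library for full PEP 440 semantics.

```python
import re
from importlib.metadata import PackageNotFoundError, version

# Requirements as listed in this README.
REQUIREMENTS = [
    "anndata==0.10.8",
    "annoy==1.17.0",
    "numpy==1.24.4",
    "pandas>=2.0.3",
    "scanpy==1.9.3",
    "torch==2.5.1",
]

def parse_requirement(spec):
    """Split e.g. 'pandas>=2.0.3' into ('pandas', '>=', '2.0.3')."""
    for op in ("==", ">="):
        if op in spec:
            name, wanted = spec.split(op)
            return name.strip(), op, wanted.strip()
    raise ValueError(f"unsupported specifier: {spec}")

def _as_tuple(v):
    # Simplification: drop local labels such as '+cu121' and keep the leading
    # digits of each dotted part; stop at the first non-numeric part.
    nums = []
    for part in v.split("+")[0].split("."):
        m = re.match(r"\d+", part)
        if m is None:
            break
        nums.append(int(m.group()))
    return tuple(nums)

def satisfies(installed, op, wanted):
    """Compare two versions, padding with zeros so '2.0' equals '2.0.0'."""
    a, b = _as_tuple(installed), _as_tuple(wanted)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a == b if op == "==" else a >= b

def check_requirements(specs=REQUIREMENTS):
    """Return {package: status} for each requirement spec."""
    report = {}
    for spec in specs:
        name, op, wanted = parse_requirement(spec)
        try:
            installed = version(name)
        except PackageNotFoundError:
            report[name] = "missing"
            continue
        ok = satisfies(installed, op, wanted)
        report[name] = "ok" if ok else f"found {installed}, need {op}{wanted}"
    return report

if __name__ == "__main__":
    for name, status in check_requirements().items():
        print(f"{name}: {status}")
```

Run it inside the activated `picasa_env`; any line other than `ok` points to a package to reinstall at the pinned version.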
- Lung cancer: The lung cancer dataset is available from the NCBI Gene Expression Omnibus (GEO) under accession GSE148071.
- Ovarian cancer: The high-grade serous ovarian cancer (HGSOC) dataset is available from GEO under accession GSE165897.
- Breast cancer: The breast cancer single-cell dataset is available from GEO under accession GSE176078.
- Normal pancreas: The normal pancreas dataset is available from the Seurat data integration tutorial, https://satijalab.org/seurat/archive/v3.2/integration.html.
- Simulation data: The dataset is available from the Figshare platform: https://figshare.com/articles/dataset/Benchmarking_atlas-level_data_integration_in_single-cell_genomics_-integration_task_datasets_Immune_and_pancreas/12420968.
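Each `GSE…` identifier for the cancer datasets is a GEO Series accession. A small, hypothetical helper that builds the public GEO landing-page URL for each accession is sketched below. The `acc.cgi?acc=…` pattern is NCBI's standard accession query page; the `DATASETS` mapping and the `geo_url` name are illustrative assumptions, and which supplementary files to download from each page is not specified here.

```python
from urllib.parse import urlencode

# NCBI GEO accession query page.
GEO_QUERY = "https://www.ncbi.nlm.nih.gov/geo/query/acc.cgi"

# Hypothetical mapping of the cancer datasets to their GEO Series accessions.
DATASETS = {
    "lung": "GSE148071",
    "ovarian_hgsoc": "GSE165897",
    "breast": "GSE176078",
}

def geo_url(accession: str) -> str:
    """Return the GEO landing-page URL for a GSE series accession."""
    if not (accession.startswith("GSE") and accession[3:].isdigit()):
        raise ValueError(f"not a GEO series accession: {accession!r}")
    return f"{GEO_QUERY}?{urlencode({'acc': accession})}"

if __name__ == "__main__":
    for name, acc in DATASETS.items():
        print(f"{name}: {geo_url(acc)}")
```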
For step-by-step tutorials, please refer to the following notebooks:
- Tutorial 1. Training the PICASA model on simulated datasets.
- Tutorial 2. Plotting all three latent representations learned by the model.
- Tutorial 3. Analysis of the cross-attention matrix estimated by the model.
- Tutorial 4. Cancer common representation analysis.
- Tutorial 5. Cancer unique representation analysis.
- Tutorial 6. Cancer patient outcome analysis.
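For orientation before Tutorial 3, the sketch below computes a generic scaled dot-product cross-attention matrix with NumPy: queries come from one set of items and keys/values from another, and each row of the attention matrix is a softmax over the key items, so every row sums to 1. This is the textbook formulation, not necessarily PICASA's exact architecture. The function name `cross_attention`, the random projection matrices, and the toy shapes (items × 20 features, projected to 8 dimensions) are illustrative assumptions.

```python
import numpy as np

def cross_attention(queries, keys, values, w_q, w_k, w_v):
    """Generic scaled dot-product cross-attention (illustrative only).

    queries: (n_query, n_features); keys/values: (n_key, n_features).
    Returns the (n_query, n_key) attention matrix and the attended output.
    """
    q = queries @ w_q                       # (n_query, d)
    k = keys @ w_k                          # (n_key, d)
    v = values @ w_v                        # (n_key, d_v)
    scores = q @ k.T / np.sqrt(q.shape[1])  # scaled similarity scores
    scores -= scores.max(axis=1, keepdims=True)    # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)  # row-wise softmax
    return weights, weights @ v

# Toy example: 5 query items attend over 7 key items, 20 features each.
rng = np.random.default_rng(0)
x_a = rng.normal(size=(5, 20))
x_b = rng.normal(size=(7, 20))
w_q, w_k, w_v = (rng.normal(size=(20, 8)) for _ in range(3))
attn, out = cross_attention(x_a, x_b, x_b, w_q, w_k, w_v)
most_attended = attn.argmax(axis=1)  # key index each query weights most
```

Inspecting rows of such a matrix (for example, which key items receive the largest weights) is one common way to interpret what a cross-attention layer has learned.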
