{ "cells": [ { "cell_type": "markdown", "id": "3838b43a", "metadata": {}, "source": [ "# Run InSituCNV\n", "\n", "This notebook runs the full InSituCNV workflow on a spatial transcriptomics dataset:\n", "normalize raw counts, smooth over neighbors, normalize and log-transform, add genomic positions, run `infercnvpy`, cluster CNV profiles, plot heatmaps and spatial CNV maps, then manually choose normal/tumor clusters and optional tumor clone labels.\n", "\n", "The example setup uses generic terminology. Replace the paths, annotation column, and reference categories with values for your own dataset." ] }, { "cell_type": "markdown", "id": "5da280e2", "metadata": {}, "source": [ "## 1. Setup\n", "\n", "Run this notebook from the package environment:\n", "\n", "```bash\n", "conda activate insitucnv_env\n", "cd InSituCNV\n", "pip install -e .\n", "jupyter lab\n", "```" ] }, { "cell_type": "code", "execution_count": null, "id": "2e6d355b", "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "import pandas as pd\n", "import scanpy as sc\n", "import insitucnv as icv\n", "\n", "sc.settings.verbosity = 2\n", "sc.set_figure_params(figsize=(6, 6), dpi=100)\n", "\n", "SAMPLE_NAME = \"example_sample\"\n", "DATA_PATH = Path(\"data/your_dataset.h5ad\")\n", "OUTPUT_DIR = Path(\"outputs\") / SAMPLE_NAME\n", "PLOTS_DIR = OUTPUT_DIR / \"plots\"\n", "\n", "RAW_LAYER = \"raw_counts\" # Change this if your raw count layer has another name.\n", "REFERENCE_KEY = \"cell_type\" # obs column containing normal/reference labels.\n", "REFERENCE_CATEGORIES = [\n", " \"T_cells\",\n", " \"B_cells\",\n", " \"Stromal\",\n", "]\n", "SPATIAL_KEY = \"spatial\"\n", "\n", "OUTPUT_DIR.mkdir(parents=True, exist_ok=True)\n", "PLOTS_DIR.mkdir(parents=True, exist_ok=True)\n", "\n", "DATA_PATH" ] }, { "cell_type": "markdown", "id": "161fce91", "metadata": {}, "source": [ "## 2. Load the AnnData object\n", "\n", "Before running the CNV steps, check that your AnnData object has:\n", "\n", "- raw counts in `adata.layers[RAW_LAYER]`, or raw counts in `adata.X` that can be copied into that layer\n", "- normal/reference labels in `adata.obs[REFERENCE_KEY]`\n", "- spatial coordinates in `adata.obsm[SPATIAL_KEY]`\n", "- a precomputed neighbor graph for smoothing" ] }, { "cell_type": "code", "execution_count": null, "id": "fe0bd3c6", "metadata": {}, "outputs": [], "source": [ "adata = sc.read_h5ad(DATA_PATH)\n", "adata" ] }, { "cell_type": "code", "execution_count": null, "id": "4dddeb88", "metadata": {}, "outputs": [], "source": [ "print(\"layers:\", list(adata.layers.keys()))\n", "print(\"obsm:\", list(adata.obsm.keys()))\n", "print(\"obs columns:\", list(adata.obs.columns))\n", "print(\"reference labels:\", sorted(adata.obs[REFERENCE_KEY].astype(str).unique()))" ] }, { "cell_type": "code", "execution_count": null, "id": "ccc7b8f0", "metadata": {}, "outputs": [], "source": [ "if SPATIAL_KEY not in adata.obsm:\n", " raise KeyError(f\"adata.obsm[{SPATIAL_KEY!r}] is required for spatial plots.\")\n", "\n", "if REFERENCE_KEY not in adata.obs:\n", " raise KeyError(f\"adata.obs[{REFERENCE_KEY!r}] is required for inferCNV reference labels.\")\n", "\n", "if RAW_LAYER not in adata.layers:\n", " print(f\"Layer {RAW_LAYER!r} was not found. Copying current adata.X into adata.layers[{RAW_LAYER!r}].\")\n", " adata.layers[RAW_LAYER] = adata.X.copy()\n", "\n", "icv.pl.plot_spatial(\n", " adata,\n", " color=REFERENCE_KEY,\n", " spatial_key=SPATIAL_KEY,\n", " output_path=PLOTS_DIR / \"reference_labels_spatial.png\",\n", " point_size=15,\n", " title=f\"{SAMPLE_NAME} reference labels\",\n", " show=True,\n", ")" ] }, { "cell_type": "markdown", "id": "548075e7", "metadata": {}, "source": [ "## 3. Define inferCNV reference cells\n", "\n", "The reference categories should be normal or healthy cells for your dataset. For a tissue sample this is often immune, stromal, endothelial, or other non-tumor categories, depending on your annotations." ] }, { "cell_type": "code", "execution_count": null, "id": "07db7ea6", "metadata": {}, "outputs": [], "source": [ "present_reference_categories = [\n", " category\n", " for category in REFERENCE_CATEGORIES\n", " if category in set(adata.obs[REFERENCE_KEY].astype(str))\n", "]\n", "\n", "if not present_reference_categories:\n", " raise ValueError(\"None of the configured REFERENCE_CATEGORIES were found in the reference annotation column.\")\n", "\n", "present_reference_categories" ] }, { "cell_type": "markdown", "id": "c527faf5", "metadata": {}, "source": [ "## 4. Normalize, smooth, log-normalize, and add genomic positions\n", "\n", "This step:\n", "\n", "1. restores raw counts from `adata.layers[RAW_LAYER]` into `adata.X`\n", "2. normalizes total counts into `adata.layers[\"norm\"]`\n", "3. smooths normalized counts over the neighbor graph into `adata.layers[\"M\"]`\n", "4. normalizes and log-transforms the smoothed matrix into `adata.layers[\"log_norm\"]`\n", "5. subsets to genes with genomic positions and adds `chromosome`, `start`, and `end`" ] }, { "cell_type": "code", "execution_count": null, "id": "81541f8e", "metadata": {}, "outputs": [], "source": [ "adata = icv.tl.prepare_cnv_input(\n", " adata,\n", " raw_layer=RAW_LAYER,\n", " target_sum=1e4,\n", " smoothing_neighbors=100,\n", " normalized_layer=\"norm\",\n", " smoothed_layer=\"M\",\n", " log_layer=\"log_norm\",\n", ")\n", "\n", "adata" ] }, { "cell_type": "code", "execution_count": null, "id": "c225c573", "metadata": {}, "outputs": [], "source": [ "adata.var[[\"chromosome\", \"start\", \"end\"]].head()" ] }, { "cell_type": "markdown", "id": "07a3fd77", "metadata": {}, "source": [ "## 5. Run inferCNV\n", "\n", "Adjust `window_size`, `step`, and `lfc_clip` if you need different inferCNV settings for your panel or tissue." ] }, { "cell_type": "code", "execution_count": null, "id": "a4950494", "metadata": {}, "outputs": [], "source": [ "icv.tl.run_infercnv(\n", " adata,\n", " reference_key=REFERENCE_KEY,\n", " reference_categories=present_reference_categories,\n", " input_layer=\"log_norm\",\n", " window_size=60,\n", " step=10,\n", " lfc_clip=4,\n", " chunksize=1000,\n", " calculate_gene_values=True,\n", ")\n", "\n", "adata" ] }, { "cell_type": "markdown", "id": "cca6225e", "metadata": {}, "source": [ "## 6. Compute CNV neighbors and cluster at multiple resolutions\n", "\n", "Testing several resolutions lets you compare how granular the CNV clone structure should be. The keys created here are `cnv_leiden_res0.1`, `cnv_leiden_res0.2`, and `cnv_leiden_res0.3`." ] }, { "cell_type": "code", "execution_count": null, "id": "fa2d929b", "metadata": {}, "outputs": [], "source": [ "icv.tl.compute_cnv_neighbors(adata)\n", "\n", "resolutions = [0.1, 0.2, 0.3]\n", "cluster_keys = icv.tl.cluster_cnv_resolutions(\n", " adata,\n", " resolutions=resolutions,\n", " dendrogram=True,\n", ")\n", "\n", "cluster_keys" ] }, { "cell_type": "code", "execution_count": null, "id": "1b70f81d", "metadata": {}, "outputs": [], "source": [ "for key in cluster_keys:\n", " display(pd.crosstab(adata.obs[key], adata.obs[REFERENCE_KEY]))" ] }, { "cell_type": "markdown", "id": "0845e449", "metadata": {}, "source": [ "## 7. Plot CNV heatmaps and spatial CNV clusters\n", "\n", "Review these plots before assigning any cluster as normal, tumor, or a separate tumor clone." ] }, { "cell_type": "code", "execution_count": null, "id": "9c55cd40", "metadata": {}, "outputs": [], "source": [ "for key in cluster_keys:\n", " print(key)\n", " icv.pl.plot_chromosome_heatmap(\n", " adata,\n", " groupby=key,\n", " output_path=PLOTS_DIR / f\"{key}_heatmap.png\",\n", " dendrogram=True,\n", " vmin=-0.4,\n", " vmax=0.4,\n", " show=True,\n", " )\n", " icv.pl.plot_spatial(\n", " adata,\n", " color=key,\n", " output_path=PLOTS_DIR / f\"{key}_spatial.png\",\n", " spatial_key=SPATIAL_KEY,\n", " point_size=1,\n", " title=f\"{SAMPLE_NAME} {key}\",\n", " show=True,\n", " )" ] }, { "cell_type": "markdown", "id": "7f2be605", "metadata": {}, "source": [ "## 8. Manually choose normal, tumor, and clone clusters\n", "\n", "Use the heatmaps and spatial maps above to decide which clustering resolution to annotate. The package no longer infers a normal cluster automatically; edit the lists below after visual review." ] }, { "cell_type": "code", "execution_count": null, "id": "900d97f7", "metadata": {}, "outputs": [], "source": [ "primary_key = \"cnv_leiden_res0.1\"\n", "\n", "cluster_summary = (\n", " adata.obs[primary_key]\n", " .astype(str)\n", " .value_counts()\n", " .sort_index()\n", " .rename_axis(primary_key)\n", " .to_frame(\"n_cells\")\n", ")\n", "cluster_summary" ] }, { "cell_type": "code", "execution_count": null, "id": "51dc68e9", "metadata": {}, "outputs": [], "source": [ "# Edit these lists after inspecting the heatmap and spatial CNV plot.\n", "normal_clusters = [] # e.g. [\"0\"]\n", "tumor_clusters = [] # optional; leave empty to label every non-normal cluster as tumor\n", "tumor_clone_clusters = [] # e.g. [\"1\", \"2\"] for tumor clusters with distinct CNV profiles\n", "\n", "if not normal_clusters and not tumor_clusters:\n", " raise ValueError(\"Edit normal_clusters and/or tumor_clusters after reviewing the CNV plots above.\")\n", "\n", "all_clusters = set(adata.obs[primary_key].astype(str).unique())\n", "normal_clusters_for_clones = normal_clusters or sorted(all_clusters - set(tumor_clusters))\n", "\n", "icv.tl.assign_cnv_status(\n", " adata,\n", " cluster_key=primary_key,\n", " normal_clusters=normal_clusters or None,\n", " tumor_clusters=tumor_clusters or None,\n", " output_key=\"cnv_status\",\n", ")\n", "icv.tl.assign_cnv_subclones(\n", " adata,\n", " cluster_key=primary_key,\n", " tumor_clusters=tumor_clone_clusters,\n", " normal_clusters=normal_clusters_for_clones,\n", " output_key=\"cnv_clone\",\n", ")\n", "\n", "display(pd.crosstab(adata.obs[primary_key], adata.obs[\"cnv_status\"]))\n", "display(pd.crosstab(adata.obs[primary_key], adata.obs[\"cnv_clone\"]))\n", "\n", "for color, filename, title in [\n", " (\"cnv_status\", \"cnv_status_spatial.png\", f\"{SAMPLE_NAME} manual CNV status\"),\n", " (\"cnv_clone\", \"cnv_clone_spatial.png\", f\"{SAMPLE_NAME} manual CNV clones\"),\n", "]:\n", " icv.pl.plot_spatial(\n", " adata,\n", " color=color,\n", " output_path=PLOTS_DIR / filename,\n", " spatial_key=SPATIAL_KEY,\n", " point_size=1,\n", " title=title,\n", " show=True,\n", " )" ] }, { "cell_type": "markdown", "id": "f89f8248", "metadata": {}, "source": [ "## 9. Optional: tumor-enriched CNV clustering\n", "\n", "For some datasets, you may want to recluster tumor-enriched cells (e.g. epithelial cells in carcinomas) by CNV profile while labeling all other cells as `non-tumor`. Edit `subset_values` if your tumor-enriched annotation uses a different name." ] }, { "cell_type": "code", "execution_count": null, "id": "ad62d046", "metadata": {}, "outputs": [], "source": [ "epi_cluster_keys = icv.tl.cluster_cnv_resolutions(\n", " adata,\n", " resolutions=[0.1],\n", " key_prefix=\"epi_cnv_leiden_res\",\n", " subset_key=REFERENCE_KEY,\n", " subset_values=[\"Epithelial\"],\n", " subset_label_prefix=\"epi_\",\n", " outside_label=\"non-epi\",\n", " dendrogram=True,\n", ")\n", "\n", "epi_cluster_keys" ] }, { "cell_type": "code", "execution_count": null, "id": "1b9c6c8e", "metadata": {}, "outputs": [], "source": [ "for key in epi_cluster_keys:\n", " icv.pl.plot_spatial(\n", " adata,\n", " color=key,\n", " output_path=PLOTS_DIR / f\"{key}_spatial.png\",\n", " spatial_key=SPATIAL_KEY,\n", " point_size=1,\n", " title=f\"{SAMPLE_NAME} {key}\",\n", " show=True,\n", " )" ] }, { "cell_type": "markdown", "id": "bc2f0123", "metadata": {}, "source": [ "## 10. Export manually selected results\n", "\n", "Run these cells after filling in the manual `normal_clusters`, `tumor_clusters`, and optional `tumor_clone_clusters` lists above. The gene-level table uses `adata.layers[\"gene_values_cnv\"]`, which was created by `infercnvpy` because `calculate_gene_values=True`." ] }, { "cell_type": "code", "execution_count": null, "id": "2fd6aaf6", "metadata": {}, "outputs": [], "source": [ "mean_cnv = icv.tl.export_mean_cnv_per_gene(\n", " adata,\n", " OUTPUT_DIR / f\"{SAMPLE_NAME}_tumor_mean_cnv_per_gene.tsv\",\n", " layer=\"gene_values_cnv\",\n", " obs_key=\"cnv_status\",\n", " obs_values=(\"tumor\",),\n", ")\n", "\n", "mean_cnv.head()" ] }, { "cell_type": "code", "execution_count": null, "id": "f3793f6a", "metadata": {}, "outputs": [], "source": [ "icv.tl.export_cell_groups(\n", " adata,\n", " OUTPUT_DIR / f\"{SAMPLE_NAME}_cnv_status_xenium_cell_groups.csv\",\n", " group_key=\"cnv_status\",\n", ")\n", "icv.tl.export_cell_groups(\n", " adata,\n", " OUTPUT_DIR / f\"{SAMPLE_NAME}_cnv_clone_xenium_cell_groups.csv\",\n", " group_key=\"cnv_clone\",\n", ")\n", "\n", "adata.write(OUTPUT_DIR / f\"{SAMPLE_NAME}_InSituCNV.h5ad\", compression=\"gzip\")\n", "OUTPUT_DIR" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.9" } }, "nbformat": 4, "nbformat_minor": 5 }