{ "cells": [ { "cell_type": "markdown", "id": "6a9c869b", "metadata": {}, "source": [ "# 02. Smooth And Run inferCNV\n", "\n", "This notebook runs the core `insituCNV` workflow in a step-by-step format.\n", "\n", "The idea is that users can read each block, change a few parameters, and rerun only the relevant cells.\n", "\n", "The expected starting point is a checked `.h5ad` file from Notebook 1, with:\n", "\n", "- `adata.layers['raw_counts']`\n", "- `adata.obsm['spatial']`\n", "- a valid annotation column that marks normal/reference cells" ] }, { "cell_type": "code", "execution_count": null, "id": "bc5fec80", "metadata": {}, "outputs": [], "source": [ "from pathlib import Path\n", "import infercnvpy as cnv\n", "import matplotlib.pyplot as plt\n", "import numpy as np\n", "import pandas as pd\n", "import scanpy as sc\n", "\n", "PROJECT_ROOT = Path('..').resolve()\n", "import insitucnv as icv\n", "from insitucnv.analysis import find_optimal_clustering\n", "\n", "INPUT_PATH = PROJECT_ROOT / 'results' / 'example_workflow' / 'adata_checked_input.h5ad'\n", "OUTPUT_DIR = PROJECT_ROOT / 'results' / 'example_workflow'\n", "OUTPUT_DIR.mkdir(parents=True, exist_ok=True)\n", "\n", "REFERENCE_KEY = 'cell_type'\n", "REFERENCE_CATEGORIES = ['T_cells', 'B_cells', 'Stromal']\n", "\n", "SMOOTHING_NEIGHBORS = 20\n", "WINDOW_SIZE = 60\n", "STEP = 10\n", "LFC_CLIP = 4\n", "CLUSTER_RESOLUTIONS = [0.1, 0.2, 0.3]\n", "\n", "sc.settings.figdir = str(OUTPUT_DIR / 'figures')\n", "Path(sc.settings.figdir).mkdir(parents=True, exist_ok=True)" ] }, { "cell_type": "markdown", "id": "c8855ad1", "metadata": {}, "source": [ "## Load The Checked Input" ] }, { "cell_type": "code", "execution_count": null, "id": "34bc5a90", "metadata": {}, "outputs": [], "source": [ "adata = sc.read_h5ad(INPUT_PATH)\n", "adata" ] }, { "cell_type": "markdown", "id": "d10cedf5", "metadata": {}, "source": [ "## Keep Only The Reference Categories That Are Present\n", "\n", "This makes the notebook more robust when users adapt it to a different dataset." ] }, { "cell_type": "code", "execution_count": null, "id": "53f05a80", "metadata": {}, "outputs": [], "source": [ "REFERENCE_CATEGORIES = [cat for cat in REFERENCE_CATEGORIES if cat in set(adata.obs[REFERENCE_KEY].astype(str))]\n", "REFERENCE_CATEGORIES" ] }, { "cell_type": "markdown", "id": "19596d76", "metadata": {}, "source": [ "## Build The Expression Neighbor Graph Used For Smoothing\n", "\n", "The smoothing step uses nearest-neighbor relationships from the expression space.\n", "\n", "We normalize the raw counts, compute PCA and neighbors, and then smooth the normalized counts over the graph." ] }, { "cell_type": "code", "execution_count": null, "id": "2f9a334c", "metadata": {}, "outputs": [], "source": [ "adata.X = adata.layers['raw_counts'].copy()\n", "sc.pp.normalize_total(adata)\n", "adata.layers['norm'] = adata.X.copy()\n", "sc.pp.log1p(adata)\n", "sc.pp.pca(adata)\n", "sc.pp.neighbors(adata, n_neighbors=15)" ] }, { "cell_type": "markdown", "id": "f5f43ca1", "metadata": {}, "source": [ "## Smooth The Data For CNV Inference\n", "\n", "This writes a smoothed count matrix into `adata.layers['M']`." ] }, { "cell_type": "code", "execution_count": null, "id": "cf249f7f", "metadata": {}, "outputs": [], "source": [ "icv.tl.smooth_data_for_cnv(\n", " adata,\n", " layer_w_norm_counts='norm',\n", " n_neighbors=SMOOTHING_NEIGHBORS,\n", ")\n", "adata.layers['smooth_norm'] = adata.layers['M'].copy()" ] }, { "cell_type": "markdown", "id": "0ad22986", "metadata": {}, "source": [ "## Prepare The Smoothed Matrix For inferCNV\n", "\n", "This follows the same pattern used in the publication workflow:\n", "\n", "- move the smoothed matrix into `adata.X`\n", "- normalize again\n", "- log-transform\n", "- add genomic positions" ] }, { "cell_type": "code", "execution_count": null, "id": "40a950dd", "metadata": {}, "outputs": [], "source": [ "adata.X = adata.layers['M'].copy()\n", "sc.pp.normalize_total(adata)\n", "sc.pp.log1p(adata)\n", "adata = icv.pp.add_genomic_positions(adata)\n", "adata" ] }, { "cell_type": "markdown", "id": "6769ce59", "metadata": {}, "source": [ "## Run inferCNV\n", "\n", "The chosen reference categories should represent normal cells or normal tissue regions." ] }, { "cell_type": "code", "execution_count": null, "id": "470b1f90", "metadata": {}, "outputs": [], "source": [ "cnv.tl.infercnv(\n", " adata,\n", " window_size=WINDOW_SIZE,\n", " step=STEP,\n", " lfc_clip=LFC_CLIP,\n", " reference_key=REFERENCE_KEY,\n", " reference_cat=REFERENCE_CATEGORIES,\n", " chunksize=1000,\n", " calculate_gene_values=True,\n", ")" ] }, { "cell_type": "markdown", "id": "0fef5325", "metadata": {}, "source": [ "## CNV PCA, Neighbors, And Resolution Sweep" ] }, { "cell_type": "code", "execution_count": null, "id": "bbda6691", "metadata": {}, "outputs": [], "source": [ "cnv.tl.pca(adata)\n", "cnv.pp.neighbors(adata)\n", "\n", "metrics = find_optimal_clustering(\n", " adata,\n", " resolutions=CLUSTER_RESOLUTIONS,\n", " obsm_key='X_cnv',\n", " spatial_key='spatial',\n", ")\n", "metrics" ] }, { "cell_type": "markdown", "id": "fee159c7", "metadata": {}, "source": [ "## Choose A Clustering Resolution\n", "\n", "Edit `SELECTED_RESOLUTION` after looking at the metrics table above.\n", "\n", "For a quick first pass, `0.1` is often a sensible default on smaller samples." ] }, { "cell_type": "code", "execution_count": null, "id": "7b03a693", "metadata": {}, "outputs": [], "source": [ "SELECTED_RESOLUTION = 0.1\n", "CLUSTER_KEY = f'cnv_leiden_{SELECTED_RESOLUTION:g}'\n", "\n", "cnv.tl.leiden(adata, resolution=SELECTED_RESOLUTION, key_added=CLUSTER_KEY)\n", "adata.obs[CLUSTER_KEY].value_counts()" ] }, { "cell_type": "markdown", "id": "1c0fb90b", "metadata": {}, "source": [ "## Review CNV clusters before manual annotation\n", "\n", "Inspect the cluster sizes, spatial plot, and chromosome heatmap before assigning any cluster as normal or tumor. No normal cluster is inferred automatically." ] }, { "cell_type": "code", "execution_count": null, "id": "d140f57e", "metadata": {}, "outputs": [], "source": [ "cluster_summary = (\n", " adata.obs[CLUSTER_KEY]\n", " .astype(str)\n", " .value_counts()\n", " .sort_index()\n", " .rename_axis(CLUSTER_KEY)\n", " .to_frame('n_cells')\n", ")\n", "cluster_summary" ] }, { "cell_type": "markdown", "id": "3cf2d06d", "metadata": {}, "source": [ "## Visualize the CNV heatmap and spatial clusters\n", "\n", "These are the main plots to inspect before deciding which clusters are normal, tumor, or distinct tumor clones." ] }, { "cell_type": "code", "execution_count": null, "id": "076f2eb2", "metadata": {}, "outputs": [], "source": [ "sc.pl.embedding(adata, basis='spatial', color=REFERENCE_KEY, frameon=False, size=12)" ] }, { "cell_type": "code", "execution_count": null, "id": "0f22570a", "metadata": {}, "outputs": [], "source": [ "sc.pl.embedding(adata, basis='spatial', color=CLUSTER_KEY, frameon=False, size=12)" ] }, { "cell_type": "code", "execution_count": null, "id": "9c1384e0", "metadata": {}, "outputs": [], "source": [ "cnv.pl.chromosome_heatmap(\n", " adata,\n", " groupby=CLUSTER_KEY,\n", " dendrogram=True,\n", " vmin=-0.4,\n", " vmax=0.4,\n", ")" ] }, { "cell_type": "markdown", "id": "ab7b65ac", "metadata": {}, "source": [ "## Manually label tumor/normal clusters and optional tumor clones\n", "\n", "Edit the lists below after reviewing the heatmap and spatial CNV plot. Use `tumor_clone_clusters` only for tumor clusters with CNV profiles that are distinct enough to report separately." ] }, { "cell_type": "code", "execution_count": null, "id": "dc911e1d", "metadata": {}, "outputs": [], "source": [ "NORMAL_CLUSTERS = [] # e.g. ['0']\n", "TUMOR_CLUSTERS = [] # optional; leave empty to label every non-normal cluster as tumor\n", "TUMOR_CLONE_CLUSTERS = [] # e.g. ['1', '2']\n", "\n", "if not NORMAL_CLUSTERS and not TUMOR_CLUSTERS:\n", " raise ValueError('Edit NORMAL_CLUSTERS and/or TUMOR_CLUSTERS after reviewing the CNV plots above.')\n", "\n", "all_clusters = set(adata.obs[CLUSTER_KEY].astype(str).unique())\n", "normal_clusters_for_clones = NORMAL_CLUSTERS or sorted(all_clusters - set(TUMOR_CLUSTERS))\n", "\n", "icv.tl.assign_cnv_status(\n", " adata,\n", " cluster_key=CLUSTER_KEY,\n", " normal_clusters=NORMAL_CLUSTERS or None,\n", " tumor_clusters=TUMOR_CLUSTERS or None,\n", " output_key='cnv_status',\n", ")\n", "icv.tl.assign_cnv_subclones(\n", " adata,\n", " cluster_key=CLUSTER_KEY,\n", " tumor_clusters=TUMOR_CLONE_CLUSTERS,\n", " normal_clusters=normal_clusters_for_clones,\n", " output_key='cnv_clone',\n", ")\n", "\n", "display(pd.crosstab(adata.obs[CLUSTER_KEY], adata.obs['cnv_status']))\n", "display(pd.crosstab(adata.obs[CLUSTER_KEY], adata.obs['cnv_clone']))\n", "sc.pl.embedding(adata, basis='spatial', color=['cnv_status', 'cnv_clone'], frameon=False, size=12)" ] }, { "cell_type": "markdown", "id": "75bbb288", "metadata": {}, "source": [ "## Save The Final CNV Object" ] }, { "cell_type": "code", "execution_count": null, "id": "3c1b7521", "metadata": {}, "outputs": [], "source": [ "metrics.to_csv(OUTPUT_DIR / 'cluster_resolution_metrics.csv', index=False)\n", "adata.write(OUTPUT_DIR / 'adata_cnv_results.h5ad', compression='gzip')\n", "print(OUTPUT_DIR / 'adata_cnv_results.h5ad')" ] } ], "metadata": { "kernelspec": { "display_name": "Python 3 (ipykernel)", "language": "python", "name": "python3" }, "language_info": { "codemirror_mode": { "name": "ipython", "version": 3 }, "file_extension": ".py", "mimetype": "text/x-python", "name": "python", "nbconvert_exporter": "python", "pygments_lexer": "ipython3", "version": "3.11.9" } }, "nbformat": 4, "nbformat_minor": 5 }