Run InSituCNV
This notebook runs the full InSituCNV workflow on a spatial transcriptomics dataset:
normalize raw counts, smooth over neighbors, normalize and log-transform, add genomic positions, run infercnvpy, cluster CNV profiles, plot heatmaps and spatial CNV maps, then manually choose normal/tumor clusters and optional tumor clone labels.
The example setup uses generic terminology. Replace the paths, annotation column, and reference categories with values for your own dataset.
1. Setup
Run this notebook from the package environment:
conda activate insitucnv_env
cd InSituCNV
pip install -e .
jupyter lab
from pathlib import Path
import pandas as pd
import scanpy as sc
import insitucnv as icv
sc.settings.verbosity = 2
sc.set_figure_params(figsize=(6, 6), dpi=100)
SAMPLE_NAME = "example_sample"
DATA_PATH = Path("data/your_dataset.h5ad")
OUTPUT_DIR = Path("outputs") / SAMPLE_NAME
PLOTS_DIR = OUTPUT_DIR / "plots"
RAW_LAYER = "raw_counts" # Change this if your raw count layer has another name.
REFERENCE_KEY = "cell_type" # obs column containing normal/reference labels.
REFERENCE_CATEGORIES = [
"T_cells",
"B_cells",
"Stromal",
]
SPATIAL_KEY = "spatial"
OUTPUT_DIR.mkdir(parents=True, exist_ok=True)
PLOTS_DIR.mkdir(parents=True, exist_ok=True)
DATA_PATH
2. Load the AnnData object
Before running the CNV steps, check that your AnnData object has:
raw counts in
adata.layers[RAW_LAYER], or raw counts inadata.Xthat can be copied into that layernormal/reference labels in
adata.obs[REFERENCE_KEY]spatial coordinates in
adata.obsm[SPATIAL_KEY]a precomputed neighbor graph for smoothing
adata = sc.read_h5ad(DATA_PATH)
adata
print("layers:", list(adata.layers.keys()))
print("obsm:", list(adata.obsm.keys()))
print("obs columns:", list(adata.obs.columns))
print("reference labels:", sorted(adata.obs[REFERENCE_KEY].astype(str).unique()))
if SPATIAL_KEY not in adata.obsm:
raise KeyError(f"adata.obsm[{SPATIAL_KEY!r}] is required for spatial plots.")
if REFERENCE_KEY not in adata.obs:
raise KeyError(f"adata.obs[{REFERENCE_KEY!r}] is required for inferCNV reference labels.")
if RAW_LAYER not in adata.layers:
print(f"Layer {RAW_LAYER!r} was not found. Copying current adata.X into adata.layers[{RAW_LAYER!r}].")
adata.layers[RAW_LAYER] = adata.X.copy()
icv.pl.plot_spatial(
adata,
color=REFERENCE_KEY,
spatial_key=SPATIAL_KEY,
output_path=PLOTS_DIR / "reference_labels_spatial.png",
point_size=15,
title=f"{SAMPLE_NAME} reference labels",
show=True,
)
3. Define inferCNV reference cells
The reference categories should be normal or healthy cells for your dataset. For a tissue sample this is often immune, stromal, endothelial, or other non-tumor categories, depending on your annotations.
present_reference_categories = [
category
for category in REFERENCE_CATEGORIES
if category in set(adata.obs[REFERENCE_KEY].astype(str))
]
if not present_reference_categories:
raise ValueError("None of the configured REFERENCE_CATEGORIES were found in the reference annotation column.")
present_reference_categories
4. Normalize, smooth, log-normalize, and add genomic positions
This step:
restores raw counts from
adata.layers[RAW_LAYER]intoadata.Xnormalizes total counts into
adata.layers["norm"]smooths normalized counts over the neighbor graph into
adata.layers["M"]normalizes and log-transforms the smoothed matrix into
adata.layers["log_norm"]subsets to genes with genomic positions and adds
chromosome,start, andend
adata = icv.tl.prepare_cnv_input(
adata,
raw_layer=RAW_LAYER,
target_sum=1e4,
smoothing_neighbors=100,
normalized_layer="norm",
smoothed_layer="M",
log_layer="log_norm",
)
adata
adata.var[["chromosome", "start", "end"]].head()
5. Run inferCNV
Adjust window_size, step, and lfc_clip if you need different inferCNV settings for your panel or tissue.
icv.tl.run_infercnv(
adata,
reference_key=REFERENCE_KEY,
reference_categories=present_reference_categories,
input_layer="log_norm",
window_size=60,
step=10,
lfc_clip=4,
chunksize=1000,
calculate_gene_values=True,
)
adata
6. Compute CNV neighbors and cluster at multiple resolutions
Testing several resolutions lets you compare how granular the CNV clone structure should be. The keys created here are cnv_leiden_res0.1, cnv_leiden_res0.2, and cnv_leiden_res0.3.
icv.tl.compute_cnv_neighbors(adata)
resolutions = [0.1, 0.2, 0.3]
cluster_keys = icv.tl.cluster_cnv_resolutions(
adata,
resolutions=resolutions,
dendrogram=True,
)
cluster_keys
for key in cluster_keys:
display(pd.crosstab(adata.obs[key], adata.obs[REFERENCE_KEY]))
7. Plot CNV heatmaps and spatial CNV clusters
Review these plots before assigning any cluster as normal, tumor, or a separate tumor clone.
for key in cluster_keys:
print(key)
icv.pl.plot_chromosome_heatmap(
adata,
groupby=key,
output_path=PLOTS_DIR / f"{key}_heatmap.png",
dendrogram=True,
vmin=-0.4,
vmax=0.4,
show=True,
)
icv.pl.plot_spatial(
adata,
color=key,
output_path=PLOTS_DIR / f"{key}_spatial.png",
spatial_key=SPATIAL_KEY,
point_size=1,
title=f"{SAMPLE_NAME} {key}",
show=True,
)
8. Manually choose normal, tumor, and clone clusters
Use the heatmaps and spatial maps above to decide which clustering resolution to annotate. The package no longer infers a normal cluster automatically; edit the lists below after visual review.
primary_key = "cnv_leiden_res0.1"
cluster_summary = (
adata.obs[primary_key]
.astype(str)
.value_counts()
.sort_index()
.rename_axis(primary_key)
.to_frame("n_cells")
)
cluster_summary
# Edit these lists after inspecting the heatmap and spatial CNV plot.
normal_clusters = [] # e.g. ["0"]
tumor_clusters = [] # optional; leave empty to label every non-normal cluster as tumor
tumor_clone_clusters = [] # e.g. ["1", "2"] for tumor clusters with distinct CNV profiles
if not normal_clusters and not tumor_clusters:
raise ValueError("Edit normal_clusters and/or tumor_clusters after reviewing the CNV plots above.")
all_clusters = set(adata.obs[primary_key].astype(str).unique())
normal_clusters_for_clones = normal_clusters or sorted(all_clusters - set(tumor_clusters))
icv.tl.assign_cnv_status(
adata,
cluster_key=primary_key,
normal_clusters=normal_clusters or None,
tumor_clusters=tumor_clusters or None,
output_key="cnv_status",
)
icv.tl.assign_cnv_subclones(
adata,
cluster_key=primary_key,
tumor_clusters=tumor_clone_clusters,
normal_clusters=normal_clusters_for_clones,
output_key="cnv_clone",
)
display(pd.crosstab(adata.obs[primary_key], adata.obs["cnv_status"]))
display(pd.crosstab(adata.obs[primary_key], adata.obs["cnv_clone"]))
for color, filename, title in [
("cnv_status", "cnv_status_spatial.png", f"{SAMPLE_NAME} manual CNV status"),
("cnv_clone", "cnv_clone_spatial.png", f"{SAMPLE_NAME} manual CNV clones"),
]:
icv.pl.plot_spatial(
adata,
color=color,
output_path=PLOTS_DIR / filename,
spatial_key=SPATIAL_KEY,
point_size=1,
title=title,
show=True,
)
9. Optional: tumor-enriched CNV clustering
For some datasets, you may want to recluster tumor-enriched cells (e.g. epithelial cells in carcinomas) by CNV profile while labeling all other cells as non-tumor. Edit subset_values if your tumor-enriched annotation uses a different name.
epi_cluster_keys = icv.tl.cluster_cnv_resolutions(
adata,
resolutions=[0.1],
key_prefix="epi_cnv_leiden_res",
subset_key=REFERENCE_KEY,
subset_values=["Epithelial"],
subset_label_prefix="epi_",
outside_label="non-epi",
dendrogram=True,
)
epi_cluster_keys
for key in epi_cluster_keys:
icv.pl.plot_spatial(
adata,
color=key,
output_path=PLOTS_DIR / f"{key}_spatial.png",
spatial_key=SPATIAL_KEY,
point_size=1,
title=f"{SAMPLE_NAME} {key}",
show=True,
)
10. Export manually selected results
Run these cells after filling in the manual normal_clusters, tumor_clusters, and optional tumor_clone_clusters lists above. The gene-level table uses adata.layers["gene_values_cnv"], which was created by infercnvpy because calculate_gene_values=True.
mean_cnv = icv.tl.export_mean_cnv_per_gene(
adata,
OUTPUT_DIR / f"{SAMPLE_NAME}_tumor_mean_cnv_per_gene.tsv",
layer="gene_values_cnv",
obs_key="cnv_status",
obs_values=("tumor",),
)
mean_cnv.head()
icv.tl.export_cell_groups(
adata,
OUTPUT_DIR / f"{SAMPLE_NAME}_cnv_status_xenium_cell_groups.csv",
group_key="cnv_status",
)
icv.tl.export_cell_groups(
adata,
OUTPUT_DIR / f"{SAMPLE_NAME}_cnv_clone_xenium_cell_groups.csv",
group_key="cnv_clone",
)
adata.write(OUTPUT_DIR / f"{SAMPLE_NAME}_InSituCNV.h5ad", compression="gzip")
OUTPUT_DIR