Example Analysis

This tutorial introduces three common simulation workflows in SpaSimAgent:

  1. reference-based simulation at tissue level,
  2. reference-free simulation within domains, and
  3. scRNA-seq-based simulation without a deconvolution matrix.

The examples are intentionally minimal and use small demo datasets hosted in the project repository, so you can run them quickly and compare outputs.

    library(SpaSimAgent)

The sections below follow that order: reference-based first, then reference-free, and finally scRNA-seq-guided.

Reference-Based Simulation within Tissue

In this first example, we load Xenium breast cancer demo data and run runReferenceBased() with scheme = "tissue".
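The cor_list object is a named list of correlation matrices that supplies the correlation structure used during simulation; here it holds a single matrix, named after the label found in location$label.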

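    # Fetch the demo files through GitHub's /raw/ path; a /blob/ URL returns an HTML page.
    # readRDS() does not accept a URL string, so the URL is wrapped in gzcon(url()).
    location <- read.csv("https://github.com/YMa-lab/SpaSimAgent/raw/main/data/xenium_bc_loc_demo.csv", row.names = 1)
    count <- as.matrix(readRDS(gzcon(url("https://github.com/YMa-lab/SpaSimAgent/raw/main/data/xenium_bc_count_demo.rds"))))
    corMat <- as.matrix(readRDS(gzcon(url("https://github.com/YMa-lab/SpaSimAgent/raw/main/data/xenium_bc_cor_demo.rds"))))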
    cor_list <- list(corMat)
    names(cor_list) <- unique(location$label)
    
    ### simulate
    set.seed(1)
    res <- runReferenceBased(count, 
                             location, 
                             cor_list,
                             marginal = "auto",
                             scheme = "tissue")
    
    new <- res@simulated_counts
    print(new[1:3,1:3])

The printed 3 x 3 slice confirms that a simulated count matrix was generated and can be inspected as ordinary matrix-like output.
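
The cor_list passed to runReferenceBased() above is a named list of correlation matrices. When no precomputed file is available, one plausible way to assemble such a list is sketched below in base R. The toy data, the names toy_count, toy_loc, and toy_cor_list, and the choice of Spearman correlation between genes are all assumptions made for illustration; the demo file may have been produced differently.

```r
# Toy inputs (hypothetical): 4 genes x 40 spots, two spatial labels.
set.seed(1)
genes <- paste0("gene", 1:4)
spots <- paste0("spot", 1:40)
toy_count <- matrix(rpois(4 * 40, lambda = 5), nrow = 4,
                    dimnames = list(genes, spots))
toy_loc <- data.frame(x = runif(40), y = runif(40),
                      label = rep(c("domainA", "domainB"), each = 20),
                      row.names = spots)

# Columns of the count matrix should line up with rows of the location table.
stopifnot(identical(colnames(toy_count), rownames(toy_loc)))

# One genes-by-genes correlation matrix per label (assumed: Spearman).
labels <- unique(toy_loc$label)
toy_cor_list <- lapply(labels, function(lab) {
  sub <- toy_count[, toy_loc$label == lab, drop = FALSE]
  cor(t(sub), method = "spearman")
})
names(toy_cor_list) <- labels

str(toy_cor_list)
```

With scheme = "tissue" and a single label, as in the example above, the list would contain just one matrix.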

Next, we switch to a reference-free setup where marginal distributions are specified explicitly.

Reference-Free Simulation within Domains

Here, we use DLPFC demo data and estimate simple Poisson parameters per gene within each spatial domain label.
This shows how to prepare para_list and dist_list before calling runReferenceFree().

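    # Fetch the demo files through GitHub's /raw/ path; a /blob/ URL returns an HTML page.
    # readRDS() does not accept a URL string, so the URL is wrapped in gzcon(url()).
    count <- as.matrix(readRDS(gzcon(url("https://github.com/YMa-lab/SpaSimAgent/raw/main/data/DLPFC151674_count_demo.rds"))))
    location <- read.csv("https://github.com/YMa-lab/SpaSimAgent/raw/main/data/DLPFC151674_loc_demo.csv", row.names = 1)
    cor_list <- readRDS(gzcon(url("https://github.com/YMa-lab/SpaSimAgent/raw/main/data/DLPFC151674_cor_demo.rds")))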
    
    ### calculate parameters
    clu <- unique(location$label)
    pars_list <- vector("list", length(clu))
    names(pars_list) <- clu
    dist_list <- vector("list", length(clu))
    names(dist_list) <- clu
    
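    ## assume we want to fit a Poisson distribution to each gene
    for (i in clu) {
        # spots in domain i (genes in rows, spots in columns)
        ctx <- count[, location$label == i, drop = FALSE]

        # one named parameter vector per gene; lambda is the mean count in the domain
        pars_list[[i]] <- lapply(seq_len(nrow(ctx)), function(j) c(lambda = mean(ctx[j, ])))
        names(pars_list[[i]]) <- rownames(count) # add gene names

        # Poisson quantile function for every gene in this domain
        dist_list[[i]] <- rep("qpois", nrow(ctx))
    }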
    
    # simulate
    set.seed(1) 
    res <- runReferenceFree(
        cor_list = cor_list, 
        dist_list = dist_list, 
        para_list = pars_list,
        Loc = location, 
        scheme = "domain")
    
    new <- as.matrix(SpaSimAgent::simulated_counts(res))
    print(new[1:3,1:3])

The output preview shows the simulation completed with user-defined marginal settings, while preserving a domain-aware correlation structure.
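
To make the idea of user-defined marginals combined with a correlation structure more concrete, here is a minimal base-R sketch of a Gaussian-copula-style construction with Poisson quantile functions. It assumes this is roughly what pairing a correlation matrix with "qpois" and a lambda per gene amounts to; the internals of runReferenceFree() are not shown in this tutorial and may differ. The correlation matrix R, the rates in lambda, and the gene names are hypothetical.

```r
set.seed(1)
n_spots <- 2000

# Hypothetical target correlation and Poisson rates for two genes.
R <- matrix(c(1, 0.6, 0.6, 1), nrow = 2,
            dimnames = list(c("geneA", "geneB"), c("geneA", "geneB")))
lambda <- c(geneA = 2, geneB = 8)

# 1. Correlated standard normals: rows of z have covariance R.
z <- matrix(rnorm(n_spots * 2), ncol = 2) %*% chol(R)

# 2. Map each column to uniforms with the normal CDF.
u <- pnorm(z)

# 3. Push the uniforms through each gene's Poisson quantile function.
sim <- sapply(seq_along(lambda), function(j) qpois(u[, j], lambda = lambda[j]))
colnames(sim) <- names(lambda)
sim <- t(sim)  # genes in rows, spots in columns

rowMeans(sim)                        # close to the requested lambdas
cor(sim["geneA", ], sim["geneB", ])  # positive, somewhat below 0.6
```

The correlation of the resulting counts is typically a little weaker than the value in R, because the discrete quantile transform is not linear.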

After the reference-free case, we proceed to a workflow that integrates scRNA-seq and spatial data.

scRNA-seq-Based Simulation without Deconvolution Matrix

In this final section, we use single-cell counts/metadata plus spatial counts/locations and call runscRNAseqBased() with deconv_matrix = NULL.
This demonstrates the API when a precomputed deconvolution matrix is not provided.

    ## data
    # Fetch the demo files through GitHub's /raw/ path; a /blob/ URL returns an HTML page.
    # readRDS() does not accept a URL string, so the URL is wrapped in gzcon(url()).
    sc_count <- readRDS(gzcon(url("https://github.com/YMa-lab/SpaSimAgent/raw/main/data/mob_sc_count_demo.rds")))
    sc_meta <- readRDS(gzcon(url("https://github.com/YMa-lab/SpaSimAgent/raw/main/data/mob_sc_meta_demo.rds")))
    sp_count <- readRDS(gzcon(url("https://github.com/YMa-lab/SpaSimAgent/raw/main/data/mob_sp_count_demo.rds")))
    location <- read.csv("https://github.com/YMa-lab/SpaSimAgent/raw/main/data/mob_sp_location_demo.csv", row.names = 1)

    ### simulate
    set.seed(1)
    res <- runscRNAseqBased(
        deconv_matrix = NULL,
        sp_loc = location,
        sc_count = sc_count,
        sc_meta = sc_meta, 
        total_num_cells = 10,
        sp_count = sp_count, 
        card.ct.varname = "cellType",
        card.sample.varname = "sampleInfo",
        card.ct.select = as.character(unique(sc_meta$cellType)),
        card.minCountGene = 0,
        card.minCountSpot = 0,
        ties.method = "random")
    
    new <- as.matrix(SpaSimAgent::simulated_counts(res))
    print(new[1:3,1:3])

The output slice indicates that the simulated spatial expression matrix is available through simulated_counts(res) and ready for downstream analysis or benchmarking.
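
Because this workflow combines four inputs, a quick consistency check before simulating can save time. The sketch below uses toy data and a hypothetical helper, check_sc_inputs(), which is not part of SpaSimAgent. It assumes that cells are the columns of sc_count and the rows of sc_meta, that spots are the columns of sp_count and the rows of the location table, and that the metadata columns match the names given in card.ct.varname and card.sample.varname.

```r
set.seed(1)
genes <- paste0("gene", 1:5)
cells <- paste0("cell", 1:8)
spots <- paste0("spot", 1:4)

# Toy stand-ins for the four inputs (hypothetical data).
toy_sc_count <- matrix(rpois(40, 3), nrow = 5, dimnames = list(genes, cells))
toy_sc_meta <- data.frame(cellType = rep(c("T1", "T2"), each = 4),
                          sampleInfo = "sample1",
                          row.names = cells)
toy_sp_count <- matrix(rpois(16, 10), nrow = 4,
                       dimnames = list(genes[1:4], spots))
toy_loc <- data.frame(x = 1:4, y = 4:1, row.names = spots)

# Hypothetical helper: stops on misaligned inputs, otherwise returns a summary.
check_sc_inputs <- function(sc_count, sc_meta, sp_count, sp_loc,
                            ct_var = "cellType", sample_var = "sampleInfo") {
  stopifnot(
    all(c(ct_var, sample_var) %in% colnames(sc_meta)),  # metadata columns exist
    identical(colnames(sc_count), rownames(sc_meta)),   # cells line up
    identical(colnames(sp_count), rownames(sp_loc))     # spots line up
  )
  shared <- intersect(rownames(sc_count), rownames(sp_count))
  stopifnot(length(shared) > 0)                         # some genes overlap
  list(n_shared_genes = length(shared),
       cell_types = as.character(unique(sc_meta[[ct_var]])))
}

chk <- check_sc_inputs(toy_sc_count, toy_sc_meta, toy_sp_count, toy_loc)
str(chk)
```

The same helper could be pointed at sc_count, sc_meta, sp_count, and location from the example above, provided those objects follow the layout assumed here.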

Summary

This tutorial walked through three practical SpaSimAgent workflows, from reference-based simulation to reference-free and scRNA-seq-guided simulation.
Across all three modes, the core pattern is consistent: prepare inputs, set a random seed for reproducibility, run the simulator, and inspect simulated_counts for downstream use.
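
The role of set.seed() in this pattern can be illustrated with base R alone. In the sketch below, draw() is a hypothetical stand-in for any stochastic simulation step; it is not a SpaSimAgent function.

```r
# Hypothetical stand-in for a stochastic simulation step.
draw <- function(seed) {
  set.seed(seed)
  rpois(100, lambda = 3)
}

first  <- draw(1)
second <- draw(1)  # same seed, so the same random stream
other  <- draw(2)  # different seed

identical(first, second)  # TRUE
identical(first, other)
```

Calling set.seed() immediately before the simulator, as each example in this tutorial does, is what makes reruns comparable.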