# PHAROST API Reference

Public API surface of the `pharost` package. Import as:

```python
import pharost
```

The package exposes three layers:

- **Top-level** (`pharost.*`): training, inference, model loading.
- **`pharost.analysis.*`**: downstream analyses on predicted drug-response scores.
- **`pharost.model.*`**: lower-level neural network components (for advanced use).

---
## Model Modules

### `pharost.train`

```python
pharost.train(
    p_bulk_gene_exp,
    p_bulk_label,
    p_adata,
    out_dir,
    n_epochs,
    batch_size,
    seed=42,
    device='cuda',
    LR=5e-5,
    GAT_hidden_dim=(512, 64),
    lmmd_weight=0.3,
    coral_weight=0.7,
    spatial_preprocess=False,
    p_cell_info=None,
    p_sc_label=None,
)
```

Runs end-to-end training: loads the bulk and spatial data, trains the GAT-MLP
transfer model with LMMD and CORAL domain-adaptation losses, and writes the
trained model and predicted probabilities to `out_dir`.

**Parameters**

| Parameter | Type | Description |
|---|---|---|
| `p_bulk_gene_exp` | `str` | Path to bulk RNA-Seq expression CSV (cells × genes). |
| `p_bulk_label` | `str` | Path to bulk drug-response label CSV. |
| `p_adata` | `str` | Path to spatial AnnData (`.h5` or `.h5ad`). |
| `out_dir` | `str` | Output directory; predictions and per-run log written here. |
| `n_epochs` | `int` | Number of training epochs. |
| `batch_size` | `int` | Number of METIS partitions for the spatial graph. |
| `seed` | `int` | `default: 42`. Random seed for reproducibility. |
| `device` | `str` | `default: 'cuda'`. Either `'cuda'` or `'cpu'`. |
| `LR` | `float` | `default: 5e-5`. AdamW learning rate. |
| `GAT_hidden_dim` | `tuple[int, int]` | `default: (512, 64)`. GAT hidden dimensions `(num_hidden, out_dim)`. |
| `lmmd_weight` | `float` | `default: 0.3`. LMMD loss weight. |
| `coral_weight` | `float` | `default: 0.7`. CORAL loss weight. |
| `spatial_preprocess` | `bool` | `default: False`. If `True`, run scanpy normalize/log/scale on the spatial data. |
| `p_cell_info` | `str`, optional | CSV with `x_centroid`/`y_centroid` columns; used if `adata.obsm['spatial']` is missing. |
| `p_sc_label` | `str`, optional | scRNA-seq label path (passed to inner trainer). |

**Returns**: trained `TransferNN` model.

**Outputs**: writes the following to `out_dir`:
- `pharost_weights_adapted.pth` — pickled trained model.
- `predicted_probabilities.csv` — spatial-cell predicted probabilities.
- `predicted_probabilities_bulk.csv` — bulk predicted probabilities.
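
A call might look like the following minimal sketch. The file names under `data_root`, the epoch count, and the partition count are hypothetical placeholders rather than recommended values, and the sketch assumes `pharost.train` accepts its arguments by keyword. `build_train_kwargs` and `run_training` are illustrative helpers, not part of the package.

```python
from pathlib import Path


def build_train_kwargs(data_root, out_dir):
    """Collect pharost.train() arguments; every file name here is hypothetical."""
    data_root = Path(data_root)
    return {
        "p_bulk_gene_exp": str(data_root / "bulk_expression.csv"),  # hypothetical
        "p_bulk_label": str(data_root / "bulk_labels.csv"),         # hypothetical
        "p_adata": str(data_root / "spatial.h5ad"),                 # hypothetical
        "out_dir": str(out_dir),
        "n_epochs": 50,    # illustrative value only
        "batch_size": 8,   # number of METIS partitions; illustrative value only
        "seed": 42,
        "device": "cpu",
    }


def run_training(data_root, out_dir):
    """Train and return the model (assumes keyword arguments are accepted)."""
    # Imported lazily so build_train_kwargs works without pharost installed.
    import pharost

    return pharost.train(**build_train_kwargs(data_root, out_dir))
```

Calling `run_training("data", "BC_result/LAPATINIB")` would then leave the three output files listed above in `BC_result/LAPATINIB`.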

---

### `pharost.load_model`

```python
pharost.load_model(path, device='cuda')
```

Load a saved PHAROST model from disk.

**Parameters**

| Parameter | Type | Description |
|---|---|---|
| `path` | `str` | Path to a saved `.pth` file produced by `pharost.train()`. |
| `device` | `str` | Device to map the model onto. |

**Returns**: `TransferNN` model.

---

### `pharost.predict`

```python
pharost.predict(
    model,
    adata,
    batch_size,
    device='cuda',
    spatial_preprocess=False,
    spa_identity=False,
)
```

Run inference on spatial cells using a trained model.

**Parameters**

| Parameter | Type | Description |
|---|---|---|
| `model` | `TransferNN` | Model from `pharost.train()` or `pharost.load_model()`. |
| `adata` | `AnnData` | Spatial transcriptomics object with `adata.obsm['spatial']`. |
| `batch_size` | `int` | Number of METIS partitions; use the same value as in training. |
| `device` | `str` | `'cuda'` or `'cpu'`. |
| `spatial_preprocess` | `bool` | Apply scanpy preprocessing before inference. |
| `spa_identity` | `bool` | Use identity matrix for spatial graph. |

**Returns**: `list[np.ndarray]` — per-cell predicted probability vectors,
ordered to match `adata.obs`.
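
Because the returned list follows the order of `adata.obs`, each vector can be paired with its cell ID by position. The sketch below shows this with plain lists standing in for `adata.obs_names` and the `pharost.predict` output. The helper `pair_predictions` is hypothetical, and the two-element vectors are made-up illustration values, not a statement about the model's output width.

```python
def pair_predictions(cell_ids, probabilities):
    """Map each cell ID to its probability vector, relying on shared order."""
    cell_ids = list(cell_ids)
    probabilities = list(probabilities)
    if len(cell_ids) != len(probabilities):
        raise ValueError(
            f"expected {len(cell_ids)} probability vectors, got {len(probabilities)}"
        )
    return {
        cell: [float(p) for p in probs]
        for cell, probs in zip(cell_ids, probabilities)
    }


# Stand-in values; with real data these would be adata.obs_names and the
# list returned by pharost.predict(...).
cell_ids = ["cell_0", "cell_1", "cell_2"]
probabilities = [[0.9, 0.1], [0.2, 0.8], [0.5, 0.5]]
by_cell = pair_predictions(cell_ids, probabilities)
```

The length check guards against pairing predictions with an `AnnData` object that was filtered after inference.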

---
## Analysis Modules `pharost.analysis`

Downstream analyses. Apart from `load_response_prediction` itself, the functions
in this module assume that `adata.obs[drug]` already contains predicted
probabilities; call `load_response_prediction` first to populate them.

### `pharost.analysis.load_response_prediction`

```python
pharost.analysis.load_response_prediction(
    adata,
    drugs,
    path_template,
    add_label=False,
    label_threshold=0.5,
)
```

Load per-drug prediction CSVs into `adata.obs`.

**Parameters**

| Parameter | Type | Description |
|---|---|---|
| `adata` | `AnnData \| str` | AnnData object, or path to `.h5ad` file. |
| `drugs` | `list[str]` | Drug names to load. |
| `path_template` | `str \| callable` | Per-drug path: format string with `{drug}` placeholder, or `callable(drug) -> str`. |
| `add_label` | `bool` | If True, also write `adata.obs[f"{drug}_label"]` as `"Sensitive"`/`"Resistant"`. |
| `label_threshold` | `float` | Threshold for the Sensitive/Resistant cutoff. |

**Returns**: `AnnData` (modified in place and also returned).

**Example**

```python
adata = pharost.analysis.load_response_prediction(
    adata,
    drugs=['LAPATINIB', 'AFATINIB'],
    path_template=lambda d: f'BC_result/{d}/predicted_probabilities.csv',
)
```
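
The two accepted forms of `path_template` and the effect of `label_threshold` can be illustrated with a small standalone sketch. `resolve_path` and `response_label` are hypothetical helpers written for this illustration, not the package's internals. The strict `>` comparison is an assumption; the documentation above does not say how a probability exactly equal to the threshold is labeled.

```python
def resolve_path(path_template, drug):
    """Return the CSV path for one drug from a format string or a callable."""
    if callable(path_template):
        return path_template(drug)
    return path_template.format(drug=drug)


def response_label(probability, threshold=0.5):
    """Assumed rule: strictly above the threshold counts as Sensitive."""
    return "Sensitive" if probability > threshold else "Resistant"


drugs = ["LAPATINIB", "AFATINIB"]

# Format-string form with a {drug} placeholder.
paths = [
    resolve_path("BC_result/{drug}/predicted_probabilities.csv", d) for d in drugs
]

# Equivalent callable form.
same_paths = [
    resolve_path(lambda d: f"BC_result/{d}/predicted_probabilities.csv", d)
    for d in drugs
]

labels = [response_label(p) for p in (0.91, 0.50, 0.12)]
```

Both forms resolve to the same per-drug paths; the callable form is the more flexible choice when the directory layout cannot be expressed as a single format string.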

---

### `pharost.analysis.plot_response_celltype_prop`

```python
pharost.analysis.plot_response_celltype_prop(
    adata,
    target_drugs,
    cell_type_col,
    save=False,
    file_format='pdf',
    sample_id=None,
    save_dir='Drug_Celltype',
    palette='tab10',
)
```

Bar plot of the proportion of "sensitive" cells (probability > 0.5) in each
cell type, faceted by drug. Cell types with fewer than 5 cells are dropped.
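
The plotted quantity can be sketched for a single drug with the standard library. This is an illustration of the rule described above (probability > 0.5, cell types with fewer than 5 cells dropped), not the package's implementation; `sensitive_proportions` and the cell-type names and values are made up.

```python
from collections import defaultdict


def sensitive_proportions(cell_types, probabilities, threshold=0.5, min_cells=5):
    """Fraction of cells with probability > threshold, per cell type."""
    counts = defaultdict(lambda: [0, 0])  # cell type -> [n_sensitive, n_total]
    for cell_type, prob in zip(cell_types, probabilities):
        counts[cell_type][0] += prob > threshold
        counts[cell_type][1] += 1
    return {
        cell_type: n_sensitive / n_total
        for cell_type, (n_sensitive, n_total) in counts.items()
        if n_total >= min_cells  # drop cell types with fewer than min_cells cells
    }


cell_types = ["Tumor"] * 6 + ["T cell"] * 5 + ["B cell"] * 2
probabilities = [
    0.9, 0.8, 0.7, 0.6, 0.4, 0.2,  # Tumor: 4 of 6 cells above 0.5
    0.1, 0.3, 0.6, 0.2, 0.4,       # T cell: 1 of 5 cells above 0.5
    0.9, 0.9,                      # B cell: only 2 cells, so it is dropped
]
proportions = sensitive_proportions(cell_types, probabilities)
```

Here `proportions` contains entries for `"Tumor"` and `"T cell"` only, since `"B cell"` falls below the minimum cell count.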

---

### `pharost.analysis.drug_gene_correlation`

```python
pharost.analysis.drug_gene_correlation(
    adata,
    target_drugs,
    n_top_genes=15,
    cmap=None,
    plot=True,
    annot=False,
    save=False,
    save_dir='Gene_Drug_Corr',
    file_format='png',
    verbose=False,
)
```

Computes the Spearman correlation between gene expression and predicted drug
response across all cells. Selects the top `n_top_genes` genes for each drug
and plots a clustered heatmap of the union of those genes.

**Returns**: long-form `pd.DataFrame` with columns `Gene`, `Drug`, `Correlation`.
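
For readers unfamiliar with the statistic, the sketch below computes a Spearman correlation as the Pearson correlation of rank vectors and arranges the results as rows with the same three fields. It is a standard-library illustration, not the package's implementation. Averaging the ranks of tied values is an assumption, and the gene names and expression values are made up.

```python
from statistics import mean


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        tied_rank = (start + end) / 2 + 1  # mean of 1-based positions in the tie
        for k in range(start, end + 1):
            ranks[order[k]] = tied_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation; undefined (division by zero) if x or y is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    return cov / (var_x * var_y) ** 0.5


response = [0.1, 0.4, 0.5, 0.7, 0.9]  # predicted probabilities for one drug
expression = {
    "GENE_A": [1.0, 2.0, 4.0, 8.0, 16.0],  # rises monotonically with response
    "GENE_B": [5.0, 4.0, 3.0, 2.0, 1.0],   # falls monotonically with response
}
rows = [
    {"Gene": gene, "Drug": "LAPATINIB", "Correlation": spearman(values, response)}
    for gene, values in expression.items()
]
```

Because Spearman correlation depends only on ranks, `GENE_A` reaches a correlation of 1 even though its relationship with the response is nonlinear, and `GENE_B` reaches -1.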