Tutorials
This page gives a minimal workflow to run MUGO and reproduce the main results from the paper.
Prerequisites
- MUGO installed as described in Installation.
- Reference genome and annotations: the scripts expect hg38 (e.g. hg38.fa or hg38.ml.fa) and a GTF (e.g. GENCODE v41). Set path/to/your/folder in the code or config to your project root so that paths like dataset/ and results/ resolve correctly.
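Before launching a run, it can help to confirm that the project root actually contains the inputs the scripts will look for. The following is a minimal sketch, not part of MUGO: the helper `missing_inputs` and the file names in `EXPECTED` are hypothetical, and the real locations are whatever your copy of the scripts reads under path/to/your/folder.

```python
from pathlib import Path


def missing_inputs(project_root, relative_paths):
    """Return the relative paths that do not exist under project_root."""
    root = Path(project_root)
    return [rel for rel in relative_paths if not (root / rel).exists()]


# Hypothetical layout: replace these with the paths the scripts actually use.
EXPECTED = [
    "dataset/hg38.fa",
    "dataset/gencode.v41.annotation.gtf",
    "results",
]

if __name__ == "__main__":
    # Replace the placeholder with your project root.
    for rel in missing_inputs("path/to/your/folder", EXPECTED):
        print(f"missing: {rel}")
```

An empty result means every listed path resolved; anything printed points at a file to move or a path setting to correct.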
Quick Start: Single-Tissue Optimization
To run the multi-head optimization for one tissue (e.g. Whole Blood) with the Borzoi backbone and K = 20 heads:
# From the repository root
python src/train_model/MVP_multi_head.py --tissue blood --k 20
Outputs (optimization logs, selected SNPs, gains) are written under results/ in a tissue- and K-specific folder.
Reproducing Paper Results
- Dataset preparation: Place your gene list (e.g. gene_3000_borzoi_gencode_v41_hg38.csv), FASTA, and GTF under the paths used in the scripts (see path/to/your/folder/dataset/ in the source).
- Run optimization: Use the same commands as in the paper (e.g. MVP_multi_head.py for Borzoi RNA, or the corresponding script for ATAC/CAGE/other modalities). Adjust --tissue and --k as in the experiments.
- Baseline benchmarks: Scripts under src/baseline_benchmark/ (e.g. CADD, FunSeq2, Greedy ISM, Random Search, Saliency) can be run to reproduce the comparison tables and figures. See the respective script headers and readme.md in that folder for inputs and options.
For exact hyperparameters and data splits, refer to the paper and the default arguments in each script.
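When the optimization step has to be repeated for several tissues or values of K, the Quick Start command can be generated in a loop. The sketch below is one possible wrapper, not a script shipped with the repository: `build_commands` and `run_all` are hypothetical helpers, only the --tissue and --k flags shown above are used, and the K value 10 is a placeholder rather than a setting taken from the paper. It defaults to a dry run that only prints the commands.

```python
import subprocess
import sys
from itertools import product

# Script path taken from the Quick Start command; run from the repository root.
SCRIPT = "src/train_model/MVP_multi_head.py"


def build_commands(tissues, ks, script=SCRIPT):
    """Build one command (as an argument list) per (tissue, K) pair."""
    return [
        [sys.executable, script, "--tissue", tissue, "--k", str(k)]
        for tissue, k in product(tissues, ks)
    ]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the sweep at the first failing run.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # "blood" and K = 20 come from the Quick Start; 10 is a placeholder value.
    run_all(build_commands(["blood"], [10, 20]), dry_run=True)
```

Passing `dry_run=False` executes the runs one after another; review the printed commands first, since each run may be long.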