MosaicMRI

A large-scale and diverse dataset of raw musculoskeletal MRI

2,671 volumes
80,156 slices
454 patients
10 anatomies
Multi-contrast + multi-coil

About the Dataset

MosaicMRI is the largest and most diverse open-source raw musculoskeletal MRI dataset to date.
It spans 10 anatomies, 2,671 volumes, and 80,156 slices.

Anatomy Distribution

Anatomy distribution across MosaicMRI

A diverse multi-anatomy dataset

MosaicMRI brings realistic clinical variability across anatomy, contrast, orientation, and coil configuration, far beyond that of existing brain- and knee-focused datasets.

Raw data for reconstruction research

MosaicMRI facilitates research on accelerated MRI, low-field reconstruction, motion suppression, and other real-world reconstruction challenges.

A new benchmark for stress-testing AI

This initial release goes beyond typical accelerated reconstruction to probe anatomical and contrast generalization under real-world variability.

Enabling foundation model research in MRI

MosaicMRI provides a testbed for studying key foundation model challenges, including scaling laws, data synergies, continual learning, data mixtures, reliability, and out-of-distribution generalization.

Example Visuals

Orientation-specific examples (coronal, axial, and sagittal) grouped by anatomy.

Dataset Details

MosaicMRI is designed for learning-based MRI under realistic clinical variability in anatomy, contrast, orientation, and coil configuration.

Volumes
2,671
Patients
454
Slices
80,156
Source Data
~4 TB

Constructing the Dataset

Data were collected on a 1.5T Siemens Magnetom Avantofit scanner between July 15, 2025 and September 23, 2025. We removed incomplete exams, localizers/planning scans, calibration-only acquisitions, and protocols not suited for slice-based reconstruction.

Remaining scans were visually quality-checked and stored as HDF5 with ISMRMRD-compatible headers and fastMRI-style internal layout.

Protocols and Labels

  • Protocol families include PD, T1, T2, STIR, and compatible clinical variants (for example DIXON, DESS, and TIRM).
  • Each scan is labeled with orientation (AX/SAG/COR), coarse contrast, fat-suppression flag, and anatomical category.
  • The final release contains routine slice-based MSK acquisitions only.

Geometry and Coil Diversity

  • Reconstruction matrix: Hx in [256, 768] (mean 320), Hy in [190, 768] (mean 324).
  • Most common resolution: 320 x 320 (1,041 volumes).
  • In-plane resolution: 0.1953-1.4844 mm (mean 0.5729 mm).
  • Slice count: 12-80 (mean 30). Coil count: 4-46, with 16-channel most common (1,056 scans).

Dataset Partitioning

Splits are patient-disjoint to avoid leakage, with target ratios 70% train, 15% val, and 15% test. Assignment was optimized to balance slice counts while preserving per-anatomy coverage across splits.

Split   Scans   Patients   Slices
train   1,873        303   56,235
val       398         68   12,027
test      400         79   11,894
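The assignment optimizer itself is not described here. A minimal greedy sketch, which is an assumption and not the authors' procedure, shows how patient-level grouping prevents leakage: all scans of a patient move together, and each patient goes to the split that is furthest below its slice-count target. The `Scan` record and both functions are hypothetical; per-anatomy coverage is only reported, not enforced.

```python
from collections import defaultdict
from typing import NamedTuple


class Scan(NamedTuple):
    # Hypothetical per-scan record; field names are illustrative only.
    scan_id: str
    patient_id: str
    anatomy: str
    n_slices: int


def patient_disjoint_split(scans, ratios=(0.70, 0.15, 0.15)):
    """Greedily assign whole patients to train/val/test, balancing slice counts."""
    by_patient = defaultdict(list)
    for scan in scans:
        by_patient[scan.patient_id].append(scan)

    def slices(pid):
        return sum(s.n_slices for s in by_patient[pid])

    # Largest patients first, so late (small) assignments fine-tune the balance.
    order = sorted(by_patient, key=slices, reverse=True)

    total = sum(s.n_slices for s in scans)
    targets = [r * total for r in ratios]
    filled = [0] * len(ratios)
    splits = [[] for _ in ratios]
    for pid in order:
        # Send the whole patient to the split with the largest remaining deficit.
        k = max(range(len(ratios)), key=lambda i: targets[i] - filled[i])
        splits[k].extend(by_patient[pid])
        filled[k] += slices(pid)
    return dict(zip(("train", "val", "test"), splits))


def missing_anatomies(splits):
    """Report, per split, which anatomies present in the data never appear in it."""
    every = {s.anatomy for scans in splits.values() for s in scans}
    return {name: every - {s.anatomy for s in scans} for name, scans in splits.items()}
```

Because patients, not scans, are the unit of assignment, no patient can appear in two splits; a production pipeline would fold the per-anatomy constraint into the assignment step rather than only checking it afterwards.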

Data Format and Quickstart

File organization and baseline usage for reconstruction experiments.

File structure

Directory layout (current release statistics):

MosaicMRI/
  multicoil_train/                                (1,744 files, 2,381.92 GiB)
    *.h5
  multicoil_val/                                  (398 files, 579.77 GiB)
    *.h5
  multicoil_test/                                 (64 files, 71.58 GiB)
    *.h5
  anatomy_transfer_challenge/
    ankle/                                        (20 files, 49.40 GiB)
      *.h5
  contrast_generalization_challenge/
    T1_FS/                                        (17 files, 20.74 GiB)
      *.h5
  • Core splits: multicoil_train, multicoil_val, and multicoil_test are the standard reconstruction splits; multicoil_test contains both 4x and 8x accelerated test inputs.
  • Benchmark folders: anatomy_transfer_challenge/ankle and contrast_generalization_challenge/T1_FS are challenge-specific evaluation subsets.
  • Challenge construction: ankle and T1-FS scans were held out of the training split to build the generalization challenges; the validation split remained available for development workflows.

Inside each H5

  • Datasets: ismrmrd_header, kspace, and reconstruction_rss.
  • Study fields: anatomy and protocol (for example tStudyDescription and tProtocolName).
  • Acquisition fields: scanner/vendor, receiver channels, matrix size, FOV, and sequence parameters (TR/TE/TI).
  • Encoding fields: trajectory and parallel-imaging settings (including acceleration factor).

A helper script in the GitHub repository reads these metadata fields and plots one reference slice.
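The acquisition and encoding fields listed above are stored in the `ismrmrd_header` dataset as ISMRMRD XML. The following is a minimal sketch using the standard library's ElementTree, assuming standard ISMRMRD element names (for example `reconSpace/fieldOfView_mm` and `sequenceParameters/TR`). It is not the repository's helper script, the function names are hypothetical, and the study fields (tStudyDescription, tProtocolName) are not parsed because their location in the file is not assumed here. The in-plane pixel size is derived as reconstruction FOV divided by reconstruction matrix.

```python
import xml.etree.ElementTree as ET

# ISMRMRD XML namespace; all header elements live under it.
NS = {"mr": "http://www.ismrm.org/ISMRMRD"}


def _field(root, path, cast=str):
    """Return the text at `path` cast with `cast`, or None if the element is absent."""
    node = root.find(path, NS)
    if node is None or node.text is None:
        return None
    return cast(node.text.strip())


def summarize_header(xml_bytes):
    """Pull acquisition, encoding, and sequence fields from an ISMRMRD XML header."""
    root = ET.fromstring(xml_bytes)
    sys_info = "mr:acquisitionSystemInformation/"
    recon = "mr:encoding/mr:reconSpace/"
    seq = "mr:sequenceParameters/"
    summary = {
        "vendor": _field(root, sys_info + "mr:systemVendor"),
        "receiver_channels": _field(root, sys_info + "mr:receiverChannels", int),
        "recon_matrix": (
            _field(root, recon + "mr:matrixSize/mr:x", int),
            _field(root, recon + "mr:matrixSize/mr:y", int),
        ),
        "recon_fov_mm": (
            _field(root, recon + "mr:fieldOfView_mm/mr:x", float),
            _field(root, recon + "mr:fieldOfView_mm/mr:y", float),
        ),
        "trajectory": _field(root, "mr:encoding/mr:trajectory"),
        "acceleration_ky": _field(
            root,
            "mr:encoding/mr:parallelImaging/mr:accelerationFactor/mr:kspace_encoding_step_1",
            int,
        ),
        "TR_ms": _field(root, seq + "mr:TR", float),
        "TE_ms": _field(root, seq + "mr:TE", float),
        "TI_ms": _field(root, seq + "mr:TI", float),
    }
    # In-plane pixel size = reconstruction FOV / reconstruction matrix.
    (mx, my), (fx, fy) = summary["recon_matrix"], summary["recon_fov_mm"]
    summary["pixel_mm"] = (fx / mx, fy / my) if None not in (mx, my, fx, fy) else None
    return summary


def summarize_file(path):
    """Read one H5 file; h5py is imported lazily so the parser runs without it."""
    import h5py

    with h5py.File(path, "r") as f:
        summary = summarize_header(f["ismrmrd_header"][()])
        summary["kspace_shape"] = f["kspace"].shape
        summary["file_attrs"] = dict(f.attrs)
    return summary
```

Missing elements come back as `None` rather than raising, since not every protocol defines every sequence parameter (TI, for instance, only applies to inversion-recovery sequences).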

Quickstart

Minimal steps to set up the environment, apply an undersampling mask to downloaded data, and run a baseline reconstruction.

1) Install
git clone https://github.com/AIF4S/mosaicmri
cd mosaicmri
conda env create -f varnet/environment.yml
conda activate mosaic_mri_varnet
2) Run demo reconstruction
python varnet/run_pretrained_varnet_inference.py \
  --state_dict_file /path/to/weights.ckpt \
  --data_path /path/to/MosaicMRI/multicoil_test \
  --output_path /path/to/recons \
  --accelerations 8 \
  --center_fractions 0.04
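The `--accelerations 8 --center_fractions 0.04` flags describe a Cartesian undersampling mask. The NumPy sketch below builds such a mask and computes a zero-filled root-sum-of-squares (RSS) baseline. It assumes a fastMRI-style random column mask (the demo script may use a different mask function) and assumes the phase-encode direction is the last k-space axis, as in fastMRI; both should be checked against MosaicMRI files. The function names are hypothetical, and the output is not cropped the way a stored `reconstruction_rss` target might be.

```python
import numpy as np


def random_column_mask(n_cols, acceleration=8, center_fraction=0.04, seed=0):
    """Fully sampled low-frequency band plus randomly kept outer columns."""
    rng = np.random.default_rng(seed)
    n_center = int(round(n_cols * center_fraction))
    # Outer-column probability chosen so that about n_cols / acceleration columns are kept.
    prob = (n_cols / acceleration - n_center) / (n_cols - n_center)
    mask = rng.uniform(size=n_cols) < prob
    start = (n_cols - n_center + 1) // 2
    mask[start:start + n_center] = True
    return mask


def ifft2c(kspace):
    """Centered, orthonormal 2D inverse FFT over the last two axes."""
    k = np.fft.ifftshift(kspace, axes=(-2, -1))
    image = np.fft.ifft2(k, axes=(-2, -1), norm="ortho")
    return np.fft.fftshift(image, axes=(-2, -1))


def rss(coil_images, coil_axis=0):
    """Root-sum-of-squares coil combination."""
    return np.sqrt(np.sum(np.abs(coil_images) ** 2, axis=coil_axis))


def zero_filled_rss(slice_kspace, mask):
    """Zero-filled baseline for one slice shaped (coils, height, phase_encode)."""
    undersampled = slice_kspace * mask  # 1D mask broadcasts along the last axis
    return rss(ifft2c(undersampled), coil_axis=0)
```

With a fully sampled mask, `zero_filled_rss` reduces to the plain RSS of the coil images; at 8x, the aliasing left in the zero-filled image is what a learned method such as the VarNet baseline is meant to remove.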

Request Access to MosaicMRI

To obtain access, please submit the request form below. We will contact you with instructions after review.

Data Sharing Agreement for Research and Educational Use

By downloading or accessing the dataset entitled “Testing the performance and reliability of AI-based image reconstruction” (the “Dataset” or “Data”), you (“Researcher”) acknowledge that the Data are proprietary to and owned by the University of Southern California (“USC” or “Licensor”).

  • USC grants the Researcher a non-exclusive, royalty-free license to access and use the Data solely for internal, non-commercial research or educational purposes. No other rights are conveyed.
  • The Researcher will receive a download link to access the Data without charge and will not share the download link with any other individual. Each user must register separately and agree to these Terms.
  • The Data may not be sold, licensed, monetized, or otherwise commercially exploited. Those seeking commercial use must contact the USC Stevens Center for Innovation, 3720 S Flower Street, Floor 3, Los Angeles, CA 90089, at mta@stevens.usc.edu.
  • The Researcher shall not distribute, publish, reproduce, retransmit, copy, or transfer any portion of the Data, or variables derived from it, to anyone outside their direct supervision, except as necessary for academic publications or presentations that properly cite the Dataset.
  • The Researcher shall ensure that any colleagues or students within the same institution who access the Data first agree to be bound by these Terms. Anyone under the Researcher’s supervision must follow the same restrictions and obligations.
  • If the Researcher is employed by a for-profit or commercial entity, the employer is also bound by these Terms, and the Researcher represents that they are authorized to enter into this Agreement on behalf of the employer.

Compliance, Use Restrictions, and Safeguards

  • The Data will not include personally identifiable information. If the Data are coded, USC will not release, and the Researcher will not request, the key to the code.
  • The Researcher agrees not to attempt to re-identify individuals, link the Data with other datasets for identification purposes, or contact any individual subjects.
  • The Researcher will comply with all applicable laws, regulations, and institutional policies, including obtaining any required ethics approvals.
  • The Researcher will maintain the security and confidentiality of the Data and use appropriate safeguards to prevent unauthorized access, use, or disclosure.
  • If the Researcher suspects that any portion of the Data contains protected health information (PHI), they will not use it, will immediately destroy it, and will promptly notify USC.

Acknowledgment and Attribution

The Researcher agrees to acknowledge USC as the source of the Data in all written, visual, or oral public disclosures, using citation language consistent with scholarly standards.

“Data used in the preparation of this article were obtained from the University of Southern California MosaicMRI database. USC investigators provided data but did not participate in analysis or writing of this report.”

Please cite: Arguello, P., Tinaz, B., Mohammad, S. S., Soltanolkotabi, Maryam, and Soltanolkotabi, Mahdi. "MosaicMRI: A Diverse Dataset and Benchmark for Raw Musculoskeletal MRI." arXiv (2026). https://arxiv.org/abs/2604.11762.

Warranty Disclaimer and Limitation of Liability

  • THE DATA ARE PROVIDED “AS IS.” USC has no obligation to provide maintenance, support, updates, enhancements, or modifications.
  • USC MAKES NO REPRESENTATIONS OR WARRANTIES, express or implied, including but not limited to warranties of merchantability, fitness for a particular purpose, or non-infringement.
  • The Researcher accepts full responsibility for their use of the Data and agrees to defend and indemnify USC and its employees, trustees, officers, and agents against all claims arising from such use, including claims related to copyrighted images created from the Data.
  • USC shall not be liable for any loss, claim, or demand made by or against the Researcher arising from use of the Data.
  • IN NO EVENT SHALL USC BE LIABLE for incidental, consequential, exemplary, indirect, or economic damages of any kind, including lost profits, lost business, or lost goodwill, regardless of legal theory or prior notice of potential damages.

Termination and Post-Termination Obligations

  • USC reserves the right to terminate the Researcher’s access to the Data at any time.
  • Upon termination or completion of the Researcher’s work, all copies of the Data must be destroyed.

Governing Law

These Terms and any disputes arising from the use of the Data shall be governed by the laws of the State of California.

By submitting, you agree to comply with the dataset license and access policy.

MosaicMRI Benchmark

A three-track, 8x-acceleration benchmark: Mixed Anatomy Reconstruction, Anatomy Generalization (held-out ankle), and Contrast Generalization (held-out T1-FS). Participants upload reconstructed H5 files and are evaluated against hidden ground truth with PSNR, SSIM, and NMSE leaderboards.
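NMSE and PSNR can be sketched as follows, assuming fastMRI-style definitions: NMSE over the whole volume, PSNR with the ground-truth maximum as the data range, and scores averaged over volumes. The leaderboard's exact conventions are not specified here, so treat these as assumptions. SSIM is omitted; scikit-image's `structural_similarity` is one common implementation. The function names are hypothetical.

```python
import numpy as np


def nmse(gt, pred):
    """Normalized MSE: ||gt - pred||^2 / ||gt||^2 over the whole volume."""
    return float(np.linalg.norm(gt - pred) ** 2 / np.linalg.norm(gt) ** 2)


def psnr(gt, pred, data_range=None):
    """PSNR in dB; by default uses the ground-truth maximum as the data range."""
    if data_range is None:
        data_range = gt.max()
    mse = np.mean((gt - pred) ** 2)
    return float(10 * np.log10(data_range ** 2 / mse))


def mean_over_volumes(metric, pairs):
    """Average a per-volume metric over (ground_truth, reconstruction) pairs."""
    return float(np.mean([metric(gt, pred) for gt, pred in pairs]))
```

Choosing the data range from the ground truth (rather than a fixed value) keeps PSNR comparable across volumes with very different intensity scales.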

License, Access Policy, and Ethics

Access is granted for research use after manual review.

License

MosaicMRI is released for non-commercial research and method development under the posted license terms.

Scope
Research-only / non-commercial use

De-identification

Metadata is de-identified before release.

  • PHI removed from headers/metadata
  • Research-only use

Citation

Please cite the dataset paper if you use MosaicMRI.

BibTeX

@article{mosaicmri_2026,
  title   = {MosaicMRI: A Diverse Dataset and Benchmark for Raw Musculoskeletal MRI},
  author  = {Arguello, Paula and Tinaz, Berk and Mohammad, Shahab Sepehri and Soltanolkotabi, Maryam and Soltanolkotabi, Mahdi},
  journal = {arXiv},
  year    = {2026},
  doi     = {10.48550/arXiv.2604.11762}
}

Citation metadata will be updated if publication details change.

Collaborators

AIF4S Research Group
DISC Research Group
University of Utah

University of Southern California (USC)

University of California, Irvine (UCI)