MosaicMRI

A large-scale dataset of fully sampled raw musculoskeletal MRI

2,671 volumes
80,156 slices
454 patients
10 anatomies
Multi-contrast + multi-coil

About the Dataset

MosaicMRI is the largest open-source raw musculoskeletal MRI dataset to date, with 2,671 volumes and 80,156 slices.

Anatomy Distribution

Figure: Anatomy distribution across MosaicMRI.

Beyond single-anatomy benchmarks

MosaicMRI extends open raw MRI benchmarks beyond brain and knee with diverse MSK anatomies and protocols.

Raw data for reconstruction research

The dataset provides fully sampled, multi-coil raw measurements suitable for accelerated reconstruction studies.

Research questions enabled

VarNet baselines demonstrate the usefulness of the dataset for analyzing scaling and cross-anatomy generalization.

Example Visuals

Orientation-specific examples grouped by anatomy, with coronal, axial, and sagittal samples for each category.

Figure: Upper extremity samples (coronal, axial, and sagittal).

Dataset Details

MosaicMRI is designed for learning-based MRI under realistic clinical variability in anatomy, contrast, orientation, and coil configuration.

  • Volumes: 2,671
  • Patients: 454
  • Slices: 80,156
  • Source data: ~4 TB

Constructing the Dataset

Data were collected on a 1.5 T Siemens MAGNETOM Avanto fit scanner between July 15, 2025 and September 23, 2025. We removed incomplete exams, localizers and planning scans, calibration-only acquisitions, and protocols not suited to slice-based reconstruction.

The remaining scans were visually quality-checked and stored as HDF5 files with ISMRMRD-compatible headers and a fastMRI-style internal layout.

Protocols and Labels

  • Protocol families include PD, T1, T2, STIR, and compatible clinical variants (for example DIXON, DESS, and TIRM).
  • Each scan is labeled with orientation (AX/SAG/COR), coarse contrast, fat-suppression flag, and anatomical category.
  • The final release contains routine slice-based MSK acquisitions only.

Geometry and Coil Diversity

  • Reconstruction matrix: Hx in [256, 768] (mean 320), Hy in [190, 768] (mean 324).
  • Most common resolution: 320 x 320 (1,041 volumes).
  • In-plane resolution: 0.1953-1.4844 mm (mean 0.5729 mm).
  • Slice count: 12-80 (mean 30).
  • Coil count: 4-46, with 16-channel most common (1,056 scans).

Dataset Partitioning (Patient-level)

Splits are patient-disjoint to avoid leakage, with target ratios 70% train, 15% val, and 15% test. Assignment was optimized to balance slice counts while preserving per-anatomy coverage across splits.

Split   Scans   Patients   Slices
train   1,873        303   56,235
val       398         68   12,027
test      400         79   11,894
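
One way to build patient-disjoint splits of this kind is a greedy assignment over patients. The sketch below is an illustration under assumptions, not the procedure used for the release: the function name patient_level_split and the (scan_id, patient_id, n_slices) tuple layout are hypothetical, and it balances slice counts only, without the per-anatomy coverage constraint described above.

```python
from collections import defaultdict


def patient_level_split(scans, ratios=None):
    """Greedy patient-disjoint split that balances slice counts.

    scans: iterable of (scan_id, patient_id, n_slices) tuples
           (hypothetical schema, chosen for this example).
    Returns {split_name: [scan_id, ...]}.
    """
    ratios = ratios or {"train": 0.70, "val": 0.15, "test": 0.15}

    # Group scans by patient so a patient can never straddle two splits.
    by_patient = defaultdict(list)
    for scan_id, patient_id, n_slices in scans:
        by_patient[patient_id].append((scan_id, n_slices))

    total = sum(n for group in by_patient.values() for _, n in group)
    targets = {name: r * total for name, r in ratios.items()}
    filled = {name: 0 for name in ratios}
    splits = {name: [] for name in ratios}

    # Place the largest patients first; each one goes to the split that is
    # currently furthest below its target slice count (relative deficit).
    ordered = sorted(
        by_patient.items(),
        key=lambda kv: (-sum(n for _, n in kv[1]), str(kv[0])),
    )
    for _patient_id, group in ordered:
        name = max(ratios, key=lambda s: (targets[s] - filled[s]) / targets[s])
        splits[name].extend(scan_id for scan_id, _ in group)
        filled[name] += sum(n for _, n in group)
    return splits
```

Grouping by patient before assigning is what prevents leakage: every scan of a given patient lands in the same split, whatever the slice balance looks like afterwards.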

Data Format and Quickstart

File organization and baseline usage for reconstruction experiments.

File structure

Directory layout (current release statistics):

MosaicMRI/
  multicoil_train/                                (1,744 files, 2,381.92 GiB)
    *.h5
  multicoil_val/                                  (398 files, 579.77 GiB)
    *.h5
  multicoil_test/                                 (64 files, 71.58 GiB)
    *.h5
  anatomy_transfer_challenge/
    ankle/                                        (20 files, 49.40 GiB)
      *.h5
  contrast_generalization_challenge/
    T1_FS/                                        (17 files, 20.74 GiB)
      *.h5

  • Core splits: multicoil_train, multicoil_val, and multicoil_test are the standard reconstruction splits; multicoil_test contains both 4x and 8x accelerated test inputs.
  • Benchmark folders: anatomy_transfer_challenge/ankle and contrast_generalization_challenge/T1_FS are challenge-specific evaluation subsets.
  • Challenge construction: ankle and T1-FS scans were excluded from training so that the benchmark folders serve as held-out evaluation subsets; validation data remained available for development workflows.
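
After downloading, a local copy can be checked against the file counts and sizes listed in the directory layout. This is a minimal sketch using only the Python standard library; the folder names come from the layout above, while the function name inventory and the root path passed to it are placeholders.

```python
from pathlib import Path

# Folder names taken from the directory layout above.
SUBSETS = [
    "multicoil_train",
    "multicoil_val",
    "multicoil_test",
    "anatomy_transfer_challenge/ankle",
    "contrast_generalization_challenge/T1_FS",
]


def inventory(root):
    """Return {subset: (n_files, total_GiB)} for the *.h5 files in each folder."""
    root = Path(root)
    report = {}
    for subset in SUBSETS:
        # A missing folder simply yields no files, so it shows up as (0, 0.0).
        files = sorted((root / subset).glob("*.h5"))
        size_gib = sum(f.stat().st_size for f in files) / 2**30
        report[subset] = (len(files), size_gib)
    return report
```

Calling inventory("MosaicMRI") and printing each entry gives a quick comparison against the listed statistics, which helps catch interrupted downloads.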

Inside each H5

  • Datasets: ismrmrd_header, kspace, and reconstruction_rss.
  • Study fields: anatomy and protocol (for example tStudyDescription and tProtocolName).
  • Acquisition fields: scanner/vendor, receiver channels, matrix size, FOV, and sequence parameters (TR/TE/TI).
  • Encoding fields: trajectory and parallel-imaging settings (including acceleration factor).

A helper script in the GitHub repository reads these metadata fields and plots one reference slice.
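
For readers who want to inspect the metadata without the helper script, the sketch below parses a few acquisition fields from an ISMRMRD XML header. It assumes that ismrmrd_header is stored as an XML string following the standard ISMRMRD schema, as in fastMRI-style files; that is an assumption about the release, not something confirmed here. The function name summarize_header is hypothetical and the values in SAMPLE_HEADER are made up for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"m": "http://www.ismrm.org/ISMRMRD"}  # standard ISMRMRD XML namespace

# Made-up header for illustration. With h5py installed, the real string could
# be read roughly as follows (assumed layout, not verified against the release):
#   with h5py.File("path/to/sample.h5", "r") as f:
#       xml_text = f["ismrmrd_header"][()]
SAMPLE_HEADER = """<ismrmrdHeader xmlns="http://www.ismrm.org/ISMRMRD">
  <acquisitionSystemInformation>
    <systemVendor>SIEMENS</systemVendor>
    <receiverChannels>16</receiverChannels>
  </acquisitionSystemInformation>
  <encoding>
    <reconSpace><matrixSize><x>320</x><y>320</y><z>1</z></matrixSize></reconSpace>
  </encoding>
  <sequenceParameters><TR>3000</TR><TE>30</TE></sequenceParameters>
</ismrmrdHeader>"""


def summarize_header(xml_text):
    """Extract a few acquisition fields from an ISMRMRD XML header string."""
    root = ET.fromstring(xml_text)

    def text(path, cast=str):
        # Return the cast value of the first matching node, or None if absent.
        node = root.find(path, NS)
        return cast(node.text) if node is not None and node.text else None

    return {
        "vendor": text("m:acquisitionSystemInformation/m:systemVendor"),
        "coils": text("m:acquisitionSystemInformation/m:receiverChannels", int),
        "matrix": tuple(
            text(f"m:encoding/m:reconSpace/m:matrixSize/m:{ax}", int)
            for ax in "xyz"
        ),
        "TR_ms": text("m:sequenceParameters/m:TR", float),
        "TE_ms": text("m:sequenceParameters/m:TE", float),
    }


summary = summarize_header(SAMPLE_HEADER)
```

Missing fields come back as None rather than raising, which is convenient when scanning many files whose protocols populate different parts of the header.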

Quickstart

Minimal steps to download a file, apply a mask, and run a baseline reconstruction.

1) Install
git clone https://github.com/paularguello07/msk_mri_dataset
cd msk_mri_dataset
pip install -r requirements.txt
2) Run demo reconstruction
python demo_recon.py \
  --file path/to/sample.h5 \
  --mask random \
  --acc 8 \
  --out out.png
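
To make the --mask random and --acc 8 options concrete, the sketch below shows one common way to build a random Cartesian undersampling mask: keep a fully sampled band of low-frequency columns and draw the remaining columns at random. How demo_recon.py builds its mask is not documented here, so this is an assumption modeled on the fastMRI-style convention; the function names, the 0.04 center fraction, and the choice of the column axis as the phase-encode direction are all illustrative.

```python
import random


def random_column_mask(num_cols, acceleration=8, center_fraction=0.04, seed=0):
    """Random Cartesian undersampling mask over k-space columns.

    Keeps a fully sampled low-frequency band in the centre and draws the
    remaining columns at random so that, on average, num_cols / acceleration
    columns are retained.
    """
    rng = random.Random(seed)
    num_low = round(num_cols * center_fraction)
    start = (num_cols - num_low) // 2
    # Sampling probability for columns outside the centre band.
    prob = (num_cols / acceleration - num_low) / (num_cols - num_low)
    mask = [rng.random() < prob for _ in range(num_cols)]
    for i in range(start, start + num_low):
        mask[i] = True
    return mask


def apply_mask(kspace_rows, mask):
    """Zero the unsampled columns of one 2D k-space slice (a list of rows)."""
    return [
        [value if keep else 0j for value, keep in zip(row, mask)]
        for row in kspace_rows
    ]
```

Fixing the seed makes the mask reproducible, which matters when reconstruction methods are compared on identical undersampled inputs.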

Request Access to MosaicMRI

To obtain access, please submit the request form below. We will contact you with instructions after review.

Data Sharing Agreement for Research and Educational Use


By registering for access to the Dataset released by the University of Southern California (“USC”), I agree to this Dataset Sharing Agreement (“Agreement”), as well as to any additional terms of use as posted and updated periodically on USC’s designated website(s).

The Dataset is proprietary to and owned by USC. Other than the rights granted herein, USC retains all rights, title, and interest in the Dataset.

1. License Grant

Subject to the provisions of this Agreement, USC grants to me a non-exclusive, royalty-free license to access and use the Dataset for internal research or educational purposes only. This Agreement conveys no other rights of any sort with respect to the Dataset or the intellectual property embodied therein.

2. Access and Use Restrictions

  • I will receive a download link to access the Dataset without charge, solely for internal research or educational use.
  • I will not share the download link with others. Each user must register individually and agree to this Agreement.
  • I will not sell, license, monetize, or commercially exploit any portion of the Dataset.
  • I will not distribute, publish, reproduce, retransmit, copy, or otherwise transfer any portion of the Dataset, or variables derived from it, to anyone outside my direct supervision, except for uses in academic publications and presentations that properly cite the Dataset.
  • I will ensure that anyone under my supervision who accesses the Dataset agrees to the same restrictions and obligations.

3. Compliance and Safeguards

  • I will comply with all applicable laws, governmental regulations, and institutional policies, including obtaining all necessary approvals.
  • I will maintain the security and confidentiality of the Dataset and use appropriate safeguards to prevent unauthorized use or disclosure.
  • I will not attempt to re-identify subjects, link the Dataset with other datasets to enable identification, or contact any individual subjects.
  • If I suspect that any portion of the Dataset contains protected health information (PHI), I will not use it, will immediately destroy it, and will notify USC promptly.

4. Acknowledgement and Citations

I agree to acknowledge the Dataset in any publications or presentations by citing the designated references provided by USC. Language similar to the following must be included:

“Data used in the preparation of this article were obtained from the University of Southern California MosaicMRI database. USC investigators provided data but did not participate in analysis or writing of this report.”

USC will provide official citation references (journal publications, technical reports, or arXiv papers) to be used in each manuscript.

5. Limitations

  • The Dataset is provided strictly for research and educational purposes and is not intended for clinical use or diagnosis.
  • No warranties are made, express or implied, including warranties of merchantability, fitness for a particular purpose, or non-infringement.
  • USC disclaims liability for any loss, claim, damage, or liability arising from or connected to this Agreement or the Dataset.

6. Indemnification

I agree to indemnify, defend, and hold harmless USC and its trustees, directors, officers, faculty, staff, students, affiliates, and agents from any claims, damages, or losses resulting from my access to or use of the Dataset, except to the extent caused by the gross negligence or willful misconduct of USC.

7. Termination

  • USC may terminate this Agreement immediately if I violate its terms.
  • Upon termination or completion of my work, I will destroy all copies of the Dataset.

8. Governing Law and Venue

This Agreement shall be governed by and construed in accordance with the laws of the State of California. Any dispute arising under this Agreement shall be brought exclusively in state or federal courts located in Los Angeles, California.

By submitting, you agree to comply with the dataset license and access policy.

MosaicMRI Benchmark

Evaluate cross-anatomy generalization for accelerated MRI reconstruction (8x). Train on released anatomies and submit results for hidden-ground-truth scoring.


License, Access Policy, and Ethics

Access is granted for research use after manual review.

License

MosaicMRI is released for non-commercial research and method development under the posted license terms.

Scope: research-only, non-commercial use

De-identification

Metadata is de-identified before release. Users may not attempt participant re-identification.

  • PHI removed from headers/metadata
  • Research-only use
  • Ethics / IRB details (if applicable)

Citation

Please cite the dataset paper if you use MosaicMRI.

BibTeX

@article{mosaicmri_2026,
  title   = {MosaicMRI: A Diverse Dataset and Benchmark for Raw Musculoskeletal MRI},
  author  = {Arguello, Paula and Tinaz, Berk and Sepehri, Mohammad Shahab and Soltanolkotabi, Maryam and Soltanolkotabi, Mahdi},
  journal = {arXiv},
  year    = {2026}
}

Citation metadata will be updated if publication details change.

Collaborators

  • University of Utah
  • University of Southern California (USC)
  • University of California, Irvine (UCI)