The data for this project are stored on the Northeast Storage Exchange (NESE). To download them, follow the large-data download instructions on our website: "Downloading data from NESE via Globus: Quick Start".

Mammograms are vital for detecting breast cancer, the most common cancer among women in the US. However, low-quality scans and imaging artifacts can compromise their efficacy. We introduce an automated pipeline to filter low-quality mammograms from large datasets.

Our initial dataset of 176,492 mammograms contained an estimated 5.5% lower-quality scans, affected by issues such as metal coil frames, wire clamps, and breast implants. Removing these images by hand is tedious and error-prone. Our two-stage process first applies threshold-based 5-bin histogram filtering to eliminate undesirable images, then uses a variational autoencoder to remove the remaining low-quality scans. Our method detects such scans with an F1 score of 0.8862 and preserves 163,568 high-quality mammograms. Our results and tools are publicly available as open-source software: https://github.com/mpsych/ODM
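
The two-stage idea can be illustrated with a minimal NumPy sketch. It is not the implementation in the ODM repository. The summary does not say which histogram bin is thresholded or at what value, so the stage-1 rule here is an assumption: reject an image when too large a fraction of its pixels falls in the brightest of 5 equal-width bins, on the premise that metal objects appear very bright. The names `five_bin_histogram`, `passes_histogram_filter`, `flag_by_reconstruction_error`, and the parameters `max_value`, `max_bright_fraction`, and `percentile` are all hypothetical. Stage 2 is represented only by its final step, thresholding per-image reconstruction errors that a trained VAE is assumed to have produced already. `f1_score` is the standard F1 definition (harmonic mean of precision and recall), the metric reported above.

```python
import numpy as np


def five_bin_histogram(image: np.ndarray, max_value: float) -> np.ndarray:
    """Fraction of pixels in each of 5 equal-width bins over [0, max_value].

    max_value is the assumed largest possible intensity (e.g. 4095 for
    12-bit data); values are scaled to [0, 1] and clipped before binning.
    """
    scaled = np.clip(image.astype(np.float64) / max_value, 0.0, 1.0)
    counts, _ = np.histogram(scaled, bins=5, range=(0.0, 1.0))
    return counts / counts.sum()


def passes_histogram_filter(image: np.ndarray, max_value: float,
                            max_bright_fraction: float = 0.05) -> bool:
    """Hypothetical stage-1 rule: reject the image if the brightest bin
    holds more than max_bright_fraction of all pixels."""
    hist = five_bin_histogram(image, max_value)
    return bool(hist[-1] <= max_bright_fraction)


def flag_by_reconstruction_error(errors, percentile: float = 95.0) -> np.ndarray:
    """Hypothetical stage-2 rule: given one VAE reconstruction error per
    image, flag images whose error exceeds the chosen percentile."""
    errors = np.asarray(errors, dtype=np.float64)
    cutoff = np.percentile(errors, percentile)
    return errors > cutoff


def f1_score(y_true, y_pred) -> float:
    """Standard F1 score, treating True as 'low-quality scan'."""
    y_true = np.asarray(y_true, dtype=bool)
    y_pred = np.asarray(y_pred, dtype=bool)
    tp = np.sum(y_true & y_pred)
    fp = np.sum(~y_true & y_pred)
    fn = np.sum(y_true & ~y_pred)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return float(2 * precision * recall / (precision + recall))


# Synthetic 12-bit example: dark background plus uniform "tissue".
clean = np.zeros((100, 100))
clean[:, 50:] = 0.5 * 4095
artifact = clean.copy()
artifact[:10, :] = 4095  # bright band covering 10% of pixels

print(passes_histogram_filter(clean, max_value=4095))     # True
print(passes_histogram_filter(artifact, max_value=4095))  # False
print(flag_by_reconstruction_error([0.1, 0.2, 0.1, 5.0], percentile=75))
print(f1_score([1, 1, 0, 0, 1], [1, 0, 0, 1, 1]))
```

Running the cheap histogram check first means the more expensive VAE only has to score images that survive stage 1. Both thresholds in the sketch are placeholders and would need tuning on labeled data.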

This dataverse contains two datasets:
Jul 8, 2024
Haehn, Daniel; Zurrin, Ryan; Goyal, Neha; Bendiksen, Benni; Manocha, Muskaan; Simovici, Dan; Haspel, Nurit; Pomplun, Marc; Lotter, Bill; Sorensen, Greg, 2024, "2D Mammograms + DeepSight Cancer Annotations", https://doi.org/10.7910/DVN/KXJCIU, Harvard Dataverse, V1
Each image in the dataset is accompanied by a metadata file in JSON format, providing detailed information about the image. Below is an outline of the metadata content: PatientID: A unique identifier for the patient associated with the image. View: The mammographic view (e.g., cranio-caudal) that the image represents. WindowCenter: An array indicat...
Jul 8, 2024
Haehn, Daniel; Zurrin, Ryan; Goyal, Neha; Bendiksen, Benni; Manocha, Muskaan; Simovici, Dan; Haspel, Nurit; Pomplun, Marc; Lotter, Bill; Sorensen, Greg, 2024, "3D Tomosynthesis + DeepSight Cancer Annotations", https://doi.org/10.7910/DVN/E7GHGE, Harvard Dataverse, V1
Each image in the dataset is accompanied by a metadata file in JSON format, providing detailed information about the image. Below is an outline of the metadata content: PatientID: A unique identifier for the patient associated with the image. View: The mammographic view (e.g., cranio-caudal) that the image represents. WindowCenter: An array indicat...
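
Both datasets describe the same per-image JSON metadata format. The following is a minimal sketch for reading it, assuming each metadata file is a single JSON object whose top-level keys match the field names listed above (PatientID, View, WindowCenter). The helper names `load_metadata` and `iter_metadata` and the "every `*.json` file under a folder" layout are hypothetical. The WindowCenter description is cut off in the listing, so the sketch returns that value unchanged rather than interpreting it.

```python
import json
from pathlib import Path

# Field names taken from the dataset description; any other keys in the
# file are ignored by this sketch.
FIELDS = ("PatientID", "View", "WindowCenter")


def load_metadata(json_path) -> dict:
    """Read one per-image metadata file and keep the documented fields.
    Missing keys come back as None."""
    with open(json_path, "r", encoding="utf-8") as f:
        meta = json.load(f)
    return {key: meta.get(key) for key in FIELDS}


def iter_metadata(root):
    """Yield (path, metadata) for every *.json file under root, in sorted
    order. The directory layout is an assumption, not a documented one."""
    for path in sorted(Path(root).rglob("*.json")):
        yield path, load_metadata(path)
```

Using `dict.get` keeps the loader tolerant of files that omit a field, which helps when scanning many files after a bulk download.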