Version v1.0 | ML-Ready Dataset for High-Throughput Screening
A machine-learning-ready dataset of 137,953 hypothetical MOF structures with structural and geometric features. It supports high-throughput computational screening, materials discovery, and gas adsorption prediction. Note: This dataset does not include hydrogen storage labels and is primarily designed for inference and candidate screening tasks.
Sourced from Northwestern University databases containing 137,953 hypothetical MOF CIF files assembled from 102 building blocks. The hMOF-ML Database was originally developed by Christopher E. Wilmer and collaborators and is now curated by the Snurr Research Lab.
From MOFXDB API including original geometric descriptors:
We processed the raw hMOF-ML data into an ML-ready format:
Computed additional features to enrich the feature space:
Unified JSON database containing:
For a complete list of all accessible attributes and their descriptions, please refer to the Data Format documentation.
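As a rough illustration of working with the unified JSON database, the sketch below loads a file and pulls out the numeric features of each record. The file layout is an assumption, not taken from the Data Format documentation: the top level is assumed to be either a list of records or an object keyed by structure name, and the `name` field and the helper names `load_records` and `numeric_features` are hypothetical.

```python
from __future__ import annotations

import json
from typing import Any


def load_records(path: str) -> list[dict[str, Any]]:
    """Load a JSON file and return one dict per structure.

    Assumption: the top level is a list of objects, or an object that maps
    a structure name to its record. Adjust this to the real schema.
    """
    with open(path, "r", encoding="utf-8") as fh:
        data = json.load(fh)
    if isinstance(data, list):
        return data
    if isinstance(data, dict):
        # Store the key under a hypothetical "name" field unless the
        # record already defines its own.
        return [{"name": name, **record} for name, record in data.items()]
    raise ValueError("unexpected top-level JSON type")


def numeric_features(record: dict[str, Any]) -> dict[str, float]:
    """Keep only top-level int/float values (booleans are excluded)."""
    return {
        key: float(value)
        for key, value in record.items()
        if isinstance(value, (int, float)) and not isinstance(value, bool)
    }
```

Nested fields (for example, per-gas or per-atom blocks) would need their own flattening step; this sketch only looks at top-level values.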
Structured JSON format with comprehensive features, ready for direct model inference
Comprehensive structural and geometric features
137,953 hypothetical structures for comprehensive screening
Unified format optimized for inference tasks
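A screening pass over the structures can be sketched as scoring every record and keeping the best few. This is a minimal sketch under assumptions: `score` stands in for whatever trained model or heuristic you apply, and the `name` and `void_fraction` fields used in the usage example are hypothetical placeholders rather than confirmed attribute names.

```python
import heapq
from typing import Any, Callable, Iterable


def top_candidates(
    records: Iterable[dict[str, Any]],
    score: Callable[[dict[str, Any]], float],
    k: int = 10,
) -> list[tuple[float, str]]:
    """Return the k highest-scoring (score, name) pairs, best first.

    heapq.nlargest keeps only k items in memory, which helps when
    streaming over a large number of structures.
    """
    scored = (
        (score(record), str(record.get("name", f"record-{i}")))
        for i, record in enumerate(records)
    )
    return heapq.nlargest(k, scored)


if __name__ == "__main__":
    # Toy records with a hypothetical "void_fraction" feature.
    demo = [
        {"name": "a", "void_fraction": 0.2},
        {"name": "b", "void_fraction": 0.8},
        {"name": "c", "void_fraction": 0.5},
    ]
    print(top_candidates(demo, lambda r: r["void_fraction"], k=2))
```

Ties in score fall back to comparing names, so the ordering stays deterministic.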
This dataset does NOT include hydrogen storage labels (UG_at_PS, UV_at_PS, UG_at_TPS, UV_at_TPS). It is primarily designed for inference tasks and candidate screening. For training models on hydrogen storage prediction, please use the H2MOF-ML Dataset.
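Before feeding records to a training pipeline, it can help to confirm which ones carry the four hydrogen storage labels. The sketch below assumes the labels, where present, would be top-level keys of each record; that placement and the helper names are assumptions, and only the label names themselves come from the text above.

```python
from typing import Any

# Label names as listed in the dataset description.
H2_LABELS = ("UG_at_PS", "UV_at_PS", "UG_at_TPS", "UV_at_TPS")


def missing_labels(record: dict[str, Any]) -> list[str]:
    """Return the label names that are absent or null in a record."""
    return [key for key in H2_LABELS if record.get(key) is None]


def split_by_labels(
    records: list[dict[str, Any]],
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split records into (fully labeled, missing at least one label)."""
    labeled: list[dict[str, Any]] = []
    unlabeled: list[dict[str, Any]] = []
    for record in records:
        (unlabeled if missing_labels(record) else labeled).append(record)
    return labeled, unlabeled
```

For records from this dataset, every entry is expected to land in the second group, which makes the check a cheap guard against accidentally training on unlabeled data.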
The dataset is available in JSON format. You can download it via:
If you use this dataset, please cite the following references:
N. Scott Bobbitt, Kaihang Shi, Benjamin J. Bucior, Haoyuan Chen, Nathaniel Tracy-Amoroso, Zhao Li, Yangzesheng Sun, Julia H. Merlin, J. Ilja Siepmann, Daniel W. Siderius, and Randall Q. Snurr.
MOFX-DB: An Online Database of Computational Adsorption Data for Nanoporous Materials.
Journal of Chemical & Engineering Data, 2023, 68 (2), 483–498.
DOI: 10.1021/acs.jced.2c00583
Christopher E. Wilmer, Michael Leaf, Chang Yeon Lee, Omar K. Farha, Brad G. Hauser, Joseph T. Hupp, and Randall Q. Snurr.
Large-scale screening of hypothetical metal–organic frameworks.
Nature Chemistry, 2012, 4, 83–89.
DOI: 10.1038/nchem.1192
For training machine learning models on hydrogen storage prediction with labeled GCMC data, see: H2MOF-ML Dataset