Version v1.0 | ML-Ready Dataset for Hydrogen Storage Prediction
A machine learning-ready dataset of 4,034 experimentally synthesised metal-organic frameworks (MOFs), including rich structural features and four hydrogen adsorption capacity metrics derived from Grand Canonical Monte Carlo (GCMC) simulations. The dataset can be used directly for training and inference in hydrogen storage performance prediction tasks.
From the CSD MOF Collection (Non-Commercial), which contains CIF data for experimentally synthesised MOFs.
From the HyMARC Datahub (Ahmed et al., 2017), including:
UG_at_PS: Usable gravimetric capacity at 77 K (100 → 5 bar), in wt.%
UV_at_PS: Usable volumetric capacity at 77 K (100 → 5 bar), in g H₂/L
UG_at_TPS: Usable gravimetric capacity (100 bar/77 K → 5 bar/160 K), in wt.%
UV_at_TPS: Usable volumetric capacity (100 bar/77 K → 5 bar/160 K), in g H₂/L
We performed extensive processing to transform the raw data into an ML-ready format:
Retained only MOFs with CSD codes (4,034 samples) to ensure data traceability and structure reproducibility.
Computed additional features to enrich the feature space:
Unified JSON database containing:
Missing value handling with default values to ensure training stability.
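The filtering and default-filling steps above can be illustrated with a minimal sketch. It is not the pipeline actually used to build the dataset: the record layout, the `csd_code` key, the feature names in `DEFAULTS`, and the default values themselves are all assumptions made for illustration; the Data Format documentation lists the real attribute names.

```python
from typing import Any

# Hypothetical feature names and default values, for illustration only.
DEFAULTS: dict[str, Any] = {"void_fraction": 0.0, "density": 0.0}


def prepare_records(
    raw_records: list[dict[str, Any]],
    defaults: dict[str, Any],
) -> list[dict[str, Any]]:
    """Keep records that carry a CSD code and fill missing features."""
    cleaned = []
    for record in raw_records:
        # Drop entries without a CSD code ("csd_code" is an assumed key).
        if not record.get("csd_code"):
            continue
        filled = dict(record)  # copy so the input is left untouched
        for key, default in defaults.items():
            # Treat both absent keys and explicit nulls as missing.
            if filled.get(key) is None:
                filled[key] = default
        cleaned.append(filled)
    return cleaned
```

Filling with a constant keeps every record usable for training, but it also hides which values were originally missing, so a separate indicator feature may be worth adding if that distinction matters for a model.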
For a complete list of all accessible attributes and their descriptions, please refer to the Data Format documentation.
Structured JSON format with comprehensive features, ready for direct model training
Multi-dimensional feature space: structural + geometric + performance
Preserved CSD codes for reliable data provenance
Unified format with comprehensive feature engineering
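As a rough sketch of how such a JSON file might be turned into model inputs, the example below loads the records and splits them into a feature matrix and a four-column target matrix. The four target names come from the list of capacity metrics; everything else is assumed, including a top-level JSON array of per-MOF objects and whatever feature keys the caller passes in.

```python
import json
from pathlib import Path

# The four capacity metrics described in this document.
TARGET_KEYS = ["UG_at_PS", "UV_at_PS", "UG_at_TPS", "UV_at_TPS"]


def load_records(path):
    """Load the dataset, assuming a top-level JSON array of objects."""
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON array of records")
    return data


def build_xy(records, feature_keys, target_keys=TARGET_KEYS):
    """Split records into feature rows and target rows (lists of floats)."""
    features = [[float(rec[key]) for key in feature_keys] for rec in records]
    targets = [[float(rec[key]) for key in target_keys] for rec in records]
    return features, targets
```

A call such as `build_xy(load_records("dataset.json"), ["density", "void_fraction"])` would return two nested lists that can be passed to most regression libraries; the file name and the two feature keys here are placeholders, not the dataset's actual names.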
The dataset is available in JSON format. You can download it via:
If you use this dataset, please cite both original data sources:
Ahmed, A., Liu, Y., Purewal, J., Tran, L., Sholl, D. S., & Lively, R. P.
High-throughput screening of metal–organic frameworks for hydrogen storage at cryogenic temperature.
Energy & Environmental Science, 2017, 10, 2459–2471.
From the CSD MOF Collection (Non-Commercial), licensed under CC BY-NC-SA 4.0.
This modified dataset is licensed under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License (CC BY-NC-SA 4.0).