🚀 Quick Start Guide

Welcome to MOF DB! This guide will help you get started with accessing and using our MOF datasets.

Step 1: Download the Dataset

📦 Dataset Usage

H2MOF-ML Dataset (JSON): 🎯 Use for model training. Contains GCMC (grand canonical Monte Carlo) simulation values (labels) for hydrogen adsorption, enabling supervised learning.
hMOF-ML Dataset (JSON): 🎯 Use for prediction. Apply trained models to predict properties of 137,953 hypothetical MOF structures for large-scale screening.

💡 Workflow Tip: Train your model on H2MOF-ML, then use it to make predictions on hMOF-ML for high-throughput screening.

You can download datasets in two ways:

Option A: Web Interface

Open the dataset page in your browser: follow the H2MOF-ML Dataset or hMOF-ML Dataset link to the corresponding Hugging Face dataset page (ytbai/H2MOF-ML or ytbai/hMOF-ML)
Click the download button on the page
Save the JSON file to your computer

Option B: API Access

Use one of the following Python scripts to fetch the JSON file directly from Hugging Face into your current working directory, depending on which dataset you need. Both scripts require the huggingface_hub package (pip install huggingface_hub).

download_h2mof.py

# Download H2MOF-ML Dataset
from pathlib import Path
from huggingface_hub import hf_hub_download

current_dir = Path.cwd()

file_path = hf_hub_download(
    repo_id="ytbai/H2MOF-ML",
    filename="H2MOF-ML.json",
    repo_type="dataset",
    local_dir=str(current_dir),
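    # Only relevant for older huggingface_hub releases; newer ones deprecate
    # this argument, so it can be dropped there.
    local_dir_use_symlinks=False,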
)

print("Saved at:", file_path)

download_hmof.py

# Download hMOF-ML Dataset
from pathlib import Path
from huggingface_hub import hf_hub_download

current_dir = Path.cwd()

file_path = hf_hub_download(
    repo_id="ytbai/hMOF-ML",
    filename="hMOF-ML.json",
    repo_type="dataset",
    local_dir=str(current_dir),
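    # Only relevant for older huggingface_hub releases; newer ones deprecate
    # this argument, so it can be dropped there.
    local_dir_use_symlinks=False,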
)

print("Saved at:", file_path)

Run the corresponding script (e.g., python download_h2mof.py or python download_hmof.py); the dataset will be saved in the directory from which you run it.
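Before moving on, it can be worth confirming that the downloaded file parses as JSON and has the expected top-level layout. The sketch below assumes the file is a single JSON object keyed by sample name, which is how explore.py in Step 2 reads it; the helper name summarize_dataset is hypothetical and not part of MOF DB.

```python
import json
from pathlib import Path


def summarize_dataset(json_path, preview=3):
    """Return (entry_count, first_names) for a dataset JSON file.

    Assumes the top level is a JSON object mapping sample names to records.
    """
    path = Path(json_path)
    with path.open("r", encoding="utf-8") as f:
        # A truncated or corrupted download raises json.JSONDecodeError here.
        data = json.load(f)
    if not isinstance(data, dict):
        raise ValueError(f"Expected a JSON object at the top level of {path}")
    names = list(data)
    return len(names), names[:preview]


if __name__ == "__main__":
    # File name taken from the download script; use hMOF-ML.json for the other dataset.
    target = Path("H2MOF-ML.json")
    if target.exists():
        count, first = summarize_dataset(target)
        print(f"{count} entries; first few: {first}")
    else:
        print(f"{target} not found; run the download script first")
```

If the count or the sample names look wrong, re-running the download script is usually the simplest fix.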

Step 2: Load and Explore the Data

Use Python to load the JSON file and access samples with convenient dot notation, as in the explore.py helper below:

explore.py

import json
import os
from types import SimpleNamespace


def dict_to_namespace(d):
    """Recursively convert dicts (and lists of dicts) to dot-notation objects"""
    if isinstance(d, dict):
        return SimpleNamespace(**{k: dict_to_namespace(v) for k, v in d.items()})
    elif isinstance(d, list):
        return [dict_to_namespace(item) for item in d]
    return d

def load_json_samples(json_file):
    """Load a dataset JSON file and return {sample_name: dot-notation object}"""
    if not os.path.exists(json_file):
        raise FileNotFoundError(f"JSON file not found: {json_file}")

    with open(json_file, 'r', encoding='utf-8') as f:
        json_data = json.load(f)

    samples = {}
    for sample_name, sample_data in json_data.items():
        samples[sample_name] = dict_to_namespace(sample_data)
    return samples

json_file = "H2MOF-ML.json"
samples = load_json_samples(json_file)

sample_name = "ABEXOW"
print(samples[sample_name].lattice.abc)
print(samples[sample_name].properties.metal_composition.total_metal_atoms)
print(samples[sample_name].properties.UV_at_TPS)

# Example output
# [12.002, 11.008, 19.473]
# 7
# 23.96
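To see which attributes a sample exposes without leaving the Python session, the namespace objects can be walked recursively. This is a minimal sketch: iter_field_paths is a hypothetical helper, and the toy record only mirrors the three fields and example values printed by explore.py, not a complete dataset entry.

```python
from types import SimpleNamespace


def iter_field_paths(obj, prefix=""):
    """Yield dotted attribute paths for every leaf value under a namespace."""
    if isinstance(obj, SimpleNamespace):
        for name, value in vars(obj).items():
            child = f"{prefix}.{name}" if prefix else name
            yield from iter_field_paths(value, child)
    else:
        # Lists and scalars are treated as leaves to keep the listing short.
        yield prefix


# Toy record for illustration only; it reuses the example values shown above.
toy = SimpleNamespace(
    lattice=SimpleNamespace(abc=[12.002, 11.008, 19.473]),
    properties=SimpleNamespace(
        metal_composition=SimpleNamespace(total_metal_atoms=7),
        UV_at_TPS=23.96,
    ),
)

for path in iter_field_paths(toy):
    print(path)
# lattice.abc
# properties.metal_composition.total_metal_atoms
# properties.UV_at_TPS
```

With real data, pass samples[sample_name] from explore.py instead of the toy record.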

💡 Tip

The loading method is the same for both datasets; simply replace the file path with the corresponding dataset filename (for example, hMOF-ML.json). For detailed attribute descriptions and available properties, refer to the Data Format documentation.
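The workflow tip from Step 1 (train on H2MOF-ML, then predict on hMOF-ML) can be sketched end to end once both files load. The example below is illustrative only. It assumes that UV_at_TPS serves as the training label and that hMOF-ML records expose the same lattice.abc and metal_composition fields; check the Data Format documentation before relying on either. It also uses a deliberately simple 1-nearest-neighbour predictor on made-up toy records as a stand-in for a real model. The names make_record, featurize, fit and predict are hypothetical.

```python
import math
from types import SimpleNamespace


def make_record(abc, metal_atoms, uptake=None):
    """Build a toy record shaped like the fields printed by explore.py."""
    props = SimpleNamespace(
        metal_composition=SimpleNamespace(total_metal_atoms=metal_atoms)
    )
    if uptake is not None:
        props.UV_at_TPS = uptake  # label, assumed present only in training data
    return SimpleNamespace(lattice=SimpleNamespace(abc=abc), properties=props)


def featurize(sample):
    """Feature vector: three lattice lengths plus the metal-atom count."""
    return [*sample.lattice.abc,
            sample.properties.metal_composition.total_metal_atoms]


def fit(labelled):
    """'Train' a 1-nearest-neighbour model by storing (features, label) pairs."""
    return [(featurize(s), s.properties.UV_at_TPS) for s in labelled.values()]


def predict(model, sample):
    """Return the label of the stored training point closest to the sample."""
    x = featurize(sample)
    _, label = min(model, key=lambda pair: math.dist(pair[0], x))
    return label


# Made-up values, not taken from either dataset.
train = {
    "toy_A": make_record([10.0, 10.0, 10.0], 4, uptake=20.0),
    "toy_B": make_record([25.0, 25.0, 25.0], 8, uptake=35.0),
}
screen = {
    "toy_X": make_record([11.0, 10.5, 9.5], 4),
    "toy_Y": make_record([24.0, 26.0, 25.0], 8),
}

model = fit(train)
predictions = {name: predict(model, s) for name, s in screen.items()}
print(predictions)
# {'toy_X': 20.0, 'toy_Y': 35.0}
```

In practice, train and screen would come from load_json_samples("H2MOF-ML.json") and load_json_samples("hMOF-ML.json"), and the nearest-neighbour stand-in would be replaced by the model of your choice.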

Next Steps

Explore the Data Format documentation for detailed field descriptions