CSI-4CAST Organization

Welcome to the CSI-4CAST organization on Hugging Face! This organization hosts datasets for channel state information (CSI) prediction research. The datasets were originally created for our research paper, CSI-4CAST: A Hybrid Deep Learning Model for CSI Prediction with Comprehensive Robustness and Generalization Testing. The corresponding code and implementation are available in our GitHub repo.

TL;DR

Quick Start Options:

For specific datasets: Use the snapshot_download function from huggingface_hub to download only the datasets you need
For all datasets with the original structure: Run download.py, then reconstruction.py, to get the complete dataset in its original folder layout

See the Usage section below for detailed instructions.

Dataset Structure

The datasets are organized in the following structure:

data/
├── stats/
│   ├── fdd/
│   │   └── normalization_stats.pkl
│   └── tdd/
│       └── normalization_stats.pkl
├── test/
│   ├── generalization/
│   │   ├── cm_A_ds_030_ms_001/
│   │   │   ├── H_D_pred.pt
│   │   │   ├── H_U_hist.pt
│   │   │   └── H_U_pred.pt
│   │   ├── cm_B_ds_030_ms_001/
│   │   ├── cm_C_ds_030_ms_001/
│   │   ├── cm_D_ds_030_ms_001/
│   │   ├── cm_E_ds_030_ms_001/
│   │   └── ...
│   └── regular/
│       ├── cm_A_ds_030_ms_001/
│       │   ├── H_D_pred.pt
│       │   ├── H_U_hist.pt
│       │   └── H_U_pred.pt
│       ├── cm_C_ds_030_ms_001/
│       ├── cm_D_ds_030_ms_001/
│       └── ...
└── train/
    └── regular/
        ├── cm_A_ds_030_ms_001/
        │   ├── H_D_pred.pt
        │   ├── H_U_hist.pt
        │   └── H_U_pred.pt
        ├── cm_C_ds_030_ms_001/
        ├── cm_D_ds_030_ms_001/
        └── ...

Dataset Organization Strategy

Our datasets are organized using a convenience-first naming strategy on Hugging Face. Instead of uploading the entire data folder as one large dataset, we've split it into individual datasets with descriptive names. This approach allows users to:

Download only the specific data they need (e.g., just one configuration or test type)
Easily identify datasets by their purpose and configuration
Reduce download time and storage by avoiding unnecessary data
Enable selective loading for different research scenarios

Available Datasets

Statistics Dataset

stats: Contains normalization statistics for FDD and TDD configurations

Test Datasets

test_regular_*: Regular test data for various configurations
test_generalization_*: Generalization test data with extended parameter ranges

Training Datasets

train_regular_*: Training data for various configurations

Dataset Naming Convention

The datasets follow this naming pattern:

[train/test]_[regular/generalization]_cm_[X]_ds_[NNN]_ms_[NNN], where:
[train/test]_[regular/generalization]: Data split (train or test) and evaluation type (regular or generalization)
cm_[A/B/C/D/E]: Channel model (CDL-A, CDL-B, CDL-C, CDL-D, or CDL-E)
ds_[030/050/100/200/300/400]: Delay spread in ns, zero-padded to three digits
ms_[001/003/006/009/010/012/015/018/021/024/027/030/033/036/039/042/045]: User speed in m/s, zero-padded to three digits

Examples:

test_regular_cm_A_ds_030_ms_001: Regular test data for CDL-A model, 30ns delay spread, 1 m/s speed
train_regular_cm_C_ds_100_ms_030: Training data for CDL-C model, 100ns delay spread, 30 m/s speed
test_generalization_cm_B_ds_200_ms_015: Generalization test data for CDL-B model, 200ns delay spread, 15 m/s speed
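
The naming pattern above can be built and parsed programmatically. The following is a minimal sketch, not part of the provided scripts: the helper names build_name and parse_name are hypothetical, and the regular expression simply encodes the pattern and three-digit zero padding shown in the examples.

```python
import re

# Pattern taken from the naming convention:
# [train/test]_[regular/generalization]_cm_[A-E]_ds_[NNN]_ms_[NNN]
_NAME_RE = re.compile(
    r"^(train|test)_(regular|generalization)_cm_([A-E])_ds_(\d{3})_ms_(\d{3})$"
)


def build_name(split, kind, channel_model, delay_spread_ns, speed_mps):
    """Compose a dataset name (hypothetical helper, not in the official scripts)."""
    return (
        f"{split}_{kind}_cm_{channel_model}"
        f"_ds_{delay_spread_ns:03d}_ms_{speed_mps:03d}"
    )


def parse_name(name):
    """Split a dataset name into its fields; raise ValueError if it does not match."""
    match = _NAME_RE.match(name)
    if match is None:
        raise ValueError(f"not a CSI-4CAST dataset name: {name!r}")
    split, kind, channel_model, ds, ms = match.groups()
    return {
        "split": split,
        "kind": kind,
        "channel_model": channel_model,
        "delay_spread_ns": int(ds),
        "speed_mps": int(ms),
    }
```

Note that the stats dataset does not follow this pattern, so parse_name rejects it by design.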

Usage

Downloading Datasets

You can download individual datasets using the Hugging Face Hub:

from huggingface_hub import snapshot_download

# Download the stats dataset
snapshot_download(repo_id="CSI-4CAST/stats", repo_type="dataset")

# Download a specific CSI prediction dataset
snapshot_download(repo_id="CSI-4CAST/test_regular_cm_A_ds_030_ms_001", repo_type="dataset")
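
By default, snapshot_download stores files in the Hugging Face cache and returns the local path. If you prefer a predictable folder, or only need some of the files, the local_dir and allow_patterns parameters of snapshot_download can help. The sketch below is an illustration under assumptions: the helper names snapshot_kwargs and download_dataset are hypothetical, and the datasets/<name> layout is our guess at a convenient structure, not necessarily what download.py produces.

```python
from pathlib import Path


def snapshot_kwargs(name, output_dir="datasets", files=None):
    """Build keyword arguments for snapshot_download (hypothetical helper)."""
    kwargs = {
        "repo_id": f"CSI-4CAST/{name}",
        "repo_type": "dataset",
        # Assumed layout: one subfolder per dataset name.
        "local_dir": str(Path(output_dir) / name),
    }
    if files is not None:
        # Restrict the download to the listed file names or glob patterns.
        kwargs["allow_patterns"] = list(files)
    return kwargs


def download_dataset(name, output_dir="datasets", files=None):
    """Download one dataset and return the local folder path."""
    # Imported lazily so the helper above works without huggingface_hub installed.
    from huggingface_hub import snapshot_download

    return snapshot_download(**snapshot_kwargs(name, output_dir, files))
```

For example, download_dataset("test_regular_cm_A_ds_030_ms_001", files=["H_U_hist.pt"]) would fetch only the history tensor for that configuration.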

Downloading All Datasets

To download all available datasets at once, use the provided download.py script:

# Download all datasets to a 'datasets' folder
python3 download.py

# Download to a custom directory
python3 download.py --output-dir my_datasets

# Dry run to test without downloading (creates empty placeholder files)
python3 download.py --dry-run

The script will automatically:

Check for all possible dataset combinations
Download only the datasets that exist on Hugging Face
Create an organized folder structure with descriptive names
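
The first two steps can be pictured as "enumerate every name the convention allows, then keep the ones that exist". The sketch below is not the code of download.py; it is one possible way to do this, assuming the parameter values listed under Dataset Naming Convention and using HfApi.repo_exists from a reasonably recent huggingface_hub release. The function names are hypothetical.

```python
from itertools import product

PREFIXES = ("train_regular", "test_regular", "test_generalization")
CHANNEL_MODELS = ("A", "B", "C", "D", "E")
DELAY_SPREADS_NS = (30, 50, 100, 200, 300, 400)
SPEEDS_MPS = (1, 3, 6, 9, 10, 12, 15, 18, 21, 24, 27, 30, 33, 36, 39, 42, 45)


def candidate_names():
    """List every dataset name the naming convention allows, plus 'stats'."""
    names = ["stats"]
    for prefix, cm, ds, ms in product(
        PREFIXES, CHANNEL_MODELS, DELAY_SPREADS_NS, SPEEDS_MPS
    ):
        names.append(f"{prefix}_cm_{cm}_ds_{ds:03d}_ms_{ms:03d}")
    return names


def hub_repo_exists(name):
    """Ask the Hub whether CSI-4CAST/<name> exists as a dataset repo."""
    # Imported lazily; requires network access when called.
    from huggingface_hub import HfApi

    return HfApi().repo_exists(f"CSI-4CAST/{name}", repo_type="dataset")


def existing_names(exists=hub_repo_exists):
    """Filter the candidates with the given existence check."""
    return [name for name in candidate_names() if exists(name)]
```

Passing the existence check as a parameter keeps the enumeration testable offline: existing_names(lambda name: True) returns all candidates without contacting the Hub.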

Reconstructing Original Folder Structure

While our naming strategy makes it easy to download specific datasets, you might want to work with the complete dataset in its original folder structure. For this purpose, we provide the reconstruction.py script that restores the original organization:

python3 reconstruction.py --input-dir datasets --output-dir data

This script will:

Remove the prefixes (test_regular_, test_generalization_, train_regular_)
Organize the folders back into the original data structure
Create the proper hierarchy: data/stats/, data/test/regular/, data/test/generalization/, data/train/regular/
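
The mapping from downloaded folder names to the original hierarchy can be sketched as follows. This is an illustration of the three steps above, not the source of reconstruction.py: target_path and reconstruct are hypothetical names, the code copies rather than moves folders, and skipping .cache directories is our assumption about what snapshot_download may leave behind.

```python
import shutil
from pathlib import Path

# Prefix of the downloaded folder -> subdirectory in the original layout.
PREFIX_TO_SUBDIR = {
    "test_generalization_": Path("test") / "generalization",
    "test_regular_": Path("test") / "regular",
    "train_regular_": Path("train") / "regular",
}


def target_path(name, output_dir="data"):
    """Return where a downloaded folder belongs in the original structure."""
    out = Path(output_dir)
    if name == "stats":
        return out / "stats"
    for prefix, subdir in PREFIX_TO_SUBDIR.items():
        if name.startswith(prefix):
            # Drop the prefix, keep the cm_..._ds_..._ms_... part.
            return out / subdir / name[len(prefix):]
    raise ValueError(f"unrecognized dataset folder: {name!r}")


def reconstruct(input_dir="datasets", output_dir="data"):
    """Copy every recognized folder from input_dir into the original layout."""
    for folder in sorted(Path(input_dir).iterdir()):
        if not folder.is_dir():
            continue
        try:
            dest = target_path(folder.name, output_dir)
        except ValueError:
            continue  # Leave unrelated folders untouched.
        shutil.copytree(
            folder, dest, dirs_exist_ok=True,
            ignore=shutil.ignore_patterns(".cache"),
        )
```

For instance, target_path("test_regular_cm_A_ds_030_ms_001") resolves to data/test/regular/cm_A_ds_030_ms_001.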

When to use reconstruction:

You want to replicate the exact structure used in the original CSI-4CAST paper
Your existing code expects the original folder organization
You need the complete dataset in the original research structure

Note: Reconstruction is optional. It is only needed when you require the original folder layout, for example to replicate the CSI-4CAST paper's setup exactly. If you work with individual datasets and do not depend on that layout, you can skip reconstruction and use the downloaded datasets directly.

File Types

Each dataset folder contains:

H_D_pred.pt: Predicted H_D values (PyTorch tensor)
H_U_hist.pt: Historical H_U values (PyTorch tensor)
H_U_pred.pt: Predicted H_U values (PyTorch tensor)
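
Since the files are PyTorch tensors, they can be read with torch.load. The sketch below is a minimal example under assumptions: the helper names tensor_paths and load_tensors are hypothetical, and tensor shapes and dtypes are not documented here, so inspect them after loading rather than relying on a fixed layout.

```python
from pathlib import Path

# The three files listed above for each dataset folder.
FILE_NAMES = ("H_U_hist.pt", "H_U_pred.pt", "H_D_pred.pt")


def tensor_paths(folder):
    """Map each tensor name (file name without .pt) to its path in the folder."""
    folder = Path(folder)
    return {Path(name).stem: folder / name for name in FILE_NAMES}


def load_tensors(folder):
    """Load the three tensors of one dataset folder onto the CPU."""
    # Imported lazily so tensor_paths works without PyTorch installed.
    import torch

    return {
        key: torch.load(path, map_location="cpu")
        for key, path in tensor_paths(folder).items()
    }
```

After calling tensors = load_tensors(folder) on a downloaded or reconstructed folder, printing tensors["H_U_hist"].shape and tensors["H_U_hist"].dtype is a quick way to check what the data looks like.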

Questions & Contributions

For questions or contribution suggestions, open a pull request here or on the GitHub page of this organization.

Citation

@misc{cheng2025csi4casthybriddeeplearning,
      title={CSI-4CAST: A Hybrid Deep Learning Model for CSI Prediction with Comprehensive Robustness and Generalization Testing}, 
      author={Sikai Cheng and Reza Zandehshahvar and Haoruo Zhao and Daniel A. Garcia-Ulloa and Alejandro Villena-Rodriguez and Carles Navarro Manchón and Pascal Van Hentenryck},
      year={2025},
      eprint={2510.12996},
      archivePrefix={arXiv},
      primaryClass={cs.LG},
      url={https://arxiv.org/abs/2510.12996}, 
}