---
license: mit
tags:
- tabular-regression
- sklearn
- xgboost
- random-forest
- motorsport
- lap-time-prediction
datasets:
- Haxxsh/gdgc-datathon-data
language:
- en
pipeline_tag: tabular-regression
---
# GDGC Datathon 2025 - Lap Time Prediction Models
Trained models for predicting Formula racing lap times from the GDGC Datathon 2025 competition.
## Model Description
This repository contains ensemble models trained to predict `Lap_Time_Seconds` for Formula racing events. The models use a combination of Random Forest and XGBoost regressors with cross-validation.
### Models Included
| File | Description | Size |
|------|-------------|------|
| `rf_final.pkl` | Final Random Forest model | 158 MB |
| `xgb_final.pkl` | Final XGBoost model | 2.6 MB |
| `rf_cv_models.pkl` | Random Forest CV fold models | 13.4 GB |
| `xgb_cv_models.pkl` | XGBoost CV fold models | 103 MB |
| `rf_model.pkl` | Base Random Forest model | 95 MB |
| `xgb_model.pkl` | Base XGBoost model | 2 MB |
| `feature_engineer.pkl` | Feature preprocessing pipeline | 6 KB |
| `best_params.json` | Optimal hyperparameters | 1 KB |
| `cv_results.json` | Cross-validation results | 1 KB |
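The two JSON files are small enough to inspect before downloading any model weights:

```python
import json

# Inspect tuned hyperparameters and CV scores without loading any model
with open("best_params.json") as f:
    best_params = json.load(f)
with open("cv_results.json") as f:
    cv_results = json.load(f)

print(best_params["random_forest"])  # matches the Hyperparameters section below
print(cv_results)  # exact structure depends on how training logged the folds
```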
## Training Data
The models were trained on the [GDGC Datathon 2025 dataset](https://huggingface.co/datasets/Haxxsh/gdgc-datathon-data):
- **Training samples:** 734,002
- **Target variable:** `Lap_Time_Seconds` (continuous)
- **Target range:** 70.001s - 109.999s
- **Target distribution:** Nearly symmetric (mean ≈ 90s, std ≈ 11.5s)
### Features
The dataset includes features such as the following (the exact column names expected by the models can be recovered from the saved pipeline, as sketched after this list):
- Circuit characteristics (length, corners, laps)
- Weather conditions (temperature, humidity, track condition)
- Rider/driver information (championship points, position, history)
- Tire compounds and degradation factors
- Pit stop durations
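A minimal sketch for recovering the exact feature names, assuming `feature_engineer.pkl` holds a fitted scikit-learn transformer (an assumption; the actual class is defined by the training code):

```python
import pickle

with open("feature_engineer.pkl", "rb") as f:
    feature_engineer = pickle.load(f)

# These attributes exist on fitted scikit-learn transformers; a custom
# pipeline class from the training code may expose something different.
print(getattr(feature_engineer, "feature_names_in_", "not available"))
if hasattr(feature_engineer, "get_feature_names_out"):
    print(feature_engineer.get_feature_names_out())
```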
## Usage
### Loading the Models
```python
import pickle

# Load the final models
with open("rf_final.pkl", "rb") as f:
    rf_model = pickle.load(f)
with open("xgb_final.pkl", "rb") as f:
    xgb_model = pickle.load(f)

# Load the feature engineering pipeline
with open("feature_engineer.pkl", "rb") as f:
    feature_engineer = pickle.load(f)
```
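For the larger Random Forest pickles, `joblib.load` is a common alternative loader for scikit-learn objects and in practice also reads files written with plain `pickle.dump`; the `pickle.load` calls above remain the safe default:

```python
import joblib

# Drop-in alternative for the 158 MB final Random Forest model
rf_model = joblib.load("rf_final.pkl")
```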
### Making Predictions
```python
import pandas as pd
# Load test data
test_df = pd.read_csv("test.csv")
# Apply feature engineering
X_test = feature_engineer.transform(test_df)
# Predict with ensemble (average of RF and XGB)
rf_preds = rf_model.predict(X_test)
xgb_preds = xgb_model.predict(X_test)
ensemble_preds = (rf_preds + xgb_preds) / 2
```
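The CV fold models can be ensembled the same way. A minimal sketch, assuming `xgb_cv_models.pkl` unpickles to an iterable of fitted fold estimators (the actual container type is defined by the training code):

```python
import pickle
import numpy as np

# The XGBoost fold models (103 MB) are far lighter than the 13.4 GB RF ones
with open("xgb_cv_models.pkl", "rb") as f:
    xgb_cv_models = pickle.load(f)  # assumed: an iterable of fitted regressors

# Average predictions across the CV folds
fold_preds = np.column_stack([m.predict(X_test) for m in xgb_cv_models])
cv_ensemble_preds = fold_preds.mean(axis=1)
```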
### Download from Hugging Face
```python
import pickle

from huggingface_hub import hf_hub_download

# Download a specific model file
model_path = hf_hub_download(
    repo_id="Haxxsh/gdgc-datathon-models",
    filename="xgb_final.pkl",
)

# Load it
with open(model_path, "rb") as f:
    model = pickle.load(f)
```
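To mirror several files at once, `snapshot_download` fetches the repository in one call; since `rf_cv_models.pkl` alone is 13.4 GB, `allow_patterns` can restrict the download to the smaller artifacts:

```python
from huggingface_hub import snapshot_download

# Download only the final models and metadata, skipping the 13.4 GB CV pickle
local_dir = snapshot_download(
    repo_id="Haxxsh/gdgc-datathon-models",
    allow_patterns=["*_final.pkl", "feature_engineer.pkl", "*.json"],
)
print(local_dir)
```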
## Hyperparameters
Best parameters found via cross-validation (see `best_params.json`):
```json
{
  "random_forest": {
    "n_estimators": 100,
    "max_depth": null,
    "min_samples_split": 2,
    "min_samples_leaf": 1
  },
  "xgboost": {
    "n_estimators": 100,
    "learning_rate": 0.1,
    "max_depth": 6
  }
}
```
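These map directly onto the scikit-learn and XGBoost constructors for retraining from scratch. A minimal sketch (arguments such as `n_jobs` are assumptions, not recorded in `best_params.json`):

```python
from sklearn.ensemble import RandomForestRegressor
from xgboost import XGBRegressor

rf = RandomForestRegressor(
    n_estimators=100,
    max_depth=None,   # JSON null maps to Python None (unlimited depth)
    min_samples_split=2,
    min_samples_leaf=1,
    n_jobs=-1,        # assumption: parallelism is not recorded in best_params.json
)
xgb = XGBRegressor(n_estimators=100, learning_rate=0.1, max_depth=6)
```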
## Evaluation
Cross-validation results are stored in `cv_results.json`. Primary metric: **RMSE** (Root Mean Squared Error).
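To reproduce the metric on your own held-out split (a sketch; `X_val` and `y_val` are placeholders for validation data you provide):

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# X_val / y_val: a held-out validation split (placeholders, not shipped here)
val_preds = (rf_model.predict(X_val) + xgb_model.predict(X_val)) / 2
rmse = np.sqrt(mean_squared_error(y_val, val_preds))  # same metric as cv_results.json
print(f"Validation RMSE: {rmse:.3f} s")
```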
## Training Code
The training code is available on GitHub: [ezylopx5/DATATHON](https://github.com/ezylopx5/DATATHON)
Key files:
- `train.py` - Main training script
- `features.py` - Feature engineering
- `predict.py` - Inference script
## Framework Versions
- Python 3.8+
- scikit-learn
- XGBoost
- pandas
- numpy
## License
MIT License
## Citation
```bibtex
@misc{gdgc-datathon-2025,
  author    = {Haxxsh},
  title     = {GDGC Datathon 2025 Lap Time Prediction Models},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/Haxxsh/gdgc-datathon-models}
}
```