---
license: mit
tags:
- tabular-regression
- sklearn
- xgboost
- random-forest
- motorsport
- lap-time-prediction
datasets:
- Haxxsh/gdgc-datathon-data
language:
- en
pipeline_tag: tabular-regression
---

# GDGC Datathon 2025 - Lap Time Prediction Models

Trained models for predicting Formula racing lap times, built for the GDGC Datathon 2025 competition.

## Model Description

This repository contains ensemble models trained to predict `Lap_Time_Seconds` for Formula racing events. The models combine Random Forest and XGBoost regressors, and both were trained with cross-validation.

### Models Included

| File | Description | Size |
|------|-------------|------|
| `rf_final.pkl` | Final Random Forest model | 158 MB |
| `xgb_final.pkl` | Final XGBoost model | 2.6 MB |
| `rf_cv_models.pkl` | Random Forest CV fold models | 13.4 GB |
| `xgb_cv_models.pkl` | XGBoost CV fold models | 103 MB |
| `rf_model.pkl` | Base Random Forest model | 95 MB |
| `xgb_model.pkl` | Base XGBoost model | 2 MB |
| `feature_engineer.pkl` | Feature preprocessing pipeline | 6 KB |
| `best_params.json` | Optimal hyperparameters | 1 KB |
| `cv_results.json` | Cross-validation results | 1 KB |

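
The table does not say how the objects inside `rf_cv_models.pkl` and `xgb_cv_models.pkl` are laid out. The sketch below assumes each file unpickles to a plain list of fitted fold models with a scikit-learn style `.predict` method, which this repository does not confirm. Under that assumption, it averages the per-fold predictions. The helper name `average_fold_predictions` is hypothetical.

```python
import numpy as np


def average_fold_predictions(fold_models, X):
    """Average predictions from several fitted fold models.

    Assumption: `fold_models` is an iterable of objects exposing a
    scikit-learn style `.predict(X)` method. The real layout of the
    CV pickle files in this repository is not documented.
    """
    fold_models = list(fold_models)
    if not fold_models:
        raise ValueError("fold_models is empty")
    # Stack to shape (n_folds, n_samples), then average over the fold axis.
    stacked = np.stack(
        [np.asarray(model.predict(X), dtype=float) for model in fold_models]
    )
    return stacked.mean(axis=0)
```

Since `rf_cv_models.pkl` is listed at 13.4 GB, unpickling it will likely need at least that much free memory. The much smaller `rf_final.pkl` may be the more practical choice for inference.
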
## Training Data

The models were trained on the [GDGC Datathon 2025 dataset](https://huggingface.co/datasets/Haxxsh/gdgc-datathon-data):

- **Training samples:** 734,002
- **Target variable:** `Lap_Time_Seconds` (continuous)
- **Target range:** 70.001 s to 109.999 s
- **Target distribution:** Nearly symmetric (mean ≈ 90 s, std ≈ 11.5 s)

### Features

The dataset includes features such as:

- Circuit characteristics (length, corners, laps)
- Weather conditions (temperature, humidity, track condition)
- Rider/driver information (championship points, position, history)
- Tire compounds and degradation factors
- Pit stop durations

## Usage

### Loading the Models

```python
import pickle

# Only unpickle files you trust: pickle.load can execute arbitrary code.
# If feature_engineer.pkl holds a custom class, its defining module
# (possibly features.py from the training repository) must be importable.

# Load the final models
with open("rf_final.pkl", "rb") as f:
    rf_model = pickle.load(f)

with open("xgb_final.pkl", "rb") as f:
    xgb_model = pickle.load(f)

# Load the feature engineering pipeline
with open("feature_engineer.pkl", "rb") as f:
    feature_engineer = pickle.load(f)
```

### Making Predictions

```python
import pandas as pd

# Assumes rf_model, xgb_model, and feature_engineer are already loaded
# as in "Loading the Models".

# Load test data
test_df = pd.read_csv("test.csv")

# Apply feature engineering
X_test = feature_engineer.transform(test_df)

# Predict with the ensemble (unweighted average of RF and XGB)
rf_preds = rf_model.predict(X_test)
xgb_preds = xgb_model.predict(X_test)
ensemble_preds = (rf_preds + xgb_preds) / 2
```

### Download from Hugging Face

```python
import pickle

from huggingface_hub import hf_hub_download

# Download a specific model file (cached locally after the first call)
model_path = hf_hub_download(
    repo_id="Haxxsh/gdgc-datathon-models",
    filename="xgb_final.pkl",
)

# Load it
with open(model_path, "rb") as f:
    model = pickle.load(f)
```

## Hyperparameters

Best parameters found via cross-validation (see `best_params.json`):

```json
{
  "random_forest": {
    "n_estimators": 100,
    "max_depth": null,
    "min_samples_split": 2,
    "min_samples_leaf": 1
  },
  "xgboost": {
    "n_estimators": 100,
    "learning_rate": 0.1,
    "max_depth": 6
  }
}
```

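
To refit models with these settings, the parameter file can be read and passed straight to the estimator constructors. The following is a minimal sketch, not the repository's training code. It assumes `best_params.json` has the two top-level keys listed in this section, and the helper names `load_best_params` and `build_models` are hypothetical.

```python
import json


def load_best_params(path="best_params.json"):
    """Read the hyperparameter file and return (rf_params, xgb_params).

    Assumption: the file has the top-level keys "random_forest" and "xgboost".
    """
    with open(path, "r", encoding="utf-8") as f:
        params = json.load(f)
    # JSON null becomes Python None, which scikit-learn treats as
    # "no depth limit" for max_depth.
    return params["random_forest"], params["xgboost"]


def build_models(rf_params, xgb_params):
    """Create unfitted estimators from the two parameter dicts.

    Imports are local so that reading the JSON alone does not require
    scikit-learn or XGBoost to be installed.
    """
    from sklearn.ensemble import RandomForestRegressor
    from xgboost import XGBRegressor

    return RandomForestRegressor(**rf_params), XGBRegressor(**xgb_params)
```

Calling `build_models(*load_best_params())` returns unfitted estimators. They still need `.fit` on the engineered training features before they can predict.
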
## Evaluation

Cross-validation results are stored in `cv_results.json`. The primary metric is **RMSE** (root mean squared error), expressed in seconds like the target.

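
For reference, RMSE is the square root of the mean squared difference between true and predicted lap times. The sketch below shows the definition on toy values. It is not necessarily how the training script computes the metric, and the helper name `rmse` is hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if len(y_true) == 0:
        raise ValueError("inputs are empty")
    squared_errors = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Toy values for illustration only, not taken from the dataset.
laps_true = [80.0, 90.0, 100.0]
laps_pred = [82.0, 90.0, 98.0]
score = rmse(laps_true, laps_pred)  # sqrt((4 + 0 + 4) / 3)
```

As a point of comparison, a model that always predicts the mean lap time has an RMSE equal to the target's standard deviation, so about 11.5 s on the training data described in this card. Scores in `cv_results.json` can be read against that baseline.
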
## Training Code

The training code is available on GitHub: [ezylopx5/DATATHON](https://github.com/ezylopx5/DATATHON)

Key files:

- `train.py` - Main training script
- `features.py` - Feature engineering
- `predict.py` - Inference script

## Framework Versions

- Python 3.8+
- scikit-learn
- XGBoost
- pandas
- numpy

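
No versions are pinned for these libraries. Pickled scikit-learn and XGBoost models are generally not guaranteed to load under a different library version, so it can help to record what is installed when a load fails or warns. The sketch below uses the standard library's `importlib.metadata`, available from Python 3.8. The helper name `installed_versions` is hypothetical.

```python
from importlib import metadata

# Distribution names as published on PyPI.
PACKAGES = ["scikit-learn", "xgboost", "pandas", "numpy"]


def installed_versions(packages=PACKAGES):
    """Map each package name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions().items():
    print(f"{name}: {version or 'not installed'}")
```
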
## License

MIT License

## Citation

```bibtex
@misc{gdgc-datathon-2025,
  author = {Haxxsh},
  title = {GDGC Datathon 2025 Lap Time Prediction Models},
  year = {2025},
  publisher = {Hugging Face},
  url = {https://huggingface.co/Haxxsh/gdgc-datathon-models}
}
```