---
license: mit
tags:
- tabular-regression
- sklearn
- xgboost
- random-forest
- motorsport
- lap-time-prediction
datasets:
- Haxxsh/gdgc-datathon-data
language:
- en
pipeline_tag: tabular-regression
---
# GDGC Datathon 2025 - Lap Time Prediction Models
Trained models for predicting Formula racing lap times from the GDGC Datathon 2025 competition.
## Model Description
This repository contains models trained to predict `Lap_Time_Seconds` for Formula racing events. The ensemble combines Random Forest and XGBoost regressors. Hyperparameters were selected via cross-validation, and the per-fold models are included alongside the final models.
### Models Included
| File | Description | Size |
|------|-------------|------|
| `rf_final.pkl` | Final Random Forest model | 158 MB |
| `xgb_final.pkl` | Final XGBoost model | 2.6 MB |
| `rf_cv_models.pkl` | Random Forest CV fold models | 13.4 GB |
| `xgb_cv_models.pkl` | XGBoost CV fold models | 103 MB |
| `rf_model.pkl` | Base Random Forest model | 95 MB |
| `xgb_model.pkl` | Base XGBoost model | 2 MB |
| `feature_engineer.pkl` | Feature preprocessing pipeline | 6 KB |
| `best_params.json` | Optimal hyperparameters | 1 KB |
| `cv_results.json` | Cross-validation results | 1 KB |
## Training Data
The models were trained on the [GDGC Datathon 2025 dataset](https://huggingface.co/datasets/Haxxsh/gdgc-datathon-data):
- **Training samples:** 734,002
- **Target variable:** `Lap_Time_Seconds` (continuous)
- **Target range:** 70.001s - 109.999s
- **Target distribution:** Nearly symmetric (mean ≈ 90s, std ≈ 11.5s)
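The summary statistics above can be recomputed with pandas. This is a minimal sketch. It assumes the training split has already been loaded into a DataFrame containing the `Lap_Time_Seconds` column. The helper name `summarize_target` is hypothetical and does not come from the training code.

```python
import pandas as pd


def summarize_target(df: pd.DataFrame, target: str = "Lap_Time_Seconds") -> dict:
    """Return the sample count, range, mean, std, and skew of the target column."""
    y = df[target]
    return {
        "n_samples": int(len(y)),
        "min": float(y.min()),
        "max": float(y.max()),
        "mean": float(y.mean()),
        "std": float(y.std()),    # pandas uses the sample std (ddof=1)
        "skew": float(y.skew()),  # values near 0 indicate a roughly symmetric distribution
    }


# Tiny hypothetical frame for illustration; replace it with the real training data.
example = pd.DataFrame({"Lap_Time_Seconds": [70.5, 90.0, 109.5]})
print(summarize_target(example))
```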
### Features
The dataset includes features such as:
- Circuit characteristics (length, corners, laps)
- Weather conditions (temperature, humidity, track condition)
- Rider/driver information (championship points, position, history)
- Tire compounds and degradation factors
- Pit stop durations
## Usage
### Loading the Models
```python
import pickle

# Load the final models
with open("rf_final.pkl", "rb") as f:
    rf_model = pickle.load(f)
with open("xgb_final.pkl", "rb") as f:
    xgb_model = pickle.load(f)

# Load the feature engineering pipeline
with open("feature_engineer.pkl", "rb") as f:
    feature_engineer = pickle.load(f)
```
### Making Predictions
```python
import pandas as pd
# Load test data
test_df = pd.read_csv("test.csv")
# Apply feature engineering
X_test = feature_engineer.transform(test_df)
# Predict with ensemble (average of RF and XGB)
rf_preds = rf_model.predict(X_test)
xgb_preds = xgb_model.predict(X_test)
ensemble_preds = (rf_preds + xgb_preds) / 2
```
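The repository also ships per-fold models (`rf_cv_models.pkl`, `xgb_cv_models.pkl`). Their internal structure is not documented here. The sketch below assumes each file unpickles to a list of fitted regressors exposing `.predict`, and averages their predictions. Verify that assumption against the training code before relying on it. Because `rf_cv_models.pkl` is about 13.4 GB, unpickling it will likely need a comparable amount of RAM. The helper names are hypothetical.

```python
import pickle
from typing import Sequence

import numpy as np


def average_fold_predictions(models: Sequence, X) -> np.ndarray:
    """Average the predictions of several fitted regressors (e.g. one per CV fold)."""
    preds = np.column_stack([np.asarray(m.predict(X), dtype=float) for m in models])
    return preds.mean(axis=1)


def load_fold_models(path: str) -> list:
    # ASSUMPTION: the pickle holds a list (or other iterable) of fitted models.
    with open(path, "rb") as f:
        return list(pickle.load(f))


# Usage (assumes X_test comes from feature_engineer.transform, as in the example above):
# xgb_folds = load_fold_models("xgb_cv_models.pkl")
# xgb_cv_preds = average_fold_predictions(xgb_folds, X_test)
```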
### Download from Hugging Face
```python
import pickle

from huggingface_hub import hf_hub_download

# Download a specific model file (cached locally; returns the local file path)
model_path = hf_hub_download(
    repo_id="Haxxsh/gdgc-datathon-models",
    filename="xgb_final.pkl",
)

# Load it
with open(model_path, "rb") as f:
    model = pickle.load(f)
```
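Because `rf_cv_models.pkl` alone is about 13.4 GB, it can be worth fetching only the artifacts you need. The sketch below uses `huggingface_hub.snapshot_download` with `allow_patterns`. The particular file selection and the helper name `download_selected_artifacts` are illustrative choices, not part of the original repository.

```python
from huggingface_hub import snapshot_download

REPO_ID = "Haxxsh/gdgc-datathon-models"

# Example selection: the final models plus preprocessing and metadata,
# skipping the large CV-fold archives (rf_cv_models.pkl is ~13.4 GB).
SELECTED_FILES = [
    "rf_final.pkl",
    "xgb_final.pkl",
    "feature_engineer.pkl",
    "best_params.json",
    "cv_results.json",
]


def download_selected_artifacts(local_dir=None) -> str:
    """Download only SELECTED_FILES and return the local folder that holds them."""
    return snapshot_download(
        repo_id=REPO_ID,
        allow_patterns=SELECTED_FILES,
        local_dir=local_dir,
    )
```

Calling `folder = download_selected_artifacts()` returns a directory. The files can then be opened with `os.path.join(folder, "xgb_final.pkl")` and loaded with `pickle` as shown above.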
## Hyperparameters
Best parameters found via cross-validation (see `best_params.json`):
```json
{
  "random_forest": {
    "n_estimators": 100,
    "max_depth": null,
    "min_samples_split": 2,
    "min_samples_leaf": 1
  },
  "xgboost": {
    "n_estimators": 100,
    "learning_rate": 0.1,
    "max_depth": 6
  }
}
```
## Evaluation
Cross-validation results are stored in `cv_results.json`. Primary metric: **RMSE** (Root Mean Squared Error).
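RMSE is the square root of the mean squared difference between predicted and actual lap times, so it is expressed in seconds. Below is a minimal NumPy sketch for scoring the individual models and their simple average on a held-out set. The arrays are hypothetical.

```python
import numpy as np


def rmse(y_true, y_pred) -> float:
    """Root Mean Squared Error: sqrt(mean((y_true - y_pred) ** 2))."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))


# Hypothetical held-out targets and predictions, in seconds.
y_val = np.array([90.0, 100.0])
rf_val = np.array([92.0, 98.0])
xgb_val = np.array([89.0, 101.0])
ensemble_val = (rf_val + xgb_val) / 2

print(rmse(y_val, rf_val))        # 2.0
print(rmse(y_val, xgb_val))       # 1.0
print(rmse(y_val, ensemble_val))  # 0.5
```

In this toy case the two models err in opposite directions, so averaging partially cancels their errors. Real gains from averaging depend on how correlated the models' errors are.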
## Training Code
The training code is available on GitHub: [ezylopx5/DATATHON](https://github.com/ezylopx5/DATATHON)
Key files:
- `train.py` - Main training script
- `features.py` - Feature engineering
- `predict.py` - Inference script
## Framework Versions
- Python 3.8+
- scikit-learn
- XGBoost
- pandas
- numpy
## License
MIT License
## Citation
```bibtex
@misc{gdgc-datathon-2025,
  author    = {Haxxsh},
  title     = {GDGC Datathon 2025 Lap Time Prediction Models},
  year      = {2025},
  publisher = {Hugging Face},
  url       = {https://huggingface.co/Haxxsh/gdgc-datathon-models}
}
```