---
license: mit
library_name: pytorch
tags:
- robotics
- libero
- vision-language-action
- imitation-learning
- manipulation
datasets:
- gate-institute/GATE-VLAP-datasets
---

# GATE-VLAP: Grounded Action Trajectory Embeddings with Vision-Language Action Planning

*Trained on the LIBERO-10 benchmark*

This model is trained for robotic manipulation tasks using vision-language-action learning with semantic action chunking.

## Model Details

- Architecture: CLIP-RT (CLIP-based Robot Transformer)
- Training Dataset: GATE-VLAP LIBERO-10
- Training Epochs: 90
- Task Type: Long-horizon robotic manipulation
- Input: RGB images (128×128) + language instructions
- Output: 7-DOF actions (xyz, rpy, gripper); see the sketch after this list
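
The card specifies 128×128 RGB frames plus a language instruction as input and a 7-DOF action as output. A minimal sketch of how such an action vector decomposes, assuming the ordering given above (value ranges and normalization are not specified in this card):

```python
import torch

# Decompose a predicted 7-DOF action following the card's stated ordering:
# xyz translation, rpy rotation, gripper command.
# Ranges/normalization are an assumption, not documented here.
action = torch.randn(7)  # placeholder for a model prediction

delta_xyz = action[0:3]  # end-effector translation (x, y, z)
delta_rpy = action[3:6]  # end-effector rotation (roll, pitch, yaw)
gripper = action[6]      # gripper open/close command
```
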

## Training Details

- Dataset: LIBERO-10 (29 subtasks, 1,354 demonstrations)
- Segmentation: semantic action chunking using the Gemini Vision API (illustrated below)
- Framework: PyTorch
- Checkpoint: Epoch 90
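
Purely as an illustration of what semantic action chunking produces (the field names below are hypothetical, not the actual GATE-VLAP schema), a segmented demonstration might be represented as a list of chunk records, each pairing a language sub-goal with a frame range:

```python
# Hypothetical chunk records for one demonstration; structure and field names
# are assumptions for clarity, not the GATE-VLAP dataset format.
demo_chunks = [
    {"instruction": "pick up the bowl", "start_frame": 0, "end_frame": 42},
    {"instruction": "place the bowl on the plate", "start_frame": 43, "end_frame": 97},
]
```
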

## Usage

```python
import torch

# Load the trained checkpoint (use map_location="cpu" if no GPU is available)
checkpoint = torch.load(
    "checkpoints/libero_10_fixed_training_v1/epoch_90.pt",
    map_location="cuda",
)

# Extract the model weights
model_state = checkpoint["model_state_dict"]

# TODO: Add inference code here
```
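
A minimal inference sketch, assuming a model with the interface implied by this card (a 128×128 RGB frame plus an instruction string in, a 7-DOF action out). `CLIPRTPolicy` below is a hypothetical stand-in; substitute the actual CLIP-RT model class from the training code before loading the state dict:

```python
import torch

class CLIPRTPolicy(torch.nn.Module):
    """Hypothetical stand-in for the real CLIP-RT policy class."""
    def forward(self, rgb: torch.Tensor, instruction: str) -> torch.Tensor:
        # The real model encodes the image and instruction and predicts
        # (x, y, z, roll, pitch, yaw, gripper); this stub returns zeros.
        return torch.zeros(rgb.shape[0], 7)

model = CLIPRTPolicy()
# model.load_state_dict(model_state)  # only works with the real architecture
model.eval()

with torch.no_grad():
    rgb = torch.zeros(1, 3, 128, 128)                 # one 128×128 RGB frame
    action = model(rgb, "put the bowl on the plate")  # example instruction
print(action.shape)  # torch.Size([1, 7])
```
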

## Performance

Training run: `libero_10_fixed_training_v1`

Evaluation metrics to be added.

## Dataset

This model was trained on the GATE-VLAP Datasets, which include:

- LIBERO-10: 103,650 frames across 29 subtasks
- Semantic action segmentation
- Vision-language annotations

## Citation

```bibtex
@article{gateVLAP2024,
  title={GATE-VLAP: Grounded Action Trajectory Embeddings with Vision-Language Action Planning},
  author={[Your Name]},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2024}
}
```

## Maintainer

GATE Institute - Advanced AI Research Group, Sofia, Bulgaria

## Links

- 🤗 Dataset: [gate-institute/GATE-VLAP-datasets](https://huggingface.co/datasets/gate-institute/GATE-VLAP-datasets)
- 📄 Paper: coming soon
- 💻 Code: to be added

## License

MIT License