---
license: mit
library_name: pytorch
tags:
- robotics
- libero
- vision-language-action
- imitation-learning
- manipulation
datasets:
- gate-institute/GATE-VLAP-datasets
---

# GATE-VLAP: Grounded Action Trajectory Embeddings with Vision-Language Action Planning

*Trained on the LIBERO-10 benchmark*

This model is trained for robotic manipulation tasks using vision-language-action learning with semantic action chunking.

## Model Details

- Architecture: CLIP-RT (CLIP-based Robot Transformer)
- Training Dataset: GATE-VLAP LIBERO-10
- Training Epochs: 90
- Task Type: Long-horizon robotic manipulation
- Input: RGB images (128×128) + language instructions
- Output: 7-DOF actions (xyz, rpy, gripper); see the sketch after this list
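
The card specifies 128×128 RGB frames plus a language instruction as input and a 7-DOF action as output. A minimal sketch of how such an action vector decomposes, assuming the ordering given above (value ranges and normalization are not specified in this card):

```python
import torch

# Decompose a predicted 7-DOF action following the card's stated ordering:
# xyz translation, rpy rotation, gripper command.
# Ranges/normalization are an assumption, not documented here.
action = torch.randn(7)  # placeholder for a model prediction

delta_xyz = action[0:3]  # end-effector translation (x, y, z)
delta_rpy = action[3:6]  # end-effector rotation (roll, pitch, yaw)
gripper = action[6]      # gripper open/close command
```
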

## Training Details

- Dataset: LIBERO-10 (29 subtasks, 1,354 demonstrations)
- Segmentation: semantic action chunking using the Gemini Vision API (illustrated below)
- Framework: PyTorch
- Checkpoint: Epoch 90
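
Purely as an illustration of what semantic action chunking produces (the field names below are hypothetical, not the actual GATE-VLAP schema), a segmented demonstration might be represented as a list of chunk records, each pairing a language sub-goal with a frame range:

```python
# Hypothetical chunk records for one demonstration; structure and field names
# are assumptions for clarity, not the GATE-VLAP dataset format.
demo_chunks = [
    {"instruction": "pick up the bowl", "start_frame": 0, "end_frame": 42},
    {"instruction": "place the bowl on the plate", "start_frame": 43, "end_frame": 97},
]
```
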

## Usage

```python
import torch

# Load the trained checkpoint (use map_location="cpu" if no GPU is available)
checkpoint = torch.load(
    "checkpoints/libero_10_fixed_training_v1/epoch_90.pt",
    map_location="cuda",
)

# Extract the model weights
model_state = checkpoint["model_state_dict"]

# TODO: Add inference code here
```
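
A minimal inference sketch, assuming a model with the interface implied by this card (a 128×128 RGB frame plus an instruction string in, a 7-DOF action out). `CLIPRTPolicy` below is a hypothetical stand-in; substitute the actual CLIP-RT model class from the training code before loading the state dict:

```python
import torch

class CLIPRTPolicy(torch.nn.Module):
    """Hypothetical stand-in for the real CLIP-RT policy class."""
    def forward(self, rgb: torch.Tensor, instruction: str) -> torch.Tensor:
        # The real model encodes the image and instruction and predicts
        # (x, y, z, roll, pitch, yaw, gripper); this stub returns zeros.
        return torch.zeros(rgb.shape[0], 7)

model = CLIPRTPolicy()
# model.load_state_dict(model_state)  # only works with the real architecture
model.eval()

with torch.no_grad():
    rgb = torch.zeros(1, 3, 128, 128)                 # one 128×128 RGB frame
    action = model(rgb, "put the bowl on the plate")  # example instruction
print(action.shape)  # torch.Size([1, 7])
```
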

## Performance

Training run: `libero_10_fixed_training_v1`

Evaluation metrics to be added.

## Dataset

This model was trained on the GATE-VLAP Datasets, which include:

- LIBERO-10: 103,650 frames across 29 subtasks
- Semantic action segmentation
- Vision-language annotations

## Citation

```bibtex
@article{gateVLAP2024,
  title={GATE-VLAP: Grounded Action Trajectory Embeddings with Vision-Language Action Planning},
  author={[Your Name]},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2024}
}
```

## Maintainer

GATE Institute - Advanced AI Research Group, Sofia, Bulgaria

## Links

- 🤗 Dataset: [gate-institute/GATE-VLAP-datasets](https://huggingface.co/datasets/gate-institute/GATE-VLAP-datasets)
- 📄 Paper: coming soon
- 💻 Code: to be added

## License

MIT License