---
license: mit
library_name: pytorch
tags:
  - robotics
  - libero
  - vision-language-action
  - imitation-learning
  - manipulation
datasets:
  - gate-institute/GATE-VLAP-datasets
---

# GATE-VLAP: Grounded Action Trajectory Embeddings with Vision-Language Action Planning

Trained on the LIBERO-10 benchmark.

This model is trained for long-horizon robotic manipulation using vision-language-action learning with semantic action chunking.

## Model Details

- Architecture: CLIP-RT (CLIP-based Robot Transformer)
- Training Dataset: GATE-VLAP LIBERO-10
- Training Epochs: 90
- Task Type: Long-horizon robotic manipulation
- Input: RGB images (128×128) + language instructions
- Output: 7-DOF actions (xyz, rpy, gripper); see the interface sketch after this list
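
As a rough illustration of this input/output interface, here is a minimal sketch; `predict_action` and the policy call signature are assumptions for illustration, not the repository's actual API.

```python
import torch

# Hypothetical interface sketch: `policy` and `predict_action` are assumed
# names, not the repository's actual API. Shapes follow the Input/Output
# description in the list above.
def predict_action(policy, image: torch.Tensor, instruction: str) -> torch.Tensor:
    """image: (1, 3, 128, 128) RGB tensor in [0, 1]; returns a (1, 7) action."""
    with torch.no_grad():
        action = policy(image, instruction)  # assumed call signature
    # action[:, 0:3] -> xyz translation
    # action[:, 3:6] -> rpy rotation
    # action[:, 6]   -> gripper open/close command
    return action
```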

## Training Details

- Dataset: LIBERO-10 (29 subtasks, 1,354 demonstrations)
- Segmentation: Semantic action chunking using the Gemini Vision API (see the schema sketch after this list)
- Framework: PyTorch
- Checkpoint: Epoch 90
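
To make the semantic chunking concrete, the sketch below shows one plausible way a chunked demonstration could be represented; the field names, task string, and frame ranges are illustrative assumptions, not the dataset's actual schema.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class ActionChunk:
    """One semantic segment of a demonstration (illustrative, not the dataset's actual schema)."""
    instruction: str   # natural-language description of the sub-step
    start_frame: int   # first frame index of the chunk
    end_frame: int     # last frame index of the chunk (inclusive)

@dataclass
class Demonstration:
    task: str                  # LIBERO-10 task name
    chunks: List[ActionChunk]  # ordered semantic sub-steps

# Illustrative example; frame ranges are made up for this sketch.
demo = Demonstration(
    task="put both moka pots on the stove",
    chunks=[
        ActionChunk("pick up the first moka pot", 0, 120),
        ActionChunk("place it on the stove", 121, 240),
    ],
)
```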

## Usage

```python
import torch

# Load the trained checkpoint (use map_location="cpu" if no GPU is available)
checkpoint = torch.load(
    "checkpoints/libero_10_fixed_training_v1/epoch_90.pt",
    map_location="cuda",
)

# Extract the model weights
model_state = checkpoint["model_state_dict"]

# TODO: Add inference code here
```
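
The checkpoint stores weights only, so inference requires the model definition from the training code. Continuing from the snippet above, the sketch below inspects what was loaded; the commented lines show the generic restore pattern, with `CLIPRTPolicy` used as a placeholder name rather than a confirmed import.

```python
# Inspect the loaded checkpoint before wiring up a model definition.
num_params = sum(p.numel() for p in model_state.values())
print(f"{len(model_state)} tensors, {num_params / 1e6:.1f}M parameters")
print("Other checkpoint entries:", [k for k in checkpoint if k != "model_state_dict"])

# Generic restore pattern (CLIPRTPolicy is a placeholder, not a confirmed class):
# from gate_vlap.models import CLIPRTPolicy
# policy = CLIPRTPolicy()
# policy.load_state_dict(model_state)
# policy.eval().to("cuda")
```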

## Performance

Training run: `libero_10_fixed_training_v1`

Metrics to be added after evaluation.

## Dataset

This model was trained on the [GATE-VLAP datasets](https://huggingface.co/datasets/gate-institute/GATE-VLAP-datasets) (see the download sketch after this list), which include:

- LIBERO-10: 103,650 frames across 29 subtasks
- Semantic action segmentation
- Vision-language annotations
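
To fetch the dataset locally, a standard `huggingface_hub` download works; this assumes the dataset repository is publicly accessible and says nothing about its internal file layout.

```python
from huggingface_hub import snapshot_download

# Download the full dataset repository to the local Hugging Face cache.
# Assumes `huggingface_hub` is installed and the dataset repo is accessible.
local_path = snapshot_download(
    repo_id="gate-institute/GATE-VLAP-datasets",
    repo_type="dataset",
)
print(f"Dataset downloaded to: {local_path}")
```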

## Citation

```bibtex
@article{gateVLAP2024,
  title={GATE-VLAP: Grounded Action Trajectory Embeddings with Vision-Language Action Planning},
  author={[Your Name]},
  journal={arXiv preprint arXiv:XXXX.XXXXX},
  year={2024}
}
```

## Maintainer

GATE Institute - Advanced AI Research Group, Sofia, Bulgaria

## Links

- Dataset: [gate-institute/GATE-VLAP-datasets](https://huggingface.co/datasets/gate-institute/GATE-VLAP-datasets)

## License

MIT License