YOLOv11 Warehouse Pallet Detector
A family of fine-tuned YOLOv11 models for real-time detection of pallets (wooden skid + stacked products) in warehouse environments. Available in two YOLOv11 sizes: nano (2.6M params, edge-ready) and small (9.4M params, best accuracy). Optimized for foreground pallet identification in operational warehouse settings with forklifts, racks, and dynamic lighting.
Model Variants
640p Models (Standard Resolution)
| Model | Params | Size (MB) | Resolution | mAP@0.5 | mAP@0.5:0.95 | Precision | Recall | GPU (ms) | CPU (ms) | Best For |
|---|---|---|---|---|---|---|---|---|---|---|
| YOLOv11n-640 | 2.6M | ~5 | 640 | 0.572 | 0.457 | 0.600 | 0.563 | ~5 | ~25 | Edge / real-time on low-power devices |
| YOLOv11s-640 | 9.4M | ~19 | 640 | 0.592 | 0.485 | 0.599 | 0.574 | ~7 | ~45 | Balanced speed/accuracy |
1280p Models (Native High Resolution)
| Model | Params | Size (MB) | Resolution | mAP@0.5 | mAP@0.5:0.95 | Precision | Recall | GPU (ms) | CPU (ms) | Best For |
|---|---|---|---|---|---|---|---|---|---|---|
| YOLOv11n-1280 | 2.6M | ~5 | 1280 | 0.567 | 0.440 | 0.610 | 0.530 | ~18 | ~95 | Edge with high-res cameras |
| YOLOv11s-1280 | 9.4M | ~19 | 1280 | 0.569 | 0.459 | 0.543 | 0.607 | ~25 | ~170 | Balanced, small pallet detection |
All variants share the same training data, augmentation pipeline, and hyperparameters. Medium, large, and extra-large variants were tested but showed no accuracy improvement over small with the current dataset, so only nano and small are published.
Model Description
This model detects complete pallet units (wooden skid base + all products stacked on top) in warehouse imagery. It was trained on real-world warehouse photos captured during normal operations, making it robust to common warehouse conditions: motion blur, variable lighting, partial occlusions by forklifts and personnel, and cluttered backgrounds.
Unlike generic object detection models, this model is specifically trained to:
- Detect foreground pallets that are fully within the frame
- Distinguish pallets from visually similar structures (ceiling rafters, doors, rack uprights)
- Handle pallets stacked 2-high as separate detections per level
- Work reliably with motion-blurred images from moving cameras
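Because the model targets fully-in-frame foreground pallets, a simple downstream filter that drops boxes touching the image border can suppress residual detections of partially visible pallets. A minimal sketch (the function name and margin are illustrative, not part of the model):

```python
def drop_border_boxes(boxes, img_w, img_h, margin=2):
    """Keep only boxes that lie fully inside the frame.

    boxes: list of (x1, y1, x2, y2) pixel coordinates.
    margin: pixels of tolerance from each image edge.
    """
    kept = []
    for x1, y1, x2, y2 in boxes:
        if x1 >= margin and y1 >= margin and x2 <= img_w - margin and y2 <= img_h - margin:
            kept.append((x1, y1, x2, y2))
    return kept

# A box flush against the left edge is discarded; an interior box is kept.
print(drop_border_boxes([(0, 50, 200, 300), (100, 100, 400, 350)], 640, 480))
# → [(100, 100, 400, 350)]
```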
Intended Use
- Warehouse automation: Real-time pallet counting and position tracking
- Forklift guidance: Detecting pallets in the robot/forklift field of view
- Inventory management: Automated pallet inventory from security or mounted cameras
- 3D warehouse mapping: Input to multi-view reconstruction pipelines for spatial pallet localization
Out of Scope
- Empty pallet (wooden skid only) detection without products
- Pallet type classification (EUR, GMA, block, stringer)
- Damaged pallet assessment
- Outdoor or non-warehouse environments
Training Details
Architecture
| Variant | Base Model | Parameters | Input Resolution | Classes | Framework |
|---|---|---|---|---|---|
| Nano | yolo11n.pt | ~2.6M | 640x640 or 1280x1280 | 1 (pallet) | Ultralytics 8.x |
| Small | yolo11s.pt | ~9.4M | 640x640 or 1280x1280 | 1 (pallet) | Ultralytics 8.x |
Each variant is trained at both 640p and 1280p, yielding 4 models total. The 1280p models use the same architecture but train on higher-resolution inputs, improving detection of small/distant pallets at the cost of slower inference.
Dataset
| Split | Ratio | Description |
|---|---|---|
| Train | 80% | Labeled warehouse images |
| Validation | 15% | Held out for epoch-level evaluation |
| Test | 5% | Held out for final evaluation |
Training uses the full available dataset (no image cap). Exact counts depend on the number of labeled images at training time.
Labeling Pipeline: Images were auto-labeled using Qwen3.5-9B (a natively multimodal vision-language model) with structured prompts to identify pallet bounding boxes, followed by human review of preview images. Negative examples (images with no pallets) are included as hard negatives.
Label Format: YOLO format (class_id, x_center, y_center, width, height) normalized 0-1.
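For reference, a pixel-space box can be converted to a normalized YOLO label line as follows (the helper name is illustrative):

```python
def to_yolo_line(class_id, x1, y1, x2, y2, img_w, img_h):
    """Convert a pixel-space (x1, y1, x2, y2) box to a YOLO label line:
    class_id x_center y_center width height, all normalized to 0-1."""
    xc = (x1 + x2) / 2 / img_w
    yc = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

print(to_yolo_line(0, 100, 200, 500, 600, 1000, 1000))
# → 0 0.300000 0.400000 0.400000 0.400000
```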
Training Configuration
| Hyperparameter | Value (640p) | Value (1280p) |
|---|---|---|
| Epochs | 100 (early stopping, patience=20) | 100 (early stopping, patience=20) |
| Batch Size | 16 | 4 (auto-scaled) |
| Image Size | 640 | 1280 |
| Optimizer | SGD (Ultralytics default) | SGD (Ultralytics default) |
| Learning Rate | Auto-scaled | Auto-scaled |
| Augmentation | HSV jitter, horizontal flip, mosaic, scale | Same |
| Device | NVIDIA GPU (CUDA) | NVIDIA GPU (CUDA) |
Data Augmentation Details:
```
hsv_h=0.015, hsv_s=0.7, hsv_v=0.4
degrees=10.0, translate=0.1, scale=0.5
fliplr=0.5, mosaic=1.0
```
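Assuming the Ultralytics Python API (and a hypothetical `pallets.yaml` dataset config — the actual training script is not published), the configuration above maps onto `model.train()` keyword arguments roughly like so:

```python
# Training arguments mirroring the tables above (640p column).
train_args = dict(
    data="pallets.yaml",          # hypothetical dataset config path
    epochs=100, patience=20,      # early stopping
    imgsz=640, batch=16,
    hsv_h=0.015, hsv_s=0.7, hsv_v=0.4,
    degrees=10.0, translate=0.1, scale=0.5,
    fliplr=0.5, mosaic=1.0,
)

# Requires a CUDA GPU and labeled data; uncomment to actually train:
# from ultralytics import YOLO
# YOLO("yolo11n.pt").train(**train_args)
print(train_args["imgsz"], train_args["mosaic"])  # → 640 1.0
```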
Evaluation Results
Test Set Performance (All Variants)
| Variant | mAP@0.5 | mAP@0.5:0.95 | Precision | Recall | GPU (ms) | CPU (ms) |
|---|---|---|---|---|---|---|
| Nano (n) | 0.572 | 0.457 | 0.600 | 0.563 | ~5 | ~25 |
| Small (s) | 0.592 | 0.485 | 0.599 | 0.574 | ~7 | ~45 |
Benchmark Comparison
There is no established standard benchmark for warehouse pallet detection. The table below compares against results reported in published literature on similar (but not identical) datasets, to provide context for this model's performance.
| Model | Dataset | Images | mAP@0.5 | Precision | Recall | Source |
|---|---|---|---|---|---|---|
| This model (YOLOv11n) | EDITools Warehouse | Full dataset | 0.572 | 0.600 | 0.563 | This work |
| This model (YOLOv11s) | EDITools Warehouse | Full dataset | 0.592 | 0.599 | 0.574 | This work |
| NVIDIA SDG Pallet | Omniverse synthetic | ~25,000 | - | - | - | NVIDIA SDG Pallet Model |
| YOLOv8 (synthetic) | Unity synthetic | Synthetic | 0.995 | - | - | Pallet Detection From Synthetic Data (2025) |
| YOLOv8 (synthetic boost) | Synthetic + real | Custom | +69% stacked | - | - | Improving Pallet Detection Using Synthetic Data (2024) |
| YOLOv8 | Custom warehouse | Custom | 0.950 | - | - | Semi-Autonomous Forklift (2025) |
| YOLOv11 | Custom warehouse | Custom | - | 0.93+ | - | Semi-Autonomous Forklift (2025) |
| AM-Mask R-CNN | Complex warehouse | Custom | - | - | - | Enhanced Pallet Detection (2025) |
| Faster R-CNN | Industrial warehouse | 1,344 | 0.89 | - | - | IEEE Comparison (2020) |
| SSD | Industrial warehouse | 1,344 | 0.85 | - | - | IEEE Comparison (2020) |
| YOLOv4 | Industrial warehouse | 1,344 | 0.82 | - | - | IEEE Comparison (2020) |
| YOLOv5 + ArUco | Custom + fiducial | Custom | - | 0.995 | - | Pallet Detection with YOLO + Fiducial (2023) |
| YOLOv8 + CBAM | Warehouse tracking | Custom | - | - | - | CBAM Pallet Tracking (2025) |
| YOLOX | Industrial | Custom | - | - | - | Digital Camera Pallet Detection |
Notes on comparison:
- No standard benchmark exists for warehouse pallet detection (unlike COCO or KITTI for general/driving OD). Each study uses its own private dataset, making direct comparison difficult.
- The NVIDIA SDG model is the most production-ready alternative, trained on ~25K synthetic images via Omniverse, targeting pallet side-face centers/corners. It detects wood, metal, and plastic pallets but focuses on pallet pocket localization for forklift docking rather than full pallet unit detection.
- The synthetic-data model (0.995 mAP) was evaluated on simple single-pallet scenes, not cluttered warehouses.
- This model is the first dedicated pallet detection model published to Hugging Face Hub — no fine-tuned pallet model previously existed in the HF ecosystem.
- This model is specifically optimized for foreground pallet detection in real operational environments.
Performance by Scenario
| Scenario | Qualitative Performance |
|---|---|
| Single pallet, clear view | Excellent |
| Multiple pallets in row | Good - detects individual units |
| Pallet on forklift forks | Good - detects if mostly visible |
| Pallets on racks (background) | Limited - trained for foreground detection |
| Motion blur | Good - trained on real warehouse video frames |
| Low/mixed lighting | Good - augmented with HSV jitter |
Usage
Quick Start
```python
from ultralytics import YOLO

# Choose the variant that fits your deployment:
#   "n" = nano (fastest, edge devices)
#   "s" = small (best accuracy)
model = YOLO("EFFGRP/yolov11s-warehouse-pallets")  # or the "n" variant

# Run inference on an image
results = model.predict("warehouse_photo.jpg", conf=0.25)

# Process results
for result in results:
    for box in result.boxes:
        cls_id = int(box.cls[0])          # always 0 (pallet) for this model
        confidence = float(box.conf[0])
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        print(f"Pallet detected: conf={confidence:.2f}, bbox=({x1:.0f},{y1:.0f},{x2:.0f},{y2:.0f})")
```
Choosing a Variant
```python
from ultralytics import YOLO

# Edge deployment (Jetson, RPi, mobile) — use nano at 640p
model = YOLO("EFFGRP/yolov11n-warehouse-pallets-640")

# General warehouse camera system — use small at 640p as the best tradeoff
model = YOLO("EFFGRP/yolov11s-warehouse-pallets-640")

# High-res camera with small/distant pallets — use small at 1280p
model = YOLO("EFFGRP/yolov11s-warehouse-pallets-1280")
```
Batch Processing
```python
from pathlib import Path

from ultralytics import YOLO

model = YOLO("EFFGRP/yolov11s-warehouse-pallets")

# Process a directory of images
image_dir = Path("warehouse_photos/")
results = model.predict(
    source=str(image_dir),
    conf=0.25,
    save=True,       # Save annotated images
    save_txt=True,   # Save YOLO-format labels
    project="output/",
    name="pallet_detections",
)
```
Export to ONNX for Edge Deployment
```python
from ultralytics import YOLO

# Export nano for edge, or any other variant
model = YOLO("EFFGRP/yolov11n-warehouse-pallets")
model.export(format="onnx", imgsz=640, simplify=True)
# Produces yolov11n-warehouse-pallets.onnx for deployment on edge devices
```
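The exported graph expects a normalized, channels-first float tensor. A preprocessing sketch with NumPy (real deployment code would also letterbox-resize to preserve aspect ratio; the `onnxruntime` lines are commented out since they require the exported file):

```python
import numpy as np

def preprocess(img_hwc_uint8):
    """Sketch: convert a uint8 HWC image (assumed already 640x640) to the
    [0, 1] float32 1x3xHxW layout the exported ONNX graph expects."""
    x = img_hwc_uint8.astype(np.float32) / 255.0   # normalize to 0-1
    x = np.transpose(x, (2, 0, 1))                 # HWC -> CHW
    return x[np.newaxis, ...]                      # add batch dim

frame = np.zeros((640, 640, 3), dtype=np.uint8)
print(preprocess(frame).shape)  # → (1, 3, 640, 640)

# import onnxruntime as ort
# sess = ort.InferenceSession("yolov11n-warehouse-pallets.onnx")
# outputs = sess.run(None, {sess.get_inputs()[0].name: preprocess(frame)})
```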
Integration with Warehouse Systems
This model is designed to work as part of a larger warehouse automation pipeline. Example integration with a Redis message queue:
```python
import json

import redis
from ultralytics import YOLO

model = YOLO("EFFGRP/yolov11s-warehouse-pallets")
r = redis.Redis()

def process_frame(image_path):
    results = model.predict(image_path, conf=0.25, verbose=False)
    detections = []
    for box in results[0].boxes:
        detections.append({
            "confidence": float(box.conf[0]),
            "bbox": box.xyxy[0].tolist(),
            "center": box.xywh[0][:2].tolist(),
        })
    r.lpush("pallet_detections", json.dumps({
        "image": str(image_path),
        "count": len(detections),
        "detections": detections,
    }))
```
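On the consuming side, a worker would pop entries from the same list and aggregate them. The message-parsing step can be sketched without a live Redis server (the blocking `brpop` call is commented out; the helper name is illustrative):

```python
import json

def summarize_message(raw):
    """Parse one queued detection message and return (image, pallet_count)."""
    msg = json.loads(raw)
    return msg["image"], msg["count"]

# r = redis.Redis()
# _, raw = r.brpop("pallet_detections")   # blocks until a message arrives
raw = json.dumps({"image": "cam1/frame_0042.jpg", "count": 3, "detections": []})
print(summarize_message(raw))  # → ('cam1/frame_0042.jpg', 3)
```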
Model Files
Each variant repository contains:
| File | Description |
|---|---|
| `whole_pallet_{size}_{resolution}.pt` | PyTorch model weights |
| `README.md` | This model card |
| `benchmark_results.json` | Structured benchmark comparison data |
Where {size} is one of: n (nano), s (small), and {resolution} is 640 or 1280.
Limitations
- Single class: Only detects "pallet" (complete unit). Does not distinguish pallet types, contents, or conditions.
- Foreground bias: Trained primarily on foreground pallets. Background or distant pallets may be missed.
- Domain specific: Trained on a single warehouse environment. Performance may degrade in visually different warehouses (outdoor yards, cold storage, etc.). Fine-tuning on your own data is recommended.
- Partial occlusion: Pallets heavily occluded by other pallets are intentionally excluded from the training labels; occlusion by people or forklifts is still labeled.
- Dataset size: Performance improves with more training data. Fine-tuning on your own warehouse data is recommended.
Ethical Considerations
This model is intended for warehouse automation and logistics optimization. It does not process personal biometric data. However, warehouse images may incidentally contain workers - this model does not detect or track people, but users should ensure compliance with workplace surveillance regulations when deploying camera systems.
Citation
If you use this model in your research, please cite:
```bibtex
@misc{effgrp-warehouse-pallets-2026,
  title={YOLOv11 Warehouse Pallet Detector},
  author={EFFGRP},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/EFFGRP/yolov11s-warehouse-pallets}
}
```
Landscape: Why This Model Exists
As of March 2026, no dedicated pallet detection model exists on Hugging Face. The HF Hub has ~21 models tagged "logistics" but none are pallet-specific. The closest alternatives are:
- NVIDIA SDG Pallet Model (GitHub) — trained on ~25K synthetic images via Omniverse, focuses on pallet side-face and pocket localization for autonomous forklift docking. Production-ready but targets a different task (pocket detection vs. full pallet unit detection).
- Roboflow Universe (pallets) — community datasets with 1,755+ images and some pre-trained models, but fragmented across projects with inconsistent annotation guidelines.
- Academic models — published in papers but weights/code not publicly shared on model hubs.
This model fills the gap as a ready-to-use, real-world-trained pallet unit detector for the Hugging Face ecosystem.
Related Work
- NVIDIA SDG Pallet Model (2024) - Synthetic data + Omniverse for pallet pocket localization
- NVIDIA Pallet Detection Blog - OpenUSD synthetic data pipeline
- Pallet Detection and Localisation From Synthetic Data (2025) - Unity domain randomization + YOLOv8
- Improving Pallet Detection Using Synthetic Data (2024) - 69% mAP improvement on stacked pallets with synthetic augmentation
- Learning-Based Vision Systems for Semi-Autonomous Forklift (2025) - YOLOv8/v11 comparison for forklift pallet detection
- Enhanced Pallet Detection: AM-Mask R-CNN (2025) - Attention-enhanced Mask R-CNN for complex warehouses
- A Comparison of Deep Learning Models for Pallet Detection (IEEE, 2020) - Benchmark comparing Faster R-CNN, SSD, YOLOv4 on 1,344 images
- Pallet Detection and Distance Estimation with YOLO + Fiducial Markers (2023) - YOLOv5 + ArUco for distance estimation
- Amazon ARMBench (2023) - Large-scale warehouse manipulation benchmark (450K+ labels)
- CBAM-Enhanced Pallet Tracking (2025) - YOLOv8 + attention + DeepSORT for tracking
- Roboflow Pallet Datasets - Community-contributed pallet detection datasets (1,755+ images)
- EmaroLab PDT - Pallet Detection and Tracking with Faster R-CNN + 2D LRF