YOLOv11 Warehouse Pallet Detector
A family of fine-tuned YOLOv11 models for real-time detection of pallets (wooden skid + stacked products) in warehouse environments. Available in two YOLOv11 sizes: nano (2.6M params, edge-ready) and small (9.4M params, best accuracy). Optimized for foreground pallet identification in operational warehouse settings with forklifts, racks, and dynamic lighting.
Model Variants
640p Models (Standard Resolution)
| Model | Params | Size (MB) | Resolution | mAP@0.5 | mAP@0.5:0.95 | Precision | Recall | GPU (ms) | CPU (ms) | Best For |
|---|---|---|---|---|---|---|---|---|---|---|
| YOLOv11n-640 | 2.6M | ~5 | 640 | 0.572 | 0.457 | 0.600 | 0.563 | ~5 | ~25 | Edge / real-time on low-power devices |
| YOLOv11s-640 | 9.4M | ~19 | 640 | 0.592 | 0.485 | 0.599 | 0.574 | ~7 | ~45 | Balanced speed/accuracy |
1280p Models (Native High Resolution)
| Model | Params | Size (MB) | Resolution | mAP@0.5 | mAP@0.5:0.95 | Precision | Recall | GPU (ms) | CPU (ms) | Best For |
|---|---|---|---|---|---|---|---|---|---|---|
| YOLOv11n-1280 | 2.6M | ~5 | 1280 | 0.567 | 0.440 | 0.610 | 0.530 | ~18 | ~95 | Edge with high-res cameras |
| YOLOv11s-1280 | 9.4M | ~19 | 1280 | 0.569 | 0.459 | 0.543 | 0.607 | ~25 | ~170 | Balanced, small pallet detection |
All variants share the same training data, augmentation pipeline, and hyperparameters. Medium, large, and extra-large variants were tested but showed no accuracy improvement over small with the current dataset, so only nano and small are published.
Model Description
This model detects complete pallet units (wooden skid base + all products stacked on top) in warehouse imagery. It was trained on real-world warehouse photos captured during normal operations, making it robust to common warehouse conditions: motion blur, variable lighting, partial occlusions by forklifts and personnel, and cluttered backgrounds.
Unlike generic object detection models, this model is specifically trained to:
- Detect foreground pallets that are fully within the frame
- Distinguish pallets from visually similar structures (ceiling rafters, doors, rack uprights)
- Handle pallets stacked 2-high as separate detections per level
- Work reliably with motion-blurred images from moving cameras
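Because the model targets fully-in-frame foreground pallets, a simple downstream filter that drops boxes touching the image border can suppress residual detections of partially visible pallets. A minimal sketch (the function name and margin are illustrative, not part of the model):

```python
def drop_border_boxes(boxes, img_w, img_h, margin=2):
    """Keep only boxes that lie fully inside the frame.

    boxes: list of (x1, y1, x2, y2) pixel coordinates.
    margin: pixels of tolerance from each image edge.
    """
    kept = []
    for x1, y1, x2, y2 in boxes:
        if x1 >= margin and y1 >= margin and x2 <= img_w - margin and y2 <= img_h - margin:
            kept.append((x1, y1, x2, y2))
    return kept

# A box flush against the left edge is discarded; an interior box is kept.
print(drop_border_boxes([(0, 50, 200, 300), (100, 100, 400, 350)], 640, 480))
# → [(100, 100, 400, 350)]
```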
Intended Use
- Warehouse automation: Real-time pallet counting and position tracking
- Forklift guidance: Detecting pallets in the robot/forklift field of view
- Inventory management: Automated pallet inventory from security or mounted cameras
- 3D warehouse mapping: Input to multi-view reconstruction pipelines for spatial pallet localization
Out of Scope
- Empty pallet (wooden skid only) detection without products
- Pallet type classification (EUR, GMA, block, stringer)
- Damaged pallet assessment
- Outdoor or non-warehouse environments
Training Details
Architecture
| Variant | Base Model | Parameters | Input Resolution | Classes | Framework |
|---|---|---|---|---|---|
| Nano | yolo11n.pt | ~2.6M | 640x640 or 1280x1280 | 1 (pallet) | Ultralytics 8.x |
| Small | yolo11s.pt | ~9.4M | 640x640 or 1280x1280 | 1 (pallet) | Ultralytics 8.x |
Each variant is trained at both 640p and 1280p, yielding 4 models total. The 1280p models use the same architecture but train on higher-resolution inputs, improving detection of small/distant pallets at the cost of slower inference.
Dataset
| Split | Ratio | Description |
|---|---|---|
| Train | 80% | Labeled warehouse images |
| Validation | 15% | Held out for epoch-level evaluation |
| Test | 5% | Held out for final evaluation |
Training uses the full available dataset (no image cap). Exact counts depend on the number of labeled images at training time.
Labeling Pipeline: Images were auto-labeled using Qwen3.5-9B (a natively multimodal vision-language model) with structured prompts to identify pallet bounding boxes, followed by human review of preview images. Negative examples (images with no pallets) are included as hard negatives.
Label Format: YOLO format (class_id, x_center, y_center, width, height) normalized 0-1.
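For reference, a pixel-space box can be converted to a normalized YOLO label line as follows (the helper name is illustrative):

```python
def to_yolo_line(class_id, x1, y1, x2, y2, img_w, img_h):
    """Convert a pixel-space (x1, y1, x2, y2) box to a YOLO label line:
    class_id x_center y_center width height, all normalized to 0-1."""
    xc = (x1 + x2) / 2 / img_w
    yc = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{class_id} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

print(to_yolo_line(0, 100, 200, 500, 600, 1000, 1000))
# → 0 0.300000 0.400000 0.400000 0.400000
```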
Training Configuration
| Hyperparameter | Value (640p) | Value (1280p) |
|---|---|---|
| Epochs | 100 (early stopping, patience=20) | 100 (early stopping, patience=20) |
| Batch Size | 16 | 4 (auto-scaled) |
| Image Size | 640 | 1280 |
| Optimizer | SGD (Ultralytics default) | SGD (Ultralytics default) |
| Learning Rate | Auto-scaled | Auto-scaled |
| Augmentation | HSV jitter, horizontal flip, mosaic, scale | Same |
| Device | NVIDIA GPU (CUDA) | NVIDIA GPU (CUDA) |
Data Augmentation Details:
```
hsv_h=0.015, hsv_s=0.7, hsv_v=0.4
degrees=10.0, translate=0.1, scale=0.5
fliplr=0.5, mosaic=1.0
```
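Assuming the Ultralytics Python API (and a hypothetical `pallets.yaml` dataset config — the actual training script is not published), the configuration above maps onto `model.train()` keyword arguments roughly like so:

```python
# Training arguments mirroring the tables above (640p column).
train_args = dict(
    data="pallets.yaml",          # hypothetical dataset config path
    epochs=100, patience=20,      # early stopping
    imgsz=640, batch=16,
    hsv_h=0.015, hsv_s=0.7, hsv_v=0.4,
    degrees=10.0, translate=0.1, scale=0.5,
    fliplr=0.5, mosaic=1.0,
)

# Requires a CUDA GPU and labeled data; uncomment to actually train:
# from ultralytics import YOLO
# YOLO("yolo11n.pt").train(**train_args)
print(train_args["imgsz"], train_args["mosaic"])  # → 640 1.0
```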
Evaluation Results
Test Set Performance (All Variants)
| Variant | mAP@0.5 | mAP@0.5:0.95 | Precision | Recall | GPU (ms) | CPU (ms) |
|---|---|---|---|---|---|---|
| Nano (n) | 0.572 | 0.457 | 0.600 | 0.563 | ~5 | ~25 |
| Small (s) | 0.592 | 0.485 | 0.599 | 0.574 | ~7 | ~45 |
Benchmark Comparison
There is no established standard benchmark for warehouse pallet detection. The table below compares against results reported in published literature on similar (but not identical) datasets, to provide context for this model's performance.
| Model | Dataset | Images | mAP@0.5 | Precision | Recall | Source |
|---|---|---|---|---|---|---|
| This model (YOLOv11n) | EDITools Warehouse | Full dataset | 0.572 | 0.600 | 0.563 | This work |
| This model (YOLOv11s) | EDITools Warehouse | Full dataset | 0.592 | 0.599 | 0.574 | This work |
| NVIDIA SDG Pallet | Omniverse synthetic | ~25,000 | - | - | - | NVIDIA SDG Pallet Model |
| YOLOv8 (synthetic) | Unity synthetic | Synthetic | 0.995 | - | - | Pallet Detection From Synthetic Data (2025) |
| YOLOv8 (synthetic boost) | Synthetic + real | Custom | +69% stacked | - | - | Improving Pallet Detection Using Synthetic Data (2024) |
| YOLOv8 | Custom warehouse | Custom | 0.950 | - | - | Semi-Autonomous Forklift (2025) |
| YOLOv11 | Custom warehouse | Custom | - | 0.93+ | - | Semi-Autonomous Forklift (2025) |
| AM-Mask R-CNN | Complex warehouse | Custom | - | - | - | Enhanced Pallet Detection (2025) |
| Faster R-CNN | Industrial warehouse | 1,344 | 0.89 | - | - | IEEE Comparison (2020) |
| SSD | Industrial warehouse | 1,344 | 0.85 | - | - | IEEE Comparison (2020) |
| YOLOv4 | Industrial warehouse | 1,344 | 0.82 | - | - | IEEE Comparison (2020) |
| YOLOv5 + ArUco | Custom + fiducial | Custom | - | 0.995 | - | Pallet Detection with YOLO + Fiducial (2023) |
| YOLOv8 + CBAM | Warehouse tracking | Custom | - | - | - | CBAM Pallet Tracking (2025) |
| YOLOX | Industrial | Custom | - | - | - | Digital Camera Pallet Detection |
Notes on comparison:
- No standard benchmark exists for warehouse pallet detection (unlike COCO or KITTI for general/driving OD). Each study uses its own private dataset, making direct comparison difficult.
- The NVIDIA SDG model is the most production-ready alternative, trained on ~25K synthetic images via Omniverse, targeting pallet side-face centers/corners. It detects wood, metal, and plastic pallets but focuses on pallet pocket localization for forklift docking rather than full pallet unit detection.
- The synthetic-data model (0.995 mAP) was evaluated on simple single-pallet scenes, not cluttered warehouses.
- This model is the first dedicated pallet detection model published to Hugging Face Hub — no fine-tuned pallet model previously existed in the HF ecosystem.
- This model is specifically optimized for foreground pallet detection in real operational environments.
Performance by Scenario
| Scenario | Qualitative Performance |
|---|---|
| Single pallet, clear view | Excellent |
| Multiple pallets in row | Good - detects individual units |
| Pallet on forklift forks | Good - detects if mostly visible |
| Pallets on racks (background) | Limited - trained for foreground detection |
| Motion blur | Good - trained on real warehouse video frames |
| Low/mixed lighting | Good - augmented with HSV jitter |
Usage
Quick Start
```python
from ultralytics import YOLO

# Choose the variant that fits your deployment:
#   "n" = nano (fastest, edge devices)
#   "s" = small (best accuracy)
model = YOLO("EFFGRP/yolov11s-warehouse-pallets")  # or the "n" variant

# Run inference on an image
results = model.predict("warehouse_photo.jpg", conf=0.25)

# Process results
for result in results:
    for box in result.boxes:
        cls_id = int(box.cls[0])          # always 0 (pallet) for this model
        confidence = float(box.conf[0])
        x1, y1, x2, y2 = box.xyxy[0].tolist()
        print(f"Pallet detected: conf={confidence:.2f}, bbox=({x1:.0f},{y1:.0f},{x2:.0f},{y2:.0f})")
```
Choosing a Variant
```python
from ultralytics import YOLO

# Edge deployment (Jetson, RPi, mobile) — use nano at 640p
model = YOLO("EFFGRP/yolov11n-warehouse-pallets-640")

# General warehouse camera system — use small at 640p as the best tradeoff
model = YOLO("EFFGRP/yolov11s-warehouse-pallets-640")

# High-res camera with small/distant pallets — use small at 1280p
model = YOLO("EFFGRP/yolov11s-warehouse-pallets-1280")
```
Batch Processing
```python
from pathlib import Path

from ultralytics import YOLO

model = YOLO("EFFGRP/yolov11s-warehouse-pallets")

# Process a directory of images
image_dir = Path("warehouse_photos/")
results = model.predict(
    source=str(image_dir),
    conf=0.25,
    save=True,       # Save annotated images
    save_txt=True,   # Save YOLO-format labels
    project="output/",
    name="pallet_detections",
)
```
Export to ONNX for Edge Deployment
```python
from ultralytics import YOLO

# Export nano for edge, or any other variant
model = YOLO("EFFGRP/yolov11n-warehouse-pallets")
model.export(format="onnx", imgsz=640, simplify=True)
# Produces yolov11n-warehouse-pallets.onnx for deployment on edge devices
```
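The exported graph expects a normalized, channels-first float tensor. A preprocessing sketch with NumPy (real deployment code would also letterbox-resize to preserve aspect ratio; the `onnxruntime` lines are commented out since they require the exported file):

```python
import numpy as np

def preprocess(img_hwc_uint8):
    """Sketch: convert a uint8 HWC image (assumed already 640x640) to the
    [0, 1] float32 1x3xHxW layout the exported ONNX graph expects."""
    x = img_hwc_uint8.astype(np.float32) / 255.0   # normalize to 0-1
    x = np.transpose(x, (2, 0, 1))                 # HWC -> CHW
    return x[np.newaxis, ...]                      # add batch dim

frame = np.zeros((640, 640, 3), dtype=np.uint8)
print(preprocess(frame).shape)  # → (1, 3, 640, 640)

# import onnxruntime as ort
# sess = ort.InferenceSession("yolov11n-warehouse-pallets.onnx")
# outputs = sess.run(None, {sess.get_inputs()[0].name: preprocess(frame)})
```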
Integration with Warehouse Systems
This model is designed to work as part of a larger warehouse automation pipeline. Example integration with a Redis message queue:
```python
import json

import redis
from ultralytics import YOLO

model = YOLO("EFFGRP/yolov11s-warehouse-pallets")
r = redis.Redis()

def process_frame(image_path):
    results = model.predict(image_path, conf=0.25, verbose=False)
    detections = []
    for box in results[0].boxes:
        detections.append({
            "confidence": float(box.conf[0]),
            "bbox": box.xyxy[0].tolist(),
            "center": box.xywh[0][:2].tolist(),
        })
    r.lpush("pallet_detections", json.dumps({
        "image": str(image_path),
        "count": len(detections),
        "detections": detections,
    }))
```
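On the consuming side, a worker would pop entries from the same list and aggregate them. The message-parsing step can be sketched without a live Redis server (the blocking `brpop` call is commented out; the helper name is illustrative):

```python
import json

def summarize_message(raw):
    """Parse one queued detection message and return (image, pallet_count)."""
    msg = json.loads(raw)
    return msg["image"], msg["count"]

# r = redis.Redis()
# _, raw = r.brpop("pallet_detections")   # blocks until a message arrives
raw = json.dumps({"image": "cam1/frame_0042.jpg", "count": 3, "detections": []})
print(summarize_message(raw))  # → ('cam1/frame_0042.jpg', 3)
```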
Model Files
Each variant repository contains:
| File | Description |
|---|---|
| `whole_pallet_{size}_{resolution}.pt` | PyTorch model weights |
| `README.md` | This model card |
| `benchmark_results.json` | Structured benchmark comparison data |
Where {size} is one of: n (nano), s (small), and {resolution} is 640 or 1280.
Limitations
- Single class: Only detects "pallet" (complete unit). Does not distinguish pallet types, contents, or conditions.
- Foreground bias: Trained primarily on foreground pallets. Background or distant pallets may be missed.
- Domain specific: Trained on a single warehouse environment. Performance may degrade in visually different warehouses (outdoor yards, cold storage, etc.). Fine-tuning on your own data is recommended.
- Partial occlusion: Pallets heavily occluded by other pallets are intentionally excluded from the training labels; occlusion by people or forklifts is still labeled.
- Dataset size: Performance improves with more training data. Fine-tuning on your own warehouse data is recommended.
Ethical Considerations
This model is intended for warehouse automation and logistics optimization. It does not process personal biometric data. However, warehouse images may incidentally contain workers - this model does not detect or track people, but users should ensure compliance with workplace surveillance regulations when deploying camera systems.
Citation
If you use this model in your research, please cite:
```bibtex
@misc{effgrp-warehouse-pallets-2026,
  title={YOLOv11 Warehouse Pallet Detector},
  author={EFFGRP},
  year={2026},
  publisher={Hugging Face},
  url={https://huggingface.co/EFFGRP/yolov11s-warehouse-pallets}
}
```
Landscape: Why This Model Exists
As of March 2026, no dedicated pallet detection model exists on Hugging Face. The HF Hub has ~21 models tagged "logistics" but none are pallet-specific. The closest alternatives are:
- NVIDIA SDG Pallet Model (GitHub) — trained on ~25K synthetic images via Omniverse, focuses on pallet side-face and pocket localization for autonomous forklift docking. Production-ready but targets a different task (pocket detection vs. full pallet unit detection).
- Roboflow Universe (pallets) — community datasets with 1,755+ images and some pre-trained models, but fragmented across projects with inconsistent annotation guidelines.
- Academic models — published in papers but weights/code not publicly shared on model hubs.
This model fills the gap as a ready-to-use, real-world-trained pallet unit detector for the Hugging Face ecosystem.
Related Work
- NVIDIA SDG Pallet Model (2024) - Synthetic data + Omniverse for pallet pocket localization
- NVIDIA Pallet Detection Blog - OpenUSD synthetic data pipeline
- Pallet Detection and Localisation From Synthetic Data (2025) - Unity domain randomization + YOLOv8
- Improving Pallet Detection Using Synthetic Data (2024) - 69% mAP improvement on stacked pallets with synthetic augmentation
- Learning-Based Vision Systems for Semi-Autonomous Forklift (2025) - YOLOv8/v11 comparison for forklift pallet detection
- Enhanced Pallet Detection: AM-Mask R-CNN (2025) - Attention-enhanced Mask R-CNN for complex warehouses
- A Comparison of Deep Learning Models for Pallet Detection (IEEE, 2020) - Benchmark comparing Faster R-CNN, SSD, YOLOv4 on 1,344 images
- Pallet Detection and Distance Estimation with YOLO + Fiducial Markers (2023) - YOLOv5 + ArUco for distance estimation
- Amazon ARMBench (2023) - Large-scale warehouse manipulation benchmark (450K+ labels)
- CBAM-Enhanced Pallet Tracking (2025) - YOLOv8 + attention + DeepSORT for tracking
- Roboflow Pallet Datasets - Community-contributed pallet detection datasets (1,755+ images)
- EmaroLab PDT - Pallet Detection and Tracking with Faster R-CNN + 2D LRF