# DINOv3 ViT-H/16+ Booru Tagger
A multi-label image tagger trained on e621 and Danbooru annotations, using a DINOv3 ViT-H/16+ backbone fine-tuned end-to-end with a single linear projection head.
## Model Details
| Property | Value |
|---|---|
| Backbone | facebook/dinov3-vith16plus-pretrain-lvd1689m |
| Architecture | ViT-H/16+ · 32 layers · hidden dim 1280 · 20 heads · SwiGLU MLP · RoPE · 4 register tokens |
| Head | Linear((1 + 4) × 1280 → 74 625) — CLS + 4 register tokens concatenated |
| Vocabulary | 74 625 tags (min frequency ≥ 50 across training set) |
| Input resolution | Any multiple of 16 px — trained at 512 px, generalises to higher resolutions |
| Input normalisation | ImageNet mean/std [0.485, 0.456, 0.406] / [0.229, 0.224, 0.225] |
| Output | Raw logits — apply sigmoid for per-tag probabilities |
| Parameters | ~632 M (backbone) + ~480 M (head) |
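The head row above describes a single linear projection over the concatenated CLS and register tokens, followed by a sigmoid (see the Output row). A minimal sketch of that step, using random weights and a demo-sized output so it stays light (the real head maps 6400 → 74 625):

```python
import torch

HIDDEN = 1280      # ViT-H/16+ hidden dim
REGISTERS = 4      # register tokens concatenated with CLS
VOCAB = 74_625     # full tag vocabulary size (not instantiated here)

# Demo-sized head: same input width as the real head, but only a few
# output tags so the sketch stays light (the real head is 6400 -> 74 625).
demo_vocab = 8
head = torch.nn.Linear((1 + REGISTERS) * HIDDEN, demo_vocab)

# Dummy backbone output for one image: CLS token + 4 register tokens.
cls_and_registers = torch.randn(1, 1 + REGISTERS, HIDDEN)
features = cls_and_registers.flatten(1)   # (1, 6400): tokens concatenated

logits = head(features)                   # raw logits, per the Output row
probs = torch.sigmoid(logits)             # per-tag probabilities in [0, 1]
```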
## Training
| Hyperparameter | Value |
|---|---|
| Training data | e621 + Danbooru (parquet) |
| Batch size | 32 |
| Learning rate | 1e-6 |
| Warmup steps | 50 |
| Loss | BCEWithLogitsLoss with per-tag pos_weight = (neg/pos)^(1/T), cap 100 |
| Optimiser | AdamW (β₁=0.9, β₂=0.999, wd=0.01) |
| Precision | bfloat16 (backbone) / float32 (projection + loss) |
| Hardware | 2× GPU, ThreadPoolExecutor + NCCL all-reduce |
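The per-tag `pos_weight` from the Loss row can be sketched as below. The temperature `T` is a hyperparameter whose value the table does not state, so `T=2.0` here is an assumption; the cap of 100 is from the table:

```python
import torch

def pos_weights(pos_counts, total, T=2.0, cap=100.0):
    """Per-tag pos_weight = (neg/pos)^(1/T), capped at 100.

    pos_counts: positive examples per tag; total: training-set size.
    T=2.0 is an assumed value -- the card does not state it.
    """
    pos = pos_counts.clamp(min=1)           # avoid division by zero
    neg = (total - pos).clamp(min=1)
    return ((neg / pos) ** (1.0 / T)).clamp(max=cap)

# Invented counts: a common tag, a mid-frequency tag, a very rare tag.
pos = torch.tensor([10_000.0, 50.0, 1.0])
w = pos_weights(pos, total=1_000_000)       # rare tags hit the cap of 100

loss_fn = torch.nn.BCEWithLogitsLoss(pos_weight=w)
```

Rare tags get large upweighting so the loss is not dominated by the overwhelming number of negatives per tag; the cap keeps extremely rare tags from destabilising training.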
## Usage
### 1. Install dependencies

```bash
pip install -r requirements.txt
```

Or manually:

```bash
pip install torch torchvision safetensors Pillow requests \
  python-multipart fastapi uvicorn jinja2 aiofiles
```
### 2. Download model files

```bash
huggingface-cli download lodestones/taggerine \
  tagger_proto.safetensors \
  tagger_vocab_with_categories_and_alias_updated.json \
  tagger_ui_server.py \
  inference_tagger_standalone.py \
  --local-dir .
```

**Note:** `tagger_proto.safetensors` is ~5.3 GB. Make sure you have enough disk space.
### 3. Download the tagger_ui/ templates folder

The server requires the `tagger_ui/templates/` directory to be present alongside `tagger_ui_server.py`:

```bash
huggingface-cli download lodestones/taggerine \
  --include "tagger_ui/**" \
  --local-dir .
```
### 4. Run the Web UI

```bash
python tagger_ui_server.py \
  --checkpoint tagger_proto.safetensors \
  --vocab tagger_vocab_with_categories_and_alias_updated.json \
  --port 7860
# → open http://localhost:7860
```

CPU-only machine? Add `--device cpu` (inference will be slower):

```bash
python tagger_ui_server.py \
  --checkpoint tagger_proto.safetensors \
  --vocab tagger_vocab_with_categories_and_alias_updated.json \
  --device cpu \
  --port 7860
```
### Standalone CLI inference (no server)

```bash
python inference_tagger_standalone.py \
  --checkpoint tagger_proto.safetensors \
  --vocab tagger_vocab_with_categories_and_alias_updated.json \
  --images photo.jpg \
  --topk 30
```
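Under the hood, the script presumably preprocesses each image per the Model Details table: resize to the 512 px training resolution (any multiple of 16 works) and apply ImageNet normalisation. A torch-only sketch of that step; the function name and bilinear resize are assumptions, not the script's actual code:

```python
import torch
import torch.nn.functional as F

# ImageNet mean/std from the Model Details table, shaped for broadcasting.
MEAN = torch.tensor([0.485, 0.456, 0.406]).view(3, 1, 1)
STD = torch.tensor([0.229, 0.224, 0.225]).view(3, 1, 1)

def preprocess(img, size=512):
    # Resize to the 512 px training resolution (any multiple of 16 is
    # valid per the table) and apply ImageNet normalisation.
    # `img` is a float tensor in [0, 1] with shape (3, H, W).
    img = F.interpolate(img.unsqueeze(0), size=(size, size),
                        mode="bilinear", align_corners=False)
    return (img - MEAN) / STD             # (1, 3, size, size)

x = preprocess(torch.rand(3, 480, 640))   # stand-in for a decoded image
```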
## Files

| File | Description |
|---|---|
| `tagger_proto.safetensors` | Model weights (bfloat16) |
| `tagger_vocab_with_categories_and_alias_updated.json` | `{"idx2tag": [...], "tag2category": {...}}` — 74 625 tags with category metadata |
| `tagger_vocab_with_categories.json` | Same, without alias data |
| `tagger_vocab.json` | Minimal vocab — `{"idx2tag": [...]}` only |
| `inference_tagger_standalone.py` | Self-contained CLI inference script (no `transformers` dependency) |
| `tagger_ui_server.py` | FastAPI + Jinja2 web UI server |
| `requirements.txt` | Python dependencies |
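The vocab files follow the `{"idx2tag": [...], "tag2category": {...}}` shape described above. A sketch of mapping model probabilities back to tag names; the tiny inline JSON and the `top_tags` helper are illustrative stand-ins, not part of the released files:

```python
import json

# Minimal stand-in matching the documented vocab shape; the real file
# holds 74 625 tags plus category metadata.
vocab = json.loads(
    '{"idx2tag": ["solo", "1girl", "sketch"],'
    ' "tag2category": {"solo": "general", "1girl": "general", "sketch": "meta"}}'
)

def top_tags(probs, idx2tag, threshold=0.5):
    # Pair each probability with its tag name, keep those above threshold,
    # and return them highest-probability first.
    pairs = [(idx2tag[i], p) for i, p in enumerate(probs) if p >= threshold]
    return sorted(pairs, key=lambda pair: -pair[1])

print(top_tags([0.91, 0.12, 0.73], vocab["idx2tag"]))
# -> [('solo', 0.91), ('sketch', 0.73)]
```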
## Tag Vocabulary

Tags are sourced from e621 and Danbooru annotations and cover:

- Subject — species, character count, gender (`solo`, `duo`, `anthro`, `1girl`, `male`, …)
- Body — anatomy, fur/scale/skin markings, body parts
- Action / pose — `looking at viewer`, `sitting`, …
- Scene — background, lighting, setting
- Style — `digital art`, `hi res`, `sketch`, `watercolor`, …
- Rating — explicit content tags are included; filter as needed for your use case

Minimum tag frequency threshold: 50 occurrences across the combined dataset.
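The frequency cutoff above amounts to a simple filter over tag counts; a sketch with invented counts (not real dataset statistics):

```python
from collections import Counter

MIN_FREQ = 50  # threshold stated above

# Invented example counts -- not real dataset statistics.
counts = Counter({"solo": 120_000, "sketch": 800, "some_rare_tag": 12})

# Keep only tags seen at least MIN_FREQ times across the combined dataset.
kept = sorted(tag for tag, n in counts.items() if n >= MIN_FREQ)
print(kept)  # -> ['sketch', 'solo']
```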
## Limitations

- Trained and evaluated on booru-style illustrations and furry art; performance on photographic images or other art styles is untested and likely degrades.
- The vocabulary reflects the biases of e621 and Danbooru annotation practices.
## License

Apache 2.0
