Shot Scale

This model predicts an image's cinematic shot scale, one of [extreme_close_up, close_up, medium, full, wide]. It uses a DINOv2-with-registers backbone, initialized from the facebook/dinov2-with-registers-large weights, and was trained on a diverse set of five thousand human-annotated images.

How to use:


import torch
from PIL import Image
from transformers import AutoImageProcessor
from transformers import AutoModelForImageClassification

image_processor = AutoImageProcessor.from_pretrained("facebook/dinov2-with-registers-large")
model = AutoModelForImageClassification.from_pretrained('aslakey/shot_scale')
model.eval()

# example medium shot image
# Model labels: [extreme_close_up, close_up, medium, full, wide]
# convert to RGB so grayscale or RGBA files are handled consistently
image = Image.open('medium.jpg').convert('RGB')
inputs = image_processor(image, return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs)

# the model was trained with a multi-label objective, but argmax still gives a single top prediction
predicted_label = outputs.logits.argmax(-1).item()
print(model.config.id2label[predicted_label])
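
Since the comment above notes that training was multi-label, it can be useful to look at independent per-label scores rather than only the argmax. The following is a minimal sketch, assuming each logit is meant to be passed through a sigmoid on its own. The helper `multilabel_scores`, the `LABELS` list (copied from the label order stated in this card), and the 0.5 threshold are illustrative assumptions and not part of the model's API; `model.config.id2label` remains the authoritative mapping.

```python
import math

# Label order as listed in this model card (assumption: it matches
# model.config.id2label; check that mapping before relying on it).
LABELS = ["extreme_close_up", "close_up", "medium", "full", "wide"]


def multilabel_scores(logits, labels=LABELS, threshold=0.5):
    """Turn raw logits into independent per-label sigmoid scores.

    Returns (scores, selected): scores maps each label to a probability,
    selected lists the labels whose score meets the threshold.
    """
    if len(logits) != len(labels):
        raise ValueError("expected one logit per label")
    scores = {
        label: 1.0 / (1.0 + math.exp(-z))
        for label, z in zip(labels, logits)
    }
    selected = [label for label, p in scores.items() if p >= threshold]
    return scores, selected


# Hypothetical logits for illustration only; with the model loaded above,
# pass outputs.logits[0].tolist() instead.
scores, selected = multilabel_scores([-3.0, -1.0, 2.0, 0.5, -2.0])
print(selected)
```

More than one label can pass the threshold, which fits the overlap between full and wide shots described under Performance. The threshold is a tunable choice, not a value taken from the model's training setup.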

Performance:

Because extreme close-ups (ECU) are heavily underrepresented in the dataset, performance on that category is weak, with recall at only 32%. The next version will oversample ECU images. Wide and full shots also overlap considerably: in practice, a full shot is often a wide shot with a human subject.

| Category | Precision | Recall |
|---|---|---|
| ECU (low coverage) | 75% | 32% |
| CU | 66% | 51% |
| M | 88% | 90% |
| F | 69% | 68% |
| W | 89% | 83% |
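
For readers who want to compute comparable numbers on their own labeled images, here is a minimal sketch of per-class precision and recall. It assumes single-label evaluation, with one true label and one argmax prediction per image; the card does not state how the figures above were computed. The function name `per_class_precision_recall` and the toy data are hypothetical.

```python
from collections import Counter


def per_class_precision_recall(y_true, y_pred, labels):
    """Return {label: (precision, recall)} for single-label predictions."""
    tp, fp, fn = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            tp[truth] += 1
        else:
            fp[pred] += 1   # predicted this class, but it was wrong
            fn[truth] += 1  # missed an instance of the true class
    report = {}
    for label in labels:
        predicted = tp[label] + fp[label]
        actual = tp[label] + fn[label]
        precision = tp[label] / predicted if predicted else 0.0
        recall = tp[label] / actual if actual else 0.0
        report[label] = (precision, recall)
    return report


# Toy data for illustration only, not the model's evaluation set.
y_true = ["M", "M", "ECU", "ECU", "W"]
y_pred = ["M", "M", "ECU", "CU", "W"]
report = per_class_precision_recall(y_true, y_pred, ["ECU", "CU", "M", "F", "W"])
print(report["ECU"])
```

In the toy data, one of the two ECU images is predicted as CU, so ECU keeps perfect precision while its recall drops to one half. This is the same pattern as the ECU row in the table: high precision with low recall.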