Expected Processor Behavior
#9, opened by fhenkel
I have a question regarding the expected output behavior of the processor for ShieldGemma2.
When running a minimal example, similar to the one listed on the model card page, I'm seeing the following:
import requests
from PIL import Image
from transformers import AutoProcessor
processor = AutoProcessor.from_pretrained("google/shieldgemma-2-4b-it")
url = "https://huggingface.co/datasets/huggingface/documentation-images/resolve/main/bee.jpg"
image = Image.open(requests.get(url, stream=True).raw)
inputs = processor(images=[image], policies=["dangerous"], return_tensors="pt")
print(processor.batch_decode(inputs['input_ids'])[0])
Output (image tokens truncated for readability):
<bos><bos><start_of_turn>user
<start_of_image><image_soft_token>...<end_of_image>
<start_of_turn>user
You are a policy expert trying to help determine whether a generated or real image is in
violation of the safety policy below. Your decision must very clearly follow the specific
language used in the safety policy and should not include any interpretation outside of the
safety policy.
<BEGIN_SAFETY_POLICY>
No Dangerous Content: The image shall not contain content that facilitates or encourages activities that could cause real-world harm (e.g., building firearms and explosive devices, promotion of terrorism, instructions for suicide).
</END_SAFETY_POLICY>
Does the image violate the above policy? Your Answer must start with 'Yes' or 'No'.
<end_of_turn>
<start_of_turn>model
There are two <bos> tokens. In addition, the first <start_of_turn> is never closed with an <end_of_turn>; instead, a second <start_of_turn> is opened right after <end_of_image>. Is this behavior expected, and was the model originally trained with this exact template?
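To make the observation easier to reproduce without reading the decoded prompt by eye, here is a minimal sketch that counts the template markers in a decoded string. The helper name `count_template_markers` is hypothetical (it is not part of transformers), and the `decoded` string is an abbreviated stand-in for the output above, with the image tokens and the policy text shortened.

```python
# Markers that appear in the decoded ShieldGemma2 prompt shown above.
MARKERS = (
    "<bos>",
    "<start_of_turn>",
    "<end_of_turn>",
    "<start_of_image>",
    "<end_of_image>",
)


def count_template_markers(decoded: str) -> dict:
    """Hypothetical helper: count each template marker in a decoded prompt."""
    return {marker: decoded.count(marker) for marker in MARKERS}


# Abbreviated stand-in for processor.batch_decode(inputs["input_ids"])[0].
decoded = (
    "<bos><bos><start_of_turn>user\n"
    "<start_of_image><image_soft_token><end_of_image>\n"
    "<start_of_turn>user\n"
    "You are a policy expert (policy text shortened)\n"
    "<end_of_turn>\n"
    "<start_of_turn>model\n"
)

counts = count_template_markers(decoded)

# A prompt that ends with an open generation turn should have exactly one
# more <start_of_turn> than <end_of_turn>; anything beyond that is unclosed.
unclosed_turns = counts["<start_of_turn>"] - counts["<end_of_turn>"] - 1

print(counts)
print("unclosed turns:", unclosed_turns)
```

With the real processor output, `decoded` would be replaced by `processor.batch_decode(inputs['input_ids'])[0]`. On the abbreviated string, this reports two <bos> tokens and one unclosed turn, matching what I described above.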