Image resolutions that will work well?

by pedalnomica - opened Dec 6, 2024

Dec 6, 2024

Thank you for all the hard work that went into creating this model and providing it to the community!

The model card could be improved by making it clear what resolutions your model supports/will perform well with/was trained on. This is the most basic information for a vision LLM: what inputs will work (well) with it? For some reason almost everyone releasing vision LLMs makes this very hard to figure out.

I'm guessing it is like your 2.0: up to 12 tiles of 448x448 pixels? Some things that weren't clear to me with that were:
- What if one of the dimensions of your image isn't divisible by 448?
- What if your image would require more than 12 tiles?
- If inputs violating those constraints aren't outright rejected, what happens (e.g. do the tiles overlap, or is the image resized or cropped)? Is the model trained on such images?

Thanks again!

pedalnomica changed discussion title from Image resolutions that will work well to Image resolutions that will work well? Dec 6, 2024

czczup

OpenGVLab org Dec 9, 2024

Thank you for your kind words and valuable feedback! We appreciate your suggestion to clarify supported resolutions in the model card. Here's the detailed information:

- If one of the dimensions of your image isn't divisible by 448, the image will be resized to the nearest dimensions divisible by 448, which might introduce some slight distortion.
- You can control the resolution and tiling behavior using the max_num parameter. By default, we set max_num=12, but you can adjust this to 18 or 24 tiles to process higher-resolution images.
- If an input violates these constraints (e.g., exceeds the maximum number of tiles), the model may resize or crop the image to fit within the supported tiling limits. The model has been trained on such cases to ensure robust performance.

Additionally, you can refer to the dynamic_preprocess function in the README for more details on how preprocessing is handled dynamically.
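
For readers who want a feel for the resizing and tile-limit behavior described above, here is a minimal sketch. It is not a copy of dynamic_preprocess from the README. The helper names pick_tile_grid and resized_size are hypothetical, and the rule used here (pick the grid whose aspect ratio is closest to the image's, breaking ties by fewest tiles) is an assumption for illustration. Check the README for the actual logic.

```python
TILE = 448  # tile edge length in pixels, as stated in this thread


def pick_tile_grid(width, height, max_num=12):
    """Hypothetical helper: choose a (cols, rows) grid with cols * rows <= max_num
    whose aspect ratio is closest to the image's aspect ratio.
    Ties are broken by using fewer tiles (an assumption, not the README's rule)."""
    aspect = width / height
    candidates = [
        (cols, rows)
        for cols in range(1, max_num + 1)
        for rows in range(1, max_num + 1)
        if cols * rows <= max_num
    ]
    return min(
        candidates,
        key=lambda g: (abs(aspect - g[0] / g[1]), g[0] * g[1]),
    )


def resized_size(width, height, max_num=12):
    """Hypothetical helper: the size the image would be resized to before being
    cut into non-overlapping 448x448 tiles. Both dimensions are multiples of 448."""
    cols, rows = pick_tile_grid(width, height, max_num)
    return cols * TILE, rows * TILE


if __name__ == "__main__":
    # A 1000x500 image is not divisible by 448 in either dimension.
    print(pick_tile_grid(1000, 500), resized_size(1000, 500))
    # A very tall image hits the tile limit; raising max_num allows a closer grid.
    print(pick_tile_grid(100, 1300, max_num=12))
    print(pick_tile_grid(100, 1300, max_num=24))
```

Under these assumptions, an image that would need more tiles than max_num allows is mapped to the closest grid that fits, so it is resized (with some change in aspect ratio) and not rejected.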

czczup changed discussion status to closed Dec 19, 2024
