Tasks
- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)
Reproduction
The following script fails when "forward" is called. If you increase MAX_LENGTH to 30, it succeeds.
import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from PIL import Image

device = "cuda:0"
MAX_LENGTH = 15

# Load model and processor
model_name = "Qwen/Qwen2-VL-7B"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map=None,
)
processor = AutoProcessor.from_pretrained(model_name)

# Move the model to the device (FSDP is not needed for this minimal repro)
model = model.to(device)

text = "test this image <|vision_start|><|image_pad|><|vision_end|>"
image = [Image.new('RGB', (100, 100), color='red')]

# Prepare inputs; truncation to MAX_LENGTH cuts into the expanded image tokens
inputs = processor(
    text=text,
    images=image,
    return_tensors="pt",
    max_length=MAX_LENGTH,
    truncation="longest_first",
    padding=True,
)

# Move inputs to device and reuse input_ids as labels to trigger a loss computation
inputs = {k: v.to(device) for k, v in inputs.items()}
inputs["labels"] = inputs["input_ids"].clone()
outputs = model(**inputs)
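
For context, forward fails because the processor expands <|image_pad|> into one placeholder per (merged) image patch, while pixel_values and image_grid_thw still describe the whole image; truncating input_ids breaks that correspondence. The following is my own sketch of a user-side guard, not anything in transformers, that surfaces the problem at tokenization time by comparing pad-token counts with and without truncation:

# Sketch: compare the number of <|image_pad|> tokens with and without truncation.
image_pad_id = processor.tokenizer.convert_tokens_to_ids("<|image_pad|>")
full = processor(text=text, images=image, return_tensors="pt")
truncated = processor(
    text=text,
    images=image,
    return_tensors="pt",
    max_length=MAX_LENGTH,
    truncation="longest_first",
)
n_full = int((full["input_ids"] == image_pad_id).sum())
n_kept = int((truncated["input_ids"] == image_pad_id).sum())
if n_kept < n_full:
    raise ValueError(
        f"max_length={MAX_LENGTH} truncated {n_full - n_kept} of {n_full} image pad "
        "tokens, but pixel_values still encodes the full image; forward will fail."
    )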
Expected behavior
I expect the script to either:
- fail gracefully at tokenization time, raising an error informing the user that the image tokens are being truncated and that this is untenable, possibly with another kwarg telling it to ignore the error and return the broken tokens, or
- truncate the image tokens, grid, and pixel values in a compatible way that works with the model forward (see the sketch below). This is complicated by the other issue I raised: Qwen FSDP model training hangs when some batches do not contain images #37186
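
For the second option, a user-side approximation is possible when the image placeholder sits at the end of the prompt: drop leading text tokens and keep the vision span whole. This is a sketch under that single-image assumption; nothing here is a transformers API:

# Sketch: truncate only leading text tokens so the vision span survives.
enc = processor(text=text, images=image)  # no truncation; plain Python lists
ids = enc["input_ids"][0]
vision_start_id = processor.tokenizer.convert_tokens_to_ids("<|vision_start|>")
if len(ids) > MAX_LENGTH:
    excess = len(ids) - MAX_LENGTH
    # Only text tokens before the vision span can be dropped safely here.
    if excess > ids.index(vision_start_id):
        raise ValueError("The vision span alone does not fit within MAX_LENGTH")
    enc["input_ids"] = [ids[excess:]]
    enc["attention_mask"] = [enc["attention_mask"][0][excess:]]

pixel_values and image_grid_thw are deliberately left untouched, which is the only direction in which truncation stays consistent with the model forward; the outputs would still need to be converted to tensors and moved to the device before calling the model.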
Right, we expect max_length to take into account the total length of the input. We have already seen a few issues in LLaVA and added a doc about this, AFAIR. But raising a warning/error is a better option now that multimodality is more common and more beginner users are working with these models.
Since we do this for all vision LLMs, the error will have to be added in all processors. We can set return_overflowing_tokens=True by default and check whether modality special tokens were truncated, then raise an error/warning if a special token overflowed. I don't want to copy the same block into every processor and duplicate code, so I will make a fix that changes the tokenizers to handle everything. Will submit a PR soon :)
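
For illustration only, the proposed check could be sketched as below. This is not the actual PR: it compares modality token counts before and after truncation instead of relying on return_overflowing_tokens (whose output format differs between slow and fast tokenizers), and the name and placement are hypothetical; it would run on the prompt after the processor has already inserted the image pad tokens.

# Hedged sketch of the proposed guard; not a transformers API.
def modality_tokens_truncated(tokenizer, text, max_length, modality_tokens):
    """Return True if truncating to max_length drops any modality special token."""
    modality_ids = set(tokenizer.convert_tokens_to_ids(modality_tokens))
    full_ids = tokenizer(text, truncation=False)["input_ids"]
    kept_ids = tokenizer(text, max_length=max_length, truncation=True)["input_ids"]
    def count(ids):
        return sum(1 for i in ids if i in modality_ids)
    return count(kept_ids) < count(full_ids)

# 'expanded_text' stands for the prompt after image-pad expansion (hypothetical).
if modality_tokens_truncated(
    processor.tokenizer,
    expanded_text,
    MAX_LENGTH,
    ["<|vision_start|>", "<|image_pad|>", "<|vision_end|>"],
):
    raise ValueError(
        "Multimodal special tokens were truncated; increase max_length or disable truncation."
    )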
System Info
- transformers version: 4.49.0
- distributed_type: MULTI_GPU
- mixed_precision: no
- use_cpu: False
- debug: False
- num_processes: 8
- machine_rank: 0
- num_machines: 1
- gpu_ids: all
- rdzv_backend: static
- same_network: True
- main_training_function: main
- enable_cpu_affinity: False
- downcast_bf16: no
- tpu_use_cluster: False
- tpu_use_sudo: False
- tpu_env: []
Who can help?
@qubvel