
Qwen fails ungracefully when images are truncated #37222

Open

gbarello-uipath opened this issue Apr 2, 2025 · 1 comment

@gbarello-uipath (Contributor)

System Info

  • transformers version: 4.49.0
  • Platform: Linux-6.8.0-1025-gcp-x86_64-with-glibc2.39
  • Python version: 3.11.10
  • Huggingface_hub version: 0.29.3
  • Safetensors version: 0.5.3
  • Accelerate version: 0.34.2
  • Accelerate config: - compute_environment: LOCAL_MACHINE
    - distributed_type: MULTI_GPU
    - mixed_precision: no
    - use_cpu: False
    - debug: False
    - num_processes: 8
    - machine_rank: 0
    - num_machines: 1
    - gpu_ids: all
    - rdzv_backend: static
    - same_network: True
    - main_training_function: main
    - enable_cpu_affinity: False
    - downcast_bf16: no
    - tpu_use_cluster: False
    - tpu_use_sudo: False
    - tpu_env: []
  • DeepSpeed version: not installed
  • PyTorch version (GPU?): 2.6.0+cu124 (True)
  • Tensorflow version (GPU?): not installed (NA)
  • Flax version (CPU?/GPU?/TPU?): not installed (NA)
  • Jax version: not installed
  • JaxLib version: not installed
  • Using distributed or parallel set-up in script?:
  • Using GPU in script?:
  • GPU type: NVIDIA H100 80GB HBM3

Who can help?

@qubvel

Information

  • The official example scripts
  • My own modified scripts

Tasks

  • An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
  • My own task or dataset (give details below)

Reproduction

The following script fails when forward() is called. If you increase MAX_LENGTH to 30, it succeeds.

import torch
from transformers import AutoProcessor, Qwen2VLForConditionalGeneration
from PIL import Image

device = "cuda:0"

MAX_LENGTH = 15

# Load model and processor
model_name = "Qwen/Qwen2-VL-7B"
model = Qwen2VLForConditionalGeneration.from_pretrained(
    model_name,
    torch_dtype=torch.float16,
    device_map=None,
)
processor = AutoProcessor.from_pretrained(model_name)

# Move model to the target device
model = model.to(device)

text = "test this image <|vision_start|><|image_pad|><|vision_end|>"
image = [Image.new('RGB', (100, 100), color='red')]

# Prepare inputs
inputs = processor(
    text=text,
    images=image,
    return_tensors="pt",
    max_length=MAX_LENGTH,
    truncation="longest_first",
    padding=True,
)
# Move inputs to device
inputs = {k: v.to(device) for k, v in inputs.items()}

inputs["labels"] = inputs["input_ids"].clone()

outputs = model(**inputs)
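
For reference, here is a minimal diagnostic of the failure (not part of the original report; it assumes the Qwen2-VL processor exposes merge_size and the <|image_pad|> token, as in 4.49.0): after truncation, the number of image placeholder tokens left in input_ids no longer matches the number of image features implied by image_grid_thw, so forward() cannot scatter the vision embeddings.

# Diagnostic sketch: compare surviving <|image_pad|> tokens with the number
# of image features the vision tower will produce from image_grid_thw.
image_pad_id = processor.tokenizer.convert_tokens_to_ids("<|image_pad|>")
n_image_tokens = int((inputs["input_ids"] == image_pad_id).sum())

merge = processor.image_processor.merge_size  # spatial merge factor (2 for Qwen2-VL)
n_image_features = int(inputs["image_grid_thw"].prod(dim=-1).sum()) // (merge ** 2)

print(f"placeholder tokens: {n_image_tokens}, image features: {n_image_features}")
# With MAX_LENGTH = 15 these two counts differ, which is what breaks model(**inputs).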

Expected behavior

I expect the script to either:

  1. fail gracefully at tokenization time, raising an error that informs the user that the image tokens are being truncated and that this is untenable, possibly with another kwarg telling it to ignore the error and return the broken tokens (a sketch of this option follows the list), or

  2. truncate the image tokens, grid, and pixel values in a compatible way that works with the model forward. This is complicated by the other issue I raised: Qwen FSDP model training hangs when some batches do not contain images #37186
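
A rough sketch of what option 1 could look like (purely illustrative; safe_process and the allow_truncated_images kwarg are hypothetical, not an existing transformers API):

def safe_process(processor, text, images, allow_truncated_images=False, **kwargs):
    """Hypothetical wrapper: raise early if truncation cut into image tokens."""
    inputs = processor(text=text, images=images, return_tensors="pt", **kwargs)
    pad_id = processor.tokenizer.convert_tokens_to_ids("<|image_pad|>")
    merge = processor.image_processor.merge_size
    n_expected = int(inputs["image_grid_thw"].prod(dim=-1).sum()) // (merge ** 2)
    n_found = int((inputs["input_ids"] == pad_id).sum())
    if n_found != n_expected and not allow_truncated_images:
        raise ValueError(
            f"max_length truncated the image tokens ({n_found} of {n_expected} "
            "placeholders remain); increase max_length or pass "
            "allow_truncated_images=True"
        )
    return inputs

# Usage with the reproduction above:
# inputs = safe_process(processor, text, image,
#                       max_length=MAX_LENGTH, truncation="longest_first", padding=True)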

@zucchini-nlp (Member)

Hey @gbarello-uipath !

Right, we expect that max_length takes into account the total length of the input. We have already seen a few issues in LLaVA and, as far as I recall, added a note about this to the docs. But raising a warning/error is a better option now that multimodality is more common and more beginner users are working with these models.

Since we do this for all vision LLMs, the error will have to be added in all processors. We can do return_overflowing_tokens=True by default and check whether modality special tokens were truncated, then raise an error/warning if a special token overflowed. I don't want to copy the same block into every processor and duplicate code, so I will make a fix that changes the tokenizers to handle everything. Will submit a PR soon :)
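
A simplified stand-in for that check, building on the reproduction above (illustrative only; the actual fix is planned inside the tokenizers/processors, and this version just counts placeholder tokens before and after truncation rather than using return_overflowing_tokens):

image_pad_id = processor.tokenizer.convert_tokens_to_ids("<|image_pad|>")

# Tokenize once without truncation, then compare how many image placeholder
# tokens survived in the truncated inputs from the reproduction script.
full_ids = processor(text=text, images=image, return_tensors="pt")["input_ids"]
n_full = int((full_ids == image_pad_id).sum())
n_kept = int((inputs["input_ids"] == image_pad_id).sum())

if n_kept < n_full:
    raise ValueError(
        f"truncation dropped {n_full - n_kept} image placeholder tokens; "
        "increase max_length"
    )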

@zucchini-nlp self-assigned this on Apr 3, 2025