
Unable to load ibm-granite/granite-vision-3.2-2b (LlavaNextForConditionalGeneration config mismatch) #3143

Vinno97 commented Mar 28, 2025

System Info

2025-03-28T14:39:27.430620Z  INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.85.0
Commit sha: 4d28897b4e345f4dfdd93d3434e50ac8afcdf9e1
Docker label: sha-4d28897
nvidia-smi:
Fri Mar 28 14:39:27 2025       
   +-----------------------------------------------------------------------------------------+
   | NVIDIA-SMI 570.133.07             Driver Version: 570.133.07     CUDA Version: 12.8     |
   |-----------------------------------------+------------------------+----------------------+
   | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
   | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
   |                                         |                        |               MIG M. |
   |=========================================+========================+======================|
   |   0  NVIDIA GeForce RTX 3060        Off |   00000000:08:00.0  On |                  N/A |
   |  0%   51C    P3             32W /  170W |    6618MiB /  12288MiB |     39%      Default |
   |                                         |                        |                  N/A |
   +-----------------------------------------+------------------------+----------------------+
                                                                                            
   +-----------------------------------------------------------------------------------------+
   | Processes:                                                                              |
   |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
   |        ID   ID                                                               Usage      |
   |=========================================================================================|
   +-----------------------------------------------------------------------------------------+
xpu-smi:
N/A
hpu-smi:
N/A

2025-03-28T14:39:27.430665Z  INFO text_generation_launcher: Args {
    model_id: "bigscience/bloom-560m",
    revision: None,
    validation_workers: 2,
    sharded: None,
    num_shard: None,
    quantize: None,
    speculate: None,
    dtype: None,
    kv_cache_dtype: None,
    trust_remote_code: false,
    max_concurrent_requests: 128,
    max_best_of: 2,
    max_stop_sequences: 4,
    max_top_n_tokens: 5,
    max_input_tokens: None,
    max_input_length: None,
    max_total_tokens: None,
    waiting_served_ratio: 0.3,
    max_batch_prefill_tokens: None,
    max_batch_total_tokens: None,
    max_waiting_tokens: 20,
    max_batch_size: None,
    cuda_graphs: None,
    hostname: "46c8c5d5f669",
    port: 80,
    shard_uds_path: "/tmp/text-generation-server",
    master_addr: "localhost",
    master_port: 29500,
    huggingface_hub_cache: None,
    weights_cache_override: None,
    disable_custom_kernels: false,
    cuda_memory_fraction: 1.0,
    rope_scaling: None,
    rope_factor: None,
    json_output: false,
    otlp_endpoint: None,
    otlp_service_name: "text-generation-inference.router",
    cors_allow_origin: [],
    api_key: None,
    watermark_gamma: None,
    watermark_delta: None,
    ngrok: false,
    ngrok_authtoken: None,
    ngrok_edge: None,
    tokenizer_config_path: None,
    disable_grammar_support: false,
    env: true,
    max_client_batch_size: 4,
    lora_adapters: None,
    usage_stats: On,
    payload_limit: 2000000,
    enable_prefill_logprobs: false,
}

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Simply running Granite-Vision-3.2-2B causes a crash on start-up:

docker run --gpus all --shm-size 1g -p 8080:80 -v ./models:/data ghcr.io/huggingface/text-generation-inference:3.2.1 --model-id ibm-granite/granite-vision-3.2-2b
Excerpt from the log:
2025-03-28T14:33:21.511700Z  INFO text_generation_launcher: Using Attention = flashdecoding
2025-03-28T14:33:24.392413Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2025-03-28T14:33:34.400967Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2025-03-28T14:33:44.409181Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2025-03-28T14:33:49.270771Z ERROR text_generation_launcher: Error when initializing model
Traceback (most recent call last):
  File "/usr/src/.venv/bin/text-generation-server", line 10, in <module>
    sys.exit(app())
  File "/usr/src/.venv/lib/python3.11/site-packages/typer/main.py", line 323, in __call__
    return get_command(self)(*args, **kwargs)
  File "/usr/src/.venv/lib/python3.11/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
  File "/usr/src/.venv/lib/python3.11/site-packages/typer/core.py", line 743, in main
    return _main(
  File "/usr/src/.venv/lib/python3.11/site-packages/typer/core.py", line 198, in _main
    rv = self.invoke(ctx)
  File "/usr/src/.venv/lib/python3.11/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/src/.venv/lib/python3.11/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/src/.venv/lib/python3.11/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/usr/src/.venv/lib/python3.11/site-packages/typer/main.py", line 698, in wrapper
    return callback(**use_params)
  File "/usr/src/server/text_generation_server/cli.py", line 119, in serve
    server.serve(
  File "/usr/src/server/text_generation_server/server.py", line 315, in serve
    asyncio.run(
  File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
  File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
  File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/base_events.py", line 641, in run_until_complete
    self.run_forever()
  File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
    self._run_once()
  File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once
    handle._run()
  File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/events.py", line 84, in _run
    self._context.run(self._callback, *self._args)
> File "/usr/src/server/text_generation_server/server.py", line 268, in serve_inner
    model = get_model_with_lora_adapters(
  File "/usr/src/server/text_generation_server/models/__init__.py", line 1690, in get_model_with_lora_adapters
    model = get_model(
  File "/usr/src/server/text_generation_server/models/__init__.py", line 1586, in get_model
    return VlmCausalLM(
  File "/usr/src/server/text_generation_server/models/vlm_causal_lm.py", line 362, in __init__
    super().__init__(
  File "/usr/src/server/text_generation_server/models/flash_causal_lm.py", line 1269, in __init__
    model = model_class(prefix, config, weights)
  File "/usr/src/server/text_generation_server/models/custom_modeling/llava_next.py", line 120, in __init__
    if config.vision_feature_layer < 0:
TypeError: '<' not supported between instances of 'list' and 'int'
2025-03-28T14:33:51.727103Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

2025-03-28 14:33:17.195 | INFO     | text_generation_server.utils.import_utils:<module>:76 - Detected system cuda
/usr/src/server/text_generation_server/layers/gptq/triton.py:242: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @custom_fwd(cast_inputs=torch.float16)
/usr/src/.venv/lib/python3.11/site-packages/mamba_ssm/ops/selective_scan_interface.py:158: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @custom_fwd
/usr/src/.venv/lib/python3.11/site-packages/mamba_ssm/ops/selective_scan_interface.py:231: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  @custom_bwd
/usr/src/.venv/lib/python3.11/site-packages/mamba_ssm/ops/triton/layernorm.py:507: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @custom_fwd
/usr/src/.venv/lib/python3.11/site-packages/mamba_ssm/ops/triton/layernorm.py:566: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  @custom_bwd
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│                                                                              │
│ /usr/src/server/text_generation_server/models/custom_modeling/llava_next.py: │
│ 120 in __init__                                                              │
│                                                                              │
│   117 │   │   vision_config = config.vision_config                           │
│   118 │   │   # Instead of selecting in hidden_states[-2].                   │
│   119 │   │   # Instead compute only the n -2 + 1 layers and don't pool      │
│ ❱ 120 │   │   if config.vision_feature_layer < 0:                            │
│   121 │   │   │   vision_config.num_hidden_layers += config.vision_feature_l │
│   122 │   │   else:                                                          │
│   123 │   │   │   vision_config.num_hidden_layers = config.vision_feature_la │
│                                                                              │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │        config = LlavaNextConfig {                                        │ │
│ │                   "_name_or_path": "ibm-granite/granite-vision-3.2-2b",  │ │
│ │                   "architectures": [                                     │ │
│ │                 │   "LlavaNextForConditionalGeneration"                  │ │
│ │                   ],                                                     │ │
│ │                   "image_grid_pinpoints": [                              │ │
│ │                 │   [                                                    │ │
│ │                 │     384,                                               │ │
│ │                 │     384                                                │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     384,                                               │ │
│ │                 │     768                                                │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     384,                                               │ │
│ │                 │     1152                                               │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     384,                                               │ │
│ │                 │     1536                                               │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     384,                                               │ │
│ │                 │     1920                                               │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     384,                                               │ │
│ │                 │     2304                                               │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     384,                                               │ │
│ │                 │     2688                                               │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     384,                                               │ │
│ │                 │     3072                                               │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     384,                                               │ │
│ │                 │     3456                                               │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     384,                                               │ │
│ │                 │     3840                                               │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     768,                                               │ │
│ │                 │     384                                                │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     768,                                               │ │
│ │                 │     768                                                │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     768,                                               │ │
│ │                 │     1152                                               │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     768,                                               │ │
│ │                 │     1536                                               │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     768,                                               │ │
│ │                 │     1920                                               │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     1152,                                              │ │
│ │                 │     384                                                │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     1152,                                              │ │
│ │                 │     768                                                │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     1152,                                              │ │
│ │                 │     1152                                               │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     1536,                                              │ │
│ │                 │     384                                                │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     1536,                                              │ │
│ │                 │     768                                                │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     1920,                                              │ │
│ │                 │     384                                                │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     1920,                                              │ │
│ │                 │     768                                                │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     2304,                                              │ │
│ │                 │     384                                                │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     2688,                                              │ │
│ │                 │     384                                                │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     3072,                                              │ │
│ │                 │     384                                                │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     3456,                                              │ │
│ │                 │     384                                                │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     3840,                                              │ │
│ │                 │     384                                                │ │
│ │                 │   ]                                                    │ │
│ │                   ],                                                     │ │
│ │                   "image_seq_length": 576,                               │ │
│ │                   "image_token_index": 49155,                            │ │
│ │                   "model_type": "llava_next",                            │ │
│ │                   "multimodal_projector_bias": true,                     │ │
│ │                   "projector_hidden_act": "gelu",                        │ │
│ │                   "quantize": null,                                      │ │
│ │                   "speculator": null,                                    │ │
│ │                   "text_config": {                                       │ │
│ │                 │   "architectures": [                                   │ │
│ │                 │     "GraniteForCausalLM"                               │ │
│ │                 │   ],                                                   │ │
│ │                 │   "attention_dropout": 0.1,                            │ │
│ │                 │   "attention_multiplier": 0.015625,                    │ │
│ │                 │   "bos_token_id": 0,                                   │ │
│ │                 │   "embedding_multiplier": 12.0,                        │ │
│ │                 │   "eos_token_id": 0,                                   │ │
│ │                 │   "hidden_size": 2048,                                 │ │
│ │                 │   "intermediate_size": 8192,                           │ │
│ │                 │   "logits_scaling": 8.0,                               │ │
│ │                 │   "max_position_embeddings": 131072,                   │ │
│ │                 │   "model_type": "granite",                             │ │
│ │                 │   "num_hidden_layers": 40,                             │ │
│ │                 │   "num_key_value_heads": 8,                            │ │
│ │                 │   "pad_token_id": 0,                                   │ │
│ │                 │   "residual_multiplier": 0.22,                         │ │
│ │                 │   "rms_norm_eps": 1e-05,                               │ │
│ │                 │   "rope_theta": 300000,                                │ │
│ │                 │   "tie_word_embeddings": true,                         │ │
│ │                 │   "torch_dtype": "bfloat16",                           │ │
│ │                 │   "vocab_size": 49156                                  │ │
│ │                   },                                                     │ │
│ │                   "tie_word_embeddings": true,                           │ │
│ │                   "transformers_version": "4.49.0",                      │ │
│ │                   "use_image_newline_parameter": true,                   │ │
│ │                   "vision_config": {                                     │ │
│ │                 │   "hidden_act": "gelu_pytorch_tanh",                   │ │
│ │                 │   "hidden_size": 1152,                                 │ │
│ │                 │   "image_size": 384,                                   │ │
│ │                 │   "intermediate_size": 4304,                           │ │
│ │                 │   "layer_norm_eps": 1e-06,                             │ │
│ │                 │   "model_type": "siglip_vision_model",                 │ │
│ │                 │   "num_attention_heads": 16,                           │ │
│ │                 │   "num_hidden_layers": 27,                             │ │
│ │                 │   "patch_size": 14,                                    │ │
│ │                 │   "quantize": null                                     │ │
│ │                   },                                                     │ │
│ │                   "vision_feature_layer": [                              │ │
│ │                 │   -24,                                                 │ │
│ │                 │   -20,                                                 │ │
│ │                 │   -12,                                                 │ │
│ │                 │   -1                                                   │ │
│ │                   ],                                                     │ │
│ │                   "vision_feature_select_strategy": "full"               │ │
│ │                 }                                                        │ │
│ │        prefix = None                                                     │ │
│ │          self = LlavaNextForConditionalGeneration()                      │ │
│ │ vision_config = SiglipVisionConfig {                                     │ │
│ │                   "attention_dropout": 0.0,                              │ │
│ │                   "hidden_act": "gelu_pytorch_tanh",                     │ │
│ │                   "hidden_size": 1152,                                   │ │
│ │                   "image_size": 384,                                     │ │
│ │                   "intermediate_size": 4304,                             │ │
│ │                   "layer_norm_eps": 1e-06,                               │ │
│ │                   "model_type": "siglip_vision_model",                   │ │
│ │                   "num_attention_heads": 16,                             │ │
│ │                   "num_channels": 3,                                     │ │
│ │                   "num_hidden_layers": 27,                               │ │
│ │                   "patch_size": 14,                                      │ │
│ │                   "quantize": null,                                      │ │
│ │                   "transformers_version": "4.49.0"                       │ │
│ │                 }                                                        │ │
│ │       weights = <text_generation_server.utils.weights.Weights object at  │ │
│ │                 0x71fc54eaaa10>                                          │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
TypeError: '<' not supported between instances of 'list' and 'int' rank=0
2025-03-28T14:33:51.791963Z ERROR text_generation_launcher: Shard 0 failed to start
2025-03-28T14:33:51.791989Z  INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart

It's caused by this check in text_generation_server/models/custom_modeling/llava_next.py:
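For reference, here is the check reconstructed from the Rich traceback above (the panel truncates the right edge, so the trailing "+ 1" arithmetic is inferred from the visible fragments and the surrounding comment):

# text_generation_server/models/custom_modeling/llava_next.py, around line 120
vision_config = config.vision_config
# Instead of selecting in hidden_states[-2].
# Instead compute only the n -2 + 1 layers and don't pool
if config.vision_feature_layer < 0:
    vision_config.num_hidden_layers += config.vision_feature_layer + 1
else:
    vision_config.num_hidden_layers = config.vision_feature_layer + 1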

The code expects vision_feature_layer to be a single number, but in Granite's config it's a list of values:

"vision_feature_layer": [
    -24,
    -20,
    -12,
    -1
],
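
This is easy to verify outside of TGI (a quick sketch, not part of the original report, using transformers' AutoConfig):

from transformers import AutoConfig

# Load the published config and inspect the field that TGI compares against 0.
config = AutoConfig.from_pretrained("ibm-granite/granite-vision-3.2-2b")
print(type(config.vision_feature_layer))  # <class 'list'>
print(config.vision_feature_layer)        # [-24, -20, -12, -1]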

I don't know whether Granite deviates from the intended schema or whether this is purely a TGI issue, but TGI would probably need to handle both cases; one possible direction is sketched below.
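
A minimal sketch of how the check could be generalized (hypothetical code, not the actual fix; a complete solution would also have to collect and merge the hidden states from each listed layer, not just size the vision tower):

def required_vision_layers(vision_feature_layer, num_hidden_layers):
    # How many vision-tower layers must be computed so that every
    # requested feature layer (an int or, as in Granite, a list of
    # ints, possibly negative) is available. Mirrors the int-only
    # arithmetic in llava_next.py.
    layers = vision_feature_layer
    if not isinstance(layers, (list, tuple)):
        layers = [layers]
    # A negative index counts from the end; -1 means the last layer,
    # so all num_hidden_layers layers must be computed for it.
    return max(
        num_hidden_layers + layer + 1 if layer < 0 else layer + 1
        for layer in layers
    )

# Granite's config: 27 hidden layers, vision_feature_layer [-24, -20, -12, -1]
# -> max(4, 8, 16, 27) = 27, i.e. the full vision tower is needed.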

Expected behavior

I would have expected the model to load without any issues, since it uses the LlavaNextForConditionalGeneration architecture.
