
Unable to load ibm-granite/granite-vision-3.2-2b (LlavaNextForConditionalGeneration config mismatch) #3143

Vinno97 commented Mar 28, 2025

System Info

2025-03-28T14:39:27.430620Z  INFO text_generation_launcher: Runtime environment:
Target: x86_64-unknown-linux-gnu
Cargo version: 1.85.0
Commit sha: 4d28897b4e345f4dfdd93d3434e50ac8afcdf9e1
Docker label: sha-4d28897
nvidia-smi:
Fri Mar 28 14:39:27 2025       
   +-----------------------------------------------------------------------------------------+
   | NVIDIA-SMI 570.133.07             Driver Version: 570.133.07     CUDA Version: 12.8     |
   |-----------------------------------------+------------------------+----------------------+
   | GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
   | Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
   |                                         |                        |               MIG M. |
   |=========================================+========================+======================|
   |   0  NVIDIA GeForce RTX 3060        Off |   00000000:08:00.0  On |                  N/A |
   |  0%   51C    P3             32W /  170W |    6618MiB /  12288MiB |     39%      Default |
   |                                         |                        |                  N/A |
   +-----------------------------------------+------------------------+----------------------+
                                                                                            
   +-----------------------------------------------------------------------------------------+
   | Processes:                                                                              |
   |  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
   |        ID   ID                                                               Usage      |
   |=========================================================================================|
   +-----------------------------------------------------------------------------------------+
xpu-smi:
N/A
hpu-smi:
N/A

2025-03-28T14:39:27.430665Z  INFO text_generation_launcher: Args {
    model_id: "bigscience/bloom-560m",
    revision: None,
    validation_workers: 2,
    sharded: None,
    num_shard: None,
    quantize: None,
    speculate: None,
    dtype: None,
    kv_cache_dtype: None,
    trust_remote_code: false,
    max_concurrent_requests: 128,
    max_best_of: 2,
    max_stop_sequences: 4,
    max_top_n_tokens: 5,
    max_input_tokens: None,
    max_input_length: None,
    max_total_tokens: None,
    waiting_served_ratio: 0.3,
    max_batch_prefill_tokens: None,
    max_batch_total_tokens: None,
    max_waiting_tokens: 20,
    max_batch_size: None,
    cuda_graphs: None,
    hostname: "46c8c5d5f669",
    port: 80,
    shard_uds_path: "/tmp/text-generation-server",
    master_addr: "localhost",
    master_port: 29500,
    huggingface_hub_cache: None,
    weights_cache_override: None,
    disable_custom_kernels: false,
    cuda_memory_fraction: 1.0,
    rope_scaling: None,
    rope_factor: None,
    json_output: false,
    otlp_endpoint: None,
    otlp_service_name: "text-generation-inference.router",
    cors_allow_origin: [],
    api_key: None,
    watermark_gamma: None,
    watermark_delta: None,
    ngrok: false,
    ngrok_authtoken: None,
    ngrok_edge: None,
    tokenizer_config_path: None,
    disable_grammar_support: false,
    env: true,
    max_client_batch_size: 4,
    lora_adapters: None,
    usage_stats: On,
    payload_limit: 2000000,
    enable_prefill_logprobs: false,
}

Information

  • Docker
  • The CLI directly

Tasks

  • An officially supported command
  • My own modifications

Reproduction

Simply running Granite-Vision-3.2-2B causes a crash on start-up:

docker run --gpus all --shm-size 1g -p 8080:80 -v ./models:/data ghcr.io/huggingface/text-generation-inference:3.2.1 --model-id ibm-granite/granite-vision-3.2-2b
Excerpt from the log:
2025-03-28T14:33:21.511700Z  INFO text_generation_launcher: Using Attention = flashdecoding
2025-03-28T14:33:24.392413Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2025-03-28T14:33:34.400967Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2025-03-28T14:33:44.409181Z  INFO shard-manager: text_generation_launcher: Waiting for shard to be ready... rank=0
2025-03-28T14:33:49.270771Z ERROR text_generation_launcher: Error when initializing model
Traceback (most recent call last):
  File "/usr/src/.venv/bin/text-generation-server", line 10, in <module>
    sys.exit(app())
  File "/usr/src/.venv/lib/python3.11/site-packages/typer/main.py", line 323, in __call__
    return get_command(self)(*args, **kwargs)
  File "/usr/src/.venv/lib/python3.11/site-packages/click/core.py", line 1161, in __call__
    return self.main(*args, **kwargs)
  File "/usr/src/.venv/lib/python3.11/site-packages/typer/core.py", line 743, in main
    return _main(
  File "/usr/src/.venv/lib/python3.11/site-packages/typer/core.py", line 198, in _main
    rv = self.invoke(ctx)
  File "/usr/src/.venv/lib/python3.11/site-packages/click/core.py", line 1697, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/usr/src/.venv/lib/python3.11/site-packages/click/core.py", line 1443, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/usr/src/.venv/lib/python3.11/site-packages/click/core.py", line 788, in invoke
    return __callback(*args, **kwargs)
  File "/usr/src/.venv/lib/python3.11/site-packages/typer/main.py", line 698, in wrapper
    return callback(**use_params)
  File "/usr/src/server/text_generation_server/cli.py", line 119, in serve
    server.serve(
  File "/usr/src/server/text_generation_server/server.py", line 315, in serve
    asyncio.run(
  File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/runners.py", line 190, in run
    return runner.run(main)
  File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/runners.py", line 118, in run
    return self._loop.run_until_complete(task)
  File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/base_events.py", line 641, in run_until_complete
    self.run_forever()
  File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/base_events.py", line 608, in run_forever
    self._run_once()
  File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/base_events.py", line 1936, in _run_once
    handle._run()
  File "/root/.local/share/uv/python/cpython-3.11.11-linux-x86_64-gnu/lib/python3.11/asyncio/events.py", line 84, in _run
    self._context.run(self._callback, *self._args)
> File "/usr/src/server/text_generation_server/server.py", line 268, in serve_inner
    model = get_model_with_lora_adapters(
  File "/usr/src/server/text_generation_server/models/__init__.py", line 1690, in get_model_with_lora_adapters
    model = get_model(
  File "/usr/src/server/text_generation_server/models/__init__.py", line 1586, in get_model
    return VlmCausalLM(
  File "/usr/src/server/text_generation_server/models/vlm_causal_lm.py", line 362, in __init__
    super().__init__(
  File "/usr/src/server/text_generation_server/models/flash_causal_lm.py", line 1269, in __init__
    model = model_class(prefix, config, weights)
  File "/usr/src/server/text_generation_server/models/custom_modeling/llava_next.py", line 120, in __init__
    if config.vision_feature_layer < 0:
TypeError: '<' not supported between instances of 'list' and 'int'
2025-03-28T14:33:51.727103Z ERROR shard-manager: text_generation_launcher: Shard complete standard error output:

2025-03-28 14:33:17.195 | INFO     | text_generation_server.utils.import_utils:<module>:76 - Detected system cuda
/usr/src/server/text_generation_server/layers/gptq/triton.py:242: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @custom_fwd(cast_inputs=torch.float16)
/usr/src/.venv/lib/python3.11/site-packages/mamba_ssm/ops/selective_scan_interface.py:158: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @custom_fwd
/usr/src/.venv/lib/python3.11/site-packages/mamba_ssm/ops/selective_scan_interface.py:231: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  @custom_bwd
/usr/src/.venv/lib/python3.11/site-packages/mamba_ssm/ops/triton/layernorm.py:507: FutureWarning: `torch.cuda.amp.custom_fwd(args...)` is deprecated. Please use `torch.amp.custom_fwd(args..., device_type='cuda')` instead.
  @custom_fwd
/usr/src/.venv/lib/python3.11/site-packages/mamba_ssm/ops/triton/layernorm.py:566: FutureWarning: `torch.cuda.amp.custom_bwd(args...)` is deprecated. Please use `torch.amp.custom_bwd(args..., device_type='cuda')` instead.
  @custom_bwd
Using a slow image processor as `use_fast` is unset and a slow processor was saved with this model. `use_fast=True` will be the default behavior in v4.48, even if the model was saved with a slow processor. This will result in minor differences in outputs. You'll still be able to use a slow processor with `use_fast=False`.
╭───────────────────── Traceback (most recent call last) ──────────────────────╮
│                                                                              │
│ /usr/src/server/text_generation_server/models/custom_modeling/llava_next.py: │
│ 120 in __init__                                                              │
│                                                                              │
│   117 │   │   vision_config = config.vision_config                           │
│   118 │   │   # Instead of selecting in hidden_states[-2].                   │
│   119 │   │   # Instead compute only the n -2 + 1 layers and don't pool      │
│ ❱ 120 │   │   if config.vision_feature_layer < 0:                            │
│   121 │   │   │   vision_config.num_hidden_layers += config.vision_feature_l │
│   122 │   │   else:                                                          │
│   123 │   │   │   vision_config.num_hidden_layers = config.vision_feature_la │
│                                                                              │
│ ╭───────────────────────────────── locals ─────────────────────────────────╮ │
│ │        config = LlavaNextConfig {                                        │ │
│ │                   "_name_or_path": "ibm-granite/granite-vision-3.2-2b",  │ │
│ │                   "architectures": [                                     │ │
│ │                 │   "LlavaNextForConditionalGeneration"                  │ │
│ │                   ],                                                     │ │
│ │                   "image_grid_pinpoints": [                              │ │
│ │                 │   [                                                    │ │
│ │                 │     384,                                               │ │
│ │                 │     384                                                │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     384,                                               │ │
│ │                 │     768                                                │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     384,                                               │ │
│ │                 │     1152                                               │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     384,                                               │ │
│ │                 │     1536                                               │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     384,                                               │ │
│ │                 │     1920                                               │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     384,                                               │ │
│ │                 │     2304                                               │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     384,                                               │ │
│ │                 │     2688                                               │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     384,                                               │ │
│ │                 │     3072                                               │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     384,                                               │ │
│ │                 │     3456                                               │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     384,                                               │ │
│ │                 │     3840                                               │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     768,                                               │ │
│ │                 │     384                                                │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     768,                                               │ │
│ │                 │     768                                                │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     768,                                               │ │
│ │                 │     1152                                               │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     768,                                               │ │
│ │                 │     1536                                               │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     768,                                               │ │
│ │                 │     1920                                               │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     1152,                                              │ │
│ │                 │     384                                                │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     1152,                                              │ │
│ │                 │     768                                                │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     1152,                                              │ │
│ │                 │     1152                                               │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     1536,                                              │ │
│ │                 │     384                                                │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     1536,                                              │ │
│ │                 │     768                                                │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     1920,                                              │ │
│ │                 │     384                                                │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     1920,                                              │ │
│ │                 │     768                                                │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     2304,                                              │ │
│ │                 │     384                                                │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     2688,                                              │ │
│ │                 │     384                                                │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     3072,                                              │ │
│ │                 │     384                                                │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     3456,                                              │ │
│ │                 │     384                                                │ │
│ │                 │   ],                                                   │ │
│ │                 │   [                                                    │ │
│ │                 │     3840,                                              │ │
│ │                 │     384                                                │ │
│ │                 │   ]                                                    │ │
│ │                   ],                                                     │ │
│ │                   "image_seq_length": 576,                               │ │
│ │                   "image_token_index": 49155,                            │ │
│ │                   "model_type": "llava_next",                            │ │
│ │                   "multimodal_projector_bias": true,                     │ │
│ │                   "projector_hidden_act": "gelu",                        │ │
│ │                   "quantize": null,                                      │ │
│ │                   "speculator": null,                                    │ │
│ │                   "text_config": {                                       │ │
│ │                 │   "architectures": [                                   │ │
│ │                 │     "GraniteForCausalLM"                               │ │
│ │                 │   ],                                                   │ │
│ │                 │   "attention_dropout": 0.1,                            │ │
│ │                 │   "attention_multiplier": 0.015625,                    │ │
│ │                 │   "bos_token_id": 0,                                   │ │
│ │                 │   "embedding_multiplier": 12.0,                        │ │
│ │                 │   "eos_token_id": 0,                                   │ │
│ │                 │   "hidden_size": 2048,                                 │ │
│ │                 │   "intermediate_size": 8192,                           │ │
│ │                 │   "logits_scaling": 8.0,                               │ │
│ │                 │   "max_position_embeddings": 131072,                   │ │
│ │                 │   "model_type": "granite",                             │ │
│ │                 │   "num_hidden_layers": 40,                             │ │
│ │                 │   "num_key_value_heads": 8,                            │ │
│ │                 │   "pad_token_id": 0,                                   │ │
│ │                 │   "residual_multiplier": 0.22,                         │ │
│ │                 │   "rms_norm_eps": 1e-05,                               │ │
│ │                 │   "rope_theta": 300000,                                │ │
│ │                 │   "tie_word_embeddings": true,                         │ │
│ │                 │   "torch_dtype": "bfloat16",                           │ │
│ │                 │   "vocab_size": 49156                                  │ │
│ │                   },                                                     │ │
│ │                   "tie_word_embeddings": true,                           │ │
│ │                   "transformers_version": "4.49.0",                      │ │
│ │                   "use_image_newline_parameter": true,                   │ │
│ │                   "vision_config": {                                     │ │
│ │                 │   "hidden_act": "gelu_pytorch_tanh",                   │ │
│ │                 │   "hidden_size": 1152,                                 │ │
│ │                 │   "image_size": 384,                                   │ │
│ │                 │   "intermediate_size": 4304,                           │ │
│ │                 │   "layer_norm_eps": 1e-06,                             │ │
│ │                 │   "model_type": "siglip_vision_model",                 │ │
│ │                 │   "num_attention_heads": 16,                           │ │
│ │                 │   "num_hidden_layers": 27,                             │ │
│ │                 │   "patch_size": 14,                                    │ │
│ │                 │   "quantize": null                                     │ │
│ │                   },                                                     │ │
│ │                   "vision_feature_layer": [                              │ │
│ │                 │   -24,                                                 │ │
│ │                 │   -20,                                                 │ │
│ │                 │   -12,                                                 │ │
│ │                 │   -1                                                   │ │
│ │                   ],                                                     │ │
│ │                   "vision_feature_select_strategy": "full"               │ │
│ │                 }                                                        │ │
│ │        prefix = None                                                     │ │
│ │          self = LlavaNextForConditionalGeneration()                      │ │
│ │ vision_config = SiglipVisionConfig {                                     │ │
│ │                   "attention_dropout": 0.0,                              │ │
│ │                   "hidden_act": "gelu_pytorch_tanh",                     │ │
│ │                   "hidden_size": 1152,                                   │ │
│ │                   "image_size": 384,                                     │ │
│ │                   "intermediate_size": 4304,                             │ │
│ │                   "layer_norm_eps": 1e-06,                               │ │
│ │                   "model_type": "siglip_vision_model",                   │ │
│ │                   "num_attention_heads": 16,                             │ │
│ │                   "num_channels": 3,                                     │ │
│ │                   "num_hidden_layers": 27,                               │ │
│ │                   "patch_size": 14,                                      │ │
│ │                   "quantize": null,                                      │ │
│ │                   "transformers_version": "4.49.0"                       │ │
│ │                 }                                                        │ │
│ │       weights = <text_generation_server.utils.weights.Weights object at  │ │
│ │                 0x71fc54eaaa10>                                          │ │
│ ╰──────────────────────────────────────────────────────────────────────────╯ │
╰──────────────────────────────────────────────────────────────────────────────╯
TypeError: '<' not supported between instances of 'list' and 'int' rank=0
2025-03-28T14:33:51.791963Z ERROR text_generation_launcher: Shard 0 failed to start
2025-03-28T14:33:51.791989Z  INFO text_generation_launcher: Shutting down shards
Error: ShardCannotStart

It's caused by this check in text_generation_server/models/custom_modeling/llava_next.py:
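For reference, here is the check reconstructed from the Rich traceback above (the panel truncates the right edge, so the trailing "+ 1" arithmetic is inferred from the visible fragments and the surrounding comment):

# text_generation_server/models/custom_modeling/llava_next.py, around line 120
vision_config = config.vision_config
# Instead of selecting in hidden_states[-2].
# Instead compute only the n -2 + 1 layers and don't pool
if config.vision_feature_layer < 0:
    vision_config.num_hidden_layers += config.vision_feature_layer + 1
else:
    vision_config.num_hidden_layers = config.vision_feature_layer + 1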

The code expects vision_feature_layer to be a single number, but in Granite's config it's a list of values:

"vision_feature_layer": [
    -24,
    -20,
    -12,
    -1
],
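
This is easy to verify outside of TGI (a quick sketch, not part of the original report, using transformers' AutoConfig):

from transformers import AutoConfig

# Load the published config and inspect the field that TGI compares against 0.
config = AutoConfig.from_pretrained("ibm-granite/granite-vision-3.2-2b")
print(type(config.vision_feature_layer))  # <class 'list'>
print(config.vision_feature_layer)        # [-24, -20, -12, -1]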

I don't know whether Granite deviates from the intended schema or whether this is purely a TGI issue, but TGI would probably need to handle both cases; one possible direction is sketched below.
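
A minimal sketch of how the check could be generalized (hypothetical code, not the actual fix; a complete solution would also have to collect and merge the hidden states from each listed layer, not just size the vision tower):

def required_vision_layers(vision_feature_layer, num_hidden_layers):
    # How many vision-tower layers must be computed so that every
    # requested feature layer (an int or, as in Granite, a list of
    # ints, possibly negative) is available. Mirrors the int-only
    # arithmetic in llava_next.py.
    layers = vision_feature_layer
    if not isinstance(layers, (list, tuple)):
        layers = [layers]
    # A negative index counts from the end; -1 means the last layer,
    # so all num_hidden_layers layers must be computed for it.
    return max(
        num_hidden_layers + layer + 1 if layer < 0 else layer + 1
        for layer in layers
    )

# Granite's config: 27 hidden layers, vision_feature_layer [-24, -20, -12, -1]
# -> max(4, 8, 16, 27) = 27, i.e. the full vision tower is needed.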

Expected behavior

I would have expected the model to load without any issues, since it uses the LlavaNextForConditionalGeneration architecture.
