chore: switch images of llama.cpp to the RamaLama images #2708
Conversation
There was an image pull error mentioned regarding libkrun, but when I tested it on our infra, it seems to be working. I haven't tried with this PR yet; I'll have a look tomorrow. #2712
Note: I tested on macOS with a libkrun podman machine; I haven't tested on Windows.
Linux native:
- Documentation is accessible
- Playground is working as expected
- /v1/chat/completion works well from Postman (a minimal request sketch follows this list)
- Model refuses to start: MaziyarPanahi/Mistral-7B-Instruct-v0.3.Q4_K_M

Also, I am not getting model metrics with bartowski/granite-3.1-8b-instruct-GGUF; it works with TheBloke/Mistral-7B-Instruct-v0.2-GGUF.
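For reference, the kind of request used for that check — a minimal sketch assuming the playground's inference server listens on localhost:8080 (the port and model name here are assumptions, not taken from this PR):

```typescript
// Hedged sketch: POST to the OpenAI-compatible chat completions endpoint exposed by llama-server.
const response = await fetch('http://localhost:8080/v1/chat/completions', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'mistral-7b-instruct', // model name is illustrative
    messages: [{ role: 'user', content: 'Say hello in one sentence.' }],
  }),
});
const data = await response.json();
console.log(data.choices[0].message.content);
```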
For the crash: the chat template is not available. The current list is chatglm3, chatglm4, chatml, command-r, deepseek, deepseek2, exaone3, gemma, granite, llama2, llama2-sys, llama2-sys-bos, llama2-sys-strip, llama3, minicpm, mistral-v1, mistral-v3, mistral-v3-tekken, mistral-v7, monarch, openchat, orion, phi3, rwkv-world, vicuna, vicuna-orca, zephyr; the wanted template is not in it.
The template for function calling was updated in the latest commit.
We might need some enhancement in RamaLama to start the server with that template (see the sketch below).
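For context, llama.cpp's llama-server already exposes flags for this; a minimal sketch of how the server arguments could carry an explicit template (the argument list and model path are assumptions, only the flag names come from llama.cpp's CLI):

```typescript
// Sketch only: extra llama-server arguments selecting a built-in chat template.
// '--chat-template' picks one of the templates listed above; '--jinja' enables
// Jinja template rendering, which the function-calling template relies on.
// Whether/how RamaLama forwards these arguments is an assumption here.
const serverArgs: string[] = [
  '--model', '/models/mistral-7b-instruct-v0.3.Q4_K_M.gguf', // hypothetical path
  '--chat-template', 'mistral-v3',
  '--jinja',
];
```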
LGTM on Windows without a GPU
Rebased, with temporary images for now (until the next release of RamaLama is out), just to try the PR. It includes this change from RamaLama: containers/ramalama#1053. I'm able to run the functional programming recipe.
It should be OK now (except that I'll replace the custom image with the official one on the next RamaLama release).
On Windows, running the model MaziyarPanahi/Mistral-7B-Instruct-v0.3.Q4_K_M, the server is not starting.
Logs:
{"msg":"exec container process `/usr/bin/llama-server.sh`: Exec format error","level":"error","time":"2025-03-28T14:30:48.439048Z"}
Image used: quay.io/fbenoit/ramalama-llama-server:jinja-2025-03-27
@axel7083 I think this is expected for now, as my image is arm64 only. I'm waiting for the official image (next release).
Good to know, tag me when I need to test again 👍
@axel7083 the 0.7.2 images of RamaLama became available a few minutes ago, so I switched to these images.
The CPU image is working; however, the CUDA image does not start and gives the following error:
chmod: cannot access './run.sh': No such file or directory
Sorry it took so long to review, the image is 6.75 GB and my internet is not great.
Forgot to mention here that I switched to the respin of the images yesterday evening.
@axel7083 looks like run.sh comes from AI Lab itself: podman-desktop-extension-ai-lab/packages/backend/src/workers/provider/LlamaCppPython.ts, lines 158 to 162 in 9fa9843.
But the image has no run.sh script anyway.
I updated the RamaLama images. I also updated the entrypoint to be used and removed the chmod operation, as the script already has the correct permissions. It should now work on Windows and Linux (a rough sketch of the idea is below).
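A hypothetical illustration of that change, not the extension's actual code; the option shape and function name are assumptions:

```typescript
// Sketch: container create options for the playground server container.
// Before the change, the command chmod-ed and ran a run.sh shipped by the old images,
// which fails on the RamaLama images since they contain no run.sh.
// After the change, no Cmd/Entrypoint override is set, so the image's own
// llama-server entrypoint is used directly.
interface ContainerCreateOptions {
  Image: string;
  Cmd?: string[];
  Entrypoint?: string[];
}

function createServerContainerOptions(image: string): ContainerCreateOptions {
  return { Image: image }; // rely on the image entrypoint; no chmod, no run.sh
}
```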
"cuda": "ghcr.io/containers/podman-desktop-extension-ai-lab-playground-images/ai-lab-playground-chat-cuda@sha256:e4b57e52c31b379b4a73f8e9536bc130fdea665d88dbd05643350295b3402a2f", | ||
"vulkan": "ghcr.io/containers/podman-desktop-extension-ai-lab-playground-images/ai-lab-playground-chat-vulkan@sha256:6a93b247099643f4f8c78ee9896c2ce4e9a455af114a69be09c16ad36aa51fd2" | ||
"default": "quay.io/ramalama/ramalama-llama-server@sha256:cbadb36fbbc2abf9867a33e6dfe3f2df4a76774259b5d4d24d50f4fc7e525406", | ||
"cuda": "quay.io/ramalama/cuda-llama-server@sha256:cbadb36fbbc2abf9867a33e6dfe3f2df4a76774259b5d4d24d50f4fc7e525406" |
Can't pull this image?
> podman pull quay.io/ramalama/cuda-llama-server@sha256:cbadb36fbbc2abf9867a33e6dfe3f2df4a76774259b5d4d24d50f4fc7e525406
Trying to pull quay.io/ramalama/cuda-llama-server@sha256:cbadb36fbbc2abf9867a33e6dfe3f2df4a76774259b5d4d24d50f4fc7e525406...
Error: initializing source docker://quay.io/ramalama/cuda-llama-server@sha256:cbadb36fbbc2abf9867a33e6dfe3f2df4a76774259b5d4d24d50f4fc7e525406: reading manifest sha256:cbadb36fbbc2abf9867a33e6dfe3f2df4a76774259b5d4d24d50f4fc7e525406 in quay.io/ramalama/cuda-llama-server: manifest unknown
🤦 the tag has been overridden, so the sha is no longer there; updating...
Seems that the CUDA image is a 404?
@axel7083 updated the sha, hoping the tag won't be overwritten today.
Okay, after testing with the CUDA image on Windows 11 (WSL2), the current configuration is not able to use the GPU, due to the GPU config.
Deep dive
We need to change the default value for the GPU layers (currently the default is -1 when undefined). We need to update the following to replace -1 with 99:
gpuLayers: options.gpuLayers ?? -1,
We also need to update the following comment:
podman-desktop-extension-ai-lab/packages/shared/src/models/InferenceServerConfig.ts, line 52 in d302152:
* -1 to offload all the layers
By changing the above, I was able to make the inference go fast fast fast.
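A minimal sketch of the default change discussed above (the option shape is taken from the quoted snippet; the helper name and the idea that 99 is "large enough to offload all layers" are assumptions):

```typescript
// Hedged sketch: with the RamaLama llama-server images, -1 no longer means
// "offload everything", so a large explicit default is used instead.
const GPU_LAYERS_OFFLOAD_ALL = 99; // assumption: >= the model's layer count

function resolveGpuLayers(options: { gpuLayers?: number }): number {
  return options.gpuLayers ?? GPU_LAYERS_OFFLOAD_ALL; // previously `?? -1`
}
```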
OK, so basically I had to apply the env for libkrun, but we need that for Windows/NVIDIA as well.
Yes, before it was working because -1 maxed out the offload, but now -1 gives 0 offloaded layers.
@axel7083 PR amended.
LGTM
LGTM.
Tested some recipes and models (OK, not all combinations) without issues on Windows.
LGTM, RamaLama inference starts.
What does this PR do?
Switch to the RamaLama images.
Use the default images for macOS with gpuLayers=999 (as RamaLama does for libkrun/macOS) rather than the Vulkan images (the Vulkan images are failing with libkrun, which is unstable on my end); a rough sketch of this selection is below.
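A minimal sketch of the image/offload selection described above; the function name, the GPU detection flag, and the non-macOS defaults are assumptions — only the macOS behaviour (default image, gpuLayers=999) and the image keys come from this PR:

```typescript
// Hedged sketch of choosing an image and GPU-layer count per platform.
const images = {
  default: 'quay.io/ramalama/ramalama-llama-server@sha256:…', // digests as in the config shown above
  cuda: 'quay.io/ramalama/cuda-llama-server@sha256:…',
};

function pickPlaygroundImage(platform: NodeJS.Platform, hasNvidiaGpu: boolean): { image: string; gpuLayers: number } {
  if (platform === 'darwin') {
    // macOS/libkrun: default image, offload all layers (as RamaLama does).
    return { image: images.default, gpuLayers: 999 };
  }
  // Elsewhere: CUDA image when an NVIDIA GPU is available (gpuLayers values are assumptions).
  return hasNvidiaGpu
    ? { image: images.cuda, gpuLayers: 99 }
    : { image: images.default, gpuLayers: 0 };
}
```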
Screenshot / video of UI
What issues does this PR fix or reference?
fixes #2630
How to test this PR?
Try to start a playground/service for a model.