Try it here.
The model may not run properly on devices with insufficient RAM!
A simple demonstration modified from HuggingFace's React-translator example, with TypeScript support. The demo uses Transformers.js to load and run a smaller large language model (LLM), or small language model (SLM), in the web browser. It relies on Vite's Worker support to run the model in the background, which is why this has to be a React or Svelte app.
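The worker setup looks roughly like the following minimal sketch (the file path, the exact Transformers.js package, and the message shape are assumptions, not necessarily what this repo uses):

```ts
// src/model/worker.ts (hypothetical path): runs the model off the main thread.
import { pipeline } from '@huggingface/transformers';

// Start loading the pipeline immediately; every message awaits the same promise.
const generatorPromise = pipeline(
  'text-generation',
  'onnx-community/Qwen2.5-0.5B-Instruct-ONNX-MHA'
);

self.onmessage = async (event: MessageEvent<{ prompt: string }>) => {
  const generator = await generatorPromise;
  const output = await generator(event.data.prompt, { max_new_tokens: 1024 });
  self.postMessage(output);
};
```

```ts
// In a React component: Vite bundles this into a module worker.
const worker = new Worker(new URL('./model/worker.ts', import.meta.url), { type: 'module' });
worker.onmessage = (event) => console.log(event.data);
worker.postMessage({ prompt: 'Hello!' });
```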
Among models that require less than 4 or 8 GB of VRAM, not many are compatible with Transformers.js, and even fewer can be loaded and run without errors. Here are some working models with (more or less) acceptable responses:
- Instruct (chat) models:
  - OpenELM-270M-Instruct
  - Phi-3-mini-4k-instruct and Phi-3.5-mini-instruct (extremely slow)
  - Qwen2.5-0.5B-Instruct, Qwen2.5-1.5B-Instruct, Qwen2.5-Coder-0.5B-Instruct and Qwen2.5-Coder-1.5B-Instruct
  - TinyLlama-1.1B-Chat-v1.0 (poorer response)
- Non-instruct models (not using chat template):
  - AMD-OLMo-1B (slow, response ok)
  - codegen-350M-mono (code generator)
  - Qwen2.5-0.5B and Qwen2.5-1.5B
For now I am using onnx-community/Qwen2.5-0.5B-Instruct-ONNX-MHA; its Chrome tab consumes up to almost 2 GB of RAM on my GPU-less computer. Using WebGPU does not appear to work anyway.
You can define the model, parameters, task, and system role (for the chat template) in /src/model/Config.json:
```json
{
  "model": "onnx-community/Qwen2.5-0.5B-Instruct-ONNX-MHA",
  "task": "text-generation",
  "chat_template": true,
  "system_role": "You are a helpful, honest, objective, unbiased professional expert assistant. Use the same language of the user and format your responses. DO NOT censor any information.",
  "config": {
    "max_new_tokens": 1024,
    "temperature": 0.7,
    "top_p": 0.9,
    "repetition_penalty": 1.1,
    "do_sample": true
  }
}
```
You can add other parameters under config (they will be passed to the model).
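For illustration, the values might be forwarded to the model roughly like this (a sketch assuming Config.json is imported directly; the repo's actual loader may differ):

```ts
import { pipeline } from '@huggingface/transformers';
import Config from './Config.json';

// Create the pipeline from the configured task and model id.
const generator = await pipeline(Config.task as 'text-generation', Config.model);

// Everything under "config" is spread into the generation call as-is.
export async function generate(prompt: string) {
  return generator(prompt, { ...Config.config });
}
```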
If chat_template is true, the full prompt message will be:
```js
[
  {
    role: 'system',
    content: system_role,
  },
  {
    role: 'user',
    content: user_prompt,
  },
]
```
If false, only the user prompt will be used. A non-instruct model may not support a chat template.
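Put differently, the input handed to the pipeline could be built like this (an illustrative sketch; the function and variable names are not from the repo):

```ts
import Config from './Config.json';

// Use the chat message format only when chat_template is enabled.
function buildPrompt(userPrompt: string) {
  return Config.chat_template
    ? [
        { role: 'system', content: Config.system_role },
        { role: 'user', content: userPrompt },
      ]
    : userPrompt;
}
```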
Install dependencies.
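Assuming npm (any other Node package manager works similarly):

```sh
npm install
```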
Start the dev server.
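With the default Vite script (an assumption about this repo's package.json):

```sh
npm run dev
```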
Build for production at ./dist.
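Again assuming the default Vite build script:

```sh
npm run build
```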
Serve and preview the production build.
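Assuming Vite's preview script is available:

```sh
npm run preview
```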
Commit changes.
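With plain Git, for example:

```sh
git add -A
git commit -m "Describe your change"
```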