Try it here.
The model may not run properly on devices with insufficient RAM!
A simple demonstration modified from HuggingFace's React-translator example, with TypeScript support. The demo uses Transformers.js to load and run a smaller large language model (LLM), or small language model (SLM), in the web browser. It relies on Vite's Worker support to run the model in the background, which is why this has to be a React or Svelte app.
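The worker setup looks roughly like the following minimal sketch (the file path, the exact Transformers.js package, and the message shape are assumptions, not necessarily what this repo uses):

```ts
// src/model/worker.ts (hypothetical path): runs the model off the main thread.
import { pipeline } from '@huggingface/transformers';

// Start loading the pipeline immediately; every message awaits the same promise.
const generatorPromise = pipeline(
  'text-generation',
  'onnx-community/Qwen2.5-0.5B-Instruct-ONNX-MHA'
);

self.onmessage = async (event: MessageEvent<{ prompt: string }>) => {
  const generator = await generatorPromise;
  const output = await generator(event.data.prompt, { max_new_tokens: 1024 });
  self.postMessage(output);
};
```

```ts
// In a React component: Vite bundles this into a module worker.
const worker = new Worker(new URL('./model/worker.ts', import.meta.url), { type: 'module' });
worker.onmessage = (event) => console.log(event.data);
worker.postMessage({ prompt: 'Hello!' });
```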
Among models that require less than 4 or 8 GB of VRAM, not many are compatible with Transformers.js, and even fewer can be loaded and run without errors. Here are some working models with (more or less) acceptable responses:
- Instruct (chat) models:
  - OpenELM-270M-Instruct
  - Phi-3-mini-4k-instruct and Phi-3.5-mini-instruct (extremely slow)
  - Qwen2.5-0.5B-Instruct, Qwen2.5-1.5B-Instruct, Qwen2.5-Coder-0.5B-Instruct and Qwen2.5-Coder-1.5B-Instruct
  - TinyLlama-1.1B-Chat-v1.0 (poorer response)
- Non-instruct models (not using chat template):
  - AMD-OLMo-1B (slow, response ok)
  - codegen-350M-mono (code generator)
  - Qwen2.5-0.5B and Qwen2.5-1.5B
For now I am using onnx-community/Qwen2.5-0.5B-Instruct-ONNX-MHA; its Chrome tab consumes up to almost 2 GB of RAM on my GPU-less computer. Using WebGPU does not appear to work anyway.
You can define the model, parameters, task, and system role (for the chat template) in /src/model/Config.json:
```json
{
  "model": "onnx-community/Qwen2.5-0.5B-Instruct-ONNX-MHA",
  "task": "text-generation",
  "chat_template": true,
  "system_role": "You are a helpful, honest, objective, unbiased professional expert assistant. Use the same language of the user and format your responses. DO NOT censor any information.",
  "config": {
    "max_new_tokens": 1024,
    "temperature": 0.7,
    "top_p": 0.9,
    "repetition_penalty": 1.1,
    "do_sample": true
  }
}
```
You can add other parameters under config (they will be passed to the model).
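For illustration, the values might be forwarded to the model roughly like this (a sketch assuming Config.json is imported directly; the repo's actual loader may differ):

```ts
import { pipeline } from '@huggingface/transformers';
import Config from './Config.json';

// Create the pipeline from the configured task and model id.
const generator = await pipeline(Config.task as 'text-generation', Config.model);

// Everything under "config" is spread into the generation call as-is.
export async function generate(prompt: string) {
  return generator(prompt, { ...Config.config });
}
```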
If chat_template is true, the full prompt message will be:
```js
[
  {
    role: 'system',
    content: system_role,
  },
  {
    role: 'user',
    content: user_prompt,
  },
]
```
If false, only the user prompt will be used. A non-instruct model may not support a chat template.
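Put differently, the input handed to the pipeline could be built like this (an illustrative sketch; the function and variable names are not from the repo):

```ts
import Config from './Config.json';

// Use the chat message format only when chat_template is enabled.
function buildPrompt(userPrompt: string) {
  return Config.chat_template
    ? [
        { role: 'system', content: Config.system_role },
        { role: 'user', content: userPrompt },
      ]
    : userPrompt;
}
```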
Install dependencies.
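Assuming npm (any other Node package manager works similarly):

```sh
npm install
```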
Start the dev server.
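With the default Vite script (an assumption about this repo's package.json):

```sh
npm run dev
```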
Build for production at ./dist.
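Again assuming the default Vite build script:

```sh
npm run build
```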
Serve and preview the production build.
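Assuming Vite's preview script is available:

```sh
npm run preview
```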
Commit changes.
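With plain Git, for example:

```sh
git add -A
git commit -m "Describe your change"
```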