RKLLAMA now implements an Ollama-compatible API, providing an interface that matches key Ollama endpoints. This enables integration with many tools and scripts designed for Ollama's API, particularly those using the chat and generate functionality.
| Endpoint | Method | Description | Status |
|---|---|---|---|
| `/api/tags` | GET | List available models | ✅ |
| `/api/version` | GET | Get API version (returns a dummy version to satisfy some client apps) | ✅ |
| `/api/show` | POST | Show model information | ✅ |
| `/api/create` | POST | Create a model from a Modelfile | ⚠️ Basic implementation |
| `/api/pull` | POST | Pull a model | ⚠️ Basic implementation |
| `/api/delete` | DELETE | Delete a model | ✅ |
| `/api/generate` | POST | Generate a completion | ✅ |
| `/api/chat` | POST | Generate a chat completion | ✅ |
| `/api/embeddings` | POST | Generate embeddings | ❌ Not implemented |
| `/api/embed` | POST | Generate embeddings | ❌ Not implemented |
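As an example of the informational endpoints, model details can be requested from `/api/show`. The sketch below assumes the Ollama-style `name` field in the payload; check your version if the field differs:

```bash
# Hypothetical example: request details for an installed model.
# The "name" field follows Ollama's /api/show convention and is an
# assumption here, not confirmed by the RKLLAMA docs.
curl -X POST http://localhost:8080/api/show -d '{
  "name": "qwen2.5:3b"
}'
```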
The `/api/chat` endpoint uses a chat-style format with message roles and is best suited to multi-turn conversations:
```bash
curl -X POST http://localhost:8080/api/chat -d '{
  "model": "qwen2.5:3b",
  "messages": [
    {"role": "user", "content": "Hello, how are you today?"}
  ]
}'
```
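Because the endpoint accepts the full message history, a multi-turn conversation is expressed by replaying the earlier turns in the `messages` array. A sketch (the `assistant` content is illustrative, not real model output):

```bash
# Multi-turn sketch: prior turns are resent with each request.
# The assistant message below is illustrative placeholder text.
curl -X POST http://localhost:8080/api/chat -d '{
  "model": "qwen2.5:3b",
  "messages": [
    {"role": "user", "content": "Hello, how are you today?"},
    {"role": "assistant", "content": "I am doing well, thank you!"},
    {"role": "user", "content": "Can you summarize our chat so far?"}
  ]
}'
```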
The `/api/generate` endpoint is used for single-turn completions based on a prompt:
```bash
curl -X POST http://localhost:8080/api/generate -d '{
  "model": "qwen2.5:3b",
  "prompt": "Write a poem about AI"
}'
```
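Streaming is the Ollama default; assuming RKLLAMA mirrors this, a single JSON response can be requested with `"stream": false` (the same flag the troubleshooting notes below suggest):

```bash
# Request a single, non-streamed JSON response.
# "stream": false follows the Ollama API convention.
curl -X POST http://localhost:8080/api/generate -d '{
  "model": "qwen2.5:3b",
  "prompt": "Write a poem about AI",
  "stream": false
}'
```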
To list the models available on the server:

```bash
curl http://localhost:8080/api/tags
```
RKLLAMA automatically detects whether you're using an RK3588 or RK3576 platform. If detection fails, you'll be prompted to select your CPU model:
```
CPU model not detected automatically.
Please select your CPU model:
1) rk3588
2) rk3576
Enter selection (1-2):
```
- Model Format: RKLLAMA uses `.rkllm` files optimized for Rockchip NPUs, not Ollama's model format
- Modelfile Requirements: RKLLAMA Modelfiles require a `HUGGINGFACE_PATH` parameter (see the sketch after this list)
- NPU Acceleration: RKLLAMA runs on NPU hardware rather than the CPU/GPU
- External Tokenizers: RKLLAMA uses HuggingFace tokenizers fetched at runtime
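A minimal Modelfile sketch follows. Only the `HUGGINGFACE_PATH` requirement is stated by this document; the `FROM` line pointing at a local `.rkllm` file is an assumption for illustration, so consult your installed version for the authoritative syntax:

```bash
# Hypothetical RKLLAMA Modelfile sketch.
# HUGGINGFACE_PATH is the documented requirement; the FROM line and the
# specific model names here are illustrative assumptions.
FROM="qwen2.5-3b-instruct.rkllm"
HUGGINGFACE_PATH="Qwen/Qwen2.5-3B-Instruct"
```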
For troubleshooting purposes, RKLLAMA includes an optional debug mode that can be enabled when starting the server:
```bash
# Using the rkllama command
rkllama serve --debug

# Or directly with the server script
bash ~/RKLLAMA/server.sh --debug
```
When debug mode is enabled:
- Detailed logs are written to `~/RKLLAMA/rkllama_debug.log`
- Additional diagnostic information is displayed in the console
- A special `/api/debug` endpoint becomes available (advanced users only)
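To follow the debug log live while reproducing an issue:

```bash
# Watch the debug log in real time
tail -f ~/RKLLAMA/rkllama_debug.log
```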
Debug mode is entirely optional and not needed for normal operation.
Recent updates have significantly improved streaming reliability:
- Enhanced "done" signaling for proper stream completion
- Fixed token tracking across streaming sessions
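In the Ollama streaming format, each chunk is a JSON line and the final chunk carries `"done": true`; the shape shown in the comments below is illustrative, assuming RKLLAMA follows that convention:

```bash
# Stream a completion; -N disables curl's output buffering.
# Illustrative chunk shape, assuming the Ollama convention:
#   {"model":"qwen2.5:3b","response":"...","done":false}
#   {"model":"qwen2.5:3b","response":"","done":true}
curl -N -X POST http://localhost:8080/api/generate -d '{
  "model": "qwen2.5:3b",
  "prompt": "Write a poem about AI"
}'
```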
- Only the core endpoints (`/api/chat` and `/api/generate`) are fully implemented
- The Pull and Create endpoints have basic implementations
- Embeddings API is not currently implemented
- Some advanced Ollama formatting features are not yet supported
- Not all Ollama clients have been tested for compatibility
- OpenAI API compatibility is still in development
- Check that the server is running and accessible
- Verify models are properly loaded before making requests
- Try non-streaming requests (`"stream": false`) if streaming has issues
- Ensure your Modelfile includes the required `HUGGINGFACE_PATH` parameter
- For advanced troubleshooting, enable debug mode with the `--debug` flag (see the quick checks after this list)
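Two quick checks, using only the endpoints and flags described above:

```bash
# 1. Confirm the server is up: /api/tags should return the model list.
curl -s http://localhost:8080/api/tags

# 2. Bypass streaming to isolate streaming-specific issues.
curl -X POST http://localhost:8080/api/generate -d '{
  "model": "qwen2.5:3b",
  "prompt": "ping",
  "stream": false
}'
```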