List Models

Available Models

Returns a list of all Eburon AI models currently deployed on this VPS. Model names are aliased to precious metals for privacy.

GET /api/tags

Response

models array

Array of model objects
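A typical response has the following shape. Field names follow the Ollama /api/tags schema; the model name, size, and timestamp shown here are illustrative:

```json
{
  "models": [
    {
      "name": "qwen3.5:latest",
      "size": 4661211808,
      "modified_at": "2025-01-01T00:00:00Z",
      "details": { "family": "qwen", "parameter_size": "7B" }
    }
  ]
}
```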

Generate Completion

Generate a text completion from a prompt. Use for single-turn tasks like reasoning, writing, or code generation.

POST /api/generate

Parameters

model required

Selected model name

prompt required

The prompt to generate from

system string

System prompt override

stream boolean

Stream tokens as they're generated

options object

temperature, top_p, top_k, num_predict (max tokens), seed
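The sampling parameters above are passed as a nested options object. A minimal request body might look like this (the model name and values are illustrative, not defaults):

```python
# Build a /api/generate request body; sampling knobs go inside "options".
payload = {
    "model": "qwen3.5:latest",  # any model listed by /api/tags
    "prompt": "Write a haiku about the sea.",
    "stream": False,  # return one JSON object instead of a token stream
    "options": {
        "temperature": 0.7,  # higher = more random sampling
        "top_p": 0.9,        # nucleus sampling cutoff
        "top_k": 40,         # sample only from the top-k tokens
        "num_predict": 128,  # max tokens to generate
        "seed": 42,          # fixed seed for reproducible output
    },
}
```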

Chat Completion

Generate the next message in a conversation. Maintains context across multiple turns for coherent dialogues.

POST /api/chat

Parameters

model required

Selected model name

messages required

Array of {role: "system"|"user"|"assistant", content: "..."}

stream boolean

Stream tokens as they're generated

options object

temperature, top_p, num_predict, stop (stop sequences)
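The endpoint itself is stateless: multi-turn context is maintained by resending the full message history on every call. A sketch of that loop (model name and option values are illustrative):

```python
# Multi-turn chat: the full history is resent with each request.
history = [{"role": "system", "content": "You are a concise assistant."}]

def build_chat_request(user_text, history):
    """Append the user turn and return a /api/chat request body."""
    history.append({"role": "user", "content": user_text})
    return {
        "model": "qwen3.5:latest",
        "messages": history,
        "stream": False,
        "options": {"temperature": 0.3, "stop": ["\n\n"]},
    }

req = build_chat_request("What is an embedding?", history)
# After the server responds, store the assistant turn so the next
# request carries the whole conversation:
history.append({"role": "assistant", "content": "...model reply..."})
```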

Create Embedding

Generate vector embeddings for text. Use with vector databases for RAG, semantic search, and similarity matching.

POST /api/embeddings

Parameters

model required

Embedding model name

prompt required

Text to embed into a vector
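Similarity matching then compares the returned vectors, most commonly by cosine similarity. A minimal pure-Python sketch (the vectors below are toy values standing in for real /api/embeddings output):

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

# Toy vectors standing in for embedding output:
v1 = [0.1, 0.9, 0.0]
v2 = [0.2, 0.8, 0.1]
print(cosine_similarity(v1, v2))  # close to 1.0 -> similar texts
```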

Pull a Model

Download a model from the Ollama library. Progress is streamed back as newline-delimited JSON objects.

POST /api/pull

Parameters

name required

Model to pull (e.g., llama3:8b)

stream boolean

Stream progress updates

Model Info

Get detailed information about a specific model.

POST /api/show

Parameters

name required

Full model name

Delete Model

Remove a model to free up disk space.

DELETE /api/delete

Parameters

name required

Full model name to delete

Ollama CLI

Direct Ollama commands available on this VPS.

Available Commands

ollama list

List all installed models

ollama pull <model>

Download a model from the library

ollama show <model>

Show model details

ollama run <model>

Run model interactively

ollama create <name> -f <Modelfile>

Create model from Modelfile

ollama cp <src> <dst>

Copy a model

ollama rm <model>

Delete a model
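The ollama create command above expects a Modelfile. A minimal sketch (the base model, parameter value, and system prompt are illustrative):

```
# Modelfile: derive a custom model from an existing base
FROM qwen3.5:latest
PARAMETER temperature 0.3
SYSTEM "You are a concise assistant."
```

Build it with: ollama create my-assistant -f Modelfile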

cURL Examples

Using cURL to interact with the API from the command line.

Examples

# List models

curl https://llm.eburon.ai/api/tags

# Chat completion

curl -X POST https://llm.eburon.ai/api/chat \
-H "Content-Type: application/json" \
-d '{"model":"qwen3.5:latest","messages":[{"role":"user","content":"Hello"}]}'

# Generate with streaming

curl -X POST https://llm.eburon.ai/api/generate \
-H "Content-Type: application/json" \
-d '{"model":"minimax-m2.7:cloud","prompt":"Hi","stream":true}'

# Create embedding

curl -X POST https://llm.eburon.ai/api/embeddings \
-H "Content-Type: application/json" \
-d '{"model":"embeddinggemma:latest","prompt":"Hello world"}'

Python SDK

Using Python with httpx to call the API.

Example Code

import httpx

client = httpx.Client()

# Chat
resp = client.post("https://llm.eburon.ai/api/chat",
    json={"model": "qwen3.5:latest",
          "messages": [{"role": "user", "content": "Hello"}]})
print(resp.json()["message"]["content"])

# Embedding
resp = client.post("https://llm.eburon.ai/api/embeddings",
    json={"model": "embeddinggemma:latest",
          "prompt": "Hello world"})
print(resp.json()["embedding"][:5])
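With stream set to true, the server returns one JSON object per line rather than a single response. A sketch of collecting those chunks; the sample lines below are illustrative, not a recorded response:

```python
import json

def collect_stream(lines):
    """Join the content fields of newline-delimited JSON chat chunks."""
    parts = []
    for line in lines:
        if not line.strip():
            continue
        chunk = json.loads(line)
        parts.append(chunk.get("message", {}).get("content", ""))
        if chunk.get("done"):
            break
    return "".join(parts)

# With httpx this would be fed by the response line iterator:
#   with client.stream("POST", "https://llm.eburon.ai/api/chat", json=body) as r:
#       reply = collect_stream(r.iter_lines())
sample = [
    '{"message": {"role": "assistant", "content": "Hel"}, "done": false}',
    '{"message": {"role": "assistant", "content": "lo"}, "done": true}',
]
print(collect_stream(sample))  # Hello
```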

Node.js SDK

Using Node.js's built-in fetch (or axios) to call the API.

Example Code

const resp = await fetch('https://llm.eburon.ai/api/chat', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    model: 'qwen3.5:latest',
    messages: [{role: 'user', content: 'Hello'}]
  })
});
const data = await resp.json();
console.log(data.message.content);