NuStudio

Inference gateway

models.nuscale.io · OpenAI-compatible API · vLLM

Production LLM endpoints aligned with the NuStudio stack. Use the base URLs below from browsers, IDEs, and automation.

Base URLs

Role                             Base URL
General instruct (prod-01)       https://models.nuscale.io/v1
Coder / agents (prod-02, vLLM)   https://models.nuscale.io/coder/v1

Set OPENAI_BASE_URL or client base_url to one of the above (keep the /v1 segment).
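As a minimal sketch of the base URL rule above, the following Python reads OPENAI_BASE_URL and joins API paths onto it while refusing a base that has lost its /v1 segment. The helper names (resolve_base_url, endpoint) and the fallback to the general instruct URL when the variable is unset are assumptions made for illustration, not part of the gateway.

```python
import os

# Assumption: fall back to the general instruct endpoint when the variable is unset.
DEFAULT_BASE_URL = "https://models.nuscale.io/v1"


def resolve_base_url(env=None):
    """Return the configured base URL without a trailing slash."""
    env = os.environ if env is None else env
    return env.get("OPENAI_BASE_URL", DEFAULT_BASE_URL).rstrip("/")


def endpoint(base_url, path):
    """Join a base URL and an API path, rejecting bases that lost /v1."""
    base = base_url.rstrip("/")
    if not base.endswith("/v1"):
        raise ValueError(f"base URL must end with /v1, got {base_url!r}")
    return f"{base}/{path.lstrip('/')}"


print(endpoint("https://models.nuscale.io/coder/v1", "chat/completions"))
```

Failing early on a missing /v1 is a design choice for this sketch: a request sent to the wrong path would otherwise surface only as an HTTP error from the gateway.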

OpenAI API (coder): Set OPENAI_BASE_URL=https://models.nuscale.io/coder/v1 and use model qwen2.5-coder:7b. vLLM returns structured tool_calls that Qwen Code and Cursor agents can consume directly, whereas Ollama often emits the tool JSON as plain text in content.
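To make the tool_calls point concrete, here is a hedged Python sketch of a request body that offers one tool to the coder model, plus a helper that reads structured calls back out of a response. It follows the standard OpenAI chat completions shape. The get_weather tool, the helper names, and the sample response are hypothetical and hand-written for illustration; the sample is not captured gateway output.

```python
import json

# Hypothetical tool definition in the OpenAI "tools" format; get_weather is
# an illustrative name, not something the gateway provides.
WEATHER_TOOL = {
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Look up the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}


def build_tool_request(prompt):
    """Body for POST https://models.nuscale.io/coder/v1/chat/completions."""
    return {
        "model": "qwen2.5-coder:7b",
        "messages": [{"role": "user", "content": prompt}],
        "tools": [WEATHER_TOOL],
    }


def extract_tool_calls(response):
    """Return (name, arguments) pairs from the first choice, or [] if none."""
    message = response["choices"][0]["message"]
    calls = message.get("tool_calls") or []
    return [
        (call["function"]["name"], json.loads(call["function"]["arguments"]))
        for call in calls
    ]


# Hand-written response in the OpenAI shape, for illustration only.
sample = {
    "choices": [{
        "message": {
            "role": "assistant",
            "content": None,
            "tool_calls": [{
                "id": "call_0",
                "type": "function",
                "function": {"name": "get_weather", "arguments": "{\"city\": \"Oslo\"}"},
            }],
        }
    }]
}
print(extract_tool_calls(sample))
```

When a backend prints the tool JSON inside content instead, tool_calls is missing and extract_tool_calls returns an empty list, which is the failure mode agents run into.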

Models

Served name (model field)   Weights
qwen2.5-7b-instruct         Qwen2.5-7B-Instruct-AWQ
qwen2.5-coder:7b            Qwen2.5-Coder-7B-Instruct-AWQ (vLLM, tool parser enabled)

Run GET …/v1/models on each base URL for live limits and metadata.

List models

Instruct pool

curl -sS https://models.nuscale.io/v1/models | python3 -m json.tool

Coder (prod-02)

curl -sS https://models.nuscale.io/coder/v1/models | python3 -m json.tool
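For automation, the same listing can be sketched in Python with only the standard library. The function names are illustrative. The max_model_len field is an assumption: vLLM model entries usually carry it, and the code tolerates its absence on other backends.

```python
import json
import urllib.request


def models_request(base_url):
    """Build (but do not send) a GET request for <base_url>/models."""
    return urllib.request.Request(f"{base_url.rstrip('/')}/models", method="GET")


def summarize_models(payload):
    """Map each served model id to its max_model_len, or None if absent."""
    return {m["id"]: m.get("max_model_len") for m in payload.get("data", [])}


def list_models(base_url, timeout=10):
    """Send the request and summarize the JSON reply (performs network I/O)."""
    with urllib.request.urlopen(models_request(base_url), timeout=timeout) as resp:
        return summarize_models(json.load(resp))


# Building the request is offline; call list_models(...) to query the gateway.
print(models_request("https://models.nuscale.io/coder/v1").full_url)
```

Calling list_models("https://models.nuscale.io/v1") and list_models("https://models.nuscale.io/coder/v1") would query each pool in turn.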

Chat completions

General instruct

curl -sS https://models.nuscale.io/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-7b-instruct",
    "messages": [{"role": "user", "content": "Say hello in one sentence."}],
    "max_tokens": 128,
    "temperature": 0.7
  }'

Coder (prod-02)

curl -sS https://models.nuscale.io/coder/v1/chat/completions \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen2.5-coder:7b",
    "messages": [{"role": "user", "content": "Write a Python function that returns the sum of two integers."}],
    "max_tokens": 512,
    "temperature": 0.3
  }'
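The two curl calls above differ only in base URL, model name, and sampling options, so a script can share one helper. The following is a minimal standard-library sketch. The function names and the 60-second timeout are assumptions, and no Authorization header is sent because the curl examples do not use one.

```python
import json
import urllib.request


def chat_request(base_url, model, prompt, max_tokens=128, temperature=0.7):
    """Build (but do not send) a POST request for <base_url>/chat/completions."""
    body = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "max_tokens": max_tokens,
        "temperature": temperature,
    }
    return urllib.request.Request(
        f"{base_url.rstrip('/')}/chat/completions",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def reply_text(response):
    """Extract the assistant text from an OpenAI-style completion response."""
    return response["choices"][0]["message"]["content"]


def chat(base_url, model, prompt, **options):
    """Send the request and return the reply text (performs network I/O)."""
    req = chat_request(base_url, model, prompt, **options)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return reply_text(json.load(resp))


# Mirrors the coder curl example; building the request stays offline.
req = chat_request(
    "https://models.nuscale.io/coder/v1",
    "qwen2.5-coder:7b",
    "Write a Python function that returns the sum of two integers.",
    max_tokens=512,
    temperature=0.3,
)
print(req.full_url)
```

Swapping in https://models.nuscale.io/v1 and qwen2.5-7b-instruct reproduces the general instruct call with the default max_tokens and temperature shown there.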

Notes