We host high-demand LLMs on dedicated GPU capacity and sell inference through OpenAI-compatible APIs. Competitive per-token pricing, no prompt logging.
Standard /v1/chat/completions and /v1/completions endpoints. Drop-in replacement for any OpenAI client.
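Because the endpoints follow the OpenAI wire format, any HTTP client can talk to them. A minimal sketch of building a `/v1/chat/completions` request with the Python standard library; the base URL, API key, and model identifier below are placeholders, not confirmed values (with the official `openai` client you would simply pass the same base URL via `base_url=`):

```python
import json
import urllib.request

def build_chat_request(base_url, api_key, model, messages):
    """Build an OpenAI-compatible chat completion request (constructed, not sent)."""
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{base_url}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = build_chat_request(
    "https://api.lighterhub.app/v1",   # hypothetical base URL
    "YOUR_API_KEY",
    "qwen3.6-35b-a3b",                 # hypothetical model id
    [{"role": "user", "content": "Hello"}],
)
print(req.full_url)  # → https://api.lighterhub.app/v1/chat/completions
```

Sending the request with `urllib.request.urlopen(req)` (or swapping in the `openai` client) returns the standard chat completion response shape.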
We track high-demand open-weight models with thin provider coverage and spin up capacity where supply is needed.
NVIDIA A100 and H100 capacity. No shared queues, no cold starts, predictable latency.
Per-token pricing published via API. No hidden fees, no prompt logging, no training on your data.
We focus on models where demand is high and provider coverage is thin. Pricing is competitive with or below current market rates.
| Model | Input / Output per 1M tokens |
|---|---|
| Qwen3.6 35B-A3B | $0.15 / $1.00 |
| Gemma 4 27B | $0.10 / $0.30 |
| Mistral Small 4 24B | $0.15 / $0.60 |
| Qwen3.6 27B | $0.32 / $3.20 |
| Gemma 4 26B-A4B | $0.06 / $0.33 |
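Since pricing is split between input and output tokens, request cost is a simple weighted sum. A sketch of estimating cost from the table above (the helper function is illustrative, not part of the API):

```python
# USD per 1M tokens: (input, output), taken from the pricing table above.
PRICES = {
    "Qwen3.6 35B-A3B": (0.15, 1.00),
    "Gemma 4 27B": (0.10, 0.30),
}

def cost_usd(model, input_tokens, output_tokens):
    """Estimate the cost of one request from per-1M-token prices."""
    price_in, price_out = PRICES[model]
    return (input_tokens * price_in + output_tokens * price_out) / 1_000_000

# 200k prompt tokens + 50k completion tokens on Qwen3.6 35B-A3B:
print(round(cost_usd("Qwen3.6 35B-A3B", 200_000, 50_000), 4))  # → 0.08
```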
Interested in dedicated capacity, volume pricing, or custom model hosting? Get in touch.
founder@lighterhub.app