GPU inference for the latest open-weight models

We host high-demand LLMs on dedicated GPU capacity and sell inference through OpenAI-compatible APIs. Competitive per-token pricing, no prompt logging.

OpenAI-compatible

Standard /v1/chat/completions and /v1/completions endpoints. Drop-in replacement for any OpenAI client.
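A minimal sketch of what a request looks like, using only the Python standard library. The base URL, API key, and model identifier below are placeholders for illustration, not published values:

```python
import json
import urllib.request

# Placeholder values -- substitute the real base URL, key, and
# model ID from your account.
BASE_URL = "https://api.example.com/v1"
API_KEY = "YOUR_API_KEY"

def chat_request(model: str, messages: list) -> urllib.request.Request:
    """Build a standard /v1/chat/completions request (not yet sent).

    The payload shape is the usual OpenAI chat format, so any
    OpenAI-compatible client can produce the same request by
    pointing its base URL here.
    """
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = chat_request("example-model", [{"role": "user", "content": "Hello"}])
# Sending it with urllib.request.urlopen(req) returns the JSON completion.
```

With the official OpenAI client libraries, the same switch is just overriding the client's base URL and API key at construction time.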

Latest models

We track high-demand open-weight models with thin provider coverage and spin up capacity where supply falls short.

Dedicated GPUs

NVIDIA A100 and H100 capacity. No shared queues, no cold starts, predictable latency.

Transparent pricing

Per-token pricing published via API. No hidden fees, no prompt logging, no training on your data.

Models we host

We focus on models where demand is high and provider coverage is thin. Pricing is competitive with or below current market rates.

Model               | Architecture                      | Input / Output per 1M tokens
Qwen3.6 35B-A3B     | MoE · 3B active · 262K context    | $0.15 / $1.00
Gemma 4 27B         | Dense · 27B params · 128K context | $0.10 / $0.30
Mistral Small 4 24B | Dense · 24B params · 262K context | $0.15 / $0.60
Qwen3.6 27B         | Dense · 27B params · 262K context | $0.32 / $3.20
Gemma 4 26B-A4B     | MoE · 4B active · 262K context    | $0.06 / $0.33
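The per-1M-token rates above translate into per-request cost with simple arithmetic. A small Python sketch (keys are the display names from the table, not API model identifiers):

```python
# Per-1M-token rates (input USD, output USD), copied from the table above.
RATES = {
    "Qwen3.6 35B-A3B": (0.15, 1.00),
    "Gemma 4 27B": (0.10, 0.30),
    "Mistral Small 4 24B": (0.15, 0.60),
    "Qwen3.6 27B": (0.32, 3.20),
    "Gemma 4 26B-A4B": (0.06, 0.33),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request: tokens / 1M, times the per-1M rate."""
    in_rate, out_rate = RATES[model]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate
```

For example, 8,000 prompt tokens plus 1,000 completion tokens on Gemma 4 27B comes to 0.008 × $0.10 + 0.001 × $0.30 = $0.0011, about a tenth of a cent.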

Enterprise inquiries

Interested in dedicated capacity, volume pricing, or custom model hosting? Get in touch.

founder@lighterhub.app