GPU inference for the latest open-weight models

We host high-demand LLMs on dedicated GPU capacity and sell inference through OpenAI-compatible APIs. Competitive per-token pricing, no prompt logging.

OpenAI-compatible

Standard /v1/chat/completions and /v1/completions endpoints. Drop-in replacement for any OpenAI client.
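A minimal sketch of what a request looks like, using only the Python standard library. The base URL, API key, and model identifier below are placeholders for illustration, not published values:

```python
import json
import urllib.request

# Placeholder values -- substitute the real base URL, key, and
# model ID from your account.
BASE_URL = "https://api.example.com/v1"
API_KEY = "YOUR_API_KEY"

def chat_request(model: str, messages: list) -> urllib.request.Request:
    """Build a standard /v1/chat/completions request (not yet sent).

    The payload shape is the usual OpenAI chat format, so any
    OpenAI-compatible client can produce the same request by
    pointing its base URL here.
    """
    payload = {"model": model, "messages": messages}
    return urllib.request.Request(
        f"{BASE_URL}/chat/completions",
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {API_KEY}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

req = chat_request("example-model", [{"role": "user", "content": "Hello"}])
# Sending it with urllib.request.urlopen(req) returns the JSON completion.
```

With the official OpenAI client libraries, the same switch is just overriding the client's base URL and API key at construction time.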

Latest models

We track high-demand open-weight models with thin provider coverage and spin up capacity where supply falls short.

Dedicated GPUs

NVIDIA A100 and H100 capacity. No shared queues, no cold starts, predictable latency.

Transparent pricing

Per-token pricing published via API. No hidden fees, no prompt logging, no training on your data.

Models we host

We focus on models where demand is high and provider coverage is thin. Pricing is competitive with or below current market rates.

Model               | Architecture                      | Input / Output per 1M tokens
Qwen3.6 35B-A3B     | MoE · 3B active · 262K context    | $0.15 / $1.00
Gemma 4 27B         | Dense · 27B params · 128K context | $0.10 / $0.30
Mistral Small 4 24B | Dense · 24B params · 262K context | $0.15 / $0.60
Qwen3.6 27B         | Dense · 27B params · 262K context | $0.32 / $3.20
Gemma 4 26B-A4B     | MoE · 4B active · 262K context    | $0.06 / $0.33
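The per-1M-token rates above translate into per-request cost with simple arithmetic. A small Python sketch (keys are the display names from the table, not API model identifiers):

```python
# Per-1M-token rates (input USD, output USD), copied from the table above.
RATES = {
    "Qwen3.6 35B-A3B": (0.15, 1.00),
    "Gemma 4 27B": (0.10, 0.30),
    "Mistral Small 4 24B": (0.15, 0.60),
    "Qwen3.6 27B": (0.32, 3.20),
    "Gemma 4 26B-A4B": (0.06, 0.33),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """USD cost of one request: tokens / 1M, times the per-1M rate."""
    in_rate, out_rate = RATES[model]
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate
```

For example, 8,000 prompt tokens plus 1,000 completion tokens on Gemma 4 27B comes to 0.008 × $0.10 + 0.001 × $0.30 = $0.0011, about a tenth of a cent.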

Enterprise inquiries

Interested in dedicated capacity, volume pricing, or custom model hosting? Get in touch.

founder@lighterhub.app