Free trial
1M tokensRequest free Qwen3.6 test tokens. Best for checking quality, integration fit, context behavior, and cost before buying.
Try before a sales call. Request 1M free trial tokens for Qwen3.6, compare open-weight model economics, then scale to reserved NVIDIA GPU capacity only when the workload earns it.
Rough workload details are enough. Sensitive and large deployments are reviewed for model license, GPU availability, jurisdiction, safety, and compliance fit. Card, invoice, USDC, and USDT payment can be discussed for approved customers where permitted.
Use the path that matches your urgency. Developers can test first, prepaid buyers can top up directly, and urgent Qwen3.6 workloads can request a guided pass before committing to reserved capacity.
Request free Qwen3.6 test tokens. Best for checking quality, integration fit, context behavior, and cost before buying.
Usage-based API credits for teams that want direct billing without seats, subscriptions, or a sales call.
Seven-day guided access for urgent Qwen3.6 workloads. Includes $75 usage credit and setup help for your toolchain.
Use RapidAPI when marketplace billing, app keys, and procurement are easier than direct prepaid credits.
Different buyers use the same NVIDIA-backed inference stack through the path that matches their budget, procurement, privacy, and speed requirements.
Privacy, reserved capacity, benchmarked cost, and latency review.
Fast integration for startups and small businesses with predictable token pricing.
Request model help, credits, or educational access review.
Research labs and nonprofits can ask for model guidance, subject to capacity.
Receive a model recommendation, rough cost estimate, and suggested access path.
The fastest path is a small workload test, not a long sales call. Send rough details, request 1M free trial tokens, and use the result to decide whether shared API, RapidAPI, or reserved GPU capacity makes sense.
OpenRouter floor prices are shown as public-market benchmarks. LighterHub public API pricing uses per-token billing; reserved capacity is quoted after benchmark.
Pricing audited against OpenRouter's public model API on May 10, 2026. Cache discounts are shown only for models where OpenRouter exposes cached-read pricing.
| Model | Best for | Status | NVIDIA A100 80GB fit | Context | OpenRouter floor price | Why LighterHub would host it | Caveat |
|---|---|---|---|---|---|---|---|
| Qwen3.6 35B-A3B FP8qwen/qwen3.6-35b-a3b · served with Qwen/Qwen3.6-35B-A3B-FP8 | Long-context RAG, document automation, general agents | Current route | Proven on 1x NVIDIA A100 80GB FP8 | 262K | $0.15 in / $1.00 out / $0.05 cache | Current route with validated context and cache-aware pricing. | Aggregate tok/s is not per-request latency. |
| Qwen3.5 35B-A3BQwen/Qwen3.5-35B-A3B | Fallback Qwen-family long-context workloads | Benchmark-ready | Likely fit after quantization and context review | 262K | $0.14 in / $1.00 out / $0.05 cache | Same family makes migration and comparison straightforward. | Must beat or complement the current Qwen3.6 route. |
| Qwen3-Coder-NextQwen/Qwen3-Coder-Next | Coding agents, repo analysis, IDE workflows | Private review | Review required; full-context serving may need more than 1 GPU | 262K | $0.11 in / $0.80 out / $0.07 cache | Strong fit for developer customers and agentic coding benchmarks. | Tool behavior and serving profile need workload-specific validation. |
| Qwen3 VL 30B-A3B InstructQwen/Qwen3-VL-30B-A3B-Instruct | Vision-language document and image workflows | Benchmark-ready | Benchmark memory and media workload shape | 131K | $0.13 in / $0.52 out | Adds multimodal option for document and image-heavy teams. | Prototype should not imply public vision route is available. |
| Gemma 4 31B ITgoogle/gemma-4-31B-it | Quality-sensitive chat, multimodal candidates | Benchmark-ready | Benchmark A100 fit before production | 262K | $0.13 in / $0.38 out | Useful lower-output-cost candidate with Apache 2.0 model page. | Serving performance and context fit still need verification. |
| Gemma 4 26B A4B ITgoogle/gemma-4-26B-A4B-it | Cost-sensitive chat and education workloads | Benchmark-ready | Likely benchmark candidate on A100 | 262K | $0.06 in / $0.33 out | Promising affordable option for schools, nonprofits, and startups. | Quality must be tested against real prompts. |
| Mistral Small 3.2 24Bmistralai/Mistral-Small-3.2-24B-Instruct-2506 | Fast instruction following, RAG, broad app integration | Benchmark-ready | Likely fit; verify multimodal and latency profile | 128K | $0.075 in / $0.20 out | Cost-efficient option when 128K context is enough. | Not a substitute for 262K long-context needs. |
| Llama 3.3 70B Instructmeta-llama/Llama-3.3-70B-Instruct | Enterprise familiarity, broad evaluation baselines | Private review | Likely quantized or multi-GPU review | 131K | $0.10 in / $0.32 out | Familiar model family can simplify buyer evaluation. | License and full-context serving review required. |
| gpt-oss-120bopenai/gpt-oss-120b | Reasoning, agentic workflows, frontier open-weight testing | Private review | Likely multi-GPU or reduced-context review | 131K | $0.039 in / $0.18 out | Worth testing where reasoning quality changes product economics. | Capacity and moderation expectations need review. |
| OLMo 3 32B Thinkallenai/Olmo-3-32B-Think | Open research, reasoning comparison, education | HF-only / worth testing | Benchmark fit before any public route | 65K | $0.15 in / $0.50 out | Strong open research story for colleges and labs. | OpenRouter now lists it, but route readiness still needs LighterHub benchmark. |
The table scrolls sideways on smaller screens.
Pricing transparency. Public/self-serve API access uses per-token pricing. Model rows show OpenRouter floor prices as a public-market benchmark, not a reserved-capacity quote.
Cached reads are discounted only where the selected model exposes cached-read pricing. Otherwise the calculator treats cached-read volume as normal input volume to avoid underestimating cost.
Qualified bulk workloads can request a reserved-capacity pilot discount, subject to model, GPU availability, usage profile, and review.
Enterprise-grade does not mean enterprise-only. It means the path from shared API to reserved GPU capacity is explicit, benchmarked, and reviewed before promises are made.
Qualified bulk workloads can request up to 20% off the first 30 days of a reserved-capacity pilot, subject to model, GPU availability, usage profile, and review. USDC and USDT can be considered for approved customers where permitted.
When the model, capacity, and review line up, LighterHub can bring a prepared GPU snapshot online quickly instead of starting from scratch.
The current route supports OpenAI-compatible chat completions, streaming and non-streaming responses, usage accounting, prefix/cache-aware pricing where supported, and clean overload behavior.
curl https://api.lighterhub.app/v1/chat/completions \
-H "Authorization: Bearer $LIGHTERHUB_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "qwen/qwen3.6-35b-a3b",
"messages": [
{"role": "user", "content": "Summarize this policy memo."}
],
"stream": true,
"max_tokens": 700
}'
Clear constraints help the right customers start faster and prevent unsupported expectations.
No. Startups, small businesses, students, colleges, labs, nonprofits, and enterprises can ask for model help. Larger or sensitive deployments receive deeper review.
No. Sensitive or large deployments are subject to model license, GPU availability, jurisdiction, safety, and compliance review. LighterHub supports customer-defined policy layers where appropriate.
LighterHub does not advertise a formal enterprise SLA for shared access. Reserved-capacity terms are quoted after benchmark and operational review.
Public API prices must match backend billing before deployment. Reserved capacity is quoted separately after workload benchmark and capacity planning.