Request 1M free trial tokens for qualified testing. Current proven route: Qwen3.6 35B-A3B FP8 on NVIDIA A100 80GB.
Open-weight model inference via API

AI inference on NVIDIA GPUs. Accessible to start.

Try before a sales call. Request 1M free trial tokens for Qwen3.6, compare open-weight model economics, then scale to reserved NVIDIA GPU capacity only when the workload earns it.

Rough workload details are enough. Sensitive and large deployments are reviewed for model license, GPU availability, jurisdiction, safety, and compliance fit. Card, invoice, USDC, and USDT payment can be discussed for approved customers where permitted.

Qwen3.6 proven route: FP8 route for qwen/qwen3.6-35b-a3b.
262K context: Validated long-context route.
NVIDIA A100 80GB: Current accelerator class.
No prompt payload logging: Usage metadata only.

Start free, buy credits, or use a short pilot pass.

Use the path that matches your urgency. Developers can test first, prepaid buyers can top up directly, and urgent Qwen3.6 workloads can request a guided pass before committing to reserved capacity.

01

Free trial

1M tokens

Request free Qwen3.6 test tokens. Best for checking quality, integration fit, context behavior, and cost before buying.

02

Prepaid credits

$10 / $25 / $50

Usage-based API credits for teams that want direct billing without seats, subscriptions, or a sales call.

Launch bonus may add promotional credits while available.

03

Qwen3.6 pilot pass

$49

Seven-day guided access for urgent Qwen3.6 workloads. Includes $75 usage credit and setup help for your toolchain.

A refund applies if LighterHub cannot return a valid response to a valid onboarding request.

04

RapidAPI

Marketplace

Use RapidAPI when marketplace billing, app keys, and procurement are easier than direct prepaid credits.

Who uses LighterHub?

Different buyers use the same NVIDIA-backed inference stack through the path that matches their budget, procurement, privacy, and speed requirements.

Enterprise

Benchmark and capacity fit

Privacy, reserved capacity, benchmarked cost, and latency review.

Developers

API or RapidAPI

Fast integration for startups and small businesses with predictable token pricing.

Students

Low-friction experiments

Request model help, credits, or educational access review.

Colleges and nonprofits

Affordable open-model help

Research labs and nonprofits can ask for model guidance, subject to capacity.

Unsure

Send rough details

Receive a model recommendation, rough cost estimate, and suggested access path.

Try first, then decide.

The fastest path is a small workload test, not a long sales call. Send rough details, request 1M free trial tokens, and use the result to decide whether shared API, RapidAPI, or reserved GPU capacity makes sense.

  • Start with a 1M-token trial request for qualified test workloads.
  • Use OpenAI-compatible API calls before committing to capacity.
  • Compare Qwen, Gemma, Mistral, Llama, gpt-oss, OLMo, and other Hugging Face models.
  • Receive a tailored model, price, context, and access-path recommendation.
  • Move to reserved or dedicated NVIDIA GPU capacity when usage justifies it.
  • Launch prepared GPU snapshots when model, capacity, and review line up.

Model and pricing comparison.

OpenRouter floor prices are shown as public-market benchmarks. LighterHub public API pricing uses per-token billing; reserved capacity is quoted after benchmark.

Pricing audited against OpenRouter's public model API on May 10, 2026. Cache discounts are shown only for models where OpenRouter exposes cached-read pricing.

Each entry below lists: best-for workloads, route status, NVIDIA A100 80GB fit, context window, OpenRouter floor price (USD per 1M tokens), why LighterHub would host it, and the main caveat.

Qwen3.6 35B-A3B FP8 · qwen/qwen3.6-35b-a3b (served with Qwen/Qwen3.6-35B-A3B-FP8)
Best for: Long-context RAG, document automation, general agents.
Status: Current route. A100 80GB fit: Proven on 1x NVIDIA A100 80GB FP8. Context: 262K.
OpenRouter floor price: $0.15 in / $1.00 out / $0.05 cache.
Why host it: Current route with validated context and cache-aware pricing.
Caveat: Aggregate tok/s is not per-request latency.

Qwen3.5 35B-A3B · Qwen/Qwen3.5-35B-A3B
Best for: Fallback Qwen-family long-context workloads.
Status: Benchmark-ready. A100 80GB fit: Likely fit after quantization and context review. Context: 262K.
OpenRouter floor price: $0.14 in / $1.00 out / $0.05 cache.
Why host it: Same family makes migration and comparison straightforward.
Caveat: Must beat or complement the current Qwen3.6 route.

Qwen3-Coder-Next · Qwen/Qwen3-Coder-Next
Best for: Coding agents, repo analysis, IDE workflows.
Status: Private review. A100 80GB fit: Review required; full-context serving may need more than 1 GPU. Context: 262K.
OpenRouter floor price: $0.11 in / $0.80 out / $0.07 cache.
Why host it: Strong fit for developer customers and agentic coding benchmarks.
Caveat: Tool behavior and serving profile need workload-specific validation.

Qwen3 VL 30B-A3B Instruct · Qwen/Qwen3-VL-30B-A3B-Instruct
Best for: Vision-language document and image workflows.
Status: Benchmark-ready. A100 80GB fit: Benchmark memory and media workload shape. Context: 131K.
OpenRouter floor price: $0.13 in / $0.52 out.
Why host it: Adds multimodal option for document and image-heavy teams.
Caveat: Prototype should not imply public vision route is available.

Gemma 4 31B IT · google/gemma-4-31B-it
Best for: Quality-sensitive chat, multimodal candidates.
Status: Benchmark-ready. A100 80GB fit: Benchmark A100 fit before production. Context: 262K.
OpenRouter floor price: $0.13 in / $0.38 out.
Why host it: Useful lower-output-cost candidate with Apache 2.0 model page.
Caveat: Serving performance and context fit still need verification.

Gemma 4 26B A4B IT · google/gemma-4-26B-A4B-it
Best for: Cost-sensitive chat and education workloads.
Status: Benchmark-ready. A100 80GB fit: Likely benchmark candidate on A100. Context: 262K.
OpenRouter floor price: $0.06 in / $0.33 out.
Why host it: Promising affordable option for schools, nonprofits, and startups.
Caveat: Quality must be tested against real prompts.

Mistral Small 3.2 24B · mistralai/Mistral-Small-3.2-24B-Instruct-2506
Best for: Fast instruction following, RAG, broad app integration.
Status: Benchmark-ready. A100 80GB fit: Likely fit; verify multimodal and latency profile. Context: 128K.
OpenRouter floor price: $0.075 in / $0.20 out.
Why host it: Cost-efficient option when 128K context is enough.
Caveat: Not a substitute for 262K long-context needs.

Llama 3.3 70B Instruct · meta-llama/Llama-3.3-70B-Instruct
Best for: Enterprise familiarity, broad evaluation baselines.
Status: Private review. A100 80GB fit: Likely quantized or multi-GPU review. Context: 131K.
OpenRouter floor price: $0.10 in / $0.32 out.
Why host it: Familiar model family can simplify buyer evaluation.
Caveat: License and full-context serving review required.

gpt-oss-120b · openai/gpt-oss-120b
Best for: Reasoning, agentic workflows, frontier open-weight testing.
Status: Private review. A100 80GB fit: Likely multi-GPU or reduced-context review. Context: 131K.
OpenRouter floor price: $0.039 in / $0.18 out.
Why host it: Worth testing where reasoning quality changes product economics.
Caveat: Capacity and moderation expectations need review.

OLMo 3 32B Think · allenai/Olmo-3-32B-Think
Best for: Open research, reasoning comparison, education.
Status: HF-only / worth testing. A100 80GB fit: Benchmark fit before any public route. Context: 65K.
OpenRouter floor price: $0.15 in / $0.50 out.
Why host it: Strong open research story for colleges and labs.
Caveat: OpenRouter now lists it, but route readiness still needs LighterHub benchmark.


Pricing transparency. Public/self-serve API access uses per-token pricing. Model rows show OpenRouter floor prices as a public-market benchmark, not a reserved-capacity quote.

Cached reads are discounted only where the selected model exposes cached-read pricing. Otherwise the calculator treats cached-read volume as normal input volume to avoid underestimating cost.

Qualified bulk workloads can request a reserved-capacity pilot discount, subject to model, GPU availability, usage profile, and review.

Rough public-token estimate

Estimated monthly token cost: $17.50. Rough planning figure only; reserved capacity is quoted after benchmark.
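As a concrete sketch of the rules above, the estimator below applies cached-read pricing only when a model exposes it and otherwise bills cached volume as normal input. The function name and the token volumes are illustrative assumptions, not LighterHub's calculator; the prices come from the Qwen3.6 row in the comparison table, and the sample mix happens to reproduce the $17.50 planning figure.

# Illustrative cost estimator, not LighterHub's calculator.
# Prices are USD per 1M tokens from the comparison table above.
def estimate_monthly_cost(input_tokens, output_tokens, cached_tokens, price):
    cache_rate = price.get("cache")
    if cache_rate is None:
        # No cached-read pricing exposed: treat cached volume as normal
        # input so the estimate never understates cost.
        input_tokens += cached_tokens
        cached_tokens = 0
    cost = (input_tokens * price["in"] + output_tokens * price["out"]) / 1e6
    if cached_tokens:
        cost += cached_tokens * cache_rate / 1e6
    return round(cost, 2)

qwen36 = {"in": 0.15, "out": 1.00, "cache": 0.05}  # Qwen3.6 35B-A3B FP8 row
# Example mix: 50M input + 10M output tokens per month -> 7.50 + 10.00
print(estimate_monthly_cost(50_000_000, 10_000_000, 0, qwen36))  # 17.5

Swapping in a row without a cache price, such as the Mistral Small entry, shows the conservative fallback in action.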

Request access for your workload.

Send rough details for model fit, pricing guidance, and the 1M-token trial. No card. No commitment.


Rough answers are enough. We use this only to reply and recommend a model. Do not paste secrets or private data.

Once a request is received, LighterHub will evaluate the workload and reply with a recommended model, rough token cost, and suggested access path. An email fallback is available if the form cannot be used.

Scale when the workload earns dedicated planning.

Enterprise-grade does not mean enterprise-only. It means the path from shared API to reserved GPU capacity is explicit, benchmarked, and reviewed before promises are made.

Reserved Capacity Pilot


Qualified bulk workloads can request up to 20% off the first 30 days of a reserved-capacity pilot, subject to model, GPU availability, usage profile, and review. USDC and USDT can be considered for approved customers where permitted.

Designed for rapid launch.

When the model, capacity, and review line up, LighterHub can bring a prepared GPU snapshot online quickly instead of starting from scratch.

Launch readiness covers model size, GPU availability, snapshot readiness, traffic shape, safety review, and compliance review.

OpenAI-compatible integration.

The current route supports OpenAI-compatible chat completions, streaming and non-streaming responses, usage accounting, prefix/cache-aware pricing where supported, and clean overload behavior.

Example request (do not paste secrets into demos):
curl https://api.lighterhub.app/v1/chat/completions \
  -H "Authorization: Bearer $LIGHTERHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3.6-35b-a3b",
    "messages": [
      {"role": "user", "content": "Summarize this policy memo."}
    ],
    "stream": true,
    "max_tokens": 700
  }'
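Because the route is OpenAI-compatible, existing SDKs can target it by overriding the base URL. Below is a minimal streaming sketch using the openai Python package, reusing the base URL and model id from the curl example above; SDK support for this specific endpoint is an assumption, not a documented guarantee.

from openai import OpenAI
import os

# OpenAI-compatible client pointed at the base URL from the curl example.
client = OpenAI(
    base_url="https://api.lighterhub.app/v1",
    api_key=os.environ["LIGHTERHUB_API_KEY"],  # keep keys out of code and demos
)

# Stream a chat completion; the openai SDK retries transient overload
# errors (e.g., HTTP 429 and 5xx) a couple of times by default.
stream = client.chat.completions.create(
    model="qwen/qwen3.6-35b-a3b",
    messages=[{"role": "user", "content": "Summarize this policy memo."}],
    stream=True,
    max_tokens=700,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)

For per-request usage accounting during streaming, the OpenAI spec's stream_options include_usage flag is worth testing against this route, since support varies across compatible servers.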

Trust boundaries.

Clear constraints help the right customers start faster and prevent unsupported expectations.

Is LighterHub enterprise-only?

No. Startups, small businesses, students, colleges, labs, nonprofits, and enterprises can ask for model help. Larger or sensitive deployments receive deeper review.

Are all workloads accepted?

No. Sensitive or large deployments are subject to model license, GPU availability, jurisdiction, safety, and compliance review. LighterHub supports customer-defined policy layers where appropriate.

Is there a formal SLA?

LighterHub does not advertise a formal enterprise SLA for shared access. Reserved-capacity terms are quoted after benchmark and operational review.

How does billing go live?

Public API prices must match backend billing before deployment. Reserved capacity is quoted separately after workload benchmark and capacity planning.