1M trial tokens available for qualified testing. Current route: Qwen3.6 35B-A3B FP8 on NVIDIA A100 80GB.
Open-weight model inference via API

AI inference on NVIDIA GPUs. Accessible to start.

Use OpenAI-compatible Qwen3.6 inference on NVIDIA A100 80GB. Start with prepaid credits, request 1M trial tokens, or move steady workloads into reserved GPU capacity after benchmark.

Rough workload details are enough. Sensitive and large deployments are reviewed for model license, GPU availability, jurisdiction, safety, and compliance fit. Card, invoice, USDC, and USDT payment can be discussed for approved customers where permitted.

Qwen3.6 proven route: FP8 route for qwen/qwen3.6-35b-a3b.
262K context: Validated long-context route.
NVIDIA A100 80GB: Current accelerator class.
No prompt payload logging: Usage metadata only.

Ways to start.

Start free, prepay only what you need, buy short Qwen3.6 passes for burst testing, use RapidAPI marketplace billing, or request reserved NVIDIA A100 capacity for steady workloads.

Try first. No card.

1M trial tokens

$0

Validate Qwen3.6 quality, latency fit, context behavior, and integration before buying.

  • Best first step for new teams
  • Good for model-fit and cost checks
  • Rough workload details are enough

Pay as you go. No subscription.

Prepaid API credits

From $10

Use direct Stripe checkout when you want simple token billing without seats or a sales call.

Good for

Connect an OpenAI-compatible app, run prompt or RAG evaluation batches, verify streaming and usage accounting, or fund a small internal demo before choosing reserved capacity.

Example at current Qwen3.6 pricing: with a 10:1 input/output mix, $10 is about 40M input + 4M output tokens. $25 is about 100M + 10M; $50 is about 200M + 20M before cache discounts.
Actual usage depends on output length, cache use, and selected model. Buy only what you need; request guidance when comparing model options.
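The arithmetic above can be sketched as a small estimator. The rates are the Qwen3.6 figures shown in the comparison below ($0.15 per 1M input tokens, $1.00 per 1M output tokens); the function itself is illustrative, not a billing tool:

```python
# Rough prepaid-credit estimator using the Qwen3.6 rates shown on this page.
INPUT_RATE = 0.15   # USD per 1M input tokens
OUTPUT_RATE = 1.00  # USD per 1M output tokens

def tokens_for_budget(budget_usd, input_output_ratio=10):
    """Split a prepaid budget into input/output token volumes (in millions)
    for a given input:output mix, ignoring cache discounts."""
    # One "unit" of usage = ratio M input tokens + 1M output tokens.
    unit_cost = input_output_ratio * INPUT_RATE + OUTPUT_RATE
    units = budget_usd / unit_cost
    return units * input_output_ratio, units  # (input M, output M)

print(tokens_for_budget(10))  # (40.0, 4.0): 40M input + 4M output
print(tokens_for_budget(50))  # (200.0, 20.0)
```

Cache discounts, where available, stretch these figures further; the estimator deliberately ignores them.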

Marketplace. Procurement.

RapidAPI listing

Use RapidAPI when marketplace billing, app keys, and procurement are easier than direct prepaid credits.

Enterprise capacity path: Reserved NVIDIA A100 capacity for steady workloads.

Move from shared API testing to reserved or dedicated GPU capacity when volume, privacy, latency, or predictable cost matters. Qualified bulk workloads are eligible to request up to 20% off the first 30 days after benchmark.

  • Dedicated capacity planning for high-volume API usage
  • Private endpoint discussion for sensitive workloads
  • Benchmark-backed model and throughput recommendation
  • Card, RapidAPI, USDC, and USDT options where permitted

Up to 20% off the first 30 days of a reserved-capacity pilot. Final terms are based on model fit, GPU availability, usage profile, and compliance review. Discuss reserved capacity.

Who uses LighterHub?

Different buyers use the same NVIDIA-backed inference stack through the path that matches their budget, procurement, privacy, and speed requirements.

Enterprise

Benchmark and capacity fit

Privacy, reserved capacity, benchmarked cost, and latency review.

Developers

API or RapidAPI

Fast integration for startups and small businesses with predictable token pricing.

Students

Low-friction experiments

Ask for model help, credits, or educational access planning.

Colleges and nonprofits

Affordable open-model help

Research labs and nonprofits are eligible for model guidance and access planning based on available capacity.

Unsure

Get routed

Receive a model recommendation, estimated cost, and suggested access path.

Model and pricing comparison.

OpenRouter floor prices are shown as public-market benchmarks. LighterHub public API pricing uses per-token billing; reserved capacity is quoted after benchmark.

Pricing audited against OpenRouter's public model API on May 10, 2026. Cache discounts are shown only for models where OpenRouter exposes cached-read pricing.

Estimate my cost
10 models shown

Qwen3.6 35B-A3B FP8

qwen/qwen3.6-35b-a3b · Qwen/Qwen3.6-35B-A3B-FP8

Current route
262K context · $0.15 / $1.00 input / output · $0.05 cached read · 1x A100, proven FP8

Best for: Long-context RAG, document automation, general agents.

LighterHub fit: Current route with validated context and cache-aware pricing.

Deployment note: Aggregate tok/s is load-test throughput; per-request latency is measured against the customer workload.

Qwen3.5 35B-A3B

Qwen/Qwen3.5-35B-A3B

Benchmark-ready
262K context · $0.14 / $1.00 input / output · $0.05 cached read · A100 deployment candidate

Best for: Qwen-family continuity for long-context workloads.

LighterHub fit: Same family makes migration and comparison straightforward.

Deployment note: Recommended when it improves cost, availability, or migration continuity versus the current Qwen3.6 route.

Qwen3-Coder-Next

Qwen/Qwen3-Coder-Next

Capacity planning
262K context · $0.11 / $0.80 input / output · $0.07 cached read · Capacity: workload sizing

Best for: Coding agents, repo analysis, IDE workflows.

LighterHub fit: Strong fit for developer customers and agentic coding benchmarks.

Deployment note: Tool behavior, throughput, and memory profile are validated with customer benchmarks before launch.

Qwen3 VL 30B-A3B Instruct

Qwen/Qwen3-VL-30B-A3B-Instruct

Benchmark-ready
131K context · $0.13 / $0.52 input / output · No published cache discount · A100 media benchmark

Best for: Vision-language document and image workflows.

LighterHub fit: Adds a multimodal option for document and image-heavy teams.

Deployment note: Vision route availability is confirmed after media payload and throughput validation.

Gemma 4 31B IT

google/gemma-4-31B-it

Benchmark-ready
262K context · $0.13 / $0.38 input / output · No published cache discount · A100 benchmark first

Best for: Quality-sensitive chat, multimodal candidates.

LighterHub fit: Lower-output-cost candidate with an Apache 2.0 model page.

Deployment note: Serving performance and context behavior are benchmarked against customer prompts.

Gemma 4 26B A4B IT

google/gemma-4-26B-A4B-it

Benchmark-ready
262K context · $0.06 / $0.33 input / output · No published cache discount · A100 benchmark path

Best for: Cost-sensitive chat and education workloads.

LighterHub fit: Cost-focused option for schools, nonprofits, and startups.

Deployment note: Quality is benchmarked against customer prompts before recommendation.

Mistral Small 3.2 24B

mistralai/Mistral-Small-3.2-24B-Instruct-2506

Benchmark-ready
128K context · $0.075 / $0.20 input / output · No published cache discount · A100 benchmark path

Best for: Fast instruction following, RAG, broad app integration.

LighterHub fit: Cost-efficient option when 128K context is enough.

Deployment note: Recommended for workloads whose context requirements fit inside 128K.

Llama 3.3 70B Instruct

meta-llama/Llama-3.3-70B-Instruct

Capacity planning
131K context · $0.10 / $0.32 input / output · No published cache discount · Capacity: quantized or multi-GPU

Best for: Enterprise familiarity and broad evaluation baselines.

LighterHub fit: A familiar model family can simplify buyer evaluation.

Deployment note: License and full-context serving plan are confirmed during intake.

gpt-oss-120b

openai/gpt-oss-120b

Capacity planning
131K context · $0.039 / $0.18 input / output · No published cache discount · Capacity: multi-GPU review

Best for: Reasoning, agentic workflows, frontier open-weight testing.

LighterHub fit: High-priority reasoning evaluation when quality changes product economics.

Deployment note: Capacity and policy requirements are confirmed during intake.

OLMo 3 32B Think

allenai/Olmo-3-32B-Think

HF model / benchmark path
65K context · $0.15 / $0.50 input / output · No published cache discount · A100 benchmark before route

Best for: Open research, reasoning comparison, education.

LighterHub fit: Strong open research story for colleges and labs.

Deployment note: OpenRouter lists the model; LighterHub benchmarks workload fit before offering a route.

Pricing transparency. Public/self-serve API access uses per-token pricing. Model rows show OpenRouter floor prices as a public-market benchmark, not a reserved-capacity quote.

Cached reads are discounted only where the selected model exposes cached-read pricing. Otherwise the calculator treats cached-read volume as normal input volume to avoid underestimating cost.
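That fallback rule can be sketched as follows. This is a minimal illustration of the stated behavior; the function signature, the `cached_rate=None` convention, and the sample rates are assumptions, not the production calculator:

```python
def estimate_cost(input_m, cached_m, output_m,
                  input_rate, output_rate, cached_rate=None):
    """Estimate cost in USD. Token volumes are in millions.
    If the model publishes no cached-read price (cached_rate is None),
    cached-read volume is billed as normal input to avoid underestimating."""
    effective_cached_rate = cached_rate if cached_rate is not None else input_rate
    return (input_m * input_rate
            + cached_m * effective_cached_rate
            + output_m * output_rate)

# A model row with published cached-read pricing (e.g. $0.05):
print(estimate_cost(10, 5, 1, 0.15, 1.00, cached_rate=0.05))  # 2.75
# A row with no published cache discount: cached volume billed as input.
print(estimate_cost(10, 5, 1, 0.13, 0.52))
```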

Qualified bulk workloads are eligible to request a reserved-capacity pilot discount. Final terms are based on model fit, GPU availability, usage profile, and compliance review.

Rough public-token estimate

Estimated monthly token cost: $17.50. Rough planning figure only; reserved capacity is quoted after benchmark.

Request 1M tokens or model guidance.

Send rough details for model fit, pricing guidance, and trial access. No card. No commitment.

Customer type
Main priority

Rough answers are enough. We use this only to reply and recommend a model. Do not paste secrets or private data.

Request received.

LighterHub will evaluate your workload and reply with a recommended model, rough token cost, and suggested access path.

Email fallback

Scale when the workload earns dedicated planning.

Enterprise-grade does not mean enterprise-only. It means the path from shared API to reserved GPU capacity is explicit, benchmarked, and confirmed before commitments are made.

Reserved Capacity Pilot

Reserved capacity fit

Qualified bulk workloads are eligible to request up to 20% off the first 30 days of a reserved-capacity pilot. Final terms are based on model fit, GPU availability, usage profile, and compliance review. USDC and USDT are available for approved customers where permitted.

Designed for rapid launch.

When model, capacity, and compliance requirements are cleared, LighterHub can bring a prepared GPU snapshot online quickly instead of starting from scratch.

Model size · GPU availability · Snapshot readiness · Traffic shape · Safety review · Compliance review

OpenAI-compatible integration.

The current route supports OpenAI-compatible chat completions, streaming and non-streaming responses, usage accounting, prefix/cache-aware pricing where supported, and clean overload behavior.

Example request. Do not paste secrets into demos.
curl https://api.lighterhub.app/v1/chat/completions \
  -H "Authorization: Bearer $LIGHTERHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3.6-35b-a3b",
    "messages": [
      {"role": "user", "content": "Summarize this policy memo."}
    ],
    "stream": true,
    "max_tokens": 700
  }'
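
With "stream": true, the endpoint returns OpenAI-style server-sent events. A minimal stdlib sketch for extracting the streamed text (the chunk shapes follow the standard OpenAI streaming format; the sample lines below are illustrative, not captured output):

```python
import json

def parse_sse_stream(lines):
    """Extract text deltas from OpenAI-style server-sent-event lines,
    as produced by the streaming curl request above."""
    text = []
    for line in lines:
        line = line.strip()
        if not line.startswith("data: "):
            continue  # skip keep-alives and blank separators
        payload = line[len("data: "):]
        if payload == "[DONE]":  # end-of-stream sentinel
            break
        delta = json.loads(payload)["choices"][0]["delta"]
        if delta.get("content"):
            text.append(delta["content"])
    return "".join(text)

# Example chunk lines in the OpenAI-compatible streaming format:
sample = [
    'data: {"choices":[{"delta":{"role":"assistant","content":""}}]}',
    'data: {"choices":[{"delta":{"content":"The memo "}}]}',
    'data: {"choices":[{"delta":{"content":"proposes..."}}]}',
    "data: [DONE]",
]
print(parse_sse_stream(sample))  # The memo proposes...
```

The same stream can be consumed with any OpenAI-compatible client library by pointing its base URL at the endpoint above.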

Trust boundaries.

Clear constraints help the right customers start faster and prevent unsupported expectations.

Is LighterHub enterprise-only?

No. Startups, small businesses, students, colleges, labs, nonprofits, and enterprises are welcome to request access. Larger or sensitive deployments receive deeper intake.

Are all workloads accepted?

No. Sensitive or large deployments go through model license, GPU availability, jurisdiction, safety, and compliance review. LighterHub supports customer-defined policy layers where appropriate.

Is there a formal SLA?

Shared access is offered without a formal enterprise SLA. Reserved-capacity terms are quoted after benchmark and operational review.

How does billing go live?

Public API prices must match backend billing before deployment. Reserved capacity is quoted separately after workload benchmark and capacity planning.