RapidAPI listing
Use RapidAPI when marketplace billing, app keys, and procurement are easier than direct prepaid credits.
- Marketplace subscription flow
- Useful for buyers that prefer RapidAPI keys
- Same Qwen3.6 access path positioning
Pricing out Claude Sonnet 4.5 API usage for coding or agent workloads? Start with Qwen3.6 35B-A3B FP8 on NVIDIA A100 80GB at open-weight token economics, then benchmark quality, latency, and cost on your own prompts. Qualified Qwen deployments target live API access in under 24 hours after access, capacity, and compliance approval.
New here? Buy credits → get reviewed → receive API key + quickstart by email.
Rough workload details are enough. Sensitive and large deployments are reviewed for model license, GPU availability, jurisdiction, safety, and compliance fit. Card, invoice, USDC, and USDT payment are available for approved customers where permitted.
Start free, prepay only what you need, buy short Qwen3.6 passes for burst testing, use RapidAPI marketplace billing, or request reserved NVIDIA A100 capacity for steady workloads.
Move from shared API testing to reserved or dedicated GPU capacity when volume, privacy, latency, or predictable cost matters.
Validate Qwen3.6 quality, latency fit, context behavior, and integration before buying.
Use direct Stripe checkout when you want simple token billing without seats or a sales call.
Launch week promo: get 50% extra prepaid usage credit after checkout review through May 17, 2026.
Connect an OpenAI-compatible app, run prompt or RAG evaluation batches, verify streaming and usage accounting, or fund a small internal demo before choosing reserved capacity.
Time-boxed shared Qwen3.6 access with no per-token billing during the pass.
Different buyers use the same NVIDIA-backed inference stack through the path that matches their budget, procurement, privacy, and speed requirements.
Privacy, reserved capacity, benchmarked cost, and latency review.
Fast integration for startups and small businesses with predictable token pricing.
Ask for model help, credits, or educational access planning.
Research labs and nonprofits are eligible for model guidance and access planning based on available capacity.
Receive a model recommendation, estimated cost, and suggested access path.
OpenRouter floor prices are shown as public-market benchmarks. LighterHub public API pricing uses per-token billing; reserved capacity is quoted after benchmark.
Pricing audited against OpenRouter's public model API on May 10, 2026. Cache discounts are shown only for models where OpenRouter exposes cached-read pricing.
qwen/qwen3.6-35b-a3b · Qwen/Qwen3.6-35B-A3B-FP8
Best for: Long-context RAG, document automation, general agents.
LighterHub fit: Current route with validated context and cache-aware pricing.
Deployment note: Aggregate tok/s is load-test throughput; per-request latency is measured against the customer workload.
Qwen/Qwen3.5-35B-A3B
Best for: Qwen-family continuity for long-context workloads.
LighterHub fit: Same family makes migration and comparison straightforward.
Deployment note: Recommended when it improves cost, availability, or migration continuity versus the current Qwen3.6 route.
Qwen/Qwen3-Coder-Next
Best for: Coding agents, repo analysis, IDE workflows.
LighterHub fit: Strong fit for developer customers and agentic coding benchmarks.
Deployment note: Tool behavior, throughput, and memory profile are validated with customer benchmarks before launch.
Qwen/Qwen3-VL-30B-A3B-Instruct
Best for: Vision-language document and image workflows.
LighterHub fit: Adds multimodal option for document and image-heavy teams.
Deployment note: Vision route availability is confirmed after media payload and throughput validation.
google/gemma-4-31B-it
Best for: Quality-sensitive chat, multimodal candidates.
LighterHub fit: Lower-output-cost candidate with an Apache 2.0 model page.
Deployment note: Serving performance and context behavior are benchmarked against customer prompts.
google/gemma-4-26B-A4B-it
Best for: Cost-sensitive chat and education workloads.
LighterHub fit: Cost-focused option for schools, nonprofits, and startups.
Deployment note: Quality is benchmarked against customer prompts before recommendation.
mistralai/Mistral-Small-3.2-24B-Instruct-2506
Best for: Fast instruction following, RAG, broad app integration.
LighterHub fit: Cost-efficient option when 128K context is enough.
Deployment note: Recommended for workloads whose context requirements fit inside 128K.
meta-llama/Llama-3.3-70B-Instruct
Best for: Enterprise familiarity and broad evaluation baselines.
LighterHub fit: Familiar model family can simplify buyer evaluation.
Deployment note: License and full-context serving plan are confirmed during intake.
openai/gpt-oss-120b
Best for: Reasoning, agentic workflows, frontier open-weight testing.
LighterHub fit: High-priority reasoning evaluation when quality changes product economics.
Deployment note: Capacity and policy requirements are confirmed during intake.
allenai/Olmo-3-32B-Think
Best for: Open research, reasoning comparison, education.
LighterHub fit: Strong open research story for colleges and labs.
Deployment note: OpenRouter lists the model; LighterHub benchmarks workload fit before offering a route.
Pricing transparency. Public/self-serve API access uses per-token pricing. Model rows show OpenRouter floor prices as a public-market benchmark, not a reserved-capacity quote.
Cached reads are discounted only where the selected model exposes cached-read pricing. Otherwise the calculator treats cached-read volume as normal input volume to avoid underestimating cost.
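The cached-read rule above can be sketched in a few lines of Python. This is an illustration only, not LighterHub's calculator: the `ModelPrice` type, the `estimate_cost` function, and every price figure are hypothetical, and the sketch assumes prices are quoted per million tokens.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ModelPrice:
    """Hypothetical per-million-token prices in USD (not real quotes)."""
    input_per_m: float
    output_per_m: float
    # None means the model exposes no cached-read price.
    cached_read_per_m: Optional[float] = None


def estimate_cost(price: ModelPrice, input_tokens: int,
                  cached_tokens: int, output_tokens: int) -> float:
    """Estimate cost in USD.

    input_tokens counts uncached input only; cached_tokens counts input
    that would be served from cache.
    """
    if price.cached_read_per_m is None:
        # No cached-read price: bill cached volume as normal input so the
        # estimate is never lower than the real bill.
        cached_rate = price.input_per_m
    else:
        cached_rate = price.cached_read_per_m
    total = (input_tokens * price.input_per_m
             + cached_tokens * cached_rate
             + output_tokens * price.output_per_m)
    return total / 1_000_000


# Same workload, with and without a cached-read price (made-up numbers).
with_cache = ModelPrice(input_per_m=0.20, output_per_m=0.80, cached_read_per_m=0.05)
without_cache = ModelPrice(input_per_m=0.20, output_per_m=0.80)
cost_with = estimate_cost(with_cache, 1_000_000, 1_000_000, 500_000)
cost_without = estimate_cost(without_cache, 1_000_000, 1_000_000, 500_000)
```

With these made-up prices the model without a cached-read price comes out more expensive for the same traffic, which is the conservative behavior the paragraph describes.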
Enterprise-grade does not mean enterprise-only. It means the path from shared API to reserved GPU capacity is explicit, benchmarked, and confirmed before commitments are made.
Qualified bulk workloads are eligible to request up to 20% off the first 30 days of a reserved-capacity pilot. Final terms are based on model fit, GPU availability, usage profile, and compliance review. USDC and USDT are available for approved customers where permitted.
For qualified Qwen deployments, LighterHub targets live API access in under 24 hours after access, GPU availability, and compliance approval. Custom model moves are benchmarked before launch.
The current route supports OpenAI-compatible chat completions, streaming and non-streaming responses, usage accounting, prefix/cache-aware pricing where supported, and clean overload behavior.
curl https://api.lighterhub.app/v1/chat/completions \
  -H "Authorization: Bearer $LIGHTERHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3.6-35b-a3b",
    "messages": [
      {"role": "user", "content": "Summarize this policy memo."}
    ],
    "stream": true,
    "max_tokens": 700
  }'
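With "stream": true, an OpenAI-compatible endpoint typically returns server-sent event lines rather than a single JSON body. The Python sketch below shows one way to reassemble the text and pick up usage figures from such lines. It assumes the stream follows the OpenAI chat-completions chunk shape (`data: {...}` lines carrying `choices[0].delta.content`, ending with `data: [DONE]`); the exact LighterHub chunk format and whether a usage chunk is sent are assumptions, and `collect_stream` plus the sample lines are hypothetical.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[dict]]:
    """Join streamed delta content and return (text, usage or None)."""
    parts = []
    usage = None
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # end-of-stream sentinel in the OpenAI-style format
        chunk = json.loads(data)
        if chunk.get("usage"):
            usage = chunk["usage"]  # present only if the server reports it
        for choice in chunk.get("choices") or []:
            content = (choice.get("delta") or {}).get("content")
            if content:
                parts.append(content)
    return "".join(parts), usage


# Hand-written sample lines standing in for a real response stream.
sample = [
    'data: {"choices": [{"delta": {"role": "assistant"}}]}',
    "",
    'data: {"choices": [{"delta": {"content": "The memo "}}]}',
    'data: {"choices": [{"delta": {"content": "sets travel limits."}}]}',
    'data: {"choices": [], "usage": {"prompt_tokens": 12, "completion_tokens": 5}}',
    "data: [DONE]",
]
text, usage = collect_stream(sample)
```

In a real client the `lines` argument would come from the HTTP response body read line by line; the parsing logic stays the same.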
Clear constraints help the right customers start faster and prevent unsupported expectations.
No. Startups, small businesses, students, colleges, labs, nonprofits, and enterprises are welcome to request access. Larger or sensitive deployments receive deeper intake.
No. Sensitive or large deployments go through model license, GPU availability, jurisdiction, safety, and compliance review. LighterHub supports customer-defined policy layers where appropriate.
Most approved Qwen3.6 prepaid requests are provisioned within 1 hour when capacity is ready. Custom, high-volume, or reserved-capacity requests target access within 24 hours after payment, capacity confirmation, and compliance review.
Your payment creates a setup request. LighterHub reviews payment, region, workload fit, and current Qwen3.6 capacity. If approved, you receive an API key and quickstart instructions by email, usually within 1 hour. Qualified requests target access within 24 hours.
LighterHub reviews customers worldwide where permitted. Priority launch markets include the United States, Canada, United Kingdom, Australia, New Zealand, Japan, South Korea, Taiwan, Belgium, Denmark, Finland, France, Germany, Ireland, Italy, Netherlands, Norway, Spain, and Sweden. Southeast Asia, including Vietnam, Thailand, Singapore, Malaysia, Indonesia, the Philippines, and Brunei, is available through manual review. Access depends on sanctions screening, export-control review, payment availability, model-license fit, capacity, and acceptable-use review.
Shared access is offered without a formal enterprise SLA. Reserved-capacity terms are quoted after benchmark and operational review.
Public API prices must match backend billing before deployment. Reserved capacity is quoted separately after workload benchmark and capacity planning.