Launch: 50% bonus prepaid credits + 1M trial tokens. Approved prepaid API keys are typically sent within 30 minutes after review.
Coding-agent inference via API

Open-weight API inference for coding agents.

Run coding-agent workflows from Hermes Agent, OpenClaw, Roo Code, and Cline-style clients, along with RAG, batch tests, and long-context codebase work, on Qwen3.6 via an OpenAI-compatible API. Approved prepaid customers typically receive an API key within 30 minutes after review.

Agent-ready pricing: $0.15/M input · $1/M output · $0.05/M cached read.
Agent-app ready: Use compatible clients with base URL, model ID, and key.
Launch offer: 50% bonus credits · 1M trial tokens · fast reviewed setup.
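As a rough illustration of how the three quoted rates combine, here is a minimal sketch of a cost estimate. It assumes cached prompt tokens are billed at the cached-read rate instead of the input rate; the function name and that billing split are illustrative assumptions, not confirmed LighterHub billing logic.

```python
from decimal import Decimal

# USD per million tokens, as quoted for the current Qwen3.6 route.
INPUT_RATE = Decimal("0.15")
OUTPUT_RATE = Decimal("1.00")
CACHED_READ_RATE = Decimal("0.05")
MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> Decimal:
    """Estimate the USD cost of a workload (hypothetical helper).

    Assumption: input_tokens counts only uncached prompt tokens, and
    cached_tokens are billed at the cached-read rate instead.
    """
    total = (
        Decimal(input_tokens) * INPUT_RATE
        + Decimal(output_tokens) * OUTPUT_RATE
        + Decimal(cached_tokens) * CACHED_READ_RATE
    )
    return total / MILLION
```

Under these assumptions, 2M uncached input tokens, 300K output tokens, and 6M cached-read tokens come to $0.90.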

Checkout starts a reviewed setup; unsupported requests are refunded or rerouted. See what happens after payment.

Pay now, start today after approval. Approval review usually takes less than 15 minutes. Approved prepaid customers typically receive an API key and quickstart instructions within 30 minutes after checkout review. Custom and reserved-capacity requests target access within 24 hours after payment, capacity, and compliance review.
Launch week: 50% bonus prepaid credit. Buy prepaid credits during the launch window and receive 50% extra usage credit after checkout review. Ends May 17, 2026 at 11:59 PM PT.
Under-24h launch target for qualified Qwen deployments. Prepared snapshots help LighterHub move quickly once access, GPU availability, and compliance review are cleared.
20x lower input price: $0.15/M input tokens on the current Qwen route vs $3/M for Claude Sonnet 4.5 API-rate benchmarks.
15x lower output price: $1/M output tokens on the current Qwen route vs $15/M for Claude Sonnet 4.5 API-rate benchmarks.
30-minute API key target: Approved prepaid customers typically receive an API key and quickstart instructions within 30 minutes after checkout review.
No prompt payload logging: Request logs store usage metadata, not prompt payloads.
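The 20x and 15x figures follow directly from the quoted per-million-token rates. A small sketch of that arithmetic, using only the four prices stated above (the dictionary and function names are illustrative):

```python
from fractions import Fraction

# USD per million tokens, as quoted above.
QWEN_ROUTE = {"input": Fraction("0.15"), "output": Fraction("1")}
SONNET_BENCHMARK = {"input": Fraction("3"), "output": Fraction("15")}


def price_multiple(kind: str) -> Fraction:
    """Return how many times cheaper the Qwen route is for 'input' or 'output'."""
    return SONNET_BENCHMARK[kind] / QWEN_ROUTE[kind]


if __name__ == "__main__":
    for kind in ("input", "output"):
        print(f"{kind}: {price_multiple(kind)}x lower")
```

Exact fractions are used here so the ratios come out as whole numbers rather than floating-point approximations.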

Rough workload details are enough. Sensitive and large deployments are reviewed for model license, GPU availability, jurisdiction, safety, and compliance fit. Card, invoice, USDC, and USDT payments are available for approved customers where permitted.

Ways to start.

Start free, prepay only what you need, buy short Qwen3.6 passes for burst testing, use RapidAPI marketplace billing, or request reserved NVIDIA A100 capacity for steady workloads.

Enterprise capacity path: Reserved NVIDIA A100 capacity for steady workloads.

Move from shared API testing to reserved or dedicated GPU capacity when volume, privacy, latency, or predictable cost matters. Qualified Qwen deployments target live API access in under 24 hours after access, capacity, and compliance approval.

  • Dedicated capacity planning for high-volume API usage
  • Private endpoint discussion for sensitive workloads
  • Benchmark-backed model and throughput recommendation
  • Card, RapidAPI, USDC, and USDT options where permitted
Up to 20% off: First 30 days of a reserved-capacity pilot. Final terms are based on model fit, GPU availability, usage profile, and compliance review.
Marketplace Procurement

RapidAPI listing

Use RapidAPI when marketplace billing, app keys, and procurement are easier than direct prepaid credits.

  • Marketplace subscription flow
  • Useful for buyers that prefer RapidAPI keys
  • Same Qwen3.6 access path positioning
Try first · No card

1M trial tokens

$0

Validate Qwen3.6 quality, latency fit, context behavior, and integration before buying.

  • Best first step for new teams
  • Good for model-fit and cost checks
  • Rough workload details are enough
Pay as you go · 50% bonus

Prepaid API credits

From $10

Use direct Stripe checkout when you want simple token billing without seats or a sales call.

Launch week promo: get 50% extra prepaid usage credit after checkout review through May 17, 2026.

  • Best for app integration and RAG tests
  • OpenAI-compatible API key after review
  • Buy only what you need before reserving capacity
Payment starts setup and does not bypass review.
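To make the launch promo concrete, the sketch below works out the usage credit for a prepaid purchase and how many tokens that credit would cover. It assumes the 50% bonus applies to the full purchase amount and that credit is drawn down at the quoted per-million-token rates; both helper functions are hypothetical.

```python
from decimal import Decimal

BONUS_RATE = Decimal("0.50")   # launch promo: 50% extra usage credit
INPUT_RATE = Decimal("0.15")   # USD per million input tokens
OUTPUT_RATE = Decimal("1.00")  # USD per million output tokens


def usage_credit(purchase_usd: Decimal, promo_active: bool = True) -> Decimal:
    """Return total usage credit, assuming the bonus applies to the whole purchase."""
    bonus = purchase_usd * BONUS_RATE if promo_active else Decimal("0")
    return purchase_usd + bonus


def tokens_covered(credit_usd: Decimal, rate_per_million: Decimal) -> int:
    """Return how many tokens a credit balance covers at a single rate."""
    return int(credit_usd / rate_per_million * 1_000_000)
```

Under these assumptions, the $10 minimum purchase would carry $15 of usage credit during the promo, enough for about 100M input tokens or 15M output tokens at the quoted rates.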

What happens after payment?

After checkout, LighterHub reviews payment status, region, workload fit, and current Qwen3.6 capacity. Approved prepaid customers typically receive an API key and quickstart instructions within 30 minutes after checkout review. If access cannot be approved or provisioned, LighterHub will contact you with next steps or refund guidance.

1. Checkout: Payment creates a setup request.
2. Review + key setup: Payment, region, workload fit, and Qwen3.6 capacity are checked.
3. Start calling the API: Approved prepaid customers typically receive an API key and quickstart within 30 minutes.

Who uses LighterHub?

Different buyers use the same NVIDIA-backed inference stack through the path that matches their budget, procurement, privacy, and speed requirements.

Enterprise

Benchmark and capacity fit

Privacy, reserved capacity, benchmarked cost, and latency review.

Developers

API or RapidAPI

Fast integration for startups and small businesses with predictable token pricing.

Students

Low-friction experiments

Ask for model help, credits, or educational access planning.

Colleges and nonprofits

Affordable open-model help

Research labs and nonprofits are eligible for model guidance and access planning based on available capacity.

Unsure

Get routed

Receive a model recommendation, estimated cost, and suggested access path.

Model comparison.

One-screen view of the models worth evaluating. Prices are public OpenRouter floor benchmarks; LighterHub public API and reserved capacity are confirmed after workload review.

Current public route: Qwen3.6 35B-A3B FP8. 262K context on 1x A100 80GB, ready for prepaid API access.
Lowest input benchmark: $0.039 / M. gpt-oss-120b is capacity-planning only and requires workload review.
Best low-cost output: $0.18 / M. Reasoning and multi-GPU routes are scoped before being offered.
Fastest path: 30-minute API key target. Approved prepaid Qwen3.6 customers typically receive quickstart by email.
Model (model ID) | Status | Best fit | Ctx | Input / M | Output / M | Cache / GPU note
Qwen3.6 35B-A3B FP8 (qwen/qwen3.6-35b-a3b) | Current route | Long RAG, documents, agents | 262K | $0.15 | $1.00 | $0.05 cache; proven 1x A100 FP8
Qwen3.5 35B-A3B (Qwen/Qwen3.5-35B-A3B) | Benchmark-ready | Qwen fallback, long context | 262K | $0.14 | $1.00 | $0.05 cache; A100 candidate
Qwen3-Coder-Next (Qwen/Qwen3-Coder-Next) | Capacity planning | Coding agents, repo work | 262K | $0.11 | $0.80 | $0.07 cache; workload sizing
Qwen3 VL 30B-A3B (Qwen/Qwen3-VL-30B-A3B-Instruct) | Benchmark-ready | Vision, docs, images | 131K | $0.13 | $0.52 | No cache discount; media benchmark
Gemma 4 31B IT (google/gemma-4-31B-it) | Benchmark-ready | Quality chat, multimodal tests | 262K | $0.13 | $0.38 | No cache discount; A100 benchmark
Gemma 4 26B A4B IT (google/gemma-4-26B-A4B-it) | Benchmark-ready | Cost-sensitive chat | 262K | $0.06 | $0.33 | No cache discount; education/startup fit
Mistral Small 3.2 24B (mistralai/Mistral-Small-3.2-24B) | Benchmark-ready | Fast RAG, app integration | 128K | $0.075 | $0.20 | No cache discount; low-cost 128K
Llama 3.3 70B Instruct (meta-llama/Llama-3.3-70B-Instruct) | Capacity planning | Enterprise baseline evals | 131K | $0.10 | $0.32 | No cache discount; license/GPU review
gpt-oss-120b (openai/gpt-oss-120b) | Capacity planning | Reasoning, agent workflows | 131K | $0.039 | $0.18 | No cache discount; multi-GPU review
OLMo 3 32B Think (allenai/Olmo-3-32B-Think) | Benchmark path | Research, education | 65K | $0.15 | $0.50 | No cache discount; HF/OpenRouter-listed

Pricing audited against OpenRouter's public model API on May 10, 2026. Cache discounts appear only where OpenRouter exposes cached-read pricing. Reserved capacity is quoted after benchmark.
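One way to read the comparison is to price a single workload against every row. The sketch below copies the input and output benchmark prices from the table and ranks models by estimated cost. It ignores cache discounts, context limits, and availability status, and the figures are the OpenRouter benchmark prices listed above, not LighterHub quotes; the function names are illustrative.

```python
from decimal import Decimal

# (input, output) USD per million tokens, copied from the comparison table.
BENCHMARK_PRICES = {
    "qwen/qwen3.6-35b-a3b": ("0.15", "1.00"),
    "Qwen/Qwen3.5-35B-A3B": ("0.14", "1.00"),
    "Qwen/Qwen3-Coder-Next": ("0.11", "0.80"),
    "Qwen/Qwen3-VL-30B-A3B-Instruct": ("0.13", "0.52"),
    "google/gemma-4-31B-it": ("0.13", "0.38"),
    "google/gemma-4-26B-A4B-it": ("0.06", "0.33"),
    "mistralai/Mistral-Small-3.2-24B": ("0.075", "0.20"),
    "meta-llama/Llama-3.3-70B-Instruct": ("0.10", "0.32"),
    "openai/gpt-oss-120b": ("0.039", "0.18"),
    "allenai/Olmo-3-32B-Think": ("0.15", "0.50"),
}


def workload_cost(model_id: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate USD cost for one model, ignoring any cache discount."""
    in_rate, out_rate = BENCHMARK_PRICES[model_id]
    total = Decimal(in_rate) * input_tokens + Decimal(out_rate) * output_tokens
    return total / 1_000_000


def rank_by_cost(input_tokens: int, output_tokens: int) -> list[tuple[str, Decimal]]:
    """Return (model_id, cost) pairs sorted from cheapest to most expensive."""
    costs = [
        (model_id, workload_cost(model_id, input_tokens, output_tokens))
        for model_id in BENCHMARK_PRICES
    ]
    return sorted(costs, key=lambda pair: pair[1])
```

For a workload of 10M input and 2M output tokens, this ranking puts gpt-oss-120b first at $0.75 and the current Qwen3.6 route last at $3.50, which is why price alone should be weighed against status, context length, and fit.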

Get a model recommendation.

Send the rough shape of the workload. LighterHub replies with model fit, pricing guidance, and trial access when it qualifies.

No card required · Rough answers are fine · Usually under 60 seconds

Rough answers are enough. We use this only to reply and recommend a model. Do not paste secrets or private data.

Request received.

LighterHub will evaluate your workload and reply with a recommended model, rough token cost, and suggested access path.


Scale when the workload earns dedicated planning.

Enterprise-grade does not mean enterprise-only. It means the path from shared API to reserved GPU capacity is explicit, benchmarked, and confirmed before commitments are made.

Reserved Capacity Pilot

Reserved capacity fit

Move to reserved or dedicated GPU capacity when volume, privacy, latency, or predictable cost matters. USDC and USDT are available for approved customers where permitted.

Pilot offer: Up to 20% off
First 30 days of reserved-capacity planning. Final terms are based on model fit, GPU availability, usage profile, and compliance review.
Private capacity, same API shape. Start on shared Qwen access, then move to reserved NVIDIA A100 capacity when usage justifies it.

Designed for rapid launch.

For qualified Qwen deployments, LighterHub targets live API access in under 24 hours after access, GPU availability, and compliance approval. Custom model moves are benchmarked before launch.

Setup target: <24h
Qualified Qwen deployments target access in under 24 hours. Approved prepaid requests are typically much faster when shared Qwen3.6 capacity is ready.
Model size · GPU availability · Snapshot readiness · Traffic shape · Safety review · Compliance review

OpenAI-compatible integration.

The current route supports OpenAI-compatible chat completions, streaming and non-streaming responses, usage accounting, prefix/cache-aware pricing where supported, and clean overload behavior.
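On the client side, overload responses are usually handled with retries and backoff. The sketch below is one hedged way to do that. It assumes overload is signaled with HTTP 429 or 503, which the page does not confirm, and `send` is a hypothetical callable that performs one request and returns a status code and body.

```python
import random
import time
from typing import Callable, Tuple

# Assumption: these status codes indicate overload; not confirmed by the source.
RETRYABLE_STATUSES = {429, 503}


def call_with_backoff(
    send: Callable[[], Tuple[int, str]],
    max_attempts: int = 4,
    base_delay: float = 0.5,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, str]:
    """Call send() until it returns a non-retryable status or attempts run out."""
    for attempt in range(max_attempts):
        status, body = send()
        is_last_attempt = attempt == max_attempts - 1
        if status not in RETRYABLE_STATUSES or is_last_attempt:
            return status, body
        # Exponential backoff (0.5s, 1s, 2s, ...) plus up to 10% random jitter.
        delay = base_delay * (2 ** attempt)
        sleep(delay + random.uniform(0, delay * 0.1))
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so the retry schedule can be exercised in tests without waiting.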

Example request (do not paste secrets into demos):
curl https://api.lighterhub.app/v1/chat/completions \
  -H "Authorization: Bearer $LIGHTERHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3.6-35b-a3b",
    "messages": [
      {"role": "user", "content": "Summarize this policy memo."}
    ],
    "stream": true,
    "max_tokens": 700
  }'
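
The same request can be sketched in Python with only the standard library. The URL, model ID, and request body mirror the curl example; the streaming parser assumes OpenAI-style server-sent events (`data: {...}` lines ending with `data: [DONE]`), which is the usual shape for OpenAI-compatible APIs but is not spelled out on this page. The function names are illustrative.

```python
import json
import urllib.request
from typing import Iterable, Iterator

API_URL = "https://api.lighterhub.app/v1/chat/completions"  # from the curl example
MODEL_ID = "qwen/qwen3.6-35b-a3b"


def build_request(prompt: str, api_key: str, max_tokens: int = 700) -> urllib.request.Request:
    """Build (but do not send) a streaming chat-completions request."""
    body = {
        "model": MODEL_ID,
        "messages": [{"role": "user", "content": prompt}],
        "stream": True,
        "max_tokens": max_tokens,
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def iter_text_deltas(lines: Iterable[bytes]) -> Iterator[str]:
    """Yield text fragments from OpenAI-style SSE lines (assumed format)."""
    for raw in lines:
        line = raw.decode("utf-8").strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and SSE comments
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = (choice.get("delta") or {}).get("content")
            if text:
                yield text


def stream_completion(prompt: str, api_key: str) -> Iterator[str]:
    """Send the request and yield text as it arrives (performs a network call)."""
    request = build_request(prompt, api_key)
    with urllib.request.urlopen(request, timeout=60) as response:
        yield from iter_text_deltas(response)
```

A caller would read the key from the `LIGHTERHUB_API_KEY` environment variable, as in the curl example, and print each fragment yielded by `stream_completion` as it arrives.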

Trust boundaries.

Clear constraints help the right customers start faster and prevent unsupported expectations.

Is LighterHub enterprise-only?

No. Startups, small businesses, students, colleges, labs, nonprofits, and enterprises are welcome to request access. Larger or sensitive deployments receive deeper intake.

Are all workloads accepted?

No. Sensitive or large deployments go through model license, GPU availability, jurisdiction, safety, and compliance review. LighterHub supports customer-defined policy layers where appropriate.

How fast can I get access after paying?

Approved prepaid customers typically receive an API key and quickstart instructions within 30 minutes after checkout review when Qwen3.6 capacity is ready. Custom, high-volume, or reserved-capacity requests target access within 24 hours after payment, capacity confirmation, and compliance review.

What happens after I buy credits?

Your payment creates a setup request. LighterHub reviews payment status, region, workload fit, and current Qwen3.6 capacity. Approved prepaid customers typically receive an API key and quickstart instructions within 30 minutes after checkout review. If access cannot be approved or provisioned, LighterHub will contact you with next steps or refund guidance.

Where is LighterHub currently available?

LighterHub reviews customers worldwide where permitted. Priority launch markets include the United States, Canada, United Kingdom, Australia, New Zealand, Japan, South Korea, Taiwan, Belgium, Denmark, Finland, France, Germany, Ireland, Italy, Netherlands, Norway, Spain, and Sweden. Southeast Asia, including Vietnam, Thailand, Singapore, Malaysia, Indonesia, the Philippines, and Brunei, is available through manual review. Access depends on sanctions screening, export-control review, payment availability, model-license fit, capacity, and acceptable-use review.

Is there a formal SLA?

Shared access is offered without a formal enterprise SLA. Reserved-capacity terms are quoted after benchmark and operational review.

How does billing go live?

LighterHub verifies that public API prices match backend billing before a route is deployed. Reserved capacity is quoted separately after workload benchmark and capacity planning.