OpenAI-compatible API with 1,000,000 input tokens free. No card. Add prepaid credits only after it works for your workload.
We email an API key loaded with 1,000,000 input tokens. One trial per email; the key expires after 48 hours.
Already validated the API? Top up prepaid credits. No auto-renewal.
Use these settings in any OpenAI-compatible tool, or paste the cURL request to verify the key immediately.
API type: OpenAI-compatible
Base URL: https://api.lighterhub.app/v1
Model: qwen/qwen3.6-35b-a3b
API key: lh_YOUR_KEY

# After you receive your key by email:
curl https://api.lighterhub.app/v1/chat/completions \
  -H "Authorization: Bearer lh_YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "qwen/qwen3.6-35b-a3b",
    "messages": [{"role": "user", "content": "hello"}]
  }'
from openai import OpenAI

client = OpenAI(
    base_url="https://api.lighterhub.app/v1",
    api_key="lh_YOUR_KEY",
)

response = client.chat.completions.create(
    model="qwen/qwen3.6-35b-a3b",
    messages=[{"role": "user", "content": "hello"}],
)
import OpenAI from "openai";
const client = new OpenAI({
baseURL: "https://api.lighterhub.app/v1",
apiKey: "lh_YOUR_KEY",
});
const response = await client.chat.completions.create({
model: "qwen/qwen3.6-35b-a3b",
messages: [{ role: "user", content: "hello" }],
});
Stress-tested on Qwen3.6 35B-A3B FP8 through the public API endpoint, including Cloudflare Tunnel overhead.
Qwen3.6 35B-A3B powers every request. Use it when you need a practical text model for code, documents, agents, and chat without seat fees or subscriptions.
It is a good fit when your product needs to read instructions, reason over text, write code, summarize documents, or respond to users in a chat flow. The API is OpenAI-compatible, so most apps can switch by changing the base URL and model name.
Explain code, draft changes, review diffs, and help users work through implementation details.
Answer questions from policies, help docs, manuals, tickets, or internal knowledge bases.
Plan a task, call tools, keep context, and produce a useful final answer for workflows.
Power support bots, onboarding helpers, and product assistants that need context-aware replies.
Dedicated GPU capacity for this endpoint.
Input: text you send to the model.
Cached input: repeated prefix context, billed 67% below fresh input.
Output: text the model generates back.
No seat fees, subscriptions, or contracts.
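Since cached input is billed 67% below fresh input, the blended cost of a request depends on how much of the prompt is a repeated prefix. A minimal cost sketch in Python; the per-million price here is a placeholder, not the real list price:

```python
def input_cost_usd(fresh_tokens, cached_tokens, fresh_price_per_m=1.00):
    # fresh_price_per_m is a hypothetical placeholder price per million tokens.
    cached_price_per_m = fresh_price_per_m * (1 - 0.67)  # 67% discount on cached input
    return (fresh_tokens * fresh_price_per_m
            + cached_tokens * cached_price_per_m) / 1_000_000
```

At the placeholder price, a request with a fully cached 1M-token prefix costs roughly a third of the same request sent fresh.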
Best fit: coding assistants, multi-step agents, long-context RAG with cached chunks, customer support chat, and internal document workflows.
The trial stays low-friction, while the runtime keeps the production controls that buyers ask about first.
Requests run through a controlled OpenAI-compatible wrapper on reserved A100 80GB capacity, with explicit overload behavior and usage accounting.
Operational logs keep metadata such as token counts, latency, and status codes, not prompt content.
Streaming and non-streaming responses include usage objects for predictable prepaid billing.
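Because usage objects are included even when streaming, a client can meter its own spend. A minimal sketch, assuming OpenAI-style chunk dictionaries where only some chunks (typically the final one) carry a `usage` field:

```python
def tally_usage(chunks):
    """Sum token usage across streamed chunks; chunks without usage are skipped."""
    totals = {"prompt_tokens": 0, "completion_tokens": 0}
    for chunk in chunks:
        usage = chunk.get("usage")
        if usage:
            totals["prompt_tokens"] += usage.get("prompt_tokens", 0)
            totals["completion_tokens"] += usage.get("completion_tokens", 0)
    return totals
```

With the official OpenAI SDKs, passing `stream_options={"include_usage": True}` is how OpenAI-compatible servers are usually asked to emit that final usage chunk.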
/health and /readiness track uptime, latency, and backend availability.
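A monitoring job can poll those paths directly. A minimal probe using only the Python standard library; the `/health` path comes from the docs above, while the exact response body shape is an assumption, so this checks only the HTTP status:

```python
import urllib.request

def check_health(base="https://api.lighterhub.app"):
    """Return True if the service's /health endpoint answers 200 OK."""
    # /readiness can be probed the same way by swapping the path.
    try:
        with urllib.request.urlopen(f"{base}/health", timeout=5) as resp:
            return resp.status == 200
    except OSError:
        return False
```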
Rate limits, body-size caps, and timing-safe token checks protect the public endpoint.
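A timing-safe token check means the comparison takes the same time whether a presented key fails on the first byte or the last, so latency leaks nothing about the secret. A minimal sketch of the idea using Python's standard `hmac.compare_digest`; the function name is illustrative, not the service's actual implementation:

```python
import hmac

def token_matches(presented: str, expected: str) -> bool:
    # compare_digest runs in time independent of where the inputs first differ,
    # unlike `==`, which short-circuits and can leak key prefixes via timing.
    return hmac.compare_digest(presented.encode(), expected.encode())
```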
Simple public entrypoint, controlled wrapper, dedicated inference backend.
https://api.lighterhub.app/v1/chat/completions
Public HTTPS endpoint and tunnel routing.
OpenAI-compatible validation, auth, and billing logic.
Streaming completions with usage objects enforced.
Reserved GPU capacity for Qwen3.6 serving.