4 min read · streaming · api · python · tutorial · multi-ai

Streaming vs Non-Streaming AI APIs: When to Use Which

Learn the difference between streaming and non-streaming API calls across Claude, GPT, Gemini, Grok, and DeepSeek — when to use each, with Python examples.

What's the Difference?

When you call any modern AI API (Claude, GPT, Gemini, Grok, or DeepSeek), you have two options for receiving responses:

  • Non-streaming: You send a request and wait. The API processes the entire response, then returns it all at once as a single JSON object.
  • Streaming: You send a request and immediately start receiving the response in small chunks (tokens) as they're generated, delivered via Server-Sent Events (SSE).

Both modes produce identical output. The difference is entirely about how and when you receive that output.
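
To make the SSE mechanics concrete, here is a minimal sketch of what the streaming wire format looks like without an SDK, assuming the proxy forwards the standard OpenAI-style event stream that the SDK examples below rely on (the prompt is just a placeholder):

    import json
    import requests

    resp = requests.post(
        "https://aiapi.cheap/api/proxy/v1/chat/completions",
        headers={"Authorization": "Bearer sk-aic-YOUR_API_KEY"},
        json={
            "model": "gpt-4o",
            "messages": [{"role": "user", "content": "Say hello."}],
            "stream": True,
        },
        stream=True,  # tell requests not to buffer the whole body
    )

    for line in resp.iter_lines():
        if not line:
            continue  # SSE events are separated by blank lines
        payload = line.decode("utf-8").removeprefix("data: ")
        if payload == "[DONE]":  # sentinel that ends an OpenAI-style stream
            break
        chunk = json.loads(payload)
        delta = chunk["choices"][0]["delta"].get("content")
        if delta:
            print(delta, end="", flush=True)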

When to Use Non-Streaming

Non-streaming is simpler to implement and ideal when you don't need real-time output:

  • Batch processing — analyzing hundreds of documents where you collect results afterward
  • Backend pipelines — extracting data, classifying text, or generating summaries in automated workflows
  • Simple integrations — scripts and tools where you just need the final answer
  • Testing and prototyping — easier to debug with a single complete response

The tradeoff is perceived latency. For long responses, the user sees nothing until the entire generation finishes, which can take several seconds.
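
For instance, the batch-processing case comes down to firing non-streaming calls in a loop and collecting the finished answers. A minimal sketch, assuming the same OpenAI-compatible client used in the examples below (the documents and prompt are hypothetical):

    from openai import OpenAI

    client = OpenAI(
        api_key="sk-aic-YOUR_API_KEY",
        base_url="https://aiapi.cheap/api/proxy/v1",
    )

    documents = ["First document text...", "Second document text..."]
    results = []
    for doc in documents:
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user",
                       "content": f"Classify the sentiment of this text: {doc}"}],
        )
        # Non-streaming: each answer arrives as one complete object
        results.append(response.choices[0].message.content)

    print(results)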

Non-Streaming Example (Anthropic SDK — Claude)

    import anthropic
    
    client = anthropic.Anthropic(
        api_key="sk-aic-YOUR_API_KEY",
        base_url="https://aiapi.cheap/api/proxy"
    )
    
    # Blocks until the full response is generated, then returns it in one object
    response = client.messages.create(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Explain recursion in 3 sentences."}]
    )
    
    print(response.content[0].text)

Non-Streaming Example (OpenAI SDK — works for all 5 vendors)

    from openai import OpenAI
    
    client = OpenAI(
        api_key="sk-aic-YOUR_API_KEY",
        base_url="https://aiapi.cheap/api/proxy/v1"
    )
    
    # Swap model to gpt-4o, gemini-2.5-pro, grok-2, or deepseek-chat — same call shape
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Explain recursion in 3 sentences."}]
    )
    
    print(response.choices[0].message.content)

When to Use Streaming

Streaming is the better choice when responsiveness matters:

  • Chatbots and conversational UIs — users see words appear in real time, just like ChatGPT
  • Long-form generation — articles, code, and reports that take several seconds to complete
  • Live dashboards — showing AI-generated insights as they're produced
  • Time-to-first-token matters — streaming starts delivering content in milliseconds instead of seconds

Streaming dramatically improves perceived performance. Even though total generation time is the same, users feel like the response is faster because they see output immediately.
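
To quantify that, here is a minimal sketch that measures time-to-first-token against total generation time, assuming the same OpenAI-compatible client as the examples below (actual timings vary by vendor and load):

    import time

    from openai import OpenAI

    client = OpenAI(
        api_key="sk-aic-YOUR_API_KEY",
        base_url="https://aiapi.cheap/api/proxy/v1",
    )

    start = time.perf_counter()
    stream = client.chat.completions.create(
        model="gpt-4o",
        messages=[{"role": "user", "content": "Explain recursion in 3 sentences."}],
        stream=True,
    )

    first_token = None
    for chunk in stream:
        if chunk.choices and chunk.choices[0].delta.content:
            if first_token is None:
                first_token = time.perf_counter() - start  # time to first token
    total = time.perf_counter() - start  # total generation time

    print(f"first token: {first_token:.2f}s, full response: {total:.2f}s")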

Streaming Example (Anthropic SDK)

    import anthropic
    
    client = anthropic.Anthropic(
        api_key="sk-aic-YOUR_API_KEY",
        base_url="https://aiapi.cheap/api/proxy"
    )
    
    # Opens an SSE stream; text_stream yields plain-text deltas as they arrive
    with client.messages.stream(
        model="claude-sonnet-4-6",
        max_tokens=1024,
        messages=[{"role": "user", "content": "Explain recursion in 3 sentences."}]
    ) as stream:
        for text in stream.text_stream:
            print(text, end="", flush=True)

Streaming Example (OpenAI SDK — works for all 5 vendors)

    from openai import OpenAI
    
    client = OpenAI(
        api_key="sk-aic-YOUR_API_KEY",
        base_url="https://aiapi.cheap/api/proxy/v1"
    )
    
    stream = client.chat.completions.create(
        model="gemini-2.5-flash",  # or claude-*, gpt-*, grok-2, deepseek-chat
        messages=[{"role": "user", "content": "Explain recursion in 3 sentences."}],
        stream=True  # return an iterator of chunks instead of one response
    )
    
    for chunk in stream:
        delta = chunk.choices[0].delta.content
        if delta:
            print(delta, end="", flush=True)

Key Differences at a Glance

  • Latency: Non-streaming waits for full completion. Streaming typically delivers the first token within a few hundred milliseconds.
  • Complexity: Non-streaming is a single request/response. Streaming requires handling an event stream.
  • Error handling: Non-streaming errors come back in one response. Streaming errors can occur mid-stream, so handle partial failures (see the sketch below).
  • Cost: Both modes cost exactly the same per token. No pricing difference.
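
Because a stream can fail after some tokens have already arrived, keep whatever partial text you have received. A minimal sketch of that pattern, assuming the same OpenAI-compatible client as above:

    from openai import OpenAI

    client = OpenAI(
        api_key="sk-aic-YOUR_API_KEY",
        base_url="https://aiapi.cheap/api/proxy/v1",
    )

    collected = []  # holds partial output if the stream dies mid-response
    try:
        stream = client.chat.completions.create(
            model="gpt-4o",
            messages=[{"role": "user", "content": "Summarize SSE in one paragraph."}],
            stream=True,
        )
        for chunk in stream:
            delta = chunk.choices[0].delta.content
            if delta:
                collected.append(delta)
    except Exception as exc:  # network drop, timeout, or mid-stream API error
        print(f"stream interrupted after {len(collected)} chunks: {exc}")

    text = "".join(collected)  # everything received before any failure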

Multi-Vendor Streaming Through aiapi.cheap

Both streaming and non-streaming work identically across all five vendors through our proxy. Just set your base_url to https://aiapi.cheap/api/proxy (Anthropic format) or https://aiapi.cheap/api/proxy/v1 (OpenAI-compatible) and everything works — no code changes beyond the URL.

All streaming events are forwarded in real time with minimal added latency, regardless of vendor.
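
For example, the same loop can hit all five vendors by changing nothing but the model string (a sketch using the model names from the examples above):

    from openai import OpenAI

    client = OpenAI(
        api_key="sk-aic-YOUR_API_KEY",
        base_url="https://aiapi.cheap/api/proxy/v1",
    )

    # One call shape, five vendors: only the model string changes
    for model in ["claude-sonnet-4-6", "gpt-4o", "gemini-2.5-flash",
                  "grok-2", "deepseek-chat"]:
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": "Say hi in five words."}],
        )
        print(f"{model}: {response.choices[0].message.content}")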

Our Recommendation

Use streaming for anything user-facing where people are waiting for a response. Use non-streaming for background tasks where you just need the final result. Many production applications use both — streaming in the chat UI and non-streaming in the data pipeline.

Get your API key →

For detailed setup instructions and code examples, check our API documentation.