What are the billing methods for AI Token? Not every platform is the same

Many people who are new to AI APIs assume billing is simple: count the tokens you send in, count the tokens the model returns, and multiply each by its price.

This understanding is not wrong, but it is only half right. In practice, although every platform talks about tokens, the billing logic is not necessarily the same.

Beyond input and output, some platforms also price cached input, prompt caching, context caching, batch discounts, long-context thresholds, search tools, grounding, and multimodal units such as images or audio as separate items. Even the same model can end up with a different final price on a different platform.

OpenAI's API Pricing page lists input, cached input, and output separately, and also lists Web search, Containers, Batch API and other cost items; Anthropic's official pricing page separates prompt caching, Batch processing, long context pricing, tool use pricing into independent chapters; Google Gemini's pricing page also separately describes input, output, context caching, storage price, Grounding with Google Search and other items.

So if you are a novice, what you should really ask is not "which one is the cheapest", but: In what ways does this platform charge you?

If you have already read about the basic concepts of AI Token, this article takes "how to read the price page" one step further, so you can understand the pricing structures behind different platforms.

Conclusion first: AI APIs have more than one common billing method

The common billing methods on mainstream platforms can be roughly divided into several categories:

The most basic input token / output token

cached input or prompt caching

Batch asynchronous discount

Special pricing for long context or very large prompts

Additional costs for search, grounding, and tool calls

Different pricing units for multimodal content such as images, audio, and video

Price differences caused by regions, plans, and third-party cloud platforms

In other words, even if both are called "AI Model API," the final bill may look completely different. This is exactly what the official pricing pages of OpenAI, Anthropic, and Google now show together.

First type (the most basic): input tokens and output tokens are priced separately

This is the most common billing model and the first one most novices encounter.

OpenAI’s API Pricing page is very clear. GPT-5.4 mini and GPT-5.4 nano both split the price into three columns: Input, Cached input, and Output.

The input of GPT-5.4 mini is US$0.75 per 1M tokens, the cached input is US$0.075 per 1M tokens, and the output is US$4.50 per 1M tokens.

The official pricing page of Google Gemini also lists input and output separately, and some models are explicitly marked "Output price including thinking tokens", meaning that thinking tokens are billed as part of the output cost.

For example, one section of the Gemini 2.5 Pro listing reads: for prompts of up to 200k tokens, input is $0.625 and output is $5; above 200k tokens, both input and output prices jump.

The key point is simple: you pay not only for what you send in, but also for what the model sends back.
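As a rough sketch of this basic model, the calculation below uses the GPT-5.4 mini input and output prices quoted above. The function name and the token counts are illustrative assumptions, not part of any platform's SDK.

```python
# Prices in USD per 1M tokens, taken from the GPT-5.4 mini figures quoted above.
INPUT_PRICE_PER_M = 0.75
OUTPUT_PRICE_PER_M = 4.50


def basic_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost in USD when input and output tokens are priced separately."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M
    output_cost = output_tokens / 1_000_000 * OUTPUT_PRICE_PER_M
    return input_cost + output_cost


# Hypothetical request: 2,000 input tokens and 500 output tokens.
cost = basic_cost(2_000, 500)
print(f"${cost:.5f}")
```

In this example the 500 output tokens cost more than the 2,000 input tokens, because the output price per token is six times the input price.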

Second type: reused content is priced separately as cache

Many people assume a prompt is simply resent and recomputed at full price every time, but some platforms now identify reused content and bill it separately.

OpenAI’s price page lists cached input directly, and the price is usually much lower than that of standard input. Taking GPT-5.4 as an example, input is $2.50 per 1M tokens and cached input is $0.25 per 1M tokens.

Anthropic has a similar concept, but calls it prompt caching rather than cached input. The official documentation states that cache write tokens are charged when content is first written to the cache, cache read tokens are charged when later requests read it, and a cache hit costs approximately 10% of the standard input price. Anthropic also states that these multipliers stack with other pricing modifiers such as the Batch API discount and data residency.

Google Gemini goes a step further and lists not only a context caching price but also a storage price, quoted per million tokens per hour. For example, some models list a storage price of $1.00 / 1,000,000 tokens per hour.

So all three offer a "cache", but each presents it differently: one uses cached input, one uses prompt caching, and one uses context caching plus storage. This is why you cannot look only at the model's unit price and ignore the platform's pricing structure.
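To see how much a cache hit can change the input side of the bill, here is a minimal sketch using the GPT-5.4 input and cached input prices quoted above. It assumes a flat cached rate for the reused portion and deliberately ignores cache write charges and storage fees, which some platforms bill separately. The function name and token counts are illustrative.

```python
# Prices in USD per 1M tokens, from the GPT-5.4 figures quoted above.
INPUT_PRICE_PER_M = 2.50
CACHED_INPUT_PRICE_PER_M = 0.25


def input_cost(total_input_tokens: int, cached_tokens: int) -> float:
    """Input cost in USD when part of the prompt is billed at the cached rate."""
    if not 0 <= cached_tokens <= total_input_tokens:
        raise ValueError("cached_tokens must be between 0 and total_input_tokens")
    uncached_tokens = total_input_tokens - cached_tokens
    return (
        uncached_tokens * INPUT_PRICE_PER_M
        + cached_tokens * CACHED_INPUT_PRICE_PER_M
    ) / 1_000_000


# Hypothetical 10,000-token prompt, of which 8,000 tokens are a reused prefix.
no_cache = input_cost(10_000, 0)
with_cache = input_cost(10_000, 8_000)
print(f"without cache: ${no_cache:.4f}, with cache: ${with_cache:.4f}")
```

Under these assumptions, the same prompt costs well under a third as much on the input side once most of it is served from cache.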

Third type: Batch asynchronous discount

If your workload is not real-time interaction but a batch job that can wait, some platforms give you a significant discount.

The official price page of OpenAI directly states: Batch API can save 50% of input and output costs, provided that the task runs asynchronously and is completed within 24 hours.

The official Anthropic document also lists Batch processing as a separate pricing chapter, and clearly states that the prompt caching multiplier can be stacked with the Batch API discount. This means that with a well-designed workflow, the actual cost can differ substantially from that of real-time calls.

For enterprises, this difference matters. If your task is not real-time customer service but overnight classification, batch summarization, or report generation, then skipping Batch means giving up a large cost-optimization opportunity. This is a practical inference from the official Batch pricing descriptions of OpenAI and Anthropic.
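The effect of stacking discounts can be sketched as follows. The 50% batch figure and the roughly 10% cache read figure come from the descriptions above; treating them as multipliers that combine by multiplication is an assumption for illustration, and the base price of $3.00 per 1M tokens is a hypothetical placeholder rather than any model's real price.

```python
BATCH_MULTIPLIER = 0.5       # 50% off, per the Batch description above
CACHE_READ_MULTIPLIER = 0.1  # roughly 10% of standard input, per the caching section


def effective_input_price(base_price_per_m: float, use_batch: bool, cache_hit: bool) -> float:
    """Effective input price per 1M tokens, assuming discounts multiply together."""
    price = base_price_per_m
    if cache_hit:
        price *= CACHE_READ_MULTIPLIER
    if use_batch:
        price *= BATCH_MULTIPLIER
    return price


BASE = 3.00  # hypothetical standard input price, USD per 1M tokens
for use_batch in (False, True):
    for cache_hit in (False, True):
        price = effective_input_price(BASE, use_batch, cache_hit)
        print(f"batch={use_batch!s:5} cache_hit={cache_hit!s:5} -> ${price:.2f} per 1M tokens")
```

Check each platform's documentation for how its modifiers actually combine before relying on numbers like these in a budget.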

Fourth type: long context is not always billed at the standard price

This is the point novices most often overlook, and also the one that most easily runs up the bill.

The official pricing page of Google Gemini directly states: For some models, after prompts exceed 200k tokens, the prices of input, output, and even context caching will jump up.

For example, in one Gemini 2.5 Pro price section, for prompts of up to 200k tokens, input is $0.625 and output is $5; above 200k tokens, input becomes $1.25 and output becomes $7.50, and context caching also becomes more expensive.

Anthropic does it differently. Official documents indicate that the full 1M token context window of Claude Opus 4.6 and Sonnet 4.6 is currently available at standard prices, and prompt caching and batch processing discounts can also be applied to the full context window. In other words, Anthropic’s long-context strategy in this version is different from Google’s threshold price jump logic.

OpenAI directly states on the price page: The standard rates listed on the page reflect the standard processing rates for context lengths under 270K. In other words, the platform itself ties its listed prices to a context length range.

So "the model supports long context" is not a free benefit in itself. Some platforms jump to a higher rate once a threshold is crossed, some still charge standard prices across the full long context, and some state up front which context range the standard rates apply to. Supporting long context does not mean long context is always billed at the standard price.
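A threshold price jump can be sketched with the Gemini 2.5 Pro figures quoted above. This sketch assumes the prices are per 1M tokens and that the whole request moves to the higher rate once the prompt exceeds the threshold; confirm both against the current pricing page before using it. The function name and token counts are illustrative.

```python
THRESHOLD_TOKENS = 200_000

# (input price, output price) in USD per 1M tokens, from the figures quoted above.
RATES = {
    "standard": (0.625, 5.00),
    "long": (1.25, 7.50),
}


def tiered_cost(prompt_tokens: int, output_tokens: int) -> float:
    """Cost in USD, assuming the whole request is billed at the tier set by prompt size."""
    tier = "long" if prompt_tokens > THRESHOLD_TOKENS else "standard"
    input_price, output_price = RATES[tier]
    return (prompt_tokens * input_price + output_tokens * output_price) / 1_000_000


# One extra prompt token moves the entire request to the higher tier.
at_threshold = tiered_cost(200_000, 1_000)
over_threshold = tiered_cost(200_001, 1_000)
print(f"at threshold: ${at_threshold:.4f}, just over: ${over_threshold:.4f}")
```

Under these assumptions the cost nearly doubles at the boundary, which is why trimming a prompt to stay under the threshold can matter more than the trimmed tokens themselves.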

Fifth type: search, grounding, and tool calls are often billed separately

Many users only focus on the token unit price at first, but overlook that many AI platforms now support tool calling, and tools are often billed by more than just tokens.

OpenAI's price page directly lists Web search as $10/1,000 calls, and also states that search content tokens are free. This means that OpenAI’s search tool charges per call, not just per token.

Anthropic describes its tool pricing in more detail. The official document states that Web search usage is charged in addition to token usage, the price is $10/1,000 searches, and the content generated by the search results is also counted toward the standard token cost.

Google Gemini doesn’t just look at tokens. The official pricing page lists Grounding with Google Search, and different models have different free credits, with most starting at $35/1,000 grounded prompts. The page also clearly lists additional items such as Grounding with Google Maps.

So if your application is a search assistant, RAG, map assistant, or an agent that calls a large number of tools, the real bill is probably not simple input/output, but a combination of token costs plus tool fees.
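The "token plus tool fee" combination can be sketched like this, using the $10 per 1,000 web search calls figure quoted above. The token cost and the number of searches are hypothetical inputs, and the function name is illustrative.

```python
WEB_SEARCH_PRICE_PER_1K_CALLS = 10.00  # USD, from the figure quoted above


def request_cost(token_cost_usd: float, search_calls: int) -> float:
    """Total cost in USD: token cost plus a per-call tool fee."""
    tool_fee = search_calls * WEB_SEARCH_PRICE_PER_1K_CALLS / 1_000
    return token_cost_usd + tool_fee


# Hypothetical agent turn: $0.004 of tokens and 3 web searches.
total = request_cost(0.004, 3)
print(f"tokens: $0.0040, tools: ${3 * WEB_SEARCH_PRICE_PER_1K_CALLS / 1_000:.4f}, total: ${total:.4f}")
```

In this example the three searches cost several times more than the tokens, so a tool-heavy agent cannot be budgeted from token prices alone.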

Sixth type: different modalities are not necessarily priced in the same unit

Not all AI content is priced in text tokens.

OpenAI’s official pricing page is typical. GPT-realtime-1.5 lists Audio, Text, and Image separately; GPT-image-1.5 also lists image input / cached input / output, and text input / cached input / output; while some products such as Sora are priced directly in other units, no longer just text tokens.

Google Gemini’s pricing page is also multi-unit. In addition to text tokens, it lists the input prices of text, image, audio, and video separately. Some items even explain the conversion per image, per second of audio, and per frame of video.

So when you look at the price, you cannot just ask "How much does one million tokens cost?" You should first ask: is this feature billed per token, per call, per unit of time, or per image and per second of audio?
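One way to keep mixed units straight is to record each charge as a line item with its own unit. The sketch below is only a bookkeeping pattern: every quantity and unit price in it is a made-up placeholder, not a real price from any platform, and the class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LineItem:
    description: str
    unit: str          # e.g. "1M tokens", "image", "second", "call"
    quantity: float    # how many of that unit were used
    unit_price: float  # USD per unit (placeholder values below, not real prices)

    @property
    def cost(self) -> float:
        return self.quantity * self.unit_price


def total_cost(items: list[LineItem]) -> float:
    """Sum the cost of all line items, whatever unit each one is billed in."""
    return sum(item.cost for item in items)


# Hypothetical bill mixing four different billing units.
bill = [
    LineItem("text input", "1M tokens", 0.5, 1.00),
    LineItem("generated images", "image", 4, 0.05),
    LineItem("audio input", "second", 120, 0.001),
    LineItem("web search", "call", 10, 0.01),
]

for item in bill:
    print(f"{item.description:18} {item.quantity:>7} x {item.unit:10} = ${item.cost:.2f}")
print(f"total: ${total_cost(bill):.2f}")
```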

Seventh type: the same model may be priced differently on a different platform

This is especially important for enterprises.

Anthropic's official documents list Claude as also available on third-party platforms such as Amazon Bedrock and Google Cloud Vertex AI. The implication is that the same Claude model name does not guarantee the same final price on every platform.

OpenAI has similar endpoint differences. The official price page states that for models released after March 5, 2026, if the Data residency and Regional Processing endpoint is used, an additional 10% will be charged. In other words, even if it is the same model, the bill may be different just because of the different deployment and processing areas.

Google Gemini lists both Free Tier and Paid Tier on the same price page, and also shows how the "Used to improve our products" row differs between the free and paid plans. This illustrates once again that it is not only the model that differs: the plan tier itself also changes your costs and usage conditions.
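The 10% regional surcharge described above for OpenAI can be sketched as a simple multiplier. This assumes the uplift applies to the whole base cost of a request; the base amount and the function name are hypothetical.

```python
REGIONAL_UPLIFT = 0.10  # additional 10%, from the description above


def with_regional_uplift(base_cost_usd: float, regional_endpoint: bool) -> float:
    """Apply the regional surcharge only when a regional endpoint is used."""
    if regional_endpoint:
        return base_cost_usd * (1 + REGIONAL_UPLIFT)
    return base_cost_usd


# Hypothetical monthly spend of $100 on the same model.
default_cost = with_regional_uplift(100.0, regional_endpoint=False)
regional_cost = with_regional_uplift(100.0, regional_endpoint=True)
print(f"default endpoint: ${default_cost:.2f}, regional endpoint: ${regional_cost:.2f}")
```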

So how should newbies read AI API prices?

The simplest approach is to look beyond the model's unit price on the homepage and check the following points:

First, see if it only counts input and output, or whether it also includes cached input, prompt caching or context caching.

Second, see if it has a Batch discount.

Third, see if the long context will increase the rate.

Fourth, see if there are any additional charges for search, grounding, and tool calls.

Fifth, check whether images, audio, and video are priced in other units.

Sixth, see if different endpoints, different cloud platforms, and different plans have different prices.

If you only ask "How much does this model charge per million input tokens?", you will probably see only a small part of the bill.

If you just want to remember the most important thing first, that is:

The AI Token billing method is not the same for every company.

Some platforms focus on input/output. Some platforms will separate the cache and calculate the price separately. Some platforms encourage you to use Batch. Some platforms set separate prices for long context, search tools, and multi-modal content. Some platforms will have different final prices due to regions, plans or third-party cloud platforms.

So whether you are a novice, an advanced user, or a company evaluating the introduction of AI, what you really need to learn is not just to compare who is the cheapest, but to understand first: how this platform charges you.

FAQ

Are the AI Token billing methods only input and output?

No. In addition to input/output, many platforms will also list additional costs for cached input, prompt caching, context caching, Batch, Grounding, search tools, etc.

Is Batch API really cheaper?

In many cases, yes. OpenAI officially states that the Batch API can save 50% of input and output costs; Anthropic also treats the Batch discount as a formal pricing modifier.

Are long contexts necessarily more expensive?

Not necessarily; it varies by provider. The price of some Google Gemini models increases after prompts exceed 200k tokens, while Anthropic's Claude Opus 4.6 and Sonnet 4.6 keep the standard price across the full 1M token context window.

Is the search tool fee also included in the token?

Not necessarily. The official pages of OpenAI and Anthropic both describe Web search as charged per call or per search, possibly with token costs on top; Google's Grounding with Google Search is also a separately priced item.

Will the same model have the same price on different platforms?

Not necessarily. Anthropic's official documents note that Claude is also available on Bedrock and Vertex AI; OpenAI also states that the Regional Processing endpoint charges an additional 10%.

Data source and credibility statement

This article is compiled and written based on the official pricing pages and official documents of mainstream AI platforms, focusing on the following sources:

OpenAI｜API Pricing

Anthropic｜Pricing

Google AI for Developers｜Gemini API pricing

After understanding the different AI Token billing methods, the next step is to compare the rates of each model and platform on the same benchmark. You can then look at the price of AI Token.


