How to calculate AI token cost? Separate input and output first

When people first start using an AI API, many assume cost calculation is simple: isn't it just the number of tokens multiplied by the price?

That isn't wrong, but it skips the most important step. An accurate calculation does not lump all tokens together. It separates input and output first, then multiplies each by its own unit price.

That is because most mainstream platforms now price input and output separately, and output is often more expensive than input. OpenAI's API Pricing page lists GPT-5.4 mini at $0.75 per 1M input tokens, $0.075 per 1M cached input tokens, and $4.50 per 1M output tokens. Anthropic's Claude pricing page lists Claude Haiku 4.5 at $1/MTok for input and $2/MTok for output. Google's Gemini pricing page prices input, output, and context caching separately, and some models switch to higher rates once prompts exceed 200k tokens.

So this article is not about what an AI token is, or how to read a pricing page. It answers a more practical question directly: how do you calculate the cost of AI tokens?

The short answer: to calculate AI token cost, split input and output first

The most practical algorithm is actually very simple:

Input cost = input tokens ÷ 1,000,000 × input unit price

Output cost = output tokens ÷ 1,000,000 × output unit price

Total cost = input cost + output cost

Any cache discounts, Batch discounts, tool fees, long-context surcharges, and regional processing fees are then applied on top of this base.

This matches the official pricing structures of OpenAI, Anthropic, and Google. All three quote input and output prices separately instead of giving a single blended average price.
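The three-line formula translates directly into code. Below is a minimal Python sketch, assuming prices are given in USD per 1M tokens. It uses `Decimal` rather than floats so that small dollar amounts don't pick up rounding noise. The function name `request_cost` is illustrative and not part of any platform SDK.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)


def request_cost(input_tokens: int, output_tokens: int,
                 input_price: Decimal, output_price: Decimal) -> dict:
    """Price input and output separately, then add them.

    Prices are USD per 1M tokens. Cache, Batch, tool, long-context and
    regional adjustments are NOT applied here; they come on top.
    """
    input_cost = input_tokens / PER_MILLION * input_price
    output_cost = output_tokens / PER_MILLION * output_price
    return {"input": input_cost, "output": output_cost,
            "total": input_cost + output_cost}


# Example: 20,000 input and 5,000 output tokens at $0.75 / $4.50 per 1M.
print(request_cost(20_000, 5_000, Decimal("0.75"), Decimal("4.50")))
```

Returning the input and output parts separately, not just the total, keeps the cost structure visible, which is the point of splitting them in the first place.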

Step 1: Find out how many input and output tokens were used

First you need to know how many input tokens and how many output tokens the request used. Platforms usually report these numbers in the response's usage information.

OpenAI's documentation says API response metadata includes input tokens, output tokens, cached tokens, and other fields, and that these figures are used directly for billing and usage tracking. Anthropic's pricing documentation likewise says the usage fields report input, cache writes, cache reads, and output. Google Gemini provides token-counting documentation and usage metadata to help you estimate token counts and check actual usage.

In other words, the first step in cost calculation is not finding the cheapest model. It is knowing how many input tokens this request used and how many output tokens the model returned.
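Because each platform names these fields differently, it can help to normalize them into the same buckets before pricing. The sketch below works on plain dicts shaped like each API's usage object. The field names (`input_tokens_details.cached_tokens` for OpenAI's Responses API; `cache_read_input_tokens` and `cache_creation_input_tokens` for Anthropic; `prompt_token_count`, `candidates_token_count`, `cached_content_token_count` and `thoughts_token_count` for Gemini) and which counts include which reflect my understanding of those APIs; treat them as assumptions and check the current API references. The function `normalize_usage` is hypothetical.

```python
def normalize_usage(provider: str, usage: dict) -> dict:
    """Map a provider's usage dict onto common billing buckets (a sketch)."""
    if provider == "openai":
        # Assumed: cached_tokens is a subset of input_tokens.
        cached = (usage.get("input_tokens_details") or {}).get("cached_tokens", 0)
        return {"uncached_input": usage["input_tokens"] - cached,
                "cache_read": cached,
                "cache_write": 0,
                "output": usage["output_tokens"]}
    if provider == "anthropic":
        # Assumed: input_tokens excludes cache reads and writes,
        # which are reported in their own fields.
        return {"uncached_input": usage["input_tokens"],
                "cache_read": usage.get("cache_read_input_tokens", 0),
                "cache_write": usage.get("cache_creation_input_tokens", 0),
                "output": usage["output_tokens"]}
    if provider == "gemini":
        # Assumed: prompt_token_count includes cached tokens, and
        # thinking tokens are billed as output.
        cached = usage.get("cached_content_token_count", 0)
        return {"uncached_input": usage["prompt_token_count"] - cached,
                "cache_read": cached,
                "cache_write": 0,
                "output": usage["candidates_token_count"]
                + usage.get("thoughts_token_count", 0)}
    raise ValueError(f"unknown provider: {provider!r}")


# Hypothetical usage object, not real API output.
print(normalize_usage("openai", {"input_tokens": 20_000, "output_tokens": 5_000,
                                 "input_tokens_details": {"cached_tokens": 15_000}}))
```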

Step 2: Look up the model's input and output unit prices

Don't stop at the model name: check exactly which model, mode, and platform you are actually using.

The same platform may offer several pricing options at once:

OpenAI's pricing page lists not only Standard rates but also the Batch API, which saves 50%, and states that Data residency and Regional Processing endpoints carry an extra 10% charge for models released after March 5, 2026.

Anthropic's official documentation covers standard pricing, Batch processing, prompt caching, and long-context pricing, and states that these modifiers can stack. Google's Gemini pricing page lists Free and Paid tiers, price jumps around 200k tokens for some models, and storage prices for context caching.

So the point of step 2 is not "checking the model name" but "checking the input/output unit prices that actually apply to this request."
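One way to make "the price that actually applies" concrete is to start from the base rate and apply modifiers. The sketch below assumes a 50% Batch discount and a 10% regional-processing uplift, the figures quoted above, and assumes they combine by multiplication; whether a given platform lets them stack this way is something to confirm on its pricing page. `effective_price` is a hypothetical helper.

```python
from decimal import Decimal


def effective_price(base_price: Decimal, *, batch: bool = False,
                    regional: bool = False) -> Decimal:
    """Apply assumed modifiers to a base price in USD per 1M tokens."""
    price = base_price
    if batch:
        price *= Decimal("0.5")   # assumed 50% Batch discount
    if regional:
        price *= Decimal("1.10")  # assumed +10% regional processing
    return price


# GPT-5.4 mini output price from this article, sent through Batch.
print(effective_price(Decimal("4.50"), batch=True))
```

Apply this separately to the input and output prices, since a modifier may not treat both sides the same way.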

Step 3: Multiply input and output by their unit prices, then add them up

This step is the simplest, yet it is also the one most often skipped.

Many people take the total token count and multiply it by a rough average price they have in mind. The problem is that this gives the wrong answer.

Output is often more expensive than input, and cache, Batch, and long-context pricing may affect only one side. If you average everything out, you will usually underestimate or overestimate the actual cost.
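To see how far off the shortcuts can be, the sketch below prices the same request three ways: split correctly, with a naive average of the two rates, and with the input rate only. It uses the GPT-5.4 mini prices and the 20,000-in / 5,000-out usage from this article; the two "mistake" methods are illustrations of common errors, not anyone's recommended approach.

```python
from decimal import Decimal

M = Decimal(1_000_000)
inp, out = 20_000, 5_000
p_in, p_out = Decimal("0.75"), Decimal("4.50")  # USD per 1M tokens

# Correct: price each side with its own rate.
split_cost = inp / M * p_in + out / M * p_out
# Mistake 1: apply a simple average of the two rates to all tokens.
blended_cost = (inp + out) / M * (p_in + p_out) / 2
# Mistake 2: apply only the input rate to all tokens.
input_only_cost = (inp + out) / M * p_in

print(split_cost, blended_cost, input_only_cost)
```

Here the blended shortcut gives $0.065625 and the input-only shortcut $0.01875, against a correct $0.0375: one overestimates by about 75%, the other underestimates by half.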

A simple worked example

Suppose you are using OpenAI GPT-5.4 mini. OpenAI's current official prices are:

Input: $0.75 / 1M tokens

Cached input: $0.075 / 1M tokens

Output: $4.50 / 1M tokens

If this request uses:

20,000 input tokens

5,000 output tokens

The calculation method is:

Input cost = 20,000 ÷ 1,000,000 × 0.75 = $0.015

Output cost = 5,000 ÷ 1,000,000 × 4.50 = $0.0225

Total cost = $0.0375

Even though the request used four times as many input tokens as output tokens, output costs more ($0.0225 vs. $0.015), because the output unit price is six times the input unit price. This is why, for many content-generation tasks, what really drives cost is often not how much you send in but how much the model returns.
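The same request can also be priced with the cached-input rate listed above. The sketch assumes, hypothetically, that 15,000 of the 20,000 input tokens were cache hits, and that cached tokens are counted as a subset of input tokens. The helper `gpt_mini_cost` is illustrative only.

```python
from decimal import Decimal

M = Decimal(1_000_000)
# GPT-5.4 mini prices quoted in this article (USD per 1M tokens).
INPUT, CACHED_INPUT, OUTPUT = Decimal("0.75"), Decimal("0.075"), Decimal("4.50")


def gpt_mini_cost(input_tokens: int, output_tokens: int,
                  cached_tokens: int = 0) -> Decimal:
    """cached_tokens is assumed to be part of input_tokens."""
    uncached = input_tokens - cached_tokens
    return (uncached / M * INPUT
            + cached_tokens / M * CACHED_INPUT
            + output_tokens / M * OUTPUT)


no_cache = gpt_mini_cost(20_000, 5_000)                          # the example above
with_cache = gpt_mini_cost(20_000, 5_000, cached_tokens=15_000)  # hypothetical hits
print(no_cache, with_cache)
```

Caching cuts the input side from $0.015 to $0.004875, but the $0.0225 output side is untouched, so the total only drops to $0.027375. Cache helps input, not output.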

Another example: the same algorithm, very different results across models

Assume the same usage:

Input: 20,000 tokens

Output: 5,000 tokens

If you switch to Claude Haiku 4.5, Anthropic's official prices are:

Input: $1 / MTok

Output: $2 / MTok

Input = 20,000 ÷ 1,000,000 × 1 = $0.02

Output = 5,000 ÷ 1,000,000 × 2 = $0.01

Total cost = $0.03
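Putting both models side by side shows that which one is cheaper depends on the task's input/output mix. The sketch uses only the prices quoted in this article; the input-heavy usage (200,000 in, 1,000 out) is a hypothetical scenario added for contrast.

```python
from decimal import Decimal

M = Decimal(1_000_000)
# (input, output) in USD per 1M tokens, as quoted in this article.
PRICES = {
    "GPT-5.4 mini": (Decimal("0.75"), Decimal("4.50")),
    "Claude Haiku 4.5": (Decimal("1"), Decimal("2")),
}


def compare(input_tokens: int, output_tokens: int) -> dict:
    """Cost of the same usage on every model in PRICES."""
    return {model: input_tokens / M * p_in + output_tokens / M * p_out
            for model, (p_in, p_out) in PRICES.items()}


print(compare(20_000, 5_000))    # the usage in this section
print(compare(200_000, 1_000))   # hypothetical input-heavy task
```

With the section's usage, Claude Haiku 4.5 is cheaper ($0.03 vs. $0.0375). With the input-heavy usage, the order flips: GPT-5.4 mini costs $0.1545 and Claude Haiku 4.5 costs $0.202.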

On Google Gemini's pricing page, one model's section lists the following rate for prompts up to 200k tokens:

Input: $2 / 1M tokens

So what really matters is not just "which model is cheaper", but what your task's mix of input and output looks like once it is priced at that model's input/output rates.

Why many people still miscalculate cost after reading the price list

There are four common reasons.

Looking only at input, not output

Yet output is inherently more expensive on many platforms; the gap is visible on the official pricing pages of OpenAI, Anthropic, and Google.

Looking only at standard prices, not cache, Batch, or long context

OpenAI's Batch API saves 50%. Anthropic's Batch processing also halves input and output prices, and with prompt caching, a cache hit (read) costs about 10% of the standard input price. Google also lists context caching and its storage price separately.

Ignoring long-context price jumps

Google's official Gemini pricing page states that for some models, input, output, and context caching prices increase once prompts exceed 200k tokens. OpenAI states that its standard rates reflect context lengths under 270K.

Leaving out tool fees and surcharges
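The cache and Batch discounts can be folded into one calculation. This sketch assumes cache reads cost 10% of the input price, Batch halves the whole request, and the two stack; cache writes, which are priced separately, are deliberately left out. The split of 5,000 uncached plus 15,000 cached input tokens is hypothetical, and the prices are the Claude Haiku 4.5 figures quoted in this article. `discounted_cost` is an illustrative helper.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)


def discounted_cost(uncached_input: int, cache_read: int, output: int,
                    input_price: Decimal, output_price: Decimal, *,
                    batch: bool = False,
                    cache_read_ratio: Decimal = Decimal("0.10")) -> Decimal:
    """Cost with cache reads at a fraction of the input price, optionally batched.

    Assumptions: cache reads ~10% of input price, Batch halves everything,
    the discounts stack. Cache writes are not modeled.
    """
    cost = (uncached_input / PER_MILLION * input_price
            + cache_read / PER_MILLION * input_price * cache_read_ratio
            + output / PER_MILLION * output_price)
    return cost * Decimal("0.5") if batch else cost


p_in, p_out = Decimal("1"), Decimal("2")
standard = discounted_cost(20_000, 0, 5_000, p_in, p_out)
cached = discounted_cost(5_000, 15_000, 5_000, p_in, p_out)
cached_batch = discounted_cost(5_000, 15_000, 5_000, p_in, p_out, batch=True)
print(standard, cached, cached_batch)
```

Under these assumptions the same request falls from $0.03 to $0.0165 with caching, and to $0.00825 with caching plus Batch.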
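A long-context tier can be modeled as a list of thresholds. The sketch assumes, as I understand Gemini's tiered models, that the tier is chosen by prompt size and then applied to the whole request, so crossing the threshold by one token reprices everything. The tier prices below are made-up placeholders, not any model's real rates; check the pricing page for actual numbers and rules.

```python
from decimal import Decimal

M = Decimal(1_000_000)


def tiered_cost(prompt_tokens: int, output_tokens: int, tiers: list) -> Decimal:
    """tiers: (max_prompt_tokens or None, input_price, output_price), ascending.

    The first tier whose limit covers the prompt prices the whole request.
    """
    for limit, p_in, p_out in tiers:
        if limit is None or prompt_tokens <= limit:
            return prompt_tokens / M * p_in + output_tokens / M * p_out
    raise ValueError("no tier matched")


# HYPOTHETICAL placeholder tiers (USD per 1M tokens).
HYPOTHETICAL_TIERS = [
    (200_000, Decimal("1"), Decimal("5")),
    (None, Decimal("2"), Decimal("10")),
]

print(tiered_cost(200_000, 1_000, HYPOTHETICAL_TIERS))
print(tiered_cost(200_001, 1_000, HYPOTHETICAL_TIERS))
```

With these placeholders, one extra prompt token takes the request from $0.205 to $0.410002, which is why it pays to watch prompt size near a tier boundary.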


OpenAI's pricing page also lists separately priced tools such as Web search and Containers. A complete bill cannot be calculated from token unit prices alone.
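Tool fees sit next to the token cost rather than inside it. The sketch below adds per-call tool charges to a token cost; the $0.01-per-call web-search price is a placeholder, and real tool pricing may be quoted per 1,000 calls, per session, or per unit of time. `bill_total` is a hypothetical helper.

```python
from decimal import Decimal


def bill_total(token_cost: Decimal, tool_calls: dict,
               tool_price_per_call: dict) -> Decimal:
    """Token cost plus per-call tool fees (tool prices are placeholders)."""
    tool_cost = sum((n * tool_price_per_call[name]
                     for name, n in tool_calls.items()), Decimal(0))
    return token_cost + tool_cost


# Hypothetical: the $0.0375 request from earlier plus two web-search calls
# at a placeholder $0.01 each.
total = bill_total(Decimal("0.0375"), {"web_search": 2},
                   {"web_search": Decimal("0.01")})
print(total)
```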

If a company wants to calculate AI costs accurately, what three things should it check first?

Companies should not focus only on the "unit price per million tokens."

What really matters is the structure of each type of task:

How many input tokens does this task send on average?

Internal knowledge Q&A, long-document retrieval, and RAG applications usually have very large inputs, because each request carries document chunks, context, and conversation history.

How many output tokens does this task produce on average?

Content generation, report writing, and analysis tasks usually have large outputs, because the model has to return long responses.

Are there cache, Batch, long-context, tool, or regional processing charges?

If you don't separate these factors, all you will see is a "total bill", not a cost structure you can optimize.
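Estimating cost per task type makes these three questions concrete. The sketch below describes each task by its average input, average output, and monthly request volume, then reports input and output cost separately. Both task profiles are hypothetical numbers chosen to illustrate the input-heavy and output-heavy cases, priced at the GPT-5.4 mini rates quoted in this article.

```python
from dataclasses import dataclass
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)


@dataclass
class TaskProfile:
    name: str
    avg_input_tokens: int
    avg_output_tokens: int
    requests_per_month: int


def monthly_breakdown(task: TaskProfile, input_price: Decimal,
                      output_price: Decimal) -> tuple:
    """Return (monthly input cost, monthly output cost) in USD."""
    n = task.requests_per_month
    input_cost = task.avg_input_tokens / PER_MILLION * input_price * n
    output_cost = task.avg_output_tokens / PER_MILLION * output_price * n
    return input_cost, output_cost


# Hypothetical task profiles.
rag_qa = TaskProfile("internal RAG Q&A", 30_000, 500, 10_000)
reports = TaskProfile("report writing", 3_000, 4_000, 1_000)
for task in (rag_qa, reports):
    inp, out = monthly_breakdown(task, Decimal("0.75"), Decimal("4.50"))
    print(f"{task.name}: input ${inp}, output ${out}")
```

The RAG profile comes out at $225 input vs. $22.50 output per month, while the report profile is $2.25 input vs. $18 output, so the two need opposite optimizations.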

The most practical way for beginners to save money: control input first, then output

If you are a beginner, the simplest way to manage costs is not to memorize the entire price list but to remember two rules:

If the input is too long, the context drives up the cost.

If the output is too long, the reply drives up the cost.

OpenAI also specifically notes that non-English text usually has a higher token-to-character ratio.

So Chinese-language users should watch input growth especially closely when working with long texts, long rule sets, and multi-turn conversations.

In practice, the most effective optimization is usually:

Move repeated background content into the cache

Send tasks that don't need an immediate answer through Batch instead of running them right away

Get a summary first, then decide whether to expand it into a full answer

Don't send the entire original document every time
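Whether "summary first" actually saves money depends on how often users ask for the full answer. The sketch assumes, hypothetically, that expanding is a second request that resends the same input; the token counts and 25% expansion rate are made-up numbers, and prices are the GPT-5.4 mini rates quoted in this article.

```python
from decimal import Decimal

M = Decimal(1_000_000)
P_IN, P_OUT = Decimal("0.75"), Decimal("4.50")  # USD per 1M tokens


def always_full(inp: int, full_out: int) -> Decimal:
    """One request that always returns the full answer."""
    return inp / M * P_IN + full_out / M * P_OUT


def summary_first(inp: int, summary_out: int, full_out: int,
                  expand_rate: Decimal) -> Decimal:
    """Expected cost: a summary call, plus a full call for a share of users.

    Assumes the expansion is a second call that resends the same input.
    """
    first = inp / M * P_IN + summary_out / M * P_OUT
    second = inp / M * P_IN + full_out / M * P_OUT
    return first + expand_rate * second


print(always_full(2_000, 2_000))
print(summary_first(2_000, 300, 2_000, Decimal("0.25")))
print(summary_first(2_000, 300, 2_000, Decimal("1")))
```

With these numbers, summary-first costs about half as much as always returning the full answer ($0.005475 vs. $0.0105) when a quarter of users expand, but more ($0.01335) if everyone expands.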

If you remember only one thing, make it this:

The hardest part of AI token cost calculation is not the formula, but lumping all tokens together.

Keep that in mind and everything becomes much clearer: separate input and output first, then calculate.

What really determines the size of your bill is often not simply "how many tokens were used", but:

How many are input

How many are output

How many can be cached

How many can be batched

Whether there are long-context or tool fees

FAQ

Does AI token cost depend only on the total number of tokens?

No. Most mainstream platforms bill input and output separately, and may also charge for cached tokens, storage, tools, or long-context rates.

Why are output tokens often more expensive than input tokens?

Because many platforms set the output price higher than the input price. The gap is easy to see in the pricing for OpenAI GPT-5.4 mini and Claude Haiku 4.5.

How can I quickly estimate what a request will cost?

Gather three numbers first: input tokens, output tokens, and the model's unit prices. Then apply this formula:

(input tokens ÷ 1,000,000 × input unit price) + (output tokens ÷ 1,000,000 × output unit price)

If cache, Batch, search tools, or long context are involved, adjust the result for those as well.

Can the Batch API really save a lot?

In many cases, yes. OpenAI and Anthropic both state that batch processing gives a 50% discount on input and output costs.

Does long context make costs higher?

On some platforms, yes. Some Google Gemini models jump in price once prompts exceed 200k tokens, and OpenAI states that its standard rates apply to context lengths under 270K.

Data sources and credibility statement

This article is based on the official pricing pages and documentation of major AI platforms, mainly the following sources:

OpenAI: API Pricing

OpenAI: What are tokens and how to count them?

Anthropic: Pricing

Google AI for Developers: Gemini API pricing

This article is organized around three angles, "cost formula × platform differences × worked calculations", so that readers new to AI APIs not only understand what the numbers on a pricing page mean but can also calculate what a request will actually cost.
