About this tool
Estimates the number of tokens in a text, with approximations for the main current language models. Useful for predicting cost and staying within context limits when calling AI models through an API, sizing prompts before sending them, comparing the verbosity of different phrasings, or simply understanding the relationship between words and tokens.
How to use
- Paste the text in the box.
- See the approximate token count in real time.
- Compare it with the word and character counts to get a feel for the ratio.
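
As a rough illustration of the three numbers the counter shows, here is a minimal Python sketch. It is not this tool's actual implementation: the function name `estimate_tokens` is hypothetical, and averaging the two common rules of thumb for English (about 4 characters or 0.75 words per token) is an arbitrary choice made for the example.

```python
import math

CHARS_PER_TOKEN = 4.0    # rule of thumb for English text
WORDS_PER_TOKEN = 0.75   # rule of thumb for English text


def estimate_tokens(text: str) -> dict:
    """Return character, word and approximate token counts for text."""
    chars = len(text)
    words = len(text.split())
    # Two independent estimates: one from characters, one from words.
    by_chars = chars / CHARS_PER_TOKEN
    by_words = words / WORDS_PER_TOKEN
    # Averaging the two is an assumption of this sketch, not a standard.
    tokens = math.ceil((by_chars + by_words) / 2)
    return {"characters": chars, "words": words, "tokens": tokens}


print(estimate_tokens("The quick brown fox jumps over the lazy dog."))
```

For the sample sentence (44 characters, 9 words) the two rules give 11 and 12 tokens, so the sketch reports 12. A real tokeniser may return a different number.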
Frequently asked questions
- What is a token?
- It's the basic unit language models use internally. A token is usually a piece of a word: longer than a single character but often shorter than a whole word, although short common words can be a single token. In English, one token corresponds to roughly 4 characters or 0.75 words. For other languages, especially those with many accented characters or non-Latin alphabets, the ratio changes and the same text typically takes more tokens.
- Is the estimate exact?
- No, it's approximate. Each model uses its own tokeniser, usually a subword scheme such as byte-pair encoding (BPE) implemented with libraries such as SentencePiece or tiktoken, so the same text yields different counts on different models. The estimate gives you an order of magnitude and helps you detect when you're approaching context limits. For exact costs or tight budgets, use the specific tokeniser of the model you'll work with.
- Why do non-English languages use more tokens?
- Most tokenisers were trained on corpora that are mostly English. Characters that are rarer in that data (accented letters, Asian scripts, Arabic) are split into more tokens, so the same sentence takes more tokens in Portuguese or Japanese than in English. This has practical cost implications when using APIs billed per token.
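
One way to get a feel for this effect is to look at UTF-8 bytes. Assuming a byte-level tokeniser (some BPE tokenisers work this way, but not all), text starts out as bytes, and characters outside basic ASCII take more than one byte each. The sketch below only counts bytes per character. It is a rough proxy, not a token count, and the helper name `bytes_per_char` and the sample words are made up for illustration.

```python
def bytes_per_char(text: str) -> float:
    """Average number of UTF-8 bytes per character (0.0 for empty text)."""
    if not text:
        return 0.0
    return len(text.encode("utf-8")) / len(text)


# Hypothetical sample words with roughly the same meaning.
samples = {
    "English": "information",
    "Portuguese": "informação",
    "Japanese": "情報",
}

for language, word in samples.items():
    n_bytes = len(word.encode("utf-8"))
    print(f"{language}: {len(word)} chars, {n_bytes} bytes, "
          f"{bytes_per_char(word):.2f} bytes/char")
```

Plain ASCII letters take 1 byte each, accented Latin letters such as "ç" take 2, and common Japanese characters take 3. How many tokens those bytes end up as still depends on the merges each tokeniser learned, so for real numbers use the model's own tokeniser.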