How Character Encoding Works (Simple Explanation)

Every piece of text you see on a screen — from a simple "Hello" to a complex Telugu paragraph — exists as numbers inside a computer. Character encoding is the system that decides which number represents which character. Get the encoding wrong, and your text turns into unreadable garbage. Get it right, and text flows seamlessly between devices, applications, and continents. This guide explains the concept from the ground up.
Why Computers Need Encoding
Computers do not understand letters. They understand numbers — specifically, binary numbers made of 0s and 1s. To store and display text, every character must be assigned a numerical value. The letter "A" might be stored as the number 65. The digit "7" might be stored as 55. A Telugu character like "క" might be stored as 3093. Character encoding is the agreed-upon mapping between characters and numbers.
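You can inspect this mapping directly in most programming languages. For example, Python's built-in `ord` and `chr` functions convert between a character and its assigned number:

```python
# ord() returns the number assigned to a character; chr() reverses it.
print(ord("A"))   # 65
print(ord("7"))   # 55
print(ord("క"))   # 3093 -- the code point of the Telugu letter KA
print(chr(65))    # A
```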
Without encoding standards, every computer manufacturer could assign different numbers to the same characters, making it impossible to share text between systems. This is, in fact, exactly what happened in the early decades of computing — and it caused enormous problems.
ASCII: Where It All Began
The first widely adopted encoding standard was ASCII (American Standard Code for Information Interchange), created in 1963. ASCII uses 7 bits to represent each character, giving it a total of 128 possible values (0 through 127). These 128 slots include:
- 26 uppercase English letters (A–Z)
- 26 lowercase English letters (a–z)
- 10 digits (0–9)
- 33 punctuation, symbol, and space characters
- 33 control characters (like newline and tab)
ASCII worked brilliantly for English. But it had no room for accented characters (like é or ü), no support for non-Latin scripts (like Telugu, Arabic, or Chinese), and no mechanism for extension. For the non-English-speaking majority of the world, ASCII was useless.
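The 7-bit limit is easy to demonstrate: trying to encode a character outside the 128 ASCII slots fails outright. A quick Python sketch:

```python
# Plain English fits within ASCII's 128 values...
print("Hello".encode("ascii"))  # b'Hello'

# ...but an accented character has no ASCII slot at all.
try:
    "café".encode("ascii")
except UnicodeEncodeError as e:
    print("cannot encode:", e.reason)
```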
Extended ASCII and Code Pages
To work around ASCII's limitations, manufacturers extended it to use 8 bits (one full byte), providing 256 character slots. The first 128 remained standard ASCII; the upper 128 were assigned differently depending on the region. These region-specific assignments were called code pages.
Windows-1252 was the code page for Western European languages. Windows-874 handled Thai. For Telugu, companies created proprietary code pages — most notably Anu Systems, whose Anu6 and Anu7 fonts mapped Telugu glyphs into the upper 128 positions and beyond. This is what the Telugu DTP industry calls "ANSI encoding."
The problem with code pages was interoperability. A document created using one code page would display as nonsense on a system configured for a different one. This garbled-text phenomenon is called mojibake, and it plagued international communication for decades.
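Mojibake is mechanical: the same byte value maps to different characters in different code pages. A small Python illustration, using Windows-1252 (Western European) and Windows-1251 (Cyrillic) as example code pages:

```python
data = bytes([0xE9])  # a single byte with value 233

# Under the Western European code page, this byte is "é"...
print(data.decode("cp1252"))  # é

# ...but under the Cyrillic code page, the very same byte is "й".
print(data.decode("cp1251"))  # й
```

Neither interpretation is "wrong" in isolation; the byte simply carries no information about which code page produced it, which is why documents garbled when they crossed regions.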
Unicode: The Universal Solution
In the late 1980s, engineers at Apple and Xerox began working on a universal encoding that would assign a permanent, unique number to every character in every writing system. This became Unicode. Instead of 256 characters, Unicode supports over 1.1 million code points — enough for every script ever invented, plus room for future additions.
Unicode does not dictate how these numbers are stored in memory. That job falls to encoding forms — the most important being UTF-8, UTF-16, and UTF-32.
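The split between a code point and its stored bytes becomes visible when you serialize the same character under each encoding form. A Python sketch (using little-endian byte order for UTF-16 and UTF-32):

```python
ch = "క"  # one code point: U+0C15 (decimal 3093)

print(len(ch.encode("utf-8")))      # 3 bytes in UTF-8
print(len(ch.encode("utf-16-le")))  # 2 bytes in UTF-16
print(len(ch.encode("utf-32-le")))  # 4 bytes in UTF-32 (always fixed width)
```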
UTF-8: The Web Standard
UTF-8 is a variable-width encoding that uses 1 to 4 bytes per character. Its genius lies in backward compatibility: the first 128 characters are identical to ASCII, using just one byte each. This means any valid ASCII file is automatically a valid UTF-8 file. Non-Latin characters use more bytes — Telugu characters use 3 bytes each.
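The variable width is easy to see by encoding characters from different ranges:

```python
# ASCII, accented Latin, Telugu, and emoji take 1, 2, 3, and 4 bytes.
for ch in ("A", "é", "క", "😀"):
    encoded = ch.encode("utf-8")
    print(ch, len(encoded), encoded.hex())
```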
UTF-8 dominates the internet. Over 98% of all web pages use it. When you set <meta charset="UTF-8"> in your HTML, you are telling browsers to interpret the page's bytes using UTF-8 rules.
How Encoding Affects Your Daily Work
Web Development
If you build websites with Telugu content, UTF-8 is mandatory. Without it, Telugu characters will appear as question marks or boxes. Every HTML file should declare its encoding in the head section. Databases should be configured for UTF-8 (specifically utf8mb4 in MySQL to support the full Unicode range including emoji).
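When writing files programmatically, it is safest to name the encoding explicitly rather than rely on the platform default, which still differs across systems. A minimal Python sketch (the filename is just an example):

```python
text = "హలో"  # "Hello" in Telugu

# Always pass encoding= explicitly; never rely on the OS default.
with open("greeting.html", "w", encoding="utf-8") as f:
    f.write(text)

with open("greeting.html", "r", encoding="utf-8") as f:
    print(f.read())  # హలో
```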
DTP and Print
If you use Adobe Photoshop or CorelDRAW with Telugu legacy fonts, you are working in a proprietary ANSI encoding. The text looks correct only because the right font is installed. To move this text to the web or share it digitally, you must convert it to Unicode using a tool like our Unicode Converter.
Email and Messaging
Modern email clients and messaging apps use UTF-8 by default. This is why Telugu WhatsApp messages display correctly on any phone. However, if you copy text from a legacy DTP file and paste it into an email, the recipient will see Latin gibberish because the ANSI bytes are being interpreted as UTF-8.
Common Encoding Errors and Fixes
- Mojibake (garbled text): Caused by opening a file with the wrong encoding. Solution: identify the correct encoding and re-open the file with that setting.
- Question marks or boxes: The font does not contain glyphs for the characters. Install a font that supports the script (like Noto Sans Telugu for Telugu).
- Double encoding: UTF-8 bytes are misread as a single-byte code page (such as Windows-1252) and then re-encoded as UTF-8, producing sequences like "à°•" instead of "క." Solution: reverse the extra round trip once — encode the garbled text back to bytes with the code page, then decode those bytes as UTF-8.
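The double-encoding repair in the last bullet takes one round trip, assuming the extra pass went through Windows-1252:

```python
broken = "à°•"  # should have been the Telugu letter "క"

# Encode with the code page that misread the bytes, then decode as UTF-8.
fixed = broken.encode("cp1252").decode("utf-8")
print(fixed)  # క
```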
Frequently Asked Questions
What is the best encoding to use in 2026?
UTF-8 for virtually all purposes. It is the universal standard for web, mobile, email, and databases. Use it unless you have a very specific reason not to.
Can I convert between encodings?
Yes. Tools like our Unicode Converter and ANSI Converter handle the translation between encoding systems automatically.
Why do some Telugu websites still show broken characters?
Usually because the website's encoding declaration does not match the actual encoding of the content, or because the web server is sending incorrect encoding headers.
Is ASCII still used?
Effectively, yes — but only as a subset of UTF-8, so every ASCII file is automatically valid UTF-8. You do not need to treat ASCII as a separate standard; UTF-8 covers it completely.
How do I check a file's encoding?
In Notepad++, open the file and check the bottom status bar. In VS Code, the encoding is shown in the bottom-right corner. On the command line, the "file" command on Linux/Mac will report the encoding.
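Programmatic detection is necessarily heuristic, but a simple elimination pass often works: try candidate encodings in order and keep the first that decodes without error. A rough Python sketch (the candidate list is an assumption; dedicated detectors such as the third-party chardet library are far more thorough, and permissive code pages like cp1252 accept almost any byte, so order the strictest candidates first):

```python
def guess_encoding(data: bytes, candidates=("utf-8", "cp1252")) -> str:
    """Return the first candidate encoding that decodes the bytes cleanly."""
    for enc in candidates:
        try:
            data.decode(enc)
            return enc
        except UnicodeDecodeError:
            continue
    return "unknown"

print(guess_encoding("క".encode("utf-8")))  # utf-8
```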
Conclusion
Character encoding is the invisible layer that makes digital text possible. Understanding how it works — from ASCII's 128 characters through the code page era to Unicode's universal standard — empowers you to diagnose text problems, build better websites, and work more efficiently with multilingual content. For anyone working with Telugu, the key takeaway is simple: use UTF-8 for everything digital, and use conversion tools when you need to bridge the gap to legacy DTP formats.