Base64 Encoding Explained

Base64 is a way of representing arbitrary binary data using only a small set of printable ASCII characters, so bytes can travel safely through systems that were built for text. It exists because many channels (email bodies, URLs, JSON fields, HTTP headers) can mangle or reject raw binary, but they happily carry plain letters and digits. The trade-off is size: because each 8-bit output character carries only 6 bits of data, Base64 output is about 33% larger than the original. This guide explains the alphabet, the exact mapping of 3 bytes to 4 characters, what the `=` padding means, the URL-safe variant, and the crucial point that Base64 is encoding, not encryption.

  1. What Base64 is for

    Base64 converts binary data into a text string drawn from a safe, universally understood character set, so it can pass through text-only or text-biased systems without corruption. Typical uses include embedding an image directly in a `data:` URL, attaching files to email via MIME, and carrying small binary blobs inside JSON or XML. It is not compression and not security — it only changes the representation so the bytes survive the journey. Anything Base64-encoded can be decoded back to the exact original bytes by anyone.
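
As a minimal sketch of that round trip, the snippet below uses Python's standard `base64` module. The sample bytes and the `to_data_url` helper are illustrative assumptions, not part of any particular API.

```python
import base64

# Arbitrary binary bytes, including values that are not printable text.
raw = bytes([0x00, 0xFF, 0x10, 0x80, 0x7F])

encoded = base64.b64encode(raw)      # bytes object holding only ASCII characters
decoded = base64.b64decode(encoded)  # the exact original bytes again


def to_data_url(payload: bytes, mime_type: str = "application/octet-stream") -> str:
    """Hypothetical helper: wrap a payload in a data: URL."""
    text = base64.b64encode(payload).decode("ascii")
    return f"data:{mime_type};base64,{text}"


print(decoded == raw)                       # True
print(to_data_url(b"Man", "text/plain"))    # data:text/plain;base64,TWFu
```

Nothing here needs a key or a secret: decoding is the plain inverse of encoding.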

  2. The 64-character alphabet

    The name comes from its alphabet of 64 characters, each representing one 6-bit value from 0 to 63. The standard set (defined in RFC 4648) is the uppercase letters `A`–`Z` (values 0–25), the lowercase letters `a`–`z` (26–51), the digits `0`–`9` (52–61), and finally `+` and `/` (62 and 63). A separate `=` character is reserved for padding and is not part of the 64. Because 64 is `2^6`, each character cleanly encodes exactly 6 bits of the source data.
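
The value-to-character table described above can be sketched in a few lines of Python. The name `ALPHABET` is chosen here for illustration only.

```python
import string

# Standard Base64 alphabet: index in this string == 6-bit value.
ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"

print(len(ALPHABET))                      # 64
print(ALPHABET[0], ALPHABET[25])          # A Z
print(ALPHABET[26], ALPHABET[51])         # a z
print(ALPHABET[52], ALPHABET[61])         # 0 9
print(ALPHABET[62], ALPHABET[63])         # + /
print("=" in ALPHABET)                    # False: padding is not one of the 64
```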

  3. Turning 3 bytes into 4 characters

    Base64 works on the input in groups of 3 bytes, which is 24 bits. Those 24 bits are re-divided into four 6-bit chunks, and each chunk is looked up in the alphabet to produce one character, so every 3 bytes of input become 4 characters of output. For example, the three bytes of `Man` (`0x4D 0x61 0x6E`) form the bit string `010011 010110 000101 101110`, whose 6-bit values 19, 22, 5, 46 map to `TWFu`. This 3-to-4 ratio is exactly why the encoded result is roughly one third larger than the input.
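
The regrouping can be written out by hand. The following is a minimal sketch for full 3-byte groups only; `ALPHABET` and `encode_group` are names made up for this example, and real code would normally call `base64.b64encode` instead.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_group(group: bytes) -> str:
    """Encode exactly 3 bytes into 4 Base64 characters."""
    if len(group) != 3:
        raise ValueError("this sketch handles full 3-byte groups only")
    n = int.from_bytes(group, "big")  # the 24 bits as one integer
    # Slice the 24 bits into four 6-bit values, most significant first.
    values = [(n >> shift) & 0x3F for shift in (18, 12, 6, 0)]
    return "".join(ALPHABET[v] for v in values)


print(encode_group(b"Man"))  # TWFu
```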

  4. Padding with the = character

    Real data is rarely an exact multiple of 3 bytes, so the final group may hold only 1 or 2 bytes, and Base64 uses `=` to pad the output to a full 4-character block. If one byte remains it encodes to two characters followed by `==`; if two bytes remain it encodes to three characters followed by a single `=`. So `M` becomes `TQ==` and `Ma` becomes `TWE=`, while a clean 3-byte group like `Man` needs no padding. The padding makes the length always divisible by 4, which lets decoders know exactly how many original bytes the final block represents.
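
A sketch of how a short final group might be handled: the missing bytes are treated as zero bits, only the characters that carry real data are kept, and `=` fills the rest. `ALPHABET`, `encode_tail`, and `encoded_length` are hypothetical names used for illustration.

```python
import string

ALPHABET = string.ascii_uppercase + string.ascii_lowercase + string.digits + "+/"


def encode_tail(tail: bytes) -> str:
    """Encode a final group of 1 or 2 bytes, with '=' padding."""
    if len(tail) not in (1, 2):
        raise ValueError("tail must be 1 or 2 bytes")
    missing = 3 - len(tail)
    # Fill the missing bytes with zeros so there are 24 bits to slice.
    n = int.from_bytes(tail + b"\x00" * missing, "big")
    chars = [ALPHABET[(n >> shift) & 0x3F] for shift in (18, 12, 6, 0)]
    # Keep only the characters that hold real data, then pad to 4.
    return "".join(chars[: 4 - missing]) + "=" * missing


def encoded_length(n_bytes: int) -> int:
    """Length of padded Base64 output for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


print(encode_tail(b"M"))    # TQ==
print(encode_tail(b"Ma"))   # TWE=
print(encoded_length(1), encoded_length(2), encoded_length(3), encoded_length(4))  # 4 4 4 8
```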

  5. URL-safe Base64

    The standard `+` and `/` characters are problematic in some contexts: `/` is a path separator and `+` is decoded as a space in form-encoded query strings, so plain Base64 can break when placed in a URL or filename. The URL-safe variant, also defined in RFC 4648, swaps these two for `-` (minus) and `_` (underscore), leaving the rest of the alphabet unchanged. The padding `=` is often omitted too, since it can be re-derived from the length. This is the encoding used by JWTs (Base64URL), where tokens routinely travel in URLs and headers.
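
A minimal sketch of unpadded URL-safe encoding using Python's standard `base64` module. The helper names `b64url_encode` and `b64url_decode` are assumptions for this example; the decoder restores the padding from the length before handing the text to the library.

```python
import base64


def b64url_encode(data: bytes) -> str:
    """URL-safe Base64 with the trailing '=' padding stripped."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def b64url_decode(text: str) -> bytes:
    """Re-add the padding implied by the length, then decode."""
    padding = "=" * (-len(text) % 4)
    return base64.urlsafe_b64decode(text + padding)


sample = b"\xfb\xff"
print(base64.b64encode(sample).decode("ascii"))  # +/8=
print(b64url_encode(sample))                     # -_8
print(b64url_decode("-_8") == sample)            # True
```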

  6. Base64 is encoding, not encryption

    A persistent mistake is treating Base64 as if it hides or protects data — it does neither, because there is no key and decoding is trivial for anyone. The string `cGFzc3dvcmQ=` may look scrambled, but it decodes directly back to `password` with a single command. Use Base64 purely to make binary data safe for text-based transport or storage, never to keep it secret. If the data must stay confidential you have to encrypt it; if you only need to verify it, you hash it — Base64 belongs nowhere in a security decision.
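
To show how little protection this offers, here is a short sketch using Python's standard `base64` module. The variable name `recovered` is arbitrary.

```python
import base64

# No key, no secret: one library call recovers the original text.
recovered = base64.b64decode("cGFzc3dvcmQ=").decode("utf-8")
print(recovered)  # password
```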
