HTML Entities & Escaping

In HTML a handful of characters carry structural meaning, so writing them literally inside content can confuse the parser or break the page entirely. An HTML entity is a short code beginning with `&` and ending with `;` that represents one of these reserved characters — or any character — safely as text. Entities exist both to display characters that would otherwise be interpreted as markup and to insert symbols that are awkward to type directly. Understanding them is also central to security, because failing to escape user-supplied input is the root cause of cross-site scripting. This guide explains which characters need escaping, the forms an entity can take, and when to reach for them.

  1. Why some characters must be escaped

    The HTML parser treats `<` and `>` as the boundaries of tags and `&` as the start of an entity, so writing them literally in content can be misread as markup. To display them as ordinary text you escape them: `<` becomes `&lt;`, `>` becomes `&gt;`, and `&` becomes `&amp;`. When converting text, replace the ampersand first. If `<` were turned into `&lt;` before the `&` pass ran, that pass would rewrite the new entity as `&amp;lt;`, and the reader would see the literal text `&lt;` instead of `<`. Without escaping, a string such as `a <b` can be read as the start of a `<b>` tag instead of being displayed as text.
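    To make the ordering rule concrete, here is a minimal Python sketch that applies the three replacements in both orders. The function names `escape_text` and `escape_text_wrong_order` are hypothetical and exist only for this illustration. In real code, the standard library's `html.escape` already performs these replacements in the correct order.

```python
# Hand-rolled escapers, written only to show why "&" must be replaced first.
# In real code prefer html.escape() from the standard library.

def escape_text(s: str) -> str:
    """Escape the three markup characters for use in HTML text content."""
    s = s.replace("&", "&amp;")  # runs first, before any entity exists
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    return s


def escape_text_wrong_order(s: str) -> str:
    """Same replacements, but "&" last, so it re-escapes the entities just made."""
    s = s.replace("<", "&lt;")
    s = s.replace(">", "&gt;")
    s = s.replace("&", "&amp;")
    return s


print(escape_text("a <b & c"))              # a &lt;b &amp; c
print(escape_text_wrong_order("a <b & c"))  # a &amp;lt;b &amp; c
```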

  2. Quotes inside attributes

    Inside an attribute value the quote character that delimits it also needs escaping, because an unescaped quote ends the attribute early. Write `&quot;` for a double quote and `&#39;` (or the named `&apos;`) for a single quote when it would clash with the surrounding delimiter. For example `title="She said &quot;hi&quot;"` keeps the inner quotes from terminating the value. In ordinary text content quotes are harmless and need no escaping; the requirement is specific to attribute contexts.
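    A small sketch of attribute escaping follows. It assumes Python's standard `html.escape`, which with `quote=True` also escapes both quote characters. The helper `attr` is hypothetical. Python emits the hexadecimal form `&#x27;` for a single quote, which is equivalent to `&#39;`.

```python
import html

def attr(name: str, value: str) -> str:
    """Render name="value", escaping the value for a double-quoted attribute."""
    # quote=True (the default) also turns " into &quot; and ' into &#x27;
    return f'{name}="{html.escape(value, quote=True)}"'

print(attr("title", 'She said "hi"'))  # title="She said &quot;hi&quot;"
print(attr("title", "It's fine"))      # title="It&#x27;s fine"
```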

  3. Named entities

    A named entity refers to a character by a memorable label, such as `&amp;` for `&`, `&lt;` for `<`, `&copy;` for `©`, and `&mdash;` for an em dash. They are easy to read but limited to the set of names the HTML standard defines, and the names are case-sensitive, so `&Amp;` is not valid. Only five of them (`&amp;`, `&lt;`, `&gt;`, `&quot;`, `&apos;`) stand for the characters that have special meaning in markup; the rest are conveniences for typing symbols. When a character has no named form, you fall back to a numeric entity.
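    Python's standard library includes the HTML5 named-entity table as the dictionary `html.entities.html5`, whose keys carry the trailing semicolon (for example `"copy;"`). The hypothetical helper `lookup` below searches that table, which also demonstrates the case sensitivity described above.

```python
from html.entities import html5
from typing import Optional

def lookup(name: str) -> Optional[str]:
    """Return the character(s) for a named entity such as "copy", or None."""
    # Keys in html5 include the trailing semicolon, e.g. "copy;".
    return html5.get(name + ";")

for name in ["amp", "lt", "copy", "Amp"]:
    print(name, "->", repr(lookup(name)))
# "Amp" prints None because entity names are case-sensitive.
```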

  4. Numeric entities, decimal and hex

    A numeric entity encodes a character by its Unicode code point rather than a name, in two interchangeable forms. The decimal form is `&#NNNN;` and the hexadecimal form is `&#xHHHH;`, distinguished by the `x` prefix — so the copyright sign `©` (code point `U+00A9`, decimal 169) can be written `&copy;`, `&#169;`, or `&#xA9;`, all identical to the browser. Numeric entities can represent any character, including emoji and scripts that have no named entity. This makes them the universal fallback when a symbol is needed but no name exists.
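    The relationship between a character, its code point, and its two numeric spellings fits in a few lines of Python. `numeric_entities` is a hypothetical helper: `ord` supplies the code point, and the standard `html.unescape` is used only to confirm that all three spellings of the copyright sign decode to the same character.

```python
import html
from typing import Tuple

def numeric_entities(ch: str) -> Tuple[str, str]:
    """Return the decimal and hexadecimal numeric entities for one character."""
    cp = ord(ch)  # the Unicode code point as an integer
    return f"&#{cp};", f"&#x{cp:X};"

print(numeric_entities("\u00a9"))      # ('&#169;', '&#xA9;')  copyright sign
print(numeric_entities("\U0001F600"))  # ('&#128512;', '&#x1F600;')  an emoji

# The named, decimal, and hex spellings all decode to the same character.
assert html.unescape("&copy;") == html.unescape("&#169;") == html.unescape("&#xA9;")
```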

  5. Escaping and cross-site scripting

    Cross-site scripting (XSS) happens when untrusted input is inserted into a page without escaping, letting an attacker’s `<script>` or event-handler markup run as code. Escaping the dangerous characters turns that input into inert text: a submitted `<script>alert(1)</script>` is written into the page source as `&lt;script&gt;alert(1)&lt;/script&gt;`, which the browser displays as the literal text `<script>alert(1)</script>` instead of executing it. The critical rule is to escape according to context (HTML body, attribute, URL, and JavaScript each have different rules) and to rely on your framework’s auto-escaping rather than hand-rolling it. Never insert raw user input into the DOM through mechanisms like `innerHTML` without sanitising it first.
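    As a minimal sketch of escaping untrusted input for the HTML body context, the hypothetical `render_comment` helper below passes the text through Python's standard `html.escape` before interpolating it. It covers body text and quoted attributes only, not URL or JavaScript contexts, and it illustrates the principle rather than replacing a template engine's auto-escaping.

```python
import html

def render_comment(user_text: str) -> str:
    """Insert untrusted text into an HTML body context (hypothetical helper)."""
    # html.escape turns &, <, >, " and ' into entities, so the input stays text.
    return f'<p class="comment">{html.escape(user_text)}</p>'

payload = "<script>alert(1)</script>"
print(render_comment(payload))
# <p class="comment">&lt;script&gt;alert(1)&lt;/script&gt;</p>
```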

  6. The non-breaking space and invisible entities

    Some entities represent characters you cannot see directly. `&nbsp;` is a non-breaking space: it renders as a space but prevents the line from wrapping at that point and stops the browser from collapsing it together with adjacent spaces. It is useful for keeping a value and its unit together, as in `10&nbsp;MB`, but overusing it to force layout is a common mistake better solved with CSS. Other invisible or formatting entities include `&shy;` (a soft hyphen that only appears when a word breaks) and `&zwnj;` (a zero-width non-joiner), each affecting how text wraps or joins rather than adding a visible glyph.
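    Because these characters are invisible once rendered, decoding the entities and printing their code points is a quick way to see what they actually are. This sketch assumes Python's standard `html.unescape`; the `code_point` helper is hypothetical.

```python
import html

INVISIBLE = ["nbsp", "shy", "zwnj"]

def code_point(entity_name: str) -> str:
    """Decode a named entity and format its code point as U+XXXX."""
    ch = html.unescape(f"&{entity_name};")
    return f"U+{ord(ch):04X}"

for name in INVISIBLE:
    print(f"&{name}; -> {code_point(name)}")
# &nbsp; -> U+00A0
# &shy; -> U+00AD
# &zwnj; -> U+200C

# The decoded non-breaking space is not the ordinary space character.
print(html.unescape("10&nbsp;MB") == "10 MB")  # False
```

The final comparison is a reminder that code searching decoded text for an ordinary space (U+0020) will not match a value joined with `&nbsp;`.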
