Technology

Subsetting

The process of removing unused characters from a font file to reduce download size, used for example when loading only Latin characters instead of the full Unicode range.

Font subsetting is the practice of removing characters and features that your project doesn't use from a font file, which can dramatically reduce download size. A comprehensive font like Noto Sans, designed to cover a very wide range of writing systems, can exceed 500KB. Subset to Latin characters only, and it can drop to under 20KB.

Subsetting is one of the highest-leverage performance optimizations available for web typography, often reducing font file size by 70-95% for Latin-script sites.

What can be subsetted:

  • Unicode ranges: Keep only the characters your content uses (Latin, Cyrillic, specific symbols)
  • OpenType features: Remove ligature tables, kerning pairs, or alternate glyphs you don't use
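  • Hinting data: TrueType hinting instructions can be stripped (pyftsubset --no-hinting) to save space. Many modern renderers ignore or de-emphasize hints, though text may look less crisp at small sizes on some Windows setups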
  • Named instances: Variable fonts include named instances (Bold, Italic labels) that can be stripped

Using fonttools (pyftsubset) for subsetting:

# Install
pip install fonttools brotli

# Basic Latin subset, output as WOFF2
pyftsubset Inter.ttf \
  --output-file=inter-latin.woff2 \
  --flavor=woff2 \
  --unicodes="U+0000-00FF,U+0131,U+0152-0153,U+02BB-02BC,U+02C6,U+02DA,U+02DC,U+2000-206F,U+2074,U+20AC,U+2122,U+2191,U+2193,U+2212,U+2215,U+FEFF,U+FFFD"
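
The other items in the list above (OpenType features, hinting) map to pyftsubset flags as well. As a minimal sketch, the helper below assembles such a command as an argument list. The function name, its defaults, and the file names are illustrative assumptions; only the flags themselves (--output-file, --flavor, --unicodes, --layout-features, --no-hinting) come from pyftsubset.

```python
import shlex


def build_pyftsubset_args(src, dest, unicodes,
                          layout_features=("kern", "liga"),
                          keep_hinting=False):
    """Return an argv list for pyftsubset. Nothing is executed here.

    Hypothetical helper: the name and defaults are illustrative only.
    """
    args = [
        "pyftsubset",
        src,
        f"--output-file={dest}",
        "--flavor=woff2",
        # Code points or ranges to keep, e.g. "U+0000-00FF"
        f"--unicodes={','.join(unicodes)}",
        # OpenType layout features to keep; anything not listed is dropped
        f"--layout-features={','.join(layout_features)}",
    ]
    if not keep_hinting:
        # Strip TrueType hinting instructions
        args.append("--no-hinting")
    return args


args = build_pyftsubset_args(
    "Inter.ttf", "inter-latin.woff2", ["U+0000-00FF", "U+20AC"]
)
print(shlex.join(args))
```

To actually run the command, pass the list to subprocess.run(args, check=True) in an environment where fonttools and brotli are installed.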

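Google Fonts does this automatically. The CSS returned by the Google Fonts API splits each family into per-script subsets (Latin, Latin Extended, Cyrillic, and so on), each declared in its own @font-face rule with a unicode-range descriptor, so the browser downloads only the subsets the page actually needs. The optional text= parameter goes further and requests a font containing only the characters you list. This is part of why Google Fonts loads quickly: it serves pre-subsetted WOFF2 files via CDN.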

For self-hosted fonts, you can approximate Google Fonts' approach using the unicode-range CSS descriptor combined with multiple @font-face declarations:

/* Only download this file when CJK characters appear in content */
@font-face {
  font-family: 'Noto Sans';
  src: url('/fonts/noto-sans-cjk.woff2') format('woff2');
  unicode-range: U+4E00-9FFF, U+F900-FAFF, U+3400-4DBF;
}

You still generate each subset file ahead of time, but you don't have to decide which ones a given page needs: the browser only fetches files whose unicode-range covers characters actually used on the page.
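
Writing one @font-face rule per subset by hand gets repetitive, so the rules can be generated. The sketch below assumes a hypothetical mapping of subset names to ranges and a hypothetical /fonts/noto-sans-<subset>.woff2 naming scheme. The Latin range is abbreviated, and the CJK range is the one from the example above.

```python
def font_face_rule(family, url, unicode_range):
    """Format a single @font-face rule for one WOFF2 subset file."""
    return (
        "@font-face {\n"
        f"  font-family: '{family}';\n"
        f"  src: url('{url}') format('woff2');\n"
        f"  unicode-range: {unicode_range};\n"
        "}"
    )


# Hypothetical subset table: names, ranges, and paths are illustrative.
SUBSETS = {
    "latin": "U+0000-00FF, U+0131, U+0152-0153",
    "cyrillic": "U+0400-04FF",
    "cjk": "U+4E00-9FFF, U+F900-FAFF, U+3400-4DBF",
}

css = "\n\n".join(
    font_face_rule("Noto Sans", f"/fonts/noto-sans-{name}.woff2", rng)
    for name, rng in SUBSETS.items()
)
print(css)
```

Because every rule shares the same font-family name, page styles keep referring to 'Noto Sans' and the browser picks the files to fetch based on the characters it encounters.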

For most Latin-script web projects, a well-chosen Latin subset (roughly 250-350 characters covering extended Latin) handles all practical content including common currency symbols, quotation marks, and diacritical characters for Western European languages. Tools like Glyphhanger can analyze your actual content and generate a minimal subset tailored to the exact characters your site uses.
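
The core idea behind content-based subsetting can be sketched in a few lines: collect the code points a text uses and collapse them into ranges. This is an illustration of the idea only, not how Glyphhanger is implemented, and the function name is an assumption.

```python
def unicode_ranges(text):
    """Collapse the code points used in `text` into U+XXXX[-YYYY] ranges."""
    points = sorted({ord(ch) for ch in text})
    ranges = []
    for cp in points:
        if ranges and cp == ranges[-1][1] + 1:
            # Extends the current run of consecutive code points
            ranges[-1][1] = cp
        else:
            # Starts a new run
            ranges.append([cp, cp])
    return ",".join(
        f"U+{lo:04X}" if lo == hi else f"U+{lo:04X}-{hi:04X}"
        for lo, hi in ranges
    )


result = unicode_ranges("abc \u20ac")  # the letters a-c, a space, and the euro sign
print(result)  # U+0020,U+0061-0063,U+20AC
```

The resulting string uses the same notation as the --unicodes value and the unicode-range descriptor shown earlier.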
