Font Subsetting: Cut Font File Sizes by 90%
A full-featured font file contains glyphs for every character the type designer included: Latin letters, Greek and Cyrillic alphabets, mathematical symbols, currency signs, ligatures, historical letterforms, and often thousands of glyphs that will never appear in your content. When you load Inter from Google Fonts without any configuration, you're potentially downloading support for characters your users will never see.
Font subsetting is the process of stripping out all the glyphs you don't need, keeping only the characters your content actually uses. Done well, it can turn a 300KB font into a 20KB one, a reduction of more than 90% that has an outsized impact on page load performance.
Why Font Files Are So Large
To understand subsetting, you first need to understand why modern font files are so heavy.
A professionally designed typeface like Inter contains approximately 2,500 glyphs in its full release. That includes:
- Basic Latin (A–Z, a–z, digits, punctuation) — roughly 95 glyphs
- Extended Latin for European languages — several hundred additional glyphs
- Cyrillic characters — 200+ glyphs
- Greek characters — 70+ glyphs
- Currency symbols from around the world
- Mathematical and technical symbols
- Arrows and dingbats
- Ligatures (fi, fl, ffi, etc.)
- Discretionary alternates and stylistic sets
- OpenType feature glyphs for swashes, small caps, and more
An English-language website typically needs none of the Cyrillic or Greek glyphs and few, if any, of the mathematical or extended Latin ones. Yet without subsetting, you download all of them.
Variable fonts compound the problem. A variable font packs an entire design space into a single file: outlines for a default design plus the variation data needed to interpolate any point along each axis. The full Inter variable font WOFF2 file is around 330KB. The equivalent subsetted Latin-only version is about 75KB. A further subsetting pass targeting only the specific characters used on your site can bring this below 20KB.
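To put those figures in perspective, the savings can be worked out directly. The sketch below is only arithmetic on the approximate sizes quoted above; the function name reduction_percent is made up for illustration.

```python
def reduction_percent(original_kb: float, subset_kb: float) -> float:
    """Return the size reduction as a percentage of the original size."""
    if original_kb <= 0:
        raise ValueError("original size must be positive")
    return (1 - subset_kb / original_kb) * 100


# Approximate sizes quoted in the text for the Inter variable font.
full_kb, latin_kb, custom_kb = 330, 75, 20

print(f"Latin subset:  {reduction_percent(full_kb, latin_kb):.1f}% smaller")   # 77.3% smaller
print(f"Custom subset: {reduction_percent(full_kb, custom_kb):.1f}% smaller")  # 93.9% smaller
```

Most of the saving comes from the first pass (dropping whole scripts); the per-site pass then removes most of what remains.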
Glyph Count vs. File Size
The relationship between glyph count and file size is roughly linear for static fonts: fewer glyphs means proportionally smaller files. For variable fonts, it's somewhat different — the axis data takes a fixed overhead regardless of glyph count — but subsetting still produces dramatic reductions.
The OpenType specification allows fonts to contain up to 65,535 glyphs. Most professional typefaces use a small fraction of this capacity, but even 2,000–3,000 glyphs represents a significant payload when you only need 200.
unicode-range: Browser-Level Subsetting
The unicode-range descriptor in @font-face declarations is a built-in CSS mechanism for subsetting at the browser level. It tells the browser which Unicode code points a particular font file covers, allowing the browser to download the file only when it encounters matching characters in the page content.
@font-face {
  font-family: 'Inter';
  src: url('/fonts/inter-latin.woff2') format('woff2');
  font-weight: 400;
  unicode-range: U+0000-00FF, U+0131, U+0152-0153, U+02BB-02BC,
                 U+02C6, U+02DA, U+02DC, U+2000-206F, U+2074,
                 U+20AC, U+2122, U+2191, U+2193, U+2212, U+2215,
                 U+FEFF, U+FFFD;
}
@font-face {
  font-family: 'Inter';
  src: url('/fonts/inter-latin-ext.woff2') format('woff2');
  font-weight: 400;
  unicode-range: U+0100-024F, U+0259, U+1E00-1EFF, U+2020,
                 U+20A0-20AB, U+20AD-20CF, U+2113, U+2C60-2C7F,
                 U+A720-A7FF;
}
@font-face {
  font-family: 'Inter';
  src: url('/fonts/inter-cyrillic.woff2') format('woff2');
  font-weight: 400;
  unicode-range: U+0400-045F, U+0490-0491, U+04B0-04B1, U+2116;
}
With this setup, a browser rendering an English-only page downloads only inter-latin.woff2. If a Russian word appears in the content, the browser also downloads inter-cyrillic.woff2. The Cyrillic file is never requested for an English-only page.
This is exactly how Google Fonts serves Inter, Roboto, and every other hosted typeface. The single fonts.googleapis.com CSS URL returns multiple @font-face declarations with unicode-range descriptors, and the browser downloads only what it needs.
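The selection behaviour described above can be modelled in a few lines. This is a simplified sketch, not the browser's actual algorithm: it ignores font-family matching and styling, the ranges are abridged from the CSS above, and the names FACES and files_to_download are invented for illustration.

```python
# Each font file maps to the (start, end) code point ranges it declares.
# Ranges are abridged from the @font-face rules shown earlier.
FACES = {
    "inter-latin.woff2": [(0x0000, 0x00FF), (0x0131, 0x0131), (0x0152, 0x0153),
                          (0x2000, 0x206F), (0x20AC, 0x20AC)],
    "inter-latin-ext.woff2": [(0x0100, 0x024F), (0x1E00, 0x1EFF)],
    "inter-cyrillic.woff2": [(0x0400, 0x045F), (0x0490, 0x0491), (0x2116, 0x2116)],
}


def files_to_download(text, faces=FACES):
    """Return the font files whose declared ranges match any character in text."""
    needed = []
    for filename, ranges in faces.items():
        if any(start <= ord(ch) <= end for ch in text for start, end in ranges):
            needed.append(filename)
    return needed


print(files_to_download("Hello, world!"))  # ['inter-latin.woff2']
print(files_to_download("Hello, мир"))     # ['inter-latin.woff2', 'inter-cyrillic.woff2']
```

A single character outside the Latin range is enough to trigger a second download, which is why stray symbols in otherwise English content are worth auditing.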
Reading Unicode Range Values
The U+ prefix denotes a Unicode code point in hexadecimal. Ranges use a hyphen: U+0000-00FF covers the first 256 Unicode code points, which are the Basic Latin and Latin-1 Supplement blocks, the core characters for English and Western European languages. Wildcards are also valid: each ? stands for any hexadecimal digit, so U+26?? covers U+2600 through U+26FF, the Miscellaneous Symbols block.
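The three forms (single code point, hyphenated range, wildcard) are easy to expand by hand or in code. The following is a minimal parser sketch covering only the syntax described here; parse_unicode_range is a hypothetical helper, not part of any library.

```python
def parse_unicode_range(value: str) -> list[tuple[int, int]]:
    """Expand a unicode-range value into inclusive (start, end) code point pairs."""
    ranges = []
    for token in value.split(","):
        token = token.strip()
        if not token:
            continue
        if not token.upper().startswith("U+"):
            raise ValueError(f"not a unicode-range token: {token!r}")
        body = token[2:]
        if "?" in body:
            # A wildcard digit spans 0 through F at that position.
            start, end = body.replace("?", "0"), body.replace("?", "F")
        elif "-" in body:
            start, end = body.split("-", 1)
        else:
            start = end = body
        ranges.append((int(start, 16), int(end, 16)))
    return ranges


print(parse_unicode_range("U+0000-00FF, U+0131, U+26??"))
# [(0, 255), (305, 305), (9728, 9983)]
```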
Limitations
unicode-range is conditional loading, not file-level subsetting. It only controls whether a file is downloaded; it does not change the file. If a font file still contains every glyph, the browser downloads all of them as soon as one matching character appears in the DOM. The descriptor saves bytes only when each file has itself been cut down to its declared range, and physically removing glyphs from a font file requires a subsetting tool.
Manual Subsetting with pyftsubset
pyftsubset is a command-line tool from the fonttools Python library. It physically removes glyphs from a font file, producing a smaller output that contains only the characters you specify. This is the most powerful subsetting approach available.
Installation
pip install fonttools brotli
# brotli enables WOFF2 output
Basic Usage
Subset to Basic Latin only (English characters):
pyftsubset Inter-Regular.ttf \
--output-file=inter-regular-latin.woff2 \
--flavor=woff2 \
--unicodes="U+0000-00FF,U+0131,U+0152-0153,U+02BB-02BC,U+02C6,U+02DA,U+02DC,U+2000-206F,U+20AC,U+2122,U+FEFF,U+FFFD"
Subset to exactly the characters used in your content:
pyftsubset Inter-Regular.ttf \
--output-file=inter-regular-custom.woff2 \
--flavor=woff2 \
--text-file=all-page-text.txt
The --text-file option accepts a plain text file containing every character that appears anywhere in your site. pyftsubset extracts only the glyphs needed to render those specific characters.
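One way to produce such a file is to gather the unique characters from your content sources. The sketch below assumes the content lives in plain .txt files under a directory you pass in; the function names and the default all-page-text.txt output path are illustrative assumptions.

```python
from pathlib import Path


def unique_characters(texts):
    """Return every distinct character across the given strings, sorted by code point."""
    seen = set()
    for text in texts:
        seen.update(text)
    # Line breaks are not rendered as glyphs, so leave them out.
    seen -= {"\n", "\r"}
    return "".join(sorted(seen))


def write_text_file(source_dir, output_path="all-page-text.txt", pattern="*.txt"):
    """Collect unique characters from matching files and write them to output_path."""
    texts = (p.read_text(encoding="utf-8") for p in Path(source_dir).rglob(pattern))
    chars = unique_characters(texts)
    Path(output_path).write_text(chars, encoding="utf-8")
    return chars


print(unique_characters(["Hello", "World\n"]))  # HWdelor
```

Deduplicating first is optional, since pyftsubset only cares which characters appear, but it keeps the file small and easy to review in version control.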
OpenType Feature Flags
By default, pyftsubset keeps a built-in set of common layout features (kern and liga among them), drops the rest, and prunes GSUB and GPOS lookups that reference removed glyphs. Use --layout-features to control the list explicitly:
pyftsubset Inter-Regular.ttf \
--output-file=inter-regular-latin.woff2 \
--flavor=woff2 \
--layout-features="kern,liga,calt,rlig" \
--unicodes="U+0000-00FF"
kern is kerning pairs (important for quality typography), liga is standard ligatures (fi, fl), calt is contextual alternates, and rlig is required ligatures. For most web use cases, including kern and liga while dropping decorative OpenType features is the right balance between quality and file size.
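When the same command is run for several weights, it can help to assemble the arguments in code. This sketch only builds the argument list using the flags shown above; pyftsubset_args is a hypothetical helper, and actually running the command assumes fonttools is installed.

```python
import shlex


def pyftsubset_args(font_path, output_path, unicodes, layout_features=("kern", "liga")):
    """Build a pyftsubset argument list using the flags shown in this article."""
    return [
        "pyftsubset",
        font_path,
        f"--output-file={output_path}",
        "--flavor=woff2",
        f"--layout-features={','.join(layout_features)}",
        f"--unicodes={unicodes}",
    ]


args = pyftsubset_args("Inter-Regular.ttf", "inter-regular-latin.woff2", "U+0000-00FF")
print(shlex.join(args))  # shell-safe rendering of the command for logging
```

Passing the list to subprocess.run(args, check=True) would execute it without going through a shell, which avoids quoting problems with unusual file names.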
Subsetting Variable Fonts
Variable fonts are subset with the same command. Two extra flags often appear alongside it:
pyftsubset "Inter[wght].ttf" \
--output-file=inter-variable-latin.woff2 \
--flavor=woff2 \
--layout-features="kern,liga,calt" \
--unicodes="U+0000-00FF,U+0131,U+0152-0153" \
--no-hinting \
--desubroutinize
The file name is quoted because square brackets are glob characters in most shells. --no-hinting removes hinting data, which matters little at modern screen resolutions but can add significant file size. --desubroutinize applies only to fonts with CFF outlines and has no effect on TrueType (glyf) outlines: it inlines shared subroutines, which leaves the glyph shapes unchanged and sometimes compresses better under WOFF2.
Google Fonts' Automatic Subsetting
Google Fonts handles subsetting automatically based on the subset parameter and unicode-range CSS. When you request a font via the Google Fonts API, the returned CSS contains multiple @font-face declarations, one per script subset (cyrillic-ext, latin, and so on), each pointing to a pre-generated subset file.
<!-- This request returns subsetted fonts automatically -->
<link href="https://fonts.googleapis.com/css2?family=Inter:wght@400;700&display=swap" rel="stylesheet">
The returned CSS looks something like:
/* cyrillic-ext */
@font-face {
  font-family: 'Inter';
  font-style: normal;
  font-weight: 400;
  font-display: swap;
  src: url(https://fonts.gstatic.com/s/inter/v18/UcCO3FwrK3iLTeHuS_...cyrillic-ext.woff2) format('woff2');
  unicode-range: U+0460-052F, U+1C80-1C88, ...;
}
/* latin */
@font-face {
  font-family: 'Inter';
  font-style: normal;
  font-weight: 400;
  font-display: swap;
  src: url(https://fonts.gstatic.com/s/inter/v18/UcCO3FwrK3iLTeHuS_...latin.woff2) format('woff2');
  unicode-range: U+0000-00FF, U+0131, ...;
}
An English-only page downloads only the latin.woff2 file — typically 15–25KB per weight. The Cyrillic and extended Latin files are never fetched.
The text Parameter for Extreme Optimization
Google Fonts supports a text parameter that subsets the font to exactly the characters you specify — ideal for display text that uses only a few glyphs:
<!-- Subset to just the characters in "Hello World" -->
<link href="https://fonts.googleapis.com/css2?family=Playfair+Display&text=HeloWrd" rel="stylesheet">
This produces a font file containing only the glyphs H, e, l, o, W, r, d — a file that might be 3–5KB instead of 30KB. It's a powerful optimization for heading-only fonts where the character set is predictable and small.
The limitation: the file is bound to the exact character set in the URL, even though it is served with a long cache lifetime (Cache-Control: max-age=31536000). If your headings ever use a character that is not in the text parameter, that character falls back to the next font in the font-family stack.
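Because the character list has to stay in sync with the heading, it is safer to derive the URL from the heading text than to type it by hand. A small sketch, assuming the css2 endpoint and the family and text parameters shown above; heading_font_url is a made-up name. Unlike the hand-written example, it keeps the space character.

```python
from urllib.parse import urlencode


def heading_font_url(family: str, heading_text: str) -> str:
    """Build a Google Fonts css2 URL limited to the characters in heading_text."""
    # dict.fromkeys keeps the first occurrence of each character, in order.
    unique = "".join(dict.fromkeys(heading_text))
    query = urlencode({"family": family, "text": unique})
    return f"https://fonts.googleapis.com/css2?{query}"


print(heading_font_url("Playfair Display", "Hello World"))
# https://fonts.googleapis.com/css2?family=Playfair+Display&text=Helo+Wrd
```

urlencode percent-encodes anything outside the URL-safe set, so accented or non-Latin heading characters are handled without extra work.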
Subsetting Strategies by Use Case
Different types of websites have different optimal subsetting approaches.
English-Only Marketing Sites
Use Google Fonts with the default unicode-range subsetting, or self-host fonts subsetted to Basic Latin + Latin-1 Supplement. The target is a single WOFF2 file per weight under 20KB.
pyftsubset Inter-Regular.ttf \
--output-file=inter-400-latin.woff2 \
--flavor=woff2 \
--layout-features="kern,liga" \
--unicodes="U+0020-007E,U+00A0-00FF,U+0131,U+0152-0153,U+02BB-02BC,U+20AC,U+2122,U+2014,U+2013,U+201C,U+201D,U+2018,U+2019"
This covers standard English text including smart quotes, em/en dashes, the euro sign, and the trademark symbol — essentially everything a marketing copywriter will put in content.
Multilingual Applications
Use unicode-range splitting with separate font files per script. The browser fetches each file only when the page contains characters in its declared range, so a page with Chinese or Japanese text triggers the CJK download while an English-only page never requests it. The ranges below cover CJK ideographs only; Japanese pages would also need the kana blocks (U+3040-30FF).
@font-face {
  font-family: 'Noto Sans';
  src: url('/fonts/noto-sans-latin.woff2') format('woff2');
  font-weight: 400;
  unicode-range: U+0000-00FF;
}
@font-face {
  font-family: 'Noto Sans';
  src: url('/fonts/noto-sans-cjk.woff2') format('woff2');
  font-weight: 400;
  unicode-range: U+4E00-9FFF, U+3400-4DBF, U+20000-2A6DF;
}
Display Headings with Decorative Fonts
When using an expressive display face for headings only, subset aggressively using the text parameter or by generating a font file containing only the characters that appear in your actual headings.
# Extract unique characters from your heading content into a shell variable
CHARS=$(echo "The Quick Brown Fox Jumps Over The Lazy Dog" | \
python3 -c "import sys; print(''.join(sorted(set(sys.stdin.read().strip()))))")
# Subset to exactly those characters (quoted so the space character survives)
pyftsubset PlayfairDisplay-Bold.ttf \
--output-file=playfair-headings.woff2 \
--flavor=woff2 \
--text="$CHARS"
E-commerce Product Pages
Product pages often include user-generated content (product names, reviews, descriptions) that may contain characters you can't fully predict. A safe default for European markets is Basic Latin and Latin-1 Supplement plus Latin Extended-A and Extended-B:
pyftsubset Roboto-Regular.ttf \
--output-file=roboto-400-latin-ext.woff2 \
--flavor=woff2 \
--layout-features="kern,liga,calt" \
--unicodes="U+0000-024F,U+0259,U+1E00-1EFF,U+20AC,U+2122,U+2014,U+2013,U+201C-201D,U+2018-2019"
This covers English, French, German, Spanish, Portuguese, Italian, Polish, and most other European languages written in the Latin script, which makes it a reasonable default for international e-commerce without the weight of full multilingual coverage.
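Since user-generated content can still stray outside any fixed subset, it can be useful to check incoming text against the ranges you shipped. The sketch below hard-codes the ranges from the command above and reports what falls outside them; SUBSET_RANGES and uncovered_characters are illustrative names. It checks the requested ranges, not the glyphs the font actually contains.

```python
# Inclusive code point ranges copied from the --unicodes value above.
SUBSET_RANGES = [
    (0x0000, 0x024F), (0x0259, 0x0259), (0x1E00, 0x1EFF),
    (0x20AC, 0x20AC), (0x2122, 0x2122), (0x2013, 0x2014),
    (0x201C, 0x201D), (0x2018, 0x2019),
]


def uncovered_characters(text, ranges=SUBSET_RANGES):
    """Return the sorted characters in text that fall outside every range."""
    missing = {ch for ch in text if not any(lo <= ord(ch) <= hi for lo, hi in ranges)}
    return sorted(missing)


print(uncovered_characters("Crème brûlée, Zażółć"))  # []
print(uncovered_characters("Size: 42 мм"))            # ['м']
```

Logging these misses over time shows whether the subset should grow or whether a second, conditionally loaded file is the better fit.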
Automating Subsetting in Your Build Pipeline
Manual subsetting works for static sites, but dynamic applications benefit from automated subsetting that runs during the build process.
A short shell script can crawl the rendered HTML, extract its unique characters, and generate a tailored font subset:
# 1. Crawl your site and extract all text content
# (--adjust-extension saves pages with an .html suffix so step 2 can find them)
wget --recursive --level=3 --quiet --output-file=/dev/null \
--adjust-extension --execute robots=off https://yoursite.com \
--directory-prefix=./crawl
# 2. Extract unique characters
grep -r --include="*.html" -oh "." ./crawl | sort -u | tr -d '\n' > chars.txt
# 3. Generate subset
pyftsubset Inter-Regular.ttf \
--output-file=inter-400-custom.woff2 \
--flavor=woff2 \
--text-file=chars.txt \
--layout-features="kern,liga"
# 4. Check file size
ls -lh inter-400-custom.woff2
This approach produces the smallest possible font file — one containing only the glyphs that actually appear in your content. For a typical English marketing site, the result is often under 15KB per font file, compared to 50–75KB for a standard Latin subset and 300KB+ for a full font file.
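The grep step above reads raw HTML, so characters that occur only in tags, scripts, or stylesheets also end up in chars.txt. If that matters, the extraction can be limited to text nodes. The following sketch uses only Python's standard html.parser; the class and function names are invented, and it ignores text in attributes such as alt or title.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Collect the characters that appear in text nodes, skipping script and style."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__(convert_charrefs=True)  # decode entities such as &euro;
        self.chars = set()
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.chars.update(data)


def visible_characters(html: str) -> str:
    """Return the unique text-node characters in html, sorted by code point."""
    collector = TextCollector()
    collector.feed(html)
    collector.close()
    return "".join(sorted(collector.chars - {"\n", "\r", "\t"}))


print(repr(visible_characters("<p>Caf&eacute; &euro;5</p><style>p{color:red}</style>")))
# ' 5Café€'
```

Writing the result to chars.txt in place of step 2 leaves the rest of the pipeline unchanged.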
Subsetting is the highest-impact single optimization available for web font performance. Combined with WOFF2 format and proper font-display settings, it makes high-quality typography genuinely compatible with fast page loads.
Validating Your Subsets
After generating a subset font, verify that it actually contains the characters you need. Characters missing from the subset render in the next font in the stack, producing mismatched glyphs in the middle of otherwise consistent text, and these can be extremely hard to spot in production.
Using fonttools to Inspect Glyphs
python3 -c "
from fontTools.ttLib import TTFont
font = TTFont('inter-400-latin.woff2')
cmap = font.getBestCmap()
# Check if specific characters are present
test_chars = 'abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789'
missing = [c for c in test_chars if ord(c) not in cmap]
print(f'Missing characters: {missing if missing else \"None\"}')
print(f'Total glyphs: {len(font.getGlyphOrder())}')
"
Browser Rendering Test
Create a test HTML page that renders every character in your expected character set using the subsetted font. View it in Chrome DevTools with the Network tab open to confirm only the expected font file is requested:
<!DOCTYPE html>
<html>
<head>
  <style>
    @font-face {
      font-family: 'Inter-Subset';
      src: url('/fonts/inter-400-latin.woff2') format('woff2');
      font-display: block;
    }
    body { font-family: 'Inter-Subset', monospace; font-size: 24px; }
  </style>
</head>
<body>
  <p>AaBbCcDdEeFfGgHhIiJjKkLlMmNnOoPpQqRrSsTtUuVvWwXxYyZz</p>
  <p>0123456789!@#$%^&*().,;:'"/?-_+=</p>
  <p>€£¥©®™—–“”‘’</p>
</body>
</html>
Inspect the rendered output for any characters rendering in a fallback font (they'll look visually different from the surrounding Inter text). Address missing characters by adding their Unicode code points to your subset and regenerating.
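Turning the missing characters into code points for the --unicodes flag is mechanical. A minimal sketch, assuming you have pasted the missing characters into a string; to_unicodes_arg is a hypothetical helper that also merges consecutive code points into ranges.

```python
def to_unicodes_arg(chars: str) -> str:
    """Format characters as a comma-separated U+XXXX list, merging consecutive runs."""
    code_points = sorted({ord(ch) for ch in chars})
    if not code_points:
        return ""
    runs = []
    start = prev = code_points[0]
    for cp in code_points[1:]:
        if cp == prev + 1:
            prev = cp  # extend the current run
            continue
        runs.append((start, prev))
        start = prev = cp
    runs.append((start, prev))
    return ",".join(
        f"U+{lo:04X}" if lo == hi else f"U+{lo:04X}-{hi:04X}" for lo, hi in runs
    )


print(to_unicodes_arg("€™"))   # U+20AC,U+2122
print(to_unicodes_arg("cab"))  # U+0061-0063
```

The output can be appended to the existing --unicodes value before regenerating the subset.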
Subsetting in JavaScript Toolchains
For teams working in JavaScript/TypeScript environments, subsetting can be integrated directly into the build pipeline without requiring Python.
Using subset-font npm package
npm install subset-font
// subset-fonts.mjs
import subsetFont from 'subset-font'; // the package's default export
import { readFileSync, writeFileSync } from 'fs';
const fontBuffer = readFileSync('./fonts/Inter-Regular.ttf');
// Subset to a specific text string
const subsetBuffer = await subsetFont(fontBuffer, 'AaBbCcDdEe', {
targetFormat: 'woff2',
});
writeFileSync('./fonts/inter-subset.woff2', subsetBuffer);
console.log(`Original: ${fontBuffer.length} bytes`);
console.log(`Subset: ${subsetBuffer.length} bytes`);
console.log(`Reduction: ${((1 - subsetBuffer.length / fontBuffer.length) * 100).toFixed(1)}%`);
Vite Plugin Integration
For Vite-based projects, font subsetting can run as a build hook:
// vite.config.js
import subsetFont from 'subset-font'; // the package's default export
import { readFileSync, writeFileSync } from 'fs';
import { glob } from 'glob';
function fontSubsetPlugin() {
  return {
    name: 'font-subset',
    apply: 'build', // skip the dev server
    // closeBundle runs after the output files have been written to dist/
    async closeBundle() {
      // Collect all text content from built HTML files
      const htmlFiles = await glob('dist/**/*.html');
      let allText = '';
      for (const file of htmlFiles) {
        const content = readFileSync(file, 'utf-8');
        // Strip HTML tags, keep text content (entities are not decoded)
        allText += content.replace(/<[^>]+>/g, '');
      }
      const uniqueChars = [...new Set(allText)].join('');
      // Subset each font
      const fontFiles = await glob('dist/fonts/*.woff2');
      for (const fontFile of fontFiles) {
        const buffer = readFileSync(fontFile);
        const subsetted = await subsetFont(buffer, uniqueChars, {
          targetFormat: 'woff2'
        });
        writeFileSync(fontFile, subsetted);
        console.log(`Subsetted ${fontFile}: ${buffer.length} → ${subsetted.length} bytes`);
      }
    }
  };
}
// Register the plugin so Vite runs it
export default {
  plugins: [fontSubsetPlugin()],
};
This approach regenerates tailored fonts on every build, adapting automatically as content changes. The font files in the production build contain only the glyphs that appear in the built HTML, with no manual character enumeration. Text that JavaScript inserts at runtime is not part of that HTML, so any new characters it introduces fall back to the next font in the stack.