Optimizing Chromium for macOS 26

Field entry, 6 May.

The project began with a premise that sounds sensible until you say it in front of a Chromium checkout: what if the browser stopped treating macOS as a polite POSIX host with decorative frameworks attached, and instead behaved like a native macOS 26 application from the first instruction of launch to the last composited frame?

This is the sort of question that makes a small patch file impossible.

Chromium is not a little app with a launch function, a window, and some good intentions. It is a city with its own roads department. On macOS it arrives as a bundle, a framework, helper processes, entitlement files, Objective-C++ bridgework, Swift-adjacent possibilities, a compositor, an input pipeline, a disk cache, a history service, a scheduler, and a great many seams where the browser has learned to survive every Apple API generation by being cautious. That caution is valuable. It is also where performance sometimes goes to sit down.

So the work in ../chromium-mac26-patches/ took a different shape. Instead of one grand optimization, it became a specimen cabinet: twenty-eight production-shaped patches, each behind a disabled-by-default Finch flag, each guarded with @available(macOS 26.0, *) where the API demanded it, and each integrated into the relevant BUILD.gn file so the idea had to at least inhabit Chromium’s build graph rather than loiter nearby with a clipboard.

The phrase “production-shaped” matters here. It does not mean “production-proven.” That distinction is the difference between an engineering note and a commemorative plaque. The patch set applies cleanly against Chromium main as of 29 April 2026, the new Objective-C++ files pass a Linux clang -fsyntax-only harness with Apple and Chromium stubs, the Swift bridges balance and line up at the C ABI boundary, and the build instructions explain how to produce a real macOS 26 binary on a Mac or a macos-26 runner. What we did not have in this pass was the full native build, link, sign, launch, trace, and battery-test loop on macOS 26 hardware.

That makes this a field journal, not a victory lap.

Still, even as a field exercise, it is useful because it forces the question into concrete pieces. “Make Chromium faster on macOS 26” is a wish. “Move renderer spawn onto the fast-launch posix_spawn path, replace a heuristic frame latch with CAMetalDisplayLink deadlines, use APFS cloning for cache writes, and throttle background painting when the operating system says the machine is thermally serious” is a plan with places to put breakpoints.

Start with launch

Cold launch is where large applications reveal their debt. The user sees a bounce in the Dock; the operating system sees dynamic libraries, code signatures, page faults, Objective-C metadata, helper process setup, component negotiation, profile work, GPU warmup, and the small tragedy of everything being correct while still taking too long.

The first patch, O1, attacked the part of launch that happens before Chromium gets much of a personality: dyld closure construction. Chromium’s macOS bundle includes the large Chromium Framework, and dyld has to understand that dependency graph before the browser can get on with being a browser. The patch adds an installer-side dyld_closure_gen.py hook to prebuild a dyld 4 closure for the framework so first launch is not forced to do all of that graph work cold. It is unromantic work, which is usually a promising sign.

O2 took the same attitude toward the binary itself. Universal2 compatibility is useful when a build must carry both Intel and Apple Silicon. For an arm64-only macOS 26 build, that padding becomes ceremony. The patch drops the universal2 padding path for arm64-only output, not because padding is morally wrong, but because every byte carried into launch needs a reason to be there.

O3 then split early browser subsystem initialization into a parallel scaffold. This is the part of the work where it is easy to become reckless, because parallel initialization sounds like a discount code for startup time until one remembers that initialization order is often an undocumented API. The useful version is not “make everything parallel.” It is to identify the work that is independent, preserve the sequence boundaries that really are boundaries, and make the remaining serial path shorter without turning startup into a race-condition museum.
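Here is one way to picture that boundary discipline, as a minimal sketch in plain C++. Steps that are genuinely independent share a phase and run concurrently, and every phase boundary is a hard barrier. InitStep, InitPhase, and RunInitPhases are hypothetical names, not Chromium's startup API. A real patch would post to Chromium's own task runners rather than call std::async.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <vector>

// Hypothetical names, not Chromium's startup API. A "phase" groups steps that
// have no ordering constraints among themselves; phases run strictly in order.
using InitStep = std::function<void()>;
using InitPhase = std::vector<InitStep>;

void RunInitPhases(const std::vector<InitPhase>& phases) {
  for (const InitPhase& phase : phases) {
    std::vector<std::future<void>> pending;
    pending.reserve(phase.size());
    for (const InitStep& step : phase) {
      assert(step);  // an empty step is a wiring bug, not a no-op
      // std::launch::async runs each independent step on its own thread.
      pending.push_back(std::async(std::launch::async, step));
    }
    // Barrier: every step in this phase finishes (get() rethrows any
    // exception) before the next phase may observe its results.
    for (std::future<void>& done : pending) done.get();
  }
}
```

Ordering is declared by phase membership. Moving a step across a boundary therefore shows up as a visible diff rather than a silent change in timing.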

O5, Background Assets, addressed a different kind of launch tax. Chromium often needs component-updater assets: model files, rulesets, language packs, the small cargo that makes the browser useful after it opens. If those assets are negotiated synchronously on first launch, the user’s first impression pays for work the operating system could have staged while the app was asleep. The patch adds a delegate shape for macOS Background Assets so the OS can prepare selected payloads outside the launch path. It is not glamorous. It is precisely the sort of quiet handoff native platforms are good at when applications stop insisting on doing everything themselves.

O6 is the launch/security sibling: a JIT write-allowlist entitlement path. Any browser that runs JavaScript has to care about writable executable memory with the seriousness of someone carrying something sharp through customs. The patch does not try to make JIT permission casual; it makes the entitlement conditional on the macOS 26 target path, keeping the build and installer in the loop rather than burying a security-sensitive assumption in runtime code.

The follow-up N9 patch made launch concrete at the helper-process level. Chromium creates renderer helpers constantly, and on macOS 26 the patch experiments with _POSIX_SPAWN_FAST_LAUNCH, combined with close-on-exec defaults and responsibility attribution. The code is deliberately shaped as a fast path that returns -1 on failure, so the legacy spawn path remains authoritative. It is also the kind of patch that deserves suspicion, and it is written with that in mind: the attribute is resolved carefully so older systems still load, and the implementation treats the optimization as an opportunistic route, not a new law of nature.
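The fail-closed shape can be sketched without touching the spawn attribute at all. In this hypothetical outline, the two callables stand in for the fast-launch and legacy posix_spawn routes, and SpawnHelperProcess and SpawnResult are illustrative names. The _POSIX_SPAWN_FAST_LAUNCH attribute and how it is resolved are not modelled.

```cpp
#include <cassert>
#include <functional>
#include <sys/types.h>

// Hypothetical sketch. Both callables return a child pid, or -1 on failure,
// matching the fast path's "-1 means take the old road" convention.
using SpawnFn = std::function<pid_t()>;

struct SpawnResult {
  pid_t pid;
  bool used_fast_path;
};

SpawnResult SpawnHelperProcess(bool fast_launch_enabled, const SpawnFn& fast_path,
                               const SpawnFn& legacy_path) {
  assert(legacy_path);  // the legacy route must always be present
  if (fast_launch_enabled && fast_path) {
    const pid_t pid = fast_path();
    if (pid != -1) return {pid, true};
    // Fail closed: an unsupported attribute or a spawn error is not fatal;
    // it simply routes this launch through the authoritative legacy path.
  }
  return {legacy_path(), false};
}
```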

Taken together, the launch work is less about a single stopwatch result than a principle: make the operating system do the work it can do earlier, make the binary carry less irrelevant weight, and make every faster route fail closed into the boring route.

Render against real deadlines

The rendering group was where macOS 26 became most interesting, because modern Mac performance is not just CPU time with better marketing. It is display timing, GPU residency, synchronization, memory pressure, layer-tree cost, and whether the compositor is guessing or listening.

R3 is the easiest to explain because it replaces a smell. Chromium has historically carried timing heuristics around presentation, including latch buffers that exist because the system does not always tell you the exact deadline in the shape you want. The CAMetalDisplayLink wrapper exposes target presentation timestamp, target timestamp, and duration from the system display link and posts that data back onto Chromium’s sequence as Mac26VSyncParams. In other words, the browser gets a clock with a deadline instead of sampling time near the end and hoping the weather has not changed.
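The difference between latching and listening is easiest to see as arithmetic. This is a minimal sketch. It assumes the wrapper hands over a target timestamp, a target presentation timestamp, and a refresh duration, all in seconds on one clock. VSyncParamsSketch and the helper functions are hypothetical and do not reproduce the patch's Mac26VSyncParams.

```cpp
#include <cassert>

// Hypothetical mirror of the three values described above; all times are
// seconds on one display-link clock.
struct VSyncParamsSketch {
  double target_timestamp;               // when this frame's work should be done
  double target_presentation_timestamp;  // when the frame is expected on screen
  double duration;                       // refresh interval, e.g. 1/120 s
};

// Budget left before the system-reported deadline; negative means already late.
double RemainingBudget(const VSyncParamsSketch& p, double now) {
  return p.target_timestamp - now;
}

// Start the frame only if its estimated cost fits inside this interval's budget.
bool ShouldProduceFrameNow(const VSyncParamsSketch& p, double now, double estimated_cost) {
  assert(p.duration > 0.0);
  return estimated_cost <= RemainingBudget(p, now);
}

// A skipped frame aims at the next refresh instead of arriving late for this one.
double ExpectedPresentation(const VSyncParamsSketch& p, bool produce_now) {
  return produce_now ? p.target_presentation_timestamp
                     : p.target_presentation_timestamp + p.duration;
}
```

The decision uses the deadline the system reported for this particular frame. As a result, 60 Hz and 120 Hz panels need no separate heuristic constants.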

R2 pairs with that by using MTLSharedEvent backpressure. Browser rendering tends to fail in two opposite directions: it can starve the GPU, or it can enthusiastically enqueue work until latency becomes a storage facility. Shared events let the browser track GPU progress with a native synchronization primitive rather than coarse CPU-side waiting. The point is not simply to send frames faster. It is to stop sending work when the work already sent has not made it through the pipe.
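The backpressure idea reduces to two counters: how many frames the CPU has numbered, and which number the GPU most recently signaled. Here is a sketch under that assumption, with an atomic integer standing in for the shared event's signaled value. FrameBackpressure is a hypothetical class. A real build would observe an MTLSharedEvent rather than call OnGpuSignaled directly.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for shared-event backpressure. The CPU numbers each
// submitted frame; the completion side records the highest number the GPU
// has finished. Signal values are assumed to be monotonic.
class FrameBackpressure {
 public:
  explicit FrameBackpressure(uint64_t max_in_flight) : max_in_flight_(max_in_flight) {
    assert(max_in_flight > 0);
  }

  // May the CPU encode another frame, or is the pipe already full?
  bool CanSubmit() const { return InFlight() < max_in_flight_; }

  // Records a submission; returns the value the GPU will signal on completion.
  uint64_t Submit() {
    assert(CanSubmit());
    return ++submitted_;
  }

  // Completion side: in a real build, driven by the shared event.
  void OnGpuSignaled(uint64_t value) { signaled_.store(value, std::memory_order_release); }

  uint64_t InFlight() const { return submitted_ - signaled_.load(std::memory_order_acquire); }

 private:
  const uint64_t max_in_flight_;
  uint64_t submitted_ = 0;             // touched only by the submitting thread
  std::atomic<uint64_t> signaled_{0};  // written by the completion side
};
```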

R1 adds a Metal 4 residency-set helper. Residency is a dull word for an important failure mode: the GPU cannot render from resources that have been quietly evicted or are expensive to make resident again at the wrong time. The patch gives Chromium a place to group and manage resources that should travel together through a frame. If R3 is about knowing when the train leaves, R1 is about making sure the passengers are at the platform before the whistle.

R4 is the unshowy layer-tree patch. Chromium’s macOS compositor has to translate browser content into Core Animation structures, and every transform, clip, intermediate layer, and special case has a cost. The sparser layer-tree policy raises the threshold for when Chromium keeps separate layers and folds transform/clip work when macOS 26 can handle the flatter shape cleanly. This is one of those optimizations that looks suspiciously like tidying, because in rendering, tidying is often performance with better manners.

R5 experiments with MetalFX frame interpolation. This is the flashy one, and therefore the one that needs the firmest leash. On a 120 Hz display, a page effectively producing 60 Hz content can look less fluid than the panel can display. The wrapper takes two real frames and page-level motion vectors, then asks MetalFX to synthesize an intermediate frame. The patch treats this as a specific tool for a specific mismatch, not a license to fake smoothness everywhere. Synthetic frames are only helpful if the browser is honest about where they came from and when they cost less than the stutter they hide.

R6 moves CSS filter chains and YUV-to-RGB conversion toward in-shader tensor work. The current style of multi-pass rendering can allocate intermediate surfaces, copy through IOSurfaces, and accumulate cost exactly where HDR and media paths are least forgiving. Expressing the chain as a single Metal tensor operation is a way of retiring intermediate render targets rather than making each one slightly faster. This is the better kind of optimization: delete a trip through memory, do not merely improve the carriage.

The follow-up N3 patch brings in Game Mode for fullscreen WebGL, WebGPU, and canvas workloads. It uses a heuristic: enter when the foreground tab is fullscreen and GPU-bound for long enough to be real, exit quickly when that stops being true. This is important because native optimization can become rude if it confuses a browser tab with a game merely because both contain pixels. The patch is trying to ask for priority only when the user has effectively turned the browser into a game surface.
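The enter-slowly, exit-quickly rule is a small hysteresis state machine. This minimal sketch uses assumed thresholds (two seconds to enter, 250 ms to leave), because the patch's actual values are not given here. GameModeHeuristic and GameModePolicy are hypothetical names, and the actual Game Mode request is left out.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical thresholds, not the patch's values.
struct GameModePolicy {
  int64_t enter_after_ms = 2000;  // sustained fullscreen + GPU-bound before asking
  int64_t exit_after_ms = 250;    // leave quickly once the condition breaks
};

class GameModeHeuristic {
 public:
  explicit GameModeHeuristic(GameModePolicy policy) : policy_(policy) {
    assert(policy_.enter_after_ms >= policy_.exit_after_ms);  // enter slow, exit fast
  }

  // Feed one sample; returns whether Game Mode should currently be requested.
  bool Update(int64_t now_ms, bool fullscreen, bool gpu_bound) {
    const bool qualifying = fullscreen && gpu_bound;
    if (qualifying != last_qualifying_) {
      last_qualifying_ = qualifying;
      since_ms_ = now_ms;  // condition flipped: restart the timer
    }
    const int64_t held_ms = now_ms - since_ms_;
    if (!active_ && qualifying && held_ms >= policy_.enter_after_ms) active_ = true;
    if (active_ && !qualifying && held_ms >= policy_.exit_after_ms) active_ = false;
    return active_;
  }

 private:
  GameModePolicy policy_;
  bool active_ = false;
  bool last_qualifying_ = false;
  int64_t since_ms_ = 0;
};
```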

N7 does the strange thing that sometimes improves performance by becoming more native visually: it routes context menus through macOS 26’s Liquid Glass NSMenu presentation style rather than Chromium’s Views-painted path. The obvious story is aesthetic unification. The technical story is that native menu rendering can gain framebuffer compression, accessibility behavior, and system-managed presentation work that a custom menu has to rebuild by hand. Native controls are not automatically faster, but custom controls are automatically responsible for everything they imitate.

N8 explores Accelerate and BNNS for image filters, using AMX-backed paths where macOS 26 can dispatch them. This belongs beside R6 but lives at a different layer: when the browser is applying image operations that match system-accelerated primitives, it should not cosplay as a vector library for reasons of habit. The useful test is whether the wrapper can remain narrow enough that Chromium still owns semantics while the OS owns the math.

Finding the right metrics

Input latency is rarely one bug. It is a chain of correct decisions with time stuck between them.

I3 is the cleanest specimen: carry NSEvent.timestamp from the renderer-side input pipeline to the GPU-side latency tracker. The patch exists because sampling base::TimeTicks::Now() near CommitPresentedFrameToCA() answers a different question from “when did the hardware event enter the system?” NSEvent.timestamp is tied to the mach-absolute event clock; using it gives the latency measurement a real origin rather than a late witness.
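A worked example shows why the origin matters. In this hedged sketch, all timestamps are assumed to be seconds on the same monotonic uptime clock. LatencySample and its helpers are hypothetical names, not Chromium's latency-tracking types.

```cpp
#include <cassert>

// Hypothetical record. All three times are seconds on one monotonic uptime
// clock; NSEvent.timestamp is assumed to share that base with the frame times.
struct LatencySample {
  double event_time_s;   // hardware event origin, e.g. NSEvent.timestamp
  double late_sample_s;  // a Now() taken near frame commit
  double presented_s;    // when the frame reached the display
};

// What the user actually waited for.
double EventOriginLatency(const LatencySample& s) {
  assert(s.event_time_s <= s.late_sample_s && s.late_sample_s <= s.presented_s);
  return s.presented_s - s.event_time_s;
}

// What a late-starting clock reports: everything before the sample is invisible.
double LateSampleLatency(const LatencySample& s) {
  return s.presented_s - s.late_sample_s;
}
```

Take an event at 1.0 s, a late sample at 1.03125 s, and presentation at 1.046875 s. The event-origin latency is 46.875 ms, while the late-sample clock reports 15.625 ms. The frame is the same, but the late clock reports a third of what the user actually waited.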

This distinction matters because a browser can improve the wrong metric if the clock starts too late. If the only timestamp you trust is taken after coalescing, dispatch, compositor queuing, and whatever else happened in the hallway, then the measurement is technically true and practically evasive. I3 is not glamorous because it does not speed anything up by itself. It makes later claims harder to fake.

I2 adds a QoS-targeted dispatch helper for the input path. macOS has opinions about quality of service, and Chromium has its own scheduling machinery. The patch is an adapter between those worlds: input work that is latency-sensitive should run with a priority shape that says so to the system scheduler, while background work should not win by accident because it happened to be earlier in a queue. The danger is priority inflation, where everything becomes urgent and therefore nothing is. The patch’s value depends on keeping the helper narrow.

I1 adds a browser-side IME cache around text input. IME paths are full of cross-process queries that are cheap in isolation and miserable in aggregate, especially for composition-heavy languages. The cache is a small bet that nearby text input queries often rhyme with the query that just happened. It does not turn the browser into an input method; it avoids asking the same expensive question at the worst possible time.
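Here is a minimal sketch of that bet. It assumes the expensive query is a per-range rectangle lookup and that any edit or composition update invalidates everything. ImeRectCache, TextRect, and the 32-entry cap are hypothetical choices, not the patch's.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <map>
#include <utility>

// Hypothetical: the geometry an IME asks for around the caret or composition
// (the kind of answer behind firstRectForCharacterRange:actualRange:).
struct TextRect { double x, y, width, height; };

class ImeRectCache {
 public:
  using Range = std::pair<std::size_t, std::size_t>;  // (location, length)
  using Fetch = std::function<TextRect(Range)>;       // the expensive cross-process query

  explicit ImeRectCache(Fetch fetch) : fetch_(std::move(fetch)) { assert(fetch_); }

  TextRect Get(Range range) {
    auto it = cache_.find(range);
    if (it != cache_.end()) return it->second;
    ++misses_;
    const TextRect rect = fetch_(range);
    if (cache_.size() >= kMaxEntries) cache_.clear();  // nearby-query cache: stay small
    cache_.emplace(range, rect);
    return rect;
  }

  // Edits, scrolls, and composition updates make cached geometry stale.
  void Invalidate() { cache_.clear(); }
  int misses() const { return misses_; }

 private:
  static constexpr std::size_t kMaxEntries = 32;
  Fetch fetch_;
  std::map<Range, TextRect> cache_;
  int misses_ = 0;
};
```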

I4 changes HTML <select> popups from a blocking NSMenu modal loop toward an asynchronous NSPopover path where available. This is one of those bugs users experience as “the browser felt stuck for a moment” and code experiences as “we entered a perfectly legal nested run loop.” The async popup patch is valuable because it names the real enemy: not menus, but modal ownership of the browser run loop for a UI affordance that should not hold the entire application hostage.

N10 is the more speculative input-adjacent patch: use VisionKit Live Text analysis on the surrounding text of a hovered link to improve prerender selectivity. The patch’s note says the classification can raise useful prerenders from roughly 12 percent to 28 percent by distinguishing a checkout link from a settings link from an article link before spending the prerender budget. The interesting part is not “AI in the browser”, which is too broad to be useful; it is that a native semantic classifier might let Chromium spend speculative work only where navigation intent is more plausible.

Move fewer bytes

There is a particular satisfaction in optimizations that make less work happen. Disk and compression are full of those opportunities because the default implementation is often a portable path that copies bytes faithfully, which is admirable until the filesystem can answer the same request with a reference.

N4 adds an APFS clonefile() adapter for HTTP disk-cache writes. Whole-file clones are old enough to be boring, but macOS 26’s clonefileat_with_attrs path makes partial-range cloning more interesting for streamed responses. The patch resolves the newer symbol dynamically and treats failure as “fall back to the normal write path.” That is the correct temperament for filesystem optimization: try the zero-copy route, and when the volume, offset, or OS says no, do not sulk. Just write the bytes.
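The try-clone-then-write temperament can be shown with the real whole-file clonefile() call on Apple platforms, plus std::filesystem::copy_file as a portable stand-in for the normal write path. This is a sketch under those assumptions. CloneOrCopy and CopyRoute are hypothetical names, and the partial-range clonefileat_with_attrs path described above is not modelled.

```cpp
#include <cassert>
#include <filesystem>
#include <system_error>
#if defined(__APPLE__)
#include <sys/clonefile.h>
#endif

namespace fs = std::filesystem;

enum class CopyRoute { kCloned, kCopied, kFailed };

// Whole-file clone when the platform and volume allow it; byte copy otherwise.
CopyRoute CloneOrCopy(const fs::path& src, const fs::path& dst, bool allow_clone) {
  assert(!src.empty() && !dst.empty());
#if defined(__APPLE__)
  // clonefile() refuses on non-APFS volumes, across volumes, or when dst
  // exists; every refusal is treated as "use the normal path".
  if (allow_clone && clonefile(src.c_str(), dst.c_str(), 0) == 0) return CopyRoute::kCloned;
#else
  (void)allow_clone;  // no clone route in this build
#endif
  std::error_code ec;
  fs::copy_file(src, dst, fs::copy_options::overwrite_existing, ec);
  return ec ? CopyRoute::kFailed : CopyRoute::kCopied;
}
```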

The expected effect is not a tiny percentage improvement in a hot function. It is a change in the unit of work. A pwrite-based copy scales with the data moved. A clone is effectively metadata until copy-on-write forces reality later. For a browser cache, especially one doing repeated writes of related content, that can mean less write bandwidth, less flash wear, and fewer stalls in places users do not know how to name.

N5 wraps Apple’s Compression.framework for zstd decompression. Chromium already carries excellent portable compression code, but macOS can sometimes route a standard algorithm through hardware-backed acceleration on Apple Silicon. The patch builds a FrameworkZstdDecoder with the streaming API and falls back to the bundled library when the feature is disabled or unavailable. Again the shape is the important part: keep Chromium’s caller contract stable, use the native path only when it exists, and do not make the portable route second-class.

This is also where the speedup table becomes tempting. The patch comments talk about hardware zstd running significantly faster than NEON in the target path, and the status ledger estimates large disk-write reductions for APFS cloning. Those numbers are useful as a compass, not a deed of ownership. Until the patched browser is built, signed, launched, and traced on real macOS 26 hardware, they are hypotheses with good manners.

Use thermal signals

The power-scheduling group, N1 and N2, made the whole exercise feel least like a benchmark stunt.

N1 wires NSProcessInfo.thermalState into Chromium’s renderer scheduler. The implementation maps Apple’s thermal levels into a Chromium-facing enum, stores the current level atomically, listens for NSProcessInfoThermalStateDidChangeNotification, and exposes policy helpers. Background tab painting can drop from its usual low rate to 1 Hz under serious thermal pressure and freeze entirely under critical pressure. Speculative prerender can stop once the thermal state rises above fair. Optimization Guide warmup can hold off rather than pile work onto a system that is already struggling.
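The mapping and the policy helpers fit in a small class. This sketch assumes the Apple enum's ordering: nominal, fair, serious, and critical as 0 through 3. ThermalLevel, ThermalPolicy, and the exact thresholds are hypothetical. The notification observer that would call OnThermalStateChanged is omitted.

```cpp
#include <atomic>
#include <cassert>

// Chromium-facing mirror of NSProcessInfoThermalState, whose values run
// Nominal = 0, Fair = 1, Serious = 2, Critical = 3. Names are hypothetical.
enum class ThermalLevel : int { kNominal = 0, kFair = 1, kSerious = 2, kCritical = 3 };

ThermalLevel FromNSThermalState(long ns_state) {
  switch (ns_state) {
    case 0: return ThermalLevel::kNominal;
    case 1: return ThermalLevel::kFair;
    case 2: return ThermalLevel::kSerious;
    case 3: return ThermalLevel::kCritical;
    default: return ThermalLevel::kSerious;  // unknown future value: be conservative
  }
}

class ThermalPolicy {
 public:
  // Called by the notification observer; read from any scheduler thread.
  void OnThermalStateChanged(long ns_state) {
    level_.store(FromNSThermalState(ns_state), std::memory_order_relaxed);
  }
  ThermalLevel level() const { return level_.load(std::memory_order_relaxed); }

  // Background tab paint rate in Hz; 0 means frozen. Thresholds are assumptions.
  int BackgroundPaintHz(int usual_hz) const {
    assert(usual_hz >= 1);
    switch (level()) {
      case ThermalLevel::kNominal:
      case ThermalLevel::kFair:
        return usual_hz;
      case ThermalLevel::kSerious:
        return 1;
      case ThermalLevel::kCritical:
        return 0;
    }
    return usual_hz;  // unreachable; keeps compilers quiet
  }

  // Speculative prerender stops once the state rises above fair.
  bool AllowSpeculativePrerender() const { return level() <= ThermalLevel::kFair; }
  // Assumed stricter: warmup is purely optional work, so only at nominal.
  bool AllowOptimizationGuideWarmup() const { return level() == ThermalLevel::kNominal; }

 private:
  std::atomic<ThermalLevel> level_{ThermalLevel::kNominal};
};
```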

This is not the kind of optimization that wins a launch chart. It wins the afternoon. A browser that keeps background tabs politely animated while the laptop is hot is technically correct and, as a product, behaves like a guest who has mistaken the kitchen fire for ambience.

N2 uses Low Power Mode as a related but distinct signal. Thermal state says what is happening to the machine. Low Power Mode says what the user or system wants to preserve. Treating those as the same signal would be lazy. A laptop can be cool and still conserving battery; it can also be warm because a user is intentionally doing heavy work. The scheduler should be allowed to hear both.

The status file estimates a 10 to 15 percent battery or sustained-use improvement from the power-aware group, which is exactly the kind of number that should be repeated carefully. The more precise version is that macOS 26 gives the browser better native signals, and the patches create places where Chromium could translate those signals into renderer, prerender, background paint, and warmup policy. The measured product improvement still belongs to the hardware run.

Native surfaces do real work

The P* patches are easy to misread because they sound like platform-integration candy: Liquid Glass, App Intents, Foundation Models, Live Activities. Some of that is product surface. Some of it is performance by another route: remove custom work, move state into system affordances, and avoid waking a full browser window for tasks the operating system can represent directly.

P1 wraps target views in NSGlassEffectView when the Liquid Glass feature is enabled on macOS 26. A glass effect can be an aesthetic trap if it becomes a design brief masquerading as engineering. Here it is a narrow wrapper behind a feature flag. The browser can adopt the platform material where appropriate without rewriting the rest of the UI around the fashion of the year.

P2 adds App Intents through a Swift module bridged to C++. The Swift side defines the intents, while C++ owns the actual browser command routing. That boundary matters. The operating system gets discoverable commands in Spotlight or Control Center; Chromium does not surrender its command semantics to a separate Swift island. The bridge is small, C-shaped, and testable, which is the only civilized way to invite Swift into a large C++ house.

P3 uses Foundation Models for tab summarization, smarter Find-in-Page, and semantic autofill labeling. This is where the post could become ridiculous if one lets it: on-device models, browser intelligence, the future of everything, cue the orchestra. The engineering version is plainer and better. The bridge assigns opaque request IDs, keeps callbacks in C++, submits async work to Swift, and returns results through registered callbacks. It treats the model as a native asynchronous service, not as a mystical paragraph dispenser.
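The request-ID pattern is small enough to sketch in plain C++. It assumes results arrive as UTF-8 text. ModelRequestRegistry and the extern "C" entry point are hypothetical names. A Chromium build would post the callback back to the originating sequence instead of running it on the caller's thread.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_map>
#include <utility>

// Hypothetical C++ half of the bridge: Swift only ever sees an opaque id,
// while the callback (and its semantics) stays in C++.
using ModelResultCallback = std::function<void(bool ok, const std::string& text)>;

class ModelRequestRegistry {
 public:
  // Stores the callback and returns the id handed across the C ABI.
  uint64_t Register(ModelResultCallback callback) {
    assert(callback);
    std::lock_guard<std::mutex> lock(mu_);
    const uint64_t id = next_id_++;
    pending_.emplace(id, std::move(callback));
    return id;
  }

  // Delivers a result once; unknown or repeated ids are ignored.
  bool Complete(uint64_t id, bool ok, const std::string& text) {
    ModelResultCallback callback;
    {
      std::lock_guard<std::mutex> lock(mu_);
      auto it = pending_.find(id);
      if (it == pending_.end()) return false;
      callback = std::move(it->second);
      pending_.erase(it);
    }
    callback(ok, text);  // run outside the lock so the callback may re-enter
    return true;
  }

  std::size_t pending() const {
    std::lock_guard<std::mutex> lock(mu_);
    return pending_.size();
  }

 private:
  mutable std::mutex mu_;
  uint64_t next_id_ = 1;  // 0 is never issued, so it can mean "no request"
  std::unordered_map<uint64_t, ModelResultCallback> pending_;
};

ModelRequestRegistry& ModelRequests() {
  static ModelRequestRegistry registry;
  return registry;
}

// C-shaped completion entry point the Swift side would call (name hypothetical).
extern "C" void mac26_model_request_finished(uint64_t id, int ok, const char* text) {
  ModelRequests().Complete(id, ok != 0, text != nullptr ? text : "");
}
```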

P4 adds Live Activities for downloads and active calls. Again, the point is not that the browser needs a charming menu-bar pill. The point is that long-running browser activity often survives beyond the user’s attention to a specific window. If macOS has a native status surface for that activity, Chromium can expose progress without forcing a tab or toolbar to remain the only witness.

N6 extends that native-surface thinking into CoreSpotlight history indexing. It indexes visited URLs, removes them when history changes, and drops the whole domain on clear or teardown. The status file notes macOS 26’s lower-overhead live-index pipe, but the more important product constraint is privacy: history indexing must be explicit about removal paths, incognito boundaries, and reset behavior. Native search integration without native deletion discipline is just a leak with typography.
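The deletion discipline can be stated as a model before it touches CoreSpotlight. This sketch uses an in-memory set as a stand-in for the index, and HistoryIndexModel is hypothetical. In a real build, the three paths would correspond to CSSearchableIndex's indexSearchableItems:, deleteSearchableItemsWithIdentifiers:, and deleteSearchableItemsWithDomainIdentifiers: calls.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <string>

// Hypothetical model of the indexing discipline: every way in has a
// matching way out, and incognito never gets in at all.
class HistoryIndexModel {
 public:
  void OnVisit(const std::string& url, bool incognito) {
    assert(!url.empty());
    if (incognito) return;  // incognito visits are never indexed
    indexed_.insert(url);
  }
  // A single history deletion removes exactly that item.
  void OnHistoryDeleted(const std::string& url) { indexed_.erase(url); }
  // Clear-history, profile reset, or teardown drops the whole domain.
  void OnClearAll() { indexed_.clear(); }

  bool IsIndexed(const std::string& url) const { return indexed_.count(url) != 0; }
  std::size_t size() const { return indexed_.size(); }

 private:
  std::set<std::string> indexed_;
};
```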

Verification belongs in the artifact

The most honest files in the patch bundle are the verification notes.

The patch set contains 30 new C++ or Objective-C++ files, 3 Swift modules, 17 modified files, 12 BUILD.gn integrations, and an all.patch of 4,133 lines. The Linux harness cannot link Metal.framework, AppKit.framework, MetalFX.framework, VisionKit, or any of the macOS 26 SDK surfaces. It can only run clang -fsyntax-only against stub headers that declare the Apple and Chromium names used by the new files.

That sounds weak until you compare it to the alternative, which is believing a directory of patches because the filenames are persuasive.

Syntax-only verification catches declarations, expressions, Objective-C interface conformance, availability syntax, basic type compatibility, and the embarrassing class of mistakes where code looks plausible in a Markdown block and collapses when a compiler gives it a surname. It does not catch missing symbols, ARC ownership issues that the real SDK diagnoses, entitlement behavior, runtime API changes, or whether the frame pacing actually improves on a ProMotion panel. The status file says this plainly, which is why it is useful.

There is also a Z0 build integration patch, and it matters more than a casual reader might expect. Performance experiments that live outside the build graph are sketches. Once a patch touches the relevant BUILD.gn files, adds feature flags, and threads source files into Chromium’s actual ownership structure, it has to negotiate with the architecture instead of narrating around it. The integration patch modified twelve build files, including those under chrome/browser/mac, components/viz/service, content/browser, gpu/ipc/service, ui/accelerated_widget_mac, ui/display, net, and cc/paint, plus the one for the zstd wrapper.

In other words, the verification was not proof that Chromium became faster. It was proof that the work had become specific enough to be wrong in useful ways.

FIG. 02: Patch families by performance surface. (Hand-drawn technical map of macOS 26 Chromium optimization layers: cold launch, renderer launch, render timing, input latency, cache I/O, compression, thermal scheduling, and verification.)

What the speedups mean

The status ledger aggregates the original and follow-up rounds into a tempting summary: cold open down roughly 30 percent, render P95 down roughly 29 percent, interaction P95 down roughly 28 percent, sustained battery improvements estimated around 10 to 15 percent, and disk write bandwidth reduced by 30 to 60 percent in the APFS clone path.

Those are the right numbers to investigate and the wrong numbers to tattoo on the release notes.

The more precise claim is that the patch set identifies plausible native macOS 26 routes for each of those performance surfaces, implements them behind disabled feature flags, integrates them into Chromium’s build files, and verifies syntax shape outside the target OS. The next stage is not to write a nicer table. It is to apply the patches on a real macOS 26 host, build arm64 Chromium with Xcode 26, enable selected feature groups one at a time, and run the traces that separate causality from enthusiasm.

That testing should be grouped by surface rather than by patch pride. Launch tests need cold and warm starts, dyld closure present and absent, renderer spawn microbenchmarks, and helper-process attribution checks. Rendering tests need frame pacing on 60 Hz and 120 Hz panels, GPU queue depth, memory residency behavior, and visual correctness around interpolation. Input tests need hardware timestamp propagation and actual interaction traces, not merely event-loop optimism. Disk tests need APFS and non-APFS volumes, cross-volume failure, cache eviction, and streamed partial writes. Power tests need sustained workloads, thermal transitions, Low Power Mode, background tabs, and battery draw over time.

This is where native optimization becomes less cinematic and more useful. The benchmark is not a single number. It is a map of where the browser can trust the operating system more than it used to, where it must still own the portable path, and where the native API is charming but not worth the complexity.

What this proved

The obvious lesson is that macOS 26 exposes useful native performance surfaces for a browser like Chromium. That is true, but it is not quite the lesson.

The better lesson is that native optimization is mostly translation work.

You translate dyld’s ability to precompute into a launch artifact. You translate APFS copy-on-write into cache policy. You translate CAMetalDisplayLink deadlines into compositor timing. You translate NSEvent.timestamp into a latency origin. You translate thermal and battery signals into scheduling restraint. You translate App Intents, Spotlight, Live Activities, and native menus into product surfaces that reduce custom work rather than merely decorate it. And at every translation boundary, you keep the old route available, because a browser is not allowed to be elegant only on the day the demo is recorded.

That is what made this exercise satisfying. It did not produce a single heroic patch. It produced a route map, a set of guarded crossings, and a list of places where the browser could stop pretending the operating system was just a filesystem with windows.

The next field entry belongs to the Mac build.

The compiler has inspected the specimens. The weather has not yet been walked.