Browser Caching

March 6, 2022

What is caching?

When we visit a website for the first time, for example juejin.cn, the browser downloads the site's images and data and keeps a local copy. When we visit the site again, the browser can load those resources from the local copy instead of downloading them again. This is caching.

What are the benefits of caching?

  1. Reduce server load, as we don’t need to request the same data every time.
  2. Improve performance, as loading local resources is faster than fetching from the server.
  3. Reduce bandwidth usage. Serving a resource from the cache generates little or no network traffic. Some traffic can remain because the browser may still ask the server whether its cached copy is valid (see Conditional Cache).

The browser caching process has two stages: strong cache and conditional cache.

Browser cache locations are generally divided into four types: Service Worker --> Memory Cache --> Disk Cache --> Push Cache

Strong Cache

Strong cache means that when we access a URL, no request is sent to the server and the resource is read directly from the cache. The browser still reports a 200 status code for it, marked as served from memory cache or disk cache.

How to set strong cache

The first time we enter a page, a request is sent to the server. The server responds, and the browser checks the response headers to determine whether to cache the resource. If the response headers include Expires or Cache-Control with a lifetime that has not yet passed, the resource is eligible for strong caching, and the browser stores it in memory cache or disk cache. Pragma, by contrast, can only disable caching.

Cache control fields

The main fields are: Expires, Cache-Control, and Pragma.

Expires

Used in HTTP/1.0 to control web page caching. The value is an HTTP date in GMT (for example, Sun, 06 Mar 2022 12:00:00 GMT) representing the resource's expiration time. If the cached resource is requested before this time, the browser uses it; otherwise, it re-requests the resource.
A drawback is that the browser compares this time against the local clock, which may be wrong or manually changed, so the cached copy can expire too early or too late.
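
As a rough illustration of the check described above, the sketch below parses an Expires value and compares it with a given "current" time. The function name is_fresh_by_expires and the sample dates are assumptions made for this example; real browsers apply additional rules.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def is_fresh_by_expires(expires_header: str, now: datetime) -> bool:
    """Return True if `now` is still before the time in an Expires header."""
    try:
        expires_at = parsedate_to_datetime(expires_header)
    except (TypeError, ValueError):
        # An unparseable date (for example "0") is treated as already expired.
        return False
    if expires_at.tzinfo is None:
        # Assume UTC when the header carries no usable zone information.
        expires_at = expires_at.replace(tzinfo=timezone.utc)
    return now < expires_at


header = "Sun, 06 Mar 2022 12:00:00 GMT"
before = datetime(2022, 3, 6, 11, 0, tzinfo=timezone.utc)
after = datetime(2022, 3, 6, 13, 0, tzinfo=timezone.utc)
print(is_fresh_by_expires(header, before))  # True
print(is_fresh_by_expires(header, after))   # False
```

Passing `now` as a parameter makes the weakness visible: if the local clock is set back an hour, the same cached copy is treated as fresh again.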

Cache-Control

Used in HTTP/1.1 to control web caching. If both Cache-Control and Expires exist, Cache-Control takes priority. Common values:

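  • public: The response can be cached by both the client and intermediate proxy servers (shared caches such as CDNs).
  • private: The response can be cached only by the client (the browser), not by shared caches.
  • no-cache: The client may store the resource, but must validate it with the server (conditional cache) before each use.
  • no-store: Do not store the response in any cache.
  • max-age: Cache lifetime in seconds, for example max-age=3600.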

Cache-Control with max-age uses a duration relative to the time of the response instead of an absolute time, which avoids the clock problem of Expires.
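
To make the directives above concrete, here is a minimal sketch of how a client might read a Cache-Control value and decide whether a stored copy can be used without contacting the server. The function names are assumptions for this example, and the parsing is simplified: it splits on commas and does not handle quoted arguments that themselves contain commas.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control header into a {directive: argument} mapping."""
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        # Directives without "=" (such as no-cache) map to None.
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


def can_use_without_revalidation(cache_control: str, age_seconds: int) -> bool:
    """Simplified strong-cache check based only on Cache-Control."""
    directives = parse_cache_control(cache_control)
    if "no-store" in directives or "no-cache" in directives:
        return False
    max_age = directives.get("max-age")
    if max_age is None or not max_age.isdigit():
        return False  # no usable lifetime in this simplified model
    return age_seconds < int(max_age)


print(parse_cache_control("public, max-age=3600"))
# {'public': None, 'max-age': '3600'}
print(can_use_without_revalidation("max-age=3600", 120))   # True
print(can_use_without_revalidation("max-age=3600", 4000))  # False
print(can_use_without_revalidation("no-cache", 0))         # False
```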

Pragma

HTTP/1.0 field used to disable caching. Its only common value is no-cache, which has the same effect as Cache-Control: no-cache.

Cache location

Strong cache resources are stored in memory cache and disk cache. Which resources go where?

Images and web pages are mainly cached in disk cache, while OS-level files are mostly cached in memory cache. Browsers automatically allocate based on resource usage.

Memory cache requests often show 0ms, which is very fast. Browsers check caches in the order: Service Worker --> Memory Cache --> Disk Cache --> Push Cache.

Service Worker

A separate thread running in the browser, usually used to implement caching. Requires HTTPS because it intercepts requests. Unlike other browser caches, Service Worker allows full control over which files to cache, how to match and retrieve them, and provides persistent caching.

Memory Cache

In-memory cache for resources already fetched on the current page (styles, scripts, images, etc.). Fast to read but short-lived; released when the tab or process closes.

Disk Cache

Stored on the hard drive, slower to read but can store any resource. Advantages over memory cache: capacity and persistence. Disk cache covers most resources, deciding based on HTTP headers which resources to cache, which can be used directly, and which need revalidation. Most cached resources come from disk cache.

Memory cache is much faster than disk cache. Example: fetching from a remote server may take 500ms, disk access 10–20ms, memory access 100ns, L1 CPU cache 0.5ns.

Prefetch Cache

Resources marked with rel="prefetch" on a <link> element are loaded by the browser during idle time for future use.

Push Cache

HTTP/2 feature. Only used if the previous three caches miss. Exists during the session, released after the session ends. Cache time is short (around 5 minutes in Chrome). Does not strictly follow HTTP cache headers.

Conditional Cache

Conditional cache happens when strong cache expires. The browser sends a request with cache identifiers; the server decides whether to use the cached resource.

Two cases:

  1. Conditional cache valid: the server returns 304 without a body, and the browser uses its cached copy.
  2. Conditional cache invalid: the server returns 200 together with the new resource.

How to set conditional cache

Last-Modified / If-Modified-Since

  • Last-Modified: Server returns the last modified time of the resource.
  • If-Modified-Since: Client sends this in a subsequent request. Server compares it with the current resource's last modified time:
    • If the resource is updated, return 200 and resource.
    • Otherwise, return 304 to use cached resource.
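
The server-side comparison described above could be sketched as follows. The function name status_for_if_modified_since is an assumption for this example, and last_modified is assumed to be a timezone-aware datetime; a real server would read it from the file system or a database.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime
from typing import Optional


def status_for_if_modified_since(last_modified: datetime,
                                 if_modified_since: Optional[str]) -> int:
    """Return 304 if the resource is unchanged since the client's copy, else 200."""
    if if_modified_since is None:
        return 200  # first visit: no validator was sent
    try:
        since = parsedate_to_datetime(if_modified_since)
    except (TypeError, ValueError):
        return 200  # ignore a malformed header and send the full resource
    if since.tzinfo is None:
        since = since.replace(tzinfo=timezone.utc)
    # HTTP dates have one-second resolution, so drop microseconds first.
    return 200 if last_modified.replace(microsecond=0) > since else 304


modified = datetime(2022, 3, 6, 12, 0, tzinfo=timezone.utc)
header = format_datetime(modified, usegmt=True)  # value sent as Last-Modified
print(header)                                          # Sun, 06 Mar 2022 12:00:00 GMT
print(status_for_if_modified_since(modified, header))  # 304

updated = datetime(2022, 3, 7, 9, 30, tzinfo=timezone.utc)
print(status_for_if_modified_since(updated, header))   # 200
```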

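ETag / If-None-Match

  • ETag: Unique identifier for the current version of the resource, generated by the server (often a hash of the content).
  • If-None-Match: Client sends the ETag from the last response. Server compares it with the resource's current ETag:
    • Match: return 304, use cache.
    • No match: return 200 with resource.

ETag / If-None-Match has higher priority than Last-Modified / If-Modified-Since: if a request carries both, the server evaluates If-None-Match and ignores If-Modified-Since.

ETag is also more precise. Last-Modified has one-second resolution and changes whenever the file is rewritten, so a file that is regenerated with identical content gets a new modification time, while an ETag derived from the content stays the same.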
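
A minimal sketch of this comparison, including the priority over the date-based check, is shown below. Deriving the validator from a SHA-256 hash of the body is only one possible choice, and the names make_etag, etag_matches, and revalidation_status are assumptions for this example. The header is split on commas, which is a simplification.

```python
import hashlib
from typing import Optional


def make_etag(body: bytes) -> str:
    """Build a quoted ETag from a content hash (one common approach)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def etag_matches(if_none_match: str, current_etag: str) -> bool:
    """Weak comparison: ignore a W/ prefix and accept the * wildcard."""
    def strip_weak(tag: str) -> str:
        return tag[2:] if tag.startswith("W/") else tag

    candidates = [tag.strip() for tag in if_none_match.split(",")]
    if "*" in candidates:
        return True
    return strip_weak(current_etag) in {strip_weak(tag) for tag in candidates}


def revalidation_status(body: bytes, if_none_match: Optional[str],
                        unchanged_by_date: bool) -> int:
    """Return 304 or 200; If-None-Match, when present, overrides the date check."""
    if if_none_match is not None:
        return 304 if etag_matches(if_none_match, make_etag(body)) else 200
    return 304 if unchanged_by_date else 200


body = b"hello"
tag = make_etag(body)
# File was regenerated (date check says "changed") but the content is identical.
print(revalidation_status(body, tag, unchanged_by_date=False))       # 304
# Content really changed, so the old ETag no longer matches.
print(revalidation_status(b"hello!", tag, unchanged_by_date=True))   # 200
# No ETag sent: fall back to the date-based result.
print(revalidation_status(body, None, unchanged_by_date=True))       # 304
```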

Strong vs Conditional Cache

  1. Strong cache does not send requests to the server, so the browser may not know when resources update. Conditional cache always checks with the server.
  2. Most web servers enable conditional cache by default.

Caching Strategy

Common strategy:

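  • HTML: Conditional cache, so the browser always checks for the latest page.
  • CSS, JS, images: Strong cache with a content hash in the file name. When a file changes, its name changes too, so the updated HTML points to a new URL and the old cache entry is no longer used.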
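
One way this strategy could look in a build or server script is sketched below. The helper names, the sample file names, the eight-character hash, and the one-year lifetime are all assumptions chosen for illustration, not values from a specific tool.

```python
import hashlib
from pathlib import PurePosixPath


def hashed_name(filename: str, content: bytes) -> str:
    """Insert a short content hash before the extension: app.js -> app.<hash>.js."""
    path = PurePosixPath(filename)
    digest = hashlib.sha256(content).hexdigest()[:8]
    return f"{path.stem}.{digest}{path.suffix}"


def cache_control_for(path: str) -> str:
    """Pick a Cache-Control value following the strategy above."""
    if PurePosixPath(path).suffix.lower() in {".html", ".htm"}:
        # HTML is always revalidated so it can reference new hashed assets.
        return "no-cache"
    # Hashed assets never change under the same URL, so cache them for a year.
    return "public, max-age=31536000"


old = hashed_name("app.js", b"console.log(1)")
new = hashed_name("app.js", b"console.log(2)")
print(old != new)                       # True: new content, new URL
print(cache_control_for("index.html"))  # no-cache
print(cache_control_for(old))           # public, max-age=31536000
```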

Effect of Refresh

  1. Ctrl+F5 force reload: bypasses both caches, loads from server.
  2. F5 reload: bypasses strong cache, checks conditional cache.
  3. Enter URL in browser: uses strong cache if valid, conditional cache if needed.

Summary

  • Check Cache-Control for strong cache:
    • If valid, use it.
    • Else, use conditional cache:
      • If updated, return 200 and resource.
      • Else, return 304, use cached resource.