Make the web faster with HTTP Caching
Using the Cache-Control header, we can set a caching policy for the browser or a shared cache (such as a CDN) for each REST resource.
For example, we may want to cache an image for 6 months without revalidation, but reuse a cached JS script only for as long as cache validation succeeds. Here, validating the cache means checking whether the resource (image, script, etc.) has been modified on the server, that is, whether the cache still holds the latest version of the resource.
Cache-Control can appear in both HTTP requests and responses. A directive the server adds to the response sets the cache policy for the stored resource, so it governs subsequent requests for that resource. A directive the client adds to a request applies only to that request.
An example of the cache-control header:
cache-control: public, max-age=15552000, no-cache
On the right-hand side, we pass a comma-separated list of directives to cache-control to define the policy. We can break down the above cache-control header into:
public: cache the resource in the browser as well as in any CDN or intermediary cache
max-age=15552000: cache for 15552000 seconds, which is 180 days or about 6 months
no-cache: the cache must revalidate the stored resource with the origin server before each reuse, even within max-age
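To make the breakdown concrete, here is a minimal sketch that splits the header value above into its directives and converts max-age to days. The function name parse_cache_control is ours, not part of any library, and the parser assumes no directive argument contains a comma.

```python
from typing import Dict, Optional


def parse_cache_control(value: str) -> Dict[str, Optional[str]]:
    """Split a Cache-Control value into {directive: argument or None}.

    Simplification: splits on every comma, so quoted arguments that
    themselves contain commas are not handled.
    """
    directives: Dict[str, Optional[str]] = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, sep, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') if sep else None
    return directives


policy = parse_cache_control("public, max-age=15552000, no-cache")
max_age_days = int(policy["max-age"]) / 86400  # 86400 seconds in a day

print(policy)
print(max_age_days)
```

Dividing 15552000 by 86400 gives 180 days, which is where the "6 months" figure comes from.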
List of cache-control directives in the request header
- max-age=<N>: N seconds before resource becomes stale
- no-cache: Cache must be validated each time
- no-store: No caching of the resource at all.
- max-stale=<N>: the client accepts a stored response that has been stale for at most N seconds.
- min-fresh=<N>: the client accepts a stored response only if it will stay fresh for at least N more seconds.
- no-transform: Intermediate layers between the client and the origin server must not modify the response. For example, a proxy such as Google Web Light converts images to lower quality to deliver a faster response on slow networks; no-transform forbids this.
- only-if-cached: Client wants only a cached response or none.
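The interplay of max-age, max-stale and min-fresh is easier to see as arithmetic on the age of a stored response. The sketch below is a simplified model, not a full cache implementation: client_accepts and its parameters are hypothetical names, age is how many seconds old the stored response is, and lifetime is the freshness lifetime the server assigned to it.

```python
from typing import Optional


def client_accepts(age: int, lifetime: int,
                   max_age: Optional[int] = None,
                   max_stale: Optional[int] = None,
                   min_fresh: Optional[int] = None) -> bool:
    """Return True if a stored response satisfies the request directives."""
    # max-age in a request: reject anything older than max_age seconds.
    if max_age is not None and age > max_age:
        return False
    remaining = lifetime - age  # seconds of freshness left; negative once stale
    # min-fresh: the response must stay fresh for at least min_fresh more seconds.
    if min_fresh is not None:
        return remaining >= min_fresh
    if remaining >= 0:
        return True  # still fresh
    # max-stale: tolerate a response that is stale by at most max_stale seconds.
    return max_stale is not None and -remaining <= max_stale


print(client_accepts(age=100, lifetime=60, max_stale=50))  # stale by 40 s, 50 s tolerated
print(client_accepts(age=50, lifetime=60, min_fresh=30))   # only 10 s of freshness left
```

The first call accepts the stored response and the second rejects it; in the rejected case the cache would have to go to the origin server.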
List of cache-control directives in the response header
- max-age
- no-cache
- must-revalidate: once the response becomes stale, the cache must revalidate it with the origin server before reusing it
- no-store
- private: only cache in user browser
- public: cache in both user browser and intermediary caches.
- no-transform
- immutable: the resource will not change while the response is fresh, so the cache does not need to revalidate it during that period.
See the full list here: https://developer.mozilla.org/en-US/docs/Web/HTTP/Headers/Cache-Control#response_directives
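As a rough illustration of how private, public and no-store differ, the sketch below decides whether a given cache may store a response at all. The function may_store is a hypothetical helper; real caches also look at the status code, the request method and other headers, which this sketch ignores.

```python
def may_store(cache_control: str, shared_cache: bool) -> bool:
    """Decide whether a cache may store a response with this Cache-Control value.

    shared_cache is True for a CDN or proxy, False for the user's browser.
    """
    # Keep only the directive names, dropping arguments such as "=600".
    names = {d.strip().split("=")[0].lower()
             for d in cache_control.split(",") if d.strip()}
    if "no-store" in names:
        return False  # nobody may store it
    if "private" in names and shared_cache:
        return False  # only the user's browser may store it
    return True


print(may_store("private, max-age=600", shared_cache=True))
print(may_store("private, max-age=600", shared_cache=False))
print(may_store("public, max-age=15552000, immutable", shared_cache=True))
```

With the private directive, the same response is rejected by the CDN but stored by the browser.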
Why both request and response header?
Request headers matter because they let the client override, for the current request, the caching policy set by the origin server.
Example: When we hard refresh a page, the browser sends a "cache-control: no-cache" header in the request. Whatever cache policy the server set, the cache is then validated before the response is returned, so the user gets the latest version.
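A client other than the browser can do the same thing by setting the header itself. Here is a minimal sketch using Python's standard urllib; the URL is a placeholder, and the request is only built, not sent.

```python
from urllib.request import Request

# Hypothetical URL; constructing a Request object does not touch the network.
req = Request(
    "https://example.com/js/main.js",
    headers={"Cache-Control": "no-cache"},  # ask caches to revalidate first
)

print(req.header_items())
```

Note that urllib stores the header name as "Cache-control"; HTTP header names are case-insensitive, so this makes no difference on the wire.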
Cache validation
If the no-cache directive is set in cache-control, the cache must first validate the stored resource against the origin server. This validation is done using the ETag and If-None-Match headers. The server sets ETag in the response header of a GET request; when the client requests the same resource again, it sends that ETag value in the If-None-Match request header. If the ETag still matches, the resource is unchanged: the server replies with 304 Not Modified and no body, and the cache serves the stored response. Otherwise, the server sends the updated resource.
// response header
ETag: "werty3456789df45er76"
// request header
If-None-Match: "werty3456789df45er76"
Similarly Last-Modified and If-Modified-Since can also be used for cache validation.
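The ETag round trip can be sketched from the server's side as below. This is an illustration under assumptions, not how any particular server does it: make_etag and conditional_get are hypothetical names, the ETag is derived from a hash of the body (one common choice), and If-None-Match is assumed to carry a single tag rather than a list or "*".

```python
import hashlib
from typing import Dict, Tuple


def make_etag(body: bytes) -> str:
    """Derive a quoted ETag from the body content (assumed scheme)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def conditional_get(request_headers: Dict[str, str],
                    body: bytes) -> Tuple[int, Dict[str, str], bytes]:
    """Return (status, response headers, payload) for a GET on one resource."""
    etag = make_etag(body)
    if request_headers.get("If-None-Match") == etag:
        # The client's copy is current: send 304 with no body.
        return 304, {"ETag": etag}, b""
    # First request, or the resource changed: send the full body.
    return 200, {"ETag": etag}, body


v1 = b"console.log('v1');"
v2 = b"console.log('v2');"

status, headers, payload = conditional_get({}, v1)
status2, _, payload2 = conditional_get({"If-None-Match": headers["ETag"]}, v1)
status3, _, payload3 = conditional_get({"If-None-Match": headers["ETag"]}, v2)

print(status, status2, status3)
```

The first request gets a full 200 response, the repeat request with a matching tag gets 304 with an empty body, and once the content changes the old tag no longer matches and the server sends 200 with the new body.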
Cache Busting
Generally, we cache static resources like JS, CSS, images, etc. for a long period such as 6 months. To make clients fetch a new version of a cached resource, we put a new version number in the file name or in the query string of the URL. This is called Cache Busting.
With Cache Busting, the HTML response itself is not cached: we send it with cache-control: no-cache, no-store. For assets like JS and CSS we specify cache-control: max-age=15552000, public, immutable. In the HTML, the script tag will look something like this:
<script src="/js/main.v1.0.0.js"></script>
or
<script src="/js/main.js?v=1.0.0"></script>
This way the HTML is never cached, so it always points at the current asset URLs, and whenever a newer version of the JS, CSS, etc. is available its new URL misses the cache and is fetched from the origin server.
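A build step or server could apply this scheme along the lines of the sketch below. The helper names (versioned_name, versioned_query, cache_control_for) are hypothetical, and the ".html" check is a simplifying assumption about how HTML pages are identified.

```python
import posixpath


def versioned_name(path: str, version: str) -> str:
    """Put the version in the file name: /js/main.js -> /js/main.v1.0.0.js"""
    root, ext = posixpath.splitext(path)
    return f"{root}.v{version}{ext}"


def versioned_query(path: str, version: str) -> str:
    """Put the version in the query string: /js/main.js -> /js/main.js?v=1.0.0"""
    return f"{path}?v={version}"


def cache_control_for(path: str) -> str:
    """Pick the Cache-Control value used in this article for a given path."""
    if path.endswith(".html"):
        return "no-cache, no-store"  # HTML is never cached
    return "max-age=15552000, public, immutable"  # long-lived static assets


print(versioned_name("/js/main.js", "1.0.0"))
print(versioned_query("/js/main.js", "1.0.0"))
print(cache_control_for("/index.html"))
print(cache_control_for("/js/main.v1.0.0.js"))
```

Bumping the version string changes the asset URL, so the old cached entry is simply never requested again.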
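Deprecated headers
pragma and expires were used for cache control in HTTP/1.0, before cache-control was introduced in HTTP/1.1.
expires: <HTTP date>. The date after which the response is considered stale; it should be no more than a year in the future.
pragma: no-cache. This is the only defined value for this header; it instructs the cache to revalidate the stored response with the origin server before reusing it.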
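To relate expires to max-age, we can compute how many seconds lie between the response's Date header and its Expires header. The sketch below uses Python's standard email.utils to format and parse HTTP dates; expires_to_max_age is a hypothetical helper and the fixed date is only there to make the example reproducible.

```python
from datetime import datetime, timedelta, timezone
from email.utils import format_datetime, parsedate_to_datetime


def expires_to_max_age(date_header: str, expires_header: str) -> int:
    """Seconds between the Date and Expires headers, never below 0."""
    delta = parsedate_to_datetime(expires_header) - parsedate_to_datetime(date_header)
    return max(0, int(delta.total_seconds()))


# Fixed example timestamp (an assumption for illustration).
now = datetime(2026, 1, 1, tzinfo=timezone.utc)
date_header = format_datetime(now, usegmt=True)
expires_header = format_datetime(now + timedelta(days=180), usegmt=True)

print("Date:", date_header)
print("Expires:", expires_header)
print(expires_to_max_age(date_header, expires_header))
```

An Expires value 180 days after Date corresponds to max-age=15552000, the same lifetime used throughout this article.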