Deep inside webperf - Cache me if you can!

Thanks to a previous article on Horizons about web performance, you know how crucial your website’s performance is for your customers, and therefore for your business. You may now wonder how to improve your score. Let’s talk about cache management.

Even though this topic is quite technical, it can have a significant impact. For example, the cached version of the Carrefour.fr home page is displayed at least 3 times faster than the uncached version: its TTFB (Time To First Byte) ranges between 0.2 and 0.3 seconds with the cache and can easily reach one second or more without it!

In this article, we will focus on web caches and why they matter, and give you some key tips!

Web cache: understanding its use

Cache management can happen at three different levels, especially for an e-commerce website:

Infrastructure: This is the IT of your site that you manage – servers, storage, network. This can be hosted in a traditional datacenter or in the cloud.
The client or web browser: the one you all know (Chrome, Firefox, Edge…).
Internet: Everything between your infrastructure and the browser.

What is a caching system? Let’s assume the site you are visiting has none. Then, every time you request a page, the site’s servers build it entirely from scratch. If two users request the same page, the servers do the work twice: once for each user. So why not build the page once and send it to both users? It would consume half the resources and be much faster.

Cached objects (pages, JS, CSS, images) are usually subject to a TTL (Time To Live) that you can set. It represents the lifetime of an object in the cache. Instead of rebuilding or resending the object, this setting lets you say: “Don’t rebuild or resend this object until its lifetime expires.”
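The two ideas above, building a page once for many visitors and keeping it only for its TTL, can be sketched as a small in-memory cache. This is a minimal illustration, not Carrefour.fr’s implementation; the `TTLCache` class, the `build_product_page` function and the injectable clock are hypothetical names introduced for the example.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Minimal in-memory cache where each entry expires after `ttl` seconds."""

    def __init__(self, ttl: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock  # injectable so expiry can be shown without sleeping
        self._store: Dict[str, Tuple[float, str]] = {}

    def get_or_build(self, key: str, build: Callable[[], str]) -> str:
        now = self.clock()
        entry = self._store.get(key)
        if entry is not None and entry[0] > now:
            return entry[1]  # hit: lifetime not expired, nothing is rebuilt
        value = build()  # miss or expired entry: build once, then share it
        self._store[key] = (now + self.ttl, value)
        return value


builds = 0


def build_product_page() -> str:
    """Hypothetical expensive page build; counts how often it actually runs."""
    global builds
    builds += 1
    return "<html>product page</html>"


fake_now = [0.0]
cache = TTLCache(ttl=15 * 60, clock=lambda: fake_now[0])

cache.get_or_build("/product/42", build_product_page)  # user A: page is built
cache.get_or_build("/product/42", build_product_page)  # user B: served from cache
print(builds)  # 1
fake_now[0] += 15 * 60 + 1  # the 15-minute lifetime has expired
cache.get_or_build("/product/42", build_product_page)  # rebuilt once
print(builds)  # 2
```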

For example, due to business constraints on Carrefour.fr, we often set the lifetime of our product pages to 15 minutes, which lets us refresh information very quickly for each store that needs an update. The servers rebuild the product page periodically, and this cached page is sent as is to everyone who requests it, without consuming your infrastructure resources.

But before getting to the heart of the matter, be aware that 99.9% of your site is static!

Website content: distinguish static from dynamic

99.9% of your site, not 99.9% of the pages of your site, is static… Indeed, static content is content that does not change when you refresh the page. Some information is very likely to change when you refresh a product page: prices, stocks, consumer reviews, promotional highlights, etc. However, the vast majority of the information does not change over time: title, descriptions, images, technical characteristics.

Before starting to “cache” your site, it is important to separate the static content from the dynamic content. This is done by displaying the dynamic content through asynchronous calls using Ajax (“Asynchronous JavaScript and XML”). This is not the topic of this article, but prefer JSON to XML…

Ajax lets you load a “page” (the static content) separately from its dynamic elements (prices, stocks…). In other words, the browser displays a page which itself requests its dynamic elements. All this is so “fast” that you have the impression the page loaded in one go!
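A minimal sketch of this split, assuming a Python backend whose handlers return `(body, headers)` pairs: the HTML page carries only static content plus an empty placeholder, and a separate JSON endpoint serves price and stock. The URL `/api/products/<id>/offer`, the sample data and the header values are illustrative assumptions.

```python
import json
from html import escape
from typing import Dict, Tuple

Response = Tuple[str, Dict[str, str]]  # (body, headers)

# Hypothetical data: static catalog content and live offer data.
CATALOG = {"42": {"title": "Espresso machine", "description": "15-bar pump"}}
LIVE_OFFERS = {"42": {"price": 129.99, "stock": 7}}


def product_page(product_id: str) -> Response:
    """Static part: identical for every visitor, so any cache may keep it."""
    product = CATALOG[product_id]
    body = (
        f"<h1>{escape(product['title'])}</h1>"
        f"<p>{escape(product['description'])}</p>"
        # Empty placeholder that the page's JavaScript fills asynchronously.
        f'<div id="offer" data-src="/api/products/{escape(product_id)}/offer"></div>'
    )
    return body, {"Content-Type": "text/html", "Cache-Control": "public, max-age=900"}


def product_offer(product_id: str) -> Response:
    """Dynamic part: price and stock as JSON, marked as not cacheable."""
    body = json.dumps(LIVE_OFFERS[product_id])
    return body, {"Content-Type": "application/json", "Cache-Control": "no-store"}
```

In the browser, a few lines of JavaScript would request the `data-src` URL and render the JSON into the placeholder; the HTML itself can then be cached for everyone.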

For example, some websites embed directly in the page the 3 or 4 consumer reviews that bring SEO (Search Engine Optimization) value, and load the rest in Ajax. This puts a little more static content in the page, while the remaining reviews are still displayed dynamically.

On Carrefour.fr, we use “lazy loading” to load the content of image sliders dynamically. This design pattern defers the loading of images until they are needed, which makes the page display faster.
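One simple way to get this behavior is the browser’s native `loading="lazy"` attribute on `<img>` tags, sketched below as a server-side helper. The article does not say how Carrefour.fr implements its lazy loading (a JavaScript library is equally possible), so `render_slider` and the choice to load only the first slide eagerly are assumptions.

```python
from html import escape
from typing import List


def render_slider(image_urls: List[str], eager_count: int = 1) -> str:
    """Render a slider where only the first `eager_count` images load immediately.

    The other images carry the standard HTML attribute loading="lazy",
    so the browser defers fetching them until they are about to be displayed.
    """
    tags = []
    for i, url in enumerate(image_urls):
        mode = "eager" if i < eager_count else "lazy"
        tags.append(f'<img src="{escape(url)}" loading="{mode}" alt="">')
    return '<div class="slider">' + "".join(tags) + "</div>"
```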

Becoming the cache master in no time

We cache almost everything, even (a lot of) dynamic content. This may be surprising, but we cache in particular prices and stock.

On Carrefour.fr, we use a backend cache system called “Redis”, which works with our PHP Symfony frontend. Thanks to it, our microservices can cache this kind of content.

If an already well “cached” website gets so much traffic that its infrastructure struggles, do not hesitate to cache prices and stock, while keeping the shopping cart page “uncached”: that is where the customer will eventually see that the price has changed or that the product is no longer in stock.

The type of product matters for this decision: if you run a “private sale” site with little stock depth, or an auction site, we would not recommend it. Some argue that it can be extremely disappointing for a customer to discover a product is out of stock only on the shopping cart page. We think it is a reasonable tradeoff between a site that no longer holds the load at all and a few disappointments about product availability during traffic peaks.

*To cache or not to cache: it’s all about balance!*
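Caching price and stock this way usually follows the cache-aside pattern: read from the cache, and on a miss load from the service and store the result with a short expiry. The sketch below assumes a client exposing redis-py’s `get` and `setex` methods; to stay runnable without a server it uses a small in-memory stand-in, and the 60-second TTL and the `offer:` key naming are hypothetical. The cart path deliberately skips the cache.

```python
import json
import time
from typing import Callable, Dict, Optional, Tuple


class InMemoryRedis:
    """Stand-in exposing the same get/setex calls as a redis-py client,
    so the sketch runs without a Redis server."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self._data: Dict[str, Tuple[float, bytes]] = {}

    def get(self, key: str) -> Optional[bytes]:
        item = self._data.get(key)
        if item is None or item[0] <= self.clock():
            return None  # missing or expired
        return item[1]

    def setex(self, key: str, seconds: int, value: str) -> None:
        self._data[key] = (self.clock() + seconds, value.encode())


OFFER_TTL = 60  # hypothetical: price/stock on product pages may be 1 minute stale


def get_offer(cache, product_id: str, load_from_service: Callable[[str], dict]) -> dict:
    """Cache-aside read for product pages: slightly stale price/stock is accepted."""
    key = f"offer:{product_id}"
    cached = cache.get(key)
    if cached is not None:
        return json.loads(cached)
    offer = load_from_service(product_id)  # only reached on a miss
    cache.setex(key, OFFER_TTL, json.dumps(offer))
    return offer


def get_offer_for_cart(product_id: str, load_from_service: Callable[[str], dict]) -> dict:
    """The cart bypasses the cache so the customer sees the real price and stock."""
    return load_from_service(product_id)
```

With a real server, a `redis.Redis()` client from redis-py could be passed as `cache` in place of `InMemoryRedis`.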

Some sites load the entire cart on every page, so its content appears as soon as you move your mouse over it. This is fast but very costly for your infrastructure. A workaround is to load only the number of items in the cart, and to trigger the loading of the full cart only when the mouse passes over it: the Ajax call is made on “mouse over”!
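On the server side, this workaround boils down to exposing two payloads of very different weight. The function names and the cart model below are hypothetical; the point is that the badge shown on every page needs only a count, while the full details are computed only when the hover triggers the Ajax call.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict, List


@dataclass
class CartLine:
    sku: str
    name: str
    quantity: int
    unit_price: float


@dataclass
class Cart:
    lines: List[CartLine] = field(default_factory=list)


def cart_badge(cart: Cart) -> Dict[str, int]:
    """Cheap payload requested on every page: only the number of items."""
    return {"count": sum(line.quantity for line in cart.lines)}


def cart_details(cart: Cart) -> Dict[str, object]:
    """Full payload, requested only when the visitor hovers over the cart icon."""
    return {
        "lines": [asdict(line) for line in cart.lines],
        "total": round(sum(l.quantity * l.unit_price for l in cart.lines), 2),
    }
```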

For all the dynamic pages (shopping cart, checkout funnel, registration), we use a cached “empty skeleton” filled through many Ajax calls: when the page is requested, its content changes dynamically throughout the sequence.

A successful e-commerce site generally behaves differently for logged-in and anonymous visitors, and that is what we do on all our websites. In logged-in mode, for example, we display the customer’s favorite store, promotions or personalized product suggestions, or an unused promo code to encourage a purchase.
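Since logged-in pages are personalized, a shared cache has to tell the two kinds of visitors apart. A common approach, assumed here rather than taken from the article, is to treat the presence of a session cookie as “logged in” and bypass the shared cache for those requests; the cookie name `session_id` is hypothetical.

```python
from http.cookies import SimpleCookie
from typing import Dict

SESSION_COOKIE = "session_id"  # hypothetical name of the login cookie


def is_cacheable(method: str, headers: Dict[str, str]) -> bool:
    """Allow shared-cache copies only for anonymous GET/HEAD requests.

    Logged-in visitors get personalized content (favorite store, promo codes),
    so their requests bypass the shared cache.
    """
    if method not in ("GET", "HEAD"):
        return False
    cookie = SimpleCookie()
    cookie.load(headers.get("Cookie", ""))
    return SESSION_COOKIE not in cookie
```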

10,000 km to get a page is a bit long…

Between your browser and the website infrastructure, there can be a great distance! If the infrastructure were down your street, it would be much faster. This observation is partly why CDNs (Content Delivery Networks) were born. CDNs sit “in the Internet” between your browser and a website. They have a double mission: caching static objects and storing them as close to your home as possible.

To summarize, instead of fetching a page directly from the site’s infrastructure (which may be geographically very far away) and asking the server to build it from scratch, you get a cached version of the page from a server as close as possible to your Internet connection!

The performance benefit is twofold: no time spent building the page and less time for the data to travel! This is why we use CDNs for all our websites.

This service can be quite expensive. However, the performance gains are formidable and considerably improve your conversion rate!

A couple of tips to better use a CDN

Some CDNs offer a “Cache Hierarchy” option: instead of placing a single cache server between the browser and your site, they add a second, often “regional”, tier. Before calling your site, all the cache servers in a region call a regional cache server, which itself calls your site (cascading calls). This further reduces the load (and therefore the costs) of your main infrastructure.
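The cascade can be modeled with cache nodes that ask their parent on a miss. This toy sketch (no TTL, no eviction, hypothetical node names) shows why the option helps: three edge servers missing the same page produce a single request to the origin instead of three.

```python
from typing import Callable, Dict, Optional


class CacheNode:
    """A cache server that asks its parent tier (or the origin) on a miss."""

    def __init__(self, name: str, parent: Optional["CacheNode"] = None,
                 origin: Optional[Callable[[str], str]] = None):
        self.name = name
        self.parent = parent
        self.origin = origin
        self.store: Dict[str, str] = {}

    def fetch(self, path: str) -> str:
        if path in self.store:
            return self.store[path]
        if self.parent is not None:
            body = self.parent.fetch(path)  # cascade: edge -> regional
        else:
            body = self.origin(path)  # only the top tier reaches your site
        self.store[path] = body
        return body


origin_calls = []


def origin(path: str) -> str:
    origin_calls.append(path)
    return f"page {path}"


regional = CacheNode("regional-eu", origin=origin)
edges = [CacheNode(f"edge-{city}", parent=regional) for city in ("paris", "lyon", "lille")]
for edge in edges:
    edge.fetch("/home")
print(len(origin_calls))  # 1: three edge misses, one origin request
```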

Since we need to be able to update information quickly and with few actions, we often set a 15-minute TTL on all static pages. Unlike with the browser cache, you keep control over the freshness of the pages thanks to the “flush”. This mechanism lets you tell the CDN servers: “Delete this page from your cache and come fetch the new one.” In a nutshell, you are saved…

Flushing is, however, a sensitive tool that should not be put in everyone’s hands. Experience has shown that some non-IT users prefer the “flush all” button to “flush /my-super-product.html”, with a site collapse as a consequence. You will therefore need to develop an add-on for your PIM (Product Information Management) or DAM (Digital Asset Management) that targets a well-defined page or category of pages, and hand it over to the business teams.

*Flush mechanism: the way to refresh your pages*
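The safeguard described above can be sketched as a purge function that accepts a single page or a category prefix and refuses a global flush. The sketch keeps the 15-minute TTL mentioned earlier; the `/*` category syntax and the `PurgeableCache` class are assumptions, and a real add-on would forward the validated target to the CDN vendor’s own purge API, which differs from one provider to another and is not shown.

```python
import time
from typing import Callable, Dict, Optional, Tuple


class PurgeableCache:
    """Edge-cache stand-in with a 15-minute TTL that only allows targeted purges."""

    TTL = 15 * 60

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self.clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def put(self, path: str, body: str) -> None:
        self._store[path] = (self.clock() + self.TTL, body)

    def get(self, path: str) -> Optional[str]:
        entry = self._store.get(path)
        if entry is None or entry[0] <= self.clock():
            return None  # missing or expired
        return entry[1]

    def purge(self, target: str) -> int:
        """Remove one page ("/p/espresso.html") or one category ("/c/coffee/*").

        A global flush ("*" or "/*") is refused, so a business user cannot
        empty the whole cache and send all traffic to the origin at once.
        """
        if target in ("", "*", "/*"):
            raise PermissionError("flush-all is not allowed from the business tool")
        if target.endswith("/*"):
            prefix = target[:-1]  # keep the trailing slash, e.g. "/c/coffee/"
            victims = [p for p in self._store if p.startswith(prefix)]
        else:
            victims = [target] if target in self._store else []
        for path in victims:
            del self._store[path]
        return len(victims)
```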

We will soon use a dedicated CDN to cache images on Carrefour.fr. This CDN will host each image in several sizes. On mobile phones, where screens are smaller, there is no need for large images… so we adapt images to the screen size to reduce the data downloaded by browsers.
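Choosing the right size can be done by the server or left to the browser. The sketch below shows both: picking the smallest stored width that covers the screen, and building a standard HTML `srcset` value. The list of widths and the `?w=` query parameter used to address a rendition are hypothetical; each image CDN has its own URL scheme.

```python
from typing import List

# Hypothetical widths (in pixels) at which the image CDN stores each picture.
RENDITION_WIDTHS = [320, 640, 960, 1280, 1920]


def pick_rendition(css_width: int, device_pixel_ratio: float = 1.0,
                   widths: List[int] = RENDITION_WIDTHS) -> int:
    """Smallest stored width that still covers the displayed size."""
    needed = css_width * device_pixel_ratio
    for w in sorted(widths):
        if w >= needed:
            return w
    return max(widths)  # screen larger than every rendition: use the biggest


def srcset(base_url: str, widths: List[int] = RENDITION_WIDTHS) -> str:
    """Value for the standard <img srcset> attribute, letting the browser choose."""
    return ", ".join(f"{base_url}?w={w} {w}w" for w in widths)
```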

This means we will use two CDNs in parallel (one for pages, CSS and JS, and another for images). This lets us keep the providers competing on the performance of their routes, and it helps a lot in reducing loading time.

Divide infrastructure costs by ten?

While you control neither your customer’s browser (e.g., Internet Explorer or Chrome? ADSL at the end of the line? An Olitec 56K modem?) nor the Internet traffic and its events (e.g., network congestion, peering problems), you do control your infrastructure.

To save on infrastructure, the simple answer is to put a cache server (Nginx or Varnish) in front of your web servers. It is the same principle as a CDN, without the “last-mile” benefit, but just as effective for caching.
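To illustrate the principle (not to replace Varnish or Nginx), here is a minimal WSGI middleware that caches successful GET responses per URL for a TTL, so repeated requests never reach the application. The class name, the 15-minute default and the `X-Cache` debugging header are assumptions.

```python
import time
from typing import Callable, Dict, List, Tuple

Headers = List[Tuple[str, str]]


class CachingMiddleware:
    """Tiny cache placed in front of a WSGI application.

    Same principle as Varnish or Nginx in front of web servers, with none of
    their features: no Vary handling, no memory limit, no streaming, and it
    assumes the app does not use the legacy write() callable.
    """

    def __init__(self, app, ttl: float = 900, clock: Callable[[], float] = time.monotonic):
        self.app = app
        self.ttl = ttl
        self.clock = clock
        self._cache: Dict[str, Tuple[float, str, Headers, bytes]] = {}

    def __call__(self, environ, start_response):
        if environ.get("REQUEST_METHOD") != "GET":
            return self.app(environ, start_response)  # never cache writes
        key = environ.get("PATH_INFO", "/") + "?" + environ.get("QUERY_STRING", "")
        hit = self._cache.get(key)
        if hit is not None and hit[0] > self.clock():
            _, status, headers, body = hit
            start_response(status, headers + [("X-Cache", "HIT")])
            return [body]  # served without calling the application

        captured = {}

        def capture(status, headers, exc_info=None):
            captured["status"], captured["headers"] = status, list(headers)
            return lambda data: None  # write() is ignored in this sketch

        result = self.app(environ, capture)
        try:
            body = b"".join(result)
        finally:
            if hasattr(result, "close"):
                result.close()

        status, headers = captured["status"], captured["headers"]
        if status.startswith("200"):  # only successful responses are stored
            self._cache[key] = (self.clock() + self.ttl, status, headers, body)
        start_response(status, headers + [("X-Cache", "MISS")])
        return [body]
```

For a local try, `wsgiref.simple_server.make_server("", 8000, CachingMiddleware(app)).serve_forever()` would serve any WSGI `app` through it.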

For example, on Carrefour.fr, the backends manage a “permanent” cache: all editorial content has an unlimited TTL, and each time a piece of content changes, the backend refreshes the cache with the updated version.

*A cache server will be one of your infrastructure’s best friends*
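The “permanent” cache described above is a write-through cache: entries have no expiry, and it is the write path that refreshes them. A minimal sketch, with plain dictionaries standing in for the database and the cache, and hypothetical method names:

```python
from typing import Dict, Optional


class EditorialStore:
    """Backend that owns editorial content and keeps the cache in sync.

    Entries have no TTL: they stay cached until the backend itself replaces
    them on the next edit (write-through), so readers never wait for a rebuild.
    """

    def __init__(self) -> None:
        self._db: Dict[str, str] = {}     # stand-in for the content database
        self._cache: Dict[str, str] = {}  # stand-in for the shared cache
        self.db_reads = 0

    def publish(self, slug: str, html: str) -> None:
        self._db[slug] = html
        self._cache[slug] = html  # refresh the cache at write time, not at read time

    def read(self, slug: str) -> Optional[str]:
        if slug in self._cache:
            return self._cache[slug]
        self.db_reads += 1  # only for content that never went through publish()
        html = self._db.get(slug)
        if html is not None:
            self._cache[slug] = html
        return html
```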

Pay attention to caching in microservices. One of the weak points of microservice architectures is network verbosity: many services call many sub-services. Although the services are generally all exposed as APIs with native caching functions, this does not solve the network verbosity, which persists even if each call is faster because it is cached.

On Carrefour.fr, we have set up “front-end” microservices that only cache data from our data lake. For example, our “Facilities” microservice caches the list of stores/drives available on the website and refreshes it once a day.
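A front-end microservice like “Facilities” can be sketched as an in-memory copy that is reloaded when it is more than a day old. The `load_stores` callable stands in for the data lake query and the store fields are hypothetical; a scheduled job refreshing the copy would work just as well as this refresh-on-read approach.

```python
import time
from typing import Callable, List, Optional

ONE_DAY = 24 * 60 * 60


class FacilitiesCache:
    """Serves the store/drive list from memory, reloading it once a day.

    `load_stores` is a hypothetical callable standing in for the data lake
    query, not the actual Carrefour interface.
    """

    def __init__(self, load_stores: Callable[[], List[dict]],
                 refresh_every: float = ONE_DAY,
                 clock: Callable[[], float] = time.monotonic):
        self.load_stores = load_stores
        self.refresh_every = refresh_every
        self.clock = clock
        self._stores: Optional[List[dict]] = None
        self._loaded_at = 0.0

    def stores(self) -> List[dict]:
        now = self.clock()
        if self._stores is None or now - self._loaded_at >= self.refresh_every:
            self._stores = self.load_stores()  # at most one data lake call per day
            self._loaded_at = now
        return self._stores

    def drives(self) -> List[dict]:
        return [s for s in self.stores() if s.get("drive")]
```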

Client side caching: convenient but perilous

Good browser cache management is surely the Grail of caching, but it is quite perilous. Although it is technically the simplest, it is also the most delicate, because once the page (or objects) are cached in your customer’s browser, you have no recourse in case of error… It is the same caching concept as for CDNs or infrastructure cache servers, but without the “flush” option.

*“Beware, the Grail cannot pass beyond…”*

We would advise setting a very short TTL on pages, 1 hour maximum, but 6 months on objects (images, CSS, JS). Since objects are referenced by pages, if an object contains an error, you just have to change its URL in the page. This has almost no SEO impact, whereas changing the URL of a page has a big impact (even when properly handled with 301 redirects…).
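Both rules can be expressed as `Cache-Control` headers, combined with “fingerprinted” object URLs so that fixing a CSS or JS file automatically gives it a new URL. This is a sketch under assumptions: the extension list, the 180-day approximation of 6 months, the 8-character SHA-256 prefix and the `immutable` directive (defined in RFC 8246) are choices made for the example.

```python
import hashlib
from pathlib import PurePosixPath

PAGE_MAX_AGE = 60 * 60               # pages: 1 hour at most
ASSET_MAX_AGE = 180 * 24 * 60 * 60   # images/CSS/JS: about 6 months
ASSET_SUFFIXES = {".css", ".js", ".png", ".jpg", ".jpeg", ".webp", ".svg"}


def cache_control(path: str) -> str:
    """Browser caching policy: short for pages, long for static objects."""
    if PurePosixPath(path).suffix in ASSET_SUFFIXES:
        return f"public, max-age={ASSET_MAX_AGE}, immutable"
    return f"public, max-age={PAGE_MAX_AGE}"


def fingerprinted_url(path: str, content: bytes) -> str:
    """Put a content hash in the object's file name.

    Fixing a broken CSS/JS file changes its hash, hence its URL, so browsers
    fetch the new version even though the old one is cached for months.
    """
    p = PurePosixPath(path)
    digest = hashlib.sha256(content).hexdigest()[:8]
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))
```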

To put it in a nutshell, there are many ways to set up caching for your website and mobile app. We recommend using cache as much as possible to benefit from more sales and page views, thanks to faster browsing, better scalability and lower infrastructure costs.
