Bringing e-commerce search to the next level
March 10, 2021

I believe that effective e-commerce sites are focused on two main points: Helping customers to quickly find what they are looking for and helping them to receive what they ordered as quickly as possible.

E-commerce search is THE enabler for the first half of the equation. Customers use it to find their products faster, typing the article name instead of browsing through the whole catalog. However, search results do not always match what the customer has in mind. When this happens, the customer is left filtering and navigating a confusing taxonomy of products, which often ends with them leaving the website.

Studies confirm that search is one of the most important parameters when it comes to optimizing an e-commerce site. 

At Carrefour Spain we managed to bring search to the next level. Let me share with you how we did it.

Search@Carrefour Spain

A couple of years ago, we realized that our e-commerce search could be greatly improved. Our customers did not find it particularly useful and often relied on filtering and navigation in the store.

When analyzing the data, it was easy to see that something was off. The number of queries per customer was 1.8 on average. The drop-out rate was very high as well: 29% of customers who used the search engine ended up leaving the store.

A quick look at our obsolete technical landscape helps to explain this situation. Our search engine (an old version of Endeca, linked to a legacy Oracle Commerce platform dating from 2011) had several limitations:

  • Traditional search engine, proprietary, closed and poorly documented.
  • Search technology heavily coupled with the e-commerce platform.
  • Batch-oriented indexing process making processing failures common.
  • Limited functionality. For example, it was especially difficult to enrich the search results with different data sources.
  • Limited scalability. It was very hard to combine with solutions such as Kubernetes that can adapt to peaks of activity.

In turn, this led to concrete pain points for our customers. Some of these included:

  • It was not possible to personalize search results based on historical customer behavior and preferences. This is extremely important. For instance, a customer searching for “jabón” (soap) could actually mean “jabón para lavadora” (washing machine soap). Customers looking for “lettuce” may be interested in buying “tomatoes” too!
  • Queries could not be auto-completed. Alternatives (“washing machine soap” or “hand soap”) were not offered either.
  • We could not modify ranking algorithms. If a product became very popular, the search engine would not react to this change and would not raise its relevance to show it to our customers at the top of the results.

Together with a group of e-comm enthusiasts, we decided to take matters into our own hands and improve this situation. After rethinking the tech stack, we ended up with a powerful search machine. This led to a different way of thinking and working when it came to search, much more driven by data and organized around a three-step sequence:

1. Understanding customer search behavior.

2. Processing this information to “cook” smart search responses.

3. Leveraging search insights to evolve the interaction with customers, making it more fluid and conversational.

Rethinking the tech stack

We were convinced that customer experience would only improve if we had solid technical foundations. We decided to move on and review the tech stack, combining open source with purposeful market standards.

We chose Elasticsearch (aka “Elastic”) as a search engine. It is an open source solution, backed by a strong development community, stable and “battle tested”. Based on the Lucene library, Elastic makes it quite easy to adjust and modify the algorithmic techniques. It is also scalable, a must-have to support our ambitious e-commerce development plans.

We also decided to enrich the search experience with some AI and ML tricks. The help of Empathy was decisive in making this possible. Headquartered in the north of Spain (the beautiful Gijon!), Empathy is a company specialized in search that provides a smart solution to extend a traditional search engine.

To put it simply, Empathy helps us make the search experience contextualized and personalized by gathering information about customers’ behavior, improving the relevance of product ranking and making search recommendations.

Empathy completes Elastic with a set of data management capabilities developed in the Google Cloud Platform (GCP):

  • A tagging service that listens to all “query signals” (search queries and product clicks) and sends this information to a Pub/Sub queue (the GCP messaging system).
  • The transformation of this stream of information into files that can be easily processed by Empathy’s algorithms. This is done with Dataflow (the GCP implementation of Apache Beam).
  • The intelligence behind all this (processing, management of ranking relevance and recommendations) with Dataproc (the GCP managed service for Apache Spark and Hadoop). We also use Dataproc for statistical analysis, to see if a query is working correctly or if it is becoming popular.
  • The storage of the resulting insights and recommendations in MongoDB.
  • Finally, the publication of an API that can be used by a front-end or a mobile app. Carrefour Spain’s microservice is deployed on GKE (Google Kubernetes Engine).
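To make the flow of signals through these components more tangible, here is a minimal Python sketch of the first two steps: publishing a signal to a queue and draining the queue into a batch of JSON lines. It is only an illustration. An in-memory queue.Queue stands in for Pub/Sub, the batching function stands in for the Dataflow job, and the QuerySignal fields and function names are hypothetical, not Empathy's actual schema.

```python
import json
import queue
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class QuerySignal:
    """Hypothetical shape of one signal; not the real schema."""
    kind: str                         # "query" or "click"
    session: str                      # anonymous session identifier
    query: str                        # text typed by the customer
    product_id: Optional[str] = None  # only set for click signals


def publish(topic: queue.Queue, signal: QuerySignal) -> None:
    """Tagging-service step: serialize the signal and push it to the queue."""
    topic.put(json.dumps(asdict(signal)))


def drain_to_batch(topic: queue.Queue, max_items: int) -> List[str]:
    """Stream-to-file step: collect up to max_items messages as JSON lines."""
    batch: List[str] = []
    while len(batch) < max_items and not topic.empty():
        batch.append(topic.get_nowait())
    return batch
```

A downstream job could then read each batch line by line with json.loads, which is the kind of file-based hand-off the list above describes.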

In the diagram below, you can see the simplified architecture of Empathy and its main components.

Understanding customer search behavior

With the new tech in place, the usage of search evolved, becoming more data-driven.

As a starting point to decrease the drop-out rate, we decided to collect information on customers’ behavior. We gathered significant volumes of data (thousands of customers during several months) on:

  • Search queries.
  • Search results.
  • Clicks on products from search results.
  • Products added to cart from search results.
  • Products purchased from search results.

The easiest way to collect this information is to “tag” all the actions triggered in the browser with a small piece of JavaScript code and send all the data to the back-end server for analysis.

Below, you will find an example of two “query signals”. Basically, they are two HTTP requests that tell us when a customer performs a query in the search engine and when a customer clicks on a product from a search result.

Query event signal

Click event signal
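As a rough idea of what such signals could look like on the wire, the Python sketch below builds the URL of a query signal and of a click signal. Everything specific in it is an assumption made for illustration: the endpoint, the paths and the parameter names are hypothetical, and in production the requests are sent by JavaScript in the browser rather than by Python.

```python
from urllib.parse import parse_qs, urlencode, urlparse

# Hypothetical tagging endpoint, used only for illustration.
TAGGING_ENDPOINT = "https://tagging.example.com"


def query_signal_url(session_id: str, query: str, result_count: int) -> str:
    """URL sent when a customer performs a query in the search engine."""
    params = {"session": session_id, "q": query, "results": result_count}
    return f"{TAGGING_ENDPOINT}/query?{urlencode(params)}"


def click_signal_url(session_id: str, query: str,
                     product_id: str, position: int) -> str:
    """URL sent when a customer clicks on a product from a search result."""
    params = {
        "session": session_id,
        "q": query,
        "product": product_id,
        "position": position,  # rank of the clicked product in the results
    }
    return f"{TAGGING_ENDPOINT}/click?{urlencode(params)}"
```

Note that the only identifier in these requests is an anonymous session value, in line with the privacy constraints described in this section.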

Of course, all this information must be collected respecting our customers’ privacy. All data is anonymized and we do not store information that can be linked to a specific customer.

The collection of this data made it very clear to us that search was an avenue of opportunities! No data science magic was necessary to realize this. A simple look through raw data helped us understand things like:

  • In our case, non-food queries were more popular than food queries.
  • Food search queries had a better CTR (Click Through Rate) than non-food ones.
  • When using search, most customers were interested in very specific products. For instance, sanitary masks during the high peak of the Covid-19 pandemic.
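This kind of observation needs very little code. As a minimal sketch, the Python function below computes the number of queries and the CTR per category from a list of tagged events. The event fields ("type", "category") are hypothetical, and CTR is taken here simply as clicks divided by queries, which is one of several possible definitions.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def ctr_by_category(events: Iterable[dict]) -> Dict[str, Tuple[int, float]]:
    """Return {category: (number of queries, click-through rate)}."""
    queries: Counter = Counter()
    clicks: Counter = Counter()
    for event in events:
        if event["type"] == "query":
            queries[event["category"]] += 1
        elif event["type"] == "click":
            clicks[event["category"]] += 1
    # Only categories with at least one query appear, so no division by zero.
    return {cat: (n, clicks[cat] / n) for cat, n in queries.items()}
```

Sorting the result by the first element of each tuple gives the most popular categories, and sorting by the second gives those with the best CTR.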

“Cooking” smart search responses

One of the advantages of having our search data “tagged” is that we can do much more than draw simple conclusions from observing the information. For instance, it is possible to deduce the best search suggestions for a specific customer based on their search history.

Here are a few things that we did within Carrefour Spain to make e-comm search smarter:

  • Contextualize search results. Automatically boosting products based on the situation. This was the case for sanitary masks, which were boosted over beauty masks during the peak of the health crisis.
  • Propose trending queries. These are the most popular queries based on what people were looking for in the search. For example, people seem to be very interested in buying a Playstation 5 nowadays…
  • Complement queries with related tags. Showing related tags automatically when customers type a word. 
  • Establish relationships between queries to anticipate the “next-best” search. For instance, people looking for “patatas” (potatoes) also look for “cebolla” (onions) or “huevos” (eggs) to make the famous Spanish omelette.
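The “next-best” search idea can be approximated with simple counting. The Python sketch below is not the production algorithm; it assumes each session is available as an ordered list of queries, counts how often one query directly follows another, and proposes the most frequent followers. The function names are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List


def build_transitions(sessions: Iterable[List[str]]) -> Dict[str, Counter]:
    """Count, for each query, the queries that directly follow it."""
    transitions: Dict[str, Counter] = defaultdict(Counter)
    for queries in sessions:
        for current, following in zip(queries, queries[1:]):
            if current != following:  # ignore repeated identical queries
                transitions[current][following] += 1
    return transitions


def next_best(transitions: Dict[str, Counter], query: str,
              k: int = 2) -> List[str]:
    """Return up to k queries that most often follow the given query."""
    followers = transitions.get(query, Counter())
    return [q for q, _ in followers.most_common(k)]
```

With sessions where “patatas” is mostly followed by “cebolla” and sometimes by “huevos”, next_best(transitions, "patatas") would propose those two queries in that order.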

How does this work? A bit like RankBrain, the idea is to slice and dice raw data with AI and ML algorithms to extract valuable and interesting insights.

But enough of data science generalities! Let us give a concrete example. In the case of related tags and next-best search we process very significant amounts of collected query data leveraging an extensive range of filters:

  • The blacklist filter. Filtering out terms on a blacklist.
  • The deduplication filter. Merging word forms or very similar terms, using the Jaro-Winkler similarity as a metric to compare words.
  • The discard filter. Excluding queries with no results.
  • And the expanded top term filter. Using term statistics to decide if a word needs expansion and selecting the most suitable option.
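The first three filters can be sketched in a few lines of Python. This is a simplified illustration rather than the actual pipeline: the input format (term, query count, result count), the 0.92 similarity threshold and the function names are assumptions, and the expanded top term filter is left out. The Jaro-Winkler similarity is implemented from its standard definition.

```python
from typing import Iterable, List, Set, Tuple


def jaro(s1: str, s2: str) -> float:
    """Jaro similarity between two strings, in [0, 1]."""
    if s1 == s2:
        return 1.0
    if not s1 or not s2:
        return 0.0
    window = max(max(len(s1), len(s2)) // 2 - 1, 0)
    matched1 = [False] * len(s1)
    matched2 = [False] * len(s2)
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len(s2), i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = out_of_order = 0
    for i, ch in enumerate(s1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if ch != s2[k]:
                out_of_order += 1
            k += 1
    t = out_of_order / 2
    return (matches / len(s1) + matches / len(s2) + (matches - t) / matches) / 3


def jaro_winkler(s1: str, s2: str, p: float = 0.1) -> float:
    """Jaro similarity boosted for a shared prefix of up to 4 characters."""
    j = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return j + prefix * p * (1 - j)


def clean_terms(term_stats: Iterable[Tuple[str, int, int]],
                blacklist: Set[str],
                threshold: float = 0.92) -> List[Tuple[str, int]]:
    """Apply the blacklist, discard and deduplication filters, in that order."""
    kept: List[Tuple[str, int]] = []
    # Most frequent terms first, so they become the canonical form.
    for term, count, results in sorted(term_stats, key=lambda t: -t[1]):
        if term in blacklist:  # blacklist filter
            continue
        if results == 0:       # discard filter: query returned no results
            continue
        for i, (existing, existing_count) in enumerate(kept):
            if jaro_winkler(term, existing) >= threshold:  # deduplication
                kept[i] = (existing, existing_count + count)
                break
        else:
            kept.append((term, count))
    return kept
```

In this sketch a plural such as “mascarillas” is merged into the more frequent “mascarilla” and their counts are added, while unrelated terms stay separate.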

Leveraging search’s information to speak to our customers

Once all this intelligence was built, it was time to expose it to the front-end apps via APIs. In the case of Carrefour Spain, we built a multi-purpose microservice that allowed for several types of search optimization actions: Contextualizing search results, analyzing trending queries, auto suggesting queries…

The front-end can leverage this service to show more accurate search results. In our case, search results are displayed based on three elements:

  • The direct match to our internal search library.
  • “Clicks” and “add to cart” events made by all customers.
  • And the previously described additional magic coming from our AI and ML algorithms.
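One simple way to picture how three such elements can be combined is a weighted score. The Python sketch below is purely illustrative: the weights, the logarithmic damping of behavior counts, the double weight given to add-to-cart events and the field names are all assumptions, not the formula used in production.

```python
import math
from typing import Dict, List


def blended_score(product: Dict, w_text: float = 1.0,
                  w_behavior: float = 0.5, w_ml: float = 0.3) -> float:
    """Combine text match, customer behavior and an ML boost into one score."""
    # Damp raw counts so very popular products do not swamp the text match.
    behavior = math.log1p(product["clicks"] + 2 * product["add_to_carts"])
    return (w_text * product["text_score"]
            + w_behavior * behavior
            + w_ml * product.get("ml_boost", 0.0))


def rank(products: List[Dict]) -> List[Dict]:
    """Order products from highest to lowest blended score."""
    return sorted(products, key=blended_score, reverse=True)
```

With weights like these, a product with a slightly weaker text match but far more clicks and add-to-cart events can still be shown first.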

No matter the combination, we found that the trick was to make search results extremely fluid, almost “conversational”. In other words, the aspiration should be to provide the same type of interaction a salesperson in a traditional physical store would offer. A conversational approach is particularly important in a world where most traffic comes from mobile devices, where there is no proper keyboard and it is much easier to tap than to type.

To make it more concrete, let us come back to one of the earlier examples. When a customer searches for “jabón” (soap), we could simply decide to display what we think are the most interesting products for most of our customers.

However, considering that this customer may not be thinking about hand soap but about washing machine soap, we could propose a few options via autocomplete to help him / her get more specific.

Since the customer is interested in washing machine soap, he / she will click on the “lavadora” (washing machine) suggestion.

Then, we could propose related tags to present linked search results (e.g., washing machine, cleaning gloves, cleaning liquid, clothes).

Finally, we nail it with the “next-best” search hint. In this case, the customer sees suggestions for the next most likely query based on the behavior of similar customers. For example, he / she could be shown “suavizante ropa” (fabric softener), including promotional options.

All these features create a sort of chatty experience with our customers. The final goal is to help them find things and hence increase the number of items in the shopping cart.

The outcomes

Between the launch of the new search in February 2020 and today, the number of search queries a customer makes per session has increased by 48% and the drop-out rate for customers using search has decreased by 49%.

All in all, this work showed us that good tech, applied with purpose and passion, improves the lives of our customers. Investing in search to better listen, deliver data-driven recommendations and develop relationships with customers is one of the best ways to improve the customer experience of our website!

Check our other tech projects on Horizons.

About the Author

Carrefour Spain E-Commerce IT Director

With more than 10 years of experience in e-commerce platforms, Jesús has held several roles, from software developer and architect to management. He enjoys getting “hands-on” and helping teams to solve technical problems!
