Artefact Data For Retail Report
RETΛIL
WE UNLEASH THE FULL VALUE OF
DATA THROUGH DEMOCRATIZATION
Locations: The Netherlands, UK, Germany, France, Switzerland, Spain, New York, South Korea, Lebanon, Morocco, Saudi Arabia, China, UAE, Los Angeles, Mexico, Colombia, Malaysia, Singapore, Brazil
18 countries, 1,300+ employees, 1,000+ clients
Interview with Vincent Luciani about
Data & AI market perspectives
The strategic importance of data for companies is no
longer in question. Aware of this reality, Artefact helps
companies to capitalize on this performance, growth
and improvement lever.
INTERVIEW
processing, with a real potential for efficiency, such as healthcare or “heavy” industries. This is especially true in comparison to the consumer and retail sectors, which have begun their data revolution and which we know very well, such as L’Oréal, Danone, Unilever, Samsung, etc.

We started to transform marketing departments by making them more profitable and relevant in their multi-channel media investments with pioneering targeting, measurement and personalisation solutions. For the past few years, we have also been deploying acceleration programs in all business areas (Sales, Supply Chain, Operations, Call Centers, HR & Finance, etc.). We create value wherever there is data, and with our clients, we improve their processes and produce customized business applications.

Can you give us a concrete example that shows, from a precise business and operational objective, how Artefact designs AI solutions that improve business uses?

Data is the key to understanding customers, developing better products and services, and streamlining internal operations to reduce costs and waste. For example, we’ve been working with the Orange telecommunications group for over six years, and among the many use cases for leveraging the company’s automation and AI potential, we deployed a solution with their teams to optimize their technicians’ interventions on the fiber network. The solution is based on visual recognition technology that helps operators improve the quality of their installations or repairs. This application, available on a tablet, is currently used by more than 10,000 Orange technicians throughout the country – a resounding success!

This case perfectly illustrates Artefact’s firm belief that to achieve true data maturity, companies have no choice but to make data accessible to everyone: not only to experts, but also to operational staff in the field. This will lead to new forms of augmented work, where applications and their interfaces put intelligent information in everyone’s hands to work more efficiently and with more autonomy.

“It is key to ensuring that the values of inclusiveness and diversity are respected. Data has an essential role to play in creating a more ethical and just world.”
Listening to you, it’s clear that
data should no longer be a subject
reserved for experts only. How
does Artefact see the vision of data
democratization materializing?
The first was the launch of the Artefact School of Data two years ago, a key pillar in our strategy of providing clients with training adapted to the constantly evolving skills of the data industry. We’re also developing “à la carte” e-learning platforms for clients to quickly share knowledge of data and AI with all of their employees.

And at the beginning of this year, we launched our Artefact Research Center under the leadership of Emmanuel Malherbe (class of 2008), to bridge the gap between fundamental research and its democratization for business applications, in collaboration with clients who provide access to their data and use cases. To achieve this, we have partnered with several renowned university laboratories, including CMAP and MEI at Polytechnique, to host PhD students at Artefact, who work on data model improvement and organization studies. They will publish scientific articles and participate in international conferences to share their knowledge.

These programs are just the first steps we’re taking to democratize data and help our clients transform faster and better.

After a year of robust growth in 2022, what are your forecasts for 2023?

After +50% organic growth in 2022, our objective is to maintain the momentum in 2023 with a sustained recruitment policy in France and in our 16 subsidiaries in Europe, MENA, Asia, North and South America. We’ve just deployed our Artefact Africa entity from Morocco and will soon open an office in Korea. We will also accelerate our development in LATAM and the United States.

Artefact’s expansion also involves an ambitious M&A policy that will continue in 2023. In 2021, two acquisitions in particular enabled us to expand Artefact’s portfolio of clients and services: the acquisition of Startup Inside, a pioneer in open innovation & intrapreneurship strategy consulting and international Data and AI conference organizer, and, more recently, the merger with the Arca Blanca group, a leader in data consulting in the United Kingdom.

We are quite optimistic about the future because even though the economy is currently strained, companies need to better understand the shifting environment and find rapid solutions for adaptation and progress through data.

“Democratizing data and making it accessible to all is key to accelerating business and creating value.”
DATA FOR RETAIL
processes. It’s the seamless and massive availability of third-party data in the market. It’s the ease of use of technology stacks that allow millions of transactions to be processed in a few milliseconds. Today, it only takes three months to build a data platform that combines all transaction data, promotions, stock, product hierarchy, store hierarchy, customer data, etc. And a technology partner can manage the infrastructure, resource deployment and network dimensions in the cloud through its managed services. Today, like Monsieur Jourdain, if you know Excel and PowerPoint, you’re a data analyst without realizing it: in a matter of days, you can take control of data in BigQuery (Google), Synapse (Microsoft), or Snowflake and build interactive dashboards in Looker, Power BI or Tableau.

…and better manage inventory

In recent years, the health and geopolitical context has also challenged supply chains. Today, the supplier service rate varies from week to week and delivery times can be very uncertain. Distribution channels have also grown highly complex: not only do stores need to be stocked, but home deliveries, click-and-collect, and partnerships need to be served as well. Once again, data science comes to the rescue of retailers by allowing them to better control inventory management. By leveraging machine learning, retailers can now analyze receipts in real time to immediately detect out-of-stock items, calculate the spread of uncertainties across all links in the chain to better size buffer stocks, or improve stock allocation under an infinite number of constraints (to optimize costs, shorten delivery times, or reduce carbon footprint).

Democratizing data use throughout the company: data is about people

In a business where margins are so tight that operational excellence is a necessity, the notion of a data-driven company is far from new. What’s changing today is the ease of access and use of technological platforms.

If technology is no longer a barrier, the challenge is still to make these solutions available to the widest possible audience. To democratize their use, simple solutions need to be deployed on a massive scale, employee training programs need to be multiplied, whether on-demand or more intensive, and events (e.g., hackathons) need to be organized to engage the managers who are driving the transformation. Data is about people. This is Artefact’s slogan, and rightly so.

Directly monetizable data… But perhaps not immediately for everyone

The icing on the cake is that data itself is a goldmine, thanks to retail media and data sharing. As digital signals become more difficult to capture, the billions of transactions and customer interactions that retailers generate have become a critical strategic advantage for them. This data, which provides in-depth understanding of consumer expectations, has great potential for monetization. But it represents a profound, existential transformation of the retailer business model: moving from a self-financed model (with negative working capital) but with very narrow margins, to a model where initial investments are substantial but margins are high. A Copernican revolution, perhaps not the easiest to undertake for all players.

In the great supermarket of retail value creation opportunities, data technologies are now at the top of the shelf, in self-service. Retailers, why wait to share these unbeatable offers with your partners?
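To make the idea of sizing buffer stocks from supply chain uncertainty more concrete, here is a minimal sketch. It uses the classic textbook safety stock formula, which combines variability in daily demand with variability in supplier lead time. This is an illustrative assumption, not the method described in this report; the function name, its parameters and the figures in the usage note are hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def safety_stock(mean_daily_demand, sd_daily_demand,
                 mean_lead_time_days, sd_lead_time_days,
                 service_level=0.95):
    """Textbook safety stock: z * sqrt(L * sd_d^2 + d^2 * sd_L^2).

    Assumes demand and lead time are independent and roughly normal.
    """
    # z-score matching the target probability of not running out of stock
    z = NormalDist().inv_cdf(service_level)
    # Uncertainty coming from day-to-day demand over the lead time
    demand_variance = mean_lead_time_days * sd_daily_demand ** 2
    # Uncertainty coming from the supplier's delivery time
    lead_time_variance = (mean_daily_demand ** 2) * sd_lead_time_days ** 2
    return z * sqrt(demand_variance + lead_time_variance)


if __name__ == "__main__":
    # Hypothetical product: 100 units/day on average, 5-day lead time
    units = safety_stock(100, 20, 5, 1, service_level=0.95)
    print(f"Buffer stock to hold: {units:.0f} units")
```

A higher target service level or a less reliable supplier both increase the buffer, which is the trade-off the paragraph on inventory describes.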
CASE STUDY
CARREFOUR GROUP
How Data & AI can accelerate
sustainable business transformation
CHALLENGES
SOLUTION
The first step for the Artefact and Carrefour teams was to agree on the scope of action for measuring this carbon footprint. They decided to limit themselves to measuring greenhouse gas emissions generated by e-commerce orders in 2021.

“The challenge we gave Artefact was to calculate the CO2 emission of an online order. How much CO2 will a customer produce if their order is delivered or if it’s picked up at the store?”
Bertrand Swiderski
Chief Sustainability Officer
CARREFOUR

The second step was to collect activity data in order to convert it into carbon emissions. As this data wasn’t already present and documented in Carrefour’s data platform, the business teams (logistics, warehouses, e-commerce) had to be brought together to obtain it. This step proved to be crucial to the operation’s success, as it allowed all stakeholders to become ambassadors for the group’s “carbon neutrality 2030” objective.

Carrefour’s strategy for measuring its carbon footprint was based on a systemic, unifying, long-term, iterative approach. The strategy was successful thanks to the participation of over 30 employees and the involvement of Carrefour customers via their “Engaged Consumers Clubs”.

“Today, we recognize that consumers are becoming experts on these topics. They want to understand how things are done and want to challenge companies. Thanks to them, the project has matured.”
Léonard Cahon
Consulting Manager - ARTEFACT

Encouraged by these initial results, Carrefour will continue its commitment by publishing the carbon footprint of each of its orders on its e-commerce site in the near future.

“Soon, customers will clearly see the number of kilograms of CO2 on their orders, thanks to the insights gained from our carbon assessment.”
Manuel Chatain
E-commerce CSR Manager - CARREFOUR
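The conversion of activity data into carbon emissions mentioned above usually follows the general pattern "activity quantity × emission factor". The sketch below illustrates that pattern only. The activity names and every emission factor are hypothetical placeholders, not Carrefour or Artefact figures, and a real assessment would take its factors from a recognised emission factor database.

```python
# Hypothetical emission factors in kg CO2e per unit of activity.
# Placeholder values for illustration only, not real measurements.
EMISSION_FACTORS_KG_CO2E = {
    "delivery_van_km": 0.25,   # per km driven by a delivery van
    "customer_car_km": 0.20,   # per km driven by the customer to the store
    "warehouse_kwh": 0.05,     # per kWh of warehouse electricity
    "packaging_kg": 1.50,      # per kg of packaging material
}


def order_emissions(activity):
    """Sum activity quantities multiplied by their emission factors."""
    return sum(
        quantity * EMISSION_FACTORS_KG_CO2E[name]
        for name, quantity in activity.items()
    )


if __name__ == "__main__":
    # Two hypothetical fulfilment modes for the same basket
    home_delivery = {"delivery_van_km": 4, "warehouse_kwh": 2, "packaging_kg": 0.4}
    store_pickup = {"customer_car_km": 6, "warehouse_kwh": 2, "packaging_kg": 0.2}
    print(f"Home delivery: {order_emissions(home_delivery):.2f} kg CO2e")
    print(f"Store pickup:  {order_emissions(store_pickup):.2f} kg CO2e")
```

Keeping the factors in one table makes it straightforward to update them as the business teams document more activity data.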
RESULTS
Data-driven marketing:
the rise of the Customer
Data Platform
Florian Thiebaut
Managing Partner - Data-Marketing
ARTEFACT
Everything seems to justify the current explosion of the Customer Data Platform (CDP) market. CDPs’ main advantage over older generation Data Management Platforms (DMPs) is that they easily integrate identifiable first-party data (email, phone number) and aren’t dependent on using third-party cookies or browsing data to refine customer and prospect knowledge.

CDPs are a true asset in a world that is becoming increasingly cookie- and ad ID-free. At a time when the pandemic is forcing brands to digitise at breakneck speed, and when the transformation of the technical and regulatory environment surrounding advertising trackers is forcing data marketers to revise their approaches, CDPs are here to optimise the customer experience.

A game-changing technical and legal environment

Following Safari’s lead in 2016, the world’s three main browsers eliminated (or will eliminate) the use of third-party cookies. On the mobile/tablet devices side, Apple’s iOS 14 now requires explicit consent for any mobile ID collection.

As for regulation, GDPR laws in Europe have given consumers more control over their personal data, requiring them to give explicit consent for the use of cookies. This regulation represents a major shift in the world of data-driven marketing, as it has reduced the number of cookies placed on European devices by 30%.

This global trend restricting the use of IDs and advertising cookies sharply impacts the targeting capabilities of advertisers, who are often dependent on third party data. The vast majority
of them use or have used retargeting and old generation DMPs that rely heavily on segments fed by third party data.

Along with targeting, measurement must also be transformed. With more stringent consent collection requirements, it’s more difficult to collect the consumer IDs needed to track impressions, clicks or views, and reconstruct complete customer journeys.

Four pillars for a sustainable data strategy

To maintain the same performance and differentiate themselves from the competition, advertisers must design a sustainable data strategy and exploit their customer and prospect data to its full potential.

This requires focus on four actions:

The CDP: The first step is to establish a CDP environment based on a suite of tools that is both compliant and sustainable. This will enable data to be collected, stored, processed, visualised and activated, whatever the source. From this foundation, the focus must be on first-party data.

Data governance: Brands need to rethink data governance and processes to enable secure and compliant end-to-end data collection.

Audience segmentation: This data, centralised for a unified view of the consumer, can then be used to create new audience segments and define new metrics for measuring campaign results.

Second-party partnerships: In addition, it’s becoming increasingly strategic to form so-called “second party” partnerships with other partner companies to exploit first-party data and create win-win situations. This data completes a database that is incomplete at certain points in the consumer journey. Examples might be an agreement between an FMCG brand and a retailer, a mobile phone manufacturer with a telco or a hotel chain with an airline.

Three types of data to activate via a suite of tools

First- and second-party data are key to meeting the challenges of the post-cookie world. But what are they and what tools can be used to manage them?

PII or Personally Identifiable Information is essentially CRM (customer relationship management) data. It can precisely identify an individual and is often an email address or a phone number for example. Once anonymised, it can be used via the APIs of media partners (e.g., Google Customer Match, Facebook Custom Audience/conversion API,
Amazon, WeChat, etc.) to build audience segments, perform audience extensions, and reconstruct paths to measure the influence of digital campaigns on offline sales, etc.

Non-PII data can be browsing data that cannot lead directly to the identification of an individual. It can be used to build more granular segments via analytics and audience creation solutions for measuring precision marketing actions without relying on third party data.

Data that is purely media-related, such as campaign impressions, video views and click rates, is more voluminous and less granular than the other two types of data. It is more difficult to use but there is a robust market of tools capable of treating it in a secure and compliant manner, such as Google Ads Data Hub, Facebook Advanced Analytics and Amazon Marketing Cloud.
CASE STUDY
CONFORAMA
AI-enabled personalization boosts
Conforama CRM campaign revenues
CHALLENGES
Conforama is the second largest home furnishings retailer in France and is present in seven countries, with 300 stores, including 200 in France. The company sells furniture and decorative items in kit form and posted sales of 1.7 billion euros in 2022.

As a gateway brand, Conforama’s goal is to “Make what people want most accessible at the best price.” It’s an ambition backed by a transformation plan to deliver an omnichannel experience through data and AI. An initial audit and data marketing vision with Artefact identified and prioritized 12 use cases and 25 technical and organizational enablers. The first use case was to integrate a personalized product recommendation into the company’s weekly emails.

Several challenges needed to be addressed through this use case:

• How to understand the needs of three million customers and recommend the most relevant products from a catalog with 42,000 references?

• How to propose only products currently in stock, on promotion, and not already suggested to customers?

• How to easily operate and maintain the technical solution?
“Time savings, yes, but above
all a business benefit for our
CRM teams. Because thanks to
this personalization, customers
click more and therefore buy
more. We’ve gained 15% of
the click rate following the
personalization of these emails,
which represents several million
in incremental sales.”
SOLUTION
Today, an email is sent to every Conforama customer each Tuesday containing eight product recommendations. But these recommendations are personalized according to purchase history, and filtered exclusively for products that are on sale, are available in stores, and that haven’t been featured in previous activations.

The implemented AI solution includes 4 main data processing steps:

• Collection of transaction histories, customer and product references, then data preparation;

This solution is based on 16 data tables, 25 transformation and modeling steps, and 40 automated quality tests. Dozens of iterations of the model made it possible to choose the most efficient approach based on transaction history. Thanks to this solution, Conforama now generates several million recommendations each week in 45 minutes at a cost of 50 euros per week.

In other words, if you count development and operation costs, as well as incremental sales, the project break-even point is reached in one week, with an automated and reliable solution.
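The filtering logic described above (only products in stock, on promotion, and not already suggested, limited to eight per email) can be sketched as follows. This is a simplified illustration, not Conforama’s actual pipeline: the `Product` class, its fields and the relevance score are hypothetical stand-ins for whatever the recommendation model produces.

```python
from dataclasses import dataclass


@dataclass
class Product:
    sku: str
    score: float        # hypothetical relevance score from a model
    in_stock: bool
    on_promotion: bool


def weekly_recommendations(candidates, already_suggested, n=8):
    """Keep eligible products only, then return the n best-scored ones."""
    eligible = [
        p for p in candidates
        if p.in_stock and p.on_promotion and p.sku not in already_suggested
    ]
    return sorted(eligible, key=lambda p: p.score, reverse=True)[:n]


if __name__ == "__main__":
    catalog = [
        Product("sofa-01", 0.9, True, True),
        Product("lamp-02", 0.8, False, True),   # out of stock
        Product("desk-03", 0.7, True, False),   # not on promotion
        Product("rug-04", 0.6, True, True),
        Product("bed-05", 0.5, True, True),     # suggested last week
    ]
    picks = weekly_recommendations(catalog, already_suggested={"bed-05"})
    print([p.sku for p in picks])
```

Applying the business filters before ranking keeps the scoring model independent of stock and promotion rules, which can then change week to week without retraining.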
RESULTS
For many players, there are three challenges linked to their level of maturity:

LEVEL 1
Personalizing a currently rule-based touchpoint using an AI algorithmic approach;

LEVEL 2
Extending AI-based personalized recommendation across the entire customer journey (similar products / complementary products / suggestion based on purchase history);

LEVEL 3
Optimizing the orchestration of recommendations across channels to ensure an omnichannel experience.

Level 1 is often the most difficult, as it requires laying the foundations for four separate dimensions: target vision, user experience and priorities; data sources; technological tools; project team and work method.

The Conforama example offers valuable lessons about these four dimensions:

• Select a first use case and functionalities that can be quickly implemented and measured to put the organization on the road to success. For example, this initial victory means Conforama can now plan the deployment of product recommendations in stores or the improvement of their algorithm thanks to browsing data.

• Ensure the data is reliable. Good data modeling relies first and foremost on good quality data. For Conforama, exploratory analyses were performed on more than 50 tables to select data sources in areas such as customer knowledge, product repositories and transactions.

• Use technologies that allow teams to deploy a technical solution quickly and collaboratively. Conforama selected the most appropriate tools for this type of workflow: DBT, BigQuery ML and Vertex AI for their performance, modularity and portability.

• Build a dedicated team capable of dealing with all potential problems, and adopt a test and learn approach. To do this, a multidisciplinary IT / Conforama business team was formed, and a 2-week sprint approach was adopted.
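The lesson on data reliability can be illustrated with a few automated checks of the kind a pipeline might run before modeling. The sketch below is written in plain Python for readability and is an assumption about what such checks could look like; it is not the test suite used in the project, and the field names (`transaction_id`, `customer_id`, `amount`) are hypothetical.

```python
def run_quality_checks(transactions):
    """Return a dict mapping each check name to True (pass) or False (fail)."""
    ids = [t["transaction_id"] for t in transactions]
    return {
        # Every transaction should appear exactly once
        "unique_transaction_ids": len(ids) == len(set(ids)),
        # Every transaction should be attached to a customer
        "no_missing_customer_id": all(t.get("customer_id") for t in transactions),
        # Amounts should never be negative in this simplified model
        "no_negative_amounts": all(t["amount"] >= 0 for t in transactions),
    }


if __name__ == "__main__":
    sample = [
        {"transaction_id": "t1", "customer_id": "c1", "amount": 120.0},
        {"transaction_id": "t2", "customer_id": None, "amount": 35.5},
    ]
    for name, passed in run_quality_checks(sample).items():
        print(f"{name}: {'ok' if passed else 'FAILED'}")
```

Running such checks on every refresh means a broken source table is caught before it reaches the recommendation model.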
Retail Media:
An indispensable
asset for brands
While Retail Media represented only 9% of
digital media investments for brands in 2019, it
will soar to 43% of these investments in 2023
and is expected to double in value by 2024
to reach €100bn. Vincent Cailliot, Director
of Data Consulting and Sidney Zeder, Senior
Consulting Manager – Data Marketing, both
of Artefact, explore the opportunities of retail
media for Consumer Packaged Goods (CPG)
brands.
the sharing of personally identifiable consumer data at the individual level in an anonymous way.

A rapidly evolving ecosystem

The ecosystem of technology partners around Retail Media is highly fragmented and constantly evolving, with partners that are more or less specialized depending on major Retail Media activities: first-party, second-party or third-party cookie data collection tools, data processing and audience creation, activation or analysis, etc. The challenge for brands will be to identify which combination of technology partners will best meet their needs, depending on their current technical ecosystem and their own business challenges.

Retailers are also an essential part of this ecosystem, providing access to transactional data to build their Retail Media strategy. While the majority of retailers in the US have launched Retail Media offerings, most retailers in Europe are still in the experimentation and use case-testing phases; few have yet industrialized use cases with brands.

Valuable use cases beyond Retail Media

Retail Media allows brands to address marketing use cases from consumer insight generation to digital campaign activation and marketing performance measurement. The availability of transactional data (previously unavailable to CPG B2B2C brands) at the “individual” level enables the construction of insights and activation plans that are all the more impactful. The same data can be used to measure their effect on sales and ROI, enabling effective optimization of activation plans.

Retail Media is just the next step towards more collaboration between retailers and brands. In a long-term partnership perspective, collaboration and data sharing can enable the implementation of more advanced category management and supply chain use cases, such as the analysis of the long-term value of existing promotions or the prediction of in-store product demand to optimize supply chain operations.

Which Go-To-Market strategy to launch?

For retailers, it’s important to define a new offer to monetize their data. This can range from monetizing their owned media inventory (website), to sharing data “as a service” in a clean room, to offering services (campaign management or reporting as proposed by Amazon for example). These new offers can be marketed internally or via partners. The internal or external development strategy will determine the associated costs, in terms of salaried resources (commercial and technical profiles to be recruited) and technical resources (clean room tools, technical infrastructure to be set up).
CASE STUDY
UNILEVER
How does Artefact support Unilever
on Retail Media use cases to increase
its sales?
CHALLENGES
“Retail Media is a win/win strategy for brands and
retailers. Retailers’ data allows us to enrich the
shopper’s knowledge and accurately measure
our activities on all channels, throughout the
transformation tunnel. For their part, retailers find
a new source of revenue and differentiation from
their competitors. In addition, it is a way to better
satisfy their clients with more personalized offers
and a better anticipation of stock shortages.”
Sarah Baqa
Head of Performance Marketing - Unilever
SOLUTION
RESULTS
Gaétan Bélan
Senior Data Consultant & Product Owner
ARTEFACT

Sidney Zeder
Senior Consulting Manager
ARTEFACT

Retail media has been on the rise on digital platforms for the last six years, most notably on Amazon. The Covid-19 crisis accelerated this trend for traditional retailers. Retail media, in simple terms, is the means for retailers to sell media inventory on their e-commerce platforms. Because the Covid-19 pandemic fueled the shift to digital ways of buying, such as e-commerce or click-and-collect, even for grocery shopping, retailers had no choice but to go with the flow.

In fact, between 2019 and 2020, CPG e-commerce penetration increased by five points, from 10 to 15 percent. For retailers, the downside is that margins are lower in e-commerce than in brick-and-mortar. The upside is that by selling online, they collect a lot of consumer data that can be monetized or used to create new services. In a Goldman Sachs study, 82% of CPG companies surveyed said they were already investing in at least one retail media platform. This represents approximately 17% of digital budgets already allocated to retail media.

Media investment has effectively shifted down the marketing funnel. Although many brands’ search investments still flow into the “Google family,” we’re seeing brands diversify their digital spend into e-commerce platforms to capitalize on the “search destination” status they hold. When you’re on Amazon as a consumer, you’re very close to the “moment of truth”: you’re in a purchasing mindset. Therefore, when you’re on Amazon as a brand or product, the closer you can get to that funnel, the better. Goldman Sachs expects this trend to translate into a 6-8 percent increase in total CPG e-commerce sales through retail media over the next four years.
Retailer data monetization opportunities with CPG brands

This close-to-the-funnel media investment trend has created opportunities for retailers around three types of data monetization with CPG brands:

1. Inventory monetization: traditional retail media consisting of selling media inventory on proprietary assets. This can be offline inventory – retailers have long monetized access to their customers by offering brands coupons or specific promotions in their stores – but also their online inventory on their own platforms, such as their e-commerce website, where brands can display banner ads, emails or even shopping mobile applications to deliver personalized promotions to their customers.

2. Data monetization: retailers are monetizing existing consumer data across CPG brands to support their customer centricity. The first-party (1P) data shared by retailers originates from their loyalty program. The cardholder data they share can be socio-demographic (e.g. the age of their consumers), transactional (e.g. what did they buy), behavioral (e.g. what did they look at), loyalty data (e.g. did they buy again), etc. This data is shared “as-a-service” in a data clean room where brands can access the retailer’s data in a secure environment to carry out specific use cases defined by the two partners.

Carrefour, for example, has created a consumer intelligence service called Carrefour Links, based on the LiveRamp clean room, where partners can access their cardholder data. This is a self-service platform that allows users to perform basic activities such as reconciling retailer and brand databases on individual customers to build a more complete view of the consumer and thus improve their experience. It also provides analytics and measurement capabilities that Carrefour can bill to its partners.

Access to this data can unlock three types of use cases for brands:

• Marketing: the data shared by retailers allows brands to gain insights about their consumers, activate them with media, or measure marketing performance through transactional data. For example, an ice cream brand partnered with a retailer to build advanced audiences for a digital marketing campaign. Using the retailer’s transactional data, the brand was able to build and activate two audiences: the brand’s current ice cream buyers and ice cream buyers from competitor brands. As a result, the brand was able to increase the uplift of its campaigns by targeting the two relevant audiences with adapted messaging.

• Trade: the data shared by retailers allows brands to perform revenue growth management use cases by better optimizing promotions or assortment… It also unlocks store optimization use cases through enhanced in-store experience or sales force optimization. For example, one brand worked with a retailer to analyze the short- and long-term impact of promotions on incremental margin. This enabled them to identify certain types of promotions that were margin-destroying for both the brand and the retailer, as opposed to those that generated a positive long-term business impact.

• Operations: the data shared by retailers allows brands to optimize
CASE STUDY
CHALLENGES
SOLUTION
In 2020, Cdiscount cemented its position as the number one French e-retailer, generating 25 million unique visitors and 10 million customers (adding a million new buyers during the year).

Cdiscount’s vision of its retail media ecosystem covers the entire sales funnel and encompasses different steps (brand awareness, consideration, traffic, acquisition, and insights).

In February 2020, relevanC introduced a self-serve platform, relevanC Advertising Platform, built entirely in-house, that offers retail media solutions through search and display on Cdiscount.

“relevanC Advertising is a best-in-class tool to operate and manage retail media campaigns,” says Maïana Darmendrail – Digital Manager & E-Retail Manager, Mattel France

“CDiscount possesses very mature retail media solutions in search, display and video,” states Thomas Faure – E-Retail Lead, Artefact

Artefact made the decision to run a year-round online campaign aimed at buyer intent. Artefact defined custom audiences from precise shopper data items (purchase history, buyer intent, search history, browsing history, income level, etc.). The key to increasing the performance of media investments was to conduct continuous optimizations on both retail and media KPIs.

“We have the intimate conviction that continuous optimization brings performance improvements,” explains Cédric Chamoux – Retail Media Director, relevanC Advertising

• Optimizations were made on several factors: channels (onsite, offsite, search, display), segments (audience, keywords), creatives, formats.

• Optimizations were based on several metrics: media metrics, business metrics, retail metrics (stock level, promotions, organic positioning).

• Optimizations happened on different solutions: search advertising, daily improvements on keyword selection, display advertising, bi-weekly improvements on impact measurement.

In addition to leveraging the e-retailer ecosystem, Artefact kept looking for ways to innovate and test new solutions to build more expertise and scale projects. In that way, Artefact conducted off-platform campaigns through Shopping for Partners solutions (Google for Retail) that were activated to support milestones, such as product launches.

RESULTS

Ad campaigns six times more effective!

Mattel achieved extremely positive results through retail media activations in 2020:
• 600% increase on return on ad spend for media investments using retail media data (compared to traditional campaigns)
• from 4 up to 10 euros earned when spending one euro on display and search ad campaigns

“relevanC Advertising has the most qualitative data that you can find on the market. The daily optimization by Artefact really moved the needle,” mentions Maïana Darmendrail – Digital Manager & E-Retail Manager, Mattel France

Brands are able to find success when mixing business metrics with media expertise, i.e. sync retail signals (stock level, promotion level) with media KPIs to better optimize advertising campaigns.
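Return on ad spend (ROAS), the metric quoted in the results above, is simply the revenue attributed to a campaign divided by what the campaign cost. The short sketch below shows that arithmetic together with a helper for expressing the change between two campaigns. The function names and the figures in the usage example are hypothetical and do not reproduce Mattel’s campaign data.

```python
def roas(revenue_eur, ad_spend_eur):
    """Return on ad spend: euros of revenue per euro of advertising."""
    if ad_spend_eur <= 0:
        raise ValueError("ad_spend_eur must be positive")
    return revenue_eur / ad_spend_eur


def percent_increase(new_value, baseline):
    """Relative change from baseline to new_value, in percent."""
    return (new_value - baseline) / baseline * 100


if __name__ == "__main__":
    # Hypothetical campaigns, for illustration only
    search_roas = roas(revenue_eur=4000, ad_spend_eur=1000)
    display_roas = roas(revenue_eur=6000, ad_spend_eur=1000)
    print(f"Search ROAS: {search_roas:.1f}, display ROAS: {display_roas:.1f}")
    print(f"Display vs search: {percent_increase(display_roas, search_roas):+.0f}%")
```

Tracking ROAS alongside retail signals such as stock level makes it possible to pause spend on products that cannot be delivered, which is the kind of combined optimization described above.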
CASE STUDY
CARREFOUR GROUP
How to reduce food waste
in the bakery-pastry department?
CHALLENGES
SOLUTION
Close collaboration
with the “field” teams
From an organizational point of view, the project was
led by multidisciplinary teams. Two teams on the
Artefact and Carrefour sides combined technical and
business profiles. The operational skills of Carrefour’s
retail professionals played a crucial role. They were
able to explain their business, their needs, and bring
their vision, in order to guarantee the success and
adoption of the solution “in real life”.
RESULTS
Carrefour Group is
multiplying AI use cases
to improve customer
experience
For the group, future experiments follow the same model: responding to business needs, working jointly with operational teams, to feed the customer experience. This acceleration of Carrefour’s digital transformation was made possible by the creation of complete and expert data teams within the company, and the deployment of data platforms in all countries where the group operates.
The volume and wealth of data collected by Carrefour provides a unique opportunity to explore the major challenges facing the retail sector: omnichannel, e-commerce, anticipation of consumer habits, etc.
Carrefour recently unveiled other examples of how data can be used to improve the customer experience: five-minute shopping on Carrefour.fr, the implementation of personalized assortments for local stores, and the personalization of promotions.
In fact, over the last five months of 2021, approximately 100 tons of pastries were saved. At the same time, sales have increased due to fewer shortages at the end of the day.
Using Machine Learning to predict sales in retail.
is to go beyond past sales to predict future sales accurately
Jérôme Petit
Managing Partner Retail & eCommerce
ARTEFACT
All industries aim to manufacture just the right amount of products at the right time. But for retailers, this issue is even more important as they also need to manage their stocks efficiently. Too many items in stock is bad. Too few items in stock is also bad. And to predict sales as closely as possible, retailers used to rely only on previous years’ sales. This method is useful only up to a point and suffers from many biases. Thankfully, Machine Learning has now evolved to provide very accurate predictive models using different signals based on how they influence purchases.
Managing orders and inventory is one of the strongest competitive advantages that can help retailers achieve success. And it is a real challenge to master, as it involves processing a huge number of SKUs, some of them perishable, ordered daily. We estimate that bad inventory management, whether it’s out-of-stock items or excess stock, costs US retailers close to two billion dollars per year. For decades, retailers have been relying on the analysis of their past sales for their Enterprise Resource Planning (ERP), which helps them reduce their investment and operating costs. However, these methods are heavily biased and are not that useful when trying to predict sales accurately.
Dozens of signals to take into account when assessing stock levels
The reason why predicting sales is so complex and difficult is that, in a given period, many factors can affect purchases: weather, purchasing trends, regulation, product launches, a global pandemic, buying behaviors... And
the main issue with these types of predictions based on past records is that they don’t factor in individual incidents, and they make monthly sales appear as if they were evenly distributed when they probably were not. In fact, an out-of-stock item might have caused a slowdown in sales of a particular product or category, but it won’t show in the monthly reports. Even worse, bad numbers are viewed as a sign of buyers’ disinterest, when they could mean the opposite: consumers over-purchase an item and cause it to sell out. It is also important to note that a missing product in store doesn’t necessarily mean that the product is out of stock. Big box retailers struggle to restock their shelves in real time, so a product that becomes instantly popular might disappear from the shelves and thus perform worse than expected when, in fact, it is available in inventory. Retailers need technology that can help them step into a new paradigm that could seamlessly align supply and demand.
Using Machine Learning to
help employees in stores
and prioritize machine-learning-based approaches over traditional forecasting ones, to consider this information when training the model.
Image 2
Two main challenges:
intermittent values and an
extended prediction horizon
At this stage, you might think that
it is a really common forecasting
problem. You’re right and that’s why
it is interesting: it can relate to a wide
range of other projects, even if each
industry has its own characteristics.
However, this challenge has 2
important specificities that will make
the task more difficult than expected.
The first one is that the time series we are working with have a lot of intermittent values, i.e. long periods of consecutive days with no sales, as illustrated on the plot below. This could be due to stock-outs or limited shelf space in stores. In any case, this complicates the task, since the error will skyrocket if sales are predicted at a regular level while the product is off the shelves (cf. image 2).
The second one comes from the task itself, and more precisely from the size of the prediction horizon. Competitors are required to generate forecasts not only for the next week, but for a 4-week period. Would you rather rely on the weather forecast for the next day or for 1 month from now? The same goes for sales forecasting: an extended prediction horizon makes the problem more complex as uncertainty increases with time.
Feature engineering: modeling sales’ driving factors
Now that we have understood the task at hand, we can start to compute features modeling all the phenomena that might affect sales evolution. The objective here is to describe each triplet Day x Product x Store by a set of indicators that capture the effects of factors such as seasonality, trends or pricing.
SEASONALITY
Rather than using the sales date directly as a predictor, it is usually more relevant to decompose it into several features to characterize seasonality: year, month, week number, day of the week… The latter is particularly insightful because the problem has a strong weekly periodicity: sales volumes are bigger on the weekends, when people spend more time in supermarkets.
Calendar events such as holidays or NBA finals also have a strong seasonal impact. One feature has been created for each event, with the following values:
• Negative values for the 15 days before the event (-15 to -1)
• 0 on the D-day
• Positive values for the 15 days following the event (1 to 15)
• No value on periods more than 15 days away from the event
The idea is to model the seasonal impact not only on the D-day, but also before and after. For example, a product that will be offered a lot as a Christmas present will experience a sales peak on the days before and a drop right after.
TRENDS
Recent trends also provide useful information on future sales and are modeled thanks to lag features. A lag is the value of the target variable shifted by a certain period. For any specific item in a given store, the 1-week lag value would be the sales made one week ago for this particular item and store. Different shift values can be considered, and the average of several lags is computed as well, to get more robust predictors. Lags can also be calculated on aggregated sales to capture more global trends, for example at the store level or at the product category level.
PRICING
A product’s price can change from one store to another, and even from one week to another within the same store. These variations strongly influence sales and should therefore be described by some features. Rather than absolute prices, relative price differences between relevant products are more likely to explain sales evolutions. That’s why the following predictors have been computed:
• Relative difference between the current price of an item and its historical average price, to highlight promotional offers’ impact.
• Price relative difference with the same item sold in other stores, to understand whether or not the store has an attractive price.
• Price relative difference with
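The seasonality, trend and pricing features described above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the code used in the project: the function names, the dictionary-based sales series and the toy values are all assumptions made for the example. Only the 15-day event window and the definitions of the features come from the text.

```python
from datetime import date, timedelta
from statistics import mean


def calendar_features(day):
    """Decompose a date into simple seasonality features."""
    return {
        "year": day.year,
        "month": day.month,
        "week": day.isocalendar()[1],
        "day_of_week": day.weekday(),  # 0 = Monday ... 6 = Sunday
    }


def event_distance(day, event_day, window=15):
    """Signed number of days to an event: negative before, 0 on the D-day,
    positive after, and None (no value) outside the window."""
    delta = (day - event_day).days
    return delta if abs(delta) <= window else None


def lag(series, day, shift):
    """Sales observed `shift` days before `day`, or None if unknown.
    `series` is a hypothetical {date: sales} mapping for one item and store."""
    return series.get(day - timedelta(days=shift))


def mean_of_lags(series, day, shifts):
    """Average of several lags, ignoring the ones that are missing."""
    values = [v for v in (lag(series, day, s) for s in shifts) if v is not None]
    return mean(values) if values else None


def relative_diff(value, reference):
    """Relative difference of `value` with respect to `reference`."""
    return (value - reference) / reference


# Toy usage with made-up numbers.
christmas = date(2021, 12, 25)
start = date(2021, 12, 1)
sales = {start + timedelta(days=i): float(i) for i in range(20)}
today = start + timedelta(days=10)

features = calendar_features(today)
features["days_to_christmas"] = event_distance(today, christmas)
features["lag_7"] = lag(sales, today, 7)
features["mean_lag_7_8"] = mean_of_lags(sales, today, [7, 8])
features["price_vs_history"] = relative_diff(8.0, 10.0)  # current vs average price
```

The same `relative_diff` helper would serve for the comparison with the price of the same item in other stores, by passing that price as the reference.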
has to be encoded into features to help the model leverage the dataset hierarchy. One-hot encoding is not an option here because some of these categorical variables have a very high cardinality (3049 distinct products). Instead, we have used an ordered target encoding, which means that each observation is encoded by the average sales of past observations having the same categorical value. The dataset is ordered by time for this task to avoid data leakage.
All categorical variables and some of their combinations have been encoded with this method. This results in very informative features, the best one being the encoding of the product and store combination. If you wish to experiment with other encoders, a wide range of methods is available.
Tweedie loss to handle intermittent values
Different possible strategies can be used to deal with the intermittent values issue. Some participants decided to create 2 separate models: one to predict whether or not the product will be available on a specific day, and a second one to forecast sales. Like many others, we have chosen another option, which is to rely on an objective function adapted to the problem: the Tweedie loss.
Without going into the mathematical details, let’s try to understand why this loss function is appropriate for our problem, by comparing the sales distribution in the training data and the Tweedie distribution (cf. image 3).
They look quite similar and both have values concentrated around 0. Setting the Tweedie loss as an objective function will basically force the model to maximize the likelihood of that distribution and thus predict the right amount of 0s. Besides, this loss function comes with a parameter, whose values range from 1 to 2, that can be tuned to fit the distribution of the problem at hand (cf. image 4).
Image 4
Based on our dataset distribution, we can expect the optimal value to be between 1 and 1.5, but to be more precise we will tune that parameter later with cross-validation. This objective function is also available for other gradient boosting models such as XGBoost or CatBoost, so it’s definitely worth trying if you’re dealing with intermittent values.
How to forecast 28 days in advance? Making the most of lag features
As explained above, lag features are sales shifted by a given period of time. Thus, their values depend on where you stand in the forecasting horizon. The sales made on a particular day D can be considered as a 1-day lag if you’re predicting one day ahead, or as a 28-day lag if you’re predicting 28 days ahead. The following diagram illustrates this point (cf. image 5).
This concept is important to understand what features will be available at prediction time. Here, we are on day D and we would like to forecast sales for the next 28 days. If we want to use the same model (and thus the same features) to make predictions for the whole forecasting horizon, we can only use lags that are available to predict all days between D+1 and D+28. This means that if we use the 1-day lag feature to train the model, that variable will also have to be filled for predictions at D+2, D+3, … and D+28, whereas it refers to dates in the future.
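The ordered target encoding described earlier in this section can be sketched as follows. This is a minimal illustration with the standard library only: the function name, the row layout and the toy data are assumptions, and encoding the first occurrence of a category as `None` is one possible convention among others (a global prior is a common alternative).

```python
from collections import defaultdict


def ordered_target_encode(rows, key, target):
    """Encode each row by the mean target of *earlier* rows sharing the same
    key. `rows` must already be sorted by time. The first occurrence of a
    key has no history and is encoded as None."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    encoded = []
    for row in rows:
        k = key(row)
        encoded.append(sums[k] / counts[k] if counts[k] else None)
        # Update the running statistics only after encoding the row,
        # so that a row never sees its own target value (no leakage).
        sums[k] += target(row)
        counts[k] += 1
    return encoded


# Toy data: the product and store combination is used as the category.
rows = [
    {"day": 1, "item": "A", "store": "S1", "sales": 2},
    {"day": 2, "item": "A", "store": "S1", "sales": 0},
    {"day": 3, "item": "A", "store": "S1", "sales": 4},
    {"day": 3, "item": "B", "store": "S1", "sales": 1},
]
rows.sort(key=lambda r: r["day"])  # time ordering is what prevents leakage
item_store_encoding = ordered_target_encode(
    rows,
    key=lambda r: (r["item"], r["store"]),
    target=lambda r: r["sales"],
)
```

Because the statistics are updated row by row, the encoding of a given day only depends on strictly earlier rows, which mirrors the time ordering mentioned in the text.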
Still, lags are probably the features
Image 3
Image 5
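To make the Tweedie objective discussed earlier in this section more concrete, here is a minimal sketch of the loss for a single observation. It assumes the standard Tweedie negative log-likelihood for a variance power p strictly between 1 and 2, with the terms that do not depend on the prediction dropped. The function name and the default value of p are illustrative assumptions, not the settings used in the project. Gradient boosting libraries ship this as a built-in objective (for example `objective="tweedie"` with `tweedie_variance_power` in LightGBM), so in practice there is no need to code it by hand.

```python
import math


def tweedie_loss(y, mu, p=1.2):
    """Tweedie negative log-likelihood, up to a term that only depends on y.

    y  : observed sales (>= 0, exact zeros allowed)
    mu : predicted mean (> 0)
    p  : variance power, 1 < p < 2 (the tunable parameter from the text)
    """
    if not 1.0 < p < 2.0:
        raise ValueError("this sketch only covers 1 < p < 2")
    if mu <= 0:
        raise ValueError("the predicted mean must be strictly positive")
    return -y * mu ** (1.0 - p) / (1.0 - p) + mu ** (2.0 - p) / (2.0 - p)


# For a given observation, the loss is lowest when the prediction equals it.
grid = [i / 10 for i in range(1, 101)]
best_mu = min(grid, key=lambda mu: tweedie_loss(3.0, mu))

# Zero sales are handled without any special case: the loss stays finite
# and simply pushes the prediction towards 0.
loss_at_zero = [tweedie_loss(0.0, mu) for mu in (0.1, 1.0, 5.0)]
```

The derivative of this loss with respect to `mu` is `mu ** (-p) * (mu - y)`, which is why the minimum sits at `mu == y` and why exact zeros pose no numerical problem.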
This method allows us to better capitalize on lag information for the first 3 weeks and thus improved our
• Model 2 makes forecasts for days 8–14, relying on all lags except the 13 most recent ones.
• Model 4 makes forecasts for days 22–28, relying on all lags except the 27 most recent ones just like in option 1.
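The rule behind these bullets is that a lag can only be used by a model if it is already known on day D for every day that model forecasts. A minimal sketch of that rule follows. The function names and the list of candidate lags are assumptions, and the windows for models 1 and 3 are inferred from the pattern of the two bullets above rather than stated in the text.

```python
def usable_lags(candidate_lags, last_horizon_day):
    """Lags (in days) that are known on day D for every target day up to
    D + last_horizon_day: the lag must be at least as long as the horizon."""
    return [lag for lag in candidate_lags if lag >= last_horizon_day]


def weekly_models(horizon=28, block=7):
    """Map each model number to the (first_day, last_day) it forecasts."""
    return {
        number: (first, min(first + block - 1, horizon))
        for number, first in enumerate(range(1, horizon + 1, block), start=1)
    }


candidate_lags = [1, 7, 14, 21, 28, 35]  # hypothetical feature set

# A single model for the whole horizon can only rely on lags of 28+ days.
single_model_lags = usable_lags(candidate_lags, 28)

# One model per week: each model keeps the lags known for its own window.
lags_per_model = {
    number: usable_lags(candidate_lags, last_day)
    for number, (first_day, last_day) in weekly_models().items()
}
```

With this split, the model in charge of days 8 to 14 keeps the 14-day lag, which a single 28-day model would have to discard.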
combination with the lowest error over the 5 folds to train the final model.
Results
The different techniques mentioned above allowed us to reach a 0.59 weighted RMSSE (the metric used on Kaggle), which is equivalent to a weighted forecast accuracy of 82.8%. The chart below sums up the incremental performance generated by each step.
Image 9
These figures are indicative: the incremental accuracy also depends on the order in which each step is implemented.
Key takeaways
We have learned a lot from this challenge thanks to participants’ shared insights and we hope it gave you food for thought as well. Here are our key takeaways:
• Work on a small but representative subset of data to iterate quickly.
• Be super careful about data leakage in the feature engineering process: make sure that all the features you compute will be available at prediction time.
• Select a model architecture that allows you to leverage lags as much as possible, but also keep in mind complexity considerations if you’re willing to go to production.
• Set up a cross-validation strategy adapted to your business problem to correctly evaluate your experiments’ performance.
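As an illustration of the last takeaway and of the RMSSE metric quoted in the results, the sketch below combines time-ordered validation folds with an unweighted RMSSE for one series. It is a simplified example, not the evaluation code used in the project: the fold layout (5 consecutive 28-day validation windows), the seasonal naive forecast and the toy series are all assumptions, and the weighting across series used in the competition is left out.

```python
import math


def rmsse(train, actual, forecast):
    """Root Mean Squared Scaled Error for one series: the forecast MSE is
    scaled by the in-sample MSE of the one-step naive forecast."""
    if len(actual) != len(forecast):
        raise ValueError("actual and forecast must have the same length")
    scale = sum((b - a) ** 2 for a, b in zip(train, train[1:])) / (len(train) - 1)
    mse = sum((y - f) ** 2 for y, f in zip(actual, forecast)) / len(actual)
    return math.sqrt(mse / scale)


def rolling_origin_folds(n_days, horizon=28, n_folds=5):
    """Yield (train_end, valid_end) index pairs, oldest fold first. Each fold
    validates on the `horizon` days that follow its training window."""
    for k in range(n_folds, 0, -1):
        valid_end = n_days - (k - 1) * horizon
        train_end = valid_end - horizon
        if train_end < 2:
            raise ValueError("not enough history for this many folds")
        yield train_end, valid_end


def seasonal_naive(train, horizon, period=7):
    """Repeat the last observed week over the forecast horizon."""
    last_period = train[-period:]
    return [last_period[h % period] for h in range(horizon)]


# Toy series with a perfect weekly pattern, so the baseline scores 0.
series = [day % 7 for day in range(200)]
folds = list(rolling_origin_folds(len(series)))
scores = [
    rmsse(series[:train_end], series[train_end:valid_end],
          seasonal_naive(series[:train_end], valid_end - train_end))
    for train_end, valid_end in folds
]
```

Validating on windows that have the same length as the real forecasting horizon, and that always come after the training data, keeps the experiment scores comparable to what the model will face in production.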
WE OFFER END-TO-END
DATA & AI SERVICES
TRAVEL & TOURISM • FMCG • SPORTS & ENTERTAINMENT • BANKING & INSURANCE • RETAIL • LUXURY & COSMETICS
HEALTHCARE • eCOMMERCE • TELECOMMUNICATIONS • MANUFACTURING & UTILITIES • REAL ESTATE • PUBLIC & GOVERNMENT
CONTACT
artefact.com/contact-us
ARTEFACT HEADQUARTERS
19, rue Richer
75009 — Paris
France
artefact.com