
After-Class Reading: MSOM 2020 JD Data

This document describes transaction-level data from JD.com, China's largest retailer, for over 2.5 million customers and 31,868 products over March 2018. The data aims to support research on e-commerce and supply chain operations. Potential research questions suggested by JD.com include examining product attributes that predict customer choice, the impact of pricing strategies on sales, and how to improve sales performance for specific customer segments.


This article was downloaded by: [106.120.213.11] On: 22 February 2022, At: 20:10


Publisher: Institute for Operations Research and the Management Sciences (INFORMS)
INFORMS is located in Maryland, USA

Manufacturing & Service Operations Management


Publication details, including instructions for authors and subscription information:
http://pubsonline.informs.org

JD.com: Transaction-Level Data for the 2020 MSOM Data Driven Research Challenge
Max Shen, Christopher S. Tang, Di Wu, Rong Yuan, Wei Zhou

To cite this article:


Max Shen, Christopher S. Tang, Di Wu, Rong Yuan, Wei Zhou (2020) JD.com: Transaction-Level Data for the 2020 MSOM Data Driven Research Challenge. Manufacturing & Service Operations Management. Published online in Articles in Advance 09 Dec 2020. https://doi.org/10.1287/msom.2020.0900

Full terms and conditions of use: https://pubsonline.informs.org/Publications/Librarians-Portal/PubsOnLine-Terms-and-Conditions

This article may be used only for the purposes of research, teaching, and/or private study. Commercial use
or systematic downloading (by robots or other automatic processes) is prohibited without explicit Publisher
approval, unless otherwise noted. For more information, contact [email protected].

The Publisher does not warrant or guarantee the article’s accuracy, completeness, merchantability, fitness
for a particular purpose, or non-infringement. Descriptions of, or references to, products or publications, or
inclusion of an advertisement in this article, neither constitutes nor implies a guarantee, endorsement, or
support of claims made of that product, publication, or service.

Copyright © 2020, INFORMS


With 12,500 members from nearly 90 countries, INFORMS is the largest international association of operations research (O.R.)
and analytics professionals and students. INFORMS provides unique networking and learning opportunities for individual
professionals, and organizations of all types and sizes, to better understand and use O.R. and analytics tools and methods to
transform strategic visions and achieve better outcomes.
For more information on INFORMS, its publications, membership, or meetings visit http://www.informs.org
MANUFACTURING & SERVICE OPERATIONS MANAGEMENT
Articles in Advance, pp. 1–9
http://pubsonline.informs.org/journal/msom ISSN 1523-4614 (print), ISSN 1526-5498 (online)

Data Driven Challenge

JD.com: Transaction-Level Data for the 2020 MSOM Data Driven Research Challenge
Max Shen,a Christopher S. Tang,b Di Wu,c Rong Yuan,d Wei Zhouc
a Department of Industrial Engineering and Operations Research, University of California, Berkeley, Berkeley, California 94720; b UCLA Anderson School of Management, University of California, Los Angeles, Los Angeles, California 90095; c JD.com American Technologies Corporation, Mountain View, California 94043; d Stitch Fix, San Francisco, California 94104
Contact: [email protected], https://orcid.org/0000-0003-4538-8312 (MS); [email protected] (CST); [email protected] (DW); [email protected] (RY); [email protected] (WZ)

Received: September 10, 2019
Revised: November 5, 2019
Accepted: December 20, 2019
Published Online in Articles in Advance: December 9, 2020
https://doi.org/10.1287/msom.2020.0900
Copyright: © 2020 INFORMS

Abstract. To support the 2020 MSOM Data Driven Research Challenge, JD.com, China’s largest retailer, offers transaction-level data to MSOM members for conducting data-driven research. This article describes the transactional data associated with over 2.5 million customers (457,298 made purchases) and 31,868 stock keeping units (SKUs) over the month of March in 2018. We also present potential research questions suggested by JD.com. Researchers are welcome to develop econometric models or data-driven models using this database to address some of the suggested questions or examine their own research questions.

History: Gad Allon served as guest editor-in-chief for this article.

Keywords: e-commerce • transactional data • MSOM society • data-driven research

1. Introduction

The growth of e-commerce retailing (or E-tailing) has given rise to many new and challenging problems at both strategic and operational levels. To encourage operations management (OM) researchers to conduct data-driven research in E-tailing, we are collaborating with JD.com and the MSOM society to run a research competition based on their proprietary data. This competition is intended to enable researchers to examine research questions arising from customer purchasing decisions and supply chain operations in the context of E-tailing.

JD.com is China’s largest retailer with a net revenue of US$67.2 billion in 2018 and over 320 million annual active customers. According to JD.com,

[It] is committed to providing only high-quality, authentic products, and is known for its fast delivery speed. JD.com sets the standard for online shopping through its commitment to quality, authenticity, and its vast product offering covering everything from fresh food and apparel to electronics and cosmetics. JD.com combines its first-party business model, where it controls the entire supply chain, with a marketplace that intentionally limits the number of sellers, to ensure that it can maintain strict quality oversight. JD.com has a nationwide fulfillment network that covers 99% of China’s population, and is able to provide standard same- and next-day delivery for approximately 90% of orders.

The data sets provided by JD.com capture a “full customer experience cycle” that begins as soon as a customer begins browsing on the platform and ends when the customer receives the delivered products. The data set describes 2.5 million customers (457,298 made purchases) and 31,868 stock keeping units (SKUs; from one product category) during the month of March in 2018.

Based on our discussion with the management of JD.com, we developed the following set of research questions. We encourage researchers to explore the provided data and develop innovative solutions to address the following problems (or other research problems of their own choosing):

1. Which product attributes and/or features have predictive power about the customer’s product choice? Does this product choice differ by channel (e.g., purchasing via mobile phones vs. personal computers), region, and brand loyalty?
2. Would more products with similar attributes and features improve or hinder sales revenues for JD.com?
3. For a specific target customer segment (e.g., female customers in tier 1 cities), what should merchants and brands do to improve their sales performance?
4. What is the impact of various pricing and promotion strategies on product sales? How should JD.com improve its pricing and promotion strategy?

Shen et al.: MSOM Data Driven Research Competition
2 Manufacturing & Service Operations Management, Articles in Advance, pp. 1–9, © 2020 INFORMS

Table 1. Description of the skus Table

Field            Data type  Description                                     Sample value
sku_ID           string     Unique identifier of a product                  b4822497a5
type             int        1P or 3P SKU                                    1
brand_ID         string     Brand unique identification code                c840ce7809
attribute1       int        First key attribute of the category             3
attribute2       int        Second key attribute of the category            60
activate_date    string     The date at which the SKU is first introduced   2018-03-01
deactivate_date  string     The date at which the SKU is terminated         2018-03-01

In particular, among all the promotion methods (e.g., direct discounts, bundle discounts, and volume discounts), which one is more effective?
5. Do ordinary customers behave differently from JD.com’s PLUS members?1 How should JD.com improve its pricing and shipping strategy for its PLUS members?
6. How should JD.com improve its demand forecast accuracy for different geographic regions and different customer groups?
7. How should JD.com improve its fulfillment efficiency and customer experience with better inventory allocation strategies in a multilevel inventory network?

2. Data Description

We now describe the transaction-level data provided by JD.com. (We shall explain how to download the database in Section 4.) To ensure confidentiality, certain key identification information such as user ID and SKU ID are anonymized.2

To keep the size of the database more manageable, the database does not contain impression data, especially when JD.com may not have complete impression data from other channels (search, push notification, SMS messages, social media (e.g., WeChat), mobile ads, etc.). However, our data contain all product detail page click events for each customer, which can serve as a proxy. Instead, JD.com provides us with transaction-level data for the month of March 2018 during which there were no major holidays or promotions.3 Hence, the March data can be viewed as a baseline, and researchers should be careful about extrapolating their results.

In the database, each SKU can be identified either as first-party owned (1P) or third-party owned (3P), depending on the ownership of the inventory of that SKU.4 All 1P SKUs are managed by JD.com, including product assortments, inventory replenishments, product pricing, order deliveries, and after-sale customer services. Despite different operations, 1P and 3P SKUs compete on the JD.com platform for sales through different pricing strategies and marketing activities.

In general, 1P SKUs are usually top sellers within the category. By owning these 1P products, JD.com can fully control the entire customer experience to provide guaranteed quality, fast delivery, and good customer services. In contrast, all 3P SKUs are managed by third-party merchants on the JD marketplace. Specifically, to fulfill an order of a 3P SKU, the corresponding merchant can decide freely whether to use the logistics services provided by JD Logistics or other logistics service providers.5

The data sets provided by JD.com offer a detailed view of the activities associated with all SKUs within one anonymized consumable category during the month of March in 2018. This category could be beauty care (e.g., face moisturizers) or men’s grooming (e.g., electric shavers) or something else. Owing to confidentiality, the specific category is not disclosed.

The data set consists of seven tables that are labeled (1) skus, (2) users, (3) clicks, (4) orders, (5) delivery,

Figure 1. (Color online) Distribution of Attribute 1 Across All SKUs
Figure 2. (Color online) Distribution of Attribute 2 Across All SKUs
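
As a rough illustration of the 1P/3P split and of the attribute distributions shown in Figures 1 and 2, the sketch below tallies a few rows laid out like Table 1. Only the field names and the first row's values come from the paper; the other rows, the convention that a missing attribute is stored as an empty string, and the helper name summarize_skus are assumptions made for illustration.

```python
import csv
import io
from collections import Counter

# Rows in the layout of Table 1. The first row is the paper's sample row;
# the other two are invented for illustration.
SAMPLE_SKUS = """sku_ID,type,brand_ID,attribute1,attribute2,activate_date,deactivate_date
b4822497a5,1,c840ce7809,3,60,2018-03-01,2018-03-01
aaaa000001,2,c840ce7809,,80,,
aaaa000002,2,bbbb000001,3,,,2018-03-15
"""


def summarize_skus(csv_text):
    """Count SKUs by ownership type and tally attribute1, tracking missing values."""
    by_type = Counter()
    attribute1 = Counter()
    for row in csv.DictReader(io.StringIO(csv_text)):
        # type = 1 marks a 1P SKU; any other value is treated as 3P here.
        by_type["1P" if row["type"] == "1" else "3P"] += 1
        # Assumption: a missing attribute appears as an empty string.
        attribute1[row["attribute1"] or "missing"] += 1
    return by_type, attribute1


by_type, attribute1 = summarize_skus(SAMPLE_SKUS)
print(dict(by_type))
print(dict(attribute1))
```

Keeping "missing" as its own bucket, rather than dropping those rows, makes it easier to see how much of the category lacks attribute information before fitting any choice model.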

Table 2. Description of the users Table

Field              Data type  Description                                                                    Sample value
user_ID            string     User unique identification code                                                000000f736
user_level         int        User level                                                                     10
first_order_month  string     First month in which the customer placed an order on JD.com (format: yyyy-mm)  2017-07
plus               int        If user has a PLUS membership                                                  0
gender             string     User gender (estimated)                                                        F
age                string     User age range (estimated)                                                     26–35
marital_status     string     User marital status (estimated)                                                M
education          int        User education level (estimated)                                               3
purchase_power     int        User purchase power (estimated)                                                2
city_level         int        City level of user address                                                     1

(6) inventory, and (7) network. We now describe each of these seven tables.

2.1. Table 1: SKUs

The skus table (Table 1) describes the characteristics of all 31,868 SKUs that belong to a single product category receiving at least one click during March 2018.6 As such, researchers should not generalize their results to other product categories. We now define each field and provide a brief description. Each entry in the skus table corresponds to a unique SKU (sku_ID). In addition, each SKU ID is “seller-specific.” For example, an identical product that is sold by JD as a 1P product and by a third-party seller as a 3P product will be treated as two separate SKUs with different SKU IDs. Similarly, an identical product sold by multiple third-party sellers will be denoted by different SKU IDs.

Of these 31,868 SKUs, 1,167 of them are 1P SKUs (type value = 1) and the rest (30,701) are 3P SKUs (type value = 2). The brand information of each SKU is provided via the field (brand_ID). However, only 9,159 SKUs out of 31,868 were involved in purchase activities during March of 2018.

Each SKU also has two key attributes: the first attribute takes integer values between 1 and 4, and the second takes integer values between 30 and 100. For each attribute, a higher value indicates better performance of a certain functionality. For the face moisturizer category, these two attributes can be sun protection factor (SPF) and percentage of antiaging ingredients. Similarly, for the men’s electric shaver category, these two attributes can be the number of shaves per charge and the number of personalized shaving modes. Hence, both attributes characterize the functionality of a product so that products with the same attribute values have the same functionality. The distributions of the value associated with these two attributes across all SKUs are depicted in Figures 1 and 2. Notice that many SKUs have missing values for various reasons, including that (a) the third-party merchants did not provide the attribute value, especially for certain slow-moving items, or (b) a certain attribute was not applicable to certain SKUs.7

For each SKU, the skus table provides two extra elements: activate_date and deactivate_date. The former specifies the date at which a SKU is first introduced on the JD.com platform and the latter specifies the date at which the SKU is terminated and removed from JD.com.8 Note that the data set lists the activate_date and deactivate_date variables only when these dates fall in the month of March in 2018; thus, these variables are usually blank.

2.2. Table 2: Users

The users table (Table 2) describes the characteristics of all 457,298 users who purchased at least one of the SKUs in the given category during March of 2018.

Figure 3. (Color online) Distribution of Users: User Level
Figure 4. (Color online) Distribution of Users: Gender

Figure 5. (Color online) Distribution of Users: Age
Figure 7. (Color online) Distribution of Users: Education Levels

We now define each field and provide a brief description. Each entry in the users table corresponds to a unique customer (user_ID). The field first_order_month specifies the month when the user made his or her first purchase on JD.com.

For each repeat customer, the corresponding user is classified according to his or her past purchases so that the customer’s user_level takes on a value of 0, 1, 2, 3, or 4, where a higher user_level is associated with a higher total purchase value in the past. For users who are enterprise users (e.g., small shops in rural areas or small businesses), the corresponding user_level takes on a value of 10. However, for first-time purchasers, their user_level takes on the value −1.9 Figure 3 depicts the distribution of user levels for all 457,298 customers.

The next variable is plus. This variable equals 1 when the corresponding user is an existing PLUS member on February 28, 2018.10 (The variable plus is based on a snapshot on February 28, and we do not have information about the PLUS membership on a daily basis.) In addition to customer past purchase value and PLUS membership, the users table contains certain (estimated) user demographic information, because JD.com’s customers are not required to provide any demographic information when making a purchase. However, JD.com has a sophisticated data-driven artificial intelligence system to estimate user demographics.

The estimated user demographics for each user are (a) gender (F: female, M: male, U: unknown); (b) age (≤15: less than or equal to 15 years old, 16–25: 16 to 25 years old, 26–35: 26 to 35 years old, 36–45: 36 to 45 years old, 46–55: 46 to 55 years old, ≥56: greater than or equal to 56 years old, U: unknown); (c) marital_status, the user’s marital status (M: married, S: single, U: unknown); (d) education, the user’s education level (1: less than high school, 2: high school diploma or equivalent, 3: bachelor’s degree, 4: postgraduate degree, −1: unknown); and (e) purchase_power, the user’s estimated purchase power (ranging from 1 to 5 with 1 being the highest purchase power; −1 if there is no estimation).

In addition to those estimated demographics of each user, JD.com has provided actual information about the most commonly used shipping address for each user. This information is captured in the field city_level, which takes on values ranging between 1 and 5. JD.com developed its own classification scheme for different cities: level 1 corresponds to highly industrialized cities such as Beijing and Shanghai; level 2 cities correspond to provincial capitals; level 3–5 cities are smaller cities; if there are no data, then the value is −1. Notice that city_level is based on actual information.
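
The coded fields of the users table mix ordinary levels with sentinel values (−1 and 10), which is easy to mishandle if the codes are treated as plain numbers. The sketch below, offered only as one possible way to decode them, applies the code tables described above to the sample row of Table 2. The function names and the wording of the output labels are assumptions; the codes themselves are taken from the text.

```python
# Code tables transcribed from the description of the users table.
EDUCATION = {
    1: "less than high school",
    2: "high school diploma or equivalent",
    3: "bachelor's degree",
    4: "postgraduate degree",
    -1: "unknown",
}
MARITAL_STATUS = {"M": "married", "S": "single", "U": "unknown"}
GENDER = {"F": "female", "M": "male", "U": "unknown"}


def describe_user_level(level):
    """Label user_level: -1 is a first-time purchaser, 0-4 a repeat customer, 10 an enterprise user."""
    if level == -1:
        return "first-time purchaser"
    if level == 10:
        return "enterprise user"
    if 0 <= level <= 4:
        return f"repeat customer (level {level})"
    raise ValueError(f"unexpected user_level: {level}")


def describe_city_level(level):
    """Label city_level: 1 and 2 are the largest cities, 3-5 smaller ones, -1 means no data."""
    if level == -1:
        return "no data"
    if level == 1:
        return "level 1 (e.g., Beijing, Shanghai)"
    if level == 2:
        return "level 2 (provincial capital)"
    if 3 <= level <= 5:
        return f"level {level} (smaller city)"
    raise ValueError(f"unexpected city_level: {level}")


def decode_user(row):
    """Turn the coded fields of one users row into readable labels."""
    return {
        "user_level": describe_user_level(row["user_level"]),
        "plus_member": row["plus"] == 1,
        "gender": GENDER[row["gender"]],
        "marital_status": MARITAL_STATUS[row["marital_status"]],
        "education": EDUCATION[row["education"]],
        "city_level": describe_city_level(row["city_level"]),
    }


# Sample row from Table 2.
SAMPLE_USER = {
    "user_ID": "000000f736", "user_level": 10, "first_order_month": "2017-07",
    "plus": 0, "gender": "F", "age": "26–35", "marital_status": "M",
    "education": 3, "purchase_power": 2, "city_level": 1,
}

decoded = decode_user(SAMPLE_USER)
print(decoded)
```

Because user_level 10 marks enterprise users rather than a level above 4, treating the field as ordinal without separating out −1 and 10 first would distort any regression on it.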

Figure 6. (Color online) Distribution of Users: Marital Status
Figure 8. (Color online) Distribution of Users: Purchase Power Levels

Figure 9. (Color online) Distribution of Users: City Levels

Figure 4 depicts the distribution of user gender across all 457,298 customers in the database, and Figure 5 summarizes the distribution of estimated user age. As shown in Figures 4 and 5, for this specific product category, more than 60% of all customers are estimated to be female and the estimated ages of these customers are in their 30s to 40s. From Figure 6, we observe a relatively even distribution between married and single customers. Figures 7 and 8 provide the customer’s estimated education level and purchase power. Figure 9 summarizes the distribution of shipping address according to different city levels. It can be seen that most of the customers are from tier 1 and tier 2 cities.

2.3. Table 3: Clicks

The clicks table (Table 3) establishes the linkage between users and SKUs through their browsing history. Each entry in the clicks table represents a user’s “click event” on a specific SKU page.11 The data set contains over 20 million click records that are associated with the clicks of 2.5 million customers. Note that this table contains clicks contributed not only by the users identified in the users table (Table 2) who purchased at least one SKU but also by “other users” who did not end up completing a purchase order.

The records include the following: (a) the user who initiated a click event (user_ID), (b) the SKU associated with the click event (sku_ID), (c) the time at which the click event occurred (request_time), and (d) the channel in which the click event occurred (channel).12 We classify the channel taken as five string values: pc, mobile, app, wechat, and others. Channels pc and mobile are associated with clicks through web browsers on personal computers and mobile devices, respectively. Channel app corresponds to JD.com’s mobile app. Channel wechat corresponds to the miniprogram that runs on the social media app WeChat. Finally, channel others aggregates the clicks from all other channels. The distribution of all click events across all channels is summarized in Figure 10. Because of the popularity of smartphones in China and the popularity of mobile payment options (e.g., WeChat payment), the majority of click events come from the app and wechat channels.

The field request_time provides extra granularity. It can be used to infer the customer browsing sequence and habits. In Figure 11, we plot the number of clicks during the day on March 1, 2018, within the app channel. We can clearly identify two peaks in the daily browsing activities: one from 8 a.m. to 4 p.m. in the day and the other in the late evening.

2.4. Table 4: Orders

The orders table (Table 4) contains 486,928 unique customer orders associated with our focused product category that were placed during the month of March in 2018. Each customer order (order_ID) in the orders table is based on a specific SKU (sku_ID) associated with a unique customer (user_ID). (If a customer ordered multiple SKUs, then the same order_ID will appear in multiple rows of SKUs.)

Other pieces of information associated with a customer order as shown in Table 4 include (a) order quantity for each SKU associated with the order (quantity), (b) the date and time when the ordering event took place (order_date and order_time), (c) the type of SKU being ordered (type = 1 if it is a 1P SKU and type = 2 if it is a 3P SKU), and (d) the promised delivery time of the order (promise).13 Figure 12 demonstrates that most orders have promised delivery dates within two days. Figure 13 shows the total number of sales by date and by order type.

The orders table also offers information about product pricing and promotional activities for each SKU. For each entry, we denote the original list price of the SKU in the field original_unit_price and the actual paid price by the customer for the SKU as final_unit_price. The original list price of a SKU at any given time instant

Table 3. Description of the clicks Table

Field         Data type  Description                                                                              Sample value
sku_ID        string     SKU unique identification code                                                           b4822497a5
user_ID       string     User unique identification code                                                          94ff800585
request_time  string     The time at which the customer clicks the SKU item page (format: yyyy-mm-dd HH:MM:SS)    2018-03-01 23:57:53
channel       string     The click channel                                                                        wechat
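
The request_time and channel fields of Table 3 are enough to reproduce an hourly click profile of the kind plotted in Figure 11. The sketch below is a minimal illustration under stated assumptions: the first record is the sample row from Table 3, the other records are invented, and the function name clicks_per_hour is hypothetical.

```python
from collections import Counter
from datetime import datetime

# (sku_ID, user_ID, request_time, channel) records laid out like Table 3.
# The first record is the paper's sample row; the others are invented.
CLICKS = [
    ("b4822497a5", "94ff800585", "2018-03-01 23:57:53", "wechat"),
    ("b4822497a5", "aaaa000001", "2018-03-01 09:15:00", "app"),
    ("aaaa000002", "aaaa000001", "2018-03-01 09:40:10", "app"),
    ("aaaa000002", "aaaa000003", "2018-03-01 21:05:30", "pc"),
]


def clicks_per_hour(clicks, channel=None):
    """Count click events by hour of day, optionally restricted to one channel."""
    counts = Counter()
    for _sku_id, _user_id, request_time, click_channel in clicks:
        if channel is not None and click_channel != channel:
            continue
        hour = datetime.strptime(request_time, "%Y-%m-%d %H:%M:%S").hour
        counts[hour] += 1
    return counts


app_profile = clicks_per_hour(CLICKS, channel="app")
all_profile = clicks_per_hour(CLICKS)
print(dict(app_profile))
print(dict(all_profile))
```

Grouping the same records by user_ID and sorting by request_time would give each customer's browsing sequence, which is the other use of request_time mentioned in the text.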

Figure 10. (Color online) Distribution of All Click Events Across Different Channels
Figure 11. (Color online) Number of Click Events Occurring on March 1, 2018, Through JD.com’s App Channel

is the same for all customers, but the final price can vary among customers owing to various discounts or promotions.

The “gap” between the original price and the final price represents the coupons and discounts associated with different promotional activities for each SKU. There are four common types of promotional discounts on the JD.com platform:14

1. SKU direct discount: The seller of a SKU may offer a price cut in terms of a direct discount. This discount reflects the reduction in the list price as stated on the product detail page.
2. Group promotion: The seller of a SKU may offer a quantity discount to entice the customer to buy more. This quantity discount promotion can take different forms including “get an RMB 100 discount if buying over RMB 199” or “buy 3 and get 1 free.” We note that the quantity discount promotion is usually on the order level and we apply a simple allocation rule to calculate the contribution provided by each SKU in the order.
3. Bundle promotion: The seller may offer a bundle_discount if a customer buys a “prespecified bundle” of SKUs within an order.
4. Gift items: The seller may offer a SKU as a “free gift” (gift_item value = 1) if the customer purchases a “prespecified set” of SKUs (e.g., get a free eraser if you buy x pencils and y pads of paper). The final_unit_price for each gift item is always equal to 0.

Coupons can also be applied to the order after all other promotions are applied. In contrast to the four aforementioned promotion activities where discounts will be applied automatically once certain criteria are met, customers must “clip” (or claim) a coupon before making a purchase.15 The field coupon_discount (coupon_discount_per_unit in Table 4) records the coupon promotional value associated with an order. Similar to quantity discount as explained earlier, the discount value of the coupon is allocated between items in the same order using an allocation rule when necessary.

Note that, for each entry in the orders table, the gap between original_unit_price and final_unit_price should always equal the sum of the direct, quantity (group), bundle, and coupon discounts (direct_discount_per_unit, quantity_discount_per_unit, bundle_discount_per_unit, and coupon_discount_per_unit in Table 4).

Finally, for each order, we show from which district the order was shipped (dc_ori) and to which district the order was shipped (dc_des). The district here is defined by the warehouse ID that covers the demand of that district. In other words, one can think of dc_ori as the warehouse where the package is shipped from and dc_des as the warehouse that is nearest to the customer’s designated shipping address. If dc_ori and dc_des are the same, this means that the package is shipped from the warehouse closest to the customer. Otherwise, it indicates that the package is fulfilled by some other warehouse in a different district. We note that in theory any warehouse in the nationwide network can fulfill the order for any customer in the country. However, in practice, there is a complicated order fulfillment logic that determines what inventory should be used to fulfill each customer order to optimize fulfillment resources while satisfying delivery promise.

One can trace the shipping path and time of each order by using Tables 4 and 5.16 First, for each order denoted as order_ID, the orders table (Table 4) provides information about the “origin” warehouse that the order is shipped from (via the variable dc_ori) and the “destination” warehouse that the order is shipped to (via the variable dc_des). By using the information provided in Tables 4 and 5, one can trace the shipping path of each order.

2.5. Table 5: Delivery

The delivery table (Table 5) establishes the linkage between each order (order_ID) and (possibly) multiple shipping packages (i.e., multiple package_IDs) in the event that an order is split into multiple delivery packages for logistical reasons (e.g., an order that involves in-stock and on-order items). The delivery table contains records for orders delivered with JD Logistics, which represents the majority of 1P orders and some 3P orders. The orders that cannot find a

Table 4. Description of the orders Table

Field                       Data type  Description                                                                                             Sample value
order_ID                    string     Order unique identification code                                                                        3b76bfcd3b
user_ID                     string     User unique identification code                                                                         3cde601074
sku_ID                      string     SKU unique identification code                                                                          443fd601f0
order_date                  string     Order date (format: yyyy-mm-dd)                                                                         2018-03-01
order_time                  string     Specific time at which the order gets placed (format: yyyy-mm-dd HH:MM:SS)                              2018-03-01 11:10:40.0
quantity                    int        Number of units ordered                                                                                 1
type                        int        1P or 3P orders                                                                                         1
promise                     int        Expected delivery time (in days)                                                                        2
original_unit_price         float      Original list price                                                                                     99.9
final_unit_price            float      Final purchase price                                                                                    53.9
direct_discount_per_unit    float      Discount due to SKU direct discount                                                                     5.0
quantity_discount_per_unit  float      Discount due to purchase quantity                                                                       41.0
bundle_discount_per_unit    float      Discount due to “bundle promotion”                                                                      0.0
coupon_discount_per_unit    float      Discount due to customer coupon                                                                         0.0
gift_item                   int        If the SKU is with gift promotion                                                                       0
dc_ori                      int        Distribution center ID where the order is shipped from                                                  29
dc_des                      int        Destination address where the order is shipped to (represented by the closest distribution center ID)   29
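
The sample row of Table 4 can be used to check the pricing identity of Section 2.4, namely that the gap between the list price and the final price equals the sum of the four per-unit discounts (99.9 − 53.9 = 46.0 = 5.0 + 41.0 + 0.0 + 0.0). The sketch below is one way to flag rows that violate it; the function names and the tolerance of 0.01 are assumptions chosen for illustration, not part of the data description.

```python
# Per-unit discount fields as named in Table 4.
DISCOUNT_FIELDS = (
    "direct_discount_per_unit",
    "quantity_discount_per_unit",
    "bundle_discount_per_unit",
    "coupon_discount_per_unit",
)

# Pricing and routing fields of the sample row in Table 4.
SAMPLE_ORDER = {
    "order_ID": "3b76bfcd3b",
    "original_unit_price": 99.9,
    "final_unit_price": 53.9,
    "direct_discount_per_unit": 5.0,
    "quantity_discount_per_unit": 41.0,
    "bundle_discount_per_unit": 0.0,
    "coupon_discount_per_unit": 0.0,
    "dc_ori": 29,
    "dc_des": 29,
}


def price_gap_residual(row):
    """Return (original - final) minus the sum of the four per-unit discounts."""
    gap = row["original_unit_price"] - row["final_unit_price"]
    return gap - sum(row[field] for field in DISCOUNT_FIELDS)


def is_price_consistent(row, tol=0.01):
    """True when the price gap matches the summed discounts within a tolerance.

    The tolerance (an assumption) absorbs floating-point rounding.
    """
    return abs(price_gap_residual(row)) <= tol


def fulfilled_locally(row):
    """True when the order shipped from the warehouse closest to the customer."""
    return row["dc_ori"] == row["dc_des"]


print(is_price_consistent(SAMPLE_ORDER))
print(fulfilled_locally(SAMPLE_ORDER))
```

Comparing with a tolerance rather than exact equality matters here because prices such as 99.9 and 53.9 are not exactly representable as binary floats.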

match record in the delivery table can be considered delivered by an alternative shipping method.

The delivery table contains 293,229 packages delivered by JD Logistics in the given time period, among which 244,333 orders involve 1P SKUs (type = 1) and 48,896 orders involve 3P SKUs (type = 2).17 We further provide three key timestamps (up to hourly granularity) for each package delivery, namely, the time at which the package was shipped from the warehouse (ship_out_time), the time at which the package arrived at the delivery station18 (arr_station_time), and the time at which the package was successfully delivered to the customer (arr_time).

2.6. Table 6: Inventory

The inventory table (Table 6) provides information about the availability of each SKU (sku_ID) at each warehouse (dc_ID). We only disclose the availability of the inventory at the end of the day (date) instead of the amount of inventory. In addition, when a SKU is not available at a specific warehouse on a specific day, there will be no record of that SKU at that warehouse on that day.

2.7. Table 7: Network

The network table (Table 7) provides information about the assignment of different warehouses located in different districts (dc_ID) to different geographical regions (region_ID). For each district, a designated warehouse (dc_ID) is responsible for fulfilling orders in the district. In addition, for different districts that are assigned to a geographical region, one of the (larger) warehouses will be designated as the “central warehouse” for that region. In JD.com’s context, a central warehouse provides the “back-up fulfillment” option when other (typically smaller) warehouses in the region run out of inventory for their corresponding districts. Figure 14 shows the number of districts within each geographical region. We denote each central warehouse for each region by setting dc_ID = region_ID.

Figure 12. (Color online) Distribution of Promise Delivery Time (1P Orders)
Figure 13. (Color online) Sales in Quantity by Date and Order Type

Table 5. Description of the delivery Table

Field             Data type  Description                                                                                              Sample value
package_ID        string     Package unique identification code (same as order_ID if the package contains all SKUs in the order)     209a005c40
order_ID          string     Order unique identification code                                                                         209a005c40
type              int        1P or 3P orders                                                                                          1
ship_out_time     string     The timestamp when the package is shipped out from the warehouse (format: yyyy-mm-dd HH:MM:SS)           2018-03-01 08:37:33
arr_station_time  string     The timestamp when the package arrives at the delivery station (format: yyyy-mm-dd HH:MM:SS)             2018-03-01 15:37:31
arr_time          string     The timestamp when the package is delivered to the customer home (format: yyyy-mm-dd HH:MM:SS)           2018-03-01 18:49:03
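
The three timestamps in Table 5 split each delivery into a warehouse-to-station leg and a station-to-customer leg. As a small worked example, the sketch below computes both legs and the total for the sample row of Table 5; the function name delivery_legs and the keys of the returned dictionary are assumptions made for illustration.

```python
from datetime import datetime

TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"  # yyyy-mm-dd HH:MM:SS, as stated in Table 5


def delivery_legs(ship_out_time, arr_station_time, arr_time):
    """Return the duration of each delivery leg and the total as timedelta objects."""
    shipped, at_station, delivered = (
        datetime.strptime(value, TIMESTAMP_FORMAT)
        for value in (ship_out_time, arr_station_time, arr_time)
    )
    return {
        "warehouse_to_station": at_station - shipped,
        "station_to_customer": delivered - at_station,
        "total": delivered - shipped,
    }


# Timestamps of the sample row in Table 5.
legs = delivery_legs(
    "2018-03-01 08:37:33",
    "2018-03-01 15:37:31",
    "2018-03-01 18:49:03",
)
for name, duration in legs.items():
    print(name, duration)
```

For the sample package the two legs take 6:59:58 and 3:11:32, for a total of 10:11:30 from shipment to delivery. Joining such durations to the orders table through order_ID would allow a comparison against the promise field.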

Table 6. Description of the inventory Table

| Field | Data type | Description | Sample value |
|---|---|---|---|
| dc_ID | int | Distribution center ID | 9 |
| sku_ID | string | SKU unique identification code | fcc883f713 |
| Date | string | Date (format: yyyy-mm-dd) | 2018-03-01 |
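
Because Table 6 records availability rather than stock levels, one way to work with it is as a set of (dc_ID, sku_ID, date) triples. The sketch below is illustrative only and rests on an assumption that should be checked against the data: that a warehouse-SKU-day combination with no record was not available at the end of that day. The rows and the helper names is_available and days_without_record are hypothetical.

```python
from datetime import date, timedelta

# Hypothetical rows in the shape of Table 6: (dc_ID, sku_ID, date).
inventory_rows = [
    (9, "fcc883f713", "2018-03-01"),
    (9, "fcc883f713", "2018-03-02"),
    (9, "fcc883f713", "2018-03-04"),
]

# Set membership gives a constant-time availability lookup.
available = set(inventory_rows)


def is_available(dc_id, sku_id, day):
    """True if an availability record exists for this warehouse, SKU, and day."""
    return (dc_id, sku_id, day) in available


def days_without_record(dc_id, sku_id, start, end):
    """List the ISO dates in [start, end] that have no availability record.

    Assumption: a missing record means the SKU was unavailable that day.
    """
    current = date.fromisoformat(start)
    last = date.fromisoformat(end)
    missing = []
    while current <= last:
        if not is_available(dc_id, sku_id, current.isoformat()):
            missing.append(current.isoformat())
        current += timedelta(days=1)
    return missing


gaps = days_without_record(9, "fcc883f713", "2018-03-01", "2018-03-04")
print(gaps)
```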

Table 7. Description of the network Table

| Field | Data type | Description | Sample value |
|---|---|---|---|
| region_ID | int | Region ID | 2 |
| dc_ID | int | District ID (same as warehouse ID) | 6 |
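
The convention that a central warehouse is marked by dc_ID = region_ID makes the back-up fulfillment relationship easy to express. The following sketch is one possible reading, not JD.com's actual fulfillment logic: it looks up the central warehouse for a district, counts districts per region, and falls back to the central warehouse when the local one has no availability record. The network rows other than (2, 6), the availability set, and all function names are hypothetical.

```python
# Hypothetical rows in the shape of Table 7: (region_ID, dc_ID).
# A row with dc_ID == region_ID denotes that region's central warehouse.
network_rows = [(2, 2), (2, 6), (2, 9), (3, 3), (3, 11)]

# Map each district/warehouse ID to the region it belongs to.
region_of = {dc_id: region_id for region_id, dc_id in network_rows}


def central_warehouse(dc_id):
    """Central warehouse of the region containing dc_id (its ID equals region_ID)."""
    return region_of[dc_id]


def districts_per_region(rows):
    """Count the districts listed under each region (central one included)."""
    counts = {}
    for region_id, _dc_id in rows:
        counts[region_id] = counts.get(region_id, 0) + 1
    return counts


def fulfilling_warehouse(dc_id, sku_id, day, available):
    """Pick the local warehouse if the SKU is available there, else the
    region's central warehouse, else None.

    `available` is a set of (dc_ID, sku_ID, date) triples as in Table 6.
    This two-step fallback is an illustrative assumption.
    """
    if (dc_id, sku_id, day) in available:
        return dc_id
    backup = central_warehouse(dc_id)
    if (backup, sku_id, day) in available:
        return backup
    return None


# Usage: the SKU is missing at warehouse 6 but present at central warehouse 2.
availability = {(2, "fcc883f713", "2018-03-01")}
print(fulfilling_warehouse(6, "fcc883f713", "2018-03-01", availability))
print(districts_per_region(network_rows))
```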

Figure 14. (Color online) Number of Districts Within the Regions

3. Conclusion
The data sets provided by JD.com are presented in the seven tables described above. These data sets are based on the activities associated with 2.5 million users (457,298 made purchases) and 31,868 SKUs in March of 2018. Researchers are invited to analyze this database with data-driven models to address research questions posed by themselves or by JD.com as stated in Section 1.

The data sets include product information (attributes, pricing, etc.), customer information (demographics, total value of past purchases, PLUS membership, etc.), and logistics information (shipping networks, orders and inventories, delivery time, etc.). The data sets capture a “full customer experience cycle,” which begins the moment a customer chooses the products on the platform and ends the moment the customer receives the products.

4. Downloading the Data and Python Code
MSOM members can access the data sets through the MSOM website (https://connect.informs.org/msom/events/datadriven2020). For easy access, we provide a Python notebook (see endnote 19) with runnable sample code to facilitate reviewing and understanding of the data sets as well as to explain the relationships among the seven tables described in this paper. The code is provided in the online appendix, and a runnable version is available within the data set package.

Acknowledgments
The authors thank the guest editor, Gad Allon; the associate editor; and two anonymous reviewers for their valuable suggestions.

Endnotes
1. JD.com’s PLUS membership is a subscription-based program that provides its members certain benefits that range from free shipping to member-specific price discounts. For details about JD.com’s PLUS membership, see https://plus.jd.com/index.html (Chinese content).

2. Note that the data provided by JD.com represent only a small sample of users and SKUs. Therefore, the database does not necessarily fully capture the business performance or business trends of JD.com.

3. However, it is possible that some brands may launch “super brand day” promotions within March.

4. All SKUs are displayed on JD.com’s product page with the seller name and/or tags so that customers are fully aware of whether the corresponding SKU is a 1P SKU or a 3P SKU.

5. The fulfillment process is usually described on the product page so that customers will know that the shipping process is managed by the merchant itself.

6. There may be some SKUs that receive no click during March, but the information for these SKUs is not available.

7. JD.com displays product ratings for each SKU. However, in the Chinese marketplace, most product ratings reported by customers are usually the highest rating. Because most ratings are rated 5, the information associated with product ratings has been shown to be uninformative. Consequently, product ratings are omitted in the database.

8. Note that, even though a SKU is deactivated, it may still be able to be bought as a part of a bundled product or as the gift portion of a promotion.

9. Regardless of different users’ user_level values, they observe the same information and receive the same service from JD.com.

10. JD PLUS membership costs up to US$45 per year and members enjoy a variety of perquisites including exclusive discounts, higher purchasing reward rate, free delivery, and return with no preconditions. About 18% of those 458,269 customers in the data set are JD PLUS members.

11. It is worth noting that this table only contains click information on the SKU detail page. There are many other page types with which a customer can interact on JD.com, such as the website main page, category main page, various landing pages, search, recommendation page, and shopping cart page. Although those pages also contain information about SKUs and promotions, the customers still need to go to the SKU detail page to review the detailed description of the products and place the order.

12. Note that these data capture the click event of a SKU initiated by a user, but each click event may not lead to the purchase of this SKU. In other words, a user may choose not to purchase this SKU even after the click event.

13. When promise = 1, this refers to the standard same- and next-day delivery promise: Orders placed before 11 a.m. will be delivered on the same day, and orders placed before 11 p.m. will be delivered before 3 p.m. on the following day. When promise is x (x > 1), this indicates that the delivery will arrive at day t + x, where t is the day the order is placed. We note that promise information is not available for a small fraction of 1P orders and for most of the 3P orders.

14. For third-party products (i.e., 3P SKUs), the discounts are controlled by the sellers. However, for JD-owned 1P SKUs, the discounts result from discussions between different vendors and JD.

15. Coupons normally consist of a discount value, an eligibility criterion, and an expiration date. The discount value is the monetary amount that can be deducted from the order; the eligibility criterion specifies which SKU or SKU set is eligible for coupon use and whether there is a total purchase amount criterion. The expiration date shows when the coupon can be applied. There are many ways in which customers can receive a coupon. They can clip coupons from the product detail pages, promotional landing pages, or “coupon mall” (a specific section on the JD.com platform for coupon distribution). Customers can also receive personalized coupons based on their past activities.

16. The data set, however, does not provide a detailed shipping path if the order is routed through multiple warehouses.

17. The delivery table contains shipment information about orders involving 1P products owned by JD (and some 3P products owned by the third party). However, for other orders involving 3P products owned by the third party, the shipment information is not available to JD.com because the logistics providers are selected by the merchants.

18. A delivery station corresponds to the “last mile” facility before a package is delivered to the customer. Hence, the arr_station_time specifies the time when the package arrives at the delivery station before it is delivered to the customers. The timestamps of internal transfers of a package routing through different warehouses are omitted.

19. See https://jupyter.org/.
