Post-class reading: MSOM 2020 JD data
https://doi.org/10.1287/msom.2020.0900
MANUFACTURING & SERVICE OPERATIONS MANAGEMENT
Articles in Advance, pp. 1–9
http://pubsonline.informs.org/journal/msom ISSN 1523-4614 (print), ISSN 1526-5498 (online)
Received: September 10, 2019
Revised: November 5, 2019
Accepted: December 20, 2019
Published Online in Articles in Advance: December 9, 2020
https://doi.org/10.1287/msom.2020.0900
Copyright: © 2020 INFORMS

Abstract. To support the 2020 MSOM Data Driven Research Challenge, JD.com, China’s largest retailer, offers transaction-level data to MSOM members for conducting data-driven research. This article describes the transactional data associated with over 2.5 million customers (457,298 made purchases) and 31,868 stock keeping units (SKUs) over the month of March in 2018. We also present potential research questions suggested by JD.com. Researchers are welcome to develop econometric models or data-driven models using this database to address some of the suggested questions or examine their own research questions.
Shen et al.: MSOM Data Driven Research Competition
2 Manufacturing & Service Operations Management, Articles in Advance, pp. 1–9, © 2020 INFORMS
In particular, among all the promotion methods (e.g., direct discounts, bundle discounts, and volume discounts), which one is more effective?
5. Do ordinary customers behave differently from JD.com’s PLUS members?1 How should JD.com improve its pricing and shipping strategy for its PLUS members?
6. How should JD.com improve its demand forecast accuracy for different geographic regions and different customer groups?
7. How should JD.com improve its fulfillment efficiency and customer experience with better inventory allocation strategies in a multilevel inventory network?

2. Data Description
We now describe the transaction-level data provided by JD.com. (We shall explain how to download the database in Section 4.) To ensure confidentiality, certain key identification information such as user ID and SKU ID is anonymized.2

To keep the size of the database more manageable, the database does not contain impression data, especially when JD.com may not have complete impression data from other channels (search, push notification, SMS messages, social media (e.g., WeChat), mobile ads, etc.). However, our data contain all product detail page click events for each customer, which can serve as a proxy. Instead, JD.com provides us with transaction-level data for the month of March 2018 during which there were no major holidays or promotions.3 Hence, the March data can be viewed as a baseline, and researchers should be careful about extrapolating their results.

In the database, each SKU can be identified either as first-party owned (1P) or third-party owned (3P), depending on the ownership of the inventory of that SKU.4 All 1P SKUs are managed by JD.com, including product assortments, inventory replenishments, product pricing, order deliveries, and after-sale customer services. Despite different operations, 1P and 3P SKUs compete on the JD.com platform for sales through different pricing strategies and marketing activities.

In general, 1P SKUs are usually top sellers within the category. By owning these 1P products, JD.com can fully control the entire customer experience to provide guaranteed quality, fast delivery, and good customer services. In contrast, all 3P SKUs are managed by third-party merchants on the JD marketplace. Specifically, to fulfill an order of a 3P SKU, the corresponding merchant can decide freely whether to use the logistics services provided by JD Logistics or other logistics service providers.5

The data sets provided by JD.com offer a detailed view of the activities associated with all SKUs within one anonymized consumable category during the month of March in 2018. This category could be beauty care (e.g., face moisturizers) or men’s grooming (e.g., electric shavers) or something else. Owing to confidentiality, the specific category is not disclosed.

The data set consists of seven tables that are labeled (1) skus, (2) users, (3) clicks, (4) orders, (5) delivery,
Figure 1. (Color online) Distribution of Attribute 1 Across All SKUs
Figure 2. (Color online) Distribution of Attribute 2 Across All SKUs
(6) inventory, and (7) network. We now describe each of these seven tables.

2.1. Table 1: SKUs
The skus table (Table 1) describes the characteristics of all 31,868 SKUs that belong to a single product category receiving at least one click during March 2018.6 As such, researchers should not generalize their results to other product categories. We now define each field and provide a brief description. Each entry in the skus table corresponds to a unique SKU (sku_ID). In addition, each SKU ID is “seller-specific.” For example, an identical product that is sold by JD as a 1P product and by a third-party seller as a 3P product will be treated as two separate SKUs with different SKU IDs. Similarly, an identical product sold by multiple third-party sellers will be denoted by different SKU IDs.

Of these 31,868 SKUs, 1,167 of them are 1P SKUs (type value = 1) and the rest (30,701) are 3P SKUs (type value = 2). The brand information of each SKU is provided via the field (brand_ID). However, only 9,159 SKUs out of 31,868 were involved in purchase activities during March of 2018.

Each SKU also has two key attributes: the first attribute takes integer values between 1 and 4, and the second takes integer values between 30 and 100. For each attribute, a higher value indicates better performance of a certain functionality. For the face moisturizer category, these two attributes can be sun protection factor (SPF) and percentage of antiaging ingredients. Similarly, for the men’s electric shaver category, these two attributes can be the number of shaves per charge and the number of personalized shaving modes. Hence, both attributes characterize the functionality of a product so that products with the same attribute values have the same functionality. The distributions of the value associated with these two attributes across all SKUs are depicted in Figures 1 and 2. Notice that many SKUs have missing values for various reasons, including that (a) the third-party merchants did not provide the attribute value, especially for certain slow-moving items, or (b) a certain attribute was not applicable to certain SKUs.7

For each SKU, the skus table provides two extra elements: activate_date and deactivate_date. The former specifies the date at which a SKU is first introduced on the JD.com platform and the latter specifies the date at which the SKU is terminated and removed from JD.com.8 Note that the data set lists the activate_date and deactivate_date variables only when these dates fall in the month of March in 2018; thus, these variables are usually blank.

2.2. Table 2: Users
The users table (Table 2) describes the characteristics of all 457,298 users who purchased at least one of the SKUs in the given category during March of 2018.
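As a quick illustration of the skus table described in Section 2.1, the following sketch tallies SKUs by ownership type and counts missing attribute values. It is a minimal example on made-up rows, not the released data: the field names sku_ID and type come from the text, whereas the column names attribute1 and attribute2 and the use of None to mark a missing value are assumptions made here for illustration.

```python
from collections import Counter

# Hypothetical sample rows; the real skus table has 31,868 entries.
# "attribute1"/"attribute2" and None-for-missing are assumptions.
SAMPLE_SKUS = [
    {"sku_ID": "a1", "type": 1, "attribute1": 3, "attribute2": 60},
    {"sku_ID": "b2", "type": 2, "attribute1": None, "attribute2": 80},
    {"sku_ID": "c3", "type": 2, "attribute1": None, "attribute2": None},
]


def summarize_skus(rows):
    """Count SKUs by ownership type and count missing attribute values."""
    by_type = Counter(row["type"] for row in rows)
    missing = {
        name: sum(1 for row in rows if row[name] is None)
        for name in ("attribute1", "attribute2")
    }
    # type 1 = first-party (1P), type 2 = third-party (3P), as in the text.
    return {"n_1p": by_type[1], "n_3p": by_type[2], "missing": missing}


summary = summarize_skus(SAMPLE_SKUS)
```

On the full table, the two type counts would be expected to match the figures reported above (1,167 1P SKUs and 30,701 3P SKUs, which sum to 31,868).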
Figure 3. (Color online) Distribution of Users: User Level
Figure 4. (Color online) Distribution of Users: Gender
Figure 5. (Color online) Distribution of Users: Age
Figure 6. (Color online) Distribution of Users: Marital Status
Figure 7. (Color online) Distribution of Users: Education Levels
Figure 8. (Color online) Distribution of Users: Purchase Power Levels
Figure 9. (Color online) Distribution of Users: City Levels

Figure 4 depicts the distribution of user gender across all 457,298 customers in the database, and Figure 5 summarizes the distribution of estimated user age. As shown in Figures 4 and 5, for this specific product category, more than 60% of all customers are estimated to be female and the estimated ages of these customers are in their 30s to 40s. From Figure 6, we observe a relatively even distribution between married and single customers. Figures 7 and 8 provide the customer’s estimated education level and purchase power. Figure 9 summarizes the distribution of shipping address according to different city levels. It can be seen that most of the customers are from tier 1 and tier 2 cities.

2.3. Table 3: Clicks
The clicks table (Table 3) establishes the linkage between users and SKUs through their browsing history. Each entry in the clicks table represents a user’s “click event” on a specific SKU page.11 The data set contains over 20 million click records that are associated with the clicks of 2.5 million customers. Note that this table contains clicks contributed not only by the users identified in the users table (Table 2) who purchased at least one SKU but also by “other users” who did not end up completing a purchase order.

The records include the following: (a) the user who initiated a click event (user_ID), (b) the SKU associated with the click event (sku_ID), (c) the time at which the click event occurred (request_time), and (d) the channel in which the click event occurred (channel).12 We classify the channel taken as five string values: pc, mobile, app, wechat, and others. Channels pc and mobile are associated with clicks through web browsers on personal computers and mobile devices, respectively. Channel app corresponds to JD.com’s mobile app. Channel wechat corresponds to the miniprogram that runs on the social media app WeChat. Finally, channel others aggregates the clicks from all other channels. The distribution of all click events across all channels is summarized in Figure 10. Because of the popularity of smartphones in China and the popularity of mobile payment options (e.g., WeChat payment), the majority of click events come from the app and wechat channels.

The field request_time provides extra granularity. It can be used to infer the customer browsing sequence and habits. In Figure 11, we plot the number of clicks during the day on March 1, 2018, within the app channel. We can clearly identify two peaks in the daily browsing activities: one from 8 a.m. to 4 p.m. in the day and the other in the late evening.

2.4. Table 4: Orders
The orders table (Table 4) contains 486,928 unique customer orders associated with our focused product category that were placed during the month of March in 2018. Each customer order (order_ID) in the orders table is based on a specific SKU (sku_ID) associated with a unique customer (user_ID). (If a customer ordered multiple SKUs, then the same order_ID will appear in multiple rows of SKUs.)

Other pieces of information associated with a customer order as shown in Table 4 include (a) order quantity for each SKU associated with the order (quantity), (b) the date and time when the ordering event took place (order_date and order_time), (c) the type of SKU being ordered (type = 1 if it is a 1P SKU and type = 2 if it is a 3P SKU), and (d) the promised delivery time of the order (promise).13 Figure 12 demonstrates that most orders have promised delivery dates within two days. Figure 13 shows the total number of sales by date and by order type.

The orders table also offers information about product pricing and promotional activities for each SKU. For each entry, we denote the original list price of the SKU in the field original_unit_price and the actual paid price by the customer for the SKU as final_unit_price. The original list price of a SKU at any given time instant
is the same for all customers, but the final price can vary among customers owing to various discounts or promotions.

The “gap” between the original price and the final price represents the coupons and discounts associated with different promotional activities for each SKU. There are four common types of promotional discounts on the JD.com platform:14
1. SKU direct discount: The seller of a SKU may offer a price cut in terms of a direct discount. This discount reflects the reduction in the list price as stated on the product detail page.
2. Group promotion: The seller of a SKU may offer a quantity discount to entice the customer to buy more. This quantity discount promotion can take different forms including “get an RMB 100 discount if buying over RMB 199” or “buy 3 and get 1 free.” We note that the quantity discount promotion is usually on the order level and we apply a simple allocation rule to calculate the contribution provided by each SKU in the order.
3. Bundle promotion: The seller may offer a bundle_discount if a customer buys a “prespecified bundle” of SKUs within an order.
4. Gift items: The seller may offer a SKU as a “free gift” (gift_item value = 1) if the customer purchases a “prespecified set” of SKUs (e.g., get a free eraser if you buy x pencils and y pads of paper). The final_unit_price for each gift item is always equal to 0.

Coupons can also be applied to the order after all other promotions are applied. In contrast to the four aforementioned promotion activities where discounts will be applied automatically once certain criteria are met, customers must “clip” (or claim) a coupon before making a purchase.15 The field coupon_discount records the coupon promotional value associated with an order. Similar to quantity discount as explained earlier, the discount value of the coupon is allocated between items in the same order using an allocation rule when necessary.

Note that, for each entry in the orders table, the gap between original_unit_price and final_unit_price should always equal the sum of direct_discount, group_discount, bundle_discount, and coupon_discount.

Finally, for each order, we show from which district the order was shipped (dc_ori) and to which district the order was shipped (dc_des). The district here is defined by the warehouse ID that covers the demand of that district. In other words, one can think of dc_ori as the warehouse where the package is shipped from and dc_des as the warehouse that is nearest to the customer’s designated shipping address. If dc_ori and dc_des are the same, this means that the package is shipped from the warehouse closest to the customer. Otherwise, it indicates that the package is fulfilled by some other warehouse in a different district. We note that in theory any warehouse in the nationwide network can fulfill the order for any customer in the country. However, in practice, there is a complicated order fulfillment logic that determines what inventory should be used to fulfill each customer order to optimize fulfillment resources while satisfying delivery promise.

One can trace the shipping path and time of each order by using Tables 4 and 5.16 First, for each order denoted as order_ID, the orders table (Table 4) provides information about the “origin” warehouse that the order is shipped from (via the variable dc_ori) and the “destination” warehouse that the order is shipped to (via the variable dc_des). By using the information provided in Tables 4 and 5, one can trace the shipping path of each order.

Figure 10. (Color online) Distribution of All Click Events Across Different Channels
Figure 11. (Color online) Number of Click Events Occurring on March 1, 2018, Through JD.com’s App Channel
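The price identity and the warehouse comparison described for the orders table can be checked row by row. The sketch below is one way to do so under stated assumptions: the field names are taken from the text as written, the sample rows are invented for illustration, and the tolerance used for the floating-point comparison is an arbitrary choice rather than anything specified by JD.com.

```python
import math

# Discount fields named in the text; the released files may label them differently.
DISCOUNT_FIELDS = (
    "direct_discount",
    "group_discount",
    "bundle_discount",
    "coupon_discount",
)

# Hypothetical order rows for illustration only.
SAMPLE_ORDERS = [
    {"order_ID": "o1", "original_unit_price": 100.0, "final_unit_price": 80.0,
     "direct_discount": 15.0, "group_discount": 0.0, "bundle_discount": 0.0,
     "coupon_discount": 5.0, "dc_ori": 9, "dc_des": 9},
    {"order_ID": "o2", "original_unit_price": 50.0, "final_unit_price": 45.0,
     "direct_discount": 3.0, "group_discount": 0.0, "bundle_discount": 0.0,
     "coupon_discount": 0.0, "dc_ori": 4, "dc_des": 12},
]


def price_gap(row):
    """Gap between the list price and the price actually paid."""
    return row["original_unit_price"] - row["final_unit_price"]


def gap_matches_discounts(row, tol=1e-6):
    """True if the price gap equals the sum of the four discount fields."""
    total_discount = sum(row[field] for field in DISCOUNT_FIELDS)
    return math.isclose(price_gap(row), total_discount, abs_tol=tol)


def shipped_from_local_warehouse(row):
    """True if the origin warehouse is also the customer's district warehouse."""
    return row["dc_ori"] == row["dc_des"]


inconsistent_orders = [r["order_ID"] for r in SAMPLE_ORDERS if not gap_matches_discounts(r)]
local_share = sum(shipped_from_local_warehouse(r) for r in SAMPLE_ORDERS) / len(SAMPLE_ORDERS)
```

Rows flagged in inconsistent_orders would merit a closer look before being used in a pricing analysis, and local_share gives the fraction of rows fulfilled from the customer's own district warehouse.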
match record in the delivery table can be considered delivered by an alternative shipping method.

The delivery table contains 293,229 packages delivered by JD Logistics in the given time period, among which 244,333 orders involve 1P SKUs (type = 1) and 48,896 orders involve 3P SKUs (type = 2).17 We further provide three key timestamps (up to hourly granularity) for each package delivery, namely, the time at which the package was shipped from the warehouse (ship_out_time), the time at which the package arrived at the delivery station18 (arr_station_time), and the time at which the package was successfully delivered to the customer (arr_time).

2.6. Table 6: Inventory
The inventory table (Table 6) provides information about the availability of each SKU (sku_ID) at each warehouse (dc_ID). We only disclose the availability of the inventory at the end of the day (date) instead of the amount of inventory. In addition, when a SKU is not available at a specific warehouse on a specific day, there will be no record of that SKU at that warehouse on that day.

2.7. Table 7: Network
The network table (Table 7) provides information about the assignment of different warehouses located in different districts (dc_ID) to different geographical regions (region_ID). For each district, a designated warehouse (dc_ID) is responsible for fulfilling orders in the district. In addition, for different districts that are assigned to a geographical region, one of the (larger) warehouses will be designated as the “central warehouse” for that region. In JD.com’s context, a central warehouse provides the “back-up fulfillment” option when other (typically smaller) warehouses in the region run out of inventory for their corresponding districts. Figure 14 shows the number of districts within each geographical region. We denote each central warehouse for each region by setting dc_ID = region_ID.
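The convention that a central warehouse is marked by dc_ID = region_ID makes the regional structure easy to recover from the network table. The following is a minimal sketch on invented rows: the field names dc_ID and region_ID come from the text, while the sample values and the assumption that each dc_ID appears in exactly one row are illustrative only.

```python
from collections import Counter

# Hypothetical network rows; each row assigns one warehouse (dc_ID) to a region.
SAMPLE_NETWORK = [
    {"region_ID": 2, "dc_ID": 2},
    {"region_ID": 2, "dc_ID": 15},
    {"region_ID": 2, "dc_ID": 23},
    {"region_ID": 4, "dc_ID": 4},
    {"region_ID": 4, "dc_ID": 31},
]


def central_warehouses(rows):
    """Warehouses flagged as central, i.e., rows where dc_ID equals region_ID."""
    return sorted(row["dc_ID"] for row in rows if row["dc_ID"] == row["region_ID"])


def backup_warehouse(rows):
    """Map each non-central warehouse to the central warehouse of its region."""
    return {
        row["dc_ID"]: row["region_ID"]
        for row in rows
        if row["dc_ID"] != row["region_ID"]
    }


def districts_per_region(rows):
    """Number of district warehouses assigned to each region."""
    return dict(Counter(row["region_ID"] for row in rows))


centrals = central_warehouses(SAMPLE_NETWORK)
backups = backup_warehouse(SAMPLE_NETWORK)
region_sizes = districts_per_region(SAMPLE_NETWORK)
```

Combined with dc_ori and dc_des from the orders table, a mapping like backups could be used to flag orders that were fulfilled by a regional central warehouse rather than by the customer's district warehouse.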
Figure 12. (Color online) Distribution of Promise Delivery Time (1P Orders)
Figure 13. (Color online) Sales in Quantity by Date and Order Type
Figure 14. (Color online) Number of Districts Within the Regions

value of past purchases, PLUS membership, etc.), and logistics information (shipping networks, orders and inventories, delivery time, etc.). The data sets capture a “full customer experience cycle,” which begins the moment a customer chooses the products on the platform and ends the moment the customer receives the products.
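To make the logistics end of this cycle concrete, the sketch below links an order to its delivery record by order_ID and splits the elapsed time into three legs using the timestamps ship_out_time, arr_station_time, and arr_time. Several details are assumptions rather than facts about the released files: the timestamp format, the premise that order_time holds a full date and time, and the sample values themselves.

```python
from datetime import datetime

# Assumed timestamp layout; adjust to whatever the released files use.
TIME_FORMAT = "%Y-%m-%d %H:%M:%S"

# Hypothetical rows for illustration only.
SAMPLE_ORDERS = [
    {"order_ID": "o1", "order_time": "2018-03-01 10:00:00"},
    {"order_ID": "o2", "order_time": "2018-03-01 12:30:00"},
]
SAMPLE_DELIVERY = [
    {"order_ID": "o1", "ship_out_time": "2018-03-01 16:00:00",
     "arr_station_time": "2018-03-02 07:00:00", "arr_time": "2018-03-02 11:00:00"},
]


def hours_between(start, end):
    """Elapsed hours between two timestamp strings in TIME_FORMAT."""
    delta = datetime.strptime(end, TIME_FORMAT) - datetime.strptime(start, TIME_FORMAT)
    return delta.total_seconds() / 3600.0


def delivery_legs(order, delivery_by_id):
    """Split order-to-door time into legs; None if no delivery record matches."""
    record = delivery_by_id.get(order["order_ID"])
    if record is None:
        # No match: the order was shipped by an alternative method.
        return None
    return {
        "order_to_ship": hours_between(order["order_time"], record["ship_out_time"]),
        "ship_to_station": hours_between(record["ship_out_time"], record["arr_station_time"]),
        "station_to_customer": hours_between(record["arr_station_time"], record["arr_time"]),
    }


delivery_by_id = {row["order_ID"]: row for row in SAMPLE_DELIVERY}
legs = [delivery_legs(order, delivery_by_id) for order in SAMPLE_ORDERS]
```

Summing the three legs gives the total time from order placement to receipt, which can then be compared against the promised delivery time in the promise field.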