Detecting multi-timescale consumption patterns from receipt data: a non-negative tensor factorization approach

Matsui, Akira; Kobayashi, Teruyoshi; Moriwaki, Daisuke; Ferrara, Emilio

doi:10.1007/s42001-020-00078-5

Research Article
Open access
Published: 20 August 2020

Journal of Computational Social Science, Volume 6, pages 1179–1192 (2023)

Akira Matsui¹, Teruyoshi Kobayashi²,³ (ORCID: orcid.org/0000-0002-3135-9038), Daisuke Moriwaki⁴ & Emilio Ferrara¹,⁵,⁶

Abstract

Understanding consumer behavior is an important task, not only for developing marketing strategies but also for the management of economic policies. Detecting consumption patterns, however, is a high-dimensional problem in which various factors that would affect consumers’ behavior need to be considered, such as consumers’ demographics, circadian rhythm, seasonal cycles, etc. Here, we develop a method to extract multi-timescale expenditure patterns of consumers from a large dataset of scanned receipts. We use a non-negative tensor factorization (NTF) to detect intra- and inter-week consumption patterns at one time. The proposed method allows us to characterize consumers based on their consumption patterns that are correlated over different timescales.

Introduction

Consumption has been extensively studied in multiple research disciplines, each with its own viewpoint. Macroeconomists, for example, consider that individual consumers’ decisions determine the economic condition at the macroscopic level [1]. In marketing studies, on the other hand, analyzing the shopping behavior of individual consumers is essential to gain insight into business strategy [2]. Researchers also study consumption at different time scales; economists often assume that representative individuals live infinitely long to investigate life-long consumption paths, while business researchers are interested in shorter practical time scales.

Many studies point out that consumption patterns change in accordance with the consumer’s stage of life [3,4,5]. Arguably, young people with a child would go to supermarkets more frequently than elderly people. An individual’s income level would also affect how often they shop, how much they spend, and what they buy. Consumers with different demographic characteristics may therefore exhibit different dynamical patterns of expenditure, and this leads us to conjecture that we could infer consumers’ demographic properties from their dynamical expenditure patterns.

To understand the consumption behavior of individuals with different demographic properties, we explore the following research questions:

RQ1: Does consumers’ expenditure behavior exhibit dynamical patterns over multiple timescales?
RQ2: Do the dynamical patterns reflect demographic differences?
RQ3: What demographic factors characterize the expenditure patterns?

To answer these research questions, we develop a non-negative tensor factorization (NTF) method to detect multi-timescale patterns of consumers’ expenditure at intra- and inter-week scales. We employ the PARAFAC decomposition as a means to factorize a three-way tensor representing the actual expenditure data [6,7,8]. The NTF method has been widely used to mine temporal patterns in different social contexts, such as face-to-face contacts among humans [9, 10], online communications [11], online games [12] and students’ life in a university [13]. However, multi-timescale patterns have so far been mined only in a study uncovering the intra- and inter-day transaction patterns of banks [14].

In our model, the (i, j, k)-th element of a tensor corresponds to the number of items purchased by consumer i on the j-th day of week k. The NTF allows us to know how the intra-week expenditure behavior is associated with the inter-week patterns and how many such multi-timescale patterns exist. We argue that different multi-timescale patterns may come from different demographic characteristics of consumers, such as gender, marital status, and age. This suggests that people in different stages of life indeed spend differently at both intra- and inter-week scales.

Related work

Maximizing aggregate consumption is a primary goal for policymakers and is considered to contribute to social welfare [15, 16]. Economists often model consumer behavior as a solution to a utility maximization problem with infinite horizon [16,17,18,19]. Using a formal framework based on a utility maximization problem, economists have been discussing how consumers form and follow consumption habits [20, 21], including whether or not such an explicit dynamical pattern exists [21,22,23,24,25,26]. Various studies also point out that consumption patterns tend to change according to the consumer’s stage of life [3,4,5].

Marketing scientists study consumer behavior from a more business-oriented viewpoint. For instance, they model the expenditure pattern of targeted consumers to predict the effect of a business strategy, such as a recommendation system, on actual consumption [27]. Models of consumer behavior in marketing studies incorporate various factors, including the structure of consumers’ network [28, 29], self-revealed information in social media [30, 31], and spatial information regarding the consumer’s geographical location [32]. Among many factors that could explain the observed consumption patterns, the sequence of temporal actions has been particularly studied to understand consumers’ dynamic behavior [33,34,35,36,37]. A dynamical model has also been used to predict consumers’ future activity [38]. Notably, some studies point out that there are temporal patterns of shopping activity at the intra-week scale, i.e., day-of-week effects [39,40,41].

In this study, we employ a non-negative tensor factorization (NTF) method [7, 8] to uncover hidden patterns in our receipt data. We represent consumers’ expenditure data as a 3-way tensor, which will be detailed in the following section. NTF is widely used to mine temporal patterns in face-to-face contacts [9, 10], financial transactions [14], online communications [11] and online games [12]. Based on the decomposed patterns from our consumption data, we show that consumers with different demographics have different consumption patterns.

Data

Our dataset is constructed from the receipt data scanned through a bookkeeping smartphone application Dr.Wallet [42]. This application allows users to digitize the record of their purchases by scanning receipts using smartphones or tablet PCs. Item names listed in receipts are annotated and documented by human workers. The dataset contains the prices, the name of each item and the date when the receipt has been scanned. There are in total 2,796,008 purchased items recorded by 2624 users from April 1, 2017 to January 21, 2018. The data also contains the demographic attributes of the users such as gender, marital status and age range. Table 1 shows the basic statistics and the demography of users.

Table 1 Basic statistics of receipt data collected from Dr. Wallet between April 1, 2017 and January 21, 2018. Total number of purchased items is 2,796,008. Age range is in ascending order, i.e., 1 and 6 denote the youngest and the oldest cohorts, respectively

Methods

Tensor representation of consumption expenditure

Our study aims to detect dynamical patterns from our shopping record dataset. To pursue this goal, we use a non-negative tensor factorization (NTF) to obtain the latent factors that would reflect the characteristic expenditure patterns across different attributes of consumers [7,8,9, 12]. Here, we try to extract multi-timescale patterns that would exist at intra- and inter-week scales [14]. We represent the users’ shopping records by a 3-way tensor, whose size is given by $I\times J\times K$, where $I=$#consumers ($=2624$), $J=$#days in a week ($=7$) and $K=$#weeks ($=42$). The constructed 3-way tensor is interpreted as representing a sequence of weekly bipartite networks in each of which the nodes denoting the days of the week are connected to users with edge weights being the number of purchased items (Fig. 1).
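The tensor construction described above can be sketched as follows. This is a minimal illustration rather than the authors' pipeline: the function name `build_tensor`, the record layout `(user_id, purchase_date, n_items)`, and the convention that weeks are consecutive 7-day blocks counted from the first day of the data window are all assumptions made for the example.

```python
from datetime import date

import numpy as np


def build_tensor(records, user_ids, start, n_weeks):
    """Build an I x 7 x K count tensor from (user_id, purchase_date, n_items) records.

    Assumption: week k is the k-th consecutive 7-day block after `start`,
    and the day index j follows Python's weekday() (0 = Monday, ..., 6 = Sunday).
    """
    user_index = {u: i for i, u in enumerate(user_ids)}
    X = np.zeros((len(user_ids), 7, n_weeks))
    for user, day, n_items in records:
        k = (day - start).days // 7          # inter-week index
        if 0 <= k < n_weeks:                 # drop records outside the window
            j = day.weekday()                # intra-week index
            X[user_index[user], j, k] += n_items
    return X


# Toy records (hypothetical, not taken from the dataset).
records = [
    ("u1", date(2017, 4, 1), 3),   # a Saturday in week 0
    ("u1", date(2017, 4, 1), 2),   # same user and day: counts accumulate
    ("u2", date(2017, 4, 9), 4),   # a Sunday in week 1
    ("u2", date(2018, 6, 1), 9),   # outside the 42-week window: ignored
]
X = build_tensor(records, ["u1", "u2"], start=date(2017, 4, 1), n_weeks=42)
print(X.shape)
```

With the full dataset, the same call would produce a tensor of size 2624 × 7 × 42.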

Non-negative tensor factorization

The NTF method decomposes tensor ${\mathcal {X}}\in {\mathbb {R}}_{+}^{I \times J \times K}$ into latent factors that characterize the activity patterns of the corresponding mode. Each element of the tensor is denoted by $x_{i j k} \in {\mathcal {X}}$. In our model, $x_{ijk}$ denotes the number of items purchased by user i on the j-th day of week k. We employ the PARAFAC decomposition as an NTF algorithm throughout the analysis [6, 7]. The PARAFAC decomposition is an approximation method that expresses ${\mathcal {X}}$ as a sum of rank-one non-negative tensors $\{\hat{{\mathcal {X}}_{r}}\}_{r=1}^{R}$:

$$\begin{aligned} {\mathcal {X}} \approx \sum _{r=1}^{R} \hat{{\mathcal {X}}_{r}} = \sum _{r=1}^{R} {\mathbf {a}}_{r} \circ {\mathbf {b}}_{r} \circ {\mathbf {c}}_{r}, \end{aligned} \tag{1}$$

where R denotes the number of components, and ${\mathbf {a}}_{r}\in {\mathbb {R}}_{+}^{I \times 1}$, ${\mathbf {b}}_{r}\in {\mathbb {R}}_{+}^{J \times 1}$ and ${\mathbf {c}}_{r}\in {\mathbb {R}}_{+}^{K \times 1}$ represent the r-th component factors that respectively encode the membership of a user to a component, intra- and inter-week activity levels. The operator $\circ$ represents outer product.
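To make the sum of rank-one tensors in Eq. 1 concrete, the short sketch below reconstructs a tensor from three factor matrices in two equivalent ways. The sizes and the random factors are arbitrary toy values chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
I, J, K, R = 4, 7, 5, 3                      # toy sizes, not those of the paper
A = rng.random((I, R))                       # columns a_r: user memberships
B = rng.random((J, R))                       # columns b_r: intra-week activity
C = rng.random((K, R))                       # columns c_r: inter-week activity

# Element-wise form: x_hat[i, j, k] = sum_r A[i, r] * B[j, r] * C[k, r]
X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)

# Explicit sum of R outer products a_r o b_r o c_r
X_sum = sum(
    np.multiply.outer(np.multiply.outer(A[:, r], B[:, r]), C[:, r])
    for r in range(R)
)

print(np.allclose(X_hat, X_sum))
```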

Let ${\mathbf {A}}\in {\mathbb {R}}_{+}^{I \times R}$, ${\mathbf {B}}\in {\mathbb {R}}_{+}^{J \times R}$ and ${\mathbf {C}}\in {\mathbb {R}}_{+}^{K \times R}$ be the factor matrices, whose r-th columns are vectors ${\mathbf {a}}_{r}$, ${\mathbf {b}}_{r}$ and ${\mathbf {c}}_{r}$, respectively. The factor matrices ${\mathbf {A}}$, ${\mathbf {B}}$ and ${\mathbf {C}}$ are obtained by solving the following minimization problem with non-negativity constraints:

$$\begin{aligned} \min _{{\mathbf {A}}\ge 0,{\mathbf {B}}\ge 0, {\mathbf {C}}\ge 0} \Vert {\mathcal {X}} - \llbracket {\mathbf {A}},{\mathbf {B}},{\mathbf {C}}\rrbracket \Vert _\mathrm{F}^{2}, \end{aligned} \tag{2}$$

where $\Vert \cdot \Vert _\mathrm{F}$ denotes the Frobenius norm, and $\llbracket {\mathbf {A}},{\mathbf {B}},{\mathbf {C}}\rrbracket$ represents the Kruskal form of the tensor decomposition (i.e., the right-hand side of Eq. 1). To solve this problem, we use alternating non-negative least squares (ANLS) with block principal pivoting (BPP) [43].
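For readers who want to experiment with the minimization problem in Eq. 2, the sketch below uses multiplicative updates, a simpler scheme than the ANLS-BPP solver used in the paper. It is swapped in here only because it fits in a few lines of NumPy; the function name `ntf_multiplicative`, the iteration count, and the toy data are assumptions for the example.

```python
import numpy as np


def ntf_multiplicative(X, R, n_iter=200, seed=0, eps=1e-12):
    """Non-negative PARAFAC by multiplicative updates (not ANLS-BPP).

    Each factor is rescaled element-wise by a non-negative ratio, so factors
    that start non-negative stay non-negative.
    """
    rng = np.random.default_rng(seed)
    I, J, K = X.shape
    A, B, C = rng.random((I, R)), rng.random((J, R)), rng.random((K, R))
    errors = []
    for _ in range(n_iter):
        A *= np.einsum("ijk,jr,kr->ir", X, B, C) / (A @ ((B.T @ B) * (C.T @ C)) + eps)
        B *= np.einsum("ijk,ir,kr->jr", X, A, C) / (B @ ((A.T @ A) * (C.T @ C)) + eps)
        C *= np.einsum("ijk,ir,jr->kr", X, A, B) / (C @ ((A.T @ A) * (B.T @ B)) + eps)
        X_hat = np.einsum("ir,jr,kr->ijk", A, B, C)
        # Relative Frobenius error of the current approximation
        errors.append(np.linalg.norm(X - X_hat) / np.linalg.norm(X))
    return A, B, C, errors


# Toy tensor with an exact non-negative rank-2 structure.
rng = np.random.default_rng(1)
A0, B0, C0 = rng.random((6, 2)), rng.random((7, 2)), rng.random((5, 2))
X = np.einsum("ir,jr,kr->ijk", A0, B0, C0)

A, B, C, errors = ntf_multiplicative(X, R=2)
print(errors[0], errors[-1])
```

Because the objective is non-convex, different seeds can lead to different solutions, which is why repeated runs are useful when assessing a given number of components.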

Number of components

We utilize the Core-Consistency Diagnostic to determine an appropriate number of components, R [6]. The basic idea of the Core-Consistency measure is to quantify the difference between the PARAFAC decomposition and a more general decomposition, namely the Tucker3 decomposition [6]. The Tucker3 decomposition is more flexible than PARAFAC because it allows for correlations between different components. If PARAFAC and Tucker3 return similar decompositions, then the PARAFAC model is considered to be a good approximation of the original tensor (i.e., ignoring correlations among components would be justified).

For the PARAFAC decomposition, the (i, j, k) element of the tensor can be written as

$$\begin{aligned} x_{i j k}=\sum _{n=1}^{R} \sum _{m=1}^{R} \sum _{p=1}^{R} \lambda _{n m p} a_{i n} b_{j m} c_{k p}, \end{aligned} \tag{3}$$

where $\lambda _{n m p}$ denotes a product of Kronecker delta, i.e., $\lambda _{n m p}=\delta _{n m} \delta _{m p} \delta _{n p}$, where $\delta _{nm}$ is the Kronecker delta that takes one if $n=m$, and 0 otherwise. Note that $\lambda _{n m p}$ takes 1 if $n=m=p$ and 0 otherwise, so $\lambda _{nmp}$ is the (n, m, p) element of the superdiagonal binary tensor ${\mathcal {L}}$.

For the Tucker3 model, the (i, j, k) element of the tensor is generally written as

$$\begin{aligned} x_{i j k}=\sum _{n=1}^{R_{n}} \sum _{m=1}^{R_{m}} \sum _{p=1}^{R_{p}} g_{n m p} a_{i n} b_{j m} c_{k p}, \end{aligned} \tag{4}$$

where $g_{nmp}$ may not be expressed by a product of the Kronecker delta. $g_{nmp}$ is an element of the core tensor $\mathcal{G}$ obtained by the Tucker3 algorithm [7].

The Core-Consistency (CC) quantifies the difference between PARAFAC and Tucker3 decomposition by computing the distance between $\mathcal{L}$ and $\mathcal{G}$ as

$$\begin{aligned} \mathrm {CC}=100 \times \left( 1-\frac{\sum _{n=1}^{R} \sum _{m=1}^{R} \sum _{p=1}^{R}\left( g_{n m p}-\lambda _{n m p}\right) ^{2}}{R}\right) . \end{aligned} \tag{5}$$

Note that the number of components R is common for all modes in both the PARAFAC and the Tucker3 decomposition, i.e., $R_{n}=R_{m}=R_{p}=R$. If the PARAFAC and the Tucker3 methods yield exactly the same decomposition, then $\mathrm{CC}=100$ [6]. In general, the CC value tends to decrease with R because interactions between components become more evident as the number of components increases.
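A minimal sketch of Eq. 5 is given below. It assumes that the core tensor $\mathcal{G}$ is obtained as the least-squares Tucker3 core for fixed PARAFAC factor matrices (computed with pseudo-inverses), which is a common way to evaluate this diagnostic; the exact procedure used for the paper's results may differ. The function name `core_consistency` and the toy tensors are hypothetical.

```python
import numpy as np


def core_consistency(X, A, B, C):
    """Core-Consistency (Eq. 5) for PARAFAC factors A, B, C of tensor X.

    Assumption: G is the least-squares Tucker3 core given the fixed factors.
    """
    R = A.shape[1]
    G = np.einsum(
        "ijk,ni,mj,pk->nmp",
        X, np.linalg.pinv(A), np.linalg.pinv(B), np.linalg.pinv(C),
    )
    L = np.zeros((R, R, R))
    L[np.arange(R), np.arange(R), np.arange(R)] = 1.0   # superdiagonal tensor
    return 100.0 * (1.0 - ((G - L) ** 2).sum() / R)


rng = np.random.default_rng(0)
R = 3
A, B, C = rng.random((6, R)), rng.random((7, R)), rng.random((5, R))

# Case 1: X follows the PARAFAC model exactly, so G equals L.
X_parafac = np.einsum("ir,jr,kr->ijk", A, B, C)
cc_exact = core_consistency(X_parafac, A, B, C)

# Case 2: one off-superdiagonal interaction of size 0.5 is added to the core.
G0 = np.zeros((R, R, R))
G0[np.arange(R), np.arange(R), np.arange(R)] = 1.0
G0[0, 1, 2] = 0.5
X_tucker = np.einsum("nmp,in,jm,kp->ijk", G0, A, B, C)
cc_interaction = core_consistency(X_tucker, A, B, C)

print(round(cc_exact, 4), round(cc_interaction, 4))
```

In the second case the only deviation from the superdiagonal core is a single entry of 0.5, so Eq. 5 gives $100 \times (1 - 0.25/3) \approx 91.67$.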

Results

Core-consistency

The CC values for our NTF results for different ranks R are shown in Fig. 2. Since the solution for the PARAFAC decomposition is not unique due to randomly selected seeds, we run the decomposition algorithm 20 times for each R and calculate the mean of the CC value with the 95% confidence interval. The result indicates that $R=3$ would be the best choice because the CC value is larger than a rule-of-thumb threshold ($=85$) [10] up to $R=3$ and turns negative for $R=4$. Therefore, we set $R=3$ in the following analysis. We have repeated this procedure multiple times and confirmed that the results presented in the rest of the paper are qualitatively unaffected by the randomness of seeds.

Multi-timescale expenditure patterns

We first examine whether the shopping activities have different dynamical patterns by looking at the components of day-of-week and weekly activities (RQ1). The r-th columns of factor matrices ${\mathbf {B}}$ and ${\mathbf {C}}$ contain the day-of-week and weekly activity patterns of Component r, respectively. For $R=3$, we find three distinctive day-of-week expenditure patterns from matrix ${\mathbf {B}}$ (Fig. 3a). Each pattern is characterized by the days of week on which activity is concentrated, namely Weekdays, Saturday, or Sunday. This suggests that the users’ expenditure behavior during a week is characterized by one of these three patterns or a combination of them.

Similarly, weekly patterns can be extracted from ${\mathbf {C}}$ (Fig. 3b). The activity level of Component 2 (i.e., weekday-shopping pattern) is the highest among the three and relatively stable except for the last 5 weeks, which correspond to the year end. The activity levels of Component 1 (i.e., Sunday-shopping pattern) and Component 3 (i.e., Saturday-shopping pattern) are lower than that of Component 2 throughout the data period, while the activity of Component 1 is somewhat more volatile than that of Component 3.

Expenditure patterns and demographic differences

To address RQ2, we group the users based on their activities and see if each group has a characteristic demographic property. We use the factor matrix ${\mathbf {A}}$ obtained by the PARAFAC decomposition, on which we implement the k-medoids and the k-means methods to quantify the belongingness of user i to each component. We compare the two clustering methods with silhouette analysis [44] (Figs. S1 and S2 in Supplementary Information (SI)).

We find that the k-medoids method gives us more evenly sized clusters compared to the k-means method (Figs. S1 and S2). The mean silhouette coefficients for the k-medoids clustering are roughly the same across different numbers of clusters, which does not convey enough information to determine the number of clusters. We select the number of clusters $k=5$, judging from the fact that the rate at which the sum of distances between points in a cluster and the medoid decreases slows down around $k=5$ (Fig. S3 in SI). In Sect. 5.4, we will also show the results for which the consumers are grouped based on a threshold value.
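The silhouette comparison mentioned above relies on the mean silhouette coefficient of a clustering. The sketch below computes it directly with NumPy using Euclidean distances; the function name `mean_silhouette` and the four toy points are assumptions for illustration, and the function expects at least two clusters.

```python
import numpy as np


def mean_silhouette(points, labels):
    """Mean silhouette coefficient for a labelling with at least two clusters."""
    labels = np.asarray(labels)
    D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    clusters = np.unique(labels)
    scores = np.zeros(len(points))
    for i in range(len(points)):
        same = labels == labels[i]
        n_same = same.sum()
        if n_same == 1:
            continue                          # convention: singleton scores 0
        a = D[i, same].sum() / (n_same - 1)   # mean distance within own cluster
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])
        scores[i] = (b - a) / max(a, b)
    return scores.mean()


# Toy example: two tight pairs of points far apart.
points = np.array([[0.0, 0.0], [0.0, 1.0], [10.0, 0.0], [10.0, 1.0]])
good = mean_silhouette(points, [0, 0, 1, 1])   # pairs kept together
bad = mean_silhouette(points, [0, 1, 0, 1])    # pairs split apart
print(good > bad)
```

Values close to 1 indicate compact, well-separated clusters, while negative values indicate points that sit closer to another cluster than to their own.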

Note that each consumer is classified by the k-medoids into one of the five non-overlapping groups based on their belongingness to each component quantified by matrix ${\mathbf {A}}$. To visualize the clustering result based on the k-medoids at the user level, we project the factor matrix ${\mathbf {A}}$ onto two-dimensional space by exploiting the t-SNE embedding [45] (Fig. S4 in SI). The t-SNE is a visualization technique that allows us to convert high-dimensional data into low dimensional vectors [45].
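As an illustration of the clustering step, the following sketch implements a simple alternating k-medoids procedure on the rows of a membership matrix and returns the total distance to the medoids, the quantity inspected when choosing k. It is not necessarily the variant used for the paper's results: the farthest-point initialization, the Euclidean distance, the function name `k_medoids`, and the synthetic membership rows are all assumptions.

```python
import numpy as np


def k_medoids(points, k, max_iter=100):
    """Alternating k-medoids with a deterministic farthest-point initialization."""
    D = np.linalg.norm(points[:, None, :] - points[None, :, :], axis=2)
    medoids = [0]
    while len(medoids) < k:
        # next medoid: the point farthest from all medoids chosen so far
        medoids.append(int(D[:, medoids].min(axis=1).argmax()))
    medoids = np.array(medoids)
    for _ in range(max_iter):
        labels = D[:, medoids].argmin(axis=1)            # assignment step
        new_medoids = medoids.copy()
        for c in range(k):
            members = np.flatnonzero(labels == c)
            if members.size:                             # update step
                within = D[np.ix_(members, members)].sum(axis=1)
                new_medoids[c] = members[within.argmin()]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    labels = D[:, medoids].argmin(axis=1)
    cost = D[np.arange(len(points)), medoids[labels]].sum()
    return labels, medoids, cost


# Synthetic membership rows (hypothetical): two groups of users, each loading
# mainly on a different component.
rng = np.random.default_rng(0)
group1 = np.abs(rng.normal([1.0, 0.0, 0.0], 0.05, size=(10, 3)))
group2 = np.abs(rng.normal([0.0, 1.0, 0.0], 0.05, size=(10, 3)))
A = np.vstack([group1, group2])

labels, medoids, cost_k2 = k_medoids(A, k=2)
_, _, cost_k1 = k_medoids(A, k=1)
print(labels, cost_k1 > cost_k2)
```

Plotting the returned cost against k and looking for the point where the decrease slows down gives the kind of elbow criterion described above.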

Characterizing clusters based on the demographic properties

Different multi-timescale expenditure patterns would reflect the users’ demographic characteristics because the status of a consumer (i.e., age, gender, marital status, etc) might determine, at least partially, the timing of shopping and the variety of items purchased. Here, we compare the demographic characteristics among the five clusters identified by the k-medoids method.

Figure 4 indicates that each user cluster is characterized by some demographic properties. Typical examples can be found in Cluster 1 and Cluster 4. Cluster 1 consists of relatively young consumers having no children, while Cluster 4 appears to be formed mainly by married elderly women who have children. We use the Chi-squared test to see if the demographic distribution in each cluster is significantly different from the null distribution obtained from the original demographic structure. The chi-squared statistic is given by the sum, over clusters and categories, of the squared difference between the observed number of users and the number expected under the null hypothesis, divided by the expected number: $\chi ^{2}=\sum _{m}\sum _{\ell } \frac{(D_{\ell m}-E_{\ell m})^{2}}{E_{\ell m}}$, where $D_{\ell m}$ denotes the observed number of consumers in category $\ell$ (i.e., Male, Female, etc) for Cluster m, and $E_{\ell m}$ is the expected number of consumers in category $\ell$ for Cluster m under the null [46].

The results from the Chi-squared tests suggest that for each demographic attribute (i.e., gender, age, marital status and child), the distribution of users identified by the clustering method is significantly different from the null distribution ($p < 0.001$). We also test whether there is a statistical difference in the distribution of users between two particular clusters. We conduct the statistical tests for all the pairwise combinations between different clusters. For all the demographic attributes, the null hypothesis is rejected for most of the pairs of clusters (Table S1 in SI).
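The chi-squared statistic defined above can be computed in a few lines. The sketch assumes that the null distribution assigns to every cluster the category shares of the whole user population, so that $E_{\ell m}$ is the cluster size times the overall share of category $\ell$. The function name `chi_squared` and the counts are made up for illustration and are not taken from the paper.

```python
def chi_squared(observed):
    """Chi-squared statistic for a cluster-by-category count table.

    observed[m][l] is the number of users of category l in cluster m.
    Assumption: under the null, each cluster mirrors the overall category shares.
    """
    n_categories = len(observed[0])
    total = sum(sum(row) for row in observed)
    share = [sum(row[l] for row in observed) / total for l in range(n_categories)]
    stat = 0.0
    for row in observed:
        cluster_size = sum(row)
        for l, d in enumerate(row):
            e = cluster_size * share[l]       # expected count E_lm
            stat += (d - e) ** 2 / e
    return stat


# Hypothetical counts: rows are clusters, columns are (Male, Female).
skewed = [[30, 10], [10, 30]]
balanced = [[20, 20], [20, 20]]
stat_skewed = chi_squared(skewed)
stat_balanced = chi_squared(balanced)
print(stat_skewed, stat_balanced)
```

In the skewed table every expected count is 20 and every observed count deviates by 10, so each of the four cells contributes $10^{2}/20 = 5$ and the statistic is 20; the balanced table matches the null exactly and gives 0.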

Finally, we answer RQ3 by focusing on representative users in each component, who are selected based on their belongingness to a component. Since the representative users in a given component would share similar demographic characteristics, we could identify which component is associated with which demographic properties.

We detect $R (=3)$ groups of representative users according to the following threshold rule: User i is considered to belong to group r if $a_{ir}/\sum _{r}a_{ir} \ge h_r$, where threshold $h_r$ is chosen such that only the upper 10 percent of users belong to group r. Figure 5 shows the demographic distributions of the representative users belonging to each component. We note that each user may belong to multiple components, but such overlap is quite small (Fig. S5 in SI).
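The threshold rule above can be sketched as follows. The example normalizes each row of the membership matrix and sets $h_r$ to the 90th percentile of the normalized column, which is one straightforward reading of "the upper 10 percent"; the function name `representative_users` and the random matrix are hypothetical, and rows are assumed to have a positive sum.

```python
import numpy as np


def representative_users(A, top_share=0.10):
    """Boolean I x R mask of representative users and the thresholds h_r."""
    shares = A / A.sum(axis=1, keepdims=True)     # a_ir / sum_r a_ir
    thresholds = np.quantile(shares, 1.0 - top_share, axis=0)
    return shares >= thresholds, thresholds


rng = np.random.default_rng(0)
A = rng.random((100, 3)) + 1e-3                   # toy membership matrix
members, h = representative_users(A)

group_sizes = members.sum(axis=0)                 # users per component
n_overlap = int((members.sum(axis=1) > 1).sum())  # users in several groups
print(group_sizes, n_overlap)
```

Because the rule is applied to each component separately, a user can satisfy it for more than one component, which is the overlap referred to in the text.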

We find that “Marital status” and “Child” are two demographic properties that distinguish Component 2 (Weekday-shopping pattern) from the other components (Fig. 5c and d). For these two family-related attributes, the demographic distribution of the representative consumers in Component 2 is clearly different from the null distribution. This finding suggests that “Marital status” and “Child” would be the two driving factors that yield the five clusters detected by the k-medoids. On the other hand, the difference in user age between clusters seems to be more reflected in the activity of Component 1 (Sunday-shopping pattern) and Component 3 (Saturday-shopping pattern) than in that of Component 2 (Fig. 5b), while it is not clear for gender (Fig. 5a). This means that gender and user age may be less important in extracting the multi-timescale patterns and the emergence of clusters classified by them.

Conclusion

We have presented an NTF-based method to extract dynamical shopping patterns of consumers from scanned receipt data collected through a bookkeeping application. The proposed method allows us to find intra- and inter-week expenditure patterns simultaneously, which would be impossible without such a large, high-resolution yet long time-series dataset. We found three multi-timescale patterns, each of which captures a characteristic expenditure behavior that is seen at daily and weekly scales.

While our method successfully revealed explicit patterns, there remain some issues that need to be addressed in future research. First, there may be other multi-timescale activity patterns that exist at time scales shorter and/or longer than daily and weekly. For instance, the timing of shopping may be affected by the time of day, and consumption of expensive goods (e.g., cars) may be scheduled once every ten years. Second, consumption patterns could also be encoded in what consumers purchase. While our analysis is based on the number of items purchased by a user, the composition of those items would also be useful for revealing the demographic characteristics of users. Third, more multi-timescale patterns may exist in other economic and social contexts, such as financial markets, online communication networks and face-to-face networks. NTF is a useful and user-friendly tool for the detection of multi-timescale properties, and we hope our work will stimulate further research on many economic and social activities to better understand human behavior.

References

1. Mankiw, N. Greg. (2003). Macroeconomics. New York: Worth Publishers.
2. Bell, David R., & Lattin, James M. (1998). Shopping behavior and consumer preference for store price format: Why large basket shoppers prefer EDLP. Marketing Science, 17, 66–88.
3. Attanasio, Orazio P, & Weber, Guglielmo. (2010). Consumption and saving: Models of intertemporal allocation and their implications for public policy. Journal of Economic Literature, 48, 693–751.
4. Hurd, Michael D, & Rohwedder, Susann. (2013). Heterogeneity in spending change at retirement. Journal of the Economics of Ageing, 1, 60–71.
5. Aguila, Emma, Attanasio, Orazio, & Meghir, Costas. (2011). Changes in consumption at retirement: evidence from panel data. Review of Economics and Statistics, 93, 1094–1099.
6. Bro, Rasmus, & Kiers, Henk A. L. (2003). A new efficient method for determining the number of components in PARAFAC models. Journal of Chemometrics, 17, 274–286.
7. Kolda, Tamara G, & Bader, Brett W. (2009). Tensor decompositions and applications. SIAM Review, 51, 455–500.
8. Lim, Lek-Heng, & Comon, Pierre. (2009). Nonnegative approximations of nonnegative tensors. Journal of Chemometrics, 23, 432–441.
9. Gauvin, Laetitia, Panisson, André, & Cattuto, Ciro. (2014). Detecting the community structure and activity patterns of temporal networks: a non-negative tensor factorization approach. PLOS ONE, 9, e13636.
10. Sapienza, Anna, Barrat, Alain, Cattuto, Ciro, & Gauvin, Laetitia. (2018). Estimating the outcome of spreading processes on networks with incomplete information: a dimensionality reduction approach. Physical Review E, 98, 012317.
11. Panisson, André, Gauvin, Laetitia, Quaggiotto, Marco & Cattuto, Ciro. (2014). Mining concurrent topical activity in microblog streams. arXiv:1403.1403.
12. Sapienza, Anna, Bessi, Alessandro, & Ferrara, Emilio. (2018). Non-negative tensor factorization for human behavioral pattern mining in online games. Information, 9, 66.
13. Hosseinmardi, Homa, Kao, Hsien-Te, Lerman, Kristina & Ferrara, Emilio. (2019). Discovering hidden structure in high dimensional human behavioral data via tensor factorization. arXiv:1905.08846.
14. Kobayashi, Teruyoshi, Sapienza, Anna, & Ferrara, Emilio. (2018). Extracting the multi-timescale activity patterns of online financial markets. Scientific Reports, 8, 11184.
15. Woodford, Michael. (2011). Interest and Prices: Foundations of a Theory of Monetary Policy. Princeton: Princeton University Press.
16. Walsh, Carl E. (2017). Monetary Theory and Policy, 4th ed. Cambridge: MIT press.
17. Campbell, John Y., & Mankiw, N. Gregory. (1989). Consumption, income, and interest rates: reinterpreting the time series evidence. NBER Macroeconomics Annual, 4, 185–216.
18. Johnson, David S, Parker, Jonathan A, & Souleles, Nicholas S. (2006). Household expenditure and the income tax rebates of 2001. American Economic Review, 96, 1589–1610.
19. Hsieh, Chang-Tai. (2003). Do consumers react to anticipated income changes? Evidence from the Alaska permanent fund. American Economic Review, 93, 397–405.
20. Alvarez-Cuadrado, Francisco, Monteiro, Goncalo, & Turnovsky, Stephen J. (2004). Habit formation, catching up with the Joneses, and economic growth. Journal of Economic Growth, 9, 47–80.
21. Havranek, Tomas, Rusnak, Marek, & Sokolova, Anna. (2017). Habit formation in consumption: a meta-analysis. European Economic Review, 95, 142–167.
22. Dynan, Karen E. (2000). Habit formation in consumer preferences: evidence from panel data. American Economic Review, 90, 391–406.
23. Guariglia, Alessandra, & Rossi, Mariacristina. (2002). Consumption, habit formation, and precautionary saving: evidence from the British household panel survey. Oxford Economic Papers, 54, 1–19.
24. Carrasco, Raquel, Labeaga, Jose M., & López-Salido, J. David. (2005). Consumption and habits: evidence from panel data. Economic Journal, 115, 144–165.
25. Browning, Martin, & Collado, M Dolores. (2007). Habits and heterogeneity in demands: a panel data analysis. Journal of Applied Econometrics, 22, 625–640.
26. Crawford, Ian. (2010). Habits revealed. Review of Economic Studies, 77, 1382–1402.
27. Fong, Alvis Cheuk M, Zhou, Baoyao, Hui, Siu Cheung, Hong, Guan Y, & Do, The Anh. (2011). Web content recommender system based on consumer behavior modeling. IEEE Transactions on Consumer Electronics, 57, 962–969.
28. Rosenquist, J Niels, Murabito, Joanne, Fowler, James H, & Christakis, Nicholas A. (2010). The spread of alcohol consumption behavior in a large social network. Annals of Internal Medicine, 152, 426–433.
29. Bressan, Marco, Leucci, Stefano, Panconesi, Alessandro, Raghavan, Prabhakar, & Terolli, Erisa. (2016). The limits of popularity-based recommendations, and the role of social ties. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 745–754.
30. De Choudhury, Munmun, Sharma, Sanket, & Kiciman, Emre. (2016). Characterizing dietary choices, nutrition, and language in food deserts via social media. In Proceedings of the 19th ACM Conference on Computer Supported Cooperative Work and Social Computing, pages 1157–1170.
31. Silva, Thiago H, de Melo, Pedro O S Vaz, Almeida, Jussara M, Musolesi, Mirco, & Loureiro, Antonio A F. (2017). A large-scale study of cultural differences using urban data about eating and drinking preferences. Information Systems, 72, 95–116.
32. Wagner, Claudia, Singer, Philipp, & Strohmaier, Markus. (2014). Spatial and temporal patterns of online food preferences. In Proceedings of the 23rd International Conference on World Wide Web, pages 553–554.
33. Moe, Wendy W. (2003). Buying, searching, or browsing: Differentiating between online shoppers using in-store navigational clickstream. Journal of Consumer Psychology, 13, 29–39.
34. Moe, Wendy W, & Fader, Peter S. (2004). Capturing evolving visit behavior in clickstream data. Journal of Interactive Marketing, 18, 5–19.
35. Olbrich, Rainer, & Holsing, Christian. (2011). Modeling consumer purchasing behavior in social shopping communities with clickstream data. International Journal of Electronic Commerce, 16, 15–40.
36. Senecal, Sylvain, Kalczynski, Pawel J, & Nantel, Jacques. (2005). Consumers’ decision-making process and their online shopping behavior: a clickstream analysis. Journal of Business Research, 58, 1599–1608.
37. Benson, Austin R., Kumar, Ravi, & Tomkins, Andrew. (2016). Modeling user consumption sequences. In Proceedings of the 25th International Conference on World Wide Web, pages 519–529.
38. Platzer, Michael, & Reutterer, Thomas. (2016). Ticking away the moments: Timing regularity helps to better predict customer activity. Marketing Science, 35, 779–799.
39. Kahn, Barbara E., & Schmittlein, David C. (1989). Shopping trip behavior: an empirical investigation. Marketing Letters, 1, 55–69.
40. Namin, Aidin, & Dehdashti, Yashar. (2019). A hidden side of consumer grocery shopping choice. Journal of Retailing and Consumer Services, 48, 16–27.
41. Bogomolova, Svetlana, Vorobyev, Konstantin, Page, Bill, & Bogomolov, Tim. (2016). Socio-demographic differences in supermarket shopper efficiency. Australasian Marketing Journal, 24, 108–115.
42. Dr. Wallet. (2020). https://www.drwallet.jp. Accessed 22 March 2020.
43. Kim, Jingu, & Park, Haesun. (2012). Fast nonnegative tensor factorization with an active-set-like method. In High-Performance Scientific Computing, pages 311–326. Springer.
44. Kaufman, Leonard, & Rousseeuw, Peter J. (2009). Finding Groups in Data: an Introduction to Cluster Analysis (Vol. 344). Hoboken: Wiley.
45. van der Maaten, Laurens, & Hinton, Geoffrey. (2008). Visualizing data using t-SNE. Journal of Machine Learning Research, 9, 2579–2605.
46. DeGroot, Morris H, & Schervish, Mark J. (2012). Probability and Statistics. Pearson Education.

Acknowledgements

AM and EF are grateful to DARPA (grant no. D16AP00115). TK acknowledges financial support from JSPS KAKENHI Grant nos. 15H05729 and 19H01506.

Author information

Authors and Affiliations

Department of Computer Science, University of Southern California, Los Angeles, CA, USA
Akira Matsui & Emilio Ferrara
Department of Economics, Kobe University, Kobe, Japan
Teruyoshi Kobayashi
Center for Computational Social Science, Kobe University, Kobe, Japan
Teruyoshi Kobayashi
AI Lab, CyberAgent, Inc., Shibuya, Tokyo, Japan
Daisuke Moriwaki
Department of Communication, University of Southern California, Los Angeles, CA, USA
Emilio Ferrara
Information Sciences Institute, University of Southern California, Los Angeles, CA, USA
Emilio Ferrara

Corresponding author

Correspondence to Teruyoshi Kobayashi.

Ethics declarations

Conflicts of interest

The authors have no conflicts of interest.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Matsui, A., Kobayashi, T., Moriwaki, D. et al. Detecting multi-timescale consumption patterns from receipt data: a non-negative tensor factorization approach. J Comput Soc Sc 6, 1179–1192 (2023). https://doi.org/10.1007/s42001-020-00078-5

Received: 28 April 2020
Accepted: 28 July 2020
Published: 20 August 2020
Issue Date: October 2023
DOI: https://doi.org/10.1007/s42001-020-00078-5
