
2017 IEEE 6th International Congress on Big Data

New E-Commerce User Interest Patterns

Matthias Volk, Abed Elrahman Shareef, Naoum Jamous, Klaus Turowski

Magdeburg Research and Competence Cluster for VLBA (MRCC)
Otto von Guericke University (OVGU)
Magdeburg, Germany
{matthias.volk,naoum.jamous,klaus.turowski}@ovgu.de

978-1-5386-1996-4/17 $31.00 © 2017 IEEE
DOI 10.1109/BigDataCongress.2017.60

Abstract—The number of online purchases is increasing constantly. Companies have recognized the related opportunities and are using online channels progressively. In order to acquire potential customers, companies often try to gain a better understanding through the use of web analytics. Web log files are one of the most useful sources. Basically, these provide an abundance of important information about the user behavior on a website, such as the path or access time. Mining this so-called clickstream data in the most comprehensive way has become an important task in order to predict the behavior of online customers, optimize webpages, and give personalized recommendations. As the number of customers constantly rises, the volume of the generated log files also increases, both in terms of size and quantity. Thus, for certain companies, the currently used technologies are no longer sufficient. In this work, a comprehensive workflow will be proposed using a clustering algorithm in a Hadoop ecosystem to investigate user interest patterns. The complete workflow will be demonstrated on an application scenario of one of the largest business-to-business (B2B) electronic commerce websites in Germany. Furthermore, an experimental evaluation method will be applied to verify the applicability and efficiency of the used algorithm, along with the associated framework.

Keywords: big data, clustering algorithm, clickstream data, hadoop ecosystem

I. INTRODUCTION

In times of mobile devices, all-encompassing networking, and the continuous shift of business into the Internet, the number of online transactions is steadily increasing. Studies such as the PWC 2016 report [1], E. Eichmann [2], and C. Annicelli et al. [3] highlight this change. In 2019, for instance, it is expected that every third person on Earth will make at least one purchase per year over the Internet, in comparison to around every fourth person in 2014 [3]. Companies have recognized the associated opportunities and are using online channels increasingly for their sales. The increased competition leads them to find innovative solutions with which they might attract new customers. Since the main objective should be to stand out from the competition, it is important to recognize the interests and needs of potential customers in order to provide specific recommendations through an attractive webpage. But in doing so, it is often a nontrivial undertaking to obtain detailed information about potential customers, especially when no purchases have been made yet.

In order to create an advantage over competitors, to acquire potential customers, and to position themselves in the market, web analyses are often carried out by firms. In particular, when it comes to investigating the behavior of an internet user, clickstream data has been established as a suitable source of information [4–6]. The data itself contains a large number of fields that can provide very detailed information about the user of a website. Thus, companies have used clickstream data for a variety of purposes so far. This supports them in designing their websites as attractively and intuitively as possible, and also enables them to provide direct recommendations of certain products to acquire customers [4]. However, the volume of this clickstream data continues to grow as a result of the increasing interest in online shopping and the spread of mobile devices [1]. Particularly when it comes to the efficient analysis and evaluation of such large amounts of data, current technologies frequently reach their limits.

In the long term, both aspects should be observed in this context: the specific data itself as well as the general analysis method. Therefore, a specific methodology for evaluating these data is crucial, both in terms of the overarching procedure and the used algorithm. This would ensure that only the data that promises a certain added value will be analyzed. In addition, the use of big data technologies appears to be useful in this context of very large amounts of differently structured data, as already confirmed by different cases with similar requirements [7]. This leads to the main research question: “How can massive user-generated clickstream data from e-commerce pages be examined in order to identify user-specific interest patterns?” Starting from this, various questions could be derived, which were also investigated in the context of this study:

• RQ1: Which information from the recorded clickstream data is essential and should be mined?
• RQ2: What is the most suitable algorithm to reveal interest patterns of a user?
• RQ3: How can the algorithm be applied to big data technologies in order to examine massive amounts of data without losing performance and throughput?

The main objective of this work was to implement an analysis method that examines massive amounts of user-generated clickstream data, using big data technologies, to reveal interest patterns. The expected added value arises not only from possible recommendations, but also from certain optimizations. This includes long-term strategic orientations of the product range, as well as improvements and expansion of the structure and functionalities of the respective website.

II. STRUCTURE

This work follows the design science research methodology according to Hevner et al. [8] and also the recommended workflow by Peffers et al. [9], to improve the clarity and the reproducibility of the investigation. According to this workflow, the design science methodology consists of six consecutive phases. In order to gain a detailed overview of their contents, all process steps are depicted together, with their main functionality and the respective chapter of this work, in Fig. 1.

Figure 1. The recommended workflow following Peffers et al. [9]

In the first chapter of the work, an initial motivation, the description of the problem, is given and the basic objectives are derived. This is identical to the main content of the first and second process step of the workflow.

Before the actual design and development take place, it is necessary to identify the required knowledge and theory. For this reason, the third chapter contains information about clickstream data, examines the state of the art of clustering algorithms, and describes big data implementations.

Chapter four, the main chapter, explains in detail the design and development of the intended solution and will be accompanied by a real-world example. In the subsequent chapter five, an experimental evaluation is realized to “observe and measure how well the artifact supports a solution to the problem” [9]. Chapter six discusses the findings of this work and gives an outlook on potential developments and future research. At the end of the paper, the final chapter provides a conclusion.

III. STATE OF THE ART

Before the design and development and therefore the implementation can take place, some formal considerations have to be made. For this reason, the following subsection provides some basic information about clickstream data. Furthermore, a structured literature review is presented which has been conducted to find an algorithm for analyzing such amounts of data. At the end of this chapter, the Knowledge Discovery in Databases (KDD) process will be described as a basic workflow to conduct data analysis starting from the collection to the interpretation of results.

A. Clickstream data

Clickstream data are essentially server-generated log entries which are created when a user accesses a website. Typically, they describe the “visitors path through one or more websites” [4]. The granularity, structure, and frequency with which these entries are generated depend on the configuration of the servers. In most cases, these semi-structured data contain a variety of different fields and information, as depicted in an exemplary log entry in Fig. 2.

[19/Jan/2017:12:00:04+0100]144.44.26.9- "GET/catalogdata/top?top=basket&protocol=http&withStylesheet=false&lgn=no&jQueryAlreadyLoaded=true&isCallCenterLogin=false&ViewName=live HTTP/1.1" 200 20485 "-" - basket-release-2017-01-18-A Java/1.8.0_40

Figure 2. Sample log entry

In addition to the current system of the user, other data, such as the time of access, the previous page, and the IP of the user, can be transmitted as well [4]. Thus, the use and analysis of such data offers great potential, especially in terms of electronic commerce (e-commerce). For instance, it can easily be identified at which page users stayed the longest. However, when it comes to more sophisticated questions, the right combination of a powerful processing framework and suitable analysis methods is necessary.

B. A literature review

Considering the previously described problem, the identification of interest patterns using clickstream data does not represent a trivial undertaking. For this reason, different methods and algorithms had to be compared to one another, not only to find the most suitable method for utilizing the data efficiently, but also to find out which part of the clickstream is the most important. Hence, a thorough literature review was conducted. For this, we used the methodology of structured literature research by Arlene Fink [10] and qualitative content analysis by Philipp Mayring [11]. The basic objective was to answer two of the previously derived research questions, in particular RQ1 and RQ2. After an initial examination and evaluation, 80 scientific papers were identified using different search terms and literature databases. By using a qualitative content analysis [11], this initial quantity could be reduced to 14 scientific papers.

A closer examination of the remaining contributions revealed that in many cases, user browsing behavior is influenced by various indicators. Most of these papers reveal various indicators which should be mined from this data, such as the clickstream path [12], frequency [13], or duration [14]. But at this point, some challenges still exist. For instance, most of the contributions derived the information only from a specific category. This is sometimes not very effective, for instance, when it comes to the optimization of complex website taxonomies or recommendations in terms of cross-selling. For this reason, the information flow should not be limited to only one category. Instead, the user interest and therefore the clickstream data of other categories should be recognized as another indicator as well.
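
To make the indicators named in this review more tangible, the following minimal Python sketch derives the path, frequency, duration, and the set of visited categories for each session. The `Click` record, its field names, and the sample values are assumptions made purely for illustration and are not taken from the investigated data. Duration is approximated here by the gap between two adjacent requests of the same session, so the last page of a session receives no duration.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Click:
    """One cleaned clickstream record (hypothetical field names)."""
    session_id: str
    timestamp: int   # seconds since the UNIX epoch
    category: str    # category of the requested page


def session_indicators(clicks):
    """Derive path, frequency, duration, and visited categories per session."""
    sessions = {}
    # Group clicks by session, ordered by request time within each session.
    for click in sorted(clicks, key=lambda c: (c.session_id, c.timestamp)):
        sessions.setdefault(click.session_id, []).append(click)

    result = {}
    for session_id, ordered in sessions.items():
        path = [c.category for c in ordered]
        # Time on a page is approximated by the gap to the next request.
        duration = Counter()
        for current, following in zip(ordered, ordered[1:]):
            duration[current.category] += following.timestamp - current.timestamp
        result[session_id] = {
            "path": path,                         # browsing sequence
            "frequency": dict(Counter(path)),     # visits per category
            "duration": dict(duration),           # seconds per category
            "categories": sorted(set(path)),      # spread across categories
        }
    return result


# Illustrative sample data (assumed values).
clicks = [
    Click("s1", 100, "printers"),
    Click("s1", 130, "binders"),
    Click("s1", 190, "printers"),
    Click("s2", 50, "led-bulbs"),
]
indicators = session_indicators(clicks)
print(indicators["s1"])
```

Keeping the visited categories as a separate entry mirrors the argument above that the information flow should not be limited to a single category; how these values are weighted and compared is left to the clustering step.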

Su et al. [15] proposed in their work an approach that combines three indicators to analyze the browsing behavior of website users. Specifically, a leader clustering approach was used which examines clickstream data on the visiting frequency, browsing sequence, and browsing duration of an e-commerce website. Contrary to some other contributions, this approach is not limited to a unique category and recognizes other categories as well. Furthermore, it can be noted that a few of the investigated contributions were outlined in that paper as well, such as [13, 16–18]. This may be related to the fact that very little research in this specific field has been carried out so far. However, this fact reveals that the main algorithm itself was influenced by other contributions. Hence, it has been assumed that this approach would be the most appropriate algorithm for further application. In order to give an overview of the investigated contributions and examine the suitability of this very comprehensive approach, a comparison between the collected investigations was realized. For this purpose, the indicators as described in Su et al. [15] were identified and used as the basic comparative features. The results of this classification are depicted in Table I.

TABLE I. CLICKSTREAM ANALYSIS LITERATURE

Approach                    Marked indicators (of Path, Frequency, Duration, Across Cat.)   Ref.
Bayesian Methods            x x x                                                           [19]
Collaborative filtering     x x x                                                           [18]
Fuzzy clustering            x x                                                             [14]
Stochastic regression       x x x                                                           [16]
Fuzzy leader clustering     x x x                                                           [20]
Model-based biclustering    x x                                                             [12]
Cross-sectional approach    x x                                                             [21]
Graph Clustering            x x x                                                           [22]
Decision Tree               x x x                                                           [23]
Fuzzy Co-Clustering         x x                                                             [13]
Logit modelling             x x x                                                           [24]
Association rule mining     x x                                                             [17]
K-Means clustering          x x                                                             [25]
Leader Clustering           x x x x                                                         [15]

C. The leader clustering algorithm

As one can easily notice, only a few contributions recognized all of the presented indicators. In most cases, not all were considered equally. The same applies for the across category indicator. For this reason, the leader clustering algorithm by Su et al. [15] was used.

Essentially, this algorithm allocates each user to a cluster based on the various indicators and the calculated similarity. The first entry of the dataset always takes the role of the leader in a new cluster. After this, each object, starting from the second entry in the dataset, may either be assigned to the most similar cluster (according to the chosen leader) or act as a new leader of a cluster. Nevertheless, it can be noted that this assignment depends on two different values which need to be determined experimentally beforehand. The general threshold γ defines at which point the user acts as the new leader of a cluster. This is the case when the similarity of certain indicators deviates too much from the threshold value. In case of multiple assignability, the second value, the rough threshold β, determines to which cluster the user can be assigned. In addition to these two values, however, the weighting of the individual indicators can also be determined. The pseudocode of the generally described leader clustering algorithm is shown in Fig. 3. More precise information and formulas concerning the calculation of similarity and the algorithm itself can be found within the contribution by Su et al. [15].

    INPUT
        S: clickstream dataset
        γ: general threshold
        β: rough threshold
    OUTPUT
        leaders: array of all leaders
        clusters: array of all clusters
    STEP:
        Randomly choose a user from dataset as a leader of the first cluster
        FOR EACH user_i IN dataset DO
            FOR EACH l_j IN leaders DO
                calculate the similarity sim_ij between l_j and user_i
            Sort leaders by corresponding similarities
            max_sim = largest value of similarity
            IF max_sim ≤ γ:
                Create a new cluster led by user_i
            ELSE:
                FOR EACH l_j IN sorted leaders DO
                    IF (sim_ij / max_sim) ≥ β:
                        add user_i into the cluster led by l_j
        RETURN clusters, leaders

Figure 3. The leader clustering algorithm [15]

D. Workflow for implementing data analyses

Although the intended algorithm is a combination of multiple research papers, the majority of the mentioned contributions did not provide specific implementation details. This is very important, especially when currently established technologies reach their limits and fail while preprocessing and analyzing massive amounts of data. At this point, big data technologies are increasingly used, especially when large amounts of differently structured data must be analyzed very quickly, as shown in numerous use cases [7]. However, it should be noted that the implementation of these projects is often a nontrivial undertaking. Usually, as in the case of conventional data analysis, several steps are necessary in advance. These steps also include the collection, cleansing, and transformation of the data [26]. Depending on the respective phase, additional technological considerations are necessary, considering the demands and the associated characteristics of the data. This is also highlighted by
different real-world applications and descriptions in various contributions, realizing such projects [27–39]. Presumably, this is due to the application specificity and the diversity of the individual technologies.

However, this could not be applied to the basic processing platform. In most of the cases, Hadoop or a complete package, such as Hortonworks, Cloudera, or MapR, has been used. A consensus, on the other hand, exists in the superordinate process itself. The KDD process was applied in most of the cases implicitly. This process is generally based on the work of Fayyad et al. [40]. It can be used to gain new insights and knowledge through the analysis of different datasets [4, 40]. In this multi-step process, the analysis of the data is gradually approached by the previously described steps: selection, preprocessing, data mining, and interpretation [40]. The main analysis is achieved in most cases by data mining methods and the following interpretation of the yielded results.

IV. DESIGN AND DEVELOPMENT

As described in the second chapter of this work, the design and development process represents the core of design science methodology. The result is often described as an artifact and may vary in its form [8]. Depending on the main objective, this can be presented, for instance, as a model, software, or a process. In the following chapter, a general procedure will be presented based on the previously described findings. Following this, a real-world application is demonstrated and subsequently evaluated within the next chapter.

A. The proposed approach

At the present time, a suitable algorithm has been found to derive user interest patterns from clickstream data [15], but not in which way this could be implemented, especially in terms of big data technologies. For this reason, an appropriate approach was developed using the previously discussed findings, as depicted in Fig. 4. It illustrates the basic procedure for the determination of user interest patterns using big data technologies on a high level. As already mentioned, the process is based on the individual steps of KDD and the necessary preconditions of the algorithm itself.

The first step involves the collection and storage of the user-generated log entries which act as the input clickstream data. Within the following steps, the data will be transferred to the preprocessing and the transformation layer. The importing tool used depends on the source of the data itself as well as the framework for the following preparation of the data. Due to the preprocessing, cleansing, and transformation at this stage, the framework should, in turn, be selected by the characteristics of the data. In doing so, consideration should be given, for instance, to the structure, quantity, and speed with which the data will be processed. As is the case for many big data projects, Hadoop could be used as a starting point.

For the analysis, according to the necessary data mining method, the leader clustering algorithm will be applied. Depending on the usability of the results, this can be adjusted by changing the major constants, such as the weighting or threshold values. The knowledge about the interest patterns of the users themselves will be derived through the interpretation of the data obtained by the analysis. After that, each user is left to decide whether to use the results for the optimization of the web pages, for the enhancement of decision support, or as a starting point for further research. This is the case, for instance, if further observations will be carried out by additional methods.

Figure 4. The proposed workflow

However, when it comes to the technical implementation, it should be noted that certain requirements and conditions may change. In this case, the practitioner of this workflow must react as soon as possible, for instance, to structural changes of the data in the preprocessing stage.

B. Demonstration of the artefact

According to the chosen methodology, a demonstration of the developed model will now be given. More specifically, the clickstream data of one of the largest European B2B trade companies was investigated in one batch using the proposed workflow and big data technologies. In doing so, the data of a complete financial year was used as a starting point. This includes a total of 1366 million entries of clickstream data distributed over 4672 log files. At the technical level, a cluster consisting of three servers, each with 6 cores at 2.60 GHz, 128 GB of RAM, and 8 terabytes of hard drive, was used to implement this project. Corresponding to the respective levels of the proposed model, appropriate big data technologies had to be identified. As a core, the Hadoop ecosystem has been used. The exact specifications, according to the individual steps in the workflow, including the underlying hardware, can be found in Table II. Furthermore, the applied workflow in conjunction with the used technologies is depicted in Fig. 5.

1) Storage, selection and ingestion

Before the actual implementation and application of the algorithm took place, the data had to be selected first and then transferred. According to the previous description, the log files of the web servers were selected as a suitable source of the clickstream data. The transmission of the addressed data can be carried out in different ways. Specifically, after a preliminary exploration, four reliable tools have been identified. Sqoop, Flume, Kafka, and Pentaho Data Integration (PDI), which works with the Secure File Transfer Protocol (SFTP), turned out to be suitable tools. The latter
was implemented to transfer the data from the weblog server to Hadoop HDFS. Due to the given semi-structured data and the nature of the batch processing, this seemed to be the best option.

TABLE II. APPLICATION FRAMEWORK BASED ON THE KDD STEPS

Level                             Specifications
Hardware                          One cluster with three servers, each with 6 cores at 2.60 GHz, 128 GB RAM, 8 TB HD
Platform                          Hadoop ecosystem 2.6.0 and HDFS with Cloudera cdh5.4.2
Selection                         Pentaho PDI 5.4.0
Preprocessing & Transformation    Regular expressions, MapReduce jobs, and Hive 1.1.0 on Spark
Analysis                          Hive HQL, Pig scripts, Impala
Interpretation                    Hue GUI, Solr, and Microsoft Excel

2) Preprocessing

After the data was successfully transferred, some preparatory measures had to be taken before the actual analysis could be carried out. In addition to the processing of the raw data itself, the relevant indicators for the algorithm had to be identified and extracted. First of all, an initial cleansing of the data was performed to remove duplicates and filter out invalid and incomplete entries. Due to the complexity of the task and the amount of data, this was realized through the use of MapReduce. For instance, in the case of incomplete entries, attention was paid to fields such as the session ID, Uniform Resource Locator (URL), or the time, which are important as input for the algorithm [15]. At the same time, the data was converted into the tab-separated value (TSV) format and divided into equal-sized blocks to allow post-examination using Hive and Pig. After the incorrect entries were removed, the relevant data was extracted. At this point, it had to be ensured that all remaining entries were successful entries from outside of the company network. More specifically, every Hypertext Transfer Protocol (HTTP) request was checked for a positive status code (200) and an external IP address. As a result, all irrelevant entries were removed, namely those without the targeted status code, those with an internal IP, and those with an unrelated Uniform Resource Identifier (URI) file type extension, such as JavaScript (.js). A total of 94% of all clickstream data was removed.

3) Transformation

For the determination of the individual indicators, all remaining data had to be transformed into its desired format. The timestamp values were recorded in the format 'YYYY-MM-DD HH:MM:SS'. For the later calculation of the relative time duration, the timestamp of each record was transformed into its corresponding UNIX type. For the identification of the respective products and categories, the associated assignment was loaded from the data warehouse and compared by means of a Hive user defined function (Hive UDF).

Different fields exist to identify users within the clickstream data, such as the IP address or the session ID. Because problems arise when multiple users share a common IP address, the session ID was used, as recommended by [15]. According to this, during the sessionization, a set of clicks within a certain session ID needs to be grouped and assigned to a specific user. This transformation step was also implemented using a Hive UDF.

Figure 5. Technical representation of the intended workflow

C. Data analysis

Subsequently, the individual indicators had to be defined as input parameters for data analysis. The duration was calculated by subtracting the page request times of two adjacent pages belonging to the same session ID. In doing so, it was found that the number of sessions decreases when the time interval increases.

This is not to be considered a negative finding. But due to the heterogeneity of the sequence in terms of the viewed products and categories, it could be assumed that the initial user intention has changed within the sequence. Specifically, this applies to very long visiting paths. Based on this finding, a suitable temporal threshold value had to be determined until which the requested page is added to the
current sequence. After numerous experimental tests, the limit of 240 seconds per page was set. Therefore, the end of each sequence was marked by exceeding the set time limit. Thus, the sequence was implicitly determined by the temporal delineation. The frequency with which the user visited the single product and category pages within his or her sequence was determined by analyzing the single transformed clickstream entries.

Additionally, it should be mentioned that some sessions which could distort the results were omitted. Thus, only sequences were considered that had more than one entry and less than one thousand. In the case of the latter, we assumed that these were spam bots.

Due to the experimental investigation of the duration and the dependency of the derived sequence, we slightly shifted the focus to the frequency. Hence, the weighting value of the frequency variable was set to 0.4, whereas the values of the path and time indicators were both set to 0.3.

Similar to the previously described values, the remaining thresholds were also determined by experimental tests. After first evaluations, we found that a general threshold with a value higher than 0.25 generates many very small clusters. This partly leads to very misleading results and interest patterns in some of the clusters due to their specific scope. For the rough threshold, which is responsible for the clustering overlapping, we set the value to 1 and decreased it slowly to see the differences. We realized that the number of overlapping clusters drastically increased when the value went below 0.95. This rapid change of overlaps is comprehensible since the analyzed dataset comes from a B2B e-commerce website. Hence, many cluster leaders share a similar navigation approach, especially if they might be employed by the same company.

After all targeted data was cleaned and transformed, and the important constants were determined, the algorithm could be executed. For the implementation and execution, Impala and Hive were used.

D. Interpretation

Overall, 741 clusters were generated. A closer examination of the results showed a high fluctuation of user requests in almost all clusters. Even within a less extensive cluster, many different products and categories were queried, as shown in the taxonomy of Fig. 6.

Figure 6. Taxonomy of one derived interest pattern

Based on this, it could be identified that if a user wants to buy accessories, such as printer consumables and office labels, they have to either go back to the closest category or use the website search engine. In both cases, the user spends a long time finding what they are looking for. From this information, the categorization of products and the navigation structure could be reengineered to enhance the user experience. Furthermore, recommendations could be given, such as binders on webpages where printers are listed and vice versa.

V. EVALUATION

To observe and measure the applicability and efficiency of the developed artifact, it is necessary to verify and validate the proposed solution [9]. For this reason, an experimental evaluation [8] was carried out in two parts using different datasets with the already implemented technical framework of the real-world use case. On the one hand, the developed workflow from Fig. 4 itself was evaluated and, on the other hand, the usability of the results obtained from it. Initially, the results from the demonstration were compared to those of another record. In contrast to the previous dataset, the latter contained only data from the following month. Basically, this represents around one twelfth of the amount contained in a complete financial year. This illustrates that even smaller datasets can be examined exclusively, as is already viable with conventional technologies. However, this can lead to stability problems, for instance caused by statistical outliers. Regarding this, only large amounts of data should be analyzed or aggregated. This can be observed on one of 741 clusters in Fig. 7. The first record contains log entries of more than one year and the second only of one month.

Figure 7. Comparison of the dataset results

As mentioned at the beginning of this paper, the resulting knowledge might be used to optimize, for instance, the structure of the website. Therefore, the dependencies and movement of the user between the categories were identified. As an example, the ten most frequent connections to a further cluster with the same category level were derived and presented in Table III. As one can easily notice, these connections are largely reciprocal, which indicates the correlation between these categories. However,
due to the low amount of clickstream data, there is no clear preference or ranking, for example, of further product recommendations. This is not the case when much larger records are used, as found in the first dataset.

TABLE III. DERIVED CATEGORY CONJUNCTIONS

First Category                Next Category             Clicks
Switching Program             Cable Accessories         91
Cable Accessories             Switching Program         90
Outdoor Wall Lights           Switching Program         75
Cooking & Grill Technology    Switching Program         70
Switching Program             Ceiling-Mounted Lights    69
Switching Program             LED Bulbs                 68
Bulbs - Fluorescent Bases     Switching Program         65
LED Bulbs                     Switching Program         62
Ceiling-Mounted Lights        Switching Program         62
Switching Program             Outdoor Wall Lights       62

As initially assumed, the comparison of two datasets of different length demonstrated the reliability and efficiency of the model. It could also be shown that in such analyses, primarily very extensive datasets should be examined or at least considered when it comes to real-time evaluations. In the future, further evaluations using different use cases could be realized to obtain knowledge about technical implementations. Thus, derivations of possible technology recommendations are also conceivable, in general as well as for the individual phases. Furthermore, general methods could be established in order to determine the threshold values and the weightings of the single indicators.

VI. DISCUSSION

Within the scope of this work, a process model was [...] and real-time analysis, as planned by the described company (see Fig. 5). In the use case itself, an implementation was carried out as an example on the basis of batch processing. Particularly when it comes to possible recommendations, which are not only given during a further visit or later by means of a personalized e-mail, such time-critical evaluations are essential. A starting point here would be, for instance, the use of Flume, which is a reliable tool for real-time ingestion. It should be mentioned, however, that additional investigations would be necessary to obtain such precise technological specifications. The sole examination of a single case study is not sufficient. In addition to these future developments in the technological sense, further optimizations of the proposed approach are imaginable. Thus, the extension of the algorithm appears to be sensible by considering additional data, such as from ratings, social networks, customer communications, or created wish lists. A diversification of such additional sources is also shown in the technical implementation of the demonstrated use case in Fig. 5. In this way, particularly extensive interest patterns can be derived which are composed both of the derived behavior and the personally expressed opinion of the user. Furthermore, in the sense of the extension, procedures also appear conceivable which present a generalized sequence to specify both the individual indicator weightings and the threshold values.

VII. CONCLUSIONS

In this work, a new workflow that illustrates the basic procedure for the determination of user interest patterns using big data technologies on a high level has been provided. The overall goal was to identify user-specific interest patterns from massive amounts of user-generated clickstream data. A structured literature review was carried out to find a suitable solution. Using this, it was found that at the present time many approaches exist to analyze clickstream data. The approach by Su et al. [15] has proved
developed and demonstrated, based on the known sequence to be particularly extensive. For this reason, it was applied in
of KDD (cf. Fig 4.). At its core, it allows the application of the further course of this work. Since no common method of
the leader clustering algorithm using big data technologies. possible implementation and application, especially in the
Despite the developed model and the successful context of big data technologies, has been found, the widely
demonstration, it has to be observed that strict adherence is used KDD [40] was used. Therefore, a process has been
not an absolute guarantee for a successful implementation. developed that uses the implementation of the leader
Especially in terms of huge amounts of clickstream data, clustering algorithm considering big data technologies. This
there are certain specifications which have to be considered. was demonstrated in a real-world case and evaluated using
For instance, the complexity of the preprocessing and various datasets.
transformation depends on how the targeted clickstream data
is structured, as seen in the steps of the demonstration. REFERENCES
Furthermore, the right choice of technologies is crucial. As [1] PWC, They say they want a revolution: Total Retail 2016. [Online]
the pure implementation of the projects has shown, there are Available: http://www.pwc.com/gx/en/retail-
still significant differences and uncertainties, particularly consumer/publications/assets/total-retail-global-report.pdf. Accessed
on: Jan. 19 2017.
regarding the choice of the right technology. In the future, [2] E. Eichmann, eCommerce Industry Outlook 2016. [Online]
further investigation could be realized to find at which point Available: http://www.criteo.com/de/resources/criteo-ecommerce-
the use of big data technologies and thus the application of industry-outlook-2016/.
the proposed solution seems reasonable, as described in [41]. [3] C. Annicelli et al., Worldwide retail ecommerce sales: emarketer´s
Additionally, a recommendation of specific technologies updated estimates and forecast through 2019. [Online] Available:
http://www.emarketer.com/public_media/docs/eMarketer_eTailWes
depending on the respective stage, of both the origin model t2016_Worldwide_ECommerce_Report.pdf.
as well as the developed solution (Fig. 4), appears promising. [4] J. Lee, M. Podlaseck, E. Schonberg, and R. Hoch, “Visualization
Regarding this, it is also possible to evaluate technologies and Analysis of Clickstream Data of Online Stores for
that allow, for example, an even higher degree of automation

Understanding Web Merchandising,” Data Mining and Knowledge Discovery, vol. 5, no. 1/2, pp. 59–84, 2001.
[5] R. Cooley, P.-N. Tan, and J. Srivastava, “Discovery of Interesting Usage Patterns from Web Data,” in Lecture Notes in Computer Science / Lecture Notes in Artificial Intelligence, vol. 1836, Web usage analysis and user profiling: International WEBKDD'99 Workshop, San Diego, CA, USA, August 15, 1999; revised papers, B. Masand, Ed., Berlin: Springer, 2000, pp. 163–182.
[6] S. Senecal, P. J. Kalczynski, and J. Nantel, “Consumers' decision-making process and their online shopping behavior: A clickstream analysis,” Journal of Business Research, vol. 58, no. 11, pp. 1599–1608, 2005.
[7] NIST Big Data Public Working Group (NBD-PWG), NIST Big Data Interoperability Framework: Volume 3, Use Cases and General Requirements: National Institute of Standards and Technology, 2015.
[8] A. R. Hevner, S. T. March, J. Park, and S. Ram, “Design science in information systems research,” MIS Quarterly, vol. 28, no. 1, pp. 75–105, 2004.
[9] K. Peffers, T. Tuunanen, M. Rothenberger, and S. Chatterjee, “A Design Science Research Methodology for Information Systems Research,” J. Manage. Inf. Syst., vol. 24, no. 3, pp. 45–77, 2007.
[10] A. Fink, Conducting research literature reviews: From the internet to paper, 4th ed. Los Angeles: SAGE, 2014.
[11] P. Mayring, Qualitative Inhaltsanalyse: Grundlagen und Techniken, 11th ed. Weinheim: Beltz, 2010.
[12] V. Melnykov, “Model-based biclustering of clickstream data,” Computational Statistics & Data Analysis, vol. 93, pp. 31–45, 2016.
[13] R. Rathipriya and K. Thangavel, “A Fuzzy Co-Clustering approach for Clickstream Data Pattern,” Global Journal of Computer Science and Technology, vol. 10, no. 6, http://computerresearch.org/index.php/computer/article/download/960/958, 2010.
[14] L. Zheng, S. Cui, D. Yue, and X. Zhao, “User interest modeling based on browsing behavior,” in 3rd International Conference on Advanced Computer Theory and Engineering (ICACTE), 2010: 20 - 22 Aug. 2010, Chengdu, China; proceedings, Piscataway, NJ: IEEE, 2010, pp. V5-455–V5-458.
[15] Q. Su and L. Chen, “A method for discovering clusters of e-commerce interest patterns using click-stream data,” Electronic Commerce Research and Applications, vol. 14, no. 1, pp. 1–13, 2015.
[16] W. W. Moe and P. S. Fader, “Dynamic Conversion Behavior at E-Commerce Sites,” Management Science, vol. 50, no. 3, pp. 326–335, 2004.
[17] Y. S. Kim and B.-J. Yum, “Recommender system based on click stream data using association rule mining,” Expert Systems with Applications, vol. 38, no. 10, pp. 13320–13327, 2011.
[18] Y.-J. Park and K.-N. Chang, “Individual and group behavior-based customer profile model for personalized product recommendation,” Expert Systems with Applications, vol. 36, no. 2, pp. 1932–1939, 2009.
[19] C. Sismeiro and R. E. Bucklin, “Modeling Purchase Behavior at an E-Commerce Web Site: A Task-Completion Approach,” Journal of Marketing Research, vol. 41, no. 3, pp. 306–323, 2004.
[20] H. Yu et al., “A novel possibilistic fuzzy leader clustering algorithm,” HIS, vol. 8, no. 1, pp. 31–40, 2011.
[21] L. Aguiar and B. Martens, Digital music consumption on the Internet: Evidence from Clickstream data. Luxembourg: Publications Office, 2013.
[22] G. Silahtaroglu and H. Donertasli, “Analysis and prediction of E-customers' behavior by mining clickstream data,” in 2015 IEEE International Conference on Big Data (Big Data): IEEE, 2015, pp. 1466–1472.
[23] G. Wang, X. Zhang, S. Tang, H. Zheng, and B. Y. Zhao, “Unsupervised Clickstream Clustering for User Behavior Analysis,” in CHI 2016: #chi4good; proceedings; The 34th Annual CHI Conference on Human Factors in Computing Systems, San Jose, CA, USA, May 07 - 12, 2016, New York, NY: ACM, 2016, pp. 225–236.
[24] D. van den Poel and W. Buckinx, “Predicting online-purchasing behaviour,” European Journal of Operational Research, vol. 166, no. 2, pp. 557–575, 2005.
[25] W. W. Moe, “Buying, Searching, or Browsing: Differentiating Between Online Shoppers Using In-Store Navigational Clickstream,” Journal of Consumer Psychology, vol. 13, no. 1, pp. 29–39, 2003.
[26] NIST Big Data Public Working Group (NBD-PWG), NIST Big Data Interoperability Framework: Volume 1, Definitions: National Institute of Standards and Technology, 2015.
[27] T. Hansmann and P. Niemeyer, “Big Data - Characterizing an Emerging Research Field Using Topic Models,” in IEEE/WIC/ACM International Joint Conferences on Web Intelligence (WI) and Intelligent Agent Technologies (IAT), 2014: 11 - 14 Aug. 2014, Warsaw, Poland; proceedings; [including workshops]; held as part of the 2014 Web Intelligence Congress (WIC '14), Piscataway, NJ: IEEE, 2014, pp. 43–51.
[28] A. Kumaresan, “Framework for Building a Big Data Platform for Publishing Industry,” in Lecture Notes in Business Information Processing, vol. 224, Knowledge management in organizations: 10th international conference, KMO 2015, Maribor, Slovenia, August 24-28, 2015: proceedings, L. Uden, M. Heričko, and I.-H. Ting, Eds., Cham, Heidelberg, New York, Dordrecht, London: Springer, 2015, pp. 377–388.
[29] J. Zhan et al., Eds., Study of the key technologies of electric power big data and its application prospects in smart grid. 2014 IEEE PES Asia-Pacific Power and Energy Engineering Conference (APPEEC), 2014.
[30] H. Hu, Y. Wen, T.-S. Chua, and X. Li, “Toward Scalable Systems for Big Data Analytics: A Technology Tutorial,” IEEE Access, vol. 2, pp. 652–687, 2014.
[31] C. L. Philip Chen and C.-Y. Zhang, “Data-intensive applications, challenges, techniques and technologies: A survey on Big Data,” Information Sciences, vol. 275, pp. 314–347, 2014.
[32] I. A. T. Hashem et al., “The rise of “big data” on cloud computing: Review and open research issues,” Information Systems, vol. 47, pp. 98–115, 2015.
[33] M. Chen, S. Mao, and Y. Liu, “Big Data: A Survey,” Mobile Netw Appl, vol. 19, no. 2, pp. 171–209, 2014.
[34] D. Dev and R. Patgiri, “A Survey of Different Technologies and Recent Challenges of Big Data,” in Smart Innovation, Systems and Technologies, Proceedings of 3rd International Conference on Advanced Computing, Networking and Informatics, A. Nagar, D. P. Mohapatra, and N. Chaki, Eds., New Delhi: Springer India, 2016, pp. 537–548.
[35] P. Pääkkönen and D. Pakkala, “Reference Architecture and Classification of Technologies, Products and Services for Big Data Systems,” Big Data Research, vol. 2, no. 4, pp. 166–186, 2015.
[36] L. Rodríguez-Mazahua et al., “A general perspective of Big Data: Applications, tools, challenges and trends,” J Supercomput, vol. 72, no. 8, pp. 3073–3113, 2016.
[37] M. D. Assunção, R. N. Calheiros, S. Bianchi, M. A. Netto, and R. Buyya, “Big Data computing and clouds: Trends and future directions,” Journal of Parallel and Distributed Computing, vol. 79-80, pp. 3–15, 2015.
[38] G. Bello-Orgaz, J. J. Jung, and D. Camacho, “Social big data: Recent achievements and new challenges,” Information Fusion, vol. 28, pp. 45–59, 2016.
[39] G. Pole and P. Gera, “A Recent Study of Emerging Tools and Technologies Boosting Big Data Analytics,” in Advances in Intelligent Systems and Computing, Innovations in Computer Science and Engineering, H. S. Saini, R. Sayal, and S. S. Rawat, Eds., Singapore: Springer Singapore, 2016, pp. 29–36.
[40] U. Fayyad, G. Piatetsky-Shapiro, and P. Smyth, “From data mining to knowledge discovery in databases,” AI Magazine, vol. 17, no. 3, p. 37, 1996.
[41] M. Volk, S. Hart, S. Bosse, and K. Turowski, “How much is Big Data? A Classification Framework for IT Projects and Technologies,” in 22nd Americas Conference on Information Systems, San Diego, CA, USA, August 11-14, 2016.
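
As an illustrative note on the category conjunctions reported in Table III: the paper does not state how the click counts were computed, so the following is only a minimal sketch under stated assumptions. It assumes that each session is already available as an ordered list of category names (the preprocessing that produces such lists is not shown), that a conjunction is a direct transition between two different categories, and it ignores the restriction to clusters of the same category level. The function names and the sample sessions are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Tuple

Transition = Tuple[str, str]


def count_category_conjunctions(sessions: Iterable[List[str]]) -> Counter:
    """Count directed transitions between consecutive, different categories.

    Assumption: every session is an ordered list of category names,
    one entry per click.
    """
    counts: Counter = Counter()
    for clicks in sessions:
        # Pair each click with its successor inside the same session.
        for first, nxt in zip(clicks, clicks[1:]):
            if first != nxt:  # skip clicks that stay in the same category
                counts[(first, nxt)] += 1
    return counts


def top_conjunctions(counts: Counter, n: int = 10) -> List[Tuple[Transition, int]]:
    """Return the n most frequent (first category, next category) pairs."""
    return counts.most_common(n)


# Hypothetical sample sessions, for illustration only (not the paper's data).
sessions = [
    ["Switching Program", "Cable Accessories", "Switching Program"],
    ["LED Bulbs", "Switching Program", "Cable Accessories"],
    ["Switching Program", "Switching Program", "LED Bulbs"],
]

counts = count_category_conjunctions(sessions)
for (first, nxt), clicks in top_conjunctions(counts, n=3):
    print(f"{first} -> {nxt}: {clicks}")
```

Because the transitions are counted as directed pairs, a mutual relationship such as Switching Program and Cable Accessories appears as two separate rows, which matches the layout of Table III.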



978-1-5386-1996-4/17 $31.00 © 2017 IEEE 406

Figure 2. Sample log entry

In addition to the current system of the user, other data,

Figure 5. Technical representation of the intended workflow

Figure 7. Comparison of the dataset results
