Available online at www.sciencedirect.com

ScienceDirect
Procedia Computer Science 79 (2016) 986 – 992

7th International Conference on Communication, Computing and Virtualization 2016

Performing Customer Behavior Analysis using Big Data Analytics

Anindita A Khade
Assistant Professor, SIESGST, Nerul, India

Abstract

Although many systems have implemented customer behavior analytics, it is still an emerging and
largely unexplored market with great potential for further advancement. Big data is one of the
fastest-rising technology trends, with the capability to significantly change the way business
organizations analyze customer behavior and transform it into valuable insights. Decision trees
can be used efficiently for analyzing such data. This paper proposes a MapReduce implementation
of a well-known statistical classifier, the C4.5 decision tree algorithm. In addition, the
system aims to implement customer data visualization using Data-Driven Documents (D3.js), which
allows us to build highly customized graphics.
© 2016 The Authors. Published by Elsevier B.V. This is an open access article under the CC BY-NC-ND license
(http://creativecommons.org/licenses/by-nc-nd/4.0/).
Peer-review under responsibility of the Organizing Committee of ICCCV 2016.

Keywords: Big Data analytics; C4.5 algorithm; D3.js; Data visualization; Hadoop; MapReduce

1. Main text

Big data is a collection of largely unstructured data that has a very large volume, comes from a
variety of sources such as the web and business organizations in different formats, and arrives
at great velocity, which makes processing with traditional database management tools complex
and tedious. It can be described as a growing torrent. The major challenges in big data
processing therefore include storage, search, distribution, transfer, analysis and visualization.
Earlier, the term 'analytics' indicated the study of existing data to research potential
trends and to analyze the effects of certain decisions or events, which can be used for business
intelligence to gain valuable insights. Today's biggest challenge is how to discover the hidden
information in the huge amount of data collected from a wide range of sources. This is where
Big Data Analytics comes into the picture. One of its applications is customer behavior
analysis, which is referred to as customer analytics.

ISSN 1877-0509, doi:10.1016/j.procs.2016.03.125

Customer analytics helps to turn big data into big value by allowing organizations to predict
buyer behavior, thereby improving sales, market optimization, inventory planning, fraud
detection and many other applications. A wide range of approaches is available, but the one
that stands out is the use of decision trees for classification, which can be used efficiently
in customer analytics.
Various decision tree algorithms have been developed over time, with improvements in
performance and in the ability to handle various types of data. One well-known decision tree
algorithm is C4.5 [3-4], an extension of the basic ID3 decision tree algorithm [5]. Customer
analytics is incomplete without visualization of the data. In addition to classifying data
using decision trees, it is important to visualize the data so that organizations can see and
understand the variations in customer patterns.

2. Literature Survey

Traditional Analytical Systems For Customer Behavior[7]:

In the late 1960s, there were two approaches to constructing Database Management Systems
(DBMSs). The first approach was based on the hierarchical data model, typified by IMS
(Information Management System) from IBM, developed in response to the enormous information
storage requirements generated by the Apollo space program. The second approach
was based on the network data model, which attempted to create a database standard and resolve
some of the difficulties of the hierarchical model, such as its inability to represent complex
relationships effectively. However, these two models had some fundamental disadvantages:
complex programs had to be written to answer even simple queries, and there was minimal
data independence.

Many experimental relational DBMSs were implemented thereafter, with the first commercial
products appearing in the 1970s and early 1980s. Relational DBMSs, used extensively in the
1980s and 1990s, were limited in meeting the more complex entity and data needs of companies as
their operations and applications became increasingly complex. In response to the increasing
complexity of database applications, two new data models emerged: the Object-Relational
Database Management System (ORDBMS) and the Object-Oriented Database Management
System (OODBMS), which subscribe to the relational and object data models respectively.
Together, the OODBMS and ORDBMS represent the third generation of
Database Management Systems.

Dawn Of Big Data Analytics:

Data turns into big data when its volume, velocity, or variety goes beyond the ability of IT
operational systems to gather, store, analyze, and process it. Most organizations are
capable of handling vast amounts of unstructured data using a variety of tools and equipment,
but with the rapidly growing volume and fast flood of data, they lack the capability to
mine it and derive the necessary insights in a timely way.

Big Data is emerging from the realm of science projects at companies to help
telecommunication giants understand exactly which customers are unhappy with their service and
what processes caused the dissatisfaction, and predict which customers are going to change
service. To obtain this information, billions of loosely structured bytes of data in different
locations need to be processed until the required data is found. This type of analysis
enables executive management to fix faulty processes or people and possibly reach out
to retain at-risk customers. Big data is becoming one of the most important technology trends,
with the potential to dramatically change the way organizations analyze customer behavior
and transform it into valuable insights. [11]

Key concepts of Customer analytics[6] :

The survey on customer analytics revealed the following key concepts:

1) Venn Diagram – Discovering Hidden Relationships

Combine multiple segments to discover connections, relationships or differences. Explore
customers that have bought different categories of products and easily identify cross-selling
opportunities.

2) Data Profiling – Identify Customer Attributes

Select records from your data tree and generate customer profiles that indicate common features
and behaviors. Use customer profiles to inform effective sales and marketing strategy.

3) Forecasting – Time Series Analysis

Forecasting enables you to adapt to changes, trends and seasonal patterns. You can accurately
predict monthly sales volume or anticipate the number of orders expected in any given month.

4) Mapping – Identify Geographical Zones

Mapping uses color-coding to indicate customer behavior as it changes across geographic
regions. A map divided into polygons that represent geographic regions shows you where your
churners are concentrated or where specific products sell the most.

5) Association Rules – Cause/ Effect – Basket Analysis

This technique detects relationship or affinity patterns across data and generates a set of rules.
It automatically selects the rules that are most useful to key business insights: What products
do customers purchase simultaneously and when? Which customers are not buying and why?
What new cross-selling opportunities exist?
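
As a rough illustration of basket analysis, the following Python sketch counts how often two products appear in the same basket and derives simple rule statistics. The transactions, product names and the `pair_rules` helper are hypothetical, and support and confidence are the standard rule measures rather than anything specified by the surveyed tool [6].

```python
from collections import Counter
from itertools import combinations


def pair_rules(baskets, min_support=0.4):
    """Return {(a, b): (support, confidence)} for rules "a -> b".

    support    = share of baskets containing both a and b
    confidence = share of baskets containing a that also contain b
    """
    n = len(baskets)
    item_counts = Counter()
    pair_counts = Counter()
    for basket in baskets:
        items = sorted(set(basket))          # ignore duplicates in a basket
        item_counts.update(items)
        pair_counts.update(combinations(items, 2))

    rules = {}
    for (a, b), count in pair_counts.items():
        support = count / n
        if support < min_support:
            continue                          # pair is too rare to report
        rules[(a, b)] = (support, count / item_counts[a])
        rules[(b, a)] = (support, count / item_counts[b])
    return rules


# Hypothetical purchase baskets.
baskets = [
    ["bread", "butter", "milk"],
    ["bread", "butter"],
    ["bread", "jam"],
    ["butter", "milk"],
    ["bread", "butter", "jam"],
]
rules = pair_rules(baskets)
for (a, b), (support, confidence) in sorted(rules.items()):
    print(f"{a} -> {b}: support={support:.2f}, confidence={confidence:.2f}")
```

A rule such as "milk -> butter" with high confidence suggests a cross-selling opportunity, although a production system would typically mine larger itemsets than pairs.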

6) Decision Tree – Classify and Predict Behavior

Decision trees are one of the most popular methods for classification in various data mining
applications and assist the process of decision making. Classification helps you select the
right products to recommend to particular customers and predict potential churn. The most
commonly used decision tree algorithms include ID3, C4.5 and CART.

Tools for data visualization

1) Polymaps: Polymaps is a free JavaScript library and a joint project from SimpleGeo and
Stamen. This complex map overlay tool can load data at a range of scales, offering multi-zoom
functionality at levels ranging from country all the way down to street view. [12]

2) Flot: A JavaScript plotting library for jQuery, Flot is a browser-based application compatible
with most common browsers, including Internet Explorer, Chrome, Firefox, Safari and
Opera. Flot supports a variety of visualization options for data points, interactive charts, stacked
charts, panning and zooming, and other capabilities through a variety of plugins for specific
functionality. [12]

3) D3.js: A JavaScript library for creating data visualizations with an emphasis on web
standards. Using HTML, SVG and CSS, it brings documents to life with a data-driven approach to
DOM manipulation, with the full capabilities of modern browsers and no constraints of
proprietary frameworks. [12]

4) SAS Visual Analytics: SAS Visual Analytics is a tool for exploring data sets of all sizes
visually for more comprehensive analytics. With an intuitive platform and automatic
forecasting tools, SAS Visual Analytics allows even non-technical users to explore the deeper
relationships behind data and uncover hidden opportunities. [12]

3. Related Technologies

3.1 Apache Hadoop

Apache Hadoop [13] is an open source software framework [16]. Hadoop consists of two
main components: a distributed processing framework named MapReduce and a
distributed file system known as the Hadoop Distributed File System, or HDFS [2]. One of
the most important reasons for using this framework in this project is its ability to
process and analyze a large amount of data, which is impractical with traditional systems.
Storage is provided by HDFS and analysis is done by MapReduce. Although Hadoop is best
known for MapReduce and its distributed file system, its other subprojects provide
complementary services or build on the core to provide higher-level abstractions. [1]

3.2 Hadoop Distributed File System:

The Hadoop Distributed File System (HDFS) [15] is the storage component. In short, HDFS
provides a distributed architecture for extremely large scale storage, which can easily be
extended by scaling out. When a file is stored in HDFS, it is divided into evenly sized
blocks. The block size can be customized, or the predefined default can be used. In this
project, the customer dataset is stored in HDFS. The dataset contains a large number of
customer purchase records. The output file containing the decision rules is also
written into HDFS.
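
To make the block layout concrete, the sketch below computes the block sizes for a file of a given length. It assumes a 128 MB block size (the default in Hadoop 2.x; the paper does not state which value is used), and `split_into_blocks` is a hypothetical helper, not an HDFS API. Note that the final block holds only the remaining bytes, so blocks are evenly sized except possibly the last one.

```python
def split_into_blocks(file_size, block_size=128 * 1024 * 1024):
    """Return the list of block lengths (in bytes) for a file of file_size bytes."""
    if file_size < 0 or block_size <= 0:
        raise ValueError("file_size must be >= 0 and block_size must be > 0")
    full_blocks, remainder = divmod(file_size, block_size)
    blocks = [block_size] * full_blocks
    if remainder:
        blocks.append(remainder)  # last block is only as large as the leftover data
    return blocks


MB = 1024 * 1024
blocks = split_into_blocks(300 * MB)  # hypothetical 300 MB customer dataset
print(len(blocks), [b // MB for b in blocks])
```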

3.3 MapReduce Model:

MapReduce is a programming model for processing and generating large data sets with a
parallel, distributed algorithm on a cluster. MapReduce works by breaking the processing
into two phases: the map phase and the reduce phase. Each phase has key-value pairs as
input and output, the types of which may be chosen by the programmer. The programmer
also specifies two functions: the map function and the reduce function. The input to our
map phase is the raw customer data. We choose a text input format that gives us each
line in the dataset as a text value. The key is the offset of the beginning of the line from the
beginning of the file. The output from the map function is processed by the MapReduce
framework before being sent to the reduce function. This processing sorts and groups the
key-value pairs by key. [1]
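
The data flow described above can be imitated in a single process. The following Python sketch is only a simulation under stated assumptions: it is not Hadoop code, the names `run_mapreduce`, `class_mapper` and `sum_reducer` are hypothetical, and the sample records assume comma-separated lines whose last field is a class label. It shows the (offset, line) input pairs, the sort-and-group step, and the reduce call per key.

```python
from collections import defaultdict


def run_mapreduce(text, mapper, reducer):
    """Simulate map -> sort/group by key -> reduce on an in-memory text file."""
    # Build input records as (byte offset of line start, line text),
    # mirroring a line-oriented text input format.
    records = []
    offset = 0
    for line in text.splitlines(keepends=True):
        records.append((offset, line.rstrip("\n")))
        offset += len(line.encode("utf-8"))

    # Map phase: every input pair may emit any number of output pairs.
    grouped = defaultdict(list)
    for key, value in records:
        for out_key, out_value in mapper(key, value):
            grouped[out_key].append(out_value)

    # Framework step: sort by key, then call reduce once per key.
    results = []
    for out_key in sorted(grouped):
        results.extend(reducer(out_key, grouped[out_key]))
    return results


def class_mapper(offset, line):
    """Emit (class label, 1); the offset key is ignored."""
    fields = line.split(",")
    if fields[-1]:
        yield fields[-1].strip(), 1


def sum_reducer(key, values):
    """Sum the counts collected for one key."""
    yield key, sum(values)


data = "young,low,no\nold,high,yes\nyoung,high,yes\n"
result = run_mapreduce(data, class_mapper, sum_reducer)
print(result)
```

On a real cluster the map and reduce calls run in parallel on different nodes; the single loop here only preserves the order of the phases.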

Fig. 1: MapReduce Programming Model

The Java code for the map and reduce functions in this implementation overrides the default
map and reduce functions provided by the Hadoop framework. The programming logic of both
functions is based on the C4.5 algorithm.

2. Methodology
The flow of the system is as follows:
1) Load the customer dataset from HDFS as input for the algorithm.
2) Create an instance of the C4.5 class.
3) Using the MapReduce framework of Hadoop, invoke the Map function, which checks
whether each instance belongs to the current node. For every uncovered attribute, it
outputs the attribute index, its value and the class label of the instance.
4) The Reduce function counts the number of occurrences of each combination of (index,
value, class label) and outputs the count against it.
5) Calculate the entropy, information gain and gain ratio of the attributes.
6) Process the input dataset from HDFS according to the C4.5 decision tree algorithm
in the MapReduce framework.
7) Generate the decision rules and store them in HDFS.
8) Accept new test data from the web UI.
9) Access the rules and, based on them, decide the category of the new data.
10) Provide visualization of the dataset from HDFS on the web UI in the form of bar graphs, pie
charts, etc., using D3.js.
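
Steps 3 and 4 can be sketched as follows. This is an illustrative Python approximation, not the paper's Java implementation: the function names, the comma-separated record layout with the class label in the last field, and the `node_conditions` dictionary (attribute index mapped to the value required on the path to the current node) are all assumptions made for the example.

```python
from collections import Counter


def c45_map(line, node_conditions):
    """Step 3: for an instance that belongs to the current node, emit
    ((attribute index, value, class label), 1) for each uncovered attribute."""
    fields = [f.strip() for f in line.split(",")]
    attributes, label = fields[:-1], fields[-1]

    # The instance belongs to the node only if it matches every test
    # on the path from the root to that node.
    for index, required in node_conditions.items():
        if attributes[index] != required:
            return

    for index, value in enumerate(attributes):
        if index not in node_conditions:  # attribute not yet used on this path
            yield (index, value, label), 1


def c45_reduce(pairs):
    """Step 4: count occurrences of each (index, value, class label) key."""
    counts = Counter()
    for key, one in pairs:
        counts[key] += one
    return counts


# Hypothetical records: age group, spending level, class label.
lines = [
    "young,low,no",
    "young,high,yes",
    "old,high,yes",
    "old,low,yes",
]

node = {0: "young"}  # current node: attribute 0 has already been split on "young"
pairs = [kv for line in lines for kv in c45_map(line, node)]
counts = c45_reduce(pairs)
print(dict(counts))
```

The resulting counts are exactly the inputs needed in step 5, since entropy and gain are functions of how many instances fall into each (value, class) combination.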

Fig. 2: Flowchart of the Proposed System

2.1 C4.5 Algorithm:
C4.5 [3-4] is an algorithm developed by Ross Quinlan for generating decision trees.
It is an extension of Quinlan's earlier ID3 algorithm. The decision trees
generated by C4.5 can be used for classification, and for this reason C4.5 is often referred
to as a statistical classifier. C4.5 uses the gain ratio, a normalized form of information
gain, as its splitting criterion. It can accept data with categorical or numerical values.
To handle continuous values, it generates a threshold and then divides the instances into
those with values above the threshold and those with values equal to or below it.
C4.5 can also handle missing values, since missing attribute values are not used in its
gain calculations. [8]
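
The threshold step for a continuous attribute can be sketched as below. This is a minimal illustration, assuming base-2 logarithms and assuming that every observed value except the largest is tried as a candidate threshold and ranked by information gain; the helper names and the sample data are hypothetical.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def best_threshold(values, labels):
    """Return (threshold, gain) for the best split "value <= t" vs "value > t"."""
    base = entropy(labels)
    best = (None, 0.0)
    # The largest value is skipped because no instance would lie above it.
    for t in sorted(set(values))[:-1]:
        low = [lab for v, lab in zip(values, labels) if v <= t]
        high = [lab for v, lab in zip(values, labels) if v > t]
        remainder = (len(low) * entropy(low) + len(high) * entropy(high)) / len(labels)
        gain = base - remainder
        if gain > best[1]:
            best = (t, gain)
    return best


# Hypothetical customers: age and whether they bought the product.
ages = [22, 25, 31, 38, 45, 52]
bought = ["no", "no", "no", "yes", "yes", "yes"]
threshold, gain = best_threshold(ages, bought)
print(threshold, round(gain, 3))
```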

Let C denote the number of classes. In this case, there are two classes into which the
records are classified: yes and no. Let p(S, j) be the proportion of
instances in S that are assigned to the j-th class. The entropy of attribute S is then
calculated as:
$$\mathrm{Entropy}(S) = -\sum_{j=1}^{C} p(S, j) \log p(S, j)$$

Entropy is calculated for each value of a particular attribute, over the records that take that value.
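
As a worked example of the entropy formula, consider a hypothetical set S of 14 customer records, 9 labelled yes and 5 labelled no, so that p(S, yes) = 9/14 and p(S, no) = 5/14. The numbers are illustrative only, and the logarithm is assumed to be base 2 so that the result is in bits:

```latex
\begin{align*}
\mathrm{Entropy}(S)
  &= -\tfrac{9}{14}\log_2\tfrac{9}{14} - \tfrac{5}{14}\log_2\tfrac{5}{14} \\
  &\approx 0.4098 + 0.5305 \\
  &\approx 0.940 \text{ bits}
\end{align*}
```

A set with only one class would give an entropy of 0, and an even 7/7 split would give the two-class maximum of 1 bit.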

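Accordingly, the information gain of attribute S on a training dataset T is defined as:
$$\mathrm{Gain}(S, T) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(T_S)} \frac{|T_{S,v}|}{|T_S|} \, \mathrm{Entropy}(T_{S,v})$$
where Values(T_S) is the set of values of S in T, T_S is the subset of T induced by S, and
T_{S,v} is the subset of T in which attribute S has a value of v. Each term of the sum weights
the entropy of one subset by the fraction of records that fall into it.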
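
The gain computation can be sketched in Python as follows. The split information and gain ratio used here are the standard C4.5 definitions (split information is the entropy of the partition sizes, and gain ratio is gain divided by split information); they are not spelled out in the text above, so treat them, the base-2 logarithm, the function names and the sample rows as assumptions of this example.

```python
from collections import Counter, defaultdict
from math import log2


def entropy(labels):
    """Entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())


def gain_and_ratio(rows, attr_index):
    """Return (information gain, gain ratio) for splitting rows on one attribute.

    rows: list of tuples whose last element is the class label.
    """
    labels = [row[-1] for row in rows]

    # Partition the class labels by attribute value (the subsets T_{S,v}).
    partitions = defaultdict(list)
    for row in rows:
        partitions[row[attr_index]].append(row[-1])

    n = len(rows)
    weighted = sum(len(p) / n * entropy(p) for p in partitions.values())
    gain = entropy(labels) - weighted

    # Split information penalizes attributes with many small partitions.
    split_info = -sum(len(p) / n * log2(len(p) / n) for p in partitions.values())
    ratio = gain / split_info if split_info > 0 else 0.0
    return gain, ratio


# Hypothetical records: age group, spending level, class label.
rows = [
    ("young", "low", "no"),
    ("young", "high", "no"),
    ("old", "low", "yes"),
    ("old", "high", "yes"),
]
for index, name in enumerate(["age", "spending"]):
    gain, ratio = gain_and_ratio(rows, index)
    print(f"{name}: gain={gain:.3f}, gain ratio={ratio:.3f}")
```

In this toy data the age attribute separates the classes perfectly, so it would be chosen for the split, while the spending attribute carries no information about the class.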

2.2 Data Visualization using D3.js:

D3.js is a JavaScript library for manipulating documents based on data. D3 helps you bring
data to life using HTML, SVG, and CSS. D3's emphasis on web standards gives you the full
capabilities of modern browsers without tying yourself to a proprietary framework, combining
powerful visualization components and a data-driven approach to DOM manipulation. [14]

Key features of D3.js[14]:

o Bind arbitrary data to DOM
o Create interactive SVG bar charts
o Generate HTML tables from data sets
o Variety of components and plugins to enhance capabilities
o Built-in reusable components for ease of coding

4. Conclusion

This paper describes a proposed system for a distributed implementation of the C4.5 algorithm
using the MapReduce framework, along with customer data visualization. With the rise of
cloud computing and big data, traditional decision tree algorithms no longer scale to the
data volumes involved, and hence we introduced the MapReduce implementation of the C4.5
decision tree algorithm. Visualization using D3.js is fast and reusable because it uses
standard HTML elements along with Scalable Vector Graphics (SVG). In future work,
fast, real-time database systems such as Apache HBase or MongoDB can be incorporated
into this system. In addition, distributed algorithms such as ForestTree, implemented
in Apache Mahout, can be used to increase efficiency and scalability.

5. References

1. Tom White, "Hadoop: The Definitive Guide", 3rd Edition, O'Reilly Media, Inc., Sebastopol, CA 95472, 2012.
2. Dirk deRoos, Paul C. Zikopoulos, Bruce Brown, Rafael Coss, Roman B. Melnyk, "Hadoop For Dummies", John Wiley
& Sons, Inc., Hoboken, New Jersey, 2014.
3. J.R. Quinlan, "C4.5: Programs for Machine Learning", Morgan Kaufmann, 1993.
4. J.R. Quinlan, "Improved use of continuous attributes in C4.5", arXiv, 1996, preprint cs/9603103.
5. J.R. Quinlan, "Induction of decision trees", Machine Learning, vol. 1, no. 1, 1986, pp. 81-106.
6. Actuate Corporation, "Customer Analytics Turn Big Data into Big Value". Available:
http://birtanalytics.actuate.com/customer-analytics-turn-big-data-into-big-value
7. Seamus Rispin, "Database Resources", The Institute of Certified Public Accountants, Ireland.
8. Mr. Brijain R Patel, Mr. Kushik K Rana (2014). A Survey on Decision Tree Algorithm for Classification. IJEDR [Online] 2(1).
Available: http://www.ijedr.org/papers/IJEDR1401001.pdf
9. Wei Dai and Wei Ji (2014). A MapReduce Implementation of C4.5 Decision Tree Algorithm. International Journal of
Database Theory and Application [Online] 7(1), pp. 49-60. Available:
http://www.chinacloud.cn/upload/2014-03/14031920373451.pdf
10. Surbhi Hardikar, Ankur Shrivastava, Vijay Choudhary (2012). Comparison between ID3 and C4.5 in Contrast to
IDS [Online] 2(7), pp. 659-667. Available: www.vsrdjournals.com
11. David Floyer (2014, Jan). Enterprise Big-data [Online]. Available: http://wikibon.org/wiki/v/Enterprise_Big-data
12. Andy Lurie (2014, Feb). 39 Data visualization tools for big data [Online]. ProfitBricks, The IaaS Company. Available:
https://blog.profitbricks.com/39-data-visualization-tools-for-big-data
13. Apache Hadoop: http://hadoop.apache.org/releases.html
14. D3.js: http://d3js.org
15. HDFS: http://hortonworks.com/hadoop/hdfs
16. http://en.wikipedia.org/wiki/Apache_Hadoop
