LJS Assignment 3
Contents
Introduction
Problem Statement
What is Data Mining?
Why do we have to use data mining?
Processes of data mining
Data mining techniques
Examples of major data mining application areas
Predictive analytics
Actual examples of Data Mining using its tool: Orange toolkit
Conclusion
References
IBM 2101 INTRODUCTION TO BUSINESS ANALYTICS
Introduction
This assignment is about data mining and how we apply our research, analytical,
problem-solving, teamwork and collaboration, project management, and communication
skills to this topic. Modern business problems call for modern solutions, and data mining
is one such technique. The term data mining refers to the process of extracting knowledge
by discovering previously unknown trends and patterns in a large amount of data and
building predictive and descriptive models from them. For this assignment we used the
Orange toolkit, a data mining tool that is free and easy to use, and loaded a sample data
set into it. Our data set contains video game sales. We created diagrams and charts to show
how the attributes of the data set are correlated and to discuss the factors that matter for
solving business problems. We also created a predictive model of future sales that a game
sales company can use in its decision making.
Problem Statement
Why does anyone do business? To make the world a better place, or for their own comfort?
Most businesses aim for profit, which affects everything in a business, because without
money you cannot do business. Every step of creating a company requires capital, which
can be raised through fundraising or taken from your own savings. Over time, millions of
companies have been created worldwide, so there are competitors everywhere and it is
hard to build a successful company. Since our data set is about video game sales, let us say
that we are creating a video game sales company that is paid to sell other companies'
games worldwide. So how do we maximize our profits? We are paid for the games we sell,
so we need to sell as many as possible. Not everyone likes the games we sell, however, so
we need to gather information about the games sold worldwide and their trends. In brief,
we need specific information on the regions where we sell games, genre preferences by
age, popular gaming platforms, and popular game companies. With this information we can
create a predictive model that shows which games are likely to sell in the near future.
Data mining offers many features that help a company increase its efficiency. First of all,
since companies hold an abundance of data, they need to turn that data into useful
knowledge and information that they can use in applications such as market analysis,
scientific exploration, and production control. The data a company owns is not simple: it
can be financial data in banking, sales data in retail, medical history data in health care, or
policy and claim data in insurance. It is nearly impossible for a human to replace data
mining software, since there are millions of records with many attributes, and the volume
grows so quickly each day that a person working at normal speed could not even interpret
the current data.
Data mining is divided into six processes that turn data into useful knowledge:
1. Problem definition: A data mining project starts with identifying and understanding
the business problem. Data mining experts, domain experts, and business experts work
closely together to define the objectives of the project and its requirements from a
business perspective. Data mining tools are not required in the problem definition phase.
2. Data exploration: Domain experts work out the meaning of the metadata. They explore,
collect, and describe the data, and they identify its quality problems. Traditional data
analysis tools are required in the data exploration phase.
3. Data preparation: Domain experts build the data model for the modeling process. They
collect, format, and cleanse the data, because some data mining features only accept data
in a certain format. They also create new derived attributes. In the data preparation phase,
the data is reworked multiple times, in no fixed order. A data modeling tool is used during
this phase to prepare the data by selecting records, attributes, and tables, but the meaning
of the data remains the same.
4. Data modeling: Data mining experts select and apply various data mining features,
because different features can be used for the same type of data mining problem. Some of
the features require specific data types. The data modeling phase and the evaluation phase
are coupled: they can be repeated several times, changing the parameters until optimal
values are achieved.
5. Data evaluation: Data mining experts evaluate the data model. If the model does not
satisfy their expectations, they go back to the data modeling phase and rebuild the model
by changing its parameters until they achieve optimal values. When they are satisfied with
the model, they can extract business explanations and evaluate questions such as "Have all
business issues been considered?" and "Does the model achieve the business objective?"
At the end of the data evaluation phase, data mining experts decide how to use the data
mining results.
6. Deployment: Data mining experts use the data mining results by exporting them into
other applications or database tables.
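
The coupling between the modeling and evaluation phases can be sketched in a few lines of
Python. This is only a minimal illustration under stated assumptions: the records, the rule
"score >= threshold", the candidate thresholds, and the 0.85 accuracy target are all
hypothetical and do not come from the assignment's data set. For brevity the sketch scores
the rule on the same records it is tuned on, whereas a real project would evaluate on
held-out data.

```python
# Hypothetical records: (review_score, was_bestseller). Illustrative only.
records = [(55, False), (60, False), (68, False), (72, True),
           (75, False), (81, True), (88, True), (93, True)]


def accuracy(threshold, data):
    """Evaluation step: share of records the rule 'score >= threshold' gets right."""
    correct = sum((score >= threshold) == label for score, label in data)
    return correct / len(data)


def tune_threshold(data, candidates, target=0.85):
    """Modeling/evaluation loop: change the parameter until the target is met."""
    best_threshold, best_accuracy = None, -1.0
    for threshold in candidates:
        acc = accuracy(threshold, data)
        if acc > best_accuracy:
            best_threshold, best_accuracy = threshold, acc
        if best_accuracy >= target:
            break  # expectations are satisfied, so stop rebuilding the model
    return best_threshold, best_accuracy


best_threshold, best_accuracy = tune_threshold(records, candidates=range(50, 100, 5))
```

The loop stops as soon as one parameter value meets the target, which mirrors going back
to the modeling phase only while the evaluation is unsatisfactory.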
Usually, there are three types of data mining techniques: classification and regression,
clustering, and deviation detection. Classification is the process of assigning a new data
record to one of several predefined classes or categories, while regression deals with
predicting real-valued fields; both are forms of supervised learning. Clustering is the
process of partitioning the data set into groups or subsets so that the elements of a group
share a common set of properties; it is a form of unsupervised learning. Deviation detection
is the process of finding the records that differ most from the other records, such as
outliers. These records may be discarded as noise.
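
The three technique families can be illustrated with a small Python sketch. It is a
simplified illustration, not how any particular tool implements them: the nearest-center
rule, the two-group k-means loop, the z-score limit of 2.0, and all numbers are assumptions
chosen for the example.

```python
from statistics import mean, pstdev


def classify(value, class_centers):
    """Classification: assign a record to the nearest predefined class."""
    return min(class_centers, key=lambda label: abs(value - class_centers[label]))


def cluster_two_groups(values, rounds=10):
    """Clustering: a basic one-dimensional k-means with two groups."""
    low, high = min(values), max(values)
    groups = ([], [])
    for _ in range(rounds):
        groups = ([], [])
        for v in values:
            groups[0 if abs(v - low) <= abs(v - high) else 1].append(v)
        # Recompute each center; keep the old one if a group is empty.
        low = mean(groups[0]) if groups[0] else low
        high = mean(groups[1]) if groups[1] else high
    return groups


def find_outliers(values, z_limit=2.0):
    """Deviation detection: flag records far from the mean in z-score terms."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []
    return [v for v in values if abs(v - mu) / sigma > z_limit]


# Hypothetical weekly play hours per customer.
label = classify(18, {"casual": 3, "heavy": 21})
groups = cluster_two_groups([2, 3, 2, 4, 20, 22, 21, 23])
outliers = find_outliers([2, 3, 2, 4, 3, 2, 3, 4, 40])
```

Only `classify` needs class definitions supplied in advance, which is what makes it
supervised; the other two functions work from the records alone.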
The major application areas for data mining include marketing and banking. In marketing,
most major direct marketing companies use modeling and data mining to perform customer
segmentation, finding out where their products sell and gathering information that guides
their marketing. All industries can take advantage of data mining to discover discrete
segments in their customer bases by considering additional variables beyond traditional
analysis. In banking, data mining is used for credit card fraud detection and to identify
loyal customers. By identifying customer segments, card issuers and acquirers can improve
profitability with more effective retention and acquisition programs. Card issuers can also
use data mining to price their products so as to maximize profit and minimize the loss of
customers. Beyond credit card fraud, data mining also helps detect money laundering,
phone fraud, and securities fraud.
Predictive analytics
Predictive analytics helps business users discover unknown patterns and build predictive
models such as decision trees and linear regression. Its techniques can be divided into data
mining, machine learning, and predictive modeling. The two best-known process models are
Knowledge Discovery from Data (KDD) and the Cross-Industry Standard Process for Data
Mining (CRISP-DM). First, we will discuss KDD. KDD refers to the overall process of finding
knowledge in data and emphasizes the high-level application of particular data mining
methods. Its goal is to extract knowledge from data in the context of large databases by
using data mining methods. KDD consists of five steps: Selection, Preprocessing and
cleaning, Transformation and feature selection, Data mining, and Interpretation/Evaluation.
It starts with creating a target data set by selecting the data to work on. The selected data
is then cleaned and preprocessed, which includes removing noise and outliers, collecting
the information needed to model or account for noise, deciding on strategies for handling
missing data fields, and accounting for time-sequence information and known changes. Next
comes data reduction and projection: finding useful features to represent the data
depending on the goal of the task, and using transformation methods to reduce the
effective number of variables. The data mining step then searches for patterns of interest
in a particular representational form, such as classification trees or rules, regression, or
clustering. Finally, the mined patterns are interpreted.
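
The five KDD steps can be pictured as a chain of small functions. The sketch below is a toy
illustration under assumptions: the rows, the field names, the year cut-off, and the choice
to drop rows with missing sales are all hypothetical, and a real project might handle
missing fields differently.

```python
from collections import defaultdict

# Hypothetical raw rows; field names and values are illustrative only.
raw_rows = [
    {"name": "Game A", "genre": "Action", "year": 2010, "global_sales": 5.2},
    {"name": "Game B", "genre": "Sports", "year": 2011, "global_sales": None},
    {"name": "Game C", "genre": "Action", "year": 2012, "global_sales": 3.1},
    {"name": "Game D", "genre": "Puzzle", "year": 1995, "global_sales": 0.4},
    {"name": "Game E", "genre": "Sports", "year": 2013, "global_sales": 2.5},
]


def select(rows, first_year):
    """Selection: build the target data set."""
    return [r for r in rows if r["year"] >= first_year]


def clean(rows):
    """Preprocessing: drop rows with missing sales (one possible strategy)."""
    return [r for r in rows if r["global_sales"] is not None]


def transform(rows):
    """Transformation: keep only the features needed for the task."""
    return [(r["genre"], r["global_sales"]) for r in rows]


def mine(pairs):
    """Data mining: a very simple pattern, the average sales per genre."""
    totals, counts = defaultdict(float), defaultdict(int)
    for genre, sales in pairs:
        totals[genre] += sales
        counts[genre] += 1
    return {genre: totals[genre] / counts[genre] for genre in totals}


def interpret(pattern):
    """Interpretation: rank genres from strongest to weakest."""
    return sorted(pattern, key=pattern.get, reverse=True)


pattern = mine(transform(clean(select(raw_rows, first_year=2000))))
ranking = interpret(pattern)
```

Keeping each step as its own function makes it easy to swap one strategy, for example a
different way of handling missing fields, without touching the rest of the chain.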
The second model is CRISP-DM. It was created in 1996 and funded by the European
Commission, and it is widely used by businesses. The model works through a series of
tasks. First, the project team learns the business objectives and assesses the situation.
Next, it sets data mining goals and produces a project plan. With the plan in place, it
collects, describes, and explores the data and verifies its quality. It then selects, cleans,
constructs, integrates, and formats the data. After that, it selects the modeling techniques
to use, designs the tests, and builds and assesses the model. It evaluates the results and
reviews the process to determine the next steps. Finally, when the results are satisfactory,
it plans for deployment, monitoring, and maintenance, produces a final report, and reviews
the project.
Using a video game sales data set, I loaded this example into the Orange toolkit to show
how it can be used for data mining.
As shown above, I used two attributes of the data set, genre and global sales, to find out
which genres of games sell the most. The picture shows that Action is the best-selling
genre in this data set.
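
The aggregation behind such a chart can be sketched in Python: sum global sales per genre
and pick the largest total. The rows below are hypothetical and are not taken from the
actual data set; they were chosen only so that the example also ends with Action on top.

```python
from collections import defaultdict

# Hypothetical (genre, global_sales_in_millions) rows; not the real data set.
rows = [("Action", 8.0), ("Sports", 6.5), ("Action", 4.0),
        ("Puzzle", 1.5), ("Sports", 2.0), ("Action", 3.5)]


def total_sales_by_genre(pairs):
    """Sum global sales for each genre."""
    totals = defaultdict(float)
    for genre, sales in pairs:
        totals[genre] += sales
    return dict(totals)


totals = total_sales_by_genre(rows)
top_genre = max(totals, key=totals.get)
```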
This is a predictive generalized linear model of global sales for my data set.
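
To give an idea of what such a model does, the sketch below fits ordinary least squares, the
simplest special case of a generalized linear model, to one predictor. It is not the model
produced in the Orange toolkit: the choice of regional sales as the predictor and all of the
numbers are assumptions made for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


# Hypothetical data: regional sales and global sales, both in millions.
regional = [1.0, 2.0, 3.0, 4.0]
global_sales = [2.5, 4.5, 6.5, 8.5]

intercept, slope = fit_line(regional, global_sales)
predicted = intercept + slope * 5.0  # predicted global sales for 5.0 regional
```

Once the intercept and slope are fitted, predicting sales for a new record is a single
multiplication and addition, which is why such models are easy to use in decision making.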
Conclusion
Data mining is important: in the 21st century, companies need it to increase or maximize
their profit by discovering hidden patterns in the information they have gathered, which
people cannot do manually because there is so much of it. Many data mining tools and
techniques have been developed. Some of them are free to use, but a company dealing
with real business should pay for better service. Through this assignment I learned various
data mining methods. By understanding the process, I used data mining features to create
a diagram of my video game sales data set, to support strategic decisions, and to find a
segment for my sales. I learned that data mining is beneficial for companies that are
seeking improvement.
References
Big Data Made Simple. (2019). 12 common problems in Data Mining. [online] Available at:
https://bigdata-madesimple.com/12-common-problems-in-data-mining/ [Accessed 22 Nov. 2019].
Hackernoon.com. (2019). 9 unusual problems that can be solved using Data Science. [online]
Available at:
https://hackernoon.com/9-unusual-problems-that-can-be-solved-using-data-science-e7dbb89aa0c4
[Accessed 22 Nov. 2019].
Insights, S. (2019). Predictive Analytics: What it is and why it matters. [online] Sas.com. Available at:
https://www.sas.com/en_us/insights/analytics/predictive-analytics.html [Accessed 22 Nov. 2019].
McKinsey & Company. (2019). Three keys to building a data-driven strategy. [online] Available at:
https://www.mckinsey.com/business-functions/mckinsey-digital/our-insights/three-keys-to-building-a-data-driven-strategy
[Accessed 22 Nov. 2019].