Unit 2 1
Introduction to analytics:
• Four types of analytics that improve decision making:
• Descriptive analytics
• Diagnostic analytics
• Predictive analytics
• Prescriptive analytics
Data analytics capabilities
Descriptive analytics
• Answers: what happened?
• Mines raw data from multiple sources
• Gives valuable insight into the past, but only answers what happened
• Examples: a health care provider analyzing its patients; a BI consultancy analyzing categories of products; a retailer finding the average sales of a month
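As a minimal sketch of descriptive analytics in R (one of the tools named in this unit), the snippet below summarizes what happened by computing a retailer's average sales per month. The `sales` data frame and its values are hypothetical.

```r
# Hypothetical sales records for a retailer
sales <- data.frame(
  month  = c("Jan", "Jan", "Jan", "Feb", "Feb", "Feb"),
  amount = c(120, 150, 90, 200, 180, 160)
)

# Descriptive analytics: summarize the past, here the average sales per month
monthly_avg <- aggregate(amount ~ month, data = sales, FUN = mean)
print(monthly_avg)
```

The result only describes the past (Jan averages 120, Feb averages 180); it says nothing about why the months differ.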
Diagnostic analytics
• Answers: why did something happen?
• Finds dependencies and identifies patterns
• Gives deep insight into the problem
• Examples: a retailer compares its sales by subcategory; a health care provider compares patients' responses
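A minimal diagnostic sketch, assuming hypothetical monthly sales broken down by subcategory: drilling down from the total to the subcategories shows which one drove a change in sales.

```r
# Hypothetical sales by subcategory for two months
sales <- data.frame(
  subcategory = rep(c("Shirts", "Trousers", "Shoes"), times = 2),
  month       = rep(c("Jan", "Feb"), each = 3),
  amount      = c(500, 300, 200, 480, 310, 60)
)

# Total sales per subcategory (rows) and month (columns)
by_sub <- tapply(sales$amount, list(sales$subcategory, sales$month), sum)

# Diagnostic step: compare subcategories to find what drove the change
change <- by_sub[, "Feb"] - by_sub[, "Jan"]
print(change)

# The subcategory with the largest drop is the first place to investigate
biggest_drop <- names(which.min(change))
print(biggest_drop)
```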
Predictive analytics
• Answers: what is likely to happen
• Uses the results of descriptive and diagnostic analytics
• A tool for forecasting
• Example: predictive analytics allows a leading FMCG company to predict what it could expect after changing its brand positioning.
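A minimal forecasting sketch, assuming twelve months of hypothetical sales and a simple linear trend model fitted with `lm()`. Real predictive models are usually richer; this only shows the idea of using past data to estimate what is likely to happen next.

```r
# Hypothetical monthly sales for the past 12 months
history <- data.frame(
  month = 1:12,
  sales = c(102, 108, 111, 119, 123, 131, 134, 142, 147, 151, 158, 163)
)

# Fit a linear trend: sales as a function of month
model <- lm(sales ~ month, data = history)

# Forecast the next month (month 13) from the fitted trend
forecast <- predict(model, newdata = data.frame(month = 13))
print(forecast)
```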
Prescriptive analytics
• Answers: what action should be taken?
• Helps eliminate a future problem or take full advantage of a promising trend
• Example: a multinational company was able to identify opportunities for repeat purchases based on customer analytics and sales history.
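A minimal prescriptive sketch: given a predicted response to each possible action, recommend the action with the best expected outcome. The discount levels, prices, costs, and the demand model (each 1% of discount brings 3% more buyers) are all hypothetical assumptions chosen for illustration.

```r
# Candidate actions: discount rates the company could offer (hypothetical)
discounts <- c(0, 0.05, 0.10, 0.15, 0.20)

# Assumed demand model: each 1% of discount brings 3% more buyers
base_buyers <- 1000
price       <- 50
unit_cost   <- 30
buyers <- base_buyers * (1 + 3 * discounts)

# Expected profit for each action
profit <- buyers * (price * (1 - discounts) - unit_cost)

# Prescriptive step: recommend the action with the highest expected profit
best <- discounts[which.max(profit)]
print(data.frame(discount = discounts, profit = profit))
print(best)
```

Under these assumptions a 5% discount gives the highest expected profit; a different demand model would change the recommendation.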
Places where analytics is used:
Reporting vs. Analytics:
• Tools are the software packages used for analytics, such as SAS (Statistical Analysis System) or R.
• Techniques are the procedures followed to reach a solution.
• Steps involved in analytics:
1. Access
2. Manage
3. Analyze
4. Report
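The four steps can be sketched end to end in R. This is an illustration under assumptions: an inline CSV stands in for a real data source, and the `region`/`sales` columns and values are hypothetical.

```r
# 1. Access: read raw data (inline CSV lines stand in for a file or database)
raw <- read.csv(text = c(
  "region,sales",
  "North,120",
  "South,",
  "North,60",
  "East,200"
))

# 2. Manage: clean the data, here by dropping rows with missing sales
clean <- na.omit(raw)

# 3. Analyze: total sales per region
summary_tbl <- aggregate(sales ~ region, data = clean, FUN = sum)

# 4. Report: present the result, largest region first
print(summary_tbl[order(-summary_tbl$sales), ], row.names = FALSE)
```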
Analytics techniques include:
1. Data Preparation
2. Reporting, Dashboards & Visualization
3. Segmentation
4. Forecasting
5. Descriptive Modeling
6. Predictive Modeling
7. Optimization
Application of Modeling in Business:
• A statistical model embodies a set of assumptions concerning the
generation of the observed data, and similar data from a larger
population.
Relational database
• A relational database stores data in tables made up of columns and rows. When new data is added, new records (rows) are inserted into existing tables, or new tables are added. Relationships can then be defined between two or more tables.
• Relational databases are used when the data they contain doesn’t change
very often, and when accuracy is crucial.
• Commonly used in financial applications.
• For example, a shop could store details of their customers’ names and
addresses in one table and details of their orders in another.
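The shop example can be mimicked in R with two data frames standing in for two relational tables. This is a sketch only: the table contents are hypothetical, and `merge()` plays the role that a join query would play in a real relational database.

```r
# Hypothetical "customers" table: one row per customer
customers <- data.frame(
  customer_id = c(1, 2, 3),
  name        = c("Asha", "Ravi", "Meena"),
  address     = c("Pune", "Delhi", "Chennai")
)

# Hypothetical "orders" table: one row per order, linked by customer_id
orders <- data.frame(
  order_id    = c(101, 102, 103, 104),
  customer_id = c(1, 1, 3, 2),
  amount      = c(250, 120, 400, 90)
)

# Relate the two tables through the shared key (like an SQL inner join)
joined <- merge(orders, customers, by = "customer_id")
print(joined)
```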
Non-relational database
• For example, a large store might have a database in which each customer
has their own document containing all of their information, from name
and address to order history and credit card information.
• Non-relational databases can answer such queries faster because a query does not have to look across several tables to deliver an answer.
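A document-style record can be sketched in R as a nested list, one self-contained document per customer. The store, keys, and values are hypothetical, and a real document database would hold such records on disk rather than in memory.

```r
# Hypothetical document store: one self-contained document per customer
store <- list(
  C001 = list(
    name    = "Asha",
    address = "Pune",
    orders  = list(
      list(order_id = 101, amount = 250),
      list(order_id = 102, amount = 120)
    )
  ),
  C002 = list(
    name    = "Ravi",
    address = "Delhi",
    orders  = list(list(order_id = 104, amount = 90))
  )
)

# A single lookup returns everything about the customer; no join is needed
doc <- store[["C001"]]
total_spent <- sum(vapply(doc$orders, function(o) o$amount, numeric(1)))
print(doc$name)
print(total_spent)
```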
Example: the manufacturing industry also has its data divided into the groups discussed above. Production quantity is discrete data, while production rate is continuous data. Similarly, a quality parameter can be given ratings, which are ordinal data.
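A small R sketch of how the discrete, continuous, and ordinal data in this manufacturing example might be stored. The column names, values, and the rating scale ("Poor" to "Excellent") are hypothetical.

```r
production <- data.frame(
  units_produced = c(120L, 135L, 128L),    # discrete: counted in whole units
  rate_per_hour  = c(14.6, 16.9, 15.2),    # continuous: measured on a real scale
  quality        = factor(c("Good", "Excellent", "Fair"),
                          levels = c("Poor", "Fair", "Good", "Excellent"),
                          ordered = TRUE)  # ordinal: ratings with a natural order
)
str(production)

# Ordinal data supports order comparisons, but not arithmetic
print(production$quality > "Fair")
```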
Attribute
It is a data field that represents characteristics or features of a data object.
For a customer object, attributes could be customer ID, address, etc.
The set of attributes used to describe a given object is known as an attribute vector or feature vector.
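In an R data frame, each column can be read as an attribute and each row as the attribute (feature) vector of one object. The customer data below is hypothetical.

```r
# Each column is an attribute; each row describes one customer object
customers <- data.frame(
  customer_id = c("C001", "C002"),
  address     = c("Pune", "Delhi"),
  age         = c(34, 29),
  stringsAsFactors = FALSE
)

# The attributes of the customer object
print(names(customers))

# One row holds the attribute vector for the first customer
feature_vector <- customers[1, ]
print(feature_vector)
```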
Types of attributes:
We first distinguish between the different types of attributes and then preprocess the data accordingly. This is the first step in data preprocessing.
Types of data models:
• Hierarchical model
• Relational model
• Network model
• Object oriented model
• Entity relationship model
Hierarchical model:
This data model uses a hierarchy to structure the data in a tree-like format. However, retrieving and accessing data is difficult in a hierarchical database. It is rarely used now.
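A rough R sketch of why access is awkward in a tree: every record sits under exactly one parent, and finding a record without knowing its path means walking the tree. The nested list and the helper `find_employee()` are hypothetical illustrations, not a hierarchical database engine.

```r
# Hypothetical tree: company -> department -> team -> employees
company <- list(
  Sales = list(
    North = c("Asha", "Ravi"),
    South = c("Meena")
  ),
  Production = list(
    Plant1 = c("Kiran", "Dev")
  )
)

# Access follows the path from the root down, one level at a time
print(company[["Sales"]][["North"]])

# Finding a record without knowing its path means searching the whole tree
find_employee <- function(tree, name, path = character(0)) {
  for (node in names(tree)) {
    child <- tree[[node]]
    if (is.list(child)) {
      hit <- find_employee(child, name, c(path, node))  # descend one level
      if (!is.null(hit)) return(hit)
    } else if (name %in% child) {
      return(c(path, node))  # path from the root to the record
    }
  }
  NULL  # not found in this subtree
}
print(find_employee(company, "Dev"))
```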
Relational model:
• An alternative to the hierarchical model.
• Here data is represented in the form of tables.
• It reduces complexity.
• Provides a clear overview of the data.
Network model
• The network model is inspired by the hierarchical model.
• However, unlike the hierarchical model, this model makes it easier to convey complex relationships, as each record can be linked with multiple parent records.
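A minimal R sketch of the key difference from a tree: a child record can have several parents. The supplier/part links and the helpers `parents_of()` and `children_of()` are hypothetical.

```r
# Hypothetical links: each part (child) can have several parent suppliers
links <- data.frame(
  parent = c("Supplier A", "Supplier B", "Supplier A", "Supplier C"),
  child  = c("Bolt", "Bolt", "Nut", "Nut"),
  stringsAsFactors = FALSE
)

# Unlike a tree, one child record can be reached from several parents
parents_of  <- function(record) links$parent[links$child == record]
children_of <- function(record) links$child[links$parent == record]

print(parents_of("Bolt"))
print(children_of("Supplier A"))
```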
Object-oriented model
• Consists of a collection of objects, each with its own features and methods.
• It is also called the post-relational database model.
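The idea of objects bundling features with methods can be sketched with R's S4 class system. This shows object-oriented modeling in memory only, not an object-oriented database; the `Customer` class and the `totalSpent` method are hypothetical.

```r
# Hypothetical "Customer" class: features (slots) and behavior live together
setClass("Customer", slots = c(name = "character", orders = "numeric"))

# A generic function plus a method defined for Customer objects
setGeneric("totalSpent", function(obj) standardGeneric("totalSpent"))
setMethod("totalSpent", "Customer", function(obj) sum(obj@orders))

c1 <- new("Customer", name = "Asha", orders = c(250, 120))
print(totalSpent(c1))
```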
Entity-relationship model:
• Also known as the ER model, it represents entities and their relationships in a graphical format.
• An entity could be anything: a concept, a piece of data, or an object.
Importance of Data Modeling
• A clear representation of data makes it easier to analyze the data properly. It provides a quick overview of the data, which developers can then use in various applications.
• Data modeling represents the data properly in a model. It reduces the chances of data redundancy and omission, which helps in clear analysis and processing.
• Data modeling improves data quality and enables
the concerned stakeholders to make data-driven
decisions.
Missing Values and Imputation
An object may be missing one or more attribute values.
Reasons:
• Information was not collected. For example, some people decline to give their phone numbers or age details.
• Some attributes are not applicable to all objects.
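• For example, in R a calculation that includes a missing value (NA) returns NA:
x <- c(1, 2, NA, 3)  # example vector with one missing value
mean(x)              # returns NA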
Handling missing values / Imputing missing values:
newdata <- na.omit(mydata)  # keep only the rows of mydata that have no missing values
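A slightly fuller sketch in R, assuming a small hypothetical data frame named `mydata`. It contrasts three common options: dropping incomplete rows with `na.omit()`, ignoring missing values inside a calculation with `na.rm = TRUE`, and mean imputation, which is one simple imputation technique among many.

```r
# Hypothetical data frame with missing values (NA)
mydata <- data.frame(
  age   = c(25, NA, 40, 35),
  phone = c("555-0101", "555-0102", NA, "555-0104"),
  stringsAsFactors = FALSE
)

# Option 1: drop every row that has a missing value (listwise deletion)
newdata <- na.omit(mydata)

# Option 2: ignore missing values inside a calculation
mean_age <- mean(mydata$age, na.rm = TRUE)

# Option 3: mean imputation, replacing each missing age with the observed mean
imputed <- mydata
imputed$age[is.na(imputed$age)] <- mean_age

print(newdata)
print(imputed)
```

Deletion loses whole rows (here two of four), while imputation keeps every row at the cost of inventing plausible values.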