
DA Unit-2

Data analytics and cybersecurity notes
Copyright © All Rights Reserved

Introduction to Analytics

• Predictive Analytics is the art of predicting future outcomes on the basis of past trends.
• It is a branch of Statistics that comprises Modeling Techniques, Machine Learning & Data Mining.
• Predictive Analytics is primarily used to support Decision Making.

What and Why of Analytics:

• Analytics is a journey that combines skills, advanced technologies, applications, and
processes that a firm uses to gain business insights from data and statistics.
• These insights are used for business planning.

Places where Analytics is used:

Reporting Vs Analytics:

• Reporting is the presentation of the results of data analysis.
• Analytics is the process, or the set of systems, involved in analysing data to obtain a desired output.
Introduction to tools and Environment:

Analytics is used nowadays in fields ranging from medical science to aerospace science to
government activities.

• Data Science and Analytics are used by manufacturing companies as well as real estate
firms to develop their business and solve various issues with the help of historical
databases.
• Tools are the software used for Analytics, such as SAS or R.
• Techniques, by contrast, are the procedures followed to reach a solution.
• Various steps involved in Analytics:

1. Access
2. Manage
3. Analyze
4. Report
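The four steps above can be sketched as a tiny R pipeline. This is a minimal illustration under stated assumptions. The sales records are hypothetical and are read from an in-memory text vector instead of a real file or database. The "Manage" step is reduced to dropping incomplete rows.

```r
# Access: read raw data. A character vector stands in for a file or database
# (hypothetical sales records) so the sketch is self-contained.
raw <- read.csv(text = c(
  "region,units",
  "North,120",
  "South,95",
  "North,NA",
  "South,110",
  "North,130"
))

# Manage: clean the data, here simply by dropping rows with missing values.
clean <- na.omit(raw)

# Analyze: compute the average units sold per region.
summary_tbl <- aggregate(units ~ region, data = clean, FUN = mean)

# Report: present the result.
print(summary_tbl)
```

In a real project each step would be much larger, for example a database query for Access or a dashboard for Report. The order of the steps stays the same.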

Various Analytics techniques are:

1. Data Preparation
2. Reporting, Dashboards & Visualization
3. Segmentation
4. Forecasting
5. Descriptive Modeling
6. Predictive Modeling
7. Optimization
Application of Modeling in Business

• A statistical model embodies a set of assumptions concerning the generation of the
observed data, and similar data from a larger population.

• A model represents, often in considerably idealized form, the data-generating process.

• Signal processing is an enabling technology that encompasses the fundamental theory,
applications, algorithms, and implementations of processing or transferring information
contained in many different physical, symbolic, or abstract formats broadly designated as
signals.

• It uses mathematical, statistical, computational, heuristic, and linguistic representations,
formalisms, and techniques for representation, modeling, analysis, synthesis, discovery,
recovery, sensing, acquisition, extraction, learning, security, or forensics.

• In manufacturing, statistical models are used to define warranty policies, to solve various
conveyor-related issues, and for Statistical Process Control (SPC), among other applications.
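Statistical Process Control connects the idea of a data-generating model with a manufacturing use. We assume a model for the in-control process, in which measurements vary around a stable mean with a stable spread. We then flag readings that this model makes implausible. The sketch below is a simplified Shewhart-style 3-sigma check, and the measurements are hypothetical. It estimates sigma with the sample standard deviation, whereas a textbook individuals chart would use the moving-range estimate.

```r
# Hypothetical in-control measurements (e.g. part diameter in mm).
# Assumed model: readings vary around a stable mean mu with spread sigma.
baseline <- c(10.02, 9.98, 10.01, 9.97, 10.03, 10.00, 9.99, 10.02, 9.98, 10.00)

mu_hat    <- mean(baseline)
sigma_hat <- sd(baseline)  # simplification: sample SD, not the moving-range estimate

# 3-sigma control limits around the centre line
ucl <- mu_hat + 3 * sigma_hat
lcl <- mu_hat - 3 * sigma_hat

# New production readings to monitor against the limits
new_readings   <- c(10.01, 9.99, 10.15, 10.00)
out_of_control <- new_readings > ucl | new_readings < lcl

print(round(c(LCL = lcl, CL = mu_hat, UCL = ucl), 3))
print(which(out_of_control))  # position(s) of readings outside the limits
```

Here the baseline mean is 10.00 and the standard deviation is 0.02, which gives limits of 9.94 and 10.06. Only the third new reading (10.15) is flagged.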
Databases & Types of Data and Variables

• Data dictionary, or metadata repository: the term has several closely related meanings:
  - A "centralized repository of information about data such as meaning, relationships
    to other data, origin, usage, and format", as defined in the IBM Dictionary of
    Computing
  - A document describing a database or collection of databases
  - An integral component of a DBMS that is required to determine its structure
  - A piece of middleware that extends or supplants the native data dictionary of a
    DBMS
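In R, a basic data dictionary for a single data frame can be partly built from the data itself. Variable names, storage types and missing-value counts come from the columns, while the descriptions must be written by people who know the data. This is a minimal sketch: the dataset and descriptions are hypothetical. A real metadata repository would also record origin, relationships and usage.

```r
# Hypothetical production dataset, used only for illustration.
production <- data.frame(
  batch_id = c("B01", "B02", "B03"),
  units    = c(500L, 480L, NA),
  rate_hr  = c(62.5, 60.1, 58.9),
  grade    = factor(c("A", "B", "A"))
)

# Hypothetical human-written descriptions (the "meaning" of each variable).
descriptions <- c(
  batch_id = "Production batch identifier",
  units    = "Units produced in the batch",
  rate_hr  = "Production rate (units per hour)",
  grade    = "Quality grade assigned at inspection"
)

# One row per variable: name, type, number of missing values and meaning.
data_dictionary <- data.frame(
  variable    = names(production),
  type        = vapply(production, function(col) class(col)[1], character(1)),
  n_missing   = vapply(production, function(col) sum(is.na(col)), integer(1)),
  description = unname(descriptions[names(production)]),
  row.names   = NULL
)

print(data_dictionary)
```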

• Category of Data
  - Data can be categorized on various parameters, such as whether it is categorical and
    what type it is.
• Types of Data
  - There are 2 basic types: Numeric and Character.
  - Numeric data can be further divided into 2 subgroups: Discrete and Continuous.
  - Data can also be divided into 2 categories: Nominal and Ordinal.
  - Based on usage, data is also divided into 2 categories: Quantitative and Qualitative.

• Manufacturing data can also be divided into the groups discussed above.
  - For example, production quantity is discrete data,
  - while production rate is continuous data.
  - Similarly, quality parameters can be given ratings, which are ordinal data.
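In R, the categories above map naturally onto built-in types:

- integer vectors for discrete counts
- double vectors for continuous measurements
- factors for nominal data
- ordered factors for ordinal data
- character vectors for free text

The sketch below uses the manufacturing examples from the text. The values, and the hypothetical machine_line and remarks variables, are made up for illustration.

```r
# Discrete numeric: counts such as production quantity (whole numbers).
production_qty <- c(120L, 135L, 128L)

# Continuous numeric: measurements such as production rate (units per hour).
production_rate <- c(14.2, 15.75, 13.9)

# Nominal: categories with no natural order (hypothetical machine lines).
machine_line <- factor(c("L1", "L2", "L1"))

# Ordinal: categories with a natural order, e.g. quality ratings.
quality <- factor(c("Good", "Poor", "Excellent"),
                  levels = c("Poor", "Good", "Excellent"),
                  ordered = TRUE)

# Character: free text that is not treated as a category.
remarks <- c("ok", "belt slipped", "ok")

# Ordered factors support order comparisons and max(); for a nominal
# factor such comparisons are not meaningful (R returns NA with a warning).
print(quality > "Poor")   # TRUE FALSE TRUE
print(max(quality))       # Excellent
```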
Data Modeling Techniques Overview

• Regression analysis focuses on finding a relationship between a dependent variable and
one or more independent variables.
• It predicts the value of a dependent variable based on the value of at least one
independent variable.
• It explains the impact of changes in an independent variable on the dependent variable.
• Y = f(X, β), where Y is the dependent variable, X is the independent variable, and β is
the vector of unknown coefficients.
• It is widely used in prediction and forecasting.
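Below is a minimal sketch of regression in R. It assumes f is linear, i.e. Y = β0 + β1·X + error, fitted by ordinary least squares with lm(). The temperature and defect-rate figures are hypothetical. The estimated slope β1 is the impact on Y of a one-unit change in X. predict() then applies the fitted model to new X values.

```r
# Hypothetical data: machine temperature (X, degrees C) and defect rate (Y, %).
temperature <- c(60, 65, 70, 75, 80, 85)
defect_rate <- c(1.1, 1.4, 1.6, 2.0, 2.2, 2.5)

# Fit Y = beta0 + beta1 * X + error by ordinary least squares.
fit  <- lm(defect_rate ~ temperature)
beta <- coef(fit)
print(beta)  # intercept -2.26, slope 0.056

# beta1 (the slope) is the estimated change in defect rate
# for each 1-degree rise in temperature.

# Use the fitted model to predict Y for new X values.
pred <- predict(fit, newdata = data.frame(temperature = c(90, 95)))
print(pred)  # 2.78 and 3.06
```

The linear form is only one choice of f. Other models, such as polynomial or logistic regression, keep the same Y = f(X, β) idea with a different function.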


Missing Value Imputation

• In R, missing values are represented by the symbol NA (not available).
• Impossible values (e.g., 0/0) are represented by the symbol NaN (not a number); dividing
a non-zero number by zero gives Inf instead. Unlike SAS, R uses the same symbol, NA, for
missing character and numeric data.
• To test whether there are any missing values in the dataset, we use the is.na() function.
It also returns TRUE for NaN.
• For example, we define y and then check it for missing values. TRUE means that the
corresponding element is missing:
y <- c(1,2,3,NA)
is.na(y) # returns FALSE FALSE FALSE TRUE
• Arithmetic functions on missing values yield missing values. For example:
x <- c(1,2,NA,3)
mean(x) # returns NA
• To remove missing values from our dataset, we use the na.omit() function, which drops
every row that contains at least one NA.
• For example, we can create a new dataset without missing data as below:

newdata <- na.omit(mydata)

• We can also pass the argument na.rm=TRUE to functions such as mean(), sum() and sd().

• Using na.rm in the example above gives the desired result:
x <- c(1,2,NA,3)
mean(x, na.rm=TRUE) # returns 2
• MICE package: Multiple Imputation by Chained Equations. MICE uses PMM to impute missing
values in a dataset (PMM is its default method for numeric variables).
• PMM: Predictive Mean Matching (PMM) is a semi-parametric imputation approach.
• It is similar to the regression method, except that for each missing value it fills in a
value drawn randomly from the observed donor values, i.e. from observations whose
regression-predicted values are closest to the regression-predicted value for the missing
case under the simulated regression model.
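The PMM idea can be illustrated with a small base-R sketch. This is a simplified, single-imputation version written for illustration, not the mice implementation. It fits one regression on the observed cases and matches on the fitted predictions. mice additionally draws the regression coefficients from their posterior and repeats the process to create several imputed datasets. The following are assumptions made for this sketch:

- the simulated data
- the hypothetical helper name pmm_impute()
- the choice of k = 3 donors (mice's own default is 5)

```r
set.seed(1)

# Hypothetical data: y depends linearly on x; three y values are missing.
n <- 20
x <- seq(1, 20)
y <- 3 + 2 * x + rnorm(n, sd = 2)
y[c(4, 11, 17)] <- NA

# Hypothetical helper: simplified single-imputation PMM (no coefficient draw).
pmm_impute <- function(x, y, k = 3) {
  obs <- !is.na(y)
  # 1. Regression on the observed cases (lm drops NA rows by default).
  fit <- lm(y ~ x)
  # 2. Regression-predicted values for every case, observed or missing.
  y_hat <- predict(fit, newdata = data.frame(x = x))
  donors <- which(obs)
  for (i in which(!obs)) {
    # 3. The k observed cases whose predictions are closest to case i's.
    dist    <- abs(y_hat[donors] - y_hat[i])
    closest <- donors[order(dist)[seq_len(k)]]
    # 4. Fill in the OBSERVED value of one randomly chosen donor.
    pick <- closest[sample.int(length(closest), 1)]
    y[i] <- y[pick]
  }
  y
}

y_imp <- pmm_impute(x, y)
print(y_imp[c(4, 11, 17)])
```

Each imputed value is always one of the actually observed y values, so the imputations stay within the range of the real data. The mice package applies this process across a whole data frame. A typical call is imp <- mice(mydata, m = 5, method = "pmm", seed = 123). After that, complete(imp, 1) returns the first completed dataset.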
