Module 1 - Introduction To Data Analytics
Pallabi Kakati
What is data
• Data has been the buzzword for ages
now.
• Whether the data is generated by
large-scale enterprises or by an
individual, every aspect of it needs
to be analyzed in order to benefit
from it.
• But how do we do it?
• Well, that’s where the term ‘Data
Analytics’ comes in.
Types of Data
1. Categorical data are values or observations that can be divided into groups or categories.
There are two types of categorical values: nominal and ordinal.
• A nominal variable has no intrinsic order that is identified in its category.
• An ordinal variable instead has a predetermined order.
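The difference between the two kinds of categorical data can be sketched in a few lines of Python. The blood groups, the survey scale and all variable names below are hypothetical examples, not part of the course material; only the standard library is used.

```python
from collections import Counter

# Nominal: categories with no intrinsic order (hypothetical blood groups).
blood_groups = ["A", "O", "B", "O", "AB", "O", "A"]
frequencies = Counter(blood_groups)                    # counting is meaningful
most_common_group = frequencies.most_common(1)[0][0]   # so is the mode

# Ordinal: categories with a predetermined order (hypothetical survey scale).
SCALE = ["low", "medium", "high"]
rank = {level: position for position, level in enumerate(SCALE)}
ratings = ["medium", "low", "high", "medium", "high", "medium", "low"]

# Because the order is known, sorting and taking a median make sense.
sorted_ratings = sorted(ratings, key=rank.__getitem__)
median_rating = sorted_ratings[len(sorted_ratings) // 2]

print(most_common_group, median_rating)
```

Sorting or taking a median of the blood groups would be meaningless, which is the practical consequence of a nominal variable having no order.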
2. Predictive Analysis
• With the help of predictive analysis, we determine the future outcome.
• Based on the analysis of the historical data, we are able to forecast the
future.
• It makes use of descriptive analysis to generate predictions about the
future.
• With the help of technological advancements and machine learning, we are
able to obtain predictive insights about the future.
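As a rough illustration of forecasting from historical data, the sketch below fits a straight-line trend by least squares and extends it one period ahead. The sales figures and the function names fit_trend and forecast are assumptions made for this example; a real predictive analysis would normally use a richer model.

```python
from statistics import mean

def fit_trend(history):
    """Least-squares line through (period index, value) pairs.

    Returns (slope, intercept). Needs at least two observations.
    """
    periods = range(len(history))
    x_bar, y_bar = mean(periods), mean(history)
    covariance = sum((x - x_bar) * (y - y_bar) for x, y in zip(periods, history))
    variance = sum((x - x_bar) ** 2 for x in periods)
    slope = covariance / variance
    return slope, y_bar - slope * x_bar

def forecast(history, steps_ahead):
    """Extend the fitted trend steps_ahead periods past the last observation."""
    slope, intercept = fit_trend(history)
    return intercept + slope * (len(history) - 1 + steps_ahead)

# Hypothetical historical data: sales for five consecutive months.
monthly_sales = [100, 110, 120, 130, 140]
next_month = forecast(monthly_sales, 1)
print(next_month)  # 150.0 for this perfectly linear history
```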
Predictive Modelling
• Predictive modeling is a process used in data analysis to create or choose a suitable statistical
model to predict the probability of a result.
• After exploring data you have all the information needed to develop the mathematical model that
encodes the relationship between the data.
• These models are useful for understanding the system under study and, more specifically, they are
used for two main purposes:
1. The first is to make predictions about the data values produced by the system; in this case, you
will be dealing with regression models.
2. The second is to classify new data products, and in this case, you will be using classification
models or clustering models.
In fact, it is possible to divide the models according to the type of result that they produce:
• Classification models: the result produced by the model is categorical.
• Regression models: the result produced by the model is numeric.
• Clustering models: the result produced by the model is descriptive.
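A minimal sketch of how a categorical result differs from a descriptive one, using only the Python standard library. The spending figures, the labels and both function names are hypothetical, and nearest-centroid classification and two-means clustering are simple stand-ins chosen for brevity, not the methods prescribed by the course.

```python
from statistics import mean

def nearest_centroid_classifier(samples, labels):
    """Classification: learn one mean per label, predict the closest label."""
    centroids = {
        label: mean(x for x, y in zip(samples, labels) if y == label)
        for label in set(labels)
    }

    def predict(x):
        return min(centroids, key=lambda label: abs(x - centroids[label]))

    return predict

def two_means(values, iterations=10):
    """Clustering: split 1-D values into two groups without using any labels."""
    centers = [min(values), max(values)]
    groups = [list(values), []]
    for _ in range(iterations):
        groups = [[], []]
        for v in values:
            nearest = 0 if abs(v - centers[0]) <= abs(v - centers[1]) else 1
            groups[nearest].append(v)
        # Keep the old center if a group ends up empty.
        centers = [mean(g) if g else c for g, c in zip(groups, centers)]
    return groups

# Hypothetical monthly spend per customer, with labels for the classifier only.
spend = [10, 12, 11, 50, 55, 52]
labels = ["low", "low", "low", "high", "high", "high"]

predict = nearest_centroid_classifier(spend, labels)
clusters = two_means(spend)
print(predict(14), predict(48))  # a category for each new value
print(clusters)                  # a description of how the data groups
```

The classifier returns one of the known categories for a new value, while the clustering step only describes how the existing values group together.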
Types of Data Analysis
3. Diagnostic Analysis
• At times, businesses are required to think critically about the nature of data
and understand the descriptive analysis in depth.
• In order to find issues in the data, we need to find anomalous patterns that
might contribute towards the poor performance of our model.
• With diagnostic analysis, you are able to diagnose various problems that
are exhibited through your data.
• Businesses use this technique to reduce their losses and optimize their
performances.
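One simple way to look for anomalous patterns is to flag values that lie far from the mean in units of standard deviation (a z-score). The sketch below assumes hypothetical daily sales figures and a threshold of 2; both are illustrative choices, and real diagnostic work usually combines several such checks.

```python
from statistics import mean, pstdev

def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs whose z-score magnitude exceeds threshold."""
    center = mean(values)
    spread = pstdev(values)
    if spread == 0:
        return []  # all values identical, nothing stands out
    return [
        (i, v) for i, v in enumerate(values)
        if abs(v - center) / spread > threshold
    ]

# Hypothetical daily sales; the sixth day looks suspicious.
daily_sales = [100, 102, 98, 101, 99, 30, 100]
anomalies = find_anomalies(daily_sales)
print(anomalies)  # [(5, 30)]
```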
Types of Data Analysis
4. Prescriptive Analysis
• Prescriptive Analysis combines the insights from all of the above analytical techniques.
• It is referred to as the final frontier of data analytics.
• Through the details provided by the descriptive and predictive analytics, prescriptive
analytics allows the companies to make decisions based on them.
• It makes heavy use of artificial intelligence to help companies make careful
business decisions.
• Major industrial players like Facebook, Netflix, Amazon, and Google are using
prescriptive analytics to make key business decisions.
• Furthermore, financial institutions are gradually leveraging the power of this technique
to increase their revenue.
Data Exploration/Visualization
• Exploring the data essentially means examining it
through graphical or statistical presentations in
order to find patterns, connections, and
relationships in the data.
• Data visualization is one of the most effective
tools for highlighting possible patterns.
• Data exploration consists of a preliminary
examination of the data, which is important for
understanding the type of information that has
been collected and what it means.
• In combination with the information acquired
during problem definition, this categorization
will determine which method of data analysis
will be most suitable for arriving at a model
definition.
https://towardsdatascience.com/5-quick-and-easy-data-visualizations-in-python-with-code-a2284bae952f
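A preliminary examination often starts with summary statistics and a quick look at the distribution. The sketch below uses only the standard library and prints a text histogram instead of a chart; the ages list and the helper names summarize and text_histogram are assumptions made for this example.

```python
from collections import Counter
from statistics import mean, median, stdev

def summarize(values):
    """Basic statistical presentation of a numeric column."""
    return {
        "count": len(values),
        "min": min(values),
        "max": max(values),
        "mean": mean(values),
        "median": median(values),
        "stdev": stdev(values),
    }

def text_histogram(values, bin_width):
    """Very rough graphical presentation: one '#' per value in each bin."""
    bins = Counter((v // bin_width) * bin_width for v in values)
    lines = []
    for start in sorted(bins):
        end = start + bin_width - 1
        lines.append(f"{start:>3}-{end:<3} {'#' * bins[start]}")
    return "\n".join(lines)

# Hypothetical sample: ages of ten survey respondents.
ages = [23, 25, 31, 35, 36, 38, 41, 44, 52, 67]
summary = summarize(ages)
print(summary)
print(text_histogram(ages, 10))
```

Here the histogram shows most respondents in their thirties, the kind of pattern that would then guide the choice of analysis method.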
Model Validation
• Model validation, that is, the test phase, is an
important step in which you check the model
built on the basis of the starting data.
• That is important because it allows you to assess the
validity of the data produced by the model by
comparing them directly with the actual system.
• This time, however, you go beyond the set of
starting data on which the entire analysis has been
established.
• Generally, you will refer to the data as the training set,
when you are using them for building the model, and as
the validation set, when you are using them for
validating the model.
https://odsc.com/blog/the-comprehensive-guide-to-model-validation-framework-what-is-a-robust-machine-learning-model/
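The training set and validation set roles can be sketched as follows. The data rows, the proportional model y = k * x and all function names are hypothetical choices made to keep the example short; the point is only that the model is fitted on one part of the data and its error is measured on the part it has not seen.

```python
import random

def split_train_validation(rows, validation_fraction=0.25, seed=0):
    """Shuffle a copy of the rows and cut it into training and validation sets."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * (1 - validation_fraction))
    return shuffled[:cut], shuffled[cut:]

def fit_proportional(rows):
    """Least-squares slope k for the model y = k * x."""
    return sum(x * y for x, y in rows) / sum(x * x for x, _ in rows)

def mean_absolute_error(k, rows):
    """Average gap between model output and the values observed in the system."""
    return sum(abs(y - k * x) for x, y in rows) / len(rows)

# Hypothetical (input, observed output) measurements of the system under study.
rows = [(1, 2.1), (2, 3.9), (3, 6.2), (4, 7.8),
        (5, 10.1), (6, 12.0), (7, 13.8), (8, 16.1)]

training_set, validation_set = split_train_validation(rows)
k = fit_proportional(training_set)                # built on the training set only
error = mean_absolute_error(k, validation_set)    # judged on unseen data
print(k, error)
```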
Model Validation
• Thus, by comparing the data produced by the model with those produced by the system
you will be able to evaluate the error, and using different test datasets, you can estimate
the limits of validity of the generated model.
• In fact the correctly predicted values could be valid only within a certain range, or
have different levels of matching depending on the range of values taken into account.
• This process allows you not only to numerically evaluate the effectiveness of the model
but also to compare it with any other existing models.
• There are several techniques in this regard; the best known is cross-validation.
• This technique is based on the division of the training set into different parts.
• Each of these parts, in turn, will be used as the validation set and the remaining parts as
the training set.
• Iterating in this way gives a more reliable estimate of the model's error, which you can use
to refine the model.
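A hedged sketch of the idea, assuming a deliberately trivial model that always predicts the mean of its training data. The values, the fold layout and the names k_fold_indices and cross_validate are illustrative; libraries such as scikit-learn provide more complete implementations.

```python
from statistics import mean

def k_fold_indices(n, k):
    """Yield (train_indices, validation_indices); each fold validates once."""
    folds = [list(range(i, n, k)) for i in range(k)]
    for i, validation in enumerate(folds):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, validation

def cross_validate(values, k):
    """Mean absolute error of a mean-predictor baseline on each fold."""
    errors = []
    for train_idx, val_idx in k_fold_indices(len(values), k):
        prediction = mean(values[j] for j in train_idx)   # "train" the model
        errors.append(mean(abs(values[j] - prediction) for j in val_idx))
    return errors

# Hypothetical observations split into three folds.
observations = [10, 12, 11, 13, 12, 14]
fold_errors = cross_validate(observations, 3)
print(fold_errors, mean(fold_errors))
```

Averaging the per-fold errors gives a single score that depends less on how any one split happened to fall.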
Deployment
• This is the final step of the analysis process,
which aims to present the results, that is, the
conclusions of the analysis.
• In the deployment process, in the business
environment, the analysis is translated into
a benefit for the client who has
commissioned it.
• In technical or scientific environments, it is
translated into design solutions or
scientific publications.
• That is, the deployment basically consists of
putting into practice the results obtained from
the data analysis.
Deployment
• The analysis report should be directed to the managers, who are then able to make
decisions.
• They will then put the conclusions of the analysis into practice.
• In the documentation supplied by the analyst, each of these four topics will generally
be discussed in detail:
• Analysis results
• Decision deployment
• Risk analysis
• Measuring the business impact
• When the results of the project include the generation of predictive models, these
models can be deployed as a stand-alone application or can be integrated within other
software.
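One possible way to make a fitted model usable by other software is to export its parameters in a neutral format and rebuild the prediction function where it is needed. The sketch below assumes a simple linear model and uses JSON; the parameter values and the names export_model and load_model are hypothetical.

```python
import json

def export_model(slope, intercept):
    """Serialize the fitted parameters so another program can load them."""
    return json.dumps({"type": "linear", "slope": slope, "intercept": intercept})

def load_model(serialized):
    """Rebuild a prediction function from the serialized parameters."""
    params = json.loads(serialized)
    if params["type"] != "linear":
        raise ValueError("unsupported model type")

    def predict(x):
        return params["intercept"] + params["slope"] * x

    return predict

# Analyst side: hypothetical parameters obtained during the analysis.
serialized = export_model(slope=10.0, intercept=100.0)

# Application side: the model is integrated without re-running the analysis.
predict = load_model(serialized)
print(predict(5))  # 150.0
```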
Applications of Data Analytics
1. Fraud Detection & Risk Analytics
• In banking, data analytics is heavily utilized for analyzing anomalous transactions
and customer details.
• Banks also use data analytics to analyze loan defaulters and credit scores for their
customers in order to minimize losses and prevent fraud.
2. Optimizing Transport Routes
• Companies like Uber and Ola are heavily dependent on data analytics to optimize
routes and fares for their customers.
• They use an analytical platform that analyzes the best route and calculates
percentage rise and drop in taxi fares based on several parameters.
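How such platforms work internally is not described here, so the following is only an illustrative sketch: Dijkstra's algorithm is used as a stand-in for route selection on a small hypothetical road graph, and a one-line formula gives the percentage rise or drop in a fare. The graph, the travel times and the function names are all assumptions.

```python
import heapq

def shortest_route(graph, start, goal):
    """Dijkstra's algorithm: return (total_cost, path), or (inf, []) if unreachable."""
    queue = [(0, start, [start])]
    visited = set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, {}).items():
            if neighbor not in visited:
                heapq.heappush(queue, (cost + weight, neighbor, path + [neighbor]))
    return float("inf"), []

def percent_change(old_fare, new_fare):
    """Positive for a rise in fare, negative for a drop."""
    return (new_fare - old_fare) / old_fare * 100

# Hypothetical road network: travel time in minutes between pickup points.
roads = {
    "A": {"B": 4, "C": 2},
    "B": {"D": 5},
    "C": {"B": 1, "D": 8},
    "D": {},
}

minutes, route = shortest_route(roads, "A", "D")
print(minutes, route)             # 8 ['A', 'C', 'B', 'D']
print(percent_change(100, 125))   # 25.0
```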
Applications of Data Analytics
3. Providing Better Healthcare
• With the help of data analytics, hospitals and healthcare centers are able to predict
the early onset of chronic diseases.
• They are able to predict diseases that might occur in the future and help the
patients to take early action that would help them to reduce medical expenditure.