
Module 1 - Introduction To Data Analytics


Pallabi Kakati
What is Data?
• Data has been a buzzword for years now.
• Whether data is generated by large-scale enterprises or by an
individual, every aspect of it needs to be analyzed before you
can benefit from it.
• But how do we do it?
• That is where the term ‘Data Analytics’ comes in.
Types of Data
1. Categorical data are values or observations that can be divided into groups or categories.
There are two types of categorical values: nominal and ordinal.
• A nominal variable has no intrinsic order among its categories (for example, blood group).
• An ordinal variable has a predetermined order among its categories (for example, small,
medium, large).

2. Numerical data are values or observations that come from measurements.
There are two types of numerical values: discrete and continuous.
• Discrete values can be counted and are distinct and separate from each other (for example,
the number of children in a household).
• Continuous values are produced by measurements or observations and can assume any value
within a defined range (for example, height).
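The four kinds of data above can be illustrated with a minimal Python sketch. The records, field names, and the ordering list `SIZE_ORDER` are invented for illustration and do not come from the original slides.

```python
from collections import Counter

# Hypothetical survey records, invented for illustration only.
records = [
    {"city": "Guwahati", "size": "medium", "children": 2, "height_cm": 171.5},
    {"city": "Delhi", "size": "small", "children": 0, "height_cm": 158.0},
    {"city": "Guwahati", "size": "large", "children": 3, "height_cm": 180.5},
    {"city": "Mumbai", "size": "small", "children": 1, "height_cm": 166.0},
]

# Nominal: categories have no intrinsic order, so counting is the natural summary.
city_counts = Counter(r["city"] for r in records)

# Ordinal: categories have a predetermined order, which must be stated explicitly.
SIZE_ORDER = ["small", "medium", "large"]
sorted_sizes = sorted((r["size"] for r in records), key=SIZE_ORDER.index)

# Discrete: countable, separate values (whole numbers here).
total_children = sum(r["children"] for r in records)

# Continuous: measurements that can take any value within a range.
mean_height = sum(r["height_cm"] for r in records) / len(records)

print(city_counts, sorted_sizes, total_children, mean_height)
```

Note that sorting is meaningful for the ordinal field only because an order was supplied; sorting the nominal `city` field alphabetically would carry no analytical meaning.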
What is Data Analytics?
• Data Analytics refers to the techniques
used to analyze data in order to improve
productivity and business gain.
• Data is extracted from various sources,
then cleaned and categorized so that
behavioral patterns can be analyzed.
• The techniques and tools used vary
with the organization or individual.
• The steps of the data analytics
process are shown in the figure.
Problem Definition
• The process of data analysis begins long before the collection of raw data.
• A data analysis always starts with a problem to be solved, which needs to
be defined.
• The problem can be defined only after you have clearly identified the system you want
to study: this may be a mechanism, an application, or a process in general.
• The study may simply aim to understand how the system operates, but more often it is
designed to understand the principles of its behavior so that you can make predictions
or make informed choices.
Data Extraction
• Once the problem has been defined, the first step is to obtain the data needed to perform
the analysis.
• The data must be chosen with the purpose of building the predictive model in mind, so
their selection is crucial to the success of the analysis.
• Even huge sets of raw data, if they are not collected competently, may portray false or
unbalanced situations compared with the actual ones.
• Thus, a poor choice of data, or an analysis performed on a data set that is not
representative of the system, will lead to models that diverge from the
system under study.
• Data extraction requires a careful understanding of the nature of the data and their form,
which only experience and knowledge of the problem’s application field can give.
• Apart from the quality and quantity of data needed, another issue is finding and
correctly choosing the data sources.
Data Preparation
• Of all the steps involved in data analysis, data preparation requires the most
resources and the most time to complete.
• The data are often collected from different sources, each of which
stores them in a different representation and format.
• All of these data therefore have to be prepared for the process of data analysis.
• Data preparation is concerned with obtaining, cleaning, normalizing,
and transforming data into an optimized data set, that is, a prepared format,
normally tabular, suitable for the methods of analysis that were planned
during the design phase.
• Many problems must be detected and handled during data preparation, such
as invalid, ambiguous, or missing values, replicated fields, and out-of-range
data.
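A minimal Python sketch of the preparation problems listed above: missing values, replicated records, out-of-range data, and normalization. The rows, the age limits, and the helper names `prepare` and `min_max_normalize` are assumptions made for illustration; real projects usually choose their own rules for each field.

```python
# Hypothetical raw rows merged from two sources; all values are invented.
raw_rows = [
    {"id": 1, "age": "34", "income": "52000"},
    {"id": 2, "age": "", "income": "61000"},     # missing value
    {"id": 3, "age": "210", "income": "48000"},  # out-of-range age
    {"id": 1, "age": "34", "income": "52000"},   # replicated record
    {"id": 4, "age": "45", "income": "75000"},
]


def prepare(rows, min_age=0, max_age=120):
    """Drop replicated, incomplete, and out-of-range rows; convert text to numbers."""
    seen_ids = set()
    clean = []
    for row in rows:
        if row["id"] in seen_ids:
            continue  # replicated record
        if not row["age"] or not row["income"]:
            continue  # missing value
        age, income = int(row["age"]), float(row["income"])
        if not (min_age <= age <= max_age):
            continue  # out-of-range value
        seen_ids.add(row["id"])
        clean.append({"id": row["id"], "age": age, "income": income})
    return clean


def min_max_normalize(values):
    """Rescale values linearly to the interval [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


clean_rows = prepare(raw_rows)
scaled_income = min_max_normalize([r["income"] for r in clean_rows])
print(clean_rows)
print(scaled_income)
```

Dropping problem rows is only one option; depending on the analysis, missing values may instead be filled in or flagged.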
Types of Data Analysis
1. Descriptive Analysis
• With the help of descriptive analysis, we analyze and describe the features of
a data set.
• Descriptive Analysis deals with the summarization of information.
• Descriptive Analysis, when coupled with visual analysis, gives us a
comprehensive picture of the data.
• In descriptive analysis, we work with past data to draw conclusions
and present the results in the form of dashboards.
• In businesses, Descriptive Analysis is used to determine Key
Performance Indicators (KPIs) that evaluate the performance of the business.
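As a small illustration of summarizing past data, the sketch below uses Python's standard `statistics` module. The sales figures and the growth KPI are invented for this example.

```python
import statistics

# Hypothetical monthly sales figures (units sold), invented for illustration.
monthly_sales = [120, 135, 128, 150, 142, 160]

# Summarize the past data with a few descriptive statistics.
summary = {
    "count": len(monthly_sales),
    "mean": statistics.mean(monthly_sales),
    "median": statistics.median(monthly_sales),
    "stdev": statistics.stdev(monthly_sales),  # sample standard deviation
    "min": min(monthly_sales),
    "max": max(monthly_sales),
}

# A simple KPI: percentage growth from the first month to the last.
growth_pct = (monthly_sales[-1] - monthly_sales[0]) / monthly_sales[0] * 100

for name, value in summary.items():
    print(f"{name}: {value}")
print(f"growth: {growth_pct:.1f}%")
```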
2. Predictive Analysis
• With the help of predictive analysis, we estimate likely future outcomes.
• Forecasts are based on the analysis of historical data.
• It builds on descriptive analysis to generate predictions about the
future.
• Technological advancements and machine learning make it possible
to obtain predictive insights from data.
Predictive Modeling
• Predictive modeling is a process used in data analysis to create or choose a suitable statistical
model to predict the probability of a result.
• After exploring the data, you have the information needed to develop a mathematical model that
encodes the relationships within the data.
• These models are useful for understanding the system under study, and they are
used for two main purposes:
1. The first is to make predictions about the data values produced by the system; in this case, you
will be dealing with regression models.
2. The second is to classify new data produced by the system; in this case, you will be using
classification models or clustering models.
Models can be divided according to the type of result they produce:
• Classification models: the result produced by the model is categorical.
• Regression models: the result produced by the model is numeric.
• Clustering models: the result produced by the model is descriptive.
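The difference between a numeric result and a categorical result can be sketched in plain Python. The example fits a straight line by ordinary least squares (a regression model) and assigns a label by nearest class average (a very simple classification model). Both techniques, the data, and the names `fit_line` and `nearest_centroid` are choices made for this illustration, not methods prescribed by the slides.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def nearest_centroid(training, value):
    """Return the label whose average training value is closest to `value`."""
    centroids = {label: sum(vals) / len(vals) for label, vals in training.items()}
    return min(centroids, key=lambda label: abs(centroids[label] - value))


# Regression: the result is numeric. Hypothetical data lying on y = 2x + 1.
ad_spend = [1.0, 2.0, 3.0, 4.0]
revenue = [3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(ad_spend, revenue)
predicted_revenue = slope * 5.0 + intercept

# Classification: the result is categorical. Hypothetical credit scores per class.
training = {"low_risk": [700, 720, 750], "high_risk": [480, 520, 550]}
predicted_label = nearest_centroid(training, 690)

print(predicted_revenue, predicted_label)
```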
Types of Data Analysis
3. Diagnostic Analysis
• At times, businesses need to think critically about the nature of their data
and understand the descriptive analysis in depth.
• To find issues in the data, we look for anomalous patterns that
might contribute to the poor performance of our model.
• With diagnostic analysis, you can diagnose the various problems that
show up in your data.
• Businesses use this technique to reduce their losses and optimize their
performance.
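One simple way to look for anomalous values is a z-score check, sketched below with the standard `statistics` module. The order counts, the threshold of 2.0, and the function name `find_anomalies` are assumptions for illustration; other anomaly detection methods may suit real data better.

```python
import statistics

# Hypothetical daily order counts; the sixth day is deliberately unusual.
daily_orders = [102, 98, 105, 99, 101, 30, 103, 100]


def find_anomalies(values, threshold=2.0):
    """Return (index, value) pairs lying more than `threshold` standard deviations from the mean."""
    mean = statistics.mean(values)
    spread = statistics.pstdev(values)  # population standard deviation
    if spread == 0:
        return []  # all values identical, nothing stands out
    return [(i, v) for i, v in enumerate(values) if abs(v - mean) / spread > threshold]


anomalies = find_anomalies(daily_orders)
print(anomalies)
```

A flagged value is only a starting point: the diagnostic step is to find out why that day differs from the others.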
4. Prescriptive Analysis
• Prescriptive Analysis combines the insights from all of the above analytical techniques.
• It is often described as the final frontier of data analytics.
• Using the details provided by descriptive and predictive analytics, prescriptive
analytics allows companies to decide what action to take.
• It makes heavy use of artificial intelligence to help companies make
careful business decisions.
• Major industry players like Facebook, Netflix, Amazon, and Google use
prescriptive analytics to make key business decisions.
• Financial institutions are also gradually adopting this technique
to increase their revenue.
Data Exploration/Visualization
• Exploring the data essentially means examining it
in graphical or statistical form in order to
find patterns, connections, and relationships in
the data.
• Data visualization is the best tool for highlighting
possible patterns.
• Data exploration consists of a preliminary
examination of the data, which is important for
understanding the type of information that has
been collected and what it means.
• In combination with the information acquired
during problem definition, this preliminary
understanding will determine which method of
data analysis is most suitable for arriving at a
model definition.

https://towardsdatascience.com/5-quick-and-easy-data-visualizations-in-python-with-code-a2284bae952f
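A minimal sketch of exploring data in statistical and (very crude) graphical form, using only the Python standard library. The data, the bin width, and the `pearson` helper are invented for illustration; in practice a plotting library would normally be used for the visual part.

```python
import math
from collections import Counter

# Hypothetical paired observations, invented for illustration.
hours_studied = [1, 2, 3, 4, 5]
exam_score = [52, 58, 65, 70, 78]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)


# Statistical view: how strongly do the two variables move together?
r = pearson(hours_studied, exam_score)
print(f"correlation: {r:.3f}")

# Graphical view: a text histogram of scores in bins of width 10.
BIN_WIDTH = 10
bins = Counter((score // BIN_WIDTH) * BIN_WIDTH for score in exam_score)
for start in sorted(bins):
    print(f"{start}-{start + BIN_WIDTH - 1}: {'#' * bins[start]}")
```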
Model Validation
• Validation of the model, that is, the test phase, is an
important phase that allows you to validate the model
built on the basis of the starting data.
• It is important because it allows you to assess the
validity of the data produced by the model by
comparing them directly with the actual system.
• This time, however, you move beyond the set of
starting data on which the entire analysis has been
built.
• Generally, the data are called the training set
when you use them to build the model, and
the validation set when you use them to
validate the model.

https://odsc.com/blog/the-comprehensive-guide-to-model-validation-framework-what-is-a-robust-machine-learning-model/
• By comparing the data produced by the model with those produced by the system,
you can evaluate the error, and by using different test data sets, you can estimate
the limits of validity of the generated model.
• The correctly predicted values may be valid only within a certain range, or may
match to different degrees depending on the range of values taken into account.
• This process allows you not only to evaluate the effectiveness of the model numerically
but also to compare it with any other existing models.
• There are several techniques for this; the best known is cross-validation.
• This technique is based on dividing the training set into different parts.
• Each of these parts, in turn, is used as the validation set while the remaining parts
form the training set.
• Iterating in this way gives a progressively more reliable picture of the model's
performance, which you can use to refine the model.
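The cross-validation procedure described above can be sketched in plain Python. The straight-line model, the mean absolute error metric, the data, and the helper names are assumptions chosen to keep the example short; folds here are contiguous and unshuffled, whereas real data are usually shuffled first.

```python
def k_fold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds of near-equal size."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    folds, start = [], 0
    for size in sizes:
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def fit_line(xs, ys):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def cross_validate(xs, ys, k=3):
    """Return the mean absolute error measured on each held-out fold."""
    errors = []
    for held_out in k_fold_indices(len(xs), k):
        held = set(held_out)
        # Train on every part except the one currently held out.
        train_x = [x for i, x in enumerate(xs) if i not in held]
        train_y = [y for i, y in enumerate(ys) if i not in held]
        slope, intercept = fit_line(train_x, train_y)
        # Validate on the held-out part.
        fold_error = sum(abs(ys[i] - (slope * xs[i] + intercept)) for i in held_out)
        errors.append(fold_error / len(held_out))
    return errors


# Hypothetical data lying roughly on y = 2x.
xs = [1, 2, 3, 4, 5, 6]
ys = [2.1, 3.9, 6.2, 7.8, 10.1, 12.0]
fold_errors = cross_validate(xs, ys, k=3)
mean_error = sum(fold_errors) / len(fold_errors)
print(fold_errors, mean_error)
```

Averaging the per-fold errors gives a single figure that can be compared across candidate models.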
Deployment
• This is the final step of the analysis process,
which aims to present the results, that is, the
conclusions of the analysis.
• In the deployment process, in the business
environment, the analysis is translated into
a benefit for the client who has
commissioned it.
• In technical or scientific environments, it is
translated into design solutions or
scientific publications.
• That is, the deployment basically consists of
putting into practice the results obtained from
the data analysis.
• The analysis report should be directed to the managers, who are then able to make
decisions.
• They then put the conclusions of the analysis into practice.
• The documentation supplied by the analyst generally discusses each of these four topics
in detail:
• Analysis results
• Decision deployment
• Risk analysis
• Measuring the business impact
• When the results of the project include the generation of predictive models, these
models can be deployed as a stand-alone application or can be integrated within other
software.
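One way a predictive model can be handed over for integration into other software is to store its fitted parameters in a portable format. The sketch below uses JSON from the Python standard library; the parameter values and the function name `load_and_predict` are hypothetical, and real deployments often rely on dedicated model formats or services instead.

```python
import json

# Hypothetical fitted parameters of a straight-line model y = slope * x + intercept.
model = {"kind": "linear", "slope": 2.0, "intercept": 1.0}

# The analyst serializes the model so that another program can load it.
serialized = json.dumps(model)


def load_and_predict(payload, x):
    """Rebuild the model from its JSON form and return its prediction for x."""
    params = json.loads(payload)
    return params["slope"] * x + params["intercept"]


prediction = load_and_predict(serialized, 5.0)
print(prediction)
```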
Applications of Data Analytics
1. Fraud Detection & Risk Analytics
• In banking, Data Analytics is heavily used to analyze anomalous transactions
and customer details.
• Banks also use data analytics to analyze loan defaults and customers' credit scores
in order to minimize losses and prevent fraud.
2. Optimizing Transport Routes
• Companies like Uber and Ola depend heavily on data analytics to optimize
routes and fares for their customers.
• They use an analytical platform that finds the best route and calculates the
percentage rise or drop in taxi fares based on several parameters.
3. Providing Better Healthcare
• With the help of data analytics, hospitals and healthcare centers are able to predict
the early onset of chronic diseases.
• They can predict diseases that might occur in the future and help
patients take early action, which reduces medical expenditure.

4. Managing Energy Expenditure
• Public-sector energy companies use data analytics to monitor the energy usage
of households and industries.
• Based on the usage patterns, they optimize energy supply in order to reduce
costs and cut down on energy consumption.
5. Improving Search Results
• Companies like Google use data analytics to provide search results to users
based on their preferences and search history.
• Companies like Airbnb use search analytics to offer the most suitable
accommodation to their customers.
• Amazon also makes use of search analytics to provide recommendations to
customers.
6. Optimization of Logistics
• Various companies rely on Big Data Analytics to analyze supply chains and
reduce delays in logistics.
• Companies like Amazon use consumer analytics to analyze customer
requirements and deliver products with minimal delay.
