Data Analytics-Unit1 Notes
Relational Databases:
After the von Neumann architecture was invented, data came to be stored and
processed by computer for analysis. The turning point was the rise of the
relational database (RDB) in the 1980s, which allowed users to write SQL
(originally called SEQUEL) to retrieve data from a database. For users, the
advantage of RDB and SQL is the ability to analyze their data on demand. This made
retrieving data easy and helped database use spread. The combination of
easier/cheaper data collection with cheaper/faster data storage/retrieval technology
has pushed the boundaries of what we can do with data.
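As a minimal sketch of what analyzing data "on demand" with SQL looks like, the following Python snippet uses the standard-library sqlite3 module to build a small in-memory relational table and query it. The table name orders, its columns, and the sample rows are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; the `orders` table and its rows are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (customer, amount) VALUES (?, ?)",
    [("alice", 120.0), ("bob", 80.0), ("alice", 40.0)],
)

# An ad hoc question answered with one SQL query: total spend per customer.
totals = conn.execute(
    "SELECT customer, SUM(amount) FROM orders "
    "GROUP BY customer ORDER BY customer"
).fetchall()
conn.close()

print(totals)
```

The point of the sketch is that a new question only needs a new query, not a new program.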
DATA ANALYTICS
Data Mining:
Data mining, which emerged around the 1990s, is the computational
process of discovering patterns in large datasets. Analyzing data in ways that
differ from conventional methods can yield unexpected but useful results. The
development of data mining was made possible by database and data
warehouse technologies, which enable companies to store more data and still analyze
it in a reasonable manner. A general business trend emerged in which companies started
to “predict” customers' potential needs based on analysis of historical purchasing
patterns.
Conclusion:
As we’ve seen, data analysis and computer technology have developed together and
influenced each other ever since the advent of computing. As the volume of
collected data has grown, new methods of data analysis have been introduced at
each stage, out of necessity. As data collection and computing get even cheaper, we
should continue to see breakthroughs in the area of big data.
Data analytics Overview
2. Better Customer Service: Data analytics allows you to tailor customer service
according to their needs. It also provides personalization and builds stronger
relationships with customers. Analyzed data can reveal information about
customers’ interests, concerns, and more. It helps you give better
recommendations for products and services.
3. Efficient Operations: With the help of data analytics, you can streamline your
processes, save money, and boost production. With an improved understanding of
what your audience wants, you spend less time creating ads and content that
aren’t in line with your audience’s interests.
4. Effective Marketing: Data analytics gives you valuable insights into how your
campaigns are performing. This helps in fine-tuning them for optimal outcomes.
Additionally, you can also find potential customers who are most likely to
interact with a campaign and convert into leads.
Steps involved in Data analytics:
The data analytics lifecycle involves a few steps. Let’s look at them with the
help of an example.
Imagine you are running an e-commerce business with a customer base of nearly a
million. Your aim is to identify specific problems in your business and then come
up with data-driven solutions to grow it.
Below are the steps that you can take to solve your problems.
Let's illustrate these steps with an example related to an e-commerce company, like
Flipkart:
You also use predictive modeling to develop algorithms that can optimize delivery
routes and times. For instance, you use machine learning to predict which routes are
likely to experience delays based on historical data and external factors like traffic
and weather.
In this example, the steps of the analytics process help address a specific problem in
the e-commerce business by leveraging historical data, data cleaning, and
sophisticated analysis techniques to make data-driven decisions that optimize the
delivery system. These steps can be applied to various aspects of e-commerce
operations to drive improvements and achieve business goals.
6. Apache Spark: Apache Spark is an open-source data analytics engine that
processes large datasets in batch or near real time and carries out sophisticated
analytics using SQL queries and machine learning algorithms.
7. SAS: SAS is a statistical analysis software suite that can help you perform
analytics, visualize data, write SQL queries, perform statistical analysis, and
build machine learning models to make future predictions.
1. Retail: Data analytics helps retailers understand their customers' needs and buying
habits to predict trends, recommend new products, and boost their business. Retailers
can optimize the supply chain and retail operations at every step of the customer journey.
2. Healthcare: Healthcare organizations analyze patient data to support lifesaving
diagnoses and treatment options. Data analytics also helps in discovering new drug
development methods.
3. Manufacturing: Using data analytics, manufacturers can discover new
cost-saving opportunities. They can solve complex supply chain issues, labor
constraints, and equipment breakdowns.
4. Banking sector: Banks and financial institutions use analytics to identify
probable loan defaulters and to estimate the customer churn rate (that is, which
customers are likely to leave or unsubscribe from a service). It also helps in
detecting fraudulent transactions quickly.
5. Logistics: Logistics companies use data analytics to develop new business models
and optimize routes. This, in turn, ensures that deliveries arrive on time in a
cost-efficient manner.
Phase 1: Discovery –
Steps to explore, preprocess, and condition data prior to modeling and analysis.
It requires an analytic sandbox; the team performs extract, load, and transform
(ELT) to get data into the sandbox.
Data preparation tasks are likely to be performed multiple times and not in a
predefined order.
Phase 3: Model Planning –
The team communicates the benefits of the project more broadly and sets up a pilot
project to deploy the work in a controlled way before broadening it to the full
enterprise of users.
This approach enables the team to learn about the performance and related
constraints of the model in a production environment on a small scale, and make
adjustments before full deployment.
The team delivers final reports, briefings, and code.
WHAT IS DATA ANALYTICS IN BUSINESS?
Data analytics is the practice of examining data to answer questions, identify
trends, and extract insights.
When data analytics is used in business, it’s often called business analytics.
You can use tools, frameworks, and software to analyze data, such as Microsoft Excel
and Power BI, Google Charts, Datawrapper, Infogram, Tableau, and Zoho Analytics.
These can help you examine data from different angles and create visualizations that
illuminate the story you’re trying to tell.
process likely utilizes machine learning to spot patterns and connections in data
automatically.
Diagnostic analytics examines why things happened the way they did, diagnosing
a problem or root cause. It seeks to identify causes of trends and anomalies that
descriptive analytics may have previously spotted. Diagnostic analytics can do
this with data mining and correlation, among other methods.
As the name suggests, predictive analytics uses historical data to make
predictions. It provides forecasts on probability and possible effects of particular
future outcomes. This enables the management of organizations to work with a
proactive, data-backed approach to their decision-making. A company can also
utilize predictive analytics to understand the possible impact of problems.
And finally, prescriptive analytics makes use of results from descriptive,
diagnostic, and predictive analytics to arrive at suggestions for businesses to
ensure good potential outcomes.
1. Descriptive analytics:
Descriptive analytics is a statistical interpretation used to analyze historical
data to identify patterns and relationships. Descriptive analytics seeks to describe an
event, phenomenon, or outcome. It helps understand what has happened in the past
and provides businesses the perfect base to track trends.
Descriptive analytics is about finding meaning within data. Data needs context:
analytics provides the where and when, turning figures into measurable patterns.
The practice of descriptive analytics produces business metrics, reports, and KPIs
(Key Performance Indicators) to help businesses track their performance and different
trends. As a result, companies understand what's happened thus far and, when
combined with the other types of business analytics, get an idea of why things
happened, what things may occur, and how to prepare for future events.
Here’s a descriptive analytics example, a very timely one in today’s digital
world: social media engagement. Descriptive analytics provides metrics that help
businesses figure out the return on different social media initiatives. These
metrics include engagement rates, follower counts and whether they’re growing
or declining, and revenue generated via social media platforms.
charts, or line graphs. Visible data is easier to grasp. Finance specialists, on the
other hand, may want the information presented through numbers and tables.
The frequency distribution is a method that provides an overview of all the
responses to a question.
The bar chart is a visual representation that displays how responses vary on
different dimensions.
The pie chart displays each response category's share of the whole.
A scatterplot displays how two variables relate to each other.
A histogram provides an overview of all the responses to a question, with each
response grouped into bins according to some criterion such as age or income
level.
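A frequency distribution and the bin counts behind a histogram can be sketched in a few lines of Python. The survey responses, ages, and the 10-year bin width below are hypothetical values chosen only for illustration.

```python
from collections import Counter

# Hypothetical survey responses and respondent ages, for illustration only.
responses = ["yes", "no", "yes", "maybe", "yes", "no"]
ages = [19, 23, 31, 35, 38, 42, 47, 55]

# Frequency distribution: how many times each response occurs.
frequency = Counter(responses)

# Histogram counts: group each age into a 10-year bin labelled by its lower edge.
bin_width = 10
histogram = Counter((age // bin_width) * bin_width for age in ages)

print(frequency.most_common())
print(sorted(histogram.items()))
```

A bar chart or histogram is then just a drawing of these counts.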
Measures:
Mean: The sum of all observations divided by the total number of observations.
Median: The middle or central value in an ordered set.
Mode: The value that occurs most frequently in the dataset.
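As a small illustration of these three measures, the sketch below uses Python's standard-library statistics module. The list of observations is hypothetical.

```python
import statistics

# Hypothetical observations, for illustration only.
observations = [2, 3, 3, 5, 7, 10]

mean = statistics.mean(observations)      # sum of values / number of values
median = statistics.median(observations)  # middle of the ordered values
mode = statistics.mode(observations)      # most frequent value

print(mean, median, mode)
```

With an even number of observations, the median is the average of the two middle values.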
3. Measures of Dispersion
At times, understanding how data is distributed across a range is crucial. Consider
the average weight of a sample of two people. The average weight is 60 kilograms if
both people weigh 60 kilograms. The average weight is still 60 kg even if one person
weighs 40 kg and the other 80 kg. The difference between these two samples can be
measured using dispersion metrics like range or standard deviation.
Variance is the average squared distance between the data points and the
mean.
Standard deviation (σ) is a measure of how dispersed the data is in
relation to the mean; it is the square root of the variance.
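The two-person weight example can be checked with a short sketch. It assumes the two values are the whole population, so it uses statistics.pvariance and statistics.pstdev, which match the "average squared distance" definition; the helper value_range is a hypothetical name introduced here.

```python
import statistics

# The two samples from the weight example: same mean, very different spread.
same_weights = [60, 60]
spread_weights = [40, 80]

def value_range(values):
    """Range: largest value minus smallest value."""
    return max(values) - min(values)

for sample in (same_weights, spread_weights):
    print(
        sample,
        "mean:", statistics.mean(sample),
        "range:", value_range(sample),
        "variance:", statistics.pvariance(sample),
        "std dev:", statistics.pstdev(sample),
    )
```

Both samples have a mean of 60 kg, but only the dispersion measures reveal that the second sample is spread out.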
4. Measures of Position
Identifying the position of a single value or its response in relation to others is
another aspect of descriptive analysis. In this field of expertise, metrics like
percentiles and quartiles are extremely helpful.
Measures:
Quartiles: Split the data into four equal parts. The first quartile (Q1) is the
same as the 25th percentile, and the third quartile (Q3) is the same as the 75th
percentile.
Percentiles: Divide the data into 100 equal regions. They describe the position
of a data point relative to the rest of the dataset using a percent.
(E.g., if a person scores at the 80th percentile, they scored higher than about
80% of the people.)
Deciles: Split the data into ten equal parts.
Standard scores: Also known as z-scores. A z-score is a statistical
measurement that describes how far a value is from the mean of a group of
values, measured in standard deviations.
z = (x − μ) / σ, where x is the test value, μ is the mean, and σ is the standard
deviation.
5. Contingency table
In statistics, a contingency table, also known as a two-way frequency table, is a
tabular representation with at least two rows and two columns that is used to present
categorical data as frequency counts. For instance, the contingency table below, which
has two rows and five columns, displays the findings of a random sample of 2200
adults categorized by gender and preferred method of eating icy dessert.
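To show how such a table is built from raw categorical observations, here is a minimal sketch using collections.Counter. The six observations and the category labels are hypothetical; they are not the 2200-adult survey mentioned above.

```python
from collections import Counter

# Hypothetical (gender, preferred way of eating) observations; not the survey data.
observations = [
    ("female", "cup"), ("male", "cup"), ("female", "cone"),
    ("male", "sundae"), ("female", "cup"), ("male", "cone"),
]

# Count each (row, column) combination.
counts = Counter(observations)
rows = sorted({gender for gender, _ in observations})
cols = sorted({choice for _, choice in observations})

# Build the table as a nested dict: table[row][column] -> frequency.
table = {r: {c: counts[(r, c)] for c in cols} for r in rows}

# Print the table with row totals.
print("gender".ljust(8), *(c.ljust(8) for c in cols), "total")
for r in rows:
    print(r.ljust(8), *(str(table[r][c]).ljust(8) for c in cols),
          sum(table[r].values()))
```

Each cell holds the number of observations that fall into that combination of row and column category.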
Excel: Microsoft Excel is a widely used tool that can be used for simple
descriptive analytics. It has powerful statistical and data visualization capabilities.
Pivot tables are a particularly useful feature for summarizing and analyzing large
data sets.
Tableau: Tableau is a data visualization tool that is used to represent data in a
graphical or pictorial format. It can handle large data sets and allows for real-time
data analysis.
Power BI: Power BI, another product from Microsoft, is a business analytics tool
that provides interactive visualizations with self-service business intelligence
capabilities.
R and Python: Both are programming languages that have robust capabilities for
statistical analysis and data visualization. With packages like pandas, matplotlib,
seaborn in Python and ggplot2, dplyr in R, these languages are powerful tools for
descriptive analytics.
2. Diagnostic analytics
Diagnostic analytics is a form of data analytics that examines data or content to
answer the question, “Why did it happen?” It is characterized by techniques such as
drill-down, data discovery, data mining, and correlations. It’s used to identify
behaviors, trends, and patterns to figure out why certain outcomes have occurred. This
type of analysis is more advanced than descriptive analytics (which simply describes
what has happened) but not as advanced as predictive analytics (which makes
predictions about the future based on the data) or prescriptive analytics (which
suggests actions to benefit from predictions and optimize outcomes).
**Scenario**: Imagine you work for a retail company, and you've noticed a
sudden and significant drop in online sales over the past quarter.
1. **Drill-Down Analysis**:
- Start with the general question: "Why did online sales drop last quarter?"
- Drill down by breaking sales data into specific categories like product types,
geographic regions, and time periods.
- Find that the drop in sales was most significant in the electronics category in
the Western region during a specific month.
2. **Data Discovery**:
- Use data visualization tools to explore the sales data.
- Discover a pattern: Sales dropped right after the company updated its website
and changed the checkout process, leading to increased cart abandonment.
3. **Data Mining**:
- Analyze large volumes of customer data, website interactions, and purchase
history.
- Discover a cluster of customers who abandoned their carts after the website
update and identify common characteristics or behaviors.
4. **Correlation Analysis**:
- Analyze the correlation between website page load times and cart
abandonment rates.
- Find that longer page load times are strongly correlated with higher cart
abandonment.
5. **Regression Analysis**:
- Conduct regression analysis to estimate the effect of page load times
(independent variable) on sales (dependent variable).
- Find that each 1-second increase in page load time is associated with a 5%
decrease in sales.
By applying these diagnostic analytics techniques, you can determine that the
drop in online sales was primarily due to slower page load times caused by
increased traffic after a successful marketing campaign. This diagnosis enables
you to take corrective actions, such as upgrading server capacity, optimizing the
website, and better coordinating marketing efforts with IT resources, to prevent
similar issues in the future.
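The correlation and regression steps in the scenario above can be sketched with plain Python. The load times and abandonment rates below are hypothetical numbers invented for illustration, and pearson_and_line is a helper name introduced here; it computes the Pearson correlation coefficient and a least-squares line by hand.

```python
import math

# Hypothetical measurements, for illustration only:
# page load time in seconds and cart abandonment rate in percent.
load_times = [1, 2, 3, 4, 5]
abandonment = [21, 24, 31, 34, 40]

def pearson_and_line(x, y):
    """Return (r, slope, intercept) for the least-squares line y = slope*x + intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    r = sxy / math.sqrt(sxx * syy)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return r, slope, intercept

r, slope, intercept = pearson_and_line(load_times, abandonment)
print(f"r = {r:.3f}, slope = {slope:.2f}, intercept = {intercept:.2f}")
```

A correlation close to 1 indicates a strong linear association, but on its own it does not show that slow pages cause abandonment; that conclusion needs the other diagnostic steps as well.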
Define the Problem or Outcome: The first step is to clearly define the outcome
you are trying to understand. This could be a particular trend, a performance
metric, a business problem, or any other outcome that you have observed in your
descriptive analytics.
Data Collection: Once you’ve defined the problem, the next step is to gather the
relevant data. This could involve compiling data from various sources, such as
databases, logs, surveys, etc.
Data Cleaning and Preparation: After collecting the data, it’s important to
clean and prepare it for analysis. This involves removing or correcting errors,
handling missing values, and possibly transforming the data into a suitable
format.
Data Analysis: The next step is to conduct the analysis. This typically involves
using statistical methods to identify patterns, correlations, or trends in the data.
The specific techniques used will depend on the problem and the data, but could
include methods like regression analysis, time-series analysis, correlation
analysis, etc.
Interpret Results: After the analysis, the results need to be interpreted. This
involves understanding the relationships and patterns identified in the data and
drawing conclusions about the causes of the outcome you’re investigating.
Communicate Findings: The next step is to communicate the findings. This
could involve creating a report or a presentation that clearly explains the results
of the analysis and the conclusions drawn.
Take Action: Based on the insights from the diagnostic analytics, the appropriate
actions are taken to address the identified issues or to exploit the discovered
opportunities.
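The data cleaning and preparation step described above can be illustrated with a small sketch. The records, field names, and the choice to fill missing values with the median are all assumptions made for this example; a real project would pick a policy suited to its data.

```python
import statistics

# Hypothetical raw records, for illustration only.
raw = [
    {"region": " West ", "sales": "120.5"},
    {"region": "east", "sales": None},      # missing value
    {"region": "West", "sales": "n/a"},     # unparseable value
    {"region": "EAST", "sales": "80"},
]

def to_float(value):
    """Convert to float, returning None when the value is missing or invalid."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None

# Normalize text fields and parse numbers.
cleaned = [
    {"region": rec["region"].strip().lower(), "sales": to_float(rec["sales"])}
    for rec in raw
]

# Fill missing sales with the median of the known values (one possible policy).
known = [rec["sales"] for rec in cleaned if rec["sales"] is not None]
fill = statistics.median(known)
for rec in cleaned:
    if rec["sales"] is None:
        rec["sales"] = fill

print(cleaned)
```

After this step every record has a consistent region label and a numeric sales value, so the analysis step can work on it directly.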
In risk management and fraud detection: Diagnostic analytics can help
identify patterns that might suggest fraudulent activity or highlight areas of risk
that need to be managed.
In product development and innovation: If a product isn’t performing as well
as expected or if you’re trying to understand how a product can be improved,
diagnostic analytics can help identify factors that are impacting product
performance.
3. Predictive analytics:
Deploy your model. Deploy your predictive model and put it to work on new
data. Get results and reports – and automate decision-making based on the
output.
Monitor and refine your model. Regularly monitor your model to review its
performance and ensure it’s providing the expected results. Refine and
optimise your model as needed.
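One simple way to monitor a deployed model is to compare its predictions with actual outcomes as they arrive and flag the model when the error grows too large. The sketch below assumes mean absolute error as the metric and a hand-picked threshold; the function names, data, and threshold are hypothetical.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

def needs_refinement(actual, predicted, threshold):
    """Flag the model when its error on recent data exceeds the threshold."""
    return mean_absolute_error(actual, predicted) > threshold

# Hypothetical recent outcomes and the model's predictions for them.
actual = [10, 12, 15, 20]
predicted = [11, 12, 13, 16]

print(mean_absolute_error(actual, predicted))
print(needs_refinement(actual, predicted, threshold=1.5))
```

In practice the metric and threshold depend on the problem; the idea is only that monitoring turns "review its performance" into a repeatable check.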
4. Prescriptive analytics:
Prescriptive analytics is the area of business analytics dedicated to
finding the best solution to problems that arise day to day. It is closely
related to two other processes, descriptive and predictive analytics.
Prescriptive analytics can be defined as a type of data analytics that uses
algorithms and analysis of raw data to reach better and more effective decisions
in both the short and the long term. It suggests strategies based on possible
scenarios, accumulated statistics, and past and present data collected from the
consumer community.
Example:
Waymo, the self-driving car project that originated at Google, is a frequently
cited example of prescriptive analytics. It performs millions of calculations on
every trip. The car makes its own decisions about which direction to turn, when to
slow down or speed up, and when and where to change lanes, much as a human
driver makes decisions while driving a car.
| | Descriptive Analysis | Predictive Analysis | Prescriptive Analysis |
|---|---|---|---|
| Summary | What happened? | What’s going to happen? | What should happen? |
| Function | It uses data mining and data ... | It looks at ... | It takes the conclusions ... |
| Cons | It offers a limited view, and doesn't go beyond the data’s surface. | It needs lots of historical data to work. It will never be 100% accurate. | It requires a lot of past data and often cannot account for all possible variables. |
The information will help to access the organization’s storage and
information system.
You will be able to take immediate action based on valuable information.
Companies tend to focus on experimenting with analytical
languages and tools to develop new ideas.
Benefits:
Improved Decision Making
Foremost among the top data analytics benefits is better decision-making. It
offers insightful, data-driven information that aids organizations in
understanding their customers, operations, and markets. They can spot
patterns, trends, and correlations. Moreover, they use this knowledge to make
well-informed choices supported by data and metrics rather than mere
guesswork. Businesses can boost productivity, cut costs, find new
opportunities, and reduce risks by optimizing their strategies and making more
informed decisions. Because they are based on actual data and analytics, data
analytics also enables organizations to make more transparent and dependable
decisions.
Increased Efficiency and Productivity
Data analytics enables organizations to increase efficiency and productivity by
automating and streamlining processes, maximizing resource allocation, and
minimizing manual labor. Businesses can streamline their workflows by
locating bottlenecks and getting rid of duplication. Additionally, data analytics
assists businesses in identifying areas where productivity can be increased,
such as waste reduction, better inventory control, and supply chain
optimization.
Enhanced Customer Experience
By giving organizations useful insights into customer behavior, preferences,
and needs, data analytics enables businesses to identify areas where they can
improve their customer experience–such as lowering wait times, enhancing
customer service, or streamlining user interfaces. Data analytics also helps
businesses tailor their offerings to meet consumers’ unique needs, forging
closer ties with them and fostering greater customer loyalty.
Improved Risk Management
Businesses can find patterns and correlations in data from various sources that
point to potential risks. Data analytics can, for instance, assist companies in
identifying potential fraud, online threats, or operational risks. Businesses can
also take preventative action to mitigate potential risks by monitoring data in
real-time. By utilizing data analytics to enhance risk management, they can
lessen the possibility of monetary losses, reputational damage, and other
negative outcomes.
Competitive Advantage
Businesses can gain a competitive edge using data analytics to make more
informed, data-driven decisions. Analyzing data from various sources allows
businesses to understand market trends, consumer behavior, and competitor
activities. Businesses can use this information to improve their strategies, spot
new opportunities, and set themselves apart from the competition. Data
analytics can, for instance, aid companies in identifying underserved market
segments, anticipating client needs, and enhancing product offerings. Simply
put, businesses can increase their market share, spur revenue growth, and
fortify their brand by utilizing data analytics to gain a competitive advantage.
Applications of data analytics in business
Data analytics plays a crucial role in modern business operations by
providing valuable insights and enabling data-driven decision-making. Here
are some key applications of data analytics in business:
1. Customer Analytics:
- Customer Segmentation: Analyzing customer data to group them based
on demographics, behavior, or preferences, helping businesses tailor their
marketing strategies.
- Churn Prediction: Identifying customers at risk of leaving and
implementing retention strategies.
- Cross-selling and Upselling: Recommending additional products or
services to existing customers based on their past behavior and preferences.
3. Operational Analytics:
- Supply Chain Optimization: Analyzing supply chain data to reduce costs,
optimize inventory, and improve delivery times.
- Quality Control: Using data to monitor and improve product quality,
reducing defects and waste.
- Resource Allocation: Allocating resources efficiently based on data
insights, optimizing workforce scheduling and resource usage.
4. Financial Analytics:
- Fraud Detection: Identifying unusual financial transactions or patterns to
detect and prevent fraud.
- Risk Management: Assessing and managing financial risks through data-
driven analysis.
- Financial Forecasting: Predicting future financial performance and
making investment decisions.
5. Human Resources:
- Employee Performance: Evaluating employee performance and
identifying areas for improvement.
- Talent Acquisition: Using data to make informed hiring decisions and
reduce employee turnover.
- Workforce Planning: Predicting workforce needs and optimizing staffing
levels.
6. Product Development:
- Product Analytics: Analyzing user data and feedback to enhance existing
products and develop new ones.
- Market Research: Using data to understand market trends and consumer
preferences.
- A/B Testing: Experimenting with product changes and analyzing data to
determine which version is more effective.
7. Customer Service:
- Sentiment Analysis: Analyzing customer feedback and social media data
to gauge customer sentiment and address issues promptly.
- Chatbots and Virtual Assistants: Using data-driven AI solutions to
improve customer support and automate responses.
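Of the applications listed above, A/B testing lends itself to a short worked example. The sketch below assumes a two-proportion z-test as the way to compare conversion rates between two versions; the visitor and conversion counts are hypothetical, and two_proportion_z_test is a helper name introduced here.

```python
import math

def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference in conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Standard normal CDF via the error function.
    cdf = 0.5 * (1 + math.erf(abs(z) / math.sqrt(2)))
    p_value = 2 * (1 - cdf)
    return z, p_value

# Hypothetical counts: conversions and visitors for versions A and B.
z, p_value = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p_value:.3f}")
```

A small p-value suggests the difference between the versions is unlikely to be due to chance alone; the significance level to compare it against is a choice the analyst makes before running the test.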