Introduction to Data Analytics
Data Analytics
Data analytics is a rapidly growing field that involves
collecting, cleaning, transforming, and analyzing large
datasets to extract meaningful insights and patterns. This
process helps organizations make informed decisions,
optimize operations, improve customer experiences, and
gain a competitive advantage in today's data-driven world.
This presentation will delve into the core concepts of data
analytics: machine learning, the significance of big data
analytics, the challenges associated with handling large
datasets, and the crucial steps involved in preparing data
for effective analysis.
What is Data Analytics?
Data analytics is the process of examining large datasets to uncover hidden
patterns, correlations, and insights. It encompasses various techniques and
approaches to analyze and interpret data.
Examine
Analyze large datasets
Uncover
Find hidden patterns
Interpret
Gain valuable insights
Concepts of Machine Learning
Machine learning is a key component of data analytics. It involves algorithms that can learn from and make
predictions or decisions based on data.
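As a minimal illustration of this idea, the sketch below fits a scikit-learn model on a handful of made-up customer records and uses it to predict an outcome for an unseen example. The feature names and values are hypothetical, chosen only to show the fit-then-predict pattern.

```python
# A minimal sketch of "learning from data": a scikit-learn model is fit on
# historical examples, then used to predict outcomes for new data.
# The features and labels here are made up for illustration.
from sklearn.linear_model import LogisticRegression

# Past observations: [monthly_visits, avg_purchase] -> churned (1) or not (0)
X_train = [[2, 10.0], [15, 55.0], [1, 5.0], [20, 80.0]]
y_train = [1, 0, 1, 0]

model = LogisticRegression()
model.fit(X_train, y_train)        # the algorithm "learns" from the data

print(model.predict([[3, 12.0]]))  # prediction for an unseen customer
```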
Volume
Massive amounts of data are generated every day, making it difficult to process and analyze manually. Big data analytics tools and techniques allow organizations to handle this volume of data effectively, enabling them to derive valuable insights. The volume of data is constantly growing, as new sources of data emerge and the amount of data generated by existing sources increases.
Velocity
Data is generated at an unprecedented rate, making it essential for organizations to process and analyze data in real time to keep up with the dynamic nature of business environments. Traditional methods of data analysis are often too slow to handle the velocity of big data. Therefore, big data analytics techniques are employed to analyze data as it is being generated, enabling real-time decision-making and response to market changes.
Variety
Big data encompasses a wide range of data types, including structured data from databases, semi-structured data from social media feeds, and unstructured data from images, videos, and audio files. Organizations can gain a more comprehensive understanding of their business and customers by analyzing this diverse range of data. Different types of data require different analysis techniques, and big data analytics solutions offer the flexibility to handle diverse data sources.
Need for Big Data Analytics
Big data analytics is crucial for organizations to gain valuable insights and make data-driven decisions.
Innovation
Discover new opportunities and trends
Challenges in Big Data Analytics: Volume
One of the main challenges in big data analytics is dealing with the sheer volume of data. Every day, organizations are inundated with
data generated by their operations, customer interactions, and external sources. From sensor readings to social
media posts, the amount of information being collected is growing exponentially. This presents a significant challenge for
organizations looking to gain meaningful insights from their data, as they need to find ways to manage, store, and process these
massive datasets effectively.
Data Collection
Gathering large amounts of data from various sources
Storage
Storing massive datasets efficiently and securely
Processing
Analyzing and processing huge volumes of data quickly
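As a rough sketch of coping with volume, the example below processes a large CSV file in fixed-size chunks with pandas, rather than loading it into memory all at once. The file name and the "amount" column are assumptions for illustration.

```python
# Processing a large file in chunks: each chunk is aggregated and discarded,
# so memory use stays bounded regardless of the file's total size.
import pandas as pd

total = 0.0
for chunk in pd.read_csv("events.csv", chunksize=100_000):
    total += chunk["amount"].sum()  # aggregate this chunk, then move on

print(f"Total amount across all rows: {total}")
```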
Challenges in Big Data Analytics: Variety
Another challenge in big data analytics is handling the variety of data types and formats. This diversity poses significant challenges for organizations, as
different types of data require different analysis techniques and tools. The ability to effectively integrate and analyze data from various sources is crucial
for gaining a comprehensive understanding of the business environment.
For example, a retail organization might collect structured data from its point-of-sale systems, unstructured data from customer reviews on social media,
and semi-structured data from product descriptions on its website. Analyzing these diverse data sources together can provide valuable insights into
customer preferences, market trends, and product performance.
Furthermore, the variety of data formats adds complexity to the analysis process. Data from different sources may be stored in different formats, such as
CSV, JSON, XML, or PDF. Organizations need to ensure that data from various sources can be integrated and transformed into a format suitable for analysis.
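The sketch below shows one way this might look in practice, assuming a pandas workflow: structured CSV data and semi-structured JSON data are loaded, keyed consistently, and combined into a single tabular view. All file and column names are hypothetical.

```python
# Handling variety: load two formats, normalize the shared key, and merge.
import pandas as pd

sales = pd.read_csv("pos_sales.csv")                # structured data
reviews = pd.read_json("reviews.json", lines=True)  # semi-structured data

# Standardize the shared key so the two sources can be joined reliably.
sales["product_id"] = sales["product_id"].astype(str)
reviews["product_id"] = reviews["product_id"].astype(str)

combined = sales.merge(reviews, on="product_id", how="left")
print(combined.head())
```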
Challenges in Big Data Analytics: Velocity
The high velocity of data generation and processing is another significant challenge in big data analytics.
1 Data Generation
Continuous creation of new data from various sources
2 Data Ingestion
Rapidly collecting and storing incoming data
3 Real-time Processing
Analyzing data as it arrives for immediate insights
4 Decision Making
Using real-time insights for quick decision-making
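The toy sketch below illustrates the real-time processing idea: each event is handled as it arrives and a rolling aggregate is updated immediately, rather than after the fact. The event source is simulated with random numbers; a production system would read from a message queue or stream.

```python
# A toy velocity pipeline: maintain a rolling window over incoming events
# and act on the aggregate as soon as each event arrives.
from collections import deque
import random

window = deque(maxlen=100)  # keep only the most recent 100 events

def handle(event_value: float) -> None:
    window.append(event_value)
    avg = sum(window) / len(window)  # insight available immediately
    if avg > 0.9:
        print(f"alert: rolling average {avg:.2f} exceeds threshold")

for _ in range(1_000):  # stand-in for an endless event stream
    handle(random.random())
```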
Challenges in Big Data Analytics: Veracity
Ensuring the quality and reliability of data is a crucial challenge in big data analytics.
Data Timeliness
Ensuring data is up-to-date and relevant
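Basic veracity checks can be automated. The sketch below, assuming a pandas workflow and hypothetical file and column names, flags missing values, duplicate rows, and stale records that fail the timeliness test.

```python
# Simple veracity checks: completeness, uniqueness, and timeliness.
import pandas as pd

df = pd.read_csv("customers.csv", parse_dates=["updated_at"])

print("missing values per column:\n", df.isna().sum())
print("duplicate rows:", df.duplicated().sum())

# Timeliness: flag records not updated in the last 30 days.
cutoff = pd.Timestamp.now() - pd.Timedelta(days=30)
print("stale records:", (df["updated_at"] < cutoff).sum())
```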
Preparing Data for Analytics: Data Collection
The first step in preparing data for analytics is collecting relevant data from various sources. This process involves gathering information from
different systems, platforms, and devices to create a comprehensive dataset that can be analyzed for insights.
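As a hedged sketch of what collection can look like in code, the example below pulls records from a placeholder REST endpoint with requests and reads a local CSV export with pandas. The URL and file name are stand-ins, not real sources.

```python
# Collecting data from two common kinds of sources: an API and a file export.
import pandas as pd
import requests

# The URL is a placeholder; substitute your actual data source.
resp = requests.get("https://example.com/api/orders", timeout=10)
resp.raise_for_status()
api_orders = pd.DataFrame(resp.json())

file_orders = pd.read_csv("orders_export.csv")
print(len(api_orders), "rows from the API,", len(file_orders), "rows from file")
```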
Preparing Data for Analytics: Data Cleaning
Identify Errors
The first step in data cleaning is identifying errors. This involves detecting inconsistencies, duplicates, and missing values within the dataset. Inconsistencies can
arise from data entry errors, different data formats, or changes in data definitions. Duplicates occur when the same data point is recorded multiple times, leading
to skewed results. Missing values indicate incomplete data, which can hinder your analysis.
Correct or Remove
Once errors are identified, you need to correct or remove them. This step requires careful consideration to avoid introducing bias into your data. For correctable
errors, you can use imputation techniques to fill in missing values or replace incorrect values with the correct ones. For irrecoverable errors, the data points might
need to be removed from the dataset.
Standardize
Data standardization ensures consistency in data formats and units. For instance, you might need to convert all dates to a standard format, ensure that all
numerical values are in the same units, or standardize categorical variables using consistent labels. This step ensures that your data is comparable and can be
analyzed effectively.
Validate
After cleaning your data, it's essential to validate its accuracy. This involves verifying that the cleaned data meets the desired quality standards and is suitable for
your analysis. This step can be achieved through automated validation processes or manual checks, depending on the complexity of your data and the level of
assurance required.
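The sketch below walks through the four steps above with pandas on a hypothetical sales dataset; the file and column names are assumptions made for illustration.

```python
# A compact sketch of the cleaning workflow: identify, correct, standardize,
# validate. Dataset and columns are hypothetical.
import pandas as pd

df = pd.read_csv("raw_sales.csv")

# 1. Identify errors: missing values and duplicate rows.
print(df.isna().sum())
print("duplicates:", df.duplicated().sum())

# 2. Correct or remove: impute missing prices with the median, drop duplicates.
df["price"] = df["price"].fillna(df["price"].median())
df = df.drop_duplicates()

# 3. Standardize: consistent date format and categorical labels.
df["order_date"] = pd.to_datetime(df["order_date"])
df["region"] = df["region"].str.strip().str.lower()

# 4. Validate: assert the cleaned data meets basic quality rules.
assert df["price"].ge(0).all(), "negative prices remain"
assert not df.duplicated().any()
```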
Preparing Data for Analytics: Data Integration
Data integration involves combining data from multiple sources into a unified view for analysis. This process is essential for gaining a
comprehensive understanding of your data and deriving meaningful insights. By integrating data from different sources, you can create a
holistic view of your business or research area, revealing patterns and trends that might not be visible when analyzing data in isolation.
Data Mapping
Identify relationships between different data sources. Data mapping involves understanding the structure, format, and meaning of data from various sources. This helps to establish connections between different data fields and identify any potential conflicts or redundancies. A well-defined data mapping process is crucial for ensuring consistency and accuracy during integration.
Data Transformation
Convert data into a common format. Data transformation is a critical step in data integration, ensuring that data from different sources can be combined effectively. This may involve converting dates to a standard format, standardizing units of measurement, or handling different data types. Proper data transformation ensures that the integrated dataset is consistent and can be analyzed reliably.
Data Consolidation
Merge data from various sources into a single dataset. Data consolidation is the final stage of data integration, bringing together transformed data from different sources into a unified dataset. This step might involve combining data from different tables, files, or databases, creating a single source of truth for your analysis. The consolidated dataset should be clean, consistent, and ready for analysis.
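A minimal sketch of this mapping, transformation, and consolidation flow, assuming pandas and entirely hypothetical source files and column names:

```python
# Integration in three steps: map field names, transform to common formats,
# consolidate into one dataset.
import pandas as pd

crm = pd.read_csv("crm_customers.csv")  # uses column "cust_id"
web = pd.read_json("web_events.json")   # uses column "customer_id"

# Mapping: reconcile field names across sources.
crm = crm.rename(columns={"cust_id": "customer_id"})

# Transformation: bring shared fields to a common format and type.
crm["signup_date"] = pd.to_datetime(crm["signup_date"])
crm["customer_id"] = crm["customer_id"].astype(str)
web["customer_id"] = web["customer_id"].astype(str)

# Consolidation: merge into a single unified dataset.
unified = crm.merge(web, on="customer_id", how="inner")
print(unified.head())
```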
Preparing Data for Analytics: Feature Engineering
Feature Creation
Develop new features based on domain knowledge. This involves identifying relevant information that's not directly captured in the existing
features but can be derived from them or other data sources. For example, you might create a new feature that represents the average
purchase frequency of a customer, calculated from their past purchase history. This new feature can provide valuable insights into customer
behavior and help improve the accuracy of your models.
Feature Transformation
Apply mathematical functions to existing features to improve their distribution or relationship with the target variable. Common
transformations include log transformations for skewed data, standardization to ensure features have zero mean and unit variance, and
binning to group continuous features into discrete categories. For example, you could transform a continuous feature like age into a
categorical feature with three categories: young, middle-aged, and elderly. This can help models to better understand the relationship
between age and the target variable.
Feature Selection
Choose the most relevant features for analysis. This involves identifying features that contribute most to the predictive power of your model
and discarding those that are irrelevant or redundant. Feature selection can be performed using various techniques, such as correlation
analysis, feature importance, and feature elimination methods. By focusing on the most relevant features, you can simplify your model,
reduce overfitting, and improve its performance.
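The sketch below illustrates all three steps on a hypothetical purchases dataset, using pandas and NumPy; every file and column name is an assumption. The selection step here uses simple correlation with the target, one of the techniques named above.

```python
# Feature engineering in three steps: create, transform, select.
import numpy as np
import pandas as pd

df = pd.read_csv("purchases.csv",
                 parse_dates=["first_purchase", "last_purchase"])

# Feature creation: derive purchase frequency from existing columns.
days_active = (df["last_purchase"] - df["first_purchase"]).dt.days + 1
df["purchase_freq"] = df["n_purchases"] / days_active

# Feature transformation: log-transform skewed spend, bin age into categories.
df["log_spend"] = np.log1p(df["total_spend"])
df["age_group"] = pd.cut(df["age"], bins=[0, 30, 55, 120],
                         labels=["young", "middle-aged", "elderly"])

# Feature selection: keep the features most correlated with the target.
corr = df[["purchase_freq", "log_spend", "n_purchases"]].corrwith(df["churned"])
selected = corr.abs().sort_values(ascending=False).head(2).index.tolist()
print("selected features:", selected)
```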
Preparing Data for Analytics: Data Sampling
Data sampling is the process of selecting a subset of data from a larger dataset for analysis, often used when
dealing with big data. This is essential because analyzing the entire dataset can be computationally expensive
and time-consuming, especially with large datasets. By selecting a representative sample, you can gain insights
from the data while managing resources effectively. Data sampling techniques help you create smaller,
manageable datasets that maintain the essential characteristics of the original dataset, allowing for faster
analysis and model training.
[Table omitted: sampling methods with their descriptions and use cases]
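As a brief illustration, the sketch below draws a simple random sample and a stratified sample with pandas; the dataset and the "segment" column are assumptions. Stratified sampling preserves the proportions of each group, which matters when some segments are rare.

```python
# Two common sampling approaches: simple random and stratified.
import pandas as pd

df = pd.read_csv("transactions.csv")

# Simple random sample: 10% of rows, reproducible via a fixed seed.
random_sample = df.sample(frac=0.10, random_state=42)

# Stratified sample: 10% drawn from each customer segment.
stratified = df.groupby("segment").sample(frac=0.10, random_state=42)

print(len(random_sample), len(stratified))
```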
Preparing Data for Analytics: Summary
Data collection is the process of gathering relevant data from various sources, such as databases, APIs,
sensors, and social media. This step is vital for ensuring that you have the necessary data to answer your
research questions and make informed decisions. It's important to identify and collect data that is both
relevant and reliable.
Data cleaning is the process of identifying and correcting errors, inconsistencies, and missing values in the data.
This step is essential for ensuring the quality and accuracy of your data. Cleaning data can involve filling in
missing values, correcting inconsistencies, and removing duplicates. A clean dataset will lead to more reliable
analysis results.
Data integration involves combining data from multiple sources into a single, consistent dataset. This is important
for gaining a holistic view of the data and understanding the relationships between different data points. Data
integration can involve merging different datasets, resolving conflicts, and transforming data into a consistent
format.
Feature engineering is the process of creating new features or transforming existing features to improve the
performance of your models. It involves identifying patterns, relationships, and trends in the data, and creating
features that capture these insights. Feature engineering can significantly improve the accuracy and predictive
power of your models.
Data sampling is the process of selecting a subset of data from a larger dataset for analysis. This is often
necessary when dealing with large datasets, as analyzing the entire dataset can be computationally expensive
and time-consuming. Sampling techniques allow you to create smaller, manageable datasets that maintain the
essential characteristics of the original dataset, enabling faster analysis and model training.