
Chapter-1

Introduction to Data Analytics


Prepared by: Assistant Professor Manthan Rankaja
Definition of Data Analytics
• Data Analytics involves the use of specialized systems and software to
analyze data and draw insights from it.
• In the era of big data, analytics help organizations make informed
decisions, predict trends, and understand customer behavior.
Applications of Data Analytics
• Various industries where data analytics is applied: Healthcare
(predicting disease outbreaks), Finance (fraud detection), Retail
(customer segmentation), and many more.
• Real-world examples of data analytics: Netflix’s recommendation
system, credit card fraud detection, etc.
Types of Data Analytics
• Descriptive: Analyzes historical data to understand what has
happened.
• Diagnostic: Digs deeper into data to understand the root cause of the
outcome.
• Predictive: Uses statistical models and forecasting techniques to
understand what is likely to happen in the future.
• Prescriptive: Uses optimization and simulation algorithms to advise
on possible outcomes.
Descriptive Analytics
• Definition: Descriptive Analytics deals with the analysis of historical
data to understand changes that have occurred in a business.
• Use cases: Sales trend analysis, Social media trend analysis.
• Examples: Monthly revenue report, Social media post reach analysis.
Diagnostic Analytics
• Definition: Diagnostic Analytics is a form of advanced analytics that
examines data to answer the question “Why did it happen?”.
• Use cases: Sales decline analysis, Customer churn analysis.
• Examples: Analyzing customer feedback to understand a drop in
product sales, Studying customer behavior data to understand churn.
Predictive Analytics
• Definition: Predictive Analytics uses statistical techniques and
machine learning algorithms to forecast future outcomes.
• Use cases: Customer lifetime value prediction, Predictive
maintenance.
• Examples: Using past purchase history to predict a customer’s future
purchase, Predicting machine failure using sensor data.
Prescriptive Analytics
• Definition: Prescriptive Analytics goes beyond predicting future
outcomes by also suggesting actions to benefit from the predictions.
• Use cases: Supply chain optimization, Personalized marketing.
• Examples: Optimizing delivery routes in real-time to save costs,
Personalizing marketing messages based on customer behavior
prediction.
Types of Data
• Structured data: Data that is organized and formatted so it’s easily
readable.
• For example, a database of customer information where data is
organized in rows and columns.
• Unstructured data: Data that doesn’t follow a specified format. For
example, emails, social media posts, etc.
• Semi-structured data: A mix of structured and unstructured data. For
example, a document which contains metadata.
Structured Data
• Definition: Structured data is highly organized and formatted in a way
so it’s easily searchable in relational databases.
• Examples:
Customer databases, Excel spreadsheets, etc.
• Advantages:
Easy to enter, store, query, and analyze.
• Disadvantages:
Requires a lot of time and resources to maintain.
Not suitable for complex, interconnected data.
Unstructured Data
• Definition: Unstructured data is not organized in a pre-defined
manner or does not have a pre-defined data model. It is difficult to
process and analyze.
• Examples: Word documents, PDFs, emails, audio files, etc.
• Advantages: Can capture nuanced information. More flexible as it
does not require a predefined schema.
• Disadvantages: Difficult to analyze and process. Requires more
storage space.
Semi-Structured Data
• Definition: Semi-structured data does not reside in a rigid schema
like a relational table, but contains tags or markers that separate
its elements, falling between structured and unstructured data.
• Examples: XML files, JSON files, etc.
• Advantages: More flexible than structured data, while still being
easier to analyze than unstructured data.
• Disadvantages: Can be more complex to work with and manage
compared to structured data.
• XML: eXtensible Markup Language
<person>
  <name>John Doe</name>
  <email>[email protected]</email>
  <age>30</age>
</person>
JSON: JavaScript Object Notation

{
  "person": {
    "name": "John Doe",
    "email": "[email protected]",
    "age": 30
  }
}
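Both formats can be read directly with Python's standard library. The sketch below parses a record like the slide's `<person>` example with `xml.etree.ElementTree` and `json`; the email address is an invented placeholder, since the slide's address is redacted.

```python
import json
import xml.etree.ElementTree as ET

xml_text = """
<person>
  <name>John Doe</name>
  <email>jdoe@example.com</email>
  <age>30</age>
</person>
"""

json_text = '{"person": {"name": "John Doe", "email": "jdoe@example.com", "age": 30}}'

# XML: each child element becomes a (tag, text) pair; values stay strings.
root = ET.fromstring(xml_text)
person_from_xml = {child.tag: child.text for child in root}

# JSON: the nested object maps directly onto Python dicts, keeping types.
person_from_json = json.loads(json_text)["person"]

print(person_from_xml["age"])   # "30" — a string in XML
print(person_from_json["age"])  # 30 — an integer in JSON
```

Note the difference in the `age` field: XML carries everything as text, while JSON preserves the numeric type.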
Data Sources
• Explanation:
• Data sources are the locations, files, databases, or services where
data comes from.
• Understanding data sources is important as the quality and reliability
of the data can greatly impact the results of data analysis.
Databases
• Explanation: Databases are structured sets of data. They are a
common source of data for analytics.
• Discussion: There are different types of databases,
• such as SQL (relational databases) and
• NoSQL (non-relational databases like MongoDB).
• Examples: Customer information in a SQL database, product
information in a NoSQL database.
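A minimal sketch of querying a relational (SQL) database from Python, using the standard-library `sqlite3` module with a throwaway in-memory database; the `customers` table and its rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Alice", "Pune"), ("Bob", "Mumbai"), ("Carol", "Pune")],
)

# A typical analytics query: count customers per city.
rows = conn.execute(
    "SELECT city, COUNT(*) FROM customers GROUP BY city ORDER BY city"
).fetchall()
print(rows)  # [('Mumbai', 1), ('Pune', 2)]
conn.close()
```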
Web Data
• Explanation: Web data refers to data that is obtained from the
internet. This can include data scraped from websites, data from
social media platforms, etc.
• Discussion: Different types of web data include text data, user
behaviour data, transactional data, etc.
• Examples: Tweets scraped from Twitter for sentiment analysis,
product reviews scraped from e-commerce websites.
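As a self-contained illustration of extracting web data, the sketch below pulls review text out of an HTML fragment with the standard-library `html.parser`; the HTML snippet and the `review` class name are invented, and a real scraper would first download the page (e.g. with `urllib`) and must respect the site's terms of use.

```python
from html.parser import HTMLParser

html_page = """
<div class="review">Great product!</div>
<div class="ad">Buy now</div>
<div class="review">Battery life is poor.</div>
"""

class ReviewExtractor(HTMLParser):
    def __init__(self):
        super().__init__()
        self.in_review = False
        self.reviews = []

    def handle_starttag(self, tag, attrs):
        # Flag <div class="review"> elements so their text is captured.
        if tag == "div" and ("class", "review") in attrs:
            self.in_review = True

    def handle_data(self, data):
        if self.in_review and data.strip():
            self.reviews.append(data.strip())

    def handle_endtag(self, tag):
        if tag == "div":
            self.in_review = False

parser = ReviewExtractor()
parser.feed(html_page)
print(parser.reviews)  # ['Great product!', 'Battery life is poor.']
```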
Sensor Data
• Explanation: Sensor data is data that is collected by sensors, which
can be anything from temperature sensors to motion sensors.
• Discussion: Different types of sensor data include time series data,
spatial data, etc.
• This data is often used in IoT (Internet of Things) applications.
• Examples: Temperature data from a weather station, accelerometer
data from a smartphone
Data Collection Types
• Primary data collection involves gathering new data directly from the
source,
• while secondary data collection involves using data that already
exists, such as data from existing databases or data collected by
others.
Data Collection Methods
• Explanation: Data collection methods refer to how we obtain data.
• Common methods include surveys, where we ask people for
information;
• experiments, where we observe outcomes under controlled
conditions;
• and observations, where we collect data about real-world behavior.
Data Preprocessing
• Definition: Data preprocessing is the process of cleaning and
transforming raw data into an understandable format.
• It’s a crucial step before data analysis or data modeling.
• Overview:
• Preprocessing involves data cleaning (removing noise and
inconsistencies),
• data transformation (normalizing data),
• and data integration (combining data from various sources).
Data Cleaning
• Definition: Data cleaning involves handling missing values, removing
duplicates, and treating outliers.
• It ensures the quality of the data and improves the accuracy of the
insights derived from it.
• Discussion: Techniques include imputation for handling missing
values, deduplication for removing duplicate data, and outlier
detection methods for identifying and handling anomalies in the data.
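The three techniques just listed can be sketched with pandas on a toy table; the data, the median-based imputation, and the simple spend threshold used as an outlier rule are all invented for illustration — real pipelines choose imputation and outlier methods to suit the data.

```python
import pandas as pd

df = pd.DataFrame({
    "customer": ["A", "B", "B", "C", "D"],
    "spend":    [100.0, 250.0, 250.0, None, 5000.0],
})

# 1. Deduplication: drop exact duplicate rows (the repeated "B" row).
df = df.drop_duplicates()

# 2. Imputation: fill the missing spend with the median,
#    which is robust to the 5000.0 outlier.
df["spend"] = df["spend"].fillna(df["spend"].median())

# 3. Outlier handling: here, a simple absolute threshold
#    (IQR fences or z-scores are common alternatives).
clean = df[df["spend"] <= 1000]
print(clean)
```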
Data Transformation
• Definition: Data transformation involves changing the format,
structure, or values of data to prepare it for analysis.
• It can involve
• normalization (scaling data to a small, specified range),
• standardization (shifting the distribution of each attribute to have a
mean of zero and a standard deviation of one),
• binning (converting numerical variables into categorical
counterparts).
• Discussion: These techniques help in reducing the complexity of data
and making data compatible for analysis.
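Of the techniques above, binning is the simplest to illustrate directly; the sketch below converts a numerical age column into categorical groups, with bin edges and labels invented for illustration.

```python
def age_bin(age):
    # Map a numerical age onto an (invented) categorical band.
    if age < 18:
        return "minor"
    elif age < 65:
        return "adult"
    return "senior"

ages = [12, 25, 40, 70]
print([age_bin(a) for a in ages])  # ['minor', 'adult', 'adult', 'senior']
```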
Normalization

• Normalization involves scaling data to fit within a small, specified
range, typically between 0 and 1. This is useful when you want to
ensure that all features contribute equally to the analysis. The
formula for min-max normalization is:

• x' = (x - min) / (max - min)

• Example: [ 10, 20, 30, 40, 50 ] → [ 0, 0.25, 0.5, 0.75, 1 ]
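A minimal Python version of the min-max formula, reproducing the example above:

```python
def min_max_normalize(values):
    # x' = (x - min) / (max - min), scaling every value into [0, 1].
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

print(min_max_normalize([10, 20, 30, 40, 50]))  # [0.0, 0.25, 0.5, 0.75, 1.0]
```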


Standardization
• Standardization transforms data to have a mean of zero and a
standard deviation of one. This is useful when you want to compare
data that have different units or scales. The formula for
standardization (the z-score) is:

• z = (x - μ) / σ, where μ is the mean and σ is the standard deviation

• Example: [ 10, 20, 30, 40, 50 ] → [ -1.41, -0.71, 0, 0.71, 1.41 ]
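A minimal Python version of the z-score formula, using the population standard deviation (which is what the example above uses):

```python
import math

def standardize(values):
    # z = (x - mean) / std, with the population standard deviation.
    mean = sum(values) / len(values)
    std = math.sqrt(sum((v - mean) ** 2 for v in values) / len(values))
    return [(v - mean) / std for v in values]

print([round(z, 2) for z in standardize([10, 20, 30, 40, 50])])
# [-1.41, -0.71, 0.0, 0.71, 1.41]
```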


Data Integration
• Definition: Data integration involves combining data from different
sources and providing users with a unified view of the data.
• Discussion: This process becomes significant in a variety of situations,
which include both
• commercial (when two similar companies need to merge their
databases) and
• scientific (combining research findings from different bioinformatics
repositories, for example) applications.
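As a sketch of the unified-view idea, the example below joins two toy tables with a pandas merge; the table names (a CRM list and billing totals, both keyed on `customer_id`) and their contents are invented for illustration.

```python
import pandas as pd

crm = pd.DataFrame({"customer_id": [1, 2, 3],
                    "name": ["Alice", "Bob", "Carol"]})
billing = pd.DataFrame({"customer_id": [1, 2],
                        "total_spend": [120.0, 80.0]})

# Left join keeps every CRM customer, even those without billing records
# (Carol gets NaN for total_spend in the unified view).
unified = crm.merge(billing, on="customer_id", how="left")
print(unified)
```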
Data Analytics Tools
• Data analytics tools are software applications used to process and
analyze data. They help data analysts manage and interpret data from
various sources
• We will be discussing the features and use cases of popular data
analytics tools like R, Python, and SAS.
SAS
• Introduction to SAS:
SAS (Statistical Analysis System) is a software suite developed by SAS
Institute for advanced analytics, business intelligence, data
management, and predictive analytics.
• Key features and use cases of SAS in data analytics:
SAS provides a graphical point-and-click user interface for non-technical
users and more advanced options through the SAS language.
It is widely used in the corporate world.
R
• Introduction to R:
• R is a programming language and free software environment for
statistical computing and graphics.
It is widely used among statisticians and data miners for developing
statistical software and data analysis.
• Key features and use cases of R in data analytics:
R provides a wide variety of statistical and graphical techniques and is
highly extensible.
It is used in fields like healthcare, finance, academia, etc.
Python
• Python is a high-level, interpreted programming language. It is known
for its simplicity and readability, making it a popular choice for
beginners and experts in data analytics.
• Python has powerful libraries for data manipulation and analysis like
pandas, NumPy, and SciPy.
• It is used in various domains like web development, machine learning,
AI, and more.
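As a quick taste of those libraries, the snippet below builds a small invented sales table with pandas and computes an aggregate per region; pandas runs on top of NumPy arrays under the hood.

```python
import pandas as pd

sales = pd.DataFrame({
    "region":  ["East", "West", "East", "West"],
    "revenue": [100, 200, 150, 60],
})

# Average revenue per region via a groupby aggregation.
avg = sales.groupby("region")["revenue"].mean()
print(avg)  # East: 125.0, West: 130.0
```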
Data Analytics Technologies
• Data analytics technologies refer to the frameworks and systems used
to process and analyze large datasets. They are designed to handle
big data and are essential for advanced analytics.
• Discussion on various technologies such as Hadoop, Spark, etc.: We
will be discussing the features and use cases of popular data analytics
technologies like Hadoop and Spark.
Hadoop
• Hadoop is an open-source software framework for storing data and
running applications on clusters of commodity hardware.
• It provides massive storage for any kind of data, enormous processing
power, and the ability to handle virtually limitless concurrent tasks or
jobs.
• Key features and use cases of Hadoop in data analytics: Hadoop is
known for its scalability, cost-effectiveness, flexibility, and fault
tolerance.
• It is used in various industries like finance, healthcare, media, etc.
Spark
• Introduction to Spark: Spark is an open-source, distributed computing
system used for big data processing and analytics.
• It provides an interface for programming entire clusters with implicit
data parallelism and fault tolerance.
• Key features and use cases of Spark in data analytics: Spark is known
for its speed, ease of use, and versatility.
• It can be used for various tasks like batch processing, real-time data
streaming, machine learning, etc.
