FDS-Unit II-ECE

Unit II covers various aspects of data, including types (structured vs. unstructured, quantitative vs. qualitative), collection methods (primary vs. secondary), and challenges in data quality and ethics. It also details data analysis and analytics techniques, such as descriptive, diagnostic, predictive, prescriptive, and mechanistic analysis, along with tools and technologies used in these processes. The document emphasizes the importance of data preprocessing, cleaning, transformation, and visualization in deriving meaningful insights from data.

Unit II

Data: Data Types, Data Collection, Data Pre-Processing, Data Analysis and Analytics, Descriptive Analytics, Diagnostic Analytics, Predictive and Prescriptive Analytics, Exploratory Analysis, Mechanistic Analysis.
Data Types

1. Structured vs. Unstructured Data

● Structured Data – Organized in a tabular format (e.g., relational databases, Excel sheets).
● Unstructured Data – Does not follow a fixed format (e.g., text, images, videos, emails).

2. Quantitative vs. Qualitative Data

● Quantitative Data (Numerical) – Data that can be measured or counted.
○ Discrete Data – Whole numbers, countable (e.g., number of students, product count).
○ Continuous Data – Measured and can take any value within a range (e.g., temperature, height).
● Qualitative Data (Categorical) – Data that describes qualities or characteristics.
○ Nominal Data – No natural order (e.g., gender, colors, names).
○ Ordinal Data – Has a meaningful order (e.g., survey ratings, education levels).
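These categories map directly onto how columns are typed in analysis libraries. Below is a minimal pandas sketch, with made-up column names and values, showing one way to represent each type:

```python
# A minimal sketch (toy data) of how the four data types can be
# represented in pandas.
import pandas as pd

df = pd.DataFrame({
    "num_students": [30, 25, 40],            # quantitative, discrete
    "height_cm": [162.5, 171.0, 158.3],      # quantitative, continuous
    "colour": ["red", "blue", "red"],        # qualitative, nominal
    "rating": ["low", "high", "medium"],     # qualitative, ordinal
})

# Nominal: unordered categories.
df["colour"] = df["colour"].astype("category")

# Ordinal: categories with a meaningful order.
df["rating"] = pd.Categorical(df["rating"],
                              categories=["low", "medium", "high"],
                              ordered=True)
print(df.dtypes)
```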

3. Primary vs. Secondary Data

● Primary Data – Collected firsthand for a specific purpose (e.g., surveys, experiments).
● Secondary Data – Pre-existing data collected by others (e.g., research papers, reports).
Data Collection

1. Types of Data Collection Methods
A. Primary Data Collection (First-Hand Data)

Collected directly from sources for a specific purpose.

🔹 Surveys & Questionnaires – Structured forms to gather responses from individuals.
🔹 Interviews – One-on-one discussions for in-depth insights.
🔹 Observations – Monitoring behaviors or events in real-time.
🔹 Experiments – Controlled studies to test hypotheses.
🔹 Sensors & IoT Devices – Real-time data from physical devices.
🔹 Web Scraping – Automated extraction of data from websites (see the sketch below).
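As an illustration of the web-scraping method, here is a minimal sketch using requests and BeautifulSoup; the URL and the choice of <h2> tags are placeholders to adapt to the actual target site (and always check its robots.txt and terms of use):

```python
# A minimal web-scraping sketch with requests + BeautifulSoup.
import requests
from bs4 import BeautifulSoup

response = requests.get("https://example.com")   # placeholder URL
response.raise_for_status()                      # fail loudly on HTTP errors

soup = BeautifulSoup(response.text, "html.parser")
for heading in soup.find_all("h2"):              # assumed tag of interest
    print(heading.get_text(strip=True))
```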

B. Secondary Data Collection (Pre-Existing Data)

Data gathered from existing sources for analysis.

🔹 Databases & Repositories – Government, corporate, or public datasets (e.g., Kaggle, UCI).
🔹 APIs (Application Programming Interfaces) – Real-time access to online data (e.g., weather, social media); see the sketch after this list.
🔹 Open Data Portals – Publicly available datasets (e.g., World Bank, WHO).
🔹 Research Papers & Reports – Academic and industry studies.
🔹 Logs & Transactions – System-generated records (e.g., server logs, purchase histories).
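To illustrate the API method above, a minimal sketch of pulling JSON from a public REST endpoint; Open-Meteo is used here as one example of a free weather API, and the exact parameters and response keys may differ by provider:

```python
# A minimal sketch of collecting secondary data from a public REST API.
# Any JSON API follows the same requests.get(...).json() pattern.
import requests

url = "https://api.open-meteo.com/v1/forecast"
params = {"latitude": 52.52, "longitude": 13.41, "current_weather": True}

response = requests.get(url, params=params, timeout=10)
response.raise_for_status()          # fail loudly on HTTP errors

data = response.json()               # parsed JSON -> Python dict
print(data.get("current_weather"))   # .get avoids KeyError if the schema differs
```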
2. Data Collection Challenges
✔ Data Quality – Ensuring accuracy, consistency, and completeness.
✔ Ethical Considerations – Respecting privacy and security (e.g., GDPR, HIPAA).
✔ Volume & Scalability – Handling large datasets efficiently.

3. Tools & Technologies for Data Collection

● SQL & Databases – MySQL, PostgreSQL, MongoDB
● Web Scraping – BeautifulSoup, Scrapy
● APIs – REST APIs, JSON, Postman
● Data Pipelines – Apache Kafka, Airflow
● Survey Tools – Google Forms, Qualtrics
Data Pre-Processing

1. Steps in Data Preprocessing

A. Data Cleaning (Handling Missing & Noisy Data)

🔹 Handling Missing Values

● Remove missing data (if the impact is minimal).
● Fill missing values using the mean, median, or mode.
● Use predictive models (e.g., regression, k-NN imputation).
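A minimal sketch of these three strategies using pandas and scikit-learn on a toy frame (column names and values are made up):

```python
# Toy illustration of the three missing-value strategies above.
import numpy as np
import pandas as pd
from sklearn.impute import KNNImputer

df = pd.DataFrame({"age": [25, np.nan, 32, 28, np.nan],
                   "score": [80.0, 75.0, np.nan, 90.0, 85.0]})

# 1. Remove rows with missing values (when the loss is minimal).
dropped = df.dropna()

# 2. Fill with a summary statistic (mean here; median/mode work the same way).
filled = df.fillna(df.mean(numeric_only=True))

# 3. Model-based imputation: k-NN estimates each gap from similar rows.
imputed = pd.DataFrame(KNNImputer(n_neighbors=2).fit_transform(df),
                       columns=df.columns)
print(imputed)
```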

🔹 Handling Noisy Data

● Remove duplicate or irrelevant data.
● Apply smoothing techniques (e.g., moving average, binning).
● Use outlier detection methods (e.g., Z-score).
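A minimal sketch of these noise-handling steps on a toy series (the |z| > 2 cutoff is chosen for this tiny sample; 3 is a common default):

```python
# Toy illustration of duplicate removal, smoothing, and outlier detection.
import pandas as pd

s = pd.Series([10, 11, 10, 12, 98, 11, 10, 13])     # 98 is an obvious outlier

deduped = s.drop_duplicates()                        # drop repeated values

smoothed = s.rolling(window=3, center=True).mean()   # moving-average smoothing

z = (s - s.mean()) / s.std()                         # Z-score per observation
print(s[z.abs() > 2])                                # flag extreme values
```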
B. Data Transformation (Standardization & Encoding)

🔹 Scaling & Normalization

● Standardization (Z-score Normalization): Transforms data to have a mean of 0 and a standard deviation of 1, using z = (x - μ) / σ, where μ is the mean and σ is the standard deviation.

🔹 Encoding

● Converts categorical values into a numeric form that models can use (e.g., label encoding, one-hot encoding).
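A minimal standardization sketch; scikit-learn's StandardScaler applies z = (x - μ) / σ column-wise (toy values):

```python
# Standardize a toy feature column to mean 0, standard deviation 1.
import numpy as np
from sklearn.preprocessing import StandardScaler

X = np.array([[160.0], [170.0], [180.0], [175.0]])   # e.g. heights in cm

X_scaled = StandardScaler().fit_transform(X)
print(X_scaled.mean(), X_scaled.std())               # ~0.0 and ~1.0
```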
Data Analysis
Definition:
Data Analysis is the process of cleaning, processing, and interpreting raw data to extract meaningful
insights, patterns, and trends. It is often used for understanding past data and making data-driven
decisions.

Key Characteristics:

● Descriptive & Diagnostic – Focuses on "What happened?" and "Why did it happen?"
● Exploratory Approach – Identifies patterns, relationships, and anomalies in data.
● Uses Statistical Methods – Mean, median, standard deviation, correlation, etc.
● Visualization Techniques – Charts, graphs, histograms, heatmaps.

Example Use Case:

A retail company analyzes sales data to determine which products performed well last quarter.
Data Analytics
Definition:
Data Analytics goes a step further than analysis by applying advanced techniques (such as predictive
modeling, machine learning, and business intelligence) to find actionable insights and make future
predictions.

Key Characteristics:

● Predictive & Prescriptive – Focuses on "What will happen?" and "How can we make it happen?"
● Data-Driven Decision Making – Uses algorithms and AI to extract deeper insights.
● Machine Learning & AI Techniques – Regression, clustering, deep learning.
● Business Intelligence (BI) Applications – Dashboards, KPI tracking, forecasting models.

Example Use Case:

A bank uses data analytics to predict which customers are at risk of loan default based on their transaction history and credit score.
Descriptive Analytics

Descriptive analytics is like a rear-view mirror, giving us a clear picture of past data to understand what has happened. It is the simplest form of data analysis and is usually the first step in data processing.

It involves:

1. Summarizing Data: Collecting historical data and presenting it in a readable and understandable format, using measures like mean, median, mode, etc.
2. Data Visualization: Use of charts, graphs, histograms, and other visual aids to make data easily digestible.
3. Identifying Trends: Helps in identifying patterns and trends over time, such as sales trends, customer behavior, and operational performance.
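A minimal descriptive-analytics sketch in pandas covering summarizing, visualization, and trend-spotting (the sales figures are made up):

```python
# Summarize and plot a toy sales series.
import pandas as pd
import matplotlib.pyplot as plt

sales = pd.DataFrame({"month": ["Jan", "Feb", "Mar", "Apr"],
                      "revenue": [120, 135, 128, 150]})

print(sales["revenue"].describe())    # count, mean, std, min, quartiles, max

sales.plot(x="month", y="revenue", kind="line", title="Sales trend")
plt.show()
```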

Tools commonly used in descriptive analytics:

● Excel: Popular for its pivot tables, charts, and graphs.
● Tableau: Great for interactive data visualization.
● Power BI: Microsoft's tool for data visualization and business intelligence.
● SQL: Used to query and manage data in databases.
Diagnostic Analytics

Diagnostic analytics is like playing detective with your data. It goes a step further than descriptive analytics by not only looking at what happened but also trying to understand why it happened. This is crucial for uncovering root causes and making better decisions moving forward.

Here's what it involves:

1. Identifying Anomalies: Look for outliers or unusual patterns that deviate from the norm.
2. Drill-Down Analysis: Dig deeper into data subsets to get more detailed insights and identify the underlying factors.
3. Correlation Analysis: Assess the relationships between different variables to understand how they influence one another.
4. Hypothesis Testing: Form and test hypotheses to determine the causes of specific outcomes.
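A minimal diagnostic sketch combining correlation analysis with a drill-down, on made-up marketing data:

```python
# Why did sales move? Check correlations, then drill down by discount level.
import pandas as pd

df = pd.DataFrame({"ad_spend": [10, 15, 12, 20, 18],
                   "discount": [5, 5, 10, 15, 10],
                   "sales": [100, 130, 125, 190, 160]})

print(df.corr())                                # pairwise correlations

print(df.groupby("discount")["sales"].mean())   # drill-down by subgroup
```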

Tools commonly used in diagnostic analytics:

● R and Python: These programming languages are popular for their powerful statistical and data analysis packages.
● SQL: Helpful for querying detailed data sets and performing complex joins.
● Tableau and Power BI: These tools allow you to drill down into visual data representations and explore the factors behind the trends.
Prescriptive Analytics

Prescriptive analytics is a type of data analytics that goes beyond descriptive and diagnostic analytics. While descriptive analytics tells you what happened and diagnostic analytics explains why it happened, prescriptive analytics recommends actions you can take to achieve desired outcomes. It leverages advanced techniques such as optimization algorithms, machine learning, and simulation to provide actionable insights and guidance on the best course of action.
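As one concrete instance of the optimization techniques mentioned above, a minimal linear-programming sketch with scipy.optimize.linprog; all profits and resource limits are made-up numbers:

```python
# Recommend production quantities of two products that maximize profit
# subject to machine- and labour-hour limits. linprog minimizes, so the
# profit coefficients are negated.
from scipy.optimize import linprog

profit = [-40, -30]                  # profit per unit of products A and B
A_ub = [[2, 1],                      # machine-hours per unit
        [1, 2]]                      # labour-hours per unit
b_ub = [100, 80]                     # hours available

result = linprog(c=profit, A_ub=A_ub, b_ub=b_ub, bounds=[(0, None)] * 2)
print(result.x)                      # recommended quantities of A and B
```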
Predictive Analytics

Predictive analytics involves using historical data, machine learning algorithms, and statistical techniques to predict future outcomes. It aims to forecast trends, behaviors, and events by analyzing patterns found in existing data. This technique is widely applied across various industries, including finance, marketing, healthcare, and more.

Here are some key aspects of predictive analytics:

● Data Collection: Gathering relevant historical data from various sources.
● Data Cleaning: Ensuring the data is accurate and free of inconsistencies or errors.
● Data Analysis: Applying statistical methods and machine learning algorithms to identify patterns and relationships in the data.
● Model Building: Creating predictive models that can forecast future events based on the analyzed data.
● Validation: Testing the models to ensure their accuracy and reliability.
● Deployment: Implementing the models into real-world scenarios to make predictions and informed decisions.
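A minimal end-to-end predictive sketch on toy, synthetic data, covering the model-building and validation steps above:

```python
# Train and validate a toy default-risk classifier.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))                  # e.g. credit score, balance
y = (X[:, 0] + X[:, 1] > 0).astype(int)        # toy "default" label

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0)

model = LogisticRegression().fit(X_train, y_train)      # model building
print(accuracy_score(y_test, model.predict(X_test)))    # validation
```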
Steps in Exploratory Data Analysis

1. Data Collection: Gather raw data from various sources.
2. Data Cleaning: Handle missing values, duplicates, and incorrect data entries.
3. Descriptive Statistics: Calculate basic statistics like mean, median, mode, variance, and standard deviation.
4. Data Visualization: Create plots and graphs (e.g., histograms, scatter plots, box plots) to visualize distributions and relationships.
5. Identifying Patterns: Look for trends, correlations, and outliers that might indicate underlying relationships.
6. Hypothesis Testing: Formulate and test hypotheses about the data.
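A minimal EDA sketch tying steps 3 to 5 together on synthetic data:

```python
# Summary statistics, correlation, and a scatter plot on toy data.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt

rng = np.random.default_rng(1)
df = pd.DataFrame({"x": rng.normal(50, 10, 100)})
df["y"] = 2 * df["x"] + rng.normal(0, 5, 100)

print(df.describe())              # step 3: descriptive statistics
print(df.corr())                  # step 5: relationships and patterns

df.plot.scatter(x="x", y="y")     # step 4: visualization
plt.show()
```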
Mechanistic Analysis

Mechanistic analysis is a method used to understand the underlying mechanisms or processes that drive a system's behavior. It involves breaking down complex systems into their individual components and studying how each part interacts to produce the observed outcomes. This type of analysis is commonly used in fields like biology, chemistry, physics, and engineering.
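To make the idea concrete, a minimal mechanistic sketch: rather than fitting patterns to data, we encode the mechanism itself. Here, first-order decay dC/dt = -kC (e.g., drug elimination) is integrated with simple Euler steps; the rate constant and initial amount are made up:

```python
# Simulate first-order decay dC/dt = -k*C with explicit Euler steps.
k, C0 = 0.3, 100.0          # rate constant and initial amount (made up)
dt, steps = 0.1, 100        # step size and number of steps (10 time units)

C = C0
for _ in range(steps):
    C += -k * C * dt        # the mechanism drives each update

print(f"Amount remaining after {dt * steps:.0f} time units: {C:.2f}")
```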
Data Analytics vs. Data Analysis

● Scope – Data Analytics: broad, includes various techniques. Data Analysis: narrower, focuses on specific techniques.
● Goal – Data Analytics: to provide actionable insights for decision-making. Data Analysis: to uncover patterns and trends within the data.
● Tools – Data Analytics: advanced tools and software. Data Analysis: statistical tools and basic software.
● Application – Data Analytics: business intelligence, marketing, finance, etc. Data Analysis: research, academic studies, operations.
