Types of Data

The document provides an overview of data science and data analytics, highlighting their goals, tools, and outputs. It also discusses various applications of data analytics in engineering, types of data, errors in data analysis, and confidence intervals. Key distinctions between data science and data analytics are outlined, along with examples and explanations of concepts such as continuous and discrete data, dendrograms, and types of errors.

Uploaded by

naveed
Copyright
© All Rights Reserved

Q1: Data Science and Data Analytics?

Data Science:

Data science involves drawing conclusions from data by applying methods such as statistical
modeling, machine learning, and big data tools. It covers every stage of the lifecycle, from gathering data to
implementing predictive models.

Examples:

• Fraud Detection: Banks use machine learning (e.g., Random Forests) to detect suspicious
transactions.

• Recommendation Systems: Netflix uses collaborative filtering to suggest movies based on user
behavior.

• Natural Language Processing (NLP): Analyzing social media sentiment to gauge brand perception.

Data Analytics:

The main goal of data analytics is to analyze past data to find answers to particular queries.
This process frequently uses descriptive statistics and visualization tools.

Examples:

• Sales Dashboard: Visualizing monthly revenue trends using Tableau.

• Customer Segmentation: Grouping users into "High-Value" or "Low-Value" categories using SQL.

• A/B Testing: Comparing website conversion rates for two UI designs.[1]


Q2: Difference between Data Science & Data Analytics?

Aspect   Data Science[2]                   Data Analytics[1]
Goal     Predictive insights, automation   Descriptive/diagnostic insights
Tools    Python, TensorFlow, Hadoop        SQL, Excel, Tableau
Output   ML models, AI systems             Reports, dashboards

Fig.1 Skill Level Difference[3]


Q3: Application of Data Analytics in Engineering?

Data analytics plays a crucial role in modern engineering by enabling engineers to make data-driven decisions,
optimize processes, and predict outcomes. Here are some key applications:

1. Predictive Maintenance:

In industries like manufacturing and aerospace, data analytics is used to predict when machines or
equipment are likely to fail, allowing for timely maintenance and reducing downtime.

Example:

Monitoring vibrations or temperature data from machines to predict wear and tear before a failure occurs.

2. Quality Control:

In manufacturing, data analytics is used to monitor product quality in real time. Engineers can detect
defects, identify trends, and ensure products meet specifications by analyzing production data.

Example:

Analyzing sensor data during the production process to identify defects in materials or processes early.

3. Energy Efficiency:

Data analytics helps optimize energy consumption in buildings, factories, and infrastructure. Engineers
analyze energy usage patterns to suggest ways to reduce energy costs and improve sustainability.

Example:

Analyzing energy consumption in a building to optimize heating, ventilation, and air conditioning
(HVAC) systems.

4. Structural Health Monitoring:

In civil engineering, data analytics is applied to monitor the health of infrastructure, such as bridges or
buildings. By analyzing data from sensors embedded in structures, engineers can detect potential risks and
extend the life of the infrastructure.

Example:

Using strain gauge data to detect cracks or structural weaknesses in bridges.

5. Design Optimization:

Data analytics allows engineers to optimize designs based on performance data. This ensures that materials
are used efficiently, and designs are as cost-effective and functional as possible.

Example:

Analyzing data from wind tunnels or simulations to refine the design of an aircraft or vehicle.[4]
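
The predictive maintenance example in item 1 can be illustrated with a minimal sketch. It assumes a simple rule that is not taken from the source: raise an alert when the rolling mean of recent vibration readings exceeds a fixed threshold. The function name `first_alert`, the window size, the threshold, and the readings are all hypothetical; real systems typically use richer models.

```python
from collections import deque

def first_alert(readings, window=3, threshold=5.0):
    """Return the index at which the rolling mean first exceeds threshold, or None."""
    recent = deque(maxlen=window)  # keeps only the last `window` readings
    for i, value in enumerate(readings):
        recent.append(value)
        # Only evaluate once a full window of readings is available.
        if len(recent) == window and sum(recent) / window > threshold:
            return i
    return None

# Made-up vibration velocities (mm/s) that drift upward as a bearing wears.
vibration_mm_s = [2.1, 2.3, 2.2, 3.0, 4.8, 5.5, 6.1, 6.4]
print(first_alert(vibration_mm_s))  # index of the first reading that triggers an alert
```

Averaging over a window rather than reacting to single readings is one way to avoid alerts caused by isolated noisy measurements.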

Q4: Data and types?

Data:

Data are pieces of information that can be numbers, facts, or symbols, used to describe things. They
can be discrete or continuous and help in understanding and decision-making.

Example:

• Prices and costs.

• Numbers of items sold.

• Employee names.

• Product names.

Types of data:

There are many different types of data:


1. Continuous Data

2. Discrete Data

3. Categorical Data

4. Qualitative Data[5]

Q5: Continuous Data and Discrete Data?

Continuous Data:

Continuous data is numerical data that can take any value within a range. Unlike discrete data, there are
infinitely many possible measurements between any two given points.

Example:

• The weather temperature

• The wind speeds

Discrete Data:

Discrete data takes only distinct, countable values, typically integers. An individual value cannot be
meaningfully divided into smaller components.

Example:

The number of students who have attended the class.[6]
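
As a rough illustration of the distinction, the sketch below labels a list of numbers as discrete when every value is a whole number and continuous otherwise. This is only a heuristic assumed for the example (a temperature of exactly 21.0 would be mislabeled); the function name `classify` and the sample values are hypothetical.

```python
def classify(values):
    """Label values 'discrete' if all are whole numbers, else 'continuous' (heuristic)."""
    return "discrete" if all(float(v).is_integer() for v in values) else "continuous"

attendance = [28, 31, 30, 29]          # counts of students: whole numbers only
temperature_c = [21.4, 22.85, 19.07]   # measurements: any value in a range

print(classify(attendance))      # discrete
print(classify(temperature_c))   # continuous
```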

Q6: Categorical Data & Qualitative Data?

Categorical Data:

Categorical data consists of variables that sort observations into groups or labels, describing
characteristics such as a person’s gender or hometown.

Example:
• Birthdate

• Favorite sport[7]

Qualitative Data:

Qualitative data is descriptive information that focuses on concepts and characteristics, rather than
numbers and statistics. The data cannot be counted, measured or expressed numerically.

Example:

• Research and observation.

• Interviews.[8]

Q7: Dendrograms?

A dendrogram is a tree-like diagram used in hierarchical clustering to represent the arrangement of data points
or clusters. It visually shows how different items or groups are related, based on their similarity or distance.

Key Features:

• Nodes: Represent data points or clusters.

• Branches: Indicate the similarity between clusters. Shorter branches mean higher similarity.

• Height: The height at which clusters merge shows the dissimilarity between them. Lower height means
greater similarity.

Applications:

• Bioinformatics: Used to visualize evolutionary relationships between species (e.g., phylogenetic trees).

• Psychology: Helps in grouping psychological traits or behaviors based on similarity.

• Marketing: Used for customer segmentation, grouping customers based on similar purchasing
behaviors.
• Machine Learning: Hierarchical clustering applications in classifying and grouping data.

Purpose:

• Dendrograms visually represent how objects are grouped hierarchically based on their similarities or
distances.[9]

Fig.2 Dendrograms[9]
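
The merge heights that a dendrogram draws can be computed with a small sketch of agglomerative clustering. It assumes single linkage (distance between the closest members of two clusters) on one-dimensional points, chosen here only for simplicity; the function name `single_linkage` and the data are hypothetical.

```python
def single_linkage(points):
    """Cluster 1-D points bottom-up; return merges as (cluster_a, cluster_b, height)."""
    clusters = [(p,) for p in points]  # start with every point as its own cluster
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Single linkage: distance between the closest pair of members.
                d = min(abs(a - b) for a in clusters[i] for b in clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return merges

merges = single_linkage([1.0, 1.5, 5.0, 5.2, 9.0])
for a, b, height in merges:
    print(f"merge {a} + {b} at height {height:.1f}")
```

Each recorded height is the level at which two branches would join in the diagram, so the earliest (lowest) merges correspond to the most similar items.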

Q8: Types of Error?

In measurement, errors are typically categorized into two main types: Systematic Errors and Random Errors.
Statistical hypothesis testing adds a separate distinction between Type I and Type II errors.

1. Systematic Errors:

Errors that consistently occur in the same direction due to identifiable causes, leading to measurements
that are consistently higher or lower than the true value.

Characteristics:
• Predictable and reproducible.
• Often arise from faulty equipment, calibration issues, or environmental factors.
• Affect the accuracy of measurements.

Example:

A scale that consistently reads 2 kilograms heavier than the actual weight due to miscalibration.

2. Random Errors:

Errors that occur unpredictably and vary in magnitude and direction, often due to uncontrollable factors.

Characteristics:

• Unpredictable and vary with each measurement.


• Arise from factors like slight fluctuations in environmental conditions or limitations in measurement
precision.
• Affect the precision of measurements.

Example:

Slight variations in temperature affecting the reading of a thermometer.[10]
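
A short simulation can make the accuracy/precision distinction concrete. The sketch assumes a scale with a fixed +2 kg offset (as in the example above) plus Gaussian noise; the true weight, the noise level, and the seed are made-up values for illustration.

```python
import random
import statistics

random.seed(0)        # fixed seed so the run is repeatable
TRUE_WEIGHT = 70.0    # kg, hypothetical true value
BIAS = 2.0            # kg, systematic error from miscalibration
NOISE_SD = 0.5        # kg, assumed size of the random error

readings = [TRUE_WEIGHT + BIAS + random.gauss(0, NOISE_SD) for _ in range(10_000)]

# Averaging many readings cancels random error but not systematic error.
mean_error = statistics.mean(readings) - TRUE_WEIGHT   # reflects accuracy
spread = statistics.stdev(readings)                    # reflects precision

print(f"mean error = {mean_error:.2f} kg, spread = {spread:.2f} kg")
```

The mean error stays close to the bias no matter how many readings are taken, while the spread reflects the random scatter around that biased value.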

3. Type I and Type II Errors (in Statistical Hypothesis Testing):

Type I Error (False Positive):

• Rejecting a true null hypothesis.


• Example: Concluding that a new drug is effective when it is not.

Type II Error (False Negative):

• Failing to reject a false null hypothesis.


• Example: Concluding that a new drug is not effective when it actually is.[11]
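
The two definitions can be summarized as a small decision table. The sketch below is illustrative only; the function name `classify_decision` is hypothetical, and in practice the truth of the null hypothesis is unknown, which is why these error rates are controlled rather than observed directly.

```python
def classify_decision(null_is_true: bool, null_rejected: bool) -> str:
    """Name the outcome of a hypothesis test given the (normally unknown) truth."""
    if null_is_true and null_rejected:
        return "Type I error (false positive)"
    if not null_is_true and not null_rejected:
        return "Type II error (false negative)"
    return "correct decision"

# Drug example: the null hypothesis is "the drug has no effect".
print(classify_decision(null_is_true=True, null_rejected=True))    # drug declared effective when it is not
print(classify_decision(null_is_true=False, null_rejected=False))  # real effect missed
```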
Q9: Difference between Truncation and Round-off Error?

Truncation Error:

Occurs when exact models are replaced with approximations (e.g., finite-difference derivatives instead of
analytical derivatives).

Round-off Error:

Results from the inability of computers to represent real numbers exactly (e.g., limited mantissa bits in
floating-point numbers).

Aspect      Truncation Error                                  Round-off Error

Definition  Caused by approximating exact mathematical        Arises from finite precision in representing
            procedures.                                       numbers.

Source      Using approximations (e.g., finite differences,   Limitations of digital computers (e.g.,
            Taylor series truncation).                        floating-point representation).

Example     Approximating $\frac{dv}{dt}$ with                Storing $\pi$ as 3.141 instead of 3.14159265.
            $\frac{v(t+\Delta t)-v(t)}{\Delta t}$.

Dependency  Increases with larger step sizes ($h$).           Decreases with larger step sizes (fewer
                                                              computations).

Mitigation  Use higher-order approximations (e.g., more       Increase precision (e.g., double-precision
            Taylor terms).                                    arithmetic).[12]
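
The opposite dependence on step size can be observed with a forward-difference derivative. The sketch below is an illustration under assumed choices (the function sin, the point x = 1, and the listed step sizes are not from the source): a large step gives a large truncation error, while a very small step lets round-off dominate.

```python
import math

def forward_difference(f, x, h):
    """Approximate f'(x) with the finite difference (f(x + h) - f(x)) / h."""
    return (f(x + h) - f(x)) / h

def derivative_error(h):
    """Absolute error of the approximation for f = sin at x = 1 (exact value: cos(1))."""
    return abs(forward_difference(math.sin, 1.0, h) - math.cos(1.0))

# Large h: truncation error dominates. Tiny h: round-off error dominates.
for h in (1e-1, 1e-5, 1e-8, 1e-15):
    print(f"h = {h:.0e}  error = {derivative_error(h):.2e}")
```

The total error is smallest at an intermediate step size, which is why shrinking the step indefinitely does not keep improving the result in floating-point arithmetic.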

Q10: Confidence Intervals?

A confidence interval (CI) is a range of values used to estimate a population parameter, such as the population
mean, based on sample data. It helps assess the degree of uncertainty around the point estimate (e.g., sample
mean) and indicates the precision of the estimate.

Key Components:

Point Estimate:

A single value derived from sample data (like the sample mean or proportion) that estimates the population
parameter.

Margin of Error:
The range above and below the point estimate, reflecting the uncertainty in the estimate. It's calculated using
the sample's standard error and a critical value (e.g., Z-score for a specific confidence level).

Confidence Level:

The long-run proportion of intervals, built by the same method from repeated samples, that would contain
the true population parameter. Common levels are 90%, 95%, and 99%.

Confidence Interval Formula:

For a population mean where the population standard deviation (σ) is known, the confidence interval is
calculated as:

$$CI = \bar{x} \pm z \times \frac{\sigma}{\sqrt{n}}$$

Where:
• $\bar{x}$ is the sample mean,
• $z$ is the Z-score corresponding to the desired confidence level (e.g., 1.96 for 95%),
• $\sigma$ is the population standard deviation,
• $n$ is the sample size.

Interpretation:

A 95% confidence interval means that if you were to repeatedly sample from the population and compute a
confidence interval from each sample, about 95% of those intervals would contain the true population
parameter. It’s important to understand that any single interval either contains the true parameter or it
doesn’t; the 95% refers to the reliability of the method.

Example:

If a sample of 100 individuals has a mean height of 170 cm and a population standard deviation of 10 cm, the
95% confidence interval for the true population mean height would be:

$$170 \pm 1.96 \times \frac{10}{\sqrt{100}} = 170 \pm 1.96 \times 1 = (168.04, 171.96)$$

This means we are 95% confident that the true population mean height is between 168.04 cm and 171.96
cm.
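
The same calculation can be reproduced in a few lines. This is a minimal sketch assuming a known population standard deviation; the function name `z_confidence_interval` is hypothetical, and the Z-score is obtained from the standard library's `statistics.NormalDist` rather than hard-coded as 1.96.

```python
import math
from statistics import NormalDist

def z_confidence_interval(sample_mean, sigma, n, confidence=0.95):
    """Return (low, high) for the mean when the population sigma is known."""
    # Two-sided interval: put (1 - confidence) / 2 in each tail.
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    margin = z * sigma / math.sqrt(n)   # margin of error = z * standard error
    return sample_mean - margin, sample_mean + margin

low, high = z_confidence_interval(sample_mean=170, sigma=10, n=100)
print(f"({low:.2f}, {high:.2f})")  # (168.04, 171.96)
```

Raising the confidence level widens the interval, and increasing the sample size narrows it, because the margin scales with z and with 1/√n.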
Applications of Confidence Intervals:

• Estimating Population Parameters: CIs are widely used in statistics to estimate parameters like means and
proportions, providing a range of plausible values for the true parameter.

• Hypothesis Testing: CIs are used in hypothesis testing to determine if a sample statistic is significantly
different from a hypothesized value.

• Quality Control: In manufacturing, CI is used to assess if production processes are within acceptable limits.

Common Misunderstandings:

• A CI does not mean there is a 95% chance that the true parameter lies within the interval. It means that if the
sampling process were repeated many times, 95% of those intervals would contain the true parameter.

• A 95% CI doesn’t imply 95% of data points fall within it.[13]

Fig.3 CI[14]
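
The repeated-sampling interpretation can be checked with a small simulation. The sketch below assumes a normal population with made-up parameters (mean 170, standard deviation 10), a sample size of 25, and a fixed seed; it counts how often the 95% interval built from each sample contains the true mean.

```python
import math
import random
from statistics import NormalDist, mean

random.seed(1)                                   # fixed seed for repeatability
MU, SIGMA, N, TRIALS = 170.0, 10.0, 25, 4000     # hypothetical population and setup

z = NormalDist().inv_cdf(0.975)                  # Z-score for a 95% two-sided interval
margin = z * SIGMA / math.sqrt(N)

hits = 0
for _ in range(TRIALS):
    sample_mean = mean(random.gauss(MU, SIGMA) for _ in range(N))
    # Each trial's interval either contains MU or it does not.
    if sample_mean - margin <= MU <= sample_mean + margin:
        hits += 1

coverage = hits / TRIALS
print(f"coverage = {coverage:.3f}")
```

The fraction of intervals that contain the true mean should come out close to 0.95, which is the sense in which the confidence level describes the method rather than any single interval.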
References:

[1] F. Provost and T. Fawcett, “Introduction: Data-Analytic Thinking,” Data Science for Business: What
You Need to Know About Data Mining and Data-Analytic Thinking, pp. 1–18, 2013. Accessed: Feb.
19, 2025. [Online]. Available:
https://www.researchgate.net/publication/256438799_Data_Science_for_Business
[2] “Data Analyst, Data Scientist, Data Engineer: Who’s the Real MVP of Data-Driven Decision
Making? | Rocket Recruiting Blog.” Accessed: Feb. 19, 2025. [Online]. Available:
https://www.getrocket.com/post/data-roles
[3] “Data Analyst Vs. Data Scientist: The Comparison To Distinguish The Best.” Accessed: Feb. 19,
2025. [Online]. Available:
https://www.digitalvidya.com/blog/data-science-career-data-analyst-vs-data-scientist/
[4] “Data Analysis in Engineering: Examples, Importance & Uses.” Accessed: Feb. 19, 2025. [Online].
Available:
https://www.studysmarter.co.uk/explanations/engineering/professional-engineering/data-analysis-in-engineering/
[5] T. H. Vines et al., “The availability of research data declines rapidly with article age,” Current
Biology, vol. 24, no. 1, pp. 94–97, Jan. 2014, doi: 10.1016/j.cub.2013.11.014.
[6] “Discrete vs. Continuous Data: What Is The Difference? | Whatagraph.” Accessed: Feb. 19, 2025.
[Online]. Available: https://whatagraph.com/blog/articles/discrete-vs-continuous-data
[7] “Categorical Data & Qualitative Data (Definition and Types).” Accessed: Feb. 19, 2025. [Online].
Available: https://byjus.com/maths/categorical-data/
[8] “What is Qualitative Data? | Definition from TechTarget.” Accessed: Feb. 19, 2025. [Online].
Available: https://www.techtarget.com/searchcio/definition/qualitative-data
[9] “Dendrogram - Wikipedia.” Accessed: Feb. 19, 2025. [Online]. Available:
https://en.wikipedia.org/wiki/Dendrogram
[10] “Errors in Measurement: Measurement, Gross Errors, Systematic Errors, Random Errors and FAQs.”
Accessed: Feb. 19, 2025. [Online]. Available:
https://byjus.com/physics/accuracy-precision-error-measurement/
[11] “Type I and type II errors - Wikipedia.” Accessed: Feb. 19, 2025. [Online]. Available:
https://en.wikipedia.org/wiki/Type_I_and_type_II_errors
[12] D. University, “Part 1 Chapter 4 Roundoff and Truncation Errors.”
[13] “Understanding Confidence Intervals | Easy Examples & Formulas.” Accessed: Feb. 19, 2025.
[Online]. Available: https://www.scribbr.com/statistics/confidence-interval/
[14] “Confidence Intervals and Z Score - Programmatically.” Accessed: Feb. 19, 2025. [Online].
Available: https://programmathically.com/confidence-intervals-and-z-score/
