
1. Why is Data Science important?

• Data Science helps extract meaningful insights from massive amounts of structured and
unstructured data.

• It aids in decision-making, predictive analytics, and automation.

• Applications include healthcare (diagnostics), business (customer behavior analysis), finance (fraud detection), and more.

2. Differentiate nominal and ordinal data.

Nominal Data:

• Represents categories without any order.

• Example: Colors (Red, Blue, Green).

• Cannot perform arithmetic operations.

Ordinal Data:

• Represents categories with a defined order.

• Example: Education levels (High School, Bachelor's, Master's).

• Can rank values but not quantify differences.

3. How to handle missing values in a dataset?

• Delete Rows/Columns: If the missing data is minimal and non-critical.

• Imputation:

o Replace with mean, median, or mode.

o Use regression or KNN to predict missing values.

• Flag Missing Data: Create a separate category for missing values.

• Use advanced techniques like machine learning models to estimate missing values.
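The imputation step can be sketched in plain Python (the list and its values are made up for illustration; `None` marks a missing entry):

```python
from statistics import mean, median

# Toy column with missing values; numbers are illustrative.
ages = [25, 30, None, 50, None, 35]

observed = [x for x in ages if x is not None]

# Mean imputation: replace each gap with the mean of observed values.
mean_filled = [x if x is not None else mean(observed) for x in ages]

# Median imputation: more robust when the column contains outliers.
median_filled = [x if x is not None else median(observed) for x in ages]

print(mean_filled)    # gaps filled with 35.0
print(median_filled)  # gaps filled with 32.5
```

In practice a library such as pandas would handle this at DataFrame scale, but the logic is the same.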

4. Differentiate the operations of descriptive and inferential statistics.

Descriptive Statistics:

• Summarizes and describes data.

• Example: Mean, Median, Mode.

• Does not go beyond the data analyzed.

Inferential Statistics:

• Makes predictions or inferences about a population.

• Example: Hypothesis testing, confidence intervals.

• Uses sample data to draw conclusions about a larger group.
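The descriptive side is easy to demonstrate with Python's standard library on a made-up sample:

```python
from statistics import mean, median, mode

# A small made-up sample; descriptive statistics summarize it directly.
data = [2, 3, 3, 5, 7, 9, 3]

print(mean(data))    # ≈ 4.57
print(median(data))  # 3
print(mode(data))    # 3
```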

5. Logistic regression is not preferred for prediction. Justify.

• Logistic regression assumes a linear relationship between independent variables and the log
odds, which may not hold true in complex datasets.
• It cannot capture intricate patterns or non-linear relationships as well as machine learning
models like Random Forests or Neural Networks.

• It may struggle with overfitting or underfitting, depending on the dataset.

6. State Bayes' Theorem.

Bayes' Theorem provides a way to calculate the probability of an event based on prior knowledge:

P(A|B) = P(B|A) · P(A) / P(B)

Where:

• P(A|B): Probability of event A given B.

• P(B|A): Probability of event B given A.

• P(A): Prior probability of A.

• P(B): Prior probability of B.
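A worked example with hypothetical numbers (a diagnostic test) shows how the pieces combine:

```python
# Hypothetical numbers for a diagnostic test.
p_disease = 0.01            # P(A): prior probability of disease
p_pos_given_disease = 0.95  # P(B|A): test sensitivity
p_pos_given_healthy = 0.05  # false positive rate

# P(B): total probability of a positive test (law of total probability).
p_pos = p_pos_given_disease * p_disease + p_pos_given_healthy * (1 - p_disease)

# Bayes' Theorem: P(A|B) = P(B|A) * P(A) / P(B)
p_disease_given_pos = p_pos_given_disease * p_disease / p_pos
print(round(p_disease_given_pos, 3))  # 0.161
```

Even with a sensitive test, the low prior keeps the posterior probability modest.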

7. How is the maximum marginal hyperplane determined in SVM?

• The maximum marginal hyperplane (MMH) is the decision boundary that maximizes the
margin (distance) between classes.

• SVM optimizes the MMH by solving a convex optimization problem using techniques like
Lagrange multipliers.

• Kernel functions (e.g., linear, polynomial, RBF) can be used to handle non-linear data.

8. Compare the working of decision trees and random forests.

Decision Tree:

• Single tree structure used for decision-making.

• Prone to overfitting.

• Faster to train and interpret.

Random Forest:

• Collection of multiple decision trees (ensemble).

• Reduces overfitting by averaging multiple trees.

• Slower but more accurate.

9. Mention any four different types of plots used for data visualization.

1. Bar Plot: Compare categorical data.

2. Scatter Plot: Visualize relationships between two variables.

3. Histogram: Show frequency distribution of a dataset.

4. Box Plot: Summarize data distribution and detect outliers.


10. List any four common data science applications used in real time.

1. Healthcare: Disease prediction and diagnostics.

2. Finance: Fraud detection and credit scoring.

3. Retail: Recommendation systems and inventory management.

4. Transportation: Route optimization and autonomous vehicles.

11. What is outlier analysis?

Outlier analysis involves identifying data points that deviate significantly from the majority of the
dataset.

• Causes: Data entry errors, fraud, or true variability.

• Techniques: Z-score, IQR (Interquartile Range), or machine learning-based anomaly detection.
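The Z-score and IQR techniques can be sketched with the standard library (the data points are made up, with one planted outlier):

```python
from statistics import mean, stdev, quantiles

data = [10, 12, 11, 13, 12, 95]  # 95 is an obvious outlier

# Z-score method: flag points more than 2 standard deviations from the mean.
m, s = mean(data), stdev(data)
z_outliers = [x for x in data if abs(x - m) / s > 2]

# IQR method: flag points outside [Q1 - 1.5*IQR, Q3 + 1.5*IQR].
q1, _, q3 = quantiles(data, n=4)
iqr = q3 - q1
iqr_outliers = [x for x in data if x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr]

print(z_outliers, iqr_outliers)  # both methods flag 95
```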

12. Define data wrangling.

Data wrangling is the process of cleaning, transforming, and preparing raw data into a usable format.
Steps include:

• Removing duplicates.

• Handling missing values.

• Converting data types.

• Normalizing or scaling data for analysis.
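The steps above can be sketched end to end on a few made-up records (field names and values are illustrative; a real pipeline would typically use pandas):

```python
# Raw records with a duplicate, a missing value, and ages stored as strings.
raw = [
    {"name": "Ana", "age": "34", "city": "Lima"},
    {"name": "Ana", "age": "34", "city": "Lima"},   # duplicate row
    {"name": "Bo",  "age": None, "city": "Oslo"},   # missing age
    {"name": "Cy",  "age": "28", "city": "Kyiv"},
]

# 1. Remove exact duplicates (key on sorted items, since dicts aren't hashable).
seen, rows = set(), []
for r in raw:
    key = tuple(sorted(r.items()))
    if key not in seen:
        seen.add(key)
        rows.append(r)

# 2. Convert data types: age from string to int.
for r in rows:
    r["age"] = int(r["age"]) if r["age"] is not None else None

# 3. Handle missing values: fill missing ages with the mean of the rest.
ages = [r["age"] for r in rows if r["age"] is not None]
fill = sum(ages) / len(ages)
for r in rows:
    if r["age"] is None:
        r["age"] = fill

# 4. Normalize ages to the [0, 1] range (min-max scaling).
lo, hi = min(r["age"] for r in rows), max(r["age"] for r in rows)
for r in rows:
    r["age_scaled"] = (r["age"] - lo) / (hi - lo)

print(len(rows))  # 3 rows remain after deduplication
```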


1. Differentiate between structured and unstructured data, providing examples of each.


Structured Data:

• Organized and stored in tabular formats (e.g., databases).

• Example: Customer information table with columns (Name, Age, Email).

• Easy to search and analyze.

Unstructured Data:

• Does not have a pre-defined format.

• Example: Images, videos, audio files, emails.

• Requires more preprocessing to analyze.

2. List the relationship between computer science, data science, and traditional sciences.

• Computer Science: Provides computational tools, algorithms, and infrastructure for data
analysis (e.g., programming, databases).

• Data Science: Combines computer science and statistics to extract insights from data.

• Traditional Sciences: Use data science for experimentation, modeling, and simulations (e.g.,
biology for genomics, physics for particle analysis).

3. State statistical inference and its significance.

Statistical inference involves drawing conclusions about a population based on a sample of data.
Significance:

• Helps make predictions or test hypotheses.

• Provides tools to generalize findings from a limited dataset to a broader context.
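A minimal sketch of inference on a made-up sample: estimate the population mean with an approximate 95% confidence interval.

```python
from statistics import mean, stdev
from math import sqrt

# Made-up sample of 10 measurements; we infer the population mean from it.
sample = [4.8, 5.1, 5.0, 4.9, 5.2, 5.0, 4.7, 5.3, 5.1, 4.9]

n = len(sample)
m = mean(sample)
se = stdev(sample) / sqrt(n)  # standard error of the mean

# Approximate 95% confidence interval (z ≈ 1.96; a t critical value
# would be more appropriate for a sample this small).
lo, hi = m - 1.96 * se, m + 1.96 * se
print(f"mean = {m:.2f}, 95% CI ≈ ({lo:.2f}, {hi:.2f})")
```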

4. Define correlation matrix.

A correlation matrix is a table showing correlation coefficients between variables.

• Each cell indicates the strength and direction of the relationship between two variables.

• Values range from -1 (strong negative) to +1 (strong positive), with 0 indicating no correlation.
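A correlation matrix can be built by hand from the Pearson formula (the three variables below are toy data chosen to show a positive and a negative relationship):

```python
from statistics import mean
from math import sqrt

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Toy variables (illustrative data).
cols = {
    "hours":  [1, 2, 3, 4, 5],
    "score":  [52, 55, 61, 68, 74],  # rises with hours -> strong positive
    "errors": [9, 8, 6, 4, 2],       # falls with hours -> strong negative
}

matrix = {a: {b: round(pearson(va, vb), 2) for b, vb in cols.items()}
          for a, va in cols.items()}
print(matrix["hours"]["hours"])   # 1.0 on the diagonal
print(matrix["hours"]["score"])   # close to +1
print(matrix["hours"]["errors"])  # close to -1
```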

5. Compare random forests and decision trees as ensemble learning methods.

Decision Tree:

• Uses a single tree to make decisions.

• Prone to overfitting.

• Simpler and faster.

Random Forest:

• Combines multiple decision trees for better accuracy.

• Reduces overfitting by averaging predictions.

• More complex and computationally intensive.


6. Tell about the Naive Bayes algorithm and its utilization in text classification.

• Naive Bayes is a probabilistic algorithm based on Bayes' Theorem.

• Assumes independence between features, making it computationally efficient.

• Utilization in text classification:

o Spam detection (classifying emails as spam or not spam).

o Sentiment analysis (positive or negative reviews).
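A pure-Python sketch of Naive Bayes for spam detection, assuming a tiny made-up training corpus and Laplace smoothing for unseen words:

```python
from collections import Counter
from math import log

# Tiny made-up corpus; labels are "spam" / "ham".
train = [
    ("win money now", "spam"),
    ("claim free prize money", "spam"),
    ("meeting schedule tomorrow", "ham"),
    ("lunch tomorrow at noon", "ham"),
]

# Per-class word counts and class priors.
word_counts = {"spam": Counter(), "ham": Counter()}
class_counts = Counter()
for text, label in train:
    class_counts[label] += 1
    word_counts[label].update(text.split())

vocab = {w for counts in word_counts.values() for w in counts}

def predict(text):
    """Pick the class with the highest log-posterior (Laplace smoothing)."""
    best, best_score = None, float("-inf")
    for label in word_counts:
        # log prior + sum of log likelihoods (naive independence assumption)
        score = log(class_counts[label] / sum(class_counts.values()))
        total = sum(word_counts[label].values())
        for w in text.split():
            score += log((word_counts[label][w] + 1) / (total + len(vocab)))
        if score > best_score:
            best, best_score = label, score
    return best

print(predict("free money"))      # spam
print(predict("schedule lunch"))  # ham
```

Working in log space avoids numeric underflow when documents contain many words.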

7. Define Data Visualization and its importance in data science.

Definition: Data visualization is the graphical representation of information and data using charts,
graphs, and maps.
Importance:

• Simplifies complex data for easier interpretation.

• Helps identify trends, patterns, and outliers.

• Enhances decision-making and storytelling.

8. List the different types of data visualization techniques.

1. Line charts.

2. Bar charts.

3. Pie charts.

4. Scatter plots.

5. Heatmaps.

6. Histograms.

7. Box plots.

8. Tree maps.

9. Name a common application of data science in image classification.

• Facial recognition: Identifying individuals in photos or videos.

• Object detection: Detecting and classifying objects in images (e.g., self-driving cars).

10. Tell two well-known examples of Recommender Systems.

1. Netflix: Recommends movies and TV shows based on viewing history.

2. Amazon: Suggests products based on user behavior and purchase history.


11. Define clustering and its types.

Definition: Clustering is an unsupervised learning technique that groups similar data points into
clusters.
Types:

1. K-Means Clustering: Partitions data into k clusters.

2. Hierarchical Clustering: Creates a tree of clusters.

3. DBSCAN (Density-Based): Groups points based on density.

4. Mean-Shift Clustering: Finds clusters based on data density peaks.
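The K-Means idea can be sketched in one dimension with k = 2 (points and initial centroids are made up; a real run would use random initialization and a convergence check):

```python
from statistics import mean

# Minimal 1-D k-means sketch (k = 2) on made-up points.
points = [1.0, 1.5, 2.0, 10.0, 10.5, 11.0]
centroids = [1.0, 10.0]  # simple initial guesses

for _ in range(10):  # a few iterations suffice on this toy data
    # Assignment step: each point joins its nearest centroid's cluster.
    clusters = [[], []]
    for p in points:
        i = min(range(2), key=lambda i: abs(p - centroids[i]))
        clusters[i].append(p)
    # Update step: move each centroid to its cluster's mean.
    centroids = [mean(c) for c in clusters]

print(centroids)  # [1.5, 10.5]
```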

12. Compare and contrast supervised and unsupervised learning.

Supervised Learning:

• Labeled data is used for training.

• Goal: Predict output or classify data.

• Examples: Regression, classification.

Unsupervised Learning:

• Uses unlabeled data.

• Goal: Discover hidden patterns or groupings.

• Examples: Clustering, dimensionality reduction.
