Data Science Viva Notes

The document provides a comprehensive overview of key concepts in data science, including Exploratory Data Analysis (EDA), data cleaning, statistical analysis, and various modeling techniques. It covers essential topics such as regression analysis, clustering, time series analysis, and machine learning fundamentals. Each concept is defined succinctly, making it a useful reference for understanding data science methodologies.

Uploaded by

shrutikurade0

Q: What is Exploratory Data Analysis (EDA)?

A: EDA is the process of exploring and visualizing data to understand its structure, patterns, and relationships before applying any model.

Q: Summary statistics means?

A: Summary statistics are basic values that describe a dataset, such as the mean, median, mode, minimum, maximum, and standard deviation.
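
The values listed above can be computed with Python's standard `statistics` module. This is a minimal sketch; the sample numbers are made up for illustration.

```python
import statistics

# Hypothetical sample data, chosen only for illustration.
values = [2, 4, 4, 5, 7, 9, 11]

summary = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "mode": statistics.mode(values),
    "min": min(values),
    "max": max(values),
    "std_dev": statistics.stdev(values),  # sample standard deviation (divides by n - 1)
}

for name, value in summary.items():
    print(f"{name}: {value}")
```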

Q: Histogram displays what?

A: A histogram shows the frequency distribution of a numeric variable.

Q: Box plot?

A: A box plot shows the spread of data using median, quartiles, and outliers.

Q: How to conclude after seeing boxplot?

A: You can see whether the data is symmetric or skewed, how spread out it is, and whether there are outliers.

Q: Whiskers means?

A: Whiskers in a boxplot extend from the box to the smallest and largest values that lie within 1.5 x IQR of the first and third quartiles. Points beyond the whiskers are plotted as outliers.
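
As a rough sketch of the 1.5 IQR rule, the whisker ends and outliers can be worked out by hand. The data is hypothetical, and the quartile method used here ("inclusive") is only one of several conventions, so plotting tools may give slightly different quartiles.

```python
import statistics

# Hypothetical sample with one extreme value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]

# Quartiles; "inclusive" is one common convention, other tools may differ slightly.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points that are still inside the fences.
inside = [x for x in data if lower_fence <= x <= upper_fence]
lower_whisker, upper_whisker = min(inside), max(inside)
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(lower_whisker, upper_whisker, outliers)
```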

Q: Scatter plot?

A: A scatter plot shows the relationship between two numeric variables.

Q: Data cleaning?

A: Data cleaning means fixing or removing wrong, incomplete, or inconsistent data.

Q: Handling inconsistencies means?

A: It means correcting values that are wrongly formatted or mismatched in the dataset.

Q: How to apply imputation?

A: Imputation is filling missing values using mean, median, mode, or predictive models.
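
A minimal sketch of median imputation in plain Python. The column values are hypothetical, and missing entries are assumed to be stored as `None`.

```python
import statistics

# Hypothetical column; missing entries are represented as None.
ages = [25, None, 31, 40, None, 28]

# Compute the fill value from the observed (non-missing) entries only.
observed = [a for a in ages if a is not None]
fill_value = statistics.median(observed)  # statistics.mean or statistics.mode also work here

imputed = [fill_value if a is None else a for a in ages]
print(imputed)
```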

Q: How to remove duplicates?

A: Use tools or code (like `.drop_duplicates()` in Python) to delete repeated rows.
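
Without any library, the same idea can be sketched in plain Python. The rows below are hypothetical, and each row is assumed to be a tuple so that it can be used as a dictionary key.

```python
# Hypothetical rows as (name, city) tuples; the third row repeats the first.
rows = [("Asha", "Pune"), ("Ravi", "Mumbai"), ("Asha", "Pune"), ("Meera", "Delhi")]

# A dict keeps insertion order, so the first occurrence of each row is preserved.
unique_rows = list(dict.fromkeys(rows))
print(unique_rows)
```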

Q: Data transformation and feature engineering means?

A: Data transformation changes the data format or scale. Feature engineering creates new useful features for the model.

Q: Normalization means?

A: Scaling all numeric data to a common range (like 0 to 1) so that features with large values do not dominate features with small values.
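
One common way to do this is min-max scaling, x' = (x - min) / (max - min). A small sketch with hypothetical income values; the function name `min_max_scale` is made up for this example.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest becomes 0 and the largest becomes 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical; avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

incomes = [20000, 35000, 50000, 80000]  # hypothetical data
scaled = min_max_scale(incomes)
print(scaled)
```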

Q: Data transformation: converting categorical variables?

A: Convert them into numbers using encoding like One-Hot Encoding or Label Encoding.
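
Both encodings can be sketched in a few lines of plain Python. The color values are hypothetical, and the categories are sorted alphabetically only to make the numbering predictable.

```python
colors = ["red", "green", "blue", "green"]  # hypothetical categorical column

categories = sorted(set(colors))  # ['blue', 'green', 'red']

# Label Encoding: each category is mapped to one integer.
label_map = {category: index for index, category in enumerate(categories)}
label_encoded = [label_map[c] for c in colors]

# One-Hot Encoding: one 0/1 column per category, with a single 1 in each row.
one_hot = [[1 if c == category else 0 for category in categories] for c in colors]

print(label_encoded)
print(one_hot)
```

Label Encoding implies an order between categories (blue < green < red here), so One-Hot Encoding is usually the safer choice when no such order exists.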

Q: Binning?

A: Binning means converting continuous data into fixed intervals or categories.
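
A short sketch of binning ages into groups. The bin edges and labels are hypothetical choices, not a standard.

```python
import bisect

# Hypothetical bin edges and labels: <18, 18-34, 35-59, 60 and above.
edges = [18, 35, 60]
labels = ["child", "young adult", "adult", "senior"]

def to_bin(age):
    # bisect_right counts how many edges are <= age, which is the bin index.
    return labels[bisect.bisect_right(edges, age)]

ages = [5, 18, 34, 35, 59, 72]
binned = [to_bin(a) for a in ages]
print(binned)
```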

Q: Polynomial feature creation?

A: Creating new features by raising existing numeric features to powers (like x², x³).
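
For a single numeric feature, this can be sketched as follows. The helper name `polynomial_features` is made up for this example.

```python
def polynomial_features(x, degree):
    """Return [x, x^2, ..., x^degree] for one numeric value."""
    return [x ** power for power in range(1, degree + 1)]

# Each input value becomes a row of three features: x, x squared, x cubed.
rows = [polynomial_features(x, 3) for x in [1, 2, 3]]
print(rows)
```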

Q: Statistical analysis?

A: It involves using mathematical techniques to summarize, understand, and draw conclusions from data.

Q: Hypothesis testing means?

A: It tests if a statement about a population is likely true using sample data.

Q: Regression analysis?

A: A technique to study relationships between variables and predict one based on others.

Q: Linear regression model means?

A: A model that predicts an output using a straight-line relationship with input(s).
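
For one input, the straight line y = intercept + slope * x can be fitted with the least-squares formulas. This is a minimal sketch on hypothetical data (hours studied vs. exam score); the function name `fit_line` is made up for this example.

```python
import statistics

hours = [1, 2, 3, 4, 5]          # hypothetical input values
scores = [52, 54, 56, 58, 60]    # hypothetical output values

def fit_line(x, y):
    """Ordinary least squares for a single input variable."""
    x_mean, y_mean = statistics.fmean(x), statistics.fmean(y)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return slope, intercept

slope, intercept = fit_line(hours, scores)
predicted_score = intercept + slope * 6  # prediction for 6 hours
print(slope, intercept, predicted_score)
```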

Q: T-test means?

A: A test to compare the means of two groups to see if they are significantly different.
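
The test statistic itself is simple to compute. The sketch below uses Welch's form (it does not assume equal variances), which is an assumption made here; the groups are hypothetical. Turning t into a p-value needs the t distribution, which a statistics library would normally provide.

```python
import math
import statistics

group_a = [5, 6, 7, 8, 9]  # hypothetical measurements
group_b = [1, 2, 3, 4, 5]

mean_a, mean_b = statistics.fmean(group_a), statistics.fmean(group_b)
var_a, var_b = statistics.variance(group_a), statistics.variance(group_b)  # sample variances

# Welch's t statistic: difference in means divided by its standard error.
standard_error = math.sqrt(var_a / len(group_a) + var_b / len(group_b))
t_statistic = (mean_a - mean_b) / standard_error
print(t_statistic)
```

A larger absolute value of t means the difference between the group means is large compared with the variation inside the groups.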

Q: Chi-square test?

A: A test to check the association between two categorical variables.
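
The chi-square statistic compares observed counts with the counts expected if the two variables were independent. A sketch on a hypothetical 2x2 table, without any continuity correction:

```python
# Hypothetical contingency table: rows are one category, columns the other.
observed = [[30, 10],
            [20, 40]]

row_totals = [sum(row) for row in observed]
col_totals = [sum(col) for col in zip(*observed)]
total = sum(row_totals)

chi2 = 0.0
for i, row in enumerate(observed):
    for j, obs in enumerate(row):
        # Expected count under independence: row total * column total / grand total.
        expected = row_totals[i] * col_totals[j] / total
        chi2 += (obs - expected) ** 2 / expected

degrees_of_freedom = (len(observed) - 1) * (len(observed[0]) - 1)
print(chi2, degrees_of_freedom)
```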

Q: P-value means?

A: It is the probability of getting a result at least as extreme as the one observed, assuming the null hypothesis is true. A small p-value (commonly < 0.05) means the result is statistically significant.

Q: Logistic regression?

A: A model used for classification problems (like yes/no) by predicting probability.
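
The probability comes from passing a linear score through the sigmoid function. In this sketch the intercept and weight are hypothetical values, not fitted ones, and the 0.5 cut-off is a common default rather than a rule.

```python
import math

def sigmoid(z):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Hypothetical coefficients; a real model would learn these from data.
intercept, weight = -4.0, 1.0

def predict_probability(x):
    return sigmoid(intercept + weight * x)

def predict_class(x, threshold=0.5):
    return 1 if predict_probability(x) >= threshold else 0

print(predict_probability(4.0), predict_class(6.0), predict_class(2.0))
```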

Q: Accuracy means?

A: Accuracy is the percentage of correct predictions made by a model.

Q: Accuracy, Precision, and Recall?

A: Accuracy: fraction of all predictions that are correct. Precision: fraction of predicted positives that are actually positive. Recall: fraction of actual positives that are correctly predicted.
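
All three metrics follow from the counts of true/false positives and negatives. A small sketch with hypothetical labels (1 = positive, 0 = negative):

```python
actual    = [1, 0, 1, 1, 0, 0, 1, 0]  # hypothetical true labels
predicted = [1, 0, 0, 1, 0, 1, 1, 1]  # hypothetical model output

pairs = list(zip(actual, predicted))
tp = sum(1 for a, p in pairs if a == 1 and p == 1)  # true positives
tn = sum(1 for a, p in pairs if a == 0 and p == 0)  # true negatives
fp = sum(1 for a, p in pairs if a == 0 and p == 1)  # false positives
fn = sum(1 for a, p in pairs if a == 1 and p == 0)  # false negatives

accuracy = (tp + tn) / len(pairs)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(accuracy, precision, recall)
```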

Q: ROC AUC curve?

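A: A graph that shows model performance by plotting the true positive rate against the false positive rate at different thresholds. AUC is the area under this curve: a score near 1 is best, and 0.5 is no better than random guessing.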
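
AUC can also be read as the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one. The sketch below computes it that way by comparing every positive-negative pair; the labels and scores are hypothetical.

```python
labels = [1, 1, 0, 1, 0, 0]               # hypothetical true labels
scores = [0.9, 0.8, 0.7, 0.6, 0.3, 0.2]   # hypothetical predicted probabilities

positives = [s for s, l in zip(scores, labels) if l == 1]
negatives = [s for s, l in zip(scores, labels) if l == 0]

# Count pairs where the positive outranks the negative; ties count as half.
wins = sum(
    1.0 if p > n else 0.5 if p == n else 0.0
    for p in positives
    for n in negatives
)
auc = wins / (len(positives) * len(negatives))
print(auc)
```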

Q: Clustering means?

A: Grouping similar data points together based on features.

Q: Segmentation?

A: Dividing data into meaningful groups (like customer segments).

Q: K-means clustering?

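A: A method that divides data into 'k' clusters by assigning each point to the nearest cluster center (centroid) and then updating each centroid to the mean of its points, repeating until the clusters stop changing.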
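
A minimal one-dimensional sketch of the assign-then-update loop. The points and starting centroids are hypothetical, and a fixed number of iterations is used instead of a convergence check to keep the code short.

```python
def k_means_1d(points, centroids, iterations=10):
    """Very small k-means for 1-D data with given starting centroids."""
    for _ in range(iterations):
        # Assignment step: put each point in the cluster of its nearest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            clusters[nearest].append(p)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            sum(c) / len(c) if c else centroids[i]  # keep the old centroid if a cluster is empty
            for i, c in enumerate(clusters)
        ]
    return centroids, clusters

points = [1, 2, 3, 10, 11, 12]  # hypothetical data with two obvious groups
centroids, clusters = k_means_1d(points, centroids=[1, 12])
print(centroids, clusters)
```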

Q: Difference between clustering and segmentation?

A: Clustering is the technique, segmentation is the result or goal.

Q: Churn prediction model?

A: A model that predicts which customers are likely to leave (churn).

Q: Time series analysis?

A: Analyzing data collected over time to find patterns and trends.

Q: Trend?

A: Long-term movement in data (upward or downward).

Q: Seasonality?

A: Repeating patterns at regular intervals (like monthly or yearly).

Q: Noise components?

A: Random or irregular variations in data that cannot be explained.

Q: Outliers mean?

A: Unusual values far from most of the data.

Q: ARIMA?

A: A forecasting model that uses past values and past forecast errors. It stands for AutoRegressive Integrated Moving Average, where "Integrated" refers to differencing the series to make it stationary.

Q: Forecasting means?

A: Predicting future values based on past data.

Q: Exponential smoothing?

A: A method to forecast data by giving more weight to recent observations.
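
Simple exponential smoothing updates a running level as: new level = alpha * latest value + (1 - alpha) * previous level. A sketch with hypothetical sales figures; alpha = 0.5 is an arbitrary choice.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing; alpha near 1 reacts faster to recent values."""
    smoothed = [series[0]]  # start the level at the first observation
    for value in series[1:]:
        smoothed.append(alpha * value + (1 - alpha) * smoothed[-1])
    return smoothed

sales = [10, 12, 14, 13]  # hypothetical observations
smoothed = exponential_smoothing(sales, alpha=0.5)

# The last smoothed level serves as the forecast for the next period.
forecast = smoothed[-1]
print(smoothed, forecast)
```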

Q: Anomalies?

A: Unusual or unexpected data points that don't fit the pattern.

Q: Z-score?

A: A value that shows how far a data point is from the mean, in standard deviations.
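
The formula is z = (x - mean) / standard deviation. The sketch below uses hypothetical data and flags points with |z| > 2; that cut-off is an assumption for this example (2 or 3 are both commonly used).

```python
import statistics

data = [10, 12, 11, 13, 12, 40]  # hypothetical values with one extreme point

mean = statistics.fmean(data)
std = statistics.pstdev(data)  # population standard deviation

z_scores = [(x - mean) / std for x in data]

# Flag points more than 2 standard deviations from the mean (assumed cut-off).
flagged = [x for x, z in zip(data, z_scores) if abs(z) > 2]
print(flagged)
```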

Q: Isolation Forest model?

A: A model used to detect anomalies by isolating them from the rest of the data.

Q: Profiling means?

A: Creating a summary of data to understand its structure, quality, and patterns.

Q: Correlation matrix?

A: A table showing how variables relate to each other (with values between -1 and 1).

Q: Correlation coefficient?

A: A number that shows the strength and direction of the relationship between two variables.
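
The usual choice is the Pearson coefficient: the covariance of the two variables divided by the product of their standard deviations. A sketch with hypothetical data showing a perfect positive and a perfect negative relationship:

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    x_mean, y_mean = sum(x) / n, sum(y) / n
    covariance = sum((a - x_mean) * (b - y_mean) for a, b in zip(x, y))
    x_spread = math.sqrt(sum((a - x_mean) ** 2 for a in x))
    y_spread = math.sqrt(sum((b - y_mean) ** 2 for b in y))
    return covariance / (x_spread * y_spread)

x = [1, 2, 3, 4, 5]
y_up = [2, 4, 6, 8, 10]     # rises with x
y_down = [10, 8, 6, 4, 2]   # falls as x rises

r_up = pearson(x, y_up)
r_down = pearson(x, y_down)
print(r_up, r_down)
```

Computing this value for every pair of variables and arranging the results in a table gives the correlation matrix.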

Q: ML, AI, and Deep Learning?

A: AI is the broad field. ML is a part of AI that learns from data. Deep Learning is a type of ML using neural networks.

Q: Supervised and Unsupervised learning?

A: Supervised: learns with labeled data (has answers). Unsupervised: finds patterns from unlabeled data.
