
Data Science Viva Notes

The document provides a comprehensive overview of key concepts in data science, including Exploratory Data Analysis (EDA), data cleaning, statistical analysis, and various modeling techniques. It covers essential topics such as regression analysis, clustering, time series analysis, and machine learning fundamentals. Each concept is defined succinctly, making it a useful reference for understanding data science methodologies.

Uploaded by shrutikurade0
Copyright © All Rights Reserved

Q: What is Exploratory Data Analysis (EDA)?

A: EDA is the process of exploring and visualizing data to understand its structure, patterns, and relationships before applying any model.

Q: What are summary statistics?

A: Summary statistics are basic values that describe a dataset, such as the mean, median, mode, minimum, maximum, and standard deviation.
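As a small illustration, Python's built-in `statistics` module can compute these values. The sample list is made-up data, not something from the notes.

```python
import statistics

# Made-up sample values, used only for illustration
data = [4, 8, 15, 16, 23, 42, 8]

summary = {
    "mean": statistics.mean(data),
    "median": statistics.median(data),
    "mode": statistics.mode(data),      # most frequent value
    "min": min(data),
    "max": max(data),
    "stdev": statistics.stdev(data),    # sample standard deviation (n - 1 in the denominator)
}

for name, value in summary.items():
    print(f"{name}: {value:.2f}")
```

In pandas, `DataFrame.describe()` reports a similar set (count, mean, std, min, quartiles, max) for numeric columns in one call.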

Q: What does a histogram display?

A: A histogram shows the frequency distribution of a numeric variable.

Q: Box plot?

A: A box plot shows the spread of data using the median, the quartiles, the whiskers, and any outliers.

Q: How do you draw conclusions from a box plot?

A: You can see whether the data are symmetric or skewed (from where the median sits in the box and how long each whisker is) and whether there are outliers.

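Q: What do the whiskers mean?

A: Whiskers in a box plot extend from the quartiles to the most extreme data points that lie within 1.5 × IQR of Q1 and Q3 (the common Tukey convention). Points beyond the whiskers are drawn individually as outliers.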
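The 1.5 × IQR rule can be computed directly. This is a sketch on made-up data; quartile values depend on the interpolation convention (here `method="inclusive"` of Python's `statistics.quantiles`, Python 3.8+), so a plotting library may draw a slightly different box.

```python
import statistics

def box_plot_stats(values):
    """Quartiles, Tukey fences, whisker ends and outliers for a box plot."""
    data = sorted(values)
    # 'inclusive' interpolation; other quartile conventions give slightly different numbers
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [x for x in data if lower_fence <= x <= upper_fence]
    return {
        "q1": q1, "median": median, "q3": q3, "iqr": iqr,
        "whisker_low": min(inside),    # whiskers stop at the most extreme points inside the fences
        "whisker_high": max(inside),
        "outliers": [x for x in data if x < lower_fence or x > upper_fence],
    }

stats = box_plot_stats([7, 2, 9, 4, 40, 5, 10, 6, 8])  # made-up values
print(stats)
```

Here Q1 = 5 and Q3 = 9, so the fences are -1 and 15: the whiskers end at 2 and 10, and 40 is reported as an outlier.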

Q: Scatter plot?

A: A scatter plot shows the relationship between two numeric variables.

Q: Data cleaning?

A: Data cleaning means fixing or removing wrong, incomplete, or inconsistent data.

Q: What does handling inconsistencies mean?

A: It means correcting values that are wrongly formatted or mismatched in the dataset, for example "Male", "male", and "M" all used for the same category.

Q: How is imputation applied?

A: Imputation fills missing values with the mean, median, or mode of the column, or with values predicted by a model.
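A minimal sketch of simple imputation, assuming missing entries are stored as `None` in a plain Python list (model-based imputation is not shown). The `impute` helper and the data are hypothetical.

```python
import statistics

def impute(column, strategy="median"):
    """Replace None entries with the mean, median or mode of the observed values."""
    observed = [x for x in column if x is not None]
    fill_value = {
        "mean": statistics.mean,
        "median": statistics.median,
        "mode": statistics.mode,
    }[strategy](observed)
    return [fill_value if x is None else x for x in column]

ages = [25, None, 31, 40, None, 28]  # made-up column with two missing values
print(impute(ages, "median"))        # [25, 29.5, 31, 40, 29.5, 28]
```

The median is often preferred over the mean when the column is skewed or has outliers, since it is not pulled toward extreme values.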

Q: How do you remove duplicates?

A: Use tools or code (like `df.drop_duplicates()` in pandas) to delete repeated rows.
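pandas' `DataFrame.drop_duplicates()` keeps the first occurrence of each repeated row by default. The plain-Python sketch below mirrors that idea without needing pandas; the rows are made-up and the helper name is hypothetical.

```python
def drop_duplicates(rows):
    """Keep the first occurrence of each row and preserve the original order."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row)  # convert to a tuple so the row can be stored in a set
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique

students = [("Asha", 21), ("Ravi", 22), ("Asha", 21), ("Meera", 20)]
print(drop_duplicates(students))  # [('Asha', 21), ('Ravi', 22), ('Meera', 20)]
```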

Q: What do data transformation and feature engineering mean?

A: Data transformation changes the format or scale of the data. Feature engineering creates new, useful features for the model.

Q: What does normalization mean?

A: Normalization (min-max scaling) rescales numeric features to a common range, such as 0 to 1, so that features measured on different scales contribute comparably.
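Min-max scaling maps each value x to (x - min) / (max - min). A short sketch on a made-up column; returning all zeros for a constant column is an assumption made here to avoid dividing by zero.

```python
def min_max_scale(values):
    """Rescale values to the range [0, 1] using (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]  # constant column: no spread to scale
    return [(x - lo) / (hi - lo) for x in values]

print(min_max_scale([10, 20, 15, 30]))  # [0.0, 0.5, 0.25, 1.0]
```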

Q: Data transformation: how are categorical variables converted?

A: Convert them into numbers using an encoding such as One-Hot Encoding (one 0/1 column per category) or Label Encoding (one integer per category).
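Both encodings can be sketched in a few lines. Sorting the categories alphabetically is an assumption of this example; the helper names and data are hypothetical.

```python
def label_encode(values):
    """Map each category to an integer (categories sorted alphabetically here)."""
    mapping = {cat: code for code, cat in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping

def one_hot_encode(values):
    """One 0/1 column per category; exactly one 1 in each row."""
    categories = sorted(set(values))
    return [[1 if v == cat else 0 for cat in categories] for v in values], categories

colors = ["red", "green", "blue", "green"]  # made-up column
codes, mapping = label_encode(colors)
rows, categories = one_hot_encode(colors)
print(codes, mapping)     # [2, 1, 0, 1] {'blue': 0, 'green': 1, 'red': 2}
print(categories, rows)
```

Label Encoding implies an order (blue < green < red) that a linear model may take literally, which is why One-Hot Encoding is usually preferred for categories with no natural order.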

Q: Binning?

A: Binning means converting continuous data into fixed intervals or categories.
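A sketch of equal-width binning, one common way to choose the intervals (quantile-based bins are another). Counting how many values fall in each bin gives exactly the numbers a histogram draws. The ages are made-up.

```python
from collections import Counter

def equal_width_bins(values, n_bins):
    """Assign each value a bin index 0..n_bins-1 using equal-width intervals between min and max."""
    lo, hi = min(values), max(values)
    width = (hi - lo) / n_bins
    if width == 0:
        return [0] * len(values)  # all values identical: a single bin
    # min(...) puts the maximum value into the last bin instead of an extra one
    return [min(int((x - lo) / width), n_bins - 1) for x in values]

ages = [18, 22, 35, 47, 60]       # made-up ages
labels = equal_width_bins(ages, 3)
counts = Counter(labels)           # values per bin: the data behind a histogram
print(labels)                      # [0, 0, 1, 2, 2]
print(counts)
```

With these ages the bins are 18-32, 32-46, and 46-60.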

Q: Polynomial feature creation?

A: Creating new features by raising existing numeric features to powers (like x², x³).
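For a single numeric feature, polynomial features are just its powers. This sketch omits the constant (bias) column; scikit-learn's `PolynomialFeatures` adds a bias column by default and also creates cross terms when there are several input features.

```python
def polynomial_features(values, degree):
    """Return [x, x**2, ..., x**degree] for each value of a single feature."""
    return [[x ** power for power in range(1, degree + 1)] for x in values]

print(polynomial_features([2, 3], degree=3))  # [[2, 4, 8], [3, 9, 27]]
```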

Q: Statistical analysis?

A: It involves using mathematical techniques to summarize, understand, and draw conclusions from data.

Q: What does hypothesis testing mean?

A: It uses sample data to decide whether there is enough evidence to reject a null hypothesis (a default claim about the population, such as "there is no difference").

Q: Regression analysis?

A: A technique to study relationships between variables and predict one based on others.

Q: What is a linear regression model?

A: A model that predicts a numeric output as a straight-line (linear) function of one or more inputs, for example y = b0 + b1 * x.
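For one input, the least-squares line has a closed form: b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b0 = ȳ - b1 * x̄. A sketch on made-up points that lie exactly on y = 1 + 2x; Python 3.10+ also offers `statistics.linear_regression(x, y)` for the same fit.

```python
import statistics

def fit_line(xs, ys):
    """Ordinary least squares for one input: returns (intercept b0, slope b1)."""
    x_mean, y_mean = statistics.mean(xs), statistics.mean(ys)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return intercept, slope

b0, b1 = fit_line([1, 2, 3, 4, 5], [3, 5, 7, 9, 11])
print(b0, b1)         # 1.0 2.0
print(b0 + b1 * 6)    # prediction for x = 6: 13.0
```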

Q: What is a t-test?

A: A test that compares the means of two groups to see whether the difference between them is statistically significant.
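A sketch of the test statistic for Welch's two-sample t-test (one variant that does not assume equal variances), on made-up scores. Turning t into a p-value needs the t distribution; `scipy.stats.ttest_ind(a, b, equal_var=False)` returns both.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic: difference in means divided by its standard error."""
    mean_diff = statistics.mean(a) - statistics.mean(b)
    se = math.sqrt(statistics.variance(a) / len(a) + statistics.variance(b) / len(b))
    return mean_diff / se

group_a = [5, 6, 7]  # made-up scores
group_b = [1, 2, 3]
t = welch_t(group_a, group_b)
print(round(t, 3))   # 4.899
```

A large |t| means the gap between the means is large compared with the noise in the samples.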

Q: Chi-square test?

A: A test to check the association between two categorical variables.
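The chi-square statistic compares observed counts with the counts expected if the two variables were independent (row total × column total / grand total). A sketch on a made-up 2 × 2 table; note that `scipy.stats.chi2_contingency` applies Yates' continuity correction to 2 × 2 tables by default, so its statistic would differ from this uncorrected one.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a table given as a list of rows."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total  # count if independent
            chi2 += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return chi2, dof

# Made-up counts: rows = two customer groups, columns = bought / did not buy
chi2, dof = chi_square_statistic([[20, 30], [30, 20]])
print(chi2, dof)  # 4.0 1
```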

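Q: What does the p-value mean?

A: The p-value is the probability, assuming the null hypothesis is true, of getting a result at least as extreme as the one observed. A small p-value (below the chosen significance level, commonly 0.05) means the result is statistically significant, so the null hypothesis is rejected.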
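To make the definition concrete, an exact permutation test (a simple nonparametric test, used here only for illustration) computes the p-value literally: under the null hypothesis the group labels are interchangeable, so it counts the share of all equally likely relabelings whose mean difference is at least as extreme as the observed one. The data are made-up.

```python
from itertools import combinations
from statistics import mean

def permutation_p_value(a, b):
    """Exact two-sided permutation test for a difference in means."""
    observed = abs(mean(a) - mean(b))
    pooled = a + b
    extreme = total = 0
    # Every way of choosing len(a) positions for the first group is equally likely under the null
    for idx in combinations(range(len(pooled)), len(a)):
        chosen = set(idx)
        group1 = [pooled[i] for i in idx]
        group2 = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        total += 1
        if abs(mean(group1) - mean(group2)) >= observed - 1e-12:  # tolerance for float rounding
            extreme += 1
    return extreme / total

print(permutation_p_value([5, 6, 7], [1, 2, 3]))  # 0.1
```

Only 2 of the 20 possible splits are as extreme as the observed one, so p = 0.1. With so few observations even a perfect separation is not significant at 0.05.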

Q: Logistic regression?

A: A model used for classification problems (like yes/no) that predicts the probability of each class, usually by applying the sigmoid function to a linear combination of the inputs.
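A sketch of how a fitted logistic regression turns an input into a probability and then a class. The coefficients are hypothetical (not learned from real data), and 0.5 is an assumed decision threshold.

```python
import math

def predict_proba(x, weight, bias):
    """Sigmoid of a linear score: estimated probability of the positive class."""
    z = weight * x + bias
    return 1 / (1 + math.exp(-z))

def predict_class(x, weight, bias, threshold=0.5):
    return 1 if predict_proba(x, weight, bias) >= threshold else 0

# Hypothetical coefficients, e.g. hours studied -> pass (1) / fail (0)
WEIGHT, BIAS = 0.5, -2.0
print(predict_proba(4, WEIGHT, BIAS))  # 0.5: x = 4 lies exactly on the decision boundary
print(predict_class(6, WEIGHT, BIAS), predict_class(2, WEIGHT, BIAS))  # 1 0
```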

Q: What does accuracy mean?

A: Accuracy is the percentage of correct predictions made by a model.

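Q: What are accuracy, precision, and recall?

A: Accuracy: the share of all predictions that are correct. Precision: of the cases predicted positive, the share that are actually positive, TP / (TP + FP). Recall: of the actual positive cases, the share the model correctly predicted, TP / (TP + FN).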
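The three metrics follow from the confusion-matrix counts (TP, TN, FP, FN). A sketch on made-up labels, where 1 is the positive class:

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision and recall for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, precision, recall

y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 1, 0, 1, 0, 1, 1, 0]
print(classification_metrics(y_true, y_pred))  # (0.625, 0.6, 0.75)
```

Here TP = 3, FP = 2, FN = 1, and TN = 2: the model finds 3 of the 4 real positives (recall 0.75), but only 3 of its 5 positive predictions are correct (precision 0.6).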

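Q: What are the ROC curve and AUC?

A: The ROC curve plots the true positive rate against the false positive rate as the classification threshold varies. AUC (area under the curve) summarizes it in one number: 1 is a perfect classifier and 0.5 is no better than random guessing.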
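AUC has an equivalent interpretation that is easy to compute by hand: the probability that a randomly chosen positive gets a higher score than a randomly chosen negative (ties count half). A sketch on made-up scores; `sklearn.metrics.roc_auc_score(y_true, y_score)` is the usual library call.

```python
def roc_auc(y_true, scores):
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count half."""
    pos = [s for s, y in zip(scores, y_true) if y == 1]
    neg = [s for s, y in zip(scores, y_true) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))

labels = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.8, 0.7, 0.3, 0.4, 0.2]  # made-up predicted probabilities
auc = roc_auc(labels, scores)
print(round(auc, 3))  # 0.889
```

Eight of the nine positive/negative pairs are ordered correctly; only the positive scored 0.4 ranks below the negative scored 0.7.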

Q: What does clustering mean?

A: Grouping similar data points together based on features.

Q: Segmentation?

A: Dividing data into meaningful groups (like customer segments).

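Q: What is k-means clustering?

A: A method that divides data into 'k' clusters. It repeatedly assigns each point to the nearest cluster center (centroid) and moves each centroid to the mean of its assigned points, until the assignments stop changing.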
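A minimal sketch of the two alternating k-means steps on made-up 2-D points. Using the first k points as starting centroids is a simplifying assumption; libraries such as scikit-learn's `KMeans` use smarter seeding (k-means++) by default.

```python
import math

def kmeans(points, k, max_iter=100):
    """Plain k-means: returns (cluster label per point, final centroids)."""
    centroids = list(points[:k])  # naive initialization: first k points
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(coord) / len(members) for coord in zip(*members))
        if new_labels == labels:  # assignments stopped changing: converged
            break
        labels = new_labels
    return labels, centroids

points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (9, 8.5), (8.5, 9)]
labels, centroids = kmeans(points, k=2)
print(labels)     # [0, 0, 0, 1, 1, 1]
print(centroids)
```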

Q: Difference between clustering and segmentation?

A: Clustering is the technique, segmentation is the result or goal.

Q: Churn prediction model?

A: A model that predicts which customers are likely to leave (churn).

Q: Time series analysis?

A: Analyzing data collected over time to find patterns and trends.

Q: Trend?

A: Long-term movement in data (upward or downward).
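One simple way to expose a trend is a moving average, which averages out short-term ups and downs. A sketch of a trailing moving average on made-up monthly sales:

```python
def moving_average(series, window):
    """Trailing moving average: mean of each run of `window` consecutive values."""
    return [sum(series[i - window + 1:i + 1]) / window for i in range(window - 1, len(series))]

monthly_sales = [10, 12, 14, 13, 15, 17]  # made-up data
print(moving_average(monthly_sales, 3))   # [12.0, 13.0, 14.0, 15.0]
```

The raw series dips in the fourth month, but the smoothed values rise steadily, which points to an upward trend.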

Q: Seasonality?

A: Repeating patterns at regular intervals (like monthly or yearly).

Q: Noise components?

A: Random or irregular variations in the data that are not explained by the trend or seasonality.

Q: What are outliers?

A: Unusual values far from most of the data.

Q: ARIMA?

A: ARIMA stands for AutoRegressive Integrated Moving Average. It forecasts a series from its own past values (AR) and past forecast errors (MA), after differencing the series (I) to remove trend.

Q: What does forecasting mean?

A: Predicting future values based on past data.

Q: Exponential smoothing?

A: A method to forecast data by giving more weight to recent observations.
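Simple exponential smoothing updates a level s_t = α·y_t + (1 - α)·s_(t-1), so older observations get geometrically smaller weights; the last level is the forecast for the next period. Starting from s_0 = y_0 and using α = 0.5 are assumptions of this sketch; the series is made-up.

```python
def exponential_smoothing(series, alpha):
    """Simple exponential smoothing with s_0 = y_0 and s_t = alpha * y_t + (1 - alpha) * s_(t-1)."""
    smoothed = [series[0]]
    for y in series[1:]:
        smoothed.append(alpha * y + (1 - alpha) * smoothed[-1])
    return smoothed

levels = exponential_smoothing([10, 12, 14], alpha=0.5)
print(levels)      # [10, 11.0, 12.5]
print(levels[-1])  # one-step-ahead forecast: 12.5
```

A larger α reacts faster to recent changes; a smaller α gives a smoother, slower-moving forecast.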

Q: Anomalies?

A: Unusual or unexpected data points that don't fit the pattern.

Q: Z-score?

A: A value that shows how many standard deviations a data point lies from the mean: z = (x - mean) / standard deviation.
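A sketch of z-score anomaly detection on made-up sensor readings. The cutoff |z| > 2 is an assumption chosen for this tiny sample (with only 7 points a z-score cannot get much above 2); 3 is a common cutoff on larger datasets.

```python
import statistics

def z_scores(values):
    """Standardize values: (x - mean) / sample standard deviation."""
    mean = statistics.mean(values)
    sd = statistics.stdev(values)
    return [(x - mean) / sd for x in values]

readings = [10, 12, 11, 13, 12, 11, 50]  # made-up sensor readings
scores = z_scores(readings)
anomalies = [x for x, z in zip(readings, scores) if abs(z) > 2]
print(anomalies)  # [50]
```

One caveat: the outlier itself inflates the mean and standard deviation, so on small samples robust alternatives (for example, median-based scores) can be more sensitive.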

Q: Isolation Forest model?

A: A model that detects anomalies by randomly splitting the data; anomalies are easier to isolate, so they need fewer splits than normal points.

Q: What does data profiling mean?

A: Creating a summary of data to understand its structure, quality, and patterns.

Q: Correlation matrix?

A: A table showing how variables relate to each other (with values between -1 and 1).

Q: Correlation coefficient?

A: A number that shows the strength and direction of the relationship between two variables.
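The Pearson correlation coefficient is r = Σ(x - x̄)(y - ȳ) / √(Σ(x - x̄)² · Σ(y - ȳ)²), and a correlation matrix is r computed for every pair of columns. A sketch on made-up columns; Python 3.10+ also provides `statistics.correlation(x, y)`.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length numeric lists."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    sxx = sum((x - x_mean) ** 2 for x in xs)
    syy = sum((y - y_mean) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def correlation_matrix(columns):
    """Pairwise r for a list of columns; the diagonal is 1."""
    return [[pearson_r(a, b) for b in columns] for a in columns]

hours = [1, 2, 3, 4, 5]   # made-up columns
marks = [2, 4, 5, 4, 5]
matrix = correlation_matrix([hours, marks])
print([[round(v, 3) for v in row] for row in matrix])  # [[1.0, 0.775], [0.775, 1.0]]
```

A value near +1 or -1 means a strong linear relationship; near 0 means little linear relationship (a curved relationship can still exist).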

Q: ML, AI, and Deep Learning?

A: AI is the broad field. ML is a part of AI that learns from data. Deep Learning is a type of ML that uses neural networks.

Q: Supervised and Unsupervised learning?

A: Supervised: learns with labeled data (has answers). Unsupervised: finds patterns from unlabeled data.
