Data Science Viva Notes
A: EDA (Exploratory Data Analysis) is the process of exploring and visualizing data to understand its structure, patterns, and relationships.
A: Summary statistics are basic values that describe a dataset, such as the mean, median, mode, minimum, maximum, and standard deviation.
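A minimal sketch of these summary statistics using Python's standard `statistics` module. The sample values are made up for illustration, and `stdev` is the sample (not population) standard deviation.

```python
import statistics

# Hypothetical sample data, chosen only for illustration.
values = [2, 4, 4, 5, 7, 9, 11]

summary = {
    "mean": statistics.mean(values),
    "median": statistics.median(values),
    "mode": statistics.mode(values),
    "min": min(values),
    "max": max(values),
    "std_dev": statistics.stdev(values),  # sample standard deviation
}

for name, value in summary.items():
    print(f"{name}: {value}")
```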
Q: Box plot?
A: A box plot shows the spread of data using median, quartiles, and outliers.
A: From a box plot you can see whether the data is symmetric or skewed, and whether there are outliers.
Q: Whiskers means?
A: Whiskers in a box plot extend from the box to the smallest and largest values that lie within 1.5 × IQR of the first and third quartiles. Points beyond the whiskers are plotted as outliers.
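The 1.5 × IQR rule can be sketched as follows. The data is hypothetical, and the "inclusive" quartile method is only one of several common conventions, so other tools may give slightly different quartiles.

```python
import statistics

# Hypothetical sample with one unusually large value.
data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 30]

# Quartiles Q1, Q2 (median), Q3 using the "inclusive" convention.
q1, q2, q3 = statistics.quantiles(data, n=4, method="inclusive")
iqr = q3 - q1

# Fences: the limits beyond which points count as outliers.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Whiskers end at the most extreme data points inside the fences.
inside = [x for x in data if lower_fence <= x <= upper_fence]
lower_whisker = min(inside)
upper_whisker = max(inside)
outliers = [x for x in data if x < lower_fence or x > upper_fence]

print(lower_whisker, upper_whisker, outliers)
```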
Q: Scatter plot?
Q: Data cleaning?
A: It means correcting values that are wrongly formatted or mismatched in the dataset.
A: Imputation is filling missing values using mean, median, mode, or predictive models.
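A small sketch of median imputation, assuming missing values are stored as `None`. The `ages` column is hypothetical; mean or mode imputation would follow the same pattern.

```python
import statistics

# Hypothetical column with missing entries represented as None.
ages = [25, None, 31, 40, None, 28]

# Compute the fill value from the observed (non-missing) entries only.
observed = [a for a in ages if a is not None]
fill_value = statistics.median(observed)

# Replace each missing entry with the fill value.
imputed = [fill_value if a is None else a for a in ages]

print(imputed)
```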
A: Data transformation changes the data format or scale. Feature engineering creates new useful features for
the model.
Q: Normalization means?
A: Scaling all numeric data to a common range (like 0 to 1) so that no feature dominates just because of its larger scale.
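Scaling to the 0 to 1 range is often done with min-max scaling, sketched below. The function name `min_max_scale` and the `incomes` data are illustrative assumptions.

```python
def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values identical: avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical feature on a large scale.
incomes = [20000, 35000, 50000, 80000]
scaled = min_max_scale(incomes)

print(scaled)
```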
A: Convert categorical variables into numbers using an encoding such as One-Hot Encoding or Label Encoding.
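Both encodings can be sketched in plain Python. The `colors` column is hypothetical, and assigning labels in alphabetical order is an assumption; libraries may order categories differently.

```python
# Hypothetical categorical column.
colors = ["red", "green", "blue", "green"]

# Fix a category order (alphabetical here, as an assumption).
categories = sorted(set(colors))

# Label Encoding: each category becomes one integer.
label_map = {cat: i for i, cat in enumerate(categories)}
label_encoded = [label_map[c] for c in colors]

# One-Hot Encoding: each category becomes its own 0/1 column.
one_hot = [[1 if c == cat else 0 for cat in categories] for c in colors]

print(categories)
print(label_encoded)
print(one_hot)
```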
Q: Binning?
A: Polynomial features are new features created by raising existing numeric features to powers (like x², x³).
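Raising a numeric feature to successive powers can be sketched like this. The helper name `polynomial_features` is hypothetical.

```python
def polynomial_features(x, degree):
    """Return [x, x^2, ..., x^degree] for a single numeric value."""
    return [x ** p for p in range(1, degree + 1)]

# Hypothetical feature values, expanded up to the third power.
rows = [polynomial_features(x, 3) for x in [1, 2, 3]]

print(rows)
```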
Q: Statistical analysis?
A: It involves using mathematical techniques to summarize, understand, and draw conclusions from data.
Q: Regression analysis?
A: A technique to study relationships between variables and predict one based on others.
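As one concrete case, simple linear regression fits a straight line by least squares. This is a minimal sketch; the function name `fit_line` and the data are assumptions, and the data is exactly linear so the result is easy to check by hand.

```python
def fit_line(xs, ys):
    """Least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: study hours vs. exam score (score = 2 * hours + 50).
hours = [1, 2, 3, 4, 5]
scores = [52, 54, 56, 58, 60]

slope, intercept = fit_line(hours, scores)
predicted = slope * 6 + intercept  # predict the score for 6 hours

print(slope, intercept, predicted)
```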
Q: T-test means?
A: A test to compare the means of two groups to see if they are significantly different.
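The test statistic behind a two-sample t-test can be sketched as below. This uses Welch's version, which does not assume equal variances; that choice, the function name `welch_t`, and the data are assumptions. Turning the statistic into a p-value needs the t distribution, which a library routine such as `scipy.stats.ttest_ind` handles.

```python
import math
import statistics

def welch_t(a, b):
    """Welch's t statistic: difference in means divided by its standard error."""
    var_a = statistics.variance(a)  # sample variance
    var_b = statistics.variance(b)
    standard_error = math.sqrt(var_a / len(a) + var_b / len(b))
    return (statistics.mean(a) - statistics.mean(b)) / standard_error

# Hypothetical measurements from two groups.
group_a = [5.1, 4.9, 5.6, 5.8, 6.0]
group_b = [4.1, 4.4, 4.0, 4.6, 4.3]

t_stat = welch_t(group_a, group_b)
print(t_stat)  # a large absolute value suggests the means differ
```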
Q: Chi-square test?
Q: P-value means?
A: It is the probability of getting a result at least as extreme as the observed one if the null hypothesis is true (that is, by chance alone). A small p-value (commonly < 0.05) means the result is statistically significant.
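One way to make a p-value concrete is an exact permutation test, a technique these notes do not mention and which is swapped in here purely for illustration. Under the assumption that group labels do not matter, it counts how many relabelings give a difference in means at least as large as the observed one. The data is hypothetical.

```python
from itertools import combinations

# Hypothetical measurements from two groups.
group_a = [12, 15, 14, 16]
group_b = [9, 10, 11, 13]

pooled = group_a + group_b
observed = abs(sum(group_a) / len(group_a) - sum(group_b) / len(group_b))

extreme = 0  # relabelings at least as extreme as the observed split
total = 0    # all possible relabelings

for idx in combinations(range(len(pooled)), len(group_a)):
    chosen = set(idx)
    a = [pooled[i] for i in chosen]
    b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
    diff = abs(sum(a) / len(a) - sum(b) / len(b))
    if diff >= observed - 1e-12:  # small tolerance for float comparison
        extreme += 1
    total += 1

# Two-sided p-value: share of relabelings as extreme as what was observed.
p_value = extreme / total
print(p_value)
```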
Q: Logistic regression?
Q: Accuracy means?
A: Accuracy: the share of all predictions that are correct. Precision: the share of predicted positives that are actually positive. Recall: the share of actual positives that are correctly predicted.
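The three metrics can be computed from the counts of true/false positives and negatives. A minimal sketch with hypothetical binary labels (1 = positive, 0 = negative):

```python
# Hypothetical true labels and model predictions.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 1]

pairs = list(zip(y_true, y_pred))
tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

accuracy = (tp + tn) / len(pairs)   # correct predictions / all predictions
precision = tp / (tp + fp)          # correct positives / predicted positives
recall = tp / (tp + fn)             # correct positives / actual positives

print(accuracy, precision, recall)
```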
Q: Clustering means?
Q: Segmentation?
Q: K-means clustering?
Q: Trend?
Q: Seasonality?
Q: Noise components?
Q: Outliers mean?
Q: ARIMA?
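A: A forecasting model that uses past values (the autoregressive part), differencing to remove trend (the integrated part), and past forecast errors (the moving average part). It stands for AutoRegressive Integrated Moving Average.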
Q: Forecasting means?
Q: Exponential smoothing?
Q: Anomalies?
Q: Z-score?
A: A value that shows how far a data point is from the mean, in standard deviations.
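The formula is z = (x - mean) / standard deviation. A short sketch with hypothetical data follows; using the population standard deviation (`pstdev`) and a cutoff of 2 for flagging unusual points are both assumptions, and other cutoffs such as 3 are also common.

```python
import statistics

# Hypothetical data with one value far from the rest.
data = [10, 12, 11, 13, 12, 50]

mean = statistics.mean(data)
std_dev = statistics.pstdev(data)  # population standard deviation

# z-score: distance from the mean, measured in standard deviations.
z_scores = [(x - mean) / std_dev for x in data]

# Flag points more than 2 standard deviations from the mean.
threshold = 2
flagged = [x for x, z in zip(data, z_scores) if abs(z) > threshold]

print(flagged)
```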
A: Isolation Forest is a model used to detect anomalies by isolating them from the rest of the data.
Q: Profiling means?
Q: Correlation matrix?
A: A table showing the correlation coefficient for every pair of variables (each value lies between -1 and 1).
Q: Correlation coefficient?
A: A number that shows the strength and direction of the relationship between two variables.
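The most common version is the Pearson correlation coefficient, sketched below. The function name `pearson_r` and the data are illustrative assumptions.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of the spreads."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)

# Hypothetical variables: one rises with x, the other falls with x.
x = [1, 2, 3, 4]
rising = [2, 4, 6, 8]
falling = [8, 6, 4, 2]

print(pearson_r(x, rising))   # close to +1: strong positive relationship
print(pearson_r(x, falling))  # close to -1: strong negative relationship
```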
A: AI is the broad field. ML is a part of AI that learns from data. Deep Learning is a type of ML using neural
networks.
A: Supervised: learns with labeled data (has answers). Unsupervised: finds patterns from unlabeled data.