AI Chapter 3
1. Reading data
2. Variable identification
3. Univariate Analysis
4. Bivariate Analysis
5. Missing value treatment
6. Outlier treatment
7. Variable transformation
• Accuracy: The ratio of the number of correct predictions to the total number of
predictions, (TP + TN) / (TP + TN + FP + FN).
• Precision: Out of all the positive predictions, the percentage that are actually positive,
TP / (TP + FP).
1. Problem definition
2. Hypothesis generation
3. Data extraction/collection
4. Data exploration and transformation
5. Model building
6. Model deployment
• Specificity: The percentage of actual negative instances in the dataset that the model
correctly predicts as negative, TN / (TN + FP).
1. Logarithm
2. Square root
3. Cube root
4. Binning
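The four transformations listed above can be sketched in plain Python as follows. This is a minimal illustration assuming a small hypothetical list of right-skewed values; the bin edges (10 and 100) and the labels "low", "medium", "high" are arbitrary choices for the example, not values from these notes.

```python
import math

# Hypothetical right-skewed values (e.g., incomes); not taken from the notes.
values = [1, 4, 9, 27, 100, 1000]

# Logarithm: strongly compresses large values (defined only for values > 0).
log_vals = [math.log(v) for v in values]

# Square root: milder compression (defined only for values >= 0).
sqrt_vals = [math.sqrt(v) for v in values]

# Cube root: also defined for negative numbers, so the sign is kept explicitly.
def cube_root(x: float) -> float:
    return math.copysign(abs(x) ** (1 / 3), x)

cbrt_vals = [cube_root(v) for v in values]

# Binning: map each value to a labelled interval (edges are assumptions).
def bin_value(x: float, edges=(10, 100)) -> str:
    if x < edges[0]:
        return "low"
    if x < edges[1]:
        return "medium"
    return "high"

bins = [bin_value(v) for v in values]
print(bins)  # ['low', 'low', 'low', 'medium', 'high', 'high']
```

Logarithm and square root reduce right skew, cube root also works for negative values, and binning turns a continuous variable into a categorical one.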
1. Medical diagnostics
2. Email filtering
3. Computer vision
2. Functions for filtering, selecting, and manipulating data
4. Exporting data
9. Classify ML.
1. Supervised Learning
2. Unsupervised Learning
3. Reinforcement Learning
1. Classification
2. Regression
• Continuous Variables: Can take any value within a range, so they have infinitely many
possible values (e.g., age, salary).
1. Histogram
2. Boxplot
14. Draw a labeled boxplot.
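The parts that a labeled boxplot marks can be computed directly, which helps when labeling a hand-drawn diagram. This is a sketch assuming a hypothetical sample, quartiles from Python's `statistics.quantiles` with `method="inclusive"`, and the common 1.5 x IQR rule for whisker length; other quartile methods can give slightly different values.

```python
import statistics

def boxplot_parts(data):
    """Return the values a labeled boxplot marks, using the 1.5 x IQR whisker rule."""
    q1, median, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1  # interquartile range = height of the box
    low_fence, high_fence = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    inside = [x for x in data if low_fence <= x <= high_fence]
    outliers = sorted(x for x in data if x < low_fence or x > high_fence)
    return {
        "lower whisker (smallest non-outlier)": min(inside),
        "Q1 (bottom edge of box)": q1,
        "median (line inside box)": median,
        "Q3 (top edge of box)": q3,
        "upper whisker (largest non-outlier)": max(inside),
        "outliers (points beyond the whiskers)": outliers,
    }

data = [2, 4, 4, 5, 6, 7, 8, 9, 30]  # hypothetical sample
for label, value in boxplot_parts(data).items():
    print(f"{label}: {value}")
```

For this sample the box spans Q1 = 4 to Q3 = 8 with the median line at 6, the whiskers reach 2 and 9, and 30 is drawn as a separate outlier point. To draw the figure itself, matplotlib's `plt.boxplot(data)` also uses 1.5 x IQR whiskers by default.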
• Bivariate Analysis: Studying two variables together for their empirical relationship.
1. Continuous-Continuous Analysis
2. Categorical-Continuous Analysis
3. Categorical-Categorical Analysis
1. Non-response
2. Summary statistics
2. Measurement error
1. Deleting observations
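Deleting observations means dropping whole rows that contain an outlying value. Below is a minimal sketch assuming hypothetical records in which an age of 250 simulates a data-entry error, and assuming the IQR fence rule (k = 1.5) as the outlier test; the notes do not specify which detection rule to use.

```python
import statistics

# Hypothetical records; the age of 250 simulates a data-entry error.
records = [
    {"id": 1, "age": 22}, {"id": 2, "age": 25}, {"id": 3, "age": 27},
    {"id": 4, "age": 30}, {"id": 5, "age": 31}, {"id": 6, "age": 35},
    {"id": 7, "age": 38}, {"id": 8, "age": 250},
]

def delete_outlier_rows(rows, column, k=1.5):
    """Drop whole rows whose `column` value falls outside the IQR fences."""
    values = [row[column] for row in rows]
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [row for row in rows if low <= row[column] <= high]

cleaned = delete_outlier_rows(records, "age")
print([row["id"] for row in cleaned])  # [1, 2, 3, 4, 5, 6, 7]
```

Deletion is simple but shrinks the dataset, so it is usually reserved for outliers that are clearly errors rather than genuine extreme values.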
1. Logarithm
2. Square root
3. Cube root
4. Binning
1. Problem definition
2. Hypothesis generation
3. Data extraction/collection
4. Data exploration and transformation
5. Model building
6. Model deployment
• The process of placing a finished machine learning model into a live environment
where it can be used for its intended purpose.
1. Confusion matrix
2. Accuracy
3. Sensitivity/Recall
4. Precision
• True Positive (TP): The actual value is positive, and the model predicted it as positive.
• True Negative (TN): The actual value is negative, and the model predicted it as negative.
• False Positive (FP): The actual value is negative, but the model predicted it as
positive.
• False Negative (FN): The actual value is positive, but the model predicted it as
negative.
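Given the four counts defined above, the evaluation metrics follow from simple ratios. This sketch uses hypothetical counts and does not guard against zero denominators (for example, a model that never predicts positive would make precision undefined).

```python
def classification_metrics(tp, tn, fp, fn):
    """Compute common metrics from confusion-matrix counts (no zero-division handling)."""
    total = tp + tn + fp + fn
    return {
        "accuracy": (tp + tn) / total,   # correct predictions / all predictions
        "precision": tp / (tp + fp),     # true positives / predicted positives
        "recall": tp / (tp + fn),        # true positives / actual positives (sensitivity)
        "specificity": tn / (tn + fp),   # true negatives / actual negatives
    }

# Hypothetical counts for 100 test instances.
metrics = classification_metrics(tp=40, tn=45, fp=5, fn=10)
print(metrics)
```

With these counts, accuracy is 0.85, precision is 40/45 (about 0.889), recall is 0.8, and specificity is 0.9.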
1. Hold-out validation
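Hold-out validation keeps part of the data aside so the model is scored on examples it never saw during training. Below is a minimal sketch, assuming a random shuffle with a fixed seed and a 20% test fraction; both values are illustrative choices rather than prescribed settings.

```python
import random

def holdout_split(rows, test_fraction=0.2, seed=42):
    """Shuffle a copy of the rows and split them into (train, test)."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split reproducible
    n_test = int(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

train, test = holdout_split(list(range(10)))
print(len(train), len(test))  # 8 2
```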
• Hyperparameters are the settings that can be tuned before running a training job to
control the behavior of an ML algorithm.
1. Random Search
2. Grid Search
3. Bayesian Optimization
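Grid search and random search differ only in how candidate hyperparameter settings are generated. The sketch below assumes a hypothetical `validation_score` function standing in for "train the model with these settings and score it on validation data", and hypothetical hyperparameters `learning_rate` and `depth`. Bayesian optimization, which picks each new candidate using a model of past results, is not shown.

```python
import itertools
import random

def validation_score(learning_rate, depth):
    # Hypothetical stand-in for training + validation; best at lr=0.1, depth=4.
    return -((learning_rate - 0.1) ** 2) * 100 - (depth - 4) ** 2

def grid_search(grid):
    """Try every combination of the listed values and keep the best one."""
    keys = list(grid)
    best = None
    for combo in itertools.product(*grid.values()):
        params = dict(zip(keys, combo))
        score = validation_score(**params)
        if best is None or score > best[0]:
            best = (score, params)
    return best

def random_search(n_trials=20, seed=0):
    """Sample settings at random from ranges and keep the best one."""
    rng = random.Random(seed)
    best = None
    for _ in range(n_trials):
        params = {"learning_rate": rng.uniform(0.001, 0.5), "depth": rng.randint(1, 8)}
        score = validation_score(**params)
        if best is None or score > best[0]:
            best = (score, params)
    return best

grid = {"learning_rate": [0.01, 0.1, 0.3], "depth": [2, 4, 6]}
print(grid_search(grid)[1])  # {'learning_rate': 0.1, 'depth': 4}
print(random_search()[1])
```

Grid search is exhaustive over the listed values (9 trials here), while random search uses a fixed trial budget and can explore values that are not on the grid.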
4 Marks Questions
• A confusion matrix is a table that shows the performance of a classification model by
comparing the actual values with the predicted values; for binary classification it is a 2x2
matrix. For example, if a model predicts whether an email is spam or not, the confusion
matrix shows the counts of True Positives (predicted spam and actually spam), True
Negatives (predicted not spam and actually not spam), False Positives (predicted spam but
actually not spam), and False Negatives (predicted not spam but actually spam).
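For the spam example, the four cells can be counted by comparing each actual label with its predicted label. This is a sketch using hypothetical labels for six emails, with "spam" assumed to be the positive class.

```python
from collections import Counter

def confusion_counts(actual, predicted, positive="spam"):
    """Count TP, TN, FP, FN for a binary problem with the given positive label."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if p == positive:
            counts["TP" if a == positive else "FP"] += 1
        else:
            counts["FN" if a == positive else "TN"] += 1
    return counts

# Hypothetical labels for six emails.
actual    = ["spam", "spam", "not spam", "not spam", "spam", "not spam"]
predicted = ["spam", "not spam", "spam", "not spam", "spam", "not spam"]
c = confusion_counts(actual, predicted)

# Rows are actual classes, columns are predicted classes.
print("                 predicted spam  predicted not spam")
print(f"actual spam      {c['TP']:>14}  {c['FN']:>18}")
print(f"actual not spam  {c['FP']:>14}  {c['TN']:>18}")
```

Here TP = 2, FN = 1, FP = 1, and TN = 2.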
• Bivariate analysis involves studying the relationship between two variables. It helps
in understanding how changes in one variable are associated with changes in another and
can be used to identify correlations or associations. For example, analyzing the
relationship between age and income can reveal trends and patterns.
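For two continuous variables such as age and income, a common summary is the Pearson correlation coefficient r, which ranges from -1 to +1. This sketch uses hypothetical age and income figures; a value near +1 indicates a strong positive linear association, though it does not by itself show that one variable causes the other.

```python
import math

def pearson_r(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

ages = [25, 30, 35, 40, 45]
incomes = [30, 38, 45, 52, 61]  # hypothetical, in thousands
r = pearson_r(ages, incomes)
print(round(r, 3))  # 0.999
```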
4. Explain the steps to read CSV and Excel files in a Jupyter notebook using pandas.
4. Use the head() method to display the first few rows of the DataFrame and confirm that
the file was read correctly.
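A notebook cell for reading a file with pandas and checking it with `head()` might look like the sketch below. To keep the example self-contained, the CSV content is an inline hypothetical string read through `io.StringIO`; in practice you would pass a file path such as `"data.csv"` (a hypothetical name). `pd.read_excel` works the same way but needs an Excel engine such as openpyxl installed, so that call is only shown as a comment.

```python
import io

import pandas as pd

# Hypothetical CSV contents; normally you would call pd.read_csv("data.csv").
csv_text = "name,age,salary\nAsha,28,52000\nRavi,35,61000\nMeera,41,75000\n"
df = pd.read_csv(io.StringIO(csv_text))

# Excel files are read similarly (requires an engine such as openpyxl):
# df_xl = pd.read_excel("data.xlsx", sheet_name=0)

print(df.head())   # first rows, to confirm the file was read correctly
print(df.shape)    # (3, 3)
```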
1. Algorithm Selection: Choosing the appropriate algorithm based on the problem type.
2. Training Model: Learning the relationship between independent and dependent variables
using the training data.
3. Prediction/Scoring: Using the trained model to predict outcomes on the test dataset.
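The three model-building steps can be traced in a small example. This is a sketch under stated assumptions: the target is continuous, so simple linear regression (least squares) is chosen as the algorithm; the training and test numbers are hypothetical; and mean absolute error is used as the scoring metric.

```python
def fit_simple_linear_regression(xs, ys):
    """Step 2 (training): least-squares slope and intercept for y = slope * x + intercept."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

def predict(model, xs):
    """Step 3 (prediction): apply the learned line to new inputs."""
    slope, intercept = model
    return [slope * x + intercept for x in xs]

# Step 1 (algorithm selection): continuous target, so a regression model is assumed.
train_x, train_y = [1, 2, 3, 4], [3, 5, 7, 9]   # hypothetical training data (y = 2x + 1)
test_x, test_y = [5, 6], [11.5, 12.5]           # hypothetical test data

model = fit_simple_linear_regression(train_x, train_y)
preds = predict(model, test_x)                  # [11.0, 13.0]

# Scoring: mean absolute error on the test set.
mae = sum(abs(p - y) for p, y in zip(preds, test_y)) / len(test_y)
print(model, preds, mae)                        # (2.0, 1.0) [11.0, 13.0] 0.5
```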