ML U1 & U2 Notes


‭Machine Learning Unit 1 and Unit 2 Notes‬

‭Unit 1‬

Q1) What do you mean by Learning? Justify that cognitive automation is a subset of machine learning by giving examples.


● Machine learning is a process through which computerized systems use human-supplied data and feedback to independently make decisions and predictions, typically becoming more accurate with continual training. This contrasts with traditional computing, in which every action taken by a computer must be pre-programmed.
‭●‬ ‭Reinforcement‬‭learning‬‭teaches a system as it interacts with an‬
‭environment by offering it rewards when it performs an action‬
‭correctly.‬

● Supervised learning, which applies to the computer-vision systems used in autonomous vehicles.
‭●‬ ‭Unsupervised‬‭learning‬‭, which is used when data need to be‬

‭clustered (for example, audience segmentation for streaming‬


‭services or product recommendations to online shoppers).‬
‭●‬ ‭’Cognition’‬‭refers to all processes by which the sensory input is‬

‭transformed, reduced, elaborated, stored, recovered, and used.‬


‭Such terms as sensation, perception, imagery, retention, recall,‬
‭problem-solving, and thinking, among many others, refer to‬
‭hypothetical stages or aspects of cognition‬
‭●‬ ‭Automation‬‭refers to the full or partial “execution by a machine‬
‭agent (usually a computer) of a function that was previously‬
‭carried out by a human”‬
● Cognitive Automation refers to leveraging ML to automate cognitive knowledge and service work and so realize the value offered by AI; it is based on implementing artificial cognition that mimics and approximates human cognition in machines.

Justifying Cognitive Automation as a Subset of Machine Learning with Examples:

‭●‬ ‭Natural Language Processing (NLP) in Customer Service: Cognitive‬
‭automation systems that handle customer inquiries use supervised‬

‭machine learning models to understand text (via NLP). They learn‬
‭from historical chat data to provide responses that mimic a‬
‭human customer service agent. These systems can independently‬

‭refine their responses based on interactions, making them more‬
‭efficient over time.‬
‭●‬ ‭Example: Virtual assistants like chatbots that interpret customer‬
‭queries and provide responses, learning from past interactions to‬

‭become better at understanding and solving customer problems.‬


‭●‬ ‭Automated Document Processing in Finance: Cognitive automation‬
‭can read and understand invoices, contracts, and financial‬

‭statements using machine learning. It uses techniques from‬


‭supervised learning to extract information from scanned‬
‭documents and reinforcement learning to improve over time based‬

‭on the accuracy of its extractions.‬


‭●‬ ‭Example: A machine learning system automating the review and‬
‭approval of loan applications, making decisions by learning from‬
‭historical data on past approvals and rejections.‬
‭●‬ ‭Intelligent Decision Support Systems: Machine learning‬
‭algorithms in these systems learn from historical data and human‬
‭input to assist in decision-making. For instance, in healthcare,‬
‭cognitive automation helps doctors by recommending treatment‬
‭plans based on previous patient data, imaging, and test results,‬
‭which uses both unsupervised learning (to group similar patient‬
‭cases) and supervised learning (to make treatment predictions).‬
‭●‬ ‭Example: AI-powered diagnostic tools that assist doctors by‬
‭analyzing medical data and images to suggest possible diagnoses.‬


‭Q2) Describe various phases used by machine learning.‬


‭1. Gathering Data:‬


Data gathering is the first step of the machine learning life cycle. The goal of this step is to identify the various data sources and obtain the data related to the problem.
‭This step includes the below tasks:‬
‭●‬ ‭Identify various data sources‬
‭●‬ ‭Collect data‬
‭●‬ ‭Integrate the data obtained from different sources‬
By performing the above tasks, we get a coherent set of data, also called a dataset, which will be used in further steps.

‭2. Data preparation‬


‭After collecting the data, we need to prepare it for further steps.‬


‭Data preparation is a step where we put our data into a suitable place‬

‭and prepare it to use in our machine learning training.‬
‭This step can be further divided into two processes:‬

‭Data exploration:‬
‭It is used to understand the nature of data that we have to work with.‬
‭We need to understand the characteristics, format, and quality of‬

‭data. A better understanding of data leads to an effective outcome. In‬
‭this, we find Correlations, general trends, and outliers.‬
‭Data pre-processing:‬
‭Now the next step is preprocessing of data for its analysis.‬

‭3. Data Wrangling‬


Data wrangling is the process of cleaning and converting raw data into a usable format. It is the process of cleaning the data, selecting the variables to use, and transforming the data into a proper format to make it more suitable for analysis in the next step. Collected data commonly has quality issues such as:

‭●‬ ‭Missing Values‬


‭●‬ ‭Duplicate data‬
‭●‬ ‭Invalid data‬
‭●‬ ‭Noise‬
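The wrangling issues listed above can be handled with a few pandas operations; the DataFrame below is hypothetical and only illustrates the idea:

```python
import numpy as np
import pandas as pd

# Hypothetical raw data exhibiting the issues above:
# a missing age, a duplicated row, and an invalid (negative) age.
raw = pd.DataFrame({
    "age":    [25, np.nan, 37, 37, -5],
    "income": [50000, 62000, 58000, 58000, 61000],
})

df = raw.drop_duplicates().copy()                 # duplicate data
df["age"] = df["age"].fillna(df["age"].median())  # missing values
df = df[df["age"].between(0, 120)]                # invalid data
```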

‭4. Data Analysis‬


‭Now the cleaned and prepared data is passed on to the analysis step.‬
‭This step involves:‬
‭●‬ ‭Selection of analytical techniques‬
‭●‬ ‭Building models‬
‭●‬ ‭Review the result‬
‭The aim of this step is to build a machine learning model to analyze the‬


‭data using various analytical techniques and review the outcome.‬

‭Hence, in this step, we take the data and use machine learning‬
‭algorithms to build the model.‬

5. Train Model
The next step is to train the model. In this step we train the model with datasets, using various machine learning algorithms, to improve its performance and produce a better outcome for the problem. Training is required so that the model can learn the various patterns, rules, and features.

‭6. Test Model‬


‭Once our machine learning model has been trained on a given dataset,‬

‭then we test the model. In this step, we check for the accuracy of our‬
‭model by providing a test dataset to it.‬
‭Testing the model determines the percentage accuracy of the model as‬

‭per the requirement of project or problem.‬

‭7. Deployment‬
‭The last step of machine learning life cycle is deployment, where we‬
‭deploy the model in the real-world system.‬
‭If the above-prepared model is producing an accurate result as per our‬
‭requirement with acceptable speed, then we deploy the model in the‬
‭real system. But before deploying the project, we will check whether it‬
‭is improving its performance using available data or not. The‬
‭deployment phase is similar to making the final report for a project.‬
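The train, test, and deployment-check phases above can be sketched with scikit-learn on the Iris dataset; the 90% accuracy threshold here is an assumed project requirement, not a fixed rule:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)                      # gathered dataset

# Phases 5-6: split, train, then test on held-out data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

# Phase 7: deploy only if accuracy meets the (assumed) requirement.
ready_to_deploy = accuracy >= 0.9
```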

‭Q3) Every machine learning algorithm should have some key points‬
‭while designing. What are they? Explain them in brief.‬



Step 1) Choosing the Training Experience: The first and most important task is to choose the training data or training experience that will be fed to the machine learning algorithm. The data or experience we feed to the algorithm has a significant impact on the success or failure of the model, so it should be chosen wisely.
Below are the attributes that impact the success or failure of the model:
‭●‬ ‭The training experience will be able to provide direct or indirect‬
‭feedback regarding choices. For example: While Playing chess the‬
‭training data will provide feedback to itself like instead of this‬
‭move if this is chosen the chances of success increases.‬


● The second important attribute is the degree to which the learner controls the sequence of training examples. For example: when training data is first fed to the machine, accuracy is very low, but as the machine gains experience by playing again and again against itself or an opponent, the algorithm receives feedback and controls the chess game accordingly.

● The third important attribute is how well the training experience represents the distribution of examples over which the final performance will be measured. A machine learning algorithm gains experience by working through a number of different cases and examples; by passing through more and more examples, its performance increases.

Step 2- Choosing the Target Function: The next important step is choosing the target function. According to the knowledge fed to the algorithm, the machine learning system will choose a NextMove function that describes which legal move should be taken. For example: while playing chess against an opponent, when the opponent moves, the algorithm decides which of the possible legal moves to take in order to succeed.
Step 3- Choosing a Representation for the Target Function: Once the algorithm knows all the possible legal moves, the next step is to choose a representation for computing the optimized move, e.g. linear equations, a hierarchical graph representation, a tabular form, etc. The NextMove function then selects, out of the available moves, the one with the highest success rate. For example: if the machine has 4 possible chess moves, it will choose the optimized move that leads to success.

Step 4- Choosing a Function Approximation Algorithm: An optimized move cannot be chosen from the training data alone. The system has to work through a set of training examples; from each game it observes which moves were chosen and whether they led to failure or success, and from that feedback it estimates which step should be chosen next and what its success rate is.

Step 5- Final Design: The final design emerges after the system has gone through many examples, failures and successes, and correct and incorrect decisions. Example: Deep Blue, an intelligent chess-playing computer, won a match against the chess expert Garry Kasparov and became the first computer to defeat a reigning world chess champion.

‭Q4) What is the process of machine learning algorithm and its‬


‭testing in real life?‬
‭Q5) State and explain various types of data used in ML with‬
‭suitable examples.‬

‭Types of Data related to Machine Learning‬


Data types are a way of classification that specifies which type of value a variable can store and which mathematical, relational, or logical operations can be applied to the variable without causing an error. In machine learning, it is very important to know the appropriate data types of the independent and dependent variables, as this provides the basis for selecting classification or regression models. Incorrect identification of data types leads to incorrect modeling, which in turn leads to an incorrect solution.
Here we discuss the different data types with suitable examples.
‭Different Types of data types‬

1. Quantitative data type: –
This data type consists of numerical values: anything that is measured in numbers.
E.g., profit, quantity sold, height, weight, temperature, etc.
This is again of two types:

A.) Discrete data type: –
Numeric data that takes discrete values (whole numbers). Values of this type have no proper meaning when expressed in decimal format; they can be counted.
E.g.: number of cars you have, number of marbles in a container, students in a class, etc.


B.) Continuous data type: –
Numerical measures that can take any value within a certain range. Values of this type have true meaning when expressed in decimal format; they cannot be counted, only measured, and the number of possible values is infinite.
E.g.: height, weight, time, area, distance, measurement of rainfall, etc.

2. Qualitative data type: –
These are data types that cannot be expressed in numbers. They describe categories or groups and are hence known as categorical data types.
This can be divided into:
a. Structured data:
This type of data consists of numbers or words. It can take numerical values, but mathematical operations cannot be meaningfully performed on them. This type of data is expressed in tabular format.
E.g.: Sunny = 1, Cloudy = 2, Windy = 3, or binary-form data like 0 or 1, Good or Bad, etc.
b. Unstructured data:
This type of data does not have a proper format and is therefore known as unstructured data. It comprises textual data, sounds, images, videos, etc.


Besides this, there are other types, referred to as data type preliminaries or data measures. These can also be referred to as different scales of measurement.

I. Nominal Data Type:
This is used to express names or labels that have no order and are not measurable.
E.g., male or female (gender), race, country, etc.
‭II. Ordinal Data Type:‬
‭This is also a categorical data type like nominal data but has some‬
‭natural ordering associated with it.‬
‭E.g., Likert rating scale, Shirt sizes, Ranks, Grades, etc.‬


III. Interval Data Type:
This is numeric data that has a proper order, and the difference between values is meaningful, but there is no absolute zero: zero does not mean the complete absence of the quantity but still carries some value. This is a local scale.
E.g., temperature measured in degrees Celsius, time of day, SAT score, credit score, pH, etc.



‭IV. Ratio Data Type:‬


This quantitative data type is the same as the interval data type but has an absolute zero: here zero means complete absence, and the scale starts from zero. This is a global scale.
‭E.g., Temperature in Kelvin, height, weight, etc.‬
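To see why the scale of measurement matters before modeling, here is a small pandas sketch (values made up): the nominal feature gets one-hot encoding, since any numeric order would be meaningless, while the ordinal feature gets order-preserving integer codes:

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["IN", "US", "IN"],   # nominal: labels, no order
    "size":    ["S", "L", "M"],      # ordinal: S < M < L
})

# Nominal -> one-hot columns (no implied ordering).
nominal = pd.get_dummies(df["country"], prefix="country")

# Ordinal -> integer codes that preserve the natural ordering.
size_order = {"S": 0, "M": 1, "L": 2}
df["size_code"] = df["size"].map(size_order)
```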


‭Q6) Define the term dataset. What are the properties of dataset one‬
‭should consider while choosing dataset.‬

‭A Dataset is a set of data grouped into a collection with which‬


‭developers can work to meet their goals. In a dataset, the rows‬

‭represent the number of data points and the columns represent the‬
‭features of the Dataset. Datasets may vary in size and complexity and‬
‭they mostly require cleaning and preprocessing to ensure data quality‬

‭and suitability for analysis or modeling.‬


‭Let us see an example below:‬

‭This is the Iris dataset. Since this is a dataset with which we build‬

‭models, there are input features and output features. Here:‬
‭The input features are Sepal Length, Sepal Width, Petal Length, and‬
‭Petal Width.‬
‭Species is the output feature.‬
‭Datasets can be stored in multiple formats. The most common ones are‬
‭CSV, Excel, JSON, and zip files for large datasets such as image‬

‭datasets.‬

‭Why are datasets used?‬


‭Datasets are used to train and test AI models, analyze trends, and gain‬
‭insights from data. They provide the raw material for computers to‬

‭learn patterns and make predictions.‬

‭Types of Datasets‬
‭Numerical Dataset, Categorical Dataset, Web Dataset, Time series‬
‭Dataset, Image Dataset, Ordered Dataset, Partitioned Dataset,‬
‭File-Based Datasets, Bivariate Dataset, Multivariate Dataset‬
‭Data Interpretation‬
It means conducting a complete study of the data: analyzing the number of rows and columns, the data types, useful and redundant data, and checking for null values.
Based on this study, various operations are performed on the data to make it suitable for feeding into ML models, such as feature engineering, dimension reduction, imputation of null and missing values, and data type conversion by encoding methods.
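A minimal sketch of such data interpretation with pandas (the toy columns are assumptions, standing in for a real dataset):

```python
import numpy as np
import pandas as pd

# Toy data standing in for a real dataset (column names assumed).
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, np.nan],
    "species": ["setosa", "setosa", "virginica"],
})

n_rows, n_cols = df.shape        # number of rows and columns
dtypes = df.dtypes               # data type of each column
null_counts = df.isnull().sum()  # null values per column
```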

‭The choice of dataset can significantly impact the model's‬
‭performance, generalization, and the insights it can provide. Here are‬
‭some key considerations and steps to guide you in choosing the right‬

‭dataset for your machine learning project:‬
‭1. Define Your Problem and Objectives:‬
‭Start by clearly defining the problem you want to solve and the‬

‭objectives you want to achieve with your machine learning model.‬


‭Understanding the problem domain and the goals of your project is‬
‭essential for selecting an appropriate dataset.‬

‭2. Data Relevance:‬


‭Ensure that the dataset is relevant to your problem. It should contain‬

‭features (attributes) that are meaningful and related to the problem‬


‭you're trying to solve. Irrelevant or redundant features can introduce‬
‭noise and reduce model performance.‬

‭3. Data Size:‬


‭Consider the size of the dataset. In general, larger datasets tend to‬
‭produce more accurate and robust models, especially for complex‬
‭problems. However, collecting and processing large datasets can be‬
‭resource-intensive.‬

‭4. Data Quality:‬


‭Data quality is paramount. Check for missing values, outliers, and errors‬
‭in the dataset. Low-quality data can lead to biased or inaccurate‬


‭models. Data preprocessing may be required to clean and prepare the‬

‭dataset.‬

‭5. Data Balance:‬
‭For classification problems, check the class distribution. An imbalanced‬
‭dataset (where one class significantly outnumbers the others) can lead‬

‭to biased models. Techniques like oversampling or undersampling may‬
‭be needed to address class imbalance.‬
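One simple way to address class imbalance is random oversampling of the minority class; the sketch below uses scikit-learn's resample utility on synthetic labels:

```python
import numpy as np
from sklearn.utils import resample

# Synthetic imbalanced labels: 90 samples of class 0, 10 of class 1.
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 90 + [1] * 10)

# Randomly oversample the minority class up to the majority count.
X_min, y_min = X[y == 1], y[y == 1]
X_up, y_up = resample(X_min, y_min, replace=True, n_samples=90,
                      random_state=0)

X_bal = np.vstack([X[y == 0], X_up])
y_bal = np.concatenate([y[y == 0], y_up])
```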
‭6. Data Diversity:‬

‭Ensure that the dataset covers a diverse range of scenarios or‬


‭conditions relevant to your problem. Diversity helps the model‬
‭generalize better to unseen data.‬

‭7. Data Availability:‬


‭Consider the accessibility and availability of the dataset. Ensure that‬

‭you have the necessary permissions to use the data, and check for any‬
‭legal or ethical constraints.‬

‭8. Data Collection:‬


‭Depending on your problem, you may need to collect your own data‬
‭through surveys, sensors, web scraping, or other means. Be mindful of‬
‭data collection methods and ethics.‬
‭9. Public Datasets:‬
‭Explore publicly available datasets from sources like Kaggle, UCI‬
‭Machine Learning Repository, government databases, or academic‬
‭datasets. These datasets can be a valuable resource for‬
‭experimentation.‬


‭10. Domain Knowledge:‬
‭- Leverage domain knowledge and expertise in the field related to your‬

‭problem. Subject matter experts can guide you in selecting relevant‬
‭datasets and understanding the nuances of the data.‬

‭11. Data Exploration:‬


‭- Perform exploratory data analysis (EDA) to gain insights into the‬
‭dataset. Visualizations, summary statistics, and correlations can help‬
‭you understand the data's characteristics.‬

‭12. Data Splitting:‬


‭- Divide the dataset into training, validation, and testing sets. This is‬

‭crucial for model evaluation and preventing overfitting.‬
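Since scikit-learn's train_test_split produces only two parts, a common approach (shown on synthetic data) is to apply it twice, yielding a 60/20/20 train/validation/test split:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(200).reshape(-1, 1)
y = np.arange(200)

# First carve out the 20% test set...
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
# ...then split the remainder 75/25 into train and validation.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0)
```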

‭13. Ethical Considerations:‬



‭- Be aware of ethical considerations when working with data, especially‬


‭if the data contains sensitive information. Ensure that privacy and‬
‭ethical guidelines are followed.‬

‭14. Data Licensing:‬


‭- Check the licensing terms and restrictions associated with the‬
‭dataset. Some datasets may have specific usage terms that you need‬
‭to adhere to.‬

‭15. Iterative Process:‬


‭- Dataset selection is often an iterative process. You may need to‬


‭experiment with different datasets to find the one that works best‬

‭for your problem.‬

‭Q7) Compare supervised and unsupervised learning‬

| Feature | Supervised Learning | Unsupervised Learning |
|---|---|---|
| Definition | Learns from labeled data, where the correct output is known. | Learns from unlabeled data, discovering patterns on its own. |
| Objective | Predict outcomes or classify data into predefined categories. | Find hidden patterns, group data, or detect anomalies. |
| Training Data | Uses labeled data (data tagged with the correct output). | Uses unlabeled data (data without predefined labels). |
| Types of Problems | Classification (categorical outcomes); Regression (continuous outcomes) | Clustering (grouping based on similarity); Association (finding relationships) |
| Examples | Predicting if an email is spam or not; predicting house prices | Grouping customers by purchasing behavior; market basket analysis |
| Algorithms | Linear/Logistic Regression, Decision Trees, Support Vector Machines (SVM), Random Forests, Naive Bayes | K-Means Clustering, Hierarchical Clustering, Apriori Algorithm, DBSCAN |
| Evaluation Metrics | Accuracy, Precision, Recall, F1 Score, Mean Squared Error (MSE) | Silhouette Score, Adjusted Rand Index, Davies-Bouldin Index, Calinski-Harabasz Score |
| Presence of Output Labels | Yes, output labels are present. | No, output labels are not available. |
| Task | Maps input to a known output. | Identifies hidden patterns or structures in data. |
| Approach | Learns by example with guidance. | Learns without guidance, based on data's inherent structure. |
| Use Cases | Spam detection, fraud detection, speech recognition | Anomaly detection, market segmentation, network analysis |
| Advantages | Accurate predictions with well-labeled data; can handle complex tasks | Works with unlabeled data; finds hidden patterns automatically |
| Disadvantages | Requires labeled data; time-consuming to label data; can struggle with complex or dynamic environments | May produce less accurate results; harder to interpret and validate findings |
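The central difference between the two, learning with labels versus without, can be seen directly in scikit-learn: the classifier below is fitted on (X, y), while KMeans sees only X. A minimal sketch on the Iris data (the cluster count of 3 is chosen by us, not learned):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Supervised: the labels y guide the learning.
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Unsupervised: KMeans groups X into 3 clusters without seeing y.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

n_classes = len(set(clf.predict(X)))
n_clusters = len(set(km.labels_))
```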
‭Q8) Justify the need of data mining in machine learning.‬

‭Data mining‬‭and‬‭machine learning‬‭are closely interrelated,‬‭and data‬


‭mining is crucial for the success of machine learning algorithms. Here's‬

‭a justification for the need of data mining in machine learning:‬

‭1. Extracting Relevant Patterns and Knowledge from Large Datasets‬



‭●‬ ‭Machine learning‬‭algorithms rely heavily on clean,‬


‭well-structured, and relevant data to learn and make predictions.‬
‭Data mining‬‭helps in identifying and extracting useful‬‭patterns,‬
‭relationships, and trends from vast amounts of raw data, turning‬
‭it into meaningful information. For example, it can identify‬
‭patterns in customer behavior, fraud detection, or market‬
‭trends, which are valuable inputs for machine learning models.‬

‭2. Data Preprocessing‬

‭●‬ ‭Before machine learning algorithms can be applied, the raw data‬
‭must be preprocessed to handle missing values, outliers, and‬


‭irrelevant features.‬‭Data mining techniques‬‭like data‬‭cleaning,‬

‭transformation, and normalization ensure the dataset is of high‬
‭quality and suitable for training machine learning models.‬

‭●‬ ‭For example, in customer data, irrelevant features such as‬
‭unrelated columns or inconsistencies in user details can negatively‬
‭affect model accuracy. Data mining helps clean and prepare such‬
‭data.‬
‭3. Feature Selection and Engineering‬
‭●‬ ‭The performance of machine learning algorithms significantly‬

‭depends on the choice of features.‬‭Data mining‬‭techniques‬‭assist‬


‭in feature selection (choosing the most relevant variables) and‬
‭feature engineering (creating new useful features from existing‬

‭ones). This improves the performance and accuracy of machine‬


‭learning models.‬
‭●‬ ‭Example: In predictive analytics for loan approvals, data mining‬

‭can help identify key features like income, credit score, and loan‬
‭history, which are most relevant for making predictions.‬

‭4. Handling Unstructured Data‬

‭●‬ ‭A large portion of data, such as text, images, and videos, is‬
‭unstructured.‬‭Data mining‬‭techniques can be used to‬‭extract‬
‭structured information from this data, which is essential for‬
‭feeding into machine learning models.‬
‭●‬ ‭Example: Mining text data to extract useful features such as‬
‭sentiment or topic, which can be used in natural language‬
‭processing (NLP) tasks like sentiment analysis or recommendation‬
‭systems.‬


‭5. Discovering Hidden Patterns in Unlabeled Data‬

‭●‬ ‭Unsupervised learning‬‭methods (such as clustering or association)‬

‭are used in machine learning to find hidden patterns in data‬
‭without labels.‬‭Data mining‬‭techniques help in uncovering‬‭these‬
‭patterns, associations, or clusters, which machine learning models‬

‭can use to improve their decision-making processes.‬
‭●‬ ‭Example: In market segmentation, data mining might discover‬
‭clusters of customers with similar buying habits, which can then‬
‭be used for personalized marketing strategies.‬

‭6. Reducing Dimensionality‬

‭●‬ ‭Data mining‬‭can help in dimensionality reduction, which reduces‬



‭the number of variables while preserving the important‬


‭information. This is essential for machine learning, especially‬
‭when dealing with high-dimensional data, where too many features‬

‭can lead to overfitting or slow processing.‬


‭●‬ ‭Example: Techniques like‬‭Principal Component Analysis‬‭(PCA)‬‭or‬
‭Singular Value Decomposition (SVD)‬‭are often used‬‭in data‬
‭mining to reduce dimensions, making machine learning algorithms‬
‭more efficient.‬
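As a minimal sketch of the PCA technique mentioned above, scikit-learn can project the 4 Iris features onto 2 principal components:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)   # 150 samples, 4 features

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)    # project onto 2 components

# Fraction of the original variance the 2 components preserve.
explained = pca.explained_variance_ratio_.sum()
```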

‭7. Improving Accuracy and Predictive Power‬


‭●‬ ‭By extracting useful patterns, relationships, and trends from raw‬
‭data,‬‭data mining‬‭enhances the accuracy and predictive‬‭power of‬
‭machine learning models. Data mining helps in identifying the most‬
‭relevant factors that can influence outcomes, thereby refining‬
‭the learning process.‬
‭●‬ ‭Example: In fraud detection, data mining can uncover subtle‬


‭patterns in transaction data that machine learning algorithms can‬

‭use to accurately predict and detect fraud.‬

‭8. Knowledge Discovery in Databases (KDD)‬

‭●‬ ‭Machine learning is often a part of the broader process called‬
‭Knowledge Discovery in Databases (KDD)‬‭, where data‬‭mining‬

‭plays a pivotal role. Data mining techniques are essential to‬
‭discover previously unknown relationships or patterns from large‬
‭datasets, which can then be leveraged by machine learning models‬
‭for prediction and classification.‬

‭●‬ ‭Example: In medical research, data mining can help discover‬


‭hidden correlations between symptoms and diseases, leading to‬
‭the development of better predictive healthcare models using‬

‭machine learning.‬

‭9. Handling Noisy and Incomplete Data‬



‭●‬ ‭Real-world data is often noisy or incomplete, which can degrade‬


‭the performance of machine learning models.‬‭Data mining‬
‭techniques are designed to clean and handle such data, ensuring‬
‭that the final dataset used for machine learning is reliable and‬
‭robust.‬
‭●‬ ‭Example: In financial data, missing values or errors in‬
‭transactional records can be corrected or imputed through data‬
‭mining techniques before applying machine learning for credit risk‬
‭assessment.‬
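A minimal sketch of the imputation idea with scikit-learn's SimpleImputer; the transaction amounts below are made up for illustration:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical transaction amounts with missing entries.
amounts = np.array([[100.0], [np.nan], [300.0], [np.nan]])

# Replace each NaN with the column mean before model training.
imputer = SimpleImputer(strategy="mean")
cleaned = imputer.fit_transform(amounts)
```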

‭10. Efficiency in Dealing with Large-Scale Data‬


‭●‬ ‭With the growing amount of data generated every day (e.g., from‬

‭IoT devices, social media, or transactions),‬‭data‬‭mining‬‭is‬
‭essential for handling and processing large-scale datasets‬

‭efficiently. Machine learning models often need summarized or‬
‭reduced data to operate effectively, which data mining provides‬
‭through aggregation and summarization techniques.‬

‭●‬ ‭Example: In social media analysis, mining relevant social posts‬
‭from millions of records and summarizing them for machine‬
‭learning sentiment analysis or trend prediction.‬

‭Conclusion:‬

‭Data mining‬‭serves as a crucial preprocessing step and knowledge‬


‭discovery tool in the machine learning pipeline. It ensures that the raw‬

‭data is converted into a form that machine learning algorithms can‬


‭effectively learn from, enhancing their predictive accuracy, efficiency,‬
‭and relevance. Without data mining, machine learning would struggle to‬

‭handle large-scale, unstructured, and noisy datasets, limiting its‬


‭real-world applicability.‬

‭Q9) As a researcher you are expected to work with detection of‬


‭COVID at early stage. What kind of data and types of data you‬
‭will select? What kind of properties you will consider while choosing‬
‭data and dataset.‬
‭Main detection features will be the following 3:‬
‭Antibody test (IgG)‬
‭Antibody testing is also known as serological testing. Your doctor or‬
‭medical laboratory technician will use it to examine the type of‬
‭antibodies present in your blood.‬


‭There are numerous antibodies in the blood. The technician or nurse‬
‭will collect a sample of your blood and examine it for IgM and IgG. Ig‬

‭stands for an immunoglobulin molecule.‬
‭● IgM antibodies develop at an early stage of infection against‬
‭SARS-CoV-2.‬

‭● IgG antibodies develop against SARS-CoV-2 once the person has‬
‭recovered from coronavirus.‬
‭Results by Antibody Test (IgG)‬

‭The antibody testing kits take around 30-60 minutes to show results.‬

‭Reverse Transcription Polymerase Chain Reaction (RT – PCR)‬



A polymerase chain reaction test is a highly sensitive test. Due to its increased sensitivity and high fidelity, it is known as the most accurate testing method for COVID-19 to date. It works by detecting the presence of genetic material from a specific pathogen.

‭Results by RT-PCR‬
RT-PCR is capable of delivering an accurate diagnosis and result for COVID-19 within 3 hours, though laboratories typically take 6-8 hours to derive a conclusive result.
‭TrueNat‬
‭TrueNat is a chip-based, portable RT-PCR machine that was initially‬
‭developed to diagnose tuberculosis. You can confirm your sample using‬
‭confirmatory tests for SARS-CoV-2 if you test positive by TrueNat‬
‭Beta CoV.‬


‭Results by TrueNat‬

‭It is capable of producing faster results than standard RT-PCR tests.‬

Aside from these, patient demographics and the time taken for testing and reporting will also be recorded and used for detecting underlying patterns. These underlying patterns can be used in statistics to generate hypotheses and theories.
‭Properties of the data will be the same as Q6.‬

‭Q10) What is the need of regression? Describe various types of‬


‭the same.‬

‭●‬ ‭Regression Analysis is a statistical process for estimating the‬


‭relationships between the dependent variables or criterion‬
‭variables and one or more independent variables or predictors.‬

‭●‬ ‭Regression analysis is generally used when we deal with a dataset‬


‭that has the target variable in the form of continuous data.‬
‭Regression analysis explains the changes in criteria about changes‬
‭in select predictors.‬
● Regression estimates the conditional expectation of the criterion given the predictors, i.e. the average value of the dependent variable when the independent variables are held at given values.
‭●‬ ‭Three major uses for regression analysis are determining the‬
‭strength of predictors, forecasting an effect, and trend‬
‭forecasting.‬
‭●‬ ‭There are times when we would like to analyze the effect of‬
‭different independent features on the target or what we say‬
‭dependent features. This helps us make decisions that can affect‬


‭the target variable in the desired direction.‬

● Regression analysis is heavily based on statistics and hence gives quite reliable results. For this reason, regression models are used to find linear as well as non-linear relations between the independent variables and the dependent (target) variable.

‭Types of Regression are as follows:‬
‭●‬ ‭Linear regression‬‭is used for predictive analysis. Linear‬
‭regression is a linear approach for modeling the relationship‬
‭between the criterion or the scalar response and the multiple‬

‭predictors or explanatory variables. Linear regression focuses on‬


‭the conditional probability distribution of the response given the‬
‭values of the predictors. The formula for linear regression is: y =‬

‭θx + b‬
‭●‬ ‭Polynomial Regression:‬‭This is an extension of linear regression‬
‭and is used to model a non-linear relationship between the‬

‭dependent variable and independent variables. Here as well‬


‭syntax remains the same but now in the input variables we include‬
‭some polynomial or higher degree terms of some already existing‬
‭features as well. Linear regression was only able to fit a linear‬
‭model to the data at hand but with polynomial features, we can‬
‭easily fit some non-linear relationship between the target as well‬
‭as input features.‬
‭●‬ ‭Stepwise regression‬‭is used for fitting regression models with‬
‭predictive models. It is carried out automatically. With each step,‬
‭the variable is added or subtracted from the set of explanatory‬
‭variables. The approaches for stepwise regression are forward selection, backward elimination, and bidirectional elimination.‬


‭●‬ ‭Decision Tree Regression:‬‭A Decision Tree is the most powerful‬

‭and popular tool for classification and prediction. A Decision tree‬
‭is a flowchart-like tree structure, where each internal node‬

‭denotes a test on an attribute, each branch represents an‬
‭outcome of the test, and each leaf node (terminal node) holds a‬
‭class label. There is a non-parametric method used to model a‬

‭decision tree to predict a continuous outcome.‬
‭●‬ ‭Random Forest‬‭is an ensemble technique capable of performing‬
‭both regression and classification tasks with the use of multiple‬
‭decision trees and a technique called Bootstrap and Aggregation,‬

‭commonly known as bagging. The basic idea behind this is to‬


‭combine multiple decision trees in determining the final output‬
‭rather than relying on individual decision trees.‬

‭●‬ ‭Support vector regression (SVR)‬‭is a type of support vector‬


‭machine (SVM) that is used for regression tasks. It tries to find‬
‭a function that best predicts the continuous output value for a‬

‭given input value. SVR can use both linear and non-linear kernels. A‬
‭linear kernel is a simple dot product between two input vectors,‬
‭while a non-linear kernel is a more complex function that can‬
‭capture more intricate patterns in the data. The choice of kernel‬
‭depends on the data’s characteristics and the task’s complexity.‬
‭●‬ ‭Ridge Regression‬‭: Ridge regression is a technique for analyzing‬
‭multiple regression data. When multicollinearity occurs, least squares estimates are unbiased, but their variances are large, so they may be far from the true value. This is a regularized linear‬
‭regression model, it tries to reduce the model complexity by‬
‭adding a penalty term to the cost function. A degree of bias is‬
‭added to the regression estimates, and as a result, ridge‬


‭regression reduces the standard errors.‬

‭●‬ ‭Lasso regression‬‭is a regression analysis method that performs‬

‭both variable selection and regularization. Lasso regression uses‬
‭soft thresholding. Lasso regression selects only a subset of the‬

‭provided covariates for use in the final model. This is another‬
‭regularized linear regression model, it works by adding a penalty‬
‭term to the cost function, but it tends to zero out some features’‬
‭coefficients, which makes it useful for feature selection.‬
‭●‬ ‭ElasticNet Regression‬‭: Linear Regression suffers from‬

‭overfitting and can’t deal with collinear data. When there are‬
‭many features in the dataset and even some of them are not‬

‭relevant to the predictive model. This makes the model more‬


‭complex with a too-inaccurate prediction on the test set (or‬
‭overfitting). Such a model with high variance does not generalize‬

‭on the new data. So, to deal with these issues, we include both‬
‭L-2 and L-1 norm regularization to get the benefits of both Ridge‬
‭and Lasso at the same time. The resultant model has better‬
‭predictive power than Lasso‬
‭●‬ ‭Bayesian Linear Regression‬‭: As the name suggests, this algorithm is based purely on Bayes' Theorem, which is why the Least Squares method is not used to determine the coefficients of the regression model. Instead, the model weights and parameters are estimated from their posterior distribution, which gives the resulting regression model an extra degree of stability.‬
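The simple linear case above, y = θx + b, has a closed-form least-squares solution: θ = cov(x, y)/var(x) and b = mean(y) − θ·mean(x). A minimal pure-Python sketch on made-up data (values and variable names are illustrative, not from the notes):

```python
# Closed-form simple linear regression:
# theta = cov(x, y) / var(x),  b = mean(y) - theta * mean(x)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]      # made-up data generated from y = 2x + 1

mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
theta = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b = my - theta * mx
print(theta, b)   # 2.0 1.0
```

On noisy data the same formulas give the best-fit line rather than an exact recovery; regularized variants such as Ridge and Lasso modify this least-squares objective with a penalty term.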


‭Q11) Problems based on regression‬

‭Numerical PDF‬

‭Q12) Note on polynomial regression‬

‭Polynomial Regression‬

‭Polynomial Regression is a regression algorithm that models the‬
‭relationship between a dependent(y) and independent variable(x) as nth‬
‭degree polynomial. The Polynomial Regression equation is given below:‬
y = b0 + b1x + b2x^2 + b3x^3 + ... + bnx^n

‭It is also called a special case of Multiple Linear Regression in ML, because we add some polynomial terms to the Multiple Linear Regression equation to convert it into Polynomial Regression.‬
‭It is a linear model with some modification in order to increase the‬

‭accuracy. The dataset used in Polynomial regression for training is of‬


‭non-linear nature. It makes use of a linear regression model to fit the‬
‭complicated and nonlinear functions and datasets.‬
‭Hence, "In Polynomial regression, the original features are converted‬
‭into Polynomial features of required degree (2,3,..,n) and then modeled‬
‭using a linear model."‬

‭Need for Polynomial Regression:‬


‭●‬ ‭If we apply a linear model on a linear dataset, then it provides us‬


‭a good result as we have seen in Simple Linear Regression, but if‬

‭we apply the same model without any modification on a non-linear‬
‭dataset, then it will produce drastically poor results: the loss function will increase, the error rate will be high, and accuracy will decrease.‬
‭●‬ ‭So for such cases, where data points are arranged in a non-linear‬

‭fashion, we need the Polynomial Regression model. We can‬
‭understand it in a better way using the below comparison diagram‬
‭of the linear dataset and non-linear dataset.‬

‭In the above image, we have taken a dataset which is arranged‬


‭non-linearly. So if we try to cover it with a linear model, then we can‬
‭clearly see that it hardly covers any data point. On the other hand, a‬
‭curve covering most of the data points is more suitable, and this is what the Polynomial model provides.‬
‭Hence, if the datasets are arranged in a non-linear fashion, then we‬
‭should use the Polynomial Regression model instead of Simple Linear‬
‭Regression.‬

‭Note: A Polynomial Regression algorithm is also called Polynomial Linear Regression because the model is linear in the coefficients, even though it is non-linear in the input variable.‬

‭Equation of the Polynomial Regression Model:‬
‭Simple Linear Regression equation:‬

‭y = b0+b1x‬
‭Polynomial Regression equation:‬
y = b0 + b1x + b2x^2 + b3x^3 + ... + bnx^n
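Because the polynomial equation is linear in the coefficients b0…bn, fitting it is an ordinary linear least-squares problem on the expanded features [1, x, x^2, …]. A minimal pure-Python sketch for degree 2 on made-up data (all values and helper names are illustrative; with three points the fit interpolates exactly, so a tiny Gaussian elimination suffices):

```python
# Degree-2 polynomial regression as a linear model on features [1, x, x^2]
xs = [0.0, 1.0, 2.0]
ys = [1.0, 4.0, 9.0]                 # made-up data from y = (x + 1)^2 = 1 + 2x + x^2

A = [[1.0, x, x * x] for x in xs]    # design matrix of polynomial features

def solve(A, y):
    # Gaussian elimination with partial pivoting for a small square system
    n = len(A)
    M = [row[:] + [v] for row, v in zip(A, y)]
    for i in range(n):                          # forward elimination
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            M[r] = [a - f * b for a, b in zip(M[r], M[i])]
    x = [0.0] * n
    for i in reversed(range(n)):                # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

b0, b1, b2 = solve(A, ys)
print(b0, b1, b2)   # recovers 1.0 2.0 1.0
```

The model stays "linear" because only the features were transformed; the solver is the same one a plain linear regression would use.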

‭Q13) Identify the domain where you can apply linear regression and‬
‭polynomial regression‬

Domain-wise applications (Linear Regression vs. Polynomial Regression):

●	Economics and Finance — Linear: stock price prediction, sales forecasting, house price prediction. Polynomial: modeling non-linear stock price trends, demand forecasting with non-linear growth.
●	Healthcare — Linear: medical cost prediction, predicting BMI based on height/weight. Polynomial: disease progression modeling (e.g., cancer growth), drug efficacy over time.
●	Engineering — Linear: energy consumption prediction based on usage patterns. Polynomial: trajectory modeling in physics, heat transfer in complex systems.
●	Social Sciences — Linear: income vs. education level, predicting population trends. Polynomial: modeling complex social behaviors (e.g., crime rate fluctuations).
●	Marketing and Advertising — Linear: customer spending prediction, pricing models. Polynomial: customer lifetime value with non-linear trends, advanced pricing models with curve fitting.
●	Environmental Science — Linear: temperature vs. energy consumption. Polynomial: climate change modeling, pollution level prediction (complex interactions).
●	Agriculture — Linear: predicting crop yield based on linear factors like rainfall. Polynomial: modeling crop yield considering non-linear factors like soil fertility changes over time.
●	Physics and Mechanics — Linear: simple force or speed predictions. Polynomial: complex motion trajectories, fatigue testing of materials.
●	Education — Linear: predicting student performance based on study hours. Polynomial: modeling non-linear trends in student learning behavior.
●	Real Estate — Linear: house price prediction based on linear factors (e.g., size, location). Polynomial: real estate value prediction based on complex, non-linear factors (e.g., proximity to future development).

‭Q14) What is reinforcement learning? How is it different from‬


‭supervised and unsupervised learning?‬

‭Reinforcement Learning (RL)‬‭is a branch of machine learning focused on‬

‭making decisions to maximize cumulative rewards in a given situation.‬
‭Unlike supervised learning, which relies on a training dataset with‬

‭predefined answers, RL involves learning through experience. In RL, an‬
‭agent learns to achieve a goal in an uncertain, potentially complex‬
‭environment by performing actions and receiving feedback through‬
‭rewards or penalties.‬
ah

‭Key Concepts of Reinforcement Learning‬


‭Agent‬‭: The learner or decision-maker.‬

‭Environment‬‭: Everything the agent interacts with.‬


‭State‬‭: A specific situation in which the agent finds itself.‬
‭Action‬‭: All possible moves the agent can make.‬

‭Reward‬‭: Feedback from the environment based on the action taken.‬

‭How Reinforcement Learning Works‬


‭RL operates on the principle of learning optimal behavior through trial‬
‭and error. The agent takes actions within the environment, receives‬
‭rewards or penalties, and adjusts its behavior to maximize the‬
‭cumulative reward. This learning process is characterized by the‬
‭following elements:‬

‭Policy‬‭: A strategy used by the agent to determine the next action‬


‭based on the current state.‬
‭Reward Function:‬‭A function that provides a scalar feedback signal‬


‭based on the state and action.‬

‭Value Function:‬‭A function that estimates the expected cumulative‬
‭reward from a given state.‬

‭Model of the Environment‬‭: A representation of the environment that‬
‭helps in planning by predicting future states and rewards.‬

‭Example: Navigating a Maze‬


‭The problem is as follows: We have an agent and a reward, with many‬
‭hurdles in between. The agent is supposed to find the best possible‬
‭path to reach the reward. The following problem explains the problem‬

‭more easily.‬
‭The above image shows the robot, diamond, and fire. The goal of the‬
‭robot is to get the reward that is the diamond and avoid the hurdles‬
‭which are the fire. The robot learns by trying all the possible paths and then choosing the path which gives it the reward with the least‬
‭hurdles. Each right step will give the robot a reward and each wrong‬
‭step will subtract the reward of the robot. The total reward will be‬


‭calculated when it reaches the final reward that is the diamond.‬

‭Main points in Reinforcement learning –‬

‭●‬ ‭Input‬‭: The input should be an initial state from which the model‬
‭will start‬
‭●‬ ‭Output‬‭: There are many possible outputs as there are a variety‬

‭of solutions to a particular problem‬
‭●‬ ‭Training‬‭: The training is based upon the input. The model will return a state, and the user will decide to reward or punish the model based on its output.‬

‭●‬ ‭The model continues to learn.‬


‭●‬ ‭The best solution is decided based on the maximum reward.‬
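The trial-and-error loop above can be sketched with tabular Q-learning on a toy, hypothetical environment — a 5-state corridor where the only reward sits at the right end (the states, reward, and hyperparameters are all made up for illustration):

```python
import random
random.seed(0)

# Toy corridor: states 0..4, reward only on reaching the goal state 4
n_states, goal = 5, 4
actions = [-1, +1]                      # move left / move right
Q = [[0.0, 0.0] for _ in range(n_states)]
alpha, gamma, eps = 0.5, 0.9, 0.2       # learning rate, discount, exploration

for _ in range(500):                    # episodes of trial and error
    s = 0
    while s != goal:
        # epsilon-greedy: mostly exploit the best known action, sometimes explore
        a = random.randrange(2) if random.random() < eps else Q[s].index(max(Q[s]))
        s2 = min(max(s + actions[a], 0), goal)
        r = 1.0 if s2 == goal else 0.0  # reward granted only at the goal
        # Q-learning update toward reward plus discounted best future value
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

policy = [Q[s].index(max(Q[s])) for s in range(goal)]
print(policy)   # [1, 1, 1, 1] — the learned policy always moves right
```

After enough episodes the cumulative-reward-maximizing action (move right) dominates in every state, which is exactly the "best solution decided by maximum reward" described above.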

Feature-wise comparison (Supervised vs. Unsupervised vs. Reinforcement Learning):

●	Definition — Supervised: learning from labeled data, with the correct output provided for each input. Unsupervised: learning from unlabeled data, finding hidden patterns or structures. Reinforcement: learning through interaction with an environment, using rewards and penalties.
●	Input Data — Supervised: labeled data (input-output pairs). Unsupervised: unlabeled data (no explicit output labels). Reinforcement: the environment provides states, and the agent chooses actions to receive feedback (rewards/penalties).
●	Objective — Supervised: predict the correct label for new, unseen data. Unsupervised: discover hidden patterns or groupings in the data. Reinforcement: maximize the cumulative reward by learning the best sequence of actions.
●	Learning Process — Supervised: the model is trained by minimizing the difference between predictions and true labels (error). Unsupervised: the model organizes data based on similarity, without specific guidance. Reinforcement: the agent learns through trial and error, receiving feedback for its actions and adjusting its strategy.
●	Data Dependency — Supervised: requires large amounts of labeled data for training. Unsupervised: does not require labeled data; focuses on exploring data structure. Reinforcement: data comes from continuous interaction with an environment (dynamic and sequential).
●	Common Algorithms — Supervised: Linear Regression, Decision Trees, Random Forests, Support Vector Machines (SVMs). Unsupervised: K-Means Clustering, Hierarchical Clustering, Principal Component Analysis (PCA). Reinforcement: Q-Learning, Deep Q-Network (DQN), SARSA, Proximal Policy Optimization (PPO).
●	Example Applications — Supervised: spam detection, image classification, medical diagnosis. Unsupervised: customer segmentation, anomaly detection, market basket analysis, personalized recommendations. Reinforcement: robotics, game AI (e.g., AlphaGo), autonomous driving.
●	Type of Feedback — Supervised: explicit feedback in the form of labeled data (correct/incorrect labels). Unsupervised: no feedback; the model self-discovers patterns in the data. Reinforcement: a reward signal or penalty after each action (delayed feedback).
●	Task Type — Supervised: classification, regression. Unsupervised: clustering, association. Reinforcement: sequential decision-making tasks.
●	Advantages — Supervised: provides accurate predictions with well-labeled data; clear evaluation metrics (accuracy, precision, recall, etc.). Unsupervised: can work with unlabeled data, which is more readily available; can reveal unknown patterns in the data. Reinforcement: learns complex strategies, adapts to dynamic environments, and maximizes long-term rewards.
●	Disadvantages — Supervised: requires labeled data, which can be costly and time-consuming to obtain. Unsupervised: harder to evaluate performance without clear labels; can be less interpretable. Reinforcement: requires a large amount of trial and error; may struggle with long-term planning due to delayed rewards.
●	Real-World Example — Supervised: predicting house prices, classifying emails as spam/not spam. Unsupervised: grouping customers into segments for targeted marketing, product recommendations. Reinforcement: training a robot to navigate a room, self-driving cars learning to avoid obstacles.


‭Q15) Compare polynomial and linear regression‬

Feature-wise comparison (Linear Regression vs. Polynomial Regression):

●	Definition — Linear: models the relationship between the dependent and independent variables as a straight line. Polynomial: models the relationship between the dependent and independent variables as a polynomial curve.
●	Equation — Linear: y = b0 + b1x. Polynomial: y = b0 + b1x + b2x^2 + ... + bnx^n.
●	Type of Relationship — Linear: assumes a linear relationship between variables. Polynomial: assumes a non-linear relationship that can be represented as a polynomial.
●	Complexity — Linear: simple, requires fewer computational resources. Polynomial: more complex, as higher-degree polynomials increase the model's complexity.
●	Fitting Ability — Linear: fits straight lines to data; useful for linearly separable data. Polynomial: fits curves to data; better for capturing more complex, non-linear trends.
●	Overfitting Risk — Linear: lower risk of overfitting, especially for small datasets. Polynomial: higher risk of overfitting with high-degree polynomials.
●	Applications — Linear: stock price prediction, sales forecasting, house price prediction, medical cost prediction. Polynomial: trajectory modeling, disease progression, climate change modeling, crop yield prediction.
●	Interpretability — Linear: easier to interpret and understand, as the relationship is simple. Polynomial: can be harder to interpret as the complexity of the curve increases.
●	Handling of Data Patterns — Linear: best for linear data patterns where a straight-line approximation is sufficient. Polynomial: suitable for non-linear data patterns where curves better fit the data.
●	Example — Linear: predicting house prices based on features like area, number of rooms, etc. Polynomial: modeling population growth trends or predicting complex physics-based trajectories.
●	Computational Efficiency — Linear: more computationally efficient and faster. Polynomial: requires more computational power, especially for high-degree polynomials.
●	Overfitting Prevention — Linear: less prone to overfitting; works well with small data sizes. Polynomial: needs regularization techniques (e.g., Lasso or Ridge) to avoid overfitting.


‭Unit 2‬

‭Q1) Define the term classification in machine learning by providing‬
‭three real life examples‬

‭Classification:‬‭A classification problem is when the‬‭output variable is a‬
‭category, such as “Red” or “blue” , “disease” or “no disease”.‬
‭Classification is a type of supervised learning that is used to predict‬

‭categorical values, such as whether a customer will churn or not,‬


‭whether an email is spam or not, or whether a medical image shows a‬
‭tumor or not. Classification algorithms learn a function that maps from‬

‭the input features to a probability distribution over the output classes.‬

‭Some common classification algorithms include:‬



‭Logistic Regression, Support Vector Machines, Decision Trees, Random‬


‭Forests, Naive Baye‬

‭Evaluation Metrics of Classification:‬


‭Accuracy, Precision, Recall, F1 Score, Confusion Matrix‬
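All of these metrics can be computed from the confusion-matrix counts (true/false positives and negatives). A minimal sketch with made-up binary predictions (illustrative data only):

```python
# Made-up binary labels: 1 = positive class (e.g., "spam"), 0 = negative class
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]

# Confusion-matrix counts
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # 3
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # 4
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # 1
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # 2

accuracy = (tp + tn) / len(y_true)                  # 0.7
precision = tp / (tp + fp)                          # 0.75
recall = tp / (tp + fn)                             # 0.6
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
print(accuracy, precision, recall, round(f1, 3))
```

Note how precision and recall pull in different directions: the two missed positives hurt recall, while the one false alarm hurts precision; F1 balances both.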
‭Advantages of Supervised Learning:‬

‭1. Since Supervised Learning works with a data set, we can have an exact idea about the classes of objects.‬
‭2. These algorithms are helpful in predicting the output based on prior experience.‬


‭Disadvantages of Supervised Learning:‬

‭1. These algorithms are not able to solve complex problems.‬
‭2. It may predict the wrong output if the test data is different from‬
‭the training data.‬

‭3. It requires a lot of computational time to train the algorithm.‬
‭Applications of Supervised Learning:‬

‭1. Email Spam Detection‬



‭●‬ ‭Application‬‭: Email services like Gmail and Outlook use‬


‭classification algorithms to automatically filter out spam emails‬

‭from the inbox.‬


‭●‬ ‭How it Works‬‭: A classification model is trained on a labeled‬
‭dataset of emails, where each email is marked as either "spam" or‬

‭"not spam." The model learns patterns from the email content,‬
‭sender information, and other metadata to classify incoming‬
‭emails.‬
‭●‬ ‭Classification Type‬‭: Binary classification (Spam/Not‬‭Spam).‬

‭2. Medical Diagnosis‬


‭●‬ ‭Application‬‭: In healthcare, classification algorithms are used to‬
‭diagnose diseases based on patient data such as symptoms, test‬
‭results, and medical history.‬
‭●‬ ‭How it Works‬‭: For instance, a model trained on labeled‬‭medical‬
‭datasets can classify whether a patient has a specific disease‬
‭(e.g., diabetes, cancer) or not, based on input features like blood‬


‭sugar levels, age, weight, and more.‬

‭●‬ ‭Classification Type‬‭: Multi-class classification (e.g.,‬‭Disease A,‬
‭Disease B, or No Disease).‬

‭3. Credit Card Fraud Detection‬

‭●‬ ‭Application‬‭: Financial institutions use classification‬‭to detect‬

‭fraudulent credit card transactions.‬
‭●‬ ‭How it Works‬‭: A classification model is trained on past‬
‭transaction data, where transactions are labeled as either‬
‭"fraudulent" or "legitimate." The model learns patterns and can‬

‭flag suspicious transactions for further investigation.‬


‭●‬ ‭Classification Type‬‭: Binary classification (Fraud/Legitimate).‬

‭Q2) How can one differentiate classification and clustering?‬

Feature-wise comparison (Classification vs. Clustering):

●	Definition — Classification: assigns predefined labels to data points based on training data. Clustering: groups data points into clusters based on similarity, without predefined labels.
●	Type of Learning — Classification: supervised learning (requires labeled data). Clustering: unsupervised learning (works with unlabeled data).
●	Objective — Classification: predict the category/class for new data points. Clustering: discover hidden patterns or structures in data by grouping similar items.
●	Input Data — Classification: labeled data (with known class labels). Clustering: unlabeled data (no prior knowledge of classes).
●	Output — Classification: discrete class labels (e.g., "spam" or "not spam"). Clustering: groupings or clusters of data points (e.g., cluster 1, cluster 2).
●	Common Algorithms — Classification: Logistic Regression, Decision Trees, Support Vector Machines (SVMs). Clustering: K-Means Clustering, Hierarchical Clustering, DBSCAN.
●	Evaluation Metrics — Classification: Accuracy, Precision, Recall, F1 Score. Clustering: Silhouette Score, Davies-Bouldin Index, Calinski-Harabasz Score.
●	Real-World Example — Classification: email spam detection (spam/not spam), disease diagnosis (disease/no disease). Clustering: customer segmentation, image segmentation.
●	Data Dependency — Classification: requires labeled data for training. Clustering: no need for labeled data; it groups data based on similarities.
●	Output Interpretation — Classification: predicts a class or category based on learned patterns. Clustering: assigns data points to clusters based on their relative distance or similarity.
●	Number of Categories — Classification: known in advance (e.g., two classes for binary classification). Clustering: the number of clusters may or may not be known and can vary.
●	Examples of Use — Classification: fraud detection, medical diagnosis, sentiment analysis. Clustering: market segmentation, social network analysis, anomaly detection.
●	Advantages — Classification: clear, interpretable results; can handle complex labeled data. Clustering: does not require labeled data; can reveal hidden patterns in the data.
●	Disadvantages — Classification: requires labeled data, which can be costly to obtain; not effective for discovering unknown patterns. Clustering: results are harder to interpret; sensitive to noise and data distribution.
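To make the contrast concrete: clustering receives no labels and must discover the groups itself. A minimal 1-D k-means sketch on made-up, unlabeled data (values and initialization are illustrative):

```python
# Clustering needs no labels — k-means discovers two groups on its own
data = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centers = [data[0], data[-1]]        # crude initialization from the data itself
for _ in range(10):                  # Lloyd's iterations: assign, then re-center
    groups = [[], []]
    for x in data:
        # assign each point to its nearest center (bool indexes as 0/1)
        groups[abs(x - centers[0]) > abs(x - centers[1])].append(x)
    centers = [sum(g) / len(g) for g in groups]
print(centers)   # [1.5, 8.5]
```

A classifier solving the same problem would instead need each point tagged with its group in advance, then learn a rule mapping value to tag.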

‭Q3) Explain the working of random forest by giving example‬

‭Random Forest is a popular machine learning algorithm that belongs to‬


‭the supervised learning technique. It can be used for both‬
‭Classification and Regression problems in ML. It is based on the‬
‭concept of ensemble learning, which is a process of combining multiple‬
‭classifiers to solve a complex problem and to improve the performance‬
‭of the model.‬

‭As the name suggests, "Random Forest is a classifier that contains a‬


‭number of decision trees on various subsets of the given dataset and‬


‭takes the average to improve the predictive accuracy of that dataset."‬

‭Instead of relying on one decision tree, the random forest takes the prediction from each tree and, based on the majority vote of those predictions, predicts the final output.‬

‭The greater number of trees in the forest leads to higher accuracy‬

‭and prevents the problem of overfitting.‬
‭The below diagram explains the working of the Random Forest‬
‭algorithm:‬
‭There are two assumptions for a better Random forest classifier:‬
‭1.‬ ‭There should be some actual values in the feature variable of the‬
‭dataset so that the classifier can predict accurate results rather‬
‭than a guessed result.‬
‭2.‬ ‭The predictions from each tree must have very low correlations.‬


‭Why use Random Forest?‬

‭●‬ ‭It takes less training time as compared to other algorithms.‬
‭●‬ ‭It predicts output with high accuracy, and it runs efficiently even on large datasets.‬
‭●‬ ‭It can also maintain accuracy when a large proportion of data is‬
‭missing.‬

‭How does Random Forest algorithm work?‬
‭Random Forest works in two phases: the first is to create the random forest by combining N decision trees, and the second is to make predictions using each tree created in the first phase.‬

‭Step-1: Select random K data points from the training set.‬



‭Step-2: Build the decision trees associated with the selected data‬
‭points (Subsets).‬

‭Step-3: Choose the number N for decision trees that you want to build.‬

‭Step-4: Repeat Step 1 & 2.‬


‭Step-5: For new data points, find the predictions of each decision tree,‬
‭and assign the new data points to the category that wins the majority‬
‭votes.‬
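Steps 1–5 can be sketched in miniature by substituting one-feature decision stumps for full decision trees (a simplification: the data, the stump learner, and the tree count are all illustrative, not the full CART procedure):

```python
import random
random.seed(1)

# Toy data: label is 1 when the single feature exceeds 5
X = [1, 2, 3, 4, 6, 7, 8, 9]
y = [0, 0, 0, 0, 1, 1, 1, 1]

def fit_stump(xs, ys):
    # Stand-in for a decision tree: pick the threshold with fewest training errors
    best_t, best_err = None, len(xs) + 1
    for t in xs:
        err = sum((x > t) != lab for x, lab in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def bagged_forest(X, y, n_trees=15):
    stumps = []
    for _ in range(n_trees):
        # Steps 1-2 & 4: bootstrap a random subset, fit one tree per subset
        idx = [random.randrange(len(X)) for _ in range(len(X))]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps

def predict(stumps, x):
    # Step 5: each tree votes; the majority class wins
    votes = sum(x > t for t in stumps)
    return int(votes * 2 > len(stumps))

forest = bagged_forest(X, y)
print(predict(forest, 0), predict(forest, 9))   # 0 1
```

Because each stump sees a different bootstrap sample, their errors are weakly correlated, and the majority vote is more reliable than any single stump — the same argument that motivates the full Random Forest.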

‭Example: Suppose there is a dataset that contains multiple fruit‬


‭images. So, this dataset is given to the Random forest classifier. The‬


‭dataset is divided into subsets and given to each decision tree. During‬

‭the training phase, each decision tree produces a prediction result, and‬
‭when a new data point occurs, then based on the majority of results,‬

‭the Random Forest classifier predicts the final decision. Consider the‬
‭below image:‬

‭Applications of Random Forest‬
‭●‬ ‭Banking: Banking sector mostly uses this algorithm for the‬
‭identification of loan risk.‬
‭●‬ ‭Medicine: With the help of this algorithm, disease trends and‬
‭risks of the disease can be identified.‬


‭●‬ ‭Land Use: We can identify the areas of similar land use by this‬

‭algorithm.‬
‭●‬ ‭Marketing: Marketing trends can be identified using this‬

‭algorithm.‬

‭Advantages of Random Forest‬

‭●‬ ‭Random Forest is capable of performing both Classification and‬
‭Regression tasks.‬
‭●‬ ‭It is capable of handling large datasets with high dimensionality.‬
‭●‬ ‭It enhances the accuracy of the model and prevents the‬

‭overfitting issue.‬

‭Disadvantages of Random Forest‬



‭●‬ ‭Although random forest can be used for both classification and‬
‭regression tasks, it is less suitable for regression tasks.‬

‭Q4) List and elaborate applications of random forest‬

‭1. Healthcare: Disease Diagnosis and Risk Prediction‬

‭●‬ ‭Application‬‭: Random Forest is extensively used in‬‭the medical‬


‭field to predict diseases, identify high-risk patients, and assist in‬
‭diagnostic procedures.‬
‭●‬ ‭How it Works‬‭: The algorithm processes medical data‬‭(such as‬
‭patient history, symptoms, test results) to classify whether a‬
‭patient is at risk of a disease (e.g., diabetes, heart disease). It‬
‭also helps predict outcomes like the chances of recovery based on‬
‭multiple medical variables.‬
‭●‬ ‭Example‬‭: Predicting whether a patient has cancer based‬‭on biopsy‬


‭features, identifying high-risk cardiovascular patients, or‬

‭predicting whether a person is prone to certain genetic diseases.‬

‭2. Finance: Fraud Detection‬

‭●‬ ‭Application‬‭: Random Forest is widely used to detect fraudulent‬
‭transactions in real-time within financial institutions, such as‬
‭banks or credit card companies.‬
‭●‬ ‭How it Works‬‭: The algorithm analyzes past transaction‬‭data‬
‭labeled as "fraudulent" or "legitimate" and learns patterns that‬
‭indicate fraud. It can then flag suspicious transactions for‬

‭further investigation.‬
‭●‬ ‭Example‬‭: Identifying credit card fraud by analyzing‬‭transaction‬
‭behaviors (e.g., location, transaction time, amount), or predicting‬

‭fraudulent insurance claims.‬

‭3. Marketing: Customer Segmentation and Recommendation Systems‬



‭●‬ ‭Application‬‭: In marketing, Random Forest is used for‬‭customer‬


‭segmentation, personalized recommendations, and identifying‬
‭customer churn.‬
‭●‬ ‭How it Works‬‭: By analyzing customer behaviors, purchasing‬
‭history, and demographic information, the algorithm can classify‬
‭customers into segments and help marketers create targeted‬
‭campaigns.‬
‭●‬ ‭Example‬‭: Grouping customers with similar buying patterns,‬
‭predicting which customers are likely to churn, or recommending‬
‭products based on past purchases.‬

‭4. E-commerce: Product Recommendations‬


‭●‬ ‭Application‬‭: Random Forest can be used in e-commerce‬‭platforms‬
‭to provide personalized product recommendations.‬

‭●‬ ‭How it Works‬‭: The algorithm analyzes past purchase behavior,‬
‭browsing history, and customer preferences to suggest relevant‬
‭products.‬

‭●‬ ‭Example‬‭: Amazon’s recommendation engine uses Random‬‭Forest‬
‭models to suggest products that customers might want to buy‬
‭based on past behavior and similar user profiles.‬

‭5. Banking: Credit Risk Analysis‬



‭●‬ ‭Application‬‭: Banks use Random Forest for assessing‬


‭creditworthiness and determining whether to approve loan‬

‭applications or credit card limits.‬


‭●‬ ‭How it Works‬‭: The model evaluates financial data,‬‭credit history,‬
‭income, and other features to classify whether a borrower is‬

‭likely to default on a loan or manage credit well.‬


‭●‬ ‭Example‬‭: Predicting the risk level of a loan applicant‬‭based on‬
‭their credit score, employment history, and debt-to-income ratio.‬

‭6. Agriculture: Crop Disease Detection and Yield Prediction‬


‭●‬ ‭Application‬‭: In agriculture, Random Forest is used for crop‬
‭disease identification and yield prediction.‬
‭●‬ ‭How it Works‬‭: The model can analyze features like‬‭soil‬
‭conditions, temperature, rainfall, and satellite images to classify‬
‭whether crops are healthy or diseased and to predict crop yields‬
‭for the season.‬


‭●‬ ‭Example‬‭: Identifying diseased crops from image data,‬‭predicting‬

‭wheat yield based on climate and soil data.‬

‭7. Natural Language Processing (NLP): Sentiment Analysis‬

‭●‬ ‭Application‬‭: Random Forest is employed in sentiment analysis to‬
‭classify text (such as reviews, social media posts) into categories‬

‭like positive, negative, or neutral sentiment.‬
‭●‬ ‭How it Works‬‭: The algorithm analyzes word frequencies,‬
‭sentence structures, and other textual features to classify text‬
‭into various sentiment categories.‬

‭●‬ ‭Example‬‭: Classifying movie reviews, product feedback,‬‭or social‬


‭media posts as positive, negative, or neutral for brand reputation‬
‭analysis.‬

‭8. Cybersecurity: Intrusion Detection‬

‭●‬ ‭Application‬‭: Random Forest is used to detect network‬‭intrusions‬



‭and cybersecurity threats by identifying abnormal patterns in‬


‭network traffic.‬
‭●‬ ‭How it Works‬‭: By learning from past intrusion data,‬‭the model‬
‭can classify incoming traffic as normal or malicious, helping‬
‭network administrators flag suspicious activities.‬
‭●‬ ‭Example‬‭: Detecting unauthorized access attempts, malware‬
‭attacks, or unusual login patterns in a network.‬

‭9. Environmental Science: Climate Change Prediction‬

‭●‬ ‭Application‬‭: Random Forest is employed to predict‬‭climate‬


‭patterns and understand environmental changes based on vast‬


‭amounts of climate data.‬

‭●‬ ‭How it Works‬‭: The algorithm processes historical weather‬‭data,‬
‭temperature records, CO2 levels, and other environmental‬

‭factors to predict future climate scenarios.‬
‭●‬ ‭Example‬‭: Predicting temperature rise, rainfall patterns, or CO2‬
‭levels in the atmosphere for the next decade based on historical‬
‭data.‬
‭10. Manufacturing: Quality Control and Fault Detection‬
‭●‬ ‭Application‬‭: Random Forest is used to improve product‬‭quality and‬

‭detect defects in manufacturing processes.‬


‭●‬ ‭How it Works‬‭: The algorithm analyzes data from manufacturing‬
‭equipment, such as sensor readings, production rates, and‬

‭material properties, to classify whether a product meets quality‬


‭standards or if there is a defect.‬
‭●‬ ‭Example‬‭: Detecting faulty parts in a car manufacturing‬‭process‬

‭by analyzing machine sensor data, or predicting machinery‬


‭breakdowns based on operational data.‬

‭11. Bioinformatics: Gene Classification‬


‭●‬ ‭Application‬‭: In bioinformatics, Random Forest is applied to‬
‭classify genes based on their expression profiles or identify gene‬
‭mutations linked to specific diseases.‬
‭●‬ ‭How it Works‬‭: The algorithm analyzes genetic data,‬‭including‬
‭gene expression levels or DNA sequences, to classify genes into‬
‭categories (e.g., normal vs. mutated) or predict the functions of‬


‭unknown genes.‬

‭●‬ ‭Example‬‭: Classifying tumor vs. non-tumor genes based‬‭on‬
‭expression data, identifying genetic markers associated with‬

‭hereditary diseases.‬

‭12. Image Recognition: Object Detection and Classification‬

‭●‬ ‭Application‬‭: Random Forest is used in image recognition to‬
‭classify and detect objects in images.‬
‭●‬ ‭How it Works‬‭: The algorithm processes pixel values,‬‭colors,‬
‭shapes, and textures from images to classify objects or detect‬

‭specific patterns.‬
‭●‬ ‭Example‬‭: Recognizing objects like cars, animals, or‬‭faces in‬
‭images, or classifying handwritten digits for automated data‬

‭entry.‬
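The common thread in all of these applications is Random Forest's majority vote over many decision trees. The sketch below is not from the notes: the three hand-written "stumps" and the feature names are invented, standing in for trees trained on bootstrap samples, to show the voting mechanism only.

```python
# Sketch of the majority-vote idea behind Random Forest.
# The stumps and feature names are invented for illustration.

def stump_exclamations(email):
    return "spam" if email["exclamations"] > 3 else "ham"

def stump_links(email):
    return "spam" if email["links"] > 2 else "ham"

def stump_caps(email):
    return "spam" if email["caps_ratio"] > 0.5 else "ham"

TREES = (stump_exclamations, stump_links, stump_caps)

def forest_predict(email):
    votes = [tree(email) for tree in TREES]
    return max(set(votes), key=votes.count)  # majority vote

email = {"exclamations": 5, "links": 1, "caps_ratio": 0.7}
print(forest_predict(email))  # spam (2 of the 3 trees vote spam)
```

In a real Random Forest the trees are learned from random subsets of the data and features, but the final prediction is combined in exactly this way.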

‭Q5) What is confusion matrix? Provide examples‬



‭What is a Confusion Matrix‬


‭The confusion matrix shows the ways in which your classification model‬
‭is confused when it makes predictions.‬
‭A Confusion matrix is an N x N matrix used for evaluating the‬
‭performance of a classification model, where N is the number of target‬
‭classes.‬
‭The matrix compares the actual target values with those predicted by‬
‭the machine learning model.‬
‭This gives us a holistic view of how well our classification model is‬
‭performing and what kinds of errors it is making.‬

‭How to Calculate a Confusion Matrix‬


‭1. You need a test dataset or a validation dataset with expected‬

‭outcome values.‬
‭2. Make a prediction for each row in your test dataset.‬

‭3. From the expected outcomes and predictions count: The number of‬
‭correct predictions for each class.‬
‭The number of incorrect predictions for each class, organized by the‬
‭class that was predicted.‬
‭4. These numbers are then organized into a table, or a matrix as‬
‭follows:‬

Expected down the side: Each row of the matrix corresponds to an actual (expected) class.
Predicted across the top: Each column of the matrix corresponds to a predicted class.

‭Confusion Matrix‬
‭True Positive (TP)‬


‭The predicted value matches the actual value‬

‭The actual value was positive and the model predicted a positive value‬
‭True Negative (TN)‬

‭The predicted value matches the actual value‬
‭The actual value was negative and the model predicted a negative value‬
‭False Positive (FP)‬

‭The predicted value was falsely predicted‬
‭The actual value was negative but the model predicted a positive value‬
‭False Negative (FN)‬
‭The predicted value was falsely predicted‬

‭The actual value was positive but the model predicted a negative value‬

‭Need for Confusion Matrix in Machine learning‬



‭It evaluates the performance of the classification models, when they‬


‭make predictions on test data, and tells how good our classification‬
‭model is.‬

‭• It not only tells the error made by the classifiers but also the type‬
‭of errors such as it is either type-l or type-ll error.‬
‭• With the help of the confusion matrix, we can calculate the different‬
‭parameters for the model, such as accuracy, precision, etc.‬

‭Example:‬
Expected    Predicted
man         woman
man         man
woman       woman
man         man
woman       man
woman       woman
woman       woman
man         man
man         woman
woman       woman

men classified as men: 3      women classified as women: 4
men classified as women: 2    women classified as men: 1

            man    woman
man          3       2
woman        1       4



The total actual men in the dataset is the sum of the values in the men row (3 + 2 = 5).
The total actual women in the dataset is the sum of the values in the women row (1 + 4 = 5).
‭The correct values are organized in a diagonal line from top left to‬
‭bottom-right of the matrix (3+4).‬
‭More errors were made by predicting men as women than predicting‬
‭women as men‬
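The counts above can be reproduced in a few lines of Python. This is a minimal sketch: the label lists are copied from the example table, and "man" is treated as the positive class when computing the accuracy, precision, and recall mentioned earlier.

```python
# Build the 2x2 confusion matrix for the man/woman example above.
expected = ["man", "man", "woman", "man", "woman",
            "woman", "woman", "man", "man", "woman"]
predicted = ["woman", "man", "woman", "man", "man",
             "woman", "woman", "man", "woman", "woman"]

labels = ["man", "woman"]
# matrix[i][j] = count of samples with actual class labels[i]
# that were predicted as labels[j]
matrix = [[0, 0], [0, 0]]
for e, p in zip(expected, predicted):
    matrix[labels.index(e)][labels.index(p)] += 1

print(matrix)  # [[3, 2], [1, 4]]

# Treating "man" as the positive class:
tp, fn = matrix[0][0], matrix[0][1]
fp, tn = matrix[1][0], matrix[1][1]
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(accuracy, precision, recall)  # 0.7 0.75 0.6
```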


‭True Positive:‬
‭Interpretation: You predicted positive and it's true. You predicted‬

‭that a woman is pregnant and she actually is.‬
‭True Negative:‬
‭Interpretation: You predicted negative and it's true. You predicted‬
‭that a man is not pregnant and he actually is not‬

‭False Positive:‬
‭Interpretation: You predicted positive and it's false. You predicted‬
‭that a man is pregnant but he actually is not.‬

‭False Negative:‬
‭Interpretation: You predicted negative and it's false. You predicted‬
‭that a woman is not pregnant but she actually is.‬

‭Q6) Explain the concept of type 1 and type 2 errors by giving suitable‬
‭examples.‬

‭Confusion Matrix‬
‭True Positive (TP)‬


‭The predicted value matches the actual value‬

‭The actual value was positive and the model predicted a positive value‬
‭True Negative (TN)‬

‭The predicted value matches the actual value‬
‭The actual value was negative and the model predicted a negative value‬
‭False Positive (FP)‬

‭The predicted value was falsely predicted‬
‭The actual value was negative but the model predicted a positive value‬
‭False Negative (FN)‬
‭The predicted value was falsely predicted‬

‭The actual value was positive but the model predicted a negative value‬

‭Type 1 and Type 2 Error‬



‭Scenario 1:‬‭We don't have a kitten among the group. Yet, ML algo‬
‭predicts it is there. If we accept the ML algo prediction then it is Type‬
‭1 error also known as 'False Positive'‬

‭Scenario 2:‬‭We have a kitten among the group. Yet, ML algo predicts it‬
‭is not there. If we accept the ML algo prediction then it is Type 2‬
‭error also known as 'False Negative'.‬

‭Use cases of Type 1 and Type 2‬


‭Scenario/Problem Statement 1:‬‭Providing access to an asset post a‬
‭biometric scan.‬
‭Type I error: Possibility of rejection even with an authorized match.‬
Type II error: Possibility of acceptance even with an unauthorized
match.
‭Scenario/Problem Statement 2:‬‭Construction Model of a bridge is‬
‭correct‬
‭Type I error: Predicting that the model is correct when it is not.‬


‭Type II error: Predicting that a model is not correct when it is‬

‭correct.‬
‭Scenario/Problem Statement 3:‬‭Medical trials for a drug which is a‬

‭cure for Cancer‬
‭Type I error: Predicting that a cure is found when it is not the case.‬
‭Type II error: Predicting that a cure is not found when in fact it is the‬
‭case.‬
‭Q7) Discuss following by giving suitable examples.‬
‭● Overfitting ● Underfitting‬

‭Underfitting in Machine Learning‬


A statistical model or a machine learning algorithm is said to underfit when the model is too simple to capture the complexities of the data. It represents the inability of the model to learn the training data effectively, resulting in poor performance on both the training and the testing data. It mainly happens when we use a very simple model with overly simplified assumptions. To address underfitting, we need to use more complex models with enhanced feature representation and less regularization.

‭Note: The underfitting model has High bias and low variance.‬
‭Reasons for Underfitting‬
●	The model is too simple, so it may not be capable of representing the complexities in the data.
●	The input features used to train the model are not adequate representations of the underlying factors influencing the target variable.
●	The size of the training dataset used is not large enough.
●	Excessive regularization is used to prevent overfitting, which constrains the model from capturing the data well.
●	Features are not scaled.

‭Techniques to Reduce Underfitting‬


‭●‬ ‭Increase model complexity.‬
‭●‬ ‭Increase the number of features, performing feature‬
‭engineering.‬
‭●‬ ‭Remove noise from the data.‬

‭●‬ ‭Increase the number of epochs or increase the duration of‬


‭training to get better results.‬

‭Overfitting in Machine Learning‬


A statistical model is said to be overfitted when it does not make accurate predictions on testing data. When a model is trained with so much data, it starts learning from the noise and inaccurate entries in the data set, and testing on unseen data then shows high variance: the model fails to categorize the data correctly because of too many details and noise. Overfitting is often caused by non-parametric and non-linear methods, because these types of machine learning algorithms have more freedom in building the model from the dataset and can therefore build unrealistic models. Solutions to avoid overfitting include using a linear algorithm if we have linear data, or using parameters such as the maximal depth if we are using decision trees.

‭In a nutshell, Overfitting is a problem where the evaluation of machine‬


‭learning algorithms on training data is different from unseen data.‬


‭Reasons for Overfitting:‬
‭●‬ ‭High variance and low bias.‬

‭●‬ ‭The model is too complex.‬
‭●‬ ‭The size of the training data.‬

‭Techniques to Reduce Overfitting‬


●	Improving the quality of training data reduces overfitting by focusing on meaningful patterns and mitigating the risk of fitting noise or irrelevant features.
●	Increasing the amount of training data can improve the model’s ability to generalize to unseen data and reduce the likelihood of overfitting.

‭●‬ ‭Reduce model complexity.‬


●	Early stopping during the training phase (monitor the loss during training; as soon as the loss begins to increase, stop training).
‭●‬ ‭Ridge Regularization and Lasso Regularization.‬
‭●‬ ‭Use dropout for neural networks to tackle overfitting.‬
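The two failure modes can be contrasted in code. The tiny dataset and models below are invented for illustration: a constant predictor underfits (high bias), a least-squares line generalizes, and a model that memorizes every training point overfits (zero training error, high variance on unseen data).

```python
# Illustrative sketch (not from the notes): an underfit, a well-fit,
# and an overfit model on a tiny invented dataset following y ≈ 2x.

train = [(0, 0.1), (1, 2.2), (2, 3.9), (3, 6.1)]
test = [(4, 8.0), (5, 10.1)]

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

# Underfit: too simple -- ignores x entirely (high bias).
mean_y = sum(y for _, y in train) / len(train)
def underfit(x):
    return mean_y

# Well-fit: least-squares line y = a*x + b.
n = len(train)
sx = sum(x for x, _ in train)
sy = sum(y for _, y in train)
sxx = sum(x * x for x, _ in train)
sxy = sum(x * y for x, y in train)
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - a * sx) / n
def line(x):
    return a * x + b

# Overfit: memorizes every training point, including its noise,
# and has no idea what to do with unseen inputs (high variance).
memory = dict(train)
def overfit(x):
    return memory.get(x, 0.0)

print(mse(overfit, train))                    # 0.0 on training data
print(mse(overfit, test) > mse(line, test))   # True: fails to generalize
print(mse(underfit, test) > mse(line, test))  # True: too simple to fit
```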


‭Q8) Define or explain following terms‬



‭● Entropy ● Information gain‬

‭What is Entropy in Machine Learning‬



‭Entropy is the measurement of disorder or impurities in the‬


‭information processed in machine learning. It determines how a‬
‭decision tree chooses to split data.‬

We can understand the term entropy with a simple example: flipping a coin. When we flip a coin, there can be two outcomes, and it is difficult to predict the exact outcome because there is no direct relation between flipping a coin and its result. Both outcomes have a 50% probability, and in such scenarios entropy is at its highest. This is the essence of entropy in machine learning.

‭Mathematical Formula for Entropy‬


Consider a data set having a total of N classes; then the entropy (E) can be determined with the formula below:

E = − Σ Pi log2(Pi),   summed over i = 1 … N

Where:
Pi = probability of randomly selecting an example in class i.

For a two-class problem, entropy lies between 0 and 1; depending on the number of classes in the dataset, it can be greater than 1. A high value of entropy indicates a high level of disorder (impurity) in the data.

‭Let's understand it with an example where we have a dataset having‬


‭three colors of fruits as red, green, and yellow. Suppose we have 2 red,‬

‭2 green, and 4 yellow observations throughout the dataset. Then as per‬


‭the above equation:‬

‭Where;‬
‭Pr = Probability of choosing red fruits;‬
‭Pg = Probability of choosing green fruits and;‬
‭Py = Probability of choosing yellow fruits.‬
Pr = 2/8 = 1/4 [as 2 of the 8 observations represent red fruits]

Pg = 2/8 = 1/4 [as 2 of the 8 observations represent green fruits]

Py = 4/8 = 1/2 [as 4 of the 8 observations represent yellow fruits]


Now our final equation will be:

E = −(1/4 log2(1/4) + 1/4 log2(1/4) + 1/2 log2(1/2)) = 0.5 + 0.5 + 0.5

So, the entropy will be 1.5.
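The calculation can be checked with a few lines of Python (a sketch; `entropy` here takes a list of per-class counts):

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a list of per-class counts."""
    total = sum(counts)
    return sum((c / total) * math.log2(total / c) for c in counts if c)

# Fruit dataset from above: 2 red, 2 green, 4 yellow observations.
print(entropy([2, 2, 4]))  # 1.5
# All observations in a single class -> pure dataset, zero impurity:
print(entropy([8]))        # 0.0
```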
N
's
‭Let's consider a case when all observations belong to the same class;‬

‭then entropy will always be 0.‬



E = −(1 × log2 1)
= 0

When entropy becomes 0, the dataset has no impurity: all examples belong to one class, so there is nothing left for a split to learn from. If the entropy is 1, the classes are maximally mixed, and such a dataset gives a split the most information to gain.

‭What is the information gain in Entropy?‬
‭Information gain is defined as the pattern observed in the dataset and‬

‭reduction in the entropy.‬

Mathematically, information gain can be expressed with the below formula:

Information Gain = (Entropy of parent node) − (weighted average entropy of child nodes)

Note: when the parent entropy is 1, as in the example below, this reduces to Information Gain = 1 − (weighted average child entropy).

‭Let's understand it with an example having three scenarios as follows:‬



‭Let's say we have a tree with a total of four values at the root node‬
‭that is split into the first level having one value in one branch (say,‬
‭Branch 1) and three values in the other branch (Branch 2). The entropy‬
‭at the root node is 1.‬

Now, to compute the entropy at the child nodes, the weights are taken as ¼ for Branch 1 (one of the four values) and ¾ for Branch 2 (three of the four values), and the entropies are calculated using Shannon's entropy formula. The entropy of the Branch 1 child node is zero because there is only one value in that node, meaning there is no uncertainty and hence no heterogeneity.

‭H(X) = - [(1/3 * log2 (1/3)) + (2/3 * log2 (2/3))] = 0.9184‬

‭The information gain for the above case is the reduction in the‬
‭weighted average of the entropy.‬
‭Information Gain = 1 - ( ¾ * 0.9184) - (¼ *0) = 0.3112‬

‭The more the entropy is removed, the greater the information gain.‬
‭The higher the information gain, the better the split.‬
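The worked example can be verified numerically. This is a sketch assuming, as the root entropy of 1 implies, two equally frequent classes among the four values at the root:

```python
import math

def entropy(probs):
    """Shannon entropy (base 2) from a list of class probabilities."""
    return sum(p * math.log2(1 / p) for p in probs if p)

# Root node: four values; entropy 1 implies two equally frequent classes.
parent = entropy([2 / 4, 2 / 4])   # 1.0

# Branch 1 child: a single value -> pure node, entropy 0.
child1 = entropy([1.0])
# Branch 2 child: three values with a 1:2 class split.
child2 = entropy([1 / 3, 2 / 3])   # ~0.9183

gain = parent - (1 / 4) * child1 - (3 / 4) * child2
print(round(gain, 4))  # 0.3113 (the notes round the child entropy first)
```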

‭Q9) Explain naïve bayes classification‬

‭●‬ ‭Naïve Bayes algorithm is a supervised learning algorithm, which is‬



‭based on Bayes theorem and used for solving classification‬


‭problems.‬
‭●‬ ‭It is mainly used in text classification that includes a‬
‭high-dimensional training dataset.‬
●	Naïve Bayes Classifier is one of the simplest and most effective classification algorithms, and it helps in building fast machine learning models that can make quick predictions.
‭●‬ ‭It is a probabilistic classifier, which means it predicts on the‬
‭basis of the probability of an object.‬
‭●‬ ‭Some popular examples of Naïve Bayes Algorithm are spam‬
‭filtration, Sentimental analysis, and classifying articles.‬
‭●‬ ‭Naïve: It is called Naïve because it assumes that the occurrence‬
‭of a certain feature is independent of the occurrence of other‬


‭features.‬

●	For example, if a fruit is identified on the basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple. Hence each feature individually contributes to identifying it as an apple, without depending on the others.
‭●‬ ‭Bayes: It is called Bayes because it depends on the principle of‬
‭Bayes' Theorem.‬
‭●‬ ‭Bayes' Theorem: Bayes' theorem is also known as Bayes' Rule or‬
‭Bayes' law, which is used to determine the probability of a‬
‭hypothesis with prior knowledge. It depends on the conditional‬

‭probability.‬
●	The formula for Bayes' theorem is given as:

P(A|B) = P(B|A) × P(A) / P(B)

Where,
‭P(A|B) is Posterior probability: Probability of hypothesis A on the‬

‭observed event B.‬


P(B|A) is Likelihood probability: Probability of the evidence given that the hypothesis is true.
‭P(A) is Prior Probability: Probability of hypothesis before observing the‬
‭evidence.‬
‭P(B) is Marginal Probability: Probability of Evidence.‬
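As a numeric sketch of the theorem (the probabilities below are invented for a spam-filter-flavoured example, with A = "email is spam" and B = "email contains the word 'offer'"):

```python
# Invented numbers for a Bayes' theorem example.
p_spam = 0.20        # P(A): prior probability
p_word_spam = 0.60   # P(B|A): likelihood
p_word_ham = 0.05    # P(B|not A)

# Marginal P(B) via the law of total probability.
p_word = p_word_spam * p_spam + p_word_ham * (1 - p_spam)

# Posterior: P(A|B) = P(B|A) * P(A) / P(B)
p_spam_given_word = p_word_spam * p_spam / p_word
print(round(p_word, 2))             # 0.16
print(round(p_spam_given_word, 2))  # 0.75
```

Seeing the word raises the spam probability from the 20% prior to a 75% posterior, which is exactly the update the classifier performs for every feature.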
‭Advantages of Naïve Bayes Classifier:‬
‭●‬ ‭Naïve Bayes is one of the fast and easy ML algorithms to predict‬
‭a class of datasets.‬
‭●‬ ‭It can be used for Binary as well as Multi-class Classifications.‬
‭●‬ ‭It performs well in Multi-class predictions as compared to the‬
‭other Algorithms.‬


‭●‬ ‭It is the most popular choice for text classification problems.‬

‭Disadvantages of Naïve Bayes Classifier:‬

‭●‬ ‭Naive Bayes assumes that all features are independent or‬
‭unrelated, so it cannot learn the relationship between features.‬

‭Applications of Naïve Bayes Classifier:‬
‭●‬ ‭It is used for Credit Scoring.‬
‭●‬ ‭It is used in medical data classification.‬
‭●‬ ‭It can be used in real-time predictions because Naïve Bayes‬

‭Classifier is an eager learner.‬


‭●‬ ‭It is used in Text classification such as Spam filtering and‬
‭Sentiment analysis.‬

‭Types of Naïve Bayes Model:‬


‭●‬ ‭Gaussian: The Gaussian model assumes that features follow a‬

‭normal distribution. This means if predictors take continuous‬


‭values instead of discrete, then the model assumes that these‬
‭values are sampled from the Gaussian distribution.‬
‭●‬ ‭Multinomial: The Multinomial Naïve Bayes classifier is used when‬
‭the data is multinomial distributed. It is primarily used for‬
‭document classification problems, it means a particular document‬
‭belongs to which category such as Sports, Politics, education, etc.‬
‭The classifier uses the frequency of words for the predictors.‬
‭●‬ ‭Bernoulli: The Bernoulli classifier works similar to the Multinomial‬
‭classifier, but the predictor variables are the independent‬
‭Booleans variables. Such as if a particular word is present or not‬
‭in a document. This model is also famous for document‬


‭classification tasks.‬

‭Q10) Describe advantages and applications of naïve bayes‬

‭classification‬

‭Advantages of Naïve Bayes Classification‬

‭1.‬ ‭Simplicity and Ease of Implementation‬‭:‬
‭○‬ ‭Naïve Bayes is simple and easy to implement. It assumes‬
‭independence between the features, which reduces‬
‭complexity, making it suitable for quick applications with‬

‭limited computational resources.‬


‭2.‬ ‭Fast and Efficient‬‭:‬
‭○‬ ‭Naïve Bayes is computationally efficient and works well with‬

‭large datasets. Its training and prediction times are fast‬


‭because it simplifies probability calculations using‬
‭conditional independence.‬

‭3.‬ ‭Works Well with Small Datasets‬‭:‬


‭○‬ ‭Despite being a simple algorithm, Naïve Bayes performs‬
‭surprisingly well even when the dataset is small. This makes‬
‭it ideal in situations where gathering large amounts of data‬
‭is difficult.‬
‭4.‬ ‭Performs Well with Categorical Data‬‭:‬
‭○‬ ‭Naïve Bayes works particularly well when the input features‬
‭are categorical (e.g., for text classification problems). It‬
‭can handle both binary and multi-class classification tasks‬
‭effectively.‬
‭5.‬ ‭Performs Well for Multiclass Classification‬‭:‬
‭○‬ ‭Unlike some algorithms that struggle with multiclass‬


‭classification, Naïve Bayes handles multiple classes very‬

‭well. This makes it ideal for tasks with more than two‬
‭outcomes.‬

‭6.‬ ‭Robust to Irrelevant Features‬‭:‬
‭○‬ ‭Naïve Bayes is relatively immune to irrelevant features in‬
‭the data. Even if the assumption of independence between‬

‭features is violated, it can still perform well in many‬
‭practical applications.‬
‭7.‬ ‭Performs Well with Text Data and Natural Language‬
‭Processing (NLP)‬‭:‬

‭○‬ ‭Naïve Bayes is popular in text-related tasks (e.g., spam‬


‭filtering, sentiment analysis) because of its ability to handle‬
‭high-dimensional data and its efficiency with text‬

‭classification tasks.‬
‭8.‬ ‭Handles Missing Data‬‭:‬
‭○‬ ‭Naïve Bayes can handle missing data relatively well. While‬

‭some machine learning algorithms may need data imputation‬


‭methods, Naïve Bayes can make predictions even with‬
‭missing attributes by ignoring them during probability‬
‭calculations.‬

‭Applications of Naïve Bayes Classification‬


‭1.‬ ‭Spam Filtering‬‭:‬
‭○‬ ‭Application‬‭: Email service providers like Gmail and‬‭Yahoo use‬
‭Naïve Bayes for spam detection. The classifier labels emails‬
‭as either "spam" or "not spam" based on their content,‬
‭sender information, and other features.‬
‭○‬ ‭How it Works‬‭: The algorithm is trained on a dataset‬‭of‬


‭labeled emails (spam and not spam). It calculates the‬

‭likelihood of an email being spam based on the presence or‬
‭absence of certain keywords and features.‬

‭2.‬ ‭Sentiment Analysis‬‭:‬
‭○‬ ‭Application‬‭: Naïve Bayes is used to classify customer‬
‭reviews, social media posts, or feedback into categories like‬

‭positive, negative, or neutral sentiment.‬
‭○‬ ‭How it Works‬‭: By analyzing the frequency of positive‬‭or‬
‭negative words in a dataset of labeled text (reviews or‬
‭tweets), Naïve Bayes can predict the sentiment of new text‬

‭data.‬
‭○‬ ‭Example‬‭: It’s widely used in e-commerce platforms to‬
‭analyze customer reviews and gauge the overall sentiment‬

‭towards products.‬
‭3.‬ ‭Document Classification‬‭:‬
‭○‬ ‭Application‬‭: Naïve Bayes is widely used in text classification‬

‭tasks such as news categorization, topic labeling, and‬


‭document classification.‬
‭○‬ ‭How it Works‬‭: The classifier analyzes words or phrases‬‭in‬
‭documents and classifies them into predefined categories,‬
‭such as politics, sports, entertainment, or technology.‬
‭○‬ ‭Example‬‭: News websites use Naïve Bayes to automatically‬
‭categorize articles based on their content.‬
‭4.‬ ‭Medical Diagnosis‬‭:‬
‭○‬ ‭Application‬‭: Naïve Bayes is used in healthcare to‬‭predict‬
‭diseases based on patient data such as symptoms, medical‬
‭history, and test results.‬
‭○‬ ‭How it Works‬‭: The algorithm is trained on a dataset‬‭of‬
‭patient data with known diagnoses. It then uses this‬


‭information to predict whether new patients might have a‬

‭particular disease based on the likelihood of specific‬
‭symptoms.‬

‭○‬ ‭Example‬‭: Predicting the likelihood of a patient having a‬
‭disease like diabetes or heart disease based on input‬
‭features like age, weight, blood sugar levels, etc.‬
‭5.‬ ‭Recommendation Systems‬‭:‬
‭○‬ ‭Application‬‭: Naïve Bayes is applied in recommendation‬
‭engines to suggest items such as movies, books, or products‬
‭to users based on their preferences.‬

‭○‬ ‭How it Works‬‭: By analyzing user behavior and preferences‬


‭(such as past purchases or movie ratings), the algorithm‬
‭classifies items into categories (e.g., “highly recommended”‬

‭or “not recommended”) and makes personalized‬


‭recommendations.‬
‭○‬ ‭Example‬‭: Netflix or Amazon recommending movies or‬

‭products based on user preferences.‬


‭6.‬ ‭Credit Scoring and Risk Prediction‬‭:‬
‭○‬ ‭Application‬‭: Banks and financial institutions use‬‭Naïve Bayes‬
‭to assess the creditworthiness of loan applicants and‬
‭predict the risk of default.‬
‭○‬ ‭How it Works‬‭: The algorithm analyzes features such as‬
‭credit history, income, and employment to classify‬
‭customers into low-risk or high-risk categories.‬
‭○‬ ‭Example‬‭: Predicting whether a customer is likely to‬‭default‬
‭on a loan or not, based on their financial behavior.‬
‭7.‬ ‭Face Recognition‬‭:‬


‭○‬ ‭Application‬‭: Naïve Bayes can be used in facial recognition‬

‭systems to classify faces in images or videos.‬
‭○‬ ‭How it Works‬‭: The algorithm analyzes facial features,‬‭such‬

‭as distance between eyes, shape of the nose, etc., and‬
‭matches them to pre-classified images in the database.‬
‭○‬ ‭Example‬‭: Used in security systems to recognize and‬‭verify‬
‭individuals' identities.‬
8.	Anomaly Detection:
‭○‬ ‭Application‬‭: Naïve Bayes is applied in cybersecurity‬‭to‬
‭detect unusual patterns or anomalies, such as fraud or‬

‭network intrusions.‬
‭○‬ ‭How it Works‬‭: It learns the normal behavior from historical‬
‭data and flags any outliers or anomalies as potential threats.‬

‭○‬ ‭Example‬‭: Detecting unusual login attempts or financial‬


‭transactions that might indicate fraud.‬
‭9.‬ ‭Real-Time Prediction in E-commerce‬‭:‬

‭○‬ ‭Application‬‭: Naïve Bayes is used for real-time predictions,‬


‭such as determining whether a customer will make a‬
‭purchase or abandon the cart.‬
‭○‬ ‭How it Works‬‭: By analyzing user behavior data, the‬
‭algorithm classifies customers into groups, such as “likely to‬
‭purchase” or “unlikely to purchase,” in real time.‬
‭○‬ ‭Example‬‭: E-commerce sites like Amazon may use this to‬
‭offer last-minute discounts to users who are likely to‬
‭abandon their shopping cart.‬

‭Q11) Problems based on decision tree CART/ ID3‬


‭Q12) Problems based on naïve bayes‬


‭Numerical PDF‬

‭Q13) Explain the working of SVM‬

‭SUPPORT VECTOR MACHINE‬

‭SVM is a method for classification of both linear and non-linear data.‬
‭Linearly Separable Data:‬

‭If the given data is classified into distinct classes such that‬

‭they can be separated by a‬‭decision boundary‬‭, it is called as‬


‭Linearly Separable Data‬

‭If the given data is classified into distinct classes such that‬
‭they cannot be separated by a decision boundary, it is called‬
‭Non-linearly Separable Data. Since it cannot be separated by a‬

‭single line, it is non-linear.‬

‭SVM uses the concept of MMH (Maximum Marginal HyperPlane)‬

‭The goal of the SVM algorithm is to create the best line or‬
‭decision boundary that can segregate ‘n’ dimensional space into‬
‭classes so that we can easily put the new data points in the‬
‭correct category in the future.‬

‭This best-decision boundary is known as‬‭Hyperplane‬

‭SVM chooses the extreme points/vectors that helps in creating‬


‭the Hyperplane.‬


‭These extreme cases are called Support Vectors and hence the‬
‭algorithm is termed as Support Vector Machine(SVM).‬

The lines formed by joining the points closest to the hyperplane, on either side, bound the Margin.

‭Margin is the distance between the support vectors and the‬
‭hyperplane.‬
‭TERMINOLOGIES:‬

‭1. Hyperplane‬‭:‬‭It is a decision boundary used to separate‬



‭data points of different classes.‬

‭For a linear classification, it will be a linear equation:‬



Wx + b = 0

‭where,‬

‭W = weight vector‬

‭b = bias‬
‭We can write the equation for the two classes:‬

Wx + b ≥ 1 for yi = +1
Wx + b ≤ −1 for yi = −1

Considering these two inequalities together,

yi(Wx + b) ≥ 1

with equality, yi(Wx + b) = 1, holding for the points that lie exactly on the margin. This is used to decide the Support Vectors.


2. Support Vectors: These are the closest data points to the hyperplane, and they play a critical role in deciding the hyperplane and the margin.
‭3.‬‭Margins are of two types : Hard margin & Soft Margin‬


Hard Margin: The maximum-margin hyperplane, or hard margin, is a hyperplane that properly separates the data points of different categories without any misclassifications.

Soft Margin: When the data is not perfectly separable or contains outliers, SVM permits a soft-margin technique.

4. Each datapoint has a Slack Variable introduced by the soft-margin formulation, which softens the strict margin requirements and permits certain misclassifications or violations.

5. The margin is calculated as:

Margin = 2 / ||W||
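These conditions can be checked numerically. The hyperplane below is hypothetical, chosen so that both example points sit exactly on the margin lines; the margin itself is 2 / ||W||:

```python
import math

# Sketch with invented numbers: check the support-vector condition
# y_i * (w . x_i + b) >= 1 and compute the margin 2 / ||w||
# for the hyperplane x1 + x2 - 3 = 0.
w, b = [1.0, 1.0], -3.0

def decision(x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# (x_i, y_i) pairs chosen to lie exactly on the margin lines.
points = [([1.0, 1.0], -1), ([2.0, 2.0], +1)]
for x, y in points:
    assert y * decision(x) >= 1  # on or outside the margin

margin = 2 / math.sqrt(sum(wi * wi for wi in w))
print(round(margin, 4))  # 1.4142, i.e. 2 / sqrt(2)
```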
‭Types of SVM:‬

‭1.‬ ‭Linear SVM‬

‭2.‬ ‭Non-Linear SVM‬


‭Advantages of SVM:‬

‭1.‬‭Effective in high-dimensional cases.‬



2. Different kernel functions can be specified for the decision function, and it is also possible to specify custom kernels.

‭3.‬‭It is memory efficient.‬

‭Disadvantages:‬

1. If the number of features is much greater than the number of samples, avoiding over-fitting in the choice of kernel functions is crucial.

2. SVMs do not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation.
‭Q14) Applications of SVM‬

‭1. Image Classification‬

‭●‬ ‭Application‬‭: SVM is widely used for classifying images‬‭in‬


‭computer vision tasks, such as facial recognition, object‬
‭detection, and handwriting recognition.‬


‭●‬ ‭How it Works‬‭: The algorithm can effectively classify‬‭images by‬

‭finding the optimal hyperplane that separates different classes in‬
‭the feature space derived from the image data.‬

‭●‬ ‭Example‬‭: Recognizing handwritten digits in the MNIST dataset‬
‭or classifying images of cats and dogs.‬

‭2. Text Classification‬

‭●‬ ‭Application‬‭: SVM is employed for text categorization‬‭tasks, such‬
‭as spam detection, sentiment analysis, and document‬
‭classification.‬

‭●‬ ‭How it Works‬‭: The algorithm converts text data into‬‭numerical‬


‭feature vectors using techniques like TF-IDF or word embeddings‬
‭and then classifies the documents based on these features.‬

‭●‬ ‭Example‬‭: Classifying emails as spam or non-spam, or‬‭determining‬


‭the sentiment of product reviews as positive or negative.‬

‭3. Bioinformatics‬

‭●‬ ‭Application‬‭: In bioinformatics, SVM is used for classifying‬‭genes,‬


‭proteins, and biological sequences based on their features.‬
‭●‬ ‭How it Works‬‭: It helps in identifying gene functions‬‭or‬
‭predicting protein structures by analyzing complex biological‬
‭data.‬
‭●‬ ‭Example‬‭: Classifying genes associated with particular diseases or‬
‭predicting protein-protein interactions.‬

‭4. Finance‬

‭●‬ ‭Application‬‭: SVM is applied in financial markets for‬‭credit‬


‭scoring, fraud detection, and stock price prediction.‬


‭●‬ ‭How it Works‬‭: It analyzes historical financial data‬‭to classify‬

‭transactions as fraudulent or legitimate, or to predict whether a‬
‭stock price will rise or fall.‬

‭●‬ ‭Example‬‭: Classifying credit applicants into "approved"‬‭or "denied"‬
‭categories based on their financial history.‬

‭5. Medical Diagnosis‬

‭●‬ ‭Application‬‭: SVM is used to assist in diagnosing diseases‬‭based on‬
‭patient data, such as symptoms and medical history.‬
‭●‬ ‭How it Works‬‭: By analyzing various features related‬‭to patient‬

‭health, SVM can classify individuals as healthy or as having a‬


‭particular disease.‬
‭●‬ ‭Example‬‭: Diagnosing diseases such as cancer by analyzing‬‭medical‬

‭imaging data or patient biomarkers.‬

‭6. Face Detection and Recognition‬



‭●‬ ‭Application‬‭: SVM is used in computer vision for face‬‭detection‬


‭and recognition in images and videos.‬
‭●‬ ‭How it Works‬‭: The algorithm classifies regions in‬‭an image as‬
‭containing a face or not, based on features extracted from the‬
‭image.‬
‭●‬ ‭Example‬‭: Implementing face recognition systems in security‬
‭applications or social media platforms.‬

‭7. Customer Segmentation‬

‭●‬ ‭Application‬‭: SVM can be used for customer segmentation‬‭in‬


‭marketing to classify customers into different groups based on‬


‭purchasing behavior and preferences.‬

‭●‬ ‭How it Works‬‭: By analyzing customer data, SVM identifies‬
‭distinct groups, allowing marketers to target specific segments‬

‭with tailored campaigns.‬
‭●‬ ‭Example‬‭: Classifying customers as "high value," "low‬‭value," or "at‬
‭risk" based on their purchasing history.‬

8. Anomaly Detection


‭●‬ ‭Application‬‭: SVM is employed for anomaly detection‬‭tasks in‬
‭various fields, including cybersecurity, fraud detection, and‬

‭network security.‬
‭●‬ ‭How it Works‬‭: The algorithm can identify unusual patterns‬‭or‬
‭outliers in data, classifying them as anomalies that may require‬

‭further investigation.‬
‭●‬ ‭Example‬‭: Detecting fraudulent transactions in credit‬‭card‬
‭processing or identifying potential intrusions in network traffic.‬

‭9. Natural Language Processing (NLP)‬

‭●‬ ‭Application‬‭: SVM is used in NLP tasks, such as part-of-speech‬


‭tagging, named entity recognition, and language identification.‬
‭●‬ ‭How it Works‬‭: It classifies words or phrases based on their‬
‭contextual features to determine their role or identity within‬
‭text.‬
‭●‬ ‭Example‬‭: Classifying sentences as declarative, interrogative,‬‭or‬
‭exclamatory based on their structure.‬
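The sentence-type idea can be sketched by pairing a TF-IDF vectorizer with a linear SVM; the tiny corpus and its labels are hand-made for illustration (a real system would need far more data):

```python
# Minimal sketch: linear SVM text classification with TF-IDF features,
# distinguishing interrogative from declarative sentences.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

sentences = [
    "What time is the meeting?", "Where did you put the keys?",
    "How does this work?", "Why is the sky blue?",
    "The meeting starts at noon.", "The keys are on the table.",
    "This works by induction.", "The sky is blue.",
]
labels = ["interrogative"] * 4 + ["declarative"] * 4

# TfidfVectorizer turns each sentence into a sparse feature vector;
# LinearSVC learns a separating hyperplane over those features.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(sentences, labels)
print(model.predict(["Where is the station?"]))
```

Question words like "where" only occur in the interrogative examples, so they end up with strong weights in the learned hyperplane.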

‭10. Time Series Forecasting‬


‭●‬ ‭Application‬‭: SVM can be utilized for forecasting time‬‭series data‬
‭in fields like economics, weather prediction, and stock market‬

‭analysis.‬
‭●‬ ‭How it Works‬‭: By analyzing historical data trends,‬‭SVM can‬
‭predict future values in a time series dataset.‬

‭●‬ ‭Example‬‭: Predicting future stock prices based on historical‬
‭trends or forecasting weather conditions based on past climate‬
‭data.‬
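Casting forecasting as supervised learning is usually done with lagged values as features; here is a minimal sketch using scikit-learn's support vector regression (SVR) on a synthetic trend series (the series, lag count, and hyperparameters are all illustrative choices):

```python
# Minimal sketch: one-step-ahead forecasting with SVR, using the
# previous 5 observations of a synthetic series as features.
import numpy as np
from sklearn.svm import SVR

# Synthetic upward trend plus a mild oscillation.
series = np.linspace(10.0, 20.0, 60) + np.sin(np.arange(60) / 3.0)

LAGS = 5  # predict series[t] from series[t-5 .. t-1]
X = np.array([series[t - LAGS:t] for t in range(LAGS, len(series))])
y = series[LAGS:]

model = SVR(kernel="rbf", C=100.0, gamma=0.1)
model.fit(X, y)

# Forecast the value following the last observed window.
next_value = model.predict([series[-LAGS:]])[0]
print(f"forecast for next step: {next_value:.2f}")
```

Real stock or weather data would also call for proper train/test splitting in time order, since random shuffling leaks future information into training.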

‭11. Robotics‬

‭●‬ ‭Application‬‭: SVM is applied in robotics for object‬‭recognition,‬


‭navigation, and human-robot interaction.‬

‭●‬ ‭How it Works‬‭: It helps robots classify objects in‬‭their‬


‭environment and make decisions based on the classified data.‬
‭●‬ ‭Example‬‭: Enabling robots to recognize and pick up‬‭specific‬

‭objects in an industrial setting.‬

‭12. Environmental Science‬

‭●‬ ‭Application‬‭: SVM is used in environmental monitoring‬‭and‬


‭classification of various environmental data, such as land cover‬
‭classification and species distribution modeling.‬
‭●‬ ‭How it Works‬‭: By analyzing satellite imagery and ecological data,‬
‭SVM can classify land types and predict species habitats.‬
‭●‬ ‭Example‬‭: Classifying land use types (e.g., urban,‬‭agricultural,‬
‭forest) from satellite images.‬

Q15) What do you mean by hypothesis? Provide examples of the null and
alternate hypotheses with explanation. Explain the working of the null
and alternate hypotheses.

‭What is Hypothesis Testing?‬

Hypothesis testing is a statistical method that is used in making
statistical decisions using experimental data. Hypothesis testing is
basically an assumption that we make about a population parameter.
Ex:
1) You say that the average student in a class is 40, or that a boy is
taller than girls.
2) Some scientists claim that ultraviolet (UV) light can damage the
eyes and may therefore also cause blindness.


‭Terms‬
Hypothesis space (H): the set of all possible legal hypotheses; hence
it is also known as the hypothesis set.
Hypothesis (h): the approximate function that best describes the
target in supervised machine learning algorithms. It is primarily
based on the data as well as the bias and restrictions applied to the
data.



Need of Hypothesis
Hypothesis testing is an essential procedure in statistics.
A hypothesis test evaluates two mutually exclusive statements about a
population to determine which statement is best supported by the
sample data. When we say that a finding is statistically significant,
it is because a hypothesis test supports it.
Example hypotheses: "If a person gets 7 hours of sleep, then he will
feel less fatigue than if he sleeps less." "Consumption of sugary
drinks every day leads to obesity."
Parameters of Hypothesis Testing:
● Null Hypothesis
● Alternate Hypothesis

Definition:
  Null Hypothesis (H0): a statement in which there is no relation
  between the two variables.
  Alternate Hypothesis (H1): a statement in which there is some
  statistical relationship between the two variables.
What it is:
  H0: generally, researchers try to reject or disprove it.
  H1: researchers try to accept or prove it.
Testing process:
  H0: indirect and implicit. H1: direct and explicit.
P-value:
  H0 is rejected if the p-value is less than the alpha value;
  otherwise, it is accepted.
  H1 is accepted if the p-value is less than the alpha value;
  otherwise, it is rejected.
Notation: H0 (null), H1 (alternate).
Symbols used:
  H0: equality symbols =, <=, >=
  H1: inequality symbols !=, !<=, !>=
Example: effect of bio-fertilizer 'x' on plant growth
Alternative Hypothesis H1: application of bio-fertilizer 'x' increases
plant growth.
Null Hypothesis H0: application of bio-fertilizer 'x' does not
increase plant growth.
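The H0-vs-H1 decision can be sketched with a two-sample t-test in SciPy; the plant-growth measurements below are invented (two groups, with and without the fertilizer), so only the procedure, not the numbers, comes from the notes:

```python
# Minimal sketch: decide between H0 and H1 with a two-sample t-test.
# H0: the fertilizer does not increase growth; H1: it does.
from scipy import stats

with_fertilizer = [21.5, 22.1, 23.0, 22.8, 21.9, 23.4, 22.6, 23.1]
without_fertilizer = [19.8, 20.2, 19.5, 20.9, 20.1, 19.9, 20.4, 20.0]

ALPHA = 0.05  # significance level
t_stat, p_value = stats.ttest_ind(with_fertilizer, without_fertilizer)

if p_value < ALPHA:
    decision = "reject H0: the fertilizer appears to increase growth"
else:
    decision = "fail to reject H0: no significant effect detected"
print(f"p-value = {p_value:.4f} -> {decision}")
```

This mirrors the p-value row of the comparison: H0 is rejected exactly when the p-value falls below the chosen alpha.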


‭Q16) Write a short note on Multivariate Regression‬

‭Multivariate Regression‬

‭Multivariate regression is a statistical technique that uses a‬
‭mathematical model to estimate the relationship between a dependent‬
variable and multiple independent variables.

It's an extension of linear regression, which involves only one
response variable. Multivariate regression can be used in a variety of
applications, including: identifying risk factors for an outcome,
determining the effect of a procedure on an outcome, comparing
different treatment strategies, quantifying the magnitude of an
effect, and developing risk-prediction models.

Multivariate Regression is a method used to measure the degree to
which more than one independent variable (predictor) and more than
one dependent variable (response) are linearly related. The method is
broadly used to predict the behavior of the response variables
associated with changes in the predictor variables, once a desired
degree of relation has been established.

The Multivariate Regression model relates more than one predictor and
more than one response:
Y = X*B + ϵ
where Y is the matrix of response variables, X is the matrix of
predictors, B is the matrix of regression coefficients, and ϵ is the
error term.
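The Y = X*B + ϵ model can be fitted with scikit-learn's LinearRegression, which handles several responses at once; the data here are synthetic, generated from a known coefficient matrix so the fit can be checked:

```python
# Minimal sketch: multivariate (multi-output) linear regression.
# Two predictors, two responses, data generated from known B.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))            # predictor matrix (100 x 2)
B_true = np.array([[2.0, -1.0],          # true coefficient matrix B
                   [0.5, 3.0]])
Y = X @ B_true + rng.normal(scale=0.05, size=(100, 2))  # Y = X*B + noise

model = LinearRegression()
model.fit(X, Y)                          # fits both responses jointly
print(model.coef_)                       # approximately B_true.T
```

Note that scikit-learn stores coefficients as (n_responses, n_predictors), i.e. the transpose of B in the equation above.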


‭Here are some examples of how multivariate regression can be used:‬

‭●‬ ‭Pesticide concentration in surface water‬
‭A multivariate regression model can estimate the relationship between‬

‭river flow and seasonal pesticide use, and how these factors affect‬
‭pesticide concentration in surface water.‬
‭●‬ ‭Intracranial bleeding‬

‭A multivariate logistic regression analysis can identify the strongest‬
‭predictors of intracranial bleeding, such as vomiting/nausea and‬
‭seizures.‬
‭●‬ ‭Multiple genetic variants and neuroimaging phenotypes‬

‭A multivariate regression model can capture the complex relationships‬


‭between genes and brain measurements.‬

‭Q17) State the importance of feature selection. How it is useful in‬


‭machine learning algorithms?‬

‭Feature selection is a critical step in the machine learning pipeline that‬


‭involves selecting a subset of relevant features (or variables) for use in‬
‭model construction. It plays a significant role in improving the‬
‭performance of machine learning algorithms. Here’s an overview of its‬
‭importance and usefulness:‬

‭Importance of Feature Selection‬


‭1.‬ ‭Reduces Overfitting‬‭:‬
‭○‬ ‭By eliminating irrelevant or redundant features, feature‬
‭selection helps to reduce the complexity of the model. This,‬
‭in turn, lowers the risk of overfitting, where the model‬
‭learns noise instead of the underlying patterns in the‬
‭training data.‬


‭2.‬ ‭Improves Model Performance‬‭:‬

‭○‬ ‭Selecting the most relevant features can enhance the‬
‭model’s accuracy and predictive power. It allows the model‬

‭to focus on the most informative data points, which can lead‬
‭to better generalization to unseen data.‬
‭3.‬ ‭Enhances Interpretability‬‭:‬

‭○‬ ‭A model with fewer features is often easier to interpret‬
‭and understand. This is especially important in fields such as‬
‭healthcare and finance, where stakeholders need to‬
‭understand the factors driving predictions.‬

‭4.‬ ‭Reduces Computational Cost‬‭:‬


‭○‬ ‭Fewer features lead to a simpler model that requires less‬

‭computational resources, which is particularly beneficial for‬


‭large datasets. It speeds up the training process and‬
‭reduces the time and memory required for both training and‬
‭prediction.‬

‭5.‬ ‭Addresses the Curse of Dimensionality‬‭:‬


‭○‬ ‭In high-dimensional spaces, the amount of data required to‬
‭make reliable predictions increases exponentially. Feature‬
‭selection mitigates this issue by reducing the‬
‭dimensionality, allowing the model to perform better with‬
‭limited data.‬
‭6.‬ ‭Improves Data Quality‬‭:‬
‭○‬ ‭Feature selection can help identify and remove noisy or‬
‭irrelevant features that do not contribute meaningfully to‬
‭the analysis, leading to higher-quality datasets and better‬
‭model performance.‬
‭7.‬ ‭Facilitates Model Selection‬‭:‬


‭○‬ ‭Different machine learning algorithms may require‬

‭different features for optimal performance. Feature‬
‭selection can help identify the most relevant features for‬

‭each algorithm, aiding in model comparison and selection.‬

‭Usefulness in Machine Learning Algorithms‬

‭1.‬ ‭Enhanced Learning‬‭:‬


‭○‬ ‭Machine learning algorithms perform better when they‬
‭focus on the most relevant features. Feature selection‬
‭helps in identifying those features that contribute most‬

‭significantly to the output, leading to improved learning.‬


‭2.‬ ‭Faster Training‬‭:‬
‭○‬ ‭Training times are reduced when fewer features are used.‬

‭This is especially important for algorithms like Support‬


‭Vector Machines (SVM), Random Forest, or Neural‬
‭Networks, which can be computationally intensive.‬

‭3.‬ ‭Better Generalization‬‭:‬


‭○‬ ‭Models built with relevant features are more likely to‬
‭generalize well to new, unseen data. This leads to better‬
‭performance in real-world applications where the model‬
‭encounters data it has not seen before.‬
‭4.‬ ‭Support for Specific Algorithms‬‭:‬
‭○‬ ‭Some algorithms, like decision trees, can benefit‬
‭significantly from feature selection. By reducing the number‬
‭of features, decision trees can create simpler models that‬
‭make better splits and predictions.‬
‭5.‬ ‭Increased Robustness‬‭:‬
‭○‬ ‭Feature selection can lead to models that are less sensitive‬


‭to variations in the data, making them more robust in the‬

‭presence of noise or outliers.‬

‭Methods of Feature Selection‬

‭Feature selection can be performed using various methods, including:‬

‭●‬ ‭Filter Methods‬‭: Evaluate features based on statistical measures‬
‭(e.g., correlation, Chi-square test) to select relevant features‬
‭independent of the learning algorithm.‬

‭●‬ ‭Wrapper Methods‬‭: Use a specific machine learning algorithm to‬



‭evaluate combinations of features and select the best-performing‬


‭subset.‬

‭●‬ ‭Embedded Methods‬‭: Perform feature selection during the model‬


‭training process (e.g., Lasso regression, which adds a penalty for‬
‭including too many features).‬
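The three method families above can be sketched side by side with scikit-learn on its bundled iris data; the specific estimators and parameter values are illustrative choices, not the only options:

```python
# Minimal sketch of the three feature-selection families:
# filter (SelectKBest), wrapper (RFE), and embedded (L1 penalty).
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Filter: score each feature independently with an ANOVA F-test.
filt = SelectKBest(f_classif, k=2).fit(X, y)

# Wrapper: recursively drop the weakest feature according to the model.
wrap = RFE(LogisticRegression(max_iter=1000),
           n_features_to_select=2).fit(X, y)

# Embedded: the L1 penalty drives some coefficients to exactly zero
# during training, selecting features as a side effect.
emb = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)

print("filter keeps:", filt.get_support())
print("wrapper keeps:", wrap.support_)
print("nonzero L1 coefficients per class:", (emb.coef_ != 0).sum(axis=1))
```

Filter methods are cheapest, wrapper methods are most expensive but model-aware, and embedded methods sit in between, which matches the trade-offs described above.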

Q18) State and explain common errors / mistakes in machine learning.

5 Common Machine Learning Errors
● Lack of understanding of the mathematical aspects of machine
learning algorithms, which leads to poor choices of algorithm and
hyperparameters
● Poor data preparation and sampling:
○ Data cleansing
○ Feature engineering
○ Sampling
● Implementing machine learning algorithms without a strategy
● Implementing everything from scratch instead of reusing well-tested
libraries
● Ignoring outliers, which can distort the learned model

References
Research paper on cognitive automation by Christian Engel, Philipp
Ebel, and Jan Marco Leimeister
https://www.javatpoint.com/
https://www.geeksforgeeks.org/
https://www.kaggle.com/
