ML U1 & U2 Notes


‭Machine Learning Unit 1 and Unit 2 Notes‬

‭Unit 1‬

Q1) What do you mean by Learning? Justify that cognitive automation is a subset of machine learning by giving examples.


● Machine learning is a process through which computerized systems use human-supplied data and feedback to independently make decisions and predictions, typically becoming more accurate with continual training. This contrasts with traditional computing, in which every action taken by a computer must be pre-programmed.
‭●‬ ‭Reinforcement‬‭learning‬‭teaches a system as it interacts with an‬
‭environment by offering it rewards when it performs an action‬
‭correctly.‬

● Supervised learning, which applies to the computer-vision systems used in autonomous vehicles.
‭●‬ ‭Unsupervised‬‭learning‬‭, which is used when data need to be‬

‭clustered (for example, audience segmentation for streaming‬


‭services or product recommendations to online shoppers).‬
‭●‬ ‭’Cognition’‬‭refers to all processes by which the sensory input is‬

‭transformed, reduced, elaborated, stored, recovered, and used.‬


‭Such terms as sensation, perception, imagery, retention, recall,‬
‭problem-solving, and thinking, among many others, refer to‬
‭hypothetical stages or aspects of cognition‬
‭●‬ ‭Automation‬‭refers to the full or partial “execution by a machine‬
‭agent (usually a computer) of a function that was previously‬
‭carried out by a human”‬
● Cognitive Automation refers to leveraging ML to automate cognitive knowledge and service work and so realize the value offered by AI; it is based on implementing artificial cognition that mimics and approximates human cognition in machines.

Justifying Cognitive Automation as a Subset of Machine Learning with Examples:

‭●‬ ‭Natural Language Processing (NLP) in Customer Service: Cognitive‬
‭automation systems that handle customer inquiries use supervised‬

‭machine learning models to understand text (via NLP). They learn‬
‭from historical chat data to provide responses that mimic a‬
‭human customer service agent. These systems can independently‬

‭refine their responses based on interactions, making them more‬
‭efficient over time.‬
‭●‬ ‭Example: Virtual assistants like chatbots that interpret customer‬
‭queries and provide responses, learning from past interactions to‬

‭become better at understanding and solving customer problems.‬


‭●‬ ‭Automated Document Processing in Finance: Cognitive automation‬
‭can read and understand invoices, contracts, and financial‬

‭statements using machine learning. It uses techniques from‬


‭supervised learning to extract information from scanned‬
‭documents and reinforcement learning to improve over time based‬

‭on the accuracy of its extractions.‬


‭●‬ ‭Example: A machine learning system automating the review and‬
‭approval of loan applications, making decisions by learning from‬
‭historical data on past approvals and rejections.‬
‭●‬ ‭Intelligent Decision Support Systems: Machine learning‬
‭algorithms in these systems learn from historical data and human‬
‭input to assist in decision-making. For instance, in healthcare,‬
‭cognitive automation helps doctors by recommending treatment‬
‭plans based on previous patient data, imaging, and test results,‬
‭which uses both unsupervised learning (to group similar patient‬
‭cases) and supervised learning (to make treatment predictions).‬
‭●‬ ‭Example: AI-powered diagnostic tools that assist doctors by‬
‭analyzing medical data and images to suggest possible diagnoses.‬


‭Q2) Describe various phases used by machine learning.‬


‭1. Gathering Data:‬


Data gathering is the first step of the machine learning life cycle. The goal of this step is to identify the various data sources and obtain the data related to the problem.
‭This step includes the below tasks:‬
‭●‬ ‭Identify various data sources‬
‭●‬ ‭Collect data‬
‭●‬ ‭Integrate the data obtained from different sources‬
By performing the above tasks, we get a coherent set of data, also called a dataset, which will be used in further steps.

‭2. Data preparation‬


‭After collecting the data, we need to prepare it for further steps.‬


‭Data preparation is a step where we put our data into a suitable place‬

‭and prepare it to use in our machine learning training.‬
‭This step can be further divided into two processes:‬

‭Data exploration:‬
‭It is used to understand the nature of data that we have to work with.‬
‭We need to understand the characteristics, format, and quality of‬

‭data. A better understanding of data leads to an effective outcome. In‬
‭this, we find Correlations, general trends, and outliers.‬
‭Data pre-processing:‬
‭Now the next step is preprocessing of data for its analysis.‬

‭3. Data Wrangling‬


Data wrangling is the process of cleaning and converting raw data into a usable format. It is the process of cleaning the data, selecting the variables to use, and transforming the data into a proper format to make it more suitable for analysis in the next step. Collected data commonly has quality issues such as:

‭●‬ ‭Missing Values‬


‭●‬ ‭Duplicate data‬
‭●‬ ‭Invalid data‬
‭●‬ ‭Noise‬
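The wrangling issues listed above can be handled with a few pandas operations; the DataFrame below is hypothetical and only illustrates the idea:

```python
import numpy as np
import pandas as pd

# Hypothetical raw data exhibiting the issues above:
# a missing age, a duplicated row, and an invalid (negative) age.
raw = pd.DataFrame({
    "age":    [25, np.nan, 37, 37, -5],
    "income": [50000, 62000, 58000, 58000, 61000],
})

df = raw.drop_duplicates().copy()                 # duplicate data
df["age"] = df["age"].fillna(df["age"].median())  # missing values
df = df[df["age"].between(0, 120)]                # invalid data
```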

‭4. Data Analysis‬


‭Now the cleaned and prepared data is passed on to the analysis step.‬
‭This step involves:‬
‭●‬ ‭Selection of analytical techniques‬
‭●‬ ‭Building models‬
‭●‬ ‭Review the result‬
‭The aim of this step is to build a machine learning model to analyze the‬


‭data using various analytical techniques and review the outcome.‬

‭Hence, in this step, we take the data and use machine learning‬
‭algorithms to build the model.‬

5. Train Model
The next step is to train the model. In this step we train the model with datasets, using various machine learning algorithms, to improve its performance and produce a better outcome for the problem. Training is required so that the model can learn the various patterns, rules, and features.

‭6. Test Model‬


‭Once our machine learning model has been trained on a given dataset,‬

‭then we test the model. In this step, we check for the accuracy of our‬
‭model by providing a test dataset to it.‬
‭Testing the model determines the percentage accuracy of the model as‬

‭per the requirement of project or problem.‬

‭7. Deployment‬
‭The last step of machine learning life cycle is deployment, where we‬
‭deploy the model in the real-world system.‬
‭If the above-prepared model is producing an accurate result as per our‬
‭requirement with acceptable speed, then we deploy the model in the‬
‭real system. But before deploying the project, we will check whether it‬
‭is improving its performance using available data or not. The‬
‭deployment phase is similar to making the final report for a project.‬
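The train, test, and deployment-check phases above can be sketched with scikit-learn on the Iris dataset; the 90% accuracy threshold here is an assumed project requirement, not a fixed rule:

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)                      # gathered dataset

# Phases 5-6: split, train, then test on held-out data.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42)
model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
accuracy = model.score(X_test, y_test)

# Phase 7: deploy only if accuracy meets the (assumed) requirement.
ready_to_deploy = accuracy >= 0.9
```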

‭Q3) Every machine learning algorithm should have some key points‬
‭while designing. What are they? Explain them in brief.‬



Step 1) Choosing the Training Experience: The first and most important task is to choose the training data or training experience that will be fed to the machine learning algorithm. The data or experience we feed to the algorithm has a significant impact on the success or failure of the model, so it should be chosen wisely.
Below are the attributes that impact the success or failure of the model:
‭●‬ ‭The training experience will be able to provide direct or indirect‬
‭feedback regarding choices. For example: While Playing chess the‬
‭training data will provide feedback to itself like instead of this‬
‭move if this is chosen the chances of success increases.‬


● The second important attribute is the degree to which the learner controls the sequence of training examples. For example: when training data is first fed to the machine, accuracy is very low, but as the machine gains experience by playing again and again against itself or an opponent, the algorithm receives feedback and controls the chess game accordingly.

● The third important attribute is how well the training experience represents the distribution of examples over which the final performance will be measured. A machine learning algorithm gains experience by working through a number of different cases and examples; by passing through more and more examples, its performance increases.

Step 2- Choosing the Target Function: The next important step is choosing the target function. According to the knowledge fed to the algorithm, the machine learning system will choose a NextMove function that describes which legal move should be taken. For example: while playing chess against an opponent, when the opponent moves, the algorithm decides which of the possible legal moves to take in order to succeed.
Step 3- Choosing a Representation for the Target Function: Once the algorithm knows all the possible legal moves, the next step is to choose a representation for computing the optimized move, e.g. linear equations, a hierarchical graph representation, a tabular form, etc. The NextMove function then selects, out of the available moves, the one with the highest success rate. For example: if the machine has 4 possible chess moves, it will choose the optimized move that leads to success.

Step 4- Choosing a Function Approximation Algorithm: An optimized move cannot be chosen from the training data alone. The system has to work through a set of training examples; from each game it observes which moves were chosen and whether they led to failure or success, and from that feedback it estimates which step should be chosen next and what its success rate is.

Step 5- Final Design: The final design emerges after the system has gone through many examples, failures and successes, and correct and incorrect decisions. Example: Deep Blue, an intelligent chess-playing computer, won a match against the chess expert Garry Kasparov and became the first computer to defeat a reigning world chess champion.

‭Q4) What is the process of machine learning algorithm and its‬


‭testing in real life?‬
‭Q5) State and explain various types of data used in ML with‬
‭suitable examples.‬

‭Types of Data related to Machine Learning‬


Data types are a way of classification that specifies which type of value a variable can store and which mathematical, relational, or logical operations can be applied to the variable without causing an error. In machine learning, it is very important to know the appropriate data types of the independent and dependent variables, as this provides the basis for selecting classification or regression models. Incorrect identification of data types leads to incorrect modeling, which in turn leads to an incorrect solution.
Here we discuss the different data types with suitable examples.
‭Different Types of data types‬

1. Quantitative data type: –
This data type consists of numerical values: anything that is measured in numbers.
E.g., profit, quantity sold, height, weight, temperature, etc.
This is again of two types:

A.) Discrete data type: –
Numeric data that takes discrete values (whole numbers). Values of this type have no proper meaning when expressed in decimal format; they can be counted.
E.g.: number of cars you have, number of marbles in a container, students in a class, etc.


B.) Continuous data type: –
Numerical measures that can take any value within a certain range. Values of this type have true meaning when expressed in decimal format; they cannot be counted, only measured, and the number of possible values is infinite.
E.g.: height, weight, time, area, distance, measurement of rainfall, etc.

2. Qualitative data type: –
These are data types that cannot be expressed in numbers. They describe categories or groups and are hence known as categorical data types.
This can be divided into:
a. Structured data:
This type of data consists of numbers or words. It can take numerical values, but mathematical operations cannot be meaningfully performed on them. This type of data is expressed in tabular format.
E.g.: Sunny = 1, Cloudy = 2, Windy = 3, or binary-form data like 0 or 1, Good or Bad, etc.
b. Unstructured data:
This type of data does not have a proper format and is therefore known as unstructured data. It comprises textual data, sounds, images, videos, etc.


Besides this, there are other types, referred to as data type preliminaries or data measures. These can also be referred to as different scales of measurement.

I. Nominal Data Type:
This is used to express names or labels that have no order and are not measurable.
E.g., male or female (gender), race, country, etc.
‭II. Ordinal Data Type:‬
‭This is also a categorical data type like nominal data but has some‬
‭natural ordering associated with it.‬
‭E.g., Likert rating scale, Shirt sizes, Ranks, Grades, etc.‬


III. Interval Data Type:
This is numeric data that has a proper order, and the difference between values is meaningful, but there is no absolute zero: zero does not mean the complete absence of the quantity but still carries some value. This is a local scale.
E.g., temperature measured in degrees Celsius, time of day, SAT score, credit score, pH, etc.



‭IV. Ratio Data Type:‬


This quantitative data type is the same as the interval data type but has an absolute zero: here zero means complete absence, and the scale starts from zero. This is a global scale.
‭E.g., Temperature in Kelvin, height, weight, etc.‬
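To see why the scale of measurement matters before modeling, here is a small pandas sketch (values made up): the nominal feature gets one-hot encoding, since any numeric order would be meaningless, while the ordinal feature gets order-preserving integer codes:

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["IN", "US", "IN"],   # nominal: labels, no order
    "size":    ["S", "L", "M"],      # ordinal: S < M < L
})

# Nominal -> one-hot columns (no implied ordering).
nominal = pd.get_dummies(df["country"], prefix="country")

# Ordinal -> integer codes that preserve the natural ordering.
size_order = {"S": 0, "M": 1, "L": 2}
df["size_code"] = df["size"].map(size_order)
```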


‭Q6) Define the term dataset. What are the properties of dataset one‬
‭should consider while choosing dataset.‬

‭A Dataset is a set of data grouped into a collection with which‬


‭developers can work to meet their goals. In a dataset, the rows‬

‭represent the number of data points and the columns represent the‬
‭features of the Dataset. Datasets may vary in size and complexity and‬
‭they mostly require cleaning and preprocessing to ensure data quality‬

‭and suitability for analysis or modeling.‬


‭Let us see an example below:‬

‭This is the Iris dataset. Since this is a dataset with which we build‬

‭models, there are input features and output features. Here:‬
‭The input features are Sepal Length, Sepal Width, Petal Length, and‬
‭Petal Width.‬
‭Species is the output feature.‬
‭Datasets can be stored in multiple formats. The most common ones are‬
‭CSV, Excel, JSON, and zip files for large datasets such as image‬

‭datasets.‬

‭Why are datasets used?‬


‭Datasets are used to train and test AI models, analyze trends, and gain‬
‭insights from data. They provide the raw material for computers to‬

‭learn patterns and make predictions.‬

‭Types of Datasets‬
‭Numerical Dataset, Categorical Dataset, Web Dataset, Time series‬
‭Dataset, Image Dataset, Ordered Dataset, Partitioned Dataset,‬
‭File-Based Datasets, Bivariate Dataset, Multivariate Dataset‬
‭Data Interpretation‬
It means conducting a complete study of the data: analyzing the number of rows and columns, the data types, useful and redundant data, and checking for null values.
Based on this study, various operations are performed on the data to make it suitable for feeding into ML models, such as feature engineering, dimension reduction, imputation of null and missing values, and data type conversion by encoding methods.
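A minimal sketch of such data interpretation with pandas (the toy columns are assumptions, standing in for a real dataset):

```python
import numpy as np
import pandas as pd

# Toy data standing in for a real dataset (column names assumed).
df = pd.DataFrame({
    "sepal_length": [5.1, 4.9, np.nan],
    "species": ["setosa", "setosa", "virginica"],
})

n_rows, n_cols = df.shape        # number of rows and columns
dtypes = df.dtypes               # data type of each column
null_counts = df.isnull().sum()  # null values per column
```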

‭The choice of dataset can significantly impact the model's‬
‭performance, generalization, and the insights it can provide. Here are‬
‭some key considerations and steps to guide you in choosing the right‬

‭dataset for your machine learning project:‬
‭1. Define Your Problem and Objectives:‬
‭Start by clearly defining the problem you want to solve and the‬

‭objectives you want to achieve with your machine learning model.‬


‭Understanding the problem domain and the goals of your project is‬
‭essential for selecting an appropriate dataset.‬

‭2. Data Relevance:‬


‭Ensure that the dataset is relevant to your problem. It should contain‬

‭features (attributes) that are meaningful and related to the problem‬


‭you're trying to solve. Irrelevant or redundant features can introduce‬
‭noise and reduce model performance.‬

‭3. Data Size:‬


‭Consider the size of the dataset. In general, larger datasets tend to‬
‭produce more accurate and robust models, especially for complex‬
‭problems. However, collecting and processing large datasets can be‬
‭resource-intensive.‬

‭4. Data Quality:‬


‭Data quality is paramount. Check for missing values, outliers, and errors‬
‭in the dataset. Low-quality data can lead to biased or inaccurate‬


‭models. Data preprocessing may be required to clean and prepare the‬

‭dataset.‬

‭5. Data Balance:‬
‭For classification problems, check the class distribution. An imbalanced‬
‭dataset (where one class significantly outnumbers the others) can lead‬

‭to biased models. Techniques like oversampling or undersampling may‬
‭be needed to address class imbalance.‬
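One simple way to address class imbalance is random oversampling of the minority class; the sketch below uses scikit-learn's resample utility on synthetic labels:

```python
import numpy as np
from sklearn.utils import resample

# Synthetic imbalanced labels: 90 samples of class 0, 10 of class 1.
X = np.arange(100).reshape(-1, 1)
y = np.array([0] * 90 + [1] * 10)

# Randomly oversample the minority class up to the majority count.
X_min, y_min = X[y == 1], y[y == 1]
X_up, y_up = resample(X_min, y_min, replace=True, n_samples=90,
                      random_state=0)

X_bal = np.vstack([X[y == 0], X_up])
y_bal = np.concatenate([y[y == 0], y_up])
```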
‭6. Data Diversity:‬

‭Ensure that the dataset covers a diverse range of scenarios or‬


‭conditions relevant to your problem. Diversity helps the model‬
‭generalize better to unseen data.‬

‭7. Data Availability:‬


‭Consider the accessibility and availability of the dataset. Ensure that‬

‭you have the necessary permissions to use the data, and check for any‬
‭legal or ethical constraints.‬

‭8. Data Collection:‬


‭Depending on your problem, you may need to collect your own data‬
‭through surveys, sensors, web scraping, or other means. Be mindful of‬
‭data collection methods and ethics.‬
‭9. Public Datasets:‬
‭Explore publicly available datasets from sources like Kaggle, UCI‬
‭Machine Learning Repository, government databases, or academic‬
‭datasets. These datasets can be a valuable resource for‬
‭experimentation.‬


‭10. Domain Knowledge:‬
‭- Leverage domain knowledge and expertise in the field related to your‬

‭problem. Subject matter experts can guide you in selecting relevant‬
‭datasets and understanding the nuances of the data.‬

‭11. Data Exploration:‬


‭- Perform exploratory data analysis (EDA) to gain insights into the‬
‭dataset. Visualizations, summary statistics, and correlations can help‬
‭you understand the data's characteristics.‬

‭12. Data Splitting:‬


‭- Divide the dataset into training, validation, and testing sets. This is‬

‭crucial for model evaluation and preventing overfitting.‬
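Since scikit-learn's train_test_split produces only two parts, a common approach (shown on synthetic data) is to apply it twice, yielding a 60/20/20 train/validation/test split:

```python
import numpy as np
from sklearn.model_selection import train_test_split

X = np.arange(200).reshape(-1, 1)
y = np.arange(200)

# First carve out the 20% test set...
X_rest, X_test, y_rest, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)
# ...then split the remainder 75/25 into train and validation.
X_train, X_val, y_train, y_val = train_test_split(
    X_rest, y_rest, test_size=0.25, random_state=0)
```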

‭13. Ethical Considerations:‬



‭- Be aware of ethical considerations when working with data, especially‬


‭if the data contains sensitive information. Ensure that privacy and‬
‭ethical guidelines are followed.‬

‭14. Data Licensing:‬


‭- Check the licensing terms and restrictions associated with the‬
‭dataset. Some datasets may have specific usage terms that you need‬
‭to adhere to.‬

‭15. Iterative Process:‬


‭- Dataset selection is often an iterative process. You may need to‬


‭experiment with different datasets to find the one that works best‬

‭for your problem.‬

‭Q7) Compare supervised and unsupervised learning‬

| Feature | Supervised Learning | Unsupervised Learning |
|---|---|---|
| Definition | Learns from labeled data, where the correct output is known. | Learns from unlabeled data, discovering patterns on its own. |
| Objective | Predict outcomes or classify data into predefined categories. | Find hidden patterns, group data, or detect anomalies. |
| Training Data | Uses labeled data (data tagged with the correct output). | Uses unlabeled data (data without predefined labels). |
| Types of Problems | Classification (categorical outcomes); Regression (continuous outcomes) | Clustering (grouping based on similarity); Association (finding relationships) |
| Examples | Predicting if an email is spam or not; predicting house prices | Grouping customers by purchasing behavior; market basket analysis |
| Algorithms | Linear/Logistic Regression, Decision Trees, Support Vector Machines (SVM), Random Forests, Naive Bayes | K-Means Clustering, Hierarchical Clustering, Apriori Algorithm, DBSCAN |
| Evaluation Metrics | Accuracy, Precision, Recall, F1 Score, Mean Squared Error (MSE) | Silhouette Score, Adjusted Rand Index, Davies-Bouldin Index, Calinski-Harabasz Score |
| Presence of Output Labels | Yes, output labels are present. | No, output labels are not available. |
| Task | Maps input to a known output. | Identifies hidden patterns or structures in data. |
| Approach | Learns by example with guidance. | Learns without guidance, based on data's inherent structure. |
| Use Cases | Spam detection, fraud detection, speech recognition | Anomaly detection, market segmentation, network analysis |
| Advantages | Accurate predictions with well-labeled data; can handle complex tasks | Works with unlabeled data; finds hidden patterns automatically |
| Disadvantages | Requires labeled data; time-consuming to label data; can struggle with complex or dynamic environments | May produce less accurate results; harder to interpret and validate findings |
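The central difference between the two, learning with labels versus without, can be seen directly in scikit-learn: the classifier below is fitted on (X, y), while KMeans sees only X. A minimal sketch on the Iris data (the cluster count of 3 is chosen by us, not learned):

```python
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Supervised: the labels y guide the learning.
clf = LogisticRegression(max_iter=1000).fit(X, y)

# Unsupervised: KMeans groups X into 3 clusters without seeing y.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)

n_classes = len(set(clf.predict(X)))
n_clusters = len(set(km.labels_))
```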
‭Q8) Justify the need of data mining in machine learning.‬

‭Data mining‬‭and‬‭machine learning‬‭are closely interrelated,‬‭and data‬


‭mining is crucial for the success of machine learning algorithms. Here's‬

‭a justification for the need of data mining in machine learning:‬

‭1. Extracting Relevant Patterns and Knowledge from Large Datasets‬



‭●‬ ‭Machine learning‬‭algorithms rely heavily on clean,‬


‭well-structured, and relevant data to learn and make predictions.‬
‭Data mining‬‭helps in identifying and extracting useful‬‭patterns,‬
‭relationships, and trends from vast amounts of raw data, turning‬
‭it into meaningful information. For example, it can identify‬
‭patterns in customer behavior, fraud detection, or market‬
‭trends, which are valuable inputs for machine learning models.‬

‭2. Data Preprocessing‬

‭●‬ ‭Before machine learning algorithms can be applied, the raw data‬
‭must be preprocessed to handle missing values, outliers, and‬


‭irrelevant features.‬‭Data mining techniques‬‭like data‬‭cleaning,‬

‭transformation, and normalization ensure the dataset is of high‬
‭quality and suitable for training machine learning models.‬

‭●‬ ‭For example, in customer data, irrelevant features such as‬
‭unrelated columns or inconsistencies in user details can negatively‬
‭affect model accuracy. Data mining helps clean and prepare such‬
‭data.‬
‭3. Feature Selection and Engineering‬
‭●‬ ‭The performance of machine learning algorithms significantly‬

‭depends on the choice of features.‬‭Data mining‬‭techniques‬‭assist‬


‭in feature selection (choosing the most relevant variables) and‬
‭feature engineering (creating new useful features from existing‬

‭ones). This improves the performance and accuracy of machine‬


‭learning models.‬
‭●‬ ‭Example: In predictive analytics for loan approvals, data mining‬

‭can help identify key features like income, credit score, and loan‬
‭history, which are most relevant for making predictions.‬

‭4. Handling Unstructured Data‬

‭●‬ ‭A large portion of data, such as text, images, and videos, is‬
‭unstructured.‬‭Data mining‬‭techniques can be used to‬‭extract‬
‭structured information from this data, which is essential for‬
‭feeding into machine learning models.‬
‭●‬ ‭Example: Mining text data to extract useful features such as‬
‭sentiment or topic, which can be used in natural language‬
‭processing (NLP) tasks like sentiment analysis or recommendation‬
‭systems.‬


‭5. Discovering Hidden Patterns in Unlabeled Data‬

‭●‬ ‭Unsupervised learning‬‭methods (such as clustering or association)‬

‭are used in machine learning to find hidden patterns in data‬
‭without labels.‬‭Data mining‬‭techniques help in uncovering‬‭these‬
‭patterns, associations, or clusters, which machine learning models‬

‭can use to improve their decision-making processes.‬
‭●‬ ‭Example: In market segmentation, data mining might discover‬
‭clusters of customers with similar buying habits, which can then‬
‭be used for personalized marketing strategies.‬

‭6. Reducing Dimensionality‬

‭●‬ ‭Data mining‬‭can help in dimensionality reduction, which reduces‬



‭the number of variables while preserving the important‬


‭information. This is essential for machine learning, especially‬
‭when dealing with high-dimensional data, where too many features‬

‭can lead to overfitting or slow processing.‬


‭●‬ ‭Example: Techniques like‬‭Principal Component Analysis‬‭(PCA)‬‭or‬
‭Singular Value Decomposition (SVD)‬‭are often used‬‭in data‬
‭mining to reduce dimensions, making machine learning algorithms‬
‭more efficient.‬
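As a minimal sketch of the PCA technique mentioned above, scikit-learn can project the 4 Iris features onto 2 principal components:

```python
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA

X, _ = load_iris(return_X_y=True)   # 150 samples, 4 features

pca = PCA(n_components=2)
X_reduced = pca.fit_transform(X)    # project onto 2 components

# Fraction of the original variance the 2 components preserve.
explained = pca.explained_variance_ratio_.sum()
```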

‭7. Improving Accuracy and Predictive Power‬


‭●‬ ‭By extracting useful patterns, relationships, and trends from raw‬
‭data,‬‭data mining‬‭enhances the accuracy and predictive‬‭power of‬
‭machine learning models. Data mining helps in identifying the most‬
‭relevant factors that can influence outcomes, thereby refining‬
‭the learning process.‬
‭●‬ ‭Example: In fraud detection, data mining can uncover subtle‬


‭patterns in transaction data that machine learning algorithms can‬

‭use to accurately predict and detect fraud.‬

‭8. Knowledge Discovery in Databases (KDD)‬

‭●‬ ‭Machine learning is often a part of the broader process called‬
‭Knowledge Discovery in Databases (KDD)‬‭, where data‬‭mining‬

‭plays a pivotal role. Data mining techniques are essential to‬
‭discover previously unknown relationships or patterns from large‬
‭datasets, which can then be leveraged by machine learning models‬
‭for prediction and classification.‬

‭●‬ ‭Example: In medical research, data mining can help discover‬


‭hidden correlations between symptoms and diseases, leading to‬
‭the development of better predictive healthcare models using‬

‭machine learning.‬

‭9. Handling Noisy and Incomplete Data‬



‭●‬ ‭Real-world data is often noisy or incomplete, which can degrade‬


‭the performance of machine learning models.‬‭Data mining‬
‭techniques are designed to clean and handle such data, ensuring‬
‭that the final dataset used for machine learning is reliable and‬
‭robust.‬
‭●‬ ‭Example: In financial data, missing values or errors in‬
‭transactional records can be corrected or imputed through data‬
‭mining techniques before applying machine learning for credit risk‬
‭assessment.‬
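A minimal sketch of the imputation idea with scikit-learn's SimpleImputer; the transaction amounts below are made up for illustration:

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Hypothetical transaction amounts with missing entries.
amounts = np.array([[100.0], [np.nan], [300.0], [np.nan]])

# Replace each NaN with the column mean before model training.
imputer = SimpleImputer(strategy="mean")
cleaned = imputer.fit_transform(amounts)
```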

‭10. Efficiency in Dealing with Large-Scale Data‬


‭●‬ ‭With the growing amount of data generated every day (e.g., from‬

‭IoT devices, social media, or transactions),‬‭data‬‭mining‬‭is‬
‭essential for handling and processing large-scale datasets‬

‭efficiently. Machine learning models often need summarized or‬
‭reduced data to operate effectively, which data mining provides‬
‭through aggregation and summarization techniques.‬

‭●‬ ‭Example: In social media analysis, mining relevant social posts‬
‭from millions of records and summarizing them for machine‬
‭learning sentiment analysis or trend prediction.‬

‭Conclusion:‬

‭Data mining‬‭serves as a crucial preprocessing step and knowledge‬


‭discovery tool in the machine learning pipeline. It ensures that the raw‬

‭data is converted into a form that machine learning algorithms can‬


‭effectively learn from, enhancing their predictive accuracy, efficiency,‬
‭and relevance. Without data mining, machine learning would struggle to‬

‭handle large-scale, unstructured, and noisy datasets, limiting its‬


‭real-world applicability.‬

‭Q9) As a researcher you are expected to work with detection of‬


‭COVID at early stage. What kind of data and types of data you‬
‭will select? What kind of properties you will consider while choosing‬
‭data and dataset.‬
‭Main detection features will be the following 3:‬
‭Antibody test (IgG)‬
‭Antibody testing is also known as serological testing. Your doctor or‬
‭medical laboratory technician will use it to examine the type of‬
‭antibodies present in your blood.‬


‭There are numerous antibodies in the blood. The technician or nurse‬
‭will collect a sample of your blood and examine it for IgM and IgG. Ig‬

‭stands for an immunoglobulin molecule.‬
‭● IgM antibodies develop at an early stage of infection against‬
‭SARS-CoV-2.‬

‭● IgG antibodies develop against SARS-CoV-2 once the person has‬
‭recovered from coronavirus.‬
‭Results by Antibody Test (IgG)‬

‭The antibody testing kits take around 30-60 minutes to show results.‬

‭Reverse Transcription Polymerase Chain Reaction (RT – PCR)‬



A polymerase chain reaction test is a highly sensitive test. Due to its increased sensitivity and high fidelity, it is known as the most accurate testing method for COVID-19 to date. It works by detecting the presence of genetic material from a specific pathogen.

‭Results by RT-PCR‬
RT-PCR is capable of delivering an accurate diagnosis and result for COVID-19 within 3 hours, though laboratories typically take 6-8 hours to derive a conclusive result.
‭TrueNat‬
‭TrueNat is a chip-based, portable RT-PCR machine that was initially‬
‭developed to diagnose tuberculosis. You can confirm your sample using‬
‭confirmatory tests for SARS-CoV-2 if you test positive by TrueNat‬
‭Beta CoV.‬


‭Results by TrueNat‬

‭It is capable of producing faster results than standard RT-PCR tests.‬

Aside from these, patient demographics and the time taken for testing and reporting will also be recorded and used for detecting underlying patterns. These underlying patterns can be used in statistics to generate hypotheses and theories.
‭Properties of the data will be the same as Q6.‬

‭Q10) What is the need of regression? Describe various types of‬


‭the same.‬

‭●‬ ‭Regression Analysis is a statistical process for estimating the‬


‭relationships between the dependent variables or criterion‬
‭variables and one or more independent variables or predictors.‬

‭●‬ ‭Regression analysis is generally used when we deal with a dataset‬


‭that has the target variable in the form of continuous data.‬
‭Regression analysis explains the changes in criteria about changes‬
‭in select predictors.‬
● Regression estimates the conditional expectation of the criterion given the predictors, i.e. the average value of the dependent variable when the independent variables are held at given values.
‭●‬ ‭Three major uses for regression analysis are determining the‬
‭strength of predictors, forecasting an effect, and trend‬
‭forecasting.‬
‭●‬ ‭There are times when we would like to analyze the effect of‬
‭different independent features on the target or what we say‬
‭dependent features. This helps us make decisions that can affect‬


‭the target variable in the desired direction.‬

● Regression analysis is heavily based on statistics and hence gives quite reliable results. For this reason, regression models are used to find linear as well as non-linear relations between the independent variables and the dependent (target) variable.

‭Types of Regression are as follows:‬
‭●‬ ‭Linear regression‬‭is used for predictive analysis. Linear‬
‭regression is a linear approach for modeling the relationship‬
‭between the criterion or the scalar response and the multiple‬

‭predictors or explanatory variables. Linear regression focuses on‬


‭the conditional probability distribution of the response given the‬
‭values of the predictors. The formula for linear regression is: y =‬

‭θx + b‬
‭●‬ ‭Polynomial Regression:‬‭This is an extension of linear regression‬
‭and is used to model a non-linear relationship between the‬

‭dependent variable and independent variables. Here as well‬


‭syntax remains the same but now in the input variables we include‬
‭some polynomial or higher degree terms of some already existing‬
‭features as well. Linear regression was only able to fit a linear‬
‭model to the data at hand but with polynomial features, we can‬
‭easily fit some non-linear relationship between the target as well‬
‭as input features.‬
‭●‬ ‭Stepwise regression‬‭is used for fitting regression models with‬
‭predictive models. It is carried out automatically. With each step,‬
‭the variable is added or subtracted from the set of explanatory‬
‭variables. The approaches for stepwise regression are forward selection, backward elimination, and bidirectional elimination.‬


‭●‬ ‭Decision Tree Regression:‬‭A Decision Tree is the most powerful‬

‭and popular tool for classification and prediction. A Decision tree‬
‭is a flowchart-like tree structure, where each internal node‬

‭denotes a test on an attribute, each branch represents an‬
‭outcome of the test, and each leaf node (terminal node) holds a‬
‭class label. There is a non-parametric method used to model a‬

‭decision tree to predict a continuous outcome.‬
‭●‬ ‭Random Forest‬‭is an ensemble technique capable of performing‬
‭both regression and classification tasks with the use of multiple‬
‭decision trees and a technique called Bootstrap and Aggregation,‬

‭commonly known as bagging. The basic idea behind this is to‬


‭combine multiple decision trees in determining the final output‬
‭rather than relying on individual decision trees.‬

‭●‬ ‭Support vector regression (SVR)‬‭is a type of support vector‬


‭machine (SVM) that is used for regression tasks. It tries to find‬
‭a function that best predicts the continuous output value for a‬

‭given input value. SVR can use both linear and non-linear kernels. A‬
‭linear kernel is a simple dot product between two input vectors,‬
‭while a non-linear kernel is a more complex function that can‬
‭capture more intricate patterns in the data. The choice of kernel‬
‭depends on the data’s characteristics and the task’s complexity.‬
‭●‬ ‭Ridge Regression‬‭: Ridge regression is a technique for analyzing‬
‭multiple regression data. When multicollinearity occurs, least squares estimates are unbiased, but their variances are large, so they may be far from the true value. This is a regularized linear‬
‭regression model, it tries to reduce the model complexity by‬
‭adding a penalty term to the cost function. A degree of bias is‬
‭added to the regression estimates, and as a result, ridge‬


‭regression reduces the standard errors.‬

‭●‬ ‭Lasso regression‬‭is a regression analysis method that performs‬

‭both variable selection and regularization. Lasso regression uses‬
‭soft thresholding. Lasso regression selects only a subset of the‬

‭provided covariates for use in the final model. This is another‬
‭regularized linear regression model, it works by adding a penalty‬
‭term to the cost function, but it tends to zero out some features’‬
‭coefficients, which makes it useful for feature selection.‬
‭●‬ ‭ElasticNet Regression‬‭: Linear Regression suffers from‬

‭overfitting and can’t deal with collinear data. When there are‬
‭many features in the dataset and even some of them are not‬

‭relevant to the predictive model. This makes the model more‬


‭complex with a too-inaccurate prediction on the test set (or‬
‭overfitting). Such a model with high variance does not generalize‬

‭on the new data. So, to deal with these issues, we include both‬
‭L-2 and L-1 norm regularization to get the benefits of both Ridge‬
‭and Lasso at the same time. The resultant model has better‬
‭predictive power than Lasso‬
‭●‬ ‭Bayesian Linear Regression‬‭: As the name suggests, this algorithm is based purely on Bayes' Theorem, which is why the Least Squares method is not used to determine the coefficients of the regression model. Instead, the model weights and parameters are estimated from their posterior distribution, which gives the resulting regression model an extra degree of stability.‬
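The simple linear case above, y = θx + b, has a closed-form least-squares solution: θ = cov(x, y)/var(x) and b = mean(y) − θ·mean(x). A minimal pure-Python sketch on made-up data (values and variable names are illustrative, not from the notes):

```python
# Closed-form simple linear regression:
# theta = cov(x, y) / var(x),  b = mean(y) - theta * mean(x)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [2 * x + 1 for x in xs]      # made-up data generated from y = 2x + 1

mx = sum(xs) / len(xs)
my = sum(ys) / len(ys)
theta = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
b = my - theta * mx
print(theta, b)   # 2.0 1.0
```

On noisy data the same formulas give the best-fit line rather than an exact recovery; regularized variants such as Ridge and Lasso modify this least-squares objective with a penalty term.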


‭Q11) Problems based on regression‬

‭Numerical PDF‬

‭Q12) Note on polynomial regression‬

‭Polynomial Regression‬

‭Polynomial Regression is a regression algorithm that models the‬
‭relationship between a dependent(y) and independent variable(x) as nth‬
‭degree polynomial. The Polynomial Regression equation is given below:‬
y = b0 + b1x + b2x^2 + b3x^3 + ... + bnx^n

‭It is also called a special case of Multiple Linear Regression in ML, because we add some polynomial terms to the Multiple Linear Regression equation to convert it into Polynomial Regression.‬
‭It is a linear model with some modification in order to increase the‬

‭accuracy. The dataset used in Polynomial regression for training is of‬


‭non-linear nature. It makes use of a linear regression model to fit the‬
‭complicated and nonlinear functions and datasets.‬
‭Hence, "In Polynomial regression, the original features are converted‬
‭into Polynomial features of required degree (2,3,..,n) and then modeled‬
‭using a linear model."‬

‭Need for Polynomial Regression:‬


‭●‬ ‭If we apply a linear model on a linear dataset, then it provides us‬


‭a good result as we have seen in Simple Linear Regression, but if‬

‭we apply the same model without any modification on a non-linear‬
‭dataset, then it will produce drastically poor results: the loss function will increase, the error rate will be high, and accuracy will decrease.‬
‭●‬ ‭So for such cases, where data points are arranged in a non-linear‬

‭fashion, we need the Polynomial Regression model. We can‬
‭understand it in a better way using the below comparison diagram‬
‭of the linear dataset and non-linear dataset.‬

‭In the above image, we have taken a dataset which is arranged‬


‭non-linearly. So if we try to cover it with a linear model, then we can‬
‭clearly see that it hardly covers any data point. On the other hand, a‬
‭curve covering most of the data points is more suitable, and this is what the Polynomial model provides.‬
‭Hence, if the datasets are arranged in a non-linear fashion, then we‬
‭should use the Polynomial Regression model instead of Simple Linear‬
‭Regression.‬

‭Note: A Polynomial Regression algorithm is also called Polynomial Linear Regression because the model is linear in the coefficients, even though it is non-linear in the input variable.‬

‭Equation of the Polynomial Regression Model:‬
‭Simple Linear Regression equation:‬

‭y = b0+b1x‬
‭Polynomial Regression equation:‬
y = b0 + b1x + b2x^2 + b3x^3 + ... + bnx^n
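Because the polynomial equation is linear in the coefficients b0…bn, fitting it is an ordinary linear least-squares problem on the expanded features [1, x, x^2, …]. A minimal pure-Python sketch for degree 2 on made-up data (all values and helper names are illustrative; with three points the fit interpolates exactly, so a tiny Gaussian elimination suffices):

```python
# Degree-2 polynomial regression as a linear model on features [1, x, x^2]
xs = [0.0, 1.0, 2.0]
ys = [1.0, 4.0, 9.0]                 # made-up data from y = (x + 1)^2 = 1 + 2x + x^2

A = [[1.0, x, x * x] for x in xs]    # design matrix of polynomial features

def solve(A, y):
    # Gaussian elimination with partial pivoting for a small square system
    n = len(A)
    M = [row[:] + [v] for row, v in zip(A, y)]
    for i in range(n):                          # forward elimination
        p = max(range(i, n), key=lambda r: abs(M[r][i]))
        M[i], M[p] = M[p], M[i]
        for r in range(i + 1, n):
            f = M[r][i] / M[i][i]
            M[r] = [a - f * b for a, b in zip(M[r], M[i])]
    x = [0.0] * n
    for i in reversed(range(n)):                # back substitution
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

b0, b1, b2 = solve(A, ys)
print(b0, b1, b2)   # recovers 1.0 2.0 1.0
```

The model stays "linear" because only the features were transformed; the solver is the same one a plain linear regression would use.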

‭Q13) Identify the domain where you can apply linear regression and‬
‭polynomial regression‬

Domain-wise applications (Linear Regression vs. Polynomial Regression):

●	Economics and Finance — Linear: stock price prediction, sales forecasting, house price prediction. Polynomial: modeling non-linear stock price trends, demand forecasting with non-linear growth.
●	Healthcare — Linear: medical cost prediction, predicting BMI based on height/weight. Polynomial: disease progression modeling (e.g., cancer growth), drug efficacy over time.
●	Engineering — Linear: energy consumption prediction based on usage patterns. Polynomial: trajectory modeling in physics, heat transfer in complex systems.
●	Social Sciences — Linear: income vs. education level, predicting population trends. Polynomial: modeling complex social behaviors (e.g., crime rate fluctuations).
●	Marketing and Advertising — Linear: customer spending prediction, pricing models. Polynomial: customer lifetime value with non-linear trends, advanced pricing models with curve fitting.
●	Environmental Science — Linear: temperature vs. energy consumption. Polynomial: climate change modeling, pollution level prediction (complex interactions).
●	Agriculture — Linear: predicting crop yield based on linear factors like rainfall. Polynomial: modeling crop yield considering non-linear factors like soil fertility changes over time.
●	Physics and Mechanics — Linear: simple force or speed predictions. Polynomial: complex motion trajectories, fatigue testing of materials.
●	Education — Linear: predicting student performance based on study hours. Polynomial: modeling non-linear trends in student learning behavior.
●	Real Estate — Linear: house price prediction based on linear factors (e.g., size, location). Polynomial: real estate value prediction based on complex, non-linear factors (e.g., proximity to future development).

‭Q14) What is reinforcement learning? How is it different from‬


‭supervised and unsupervised learning?‬

‭Reinforcement Learning (RL)‬‭is a branch of machine learning focused on‬

‭making decisions to maximize cumulative rewards in a given situation.‬
‭Unlike supervised learning, which relies on a training dataset with‬

‭predefined answers, RL involves learning through experience. In RL, an‬
‭agent learns to achieve a goal in an uncertain, potentially complex‬
‭environment by performing actions and receiving feedback through‬
‭rewards or penalties.‬
ah

‭Key Concepts of Reinforcement Learning‬


‭Agent‬‭: The learner or decision-maker.‬

‭Environment‬‭: Everything the agent interacts with.‬


‭State‬‭: A specific situation in which the agent finds itself.‬
‭Action‬‭: All possible moves the agent can make.‬

‭Reward‬‭: Feedback from the environment based on the action taken.‬

‭How Reinforcement Learning Works‬


‭RL operates on the principle of learning optimal behavior through trial‬
‭and error. The agent takes actions within the environment, receives‬
‭rewards or penalties, and adjusts its behavior to maximize the‬
‭cumulative reward. This learning process is characterized by the‬
‭following elements:‬

‭Policy‬‭: A strategy used by the agent to determine the next action‬


‭based on the current state.‬
‭Reward Function:‬‭A function that provides a scalar feedback signal‬


‭based on the state and action.‬

‭Value Function:‬‭A function that estimates the expected cumulative‬
‭reward from a given state.‬

‭Model of the Environment‬‭: A representation of the environment that‬
‭helps in planning by predicting future states and rewards.‬

‭Example: Navigating a Maze‬


‭The problem is as follows: We have an agent and a reward, with many‬
‭hurdles in between. The agent is supposed to find the best possible‬
‭path to reach the reward. The following problem explains the problem‬

‭more easily.‬
‭The above image shows the robot, diamond, and fire. The goal of the‬
‭robot is to get the reward that is the diamond and avoid the hurdles‬
‭which are the fire. The robot learns by trying all the possible paths and then choosing the path which gives it the reward with the least‬
‭hurdles. Each right step will give the robot a reward and each wrong‬
‭step will subtract the reward of the robot. The total reward will be‬


‭calculated when it reaches the final reward that is the diamond.‬

‭Main points in Reinforcement learning –‬

‭●‬ ‭Input‬‭: The input should be an initial state from which the model‬
‭will start‬
‭●‬ ‭Output‬‭: There are many possible outputs as there are a variety‬

‭of solutions to a particular problem‬
‭●‬ ‭Training‬‭: The training is based upon the input. The model will return a state, and the user will decide to reward or punish the model based on its output.‬

‭●‬ ‭The model continues to learn.‬


‭●‬ ‭The best solution is decided based on the maximum reward.‬
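The trial-and-error loop above can be sketched with tabular Q-learning on a toy, hypothetical environment — a 5-state corridor where the only reward sits at the right end (the states, reward, and hyperparameters are all made up for illustration):

```python
import random
random.seed(0)

# Toy corridor: states 0..4, reward only on reaching the goal state 4
n_states, goal = 5, 4
actions = [-1, +1]                      # move left / move right
Q = [[0.0, 0.0] for _ in range(n_states)]
alpha, gamma, eps = 0.5, 0.9, 0.2       # learning rate, discount, exploration

for _ in range(500):                    # episodes of trial and error
    s = 0
    while s != goal:
        # epsilon-greedy: mostly exploit the best known action, sometimes explore
        a = random.randrange(2) if random.random() < eps else Q[s].index(max(Q[s]))
        s2 = min(max(s + actions[a], 0), goal)
        r = 1.0 if s2 == goal else 0.0  # reward granted only at the goal
        # Q-learning update toward reward plus discounted best future value
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

policy = [Q[s].index(max(Q[s])) for s in range(goal)]
print(policy)   # [1, 1, 1, 1] — the learned policy always moves right
```

After enough episodes the cumulative-reward-maximizing action (move right) dominates in every state, which is exactly the "best solution decided by maximum reward" described above.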

Feature-wise comparison (Supervised vs. Unsupervised vs. Reinforcement Learning):

●	Definition — Supervised: learning from labeled data, with the correct output provided for each input. Unsupervised: learning from unlabeled data, finding hidden patterns or structures. Reinforcement: learning through interaction with an environment, using rewards and penalties.
●	Input Data — Supervised: labeled data (input-output pairs). Unsupervised: unlabeled data (no explicit output labels). Reinforcement: the environment provides states, and the agent chooses actions to receive feedback (rewards/penalties).
●	Objective — Supervised: predict the correct label for new, unseen data. Unsupervised: discover hidden patterns or groupings in the data. Reinforcement: maximize the cumulative reward by learning the best sequence of actions.
●	Learning Process — Supervised: the model is trained by minimizing the difference between predictions and true labels (error). Unsupervised: the model organizes data based on similarity, without specific guidance. Reinforcement: the agent learns through trial and error, receiving feedback for its actions and adjusting its strategy.
●	Data Dependency — Supervised: requires large amounts of labeled data for training. Unsupervised: does not require labeled data; focuses on exploring data structure. Reinforcement: data comes from continuous interaction with an environment (dynamic and sequential).
●	Common Algorithms — Supervised: Linear Regression, Decision Trees, Random Forests, Support Vector Machines (SVMs). Unsupervised: K-Means Clustering, Hierarchical Clustering, Principal Component Analysis (PCA). Reinforcement: Q-Learning, Deep Q-Network (DQN), SARSA, Proximal Policy Optimization (PPO).
●	Example Applications — Supervised: spam detection, image classification, medical diagnosis. Unsupervised: customer segmentation, anomaly detection, market basket analysis, personalized recommendations. Reinforcement: robotics, game AI (e.g., AlphaGo), autonomous driving.
●	Type of Feedback — Supervised: explicit feedback in the form of labeled data (correct/incorrect labels). Unsupervised: no feedback; the model self-discovers patterns in the data. Reinforcement: a reward signal or penalty after each action (delayed feedback).
●	Task Type — Supervised: classification, regression. Unsupervised: clustering, association. Reinforcement: sequential decision-making tasks.
●	Advantages — Supervised: provides accurate predictions with well-labeled data; clear evaluation metrics (accuracy, precision, recall, etc.). Unsupervised: can work with unlabeled data, which is more readily available; can reveal unknown patterns in the data. Reinforcement: learns complex strategies, adapts to dynamic environments, and maximizes long-term rewards.
●	Disadvantages — Supervised: requires labeled data, which can be costly and time-consuming to obtain. Unsupervised: harder to evaluate performance without clear labels; can be less interpretable. Reinforcement: requires a large amount of trial and error; may struggle with long-term planning due to delayed rewards.
●	Real-World Example — Supervised: predicting house prices, classifying emails as spam/not spam. Unsupervised: grouping customers into segments for targeted marketing, product recommendations. Reinforcement: training a robot to navigate a room, self-driving cars learning to avoid obstacles.


‭Q15) Compare polynomial and linear regression‬

Feature-wise comparison (Linear Regression vs. Polynomial Regression):

●	Definition — Linear: models the relationship between the dependent and independent variables as a straight line. Polynomial: models the relationship between the dependent and independent variables as a polynomial curve.
●	Equation — Linear: y = b0 + b1x. Polynomial: y = b0 + b1x + b2x^2 + ... + bnx^n.
●	Type of Relationship — Linear: assumes a linear relationship between variables. Polynomial: assumes a non-linear relationship that can be represented as a polynomial.
●	Complexity — Linear: simple, requires fewer computational resources. Polynomial: more complex, as higher-degree polynomials increase the model's complexity.
●	Fitting Ability — Linear: fits straight lines to data; useful for linearly separable data. Polynomial: fits curves to data; better for capturing more complex, non-linear trends.
●	Overfitting Risk — Linear: lower risk of overfitting, especially for small datasets. Polynomial: higher risk of overfitting with high-degree polynomials.
●	Applications — Linear: stock price prediction, sales forecasting, house price prediction, medical cost prediction. Polynomial: trajectory modeling, disease progression, climate change modeling, crop yield prediction.
●	Interpretability — Linear: easier to interpret and understand, as the relationship is simple. Polynomial: can be harder to interpret as the complexity of the curve increases.
●	Handling of Data Patterns — Linear: best for linear data patterns where a straight-line approximation is sufficient. Polynomial: suitable for non-linear data patterns where curves better fit the data.
●	Example — Linear: predicting house prices based on features like area, number of rooms, etc. Polynomial: modeling population growth trends or predicting complex physics-based trajectories.
●	Computational Efficiency — Linear: more computationally efficient and faster. Polynomial: requires more computational power, especially for high-degree polynomials.
●	Overfitting Prevention — Linear: less prone to overfitting; works well with small data sizes. Polynomial: needs regularization techniques (e.g., Lasso or Ridge) to avoid overfitting.


‭Unit 2‬

‭Q1) Define the term classification in machine learning by providing‬
‭three real life examples‬

‭Classification:‬‭A classification problem is when the‬‭output variable is a‬
‭category, such as “Red” or “blue” , “disease” or “no disease”.‬
‭Classification is a type of supervised learning that is used to predict‬

‭categorical values, such as whether a customer will churn or not,‬


‭whether an email is spam or not, or whether a medical image shows a‬
‭tumor or not. Classification algorithms learn a function that maps from‬

‭the input features to a probability distribution over the output classes.‬

‭Some common classification algorithms include:‬



‭Logistic Regression, Support Vector Machines, Decision Trees, Random‬


‭Forests, Naive Baye‬

‭Evaluation Metrics of Classification:‬


‭Accuracy, Precision, Recall, F1 Score, Confusion Matrix‬
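All of these metrics can be computed from the confusion-matrix counts (true/false positives and negatives). A minimal sketch with made-up binary predictions (illustrative data only):

```python
# Made-up binary labels: 1 = positive class (e.g., "spam"), 0 = negative class
y_true = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 1, 0, 0, 0, 0]

# Confusion-matrix counts
tp = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred))  # 3
tn = sum(t == 0 and p == 0 for t, p in zip(y_true, y_pred))  # 4
fp = sum(t == 0 and p == 1 for t, p in zip(y_true, y_pred))  # 1
fn = sum(t == 1 and p == 0 for t, p in zip(y_true, y_pred))  # 2

accuracy = (tp + tn) / len(y_true)                  # 0.7
precision = tp / (tp + fp)                          # 0.75
recall = tp / (tp + fn)                             # 0.6
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean of the two
print(accuracy, precision, recall, round(f1, 3))
```

Note how precision and recall pull in different directions: the two missed positives hurt recall, while the one false alarm hurts precision; F1 balances both.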
‭Advantages of Supervised Learning:‬

‭1. Since Supervised Learning works with a data set, we can have an exact idea about the classes of objects.‬
‭2. These algorithms are helpful in predicting the output based on prior experience.‬


‭Disadvantages of Supervised Learning:‬

‭1. These algorithms are not able to solve complex problems.‬
‭2. It may predict the wrong output if the test data is different from‬
‭the training data.‬

‭3. It requires a lot of computational time to train the algorithm.‬
‭Applications of Supervised Learning:‬

‭1. Email Spam Detection‬



‭●‬ ‭Application‬‭: Email services like Gmail and Outlook use‬


‭classification algorithms to automatically filter out spam emails‬

‭from the inbox.‬


‭●‬ ‭How it Works‬‭: A classification model is trained on a labeled‬
‭dataset of emails, where each email is marked as either "spam" or‬

‭"not spam." The model learns patterns from the email content,‬
‭sender information, and other metadata to classify incoming‬
‭emails.‬
‭●‬ ‭Classification Type‬‭: Binary classification (Spam/Not‬‭Spam).‬

‭2. Medical Diagnosis‬


‭●‬ ‭Application‬‭: In healthcare, classification algorithms are used to‬
‭diagnose diseases based on patient data such as symptoms, test‬
‭results, and medical history.‬
‭●‬ ‭How it Works‬‭: For instance, a model trained on labeled‬‭medical‬
‭datasets can classify whether a patient has a specific disease‬
‭(e.g., diabetes, cancer) or not, based on input features like blood‬


‭sugar levels, age, weight, and more.‬

‭●‬ ‭Classification Type‬‭: Multi-class classification (e.g.,‬‭Disease A,‬
‭Disease B, or No Disease).‬

‭3. Credit Card Fraud Detection‬

‭●‬ ‭Application‬‭: Financial institutions use classification‬‭to detect‬

‭fraudulent credit card transactions.‬
‭●‬ ‭How it Works‬‭: A classification model is trained on past‬
‭transaction data, where transactions are labeled as either‬
‭"fraudulent" or "legitimate." The model learns patterns and can‬

‭flag suspicious transactions for further investigation.‬


‭●‬ ‭Classification Type‬‭: Binary classification (Fraud/Legitimate).‬

‭Q2) How can one differentiate classification and clustering?‬

Feature-wise comparison (Classification vs. Clustering):

●	Definition — Classification: assigns predefined labels to data points based on training data. Clustering: groups data points into clusters based on similarity, without predefined labels.
●	Type of Learning — Classification: supervised learning (requires labeled data). Clustering: unsupervised learning (works with unlabeled data).
●	Objective — Classification: predict the category/class for new data points. Clustering: discover hidden patterns or structures in data by grouping similar items.
●	Input Data — Classification: labeled data (with known class labels). Clustering: unlabeled data (no prior knowledge of classes).
●	Output — Classification: discrete class labels (e.g., "spam" or "not spam"). Clustering: groupings or clusters of data points (e.g., cluster 1, cluster 2).
●	Common Algorithms — Classification: Logistic Regression, Decision Trees, Support Vector Machines (SVMs). Clustering: K-Means Clustering, Hierarchical Clustering, DBSCAN.
●	Evaluation Metrics — Classification: Accuracy, Precision, Recall, F1 Score. Clustering: Silhouette Score, Davies-Bouldin Index, Calinski-Harabasz Score.
●	Real-World Example — Classification: email spam detection (spam/not spam), disease diagnosis (disease/no disease). Clustering: customer segmentation, image segmentation.
●	Data Dependency — Classification: requires labeled data for training. Clustering: no need for labeled data; it groups data based on similarities.
●	Output Interpretation — Classification: predicts a class or category based on learned patterns. Clustering: assigns data points to clusters based on their relative distance or similarity.
●	Number of Categories — Classification: known in advance (e.g., two classes for binary classification). Clustering: the number of clusters may or may not be known and can vary.
●	Examples of Use — Classification: fraud detection, medical diagnosis, sentiment analysis. Clustering: market segmentation, social network analysis, anomaly detection.
●	Advantages — Classification: clear, interpretable results; can handle complex labeled data. Clustering: does not require labeled data; can reveal hidden patterns in the data.
●	Disadvantages — Classification: requires labeled data, which can be costly to obtain; not effective for discovering unknown patterns. Clustering: results are harder to interpret; sensitive to noise and data distribution.
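To make the contrast concrete: clustering receives no labels and must discover the groups itself. A minimal 1-D k-means sketch on made-up, unlabeled data (values and initialization are illustrative):

```python
# Clustering needs no labels — k-means discovers two groups on its own
data = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
centers = [data[0], data[-1]]        # crude initialization from the data itself
for _ in range(10):                  # Lloyd's iterations: assign, then re-center
    groups = [[], []]
    for x in data:
        # assign each point to its nearest center (bool indexes as 0/1)
        groups[abs(x - centers[0]) > abs(x - centers[1])].append(x)
    centers = [sum(g) / len(g) for g in groups]
print(centers)   # [1.5, 8.5]
```

A classifier solving the same problem would instead need each point tagged with its group in advance, then learn a rule mapping value to tag.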

‭Q3) Explain the working of random forest by giving example‬

‭Random Forest is a popular machine learning algorithm that belongs to‬


‭the supervised learning technique. It can be used for both‬
‭Classification and Regression problems in ML. It is based on the‬
‭concept of ensemble learning, which is a process of combining multiple‬
‭classifiers to solve a complex problem and to improve the performance‬
‭of the model.‬

‭As the name suggests, "Random Forest is a classifier that contains a‬


‭number of decision trees on various subsets of the given dataset and‬


‭takes the average to improve the predictive accuracy of that dataset."‬

‭Instead of relying on one decision tree, the random forest takes the prediction from each tree and, based on the majority vote of those predictions, predicts the final output.‬

‭The greater number of trees in the forest leads to higher accuracy‬

‭and prevents the problem of overfitting.‬
‭The below diagram explains the working of the Random Forest‬
‭algorithm:‬
‭There are two assumptions for a better Random forest classifier:‬
‭1.‬ ‭There should be some actual values in the feature variable of the‬
‭dataset so that the classifier can predict accurate results rather‬
‭than a guessed result.‬
‭2.‬ ‭The predictions from each tree must have very low correlations.‬


‭Why use Random Forest?‬

‭●‬ ‭It takes less training time as compared to other algorithms.‬
‭●‬ ‭It predicts output with high accuracy, and it runs efficiently even on large datasets.‬
‭●‬ ‭It can also maintain accuracy when a large proportion of data is‬
‭missing.‬

‭How does Random Forest algorithm work?‬
‭Random Forest works in two phases: the first is to create the random forest by combining N decision trees, and the second is to make predictions using each tree created in the first phase.‬

‭Step-1: Select random K data points from the training set.‬



‭Step-2: Build the decision trees associated with the selected data‬
‭points (Subsets).‬

‭Step-3: Choose the number N for decision trees that you want to build.‬

‭Step-4: Repeat Step 1 & 2.‬


‭Step-5: For new data points, find the predictions of each decision tree,‬
‭and assign the new data points to the category that wins the majority‬
‭votes.‬
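Steps 1–5 can be sketched in miniature by substituting one-feature decision stumps for full decision trees (a simplification: the data, the stump learner, and the tree count are all illustrative, not the full CART procedure):

```python
import random
random.seed(1)

# Toy data: label is 1 when the single feature exceeds 5
X = [1, 2, 3, 4, 6, 7, 8, 9]
y = [0, 0, 0, 0, 1, 1, 1, 1]

def fit_stump(xs, ys):
    # Stand-in for a decision tree: pick the threshold with fewest training errors
    best_t, best_err = None, len(xs) + 1
    for t in xs:
        err = sum((x > t) != lab for x, lab in zip(xs, ys))
        if err < best_err:
            best_t, best_err = t, err
    return best_t

def bagged_forest(X, y, n_trees=15):
    stumps = []
    for _ in range(n_trees):
        # Steps 1-2 & 4: bootstrap a random subset, fit one tree per subset
        idx = [random.randrange(len(X)) for _ in range(len(X))]
        stumps.append(fit_stump([X[i] for i in idx], [y[i] for i in idx]))
    return stumps

def predict(stumps, x):
    # Step 5: each tree votes; the majority class wins
    votes = sum(x > t for t in stumps)
    return int(votes * 2 > len(stumps))

forest = bagged_forest(X, y)
print(predict(forest, 0), predict(forest, 9))   # 0 1
```

Because each stump sees a different bootstrap sample, their errors are weakly correlated, and the majority vote is more reliable than any single stump — the same argument that motivates the full Random Forest.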

‭Example: Suppose there is a dataset that contains multiple fruit‬


‭images. So, this dataset is given to the Random forest classifier. The‬


‭dataset is divided into subsets and given to each decision tree. During‬

‭the training phase, each decision tree produces a prediction result, and‬
‭when a new data point occurs, then based on the majority of results,‬

‭the Random Forest classifier predicts the final decision. Consider the‬
‭below image:‬

‭Applications of Random Forest‬
‭●‬ ‭Banking: Banking sector mostly uses this algorithm for the‬
‭identification of loan risk.‬
‭●‬ ‭Medicine: With the help of this algorithm, disease trends and‬
‭risks of the disease can be identified.‬


‭●‬ ‭Land Use: We can identify the areas of similar land use by this‬

‭algorithm.‬
‭●‬ ‭Marketing: Marketing trends can be identified using this‬

‭algorithm.‬

‭Advantages of Random Forest‬

‭●‬ ‭Random Forest is capable of performing both Classification and‬
‭Regression tasks.‬
‭●‬ ‭It is capable of handling large datasets with high dimensionality.‬
‭●‬ ‭It enhances the accuracy of the model and prevents the‬

‭overfitting issue.‬

‭Disadvantages of Random Forest‬



‭●‬ ‭Although random forest can be used for both classification and‬
‭regression tasks, it is less suitable for regression tasks.‬

‭Q4) List and elaborate applications of random forest‬

‭1. Healthcare: Disease Diagnosis and Risk Prediction‬

‭●‬ ‭Application‬‭: Random Forest is extensively used in‬‭the medical‬


‭field to predict diseases, identify high-risk patients, and assist in‬
‭diagnostic procedures.‬
‭●‬ ‭How it Works‬‭: The algorithm processes medical data‬‭(such as‬
‭patient history, symptoms, test results) to classify whether a‬
‭patient is at risk of a disease (e.g., diabetes, heart disease). It‬
‭also helps predict outcomes like the chances of recovery based on‬
‭multiple medical variables.‬
‭●‬ ‭Example‬‭: Predicting whether a patient has cancer based‬‭on biopsy‬


‭features, identifying high-risk cardiovascular patients, or‬

‭predicting whether a person is prone to certain genetic diseases.‬

‭2. Finance: Fraud Detection‬

‭●‬ ‭Application‬‭: Random Forest is widely used to detect fraudulent‬
‭transactions in real-time within financial institutions, such as‬
‭banks or credit card companies.‬
‭●‬ ‭How it Works‬‭: The algorithm analyzes past transaction‬‭data‬
‭labeled as "fraudulent" or "legitimate" and learns patterns that‬
‭indicate fraud. It can then flag suspicious transactions for‬

‭further investigation.‬
‭●‬ ‭Example‬‭: Identifying credit card fraud by analyzing‬‭transaction‬
‭behaviors (e.g., location, transaction time, amount), or predicting‬

‭fraudulent insurance claims.‬

‭3. Marketing: Customer Segmentation and Recommendation Systems‬



‭●‬ ‭Application‬‭: In marketing, Random Forest is used for‬‭customer‬


‭segmentation, personalized recommendations, and identifying‬
‭customer churn.‬
‭●‬ ‭How it Works‬‭: By analyzing customer behaviors, purchasing‬
‭history, and demographic information, the algorithm can classify‬
‭customers into segments and help marketers create targeted‬
‭campaigns.‬
‭●‬ ‭Example‬‭: Grouping customers with similar buying patterns,‬
‭predicting which customers are likely to churn, or recommending‬
‭products based on past purchases.‬

‭4. E-commerce: Product Recommendations‬


‭●‬ ‭Application‬‭: Random Forest can be used in e-commerce‬‭platforms‬
‭to provide personalized product recommendations.‬

‭●‬ ‭How it Works‬‭: The algorithm analyzes past purchase behavior,‬
‭browsing history, and customer preferences to suggest relevant‬
‭products.‬

‭●‬ ‭Example‬‭: Amazon’s recommendation engine uses Random‬‭Forest‬
‭models to suggest products that customers might want to buy‬
‭based on past behavior and similar user profiles.‬

‭5. Banking: Credit Risk Analysis‬



‭●‬ ‭Application‬‭: Banks use Random Forest for assessing‬


‭creditworthiness and determining whether to approve loan‬

‭applications or credit card limits.‬


‭●‬ ‭How it Works‬‭: The model evaluates financial data,‬‭credit history,‬
‭income, and other features to classify whether a borrower is‬

‭likely to default on a loan or manage credit well.‬


‭●‬ ‭Example‬‭: Predicting the risk level of a loan applicant‬‭based on‬
‭their credit score, employment history, and debt-to-income ratio.‬

‭6. Agriculture: Crop Disease Detection and Yield Prediction‬


‭●‬ ‭Application‬‭: In agriculture, Random Forest is used for crop‬
‭disease identification and yield prediction.‬
‭●‬ ‭How it Works‬‭: The model can analyze features like‬‭soil‬
‭conditions, temperature, rainfall, and satellite images to classify‬
‭whether crops are healthy or diseased and to predict crop yields‬
‭for the season.‬


‭●‬ ‭Example‬‭: Identifying diseased crops from image data,‬‭predicting‬

‭wheat yield based on climate and soil data.‬

‭7. Natural Language Processing (NLP): Sentiment Analysis‬

‭●‬ ‭Application‬‭: Random Forest is employed in sentiment analysis to‬
‭classify text (such as reviews, social media posts) into categories‬

‭like positive, negative, or neutral sentiment.‬
‭●‬ ‭How it Works‬‭: The algorithm analyzes word frequencies,‬
‭sentence structures, and other textual features to classify text‬
‭into various sentiment categories.‬

‭●‬ ‭Example‬‭: Classifying movie reviews, product feedback,‬‭or social‬


‭media posts as positive, negative, or neutral for brand reputation‬
‭analysis.‬

‭8. Cybersecurity: Intrusion Detection‬

‭●‬ ‭Application‬‭: Random Forest is used to detect network‬‭intrusions‬



‭and cybersecurity threats by identifying abnormal patterns in‬


‭network traffic.‬
‭●‬ ‭How it Works‬‭: By learning from past intrusion data,‬‭the model‬
‭can classify incoming traffic as normal or malicious, helping‬
‭network administrators flag suspicious activities.‬
‭●‬ ‭Example‬‭: Detecting unauthorized access attempts, malware‬
‭attacks, or unusual login patterns in a network.‬

‭9. Environmental Science: Climate Change Prediction‬

‭●‬ ‭Application‬‭: Random Forest is employed to predict‬‭climate‬


‭patterns and understand environmental changes based on vast‬


‭amounts of climate data.‬

‭●‬ ‭How it Works‬‭: The algorithm processes historical weather‬‭data,‬
‭temperature records, CO2 levels, and other environmental‬

‭factors to predict future climate scenarios.‬
‭●‬ ‭Example‬‭: Predicting temperature rise, rainfall patterns, or CO2‬
‭levels in the atmosphere for the next decade based on historical‬
‭data.‬
‭10. Manufacturing: Quality Control and Fault Detection‬
‭●‬ ‭Application‬‭: Random Forest is used to improve product‬‭quality and‬

‭detect defects in manufacturing processes.‬


‭●‬ ‭How it Works‬‭: The algorithm analyzes data from manufacturing‬
‭equipment, such as sensor readings, production rates, and‬

‭material properties, to classify whether a product meets quality‬


‭standards or if there is a defect.‬
‭●‬ ‭Example‬‭: Detecting faulty parts in a car manufacturing‬‭process‬

‭by analyzing machine sensor data, or predicting machinery‬


‭breakdowns based on operational data.‬

‭11. Bioinformatics: Gene Classification‬


‭●‬ ‭Application‬‭: In bioinformatics, Random Forest is applied to‬
‭classify genes based on their expression profiles or identify gene‬
‭mutations linked to specific diseases.‬
‭●‬ ‭How it Works‬‭: The algorithm analyzes genetic data,‬‭including‬
‭gene expression levels or DNA sequences, to classify genes into‬
‭categories (e.g., normal vs. mutated) or predict the functions of‬


‭unknown genes.‬

‭●‬ ‭Example‬‭: Classifying tumor vs. non-tumor genes based‬‭on‬
‭expression data, identifying genetic markers associated with‬

‭hereditary diseases.‬

‭12. Image Recognition: Object Detection and Classification‬

‭●‬ ‭Application‬‭: Random Forest is used in image recognition to‬
‭classify and detect objects in images.‬
‭●‬ ‭How it Works‬‭: The algorithm processes pixel values,‬‭colors,‬
‭shapes, and textures from images to classify objects or detect‬

‭specific patterns.‬
‭●‬ ‭Example‬‭: Recognizing objects like cars, animals, or‬‭faces in‬
‭images, or classifying handwritten digits for automated data‬

‭entry.‬
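The common thread in all of these applications is Random Forest's majority vote over many decision trees. The sketch below is not from the notes: the three hand-written "stumps" and the feature names are invented, standing in for trees trained on bootstrap samples, to show the voting mechanism only.

```python
# Sketch of the majority-vote idea behind Random Forest.
# The stumps and feature names are invented for illustration.

def stump_exclamations(email):
    return "spam" if email["exclamations"] > 3 else "ham"

def stump_links(email):
    return "spam" if email["links"] > 2 else "ham"

def stump_caps(email):
    return "spam" if email["caps_ratio"] > 0.5 else "ham"

TREES = (stump_exclamations, stump_links, stump_caps)

def forest_predict(email):
    votes = [tree(email) for tree in TREES]
    return max(set(votes), key=votes.count)  # majority vote

email = {"exclamations": 5, "links": 1, "caps_ratio": 0.7}
print(forest_predict(email))  # spam (2 of the 3 trees vote spam)
```

In a real Random Forest the trees are learned from random subsets of the data and features, but the final prediction is combined in exactly this way.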

‭Q5) What is confusion matrix? Provide examples‬



‭What is a Confusion Matrix‬


‭The confusion matrix shows the ways in which your classification model‬
‭is confused when it makes predictions.‬
‭A Confusion matrix is an N x N matrix used for evaluating the‬
‭performance of a classification model, where N is the number of target‬
‭classes.‬
‭The matrix compares the actual target values with those predicted by‬
‭the machine learning model.‬
‭This gives us a holistic view of how well our classification model is‬
‭performing and what kinds of errors it is making.‬

‭How to Calculate a Confusion Matrix‬


‭1. You need a test dataset or a validation dataset with expected‬

‭outcome values.‬
‭2. Make a prediction for each row in your test dataset.‬

‭3. From the expected outcomes and predictions count: The number of‬
‭correct predictions for each class.‬
‭The number of incorrect predictions for each class, organized by the‬
‭class that was predicted.‬
‭4. These numbers are then organized into a table, or a matrix as‬
‭follows:‬

Expected down the side: Each row of the matrix corresponds to an actual (expected) class.
Predicted across the top: Each column of the matrix corresponds to a predicted class.

‭Confusion Matrix‬
‭True Positive (TP)‬


‭The predicted value matches the actual value‬

‭The actual value was positive and the model predicted a positive value‬
‭True Negative (TN)‬

‭The predicted value matches the actual value‬
‭The actual value was negative and the model predicted a negative value‬
‭False Positive (FP)‬

‭The predicted value was falsely predicted‬
‭The actual value was negative but the model predicted a positive value‬
‭False Negative (FN)‬
‭The predicted value was falsely predicted‬

‭The actual value was positive but the model predicted a negative value‬

‭Need for Confusion Matrix in Machine learning‬



‭It evaluates the performance of the classification models, when they‬


‭make predictions on test data, and tells how good our classification‬
‭model is.‬

‭• It not only tells the error made by the classifiers but also the type‬
‭of errors such as it is either type-l or type-ll error.‬
‭• With the help of the confusion matrix, we can calculate the different‬
‭parameters for the model, such as accuracy, precision, etc.‬

‭Example:‬
Expected    Predicted
man         woman
man         man
woman       woman
man         man
woman       man
woman       woman
woman       woman
man         man
man         woman
woman       woman

men classified as men: 3      women classified as women: 4
men classified as women: 2    women classified as men: 1

            man    woman
man          3       2
woman        1       4



The total actual men in the dataset is the sum of the values in the men row (3 + 2 = 5).
The total actual women in the dataset is the sum of the values in the women row (1 + 4 = 5).
‭The correct values are organized in a diagonal line from top left to‬
‭bottom-right of the matrix (3+4).‬
‭More errors were made by predicting men as women than predicting‬
‭women as men‬
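The counts above can be reproduced in a few lines of Python. This is a minimal sketch: the label lists are copied from the example table, and "man" is treated as the positive class when computing the accuracy, precision, and recall mentioned earlier.

```python
# Build the 2x2 confusion matrix for the man/woman example above.
expected = ["man", "man", "woman", "man", "woman",
            "woman", "woman", "man", "man", "woman"]
predicted = ["woman", "man", "woman", "man", "man",
             "woman", "woman", "man", "woman", "woman"]

labels = ["man", "woman"]
# matrix[i][j] = count of samples with actual class labels[i]
# that were predicted as labels[j]
matrix = [[0, 0], [0, 0]]
for e, p in zip(expected, predicted):
    matrix[labels.index(e)][labels.index(p)] += 1

print(matrix)  # [[3, 2], [1, 4]]

# Treating "man" as the positive class:
tp, fn = matrix[0][0], matrix[0][1]
fp, tn = matrix[1][0], matrix[1][1]
accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
recall = tp / (tp + fn)
print(accuracy, precision, recall)  # 0.7 0.75 0.6
```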


‭True Positive:‬
‭Interpretation: You predicted positive and it's true. You predicted‬

‭that a woman is pregnant and she actually is.‬
‭True Negative:‬
‭Interpretation: You predicted negative and it's true. You predicted‬
‭that a man is not pregnant and he actually is not‬

‭False Positive:‬
‭Interpretation: You predicted positive and it's false. You predicted‬
‭that a man is pregnant but he actually is not.‬

‭False Negative:‬
‭Interpretation: You predicted negative and it's false. You predicted‬
‭that a woman is not pregnant but she actually is.‬

‭Q6) Explain the concept of type 1 and type 2 errors by giving suitable‬
‭examples.‬

‭Confusion Matrix‬
‭True Positive (TP)‬


‭The predicted value matches the actual value‬

‭The actual value was positive and the model predicted a positive value‬
‭True Negative (TN)‬

‭The predicted value matches the actual value‬
‭The actual value was negative and the model predicted a negative value‬
‭False Positive (FP)‬

‭The predicted value was falsely predicted‬
‭The actual value was negative but the model predicted a positive value‬
‭False Negative (FN)‬
‭The predicted value was falsely predicted‬

‭The actual value was positive but the model predicted a negative value‬

‭Type 1 and Type 2 Error‬



‭Scenario 1:‬‭We don't have a kitten among the group. Yet, ML algo‬
‭predicts it is there. If we accept the ML algo prediction then it is Type‬
‭1 error also known as 'False Positive'‬

‭Scenario 2:‬‭We have a kitten among the group. Yet, ML algo predicts it‬
‭is not there. If we accept the ML algo prediction then it is Type 2‬
‭error also known as 'False Negative'.‬

‭Use cases of Type 1 and Type 2‬


‭Scenario/Problem Statement 1:‬‭Providing access to an asset post a‬
‭biometric scan.‬
‭Type I error: Possibility of rejection even with an authorized match.‬
Type II error: Possibility of acceptance even with an unauthorized
match.
‭Scenario/Problem Statement 2:‬‭Construction Model of a bridge is‬
‭correct‬
‭Type I error: Predicting that the model is correct when it is not.‬


‭Type II error: Predicting that a model is not correct when it is‬

‭correct.‬
‭Scenario/Problem Statement 3:‬‭Medical trials for a drug which is a‬

‭cure for Cancer‬
‭Type I error: Predicting that a cure is found when it is not the case.‬
‭Type II error: Predicting that a cure is not found when in fact it is the‬
‭case.‬
‭Q7) Discuss following by giving suitable examples.‬
‭● Overfitting ● Underfitting‬

‭Underfitting in Machine Learning‬


A statistical model or a machine learning algorithm is said to underfit when the model is too simple to capture the complexities of the data. It represents the inability of the model to learn the training data effectively, resulting in poor performance on both the training and the testing data. It mainly happens when we use a very simple model with overly simplified assumptions. To address underfitting, we need to use more complex models with enhanced feature representation and less regularization.

‭Note: The underfitting model has High bias and low variance.‬
‭Reasons for Underfitting‬
●	The model is too simple, so it may not be capable of representing the complexities in the data.
●	The input features used to train the model are not adequate representations of the underlying factors influencing the target variable.
●	The size of the training dataset used is not large enough.
●	Excessive regularization is used to prevent overfitting, which constrains the model from capturing the data well.
●	Features are not scaled.

‭Techniques to Reduce Underfitting‬


‭●‬ ‭Increase model complexity.‬
‭●‬ ‭Increase the number of features, performing feature‬
‭engineering.‬
‭●‬ ‭Remove noise from the data.‬

‭●‬ ‭Increase the number of epochs or increase the duration of‬


‭training to get better results.‬

‭Overfitting in Machine Learning‬


A statistical model is said to be overfitted when it does not make accurate predictions on testing data. When a model is trained with so much data, it starts learning from the noise and inaccurate entries in the data set, and testing on unseen data then shows high variance: the model fails to categorize the data correctly because of too many details and noise. Overfitting is often caused by non-parametric and non-linear methods, because these types of machine learning algorithms have more freedom in building the model from the dataset and can therefore build unrealistic models. Solutions to avoid overfitting include using a linear algorithm if we have linear data, or using parameters such as the maximal depth if we are using decision trees.

‭In a nutshell, Overfitting is a problem where the evaluation of machine‬


‭learning algorithms on training data is different from unseen data.‬


‭Reasons for Overfitting:‬
‭●‬ ‭High variance and low bias.‬

‭●‬ ‭The model is too complex.‬
‭●‬ ‭The size of the training data.‬

‭Techniques to Reduce Overfitting‬


●	Improving the quality of training data reduces overfitting by focusing on meaningful patterns and mitigating the risk of fitting noise or irrelevant features.
●	Increasing the amount of training data can improve the model’s ability to generalize to unseen data and reduce the likelihood of overfitting.

‭●‬ ‭Reduce model complexity.‬


●	Early stopping during the training phase (monitor the loss during training; as soon as the loss begins to increase, stop training).
‭●‬ ‭Ridge Regularization and Lasso Regularization.‬
‭●‬ ‭Use dropout for neural networks to tackle overfitting.‬
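The two failure modes can be contrasted in code. The tiny dataset and models below are invented for illustration: a constant predictor underfits (high bias), a least-squares line generalizes, and a model that memorizes every training point overfits (zero training error, high variance on unseen data).

```python
# Illustrative sketch (not from the notes): an underfit, a well-fit,
# and an overfit model on a tiny invented dataset following y ≈ 2x.

train = [(0, 0.1), (1, 2.2), (2, 3.9), (3, 6.1)]
test = [(4, 8.0), (5, 10.1)]

def mse(model, data):
    return sum((model(x) - y) ** 2 for x, y in data) / len(data)

# Underfit: too simple -- ignores x entirely (high bias).
mean_y = sum(y for _, y in train) / len(train)
def underfit(x):
    return mean_y

# Well-fit: least-squares line y = a*x + b.
n = len(train)
sx = sum(x for x, _ in train)
sy = sum(y for _, y in train)
sxx = sum(x * x for x, _ in train)
sxy = sum(x * y for x, y in train)
a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
b = (sy - a * sx) / n
def line(x):
    return a * x + b

# Overfit: memorizes every training point, including its noise,
# and has no idea what to do with unseen inputs (high variance).
memory = dict(train)
def overfit(x):
    return memory.get(x, 0.0)

print(mse(overfit, train))                    # 0.0 on training data
print(mse(overfit, test) > mse(line, test))   # True: fails to generalize
print(mse(underfit, test) > mse(line, test))  # True: too simple to fit
```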


‭Q8) Define or explain following terms‬



‭● Entropy ● Information gain‬

‭What is Entropy in Machine Learning‬



‭Entropy is the measurement of disorder or impurities in the‬


‭information processed in machine learning. It determines how a‬
‭decision tree chooses to split data.‬

We can understand the term entropy with a simple example: flipping a coin. When we flip a coin, there can be two outcomes, and it is difficult to predict the exact outcome because there is no direct relation between flipping a coin and its result. Both outcomes have a 50% probability, and in such scenarios entropy is at its highest. This is the essence of entropy in machine learning.

‭Mathematical Formula for Entropy‬


Consider a data set having a total of N classes; then the entropy (E) can be determined with the formula below:

E = − Σ Pi log2(Pi),   summed over i = 1 … N

Where:
Pi = probability of randomly selecting an example in class i.

For a two-class problem, entropy lies between 0 and 1; depending on the number of classes in the dataset, it can be greater than 1. A high value of entropy indicates a high level of disorder (impurity) in the data.

‭Let's understand it with an example where we have a dataset having‬


‭three colors of fruits as red, green, and yellow. Suppose we have 2 red,‬

‭2 green, and 4 yellow observations throughout the dataset. Then as per‬


‭the above equation:‬

‭Where;‬
‭Pr = Probability of choosing red fruits;‬
‭Pg = Probability of choosing green fruits and;‬
‭Py = Probability of choosing yellow fruits.‬
Pr = 2/8 = 1/4 [as 2 of the 8 observations represent red fruits]

Pg = 2/8 = 1/4 [as 2 of the 8 observations represent green fruits]

Py = 4/8 = 1/2 [as 4 of the 8 observations represent yellow fruits]


Now our final equation will be:

E = −(1/4 log2(1/4) + 1/4 log2(1/4) + 1/2 log2(1/2)) = 0.5 + 0.5 + 0.5

So, the entropy will be 1.5.
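The calculation can be checked with a few lines of Python (a sketch; `entropy` here takes a list of per-class counts):

```python
import math

def entropy(counts):
    """Shannon entropy (base 2) of a list of per-class counts."""
    total = sum(counts)
    return sum((c / total) * math.log2(total / c) for c in counts if c)

# Fruit dataset from above: 2 red, 2 green, 4 yellow observations.
print(entropy([2, 2, 4]))  # 1.5
# All observations in a single class -> pure dataset, zero impurity:
print(entropy([8]))        # 0.0
```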
N
's
‭Let's consider a case when all observations belong to the same class;‬

‭then entropy will always be 0.‬



E = −(1 × log2 1)
= 0

When entropy becomes 0, the dataset has no impurity: all examples belong to one class, so there is nothing left for a split to learn from. If the entropy is 1, the classes are maximally mixed, and such a dataset gives a split the most information to gain.

‭What is the information gain in Entropy?‬
‭Information gain is defined as the pattern observed in the dataset and‬

‭reduction in the entropy.‬

Mathematically, information gain can be expressed with the below formula:

Information Gain = (Entropy of parent node) − (weighted average entropy of child nodes)

Note: when the parent entropy is 1, as in the example below, this reduces to Information Gain = 1 − (weighted average child entropy).

‭Let's understand it with an example having three scenarios as follows:‬



‭Let's say we have a tree with a total of four values at the root node‬
‭that is split into the first level having one value in one branch (say,‬
‭Branch 1) and three values in the other branch (Branch 2). The entropy‬
‭at the root node is 1.‬

Now, to compute the entropy at the child nodes, the weights are taken as ¼ for Branch 1 (one of the four values) and ¾ for Branch 2 (three of the four values), and the entropies are calculated using Shannon's entropy formula. The entropy of the Branch 1 child node is zero because there is only one value in that node, meaning there is no uncertainty and hence no heterogeneity.

‭H(X) = - [(1/3 * log2 (1/3)) + (2/3 * log2 (2/3))] = 0.9184‬

‭The information gain for the above case is the reduction in the‬
‭weighted average of the entropy.‬
‭Information Gain = 1 - ( ¾ * 0.9184) - (¼ *0) = 0.3112‬

‭The more the entropy is removed, the greater the information gain.‬
‭The higher the information gain, the better the split.‬
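The worked example can be verified numerically. This is a sketch assuming, as the root entropy of 1 implies, two equally frequent classes among the four values at the root:

```python
import math

def entropy(probs):
    """Shannon entropy (base 2) from a list of class probabilities."""
    return sum(p * math.log2(1 / p) for p in probs if p)

# Root node: four values; entropy 1 implies two equally frequent classes.
parent = entropy([2 / 4, 2 / 4])   # 1.0

# Branch 1 child: a single value -> pure node, entropy 0.
child1 = entropy([1.0])
# Branch 2 child: three values with a 1:2 class split.
child2 = entropy([1 / 3, 2 / 3])   # ~0.9183

gain = parent - (1 / 4) * child1 - (3 / 4) * child2
print(round(gain, 4))  # 0.3113 (the notes round the child entropy first)
```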

‭Q9) Explain naïve bayes classification‬

‭●‬ ‭Naïve Bayes algorithm is a supervised learning algorithm, which is‬



‭based on Bayes theorem and used for solving classification‬


‭problems.‬
‭●‬ ‭It is mainly used in text classification that includes a‬
‭high-dimensional training dataset.‬
●	Naïve Bayes Classifier is one of the simplest and most effective classification algorithms, and it helps in building fast machine learning models that can make quick predictions.
‭●‬ ‭It is a probabilistic classifier, which means it predicts on the‬
‭basis of the probability of an object.‬
‭●‬ ‭Some popular examples of Naïve Bayes Algorithm are spam‬
‭filtration, Sentimental analysis, and classifying articles.‬
‭●‬ ‭Naïve: It is called Naïve because it assumes that the occurrence‬
‭of a certain feature is independent of the occurrence of other‬


‭features.‬

●	For example, if a fruit is identified on the basis of color, shape, and taste, then a red, spherical, and sweet fruit is recognized as an apple. Hence each feature individually contributes to identifying it as an apple, without depending on the others.
‭●‬ ‭Bayes: It is called Bayes because it depends on the principle of‬
‭Bayes' Theorem.‬
‭●‬ ‭Bayes' Theorem: Bayes' theorem is also known as Bayes' Rule or‬
‭Bayes' law, which is used to determine the probability of a‬
‭hypothesis with prior knowledge. It depends on the conditional‬

‭probability.‬
●	The formula for Bayes' theorem is given as:

P(A|B) = P(B|A) × P(A) / P(B)

Where,
‭P(A|B) is Posterior probability: Probability of hypothesis A on the‬

‭observed event B.‬


P(B|A) is Likelihood probability: Probability of the evidence given that the hypothesis is true.
‭P(A) is Prior Probability: Probability of hypothesis before observing the‬
‭evidence.‬
‭P(B) is Marginal Probability: Probability of Evidence.‬
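As a numeric sketch of the theorem (the probabilities below are invented for a spam-filter-flavoured example, with A = "email is spam" and B = "email contains the word 'offer'"):

```python
# Invented numbers for a Bayes' theorem example.
p_spam = 0.20        # P(A): prior probability
p_word_spam = 0.60   # P(B|A): likelihood
p_word_ham = 0.05    # P(B|not A)

# Marginal P(B) via the law of total probability.
p_word = p_word_spam * p_spam + p_word_ham * (1 - p_spam)

# Posterior: P(A|B) = P(B|A) * P(A) / P(B)
p_spam_given_word = p_word_spam * p_spam / p_word
print(round(p_word, 2))             # 0.16
print(round(p_spam_given_word, 2))  # 0.75
```

Seeing the word raises the spam probability from the 20% prior to a 75% posterior, which is exactly the update the classifier performs for every feature.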
‭Advantages of Naïve Bayes Classifier:‬
‭●‬ ‭Naïve Bayes is one of the fast and easy ML algorithms to predict‬
‭a class of datasets.‬
‭●‬ ‭It can be used for Binary as well as Multi-class Classifications.‬
‭●‬ ‭It performs well in Multi-class predictions as compared to the‬
‭other Algorithms.‬


‭●‬ ‭It is the most popular choice for text classification problems.‬

‭Disadvantages of Naïve Bayes Classifier:‬

‭●‬ ‭Naive Bayes assumes that all features are independent or‬
‭unrelated, so it cannot learn the relationship between features.‬

‭Applications of Naïve Bayes Classifier:‬
‭●‬ ‭It is used for Credit Scoring.‬
‭●‬ ‭It is used in medical data classification.‬
‭●‬ ‭It can be used in real-time predictions because Naïve Bayes‬

‭Classifier is an eager learner.‬


‭●‬ ‭It is used in Text classification such as Spam filtering and‬
‭Sentiment analysis.‬

‭Types of Naïve Bayes Model:‬


‭●‬ ‭Gaussian: The Gaussian model assumes that features follow a‬

‭normal distribution. This means if predictors take continuous‬


‭values instead of discrete, then the model assumes that these‬
‭values are sampled from the Gaussian distribution.‬
‭●‬ ‭Multinomial: The Multinomial Naïve Bayes classifier is used when‬
‭the data is multinomial distributed. It is primarily used for‬
‭document classification problems, it means a particular document‬
‭belongs to which category such as Sports, Politics, education, etc.‬
‭The classifier uses the frequency of words for the predictors.‬
‭●‬ ‭Bernoulli: The Bernoulli classifier works similar to the Multinomial‬
‭classifier, but the predictor variables are the independent‬
‭Booleans variables. Such as if a particular word is present or not‬
‭in a document. This model is also famous for document‬


‭classification tasks.‬

‭Q10) Describe advantages and applications of naïve bayes‬

‭classification‬

‭Advantages of Naïve Bayes Classification‬

‭1.‬ ‭Simplicity and Ease of Implementation‬‭:‬
‭○‬ ‭Naïve Bayes is simple and easy to implement. It assumes‬
‭independence between the features, which reduces‬
‭complexity, making it suitable for quick applications with‬

‭limited computational resources.‬


‭2.‬ ‭Fast and Efficient‬‭:‬
‭○‬ ‭Naïve Bayes is computationally efficient and works well with‬

‭large datasets. Its training and prediction times are fast‬


‭because it simplifies probability calculations using‬
‭conditional independence.‬

‭3.‬ ‭Works Well with Small Datasets‬‭:‬


‭○‬ ‭Despite being a simple algorithm, Naïve Bayes performs‬
‭surprisingly well even when the dataset is small. This makes‬
‭it ideal in situations where gathering large amounts of data‬
‭is difficult.‬
‭4.‬ ‭Performs Well with Categorical Data‬‭:‬
‭○‬ ‭Naïve Bayes works particularly well when the input features‬
‭are categorical (e.g., for text classification problems). It‬
‭can handle both binary and multi-class classification tasks‬
‭effectively.‬
‭5.‬ ‭Performs Well for Multiclass Classification‬‭:‬
‭○‬ ‭Unlike some algorithms that struggle with multiclass‬


‭classification, Naïve Bayes handles multiple classes very‬

‭well. This makes it ideal for tasks with more than two‬
‭outcomes.‬

‭6.‬ ‭Robust to Irrelevant Features‬‭:‬
‭○‬ ‭Naïve Bayes is relatively immune to irrelevant features in‬
‭the data. Even if the assumption of independence between‬

‭features is violated, it can still perform well in many‬
‭practical applications.‬
‭7.‬ ‭Performs Well with Text Data and Natural Language‬
‭Processing (NLP)‬‭:‬

‭○‬ ‭Naïve Bayes is popular in text-related tasks (e.g., spam‬


‭filtering, sentiment analysis) because of its ability to handle‬
‭high-dimensional data and its efficiency with text‬

‭classification tasks.‬
‭8.‬ ‭Handles Missing Data‬‭:‬
‭○‬ ‭Naïve Bayes can handle missing data relatively well. While‬

‭some machine learning algorithms may need data imputation‬


‭methods, Naïve Bayes can make predictions even with‬
‭missing attributes by ignoring them during probability‬
‭calculations.‬

‭Applications of Naïve Bayes Classification‬


‭1.‬ ‭Spam Filtering‬‭:‬
‭○‬ ‭Application‬‭: Email service providers like Gmail and‬‭Yahoo use‬
‭Naïve Bayes for spam detection. The classifier labels emails‬
‭as either "spam" or "not spam" based on their content,‬
‭sender information, and other features.‬
‭○‬ ‭How it Works‬‭: The algorithm is trained on a dataset‬‭of‬


‭labeled emails (spam and not spam). It calculates the‬

‭likelihood of an email being spam based on the presence or‬
‭absence of certain keywords and features.‬

‭2.‬ ‭Sentiment Analysis‬‭:‬
‭○‬ ‭Application‬‭: Naïve Bayes is used to classify customer‬
‭reviews, social media posts, or feedback into categories like‬

‭positive, negative, or neutral sentiment.‬
‭○‬ ‭How it Works‬‭: By analyzing the frequency of positive‬‭or‬
‭negative words in a dataset of labeled text (reviews or‬
‭tweets), Naïve Bayes can predict the sentiment of new text‬

‭data.‬
‭○‬ ‭Example‬‭: It’s widely used in e-commerce platforms to‬
‭analyze customer reviews and gauge the overall sentiment‬

‭towards products.‬
‭3.‬ ‭Document Classification‬‭:‬
‭○‬ ‭Application‬‭: Naïve Bayes is widely used in text classification‬

‭tasks such as news categorization, topic labeling, and‬


‭document classification.‬
‭○‬ ‭How it Works‬‭: The classifier analyzes words or phrases‬‭in‬
‭documents and classifies them into predefined categories,‬
‭such as politics, sports, entertainment, or technology.‬
‭○‬ ‭Example‬‭: News websites use Naïve Bayes to automatically‬
‭categorize articles based on their content.‬
‭4.‬ ‭Medical Diagnosis‬‭:‬
‭○‬ ‭Application‬‭: Naïve Bayes is used in healthcare to‬‭predict‬
‭diseases based on patient data such as symptoms, medical‬
‭history, and test results.‬
‭○‬ ‭How it Works‬‭: The algorithm is trained on a dataset‬‭of‬
‭patient data with known diagnoses. It then uses this‬


‭information to predict whether new patients might have a‬

‭particular disease based on the likelihood of specific‬
‭symptoms.‬

‭○‬ ‭Example‬‭: Predicting the likelihood of a patient having a‬
‭disease like diabetes or heart disease based on input‬
‭features like age, weight, blood sugar levels, etc.‬
‭5.‬ ‭Recommendation Systems‬‭:‬
‭○‬ ‭Application‬‭: Naïve Bayes is applied in recommendation‬
‭engines to suggest items such as movies, books, or products‬
‭to users based on their preferences.‬

‭○‬ ‭How it Works‬‭: By analyzing user behavior and preferences‬


‭(such as past purchases or movie ratings), the algorithm‬
‭classifies items into categories (e.g., “highly recommended”‬

‭or “not recommended”) and makes personalized‬


‭recommendations.‬
‭○‬ ‭Example‬‭: Netflix or Amazon recommending movies or‬

‭products based on user preferences.‬


‭6.‬ ‭Credit Scoring and Risk Prediction‬‭:‬
‭○‬ ‭Application‬‭: Banks and financial institutions use‬‭Naïve Bayes‬
‭to assess the creditworthiness of loan applicants and‬
‭predict the risk of default.‬
‭○‬ ‭How it Works‬‭: The algorithm analyzes features such as‬
‭credit history, income, and employment to classify‬
‭customers into low-risk or high-risk categories.‬
‭○‬ ‭Example‬‭: Predicting whether a customer is likely to‬‭default‬
‭on a loan or not, based on their financial behavior.‬
‭7.‬ ‭Face Recognition‬‭:‬


‭○‬ ‭Application‬‭: Naïve Bayes can be used in facial recognition‬

‭systems to classify faces in images or videos.‬
‭○‬ ‭How it Works‬‭: The algorithm analyzes facial features,‬‭such‬

‭as distance between eyes, shape of the nose, etc., and‬
‭matches them to pre-classified images in the database.‬
‭○‬ ‭Example‬‭: Used in security systems to recognize and‬‭verify‬
‭individuals' identities.‬
8.	Anomaly Detection:
‭○‬ ‭Application‬‭: Naïve Bayes is applied in cybersecurity‬‭to‬
‭detect unusual patterns or anomalies, such as fraud or‬

‭network intrusions.‬
‭○‬ ‭How it Works‬‭: It learns the normal behavior from historical‬
‭data and flags any outliers or anomalies as potential threats.‬

‭○‬ ‭Example‬‭: Detecting unusual login attempts or financial‬


‭transactions that might indicate fraud.‬
‭9.‬ ‭Real-Time Prediction in E-commerce‬‭:‬

‭○‬ ‭Application‬‭: Naïve Bayes is used for real-time predictions,‬


‭such as determining whether a customer will make a‬
‭purchase or abandon the cart.‬
‭○‬ ‭How it Works‬‭: By analyzing user behavior data, the‬
‭algorithm classifies customers into groups, such as “likely to‬
‭purchase” or “unlikely to purchase,” in real time.‬
‭○‬ ‭Example‬‭: E-commerce sites like Amazon may use this to‬
‭offer last-minute discounts to users who are likely to‬
‭abandon their shopping cart.‬

‭Q11) Problems based on decision tree CART/ ID3‬


‭Q12) Problems based on naïve bayes‬


‭Numerical PDF‬

‭Q13) Explain the working of SVM‬

‭SUPPORT VECTOR MACHINE‬

‭SVM is a method for classification of both linear and non-linear data.‬
‭Linearly Separable Data:‬

‭If the given data is classified into distinct classes such that‬

‭they can be separated by a‬‭decision boundary‬‭, it is called as‬


‭Linearly Separable Data‬

‭If the given data is classified into distinct classes such that‬
‭they cannot be separated by a decision boundary, it is called‬
‭Non-linearly Separable Data. Since it cannot be separated by a‬

‭single line, it is non-linear.‬

‭SVM uses the concept of MMH (Maximum Marginal HyperPlane)‬

‭The goal of the SVM algorithm is to create the best line or‬
‭decision boundary that can segregate ‘n’ dimensional space into‬
‭classes so that we can easily put the new data points in the‬
‭correct category in the future.‬

‭This best-decision boundary is known as‬‭Hyperplane‬

‭SVM chooses the extreme points/vectors that helps in creating‬


‭the Hyperplane.‬


‭These extreme cases are called Support Vectors and hence the‬
‭algorithm is termed as Support Vector Machine(SVM).‬

The lines formed by joining the points closest to the hyperplane, on either side, bound the Margin.

‭Margin is the distance between the support vectors and the‬
‭hyperplane.‬
‭TERMINOLOGIES:‬

‭1. Hyperplane‬‭:‬‭It is a decision boundary used to separate‬



‭data points of different classes.‬

‭For a linear classification, it will be a linear equation:‬



Wx + b = 0

‭where,‬

‭W = weight vector‬

‭b = bias‬
‭We can write the equation for the two classes:‬

Wx + b ≥ 1 for yi = +1
Wx + b ≤ −1 for yi = −1

Considering these two inequalities together,

yi(Wx + b) ≥ 1

with equality, yi(Wx + b) = 1, holding for the points that lie exactly on the margin. This is used to decide the Support Vectors.


2. Support Vectors: These are the closest data points to the hyperplane, and they play a critical role in deciding the hyperplane and the margin.
‭3.‬‭Margins are of two types : Hard margin & Soft Margin‬


Hard Margin: The maximum-margin hyperplane, or hard margin, is a hyperplane that properly separates the data points of different categories without any misclassifications.

Soft Margin: When the data is not perfectly separable or contains outliers, SVM permits a soft-margin technique.

4. Each datapoint has a Slack Variable introduced by the soft-margin formulation, which softens the strict margin requirements and permits certain misclassifications or violations.

5. The margin is calculated as:

Margin = 2 / ||W||
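These conditions can be checked numerically. The hyperplane below is hypothetical, chosen so that both example points sit exactly on the margin lines; the margin itself is 2 / ||W||:

```python
import math

# Sketch with invented numbers: check the support-vector condition
# y_i * (w . x_i + b) >= 1 and compute the margin 2 / ||w||
# for the hyperplane x1 + x2 - 3 = 0.
w, b = [1.0, 1.0], -3.0

def decision(x):
    return sum(wi * xi for wi, xi in zip(w, x)) + b

# (x_i, y_i) pairs chosen to lie exactly on the margin lines.
points = [([1.0, 1.0], -1), ([2.0, 2.0], +1)]
for x, y in points:
    assert y * decision(x) >= 1  # on or outside the margin

margin = 2 / math.sqrt(sum(wi * wi for wi in w))
print(round(margin, 4))  # 1.4142, i.e. 2 / sqrt(2)
```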
‭Types of SVM:‬

‭1.‬ ‭Linear SVM‬

‭2.‬ ‭Non-Linear SVM‬


‭Advantages of SVM:‬

‭1.‬‭Effective in high-dimensional cases.‬



2. Different kernel functions can be specified for the decision function, and it is also possible to specify custom kernels.

‭3.‬‭It is memory efficient.‬

‭Disadvantages:‬

1. If the number of features is much greater than the number of samples, avoiding over-fitting in the choice of kernel functions is crucial.

2. SVMs do not directly provide probability estimates; these are calculated using an expensive five-fold cross-validation.
‭Q14) Applications of SVM‬

‭1. Image Classification‬

‭●‬ ‭Application‬‭: SVM is widely used for classifying images‬‭in‬


‭computer vision tasks, such as facial recognition, object‬
‭detection, and handwriting recognition.‬


‭●‬ ‭How it Works‬‭: The algorithm can effectively classify‬‭images by‬

‭finding the optimal hyperplane that separates different classes in‬
‭the feature space derived from the image data.‬

‭●‬ ‭Example‬‭: Recognizing handwritten digits in the MNIST dataset‬
‭or classifying images of cats and dogs.‬

‭2. Text Classification‬

‭●‬ ‭Application‬‭: SVM is employed for text categorization‬‭tasks, such‬
‭as spam detection, sentiment analysis, and document‬
‭classification.‬

‭●‬ ‭How it Works‬‭: The algorithm converts text data into‬‭numerical‬


‭feature vectors using techniques like TF-IDF or word embeddings‬
‭and then classifies the documents based on these features.‬

‭●‬ ‭Example‬‭: Classifying emails as spam or non-spam, or‬‭determining‬


‭the sentiment of product reviews as positive or negative.‬

‭3. Bioinformatics‬

‭●‬ ‭Application‬‭: In bioinformatics, SVM is used for classifying‬‭genes,‬


‭proteins, and biological sequences based on their features.‬
‭●‬ ‭How it Works‬‭: It helps in identifying gene functions‬‭or‬
‭predicting protein structures by analyzing complex biological‬
‭data.‬
‭●‬ ‭Example‬‭: Classifying genes associated with particular diseases or‬
‭predicting protein-protein interactions.‬

‭4. Finance‬

‭●‬ ‭Application‬‭: SVM is applied in financial markets for‬‭credit‬


‭scoring, fraud detection, and stock price prediction.‬


‭●‬ ‭How it Works‬‭: It analyzes historical financial data‬‭to classify‬

‭transactions as fraudulent or legitimate, or to predict whether a‬
‭stock price will rise or fall.‬

‭●‬ ‭Example‬‭: Classifying credit applicants into "approved"‬‭or "denied"‬
‭categories based on their financial history.‬

‭5. Medical Diagnosis‬

‭●‬ ‭Application‬‭: SVM is used to assist in diagnosing diseases‬‭based on‬
‭patient data, such as symptoms and medical history.‬
‭●‬ ‭How it Works‬‭: By analyzing various features related‬‭to patient‬

‭health, SVM can classify individuals as healthy or as having a‬


‭particular disease.‬
‭●‬ ‭Example‬‭: Diagnosing diseases such as cancer by analyzing‬‭medical‬

‭imaging data or patient biomarkers.‬

‭6. Face Detection and Recognition‬



‭●‬ ‭Application‬‭: SVM is used in computer vision for face‬‭detection‬


‭and recognition in images and videos.‬
‭●‬ ‭How it Works‬‭: The algorithm classifies regions in‬‭an image as‬
‭containing a face or not, based on features extracted from the‬
‭image.‬
‭●‬ ‭Example‬‭: Implementing face recognition systems in security‬
‭applications or social media platforms.‬

‭7. Customer Segmentation‬

‭●‬ ‭Application‬‭: SVM can be used for customer segmentation‬‭in‬


‭marketing to classify customers into different groups based on‬


‭purchasing behavior and preferences.‬

‭●‬ ‭How it Works‬‭: By analyzing customer data, SVM identifies‬
‭distinct groups, allowing marketers to target specific segments‬

‭with tailored campaigns.‬
‭●‬ ‭Example‬‭: Classifying customers as "high value," "low‬‭value," or "at‬
‭risk" based on their purchasing history.‬

8. Anomaly Detection


‭●‬ ‭Application‬‭: SVM is employed for anomaly detection‬‭tasks in‬
‭various fields, including cybersecurity, fraud detection, and‬

‭network security.‬
‭●‬ ‭How it Works‬‭: The algorithm can identify unusual patterns‬‭or‬
‭outliers in data, classifying them as anomalies that may require‬

‭further investigation.‬
‭●‬ ‭Example‬‭: Detecting fraudulent transactions in credit‬‭card‬
‭processing or identifying potential intrusions in network traffic.‬

‭9. Natural Language Processing (NLP)‬

‭●‬ ‭Application‬‭: SVM is used in NLP tasks, such as part-of-speech‬


‭tagging, named entity recognition, and language identification.‬
‭●‬ ‭How it Works‬‭: It classifies words or phrases based on their‬
‭contextual features to determine their role or identity within‬
‭text.‬
‭●‬ ‭Example‬‭: Classifying sentences as declarative, interrogative,‬‭or‬
‭exclamatory based on their structure.‬
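The sentence-type idea can be sketched by pairing a TF-IDF vectorizer with a linear SVM; the tiny corpus and its labels are hand-made for illustration (a real system would need far more data):

```python
# Minimal sketch: linear SVM text classification with TF-IDF features,
# distinguishing interrogative from declarative sentences.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import LinearSVC

sentences = [
    "What time is the meeting?", "Where did you put the keys?",
    "How does this work?", "Why is the sky blue?",
    "The meeting starts at noon.", "The keys are on the table.",
    "This works by induction.", "The sky is blue.",
]
labels = ["interrogative"] * 4 + ["declarative"] * 4

# TfidfVectorizer turns each sentence into a sparse feature vector;
# LinearSVC learns a separating hyperplane over those features.
model = make_pipeline(TfidfVectorizer(), LinearSVC())
model.fit(sentences, labels)
print(model.predict(["Where is the station?"]))
```

Question words like "where" only occur in the interrogative examples, so they end up with strong weights in the learned hyperplane.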

‭10. Time Series Forecasting‬


‭●‬ ‭Application‬‭: SVM can be utilized for forecasting time‬‭series data‬
‭in fields like economics, weather prediction, and stock market‬

‭analysis.‬
‭●‬ ‭How it Works‬‭: By analyzing historical data trends,‬‭SVM can‬
‭predict future values in a time series dataset.‬

‭●‬ ‭Example‬‭: Predicting future stock prices based on historical‬
‭trends or forecasting weather conditions based on past climate‬
‭data.‬
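Casting forecasting as supervised learning is usually done with lagged values as features; here is a minimal sketch using scikit-learn's support vector regression (SVR) on a synthetic trend series (the series, lag count, and hyperparameters are all illustrative choices):

```python
# Minimal sketch: one-step-ahead forecasting with SVR, using the
# previous 5 observations of a synthetic series as features.
import numpy as np
from sklearn.svm import SVR

# Synthetic upward trend plus a mild oscillation.
series = np.linspace(10.0, 20.0, 60) + np.sin(np.arange(60) / 3.0)

LAGS = 5  # predict series[t] from series[t-5 .. t-1]
X = np.array([series[t - LAGS:t] for t in range(LAGS, len(series))])
y = series[LAGS:]

model = SVR(kernel="rbf", C=100.0, gamma=0.1)
model.fit(X, y)

# Forecast the value following the last observed window.
next_value = model.predict([series[-LAGS:]])[0]
print(f"forecast for next step: {next_value:.2f}")
```

Real stock or weather data would also call for proper train/test splitting in time order, since random shuffling leaks future information into training.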

‭11. Robotics‬

‭●‬ ‭Application‬‭: SVM is applied in robotics for object‬‭recognition,‬


‭navigation, and human-robot interaction.‬

‭●‬ ‭How it Works‬‭: It helps robots classify objects in‬‭their‬


‭environment and make decisions based on the classified data.‬
‭●‬ ‭Example‬‭: Enabling robots to recognize and pick up‬‭specific‬

‭objects in an industrial setting.‬

‭12. Environmental Science‬

‭●‬ ‭Application‬‭: SVM is used in environmental monitoring‬‭and‬


‭classification of various environmental data, such as land cover‬
‭classification and species distribution modeling.‬
‭●‬ ‭How it Works‬‭: By analyzing satellite imagery and ecological data,‬
‭SVM can classify land types and predict species habitats.‬
‭●‬ ‭Example‬‭: Classifying land use types (e.g., urban,‬‭agricultural,‬
‭forest) from satellite images.‬

Q15) What do you mean by hypothesis? Provide examples of the null and
alternate hypotheses with explanation. Explain the working of the null
and alternate hypotheses.

‭What is Hypothesis Testing?‬

Hypothesis testing is a statistical method that is used in making
statistical decisions using experimental data. Hypothesis testing is
basically an assumption that we make about a population parameter.
Ex:
1) You say that the average student in a class is 40, or that a boy is
taller than girls.
2) Some scientists claim that ultraviolet (UV) light can damage the
eyes and may therefore also cause blindness.


‭Terms‬
Hypothesis space (H): the set of all possible legal hypotheses; hence
it is also known as the hypothesis set.
Hypothesis (h): the approximate function that best describes the
target in supervised machine learning algorithms. It is primarily
based on the data as well as the bias and restrictions applied to the
data.



Need of Hypothesis
Hypothesis testing is an essential procedure in statistics.
A hypothesis test evaluates two mutually exclusive statements about a
population to determine which statement is best supported by the
sample data. When we say that a finding is statistically significant,
it is because a hypothesis test supports it.
Example hypotheses: "If a person gets 7 hours of sleep, then he will
feel less fatigue than if he sleeps less." "Consumption of sugary
drinks every day leads to obesity."
Parameters of Hypothesis Testing:
● Null Hypothesis
● Alternate Hypothesis

Definition:
  Null Hypothesis (H0): a statement in which there is no relation
  between the two variables.
  Alternate Hypothesis (H1): a statement in which there is some
  statistical relationship between the two variables.
What it is:
  H0: generally, researchers try to reject or disprove it.
  H1: researchers try to accept or prove it.
Testing process:
  H0: indirect and implicit. H1: direct and explicit.
P-value:
  H0 is rejected if the p-value is less than the alpha value;
  otherwise, it is accepted.
  H1 is accepted if the p-value is less than the alpha value;
  otherwise, it is rejected.
Notation: H0 (null), H1 (alternate).
Symbols used:
  H0: equality symbols =, <=, >=
  H1: inequality symbols !=, !<=, !>=
Example: effect of bio-fertilizer 'x' on plant growth
Alternative Hypothesis H1: application of bio-fertilizer 'x' increases
plant growth.
Null Hypothesis H0: application of bio-fertilizer 'x' does not
increase plant growth.
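The H0-vs-H1 decision can be sketched with a two-sample t-test in SciPy; the plant-growth measurements below are invented (two groups, with and without the fertilizer), so only the procedure, not the numbers, comes from the notes:

```python
# Minimal sketch: decide between H0 and H1 with a two-sample t-test.
# H0: the fertilizer does not increase growth; H1: it does.
from scipy import stats

with_fertilizer = [21.5, 22.1, 23.0, 22.8, 21.9, 23.4, 22.6, 23.1]
without_fertilizer = [19.8, 20.2, 19.5, 20.9, 20.1, 19.9, 20.4, 20.0]

ALPHA = 0.05  # significance level
t_stat, p_value = stats.ttest_ind(with_fertilizer, without_fertilizer)

if p_value < ALPHA:
    decision = "reject H0: the fertilizer appears to increase growth"
else:
    decision = "fail to reject H0: no significant effect detected"
print(f"p-value = {p_value:.4f} -> {decision}")
```

This mirrors the p-value row of the comparison: H0 is rejected exactly when the p-value falls below the chosen alpha.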


‭Q16) Write a short note on Multivariate Regression‬

‭Multivariate Regression‬

‭Multivariate regression is a statistical technique that uses a‬
‭mathematical model to estimate the relationship between a dependent‬
variable and multiple independent variables.

It's an extension of linear regression, which involves only one
response variable. Multivariate regression can be used in a variety of
applications, including: identifying risk factors for an outcome,
determining the effect of a procedure on an outcome, comparing
different treatment strategies, quantifying the magnitude of an
effect, and developing risk-prediction models.

Multivariate Regression is a method used to measure the degree to
which more than one independent variable (predictor) and more than
one dependent variable (response) are linearly related. The method is
broadly used to predict the behavior of the response variables
associated with changes in the predictor variables, once a desired
degree of relation has been established.

The Multivariate Regression model relates more than one predictor and
more than one response:
Y = X*B + ϵ
where Y is the matrix of response variables, X is the matrix of
predictors, B is the matrix of regression coefficients, and ϵ is the
error term.
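The Y = X*B + ϵ model can be fitted with scikit-learn's LinearRegression, which handles several responses at once; the data here are synthetic, generated from a known coefficient matrix so the fit can be checked:

```python
# Minimal sketch: multivariate (multi-output) linear regression.
# Two predictors, two responses, data generated from known B.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 2))            # predictor matrix (100 x 2)
B_true = np.array([[2.0, -1.0],          # true coefficient matrix B
                   [0.5, 3.0]])
Y = X @ B_true + rng.normal(scale=0.05, size=(100, 2))  # Y = X*B + noise

model = LinearRegression()
model.fit(X, Y)                          # fits both responses jointly
print(model.coef_)                       # approximately B_true.T
```

Note that scikit-learn stores coefficients as (n_responses, n_predictors), i.e. the transpose of B in the equation above.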


‭Here are some examples of how multivariate regression can be used:‬

‭●‬ ‭Pesticide concentration in surface water‬
‭A multivariate regression model can estimate the relationship between‬

‭river flow and seasonal pesticide use, and how these factors affect‬
‭pesticide concentration in surface water.‬
‭●‬ ‭Intracranial bleeding‬

‭A multivariate logistic regression analysis can identify the strongest‬
‭predictors of intracranial bleeding, such as vomiting/nausea and‬
‭seizures.‬
‭●‬ ‭Multiple genetic variants and neuroimaging phenotypes‬

‭A multivariate regression model can capture the complex relationships‬


‭between genes and brain measurements.‬

‭Q17) State the importance of feature selection. How it is useful in‬


‭machine learning algorithms?‬

‭Feature selection is a critical step in the machine learning pipeline that‬


‭involves selecting a subset of relevant features (or variables) for use in‬
‭model construction. It plays a significant role in improving the‬
‭performance of machine learning algorithms. Here’s an overview of its‬
‭importance and usefulness:‬

‭Importance of Feature Selection‬


‭1.‬ ‭Reduces Overfitting‬‭:‬
‭○‬ ‭By eliminating irrelevant or redundant features, feature‬
‭selection helps to reduce the complexity of the model. This,‬
‭in turn, lowers the risk of overfitting, where the model‬
‭learns noise instead of the underlying patterns in the‬
‭training data.‬


‭2.‬ ‭Improves Model Performance‬‭:‬

‭○‬ ‭Selecting the most relevant features can enhance the‬
‭model’s accuracy and predictive power. It allows the model‬

‭to focus on the most informative data points, which can lead‬
‭to better generalization to unseen data.‬
‭3.‬ ‭Enhances Interpretability‬‭:‬

‭○‬ ‭A model with fewer features is often easier to interpret‬
‭and understand. This is especially important in fields such as‬
‭healthcare and finance, where stakeholders need to‬
‭understand the factors driving predictions.‬

‭4.‬ ‭Reduces Computational Cost‬‭:‬


‭○‬ ‭Fewer features lead to a simpler model that requires less‬

‭computational resources, which is particularly beneficial for‬


‭large datasets. It speeds up the training process and‬
‭reduces the time and memory required for both training and‬
‭prediction.‬

‭5.‬ ‭Addresses the Curse of Dimensionality‬‭:‬


‭○‬ ‭In high-dimensional spaces, the amount of data required to‬
‭make reliable predictions increases exponentially. Feature‬
‭selection mitigates this issue by reducing the‬
‭dimensionality, allowing the model to perform better with‬
‭limited data.‬
‭6.‬ ‭Improves Data Quality‬‭:‬
‭○‬ ‭Feature selection can help identify and remove noisy or‬
‭irrelevant features that do not contribute meaningfully to‬
‭the analysis, leading to higher-quality datasets and better‬
‭model performance.‬
‭7.‬ ‭Facilitates Model Selection‬‭:‬


‭○‬ ‭Different machine learning algorithms may require‬

‭different features for optimal performance. Feature‬
‭selection can help identify the most relevant features for‬

‭each algorithm, aiding in model comparison and selection.‬

‭Usefulness in Machine Learning Algorithms‬

‭1.‬ ‭Enhanced Learning‬‭:‬


‭○‬ ‭Machine learning algorithms perform better when they‬
‭focus on the most relevant features. Feature selection‬
‭helps in identifying those features that contribute most‬

‭significantly to the output, leading to improved learning.‬


‭2.‬ ‭Faster Training‬‭:‬
‭○‬ ‭Training times are reduced when fewer features are used.‬

‭This is especially important for algorithms like Support‬


‭Vector Machines (SVM), Random Forest, or Neural‬
‭Networks, which can be computationally intensive.‬

‭3.‬ ‭Better Generalization‬‭:‬


‭○‬ ‭Models built with relevant features are more likely to‬
‭generalize well to new, unseen data. This leads to better‬
‭performance in real-world applications where the model‬
‭encounters data it has not seen before.‬
‭4.‬ ‭Support for Specific Algorithms‬‭:‬
‭○‬ ‭Some algorithms, like decision trees, can benefit‬
‭significantly from feature selection. By reducing the number‬
‭of features, decision trees can create simpler models that‬
‭make better splits and predictions.‬
‭5.‬ ‭Increased Robustness‬‭:‬
‭○‬ ‭Feature selection can lead to models that are less sensitive‬


‭to variations in the data, making them more robust in the‬

‭presence of noise or outliers.‬

‭Methods of Feature Selection‬

‭Feature selection can be performed using various methods, including:‬

‭●‬ ‭Filter Methods‬‭: Evaluate features based on statistical measures‬
‭(e.g., correlation, Chi-square test) to select relevant features‬
‭independent of the learning algorithm.‬

‭●‬ ‭Wrapper Methods‬‭: Use a specific machine learning algorithm to‬



‭evaluate combinations of features and select the best-performing‬


‭subset.‬

‭●‬ ‭Embedded Methods‬‭: Perform feature selection during the model‬


‭training process (e.g., Lasso regression, which adds a penalty for‬
‭including too many features).‬
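The three method families above can be sketched side by side with scikit-learn on its bundled iris data; the specific estimators and parameter values are illustrative choices, not the only options:

```python
# Minimal sketch of the three feature-selection families:
# filter (SelectKBest), wrapper (RFE), and embedded (L1 penalty).
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE, SelectKBest, f_classif
from sklearn.linear_model import LogisticRegression

X, y = load_iris(return_X_y=True)

# Filter: score each feature independently with an ANOVA F-test.
filt = SelectKBest(f_classif, k=2).fit(X, y)

# Wrapper: recursively drop the weakest feature according to the model.
wrap = RFE(LogisticRegression(max_iter=1000),
           n_features_to_select=2).fit(X, y)

# Embedded: the L1 penalty drives some coefficients to exactly zero
# during training, selecting features as a side effect.
emb = LogisticRegression(penalty="l1", solver="liblinear", C=0.1).fit(X, y)

print("filter keeps:", filt.get_support())
print("wrapper keeps:", wrap.support_)
print("nonzero L1 coefficients per class:", (emb.coef_ != 0).sum(axis=1))
```

Filter methods are cheapest, wrapper methods are most expensive but model-aware, and embedded methods sit in between, which matches the trade-offs described above.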

Q18) State and explain common errors / mistakes in machine learning.

5 Common Machine Learning Errors
● Lack of understanding of the mathematical aspects of machine
learning algorithms, which leads to poor choices of algorithm and
hyperparameters
● Poor data preparation and sampling:
○ Data cleansing
○ Feature engineering
○ Sampling
● Implementing machine learning algorithms without a strategy
● Implementing everything from scratch instead of reusing well-tested
libraries
● Ignoring outliers, which can distort the learned model

References
Research paper on cognitive automation by Christian Engel, Philipp
Ebel, and Jan Marco Leimeister
https://www.javatpoint.com/
https://www.geeksforgeeks.org/
https://www.kaggle.com/
