AIML Hard
IT362_Artificial Intelligence and Machine Learning 1 (22IT05)
Week 1: Prerequisites
Python
- Introduction to Python programming. How is Python used in machine learning? Discuss Python with Google Colab.
- https://www.kaggle.com/learn/python
Numpy
- Creating a blank array, an array with predefined data, and an array with pattern-specific data; slicing and updating elements; shape manipulations; looping over arrays; reading files in NumPy; using NumPy vs. a Python list for multiplication of 1000 x 1000 matrices and evaluating computing performance (a short sketch follows the links below).
- For Help:
https://www.dataquest.io/m/289-introduction-to-numpy
https://cloudxlab.com/blog/numpy-pandas-introduction/
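A minimal sketch of these NumPy tasks, assuming nothing beyond NumPy itself; the 1000 x 1000 size follows the assignment, while the commented file path and the timing approach are assumptions.

import time
import numpy as np

blank = np.empty((3, 3))                   # blank (uninitialised) array
preset = np.array([[1, 2, 3], [4, 5, 6]])  # array with predefined data
pattern = np.arange(0, 20, 2)              # pattern-specific data: 0, 2, ..., 18

print(preset[0, 1:])                       # slicing
preset[1, 0] = 40                          # updating an element
print(preset.reshape(3, 2))                # shape manipulation

for row in preset:                         # looping over an array
    print(row.sum())

# data = np.loadtxt("data.csv", delimiter=",")  # reading a numeric file (hypothetical path)

# NumPy vs. plain Python lists for 1000 x 1000 matrix multiplication
# (the pure-Python version is very slow at n = 1000; lower n for a quick test)
n = 1000
a, b = np.random.rand(n, n), np.random.rand(n, n)

t0 = time.perf_counter()
_ = a @ b
numpy_time = time.perf_counter() - t0

a_list, b_list = a.tolist(), b.tolist()
t0 = time.perf_counter()
result = [[sum(a_list[i][k] * b_list[k][j] for k in range(n)) for j in range(n)]
          for i in range(n)]
list_time = time.perf_counter() - t0
print(f"NumPy: {numpy_time:.3f}s, list of lists: {list_time:.3f}s")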
Pandas
- Creating data frames, reading files, slicing manipulations, exporting data to files, column and row manipulations with loops
- Use pandas to mask data and read it in Boolean format (a short sketch follows the links below).
- For Help:
https://www.kaggle.com/learn/pandas
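A minimal pandas sketch of the tasks above; the in-memory frame, the commented read path, and "out.csv" are placeholders for whatever files are actually used.

import pandas as pd

df = pd.DataFrame({"name": ["a", "b", "c"], "score": [10, 25, 40]})  # creating a data frame
# df = pd.read_csv("data.csv")             # reading a file (hypothetical path)

print(df.iloc[0:2])                        # slicing rows
print(df[["score"]])                       # selecting a column
df["passed"] = df["score"] > 20            # new Boolean column from a mask
print(df[df["passed"]])                    # masking: keep rows where the mask is True

for idx, row in df.iterrows():             # looping over rows
    print(idx, row["name"], row["score"])

df.to_csv("out.csv", index=False)          # exporting data to a file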
Matplotlib
- Importing matplotlib, simple line chart, correlation chart, histogram, plotting of multivariate data, plotting a pie chart (a short sketch follows the links below)
- For Help:
https://matplotlib.org/stable/gallery/showcase/anatomy.html
https://towardsdatascience.com/data-visualization-using-matplotlib-16f1aae5ce70
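A minimal matplotlib sketch of the chart types listed above, using small made-up arrays so it runs standalone.

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 50)
y = 2 * x + np.random.randn(50)

fig, axes = plt.subplots(2, 2, figsize=(10, 8))

axes[0, 0].plot(x, 2 * x)                             # simple line chart
axes[0, 0].set_title("Line")

axes[0, 1].scatter(x, y)                              # correlation (scatter) chart
axes[0, 1].set_title("Correlation")

axes[1, 0].hist(y, bins=10)                           # histogram
axes[1, 0].set_title("Histogram")

axes[1, 1].pie([30, 45, 25], labels=["A", "B", "C"])  # pie chart
axes[1, 1].set_title("Pie")

plt.tight_layout()
plt.show()

# Multivariate data can be explored with a scatter matrix, for example:
# pd.plotting.scatter_matrix(df)  # assuming a pandas DataFrame df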
Pollutant-Specific Analysis:
- For each pollutant, which cities have the highest and lowest average concentrations?
- Can you identify any correlations between different pollutants?
- Are there any specific locations that consistently report high levels of pollution?
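A hedged pandas sketch for these questions, assuming a long-format air-quality file with hypothetical columns "City", "Pollutant" and "Concentration"; adjust the file and column names to the actual dataset.

import pandas as pd

df = pd.read_csv("air_quality.csv")        # hypothetical file name

# Highest and lowest average concentration per pollutant
avg = df.groupby(["Pollutant", "City"])["Concentration"].mean()
print(avg.groupby("Pollutant").idxmax())   # (pollutant, city) with the highest average
print(avg.groupby("Pollutant").idxmin())   # (pollutant, city) with the lowest average

# Correlations between pollutants: pivot to one column per pollutant, then correlate
wide = df.pivot_table(index="City", columns="Pollutant", values="Concentration")
print(wide.corr())

# Locations that consistently report high pollution: count how often each city
# exceeds the 90th percentile for its pollutant
threshold = df.groupby("Pollutant")["Concentration"].transform(lambda s: s.quantile(0.9))
print(df.loc[df["Concentration"] > threshold, "City"].value_counts().head())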
Select a real dataset from the UCI Machine Learning Repository with one dependent variable and one independent variable, compare the results of each approach, and answer the following questions (a combined code sketch follows the task list).
1. Discuss the full story of the dataset and why regression is applicable to it.
2. Write code to show:
2.1. How many total observations are in the data?
2.2. The data distribution of the dependent and independent variables.
2.3. The relationship between the dependent and independent variables (correlation analysis).
3. Write code to implement linear regression using the Ordinary Least Squares method on the selected dataset.
4. Use the sklearn API to create a linear regression model on the selected dataset. Print the intercept and slope of the model.
5. Write code to implement linear regression using Gradient Descent from scratch on the selected dataset.
6. Quantify the goodness of your model using a table that displays the prediction results with SSE, RMSE and R2 Score, and discuss the interpretation of the errors and the steps taken to reduce them.
7. Prepare a presentation for this work in a group of 5.
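A hedged sketch covering tasks 3-6, assuming a CSV with one independent column "x" and one dependent column "y" (placeholder names; substitute the columns of the UCI dataset you selected). The learning rate and epoch count for gradient descent are assumptions to be tuned per dataset.

import numpy as np
import pandas as pd
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

df = pd.read_csv("dataset.csv")            # hypothetical file name
x = df["x"].to_numpy()
y = df["y"].to_numpy()

# Task 3: Ordinary Least Squares from scratch (closed-form solution)
x_mean, y_mean = x.mean(), y.mean()
slope_ols = np.sum((x - x_mean) * (y - y_mean)) / np.sum((x - x_mean) ** 2)
intercept_ols = y_mean - slope_ols * x_mean
print("OLS from scratch:", intercept_ols, slope_ols)

# Task 4: the same model through the sklearn API
model = LinearRegression().fit(x.reshape(-1, 1), y)
print("sklearn intercept:", model.intercept_, "slope:", model.coef_[0])

# Task 5: Gradient Descent from scratch (on standardised x to keep the steps stable)
x_scaled = (x - x_mean) / x.std()
w, b = 0.0, 0.0                            # slope and intercept
lr, epochs, n = 0.01, 1000, len(x)         # assumed hyperparameters
for _ in range(epochs):
    error = w * x_scaled + b - y
    w -= lr * (2 / n) * np.sum(error * x_scaled)
    b -= lr * (2 / n) * np.sum(error)
print("Gradient Descent (scaled x):", b, w)

# Task 6: goodness-of-fit metrics for the fitted model
y_hat = model.predict(x.reshape(-1, 1))
sse = np.sum((y - y_hat) ** 2)
rmse = np.sqrt(mean_squared_error(y, y_hat))
print(f"SSE={sse:.3f}  RMSE={rmse:.3f}  R2={r2_score(y, y_hat):.3f}")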
References:
Sklearn API: https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html
Kaggle Notebook: https://www.kaggle.com/code/nargisbegum82/step-by-step-ml-linear-regression
Complete Tutorial: https://realpython.com/linear-regression-in-python/
API reference: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
Dataset Reference: https://archive.ics.uci.edu/datasets
Week 7: KNN
Objective - To apply and understand both multi-class classification and regression using the K-Nearest
Neighbors (KNN) algorithm. This practical assignment will help you grasp the intricacies of KNN, its
implementation, and its evaluation in both classification and regression contexts.
Task Instructions:
1. Dataset Selection:
○ Choose a dataset suitable for both multi-class classification and regression tasks. Explain why
you selected this dataset and provide a detailed background story behind the dataset. Example
datasets: Iris, Wine, California Housing, or any other relevant dataset.
2. Dataset Exploration:
○ Write code to display descriptive statistics of the dataset and distribution of dependent variables.
Determine which variables are most useful for classification and regression. Provide evidence
using correlation analysis.
○ Create X_train, y_train, X_test, y_test for both datasets.
3. KNN Implementation:
○ Implement the KNN algorithm for multi-class classification and regression using the sklearn
library. Ensure your code is well-documented and modular.
○ Write code to find the best value of 'k' for both classification and regression (a combined sketch follows this task list).
4. Model Evaluation:
○ For Classification: Quantify the goodness of your model using appropriate metrics (accuracy,
precision, recall, F1-score).
○ For Regression: Quantify the goodness of your model using appropriate metrics (mean squared
error, RMSE, R2 Score).
○ Discuss steps taken for improving model performance, such as feature selection, handling
missing values, or tuning hyperparameters.
6. Group Activity:
○ Form groups of 5 students each. Collaborate on this task, ensuring that each group member
contributes to different sections of the task.
○ Present your findings and implementation in a 10-minute presentation to the class.
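A combined sketch for tasks 2-4, assuming the Iris data for classification and California Housing for regression (two of the example datasets named above); the k search range and the 80/20 split are assumptions.

import numpy as np
from sklearn.datasets import load_iris, fetch_california_housing
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.metrics import classification_report, mean_squared_error, r2_score

# Classification (Iris)
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Find the best k with 5-fold cross-validation on the training set
scores = {k: cross_val_score(KNeighborsClassifier(n_neighbors=k),
                             X_train, y_train, cv=5).mean()
          for k in range(1, 21)}
best_k = max(scores, key=scores.get)

clf = KNeighborsClassifier(n_neighbors=best_k).fit(X_train, y_train)
print("best k:", best_k)
print(classification_report(y_test, clf.predict(X_test)))   # accuracy, precision, recall, F1

# Regression (California Housing); scaling the features usually helps KNN
Xr, yr = fetch_california_housing(return_X_y=True)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(Xr, yr, test_size=0.2, random_state=42)

reg = KNeighborsRegressor(n_neighbors=best_k).fit(Xr_train, yr_train)  # or re-search k here
yr_pred = reg.predict(Xr_test)
mse = mean_squared_error(yr_test, yr_pred)
print("MSE:", mse, "RMSE:", np.sqrt(mse), "R2:", r2_score(yr_test, yr_pred))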
Objective - In this assignment, you will apply three popular machine learning algorithms—Decision Tree,
Random Forest, and XGBoost—to a classification problem. You will compare the performance of these models
using various evaluation metrics and analyze their results. Select an appropriate dataset from the UCI Machine Learning Repository to demonstrate tree-based classification models.
Note:
- Use appropriate libraries such as scikit-learn for Decision Tree and Random Forest, and XGBoost’s library for the XGBoost model.
- Make sure to follow best practices for machine learning workflows, including cross-validation and hyperparameter tuning.
Assignment Tasks:
1. Data Preprocessing:
Identify and handle missing values if present. Convert categorical features into numerical format using techniques such as one-hot encoding. Normalize or standardize features if necessary.
Split the dataset into training and testing sets (e.g., 80% training, 20% testing). A combined code sketch follows the task list below.
4. Visualization:
Plot the decision tree to visualize the model's decision boundaries.
Create feature importance plots for Random Forest and XGBoost.
Generate a confusion matrix for each model and discuss the results.
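A hedged sketch of the preprocessing and visualization steps above, assuming the selected UCI dataset sits in "dataset.csv" with an integer-encoded label column named "target" (both names are placeholders; encode string labels first, e.g. with LabelEncoder, since XGBoost expects integer classes). The xgboost package is assumed to be installed.

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, plot_tree
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import ConfusionMatrixDisplay, accuracy_score
from xgboost import XGBClassifier                    # requires the xgboost package

df = pd.read_csv("dataset.csv").dropna()             # or impute missing values instead
X = pd.get_dummies(df.drop(columns=["target"]))      # one-hot encode categorical features
y = df["target"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

models = {
    "DecisionTree": DecisionTreeClassifier(max_depth=4, random_state=42),
    "RandomForest": RandomForestClassifier(n_estimators=200, random_state=42),
    "XGBoost": XGBClassifier(n_estimators=200, eval_metric="logloss"),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    print(name, "accuracy:", accuracy_score(y_test, model.predict(X_test)))
    ConfusionMatrixDisplay.from_estimator(model, X_test, y_test)   # confusion matrix per model
    plt.title(f"{name} confusion matrix")

plt.figure(figsize=(12, 6))
plot_tree(models["DecisionTree"], feature_names=list(X.columns), filled=True)  # tree plot

for name in ("RandomForest", "XGBoost"):             # feature importance plots
    plt.figure()
    pd.Series(models[name].feature_importances_, index=X.columns).sort_values().plot.barh()
    plt.title(f"{name} feature importances")
plt.show()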
Submission:
Submit a Jupyter Notebook containing your code and all relevant plots and visualizations.
Include a written report in PDF format along with the code.
Week 11: NLP
1. Write a program to perform tokenization by word and sentence using spaCy.
2. Write a program to eliminate stop words using spaCy.
3. Write a program to perform part-of-speech tagging using spaCy.
4. Write a program to perform lemmatization using spaCy.
5. Write a program to perform Named Entity Recognition using spaCy.
6. Write a Python program to find Term Frequency and Inverse Document Frequency (TF-IDF).
(from sklearn.feature_extraction.text import TfidfVectorizer)
7. Write a Python program to find all unigrams, bigrams and trigrams present in the given corpus.
(from sklearn.feature_extraction.text import CountVectorizer)
A combined sketch covering these tasks follows below.
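One combined sketch for tasks 1-7, assuming spaCy's small English model is installed (python -m spacy download en_core_web_sm); the sample sentence and toy corpus are placeholders for the actual text you are given.

import spacy
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer

nlp = spacy.load("en_core_web_sm")
doc = nlp("Apple is looking at buying a U.K. startup for $1 billion. The deal may close soon.")

print([token.text for token in doc])                               # 1. word tokenization
print([sent.text for sent in doc.sents])                           # 1. sentence tokenization
print([t.text for t in doc if not t.is_stop and not t.is_punct])   # 2. stop-word removal
print([(t.text, t.pos_) for t in doc])                             # 3. part-of-speech tagging
print([(t.text, t.lemma_) for t in doc])                           # 4. lemmatization
print([(ent.text, ent.label_) for ent in doc.ents])                # 5. named entity recognition

corpus = ["the cat sat on the mat", "the dog sat on the log"]      # toy corpus

tfidf = TfidfVectorizer()                                          # 6. TF-IDF
print(tfidf.fit_transform(corpus).toarray())
print(tfidf.get_feature_names_out())

ngrams = CountVectorizer(ngram_range=(1, 3)).fit(corpus)           # 7. uni-, bi- and trigrams
print(ngrams.get_feature_names_out())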