IT362: Artificial Intelligence and Machine Learning (22IT05)

Week 1: Prerequisites
Introduction to Python Programming
- How is Python used in machine learning? Discuss Python with Google Colab.
- https://www.kaggle.com/learn/python

NumPy
- Creating a blank array, an array with predefined data, and an array with pattern-specific data; slicing and updating elements; shape manipulations; looping over arrays; reading files into NumPy.
- Use NumPy vs. a Python list for multiplication of 1000 x 1000 matrices and evaluate the computing performance (a timing sketch follows below).
- For Help:
https://www.dataquest.io/m/289-introduction-to-numpy
https://cloudxlab.com/blog/numpy-pandas-introduction/
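For the performance comparison, a minimal timing sketch is shown below; the use of time.perf_counter and the random array contents are illustrative choices, not requirements.

import time
import numpy as np

n = 1000
a = np.random.rand(n, n)
b = np.random.rand(n, n)

# NumPy: vectorized matrix multiplication
start = time.perf_counter()
c = a @ b
numpy_time = time.perf_counter() - start

# Plain Python lists: triple nested loop (very slow; shrink n for a quick test)
a_list, b_list = a.tolist(), b.tolist()
start = time.perf_counter()
c_list = [[sum(a_list[i][k] * b_list[k][j] for k in range(n))
           for j in range(n)] for i in range(n)]
list_time = time.perf_counter() - start

print(f"NumPy: {numpy_time:.3f} s, list of lists: {list_time:.3f} s")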

Pandas
- Creating data frames, reading files, slicing manipulations, exporting data to files, column and row manipulations with loops.
- Use pandas to mask data and read the mask in Boolean format (a sketch follows below).
- For Help:
https://www.kaggle.com/learn/pandas
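A minimal sketch of Boolean masking is shown below; the file name and column name (data.csv, score) are hypothetical placeholders for whatever dataset you use.

import pandas as pd

df = pd.read_csv("data.csv")                 # hypothetical input file
mask = df["score"] > 50                      # Boolean Series: one True/False per row
print(mask.head())                           # the mask itself, in Boolean format
filtered = df[mask]                          # rows where the mask is True
filtered.to_csv("filtered.csv", index=False) # export the masked data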

Matplotlib
- Importing matplotlib, simple line chart, correlation chart, histogram, plotting multivariate data, pie chart (a sketch follows below).
- For Help:
https://matplotlib.org/stable/gallery/showcase/anatomy.html
https://towardsdatascience.com/data-visualization-using-matplotlib-16f1aae5ce70
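The chart types listed above can be produced with a few calls; the sketch below uses made-up data purely for illustration.

import numpy as np
import matplotlib.pyplot as plt

x = np.linspace(0, 10, 100)
y = np.sin(x) + np.random.normal(0, 0.1, 100)

fig, axes = plt.subplots(2, 2, figsize=(10, 8))
axes[0, 0].plot(x, y)                                   # simple line chart
axes[0, 0].set_title("Line chart")
axes[0, 1].scatter(x, y)                                # correlation (scatter) chart
axes[0, 1].set_title("Correlation chart")
axes[1, 0].hist(y, bins=20)                             # histogram
axes[1, 0].set_title("Histogram")
axes[1, 1].pie([30, 45, 25], labels=["A", "B", "C"], autopct="%1.0f%%")  # pie chart
axes[1, 1].set_title("Pie chart")
plt.tight_layout()
plt.show()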

Week 2: Exploratory Data Analysis


The Indian government is interested in understanding the factors influencing air quality across major cities. You
have been provided with a dataset containing historical air quality measurements from various monitoring
stations across India.
Dataset:
• Dataset Name: "India Air Quality Data"
• Source: Kaggle (https://www.kaggle.com/datasets/rohanrao/air-quality-data-in-india)

Key Features (Columns):


- stn_code: Station code (unique identifier)
- sampling_date: Date of measurement (YYYY-MM-DD)
- state: State name
- city: City name
- location: Monitoring station location
- pollutant_id: Pollutant identifier (e.g., PM2.5, NO2, SO2)
- pollutant_min: Minimum concentration of the pollutant
- pollutant_max: Maximum concentration of the pollutant
- pollutant_avg: Average concentration of the pollutant
Perform an exploratory data analysis of the India Air Quality Data to answer the following questions:

Data Cleaning and Preprocessing:


- Are there any missing values in the dataset? If so, how will you handle them?
- Are there any outliers or inconsistencies in the data? How would you address them?
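A minimal sketch for these cleaning questions is shown below; the file name is a placeholder and the column names follow the feature list above, which may differ slightly from the downloaded CSV.

import pandas as pd

df = pd.read_csv("india_air_quality.csv")       # placeholder file name

print(df.isnull().sum())                        # missing values per column
df = df.dropna(subset=["pollutant_avg"])        # drop rows missing the key measurement
df["pollutant_min"] = df["pollutant_min"].fillna(df["pollutant_min"].median())

# Flag outliers in the average concentration with the IQR rule
q1, q3 = df["pollutant_avg"].quantile([0.25, 0.75])
iqr = q3 - q1
outliers = df[(df["pollutant_avg"] < q1 - 1.5 * iqr) |
              (df["pollutant_avg"] > q3 + 1.5 * iqr)]
print(f"Potential outliers: {len(outliers)} rows")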

Overall Air Quality Trends:


- What are the most common pollutants measured across India?
- How does the overall air quality vary across different states and cities?
- Are there any noticeable trends in air quality over time (e.g., seasonal variations)?

Pollutant-Specific Analysis:
- For each pollutant, which cities have the highest and lowest average concentrations?
- Can you identify any correlations between different pollutants?
- Are there any specific locations that consistently report high levels of pollution?
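One possible starting point for the pollutant-specific questions, assuming the cleaned DataFrame df from the previous step and the column names listed above:

# Cities with the highest and lowest average concentration per pollutant
city_avg = df.groupby(["pollutant_id", "city"])["pollutant_avg"].mean()
print(city_avg.groupby(level=0).idxmax())       # (pollutant, city) with the highest average
print(city_avg.groupby(level=0).idxmin())       # (pollutant, city) with the lowest average

# Correlations between pollutants: one column per pollutant, one row per location
wide = df.pivot_table(index="location", columns="pollutant_id",
                      values="pollutant_avg", aggfunc="mean")
print(wide.corr())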

Insights and Recommendations:


- Based on your analysis, what insights can you derive about air quality in India?
- What recommendations would you make to policymakers or relevant authorities for improving air quality?

Week 3: Simple Linear Regression


The goal of this assignment is to study and implement simple linear regression using three different approaches:
1. Ordinary Least Squares (OLS) method
2. SKLearn library
3. Gradient Descent

Select a real dataset from the UCI Machine Learning Repository with one dependent variable and one independent variable, compare the results of each approach, and answer the following questions.

1. Describe the full story of the dataset and discuss why regression is applicable to it.
2. Write code to show:
2.1. How many total observations are in the data?
2.2. The distribution of the dependent and independent variables.
2.3. The relationship between the dependent and independent variables (correlation analysis).
3. Write code to implement linear regression using the Ordinary Least Squares method on the selected dataset (a starting-point sketch for tasks 3-5 follows this list).
4. Use the sklearn API to create a linear regression on the selected dataset. Print the intercept and slope of the model.
5. Write code to implement linear regression using Gradient Descent from scratch on the selected dataset.
6. Quantify the goodness of your model in a table of prediction results using SSE, RMSE, and R2 score, and discuss the interpretation of the errors and the steps taken to reduce them.
7. Prepare a presentation of this work in a group of 5.
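A minimal sketch of the three approaches is given below; the x and y arrays are placeholder values standing in for the columns selected from your UCI dataset, and the learning rate and iteration count are illustrative choices.

import numpy as np
from sklearn.linear_model import LinearRegression

x = np.array([1.0, 2.0, 3.0, 4.0, 5.0])          # independent variable (placeholder values)
y = np.array([2.1, 4.3, 6.2, 8.1, 9.9])          # dependent variable (placeholder values)

# 1. Ordinary Least Squares in closed form: slope = cov(x, y) / var(x)
b1 = np.sum((x - x.mean()) * (y - y.mean())) / np.sum((x - x.mean()) ** 2)
b0 = y.mean() - b1 * x.mean()
print("OLS:", b0, b1)

# 2. sklearn
model = LinearRegression().fit(x.reshape(-1, 1), y)
print("sklearn:", model.intercept_, model.coef_[0])

# 3. Gradient descent from scratch, minimizing mean squared error
w, b, lr = 0.0, 0.0, 0.01
for _ in range(5000):
    y_hat = w * x + b
    w -= lr * (-2 / len(x)) * np.sum((y - y_hat) * x)
    b -= lr * (-2 / len(x)) * np.sum(y - y_hat)
print("Gradient descent:", b, w)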

References:
- Sklearn API: https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html
- Kaggle Notebook: https://www.kaggle.com/code/nargisbegum82/step-by-step-ml-linear-regression
- Complete Tutorial: https://realpython.com/linear-regression-in-python/
- API reference: https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LinearRegression.html
- Dataset Reference: https://archive.ics.uci.edu/datasets

Week 5: Binary Classification Using Logistic Regression


Objective - In this assignment, you will apply logistic regression to a dataset of your choice to perform binary classification.
- Learn to create synthetic datasets and also explore real datasets for classification.
- Implement logistic regression both from scratch and using sklearn.
- Evaluate the model's performance using various classification evaluation methods.

Task 1: Dataset preparation


- Create a synthetic dataset using the make_classification API of sklearn and prepare train and test datasets. Give details on how each parameter of the sklearn API is used in creating the dataset (a starting-point sketch follows below).
- Create a synthetic dataset using a hypothesis of your choice and give details of the hypothesis, including references that support it.
- Select a dataset of your choice for classification from the UCI Machine Learning Repository. Give details about the dataset and prepare train and test sets for classification.
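A minimal sketch for the synthetic part of this task is shown below; the parameter values are illustrative choices, and each parameter should be explained in your own words in the report.

from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(
    n_samples=1000,      # number of rows
    n_features=5,        # total features
    n_informative=3,     # features that actually carry class signal
    n_redundant=1,       # linear combinations of informative features
    n_classes=2,         # binary problem
    random_state=42,     # reproducibility
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42, stratify=y)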

Task 2: Model training


- Implement logistic regression from scratch by developing its mathematical foundation, and apply it to all training datasets developed in Task 1 (a sketch follows below).
- Implement logistic regression using the sklearn API and apply it to all training datasets developed in Task 1.
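A minimal sketch for this task, assuming X_train and y_train from Task 1; the learning rate and iteration count are illustrative choices.

import numpy as np
from sklearn.linear_model import LogisticRegression

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# From scratch: batch gradient descent on the log-loss
w = np.zeros(X_train.shape[1])
b = 0.0
lr = 0.1
for _ in range(2000):
    p = sigmoid(X_train @ w + b)                     # predicted probabilities
    w -= lr * (X_train.T @ (p - y_train)) / len(y_train)
    b -= lr * np.mean(p - y_train)

# Using sklearn
clf = LogisticRegression(max_iter=1000).fit(X_train, y_train)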

Task 3: Model evaluation


- Apply all models from Task 2 to the test datasets created in Task 1, and document accuracy, precision, recall, and F1-score in a table (a sketch follows below).
- Discuss hyperparameter tuning for classification and cases of overfitting and underfitting.
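A minimal sketch of the evaluation table, assuming the sklearn model clf and the scratch parameters w, b from Task 2 together with the test split from Task 1:

from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score

y_pred_sklearn = clf.predict(X_test)
y_pred_scratch = (sigmoid(X_test @ w + b) >= 0.5).astype(int)

for name, y_pred in [("sklearn", y_pred_sklearn), ("scratch", y_pred_scratch)]:
    print(name,
          accuracy_score(y_test, y_pred),
          precision_score(y_test, y_pred),
          recall_score(y_test, y_pred),
          f1_score(y_test, y_pred))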

References:
- https://medium.com/@anishsingh20/logistic-regression-in-python-423c8d32838b
- https://www.datacamp.com/community/tutorials/understanding-logistic-regression-python
- https://towardsdatascience.com/building-a-logistic-regression-in-python-step-by-step-becd4d56c9c8
- https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html

Week 7: KNN
Objective - To apply and understand both multi-class classification and regression using the K-Nearest
Neighbors (KNN) algorithm. This practical assignment will help you grasp the intricacies of KNN, its
implementation, and its evaluation in both classification and regression contexts.

Note: Ensure that your work is original and well-researched.

Task Instructions:

1. Dataset Selection:
○ Choose a dataset suitable for both multi-class classification and regression tasks. Explain why
you selected this dataset and provide a detailed background story behind the dataset. Example
datasets: Iris, Wine, California Housing, or any other relevant dataset.

2. Dataset Exploration:
○ Write code to display descriptive statistics of the dataset and distribution of dependent variables.
Determine which variables are most useful for classification and regression. Provide evidence
using correlation analysis.
○ Create X_train, y_train, X_test, y_test for both datasets.

3. KNN Implementation:
○ Implement the KNN algorithm for multi-class classification and regression using the sklearn library. Ensure your code is well-documented and modular.
○ Write code to find the best value of 'k' for both classification and regression (a starting-point sketch appears after this task list).

4. Model Evaluation:
○ For Classification: Quantify the goodness of your model using appropriate metrics (accuracy,
precision, recall, F1-score).
○ For Regression: Quantify the goodness of your model using appropriate metrics (mean squared
error, RMSE, R2 Score).
○ Discuss steps taken for improving model performance, such as feature selection, handling
missing values, or tuning hyperparameters.

5. Presentation: Prepare a presentation summarizing your work. This should include:


○ Dataset overview and key findings from the exploratory data analysis.
○ Evaluation metrics (Results) and model performance for both tasks.
○ Discussion on the applicability of KNN for regression and classification, and its drawbacks.

6. Group Activity:
○ Form groups of 5 students each. Collaborate on this task, ensuring that each group member
contributes to different sections of the task.
○ Present your findings and implementation in a 10-minute presentation to the class.
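As referenced in Task 3, a minimal starting-point sketch is shown below; it uses the Iris and California Housing datasets named as examples above, and the range of k and the metrics are illustrative choices.

from sklearn.datasets import load_iris, fetch_california_housing
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier, KNeighborsRegressor
from sklearn.metrics import accuracy_score, mean_squared_error

# Classification: search for the best k on Iris
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
best_k, best_acc = 1, 0.0
for k in range(1, 21):
    pred = KNeighborsClassifier(n_neighbors=k).fit(X_train, y_train).predict(X_test)
    acc = accuracy_score(y_test, pred)
    if acc > best_acc:
        best_k, best_acc = k, acc
print("Best k (classification):", best_k, "accuracy:", best_acc)

# Regression: KNN on California Housing, scored with MSE
Xr, yr = fetch_california_housing(return_X_y=True)
Xr_train, Xr_test, yr_train, yr_test = train_test_split(Xr, yr, random_state=42)
reg = KNeighborsRegressor(n_neighbors=5).fit(Xr_train, yr_train)
print("MSE (regression):", mean_squared_error(yr_test, reg.predict(Xr_test)))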

Week 9: SVM (Maximum Margin separation)


Objective: The goal of this assignment is to understand and apply hyperparameter tuning in Support Vector
Machines (SVM) to achieve maximum margin separation on a synthetic dataset and then apply SVM on a real
dataset from the UCI Machine Learning Repository.

1. Synthetic Dataset Analysis with SVM:
• Create a synthetic dataset using make_blobs from the sklearn.datasets module.
• Train an SVM classifier on this dataset and experiment with different hyperparameters to achieve optimal margin separation.
• Visualize the results to demonstrate how different hyperparameters affect the decision boundary and margin (a starting-point sketch appears at the end of this week's tasks).

2. Real Dataset Analysis with SVM:
• Choose a real dataset from the UCI Machine Learning Repository.
• Preprocess the dataset as necessary (handling missing values, encoding categorical variables, etc.).
• Train an SVM classifier on this real dataset and evaluate its performance.
• Apply hyperparameter tuning to improve the classifier's performance.

3. Evaluation Criteria:
• Correct implementation of SVM on both synthetic and real datasets.
• Effective hyperparameter tuning and its impact on model performance.
• Quality and clarity of visualizations and reporting.
• Understanding and application of preprocessing steps on real datasets.
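As noted in task 1, one possible starting point for the synthetic experiment is sketched below; the make_blobs parameters and the C values tried are illustrative choices.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.svm import SVC

X, y = make_blobs(n_samples=200, centers=2, cluster_std=1.2, random_state=42)

for C in [0.01, 1, 100]:                           # softer vs. harder margin
    clf = SVC(kernel="linear", C=C).fit(X, y)
    w, b = clf.coef_[0], clf.intercept_[0]
    xs = np.linspace(X[:, 0].min(), X[:, 0].max(), 50)
    plt.scatter(X[:, 0], X[:, 1], c=y, s=15)
    plt.plot(xs, -(w[0] * xs + b) / w[1])          # decision boundary: w.x + b = 0
    plt.scatter(*clf.support_vectors_.T, facecolors="none", edgecolors="k", s=80)
    plt.title(f"Linear SVM decision boundary and support vectors, C={C}")
    plt.show()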

Week 10: Tree-based Models


Assignment Task - Classification Using Decision Tree, Random Forest, and XGBoost

Objective - In this assignment, you will apply three popular machine learning algorithms—Decision Tree,
Random Forest, and XGBoost—to a classification problem. You will compare the performance of these models
using various evaluation metrics and analyze their results. Select an appropriate dataset from the UCI Machine Learning Repository to demonstrate tree-based classification models.

Note:
• Use appropriate libraries such as scikit-learn for Decision Tree and Random Forest, and XGBoost's library for the XGBoost model.
• Make sure to follow best practices for machine learning workflows, including cross-validation and hyperparameter tuning.

Assignment Tasks:

1. Data Preprocessing:
• Identify and handle missing values if present. Convert categorical features into numerical format using techniques such as one-hot encoding. Normalize or standardize features if necessary.
• Split the dataset into training and testing sets (e.g., 80% training, 20% testing).

2. Model Training and Evaluation:
• Train a Decision Tree, a Random Forest, and an XGBoost model on the training data.
• Tune hyperparameters if necessary.
• Evaluate each model on the test data using accuracy, precision, recall, and F1-score, as the domain requires (a starting-point sketch appears after these tasks).

3. Comparison and Analysis:
• Compare the performance of the three models based on the evaluation metrics.
• Discuss the strengths and weaknesses of each model in the context of this dataset.
• Analyze feature importances for each model.

4. Visualization:
• Plot the decision tree to visualize the model's decision boundaries.
• Create feature importance plots for Random Forest and XGBoost.
• Generate a confusion matrix for each model and discuss the results.
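As referenced in task 2, a minimal training-and-evaluation sketch is shown below; it assumes X_train, X_test, y_train, y_test from the preprocessing step and that the xgboost package is installed, and the hyperparameter values are illustrative.

from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from xgboost import XGBClassifier
from sklearn.metrics import classification_report

models = {
    "Decision Tree": DecisionTreeClassifier(max_depth=5, random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=200, random_state=42),
    "XGBoost": XGBClassifier(n_estimators=200, eval_metric="logloss"),
}
for name, model in models.items():
    model.fit(X_train, y_train)
    print(name)
    print(classification_report(y_test, model.predict(X_test)))  # precision, recall, F1, accuracy
    # model.feature_importances_ is available on all three fitted models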

Submission:
• Submit a Jupyter Notebook containing your code and all relevant plots and visualizations.
• Include a written report in PDF format along with the code.

Week 11: NLP
1. Write a program to perform tokenization by word and by sentence using spaCy.
2. Write a program to eliminate stop words using spaCy.
3. Write a program to perform part-of-speech tagging using spaCy.
4. Write a program to perform lemmatization using spaCy.
5. Write a program to perform Named Entity Recognition using spaCy.
6. Write a Python program to find Term Frequency-Inverse Document Frequency (TF-IDF).
(from sklearn.feature_extraction.text import TfidfVectorizer)
7. Write a Python program to find all unigrams, bigrams, and trigrams present in the given corpus.
(from sklearn.feature_extraction.text import CountVectorizer)
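A minimal sketch covering several of the programs above is shown below; it assumes the en_core_web_sm model has been installed (python -m spacy download en_core_web_sm), and the sample sentences are made up.

import spacy
from sklearn.feature_extraction.text import TfidfVectorizer, CountVectorizer

nlp = spacy.load("en_core_web_sm")
doc = nlp("Machine learning is fun. Google released a new NLP model in India.")

print([t.text for t in doc])                          # word tokens
print([s.text for s in doc.sents])                    # sentence tokens
print([t.text for t in doc if not t.is_stop])         # stop words removed
print([(t.text, t.pos_, t.lemma_) for t in doc])      # POS tags and lemmas
print([(e.text, e.label_) for e in doc.ents])         # named entities

corpus = ["the cat sat on the mat", "the dog sat on the log"]
vec = TfidfVectorizer().fit(corpus)
print(vec.get_feature_names_out())                    # vocabulary
print(vec.transform(corpus).toarray())                # TF-IDF matrix
ngrams = CountVectorizer(ngram_range=(1, 3)).fit(corpus)
print(ngrams.get_feature_names_out())                 # unigrams, bigrams, trigrams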

Week 12: CNN/ANN


Apply an ANN and a CNN to an image classification task of your choice (a starting-point sketch follows below).
Follow the code in the attachment.
Justify the goodness of your models with charts and numbers.
Compare the ANN and the CNN in terms of effectiveness.
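The attached code is the primary reference; as a complement, a minimal Keras sketch comparing a dense ANN with a small CNN is given below, using MNIST as an illustrative dataset and arbitrary layer sizes and epoch counts.

import tensorflow as tf
from tensorflow.keras import layers, models

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

ann = models.Sequential([
    layers.Flatten(input_shape=(28, 28)),              # fully connected baseline
    layers.Dense(128, activation="relu"),
    layers.Dense(10, activation="softmax"),
])
cnn = models.Sequential([
    layers.Reshape((28, 28, 1), input_shape=(28, 28)),
    layers.Conv2D(32, 3, activation="relu"),           # convolutional feature extractor
    layers.MaxPooling2D(),
    layers.Flatten(),
    layers.Dense(10, activation="softmax"),
])
for name, model in [("ANN", ann), ("CNN", cnn)]:
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy", metrics=["accuracy"])
    model.fit(x_train, y_train, epochs=3, verbose=0)
    print(name, model.evaluate(x_test, y_test, verbose=0))   # [loss, accuracy]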

Week 13: RNN


Prepare code based on a task explained in the following blogs:
1. https://colah.github.io/posts/2015-08-Understanding-LSTMs/
2. https://karpathy.github.io/2015/05/21/rnn-effectiveness/

Submit a PDF of your work explaining sequence modeling using an RNN (a minimal sketch follows below).
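A minimal sequence-modeling sketch, in the spirit of the blogs above, is given below: an LSTM that predicts the next value of a sine wave. The window length, layer size, and epoch count are illustrative choices.

import numpy as np
from tensorflow.keras import layers, models

series = np.sin(np.linspace(0, 50, 1000))
window = 20
X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]
X = X[..., np.newaxis]                                # (samples, timesteps, features)

model = models.Sequential([
    layers.LSTM(32, input_shape=(window, 1)),
    layers.Dense(1),
])
model.compile(optimizer="adam", loss="mse")
model.fit(X, y, epochs=5, verbose=0)
print("Next-step prediction:", model.predict(X[-1:], verbose=0)[0, 0])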

