Assignment 2

The document outlines the assignment for a Bachelor of Technology course in Information Technology, focusing on Artificial Intelligence, Machine Learning, and Deep Learning. Students are required to work with a unique dataset, complete various tasks including data import, visualization, preprocessing, model selection, training, and evaluation, and submit their findings in a Google Colab file converted to PDF. The assignment has a submission deadline of March 29, 2025, and provides a list of datasets and additional resources for assistance.


School of Technology Design and Computer Application

College of Technology
Bachelor of Technology
Information Technology

Semester: 6
Academic Year: 2024-2025

Course Name: Artificial Intelligence with concepts of Machine Learning & Deep Learning
Course Code: 1010103322

Assignment 2 [Unit: 5,6]

Instructions
Each student/group will be assigned a unique dataset. The following tasks must be completed and
documented in the report:

1. Import the Dataset

●​ Load the dataset using appropriate Python libraries (pandas, tensorflow, sklearn, etc.).
●​ Display the first few rows and understand the dataset’s structure.
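
As a rough illustration of this step, the sketch below loads a small CSV with pandas and inspects its structure. The inline CSV text, its column names, and its values are invented stand-ins for an assigned dataset; with a real file, the path would be passed to `pd.read_csv` instead.

```python
import io

import pandas as pd

# Hypothetical stand-in for a downloaded file such as "train.csv".
# Column names and values are invented for illustration only.
csv_text = """age,fare,embarked,survived
22,7.25,S,0
38,71.28,C,1
26,,S,1
35,53.10,S,1
,8.05,Q,0
"""

# pd.read_csv accepts a file path or a file-like object; swap the
# StringIO wrapper for the real path when using an assigned dataset.
df = pd.read_csv(io.StringIO(csv_text))

print(df.head())   # first few rows
print(df.shape)    # (number of rows, number of columns)
print(df.dtypes)   # data type of each column
df.info()          # non-null counts per column and memory usage
```

Empty fields in the CSV are read as NaN, so `df.info()` already gives a first hint about missing values.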

2. Data Visualization & Preprocessing

● Identify missing values and handle them appropriately.
● Perform exploratory data analysis (EDA) using matplotlib and seaborn.
● Check for class imbalances and outliers.
● Perform necessary feature scaling and encoding if required.
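
One possible way to carry out these checks is sketched below on an invented toy table (the columns `age`, `fare`, `embarked`, and `survived` are assumptions, not part of any assigned dataset). Median and most-frequent imputation, the 1.5 × IQR outlier rule, and the `ColumnTransformer` layout are illustrative choices rather than required ones; plotting with matplotlib or seaborn is left out to keep the sketch short.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Invented toy data; replace with the assigned dataset.
df = pd.DataFrame({
    "age": [22.0, 38.0, np.nan, 35.0, 54.0, 2.0],
    "fare": [7.25, 71.28, 7.92, np.nan, 512.33, 21.08],
    "embarked": ["S", "C", "S", np.nan, "Q", "S"],
    "survived": [0, 1, 1, 1, 0, 0],
})

# Missing values per column and class balance of the target.
missing_per_column = df.isna().sum()
class_counts = df["survived"].value_counts()
print(missing_per_column)
print(class_counts)

# Simple outlier check on one numeric column using the 1.5 * IQR rule.
q1, q3 = df["fare"].quantile([0.25, 0.75])
iqr = q3 - q1
outlier_mask = (df["fare"] < q1 - 1.5 * iqr) | (df["fare"] > q3 + 1.5 * iqr)
print(df.loc[outlier_mask, "fare"])

# Impute and scale numeric columns; impute and one-hot encode categorical ones.
numeric_cols = ["age", "fare"]
categorical_cols = ["embarked"]

numeric_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])
categorical_pipe = Pipeline([
    ("impute", SimpleImputer(strategy="most_frequent")),
    ("encode", OneHotEncoder(handle_unknown="ignore")),
])

# sparse_threshold=0 forces a dense NumPy array as output.
preprocess = ColumnTransformer(
    [("num", numeric_pipe, numeric_cols), ("cat", categorical_pipe, categorical_cols)],
    sparse_threshold=0,
)
X = preprocess.fit_transform(df.drop(columns="survived"))
print(X.shape)
```

In a full workflow the transformer would normally be fitted on the training split only and then applied to the test split, so that no test information leaks into the imputation and scaling statistics.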

3. Feature Extraction

● Identify important features using correlation, mutual information, or PCA.
● Drop irrelevant or redundant features.
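
The three techniques named above could be applied roughly as follows. The sketch uses scikit-learn's bundled breast cancer dataset purely as a stand-in, and the 0.95 correlation cut-off and the 95% explained-variance target are arbitrary assumptions that would need tuning for a real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import load_breast_cancer
from sklearn.decomposition import PCA
from sklearn.feature_selection import mutual_info_classif
from sklearn.preprocessing import StandardScaler

# Stand-in dataset bundled with scikit-learn (no download needed).
data = load_breast_cancer(as_frame=True)
X, y = data.data, data.target

# 1. Absolute Pearson correlation of each feature with the target.
corr_with_target = X.corrwith(y).abs().sort_values(ascending=False)
print(corr_with_target.head())

# 2. Mutual information between each feature and the class label.
mi = pd.Series(
    mutual_info_classif(X, y, random_state=0), index=X.columns
).sort_values(ascending=False)
print(mi.head())

# 3. PCA on standardized features, keeping enough components
#    to explain 95% of the variance (assumed target).
X_scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=0.95, svd_solver="full").fit(X_scaled)
print(pca.n_components_, pca.explained_variance_ratio_.sum())

# Redundant features: for each pair with |correlation| > 0.95
# (assumed threshold), drop the second feature of the pair.
corr_matrix = X.corr().abs()
upper = corr_matrix.where(np.triu(np.ones(corr_matrix.shape, dtype=bool), k=1))
to_drop = [col for col in upper.columns if (upper[col] > 0.95).any()]
X_reduced = X.drop(columns=to_drop)
print(to_drop)
```

Correlation and mutual information rank the original features, whereas PCA builds new combined features, so the choice between them depends on whether the report needs interpretable columns.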

4. Train-Test Data Split

●​ Split the dataset into training and testing sets (e.g., 80-20 or 70-30 split).
●​ Use train_test_split() from sklearn.model_selection.
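
A minimal sketch of an 80-20 split with `train_test_split()` is shown below, using the bundled Iris data as a placeholder. The `stratify=y` and `random_state=42` arguments are optional additions assumed here: the first keeps class proportions equal in both sets, the second makes the split reproducible.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

# Placeholder dataset: 150 samples, 4 features, 3 balanced classes.
X, y = load_iris(return_X_y=True)

# test_size=0.2 gives an 80-20 split; use 0.3 for a 70-30 split.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

print(X_train.shape, X_test.shape)
print(np.bincount(y_train), np.bincount(y_test))  # samples per class
```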

5. Model Selection

● Choose an appropriate machine learning or deep learning model.
● Justify your choice of model for the given dataset.
● Consider traditional ML models (SVM, Decision Trees, Random Forest, Logistic Regression) and deep learning models (CNN, LSTMs, Transformers) where applicable.
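
One way to support the justification is to compare a few candidate models with cross-validation on the training set, as sketched below for the four traditional models listed. The dataset (scikit-learn's bundled breast cancer data), the 5-fold setting, and the accuracy metric are assumptions for illustration; deep learning models are omitted because they would be built with tensorflow rather than this interface.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Stand-in binary classification dataset.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scale-sensitive models get a StandardScaler in front of them.
candidates = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "SVM": make_pipeline(StandardScaler(), SVC()),
    "Decision Tree": DecisionTreeClassifier(random_state=42),
    "Random Forest": RandomForestClassifier(n_estimators=100, random_state=42),
}

# 5-fold cross-validated accuracy, computed on the training set only.
cv_scores = {}
for name, model in candidates.items():
    scores = cross_val_score(model, X_train, y_train, cv=5, scoring="accuracy")
    cv_scores[name] = scores.mean()
    print(f"{name}: {scores.mean():.3f} +/- {scores.std():.3f}")

best_name = max(cv_scores, key=cv_scores.get)
print("Highest mean CV accuracy:", best_name)
```

The test set is left untouched here so that it can still give an unbiased estimate once a model has been chosen.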

6. Model Training

● Train the selected model on the training dataset.
● Use hyperparameter tuning (GridSearchCV, RandomizedSearchCV, etc.) to improve model performance.
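
Training with hyperparameter tuning might look like the sketch below, which wraps an SVM in a pipeline and searches a small grid with `GridSearchCV`. The dataset, the grid values, and the step names `scale` and `svc` are assumptions chosen for illustration.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Stand-in dataset and an 80-20 stratified split.
X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

# Scaling lives inside the pipeline so it is refitted on each CV fold.
pipe = Pipeline([("scale", StandardScaler()), ("svc", SVC())])

# Parameters of a pipeline step are addressed as "<step>__<parameter>".
param_grid = {
    "svc__C": [0.1, 1, 10],
    "svc__kernel": ["linear", "rbf"],
}

search = GridSearchCV(pipe, param_grid, cv=5, scoring="accuracy")
search.fit(X_train, y_train)  # trains every combination, then refits the best

print("Best parameters:", search.best_params_)
print("Best CV accuracy:", round(search.best_score_, 3))

# The refitted best model is used for the held-out test set.
test_accuracy = search.score(X_test, y_test)
print("Test accuracy:", round(test_accuracy, 3))
```

`RandomizedSearchCV` follows the same fit/score interface but samples a fixed number of combinations (`n_iter`) from `param_distributions`, which is usually cheaper when the grid is large.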

7. Model Evaluation

● Evaluate model performance using appropriate metrics:
  ○ Classification: Accuracy, Precision, Recall, F1-score, AUC-ROC
  ○ Regression: RMSE, MAE, R²-score
  ○ Time Series: MSE, Mean Absolute Percentage Error (MAPE)
● Visualize results using confusion matrix, ROC curves, or loss/accuracy plots.
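
The listed metrics are all available in `sklearn.metrics`. The sketch below computes them on small hand-made arrays (the labels, scores, and predictions are invented so the results can be checked by hand); plotting the confusion matrix or ROC curve is left out.

```python
import numpy as np
from sklearn.metrics import (
    accuracy_score,
    confusion_matrix,
    f1_score,
    mean_absolute_error,
    mean_absolute_percentage_error,
    mean_squared_error,
    precision_score,
    r2_score,
    recall_score,
    roc_auc_score,
)

# --- Classification (invented labels, predictions, and scores) ---
y_true = np.array([0, 0, 0, 0, 1, 1, 1, 1])
y_pred = np.array([0, 0, 0, 1, 1, 1, 1, 0])
y_score = np.array([0.1, 0.2, 0.3, 0.6, 0.9, 0.8, 0.7, 0.4])  # predicted P(class 1)

accuracy = accuracy_score(y_true, y_pred)
precision = precision_score(y_true, y_pred)
recall = recall_score(y_true, y_pred)
f1 = f1_score(y_true, y_pred)
auc = roc_auc_score(y_true, y_score)  # needs scores, not hard labels
cm = confusion_matrix(y_true, y_pred)  # rows: true class, columns: predicted

print(accuracy, precision, recall, f1, auc)
print(cm)

# --- Regression / time series (invented targets and predictions) ---
r_true = np.array([2.0, 4.0, 5.0, 10.0])
r_pred = np.array([3.0, 4.0, 4.0, 8.0])

mse = mean_squared_error(r_true, r_pred)
rmse = np.sqrt(mse)  # RMSE is the square root of MSE
mae = mean_absolute_error(r_true, r_pred)
r2 = r2_score(r_true, r_pred)
mape = mean_absolute_percentage_error(r_true, r_pred)  # a fraction, not a percent

print(mse, rmse, mae, r2, mape)
```

Note that `mean_absolute_percentage_error` returns a fraction, so it has to be multiplied by 100 to report a percentage.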

8. Conclusion

● Interpret model performance.
● Suggest improvements and future enhancements.
● Compare different models (if applicable) and justify the best choice.

Datasets & Assignments

Each student/group will work on one of the following datasets:

1. Titanic Survival Prediction (Classification) - Kaggle Link
2. House Price Prediction (Regression) - Kaggle Link
3. IMDB Movie Reviews Sentiment Analysis (NLP) - tensorflow.keras.datasets.imdb
4. CIFAR-10 Image Classification (Computer Vision) - tensorflow.keras.datasets.cifar10
5. UCI Heart Disease Prediction (Medical Classification) - Kaggle Link
6. Retail Sales Forecasting (Walmart Sales Data) (Time Series) - Kaggle Link
7. Fake News Detection (NLP) - Kaggle Link
8. Credit Card Fraud Detection (Anomaly Detection) - Kaggle Link
9. Human Activity Recognition (HAR) with Smartphones (Classification) - Kaggle Link
10. Plant Seedlings Classification (Image Classification) - Kaggle Link

Submission Guidelines

● The assignment must be prepared as a Google Colab file. Convert the file to PDF, print it, and submit the printout after the mid-semester exam.
● A PDF summarizing the approach, results, and analysis must be included.
● Deadline for submission: 29/03/2025 (Saturday).

Additional Resources

● Python Libraries: pandas, numpy, sklearn, tensorflow, matplotlib, seaborn
● Kaggle Datasets: https://www.kaggle.com/datasets
● Google Colab for running models online: https://colab.research.google.com

Need Help?

If you have any questions, feel free to reach out via email or during teaching hours at EA-601 (Ms. Purvi Patel).

Good luck and happy coding!
