Data Mining & Machine Learning Course Outline

The document outlines the course CS404 Data Mining and Machine Learning for Year 4 Computer Science students, detailing its objectives, study units, assignments, and recommended tools. The course aims to provide students with practical skills in data mining and machine learning techniques, focusing on data preprocessing, algorithm application, and model evaluation. It includes various group project topics and guidelines for assignment submissions, emphasizing the importance of ethical considerations and real-world applications.

Uploaded by

Tatekulor Aloysius Caleb Kamara

Computer Science Department

Yoni Campus
Year 4 Semester 1

Course Outline
Data Mining and Machine Learning
Credit Hours: 3
Course Code: CS404

Lecturers
Engr. Abdulai Brato Kamara
Engr. Ibrahim Kalokoh (HOD)

Academic Year 2024 - 2025

Contents
1 Course Description
2 Objectives
3 Study Units
4 Assignment
5 Recommended Tools and Frameworks
6 References/Further Readings

1 Course Description
CS404 Data Mining and Machine Learning is an advanced course designed for Year 4 Computer
Science students, providing comprehensive knowledge and practical skills in data mining and
machine learning techniques. This course aims to equip students with the ability to extract
valuable insights from large datasets and develop predictive models using various machine
learning algorithms. Emphasis is placed on understanding the theoretical foundations, applying
algorithms to real-world datasets, and evaluating the performance of machine learning models.

2 Objectives
By the end of the course, students will be able to perform data preprocessing, select and apply
appropriate machine learning algorithms, evaluate model performance, and implement data
mining solutions to solve complex problems in various fields.

3 Study Units
Week 1-2: Introduction to Data Mining and Machine Learning
- Overview of Data Mining and Machine Learning
- Historical perspective and applications
- Basic concepts and terminology
- The data mining process and the machine learning pipeline
- Comparison of supervised and unsupervised learning

Week 3-4: Data Preprocessing
- Data preprocessing tools and techniques
- Data cleaning and integration
- Data transformation and reduction
- Handling missing values and outliers
- Feature engineering

Week 5-7: Supervised Learning Algorithms
- Linear regression and its applications
- Logistic regression for classification problems
- Decision trees and ensemble methods
- Support Vector Machines (SVM)
- Evaluation metrics for classification models
- k-Nearest Neighbors

Week 7-9: Unsupervised Learning Algorithms
- Clustering algorithms (k-means, hierarchical clustering)
- Association rule mining
- Principal Component Analysis (PCA)
- t-Distributed Stochastic Neighbor Embedding (t-SNE)

Week 9-11: Advanced Topics in Machine Learning
- Ensemble methods (bagging, boosting)
- Neural networks and deep learning
- Reinforcement learning basics and applications
- Feature selection and dimensionality reduction techniques
- Evaluation metrics for machine learning models

Week 12-13: Data Mining Applications
- Real-world applications of data mining and machine learning
- Group projects: Implementing machine learning algorithms on datasets
- Ethical considerations in data mining and machine learning

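As an illustration of the Week 3-4 material on handling missing values and data transformation, the following is a minimal sketch using only the Python standard library. The function names `impute_mean` and `min_max_scale` and the sample `ages` column are hypothetical and chosen for this example; in practice students would more likely use the equivalent tools in Scikit-Learn or pandas.

```python
def impute_mean(values):
    """Replace missing entries (None) with the mean of the observed entries."""
    present = [v for v in values if v is not None]
    mean = sum(present) / len(present)
    return [mean if v is None else v for v in values]


def min_max_scale(values):
    """Rescale values linearly so the minimum maps to 0 and the maximum to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # A constant column carries no spread; map every value to 0.0.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical column with one missing value.
ages = [20, None, 40, 30]
cleaned = impute_mean(ages)      # the gap is filled with the mean of 20, 40, 30
scaled = min_max_scale(cleaned)  # values now lie in the range [0, 1]
print(cleaned, scaled)
```

Mean imputation is only one possible strategy; whether it is appropriate depends on why the values are missing, which is worth stating as an assumption in any report.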
4 Assignment
Students must divide themselves into groups of 5-6 members. Each group must submit one
project assignment and deliver a presentation on its chosen topic by the deadlines given below.
No two groups may choose the same topic.
Due Date: 16th December 2024 at 11:59PM

1. Apply clustering or association rule mining to spatial data (e.g., geographic information).
Discuss practical applications and challenges of mining spatial data.

2. Use data summarization techniques (e.g., pivot tables, aggregation) on a large dataset.
Explain how summarization helps reveal key trends or patterns.

3. Use visualization techniques to explore a complex dataset (e.g., network or social media
data) and identify patterns. Discuss how visualization assists in the interpretation of
data.

4. Explain the differences between supervised and unsupervised learning. Provide examples
of datasets where each type would be appropriate, discussing why each approach suits
specific problems.

5. Describe common evaluation metrics (accuracy, precision, recall, F1-score, ROC-AUC).
Compare models using these metrics on a dataset of your choice and explain the impact
of each metric on model selection.

6. Implement a linear regression model on a dataset (e.g., housing prices, sales data).
Analyze the model's performance and interpret the significance of each feature in predicting
the target variable.

7. Compare the performance of different classification algorithms (e.g., SVM, Decision Trees,
and KNN) on a dataset (e.g., Iris, MNIST). Evaluate the models using appropriate metrics
and discuss the results.

8. Describe various feature selection techniques (e.g., correlation, chi-square, L1 regularization).
Apply feature selection to a dataset and discuss how it impacts model accuracy and
complexity.

9. Implement and compare the performance of ensemble methods (e.g., Bagging, Boosting)
on a dataset. Explain why ensemble methods can improve model performance.

10. Explain cross-validation and its importance in machine learning. Perform cross-validation
on a model and analyze how it impacts the model's performance and reliability.

11. Describe grid search and random search techniques for hyperparameter tuning. Apply
one of these techniques to improve the performance of a classification model and analyze
the results.
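
As a possible starting point for Topic 5, the sketch below computes accuracy, precision, recall, and F1-score for a binary classifier directly from the confusion-matrix counts, using only the Python standard library. The function name `binary_metrics` and the label lists are hypothetical and used only for illustration; Scikit-Learn provides equivalent functions in its `sklearn.metrics` module, and ROC-AUC is left out here because it needs predicted scores rather than hard labels.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Compute accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or absent.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical labels: 3 positives and 5 negatives.
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
```

On imbalanced data, accuracy can look high even when the positive class is predicted poorly, which is one reason the topic asks for several metrics to be compared side by side.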

Assignment Submission Guidelines
- Submit a Jupyter Notebook or a well-documented code file along with a report.
- Include visualizations and plots to support your analysis.
- Clearly state any assumptions made during the analysis.
Submit to: [email protected] or [email protected]

Grading Criteria
- Correct implementation of data preprocessing and feature extraction.
- Implementation and correct usage of classification algorithms.
- Thorough evaluation of model performance using appropriate metrics.
- Effectiveness of hyperparameter tuning.
- Quality of analysis and conclusions.

Presentation Date: 13th January 2025

Test Date: 20th January 2025

Final Project Topics

Due Date: To be set according to the examination timetable

Group One
- Use machine learning algorithms (e.g., logistic regression, Naive Bayes) to classify sentiment
in text data (e.g., Twitter or product reviews). Discuss preprocessing steps like tokenization
and sentiment labeling.
- Explain the steps involved in data preprocessing. Apply these steps to a real-world dataset
and describe how they improve the quality of data for mining.

Group Two
- Implement a multi-label classification model for a dataset where each instance can belong to
multiple categories (e.g., music genres, movie tags). Evaluate using appropriate metrics (e.g.,
Hamming loss).
- Use the Apriori or FP-Growth algorithm on a retail dataset. Identify frequent
itemsets and generate association rules, interpreting the significance of the results.
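
To make the frequent-itemset task above more concrete, here is a minimal level-wise sketch of the Apriori idea in plain Python. It is a teaching sketch under stated assumptions, not an optimized implementation: the function names `support`, `apriori`, and `confidence` and the four sample baskets are hypothetical, and a real retail dataset would call for a dedicated library.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in itemset."""
    itemset = frozenset(itemset)
    return sum(1 for t in transactions if itemset <= set(t)) / len(transactions)


def apriori(transactions, min_support):
    """Return {itemset: support} for all itemsets meeting min_support."""
    items = sorted({item for t in transactions for item in t})
    candidates = [frozenset([item]) for item in items]
    frequent = {}
    k = 1
    while candidates:
        # Keep only the candidates of size k that are frequent.
        level = {}
        for c in candidates:
            s = support(c, transactions)
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        k += 1
        # Join step: unions of frequent itemsets that have exactly k items.
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = [c for c in joined
                      if all(frozenset(sub) in level
                             for sub in combinations(c, k - 1))]
    return frequent


def confidence(antecedent, consequent, transactions):
    """Confidence of the rule antecedent -> consequent.

    Assumes the antecedent occurs in at least one transaction.
    """
    both = frozenset(antecedent) | frozenset(consequent)
    return support(both, transactions) / support(antecedent, transactions)


# Hypothetical shopping baskets.
baskets = [{"bread", "milk"}, {"bread", "butter"},
           {"bread", "milk", "butter"}, {"milk"}]
frequent = apriori(baskets, min_support=0.5)
print(frequent)
print(confidence({"butter"}, {"bread"}, baskets))
```

The prune step is what distinguishes Apriori from brute-force counting: a candidate is discarded without scanning the data if any of its subsets is already known to be infrequent.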

Group Three
- Apply clustering to a dataset for customer segmentation. Discuss how clustering results could
be used to tailor marketing strategies.
- Implement the K-Means or DBSCAN algorithm on a dataset of your choice (e.g., customer
segmentation data). Visualize and analyze the clusters, discussing the relevance of different
clustering methods.

Group Four

- Implement an image segmentation algorithm (e.g., U-Net) to segment regions of interest in
an image dataset (e.g., medical images). Evaluate the segmentation accuracy.
- Apply PCA or t-SNE to a high-dimensional dataset, and explain how dimensionality reduction
helps in the context of data mining. Analyze the principal components and interpret their
significance.

Group Five
- Create a voting ensemble classifier that combines predictions from multiple models (e.g.,
Decision Trees, KNN, SVM). Evaluate the ensemble’s performance compared to individual
models.
- Identify and remove outliers from a dataset. Discuss different methods (e.g., Z-score, IQR)
and analyze the impact of removing outliers on model accuracy.
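
For the outlier task above, the IQR method can be sketched in a few lines with the standard library. The function names `iqr_bounds` and `split_outliers`, the multiplier `k=1.5` (a common convention, not a requirement of this course), and the sample `readings` are assumptions made for this example. Note that different tools compute quartiles slightly differently, so the exact bounds may vary between libraries.

```python
from statistics import quantiles


def iqr_bounds(values, k=1.5):
    """Return (low, high) limits: Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = quantiles(values, n=4)  # three cut points: Q1, median, Q3
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def split_outliers(values, k=1.5):
    """Separate values into those inside the IQR limits and those outside."""
    low, high = iqr_bounds(values, k)
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers


# Hypothetical sensor readings with one suspicious value.
readings = [10, 12, 11, 13, 12, 11, 10, 95]
kept, outliers = split_outliers(readings)
print(kept, outliers)
```

Removing a point is a modelling decision rather than a purely mechanical step, so the report should explain why flagged values were treated as errors and not as genuine extreme observations.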

Group Six
- Implement a gradient boosting model (e.g., XGBoost, CatBoost) on a dataset and evaluate
its accuracy. Discuss how gradient boosting improves model performance.
- Describe anomaly detection techniques and apply one to a dataset (e.g., fraud detection
in transactions). Explain the challenges and significance of anomaly detection in real-world
applications.

Group Seven
- Use an Isolation Forest model to detect anomalies in a dataset (e.g., network traffic, sensor
data). Discuss the use cases of anomaly detection in real-time systems.
- Use Natural Language Processing (NLP) to extract key topics from a text dataset (e.g.,
customer reviews, social media posts). Apply topic modeling techniques like LDA and interpret
the results.

Group Eight
- Perform hyperparameter tuning using both grid search and random search on a complex
model. Compare the results and discuss the trade-offs in performance and computation time.
- Merge two or more datasets with related information (e.g., customer transactions and demographics).
Discuss challenges and approaches for dealing with inconsistent or missing data during
integration.

Group Nine
- Create a machine learning pipeline that includes steps like preprocessing, feature engineering,
and model training. Test the pipeline on a dataset and evaluate its accuracy.
- Apply a sequential pattern mining algorithm on time-series data (e.g., clickstream or event
logs) to identify common sequences of events. Interpret the patterns and their business relevance.

Group Ten
- Generate adversarial examples for a trained image classifier (e.g., using FGSM). Discuss the
impact of adversarial examples on model robustness and implications for security.
- Use image processing techniques (e.g., scaling, color normalization) on an image dataset and
explain how these preprocessing steps improve data quality for mining.

5 Recommended Tools and Frameworks

- Python programming language
- Scikit-Learn, TensorFlow, and PyTorch for machine learning implementations
- Jupyter Notebooks for interactive coding and analysis

6 References/Further Readings
1. Stemkoski, L., & Pascale, M. (2021, May 27). Developing Graphics Frameworks with
Python and OpenGL. OAPEN Home.

2. Python Machine Learning for Beginners: Learning from Scratch NumPy, Pandas, Matplotlib,
Seaborn, Scikit-learn, and TensorFlow for Machine Learning and Data Science. (2020). AI
Publishing.

3. Géron, A. (2022). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow:
Concepts, Tools, and Techniques to Build Intelligent Systems. O’Reilly.

4. Witten, I. H., Frank, E., & Hall, M. A. (2011). Data Mining: Practical Machine Learning
Tools and Techniques. Amsterdam: Morgan Kaufmann. ISBN: 978-0-12-374856-0

5. Han, J. (2022). Data Mining: Concepts and Techniques. Morgan Kaufmann.
