Data Mining & Machine Learning Course Outline

The document outlines the course CS404 Data Mining and Machine Learning for Year 4 Computer Science students, detailing its objectives, study units, assignments, and recommended tools. The course aims to provide students with practical skills in data mining and machine learning techniques, focusing on data preprocessing, algorithm application, and model evaluation. It includes various group project topics and guidelines for assignment submissions, emphasizing the importance of ethical considerations and real-world applications.

Computer Science Department

Yoni Campus
Year 4 Semester 1

Course Outline
Data Mining and Machine Learning
Credit Hours: 3
Course Code: CS404

Lecturers
Engr. Abdulai Brato Kamara
Engr. Ibrahim Kalokoh (HOD)

Academic Year 2024 - 2025


Contents
1 Course Description
2 Objectives
3 Study Units
4 Assignment
5 Recommended Tools and Frameworks
6 References/Further Readings

1 Course Description
CS404 Data Mining and Machine Learning is an advanced course designed for Year 4 Computer
Science students, providing comprehensive knowledge and practical skills in data mining and
machine learning techniques. This course aims to equip students with the ability to extract
valuable insights from large datasets and develop predictive models using various machine
learning algorithms. Emphasis is placed on understanding the theoretical foundations, applying
algorithms to real-world datasets, and evaluating the performance of machine learning models.

2 Objectives
By the end of the course, students will be able to perform data preprocessing, select and apply
appropriate machine learning algorithms, evaluate model performance, and implement data
mining solutions to solve complex problems in various fields.

3 Study Units
Week 1-2: Introduction to Data Mining and Machine Learning
- Overview of Data Mining and Machine Learning
- Historical perspective and applications
- Basic concepts and terminology
- The data mining process and the machine learning pipeline
- Comparison of supervised and unsupervised learning

Week 3-4: Data Preprocessing
- Data preprocessing tools and techniques
- Data cleaning and integration
- Data transformation and reduction
- Handling missing values and outliers
- Feature engineering

Week 5-7: Supervised Learning Algorithms
- Linear regression and its applications
- Logistic regression for classification problems
- Decision trees and ensemble methods
- Support Vector Machines (SVM)
- Evaluation metrics for classification models
- k-Nearest Neighbors

Week 7-9: Unsupervised Learning Algorithms
- Clustering algorithms (k-means, hierarchical clustering)
- Association rule mining
- Principal Component Analysis (PCA)
- t-Distributed Stochastic Neighbor Embedding (t-SNE)

Week 9-11: Advanced Topics in Machine Learning
- Ensemble methods (bagging, boosting)
- Neural networks and deep learning
- Reinforcement learning basics and applications
- Feature selection and dimensionality reduction techniques
- Evaluation metrics for machine learning models

Week 12-13: Data Mining Applications
- Real-world applications of data mining and machine learning
- Group projects: Implementing machine learning algorithms on datasets
- Ethical considerations in data mining and machine learning

4 Assignment
Students must divide themselves into groups of 5-6 members. Each group must submit one
project assignment and deliver a presentation on its chosen topic, and no two groups may
choose the same topic. The deadlines for the submission and the presentation have been set.
Due Date: 16th December 2024 at 11:59PM

1. Apply clustering or association rule mining to spatial data (e.g., geographic information).
Discuss practical applications and challenges of mining spatial data.

2. Use data summarization techniques (e.g., pivot tables, aggregation) on a large dataset.
Explain how summarization helps reveal key trends or patterns.

3. Use visualization techniques to explore a complex dataset (e.g., network or social media
data) and identify patterns. Discuss how visualization assists in the interpretation of
data.

4. Explain the differences between supervised and unsupervised learning. Provide examples
of datasets where each type would be appropriate, discussing why each approach suits
specific problems.

5. Describe common evaluation metrics (accuracy, precision, recall, F1-score, ROC-AUC).
Compare models using these metrics on a dataset of your choice and explain the impact
of each metric on model selection.

6. Implement a linear regression model on a dataset (e.g., housing prices, sales data).
Analyze the model's performance and interpret the significance of each feature in predicting
the target variable.

7. Compare the performance of different classification algorithms (e.g., SVM, Decision Trees,
and KNN) on a dataset (e.g., Iris, MNIST). Evaluate the models using appropriate metrics
and discuss the results.

8. Describe various feature selection techniques (e.g., correlation, chi-square, L1 regularization).
Apply feature selection to a dataset and discuss how it impacts model accuracy and
complexity.

9. Implement and compare the performance of ensemble methods (e.g., Bagging, Boosting)
on a dataset. Explain why ensemble methods can improve model performance.

10. Explain cross-validation and its importance in machine learning. Perform cross-validation
on a model and analyze what it reveals about the model's performance and reliability.

11. Describe grid search and random search techniques for hyperparameter tuning. Apply
one of these techniques to improve the performance of a classification model and analyze
the results.
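
As a reference point for topic 5, the sketch below computes accuracy, precision, recall, and F1-score from scratch for a binary classification problem. It is only an illustration: the function name `binary_metrics` and the label lists are invented for this example, and it uses plain Python rather than any particular library. In practice, the `sklearn.metrics` module of Scikit-Learn provides equivalent functions.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Return accuracy, precision, recall and F1 for binary labels."""
    pairs = list(zip(y_true, y_pred))
    # Count the four cells of the confusion matrix.
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    tn = len(pairs) - tp - fp - fn

    accuracy = (tp + tn) / len(pairs)
    # Guard against division by zero when a class is never predicted or present.
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical true and predicted labels, invented for this example.
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
metrics = binary_metrics(y_true, y_pred)
print(metrics)
```

Precision penalizes false positives and recall penalizes false negatives, so which metric should drive model selection depends on which kind of error is more costly in the chosen application.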

Assignment Submission Guidelines
- Submit a Jupyter Notebook or a well-documented code file along with a report.
- Include visualizations and plots to support your analysis.
- Clearly state any assumptions made during the analysis.
Submit to: [email protected] or [email protected]

Grading Criteria
- Correct implementation of data preprocessing and feature extraction.
- Implementation and correct usage of classification algorithms.
- Thorough evaluation of model performance using appropriate metrics.
- Effectiveness of hyperparameter tuning.
- Quality of analysis and conclusions.

Presentation Date: 13th January 2025

Test Date: 20th January 2025

Final Project Topics

Due Date: Depends on the examination timetable

Group One
- Use machine learning algorithms (e.g., logistic regression, Naive Bayes) to classify sentiment
in text data (e.g., Twitter or product reviews). Discuss preprocessing steps like tokenization
and sentiment labeling.
- Explain the steps involved in data preprocessing. Apply these steps to a real-world dataset
and describe how they improve the quality of data for mining.

Group Two
- Implement a multi-label classification model for a dataset where each instance can belong to
multiple categories (e.g., music genres, movie tags). Evaluate using appropriate metrics (e.g.,
Hamming loss).
- Use the Apriori or FP-Growth algorithm on a retail dataset. Identify frequent
itemsets and generate association rules, interpreting the significance of the results.
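
To make the frequent-itemset task more concrete, here is a minimal Apriori-style sketch in plain Python. It is an illustration under stated assumptions, not a reference implementation: the function names `frequent_itemsets` and `confidence`, the baskets, and the support threshold are all invented for this example, and no optimization is attempted.

```python
from itertools import combinations


def frequent_itemsets(transactions, min_support):
    """Level-wise (Apriori-style) search for itemsets whose support
    (fraction of transactions containing them) is >= min_support."""
    baskets = [frozenset(t) for t in transactions]
    n = len(baskets)

    def support(itemset):
        return sum(1 for b in baskets if itemset <= b) / n

    # Level 1: frequent single items.
    items = sorted({item for b in baskets for item in b})
    current = [frozenset([i]) for i in items]
    current = [s for s in current if support(s) >= min_support]

    result = {}
    k = 1
    while current:
        for s in current:
            result[s] = support(s)
        k += 1
        # Candidates: unions of two frequent (k-1)-itemsets that have size k.
        candidates = {a | b for a, b in combinations(current, 2)
                      if len(a | b) == k}
        # Apriori pruning: every (k-1)-subset must itself be frequent.
        previous = set(current)
        candidates = [c for c in candidates
                      if all(frozenset(sub) in previous
                             for sub in combinations(c, k - 1))]
        current = [c for c in candidates if support(c) >= min_support]
    return result


def confidence(itemsets, antecedent, consequent):
    """Confidence of the rule antecedent -> consequent."""
    a = frozenset(antecedent)
    return itemsets[a | frozenset(consequent)] / itemsets[a]


# Hypothetical retail baskets, invented for this example.
retail_baskets = [
    {"bread", "milk"},
    {"bread", "butter", "milk"},
    {"bread", "butter"},
    {"milk", "eggs"},
    {"bread", "milk", "butter"},
]
itemsets = frequent_itemsets(retail_baskets, min_support=0.4)
rule_conf = confidence(itemsets, {"butter"}, {"bread"})
print(len(itemsets), rule_conf)
```

In this toy data every basket that contains butter also contains bread, which is why the rule butter -> bread reaches the maximum confidence. On a real retail dataset, support and confidence thresholds need tuning to keep the number of rules manageable.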

Group Three
- Apply clustering to a dataset for customer segmentation. Discuss how clustering results could
be used to tailor marketing strategies.
- Implement the K-Means or DBSCAN algorithm on a dataset of your choice (e.g., customer
segmentation data). Visualize and analyze the clusters, discussing the relevance of different
clustering methods.
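
The K-Means option in this task can be sketched in a few lines of plain Python. The version below is a simplified illustration of Lloyd's algorithm, assuming fixed starting centroids so that the result is reproducible. The function name `kmeans`, the customer data, and the choice of features are invented for this example; Scikit-Learn's `KMeans` would normally be used on a real dataset.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Lloyd's algorithm: alternate assignment and centroid update."""
    centroids = [tuple(c) for c in centroids]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid.
        labels = [min(range(len(centroids)),
                      key=lambda j: math.dist(p, centroids[j]))
                  for p in points]
        # Update step: move each centroid to the mean of its members.
        new_centroids = []
        for j, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(axis) / len(members)
                                           for axis in zip(*members)))
            else:
                new_centroids.append(old)  # keep the centroid of an empty cluster
        if new_centroids == centroids:
            break  # converged: centroids no longer move
        centroids = new_centroids
    return labels, centroids


# Hypothetical customers: (annual spend in thousands, visits per month).
customers = [(1.0, 2.0), (1.5, 1.8), (1.2, 2.2),
             (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
# Assumption: start from the first and fourth customers as initial centroids.
labels, centroids = kmeans(customers, [customers[0], customers[3]])
print(labels, centroids)
```

K-Means requires the number of clusters in advance and favors roughly spherical groups, whereas DBSCAN infers the number of clusters from density and can mark points as noise. That difference is a natural starting point for the comparison asked for above.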

Group Four
- Implement an image segmentation algorithm (e.g., U-Net) to segment regions of interest in
an image dataset (e.g., medical images). Evaluate the segmentation accuracy.
- Apply PCA or t-SNE to a high-dimensional dataset, and explain how dimensionality reduction
helps in the context of data mining. Analyze the principal components and interpret their
significance.

Group Five
- Create a voting ensemble classifier that combines predictions from multiple models (e.g.,
Decision Trees, KNN, SVM). Evaluate the ensemble’s performance compared to individual
models.
- Identify and remove outliers from a dataset. Discuss different methods (e.g., Z-score, IQR)
and analyze the impact of removing outliers on model accuracy.
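
The two outlier rules named in this task can be sketched with Python's standard `statistics` module. This is a minimal illustration under assumptions: the function names `iqr_outliers` and `zscore_outliers`, the sample values, and the thresholds (1.5 for IQR, 3.0 for Z-score) are choices made for this example, not values prescribed by the course.

```python
import statistics


def iqr_outliers(values, k=1.5):
    """Values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Values whose absolute Z-score exceeds the threshold."""
    mean = statistics.mean(values)
    std = statistics.pstdev(values)  # population standard deviation
    if std == 0:
        return []  # all values identical, nothing can be an outlier
    return [v for v in values if abs(v - mean) / std > threshold]


# Hypothetical measurements with one obviously extreme value.
values = [10, 12, 11, 13, 12, 11, 10, 12, 13, 11, 95]

by_iqr = iqr_outliers(values)
by_z = zscore_outliers(values)
cleaned = [v for v in values if v not in by_iqr]

print(by_iqr, by_z)
print(statistics.mean(values), statistics.mean(cleaned))
```

One caveat worth discussing: because the extreme value inflates the mean and standard deviation it is measured against, the Z-score rule is weak on small samples. With the population standard deviation, the largest possible Z-score among n values is sqrt(n - 1), so a threshold of 3 cannot flag anything when n is 10 or fewer. The IQR rule is less affected because quartiles are robust to extreme values.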

Group Six
- Implement a gradient boosting model (e.g., XGBoost, CatBoost) on a dataset and evaluate
its accuracy. Discuss how gradient boosting improves model performance.
- Describe anomaly detection techniques and apply one to a dataset (e.g., fraud detection
in transactions). Explain the challenges and significance of anomaly detection in real-world
applications.

Group Seven
- Use an Isolation Forest model to detect anomalies in a dataset (e.g., network traffic, sensor
data). Discuss the use cases of anomaly detection in real-time systems.
- Use Natural Language Processing (NLP) to extract key topics from a text dataset (e.g.,
customer reviews, social media posts). Apply topic modeling techniques like LDA and interpret
the results.

Group Eight
- Perform hyperparameter tuning using both grid search and random search on a complex
model. Compare the results and discuss the trade-offs in performance and computation time.
- Merge two or more datasets with related information (e.g., customer transactions and demographics).
Discuss challenges and approaches for dealing with inconsistent or missing data during
integration.

Group Nine
- Create a machine learning pipeline that includes steps like preprocessing, feature engineering,
and model training. Test the pipeline on a dataset and evaluate its accuracy.
- Apply a sequential pattern mining algorithm on time-series data (e.g., clickstream or event
logs) to identify common sequences of events. Interpret the patterns and their business relevance.

Group Ten
- Generate adversarial examples for a trained image classifier (e.g., using FGSM). Discuss the
impact of adversarial examples on model robustness and implications for security.
- Use image processing techniques (e.g., scaling, color normalization) on an image dataset and
explain how these preprocessing steps improve data quality for mining.

5 Recommended Tools and Frameworks


- Python programming language
- Scikit-Learn, TensorFlow, and PyTorch for machine learning implementations
- Jupyter Notebooks for interactive coding and analysis

6 References/Further Readings
1. Stemkoski, L., & Pascale, M. (2021, May 27). Developing Graphics Frameworks with
Python and OpenGL. OAPEN.

2. Python Machine Learning for Beginners: Learning from Scratch NumPy, Pandas, Matplotlib,
Seaborn, Scikit-learn, and TensorFlow for Machine Learning and Data Science. (2020). AI
Publishing.

3. Géron, A. (2022). Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow:
Concepts, Tools, and Techniques to Build Intelligent Systems. O’Reilly.

4. Witten, I. H., Frank, E., & Hall, M. A. (2011). Data Mining: Practical Machine Learning
Tools and Techniques. Amsterdam: Morgan Kaufmann. ISBN: 978-0-12-374856-0

5. Han, J. (2022). Data Mining: Concepts and Techniques. Morgan Kaufmann.
