Databyte ML Task 1

The document provides instructions for a machine learning task for the DataByte club. Applicants must choose one of six problem statements to work on, including liver disease prediction, taxi fare prediction, air quality prediction, wine quality prediction, flower classification, and real estate price prediction. For the chosen problem, applicants must perform exploratory data analysis, implement at least three different machine learning models from scratch, and evaluate and compare the models. Optional tasks involve implementing deep learning models to improve accuracy. Guidelines are provided for coding environment, evaluation metrics, and submitting work on Github for evaluation.

Uploaded by

Mohini Thakur


DataByte

The Official Machine Learning and Data Science Club of NITT
FIRST-YEAR INDUCTION

TASK-1

MACHINE LEARNING

All machine learning models must be built from scratch, without the help of
sklearn or any similar library; otherwise, the submission will not be considered. All the
problem statements involve very basic steps, so we expect innovation and creativity in
your approach more than the mere implementation of a model.

General Instructions for Submission:

● The submission must include all the EDA performed and its results.
● At least 3 very different models must be implemented and their results verified (say,
decision trees, KNN, and logistic regression).
● Apart from the given evaluation metric, explore others and explain why F1 or any
other metric is better suited to the problem statement you are working on.
● Comment the code properly, make sure it is readable, and highlight
wherever you have tried something creative or used an innovative approach.
● Labels are provided for all the problem statements below, but what would you
do if you had just a bunch of numbers without labels? Try implementing that
alongside your task to explore your creativity.
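
One possible answer to the "numbers without labels" question is clustering. The following is a minimal sketch of k-means written with plain Python only, not a prescribed solution. The function names `squared_distance` and `kmeans` are illustrative, the data points are made up, and the farthest-first initialisation is an assumption chosen here to keep the result deterministic (random initialisation is the more common choice).

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two equal-length points."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def kmeans(points, k, max_iterations=100):
    """Group unlabeled points into k clusters; returns (centroids, labels)."""
    # Farthest-first initialisation (an assumption of this sketch): start from
    # the first point, then keep adding the point farthest from all centroids.
    centroids = [tuple(points[0])]
    while len(centroids) < k:
        farthest = max(
            points, key=lambda p: min(squared_distance(p, c) for c in centroids)
        )
        centroids.append(tuple(farthest))

    labels = None
    for _ in range(max_iterations):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:  # nothing changed, so the clustering is stable
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return centroids, labels


# Made-up data: two visibly separate groups of 2-D points.
points = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1), (5.0, 5.0), (5.1, 4.9), (4.9, 5.2)]
centroids, labels = kmeans(points, k=2)
print(labels)
```

The number of clusters `k` has to be chosen by you; comparing a few values during EDA is a reasonable starting point.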

Problem Statements:

(Choose any one problem statement)

1. Liver Disease Prediction (Classification):

The task is to develop prediction models that use 18 clinical characteristics to
forecast the stage of liver cirrhosis. In cirrhosis, the liver is harmed for a number of
reasons, which results in scarring and liver failure. Applications like this are
a crucial use of AI in healthcare.

This is basically a classification problem where the goal is to categorise the stage of
the disease so that measures can be taken according to it. Before modelling, use
suitable processing techniques and present visually the inference you get from the
EDA. Then proceed to develop the prediction models and compare them using
evaluation metrics, and justify the results statistically to support your model’s
performance.

About data:
Train Dataset - It consists of a total of 6801 data points.
Test Dataset - You must predict the stage of cirrhosis of 3201 data points.

Evaluation: F1 Score

Dataset: Liver_disease_pred
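
Since the evaluation metric here is the F1 score and library helpers are off limits, the metric itself can be computed by hand. The sketch below assumes macro averaging (an unweighted mean of per-class F1 scores); the task does not say which averaging is used, so treat that as an assumption. The function name `macro_f1` and the labels are illustrative.

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1: the unweighted mean of the per-class F1 scores."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for cls in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == cls and p == cls)  # true positives
        fp = sum(1 for t, p in pairs if t != cls and p == cls)  # false positives
        fn = sum(1 for t, p in pairs if t == cls and p != cls)  # false negatives
        # F1 = 2TP / (2TP + FP + FN), the harmonic mean of precision and recall.
        denominator = 2 * tp + fp + fn
        scores.append(2 * tp / denominator if denominator else 0.0)
    return sum(scores) / len(scores)


# Made-up stage labels, purely for illustration.
y_true = [1, 1, 2, 2, 3, 3]
y_pred = [1, 2, 2, 2, 3, 1]
print(round(macro_f1(y_true, y_pred), 4))  # 0.6556
```

Macro averaging gives every stage equal weight regardless of how many samples it has, which matters if the stages are imbalanced.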

(or)

2. Taxi Fare Prediction (Regression):

Have you ever felt that taxis or autos fool you at times by asking for an insanely
high amount? Now you will approach this problem as a budding machine learning
enthusiast. Your task is to delve into the world of transportation economics and build
a regression-based model to predict the fare using features like the trip duration,
distance travelled, etc. Problems like this are solved better with domain knowledge,
so feel free to take an intuitive approach to this mystery.
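
As one example of a regression model built from scratch, here is a minimal sketch of linear regression trained by batch gradient descent on the mean squared error. The function name `fit_linear_regression`, the hyperparameter defaults, and the toy numbers are assumptions for illustration; real features would normally need scaling before this converges well.

```python
def fit_linear_regression(rows, targets, learning_rate=0.05, epochs=2000):
    """Fit y ~ bias + weights . x by batch gradient descent on the MSE."""
    n_samples, n_features = len(rows), len(rows[0])
    weights, bias = [0.0] * n_features, 0.0
    for _ in range(epochs):
        # Prediction error for every sample, using the current parameters.
        errors = [
            bias + sum(w * x for w, x in zip(weights, row)) - y
            for row, y in zip(rows, targets)
        ]
        # Gradient of the MSE with respect to each weight, then to the bias.
        for j in range(n_features):
            gradient = 2 * sum(e * row[j] for e, row in zip(errors, rows)) / n_samples
            weights[j] -= learning_rate * gradient
        bias -= learning_rate * 2 * sum(errors) / n_samples
    return weights, bias


# Made-up example: fare = 3 + 2 * distance, with distance as the only feature.
distances = [[0.0], [1.0], [2.0], [3.0], [4.0]]
fares = [3.0, 5.0, 7.0, 9.0, 11.0]
weights, bias = fit_linear_regression(distances, fares)
print(round(weights[0], 3), round(bias, 3))  # 2.0 3.0
```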

About data:
Train: 20967 x 8
Test: 89861 x 8

Evaluation Metric: Choose the best metric for the dataset and justify your choice statistically

Dataset: Fare_prediction
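
To make the choice of metric concrete, the sketch below compares two common candidates, MAE and RMSE, on made-up fares. It is only an illustration of how the two react to a single extreme value, not a recommendation of either; the function names `mae` and `rmse` are illustrative.

```python
import math


def mae(y_true, y_pred):
    """Mean absolute error: the average size of the errors."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def rmse(y_true, y_pred):
    """Root mean squared error: squaring makes large errors count more."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


predicted = [11, 11, 16, 19]
typical_fares = [10, 12, 15, 20]       # every prediction is off by 1
fares_with_outlier = [10, 12, 15, 200]  # one extreme fare

print(mae(typical_fares, predicted), rmse(typical_fares, predicted))  # 1.0 1.0
print(mae(fares_with_outlier, predicted), rmse(fares_with_outlier, predicted))
```

With the outlier, the MAE rises to 46 while the RMSE rises to roughly 90, so the distribution of fares seen during EDA is one statistical basis for choosing between them.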

(or)

3. Decode AQI (Classification):

The Air Quality Index (AQI) is a widely-used measure of air pollution that provides
information on the quality of air in a city on a daily basis. The task is to develop a
model which can predict the AQI_Bucket based on features like PM2.5, CO, etc.
This dataset contains air quality measurements for several cities, with observations
taken at different dates and times. The variables represent different pollutants and
their concentrations in the air, as well as the AQI bucket, which provides an overall
measure of air quality.

About data:
Train: 495512 x 15
Test: 212363 x 14

Evaluation: F1 Score

Dataset: AQI

(or)

4. Wine Quality Prediction (Classification):

Embark on a journey into the world of wine, where predicting quality takes an
unexpected twist. Your mission is to develop a model that accurately forecasts wine
quality using freely available datasets from the internet. But here's the catch: the
dataset you'll be working with lacks clear information on the usual factors used to
assess wine quality.

As a curious data scientist, your task is to uncover hidden connections within this
mysterious dataset. By harnessing the power of machine learning, you'll need to
uncover insights that could hold the key to predicting wine quality. Dive deep into
the unknown, examining hidden variables and patterns that might reveal the secrets
of excellence. Your model must adapt and discover the unspoken features that truly
influence wine quality, exploring new territory in the world of data-driven wine
assessment.

Dataset: Wine Quality - UCI Machine Learning Repository

(or)
5. Smart Botanist (Classification):

As a botanist, the task at hand involves developing a machine learning model to
effectively classify different species of flowers. This model will utilise available
features from the dataset to accurately determine the species of a given flower. The
ultimate objective is to create a reliable and robust classification model that can
generalise well to new samples of iris flowers. Take into consideration the effects of
seasonal and climatic changes, i.e. environmental conditions (hint: soil types, rainfall,
etc.). The types of iris flowers present differ across geographic locations, and the
objective is to provide accurate predictions for their species (research how to
incorporate these points into your dataset while building your model).

Now, consider that the model is able to perform well on the training data but does
not perform well while testing on unseen input features of flowers. What will you do
in order to prevent this? Furthermore, the model's performance will be assessed using
appropriate evaluation metrics such as precision, recall, and F1 score, ensuring a
comprehensive understanding of its effectiveness across various classes. Suggest a
way you can automate the learning process of the model.
Dataset: Iris - UCI Machine Learning Repository
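
One way to notice that a model does well on training data but poorly on unseen flowers is k-fold cross-validation. The sketch below pairs it with a small k-nearest-neighbours classifier, both written with the standard library only. The function names `knn_predict` and `k_fold_accuracy`, the interleaved fold assignment, and the toy measurements are assumptions for illustration; the fold assignment presumes the rows are already shuffled.

```python
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Predict the majority label among the k training points nearest to query."""
    nearest = sorted(
        zip(train_x, train_y),
        key=lambda pair: sum((a - b) ** 2 for a, b in zip(pair[0], query)),
    )[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def k_fold_accuracy(xs, ys, folds=5, k=3):
    """Average accuracy over folds, each fold held out once for testing."""
    scores = []
    for fold in range(folds):
        # Every folds-th sample, starting at this fold's offset, is held out.
        test_idx = set(range(fold, len(xs), folds))
        train_x = [x for i, x in enumerate(xs) if i not in test_idx]
        train_y = [y for i, y in enumerate(ys) if i not in test_idx]
        correct = sum(knn_predict(train_x, train_y, xs[i], k) == ys[i] for i in test_idx)
        scores.append(correct / len(test_idx))
    return sum(scores) / folds


# Made-up two-feature measurements for two hypothetical species "a" and "b".
xs = [(0, 0), (10, 10), (0, 1), (10, 11), (1, 0),
      (11, 10), (1, 1), (11, 11), (0.5, 0.5), (10.5, 10.5)]
ys = ["a", "b", "a", "b", "a", "b", "a", "b", "a", "b"]
print(k_fold_accuracy(xs, ys, folds=5, k=3))  # 1.0
```

A large gap between training accuracy and the cross-validated accuracy is a sign of overfitting; the same loop can also be used to compare values of `k`.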

(or)

6. Real Estate Price Prediction (Regression):

Melbourne, the evergreen city in the beautiful country of Australia, offers a
variety of households for the average real estate buyer to choose from. Your
objective is to correctly predict the price of a given real estate property in
Melbourne, taking into consideration factors such as land area,
the number of bedrooms, etc. Using regression techniques, the model must utilise all
factors that may affect the price of a house in Melbourne. Train and test on
your dataset so that the model can be evaluated with regression metrics such as
Mean Squared Error (MSE), RMSE, etc.
Your dataset may have outliers, so handle them appropriately and make sure the model
can adapt to them and still correctly predict the price of a house for a prospective buyer.
Be sure to produce realistic predictions.
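
One common way to deal with outliers is to clip values that fall outside the interquartile-range (IQR) fences. The sketch below is only one option among several (dropping rows or transforming the target are others), and the function name `clip_outliers`, the factor of 1.5, and the prices are assumptions for illustration. It uses `statistics.quantiles` from the Python standard library (Python 3.8 or later).

```python
import statistics


def clip_outliers(values, factor=1.5):
    """Clip values to the range [Q1 - factor * IQR, Q3 + factor * IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # the three quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    return [min(max(v, lower), upper) for v in values]


# Made-up prices (in thousands); the last one is an obvious outlier.
prices = [100, 110, 120, 130, 140, 150, 160, 5000]
print(clip_outliers(prices))
```

Whether clipping is appropriate depends on whether the extreme prices are data errors or genuine luxury properties, which is worth checking during EDA.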

Brownie points will be given to those who can do an appropriate EDA on the dataset
and derive meaningful conclusions.

Dataset: House_Pricing

Brownie points:
Devise a deep learning algorithm to improve the accuracy scores of the models
implemented above using machine learning techniques.
(NOTE: This task is optional, but it is highly recommended. Implementing it
will increase your chance of selection into the club.)
Also,
● Try implementing these algorithms from scratch without using libraries.
● To further enhance your skills and exposure, try executing the clustering,
classification, and regression algorithms in different environments. This will allow
you to explore various features and libraries simultaneously.

Expectations:
● Implement it with the help of PyTorch (preferred) or TensorFlow.
● Use Jupyter Notebooks as the coding environment.
● Consider setting up Jupyter in VS Code to execute the algorithms.

Guidelines:
● The README file on GitHub must include the accuracies of the algorithms, with
suitable evaluation metric scores and graphs of loss and accuracy over epochs.
● Utilize various evaluation methods such as confusion matrix, ROC curve, or
precision-recall curve to assess the performance of each approach.
● Provide insights into the strengths and limitations of each method based on the
evaluation results.
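
A confusion matrix is the simplest of these evaluation methods to build by hand. The sketch below counts true-versus-predicted label pairs using plain Python; the function name `confusion_matrix` and the labels are illustrative, and rows are assumed to represent true classes while columns represent predicted classes.

```python
def confusion_matrix(y_true, y_pred):
    """Return (classes, matrix) where matrix[i][j] counts samples whose true
    class is classes[i] and whose predicted class is classes[j]."""
    classes = sorted(set(y_true) | set(y_pred))
    index = {cls: i for i, cls in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for t, p in zip(y_true, y_pred):
        matrix[index[t]][index[p]] += 1
    return classes, matrix


# Made-up labels for illustration.
y_true = ["a", "a", "b", "b", "b"]
y_pred = ["a", "b", "b", "b", "a"]
classes, matrix = confusion_matrix(y_true, y_pred)
print(classes)  # ['a', 'b']
print(matrix)   # [[1, 1], [1, 2]]
```

The diagonal holds the correct predictions, so the off-diagonal cells show which classes a model confuses with each other.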

GitHub link submission:

Include your code implementation, evaluation metric scores, graphs, and report. (NOTE:
If any of these is missing, your work will not be evaluated.)

Resources:

Setting up Google Colab Notebooks

● Google Colab Tutorial for Beginners | Get Started with Google Colab
Setting up Jupyter Notebooks
● Install Miniconda (Python) with Jupyter Notebook and Setting Up Virtual Environments on
Windows 10
Setting up VS Code
● VSCode Tutorial For Beginners - Getting Started With VSCode
Python
● https://www.python.org/
Virtual Environment
● Create your own Python virtual environment using the link below
● https://www.youtube.com/watch?v=ohlRbcasPAc

Machine Learning
● Machine Learning Tutorial Python | Machine Learning For Beginners - YouTube
● https://trainings.internshala.com/machine-learning-course/?tracking_source=trainings-search-tags
● Practical Machine Learning Tutorial with Python Intro p.1
● https://www.coursera.org/learn/machine-learning or refer to the Andrew Ng videos on
YouTube: #1 Machine Learning Specialization [Course 1, Week 1, Lesson 1]
● https://www.kaggle.com/learn/intro-to-machine-learning for extensive notes on
various basic ML concepts

Deep Learning
● Go through @patloeber's courses on deep learning techniques and machine
learning from scratch.
● Deep Learning playlist overview & Machine Learning intro
● PyTorch Prerequisites - Syllabus for Neural Network Programming Course
● https://youtube.com/playlist?list=PLhhyoLH6IjfxVOdVC1P1L5z5azs0XjMsb
● PyTorch
Learn the basics of PyTorch and its implementation using the link below:
https://pytorch.org/tutorials/index.html#

GitHub tutorials
● GitHub Tutorial - Beginner's Training Guide
