Data Science Fundamentals
Test Projects:
20 Use Cases
Description: The project aims to develop a loan eligibility prediction model that
utilizes machine learning algorithms to assess the eligibility of individuals for
obtaining loans. The Model will provide a user-friendly interface for inputting
customer data and generate real-time loan eligibility predictions based on the trained
model.
Learning Outcome:
Through this project, you will gain experience and understanding in the following
areas:
2. Feature Engineering: Identifying and creating relevant features from the loan
dataset that can improve the loan eligibility prediction model's performance. This may
involve feature extraction, transformation, or combination.
Tasks:
2. Feature Engineering: Identify and create relevant features from the loan dataset
to enhance the loan eligibility prediction model's performance. Perform feature
extraction, transformation, or combination as required.
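The three kinds of feature engineering named in this task can be sketched as follows. This is a minimal illustration rather than a prescribed solution: the column names and values are hypothetical, and the instalment figure deliberately ignores interest.

```python
import numpy as np
import pandas as pd

# Hypothetical loan records; every column name here is illustrative.
loans = pd.DataFrame({
    "applicant_income": [5000, 3000, 4000],      # monthly income
    "coapplicant_income": [0, 1500, 2000],       # monthly income
    "loan_amount": [150000, 100000, 120000],
    "loan_term_months": [360, 360, 180],
})

# Combination: household income built from two existing columns.
loans["total_income"] = loans["applicant_income"] + loans["coapplicant_income"]

# Transformation: log1p compresses the long right tail typical of income data.
loans["log_total_income"] = np.log1p(loans["total_income"])

# Extraction: a rough monthly instalment (interest ignored) relative to income.
loans["instalment"] = loans["loan_amount"] / loans["loan_term_months"]
loans["instalment_to_income"] = loans["instalment"] / loans["total_income"]

print(loans[["total_income", "log_total_income", "instalment_to_income"]])
```

Whether any of these derived columns actually helps should be checked against a held-out validation set rather than assumed.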
Evaluation:
This evaluation aims to assess your proficiency in the covered technologies and your
ability to apply them to real-world projects, as well as to provide valuable feedback
from industry experts to further enhance your skills.
Description:
The project aims to develop a diabetes prediction Model that utilizes machine learning
algorithms to predict the likelihood of an individual developing diabetes based on
certain risk factors. The Model will provide users with a user-friendly interface to input
their health information and generate real-time predictions regarding their risk of
developing diabetes.
Learning Outcome:
Through this project, you will gain experience and understanding in the following
areas:
Tasks:
2. Feature Engineering: Identify and create relevant features from the diabetes
dataset to enhance the prediction model's performance. Perform feature extraction,
transformation, or combination as required.
2. Live Evaluation: Industrial experts will conduct a live evaluation session where
they will assess your understanding of the project components, your ability to explain
the implemented features, and your problem-solving skills related to the project.
Description:
The project aims to develop a glass classification Model using machine learning
algorithms to predict the type of glass based on its chemical composition. The Model
will provide users with a user-friendly interface to input the chemical attributes of
glass samples and generate real-time predictions regarding the glass type.
Learning Outcome:
Through this project, you will gain experience and understanding in the following
areas:
Tasks:
2. Feature Engineering: Identify and select relevant features from the glass dataset
to enhance the classification model's performance. Perform feature extraction,
transformation, or combination as required.
2. Live Evaluation: Industrial experts will conduct a live evaluation session where
they will assess your understanding of the project components, your ability to explain
the implemented features, and your problem-solving skills related to the project.
This evaluation aims to assess your proficiency in the covered technologies and your
ability to apply them to real-world projects, as well as to provide valuable feedback
from industry experts to further enhance your skills.
Description:
The project aims to perform data analysis on PhonePe Pulse data, the public dataset
from the mobile payment service PhonePe, to gain insights and extract valuable information. The analysis will
involve exploring the dataset, performing statistical calculations, and generating
visualizations to understand user behaviour, transaction patterns, and other relevant
metrics.
Learning Outcome: Through this project, you will gain experience and
understanding in the following areas:
Tasks:
1. Data Exploration: Explore the PhonePe Pulse dataset, examine its structure, and
identify relevant variables for analysis.
2. Data Cleaning: Handle missing values, outliers, and inconsistencies in the dataset,
ensuring data integrity and quality.
3. Data Wrangling: Transform and reshape the data as needed, merge multiple
datasets if available, and create new variables for analysis.
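The cleaning and wrangling steps above might look like the sketch below. The two tables, their column names, and the numbers are all hypothetical stand-ins and do not reflect the actual PhonePe Pulse schema.

```python
import pandas as pd

# Hypothetical per-state, per-quarter transaction summary.
transactions = pd.DataFrame({
    "state": ["Kerala", "Kerala", "Goa", "Goa"],
    "quarter": [1, 2, 1, 2],
    "txn_count": [100, None, 40, 50],
    "txn_amount": [25000.0, 30000.0, 8000.0, 12500.0],
})
# Hypothetical second dataset to merge in.
users = pd.DataFrame({
    "state": ["Kerala", "Goa"],
    "registered_users": [1000, 400],
})

# Cleaning: fill a missing count with the median count of the same state.
transactions["txn_count"] = (
    transactions.groupby("state")["txn_count"]
    .transform(lambda s: s.fillna(s.median()))
)

# Wrangling: merge the two datasets and derive a new variable.
merged = transactions.merge(users, on="state", how="left")
merged["avg_txn_value"] = merged["txn_amount"] / merged["txn_count"]

# Reshaping: one row per state, one column per quarter.
pivot = merged.pivot(index="state", columns="quarter", values="txn_amount")
print(pivot)
```

Median imputation is only one option; dropping the row or flagging it may be more appropriate depending on why the value is missing.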
2. Live Evaluation: Industrial experts will conduct a live evaluation session where
they will assess your understanding of the project components, your ability to explain
the implemented features, and your problem-solving skills related to the project.
This evaluation aims to assess your proficiency in the covered technologies and your
ability to apply them to real-world projects, as well as to provide valuable feedback
from industry experts to further enhance your skills.
Description:
The project aims to develop a breast cancer prediction Model that incorporates both
frontend and backend components. The Model will utilize machine learning algorithms
to classify breast tissue as malignant (cancerous) or benign (non-cancerous) based
on various features. It will provide users with a user-friendly interface to input the
relevant features and generate real-time predictions regarding the likelihood of
breast cancer.
Learning Outcome: Through this project, you will gain experience and
understanding in the following areas:
2. Feature Engineering: Identifying and selecting relevant features from the breast
cancer dataset that can improve the classification model's performance. This may involve
feature extraction, transformation, or combination.
Tasks:
2. Feature Engineering: Identify and select relevant features from the breast cancer
dataset to enhance the classification model's performance. Perform feature extraction,
transformation, or combination as required.
Evaluation: The evaluation of the project will consist of the following components:
2. Live Evaluation: Industrial experts will conduct a live evaluation session to assess
your understanding of the project components, your ability to explain the
implemented features, and your problem-solving skills related to breast cancer
prediction.
Description:
The project aims to develop a flight ticket price prediction Model using machine
learning algorithms to forecast the prices of airline tickets based on various factors
such as departure city, destination, travel dates, airline, and other relevant
parameters. The Model will provide users with real-time predictions to help them
make informed decisions when booking flights.
Learning Outcome: Through this project, you will gain experience and
understanding in the following areas:
1. Data Collection and Exploration: Collecting flight ticket data from reliable sources
and exploring the dataset to understand its structure, variables, and data types.
2. Data Cleaning and Pre-processing: Cleaning and pre-processing the flight ticket
dataset, handling missing values, outliers, and data inconsistencies. Applying
techniques such as imputation, feature scaling, and encoding categorical variables if
required.
3. Feature Engineering: Extracting and creating new features from the existing
dataset that may have an impact on flight ticket prices. This may include feature
transformations, aggregations, or the creation of derived variables.
1. Data Collection and Exploration: Collect flight ticket data from reliable sources and
explore the dataset to understand its structure and variables.
2. Data Cleaning and Pre-processing: Clean, transform, and pre-process the flight
ticket dataset, handling missing values, outliers, and data inconsistencies.
3. Feature Engineering: Extract and create new features from the dataset that may
have an impact on flight ticket prices.
4. Machine Learning Model Building and Evaluation: Select and implement suitable
machine learning algorithms for flight ticket price prediction. Train the model using
the pre-processed data and evaluate its performance using appropriate metrics.
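As an illustration of the pre-processing techniques listed above (imputation, feature scaling, and encoding of categorical variables), here is a small sketch. The airline names, columns, and prices are hypothetical.

```python
import pandas as pd

# Hypothetical flight records; names and values are illustrative only.
flights = pd.DataFrame({
    "airline": ["AirA", "AirB", "AirA", "AirC"],
    "duration_min": [120.0, None, 150.0, 90.0],
    "price": [4000, 5500, 4800, 3000],
})

# Imputation: replace the missing duration with the column median.
median_duration = flights["duration_min"].median()
flights["duration_min"] = flights["duration_min"].fillna(median_duration)

# Feature scaling: standardise to zero mean and unit variance.
mean = flights["duration_min"].mean()
std = flights["duration_min"].std(ddof=0)
flights["duration_scaled"] = (flights["duration_min"] - mean) / std

# Encoding: one indicator column per airline (one-hot encoding).
encoded = pd.get_dummies(flights, columns=["airline"], dtype=int)
print(encoded)
```

In a real project the median, mean, and standard deviation would normally be computed on the training split only and then reused on the test split, so that no information leaks from the test data.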
Evaluation: The evaluation of the project will consist of the following components:
2. Live Evaluation: Industrial experts will conduct a live evaluation session to assess
your understanding of the project components, your ability to explain the
implemented features, and your problem-solving skills related to flight ticket price
prediction.
This evaluation aims to assess your proficiency in the covered technologies and your
ability to apply them to real-world projects, as well as to provide valuable feedback
from industry experts to further enhance your skills.
Use Case 7: Loan Risk Assessment Model
Description:
The project aims to develop a loan default classification Model using machine learning
algorithms to predict the likelihood of loan default for borrowers based on various
factors such as credit score, income, employment history, loan amount, and other
relevant features. The Model will provide users with real-time predictions to assist
lenders in assessing the creditworthiness of loan applicants and making informed
decisions.
Learning Outcome: Through this project, you will gain experience and
understanding in the following areas:
2. Feature Selection and Engineering: Selecting and creating relevant features that
have a significant impact on loan default prediction. This may involve techniques like
correlation analysis, feature importance ranking, or domain knowledge-based feature
engineering.
2. Feature Selection and Engineering: Select and create relevant features that have
a significant impact on loan default prediction.
3. Machine Learning Model Building and Evaluation: Select and implement suitable
machine learning algorithms for loan default classification. Train the model using the
pre-processed data and evaluate its performance using appropriate metrics.
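Correlation analysis, one of the feature selection techniques mentioned above, can be sketched as below. The records are hypothetical, and Pearson correlation captures only linear association, so a low score does not prove a feature is useless.

```python
import pandas as pd

# Hypothetical borrower records; 'defaulted' is 1 for a default, 0 otherwise.
loans = pd.DataFrame({
    "credit_score": [720, 650, 580, 700, 600, 550],
    "income_k": [60, 45, 30, 52, 35, 28],
    "loan_amount_k": [10, 12, 15, 9, 14, 16],
    "defaulted": [0, 0, 1, 0, 1, 1],
})

# Pearson correlation of every feature with the target column.
corr_with_target = loans.corr()["defaulted"].drop("defaulted")

# Rank features by the strength (absolute value) of that correlation.
ranked = corr_with_target.abs().sort_values(ascending=False)
print(ranked)
```

Tree-based feature importance or domain knowledge can complement this ranking, since they can pick up non-linear effects that correlation misses.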
Evaluation: The evaluation of the project will consist of the following components:
2. Live Evaluation: Industrial experts will conduct a live evaluation session to assess
your understanding of the project components, your ability to explain the
implemented features, and your problem-solving skills related to loan default
prediction.
This evaluation aims to assess your proficiency in the covered technologies and your
ability to apply them to real-world projects, as well as to provide valuable feedback
from industry experts to further enhance your skills.
Description:
The project aims to develop a stock price prediction Model for Amazon, Microsoft,
Google, and Apple using regression models. The Model will utilize historical stock
price data, along with other relevant factors such as market trends, news sentiment,
and financial indicators, to predict future stock prices. Users will be able to access
the predictions, visualize historical trends, and make informed investment decisions
based on the provided insights.
Learning Outcome: Through this project, you will gain experience and
understanding in the following areas:
1. Data Collection: Gathering historical stock price data for Amazon, Microsoft,
Google, and Apple from reliable financial sources or APIs. Collecting additional
relevant data, such as market trends and financial indicators, to enhance the
prediction models.
4. Model Training and Evaluation: Splitting the data into training and testing sets.
Training the regression models using the training data and evaluating their
performance using appropriate metrics such as mean squared error (MSE), mean
absolute error (MAE), and R-squared.
Tasks:
1. Data Collection: Gather historical stock price data for Amazon, Microsoft, Google,
and Apple. Collect additional relevant data, such as market trends and financial
indicators.
2. Feature Engineering: Select and create appropriate features from the collected
data to enhance the prediction models.
3. Regression Model Building: Build regression models to predict future stock prices
based on historical and additional feature data.
4. Model Training and Evaluation: Train the regression models using the training data
and evaluate their performance using appropriate metrics.
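The three regression metrics named in this use case are simple enough to compute by hand, which helps when interpreting library output. The sketch below uses made-up prices and predictions purely for illustration.

```python
def regression_metrics(y_true, y_pred):
    """Return MAE, MSE, and R-squared for two equal-length sequences."""
    n = len(y_true)
    errors = [t - p for t, p in zip(y_true, y_pred)]
    mae = sum(abs(e) for e in errors) / n          # mean absolute error
    mse = sum(e * e for e in errors) / n           # mean squared error
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)            # residual sum of squares
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    r2 = 1 - ss_res / ss_tot                       # share of variance explained
    return {"mae": mae, "mse": mse, "r2": r2}

# Hypothetical closing prices and model predictions.
actual = [100, 102, 104, 106]
predicted = [101, 101, 105, 105]
metrics = regression_metrics(actual, predicted)
print(metrics)
```

For stock data, the test set is usually taken from the most recent period rather than sampled at random, so that the model is never evaluated on dates earlier than those it was trained on.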
Evaluation:
2. Live Evaluation: Industrial experts will conduct a live evaluation session to assess
your understanding of the project components, your ability to explain the
implemented features, and your problem-solving skills related to stock price
prediction.
This evaluation aims to assess your proficiency in the covered technologies and your
ability to apply them to real-world projects, as well as to provide valuable feedback
from industry experts to further enhance your skills.
Description:
Tasks:
4. Model Evaluation:
● Evaluating the trained regression models using metrics like mean absolute
error (MAE), mean squared error (MSE), or R-squared.
Evaluation:
2. Live Evaluation: Industrial experts will conduct a live evaluation session where
they will assess your understanding of the project components, your ability to explain
the implemented features, and your problem-solving skills related to the project.
This evaluation aims to assess your proficiency in the covered technologies and your
ability to apply them to real-world projects, as well as to provide valuable feedback
from industry experts to further enhance your skills.
Use Case 10: Customer Churn Prediction
Description:
The project focuses on developing a customer churn prediction system for a business.
Customer churn refers to the rate at which customers stop doing business with a
company or switch to a competitor. By analysing historical customer data and
relevant features, the goal is to build a model that can predict which customers are
most likely to churn. This information can help businesses proactively identify at-risk
customers and take appropriate retention measures to reduce churn rates.
Learning Outcome:
Through this project, you will gain experience and understanding in the following
areas:
2. Data Pre-processing and Feature Engineering: Clean and pre-process the data,
handle missing values, and perform feature engineering to extract meaningful
features for churn prediction. This may involve creating new features, transforming
variables, and encoding categorical variables.
5. Model Training and Evaluation: Split the data into training and testing sets,
train the classification models using the training data, and evaluate their performance
using appropriate evaluation metrics such as accuracy, precision, recall, and
F1-score.
Tasks:
1. Data Collection and Exploration: Gather and explore customer data, including
demographics, transaction history, and customer interactions.
2. Data Pre-processing and Feature Engineering: Clean the data, handle missing
values, and perform feature engineering to extract relevant features for churn
prediction.
3. Feature Selection: Identify the most important features for churn prediction.
5. Model Training and Evaluation: Split the data, train the classification models,
and evaluate their performance using appropriate metrics.
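The evaluation metrics listed for churn prediction all derive from the four confusion-matrix counts. A minimal sketch follows, with hypothetical labels where 1 means the customer churned.

```python
def classification_metrics(y_true, y_pred):
    """Accuracy, precision, recall, and F1 for binary labels (1 = churned)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}

# Hypothetical true labels and model predictions.
actual = [1, 0, 1, 1, 0, 0, 1, 0]
predicted = [1, 0, 0, 1, 0, 1, 1, 0]
scores = classification_metrics(actual, predicted)
print(scores)
```

Churn datasets are often imbalanced, with far fewer churners than loyal customers, so precision and recall are usually more informative than accuracy alone.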
Evaluation:
2. Live Evaluation: Industrial experts will conduct a live evaluation session where
they will assess your understanding of the project components, your ability to explain
the implemented features, and your problem-solving skills related to the project.
Description:
The project focuses on developing a mobile price classification system using machine
learning techniques. The system aims to predict the price range of mobile phones
based on various features and specifications. By analysing the dataset of mobile
phones with labelled price ranges, the model will learn patterns and correlations to
accurately classify the price range of new mobile phones. The project involves data
pre-processing, feature selection, model building and evaluation, and the
development of a user-friendly interface for accessing the price classification system.
Learning Outcome:
By working on this project, you will gain the following learning outcomes:
Tasks:
1. Data Collection: Collect a dataset of mobile phones with labelled price ranges.
The dataset should include various features such as brand, display size, RAM, internal
storage, camera quality, battery capacity, etc.
Evaluation:
This evaluation aims to assess your knowledge of the subjects covered and your
ability to apply it to real projects, and to offer constructive feedback from
professionals in the field to help you develop your skills.
Description:
The project's goal is to create a system for predicting the prices of residential
properties in India. The project focuses on developing a machine learning model that
can accurately estimate house prices from several property attributes, using a
dataset specific to the Indian housing market. The system will include data
pre-processing, feature engineering, model training and evaluation, and a
user-friendly interface for accessing and interacting with the price predictions.
Learning Outcome:
By working on this project, you will gain the following learning outcomes:
Tasks:
3. Feature Engineering: Analyse the dataset and extract meaningful features that
can capture the variations and patterns in the Indian housing market. Create new
features if necessary, such as price per square foot or distance to important
landmarks.
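The two example features mentioned here, price per square foot and distance to a landmark, could be derived as in this sketch. The listing, the coordinates, and the field names are hypothetical. The distance uses the haversine great-circle formula, which is an assumption about how "distance" is measured and not something the brief specifies.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def add_features(listing, landmark):
    """Return a copy of the listing with two derived features added."""
    features = dict(listing)
    features["price_per_sqft"] = listing["price"] / listing["area_sqft"]
    features["km_to_landmark"] = haversine_km(
        listing["lat"], listing["lon"], landmark[0], landmark[1])
    return features

# Hypothetical listing and landmark coordinates.
listing = {"price": 7_500_000, "area_sqft": 1200, "lat": 12.97, "lon": 77.59}
landmark = (12.98, 77.60)
result = add_features(listing, landmark)
print(result)
```

Note that price per square foot is computed from the target itself, so it is useful for exploration and outlier detection but must not be fed to the model as an input feature.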
Evaluation:
This evaluation aims to assess your knowledge of the subjects covered and your
ability to apply it to real projects, and to offer constructive feedback from
professionals in the field to help you develop your skills.
Description:
The project's goal is to create a system for estimating the prices of Airbnb listings
in different European locations. It focuses on developing a machine learning
model that can accurately estimate the price of Airbnb rooms from numerous
variables, using a dataset specific to European Airbnb
listings. The system involves data pre-processing, feature engineering, model training
and evaluation, and the creation of an intuitive user interface for accessing and
interacting with the price prediction system.
Learning Outcome:
By working on this project, you will gain the following learning outcomes:
Tasks:
3. Feature Engineering: Analyse the dataset and extract relevant features that
capture the variations and patterns in the European Airbnb market. Create new
features if necessary, considering factors such as proximity to attractions,
transportation options, and local amenities.
Evaluation:
Description:
The project aims to develop a machine learning model to predict airline passenger
satisfaction based on various factors and features. By utilising a dataset specific to
airline passenger reviews and feedback, the project focuses on building a model that
accurately classifies whether a passenger is satisfied or dissatisfied with their flying
experience. The system involves data pre-processing, feature engineering, model
training and evaluation, the development of a user-friendly interface for predictions,
and the deployment of the system for real-time satisfaction predictions.
Learning Outcome:
By working on this project, you will gain the following learning outcomes:
Tasks:
This test is designed to evaluate your understanding of the material and your ability
to apply it to real-world tasks. It also intends to provide you with helpful feedback
from industry experts to help you improve your skills.
Description:
The project aims to develop a machine learning model to predict salary categories
based on various features and factors. By utilising a dataset specific to job listings
and corresponding salaries, the project focuses on building a classification model that
can accurately classify whether a salary falls into low, medium, or high categories.
The system involves data pre-processing, feature engineering, model training and
evaluation, and the development of a user-friendly interface for salary predictions.
Learning Outcome:
By working on this project, you will gain the following learning outcomes:
Tasks:
3. Feature Engineering: Analyse the dataset and extract relevant features that
capture the variations and patterns in job salaries. Create new features if necessary,
considering factors such as education and experience.
4. Machine Learning Model Building: Select suitable classification algorithms and
train multiple models using the pre-processed dataset. Experiment with different
algorithms, hyperparameters, and ensemble methods to find the best-performing
model.
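Before a classifier can be trained, each salary needs a low, medium, or high label. One possible way to create those labels is quantile binning, sketched below with hypothetical salaries. The choice of three equal-sized groups is an assumption; fixed thresholds from domain knowledge would work equally well.

```python
import pandas as pd

# Hypothetical annual salaries in thousands.
salaries = pd.Series([30, 45, 60, 80, 100, 150], name="salary_k")

# Quantile binning: split the values into three equally populated groups.
categories = pd.qcut(salaries, q=3, labels=["low", "medium", "high"])

labelled = pd.DataFrame({"salary_k": salaries, "category": categories})
print(labelled)
```

Quantile bins give balanced classes by construction, which simplifies evaluation, but the cut points then depend on the data sample and should be fixed once and reused for new data.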
Evaluation:
This evaluation aims to assess your knowledge of the subjects covered and your
ability to apply it to real projects, and to offer constructive feedback from
professionals in the field to help you develop your skills.
Description:
The project aims to perform advanced analysis of world university rankings data to
gain insights and understand the factors that contribute to a university's ranking. By
utilising a dataset containing various attributes of universities and their rankings, the
project focuses on exploring the data, conducting statistical analysis, and developing
visualisations to uncover patterns, trends, and relationships. The analysis will involve
data pre-processing, exploratory data analysis, hypothesis testing, and advanced
visualisation techniques.
Learning Outcome:
By working on this project, you will gain the following learning outcomes:
5. Interpretation and Insights: Analyse the results of the statistical tests and
visualisations to derive meaningful insights about the factors that significantly impact
university rankings. Draw conclusions and make recommendations based on the
analysis.
Tasks:
6. Interpretation and Insights: Analyse the results of the statistical tests and
visualisations to derive meaningful insights about the factors that significantly impact
university rankings. Summarise the findings and draw conclusions.
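As one example of the hypothesis testing this project calls for, the sketch below runs an exact two-sided permutation test on the difference in mean score between two groups of universities. The groups and scores are hypothetical, and the permutation test is only one of several suitable tests (a t-test would be a common alternative).

```python
from itertools import combinations

def permutation_test(group_a, group_b):
    """Exact two-sided permutation p-value for a difference in means."""
    pooled = group_a + group_b
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)
    observed = abs(sum(group_a) / n_a - sum(group_b) / n_b)

    extreme = 0
    count = 0
    # Enumerate every way of relabelling n_a of the pooled values as group A.
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        if diff >= observed - 1e-12:   # tolerance for float rounding
            extreme += 1
        count += 1
    return extreme / count

# Hypothetical research scores for universities in two regions.
region_a = [90, 85, 88, 92]
region_b = [70, 75, 72, 78]
p_value = permutation_test(region_a, region_b)
print(f"p-value: {p_value:.4f}")
```

Exhaustive enumeration is only practical for small samples; with larger groups the same idea is usually applied to a random sample of relabellings.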
Evaluation:
This test is designed to evaluate your understanding of the material and your ability
to apply it to real-world tasks. It also intends to provide you with helpful feedback
from industry experts and faculty members to help you improve your skills.
Description:
Learning Outcome:
By working on this project, you will gain the following learning outcomes:
Tasks:
3. Feature Engineering: Analyse the dataset and extract relevant features that
are indicative of placement outcomes. Create aggregate features, derive new
features, and identify key predictors of placements.
Evaluation:
This test is designed to evaluate your understanding of the material and your ability
to apply it to real-world tasks. It also intends to provide you with helpful feedback
from industry experts and faculty members to help you improve your skills.
Description:
The project's objective is to examine data science salaries in
2023. The analysis focuses on salary trends, the variables driving
salary differences, and a thorough picture of the data science job market,
using a dataset specific to data science job positions and their corresponding
salaries. Data pre-processing, exploratory data analysis, statistical modelling, and
visualisation techniques will all be used in the analysis.
Learning Outcome:
By working on this project, you will gain the following learning outcomes:
1. Understanding Data Science Salaries: Familiarise yourself with the factors that
influence data science salaries, such as experience level, education, location,
industry, and job responsibilities.
6. Interpretation and Insights: Analyse the results of the statistical modelling and
visualisations to derive meaningful insights about the factors that significantly impact
data science salaries. Identify the most influential factors and provide
recommendations or insights for job seekers or employers.
Tasks:
1. Data Collection: Collect a dataset containing data science job positions and
their corresponding salaries for the year 2023. Include relevant information such as
experience level, education, location, industry, and job responsibilities.
6. Interpretation and Insights: Analyse the results of the statistical modelling and
visualisations to derive meaningful insights about the factors that significantly impact
data science salaries. Summarise findings and provide recommendations or insights
for job seekers or employers.
Evaluation:
This test is designed to evaluate your understanding of the material and your ability
to apply it to real-world tasks. It also intends to provide you with helpful feedback
from industry experts and faculty members to help you improve your skills.
Description:
The project's objective is to examine global energy consumption trends and derive
insights from them. The analysis focuses on studying energy consumption
patterns, identifying the key sources of energy, and analysing the distribution of
energy consumption across regions, using a comprehensive dataset on energy
consumption across various countries and energy sources. Exploratory
data analysis and visualisation techniques will be used in the analysis.
Learning Outcome:
By working on this project, you will gain the following learning outcomes:
5. Interpretation and Insights: Analyse the results of the statistical modelling and
visualisations to derive meaningful insights about the factors that influence energy
consumption. Identify the primary energy sources, understand regional variations,
and provide recommendations or insights for energy policymakers and stakeholders.
Tasks:
5. Interpretation and Insights: Analyse the results of the statistical modelling and
visualisations to derive meaningful insights about the factors that influence energy
consumption. Summarise findings and provide recommendations or insights for
energy policymakers and stakeholders.
Evaluation:
This test is designed to evaluate your understanding of the material and your ability
to apply it to real-world tasks. It also intends to provide you with helpful feedback
from industry experts and faculty members to help you improve your skills.
Description:
The IPL Data Analysis project focuses on analysing the data from the Indian Premier
League (IPL), a popular professional Twenty20 cricket league in India. By working on
this project, participants will gain insights into team performance, player statistics,
match results, and various aspects of the IPL. The project aims to provide valuable
information for cricket enthusiasts, team management, and decision-making in the
context of the IPL. Participants will utilize data analysis techniques, visualization
tools, and statistical methods to analyse player performances, team strategies, match
outcomes, and other relevant factors.
Learning Outcome:
By working on the IPL Data Analysis project, participants will have the opportunity to
expand their knowledge and gain expertise in the following areas:
1. Data Pre-processing and cleaning techniques for IPL data: Participants will
learn how to handle missing values, inconsistencies, and outliers in the IPL dataset.
They will gain experience in data cleaning and transformation techniques to ensure
the data is suitable for analysis.
5. Utilizing data visualization tools to present insights and trends in IPL data:
Participants will gain proficiency in creating visually appealing and informative
visualizations using tools such as matplotlib, seaborn, or Plotly. They will learn how
to effectively communicate complex insights from IPL data through charts, graphs,
and interactive visualizations.
6. Identifying key players, team dynamics, and factors contributing to match
outcomes: Participants will analyze the performance of individual players and their
impact on team success. They will gain insights into the dynamics of team
performance, understanding how different players contribute to match outcomes and
overall team performance.
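Identifying key players usually starts with per-player aggregates. The sketch below computes runs, balls faced, and batting strike rate (runs per 100 balls faced) from ball-by-ball records. The record layout and player names are hypothetical and do not correspond to any particular IPL dataset.

```python
from collections import defaultdict

# Hypothetical ball-by-ball records:
# (batter, runs credited to the batter, counts as a ball faced).
# Wides, for example, do not count as a ball faced.
deliveries = [
    ("Player A", 4, True), ("Player A", 1, True), ("Player B", 6, True),
    ("Player A", 0, True), ("Player B", 0, False), ("Player B", 2, True),
]

def batting_summary(records):
    """Aggregate runs, balls faced, and strike rate per batter."""
    runs = defaultdict(int)
    balls = defaultdict(int)
    for batter, scored, faced in records:
        runs[batter] += scored
        if faced:
            balls[batter] += 1
    return {
        batter: {
            "runs": runs[batter],
            "balls": balls[batter],
            "strike_rate": 100 * runs[batter] / balls[batter] if balls[batter] else 0.0,
        }
        for batter in runs
    }

summary = batting_summary(deliveries)
top_scorer = max(summary, key=lambda name: summary[name]["runs"])
print(summary)
print("Top scorer:", top_scorer)
```

The same grouping pattern extends to bowlers (economy rate, wickets) and teams (win counts per season) once the corresponding columns are available.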
Tasks:
- Collecting IPL data, including match results, player statistics, and team
information.
- Analyzing and visualizing IPL data to identify patterns, trends, and interesting
insights.
fielding performance.
4. Statistical Evaluation:
6. Data Visualization:
Evaluation:
This test is designed to evaluate your understanding of the material and your ability
to apply it to real-world tasks. It also intends to provide you with helpful feedback
from industry experts and faculty members to help you improve your skills.