
DELHI PUBLIC SCHOOL, HYDERABAD

DATA SCIENCE

PROJECT II –
PREDICT BASKETBALL PLAYER
EFFICIENCY RATINGS USING
MACHINE LEARNING AND VISUAL
STUDIO CODE

GRADE XII

CBSE BOARD ROLL NUMBER:

Academic Year: 2024 - 25


Title of the Project: Project II - Prediction Model

Name of the Student: Harshith Dunna

Class & Section: XII D

Batch: 2024 - 2025

Subject Teacher: Ms. Veena Hegde


CERTIFICATE FROM THE SCHOOL

This is to certify that


of class XII, Delhi Public School, Hyderabad, has done this project as a part of the Data Science (844) curriculum issued by CBSE.

___________ has shown sincerity and utmost care in the completion of this project. I certify that this project is up to my expectations and as per the guidelines issued by the CBSE.

Internal Examiner External Examiner


I, __________, do hereby declare that this project was implemented by me, and I would like to thank Ms. Veena Hegde for her wholehearted support and guidance in making it possible to complete this project on time.

I would also like to thank Microsoft for all the study materials.

I also thank the Central Board of Secondary Education (CBSE) for designing the curriculum in such a manner that it provided a wonderful opportunity to gain hands-on experience and guidance.

Name:
Signature:
PROJECT II
INDEX

1. Introduction

2. Set up your local environment for data science coding

3. Data cleansing part 1 – Find missing values

4. Data cleansing part 2 – Drop unnecessary columns and rows

5. Data exploration part 1 – Identify outliers

6. Data exploration part 2 – Check the distribution of data

7. Data exploration part 3 – Find data representing more than one population

8. Data manipulation part 1 – Add relevant player information

9. Data manipulation part 2 – Fill in missing values in specific columns

10. Data manipulation part 3 – Use machine learning to predict and fill in missing data

11. Knowledge Check

12. Conclusion
INTRODUCTION TO THE PROJECT

We can write code using Visual Studio Code to study and predict the performance of different basketball players in the film Space Jam: A New Legacy.

We will use some tools and methods from data science and machine learning, explained below:

● Use Python, pandas, and Visual Studio Code to look at basketball stats.

● Apply machine learning to clean and fill in any missing data in the datasets.

● Learn how to find patterns in data for both human and Tune Squad basketball players.
SETUP LOCAL PYTHON
ENVIRONMENT IN VISUAL STUDIO
CODE

1) Create a folder named ‘space-jam-anl’ anywhere on the computer.

2) Open Visual Studio Code and open the folder you created.

3) Create a file named ‘space-jam-anl.ipynb’ in the ‘space-jam-anl’ folder.

4) Ensure the file opens as a notebook, the Jupyter server is connected, and the kernel points to the correct Python version.

DOWNLOAD DATA FOR BASKETBALL PLAYERS

5) On GitHub, download the CSV file player_data.csv.

6) Save the player_data.csv file in your space-jam-anl folder.

7) Open the CSV file in Visual Studio Code to view it.


DATA CLEANSING PART 1 - FIND
MISSING VALUES:

● Explore the data (use the code below):

import pandas as pd
player_df = pd.read_csv('player_data.csv')

Output:

● Look for missing values, using the isna() function with the DataFrame:

Output:

● Print out the DataFrame information:


Output:
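Putting these steps together, a minimal sketch of the exploration code (standard pandas calls on the DataFrame loaded above):

# Peek at the first five rows of the DataFrame.
player_df.head()

# Count the missing (NaN) values in each column.
player_df.isna().sum()

# Print column names, non-null counts, and data types.
player_df.info()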
DATA CLEANSING PART 2 - DROP
COLUMNS AND ROWS

To drop columns, you'll use the dropna() method:

● By default, dropna() removes rows, so specify that you want to remove columns by using the axis parameter.

● The dropna() method usually returns a new DataFrame.

● Use the inplace parameter to tell it to drop these columns in the original player_df DataFrame.

● Use dropna() to remove only the columns in which all the values are missing, by setting the how parameter to ‘all’.

● The thresh parameter refers to a threshold. It lets us set the minimum number of non-NaN values a row or column needs to avoid being dropped by dropna().

● To remove the sparsely filled rows from the DataFrame, set thresh to 12 (see the sketch after this list).

● The index then counts 0 through 10, skipping 8. The row that had the index of 8 was dropped because it had more than two NaN values.

● With 14 columns, rows that had three or more NaN values didn't meet the threshold of 12.
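A minimal sketch of these dropna() calls, following the parameters described above:

# Drop columns in which every value is missing,
# modifying player_df in place instead of returning a copy.
player_df.dropna(axis='columns', how='all', inplace=True)

# Drop rows with fewer than 12 non-NaN values
# (with 14 columns, that is any row with 3 or more NaNs).
player_df.dropna(thresh=12, inplace=True)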
DATA EXPLORATION PART 1 -
CHECKING FOR OUTLIERS

● Outliers are data values so far outside the distribution of other values that they bring into question whether they even belong in the dataset.

● Outliers often arise from data errors or other undesirable noise.

● We need to check for and deal with possible outliers before we analyze the data.

● A quick way to identify outliers is to use the pandas describe() function.
● Make a list of the column names, excluding ID. The list is used to find specific values within each row.

● Create a matrix of subplots so that one figure shows all 13 columns.

● Add padding around the subplots to make them easier to read.

● Create a box plot based on the data in each column, across all the rows (see the sketch below).
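A sketch of the box-plot matrix, assuming the identifier column is named 'ID' and 13 stat columns remain (a 5x3 grid leaves two subplots empty):

import matplotlib.pyplot as plt

# List of the column names, excluding the ID column.
cols = list(player_df.columns)
cols.remove('ID')

# One figure with a matrix of subplots, one per column.
fig, axes = plt.subplots(5, 3, figsize=(18, 11))

# Padding around the subplots to make them easier to read.
fig.tight_layout(pad=2.0)

# A box plot per column, across all the rows.
for ax, col in zip(axes.flatten(), cols):
    ax.boxplot(player_df[col].dropna())
    ax.set_title(col)

plt.show()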
The tail() function shows the last five rows of a DataFrame.

Reset the index of the DataFrame so that the row labels stay consecutive and consistent with the data (see the sketch below).
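A sketch of those two calls:

# Show the last five rows of the DataFrame.
player_df.tail()

# Reset the index so the row labels are consecutive again,
# discarding the old index instead of keeping it as a column.
player_df.reset_index(drop=True, inplace=True)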
DATA EXPLORATION PART 2 - CHECK
THE DISTRIBUTION OF THE DATA

● A common way to visualize the distribution of data is a histogram.

● A histogram is a bar chart that shows how many times the data in a dataset appears within a range of values. These ranges are called bins.
Create kernel-density estimates (KDEs) of the DataFrame data, using a for loop to generate a matrix of KDE plots for all the columns (a sketch follows below).
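A sketch of the KDE loop, reusing the cols list from the outlier step:

import matplotlib.pyplot as plt

# A matrix of subplots, filled in by a for loop.
fig, axes = plt.subplots(5, 3, figsize=(18, 11))
fig.tight_layout(pad=2.0)

for ax, col in zip(axes.flatten(), cols):
    # Kernel-density estimate of this column's distribution.
    player_df[col].dropna().plot.kde(ax=ax)
    ax.set_title(col)

plt.show()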
● Each peak represents a mode of the data, or a value around which values in the dataset concentrate.

● The fact that so many of the columns are bimodal indicates that the dataset represents samples from two discrete populations.
DATA EXPLORATION PART 3 - DATA
THAT REPRESENTS MORE THAN ONE
POPULATION

Around 1,600 points, the two populations split. Select the rows where players scored more than 1,600 points:
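A sketch of the filter, assuming the points column is named 'points' in the dataset:

# Rows where players scored more than 1,600 points.
player_df[player_df['points'] > 1600]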
The PER for player 34 is shown as NaN because no value was given for that player.

Now that we have found the outliers, it's time to manipulate the data a little.
DATA MANIPULATION PART 1 - ADD
QUALIFYING PLAYERS
INFORMATION

● We identified the groups of players by examining the bimodal histograms.

● Create a column to indicate whether a row represents a human or a Tune Squad player.

● Give each row a unique ‘name’.

● Create the new column for the DataFrame by making a list of values for the column and then assigning the column a name.
Adding the list of strings to the DataFrame.

Changing the column order:

Output:
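Putting these steps together, a sketch with hypothetical labels (how you name humans versus Tune Squad players depends on the split you found; the column name 'name' is also an assumption):

# Hypothetical list of labels, one per row of the DataFrame.
names = ['player' + str(i) for i in range(len(player_df))]

# Assign the list as a new column called 'name'.
player_df['name'] = names

# Move 'name' to the front by reordering the column list.
column_order = ['name'] + [c for c in player_df.columns if c != 'name']
player_df = player_df[column_order]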
DATA MANIPULATION PART 2 -
IMPUTE MISSING VALUES FOR
COLUMNS

Check the columns that still have missing values:

Revisit the histograms for GP and MPG.

Impute the missing values by using average values, as shown in the sketch below.
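A sketch of average-based imputation for the two columns (using the column mean; the median would work similarly if the distribution suggests it):

# Replace missing GP and MPG values with each column's mean.
player_df['GP'] = player_df['GP'].fillna(player_df['GP'].mean())
player_df['MPG'] = player_df['MPG'].fillna(player_df['MPG'].mean())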

The data is almost clean now. We have only one column left to fill in, using a machine learning algorithm.
DATA MANIPULATION PART 3 -
IMPUTE MISSING VALUES BY USING
MACHINE LEARNING

• player_df.isna().sum() confirms that nine missing values remain in PER.

• We cannot use a simple average to impute the values in the PER column.

• PER is computed from the values of the nine columns that precede it in the DataFrame (GP through REBR).

• To get some sense of a model's accuracy, you could use machine learning to split your data into two subsets: training and test.

• The training subset is the portion of the data you use to train the model; the other subset is used to test the model.

• For example, 75 percent of the data might be used to train the model and 25 percent to test it.

• To achieve this, we can use a technique called cross-validation.

• The idea is to iterate through the dataset, splitting the data in different ways between training data and test data.

• The following image provides a visualization of the cross-validation process.
Cross-validate the R2 scores for the model

● I have used 10-fold cross-validation.

● Python will iterate through the data 10 times, each time reserving 10 percent of the data for testing and training on the other 90 percent.

● A histogram of the results is plotted (see the sketch below).
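A sketch of the cross-validation, assuming a scikit-learn linear regression trained on the nine stat columns (GP through REBR) to predict PER:

import matplotlib.pyplot as plt
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

# Use only the rows where PER is known.
train_df = player_df.dropna(subset=['PER'])
X = train_df.loc[:, 'GP':'REBR']
y = train_df['PER']

# 10-fold cross-validated R2 scores for a linear regression.
scores = cross_val_score(LinearRegression(), X, y, cv=10, scoring='r2')

# Histogram of the 10 R2 scores.
plt.hist(scores)
plt.show()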
Output:

This model looks good because the R2 (R-squared) score is 99.95 percent.
Fit the regression model for the player data

● Fit the model using all of the data.

● General rule: use cross-validation for model selection or evaluation, but use all of the data for model building.
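Following that rule, fit the final model on all the rows where PER is present (a sketch, reusing X and y from the cross-validation step):

# Fit the regression model on all of the available data.
lin_reg = LinearRegression()
lin_reg.fit(X, y)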

Create a mask of the rows that have missing values in the DataFrame.

Use the mask and the fitted model to impute the final missing values in the DataFrame.

To check for any other missing values, print the entire dataset.
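A sketch of the mask and the imputation, assuming the same nine feature columns as before:

# Boolean mask: True where PER is missing.
mask = player_df['PER'].isna()

# Predict PER for the masked rows from their other stats,
# and write the predictions back into the DataFrame.
player_df.loc[mask, 'PER'] = lin_reg.predict(player_df.loc[mask, 'GP':'REBR'])

# Verify that no missing values remain anywhere.
player_df.isna().sum()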
Export the DataFrame to a new file as a CSV file.
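For example (the output filename is just a suggestion):

# Write the cleaned dataset to a new CSV file, omitting the index.
player_df.to_csv('player_data_final.csv', index=False)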

The new dataset will look like this:


KNOWLEDGE CHECK

CONCLUSION
In conclusion, this project helped me learn how to use
data science and machine learning to understand
basketball stats. I used Python, Pandas, and Visual
Studio Code to clean and analyze the data and predict
how players might perform. The steps in the project
made it easy to see how these tools can be used to
work with real-world data and find helpful insights.
Overall, it was a good way to learn about coding and
data analysis through basketball.
