Data Science Cr7

The document describes the steps taken to predict Cristiano Ronaldo's goal scoring using decision tree classification on a CSV dataset. The steps include: 1) reading the CSV file into a Jupyter notebook, 2) cleaning the data by replacing missing values and one-hot encoding categorical variables, 3) splitting the data into training and test sets, 4) using a decision tree classifier to predict goal scoring and probabilities, 5) outputting the results to a new CSV file. The process provided a valuable learning experience in applying data science skills to a real-world problem.

BY: VENKAT S RAGHAVAN

• CRISTIANO RONALDO, one of the greatest players in football, is always a delight to watch, and as one of his greatest fans I am privileged to predict his goal scoring using data science.
• STEP 1:
• Since the data was given to me in a CSV file, I decided to use Python 3, since I am well versed in data analysis and visualization with the pandas library. I used a Jupyter notebook for programming in Python and for data manipulation.
• STEP 2:
• I read the CSV file using the required commands in Python and then checked the shape of the data set, since the number of rows and columns gives a brief overview of the data.
• Then I replaced all the missing values with 0 for the time being, as an alternative to dropping rows, since many machine learning algorithms cannot be trained on data that contains missing values.
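A minimal sketch of this step, not the original notebook code. The inline CSV text and the column names `shot_distance` and `power_of_shot` are hypothetical stand-ins; only `is_goal` is named in the slides.

```python
import io

import pandas as pd

# Hypothetical stand-in for the real CSV file. Only `is_goal` comes from the
# slides; the other column names and all values are made up for illustration.
csv_text = """shot_distance,power_of_shot,is_goal
12.0,3,1
,2,0
25.5,5,0
8.0,1,
"""

# With a real file this would be pd.read_csv("<path to the csv>").
df = pd.read_csv(io.StringIO(csv_text))

print(df.shape)         # (number of rows, number of columns)
print(df.isna().sum())  # missing values per column

# Temporary placeholder: replace every missing value with 0 instead of
# dropping the affected rows.
df_filled = df.fillna(0)
```

Filling with 0 keeps every row, but a 0 then looks like a real measurement, which is why the placeholder is revisited in the next step.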
• STEP 3:
• I then created dummy variables for all the string-type columns using the one-hot encoding method and removed the duplicate columns to eliminate redundancy.
• The next step was to replace the 0 values in the columns with the mean values. This was done only in columns where the data was already in float format. Doing it in the other columns failed because boolean values could not be converted to float.
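One way this step could look with pandas, as a sketch under stated assumptions: the data and the `shot_distance` and `area_of_shot` columns are hypothetical, "duplicate columns" is taken to mean columns with the same name, and "mean" is taken to mean the mean of the non-zero entries of each float column. The slides do not confirm any of these choices.

```python
import pandas as pd

# Hypothetical data; 0.0 in `shot_distance` marks a value that was missing
# before the temporary fill with 0.
df = pd.DataFrame({
    "shot_distance": [12.0, 0.0, 25.5, 8.0],
    "area_of_shot": ["Center", "Left", "Center", "Right"],
    "is_goal": [1, 0, 0, 1],
})

# One-hot encode the string-type columns; numeric columns pass through.
encoded = pd.get_dummies(df)

# Assumption: "duplicate columns" means repeated column names.
encoded = encoded.loc[:, ~encoded.columns.duplicated()]

# Only float columns are touched, so the dummy columns and the integer
# target are left alone.
float_cols = encoded.select_dtypes(include="float").columns
for col in float_cols:
    # Assumption: use the mean of the non-zero entries as the replacement.
    mean_value = encoded.loc[encoded[col] != 0, col].mean()
    encoded[col] = encoded[col].replace(0, mean_value)

print(encoded)
```

Restricting the replacement to float columns also protects the 0 entries in the dummy columns and in `is_goal`, where 0 is a genuine value rather than a placeholder.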
• STEP 4:
• I chose the ‘is_goal’ column as the y variable and all the other columns as the X variable for predicting the outcome.
• Please note that for a football player all the parameters matter while playing, and thus I have chosen every column for learning.
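The target and feature selection described here can be sketched as follows. The small frame and its feature columns are hypothetical; only the `is_goal` name comes from the slides.

```python
import pandas as pd

# Hypothetical encoded data set; only `is_goal` is named in the slides.
encoded = pd.DataFrame({
    "shot_distance": [12.0, 15.2, 25.5, 8.0],
    "area_of_shot_Center": [True, False, True, False],
    "is_goal": [1, 0, 0, 1],
})

y = encoded["is_goal"]                 # target: did the shot result in a goal?
X = encoded.drop(columns=["is_goal"])  # features: every remaining column

print(X.shape, y.shape)
```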
• STEP 5:
• I imported the decision tree classifier from scikit-learn and used this algorithm to predict the goal-scoring outcome.
• I split the data set into training and test sets using train_test_split from the model_selection module.
• Finally, I predicted the values of the is_goal column.
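A minimal sketch of the split, fit, and predict sequence with scikit-learn. The synthetic data, the feature names, the 25% test size, and the `random_state` values are assumptions chosen for illustration; the slides do not state which settings were used.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the cleaned data set (hypothetical features).
rng = np.random.default_rng(0)
X = pd.DataFrame({
    "shot_distance": rng.uniform(1, 40, size=200),
    "is_header": rng.integers(0, 2, size=200),
})
# Toy rule so that the tree has something to learn: close shots are goals.
y = (X["shot_distance"] < 15).astype(int).rename("is_goal")

# Hold out part of the data for testing (test_size is an assumption).
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)  # predicted is_goal labels for the test set
print("test accuracy:", clf.score(X_test, y_test))
```

Fitting only on the training portion means the test-set accuracy gives a rough check on how the tree behaves on shots it has not seen.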
• STEP 8:
• Since the probability of scoring was asked for, I used the decision tree's predict_proba function to find the probability, which returned a NumPy array.
• The NumPy array was finally converted to a 1D NumPy array using the ravel() function.
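A sketch of how the probabilities might be extracted. `predict_proba` returns one column per class, so calling `ravel()` on the full array would interleave the two classes; this sketch therefore assumes the column for class 1 (goal) is selected first. The synthetic data and `max_depth=3` are also assumptions, not settings taken from the slides.

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in data: one hypothetical feature and a noisy 0/1 target.
rng = np.random.default_rng(0)
X = rng.uniform(1, 40, size=(200, 1))
y = (X[:, 0] + rng.normal(0, 5, size=200) < 15).astype(int)

# max_depth is an assumption; a shallow tree keeps the leaves impure, so the
# probabilities are not all exactly 0 or 1.
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

proba = clf.predict_proba(X)  # shape: (n_samples, n_classes)

# Locate the column that belongs to class 1 (goal) and flatten it to 1D.
goal_col = list(clf.classes_).index(1)
goal_proba = proba[:, [goal_col]].ravel()

print(proba.shape, goal_proba.shape)
```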
• STEP 9:
• The required output was collected separately in a dataframe and saved to a CSV file named as per the instructions.
• The entire programming document was also collected. Here is a sample output.
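The export code itself is not shown in the slides, and the block below is not the author's sample output. It is a minimal sketch of writing such a file, where the `shot_id` column, the values, and the file name `submission.csv` are all hypothetical placeholders.

```python
import os
import tempfile

import numpy as np
import pandas as pd

# Hypothetical identifiers and probabilities standing in for the real results.
shot_ids = [101, 102, 103]
goal_proba = np.array([0.8, 0.25, 0.5])

# Collect the required output in its own dataframe.
submission = pd.DataFrame({"shot_id": shot_ids, "is_goal": goal_proba})

# Placeholder file name; the real name was dictated by the instructions.
out_path = os.path.join(tempfile.mkdtemp(), "submission.csv")
submission.to_csv(out_path, index=False)  # index=False omits the row index

# Read the file back to confirm what was written.
check = pd.read_csv(out_path)
print(check)
```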
• I am really happy that I got to solve a real-life case study using my programming skills. Whether the answer I got was right or wrong, I really pushed myself to the limit, especially in my thought process.
• Thank you for giving me such a wonderful experience.
