The document describes the steps taken to predict Cristiano Ronaldo's goal scoring using decision tree classification on a CSV dataset. The steps include: 1) reading the CSV file into a Jupyter notebook, 2) cleaning the data by replacing missing values and one-hot encoding categorical variables, 3) splitting the data into training and test sets, 4) using a decision tree classifier to predict goal scoring and probabilities, 5) outputting the results to a new CSV file. The process provided a valuable learning experience in applying data science skills to a real-world problem.
Data Science Cr7
BY: VENKAT S RAGHAVAN
CRISTIANO RONALDO, one of the greatest players in football, is always a delight to watch, and I, being one of his greatest fans, am privileged to predict his goal scoring using data science.

STEP 1: Since the data was given to me in a CSV file, I decided to use Python 3, as I am comfortable with data analysis and visualization using the pandas library. I used a Jupyter notebook for programming in Python and for data manipulation.

STEP 2: I read the CSV file in Python and then looked at the shape of the data set, since it gives a brief overview of how many rows and columns there are. Then I replaced all the missing values with 0 for the time being, since machine learning algorithms cannot run when there are missing values in the training set (this is an alternative to dropping rows).

STEP 3: I then created dummy variables for all the string-type columns using the one-hot encoding method and removed the duplicate columns to avoid redundancy. Next, I replaced the 0 values with the column mean. This was done only in columns where the data was already in float format; in the other columns it fails, because boolean values cannot be converted to float.

STEP 4: I chose the 'is_goal' column as the y variable and all the remaining columns as the X variable for predicting the outcome. For a football player every parameter matters while playing, so I used every column for learning.

STEP 5: I imported the decision tree classifier from scikit-learn and split the data set into training and test sets using train_test_split from the model_selection module. I then trained the classifier and finally predicted the values of the is_goal column.
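STEPS 2 to 5 can be sketched roughly as below. This is only a minimal illustration and not the original notebook: the data frame is a small synthetic stand-in for the CSV file, the column names distance_of_shot and area_of_shot are hypothetical (only is_goal comes from the write-up), missing values are filled in numeric columns only to keep the dtypes clean, and "mean" is assumed to be the mean of the non-zero entries.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical stand-in for pd.read_csv(...); only `is_goal` is named in the write-up.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "distance_of_shot": rng.uniform(5.0, 40.0, n),               # hypothetical float column
    "area_of_shot": rng.choice(["left", "center", "right"], n),  # hypothetical string column
    "is_goal": rng.integers(0, 2, n),                            # target from STEP 4
})
df.loc[df.index % 10 == 0, "distance_of_shot"] = np.nan  # simulate missing values
print(df.shape)  # STEP 2: quick overview of rows and columns

# STEP 2: replace missing values with 0 (numeric columns only in this sketch).
num_cols = df.select_dtypes(include="number").columns
df[num_cols] = df[num_cols].fillna(0)

# STEP 3: one-hot encode the string column and drop duplicated column names.
df = pd.get_dummies(df, columns=["area_of_shot"])
df = df.loc[:, ~df.columns.duplicated()]

# STEP 3: in float columns only, swap the placeholder 0 for the mean of the
# non-zero entries (assumed reading of "mean values").
for col in df.select_dtypes(include="float64").columns:
    col_mean = df.loc[df[col] != 0, col].mean()
    df[col] = df[col].replace(0, col_mean)

# STEP 4: target is `is_goal`; every other column is a feature.
y = df["is_goal"]
X = df.drop(columns="is_goal")

# STEP 5: split, train the decision tree, and predict `is_goal` on the test set.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
clf = DecisionTreeClassifier(random_state=0)
clf.fit(X_train, y_train)
y_pred = clf.predict(X_test)
```

The test_size and random_state values are arbitrary choices for the sketch; the write-up does not say which split was used.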
STEP 8: Since the probability of scoring was asked for, I used the decision tree's predict_proba function to find the probability, which returned a numpy array. This array was finally converted to a 1D numpy array using the function ravel().

STEP 9: The required output was collected separately in a dataframe and saved to a CSV file named as per the instructions. The entire programming document was also collected.

Sample output: [figure not available]

I am really happy that I got to solve a real-life case study using my programming skills. Whether the answer I got was right or wrong, I really pushed myself to the limits, especially in my thought process. Thank you for giving me such a wonderful experience.
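STEPS 8 and 9 might look roughly like the sketch below. It is an illustration under stated assumptions, not the original code: the features are synthetic, the names distance_of_shot, angle, shot_id and predictions.csv are hypothetical, and the goal-class column is selected before ravel() because predict_proba returns one column per class.

```python
import os
import tempfile

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

# Hypothetical, already-cleaned shot data; only `is_goal` is named in the write-up.
rng = np.random.default_rng(0)
distance = np.linspace(5.0, 40.0, 200)
X = pd.DataFrame({
    "distance_of_shot": distance,            # hypothetical feature
    "angle": rng.uniform(0.0, 90.0, 200),    # hypothetical feature
})
y = pd.Series((distance < 15.0).astype(int), name="is_goal")

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
# max_depth=3 is an assumption made for this sketch, not a setting from the write-up.
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

# STEP 8: predict_proba returns shape (n_samples, n_classes).
proba = clf.predict_proba(X_test)
goal_idx = list(clf.classes_).index(1)        # column holding P(is_goal == 1)
goal_proba = proba[:, [goal_idx]].ravel()     # (n_samples, 1) flattened to 1D

# STEP 9: collect the output in a dataframe and write it to a CSV file.
result = pd.DataFrame({"shot_id": X_test.index, "is_goal": goal_proba})
out_path = os.path.join(tempfile.gettempdir(), "predictions.csv")  # hypothetical file name
result.to_csv(out_path, index=False)
```

One thing worth knowing: a decision tree grown without limits tends to have pure leaves, so its predicted probabilities are usually exactly 0 or 1. Limiting the depth, as assumed here, is one way to get intermediate values.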