
Data Analytics Assignment 0902CS211008

The document outlines a data analysis project on the Cricket World Cup dataset, detailing steps for data loading, exploration, cleaning, and analysis. Key findings include insights on team performance metrics such as average possession and goals scored, as well as player demographics and their correlation with performance. The analysis aims to inform strategic decisions for teams and enhance understanding of cricket dynamics.

Uploaded by

cse211009

Rustamji Institute Of Technology

BSF Academy Tekanpur

Data Analytics Lab (CS 605)

Submitted by
Amit Sharma (0902CS211008)

B.Tech. Computer Science & Engineering, 6th semester

(2021-2025 Batch)

Subject Teacher: Dr. Jagdish Makhijani Sir

File Checked By: Mr. Yashwant Pathak Sir


1. Load the data:

Use the read.csv function to load the downloaded CSV file into an R data frame
named world_cup_stats.
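
As a minimal sketch of the loading step: the file name and the column names (Squad, Player, Age, Min, Gls, Ast, Poss) below are assumptions made for illustration, since the actual file layout is not shown here. A small temporary CSV stands in for the downloaded file so the example runs on its own.

```r
# Hypothetical sample file standing in for the downloaded CSV.
# The column names are assumptions, not taken from the real dataset.
csv_path <- tempfile(fileext = ".csv")
writeLines(c(
  "Squad,Player,Age,Min,Gls,Ast,Poss",
  "Team A,P1,24,350,3,1,55.2",
  "Team A,P2,31,120,0,2,55.2",
  "Team B,P3,27,410,2,0,48.7"
), csv_path)

# Load the CSV into a data frame; keep text columns as character vectors.
world_cup_stats <- read.csv(csv_path, stringsAsFactors = FALSE)

# Show the first rows to confirm the file was parsed as expected.
head(world_cup_stats)
```

With the real file, csv_path would simply be replaced by the path to the downloaded CSV.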

2. Explore the data:


o Use the str(world_cup_stats) function to get an overview of the data frame,
including data types and number of observations and variables.
o Use the summary(world_cup_stats) function to get summary statistics for each
numeric variable (e.g., average, minimum, maximum).

o Examine the data for missing values using the is.na(world_cup_stats) function. If
missing values are present, decide how to handle them (e.g., remove rows, impute
values).

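o Omit rows that contain missing values using na.omit(world_cup_stats).

o Ignore missing values when computing a statistic, for example
mean(world_cup_stats$columnName, na.rm = TRUE). This does not remove anything
from the data frame; it only skips the NA entries in that calculation.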
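
The exploration steps above can be sketched as follows. The small data frame and its columns (Squad, Age, Gls) are hypothetical stand-ins for the real data. Note that is.na() on a whole data frame returns a large logical matrix, so counting the NAs per column with colSums() is usually easier to read.

```r
# Hypothetical data with a few missing values (column names are assumptions).
world_cup_stats <- data.frame(
  Squad = c("Team A", "Team A", "Team B", "Team B"),
  Age   = c(24, NA, 27, 30),
  Gls   = c(3, 0, NA, 1)
)

str(world_cup_stats)      # structure: column types and dimensions
summary(world_cup_stats)  # per-column summaries, including NA counts

# Count missing values per column instead of printing the full logical matrix.
na_counts <- colSums(is.na(world_cup_stats))
print(na_counts)

# Option 1: drop every row that has at least one missing value.
complete_rows <- na.omit(world_cup_stats)

# Option 2: keep all rows and skip NAs only inside the calculation.
mean_age <- mean(world_cup_stats$Age, na.rm = TRUE)
print(mean_age)
```

Which option is appropriate depends on how many rows would be lost; dropping rows is simplest, but it discards the non-missing values in those rows as well.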
3. Data Cleaning and Preparation:
o If needed, remove rows with missing values or impute them using appropriate
methods.
o Check for outliers in the data using boxplots or other methods. Decide how to
handle outliers, if necessary.

> set.seed(123)                   # make the random draw reproducible
> x <- rnorm(1000)                # 1000 values from a standard normal distribution
> x[1:5] <- c(7, 10, 16, -5, 23)  # insert five artificial outliers
> x                               # print the full vector, including the inserted outliers
> boxplot(x)                      # boxplot of the data with outliers
> x_out_rm <- x[!x %in% boxplot.stats(x)$out]  # drop the values flagged by boxplot.stats()
> length(x) - length(x_out_rm)    # number of outliers removed
[1] 12
> boxplot(x_out_rm)               # boxplot after removing the outliers
o Consider transforming the data if needed (e.g., converting categorical variables to
numeric).
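
The imputation and transformation items in this step might look like the sketch below. It uses mean imputation and integer factor codes as one possible choice each, not as the only valid methods, and the data frame and its columns (Squad, Age) are hypothetical.

```r
# Hypothetical data: one missing age and a categorical Squad column.
world_cup_stats <- data.frame(
  Squad = c("Team A", "Team B", "Team A", "Team C"),
  Age   = c(24, NA, 30, 27),
  stringsAsFactors = FALSE
)

# Mean imputation: replace each missing age with the mean of the observed ages.
age_mean <- mean(world_cup_stats$Age, na.rm = TRUE)
world_cup_stats$Age[is.na(world_cup_stats$Age)] <- age_mean

# Convert the categorical column to integer codes.
# factor() orders the levels alphabetically, so "Team A" becomes 1.
world_cup_stats$SquadCode <- as.integer(factor(world_cup_stats$Squad))

print(world_cup_stats)
```

Integer codes imply an ordering between categories that may not exist, so they are best kept for grouping and plotting rather than fed into a model as a numeric predictor.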
4. Data Analysis:

a) Team Analysis:

* Calculate and display the average possession, goals scored, and assists for each
team.
* Identify the top 5 teams with the highest average possession.

* Identify the top 3 teams with the highest average number of goals scored per game.
* For each team, visualize the distribution of goals scored using a histogram or
boxplot.
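
A minimal sketch of the team analysis, assuming one row per team per game and hypothetical column names (Squad, Poss, Gls, Ast). The real dataset may be organised differently, in which case the grouping column would need to change.

```r
# Hypothetical per-game data (column names and values are assumptions).
world_cup_stats <- data.frame(
  Squad = c("Team A", "Team A", "Team B", "Team B", "Team C", "Team C"),
  Poss  = c(60, 58, 45, 47, 52, 50),
  Gls   = c(2, 3, 0, 1, 1, 1),
  Ast   = c(1, 2, 0, 1, 1, 0)
)

# Average possession, goals and assists for each team.
team_avg <- aggregate(cbind(Poss, Gls, Ast) ~ Squad,
                      data = world_cup_stats, FUN = mean)
print(team_avg)

# Top 5 teams by average possession (fewer rows if there are fewer teams).
top_possession <- head(team_avg[order(-team_avg$Poss), ], 5)

# Top 3 teams by average goals per game.
top_goals <- head(team_avg[order(-team_avg$Gls), ], 3)

# Distribution of goals per game for each team, one box per team.
boxplot(Gls ~ Squad, data = world_cup_stats,
        main = "Goals per game by team", xlab = "Team", ylab = "Goals")
```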
b) Player Analysis:

* If a "Players" column exists, calculate and display the average age for each
team.

* Filter players who played more than a certain number of minutes (e.g., 300 minutes)
and calculate the average age for this group.

* Visualize the relationship between player age and goals scored using a scatter plot.
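
The player analysis could be sketched along these lines. The data frame and its columns (Squad, Player, Age, Min, Gls) are hypothetical, and the 300-minute threshold is the example value from the task description.

```r
# Hypothetical per-player data (column names and values are assumptions).
players <- data.frame(
  Squad  = c("Team A", "Team A", "Team B", "Team B"),
  Player = c("P1", "P2", "P3", "P4"),
  Age    = c(22, 30, 26, 33),
  Min    = c(350, 120, 400, 310),
  Gls    = c(3, 0, 2, 1)
)

# Only compute per-team ages if player-level data is actually present.
if ("Player" %in% names(players)) {
  avg_age_by_team <- aggregate(Age ~ Squad, data = players, FUN = mean)
  print(avg_age_by_team)
}

# Keep players with more than 300 minutes and average their age.
min_minutes <- 300
regulars <- subset(players, Min > min_minutes)
avg_age_regulars <- mean(regulars$Age)
print(avg_age_regulars)

# Scatter plot of age against goals scored.
plot(players$Age, players$Gls,
     xlab = "Player age", ylab = "Goals scored",
     main = "Player age vs. goals scored")
```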
5. Reporting:

* Write a summary report outlining your findings and insights from the data analysis.

Summary Report:

Introduction: The objective of this analysis was to explore and derive insights from the Cricket
World Cup dataset. The dataset contains information about teams, players, possessions, goals
scored, assists, minutes played, and player age.

Key Findings:

1. Top Teams by Average Possession:


● The top 5 teams with the highest average possession were identified.

● Team X had the highest average possession, followed by Team Y and Team Z.
2. Top Teams by Average Goals Scored:
● The top 3 teams with the highest average number of goals scored per game were
determined.
● Team A had the highest average goals scored, followed by Team B and Team C.
3. Average Age of Players:
● The average age for each team's players was calculated.

● Team M had the highest average age among its players, while Team N had the lowest.
4. Relationship Between Player Age and Goals Scored:
● A scatter plot was created to visualize the relationship between player age and goals
scored.
● There appears to be a slight negative correlation between player age and goals scored, as
younger players tend to score more goals on average.
5. Players with More Than 300 Minutes Played:
● Players who played more than 300 minutes were filtered from the dataset.

● The average age of these players was calculated to understand the age distribution of more
active players.

Insights:

● The analysis revealed the top-performing teams based on possession and goals scored, providing
valuable insights into team performance strategies.
● Understanding the average age of players can help teams in recruitment and strategy planning,
ensuring a balanced mix of experienced and young talent.
● The negative correlation between player age and goals scored suggests the importance of youth
and agility in scoring goals, but also highlights the value of experience in other aspects of the game.

Conclusion: Through this analysis, we gained valuable insights into team performance, player
demographics, and the relationship between player age and performance metrics. These findings
can inform strategic decisions for teams and provide valuable insights for coaches, analysts, and
stakeholders in the cricket world.

Overall, the analysis contributes to a deeper understanding of the dynamics within cricket teams
and can guide future efforts in talent management, player development, and team strategy
formulation.
● Include relevant visualizations (histograms, boxplots, scatterplots) to support
your conclusions.

Introduction: The objective of this analysis was to explore and derive insights from the Cricket
World Cup dataset. The dataset contains information about teams, players, possessions, goals
scored, assists, minutes played, and player age.

Key Findings:

1. Top Teams by Average Possession:


● The top 5 teams with the highest average possession were identified.

● Team X had the highest average possession, followed by Team Y and Team Z.
Figure 1: Boxplot showing the distribution of possession by team.
2. Top Teams by Average Goals Scored:
● The top 3 teams with the highest average number of goals scored per game were
determined.
● Team A had the highest average goals scored, followed by Team B and Team C.
3. Average Age of Players:
● The average age for each team's players was calculated.

● Team M had the highest average age among its players, while Team N had the lowest.
Figure 2: Histogram showing the distribution of player age.
4. Relationship Between Player Age and Goals Scored:
● A scatter plot was created to visualize the relationship between player age and goals
scored.
● There appears to be a slight negative correlation between player age and goals scored, as
younger players tend to score more goals on average.
Figure 3: Scatter plot showing the relationship between player age and goals scored.
5. Players with More Than 300 Minutes Played:
● Players who played more than 300 minutes were filtered from the dataset.

● The average age of these players was calculated to understand the age distribution of more
active players.

Insights:
● The analysis revealed the top-performing teams based on possession and goals scored, providing
valuable insights into team performance strategies.
● Understanding the average age of players can help teams in recruitment and strategy planning,
ensuring a balanced mix of experienced and young talent.
● The negative correlation between player age and goals scored suggests the importance of youth
and agility in scoring goals, but also highlights the value of experience in other aspects of the game.

Conclusion: Through this analysis, we gained valuable insights into team performance, player
demographics, and the relationship between player age and performance metrics. These findings
can inform strategic decisions for teams and provide valuable insights for coaches, analysts, and
stakeholders in the cricket world.

Overall, the analysis contributes to a deeper understanding of the dynamics within cricket teams
and can guide future efforts in talent management, player development, and team strategy
formulation.

● Explore the relationship between different variables (e.g., possession and goals
scored).
● Perform basic statistical tests to compare means or proportions between
groups (e.g., compare average possession between European and South
American teams).

● Create additional visualizations to enhance your report.
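
For the relationship and group-comparison items above, a minimal sketch might use cor() and a two-sample t-test. The Region column, the team names and all values below are hypothetical; the real dataset may not record a region at all, in which case it would have to be added by hand.

```r
# Hypothetical team-level data (all names and values are assumptions).
teams <- data.frame(
  Squad  = paste("Team", LETTERS[1:8]),
  Region = rep(c("Europe", "South America"), each = 4),
  Poss   = c(58, 55, 52, 60, 49, 51, 47, 53),
  Gls    = c(2.1, 1.8, 1.5, 2.4, 1.2, 1.6, 1.0, 1.7)
)

# Pearson correlation between average possession and average goals.
poss_gls_cor <- cor(teams$Poss, teams$Gls)
print(poss_gls_cor)

# Welch two-sample t-test (the t.test default) comparing mean possession
# between the two regions.
poss_test <- t.test(Poss ~ Region, data = teams)
print(poss_test$p.value)

# Scatter plot of the same relationship.
plot(teams$Poss, teams$Gls,
     xlab = "Average possession (%)", ylab = "Average goals per game",
     main = "Possession vs. goals scored")
```

With only a handful of teams per group the test has little power, so a non-significant p-value here would not show that the regions are the same.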

We can create histograms to visualize the distributions of goals scored by
different squads.
