Astros

Uploaded by

karthikeyan R

Houston Astros Questionnaire

1) Given a set of hitting performance metrics, how do you choose which, if any, of the metrics are useful for evaluating players? What
properties do you think a good measure of player performance should exhibit? Note that we are not asking for specific baseball metrics
you like, but rather a general approach to identifying whether a metric is useful.

EXPLANATION: Given a dataset of hitting performance metrics, we can identify which metrics/attributes are useful for our analysis through
Exploratory Data Analysis (EDA).
Before that, we need to ensure that the data is clean and free of erroneous values.
This is done by handling missing values, detecting outliers, and scaling the attributes for uniformity.
OBJECTIVE: To showcase various methods to approach feature selection.
The basic idea in evaluating the metrics is to detect which metrics have high covariance and correlation with the desired
objective/output.
Covariance: It shows whether two variables tend to change together and in which direction. Its magnitude depends on the units of the
variables, so it is hard to compare across metrics.
Correlation: It is covariance rescaled to lie between -1 and 1, so it shows both the strength and the direction of the relationship, i.e.,
whether the variables move together positively or negatively.
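
To make the difference between the two measures concrete, here is a minimal sketch in plain Python. The helper names `covariance` and `correlation` and the rescaled `batting_points` list are assumptions introduced for illustration, and the values are illustrative rather than real player data. It shows that changing the units of a metric changes the covariance but leaves the correlation untouched.

```python
from math import sqrt

def covariance(xs, ys):
    """Sample covariance (divides by n - 1)."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / (n - 1)

def correlation(xs, ys):
    """Pearson correlation: covariance divided by both standard deviations."""
    return covariance(xs, ys) / sqrt(covariance(xs, xs) * covariance(ys, ys))

# Illustrative values only (not real player data).
batting_avg = [0.320, 0.275, 0.305, 0.290, 0.310]
rbis = [90, 70, 85, 80, 88]

# Express the same batting averages in "points" (.320 -> 320).
batting_points = [avg * 1000 for avg in batting_avg]

cov_raw = covariance(batting_avg, rbis)
cov_points = covariance(batting_points, rbis)
corr_raw = correlation(batting_avg, rbis)
corr_points = correlation(batting_points, rbis)

print(f"covariance:  {cov_raw:.4f} vs {cov_points:.1f}")
print(f"correlation: {corr_raw:.4f} vs {corr_points:.4f}")
```

Because correlation is unit-free, it is usually the more convenient of the two for ranking candidate metrics against a target.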
I have taken a small sample dataset to show this in more detail.

import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
# Creating a small sample dataset with the following attributes
example = {
'player': ['Player1', 'Player2', 'Player3', 'Player4', 'Player5'],
'batting_avg': [0.320, 0.275, 0.305, 0.290, 0.310],
'on_base_pct': [0.400, 0.350, 0.380, 0.370, 0.390],
'slugging_pct': [0.550, 0.450, 0.500, 0.480, 0.520],
'RBIs': [90, 70, 85, 80, 88],
'runs': [100, 85, 95, 90, 98],
'stolen_bases': [5, 15, 8, 7, 12],
'age': [27,21,20,28,25]
}
# creating the dataframe
df = pd.DataFrame(example)
sns.pairplot(df.drop('player', axis=1))
plt.suptitle('Pairplot of Hitting Metrics', y=1.02)
plt.show()
# Correlation Analysis and visualization of heatmap
correlation_matrix = df.drop('player', axis=1).corr()
print("Correlation Matrix:")
print(correlation_matrix)
plt.figure(figsize=(10, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix of Hitting Metrics')
plt.show()

Correlation Matrix:
batting_avg on_base_pct slugging_pct RBIs runs \
batting_avg 1.000000 0.992540 0.984185 0.982647 0.995701
on_base_pct 0.992540 1.000000 0.989813 0.986056 0.991678
slugging_pct 0.984185 0.989813 1.000000 0.953463 0.978235
RBIs 0.982647 0.986056 0.953463 1.000000 0.984983
runs 0.995701 0.991678 0.978235 0.984983 1.000000
stolen_bases -0.648027 -0.663151 -0.650462 -0.668256 -0.590085
age 0.337312 0.444936 0.442146 0.398734 0.337700

stolen_bases age
batting_avg -0.648027 0.337312
on_base_pct -0.663151 0.444936
slugging_pct -0.650462 0.442146
RBIs -0.668256 0.398734
runs -0.590085 0.337700
stolen_bases 1.000000 -0.545600
age -0.545600 1.000000
We can see that, compared to the other features, stolen bases and age have a weaker correlation with RBIs (-0.67 and 0.40, versus 0.95
or higher for the rest), which indicates they have less relevance to our target variable.

Filter Methods:

Filter methods select features based on their scores in statistical tests for their correlation with the outcome variable. Examples
include the chi-squared test, information gain, and correlation coefficient scores. These methods are fast and straightforward but
they ignore the potential combined effect of individual features.

Wrapper Methods:

Wrapper methods consider the selection of a set of features as a search problem, where different combinations are prepared,
evaluated and compared to other combinations. A predictive model is used to evaluate a combination of features and assign a score
based on model accuracy. Examples of wrapper methods are recursive feature elimination and forward selection.

Embedded Methods:

Embedded methods learn which features best contribute to the accuracy of the model while the model is being created. The most
common type of embedded feature selection method is regularization. Regularization methods, also called penalization methods,
introduce additional constraints into the optimization of a predictive algorithm (like a regression algorithm)
that bias the model toward lower complexity (fewer coefficients).
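
As a rough illustration of how a penalty can push a coefficient to exactly zero, here is a minimal sketch of Lasso regression solved by coordinate descent in plain Python. The function names, the toy data, and the penalty value are all assumptions made for this example; in practice a library implementation would normally be used.

```python
def soft_threshold(value, penalty):
    """Shrink `value` toward zero by `penalty`; return exactly 0 inside the band."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def lasso_coordinate_descent(columns, y, penalty, n_iter=100):
    """Minimise (1/2n)*||y - Xb||^2 + penalty*||b||_1 for centred columns and y."""
    n = len(y)
    coefs = [0.0] * len(columns)
    for _ in range(n_iter):
        for j, col in enumerate(columns):
            # Residual with feature j's current contribution removed.
            partial = [
                y[i] - sum(coefs[k] * columns[k][i] for k in range(len(columns)) if k != j)
                for i in range(n)
            ]
            rho = sum(col[i] * partial[i] for i in range(n)) / n
            scale = sum(v * v for v in col) / n
            coefs[j] = soft_threshold(rho, penalty) / scale
    return coefs

# Toy, already-centred data (hypothetical): y is driven by x1, barely by x2.
x1 = [-2.5, -1.5, -0.5, 0.5, 1.5, 2.5]
x2 = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]
y = [2 * a + 0.1 * b for a, b in zip(x1, x2)]

coefs = lasso_coordinate_descent([x1, x2], y, penalty=0.5)
print(coefs)
```

With this penalty the coefficient of the weak feature `x2` is set to exactly zero while the coefficient of `x1` is only shrunk, which is the sense in which regularization performs feature selection during training.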

# Another example, where an embedded method is used to show feature selection through feature importances
from sklearn.model_selection import train_test_split
X = df.drop(['RBIs', 'player'], axis=1)
y = df['RBIs']
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Feature Importance using Random Forest

from sklearn.ensemble import RandomForestRegressor
rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model.fit(X, y)
feature_importances = rf_model.feature_importances_
print("Feature Importances:", feature_importances)
# Visualize Feature Importances
features = X.columns
plt.figure(figsize=(8, 6))
plt.barh(features, feature_importances)
plt.xlabel('Importance')
plt.ylabel('Feature')
plt.title('Feature Importance from Random Forest')
plt.show()

Feature Importances: [0.21061774 0.20053971 0.14464277 0.21270678 0.13329108 0.09820191]

Once we know the correlation values and feature importances, we can extract the important features, which is what feature selection means.
We drop the irrelevant features and focus on the important ones.

2) When developing a model to predict player performance, what methods do you employ to ensure that your model is useful on novel
data?

EXPLANATION: Predicting player performance is important for improving not just the individual player but also the overall
team performance.
OBJECTIVE: Discuss the methods that help a model work well on novel data.
Once we acquire the player's historical data, the first task is to clean it: removing outliers, imputing missing values with the
mean or median, and normalizing the values.
Randomly splitting the data into training and testing sets is the starting point for checking that the model we develop generalizes
to novel data (the test data or any new data added in the future).
Now, the question is: just because we split the data, does the model we develop work well on novel data? Definitely not!
We need to choose the model that best generalizes when predicting player performance, without undue bias. This can be done
by:

Feature selection using correlation, Lasso regression, or a built-in k-best feature selector.
Next, model selection: choosing the model that performs best with the selected features and suits the problem being solved.
Here we can use a regression model, a random forest, or a KNN regressor to learn which attributes contribute
to a player's performance.
We can use evaluation metrics suited to the analysis, such as mean squared error for regression or accuracy for classification.
To improve the model further, we perform hyperparameter tuning, for example of the learning rate or
similar factors that affect the training of the model.
Cross-validation helps in choosing the best hyperparameters by repeatedly holding out a validation set from the
training set and learning which parameter value suits the model best.
All of this is done to balance the bias-variance trade-off; otherwise the model may
overfit or underfit and perform poorly on novel data.
Lastly, once we have a balanced model, we evaluate it once on the test data and calculate metrics like mean squared error or
accuracy to know its performance.
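
The tuning-then-testing workflow described above can be sketched as follows. This is a minimal illustration in plain Python under stated assumptions: the data is synthetic, the model is a one-feature ridge regression, and the helper names (`fit_ridge`, `mse`, `cross_val_mse`) and candidate penalties are hypothetical choices made for the example.

```python
import random

def fit_ridge(xs, ys, lam):
    """Fit y = a + b*x with an L2 penalty `lam` on the slope b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / (sxx + lam)
    return mean_y - slope * mean_x, slope

def mse(model, xs, ys):
    """Mean squared error of an (intercept, slope) model."""
    a, b = model
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / len(xs)

def cross_val_mse(xs, ys, lam, k=5):
    """Average validation MSE over k folds of the training data."""
    idx = list(range(len(xs)))
    scores = []
    for fold in range(k):
        val = idx[fold::k]               # every k-th point is held out
        val_set = set(val)
        train = [i for i in idx if i not in val_set]
        model = fit_ridge([xs[i] for i in train], [ys[i] for i in train], lam)
        scores.append(mse(model, [xs[i] for i in val], [ys[i] for i in val]))
    return sum(scores) / k

# Synthetic data (hypothetical): y = 2x + 1 plus Gaussian noise.
rng = random.Random(42)
xs = [rng.uniform(0, 10) for _ in range(60)]
ys = [2 * x + 1 + rng.gauss(0, 1) for x in xs]

# Hold out the last 12 points as the final test set; tune only on the rest.
x_train, y_train, x_test, y_test = xs[:48], ys[:48], xs[48:], ys[48:]

candidates = [0.0, 1.0, 10.0, 100.0]
cv_scores = {lam: cross_val_mse(x_train, y_train, lam) for lam in candidates}
best_lam = min(cv_scores, key=cv_scores.get)

# Refit on all training data and evaluate once on the untouched test set.
final_model = fit_ridge(x_train, y_train, best_lam)
test_mse = mse(final_model, x_test, y_test)
print(f"best lambda = {best_lam}, test MSE = {test_mse:.3f}")
```

The key design choice is that the test set plays no part in choosing the hyperparameter, so the final test error is an honest estimate of performance on novel data.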

3) If you had batted ball exit speed data for hitters, with widely varying sample sizes, how would you estimate a hitter's true exit speed
skill level?

EXPLANATION: We are provided with batted ball exit speed data for hitters, with a different number of batted balls per hitter.
OBJECTIVE: To estimate a hitter's true exit speed skill level.
In this case, we focus on Bayesian methods, which provide a robust framework for estimating the true skill level. Bayesian
inference allows us to incorporate prior knowledge and update our beliefs with new data, taking into account the varying sample sizes:
a hitter with few batted balls is pulled more strongly toward the prior than a hitter with many.
This approach not only provides a more accurate estimate but also quantifies the uncertainty associated with the estimate.
To make this concrete, I consider a dataset with randomly generated exit speeds.
Let's say
Hitter 1: 10 samples, mean exit speed 90
Hitter 2: 50 samples, mean exit speed 95
Hitter 3: 30 samples, mean exit speed 92

from scipy.stats import norm

# Randomly generated data with different sample sizes and speed values for 3 different players.
exit_speeds = {
'hitter1':np.random.normal(loc=90, scale=5, size=10),
'hitter2':np.random.normal(loc=95, scale=8, size=50),
'hitter3':np.random.normal(loc=92, scale=6, size=30)
}
# Visualize exit speeds
plt.figure(figsize=(10, 6))
for player, speeds in exit_speeds.items():
sns.histplot(speeds, kde=True, label=player, stat='density', linewidth=0)
plt.title('Distribution of Exit Speeds')
plt.xlabel('Exit Speed (mph)')
plt.ylabel('Density')
plt.legend()
plt.show()

import scipy.stats as stats

import numpy as np
import matplotlib.pyplot as plt
# Setting up the Prior distribution parameters
mu_prior = 90
sigma_prior = 10
posterior_means = {}
# Function to calculate the posterior mean and variance
def calculate_posterior(data, mu_prior, sigma_prior):
n = len(data)
sample_mean = np.mean(data)
sample_variance = np.var(data, ddof=1)

# Posterior variance
posterior_variance = 1 / (n / sample_variance + 1 / sigma_prior**2)

# Posterior mean
posterior_mean = posterior_variance * (sample_mean * n / sample_variance + mu_prior / sigma_prior**2)

return posterior_mean, np.sqrt(posterior_variance)

# Calculate posterior for each hitter

for hitter, speeds in exit_speeds.items():
posterior_mean, posterior_std = calculate_posterior(speeds, mu_prior, sigma_prior)
posterior_means[hitter] = posterior_mean
print(f"{hitter}: Posterior Mean = {posterior_mean:.2f}, Posterior Std = {posterior_std:.2f}")

# Plotting posterior distributions

x = np.linspace(80, 105, 1000)
for hitter, speeds in exit_speeds.items():
posterior_mean, posterior_std = calculate_posterior(speeds, mu_prior, sigma_prior)
y = stats.norm.pdf(x, posterior_mean, posterior_std)
plt.plot(x, y, label=f'{hitter}: {posterior_mean:.2f} ± {posterior_std:.2f}')

# Labels and title

plt.xlabel('Exit Speed')
plt.ylabel('Density')
plt.title('Posterior Distributions of True Exit Speed Skill Levels')
plt.legend()
plt.grid(True)
plt.show()

hitter1: Posterior Mean = 89.53, Posterior Std = 2.29

hitter2: Posterior Mean = 95.94, Posterior Std = 1.08
hitter3: Posterior Mean = 92.64, Posterior Std = 1.02

With the above method, we can estimate a hitter's true exit speed skill level even when sample sizes vary.
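
To show how the sample size controls the amount of shrinkage, here is a minimal sketch of the same normal-normal update written as a weighted average. The function name `shrink_toward_prior`, the three hypothetical hitters, and the assumed per-ball variance are illustrative assumptions, not part of the analysis above.

```python
def shrink_toward_prior(sample_mean, n, sample_var, prior_mean, prior_sd):
    """Normal-normal update with the sampling variance treated as known.

    Returns the posterior mean and the weight placed on the hitter's own data.
    """
    data_precision = n / sample_var          # grows with the sample size
    prior_precision = 1 / prior_sd ** 2
    weight_on_data = data_precision / (data_precision + prior_precision)
    posterior_mean = weight_on_data * sample_mean + (1 - weight_on_data) * prior_mean
    return posterior_mean, weight_on_data

# Hypothetical hitters who all average 95 mph, but over different sample sizes.
prior_mean, prior_sd = 90, 10       # same prior as in the code above
sample_var = 15 ** 2                # assumed spread of a single batted ball
results = {n: shrink_toward_prior(95, n, sample_var, prior_mean, prior_sd)
           for n in (5, 50, 500)}

for n, (post_mean, weight) in results.items():
    print(f"n={n:3d}: weight on data = {weight:.2f}, posterior mean = {post_mean:.2f}")
```

The weight on the hitter's own data rises toward 1 as the sample grows, so estimates for small samples stay close to the prior while estimates for large samples follow the observed mean.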

4) Our General Manager has a miraculous encounter with a baseball genie who offers to conjure for the team a player to be our
designated hitter. The genie offers us the choice of one of two players, a. Force-Field Fred (magically guaranteed to walk every time he
comes up to bat), or b. Long-Ball Larry (magically guaranteed to homer in 25% of his plate appearances but to strikeout in the other 75%,
outcomes distributed over a random schedule over the full season).

Which player would you rather have on our team? Why? Are there any circumstances regarding the performance of the rest of our team’s
offense for which you would change your answer?

EXPLANATION: We need to compare the two players and state which one has the greater overall impact on the
team's success.
OBJECTIVE: We use Bayesian inference to update our belief about the expected runs created by each player, Force-Field Fred and
Long-Ball Larry, in the context of the team's performance.

Taking the prior belief into consideration, I assume a prior mean and observed runs for both players, perform the analysis, and find
the posterior distribution of runs.

import numpy as np
import scipy.stats as stats
import matplotlib.pyplot as plt
# Priors for runs created per plate appearance
# Force-Field Fred: Prior belief that reflects an average performance
mu_prior_fred = 0.40 # Example prior mean
sigma_prior_fred = 0.02 # Example prior standard deviation
# Long-Ball Larry: Prior belief considering the 25% home run rate
mu_home_run_larry = 0.40 # Expected runs created per PA due to home runs
mu_prior_larry = 0.25 * mu_home_run_larry # Prior mean, adjusted for 25% home run rate
sigma_prior_larry = 0.10 # Example prior standard deviation
# Observed data: runs created per plate appearance (hypothetical)
observed_runs_fred = np.array([0.11, 0.09, 0.12, 0.10, 0.11])
observed_runs_larry = np.array([0.28, 0.32, 0.30, 0.31, 0.29])

Using Bayes' Theorem to update the prior distribution with the observed runs, we get the posterior distribution, which reflects
our updated belief about each player's impact on run creation.

# Posterior parameters calculation function

def calculate_posterior(mu_prior, sigma_prior, observed_data):
n = len(observed_data)
sample_mean = np.mean(observed_data)
sample_variance = np.var(observed_data, ddof=1)

# Posterior variance
posterior_variance = 1 / (n / sample_variance + 1 / sigma_prior**2)
posterior_std = np.sqrt(posterior_variance)

# Posterior mean
posterior_mean = posterior_variance * (sample_mean * n / sample_variance + mu_prior / sigma_prior**2)

return posterior_mean, posterior_std

# Calculate posteriors
posterior_mean_fred, posterior_std_fred = calculate_posterior(mu_prior_fred, sigma_prior_fred, observed_runs_fred)
posterior_mean_larry, posterior_std_larry = calculate_posterior(mu_prior_larry, sigma_prior_larry, observed_runs_larry)

print(f"Force-Field Fred: Posterior Mean = {posterior_mean_fred:.4f}, Posterior Std = {posterior_std_fred:.4f}")

print(f"Long-Ball Larry: Posterior Mean = {posterior_mean_larry:.4f}, Posterior Std = {posterior_std_larry:.4f}")

Force-Field Fred: Posterior Mean = 0.1239, Posterior Std = 0.0049

Long-Ball Larry: Posterior Mean = 0.2990, Posterior Std = 0.0071

# Plotting posterior distributions

x = np.linspace(0, 0.5, 1000)
posterior_fred = stats.norm.pdf(x, posterior_mean_fred, posterior_std_fred)
posterior_larry = stats.norm.pdf(x, posterior_mean_larry, posterior_std_larry)

plt.plot(x, posterior_fred, label=f'Fred: {posterior_mean_fred:.4f} ± {posterior_std_fred:.4f}')

plt.plot(x, posterior_larry, label=f'Larry: {posterior_mean_larry:.4f} ± {posterior_std_larry:.4f}')

plt.xlabel('Runs Created per Plate Appearance')

plt.ylabel('Density')
plt.title('Posterior Distributions of Runs Created')
plt.legend()
plt.grid(True)
plt.show()

So ultimately I would take Long-Ball Larry rather than Force-Field Fred because of the runs he can create, as suggested above;
games also reward aggressive intent and taking chances.
As for circumstances, a team needs a mixture of players: a single player can win you matches, but a balanced team
can win you tournaments. Hence the team combination matters when choosing a player, but here Larry edges out
Fred.

5) Please answer the following question without using statistical software (using a calculator is acceptable).
Suppose a batter has a true-talent batting average of .300 (he is expected to record hits in 30% of any sample of his official at-bats).
How probable is it that he could record fewer than 100 hits in 600 at-bats?
a.) Greater than 50%
b.) Between 10% and 50%
c.) Between 1% and 10%
d.) Between .1% and 1%
e.) Less than .1%
In a few sentences, please describe how you reached your conclusion.

EXPLANATION: We need to find the probability of the batter recording fewer than 100 hits in 600 at-bats.
Let X be the number of hits the batter records in 600 at-bats.

OBJECTIVE: Each at-bat is either a hit or a miss, so we can model X with a binomial
distribution to arrive at the solution.

Number of at-bats (n) = 600
Probability of getting a hit (p) = 0.3

Mean = np = 600 × 0.3 = 180
Standard deviation = sqrt(np(1-p)) = sqrt(600 × 0.3 × (1 - 0.3)) = sqrt(126) = 11.22
Now, we need the cumulative probability P(X<100) = P(X=0) + P(X=1) + P(X=2) + ... + P(X=99), which is tedious to compute by hand,
so we approximate X with a normal distribution and standardize: z = (X - Mean) / Standard deviation
z = (100 - 180) / 11.22 => z = -7.13

From the z table, even z = -3.5 gives only P(z < -3.5) ≈ 0.0002, so for z = -7.13 the probability is far smaller still.

ANSWER: e.) Less than .1%
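
Although the question is meant to be answered by hand, the result can be double-checked afterwards. The following is a minimal sketch using only the Python standard library; it compares the exact binomial tail with the normal approximation used above. The variable names are chosen for this example.

```python
from math import comb, erfc, sqrt

n, p = 600, 0.3

# Exact binomial tail: P(X <= 99) = sum of P(X = k) for k = 0..99.
exact_tail = sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(100))

# Normal approximation, as in the hand calculation above.
mean = n * p
sd = sqrt(n * p * (1 - p))
z = (100 - mean) / sd
normal_tail = 0.5 * erfc(-z / sqrt(2))   # standard normal CDF at z

print(f"z = {z:.2f}")
print(f"exact tail  = {exact_tail:.3e}")
print(f"normal tail = {normal_tail:.3e}")
```

Both values fall many orders of magnitude below 0.1%, so the choice of option e does not depend on the accuracy of the normal approximation this far into the tail.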

6) 11% of MLB players throw left handed. 32% of MLB players hit left handed. 85% of MLB players who throw left handed hit left handed.
Player X hits left handed. What is the probability that player X throws left handed?

EXPLANATION:
Probability that a player throws left-handed (P_T) = 0.11
Probability that a player hits left-handed (P_H) = 0.32
Probability that a player hits left-handed given they throw left-handed (P_H_given_T) = 0.85
OBJECTIVE: To find the conditional probability that player X throws left-handed given that he hits left-handed, using Bayes' Theorem.

# Known probabilities from the question
P_T = 0.11
P_H = 0.32
P_H_given_T = 0.85

# Bayes' Theorem: P(T | H) = P(H | T) * P(T) / P(H)
P_T_given_H = (P_H_given_T * P_T) / P_H

# Display the result

print(f"The probability that a player throws left-handed given they hit left-handed is: {P_T_given_H:.4f}")

# Visualization: Plotting the probabilities

labels = ['Throws Left-Handed', 'Does Not Throw Left-Handed']  # both slices are among left-handed hitters
sizes = [P_T_given_H, 1 - P_T_given_H]
colors = ['skyblue', 'lightcoral']
explode = (0.1, 0) # explode the first slice
fig1, ax1 = plt.subplots()
ax1.pie(sizes, explode=explode, labels=labels, colors=colors, autopct='%1.2f%%',
shadow=True, startangle=90)
ax1.axis('equal') # Equal aspect ratio ensures that pie is drawn as a circle.
plt.title('Probability of Throwing Left-Handed Given Hitting Left-Handed')
plt.show()

The probability that a player throws left-handed given they hit left-handed is: 0.2922

7) Breakout hitter Lou Rice is off to a tremendous start to the season, with hits in 43 of his first 100 at-bats. Going into the season our
belief about Lou’s true-talent batting average, i.e., what Lou’s hits per at-bat rate would be in the limit, could be described as a Beta
distribution parameterized with α = 58 and β = 142. Given our beliefs at the outset of the season and Lou’s performance through his first
100 at-bats, what should we believe Lou’s chance to be batting over .400 (hits in at least 40% of his at-bats) after he’s accumulated 500
at-bats on the season in total? Describe your reasoning.

EXPLANATION: We are told that the hitter has 43 hits in his first 100 at-bats.
hits_observed = 43; total_at_bats = 100;
We are also given a prior belief about his batting average, described by a Beta distribution with α = 58 and β = 142.
So, Lou Rice's prior expected batting average is E(X) = α/(α+β) = 58/(58+142) = 0.29
OBJECTIVE: We need to find the probability that Lou's batting average will be at least .400 after he has accumulated 500 at-bats in total.

import scipy.stats as stats

# Prior belief (before the season)

alpha_prior = 58
beta_prior = 142

# Observation (first 100 at-bats)

hits_observed = 43
total_at_bats = 100

# Updating Beta distribution parameters considering the belief probability as well.

alpha_posterior = alpha_prior + hits_observed
beta_posterior = beta_prior + (total_at_bats - hits_observed)

# Calculating the probability of batting over .400 (hits in at least 40% of at-bats)
desired_batting_average = 0.400

# Computing the probability using the Beta distribution CDF,P(X>0.4)

probability_over_400 = 1 - stats.beta.cdf(desired_batting_average, alpha_posterior, beta_posterior)

print(f"The probability that Lou Rice will bat over .400 after 500 at-bats is: {probability_over_400:.4f}")

The probability that Lou Rice will bat over .400 after 500 at-bats is: 0.0115

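The value above is the posterior probability that Lou's true-talent average exceeds .400; his observed average after 500 at-bats also
depends on the outcomes of the remaining 400 at-bats. To get a better understanding, we can visualize the posterior distribution,
which combines our prior belief with the observed data, as follows.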

# Calculating CDF value at 0.400

cdf_at_desired_ba = stats.beta.cdf(desired_batting_average, alpha_posterior, beta_posterior)
probability_over_400 = 1 - cdf_at_desired_ba

# Generate x values
x = np.linspace(0, 1, 1000)
# Generate CDF values
y = stats.beta.cdf(x, alpha_posterior, beta_posterior)

# Plot the CDF

plt.figure(figsize=(10, 6))
plt.plot(x, y, label='Beta CDF', color='blue')

# Highlight the point at x = 0.400

plt.axvline(desired_batting_average, color='red', linestyle='dashed', linewidth=1, label=f'x = {desired_batting_avera
plt.axhline(cdf_at_desired_ba, color='green', linestyle='dashed', linewidth=1, label=f'CDF at x = {desired_batting_av

# Add text annotations

plt.text(desired_batting_average + 0.02, 0.5, f'CDF(0.400) = {cdf_at_desired_ba:.4f}', color='green')
plt.text(0.5, cdf_at_desired_ba + 0.02, f'1 - CDF(0.400) = {probability_over_400:.4f}', color='red')

# Labels and title

plt.xlabel('Batting Average')
plt.ylabel('CDF')
plt.title('Cumulative Distribution Function (CDF) for Lou Rice\'s Batting Average')
plt.legend()

# Show plot
plt.grid(True)
plt.show()
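
The posterior above describes Lou's true talent. The question asks about his observed average after 500 at-bats, which additionally depends on how the remaining 400 at-bats turn out. One way to account for that, sketched here under the assumption that the remaining at-bats are independent given his true talent, is the posterior predictive (beta-binomial) distribution for the number of hits still to come. The helper names are hypothetical and only the standard library is used.

```python
from math import exp, lgamma

def log_beta(a, b):
    """Natural log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)

def beta_binomial_pmf(k, n, a, b):
    """P(k hits in n at-bats) when the hit rate is Beta(a, b) distributed."""
    log_choose = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_choose + log_beta(k + a, n - k + b) - log_beta(a, b))

# Posterior after the first 100 at-bats: Beta(58 + 43, 142 + 57).
alpha_post, beta_post = 58 + 43, 142 + 57

remaining_at_bats = 500 - 100
hits_needed = 200 - 43          # .400 over 500 at-bats means at least 200 hits

prob_400_season = sum(
    beta_binomial_pmf(k, remaining_at_bats, alpha_post, beta_post)
    for k in range(hits_needed, remaining_at_bats + 1)
)
print(f"P(season average >= .400 after 500 at-bats) = {prob_400_season:.4f}")
```

This predictive probability answers a different question from the posterior tail probability: it includes the sampling noise of the remaining at-bats as well as the uncertainty about true talent, so the two numbers should not be expected to match.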
8) Over the last few seasons, there’s been a lot of discussion in and around baseball about the qualities of the ball itself. Many have
claimed that the ball has become ‘juiced’, causing batted balls to travel further in the air than they had in previous seasons and resulting
in notably higher home run rates. A prominent hypothesis as to the manner with which the ball changed concerns its drag coefficient.
Drag is a measure of a projectile’s sensitivity to air resistance opposite the direction it’s traveling. A projectile with a lower drag coefficient
will travel through the air further than a similar projectile with a higher drag coefficient, all else equal. Provided is a random sample of
major league batted ball data from the past five seasons. You’re tasked with analyzing the data to determine whether or not the ball
actually varied over these five seasons. Provide any code you used to generate your conclusion (a markdown or notebook file is
recommended!) and present your argument for when and how the ball varied. Illustrations/data visualizations to help communicate your
findings are encouraged. Please spend no more than 4 hours on this question. Fields:

1. year
2. month
3. pitcher_throws ('L' for left handed pitcher, 'R' for right handed pitcher)
4. bat_side ('L' for left handed batter, 'R' for right handed batter)
5. pitch_type ('FF' for four-seam fastball, 'FT' for two-seam fastball)
6. release_speed (magnitude of velocity of the pitch towards the plate at 50' in mph)
7. plate_speed (magnitude of velocity of pitch as it crosses the front of home plate in mph)
8. hit_exit_speed (magnitude of velocity of the batted ball upon contact in mph)
9. hit_spinrate (rate of rotation of the ball upon contact in rpm)
10. hit_vertical_angle (launch angle; direction of the ball off the bat on the vertical plane -- 0 degrees is parallel to the ground, positive is
up, negative is down, in degrees)
11. hit_bearing (direction from the tip of home plate to the initial landing position of the batted ball on the horizontal plane -- 0 degrees is
directly up the middle, positive is towards the 1st base side, negative is towards the 3rd base side, in degrees)

12. hit_distance (distance between the tip of home plate and the initial landing position of the batted ball in feet)

13. event_result (text field describing the category of batted ball event outcome)

EXPLANATION: We are provided with a dataset to analyze whether the ball has varied over the last 5 years.
OBJECTIVE: To compare hit_distance, estimate the drag coefficient, and compare both over the years to support the hypothesis
using Bayesian analysis methods.

#importing the necessary libraries

import numpy as np
import pandas as pd
import sklearn
import seaborn as sns
import matplotlib.pyplot as plt

#loading the dataset to a dataframe

data=pd.read_csv(r'C:\Users\karth\Downloads\data_sample.csv')

#getting an overview about the attributes and their count

data.info()

<class 'pandas.core.frame.DataFrame'>
RangeIndex: 5000 entries, 0 to 4999
Data columns (total 13 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 year 5000 non-null int64
1 month 5000 non-null int64
2 pitcher_throws 5000 non-null object
3 bat_side 5000 non-null object
4 pitch_type 5000 non-null object
5 release_speed 5000 non-null float64
6 plate_speed 5000 non-null float64
7 hit_exit_speed 4522 non-null float64
8 hit_spinrate 2939 non-null float64
9 hit_vertical_angle 4522 non-null float64
10 hit_bearing 4522 non-null float64
11 hit_distance 4522 non-null float64
12 event_result 5000 non-null object
dtypes: float64(7), int64(2), object(4)
memory usage: 507.9+ KB

data.describe()

year month release_speed plate_speed hit_exit_speed hit_spinrate hit_vertical_angle hit_bearing hit_distance

count 5000.000000 5000.000000 5000.000000 5000.000000 4522.000000 2939.000000 4522.000000 4522.000000 4522.000000

mean 2017.000000 6.555600 91.882837 84.985879 89.997665 2747.016428 11.998355 -0.310381 178.682747

std 1.414355 1.734217 2.817465 2.649820 13.984109 1287.734854 24.391452 26.953949 137.925585

min 2015.000000 3.000000 79.315483 73.368980 19.051146 414.808960 -73.101883 -179.131378 0.506874

25% 2016.000000 5.000000 90.176728 83.369785 82.775057 1682.917419 -3.684100 -18.847590 26.810730

50% 2017.000000 7.000000 91.971830 85.065407 92.808498 2574.025635 12.943902 -0.594337 188.074951

75% 2018.000000 8.000000 93.789923 86.806101 100.107733 3729.503418 28.040596 18.584233 303.585815

max 2019.000000 10.000000 103.396928 95.852409 117.753525 6855.195312 87.656059 176.065750 468.331482

#checking the null values

data.isnull().sum()

year 0
month 0
pitcher_throws 0
bat_side 0
pitch_type 0
release_speed 0
plate_speed 0
hit_exit_speed 478
hit_spinrate 2061
hit_vertical_angle 478
hit_bearing 478
hit_distance 478
event_result 0
dtype: int64

#dropping the null values

data=data.dropna()

#finding the covariance to find which features have an impact on hit_distance

data.cov(numeric_only=True)
year month release_speed plate_speed hit_exit_speed hit_spinrate hit_vertical_angle hit_bearing hit_distance

year 1.885955 -0.063778 0.311990 0.481316 0.107759 3.125765e+02 1.635863 -0.128138 5.741926

month -0.063778 3.039575 0.396104 0.519096 0.037471 4.889908e+01 0.637760 0.888197 -0.161634

release_speed 0.311990 0.396104 7.710609 7.016733 -0.957655 1.569088e+02 0.520724 -1.127049 1.233447

plate_speed 0.481316 0.519096 7.016733 6.803553 -0.785150 1.301968e+02 -0.723909 -0.141827 -0.085241

hit_exit_speed 0.107759 0.037471 -0.957655 -0.785150 161.774480 -2.974013e+03 -42.857555 6.045054 505.754846

hit_spinrate 312.576517 48.899077 156.908803 130.196805 -2974.013030 1.658261e+06 13059.063748 3406.279827 21993.413916

hit_vertical_angle 1.635863 0.637760 0.520724 -0.723909 -42.857555 1.305906e+04 247.574851 7.174752 709.790666

hit_bearing -0.128138 0.888197 -1.127049 -0.141827 6.045054 3.406280e+03 7.174752 689.055516 50.606922

hit_distance 5.741926 -0.161634 1.233447 -0.085241 505.754846 2.199341e+04 709.790666 50.606922 11114.233301

#defining the correlation matrix

correlation_matrix = data.corr(numeric_only=True)
plt.figure(figsize=(10, 6))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm')
plt.title('Correlation Matrix of Player Performance Metrics')
plt.show()

features = ['hit_exit_speed', 'hit_vertical_angle', 'hit_distance', 'hit_spinrate', 'plate_speed', 'release_speed'

sns.set(style="ticks")

#pairplot to visualize dependencies if any

sns.pairplot(data[features])
plt.suptitle('Pairplot of Baseball Batted Ball Variables', y=1.02)
plt.show()
# Plot release speed vs plate speed colored by year
plt.figure(figsize=(10, 6))
sns.scatterplot(x='release_speed', y='plate_speed', hue='year', data=data)
plt.title('Release Speed vs Plate Speed')
plt.xlabel('Release Speed (mph)')
plt.ylabel('Plate Speed (mph)')
plt.show()
# Plot hit exit speed vs hit distance colored by hit spinrate
plt.figure(figsize=(10, 6))
sns.scatterplot(x='hit_exit_speed', y='hit_distance', hue='hit_spinrate', data=data)
plt.title('Hit Exit Speed vs Hit Distance')
plt.xlabel('Hit Exit Speed (mph)')
plt.ylabel('Hit Distance (feet)')
plt.show()

From the plots, we can infer that there is a highly significant correlation between plate_speed and release_speed.
We can also see that hit_exit_speed and hit_vertical_angle have a significant correlation with hit_distance.
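
Because exit speed and launch angle drive distance so strongly, a year-to-year comparison of carry is easier to interpret when those two inputs are held roughly fixed. The following is a minimal sketch of that idea in plain Python. The records are synthetic toy values, not rows from the provided dataset, and the function name and band limits are assumptions chosen for illustration; the tuple fields stand in for year, hit_exit_speed, hit_vertical_angle and hit_distance.

```python
from collections import defaultdict
from statistics import mean

# Toy records (year, exit speed mph, launch angle deg, distance ft); synthetic values.
batted_balls = [
    (2015, 101.2, 27.0, 388.0), (2015, 99.5, 29.5, 379.0), (2015, 88.0, 5.0, 95.0),
    (2016, 100.8, 26.0, 392.0), (2016, 102.0, 30.0, 401.0), (2016, 75.0, 45.0, 210.0),
    (2017, 100.1, 28.0, 399.0), (2017, 101.5, 25.5, 405.0), (2017, 93.0, -4.0, 30.0),
]

def mean_distance_in_band(rows, speed_range=(98.0, 103.0), angle_range=(25.0, 30.0)):
    """Mean hit distance per year for balls hit within the given launch band."""
    by_year = defaultdict(list)
    for year, exit_speed, launch_angle, distance in rows:
        in_speed = speed_range[0] <= exit_speed <= speed_range[1]
        in_angle = angle_range[0] <= launch_angle <= angle_range[1]
        if in_speed and in_angle:
            by_year[year].append(distance)
    return {year: mean(distances) for year, distances in sorted(by_year.items())}

carry_by_year = mean_distance_in_band(batted_balls)
for year, avg_distance in carry_by_year.items():
    print(year, round(avg_distance, 1))
```

On real data, a systematic change in distance within a fixed launch band would be more suggestive of a change in the ball than a change in the overall average, which also reflects how hitters are swinging.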

# Plot hit distance over years

plt.figure(figsize=(10, 6))
sns.boxplot(x='year', y='hit_distance', data=data)
plt.title('Hit Distance Over Years')
plt.xlabel('Year')
plt.ylabel('Hit Distance')
plt.show()

It can be seen that the average hit distance increased marginally from 2015 to 2017 and then stayed roughly the same.
But this summarizes all event outcomes; let's analyze each outcome separately.

from sklearn.preprocessing import StandardScaler

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
# Step 3: Feature selection
# Select relevant features
features = ['release_speed', 'plate_speed', 'hit_exit_speed', 'hit_spinrate', 'hit_vertical_angle', 'hit_bearing'
target = 'hit_distance'

X = data[features]
y = data[target]

# Standardize the features

scaler = StandardScaler()
X_scaled = scaler.fit_transform(X)

# Step 4: Feature importance using Random Forest

rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
rf_model.fit(X_scaled, y)

# Get feature importances

importances = rf_model.feature_importances_
feature_importances = pd.DataFrame({'Feature': features, 'Importance': importances}).sort_values(by='Importance'

# Visualize feature importances

plt.figure(figsize=(10, 6))
sns.barplot(x='Importance', y='Feature', data=feature_importances)
plt.title('Feature Importances for Hitting Distance')
plt.show()
# Step 5: Analyze hitting distance over the years
data['year'] = pd.to_datetime(data['year'], format='%Y')
data['year'] = data['year'].dt.year

# Group by year and calculate mean hitting distance

mean_distance_per_year_event = data.groupby(['year', 'event_result'])['hit_distance'].mean().reset_index()

# Visualize hitting distance over the years

plt.figure(figsize=(12, 6))
sns.lineplot(data=mean_distance_per_year_event, x='year', y='hit_distance', hue='event_result', marker='o')
plt.title('Average Hitting Distance Over the Years')
plt.xlabel('Year')
plt.ylabel('Average Hitting Distance')
plt.grid(True)
plt.show()

# Conclusion
print("Mean Hitting Distance per Year:")
print(mean_distance_per_year)
Mean Hitting Distance per Year:
year
2015 373.854976
2016 387.742645
2017 292.457418
2018 358.723862
2019 307.871089
Name: hit_distance, dtype: float64

#visualizing for home_runs

home_run_data = data[data['event_result'] == 'home_run']

# Group by year and calculate mean hitting distance for home runs
mean_distance_per_year_home_run = home_run_data.groupby('year')['hit_distance'].mean().reset_index()

# Visualize hitting distance over the years for home runs

plt.figure(figsize=(14, 8))
sns.lineplot(data=mean_distance_per_year_home_run, x='year', y='hit_distance', marker='o')
plt.title('Average Hitting Distance Over the Years for Home Runs')
plt.xlabel('Year')
plt.ylabel('Average Hitting Distance')
plt.grid(True)
plt.show()

For home runs there is not much variation in hitting distance across the years. Home runs are long fly balls, so their distance is
highly sensitive to air resistance, and a change in the ball's drag coefficient should show up most clearly here.
A statistical test is still needed to confirm this, because for double_play the yearly means differ a lot.

# Visualizing for double plays

double_play_data = data[data['event_result'] == 'double_play']

# Group by year and calculate mean hitting distance for double plays
mean_distance_per_year_double_play = double_play_data.groupby('year')['hit_distance'].mean().reset_index()

# Visualize hitting distance over the years for double plays

plt.figure(figsize=(14, 8))
sns.lineplot(data=mean_distance_per_year_double_play, x='year', y='hit_distance', marker='o')
plt.title('Average Hitting Distance Over the Years for Double_Play')
plt.xlabel('Year')
plt.ylabel('Average Hitting Distance')
plt.grid(True)
plt.show()
The yearly means differ widely here, which may be caused by data imbalance (for example, few double_play rows in some years), so we turn to a statistical test.
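Before running a test, the imbalance suspicion can be inspected directly by counting rows per year and event type. This is a minimal sketch on synthetic data; the `demo` frame, the event probabilities, and the `min_rows` threshold are assumptions for illustration.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the notebook's `data` (all values are made up)
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    'year': rng.choice([2015, 2016, 2017, 2018, 2019], size=500),
    'event_result': rng.choice(['single', 'home_run', 'double_play'],
                               size=500, p=[0.70, 0.25, 0.05]),
    'hit_distance': rng.normal(250, 80, size=500),
})

# Rows per (year, event_result): sparsely populated cells make yearly means noisy
counts = pd.crosstab(demo['year'], demo['event_result'])
print(counts)

# List the cells with fewer than `min_rows` observations (arbitrary threshold)
min_rows = 10
cell_counts = counts.stack()
sparse_cells = cell_counts[cell_counts < min_rows]
print(sparse_cells)
```

If an event type shows only a handful of rows in some years, a large swing in its yearly mean is weak evidence on its own.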

from scipy import stats

mean_distance_per_year = data.groupby('year')['hit_distance'].mean()
variance_distance_per_year = data.groupby('year')['hit_distance'].var()

# Perform ANOVA test to see if there are significant differences between years
years = data['year'].unique()
grouped_distances = [data[data['year'] == year]['hit_distance'].values for year in years]
f_value, p_value = stats.f_oneway(*grouped_distances)
print("ANOVA test results: F-value =", f_value, ", P-value =", p_value)

# Conclusion
if p_value < 0.05:
    print("There is a significant difference in hitting distances over the years, suggesting that the ball might have changed.")
else:
    print("There is no significant difference in hitting distances over the years, suggesting that the ball might not have changed.")

print("\nMean Hitting Distance per Year for Home Runs:")

print(mean_distance_per_year_home_run)

ANOVA test results: F-value = 3.1858058293520846 , P-value = 0.012728353516836105

There is a significant difference in hitting distances over the years, suggesting that the ball might have changed.

Mean Hitting Distance per Year for Home Runs:

year hit_distance
0 2015 408.342609
1 2016 394.806732
2 2017 405.576999
3 2018 397.460663
4 2019 401.684965

The test indicates a significant difference overall (p ≈ 0.013), so we next check the individual event types.
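A small p-value only says the yearly means are unlikely to be identical; with many rows, even a tiny difference can be significant. One common complement is the eta-squared effect size, the share of total variance explained by the grouping. The sketch below uses synthetic groups; `eta_squared` and `demo_groups` are hypothetical names, not part of the original notebook.

```python
import numpy as np
from scipy import stats

def eta_squared(groups):
    """Share of total variance explained by group membership (between 0 and 1)."""
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    ss_between = sum(len(g) * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_total = ((all_values - grand_mean) ** 2).sum()
    return ss_between / ss_total

# Five synthetic "years" whose means differ only slightly (made-up numbers)
rng = np.random.default_rng(1)
demo_groups = [rng.normal(loc=300 + shift, scale=100, size=2000)
               for shift in (0, 2, 4, 2, 0)]

f_value, p_value = stats.f_oneway(*demo_groups)
print("F =", f_value, "p =", p_value, "eta^2 =", eta_squared(demo_groups))
```

In the notebook, the same function could be called on `grouped_distances` right after `stats.f_oneway`.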

event_type = 'home_run'
event_data = data[data['event_result'] == event_type]
mean_distance_per_year_event = event_data.groupby('year')['hit_distance'].mean().reset_index()

# Statistical Analysis: Hypothesis Testing

# Calculate mean and variance of hitting distances per year for the specific event
mean_distance_per_year = event_data.groupby('year')['hit_distance'].mean()
variance_distance_per_year = event_data.groupby('year')['hit_distance'].var()

# Perform ANOVA test to see if there are significant differences between years for the specific event
years = event_data['year'].unique()
grouped_distances = [event_data[event_data['year'] == year]['hit_distance'].values for year in years]
f_value, p_value = stats.f_oneway(*grouped_distances)
print(f"ANOVA test results for {event_type}: F-value =", f_value, ", P-value =", p_value)

# Conclusion
if p_value < 0.05:
    print(f"There is a significant difference in hitting distances over the years for {event_type}, suggesting that the ball might have changed.")
else:
    print(f"There is no significant difference in hitting distances over the years for {event_type}, suggesting that the ball might not have changed.")

ANOVA test results for home_run: F-value = 1.8153742795351 , P-value = 0.12679220333038124

There is no significant difference in hitting distances over the years for home_run, suggesting that the ball might not have changed.

For home runs the ANOVA finds no significant year-to-year difference (p ≈ 0.127), so this test gives no evidence that the ball has changed.

event_type = 'triple'
event_data = data[data['event_result'] == event_type]
mean_distance_per_year_event = event_data.groupby('year')['hit_distance'].mean().reset_index()

# Statistical Analysis: Hypothesis Testing

ANOVA test results for triple: F-value = 2.370830064743944 , P-value = 0.07853673370434733

There is no significant difference in hitting distances over the years for triple, suggesting that the ball might not have changed.

A triple, which needs to be hit with a good vertical angle and enough power to carry through the air, also shows no
significant change (p ≈ 0.079).
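The per-event checks above repeat the same steps, so they can be wrapped in one helper that loops over every event type. This is a minimal sketch under assumptions: `anova_by_event` is a hypothetical helper, `demo` is synthetic, and the Bonferroni column is an added safeguard for running several tests at once, not something the original analysis applies.

```python
import numpy as np
import pandas as pd
from scipy import stats

def anova_by_event(df, min_per_year=2):
    """One-way ANOVA of hit_distance across years, run separately per event_result."""
    rows = []
    for event, event_df in df.groupby('event_result'):
        groups = [g['hit_distance'].dropna().to_numpy()
                  for _, g in event_df.groupby('year')]
        # Skip years with too few rows to estimate a variance
        groups = [g for g in groups if len(g) >= min_per_year]
        if len(groups) < 2:
            continue  # not enough years left to compare
        f_value, p_value = stats.f_oneway(*groups)
        rows.append({'event_result': event,
                     'n': sum(len(g) for g in groups),
                     'F': f_value,
                     'p_value': p_value})
    result = pd.DataFrame(rows)
    if result.empty:
        return result
    # Bonferroni adjustment: multiply by the number of tests, cap at 1
    result['p_bonferroni'] = (result['p_value'] * len(result)).clip(upper=1.0)
    return result

# Synthetic stand-in for the notebook's `data` (all values are made up)
rng = np.random.default_rng(3)
demo = pd.DataFrame({
    'year': rng.choice([2015, 2016, 2017, 2018, 2019], size=1500),
    'event_result': rng.choice(['home_run', 'triple', 'double_play'], size=1500),
    'hit_distance': rng.normal(300, 60, size=1500),
})
result = anova_by_event(demo)
print(result)
```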

Let's calculate the drag coefficient and check whether it varies from year to year.
# Constants
g = 32.174 # gravity in ft/s^2

# Function to calculate drag coefficient with additional variables

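def calculate_drag_coefficient(v_exit, theta, distance, spinrate, release_speed, plate_speed):
    # Note: this line is cut off in the source after "np.sin(2 * np"; converting theta
    # from degrees with np.radians is an assumption
    return (2 * g * (release_speed - plate_speed) * (1 + 0.5 * spinrate) / v_exit**2) * (distance / np.sin(2 * np.radians(theta)))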

# Calculate drag coefficient for each row

data['drag_coefficient'] = calculate_drag_coefficient(data['hit_exit_speed'], data['hit_vertical_angle'], data['hit_distance'],
                                                      data['hit_spinrate'], data['release_speed'], data['plate_speed'])

data.head()

   year  month pitcher_throws bat_side pitch_type  release_speed  plate_speed  hit_exit_speed  hit_spinrate  hit_vertical_angle  hit_bearing  hit_d
0  2016      7              R        R         FF      93.433688    85.791840      101.387283   1954.304443           25.563499   -22.539516
1  2016      5              L        R         FT      89.341958    82.691620       94.986938   5588.018066           60.409538   -46.960789
2  2016      4              R        L         FF      91.367354    84.554413       80.617020   2264.892334           30.243307    39.408298
4  2016      7              R        R         FF      91.033388    84.686417      104.878571   1015.863892           12.043263     1.585894
7  2016      9              R        L         FF      89.689889    82.036316       90.263031   4674.958008           44.270741    -4.610770

import pandas as pd
import numpy as np
from scipy import stats

# Assuming 'data' is your DataFrame with columns 'year' and 'drag_coefficient'

# Calculate mean and variance of drag coefficient per year
mean_drag_coefficient_per_year = data.groupby('year')['drag_coefficient'].mean()
variance_drag_coefficient_per_year = data.groupby('year')['drag_coefficient'].var()

# Perform ANOVA test to see if there are significant differences between years
years = data['year'].unique()
grouped_drag_coefficients = [data[data['year'] == year]['drag_coefficient'].values for year in years]
f_value, p_value = stats.f_oneway(*grouped_drag_coefficients)
print("ANOVA test results: F-value =", f_value, ", P-value =", p_value)

# Conclusion based on P-value

alpha = 0.05
if p_value < alpha:
    print("There is a significant difference in drag coefficients over the years.")
else:
    print("There is no significant difference in drag coefficients over the years.")

# Print mean drag coefficient per year (optional)

print("\nMean Drag Coefficient per Year:")
print(mean_drag_coefficient_per_year)

ANOVA test results: F-value = 0.68306598982621 , P-value = 0.6036401459643094

There is no significant difference in drag coefficients over the years.

Mean Drag Coefficient per Year:

year
2015 34679.031673
2016 34814.657313
2017 25612.040079
2018 35963.735929
2019 49905.246035
Name: drag_coefficient, dtype: float64

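The ANOVA on the calculated drag coefficient (p ≈ 0.60) likewise shows no significant difference between years. Note that the computed
values (in the tens of thousands) are far from the dimensionless drag coefficient of a real baseball, which is well below 1, so this
quantity is best read as a relative proxy rather than a physical measurement.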
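As a cross-check, a dimensionless drag coefficient can be estimated from how much speed the pitch loses between release and the plate. The sketch below is an alternative under stated assumptions, not the formula used above: it ignores gravity and spin forces, and assumes speeds in the same unit for both columns, a release-to-plate distance of 55 ft, air density of 1.2 kg/m^3, and nominal ball mass and radius. With drag force 0.5 * rho * A * Cd * v^2, speed decays as v1 = v0 * exp(-k * x) with k = rho * A * Cd / (2 * m), which can be solved for Cd.

```python
import numpy as np

FT_TO_M = 0.3048

def pitch_drag_coefficient(release_speed, plate_speed,
                           flight_distance_ft=55.0,  # assumed release-to-plate distance
                           air_density=1.2,          # kg/m^3, assumed
                           ball_mass=0.145,          # kg, nominal baseball mass
                           ball_radius=0.0366):      # m, nominal baseball radius
    """Estimate Cd from the speed lost in flight, neglecting gravity and spin."""
    v0 = np.asarray(release_speed, dtype=float)
    v1 = np.asarray(plate_speed, dtype=float)
    x = flight_distance_ft * FT_TO_M
    area = np.pi * ball_radius ** 2
    # Only the ratio v0 / v1 matters, so the speed unit cancels out
    k = np.log(v0 / v1) / x
    return 2.0 * ball_mass * k / (air_density * area)

# First row of the table above: release 93.433688, plate 85.791840
cd_example = pitch_drag_coefficient(93.433688, 85.791840)
print(cd_example)
```

Applied to whole columns, for example `pitch_drag_coefficient(data['release_speed'], data['plate_speed'])`, the result could be fed to the same per-year ANOVA.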

From the analysis and the statistical tests, most cases suggest that the ball has not changed much. The cases that do show a change
could be the result of data imbalance, which might introduce a bias. The calculated drag coefficient likewise shows no significant
variation in the ball over the years.
