Machine learning algorithms are generally categorized into three main types

This document provides an overview of statistical measures, probability distributions, hypothesis testing, correlation, feature selection metrics, and machine learning algorithms. It includes Python code snippets for calculating mean, median, and variance and for performing various statistical tests, as well as examples of supervised and unsupervised learning algorithms. Key concepts such as mutual information, entropy, and information gain in decision trees are also discussed.

📌 1️⃣ Measures of Central Tendency (Mean, Median, Mode)

import numpy as np
from scipy import stats

# Sample Data
data = [10, 15, 12, 18, 15, 21, 25, 10, 30, 15]

# Mean
mean_value = np.mean(data)
print("Mean:", mean_value)

# Median
median_value = np.median(data)
print("Median:", median_value)

# Mode (keepdims=True keeps the SciPy >= 1.9 result array-shaped)
mode_value = stats.mode(data, keepdims=True)
print("Mode:", mode_value.mode[0])

📌 2️⃣ Measures of Dispersion (Variance, Standard Deviation, Range, IQR)

# Variance
variance_value = np.var(data, ddof=1)  # ddof=1 gives the sample variance
print("Variance:", variance_value)

# Standard Deviation
std_dev = np.std(data, ddof=1)
print("Standard Deviation:", std_dev)

# Range
range_value = max(data) - min(data)
print("Range:", range_value)

# Interquartile Range (IQR)
q1 = np.percentile(data, 25)
q3 = np.percentile(data, 75)
iqr_value = q3 - q1
print("IQR:", iqr_value)

📌 3️⃣ Probability Distributions

Normal Distribution (Gaussian)

import matplotlib.pyplot as plt
import scipy.stats as stats

# Generate normally distributed data
mu, sigma = 0, 1  # Mean and Standard Deviation
normal_data = np.random.normal(mu, sigma, 1000)

# Plot a histogram of the samples
plt.hist(normal_data, bins=30, density=True, alpha=0.6, color='g')

# Overlay the theoretical normal curve
xmin, xmax = plt.xlim()
x = np.linspace(xmin, xmax, 100)
p = stats.norm.pdf(x, mu, sigma)
plt.plot(x, p, 'k', linewidth=2)
plt.title("Normal Distribution")
plt.show()

Binomial Distribution

# Binomial Distribution Example: 10 trials, 0.5 probability of success
n, p = 10, 0.5
binom_data = np.random.binomial(n, p, 1000)

# Plot histogram
plt.hist(binom_data, bins=10, density=True, alpha=0.6, color='b')
plt.title("Binomial Distribution")
plt.show()

📌 4️⃣ Hypothesis Testing

T-Test (Comparing Two Groups)

# Generate two random samples
group1 = np.random.normal(50, 10, 30)
group2 = np.random.normal(55, 10, 30)

# Perform an independent two-sample t-test
t_stat, p_value = stats.ttest_ind(group1, group2)
print("T-Test Statistic:", t_stat)
print("P-Value:", p_value)
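The snippet above prints the statistic and p-value but leaves interpretation implicit. A minimal, self-contained decision rule is sketched below; the 0.05 significance level is the conventional choice, not something fixed by the test itself, and the seeded generator is only there to make the example reproducible.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
group1 = rng.normal(50, 10, 30)
group2 = rng.normal(55, 10, 30)

t_stat, p_value = stats.ttest_ind(group1, group2)

# Decision rule: reject the null hypothesis (equal means)
# when the p-value falls below the chosen significance level
alpha = 0.05
if p_value < alpha:
    print("Reject H0: the group means differ significantly")
else:
    print("Fail to reject H0: no significant difference detected")
```

Note that "fail to reject" is not the same as proving the means are equal; the test only quantifies evidence against the null hypothesis.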

Chi-Square Test (Categorical Variables)

# Contingency Table
observed = np.array([[10, 20, 30], [6, 9, 17]])

# Perform a Chi-Square test of independence
chi2, p, dof, expected = stats.chi2_contingency(observed)
print("Chi-Square Value:", chi2)
print("P-Value:", p)

📌 5️⃣ Correlation & Covariance

# Generate two random datasets
x = np.random.rand(10)
y = np.random.rand(10)

# Correlation Coefficient
correlation = np.corrcoef(x, y)[0, 1]
print("Correlation Coefficient:", correlation)

# Covariance
covariance = np.cov(x, y)[0, 1]
print("Covariance:", covariance)

📌 6️⃣ Feature Selection Metrics

Mutual Information

from sklearn.feature_selection import mutual_info_classif

# Example dataset
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([0, 1, 0, 1, 0])

# Compute Mutual Information between the feature and the labels
mi_score = mutual_info_classif(X, y)
print("Mutual Information Score:", mi_score[0])

📌 7️⃣ Entropy & Information Gain (Decision Trees)

from sklearn.tree import DecisionTreeClassifier

# Sample Data
X = np.array([[0], [1], [2], [3], [4], [5]])
y = np.array([0, 1, 0, 1, 0, 1])

# Train a Decision Tree that chooses splits by information gain (entropy)
clf = DecisionTreeClassifier(criterion='entropy')
clf.fit(X, y)

# Entropy-based importance of each feature
# (normalized total entropy reduction contributed by the feature)
feature_importance = clf.feature_importances_
print("Information Gain:", feature_importance)
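The `feature_importances_` attribute summarizes entropy reduction across the whole tree. To make the underlying mechanics explicit, here is a sketch that computes Shannon entropy and the information gain of a single candidate split by hand; the helper names `entropy` and `information_gain` are illustrative, not part of any library.

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a label array, in bits."""
    _, counts = np.unique(labels, return_counts=True)
    probs = counts / counts.sum()
    return -np.sum(probs * np.log2(probs))

def information_gain(y, mask):
    """Entropy reduction from splitting y by a boolean mask."""
    n = len(y)
    left, right = y[mask], y[~mask]
    weighted = (len(left) / n) * entropy(left) + (len(right) / n) * entropy(right)
    return entropy(y) - weighted

y = np.array([0, 1, 0, 1, 0, 1])
X = np.array([0, 1, 2, 3, 4, 5])

# Evaluate the candidate split X < 3: left = [0, 1, 0], right = [1, 0, 1]
gain = information_gain(y, X < 3)
print("Information gain for split X < 3:", gain)
```

A decision tree simply evaluates this quantity for every candidate split and picks the one with the largest gain.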

🚀 Summary of Implemented Concepts

✅ Mean, Median, Mode
✅ Variance, Standard Deviation, Range, IQR
✅ Normal & Binomial Distributions
✅ T-Test & Chi-Square Test
✅ Correlation & Covariance
✅ Mutual Information & Information Gain

Machine learning algorithms are generally categorized into three main types:

1️⃣ Supervised Learning


In supervised learning, the algorithm is trained on labeled data, meaning each training example has
an input and a corresponding correct output.

🔹 Regression Algorithms (For predicting continuous values)

• Linear Regression
• Polynomial Regression
• Decision Tree Regression
• Random Forest Regression
• Support Vector Regression (SVR)

# Importing regression models
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.ensemble import RandomForestRegressor
from sklearn.svm import SVR

# Initializing models with key parameters
linear_model = LinearRegression(fit_intercept=True)  # fit_intercept=True adds a bias term
tree_model = DecisionTreeRegressor(criterion='squared_error',  # 'mse' was renamed in scikit-learn 1.0
                                   max_depth=5, min_samples_split=2)
forest_model = RandomForestRegressor(n_estimators=100, max_depth=10,
                                     min_samples_split=2, random_state=42)
svr_model = SVR(kernel='rbf', C=1.0, epsilon=0.1)  # C is the regularization parameter
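Polynomial Regression appears in the list above but not in the snippet. In scikit-learn it is typically built by expanding the inputs with `PolynomialFeatures` and fitting `LinearRegression` on the result; the `degree=2` choice and the toy quadratic data below are illustrative.

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline

# Polynomial regression = linear regression on polynomial feature expansions
poly_model = make_pipeline(
    PolynomialFeatures(degree=2, include_bias=False),  # adds x and x^2 columns
    LinearRegression(fit_intercept=True),
)

# Fit on a simple quadratic relationship y = x^2
X = np.arange(10).reshape(-1, 1)
y = (X ** 2).ravel()
poly_model.fit(X, y)
print("Prediction for x=11:", poly_model.predict([[11]])[0])
```

Because the model is linear in the expanded features, it fits this quadratic exactly and extrapolates x=11 to roughly 121.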

🔹 Classification Algorithms (For predicting discrete categories)

• Logistic Regression
• K-Nearest Neighbors (KNN)
• Decision Tree
• Random Forest
• Support Vector Machine (SVM)
• Naïve Bayes
• Neural Networks

# Importing classification models
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.svm import SVC
from sklearn.naive_bayes import GaussianNB

# Initializing models with key parameters
logistic_model = LogisticRegression(penalty='l2', C=1.0, solver='lbfgs', max_iter=1000)
knn_model = KNeighborsClassifier(n_neighbors=5, metric='minkowski', p=2)  # p=2 gives Euclidean distance
tree_classifier = DecisionTreeClassifier(criterion='gini', max_depth=5, min_samples_split=2)
forest_classifier = RandomForestClassifier(n_estimators=100, max_depth=10,
                                           min_samples_split=2, random_state=42)
svm_classifier = SVC(kernel='rbf', C=1.0, gamma='scale', probability=True)
naive_bayes = GaussianNB(var_smoothing=1e-9)
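Neural Networks are listed above but not initialized. One option within scikit-learn is `MLPClassifier`; the layer sizes and solver below are illustrative rather than tuned (`lbfgs` tends to behave well on tiny datasets like this toy example).

```python
import numpy as np
from sklearn.neural_network import MLPClassifier

# A small multilayer perceptron: two hidden layers of 16 and 8 units
mlp_model = MLPClassifier(hidden_layer_sizes=(16, 8),
                          activation='relu',
                          solver='lbfgs',   # full-batch optimizer, good for small data
                          max_iter=2000,
                          random_state=42)

# Fit on a tiny linearly separable toy problem
X = np.array([[0.0], [0.1], [0.2], [0.8], [0.9], [1.0]])
y = np.array([0, 0, 0, 1, 1, 1])
mlp_model.fit(X, y)
print("Predictions:", mlp_model.predict(X))
```

For larger datasets or more control over the architecture, a dedicated deep learning framework is the usual choice; `MLPClassifier` keeps the API consistent with the other classifiers above.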

2️⃣ Unsupervised Learning

In unsupervised learning, the algorithm is trained on unlabeled data and tries to find hidden
patterns.

🔹 Clustering Algorithms (For grouping similar data points)

• K-Means
• Hierarchical Clustering
• DBSCAN (Density-Based Clustering)

# Importing clustering models
from sklearn.cluster import KMeans, DBSCAN

# Initializing models with key parameters
kmeans_model = KMeans(n_clusters=3, init='k-means++', max_iter=300, n_init=10,
                      random_state=42)
dbscan_model = DBSCAN(eps=0.5, min_samples=5, metric='euclidean')
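Hierarchical Clustering is listed above but not shown in code. scikit-learn provides the bottom-up (agglomerative) variant as `AgglomerativeClustering`; the two-blob toy data below is only there to make the example self-contained.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering

# Two well-separated groups of points
X = np.array([[0, 0], [0, 1], [1, 0],
              [10, 10], [10, 11], [11, 10]])

# Agglomerative clustering starts with every point as its own cluster and
# repeatedly merges pairs; linkage='ward' merges the pair whose union
# least increases total within-cluster variance
agglo_model = AgglomerativeClustering(n_clusters=2, linkage='ward')
labels = agglo_model.fit_predict(X)
print("Cluster labels:", labels)
```

Unlike K-Means, the agglomerative approach builds a full merge hierarchy (a dendrogram), so the number of clusters can be chosen after the fact by cutting the tree at a different level.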
