Unit1 ML Programs

The document provides code snippets for common data preprocessing and analysis techniques in Python: handling missing values and outliers, calculating correlation matrices, identifying skewed distributions and applying transformations, reducing dimensionality with PCA, and selecting features with RFE. The examples show how to load datasets, apply the preprocessing/analysis steps, and visualize the results.


unit1

December 23, 2023

[ ]: #9) Take a dataset with missing values and outliers and perform data preprocessing steps such as imputation, outlier treatment, and normalization in Python.

import pandas as pd
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer
from sklearn.neighbors import LocalOutlierFactor

# Import the dataset
df = pd.read_csv("dataset.csv")

# Handle missing values with mean imputation
imp = SimpleImputer(missing_values=np.nan, strategy='mean')
df_imp = pd.DataFrame(imp.fit_transform(df))
df_imp.columns = df.columns
df_imp.index = df.index

# Handling outliers
clf = LocalOutlierFactor(n_neighbors=20, contamination='auto')
y_pred = clf.fit_predict(df_imp)

# Boolean mask for inliers (LOF labels outliers as -1)
mask = y_pred != -1

# Create a new DataFrame containing only the inliers
df_inliers = df_imp[mask]

# Normalize with standard scaling (zero mean, unit variance)
scaler = StandardScaler()
df_normalized = pd.DataFrame(scaler.fit_transform(df_inliers))
df_normalized.columns = df_inliers.columns
df_normalized.index = df_inliers.index
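
A quick sanity check (a minimal sketch; it assumes the cell above has already run) confirms that imputation removed all missing values and shows how many rows LOF discarded as outliers:

[ ]: # Verify that no missing values remain after imputation
print("Missing values per column:")
print(df_normalized.isna().sum())

# Compare row counts before and after outlier removal
print(f"Rows before outlier removal: {len(df_imp)}")
print(f"Rows after outlier removal: {len(df_inliers)}")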

[ ]: #11) Given a dataset, calculate the correlation matrix and interpret the relationships between different features.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Import the dataset
df = pd.read_csv('dataset.csv')

# Calculate the correlation matrix (numeric columns only)
correlation_matrix = df.corr(numeric_only=True)

# Visualization
plt.figure(figsize=(12, 8))
sns.heatmap(correlation_matrix, annot=True, cmap='coolwarm', fmt='.2f', linewidths=.5)
plt.title('Correlation Matrix')
plt.show()

# Interpret relationships between specific features
correlation_feature1_feature2 = correlation_matrix.loc['Feature1', 'Feature2']
print(f"Correlation between Feature1 and Feature2: {correlation_feature1_feature2:.2f}")
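
To interpret the matrix programmatically rather than by eye, a minimal sketch (assuming the cell above has run; the 0.8 cutoff is an arbitrary choice) that lists the most strongly correlated feature pairs:

[ ]: # List feature pairs whose absolute correlation exceeds a threshold
threshold = 0.8  # arbitrary cutoff for a "strong" correlation
cols = correlation_matrix.columns
for i in range(len(cols)):
    for j in range(i + 1, len(cols)):
        r = correlation_matrix.iloc[i, j]
        if abs(r) > threshold:
            print(f"{cols[i]} and {cols[j]}: r = {r:.2f}")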

[ ]: #12) Identify the presence of skewed distributions in a dataset and apply suitable transformations (e.g., log transformation) to make the data more normally distributed.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
from scipy.stats import skew

# Load the dataset
df = pd.read_csv('dataset.csv')

# Calculate skewness for each numeric feature
skewness = df.select_dtypes(include=np.number).apply(skew)
print(skewness)

# Plot histograms
sns.histplot(df['Feature1'], kde=True)
plt.title('Histogram of Feature1')
plt.show()

# Apply a log transformation to the feature (log1p handles zero values)
df['Transformed_Feature1'] = np.log1p(df['Feature1'])

transformed_skewness = df['Transformed_Feature1'].skew()
print(f"Skewness after log transformation: {transformed_skewness:.2f}")

# Visualize the transformed feature
sns.histplot(df['Transformed_Feature1'], kde=True)
plt.title('Histogram of Transformed_Feature1')
plt.show()

# Applying log transformations to other features
df['Transformed_Feature2'] = np.log1p(df['Feature2'])

# Plot the transformed distributions together
sns.histplot(df['Transformed_Feature1'], kde=True, label='Transformed_Feature1')
sns.histplot(df['Transformed_Feature2'], kde=True, label='Transformed_Feature2')
plt.legend()
plt.title('Transformed Distributions')
plt.show()
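
A minimal sketch (assuming the cell above has run) that compares skewness before and after the transformation, to confirm the log actually helped:

[ ]: # Compare skewness before and after the log transformation
for feature in ['Feature1', 'Feature2']:
    before = df[feature].skew()
    after = df[f'Transformed_{feature}'].skew()
    print(f"{feature}: skewness {before:.2f} -> {after:.2f}")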

[ ]: #13) Create a scatter plot matrix (pair plot) for a multi-dimensional dataset and analyze the relationships between different pairs of features.

# Import libraries
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load your dataset into a pandas DataFrame
df = pd.read_csv('dataset.csv')

# Create a pair plot
sns.pairplot(df)
plt.show()

# Analyze specific relationships
sns.scatterplot(x='Feature1', y='Feature2', data=df)
plt.title('Scatter Plot of Feature1 vs Feature2')
plt.show()

sns.scatterplot(x='Feature3', y='Feature4', data=df)
plt.title('Scatter Plot of Feature3 vs Feature4')
plt.show()
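
If the dataset contains a categorical label (a hypothetical 'Target' column here), coloring the pair plot by it often makes group structure visible; a minimal sketch:

[ ]: # Color the pair plot by a (hypothetical) label column named 'Target'
sns.pairplot(df, hue='Target')
plt.show()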

[ ]: #14) Generate a box plot or violin plot to visualize the distribution of a numeric attribute for different categories in the dataset.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

# Load your dataset into a pandas DataFrame
df = pd.read_csv('your_dataset.csv')

# Choose a numeric attribute and a categorical attribute for visualization
numeric_attribute = 'NumericAttribute'
categorical_attribute = 'CategoricalAttribute'

# Box Plot
plt.figure(figsize=(10, 6))
sns.boxplot(x=categorical_attribute, y=numeric_attribute, data=df)
plt.title(f'Box Plot of {numeric_attribute} for {categorical_attribute}')
plt.show()

# Violin Plot
plt.figure(figsize=(10, 6))
sns.violinplot(x=categorical_attribute, y=numeric_attribute, data=df)
plt.title(f'Violin Plot of {numeric_attribute} for {categorical_attribute}')
plt.show()
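
To back the plots with numbers, a minimal sketch (assuming the cell above has run) of per-category summary statistics for the same attribute:

[ ]: # Per-category summary statistics for the numeric attribute
print(df.groupby(categorical_attribute)[numeric_attribute].describe())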

[ ]: #15) Select a dataset with a large number of features and apply dimensionality reduction techniques (e.g., PCA - Principal Component Analysis) to reduce the number of features.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Load the Iris dataset
iris = load_iris()
iris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)
target = iris.target

# Standardize the features (zero mean, unit variance)
scaler = StandardScaler()
iris_scaled = scaler.fit_transform(iris_df)

# Apply PCA to reduce dimensionality
pca = PCA(n_components=2)  # choose the number of components to keep
iris_pca = pca.fit_transform(iris_scaled)

# Create a DataFrame with the principal components
pca_df = pd.DataFrame(data=iris_pca, columns=['PC1', 'PC2'])
pca_df['Target'] = target

# Visualize
plt.figure(figsize=(10, 8))
sns.scatterplot(x='PC1', y='PC2', hue='Target', data=pca_df, palette='viridis', s=70)
plt.title('PCA - Principal Component Analysis')
plt.show()
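
How much of the original variance the two components retain can be checked directly; a minimal sketch (assuming the cell above has run):

[ ]: # Proportion of variance explained by each principal component
print("Explained variance ratio:", pca.explained_variance_ratio_)
print(f"Total variance retained: {pca.explained_variance_ratio_.sum():.2%}")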

[ ]: #16) Implement a feature selection algorithm (e.g., Recursive Feature Elimination) to choose the most relevant features from a dataset.

import pandas as pd
from sklearn.datasets import load_iris
from sklearn.feature_selection import RFE
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Classifier used by RFE for feature ranking
clf = RandomForestClassifier(n_estimators=100, random_state=42)

# Create the RFE model and select the number of features to keep
num_features_to_keep = 2
rfe = RFE(estimator=clf, n_features_to_select=num_features_to_keep)
X_train_rfe = rfe.fit_transform(X_train, y_train)
X_test_rfe = rfe.transform(X_test)

# Fit a classifier on the selected features
clf.fit(X_train_rfe, y_train)

# Make predictions and evaluate accuracy on the test set
y_pred = clf.predict(X_test_rfe)
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy with {num_features_to_keep} features: {accuracy:.2f}')

# Get the selected feature names (iris.feature_names is a plain Python
# list, so build the selection with a comprehension)
selected_features = [name for name, selected in zip(iris.feature_names, rfe.support_) if selected]
print('Selected features:', selected_features)
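
RFE also exposes a full ranking of the features (rank 1 means the feature was kept); a minimal sketch (assuming the cell above has run):

[ ]: # Full RFE ranking: 1 = selected, larger numbers were eliminated earlier
for name, rank in zip(iris.feature_names, rfe.ranking_):
    print(f"{name}: rank {rank}")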
