TYCS Practical

Data science involves analyzing data to extract useful information and insights. Common techniques include data wrangling, preprocessing, modeling, and visualization. This document covers regression, clustering, and principal component analysis, and shows how to apply them in Python.

Data science



INDEX

Sr No  Title                                       Date  Sign
1      Introduction to Excel
2      Data Frames and Basic Data Pre-processing
3      Feature Scaling and Dummification
4      Hypothesis Testing
5      ANOVA (Analysis of Variance)
6      Regression and Its Types
7      Logistic Regression and Decision Tree
8      K-Means Clustering
9      Principal Component Analysis (PCA)
10     Data Visualization and Storytelling



PRACTICAL 1
Introduction to Excel
A. Perform conditional formatting on a dataset using various criteria.

Steps:
Step 1: Select the data range, then go to Home > Conditional Formatting > Highlight Cells Rules > Greater Than.

Step 2: Enter the threshold value, for example 2000.



Step 3: Go to Conditional Formatting > Data Bars and choose a Solid Fill style.

B. Create a pivot table to analyse and summarize data.


Steps:
Step 1: Select the entire table and go to Insert > PivotChart > PivotChart.
Step 2: Select "New Worksheet" in the Create PivotChart window.



Step 3: Drag the attributes into the field boxes below (Filters, Columns, Rows, and Values).

C. Use the VLOOKUP function to retrieve information from a different worksheet or table.
Steps:
Step 1: Click on an empty cell and enter a formula of the form =VLOOKUP(lookup_value, table_array, col_index_num, [range_lookup]), for example:
=VLOOKUP(B3, B3:D3, 1, TRUE)



D. Perform what-if analysis using Goal Seek to determine input values for a desired output.
Steps:
Step 1: On the Data tab, go to What-If Analysis > Goal Seek.

Step 2: Fill in the Set cell, To value, and By changing cell fields, then click OK.



PRACTICAL 2
Data Frames and Basic Data Pre-processing
A. Read data from CSV and JSON files into a data frame.
B. Perform basic data pre-processing tasks such as handling missing values and outliers.
Code:
import pandas as pd

# Reading CSV file into a DataFrame
df = pd.read_csv("samp.csv")
print("Our dataset:")
print(df)

# Reading JSON file into a DataFrame
data = pd.read_json("sample.json")
print(data)

# Displaying the first 10 rows of the DataFrame
print(df.head(10))

# Filling missing values with 0
print("Dataset after filling NA values with 0:")
df2 = df.fillna(value=0)
print(df2)

# Dropping rows with any missing values
print("Dataset after dropping NA values:")
df.dropna(inplace=True)
print(df)
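The task also mentions outliers, which the listing above does not handle. A minimal IQR-based sketch, assuming samp.csv has a numeric "age" column (the column name is an illustrative assumption, though part C below filters on it):

# IQR-based outlier removal; the column name "age" is assumed for illustration
q1 = df["age"].quantile(0.25)
q3 = df["age"].quantile(0.75)
iqr = q3 - q1
lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
df_no_outliers = df[(df["age"] >= lower) & (df["age"] <= upper)]
print("Dataset after removing outliers:")
print(df_no_outliers)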



C. Manipulate and transform data using functions like filtering, sorting, and grouping.
Code:
import pandas as pd

# Reading CSV file into a DataFrame
df = pd.read_csv("samp.csv")

# Filtering data based on a condition (e.g., age greater than 25)
filtered_df = df[df["age"] > 25]

# Sorting data based on a column (e.g., sorting by age in descending order)
sorted_df = df.sort_values(by="age", ascending=False)

# Grouping data based on a column and applying an aggregation function
# (e.g., finding the average age per city)
grouped_df = df.groupby("city").agg({"age": "mean"})

# Displaying the filtered DataFrame
print("Filtered DataFrame:")
print(filtered_df)

# Displaying the sorted DataFrame
print("\nSorted DataFrame:")
print(sorted_df)

# Displaying the grouped DataFrame
print("\nGrouped DataFrame:")
print(grouped_df)



PRACTICAL 3
Feature Scaling and Dummification
A. Apply feature-scaling techniques like standardization and normalization to numerical features.

Code:
# Standardization and normalization
import pandas as pd
import numpy as np
from sklearn.preprocessing import Normalizer
from sklearn.preprocessing import StandardScaler

print("Printing a few rows")
df = pd.read_csv(r"D:\TYCS\Data Science\SampleFile.csv")  # raw string so the backslashes are kept literally
print(df.head())

print("Max values")
max_vals = np.max(np.abs(df))
print(max_vals)
print(df / max_vals)  # max-abs scaling: divide each column by its maximum absolute value

print("Normalization")
scaler = Normalizer()
scaled_data = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled_data, columns=df.columns)
print(scaled_df.head())

print("Standardization")
scaler = StandardScaler()
scaled_data = scaler.fit_transform(df)
scaled_df = pd.DataFrame(scaled_data, columns=df.columns)
print(scaled_df.head())
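Note that sklearn's Normalizer rescales each row to unit norm; if min-max scaling to [0, 1] is the "normalization" intended, a sketch using MinMaxScaler (reusing the df loaded above) would be:

# Min-max normalization to [0, 1] as an alternative reading of "normalization"
from sklearn.preprocessing import MinMaxScaler

mm = MinMaxScaler()
minmax_df = pd.DataFrame(mm.fit_transform(df), columns=df.columns)
print(minmax_df.head())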



B. Perform feature dummification to convert categorical variables into numerical representations.
Code:

import pandas as pd

data = pd.read_csv("data32.csv")
categorical_features = data.select_dtypes(include="object")
dummies = pd.get_dummies(categorical_features)
data = pd.concat([data, dummies], axis=1)
# Drop the original categorical columns, keeping only their dummy versions
data.drop(categorical_features.columns, axis=1, inplace=True)
data.to_csv("Output.csv")



Practical 4
Hypothesis Testing
Conduct a hypothesis test using appropriate statistical tests (e.g., t-test, chi-square test).

# t-test
import numpy as np
import scipy.stats as stats

np.random.seed(42)
scoreA = np.random.normal(loc=70,scale=10,size=30)
scoreB = np.random.normal(loc=75,scale=10,size=30)

t_stat,pvalue = stats.ttest_ind(scoreA,scoreB)
print(f"T-Statistics: {t_stat}\nP-Value: {pvalue}")

alpha = 0.05
if pvalue < alpha:
    print("Reject the null hypothesis. There is a significant difference in exam scores.")
else:
    print("Fail to reject the null hypothesis. There is no significant difference in exam scores.")

Output:

# Chi-square test
import numpy as np
import scipy.stats as stats

observed_data = np.array([[25, 15], [20, 40]])
chi2, pvalue, dof, expected = stats.chi2_contingency(observed_data)
print(f'Chi-Square Statistic: {chi2}\nP-value: {pvalue}\nDegrees of Freedom: {dof}\nExpected frequency:\n{expected}')

alpha = 0.05
if pvalue < alpha:
    print("Reject the null hypothesis. There is a significant association between gender and job satisfaction.")
else:
    print("Fail to reject the null hypothesis. Gender and job satisfaction are independent.")
Output:



Practical 5
ANOVA (Analysis of Variance)
Perform one-way ANOVA to compare means across multiple groups.
from scipy.stats import f_oneway

# Define sample data for each group
group1 = [15, 20, 25, 30, 35]
group2 = [10, 18, 22, 28, 32]
group3 = [12, 16, 20, 24, 28]

f_statistic, p_value = f_oneway(group1, group2, group3)

print("One-way ANOVA results:")
print("F-statistic:", f_statistic)
print("P-value:", p_value)

alpha = 0.05
if p_value < alpha:
    print(
        "Reject null hypothesis: There are significant differences between the means of the groups."
    )
else:
    print(
        "Fail to reject null hypothesis: There are no significant differences between the means of the groups."
    )
Output:



Practical 6
Regression and its Types.
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error, r2_score

# Independent variable (predictor)
X = np.array([[1], [2], [3], [4], [5], [6], [7], [8], [9], [10]])

# Dependent variable (response)
y = np.array([[7], [9], [11], [13], [15], [17], [19], [21], [23], [25]])

# Splitting the data into training and testing sets


X_train, X_test, y_train, y_test = train_test_split(
X, y, test_size=0.2, random_state=42
)

# Simple Linear Regression


model = LinearRegression()
model.fit(X_train, y_train) # Fitting the model

# Coefficients
print("Intercept:", model.intercept_[0])
print("Coefficient:", model.coef_[0][0])

# Predictions
y_pred = model.predict(X_test)

# Model Evaluation
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
print("Mean Squared Error:", mse)
print("R-squared:", r2)



# Plotting the regression line
plt.scatter(X_test, y_test, color="blue")
plt.plot(X_test, y_pred, color="red")
plt.title("Simple Linear Regression")
plt.xlabel("Independent Variable (X)")
plt.ylabel("Dependent Variable (y)")
plt.show()

Output:
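The title says "Regression and its Types", while the listing covers only simple linear regression. A brief polynomial-regression sketch on the same data, where degree 2 is an arbitrary illustrative choice, could be:

# Polynomial regression as a second type, reusing X_train, X_test, y_train,
# y_test, LinearRegression, and r2_score from the listing above
from sklearn.preprocessing import PolynomialFeatures
from sklearn.pipeline import make_pipeline

poly_model = make_pipeline(PolynomialFeatures(degree=2), LinearRegression())
poly_model.fit(X_train, y_train)
print("Polynomial R-squared:", r2_score(y_test, poly_model.predict(X_test)))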



Practical 7
Logistic Regression and Decision Tree
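The listing further below performs K-Means clustering with silhouette analysis (Practical 8 repeats K-Means with the elbow method). For the logistic regression and decision tree this practical names, a minimal sketch, assuming scikit-learn and the Iris dataset (both the dataset and the parameters are illustrative choices, not taken from the original listing), could be:

# Hedged sketch: logistic regression and a decision tree on the Iris dataset
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

# Logistic regression classifier
log_reg = LogisticRegression(max_iter=200)
log_reg.fit(X_train, y_train)
print("Logistic Regression accuracy:", accuracy_score(y_test, log_reg.predict(X_test)))

# Decision tree classifier (max_depth chosen for illustration)
tree = DecisionTreeClassifier(max_depth=3, random_state=42)
tree.fit(X_train, y_train)
print("Decision Tree accuracy:", accuracy_score(y_test, tree.predict(X_test)))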
import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Generate sample data


X, _ = make_blobs(n_samples=300, centers=5, cluster_std=0.60, random_state=0)

# Determine the optimal number of clusters using the silhouette score


silhouette_scores = []
for k in range(2, 11):
kmeans = KMeans(n_clusters=k, random_state=0).fit(X)
score = silhouette_score(X, kmeans.labels_)
silhouette_scores.append(score)

# Plot the silhouette scores


plt.plot(range(2, 11), silhouette_scores, marker="o")
plt.xlabel("Number of clusters")
plt.ylabel("Silhouette Score")
plt.title("Silhouette Score for Optimal Number of Clusters")
plt.show()

# Choose the optimal number of clusters based on the silhouette score


optimal_k = silhouette_scores.index(max(silhouette_scores)) + 2

# Apply K-Means clustering with the optimal number of clusters


kmeans = KMeans(n_clusters=optimal_k, random_state=0).fit(X)

# Visualize the clustering results


plt.scatter(X[:, 0], X[:, 1], c=kmeans.labels_, cmap="viridis", s=50, alpha=0.7)
plt.scatter(
kmeans.cluster_centers_[:, 0],
kmeans.cluster_centers_[:, 1],
s=200,
c="red",
marker="X",
label="Centroids",
)
plt.title("K-Means Clustering")
plt.xlabel("Feature 1")
plt.ylabel("Feature 2")
plt.legend()
plt.show()

# Analyze the cluster characteristics


silhouette_avg = silhouette_score(X, kmeans.labels_)
print(f"Silhouette Score: {silhouette_avg}")
Output:



Practical 8
K-Means Clustering
import pandas as pd
from sklearn.preprocessing import MinMaxScaler
from sklearn.cluster import KMeans
import matplotlib.pyplot as plt

# Load data
data = pd.read_csv("wholesale.csv")

# Display the first few rows of the dataset


print(data.head())

# Define categorical and continuous features


categorical_features = ["Channel", "Region"]
continuous_features = [
"Fresh",
"Milk",
"Grocery",
"Frozen",
"Detergents_Paper",
"Delicassen",
]

# Descriptive statistics for continuous features


print(data[continuous_features].describe())

# Convert categorical features into dummy variables


for col in categorical_features:
dummies = pd.get_dummies(data[col], prefix=col)
data = pd.concat([data, dummies], axis=1)
data.drop(col, axis=1, inplace=True)



# Display the first few rows of the updated dataset
print(data.head())

# Normalize the data


mms = MinMaxScaler()
data_transformed = mms.fit_transform(data)

# Calculate the sum of squared distances for different values of k


sum_of_squared_distances = []
K = range(1, 15)
for k in K:
km = KMeans(n_clusters=k)
km.fit(data_transformed)
sum_of_squared_distances.append(km.inertia_)

# Plot the elbow method graph


plt.plot(K, sum_of_squared_distances, "bx-")
plt.xlabel("Number of Clusters (k)")
plt.ylabel("Sum of Squared Distances")
plt.title("Elbow Method for Optimal k")
plt.show()

Output:
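The elbow plot only suggests a value of k; the final model still has to be fitted. A short follow-up sketch, where the chosen k is an assumption to be read off your own plot:

# Fit the final K-Means model; chosen_k = 6 is an illustrative assumption
chosen_k = 6
km = KMeans(n_clusters=chosen_k)
labels = km.fit_predict(data_transformed)
print(pd.Series(labels).value_counts())  # cluster sizes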



Practical 9
Principal Component Analysis (PCA)
import pandas as pd
from sklearn.datasets import load_iris
from sklearn.decomposition import PCA
import matplotlib.pyplot as plt

# Load the Iris dataset


iris = load_iris()
X = iris.data
y = iris.target
target_names = iris.target_names

# Perform PCA
pca = PCA(n_components=2) # Specify the number of components (dimensions)
X_r = pca.fit_transform(X)

# Create a DataFrame for visualization


df = pd.DataFrame(data=X_r, columns=['PC1', 'PC2'])
df['target'] = y

# Plot the data


plt.figure(figsize=(8, 6))
colors = ['navy', 'turquoise', 'darkorange']
lw = 2

for color, i, target_name in zip(colors, [0, 1, 2], target_names):
    plt.scatter(
        df.loc[df['target'] == i, 'PC1'],
        df.loc[df['target'] == i, 'PC2'],
        color=color,
        alpha=0.8,
        lw=lw,
        label=target_name,
    )

plt.title('PCA of IRIS dataset')


plt.legend(loc='best', shadow=False, scatterpoints=1)
plt.xlabel('Principal Component 1')
plt.ylabel('Principal Component 2')
plt.show()

Output:
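How much of the original variance the two components retain can be checked on the fitted pca object; a one-line addition to the script above:

# Proportion of variance captured by each principal component
print("Explained variance ratio:", pca.explained_variance_ratio_)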



Practical 10
Data Visualization and Storytelling

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns

# Load the dataset


# Assume 'data.csv' contains your dataset
df = pd.read_csv("data.csv")

# Perform data analysis


# Example: Calculate summary statistics
summary_stats = df.describe()

# Create meaningful visualizations


# Example: Plot a histogram of a numerical variable
plt.figure(figsize=(8, 6))
sns.histplot(data=df, x="numerical_variable", bins=20, kde=True)
plt.title("Histogram of Numerical Variable")
plt.xlabel("Numerical Variable")
plt.ylabel("Frequency")
plt.show()

# Example: Plot a bar chart of a categorical variable


plt.figure(figsize=(8, 6))
sns.countplot(data=df, x="categorical_variable", palette="viridis")
plt.title("Bar Chart of Categorical Variable")
plt.xlabel("Categories")
plt.ylabel("Count")
plt.xticks(rotation=45)
plt.show()

# Present findings and insights in a clear and concise manner


# Example: Use Markdown to format text for presentation
print("# Data Analysis and Visualization Report\n")
print("## Summary Statistics:\n")
print(summary_stats)
print("\n## Insights:\n")
print(
    "- The histogram shows that the distribution of the numerical variable is approximately normal."
)
print(
    "- The bar chart indicates that category A is the most frequent in the categorical variable."
)
print(
    "- The scatterplot suggests a positive correlation between numerical variables 1 and 2, "
    "with different categories showing distinct patterns.\n"
)

Output:
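The insights above mention a scatterplot that the listing does not produce. A minimal sketch, with hypothetical column names standing in for whatever data.csv actually contains:

# Hypothetical scatterplot of two numerical variables, colored by category;
# all column names here are placeholders
plt.figure(figsize=(8, 6))
sns.scatterplot(data=df, x="numerical_variable_1", y="numerical_variable_2", hue="categorical_variable")
plt.title("Scatterplot of Numerical Variables by Category")
plt.show()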
