Practical No. 01
Steps:
1. Select the "Profit" column (Column E).
Steps:
1. Select the entire dataset including headers.
2. Go to the "Insert" tab on the ribbon.
3. Click on "PivotTable."
4. Choose where you want to place the PivotTable (e.g., new worksheet).
5. Drag "Category" to the Rows area.
6. Drag "Sales" to the Values area, choosing the sum function.
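For reference, the same category-wise sales summary can be reproduced in pandas; a minimal sketch, assuming the worksheet has been exported to a CSV such as 'DATA SET.csv' (the file used in Practical No. 02) with 'Category' and 'Sales' columns:
import pandas as pd
# Hypothetical export of the worksheet used above
df = pd.read_csv('DATA SET.csv')
# Equivalent of the PivotTable: rows = Category, values = sum of Sales
pivot = pd.pivot_table(df, index='Category', values='Sales', aggfunc='sum')
print(pivot)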
Steps:
1. Identify the cell containing the formula for "Profit" for "Product P" (let's assume it's in cell E17).
2. Go to the "Data" tab on the ribbon.
3. Click on "What-If Analysis" and select "Goal Seek."
4. Set "Set cell" to the profit cell (E17), "To value" to 1000, and "By changing cell" to the sales cell (C17).
5. Click "OK" to let Excel determine the required sales.
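Goal Seek is essentially solving Profit(Sales) = 1000 for Sales; a minimal Python sketch of the same idea, assuming a simplified Profit = Sales − fixed cost relationship (the actual worksheet formula in E17 may differ):
from scipy.optimize import brentq
# Assumed profit formula for Product P (a stand-in for the actual worksheet formula in E17)
fixed_cost = 500.0
def profit(sales):
    return sales - fixed_cost
# Goal Seek equivalent: find the sales value at which profit(sales) equals the 1000 target
target_profit = 1000.0
required_sales = brentq(lambda s: profit(s) - target_profit, 0, 1_000_000)
print(f"Required sales: {required_sales:.2f}")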
Practical No. 02
Data pre-processing:
Data pre-processing is a crucial step in the data analysis pipeline, encompassing tasks such as
reading data from various file formats, handling missing values, and managing outliers. This
practical guide explores how to execute these tasks using the pandas library in Python.
Steps:
Step 1: Reading from CSV and JSON Files
1. Utilize pandas to read data from a CSV file ('DATA SET.csv') into a data frame.
2. Use pandas to read data from a JSON file ('ds.json') into a data frame.
3. Display the first few rows of each data frame to inspect the data.
Step 2: Handling Missing Values
1. Drop rows with missing values from the CSV data frame.
2. Fill missing values with a specific value (e.g., 0) in the JSON data frame.
Step 3: Handling Outliers
1. Identify outliers in the 'Sales' column of the CSV data frame.
2. Replace outliers with the median value.
Step 4: Manipulating and Transforming Data
1. Filter the CSV data frame to include only rows where 'Sales' is greater than 10.
2. Sort the CSV data frame based on the 'Sales' column in descending order.
3. Group the CSV data frame by the 'Category' column and calculate the mean for numeric columns ('Sales',
'Cost', 'Profit').
Step 5: Displaying Results
1. Display the cleaned CSV data frame after handling missing values.
2. Display the JSON data frame after filling missing values.
3. Display the filtered CSV data frame.
4. Display the sorted CSV data frame.
5. Display the grouped CSV data frame showing the mean values for numeric columns.
Code:
import pandas as pd
# Read data from CSV file into a data frame
csv_file_path = 'DATA SET.csv'
df_csv = pd.read_csv(csv_file_path)
# Read data from JSON file into a data frame
json_file_path = 'ds.json'
df_json = pd.read_json(json_file_path)
# Display the first few rows of each data frame to inspect the data
print("CSV Data:")
print(df_csv.head())
print("\nJSON Data:")
print(df_json.head())
# Handling missing values
# Drop rows with missing values
df_csv_cleaned = df_csv.dropna()
# Fill missing values with a specific value (e.g., 0)
df_json_filled = df_json.fillna(0)
# Handling outliers
# Assume 'Sales' is the column with outliers
# Replace outliers with the median
median_value = df_csv['Sales'].median()
upper_threshold = df_csv['Sales'].mean() + 2 * df_csv['Sales'].std()
lower_threshold = df_csv['Sales'].mean() - 2 * df_csv['Sales'].std()
df_csv['Sales'] = df_csv['Sales'].apply(lambda x: median_value if x > upper_threshold or x < lower_threshold else x)
# Manipulate and transform data
# Filtering
filtered_data = df_csv[df_csv['Sales'] > 10]
# Sorting
sorted_data = df_csv.sort_values(by='Sales', ascending=False)
# Grouping and calculating mean for numeric columns
numeric_columns = ['Sales', 'Cost', 'Profit']
grouped_data = df_csv.groupby('Category')[numeric_columns].mean()
# Display the results
print("\nCleaned CSV Data:")
print(df_csv_cleaned.head())
print("\nFilled JSON Data:")
print(df_json_filled.head())
print("\nFiltered Data:")
print(filtered_data.head())
print("\nSorted Data:")
print(sorted_data.head())
print("\nGrouped Data:")
print(grouped_data.head())
Output:
Practical No. 03
Feature Scaling:
Feature scaling is a preprocessing technique used to standardize the range of independent
variables or features of the data. It is essential for certain machine learning algorithms that are
sensitive to the scale of input features, ensuring that all features contribute equally to the
learning process.
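For reference, standardization rescales each value as z = (x − mean) / (standard deviation), giving each feature mean 0 and unit variance, while min-max normalization maps each value to the [0, 1] range via x' = (x − min) / (max − min).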
Feature Dummification:
Feature dummification or one-hot encoding is a technique used to convert categorical
variables into numerical representations. This is necessary because many machine learning
algorithms require numerical input, and representing categorical variables as binary vectors
helps maintain their information.
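As a minimal illustration with pandas, using a hypothetical three-row Category column:
import pandas as pd
# Hypothetical categorical column
df = pd.DataFrame({'Category': ['Apple', 'Banana', 'Apple']})
# One-hot encoding: each category value becomes its own binary indicator column
print(pd.get_dummies(df, columns=['Category']))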
Steps:
1. Load and Explore Data: Load the dataset, explore its structure, and identify the numeric and categorical features.
2. Feature Scaling: Apply standardization and normalization to numeric features.
3. Feature Dummification: Convert categorical variables into numerical representations
using one-hot encoding.
4. Combine Features: Combine scaled numeric features with one-hot encoded categorical
features.
5. Display Resulting Dataset: Display the final dataset after both feature scaling and
dummification.
Code:
import pandas as pd
from sklearn.preprocessing import StandardScaler, MinMaxScaler, OneHotEncoder
from sklearn.compose import ColumnTransformer
from sklearn.pipeline import Pipeline
# Define the data
data = {
'Product': ['Apple_Juice', 'Banana_Smoothie', 'Orange_Jam', 'Grape_Jelly', 'Kiwi_Parfait',
'Mango_Chutney', 'Pineapple_Sorbet', 'Strawberry_Yogurt', 'Blueberry_Pie', 'Cherry_Salsa'],
'Category': ['Apple', 'Banana', 'Orange', 'Grape', 'Kiwi', 'Mango', 'Pineapple', 'Strawberry',
'Blueberry', 'Cherry'],
'Sales': [1200, 1700, 2200, 1400, 2000, 1000, 1500, 1800, 1300, 1600],
'Cost': [600, 850, 1100, 700, 1000, 500, 750, 900, 650, 800],
'Profit': [600, 850, 1100, 700, 1000, 500, 750, 900, 650, 800]
}
# Create a DataFrame
df = pd.DataFrame(data)
# Display the original dataset
print("Original Dataset:")
print(df)
# Step 1: Feature Scaling (Standardization and Normalization)
numeric_columns = ['Sales', 'Cost', 'Profit']
scaler_standardization = StandardScaler()
scaler_normalization = MinMaxScaler()
df_scaled_standardized = pd.DataFrame(scaler_standardization.fit_transform(df[numeric_columns]), columns=numeric_columns)
df_scaled_normalized = pd.DataFrame(scaler_normalization.fit_transform(df[numeric_columns]), columns=numeric_columns)
# Combine the scaled numeric features with the categorical features
df_scaled = pd.concat([df_scaled_standardized, df.drop(numeric_columns, axis=1)], axis=1)
# Display the dataset after feature scaling
print("\nDataset after Feature Scaling:")
print(df_scaled)
# Step 2: Feature Dummification
# Identify categorical columns
categorical_columns = ['Product', 'Category']
# Create a column transformer for dummification
preprocessor = ColumnTransformer(
transformers=[
('categorical', OneHotEncoder(), categorical_columns)
],
remainder='passthrough'
)
# Apply the column transformer to the dataset
df_dummified = pd.DataFrame(preprocessor.fit_transform(df))
# Display the dataset after feature dummification
print("\nDataset after Feature Dummification:")
print(df_dummified)
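The dummified frame above keeps only positional column numbers; in scikit-learn 1.0 or later the encoded feature names can be recovered from the fitted transformer, for example:
# Recover readable column names from the fitted ColumnTransformer (scikit-learn >= 1.0)
df_dummified.columns = preprocessor.get_feature_names_out()
print(df_dummified.head())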
Output:
Practical No. 04
Hypothesis Testing:
Hypothesis testing is a statistical method used to make inferences about population parameters based on
sample data. It involves the formulation of a null hypothesis (H0) and an alternative hypothesis (H1), and the
collection of sample data to assess the evidence against the null hypothesis. The goal is to determine whether
there is enough evidence to reject the null hypothesis in favor of the alternative hypothesis.
1. Formulate Hypotheses:
Null Hypothesis (H0 ): The average caffeine content per serving is 80 mg (μ=80).
Alternative Hypothesis (H1 ): The average caffeine content per serving is different from 80 mg
(μ≠80).
2. Statistical Test:
A t-test is appropriate since you are comparing a sample mean to a known population mean, and the
sample size is small.
3. Data Collection:
Randomly select 30 cans of the energy drink and measure the caffeine content in each.
4. Conducting the Hypothesis Test:
a. Collect Data:
Calculate the sample mean (x̄) and standard deviation (s) from the 30 samples.
b. Set Significance Level (α):
Choose a significance level (commonly α = 0.05, 0.01, or 0.10).
c. Calculate the Test Statistic (t-value):
Use the formula t = (x̄ − μ) / (s / √n).
d. Determine Degrees of Freedom:
For a one-sample t-test, degrees of freedom (df) is n−1.
e. Find Critical Values or P-value:
Use a t-table or statistical software to find the critical t-values for a two-tailed test at the chosen
significance level.
f. Make a Decision:
If the t-value falls outside the critical region, reject the null hypothesis. If it falls inside, fail to reject.
g. Interpretation:
If you reject the null hypothesis, there is enough evidence to suggest that the average caffeine
content per serving is different from 80 mg. If you fail to reject the null hypothesis, there is not
enough evidence to suggest a difference in the average caffeine content.
5. Conclusion:
Draw conclusions about the energy drink's caffeine content, considering both statistical and practical
significance. Consider decisions relevant to the context of the problem.
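Before the two-sample demonstration in the code below, a minimal sketch of the one-sample test described in steps 1–4, using simulated caffeine measurements as a stand-in for real data:
import numpy as np
from scipy import stats
# Simulated caffeine measurements for 30 cans (a stand-in for real measurements)
np.random.seed(0)
caffeine = np.random.normal(loc=82, scale=5, size=30)
# One-sample t-test against the hypothesised mean of 80 mg
t_stat, p_val = stats.ttest_1samp(caffeine, popmean=80)
print(f"t-statistic: {t_stat:.3f}, p-value: {p_val:.3f}")
alpha = 0.05
if p_val < alpha:
    print("Reject H0: the mean caffeine content differs from 80 mg.")
else:
    print("Fail to reject H0: no evidence of a difference from 80 mg.")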
Code:
import numpy as np
from scipy import stats
import matplotlib.pyplot as plt
# Generate two samples for demonstration purposes
np.random.seed(42)
sample1 = np.random.normal(loc=10, scale=2, size=30)
sample2 = np.random.normal(loc=12, scale=2, size=30)
# Perform a two-sample t-test
t_statistic, p_value = stats.ttest_ind(sample1, sample2)
# Set the significance level
alpha = 0.05
print("Results of Two-Sample t-test:")
print(f"t-statistic: {t_statistic}")
print(f"p-value: {p_value}")
print(f"Degrees of Freedom: {len(sample1) + len(sample2) - 2}")
# Plot the distributions
plt.figure(figsize=(10, 6))
plt.hist(sample1, alpha=0.5, label='Sample 1', color='blue')
plt.hist(sample2, alpha=0.5, label='Sample 2', color='orange')
plt.axvline(np.mean(sample1), color='blue', linestyle='dashed', linewidth=2)
plt.axvline(np.mean(sample2), color='orange', linestyle='dashed', linewidth=2)
plt.title('Distributions of Sample 1 and Sample 2')
plt.xlabel('Values')
plt.ylabel('Frequency')
plt.legend()
# Highlight the critical region if null hypothesis is rejected
if p_value < alpha:
    critical_region = np.linspace(min(sample1.min(), sample2.min()), max(sample1.max(), sample2.max()), 1000)
    plt.fill_between(critical_region, 0, 5, color='red', alpha=0.3, label='Critical Region')
# Show the observed t-statistic
plt.text(11, 5, f'T-statistic: {t_statistic:.2f}', ha='center', va='center', color='black', backgroundcolor='white')
# Show the plot
plt.show()
# Draw conclusions
if p_value < alpha:
    if np.mean(sample1) > np.mean(sample2):
        print("Conclusion: There is significant evidence to reject the null hypothesis.")
        print("Interpretation: The mean caffeine content of Sample 1 is significantly higher than that of Sample 2.")
        # Additional context and practical implications can be added here.
    else:
        print("Conclusion: There is significant evidence to reject the null hypothesis.")
        print("Interpretation: The mean caffeine content of Sample 2 is significantly higher than that of Sample 1.")
        # Additional context and practical implications can be added here.
else:
    print("Conclusion: Fail to reject the null hypothesis.")
    print("Interpretation: There is not enough evidence to claim a significant difference between the means.")
Output:
Practical No. 05
from matplotlib import pyplot as plt
# Yearly failure percentage rates
years = [2020, 2021, 2022, 2023, 2024]
failure_percent_rates = [60, 70, 50, 10, 0]
# Line chart of failure rates over the years
plt.plot(years, failure_percent_rates, color="green", marker="o", linestyle="solid")
plt.title("Corona-time failure rates")
plt.xlabel("Year")
plt.ylabel("Failure rate (%)")
plt.show()
Output-
Practical No. 06
Acquire Dataset:
Obtain a dataset suitable for regression analysis. The dataset should contain variables that
you believe may have a linear relationship or can be used to predict another variable of
interest.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn import metrics
# Synthetic stand-in data (assumption: replace with the acquired dataset)
rng = np.random.default_rng(42)
sales = rng.uniform(1000, 2000, 100)
cost = 0.5 * sales + rng.normal(0, 50, 100)
profit = sales - cost + rng.normal(0, 30, 100)
df = pd.DataFrame({'Sales': sales, 'Cost': cost, 'Profit': profit})
# Simple linear regression: one predictor
X_simple = df[['Sales']]
y = df['Profit']
X_train_simple, X_test_simple, y_train_simple, y_test_simple = train_test_split(X_simple, y, test_size=0.2, random_state=42)
regressor_simple = LinearRegression()
regressor_simple.fit(X_train_simple, y_train_simple)
y_pred_simple = regressor_simple.predict(X_test_simple)
# Model evaluation
print('Simple Linear Regression:')
print('Intercept:', regressor_simple.intercept_)
print('Coefficient:', regressor_simple.coef_)
print('Mean Absolute Error:', metrics.mean_absolute_error(y_test_simple, y_pred_simple))
print('Mean Squared Error:', metrics.mean_squared_error(y_test_simple, y_pred_simple))
print('Root Mean Squared Error:', np.sqrt(metrics.mean_squared_error(y_test_simple, y_pred_simple)))
print('R-squared:', metrics.r2_score(y_test_simple, y_pred_simple))
# Multiple linear regression: several predictors, same workflow
X_multi = df[['Sales', 'Cost']]
X_train_multi, X_test_multi, y_train_multi, y_test_multi = train_test_split(X_multi, y, test_size=0.2, random_state=42)
regressor_multi = LinearRegression()
regressor_multi.fit(X_train_multi, y_train_multi)
Output-
Practical No. 07
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import matplotlib.pyplot as plt
import seaborn as sns
from sklearn.preprocessing import StandardScaler
# Load the dataset (assumption: a CSV such as the Pima diabetes data containing the columns used below)
df = pd.read_csv('diabetes.csv')  # hypothetical file name
# Split the dataset into features (X) and target variable (y)
X = df.drop(columns=['BloodPressure', 'Age'])  # also drop the target so it does not leak into the features
y = df['Age']
# Train/test split and feature scaling
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
# Logistic Regression
log_reg_model = LogisticRegression(max_iter=1000)  # increase max_iter to avoid a convergence warning
log_reg_model.fit(X_train_scaled, y_train)
y_pred_log_reg = log_reg_model.predict(X_test_scaled)
# Decision Tree
dt_model = DecisionTreeClassifier()
dt_model.fit(X_train_scaled, y_train)
y_pred_dt = dt_model.predict(X_test_scaled)
# Compare overall accuracy of the two models
print('Logistic Regression accuracy:', accuracy_score(y_test, y_pred_log_reg))
print('Decision Tree accuracy:', accuracy_score(y_test, y_pred_dt))
plt.figure(figsize=(12, 5))
plt.subplot(1, 2, 1)
sns.heatmap(confusion_matrix(y_test, y_pred_log_reg), annot=True, cmap='Blues', fmt='g')
plt.title('Confusion Matrix - Logistic Regression')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.subplot(1, 2, 2)
sns.heatmap(confusion_matrix(y_test, y_pred_dt), annot=True, cmap='Blues', fmt='g')
plt.title('Confusion Matrix - Decision Tree')
plt.xlabel('Predicted')
plt.ylabel('True')
plt.tight_layout()
plt.show()
Output-
Practical No. 08
Interpret Results:
Interpret the clustering results based on the characteristics of each cluster.
Analyze any meaningful patterns or insights discovered through clustering.
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
from sklearn.metrics import silhouette_score
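The clustering step itself is not shown; a minimal sketch using the imports above, with synthetic blobs standing in for a real dataset:
# Synthetic data with three clusters (a stand-in for a real dataset)
X_blobs, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=42)
# Fit K-Means and evaluate cluster quality with the silhouette score
kmeans = KMeans(n_clusters=3, n_init=10, random_state=42)
labels = kmeans.fit_predict(X_blobs)
print("Silhouette Score:", silhouette_score(X_blobs, labels))
# Visualize the clusters and their centers
plt.scatter(X_blobs[:, 0], X_blobs[:, 1], c=labels, cmap='viridis')
plt.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], c='red', marker='x')
plt.title('K-Means Clusters')
plt.show()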
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.datasets import load_iris
# Load the iris dataset and perform PCA
iris = load_iris()
X = iris.data
pca = PCA(n_components=2)
X_pca = pca.fit_transform(X)
# Calculate explained variance ratio
explained_variance_ratio = pca.explained_variance_ratio_
print("Explained Variance Ratio:", explained_variance_ratio)
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
# Load sample dataset (you can replace this with your own dataset)
df = sns.load_dataset('tips')
# Data Visualization
# Visualization 1: Distribution of Total Bill Amount
plt.figure(figsize=(10, 6))
sns.histplot(df['total_bill'], kde=True)
plt.title('Distribution of Total Bill Amount')
plt.xlabel('Total Bill Amount')
plt.ylabel('Frequency')
plt.show()
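# Visualization 2 (illustrative sketch): total bill vs. tip by gender, the relationship
# referenced in the second insight printed below; column names follow the seaborn 'tips' dataset
plt.figure(figsize=(10, 6))
sns.scatterplot(data=df, x='total_bill', y='tip', hue='sex')
plt.title('Total Bill vs. Tip Amount by Gender')
plt.xlabel('Total Bill Amount')
plt.ylabel('Tip Amount')
plt.show()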
# Data Storytelling
print("\nInsights:")
print("1. The distribution of total bill amounts is right-skewed, with most bills falling between
$10 and $20.")
print("2. There is a positive relationship between total bill amount and tip amount, with some
variations based on gender.")
print("3. Total bill amounts tend to be higher on Saturdays compared to other days.")
print("4. The count of customers is higher during dinner time compared to lunchtime on all
days.")
# Conclusion
print("\nConclusion:")
print("Based on the analysis, we can infer that there is a strong relationship between the total
bill amount and tip amount, with variations based on factors such as day and time. Further
analysis can be conducted to explore these relationships in more detail.")
Output-