DW Lab File

Uploaded by

jadeanica

Index

S. No   Practical   Remarks
1   Write a program to read data from CSV files and display the content.
2   Write a program to read data from JSON files and display the content.
3   Load a dataset into a data structure (e.g., Data Frame) and perform basic data cleaning (e.g., handling missing values).
4   Solve a case study by performing filtering, grouping, and adding a new column to the dataset.
5   Design and implement a program to create a Data Mart.
6   Implementation of Data Cleansing using Python.
7   Develop a program to create metadata for a dataset, including relevant data descriptions.
8   Write Python code to perform data transformation tasks.
9   Write Python code for Data Discretization.
10  Create and visualize a graph from a dataset using a graph library.
11  Case Study I
12  Case Study II
13  Case Study III
14  Implement a k-Nearest Neighbour (k-NN) classifier and evaluate its performance on a given dataset.

1. Write a program to read data from CSV files and display the content.

Objective:- Read and display data from various file formats such as CSV and JSON.

Code:-
import csv

def read_csv(file_path):
    """Print the header row and every data row of a CSV file."""
    try:
        with open(file_path, mode='r', newline='', encoding='utf-8') as file:
            csv_reader = csv.reader(file)
            header = next(csv_reader)
            print(f"Header: {header}")
            print("\nData:")
            for row in csv_reader:
                print(row)
    except FileNotFoundError:
        print(f"Error: The file '{file_path}' was not found.")
    except Exception as e:
        print(f"An error occurred: {e}")

file_path = 'Book1.csv'
read_csv(file_path)

Output:-

2. Write a program to read data from JSON files and display the content.
Objective:- Load datasets into appropriate data structures (e.g., Pandas DataFrame)
for analysis.

Code:-

import json

def display_json_data():
    # Sample data held in memory as a Python dictionary
    json_data = {
        "name": "Alice",
        "age": 30,
        "city": "New York",
        "is_student": False,
        "courses": ["Math", "Science", "English"]
    }
    print("JSON Data Content:")
    print(json.dumps(json_data, indent=4))

display_json_data()

Output:-
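
The program above prints a dictionary defined in the code rather than reading a file. A minimal sketch of reading from an actual JSON file is given here. The helper name read_json, the file name sample_data.json, and the sample record are assumptions made only for illustration; the sketch writes the file first so that it can run on its own.

```python
import json
import os
import tempfile

def read_json(file_path):
    """Load a JSON file and return the parsed Python object."""
    with open(file_path, mode='r', encoding='utf-8') as file:
        return json.load(file)

# Write a small sample file first so the example is self-contained.
# The file name and contents are hypothetical.
sample = {"name": "Alice", "age": 30, "courses": ["Math", "Science"]}
sample_path = os.path.join(tempfile.gettempdir(), 'sample_data.json')
with open(sample_path, mode='w', encoding='utf-8') as file:
    json.dump(sample, file)

# Read the file back and display its content
loaded = read_json(sample_path)
print("JSON Data Content:")
print(json.dumps(loaded, indent=4))
```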

3. Load a dataset into a data structure (e.g., Data Frame) and perform basic
data cleaning (e.g., handling missing values).
Objective:- Perform basic data cleaning by handling missing values and
inconsistencies.

Code:-

import pandas as pd
import numpy as np

# Create a dictionary of columns to hold the raw data
data = {
    'Customer ID': [1, 2, 3, 4, 5, 6],
    'Name': ['John Smith', 'jane Doe', 'jake Doe', 'john Smith', None, 'Alice Brown'],
    'Purchase Date': ['01/09/2024', '01/09/0024', '02/09/2024', '09/01/2024',
                      '01/09/2024', '01/09/2024'],
    'Amount': [100, '$200', 300, 400, -500, 600],
    'Email': ['[email protected]', '[email protected]', 'N/A',
              '[email protected]', '', 'alice#example.com'],
    'Address': ['123 Maple St,NY', '456 Eim St,NY', '789 Pine St,NY',
                '123 Maple St,NY', '123 Maple St,NY', '']
}

# Create the DataFrame
df = pd.DataFrame(data)

# Parse dates as day/month/year; values that cannot be parsed become missing
df['Purchase Date'] = pd.to_datetime(df['Purchase Date'], format='%d/%m/%Y',
                                     errors='coerce').dt.strftime('%Y-%m-%d')

# Remove dollar signs and commas, then convert Amount to numeric
df['Amount'] = df['Amount'].astype(str).str.replace(r'[\$,]', '', regex=True).astype(float)

# Correct negative amounts (assume a refund should be positive)
df['Amount'] = df['Amount'].abs()

# Replace 'N/A', 'Invalid', and empty emails with a missing value
df['Email'] = df['Email'].replace(['N/A', 'Invalid', ''], np.nan)
df['Email'] = df['Email'].replace('alice#example.com', '[email protected]')

# Correct lowercase names
df['Name'] = df['Name'].str.title()

df = df.drop_duplicates(subset=['Customer ID', 'Amount'])

# Treat empty addresses as missing, then fill them.
# fillna(..., inplace=True) returns None, so the result is assigned directly instead.
df['Address'] = df['Address'].replace('', np.nan).fillna('Address not Available')

# Final cleansed data
print("\nCleansed Data")
print(df)
Output:-
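
The cleaning code above corrects one malformed email by naming it explicitly. A more general sketch flags every value that does not look like an email address. The pattern EMAIL_PATTERN and the sample values are assumptions for illustration; the pattern is a deliberately simple check, not a complete email validator.

```python
import pandas as pd

# Hypothetical sample values; not taken from the lab dataset
emails = pd.Series(['user@example.com', 'N/A', '', 'alice#example.com', None])

# Simple assumed pattern: something@something.something with no spaces
EMAIL_PATTERN = r'^[^@\s]+@[^@\s]+\.[^@\s]+$'

# True where the value matches the pattern; missing values count as invalid
is_valid = emails.str.match(EMAIL_PATTERN, na=False)

# Keep valid emails and turn everything else into a missing value
cleaned = emails.where(is_valid)

print(is_valid.tolist())
print(cleaned)
```
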
4. Solve a case study by performing filtering, grouping, and adding a new
column to the dataset.

Objective:- Implement advanced data analysis techniques, such as filtering,
grouping, and adding new calculated columns to datasets.

Code:-

import pandas as pd
file_path = 'sales_data.csv'
df = pd.read_csv(file_path)
# Display the original dataset
print("Original Dataset:")
print(df.head())
# Filter the dataset where Amount is greater than 800
filtered_df = df[df['Amount'] > 800]
# Display the filtered dataset
print("\nFiltered Dataset (Amount > 800):")
print(filtered_df)
# Group by Salesperson and total the sales amount and quantity
grouped_df = df.groupby('Salesperson').agg(
    total_sales=('Amount', 'sum'),
    total_quantity_sold=('Quantity', 'sum')
).reset_index()
# Display the grouped data
print("\nGrouped Data by Salesperson:")
print(grouped_df)
# Add a new column with the sales amount after applying the discount
df['Total Sales after Discount'] = df['Amount'] * (1 - df['Discount'])
print("\nDataset with 'Total Sales after Discount' Column:")
print(df)

Output:-
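
The program above depends on a file named sales_data.csv whose contents are not shown. The same three steps can be sketched on a small in-memory DataFrame. The rows here are hypothetical stand-ins for the CSV, using the column names the program expects.

```python
import pandas as pd

# Hypothetical rows standing in for sales_data.csv
df = pd.DataFrame({
    'Salesperson': ['Alice', 'Bob', 'Alice', 'Charlie'],
    'Amount': [1000, 1500, 800, 1200],
    'Quantity': [10, 15, 8, 12],
    'Discount': [0.1, 0.05, 0.2, 0.15],
})

# Filtering: keep rows where Amount is greater than 800
filtered_df = df[df['Amount'] > 800]

# Grouping: total sales and quantity per salesperson
grouped_df = df.groupby('Salesperson').agg(
    total_sales=('Amount', 'sum'),
    total_quantity_sold=('Quantity', 'sum'),
).reset_index()

# New column: amount after applying the discount
df['Total Sales after Discount'] = df['Amount'] * (1 - df['Discount'])

print(filtered_df)
print(grouped_df)
print(df)
```
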
5. Design and implement a program to create a Data Mart.

Objective:- Design and build a Data Mart for organizing and storing data for business
intelligence.

Code:-

import pandas as pd

data = {
    'Order_ID': [101, 102, 103, 104, 105, 106],
    'Salesperson': ['Alice', 'Bob', 'Alice', 'Charlie', 'Alice', 'Bob'],
    'Region': ['East', 'West', 'East', 'East', 'West', 'East'],
    'Amount': [1000, 1500, 800, 1200, 2000, 900],
    'Quantity': [10, 15, 8, 12, 20, 9],
    'Discount': [0.1, 0.05, 0.2, 0.15, 0.1, 0.1],
    'Date': ['2024-01-10', '2024-01-12', '2024-01-13', '2024-01-14', '2024-01-15',
             '2024-01-16']
}
# Convert the dictionary to a pandas DataFrame
df = pd.DataFrame(data)
# Data cleaning
print("\nMissing Values in Data:")
print(df.isnull().sum())
# Assign the result back instead of calling fillna(inplace=True) on a column selection
df['Discount'] = df['Discount'].fillna(0)
df['Total_Sales_After_Discount'] = df['Amount'] * (1 - df['Discount'])
# Aggregating data by Region and Salesperson
df_aggregated = df.groupby(['Region', 'Salesperson']).agg(
    total_sales=('Total_Sales_After_Discount', 'sum'),
    total_quantity=('Quantity', 'sum')
).reset_index()
# Display the transformed (aggregated) data
print("\nAggregated Data (Sales by Region and Salesperson):")
print(df_aggregated)
# Save the aggregated table as the Data Mart
data_mart_path = 'sales_data_mart.csv'
df_aggregated.to_csv(data_mart_path, index=False)
# Confirm Data Mart creation
print(f"\nData Mart Created and Saved to: {data_mart_path}")
Output:-
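
The program above stores the Data Mart as a CSV file. As an alternative sketch, the aggregated table can be loaded into a small SQLite database so that it can be queried with SQL. The table name sales_mart, the in-memory database, and the sample rows are assumptions for illustration and are not part of the original program.

```python
import sqlite3
import pandas as pd

# Hypothetical aggregated rows, shaped like df_aggregated above
df_aggregated = pd.DataFrame({
    'Region': ['East', 'East', 'West'],
    'Salesperson': ['Alice', 'Bob', 'Alice'],
    'total_sales': [1540.0, 810.0, 1800.0],
    'total_quantity': [18, 9, 20],
})

# An in-memory database keeps the example self-contained
conn = sqlite3.connect(':memory:')

# Load the aggregated table into the mart (assumed table name: sales_mart)
df_aggregated.to_sql('sales_mart', conn, index=False, if_exists='replace')

# Query the mart: sales in the East region, highest first
east = pd.read_sql_query(
    "SELECT Salesperson, total_sales FROM sales_mart "
    "WHERE Region = 'East' ORDER BY total_sales DESC",
    conn,
)
print(east)
conn.close()
```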

6. Implementation of Data Cleansing using Python

Objective:- Apply data cleansing techniques to improve data quality.

Code:-

import pandas as pd
import numpy as np

data = {
    'Customer_ID': [101, 102, 103, 104, 105, 106, 107, 108, 109, 110],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eva', 'Frank', 'George',
             'Hannah', 'Ivan', 'Jack'],
    'Age': [25, 30, np.nan, 22, 35, 29, np.nan, 40, 23, 25],
    'Email': ['[email protected]', '[email protected]', '[email protected]', np.nan,
              '[email protected]', '[email protected]', '[email protected]', '[email protected]',
              '[email protected]', '[email protected]'],
    'Purchase_Amount': [100, 200, 150, 300, 250, 400, 450, 100, 500, 200],
    'Country': ['USA', 'USA', 'USA', 'Canada', 'USA', 'Canada', 'USA', 'USA',
                'USA', 'USA'],
}
# Convert to DataFrame
df = pd.DataFrame(data)
print("Original Dataset:")
print(df)

# Fill missing ages with the median age
df['Age'] = df['Age'].fillna(df['Age'].median())
# Fill NaN values in 'Email' with a placeholder
df['Email'] = df['Email'].fillna('[email protected]')
print("\nData after Handling Missing Values:")
print(df)

# Removing duplicates: add a copy of the first row, then drop it again.
# DataFrame.append was removed in pandas 2.0, so pd.concat is used instead.
df_duplicate = pd.concat([df, df.iloc[[0]]], ignore_index=True)
df_no_duplicates = df_duplicate.drop_duplicates()
print("\nData after Removing Duplicates:")
print(df_no_duplicates)

# Inconsistent formatting: strip stray whitespace.
# str.title() is avoided for 'Country' because it would turn 'USA' into 'Usa'.
df['Country'] = df['Country'].str.strip()
# Strip leading and trailing spaces in the 'Name' column
df['Name'] = df['Name'].str.strip()
print("\nData after Inconsistent Formatting Handling:")
print(df)

# Outlier handling with the interquartile range (IQR) rule
Q1 = df['Purchase_Amount'].quantile(0.25)
Q3 = df['Purchase_Amount'].quantile(0.75)
IQR = Q3 - Q1
lower_bound = Q1 - 1.5 * IQR
upper_bound = Q3 + 1.5 * IQR
df_no_outliers = df[(df['Purchase_Amount'] >= lower_bound) &
                    (df['Purchase_Amount'] <= upper_bound)]
print("\nData after Handling Outliers:")
print(df_no_outliers)

Output:-
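
The IQR rule above removes rows that fall outside the bounds. A common alternative is to cap such values at the bounds instead of dropping them. The following sketch assumes a helper named iqr_bounds and a small hypothetical series that contains one obvious outlier.

```python
import pandas as pd

def iqr_bounds(series, k=1.5):
    """Return (lower, upper) bounds using the k * IQR rule."""
    q1 = series.quantile(0.25)
    q3 = series.quantile(0.75)
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

# Hypothetical purchase amounts; 5000 is far from the rest
amounts = pd.Series([100, 200, 150, 300, 5000])

lower, upper = iqr_bounds(amounts)

# Cap values at the bounds instead of removing the rows
capped = amounts.clip(lower=lower, upper=upper)

print(f"Bounds: {lower} to {upper}")
print(capped.tolist())
```
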
7. Develop a program to create metadata for a dataset, including relevant
data descriptions.

Objective:- Generate metadata to describe the structure and content of datasets.

Code:-

import pandas as pd

# Load dataset
df = pd.read_csv('Book1.csv')
# Generate metadata
metadata = {
'columns': df.columns.tolist(),
'data_types': df.dtypes.to_dict(),
'missing_values': df.isnull().sum().to_dict(),
'descriptive_statistics': df.describe().to_dict()
}
# Output the metadata
print("Metadata for the dataset: \n")
print(metadata['columns'])
print(metadata['data_types'])
print(metadata['missing_values'])
print(metadata['descriptive_statistics'])

Output:-
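
Metadata is usually more useful when it is stored next to the dataset. A minimal sketch of serializing it to JSON is given here, assuming a small hypothetical DataFrame in place of Book1.csv. Data types and counts are converted to plain strings and integers first because the json module cannot serialize NumPy types directly.

```python
import json
import pandas as pd

# Hypothetical dataset standing in for Book1.csv
df = pd.DataFrame({'Name': ['Alice', 'Bob', None], 'Age': [25, 30, 35]})

metadata = {
    'columns': df.columns.tolist(),
    # Convert dtype objects to strings so they can be written as JSON
    'data_types': {col: str(dtype) for col, dtype in df.dtypes.items()},
    # Convert NumPy integers to plain Python integers
    'missing_values': {col: int(n) for col, n in df.isnull().sum().items()},
    'row_count': len(df),
}

metadata_json = json.dumps(metadata, indent=4)
print(metadata_json)
```
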
8. Write Python code to perform data transformation tasks.

Objective:- Perform data transformation tasks like normalization, scaling, and
encoding.

Code:-

from sklearn.preprocessing import MinMaxScaler
import pandas as pd

# Example dataset
data = {'Age': [25, 30, 35, 40], 'Salary': [50000, 60000, 70000, 80000]}
df = pd.DataFrame(data)

# Min-Max Scaling of 'Age' and 'Salary' columns
scaler = MinMaxScaler()
df[['Age', 'Salary']] = scaler.fit_transform(df[['Age', 'Salary']])

print("Transformed Data:")
print(df)

Output:-
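
The objective also mentions normalization and encoding, which the Min-Max example does not cover. A minimal sketch of z-score standardization and one-hot encoding using only pandas is given here. The 'Department' column and its values are hypothetical and are added only to have a categorical column to encode.

```python
import pandas as pd

# Hypothetical dataset with one numeric and one categorical column
df = pd.DataFrame({
    'Age': [25, 30, 35, 40],
    'Department': ['HR', 'IT', 'IT', 'Sales'],
})

# Z-score standardization: subtract the mean, divide by the standard deviation.
# ddof=0 uses the population standard deviation.
df['Age_z'] = (df['Age'] - df['Age'].mean()) / df['Age'].std(ddof=0)

# One-hot encoding: one 0/1 column per category
encoded = pd.get_dummies(df, columns=['Department'], dtype=int)

print(encoded)
```
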
9. Write Python code for Data Discretization.

Objective:- Apply data discretization to convert continuous data into discrete bins.

Code:-

import pandas as pd

df = pd.DataFrame({'Age': [25, 30, 35, 40, 45, 50, 55, 60]})

# Define bins and labels; each bin includes its right edge, e.g. (0, 30]
bins = [0, 30, 45, 100]
labels = ['Young', 'Middle-Aged', 'Old']

# Create a new 'Age_Group' column
df['Age_Group'] = pd.cut(df['Age'], bins=bins, labels=labels)

print("Discretized Data:")
print(df)

Output:-
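
The example above uses fixed bin edges chosen by hand. Another option is equal-frequency binning, where each bin receives roughly the same number of rows. This can be sketched with pd.qcut on the same ages; the column name 'Age_Quartile' and the labels Q1 to Q4 are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Age': [25, 30, 35, 40, 45, 50, 55, 60]})

# Split the ages into four bins with an equal number of rows in each
df['Age_Quartile'] = pd.qcut(df['Age'], q=4, labels=['Q1', 'Q2', 'Q3', 'Q4'])

print("Equal-frequency bins:")
print(df)
print(df['Age_Quartile'].value_counts().sort_index())
```
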
10. Create and visualize a graph from a dataset using a graph library.

Objective:- Visualize datasets by creating graphs using visualization libraries.

Code:-

import matplotlib.pyplot as plt
import pandas as pd

df = pd.DataFrame({
    'Year': [2015, 2016, 2017, 2018, 2019],
    'Sales': [100, 150, 200, 250, 300]
})

# Plotting
plt.plot(df['Year'], df['Sales'], marker='o')
plt.title('Sales Over the Years')
plt.xlabel('Year')
plt.ylabel('Sales')
plt.grid(True)
plt.show()

Output:-
11. Case Study I

Objective:- Solve business case studies by analyzing the data and extracting
actionable insights.

Code:-

import pandas as pd
from sklearn.preprocessing import StandardScaler
from sklearn.impute import SimpleImputer

data = {'Age': [25, 30, 35, None, 45],
        'Salary': [50000, 60000, None, 80000, 95000]}
df = pd.DataFrame(data)

# Handle missing values using mean imputation (each column is filled with its own mean)
imputer = SimpleImputer(strategy='mean')
df[['Age', 'Salary']] = imputer.fit_transform(df[['Age', 'Salary']])

# Feature scaling
scaler = StandardScaler()
df[['Age', 'Salary']] = scaler.fit_transform(df[['Age', 'Salary']])

# Print the cleaned data
print("Cleaned Data:")
print(df)

Output:-
12. Case Study II

Objective:- Implement machine learning algorithms such as k-Nearest Neighbors (k-NN)
for classification and performance evaluation.

Code:-

from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score

# Load the Iris dataset
iris = load_iris()
X = iris.data
y = iris.target

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)

# Initialize the k-NN classifier with k=3
knn = KNeighborsClassifier(n_neighbors=3)

# Fit the model
knn.fit(X_train, y_train)

# Predict on the test set
y_pred = knn.predict(X_test)

# Evaluate the performance using accuracy score
accuracy = accuracy_score(y_test, y_pred)
print(f'Accuracy of k-NN classifier: {accuracy * 100:.2f}%')

Output:-
13. Case Study III

Objective:- Solve business case studies by analyzing the data and extracting
actionable insights.

Code:-

from sklearn.datasets import load_iris
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import classification_report

# Recreate the Iris train/test split from Case Study II so this script runs on its own
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=42)

# Grid search for hyperparameter tuning
param_grid = {'n_neighbors': [1, 3, 5, 7, 9],
              'weights': ['uniform', 'distance'],
              'metric': ['euclidean', 'manhattan']}
grid_search = GridSearchCV(KNeighborsClassifier(), param_grid, cv=5)
grid_search.fit(X_train, y_train)

# Best parameters from Grid Search
print(f"Best parameters: {grid_search.best_params_}")

# Evaluate the best model
best_knn = grid_search.best_estimator_
y_pred_best = best_knn.predict(X_test)

# Classification report
print("\nClassification Report:")
print(classification_report(y_test, y_pred_best))

Output:-
14. Implement a k-Nearest Neighbour (k-NN) classifier and evaluate its
performance on a given dataset.

Objective:- Implement machine learning algorithms such as k-Nearest Neighbors (k-NN)
for classification and performance evaluation.

Code:-

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.metrics import accuracy_score, confusion_matrix, classification_report
import matplotlib.pyplot as plt
import seaborn as sns

# Load the Iris dataset
iris = load_iris()
X = iris.data    # Features
y = iris.target  # Target labels (species)

# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3,
                                                    random_state=42)

# Initialize the classifier; you can adjust k (the number of neighbors)
knn = KNeighborsClassifier(n_neighbors=3)

# Fit the model and predict on the test set
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)

# Accuracy
accuracy = accuracy_score(y_test, y_pred)
print(f"Accuracy of the k-NN classifier: {accuracy * 100:.2f}%")

# Confusion Matrix
cm = confusion_matrix(y_test, y_pred)
print("\nConfusion Matrix:")
print(cm)

# Classification Report (Precision, Recall, F1-Score)
print("\nClassification Report:")
print(classification_report(y_test, y_pred))

# Visualize the confusion matrix using seaborn
plt.figure(figsize=(7, 5))
sns.heatmap(cm, annot=True, fmt='d', cmap='Blues', xticklabels=iris.target_names,
            yticklabels=iris.target_names)
plt.title("Confusion Matrix for k-NN Classifier")
plt.xlabel("Predicted Labels")
plt.ylabel("True Labels")
plt.show()
Output:-
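
The program above relies on scikit-learn's KNeighborsClassifier. To show what the classifier does internally, here is a from-scratch sketch in NumPy using Euclidean distance and majority voting. The function name knn_predict and the toy data are assumptions for illustration; this is not how scikit-learn implements the algorithm.

```python
from collections import Counter

import numpy as np

def knn_predict(X_train, y_train, X_query, k=3):
    """Predict a label for each query point by majority vote of its k nearest neighbours."""
    X_train = np.asarray(X_train, dtype=float)
    X_query = np.asarray(X_query, dtype=float)
    predictions = []
    for point in X_query:
        # Euclidean distance from the query point to every training point
        distances = np.linalg.norm(X_train - point, axis=1)
        # Indices of the k closest training points
        nearest = np.argsort(distances)[:k]
        # Majority vote among the labels of those neighbours
        votes = Counter(y_train[i] for i in nearest)
        predictions.append(votes.most_common(1)[0][0])
    return predictions

# Hypothetical toy data: two well-separated clusters
X_toy = [[1.0, 1.0], [1.2, 0.8], [0.9, 1.1], [5.0, 5.0], [5.2, 4.9], [4.8, 5.1]]
y_toy = [0, 0, 0, 1, 1, 1]

print(knn_predict(X_toy, y_toy, [[1.1, 1.0], [5.1, 5.0]], k=3))
```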
