Data Mining Practicals Complete
Practical No: 1
Practical Name: Calculate the mean and standard deviation
Code:
import numpy as np
# Example data
data = [10, 20, 30, 40, 50]
# Calculate mean
mean = np.mean(data)
# Calculate population standard deviation (NumPy's default, ddof=0)
std_dev = np.std(data)
print("Mean:", mean)
print("Standard Deviation:", std_dev)
Output:
Mean: 30.0
Standard Deviation: 14.142135623730951
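Note: The value 14.14 above is the population standard deviation, which divides the squared deviations by n. A minimal sketch of the difference from the sample standard deviation (which divides by n - 1) could look like the following. The variable names population_std and sample_std are illustrative and do not come from the practical itself.

```python
import numpy as np

data = [10, 20, 30, 40, 50]

# Population standard deviation: divide the squared deviations by n (ddof=0, NumPy's default)
population_std = np.std(data)

# Sample standard deviation: divide the squared deviations by n - 1 (ddof=1)
sample_std = np.std(data, ddof=1)

print("Population Standard Deviation:", population_std)
print("Sample Standard Deviation:", sample_std)
```

For this data the sample value is about 15.81, because the sum of squared deviations (1000) is divided by 4 instead of 5.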
Practical No: 2
Practical Name: Read the CSV file
Code:
import csv
import pandas as pd
Output:
['John', '25', '50000']
['Sarah', '30', '60000']
['Mike', '28', '55000']
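Note: The file-reading step is not shown in the code above. One way the listed rows could be produced with the csv module is sketched below. The file name employees.csv and the header names are assumptions made for illustration; only the three data rows come from the output above.

```python
import csv
import os
import tempfile

# Hypothetical file contents: the header names are assumed,
# the three data rows are taken from the output listed in this practical
rows_to_write = [
    ["Name", "Age", "Salary"],
    ["John", "25", "50000"],
    ["Sarah", "30", "60000"],
    ["Mike", "28", "55000"],
]

# Write the sample file to a temporary directory (hypothetical file name)
path = os.path.join(tempfile.mkdtemp(), "employees.csv")
with open(path, "w", newline="") as f:
    csv.writer(f).writerows(rows_to_write)

# Read the file back, skip the header, and print each record as a list of strings
rows = []
with open(path, newline="") as f:
    reader = csv.reader(f)
    next(reader)  # skip the header row
    for row in reader:
        rows.append(row)
        print(row)
```

csv.reader returns every field as a string, which is why the ages and salaries appear in quotes. Loading the same file with pd.read_csv(path) would instead infer numeric column types.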
Code:
import pandas as pd
Output:
Mean Salary: 75000.0
Total Salary: 150000
Number of rows: 2
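Note: The data-loading and summary steps are missing from the code above. A rough sketch of how such a summary might be computed with pandas follows. The DataFrame contents are hypothetical, chosen only so that the totals agree with the listed output; the original data is not known.

```python
import pandas as pd

# Hypothetical data: two rows whose salaries were picked to match the totals listed above
df = pd.DataFrame({"Name": ["A", "B"], "Salary": [70000, 80000]})

mean_salary = df["Salary"].mean()   # arithmetic mean of the column
total_salary = df["Salary"].sum()   # column total
row_count = len(df)                 # number of rows in the DataFrame

print("Mean Salary:", mean_salary)
print("Total Salary:", total_salary)
print("Number of rows:", row_count)
```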
Practical No: 4
Practical Name: Calculate total sales by month
Code:
import pandas as pd
print(total_sales_by_month)
Output:
  YearMonth  Sales
0   2023-01  25000
1   2023-02  27000
2   2023-03  30000
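Note: The code above does not show how total_sales_by_month is built. A possible approach, assuming a table with Date and Sales columns, is sketched below. The individual transactions are hypothetical and were chosen so that the monthly totals agree with the output above.

```python
import pandas as pd

# Hypothetical transactions, chosen so the monthly totals match the listed output
sales = pd.DataFrame({
    "Date": ["2023-01-05", "2023-01-20", "2023-02-10", "2023-02-25", "2023-03-15"],
    "Sales": [10000, 15000, 12000, 15000, 30000],
})
sales["Date"] = pd.to_datetime(sales["Date"])

# Derive a year-month key, then sum the sales within each month
sales["YearMonth"] = sales["Date"].dt.to_period("M")
total_sales_by_month = sales.groupby("YearMonth")["Sales"].sum().reset_index()
print(total_sales_by_month)
```

Converting the dates to a monthly period keeps the year in the grouping key, so January 2023 and January 2024 would not be merged.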
Practical No: 5
Practical Name: Implement Clustering using K-Means
Code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
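# Generate sample data (assumed parameters: the original data-generation step is not shown)
X, _ = make_blobs(n_samples=300, centers=4, random_state=42)
# Apply KMeans
kmeans = KMeans(n_clusters=4, random_state=42)
kmeans.fit(X)
labels = kmeans.labels_
print("Cluster Centers:", kmeans.cluster_centers_)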
Output:
Cluster Centers: [[ 4.6 2.3]
[ 1.2 -0.3]
[-2.1 7.5]
[ 3.2 6.1]]
Practical No: 6
Practical Name: Classification using Random Forest
Code:
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from sklearn.metrics import accuracy_score
Output:
Accuracy: 0.97
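Note: Only the imports survive in the code above. A minimal sketch of a Random Forest classifier on the Iris dataset is given below. The 80/20 split, the seed, and the number of trees are assumptions, so the accuracy it prints may differ from the value listed above.

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

# Load the Iris dataset (150 samples, 4 features, 3 classes)
iris = load_iris()
X, y = iris.data, iris.target

# Assumed split: 80% training, 20% testing, fixed seed for repeatability
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# Assumed hyperparameters: 100 trees, fixed seed
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)

# Evaluate on the held-out test set only
y_pred = clf.predict(X_test)
accuracy = accuracy_score(y_test, y_pred)
print("Accuracy:", round(accuracy, 2))
```

Accuracy is measured on data the model did not see during training, which is why the split comes before fit.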
Practical No: 7
Practical Name: Regression Analysis using Linear Regression
Code:
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
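# Sample data (assumed: y = 2x, consistent with the listed output)
X = np.array([[1], [2], [3], [4], [5]])
y = np.array([2, 4, 6, 8, 10])
# Fit the model
lin_reg = LinearRegression()
lin_reg.fit(X, y)
# Predict
y_pred = lin_reg.predict(X)
print("Predicted Values:", y_pred)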
Output:
Predicted Values: [ 2. 4. 6. 8. 10.]
Practical No: 8
Practical Name: Association Rule Mining using Apriori
Code:
import pandas as pd
from mlxtend.frequent_patterns import apriori, association_rules
# Sample dataset: one row per transaction, one boolean column per item
data = {'Milk': [1, 1, 0], 'Bread': [1, 0, 1], 'Butter': [0, 1, 1]}
df = pd.DataFrame(data).astype(bool)
# Apply Apriori
frequent_itemsets = apriori(df, min_support=0.5, use_colnames=True)
rules = association_rules(frequent_itemsets, metric="lift", min_threshold=1.0)
print("Rules:", rules)
Output:
Rules: Empty DataFrame
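Note: The empty result follows from the support values. Each item appears in 2 of the 3 transactions (support 0.67), but every pair of items appears together in only 1 of 3 (support 0.33), which is below min_support=0.5. No itemset with two or more items is frequent, so no rule can be formed. The sketch below checks this arithmetic with plain pandas; the names item_support and pair_support are illustrative.

```python
from itertools import combinations

import pandas as pd

data = {'Milk': [1, 1, 0], 'Bread': [1, 0, 1], 'Butter': [0, 1, 1]}
df = pd.DataFrame(data).astype(bool)

# Support of an itemset = fraction of transactions that contain every item in it
item_support = {item: df[item].mean() for item in df.columns}
pair_support = {
    pair: (df[pair[0]] & df[pair[1]]).mean()
    for pair in combinations(df.columns, 2)
}

print("Item supports:", item_support)
print("Pair supports:", pair_support)
```

Lowering min_support to 0.3 would let the pairs through, and association_rules would then have itemsets to derive rules from.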
Practical No: 9
Practical Name: Visualize the result of clustering and compare
Code:
import numpy as np
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs
Output:
A scatter plot showing clusters formed by K-Means.
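Note: The plotting code is not shown above. One way to visualize and compare the clustering, assuming synthetic data from make_blobs, is to draw the generated labels and the K-Means labels side by side. The data parameters and plot styling below are assumptions.

```python
import matplotlib.pyplot as plt
from sklearn.cluster import KMeans
from sklearn.datasets import make_blobs

# Assumed synthetic data: 300 points around 4 centers
X, y_true = make_blobs(n_samples=300, centers=4, cluster_std=0.6, random_state=42)

# Fit K-Means and get the cluster assignment for every point
kmeans = KMeans(n_clusters=4, n_init=10, random_state=42)
labels = kmeans.fit_predict(X)
centers = kmeans.cluster_centers_

fig, (ax_true, ax_pred) = plt.subplots(1, 2, figsize=(12, 5))

# Left panel: points colored by the labels used to generate the data
ax_true.scatter(X[:, 0], X[:, 1], c=y_true, cmap='viridis', s=20)
ax_true.set_title('Generated Labels')

# Right panel: points colored by K-Means assignment, with cluster centers marked
ax_pred.scatter(X[:, 0], X[:, 1], c=labels, cmap='viridis', s=20)
ax_pred.scatter(centers[:, 0], centers[:, 1], c='red', marker='x', s=100, label='Centers')
ax_pred.set_title('K-Means Clusters')
ax_pred.legend()

plt.show()
```

Cluster numbers assigned by K-Means are arbitrary, so the colors in the two panels need not match even when the groupings agree.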
Practical No: 10
Practical Name: Visualize the correlation matrix using a pseudocolor plot
Code:
import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
Output:
A heatmap displaying the correlation matrix of generated features.
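Note: The code that generates the features and draws the plot is not shown above. The sketch below uses Matplotlib's pcolormesh as the pseudocolor plot rather than seaborn. The random data, the column names, and the built-in correlation between A and B are all assumptions for illustration.

```python
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Assumed synthetic data: 100 rows, 4 features, with B constructed to correlate with A
rng = np.random.default_rng(42)
df = pd.DataFrame(rng.normal(size=(100, 4)), columns=['A', 'B', 'C', 'D'])
df['B'] = df['A'] * 0.8 + rng.normal(scale=0.5, size=100)

# Pearson correlation between every pair of columns
corr = df.corr()

# Draw the matrix as a pseudocolor plot with a fixed -1..1 color scale
fig, ax = plt.subplots(figsize=(6, 5))
mesh = ax.pcolormesh(corr.values, cmap='coolwarm', vmin=-1, vmax=1)
fig.colorbar(mesh, ax=ax, label='Correlation')

# Center the tick labels on each cell and put the first row at the top
ticks = np.arange(len(corr.columns)) + 0.5
ax.set_xticks(ticks)
ax.set_xticklabels(corr.columns)
ax.set_yticks(ticks)
ax.set_yticklabels(corr.index)
ax.invert_yaxis()
ax.set_title('Correlation Matrix')
plt.show()
```

With seaborn imported as in the practical, sns.heatmap(corr, annot=True, cmap='coolwarm') would give a similar picture with the values printed in each cell.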
Practical No: 11
Practical Name: Plot the degree distribution of a network
Code:
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
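# Build a sample graph (assumed: the original graph construction is not shown)
G = nx.erdos_renyi_graph(100, 0.05, seed=42)
# Compute degrees and count how many nodes have each degree
degrees = [G.degree(n) for n in G.nodes()]
degree_count = np.bincount(degrees)
plt.bar(range(len(degree_count)), degree_count)
plt.xlabel('Degree')
plt.ylabel('Number of Nodes')
plt.show()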
Output:
A bar plot displaying the degree distribution of a network graph.
Practical No: 12
Practical Name: Graph visualization of a network using statistical measures
Code:
import networkx as nx
import matplotlib.pyplot as plt
import numpy as np
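# Build a sample graph and layout (assumed: the original setup is not shown)
G = nx.erdos_renyi_graph(100, 0.05, seed=42)
pos = nx.spring_layout(G, seed=42)
# Highlight nodes whose degree exceeds mean + one standard deviation (assumed rule)
degrees = np.array([G.degree(n) for n in G.nodes()])
threshold = degrees.mean() + degrees.std()
node_colors = ['red' if d > threshold else 'skyblue' for d in degrees]
plt.figure(figsize=(10, 8))
nx.draw(G, pos, node_size=50, node_color=node_colors, with_labels=False,
        edge_color='lightgray')
plt.title('Network Visualization Based on Degree Statistics')
plt.show()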
Output:
A network graph highlighting nodes based on statistical degree measures.