Recent Trends In IT
Artificial Intelligence Practical Assignments (Python / R programming)
1. Write a program to implement Breadth First Search Algorithm.
→
from collections import deque

def bfs(graph, start):
    visited = set()
    queue = deque([start])
    visited.add(start)
    while queue:
        node = queue.popleft()
        print(node, end=" ")
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                queue.append(neighbor)

# Example graph and execution
graph = {
    'A': ['B', 'C'],
    'B': ['A', 'D', 'E'],
    'C': ['A', 'F'],
    'D': ['B'],
    'E': ['B', 'F'],
    'F': ['C', 'E']
}
bfs(graph, 'A')
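For this example graph, the traversal should print: A B C D E F.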
2. Write a program to implement the Depth First Search Algorithm.
→
def dfs(graph, start, visited=None):
    if visited is None:
        visited = set()
    visited.add(start)
    print(start, end=" ")
    for neighbor in graph[start]:
        if neighbor not in visited:
            dfs(graph, neighbor, visited)

# Example graph and execution
graph = {
    'A': ['B', 'C'],
    'B': ['A', 'D', 'E'],
    'C': ['A', 'F'],
    'D': ['B'],
    'E': ['B', 'F'],
    'F': ['C', 'E']
}
dfs(graph, 'A')
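For the same example graph, the recursive traversal should print: A B D E F C.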
3. Write a program to implement a water jug problem.
→
def water_jug_problem(jug1, jug2, target):
    visited = set()
    queue = [(0, 0)]  # Start with both jugs empty
    while queue:
        (a, b) = queue.pop(0)
        if (a, b) in visited:
            continue
        print(f"Jug1: {a}, Jug2: {b}")
        visited.add((a, b))
        if a == target or b == target:
            return True
        queue.extend([
            (jug1, b),                                 # Fill Jug1
            (a, jug2),                                 # Fill Jug2
            (0, b),                                    # Empty Jug1
            (a, 0),                                    # Empty Jug2
            (min(a + b, jug1), max(0, a + b - jug1)),  # Pour Jug2 -> Jug1
            (max(0, a + b - jug2), min(a + b, jug2))   # Pour Jug1 -> Jug2
        ])
    return False

# Example execution
water_jug_problem(4, 3, 2)
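The function performs a breadth-first search over (jug1, jug2) states, printing each state as it is visited; for jugs of capacity 4 and 3 with target 2 it eventually reaches a state holding 2 gallons and returns True.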
4. Write a program to implement the Tower of Hanoi problem.
→
def tower_of_hanoi(n, source, target, auxiliary):
    if n == 1:
        print(f"Move disk 1 from {source} to {target}")
        return
    tower_of_hanoi(n - 1, source, auxiliary, target)
    print(f"Move disk {n} from {source} to {target}")
    tower_of_hanoi(n - 1, auxiliary, target, source)

# Example execution
tower_of_hanoi(3, 'A', 'C', 'B')
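For 3 disks this prints the standard 7-move solution: disk 1 A→C, disk 2 A→B, disk 1 C→B, disk 3 A→C, disk 1 B→A, disk 2 B→C, disk 1 A→C.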
Data Mining Practical Assignments:
1. Build a classification model in Weka using a Decision Tree algorithm to classify data from the "weather.arff" file. Perform initial preprocessing and create a version of the initial dataset in which all numeric attributes are converted to categorical data.
→ Steps:
1. Open Weka Explorer.
2. Load the weather.arff dataset.
3. Preprocess:
○ Convert numeric attributes to categorical: Use the "Discretize" filter in
the "Preprocess" tab.
4. Apply Decision Tree:
○ Go to the "Classify" tab.
○ Choose J48 under the "Trees" category.
5. Evaluate the model (e.g., cross-validation or training set).
Using R:
# Load required library (RWeka provides read.arff and the J48 classifier)
library(RWeka)
# Load dataset
weather <- read.arff("weather.arff")
# Preprocess: convert the numeric attributes (temperature and humidity in the
# standard weather.arff) to categorical
weather$temperature <- cut(weather$temperature, breaks = 3, labels = c("Low", "Medium", "High"))
weather$humidity <- cut(weather$humidity, breaks = 3, labels = c("Low", "Medium", "High"))
# Train Decision Tree; "play" is the class attribute in weather.arff
model <- J48(play ~ ., data = weather)
summary(model)
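Step 5 (model evaluation) can also be done on the R side; a minimal sketch using RWeka's evaluate_Weka_classifier with 10-fold cross-validation:
# 10-fold cross-validation of the J48 model
evaluate_Weka_classifier(model, numFolds = 10)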
2. Use the "labor.arff" database, apply Linear Regression, and find the total number of instances (using Weka / R).
→
library(foreign)  # provides read.arff
# Load dataset
labor <- read.arff("labor.arff")
# Linear Regression ("TargetAttribute" is a placeholder for a numeric target
# attribute in the dataset)
model <- lm(TargetAttribute ~ ., data = labor)
summary(model)
# Count the total number of instances
nrow(labor)
3. Use the Naïve Bayes algorithm on diabetes data from the "diabetes.arff" file. Perform initial preprocessing and create a version of the initial data set in which the ID field is removed and the "Type" attribute is converted to categorical data (using Weka / R).
→
library(RWeka)
# Load dataset
diabetes <- read.arff("diabetes.arff")
# Preprocess: remove the ID field
diabetes$ID <- NULL
# Convert "Type" to categorical
diabetes$Type <- as.factor(diabetes$Type)
# Train Naïve Bayes (Weka's NaiveBayes classifier via RWeka)
NB <- make_Weka_classifier("weka/classifiers/bayes/NaiveBayes")
model <- NB(Type ~ ., data = diabetes)
summary(model)
4. Build a classification model in Weka using the Naïve Bayes algorithm to classify data from the "Iris.arff" file on the plant measurement attributes (sepal and petal length/width). Perform initial pre-processing and create a version of the initial dataset in which all numeric attributes are converted to categorical data.
→
Weka:
1. Load Iris.arff.
2. Preprocess:
○ Convert numeric attributes to categorical using "Discretize."
3. Apply Naïve Bayes under the "Classify" tab.
Using R:
library(RWeka)
# Load dataset
iris_data <- read.arff("iris.arff")
# Preprocess: convert numeric attributes to categorical
# (attribute names follow Weka's iris.arff; repeat for the other measurements)
iris_data$sepallength <- cut(iris_data$sepallength, breaks = 3, labels = c("Short", "Medium", "Long"))
# Train Naïve Bayes (Weka's NaiveBayes classifier via RWeka)
NB <- make_Weka_classifier("weka/classifiers/bayes/NaiveBayes")
model <- NB(class ~ ., data = iris_data)
summary(model)
5. Use the ZeroR classification algorithm on supermarket data from the "supermarket.arff" file. Perform initial preprocessing and analyze the confusion matrix (using Weka / R).
→
library(RWeka)
# Load dataset
supermarket <- read.arff("supermarket.arff")
# Train ZeroR (Weka's ZeroR classifier via RWeka); "Class" stands for the
# dataset's class attribute ("total" in Weka's supermarket.arff)
ZeroR <- make_Weka_classifier("weka/classifiers/rules/ZeroR")
model <- ZeroR(Class ~ ., data = supermarket)
summary(model)
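The assignment also asks for the confusion matrix; a minimal sketch using RWeka's evaluate_Weka_classifier, evaluated on the training data (the confusionMatrix component of the returned evaluation object is assumed here):
# Evaluation statistics and confusion matrix for the ZeroR model
eval <- evaluate_Weka_classifier(model, class = TRUE)
eval$confusionMatrix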
6. Use the Simple K-Means algorithm on the "bank.arff" database with the default settings and find the final cluster centroids.
→
Weka:
1. Load bank.arff.
2. Use "SimpleKMeans" under the "Cluster" tab.
3. Analyze the centroids in the output.
Using R:
library(cluster)
library(foreign)  # for read.arff
# Load dataset
bank <- read.arff("bank.arff")
# K-means clustering (kmeans needs numeric input, so keep only the numeric columns)
bank_numeric <- bank[sapply(bank, is.numeric)]
kmeans_result <- kmeans(bank_numeric, centers = 3)
# Final cluster centroids
kmeans_result$centers
7. Use the Apriori algorithm on the voting transaction data from the "vote.arff" file and identify all frequent k-itemsets with a minimum support of 40%. Perform initial preprocessing and find all the frequent itemsets (using Weka / R).
→
library(arules)
library(foreign)  # for read.arff
# Load dataset
vote <- read.arff("vote.arff")
# Convert to transactions
transactions <- as(vote, "transactions")
# Apply Apriori with 40% minimum support to find all frequent itemsets
itemsets <- apriori(transactions, parameter = list(supp = 0.4, target = "frequent itemsets"))
inspect(itemsets)
8. Use the Hierarchical Clustering algorithm on tic-tac-toe data from the "tic-tac-toe.arff" file. Perform initial pre-processing and create a version of the initial data set in which the ID field is removed and the "class" attribute is converted to categorical data (using Weka / R).
→
library(cluster)
library(foreign)  # for read.arff
# Load dataset
tic_tac_toe <- read.arff("tic-tac-toe.arff")
# Preprocess: remove the ID field and convert class to categorical
tic_tac_toe$ID <- NULL
tic_tac_toe$class <- as.factor(tic_tac_toe$class)
# Hierarchical clustering (daisy computes Gower distances, which handle the
# categorical board attributes; plain dist() expects numeric data)
distance <- daisy(tic_tac_toe[, -ncol(tic_tac_toe)])
hclust_result <- hclust(distance)
plot(hclust_result)
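To turn the dendrogram into concrete cluster assignments, the tree can be cut at a chosen number of clusters; a small sketch (two clusters assumed, matching the two class values):
# Cut the dendrogram into 2 clusters and compare against the class attribute
clusters <- cutree(hclust_result, k = 2)
table(clusters, tic_tac_toe$class)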
9. Build a classification model in Weka using a Decision Tree algorithm to classify data from the "labor.arff" file. Perform initial preprocessing and create a version of the initial dataset in which all numeric attributes are converted to categorical data.
→
Using Weka
1. Load the Dataset:
○ Open Weka Explorer.
○ Load the labor.arff dataset from the "Preprocess" tab.
2. Preprocess the Dataset:
○ Convert Numeric Attributes to Categorical:
■ Use the "Discretize" filter under the "Preprocess" tab:
1. Select "Choose" →
filters.unsupervised.attribute.Discretize.
2. Apply the filter to all numeric attributes.
3. Apply the Decision Tree Algorithm:
○ Go to the "Classify" tab.
○ Select J48 from the "trees" algorithms list (this is Weka’s
implementation of the C4.5 decision tree).
○ Set your class attribute (target variable) if not already set.
○ Run the classification algorithm.
Using R
# Load required library (RWeka provides read.arff and the J48 classifier)
library(RWeka)
# Load the dataset
labor <- read.arff("labor.arff")
# Preprocess: convert numeric attributes to categorical
# ("NumericAttribute1"/"NumericAttribute2" are placeholders for the dataset's
# numeric attributes, e.g. wage increase or working hours in labor.arff)
labor$NumericAttribute1 <- cut(labor$NumericAttribute1, breaks = 3,
                               labels = c("Low", "Medium", "High"))
labor$NumericAttribute2 <- cut(labor$NumericAttribute2, breaks = 3,
                               labels = c("Low", "Medium", "High"))
# Train Decision Tree ("ClassAttribute" is a placeholder for the class attribute)
model <- J48(ClassAttribute ~ ., data = labor)
# Summarize the decision tree
summary(model)
# Visualize the tree (requires the partykit package)
plot(model)
10. Use the FP Growth algorithm and Apriori algorithm for database
"supermarket.arff", with the default settings and find out which algorithm
generates maximum rules.
→Weka:
1. Load supermarket.arff.
2. Use "FP-Growth" and "Apriori" in the "Associate" tab with default settings.
3. Compare the number of generated rules.
Using R:
library(arules)
library(foreign)  # for read.arff
# Load dataset
supermarket <- read.arff("supermarket.arff")
# Convert to transactions
transactions <- as(supermarket, "transactions")
# Apriori
apriori_rules <- apriori(transactions, parameter = list(supp = 0.1, conf = 0.8))
inspect(apriori_rules)
# FP-Growth is not part of the arules package itself; run Weka's FPGrowth
# (Associate tab) for the comparison, or use an add-on interface such as
# rCBA::fpgrowth if it is installed.
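On the R side, the number of rules found by Apriori can be read off directly and compared with the rule count reported in Weka's FPGrowth output:
# Count the rules generated by Apriori; compare with Weka's FPGrowth rule count
length(apriori_rules)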
Spark Practical Assignments:
1. Write a Spark program to apply the map() function, passing the expression required to perform the transformation.
→
from pyspark.sql import SparkSession
# Initialize Spark Session
spark = SparkSession.builder.appName("MapFunctionExample").getOrCreate()
sc = spark.sparkContext
# Create an RDD
rdd = sc.parallelize([1, 2, 3, 4, 5])
# Apply map function to square each element
result = rdd.map(lambda x: x ** 2)
# Collect and print results
print(result.collect())
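For the example RDD this should print [1, 4, 9, 16, 25]. The following assignments reuse the same SparkSession (spark) and SparkContext (sc).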
2. Write a Spark program to apply the filter() function, passing the expression required to perform the filtering.
→
# Create an RDD
rdd = sc.parallelize([1, 2, 3, 4, 5])
# Apply filter to select only even numbers
result = rdd.filter(lambda x: x % 2 == 0)
# Collect and print results
print(result.collect())
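Here the output should be [2, 4].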
3. Write a Spark program to apply the intersection() function to return the intersection of the elements of two RDDs.
→
# Create two RDDs
rdd1 = sc.parallelize([1, 2, 3, 4, 5])
rdd2 = sc.parallelize([3, 4, 5, 6, 7])
# Apply intersection
result = rdd1.intersection(rdd2)
# Collect and print results
print(result.collect())
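The result contains 3, 4 and 5; the order may vary because intersection() involves a shuffle.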
4. Write a Spark program to apply the reduceByKey() function to aggregate the values for each key.
→
# Create an RDD with key-value pairs
rdd = sc.parallelize([("a", 1), ("b", 2), ("a", 3), ("b", 4)])
# Apply reduceByKey to aggregate values by key
result = rdd.reduceByKey(lambda x, y: x + y)
# Collect and print results
print(result.collect())
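The output should contain ('a', 4) and ('b', 6); the ordering of the pairs may vary.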
5. Write a Spark program to demonstrate the different ways to create a DataFrame.
→
from pyspark.sql import Row
# Example 1: From RDD
rdd = sc.parallelize([("Alice", 25), ("Bob", 30)])
df1 = spark.createDataFrame(rdd, schema=["Name", "Age"])
# Example 2: From List
data = [("Alice", 25), ("Bob", 30)]
df2 = spark.createDataFrame(data, schema=["Name", "Age"])
# Example 3: From Rows
rows = [Row(Name="Alice", Age=25), Row(Name="Bob", Age=30)]
df3 = spark.createDataFrame(rows)
# Show DataFrames
df1.show()
df2.show()
df3.show()
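All three DataFrames hold the same two rows, so each show() call should print a table with columns Name and Age containing Alice/25 and Bob/30.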
6. Write a Spark program to add or update columns in a DataFrame.
→
# Create a DataFrame
data = [("Alice", 25), ("Bob", 30)]
df = spark.createDataFrame(data, schema=["Name", "Age"])
# Add a new column
df = df.withColumn("NewColumn", df["Age"] + 5)
# Update an existing column
df = df.withColumn("Age", df["Age"] * 2)
# Show the DataFrame
df.show()
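After both transformations, the rows should read (Alice, 50, 30) and (Bob, 60, 35), since NewColumn is computed from the original Age before Age is doubled.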
7. Write a Spark program to remove duplicate rows based on multiple selected columns.
→
# Create a DataFrame
data = [("Alice", 25, "NY"), ("Bob", 30, "CA"), ("Alice", 25, "NY")]
df = spark.createDataFrame(data, schema=["Name", "Age", "City"])
# Remove duplicates based on "Name" and "Age"
df_distinct = df.dropDuplicates(["Name", "Age"])
# Show the DataFrame
df_distinct.show()
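Only one of the two identical (Alice, 25, NY) rows is kept, so the result has two rows.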