Mtech Final
for (i in 3:n)
{
c <- a + b
cat(c, " ")
a <- b
b <- c
}
}
n_terms <- 10
cat("Fibonacci Series:\n")
fibonacci(n_terms)
# Calculate Sum
sum_vec <- sum(vec)
# Calculate Mean
mean_vec <- mean(vec)
# Calculate Product
product_vec <- prod(vec)
# Print results
cat("Vector:", vec, "\n")
cat("Sum:", sum_vec, "\n")
cat("Mean:", mean_vec, "\n")
cat("Product:", product_vec, "\n")
Assumes similar data points are close to each other in feature space.
Example: If most nearby points are apples, a new point near them is likely an apple.
Choice of K Value Matters
A small K (e.g., 1 or 3) makes predictions sensitive to noisy points.
A large K (e.g., 20 or 50) may oversmooth and ignore local variations.
Feature Scaling is Important
Distance calculations are affected by differing feature scales.
Example: If height is in cm and weight in kg, the feature with the larger numeric spread dominates the distance.
Rescaling (Min-Max Normalization or Standardization) is usually needed before computing distances.
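The two rescaling options named above can be sketched in base R. This is a minimal illustration, not a prescribed method: the data frame `people`, its values, and the helper names `min_max` and `standardize` are all made up for the example.

```r
# Made-up sample: height in cm, weight in kg (illustrative values only)
people <- data.frame(
  height_cm = c(150, 160, 170, 180, 190),
  weight_kg = c(50, 60, 65, 80, 95)
)

# Min-max scaling: maps each column onto the range [0, 1]
min_max <- function(x) (x - min(x)) / (max(x) - min(x))

# Standardization: each column gets mean 0 and standard deviation 1
standardize <- function(x) (x - mean(x)) / sd(x)

people_minmax <- as.data.frame(lapply(people, min_max))
people_std    <- as.data.frame(lapply(people, standardize))

print(people_minmax)
print(round(people_std, 2))
```

After either transformation both columns are on a comparable scale, so neither one dominates a distance calculation purely because of its units. Base R's `scale()` performs the same standardization on a whole matrix or data frame.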
Assumes a Meaningful Distance Metric
Uses distance measures such as Euclidean distance, Manhattan distance, or cosine distance.
Example: Euclidean distance works well for continuous data, while Hamming distance is used for categorical data.
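As a rough illustration of the measures mentioned above, the following base R sketch computes each one by hand. The points `p`, `q` and the records `u`, `v` are made-up values chosen only to keep the arithmetic easy to follow.

```r
# Two made-up numeric points
p <- c(1, 2, 3)
q <- c(4, 6, 3)

euclidean  <- sqrt(sum((p - q)^2))   # straight-line distance
manhattan  <- sum(abs(p - q))        # sum of absolute differences
cosine_sim <- sum(p * q) / (sqrt(sum(p^2)) * sqrt(sum(q^2)))  # 1 means same direction

# Two made-up categorical records
u <- c("red", "small", "round")
v <- c("red", "large", "round")
hamming <- sum(u != v)               # number of positions that differ

cat("Euclidean:", euclidean, "\n")
cat("Manhattan:", manhattan, "\n")
cat("Cosine similarity:", round(cosine_sim, 4), "\n")
cat("Hamming:", hamming, "\n")
```

Cosine similarity is a similarity rather than a distance; a common way to use it in KNN is to take `1 - cosine_sim` as the distance.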
Non-Parametric Nature
Does not assume an underlying data distribution or a fixed functional form (unlike linear regression).
Does no training up front: it stores the data and defers all computation to prediction time (lazy learning).
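The lazy-learning idea can be sketched in a few lines of base R. This is a minimal illustration under stated assumptions, not a production implementation: `knn_predict` is a hypothetical helper, the training points and labels are made up, and Euclidean distance with a simple majority vote is assumed.

```r
# Hypothetical helper: classify one new point by majority vote
# among its k nearest training points (Euclidean distance).
knn_predict <- function(train_x, train_y, new_point, k = 3) {
  # Subtract new_point from every training row, column by column
  diffs <- sweep(train_x, 2, new_point)
  dists <- sqrt(rowSums(diffs^2))
  # Indices of the k closest training rows
  nearest <- order(dists)[seq_len(k)]
  # Count labels among the neighbours; ties go to the first label alphabetically
  votes <- table(train_y[nearest])
  names(votes)[which.max(votes)]
}

# Made-up training data: two well-separated groups
train_x <- matrix(c(1.0, 1.1,
                    1.2, 0.9,
                    0.8, 1.0,
                    5.0, 5.2,
                    5.1, 4.9,
                    4.8, 5.0), ncol = 2, byrow = TRUE)
train_y <- c("apple", "apple", "apple", "orange", "orange", "orange")

print(knn_predict(train_x, train_y, c(1.1, 1.0), k = 3))
print(knn_predict(train_x, train_y, c(4.9, 5.1), k = 3))
```

There is no fitting step here: the training data is simply kept, and all distance computation happens inside `knn_predict` when a new point arrives.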
b What is ggplot2 in R? Write an R program to plot a bar chart using some sample data. 3 1 10
ggplot2 is a widely used data visualization package in R. It is based on the Grammar of Graphics: a plot is built by layering components such as the data, aesthetic mappings (axes, colors), geometric objects, and labels.
Features of ggplot2:
Provides high-quality, customizable plots.
Supports multiple chart types (bar charts, line charts, histograms, etc.).
Uses a layered approach for plotting.
Works well with dplyr and the rest of the tidyverse for data manipulation.
# Load ggplot2 library
library(ggplot2)
# Create sample data
data <- data.frame(
  Category = c("A", "B", "C", "D", "E"),
  Value = c(10, 25, 15, 30, 20)
)
# Create a bar chart (stat = "identity" plots the given values as bar heights)
ggplot(data, aes(x = Category, y = Value, fill = Category)) +
  geom_bar(stat = "identity") +
  ggtitle("Bar Chart Example") +
  xlab("Categories") +
  ylab("Values") +
  theme_minimal()
Curse of Dimensionality
In spam filtering, emails are represented as high-dimensional feature vectors (e.g., one feature per word, giving thousands of dimensions).
KNN performs poorly in high dimensions because distances become less meaningful: the nearest and farthest points end up almost equally far away.
Alternative: Naïve Bayes is often a better fit since it uses word probabilities instead of distances to classify emails.
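The loss of contrast between distances in high dimensions can be seen in a small simulation. This is only a sketch: it uses uniformly random points rather than real email features, and `distance_ratio` is a hypothetical helper written for this illustration.

```r
set.seed(42)  # fixed seed so the run is repeatable

# Ratio of the nearest to the farthest distance from one random query
# point to n random points. A ratio near 1 means the "nearest" and
# "farthest" neighbours are almost equally far away.
distance_ratio <- function(dims, n = 200) {
  points <- matrix(runif(n * dims), nrow = n)
  query  <- runif(dims)
  dists  <- sqrt(rowSums(sweep(points, 2, query)^2))
  min(dists) / max(dists)
}

for (d in c(2, 10, 100, 1000)) {
  cat("dims =", d, " nearest/farthest ratio =",
      round(distance_ratio(d), 3), "\n")
}
```

As the number of dimensions grows, the ratio should move toward 1, which is the sense in which a nearest neighbour stops being meaningfully "near".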
b Explain scraping the web with APIs and other tools. 2 3 10
Scraping the Web with APIs and Other Tools
Web scraping is the process of extracting data from websites. Where a site offers an API, data can be requested from it directly; web scraping tools are used when no API is available.