DM Practice

The document contains 20 slips with questions on R and Python programming. Each slip contains two questions: the first asks for a program in R and the second for a program in Python. The questions cover data manipulation, data visualization, linear regression, k-means and hierarchical clustering, decision trees, Naive Bayes, SVM and the Apriori algorithm, and involve tasks such as data preprocessing, model building, parameter estimation and model evaluation. The code provided works through each problem using libraries such as sklearn, pandas and matplotlib.

Uploaded by

66 Rohit Patil

Slip 01

Q1. Write an R program to add, multiply and divide two vectors of integer type. (Vector
length should be minimum 4) [10 Marks]
a <- c(1L, 3L, 5L, 7L)
b <- c(2L, 4L, 6L, 8L)
print(a + b)   # addition
print(a * b)   # multiplication
print(a / b)   # division
Q2. Consider the student data set. It can be downloaded from:
https://drive.google.com/open?id=1oakZCv7g3mlmCSdv9J8kdSaqO5_6dIOw
Write a program in Python to apply simple linear regression and find the mean
absolute error, mean squared error and root mean squared error. [20 Marks]
import numpy as nm
import pandas as pd
data_set= pd.read_csv('student_scores.csv')
print(data_set)
y = data_set['Scores'].values.reshape(-1, 1)
X = data_set['Hours'].values.reshape(-1, 1)
print(X)
print(y)
print(X.shape)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size = 0.2)
print(X_train)
print(X_test)
from sklearn.linear_model import LinearRegression
regressor = LinearRegression()
regressor.fit(X_train, y_train)
print(regressor.intercept_)
print(regressor.coef_)
score = regressor.predict([[9.5]])
print(score)
y_pred = regressor.predict(X_test)
print(y_pred)
from sklearn.metrics import mean_absolute_error, mean_squared_error
mae = mean_absolute_error(y_test, y_pred)
mse = mean_squared_error(y_test, y_pred)
rmse = nm.sqrt(mse)
print('MAE:', mae)
print('MSE:', mse)
print('RMSE:', rmse)
print('Actual',y_test)
print('Predicted',y_pred)
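The three error metrics asked for in Q2 can also be computed directly from their definitions, which shows what the `sklearn.metrics` functions return. The sketch below is only an illustration with made-up numbers; `regression_errors`, `actual` and `predicted` are hypothetical names and the values are not taken from the student data set.

```python
import math

def regression_errors(actual, predicted):
    """Return (MAE, MSE, RMSE) for two equal-length sequences of numbers."""
    n = len(actual)
    residuals = [a - p for a, p in zip(actual, predicted)]
    mae = sum(abs(r) for r in residuals) / n    # mean absolute error
    mse = sum(r * r for r in residuals) / n     # mean squared error
    rmse = math.sqrt(mse)                       # root mean squared error
    return mae, mse, rmse

# Made-up values purely for illustration
actual = [20.0, 30.0, 50.0, 60.0]
predicted = [22.0, 28.0, 53.0, 57.0]
mae, mse, rmse = regression_errors(actual, predicted)
print("MAE:", mae)
print("MSE:", mse)
print("RMSE:", rmse)
```

RMSE is the square root of MSE, so it is expressed in the same units as the scores themselves.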
Slip 02
Q1. Write an R program to calculate the multiplication table using a function. [10 Marks]
multiplication_table = function(num) {
  for (i in 1:10) {
    print(paste(num, 'x', i, '=', num * i))
  }
}
num = as.integer(readline(prompt = "Enter a number: "))
multiplication_table(num)
Q2. Write a Python program to implement the k-means algorithm on a synthetic dataset.
[20 Marks]
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn.datasets import make_blobs
data = make_blobs(n_samples=300, n_features=2, centers=5,
cluster_std=1.8,random_state=101)
print(data[0].shape)   # feature matrix: 300 samples, 2 features
print(data[1])         # true blob label of each sample
plt.scatter(data[0][:,0],data[0][:,1],c=data[1],cmap='brg')
plt.show()
from sklearn.cluster import KMeans
kmeans = KMeans(n_clusters=5)
kmeans.fit(data[0])
print(kmeans.cluster_centers_)
print(kmeans.labels_)
# Side-by-side comparison of the k-means labels and the original labels
f, (ax1, ax2) = plt.subplots(1, 2, sharey=True,figsize=(10,6))
ax1.set_title('K Means')
ax1.scatter(data[0][:,0],data[0][:,1],c=kmeans.labels_,cmap='brg')
ax2.set_title("Original")
ax2.scatter(data[0][:,0],data[0][:,1],c=data[1],cmap='brg')
plt.show()
Slip 03
Q1. Write an R program to reverse a number and also calculate the sum of digits of that
number. [10 Marks]
n = as.integer(readline(prompt = "Enter a number :"))
sum = 0
rev = 0
while (n > 0) {
  r = n %% 10
  rev = rev * 10 + r   # append the last digit to the reversed number
  sum = sum + r
  n = n %/% 10
}
print(paste("Reverse of number is :", rev))
print(paste("Sum of digit is :", sum))
Q2. Consider the following observations/data. Apply simple linear regression and
find the estimated coefficients b0 and b1. (Use numpy package)
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13]
y = [1, 3, 2, 5, 7, 8, 8, 9, 10, 12, 16, 18] [20 Marks]
import numpy as np
from sklearn.linear_model import LinearRegression
x= np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9,11,13]).reshape((-1, 1))
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12,16, 18])
print(x)
print(y)
model = LinearRegression()
model.fit(x, y)
print('intercept:-',model.intercept_)
print('Slope:- ', model.coef_)
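Since the question asks for the numpy package, the same coefficients can be estimated without scikit-learn from the least-squares formulas b1 = Σ(x - x̄)(y - ȳ) / Σ(x - x̄)² and b0 = ȳ - b1·x̄. The following is a minimal sketch using the x and y values given in the question; the variable names `dx` and `dy` are chosen here for illustration.

```python
import numpy as np

x = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 11, 13], dtype=float)
y = np.array([1, 3, 2, 5, 7, 8, 8, 9, 10, 12, 16, 18], dtype=float)

# Deviations of each observation from the sample means
dx = x - x.mean()
dy = y - y.mean()

b1 = np.sum(dx * dy) / np.sum(dx ** 2)   # slope
b0 = y.mean() - b1 * x.mean()            # intercept
print("b0 (intercept):", b0)
print("b1 (slope):", b1)
```

The values should agree with `model.intercept_` and `model.coef_` from the scikit-learn version up to floating-point rounding.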
Slip 04
Q1. Write a R program to calculate the sum of two matrices of given size. [10 Marks]
m1 = matrix(c(1, 2, 3, 4, 5, 6), nrow = 2)
print("Matrix-1:")
print(m1)
m2 = matrix(c(0, 1, 2, 3, 0, 2), nrow = 2)
print("Matrix-2:")
print(m2)
result = m1 + m2
print("Result of addition")
print(result)
Q2. Consider the following dataset:
weather=['Sunny','Sunny','Overcast','Rainy','Rainy','Rainy','Overcast','Sunny','Sunny','Rainy','Sunny','Overcast','Overcast','Rainy']
temp=['Hot','Hot','Hot','Mild','Cool','Cool','Cool','Mild','Cool','Mild','Mild','Mild','Hot','Mild']
play=['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No']
Use the Naïve Bayes algorithm to predict which class the tuple [0: Overcast, 2: Mild] belongs to,
i.e. whether to play the sport or not. [20 Marks]
weather=['sunny','sunny','overcast','rainy','rainy','rainy','overcast','sunny','sunny','rainy','sunny','overcast','overcast','rainy']
temp=['hot','hot','hot','mild','cool','cool','cool','mild','cool','mild','mild','mild','hot','mild']
play=['No','No','Yes','Yes','Yes','No','Yes','No','Yes','Yes','Yes','Yes','Yes','No']
from sklearn import preprocessing
# LabelEncoder assigns codes in alphabetical order:
# overcast=0, rainy=1, sunny=2 and cool=0, hot=1, mild=2
le = preprocessing.LabelEncoder()
weather_encoded = le.fit_transform(weather)
print("Weather:",weather_encoded)
temp_encoded = le.fit_transform(temp)
label = le.fit_transform(play)   # No=0, Yes=1
print("Temp:",temp_encoded)
print("Play:",label)
features = list(zip(weather_encoded,temp_encoded))
print(features)
from sklearn.naive_bayes import GaussianNB
model = GaussianNB()
model.fit(features,label)
predicted = model.predict([[0,2]])   # 0: Overcast, 2: Mild
print("Predicted value:",predicted)
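For comparison, the Naive Bayes rule can be applied by hand to this small dataset by counting frequencies. Note that this is a categorical, count-based variant without smoothing, not the Gaussian model used in the answer above; it is a sketch meant to show the arithmetic, and `naive_bayes_scores` is a hypothetical helper name.

```python
from fractions import Fraction

weather = ['sunny', 'sunny', 'overcast', 'rainy', 'rainy', 'rainy', 'overcast',
           'sunny', 'sunny', 'rainy', 'sunny', 'overcast', 'overcast', 'rainy']
temp = ['hot', 'hot', 'hot', 'mild', 'cool', 'cool', 'cool',
        'mild', 'cool', 'mild', 'mild', 'mild', 'hot', 'mild']
play = ['No', 'No', 'Yes', 'Yes', 'Yes', 'No', 'Yes',
        'No', 'Yes', 'Yes', 'Yes', 'Yes', 'Yes', 'No']

def naive_bayes_scores(query_weather, query_temp):
    """Return P(class) * P(weather | class) * P(temp | class) for each class."""
    scores = {}
    total = len(play)
    for cls in sorted(set(play)):
        rows = [i for i, p in enumerate(play) if p == cls]
        prior = Fraction(len(rows), total)
        p_weather = Fraction(sum(weather[i] == query_weather for i in rows), len(rows))
        p_temp = Fraction(sum(temp[i] == query_temp for i in rows), len(rows))
        scores[cls] = prior * p_weather * p_temp
    return scores

scores = naive_bayes_scores('overcast', 'mild')
prediction = max(scores, key=scores.get)   # class with the largest score
print(scores)
print("Predicted class:", prediction)
```

The scores are unnormalised; only their relative size matters for choosing the class.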
Slip 05
Q1. Write an R program to concatenate two given factors. [10 Marks]
p <- factor(c(1,2,4,5,7,8))
q <- factor(c("shubham","arpita","nishka","gunjan","vaishali","sumit"))
# Combine the level labels, then rebuild a single factor
r <- factor(c(as.character(p), as.character(q)))
print(r)
Q2. Write a Python program to build a Decision Tree Classifier using the scikit-learn package for
the diabetes data set (download database from
https://www.kaggle.com/uciml/pima-indians-diabetes-database) [20 Marks]
import numpy as np
import pandas as pd
dataset = pd.read_csv("play_tennis.csv")
from sklearn.preprocessing import LabelEncoder
Le = LabelEncoder()
dataset['outlook'] = Le.fit_transform(dataset['outlook'])
dataset['temp'] = Le.fit_transform(dataset['temp'])
dataset['humidity'] = Le.fit_transform(dataset['humidity'])
dataset['wind'] = Le.fit_transform(dataset['wind'])
dataset['play'] = Le.fit_transform(dataset['play'])
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values
print(X)
from sklearn import tree
clf = tree.DecisionTreeClassifier(criterion = 'entropy')
clf = clf.fit(X, y)
tree.plot_tree(clf)
X_pred = clf.predict(X)
print(X_pred == y)   # True where the tree reproduces the training label
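The same decision tree approach could be applied to the diabetes data named in the question. The sketch below is hedged: it assumes the Kaggle download is a CSV whose target column is called `Outcome` and whose other columns are numeric, and it runs on a small synthetic stand-in so that it works without the file. The helper name `fit_diabetes_tree`, the file name `diabetes.csv` and the synthetic values are all assumptions for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score

def fit_diabetes_tree(df, target="Outcome"):
    """Train an entropy-based decision tree; return (model, test accuracy)."""
    X = df.drop(columns=[target])
    y = df[target]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, random_state=0)
    clf = DecisionTreeClassifier(criterion="entropy", random_state=0)
    clf.fit(X_train, y_train)
    return clf, accuracy_score(y_test, clf.predict(X_test))

# Synthetic stand-in data (not real patient records) so the sketch is runnable;
# with the download, use: df = pd.read_csv("diabetes.csv")  (assumed file name)
rng = np.random.default_rng(0)
glucose = rng.integers(70, 200, size=200)
bmi = rng.uniform(18, 45, size=200)
demo = pd.DataFrame({"Glucose": glucose, "BMI": bmi,
                     "Outcome": (glucose > 140).astype(int)})

clf, acc = fit_diabetes_tree(demo)
print("Test accuracy on the synthetic data:", acc)
```

Holding out a test split gives a more honest accuracy estimate than predicting on the training rows.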
Slip 06
Q1. Write an R program to create a data frame using two given vectors and display the
duplicate elements. [10 Marks]
a = c(10,20,10,10,40,50,20,30)
b = c(10,30,10,20,0,50,30,30)
print("Original data frame:")
ab = data.frame(a,b)
print(ab)
print("Duplicate elements of the said data frame:")
print(duplicated(ab))
print("Unique rows of the said data frame:")
print(unique(ab))
Q2. Write a Python program to implement the hierarchical agglomerative
clustering algorithm. (Download Customer.csv dataset from github.com).
[20 Marks]
import pandas as pd
import matplotlib.pyplot as mtp
dataset = pd.read_csv('Mall_Customers.csv')
x = dataset.iloc[:, [3, 4]].values
import scipy.cluster.hierarchy as shc
dendro = shc.dendrogram(shc.linkage(x, method="ward"))
mtp.title("Dendrogram Plot")
mtp.ylabel("Euclidean Distances")
mtp.xlabel("Customers")
mtp.show()
from sklearn.cluster import AgglomerativeClustering
# Ward linkage always uses Euclidean distance, so no distance argument is needed
hc= AgglomerativeClustering(n_clusters=5, linkage='ward')
y_pred= hc.fit_predict(x)
mtp.scatter(x[y_pred == 0, 0], x[y_pred == 0, 1], s = 100, c = 'blue', label = 'Cluster 1')
mtp.scatter(x[y_pred == 1, 0], x[y_pred == 1, 1], s = 100, c = 'green', label = 'Cluster 2')
mtp.scatter(x[y_pred== 2, 0], x[y_pred == 2, 1], s = 100, c = 'red', label = 'Cluster 3')
mtp.scatter(x[y_pred == 3, 0], x[y_pred == 3, 1], s = 100, c = 'cyan', label = 'Cluster 4')
mtp.scatter(x[y_pred == 4, 0], x[y_pred == 4, 1], s = 100, c = 'magenta', label = 'Cluster 5')
mtp.title('Clusters of customers')
mtp.xlabel('Annual Income (k$)')
mtp.ylabel('Spending Score (1-100)')
mtp.legend()
mtp.show()
Slip 07
Q1. Write an R program to create a sequence of numbers from 20 to 50 and find the mean
of numbers from 20 to 60 and sum of numbers from 51 to 91. [10 Marks]
print("Sequence of numbers from 20 to 50:")
print(seq(20,50))
print("Mean of numbers from 20 to 60:")
print(mean(20:60))
print("Sum of numbers from 51 to 91:")
print(sum(51:91))
Q2. Consider the following observations/data. Apply simple linear regression and
find the estimated coefficients b0 and b1. Also analyse the performance of the model. (Use
sklearn package)
x = np.array([1,2,3,4,5,6,7,8])
y = np.array([7,14,15,18,19,21,26,23]) [20 Marks]
import numpy as np
from sklearn.linear_model import LinearRegression
x= np.array([1,2,3,4,5,6,7,8]).reshape((-1, 1))
print(x)
y = np.array([7,14,15,18,19,21,26,23])
print(y)
model = LinearRegression()
model.fit(x, y)
print('Intercept (b0):- ', model.intercept_)
print('Slope (b1):- ', model.coef_)
print('R-squared:- ', model.score(x, y))   # coefficient of determination on the training data
x_new = np.array(9).reshape((-1, 1))
y_new_pred = model.predict(x_new)
print(y_new_pred)
Slip 08
Q1. Write an R program to get the first 10 Fibonacci numbers. [10 Marks]
total_terms = 10   # number of terms the question asks for
num1 = 0
num2 = 1
count = 2
print("Fibonacci sequence:")
print(num1)
print(num2)
while (count < total_terms) {
  nxt = num1 + num2
  print(nxt)
  num1 = num2
  num2 = nxt
  count = count + 1
}
Q2. Write a python program to implement k-means algorithm to build prediction model
(Use Credit Card Dataset CC GENERAL.csv Download from kaggle.com) [20 Marks]
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
dataset = pd.read_csv('creditcard.csv')
dataset
x = dataset.iloc[:, [3, 4]].values
print(x)
from sklearn.cluster import KMeans
wcss_list= []
for i in range(1, 11):
    kmeans = KMeans(n_clusters=i, init='k-means++', random_state= 42)
    kmeans.fit(x)
    wcss_list.append(kmeans.inertia_)   # within-cluster sum of squares for k = i
mtp.plot(range(1, 11), wcss_list)
mtp.title('The Elbow Method Graph')
mtp.xlabel('Number of clusters(k)')
mtp.ylabel('WCSS')
mtp.show()
kmeans = KMeans(n_clusters=3, init='k-means++', random_state= 42)
y_predict= kmeans.fit_predict(x)
mtp.scatter(x[y_predict == 0, 0], x[y_predict == 0, 1], s = 100, c = 'blue', label =
'Cluster 1') #for first cluster
mtp.scatter(x[y_predict == 1, 0], x[y_predict == 1, 1], s = 100, c = 'green', label =
'Cluster 2') #for second cluster
mtp.scatter(x[y_predict== 2, 0], x[y_predict == 2, 1], s = 100, c = 'red', label =
'Cluster 3') #for third cluster
mtp.scatter(kmeans.cluster_centers_[:, 0], kmeans.cluster_centers_[:, 1], s = 300,
c = 'yellow', label = 'Centroid')
mtp.title('Clusters of Credit Card')
mtp.xlabel('V3')
mtp.ylabel('V4')
mtp.legend()
mtp.show()
Slip 09
Q1. Write an R program to create a data frame which contains details of 5 employees and
display a summary of the data. [10 Marks]
Employees = data.frame(Name=c("Anastasia S","Dima","Katherine","JAMESA","LAURA"),
Gender=c("M","M","F","F","M"),
Age=c(23,22,25,26,32),
Designation=c("Clerk","Manager","Executive","CEO","ASSISTANT"),
SSN=c("123-34-2346","123-44-779","556-24-433","123-98-987","679-77-576")
)
print("Details of the employees:")
print(Employees)
print("Summary of the data:")
print(summary(Employees))
Q2. Write a Python program to build an SVM model for the Cancer dataset. The dataset is
available in the scikit-learn library. Check the accuracy of the model with precision. [20 Marks]
from sklearn.datasets import load_breast_cancer
# The breast cancer dataset ships with scikit-learn, so no CSV file is needed
cancer = load_breast_cancer()
x = cancer.data
y = cancer.target
from sklearn.model_selection import train_test_split
x_train, x_test, y_train, y_test= train_test_split(x, y, test_size= 0.25, random_state=0)
# Scale the features: fit on the training split only, then apply to both splits
from sklearn.preprocessing import StandardScaler
st_x= StandardScaler()
x_train= st_x.fit_transform(x_train)
x_test= st_x.transform(x_test)
from sklearn.svm import SVC
classifier = SVC(kernel='linear', random_state=0)
classifier.fit(x_train, y_train)
y_pred= classifier.predict(x_test)
from sklearn.metrics import confusion_matrix, accuracy_score, precision_score
cm= confusion_matrix(y_test, y_pred)
print(cm)
print("Accuracy:",accuracy_score(y_test,y_pred))
print("Precision:",precision_score(y_test,y_pred))
Slip 10
Q1. Write an R program to find the maximum and the minimum value of a given vector. [10
Marks]
x = c(10, 20, 30, 25, 9, 26)
print("Original Vectors:")
print(x)
print("Maximum value of the above Vector:")
print(max(x))
print("Minimum value of the above Vector:")
print(min(x))
Q2. Write a Python program to read the dataset (“Iris.csv”), downloaded
from https://archive.ics.uci.edu/ml/datasets/iris, and apply the Apriori algorithm. [20 Marks]
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
dataset = pd.read_csv("Iris.csv")
dataset
transactions=[]
# Treat every row as one transaction made of its 6 column values (as strings)
for i in range(0, 150):
    transactions.append([str(dataset.values[i,j]) for j in range(0,6)])
from apyori import apriori
rules= apriori(transactions= transactions, min_support=0.003, min_confidence = 0.2,
min_lift=3, min_length=2, max_length=2)
results= list(rules)
results
for item in results:
    pair = item[0]
    items = [x for x in pair]
    print("Rule: " + items[0] + " -> " + items[1])
    print("Support: "+ str(item[1]))
    print("Confidence: "+ str(item[2][0][2]))
    print("Lift: "+ str(item[2][0][3]))
    print('=====================================')
Slip 11
Q1. Write an R program to find all elements of a given list that are not in another given
list. A = list("x", "y", "z") B = list("X", "Y", "Z", "x", "y", "z") [10 Marks]
l1 = list("x", "y", "z")
l2 = list("X", "Y", "Z", "x", "y", "z")
print("Original lists:")
print(l1)
print(l2)
print("All elements of l2 that are not in l1:")
setdiff(l2, l1)
Q2. Write a python program to implement hierarchical clustering algorithm.(Download
Wholesale customers data dataset from github.com).
[20 Marks]
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
dataset = pd.read_csv('Wholesale customers data.csv')
dataset
x = dataset.iloc[:, [3, 4]].values
print(x)
import scipy.cluster.hierarchy as shc
dendro = shc.dendrogram(shc.linkage(x, method="ward"))
mtp.title("Dendrogram Plot")
mtp.ylabel("Euclidean Distances")
mtp.xlabel("Customers")
mtp.show()
from sklearn.cluster import AgglomerativeClustering
# Ward linkage always uses Euclidean distance, so no distance argument is needed
hc= AgglomerativeClustering(n_clusters=5, linkage='ward')
y_pred= hc.fit_predict(x)
mtp.scatter(x[y_pred == 0, 0], x[y_pred == 0, 1], s = 100, c = 'blue', label = 'Cluster 1')
mtp.scatter(x[y_pred == 1, 0], x[y_pred == 1, 1], s = 100, c = 'green', label = 'Cluster 2')
mtp.scatter(x[y_pred== 2, 0], x[y_pred == 2, 1], s = 100, c = 'red', label = 'Cluster 3')
mtp.scatter(x[y_pred == 3, 0], x[y_pred == 3, 1], s = 100, c = 'cyan', label = 'Cluster 4')
mtp.scatter(x[y_pred == 4, 0], x[y_pred == 4, 1], s = 100, c = 'magenta', label = 'Cluster 5')
mtp.title('Clusters of customers')
mtp.xlabel('Milk')
mtp.ylabel('Grocery')
mtp.legend()
mtp.show()
Slip 12
Q1. Write an R program to create a data frame which contains details of 5 employees and
display the details. Employee contains (empno, empname, gender, age, designation).
[10 Marks]
Employees = data.frame(empno=c(1,2,3,4,5),   # illustrative employee numbers
empname=c("Anastasia S","Dima","Katherine","JAMESA","LAURA"),
gender=c("M","M","F","F","M"),
age=c(23,22,25,26,32),
designation=c("Clerk","Manager","Executive","CEO","ASSISTANT")
)
print("Details of the employees:")
print(Employees)
Q2. Write a Python program to implement a multiple linear regression model for a car
dataset. The dataset can be downloaded from:
https://www.w3schools.com/python/python_ml_multiple_regression.asp
[20 Marks]
import pandas
from sklearn import linear_model
df = pandas.read_csv("data.csv")
print(df)
X = df[['Weight', 'Volume']]
print(X)
y = df['CO2']
print(y)
regr = linear_model.LinearRegression()
regr.fit(X, y)
predictedCO2 = regr.predict([[2300, 1300]])
print(predictedCO2)
Slip 13
Q1. Draw a pie chart using R programming for the following data distribution:
Digits on Dice:                   1 2 3 4 5 6
Frequency of getting each number: 7 2 6 3 4 8
[10 Marks]

Q2. Write a Python program to read the “StudentsPerformance.csv” file. Solve the following:
- To display the shape of the dataset.
- To display the top rows of the dataset with their columns.
Note: Download the dataset from the following link:
(https://www.kaggle.com/spscientist/students-performance-in-exams?select=StudentsPerformance.csv) [20 Marks]
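One possible way to approach the two tasks in Q2 with pandas is sketched below. To keep the example runnable without the download, it reads a tiny inline sample instead of the real file; the three rows and the column names in `sample_csv` are made up for illustration and are not taken from the Kaggle dataset.

```python
import io
import pandas as pd

# Three-row inline sample with hypothetical columns, standing in for the file
sample_csv = io.StringIO(
    "gender,math score,reading score\n"
    "female,72,72\n"
    "male,47,57\n"
    "female,90,95\n"
)
# With the downloaded file this would be: pd.read_csv("StudentsPerformance.csv")
df = pd.read_csv(sample_csv)

print("Shape (rows, columns):", df.shape)
print(df.head())   # first rows (up to 5 by default) with their column names
```

`df.shape` returns a `(rows, columns)` tuple, and `df.head(n)` accepts the number of rows to show if more or fewer than five are wanted.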

Slip 14
Q1. Write a script in R to create a list of employees (name) and perform the following:
a. Display names of employees in the list.
b. Add an employee at the end of the list
c. Remove the third element of the list. [10 Marks]
Employees = list("Anastasia S", "Dima", "Katherine", "JAMESA", "LAURA")
print("Names of the employees:")
print(Employees)
Employees = append(Employees, "akshay")   # b. add an employee at the end
print(Employees)
Employees = Employees[-3]                 # c. remove the third element
print(Employees)

Q2. Write a Python program to apply the Apriori algorithm on the Groceries dataset. The dataset
can be downloaded from
(https://github.com/amankharwal/Websitedata/blob/master/Groceries_dataset.csv).
Also display support and confidence for each rule. [20 Marks]
import numpy as nm
import matplotlib.pyplot as mtp
import pandas as pd
dataset = pd.read_csv("Iris.csv")
dataset
transactions=[]
# Treat every row as one transaction made of its 6 column values (as strings)
for i in range(0, 150):
    transactions.append([str(dataset.values[i,j]) for j in range(0,6)])
from apyori import apriori
rules= apriori(transactions= transactions, min_support=0.003, min_confidence = 0.2,
min_lift=3, min_length=2, max_length=2)
results= list(rules)
results
for item in results:
    pair = item[0]
    items = [x for x in pair]
    print("Rule: " + items[0] + " -> " + items[1])
    print("Support: "+ str(item[1]))
    print("Confidence: "+ str(item[2][0][2]))
    print("Lift: "+ str(item[2][0][3]))
    print('=====================================')
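To make the support and confidence figures concrete, they can be computed by hand on a few transactions. The sketch below enumerates every ordered pair of items by brute force, so it is not the Apriori candidate pruning itself; the transactions are hypothetical toy data, not rows from the Groceries dataset.

```python
from itertools import permutations

# Hypothetical toy transactions, each a set of purchased items
transactions = [
    {"milk", "bread"},
    {"milk", "bread", "butter"},
    {"bread", "butter"},
    {"milk", "butter"},
    {"milk", "bread"},
]

def support(itemset):
    """Fraction of transactions that contain every item in itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

items = sorted(set().union(*transactions))
rules = []
for a, b in permutations(items, 2):
    sup = support({a, b})          # support of the rule a -> b
    conf = sup / support({a})      # confidence = support(a, b) / support(a)
    rules.append((a, b, sup, conf))
    print(f"Rule: {a} -> {b}  Support: {sup:.2f}  Confidence: {conf:.2f}")
```

Support is symmetric in the two items, while confidence depends on the direction of the rule.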
Slip 15
Q1. Write an R program to add, multiply and divide two vectors of integer type. (Vector
length should be minimum 4) [10 Marks]
a <- c(1L, 3L, 5L, 7L)
b <- c(2L, 4L, 6L, 8L)
print(a + b)   # addition
print(a * b)   # multiplication
print(a / b)   # division
Q2. Write a Python program to build a Decision Tree Classifier for shows.csv from pandas and
predict the class label for a show starring a 40-year-old American comedian with 10 years of
experience and a comedy ranking of 7. Create a csv file as shown in
https://www.w3schools.com/python/python_ml_decision_tree.asp [20 Marks]
import pandas
from sklearn import tree
from sklearn.tree import DecisionTreeClassifier
df = pandas.read_csv("data.csv")
d = {'UK': 0, 'USA': 1, 'N': 2}
df['Nationality'] = df['Nationality'].map(d)
d = {'YES': 1,'NO': 0}
df['Go'] = df['Go'].map(d)

features = ['Age', 'Experience', 'Rank', 'Nationality']

X = df[features]
y = df['Go']

dtree = DecisionTreeClassifier()
dtree = dtree.fit(X, y)

print(dtree.predict([[40, 10, 7, 1]]))

print("[1] means 'GO'")

print("[0] means 'NO'")

Slip 16
Q1. Write an R program to create a simple bar plot of the given data.
Year  Export  Import
2001  26      35
2002  32      40
2003  35      50
[10 Marks]

Q2. Write a Python program to build a Decision Tree Classifier using the scikit-learn package for
the diabetes data set (download database from
https://www.kaggle.com/uciml/pima-indians-diabetes-database) [20 Marks]
import numpy as np
import pandas as pd
dataset = pd.read_csv("play_tennis.csv")
from sklearn.preprocessing import LabelEncoder
Le = LabelEncoder()
dataset['outlook'] = Le.fit_transform(dataset['outlook'])
dataset['temp'] = Le.fit_transform(dataset['temp'])
dataset['humidity'] = Le.fit_transform(dataset['humidity'])
dataset['wind'] = Le.fit_transform(dataset['wind'])
dataset['play'] = Le.fit_transform(dataset['play'])
X = dataset.iloc[:, :-1].values
y = dataset.iloc[:, 4].values
print(X)
from sklearn import tree
clf = tree.DecisionTreeClassifier(criterion = 'entropy')
clf = clf.fit(X, y)
tree.plot_tree(clf)
X_pred = clf.predict(X)
print(X_pred == y)   # True where the tree reproduces the training label
Slip 17
Q1. Write an R program to get the first 20 Fibonacci numbers. [10 Marks]
total_terms = 20   # number of terms the question asks for
num1 = 0
num2 = 1
count = 2
print("Fibonacci sequence:")
print(num1)
print(num2)
while (count < total_terms) {
  nxt = num1 + num2
  print(nxt)
  num1 = num2
  num2 = nxt
  count = count + 1
}

Q2. Write a Python program to implement a multiple linear regression model for the stock
market data frame as follows:
Stock_Market = {'Year':
[2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016,2016],
'Month': [12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1],
'Interest_Rate':
[2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75],
'Unemployment_Rate':
[5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1],
'Stock_Index_Price':
[1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,884,866,876,822,704,719] }
And draw a graph of stock market price versus interest rate.
[20 Marks]
import pandas as pd
from sklearn import linear_model
data = {'year':
[2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2017,2016,2016,2016,2016,2016
,2016,2016,2016,2016,2016,2016,2016],
'month': [12,11,10,9,8,7,6,5,4,3,2,1,12,11,10,9,8,7,6,5,4,3,2,1],
'interest_rate':
[2.75,2.5,2.5,2.5,2.5,2.5,2.5,2.25,2.25,2.25,2,2,2,1.75,1.75,1.75,1.75,1.75,1.75,1.75,1.75,
1.75,1.75,1.75],
'unemployment_rate':
[5.3,5.3,5.3,5.3,5.4,5.6,5.5,5.5,5.5,5.6,5.7,5.9,6,5.9,5.8,6.1,6.2,6.1,6.1,6.1,5.9,6.2,6.2,6.1],
'index_price':
[1464,1394,1357,1293,1256,1254,1234,1195,1159,1167,1130,1075,1047,965,943,958,971,949,
884,866,876,822,704,719]
}
df = pd.DataFrame(data)
print(df)
x = df[['interest_rate','unemployment_rate']]
print(x)
y = df['index_price']
print(y)
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(x, y, test_size = 0.2)
print(X_train)
print(X_test)
regr = linear_model.LinearRegression()
regr.fit(X_train, y_train)
print('Intercept: \n', regr.intercept_)
print('Coefficients: \n', regr.coef_)
y_pred=regr.predict(X_test)
print(y_pred)
from sklearn.metrics import r2_score
Accuracy=r2_score(y_test,y_pred)*100
print(Accuracy)
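import matplotlib.pyplot as plt
# Stock index price versus interest rate, as the question asks
plt.scatter(df['interest_rate'], df['index_price'], color='red')
plt.xlabel('Interest Rate')
plt.ylabel('Stock Index Price')
plt.show()
# Actual versus predicted prices on the test split, with a fitted trend line
import seaborn as sns
sns.regplot(x=y_test,y=y_pred,ci=None,color ='red')
plt.xlabel('Actual')
plt.ylabel('Predicted')
plt.show()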
Slip 18
Q1. Write a R program to find the maximum and the minimum value of a given vector[10
Marks]
x = c(10, 20, 30, 25, 9, 26)
print("Original Vectors:")
print(x)
print("Maximum value of the above Vector:")
print(max(x))
print("Minimum value of the above Vector:")
print(min(x))
Q2. Consider the following observations/data. Apply simple linear regression and
find the estimated coefficients b0 and b1. Also analyse the performance of the model. (Use
sklearn package)
x = np.array([1,2,3,4,5,6,7,8])
y = np.array([7,14,15,18,19,21,26,23]) [20 Marks]
import numpy as np
from sklearn.linear_model import LinearRegression
x= np.array([1,2,3,4,5,6,7,8]).reshape((-1, 1))
print(x)
y = np.array([7,14,15,18,19,21,26,23])
print(y)
model = LinearRegression()
model.fit(x, y)
print('Intercept (b0):- ', model.intercept_)
print('Slope (b1):- ', model.coef_)
print('R-squared:- ', model.score(x, y))   # coefficient of determination on the training data
x_new = np.array(9).reshape((-1, 1))
y_new_pred = model.predict(x_new)
print(y_new_pred)
Slip 19
Q1. Write an R program to create a data frame which contains details of 5 students and
display the details. Students contain (Rollno, Studname, Address, Marks). [10 Marks]

Q2. Write a Python program to implement a multiple linear regression model for a car
dataset. The dataset can be downloaded from:
https://www.w3schools.com/python/python_ml_multiple_regression.asp
[20 Marks]
import pandas
from sklearn import linear_model
df = pandas.read_csv("data.csv")
print(df)
X = df[['Weight', 'Volume']]
print(X)
y = df['CO2']
print(y)
regr = linear_model.LinearRegression()
regr.fit(X, y)
predictedCO2 = regr.predict([[2300, 1300]])
print(predictedCO2)
Slip 20
Q1. Write a R program to create a data frame from four given vectors.[10 Marks]
name = c('Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew', 'Laura', 'Kevin',
'Jonas')
score = c(12.5, 9, 16.5, 12, 9, 20, 14.5, 13.5, 8, 19)
attempts = c(1, 3, 2, 3, 2, 3, 1, 1, 2, 1)
qualify = c('yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes')
print("Original data frame:")
print(name)
print(score)
print(attempts)
print(qualify)
df = data.frame(name, score, attempts, qualify)
print(df)
Q2. Write a Python program to implement the hierarchical agglomerative clustering
algorithm. (Download Customer.csv dataset from github.com). [20 Marks]
import pandas as pd
import matplotlib.pyplot as mtp
dataset = pd.read_csv('Mall_Customers.csv')
x = dataset.iloc[:, [3, 4]].values
import scipy.cluster.hierarchy as shc
dendro = shc.dendrogram(shc.linkage(x, method="ward"))
mtp.title("Dendrogram Plot")
mtp.ylabel("Euclidean Distances")
mtp.xlabel("Customers")
mtp.show()
from sklearn.cluster import AgglomerativeClustering
# Ward linkage always uses Euclidean distance, so no distance argument is needed
hc= AgglomerativeClustering(n_clusters=5, linkage='ward')
y_pred= hc.fit_predict(x)
mtp.scatter(x[y_pred == 0, 0], x[y_pred == 0, 1], s = 100, c = 'blue', label = 'Cluster 1')
mtp.scatter(x[y_pred == 1, 0], x[y_pred == 1, 1], s = 100, c = 'green', label = 'Cluster 2')
mtp.scatter(x[y_pred== 2, 0], x[y_pred == 2, 1], s = 100, c = 'red', label = 'Cluster 3')
mtp.scatter(x[y_pred == 3, 0], x[y_pred == 3, 1], s = 100, c = 'cyan', label = 'Cluster 4')
mtp.scatter(x[y_pred == 4, 0], x[y_pred == 4, 1], s = 100, c = 'magenta', label = 'Cluster 5')
mtp.title('Clusters of customers')
mtp.xlabel('Annual Income (k$)')
mtp.ylabel('Spending Score (1-100)')
mtp.legend()
mtp.show()
