CS-3361-Data-science-lab Manual
Ex No 1:
Install Method
NumPy
NumPy is a numerical computing package for mathematics, science, and engineering. Many data
science packages use NumPy as a dependency.
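A quick way to check that NumPy is installed is to import it and run a small vectorised operation. The sketch below assumes NumPy was installed with a package manager such as pip (for example, pip install numpy); the variable names are illustrative only.

```python
import numpy as np

# Printing the version confirms that the package can be imported
print(np.__version__)

# A vectorised operation: every element is squared without an explicit loop
values = np.arange(5)
squares = values ** 2
print(squares)
```

If the import fails with ModuleNotFoundError, the package is not available in the active Python environment.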
Example Code:
# importing numpy
import numpy as np
# creating a list
list1 = [1, 2, 3, 4]
# converting the list into a NumPy array
sample_array = np.array(list1)
Example:
# importing numpy
import numpy as np
# creating list
list_1 = [1, 2, 3, 4]
list_2 = [5, 6, 7, 8]
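The example stops after defining the two lists. One plausible continuation, assumed here rather than taken from the manual, stacks them into a two-dimensional array; the name sample_array is borrowed from the earlier example.

```python
import numpy as np

list_1 = [1, 2, 3, 4]
list_2 = [5, 6, 7, 8]

# Each list becomes one row of a 2 x 4 array
sample_array = np.array([list_1, list_2])
print(sample_array.shape)

# Column-wise sums, computed along axis 0
column_sums = sample_array.sum(axis=0)
print(column_sums)
```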
Code:
import pandas as pd
import numpy as np
sas=pd.Series([1,3,5,np.nan,6])
sas
Code:
import pandas as pd
data={'apple': [3,2,0],
'orange' : [3,8,9]}
purchase=pd.DataFrame(data)
purchase
purchase.to_csv('datasciencelab.csv')
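Because to_csv also writes the row index by default, reading the file back needs index_col=0 to recover the same table. The following is a minimal sketch of that round trip; it writes to a temporary directory (an assumption made here so that no existing file is overwritten).

```python
import os
import tempfile

import pandas as pd

data = {'apple': [3, 2, 0], 'orange': [3, 8, 9]}
purchase = pd.DataFrame(data)

with tempfile.TemporaryDirectory() as tmp_dir:
    csv_path = os.path.join(tmp_dir, 'datasciencelab.csv')
    # The index is written as the first, unnamed column
    purchase.to_csv(csv_path)
    # index_col=0 turns that first column back into the index
    restored = pd.read_csv(csv_path, index_col=0)

print(restored)
```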
Ex. No. 4 - Reading data from text files, Excel and the web and exploring various
commands for doing descriptive analytics on the Iris data set.
Code:
import pandas as pd
data1=pd.read_csv("Iris.csv")
data1.head()
data1.info()
data1.describe()
data1.isnull().sum()
data1.shape
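Descriptive analytics on Iris is often more informative per species than over the whole table. The sketch below uses a tiny hand-made table so that it runs without Iris.csv; the values are toy numbers, and the column names (SepalLengthCm, PetalLengthCm, Species) are assumptions about the file's layout.

```python
import pandas as pd

# Toy values for illustration only; they are not taken from Iris.csv
toy = pd.DataFrame({
    'SepalLengthCm': [5.0, 5.2, 6.0, 6.4],
    'PetalLengthCm': [1.4, 1.6, 4.0, 4.4],
    'Species': ['Iris-setosa', 'Iris-setosa',
                'Iris-versicolor', 'Iris-versicolor'],
})

# Number of rows per species
counts = toy['Species'].value_counts()
print(counts)

# Mean of every numeric column within each species
species_means = toy.groupby('Species').mean()
print(species_means)
```

The same two calls can be applied to data1 once the actual column names have been checked with data1.info().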
data1
Ex No. 5 - Use the diabetes data set from UCI and Pima Indians Diabetes data
set for performing the following:
Code:
import pandas as pd
import numpy as np
import statistics
df = pd.read_csv("diabetes.csv")
print(df.shape)
print(df.info())
Code:
df.mean()
Code:
print(df.loc[:,'Age'].mean())
print(df.loc[:,'Income'].mean())
Median
Code:
df.median()
Code:
df.mode()
Code:
df.std()
Code:
df.var()
Code:
from scipy.stats import iqr
iqr(df['Age'])
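The interquartile range is the difference between the 75th and 25th percentiles. As a minimal sketch, it can be computed directly with NumPy; the ages below are toy values chosen for illustration, not rows from diabetes.csv.

```python
import numpy as np

# Toy ages for illustration only
ages = np.array([21, 25, 30, 35, 40, 45, 50, 60])

# First and third quartiles (linear interpolation is NumPy's default)
q1, q3 = np.percentile(ages, [25, 75])
iqr_value = q3 - q1
print(q1, q3, iqr_value)  # 28.75 46.25 17.5
```

With its default settings, scipy.stats.iqr computes the same quantity.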
Code:
print(df.skew())
Code:
import pandas as pd
df = pd.read_csv('diabetes.csv')
df.head()
Code:
import matplotlib.pyplot as plt
import seaborn as sns
sns.set(style='whitegrid', context='notebook')
cols = ['Pregnancies','Glucose','BloodPressure','SkinThickness','Insulin','BMI','DiabetesPedigreeFunction','Age']
Code:
import numpy as np
cm = np.corrcoef(df[cols].values.T)
sns.set(font_scale=1.5)
hm = sns.heatmap(cm, cbar=True, annot=True, square=True, fmt='.2f',
                 annot_kws={'size': 15}, yticklabels=cols, xticklabels=cols)
plt.show()
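np.corrcoef treats each row as one variable, which is why the data is transposed before the call. The sketch below, run on randomly generated toy columns rather than the diabetes data, suggests that the result matches the Pearson correlation matrix returned by DataFrame.corr().

```python
import numpy as np
import pandas as pd

# Toy data for illustration only: column c is built to correlate with a
rng = np.random.default_rng(0)
toy = pd.DataFrame({'a': rng.normal(size=50), 'b': rng.normal(size=50)})
toy['c'] = 2 * toy['a'] + rng.normal(scale=0.1, size=50)

# Transpose so that each row passed to corrcoef is one variable
cm = np.corrcoef(toy.values.T)

# pandas computes the same Pearson coefficients column by column
same = np.allclose(cm, toy.corr().values)
print(cm.round(2))
print(same)
```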
Code:
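class LinearRegressionGD(object):
    # The default values of eta and n_iter are assumptions; the original
    # __init__ signature did not survive in this manual.
    def __init__(self, eta=0.001, n_iter=20):
        self.eta = eta
        self.n_iter = n_iter

    def fit(self, X, y):
        # w_[0] is the bias term, w_[1:] are the feature weights
        self.w_ = np.zeros(1 + X.shape[1])
        self.cost_ = []
        for i in range(self.n_iter):
            output = self.net_input(X)
            errors = (y - output)
            self.w_[1:] += self.eta * X.T.dot(errors)
            self.w_[0] += self.eta * errors.sum()
            # half the sum of squared errors for this epoch
            cost = (errors ** 2).sum() / 2.0
            self.cost_.append(cost)
        return self

    def net_input(self, X):
        return np.dot(X, self.w_[1:]) + self.w_[0]

    def predict(self, X):
        return self.net_input(X)

from sklearn.preprocessing import StandardScaler
X = df[['Age']].values
y = df['Pregnancies'].values
sc_x = StandardScaler()
sc_y = StandardScaler()
X_std = sc_x.fit_transform(X)
# StandardScaler expects a 2-D array, so reshape y and flatten the result
y_std = sc_y.fit_transform(y.reshape(-1, 1)).ravel()
lr = LinearRegressionGD()
lr.fit(X_std, y_std)
# plot the cost per epoch to check convergence
plt.plot(range(1, lr.n_iter + 1), lr.cost_)
plt.ylabel('SSE')
plt.xlabel('Epoch')
plt.show()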
Code:
plt.scatter(X_std, y_std, c='blue')
# the line colour is an assumption; the original argument is truncated
plt.plot(X_std, lr.predict(X_std), color='red')
plt.xlabel('Age (standardized)')
plt.ylabel('Pregnancies(standardized)')
plt.show()
Code:
# transform expects a 2-D array: one row, one feature
age_std = sc_x.transform([[20]])
pregnancy_std = lr.predict(age_std)
Code:
train_x.shape, train_y.shape
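# MultipleRegression is not defined in the surviving text; scikit-learn's
# LinearRegression, which exposes coef_ and intercept_, is assumed here.
from sklearn.linear_model import LinearRegression as MultipleRegression
le = MultipleRegression()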
le.fit(train_x,train_y)
y_pred = le.predict(test_x)
y_pred
result
Code:
print('coefficient', le.coef_)
print('intercept', le.intercept_)
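To make the printed coefficients and intercept concrete, the sketch below fits an ordinary least squares model with NumPy on synthetic, noise-free data. It assumes that le is an ordinary least squares model, which the manual does not confirm; the data and variable names are made up for illustration.

```python
import numpy as np

# Synthetic data: y = 2*x1 + 3*x2 + 1, with no noise
rng = np.random.default_rng(42)
X = rng.normal(size=(100, 2))
y = 2.0 * X[:, 0] + 3.0 * X[:, 1] + 1.0

# Prepend a column of ones so that the first fitted weight is the intercept
design = np.column_stack([np.ones(len(X)), X])
solution, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept, coef = solution[0], solution[1:]

print('coefficient', coef)
print('intercept', intercept)
```

Each coefficient is the change in the prediction for a one-unit change in that feature with the others held fixed, and the intercept is the prediction when all features are zero.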
b. Also compare the results of the above analysis for the two data sets
Installing datacompy
Details:
datacompy takes two dataframes as input and produces a human-readable report with statistics that show the
similarities and differences between the two dataframes. It tries to join the two dataframes either on a
list of join columns or on their indexes.
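The join-then-compare idea can be sketched with plain pandas. This is not datacompy's implementation, only an illustration of the same approach using an outer merge; the two small dataframes and the id join column are made up for the example.

```python
import pandas as pd

# Toy dataframes for illustration only
old = pd.DataFrame({'id': [1, 2, 3], 'Glucose': [148, 85, 183]})
new = pd.DataFrame({'id': [2, 3, 4], 'Glucose': [85, 180, 89]})

# An outer join keeps rows from both sides; _merge records where each came from
merged = old.merge(new, on='id', how='outer',
                   suffixes=('_old', '_new'), indicator=True)

only_old = merged.loc[merged['_merge'] == 'left_only', 'id'].tolist()
only_new = merged.loc[merged['_merge'] == 'right_only', 'id'].tolist()

# Among rows present in both, find those whose values differ
both = merged[merged['_merge'] == 'both']
mismatched = both.loc[both['Glucose_old'] != both['Glucose_new'], 'id'].tolist()

print(only_old, only_new, mismatched)  # [1] [4] [3]
```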
Code:
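import datacompy
# df1 and df2 are assumed names for the two diabetes dataframes, and joining on
# the index is an assumption; the start of the original call did not survive.
compare = datacompy.Compare(df1, df2, on_index=True,
                            rel_tol=0, df1_name='olddiabetes', df2_name='newdiabetes')
print(compare.report())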
Ex.No. 6 Apply and explore various plotting functions on UCI data sets
a. Normal curves
Code:
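import numpy as np
import matplotlib.pyplot as plt
x = np.linspace(1,50,200)
#Creating a Function.
def normal_dist(x, mean, sd):
    # normal probability density: exp(-(x - mean)^2 / (2 sd^2)) / (sd * sqrt(2 pi))
    prob_density = np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))
    return prob_density
mean = np.mean(x)
sd = np.std(x)
pdf = normal_dist(x,mean,sd)
plt.plot(x, pdf)
plt.xlabel('Data points')
plt.ylabel('Probability Density')
plt.show()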
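A useful sanity check on a normal density function is that the area under the curve is close to 1. The sketch below defines its own helper, normal_pdf (a name chosen here for illustration), and integrates a standard normal curve numerically with the trapezoidal rule.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    # Normal density with the 1 / (sd * sqrt(2 * pi)) normalising factor
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

# Standard normal evaluated on a grid wide enough to cover almost all the mass
x = np.linspace(-8, 8, 2001)
pdf = normal_pdf(x, 0.0, 1.0)

# Trapezoidal rule: average of neighbouring heights times the grid spacing
area = np.sum((pdf[1:] + pdf[:-1]) / 2 * np.diff(x))
print(area)
print(pdf.max())
```

If the normalising factor is wrong, the area moves away from 1, which makes this an easy test to run before plotting.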
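Code:
import matplotlib.pyplot as plt
import numpy as np
# the feature ranges are assumptions; the original definitions are truncated
feature_x = np.arange(0, 50, 2)
feature_y = np.arange(0, 50, 3)
# creating a 2-D grid of features
X, Y = np.meshgrid(feature_x, feature_y)
fig, ax = plt.subplots(1, 1)
Z = np.cos(X / 2) + np.sin(Y / 4)
ax.contour(X, Y, Z)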
ax.set_title('Contour Plot')
ax.set_xlabel('feature_x')
ax.set_ylabel('feature_y')
plt.show()
Code:
import pandas as pd
con = pd.read_csv('concrete.csv')
con
list(con.columns)
con.head()
con['cement'] = con['cement'].astype('category')
con.describe(include='category')
ax.set_xlabel("coarseagg");
d. Histograms:
Creating a Histogram
Code:
# Creating dataset
27])
# Creating histogram
plt.show()
Code:
import numpy as np
import matplotlib.pyplot as plt
# Creating dataset
np.random.seed(23685752)
N_points = 10000
n_bins = 20
# Creating distribution
x = np.random.randn(N_points)
y = .8 ** x + np.random.randn(10000) + 25
# Creating histogram
plt.hist(x, bins=n_bins)
# Show plot
plt.show()
Code:
import numpy as np
import matplotlib.pyplot as plt
fig = plt.figure()
ax = plt.axes(projection='3d')
z = np.linspace(0, 1, 100)
x = z * np.sin(25 * z)
y = z * np.cos(25 * z)
c = x + y
ax.scatter(x, y, z, c=c)
plt.show()
Code:
%matplotlib inline
import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
plt.figure(figsize=(8, 8))
m.bluemarble(scale=0.5);
m = Basemap(projection='lcc', resolution=None,
width=8E6, height=8E6,
lat_0=45, lon_0=-100,)
m.etopo(scale=0.5, alpha=0.5)
fontsize=12);
fig = plt.figure(figsize=(12, 12))
m = Basemap()
m.drawcoastlines()
plt.title("Coastlines", fontsize=20)
plt.show()
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import geopandas as gpd
import shapefile as shp
sns.set_style('whitegrid')
fp = r'Maps_with_python\india-polygon.shp'
map_df = gpd.read_file(fp)
map_df_copy = gpd.read_file(fp)
map_df.plot(markersize=5)
plt.show()