CS3361-Data Science Lab Manual - B.rethina Kumar

Ex.No : 1 Python Packages
Date :
AIM:
To Download, install and explore the features of NumPy, SciPy,
Jupyter, and Pandas packages.
(i) Installing numpy
(ii) Installing scipy
(iii) Installing jupyter
(iv) Installing pandas
Procedure:
(i) Installing PIP On Windows
Step 1: Download PIP get-pip.py
i. Launch a command prompt
ii. Then, run the following command to download the get-pip.py file:
curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py

Step 2: Installing PIP on Windows
To install PIP type in the following: python get-pip.py
Step 3: Verify Installation


To Verify PIP installation type in the following: pip help
Step 4 :Upgrading PIP

To check the current version of PIP: pip --version

To upgrade PIP on Windows: python -m pip install --upgrade pip

(ii) Installing numpy : pip install numpy

(iii) Installing scipy : pip install scipy

(iv) Installing jupyter : pip install jupyter

(v) Installing pandas : pip install pandas
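
Once the installs finish, one quick way to confirm that each package can be found by Python is sketched below. This is only an illustration: the helper name `check_packages` is an assumption and not part of the manual. It relies on the standard-library `importlib.util.find_spec`, so a missing package is reported instead of raising an error.

```python
import importlib.util

def check_packages(names):
    """Return a dict mapping each package name to True if Python can locate it."""
    return {name: importlib.util.find_spec(name) is not None for name in names}

# check_packages is a hypothetical helper used only for this illustration.
status = check_packages(["numpy", "scipy", "jupyter", "pandas"])
for name, found in status.items():
    print(name, "installed" if found else "NOT installed")
```

Any package reported as NOT installed can be installed again with the matching pip command above.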


Result:
Thus the NumPy, SciPy, Jupyter and Pandas packages were downloaded,
installed and explored successfully.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Ex.No : 2 Working With NumPy Array
Date :
AIM:
To write a program in Python to perform array manipulation using
NumPy.
Algorithm :
(i) Matrix addition
(ii) Matrix multiplication
(iii) Scalar multiplication of matrix
(iv) Matrix transpose
(v) Array datatype conversion
(vi) Stacking of numpy arrays
(vii) Sequence generation
(viii) Sorting an array
Source Code:
(2.1). Matrix Addition
Program :
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[10, 11, 12], [13, 14, 15]])
c=a+b
print("a = ", a)
print("b = ", b)
print("Addition of a and b = ", c)
Output :

---------------------------------------------------------------------------------------------------------
(2.2). Matrix Multiplication
Program :
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]])
b = np.array([[10, 11, 12], [13, 14, 15]])
# Note: * multiplies element by element; it is not the matrix product.
c=a*b
print("a = ", a)
print("b = ", b)
print("Element-wise multiplication of a and b = ", c)
Output :
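
For comparison, a minimal sketch of the true matrix product using the `@` operator is given here. The two arrays are both of shape (2, 3), so transposing `b` is an illustrative choice made only to get compatible inner dimensions; the name `product` is likewise an assumption.

```python
import numpy as np

a = np.array([[1, 2, 3], [4, 5, 6]])        # shape (2, 3)
b = np.array([[10, 11, 12], [13, 14, 15]])  # shape (2, 3)

# The matrix product needs the inner dimensions to match,
# so b is transposed to shape (3, 2) here (an illustrative choice).
product = a @ b.T
print("a @ b.T =\n", product)
```

The result has shape (2, 2): one entry for each pairing of a row of `a` with a row of `b`.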

---------------------------------------------------------------------------------------------------------
(2.3). Scalar Multiplication of Matrix
Program :
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6]])
b=a*3
print("a = ", a)
print("b = a * 3 = ", b)
Output :

---------------------------------------------------------------------------------------------------------
(2.4). Matrix Transpose
Program :
import numpy as np
a = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
b = a.T
print("a = \n", a)
print("Transpose of a = \n", b)

Output :

---------------------------------------------------------------------------------------------------------
(2.5). Array Datatype Conversion
Program :
import numpy as np
a = np.array([[2.5, 3.8, 1.5], [4.7, 2.9, 1.56]])
b = a.astype('int')
print("The array in float datatype =\n", a)
print("The array in int datatype =\n", b)
Output :
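
Note that `astype('int')` drops the fractional part rather than rounding. The sketch below contrasts the two behaviours on the same array; the names `truncated` and `rounded` are assumptions introduced for illustration.

```python
import numpy as np

a = np.array([[2.5, 3.8, 1.5], [4.7, 2.9, 1.56]])

truncated = a.astype(int)          # drops the fractional part
rounded = np.rint(a).astype(int)   # rounds to the nearest integer first

print("Truncated :\n", truncated)
print("Rounded :\n", rounded)
```

`np.rint` rounds exact halves to the nearest even integer, so 2.5 becomes 2 while 1.5 becomes 2.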

---------------------------------------------------------------------------------------------------------
(2.6). Stacking of numpy arrays
Program :
import numpy as np
a1 = np.array([[1, 2, 3], [4, 5, 6]])
a2 = np.array([[7, 8, 9], [10, 11, 12]])
c = np.hstack((a1, a2))
d = np.vstack((a1, a2))
print("The two arrays are :\na1 =\n", a1, "\na2 =\n", a2)
print("\nHorizontal stacking :\n", c)
print("\nVertical stacking :\n", d)
Output :

---------------------------------------------------------------------------------------------------------
(2.7). Sequence generation
Program :
import numpy as np
lists = [x for x in range(0, 101, 2)]
a = np.array(lists)
print(a)
Output :
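
The same sequence can also be produced without a Python list, using NumPy's own generators. This is a sketch of an alternative, not the manual's method; the names `evens` and `same` are assumptions.

```python
import numpy as np

evens = np.arange(0, 101, 2)        # start, stop (exclusive), step
same = np.linspace(0, 100, 51)      # start, stop (inclusive), number of points

print(evens)
print(np.allclose(evens, same))
```

`np.arange` is convenient when the step is known, `np.linspace` when the number of points is known.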

---------------------------------------------------------------------------------------------------------
(2.8). Sorting an array
Program:
import numpy as np
a = np.array([[1, 4, 2], [3, 4, 6], [0, -1, 5]])
print("Array before sorting")
print(a)
print("Sorted flattened array :")
print(np.sort(a, axis=None))
print("Sorting in row wise :")
print(np.sort(a, axis=1))
print("Sorting in column wise :")
print(np.sort(a, axis=0))

Output :
Result :
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Ex.No : 3 Working With Pandas Dataframe
Date :

AIM:
To perform various operations on dataframe using pandas module in
python.

Algorithm : (Write your own algorithm according to the program)


(i) Creating dataframe using dictionary
(ii) Creating dataframe from a series
(iii) Sorting the dataframe
(iv) Manipulation of dataframe
(a)Manipulation of columns
(i) Selection of column
(ii)Addition of column
(iii)Deletion of column
(b)Manipulation of rows
(i)Selection of rows
(ii)Addition of rows
(iii)Deletion of rows
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Source Code :
(3.1). Creating data frame
Program:
import pandas as pd
data = [['name1', 21, '[email protected]', 1234567891],
['name2', 26, '[email protected]', 1234567892]]
df = pd.DataFrame(data, columns=['NAME', 'AGE', 'EMAIL ID', 'PHONE NUMBER'],
index=[1, 2])
print(df)

Output :

(3.2). Creating dataframe using dictionary


Program
import pandas as pd
data = {'Name' : ['aa', 'bb', 'cc'], 'Age' : [20, 21, 25]}
df = pd.DataFrame(data)
print(df)

Output :

---------------------------------------------------------------------------------------------------------
(3.3). Creating dataframe from a series

Program
import pandas as pd
data = {'ONE' : pd.Series([10, 20, 30, 40], index=[1, 2, 3, 4]),
'TWO' : pd.Series([50, 60, 70, 80], index=[1, 2, 3, 4])}
df = pd.DataFrame(data)
print(df)
Output :

---------------------------------------------------------------------------------------------------------
(3.4). Sorting the dataframe

Program
import pandas as pd
data = {'Name' : ['name1', 'name2', 'name3'], 'Age' : [20, 21, 22]}
df = pd.DataFrame(data)
print("\nDataset before sorting :\n", df)
d_sort1 = df.sort_values(by='Name')
print("\nDataset after sorted by Name :\n", d_sort1)
d_sort2 = df.sort_values(by='Age')
print("\nDataset after sorted by Age :\n", d_sort2)
Output :
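
`sort_values` also accepts several columns and a sort direction for each. The sketch below uses a made-up fourth row so that two people share an age; the data and the name `d_sort` are assumptions for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Name': ['name1', 'name2', 'name3', 'name4'],
                   'Age': [20, 21, 22, 21]})

# Oldest first; rows with the same Age are ordered by Name.
d_sort = df.sort_values(by=['Age', 'Name'], ascending=[False, True])
print(d_sort)
```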
(3.5). Manipulation of data frame
(i) Selection of column :

Source Code :
import pandas as pd
data = {'ONE' : pd.Series([10, 20, 30, 40], index=[1, 2, 3, 4]),
'TWO' : pd.Series([50, 60, 70, 80], index=[1, 2, 3, 4])}
df = pd.DataFrame(data)
print("------------------------")
print(df)
print("------------------------")
print("Selecting column ONE")
print(df['ONE'])
print("------------------------")
print("Selecting column TWO")
print(df['TWO'])
print("------------------------")

Output :
(ii) Addition of column :
Program
import pandas as pd
data = {'ONE' : pd.Series([10, 20, 30, 40], index=[1, 2, 3, 4]),
'TWO' : pd.Series([50, 60, 70, 80], index=[1, 2, 3, 4])}
df = pd.DataFrame(data)
print("------------------------")
print("Data Frame before adding a new column")
print(df)
print("------------------------")
df['THREE'] = pd.Series([90, 100, 110, 120], index=[1, 2, 3, 4])
print("Data Frame after adding a new column\n", df)
print("------------------------")

Output :
(iii) Deletion of column

Source Code :
import pandas as pd
data = {'ONE' : pd.Series([0, 1, 2, 3], index=[1, 2, 3, 4]),
'TWO' : pd.Series([4, 5, 6, 7], index=[1, 2, 3, 4])}
print("-----------------------")
df = pd.DataFrame(data)
print("Original DataFrame :\n", df)
print("-----------------------")
del df['ONE']
print("DataFrame after deleting a column :\n", df)
print("-----------------------")

Output :
(iv) Selection of rows
Source Code :
import pandas as pd
data = {'ONE' : pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd']),
'TWO' : pd.Series([4, 5, 6, 7], index=['a', 'b', 'c', 'd'])}
print("-----------------------")
df = pd.DataFrame(data)
print("DataFrame :\n", df)
print("-----------------------")
print("row 'c' :")
print(df.loc['c'])
print("-----------------------")

Output :
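
Rows can be selected by label with `loc` or by position with `iloc`. A minimal sketch of the difference, using the same values as the program above (the names `by_label` and `by_position` are assumptions):

```python
import pandas as pd

df = pd.DataFrame({'ONE': [0, 1, 2, 3], 'TWO': [4, 5, 6, 7]},
                  index=['a', 'b', 'c', 'd'])

by_label = df.loc['c']      # row whose index label is 'c'
by_position = df.iloc[2]    # third row, counting from 0

print(by_label)
print(by_label.equals(by_position))
```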
(v) Addition of rows
Source Code :
import pandas as pd
df1 = pd.DataFrame([[1, 2], [3, 4]], columns = ['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns = ['a', 'b'])
print("---------------------")
print("df1 :")
print(df1)
print("---------------------")
print("df2 :")
print(df2)
print("---------------------")
print("df1 + df2 :")
# DataFrame.append was removed in pandas 2.0; pd.concat gives the same result.
df1 = pd.concat([df1, df2])
print(df1)

Output :
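
When rows from two data frames are combined, each keeps its original index label, so labels can repeat. The sketch below shows one way to renumber the rows, assuming a fresh 0-based index is wanted; the name `combined` is an assumption.

```python
import pandas as pd

df1 = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
df2 = pd.DataFrame([[5, 6], [7, 8]], columns=['a', 'b'])

# ignore_index=True discards the original labels (0, 1, 0, 1)
# and numbers the combined rows 0..3.
combined = pd.concat([df1, df2], ignore_index=True)
print(combined)
```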
(vi) Deletion of rows

Program :
import pandas as pd
df = pd.DataFrame([[1, 2], [3, 4]], columns = ['a', 'b'])
print("DataFrame :")
print(df)
df = df.drop(0)
print("DataFrame after deleting the row 0 :")
print(df)
Output :
Result :
The various operations on data frame using pandas module in python have
been implemented and executed successfully.

Ex.No :4 Reading Data From Various Sources


AIM:
To read data from a text file, a CSV file, an Excel file and the web using
Python functions.
Algorithm:
(4.1) Reading data from a text file

Program :
# The with-statement closes the file automatically after reading.
with open(r'Data.txt') as T:
    print(T.read())

Output :
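
A text file can also be read one line at a time, which helps with large files. The sketch below first writes a small sample file so that it is self-contained; the temporary path and the file contents are assumptions, not the manual's Data.txt.

```python
import os
import tempfile

# Create a small sample file so the sketch is self-contained.
path = os.path.join(tempfile.mkdtemp(), 'Data.txt')
with open(path, 'w') as f:
    f.write("first line\nsecond line\n")

# Iterating over the file object yields one line per step.
lines = []
with open(path) as f:
    for line in f:
        lines.append(line.rstrip('\n'))

print(lines)
```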

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(4.2) Reading the CSV file

Program
import pandas as pd
data = pd.read_csv(r'Data.csv')
print(data)

Output :
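
To try `read_csv` without a file on disk, the CSV text can be wrapped in `io.StringIO`. This is a sketch with made-up values; `csv_text` is an assumption standing in for the contents of Data.csv.

```python
import io
import pandas as pd

# Inline text standing in for the contents of a CSV file (made-up values).
csv_text = "Name,Age\naa,20\nbb,21\ncc,25\n"

data = pd.read_csv(io.StringIO(csv_text))
print(data)
print(data.dtypes)
```

The first line of the text is taken as the header row, and the Age column is parsed as integers.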

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(4.3) Reading the excel file

Program
import pandas as pd
data = pd.read_excel(r'Data.xlsx')
print(data)

Output :

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(4.4) Reading from web

Program:
import pandas as pd
url="https://en.wikipedia.org/wiki/Iris_flower_data_set"
df=pd.read_html(url)
print(df)

Output:
[ Dataset order Sepal length ... Petal width Species
0 1 5.1 ... 0.2 I. setosa
1 2 4.9 ... 0.2 I. setosa
2 3 4.7 ... 0.2 I. setosa
3 4 4.6 ... 0.2 I. setosa
4 5 5.0 ... 0.3 I. setosa
.. ... ... ... ... ...
145 146 6.7 ... 2.3 I. virginica
146 147 6.3 ... 1.9 I. virginica
147 148 6.5 ... 2.0 I. virginica
148 149 6.2 ... 2.3 I. virginica
149 150 5.9 ... 1.8 I. virginica

[150 rows x 6 columns], .


Ex.No : 5 Descriptive Analysis On The Iris Dataset
Date:

Aim : To perform descriptive analysis on the Iris dataset using Python
functions.

Algorithm:
Descriptive analytics is the process of using current and historical data to identify trends
and relationships.
Data set: Iris Dataset
Source Code :
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
# To read the dataset in python
Iris = pd.read_csv (r'C:\Users\8316\Desktop\Iris.csv')
print ("Iris Dataset : \n",Iris) # To print the dataset.
#The head function in Python displays the first five rows of the dataframe by default.
print ("Iris Dataset Head : \n",Iris.head())
#Shape Function to list the records and the features
print ("Iris Dataset Shape : \n",Iris.shape)
# The info() method prints its summary itself and returns None, so call it directly.
print ("Iris Dataset Info :")
Iris.info()
# Summaries for a dataset.
print ("Iris Dataset Describe : \n",Iris.describe())
# The number of non-null values in each column, obtained via `count()`.
print ("Iris Dataset Count : \n",Iris.count())
# Pandas groupby groups the data by category and applies a function to each group.
print ("Iris Dataset Group : \n",Iris.groupby('Species',as_index= False)["Id"].count())
# Sample mean for every numeric column (numeric_only skips the Species text column)
print ("Iris Dataset Mean : \n",Iris.mean(numeric_only=True))
# Sample median for every numeric column
print ("Iris Dataset Median : \n",Iris.median(numeric_only=True))
# Sample variance for every numeric column
print ("Iris Dataset Variance : \n",Iris.var(numeric_only=True))
# The different categories of Species
print ("Iris Dataset different categories : \n",Iris.Species.unique())
output:
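
Descriptive statistics are often more informative per category than over the whole table. The sketch below computes the mean of each numeric column within each species. The measurements are made up, and the column names `SepalLengthCm` and `PetalLengthCm` are assumptions about the layout of Iris.csv.

```python
import pandas as pd

# Made-up measurements; only the column layout mimics the Iris file.
sample = pd.DataFrame({
    'Species': ['setosa', 'setosa', 'virginica', 'virginica'],
    'SepalLengthCm': [5.0, 5.2, 6.4, 6.6],
    'PetalLengthCm': [1.4, 1.6, 5.5, 5.7],
})

# Mean of every numeric column within each species.
per_species = sample.groupby('Species').mean(numeric_only=True)
print(per_species)
```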
Ex.No : 6 Bivariate Analysis and Multiple Regression analysis
Date:

AIM:
To perform Bivariate analysis such as linear, logistic regression modelling
and Multiple Regression analysis using python
Dataset: Diabetes Dataset
(6.1) Linear Regression

Source Code :
import matplotlib.pyplot as plt
import numpy as np
from sklearn import datasets, linear_model
from sklearn.metrics import mean_squared_error, r2_score
diabetes_x, diabetes_y = datasets.load_diabetes(return_X_y=True)
diabetes_x = diabetes_x[:, np.newaxis, 2]
diabetes_x_train = diabetes_x[:-20]
diabetes_x_test = diabetes_x[-20:]
diabetes_y_train = diabetes_y[:-20]
diabetes_y_test = diabetes_y[-20:]
regr = linear_model.LinearRegression()
regr.fit(diabetes_x_train, diabetes_y_train)
diabetes_y_pred = regr.predict(diabetes_x_test)
print('Coefficients :\n', regr.coef_)
print('Mean squared error : %.2f'%mean_squared_error(diabetes_y_test,
diabetes_y_pred))
print('Coefficient of determination : %.2f'%r2_score(diabetes_y_test, diabetes_y_pred))
plt.scatter(diabetes_x_test, diabetes_y_test, color='black')
plt.plot(diabetes_x_test, diabetes_y_pred, color='blue', linewidth=3)
plt.xticks(())
plt.yticks(())
plt.show()
Output :
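
The two metrics printed by the program can be worked out by hand. The sketch below does so with NumPy on made-up values chosen to keep the arithmetic simple; it illustrates the standard definitions and does not use the diabetes data.

```python
import numpy as np

# Made-up values, chosen only to keep the arithmetic easy to follow.
y_true = np.array([3.0, 5.0, 7.0, 9.0])
y_pred = np.array([2.5, 5.5, 7.0, 10.0])

residuals = y_true - y_pred
mse = np.mean(residuals ** 2)                      # mean squared error

ss_res = np.sum(residuals ** 2)                    # residual sum of squares
ss_tot = np.sum((y_true - y_true.mean()) ** 2)     # total sum of squares
r2 = 1 - ss_res / ss_tot                           # coefficient of determination

print("MSE =", mse)
print("R^2 =", r2)
```

Here the squared residuals sum to 1.5, so the mean squared error is 1.5 / 4 = 0.375, and with a total sum of squares of 20 the coefficient of determination is 1 - 1.5 / 20 = 0.925.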

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(6.2) Logistic Regression
Source Code :
import pandas as pd
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
import seaborn as sns
import matplotlib.pyplot as plt
from sklearn import metrics
import warnings
warnings.filterwarnings('ignore')
DataPath = (r'C:\Users\8316\Downloads\diabetes.csv')
data = pd.read_csv(DataPath)
x=data.drop("Outcome",axis=1)
y=data[["Outcome"]]
x_train,x_test,y_train,y_test=train_test_split(x,y,test_size=0.30,random_state=0)
model=LogisticRegression()
model.fit(x_train,y_train)
y_predict=model.predict(x_test)
model_score=model.score(x_test,y_test)
#Logistic Regression Model Score
print("Logistic Regression Model Score = ",model_score)
#confusion matrix
print("Confusion Matrix : \n",metrics.confusion_matrix(y_test,y_predict))
sns.heatmap(metrics.confusion_matrix(y_test,y_predict), annot=True, fmt='d',
cmap='Blues')
plt.title("LogisticRegression Confusion Matrix")
plt.ylabel("Actual Values")
plt.xlabel("Predicted Values")
plt.savefig('confusion_matrix.png')
plt.show()
Output:
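
For a classifier, the model score is the mean accuracy, which can also be read off the confusion matrix. The sketch below shows the arithmetic on hypothetical counts (not results from the diabetes data), laid out with actual classes in rows and predicted classes in columns.

```python
import numpy as np

# Hypothetical counts, laid out as [[TN, FP], [FN, TP]].
cm = np.array([[50, 10],
               [5, 35]])

tn, fp, fn, tp = cm.ravel()
accuracy = (tp + tn) / cm.sum()   # share of all predictions that are correct
precision = tp / (tp + fp)        # share of predicted positives that are correct
recall = tp / (tp + fn)           # share of actual positives that are found

print("Accuracy  =", accuracy)
print("Precision =", precision)
print("Recall    =", recall)
```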

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(6.3) Multiple Regression Analysis
Source Code:
import pandas as pd
from sklearn import linear_model
DataPath = (r'C:\Users\8316\Downloads\diabetes.csv')
df = pd.read_csv(DataPath)
df.head()
x=df[['Insulin','Glucose']]
y=df[['Outcome']]
regr=linear_model.LinearRegression()
regr.fit(x,y)
predicted=regr.predict([[500,200]])
print("Predicted Outcome = ", predicted)
Output:
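
To see what a multiple regression fit computes, the sketch below solves the same least-squares problem directly with `np.linalg.lstsq`. The data are synthetic, generated from a known equation so the recovered coefficients can be checked; none of the numbers come from the diabetes file.

```python
import numpy as np

# Synthetic data generated from y = 2 + 3*x1 - 1*x2 (no noise).
x = np.array([[1.0, 2.0], [2.0, 1.0], [3.0, 5.0], [4.0, 3.0], [5.0, 8.0]])
y = 2 + 3 * x[:, 0] - 1 * x[:, 1]

# Prepend a column of ones so the first coefficient is the intercept.
design = np.column_stack([np.ones(len(x)), x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)

intercept, b1, b2 = coef
print("intercept =", intercept, "b1 =", b1, "b2 =", b2)

# Prediction for a new point (x1=6, x2=4).
predicted = intercept + b1 * 6 + b2 * 4
print("Predicted =", predicted)
```

Because the data contain no noise, the fit recovers the generating coefficients up to floating-point error.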
Ex.No : 7 Exploring Various Plotting Functions using UCI data sets
Date:
(7.1). Constructing Normal Curve

Source Code :
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm
import statistics

x = np.arange(-20, 20, 0.01)


mean = statistics.mean(x)
sd = statistics.stdev(x)
plt.plot(x, norm.pdf(x, mean, sd))
plt.title("Normal Curve")
plt.show()

Output :
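
The curve drawn by `norm.pdf` follows the normal density formula, which can be written out directly. The sketch below is a hand-written version under assumed parameters (mean 0, standard deviation 5); the function name `normal_pdf` is hypothetical.

```python
import numpy as np

def normal_pdf(x, mean, sd):
    """Density of the normal distribution, written out from its formula."""
    return np.exp(-0.5 * ((x - mean) / sd) ** 2) / (sd * np.sqrt(2 * np.pi))

x = np.arange(-20, 20, 0.01)
mean, sd = 0.0, 5.0   # illustrative parameters, not taken from the program above
density = normal_pdf(x, mean, sd)

peak = x[np.argmax(density)]
area = np.sum(density) * 0.01   # Riemann-sum approximation of the area
print("Peak near x =", peak)
print("Approximate area =", area)
```

The peak sits at the mean, and the area under the curve over this range is close to 1 because the range covers four standard deviations on each side.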
(7.2 ) Constructing lineplot,scatterplot,density plot and Contour plot.
Source Code:
import matplotlib.pyplot as plt
import seaborn as sns
import pandas as pd
import numpy as np
# read the csv data
DataPath = (r'C:\Users\8316\Downloads\diabetes.csv')
df = pd.read_csv(DataPath)
df.head()
#Line Plot for Diabetes Dataset
sns.lineplot(x=df['BloodPressure'], y=df['Age'], hue=df["Outcome"])
plt.title("Lineplot for Diabetes Dataset")
plt.show()
#Scatter Plot for Diabetes Dataset
sns.scatterplot(x=df['BloodPressure'], y=df['Age'], hue=df["Outcome"])
plt.title("Scatterplot for Diabetes Dataset")
plt.show()
#Density Plot for Diabetes Dataset
x=df["Insulin"]
# distplot is deprecated in recent seaborn releases; kdeplot draws the density curve.
sns.kdeplot(x)
plt.title("Density plot for Diabetes Dataset")
plt.show()
#Contour Plot for Diabetes Dataset
def f(x, y): return np.sin(x) ** 10 + np.cos(10 + y * x) * np.cos(x)
x1=df["Age"]
x2=df["Outcome"]
X, Y = np.meshgrid(x1, x2)
Z = f(X, Y)
#plt.contour(X, Y, Z, colors='black');
plt.contour(X, Y, Z, 20, cmap='RdGy');
plt.title("Contour plot for Diabetes Dataset")
plt.xlabel("Age")
plt.ylabel("Outcome")
plt.show()
output :
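
The contour plot relies on `np.meshgrid` to turn two one-dimensional arrays into a grid of coordinates. A small sketch of what it returns, using made-up axis values:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0])   # 3 points along the x axis
y = np.array([10.0, 20.0])      # 2 points along the y axis

X, Y = np.meshgrid(x, y)
print(X.shape, Y.shape)
print(X)
print(Y)

Z = X + Y        # any function of X and Y evaluated on the whole grid
print(Z)
```

Each of `X`, `Y` and `Z` has one row per y value and one column per x value, which is the layout `plt.contour` expects.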
