0% found this document useful (0 votes)
483 views

CS3361 Data Science Lab Manual

pip install numpy 2. Verify Installation: Python code: import numpy as np print('NumPy is installed successfully') 3. Explore Features: - NumPy arrays are fast and efficient for numerical operations. - NumPy provides functions to create, manipulate arrays like shape, size, slicing etc. - NumPy supports mathematical operations on arrays like sin, cos, sqrt etc. - NumPy supports linear algebra, Fourier transform and random number capabilities. So in summary, NumPy is a fundamental package for scientific computing in Python. It provides efficient multi-dimensional arrays and tools to work with these arrays. 3. Scipy SciPy stands for Scientific Python and it is

Uploaded by

shanmugam
Copyright
© © All Rights Reserved
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
483 views

CS3361 Data Science Lab Manual

pip install numpy 2. Verify Installation: Python code: import numpy as np print('NumPy is installed successfully') 3. Explore Features: - NumPy arrays are fast and efficient for numerical operations. - NumPy provides functions to create, manipulate arrays like shape, size, slicing etc. - NumPy supports mathematical operations on arrays like sin, cos, sqrt etc. - NumPy supports linear algebra, Fourier transform and random number capabilities. So in summary, NumPy is a fundamental package for scientific computing in Python. It provides efficient multi-dimensional arrays and tools to work with these arrays. 3. Scipy SciPy stands for Scientific Python and it is

Uploaded by

shanmugam
Copyright
© © All Rights Reserved
Available Formats
Download as DOCX, PDF, TXT or read online on Scribd
You are on page 1/ 82

DEPARTMENT OF INFORMATION TECHNOLOGY

Vision of the Department

To achieve global standards in Quality of Education, Research and Development in

Information Technology by adapting to the rapid technological advancement, and efficiently mold the

engineers’ knowledge with theory and practice, using teaching skill utilizing the latest practical

experiments and software to fit

Mission of the Department

M1. To produce successful graduates with personal and professional responsibilities and

commitment to lifelong learning.

M2. To impart professional ethics, social responsibilities, entrepreneur skills and value-based IT

education.

M3. Facilitate the development and application of problem-solving skills in students.


DATA SCIENCE LAB MANUAL

CS3361 -DATA SCIENCE LABORATORY


COURSE OBJECTIVES:
 To understand the python libraries for data science
 To understand the basic Statistical and Probability measures for data science.
 To learn descriptive analytics on the benchmark data sets.
 To apply correlation and regression analytics on standard data sets.
 To present and interpret data using visualization packages in Python.
LIST OF EXPERIMENTS:
1. Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels
and Pandas packages.
2. Working with Numpy arrays
3. Working with Pandas data frames
4. Reading data from text files, Excel and the web and exploring various commands for
doing descriptive analytics on the Iris data set.
5. Use the diabetes data set from UCI and Pima Indians Diabetes data set for performing
the following:
a. Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard
Deviation, Skewness and Kurtosis.
b. Bivariate analysis: Linear and logistic regression modeling
c. Multiple Regression analysis
d. Also compare the results of the above analysis for the two data sets.
6. Apply and explore various plotting functions on UCI data sets.
a. Normal curves
b. Density and contour plots
c. Correlation and scatter plots d. Histograms e. Three dimensional plotting
7. Visualizing Geographic Data with Basemap
List of Equipments:
Tools: Python, Numpy, Scipy, Matplotlib, Pandas, statmodels, seaborn, plotly, bokeh
Note: Example data sets like: UCI, Iris, Pima Indians Diabetes etc.

1
DATA SCIENCE LAB MANUAL

CS3361 DATA SCIENCE LABORATORY


(For III Semester IT)

COURSE OUTCOMES:
At the end of this course, the students will be able to:
COs Cognitive Level Course Outcomes
CO1 Create Make use of the python libraries for data science
CO2 Create Make use of the basic Statistical and Probability measures for data science.
CO3 Apply Perform descriptive analytics on the benchmark data sets.
CO4 Apply Perform correlation and regression analytics on standard data sets
CO5 Apply Present and interpret data using visualization packages in Python.
Apply data science concepts and methods to solve problems in real-world contexts
CO6 Apply and will communicate these solutions effectively

LIST OF EXPERIMENTS – MAPPING WITH COS, POS& PSOS

Exp.
Name of the Experiment COs POs PSOs
No

Download, install and explore the features of NumPy,


1 1 1,2,3,4,9,10,11,12 1,2
SciPy, Jupyter, Statsmodels and Pandas packages.

2 Working with Numpy arrays 1,2 1,2,3,4,5,9,10,11,12 1,2


3 Working with Pandas data frames 1,2 1,2,3,4,5,9,11,12 1,2
Reading data from text files, Excel and the web and
4 exploring various commands for doing descriptive 3 1,2,3,4,5,9,10,11,12 1,2
analytics on the Iris data set.
Use the diabetes data set from UCI and Pima
Indians Diabetes data set for performing the
following:
a)Univariate analysis: Frequency, Mean, Median,
Mode, Variance, Standard Deviation, Skewness and
5 Kurtosis. 2,3,4 1,2,3,4,5,7,9,10,11,12 1,2
b)Bivariate analysis: Linear and logistic regression
modeling
c)Multiple Regression analysis
d)Also compare the results of the above analysis for the
two data sets.
Apply and explore various plotting functions on UCI
data sets.
a. Normal curves
6 b. Density and contour plots 4,5 1,2,3,4,5,9,10,11,12 1,2
c. Correlation and scatter plots
d. Histograms
e. Three dimensional plotting
7 Visualizing Geographic Data with Basemap 5 1,2,3,4,5,9,10,11,12 1,2
Experiments Beyond the Syllabus
8 Predict Income with Census Data 6 1,2,3,4,5,6,7,8,9,10,11,12 1,2

2
DATA SCIENCE LAB MANUAL

PROGRAM OUTCOMES (POs)

PO1 Engineering Knowledge PO7 Environment & Sustainability


PO2 Problem Analysis PO8 Ethics
PO3 Design & Development PO9 Individual & Team Work
PO4 Investigations PO10 Communication Skills
PO5 Modern Tools PO11 Project Mgt. & Finance
PO6 Engineer & Society PO12 Life Long Learning

PROGRAM SPECIFIC OUTCOMES (PSOs)

PSO1: Cutting Edge:


To identify and implement appropriate techniques, resources, modern tools for providing solution to new
idea and innovation.
PSO2: Risk Management:
To manage complex IT projects with consideration of human, financial, ethical and environmental
factors and an understanding of risk management processes, operational and policy implications.

3
DATA SCIENCE LAB MANUAL

INDEX

Ex.No LIST OF EXPERIMENTS Page No


Download, install and explore the features of NumPy,
1. 5
SciPy, Jupyter, Statsmodels and Pandas packages.

2. Working with Numpy Arrays 13

3. Working with Pandas Data Frames 26

Reading data from text files, Excel and the web and
4. exploring various commands for doing descriptive 32
analytics on the Iris data set.

5. Use the diabetes data set from UCI and Pima Indians
38
Diabetes data set

6. Apply and explore various plotting functions on UCI


50
data sets

7.
Visualizing Geographic Data with Basemap 55

8. Predict Income with Census Data 59

Sample Exercise 63

4
DATA SCIENCE LAB MANUAL

EX.No:1
Download, install and explore the features of NumPy, SciPy, Jupyter,
Statsmodelsand Pandas packages.
Date:

Related Co &
CO1 & PO1,2,3,4,9,10,11,12
PO’s

Aim
To Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels and
Pandas packages by using pip command.
Basic Tools:
a. Python
b. Numpy
c. Scipy
d. Matplotlib
e. Pandas
f. statmodels
g. seaborn
h. plotly
i. bokeh
1. Python
Python is an easy to learn, powerful programming language. It has efficient high-level data
structures and a simple but effective approach to object-oriented programming. Python’s elegant
syntax and dynamic typing, together with its interpreted nature, make it an ideal language for
scripting and rapid application development in many areas on most platforms.
The Python interpreter and the extensive standard library are freely available in source or
binary form for all major platforms from the Python web site, https://www.python.org/, and may be
freely distributed.
Installation Commands:
Step 1: Download the Python Installer binaries. Open the official Python website in
your web browser. ...
Step 2: Run the Executable Installer. Once the installer is downloaded, run the
Python installer. ...
Step 3: Add Python to environmental variables. ...
Step 4: Verify the Python Installation.

5
DATA SCIENCE LAB MANUAL

6
DATA SCIENCE LAB MANUAL

2. Numpy

NumPy stands for Numerical Python and it is a core scientific computing library
in Python. It provides efficient multi-dimensional array objects and various operations
to work with these array objects.

Package installer for Python (pip) needed to run Python on your computer.

Installation Commands:
1. Command Prompt : Py –m pip - -version
2. Command Prompt :Py –m pip install numpy

7
DATA SCIENCE LAB MANUAL

3. Scipy

SciPy is a scientific computation library that uses NumPy underneath.SciPy stands for
Scientific Python.It provides more utility functions for optimization, stats and signal
processing.LikeNumPy, SciPy is open source so we can use it freely.

Installation Commands:

Command Prompt :Py –m pip install scipy

4. Matplotlib
Matplotlib is a comprehensive library for creating static, animated, and interactive
visualizations in Python. Matplotlib makes easy things easy and hard things possible.

 Create publication quality plots.


 Make interactive figures that can zoom, pan, update.
 Customize visual style and layout.
 Export to many file formats.
 Embed in JupyterLab and Graphical User Interfaces.
8
DATA SCIENCE LAB MANUAL

 Use a rich array of third-party packages built on Matplotlib.

Installation Commands:

Command Prompt :Py –m pip install matplotlib

5. Pandas
pandas is a fast, powerful, flexible and easy to use open source data analysis and manipulation
tool, built on top of the Python programming language
Installation Commands:

Command Prompt: Py –m pip install Pandas

6. Jupyter

The Jupyter Notebook is the original web application for creating and sharing
computational documents. It offers a simple, streamlined, document-centric experience.

Installation Commands:

Command Prompt: Py –m pip install jupyter

9
DATA SCIENCE LAB MANUAL

6. Statmodels
Statsmodels is a Python package that allows users to explore data, estimate statistical
models, and perform statistical tests.
Installation Commands:

Command Prompt: Py –m pip install statsmodels

7. Seaborn
Seaborn is a Python data visualization library based on matplotlib. It provides a high-level
interface for drawing attractive and informative statistical graphics.
Installation Commands:

Command Prompt: Py –m pip install seaborn

8. Plotly
Plotly is a technical computing company headquartered in Montreal, Quebec, that
develops online data analytics and visualization tools. Plotly provides online graphing,
analytics, and statistics tools for individuals and collaboration, as well as scientific graphing
libraries for Python, R, MATLAB, Perl, Julia, Arduino, and REST.
Installation Commands:

Command Prompt: Py –m pip install plotly

10
DATA SCIENCE LAB MANUAL

9. Bokeh
Bokeh is a Python library for creating interactive visualizations for modern web browsers. It
helps you build beautiful graphics, ranging from simple plots to complex dashboards with
streaming datasets.

Installation Commands:

Command Prompt: Py –m pip install bokeh

1. Basic Array Program:(Python)


import numpy as np

arr = np.array([1, 2, 3, 4, 5])

print(arr)

Output

2. import pandas as pd

arr = pd.array([1, 2, 3, 4, 5])

print(arr)

3. Draw a line in a diagram from position (0,0) to position (6,250) using Matplotlib

import matplotlib.pyplot as plt


import numpy as np
xpoints = np.array([0, 6])
ypoints = np.array([0, 250])
plt.plot(xpoints, ypoints)
plt.show()

import matplotlib.pyplot as plt


import numpy as np

y = np.array([35, 25, 25, 15])


mylabels = ["Apples", "Bananas", "Cherries", "Dates"]
plt.pie(y, labels = mylabels)
plt.legend()
plt.show()

A simple scatter plot:


import matplotlib.pyplot as plt
import numpy as np
11
DATA SCIENCE LAB MANUAL

x = np.array([5,7,8,7,2,17,2,9,4,11,12,9,6])
y = np.array([99,86,87,88,111,86,103,87,94,78,77,85,86])
plt.scatter(x, y)
plt.show()

import matplotlib.pyplot as plt


import numpy as np
x = np.random.normal(170, 10, 250)
plt.hist(x)
plt.show()

Example
Specify a new color for each wedge:
import matplotlib.pyplot as plt
import numpy as np
y = np.array([35, 25, 25, 15])
mylabels = ["Apples", "Bananas", "Cherries", "Dates"]
mycolors = ["black", "hotpink", "b", "#4CAF50"]
plt.pie(y, labels = mylabels, colors = mycolors)
plt.show()

Result

Thus the Download, install and explore the features of NumPy, SciPy, Jupyter, Statsmodels
and Pandas packages by using pip command and basic program was executed.

12
DATA SCIENCE LAB MANUAL

NUMPY

13
DATA SCIENCE LAB MANUAL

EX. No:2
Working with Numpy Arrays
Date:
C

Related CO & Po CO1 & CO2 -PO1,2,3,4,5,9,10,11,12


C
CC0

Aim :

To write python programs to create and access numpy arrays.

Algorithm:
1. Start the Program.
2. Import Numpy Library.
3. Perform operation with Numpy Array.
4. Display the output.
5. Stop the Program.
Numpy:
Numpy stands for Numerical Python. It is a Python library used for working with an
array. In Python, we use the list for purpose of the array but it’s slow to process. NumPy array
is a powerful N-dimensional array object and its use in linear algebra, Fourier transform, and
random number capabilities. It provides an array object much faster than traditional Python
lists.

Different way of creating Numpy Arrays:

1. numpy.array(): The Numpy array object in Numpy is called ndarray. We can create ndarray
using numpy.array() function.

Syntax: numpy.array(parameter)

2. numpy.arange(): This is an inbuilt NumPy function that returns evenly spaced values
within a given interval.

3. numpy.linspace(): This function returns evenly spaced numbers over a specified


between two limits.

Syntax: numpy.linspace(start, stop, num=50, endpoint=True, retstep=False, dtype=None,


axis=0)

14
DATA SCIENCE LAB MANUAL

4. numpy.empty(): This function create a new array of given shape and type,
without initializing value.

Syntax: numpy.empty(shape, dtype=float, order=’C’)

5. numpy.ones(): This function is used to get a new array of given shape and type, filled
with ones(1).

Syntax: numpy.ones(shape, dtype=None, order=’C’)

6. numpy.zeros(): This function is used to get a new array of given shape and type, filled
with zeros(0).

Syntax: numpy.ones(shape, dtype=None)

Create a NumPyndarray Object

NumPy is used to work with arrays. The array object in NumPy is called

ndarray. We can create a NumPy ndarray object by using the array() function.

import numpy as np

arr = np.array((1, 2, 3, 4, 5))

print(arr)

OUTPUT
[1 2 3 4 5]

Dimensions in Arrays

A dimension in arrays is one level of array depth (nested arrays).

Example

Create a 2-D array containing two arrays with the values 1,2,3 and 4,5,6:

15
DATA SCIENCE LAB MANUAL

import numpy as np

arr = np.array([[1, 2, 3], [4, 5, 6]])

print(arr)

OUTPUT
[[1 2 3]
[[1 2 3]

Check Number of Dimensions?

NumPy Arrays provides the ndim attribute that returns an integer that tells us how many
dimensions the array have.

Example

Check how many dimensions the arrays have:

import numpy as np
a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)

Higher Dimensional Arrays

An array can have any number of dimensions.

When the array is created, you can define the number of dimensions by using
the ndmin argument.

Example

Create an array with 5 dimensions and verify that it has 5 dimensions:


16
DATA SCIENCE LAB MANUAL

import numpy as np
arr = np.array([1, 2, 3, 4], ndmin=5)
print(arr)
print('number of dimensions :', arr.ndim)

OUTPUT

[[[[[1 2 3 4]]]]]
number of dimensions : 5

NumPy Array Indexing

Array indexing is the same as accessing an array element.

The indexes in NumPy arrays start with 0, meaning that the first element has index 0, and the
second has index 1 etc.

Example

Get the first element from the following array:

import numpy as np

arr = np.array([1, 2, 3, 4])

print(arr[0])

OUTPUT

Example

Access the element on the first row, second column:


17
DATA SCIENCE LAB MANUAL

import numpy as np

arr = np.array([[1,2,3,4,5], [6,7,8,9,10]])

print('2nd element on 1st row: ', arr[0, 1])

OUTPUT

2nd element on 1st dim: 2

NumPy Array Slicing

Slicing in python means taking elements from one given index to another given index.

We pass slice instead of index like this: [start:end].

We can also define the step, like this: [start:end:step].

<slice> = <array>[start:stop]
To slice a one dimensional array, I provide a start and an end number separated by a
semicolon (:). The range then starts at the start number and one before the end number.

18
DATA SCIENCE LAB MANUAL

Two dimensional arrays


When I slice a 2D array, I imagine that the result is the intersection between the selected rows
(first index) and columns(second index).

NumPy Array Shape


NumPy arrays have an attribute called shape that returns a tuple with each index having
19
DATA SCIENCE LAB MANUAL

the number of corresponding elements.


Program
import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr.shape)
OUTPUT
(2,4)
Reshaping of Arrays
reshape:
Reshape is when you change the number of rows and columns which gives a new
view to an object. Now, let us take an example to reshape the below array:

Program:

1 import numpy as np
2 a = np.array([(8,9,10),(11,12,13)])
3 print(a)
4 a=a.reshape(3,2)
5 print(a)
Output– [[ 8 9 10] [11 12 13]] [[ 8 9] [10 11] [12 13]]

Aggregations
The Python numpy aggregate functions are sum, min, max, mean, average, product,
median, standard deviation, variance, argmin, argmax, percentile, cumprod, cumsum, and
corrcoef.
Functions
min() function returns the item with the lowest value, or the item with the lowest value in an
iterable.
20
DATA SCIENCE LAB MANUAL

If the values are strings, an alphabetically comparison is done.

Input: x = min(5, 10)


Output:5

max() function returns the item with the highest value, or the item with the highest value in an
iterable.

If the values are strings, an alphabetically comparison is done.

max(iterable)

Input: x = max(5, 10)


Output:10

Mean, Median, and Mode

What can we learn from looking at a group of numbers?

In Machine Learning (and in mathematics) there are often three values that interests us:

 Mean - The average value


 Median - The mid point value
 Mode - The most common value

Mean
import numpy
speed=[99,86,87,88,111,86,103,87,94,78,77,85,86]
x=numpy.mean(speed)
print(x)

Output

89.76923076923077

Median

21
DATA SCIENCE LAB MANUAL

import numpy
speed=[99,86,87,88,111,86,103,87,94,78,77,85,86]
x=numpy.median(speed)
print(x)

Output

87.0
Mode
import numpy
speed = [99,86,87,88,111,86,103,87,94,78,77,85,86]
x=numpy.mode(speed)
print(x)

Output

86,87
Standard deviation is a number that describes how spread out the values are.

import numpy

speed=[86,87,88,86,87,85,86]

x=numpy.std(speed)

print(x)

Output

0.90
Joining NumPy Arrays

Joining means putting contents of two or more arrays in a single array.

Example

Join two arrays

22
DATA SCIENCE LAB MANUAL

import numpy as np

arr1 = np.array([1, 2, 3])

arr2 = np.array([4, 5, 6])

arr = np.concatenate((arr1, arr2))

print(arr)

OUTPUT
[1 2 3 4 5 6]
Splitting NumPy Arrays

Splitting is reverse operation of Joining.

Joining merges multiple arrays into one and Splitting breaks one array into multiple.

We use array_split() for splitting arrays, we pass it the array we want to split and the number of
splits.

import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

newarr = np.array_split(arr, 3)

print(newarr)
OUTPUT
[array([1, 2]), array([3, 4]), array([5, 6])]

How To Create Your Own ufunc

To create your own ufunc, you have to define a function, like you do with normal functions in
Python, then you add it to your NumPyufunc library with the frompyfunc() method.

The frompyfunc() method takes the following arguments:

23
DATA SCIENCE LAB MANUAL

1. function - the name of the function.


2. inputs - the number of input arguments (arrays).
3. outputs - the number of output arrays.

Example

Create your own ufunc for addition:

import numpy as np

def myadd(x, y):


return x+y

myadd = np.frompyfunc(myadd, 2, 1)

print(myadd([1, 2, 3, 4], [5, 6, 7, 8]))

OUTPUT

[6 8 10 12]

Sorting Arrays

Sorting means putting elements in an ordered sequence.

Ordered sequence is any sequence that has an order corresponding to elements, like numeric or
alphabetical, ascending or descending.

The NumPyndarray object has a function called sort(), that will sort a specified array.

Example

Sort the array:

import numpy as np
arr = np.array([3, 2, 0, 1])
print(np.sort(arr))

OUTPUT

[0 1 2 3]

24
DATA SCIENCE LAB MANUAL

Comparisons, Masks, and Boolean Logic


NumPy also implements comparisonoperators such as < (less than) and > (greater than)
as element-wise ufuncs.the result of these comparison operators is always an array with a
Boolean data type.
All six of the standard comparison operations are available:
In[4]: x = np.array([1, 2, 3, 4, 5])
In[5]: x <3 # less than
Out[5]: array([ True, True, False, False, False], dtype=bool)
In[6]: x >3 # greater than
Out[6]: array([False, False, False, True, True], dtype=bool)
In[7]: x <= 3 # less than or equal
Out[7]: array([ True, True, True, False, False], dtype=bool)
In[8]: x >= 3 # greater than or equal
Out[8]: array([False, False, True, True, True], dtype=bool)
In[9]: x != 3 # not equal
Out[9]: array([ True, True, False, True, True], dtype=bool)
In[10]: x == 3 # equal
Out[10]: array([False, False, True, False, False], dtype=bool)

Result :
Thus python programs for creating and accessing arrays have been executed.

25
DATA SCIENCE LAB MANUAL

PANDAS

26
DATA SCIENCE LAB MANUAL

EX.No:3
Working with Pandas data frames
Date:

Related CO & PoCO1 & CO2-PO1,2,3,4,5,9,11,12

Aim:

To write Python programs for using pandas data frames and accessing it.

Alogorithm:
1. Start the Program.
2. Import Numpy & Pandas Packages.
3. Create a Dataframe for the list of elements.
4. Load a Dataset from an external source into a pandas dataframe
5. Display the Output.
6. Stop the Program
Import Pandas
Once Pandas is installed, import it in your applications by adding the import keyword:
import pandas
SERIES
A Pandas Series is like a column in a table.It is a one-dimensional array holding data
of any type.
Example
Create a simple Pandas Series from a list:
import pandas as pd
a = [1, 7, 2]
myvar = pd.Series(a)
print(myvar)

Output:

27
DATA SCIENCE LAB MANUAL

DataFrame
A Pandas DataFrame is a 2 dimensional data structure, like a 2 dimensional array, or a
table with rows and columns.

Load Files Into a DataFrame


If your data sets are stored in a file, Pandas can load them into a DataFrame.

Example
Load a comma separated file (CSV file) into a DataFrame:

import pandas as pd

df = pd.read_csv('data.csv')

print(df)

output
Dura tion PulseMax pulse Calories
0 60 110 130 409.1
1 60 117 145 479.0
2 60 103 135 340.0
3 45 109 175 282.4
4 45 117 148 406.0
.. ... ... ... ...
164 60 105 140 290.8
165 60 110 145 300.4
166 60 115 145 310.2
167 75 120 150 320.4
168 75 125 150 330.4

[169 rows x 4 columns]


MISSING DATA

28
DATA SCIENCE LAB MANUAL

In DataFrame sometimes many datasets simply arrive with missing data, either because it
exists and was not collected or it never existed.

In Pandas missing data is represented by two value:


1. None: None is a Python singleton object that is often used for missing data in Python code.

2. NaN :NaN (an acronym for Not a Number), is a special floating-point value recognized by
all systems that use the standard IEEE floating-point representation
Loading Data
First of all, let’s import the libraries.

Missing data in the Pandas is represented by the value NaN (Not a Number).

You can use the isnull method to see missing data in the data.

The notnull method does the opposite of the isnull method. Let me show that.

29
DATA SCIENCE LAB MANUAL

Let’s assign None to a value in data.

If you want to remove the missing data, you can use the dropna method.

Hierarchical Indexes

Hierarchical indexing is a method of creating structured group relationships in the

dataset. Data frames can have hierarchical indexes. To show this, let me create a dataset.

30
DATA SCIENCE LAB MANUAL

Aggregation and Grouping

import pandas as pd
df = pd.DataFrame([[9, 4, 8, 9],
[8, 10, 7, 6],
[7, 6, 8, 5]],
columns=['Maths', 'English',
'Science', 'History'])
print(df)

OUTPUT

31
DATA SCIENCE LAB MANUAL

1. df.sum()

2. df.describe()

 We used agg() function to calculate the sum, min, and max of each column in our dataset.

Result :
Thus python programs for creating and accessing data frames using Pandas have been
executed and verified.

32
DATA SCIENCE LAB MANUAL

EX.No:4
Reading data from text files, Excel and the web and exploring
various commands for doing descriptive analytics on the Iris
Date:
data set
Related CO & PO CO3 & PO1,2,3,4,5,9,10,11,12

Aim:

To read the data from text files, Excel and the web and exploring various commands for
doing descriptive analytics on the Iris data set

Dataset Down Load link:

https://www.kaggle.com/datasets/arshid/iris-flower-dataset

Descriptive Analysis:

Descriptive analysis, also known as descriptive analytics or descriptive statistics, is the process
of using statistical techniques to describe or summarize a set of data.

Iris Data Set:

Iris Dataset is considered as the Hello World for data science. It contains five columns namely
– Petal Length, Petal Width, Sepal Length, Sepal Width, and Species Type. Iris is a flowering
plant, the researchers have measured various features of the different iris flowers and recorded
them digitally.

Reading data from text files

Step 1: Capture the file path

Step 2: Apply the Python code

Step 3: Run the Python code to import the Text file

Syntax:
data=pandas.read_csv(‘filename.txt’, sep=’ ‘, header=None, names=[“Column1”,
“Column2”])

33
DATA SCIENCE LAB MANUAL

Parameters:
 filename.txt: As the name suggests it is the name of the text file from which we want to
read data.
 sep: It is a separator field. In the text file, we use the space character(‘ ‘) as the separator.

 header: This is an optional field. By default, it will take the first line of the text file as a
header. If we use header=None then it will create the header.
 names: We can assign column names while importing the text file by using the names
argument.

Program:

# importing pandas
import pandas as pd
# read text file into pandas DataFrame
df = pd.read_csv("gfg.txt", sep=" ")
# display DataFrame
print(df)
Output:

Reading data from Web

Step 1: Capture the file path

Step 2: Apply the Python code

Step 3: Run the Python code to import the web file

import pandas as pd

url="https://raw.githubusercontent.com/cs109/2014_data/master/countries.csv"
c=pd.read_csv(url)

Reading data from Excel

34
DATA SCIENCE LAB MANUAL

Step 1: Capture the file path

Step 2: Apply the Python code

Step 3: Run the Python code to import the Excel file

Program
import pandas as pd
df = pd.read_excel('C:\Users\Ron\Desktop\products.xlsx')
print(df)

Output
product_name price
0 computer 700
1 tablet 250
2 printer 120
3 laptop 1200
4 Keyboard 100

Reading data from CSV File:

Step 1: Capture the file path

Step 2: Apply the Python code

Step 3: Run the Python code to import the CSV file

Iris data set

To load this CSV file, and we will convert it into the dataframe. read_csv() method is used to
read CSV files.import pandas as pd

35
DATA SCIENCE LAB MANUAL

# Reading the CSV file


df = pd.read_csv("Iris.csv")
# Printing top 5 rows
df.head()

OUTPUT

df.shape
Output:
(150, 6)

df.describe()(df.Mean(),df.std())

df.isnull().sum()

An Outlier is a data-item/object that deviates significantly from the rest of the (so-called
normal)objects.

# importing packages
importseaborn as sns
36
DATA SCIENCE LAB MANUAL

importmatplotlib.pyplot as plt

# Load the dataset


df = pd.read_csv('Iris.csv')

sns.boxplot(x='SepalWidthCm', data=df)

OUTPUT

# importing packages
importseaborn as sns
importmatplotlib.pyplot as plt
sns.countplot(x='Species', data=df, )
plt.show()
OUTPUT

# importing packages
importseaborn as sns
importmatplotlib.pyplot as plt
sns.scatterplot(x='SepalLengthCm', y='SepalWidthCm',hue='Species', data=df, )
37
DATA SCIENCE LAB MANUAL

# Placing Legend outside the Figure


plt.legend(bbox_to_anchor=(1, 1), loc=2)
plt.show()

OUTPUT

Result:
Thus the data from text files, Excel and the web and exploring various commands for doing
descriptive analytics on the Iris data set were successfully completed and executed.

38
DATA SCIENCE LAB MANUAL

EX.No:5
UCI and Pima Indians Diabetes data set
Date:

Related CO & PO CO2,CO3,CO4-PO1,2,3,4,5,7,9,10,11,12

Aim:
To Use the diabetes data set from UCI and Pima Indians Diabetes data set for
performing the following:
a. Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard
Deviation, Skewness and Kurtosis.
b. Bivariate analysis: Linear and logistic regression modeling
c. Multiple Regression analysis
d. Also compare the results of the above analysis for the two data sets.

Algorithm:

Use the diabetes data set from UCI and Pima Indians Diabetes data set for performing the
following:
a. Univariate analysis: Frequency, Mean, Median, Mode, Variance, Standard
Deviation, Skewness and Kurtosis.
b. Bivariate analysis: Linear and logistic regression modeling
c. Multiple Regression analysis
d. Also compare the results of the above analysis for the two data sets.

UCI and Pima Indians Diabetes data set


UCI ML Datasets
C:\Users\Admin\Desktop\New folder\FDS
>>> import pandas as pd
>>>df=pd.read_csv('C:\\Users\\Admin\\Downloads\\hou_all.csv')
>>>df
0.00632 18 2.31 0 0.538 6.575 65.2 4.09 1 296 15.3 396.9 4.98 24 1.1
0 0.02731 0.0 7.07 0 0.469 6.421 78.9 4.9671 2 242 17.8 396.90 9.14 21.6 1
1 0.02729 0.0 7.07 0 0.469 7.185 61.1 4.9671 2 242 17.8 392.83 4.03 34.7 1
2 0.03237 0.0 2.18 0 0.458 6.998 45.8 6.0622 3 222 18.7 394.63 2.94 33.4 1
3 0.06905 0.0 2.18 0 0.458 7.147 54.2 6.0622 3 222 18.7 396.90 5.33 36.2 1
4 0.02985 0.0 2.18 0 0.458 6.430 58.7 6.0622 3 222 18.7 394.12 5.21 28.7 1

.. ... ... ... .. ... ... ... ... .. ... ... ... ... ... ...
500 0.06263 0.0 11.93 0 0.573 6.593 69.1 2.4786 1 273 21.0 391.99 9.67 22.4 1
501 0.04527 0.0 11.93 0 0.573 6.120 76.7 2.2875 1 273 21.0 396.90 9.08 20.6 1

39
DATA SCIENCE LAB MANUAL

502 0.06076 0.0 11.93 0 0.573 6.976 91.0 2.1675 1 273 21.0 396.90 5.64 23.9 1

503 0.10959 0.0 11.93 0 0.573 6.794 89.3 2.3889 1 273 21.0 393.45 6.48 22.0 1
504 0.04741 0.0 11.93 0 0.573 6.030 80.8 2.5050 1 273 21.0 396.90 7.88 11.9 1
[505 rows x 15 columns]
>>>df.describe()
0.00632 18 2.31 0 ... 396.9 4.98 24 1.1

count 505.000000 505.000000 505.000000 505.000000 ... 505.000000 505.000000


505.000000 505.0
mean 3.620667 11.350495 11.154257 0.069307 ... 356.594376 12.668257
22.529901 1.0
std 8.608572 23.343704 6.855868 0.254227 ... 91.367787 7.139950 9.205991
0.0

min 0.009060 0.000000 0.460000 0.000000 ... 0.320000 1.730000 5.000000


1.0
25% 0.082210 0.000000 5.190000 0.000000 ... 375.330000 7.010000 17.000000
1.0
50% 0.259150 0.000000 9.690000 0.000000 ... 391.430000 11.380000 21.200000
1.0
75% 3.678220 12.500000 18.100000 0.000000 ... 396.210000 16.960000
25.000000 1.0
max 88.976200 100.000000 27.740000 1.000000 ... 396.900000 37.970000
50.000000 1.0
[8 rows x 15 columns]
>>>df.mean()
0.00632 3.620667
18 11.350495
2.31 11.154257
0 0.069307
0.538 0.554728
6.575 6.284059
65.2 68.581584
4.09 3.794459
1 9.566337
296 408.459406
40
DATA SCIENCE LAB MANUAL

15.3 18.461782
396.9 356.594376
4.98 12.668257
24 22.529901
1.1 1.000000

dtype: float64

>>>df.median()
0.00632 0.25915
18 0.00000
2.31 9.69000
0 0.00000
0.538 0.53800
6.575 6.20800
65.2 77.70000
4.09 3.19920
1 5.00000
296 330.00000
15.3 19.10000
396.9 391.43000
4.98 11.38000
24 21.20000
1.1 1.00000

dtype: float64

df.mode()
0.00632 18 2.31 0 0.538 6.575 65.2 4.09 1 296 15.3 396.9 4.98 24 1.1

0 0.01501 0.0 18.1 0.0 0.538 5.713 100.0 3.4952 24.0 666.0 20.2 396.9 6.36 50.0 1.0
1 14.33370NaNNaNNaNNaN 6.127 NaNNaNNaNNaNNaNNaN 7.79 NaNNaN

2 NaNNaNNaNNaNNaN 6.167 NaNNaNNaNNaNNaNNaN 8.05 NaNNaN


3 NaNNaNNaNNaNNaN 6.229 NaNNaNNaNNaNNaNNaN 14.10 NaNNaN
4 NaNNaNNaNNaNNaN 6.405 NaNNaNNaNNaNNaNNaN 18.13 NaNNaN

41
DATA SCIENCE LAB MANUAL

5 NaNNaNNaNNaNNaN 6.417 NaNNaNNaNNaNNaNNaNNaNNaNNaN

>>>df.std()
0.00632 8.608572
18 23.343704
2.31 6.855868
0 0.254227
0.538 0.115990
6.575 0.703195
65.2 28.176371
4.09 2.107757
1 8.707553
296 168.629992
15.3 2.162520
396.9 91.367787
4.98 7.139950
24 9.205991
1.1 0.000000

dtype: float64

>>df.var()
0.00632 74.107509
18 544.928497
2.31 47.002931
0 0.064631
0.538 0.013454
6.575 0.494483
65.2 793.907894
4.09 4.442640
1 75.821484
296 28436.074242
15.3 4.676493

42
DATA SCIENCE LAB MANUAL

396.9 8348.072540
4.98 50.978891
24 84.750275
1.1 0.000000

dtype: float64

>>df.value_counts()
0.00632 18 2.31 0 0.538 6.575 65.2 4.09 1 296 15.3 396.9 4.98 24 1.1
0.00906 90.0 2.97 0 0.400 7.088 20.8 7.3073 1 285 15.3 394.72 7.85 32.2 1 1
1.05393 0.0 8.14 0 0.538 5.935 29.3 4.4986 4 307 21.0 386.85 6.58 23.1 1 1
1.41385 0.0 19.58 1 0.871 6.129 96.0 1.7494 5 403 14.7 321.02 15.12 17.0 1 1
1.38799 0.0 8.14 0 0.538 5.950 82.0 3.9900 4 307 21.0 232.60 27.71 13.2 1 1
1.35472 0.0 8.14 0 0.538 6.072 100.0 4.1750 4 307 21.0 376.73 13.04 14.5 1 1
..
0.11069 0.0 13.89 1 0.550 5.951 93.8 2.8893 5 276 16.4 396.90 17.92 21.5 1 1
0.11027 25.0 5.13 0 0.453 6.456 67.8 7.2255 8 284 19.7 396.90 6.73 22.2 1 1
0.10959 0.0 11.93 0 0.573 6.794 89.3 2.3889 1 273 21.0 393.45 6.48 22.0 1 1
0.10793 0.0 8.56 0 0.520 6.195 54.4 2.7778 5 384 20.9 393.49 13.00 21.7 1 1
88.97620 0.0 18.10 0 0.671 6.968 91.9 1.4165 24 666 20.2 396.90 17.21 10.4 1 1
Length: 505, dtype: int64
LINEAR ANALYSIS
>>> import pandas as pd
>>>df=pd.read_csv('C:\\Users\\Admin\\Downloads\\diabetes.csv')
>>>df
Pregnancies GlucoseBloodPressureSkinThickness Insulin BMI DiabetesPedigreeFunction
Age Outcome
0 6 148 72 35 0 33.6 0.627 50 1
1 1 85 66 29 0 26.6 0.351 31 0
2 8 183 64 0 0 23.3 0.672 32 1
3 1 89 66 23 94 28.1 0.167 21 0
4 0 137 40 35 168 43.1 2.288 33 1
.. ... ... ... ... ... ... ... ... ...

763 10 101 76 48 180 32.9 0.171 63 0


764 2 122 70 27 0 36.8 0.340 27 0
765 5 121 72 23 112 26.2 0.245 30 0
766 1 126 60 0 0 30.1 0.349 47 1

43
DATA SCIENCE LAB MANUAL

767 1 93 70 31 0 30.4 0.315 23 0

[768 rows x 9 columns]

importstatsmodels.apiassm

#define response variable


y = df['Glucose']

#define explanatory variable


x = df[[' Age']]

#add constant to predictor variables


x = sm.add_constant(x)

#fit linear regression model


model = sm.OLS(y, x).fit()

#view model summary


print(model.summary())

OUTPUT
OLS Regression Results
===================================================================
===========
Dep. Variable: score R-squared: 0.794
Model: OLS Adj. R-squared: 0.783
Method: Least Squares F-statistic: 69.56
Date: Sat, 29 Oct 2022 Prob (F-statistic): 1.35e-07
Time: 08:36:32 Log-Likelihood: -55.886
No. Observations: 20 AIC: 115.8

44
DATA SCIENCE LAB MANUAL

Df Residuals: 18 BIC: 117.8


Df Model: 1
Covariance Type: nonrobust
===================================================================
===========
coefstd err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------
const 69.0734 1.965 35.149 0.000 64.945 73.202
hours 3.8471 0.461 8.340 0.000 2.878 4.816
===================================================================
===========
Omnibus: 0.171 Durbin-Watson: 1.404
Prob(Omnibus): 0.918 Jarque-Bera (JB): 0.177
Skew: 0.165 Prob(JB): 0.915
Kurtosis: 2.679 Cond. No. 9.37
===================================================================
===========

Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.

Univariate analysis
>>>df.mean()
Pregnancies 3.845052
Glucose 120.894531
BloodPressure 69.105469
SkinThickness 20.536458
Insulin 79.799479
BMI 31.992578
DiabetesPedigreeFunction 0.471876
Age 33.240885
Outcome 0.348958
dtype: float64

>>>df.median()
Pregnancies 3.0000
Glucose 117.0000
BloodPressure 72.0000
SkinThickness 23.0000
45
DATA SCIENCE LAB MANUAL

Insulin 30.5000
BMI 32.0000
DiabetesPedigreeFunction 0.3725
Age 29.0000
Outcome 0.0000
dtype: float64

>>>>>>df['Age'].mean()
33.240885416666664
>>>df.mode()
Pregnancies GlucoseBloodPressureSkinThickness Insulin BMI DiabetesPedigreeFunction
Age Outcome
0 1.0 99 70.0 0.0 0.0 32.0 0.254 22.0 0.0
1 NaN 100 NaNNaNNaNNaN 0.258 NaNNaN

>>>df.var()
Pregnancies 11.354056
Glucose 1022.248314
BloodPressure 374.647271
SkinThickness 254.473245
Insulin 13281.180078
BMI 62.159984
DiabetesPedigreeFunction 0.109779
Age 138.303046
Outcome 0.227483
dtype: float64

>>>df.std()
Pregnancies 3.369578
Glucose 31.972618
BloodPressure 19.355807
SkinThickness 15.952218
Insulin 115.244002
BMI 7.884160
DiabetesPedigreeFunction 0.331329
Age 11.760232
Outcome 0.476951
dtype: float64

frequency
>>>df['Age'].value_counts()
22 72
21 63
25 48

46
DATA SCIENCE LAB MANUAL

24 46
23 38
28 35
26 33
27 32
29 29
31 24
41 22
30 21
37 19
42 18
33 17
38 16
36 16
32 16
45 15
34 14
46 13
43 13
40 13
39 12
35 10
50 8
51 8
52 8
44 8
58 7
47 6
54 6
49 5
48 5
57 5
53 5
60 5
66 4
63 4
62 4
55 4
67 3
56 3
59 3
65 3
69 2
61 2
72 1
81 1
64 1
70 1
68 1
Name: Age, dtype: int64
>>>df.skew()

47
DATA SCIENCE LAB MANUAL

Pregnancies 0.901674
Glucose 0.173754
BloodPressure -1.843608
SkinThickness 0.109372
Insulin 2.272251
BMI -0.428982
DiabetesPedigreeFunction 1.919911
Age 1.129597
Outcome 0.635017
dtype: float64
>>>df.kurt()
Pregnancies 0.159220
Glucose 0.640780
BloodPressure 5.180157
SkinThickness -0.520072
Insulin 7.214260
BMI 3.290443
DiabetesPedigreeFunction 5.594954
Age 0.643159
Outcome -1.600930
dtype: float64
Logistic
importstatsmodels.formula.apiassmf

#fit logistic regression model


model = smf.logit('Glucose + Age', data=df).fit()

#view model summary


print(model.summary())

Output
OLS Regression Results
===================================================================
===========
Dep. Variable: score R-squared: 0.794
Model: OLS Adj. R-squared: 0.783
Method: Least Squares F-statistic: 69.56
Date: Sat, 29 Oct 2022 Prob (F-statistic): 1.35e-
07 Time: 09:17:14 Log-Likelihood: -55.886
No. Observations: 20 AIC: 115.8
Df Residuals: 18 BIC: 117.8
Df Model: 1
Covariance Type: nonrobust
===================================================================
===========
coefstd err t P>|t| [0.025 0.975]

const 69.0734 1.965 35.149 0.000 64.945 73.202


hours 3.8471 0.461 8.340 0.000 2.878 4.816
48
DATA SCIENCE LAB MANUAL

===================================================================
===========
Omnibus: 0.171 Durbin-Watson: 1.404
Prob(Omnibus): 0.918 Jarque-Bera (JB): 0.177
Skew: 0.165 Prob(JB): 0.915
Kurtosis: 2.679 Cond. No. 9.37

===================================================================
===========
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
MULTIPLE REGRESSION

import pandas as pd

fromsklearn import linear_model

importstatsmodels.api as sm

df=pd.read_csv('C:\\Users\\Admin\\Downloads\\diabetes.csv')

x = df[[ ‘Glucose’, ‘Age']]


y = df[‘Income’]

# withsklearn
regr = linear_model.LinearRegression()
regr.fit(x, y)
print('Intercept: \n', regr.intercept_)
print('Coefficients: \n', regr.coef_)
# withstatsmodels
x = sm.add_constant(x) # adding a constant

model = sm.OLS(y, x).fit()


predictions = model.predict(x)

print_model = model.summary()
print(print_model)

OLS Regression Results


===================================================================
===========
Dep. Variable: index_price R-squared: 0.898
Model: OLS Adj. R-squared: 0.888

49
DATA SCIENCE LAB MANUAL

Method: Least Squares F-statistic: 92.07


Date: Sat, 29 Oct 2022 Prob (F-statistic): 4.04e-11
Time: 09:56:42 Log-Likelihood: -134.61
No. Observations: 24 AIC: 275.2
Df Residuals: 21 BIC: 278.8
Df Model: 2
Covariance Type: nonrobust
===================================================================
==================
coefstd err t P>|t| [0.025 0.975]

const 1798.4040 899.248 2.000 0.059 -71.685 3668.493


interest_rate 345.5401 111.367 3.103 0.005 113.940 577.140
unemployment_rate -250.1466 117.950 -2.121 0.046 -495.437 -4.856
===================================================================
===========
Omnibus: 2.691 Durbin-Watson: 0.530
Prob(Omnibus): 0.260 Jarque-Bera (JB): 1.551
Skew: -0.612 Prob(JB): 0.461
Kurtosis: 3.226 Cond. No. 394.
===================================================================
===========
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
>>>

Result :
Thus Python programs are written for performing univariate,bivariate and multiple
linear regression analysis and executed successfully.

50
DATA SCIENCE LAB MANUAL

EX.No:6
Apply and explore various plotting functions on UCI data sets.
Date:

Related CO & PO CO4,CO5 & PO1,2,3,4,5,9,10,11,12

Aim:
To Apply and explore various plotting functions on UCI data sets for performing the
following

a. Normal curves
b. Density and contour plots
c. Correlation and scatter plots d. Histograms e. Three dimensional plotting

Normal Curve :

The normal distribution is a continuous probability distribution that is symmetrical on both


sides of the mean, so the right side of the center is a mirror image of the left side.The normal
distribution is often called the bell curve because the graph of its probability density looks like
a bell. It is also known as called Gaussian distribution, after the German mathematician Carl
Gauss who first described it.

Program:
import pandas as pd
>>>df=pd.read_csv('C:\\Users\\Admin\\Downloads\\hou_all.csv')
>>>df

Output:

51
DATA SCIENCE LAB MANUAL

Program:

importmatplotlib.pyplot as plt
plt.plot(df.set)
[<matplotlib.lines.Line2D object at 0x000001AB280313F0>]
>>>plt.show()
Output:

Scatter Plot:

A scatter plot is also called a scatter chart, scattergram, or scatter plot, XY graph. The scatter
diagram graphs numerical data pairs, with one variable on each axis, show their relationship.
Now the question comes for everyone:

Program:

>>plt.scatter(df.set,df.value)
<matplotlib.collections.PathCollection object at 0x000001AB27E9ED70>
>>>plt.show()
Output:

52
DATA SCIENCE LAB MANUAL

Histograms :

A histogram is a graphical representation that organizes a group of data points into user-
specified ranges and an approximate representation of the distribution of numerical data.

Syntax: hist(v,main,xlab,xlim,ylim,breaks,col,border)

Program:
plt.hist(df)
(array([[504., 2., 0., 0., 0., 0., 0., 0., 0., 0.],
[474., 32., 0., 0., 0., 0., 0., 0., 0., 0.],
[506., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[506., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[506., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[506., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[226., 280., 0., 0., 0., 0., 0., 0., 0., 0.],
[506., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[506., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 0., 0., 17., 123., 130., 71., 28., 0., 0., 137.],
[506., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[ 24., 12., 4., 9., 43., 414., 0., 0., 0., 0.],
[506., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[506., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
[506., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]), array([ 0. , 71.1, 142.2,
213.3, 284.4, 355.5, 426.6, 497.7, 568.8,
639.9, 711. ]), <a list of 15 BarContainer objects>)
>>>plt.show()
Output:

Program:

df.corr()

53
DATA SCIENCE LAB MANUAL

Output:

Contour Plot :

A contour plot is a graphical technique for representing a 3-dimensional surface by plotting


constant z slices, called contours, on a 2-dimensional format. That is, given a value for z, lines
are drawn for connecting the (x,y) coordinates where that z value occurs.

Program:

>>plt.contour(df)
<matplotlib.contour.QuadContourSet object at 0x000001AB27F0E740>
>>>plt.show()

Output:

Density Plot :

54
DATA SCIENCE LAB MANUAL

A density plot is a representation of the distribution of a numeric variable that uses a


kernel density estimate to show the probability density function of the variable. In Python
Language we use the density() function which helps to compute kernel density estimates. And
further with its return value, is used to build the final density plot.

Syntax: density(x)

Three Dimensional Plot :

The most basic three-dimensional plot is a line or collection of scatter plot created from sets
of (x, y, z) triples.

Program:

>>from mpl_toolkits.mplot3d import Axes3D


>>>fig = plt.figure()
>>>ax = fig.add_subplot(111, projection='3d')
>>>plt.show()

Output:

Result :

Thus python programs for exploring various plots using matplotlib were executed
successfully.

55
DATA SCIENCE LAB MANUAL

EX.No:7
Visualizing Geographic Data with Basemap
Date:
C

Related CO & PO CO5 & PO1,2,3,4,5,9,10,11,12

Aim:

To Visualize Geographic Data with Basemap

Algorithm:

1. Start the Program.


2. Import Basemap Package.
3. Perform Visualize Function of Geographic data.
4. Display the Output.
5. Stop the Program

Base Map

The common type of visualization in data science is that of geographic data. Matplotlib's main
tool for this type of visualization is the Basemap toolkit, which is one of several Matplotlib
toolkits which lives under the mpl_toolkits namespace.

Basemap is a matplotlib extension used to visualize and create geographical maps in python.

Installing Base map :

pip Install basemap

Importing Basemap and matplotlib libraries:

from mpl_toolkits.basemap import Basemap


import matplotlib.pyplot as plt

Functions used in Basemap:

Basemap() - To create a base map class.

Physical boundaries and bodies of water

56
DATA SCIENCE LAB MANUAL

 drawcoastlines(): Draw continental coast lines


 drawlsmask(): Draw a mask between the land and sea, for use with projecting
images on one or the other
 drawmapboundary(): Draw the map boundary, including the fill color for oceans.
 drawrivers(): Draw rivers on the map
 fillcontinents(): Fill the continents with a given color; optionally fill lakes with
another color

·Political boundaries

 ·drawcountries(): Draw country boundaries


 drawstates(): Draw US state boundaries
 drawcounties(): Draw US county boundaries

·Map features

 ·drawgreatcircle(): Draw a great circle between two points


 drawparallels(): Draw lines of constant latitude
 drawmeridians(): Draw lines of constant longitude
 drawmapscale(): Draw a linear scale on the map

·Whole-globe images

 ·bluemarble(): Project NASA's blue marble image onto the map


 shadedrelief(): Project a shaded relief image onto the map

Program:

from mpl_toolkits.basemap import Basemap

import matplotlib.pyplot as plt

fig = plt.figure(figsize = (12,12))

m = Basemap()

m.drawcoastlines()

plt.title("Coastlines", fontsize=20)

plt.show()

57
DATA SCIENCE LAB MANUAL

Output:

College Coordinates

fig = plt.figure(figsize=(8, 8))


m = Basemap(projection='lcc', resolution=None,
width=8E6, height=8E6,
lat_0=9.2343, lon_0=78.7836)
m.etopo(scale=0.5, alpha=0.5)

# Map (long, lat) to (x, y) for


plotting x, y = m(78.7836,
9.2343)
plt.plot(x, y, 'ok', markersize=5)
plt.text(x, y, ' Mohamed Sathak Engineering College', fontsize=12);

Result:
Thus Base map toolkit was used to visualize the geographic data.
58
DATA SCIENCE LAB MANUAL

Experiments beyond the Syllabus

59
DATA SCIENCE LAB MANUAL

EX.No:8
Predict Income with Census Data
Date:

Related CO & PO CO6 &PO1,2,3,4,5,6,7,8,9,10,11,12

Aim
To develop Census Income Project Using Python.

Problem Statement:
The prediction task is to determine whether a person makes over $50K a year or less then.

Program

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline
import seaborn as sns
from plotly.offline import iplot
import plotly as py
py.offline.init_notebook_mode(connected=True)from sklearn.preprocessing import LabelEncoder
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.ensemble import AdaBoostClassifier
from sklearn.svm import SVC
from sklearn.ensemble import GradientBoostingClassifier
from xgboost import XGBClassifier
from sklearn.metrics import accuracy_score,confusion_matrix,classification_reportimport warnings
warnings.filterwarnings('ignore')

Importing dataset from GitHub.


df_census = pd.read_csv("https://raw.githubusercontent.com/dsrscientist/dataset1/master/census_income.csv")

Exploratory Data Analysis (EDA).


df_census.head()

60
DATA SCIENCE LAB MANUAL

df_census.info()<class 'pandas.core.frame.DataFrame'>
RangeIndex: 32560 entrie s, 0 to 32559
Data columns (total 15 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 Age 32560 non-null int64
1 Workclass 32560 non-null object
2 Fnlwgt 32560 non-null int64
3 Education 32560 non-null object
4 Education_num 32560 non-null int64
5 Marital_status 32560 non-null object
6 Occupation 32560 non-null object
7 Relationship 32560 non-null object
8 Race 32560 non-null object
9 Sex 32560 non-null object
10 Capital_gain 32560 non-null int64
11 Capital_loss 32560 non-null int64
12 Hours_per_week 32560 non-null int64
13 Native_country 32560 non-null object
14 Income 32560 non-null object
dtypes: int64(6), object(9)
memory usage: 3.7+ MB

61
DATA SCIENCE LAB MANUAL

The data-set contains 32560 rows and 14 features + the target variable (Income). 6 are integers and 9

are objects. Below I have listed the features with a short description:
0 Age : The age of people
1 Workclass : Type of job
2 Fnlwgt : Final weight
3 Education : Education status
4 Education_num : Number of years of education in total
5 Marital_status : Marital_status
6 Occupation : Occupation
7 Relationship : Relationship
8 Race : residential segregation
9 Sex : Gender
10 Capital_gain : Capital gain is the profit one earns
11 Capital_loss : Capital gain is the profit one loose
12 Hours_per_week : Earning rate as per hrs
13 Native_country : Country
14 Income : Income (Target variable)df_census.describe()

df_census['Income'].value_counts().plot(kind='bar')

Result

Thus the output was successfully executed.

62
DATA SCIENCE LAB MANUAL

SAMPLE EXERCISE

63
DATA SCIENCE LAB MANUAL

Write a NumPy program to create a null vector of size 10 and update sixth
value to 11
Python Code :

import numpy as np

x = np.zeros(10)

print(x)

print("Update sixth value to 11")

x[6] = 11

print(x)

Sample Output:

[ 0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]

Update sixth value to 11

[ 0. 0. 0. 0. 0. 0. 11. 0. 0. 0.]

Write a NumPy program to convert an array to a float type


Python Code :

import numpy as np

import numpy as np

a = [1, 2, 3, 4]

print("Original array")

print(a)

x = np.asfarray(a)

print("Array converted to a float type:")

print(x)

Sample Output:

Original array

[1, 2, 3, 4]

64
DATA SCIENCE LAB MANUAL

Array converted to a float type:

[ 1. 2. 3. 4.]

Write a NumPy program to create a 3x3 matrix with values ranging from 2 to 10
Python Code :

import numpy as np

x = np.arange(2, 11).reshape(3,3)

print(x)

Sample Output:

[[ 2 3 4]

[ 5 6 7]

[ 8 9 10]]

Write a NumPy program to convert a list of numeric value into a one-dimensional


NumPy array
Python Code :

import numpy as np

l = [12.23, 13.32, 100, 36.32]

print("Original List:",l)

a = np.array(l)

print("One-dimensional NumPy array: ",a)

Sample Output:

Original List: [12.23, 13.32, 100, 36.32]

One-dimensional NumPy array: [ 12.23 13.32 100. 36.32]

Write a NumPy program to convert an array to a float type

Python Code :

import numpy as np

import numpy as np

65
DATA SCIENCE LAB MANUAL

a = [1, 2, 3, 4]

print("Original array")

print(a)

x = np.asfarray(a)

print("Array converted to a float type:")

print(x)

Sample Output

Original array
[1, 2, 3, 4]
Array converted to a float type:
[ 1. 2. 3. 4.]

Write a NumPy program to create an empty and a full array


Python Code :

import numpy as np

# Create an empty array

x = np.empty((3,4))

print(x)

# Create a full array

y = np.full((3,3),6)

print(y)

Sample Output:

[[ 6.93643969e-310 8.76783124e-317 6.93643881e-310 6.79038654e-31


3]
[ 2.22809558e-312 2.14321575e-312 2.35541533e-312 2.42092166e-32
2]
[ 7.46824097e-317 9.08479214e-317 2.46151512e-312 2.41907520e-31
2]]
[[6 6 6]
[6 6 6]
[6 6 6]]

Write a NumPy program to convert a list and tuple into arrays


Python Code :

66
DATA SCIENCE LAB MANUAL

import numpy as np

my_list = [1, 2, 3, 4, 5, 6, 7, 8]

print("List to array: ")

print(np.asarray(my_list))

my_tuple = ([8, 4, 6], [1, 2, 3])

print("Tuple to array: ")

print(np.asarray(my_tuple))

Sample Output:

List to array:
[1 2 3 4 5 6 7 8]
Tuple to array:
[[8 4 6]
[1 2 3]]

Write a NumPy program to find the real and imaginary parts of an array of complex numbers
Python Code :

import numpy as np

x = np.sqrt ([1+0j])

y = np.sqrt ([0+1j])

print("Original array:x ",x)

print("Original array:y ",y)

print("Real part of the array:")

print(x.real)

print(y.real)

print("Imaginary part of the array:")

print(x.imag)

print(y.imag)

Sample Output

Original array:x [ 1.+0.j]


Original array:y [ 0.70710678+0.70710678j]
Real part of the array:
[ 1.]

67
DATA SCIENCE LAB MANUAL

[ 0.70710678]
Imaginary part of the array:
[ 0.]
[ 0.70710678]
Write a Pandas program to get the powers of an array values element-wise.
Note: First array elements raised to powers from second array

Sample data: {'X':[78,85,96,80,86], 'Y':[84,94,89,83,86],'Z':


[86,97,96,72,83]}

Expected Output:
XYZ
0 78 84 86
1 85 94 97
2 96 89 96
3 80 83 72
4 86 86 83
Python Code :

import pandas as pd

df = pd.DataFrame({'X':[78,85,96,80,86], 'Y':[84,94,89,83,86],'Z':[86,97,96,72,83]});

print(df)

Sample Output:

X Y Z

0 78 8 86
4
1 85 9 97
4
2 96 8 96
9
3 80 8 72
3
4 86 86 83
Write a Pandas program to select the specified columns and rows from a given
data frame.Sample Python dictionary data and list labels:

Select 'name' and 'score' columns in rows 1, 3, 5, 6 from the following data
frame. exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily',
'Michael', 'Matthew','Laura', 'Kevin', 'Jonas'],'score': [12.5, 9, 16.5, np.nan, 9,
20, 14.5, np.nan, 8, 19],
68
DATA SCIENCE LAB MANUAL

'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],


'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}

69
DATA SCIENCE LAB MANUAL

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

Expected Output:
Select specific columns and rows:
score
quali
fyb
9.0
no
d NaN nof
20.0 yes
g 14.5 yes

Python Code :

import pandas as pd

import numpy as np

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew',


'Laura', 'Kevin', 'Jonas'],

'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],

'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],

'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(exam_data , index =labels)

print("Select specific columns and rows:")

print(df.iloc[[1, 3, 5, 6], [1, 3]])


Sample Output:

Select specific columns and rows:


score qualify
b 9.0 no
d NaN no
f 20.0 yes
g 14.5 yes

Write a Pandas program to count the number of rows and columns of a


DataFrame.Sample Python dictionary data and list labels:

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James',


'Emily', 'Michael', 'Matthew','Laura', 'Kevin', 'Jonas'],
'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],
70
DATA SCIENCE LAB MANUAL

'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],


'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}
labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']
Expected Output:
Number of
Rows: 10
Number of
Columns: 4
Python Code :

import pandas as pd

import numpy as np

exam_data = {'name': ['Anastasia', 'Dima', 'Katherine', 'James', 'Emily', 'Michael', 'Matthew',


'Laura', 'Kevin', 'Jonas'],

'score': [12.5, 9, 16.5, np.nan, 9, 20, 14.5, np.nan, 8, 19],

'attempts': [1, 3, 2, 3, 2, 3, 1, 1, 2, 1],

'qualify': ['yes', 'no', 'yes', 'no', 'no', 'yes', 'yes', 'no', 'no', 'yes']}

labels = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h', 'i', 'j']

df = pd.DataFrame(exam_data , index=labels)

total_rows=len(df.axes[0])

total_cols=len(df.axes[1])

print("Number of Rows: "+str(total_rows))

print("Number of Columns: "+str(total_cols))

Sample Output:

Number of Rows: 10

Number of Columns: 4

Reading data from text files, Excel and the web and exploring various
commands for doingdescriptive analytics on the Iris data set

Use the diabetes data set from Pima Indians Diabetes data set for performing
thefollowing:

Apply Univariate analysis:

71
DATA SCIENCE LAB MANUAL

 Frequency
 Mean,
 Median,
 Mode,
 Variance
 Standard Deviation
 Skewness and Kurtosis

Use the diabetes data set from Pima Indians Diabetes data set for
performing thefollowing:
Apply Bivariate analysis:

 Linear and logistic regression modeling


 #create linear regression object
 reg =linear_model.Linearregression()
 #train the model using the training sets
 reg.fit(X_train, Y_train)
 Y_predict =reg.predict(X_test)
 #regression coeffiecients
 print(‘coefficients:’,reg.coef_)
 #variance score: 1 means perfect prediction
 print(‘variance score : {}’.format(reg.score(X_test, Y_test)))
 #checking predictions accuracy by r2 scores (value lies between 0 to 1)
 from sklearn.metrics import r2_score
 print(‘r^2:’r2_score(Y_test, Y_predict))
 #calculating root mean square error
 from sklearn.metrics import mean_squared_error
 mse=mean_squared_error(Y_test, Y_predict)
 rmse=np.sqrt(mse)
 print(‘RMSE:’,rmse)

Use the diabetes data set from Pima Indians Diabetes data set for performing
thefollowing:

Apply Bivariate analysis:

 Multiple Regression analysis


 MULTIPLE REGRESSION : DIABETES DATASET(PIMA)
 import matplotlib.py as plt
 import numpy as np
 from sklearn import datasets, linear_model,metrics
 import pandas as pd
 import numpy as np
 df=pd.ref.read_csv(‘pima_diabetes.csv’)
 data=df[[‘age’,’glucose’,’BMI’,’bloodpressure’,pregnancies’]]
 target=df[[‘outcome’]]

72
DATA SCIENCE LAB MANUAL

 print(data)
 print(target)
 #defining feature matrix(x) and response vector(y)
 X=data
 Y=target
 #splitting X and Y into triaining and testing sets
 from sklearn.model_selection import train_test_split
 X_train, X_test, Y_train =train_test_split(X,Y, test_size=0.3,
 random_state=101)

Apply and explore various plotting functions on UCI data set for performing the
following:

a) Normal values
b) Density and contour plots
c) Three-dimensional plotting

Apply and explore various plotting functions on UCI data set for performing the
following:

a) Correlation and scatter plots


b) Histograms
c) Three-dimensional plotting

Apply and explore various plotting functions on Pima Indians Diabetes data set for
performing the following:

a) Normal values
b) Density and contour plots
c) Three-dimensional plotting

Apply and explore various plotting functions on Pima Indians Diabetes data
set forperforming the following:

a) Correlation and scatter plots


b) Histograms
c) Three-dimensional plotting
Write a Pandas program to count number of columns of a DataFrame.

Sample Output:
Original
73
DATA SCIENCE LAB MANUAL

DataFrame
col1 col2
col3
0147
1258
2 3 6 12
3491
4 7 5 11
Number of columns:
3
Python Code :
import pandas as pd

d = {'col1': [1, 2, 3, 4, 7], 'col2': [4, 5, 6, 9, 5], 'col3': [7, 8, 12, 1, 11]}

df = pd.DataFrame(data=d)

print("Original DataFrame")

print(df)

print("\nNumber of columns:")

print(len(df.columns))

Write a Pandas program to group by the first column and get second
column as lists inrows

Sample data:
Original
DataFrame
col1 col2
0 C1 1
1 C1 2
2 C2 3
3 C2 3
4 C2 4
5 C3 6
6 C2 5
Group on the col1:
col1
C1 [1, 2]
C2 [3, 3, 4, 5]
C3 [6]
Name: col2, dtype: object

74
DATA SCIENCE LAB MANUAL

Python Code :

import pandas as pd

df = pd.DataFrame( {'col1':['C1','C1','C2','C2','C2','C3','C2'], 'col2':[1,2,3,3,4,6,5]})

print("Original DataFrame")

print(df)

df = df.groupby('col1')['col2'].apply(list)

print("\nGroup on the col1:")

print(df)

Sample Output:

Original DataFrame

col1 col2

0 C1 1

1 C1 2

2 C2 3

3 C2 3

4 C2 4

5 C3 6

6 C2 5

Group on the col1:


col1
C1 [1, 2]
C2 [3, 3, 4, 5]
C3 [6]
Name: col2, dtype: object

Write a Pandas program to check whether a given column is present in a


DataFrame ornot.

Sample data:
Original
DataFrame
col1 col2
col3
75
DATA SCIENCE LAB MANUAL

0147
1258
2 3 6 12
3491
4 7 5 11
Col4 is not
present in
DataFrame.Col1
is present in
DataFrame.

Python Code :

import pandas as pd

d = {'col1': [1, 2, 3, 4, 7], 'col2': [4, 5, 6, 9, 5], 'col3': [7, 8, 12, 1, 11]}

df = pd.DataFrame(data=d)

print("Original DataFrame")

print(df)

if 'col4' in df.columns:

print("Col4 is present in DataFrame.")

else:

print("Col4 is not present in DataFrame.")

if 'col1' in df.columns:

print("Col1 is present in DataFrame.")

else:

print("Col1 is not present in DataFrame.")

Sample Output:

Original DataFrame
col1 col2 col3
0 1 4 7
1 2 5 8
2 3 6 12
3 4 9 1
4 7 5 11
Col4 is not present in DataFrame.
Col1 is present in DataFrame.

76
DATA SCIENCE LAB MANUAL

Create two arrays of six elements. Write a NumPy program to count


the number of instances of a value occurring in one array on the
condition of another array.

Sample Output:

Original arrays:
[ 10 -10 10 -10 -10 10]
[0.85 0.45 0.9 0.8 0.12 0.6 ]
Number of instances of a value occurring in one array on the
condition of another array:3

Python Code :

import numpy as np

x = np.array([10,-10,10,-10,-10,10])

y = np.array([.85,.45,.9,.8,.12,.6])

print("Original arrays:")

print(x)

print(y)

result = np.sum((x == 10) & (y > .5))

print("\nNumber of instances of a value occurring in one array on the condition of another


array:")

print(result)

Sample Output:

Original arrays:

[ 10 -10 10 -10 -10 10]

[0.85 0.45 0.9 0.8 0.12 0.6 ]

Number of instances of a value occurring in one array on the condition of another array:

77
DATA SCIENCE LAB MANUAL

Create a 2-dimensional array of size 2 x 3, composed of 4-byte integer


elements. Write a NumPy program to find the number of occurrences of a
sequence in the said array.
Sample Output:

Original NumPy array:


[[1 2 3]
[2 1 2]]
Type: <class
'numpy.ndarr
ay'>
Sequence:
2,3
Number of occurrences of the said sequence: 2

Python Code:

import numpy as np

np_array = np.array([[1, 2, 3], [2, 1, 2]], np.int32)

print("Original Numpy array:")

print(np_array)

print("Type: ",type(np_array))

print("Sequence: 1,2",)

result = repr(np_array).count("1, 2")

print("Number of occurrences of the said sequence:",result)

Sample Output:

Original Numpy array:

[[1 2 3]

[2 1 2]]

Type: <class 'numpy.ndarray'>

Sequence: 1,2

Number of occurrences of the said sequence: 2

78
DATA SCIENCE LAB MANUAL

Write a NumPy program to merge three given NumPy arrays of same shape

Python Code :

import numpy as np

arr1 = np.random.random(size=(25, 25, 1))

arr2 = np.random.random(size=(25, 25, 1))

arr3 = np.random.random(size=(25, 25, 1))

print("Original arrays:")

print(arr1)

print(arr2)

print(arr3)

result = np.concatenate((arr1, arr2, arr3), axis=-1) print("\

nAfter concatenate:")

print(result)
Sample Output:

Original arrays:
[[[2.42789481e-01]
[4.92252795e-01]
[9.33448807e-01]
[7.25450297e-01]
[9.74093474e-02]
[5.68505405e-01]
[9.65681560e-01]
[7.94931731e-01]
[7.52893987e-01]
[5.43942380e-01]
[9.38096939e-01]
[8.38066653e-01]
[7.83185689e-01]
[4.22962615e-02]
[2.96843761e-01]
[9.50102088e-01]
[6.36912393e-01]
[3.75066692e-03]
[6.03600756e-01]
[4.22466907e-01]
[3.23442622e-01]
[7.23251484e-02]
[1.49598420e-01]
[5.45714254e-01]

79
DATA SCIENCE LAB MANUAL

[9.59122727e-01]]
.........

After concatenate:
[[[0.24278948 0.41363799 0.00761597]
[0.4922528 0.8033311 0.73833312]
[0.93344881 0.55224706 0.72935665]
...
[0.14959842 0.26294052 0.63326384]
[0.54571425 0.82177763 0.7713901 ]
[0.95912273 0.39791879 0.7461949 ]]

[[0.58204652 0.28280547 0.32875312]


[0.05928564 0.63875077 0.85676908]
[0.02107624 0.37230817 0.32580032]
...

[0.81238835 0.16140993 0.79200889]


[0.01420085 0.77062809 0.56002668]
[0.34129681 0.60455665 0.91487593]]

[[0.61701713 0.53000888 0.91511595]


[0.59898594 0.74249711 0.26521013]
[0.17960885 0.93751794 0.65149374]
...
[0.57535478 0.35775559 0.37634059]
[0.85069491 0.80337115 0.21084745]
[0.5365623 0.512

Write a NumPy program to combine last element with first element of two
given ndarraywith different shapes.

Sample Output:
Original arrays:
['PHP', 'JS', 'C++']
['Pyt
hon',
'C#',
'Num
Py']
After
Com
binin
g:
['PHP' 'JS' 'C++Python' 'C#' 'NumPy']
Python Code :

import numpy as np

80
array1 = ['PHP','JS','C++']

array2 = ['Python','C#', 'NumPy']

print("Original arrays:")

print(array1)

print(array2)

result = np.r_[array1[:-1], [array1[-1]+array2[0]], array2[1:]] print("\

nAfter Combining:")

print(result)
Sample Output:

Original arrays:
['PHP', 'JS', 'C++']
['Python', 'C#', 'NumPy']

After Combining:
['PHP' 'JS' 'C++Python' 'C#' 'NumPy']

You might also like