
EX - No: 1 Date: Download, Install and Explore the Features of NumPy, SciPy, Jupyter, Statsmodels and Pandas Packages

The document provides a comprehensive guide on installing Anaconda, creating environments, and running Jupyter Notebook, along with practical examples of using NumPy and Pandas for data manipulation and analysis. It covers creating and manipulating arrays with NumPy, as well as creating and managing DataFrames with Pandas, including reading data from files and performing descriptive analytics. Additionally, it discusses data visualization techniques using libraries like Seaborn and Matplotlib, specifically in the context of datasets related to diabetes and the Iris dataset.

Uploaded by rramaniravi18
Copyright © All Rights Reserved

EX. No: 1
Date:
DOWNLOAD, INSTALL AND EXPLORE THE FEATURES OF NUMPY, SCIPY, JUPYTER, STATSMODELS AND PANDAS PACKAGES

How to Install Anaconda & Run Jupyter Notebook

Instructions to install Anaconda and run Jupyter Notebook:

• Download & Install Anaconda Distribution
• Create Anaconda Environment
• Install and Run Jupyter Notebook

Download & Install Anaconda Distribution

Follow the step-by-step instructions below to install the Anaconda distribution.

Download Anaconda Distribution

Go to https://anaconda.com/ and select Anaconda Individual Edition to download the latest version of Anaconda. On Windows, this saves the .exe installer to the Downloads folder.
Install Anaconda

Double-click the .exe file to start the Anaconda installation, then follow the installer screens to complete the installation.
This finishes the installation of the Anaconda distribution. Next, let's see how to create an environment and install Jupyter Notebook.
Create Anaconda Environment from Navigator
A conda environment is a directory that contains a specific collection of conda packages that you have installed. For example, you may have one environment with NumPy 1.7 and its dependencies, and another environment with NumPy 1.6 for legacy testing (see https://conda.io/docs/using/envs.html).
Open Anaconda Navigator

Open Anaconda Navigator from the Windows Start menu, or search for it.

Anaconda Navigator is a graphical application for managing Anaconda packages, environments, and other settings.

Create an Environment to Run Jupyter Notebook

Creating an environment before you proceed is optional but recommended. It keeps the package installs for each of your projects completely separate. If you already have an environment, you can use it instead.
Select the + Create icon at the bottom of the screen to create an Anaconda environment.
Install and Run Jupyter Notebook

Once you have created the Anaconda environment, go back to the Home page in Anaconda Navigator. Install Jupyter Notebook from the applications panel on the right.

Installing Jupyter into your environment takes a few seconds. Once the install completes, you can open Jupyter from the same screen. Alternatively, go to Anaconda Navigator -> Environments -> your environment (mine is pandas-tutorial) -> Open With Jupyter Notebook.

This opens up Jupyter Notebook in the default browser.
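Once Jupyter is open, you can check which environment the notebook kernel is running in. The cell below is a minimal sketch and is not part of the original steps. It relies on `sys.prefix` and on the `CONDA_DEFAULT_ENV` variable that `conda activate` sets. That variable may be absent if the kernel was started another way. The helper name `current_environment` is hypothetical.

```python
import os
import sys


def current_environment():
    """Describe the Python interpreter behind the current notebook kernel."""
    return {
        "python": sys.executable,  # path of the running interpreter
        "prefix": sys.prefix,      # root directory of the active environment
        # Set by `conda activate`; None if the variable is not present
        "conda_env": os.environ.get("CONDA_DEFAULT_ENV"),
    }


for key, value in current_environment().items():
    print(f"{key:10s} {value}")
```

If the notebook runs inside the environment you created, the prefix path will typically end in that environment's folder under Anaconda's envs directory.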

Now select New -> Python 3 (the kernel name may differ on your system), type the code into a cell, and select Run. In Jupyter, each cell can be run on its own. The only requirement is that it must not depend on variables or imports from cells that have not been run yet.
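The original's cell contents are not reproduced here. As an illustrative first cell (an assumption, not the author's code), the sketch below tries to import each package named in this exercise with `importlib.import_module`. It prints each package's version, or "not installed" if the import fails. The names `PACKAGES` and `package_versions` are hypothetical.

```python
import importlib

# Import names of the packages explored in this exercise
PACKAGES = ["numpy", "scipy", "pandas", "statsmodels", "matplotlib", "seaborn"]


def package_versions(names):
    """Return {name: version string, or None if the import fails}."""
    versions = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            versions[name] = None
        else:
            # Most scientific packages expose __version__; fall back if absent
            versions[name] = getattr(module, "__version__", "unknown")
    return versions


for name, version in package_versions(PACKAGES).items():
    print(f"{name:12s} {version or 'not installed'}")
```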
This completes installing Anaconda and running Jupyter Notebook.
WORKING WITH NUMPY ARRAYS

(i) Create a NumPy ndarray Object
Program
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(type(arr))

Output

[1 2 3 4 5]
<class 'numpy.ndarray'>

2-D Arrays
Program
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)

Output

[[1 2 3]
[4 5 6]]

3-D Arrays
Program
import numpy as np
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)

Output

[[[1 2 3]
[4 5 6]]
[[1 2 3]
[4 5 6]]]
(iii) Check Number of Dimensions
Program
import numpy as np
a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])

print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)

Output
0
1
2
3

Access Array Elements
Program

import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[0])

Output
1

Program
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[2] + arr[3])

Output

7
(iv) Slicing Arrays
Program
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5])

Output

[2 3 4 5]
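A slice arr[start:end:step] runs from start up to, but not including, end. The sketch below shows omitted bounds, negative indexes and a step. It also shows that a basic slice is a view sharing memory with the original array. The variable `view` is illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7])

print(arr[4:])     # from index 4 to the end -> [5 6 7]
print(arr[:4])     # from the start up to (not including) index 4 -> [1 2 3 4]
print(arr[-3:-1])  # negative indexes count from the end -> [5 6]
print(arr[1:5:2])  # every 2nd element from index 1 to 4 -> [2 4]

# A basic slice is a view: writing to it changes the original array
view = arr[1:5]
view[0] = 20
print(arr)         # [ 1 20  3  4  5  6  7]
```

Use arr[1:5].copy() when you need an independent array.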
NumPy Array Shape
Program

import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr.shape)

Output
(2, 4)

(v) Reshaping Arrays
Program

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])
newarr = arr.reshape(4, 3)
print(newarr)

Output

[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
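reshape only works when the new shape holds exactly as many elements as the original: 4 x 3 = 12 here. The sketch below shows three things, and `wide` is an illustrative name:

- Passing -1 lets NumPy infer one dimension.
- An incompatible shape raises ValueError.
- reshape returns a view when it can.

```python
import numpy as np

arr = np.arange(1, 13)        # [1 2 ... 12], 12 elements

# -1 asks NumPy to infer that dimension: 12 elements / 2 rows = 6 columns
wide = arr.reshape(2, -1)
print(wide.shape)             # (2, 6)

# The new shape must hold exactly the same number of elements
try:
    arr.reshape(5, 3)         # 15 != 12
except ValueError as err:
    print("cannot reshape:", err)

# reshape returns a view when it can, so edits show up in the original
wide[0, 0] = 100
print(arr[0])                 # 100
```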

Iterating Arrays
Program

import numpy as np
arr = np.array([1, 2, 3])
for x in arr:
    print(x)

Output
1
2
3
Joining NumPy Arrays
Program

import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
print(arr)

Output
[1 2 3 4 5 6]

Splitting NumPy Arrays
Program

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6])
newarr = np.array_split(arr, 3)
print(newarr)

Output

[array([1, 2]), array([3, 4]), array([5, 6])]
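np.array_split also accepts a number of sections that does not divide the array evenly. In that case the first sections receive one extra element each. np.split, by contrast, requires an equal division. A small sketch contrasting the two:

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6])

# array_split tolerates an uneven division: the first sections get the extra elements
parts = np.array_split(arr, 4)
print([p.tolist() for p in parts])   # [[1, 2], [3, 4], [5], [6]]

# np.split requires an equal division and raises ValueError otherwise
try:
    np.split(arr, 4)
except ValueError as err:
    print("np.split failed:", err)
```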

Searching Arrays
Program

import numpy as np
arr = np.array([1, 2, 3, 4, 5, 4, 4])
x = np.where(arr == 4)
print(x)

Output
(array([3, 5, 6]),)
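The output is a tuple because np.where(condition) returns one index array per dimension. A 1-D array therefore gives a one-element tuple. The sketch below unpacks that tuple and contrasts it with boolean-mask selection, which returns the matching values rather than their positions. It also shows the three-argument form of np.where. The variable `indexes` is illustrative.

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 4, 4])

# np.where(condition) returns a tuple with one index array per dimension
indexes = np.where(arr == 4)[0]
print(indexes)                      # [3 5 6]

# A boolean mask selects the matching values instead of their positions
print(arr[arr > 3])                 # [4 5 4 4]

# With three arguments, np.where picks element-wise from two alternatives
print(np.where(arr % 2 == 0, "even", "odd"))
```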

Sorting Arrays
Program

import numpy as np
arr = np.array([3, 2, 0, 1])
print(np.sort(arr))

Output
[0 1 2 3]
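np.sort returns a sorted copy and leaves the original array unchanged. The sketch below shows that behaviour. It also shows np.argsort, which returns the indexes that would sort the array, and row-wise sorting of a 2-D array. The names `grid` and the sample values are illustrative.

```python
import numpy as np

arr = np.array([3, 2, 0, 1])

# np.sort returns a sorted copy; the original array is unchanged
print(np.sort(arr), arr)            # [0 1 2 3] [3 2 0 1]

# argsort gives the indexes that would sort the array
print(np.argsort(arr))              # [2 3 1 0]

# On a 2-D array, np.sort sorts each row (the last axis) by default
grid = np.array([[3, 2, 4], [5, 0, 1]])
print(np.sort(grid))                # [[2 3 4]
                                    #  [0 1 5]]

# Strings sort alphabetically
print(np.sort(np.array(["banana", "cherry", "apple"])))
```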
WORKING WITH PANDAS DATA FRAMES

Create a simple Pandas DataFrame:

Program
import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data)
print(df)

Output

   calories  duration
0       420        50
1       380        40
2       390        45

Locate Row
Program


print(df.loc[0])

Output
calories 420
duration 50
Name: 0, dtype: int64

Note: This example returns a Pandas Series.

Use a list of indexes:

Program

print(df.loc[[0, 1]])

Output
   calories  duration
0       420        50
1       380        40
Note: When a list of labels is passed, as in df.loc[[0, 1]], the result is a Pandas DataFrame.
Named Indexes
Program

import pandas as pd

data = {
    "calories": [420, 380, 390],
    "duration": [50, 40, 45]
}
df = pd.DataFrame(data, index=["day1", "day2", "day3"])
print(df)

Output
calories duration
day1 420 50
day2 380 40
day3 390 45

Locate Named Indexes

print(df.loc["day2"])

Output
calories    380
duration     40
Name: day2, dtype: int64
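df.loc selects by label, while df.iloc selects by integer position. The two also treat slice ends differently: a label slice includes its end label, but a position slice excludes its end position. A minimal sketch on the same calories/duration data:

```python
import pandas as pd

df = pd.DataFrame(
    {"calories": [420, 380, 390], "duration": [50, 40, 45]},
    index=["day1", "day2", "day3"],
)

print(df.loc["day2", "calories"])   # label-based: row "day2", column "calories" -> 380
print(df.iloc[1, 0])                # position-based: 2nd row, 1st column -> 380

# Label slices include the end label; position slices exclude the end position
print(df.loc["day1":"day2"])        # rows day1 and day2
print(df.iloc[0:2])                 # the same two rows
```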

Load Files Into a DataFrame
Program

import pandas as pd
df = pd.read_csv('data.csv')
print(df)

Output

     Duration  Pulse  Maxpulse  Calories
0          60    110       130     409.1
1          60    117       145     479.0
2          60    103       135     340.0
3          45    109       175     282.4
4          45    117       148     406.0
..        ...    ...       ...       ...
164        60    105       140     290.8
165        60    110       145     300.4
166        60    115       145     310.2
167        75    120       150     320.4
168        75    125       150     330.4

[169 rows x 4 columns]

Check the maximum number of rows displayed:
Program

import pandas as pd
print(pd.options.display.max_rows)

On my system the number is 60, the pandas default. If the DataFrame contains more than 60 rows, print(df) shows only the headers and the first and last 5 rows.

To print every row, increase the maximum number of displayed rows:

import pandas as pd
pd.options.display.max_rows = 9999
df = pd.read_csv('data.csv')
print(df)
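Setting pd.options.display.max_rows changes the option for the rest of the session. pd.option_context instead changes it only inside a with-block and restores the previous value afterwards. The sketch below uses a hypothetical 100-row DataFrame (not data.csv) to compare a truncated and a full display.

```python
import pandas as pd

df = pd.DataFrame({"value": range(100)})   # hypothetical 100-row frame

# option_context changes display settings only inside the with-block
with pd.option_context("display.max_rows", 10):
    short = repr(df)    # 100 rows > 10: only the first and last rows are shown
with pd.option_context("display.max_rows", 200):
    full = repr(df)     # 100 rows <= 200: every row is shown

print(short)
print("lines in the full view:", len(full.splitlines()))
```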

Viewing the Data
Program

import pandas as pd
df = pd.read_csv('data.csv')
print(df.head())   # first 5 rows by default; head(n) returns the first n rows
Output
   Duration  Pulse  Maxpulse  Calories
0        60    110       130     409.1
1        60    117       145     479.0
2        60    103       135     340.0
3        45    109       175     282.4
4        45    117       148     406.0

Print the last 5 rows of the DataFrame:

print(df.tail())

Info About the Data

df.info()

Output
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 169 entries, 0 to 168
Data columns (total 4 columns):
 #   Column    Non-Null Count  Dtype
---  ------    --------------  -----
 0   Duration  169 non-null    int64
 1   Pulse     169 non-null    int64
 2   Maxpulse  169 non-null    int64
 3   Calories  164 non-null    float64
dtypes: float64(1), int64(3)
memory usage: 5.4 KB

Note: df.info() prints its summary directly and returns None. Wrapping it in print() therefore adds an extra "None" line.
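The info() output shows 164 non-null values out of 169 for Calories, which means 5 values are missing. The sketch below shows common ways to count, drop, or fill such gaps. It uses a hypothetical three-row frame in the same layout as data.csv, with one Calories value missing. Filling with the column mean is only one possible choice.

```python
import numpy as np
import pandas as pd

# Hypothetical rows in the same layout as data.csv, with one Calories value missing
df = pd.DataFrame({
    "Duration": [60, 60, 45],
    "Calories": [409.1, np.nan, 282.4],
})

print(df.isnull().sum())          # missing values per column (Calories: 1)
print(df["Calories"].count())     # non-null count, as reported by df.info(): 2

dropped = df.dropna()             # remove rows that contain a missing value
filled = df.fillna({"Calories": df["Calories"].mean()})  # replace NaN with the column mean
print(filled)
```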
READING DATA FROM TEXT FILES, EXCEL AND THE WEB AND EXPLORING VARIOUS
COMMANDS FOR DOING DESCRIPTIVE ANALYTICS ON THE IRIS DATA SET

Program
import pandas as pd

# Reading the CSV file
df = pd.read_csv("Iris.csv")

# Printing top 5 rows
df.head()

Output:
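The program above reads only a CSV file, but the heading also mentions plain text files, Excel and the web. The sketch below reads a tab-separated text "file" held in memory with io.StringIO. The sample rows are hypothetical. The Excel and web calls are left as comments because the file name and URL are placeholders. pd.read_excel also needs an Excel engine such as openpyxl installed.

```python
import io

import pandas as pd

# A small tab-separated text "file" held in memory (hypothetical sample rows)
text = (
    "Id\tSepalLengthCm\tSpecies\n"
    "1\t5.1\tIris-setosa\n"
    "2\t7.0\tIris-versicolor\n"
)
df_txt = pd.read_csv(io.StringIO(text), sep="\t")   # sep="\t" for tab-delimited text
print(df_txt)

# Other sources return the same kind of DataFrame (placeholder file name and URL):
# df_xl = pd.read_excel("Iris.xlsx")                    # needs e.g. openpyxl
# df_web = pd.read_csv("https://example.com/iris.csv")  # read_csv also accepts URLs
```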

Getting Information about the Dataset

df.shape

Output: (150, 6)

df.info()
Output

df.describe()

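Checking Missing Values

df.isnull().sum()

Checking Duplicates

data = df.drop_duplicates(subset="Species")
data

Output

drop_duplicates(subset="Species") keeps only the first row for each species, so the result has one row per unique species. To count how many rows belong to each species:

df.value_counts("Species")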
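Descriptive analytics often goes one step further and summarises each group separately. The sketch below uses a hypothetical six-row miniature of Iris.csv (not the real data). It reproduces the duplicate check and the per-species counts. It then uses groupby with agg to compute per-species statistics.

```python
import pandas as pd

# Hypothetical miniature of Iris.csv: two rows per species
df = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 7.0, 6.4, 6.3, 5.8],
    "Species": ["Iris-setosa"] * 2 + ["Iris-versicolor"] * 2 + ["Iris-virginica"] * 2,
})

# drop_duplicates(subset=...) keeps the first row for each species
print(df.drop_duplicates(subset="Species"))

# Number of rows per species
print(df["Species"].value_counts())

# Descriptive statistics per species
summary = df.groupby("Species")["SepalLengthCm"].agg(["count", "mean", "min", "max"])
print(summary)
```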

Data Visualization

# importing packages
import seaborn as sns

import matplotlib.pyplot as plt

sns.countplot(x='Species', data=df, )
plt.show()
Comparing Sepal Length and Sepal Width

# importing packages

import seaborn as sns

import matplotlib.pyplot as plt

sns.scatterplot(x='SepalLengthCm', y='SepalWidthCm',
hue='Species', data=df, )

# Placing Legend outside the Figure

plt.legend(bbox_to_anchor=(1, 1), loc=2)


plt.show()

# importing packages
import seaborn as sns
import matplotlib.pyplot as plt

# Pairwise scatter plots of the measurements (Id column dropped), coloured by species
sns.pairplot(df.drop(['Id'], axis=1),
             hue='Species', height=2)
plt.show()
Output:
USE THE DIABETES DATA SET FROM UCI AND PIMA INDIANS DIABETES DATA SET FOR
PERFORMING THE FOLLOWING

Program
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
from sklearn.linear_model import LogisticRegression
# sklearn.externals.joblib was removed in scikit-learn 0.23; use the standalone joblib package
import joblib

df = pd.read_csv('C:/Users/Downloads/diabetes.csv')

count = df['Glucose'].value_counts()
display(count)

df.head()

df.describe()

df.mean()

df.mode()

df.var()

df.std()

df.skew()

Pregnancies                 0.901674
Glucose                     0.173754
BloodPressure              -1.843608
SkinThickness               0.109372
Insulin                     2.272251
BMI                        -0.428982
DiabetesPedigreeFunction    1.919911
Age                         1.129597
Outcome                     0.635017
dtype: float64

df.kurtosis()

Pregnancies                 0.159220
Glucose                     0.640780
BloodPressure               5.180157
SkinThickness              -0.520072
Insulin                     7.214260
BMI                         3.290443
DiabetesPedigreeFunction    5.594954
Age                         0.643159
Outcome                    -1.600930
dtype: float64
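pandas' skew() and kurtosis() report bias-adjusted sample statistics. skew() gives G1. kurtosis() gives the excess kurtosis G2, which is 0 for a normal distribution. A positive skew, such as Insulin's 2.27 above, indicates a long right tail. A negative skew, such as BloodPressure's -1.84, indicates a long left tail. As a worked check, the sketch below recomputes both statistics from the central moments m2, m3 and m4 with NumPy. It compares them with pandas on a hypothetical series. The function names are illustrative.

```python
import numpy as np
import pandas as pd


def sample_skewness(x):
    """Bias-adjusted skewness G1 = sqrt(n(n-1))/(n-2) * m3 / m2**1.5."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    m2, m3 = np.mean(d ** 2), np.mean(d ** 3)
    return np.sqrt(n * (n - 1)) / (n - 2) * m3 / m2 ** 1.5


def sample_excess_kurtosis(x):
    """Bias-adjusted excess kurtosis G2 = (n-1)/((n-2)(n-3)) * ((n+1)*g2 + 6)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    d = x - x.mean()
    m2, m4 = np.mean(d ** 2), np.mean(d ** 4)
    g2 = m4 / m2 ** 2 - 3          # excess kurtosis before the small-sample correction
    return (n - 1) / ((n - 2) * (n - 3)) * ((n + 1) * g2 + 6)


values = pd.Series([0, 1, 1, 2, 2, 2, 3, 10])   # hypothetical data with a long right tail
print(sample_skewness(values), values.skew())
print(sample_excess_kurtosis(values), values.kurtosis())
```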

corr = df.corr()
sns.heatmap(corr,
xticklabels=corr.columns,
yticklabels=corr.columns)

sns.countplot(x='Outcome', data=df)
plt.show()

# Computing the %age of diabetic and non-diabetic in the sample
# (df.Outcome == 1) is a boolean Series; .sum() counts its True values
Out1 = (df.Outcome == 1).sum()   # diabetic
Out0 = (df.Outcome == 0).sum()   # non-diabetic

Total = Out0 + Out1

PC_of_1 = Out1*100/Total
PC_of_0 = Out0*100/Total

PC_of_1, PC_of_0
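A common pitfall in this kind of count is wrapping the comparison in a list. len([df.Outcome == 1]) is always 1, because the list holds a single Series. That makes both classes appear to be 50%. The sketch below uses a hypothetical five-row Outcome column to show the pitfall, the boolean-sum count, and value_counts(normalize=True), which returns the class proportions directly.

```python
import pandas as pd

# Hypothetical Outcome column: 1 = diabetic, 0 = non-diabetic
outcome = pd.Series([1, 0, 0, 1, 0], name="Outcome")

# Pitfall: a list holding one boolean Series always has length 1
print(len([outcome == 1]))            # 1

# Summing the boolean Series counts the True values
print((outcome == 1).sum())           # 2

# value_counts(normalize=True) gives class proportions directly
percentages = outcome.value_counts(normalize=True) * 100
print(percentages)
```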
import numpy as np

plt.figure(dpi=120, figsize=(5,4))
# Mask the upper triangle so each correlation is shown only once
mask = np.triu(np.ones_like(df.corr(), dtype=bool))

sns.heatmap(df.corr(), mask=mask, fmt=".2f", annot=True, lw=1, cmap='plasma')

plt.yticks(rotation=0)
plt.xticks(rotation=90)
plt.title('Correlation Heatmap')
plt.show()
APPLY AND EXPLORE VARIOUS PLOTTING FUNCTIONS ON UCI DATA SETS

Normal curves
Program

import pandas as pd
import matplotlib.pyplot as plt
import numpy as np
from scipy.stats import norm

df = pd.read_csv("C:/Users/Downloads/dataset_diabetes/diabetic_data.csv")

df.head()
mean = df['time_in_hospital'].mean()
std = df['time_in_hospital'].std()

x_axis = np.arange(1, 10, 0.01)

# Normal probability density using the column's mean and standard deviation
plt.plot(x_axis, norm.pdf(x_axis, mean, std))
plt.show()

Output
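norm.pdf(x, mean, std) evaluates the normal density exp(-(x - mean)^2 / (2 std^2)) / (std * sqrt(2 pi)). The curve is therefore the normal distribution with the column's mean and standard deviation, not the data's actual shape. As a worked check, the sketch below evaluates the same formula with NumPy alone. The parameters 4.4 and 3.0 are hypothetical, and `normal_pdf` is an illustrative name.

```python
import numpy as np


def normal_pdf(x, mean, std):
    """Normal density: exp(-(x - mean)^2 / (2 std^2)) / (std * sqrt(2 pi))."""
    x = np.asarray(x, dtype=float)
    return np.exp(-((x - mean) ** 2) / (2 * std ** 2)) / (std * np.sqrt(2 * np.pi))


mean, std = 4.4, 3.0                 # hypothetical parameters
x_axis = np.arange(1, 10, 0.01)
y = normal_pdf(x_axis, mean, std)

# The curve peaks at the mean, where the density equals 1 / (std * sqrt(2 pi))
print(x_axis[np.argmax(y)], y.max())
```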

Density and contour plots
Program

# Kernel density estimate of the column (pandas uses SciPy for this)
df.time_in_hospital.plot.density(color='green')
plt.title('Density plot for time_in_hospital')
plt.show()

output:
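Correlation and scatter plots
Program

import seaborn as sb

plt.figure(figsize=(20,10))
# numeric_only=True skips the text columns (requires pandas 1.5 or later)
dataplot = sb.heatmap(df.corr(numeric_only=True), cmap="YlGnBu", annot=True)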

Output
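diabetic_data.csv mixes numeric and text columns. In pandas 2.x, df.corr() raises an error on text columns unless numeric_only=True is passed. The sketch below uses a small hypothetical frame (not the real dataset) to show that the text column is skipped. It also shows the extreme Pearson correlations of +1 and -1.

```python
import pandas as pd

# Hypothetical frame mixing numeric and text columns
df = pd.DataFrame({
    "x": [1, 2, 3, 4],
    "double_x": [2, 4, 6, 8],     # perfectly positively related to x
    "reversed_x": [4, 3, 2, 1],   # perfectly negatively related to x
    "label": ["a", "b", "c", "d"],
})

# numeric_only=True drops text columns before computing Pearson correlations
corr = df.corr(numeric_only=True)
print(corr)
```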
Histogram
df.hist(figsize=(12,12), layout=(5,3))

# plotting histogram for num_lab_procedures using histplot() (distplot() is deprecated in seaborn)
sb.histplot(df.num_lab_procedures, kde=False)
# visualizing plot using matplotlib.pyplot library
plt.show()

Output
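A histogram like the ones df.hist() and sb.histplot() draw is built from bin counts. NumPy's np.histogram computes the same kind of counts, though the plotting libraries may choose their bins differently. A minimal worked example on hypothetical values:

```python
import numpy as np

values = np.array([1, 2, 2, 3, 3, 3])    # hypothetical data

# Split the range [1, 3] into 3 equal-width bins and count the values in each.
# Every bin is half-open [a, b) except the last, which also includes its right edge.
counts, edges = np.histogram(values, bins=3)
print(counts)   # [1 2 3]
print(edges)
```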
Three dimensional plotting
Program

from mpl_toolkits import mplot3d  # registers the '3d' projection on older matplotlib versions

fig = plt.figure()
ax = plt.axes(projection='3d')

x = pd.Series(df['number_emergency'], name='')
y = pd.Series(df['number_inpatient'], name='')
z = pd.Series(df['number_outpatient'], name='')

ax.plot3D(x, y, z, 'green')
ax.set_title('3D line plot diabetes dataset')
plt.show()

Output
BASEMAP
import numpy as np
from matplotlib import pyplot as plt
from mpl_toolkits.basemap import Basemap

plt.figure(figsize=(8,8))
m = Basemap(projection='ortho', resolution=None, lat_0=50, lon_0=-100,
            llcrnrlat=-90, urcrnrlat=90, llcrnrlon=-180, urcrnrlon=180)
m.bluemarble(scale=0.5);

fig = plt.figure(figsize=(8,8))
m = Basemap(projection='lcc', resolution=None, width=8E6, height=8E6,
            lat_0=45, lon_0=-100)
m.etopo(scale=0.5, alpha=0.5)

# Convert (longitude, latitude) to map coordinates and mark Seattle
x, y = m(-122.3, 47.6)
plt.plot(x, y, 'ok', markersize=5)
plt.text(x, y, 'Seattle', fontsize=12);

from itertools import chain

def draw_map(m, scale=0.2):
    # draw a shaded-relief image
    m.shadedrelief(scale=scale)

    # parallels and meridians are returned as dictionaries keyed by latitude/longitude
    lats = m.drawparallels(np.linspace(-90, 90, 13))
    lons = m.drawmeridians(np.linspace(-180, 180, 13))

    # each value is a (lines, labels) pair; [0] picks the list of Line2D objects
    lat_lines = chain(*(tup[1][0] for tup in lats.items()))
    lon_lines = chain(*(tup[1][0] for tup in lons.items()))
    all_lines = chain(lat_lines, lon_lines)

    # style every grid line
    for line in all_lines:
        line.set(linestyle='-', alpha=0.3, color='w')

fig = plt.figure(figsize=(8,6), edgecolor='w')
m = Basemap(projection='cyl', resolution=None,
            llcrnrlat=-90, urcrnrlat=90, llcrnrlon=-180, urcrnrlon=180)
draw_map(m)

fig = plt.figure(figsize=(8,6), edgecolor='w')
m = Basemap(projection='moll', lon_0=0, resolution='c')
m.drawcoastlines()
m.fillcontinents(color='coral', lake_color='aqua')
draw_map(m)

fig = plt.figure(figsize=(8,8))
m = Basemap(projection='ortho', resolution=None, lat_0=50, lon_0=0,
            llcrnrlat=-90, urcrnrlat=90, llcrnrlon=-180, urcrnrlon=180)
draw_map(m)

fig = plt.figure(figsize=(8,8))
m = Basemap(projection='lcc', resolution=None, lon_0=0, lat_0=50,
            lat_1=45, lat_2=55, width=1.6E7, height=1.2E7)
draw_map(m)
plt.show()

Output:
