EX - No: 1 Date: Download, Install and Explore the Features of NumPy, SciPy, Jupyter, Statsmodels and Pandas Packages
Once you have created the Anaconda environment, go back to the Home page of
Anaconda Navigator and install Jupyter Notebook from the applications listed in the right
panel.
Now select New -> PythonX, enter the lines below and select Run. In
Jupyter, each cell holds one or more statements, and a cell can be run independently
as long as it does not depend on names defined in earlier cells.
This completes installing Anaconda and running Jupyter Notebook.
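To confirm that the packages named in the title are available in the new environment, a short check can be run in the first notebook cell. This is a minimal sketch: the helper `installed_versions` and the list `PACKAGES` are hypothetical names, and `notebook` is assumed to be the distribution name under which Jupyter Notebook is installed.

```python
from importlib import metadata

# Distribution names to look up (assumed names, adjust to your environment)
PACKAGES = ["numpy", "scipy", "pandas", "statsmodels", "notebook"]

def installed_versions(names):
    """Return {name: version string, or None if the package is not installed}."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions

report = installed_versions(PACKAGES)
for name, version in report.items():
    print(f"{name:12s} {version or 'not installed'}")
```

Any package reported as not installed can be added from Anaconda Navigator before continuing.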
WORKING WITH NUMPY ARRAYS
Create a NumPy ndarray Object
Program
import numpy as np
arr = np.array([1, 2, 3, 4, 5])
print(arr)
print(type(arr))
Output
[1 2 3 4 5]
<class 'numpy.ndarray'>
2-D Arrays
Program
import numpy as np
arr = np.array([[1, 2, 3], [4, 5, 6]])
print(arr)
Output
[[1 2 3]
 [4 5 6]]
3-D Arrays
Output
[[[1 2 3]
  [4 5 6]]

 [[1 2 3]
  [4 5 6]]]
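The program for the 3-D output above is missing from this record. A minimal sketch that would produce it, assuming the array is built from two copies of the 2-D array used earlier:

```python
import numpy as np

# A 3-D array: two 2-D blocks, each with two rows of three elements
arr = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
print(arr)
print(arr.ndim)   # number of dimensions
print(arr.shape)  # (blocks, rows, columns)
```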
(iii) Check Number of Dimensions?
Program
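import numpy as np
a = np.array(42)
b = np.array([1, 2, 3, 4, 5])
c = np.array([[1, 2, 3], [4, 5, 6]])
d = np.array([[[1, 2, 3], [4, 5, 6]], [[1, 2, 3], [4, 5, 6]]])
# ndim gives the number of dimensions of each array
print(a.ndim)
print(b.ndim)
print(c.ndim)
print(d.ndim)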
Output
0
1
2
3
Access Array Elements
Program
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[0])
Output
1
Program
import numpy as np
arr = np.array([1, 2, 3, 4])
print(arr[2] + arr[3])
Output
7
(iv) Slicing Arrays
Program
import numpy as np
arr = np.array([1, 2, 3, 4, 5, 6, 7])
print(arr[1:5])
Output
[2 3 4 5]
NumPy Array Shape
Program
import numpy as np
arr = np.array([[1, 2, 3, 4], [5, 6, 7, 8]])
print(arr.shape)
Output
(2, 4)
Output
[[ 1  2  3]
 [ 4  5  6]
 [ 7  8  9]
 [10 11 12]]
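The program for the 4 x 3 output above is not included in this record. A minimal sketch, assuming it comes from reshaping a 1-D array of the numbers 1 to 12 (the names `arr` and `newarr` are illustrative):

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12])

# reshape(4, 3) needs 4 * 3 = 12 elements, which matches the length of arr
newarr = arr.reshape(4, 3)
print(newarr)
```

Reshaping only succeeds when the product of the new dimensions equals the number of elements; `arr.reshape(5, 3)` would raise a `ValueError` here.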
Program
import numpy as np
arr = np.array([1, 2, 3])
# iterate over the elements of a 1-D array
for x in arr:
    print(x)
Output
1
2
3
Joining NumPy Arrays
Program
import numpy as np
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])
arr = np.concatenate((arr1, arr2))
print(arr)
Output
[1 2 3 4 5 6]
Output
(array([3, 5, 6]),)
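The output above has the form returned by `np.where`, but the program is missing from this record. A minimal sketch, assuming the input array below (a hypothetical choice in which the value 4 sits at positions 3, 5 and 6):

```python
import numpy as np

arr = np.array([1, 2, 3, 4, 5, 4, 4])

# np.where with a single condition returns a tuple of index arrays,
# one per dimension, marking where the condition is True
idx = np.where(arr == 4)
print(idx)
```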
Sorting Arrays
Program
import numpy as np
arr = np.array([3, 2, 0, 1])
print(np.sort(arr))
Output
[0 1 2 3]
WORKING WITH PANDAS DATA FRAMES
Output
   calories  duration
0       420        50
1       380        40
2       390        45
Output
calories    420
duration     50
Name: 0, dtype: int64
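The program that produced the two outputs above is not included in this record. A minimal sketch, assuming the DataFrame is built from a dictionary whose values match the printed rows (the name `data` is illustrative):

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}

# Each dictionary key becomes a column; rows get the default index 0, 1, 2
df = pd.DataFrame(data)
print(df)

# loc with a single label returns that row as a pandas Series
row0 = df.loc[0]
print(row0)
```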
Program
print(df.loc[[0, 1]])
Output
   calories  duration
0       420        50
1       380        40
Note: When using [], the result is a Pandas DataFrame.
Named Indexes
Program
Output
      calories  duration
day1       420        50
day2       380        40
day3       390        45
Output
calories    380
duration     40
Name: day2, dtype: int64
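The named-index program is also missing from this record. A minimal sketch, assuming the labels day1 to day3 are passed through the `index` argument and that the single row shown above is the one labelled day2:

```python
import pandas as pd

data = {"calories": [420, 380, 390], "duration": [50, 40, 45]}

# index= replaces the default 0, 1, 2 row labels with custom names
df = pd.DataFrame(data, index=["day1", "day2", "day3"])
print(df)

# loc now looks rows up by their named label
day2 = df.loc["day2"]
print(day2)
```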
import pandas as pd
df = pd.read_csv('data.csv')
print(df)
Output
import pandas as pd
print(pd.options.display.max_rows)
On my system the number is 60, which means that if the DataFrame contains more than 60
rows, the print(df) statement will return only the headers and the first and last 5 rows.
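The truncation behaviour can be seen without a CSV file. This is a minimal sketch on a hypothetical 100-row frame (`frame` is an illustrative name); it sets `display.max_rows` and `display.min_rows` explicitly so the result does not depend on local settings.

```python
import pandas as pd

# Hypothetical 100-row frame, used only to trigger truncation
frame = pd.DataFrame({"value": range(100)})

# With a limit of 60 rows, the printed form keeps the first and last 5 rows
with pd.option_context("display.max_rows", 60, "display.min_rows", 10):
    truncated = repr(frame)

# Raising the limit above the row count prints every row
with pd.option_context("display.max_rows", 9999):
    full = repr(frame)

print(truncated)
print(len(truncated.splitlines()), "lines when truncated")
print(len(full.splitlines()), "lines when printed in full")
```

`pd.option_context` restores the previous settings when the `with` block ends, unlike assigning to `pd.options.display.max_rows`, which changes the setting for the rest of the session.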
import pandas as pd
pd.options.display.max_rows = 9999
df = pd.read_csv('data.csv')
print(df)
print(df.tail())
print(df.info())
Output
<class 'pandas.core.frame.DataFrame'>
None
READING DATA FROM TEXT FILES, EXCEL AND THE WEB AND EXPLORING VARIOUS
COMMANDS FOR DOING DESCRIPTIVE ANALYTICS ON THE IRIS DATA SET
Program
import pandas as pd
df = pd.read_csv("Iris.csv")
Output:
df.shape
Output: (150, 6)
df.info()
Output
df.describe()
df.isnull().sum()
Checking Duplicates
Output
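The duplicate check itself is not shown in this record. A minimal sketch on a small hypothetical frame (`toy` and its values are made up for illustration and are not taken from Iris.csv), using `DataFrame.duplicated` and `DataFrame.drop_duplicates`:

```python
import pandas as pd

# Hypothetical data: the third row repeats the first one exactly
toy = pd.DataFrame({
    "SepalLengthCm": [5.1, 4.9, 5.1],
    "Species": ["Iris-setosa", "Iris-setosa", "Iris-setosa"],
})

# duplicated() marks every row that repeats an earlier row
dup_mask = toy.duplicated()
n_dups = int(dup_mask.sum())
print("duplicate rows:", n_dups)

# drop_duplicates() keeps the first occurrence of each row
deduped = toy.drop_duplicates()
print(deduped)
```

The same two calls can be applied to the `df` loaded from Iris.csv.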
df.value_counts("Species")
Data Visualization
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
sns.countplot(x='Species', data=df)
plt.show()
Comparing Sepal Length and Sepal Width
# importing packages
import seaborn as sns
import matplotlib.pyplot as plt
sns.scatterplot(x='SepalLengthCm', y='SepalWidthCm',
                hue='Species', data=df)
plt.show()
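Program
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
%matplotlib inline
# df is assumed to already hold the diabetes data set (the loading step is not shown here)
count = df['Glucose'].value_counts()
display(count)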
df.head()
df.describe()
df.mean()
df.mode()
df.var()
df.std()
df.skew()
Pregnancies 0.901674
Glucose 0.173754
BloodPressure -1.843608
SkinThickness 0.109372
Insulin 2.272251
BMI -0.428982
DiabetesPedigreeFunction 1.919911
Age 1.129597
Outcome 0.635017
dtype: float64
df.kurtosis()
Pregnancies 0.159220
Glucose 0.640780
BloodPressure 5.180157
SkinThickness -0.520072
Insulin 7.214260
BMI 3.290443
DiabetesPedigreeFunction 5.594954
Age 0.643159
Outcome -1.600930
dtype: float64
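To help read the skewness and kurtosis values above, the following sketch applies the same two methods to two small hypothetical series (`symmetric` and `right_skewed` are made-up data). pandas reports excess kurtosis (Fisher's definition), so a normal distribution scores about 0.

```python
import pandas as pd

# Hypothetical samples: one symmetric, one with a long right tail
symmetric = pd.Series([1, 2, 3, 4, 5])
right_skewed = pd.Series([1, 1, 1, 2, 10])

# skew() is 0 for symmetric data and positive when the right tail is longer
print("skew (symmetric):   ", symmetric.skew())
print("skew (right-skewed):", right_skewed.skew())

# kurtosis() is negative for flat distributions, positive for heavy tails
print("kurtosis (symmetric):   ", symmetric.kurtosis())
print("kurtosis (right-skewed):", right_skewed.kurtosis())
```

Read this way, a large positive skewness such as the value listed for Insulin points to a long right tail, while a negative value such as the one for BloodPressure points to a longer left tail.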
corr = df.corr()
sns.heatmap(corr,
xticklabels=corr.columns,
yticklabels=corr.columns)
sns.countplot(x='Outcome', data=df)
plt.show()
# Computing the percentage of diabetic and non-diabetic cases in the sample
# count the rows matching each outcome with a boolean mask
Out1=len(df[df.Outcome==1])
Out0=len(df[df.Outcome==0])
Total=Out0+Out1
PC_of_1 = Out1*100/Total
PC_of_0 = Out0*100/Total
PC_of_1, PC_of_0
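Class percentages depend on counting the rows that match each outcome, not the length of a list that wraps the comparison. A minimal sketch on a hypothetical four-row frame (`toy` and its values are made up for illustration):

```python
import pandas as pd

# Hypothetical sample: three non-diabetic rows (0) and one diabetic row (1)
toy = pd.DataFrame({"Outcome": [0, 0, 0, 1]})

# Summing a boolean mask counts the rows where the condition holds
n_pos = int((toy["Outcome"] == 1).sum())
n_neg = int((toy["Outcome"] == 0).sum())

pc_pos = n_pos * 100 / len(toy)
pc_neg = n_neg * 100 / len(toy)
print(pc_pos, pc_neg)

# value_counts(normalize=True) gives the same shares in one call
shares = toy["Outcome"].value_counts(normalize=True) * 100
print(shares)
```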
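plt.figure(dpi = 120,figsize= (5,4))
# hide the upper triangle, which mirrors the lower one
mask = np.triu(np.ones_like(df.corr(),dtype = bool))
sns.heatmap(df.corr(),mask = mask,annot = True,fmt = '.2f')
plt.yticks(rotation = 0)
plt.xticks(rotation = 90)
plt.title('Correlation Heatmap')
plt.show()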
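The `mask` built with `np.triu` is easier to follow on a small matrix. This sketch uses a hypothetical 3 x 3 array in place of `df.corr()`; when such a mask is passed to `sns.heatmap`, the cells where it is True are not drawn.

```python
import numpy as np

# Stand-in for a 3 x 3 correlation matrix (values are irrelevant for the mask)
corr_like = np.ones((3, 3))

# np.triu keeps the upper triangle, including the diagonal, and zeroes the rest
mask = np.triu(np.ones_like(corr_like, dtype=bool))
print(mask)
```

Because a correlation matrix is symmetric, hiding the upper triangle removes only repeated values.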
APPLY AND EXPLORE VARIOUS PLOTTING FUNCTIONS ON UCI DATA SETS
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import seaborn as sns
df=pd.read_csv("C:/Users/Downloads/dataset_diabetes/diabetic_data.csv")
df.head()
mean =df['time_in_hospital'].mean()
std =df['time_in_hospital'].std()
Output
Density and contour plots
Program
df.time_in_hospital.plot.density(color='green')
Output
Correlation and scatter plots
Program
plt.figure(figsize=(20,10))
Output
Histogram
Program
df.hist(figsize=(12,12),layout=(5,3))
# histplot replaces the deprecated distplot; kde=False draws the bars only
sns.histplot(df.num_lab_procedures, kde=False)
# visualizing plot using matplotlib.pyplot library
plt.show()
Output
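Three dimensional plotting
Program
fig = plt.figure()
ax = plt.axes(projection = '3d')
x = df['number_emergency']
# assumption: the source omits y; number_inpatient is used here as a stand-in numeric column
y = df['number_inpatient']
z = df['number_outpatient']
ax.plot3D(x, y, z, 'green')
plt.show()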
Output
BASEMAP
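import numpy as np
import matplotlib.pyplot as plt
from mpl_toolkits.basemap import Basemap
# orthographic view of the globe centred on North America
plt.figure(figsize=(8,8))
m=Basemap(projection='ortho',resolution=None,lat_0=50,lon_0=-100,llcrnrlat=-90,urcrnrlat=90,llcrnrlon=-180,urcrnrlon=180)
m.bluemarble(scale=0.5);
# Lambert conformal map with an etopo relief background
fig=plt.figure(figsize=(8,8))
m=Basemap(projection='lcc',resolution=None,width=8E6,height=8E6,lat_0=45,lon_0=-100)
m.etopo(scale=0.5,alpha=0.5)
# convert (longitude, latitude) to map coordinates and mark the city
x,y=m(-122.3,47.6)
plt.plot(x,y,'ok',markersize=5)
plt.text(x,y,'Seattle',fontsize=12);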
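from itertools import chain
def draw_map(m,scale=0.2):
    # draw a shaded-relief background image
    m.shadedrelief(scale=scale)
    # each call returns a dict whose values are (lines, labels) tuples
    lats=m.drawparallels(np.linspace(-90,90,13))
    lons=m.drawmeridians(np.linspace(-180,180,13))
    # restyle every grid line
    for entry in chain(lats.values(),lons.values()):
        for line in entry[0]:
            line.set(linestyle='-',alpha=0.3,color='w')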
# equidistant cylindrical projection
fig=plt.figure(figsize=(8,6),edgecolor='w')
m=Basemap(projection='cyl',resolution=None,llcrnrlat=-90,urcrnrlat=90,llcrnrlon=-180,urcrnrlon=180)
draw_map(m)
# Mollweide projection
fig=plt.figure(figsize=(8,6),edgecolor='w')
m=Basemap(projection='moll',lon_0=0,resolution='c')
m.drawcoastlines()
m.fillcontinents(color='coral',lake_color='aqua')
draw_map(m)
# orthographic projection
fig=plt.figure(figsize=(8,8))
m=Basemap(projection='ortho',resolution=None,lat_0=50,lon_0=0,llcrnrlat=-90,urcrnrlat=90,llcrnrlon=-180,urcrnrlon=180)
draw_map(m)
# Lambert conformal conic projection
fig=plt.figure(figsize=(8,8))
m=Basemap(projection='lcc',resolution=None,lon_0=0,lat_0=50,lat_1=45,lat_2=55,width=1.6E7,height=1.2E7)
draw_map(m)
plt.show()
Output: