intro2Python_part2
Unit-02
Capturing, Preparing and Working with Data
Outline
Looping
We are supposed to close a file once we are done using it, by calling its close() method.
closefile.py
f = open('demofile.txt')
data = f.read()
print(data)
f.close()

fileusingwith.py
with open('demofile.txt') as f:
    data = f.read()
    print(data)
When we open a file using the with statement, we do not need to close it: the file is closed automatically when the with block ends.
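A minimal sketch of the point above, using only the standard library. To avoid overwriting a real demofile.txt, it assumes a throwaway directory created with tempfile, and the file contents are made up for the demo. The file object's closed attribute shows that the file is open inside the with block and closed right after it.

```python
import os
import tempfile

# Throwaway directory so no existing file is touched (demo assumption)
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demofile.txt')

    # Create a small file to read back
    with open(path, 'w') as f:
        f.write('Hello from demofile\n')

    # Read it using with: no explicit close() call is needed
    with open(path) as f:
        data = f.read()
        print('Inside with, closed?', f.closed)   # False

    print('After with, closed?', f.closed)        # True
    print(data)
```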
If we open a file in 'w' mode, it will overwrite the contents of the existing file, or create a new file if the file does not exist.
If we open a file in 'a' mode, it will append the data at the end of the existing file, or create a new file if the file does not exist.
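The difference between the two modes can be sketched as follows. The file name modes.txt is hypothetical, and a temporary directory is assumed so that no real file is overwritten. Writing twice with 'w' keeps only the second write, while 'a' adds to what is already there.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'modes.txt')   # hypothetical file name

    # 'w' creates the file because it does not exist yet
    with open(path, 'w') as f:
        f.write('first\n')

    # 'w' again: the previous contents are overwritten
    with open(path, 'w') as f:
        f.write('second\n')

    # 'a': new data is added at the end of the existing file
    with open(path, 'a') as f:
        f.write('third\n')

    with open(path) as f:
        contents = f.read()

print(contents)   # second, then third ("first" was overwritten)
```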
numpyarray.py
import numpy as np
a = np.array(['Andalus', 'Institute', 'Sanaa'])
print(type(a))
print(a)
Output
<class 'numpy.ndarray'>
['Andalus' 'Institute' 'Sanaa']
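Every ndarray also carries its shape and number of dimensions, which the later functions (zeros, ones, reshape) rely on. A small sketch; the two-dimensional array b is an illustrative addition, not part of the original slide.

```python
import numpy as np

a = np.array(['Andalus', 'Institute', 'Sanaa'])
b = np.array([[1, 2, 3], [4, 5, 6]])   # illustrative 2-D array

print(a.shape, a.ndim)   # (3,) 1
print(b.shape, b.ndim)   # (2, 3) 2
print(b.dtype)           # an integer dtype; the exact name depends on the platform
```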
The zeros(shape) function returns a NumPy array of the given shape, filled with zeros.
numpyzeros.py
import numpy as np
c = np.zeros(3)
print(c)
c1 = np.zeros((3, 3))   # a multi-dimensional shape is given as a tuple
print(c1)
Output
[0. 0. 0.]
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
The ones(shape) function returns a NumPy array of the given shape, filled with ones.
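ones() is used the same way as zeros(). The sketch below also passes the optional dtype argument, which is an addition here (not on the original slide), to get integer ones instead of the default floats.

```python
import numpy as np

o = np.ones(3)                    # 1-D array of three 1.0 values
o2 = np.ones((2, 3), dtype=int)   # shape as a tuple; dtype=int gives integers
print(o)     # [1. 1. 1.]
print(o2)    # [[1 1 1]
             #  [1 1 1]]
```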
The linspace(start, stop, num) function returns num evenly spaced numbers over the interval from start to stop; by default, stop itself is included.
numpylinspace.py
import numpy as np
c = np.linspace(0, 1, 11)
print(c)
Output
[0.  0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1. ]
Note: the arange function takes start, stop and step, whereas the linspace function takes start, stop and the number of elements we want.
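To make the note concrete, here is a sketch that produces the same spacing (0.25) both ways. One detail worth seeing: arange excludes stop, while linspace includes it by default.

```python
import numpy as np

# arange(start, stop, step): stop is excluded -> 4 values
r = np.arange(0, 1, 0.25)

# linspace(start, stop, num): stop is included by default -> 5 values
s = np.linspace(0, 1, 5)

print(r)
print(s)
```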
We can reshape an array into any compatible shape using the reshape method, which we learned on the previous slide.
Note: the number of elements in the original array must equal the product of the rows and columns of the new shape.
Example: here we have a one-dimensional array of 10 elements and the new shape is (5,2); since 5 * 2 = 10, it is a valid reshape.
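A short sketch of the rule above, assuming the 10-element array comes from np.arange(10) (the original array is not shown here). A shape whose product is not 10 makes NumPy raise a ValueError.

```python
import numpy as np

a = np.arange(10)       # one-dimensional array of 10 elements
b = a.reshape(5, 2)     # 5 * 2 = 10, so this is valid
print(b.shape)          # (5, 2)

reshape_failed = False
try:
    a.reshape(3, 3)     # 3 * 3 = 9, which is not 10
except ValueError as err:
    reshape_failed = True
    print('Invalid reshape:', err)
```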
Unit 02 – Overview of Python and Data Analysis 15
Aggregations
The min() function returns the minimum value of the ndarray. There are two ways to use it, and both are shown below.
numpymin.py
import numpy as np
l = [1, 5, 3, 8, 2, 3, 6, 7, 5, 2, 9, 11, 2, 5, 3, 4, 8, 9, 3, 1, 9, 3]
a = np.array(l)
print('Min way1 =', a.min())
print('Min way2 =', np.min(a))
Output
Min way1 = 1
Min way2 = 1
The max() function returns the maximum value of the ndarray. There are two ways to use it, and both are shown below.
numpymax.py
import numpy as np
l = [1, 5, 3, 8, 2, 3, 6, 7, 5, 2, 9, 11, 2, 5, 3, 4, 8, 9, 3, 1, 9, 3]
a = np.array(l)
print('Max way1 =', a.max())
print('Max way2 =', np.max(a))
Output
Max way1 = 11
Max way2 = 11
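min() and max() are only two of the aggregate methods an ndarray provides. This sketch reuses the same list and shows a few others that exist on every ndarray: sum(), mean(), and argmin()/argmax(), which return the position of the smallest and largest value rather than the value itself.

```python
import numpy as np

l = [1, 5, 3, 8, 2, 3, 6, 7, 5, 2, 9, 11, 2, 5, 3, 4, 8, 9, 3, 1, 9, 3]
a = np.array(l)

print('Sum  =', a.sum())                # 109
print('Mean =', a.mean())               # 109 / 22
print('Index of min =', a.argmin())     # 0 (first occurrence of 1)
print('Index of max =', a.argmax())     # 11 (position of the value 11)
```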
To get the sum of each row or each column, pass the axis argument to the aggregate function: axis=0 aggregates down each column, and axis=1 aggregates across each row.
numpyaxis.py
import numpy as np
array2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
print('sum (cols) =', array2d.sum(axis=0))   # axis=0: vertical, down each column
print('sum (rows) =', array2d.sum(axis=1))   # axis=1: horizontal, across each row
Output
sum (cols) = [12 15 18]
sum (rows) = [ 6 15 24]
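The axis argument works the same way with the other aggregate functions. A sketch using the same 3x3 array with min(), max() and mean():

```python
import numpy as np

array2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

print('min (cols)  =', array2d.min(axis=0))    # [1 2 3]
print('max (rows)  =', array2d.max(axis=1))    # [3 6 9]
print('mean (cols) =', array2d.mean(axis=0))   # [4. 5. 6.]
```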
Both methods are valid and give exactly the same answer, but single bracket notation is recommended: double bracket notation first creates a temporary sub-array for the third row and then fetches the second column from it.
Single bracket notation is also easier to read and write while programming.
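The two notations being compared can be sketched like this. The array used on the original slide is not shown here, so this assumes the same 3x3 array as numpyaxis.py. Both expressions select the element in the third row, second column.

```python
import numpy as np

array2d = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])

# Double bracket: array2d[2] first gives the third row, then [1] indexes into it
double = array2d[2][1]

# Single bracket: row and column indices in one indexing step
single = array2d[2, 1]

print(double, single)   # 8 8
```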
Series
Data Frames
Accessing text, CSV, Excel files using pandas
Accessing SQL Database
Missing Data
Group By
Merging, Joining & Concatenating
Operations
2- Pandas
Pandas is an open source library built on top of NumPy.
It allows for fast data cleaning, preparation and analysis.
It excels in performance and productivity.
It also has built-in visualization features.
It can work with data from a wide variety of sources.
Install:
conda install pandas
or
pip install pandas
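A quick sketch to confirm that the installation worked and to illustrate that pandas is built on top of NumPy. It assumes both numpy and pandas are installed; the column names x and y are hypothetical. A DataFrame can be built from an ndarray and converted back with to_numpy().

```python
import numpy as np
import pandas as pd

# If this import and print succeed, pandas is installed
print(pd.__version__)

# A DataFrame built from a NumPy array (column names are hypothetical)
arr = np.array([[1, 2], [3, 4]])
df = pd.DataFrame(arr, columns=['x', 'y'])

# ...and converted back to a plain ndarray
back = df.to_numpy()
print(type(back))   # <class 'numpy.ndarray'>
```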