Introduction to Python (Pendahuluan Python)
1. NumPy in Python
What is NumPy?
It is the fundamental package for scientific computing with Python. It contains various features,
including these important ones: a powerful N-dimensional array object, broadcasting functions,
tools for integrating C/C++ and Fortran code, and linear algebra, Fourier transform, and random
number capabilities.
Besides its obvious scientific uses, NumPy can also be used as an efficient multi-dimensional
container of generic data. Arbitrary data types can be defined using NumPy, which allows NumPy to
seamlessly and speedily integrate with a wide variety of databases.
An array in NumPy is a table of elements (usually numbers), all of the same type, indexed by a
tuple of non-negative integers. In NumPy, dimensions are called axes, and the number of axes is the
rank. NumPy’s array class is called ndarray; it is also known by the alias array.
https://colab.research.google.com/drive/1OUSf_USROWX1K4Nq36sjimD_vuJaM_e4#scrollTo=ijDLvkGvGVYd&printMode=true 1/29
2/23/2021 Pendahuluan Python.ipynb - Colaboratory
1. For example, you can create an array from a regular Python list or tuple using the array
function. The type of the resulting array is deduced from the type of the elements in the
sequences.
2. Often, the elements of an array are originally unknown, but its size is known. Hence, NumPy
offers several functions to create arrays with initial placeholder content. These minimize the
necessity of growing arrays, an expensive operation. For example: np.zeros, np.ones, np.full,
np.empty, etc.
3. To create sequences of numbers, NumPy provides a function analogous to range that returns
arrays instead of lists.
4. arange: returns evenly spaced values within a given interval. The step size is specified.
5. linspace: returns evenly spaced values within a given interval. The number of elements (num)
is specified.
6. Reshaping array: We can use reshape method to reshape an array. Consider an array with
shape (a1, a2, a3, …, aN). We can reshape and convert it into another array with shape (b1, b2,
b3, …, bM). The only required condition is: a1 x a2 x a3 … x aN = b1 x b2 x b3 … x bM . (i.e
original size of array remains unchanged.)
7. Flatten array: We can use the flatten method to get a copy of the array collapsed into one
dimension. It accepts an order argument. The default value is ‘C’ (for row-major order). Use ‘F’
for column-major order.
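The creation and reshaping functions listed above can be sketched as follows. This is a minimal illustration, not the notebook's original cell: the input values are arbitrary, except for the 3x4 array used with reshape, which is taken from the output shown further down.

```python
import numpy as np

# 1. Array from a Python list; the dtype is deduced from the elements
a = np.array([1.5, 2, 3])                   # one float makes the array float64

# 2. Arrays with placeholder content of a known shape
zeros = np.zeros((3, 4))                    # 3x4 array of 0.0
sixes = np.full((3, 3), 6)                  # 3x3 array filled with 6

# 4. arange: the step size is specified (the stop value is excluded)
f = np.arange(0, 30, 5)

# 5. linspace: the number of elements is specified (the stop value is included)
g = np.linspace(0, 5, 10)

# 6. reshape: 3 * 4 == 2 * 2 * 3, so the total size is unchanged
arr = np.array([[1, 2, 3, 4], [5, 2, 4, 2], [1, 2, 0, 1]])
newarr = arr.reshape(2, 2, 3)

# 7. flatten: returns a 1-D copy, row-major ('C') by default
flat_c = np.array([[1, 2, 3], [4, 5, 6]]).flatten()
flat_f = np.array([[1, 2, 3], [4, 5, 6]]).flatten(order='F')

print(f)       # [ 0  5 10 15 20 25]
print(flat_c)  # [1 2 3 4 5 6]
print(flat_f)  # [1 4 2 5 3 6]
```

Requesting a shape whose product differs from the original size (for example `arr.reshape(5, 3)`) raises a `ValueError`.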
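import numpy as np
# Reshape a 3x4 array into a 2x2x3 array
arr = np.array([[1, 2, 3, 4], [5, 2, 4, 2], [1, 2, 0, 1]])
newarr = arr.reshape(2, 2, 3)
# Flatten array
arr = np.array([[1, 2, 3], [4, 5, 6]])
flarr = arr.flatten()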
A random array:
[[0.70864301 0.1445599 ]
[0.62385575 0.05495546]]
Original array:
[[1 2 3 4]
[5 2 4 2]
[1 2 0 1]]
Reshaped array:
[[[1 2 3]
[4 5 2]]
[[4 2 1]
[2 0 1]]]
Original array:
[[1 2 3]
[4 5 6]]
Flattened array:
[1 2 3 4 5 6]
a = np.array([1, 2, 5, 3])
# transpose of array
a = np.array([[1, 2, 3], [3, 4, 5], [9, 6, 0]])
print("Original array:\n", a)
print("Transpose of array:\n", a.T)
Original array:
[[1 2 3]
[3 4 5]
[9 6 0]]
Transpose of array:
[[1 3 9]
[2 4 6]
[3 5 0]]
Binary operators: These operations apply to arrays elementwise, and a new array is created. You can
use all basic arithmetic operators like +, -, /, *, etc. In the case of the +=, -=, *= operators, the
existing array is modified.
a = np.array([[1, 2],
              [3, 4]])
b = np.array([[4, 3],
              [2, 1]])
# add arrays
print("Array sum:\n", a + b)
# multiply arrays (elementwise multiplication)
print("Array multiplication:\n", a * b)
# matrix multiplication
print("Matrix multiplication:\n", a.dot(b))
Array sum:
[[5 5]
[5 5]]
Array multiplication:
[[4 6]
[6 4]]
Matrix multiplication:
[[ 8 5]
[20 13]]
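The difference between a binary operator (which creates a new array) and its in-place counterpart (which modifies the existing array) can be checked with a short sketch. The name `alias` is introduced here only for illustration.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[4, 3], [2, 1]])

c = a + b          # binary operator: a new array is created, a is untouched
alias = a          # second name for the same array object
a += b             # in-place operator: the existing array is modified

print(c is a)      # False
print(alias)       # shows the updated values, because alias and a are the same array
```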
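a = np.array([[1, 4, 2],
              [3, 4, 6],
              [0, -1, 5]])
# sorted array (axis=None flattens the array before sorting)
print("Array elements in sorted order:\n",
      np.sort(a, axis=None))
# Creating a structured array: `values` must be a list of record tuples and
# `dtypes` a list of (field name, type) pairs that includes a 'name' field
arr = np.array(values, dtype=dtypes)
print("\nArray sorted by names:\n",
      np.sort(arr, order='name'))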
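Since the definitions of `values` and `dtypes` are not shown, here is a self-contained sketch of sorting a structured array by field. The field layout and the records are hypothetical and chosen only for illustration.

```python
import numpy as np

# Hypothetical field layout and records, for illustration only
dtypes = [('name', 'U10'), ('grad_year', int), ('cgpa', float)]
values = [('Carol', 2009, 8.5), ('Alice', 2008, 8.7), ('Bob', 2008, 7.9)]

arr = np.array(values, dtype=dtypes)

# Sort by a single field
by_name = np.sort(arr, order='name')

# Sort by graduation year first, then by cgpa within the same year
by_year_then_cgpa = np.sort(arr, order=['grad_year', 'cgpa'])

print(by_name['name'])
print(by_year_then_cgpa['name'])
```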
Pandas: It is an open-source, BSD-licensed library for Python. Pandas provides high-performance,
easy-to-use data structures and data analysis tools for manipulating numeric data and time series.
Pandas is built on the NumPy library and written in Python, Cython, and C. In pandas, we can
import data from various file formats like JSON, SQL, Microsoft Excel, etc.
# Printing dataframe
df
Numpy: It is the fundamental library for scientific computing in Python. It provides
high-performance multidimensional arrays and tools to work with them. A NumPy array is a grid of
values (all of the same type) indexed by a tuple of non-negative integers. NumPy arrays are fast,
easy to understand, and let users perform calculations across entire arrays.
[[23 46 85]
[43 56 99]
[11 34 55]]
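Numpy numpy.resize()
With the help of the resize() method of a NumPy array, we can change the size of the array in place.
The array can be of any shape; to resize it we just need the new shape, e.g. (2, 2), (2, 3) and so on.
When the new size is larger than the original, the method fills the missing entries with zeros. (The
function form np.resize(a, new_shape) instead returns a new array filled with repeated copies of a.)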
Example #1:
In this example we can see that, with the help of the .resize() method, we have changed the shape
of an array from 1×6 to 2×3.
print(gfg)
[[1 2 3]
[4 5 6]]
Example #2:
In this example we resize the array to a shape that needs more values than the array contains.
NumPy handles this situation by appending zeros for the values that do not exist in the array.
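# importing the python module numpy
import numpy as np
# create a 1-D array and enlarge it in place to 3x4; the missing entries become 0
ga = np.array([1, 2, 3, 4, 5, 6])
ga.resize(3, 4)
print(ga)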
[[1 2 3 4]
[5 6 0 0]
[0 0 0 0]]
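The zero padding shown above is the behavior of the array method. A minimal sketch comparing it with the function `np.resize()` follows. The `refcheck=False` argument is an addition made here so that the call does not fail when other references to the array exist, which can happen in interactive sessions.

```python
import numpy as np

# ndarray.resize(): works in place, returns None, pads with zeros when enlarging.
# refcheck=False skips the reference check that can fail in interactive sessions.
ga = np.array([1, 2, 3, 4, 5, 6])
ga.resize((3, 4), refcheck=False)

# np.resize(): returns a new array and fills it with repeated copies of the input
tiled = np.resize(np.array([1, 2, 3, 4, 5, 6]), (3, 4))

print(ga)
print(tiled)
```

If zero padding is wanted without modifying the original array, one option is to allocate with `np.zeros` and copy the values in.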
Both the reshape() and resize() methods of a NumPy array are used to change its shape. The
difference between them is that reshape() does not change the original array and only returns
the reshaped array, whereas resize() returns nothing (None) and directly changes the original
array.
Example 1: Using reshape()
# importing the module
import numpy as np
# creating an array
A = np.array([1, 2, 3, 4, 5, 6])
print("Original array:")
display(A)
# using reshape()
print("Changed array")
display(A.reshape(2, 3))
print("Original array:")
display(A)
Original array:
array([1, 2, 3, 4, 5, 6])
Changed array
array([[1, 2, 3],
       [4, 5, 6]])
Original array:
array([1, 2, 3, 4, 5, 6])
Example 2: Using resize()
# importing the module
import numpy as np
# creating an array
Aa = np.array([1, 2, 3, 4, 5, 6])
print("Original array:")
display(Aa)
# using resize()
print("Changed array")
# resize() returns None, so this displays None
display(Aa.resize(2, 3))
print("Original array:")
display(Aa)
Original array:
array([1, 2, 3, 4, 5, 6])
Changed array
None
Original array:
array([[1, 2, 3],
[4, 5, 6]])
print(B)
print(new)
[[1 2]
[4 5]
[7 8]]
[[1 2 4]
[5 7 8]]
Numpy.transpose()
With the numpy.transpose() method we can transpose an array in one line. It swaps the axes of a
2-D array (rows become columns); it has no effect on a 1-D array.
import numpy as np
# 3x3 input array
Aa = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]])
# before transpose
print(Aa, end='\n\n')
# after transpose
print(Aa.transpose())
[[1 2 3]
[4 5 6]
[7 8 9]]
[[1 4 7]
[2 5 8]
[3 6 9]]
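# 3x2 input array
Ab = np.array([[1, 2], [4, 5], [7, 8]])
# before transpose
print(Ab, end='\n\n')
# after transpose (axes given explicitly: axis 1 first, then axis 0)
print(Ab.transpose(1, 0))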
[[1 2]
[4 5]
[7 8]]
[[1 4 7]
[2 5 8]]
import numpy as np
my_array = np.array([[11,22,33],[44,55,66]])
print(my_array)
print(type(my_array))
[[11 22 33]
[44 55 66]]
<class 'numpy.ndarray'>
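import numpy as np
import pandas as pd
my_array = np.array([[11,22,33],[44,55,66]])
# convert the array to a DataFrame (without explicit labels, pandas numbers the columns 0, 1, 2)
df = pd.DataFrame(my_array)
print(df)
print(type(df))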
So here is the complete code to convert the array to a DataFrame with an index:
import numpy as np
import pandas as pd
my_array = np.array([[11,22,33],[44,55,66]])
print(df)
print(type(df))
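The line that builds the DataFrame is missing from the cell above. A minimal sketch of the conversion with an index follows; the column and index labels are hypothetical, since the original labels are not shown.

```python
import numpy as np
import pandas as pd

my_array = np.array([[11, 22, 33], [44, 55, 66]])

# Column and index labels are hypothetical, chosen only for illustration
df = pd.DataFrame(my_array,
                  columns=['Column_A', 'Column_B', 'Column_C'],
                  index=['Item_1', 'Item_2'])

print(df)
print(type(df))
```

The number of column labels must match the number of array columns, and the number of index labels must match the number of rows.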
import numpy as np
print(my_array)
print(type(my_array))
print(my_array.dtype)
import numpy as np
import pandas as pd
print(df)
print(type(df))
Let’s check the data types of all the columns in the new DataFrame by adding df.dtypes to the code:
import numpy as np
import pandas as pd
print(df)
print(type(df))
print(df.dtypes)
For example, suppose that you’d like to convert the last 3 columns in the DataFrame to integers.
import numpy as np
import pandas as pd
# mixing strings and numbers makes NumPy store every element as a string
my_array = np.array([['Jon',25,1995,2016],['Maria',47,1973,2000],['Bill',38,1982,2005]])
# 'Name' is an assumed label for the first column; the other labels are the ones converted below
df = pd.DataFrame(my_array, columns=['Name', 'Age', 'Birth Year', 'Graduation Year'])
df['Age'] = df['Age'].astype(int)
df['Birth Year'] = df['Birth Year'].astype(int)
df['Graduation Year'] = df['Graduation Year'].astype(int)
print(df)
print(type(df))
print(df.dtypes)
pd.concat([df1, df2])
import pandas as pd
print (df1)
import pandas as pd
print (df2)
import pandas as pd
You may then choose to renumber the index incrementally once you have concatenated the two
DataFrames, by passing ignore_index=True to pd.concat.
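A minimal sketch of concatenation with and without renumbering the index is given here. The two DataFrames and their column names are hypothetical, because the original definitions of df1 and df2 are not shown.

```python
import pandas as pd

# Hypothetical example data, for illustration only
df1 = pd.DataFrame({'product': ['AAA', 'BBB'], 'price': [100, 250]})
df2 = pd.DataFrame({'product': ['CCC', 'DDD'], 'price': [400, 500]})

# Default: each frame keeps its own index labels
stacked = pd.concat([df1, df2])

# ignore_index=True assigns a fresh incremental index
renumbered = pd.concat([df1, df2], ignore_index=True)

print(stacked)
print(renumbered)
```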
import pandas as pd
Pandas.DataFrame.loc
            max_speed  shield
cobra               1       2
viper               4       5
sidewinder          7       8
df.loc['viper']
max_speed 4
shield 5
Name: viper, dtype: int64
df.loc[['viper', 'sidewinder']]
            max_speed  shield
viper               4       5
sidewinder          7       8
df.loc['cobra', 'shield']
df.loc['cobra':'viper', 'max_speed']
cobra 1
viper 4
Name: max_speed, dtype: int64
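The cell that creates this DataFrame is not shown. The sketch below rebuilds it from the printed table and repeats the four label-based selections, so that the return type of each form of `.loc` is visible.

```python
import pandas as pd

# Rebuilt from the table printed above
df = pd.DataFrame([[1, 2], [4, 5], [7, 8]],
                  index=['cobra', 'viper', 'sidewinder'],
                  columns=['max_speed', 'shield'])

row = df.loc['viper']                              # single label: returns a Series
rows = df.loc[['viper', 'sidewinder']]             # list of labels: returns a DataFrame
cell = df.loc['cobra', 'shield']                   # row and column label: returns a scalar
col_slice = df.loc['cobra':'viper', 'max_speed']   # label slice: both ends are included

print(cell)
print(col_slice)
```

Unlike positional slicing in Python, a label slice with `.loc` includes the stop label.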
import pandas as pd
print (df)
set_of_numbers equal_or_lower_than_4?
0 1 True
1 2 True
2 3 True
3 4 True
4 5 False
5 6 False
6 7 False
7 8 False
8 9 False
9 10 False
How to get the same results as in case 1 by using lambda, where the conditions are:
If the number is equal to or lower than 4, then assign the value of ‘True’. Otherwise, if the number
is greater than 4, then assign the value of ‘False’.
import pandas as pd
print (df)
set_of_numbers equal_or_lower_than_4?
0 1 True
1 2 True
2 3 True
3 4 True
4 5 False
5 6 False
6 7 False
7 8 False
8 9 False
9 10 False
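Both outputs above can be reproduced with the following sketch. The data and the first column name come from the printed tables; the column name `with_lambda` is hypothetical and used only to keep the two results side by side.

```python
import pandas as pd

df = pd.DataFrame({'set_of_numbers': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]})

# Case 1: boolean masks with df.loc
df.loc[df['set_of_numbers'] <= 4, 'equal_or_lower_than_4?'] = 'True'
df.loc[df['set_of_numbers'] > 4, 'equal_or_lower_than_4?'] = 'False'

# Case 2: the same rule expressed with a lambda applied to each value
df['with_lambda'] = df['set_of_numbers'].apply(
    lambda x: 'True' if x <= 4 else 'False')

print(df)
```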
import pandas as pd
print (df)
First_name name_match
0 Jon Mismatch
1 Bill Match
2 Maria Mismatch
3 Emma Mismatch
import pandas as pd
print (df)
First_name name_match
0 Jon Mismatch
1 Bill Match
2 Maria Mismatch
3 Emma Mismatch
import pandas as pd
print (df)
First_name name_match
0 Jon Mismatch
1 Bill Match
2 Maria Mismatch
3 Emma Match
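The code cells for these string conditions are missing, so the following is a sketch inferred from the printed outputs: a match on 'Bill' alone, and a match on either 'Bill' or 'Emma'. The column name `name_match_either` is hypothetical.

```python
import pandas as pd

df = pd.DataFrame({'First_name': ['Jon', 'Bill', 'Maria', 'Emma']})

# One condition: only 'Bill' counts as a match
df['name_match'] = df['First_name'].apply(
    lambda x: 'Match' if x == 'Bill' else 'Mismatch')

# Two conditions combined with | (or): 'Bill' or 'Emma' counts as a match
either = (df['First_name'] == 'Bill') | (df['First_name'] == 'Emma')
df['name_match_either'] = 'Mismatch'
df.loc[either, 'name_match_either'] = 'Match'

print(df)
```

Each comparison needs its own parentheses, because `|` binds more tightly than `==` in Python.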
import pandas as pd
print (df)
set_of_numbers
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
9 10
10 0
11 0
set_of_numbers
0 1
1 2
2 3
3 4
4 555
5 6
6 7
7 8
8 9
9 10
10 999
11 999
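The change from the first table to the second (0 replaced by 999, and 5 replaced by 555) can be sketched with two conditional assignments. The code is inferred from the two outputs, not copied from the notebook.

```python
import pandas as pd

df = pd.DataFrame({'set_of_numbers': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 0, 0]})

# Overwrite only the rows where the condition holds
df.loc[df['set_of_numbers'] == 0, 'set_of_numbers'] = 999
df.loc[df['set_of_numbers'] == 5, 'set_of_numbers'] = 555

print(df)
```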
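import pandas as pd
import numpy as np
# ten numbers followed by two missing values
numbers = {'set_of_numbers': [1, 2, 3, 4, 5, 6, 7, 8, 9, 10, np.nan, np.nan]}
df = pd.DataFrame(numbers)
print (df)
# replace the missing values with 0
df.loc[df['set_of_numbers'].isnull(), 'set_of_numbers'] = 0
print (df)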
set_of_numbers
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
5 6.0
6 7.0
7 8.0
8 9.0
9 10.0
10 NaN
11 NaN
set_of_numbers
0 1.0
1 2.0
2 3.0
3 4.0
4 5.0
5 6.0
6 7.0
7 8.0
8 9.0
9 10.0
10 0.0
11 0.0
stats_numeric = df['Price'].describe()
print (stats_numeric)
count 5.000000
mean 27600.000000
std 4878.524367
min 22000.000000
25% 25000.000000
50% 27000.000000
75% 29000.000000
max 35000.000000
Name: Price, dtype: float64
You’ll notice that the output contains 6 decimal places. You may then add astype(int) to the code
to get integer values (the decimals are truncated, not rounded).
count 5
mean 27600
std 4878
min 22000
25% 25000
50% 27000
75% 29000
max 35000
Name: Price, dtype: int64
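The DataFrame behind these statistics is not shown. As an assumption, the sketch below uses five prices reconstructed from the summary itself: the minimum, the three quartiles and the maximum printed above pin them down, and they reproduce the mean and standard deviation.

```python
import pandas as pd

# Prices reconstructed from the summary statistics shown above (an assumption)
df = pd.DataFrame({'Price': [22000, 25000, 27000, 29000, 35000]})

stats_numeric = df['Price'].describe()

# astype(int) truncates the decimals, e.g. std 4878.52... becomes 4878
stats_int = stats_numeric.astype(int)

print(stats_numeric)
print(stats_int)
```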
stats_categorical = df['Brand'].describe()
print (stats_categorical)
count 5
unique 4
top Toyota Corolla
freq 2
Name: Brand, dtype: object
stats = df.describe(include='all')
print (stats)
1. Count
2. Mean
3. Standard deviation
4. Minimum
5. 0.25 Quantile
6. 0.50 Quantile (Median)
7. 0.75 Quantile
8. Maximum
count1 = df['Price'].count()
print('count: ' + str(count1))
mean1 = df['Price'].mean()
print('mean: ' + str(mean1))
std1 = df['Price'].std()
print('std: ' + str(std1))
min1 = df['Price'].min()
print('min: ' + str(min1))
quantile1 = df['Price'].quantile(q=0.25)
print('25%: ' + str(quantile1))
quantile2 = df['Price'].quantile(q=0.50)
print('50%: ' + str(quantile2))
quantile3 = df['Price'].quantile(q=0.75)
print('75%: ' + str(quantile3))
max1 = df['Price'].max()
print('max: ' + str(max1))
How to plot a DataFrame using pandas. Follow the complete steps to plot each of the following:
1. Scatter diagram
2. Line chart
3. Bar chart
4. Pie chart
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(data,columns=['Unemployment_Rate','Stock_Index_Price'])
df.plot(x ='Unemployment_Rate', y='Stock_Index_Price', kind = 'scatter')
plt.show()
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(data,columns=['Year','Unemployment_Rate'])
df.plot(x ='Year', y='Unemployment_Rate', kind = 'line')
plt.show()
import pandas as pd
import matplotlib.pyplot as plt
df = pd.DataFrame(data,columns=['Country','GDP_Per_Capita'])
df.plot(x ='Country', y='GDP_Per_Capita', kind = 'bar')
plt.show()
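The `data` dictionaries used in the cells above are not shown. The following self-contained sketch draws the scatter, line and bar variants from one hypothetical dataset. The values, the output file name `plots.png` and the use of the non-interactive `Agg` backend are all assumptions made so that the code runs outside a notebook.

```python
import matplotlib
matplotlib.use('Agg')   # non-interactive backend so the sketch also runs without a display
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data, for illustration only
data = {'Year': [2015, 2016, 2017, 2018],
        'Unemployment_Rate': [6.1, 5.8, 5.5, 5.3],
        'Stock_Index_Price': [1500, 1520, 1560, 1600]}
df = pd.DataFrame(data)

# One figure with three panels, one per plot kind
fig, axes = plt.subplots(1, 3, figsize=(12, 3))
df.plot(x='Unemployment_Rate', y='Stock_Index_Price', kind='scatter', ax=axes[0])
df.plot(x='Year', y='Unemployment_Rate', kind='line', ax=axes[1])
df.plot(x='Year', y='Unemployment_Rate', kind='bar', ax=axes[2])

fig.tight_layout()
fig.savefig('plots.png')
```

In a notebook, the backend line and `savefig` can be dropped and `plt.show()` used instead, as in the cells above.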
import pandas as pd
import matplotlib.pyplot as plt