0% found this document useful (0 votes)
25 views

Python - Vectorized - Tute - Jupyter Notebook

Uploaded by

Anvitha
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
0% found this document useful (0 votes)
25 views

Python - Vectorized - Tute - Jupyter Notebook

Uploaded by

Anvitha
Copyright
© © All Rights Reserved
We take content rights seriously. If you suspect this is your content, claim it here.
Available Formats
Download as PDF, TXT or read online on Scribd
You are on page 1/ 16

Python Tutorial for Vectorizing form of Implementation

Example 1: Illustrative Example of Neural Network Implementation via


Vectorizing the function
In [22]:  1 import numpy as np
2 ​
3 # Small number of data samples
4 ​
5 # x0, x1, x2 = 1., 2., 3.
6 # bias, w1, w2 = 0.1, 0.3, 0.5
7 ​
8 # x = [x0, x1, x2]
9 # w = [bias, w1, w2]
10 # x_vec, w_vec = np.array(x), np.array(w)
11 ​
12 # Large number of data samples
13 x, w = np.random.rand(100000), np.random.rand(100000)
14 x_vec, w_vec = np.array(x), np.array(w)
In [23]:  1 # Python code to demonstrate the working of # zip()
2
3 # # initializing lists
4 # name = [ "Manjeet", "Nikhil", "Shambhavi", "Astha" ]
5 # marks = [ 40, 50, 60, 70 ]
6
7 # # using zip() to map values
8 # mapped = zip(name, marks)
9 # # converting values to print as set
10 # mapped = set(mapped)
11
12 # # printing resultant values
13 # print ("The zipped result is : ",end="")
14 # print (mapped)
15 ​
16 # #Unzipping the Value Using zip()
17 # c, v, = zip(*mapped)
18 # print('c =', c)
19 # print('v =',v)
In [24]:  1 # Neural network output with For Loop statement
2 def forloop(x, w):
3 z = 0.
4 for i in range(len(x)):
5 z += x[i] * w[i]
6 return z
7 ​
8 # Neural network output with listcomprehension statement
9 def listcomprehension(x, w):
10 z = sum(x_i*w_i for x_i, w_i in zip(x, w))
11 return z
12 ​
13 # Neural network output with Vectorized form
14 def vectorized(x, w):
15 z = x_vec.dot(w_vec)
16 # z = (x_vec.transpose()).dot(w_vec)
17 return z
18 ​
19 ​

Comparison of Processing Speed of above three different forms of


implemenattion
In [25]:  1 # Method-1: forloop
2 import time
3 t10 = time.time()
4 print(forloop(x,w))
5 t11 = time.time()
6 time_forloop = t11 - t10
7 print(time_forloop)

25005.565200480993
0.03804349899291992
In [26]:  1 # Method-2: ListComprehension Implemenattion
2 import time
3 t20 = time.time()
4 print(listcomprehension(x,w))
5 t21 = time.time()
6 time_listComp = t21 - t20
7 print(time_listComp)

25005.565200480993
0.02210259437561035

In [27]:  1 # Method-3: Vectorized Implemenattion


2 import time
3 t30 = time.time()
4 print(vectorized(x_vec,w_vec))
5 t31 = time.time()
6 time_vectorized = t31 - t30
7 print(time_vectorized)

25005.565200480753
0.0009987354278564453
In [28]:  1 import matplotlib.pyplot as plt
2 %matplotlib inline
3 # plt.style.use('ggplot')
4 ​
5 x = ['Method-1', 'Method-2', 'Method-3']
6 Processing_Times = [time_forloop, time_listComp, time_vectorized]
7
8 ​
9 x_pos = [i for i, _ in enumerate(x)]
10 ​
11 plt.bar(x_pos, Processing_Times, color='green')
12 plt.xlabel("<-------------Method-----------> ")
13 plt.ylabel("<----------Time-------------> ")
14 plt.title(" Time Vs Method ")
15 ​
16 plt.xticks(x_pos, x)
17 ​
18 plt.show()
Example 2: Illustrative Example for Predicting House sales price using
Boston house dataset
In [29]:  1 import numpy as np
2 import matplotlib.pyplot as plt
3 %matplotlib inline
4 ​
5 from sklearn.datasets import load_boston
6 boston_data = load_boston()
7 print(boston_data['DESCR'])
.. _boston_dataset:

Boston house prices dataset


---------------------------

**Data Set Characteristics:**

:Number of Instances: 506

:Number of Attributes: 13 numeric/categorical predictive. Median Value (attribute 14) is usually the target.

:Attribute Information (in order):


- CRIM per capita crime rate by town
- ZN proportion of residential land zoned for lots over 25,000 sq.ft.
- INDUS proportion of non-retail business acres per town
- CHAS Charles River dummy variable (= 1 if tract bounds river; 0 otherwise)
- NOX nitric oxides concentration (parts per 10 million)
- RM average number of rooms per dwelling
- AGE proportion of owner-occupied units built prior to 1940
- DIS weighted distances to five Boston employment centres
- RAD index of accessibility to radial highways
- TAX full-value property-tax rate per $10,000
- PTRATIO pupil-teacher ratio by town
- B 1000(Bk - 0.63)^2 where Bk is the proportion of black people by town
- LSTAT % lower status of the population
- MEDV Median value of owner-occupied homes in $1000's

:Missing Attribute Values: None

:Creator: Harrison, D. and Rubinfeld, D.L.

This is a copy of UCI ML housing dataset.


https://archive.ics.uci.edu/ml/machine-learning-databases/housing/ (https://archive.ics.uci.edu/ml/machine-learn
ing-databases/housing/)

This dataset was taken from the StatLib library which is maintained at Carnegie Mellon University.

The Boston house-price data of Harrison, D. and Rubinfeld, D.L. 'Hedonic


prices and the demand for clean air', J. Environ. Economics & Management,
vol.5, 81-102, 1978. Used in Belsley, Kuh & Welsch, 'Regression diagnostics
...', Wiley, 1980. N.B. Various transformations are used in the table on
pages 244-261 of the latter.

The Boston house-price data has been used in many machine learning papers that address regression
problems.

.. topic:: References

- Belsley, Kuh & Welsch, 'Regression diagnostics: Identifying Influential Data and Sources of Collinearity',
Wiley, 1980. 244-261.
- Quinlan,R. (1993). Combining Instance-Based and Model-Based Learning. In Proceedings on the Tenth Internati
onal Conference of Machine Learning, 236-243, University of Massachusetts, Amherst. Morgan Kaufmann.
In [30]:  1 # take the boston data
2 data = boston_data['data']
3 # we will only work with two of the features: INDUS and RM
4 # x_input = data[:, [2,]] # for single feature of input data (INDUS)
5 x_input = data[:, [2,5]] # for two features of input data (INDUS and RM)
6 # x_input = data[:, [2,5,7]] # for three features of input data (INDUS,RM, and DIS)
7 # x_input = data[:, ] # All features of input data
8 y_target = boston_data['target']
9 # print(x_input.shape[1])
10 # print(x_input)
11 # print(y_target.shape[0])
12 # print(y_target)
13 ​
14 # Individual plots for the two features:
15 plt.title('Industrialness vs Med House Price')
16 plt.scatter(x_input[:, 0], y_target)
17 plt.xlabel('Industrialness')
18 plt.ylabel('Med House Price')
19 plt.show()
20 ​
21 plt.title('Avg Num Rooms vs Med House Price')
22 plt.scatter(x_input[:, 1], y_target)
23 plt.xlabel('Avg Num Rooms')
24 plt.ylabel('Med House Price')
25 plt.show()
26 ​
27 # plt.title('Avg weighted distances vs Med House Price')
28 # plt.scatter(x_input[:, 2], y_target)
29 # plt.xlabel('Avg weighted distances ')
30 # plt.ylabel('Med House Price')
31 # plt.show()
32 ​
Define cost function: Non-vectorized form
1 𝑁
(𝑦,𝑡) = 𝑁 ∑(𝑦(𝑖) − 𝑡(𝑖) )2
𝑖=1
1 𝑁
(𝑦,𝑡) = 𝑁 ∑(𝑤1 𝑥(𝑖)1 + 𝑤2 𝑥(𝑖)2 + 𝑏 − 𝑡(𝑖) )2
𝑖=1
In [31]:  1 # Non-vectorized implementation
2 def cost(w1, w2, b, X, t):
3 '''
4 Evaluate the cost function in a non-vectorized manner for
5 inputs `X` and targets `t`, at weights `w1`, `w2` and `b`.
6 '''
7 costs = 0
8 for i in range(len(t)):
9 # y_i = w1 * X[i, 0] + w2 * X[i, 0] + b # for single feature of input data
10 y_i = w1 * X[i, 0] + w2 * X[i, 1] + b # for two features of input data
11 # y_i = w1 * X[i] + w2 * X[i] + b # All features of input data
12 t_i = t[i]
13 costs += (y_i - t_i) ** 2
14 return costs / len(t)
15 ​

In [32]:  1 ​
2 cost(3, 5, 1, x_input, y_target)

Out[32]: 2475.821173270752

In [33]:  1 ​
2 cost(3, 5, 0, x_input, y_target)

Out[33]: 2390.2197701086957

Vectorizing the cost function:


(𝑦,𝑡) = 𝑁1 ‖𝐗𝐰 + 𝐛 − 𝐭‖2
In [35]:  1 def cost_vectorized(w1, w2, b, X, t):
2 '''
3 Evaluate the cost function in a vectorized manner for
4 inputs `X` and targets `t`, at weights `w1`, `w2` and `b`.
5 '''
6 N = len(y_target)
7 w = np.array([w1, w2])
8 # print(w)
9 y = np.dot(X, w) + b * np.ones(N)
10 cost_vect = np.sum((y - t)**2) / (N)
11 return cost_vect
12 ​

In [36]:  1 cost_vectorized(3, 5, 1, x_input, y_target)

Out[36]: 2475.821173270751

In [37]:  1 ​
2 ​
3 cost(3, 5, 0, x_input, y_target)

Out[37]: 2390.2197701086957

Comparing Processing Speed of the Vectorized vs Nonvectorized


code
We'll see below that the vectorized code already runs ~2x faster than the non-vectorized code! Hopefully this will convince you to always
vectorized your code whenever possible
In [38]:  1 import time
2 t40 = time.time()
3 print(cost(3, 5, 1, x_input, y_target))
4 t41 = time.time()
5 time_CostNonvect = t41 - t40
6 print(time_CostNonvect)

2475.821173270752
0.0009961128234863281

In [39]:  1 import time


2 t50 = time.time()
3 print(cost_vectorized(3, 5, 1, x_input, y_target))
4 t51 = time.time()
5 time_CostVect = t51 - t50
6 print(time_CostVect)

2475.821173270751
0.0009663105010986328
In [40]:  1 import matplotlib.pyplot as plt
2 %matplotlib inline
3 # plt.style.use('ggplot')
4 ​
5 x = ['Cost_NonVectorized', 'Cost_Vectorized']
6 Processing_Times = [time_CostNonvect, time_CostVect]
7
8 ​
9 x_pos = [i for i, _ in enumerate(x)]
10 ​
11 plt.bar(x_pos, Processing_Times, color='green')
12 plt.xlabel("<-------------Method-----------> ")
13 plt.ylabel("<----------Time-------------> ")
14 plt.title(" Time Vs Method ")
15 ​
16 plt.xticks(x_pos, x)
17 ​
18 plt.show()
In [ ]:  1 ​

You might also like