SIM - Chapters - DA T4

This document introduces programming tools and concepts for analytics programming using Python. It covers Google Colab as a programming tool, basic Python concepts like variables, strings, comments, operators and program structures. It also provides examples to explain each concept.

Uploaded by

Adam Taufik
Copyright
© © All Rights Reserved

4

INTRODUCTION TO ANALYTICS PROGRAMMING

LEARNING OUTCOMES
At the end of this topic, students should be able to:
• Utilize tools/software for analytics programming
• Explain variable, string, comments and programming structures
• Implement data management/manipulation codes using programming language
• Implement basic plots/visualization codes using programming language

INTRODUCTION
How do you implement a data analytics solution? When you have an analytics problem and want to
solve it using, for example, machine learning, you need to code the solution in a programming
language. To do so, you have to know the tools/software, which are introduced in this topic.
You also have to know the concept of a variable and basic programming structures such as strings,
arrays, repetition, conditions and functions. Finally, this topic shows you how to write code to
manage data and to visualize it using plots/graphs.

4.1 PROGRAMMING TOOLS/SOFTWARE


The programming language used in this course is Python, and the tool used for writing Python code
is Google Colaboratory, usually called Google Colab. Google Colab is a free tool provided by
Google Research.

Google Colab allows you to write Python code online, using a web browser as the interface. It does
not require any download or installation, so you can start coding in Python right away. The code
that you write is stored in a format known as a notebook, and each notebook file is saved with the
.ipynb extension. More information about Google Colab can be found at the link below:

Link on Google Colab information: https://research.google.com/colaboratory/faq.html

To start using Google Colab, log in to your Google account at https://colab.research.google.com. The
window shown in Figure 1 will appear in your browser. It has five tabs, which are described as
follows:
• Examples – contains some notebooks of Python programming examples.
• Recent – contains the recent notebooks that you have worked with.
• Google Drive – contains all notebooks in your Google Drive.
• GitHub – this tab allows you to load a notebook from GitHub.
• Upload – this tab allows you to load a notebook from a local directory.
If you are using Google Colab for the first time, do not have any notebook yet, or want to start a
new notebook, click the “New notebook” link at the bottom right of the window in Figure 1.

Figure 1: Google Colab prompted window

The notebook is shown in Figure 2. The component marked “1” in the notebook is the cell, which is
where you write your Python code. It is advisable to name your notebook before you start working
on the code. To name your notebook, click on the part marked “2” and give it a meaningful name.
To start, name the notebook myFirstNb.ipynb.

Figure 2: Google Colab notebook


4.2 FIRST CODES
To write your first Python code, type and run the following code:

2+3

The code should be written in the cell. To run the code, either click the “Run” button or press
Ctrl + Enter. You should see the output 5 displayed in the output segment, which is the segment
underneath the cell. Figure 3 shows these components.

Figure 3: Google Colab interface for writing Python codes (labelled components: Python code, Output, Run button)

You may write as much code as you like in a single cell. However, it is useful to put code with
different objectives/purposes in separate cells. To create a new cell, click the “+ Code” button at
the top left of the Google Colab notebook, as shown in Figure 2. A new cell will be created
underneath the previous one.

In the new cell, type and run:

print ("Hello World!")

The text is displayed in the output segment. Note this important point: the output can be in
numerical or text-based format.

SELF-LEARNING ACTIVITY
Differentiate between the output displayed by a bare mathematical expression and the output
displayed by the print function.

4.3 VARIABLES AND MATHEMATICAL OPERATIONS


In a single cell, type and run the following codes:

a = 10
b = 15
c=a+b
d=a*b

The code above shows another important behaviour of the Python language: a value can be stored
in a placeholder known as a variable. A variable holds a single value. In this case, a, b, c and d
are variables, each holding a single value.
Note also that a variable can hold a literal value, as shown in the first and second lines (a and b),
or the outcome of another operation, as shown in the third and fourth lines (c and d). To display
the value of a variable, you can either type its name or use print (variable name). The values of
the variables above can be displayed by typing and running the following:

a

This will print the value of a, which is 10.

print (b)
print (c)

These will print the values of b and c. Note, however, that when several variable names are typed
on separate lines and run together, only the value of the last one is displayed:

a
b
c

This will print the value of c only, which is 25.

Another important point to be aware of is the naming of variables. Python sets some rules for
variable names, as follows:
• Must start with a letter or the underscore character
• Cannot start with a number
• Can only contain alphanumeric characters and underscores (A-Z, a-z, 0-9, and _ )
• Case-sensitive (name, Name and NAME are three different variables)
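
The rules above can be checked in code. The sketch below is only an illustration: the helper name is_valid_name and the candidate names are made up for this example. It also rejects reserved words such as for, which is an additional restriction in Python that is not listed above.

```python
import keyword

def is_valid_name(name):
    """Return True if name can be used as a Python variable name."""
    # str.isidentifier() checks the character rules listed above;
    # keyword.iskeyword() rejects reserved words such as "for" or "class".
    return name.isidentifier() and not keyword.iskeyword(name)

# Hypothetical candidate names, chosen only for illustration.
candidates = ["total_marks", "_temp", "mark2", "2mark", "total-marks", "for"]
for name in candidates:
    print(name, "->", is_valid_name(name))
```

The first three candidates follow the rules; the last three start with a number, contain a hyphen, and are a reserved word, respectively.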

SELF-LEARNING ACTIVITY
Give three examples each for valid and invalid variable names.

4.4 STRING
As in other programming languages, a Python variable can also hold text-based values made up of a
sequence of characters. This type of value is known as a string. Type and run the following codes:

str1 = "football"
str2 = "rugby"
str3, str4, str5 = "cycling", "judo", "table tennis"
print ("I like to watch " + str4)
print (str1 + " and " + str5 + " are among the games in the Summer Olympics")

Lines 1 to 3 show how strings are defined, namely by assigning them to variables. Lines 4 and 5
show how strings are concatenated. Concatenation can be applied to literal strings, to variables
that hold string values, or to a combination of both.
As mentioned above, a string is a sequence of characters, so its individual characters can be
accessed as follows:

print (str2[0])
print (str2[3])

The two lines above print the characters of str2 that are located at positions 1 and 4. The number
in the square brackets represents the position of the character in str2. This position number is
normally called an index. Note that the index starts at 0 (instead of 1) and ends at n-1, where n
is the number of characters in the string.
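
As a minimal sketch of the index range, the code below uses a variable named sport (a name chosen for this example) and shows the first and last valid indexes. It uses try/except, which is not covered in this topic, only to show the error raised by an index outside the range without stopping the cell.

```python
sport = "rugby"          # example string, assumed for illustration

n = len(sport)           # number of characters: 5
first = sport[0]         # index 0 is the first character
last = sport[n - 1]      # index n-1 is the last character

print(first, last)

# An index of n or more is outside the string and raises IndexError.
try:
    sport[n]
except IndexError as err:
    print("IndexError:", err)
```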

Furthermore, Python also provides some methods that can be used to return specific information/values
about strings. For example:

print (len(str3))

will print the number of characters contained in str3. This is done by the built-in len() function.

Another example is the split() method, which splits a string at a specified separator. The code
below splits str5 at the space separator. Hence lines 2 and 3 below print “table” and “tennis”,
respectively.

strnew = str5.split(" ")


print (strnew[0])
print (strnew[1])

SELF-LEARNING ACTIVITY
There are many more functions that can be used to manipulate strings. Find three of them and
implement them in Python to show how they work.

4.5 COMMENTS
Python allows you to write comments in your code. A comment is text that Python does not execute;
it is created by starting the line with #. Comments are normally used to describe code. For
example:

#This is to convert Celsius to Fahrenheit


C = 31
f = 9 / 5 * C + 32

Line 1 will not be executed; it only explains what the code that follows will do.
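
A comment can also follow code on the same line, and placing # in front of a statement stops that statement from running. The short sketch below illustrates both; the variable names celsius and fahrenheit are chosen for this example.

```python
# Convert a temperature from Celsius to Fahrenheit.
celsius = 31
fahrenheit = 9 / 5 * celsius + 32  # an inline comment can follow code on the same line

# print(celsius)   <- this statement is commented out, so it is skipped
print(fahrenheit)
```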

4.6 OPERATORS
In Python, the most commonly used operators are mathematical, comparison and logical operators.

The mathematical operators are:


+ Addition
- Subtraction
* Multiplication
/ Division
% Modulus (divides two operands and returns the remainder)
** Exponent
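
The operators listed above can be tried on a pair of numbers. This is only a small illustration; the values 17 and 5 are arbitrary.

```python
a = 17
b = 5

print(a + b)    # addition
print(a - b)    # subtraction
print(a * b)    # multiplication
print(a / b)    # division always returns a decimal (float) value
print(a % b)    # modulus: the remainder of 17 divided by 5
print(a ** 2)   # exponent: 17 squared
```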

The comparison operators are:


== Equality (to check whether two operands are equal)
!= Inequality (to check whether two operands are not equal; the older <> form is not valid in Python 3)
> Greater than (to check whether left operand is greater than right operand)
< Less than (to check whether left operand is less than right operand)
>= Greater than or equal (to check whether left operand is greater than or equal to right operand)
<= Less than or equal (to check whether left operand is less than or equal to right operand)
Comparison operators generate a result of either True or False.

The logical operators are:

and If both operands (left and right) are true, then condition becomes true
or If any of the two operands (left or right) is true, then condition becomes true
not Returns the reverse logical value of the operand
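
One way to see the behaviour of the logical operators is to print every combination of two True/False values. The sketch below does this with a loop, which is introduced later in this topic; the names p and q are arbitrary.

```python
# Print every combination of two Boolean values.
for p in (True, False):
    for q in (True, False):
        print(p, q, "| and:", p and q, "| or:", p or q)

print("not True  =", not True)
print("not False =", not False)
```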

SELF-LEARNING ACTIVITY
Run the following codes containing the comparison and logical operators, and see the results:

x = 10
y = 11
print (x == y)
print (x != y)
print (x > y)
print (x < y)
print (x <= y)
print (x >= y)

st1 = x < y and x == 11


st2 = x < y or x == 11
print (st1)
print (st2)
print (not st2)

4.7 PROGRAM STRUCTURES

4.7.1 Decision
Decision is the process of checking a condition and determining the action to take according to
that condition. In Python, the statements that belong to an if block must be indented. Type and
run the following codes:
Single condition:

cr = 1.5
if (cr == 1.5) :
    print ("Warning!")
Note: You may change the cr value to another number and see what happens.

Two conditions:

cr = 0.8
if ( cr == 1.5 ) :
    print ("Warning!")
else :
    print("Normal")

Multiple conditions:

cr = 1.5
if ( cr >= 1.5 ):
    print ("Critical")
elif (cr >=1.0 and cr <1.5):
    print("Warning")
else:
    print("Normal")
Note: Change cr to 1.1 and 0.7, and observe the output.

In some cases, several single-condition if statements need to be nested so that separate conditions
are evaluated one after another and together determine the result. Type and run the example
below:

spe = 9505
pre = 13000
tem = 165
if ( spe >= 9500 ):
    if ( pre >= 12800):
        if (tem >= 150):
            print ("Equipment FAIL")

The code above comprises three nested if statements; the text is printed only if all of them
evaluate to True. Alternatively, the code can be written using logical operators, as shown
below:

if(spe>=9500 and pre>=12800 and tem>=150):
    print ("Equipment FAIL")

Although this code is written differently, it produces the same result as the former code.
Recall the behaviour of the logical operators discussed earlier. Another example of an if
statement whose condition uses logical operators is as follows:

if (accuracy == 90 or precision >= 70):
    print ("Good!")
else:
    print ("Repeat!")

In the code above, if the accuracy value is equal to 90, or the precision value is greater than or
equal to 70, “Good!” will be printed. Otherwise, “Repeat!” will be printed.
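
The fragment above does not define accuracy or precision, so it cannot run on its own. The sketch below assigns hypothetical values to both (85 and 72, chosen only for illustration) and compares the or condition with the same condition written using and.

```python
# Hypothetical scores, chosen only for illustration.
accuracy = 85
precision = 72

# With "or", one True condition is enough.
if accuracy == 90 or precision >= 70:
    result = "Good!"
else:
    result = "Repeat!"
print(result)

# With "and", both conditions must be True, so the same scores fail.
if accuracy == 90 and precision >= 70:
    result_and = "Good!"
else:
    result_and = "Repeat!"
print(result_and)
```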

**See more examples and explanations of Python decision statements:
https://www.youtube.com/watch?v=Zp5MuPOtsSY

4.7.2 Repetition
Repetition, also known as looping, is a statement that executes a line or block of code several
times. A repetition also runs based on condition(s). As with if statements, the code inside a loop
must be indented. Type and run the following examples:

Example 1:
for x in range(10):
    print(x)

Example 2:
for x in range(10):
    print(x, end=' ')

Example 3:
for num1 in range(3):
    for num2 in range(10, 14):
        print(num1, ",", num2)
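
The examples above rely on how range() counts. As a brief sketch, the code below lists the values that range() produces and then uses a loop to accumulate a total; the variable name total is chosen for this example.

```python
# range(stop) counts from 0 up to, but not including, stop.
print(list(range(5)))        # [0, 1, 2, 3, 4]

# range(start, stop) begins at start instead of 0.
print(list(range(10, 14)))   # [10, 11, 12, 13]

# A loop is often used to accumulate a result.
total = 0
for x in range(1, 6):
    total = total + x
print(total)                 # 1 + 2 + 3 + 4 + 5 = 15
```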

**See more examples and explanations of Python repetition:
https://www.youtube.com/watch?v=94UHCEmprCY

4.7.3 Functions
A function is a block of code that is executed when it is called by its name. So far, we have seen
the print function, which displays the values we supply in the parentheses (these values are called
arguments). The print function and many others are predefined functions provided by Python and its
libraries. Besides predefined functions, we may also create our own functions, which are known as
user-defined functions.

Type and run the following codes:

def ex_function():
    print("Hello from a function")
ex_function() #function call

The simple code above defines a function that prints the text “Hello from a function”. The text is
printed only when the function is called by its name, in this case ex_function. The call
ex_function() mimics the way the print function is called. That is how a function is executed.

Try the following codes:

def bmi_score(w,h):
    return w/(h*h)
print(bmi_score(90,1.75))
print(bmi_score(51,1.53))
print(bmi_score(45,1.51))
print(bmi_score(89,1.65))

The code above shows another example of a function definition and call. In this case, the same
function is called four times. This is one of the main purposes of having functions in code, i.e.,
code reusability.
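
A function can also contain the decision structure from section 4.7.1. The sketch below wraps the cr thresholds used there in a function; the function name cr_status is made up for this example.

```python
def cr_status(cr):
    """Return a status label for a cr value (thresholds from section 4.7.1)."""
    if cr >= 1.5:
        return "Critical"
    elif cr >= 1.0:
        return "Warning"
    else:
        return "Normal"

# The same function is reused for several values.
for value in [1.5, 1.1, 0.7]:
    print(value, "->", cr_status(value))
```

Because the function returns the label instead of printing it, the result can be stored in a variable or used in another condition.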

4.7.4 Library and Array


Python comes with numerous libraries, each containing functions that you may use in your code.
One example is numpy, which provides many functions, including support for arrays.

To understand arrays, first note that the variables you have seen so far are ordinary variables.
An ordinary variable can store only a single value, e.g.:

num1 = 3
num1 = 3 * 12
print(num1)

The code above shows that when the same variable, num1, is assigned a value twice, only the latest
value is printed. The latest value assigned overwrites the previous one, because a variable can
store only a single value.

Imagine that you are dealing with 100 marks or 500 salaries: you would need 100 and 500 separate
variables, respectively. This is cumbersome. Fortunately, an array allows a single variable to
hold multiple values of the same type. To use arrays, import the numpy library as follows:

import numpy #importing library


arr = numpy.array([10, 22, 35, 44, 51]) #using function
print(arr)

The arr variable can now store five values because it is defined as an array. Note that the array
is created by calling the numpy.array() function. This function can be used only after the numpy
library has been imported, as shown in the first line of the code above.
Furthermore, the code above prints all five numbers at once. You may also print a selected value
(each value in an array is called an element) from the arr array using an index. An index is
written in a pair of square brackets and gives the position of the element in the array. In Python,
the index starts at 0. Type and run the following code and see what happens:

print(arr[0])
print(arr[4])
print(arr[2])

Be careful that the index does not exceed its boundary. An out-of-bounds index is one of the
common errors made with arrays. It usually happens because the index of the last element is the
number of elements minus 1 (as the index begins at 0). Type and run the following code and see
what happens:

print(arr[5])
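
One way to stay within the boundary is to compute the last index from the length of the array. The sketch below does this and also shows the error raised by index 5; it uses try/except, which is not covered in this topic, only so that the cell keeps running after the error.

```python
import numpy

arr = numpy.array([10, 22, 35, 44, 51])

last_index = len(arr) - 1    # 5 elements, so the last index is 4
print(arr[last_index])

# Index 5 is out of bounds for a five-element array.
try:
    print(arr[5])
except IndexError as err:
    print("IndexError:", err)
```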

Another important behaviour of arrays is that their elements must be of the same data type. Run
the following code and draw a conclusion about the matter:

a_list = numpy.array([1,25,"Three"])
print(a_list[0]+a_list[1])

You might expect the code above to print 26, the sum of the first and second elements of the
a_list array. However, numpy converts all the elements of the a_list array to strings because one
of them (the third element) is a string. Consequently, the + operator is treated as the string
concatenation operator instead of the addition operator, and the output is 125.

For the printed output to be 26, the code should be written as follows:

b_list = numpy.array([1,25,3])
print(b_list[0]+b_list[1])
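
The data type that numpy has chosen for an array can be inspected through its dtype attribute. The sketch below compares a mixed array with a purely numerical one; the names mixed and numbers are chosen for this example.

```python
import numpy

mixed = numpy.array([1, 25, "Three"])
numbers = numpy.array([1, 25, 3])

# dtype.kind is "U" for text (Unicode strings) and "i" for integers.
print(mixed.dtype.kind, numbers.dtype.kind)

print(mixed[0] + mixed[1])              # the elements are strings, so + joins them
print(int(mixed[0]) + int(mixed[1]))    # converting back to int gives real addition
print(numbers[0] + numbers[1])          # integer elements are added directly
```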

4.7.5 Dictionary
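A dictionary is a changeable collection of key-value pairs in which each value is accessed
(indexed) by its key. Since Python 3.7, a dictionary also keeps its items in insertion order. Type
and run:

cars = {
    "brand": "Proton",
    "model": "Preve",
    "year": 2015
}
print(cars)
m = cars['model']      # look up a value by its key
print(m)
m = cars.get('year')   # get() also looks up a value by its key
print(m)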

To add an item:

cars["color"] = "Blue"
print(cars)

To delete an item:

del cars["model"]
print(cars)
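
A few other common dictionary operations are sketched below: testing whether a key exists, supplying a default for a missing key, and looping over all items. The default text "unknown" is an arbitrary choice for this example.

```python
cars = {"brand": "Proton", "model": "Preve", "year": 2015}

# "in" tests whether a key exists.
print("model" in cars, "color" in cars)

# get() accepts a default that is returned when the key is missing,
# whereas cars["color"] would raise KeyError.
color = cars.get("color", "unknown")
print(color)

# items() yields each key together with its value.
for key, value in cars.items():
    print(key, "=", value)
```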

4.7.6 Data Frame (Table)


The data frame (table) is one of the most important data types for analytics in Python, because
data analytics normally involves data arranged in rows and columns. To work with data frames,
Python provides a powerful library known as pandas.

Type and run the following codes:

import pandas as pd
data = {'Name':['Carrol', 'Mike', 'John'],'Gender':['Female', 'Male', 'Male'],
'Height':[160,175,173], 'Weight':[49,89,77], 'Age':[35,36,41]}
df = pd.DataFrame(data)
print(df)

You should get a neatly formatted data frame containing three observations (rows) and five
columns.

As with arrays, you may access specific values (data) in the data frame using index operators.
Based on the data frame you created earlier, run each of the following examples and draw a
conclusion about what it does:

Example 1:
print(df['Height'])
print(df.loc[:,'Height'])

Example 2:
print(df.loc[:,['Name','Age']])
print(df[['Name','Age']])

Example 3:
print(df.loc[2])
print(df.loc[1:2])
print(df.loc[[1,2]])

Example 4:
print(df.loc[[0,1],['Name','Weight']])

Example 5:
print(df.iloc[:,2])
print(df.iloc[2])
print(df.iloc[2,4])
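
In the examples above, the row labels happen to be the same as the row positions (0, 1, 2), so loc and iloc can look alike. The sketch below gives the rows hypothetical text labels ("a", "b", "c"), an assumption made only to separate the two: loc selects by label, while iloc selects by position.

```python
import pandas as pd

df = pd.DataFrame(
    {"Name": ["Carrol", "Mike", "John"], "Age": [35, 36, 41]},
    index=["a", "b", "c"],   # hypothetical row labels, to make loc and iloc differ
)

print(df.loc["b", "Age"])    # by label: row "b", column "Age"
print(df.iloc[1, 1])         # by position: second row, second column

# Label slices include the end point; position slices exclude it.
print(df.loc["a":"b"])       # rows "a" and "b"
print(df.iloc[0:1])          # the row at position 0 only
```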

Consider the following codes:


h1=df[['Height']]
h2=df['Height']

When the two codes above are printed as follows:


print(h1)
print(h2)

You will find that although both lines display the same three height values, the displays differ
slightly. h1 holds the height values as a data frame (a table with one column), whereas h2 holds
them as a Series (a single column of numerical values). To see the difference clearly, let’s apply
the built-in sum function, which should add all three height values, to h1 and h2:

print(sum(h1))
print(sum(h2))

You will find that with h1 the operation fails with an error, because Python’s built-in sum goes
through the column names of a data frame, not its values. In contrast, the operation succeeds on
h2 because going through a Series yields its three numerical height values.
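
The difference can be inspected directly. The sketch below rebuilds only the Height column so that it runs on its own, prints the type of each object, and uses the pandas sum() method, which works on both forms.

```python
import pandas as pd

df = pd.DataFrame({"Height": [160, 175, 173]})
h1 = df[["Height"]]   # double brackets give a DataFrame
h2 = df["Height"]     # single brackets give a Series

print(type(h1).__name__, type(h2).__name__)

print(list(h1))       # going through a DataFrame yields its column names
print(h2.sum())       # Series.sum() adds the values
print(h1.sum())       # DataFrame.sum() returns one total per column
```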

4.7.7 Managing Empty Cells


In this section, you will learn how to manage empty cells in a data frame. An empty cell occurs in
a dataset when no data is provided. To see the impact of empty cells in a dataset, first create the
data frame as follows:

import pandas as pd
import numpy as np
data = {'Name':['Ali', 'Abu', 'George', 'Mike', 'Chan', 'Sammy'],
'Marks':[70, 65,np.nan, 82, 78, 75]}
score = pd.DataFrame(data)

The above codes create a data frame containing two columns i.e. “Name” and “Marks” with six
observations/records (rows). To display the data frame, you may type and run:

print(score)

The table shows that the third observation has an empty cell, marked “NaN”. Now, using the
predefined function sum(), calculate the sum of all marks. The following code carries out the
task:

print(sum(score['Marks']))

You will find that the code above prints nan instead of a total. NaN (“not a number”) is a special
floating-point value, and any arithmetic that involves it also yields NaN, so a single empty cell
spoils the whole sum. To handle this issue, you need to treat the NaN value so that it does not
affect the calculation. One way is to replace the NaN value with 0. Another way is to drop the
rows containing NaN using the pandas function known as dropna():

score2 = score.dropna()
print(sum(score2['Marks']))
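
Both treatments can be sketched side by side. The code below assumes the pandas fillna() method as one way of setting NaN to 0 (the text above does not name a function for this), and it also shows that the pandas sum() method skips NaN by default. The variable names filled and dropped are chosen for this example.

```python
import numpy as np
import pandas as pd

score = pd.DataFrame({
    "Name": ["Ali", "Abu", "George", "Mike", "Chan", "Sammy"],
    "Marks": [70, 65, np.nan, 82, 78, 75],
})

print(score["Marks"].isna().sum())    # number of empty cells in the column

filled = score.fillna({"Marks": 0})   # replace NaN in Marks with 0
print(sum(filled["Marks"]))

dropped = score.dropna()              # remove rows that contain NaN
print(len(dropped), sum(dropped["Marks"]))

print(score["Marks"].sum())           # the pandas sum() method skips NaN by default
```

The two treatments give the same total here, but they are not interchangeable in general: filling with 0 keeps six observations and therefore lowers the average, whereas dropping the row leaves five.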

4.7.8 Importing and Exporting Data


The most common way of getting data for analysis is by importing a csv or Excel dataset. Python
provides this capability through pandas functions.

Assuming that there is a dataset named ds1.csv in the working folder (working path), the following
code imports the data into the Python environment:

my = pd.read_csv("ds1.csv")

To display the data, call the variable (data frame) name:

my

The data import can be checked using the head() function, which displays only the first n
observations (five by default). This is useful when the data is large:

my.head()

Summary of data containing some statistical measurements can be retrieved by using describe()
function:

my.describe()

To save a csv file, use the to_csv() function:

my.to_csv("ds1.csv")

For Excel files, the functions read_excel() and to_excel() are used for importing and saving Excel files,
respectively.

The examples above apply when the dataset is located in the working directory (i.e. the directory
where you set the Jupyter Notebook to work). If the dataset is located outside the working
directory, the path to the dataset has to be supplied in the import/save functions. For instance:

pd.read_csv("C:/Users/johnny/Py/ds1.csv")

Note: to check the working directory, type and run the following code:

import os
os.getcwd()
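
Saving and importing can be tried without an existing dataset by writing a small data frame to a file and reading it back. This is a minimal sketch: the file name demo.csv and the temporary folder are assumptions made so that no existing file is overwritten.

```python
import os
import tempfile

import pandas as pd

df = pd.DataFrame({"Name": ["Carrol", "Mike"], "Age": [35, 36]})

# Write to a temporary folder so that no existing file is overwritten.
folder = tempfile.mkdtemp()
path = os.path.join(folder, "demo.csv")   # hypothetical file name

df.to_csv(path, index=False)   # index=False leaves out the row numbers
loaded = pd.read_csv(path)

print(loaded)
```

Without index=False, to_csv() also writes the row numbers as an extra column, which shows up as an unnamed column when the file is read back.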

SELF-LEARNING ACTIVITY
Find a dataset known as Iris. Import and display the dataset from Iris into a table in the Google Colab.

4.7.9 Visualization/Plotting
A commonly used library for plotting in Python is matplotlib; its pyplot module is used for the
examples in this book. Type and run the following simple plot:

import matplotlib.pyplot as plt


# set the x values
x = [1, 2, 3, 4, 5]
# set the y values
y = [2, 4, 6, 8, 10]
#plot x and y
plt.plot(x, y)
# set the label of x-axis
plt.xlabel('Numbers')
# set the label of y-axis
plt.ylabel('Doubles')

The following plot will be displayed:

In most cases, the properties of the plotting functions can be changed so that a different
visualization is displayed. For instance, the fourth line of the code above can be changed to:

plt.plot(x, y, 'go')

This will present the points as green dots:

When more points/lines are added to the code above, the plot changes accordingly. For instance:

import matplotlib.pyplot as plt
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
a = [1, 2, 3, 4, 5]
b = [3, 5, 7, 9, 11]

plt.plot(x, y, 'go')
plt.plot(a, b, 'b*')
plt.xlabel('Numbers')
plt.ylabel('Doubles')

The following plot will be generated:


Other than the line and scatter plots shown above, Python may also be used to plot many other types of
graphs/charts. For example, the following codes display a bar chart:

Employee = ['John','Mike','Brenda','Tony','Miranda']
Salary = [145000,92000,152000,79000,87000]
plt.bar(Employee, Salary)
plt.title('Employees Annual Salary')
plt.xlabel('Employee')
plt.ylabel('Annual Salary')
plt.show()

The following code plots a pie chart for the data:

import numpy as np
medals = ['USA', 'Britain', 'China','Russia', 'Germany', 'Japan', 'France']
data = [46, 27, 26, 19, 17, 12, 10]
fig = plt.figure(figsize =(5, 7))
plt.pie(data, labels = medals)
plt.title('Gold Medals by Top 7 Countries in 2016 Olympics')
plt.show()

Thus far, the examples have shown how to plot graphs/charts using hard-coded data (data written
directly in the code). The data to plot may also be retrieved from a library or from csv/Excel
files, as shown in the earlier section.
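
As a sketch of plotting from a data frame rather than from separate lists, the code below builds a small data frame in place (in practice it could come from pd.read_csv()) and passes its columns to a bar chart. The chart title is an arbitrary choice for this example.

```python
import matplotlib.pyplot as plt
import pandas as pd

# In practice df could come from pd.read_csv(); it is built here so that
# the example runs without any external file.
df = pd.DataFrame({
    "Name": ["Carrol", "Mike", "John"],
    "Height": [160, 175, 173],
})

fig, ax = plt.subplots()
ax.bar(df["Name"], df["Height"])   # one bar per row of the data frame
ax.set_title("Height by Name")
ax.set_xlabel("Name")
ax.set_ylabel("Height")
plt.show()
```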

SELF-LEARNING ACTIVITY
Using the Iris dataset you imported earlier, come out with two visualizations to represent the data.

**See the videos below for further explanation about the contents of this topic:
https://recordings.roc2.blindsidenetworks.com/utp/9058d062d33ebba8947bd34bbb62d4c14b1ab850-1620175923063/capture/
https://recordings.roc2.blindsidenetworks.com/utp/4df8fdc80b974c3a0cdcbee87fcfb1a859a94559-1620195618605/capture/

SUMMARY
In this topic, you have learned about important data analytics tools, i.e., the Python programming
language and Google Colab as the development environment. The fundamentals of Python, such as
variables, strings, comments and program structures, have been discussed; they are the basic
constructs of any Python-based program/solution. This topic has also discussed the application of
Python in data science activities, namely data management/manipulation and basic
plots/visualization. You will apply all this knowledge and these skills in developing predictive
analytics solutions, which are covered in the next topic.

KEYWORDS
Python, Python structure, data analytics, data management with Python, visualization
