01 - CM2015 - Introduction To Data Programming (2022-10)


CM2015 – Programming with Data [SIM – UOL]

Topic 1
Introduction to Data Programming

Learning Outcomes
After completing this topic and the recommended reading, you should be able
to:
• Set up and run Jupyter Notebook on a Windows, Mac or Linux operating
system.
• Use Jupyter Notebook to write and edit code.
• Write and explain simple Python programs using variables and
mathematical operators.

Prepared by Koh Chung Haur @ 2022 (version 2022.1) 1



1. Introduction to Data Programming

Data (definition)
• “Facts and statistics collected together for reference or analysis.”
[Oxford English Dictionary]

• “Information, especially facts or numbers, collected to be examined and
considered and used to help decision-making, or information in an
electronic form that can be stored and used by a computer.”
[Cambridge Dictionary]

• “Factual information (such as measurements or statistics) used as a basis
for reasoning, discussion, or calculation.”
[Merriam-Webster]

• “Data are individual facts, statistics, or items of information, often
numeric, that are collected through observation. In a more technical
sense, data are a set of values of qualitative or quantitative variables
about one or more persons or objects.”
[Wikipedia]

Information (definition)
• “Facts provided or learned about something or someone.”
[Oxford English Dictionary]

• “Facts or details about a situation, person, event, etc.”
[Cambridge Dictionary]

• “Knowledge obtained from investigation, study, or instruction.”
[Merriam-Webster]

• “Knowledge communicated or received concerning a particular fact or
circumstance; knowledge gained through study, communication,
research, instruction, etc.”
[Dictionary.com]

Data vs. Information


• Data
o Raw, unorganised facts that need to be processed.
o Unusable until it is organised.
• Information
o Created when data is processed, organised, and structured.
o Needs to be put in an appropriate context in order to become
useful.

Data Science

Data → Processing → Information
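
To make the distinction concrete, here is a minimal Python sketch. The temperature readings and the variable names are invented purely for illustration and are not part of the course material.

```python
# Hypothetical raw data: four daily temperature readings in degrees Celsius.
raw_readings = [31.5, 29.0, 33.5, 30.0]

# Processing: summarise the unorganised readings.
average = sum(raw_readings) / len(raw_readings)
hottest = max(raw_readings)

# Information: the processed result, placed in a context a reader can use.
print(f"Average temperature: {average:.1f} °C (hottest day: {hottest} °C)")
```

The list on its own is data; the sentence printed at the end is information, because the values have been processed and given a context.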

Programming and Data


• Tasks to undertake for data programming
o Data collection
o Data processing (wrangling)
o Data visualisation
o Train and apply algorithms from fields such as machine learning,
statistics, data mining, optimisation, image processing, etc.

• Programming
o The process of producing an executable computer program that
performs a specific task.
o The purpose is to find a sequence of instructions that automates the
task and so solves a given problem.

• Programming Language
o The source code of a program is written in one or more languages
that are intelligible to humans, rather than machine code, which is
directly executed by the CPU.

o Python
§ https://www.python.org/


2. Introduction to Development Environments

Source-code Editors
• A source-code editor, or programming text editor, is a fundamental
programming tool designed specifically for editing the source code of
computer programs.
• It highlights the syntax elements of your programs and provides many
features that aid program development.
• Examples:
o Visual Studio Code [https://code.visualstudio.com/]
o Notepad++ (Windows only) [https://notepad-plus-plus.org/]
o Vim [https://www.vim.org/]
o Sublime Text (not open source) [https://www.sublimetext.com/]
o Atom [https://atom.io/]
o Emacs [https://www.gnu.org/software/emacs/]
o TextMate (Mac only) [https://macromates.com/]
o Jupyter [https://jupyter.org/]

Integrated Development Environments (IDEs)


• An integrated development environment (IDE) is a software application that
provides comprehensive facilities to computer programmers for software
development.
• An IDE normally consists of at least a source-code editor, build
automation tools and a debugger.
• Examples:
o Spyder [https://www.spyder-ide.org/]
o RStudio [https://rstudio.com]
o Eclipse [https://www.eclipse.org/]
o Microsoft Visual Studio [https://visualstudio.microsoft.com/vs/]
o Wing Python IDE [https://wingware.com]

Markdown / Markup Languages


• Markdown is a markup language that consists of a set of rules for adding
formatting elements to plain-text documents.
o Boldface, italics, headers, paragraphs, lists, code blocks, images,
etc.
o https://www.markdownguide.org/
• Invented by John Gruber
o The overriding design goal for Markdown’s formatting syntax is to
make it as readable as possible.
o The idea is that a Markdown-formatted document should be
publishable as-is, as plain text, without looking like it’s been
marked up with tags or formatting instructions.
• Examples of other markup languages
o HTML; XML; LaTeX

Version Control Systems


• Version Control is a class of systems responsible for managing changes
to computer programs, documents, large websites, or other collections of
information.
• Version Control Systems (VCS) are software tools that help software
teams manage changes to source code over time.
o Undertake the tedious task of keeping track of the changes to all of a
project’s files and who made them
o Allow users to recover any previous version at any given time
• Examples:
o Subversion [https://subversion.apache.org]
o Git [https://git-scm.com/]
o GitHub, a hosting service for Git repositories [https://github.com/]

Package/Environment Manager
• A package manager, or package management system, is a collection of
software tools that automates the process of installing, upgrading,
configuring, and removing computer programs for a computer in a
consistent manner. It deals with packages: distributions of software
and data in archive files.
• An environment manager creates and maintains isolated environments,
each with its own Python version and set of packages, so that projects
with different dependencies do not interfere with one another.
• Example:
o Anaconda [https://www.anaconda.com/]

Installing Anaconda
• Go to the Anaconda website and download Anaconda Individual Edition
o https://www.anaconda.com/products/distribution
• Packages include
o conda
§ package management system
o pandas, scikit-learn, nltk
§ packages for data science
o Anaconda Navigator
§ a graphical user interface
o QtConsole
§ an interactive Python environment
o Spyder
§ a standard cross-platform IDE for Python
o Jupyter Notebook
§ an interactive web-browser based application for creating
and sharing code

Package Installer for Python (pip)


• pip is the de facto and recommended package-management system for
Python. It is written in Python and is used to install and manage software
packages.
• It connects to an online repository of public packages, called the Python
Package Index (PyPI).
• We use pip to install packages from the Python Package Index.
• Examples
o pip install beautifulsoup4
§ Install a single package by name
o pip install -r dependencies
§ Install the packages listed in a requirements file (here a file named
dependencies; by convention it is usually called requirements.txt)
o pip freeze
§ List all the installed packages with their versions
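
The same information that pip freeze prints can also be inspected from inside Python. The sketch below is one possible way to do it, using the standard-library importlib.metadata module (Python 3.8 or later); the variable name installed is chosen here for illustration only.

```python
from importlib import metadata  # standard library since Python 3.8

# Build "name==version" strings for every installed distribution,
# roughly the same information that `pip freeze` prints.
installed = sorted(
    f"{dist.metadata['Name']}=={dist.version}"
    for dist in metadata.distributions()
)

for line in installed[:5]:  # show only the first few entries
    print(line)
```

The output depends entirely on what is installed in the active environment, so no expected output is shown.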


3. Introduction to Python

• Open-source, interpreted, high-level, object-oriented, general-purpose,
easy to download, write and read
• Named after the British comedy group Monty Python
• A simpler language that allows us to focus less on the language and more
on problem solving
• Many of the best parts of other languages are included
o Data structures
o Control structures
o Many packages for common tasks

Variables
• A variable is a named piece of memory whose value can change during the
running of the program; a constant is a value that cannot change as the
program runs.
o Python has no built-in constants; by convention, a name written in
upper case (e.g. MAX_SIZE) signals a value that should not be changed.
• We use variable names to represent objects (numbers, data structures,
functions, etc.) in our program, to make our program more readable.
o All variable names must be one word; spaces are never allowed.
o Can only contain alphanumeric characters and underscores.
o Must start with a letter or the underscore character.
o Cannot begin with a number.
o Case-sensitive
o The standard style for most names in Python is lower_with_under
§ Lower case with separate words joined by an underscore
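
The naming rules above can be checked programmatically. This is a minimal sketch: is_valid_name is a hypothetical helper written for this example, and the candidate names are made up. It also rejects reserved keywords such as class, and note that str.isidentifier() accepts non-ASCII letters as well, so it is slightly more permissive than the rules as stated.

```python
import keyword


def is_valid_name(name):
    """Return True if `name` can be used as a Python variable name."""
    # str.isidentifier() checks the character rules (no spaces, no leading
    # digit); keyword.iskeyword() rejects reserved words such as "class".
    return name.isidentifier() and not keyword.iskeyword(name)


for candidate in ["total_sales", "_count", "2nd_value", "my value", "class"]:
    verdict = "valid" if is_valid_name(candidate) else "invalid"
    print(f"{candidate!r}: {verdict}")

# Names are case-sensitive: these are two different variables.
age = 21
Age = 65
```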

Comments


• Not processed by the computer, but valued by other programmers.
• Header comments
o Appear at the beginning of a program or a module
o Provide general information
• Step comments or in-line comments
o Appear throughout the program
o Explain the purpose of a specific portion of code
• Comments are often delimited by
o // comment goes here (single-line comment in C, C++, Java, JavaScript)
o /* comment goes here */ (block comment in the same languages)
o # Python uses this
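
As an illustration of the comment types described above, here is a short sketch. The temperature conversion is an arbitrary example chosen for this purpose.

```python
# Header comment: general information about the program.
# Purpose: convert a temperature from Celsius to Fahrenheit.

celsius = 25.0  # in-line comment: the input value in degrees Celsius

# Step comment: apply the conversion formula F = C * 9 / 5 + 32.
fahrenheit = celsius * 9 / 5 + 32

print(fahrenheit)  # prints 77.0
```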

Python Operations
• Assignment operator
o “=”
o Example:
§ a = 67890 / 12345
# compute the ratio, store the result in memory, and bind the name a to it
# the value of a is approximately 5.499392
§ b = a
# b now refers to the same value as a

• Output
o “print()”
o Example:
§ print('Hello World!') # print a string literal
§ print(a) # print the value of a
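
The following sketch expands on the assignment example. It shows that b = a makes both names refer to the same object, and that assigning a new value to a afterwards leaves b untouched.

```python
a = 67890 / 12345   # division with / always produces a float
b = a               # b is bound to the same object as a

print(round(a, 6))  # 5.499392
print(b is a)       # True: both names refer to one object

a = 100             # rebinding a does not affect b
print(round(b, 6))  # 5.499392
```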


Data Types in Python


• Declaration of variables in Python is not needed
o Use an assignment statement to create a variable

• Float
o Stores real numbers (floating-point values)
o a = 4.6
o print(type(a)) # => <class 'float'>

• Integer
o Stores integers
o b = 10
o print(type(b)) # => <class 'int'>

• Conversion
o int(a) # convert float to int => 4
o float(b) # convert int to float => 10.0
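
One detail of the conversion is easy to miss: int() drops the fractional part instead of rounding. A brief sketch:

```python
a = 4.6
b = 10

print(int(a))     # 4: the fractional part is dropped, not rounded
print(int(-4.6))  # -4: truncation is towards zero
print(round(a))   # 5: use round() to round to the nearest integer
print(float(b))   # 10.0
```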

• Basic arithmetic operators
o 3 + 2 # Addition => 5
o 5 - 2 # Subtraction => 3
o 5 * -2 # Multiplication => -10
o 5 / 2.5 # Division => 2.0
o 2 ** 2 # Exponentiation => 4
o 10 % 3 # Modulus => 1
o 10 // 3 # Floor Division => 3
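
A few additional cases, sketched below, show how division, floor division and modulus relate to each other, including with a negative operand. The variable names are chosen for this example only.

```python
true_quotient = 7 / 2     # 3.5: / always returns a float
floor_quotient = 7 // 2   # 3: // rounds down to a whole number
remainder = 7 % 2         # 1

# Floor division rounds towards negative infinity, not towards zero.
negative_floor = -7 // 2  # -4
negative_mod = -7 % 2     # 1, because -7 == (-4) * 2 + 1

print(true_quotient, floor_quotient, remainder)
print(negative_floor, negative_mod)
print(divmod(7, 2))       # (3, 1): quotient and remainder together
```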

• String
o Stores strings
o phrase = 'All models are wrong, but some are useful.'
o phrase[0:3] # slice characters 0 up to 2
=> 'All'
o phrase.find('models') # find the starting index of the word
=> 4
o phrase.find('right') # word not found
=> -1
o phrase.lower() # convert to lower case
=> 'all models are wrong, but some are useful.'
o phrase.upper() # convert to upper case
=> 'ALL MODELS ARE WRONG, BUT SOME ARE USEFUL.'
o phrase.split(',') # split the string into a list, based on the delimiter
=> ['All models are wrong', ' but some are useful.']
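
Two points about the string methods above are worth sketching: they return new strings and leave the original unchanged, and split() keeps any whitespace around the delimiter unless it is stripped. The names shouted and parts are illustrative only.

```python
phrase = 'All models are wrong, but some are useful.'

shouted = phrase.upper()  # returns a new string
print(phrase)             # the original is unchanged

# split() keeps the space after the comma; strip() removes it.
parts = [part.strip() for part in phrase.split(',')]
print(parts)              # ['All models are wrong', 'but some are useful.']

print('models' in phrase) # True: membership test instead of find()
```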

• Boolean
o Stores the logical (Boolean) values True or False
o k = 1 > 3
o print(k) # => False
o print(type(k)) # => <class 'bool'>


• Logical operators
o Conjunction (AND): “and”
o Disjunction (OR): “or”
o Negation (NOT): “not”

a    b    a and b    a or b    not a
T    T    T          T         F
T    F    F          T         F
F    T    F          T         T
F    F    F          F         T
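
The truth table can be reproduced in Python. This is a small sketch that uses itertools.product to enumerate every combination of a and b; the column layout is only approximate.

```python
from itertools import product

# Each row holds: a, b, a and b, a or b, not a.
rows = []
for a, b in product([True, False], repeat=2):
    rows.append((a, b, a and b, a or b, not a))

print("a      b      and    or     not a")
for row in rows:
    print("  ".join(f"{str(value):<5}" for value in row))
```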

Data Structures in Python


• Tuples
o Store an ordered collection of objects
o Immutable: elements cannot be modified, added or deleted
o Written with round brackets “( )”
§ tuple1 = ("apple", "banana", "cherry", "orange", "kiwi",
"melon", "mango")
§ tuple2 = ("Handsome Koh", 4896, 13.14, True)
o Accessing elements by indexing
§ tuple1[0] # first element => 'apple'
§ tuple1[-1] # last element => 'mango'
§ tuple1[2:5] # range of elements => ('cherry', 'orange', 'kiwi')
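
Immutability can be demonstrated directly. In this sketch (with a shortened tuple and illustrative variable names), trying to change an element raises a TypeError, while building a new tuple from an old one is allowed.

```python
tuple1 = ("apple", "banana", "cherry")

try:
    tuple1[0] = "avocado"  # tuples do not support item assignment
except TypeError as error:
    message = str(error)
    print("Cannot modify a tuple:", message)

# Creating a new tuple from an existing one is fine.
tuple3 = tuple1 + ("durian",)
print(tuple3)  # ('apple', 'banana', 'cherry', 'durian')

# Unpacking assigns each element to its own variable.
first, second, third = tuple1
```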

• Lists
o Store an ordered collection of objects; mutable
o Written with square brackets “[ ]”
§ list1 = ["apple", "banana", "cherry"]
§ list2 = ["Handsome Koh", 4896, 13.14, True]
o Changing elements
§ list1.append("orange") # add to the last position
=> ['apple', 'banana', 'cherry', 'orange']
§ list1[2] = "coconut" # modify the element at an index
=> ['apple', 'banana', 'coconut', 'orange']
§ list1.remove("apple") # delete an element by value
=> ['banana', 'coconut', 'orange']
§ list1.insert(2, "durian") # insert an element at a position
=> ['banana', 'coconut', 'durian', 'orange']
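
Because lists are mutable, two names that refer to the same list see each other's changes. The sketch below contrasts a plain assignment with a copy; alias and snapshot are illustrative names.

```python
list1 = ["apple", "banana", "cherry"]

alias = list1            # a second name for the same list
snapshot = list1.copy()  # an independent (shallow) copy

list1.append("orange")

print(alias)     # ['apple', 'banana', 'cherry', 'orange']
print(snapshot)  # ['apple', 'banana', 'cherry']

removed = list1.pop()  # remove and return the last element
print(removed)         # orange
```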

• Sets
o Store an unordered, unindexed collection of objects with no duplicates
o Written with curly brackets “{ }”
§ set1 = {"apple", "banana", "cherry"}
§ set2 = {"apple", "samsung"}
o Set operations
§ set1.union(set2) # union of both sets
=> {'apple', 'banana', 'cherry', 'samsung'}
§ set1.intersection(set2) # intersection of both sets
=> {'apple'}
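
A common use of the no-duplicates property is removing repeated values from a list. A minimal sketch follows; the basket list is invented for this example, and sorted() is used only because a set has no guaranteed display order.

```python
basket = ["apple", "banana", "apple", "cherry", "banana"]

unique = set(basket)   # duplicates are discarded
print(sorted(unique))  # ['apple', 'banana', 'cherry']

set1 = {"apple", "banana", "cherry"}
set2 = {"apple", "samsung"}

only_in_set1 = set1.difference(set2)  # elements of set1 not in set2
print(sorted(only_in_set1))           # ['banana', 'cherry']
print("samsung" in set1)              # False: membership test
```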


• Dictionaries
o Store a collection of key:value pairs (insertion order is preserved
since Python 3.7)
o Written with curly brackets “{ }” and “key: value” pairs
§ thisdict = {"brand": "Ford", "model": "Mustang",
"year": 1964}
o Accessing/modifying elements by key name
§ thisdict["model"] => 'Mustang'
§ thisdict["year"] = 2018
thisdict["color"] = "red"
=> {'brand': 'Ford', 'model': 'Mustang', 'year': 2018, 'color': 'red'}
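
A few further dictionary operations, sketched under the same example data: looking up a key that may be missing, looping over the pairs, and deleting an entry. The variable colour is an illustrative name.

```python
thisdict = {"brand": "Ford", "model": "Mustang", "year": 1964}

# thisdict["color"] would raise KeyError; get() returns a default instead.
colour = thisdict.get("color", "unknown")
print(colour)  # unknown

# Loop over the key:value pairs.
for key, value in thisdict.items():
    print(key, "=>", value)

del thisdict["year"]       # remove an entry by key
print("year" in thisdict)  # False
```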


4. Introduction to Jupyter Notebook

• Jupyter Notebook is a web-based interactive computing platform.
• The name refers to the languages “Julia” + “Python” + “R”.
• Integrates code and its output into a single document that contains:
o Live code, mathematical equations, visualisations,
explanatory/narrative text, interactive dashboards and other media
• Can be easily shared
o Notebook files have the “.ipynb” extension
o Can be exported to “.html” and “.pdf” formats

• Launch “Jupyter Notebook” from “Anaconda Navigator”
• Create a new notebook
o “File” → “New Notebook” → “Python 3”
• Exporting a notebook
o “File” → “Download as” → “HTML (.html)”
o “File” → “Print Preview” (for PDF)
• Shutting down Jupyter
o “File” → “Close and Halt”
o Quit


5. Exercises

1.301 Practice Exercises (Coursera)
• Refer to “1.301 part-1.html”

1.302 A bit more Python – our first downloadable notebook! (Coursera)
• Refer to “1.302 pythonPractice.html”

1.304 World’s Population
• Refer to “1.304 Topic 1 - Lab.html”


6. Practice Quiz
• Work on Practice Quiz 01 posted on Canvas.


Useful Resources

o http://
