1 Intro
Course Introduction
By Dr. Qi Li
BUS 601
Computer and Data Analytics
• Computers can perform calculations and make logical decisions
phenomenally faster than human beings can
• Today’s personal computers can perform billions of calculations in one
second
• Supercomputers are already performing thousands of trillions (quadrillions) of
instructions per second!
• Computers process data under the control of sequences of instructions called
computer programs
• Guide the computer through ordered actions specified by computer
programmers
• A computer consists of various physical devices referred to as hardware
(keyboard, screen, mouse, solid-state disks, hard disks, memory, DVD drives
and processing units)
• Moore’s Law
– For many decades, hardware costs have fallen rapidly
– Every year or two, the capacities of computers have approximately doubled inexpensively
Computer Architecture
• Von Neumann architecture
Input and Output
• Input Unit:
– obtains information (data and computer programs)
from input devices and places it at the disposal of the
other units for processing
• Output Unit:
– Takes information the computer has processed and
places it on various output devices to make it
available for use outside the computer
Memory Unit
• Rapid-access, relatively low-capacity “warehouse” section
retains information that has been entered through the input unit,
making it immediately available for processing when needed
• Also retains processed information until it can be placed on
output devices by the output unit
• Information in the memory unit is volatile—it’s typically lost
when the computer’s power is turned off
• Called memory, primary memory or RAM (Random Access
Memory)
• Main memories on desktop and notebook computers contain as
much as 128 GB of RAM, though 8 to 16 GB is most common
– A gigabyte is approximately one billion bytes
– A byte is eight bits
– A bit is either a 0 or a 1
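The unit relationships above can be checked with a little arithmetic. This is a minimal sketch that assumes the decimal definition from the slide (1 GB is about 10**9 bytes); the 16 GB figure is just the common RAM size mentioned above.

```python
# Unit sizes, using the approximate decimal definitions from the slide.
BITS_PER_BYTE = 8
BYTES_PER_GB = 10**9  # "approximately one billion bytes"

ram_gb = 16  # a common RAM size mentioned above
ram_bytes = ram_gb * BYTES_PER_GB
ram_bits = ram_bytes * BITS_PER_BYTE

print(f"{ram_gb} GB = {ram_bytes:,} bytes = {ram_bits:,} bits")

# Operating systems often report the binary unit instead: 1 GiB = 2**30 bytes.
BYTES_PER_GIB = 2**30
print(f"1 GiB = {BYTES_PER_GIB:,} bytes")
```

The gap between 10**9 and 2**30 is why a gigabyte is described as "approximately" one billion bytes.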
Arithmetic and Logic Unit (ALU)
• “Manufacturing” section
• Performs calculations, such as addition,
subtraction, multiplication and division
• Also contains the decision mechanisms that
allow the computer, for example, to compare
two items from the memory unit to determine
whether they’re equal
• In today’s systems, the ALU is part of the next
logical unit, the CPU
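The two kinds of ALU work described above, arithmetic and comparison, both surface directly in Python operators. A small sketch with made-up values:

```python
a = 17
b = 5

# Arithmetic operations
total = a + b        # addition
difference = a - b   # subtraction
product = a * b      # multiplication
quotient = a / b     # division (always produces a float in Python 3)

# Decision mechanism: compare two items to determine whether they're equal
are_equal = a == b

print(total, difference, product, quotient, are_equal)
```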
Central Processing Unit (CPU)
• “Administrative” section
• Coordinates and supervises the operation of the other sections
• Tells the input unit when information should be read into the
memory unit
• Tells the ALU when information from the memory unit should be
used in calculations
• Tells the output unit when to send information from the memory
unit to specific output devices
• Most computers have multicore processors that implement
multiple processors on a single integrated-circuit chip
• A dual-core processor has two CPUs, a quad-core processor has
four and an octa-core processor has eight
– Intel has some processors with up to 72 cores
Secondary Storage Unit
• Long-term, high-capacity “warehousing” section
• Programs or data not actively being used by the other units
normally are placed on secondary storage devices until they’re
needed
• Information on secondary storage devices is persistent—it’s
preserved even when the computer’s power is turned off
• Secondary storage information takes much longer to access than
information in primary memory, but its cost per unit is much less
• Many current drives hold terabytes (TB) of data
– A terabyte is approximately one trillion bytes
– Typical hard drives on desktop and notebook computers hold up to 4 TB,
and some recent desktop-computer hard drives hold up to 15 TB
Data Hierarchy
Bits
• A bit (short for “binary digit”—a digit that can
assume one of two values) is the smallest data
item in a computer
• Can have the value 0 or 1
• Bits are the basis of the binary number system
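To see how bits form binary numbers, Python's built-in functions can convert between decimal and binary notation. A short sketch; the value 13 is arbitrary.

```python
number = 13

binary_text = bin(number)          # binary notation, prefixed with '0b'
round_trip = int("1101", 2)        # parse a string of bits as a base-2 number
bits_needed = number.bit_length()  # how many bits the value requires

# n bits can represent 2**n distinct values, so one byte (8 bits) covers 256.
values_per_byte = 2**8

print(binary_text, round_trip, bits_needed, values_per_byte)
```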
Characters
• Decimal digits (0–9), letters (A–Z and a–z) and special symbols
such as $ @ % & * ( ) – + " : ; , ? /
• Computer's character set contains the characters used to write
programs and represent data items
• Computers process only 1s and 0s, so a character set represents
every character as a pattern of 1s and 0s
• Python uses Unicode® characters composed of one, two, three or
four bytes (8, 16, 24 or 32 bits, respectively)—known as UTF-8
encoding
• The ASCII (American Standard Code for Information
Interchange) character set is a subset of Unicode that represents
letters (a–z and A–Z), digits and some common special
characters
• ASCII subset of Unicode
• Unicode charts for all languages, symbols, emojis and more
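The one-to-four-byte range of UTF-8 can be observed by encoding a few characters. This sketch uses four sample characters picked only for illustration and writes them as escape sequences so the output stays readable on any console.

```python
# Sample characters: A, e-acute, euro sign, grinning-face emoji.
samples = ["A", "\u00e9", "\u20ac", "\U0001F600"]

utf8_lengths = {}
for ch in samples:
    encoded = ch.encode("utf-8")   # the character as a sequence of bytes
    utf8_lengths[ch] = len(encoded)
    print(f"U+{ord(ch):04X} -> {len(encoded)} byte(s): {encoded.hex(' ')}")

# ASCII characters have the same byte values in ASCII and in UTF-8.
ascii_matches_utf8 = "A".encode("ascii") == "A".encode("utf-8")
print(ascii_matches_utf8)
```

`ord` returns a character's Unicode code point, and `chr` goes the other way.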
Fields
• Fields are composed of characters or bytes
• A field is a group of characters or bytes that conveys
meaning
– a person’s name
– a person’s age
– etc.
Records
• A record is a group of related fields
• A record for an employee might consist of
– Employee identification number (a whole number)
– Name (a string of characters)
– Address (a string of characters)
– Hourly pay rate (a number with a decimal point)
– Year-to-date earnings (a number with a decimal point)
– Amount of taxes withheld (a number with a decimal point)
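One possible way to model the employee record above in Python is a dataclass whose attributes are the fields. The class name, attribute names and sample values below are hypothetical, chosen only to mirror the list.

```python
from dataclasses import dataclass


@dataclass
class EmployeeRecord:
    """A record: a group of related fields describing one employee."""
    employee_id: int        # a whole number
    name: str               # a string of characters
    address: str            # a string of characters
    hourly_pay_rate: float  # a number with a decimal point
    ytd_earnings: float     # year-to-date earnings
    taxes_withheld: float   # amount of taxes withheld


# Hypothetical sample data
record = EmployeeRecord(1001, "Jane Doe", "12 Main St", 32.50, 41250.00, 8250.00)
net_earnings = record.ytd_earnings - record.taxes_withheld

print(record)
print(net_earnings)
```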
Files
• A file is a group of related records
• More generally, a file contains arbitrary data in arbitrary
formats
• Any organization of the bytes in a file, such as
organizing the data into records, is a view created by
the application programmer
• Not unusual for an organization to have many files,
some containing billions, or even trillions, of characters
of information
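The point that records are a view imposed by the programmer can be illustrated by reading the same data two ways. This sketch keeps the "file" in memory with `io.StringIO` so it runs without touching the disk; the CSV layout and sample rows are assumptions made for the example.

```python
import csv
import io

# The raw contents of a hypothetical file: just a sequence of characters.
raw_text = "id,name,hourly_rate\n1001,Jane Doe,32.50\n1002,John Roe,28.00\n"

# View 1: the application treats each line as a record with named fields.
reader = csv.DictReader(io.StringIO(raw_text))
records = list(reader)
for row in records:
    print(row["id"], row["name"], row["hourly_rate"])

# View 2: the very same data as an undifferentiated run of bytes.
raw_bytes = raw_text.encode("utf-8")
print(len(raw_bytes), "bytes")
```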
Databases
• A database is a collection of data organized for
easy access and manipulation
• Most popular model is the relational database, in
which data is stored in simple tables
• A table includes records and fields
• You can search, sort and otherwise manipulate
the data, based on its relationship to multiple
tables or databases
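As a rough illustration of a relational table with records (rows) and fields (columns), the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The table name, columns and rows are invented for the example.

```python
import sqlite3

# An in-memory database disappears when the connection closes.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, hourly_rate REAL)"
)

# Each tuple is one record; each position is one field.
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [(1001, "Jane Doe", 32.5), (1002, "John Roe", 28.0), (1003, "Ann Poe", 41.0)],
)

# Search and sort: employees earning more than 30 per hour, highest first.
rows = conn.execute(
    "SELECT name, hourly_rate FROM employees "
    "WHERE hourly_rate > ? ORDER BY hourly_rate DESC",
    (30,),
).fetchall()
conn.close()

print(rows)
```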
Programming Language
• Programmers write instructions in various
programming languages
– Some directly understandable by computers
– Others require intermediate translation steps
• Three general types
– Machine languages
– Assembly languages
– High-level languages
Machine Language
• Any computer understands only its own machine
language, defined by its hardware design
• Generally consist of strings of numbers (ultimately 1s
and 0s) that instruct computers to perform their most
elementary operations
• Cumbersome for humans
• Section of an early machine-language payroll program
that adds overtime pay to base pay and stores the result
in gross pay
– +1300042774
– +1400593419
– +1200274027
Assembly Languages and Assemblers
• English-like abbreviations to represent elementary
operations
• Formed the basis of assembly languages
• Assemblers were developed to convert assembly-
language programs to machine language at computer
speeds
• Section of an assembly-language payroll program that
adds overtime pay to base pay and stores the result in
gross pay
– load basepay
– add overpay
– store grosspay
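To make the three instructions above concrete, here is a toy simulation in Python of a machine with a single accumulator register. It is only a sketch: real assemblers and processors work quite differently, and the memory contents (800 and 120) are made-up values.

```python
# Named memory locations with hypothetical starting values.
memory = {"basepay": 800.0, "overpay": 120.0, "grosspay": 0.0}

# The assembly-language fragment from the slide as (operation, operand) pairs.
program = [("load", "basepay"), ("add", "overpay"), ("store", "grosspay")]

accumulator = 0.0  # the single working register of this toy machine
for operation, operand in program:
    if operation == "load":      # copy a value from memory into the register
        accumulator = memory[operand]
    elif operation == "add":     # add a value from memory to the register
        accumulator += memory[operand]
    elif operation == "store":   # copy the register back into memory
        memory[operand] = accumulator
    else:
        raise ValueError(f"unknown operation: {operation}")

print(memory["grosspay"])
```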
High-Level Languages and Compilers
• With the advent of assembly languages, computer usage increased
• Programmers still needed numerous instructions to accomplish even
simple tasks. High-level languages enable single statements to
accomplish substantial tasks.
• A typical high-level-language program contains many statements,
known as the program’s source code
• Compilers convert high-level-language source code into machine language.
• High-level languages look almost like everyday English and contain
commonly used mathematical notations.
• Payroll program written in a high-level language might contain a
statement such as grossPay = basePay + overTimePay
• Python is among the world’s most widely used high-level
programming languages
Interpreters
• Interpreter programs execute high-level language
programs directly and avoid the delay of compilation
• Interpreted programs run slower than compiled
programs
• Most widely used Python implementation—CPython—
uses a clever mixture of compilation and interpretation
to run programs
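CPython's mixture of compilation and interpretation can be glimpsed from within Python itself: source text is first compiled to a bytecode object, which the interpreter then executes. The sketch below uses the built-in `compile` and `exec` functions and the standard `dis` module; the variable names and values are hypothetical, and the disassembly listing differs between Python versions.

```python
import dis

source = "gross_pay = base_pay + overtime_pay"

# Step 1: compile the source text into a code object containing bytecode.
code_object = compile(source, "<payroll-example>", "exec")

# Step 2: the interpreter executes that bytecode in a given namespace.
namespace = {"base_pay": 800.0, "overtime_pay": 120.0}
exec(code_object, namespace)
print(namespace["gross_pay"])

# Show the bytecode instructions in human-readable form.
dis.dis(code_object)
```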
Operating Systems
• Make using computers more convenient for users, application
developers and system administrators
• Provide services that allow each application to execute safely,
efficiently and concurrently with other applications
• Core components of the operating system are implemented in the
kernel
• Linux, Windows and macOS are popular desktop computer
operating systems
• Google’s Android and Apple’s iOS are the most popular mobile
operating systems
Why Python
• Open source, free and widely available with a massive open-source
community
• Massive numbers of free open-source Python applications
• Easier to learn than many other languages, enabling novices and professional
developers to get up to speed quickly
• Easier to read than many other popular programming languages
• Widely used in education, web development (e.g., Django, Flask), the financial
community and artificial intelligence
• Enhances developer productivity with extensive standard libraries and third-
party open-source libraries
– Programmers can write code faster and perform complex tasks with minimal code
• Supports popular procedural, functional-style and object-oriented
programming
• Build anything from simple scripts to complex apps with massive numbers of
users, such as Dropbox, YouTube, Reddit, Instagram and Quora
• Extensive job market for Python programmers across many disciplines,
especially in data-science-oriented positions; Python jobs are among the
highest paid of all programming jobs
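The three programming styles mentioned above can all express the same small task. The sketch below totals a list of hours worked in each style; the data and the `Timesheet` class are hypothetical.

```python
from functools import reduce

hours = [8, 9, 7.5, 10]  # hypothetical hours worked on four days

# Procedural style: step-by-step statements that update a variable.
total_procedural = 0
for h in hours:
    total_procedural += h

# Functional style: combine the values with a function, no explicit loop.
total_functional = reduce(lambda acc, h: acc + h, hours, 0)


# Object-oriented style: bundle the data with the behavior that uses it.
class Timesheet:
    def __init__(self, hours):
        self.hours = list(hours)

    def total(self):
        return sum(self.hours)


total_oop = Timesheet(hours).total()

print(total_procedural, total_functional, total_oop)
```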
Anaconda Python Distribution
• Easy to install on Windows, macOS and Linux and
supports the latest versions of Python, the IPython
interpreter and Jupyter Notebooks
• Also includes other software packages and libraries
commonly used in Python programming and data
science
• IPython interpreter
Other Popular Programming Languages
• Basic
– Developed in the 1960s at Dartmouth College to familiarize novices with programming techniques
– Many of its latest versions are object-oriented
• C
– Developed in the early 1970s by Dennis Ritchie at Bell Laboratories
– Initially known as the UNIX operating system’s development language
– General-purpose operating systems and other performance-critical systems often are written in C or C++
• C++
– Based on C
– Developed by Bjarne Stroustrup in the early 1980s at Bell Laboratories
– Enhances C and adds capabilities for object-oriented programming
• Java
– Sun Microsystems in 1991 funded an internal corporate research project led by James Gosling, which
resulted in the C++-based object-oriented programming language called Java
– “write once, run anywhere” —Enable developers to write programs that will run on a great variety of
computer systems
– Used in enterprise applications, to enhance the functionality of web servers, to provide applications for
consumer devices (e.g., smartphones, tablets, television set-top boxes, appliances, automobiles and more)
and for many other purposes
– Originally the key language for developing Android smartphone and tablet apps, though several other
languages are now supported
• C#
– Based on C++ and Java
– One of Microsoft’s three primary object-oriented programming languages—others are Visual C++ and
Visual Basic
– Developed to integrate the web into computer applications and is now widely used to develop many types
of applications
– Microsoft now offers open-source versions of C# and Visual Basic
• JavaScript
– Most widely used scripting language
– Primarily used to add programmability to web pages
– All major web browsers support it
– Many Python visualization libraries output JavaScript as part of visualizations that you can interact with
in your web browser
– Tools like NodeJS also enable JavaScript to run outside of web browsers
• Swift
– Introduced in 2014
– Apple’s programming language for developing iOS and macOS apps
– A contemporary language that includes popular features from languages such as Objective-C, Java, C#,
Ruby, Python and others
– Open source, so it can be used on non-Apple platforms as well
• R
– A popular open-source programming language for statistical applications and visualization
– Python and R are the two most widely used data-science languages
Test-Drives: Using IPython and Jupyter Notebooks
IPython Interactive Mode
• Entering IPython in Interactive Mode
– Open a command-line window on your system
• On macOS, open a Terminal from the Applications folder’s Utilities
subfolder
• On Windows, open the Anaconda Command Prompt from the start
menu
• On Linux, open your system’s Terminal or shell (this varies by Linux
distribution)
– Type ipython, then press Enter (or Return)
• Exiting Interactive Mode
– Type exit and press Enter to exit immediately
– Type Ctrl + d (or control + d) then confirm
– Type Ctrl + d (or control + d) twice
Executing a Python Program Using the
IPython Interpreter
• Execute a script named RollDieDynamic.py that you’ll write in
Chapter 6
• .py extension indicates the file contains Python source code
• RollDieDynamic.py simulates rolling a six-sided die, presenting
a colorful animated visualization that dynamically graphs the
frequencies of each die face
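The idea behind the simulation can be tried without any graphics. The following is not the RollDieDynamic.py script, only a minimal text-only sketch of counting die-face frequencies; the function name, the number of rolls and the seed are assumptions made for the example.

```python
import random
from collections import Counter


def roll_frequencies(rolls, seed=None):
    """Simulate rolling a six-sided die and count how often each face appears."""
    rng = random.Random(seed)  # a seed makes the run repeatable
    return Counter(rng.randint(1, 6) for _ in range(rolls))


rolls = 6000
frequencies = roll_frequencies(rolls, seed=42)

# Print each face with its count and its share of all rolls.
for face in range(1, 7):
    count = frequencies[face]
    print(f"{face}: {count:5d}  {count / rolls:.1%}")
```

With a fair die, each face should come up in roughly one sixth of the rolls, and the shares get closer to that as the number of rolls grows.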
Summary
• Basic computer architecture
• Why Python
• Python Running Environment
• Python Lab