CSCE 5560 - Infrastructure Technologies 2
2
The Internet
• The Internet is an open system
– Details publicly available
– A lot of software is free
– Lots of publicly available expertise
– Dangers with privacy and security
• Implications of open systems
– Wide variety of implementations
– Cost of implementation less
– High level of compatibility
– Wide variety of developers selling products
3
The Internet
• Short for “internetworking”
• Worldwide system of interconnected networks & computers
– Computer network is an interconnection of a group of computers
• Links businesses, educational institutions, government
agencies, and individuals
• ARPANET was the world’s first operational packet-switching network, and
the predecessor of the global Internet
– Developed by DoD in late 1960s
4
World Wide Web (WWW)
• Original Internet provided screens full of text
– GUIs added some color and layout, but still not very interesting
• Then hyperlinks were invented
– The mouse, invented in the 1960s, became the natural way to click on hyperlinks
• Uniform Resource Locator (URL) allowed sites to be named
• Tim Berners-Lee submitted proposal to form World Wide Web in March 1989
5
World Wide Web (WWW)
6
Evolution of the Internet
• Innovation Phase, 1964 – 1974
– Creation of fundamental building blocks
– TCP/IP protocols and models being proposed (1974)
• Institutionalization Phase, 1975 – 1995
– Large institutions provide funding and legitimization
– ARPANET adopts TCP/IP in 1983 (200 routers)
– NSF funds TCP/IP based backbone network (NSFNET) in 1984
• Amazon.com, an early commercial site, was founded on July 5, 1994
• In 1994, the WWW grew by 2300 percent!
8
Internet Technology Concepts
• In 1995, the Federal Networking Council passed a resolution
that defines the term Internet as a global information system
RESOLUTION: The Federal Networking Council (FNC) agrees that the following
language reflects our definition of the term "Internet".
"Internet" refers to the global information system that --
(i) is logically linked together by a globally unique address space based on the
Internet Protocol (IP) or its subsequent extensions/follow-ons;
(ii) is able to support communications using the Transmission Control
Protocol/Internet Protocol (TCP/IP) suite or its subsequent extensions/follow-ons,
and/or other IP-compatible protocols; and
(iii) provides, uses or makes accessible, either publicly or privately, high level
services layered on the communications and related infrastructure described herein.
10
Packet-Switched Networks
11
Key Internet Technologies
• TCP/IP Communications Protocol
– Protocol
• Set of rules that govern transmission of information between two
points
• Connectivity between computers enabled by protocols
– TCP (Transmission Control Protocol)
• Establishes connections among sending/receiving web computers
• Handles assembly of packets at point of transmission, and re-
assembly at receiving end
– IP (Internet Protocol)
• Provides the Internet’s addressing scheme
– Advantages
• Open nature, flexible, routable
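The packet handling described for TCP can be sketched in a few lines. This is only an illustration of splitting a message into numbered packets and reassembling them at the receiver; the `Packet` class, the 8-byte packet size, and the function names are assumptions made for the example, and real TCP also handles acknowledgements, retransmission, and flow control.

```python
from dataclasses import dataclass
import random


@dataclass
class Packet:
    seq: int        # sequence number used to restore the original order
    total: int      # how many packets make up the whole message
    payload: bytes  # this packet's slice of the message


def packetize(message: bytes, size: int = 8) -> list[Packet]:
    """Split a message into fixed-size, numbered packets (sender side)."""
    chunks = [message[i:i + size] for i in range(0, len(message), size)]
    return [Packet(seq, len(chunks), chunk) for seq, chunk in enumerate(chunks)]


def reassemble(packets: list[Packet]) -> bytes:
    """Put packets back in order and rebuild the message (receiver side)."""
    ordered = sorted(packets, key=lambda p: p.seq)
    if not ordered or len(ordered) != ordered[0].total:
        raise ValueError("missing packets")
    return b"".join(p.payload for p in ordered)


message = b"Hello from the sending host!"
packets = packetize(message)
random.shuffle(packets)          # packets may arrive out of order
restored = reassemble(packets)
```

The shuffle stands in for packets taking different routes through the network; the sequence numbers are what let the receiver restore the order.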
12
Key Internet Technologies
• Client/Server Computing
– Distributed computing model
– Server sets rules of communication for network and provides every
client with an address so others can find it on the network
– Has largely replaced centralized mainframe computing
– Internet is the largest implementation of client/server computing
13
The Internet
15
TCP/IP and OSI Model
Layered model:
• Assists in protocol design
• Fosters competition
• Changes in one layer do
not affect other layers
• Provides a common language
[Figure: Internet Protocol Stack; surviving layer labels are Network, Link, and Physical]
16
Sending and Receiving Messages
Source: Cisco
17
IP Addresses
• Each host on Internet assigned specific and unique number
for identification that serves as “postal address” on the
network
• IPv4
– Unique 32-bit number
– Divided into four octets (set of eight bits) separated by periods
• E.g., 144.92.43.178
– Network class determined from first octet
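As a small worked example of the points above, the sketch below converts a dotted address into its 32-bit number and reads the historical network class from the first octet. The function names are chosen for this illustration, and classful addressing is shown only because the slide mentions it; modern networks generally use classless (CIDR) prefixes instead.

```python
def ipv4_to_int(address: str) -> int:
    """Pack four octets into the single 32-bit number they represent."""
    octets = [int(part) for part in address.split(".")]
    if len(octets) != 4 or any(not 0 <= o <= 255 for o in octets):
        raise ValueError(f"not a valid IPv4 address: {address!r}")
    value = 0
    for octet in octets:
        value = (value << 8) | octet   # shift left 8 bits, append next octet
    return value


def network_class(address: str) -> str:
    """Return the classful network class implied by the first octet."""
    first = int(address.split(".")[0])
    if first < 128:
        return "A"
    if first < 192:
        return "B"
    if first < 224:
        return "C"
    if first < 240:
        return "D"   # multicast
    return "E"       # reserved


number = ipv4_to_int("144.92.43.178")
klass = network_class("144.92.43.178")
```

For the slide's example address, the first octet 144 falls in the 128 to 191 range, so it is a class B address.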
18
IPv4 Network Classes
20
Routing Internet Messages
21
Domain Names, DNS, and URLs
• Domain Name
– IP address expressed in natural language
• Domain Name System (DNS)
– Allows numeric IP address to be expressed in natural language
• Uniform Resource Locator (URL)
– Address used by Web browser to identify location of content on web
– E.g., https://www.unt.edu
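To make the parts of a URL concrete, here is a minimal sketch using Python's standard `urllib.parse`. The host comes from the slide's example; the path and query string are hypothetical additions for illustration. The actual DNS lookup is left as a comment because it needs network access.

```python
from urllib.parse import urlsplit

# Host from the slide; "/admissions/index.html?term=fall" is a made-up path.
parts = urlsplit("https://www.unt.edu/admissions/index.html?term=fall")

scheme = parts.scheme    # protocol the browser will use
host = parts.hostname    # domain name that DNS translates to an IP address
path = parts.path        # location of the content on that server
query = parts.query      # optional parameters passed to the server

# With network access, DNS resolution could be done with:
#   import socket
#   ip_address = socket.gethostbyname(host)
```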
23
Layered Architecture
26
World Wide Web (WWW)
• A collection of web pages
– Each page contains links to other pages
• Multimedia documents
– Text
– Images
– Sounds
– Drawings
– Video
• World Wide Web (WWW) is not the Internet
• Access to the Internet does not mean you have e-commerce
• WWW works in HTTP while web pages work in HTML
• Hypertext
– Links to other documents (i.e., hyperlinks)
– Can begin execution of a program
27
World Wide Web (cont’d)
• Web servers
– Computers that run server software
– Wait for requests to arrive from a user (e.g., for a document)
– Send (i.e., serve) the document to the requesting computer
• Web browsers
– Computer programs that can
• Display web documents
• Follow links
• Execute other programs
• Enhance applications such as real-time audio or video
– Provides access to information on the WWW
28
Architectural Overview
30
HTTP Methods
31
HTTP Methods
32
HTTP Message Headers
• The request line (i.e. line with GET method) may be followed
by additional lines, called request headers
• Responses may have response headers as well
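The structure of a request line followed by request headers can be seen by parsing a raw request by hand. This is a simplified sketch: the host name, header values, and the `parse_request` helper are assumptions for the example, and a production parser would handle many more cases.

```python
def parse_request(raw: str):
    """Split a raw HTTP request into request line parts and headers."""
    head, _, _body = raw.partition("\r\n\r\n")   # blank line ends the headers
    lines = head.split("\r\n")
    method, target, version = lines[0].split(" ")  # the request line
    headers = {}
    for line in lines[1:]:                         # the request headers
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return method, target, version, headers


raw_request = (
    "GET /index.html HTTP/1.1\r\n"
    "Host: www.example.com\r\n"
    "Accept: text/html\r\n"
    "User-Agent: demo-client/0.1\r\n"
    "\r\n"
)
method, target, version, headers = parse_request(raw_request)
```

Header names are lowercased here because HTTP header names are case-insensitive.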
33
HyperText Transfer Protocol
• HTTP is connectionless and stateless meaning that it forgets
about requests and responses after they are complete
– This feature of HTTP requires workarounds to support e-commerce
• As a result, the web is stateless – no concept of login session
– Browser sends request to the server and gets back a file
• Server forgets it has seen that particular client
Source: Information Systems: Creating Business Value, by Huber, Piercy, and McKeown, Wiley 2007
34
Cookies
• Cookies solved this stateless issue by having the server supply
additional information when a client requests a page: a small
(up to about 4 KB) file or string
– When the browser sends a request for a page, it first checks whether it
has a cookie associated with the domain the request is going to; if so,
it appends the cookie to the request
– The server gets to interpret it any way it wants
https://www.vox.com/recode/2019/12/10/18656519/what-are-cookies-website-tracking-gdpr-privacy
Today, the proliferation of “accept web site cookie” alerts is driven by
(1) the General Data Protection Regulation (GDPR), a sweeping EU data
privacy law that took effect in May 2018, and (2) the ePrivacy Directive, first
passed in 2002 and then updated in 2009. The alerts are supposed to
improve our online privacy, but have been largely ineffectual.
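The cookie round trip can be sketched with Python's standard `http.cookies` module. The cookie name `session_id`, its value, and the in-memory `sessions` table are assumptions for illustration; a real site would generate unpredictable session identifiers and set security attributes on the cookie.

```python
from http.cookies import SimpleCookie

# Hypothetical server-side session store keyed by the cookie value.
sessions = {"abc123": {"cart": ["notebook"]}}

# 1. Server response: supply the cookie along with the page.
response_cookie = SimpleCookie()
response_cookie["session_id"] = "abc123"
response_cookie["session_id"]["path"] = "/"
set_cookie_header = response_cookie.output(header="Set-Cookie:")

# 2. Later request: the browser appends the stored cookie for this domain.
cookie_header_from_browser = "session_id=abc123"

# 3. Server interprets the cookie however it wants, here as a session key.
request_cookie = SimpleCookie()
request_cookie.load(cookie_header_from_browser)
session_id = request_cookie["session_id"].value
cart = sessions.get(session_id, {}).get("cart", [])
```

The server stays stateless at the protocol level; the cookie value is what lets it look up what it knew about this client.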
35
HyperText Markup Language (HTML)
• The browser interprets the HTML through the use of tags,
which are used to format the content of the Web page
• The tags, enclosed in angle brackets (< and >), mark the
placement and appearance of page components
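To show how a program sees tags in angle brackets, the sketch below feeds a tiny page to Python's standard `html.parser` and records each opening tag in order. The page content and the `TagCollector` class are made up for the example; a browser does far more than list tags.

```python
from html.parser import HTMLParser


class TagCollector(HTMLParser):
    """Record every opening tag in the order it appears."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def handle_starttag(self, tag, attrs):
        self.tags.append(tag)


page = (
    "<html><head><title>Demo</title></head>"
    "<body><h1>Welcome</h1><p>Hello, <b>world</b>!</p></body></html>"
)
collector = TagCollector()
collector.feed(page)
```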
37
A Simple Web Page
38
Typical HTML Tags
39
HTML Forms
• The primary method of data input into a web site is the HTML
form which is composed of one or more HTML controls
• These controls must match the data needs of the transaction
and minimize chance of data errors
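A server receiving a submitted form usually re-checks the data, since controls in the browser only reduce errors. The sketch below decodes a URL-encoded form body with the standard `urllib.parse.parse_qs` and applies two simple checks. The field names (`item`, `quantity`, `email`) and the validation rules are assumptions for illustration.

```python
from urllib.parse import parse_qs


def parse_order_form(body: str) -> dict:
    """Decode a URL-encoded form submission and validate its fields."""
    fields = {name: values[0] for name, values in parse_qs(body).items()}
    errors = []

    quantity = fields.get("quantity", "")
    if not quantity.isdigit() or int(quantity) < 1:
        errors.append("quantity must be a positive whole number")

    if "@" not in fields.get("email", ""):
        errors.append("email looks invalid")

    return {"fields": fields, "errors": errors}


good = parse_order_form("item=book&quantity=2&email=pat%40example.com")
bad = parse_order_form("item=book&quantity=-1&email=nobody")
```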
41
Server-Side Technologies
[Table: Server-side Technology | Programming Languages | Description]
43
PHP Overview
• PHP is a server-side scripting language used to develop
interactive web sites (e.g., to process HTML forms)
– Stands for “PHP: Hypertext Preprocessor”
– Originally stood for “Personal Home Page”
48
Technology Infrastructure
49
Business Issues
• Internet commerce is about business – using the network
effectively to achieve business goals
– Current technology offers tools for reaching business goals
– If no clear idea of business goals in using network, then technology
cannot help us to achieve them
• Business goals can also be changed to take advantage of
current technology
– Technology often allows new kinds of operations that were previously
too expensive
• E.g., entirely appropriate to choose new focus on closer customer
relationships, using the Internet to communicate with customers
– Without the network, such a goal might have been too expensive or
difficult to achieve
50
Two Key Technology Issues
1. How to apply Internet technology to business problems?
– E-commerce applications bring together many technologies: the web,
databases, high-speed networking, cryptographic algorithms,
multimedia, etc.
– Putting them together to form a secure, high-performance, integrated
e-commerce system can be challenging
2. How to deal with the fast pace of technological change?
– Any commerce system must be prepared to accommodate and
incorporate new technologies as they become available
• The key to adaptability is a coherent system architecture that
clearly lays out what is to be accomplished and why
– Adopt new technologies that help us achieve our goals, while avoiding
new technologies that may seem exciting but do not really fit in with
our goals or the system
51
Required Infrastructure
• Successful implementation of e-commerce requires
– Significant changes to existing business processes
– Substantial investment in technology
• Poor web site performance
– Drives consumers to abandon some e-commerce sites
52
E-Commerce Features
• Important component of e-commerce is the web site
– Web site should have technology that will make it easier for its
customers to navigate
– Site should offer every single feature necessary
– Fully-functional and sustainable e-commerce web site
– Stable server for hosting
– Provide customer specific services
– Technology partners who constantly upgrade the features as well as
the technology
– Help business partners such as logistic partners and suppliers to share
and exchange business data
53
E-Commerce Features (cont’d)
• In practice, many web servers also implement other features:
– Authentication and authorization before allowing resource access
– Handling of static content and dynamic content
– HTTPS support using SSL/TLS to allow secure (encrypted) connections
to the server on the standard port 443 instead of the usual port 80
– Content compression to reduce the size of the responses
– Virtual hosting to serve many web sites using one IP address
• Hosting multiple domain names on a single server, allowing server
to share its resources (e.g., memory, processor cycles) without
requiring all services provided to use same host name
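Two of the features above, virtual hosting and content compression, can be sketched together. The host names, page bodies, and the `serve` function are hypothetical; the point is only that the `Host` header selects the site and that a compressed response is labelled with `Content-Encoding`.

```python
import gzip

# Two hypothetical sites sharing one server and one IP address.
SITES = {
    "shop.example.com": b"<h1>Shop</h1>",
    "blog.example.com": b"<h1>Blog</h1>",
}


def serve(host_header: str, accept_gzip: bool = False):
    """Pick the site from the Host header; optionally compress the body."""
    host = host_header.split(":")[0].lower()   # drop an optional port
    body = SITES.get(host)
    if body is None:
        return 404, {}, b"unknown host"
    headers = {}
    if accept_gzip:
        body = gzip.compress(body)
        headers["Content-Encoding"] = "gzip"
    return 200, headers, body


status, headers, body = serve("shop.example.com")
gz_status, gz_headers, gz_body = serve("blog.example.com:80", accept_gzip=True)
```

For bodies this small, compression can make the response larger; the benefit appears with realistic page sizes.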
54
E-Commerce Infrastructure
• Information superhighway infrastructure
– Internet, LAN, WAN, routers, telecom, cable TV, wireless, etc.
• Messaging and information distribution infrastructure
– HTML, XML, e-mail, HTTP, etc.
• Common business infrastructure
– Security, authentication, electronic payment, directories, catalogs, etc.
55
Network and Packet Switching
• All EC depends on network to securely transmit data
– Internet (see earlier section)
– Intranet
• Local or restricted communications network
– Extranet
• Controlled private network that allows access to a subset of
information to partners, vendors, suppliers, customers, etc.
– Value-added network (VAN)
• Hosted service offering that acts as an intermediary between
business partners sharing data via shared business processes
– Virtual private network (VPN)
• Extends a private network across a public network, enabling users
to send/receive data across shared or public networks as if their
devices were directly connected to the private network
56
E-Commerce Processes
1. Attract customers
– Advertising, marketing
2. Interact with customers
– Catalog, negotiation
3. Handle and manage orders
– Order capture
– Payment
– Transaction
– Fulfillment (physical good, service good, digital good)
4. React to customer inquiries
– Customer service
– Order tracking
57
Electronic Exchange Model
59
E-Commerce Technology Stages
• 1995 – 1999: First Generation
– Establishing a web presence
– Static content
• 1996 – 2000: Second Generation
– Providing interaction
– Dynamic content
• 1999 – 2003: Third Generation
– Supporting transactions
– Internal automation
• 2002 – Today: Fourth Generation
– Transforming process
– External automation
60
First Generation
• Basic technologies are still used:
– Client/server networks
• The networks over which data travel
– Browser
• Application software that lets users request and view web pages
– HTTP protocol
• Standardized rules for exchanging data over the web
– HTML
• Language that guides the display of a requested page
61
Client/Server Network
Source: Information Systems: Creating Business Value, by Huber, Piercy, and McKeown, Wiley 2007
62
Second Generation
• Providing interaction between web page and user requires
dynamic content based on user input and programming
instructions
• The process involves:
– Obtaining input data
– Passing data to the web server
– Executing programming instructions to process the data
• Input data comes from several sources
– Web page header information about the user
– Server resources like the system clock
– Stored data about the user from a cookie
– Data input using an HTML form
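The input sources listed above can be combined in one small page generator. This is a hedged sketch: the `render_greeting` function, the `name` field, and the greeting text are invented for illustration, and each argument stands in for one source (request headers, a cookie, form data, and the server clock).

```python
from datetime import datetime
from html import escape


def render_greeting(headers: dict, cookie: dict, form: dict, now: datetime) -> str:
    """Build dynamic content from header, cookie, form, and clock inputs."""
    # Form input wins; fall back to stored cookie data, then a default.
    name = form.get("name") or cookie.get("name") or "guest"
    part_of_day = "morning" if now.hour < 12 else "afternoon"
    browser = headers.get("User-Agent", "unknown browser")
    # escape() keeps user-supplied text from being treated as HTML tags.
    return (
        f"<p>Good {part_of_day}, {escape(name)}! "
        f"You are using {escape(browser)}.</p>"
    )


html_page = render_greeting(
    headers={"User-Agent": "demo-browser/1.0"},
    cookie={"name": "Pat"},
    form={},
    now=datetime(2024, 1, 1, 9, 0),
)
```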
63
Third Generation
• Businesses recognized that they must deal with three issues
to be successful
1. Making it possible for customers to find information about
companies, products, and services
2. Making it possible for customers to order and pay online for goods
and services
3. Providing secure and private transactions
64
Search Engines
• Internet search engines make it possible for customers to find
information
– When you search the web, you are really searching a database that
was built from earlier crawls of the web
• The main difference in search engines is how the database of
web locations is created and organized
– Web sites are found by a web crawler (scours Internet for content) or
are submitted by humans
• An important consideration is how the database organizes or
indexes the web data – which pages are shown first when you
submit search criteria?
– Index (store and organize) content found during crawling process
– Rank (ordered by most to least relevant) pieces of content
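The index and rank steps can be illustrated with a toy inverted index. The page URLs and text are made up, and ranking by raw word counts is an assumption chosen for simplicity; real search engines use far richer relevance signals.

```python
from collections import defaultdict


def build_index(pages: dict) -> dict:
    """Index step: map each word to the pages containing it, with counts."""
    index = defaultdict(dict)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word][url] = index[word].get(url, 0) + 1
    return index


def search(index: dict, query: str) -> list:
    """Rank step: order pages by how often they contain the query words."""
    scores = defaultdict(int)
    for word in query.lower().split():
        for url, count in index.get(word, {}).items():
            scores[url] += count
    # Highest score first; ties broken alphabetically for a stable order.
    return sorted(scores, key=lambda url: (-scores[url], url))


# Pretend these pages were found by a crawler.
pages = {
    "a.example/shoes": "red shoes and blue shoes",
    "b.example/hats": "red hats",
    "c.example/about": "about our store",
}
index = build_index(pages)
results = search(index, "red shoes")
```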
65
Search Engine Process
Source: Information Systems: Creating Business Value, by Huber, Piercy, and McKeown, Wiley 2007
66
Order and Payment Systems
• All e-commerce sites must have components for processing
orders and accepting payments
• Four primary components of a typical e-commerce site are:
– The shopping and ordering system
– The merchant account
– The payment gateway
– The security system
• Most e-commerce systems use a secure HTML order form or
an in-house shopping cart system
– Smaller businesses often use third-party merchant accounts such as
those available from PayPal
• The shopping cart system is the most popular e-commerce
system for larger businesses where a customer wants to buy
multiple products usually using a credit card
67
Merchant Accounts
• Defined as a bank account that allows the merchant to receive
the proceeds of credit card purchases
• Secure gateway provider is company that provides a network
to process encrypted transactions from merchant’s web site
– It then passes the transactions on to the issuing banks for credit card
approval
– A secure gateway provider will usually provide a payment gateway to
link e-commerce site to the banking system
68
Steps in E-Commerce Process
1. Customer places order on e-commerce web site
2. Payment gateway provider detects placement of an order
– The provider securely encrypts the transaction data and passes an
authorization request to the bank to verify the customer’s credit card
account and available funds
3. Gateway provider returns a response, indicating whether or not
transaction is authorized, to e-commerce merchant (< 3 seconds)
4. Upon approval, e-commerce merchant notifies the user and fulfills the
customer's order
5. Gateway provider sends a settlement request to merchant account’s
bank
6. Merchant account’s bank deposits the transaction funds into e-
commerce merchant’s account
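The six steps above can be traced with a small simulation. Everything here is a simplified assumption: the class names, the card identifier, and amounts in cents are invented, encryption is omitted, and real settlement happens later in batches rather than immediately.

```python
from dataclasses import dataclass


@dataclass
class IssuingBank:
    available: dict   # card identifier -> available funds in cents

    def authorize(self, card: str, amount: int) -> bool:
        return self.available.get(card, 0) >= amount


@dataclass
class MerchantBank:
    balance: int = 0

    def deposit(self, amount: int) -> None:
        self.balance += amount


@dataclass
class Gateway:
    issuer: IssuingBank
    merchant_bank: MerchantBank

    def process(self, card: str, amount: int) -> str:
        # Step 2: send an authorization request to the bank.
        if not self.issuer.authorize(card, amount):
            return "declined"                    # Step 3: response to merchant
        # Step 4 (merchant notifies the user and fulfills) happens outside.
        self.issuer.available[card] -= amount    # Step 5: settlement request
        self.merchant_bank.deposit(amount)       # Step 6: funds deposited
        return "approved"                        # Step 3: response to merchant


issuer = IssuingBank(available={"card-001": 10_000})
merchant_bank = MerchantBank()
gateway = Gateway(issuer, merchant_bank)

first = gateway.process("card-001", 5_990)    # Step 1: customer places order
second = gateway.process("card-001", 5_990)   # not enough funds remain
```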
69
Steps in E-Commerce Process
Source: Information Systems: Creating Business Value, by Huber, Piercy, and McKeown, Wiley 2007
70
E-Commerce Security
• The order and payment systems must be secure to protect
both the customer and the merchant
– Most e-commerce security technologies relate to the Transport Layer
Security (TLS) protocol, which allows a client and a server to
communicate in a way that prevents eavesdropping, message forgery,
or tampering
• A server that encrypts data using the TLS protocol is known as
a secure server which uses HTTPS instead of HTTP
• Another security system is the Secure Electronic Transaction
(SET) protocol from Visa and MasterCard
• Cookies are an important part of these security systems as
they are used to authenticate users or to hold data to match
the user in the shopping cart
71
Encryption
• Most common method of providing security to e-commerce
transactions is encryption, the process of scrambling a
message so that it is meaningful only to the person holding
the key to deciphering it
– The reverse process, or decryption, converts a seemingly senseless
character string into the original message
• Two primary forms of encryption systems: private key and
public key encryption (more details later)
– A key is used with an algorithm to encode and decode messages
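The idea of a key plus an algorithm can be shown with a toy cipher. This sketch is for illustration only and must not be used to protect real data: it XORs the message with a keystream derived from a shared key via SHA-256, which is a construction assumed here for brevity, not a vetted encryption scheme. The key and message are made up.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    """Derive a repeatable pseudo-random byte stream from the key."""
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """Encrypt or decrypt: XOR with the same keystream undoes itself."""
    return bytes(d ^ k for d, k in zip(data, keystream(key, len(data))))


plaintext = b"Order 1001, total 59.90"
ciphertext = xor_cipher(plaintext, b"shared secret")    # scramble
recovered = xor_cipher(ciphertext, b"shared secret")    # unscramble
garbled = xor_cipher(ciphertext, b"wrong key")          # wrong key fails
```

Because the same key both scrambles and unscrambles, this corresponds to the private key (symmetric) form; public key systems use a different key for each direction.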
72
Fourth Generation
• XML and web services are technologies for dealing with the
software integration problem
– These technologies are moving us into the realm of
computers at one business automatically interacting
with computers at another business
74
Uses for XML
XML use and description:
• B2B e-commerce
– XML provides a tool for exchanging transaction data between applications
with a minimum of human interaction.
• Data storage
– XML stores data in plain text files, which allows the development of
generic applications to store, retrieve, and display the data.
• Basis for new languages
– XML has been used to create new languages, such as Wireless Markup
Language (WML), which is used to mark up Internet applications for
handheld devices like wireless PDAs and mobile phones.
• Data exchange
– Real-world systems often work with data in incompatible formats;
however, because XML is self-descriptive and usually transmitted as
plain text, it can be read by many applications.
• Increase in usefulness of data
– Since XML is platform-independent, data can be made available to more
applications than just the standard browser. Diverse applications can
access XML files as data sources.
• Separate data from HTML
– Developers can create “separate concerns” by storing data in separate
XML files. HTML can be focused on display that will not require any
changes as data changes.
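As an illustration of XML being self-descriptive and readable by many applications, the sketch below parses a small order document with Python's standard `xml.etree.ElementTree`. The element and attribute names (`order`, `item`, `sku`, `quantity`) are a hypothetical vocabulary invented for this example.

```python
import xml.etree.ElementTree as ET

# A hypothetical order document exchanged between two businesses.
order_xml = """
<order id="1001">
  <customer>Pat Lee</customer>
  <item sku="B-42" quantity="2">Notebook</item>
  <item sku="P-7" quantity="1">Pen</item>
</order>
""".strip()

order = ET.fromstring(order_xml)
order_id = order.get("id")
customer = order.findtext("customer")
# Each tuple is (sku, quantity, description) read straight from the markup.
items = [
    (item.get("sku"), int(item.get("quantity")), item.text)
    for item in order.findall("item")
]
total_units = sum(quantity for _, quantity, _ in items)
```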
75
Web Services
• Defined as any software, application, or cloud technology that
uses standardized web protocols (HTTP or HTTPS) to
interoperate, communicate, and exchange data messages
across the Internet
• Platform-independent component that can be
– Described using a standard description language
– Published to a public registry of services
– Discovered using a standard method
– Requested through an API
– Combined with services and procedures to compose an application
• Based on a number of accepted standards that allows
everybody to work on the same basis
76
Primary Web Service Standards
Standard and description:
• XML
– The language used by web services for marking the
exchanged data according to its meaning
• Simple Object Access Protocol (SOAP)
– A simple XML-based protocol to let applications
exchange information over HTTP
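A SOAP message is an XML envelope wrapped around the data being exchanged. The sketch below builds a minimal SOAP 1.1 style envelope with the standard library. The operation name `GetPrice` and its `sku` parameter are hypothetical, and a real service would define its operations and namespaces in its own service description.

```python
import xml.etree.ElementTree as ET

# Namespace used by SOAP 1.1 envelopes.
SOAP_NS = "http://schemas.xmlsoap.org/soap/envelope/"
ET.register_namespace("soap", SOAP_NS)


def build_envelope(operation: str, params: dict) -> str:
    """Wrap an operation call and its parameters in a SOAP envelope."""
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    call = ET.SubElement(body, operation)
    for name, value in params.items():
        ET.SubElement(call, name).text = str(value)
    return ET.tostring(envelope, encoding="unicode")


# This string would be sent as the body of an HTTP POST to the service.
request_xml = build_envelope("GetPrice", {"sku": "B-42"})
```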
77
Web Services
Source: Information Systems: Creating Business Value, by Huber, Piercy, and McKeown, Wiley 2007
78
N-Tiered Architectures
79
N-Tiered Architectures
• N-tier architectures have the same components
– Presentation
• Users access directly, also called front end layer (and client)
– Business/Logic
• Encapsulates business logic, also called middle or application layer
– Data
• External data source to store application data, typically database
server, also called back end layer
• N-tier architectures try to separate the components into
different tiers/layers
– Tier: physical separation
– Layer: logical separation
80
Layers in Tiered Architectures
• A layer can directly access only the public components of the
layer directly below it
– Presentation layer can only access the public components in
application layer, but not in data layer
– Application layer can only access the public components in data layer,
but not in presentation layer
• Why?
– Minimize the dependencies of one layer on other layers
– Benefits for layer development/maintenance (separation of concerns),
upgrading, scaling, etc.
– Makes the tier security enforcement possible
• E.g., client layer cannot access data layer directly, only through
the application layer, so the data layer is better protected
– Avoid cyclic dependencies among software components
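The access rule can be sketched with three small classes, each holding a reference only to the layer directly below it. The class names, the price table, and the method names are assumptions for illustration; in a real system each layer might run on a separate tier.

```python
class DataLayer:
    """Back end: owns the stored data."""

    def __init__(self):
        self._prices = {"B-42": 4.50}   # stand-in for a database table

    def get_price(self, sku: str) -> float:
        return self._prices[sku]


class ApplicationLayer:
    """Middle layer: business logic; talks only to the data layer."""

    def __init__(self, data: DataLayer):
        self._data = data

    def order_total(self, sku: str, quantity: int) -> float:
        if quantity < 1:
            raise ValueError("quantity must be at least 1")
        return self._data.get_price(sku) * quantity


class PresentationLayer:
    """Front end: formats output; talks only to the application layer."""

    def __init__(self, app: ApplicationLayer):
        self._app = app

    def show_total(self, sku: str, quantity: int) -> str:
        total = self._app.order_total(sku, quantity)
        return f"Total for {quantity} x {sku}: ${total:.2f}"


ui = PresentationLayer(ApplicationLayer(DataLayer()))
message = ui.show_total("B-42", 2)
```

The presentation layer never receives a `DataLayer` reference, so dependencies run in one direction only and no cycle can form.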
81
1-Tier Architecture
• Standalone application
– All 3 layers are on the same machine
• Presentation, logic, and data layers tightly connected
– Scalability: Hard to increase volume of processing
– Portability: Moving machines may mean rewriting everything
– Maintenance: Changing one layer requires changing the others
82
2-Tier Architecture
Architecture Principles
• Client-Server architecture
• Each tier (Presentation, Logic,
Data) should be independent and
not expose dependencies related
to implementation
• Unconnected tiers should not
communicate
• Change in platform affects only
the layer running on that
particular platform
85
3-Tier Architecture Example
86
N-Tier Architecture
88
N-Tier Architecture
• Advantages
– Scalable due to tier decoupling
– Better and finer security control over the whole system
• Enforce security differently for each tier
• E.g., put business and data tier behind firewall for protection
– Better fault tolerance ability
• E.g., cluster or load balance data layer without affecting other layers
– Independent tier upgrade without impacting other layers
• Layer dependency only on directly-below layer minimizes side
effect of layer’s change on whole system
– Friendly and efficient for development
• Decoupled layers are logical software components grouped by functionality
• Each layer can be assigned to specialized team for that area
– Better reusability
• Logically grouped components and loose coupling among layers
89
N-Tier Architecture
• Disadvantages
– Performance of whole application
• Slow due to hardware and network bandwidth (since more
networks, computers, and processes involved)
– More cost for hardware, network, maintenance, and deployment
90
Performance and Scalability
91
Goals and Approaches
• EC has made system growth more rapid and dynamic
• Want to improve performance and reliability to provide
– Higher throughput
– Lower latency
• Response time
– Increase availability
• Approaches
– Scaling network and system infrastructure
• How are performance, redundancy, and reliability related to
scalability?
– Load balancing
– Web caching
92
Web Server Demand
• The main job of a web server computer is to respond to
requests from web client computers
– Demand is the most important factor affecting speed of site
• Factors in overall demand
– Number of simultaneous users in peak periods
– Nature of customer requests (user profile)
– Type of content (dynamic versus static web pages)
– Required security
– Number of items in inventory
– Number of page requests
– Speed of legacy applications
93
Scalability Approaches
• The ability of a site to increase in size as demand warrants
– Scale hardware vertically
• Increase processing supply by improving hardware (i.e.,
processors, memory, disk space), but maintaining the physical
footprint and the number of servers
– Scale hardware horizontally
• Increase processing supply by adding servers and increasing
physical facility (share workload)
– Improve processing architecture of the site
• Improve processing supply by identifying operations with similar
workloads, and using dedicated tuned servers for each type of
load
94
Scalability Approaches
[Figure: vertical scaling versus horizontal scaling]
95
Load-Balancing Systems
• Load-Balancing switch
– Piece of network hardware that monitors the workloads of servers
attached to it
– Assigns incoming web traffic to the server that has the most available
capacity at that instant in time
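The assignment rule can be sketched in software, although the slide describes a hardware switch. The server names, capacities, and the `LoadBalancer` class are hypothetical; available capacity is modelled simply as capacity minus active requests.

```python
class LoadBalancer:
    """Send each request to the server with the most spare capacity."""

    def __init__(self, capacities: dict):
        self.capacity = dict(capacities)               # server -> max requests
        self.active = {name: 0 for name in capacities}  # server -> in flight

    def available(self, server: str) -> int:
        return self.capacity[server] - self.active[server]

    def assign(self) -> str:
        # On ties, max() keeps the first server in insertion order.
        server = max(self.capacity, key=self.available)
        if self.available(server) <= 0:
            raise RuntimeError("all servers are at capacity")
        self.active[server] += 1
        return server

    def release(self, server: str) -> None:
        """Call when a request finishes to free capacity."""
        self.active[server] -= 1


balancer = LoadBalancer({"web1": 2, "web2": 3})
assignments = [balancer.assign() for _ in range(5)]
```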
96
Web Architecture & Performance
• Web performance and scalability issues
– Network congestion
– Server overloading
• Basic web architecture
[Figure: client (browser) communicating with a web server]
97
Web Proxy as Firewall
• Web proxy can serve as intermediate server between clients
and resources the client is trying to access, i.e., web server
– To implement a firewall
– To improve performance (to web page access) through caching
[Figure: client (browser) reaching the web server through a firewall proxy]
98
Web Caching System
• Caching popular objects is way to improve web performance
• Web caching at clients, proxies, and servers
[Figure: client (browser), proxy, and web server]
99
Web Caching System Hits
100
Web Caching System Misses
101
Web Caching System
• Advantages
– Reduces bandwidth consumption (decreases network traffic)
– Reduces access latency in case of cache hit
– Reduces workload of the web server
– Usage history collected by proxy cache can be used to determine
usage patterns to optimize cache replacement and prefetching policies
• Disadvantages
– Stale data due to lack of proper updating
– Latency may increase in case of cache miss
– A single proxy cache can become a bottleneck
– A single proxy is a single point of failure
102
Web Caching Issues
• Cache replacement
– Web objects have different sizes, accessing costs, and access pattern
– Traditional replacement policies: LRU (Least Recently Used), LFU (Least
Frequently Used), FIFO (First In First Out), etc.
– Key-based, e.g., using size as the key (evicts the largest object first)
• LRU-MIN (applies LRU among the objects with the largest size)
• Lowest Latency First (evicts objects with the lowest download latency)
– Cost-based using factors such as last access time, cache entry time,
transfer time cost, etc.
• Least Normalized Cost (access frequency, transfer time cost, and
size)
• Server-Assisted Scheme (fetching cost, size, next request time, and
cache prices during request intervals)
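Of the policies listed, LRU is the simplest to sketch. The version below tracks object sizes in bytes, since web objects differ in size, and evicts the least recently used objects until a new one fits. The class name, the byte-based capacity, and the sample URLs are assumptions for illustration.

```python
from collections import OrderedDict


class LRUCache:
    """Size-aware cache that evicts the least recently used objects."""

    def __init__(self, capacity_bytes: int):
        self.capacity = capacity_bytes
        self.used = 0
        self.items = OrderedDict()   # url -> body, least recently used first

    def get(self, url: str):
        if url not in self.items:
            return None                  # cache miss
        self.items.move_to_end(url)      # mark as most recently used
        return self.items[url]           # cache hit

    def put(self, url: str, body: bytes) -> None:
        if len(body) > self.capacity:
            return                       # too large to cache at all
        if url in self.items:
            self.used -= len(self.items.pop(url))
        while self.used + len(body) > self.capacity:
            _, evicted = self.items.popitem(last=False)   # drop the oldest
            self.used -= len(evicted)
        self.items[url] = body
        self.used += len(body)


cache = LRUCache(capacity_bytes=10)
cache.put("/a", b"aaaa")
cache.put("/b", b"bbbb")
cache.get("/a")              # "/a" is now more recent than "/b"
cache.put("/c", b"cccc")     # needs room, so "/b" is evicted
```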
103
Web Caching Issues (cont’d)
• Prefetching
– Maximum cache hit rate is typically no more than 40% – 50%
– To increase hit rate, anticipate future document requests and prefetch
documents in caches
– Documents to prefetch
• Considered as popular at web servers
• Predicted to be accessed by users soon, based on access pattern
104
Web Caching Issues (cont’d)
• Cache coherency
– Cache may provide users with stale documents
– HTTP supports cache coherence
• GET retrieves a document given its URL
• Conditional GET combines with If-Modified-Since header
• Cache-Control: no-cache indicates that the object should be
reloaded from the server
• Last-Modified is returned in the response to a GET to indicate the last
modification time of the document
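The conditional GET exchange can be sketched from the server's side. The `handle_get` function and its arguments are invented for this example; it uses the standard `email.utils` helpers for HTTP-style dates and answers 304 (Not Modified) when the cached copy is still current, so the cache can keep serving it.

```python
from datetime import datetime, timezone
from email.utils import formatdate, parsedate_to_datetime


def handle_get(request_headers: dict, body: bytes, last_modified: datetime):
    """Answer a GET, honouring If-Modified-Since when the cache sends it."""
    headers = {
        "Last-Modified": formatdate(last_modified.timestamp(), usegmt=True)
    }
    since = request_headers.get("If-Modified-Since")
    if since is not None:
        cached_at = parsedate_to_datetime(since)
        # HTTP dates have one-second resolution, so ignore microseconds.
        if last_modified.replace(microsecond=0) <= cached_at:
            return 304, headers, b""     # cached copy is still fresh
    return 200, headers, body            # send the full document


v1_time = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
v2_time = datetime(2024, 1, 2, 8, 30, 0, tzinfo=timezone.utc)

# First fetch: no validator yet, so the full document comes back.
status1, headers1, body1 = handle_get({}, b"<p>v1</p>", v1_time)

# Revalidation: the cache echoes the Last-Modified value it stored.
revalidate = {"If-Modified-Since": headers1["Last-Modified"]}
status2, _, body2 = handle_get(revalidate, b"<p>v1</p>", v1_time)

# The document changed afterwards, so the server sends the new version.
status3, _, body3 = handle_get(revalidate, b"<p>v2</p>", v2_time)
```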
105
Web Caching Issues (cont’d)
• Dynamic data caching
– There exists non-cacheable data such as authenticated data, server
dynamically generated data, etc.
– Active cache
• Allows servers to supply cache applets to be attached with
documents, which are invoked upon cache hits to finish processing
without contacting the server
– Web server accelerator
• Resides in front of one or more web servers and provides an API
that allows applications to explicitly add, delete, and update
cached data (both static and dynamic data)
– Data update propagation (DUP)
• Maintains data dependence information between cached objects
and underlying data that affect their values – affected cached
objects are then either invalidated or updated
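The dependency tracking behind data update propagation can be sketched as a map from underlying data to the cached objects built from it. This is a simplified reading of the idea, not the published DUP algorithm: the class name, keys, and page contents are hypothetical, and only invalidation (not in-place update) is shown.

```python
from collections import defaultdict


class DependencyCache:
    """Cache that remembers which underlying data each object depends on."""

    def __init__(self):
        self.objects = {}                    # cached object name -> value
        self.dependents = defaultdict(set)   # data key -> object names

    def store(self, name: str, value: str, depends_on: list) -> None:
        self.objects[name] = value
        for key in depends_on:
            self.dependents[key].add(name)

    def data_changed(self, key: str) -> list:
        """Invalidate every cached object affected by a change to key."""
        stale = sorted(self.dependents.pop(key, set()))
        for name in stale:
            self.objects.pop(name, None)
        return stale


cache = DependencyCache()
cache.store("/product/42", "<p>Notebook: $4.50</p>", ["price:42", "name:42"])
cache.store("/catalog", "<ul>...</ul>", ["price:42", "price:7"])
cache.store("/about", "<p>About us</p>", [])

invalidated = cache.data_changed("price:42")
```

Only the pages built from the changed price are dropped; unrelated dynamic pages stay cached.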
106