Advanced Distributed Paradigms Notes

The document outlines the fundamentals of client/server architecture and socket programming, emphasizing the roles of processes, operating systems, and protocols in facilitating communication between applications. It explains the distinction between server and client processes, the importance of protocols for data exchange, and provides an overview of Java's Socket API for establishing connections. Additionally, it discusses the significance of standard protocols like HTTP and the challenges in designing and managing client/server interactions.

‭Monday September 4th‬

‭Quizzes are scheduled one day after the submission of homeworks.‬

‭Wednesday September 6th‬


P1 - Introduction to the Client/Server Model and Socket Programming
(Programming for Communication)
Socket programming or network programming

The objective of the chapter is to be able to write two applications A and B that do not necessarily run on the same machine. This brings up the concept of Information Exchange (not just data: it is not just raw bytes, but making sense of these bytes and understanding one another).
Such an information exchange is handled by the operating systems.
The OS is the piece of software that takes control of your machine: you boot it, it gets loaded by the BIOS, and it takes control to manage the operation of your machine.
The main motivation behind an OS is avoiding redundancy and repeated work, so as to avoid inconsistencies.
There are some common tasks needed by many programs, such as opening files, allocating space in memory, reading input from the keyboard, using peripherals, … These are taken care of by the OS, so you are able to focus on the main part of your program.
The OS is made up of several modules, each taking care of specific tasks.

A process is a program in execution (as it gets loaded from disk to memory).

Kernel mode (privileged mode)
User mode (non-privileged mode): user processes that run from programs that we wrote have no direct access to the hardware, as we don't want our processes to be bothered by low-level details.
System calls: we issue a call to the system for the OS to take care of such details (e.g. system calls such as fopen, malloc).
The main keyword here is ease of use.

- Asking the networking manager to open a connection to a remote host: we open a virtual connection (a logical representation of an actual connection open with the other side). Again, we do not care about the low-level details.
System calls that belong to the same family can be grouped into some kind of library. From the user (i.e. programmer) perspective, you only care about the API.
For networking management, we can talk about the networking library/API. The "fancy" name for it is the Socket API (the name comes from the idea of abstracting a sort of pipe that connects two programs, where whatever you put in one end of the pipe comes out at the other end).
Using the Socket API, you do not care about the geographical location of the programs (only a logical location).

We can now move to this representation:

-> Many new concepts:
- How to locate/identify a process on the Internet?
● The IP address of the machine on which the process is running. An IP address identifies only the machine (which could have multiple processes running on it); we do not make use of the process ID, as it is metadata about the process created and managed by the process scheduler (each OS module has its own variables, its own space, and hence its own data in order to manage processes).
● The port number of the process (a logical number), through which it is identified by the networking manager.
● The port number is a 16-bit unsigned integer (0 to 65535).
The Socket API needs the IP address and the port number of the process and gets you there.

‭Friday, September 8th‬


The operating system takes care of multiple details, such as accessing the root of each machine, dealing with potential data corruption and such. These are low-level details.

- What is a Server process?

- What is a Client process?

Let's consider a scenario: Process A asks the networking manager to open a connection on its behalf with Process B on its specific port number. This system call would be something as simple as copen, for example (taking an IP address and a port number). This request is sent to the other side and gets to Process B.
On the B side, the process starts and asks its networking manager to be known by port 1234, for example (through a bind) (the Registration step). The second step is another call from the process to the networking manager, asking it to listen on its behalf for incoming connection requests (a blocking call).
Process B would then have a representation of the open socket with Process A and thus sends a connection success.
Process A is assigned a random port by its own networking manager (it does not explicitly request its own port), which will be directly communicated to the other side.
B waits for incoming connections, while A sends a connection request. The connection is opened passively on side B (which needs to have a known port), and actively on side A (which will be assigned a random port).
The two sides do not play a symmetric role (passive vs. active / explicitly bound to a port vs. randomly assigned).
For a connection to be established, the two sides need to play asymmetric roles.
In this scenario: Process B is the Server, and Process A is the Client.

A server is a process that opens connections passively.
A client is a process that opens connections actively.
A server IS NOT A MACHINE!
The traditional false definition of a server as a machine comes from the expectation that the server be robust and have enough processing power, disk space and such, which leads to the idea of a server being a powerful machine. But the servers are the processes that run on those machines.

- What is a Protocol?

Once the connection is established between client and server, and before making sense of the data, there is a synchronization problem between the client and the server side. One of the first problems we are faced with is which side will start sending information. The data flow depends on the case. We need to describe how the communication shall proceed (who is sending and who is receiving, as well as when), so that the two sides can be in sync. But then we have another issue, which is the "what" that is being sent. The way of interpreting the data needs to be part of the description of the conversation between both sides. It needs to be governed by rules, so that both sides are in sync, understand each other and achieve a common purpose.
The Socket API does indeed allow you to open the connection, send and receive. But then it is up to the server and client developers to decide what is being sent, according to the rules that govern the communication between the two. There is hence a third role, which comes before the client and server roles, and which establishes the rules that govern the communication; these rules are called the Protocol. The role is called the protocol designer. A protocol is specified in natural language (like English).
The Protocol: the set of rules that govern the communication between the client and the server, and without which the connection cannot make sense and no communication can proceed.

‭Monday, September 11th‬


The client by default initiates the connection, but there is no rule that says which side should be sending and which side should be receiving. It is up to the developers, but there needs to be an agreement.
The protocol is a sort of contract between the client and the server and, hence, between their respective developers, so that their respective products can communicate. This contract can be seen as something horizontal. The client and the server do not necessarily have to speak the same language (for example, it doesn't matter whether the two processes are written in different programming languages, where a programming language is itself a virtual contract between the programmer and the compiler: if you do not respect the programming language, the compiler will not understand you), but they do have to share the same protocol.

An analogy:
You may master a programming language, but if you do not have a solution in mind, an algorithm, you will not be able to write a single line of code.
The Socket API gives you a powerful way to open connections, send and receive; but what are you going to send and receive? You need the protocol to tell you what to do.

Java Socket API Overview (java.net)
A connection has a representation on both sides (client and server) through variables. But as they represent the same concept, they are variables of the same type. In OOP, these would be two objects of the same class. The class from which we create these two objects is called Socket.

Classes:
● ServerSocket (server)
- ServerSocket(int port)
- Socket accept()
● Socket (client, server)
- Socket(String hostNameOrIP, int port)

● From the client side:

Socket s = new Socket("10.10.10", 1234);  // IP address of the server side, port number of the server
This s represents a connectionToServer.
(This assumes that on the server side, we already have a process on that IP and port, willing to accept a connection.)

● On the server side:

ServerSocket ss = new ServerSocket(1234);  // creating a server (not listening yet) and registering through port 1234
// (this might throw a BindException if the port is already taken by another server)
This class offers a method called accept:
Socket s = ss.accept();
This s represents a connectionFromClient.

Instead of having dedicated send and receive methods, a Socket encapsulates two objects of types InputStream and OutputStream from the java.io library.
Once we have these two objects, everything else is simple I/O operations, as dictated by the protocol (we no longer care about the networking infrastructure and such).
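Putting these calls together, a minimal sketch of the two sides could look like this (the port follows the 1234 example above; the class names and the echoed message are illustrative, not from the lecture):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.ServerSocket;
import java.net.Socket;

class MiniServer {
    public static void main(String[] args) throws IOException {
        ServerSocket ss = new ServerSocket(1234);           // passive open: register port 1234
        Socket s = ss.accept();                             // blocking call: wait for a client
        BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
        PrintWriter out = new PrintWriter(s.getOutputStream(), true);
        out.println("echo: " + in.readLine());              // plain I/O, as dictated by the protocol
        s.close();
        ss.close();
    }
}

class MiniClient {
    public static void main(String[] args) throws IOException {
        Socket s = new Socket("localhost", 1234);           // active open: a random local port is assigned
        PrintWriter out = new PrintWriter(s.getOutputStream(), true);
        BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
        out.println("hello");
        System.out.println(in.readLine());
        s.close();
    }
}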
Client/Server Application Example
● Purpose:
This client/server application shall allow the client to download and upload files from/to the server (Fx application, for File Exchange).
● Protocol:
1. The client opens a connection with the server and informs the server whether it wants to download or upload a file, using a header.
2. If the client wants to download a file, then:
2.1. The header will be as follows:
download[one space][file name][Line Feed]
2.2. Upon receiving this header, the server searches for the specified file.
2.3. If the file is not found, then the server shall reply with a header as follows:
NOT[one space]FOUND[Line Feed]

The notion of the header:
We always need some sort of metadata that explains what the data is. We need to send a description of what we are sending before we actually send anything. So the protocol describes what this metadata looks like. Hence, how the header looks is specified at the level of the protocol.
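For instance, the client side of rules 1 to 2.1 could be sketched as follows (a minimal sketch: the host, port and file name are illustrative, and the Line Feed is written explicitly to match the header format above):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStream;
import java.net.Socket;

class FxDownloadClient {
    public static void main(String[] args) throws IOException {
        Socket s = new Socket("localhost", 80);             // illustrative location of an Fx server
        OutputStream rawOut = s.getOutputStream();
        BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));

        rawOut.write("download notes.txt\n".getBytes());    // download[one space][file name][Line Feed]
        rawOut.flush();

        String reply = in.readLine();
        if ("NOT FOUND".equals(reply)) {                     // NOT[one space]FOUND[Line Feed]
            System.out.println("The server does not have this file.");
        }
        s.close();
    }
}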

‭Do something at home (source code in academix)‬

‭Monday, September 18th‬


When a protocol gets so popular that it gets adopted by everybody, it becomes a standard protocol. IEEE is an example of an entity that standardizes protocols. When it comes to the application layer, the IETF (Internet Engineering Task Force) is one of the main authorities that standardizes such protocols (for example, HTTP (Hypertext Transfer Protocol)).

Data flows between different processes according to different protocols. But if we analyze the traffic flowing over the Internet, we find that around 90% of it adheres to the HTTP protocol. Your browser allows you to browse the Internet using the HTTP protocol (it is an HTTP client, talking to HTTP servers out there); it is end-to-end, implemented at the level of your browser and of whatever server is out there.

The Internet is the underlying infrastructure (switches, routers, fiber optics). The part of the traffic on the Internet adhering to the HTTP protocol is the Web (in that sense, the Web is equivalent to HTTP).

In SMTP (Simple Mail Transfer Protocol), the protocol defines the sender, cc, subject, body, … in order to enable a sender and a receiver to exchange mail. (default port 25)
In FTP (File Transfer Protocol), not only the number of bytes being exchanged is specified at the level of the protocol, but also the file types, for example. (default port 21)

Telnet (the "void" protocol) doesn't specify anything: other than establishing a connection, it allows the client to send whatever command to the server. (default port 23)

An RFC (Request For Comments) is a draft of a protocol, awaiting confirmation.

Tim Berners-Lee (father of the Web (WWW)), who worked at CERN, worked on HTTP 0.9 (which was called the one-line protocol).

All protocols have a default port for the server (for example, Fx (and HTTP) is bound to port 80).
The default ports are not reserved for those specific protocols (it's more of a convention, especially for the standard protocols).

‭Wednesday, September 20th‬

‭P2 - Integration: Service Oriented Model and Programming‬

Throughout the history of computing, programs started on standalone machines. Engineers and computer scientists then realized that there was a need for going outside the box and into another one, as we strive to do even more with less and to be more productive:

Issues related to the client/server model:
● Designing a specific protocol or, in the best case, adopting and adapting an existing (maybe standard) protocol.
● Managing connections, as well as the corresponding streams, on both sides.
● Implementing the protocol, including (application) error management, on both sides.
Also, the more services/functionality that should be provided, the more tedious the whole process becomes (this affects not only the protocol, but also the implementation of both the client and the server).
This motivates the need for a better paradigm…

Our objective:
Can we imagine a programming paradigm which provides us, software developers, with the luxury of invoking remote services/functionalities as if they were local? In traditional programming, when you call a function, that function is implemented within your application (it might be in a different library, package, class, …), but it is still imported and is part of the process at runtime. What we are talking about here is having this functionality not be part of your application at all (it is remote), while still being able to invoke it; the technology, and a layer underneath it, generate the necessary code that fills the gap and makes the invocation reach the other side, giving the application developer the impression that the functionality is there locally.
Such a paradigm should hide all the programming hassle and details mentioned above.
This would:
● Increase developer productivity (leveraging the abstraction): you extend your own code without having to pay the price for all the remote infrastructure.
● Promote software integration for:
○ Richer functionality: extending the functionality of your application, without doing it yourself, through an external application (e.g. translation or weather services).
○ Higher performance: extending your application by leveraging applications that perform the heavy computations on your behalf (e.g. when you don't have the necessary processing power for a certain kind of computation: you have the formula or the algorithm, but you lack the computing power).

Cloud computing wouldn't be possible without such a paradigm (externalizing workloads so as to have applications, or parts of them, running on premise or in the cloud).

‭Read about:‬
‭●‬ ‭Serialization‬
‭●‬ ‭XML (eXtensible Markup Language)‬
‭●‬ ‭JSON (JavaScript Object Notation)‬

Brainstorming:
Let's say we are trying to design and develop a traditional client/server application that allows the client to perform the four basic math operations (+ - x :) on the server.
-> Instead of thinking in terms of headers and commands, we want to adopt a new mindset: we will think in terms of functions. These functions would be implemented on the server side, and we would be invoking them (through simple calls) and getting back the results.
We need a generic command (a function name corresponding to some functionality on the server), as well as a way to tell the other side the number of parameters it should expect and their types (some sort of metadata about the call). This is where serialization (or marshaling) comes in.

‭Friday, September 22nd‬


When we have something like: float sum(float, float);
the Business Implementation is Remote (it would be something like return x+y; , the real implementation).
The function we call as part of our application only shares the interface with the actual function (so that it gives us the impression that we are calling that remote function locally).
At the local level, what we have can look something like this (a sort of fake implementation):

float sum(float x, float y) {
    // open a connection to the provider
    // get the streams
    // send the request, read back the result
}

This is also hidden from you. The fake implementation acts as a bridge between the actual business implementation elsewhere and your code.
The fact that the prototype (Application Programming Interface (API)) is identical is what gives the impression that we are invoking the actual implementation.

‭●‬ ‭Having an API is not synonymous with having Remoting.‬

This fake implementation is called: Proxy Implementation (or Stub).
The contract between the service provider and the service consumer (programmer) is the API.
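As an illustration, a hand-written stub for the sum example could look roughly like this, assuming a made-up one-line text protocol (in practice, RPC technologies generate this code and use their own generic protocol; the host, port and message format below are illustrative):

import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.PrintWriter;
import java.net.Socket;

// Hypothetical client-side proxy: it shares the sum interface with the remote
// implementation, but its body only forwards the call over the network.
class CalculatorStub {
    float sum(float x, float y) throws IOException {
        try (Socket s = new Socket("localhost", 9000)) {              // open a connection to the provider
            PrintWriter out = new PrintWriter(s.getOutputStream(), true);
            BufferedReader in = new BufferedReader(new InputStreamReader(s.getInputStream()));
            out.println("sum " + x + " " + y);                         // marshal the call (made-up text format)
            return Float.parseFloat(in.readLine());                    // unmarshal the result
        }
    }
}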

We need to make a distinction between RPC (Remote Procedure Call), which is the concept of invoking remote functions as if they were local, and the technology, which provides us with the generic protocol and generates the proxy code.

The first attempts at implementing RPC were language-dependent, because the serialization process was initially specific to a programming language. As serialization is dependent on the protocol, the whole technology was language-dependent.
We then moved to the need for designing a serialization method that is language-independent.
XML is a language that sits at an equal distance from all programming languages, which is why it acts as a bridge between the different programming languages.

‭Monday, September 25th‬


We now speak of a service provider/producer (the side that provides the business implementation of the API) and a service consumer.
The service API/contract is what should be agreed upon; or rather, the service provider publishes its API, and it is up to the consumers, if they are interested in taking advantage of the service, to adhere to this API in order to know the kind of calls they should use to consume it.
This same API has another implementation, which is the client stub/proxy implementation. This is what gives the developer on the client side the impression that they are invoking the actual functionality on their side. This proxy gets generated from the API (it is specific to it), and must marshal the calls and parameters according to this very API.
The service provider, just like the service consumer, leverages the underlying RPC technology. If you were to develop the RPC technology itself, then we are talking about the technology provider.
These aspects of the technology are provided at the level of the RPC library. If you are the technology provider, you design the generic protocol, which also includes a way to do marshaling and unmarshaling. This technique is then applied at the level of the stub (which is API-specific), leveraging what is provided by the technology.
There is a special compiler, provided by the technology provider, which bridges the gap between the API and the stub implementation.

The stub, other than marshaling and unmarshaling, also needs to know the location of the server skeleton in order to open a connection and so on.

‭RPC Runtime Flow:‬

Development Process (of an application leveraging this paradigm)
There are two approaches: API-first / Code-first.
Regardless of the approach, the server skeleton and the client stub are generated automatically from the service API/contract, using the appropriate tool provided by the chosen technology.
API-first:
As its name implies, this approach consists of designing the service API/contract first. Then, the server-side code, as well as the client-side code, is created. From a design perspective, it's always good practice to specify the API/contract before delving into the implementation.

This service API, if the RPC technology were Java-based for example, would be an interface or a couple of interfaces; if it were C-based, it would be a header file.
There are, however, other languages that are used to specify interfaces, such as XML (WSDL).
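For instance, with a Java-based technology the API-first contract could simply be a Java interface (a sketch; the names below are illustrative):

// Illustrative service API/contract for an API-first approach with a Java-based technology.
public interface CalculatorService {
    float sum(float x, float y);
    float subtract(float x, float y);
    float multiply(float x, float y);
    float divide(float x, float y);
}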

Code-first:
In this approach, developers start by coding the service business implementation, or at least defining its business interface, in a target programming language such as Java, Python, JavaScript, etc. Then, they use an appropriate tool for the chosen technology to generate the service API. This won't be possible if such a tool doesn't exist for the chosen technology and target programming language.
The API generator creates an explicit interface from the implicit interface that is present at the level of the business implementation.
The service developer in this code-first approach implicitly plays the role of service designer as well.

Homework on Sunday (submission in Academix)
Quiz on Monday

‭Wednesday, October 4th‬


We need technology to generate the stub/skeleton.
We will be given a taxonomy of technologies (a way of categorizing them):

An RPC technology is, by default, language-dependent, as we are passing parameters, invoking functions, and so on, which have different representations depending on the language.
RPyC: a technology that is specific to Python and assumes that both sides (service provider and consumer) are implemented in Python (the parameters get marshaled in a way that is specific to Python, and it assumes that the other side is also in Python).
Distributed Ruby (for Ruby).
RMI (Remote Method Invocation, specific to Java): it uses a protocol called JRMP (Java Remote Method Protocol); the marshaling here is based on Java SerDes (Serialization and Deserialization).
Go RPC (for the Go language).
RPC (the implementation for the C language): note the difference between the general concept and the technology specific to C.
For all of these, which are language-dependent, both sides are forced to use the same language.
But then, as programmers wanted more flexibility, there are language-agnostic (neutral/independent) technologies. The most famous one is CORBA (Common Object Request Broker Architecture). This technology stands between objects in the network and acts as a kind of broker. In order to achieve this, the focus is on the marshaling process and on how the interface is expressed. CORBA created IDL (Interface Definition Language): the elements of the interface include how the types are represented as well.
The protocol for CORBA is called IIOP (Internet Inter-ORB (Object Request Broker) Protocol).

Then came XML (eXtensible Markup Language), a language which, just like IDL, is not an imperative or OOP language but a descriptive language, allowing you to describe data, functionality, services, … It represents a new generation that came with the rise of language-agnostic technologies. It allows you to structure data (to add metadata).
Something like:
● <fname> …….. </fname>
● <lname> …….. </lname>
These are tag-based (Markup). It is eXtensible because the tags are user-defined.
We need to provide a schema (a sort of dictionary) that describes our attributes: what is valid, whether an attribute is simple or complex (for example, an <address> tag which includes <street>, <city> and such). We also describe the types of each attribute for rigor, so that the parser can tell us whether our file is okay.
So we have XML + our own schema, which gives us a specialization of XML that yields a specific XML-based language.
Example of such a schema-governed XML description:
<method>
  <name> sum </name>
  <param>
    <name> …. </name>
    <sthg> …. </sthg>
  </param>
</method>

WSDL (Web Services Description Language): the basis is HTTP, hence "Web" (which is why these are classified under Web services).

HTTP needed to be augmented, which gave us SOAP (Simple Object Access Protocol), which is XML-based.
So we have XML/SOAP as this specialization of XML.

Layers of abstraction:
WSDL
XML
SOAP
HTTP
Socket API

SOAP (which is included as the body of the HTTP request) is used to specify details about the names of functions, arguments, …

‭Friday, October 6th‬


SOAP is an extension of the HTTP protocol (SOAP messages get encapsulated in HTTP messages). So HTTP, along with SOAP, are the protocols used within XML/SOAP.
We say SOAP over HTTP.

HTTP supports several verbs, but XML/SOAP uses only one HTTP verb/command/method, which is POST.

A request has an HTTP header and a body.

The header is made of several lines, carrying richer metadata and information from the client side to the server side, such as the encoding used, the content type to be accepted, and the content length.
POST http://localhost:9000/calculator HTTP/1.1  (the first, mandatory line, made of Verb [Space] URL [Space] Version [CRLF])
The URL is a standard that specifies how and where to reach a resource on the whole Internet. It is made of the location of the server serving that resource (IP or DNS name and port number), the protocol the server speaks, as well as a specific identifier of that resource.
(IP, Port, Protocol, ID) are the elements making up the URL. The way in which these are expressed has been standardized: Protocol://IP:Port/ID

Then we have a number of lines representing a variable number of attributes, all following the format Attribute: ___ [CRLF]

Accept-Encoding: gzip,deflate
Content-Type: text/xml;charset=UTF-8
SOAPAction: ""
Content-Length: 321
Host: localhost:9000
Connection: Keep-Alive
User-Agent: Apache-HttpClient/4.5.5

The delimiter between the header and the body is a blank line.

In the body, we can have anything. Its type is specified earlier, as part of the header, in the Content-Type attribute.
In the case of XML/SOAP, the body follows the SOAP protocol and is XML-based.

<soapenv:Envelope
xmlns:soapenv="http://schemas.xmlsoap.org/soap/envelope/"
xmlns:prov="http://provider.calculator.xs.integration.paradigms.sse.aui.ma/">
<soapenv:Header/>
<soapenv:Body>
(The stub generated this:)
<prov:computeAll>
<arg0>7.0</arg0>
<arg1>5.0</arg1>
</prov:computeAll>
</soapenv:Body>
</soapenv:Envelope>

The reply from the server side:
HTTP has multiple response status codes to express the outcome of the request: OK (200 OK), redirection, or no sufficient access, …

HTTP/1.1 200 OK
Transfer-encoding: chunked
Content-type: text/xml; charset=utf-8

<S:Envelope xmlns:S="http://schemas.xmlsoap.org/soap/envelope/">
<S:Body>
<ns0:computeAllResponse
xmlns:ns0="http://provider.calculator.xs.integration.paradigms.sse.aui.ma/">
<return>
(We notice the marshaled attributes:)
<sum>12.0</sum>
<difference>2.0</difference>
<product>35.0</product>
<ratio>1.4</ratio>
</return>
</ns0:computeAllResponse>
</S:Body>
</S:Envelope>

Development process:
In order to use API-first, we would need to design the API, and hence master WSDL.

In our case, we will be focusing on Code-first.

Log prints on the server side.

@WebService from javax.jws.WebService (this annotation is used to support the XML/SOAP technology)

After instantiating an instance of the service, it only gets published once we make use of the Endpoint.publish method:
Endpoint.publish(URL, calculator);
(This publishes the calculator object under this URL, so the client side only needs the WSDL file that describes the interface (API), as well as the location (URL).)
This URL is used to reach that service.
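Putting the pieces together, the provider side could look roughly like this (a sketch: the class and method bodies are illustrative, while the annotation, the Endpoint.publish call and the URL are the ones mentioned in this example):

import javax.jws.WebService;
import javax.xml.ws.Endpoint;

@WebService                                   // marks the class as an XML/SOAP (JAX-WS) service
public class Calculator {
    public double add(double x, double y) { return x + y; }
}

class CalculatorPublisher {
    public static void main(String[] args) {
        // publishes the calculator object under this URL;
        // the client side only needs the WSDL (the API) plus this location
        Endpoint.publish("http://localhost:9000/calculator", new Calculator());
    }
}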

When we want to develop any application, we need a package manager that fetches the libraries, dependencies and such for that specific application. We make use of build tools: in JavaScript we use NPM; its equivalent in Java is Gradle / Maven.

‭Monday, October 9th‬


Generate the WSDL and the Java server stub code:
● Build the code:
./gradlew build
● Generate the CalculatorService WSDL, as well as the skeleton code, from the ma.aui.sse.paradigms.integration.xs.provider.Calculator class:
wsgen -wsdl -cp build/classes/java/main/ -d build/classes/java/main/ -r src/main/resources/ ma.aui.sse.paradigms.integration.xs.calculator.provider.Calculator
(the WSDL file is put under src/main/resources/; the classpath points to where the compiled code of the class is, as well as to any additional dependencies)
● Change the service location URL within the generated src/main/resources/CalculatorService.wsdl (<soap:address location="REPLACE_WITH_ACTUAL_URL"/>) to: http://localhost:9000/calculator

The WSDL file that is generated represents our API.
This WSDL file is the only thing that the client-side developer needs.
The client-side developer needs to compile this WSDL file to the language they are using, in order to generate the stub.
In dynamic languages that are not strongly typed (where you don't need to declare types explicitly), the stubs can be compiled at runtime.

RESTful (RS) Web Services
REST (REpresentational State Transfer) is not a technology per se, like XML/SOAP; it is considered instead an architectural style: a set of requirements, recommendations and guidelines to develop an application (expose services and consume them). It doesn't specify an additional protocol, marshaling and unmarshaling techniques, and such.
REST comes with the following architectural properties / non-functional requirements:
● Performance (measured from the user perspective as the response time; from the provider side, it also includes the processing power, the number of transactions you are able to process per second)
● Scalability (related to performance but different: the ability to maintain and preserve performance when the load increases, at an acceptable or linear cost)
● Simplicity (we enjoy and appreciate the simplicity of REST)
● Modifiability (it should allow resources to be modified)
● Portability
● Reliability

To fulfill these requirements, REST defines a set of constraints / design principles:

● Client/Server architecture
● Statelessness (as opposed to statefulness)
We have Stateless and Stateful.
Between a client and a server, over an interaction or conversation, we have a set of requests and responses. For a given interaction, the question is: does the server remember (preserve/recall) what has happened with that client or not (i.e. does it preserve the state of the conversation, what has happened so far)?
For example, in an e-commerce application we have a shopping cart; the question is whether the server actually remembers what you have chosen so far, or whether it is the client that keeps track of what you have chosen. This constitutes the state of the interaction.

Drawbacks of statefulness include resource consumption at the level of the server (the more clients you have, the more state you need to preserve: scalability concerns). We also need load balancing.
With statelessness we do not keep any data on the server, which helps with scalability (we do not care about data replication and synchronization issues).

● Cacheability
● Self-descriptive messages
● HATEOAS: Hypermedia As The Engine Of Application State

No class on Wednesday or Friday.
Make-up sessions this Saturday and next Saturday at 10:30.

‭Make-up Class Saturday, October 14th‬


REST is just a style, designed with HTTP in mind.
REST is not RPC, because it does not specify an API language.
HTTP and REST go hand in hand.
GET is the same in both REST and HTTP: it allows fetching resources from the server side.
GET should have no side effect on the server side; it shall not change anything.
POST is supposed to let the client post content on the server side (create).
Part of the REST style is to refer to collections.
GET http://aui.academix.com/courses/1 : here I am asking to get, from the courses collection, the record that has id 1.
POST http://aui.academix.com/courses/ : give the attribute values in the body of the request (JSON/XML), e.g. {"name": "Adv. and...", "code": ...}. It is not up to me to specify the id.
PATCH is used to modify/update:
PATCH http://aui.academix.com/courses/2 with, say, {"name": "Advanced.."} to change the name.
If I want to override the whole record, I need to use PUT:
PUT http://aui.academix.com/courses/2 : all attributes should be supplied.
DELETE http://aui.academix.com/courses/2 : the body will be empty.
Two central concepts in REST: record / collection (creating records under collections).
In REST: POST/GET/PATCH/PUT/DELETE.
Create, Retrieve, Update (partially or fully), Delete: CRUD <--> databases.
REST is suitable for data-oriented applications, or data-oriented scenarios within apps.
Something like sum(x, y); is not data-oriented; it is just processing, a sophisticated algorithm to deliver a service.
XML/SOAP is not data-oriented semantically speaking; the intention there is to invoke a function.
For data-oriented cases we use REST, otherwise we use XML/SOAP.
We can have something like this:
GET http://aui.academix.com/courses?name=CSC*&credits=3 : the response can be in XML or JSON or some other format, e.g. {[{"name":"...", "code": ".."},{},...]}
This is analogous to: Select * from Course where name like "CSC*" and credits=3;
A web service that does not respect all of these principles should not be considered RESTful.
OpenAPI is not part of REST; it is a choice!

‭If we want to fetch something in brands: GET url/brands/7 - brands here is a collection‬

‭GET should have no effect on server-side‬

‭Monday, October 16th‬


A situation is suitable for REST if the application is data-driven (CREATE, RETRIEVE, UPDATE, DELETE: these naturally correspond to the HTTP methods POST, GET, PATCH/PUT (depending on whether the update is partial or full), and DELETE), for example catalog management for an e-commerce app.

We are looking at a case study of a calculator app. We can make use of REST with OpenAPI, even though here there is no natural mapping between its methods and the HTTP methods (as it is processing-driven).
For the server side we will use a Spring Boot app server (an IoC container). This container runs within the JVM, which runs within the OS. Spring exposes a web server making use of HTTP; it responds to REST requests and maps them to the methods we expose (it invokes them). From this we generate our OpenAPI description to get an API; a stub generator then makes use of it to generate a Python stub.

If we want to make use of the add method, our request would look like:
GET http://……:8080/calculator/add?x=5&y=7
Decorator/marker/annotation are the same term.

We annotate our app using @RestController, which makes Spring instantiate it on our behalf and start listening for requests, and we note that this component is mapped to /calculator.

Callback methods are not called by us directly; they are called by Spring, so that it calls us back (we expose them for Spring).
We only specify the what, not the when and how (which are taken care of by Spring).
IoC stands for Inversion of Control.
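A minimal sketch of such a controller, assuming Spring Boot's standard web annotations (the class name is illustrative; the mapping mirrors the GET /calculator/add?x=5&y=7 example above):

import org.springframework.web.bind.annotation.GetMapping;
import org.springframework.web.bind.annotation.RequestMapping;
import org.springframework.web.bind.annotation.RequestParam;
import org.springframework.web.bind.annotation.RestController;

@RestController                        // Spring instantiates this component and listens on our behalf
@RequestMapping("/calculator")         // the component is mapped to /calculator
public class CalculatorController {

    // Callback invoked by Spring for GET /calculator/add?x=5&y=7
    @GetMapping("/add")
    public double add(@RequestParam double x, @RequestParam double y) {
        return x + y;
    }
}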

REST is a much simpler and more rigorous solution, offering a natural mapping to REST verbs for data-driven applications.
‭Friday, October 20th‬

‭P3 - Performance: Multithreaded and Asynchronous Programming‬

Performance is measured from a user perspective using the following metric: response time (users want an acceptable, if not good or very good, response time (less than 300 ms)).
Increasing processing power is one way, but it is costly. So we need to optimize at the software level first.
One of the techniques at the software level is asynchronous programming (other techniques exist, such as caching).
● Asynchrony as a means for supporting performance
Before getting into this subject, we will cover other, traditional ways to support performance.
Single-threading:
- Statements execute synchronously (statement i+1 never executes before statement i terminates execution), one after the other (a set of statements, with one line of execution).
There are cases, though, where statement i+1 doesn't depend on the outcome of statement i. They might be so completely independent that they could even be executed in parallel. But in this single-threaded concurrency model, even though from a logical perspective you would want them to run in parallel, the fact that you are using this model stops you from doing so.
- Statements are blocking (they block what comes next).
- One call stack is used to keep track of where we are (the call stack is maintained by the OS for each process to keep track of where we were each time we call a function, so that we are able to come back). One stack keeps track of one line of execution, so if we want multiple lines of execution we need multiple call stacks.
- The function that is currently executing is pushed onto the top of the stack.
- Its stack trace: the elements on which it is stacked, all the way down to main.
- Where it should return within its calling function.
- Once it returns, it gets popped from the top of the stack.
- Synchronous/blocking I/O is a huge waste of CPU time!

When we have a certain DMA (Direct Memory Access) operation f1, for example, that takes too much time, there is an opportunity to run the subsequent statements (as long as the CPU is idle and those statements do not depend on f1).

Multi-threading:
It is based on having an independent call stack per line of execution.
The main characteristic of a thread is that threads share the same memory, in opposition to processes.
In order for an application to use multithreading, it needs support from the runtime of our application (for example the JVM (Java Virtual Machine) and the JRE (Java Runtime Environment), which understand the Java program, whereas the OS only sees the process).

When we add multithreading to our application, this has no incidence on the rules of the communication (so there is no need to change or upgrade the protocol).

start() creates a thread and calls run().

Multi-threading:
- Based on different lines of execution (threads), sharing memory (the heap).
- E.g. to handle several requests/clients concurrently (see the sketch below).
- Requires:
  - programming language support to request thread creation;
  - runtime support for actual thread creation and management;
  - an independent call stack per thread to keep track of where we are.
- Leverages thread interleaving by the runtime to optimize CPU usage:
  - I/O-waiting threads get preempted (by the runtime);
  - ready threads get executed concurrently (by the runtime).
- Thread safety / state consistency is the responsibility of the developer!
- Within each thread, the execution is still synchronous: one line after the other.
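For instance, a server can hand each accepted connection to its own thread (a sketch; the port number and the handler body are illustrative):

import java.io.IOException;
import java.net.ServerSocket;
import java.net.Socket;

class ThreadedServer {
    public static void main(String[] args) throws IOException {
        ServerSocket ss = new ServerSocket(1234);          // passive open
        while (true) {
            Socket client = ss.accept();                   // blocking accept on the main thread
            new Thread(() -> handle(client)).start();      // one line of execution (and call stack) per client
        }
    }

    static void handle(Socket client) {
        try (Socket c = client) {
            // apply the application protocol for this client here
        } catch (IOException e) {
            e.printStackTrace();
        }
    }
}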

‭Monday, October 23rd‬


Every time we try to check a port in order to open a connection, the operation takes a few milliseconds (years from the perspective of the CPU), so doing this in a non-blocking manner would greatly improve performance.

s = new Socket(ip + i, p);
System.out.println("Success!");
s.close();

For the example of Java, we would wrap this within the run() method, called by start(), as part of the Thread class.

We would want to be able to call the constructor (s = new Socket …) without blocking.
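Wrapped in a Thread subclass, the slow constructor call runs on its own call stack and no longer blocks the main line of execution (a sketch; the host and port values are illustrative):

import java.io.IOException;
import java.net.Socket;

class PortCheck extends Thread {
    private final String host;
    private final int port;

    PortCheck(String host, int port) { this.host = host; this.port = port; }

    @Override
    public void run() {                               // runs on its own call stack
        try (Socket s = new Socket(host, port)) {     // the slow, blocking part
            System.out.println("Success! " + host + ":" + port);
        } catch (IOException e) {
            // the port is closed or unreachable
        }
    }
}

// usage: start() creates the thread, which then calls run()
// new PortCheck("10.10.10." + i, 1234).start();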

For this to be done asynchronously, we would need an additional parameter that provides, once the connection is ready, the logic to apply to the result when it becomes available (in the future). This recipe would be in the form of a function definition that would eventually be called, thanks to the support of the runtime: a callback.

af1(..., cb1);  // cb1 being the callback function definition, e.g. r => {...}
af2(..., cb2);
af3(..., cb3);

Here, af2 for example doesn't depend on anything from af1.
We would need 3 threads (3 lines of execution).
There is no necessary order in which these functions finish.

Our lines of execution:
l1: cb1
l2: cb2
l3: cb3

Let's assume cb2 finished first and is ready.
For these callback functions, we have a callback queue (waiting above the call stack; it acts as a buffer):
callback queue: cb2, cb1, cb3
The callback function that is ready is then pushed onto the main call stack, on top of main.

Event loop: when an async function terminates, its callback function is invoked, …
In multithreading (with multiple call stacks), if you launch many threads you end up having many call stacks, which consumes your memory. In this model, we want to have only one call stack.

When a callback gets onto the call stack, the subsequent callback cannot get onto the call stack until the first one terminates. So here, for example, cb1 would be waiting for cb2. In case a callback is performing some heavy processing (not an async I/O operation), we would have cb1 waiting for a long time. In that case, having such a programming model with only one call stack is not appropriate (heavy-processing applications). These runtimes (for example Node.js for JS) are not suitable for such applications. But most applications that we would be working with (like in our projects (capstones, internships), basic common needs (HR systems)) can be done using Node, as most scenarios are just basic I/O.
This model is for I/O-intensive apps.

For I/O-intensive apps it is more efficient (it consumes less memory), as it only makes use of a single call stack, which is enough.
If we have multiple asynchronous functions, each calling other asynchronous functions within their callback functions… => callback hell

‭Wednesday, October 25th‬


Callback Hell
Callbacks are key to asynchronous programming.
But what happens when the callback result of an asynchronous call needs to be passed to a second asynchronous call?
- Even worse, what if the callback result of the second asynchronous call needs to be passed to a third asynchronous call, and so on…
What about errors?
We end up having deep and ugly levels of nested callbacks:
● Difficult to trace and manage
● Known as the callback hell

(Using callback-based asynchrony, functions do not return, so the result can only be accessed within the callback function. This is why we end up with so much indentation and confusing code.)

In the callback, we might have two parameters: first an error, in case the async function doesn't terminate properly, and another one if it terminates properly and yields a result.

-> Callback-based asynchrony is bad for chaining and error handling
-> A better asynchrony paradigm is needed!

The root cause of the problem with async functions is the fact that they cannot return a result. We might return a pointer to the result; not a pointer in memory (an address), but a pointer in time (so that when the result is available, in the future, we get access to it). So it is not the result, but a promise of the result.

The async function we are thinking of launches its work in a non-blocking way, and it returns a promise that it will supply a result once it is ready.
Pseudo-code:
launch the execution in a different thread
create and return a promise

promise = asyncFunc(data);
promise.then(r => { f(r) });  // we have the promise; once it is fulfilled (the async function completes execution and its result is returned), then the callback runs

This would look something like this:
asyncFunc1(data).then(r1 => { return r2 })
  .then(r2 => { return … })
  .then(r3 => { })
  …
(no indentation)

Promises let asynchronous methods return values like synchronous methods:
● But instead of immediately returning the final value, the asynchronous method returns a promise to supply the value at some point in the future.
At any point in time, a promise is in one of the following states:
● Pending
● Settled
- Resolved (Fulfilled)
- Rejected

‭Friday, October 27th‬


Quiz on Wednesday
Homework due Tuesday night

Zeep is a module that provides the XML/SOAP implementation in Python.

In JavaScript, I/O library calls (such as those of the soap module) are by default asynchronous.

In the case of our calculator app, an async (callback-based) version of it would look something like:

soap.createClient(url, (err, calculator) => {
  calculator.add(args, (er, result) => {
    console.log(args.arg0 + ' + ' + args.arg1 + ' = ' + result.return);
  });
  calculator.subtract(…….

This would still put us in a situation where we might encounter callback hell, so a promise-based version would look like:

calculatorP = soap.createClient("_________");
// createClient is done only once; the resulting promise is cached: it resolves once,
// and its value is the same no matter how many times you call .then() on it
calculatorP.then(calculator => calculator.add(args))  // the result is passed on as a promise to the next line
  .then(result => console.log(result));

calculatorP.then(c => c.subtract(args))
  .then(r => console.log(r));

.then(....)
.then(....)

In case we chain the subsequent operations one after another off the first call instead of branching off calculatorP, each operation can only run once the promises of the previous operations have resolved. In that case, we are not really leveraging asynchronicity.

In general, whenever we deal with I/O we should favor asynchronous (non-blocking) libraries over synchronous (blocking) libraries. Among the asynchronous libraries, we should favor promise-based ones over callback-based ones.

● The then() method returns a promise.
● A value returned inside a then() handler becomes the resolution value of the promise returned from that then().
● If the value returned inside the then() is a promise, then the promise returned by then() will "adopt the state" of that promise and resolve/reject just as that promise does.

‭Monday, October 30th‬


promises = []
for (i = 0; i < n; i++) {
  promises.push(get(i, i + S/n));
}

Promise Orchestration:
Promise.all(iterable)
- Returns a promise that either fulfills when all of the promises in the iterable argument have fulfilled, or rejects as soon as one of the promises in the iterable argument rejects.
- If the returned promise fulfills, it is fulfilled with an array of the values from the fulfilled promises, in the same order as defined in the iterable.
- If the returned promise rejects, it is rejected with the reason from the first promise in the iterable that rejected. This method can be useful for aggregating the results of multiple promises.

Promise.all(promises).then(chunks => {})
This defines one callback that executes if and only if all promises get resolved (in the download case, when all the promises of the different chunks get resolved, hence they are all downloaded and ready to be written to disk).

Promise.race(iterable)
- Returns a promise that fulfills or rejects as soon as one of the promises in the iterable fulfills or rejects, with the value or reason from that promise.

for (i = 0; i < n; i++) {
  promises.push(server[i]);
}
Promise.race(promises).then(i => {});

Promise-based asynchrony doesn't offer enhanced performance over classic callback-based asynchronous programming; it just offers a way to avoid ugly code and harder debugging.

Advantages and disadvantages:
● Promises avoid callback hell through chaining, orchestration and error handling.
● But they still offer a paradigm that differs from the straightforward way of synchronous programming (an unfamiliar way compared to traditional programming), as promised outcomes can only be accessed as a parameter of then() callbacks.
● Can we do better?

Await:
let response = await fetch('____________')
let text = await response.text()
console.log(text)
(We cannot do this directly, as we need to define the enclosing function as async (boxed).)

‭Wednesday, November 1st‬


We can await whatever is awaitable (any function that is asynchronous: either marked as async, or a function that returns a promise).

async function f1() {
  let response = await fetch('http://www.example.org/example.txt')
  let text = await response.text()
  console.log(text)
}
f1()

In this context we define the function not for modularity or code reuse, but because we have awaitable operations that need boxing.

There is a better way of doing this, using an IIFE:
we wrap the code inside an Immediately Invoked Function Expression.

(async () => {
  try {
    let response = await fetch('http://www.example.org/example.txt')
    let text = await response.text()
    console.log(text)
  } catch (error) { console.log(error) }
})()

Here we do not need to define the function first and then call it; we invoke the wrapped function itself immediately.

● Single-threading cannot be used in professional applications, especially as the server side deals with multiple clients concurrently, which we cannot afford to handle synchronously (with the CPU sitting idle) when we have I/O.
● We either use multithreading (you need to create the threads yourself, and these threads consume memory) or async programming (either traditional callback-based or promise-based).
● Another option is making only the I/O operations asynchronous, which favors using promise-based async programming.

fetch returns a promise asynchronously.
For an async function, when you call it, it triggers the action in a different thread and returns a promise. Meanwhile, regardless of whether we call then() on the promise or not, the request in the async function is already sent.
Promises are hot/eager.

P4 - Extensibility and Reactivity: Functional and Reactive Programming

‭Professor-provided Table of Contents:‬


‭●‬ ‭The Windows Case Study‬
‭●‬ ‭The Observer Design Pattern‬
‭●‬ ‭From The Observer Design Pattern To Reactive Extensions (Rx)‬
‭○‬ ‭Observable Evolution‬
‭■‬ ‭From shared context to dedicated context‬
‭■‬ ‭From hot to cold‬
‭○‬ ‭Observer Evolution: +error, +complete‬
‭●‬ ‭RxJS‬
○ RxJS Artifacts
‭■‬ ‭Observable‬
‭■‬ ‭Observer‬
‭■‬ ‭Subscription‬
‭■‬ ‭Subscriber‬
‭■‬ ‭Subject‬
‭○‬ ‭RxJS Features‬
‭○‬ ‭Execution Canceling‬
‭○‬ ‭Pipelining‬
‭○‬ ‭Operators‬
● From Promise-based Asynchrony To Rx-based Asynchrony

‭○‬ ‭The Downside of Promises‬
‭○‬ ‭Rx Observables To The Rescue‬

Let's imagine a context where we have two windows which are synced, meaning that whenever we make a change in the first window, this change is mirrored in the second one.
Let's say that we perform checks at a certain frequency to detect whether these changes happened and reflect them accordingly. The more we increase the check frequency, the more we decrease the inconsistency window or time frame.
In this context we have two different actors: the windows themselves and the operating system (specifically the FileSystem, which creates the folder, renames it, deletes it and such). The window just shows the state of the filesystem: it reflects the state at the level of the file system (created, kept track of, and managed at that level). The truth holder is the filesystem (it is the main actor, the acting entity). Because of that, instead of making the windows ask each time, we let the filesystem notify them whenever there is a change. Each window registers with the filesystem (stating that it is interested in whatever changes happen in that path). The filesystem takes note of this by keeping a pointer to that window.
The window is where we interact with the filesystem and get the feedback (notifications).

In this scenario, the window observes the filesystem (but in reality the window doesn't actively observe: it registers and asks to be notified back, so the active entity is the filesystem; the window is passive and just reacts). The party that reacts is the observer, and the truth holder is the observable (active).

-> Observer Design Pattern

This reminds us of polymorphism in Java (different observers may have different behaviors, despite exposing the same notify/update method).

‭Friday, November 3rd‬


The Observer is an interface; we also have concrete observers (e.g. ConcreteObserver) which implement the update method differently. The observable doesn't care about these different behaviors.

We have this interface because we might need these different implementations (concrete observers).
It also acts as the contract between the observables and the observers: it is as if the observables are telling the concrete observers that if they want to get notified of events, then they must expose and implement this update() method (the only way to get this support is by implementing this method).

In this context, it is the callee that gets the advantage of the call being made, not its caller. This is why update is considered to be a callback. The subject (which is the caller) isn't the one concerned with the benefit of making that call.

‭Wednesday, November 8th‬


The window watches/observes the node management done by the file system.
The truth that the file system holds is the truth about the state of these nodes.

notify() should iterate through the subscribers and call update() on them.

The only requirement the filesystem places on whatever subscribes is that it exposes an update method: hence, we just need an interface exposing one method, update -> Observer.
We can make use of inheritance, since the LinkedList<Observer> will be needed by every observable we have.
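A minimal sketch of that design in Java (the class names follow the lecture's vocabulary; the notification carries no payload here for brevity):

import java.util.LinkedList;

// The contract: anything that wants to be notified exposes update().
interface Observer {
    void update();
}

// Reusable base class: it keeps the list of subscribed observers and notifies them.
class Observable {
    private final LinkedList<Observer> observers = new LinkedList<>();

    void subscribe(Observer o) { observers.add(o); }    // registration step

    void notifyObservers() {                            // iterate through the subscribers and call update()
        for (Observer o : observers) {
            o.update();
        }
    }
}

// The truth holder extends the base class and notifies on every state change.
class FileSystem extends Observable {
    void createNode(String path) {
        // ... change the state of the nodes here ...
        notifyObservers();
    }
}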

‭Friday, November 10th‬


‭ rom the observer design pattern to reactive extension rx‬
F
‭RxJava‬
‭RxJS‬
‭RxPython‬
‭Check the observer/ observable code on github.‬

Two methods have been added; the contract now has 3 methods:

●	Update (what Rx calls next): the observer gets called back with each successful data element
●	Error: lets the observable tell the observer that something went wrong
●	Complete: a straightforward method that allows the observable to tell the observer: I am complete

This shifts the responsibility for signaling errors and completion to the observable.

Every observer has its own execution context, so it doesn’t miss any notification.
For Rx, an observable is code, and there is one execution context per observer.
Is the observable a data producer? It pushes notifications, so we can say that the observable is a data producer, and the observer reacts to the data.
An Rx observable produces the data from the beginning to the end for each observer that subscribes to it:

myObservable.subscribe(myObserver);
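As a small sketch of this three-method contract, assuming the RxJava 3 API (io.reactivex.rxjava3) mentioned above; the data values are made up:

import io.reactivex.rxjava3.core.Observable;

public class ContractDemo {
    public static void main(String[] args) {
        Observable<Integer> myObservable = Observable.just(1, 2, 3);

        // The three callbacks of the contract: update (next), error, complete.
        myObservable.subscribe(
            value -> System.out.println("update/next: " + value),  // called once per data element
            error -> System.err.println("error: " + error),        // called at most once, on failure
            () -> System.out.println("complete")                   // called once the recipe is done
        );
    }
}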

‭Monday, November 13th‬


-‭ ‬ ‭What can we say about RxObservables:‬
‭What is it? It has a recipe, whose outcome is to notify (push a notification) to the subscribed‬
‭observer about changes in the state (The recipe somehow generates some data, which is then‬
‭pushed as a notification to the observer).‬
‭-> Rx Observable is a‬‭recipe‬‭to generate some data‬‭and‬‭push‬‭it to the subscribed Observer.‬
-‭ ‬ ‭Characteristics‬‭:‬‭Cold‬‭(it doesn’t do anything until something is subscribed to it)‬
‭Coldness comes from the fact that this is a recipe, which only executes if there is a subscriber.‬
‭How many data elements are pushed?‬‭As many as the‬‭recipe‬‭(we can push 0 or more data‬
‭elements through time)‬
It can be seen as a function (since a function is also cold: it gets executed only when you call it, and each time with its own execution context), but a function returns only once, while an observable can notify (return) multiple times.

In RxJS (since in JavaScript we have functions as first-class citizens), instead of extending Observable (into a FileSystem subclass, for example) to customize a behavior, we just instantiate it and pass the recipe (which is a function): new Observable(recipe); This recipe takes its subscriber as a parameter.
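The same idea expressed with RxJava rather than RxJS (a sketch, assuming RxJava 3; the emitted strings are illustrative): the recipe is passed at construction time and receives its subscriber (emitter) as a parameter.

import io.reactivex.rxjava3.core.Observable;

public class RecipeDemo {
    public static void main(String[] args) {
        // The recipe is a lambda that runs once per subscriber.
        Observable<String> fs$ = Observable.create(emitter -> {
            emitter.onNext("folder created");   // push data to the subscriber
            emitter.onComplete();               // signal the successful end of the recipe
        });

        // Nothing above runs until someone subscribes (the observable is cold).
        fs$.subscribe(System.out::println);
    }
}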

complete() gets executed by the observable once it is done executing its recipe, i.e. on the successful execution of its recipe.
‭-‬ ‭You can use‬‭Stack Blitz‬‭, as a JS playground to try‬‭things out.‬

‭Wednesday, November 15th‬


●	Homework 3: Due this Sunday, 19th
●	Quiz 3: on Monday, 20th

‭Contrast:‬
‭-‬ ‭The function which is the data producer is a passive entity, while the caller is an active‬
‭entity.‬
‭-‬ ‭The observable as the data producer, on the other hand, is the active entity, while the‬
‭observer is the passive one.‬

            Producer    Consumer

Function    Passive     Active

Observable  Active      Passive

‭In common:‬
‭-‬ ‭They are both cold, both produce data‬

RxObservables can be either synchronous or asynchronous.

Using a $ suffix is a naming convention for observables.

Rx provides us with some common operators, one of which is called interval.
range is another operator within Rx.

An interesting way of defining a for loop, for example (we have an engine, making use of range):

const r$ = range(0, 1000);
r$.subscribe((x) => console.log(x));

const my$ = interval(2000);
my$.subscribe((x) => console.log(x));

(Diagram: interval, or another creational operator, acts as a factory emitting a raw stream T1, T2, T3, …; the stream is piped through operators Op1, Op2, Op3, turning the raw recipe into a customized Observable, which is cold.)

The raw data stream that is being emitted is actually only the recipe to emit the data.
If the recipe were hot, we might get overwhelmed at the level of execution. But since it is cold, we are able to take our time for planning and applying all the necessary transformations to get our customized observable, so that when an observer subscribes to it, the subscription triggers the whole process.
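A sketch of this pipelining in RxJava terms (in RxJS the operators would go inside .pipe(...); here they are chained directly, and the operator arguments are made up):

import io.reactivex.rxjava3.core.Observable;
import java.util.concurrent.TimeUnit;

public class PipelineDemo {
    public static void main(String[] args) throws InterruptedException {
        // Planning: chain operators onto the cold recipe; nothing executes yet.
        Observable<Long> customized$ = Observable.interval(1, TimeUnit.SECONDS) // source: T1, T2, T3, ...
                .map(x -> x * 10)           // Op1: transform each element
                .filter(x -> x % 20 == 0);  // Op2: keep only some elements

        // Execution: subscribing triggers the whole process.
        customized$.subscribe(x -> System.out.println("got " + x));

        Thread.sleep(5000); // keep the JVM alive long enough to see a few emissions
    }
}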

‭●‬ ‭Makeup Friday 8-10pm‬

‭Friday, November 17th‬


For common needs, we do not need to go back to our own Observable class and implement our own versions of these simple data-emitting functionalities; this is where operators are useful.
Creational operators; we also have transformational operators, such as map, and filtering operators (the most popular being filter).
Compared to promise-based programming, we have a clear distinction between planning (pipelining) and execution (subscribing).
Using observables, we have a cold recipe, and each time we subscribe we trigger the entire process; promises, on the other hand, are hot, so if we have an initial fetch followed by a bunch of then()s, we cannot fetch again unless we explicitly call fetch again.
‭Another advantage of observables over promises is that promises‬‭emit data only once‬‭.‬

Promises: asynchronous only, emit only once, are hot, and mix planning and execution.
Observables: can be synchronous or async, can emit many data elements, and are cold. Rx Observables provide artifacts for planning and transformation (through .pipe(operators)), and an artifact for execution (subscribe).
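A small Java sketch of the contrast, using CompletableFuture to stand in for a promise (an assumption, since the course examples use JavaScript’s fetch):

import io.reactivex.rxjava3.core.Observable;
import java.util.concurrent.CompletableFuture;

public class HotVsColdDemo {
    public static void main(String[] args) {
        // Promise-like: hot, the work starts immediately and the result is emitted only once.
        CompletableFuture<String> promise = CompletableFuture.supplyAsync(() -> "fetched once");
        promise.thenAccept(System.out::println).join();

        // Rx observable: cold recipe, re-executed for every subscriber.
        Observable<String> data$ = Observable.fromCallable(() -> "fetched on subscribe");
        data$.subscribe(v -> System.out.println("subscriber 1: " + v));
        data$.subscribe(v -> System.out.println("subscriber 2: " + v)); // runs the recipe again
    }
}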

‭From now on, we should favor Rx based libraries for asynchronous operations.‬

‭Example of Angular making use of Observables over Promises‬


‭Friday, November 17th Makeup Session‬
Observables evolved by being extended to have the error and complete methods.
The observer evolved from a shared context (among all observers) to a dedicated execution context (per observer) (from hot to cold).

‭-‬ ‭How does the unsubscribe method cancel and stop the execution context‬

‭P5 - Scalability: Distributed and Parallelized Programming‬

Scalability is one of the non-functional requirements we take into consideration in the context of software development.
‭Some people tend to mix performance and scalability: they are‬‭related but different‬‭.‬
‭Performance‬‭reflects the number of transactions executed‬‭per unit of time, or from the user’s‬
‭perspective through response time (the time they click on a button, to the time they get back the‬
‭response (fully loaded on their screen)).‬

I‭n this chapter, we are not considering I/O operations (which was the main purpose behind‬
‭asynchronous programming), instead we are dealing with heavy computations, and hence‬
‭asynchronous programming wouldn’t help.‬

We first need to maximize the CPU usage we already have; then, and only then, think of increasing processing power.

Scalability is the ability to preserve performance when the load grows, at an acceptable/linear cost (this is how the two are related).
The continuous upgrade of RAM, CPU resources and such is not economically sustainable: what we pay for in computing resources incurs exponential costs. We would need to invest much, much more in this processing power, and we end up understanding that the load growth does not justify this cost.
Scalability is a key success factor for any product which aims to offer services to a growing customer base. It is one of the main challenges of real-life professional applications.

‭Scalability is especially needed in the field of Big Data and High Performance Computing (HPC)‬

Scaling in vs Scaling out

Scale in: the "in" means within the same box. This machine has some initial processing power, and once you notice load growth and a degradation of performance, you add more processing power. It is the naive way of going about it, and the easiest (but not necessarily the most cost-effective); it is also called vertical extension. One of the limitations of scaling in (other than cost effectiveness) is that you cannot infinitely add things into this box, as there is a limit.

‭→ A smarter way would be scaling‬‭out‬‭of that box:‬


Scale out: adding more boxes that would act as one supercomputer, for example. You need a software layer for these boxes to appear and act as one, so this is called horizontal scaling. Instead of using expensive best-of-breed hardware, we can make do with commodity (entry-level) hardware. We can reach whatever capacity or computing power is needed by combining this commodity hardware (each box contributing processing, RAM and storage).
‭The keyword here is‬‭partition.‬
‭What is the opportunity we are leveraging, that allows us to only need to send a partition of the‬
‭data and not the entirety of it? Acting on elements independently. So‬‭independence‬‭of‬
‭operations is what gives us this opportunity.‬

‭Monday, November 20th‬


Commodity hardware has issues of both performance and resilience/availability (it is susceptible to failure). But when we combine these machines, making use of a replication factor, and have a software layer which, for example, checks whether the replication factor drops below a certain threshold, identifies all the data items that were lost, and brings the replication factor back to its desired level, the data remains available within our cluster of machines.

‭●‬ ‭Read about‬‭Ceph‬‭(Distributed File Storage System)‬

‭Wednesday, November 22nd‬


‭(missing)‬

‭No classes for a week‬

‭Monday, December 4th‬


Recap:
‭- Asynchronous programming is no longer valid here, as we are not dealing with intensive‬‭IO‬
‭(making the CPU idle), instead we are dealing with‬‭heavy processing‬‭of big data.‬
‭- Since we have partitions at the level of two different machines for example (two partitions), we‬
‭are able to read at double the speed.‬
‭-‬‭Resilience‬‭is the‬‭objective‬‭, and‬‭redundancy‬‭or‬‭replication‬‭is the‬‭means to achieve it‬‭(the‬
‭how)‬
‭- YARN is the orchestrator / manager of the different nodes (laptops for example) and their‬
‭different resources (memory, processing power, storage)‬
‭- MapReduce is the computational layer‬

‭Spark‬
‭From Spark, as a distributed platform, we expect:‬
‭●‬ ‭Resource management‬‭(nodes (which have failed, are‬‭full, are back, ….) / processing,‬
‭memory and storage,....)‬
‭●‬ ‭Running workloads in a parallel fashion (‬‭distributed‬‭processing‬‭)‬
‭Parallel operations‬‭on data would be for example having‬‭to apply the same function on‬
‭different elements of an array (as a collection), happening in a parallel manner.‬

‭-‬ ‭RDDs:‬
Spark has introduced Resilient (replicated; we can add speed as an objective) Distributed (partitioned; we can add parallelism as a means) Datasets (RDDs).
→ A main characteristic of these RDDs is that they are immutable (to avoid inconsistencies): if they were mutable, we would have inconsistencies, since we have replicas and we would need synchronization and such.
(An example of an immutable object is the String class in Java: if we have a String and call substring on it, a new String is returned and our original one stays untouched.)

-	Operations:
The two types of operations we have seen before are Map and Reduce; in Spark, we have new jargon, which is Transformations (allowing us to go from an initial RDD to a target RDD) & Actions.
These operations are supported by the platform and are offered by the API, which gives us access to them.
‭-‬ ‭Transformations:‬
‭These are all methods under the RDD class.‬
‭●‬ ‭Map (we apply a function on each element in a set, which then returns a new set)‬
‭●‬ ‭Filter (for example from an initial RDD containing numbers, we move to one containing‬
‭only prime numbers or such, so only the ones fulfilling a certain criteria - the function‬
‭passed as a parameter, analyzes each element that is sent and returns whether true or‬
‭false. Then we filter out the false ones and return the true ones. (its prototype starts with‬
‭bool))‬
‭●‬ ‭flatMap (it can map an element to a list of elements (not one-to-one). e.g. We can have‬
‭an RDD of numbers, then we flatMap it to another RDD, where each element of the RDD‬
‭is a list of numbers that divide that element))‬
‭●‬ ‭Union (takes another RDD as an argument)‬
●	reduceByKey (a traditional reduce operation takes n elements and reduces them to 1 element; reduceByKey instead reduces the values per key, so a collection of (key, value) pairs goes in and a collection of (key, reducedValue) pairs comes out)
●	…. Read about other transformations
‭All of these transformations do not touch or change the original RDD, instead they return a new‬
‭one.‬
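A sketch of a few of these transformations through Spark’s Java API (the input data and variable names are made up; running it requires the Spark dependencies):

import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import scala.Tuple2;
import java.util.Arrays;

public class TransformationsDemo {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("local[*]", "transformations-demo");

        JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5, 6));

        // map: apply a function to each element, returning a new RDD (the original stays untouched).
        JavaRDD<Integer> doubled = numbers.map(x -> x * 2);

        // filter: keep only the elements for which the predicate returns true.
        JavaRDD<Integer> evens = numbers.filter(x -> x % 2 == 0);

        // flatMap: each element can be mapped to several elements (here just the list [1, x]).
        JavaRDD<Integer> expanded = numbers.flatMap(x -> Arrays.asList(1, x).iterator());

        // reduceByKey: reduce the values per key; a collection goes in, a collection comes out.
        JavaPairRDD<Integer, Integer> countsPerParity = numbers
                .mapToPair(x -> new Tuple2<>(x % 2, 1))
                .reduceByKey(Integer::sum);

        sc.close();
    }
}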

‭Wednesday, December 6th‬


‭-‬ ‭Actions:‬
‭Such as:‬
‭●‬ ‭Reduce‬
‭●‬ ‭Count‬
‭●‬ ‭First‬

I‭n order to fulfill the‬‭performance‬‭promise of Spark,‬‭there is a smart optimization at the level of‬
‭Transformations, which is:‬‭transformations are lazy‬‭(cold), which means that when we apply‬
‭an operation on an RDD it is added to a plan (it buffers them), which is what gives Spark the‬
‭opportunity to optimize. It is only when you call an Action, that the‬‭optimized plan‬‭is triggered‬
‭(executed).‬
‭Whenever you write an application for Spark, it is considered to be a Driver (as the‬
‭computations would not be performed at the same machine where your driver is executed,‬
‭instead on remote nodes that are part of the cluster).‬
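Continuing the same kind of sketch (again with made-up data): the transformations below only build the lazy plan, and nothing executes until one of the actions is called.

import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import java.util.Arrays;

public class ActionsDemo {
    public static void main(String[] args) {
        JavaSparkContext sc = new JavaSparkContext("local[*]", "actions-demo");
        JavaRDD<Integer> numbers = sc.parallelize(Arrays.asList(1, 2, 3, 4, 5));

        // Transformations are lazy: this only adds steps to the plan.
        JavaRDD<Integer> plan = numbers.map(x -> x * 2).filter(x -> x > 4);

        // Calling an action triggers the optimized plan.
        System.out.println(plan.count());              // action: number of elements
        System.out.println(plan.reduce(Integer::sum)); // action: reduce to one value
        System.out.println(plan.first());              // action: first element

        sc.close();
    }
}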

‭Spark Architecture:‬

I‭n this example, we have two nodes, and the Manager which is managing them and making‬
‭these nodes appear as one. At the level of each node, you would install the binaries for spark‬
‭(which are the same between the‬‭Worker Node‬‭s and‬‭Cluster‬‭Manager‬‭, but just labeled‬
‭differently). This is our Spark platform for now, which makes our cluster. The difference between‬
‭Worker and‬‭Executor‬‭, is that at the level of each‬‭node box, you run the software as a worker,‬
‭but each worker can manage several executors per box. A Spark cluster can run the workloads‬
‭of several applications (but here we would need to create some separation or isolation) so each‬
‭worker node creates a separate executor for each application. The worker node is a JVM, but‬
‭when it receives a workload, it creates a‬‭JVM per‬‭Driver program.‬
The Driver program: it is the entry point to the cluster (to acquire resources from the cluster and start submitting transformations and actions through the API to the cluster). It uses a main class (part of the API) which is SparkContext: when instantiated, you specify the resources needed by your application (how many nodes, how much processing power, how much RAM), and the SparkContext takes these parameters as well as the location of the cluster manager. So it expresses its needs while communicating with the Cluster Manager, and the hassle of making this communication happen is hidden under the SparkContext class.
The cluster manager returns pointers to the Worker Nodes, so the driver then no longer communicates with the Cluster Manager for the resource allocation aspect, while the distributed computing is now done directly between the driver program and the worker nodes which are executing its workloads.
‭●‬ ‭Where Does the driver program itself run? It can run from any laptop (containing the‬
‭SparkContext), or it is submitted to the cluster (through the cluster manager; where it‬
‭would pick one of the nodes and run this driver on it.) So there are two modes of‬
‭deploying the driver program:‬‭client mode‬‭(runs in‬‭the machine from which it is‬
‭deployed),‬‭cluster mode‬‭(where it is submitted to‬‭the cluster).‬

‭Thursday, December 7th - Online Makeup Class‬


‭Product ratings case study‬

‭Monday, December 11th‬


Streams
A stream is an unbounded collection of events that occur through time (they come one after the other).
An event describes a fact that has already occurred in the past; hence an event is, by definition, immutable: ‘what happened happened!’
→ A stream is a collection of immutable data describing what has already happened, and as time is unbounded, we have events coming through time, one after the other.

State-oriented systems vs. event-driven systems

‭State-oriented systems:‬‭the file always reflects the‬‭current state of the system (in the example‬
‭of databases for example) (if changes occur, we lose track of what has happened before)‬

But if we see this through time, there was an event which happened, which is entering an initial value, and then another event which happened, which is changing it and entering a new value.
If we had recorded this as a stream of events, we would have an immutable event (a create event) of entering the first value (at t=t1), then another one (an update event) of entering the second value (at t=t2).
→ event-driven system
Just as a state-oriented system needs an infrastructure, which for example can be a relational DBMS, here we need an infrastructure which is specialized in handling streams of events.
‭‬ P
● ‭ roducer may be producing data at a rate that the consumer is not able to handle.‬
●	In case of complete failure of the consumer, does data get lost? Would the producer stop sending and try again later? There is a burden on the producer side.

I‭n case we have multiple producers and consumers we end up with multiple issues such as‬
‭Spaghetti connections:‬

These issues stem from the fact that we have direct interaction / tight coupling between stream producers and consumers.
So, we insert a middleware (a guy in between), which would play this role of the stream infrastructure.
‭We hence have a brokered communication or interaction/‬‭loose coupling‬‭between stream‬
‭producers and consumers.‬
‭It may act as a buffer to allow the consumer to consume at their own rate. The broker would be‬
‭running on a solid, robust, fast, correctly sized box. It records the events produced by the‬
‭producers.‬
If the consumer fails, there is no effect on the production of the data, and whenever the consumer comes back it can read the events from the broker. The broker is a solid platform, and we can make it highly available by making it redundant.

‭The state-of-the-art now is‬‭Kafka‬‭. (prior to it, there‬‭was RabbitMQ, RocketMQ …)‬

Kafka
Kafka is the de facto distributed event store and stream-processing platform.
Wiki: Kafka was originally developed at LinkedIn, and was subsequently open sourced in early 2011. Jay Kreps, Neha Narkhede and Jun Rao helped co-create Kafka. Graduation from the Apache Incubator occurred on 23 October 2012. Jay Kreps chose to name the software after the author Franz Kafka because it is "a system optimized for writing", and he liked Kafka's work.

Let’s explore the different ways in which this reading and writing (production and consumption) can occur:
The most traditional (no longer used) is called Polling: the consumer connects to the broker and asks if it has any new events for it. The broker may say no, for example, and the connection closes. Then, at a certain rate, the consumer tries again, and so on. The issue here comes from the fact that we have a static approach (a fixed time at which we check each time): at a given check there may be no data available yet, and data produced right after a check has to wait for the next one.
‭Pub/Sub:‬‭(publish/subscribe) the consumer would subscribe‬‭with the broker (saying it is‬
‭interested in events from a certain stream). So a‬‭persistent connection is maintained‬‭, and the‬
‭broker would push any new events that come in from that producer. So here we have events‬
‭pushed as soon as possible.‬
The broker records the state of the conversation here (the last event sent, with its ID, so it knows what the next event to be sent is).
There are some drawbacks:
●	The broker may overload the consumer.
●	If an event is pushed but the consumer doesn’t read it properly, it should send feedback saying that it has read it correctly; if not, the event needs to be resent. This can cause some issues.

Kafka leverages the use of Long Polling.

‭Wednesday, December 13th‬


‭The broker decouples the producer from the consumer.‬

‭Under pub/sub there are three modes:‬


●	At most once (from the perspective of the receiver): each message is received either once or not at all (a missed message is tolerated).
●	At least once: each message is definitely received, but duplicates might be received.
●	Exactly once (the most challenging).

Long Polling:
You let the consumer ask for the data itself, just like in polling but with an enhancement: it consumes at its own rate. The connection is kept open, so the consumer makes a request and receives the data whenever it becomes available, rather than constantly requesting and having to open and close connections continuously.
Since the consumer takes the initiative, it keeps track of what the latest message received is: each consumer records its own state.
This helps relieve the load on the broker and keeps its implementation as small as possible; each consumer is responsible for the reading, the state and such, decentralized at the level of each consumer.

Hence, Kafka’s broker is much simpler (making use of Long Polling).
Kafka’s broker is stateless with respect to its consumers (each consumer keeps track of its own position), and statelessness supports scalability.
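A sketch of a long-polling consumer using the Kafka Java client (the broker address, group id and topic name are placeholders):

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class LongPollingConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker-host:9092");   // placeholder
        props.put("group.id", "my-consumer-group");           // consumers sharing this id split the partitions
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));
            while (true) {
                // The consumer takes the initiative: poll() waits and returns whatever
                // events are available, at the consumer's own rate.
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(1));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.println(record.offset() + ": " + record.value());
                }
            }
        }
    }
}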

‭Kafka Architecture:‬
Topic is Kafka’s vocabulary for a stream.
At the level of the broker infrastructure, there could be many streams, produced by many producers. Each stream corresponds to a topic. Topics are functional (domain-specific) entities.

A topic may be handled in parallel by two or more brokers that are part of the same infrastructure. Events belonging to the same topic may be written in parallel to two or more brokers in the cluster, so a certain broker doesn’t have the whole truth about a topic, and hence we come back to the concept of partition. A partition is non-functional: it exists for performance and scalability.

For resilience and high availability, each partition is replicated on other brokers.
So we get a broker elected as the Leader for a certain partition.
We can say that Broker 2 is a standby leader for partition 0 (if Broker 1 fails).

The consumer would be consuming from both brokers.
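And a matching producer sketch with the Kafka Java client (again, the broker address, topic and record contents are placeholders); when a key is provided, it determines which partition of the topic the event goes to:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class RatingsProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker-host:9092");   // placeholder
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Events with the same key always land in the same partition of the topic.
            producer.send(new ProducerRecord<>("my-topic", "product-42", "rating=5"));
        }
    }
}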

Consumer Groups:
We have a cluster of two brokers (called servers here), and we have a topic with 6 partitions, P0 to P5, where each broker is the leader for 3 of the 6 partitions. In order to give more sense to partitions, there is this concept of consumer groups: in group A, for example, we have 3 consumers which are configured to be in the same group. It is the same consumer application, but for the purpose of scalability it runs on 3 different machines (C0, C1 and C2, which are all the same consumer). Here we do not want, say, C1 and C2 to process the same event (just as, if you were on a single machine, you would not process the same event twice). So, from within a consumer group, the members read from different partitions, so as not to miss an event, and no two consumers within a consumer group read from the same partition (no duplicate processing of events).

Kafka is marketing itself today even as an alternative to Spark (even though initially they competed in different spaces: distributed processing vs. distributed streaming).
‭Done‬
