CompSci A2 Paper 3
● Composite: A data type that is a collection of data that can consist of multiple
elements of different or the same data types grouped under a single identifier
● Non-composite: It can be defined without referencing another data type and it can
be a primitive type available in a programming language, or a user-defined type.
Describe what happens in relation to storage or retrieval of a record in the file, when the
calculated hash value is a duplicate of a previously calculated hash value for a different
record key.
● A collision occurs: the calculated storage location is already occupied by a record
whose key does not match the record key.
● If the record is to be stored:
○ Search the file linearly, to find the next available storage (closed hash)
○ Search the overflow area linearly, to find next available storage space
(open hash)
● If the record is to be found:
○ Search the overflow area linearly (open hash) until the matching record key
is found
○ Search linearly from where you are (closed hash) until the matching record
key is found
○ If not found, the record is not in the file
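The closed-hash case above can be sketched in Python. This is only an illustration: the file size of 7, the modulo hashing algorithm and the sample records are assumptions, not part of the question.

```python
# Closed hashing: on a collision, search linearly for the next free slot.
# FILE_SIZE and the modulo hash are assumptions made for this sketch.
FILE_SIZE = 7
EMPTY = None


def hash_key(key: int) -> int:
    """Hypothetical hashing algorithm: key modulo the number of slots."""
    return key % FILE_SIZE


def store(file: list, key: int, data: str) -> int:
    """Store a record and return the slot that was used."""
    start = hash_key(key)
    for step in range(FILE_SIZE):
        slot = (start + step) % FILE_SIZE  # wrap around at the end of the file
        if file[slot] is EMPTY:
            file[slot] = (key, data)
            return slot
    raise RuntimeError("file is full")


def find(file: list, key: int):
    """Return the data for key, or None if the record is not in the file."""
    start = hash_key(key)
    for step in range(FILE_SIZE):
        slot = (start + step) % FILE_SIZE
        if file[slot] is EMPTY:
            return None  # reached an empty slot: the key was never stored
        if file[slot][0] == key:
            return file[slot][1]
    return None


records = [EMPTY] * FILE_SIZE
store(records, 10, "Ali")  # 10 % 7 = 3
store(records, 17, "Bea")  # 17 % 7 = 3 as well: collision, next free slot is 4
print(find(records, 17))
print(find(records, 24))   # also hashes to 3, but was never stored
```

An open hash would look the same except that the linear search runs through a separate overflow area instead of the main file.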
Enumerated Data Type (A user-defined non composite data type with an ordered
list of possible values):
TYPE Season = (Summer, Winter, Autumn, Spring)
DECLARE ThisSeason : Season
DECLARE NextSeason : Season
ThisSeason ← Autumn
NextSeason ← ThisSeason + 1 // NextSeason is set to Spring
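For comparison, the same idea can be sketched in Python with `IntEnum` from the standard library. The member names and the 0 to 3 numbering are choices made for this sketch; only the order is taken from the pseudocode.

```python
from enum import IntEnum


class Season(IntEnum):
    # Same order as the pseudocode TYPE declaration
    SUMMER = 0
    WINTER = 1
    AUTUMN = 2
    SPRING = 3


this_season = Season.AUTUMN
next_season = Season(this_season + 1)  # the value that follows Autumn in the list
print(next_season.name)
```

Because the values are ordered, comparisons such as `Season.SUMMER < Season.SPRING` also work, which is the point of an enumerated type having an ordered list of values.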
Pointer Data Type (A user-defined non-composite data type used to reference
a memory location):
State the consequence of storing the binary number as a floating point number in this
system. Justify your answer. (Note: This only applies to specific scenarios where the
original number of bits is larger than the bits provided for the mantissa, for this
question I'll be using original 14 bits with sign and 12 bits mantissa):
Explain the reason why binary numbers are stored in normalized form:
I can’t explain the process of converting properly (Binary to mantissa & exponent) over
text so watch these 2 videos:
● https://youtu.be/mGfOJQgdI_U?si=NIJfrGzOkIsSxUdt
● https://youtu.be/IGQ9YOnhWxA?si=bUpyNU2QipPoI9ah
● In both serial and sequential files records are stored one after the other and need
to be accessed one after the other
● Serial files are stored in chronological order
● Sequential files are stored in ordered records and stored in the order of the key
field
● In serial files, new records are added to the next available space/records are
appended to the file
● In sequential files, new records are inserted in the correct position
● It becomes difficult to access because you must access all preceding records
before retrieving the one being searched.
● It cannot support modern high-speed requirements for quick record access.
● File access: Records in this file type are searched using Sequential Access.
Successively read record by record until the required data is found or the whole file
has been searched, and the required data is not found, thus prolonging the process
● The sorting makes it easy to access records but does not remove the need to
access other records as the search looks for particular records.
● A binary search can reduce record search time because each comparison halves
the number of records still to be searched.
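A rough Python sketch of the difference between sequential access and a binary search on records held in key order. The list of 1024 keys is made up for illustration, and each function counts how many records it reads.

```python
def sequential_search(keys, target):
    """Read record by record; return (index, records read)."""
    for reads, key in enumerate(keys, start=1):
        if key == target:
            return reads - 1, reads
    return -1, len(keys)  # whole file searched, record not found


def binary_search(keys, target):
    """Halve the search range on each comparison; keys must be sorted."""
    low, high = 0, len(keys) - 1
    reads = 0
    while low <= high:
        mid = (low + high) // 2
        reads += 1
        if keys[mid] == target:
            return mid, reads
        if keys[mid] < target:
            low = mid + 1
        else:
            high = mid - 1
    return -1, reads


keys = list(range(100, 1124))  # 1024 hypothetical record keys, in key order
print(sequential_search(keys, 1123))
print(binary_search(keys, 1123))
```

For the last record in the file the sequential search reads every record, while the binary search needs only about log2(1024) reads.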
Advantages of random file organization (Records are stored randomly in the file but are
accessed directly. The location for each record is found using a Hashing Algorithm on the
record's key field. Magnetic and optical disks use random file organization.):
● It's well suited for larger files, which take longer to access sequentially. Data in
direct access files are stored in an identifiable record, which could be found by
involving initial direct access to a nearby record followed by a limited serial search.
● How often do transactions occur, and how often does one need to add data?
● How often does it need to be accessed, edited, or deleted?
Most suitable method of file access when a record is referenced by a unique address on a
disk type storage medium:
● Direct access
● Most commonly used on data networks such as the internet to send large data
files that don’t need to be streamed
● Is used when it is necessary to be able to overcome faulty lines by rerouting
● Used when communication needs to be more secure
● High volume data transmission
● Used when it isn’t necessary to use all of the bandwidth
● Any specific examples like (email, text messages, documents, etc)
● Used by email clients to retrieve email messages (a pull protocol) from a mail
server (over a TCP/IP connection)
● Keep the server and client in sync by not deleting the original email
● A dedicated circuit
● Circuit is established before transmission starts // circuit is released after
transmission ends
● Data is transferred using the whole bandwidth
● All data is transferred over the same route
The TCP/IP protocol suite has 4 layers, the application layer provides user services.
Identify the protocols used by this layer and describe:
● HTTP(S): For sending and receiving web pages (HyperText Transfer Protocol)
● FTP: For sending and receiving files over a network (File Transfer Protocol)
● SMTP: For sending/uploading emails / push protocol (Simple Mail Transfer
Protocol)
● POP3: For receiving/downloading emails / pull protocol (Post Office Protocol 3)
● IMAP: For receiving/downloading emails / pull protocol (Internet Message Access
Protocol (Alternative to POP3))
Identify the other layers of the TCP/IP protocol suite. Describe the function of each
layer:
Explain how packet switching is used to transfer messages across the internet:
● A large message is divided up into a group of smaller chunks of the same size
called packets
● The packet has a header and a payload
● The header contains a source IP address, destination IP address (and sequence
number)
● Each packet is dispatched independently and may travel along different routes /
paths
● The packets may arrive out of order and are reassembled into the original message
at the destination
● If packets are missing / corrupted a re-transmission is requested
● Time delays to correct errors / Network problems may introduce errors to packets
● Requires complex protocols for delivery
● Unsuitable for real time transmission applications
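The splitting and reassembly steps above can be sketched in Python. The packet size, the dictionary layout of the header and the two IP addresses are assumptions for illustration; real packets carry more header fields.

```python
import random

PACKET_SIZE = 8  # payload characters per packet (assumed for this sketch)


def split_into_packets(message, source, destination):
    """Divide a message into packets, each with a header and a payload."""
    packets = []
    for seq, start in enumerate(range(0, len(message), PACKET_SIZE)):
        header = {"src": source, "dst": destination, "seq": seq}
        packets.append({"header": header,
                        "payload": message[start:start + PACKET_SIZE]})
    return packets


def reassemble(packets):
    """Put packets back in sequence-number order and join the payloads."""
    ordered = sorted(packets, key=lambda p: p["header"]["seq"])
    return "".join(p["payload"] for p in ordered)


message = "Packets may take different routes."
packets = split_into_packets(message, "192.0.2.1", "198.51.100.7")
random.shuffle(packets)  # simulate packets arriving out of order
print(reassemble(packets))
```

The sequence number in the header is what makes reassembly possible when the packets arrive in a different order from the one in which they were sent.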
Using the TCP/IP protocol suite, what happens when a message is sent from one host to
another (In the syllabus you only need to know up to the link layer, physical layer is here
for understanding):
● Sender side (Physical Layer): Receives the frames and converts the IP addresses
into the hardware addresses appropriate to the network media. The physical
network layer then sends the frame out over the network media.
● Service Provider: Re-routes the packets according to the IP address
● Receiver side (Physical Layer): Receives the packet in its frame form. It computes
the packet's checksum and sends the frame to the data link layer.
● Receiver side (Link Layer): Verifies that the checksum for the frame is correct and
strips off the frame header and checksum. Finally, the data link protocol sends the
frame to the Internet layer.
● Receiver side (Network Layer): Reads information in the header to identify the
transmission and determine if it is a fragment. IP would reassemble the fragments
into the original datagram if the transmission was fragmented. It then strips off the
IP header and passes it on to transport layer protocols.
● Receiver side (Transport Layer): Reads the header to determine which application
layer protocol must receive the data. Then TCP strips off its related header and
sends the message or stream up to the receiving application.
● Receiver side (Application Layer): Receives the message and performs the
operation requested by the sender
● Note, LINK is just a combination of data link and physical
● https://www.youtube.com/watch?v=OTwp3xtd4dg (Nice explanation :))
A processor will have an architecture which refers to its physical construction. A processor
will also have what is termed an ‘Instruction set architecture’. This is concerned with:
• the instruction set
• the instruction format
• the addressing modes
• the registers accessible by instructions
Complex Instruction Set Computer (CISC): a single instruction can be more complex and
involve more loading of data from memory
RISC CISC
● For RISC the term reduced affects more than just the number of instructions. The
reduction in the number of instructions is not the major driving force for the use of
RISC; the reduction in the complexity of each instruction is the key feature of RISC
● The typical CISC architecture contains many specialized instructions. These are
designed to match the requirements of a high-level programming language, and they
require multiple memory accesses, which are very slow compared with register
accesses
Instruction fetch (IF) 1.1 2.1 3.1 4.1 5.1 6.1 7.1
Now what if a question asks to put 6 instructions for example in a table with 12 clock
cycles and five stages, this is how it would be done:
1 2 3 4 5 6 7 8 9 10 11 12
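One way to check such a table is to generate it. The sketch below assumes the five stages IF, ID, OF, IE and WB, that instruction n enters IF in clock cycle n, and that nothing stalls the pipeline; each cell holds the number of the instruction occupying that stage.

```python
STAGES = ["IF", "ID", "OF", "IE", "WB"]


def pipeline_table(instructions, cycles):
    """Rows are stages, columns are clock cycles; a cell holds the number
    of the instruction in that stage, or 0 if the stage is idle."""
    table = {stage: [0] * cycles for stage in STAGES}
    for instr in range(1, instructions + 1):
        for offset, stage in enumerate(STAGES):
            cycle = instr + offset  # instruction n enters IF in cycle n
            if cycle <= cycles:
                table[stage][cycle - 1] = instr
    return table


table = pipeline_table(instructions=6, cycles=12)
print("     " + " ".join(f"{c:>2}" for c in range(1, 13)))
for stage, row in table.items():
    print(f"{stage:<5}" + " ".join(f"{n or '':>2}" for n in row))
```

With 6 instructions and 5 stages the last write-back happens in cycle 10, so cycles 11 and 12 stay empty.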
Describe the process of pipelining during the fetch-execute cycle in RISC processors:
● Instructions are divided into subtasks / 5 stages (IF, ID, OF, IE, WB)
● Each subtask is completed during one clock cycle
● No two instructions can be in the same stage during the same clock cycle
● The second instruction begins in the second clock cycle, while the first instruction
has moved on to its second subtask
● The third instruction begins in the third clock cycle while the first and second move
on to their second and third subtask respectively and so on
● Erase the pipeline content for the latest 4 instructions to have entered. Then, the
normal interrupt handling routine can be applied to the remaining instruction
● Construct the individual units in the processor with the individual program counter
registers. This allows current data to be stored for all of the instructions in the
pipeline while the interrupt is handled
SISD: Single Instruction Stream Single Data stream; a single processor accessing one
memory, found in early computers (Cheap, low power, microprocessors slow speed,
battery solar power systems)
SIMD: Single Instruction Stream Multiple Data stream; processing of parallel data input
requiring one control unit instructing multiple processing units, found in array processors
(Efficient on large amount of data, modern GPUs, scientific process limited)
MISD: Multiple Instruction Stream Single Data stream; does not exist in a single
architecture, used to sort large quantities of data, contains multiple processors which
process the same data
Pros:
● A virtual machine can crash without affecting the host machine
● Can run legacy applications that are currently incompatible
● There are security benefits as a virus would only infect the virtual machine
● More than one new computer system can be emulated, which allows multiple
OSs to exist
Cons:
● The time, effort and cost needed for implementation is high
● Performance of the guest system cannot be adequately measured
● A virtual machine may be affected by any weakness of the host machine
● Use of a virtual machine increases the maintenance overheads because both the
host system and virtual machine must be maintained
● Used by companies wishing to use the legacy software on newer hardware and
server consolidation companies
● Virtualizing machines allows developers to test applications on many systems
without making expensive hardware purchases
Combinational circuit (A circuit whose output is dependent only on the input values):
Half adder: This is the simplest circuit that can be used for binary addition. The circuit
takes in two bits and outputs a sum bit (S) and a carry bit (C). So how would that work? If
you are adding 1 0 or 0 1, the sum bit is 1 and the carry bit is 0; if you are adding 1 1, the
sum bit is 0 and the carry bit is 1, as it is being carried forward. So a half adder
truth table would be something like this:
A B S C
0 0 0 0
0 1 1 0
1 0 1 0
1 1 0 1
The truth table for the S output matches the XOR operator, and the C output matches the
AND operator. Therefore one circuit that would produce the half adder functionality would
contain an AND gate and an XOR gate, with each gate receiving input from A and B. This
is only one of several circuits that would provide the functionality. The NAND and
NOR gates are universal gates: any logic circuit can be constructed using only NOR or
only NAND gates.
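Both points can be checked with a short Python sketch: one half adder built from XOR and AND, and one built only from NAND gates. The function names are chosen for this sketch, and the NAND version is one common construction rather than the only possible one.

```python
def half_adder(a, b):
    """Return (sum, carry) for two input bits."""
    s = a ^ b  # XOR gate gives the sum bit
    c = a & b  # AND gate gives the carry bit
    return s, c


def nand(a, b):
    return 1 - (a & b)


def half_adder_nand(a, b):
    """The same function built only from NAND gates."""
    n1 = nand(a, b)
    s = nand(nand(a, n1), nand(b, n1))  # XOR made from four NAND gates
    c = nand(n1, n1)                    # NAND of a value with itself is NOT
    return s, c


for a in (0, 1):
    for b in (0, 1):
        print(a, b, half_adder(a, b), half_adder_nand(a, b))
```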
Now moving on to the full adder, if a half adder is used each time, there has to be a
separate circuit to handle the carry bit because the half adder only takes in 2 bits. This is
where the full adder comes in with 3 inputs including the previous carry bit. The truth
table for a full adder would be something like this:
A B Cin S Cout
0 0 0 0 0
0 0 1 1 0
0 1 0 1 0
0 1 1 0 1
1 0 0 1 0
1 0 1 0 1
1 1 0 0 1
1 1 1 1 1
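A full adder can be built from two half adders and an OR gate. The Python sketch below uses that construction, which is one standard option and not necessarily the circuit drawn in the original notes; the parameter name `carry_in` is chosen for this sketch.

```python
def half_adder(a, b):
    """Return (sum, carry) for two input bits."""
    return a ^ b, a & b


def full_adder(a, b, carry_in):
    """Return (sum, carry_out) for two input bits plus the previous carry."""
    s1, c1 = half_adder(a, b)         # add the two input bits
    s, c2 = half_adder(s1, carry_in)  # then add the previous carry
    return s, c1 | c2                 # OR gate: a carry from either stage


for a in (0, 1):
    for b in (0, 1):
        for carry_in in (0, 1):
            print(a, b, carry_in, *full_adder(a, b, carry_in))
```

Chaining full adders, with each carry out feeding the next carry in, gives an adder for multi-bit numbers.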
The circuit is in a self-consistent state. If we consider the state with Q=1 and Q’=0:
• and we consider the condition that both S and R inputs are 0
• then the top NOR gate has inputs both 0
• giving output 1
• and the bottom NOR gate has inputs 1 and 0
• giving output 0
The S and R inputs are for set and reset respectively, we can showcase a truth table to
see how S and R change the initial state to a new final state:
S R Initial Q Initial Q’ Final Q Final Q’
0 0 1 0 1 0
1 0 1 0 1 0
0 1 1 0 0 1
0 0 0 1 0 1
1 0 0 1 1 0
0 1 0 1 0 1
A combination of S=0 and R=1 converts a set state to an unset state, a combination of
S=1 and R=0 converts an unset state to a set state.
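The behaviour above can be simulated with two cross-coupled NOR gates. In this sketch the wiring is assumed to be Q = NOR(R, Q’) and Q’ = NOR(S, Q), which matches the gate inputs described for the Q=1, Q’=0 state; the loop simply re-evaluates the gates until the outputs stop changing.

```python
def nor(a, b):
    return 1 - (a | b)


def sr_flip_flop(s, r, q, q_bar):
    """Return the settled (Q, Q') for inputs S, R and the initial state."""
    if s == 1 and r == 1:
        raise ValueError("S = R = 1 is the invalid input combination")
    for _ in range(4):  # a few passes are enough for the gates to settle
        new_q = nor(r, q_bar)      # top gate: inputs R and Q'
        new_q_bar = nor(s, new_q)  # bottom gate: inputs S and Q
        if (new_q, new_q_bar) == (q, q_bar):
            break                  # outputs stable
        q, q_bar = new_q, new_q_bar
    return q, q_bar


print(sr_flip_flop(0, 1, 1, 0))  # reset applied to a set flip-flop
print(sr_flip_flop(1, 0, 0, 1))  # set applied to an unset flip-flop
print(sr_flip_flop(0, 0, 1, 0))  # no input: the state is kept
```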
The purpose of a flip-flop is to store a binary digit, and so memory can be created from
flip-flops, and flip-flops can be used to store bits of data
In addition to the possibility of entering an invalid state there is also the potential for a
circuit to arrive in an uncertain state if inputs do not arrive quite at the same time. In order
to prevent this, a circuit may include a clock pulse input to give a better chance of
synchronizing inputs. The JK flip-flop is an example
J K Clock Q
0 0 ↑ Q unchanged
1 0 ↑ 1
0 1 ↑ 0
1 1 ↑ Q toggles (Q and Q’ are switched)
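The table can be written as a small next-state function. This Python sketch models only the logical behaviour on a rising clock edge, not the gates inside the flip-flop; the function name is chosen for illustration.

```python
def jk_next_state(j, k, q):
    """Value of Q after a rising clock edge."""
    if j == 0 and k == 0:
        return q      # unchanged
    if j == 1 and k == 0:
        return 1      # set
    if j == 0 and k == 1:
        return 0      # reset
    return 1 - q      # J = K = 1: toggle


q = 0
for j, k in [(1, 0), (0, 0), (1, 1), (1, 1), (0, 1)]:
    q = jk_next_state(j, k, q)  # one clock pulse per input pair
    print(j, k, q)
```

Every input combination gives a defined result, which is why the invalid state of the SR flip-flop does not arise.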
● https://www.youtube.com/watch?v=G35mcLX-vh0&pp=ygUec3IgYW5kIGprIGZsaXAgZmxvcCBBIGxldmVsIGNz (SR flip-flop)
● https://www.youtube.com/watch?v=fJakGU0vCg8&pp=ygUec3IgYW5kIGprIGZsaXAgZmxvcCBBIGxldmVsIGNz (JK flip-flop)
● https://www.youtube.com/watch?v=j6krFp511HA (JK flip-flop)
There are two problems with the SR flip-flop that the JK flip-flop overcomes.
State each problem and state why it does not occur for the JK flip-flop:
● Problem 1
○ One combination of S and R gives a NOT valid / indeterminate output // Q
and Q’ have the same value
○ The JK flip-flop does not allow Q and Q’ to have the same value for
any combination of inputs // Q and Q’ have to be complementary
● Problem 2
○ Inputs may not arrive at the same time
○ The JK flip-flop has a clock pulse to synchronize inputs
Karnaugh maps: a method of obtaining a Boolean algebra expression from a truth table
involving the
● Minimizes the number of logic gates used, thus providing a more efficient circuit
In the maps, the column is the last input while the row is the rest, for example:
A B C, C would be the column, AB would be the row
A B, B would be the column A would be the row
● No zeros allowed
● No diagonal
● Only power of 2 number of cells in each group
● Groups should be as large as possible
● Every one must be in at least one group
● Overlapping allowed
● Wrap around allowed
● Fewest number of groups possible
Purpose of an operating system (OS) (Refer to these explanations of the function of the
OS only if it's not in the rest of the notes for this chapter):
environment, removing the need of processor functions, and provides system calls
(portability)
● Multitasking, More than one program can be stored in memory, but only one can
have CPU access at any given time. The rest of the programs remain ready
● Process: A program being executed which has an associated Process Control Block
(PCB) in memory. A PCB is a complex data structure containing all data relevant to
the execution of a process. Process states include: Ready (A new process arrives in
memory, and the PCB is created), Running (Has CPU access) and Blocked (Cannot
progress until some event has occurred)
● Scheduling ensures the computer system can serve all requests and obtain a
certain quality of service
● Interrupt causes OS kernel to invoke ISR, which means the kernel may have to
decide on a priority and register values stored in PCB, reasons for an interrupt
could be errors, waiting for I/O, scheduling halts process
● Low-level scheduling: Allocation of specific processor components to complete
specific tasks, OS contains low-level scheduling algorithms
● Preemptive: Will stop a process that would otherwise have continued to
execute normally
● First-come-first-served, Non-preemptive and FIFO queue
● Round-robin, allocates time slice to each process, preemptive, can be a FIFO
queue, does not prioritize
● Priority based, most complex as priorities are re-evaluated on queue change and
calculating priority requires computation. Criteria for priority are: estimated time of
execution, estimated remaining time of execution, whether the process is CPU/IO
bound, and length of time spent in the waiting queue
● Paging, process split into pages, memory split into frames, all pages loaded into
memory at once
● Virtual memory: No need for all pages to be in memory, CPU address space is thus
larger than physical space, addresses resolved by the memory management unit.
Virtual memory works as follows: all pages are on the disk initially, one or more are
loaded into memory when the process is ‘ready’, and pages are replaced from disk when
needed (this can be done with a FIFO queue or a usage-statistics-based algorithm)
● Disk thrashing: Perpetual loading/unloading of pages due to a page from disk
immediately requiring the page it replaced.
● The user mode is the one available for the user or an application program.
● Privileged/kernel mode has the sole access to parts of the memory and to certain
system functions that the user mode can’t access.
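The round-robin bullet earlier in this list can be illustrated with a short simulation. The process names, burst times and time slice are invented for this sketch, and a real scheduler would also handle blocked processes and new arrivals.

```python
from collections import deque


def round_robin(burst_times, time_slice):
    """Return the order in which processes finish.

    burst_times maps a process name to the CPU time it still needs.
    """
    ready = deque(burst_times.items())       # FIFO ready queue
    finished = []
    while ready:
        name, remaining = ready.popleft()    # this process gets the CPU
        remaining -= min(time_slice, remaining)
        if remaining > 0:
            ready.append((name, remaining))  # pre-empted: back of the queue
        else:
            finished.append(name)
    return finished


print(round_robin({"P1": 5, "P2": 2, "P3": 4}, time_slice=2))
```

No process is prioritized: each one receives the same time slice in turn, so short processes finish first simply because they need fewer turns.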
Syntax analysis: Using parsing algorithms to interpret the meaning of a sequence of tokens
and check the grammar
Code generation: The process by which an intermediate code is generated after syntax
analysis
For interpreters:
● Analysis and code generation run for each code line as above
● Each line is executed as soon as the intermediate code is generated
● It checks that the code matches the grammar of the language // checks the tokens
conform with the rules of the programming language
● Syntax errors are reported
● A parse tree is produced
● Disk / secondary storage is used to extend the RAM so the CPU appears to be able
to access more memory space than the available RAM
● Only the data in use needs to be in main memory so data can be swapped
between RAM and virtual memory as necessary
● Virtual memory is created temporarily
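Swapping data between RAM and disk needs a rule for choosing which page to replace. The sketch below uses a FIFO queue, one of the options named in these notes, and counts page faults; the reference string and frame count are made up for illustration.

```python
from collections import deque


def fifo_page_faults(references, frame_count):
    """Count page faults when the oldest loaded page is replaced first."""
    frames = deque()          # pages currently in RAM, oldest on the left
    faults = 0
    for page in references:
        if page in frames:
            continue          # page already in memory: no fault
        faults += 1           # page must be loaded from disk
        if len(frames) == frame_count:
            frames.popleft()  # swap out the page that was loaded first
        frames.append(page)
    return faults


print(fifo_page_faults([1, 2, 3, 1, 4, 1], frame_count=3))
print(fifo_page_faults([1, 2, 3, 1, 4, 1], frame_count=4))
```

In the first call page 1 is swapped out just before it is needed again, a small-scale picture of what happens repeatedly during disk thrashing.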
State the difference between paging and segmentation in the way memory is divided:
● Paging allows the memory to be divided into fixed sized blocks and segmentation
divides the memory into variable sized blocks
● The operating system divides the memory into pages, the compiler is responsible
for calculating the segment size
● Access times for paging are faster than for segmentation
● Recursion
● Implementation of ADT
● Procedure calls
● Interrupt handling
● RPN
Explain why Reverse Polish Notation (RPN) is used to carry out the evaluation of
expressions
Identify, with reasons a data structure that could be used to evaluate an expression in
RPN
● Stack: The operands are popped from the stack in the reverse order to how they
were pushed
● Binary tree: A tree allows both infix and postfix to be evaluated (tree traversal)
LHS ::= RHS (Left hand side (always non terminal symbol) is defined by the right hand
side (sequence of symbols (terminal or non terminal)))
The password “sentence” at the bottom of the diagram shows how a password can be formed,
and the order must be followed: letter → digit → symbol. For example, F$4 would be invalid
because the symbol comes before the digit. The loop indicates that an element can be
repeated, as in ASD55$$. Any character that is not in the diagram is invalid and makes
the whole password invalid; for example, ? is not a valid symbol.
Advantages of RPN:
● RPN expressions do not need brackets, and there is no need for the precedence of
operators
● RPN is simpler for a machine to evaluate
● There is no need for backtracking in evaluation as the operators appear in the
order required for computation and can be evaluated from left to right
The process is done step by step where the most recent process is on top (RPN can also
be done in binary trees)
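A minimal stack-based evaluator in Python shows why no brackets or precedence rules are needed. It assumes tokens are separated by spaces and supports only the four basic operators; the example expression is made up.

```python
def evaluate_rpn(expression):
    """Evaluate a space-separated RPN expression using a stack."""
    operations = {
        "+": lambda a, b: a + b,
        "-": lambda a, b: a - b,
        "*": lambda a, b: a * b,
        "/": lambda a, b: a / b,
    }
    stack = []
    for token in expression.split():
        if token in operations:
            right = stack.pop()  # popped first, so it is the second operand
            left = stack.pop()
            stack.append(operations[token](left, right))
        else:
            stack.append(float(token))  # operands are pushed as they arrive
    return stack.pop()


# Infix (3 + 4) * 5 - 6 becomes 3 4 + 5 * 6 - in RPN
print(evaluate_rpn("3 4 + 5 * 6 -"))
```

The expression is read once from left to right; the reversed pop order matters for subtraction and division.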
17. Security
Encryption: The making of cipher text from plain text, encryption can be used when
transmitting data over a network, it is a routine procedure when storing data within a
computing system
● Key widely available that can be used to encrypt message that only owner of
private key can decrypt
● Private key only known to owner of the key pair, the public key can be distributed
to anyone
● When messages are sent to the owner of a public key, they are encrypted with
the owner’s public key so they can only be decrypted with the owner’s private key
● Message digests are encrypted with the private key of the sender to form a digital
signature
● Messages are encrypted with the public key of the receiver
● The message that came with the digital signature is decrypted using the receiver’s
private key
● The digital signature received is decrypted with the sender’s public key to recover
the message digest sent
● The decrypted message received is hashed with the agreed hashing algorithm to
reproduce the message digest of the messages received
● The two message digests are compared; if both digests are the same, the
message has not been altered
● Symmetric encryption uses a single key and asymmetric encryption uses a pair of
keys.
● The symmetric single key is used by all, whereas only one of the keys for
asymmetric encryption is available to everyone / one of the asymmetric encryption
keys needs to be kept secret.
● Limited range
● Requires a dedicated fiber (optic) line and specialist hardware
● Cost of dedicated fiber (optic) line and specialist hardware is expensive
● Polarization of light may be altered whilst traveling down fiber optic cables
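The digital signature steps listed earlier in this section (hash the message, encrypt the digest with the sender's private key, decrypt with the sender's public key and compare digests) can be sketched with a toy key pair. The numbers n = 3233, e = 17, d = 2753 are a textbook-sized RSA example that is far too small to be secure, and reducing the SHA-256 digest modulo n is a shortcut taken only so that the digest fits this toy key.

```python
import hashlib

# Toy RSA key pair (assumption for illustration only): n = 61 * 53.
# E is the sender's public key, D is the sender's private key.
N, E, D = 3233, 17, 2753


def digest(message: str) -> int:
    """Hash the message and reduce it so it fits the toy key size."""
    full = hashlib.sha256(message.encode()).digest()
    return int.from_bytes(full, "big") % N


def sign(message: str) -> int:
    """Sender: encrypt the message digest with the sender's private key."""
    return pow(digest(message), D, N)


def verify(message: str, signature: int) -> bool:
    """Receiver: decrypt the signature with the sender's public key and
    compare it with a digest recomputed from the received message."""
    return pow(signature, E, N) == digest(message)


signature = sign("Pay Bea 10")
print(verify("Pay Bea 10", signature))
```

In the full scheme the message itself would also be encrypted with the receiver's public key; that step is left out here to keep the sketch short.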
Describe the purpose of Secure Sockets Layer (SSL) and Transport Layer Security (TLS)
protocols:
● The SSL and TLS protocols provide communication security over the internet, as
they provide encryption
● They enable two parties to identify and authenticate each other and communicate
with confidentiality and integrity
Explain how SSL/TLS protocols are used when a client-server communication is initiated:
Artificial Intelligence is the ability of computers to perform tasks that usually only a
human would be able to do, such as decision-making, speech recognition, etc.
● To find the optimal / shortest / most cost-effective route between two nodes based
on distance / cost / time
Machine learning:
● A small amount of data is required at the training stage
● Most of the data’s features must be identified beforehand and manually coded
into the system
● A given task is solved using a modular approach where each module is then
combined to create a final model
● Testing of the system takes a long time
● There are specific rules that clarify the reasoning behind every step in the model
Deep learning:
● A large amount of data is required at the training stage
● Deep learning does not require advance identification of data features as it learns
them from the data itself
● The problem is solved from beginning to end as a single entity
● Testing of the system takes less time
● The system makes decisions based on its own logic, making the reasoning behind
those decisions difficult to understand
Labeled and Unlabeled data: Labeled data is fully defined and recognisable. Unlabelled
data is data which is unidentified and unrecognizable.
● Uses artificial neural networks that contain a high number of hidden layers
modeled on the human brain
● Deep learning uses many layers to progressively extract high level features from
the raw input
● Deep learning is a specialized form of machine learning
State the reason for having multiple hidden layers in an artificial neural network:
● Artificial neural networks are intended to replicate the way human brains work
● Weights / values are assigned for each connection between nodes
● The data are input at the input layer and are passed into the system. They are
analyzed at each subsequent (hidden) layer where characteristics are extracted /
outputs are calculated
● This process of training / learning is repeated many times to achieve optimum
outputs // reinforcement learning takes place
● Decisions can be made without being specifically programmed
● The deep learning net will have created complex feature detectors. The output
layer provides the results
● Back propagation (of errors) will be used to correct any errors that have been
made.
Classification: Split the data into two or more predefined groups. Example: spam email
filtering, where emails are split into either spam or not
Linear Regression: They are used where there is a straight-line correlation between
variables
Clustering: Split the data into smaller groups or clusters based on specific features. The
programmer might specify a target number of groups or let the algorithm decide.
● The results generated by the systems are compared to the expected outcome.
● The difference between the two results is calculated.
● Outputs travel back from the output layer to the hidden layer to adjust the initial
weightings on each neuron.
● Error = Actual Output – Desired Output
● If the error difference is too large, the weightings are altered.
● The process is iterative until the outputs have an acceptable error range or until
the weights stop changing. The model has then been successfully set up.
● https://www.youtube.com/watch?v=Ilg3gGewQ5U Good explanation
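The iterate-until-acceptable idea can be shown on a single neuron. This is a deliberately reduced sketch: there are no hidden layers, the output is a plain weighted sum, and the learning rate, tolerance and sample values are assumptions, so it shows the error-driven weight adjustment rather than full back propagation.

```python
def train_single_neuron(inputs, desired, weights, rate=0.1,
                        tolerance=0.01, max_rounds=1000):
    """Adjust the weights until the output error is within tolerance."""
    for rounds in range(1, max_rounds + 1):
        actual = sum(w * x for w, x in zip(weights, inputs))
        error = actual - desired  # Error = Actual Output - Desired Output
        if abs(error) <= tolerance:
            return weights, rounds  # acceptable error range reached
        # Feed the error back: nudge each weight against the error
        weights = [w - rate * error * x for w, x in zip(weights, inputs)]
    return weights, max_rounds


inputs = [1.0, 0.5]
weights, rounds = train_single_neuron(inputs, desired=1.0, weights=[0.0, 0.0])
output = sum(w * x for w, x in zip(weights, inputs))
print(rounds, round(output, 3))
```

Each pass shrinks the error a little, so the loop stops after a few dozen rounds for these values instead of changing the weights in one large step.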
Components of a graph:
● Nodes are the fundamental units of the graph. Every node can be labeled or
unlabelled.
● Edges: Edges are lines used to connect two nodes of the graph. It can be an
ordered pair of nodes in a directed graph. Every edge can be labeled/unlabelled.
● The graph below has a set of nodes V= { 1,2,3,4,5} and a set of edges E= {
(1,2),(1,3),(2,3),(2,4),(2,5),(3,5),(4,5) }.
● Adjacency: 2 nodes are said to be adjacent if they are endpoints of the same edge.
● Path: A sequence of alternating nodes and edges that allows you to go from one
node to another. A path with unique nodes is called a simple path.
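The node and edge sets above can be stored as an adjacency list. This Python sketch assumes the graph is undirected and reads the last edge as (4,5); the helper names are chosen for illustration.

```python
nodes = {1, 2, 3, 4, 5}
edges = [(1, 2), (1, 3), (2, 3), (2, 4), (2, 5), (3, 5), (4, 5)]

# Undirected graph stored as an adjacency list
adjacency = {node: set() for node in nodes}
for a, b in edges:
    adjacency[a].add(b)
    adjacency[b].add(a)


def are_adjacent(a, b):
    """Two nodes are adjacent if they are endpoints of the same edge."""
    return b in adjacency[a]


def is_simple_path(path):
    """True if consecutive nodes are adjacent and no node repeats."""
    steps_ok = all(are_adjacent(a, b) for a, b in zip(path, path[1:]))
    return steps_ok and len(set(path)) == len(path)


print(are_adjacent(1, 2), are_adjacent(1, 4))
print(is_simple_path([1, 2, 4, 5, 3]))
```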
We want to calculate the shortest distance from the base to each town, so how do we go
about that? There are many ways to get to town 1, but the smallest total to reach it is 3,
so its shortest distance is 3. Town 2 has only one route from base, so we simply take it
as 5. Town 3 also has many routes, but the shortest distance is 2. We keep repeating this
until we have found the shortest distance to each town.
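The repeated "take the smallest distance found so far" step is what Dijkstra's algorithm does. The original diagram is not reproduced in these notes, so the graph below is hypothetical: its edge weights were chosen only so that towns 1, 2 and 3 end up with the distances 3, 5 and 2 mentioned above, and Town 4 is an extra node added for illustration.

```python
import heapq

# Hypothetical undirected network (not the diagram from the notes)
EDGES = [("Base", "Town 1", 4), ("Base", "Town 2", 5), ("Base", "Town 3", 2),
         ("Town 3", "Town 1", 1), ("Town 1", "Town 4", 2),
         ("Town 3", "Town 4", 6)]

graph = {}
for a, b, weight in EDGES:
    graph.setdefault(a, {})[b] = weight
    graph.setdefault(b, {})[a] = weight


def dijkstra(graph, start):
    """Return the shortest distance from start to every node."""
    distances = {node: float("inf") for node in graph}
    distances[start] = 0
    queue = [(0, start)]
    while queue:
        dist, node = heapq.heappop(queue)  # closest unfinished node
        if dist > distances[node]:
            continue                       # stale queue entry
        for neighbour, weight in graph[node].items():
            new_dist = dist + weight
            if new_dist < distances[neighbour]:
                distances[neighbour] = new_dist
                heapq.heappush(queue, (new_dist, neighbour))
    return distances


print(dijkstra(graph, "Base"))
```

Town 1 shows the key idea: the direct edge costs 4, but going through Town 3 costs 2 + 1 = 3, so 3 is kept.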
this would not be a problem in small networks, but in large networks it will result
in massive inefficiencies
● Negative weighted costs: On physical networks with physical distances, you can’t
have negative weights, but on some networks where you calculate costs, you
might have negative costs for a particular leg. Dijkstra can’t handle these negative
costs
● Directed networks: Dijkstra’s algorithm doesn’t always work best when there are
directed networks (such as motorways that only run in one direction)
A* Algorithm:
Let us solve an example down below, h being heuristic value, g being the movement cost,
and f being the total of g and h:
Now we want to find the shortest path between the home and school nodes using A*
algorithm,
Start from the Home. The cost from Home to Home is 0, so g= 0. The heuristic cost of a
home is 14, so h=14 and f=g+h=14
Now, there are three immediate nodes from home: A, B, and C. Calculate the values of g, h
and f for A, B and C from home and write them in the table
Select the node whose f value is the smallest (in this case, Node A)
From A, there are two immediate nodes, B and E. Calculate the g value for each node and
add the g value of A. Then, add the corresponding h values to get f for each node. Then
go to the node whose f value is smallest again (in this case, Node E)
From E, there are two immediate nodes, School and F. Calculate the g value for each node
and add the g value of E. Then, add the corresponding h values to get f for each node
From F to School, add the g value (3) to the g value of F (8) and calculate f
Step Node g h f
1 Home 0 14 14
2 A 1 10 11
3 B 5 7 12
4 C 4 9 13
5 B 1+3=4 7 11
6 E 1+6=7 3 10
7 F 7+1=8 3 11
8 School 7+5=12 0 12
9 School 1+6+1+3=11 0 11
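The worked example can be replayed in Python. The edge costs and h values below are the ones that can be read off the g and h columns of the table; any edges of the original diagram that the table does not use are missing from this sketch, and the graph is assumed to be undirected.

```python
import heapq

# Edge costs and heuristic values recovered from the worked table
EDGES = [("Home", "A", 1), ("Home", "B", 5), ("Home", "C", 4),
         ("A", "B", 3), ("A", "E", 6), ("E", "F", 1),
         ("E", "School", 5), ("F", "School", 3)]
H = {"Home": 14, "A": 10, "B": 7, "C": 9, "E": 3, "F": 3, "School": 0}

graph = {node: {} for node in H}
for a, b, cost in EDGES:
    graph[a][b] = cost
    graph[b][a] = cost


def a_star(start, goal):
    """Return (path, cost), always expanding the open node with lowest f."""
    best_g = {start: 0}
    open_nodes = [(H[start], 0, start, [start])]  # (f, g, node, path)
    while open_nodes:
        f, g, node, path = heapq.heappop(open_nodes)
        if node == goal:
            return path, g
        if g > best_g.get(node, float("inf")):
            continue  # a cheaper route to this node was found later
        for neighbour, cost in graph[node].items():
            new_g = g + cost
            if new_g < best_g.get(neighbour, float("inf")):
                best_g[neighbour] = new_g
                heapq.heappush(open_nodes, (new_g + H[neighbour], new_g,
                                            neighbour, path + [neighbour]))
    return None, float("inf")


print(a_star("Home", "School"))
```

The route through F (total cost 11) replaces the earlier direct step from E to School (total cost 12), matching the last two rows of the table.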