epam upskill notes


RAM stores information about the commands being executed, the running programs,
and everything else that is currently being worked with. It is not suitable for long-term
storage, because the information disappears when the computer is turned off. Types:
DIMM (dual in-line memory module), SO-DIMM (small outline DIMM), DDR SDRAM
(versions 2, 3, 4).
A hard disk drive and a solid state drive ensure long-term storage of information.
A motherboard is a multilayer printed circuit board, on which various microcircuits
and expansion slots are soldered. It includes:

• a CPU socket, into which a processor is inserted like a plug
• DIMM memory slots for installing RAM
• a SATA connector for connecting a hard disk drive (HDD) or a solid state drive (SSD)
• a northbridge, PCI slots, a CMOS battery, BIOS
• a built-in sound card, a network card, a graphics card, USB ports

A power supply converts alternating current (AC) from the mains into direct current (DC)
and provides power to all components of a computer. A power supply consists of:

• voltage converters. Usually, there is a sticker with data on a power supply, for
  example: input 220V, output +3.3V, +5V, +12V
• power supply outputs (e.g. for a motherboard, a processor, SATA power for
  SSD/HDD, and PCIe power for a graphics card)

Laptop Computer Components


The architecture and construction of a laptop are the same as those of a
desktop computer. The only difference is that the components are
smaller, and many of them are soldered directly to the
motherboard. The following main components should be noted:

• a motherboard
• a CPU
• a cooling fan
• a RAM module
• an HDD or an SSD
• a Wi-Fi and Bluetooth module
• a battery
• slots and sockets

Boolean Algebra
Boolean algebra is a branch of mathematics that deals with variables
having only two possible values, 1 and 0, or "true" and "false". The
main difference between Boolean algebra and elementary algebra is
that Boolean algebra deals with logical operations, whereas
elementary algebra deals with arithmetic operations. Why is it
important to us? First, as we will learn later, modern computers and
one of their most important parts, the CPU, are built of billions of very
simple circuits called logic gates. These logic gates are physical
implementations of Boolean functions. Second, all programs contain
many conditional statements, which allow the program to perform
different actions depending on whether a condition is seen as true or
false. These conditional statements are also Boolean functions.
Key Facts About Boolean Algebra

• Variables can have only two values. These are the truth values "true" and
  "false", often denoted as 1 and 0, respectively.
• There are three basic operations in Boolean algebra:
  1. A AND B. The result is true only if both A and B are true.
  2. A OR B. The result is true if at least one of A or B is true.
  3. NOT A. This operation negates the value of A. If A is true, then NOT A is
     false, and if A is false, then NOT A is true.

The operations can be expressed using so-called truth tables. A
truth table lists all possible combinations of values of variables and
the result of operations. The truth tables for the basic operations in
Boolean Algebra are provided below.
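
As an illustration, the truth tables for AND, OR, and NOT can be generated with a few lines of Python. This is only a sketch: the helper name truth_table is made up for this example, and 1 and 0 stand for "true" and "false".

```python
from itertools import product


def truth_table(func, arity):
    """Return truth-table rows as tuples (input values..., result) of 0/1."""
    rows = []
    for values in product([0, 1], repeat=arity):
        result = int(func(*(bool(v) for v in values)))
        rows.append(values + (result,))
    return rows


AND_TABLE = truth_table(lambda a, b: a and b, 2)
OR_TABLE = truth_table(lambda a, b: a or b, 2)
NOT_TABLE = truth_table(lambda a: not a, 1)

for title, table in [("A B | A AND B", AND_TABLE),
                     ("A B | A OR B", OR_TABLE),
                     ("A | NOT A", NOT_TABLE)]:
    print(title)
    for row in table:
        # The last element of each row is the result of the operation.
        print(*row[:-1], "|", row[-1])
```

Each row lists one combination of input values followed by the result, which is exactly what a truth table records.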
In this topic, we outline at a basic level what the CPU does. It
clearly executes instructions in some way, but let us take a
closer look at how.
The CPU has so-called registers which are essentially a few bytes of
very fast memory. The size of registers depends on computer
architecture; modern processors have 64-bit registers. Each CPU has
the following registers:
1. Program counter. This register contains the address of the current instruction
in RAM (Random Access Memory).
2. Instruction register. This register contains the instruction currently being
executed.
3. Data registers. These registers are used to store results of operations.

There are many other registers in modern complex CPUs but let us
keep it simple.

The CPU also has a clock generator that generates pulses at
predefined intervals of time. It is used to synchronize different parts of
a processor. Each time a clock pulse occurs, millions of small circuits in
the processor change their state and process signals, all at the same
time.
The fetch-execute cycle (also called the fetch-decode-execute cycle or the
instruction cycle) is a three-stage process: fetch the instruction, decode it, and execute it.
The first 4 bits, 0001, indicate that this is a Load command, and the last 4 bits give
the memory address. 4 bits allow us to use 16 memory addresses from 0 to 15. This is
enough for our simple CPU because we only have 8 bytes of memory.

The cycle repeats again and again. With each cycle, the value in the data
register is increased by the value at memory address 5 (which is 2), and the result is
stored at memory address 4. In our simple CPU this cycle would never end. Real CPUs have
commands to compare two registers and jump based on the result of this comparison.
Suppose we extended our instruction set with a "Compare X, Y" command, which compares
registers X and Y and stores the result of the comparison as a bit in a flag register (a flag
register is essentially a read-only register that indicates the status of some commands), and
a "JumpGreater C" command, which sets the program counter to C only if the comparison flag
corresponds to the "X is greater than Y" result. We could then modify our program so that
it jumps to another part of the code as soon as the value in the data register exceeds,
for example, 16. Machine-level instructions that do something based on a condition are
called control flow instructions.
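
The cycle described above can be sketched as a tiny simulator. This is a minimal sketch under stated assumptions: only the Load opcode (0001) and the 4-bit opcode plus 4-bit address layout come from the text; the opcodes for Add, Store, and Jump, the placement of the program in memory, and the names run and make_program are hypothetical. Because the simple CPU has no conditional jump, the simulator stops after a fixed number of cycles.

```python
# Only LOAD = 0001 comes from the text; the other opcode values are assumed.
LOAD, ADD, STORE, JUMP = 0b0001, 0b0010, 0b0011, 0b0100


def make_program():
    """Return 8 bytes of memory: four instructions followed by data."""
    return [
        (LOAD << 4) | 4,   # address 0: load the value at address 4
        (ADD << 4) | 5,    # address 1: add the value at address 5
        (STORE << 4) | 4,  # address 2: store the result at address 4
        (JUMP << 4) | 0,   # address 3: jump back to address 0
        0,                 # address 4: running total
        2,                 # address 5: the constant 2
        0,                 # address 6: unused
        0,                 # address 7: unused
    ]


def run(memory, max_cycles):
    """Execute at most max_cycles instructions and return the memory."""
    program_counter = 0
    data_register = 0
    for _ in range(max_cycles):
        # Fetch: copy the instruction into the instruction register.
        instruction_register = memory[program_counter]
        program_counter += 1
        # Decode: upper 4 bits are the opcode, lower 4 bits the address.
        opcode = instruction_register >> 4
        address = instruction_register & 0b1111
        # Execute.
        if opcode == LOAD:
            data_register = memory[address]
        elif opcode == ADD:
            data_register = (data_register + memory[address]) & 0xFF
        elif opcode == STORE:
            memory[address] = data_register
        elif opcode == JUMP:
            program_counter = address
        else:
            raise ValueError(f"unknown opcode {opcode:04b}")
    return memory


# Twelve cycles are three passes through the four-instruction loop.
print(run(make_program(), max_cycles=12)[4])  # 6
```

A "Compare" and "JumpGreater" pair would replace the fixed cycle limit with a condition checked by the program itself.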
We have already explained how a computer stores and operates on numeric data.
However, we need to process not only numbers but text as well. This is done by
mapping letters to numbers. These mappings are called encoding systems. One of the
first such systems was ASCII (American Standard Code for Information Interchange). It
maps one byte to one character. Only 7 bits of the byte are used, but 7 bits give us 128
unique characters, which is enough to encode all letters of the English alphabet,
digits, and some special symbols.

The ASCII table consists of two types of symbols:

• Special control characters, for example, SOH, STX, and others.
• English letters, digits, and other printable characters, for example, the question
  mark (63) and the equals sign (61).
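
The mapping between characters and numbers can be inspected directly in Python with the built-in ord and chr functions. The sample string "Hi?" is chosen only for illustration.

```python
# Printable characters map to the numbers listed in the ASCII table.
print(ord("?"))  # 63
print(ord("="))  # 61
print(chr(65))   # A

# Encoding a text as ASCII produces one byte per character, each below 128.
encoded = "Hi?".encode("ascii")
print(list(encoded))  # [72, 105, 63]

# SOH (code 1) and STX (code 2) are control characters, not printable ones.
soh_is_printable = chr(1).isprintable()
print(soh_is_printable)  # False
```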

The ASCII table can be extended to 8 bits (one byte). There are many extended ASCII
encodings. The upper half of an extended ASCII table is used to represent characters from
languages other than English. However, the additional 128 characters are not enough to
represent all missing characters from all languages. So, to correctly interpret text
encoded with one of the extended ASCII encodings, you also need to know which
encoding was used.
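
A short Python illustration of why the encoding must be known: the same byte decodes to different characters under two different one-byte encodings. The two encodings used here, Latin-1 and Windows-1251, are chosen only as examples.

```python
raw = bytes([0xE9])  # one byte from the upper half of the table

as_latin1 = raw.decode("latin-1")  # Western European encoding
as_cp1251 = raw.decode("cp1251")   # Cyrillic encoding

print(as_latin1)  # é
print(as_cp1251)  # й

# The lower half is identical in both encodings because it is plain ASCII.
print(b"A".decode("latin-1") == b"A".decode("cp1251"))  # True
```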
One-byte encodings can be used to represent an English text and other
alphabet-based languages, but some languages like Chinese have
thousands of different characters. In scientific texts we might also want
to have a mixture of English and Greek letters. The standard called
Unicode was developed to solve this problem. The standard maps
more than 100,000 characters to numbers. The first 128 characters in
Unicode match the 128 ASCII characters.
Unicode

Unicode is a character set which assigns numbers to characters.
How these numbers are stored by the computer depends on a
Unicode encoding. The most used Unicode encodings are UTF-8, UTF-16, and UTF-32.
UTF-32 is a fixed-length Unicode encoding. The
number 32 in the name represents the number of bits used to code each Unicode
character. 32 bits correspond to 4 bytes. So, each symbol in UTF-32 is coded using 4
bytes. This encoding is rarely used because: 1) It wastes a lot of space;
an English text encoded using UTF-32 takes 4 times more space than the same text in
ASCII. 2) It is not ASCII-compatible.
UTF-16 uses 2 or 4 bytes to encode characters. It is a variable-length Unicode encoding.
It is used internally by Windows and the Java programming language. Although it wastes
less space than UTF-32 for an English text, it still uses twice as much space as
ASCII, and it is not ASCII-compatible.
UTF-8 is a variable-length encoding scheme for Unicode. It uses one to four bytes to
encode symbols. It is ASCII-compatible: if a text uses only ASCII symbols, the UTF-8
and ASCII encoded texts are the same. For the same reason, it uses one byte per letter
for an English text. UTF-8 is the most used encoding on web pages.

Since UTF-8 is a variable-length encoding, we must have a way to tell where one
character ends and the next one starts. This is done by using the first several bits of the
first byte of each character to indicate how many bytes this character uses.

Number of Bytes   Binary                                Number of "X"
1                 0xxxxxxx                              7
2                 110xxxxx 10xxxxxx                     11
3                 1110xxxx 10xxxxxx 10xxxxxx            16
4                 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx   21


Let us encode the Greek small letter alpha using UTF-8 to demonstrate it. Its decimal
number in the Unicode table is 945. The binary representation of this number is 0011 1011
0001. We need 10 bits to store this number, so the second pattern from the table should
be used. It has 11 free bits, so we pad the number with one leading zero (011 1011 0001)
and substitute the bits into the pattern to get 1100 1110 1011 0001.
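
The substitution can be checked with a few bit operations. This is a minimal sketch that handles only the two-byte pattern 110xxxxx 10xxxxxx; the function name encode_two_byte_utf8 is made up for this example, and the result is compared with Python's built-in UTF-8 encoder.

```python
def encode_two_byte_utf8(code_point):
    """Encode a code point with the two-byte pattern 110xxxxx 10xxxxxx."""
    if not 0x80 <= code_point <= 0x7FF:
        raise ValueError("the two-byte pattern covers code points 128 to 2047")
    first = 0b11000000 | (code_point >> 6)          # 110 + upper 5 bits
    second = 0b10000000 | (code_point & 0b111111)   # 10 + lower 6 bits
    return bytes([first, second])


alpha = encode_two_byte_utf8(945)
print(" ".join(f"{byte:08b}" for byte in alpha))  # 11001110 10110001
print(alpha == "α".encode("utf-8"))               # True
```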
It should also be noted that UTF-16 and UTF-32 have two variations each: big-endian
and little-endian. We are not going to elaborate on these encodings, but these
variations differ only in the order of bytes in the character. A UTF-16 or UTF-32 text may
also include a BOM (byte order mark) at the beginning to indicate whether the text
was encoded using the big-endian or the little-endian variation. This avoids the need to
agree on the byte order beforehand.
The complete table of possible Unicode encodings for the Greek small letter alpha is
provided below.

Character   Name                       Unicode Number (Hex)   Unicode Number (Decimal)
α           Greek small letter alpha   U+03B1                 945

Encoding   Hex           Binary
UTF-8      CE B1         11001110 10110001
UTF-16BE   03 B1         00000011 10110001
UTF-16LE   B1 03         10110001 00000011
UTF-32BE   00 00 03 B1   00000000 00000000 00000011 10110001
UTF-32LE   B1 03 00 00   10110001 00000011 00000000 00000000
In conclusion, the table below lists the properties of the discussed encodings.

Encoding                           Variable or Fixed-length   Bytes per Character   Usage
ASCII and other 1-byte encodings   Fixed                      1                     No longer widely used and should be avoided
UTF-32                             Fixed                      4                     Rarely used
UTF-16                             Variable                   2 or 4                Rarely used
UTF-8                              Variable                   1 to 4                Widely used


Color Models

What is a color? Most electromagnetic waves are not visible to us, but a narrow band
of electromagnetic waves stimulates the retina of our eyes, making these waves visible.
Each individual wavelength in this band corresponds to a particular color. Human eyes
have cells known as cones, which are responsible for color perception. There are three
types of cones, each sensitive to its own range of colors: red, green, and blue.
This does not mean that red cones are not stimulated when you are looking at
a green color; it means that they are stimulated less than the green ones. Our brain
interprets how strongly each type of cone is stimulated and produces our sensation of
color as a result.
There are two ways to obtain color: additive color mixing and subtractive color mixing.

RGB stands for Red, Green, and Blue (the primary colors in the RGB model). Each
computer screen pixel emits amounts of red, green, and blue light chosen so that our
eyes perceive the desired color.
This type of color mixing is called “additive” because we start with black and
combine (or add) different amounts of red, green, and blue light. We add several
wavelengths to obtain a color.
When painting or printing, colors are obtained in a way that is called "subtractive".
A white surface looks white to us because it reflects all visible wavelengths
(white is simply a mixture of all visible wavelengths). By painting on the white
surface, we make the surface absorb part of the spectrum, thus giving it a color.
When we mix paints, each paint still absorbs all the wavelengths it absorbed before, so
only the wavelengths reflected by both paints are still reflected.
Thus, we “subtract” colors from white to obtain a new color.
In color printing, the usual primary colors are cyan, magenta, and yellow (CMY). The
key (K, black) component is often added to avoid mixing all colors to obtain black.
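
Both kinds of mixing can be sketched numerically. The code below is an illustration under simplifying assumptions: colors are 0 to 255 RGB triples, and the CMYK conversion is the common naive formula that ignores color profiles, so real printing software will give different numbers. The function names are made up for this example.

```python
def mix_additive(*colors):
    """Add RGB light sources channel by channel, clamping at 255."""
    return tuple(min(255, sum(channel)) for channel in zip(*colors))


def rgb_to_cmyk(red, green, blue):
    """Naive conversion from 0-255 RGB to 0.0-1.0 CMYK fractions."""
    r, g, b = red / 255, green / 255, blue / 255
    key = 1 - max(r, g, b)
    if key == 1:
        # Pure black: use only the key component.
        return (0.0, 0.0, 0.0, 1.0)
    cyan, magenta, yellow = ((1 - c - key) / (1 - key) for c in (r, g, b))
    return (cyan, magenta, yellow, key)


# Additive: red light plus green light is perceived as yellow.
print(mix_additive((255, 0, 0), (0, 255, 0)))  # (255, 255, 0)

# Subtractive: red is printed with magenta and yellow ink, no cyan.
print(rgb_to_cmyk(255, 0, 0))  # (0.0, 1.0, 1.0, 0.0)
```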
An FSM (finite-state machine) is defined by:

o the list of its states
o the alphabet, which is just a list of all possible input characters
o the initial state
o the list of accepting states
o the list of transitions, which can be represented by a two-dimensional
  table (state x character).

In the case of syntax checking, the inputs are characters from a string
being checked for correct syntax. The input string is read character by
character. Each new character triggers a transition.
Let us jump straight to an example. The FSM shown below checks if the
input string is a decimal number written in the E notation. The E
notation is used by most calculators and many computer programs to
represent numbers in scientific notation. The string "mEn" represents
the number m×10^n.

This FSM has seven states labelled from 1 to 7; each state is represented by a circle on
the graph. State 1 is the initial state; it is denoted by a circle with an arrow coming
from nowhere. States 2, 4, and 7 are accepting states; they are shown as double
circles. The transitions are represented with arrows. Each arrow has a symbol, or a list
of symbols written next to it. These symbols cause corresponding transitions.
Suppose the number 14.5E+5 (14.5×10^5) is given as an input to this FSM. Let us run it
through this FSM. The machine starts in state 1, as it is our initial state.

State   Symbol   Transition
1       1        The machine follows a "Digit" transition from state 1 to state 2. Even
                 though it is an accepting state, we do not stop here because we have
                 more symbols to process.
2       4        The machine follows a "Digit" transition from state 2 to state 2. This
                 transition loops around back to state 2.
2       .        The machine follows a "." transition from state 2 to state 3.
3       5        The machine follows a "Digit" transition from state 3 to state 4.
4       E        The machine follows an "E" transition from state 4 to state 5.
5       +        The machine follows a "+" transition from state 5 to state 6.
6       5        The machine follows a "Digit" transition from state 6 to state 7.

State 7 is an accepting state, so the string 14.5E+5 is accepted.


The number 14.5E+5 runs through all possible states. If we feed this machine a
simpler number like 35, it will end up in state 2, another accepting state, which
corresponds to a different form of the input string (an integer without a fraction or exponent).
What if we give this machine the string "3E+5.1"? Clearly, this number is not in the E
notation. After processing the first four symbols, we end up in state 7 and the next
symbol to process is ".". There is no transition for this symbol. Even though the
machine stops in an accepting state, the string is not accepted because the machine
couldn't fully process the string.
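
The machine can be sketched as a transition table in code. Since the diagram is not reproduced here, only the transitions that appear in the walkthroughs above are taken from the text; the remaining ones (more digits in states 4 and 7, a "-" sign, and an exponent without a sign) are assumptions made for this illustration.

```python
DIGITS = "0123456789"


def build_transitions():
    """Build the (state, symbol) -> next state table."""
    table = {}

    def add(state, symbols, target):
        for symbol in symbols:
            table[(state, symbol)] = target

    # Transitions that appear in the walkthroughs in the text.
    add(1, DIGITS, 2)
    add(2, DIGITS, 2)
    add(2, ".", 3)
    add(3, DIGITS, 4)
    add(2, "E", 5)
    add(4, "E", 5)
    add(5, "+", 6)
    add(6, DIGITS, 7)
    # Assumed transitions, not confirmed by the text.
    add(4, DIGITS, 4)
    add(5, "-", 6)
    add(5, DIGITS, 7)
    add(7, DIGITS, 7)
    return table


TRANSITIONS = build_transitions()
ACCEPTING = {2, 4, 7}


def accepts(text):
    """Return True if the machine ends in an accepting state after reading all of text."""
    state = 1
    for symbol in text:
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            # No transition for this symbol: the string cannot be fully processed.
            return False
    return state in ACCEPTING


print(accepts("14.5E+5"))  # True
print(accepts("35"))       # True
print(accepts("3E+5.1"))   # False
```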
Construct an FSM which accepts a valid month number (1, 2, …, 12).
Check Your Answer
Begin with an initial state.
A valid month number can be a one-digit number (1, 2, …, 9) or a two-
digit number (10, 11, 12). We begin by adding a transition to accepting
state 2 if the first digit is from 2 to 9. We exclude 1 for now because 1
is a special case. We will deal with it later.

Next, add a transition to another state if the first digit is 1.
This state should be an accepting state because 1 is a valid month
number. Double circle state 3.

Two-digit numbers 10, 11, and 12 should also be accepted. We need
another accepting state if the next digit after 1 is 0, 1, or 2.
This machine accepts all valid month numbers. If we feed it
any invalid month number, it will not accept it. For example, if we
feed it the number 0, it will stop in state 1 because there is no
transition for symbol “0” in state 1. If we feed it the number 13, it will
stop in state 3 because there is no transition for symbol “3” in state 3.
If we feed it the number 25, it will stop in state 2 because there is no
transition for symbol “5” in state 2.
We can combine states 2 and 4 to simplify this machine so that it has
only 3 states.
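
The four-state machine built in this answer can be written down directly as a transition table. The state numbers follow the text; the dictionary layout and the function name is_valid_month are choices made for this sketch.

```python
# (state, symbol) -> next state, following the construction in the text.
TRANSITIONS = {
    (1, "1"): 3,                          # first digit 1: special case
    **{(1, d): 2 for d in "23456789"},    # first digit 2 to 9
    **{(3, d): 4 for d in "012"},         # second digit after 1
}
ACCEPTING = {2, 3, 4}


def is_valid_month(text):
    """Return True if text is a month number from 1 to 12."""
    state = 1
    for symbol in text:
        state = TRANSITIONS.get((state, symbol))
        if state is None:
            # No transition: the machine stops before reading the whole string.
            return False
    return state in ACCEPTING


print([n for n in range(0, 20) if is_valid_month(str(n))])
print(is_valid_month("13"), is_valid_month("25"))  # False False
```

States 2 and 4 are both accepting and have no outgoing transitions, which is why they can be merged into one state.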
Regular expressions have many different implementations which
vary slightly in their syntax. However, the basics of the syntax are the
same.
Any single literal character in a regex matches that character. A simple
regex "a" will match the "a" after "M" in the string "Mary is a girl".
Several characters have a special meaning. If you want to use any of
these characters as a literal, you need to escape it with a
backslash. To escape a character means to write a special escape
character before the character you want to escape. For example, the
dot "." has a special meaning in a regex; to match a literal dot you
should write "\.". To match the escape character itself, you escape it
too, that is, write "\\".

Special Character   Description
\                   Escape character.
.                   Matches any single character.
^                   Beginning of a string.
$                   End of a string.
|                   A vertical bar is a Boolean "or". It separates alternatives. For example,
                    "gray|grey" matches both "gray" and "grey".
?                   Matches zero or one occurrence of the preceding element.
*                   Matches zero or more occurrences of the preceding element.
+                   Matches one or more occurrences of the preceding element.

Special constructions can be used to group characters or indicate a
range of matching characters.

Special Character   Description
(…)                 Used for grouping to define the scope and precedence of other operators. For
                    example, "gray|grey" and "gr(a|e)y" are equivalent patterns.
[…]                 Matches a single character contained within the brackets. Characters listed in
                    brackets do not need to be escaped even if they are special characters. For
                    example, "[*+]" would match "*" or "+".
[X-Y]               Matches any character between X and Y. For example, [a-z] matches any
                    lowercase letter of the English alphabet, [0-9] matches any digit.
[^…]                Matches a single character not contained within the brackets.
{n}                 Matches exactly n occurrences of the preceding element.
{min,max}           Matches the preceding element at least min times and not more than max times.

Character classes are a shorthand for the most frequently used ranges of
characters. Some examples are given below.

Character Class   Description
\w                Alphanumeric characters and underscore
\s                Whitespace characters
\d                Digits

Regular expressions are easier to learn with examples.

Regular Expression   Description
.ar                  The dot matches any character, so this regex would match "bar", "car",
                     "ear" and others.
[ec]ar               [] matches a single character listed in brackets. This regex matches only
                     "ear" and "car".
[^c]ar               Matches all strings matched by ".ar" except "car".
^[ec]ar              Matches "ear" or "car" but only at the beginning of the string.
a\{5\}r              Matches "a{5}r" because "{" and "}" are escaped.
a{5}r                Matches "aaaaar" because "{" and "}" are not escaped.
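
These examples can be tried with Python's re module, one of the many regex implementations mentioned above. The helper name matching_words and the word list are made up for this sketch; re.fullmatch requires the whole string to match, while re.search looks for a match anywhere in the string.

```python
import re

WORDS = ["bar", "car", "ear"]


def matching_words(pattern):
    """Return the words from WORDS that the pattern matches completely."""
    return [word for word in WORDS if re.fullmatch(pattern, word)]


print(matching_words(r".ar"))     # ['bar', 'car', 'ear']
print(matching_words(r"[ec]ar"))  # ['car', 'ear']
print(matching_words(r"[^c]ar"))  # ['bar', 'ear']

# ^ anchors the match to the beginning of the string.
print(bool(re.search(r"^[ec]ar", "earring")))  # True
print(bool(re.search(r"^[ec]ar", "scar")))     # False

# Escaped braces are literal characters; unescaped braces are a repetition count.
print(bool(re.fullmatch(r"a\{5\}r", "a{5}r")))  # True
print(bool(re.fullmatch(r"a{5}r", "aaaaar")))   # True
```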

There is a direct link between regular expressions and FSMs. It can be
shown that any regular expression can be converted to an equivalent
non-deterministic FSM that accepts the same set of strings as
the regular expression matches. As we mentioned before, a
non-deterministic FSM can be converted to a deterministic FSM (DFSM). A DFSM is
a set of instructions which can be easily performed by a computer.
When you use a regular expression in your code or in a text
editor, the regex engine may convert it to such a machine behind the scenes
to test if there is a match.
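
To make the link concrete, the month-number machine from the earlier exercise can be expressed as a regular expression that accepts the same strings. This is only an illustration of the equivalence; the pattern and the function name month_regex_accepts are written for this sketch and say nothing about how Python's re module works internally.

```python
import re

# "1" optionally followed by 0, 1, or 2, or a single digit from 2 to 9.
MONTH_PATTERN = re.compile(r"1[0-2]?|[2-9]")


def month_regex_accepts(text):
    """Return True if the whole text is a month number from 1 to 12."""
    return MONTH_PATTERN.fullmatch(text) is not None


print([n for n in range(0, 20) if month_regex_accepts(str(n))])
```

The two alternatives of the pattern correspond to the two branches leaving the initial state of the machine.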
Operating System

There are many manufacturers of computer hardware. They
manufacture thousands of different models and types of equipment.
The question arises: how to create programs suitable for each type of
equipment? This is a very complicated task. It would make more sense
not to create specific programs but to create a system that will be able
to establish interaction between a user, computer equipment, and
programs. Such a system is called an operating system.
An operating system (OS) is system software that manages
computer hardware and software resources and provides common
services for computer programs.
How does an OS work?

Let's look at an example of how it works.


What is the difficulty?

When writing a program, a programmer takes into consideration
that the program will need to perform a task with specific hardware.
For example:

• a messenger must be able to interact with a network card. However, there are
  many different network cards.
• a web browser must be able to work with a processor. However, there are many
  different processors.

If each program interacted directly with hardware, it would be quite
difficult for programmers to develop programs and applications.
Imagine writing a program that will support 100 different processors
and 20 different network cards.
What is the solution?

The solution is to use an operating system. An operating system
is a large program that interacts with hardware from various
manufacturers using a set of drivers.
A driver is a computer program that
operates or controls a particular type of device that is attached to a
computer. For example, there are drivers for Intel and AMD processors,
WD hard disk drives, D-Link network cards, etc. A shell (e.g. NVIDIA, Samsung
Magician) might be used to make it easier to run and configure drivers.
Types of OS

Let's take a close look at the types of operating systems and statistics
about their use.
According to Statcounter, the dominant desktop operating
systems in 2019 were:

• Windows (76.32%)
• OS X (17.65%)

Thus, an operating system helps developers
of programs and applications not to tie development to specific
hardware and, thus, it increases the efficiency of the program
development process. The main functions of an operating system are
as follows:

• managing a computer's resources (e.g. CPU, memory) between different
  programs and applications
• establishing a graphical user interface
• executing and providing services for application software

In later topics, we will discuss the Windows and Linux
operating systems in more detail.

A command line is a user interface navigated by typing
commands at prompts with a keyboard, instead of using a
mouse as in a graphical user interface (GUI). To start
the command line in Windows, press Windows+R to open the "Run" box.
Then, type cmd and click OK to open a regular Command Prompt, or
type cmd and then press Ctrl+Shift+Enter to open an administrator
Command Prompt.
Switch Drive in Command Line

When working with the command line, you may find that access to
some drives or directories is blocked or limited by a system
administrator on computers belonging to a corporate organization or an
educational institution. If for this reason you need to change the drive that opens
in the command line by default, just type the desired drive name with a colon, like
"D:".
Run as Administrator

For the same reason that is listed above, for some operations, you may need to run the
Command line with administrator privileges – "Run as administrator."
CD Command in Command Line

CD is the command that is used for navigating the directory tree. For example,
you can jump directly to your target folder
by enclosing the path in quotation marks:
C:\Users\user>cd "C:\Windows\Microsoft.NET\DirectX for Managed Code"
To learn more about CD, use help command:
cd/?
Processes and Threads

When you run any program, the program creates processes. Each process provides the
resources needed to execute the program.
In simple terms, a process is an executing program. One or more threads run in the
context of the process.
A thread is the basic unit to which the operating system allocates processor time. A
thread can execute any part of the process code, including the parts currently being
executed by another thread.
From: official site of Microsoft
Each process can create a child process. In this case, the process which has a child
process is called the parent process.
From: official site of Microsoft
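
The relationship between a process, its threads, and a child process can be observed from a script. The sketch below uses Python's standard threading and subprocess modules rather than the Windows API described in the quoted definitions, and the function names are made up for this example: all threads report the same process ID, and a child process reports the ID of the process that started it as its parent.

```python
import os
import subprocess
import sys
import threading


def thread_pids(count):
    """Run count threads and collect the process ID each one observes."""
    seen = []
    lock = threading.Lock()

    def worker():
        with lock:
            seen.append(os.getpid())

    threads = [threading.Thread(target=worker) for _ in range(count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return seen


def child_parent_pid():
    """Start a child process and return the parent process ID it reports."""
    result = subprocess.run(
        [sys.executable, "-c", "import os; print(os.getppid())"],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


print(set(thread_pids(3)) == {os.getpid()})  # threads share one process
print(child_parent_pid() == os.getpid())     # this process is the parent
```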
How to end/kill/close a process?

In the Task Manager, if you want to end a process, right-click on the desired process
and click End task.
If you want to end a process and all related child processes, right-click on the desired
process and click End process tree.
Sorting in Task Manager

In the Task Manager, you can sort processes by Name, Status, CPU, Memory, Disk, Net
Restart a Service from the Command Line

Sometimes there are cases when you need to restart a service. You can do this using
the command-line program, SC.
SC is a command-line program used for communicating with the
Service Control Manager and services.
sc <server> [command] [service name] <option1> <option2>...
where service name is the name of the service. If the name contains
spaces, enclose it in quotation marks, "service name."
Let us consider the basic requests and the corresponding commands.

Request                                        Command
To stop a service                              sc stop [service name]
To start a service                             sc start [service name]
To restart a running service                   sc stop [service name] && sc start [service name]
If you are not sure whether a service is       sc stop [service name] & sc start [service name]
already running, and want to restart or
to start it

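&& is the "and" operator. If two commands are chained together using &&,
then the second command will run only if the first command has
finished successfully.
& simply separates commands. If two commands are chained together using &,
the second command will run regardless of whether the first one succeeded.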
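
The difference between the two operators can be imitated in a script. This sketch does not call sc; it uses two stand-in commands (a Python process that fails and one that succeeds) so that it runs on any system, and the names run_chained, FAILING, and SUCCEEDING are made up for this example.

```python
import subprocess
import sys

# Stand-in commands: one exits with code 1 (failure), one with code 0.
FAILING = [sys.executable, "-c", "import sys; sys.exit(1)"]
SUCCEEDING = [sys.executable, "-c", "pass"]


def run_chained(commands, require_success):
    """Run commands in order and return their exit codes.

    With require_success=True the chain stops at the first failure,
    like "&&"; with False every command runs, like "&".
    """
    exit_codes = []
    for command in commands:
        code = subprocess.run(command).returncode
        exit_codes.append(code)
        if require_success and code != 0:
            break
    return exit_codes


print(run_chained([FAILING, SUCCEEDING], require_success=True))   # [1]
print(run_chained([FAILING, SUCCEEDING], require_success=False))  # [1, 0]
```

This mirrors the table above: if stopping a service fails because it is not running, "&&" never reaches the start command, while "&" still starts the service.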
A service can also be started with parameters, if necessary.
sc [<ServerName>] start <ServiceName> [<ServiceArguments>]
where

• <ServerName>: Specifies the name of the remote server on which the service is
  located. The name must use the Universal Naming Convention (UNC) format
  (for example, \\myserver). To run SC.exe locally, omit this parameter.
• <ServiceName>: Specifies the service name returned by
  the getkeyname operation.
• <ServiceArguments>: Specifies the service arguments that should be passed
  to the service being started.

Users & Groups

What is the purpose of a user account in Windows? In simple terms, a
user account is a kind of folder that contains information about a
user. A user account is needed to pass authorization and to access a
user’s personal data. When you log in to social networks, you enter
your login and password, and only after that do you access your personal
page. With user accounts in Windows the logic is the same.
A user account is a collection of settings and permissions that Windows uses to
understand what actions you are allowed to perform, what files and folders you
have access to, what devices you can use, and so on.
Without logging into your account, you will not be able to access your personal
files, applications, and settings.
Moreover, when one computer is used by several users, user accounts make it
possible to keep all these personal data separate. For example, when you log into your
account on a shared corporate computer, you are taken to your personal desktop with
all your documents, applications, and so on.
When an account is created, a user is granted permissions that determine what that
user is allowed to do in the system. Permissions can be assigned in two ways.

1. The first way is to set the rights of each account individually. This method becomes
   ineffective when there are a lot of users that should have similar permissions.
2. The second method is preferable from this point of view. The essence of
   this approach is that users are formed into groups, and then whole groups
   are granted the required permissions or rights. For example, a group of
   administrators.

When we talk about creating new users in an organization and authenticating them,
concepts such as domain and active directory may come up.
A domain is the minimal structural unit in Active Directory. It may include network
objects, such as users, computers, printers, shared resources, etc.
For example, domains may correspond to departments in an organization, or to
geographical locations.
Active Directory (AD) is a directory service that stores information about objects on
the network and makes this information easy for administrators and users to find and
use.

You can find more information about Active Directory and the related
concepts on the official website of Microsoft.
An environment variable is a text variable of the operating system storing some
information, for example, system settings data, a path to a folder, etc.
There are two types of environment variables:

• User variables
• System variables

Environment variables can be useful in many ways. One of the possible applications of
environment variables is to use them for quick access to a folder. If you have some
documents in a folder with a long path, you can create an environment variable which
will contain the path to the required folder and, when necessary, use it for quick
access. Consider an example.
Step 1
Imagine that you have a folder named "Subfolder" with the following
path:
C:\Users\<UserName>\Documents\Folder\Subfolder

Another application of environment variables is to use them in scripts and batch
(.CMD or .BAT) files as part of a path.
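
The same idea works in scripts written in other languages. Below is a minimal Python sketch, assuming a hypothetical variable named MY_SUBFOLDER that holds the example path from above; the function name path_in_subfolder is also made up. Setting the variable through os.environ affects only the running script and the programs it starts, not the system settings.

```python
import ntpath
import os

# Hypothetical variable name; the value is the example path from the text.
os.environ["MY_SUBFOLDER"] = r"C:\Users\<UserName>\Documents\Folder\Subfolder"


def path_in_subfolder(file_name):
    """Build a Windows-style path to a file inside the stored folder."""
    # ntpath joins paths with Windows rules on any operating system.
    return ntpath.join(os.environ["MY_SUBFOLDER"], file_name)


print(path_in_subfolder("report.txt"))
```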

What is Shell?

In simple terms, the shell is a command line interface that takes commands and
executes them. In other words, the shell processes commands and returns the output.
What is Terminal?

Terminal, or a terminal emulator, is a program that allows you to communicate with
the shell through a window.
Below, you can find some of the most popular commands.

Command   Description
man       (short for "manual") returns a detailed manual of a command
pwd       (short for "print working directory") prints the current working directory
ls        lists files and directories in the current directory
cd        (short for "change directory") changes the current working directory
mkdir     creates a new folder
echo      prints a string of text to the terminal window
cat       (short for "concatenate") reads data from a file and displays its content on
          the screen
top       shows processes that are currently running; the list of the processes is
          updated every second
ps -aux   shows processes that are currently running; the list of the processes is static
more      displays long text files one page at a time; you can navigate only forward
less      displays long text files one page at a time; you can navigate both forward
          and backward
grep      searches content in text files
kill      terminates a process
vi        a text editor
wget      downloads a file from the Internet
sudo      runs a program as a super user

OSI Model

How does a computer network work? For better understanding, let's
have a closer look at what parts it consists of. The OSI model will help
to understand this.
The Open Systems Interconnection (OSI) model is a conceptual
model used to describe the constituent parts and the functions of a
networking system.
The classic OSI model has 7 layers. Let's consider what each of the
layers is intended for.

Let's look at an example. Suppose we start transferring files via a
messenger or watching a video on YouTube. At this point, the
application forms a specific request, and the work starts on the
Application layer. Then, the work is carried out sequentially on all
layers. Having reached the Physical layer, the work is reduced to
transmitting bits.
This is the classic OSI model. It was developed in 1983. However, in
practice, computer networks had been working even before this model
was developed, and they worked slightly differently: using the TCP/IP
stack.
TCP/IP Model

The Transmission Control Protocol/Internet Protocol (TCP/IP) model is a
concise version of the OSI model.
It consists of 4 layers. In this model, compared to OSI, some layers are
combined with each other. For example, the Application, Presentation,
and Session layers are merged into the Application layer. The Physical
layer from the OSI model is missing from the original TCP/IP model;
however, it should be noted that there are other versions of the TCP/IP
model and networking models in general. E.g., Andrew S. Tanenbaum
adds the Physical layer to the four layers described above.

Protocols

Certain protocols work on each layer.
For example, on the Application layer we can use a browser. Here, data is
generated using the HTTP protocol. If we transfer files, then FTP (File
Transfer Protocol) can be used. The data that we see on the page can
be presented in multiple different formats: for example, illustrations
in PNG format or lists in JSON format. To deliver this data,
the TCP or UDP protocol is used, which is responsible for
transportation. If we look at the layer just below, we will see the
protocols of network interaction. The IP protocol is responsible for
addressing and routing across the network. Below it is
the Ethernet protocol. The last layer, the Physical layer, can be directly
occupied by the technology which enables you to connect to the network:
DSL, Bluetooth, OTN.
Thus, each upstream protocol uses a downstream protocol which is
responsible only for its own task.
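The layering described above can be sketched in a few lines of Python. This is only an illustration: the text markers "TCP|", "IP|" and "ETH|" are hypothetical stand-ins for real protocol headers, and the function names are made up for this example.

```python
def encapsulate(payload: bytes) -> bytes:
    """Wrap application data in one illustrative header per lower layer."""
    segment = b"TCP|" + payload   # Transport layer adds its header
    packet = b"IP|" + segment     # Internet layer adds addressing information
    frame = b"ETH|" + packet      # Link layer adds the frame header
    return frame


def decapsulate(frame: bytes) -> bytes:
    """Strip the headers in reverse order, as the receiving side would."""
    data = frame
    for header in (b"ETH|", b"IP|", b"TCP|"):
        if not data.startswith(header):
            raise ValueError("unexpected header, expected " + header.decode())
        data = data[len(header):]
    return data


request = b"GET / HTTP/1.1"
frame = encapsulate(request)
print(frame)
print(decapsulate(frame))
```

Each function touches only its own header and treats everything inside as opaque data, which is the point of the layered design.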
To conclude, you have considered two key models: the OSI model and the
TCP/IP model. The network layers described in these models help to
understand how data moves from a user-readable and computer-readable
format to a transmitted signal and back.

While an IP address can be used to access information on the Internet,
people most frequently use domain names that can be easily
memorized, like google.com. The Domain Name System (DNS) is
basically the phonebook of the Internet. The purpose of DNS is to
translate domain names to IP addresses. DNS resolution is the process of
converting a hostname into an IP address.
The DNS resolution process involves 4 types of servers: a recursive
resolver, a root name server, a TLD (top-level domain) name server, and
an authoritative name server.

This chain of requests takes a lot of time. To speed up the process,
DNS records are temporarily saved (cached) at various stages of the
process:

 Browser DNS caching. Modern web browsers cache DNS records for a small
amount of time.
 Operating system caching. The DNS resolution request from an application may
not leave your machine if your operating system has already made a request for
that domain name and has it in its cache.
 DNS server caching. Each type of the DNS server may cache responses it
received from other DNS servers.
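The caching idea can be sketched as a small time-limited cache. This is a minimal illustration, not how any particular browser or operating system implements it: the class name DnsCache, the 60-second default lifetime, and the injectable resolve and clock parameters are all assumptions made for this example.

```python
import time
from typing import Callable, Dict, Tuple


class DnsCache:
    """Remember resolved addresses for a limited time (hypothetical helper)."""

    def __init__(
        self,
        resolve: Callable[[str], str],
        ttl_seconds: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._resolve = resolve          # function that performs the real lookup
        self._ttl = ttl_seconds          # how long a cached record stays valid
        self._clock = clock              # time source, replaceable in tests
        self._entries: Dict[str, Tuple[str, float]] = {}

    def lookup(self, hostname: str) -> str:
        now = self._clock()
        cached = self._entries.get(hostname)
        if cached is not None and cached[1] > now:
            return cached[0]             # still fresh: no request leaves the cache
        address = self._resolve(hostname)
        self._entries[hostname] = (address, now + self._ttl)
        return address
```

For a real lookup, the standard library function socket.gethostbyname could be passed as resolve, for example DnsCache(socket.gethostbyname).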

A Uniform Resource Locator (URL), often called a web address, is a
reference to a web resource that specifies its location on a computer
network and a protocol for retrieving it.
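As an illustration, the parts of a URL can be pulled apart with Python's standard urllib.parse module. The address below is a made-up example, not one taken from these notes.

```python
from urllib.parse import urlsplit

# A hypothetical web address used only for illustration.
parts = urlsplit("https://www.example.com:8080/docs/index.html?lang=en#intro")

print(parts.scheme)    # the protocol used to retrieve the resource
print(parts.hostname)  # the host (domain name) where the resource lives
print(parts.port)      # the port number, if one is given
print(parts.path)      # the location of the resource on that host
print(parts.query)     # optional parameters
print(parts.fragment)  # optional reference to a place inside the resource
```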
The most widely used type of database is the relational database. The
data in a relational database is stored in tables. Each row in the table
is a record with a unique ID called the key. The columns of the table hold
attributes of the data, and each record usually has a value for each
attribute. Relational databases are designed to provide reliable access
to structured information; this reliability is commonly described by the
ACID properties (atomicity, consistency, isolation, durability). Virtually
all relational databases use Structured Query Language (SQL) for writing
and querying data.
Relational databases are called that way because they usually contain
multiple tables, some of which are related to each other. Each row in a
table has its own unique key. Rows in a table can be linked to rows in
other tables by having an additional column containing the key of the
linked row. Such columns are called foreign keys. Let us have a look at
the example.
We have three tables: Customers, Products, and Orders.
The CustomerId and ProductId fields in the Orders table contain foreign
keys to the CustomerId column in the Customers table and
the ProductId column in the Products table, respectively. Instead of
writing out the name and price of the product, and the name and
address of the customer in that table, we only add the customer’s and
product’s unique identifiers. If, for example, the customer’s
address changes, we do not have to update it in all past orders; we
only need to change it in the Customers table.
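The three tables can be tried out with Python's built-in sqlite3 module. This is a minimal sketch: the column names other than CustomerId and ProductId (OrderId, Name, Address, Price) and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked
conn.executescript("""
CREATE TABLE Customers (CustomerId INTEGER PRIMARY KEY, Name TEXT, Address TEXT);
CREATE TABLE Products  (ProductId  INTEGER PRIMARY KEY, Name TEXT, Price REAL);
CREATE TABLE Orders (
    OrderId    INTEGER PRIMARY KEY,
    CustomerId INTEGER REFERENCES Customers(CustomerId),
    ProductId  INTEGER REFERENCES Products(ProductId)
);
INSERT INTO Customers VALUES (1, 'Alice', '1 Old Street');
INSERT INTO Products  VALUES (10, 'Keyboard', 25.0);
INSERT INTO Orders    VALUES (100, 1, 10);
""")

# The address is stored once, so a single update is enough.
conn.execute(
    "UPDATE Customers SET Address = ? WHERE CustomerId = ?",
    ("2 New Avenue", 1),
)

# Follow the foreign keys to rebuild the full order information.
row = conn.execute("""
    SELECT o.OrderId, c.Name, c.Address, p.Name, p.Price
    FROM Orders AS o
    JOIN Customers AS c ON c.CustomerId = o.CustomerId
    JOIN Products  AS p ON p.ProductId  = o.ProductId
""").fetchone()
print(row)
```

The past order picks up the new address automatically, because the Orders row holds only the customer's key and not a copy of the address.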

The most popular relational database management systems are:

o Oracle database
o MySQL
o Microsoft SQL Server
o PostgreSQL
Relational - The data is stored in tables, some of which are related to each other. Each
row in the table is a record with a unique ID called the key. The columns of the table
hold attributes of the data, and each record usually has a value for each attribute.
Object-oriented - Information in an object-oriented database is represented in the form
of objects, as in object-oriented programming.
Several organizations have released guidelines on strong
passwords. The most common advice is:

 Use a password that is at least 10 characters long
 Use lowercase and uppercase characters, numbers, and symbols
 Do not use the same password for different accounts
 Avoid dictionary words and any information which might be associated with the
user (names, dates)

Another good piece of advice is to use the Diceware technique.
The Diceware website provides a numbered list of words. Each word in
the list is assigned a 5-digit number that uses only the digits from 1 to 6.
You randomly select several words from this list and use these
words as a single passphrase. To select a truly random word, roll a
die 5 times to obtain the word’s number. For example, if you rolled 4, 3,
6, 5, 3, the number is 43653, which corresponds to the word "nova." It
is recommended to use a minimum of 6 words in a password
generated this way. The resulting password might be "nova sense copy
lent ram quiet". It is important to select words randomly. Any phrase
which makes sense would be a weak password.
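The dice-rolling procedure can be imitated in Python with the standard secrets module, which is meant for security-sensitive randomness. This is a sketch only: the function names are made up for this example, and demo_wordlist is a placeholder that maps every 5-digit number to a dummy word, standing in for the real Diceware list.

```python
import itertools
import secrets
from typing import Dict


def roll_word_number(rolls: int = 5) -> str:
    """Imitate rolling a die several times and joining the results."""
    # secrets.randbelow(6) returns 0..5, so add 1 to get a die face 1..6.
    return "".join(str(secrets.randbelow(6) + 1) for _ in range(rolls))


def make_passphrase(wordlist: Dict[str, str], words: int = 6) -> str:
    """Pick the requested number of random words and join them with spaces."""
    return " ".join(wordlist[roll_word_number()] for _ in range(words))


# Placeholder list: a real one would map numbers such as "43653" to real words.
demo_wordlist = {
    "".join(digits): f"word{index}"
    for index, digits in enumerate(itertools.product("123456", repeat=5))
}

print(roll_word_number())
print(make_passphrase(demo_wordlist))
```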
It is almost impossible to remember all your passwords if you use a
different password for each system, as recommended. A
reasonable compromise is to use a password manager program which
securely stores all your passwords. It has a master password to open
its database, and this password should be a strong, randomly generated
one which you will have to remember.
Password strength can be measured by the number of guesses needed
to find the password, but in the computer industry the base-2 logarithm of
that number is usually used; it is called "entropy bits." For example, a
password with an entropy of 32 bits would require $2^{32}$ (4,294,967,296)
attempts to guess the password.
Depending on the symbol set used in a password, the entropy per
symbol varies.

Symbol Set                                     Number of Different Symbols   Entropy per Symbol (bits)

Numbers                                        10                            3.322
Case-insensitive English alphabet              26                            4.700
Case-sensitive English alphabet                52                            5.700
Case-sensitive alphanumeric (a-z, A-Z, 0-9)    62                            5.954
All ASCII printable characters                 95                            6.570

A 6-symbol password that uses only lower-case Latin letters will have
28.2 entropy bits. It would take only about $2^{28.2}$ (roughly 308 million)
attempts to discover it (if we assume that we spend 1 second per attempt,
this results in approximately 9.8 years of trying).
The Diceware method gives 12.9 bits of entropy per word (there are
$6^5 = 7776$ words in the list, so the entropy per word is
$\log_2 7776 \approx 12.9$ bits). The entropy of the
recommended six-word password is 77.5 bits, assuming an attacker
knows that you used the Diceware technique. Otherwise, the password is
even harder to guess.
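The figures above can be reproduced with a short calculation. The helper name entropy_bits is an assumption made for this sketch; it multiplies the password length by the base-2 logarithm of the number of possible symbols.

```python
import math


def entropy_bits(symbol_count: int, length: int) -> float:
    """Entropy of a password of `length` symbols drawn from `symbol_count` options."""
    return length * math.log2(symbol_count)


print(round(entropy_bits(26, 6), 1))      # six lower-case Latin letters
print(round(entropy_bits(6 ** 5, 1), 1))  # one Diceware word
print(round(entropy_bits(6 ** 5, 6), 1))  # six Diceware words
```

For Diceware, a whole word counts as one "symbol", which is why the number of options is the size of the word list.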
So far, we assumed that the attacker would try all possible
combinations of symbols to find the password. Such attacks are
called brute-force attacks. In fact, a more common attack is
a dictionary attack. It is based on trying all the strings from a
predefined list. Usually these are words from a dictionary. The
attacker also tries different combinations of these strings and makes
well-known substitutions, like writing "$" instead of "s" in the words
("pa$$word").
Another threat is that the attacker may get access to the system
storing passwords. For example, the database containing users’
passwords may be stolen. In the past, systems used to store
passwords as plain text. After gaining access, the attacker would
immediately know the usernames and their passwords, and might try to
use these credentials to access other systems because people often
reuse their passwords. To prevent that, a technique called password
hashing is used. Modern systems almost never store passwords as
plain text; they store a hash of a password instead. The hash is the
result of applying a one-way function to a password. A one-way
function is a function which is easy to compute on every input, but
hard to invert given the result. A simple example of a one-way function
is the product of two prime numbers. It is easy to multiply two large
prime numbers, but very difficult to find the prime factors having only the
product.
One of the most used and well-known families of hashing algorithms is
SHA (Secure Hash Algorithm). It uses a number of logical operations
(AND, XOR, OR, and others) on the input to produce its output. SHA
usually has a number after its name to indicate the variant being used.
For example, SHA-256 means that the result is a 256-bit hash. A hash
is essentially a fixed-size image of an input string which can be of any
length. An important property of hash functions is that it should be
virtually impossible to find two input strings which produce the
same hash. Such collisions must still exist, because a hash function maps
a larger set to a smaller one. Older hashing algorithms (SHA-1 and MD5)
are no longer considered secure because such collisions were found.
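The fixed-size property is easy to observe with Python's standard hashlib module. This is a minimal sketch with a made-up helper name, sha256_hex; it only demonstrates the hash function itself, whereas production password storage typically also adds a per-user salt and a deliberately slow function, which are left out here.

```python
import hashlib


def sha256_hex(text: str) -> str:
    """Return the SHA-256 hash of a string as hexadecimal text."""
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


short_hash = sha256_hex("nova")
long_hash = sha256_hex("nova sense copy lent ram quiet")

# Each hexadecimal character encodes 4 bits, so both lines print 256.
print(len(short_hash) * 4)
print(len(long_hash) * 4)
print(short_hash == long_hash)  # different inputs give different hashes here
```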
Secure Communications
Secure communications take place when the parties want to
communicate without a third party being able to listen to them.
Secure communications over the Internet are usually achieved via
encryption, assuming the endpoints are not compromised.
The two primary methods for encryption used in computer systems
are symmetric and asymmetric.
Symmetric
In symmetric encryption, the same key is used for encryption and
decryption. The key should be somehow passed to both parties
beforehand via a secure method. If an attacker eavesdrops on the
message, they would not be able to understand it without the key. One
of the simplest symmetric ciphers is the XOR cipher. The XOR operation is
applied to the message and the key, repeating the key if the message
is longer than the key. The party receiving the message uses the same
operation to decrypt the message.
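$m$ - message, $k$ - key, $c$ - encrypted message
The encrypted message is $c = m \oplus k$.
The decryption works because $c \oplus k = (m \oplus k) \oplus k = m \oplus (k \oplus k) = m \oplus 0 = m$.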
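For example, the string “EPAM” (01000101 01010000 01000001
01001101 in ASCII) can be encrypted using the key “PW” (01010000
01010111 in ASCII) as follows:

Message             01000101 01010000 01000001 01001101
Key (repeated)      01010000 01010111 01010000 01010111
Encrypted message   00010101 00000111 00010001 00011010

To decrypt the message, we apply the XOR operation again:

Encrypted message   00010101 00000111 00010001 00011010
Key (repeated)      01010000 01010111 01010000 01010111
Message             01000101 01010000 01000001 01001101
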
The XOR cipher is not very secure though, because it is susceptible to
frequency analysis. If the key is not changed for a long time, an
eavesdropping attacker can gather enough data to match the
encrypted messages to the frequencies of words and letters in a
typical text. However, it is very fast and easy to implement. XOR
operations, among others, are used in AES (Advanced Encryption
Standard), which is the most widely used symmetric encryption
algorithm nowadays.
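The XOR cipher described in this section fits in a few lines of Python. The function name xor_cipher is an assumption made for this sketch; the message and key are the “EPAM” and “PW” strings from the example in the text.

```python
from itertools import cycle


def xor_cipher(data: bytes, key: bytes) -> bytes:
    """XOR every byte of data with the key, repeating the key as needed."""
    if not key:
        raise ValueError("key must not be empty")
    return bytes(byte ^ key_byte for byte, key_byte in zip(data, cycle(key)))


encrypted = xor_cipher(b"EPAM", b"PW")
print(" ".join(f"{byte:08b}" for byte in encrypted))  # ciphertext as binary octets

# Applying the same operation with the same key restores the message.
print(xor_cipher(encrypted, b"PW"))
```

One function serves for both directions, because XOR with the same key undoes itself.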
Asymmetric
In asymmetric encryption (also known as public-key encryption), there are
two keys: public and private. Each party generates its own pair of keys.
The public key is used for data encryption and is shared with everyone.
Anyone who wants to send an encrypted message to the owner of a
pair of keys uses the recipient’s public key to encrypt the message. The
private key is used for data decryption and is kept secret. The
combination of a public and a private key is called a key pair. The
public and private keys are mathematically related in such a way that
even if one knows the public key, it is almost impossible to discover
the private key. Even if the attacker intercepts the message, it is
practically impossible to decrypt it using the public key only.
One of the most popular public-key cryptosystems is RSA, named after
its authors, Ron Rivest, Adi Shamir, and Leonard Adleman. Without
going into much detail, it is based on the fact that it is hard to factor the
product of two large prime numbers. A part of the public key is a number
which is the product of two prime numbers. The prime factors are never
shared with the public; the private key is derived from them. One of
the disadvantages of asymmetric encryption is that generating keys
and encrypting messages is much slower compared to symmetric
encryption. In practice, asymmetric encryption is often used only to
establish the connection between two parties and obtain a common
key for symmetric encryption.
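The relationship between the two keys can be seen in a toy RSA calculation. This is a sketch for illustration only: the primes 61 and 53, the public exponent 17 and the message 65 are arbitrary small values chosen for this example, whereas real keys use primes that are hundreds of digits long together with padding schemes that are omitted here. The three-argument pow with a negative exponent requires Python 3.8 or later.

```python
# Two small primes, kept secret by the key owner (illustrative values).
p, q = 61, 53

n = p * q                  # part of the public key: the product of the primes
phi = (p - 1) * (q - 1)    # can be computed only if the prime factors are known
e = 17                     # public exponent, chosen to share no factor with phi
d = pow(e, -1, phi)        # private exponent: the modular inverse of e


def encrypt(message: int) -> int:
    """Anyone can do this with the public key (n, e)."""
    return pow(message, e, n)


def decrypt(ciphertext: int) -> int:
    """Only the owner of the private key (n, d) can do this."""
    return pow(ciphertext, d, n)


ciphertext = encrypt(65)
print(ciphertext)
print(decrypt(ciphertext))
```

Computing d requires phi, and phi requires the prime factors of n, which is why publishing n and e does not reveal the private key as long as n is hard to factor.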
