IT Officer Notes Ebook
File System:
Stores permanent records in various files.
Needs application programs to access and manipulate the data.
Drawbacks of the file system approach:
Data Redundancy
Data Inconsistency
Difficulty in accessing data
Data Integrity problems
Low Security
Data Isolation: Transactions executing concurrently must not interfere with one another; each transaction should behave as if it were the only one running, and the database must remain in a consistent state after any transaction completes. As an example, if two people are updating the same catalog item, it is not acceptable for one person's changes to be "clobbered" when the second person saves a different set of changes. Both users should be able to work in isolation, each working as though he or she were the only user, and each set of changes must be isolated from those of the other user.
Data Integrity is the assurance that information is unchanged from its source and has not been accidentally (e.g. through programming errors) or maliciously (e.g. through breaches or hacks) modified, altered or destroyed. In other words, it concerns the completeness, soundness, and wholeness of the data and its compliance with the intention of the data creators. It is a logical property of the database, independent of the actual data.
Data Consistency refers to the usability of the data and is mostly discussed in a single-site environment. Even in a single-site environment, consistency problems may arise during recovery activities, when original data is replaced by backup copies. You have to make sure that your data remains usable while it is being backed up.
Data Abstraction: To simplify the interaction between users and the database, the DBMS hides information that is not of interest to users; this is called Data Abstraction. The developer hides complexity from users and presents an abstract view of the data.
1)External/View Level: It is the user's view of the database. This level describes the part of the database that is relevant to each user.
2)Conceptual/Logical Level:
Describes what data is stored in the database and the relationship among the data.
Represent all entities, their attributes and their relationship
Constraints on the data
Security and Integrity information
3)Physical/Internal Level: Describes how the data is actually stored in the database (file structures, indexes and other storage details).
Schemas: The overall logical design (structure) of the database, which changes infrequently.
Instances: The collection of information stored in the database at a particular moment.
Sub-schema: It is a subset of a schema and inherits the same properties that the schema has. It is an application programmer's or user's view of the data item types and record types which he or she uses.
DBMS Components:
1)Hardware
2)Data
3)Software
4)Users
5)Procedures(Set of rules for database management)
Types of Users:
a)Naive Users:
End Users of the database who work through menu driven application programs, where
the type and range of response is always indicated to the users.
b)Online Users:
Those users who may communicate with database directly through an online terminal.
c)Application Programmer:
Those users who are responsible for developing the application program.
d)DBA(Database Administrator)
DBA(Database Administrator):
DBA directs or performs all activities related to maintaining a successful database
environment.
Functions of DBA:
Defining and modifying the database schema
Defining storage structures and access methods
Granting authorization for data access
Routine maintenance such as backups, performance monitoring and tuning
Database Languages:
1)DDL(Data Definition Language):
Deals with database schemas and descriptions of how the data should reside in the database.
Used to create or alter/modify a database or table structure and schema. Commands used in DDL:
Create
Alter
Drop
Rename
Truncate
Comment
2)DML(Data Manipulation Language):
Deals with manipulation of the data present in the database, e.g. retrieving, inserting, updating and deleting data. Commands used in DML:
Update
Select
Insert
Delete
Merge
Call
Lock Table
3)DCL(Data Control Language):
Deals with rights, permissions and other access controls of the database system. Commands used in DCL:
Grant
Revoke
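A minimal sketch of the DCL commands, assuming a hypothetical employee table and a user account named clerk already exist:
GRANT SELECT, UPDATE ON employee TO clerk;
REVOKE UPDATE ON employee FROM clerk;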
4)Transaction Language:
Controls and manages transactions to maintain the integrity of data within SQL statements.
Commands used in Transaction Language:
Set Transaction
Commit
Savepoint
Rollback
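A minimal sketch of these commands working together, assuming a hypothetical orders table with three columns:
SET TRANSACTION READ WRITE;
INSERT INTO orders VALUES (101, 'Pen', 5);
SAVEPOINT after_first_insert;
INSERT INTO orders VALUES (102, 'Pencil', 10);
ROLLBACK TO SAVEPOINT after_first_insert;  -- undoes only the second insert
COMMIT;                                    -- makes the first insert permanent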
Database Model:
The logical structure of a database; it fundamentally determines the manner in which data can be stored, organized and manipulated.
1)Hierarchical Model:
Data is organized in a tree-like structure, implying a single parent for each record.
Allows one-to-many relationships.
2)Network Model:
Allows many-to-many relationships in a graph-like structure that allows multiple parents.
Organizes data using two fundamental concepts called records and sets.
3)Relational Data Model:
Collection of tables to represent data and the relationships among those data. E.g.: Oracle, Sybase.
The Hierarchical, Network and Relational data models are types of Record-Based Models.
1)Entity: A "thing" or "object" in the real world that is distinguishable from all other objects. An entity has a set of properties, and the values of some of those properties may uniquely identify it.
2)Entity Set:
Collection of entities all having same properties or attributes.
3)Attributes:
Each entity is described by set of attributes/properties. Attributes are descriptive properties
possessed by each member of an entity set.
For each attribute, there is a set of permitted values called the domain or value set of the attribute.
Types of attributes:
1)Simple Attribute: Not divided into subparts, e.g. any unique number like 1234.
2)Composite Attribute: Divided into subparts, e.g. Name is divided into first name, middle name and last name.
3)Single-Valued Attribute: Has a single value for a particular entity, e.g. order_id.
4)Multivalued Attribute: Has more than one value for a particular entity, e.g. Phone No.
5)Derived Attribute: The attribute value is derived from some other attribute, e.g. Age can be derived from Date of Birth.
1)Primary key:
A primary key is a column or set of columns in a table that uniquely identifies the tuples (rows) in that table.
A relation may contain many candidate keys. When the designer selects one of them to identify a tuple in the relation, it becomes the primary key. If there is only one candidate key, it is automatically selected as the primary key.
2)Composite key
A key that consists of two or more attributes that uniquely identify an entity occurrence is called a composite key. Any single attribute that makes up the composite key is not a simple key in its own right.
3)Super Key
A super key is the most general type of key. A super key is a set of one or more columns (attributes) that uniquely identifies rows in a table. A super key is a superset of a candidate key.
4)Candidate key
A candidate key is a minimal super key: no attribute can be removed from it without losing the uniqueness property. Candidate keys are columns (or combinations of columns) in a table that qualify for uniqueness of each row/tuple. Every table must have at least one candidate key but may have several.
5)Secondary key
Out of all candidate keys, only one gets selected as the primary key; the remaining keys are known as alternate or secondary keys.
6)Foreign key
A FOREIGN KEY in one table points to a PRIMARY KEY in another table. Foreign keys act as a cross-reference between tables.
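A minimal sketch of how a primary key and a foreign key are declared, using hypothetical department and employee tables:
CREATE TABLE department (
dept_id INT PRIMARY KEY,
dept_name VARCHAR(50)
);
CREATE TABLE employee (
emp_id INT PRIMARY KEY,
emp_name VARCHAR(50),
dept_id INT,
FOREIGN KEY (dept_id) REFERENCES department(dept_id)
);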
Relationship Type: A relationship type defines a set of associations among entities of the
different entity types.
Constraints on a relationship type:
a)Cardinality Ratio: Specifies the number of relationship instances that an entity can participate in. The possible cardinality ratios are 1:1, 1:N, N:1 and M:N.
b)Participation Constraint: Specifies whether the existence of an entity depends on its being related to another entity via the relationship type. There are two types of participation constraints: total and partial.
1)Specialization:
A top-down design process in which sub-groupings within an entity set are identified. Consider an entity set person, with attributes name, street, and city. A person may be further classified as one of the following:
customer
employee
2)Generalization:
A bottom-up design process in which multiple entity sets that share common features are combined into a higher-level entity set (e.g., customer and employee generalize to person).
3)Aggregation:
An abstraction in which a relationship set, together with its associated entity sets, is treated as a higher-level entity set so that it can participate in other relationships.
Normalization: It is the process of removing redundant data from your tables in order to
improve storage efficiency, data integrity and scalability. This improvement is balanced
against an increase in complexity and potential performance losses from the joining of the
normalized tables at query-time.There are two goals of the normalization process:
eliminating redundant data (for example, storing the same data in more than one table)
and ensuring data dependencies make sense (only storing related data in a table). Both of
these are worthy goals as they reduce the amount of space a database consumes and
ensure that data is logically stored. Normalization is also called a bottom-up approach because this technique requires full knowledge of every participating attribute and its dependencies on the key attributes. If you try to add new attributes after normalization is done, it may change the normal form of the database design.
Consider a relation R(A, B, C, D) with the set of functional dependencies F = {A → B, D → C}.
From the set of functional dependencies F, we can derive the primary key. For R, the key can be (A, D), a composite primary key. That means AD → BC, i.e. AD can uniquely identify B and C. But in this case both A and D together are not required to identify B or C uniquely: to identify B, attribute A is enough; likewise, to identify C, attribute D is enough. The functional dependencies AD → B and AD → C are called partial functional dependencies.
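A minimal sketch of removing these partial dependencies by decomposition, using hypothetical table names R1, R2 and R3 and arbitrary column types:
-- R(A, B, C, D) with A -> B and D -> C is split so that every
-- non-key attribute depends on the whole key of its table.
CREATE TABLE R1 (A INT PRIMARY KEY, B VARCHAR(50));
CREATE TABLE R2 (D INT PRIMARY KEY, C VARCHAR(50));
CREATE TABLE R3 (
A INT,
D INT,
PRIMARY KEY (A, D),
FOREIGN KEY (A) REFERENCES R1(A),
FOREIGN KEY (D) REFERENCES R2(D)
);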
a)1NF
b)2NF
c)3NF
d)BCNF
e)(4NF)
f)5NF
a)1NF: A relation is considered to be in first normal form if all of its attributes have domains that are indivisible or atomic.
A table is in 1NF if and only if it satisfies the following five conditions:
There is no top-to-bottom ordering to the rows.
There is no left-to-right ordering to the columns.
There are no duplicate rows.
Every row-and-column intersection contains exactly one value from the applicable domain (and nothing else).
All columns are regular, i.e. rows have no hidden components such as row IDs or object IDs.
b)2NF:
A table is in 2NF if it is in 1NF and no non-prime attribute is partially dependent on any candidate key, i.e. every non-prime attribute is fully functionally dependent on every candidate key. An attribute that is not part of any candidate key is known as a non-prime attribute.
c)3NF:
A table is in 3NF if it is in 2NF and no non-prime attribute is transitively dependent on a candidate key, i.e. there are no attributes X, Y, Z such that
X -> Y
Y does not -> X
Y -> Z
where Z is a non-prime attribute.
In other words, 3NF can be explained like this: a table is in 3NF if it is in 2NF and for each functional dependency X -> Y at least one of the following conditions holds:
X is a super key of the table, or
Y is a prime attribute (each element of Y is part of some candidate key).
An attribute that is a part of one of the candidate keys is known as a prime attribute.
d)BCNF:
A table is in BCNF if for every non-trivial functional dependency X -> Y, X is a super key of the table.
BCNF is more restrictive than 3NF. While decomposing relations to make them BCNF we may lose some dependencies, i.e. BCNF does not guarantee the dependency-preservation property.
e)4NF
A table is in 4NF if it is in BCNF and has no non-trivial multi-valued dependencies, i.e. for every non-trivial multi-valued dependency X ->> Y, X is a super key.
f)5NF
Fifth normal form (5NF), also known as project-join normal form (PJ/NF) is a level
of database normalization designed to reduce redundancy in relational databases
recording multi-valued facts by isolating semantically related multiple relationships.
A table is said to be in the 5NF if and only if every non-trivial join dependency in it is
implied by the candidate keys.
A join dependency *{A, B, ..., Z} on R is implied by the candidate key(s) of R if and only if each of A, B, ..., Z is a superkey for R.
Normalization vs De-Normalization:
Normalization removes data redundancy, i.e. it eliminates any duplicate data from the same table and puts it into a separate new table. De-normalization creates data redundancy, i.e. duplicate data may be found in the same table.
Normalization maintains data integrity, i.e. any addition or deletion of data from a table will not create any mismatch in the relationships between the tables. De-normalization may not retain data integrity.
Normalization increases the number of tables in the database and hence the joins needed to get a result. De-normalization reduces the number of tables and hence the number of joins, so query performance is faster compared to normalized tables.
Even though normalization creates multiple tables, inserts, updates and deletes are more efficient: if we have to insert/update/delete any data, we perform the transaction in that particular table only, so there is no fear of data loss or loss of data integrity. With de-normalization, all the duplicate data is in a single table and care must be taken to insert/delete/update all the related data in that table; failing to do so creates data integrity issues.
Use normalized tables where a large number of insert/update/delete operations are performed and the joins of those tables are not expensive. Use de-normalization where joins are expensive and queries are run frequently on the tables.
Relational Algebra
Below are fundamental operations that are "complete". That is, this set of operations alone
can define any retrieval.
Select
Project
Rename
Union
Set Difference
Cartesian Product
Selection (σ)
Selection is used to select the required tuples of a relation.
For a relation R with an attribute c,
σ(c>3)(R)
will select the tuples of R in which c is greater than 3.
Note: the selection operator only selects the required tuples but does not display them; for displaying data, the projection operator is used.
Projection (π)
Projection is used to project the required column data from a relation.
Union (∪)
The union operation in relational algebra is the same as the union operation in set theory; the only constraint is that for a union of two relations, both relations must have the same set of attributes.
Rename (ρ)
Rename is a unary operation used for renaming attributes of a relation.
ρ(a/b)(R) will rename the attribute b of relation R to a.
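As a rough SQL analogue of these operations (not part of relational algebra itself), assume hypothetical tables R(a, b, c) and S(a, b, c):
-- Selection: σ(c>3)(R)
SELECT * FROM R WHERE c > 3;
-- Projection: π(a, b)(R); DISTINCT mirrors the set semantics
SELECT DISTINCT a, b FROM R;
-- Rename: ρ(x/b)(R), renaming attribute b to x
SELECT a, b AS x, c FROM R;
-- Union: R ∪ S (both relations have the same attributes)
SELECT a, b, c FROM R UNION SELECT a, b, c FROM S;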
Relational Calculus
1. Relational algebra operations manipulate relations and provide expressions in the form of queries, whereas relational calculus queries are formed on the basis of pairs of expressions.
2. RA has operators like join, union, intersection, division, difference, projection, selection, etc., whereas RC has tuple- and domain-oriented expressions.
3. RA is a procedural language, whereas RC is a non-procedural query system.
4. The expressive power of RA and RC is equivalent. This means any query that can be expressed in RA can be expressed by a formula in RC.
5. Any RC formula can be translated into an algebraic query.
6. Modifying queries is easier in RA than in RC.
7. RA has a mathematical form but no specific query language; RC also has a mathematical form and has a query language, QUEL.
8. Relational algebra is easier to manipulate and understand than RC.
9. RA queries are regarded as more powerful than RC queries.
10. RC queries are formed from well-formed formulas (WFFs), whereas RA does not form any formulas.
11. RA is procedural; we have to write the conditions in a specific order.
12. RC is non-procedural; the conditions can be written in any order.
The tuple relational calculus is based on specifying a number of tuple variables. Each such tuple variable normally ranges over a particular database relation. This means that the variable may take any individual tuple from that relation as its value. A simple tuple relational calculus query is of the form { t | COND(t) }, where 't' is a tuple variable and COND(t) is a conditional expression involving 't'. The result of such a query is a relation that contains all the tuples (rows) that satisfy COND(t).
For each tuple variable 't', the range relation 'R' of 't' is specified by a condition of the form R(t).
A condition to select the required tuples from the relation.
A set of attributes to be retrieved. This set is called the requested attributes. The values of
these attributes for each selected combination of tuples. If the requested attribute list is not
specified, then all the attributes of the selected tuples are retrieved.
The domain calculus differs from the tuple calculus in the type of variables used in formulas.
In domain calculus the variables range over single values from domains of attributes rather
than ranging over tuples. To form a relation of degree 'n' for a query result, we must have 'n' of
these domain variables-one for each attribute.
SQL
SQL stands for Structured Query Language.
SQL is used to communicate with a database.
According to ANSI (American National Standards Institute), it is the standard language
for relational database management systems.
SQL statements are used to perform tasks such as update data on a database, or
retrieve data from a database.
Some common relational database management systems that use SQL are: Oracle,
Sybase, Microsoft SQL Server, Access, Ingres, etc.
Some database systems require a semicolon at the end of each SQL statement. The semicolon is the standard way to separate SQL statements in database systems that allow more than one SQL statement to be executed in the same call to the server.
Example data type: TIMESTAMP stores year, month, day, hour, minute, and second values.
Commands :
1)Select
The SELECT statement is used to select data from a database. The result is stored in a result table, called the result-set.
SELECT column_name,column_name
FROM table_name;
OR
SELECT * FROM table_name;
Asterisk(*) means select all columns in the table.
2)Create Table
Used to create tables to store data. Integrity Constraints like primary key, unique key, foreign
key can be defined for the columns while creating the table. The integrity constraints can be
defined at column level or table level.
CREATE TABLE table_name
(
column_name1 data_type(size),
column_name2 data_type(size),
column_name3 data_type(size),
....
);
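A minimal sketch showing column-level and table-level constraints, assuming a hypothetical employee table and an existing department table:
CREATE TABLE employee
(
emp_id INT PRIMARY KEY,       -- column-level constraint
emp_name VARCHAR(50) NOT NULL,
email VARCHAR(100),
dept_id INT,
UNIQUE (email),               -- table-level constraint
FOREIGN KEY (dept_id) REFERENCES department(dept_id)   -- table-level constraint
);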
3)Create DB
Used to create a database.
CREATE DATABASE dbname;
4)Insert
Used to add new rows of data to a table.
INSERT INTO table_name
VALUES (value1,value2,value3,...);
OR
INSERT INTO table_name (column1,column2,column3,...)
VALUES (value1,value2,value3,...);
5)Update
Used to modify the existing rows in a table. In the UPDATE statement, the WHERE clause identifies the rows that get affected. If you do not include the WHERE clause, the column values of all the rows get affected.
UPDATE table_name
SET column1=value1,column2=value2,...
WHERE some_column=some_value;
6)Delete
Used to delete rows from a table. The WHERE clause in the SQL DELETE command is optional and identifies the rows that get deleted. If you do not include the WHERE clause, all the rows in the table are deleted, so be careful while writing a DELETE query without a WHERE clause.
DELETE FROM table_name
WHERE some_column=some_value;
7)Alter
Used to change the characteristics of a database or of a table. After creating a database, we can change its properties by executing the ALTER DATABASE statement; the user should have admin privileges for modifying a database. ALTER TABLE is used to add, modify or drop columns of an existing table:
ALTER TABLE table_name
ADD column_name datatype;
8)Order By
Used to sort the result-set by one or more columns.The ORDER BY keyword sorts the records
in ascending order by default. To sort the records in a descending order, you can use the
DESC keyword.
SELECT column_name, column_name
FROM table_name
ORDER BY column_name ASC|DESC, column_name ASC|DESC;
9)Where
Used to extract only those records that fulfill a specified criterion.
SELECT column_name,column_name
FROM table_name
WHERE column_name operator value;
10)Having Clause
Having clause is used to filter data based on the group functions. This is similar to WHERE
condition but is used with group functions. Group functions cannot be used in WHERE Clause
but can be used in HAVING clause.
If you want to select the departments whose total salary paid to employees is more than 25000, the SQL query would be:
SELECT dept, SUM (salary) FROM employee GROUP BY dept HAVING SUM (salary) >
25000
11)Group By
The SQL GROUP BY Clause is used along with the group functions to retrieve data grouped
according to one or more columns.
For Example: If you want to know the total amount of salary spent on each department,
the query would be:
SELECT dept, SUM (salary) FROM employee GROUP BY dept;
12) Group functions are built-in SQL functions that operate on groups of rows and return
one value for the entire group. These functions are: COUNT, MAX, MIN, AVG, SUM,
DISTINCT
SQL COUNT (): This function returns the number of rows in the table that satisfies the
condition specified in the WHERE condition. If the WHERE condition is not specified, then
the query returns the total number of rows in the table.
SQL MAX(): This function is used to get the maximum value from a column.
SQL MIN(): This function is used to get the minimum value from a column.
SQL AVG(): This function is used to get the average value of a numeric column.
SQL SUM(): This function is used to get the sum of a numeric column.
13)Comparison Operators
There are other comparison keywords available in SQL which are used to enhance the search capabilities of a SQL query. They are "IN", "BETWEEN...AND", "IS NULL" and "LIKE".
BETWEEN...AND: the column value is between two values, including the end values specified in the range.
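A minimal sketch of these comparison keywords, assuming a hypothetical employee table with name, dept and salary columns:
SELECT name FROM employee WHERE salary BETWEEN 20000 AND 40000;
SELECT name FROM employee WHERE dept IN ('HR', 'SALES');
SELECT name FROM employee WHERE name LIKE 'A%';
SELECT name FROM employee WHERE dept IS NULL;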
14)Joins
A JOIN clause is used to combine rows from two or more tables, based on a related column
between them.
(INNER) JOIN: Returns records that have matching values in both tables
SELECT column_name(s)
FROM table1
INNER JOIN table2 ON table1.column_name = table2.column_name;
LEFT (OUTER) JOIN: Return all records from the left table, and the matched records
from the right table
SELECT column_name(s)
FROM table1
LEFT JOIN table2 ON table1.column_name = table2.column_name;
RIGHT (OUTER) JOIN: Return all records from the right table, and the matched
records from the left table.
SELECT column_name(s)
FROM table1
RIGHT JOIN table2 ON table1.column_name = table2.column_name;
FULL (OUTER) JOIN: Return all records when there is a match in either left or right
table
SELECT column_name(s)
FROM table1
FULL OUTER JOIN table2 ON table1.column_name = table2.column_name;
A self JOIN is a regular join, but the table is joined with itself.
SELECT column_name(s)
FROM table1 T1, table1 T2
WHERE condition;
15)AUTO INCREMENT fields are used for automatically generating values for a particular column whenever a new row is inserted. Very often the primary key of a table needs to be created automatically; we define that field as an AUTO INCREMENT field.
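A minimal sketch using MySQL-style AUTO_INCREMENT syntax (other databases use different keywords such as IDENTITY or sequences), with a hypothetical persons table:
CREATE TABLE persons (
person_id INT AUTO_INCREMENT PRIMARY KEY,
last_name VARCHAR(50) NOT NULL
);
INSERT INTO persons (last_name) VALUES ('Sharma');
-- person_id is generated automatically for the new row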
16)SQL Views
A VIEW is a virtual table through which a selective portion of the data from one or more tables can be seen. Views do not contain data of their own. They are used to restrict access to the database or to hide data complexity. A view is stored as a SELECT statement in the database. DML operations on a view such as INSERT, UPDATE and DELETE affect the data in the original table upon which the view is based.
The Syntax to create a sql view is
CREATE VIEW view_name AS SELECT column_list FROM table_name [WHERE
condition];
view_name is the name of the VIEW.
The SELECT statement is used to define the columns and rows that you want to
display in the view.
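A minimal sketch of creating and querying a view, assuming a hypothetical employee table:
CREATE VIEW emp_public AS
SELECT emp_id, emp_name, dept
FROM employee
WHERE dept <> 'HR';
SELECT * FROM emp_public;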
17)SQL Index
An index in SQL is created on existing tables to retrieve rows quickly. When there are thousands of records in a table, retrieving information takes a long time. Therefore indexes are created on columns which are accessed frequently, so that the information can be retrieved quickly. Indexes can be created on a single column or on a group of columns. When an index is created, it first sorts the data and then assigns a ROWID to each row.
CREATE INDEX index_name ON table_name (column_name1,column_name2...);
Transaction
A transaction is a set of changes that must all be made together. It is a program unit whose execution may or may not change the contents of a database. A transaction is executed as a single unit. If the database was in a consistent state before a transaction, then after execution of the transaction the database must also be in a consistent state. For example, a transfer of money from one bank account to another requires two changes to the database; both must succeed or fail together.
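A minimal sketch of such a transfer as one transaction, assuming a hypothetical account table:
UPDATE account SET balance = balance - 1000 WHERE acc_no = 'A101';
UPDATE account SET balance = balance + 1000 WHERE acc_no = 'B202';
COMMIT;     -- both updates become permanent together
-- If either update fails, ROLLBACK; undoes both and the accounts stay consistent.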
A transaction is a logical unit of database processing that includes one or more access
operations:
A transaction (set of operations) may be stand-alone, specified in a high-level language like SQL and submitted interactively, or may be embedded within a program (say, Java, Python or C++). A user's program may carry out many operations on the data retrieved from the database, but the DBMS is only concerned with what data is read from or written to the database.
ACID Properties
The ACID model is one of the oldest and most important concepts of database theory. A transaction may contain several low-level tasks, and a transaction is a very small unit of any program. There is a set of properties that guarantee that database transactions are processed reliably. These properties are called the ACID properties and are the subject of the sections below:
Atomicity
Atomicity states that database modifications must follow an all-or-nothing rule. Though a transaction involves several low-level operations, this property states that a transaction must be treated as an atomic unit, that is, either all of its operations are executed or none. There must be no state in the database where the transaction is left partially completed; the database state should be defined either as it was before the execution of the transaction or as it is after the execution/abortion/failure of the transaction. A transaction must be fully completed and saved (committed) or completely undone (rolled back).
Consistency
The consistency property ensures that the database is in a consistent state before the start of the transaction and remains in a consistent state after the transaction is over (whether or not it was successful). There must not be any possibility that some data is incorrectly affected by the execution of a transaction.
If each transaction is consistent, and the database starts consistent, then the database ends up consistent. If a transaction violates the database's consistency rules, the entire transaction will be rolled back and the database will be restored to a state consistent with those rules.
Durability
Durability refers to the guarantee that once the user has been notified of success, the transaction will persist and not be undone. This property states that in any case all updates made on the database will persist even if the system fails and restarts. If a transaction writes or updates some data in the database and commits, that data will always be there in the database. If the transaction commits but the data is not yet written to disk and the system fails, that data will be updated once the system comes back up.
Once a transaction commits, the system must guarantee that the results of its operations will
never be lost, in spite of subsequent failures.
Isolation
Isolation refers to the requirement that other operations cannot access or see the data in an intermediate state during a transaction. This constraint is required to maintain performance as well as consistency between transactions in a database. Thus, each transaction is unaware of the other transactions executing concurrently in the system.
In other words, in a database system where more than one transaction is being executed simultaneously and in parallel, the property of isolation states that all the transactions will be carried out and executed as if each were the only transaction in the system. No transaction will affect the existence of any other transaction.
States of Transaction
A transaction must be in one of the following states:
Active: the initial state, the transaction stays in this state while it is executing.
Partially committed: after the final statement has been executed.
Failed: when the normal execution can no longer proceed.
Aborted: after the transaction has been rolled back and the database has been restored to
its state prior to the start of the transaction.
Committed: after successful completion.
When multiple transactions are trying to access the same sharable resource, many problems can arise if access control is not done properly. There are some important mechanisms by which access control can be maintained. Earlier we talked about theoretical concepts like serializability; in practice this is implemented using Locks and Timestamps.
Locks can be classified into two types.
Shared Lock: A transaction may acquire shared lock on a data item in order to read its content. The
lock is shared in the sense that any other transaction can acquire the shared lock on that same data
item for reading purpose.
Exclusive Lock: A transaction may acquire an exclusive lock on a data item in order to both read and write it. The lock is exclusive in the sense that no other transaction can acquire any kind of lock (either shared or exclusive) on that same data item.
The relationship between Shared and Exclusive locks can be represented by the following table, known as the Lock Matrix.
            Shared       Exclusive
Shared      Allowed      Not allowed
Exclusive   Not allowed  Not allowed
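A minimal sketch using Oracle-style LOCK TABLE syntax (the exact syntax and lock modes vary by DBMS), assuming a hypothetical account table:
LOCK TABLE account IN SHARE MODE;      -- shared lock: other transactions may still read or share-lock the table
LOCK TABLE account IN EXCLUSIVE MODE;  -- exclusive lock: no other transaction can lock the table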
Timestamp-based Protocols
The most commonly used concurrency protocol is the timestamp based protocol. This protocol uses
either system time or logical counter as a timestamp.
Lock-based protocols manage the order between the conflicting pairs among transactions at the time
of execution, whereas timestamp-based protocols start working as soon as a transaction is created.
Every transaction has a timestamp associated with it, and the ordering is determined by the age of the
transaction. A transaction created at 0002 clock time would be older than all other transactions that
come after it. For example, any transaction 'y' entering the system at 0004 is two seconds younger and
the priority would be given to the older one.
In addition, every data item is given the latest read and write-timestamp. This lets the system know
when the last read and write operation was performed on the data item.
Deadlock
A deadlock is a condition wherein two or more tasks are waiting for each other in order to finish, but none of the tasks is willing to give up the resources that the other task needs. In this situation no task ever gets finished; they remain in the waiting state forever.
Deadlock Prevention
The DBMS verifies each transaction and sees if there could be a deadlock situation upon execution of the transaction. If it finds that everything is fine, it allows the transaction to execute. If it finds that a deadlock could occur, it never allows the transaction to execute. The DBMS basically checks the timestamp at which a transaction was initiated and orders the transactions based on it. If there are transactions in the same time period requesting each other's resources, it stops those transactions before executing them. In that case, the DBMS will never allow the transactions to execute simultaneously. This method is suitable for large systems.
Wait-Die Scheme
In this scheme, if a transaction requests to lock a resource (data item) which is already held with a conflicting lock by another transaction, then one of two possibilities may occur:
If TS(Ti) < TS(Tj), that is, Ti (which is requesting the conflicting lock) is older than Tj, then Ti is allowed to wait until the data item is available.
If TS(Ti) > TS(Tj), that is, Ti is younger than Tj, then Ti dies. Ti is restarted later with a random delay but with the same timestamp.
This scheme allows the older transaction to wait but kills the younger one.
Wound-Wait Scheme
In this scheme, if a transaction requests to lock a resource (data item) which is already held with a conflicting lock by another transaction, one of two possibilities may occur:
If TS(Ti) < TS(Tj), then Ti forces Tj to be rolled back, that is, Ti wounds Tj. Tj is restarted later with a random delay but with the same timestamp.
If TS(Ti) > TS(Tj), then Ti is forced to wait until the resource is available.
This scheme, allows the younger transaction to wait; but when an older transaction requests an item
held by a younger one, the older transaction forces the younger one to abort and release the item.
In both the cases, the transaction that enters the system at a later stage is aborted.
Deadlock Avoidance:
It is always better to avoid deadlock in a system rather than aborting or restarting transactions, which wastes time and resources. The wait-for graph is one of the methods for detecting a deadlock situation, but this method is suitable for smaller databases; for large databases, the deadlock prevention method may help.
Wait-for Graph
This is a simple method available to track if any deadlock situation may arise. For each transaction
entering into the system, a node is created. When a transaction Ti requests for a lock on an item, say X,
which is held by some other transaction Tj, a directed edge is created from Ti to Tj. If Tj releases item X,
the edge between them is dropped and Ti locks the data item.
The system maintains this wait-for graph for every transaction waiting for some data items held by
others. The system keeps checking if there's any cycle in the graph.
Storage media are classified by speed of access, cost per unit of data to buy the media, and by the medium's reliability. Unfortunately, as speed and cost go up, the reliability goes down.
1. Cache is the fastest and the most costly form of storage. The type of cache referred to here is the type that is typically built into the CPU chip and is 256KB, 512KB, or 1MB. Cache is managed by the hardware and operating system and has no direct application to the database, per se.
2. Main memory is the volatile memory in the computer system that is used to hold
programs and data. While prices have been dropping at a staggering rate, the demand for memory has been increasing even faster. Today's 32-bit computers have a limitation of 4GB of memory. This may not be sufficient to hold the entire database and all the associated programs, but the more memory available, the better the response time of the DBMS. There are attempts underway to create a system with the most memory that is cost effective, and to reduce the functionality of
system with the most memory that is cost effective, and to reduce the functionality of
the operating system so that only the DBMS is supported, so that system response can
be increased. However, the contents of main memory are lost if a power failure or
system crash occurs.
3. Flash memory is also referred to as electrically erasable programmable read-only
memory (EEPROM). Since it is small (5 to 10MB) and expensive, it has little or no
application to the DBMS.
4. Magnetic-disk storage is the primary medium for long-term on-line storage today.
Prices have been dropping significantly with a corresponding increase in capacity. New
disks today are in excess of 20GB. Unfortunately, the demands have been increasing
and the volume of data has been increasing faster. The organizations using a DBMS
are always trying to keep up with the demand for storage. This media is the most cost-
effective for on-line storage for large databases.
5. Optical storage is very popular, especially CD-ROM systems. This is limited to data
that is read-only. It can be reproduced at a very low-cost and it is expected to grow in
popularity, especially for replacing written manuals.
6. Tape storage is used for backup and archival data. It is cheaper and slower than all of
the other forms, but it does have the feature that there is no limit on the amount of data
that can be stored, since more tapes can be purchased. As the tapes get increased
capacity, however, restoration of data takes longer and longer, especially when only a
small amount of data is to be restored. This is because the retrieval is sequential, the
slowest possible method.
Magnetic Disks
Disks are actually relatively simple. There is normally a collection of platters on a spindle. Each
platter is coated with a magnetic material on both sides and the data is stored on the surfaces.
There is a read-write head for each surface that is on an arm assembly that moves back and
forth. A motor spins the platters at a high constant speed (60, 90, or 120 revolutions per second).
The surface is divided into a set of tracks (circles). These tracks are divided into a set of
sectors; a sector is the smallest unit of data that can be written or read at one time. Sectors can range in size from 32 bytes to 4096 bytes, with 512 bytes being the most common. A collection of a specific track from both surfaces and from all of the platters is called a cylinder.
Platters can range in size from 1.8 inches to 14 inches. Today, 5 1/4 inch and 3 1/2 inch platters are the most common, because they combine good seek times with low cost.
A disk controller interfaces the computer system and the actual hardware of the disk drive. The
controller accepts high-level commands to read or write sectors. The controller then converts the commands into the necessary specific low-level commands. The controller will also attempt
to protect the integrity of the data by computing and using checksums for each sector. When
attempting to read the data back, the controller recalculates the checksum and makes several
attempts to correctly read the data and get matching checksums. If the controller is
unsuccessful, it will notify the operating system of the failure.
The controller can also handle the problem of eliminating bad sectors. Should a sector go bad,
the controller logically remaps the sector to one of the extra unused sectors that disk vendors
provide, so that the reliability of the disk system is higher. It is cheaper to produce disks with a
greater amount of sectors than advertised and then map out bad sectors than it is to produce
disks with no bad sectors or with extremely limited possibility of sectors going bad.
There are many different types of disk controllers, but the most common ones today are SCSI,
IDE, and EIDE.
One other characteristic of disks that affects performance is the distance from the read-write head to the surface of the platter. The smaller this gap, the more densely data can be written on the disk, so that the tracks can be closer together and the disk has a greater capacity. The distance is often measured in microns. However, a smaller gap means that the possibility of the head touching the surface is increased. When the head touches the surface while the surface is spinning at high speed, the result is called a "head crash", which scratches the surface and damages the head. The bottom line is that someone must replace the disk.
1. Seek time is the time to reposition the head and increases with the distance that the
head must move. Seek times can range from 2 to 30 milliseconds. Average seek
time is the average of all seek times and is normally one-third of the worst-case seek
time.
2. Rotational latency time is the time from when the head is over the correct track until the data rotates around and is under the head and can be read. When the disk spins at 120 rotations per second, the rotation time is 8.33 milliseconds. Normally, the average rotational latency time is one-half of the rotation time.
3. Access time is the time from when a read or write request is issued to when the data
transfer begins. It is the sum of the seek time and latency time.
4. Data-transfer rate is the rate at which data can be retrieved from the disk and sent to
the controller. This will be measured as megabytes per second.
5. Mean time to failure is the number of hours (on average) until a disk fails. Typical
times today range from 30,000 to 800,000 hours (or 3.4 to 91 years).
Requests for disk I/O are generated by both the file system and by the virtual memory manager found in most systems. Each request specifies the address on the disk to be referenced; that address is in the form of a block number. Each block is a contiguous sequence of sectors from a single track of one platter and ranges from 512 bytes to several kilobytes of data. The lower-level file manager must convert block addresses into the hardware-level cylinder, surface, and sector numbers.
Since access to data on disk is several orders of magnitude slower than access to data in main memory, much attention has been paid to improving the speed of access to blocks on the disk. This is also where more main memory can speed up the response time, by making sure that the data needed is in memory when it is needed.
This is the same problem that is addressed in designing operating systems, to insure the best
response time from the file system manager and the virtual memory manager.
Scheduling. Disk-arm scheduling algorithms attempt to order accesses so as to increase the number of accesses that can be processed in a given amount of time. These might include First-Come/First-Served, Shortest Seek First, and the elevator algorithm.
File organization. To reduce block-access time, data could be arranged on the disk in
the same order that it is expected to be retrieved. (This would be storing the data on
the disk in order based on the primary key.) At best, this starts to produce less and less
of a benefit, as there are more inserts and deletes. Also we have little control of where
on the disk things get stored. The more the data gets fragmented on the disk, the more
time it takes to locate it.
Nonvolatile write buffer. Non-volatile memory (flash memory) can be used to protect the data in memory from crashes, but it does increase the cost. It is possible that the use of a UPS would be more effective and cheaper.
Log disk. You can use a disk for writing a sequential log.
Buffering. The more information you have in buffers in main memory, the more likely you are to avoid fetching it from the disk. However, it is also more likely that memory will be wasted on information that turns out not to be needed.
RAID
RAIDs are Redundant Arrays of Inexpensive Disks. There are seven levels (0 through 6) of organizing these disks:
0 -- Non-redundant Striping
1 -- Mirrored Disks
2 -- Memory Style Error Correcting Codes
3 -- Bit Interleaved Parity
4 -- Block Interleaved Parity
5 -- Block Interleaved Distributed Parity
6 -- P + Q Redundancy
Tertiary Storage
Tertiary storage refers to removable media such as optical disks and magnetic tapes; it is slower and cheaper than disk and is used mainly for backup and archival data.
Storage Access
A database is mapped into a number of different files, which are maintained by the underlying operating system. Files are organized into blocks, and a block may contain one or more data items.
A major goal of the DBMS is to minimize the number of block transfers between the disk and memory. Since it is not possible to keep all blocks in main memory, we need to manage the allocation of the space available for the storage of blocks. This is similar to the problems encountered by the operating system, and can conflict with the operating system, since the OS is concerned with all processes while the DBMS is concerned with only one family of processes.
Buffer Manager
Programs in a DBMS make requests (that is, calls) on the buffer manager when they need a block from disk. If the block is already in the buffer, the requester is passed the address of the block in main memory. If the block is not in the buffer, the buffer manager first allocates space in the buffer for the block, throwing out some other block, if required, to make space for the new block. If the block that is to be thrown out has been modified, it must first be written back to the disk. The internal actions of the buffer manager are transparent to the programs that issue disk-block requests.
Replacement strategy. When there is no room left in the buffer, a block must be removed from the buffer before a new one can be read in. Typically, operating systems use a least recently used (LRU) scheme. There is also a most recently used (MRU) scheme that can be more appropriate for a DBMS.
Pinned blocks. A block that is not allowed to be written back to disk is said to be
pinned. This could be used to store data that has not been committed yet.
Forced output of blocks. There are situations in which it is necessary to write a block back to the disk even though the buffer space it occupies is not currently needed. This might be done during system lulls, so that when activity picks up, writes of modified blocks can be avoided in peak periods.
File Organization
Fixed-Length Records
Consider a record in which each character occupies 1 byte and a real occupies 8 bytes, so that the record occupies 40 bytes. If the first record occupies the first 40 bytes, the second record occupies the next 40 bytes, and so on, we have some problems.
It is difficult to delete a record, because there is no way to indicate that the record is deleted. (At least one system automatically adds one byte to each record as a flag to show if the record is deleted.) Unless the block size happens to be a multiple of 40 (which is extremely unlikely), some records will cross block boundaries, and it would require two block accesses to read or write such a record.
One solution might be to compress the file after each deletion. This will incur a major amount
of overhead processing, especially on larger files. Additionally, there is the same problem on
inserts!
Another solution would be to have two sets of pointers: one that links the current record to the next logical record (a linked list), plus a free list (a list of free slots). This increases the size of the file.
Variable-Length Records
The problems above can be addressed by the way records are organized in files:
Heap file organization: any record can be placed anywhere in the file where there is space. There is no ordering of records, and there is a single file for each relation.
Hashing file organization: a hash function is computed on some attribute of each record. The function specifies in which block the record should be placed.
Clustering file organization: several different relations can be stored in the same file. Related records of the different relations can be stored in the same block.
A RDBMS needs to maintain data about the relations, such as the schema. This is stored in a
data dictionary (sometimes called a system catalog):
Names of the relations
Names of the attributes of each relation
Domains and lengths of attributes
Names of views, defined on the database, and definitions of those views
Integrity constraints
Names of authorized users
Accounting information about users
Number of tuples in each relation
Method of storage for each relation (clustered/non-clustered)
Name of the index
Name of the relation being indexed
Attributes on which the index is defined
Type of index formed
Indexing
The main goal of database design is faster access to any data in the database and quicker insert/delete/update of any data. This is because no one likes waiting. When a database is very large, even the smallest transaction takes time to perform. In order to reduce the time spent in transactions, indexes are used. Indexes are similar to book catalogues in a library, or the index in a book: they make searching simpler and quicker. The same concept is applied in a DBMS to access the files from memory.
When records are stored in primary memory such as RAM, accessing them is very easy and quick. But the records in a database are too numerous and too large to keep them all in RAM, so they have to be stored in secondary memory such as a hard disk. As we have seen already, records are not stored in memory the way we see them as tables; they are stored in the form of files in different data blocks. Each block is capable of storing one or more records depending on its size.
When we have to retrieve any required data or perform some transaction on those data, we
have to pull them from memory, perform the transaction and save them back to the memory. In
order to do all these activities, we need to have a link between the records and the data blocks
so that we can know where these records are stored. This link between the records and the
data block is called index. It acts like a bridge between the records and the data block.
Indexing is defined based on its indexing attributes. Indexing can be of the following
types
Primary Index: A primary index is defined on an ordered data file. The data file is ordered on a key field. The key field is generally the primary key of the relation.
Dense Index
Sparse Index
Dense Index
In this case, an index entry is created for the primary key as well as for the other columns on which we perform transactions. That means the user can fire a query based not only on the primary key column but on any column in the table, according to his requirement. An index only on the primary key will not help in that case, hence indexes on all the search-key columns are stored. This method is called a dense index.
Sparse Index
In order to address the issues of dense indexing, sparse indexing is introduced. In this method of indexing, a range of index columns stores the same data block address, and when data is to be retrieved, the block address is fetched and the block is searched linearly until we reach the requested data.
Multilevel Index
Index records comprise search-key values and data pointers. The multilevel index is stored on the disk along with the actual database files. As the size of the database grows, so does the size of the indices. There is an immense need to keep the index records in main memory so as to speed up search operations. If a single-level index is used, then a large index cannot be kept in memory, which leads to multiple disk accesses; a multilevel index therefore builds an index on the index, so that the outermost level is small enough to be kept in main memory.
B+ Tree
A B-tree is a method of placing and locating files (called records or keys) in a database. (The
meaning of the letter B has not been explicitly defined.) The B-tree algorithm minimizes the
number of times a medium must be accessed to locate a desired record, thereby speeding up
the process.
B-trees are preferred when decision points, called nodes, are on hard disk rather than in
random-access memory (RAM). It takes thousands of times longer to access a data element
from hard disk as compared with accessing it from RAM, because a disk drive has mechanical
parts, which read and write data far more slowly than purely electronic media. B-trees save
time by using nodes with many branches (called children), compared with binary trees, in
which each node has only two children. When there are many children per node, a record can
be found by passing through fewer nodes than if there are two children per node.
In a tree, records are stored in locations called leaves. This name derives from the fact that
records always exist at end points; there is nothing beyond them. The maximum number of
children per node is the order of the tree. The number of required disk accesses is the depth.
As an example, a binary tree for locating a particular record in a set of eight leaves has a depth of four, while a B-tree of order three for locating a record in the same set of eight leaves (with the ninth leaf unoccupied, called a null) has a depth of three. Clearly, the B-tree allows a desired record to be located faster, assuming all other system parameters are identical. The tradeoff is that the decision process at each node is more complicated in a B-tree as compared with a binary tree.
A sophisticated program is required to execute the operations in a B-tree. But this program is
stored in RAM, so it runs fast.
In a practical B-tree, there can be thousands, millions, or billions of records. Not all leaves
necessarily contain a record, but at least half of them do. The difference in depth between
binary-tree and B-tree schemes is greater in a practical database than in the example
illustrated here, because real-world B-trees are of higher order (32, 64, 128, or more).
Depending on the number of records in the database, the depth of a B-tree can and often does
change. Adding a large enough number of records will increase the depth; deleting a large
enough number of records will decrease the depth. This ensures that the B-tree functions
optimally for the number of records it contains.
Hashing
Hash File organization method is the one where data is stored at the data blocks whose
address is generated by using hash function. The memory location where these records are
stored is called as data block or data bucket. This data bucket is capable of storing one or
more records.
The hash function can use any column value to generate the address. Most of the time, the hash function uses the primary key to generate the hash index, i.e. the address of the data block. The hash function can be anything from a simple mathematical function to a complex mathematical function. We can even consider the primary key itself as the address of the data block; in that case each row is stored at the data block whose address is the same as its primary key.
Hash Organization
Bucket: A hash file stores data in bucket format. A bucket is considered a unit of storage. A bucket typically holds one complete disk block, which in turn can store one or more records.
Hash Function: A hash function, h, is a mapping function that maps the set of all search-keys K to the addresses where the actual records are placed. It is a function from search keys to bucket addresses.
Consider a set of people's names stored in a database; each name would be the key in the database for that person's data. A database search mechanism would first have to start looking character-by-character across the name for matches until it found the match (or ruled the other entries out). But if each of the names were hashed, it might be possible (depending on the number of names in the database) to generate a unique four-digit key for each name. For example:
7864 Abernathy, Sara
9802 Epperdingle, Roscoe
1990 Moore, Wilfred
8822 Smith, David
(and so forth)
A search for any name would first consist of computing the hash value (using the same hash
function used to store the item) and then comparing for a match using that value. It would, in
general, be much faster to find a match across four digits, each having only 10 possibilities,
than across an unpredictable value length where each character had 26 possibilities.
There are two types of hash file organizations Static and Dynamic Hashing.
Static Hashing
In static hashing, when a search-key value is provided, the hash function always computes the same address. For example, if a mod-4 hash function is used, it generates only 4 values. The output address is always the same for a given key, and the number of buckets provided remains unchanged at all times.
Bucket Overflow
The condition of bucket-overflow is known as collision. This is a fatal state for any static hash
function. In this case, overflow chaining can be used.
Overflow Chaining When buckets are full, a new bucket is allocated for the same
hash result and is linked after the previous one. This mechanism is called Closed
Hashing.
Linear Probing When a hash function generates an address at which data is already
stored, the next free bucket is allocated to it. This mechanism is called Open Hashing.
Dynamic Hashing
The problem with static hashing is that it does not expand or shrink dynamically as the size of
the database grows or shrinks. Dynamic hashing provides a mechanism in which data
buckets are added and removed dynamically and on-demand. Dynamic hashing is also
known as extended hashing.
Hash function, in dynamic hashing, is made to produce a large number of values and only a
few are used initially.
The prefix of the entire hash value is taken as a hash index. Only a portion of the hash value is used for computing bucket addresses. Every hash index has a depth value to signify how many bits are used for computing the hash function. These bits can address 2^n buckets. When all these bits are consumed, that is, when all the buckets are full, the depth value is increased linearly and twice as many buckets are allocated.
Hashing is not favorable when the data is organized in some ordering and the queries require
a range of data. When data is discrete and random, hash performs the best.
Hashing algorithms are more complex to implement than indexing, but all hash operations are done in constant time.
Data Backup:
In a computer system we have primary and secondary memory storage. Primary memory
storage devices - RAM is a volatile memory which stores disk buffer, active logs, and other
related data of a database. It stores all the recent transactions and the results too. When a
query is fired, the database first fetches in the primary memory for the data, if it does not exist
there, then it moves to the secondary memory to fetch the record. Fetching the record from
primary memory is always faster than secondary memory. What happens if the primary
memory crashes? All the data in the primary memory is lost and we cannot recover the
database.
In such cases, we can follow any one of the following steps so that the data in primary memory is not lost.
We can keep a copy of the contents of primary memory, with all the logs and buffers, and copy it periodically into the database. In case of a failure we will then not lose all the data; we can recover the data up to the point at which it was last copied to the database.
We can have checkpoints created at several places so that the data is copied to the database.
Suppose the secondary memory itself crashes. What happens to the data stored in it? All the data is lost and cannot be recovered. We have to think of an alternative solution because we cannot afford to lose data in a huge database.
There are three methods used to back up the data in the secondary memory, so that it can be
recovered if there is any failure.
Remote Backup: A copy of the database is created and stored on a remote network. This database is periodically updated from the current database so that it stays in sync. The remote database can be updated manually, which is called offline backup, or it can be backed up online, where the data is updated in the current and remote databases simultaneously. In the online case, as soon as there is a failure of the current database, the system automatically switches to the remote database and starts functioning. The user will not even know that there was a failure.
In the second method, database is copied to memory devices like magnetic tapes and
kept at secured place. If there is any failure, the data would be copied from these
tapes to bring the database up.
As the database grows, it becomes an overhead to back up the whole database. Hence only the log files are backed up at regular intervals. These log files have all the information about the transactions being made, so by replaying these log files the database can be recovered. In this method, log files are backed up at regular intervals and the full database is backed up once a week.
There are two types of data backup: physical data backup and logical data backup. The physical data backup includes physical files such as data files, log files, control files, and redo/undo logs. They are the foundation of the recovery mechanism in the database, as they provide the minute details about the transactions and modifications to the database.
Logical backup includes backup of logical data like tables, views, procedures, functions etc.
Logical data backup alone is not sufficient to recover the database as they provide only the
structural information. The physical data back actually provides the minute details about the
database and is very much important for recovery.
Data Recovery:
Data recovery is the process of restoring data that has been lost, accidentally
deleted, corrupted or made inaccessible. In enterprise IT, data recovery typically refers to the
restoration of data to a desktop, laptop, server or external storage system from a backup.
Failure Classification
To see where the problem has occurred, we generalize a failure into various categories, as
follows:
Transaction failure
A transaction has to abort when it fails to execute or when it reaches a point from where it
can't go any further. This is called transaction failure, where only a few transactions or
processes are affected.
Logical errors: Where a transaction cannot complete because it has some code
error or any internal error condition.
System errors: Where the database system itself terminates an active transaction
because the DBMS is not able to execute it, or it has to stop because of some system
condition. For example, in case of deadlock or resource unavailability, the system
aborts an active transaction.
System Crash
There are problems external to the system that may cause the system to stop abruptly
and cause the system to crash. For example, interruptions in power supply may cause the
failure of underlying hardware or software failure.
Disk Failure
In early days of technology evolution, it was a common problem where hard-disk drives or
storage drives used to fail frequently.
Disk failures include formation of bad sectors, unreachability of the disk, disk head crash or
any other failure which destroys all or a part of disk storage.
A transaction may be in the middle of some operation; the DBMS must ensure the
atomicity of the transaction in this case.
It should check whether the transaction can be completed now or it needs to be rolled
back.
There are two types of techniques which can help a DBMS in recovering as well as
maintaining the atomicity of a transaction:
Maintaining the logs of each transaction, and writing them onto some stable storage
before actually modifying the database.
Maintaining shadow paging, where the changes are done on a volatile memory, and
later, the actual database is updated.
Log-based Recovery
Log is a sequence of records, which maintains the records of actions performed by a
transaction. It is important that the logs are written prior to the actual modification and stored
on a stable storage media, which is failsafe.
When a transaction enters the system and starts execution, it writes a log about it.
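As an illustration (using a common textbook convention, not a format defined in these notes), each update record in the log can be written as <transaction, data item, old value, new value>. A transaction Tn that changes X from 500 to 450 would then produce records such as:
<Tn, Start>
<Tn, X, 500, 450>
<Tn, Commit>
If a failure occurs before <Tn, Commit> reaches stable storage, the old value 500 is restored (undo); if it occurs after, the new value 450 is re-applied (redo).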
XML:
XML is a markup language which is mainly used to represent structured data. Structured
data is data that carries a tag/label along with it to indicate what the data is. It
is like data whose tag plays the role of a column name in an RDBMS. Hence the same is used
to document the data in a DDB. One may wonder why we need XML rather than simply
documenting the data with simple tags, as in a plain contact-detail example. XML provides
lots of features to handle the structured data within the document.
XML is a markup language which serves structured data over the internet, and this data
can be viewed by the user easily as well as quickly.
It supports lots of different types of applications.
XML does not have optional features that would increase its complexity.
Hence XML is a simple language which any user can use with minimal knowledge.
XML documents can be created very quickly. They do not need the thorough analysis,
design and development phases required in an RDBMS. In addition, one can
create and view XML in Notepad too.
All these features of XML make it unique and ideal to represent DDB.
XML-enabled
Distributed Database:
A distributed database is a database in which portions of the database are stored in multiple physical
locations and processing is distributed among multiple database nodes.
A centralized distributed database management system (DDBMS) integrates the data logically so it can
be managed as if it were all stored in the same location. The DDBMS synchronizes all the data
periodically and ensures that updates and deletes performed on the data at one location will be
automatically reflected in the data stored elsewhere.
Distribution: It states the physical distribution of data across the different sites.
Autonomy: It indicates the distribution of control of the database system and the
degree to which each constituent DBMS can operate independently.
Architectural Models
Some of the common architectural models are
Client - Server Architecture for DDBMS
Peer - to - Peer Architecture for DDBMS
Multi - DBMS Architecture
Client - Server Architecture for DDBMS
This is a two-level architecture where the functionality is divided into servers and clients. The
server functions primarily encompass data management, query processing, optimization and
transaction management. Client functions mainly include the user interface. However, clients
also have some functions like consistency checking and transaction management.
Design Alternatives
The distribution design alternatives for the tables in a DDBMS are as follows
Fully Replicated
In this design alternative, one copy of all the database tables is stored at each site. Since
each site has its own copy of the entire database, queries are very fast, requiring negligible
communication cost. On the contrary, the massive redundancy in data requires huge cost
during update operations. Hence, this is suitable for systems where a large number of queries
are required to be handled whereas the number of database updates is low.
Partially Replicated
Copies of tables or portions of tables are stored at different sites. The distribution of the tables
is done in accordance with the frequency of access. This takes into consideration the fact that
the frequency of accessing the tables varies considerably from site to site. The number of
copies of the tables (or portions) depends on how frequently the access queries execute and
the sites which generate the access queries.
Fragmented
In this design, a table is divided into two or more pieces referred to as fragments or partitions,
and each fragment can be stored at different sites. This considers the fact that it seldom
happens that all data stored in a table is required at a given site. Moreover, fragmentation
increases parallelism and provides better disaster recovery. Here, there is only one copy of
each fragment in the system, i.e. no redundant data.
Vertical fragmentation
Horizontal fragmentation
Hybrid fragmentation
Mixed Distribution
This is a combination of fragmentation and partial replications. Here, the tables are initially
fragmented in any form (horizontal or vertical), and then these fragments are partially
replicated across the different sites according to the frequency of accessing the fragments.
Shadow Paging:
This is the method where all the transactions are executed on the primary memory or the
shadow copy of the database. Once all the transactions are completely executed, the changes
are updated to the database. Hence, if there is any failure in the middle of a transaction, it will
not be reflected in the database; the database is updated only after all the transactions are
complete. A database pointer always points to the consistent copy of the database, and a copy
of the database is used by transactions for their updates. Once all the transactions are
complete, the DB pointer is modified to point to the new copy of the DB, and the old copy is
deleted. If there is any failure during the transaction, the pointer will still be pointing to the old
copy of the database, and the shadow copy will be deleted. If the transactions are complete,
then the pointer is changed to point to the shadow DB, and the old DB is deleted.
ORACLE
An Oracle database is a collection of data treated as a unit. The purpose of a database is
to store and retrieve related information. A database server is the key to solving the
problems of information management.
A primary key can be set on up to 16 columns of a table in Oracle 9i as well as in Oracle
10g.
The maximum number of data files in Oracle 9i and Oracle 10g Database is 65,536.
The physical database structures are the files that store the data. When you execute the
SQL command CREATE DATABASE, the following files are created:
Data files
Every Oracle database has one or more physical data files, which contain all the
database data. The data of logical database structures, such as tables and indexes,
is physically stored in the data files.
Control files
Every Oracle database has a control file. A control file contains metadata specifying
the physical structure of the database, including the database name and the names
and locations of the database files.
Online redo log files
Every Oracle Database has an online redo log, which is a set of two or
more online redo log files. An online redo log is made up of redo entries (also
called redo records), which record all changes made to data.
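A quick way to see these physical structures is to query the dynamic performance views; a minimal sketch (the view names are standard, but the file names returned will differ for every installation):
SELECT name   FROM v$datafile;    -- data files
SELECT name   FROM v$controlfile; -- control files
SELECT member FROM v$logfile;     -- online redo log files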
LOGICAL STORAGE STRUCTURES
This section discusses logical storage structures. The following logical storage structures
enable Oracle Database to have fine-grained control of disk space use:
Data blocks
At the finest level of granularity, Oracle Database data is stored in data blocks.
One data block corresponds to a specific number of bytes on disk.
Extents
Segments
A segment is a set of extents allocated for a user object (for example, a table or
index), undo data, or temporary data.
Tablespaces
Redo: In the Oracle RDBMS environment, redo logs comprise files in a proprietary format
which log a history of all changes made to the database. Each redo log file consists of
redo records. A redo record, also called a redo entry, holds a group of change vectors,
each of which describes or represents a change made to a single block in the database.
For example, if a user UPDATEs a salary-value in a table containing employee-related data,
the DBMS generates a redo record containing change-vectors that describe changes to
the data segment block for the table. And if the user then COMMITs the update, Oracle
generates another redo record and assigns the change a "system change number" (SCN).
LGWR writes to redo log files in a circular fashion. When the current redo log file fills,
LGWR begins writing to the next available redo log file. When the last available redo log
file is filled, LGWR returns to the first redo log file and writes to it, starting the cycle
again.
Oracle Database uses only one redo log file at a time to store redo records written from
the redo log buffer. The redo log file that LGWR is actively writing to is called
the current redo log file. Redo log files that are required for instance recovery are
called active redo log files. Redo log files that are no longer required for instance recovery
are called inactive redo log files.
A log switch is the point at which the database stops writing to one redo log file and begins
writing to another. Normally, a log switch occurs when the current redo log file is
completely filled and writing must continue to the next redo log file. However, you can
configure log switches to occur at regular intervals, regardless of whether the current redo
log file is completely filled. You can also force log switches manually.
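A manual log switch is a single statement; a minimal sketch (requires the ALTER SYSTEM privilege):
ALTER SYSTEM SWITCH LOGFILE;
After the switch, LGWR starts writing to the next available redo log file, and the previous file becomes active or inactive depending on whether it is still needed for instance recovery.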
Oracle Database assigns each redo log file a new log sequence number every time a log
switch occurs and LGWR begins writing to it. When the database archives redo log files,
the archived log retains its log sequence number. A redo log file that is cycled back for use
is given the next available log sequence number.
UNDO: Oracle Database creates and manages information that is used to roll back, or
undo, changes to the database. Such information consists of records of the actions of
transactions, primarily before they are committed. These records are collectively referred
to as undo.
Undo records are used to:
Roll back transactions when a ROLLBACK statement is issued
Recover the database
Provide read consistency
Analyze data as of an earlier point in time by using Oracle Flashback Query
Recover from logical corruptions using Oracle Flashback features.
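As an illustration of the last two uses, a Flashback Query reads data as of an earlier point in time from undo; a minimal sketch, assuming an employees table exists and enough undo is retained:
SELECT * FROM employees
  AS OF TIMESTAMP (SYSTIMESTAMP - INTERVAL '15' MINUTE)
  WHERE employee_id = 100;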
Snapshots can also contain a WHERE clause so that snapshot sites can contain
customized data sets. Such snapshots can be helpful for regional offices or sales forces
that do not require the complete corporate data set. When a snapshot is refreshed, Oracle
must examine all of the changes to the master table to see if any apply to the snapshot.
Therefore, if any changes were made to the master table since the last refresh, a
snapshot refresh will take some time, even if the refresh does not apply any changes to
the snapshot. If, however, no changes at all were made to the master table since the last
refresh of a snapshot, the snapshot refresh should be very quick.
Snapshot and materialized view are almost the same, but with one difference.
You can say that materialized view = snapshot + query rewrite functionality. Query rewrite
functionality: in a materialized view you can enable or disable the query rewrite option, which
means the database server will rewrite a query so as to give higher performance. Query
rewrite is based on rewrite rules defined by Oracle itself, so the database server will
follow these rules and rewrite queries to use the materialized view, but this
functionality is not there in snapshots.
Simple snapshots are the only type that can use the FAST REFRESH method. A snapshot
is considered simple if the defining query meets the following criteria:
Oracle8 extends the universe of simple snapshots with a feature known as subquery
subsetting, described in the later section entitled Subquery Subsetting.
Not surprisingly, any snapshot that is not a simple snapshot is a complex snapshot.
Complex snapshots can only use COMPLETE refreshes, which are not always practical.
For tables of more than about 100,000 rows, COMPLETE refreshes can be quite unwieldy.
You can often avoid this situation by creating simple snapshots of individual tables at the
master site and performing the offending query against the local snapshots.
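A minimal sketch of a simple snapshot that can use the FAST refresh method, assuming an illustrative master table emp and database link master_db:
-- at the master site, record row changes for fast refresh:
CREATE MATERIALIZED VIEW LOG ON emp;
-- at the snapshot site, a simple single-table snapshot:
CREATE MATERIALIZED VIEW emp_snap
  REFRESH FAST ON DEMAND
  AS SELECT empno, ename, sal FROM emp@master_db;
Because the defining query references a single table with no joins, aggregates or set operations, only the rows changed since the last refresh need to be applied; a WHERE clause can also be added, subject to the simple-snapshot restrictions described above.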
1. System Global Area (SGA):- This is a large, shared memory segment that virtually
all Oracle processes will access at one point or another.
2. Process Global Area (PGA): This is memory that is private to a single process
or thread; it is not accessible from other processes/threads.
3. User Global Area (UGA): This is memory associated with your session. It is
located either in the SGA or the PGA, depending on whether you are connected to
the database using a shared server (it will be in the SGA) or a dedicated server (it
will be in the PGA).
1)SGA:
There are several memory structures that make up the System Global Area (SGA). The SGA
will store many internal data structures that all processes need access to, cache data from
disk, cache redo data before writing to disk, hold parsed SQL plans and so on. The SGA is
used to store database information that is shared by database processes. It contains data
and control information for the Oracle server and is allocated in the virtual memory of the
computer where Oracle resides.
1.Redo Buffer: The redo buffer is where data that needs to be written to the online redo
logs will be cached temporarily, before it is written to disk. Since a memory-to-memory
transfer is much faster than a memory-to-disk transfer, use of the redo log buffer can
speed up database operation. The data will not reside in the redo buffer for very long. In
fact, LGWR initiates a flush of this area in one of the following scenarios:
Every three seconds
Whenever someone commits
When LGWR is asked to switch log files
When the redo buffer gets one-third full or contains 1MB of cached redo log data
Use the LOG_BUFFER parameter to adjust its size, but be careful about increasing it too
much: a larger buffer will reduce your I/O, but commits will take longer.
2.Buffer Cache: The block buffer cache is where Oracle stores database blocks before
writing them to disk and after reading them in from disk. There are three places to store
cached blocks from individual segments in the SGA:
Default pool (hot cache): The location where all segment blocks are normally cached.
Keep pool (warm cache): An alternate buffer pool where by convention you assign
segments that are accessed fairly frequently, but still get aged out of the default buffer pool
due to other segments needing space.
Recycle pool (do not care to cache): An alternate buffer pool where by convention you
assign large segments that you access very randomly, and which would therefore
cause excessive buffer flushing of many blocks from many segments. There's no benefit to
caching such segments because by the time you want the block again, it would have
been aged out of the cache. You would separate these segments out from the segments in
the default and keep pools so they would not cause those blocks to age out of the cache.
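Assigning a segment to the keep or recycle pool is done through its storage clause; a minimal sketch with illustrative table names:
ALTER TABLE lookup_codes  STORAGE (BUFFER_POOL KEEP);     -- small table read very frequently
ALTER TABLE audit_history STORAGE (BUFFER_POOL RECYCLE);  -- large table accessed randomly
Segments with no explicit assignment remain in the default pool.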
3.Shared Pool: The shared pool is where Oracle caches many bits of program data.
When we parse a query, the parsed representation is cached there. Before we go through
the job of parsing an entire query, Oracle searches the shared pool to see if the work has
already been done. PL/SQL code that you run is cached in the shared pool, so the next
time you run it, Oracle doesn't have to read it in from disk again. PL/SQL code is not only
cached here, it is shared here as well. If you have 1,000 sessions all executing the same
code, only one copy of the code is loaded and shared among all sessions. Oracle stores
the system parameters in the shared pool. The data dictionary
cache (cached information about database objects) is stored here. The dictionary cache is a
collection of database tables and views containing information about the database, its
structures, privileges and users. When statements are issued, Oracle will check
permissions, access, etc. and will obtain this information from its dictionary cache; if the
information is not in the cache then it has to be read in from disk and placed into the
cache. The more information held in the cache, the less Oracle has to access the slow
disks. The SHARED_POOL_SIZE parameter is used to determine the size of the shared
pool; there is no way to adjust the caches independently, you can only adjust the shared
pool size. The shared pool uses an LRU (least recently used) list to maintain what is held in
the buffer; see the buffer cache for more details on the LRU.
4.Large Pool: The large pool is not so named because it is a large structure (although it
may very well be large in size). It is so named because it is used for allocations of large
pieces of memory that are bigger than the shared pool is designed to handle. Large
memory allocations tend to get a chunk of memory, use it, and then be done with it. There
was no need to cache this memory as in buffer cache and Shared Pool, hence a new pool
was allocated. So basically Shared pool is more like Keep Pool whereas Large Pool is
similar to the Recycle Pool. Large pool is used specifically by:
Shared server connections, to allocate the UGA region in the SGA.
Parallel execution of statements, to allow for the allocation of interprocess
message buffers, which are used to coordinate the parallel query servers.
Backup for RMAN disk I/O buffers in some cases.
5.Java Pool: The Java pool is used in different ways, depending on the mode in which the
Oracle server is running. In dedicated server mode the total memory required for the
Java pool is quite modest and can be determined based on the number of Java classes
you'll be using. With shared server connections, the Java pool includes the shared part of
each Java class and some of the UGA used for the per-session state of each session, which is
allocated from the JAVA_POOL within the SGA.
6.Streams Pool: The Streams pool (or up to 10 percent of the shared pool if no Streams
pool is configured) is used to buffer queue messages used by the Streams process as it
moves or copies data from one database to another.
The SGA comprises a number of memory components, which are pools of memory used
to satisfy a particular class of memory allocation requests. Examples of memory
components include the shared pool (used to allocate memory for SQL and PL/SQL
execution), the java pool (used for java objects and other java execution memory), and the
buffer cache (used for caching disk blocks). All SGA components allocate and deallocate
space in units of granules. Oracle Database tracks SGA memory use in internal numbers
of granules for each SGA component.Granule size is determined by total SGA size. On
most platforms, the size of a granule is 4 MB if the total SGA size is less than 1 GB, and
granule size is 16 MB for larger SGAs. Some platform dependencies arise; for example,
on 32-bit Windows, the granule size is 8 MB for SGAs larger than 1 GB. Oracle Database
can set limits on how much virtual memory the database uses for the SGA. It can start
instances with minimal memory and allow the instance to use more memory by expanding
the memory allocated for SGA components, up to a maximum determined by
the SGA_MAX_SIZE initialization parameter. If the value of SGA_MAX_SIZE in the initialization
parameter file or server parameter file (SPFILE) is less than the sum of the memory allocated
for all components, either explicitly in the parameter file or by default, at the time the
instance is initialized, then the database ignores the setting for SGA_MAX_SIZE.
2)PGA:
PGA is the memory reserved for each user process connecting to an Oracle Database and
is allocated when a process is created and deallocated when a process is terminated.
Contents of PGA:-
Private SQL Area: Contains data such as bind information and run-time memory
structures. It contains Persistent Area which contains bind information and is freed
only when the cursor is closed and Run time Area which is created as the first step
of an execute request. This area is freed only when the statement has been
executed. The number of Private SQL areas that can be allocated to a user process
depends on the OPEN_CURSORS initialization parameter.
Session Memory: Consists of memory allocated to hold a session's variables and
other info related to the session.
SQL Work Areas: Used for memory intensive operations such as: Sort, Hash-join,
Bitmap merge, Bitmap Create.
NOTE:- From 11gR1 You can set MEMORY_TARGET and auto-mem management for
both SGA and PGA is taken care.
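A minimal sketch of enabling this automatic memory management (the 2G value is illustrative; the setting is written to the SPFILE, takes effect at the next instance startup, and must fit within MEMORY_MAX_TARGET):
ALTER SYSTEM SET memory_target = 2G SCOPE = SPFILE;
With this set, Oracle automatically redistributes memory between the SGA and the PGA as the workload changes.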
Many DBAs ask how PGA memory is allocated, and there are several misconceptions about
it, so here is a short note on the same.
Oracle will try and keep the PGA under the target value, but if you exceed this value
Oracle will perform multi-pass operations (disk operations).
3)UGA:
The UGA (User Global Area) holds your session state information; this area of memory will be
accessed by your current session. Depending on the connection type, the UGA can be
located in the SGA (for a shared server connection), where it is accessible by any one of the
shared server processes; because a dedicated connection does not use shared servers, in
that case the memory will be located in the PGA.
Shared server - UGA will be part of the SGA
Dedicated server - UGA will be the PGA
CURSOR: A cursor is a temporary work area created in the system memory when a SQL
statement is executed. A cursor contains information on a select statement and the rows of
data accessed by it. This temporary work area is used to store the data retrieved from the
database, and manipulate this data. A cursor can hold more than one row, but can process
only one row at a time. The set of rows the cursor holds is called the active set.
1)Implicit Cursor
2)Explicit Cursor
They must be created when you are executing a SELECT statement that returns more
than one row. Even though the cursor stores multiple records, only one record can be
processed at a time, which is called the current row. When you fetch a row, the current row
position moves to the next row.
For Example: When you execute INSERT, UPDATE, or DELETE statements the cursor
attributes tell us whether any rows are affected and how many have been affected. When
a SELECT... INTO statement is executed in a PL/SQL Block, implicit cursor attributes can
be used to find out whether any row has been returned by the SELECT statement. PL/SQL
returns an error when no data is selected.
In PL/SQL, you can refer to the most recent implicit cursor as the SQL cursor, which
always has the attributes like %FOUND, %ISOPEN, %NOTFOUND, and %ROWCOUNT.
The SQL cursor has additional attributes, %BULK_ROWCOUNT and
%BULK_EXCEPTIONS, designed for use with the FORALL statement.
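A minimal PL/SQL sketch of these implicit cursor attributes, assuming an illustrative emp table with sal and deptno columns (run with SERVEROUTPUT ON to see the messages):
BEGIN
  UPDATE emp SET sal = sal * 1.10 WHERE deptno = 20;
  IF SQL%FOUND THEN
    DBMS_OUTPUT.PUT_LINE(SQL%ROWCOUNT || ' row(s) updated');
  ELSE
    DBMS_OUTPUT.PUT_LINE('No rows matched the update');
  END IF;
END;
/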
TRIGGER: Triggers are stored programs which are automatically executed or fired when
some events occur. A trigger is automatically associated with a DML statement; when the DML
statement executes, the trigger executes implicitly. You can create a trigger using the CREATE
TRIGGER statement. If the trigger is enabled, it fires implicitly when its DML statement
executes; if the trigger is disabled, it can't fire.
Triggers could be defined on the table, view, schema, or database with which the event is
associated.
Advantages of trigger:
2) By using triggers, business rules and transactions are easy to store in database and
can be used consistently even if there are future updates to the database.
4) When a change happens in the database, a trigger can propagate the change to the
entire database.
Use the CREATE TRIGGER statement to create and enable a database trigger.
Before a trigger can be created, the user SYS must run a SQL script commonly
called DBMSSTDX.SQL. The exact name and location of this script depend on your operating
system.
To create a trigger in your own schema on a table in your own schema or on your
own schema (SCHEMA), you must have the CREATE TRIGGER system privilege.
To create a trigger in any schema on a table in any schema, or on another user's
schema (schema.SCHEMA), you must have the CREATE ANY TRIGGER system privilege.
In addition to the preceding privileges, to create a trigger on DATABASE, you must
have the ADMINISTER DATABASE TRIGGER system privilege.
If the trigger issues SQL statements or calls procedures or functions, then the owner of the
trigger must have the privileges necessary to perform these operations. These privileges
must be granted directly to the owner rather than acquired through roles.
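A minimal sketch of a row-level trigger, assuming illustrative emp (with empno and sal columns) and emp_audit tables:
CREATE OR REPLACE TRIGGER emp_salary_audit
  BEFORE UPDATE OF sal ON emp
  FOR EACH ROW
BEGIN
  INSERT INTO emp_audit (empno, old_sal, new_sal, changed_on)
  VALUES (:OLD.empno, :OLD.sal, :NEW.sal, SYSDATE);
END;
/
The trigger fires once for each row touched by an UPDATE of sal, recording the old and new values.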
Extents
The next level of logical database space is called an extent. An extent is a specific number
of contiguous data blocks that is allocated for storing a specific type of information.
Segments
The level of logical database storage above an extent is called a segment. A segment is a
set of extents that have been allocated for a specific type of data structure, and that all are
stored in the same tablespace. For example, each table's data is stored in its own data
segment, while each index's data is stored in its own index segment. Oracle allocates
space for segments in extents. Therefore, when the existing extents of a segment are full,
Oracle allocates another extent for that segment. Because extents are allocated as
needed, the extents of a segment may or may not be contiguous on disk. The segments
also can span files, but the individual extents cannot.
- data segments
- index segments
- rollback segments
- temporary segments
Data Segments:
Every nonclustered table in an Oracle database has a single data segment to hold all of its
data. This data segment is created when you create an object with the CREATE
TABLE/SNAPSHOT/SNAPSHOT LOG command. Also, a data segment is created for a
cluster when a CREATE CLUSTER command is issued.
The storage parameters control the way that its data segment's extents are allocated.
These affect the efficiency of data retrieval and storage for the data segment associated
with the object.
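A minimal sketch of setting storage parameters when creating a table (the table and tablespace names are illustrative; in a locally managed tablespace Oracle may round or ignore some of these values):
CREATE TABLE sales_history (
  sale_id   NUMBER,
  sale_date DATE,
  amount    NUMBER(10,2)
)
TABLESPACE users
STORAGE (INITIAL 64K NEXT 64K PCTINCREASE 0);
Each time the existing extents of the segment fill up, Oracle allocates another extent according to these parameters.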
Index Segments:
Every index in an Oracle database has a single index segment to hold all of its data.
Oracle creates the index segment for the index when you issue the CREATE INDEX
command. Setting the storage parameters directly affects the efficiency of data retrieval
and storage.
Rollback Segments
Rollbacks are required when transactions that affect the database need to be undone.
Rollbacks are also needed at the time of system failures. Just as the rolled-back (undo) data
is saved in rollback segments, the data needed to redo changes is held in the redo log.
A rollback segment is a portion of the database that records the actions of transactions if
the transaction should be rolled back. Each database contains one or more rollback
segments. Rollback segments are used to provide read consistency, to roll back
transactions, and to recover the database.
Types of rollbacks:
- statement level rollback
- rollback to a savepoint
- rollback of a transaction due to user request
- rollback of a transaction due to abnormal process termination
- rollback of all outstanding transactions when an instance terminates abnormally
- rollback of incomplete transactions during recovery.
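Two of the rollback types above can be seen directly in SQL; a minimal sketch with illustrative orders and order_items tables:
INSERT INTO orders VALUES (1001, SYSDATE, 'NEW');
SAVEPOINT before_items;
INSERT INTO order_items VALUES (1001, 1, 'WIDGET', 5);
ROLLBACK TO SAVEPOINT before_items;  -- undoes only the second insert
COMMIT;                              -- the first insert becomes permanent
The undo information needed by ROLLBACK TO SAVEPOINT is read from a rollback (undo) segment.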
Temporary Segments:
SELECT statements may need temporary storage. When queries are fired, Oracle needs an
area to do sorting and other operations, which is why temporary segments are useful.
The commands that may use temporary storage when used with SELECT are:
GROUP BY, UNION, DISTINCT, etc.
Oracle Trigger
Oracle allows you to define procedures that are implicitly executed when an
INSERT, UPDATE, or DELETE statement is issued against the associated table. These
procedures are called database triggers.
Oracle Cursor
When Oracle processes an SQL statement, it creates a memory area known as the context
area; a cursor is a pointer to this context area. PL/SQL controls the context area through a cursor.
A cursor holds the rows (one or more) returned by a SQL statement. The set of rows the
cursor holds is referred to as the active set.
You can name a cursor so that it could be referred to in a program to fetch and process the
rows returned by the SQL statement, one at a time. There are two types of cursors
Implicit cursors
Explicit cursors
Implicit Cursors
Implicit cursors are automatically created by Oracle whenever an SQL statement is executed,
when there is no explicit cursor for the statement. Programmers cannot control the implicit
cursors and the information in it.
Whenever a DML statement (INSERT, UPDATE and DELETE) is issued, an implicit cursor is
associated with this statement. For INSERT operations, the cursor holds the data that needs
to be inserted. For UPDATE and DELETE operations, the cursor identifies the rows that would
be affected.
In PL/SQL, you can refer to the most recent implicit cursor as the SQL cursor, which always
has attributes such as %FOUND, %ISOPEN, %NOTFOUND, and %ROWCOUNT. The SQL cursor
has additional attributes, %BULK_ROWCOUNT and %BULK_EXCEPTIONS, designed for use
with the FORALL statement.
Explicit Cursors
Explicit cursors are programmer-defined cursors for gaining more control over the context
area. An explicit cursor should be defined in the declaration section of the PL/SQL Block. It is
created on a SELECT Statement which returns more than one row.
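A minimal sketch of declaring and using an explicit cursor, assuming an illustrative emp table with empno, ename and deptno columns (run with SERVEROUTPUT ON):
DECLARE
  CURSOR c_emp IS
    SELECT empno, ename FROM emp WHERE deptno = 10;
  v_empno emp.empno%TYPE;
  v_ename emp.ename%TYPE;
BEGIN
  OPEN c_emp;
  LOOP
    FETCH c_emp INTO v_empno, v_ename;
    EXIT WHEN c_emp%NOTFOUND;
    DBMS_OUTPUT.PUT_LINE(v_empno || ' ' || v_ename);
  END LOOP;
  CLOSE c_emp;
END;
/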
Exception Handling
PL/SQL facilitates programmers in catching such error conditions using an exception block in
the program, and an appropriate action is taken against the error condition.
o System-defined Exceptions
o User-defined Exceptions
DECLARE
   <declarations section>
BEGIN
   <executable command(s)>
EXCEPTION
   <exception handling goes here>
   WHEN exception1 THEN
      exception1-handling-statements
   WHEN exception2 THEN
      exception2-handling-statements
   WHEN exception3 THEN
      exception3-handling-statements
   ........
   WHEN others THEN
      others-handling-statements
END;
PL/SQL catches and handles exceptions by using exception handler architecture.
Whenever an exception occurs, it is raised. The current PL/SQL block execution halts and
control is passed to a separate section called the exception section. In the exception section,
you can check what kind of exception has occurred and handle it appropriately. This
exception handler architecture enables separating the business logic from the exception
handling code, hence making the program easier to read and maintain.
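A minimal sketch of this structure in action, assuming an illustrative emp table with empno and sal columns:
DECLARE
  v_sal emp.sal%TYPE;
BEGIN
  SELECT sal INTO v_sal FROM emp WHERE empno = 9999;
  DBMS_OUTPUT.PUT_LINE('Salary: ' || v_sal);
EXCEPTION
  WHEN NO_DATA_FOUND THEN
    DBMS_OUTPUT.PUT_LINE('No employee with that number');
  WHEN TOO_MANY_ROWS THEN
    DBMS_OUTPUT.PUT_LINE('More than one employee returned');
  WHEN OTHERS THEN
    DBMS_OUTPUT.PUT_LINE('Unexpected error: ' || SQLERRM);
END;
/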
Delivery
Accuracy
Timeliness
1) Delivery means the system must deliver data to the correct destination. Data must be
received by the intended device.
2) Accuracy means data must be delivered accurately; the data should not be altered
during transmission.
3) Timeliness means data should be delivered on time. When data in the form of video or
audio is transferred to another location as it is produced, this is called real-time transmission.
Serial communication
Parallel communication
Serial communication
In telecommunication and computer science, serial communication is the process of
sending data one bit at a time, sequentially, over a single wire on a communication
channel or computer bus. Serial is a common communication protocol that is used by
many devices, and serial communication has become the standard for intercomputer
communication. Serial communication is used for all long-haul communication and most
computer networks, as it saves the cost of cabling. Serial communication is a popular means of
transmitting data between a computer and a peripheral device such as a programmable
instrument or even another computer. It is also easy to establish and no extra devices are
needed because most computers have one or more serial ports. Examples are RS-232,
Universal Serial Bus, RS-423 and PCI Express.
Parallel communication
Parallel communication is a fast method of communication. In parallel transmission, data is
transmitted across parallel wires: flat cables made up of multiple smaller
cables, each of which can carry a single bit of information. A parallel cable can carry a group
of data bits at the same time. In telecommunication and computer science, parallel
communication is a method of sending several data signals over a communication link at
one time. Examples are Industry Standard Architecture (ISA), Parallel ATA, IEEE
1284 and Conventional PCI.
For synchronous data transfer, both sender and receiver access the data
according to the same clock. Therefore, a special line for the clock signal is
required. A master (or one of the senders) should provide the clock signal to all the
receivers in synchronous data transfer mode. Synchronous data transfer supports
very high data transfer rate.
For asynchronous data transfer, there is no common clock signal between the
senders and receivers. Therefore, the sender and the receiver first need to agree
on a data transfer speed. This speed usually does not change after data transfer
starts. The data transfer rate is slower in asynchronous data transfer.
Data Flow: Communication between two devices can be simplex, half-duplex, or full-
duplex:
In simplex mode, the communication is unidirectional, as on a one-way street. Only one
of the two devices on a link can transmit; the other can only receive. Keyboards and
traditional monitors are examples of simplex devices. The keyboard can only introduce
input; the monitor can only accept output. The simplex mode can use the entire capacity of
the channel to send data in one direction.
In half-duplex mode, each station can both transmit and receive, but not at the same
time. When one device is sending, the other can only receive, and vice versa. The half-
duplex mode is like a one-lane road with traffic allowed in both directions. When cars are
traveling in one direction, cars going the other way must wait. Walkie-talkies and CB
(citizens band) radios are both half-duplex systems. The half-duplex mode is used in
cases where there is no need for communication in both directions at the same time; the
entire capacity of the channel can be utilized for each direction.
Type of Connection: A network is two or more devices connected through links. A link is
a communications pathway that transfers data from one device to another. For
visualization purposes, it is simplest to imagine any link as a line drawn between two
points. For communication to occur, two devices must be connected in some way to the
same link at the same time.
b)Multipoint: A multipoint (also called multi drop) connection is one in which more than
two specific devices share a single link. In a multipoint environment, the capacity of the
channel is shared, either spatially or temporally. If several devices can use the link
simultaneously, it is a spatially shared connection. If users must take turns, it is a
timeshared connection.
Devices on the network are referred to as 'nodes.' The most common nodes are
computers and peripheral devices. Network topology is illustrated by showing these nodes
and their connections using cables.
1)Bus Topology
2)Ring Topology
3)Star Topology
4)Mesh Topology
5)Tree Topology
1)Bus Topology: In networking a bus is the central cable -- the main wire -- that connects
all devices on a local-area network (LAN). It is also called the backbone. This is often used
to describe the main network connections composing the Internet. Bus networks are
relatively inexpensive and easy to install for small networks. Ethernet systems use a bus
topology. A signal from the source is broadcast and travels to all workstations
connected to the bus cable. Although the message is broadcast, only the intended
recipient, whose MAC address or IP address matches, accepts it. If the MAC/IP address
of a machine doesn't match the intended address, the machine discards the signal. A
terminator is added at each end of the central cable to prevent bouncing of signals. A barrel
connector can be used to extend it.
Advantages of Bus Topology
1. It is cost effective.
2. Cable required is least compared to other network topology.
3. Used in small networks.
4. It is easy to understand.
5. Easy to expand by joining two cables together.
2)Ring Topology: All the nodes are connected to each-other in such a way that they
make a closed loop. Each workstation is connected to two other components on either
side, and it communicates with these two adjacent neighbors. Data travels around the
network, in one direction. Sending and receiving of data takes place by the help of
TOKEN.
Token Passing: A token contains a piece of information which, along with the data, is sent by
the source computer. This token then passes to the next node, which checks if the signal is
intended for it. If yes, it receives the data and passes an empty token back into the network;
otherwise it passes the token along with the data to the next node. This process continues
until the signal reaches its intended destination.
Only the node holding the token is allowed to send data. Other nodes have to wait for
an empty token to reach them. This network is usually found in offices, schools and small
buildings.
3)Star Topology: In a star network devices are connected to a central computer, called a
hub. Nodes communicate across the network by passing data through the hub.
Advantages of Star Topology
1) As compared to Bus topology it gives far better performance; signals don't
necessarily get transmitted to all the workstations. A sent signal reaches the intended
destination after passing through no more than 3-4 devices and 2-3 links. Performance of
the network is dependent on the capacity of the central hub.
2) Easy to connect new nodes or devices. In star topology new nodes can be added
easily without affecting rest of the network. Similarly components can also be removed
easily.
3) Centralized management. It helps in monitoring the network.
4) Failure of one node or link doesn't affect the rest of the network. At the same time it's easy
to detect the failure and troubleshoot it.
Disadvantages of Star Topology
1) Too much dependency on the central device has its own drawbacks. If it fails, the whole
network goes down.
2) The use of a hub, a router or a switch as the central device increases the overall cost of the
network.
3) Performance, as well as the number of nodes which can be added in such a topology, is
dependent on the capacity of the central device.
4)Mesh Topology:In a mesh network, devices are connected with many redundant
interconnections between network nodes. In a true mesh topology every node has a
connection to every other node in the network.
Full mesh topology: occurs when every node has a circuit connecting it to every other
node in a network. Full mesh is very expensive to implement but yields the greatest
amount of redundancy, so in the event that one of those nodes fails, network traffic can be
directed to any of the other nodes. Full mesh is usually reserved for backbone networks.
Partial mesh topology: is less expensive to implement and yields less redundancy than
full mesh topology. With partial mesh, some nodes are organized in a full mesh scheme
but others are only connected to one or two in the network. Partial mesh topology is
commonly found in peripheral networks connected to a full meshed backbone.
ADVANTAGES OF MESH TOPOLOGY
5)Tree Topology: Tree Topology integrates the characteristics of Star and Bus Topology.
Earlier we saw how, in physical star network topology, computers (nodes) are connected
to each other through a central hub. And we also saw that in bus topology, workstation devices
are connected by a common cable called the bus. After understanding these two network
configurations, we can understand tree topology better. In tree topology, a number of
star networks are connected using a bus. This main cable seems like the main stem of a tree,
and the other star networks are the branches. It is also called Expanded Star Topology.
DISADVANTAGES OF TREE TOPOLOGY
1. Heavily cabled.
2. Costly.
3. If more nodes are added maintenance is difficult.
4. Central hub fails, network fails.
6)Hybrid Topology: A hybrid topology is a type of network topology that uses two or more
other network topologies, including bus topology, mesh topology, ring topology, star
topology, and tree topology.
Hybrid network topology has many advantages. Hybrid topologies are flexible, reliable and
have increased fault tolerance. New nodes can be easily added to the hybrid network,
and network faults can be easily diagnosed and corrected without affecting the work of the
rest of the network. But at the same time hybrid topologies are expensive and difficult to
manage.
Types of Network:
1)LAN: A LAN connects network devices over a relatively short distance. A networked
office building, school, or home usually contains a single LAN, though sometimes one
building will contain a few small LANs (perhaps one per room), and occasionally a LAN will
span a group of nearby buildings. In TCP/IP networking, a LAN is often but not always
implemented as a single IP subnet. A LAN typically relies mostly on wired connections for
increased speed and security, but wireless connections can also be part of a LAN. High
speed and relatively low cost are the defining characteristics of LANs. A LAN typically has a
maximum span of about 10 km.
2)WAN: A wide area network, or WAN, occupies a very large area, such as an entire
country or the entire world. A WAN can contain multiple smaller networks, such as LANs
or MANs. The Internet is the best-known example of a public WAN.
3)MAN: A metropolitan area network (MAN) is a hybrid between a LAN and a WAN. Like a
WAN, it connects two or more LANs in the same geographic area. A MAN, for example,
might connect two different buildings or offices in the same city. However, whereas WANs
typically provide low- to medium-speed access, MANs provide high-speed connections,
such as T1 (1.544 Mbps) and optical services.
The optical services provided include SONET (the Synchronous Optical Network standard)
and SDH (the Synchronous Digital Hierarchy standard). With these optical services,
carriers can provide high-speed services, including ATM and Gigabit Ethernet. These two
optical services provide speeds ranging into the hundreds or thousands of megabits per
second (Mbps). Devices used to provide connections for MANs include high-end routers,
ATM switches, and optical switches.
5)Campus Area Network: This is a network which is larger than a LAN, but smaller than
a MAN. This is typical in areas such as a university, large school or small business. It is
typically spread over a collection of buildings which are reasonably local to each other. It
may have an internal Ethernet as well as capability of connecting to the internet.
6)Storage Area Network: This network connects servers directly to devices which store
large amounts of data without relying on a LAN or WAN network to do so. This can involve
another type of connection known as Fibre Channel, a system similar to Ethernet which
handles high-performance disk storage for applications on a number of professional
networks.
OSI layers
The main concept of OSI is that the process of communication between two endpoints in a
telecommunication network can be divided into seven distinct groups of related functions,
or layers. Each communicating user or program is at a computer that can provide those
seven layers of function. So in a given message between users, there will be a flow of data
down through the layers in the source computer, across the network and then up through
the layers in the receiving computer. The seven layers of function are provided by a
combination of applications, operating systems, network card device drivers and
networking hardware that enable a system to put a signal on a network cable or out
over Wi-Fi or another wireless protocol.
1. Data link layer synchronizes the information which is to be transmitted over the
physical layer.
2. The main function of this layer is to make sure data transfer is error free from one
node to another, over the physical layer.
3. Transmitting and receiving data frames sequentially is managed by this layer.
4. This layer sends and expects acknowledgements for frames received and sent
respectively. Resending of non-acknowledgement received frames is also handled
by this layer.
5. This layer establishes a logical link between two nodes and also manages the
frame traffic control over the network. It signals the transmitting node to stop when
the frame buffers are full.
1. It routes the signal through different channels from one node to another.
2. It acts as a network controller. It manages the Subnet traffic.
3. It decides by which route data should take.
4. It divides the outgoing messages into packets and assembles the incoming packets
into messages for higher levels.
Transport layer breaks the message (data) into small units so that they are handled more
efficiently by the network layer.
LAYER 5: THE SESSION LAYER :
1. The session layer manages and synchronizes the conversation between two different
applications.
2. During transfer of data from source to destination, the session layer marks and
re-synchronizes the streams of data properly, so that the ends of the messages are not
cut prematurely and data loss is avoided.
1. Presentation layer takes care that the data is sent in such a way that the receiver
will understand the information (data) and will be able to use the data.
2. While receiving the data, presentation layer transforms the data to be ready for the
application layer.
3. Languages(syntax) can be different of the two communicating systems. Under this
condition presentation layer plays a role of translator.
4. It performs Data compression, Data encryption, Data conversion etc.
1. OSI model distinguishes well between the services, interfaces and protocols.
2. Protocols of OSI model are very well hidden.
3. Protocols can be replaced by new protocols as technology changes.
4. Supports connection oriented services as well as connectionless service.
Network Interface Card (NIC): NIC provides a physical connection between the
networking cable and the computer's internal bus. NICs come in three basic varieties: 8-bit,
16-bit and 32-bit. The larger the number of bits that can be transferred to the NIC, the faster
the NIC can transfer data to the network cable.
Repeater: Repeaters are used to connect together two Ethernet segments of any media
type. In larger designs, signal quality begins to deteriorate as segments exceed their
maximum length. We also know that signal transmission is always accompanied by energy
loss. So, a periodic refreshing of the signals is required.
Hubs: Hubs are actually multiport repeaters. A hub takes any incoming signal and
repeats it out of all ports.
Bridges: When the size of the LAN is difficult to manage, it is necessary to break up the
network. The function of the bridge is to connect separate networks together. Bridges do
not forward bad or misaligned packets.
Switch: Switches are an expansion of the concept of bridging. Cut-through switches
examine only the packet destination address before forwarding it onto its destination
segment, while a store-and-forward switch accepts and analyzes the entire packet before
forwarding it to its destination. It takes more time to examine the entire packet, but it allows
catching certain packet errors and keeping them from propagating through the network.
Routers: Router forwards packets from one LAN (or WAN) network to another. It is also
used at the edges of the networks to connect to the Internet.
Gateway: Gateway acts like an entrance between two different networks. Gateway in
organisations is the computer that routes the traffic from a work station to the outside
network that is serving web pages. ISP (Internet Service Provider) is the gateway for
Internet service at homes.
ARP: Address Resolution Protocol (ARP) is a protocol for mapping an Internet Protocol
address (IP address) to a physical machine address that is recognized in the local
network. For example, in IP Version 4, the most common level of IP in use today, an
address is 32 bits long. In an Ethernet local area network, however, addresses for
attached devices are 48 bits long. (The physical machine address is also known as a
Media Access Control or MAC address.) A table, usually called the ARP cache, is used to
maintain a correlation between each MAC address and its corresponding IP address. ARP
provides the protocol rules for making this correlation and providing address conversion in
both directions.
There are four types of ARP messages that may be sent by the ARP protocol. These are
identified by four values in the "operation" field of an ARP message. The types of message
are:
1)ARP request
2)ARP reply
3)RARP request
4)RARP reply
Frame Relay:
Frame Relay is a standardized wide area network technology that operates at the physical
and logical link layers of the OSI model. Frame Relay was originally designed for transport
across Integrated Services Digital Network (ISDN) infrastructure, but it may be used today in
the context of many other network interfaces.
Frame relay is an example of a packet switched technology. Packet switched network
enables end stations to dynamically share the network medium and the available
bandwidth.
Frame Relay is often described as a streamlined version of X.25; this is because Frame Relay
typically operates over WAN facilities that offer more reliable connection services. Frame
Relay is strictly a layer 2 protocol suite, whereas X.25 provides services at layer 3.
For most services, the network provides a permanent virtual circuit (PVC), which means
that the customer sees a continuous, dedicated connection without having to pay for a full-
time leased line, while the service provider figures out the route each frame travels to its
destination and can charge based on usage. Switched virtual circuits (SVC), by contrast,
are temporary connections that are destroyed after a specific data transfer is completed.In
order for a frame relay WAN to transmit data, data terminal equipment (DTE) and data
circuit-terminating equipment (DCE) are required. DTEs are typically located on the
customer's premises and can encompass terminals, routers, bridges and personal
computers. DCEs are managed by the carriers and provide switching and associated
services.
Frame Relay virtual circuits fall into two categories: switched virtual circuits (SVCs) and
permanent virtual circuits (PVCs).
Switched virtual circuits (SVCs) are temporary connections used in situations requiring
only sporadic data transfer between DTE devices across the Frame Relay network. A
communication session across an SVC consists of the following four operational states:
Call setup: The virtual circuit between two Frame Relay DTE devices is established.
Data transfer: Data is transmitted between the DTE devices over the virtual circuit.
Idle: The connection between DTE devices is still active, but no data is transferred. If an
SVC remains in an idle state for a defined period of time, the call can be terminated.
Call termination: The virtual circuit between the DTE devices is terminated.
Permanent virtual circuits (PVCs) are permanently established connections that are used
for frequent and consistent data transfers between DTE devices across the Frame Relay
network. Communication across a PVC does not require the call setup and termination
states that are used with SVCs. PVCs always operate in one of the following two
operational states:
Data transfer: Data is transmitted between the DTE devices over the virtual circuit.
Idle: The connection between DTE devices is active, but no data is transferred. Unlike
SVCs, PVCs will not be terminated under any circumstances when in an idle state.
DTE devices can begin transferring data whenever they are ready because the circuit is
permanently established.
X.25:
X.25 Packet Switched networks allow remote devices to communicate with each other
over private digital links without the expense of individual leased lines. Packet Switching is
a technique whereby the network routes individual packets of HDLC data between
different destinations based on addressing within each packet. An X.25 network consists
of a network of interconnected nodes to which user equipment can connect. The user end
of the network is known as Data Terminal Equipment (DTE) and the carrier's equipment
is Data Circuit-terminating Equipment (DCE). X.25 routes packets across the network
from DTE to DTE.
The X.25 standard corresponds in functionality to the first three layers of the Open
Systems Interconnection (OSI) reference model for networking. Specifically, X.25 defines
the following:
The physical layer interface for connecting data terminal equipment (DTE), such as
computers and terminals at the customer premises, with the data communications
equipment (DCE), such as X.25 packet switches at the X.25 carrier's facilities. The
physical layer interface of X.25 is called X.21bis and was derived from the RS-232
interface for serial transmission.
The data-link layer protocol called Link Access Procedure, Balanced (LAPB), which
defines encapsulation (framing) and error-correction methods. LAPB also enables
the DTE or the DCE to initiate or terminate a communication session or initiate data
transfer. LAPB is derived from the High-level Data Link Control (HDLC) protocol.
The network layer protocol called the Packet Layer Protocol (PLP), which defines
how to address and deliver X.25 packets between end nodes and switches on an
X.25 network using permanent virtual circuits (PVCs) or switched virtual circuits
(SVCs). This layer is responsible for call setup and termination and for managing
transfer of packets.
IP address is short for Internet Protocol (IP) address. An IP address is an identifier for
a computer or device on a TCP/IP network. Networks using the
TCP/IP protocol route messages based on the IP address of the destination.
The IP protocol itself specifies the format of packets, also called datagrams, and
the addressing scheme.
An IP is a 32-bit number comprised of a host number and a network prefix, both of which
are used to uniquely identify each node within a network.To make these addresses more
readable, they are broken up into 4 bytes, or octets, where any 2 bytes are separated by a
period. This is commonly referred to as dotted decimal notation.The first part of an Internet
address identifies the network on which the host resides, while the second part identifies
the particular host on the given network. This creates the two-level addressing
hierarchy.All hosts on a given network share the same network prefix but must have a
unique host number. Similarly, any two hosts on different networks must have different
network prefixes but may have the same host number. Subnet masks are 32 bits long and
are typically represented in dotted-decimal (such as 255.255.255.0) or the number of
networking bits (such as /24).
The hosts formula tells you how many hosts are allowed on a network that has a
certain subnet mask. The hosts formula is 2^n - 2, where n is the number of 0s in the
subnet mask when the subnet mask is converted to binary.
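A quick way to check the hosts formula in Python (a minimal sketch; the helper name usable_hosts is illustrative, not from the text):

# Hedged sketch: usable hosts for a given prefix length.
def usable_hosts(prefix_length):
    host_bits = 32 - prefix_length        # number of 0s in the subnet mask
    return 2 ** host_bits - 2             # subtract the network and broadcast addresses

print(usable_hosts(24))   # 254 usable hosts in a /24 (255.255.255.0)
print(usable_hosts(26))   # 62 usable hosts in a /26 (255.255.255.192)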
Network Masks
A network mask helps you know which portion of the address identifies the network and
which portion of the address identifies the node. Class A, B, and C networks have default
masks, also known as natural masks, as shown here:
Class A: 255.0.0.0
Class B: 255.255.0.0
Class C: 255.255.255.0
An IP address on a Class A network that has not been subnetted would have an
address/mask pair similar to: 8.20.15.1 255.0.0.0. In order to see how the mask helps you
identify the network and node parts of the address, convert the address and mask to
binary numbers.
8.20.15.1 = 00001000.00010100.00001111.00000001
255.0.0.0 = 11111111.00000000.00000000.00000000
Once you have the address and the mask represented in binary, then identification of the
network and host ID is easier. Any address bits which have corresponding mask bits set to
1 represent the network ID. Any address bits that have corresponding mask bits set to 0
represent the node ID.
8.20.15.1 = 00001000.00010100.00001111.00000001
255.0.0.0 = 11111111.00000000.00000000.00000000
-----------------------------------
net id | host id
netid = 00001000 = 8
hostid = 00010100.00001111.00000001 = 20.15.1
A subnet mask is what tells the computer what part of the IP address is the
network and what part is for the host computers on that network.
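The same split can be reproduced with a bitwise AND between the address and the mask; a minimal sketch (the helper name to_int is illustrative):

# Hedged sketch: separate the network ID and host ID of 8.20.15.1 / 255.0.0.0.
def to_int(dotted):
    a, b, c, d = (int(x) for x in dotted.split("."))
    return (a << 24) | (b << 16) | (c << 8) | d

addr = to_int("8.20.15.1")
mask = to_int("255.0.0.0")

network_id = addr & mask                     # address bits where the mask bits are 1
host_id = addr & ~mask & 0xFFFFFFFF          # address bits where the mask bits are 0

print(network_id >> 24)                                               # 8
print((host_id >> 16) & 0xFF, (host_id >> 8) & 0xFF, host_id & 0xFF)  # 20 15 1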
Subnetting
Subnetting is the process of breaking a large network into smaller networks known as subnets.
Subnetting happens when we extend the default boundary of the subnet mask; basically we
borrow host bits to create networks. Let's take an example.
Being a network administrator you are asked to create two networks, each of which will host 30
systems. A single class C IP range can fulfil this requirement, but without subnetting you would
have to purchase two class C ranges, one for each network. A single class C range provides 256
total addresses and each network needs only about 30, so roughly 226 addresses per range
would be wasted, and these unused addresses would generate additional route advertisements,
slowing down the network. With subnetting you only need to purchase a single class C range.
You can configure the router to take the first 26 bits instead of the default 24 bits as network
bits. In this case we extend the default boundary of the subnet mask and borrow 2 host bits to
create networks. By taking two bits from the host range and counting them as network bits, we
can create new subnets and assign hosts to them. As long as the network bits of two addresses
match, they belong to the same subnet; change either of the borrowed bits and you are in a
new subnet.
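A sketch of this borrowing using Python's standard ipaddress module (the 192.168.1.0/24 range is an assumed example value, not taken from the text):

# Hedged sketch: borrow 2 host bits from a /24 to create /26 subnets.
import ipaddress

block = ipaddress.ip_network("192.168.1.0/24")        # assumed class C range
for subnet in block.subnets(new_prefix=26):
    print(subnet, "usable hosts:", subnet.num_addresses - 2)   # 62 hosts each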
Advantage of Subnetting
Subnetting breaks a large network into smaller networks, and smaller networks are
easier to manage.
Subnetting reduces network traffic by removing collision and broadcast traffic, which
improves overall performance.
Subnetting allows you to apply network security policies at the interconnection
between subnets.
Subnetting allows you to save money by reducing the requirement for IP ranges.
CIDR [Classless Inter Domain Routing]: CIDR is a slash notation for the subnet mask.
The slash value tells us the number of on bits in the network mask.
Class A has the default subnet mask 255.0.0.0, which means the first octet of the subnet
mask has all bits on. In slash notation it is written as /8, meaning the network part is
8 bits.
Class B has the default subnet mask 255.255.0.0, which means the first two octets of the
subnet mask have all bits on. In slash notation it is written as /16, meaning the network
part is 16 bits.
Class C has the default subnet mask 255.255.255.0, which means the first three octets of the
subnet mask have all bits on. In slash notation it is written as /24, meaning the network
part is 24 bits.
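Converting between slash notation and a dotted-decimal mask is a simple bit operation; a minimal sketch (the function name is illustrative):

# Hedged sketch: prefix length -> dotted-decimal subnet mask.
def prefix_to_mask(prefix):
    value = (0xFFFFFFFF << (32 - prefix)) & 0xFFFFFFFF   # set the top 'prefix' bits
    return ".".join(str((value >> shift) & 0xFF) for shift in (24, 16, 8, 0))

print(prefix_to_mask(8))    # 255.0.0.0     (Class A default)
print(prefix_to_mask(16))   # 255.255.0.0   (Class B default)
print(prefix_to_mask(24))   # 255.255.255.0 (Class C default)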
Multiplexing: To combine multiple signals (analog or digital) for transmission over
a single line or media. A common type of multiplexing combines several low-speed
signals for transmission over a single high-speed connection. Multiplexing is done by
using a device called multiplexer (MUX) that combines n input lines to generate one
output line i.e. (many to one). Therefore multiplexer (MUX) has several inputs and one
output. At the receiving end, a device called demultiplexer (DEMUX) is used that
separates signal into its component signals. So DEMUX has one input and several
outputs.
Time Division Multiplexing (TDM): A type of multiplexing that combines data streams by
assigning each stream a different time slot in a set. TDM repeatedly transmits a fixed
sequence of time slots over a single
transmission channel. Within T-Carrier systems, such as T-1 and T-3, TDM
combines Pulse Code Modulated (PCM) streams created for each conversation or data
stream.
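The time-slot idea can be pictured as round-robin interleaving of the input streams; a toy sketch (the stream contents are made up):

# Hedged sketch: round-robin TDM - one unit from each input stream per frame.
streams = [list("AAAA"), list("BBBB"), list("CCCC")]    # three low-speed inputs (made up)

frames = []
for slot in range(len(streams[0])):
    frames.append([s[slot] for s in streams])           # one frame = one slot per stream

print(frames)   # [['A', 'B', 'C'], ['A', 'B', 'C'], ...] sent over the single high-speed link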
Rules of Network Protocol include guidelines that regulate the following characteristics of a
network: access method, allowed physical topologies, types of cabling, and speed of data
transfer.
Types of Network Protocols
Ethernet
Local Talk
Token Ring
FDDI
ATM
Ethernet
The Ethernet protocol is by far the most widely used one. Ethernet uses an access method
called CSMA/CD (Carrier Sense Multiple Access/Collision Detection). This is a system where
each computer listens to the cable before sending anything through the network. If the
network is clear, the computer will transmit. If some other nodes have already transmitted on
the cable, the computer will wait and try again when the line is clear. Sometimes, two
computers attempt to transmit at the same instant. A collision occurs when this happens. Each
computer then backs off and waits a random amount of time before attempting to retransmit.
With this access method, it is normal to have collisions. However, the delay caused by
collisions and retransmitting is very small and does not normally affect the speed of
transmission on the network.
The Ethernet protocol allows for linear bus, star, or tree topologies. Data can be transmitted
over wireless access points, twisted pair, coaxial, or fiber optic cable at a speed of 10 Mbps up
to 1000 Mbps.
Fast Ethernet
To allow for an increased speed of transmission, a newer Ethernet standard was developed
that supports 100 Mbps. This is commonly called Fast Ethernet. Fast Ethernet
requires the application of different, more expensive network concentrators/hubs and
network interface cards. In addition, category 5 twisted pair or fiber optic cable is necessary.
Fast Ethernet is becoming common in schools that have been recently wired.
Local Talk
Local Talk is a network protocol that was developed by Apple Computer, Inc. for Macintosh
computers. The method used by Local Talk is called CSMA/CA (Carrier Sense Multiple Access
with Collision Avoidance). It is similar to CSMA/CD except that a computer signals its intent to
transmit before it actually does so. Local Talk adapters and special twisted pair cable can be
used to connect a series of computers through the serial port. The Macintosh operating
system allows the establishment of a peer-to-peer network without the need for additional
software. With the addition of the server version of AppleShare software, a client/server
network can be established.
The Local Talk protocol allows for linear bus, star, or tree topologies using twisted pair cable. A
primary disadvantage of Local Talk is low speed. Its speed of transmission is only 230 Kbps.
Token Ring
The Token Ring protocol was developed by IBM in the mid-1980s. The access method used
involves token-passing. In Token Ring, the computers are connected so that the signal travels
around the network from one computer to another in a logical ring. A single electronic token
moves around the ring from one computer to the next. If a computer does not have
information to transmit, it simply passes the token on to the next workstation. If a computer
wishes to transmit and receives an empty token, it attaches data to the token. The token then
proceeds around the ring until it comes to the computer for which the data is meant. At this
point, the data is captured by the receiving computer. The Token Ring protocol requires a star-
wired ring using twisted pair or fiber optic cable. It can operate at transmission speeds of 4
Mbps or 16 Mbps. Due to the increasing popularity of Ethernet, the use of Token Ring in
school environments has decreased.
FDDI
Fiber Distributed Data Interface (FDDI) is a network protocol that is used primarily to
interconnect two or more local area networks, often over large distances. The access method
used by FDDI involves token-passing. FDDI uses a dual ring physical topology. Transmission
normally occurs on one of the rings; however, if a break occurs, the system keeps information
moving by automatically using portions of the second ring to create a new complete ring. A
major advantage of FDDI is high speed. It operates over fiber optic cable at 100 Mbps.
ATM
Asynchronous Transfer Mode (ATM) is a network protocol that transmits data at a speed of
155 Mbps and higher. ATM works by transmitting all data in small packets of a fixed size;
whereas, other protocols transfer variable length packets. ATM supports a variety of media
such as video, CD-quality audio, and imaging. ATM employs a star topology, which can work
with fiber optic as well as twisted pair cable.
ATM is most often used to interconnect two or more local area networks. It is also frequently
used by Internet Service Providers to provide high-speed access to the Internet for their clients.
As ATM technology becomes more cost-effective, it will provide another solution for
constructing faster local area networks.
Gigabit Ethernet
A later development in the Ethernet standard is a protocol that has a transmission
speed of 1 Gbps. Gigabit Ethernet is primarily used for backbones on a network at this time. In
the future, it will probably also be used for workstation and server connections. It can be used
with both fiber optic cabling and copper. 1000BaseT, the copper-cable standard for Gigabit
Ethernet, became a formal standard in 1999.
Compare the Network Protocols
Protocol        Cable                          Speed              Topology
Ethernet        Twisted Pair, Coaxial, Fiber   10 Mbps            Linear Bus, Star, Tree
Fast Ethernet   Twisted Pair, Fiber            100 Mbps           Star
Local Talk      Twisted Pair                   0.23 Mbps          Linear Bus or Star
Token Ring      Twisted Pair                   4 Mbps - 16 Mbps   Star-Wired Ring
FDDI            Fiber                          100 Mbps           Dual Ring
ATM             Twisted Pair, Fiber            155-2488 Mbps      Linear Bus, Star, Tree
Carrier Sensed Multiple Access (CSMA) : CSMA is a network access method used on shared
network topologies such as Ethernet to control access to the network. Devices attached to
the network cable listen (carrier sense) before transmitting. If the channel is in use, devices
wait before transmitting. MA (Multiple Access) indicates that many devices can connect to
and share the same network. All devices have equal access to use the network when it is
clear.
In CSMA/CD (Carrier Sense Multiple Access/Collision Detection) Access Method, every host
has equal access to the wire and can place data on the wire when the wire is free from traffic.
When a host wants to place data on the wire, it will sense the wire to find whether there is a
signal already on the wire. If there is traffic already in the medium, the host will wait, and if
there is no traffic, it will place the data in the medium. But, if two systems place data on the
medium at the same instance, they will collide with each other, destroying the data. If the
data is destroyed during transmission, the data will need to be retransmitted. After collision,
each host will wait for a small interval of time and again the data will be retransmitted, to
avoid collision again.
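The "wait for a small interval" step is commonly implemented as binary exponential backoff; a hedged sketch (the slot-time idea and the cap of 10 doublings follow common Ethernet practice but are assumptions here, not from the text):

# Hedged sketch: binary exponential backoff after repeated collisions.
import random

def backoff_slots(collision_count):
    k = min(collision_count, 10)             # assumed cap on the exponent
    return random.randint(0, 2 ** k - 1)     # wait a random number of slot times

for attempt in range(1, 4):
    print("collision", attempt, "-> wait", backoff_slots(attempt), "slot times")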
In CSMA/CA, before a host sends real data on the wire it will sense the wire to check if the
wire is free. If the wire is free, it will send a piece of dummy data on the wire to see whether
it collides with any other data. If it does not collide, the host will assume that the real data also
will not collide.
Token Passing
In CSMA/CD and CSMA/CA there is always a chance of collisions, and as the number of hosts in the
network increases, the chances of collisions also increase. In token passing, when a
host wants to transmit data, it must hold the token, which is an empty packet. The token
circles the network at a very high speed. If any workstation wants to send data, it must wait
for the token. When the token reaches the workstation, the workstation can take the
token from the network, fill it with data, mark the token as being used and place the token
back on the network.
TCP/IP means Transmission Control Protocol and Internet Protocol. It is the network model used in
the current Internet architecture as well. Protocols are set of rules which govern every possible
communication over a network. These protocols describe the movement of data between the
source and destination or the internet. These protocols offer simple naming and addressing
schemes.
Merits of TCP/IP
1. It operates independently.
2. It is scalable.
3. Client/server architecture.
4. Supports a number of routing protocols.
5. Can be used to establish a connection between two computers.
Demerits of TCP/IP
Data link layer is layer 2 in OSI model. It is responsible for communications between adjacent
network nodes. It handles the data moving in and out across the physical layer. It also provides a
well defined service to the network layer. The data link layer is divided into two sublayers: the Media
Access Control (MAC) and the Logical Link Control (LLC).
Data-Link layer ensures that an initial connection has been set up, divides output data into data
frames, and handles the acknowledgements from a receiver that the data arrived successfully. It
also ensures that incoming data has been received successfully by analyzing bit patterns at special
places in the frames.
In the following sections data link layer's functions- Error control and Flow control has been
discussed. After that MAC layer is explained. Multiple access protocols are explained in the MAC
layer section.
The network is responsible for the transmission of data from one device to another. The end to end
transfer of data from a transmitting application to a receiving application involves many steps, each
subject to error. With the error control process, we can be confident that the transmitted and
received data are identical. Data can be corrupted during transmission. For reliable communication,
error must be detected and corrected.
Error control is the process of detecting and correcting both the bit level and packet level errors.
Types of Errors
Single Bit Error
The term single bit error means that only one bit of the data unit was changed, from 1 to 0 or from
0 to 1.
Burst Error
The term burst error means that two or more bits in the data unit were changed. Burst error is also
called packet level error, where errors such as packet loss, duplication and reordering occur.
Error Detection
Error detection is the process of detecting the error during the transmission between the sender
and the receiver.
Types of error detection
Parity checking
Cyclic Redundancy Check (CRC)
Checksum
Redundancy
Redundancy allows a receiver to check whether received data was corrupted during transmission,
so that it can request a retransmission. Redundancy is the concept of using extra bits for
error detection. The sender adds redundant bits (R) to the data unit and sends it to the
receiver; the receiver takes the incoming bit stream and passes it through a checking function. If no
error is found, the data portion of the data unit is accepted and the redundant bits are discarded;
otherwise the receiver asks for retransmission.
Parity checking
Parity adds a single bit that indicates whether the number of 1 bits in the preceding data is even or
odd. If a single bit is changed in transmission, the message will change parity and the error can be
detected at this point. Parity checking is not very robust, since if the number of bits changed is
even, the check bit will be invalid and the error will not be detected.
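A minimal sketch of even-parity generation and checking (bit strings are represented as Python strings for illustration):

# Hedged sketch: even parity - the parity bit makes the total number of 1s even.
def add_even_parity(bits):
    return bits + ("0" if bits.count("1") % 2 == 0 else "1")

def parity_ok(bits_with_parity):
    return bits_with_parity.count("1") % 2 == 0

frame = add_even_parity("1011001")       # -> '10110010'
print(frame, parity_ok(frame))           # True: no error detected
print(parity_ok("10110011"))             # False: a single flipped bit is detected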
Cyclic Redundancy Check (CRC)
The sender appends to the data unit a number of 0s that is one less than the number of bits in the
predefined divisor. The result is then divided by the divisor using binary (modulo-2) division. The
remainder is called the CRC. The CRC is appended to the data unit and sent to the receiver.
Receiver follows following steps.
When the data unit arrives followed by the CRC, it is divided by the same divisor that was used to
find the CRC (the remainder).
If the remainder of this division is zero, the data is error free; otherwise it is
corrupted.
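A minimal sketch of CRC generation by modulo-2 (XOR) division; the data and the 5-bit divisor are illustrative values, not from the text:

# Hedged sketch: CRC by modulo-2 (XOR) division on bit strings.
def crc_remainder(data, divisor):
    bits = list(data + "0" * (len(divisor) - 1))      # append (divisor length - 1) zeros
    for i in range(len(bits) - len(divisor) + 1):
        if bits[i] == "1":
            for j in range(len(divisor)):
                bits[i + j] = str(int(bits[i + j]) ^ int(divisor[j]))
    return "".join(bits[-(len(divisor) - 1):])        # the remainder is the CRC

crc = crc_remainder("1101011011", "10011")            # illustrative data and divisor
print(crc)                                            # 1110, appended to the data unit
# At the receiver, dividing data+CRC by the same divisor leaves a zero remainder if error free.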
Checksum
Checksum is the third error detection mechanism. Checksum is typically used in the upper
layers, while parity checking and CRC are used in the lower layers. Checksum is also based on the
concept of redundancy.
In the checksum mechanism there are two operations to perform.
Checksum generator
The sender uses the checksum generator mechanism. First the data unit is divided into equal
segments of n bits. Then all segments are added together using 1's complement arithmetic. The sum
is then complemented; this becomes the checksum, which is sent along with the data unit.
Exp:
If the 16 bits 10001010 00100011 are to be sent to the receiver, the two 8-bit segments are added
(10001010 + 00100011 = 10101101) and the sum is complemented, giving the checksum 01010010.
The checksum is appended to the data unit, so the final data unit sent is 10001010 00100011
01010010.
Checksum checker
The receiver receives the data unit and divides it into segments of equal size. All segments,
including the checksum, are added using 1's complement arithmetic. The result is complemented
once again; if the final result is zero, the data is accepted, otherwise it is rejected.
Exp:
If the final complemented result at the receiver is nonzero, the data unit is rejected.
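A minimal sketch of the 8-bit one's-complement checksum used in the example above:

# Hedged sketch: one's-complement checksum over 8-bit segments.
def ones_complement_sum(segments, bits=8):
    total = 0
    for seg in segments:
        total += int(seg, 2)
        if total >> bits:                              # wrap any carry back into the sum
            total = (total & ((1 << bits) - 1)) + 1
    return total

segments = ["10001010", "00100011"]
checksum = ~ones_complement_sum(segments) & 0xFF       # complement of the sum
print(format(checksum, "08b"))                         # 01010010, appended by the sender

# Receiver: add all segments including the checksum and complement; zero means no error.
total = ones_complement_sum(segments + [format(checksum, "08b")])
print((~total & 0xFF) == 0)                            # True for error-free data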
Error Correction
This type of error control allows a receiver to reconstruct the original information when it has been
corrupted during transmission.
Hamming Code
It is a single bit error correction method using redundant bits.
In this method redundant bits are included with the original data. Now, the bits are arranged such
that different incorrect bits produce different error results and the corrupt bit can be identified. Once
the bit is identified, the receiver can reverse its value and correct the error. Hamming code can be
applied to any length of data unit and uses the relationships between the data and the redundancy
bits.
Algorithm: the redundant (parity) bits are placed at the bit positions that are powers of two (1, 2,
4, ...), and each parity bit covers a specific set of data bit positions.
If the error occurred at bit 7, which was changed from 1 to 0, the receiver recalculates the same sets
of bits used by the sender. In this way the exact location of the error can be identified; once the
bit is identified the receiver can reverse its value and correct the error.
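A hedged sketch of Hamming(7,4): four data bits, three parity bits at positions 1, 2 and 4; the receiver's recomputed parity bits form a syndrome that points at the corrupted position:

# Hedged sketch: Hamming(7,4) single-bit error correction.
def hamming_encode(d):                     # d = [d1, d2, d3, d4]
    p1 = d[0] ^ d[1] ^ d[3]                # covers positions 1, 3, 5, 7
    p2 = d[0] ^ d[2] ^ d[3]                # covers positions 2, 3, 6, 7
    p3 = d[1] ^ d[2] ^ d[3]                # covers positions 4, 5, 6, 7
    return [p1, p2, d[0], p3, d[1], d[2], d[3]]    # codeword positions 1..7

def hamming_correct(c):                    # c = received 7-bit codeword
    s1 = c[0] ^ c[2] ^ c[4] ^ c[6]
    s2 = c[1] ^ c[2] ^ c[5] ^ c[6]
    s3 = c[3] ^ c[4] ^ c[5] ^ c[6]
    pos = s1 + 2 * s2 + 4 * s3             # syndrome = position of the bad bit (0 = none)
    if pos:
        c[pos - 1] ^= 1                    # reverse the corrupted bit
    return c

code = hamming_encode([1, 0, 1, 1])
code[6] ^= 1                               # corrupt bit 7, as in the example above
print(hamming_correct(code))               # error located at position 7 and corrected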
Flow Control is one important design issue for the Data Link Layer that controls the flow of data
between sender and receiver.
In communication there is a medium between sender and receiver. When the sender
sends data to the receiver, a problem can arise in the case below:
1) The sender sends data at a higher rate and the receiver is too slow to support that data rate.
To solve the above problem, flow control is introduced in the Data Link Layer. It also works at
several higher layers. The main goal of flow control is to introduce efficiency in computer
networks.
Approaches of Flow Control
Feed back based Flow Control is used in Data Link Layer and Rate based
Flow Control is used in Network Layer.
1. TIMER: the sender starts a timer when it starts to send data; if the sender does not get an
acknowledgment within that time, it sends the buffered data once again to the receiver.
2. SEQUENCE NUMBER: the sender sends the data with a specific sequence number; after
receiving the data, the receiver acknowledges that sequence number, and the sender
expects the acknowledgment of the same sequence number.
This type of scheme is called Positive Acknowledgment with Retransmission (PAR).
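The timer and the sequence number together give PAR; a highly simplified, hedged sketch of the sender side (unreliable_send and wait_for_ack are imaginary stand-ins for the real channel):

# Hedged sketch: stop-and-wait sender with a timer and alternating sequence numbers.
import time

def send_with_par(frames, unreliable_send, wait_for_ack, timeout=1.0):
    seq = 0
    for frame in frames:
        while True:
            unreliable_send(seq, frame)            # transmit the frame and start the timer
            deadline = time.time() + timeout
            ack = wait_for_ack(deadline)           # acked sequence number, or None on timeout
            if ack == seq:                         # expected acknowledgment arrived
                break
            # timeout or wrong sequence number: retransmit the buffered frame
        seq ^= 1                                   # alternate 0/1 sequence numbers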
1. The sender first starts sending the data and the receiver starts sending data only after it receives data.
2. The receiver and sender both start sending packets simultaneously.
The first case is simple and works perfectly, but there will be an error in the second one. That error can
be duplication of a packet, without any transmission error.
The problem with pipelining is that if the sender is sending 10 packets and a problem occurs with the
8th one, the whole data may need to be resent. The protocols called Go Back N and Selective Repeat
were introduced to solve this problem. In these protocols there are two possibilities at the receiver's
end: it may have a large window size or a window size of one.
iii. A Protocol Using Selective Repeat
The protocol using Go Back N is good when errors are rare, but if the line is poor, it wastes a lot of
bandwidth on retransmitted frames. So to provide reliability, the Selective Repeat protocol was
introduced. In this protocol the sender starts its window size at 0 and grows it to some predefined
maximum number. The receiver's window size is fixed and equal to the maximum sender
window size. The receiver has a buffer reserved for each sequence number within its fixed window.
Whenever a frame arrives, its sequence number is checked to see whether it falls within
the window; if so, and if it has not already been received, it is accepted and stored. This action is
taken whether or not the frame is the one currently expected by the network layer.
The data link layer is divided into two sublayers: The Media Access Control (MAC) layer and
the Logical Link Control (LLC) layer. The MAC sublayer controls how a computer on the network
gains access to the data and permission to transmit it. The LLC layer controls frame
synchronization, flow control and error checking.
The MAC layer is one of the sublayers that make up the data link layer of the OSI reference model. The
MAC layer is responsible for moving packets from one Network Interface Card (NIC) to another across
the shared channel. The MAC sublayer uses MAC protocols to ensure that signals sent from different
stations across the same channel don't collide.
Different protocols are used for different shared networks, such as Ethernets, Token Rings,
Token Buses, and WANs.
1. ALOHA
ALOHA is a simple communication scheme in which each source in a network sends its data
whenever there is a frame to send without checking to see if any other station is active. After
sending the frame each station waits for implicit or explicit acknowledgment. If the frame
successfully reaches the destination, next frame is sent. And if the frame fails to be received at the
destination it is sent again.
Pure ALOHA: ALOHA is the simplest technique for multiple access. The basic idea of this mechanism
is that a user can transmit the data whenever they want. If data is successfully transmitted then there
isn't any problem. But if a collision occurs then the station will transmit again. The sender can detect
the collision if it doesn't receive an acknowledgment from the receiver.
Slotted ALOHA
In ALOHA a newly emitted packet can collide with a packet in progress. If all packets are of the
same length and take L time units to transmit, then it is easy to see that a packet collides with any
other packet transmitted in a time window of length 2L. If this time window is decreased somehow,
than number of collisions decreases and the throughput increase. This mechanism is used in
slotted ALOHA or S-ALOHA. Time is divided into equal slots of Length L. When a station wants to
send a packet it will wait till the beginning of the next time slot.
Advantages of slotted ALOHA:
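One way to quantify the advantage: with offered load G, pure ALOHA achieves throughput S = G * e^(-2G) (peaking near 18.4%), while slotted ALOHA achieves S = G * e^(-G) (peaking near 36.8%). A quick check:

# Hedged sketch: peak throughput of pure vs slotted ALOHA.
import math

def pure_aloha(G):
    return G * math.exp(-2 * G)      # vulnerable period of two frame times

def slotted_aloha(G):
    return G * math.exp(-G)          # vulnerable period of one slot

print(round(pure_aloha(0.5), 3))     # about 0.184 at G = 0.5
print(round(slotted_aloha(1.0), 3))  # about 0.368 at G = 1.0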
2. Carrier Sense Multiple Access (CSMA)
a. Persistent
When a station has data to send, it first listens to the channel to check whether anyone else is
transmitting or not. If it senses the channel idle, the station starts transmitting the data. If it senses
the channel busy, it waits until the channel is idle. When a station detects an idle channel, it transmits
its frame with probability P; that is why this protocol is called p-persistent CSMA. This protocol
applies to slotted channels. When a station finds the channel idle and transmits the frame with
probability 1, the protocol is known as 1-persistent. The 1-persistent protocol is the most
aggressive protocol.
b. Non-Persistent
Non-persistent CSMA is less aggressive than the p-persistent protocol. In this protocol, before
sending the data, the station senses the channel and if the channel is idle it starts transmitting the
data. But if the channel is busy, the station does not continuously sense it; instead it
waits for a random amount of time and repeats the algorithm. This leads to better
channel utilization but also results in longer delays compared to 1-persistent.
Transmission media is a pathway that carries the information from sender to receiver. We
use different types of cables or waves to transmit data. Data is transmitted normally through
electrical or electromagnetic signals.
It is the transmission media in which signals are confined to a specific path using wire or cable. The
types of Bounded/ Guided are discussed below.
Frequency
Thickness
Type of electromagnetic noise field
Distance from the shield to the noise source
Shield discontinuity
Grounding practices
Some STP cablings make use of a thick copper braided shield which makes the cable thicker,
heavier, and in turn much more difficult for installation as compared to the UTP cables.
COAXIAL CABLE:
Coaxial cable is a very common and widely used communication medium. For example, TV cable is
usually coaxial.
Coaxial cable gets its name because it contains two conductors that share the same axis.
The center conductor in the cable is usually copper. The copper can be either a solid wire or
stranded material.
Outside this central conductor is a non-conductive material. It is usually a white, plastic material
used to separate the inner conductor from the outer conductor. The outer conductor is a fine
mesh made from copper. It is used to help shield the cable from EMI.
Outside the copper mesh is the final protective cover.
The actual data travels through the center conductor in the cable. EMI interference is caught
by the outer copper mesh. There are different types of coaxial cable, varying by gauge and
impedance. Gauge is a measure of the cable thickness. It is measured by the Radio Grade
measurement, or RG number. The higher the RG number, the thinner the central conductor core;
the lower the number, the thicker the core.
Here are the most common coaxial standards:
50-Ohm RG-7 or RG-11 : used with thick Ethernet.
50-Ohm RG-58 : used with thin Ethernet
75-Ohm RG-59 : used with cable television
93-Ohm RG-62 : used with ARCNET.
Fiber Optics
Fiber optic cable does not use electrical signals to transmit data; it uses light. In fiber optic cable light
only moves in one direction, so for two-way communication to take place a second connection
must be made between the two devices. It is actually two strands of cable, and each strand is
responsible for one direction of communication. A laser at one device sends pulses of light
through this cable to the other device. These pulses are translated into 1s and 0s at the other end.
In the center of the fiber cable is a glass strand, or core. The light from the laser moves through this
glass to the other device. Around the internal core is a reflective material known as CLADDING.
No light escapes the glass core because of this reflective cladding.
Fiber optic cable has a bandwidth of more than 2 Gbps (gigabits per second).
A wireless network enables people to communicate and access applications and information without
wires. This provides freedom of movement and the ability to extend applications to different parts of a
building, city, or nearly anywhere in the world. Wireless networks allow people to interact with e-mail
or browse the Internet from a location that they prefer.
Many types of wireless communication systems exist, but a distinguishing attribute of a wireless
network is that communication takes place between computer devices. These devices include personal
digital assistants (PDAs), laptops, personal computers (PCs), servers, and printers. Computer devices
have processors, memory, and a means of interfacing with a particular type of network. Traditional cell
phones don't fall within the definition of a computer device; however, newer phones and even audio
headsets are beginning to incorporate computing power and network adapters. Eventually, most
electronics will offer wireless network connections.
As with networks based on wire, or optical fiber, wireless networks convey information between
computer devices. The information can take the form of e-mail messages, web pages, database
records, streaming video or voice. In most cases, wireless networks transfer data, such as e-mail
messages and files, but advancements in the performance of wireless networks are enabling support
for video and voice communications as well.
The Institute of Electrical and Electronics Engineers(IEEE) is a standards setting body. Each of their
standards is numbered and a subset of the number is the actual standard. The 802 family of standards
is the one developed for computer networking.
IEEE, or Institute of Electrical and Electronics Engineers, is a standards setting body. They create
standards for things like networking so products can be compatible with one another. You may have
heard of IEEE 802.11b - this is the standard that IEEE has set (in this example, wireless-b networking).
Several networking technologies: 802.2, 802.3, 802.5, 802.11, and FDDI. Each of these is just a standard
set of technologies, each with its own characteristics.
The technical definition for 802.2 is "the standard for the upper Data Link Layer sublayer also known as
the Logical Link Control layer. It is used with the 802.3, 802.4, and 802.5 standards (lower DL
sublayers)."
802.2 "specifies the general interface between the network layer (IP, IPX, etc) and the data link layer
(Ethernet, Token Ring, etc).
Basically, think of the 802.2 as the "translator" for the Data Link Layer. 802.2 is concerned with
managing traffic over the physical network. It is responsible for flow and error control. The Data Link
Layer wants to send some data over the network, 802.2 Logical Link Control helps make this possible. It
also helps by identifying the line protocol, like NetBIOS, or Netware.
The LLC acts like a software bus allowing multiple higher layer protocols to access one or more lower
layer networks. For example, if you have a server with multiple network interface cards, the LLC will
forward packets from those upper layer protocols to the appropriate network interface. This allows the
upper layer protocols to not need specific knowledge of the lower layer networks in use.
802.3 Ethernet
Now that we have an overview of the OSI model, we can continue on these topics. I hope you have a
clearer picture of the network model and where things fit on it.
802.3 is the standard which Ethernet operates by. It is the standard for CSMA/CD (Carrier Sense
Multiple Access with Collision Detection). This standard encompasses both the MAC and Physical Layer
standards.
CSMA/CD is what Ethernet uses to control access to the network medium (network cable). If there is
no data, any node may attempt to transmit, if the nodes detect a collision, both stop transmitting and
wait a random amount of time before retransmitting the data.
The original 802.3 standard is 10 Mbps (Megabits per second). 802.3u defined the 100 Mbps (Fast
Ethernet) standard, 802.3z/802.3ab defined 1000 Mbps Gigabit Ethernet, and 802.3ae defined 10
Gigabit Ethernet.
Commonly, Ethernet networks transmit data in packets, or small bits of information. A packet can be a
minimum size of 72 bytes or a maximum of 1518 bytes.
802.5 Token Ring
As we mentioned earlier when discussing the ring topology, Token Ring was developed primarily by
IBM. Token ring is designed to use the ring topology and utilizes a token to control the transmission of
data on the network.
The token is a special frame which is designed to travel from node to node around the ring. When it
does not have any data attached to it, a node on the network can modify the frame, attach its data and
transmit. Each node on the network checks the token as it passes to see if the data is intended for that
node, if it is; it accepts the data and transmits a new token. If it is not intended for that node, it
retransmits the token on to the next node.
The token ring network is designed in such a way that each node on the network is guaranteed access
to the token at some point. This equalizes the data transfer on the network. This is different from an
Ethernet network where each workstation has equal access to grab the available bandwidth, with the
possibility of one node using more bandwidth than other nodes.
Originally, token ring operated at speeds of 4 Mbps and 16 Mbps. 802.5t allows for 100 Mbps
speeds and 802.5v provides for 1 Gbps over fiber.
Token ring can be run over a star topology as well as the ring topology.
There are three major cable types for token ring: unshielded twisted pair (UTP), shielded twisted pair
(STP), and fiber.
Token ring utilizes a Multi-station Access Unit (MAU) as a central wiring hub. This is also sometimes
called a MSAU when referring to token ring networks.
802.11 is the collection of standards set up for wireless networking. You are probably familiar with the
popular standards 802.11a, 802.11b and 802.11g; the latest one is 802.11n. Each standard uses a
frequency band to connect to the network and has a defined upper limit for data transfer speeds.
802.11a was one of the first wireless standards. 802.11a operates in the 5Ghz radio band and can
achieve a maximum of 54 Mbps. It wasn't as popular as the 802.11b standard due to higher prices and
lower range.
802.11b operates in the 2.4 GHz band and supports up to 11 Mbps, with a range of up to several
hundred feet in theory. It was the first real consumer option for wireless and was very popular.
802.11g is a standard in the 2.4Ghz band operating at 54Mbps. Since it operates in the same band as
802.11b, 802.11g is compatible with 802.11b equipment. 802.11a is not directly compatible with
802.11b or 802.11g since it operates in a different band.
Wireless LANs primarily use CSMA/CA - Carrier Sense Multiple Access/Collision Avoidance. It has a
"listen before talk" method of minimizing collisions on the wireless network. This results in less need
for retransmitting data.
Cryptography can reformat and transform our data, making it safer on its trip between
computers. The technology is based on the essentials of secret codes, augmented by modern
mathematics that protects our data in powerful ways.
Computer Security - generic name for the collection of tools designed to protect data and to
thwart hackers
Internet Security - measures to protect data during their transmission over a collection of
interconnected networks.
Security Attacks, Services and Mechanisms: To assess the security needs of an organization
effectively, the manager responsible for security needs some systematic way of defining the
requirements for security and characterization of approaches to satisfy those requirements. One
approach is to consider three aspects of information security:
Security attack: Any action that compromises the security of information owned by an
organization.
Security mechanism: A mechanism that is designed to detect, prevent or recover from a
security attack.
Security service: A service that enhances the security of the data processing systems and the
information transfers of an organization. The services are intended to counter security attacks
and they make use of one or more security mechanisms to provide the service.
Basic Concepts:
Cipher: An algorithm for transforming an intelligible message into one that is unintelligible by
transposition and/or substitution methods.
Key: Some critical information used by the cipher, known only to the sender and receiver.
Encipher (encode): The process of converting plaintext to cipher text using a cipher and a key.
Decipher (decode): The process of converting cipher text back into plaintext using a cipher and
a key.
Known-Plaintext Analysis (KPA): The attacker decrypts ciphertexts with the help of known partial plaintext.
Chosen-Plaintext Analysis (CPA): Attacker uses ciphertext that matches arbitrarily selected
plaintext via the same algorithm technique.
Ciphertext-Only Analysis (COA): Attacker uses known ciphertext collections.
Man-in-the-Middle (MITM) Attack: Attack occurs when two parties use message or key
sharing for communication via a channel that appears secure but is actually compromised.
Attacker employs this attack for the interception of messages that pass through the
communications channel. Hash functions prevent MITM attacks.
Adaptive Chosen-Plaintext Attack (ACPA): Similar to a CPA, this attack uses chosen plaintext
and ciphertext based on data learned from past encryptions.
Cryptology: Both cryptography and cryptanalysis.
Code: An algorithm for transforming an intelligible message into an unintelligible one using a
code-book.
Cryptography:
The type of operations used for transforming plain text to cipher text: all encryption algorithms
are based on two general principles: substitution, in which each element in the plaintext is
mapped into another element, and transposition, in which elements in the plaintext are
rearranged.
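A toy illustration of the two principles, sketched in Python (the shift of 3 and the column count of 4 are arbitrary choices for the example):

# Hedged sketch: substitution (Caesar shift) versus transposition (columnar rearrangement).
def substitute(text, shift=3):
    return "".join(chr((ord(ch) - 65 + shift) % 26 + 65) for ch in text)

def transpose(text, columns=4):
    rows = [text[i:i + columns] for i in range(0, len(text), columns)]
    return "".join("".join(row[c] for row in rows if c < len(row)) for c in range(columns))

print(substitute("ATTACKATDAWN"))   # each letter replaced  -> DWWDFNDWGDZQ
print(transpose("ATTACKATDAWN"))    # letters rearranged    -> ACDTKATAWATN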
The number of keys used: if the sender and receiver use the same key, the system is referred to as
symmetric key (or single key, or conventional) encryption. If the sender and receiver use
different keys, it is referred to as public key encryption.
The way in which the plain text is processed: a block cipher processes the input one block of
elements at a time, producing an output block for each input block. A stream cipher processes the
input elements continuously, producing output one element at a time, as it goes along.
Cryptanalysis:
There are various types of cryptanalytic attacks based on the amount of information known
to the cryptanalyst.
Ciphertext only: A copy of the cipher text alone is known to the cryptanalyst.
Known plaintext: The cryptanalyst has a copy of the cipher text and the corresponding plaintext.
Chosen plaintext: The cryptanalyst gains temporary access to the encryption machine. They
cannot open it to find the key; however, they can encrypt a large number of suitably chosen
plaintexts and try to use the resulting cipher texts to deduce the key.
Chosen ciphertext: The cryptanalyst obtains temporary access to the decryption machine,
uses it to decrypt several strings of symbols, and tries to use the results to deduce the key.
Diffie-Hellman:
Diffie-Hellman is a key-exchange protocol that lets two parties establish a shared secret over an
insecure channel. It is often explained with a colour-mixing analogy.
Working (based on the colour-mixing analogy):
Alice and Bob each produce a mix based upon their own secret colour
they exchange the mixes between them
each combines the received mix with their own secret colour to finalize a common secret
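The colour-mixing steps correspond to modular exponentiation; a toy sketch with insecurely small numbers (p = 23, g = 5 and the private values are illustrative only):

# Hedged sketch: textbook Diffie-Hellman with toy numbers (not secure).
p, g = 23, 5                       # public prime and generator (illustrative)
a, b = 6, 15                       # Alice's and Bob's private values ("secret colours")

A = pow(g, a, p)                   # Alice's public mix
B = pow(g, b, p)                   # Bob's public mix, exchanged over the open channel

print(pow(B, a, p), pow(A, b, p))  # both sides arrive at the same shared secret (2, 2)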
RSA:
RSA is used to come up with a public/private key pair for asymmetric ("public-key") encryption.
Working:
the sender encrypts the data to be transferred using the public key of the
recipient
the receiver decrypts the encrypted data using his private key
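A textbook-sized RSA sketch with tiny primes (the values are illustrative and far too small for real use; pow(e, -1, phi) needs Python 3.8+):

# Hedged sketch: textbook RSA key generation, encryption and decryption (not secure).
p, q = 61, 53
n = p * q                          # 3233, part of both keys
phi = (p - 1) * (q - 1)            # 3120
e = 17                             # public exponent, coprime with phi
d = pow(e, -1, phi)                # private exponent = modular inverse of e, here 2753

message = 65
cipher = pow(message, e, n)        # sender encrypts with the recipient's public key (e, n)
print(pow(cipher, d, n))           # receiver decrypts with the private key (d, n) -> 65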
Web application security is the process of securing confidential data stored online from
unauthorized access and modification. This is accomplished by enforcing stringent policy
measures. Security threats can compromise the data stored by an organization when hackers
with malicious intentions try to gain access to sensitive information.
The aim of Web application security is to identify the following:
SQL Injection
XSS (Cross Site Scripting)
Remote Command Execution
Path Traversal
1)SQL Injection: SQL injection is a type of security exploit in which the attacker adds
Structured Query Language (SQL) code to a Web form input box to gain access to
resources or make changes to data. An SQL query is a request for some action to be
performed on a database. Typically, on a Web form for user authentication, when a user
enters their name and password into the text boxes provided for them, those values are
inserted into a SELECT query. If the values entered are found as expected, the user is
allowed access; if they aren't found, access is denied. However, most Web forms have no
mechanisms in place to block input other than names and passwords. Unless such
precautions are taken, an attacker can use the input boxes to send their own request to
the database, which could allow them to download the entire database or interact with it in
other illicit ways. By injecting a SQL statement such as ' OR 1=1 --, the attacker can
access information stored in the web site's database. Of course, the example used above
represents a relatively simple SQL statement. Ones used by attackers are often much
more sophisticated, if they know what the tables in the database are, since these complex
statements can generally produce better results.
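The standard defence is to keep SQL code and user input separate with parameterized queries; a hedged sketch using Python's built-in sqlite3 (the table, column names and credentials are illustrative):

# Hedged sketch: string-built query (vulnerable) versus parameterized query (safe).
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 'secret')")

name, password = "anyone", "' OR '1'='1"      # attacker-controlled input

# Vulnerable: the input becomes part of the SQL statement itself.
unsafe = "SELECT * FROM users WHERE name = '%s' AND password = '%s'" % (name, password)
print(len(conn.execute(unsafe).fetchall()))   # 1 row returned: access granted!

# Safe: placeholders keep the input as data, never as executable SQL.
safe = "SELECT * FROM users WHERE name = ? AND password = ?"
print(len(conn.execute(safe, (name, password)).fetchall()))   # 0 rows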
2)Cross Site Scripting: Cross-Site Scripting (XSS) attacks are a type of injection, in
which malicious scripts are injected into otherwise benign and trusted web sites. XSS
attacks occur when an attacker uses a web application to send malicious code, generally
in the form of a browser side script, to a different end user. Flaws that allow these attacks
to succeed are quite widespread and occur anywhere a web application uses input from a
user within the output it generates without validating or encoding it. An attacker can use
XSS to send a malicious script to an unsuspecting user. The end user's browser has no
way to know that the script should not be trusted, and will execute the script. Because it
thinks the script came from a trusted source, the malicious script can access any cookies,
session tokens, or other sensitive information retained by the browser and used with that
site. These scripts can even rewrite the content of the HTML page.
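The usual countermeasure is to validate input and encode it before it is written into the page, so the browser renders it as text instead of executing it; a minimal sketch using Python's html.escape (the malicious input is made up):

# Hedged sketch: encoding untrusted input before placing it in an HTML page.
import html

comment = '<script>steal(document.cookie)</script>'      # made-up malicious input

page_unsafe = "<p>" + comment + "</p>"                    # the script would run in the victim's browser
page_safe = "<p>" + html.escape(comment) + "</p>"         # rendered as harmless text

print(page_safe)   # <p>&lt;script&gt;steal(document.cookie)&lt;/script&gt;</p>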
3)Remote Command Execution:Remote Command Execution vulnerabilities allow
attackers to pass arbitrary commands to other applications. In severe cases, the attacker
can obtain system level privileges allowing them to attack the servers from a remote
location and execute whatever commands they need for their attack to be successful.
HTTPS was originally used mainly to secure sensitive web traffic such as financial
transactions, but it is now common to see it used by default on many sites we use
in our day to day lives such as social networking and search engines. The HTTPS
protocol uses the Transport Layer Security (TLS) protocol, the successor to the
Secure Sockets Layer (SSL) protocol, to secure communications. When configured
and used correctly, it provides protection against eavesdropping and tampering,
along with a reasonable guarantee that a website is the one we intend to be using.
Or, in more technical terms, it provides confidentiality and data integrity, along with
authentication of the website's identity.
IPSec:IPsec (Internet Protocol Security) is a framework for a set of protocols for security at
the network or packet processing layer of network communication. It is an Internet
Engineering Task Force (IETF) standard suite of protocols that provides data
authentication, integrity, and confidentiality as data is transferred between communication
points across IP networks. IPSec provides data security at the IP packet level. A packet is
a data bundle that is organized for transmission across a network, and it includes a header
and payload (the data in the packet). IPSec emerged as a viable network security standard
because enterprises wanted to ensure that data could be securely transmitted over the
Internet. IPSec protects against possible security exposures by protecting data while in
transit.
Padding: 0 to 255 bytes, used for 32-bit alignment and to match the block size of the
block cipher.
Padding Length: Indicates the length of the Padding field in bytes. This field is
used by the receiver to discard the Padding field.
Next Header: Identifies the nature of the payload, such as TCP or UDP.
Authentication Data: Contains the Integrity Check Value (ICV), a message
authentication code that is used to verify the sender's identity and message integrity. The
ICV is calculated over the ESP header, the payload data and the ESP trailer.
In IPv4, the AH protects the IP payload and all header fields of an IP datagram
except for mutable fields (i.e. those that might be altered in transit), and also IP
options such as the IP Security Option (RFC 1108). Mutable (and therefore
unauthenticated) IPv4 header fields are DSCP/ToS, ECN, Flags, Fragment Offset,
TTL and Header Checksum.
In IPv6, the AH protects most of the IPv6 base header, AH itself, non-mutable
extension headers after the AH, and the IP payload. Protection for the IPv6 header
excludes the mutable fields: DSCP, ECN, Flow Label, and Hop Limit.
3)Internet Key Exchange (IKE): The Internet Key Exchange (IKE) is an IPsec (Internet
Protocol Security) standard protocol used to ensure security for virtual private network
(VPN) negotiation and remote host or network access. Specified in IETF Request for
Comments (RFC) 2409, IKE defines an automatic means of negotiation and authentication
for IPsec security associations (SA). Security associations are security policies defined for
communication between two or more entities; the relationship between the entities is
represented by a key. The IKE protocol ensures security for SA communication without the
preconfiguration that would otherwise be required.
Eliminates the need to manually specify all the IPSec security parameters in the
crypto maps at both peers.
Allows you to specify a lifetime for the IPSec security association.
Allows encryption keys to change during IPSec sessions.
Allows IPSec to provide anti-replay services.
Permits Certification Authority (CA) support for a manageable, scalable IPSec
implementation.
Allows dynamic authentication of peers.
Kerberos was created by MIT as a solution to these network security problems. The
Kerberos protocol uses strong cryptography so that a client can prove its identity to a
server (and vice versa) across an insecure network connection. After a client and server
have used Kerberos to prove their identity, they can also encrypt all of their communications
to assure privacy and data integrity as they go about their business.
Kerberos uses the concept of a ticket as a token that proves the identity of a user. Tickets
are digital documents that store session keys. They are typically issued during a login
session and then can be used instead of passwords for any Kerberized services. During
the course of authentication, a client receives two tickets:
A ticket-granting ticket (TGT), which acts as a global identifier for a user and a session
key
A service ticket, which authenticates a user to a particular service
These tickets include time stamps that indicate an expiration time after which they become
invalid. This expiration time can be set by Kerberos administrators depending on the
service.
To accomplish secure authentication, Kerberos uses a trusted third party known as a key
distribution center (KDC), which is composed of two components, typically integrated
into a single server:
An authentication server (AS), which performs user authentication
A ticket-granting server (TGS), which grants tickets to users
The authentication server keeps a database storing the secret keys of the users and
services. The secret key of a user is typically generated by performing a one-way hash of
the user-provided password. Kerberos is designed to be modular, so that it can be used
with a number of encryption protocols, with AES being the default cryptosystem.
Kerberos aims to centralize authentication for an entire network; rather than storing
sensitive authentication information at each user's machine, this data is maintained in only
one, presumably secure, location.
To start the Kerberos authentication process, the initiating client sends a request to an
authentication server for access to a service. The initial request is sent as plaintext
because no sensitive information is included in the request.The authentication server
retrieves the initiating client's private key, assuming the initiating client's username is in the
KDC database. If the initiating client's username cannot be found in the KDC database, the
client cannot be authenticated and the authentication process stops. If the client's
username can be found in the KDC database, the authentication server generates a
session key and a ticket granting ticket. The ticket granting ticket is timestamped and
encrypted by the authentication server with the initiating client's password.The initiating
client is then prompted for a password; if what is entered matches the password in the
KDC database, the encrypted ticket granting ticket sent from the authentication server is
decrypted and used to request a credential from the ticket granting server for the desired
service. The client sends the ticket granting ticket to the ticket granting server, which may
be physically running on the same hardware as the authentication server, but performing a
different role.
The ticket granting service carries out an authentication check similar to that performed by
the authentication server, but this time sends credentials and a ticket to access the
requested service. This transmission is encrypted with a session key specific to the user
and service being accessed. This proof of identity can be used to access the requested
"kerberized" service, which, once having validated the original request, will confirm its
identity to the requesting system.The timestamped ticket sent by the ticket granting service
allows the requesting system to access the service using a single ticket for a specific time
period without having to be re-authenticated. Making the ticket valid for a limited time
period makes it less likely that someone else will be able to use it later; it is also possible
to set the maximum lifetime to 0, in which case service tickets will not expire. Microsoft
recommends a maximum lifetime of 600 minutes for service tickets; this is the default
value in Windows Server implementations of Kerberos.
Kerberos Advantages
The Kerberos protocol is designed to be secure even when performed over an insecure
network.
Since each transmission is encrypted using an appropriate secret key, an attacker cannot
forge a valid ticket to gain unauthorized access to a service without compromising an
encryption key or breaking the underlying encryption algorithm, which is assumed to be
secure.
Kerberos is also designed to protect against replay attacks, where an attacker
eavesdrops on legitimate Kerberos communications and retransmits messages from an
authenticated party to perform unauthorized actions.
The inclusion of time stamps in Kerberos messages restricts the window in which an
attacker can retransmit messages.
Tickets may contain the IP addresses associated with the authenticated party to prevent
replaying messages from a different IP address.
Kerberized services make use of a replay cache, which stores previous authentication
tokens and detects their reuse.
Kerberos makes use of symmetric encryption instead of public-key encryption, which
makes Kerberos computationally efficient
The availability of an open-source implementation has facilitated the adoption of
Kerberos.
Kerberos Disadvantages
Kerberos has a single point of failure: if the Key Distribution Center becomes unavailable,
the authentication scheme for an entire network may cease to function. Larger networks
sometimes prevent such a scenario by having multiple KDCs, or having backup KDCs
available in case of emergency.
If an attacker compromises the KDC, the authentication information of every client and
server on the network would be revealed.
Kerberos requires that all participating parties have synchronized clocks, since time
stamps are used.
Virus: A computer virus is a program, script, or macro designed to cause damage, steal
personal information, modify data, send e-mail, display messages, or some combination of
these actions.When the virus is executed, it spreads by copying itself into or over data
files, programs, or boot sector of a computer's hard drive, or potentially anything else
writable. To help spread an infection the virus writers use detailed knowledge of security
vulnerabilities, zero days, or social engineering to gain access to a host's computer.
Types of Virus:
1)Boot Sector Virus:A Boot Sector Virus infects the first sector of the hard drive, where the
Master Boot Record (MBR) is stored. The Master Boot Record (MBR) stores the disk's
primary partition table and bootstrapping instructions which are executed after the
computer's BIOS passes execution to machine code. If a computer is infected with a Boot
Sector Virus, when the computer is turned on, the virus launches immediately and is
loaded into memory, enabling it to control the computer.Examples of boot viruses are
polyboot and antiexe.
2)File Deleting Viruses:A File Deleting Virus is designed to delete critical files which are
the part of Operating System or data files.
3)Mass Mailer Viruses:Mass Mailer Viruses search e-mail programs like MS outlook for e-
mail addresses which are stored in the address book and replicate by e-mailing
themselves to the addresses stored in the address book of the e-mail program.
4)Macro Virus: Document or macro viruses are written in a macro language. Such
languages are usually included in advanced applications such as word processing and
spreadsheet programs. The vast majority of known macro viruses replicate using the MS
Office program suite, mainly MS Word and MS Excel, but some viruses targeting other
applications are known as well. The symptoms of infection include the automatic restart of
computer again and again. Commonly known types of macro viruses are Melissa A,
Bablas and Y2K Bug.
5)File Infector: Another common problem is the file infector virus, which attaches itself to
program files and infects them while they are being processed, written or executed. Unwanted
dialog boxes start appearing on the screen with unknown statements, and files with the
extensions .com and .exe are typical targets. These viruses destroy the original copy of the file
and save the infected file with the same name as the original. Once infected, it is very
hard to recover the original data.
6)Stealth viruses: Stealth viruses have the capability to hide from operating system or anti-
virus software by making changes to file sizes or directory structure. Stealth viruses are
anti-heuristic in nature, which helps them hide from heuristic detection.
7)Resident Virus: These are threat programs that permanently reside in the
random access memory of the computer system. When the computer is started they are
automatically transferred to and from the secondary storage media, interrupting the
sequential operations of the processor and corrupting the running programs. For instance,
Randex and CMJ are commonly known resident viruses. If these viruses get into the hard disk
then one may have to replace the secondary storage media and sometimes even the RAM.
8)Polymorphic Viruses: Polymorphic viruses change their form in order to avoid detection
and disinfection by anti-virus applications. After each infection, these viruses try to hide
from the anti-virus application by encrypting parts of the virus itself. This is known as
mutation.
9)Retrovirus: A retrovirus is another type of virus which tries to attack and disable the anti-virus
application running on the computer. A retrovirus can be considered anti-antivirus. Some
retroviruses attack the anti-virus application and stop it from running, while others
destroy the virus definition database.
Worms:
A computer worm is a self-replicating computer program that penetrates an operating
system with the intent of spreading malicious code. Worms utilize networks to send copies
of the original code to other computers, causing harm by consuming bandwidth or possibly
deleting files or sending documents via email. Worms can also install backdoors on
computers. Worms are often confused with computer viruses; the difference lies in how
they spread. Computer worms self-replicate and spread across networks, exploiting
vulnerabilities automatically; that is, they don't need a cyber criminal's guidance, nor do
they need to latch onto another computer program.
A mail worm is carried by an email message, usually as an attachment but there have
been some cases where the worm is located in the message body. The recipient must
open or execute the attachment before the worm can activate. The
attachment may be a document with the worm attached in a virus-like manner, or it may
be an independent file. The worm may very well remain undetected by the user if it is
attached to a document. The document is opened normally and the user's attention is
probably focused on the document contents when the worm activates. Independent worm
files usually fake an error message or perform some similar action to avoid detection.
Pure worms have the potential to spread very quickly because they are not dependent on
any human actions, but the current networking environment is not ideal for them. They
usually require a direct real-time connection between the source and target computer
when the worm replicates.
Backdoor: These are created to give an unauthorized user remote control of a computer.
Once installed on a machine, the remote user can then do anything they wish with the
infected computer. This often results in uniting multiple backdoor Trojan-infected
computers working together for criminal activity.
Rootkit: Programmed to conceal files and computer activities, rootkits are often created to
hide further malware from being discovered. Normally, this is so malicious programs can
run for an extended period of time on the infected computer.
DDoS: A subset of backdoor Trojans; distributed denial-of-service (DDoS) attacks are launched
from numerous compromised computers to make a web address fail.
Banker: Trojan-bankers are created for the sole purpose of gathering users' bank, credit
card, debit card and e-payment information.
FakeAV: This type of Trojan is used to convince users that their computers are infected
with numerous viruses and other threats in an attempt to extort money. Often, the threats
aren't real, and the FakeAV program itself is what is causing problems in the first place.
Ransom: Trojan-Ransoms will modify or block data on a computer either so it doesn't work
properly or so certain files can't be accessed. The person disrupting the computer will
restore the computer or files only after a user has paid a ransom. Data blocked this way is
often impossible to recover without the criminal's approval.
1)SAML (Security Assertion Markup Language) is an open standard for exchanging
authentication information between a service provider and an identity provider (IdP). A
third-party IdP is used to authenticate users and to pass identity information to the service
provider in the form of a digitally signed XML (Extensible Markup Language)
document. Tableau Server is a service provider. Examples of IdPs include PingOne and
OneLogin.SAML is designed for business-to-business (B2B) and business-to-consumer
(B2C) transactions.
Single sign-on (SSO) is a session and user authentication service that permits a user to
use one set of login credentials (e.g., name and password) to access multiple applications.
The service authenticates the end user for all the applications the user has been given
rights to and eliminates further prompts when the user switches applications during the
same session. On the back end, SSO is helpful for logging user activities as well as
monitoring user accounts. Some SSO services use protocols such as Kerberos and the
Security Assertion Markup Language (SAML).
Protocol defines how SAML asks for and receives assertions. Binding defines how
SAML message exchanges are mapped to Simple Object Access Protocol (SOAP)
exchanges. SAML works with multiple protocols including Hypertext Transfer
Protocol (HTTP), Simple Mail Transfer Protocol (SMTP), File Transfer Protocol (FTP)
and also supports SOAP, BizTalk, and Electronic Business XML (ebXML). The
Organization for the Advancement of Structured Information Standards (OASIS) is
the standards group for SAML.
2)OAuth 2
OAuth, which was first released in 2007, was conceived as an authentication method for
the Twitter application program interface (API). In 2010, The IETF OAuth Working Group
published OAuth 2.0. Like the original OAuth, OAuth 2.0 provides users with the ability to
grant third-party access to web resources without sharing a password. Updated features
available in OAuth 2.0 include new flows, simplified signatures and short-lived tokens with
long-lived authorizations. OAuth 2 is an authorization framework that enables applications
to obtain limited access to user accounts on an HTTP service, such as Facebook, GitHub,
and DigitalOcean. It works by delegating user authentication to the service that hosts the
user account, and authorizing third-party applications to access the user account. OAuth 2
provides authorization flows for web and desktop applications, and mobile devices.
OpenID Connect is built directly on OAuth 2.0 and in most cases is deployed right along
with (or on top of) an OAuth infrastructure. OpenID Connect also uses the JSON Object
Signing And Encryption (JOSE) suite of specifications for carrying signed and encrypted
information around in different places. In fact, an OAuth 2.0 deployment with JOSE
capabilities is already a long way to defining a fully compliant OpenID Connect system,
and the delta between the two is relatively small.
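As a rough illustration of the OAuth 2 authorization code flow, the sketch below builds an authorization URL and then exchanges the returned code for an access token. The endpoints, client credentials and redirect URI are hypothetical placeholders, not those of any real provider; treat this as a sketch of the flow, not a definitive implementation.

import urllib.parse
import requests  # third-party HTTP library

# Hypothetical provider endpoints and client registration values
AUTHORIZE_URL = "https://auth.example.com/oauth/authorize"
TOKEN_URL = "https://auth.example.com/oauth/token"
CLIENT_ID = "my-client-id"
CLIENT_SECRET = "my-client-secret"
REDIRECT_URI = "https://myapp.example.com/callback"

# Step 1: send the user to the provider's authorization page
params = {
    "response_type": "code",
    "client_id": CLIENT_ID,
    "redirect_uri": REDIRECT_URI,
    "scope": "read",
    "state": "random-anti-csrf-value",
}
print("Visit:", AUTHORIZE_URL + "?" + urllib.parse.urlencode(params))

# Step 2: after the user approves, the provider redirects back with ?code=...
# Exchange that code for an access token (server-to-server request)
def exchange_code(code: str) -> dict:
    response = requests.post(TOKEN_URL, data={
        "grant_type": "authorization_code",
        "code": code,
        "redirect_uri": REDIRECT_URI,
        "client_id": CLIENT_ID,
        "client_secret": CLIENT_SECRET,
    })
    response.raise_for_status()
    return response.json()  # typically contains access_token, token_type, expires_in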
Firewall
A firewall is a network security device that monitors incoming and outgoing network
traffic and decides whether to allow or block specific traffic based on a defined set of
security rules.
Firewalls have been a first line of defense in network security for over 25 years. They
establish a barrier between secured, controlled internal networks that can be trusted and
untrusted outside networks, such as the Internet.
3. Stateful inspection firewalls, on the other hand, not only examine each
packet, but also keep track of whether or not that packet is part of an
established TCP session. This offers more security than either packet
filtering or circuit monitoring alone, but exacts a greater toll on network
performance.
While stateful inspection firewalls are the most secure, they are also rather complex
and the most likely to be misconfigured. Whichever firewall type you choose,
keep in mind that a misconfigured firewall can in some ways be worse than
no firewall at all, because it lends the dangerous impression of security while
providing little or none.
Digital Signature: Signature is the proof to the receiver that the document comes
from the correct entity. The person who signs it takes the responsibility of the content
present in the document. A signature on a document, when verified, is a sign of
authentication; the document is authentic.
The value of the hash is unique to the hashed data. Any change in the data,
even changing or deleting a single character, results in a different value. This
attribute enables others to validate the integrity of the data by using the
signer's public key to decrypt the hash. If the decrypted hash matches a
second computed hash of the same data, it proves that the data hasn't
changed since it was signed. If the two hashes don't match, the data has
either been tampered with in some way (integrity) or the signature was
created with a private key that doesn't correspond to the public key
presented by the signer (authentication).A digital signature can be used with
any kind of message -- whether it is encrypted or not -- simply so the receiver
can be sure of the sender's identity and that the message arrived intact.
Digital signatures make it difficult for the signer to deny having signed
something (non-repudiation) -- assuming their private key has not been
compromised -- as the digital signature is unique to both the document and
the signer, and it binds them together. A digital certificate, an electronic
document that contains the digital signature of the certificate-issuing
authority, binds together a public key with an identity and can be used to
verify a public key belongs to a particular person or entity.
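The hash-based integrity idea can be illustrated with a minimal sketch using Python's standard hashlib; it shows only the hashing and comparison step, not the public-key signing and decryption described above, and the messages are made up for the example.

import hashlib

def digest(data: bytes) -> str:
    # SHA-256 produces a fixed-size value that changes if even one byte changes
    return hashlib.sha256(data).hexdigest()

original = b"Pay 100 rupees to account 12345"
tampered = b"Pay 900 rupees to account 12345"

sent_hash = digest(original)          # hash computed (and signed) by the sender
print(digest(original) == sent_hash)  # True  -> data unchanged
print(digest(tampered) == sent_hash)  # False -> data has been tampered with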
Hackers are classified according to the intent of their actions.
What is Cybercrime:
Cybercrime is the use of computers and networks to perform illegal activities such
as spreading computer viruses, online bullying, performing unauthorized electronic
fund transfers, etc. Most cybercrimes are committed through the internet. Some
cybercrimes can also be carried out using Mobile phones via SMS and online
chatting applications.
Type of Cybercrime
The following list presents the common types of cybercrimes:
Computer Fraud: Intentional deception for personal gain via the use of
computer systems.
Privacy violation: Exposing personal information such as email addresses,
phone number, account details, etc. on social media, websites, etc.
Identity Theft: Stealing personal information from somebody and
impersonating that person.
Sharing copyrighted files/information: This involves distributing copyright
protected files such as eBooks and computer programs etc.
Electronic funds transfer: This involves gaining an un-authorized access to
bank computer networks and making illegal fund transfers.
Electronic money laundering: This involves the use of the computer to
launder money.
ATM Fraud: This involves intercepting ATM card details such as account
number and PIN numbers. These details are then used to withdraw funds
from the intercepted accounts.
Denial of Service Attacks: This involves the use of computers in multiple
locations to attack servers with a view of shutting them down.
Spam: Sending unauthorized emails. These emails usually contain
advertisements.
1)Web Bugs: A Web bug is a small GIF format image file that can be embedded in a Web page
or an HTML format email message. A Web bug can be as small as a single pixel in size and
can easily be hidden anywhere in an HTML document.
3)Cookies: A cookie is a small text file that a Web server asks your browser to place on your
computer. The cookie contains information that identifies your computer (its IP address),
you (your user name and email address), and information about your visit to the Web site.
If you set up an account at a Web site such as an e-commerce site, the cookie will contain
information about your account, making it easy for the server to find and manage your
account whenever you visit.
4)Snagging: In the right setting, a thief can try snagging information by listening in on
a telephone extension, through a wiretap, or over a cubicle wall while the victim gives
credit card or other personal information to a legitimate agent.
5)Flooders: Used to attack networked computer systems with a large volume of traffic
to carry out a denial of service (DoS) attack.
6)Rootkit: Set of hacker tools used after attacker has broken into computer system
and gained root-level access.
Types of Intrusions:
External attacks
attempted break-ins, denial of service attacks, etc.
Internal attacks
Masquerading as some other user
Mechanisms Used:
Examples :
Car Alarms
House Alarms
Surveillance Systems
Spy Satellites, and spy planes
Artificial neural networks: Non-linear predictive models that learn through training
and resemble biological neural networks in structure.
Rule induction: The extraction of useful if-then rules from data based on statistical
significance.
2)Data Warehouse:
A data warehouse is a:
subject-oriented
integrated
time-variant
non-volatile
collection of data in support of the management's decision-making process. A data
warehouse is a centralized repository that stores data from multiple information sources
and transforms them into a common, multidimensional data model for efficient querying
and analysis.
Subject Oriented:Data warehouses are designed to help you analyze data. For example,
to learn more about your company's sales data, you can build a warehouse that
concentrates on sales. Using this warehouse, you can answer questions like "Who was
our best customer for this item last year?" This ability to define a data warehouse by
subject matter, sales in this case, makes the data warehouse subject oriented.
Nonvolatile:Nonvolatile means that, once entered into the warehouse, data should not
change. This is logical because the purpose of a warehouse is to enable you to analyze
what has occurred.
Time Variant:In order to discover trends in business, analysts need large amounts of data.
This is very much in contrast to online transaction processing (OLTP) systems, where
performance requirements demand that historical data be moved to an archive. A data
warehouse's focus on change over time is what is meant by the term time variant.
There are two approaches to data warehousing, top down and bottom up. The top down
approach spins off data marts for specific groups of users after the complete data
warehouse has been created. The bottom up approach builds the data marts first and then
combines them into a single, all-encompassing data warehouse.
Slice and dice refers to a strategy for segmenting, viewing and understanding data in a
database. Users slice and dice by cutting a large segment of data into smaller parts, and
repeating this process until arriving at the right level of detail for analysis. Slicing and
dicing helps provide a closer view of data for analysis and presents data in new and
diverse perspectives. The term is typically used with OLAP databases that present
information to the user in the form of multidimensional cubes similar to a 3D spreadsheet.
ETL process:
ETL (Extract, Transform and Load) is a process in data warehousing responsible for
pulling data out of the source systems and placing it into a data warehouse. ETL involves
the following tasks:
Extracting the data from source systems (SAP, ERP, other operational systems); data
from the different source systems is converted into one consolidated data warehouse format
which is ready for transformation processing.
Transforming the data, which may involve tasks such as:
applying business rules (so-called derivations, e.g., calculating new measures and
dimensions),
cleaning (e.g., mapping NULL to 0 or "Male" to "M" and "Female" to "F" etc.),
filtering (e.g., selecting only certain columns to load),
splitting a column into multiple columns and vice versa,
joining together data from multiple sources (e.g., lookup, merge),
transposing rows and columns,
applying any kind of simple or complex data validation (e.g., if the first 3 columns
in a row are empty then reject the row from processing).
Loading the data into a data warehouse or data repository, or into other reporting
applications. A small transformation sketch follows this list.
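A minimal sketch of the transformation step, assuming rows arrive as Python dictionaries with hypothetical column names; it applies the cleaning and filtering rules mentioned above.

# Hypothetical source rows pulled from an operational system
source_rows = [
    {"id": 1, "gender": "Male", "sales": None, "notes": "ignore me"},
    {"id": 2, "gender": "Female", "sales": 250, "notes": "ignore me"},
]

def transform(row: dict) -> dict:
    return {
        "id": row["id"],                                                  # filtering: keep only needed columns
        "gender": {"Male": "M", "Female": "F"}.get(row["gender"], "U"),   # cleaning ("U" = unknown, an assumption)
        "sales": row["sales"] if row["sales"] is not None else 0,         # map NULL to 0
    }

warehouse_rows = [transform(r) for r in source_rows]                      # ready to load
print(warehouse_rows)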
A Data Mart is one piece of a data warehouse where all the information is related to
a specific business area. Therefore it is considered a subset of all the data stored in
that particular database, since all data marts together create a data warehouse.
This idea of subsetting the information can be easily extrapolated to different
departments in a company or distinct business areas with lots of data related to it.
They are all related to the same company but divided by usability into several data
marts.
So a data mart is a subset of data specific to the tasks of some group of users, creating a
view in a format that makes information easier to use and analyse by the end users of
your system.
Data Lake: Data warehousing applies structure to data on the way in, organizing it to
fit the context of the database schema. Data lakes facilitate a much more fluid
approach; they only add structure to data as it is dispensed to the application layer.
In storage, data lakes preserve the original structured or unstructured forms; a data
lake is a Big Data storage and retrieval system that could conceivably scale upward
indefinitely. Data lakes are often associated with Hadoop-oriented object storage. In
such a scenario, an organization's data is first loaded into the Hadoop platform, and
then business analytics and data mining tools are applied to the data where it resides
on Hadoop's cluster nodes of commodity computers. Microsoft Azure Data Lake is a
highly scalable data storage and analytics service. The service is hosted in Azure,
Microsoft's public cloud, and is largely intended for big data storage and analysis.
Data SWAMP: Data lakes do not require much structure, and they accept all data.
However, in poorly designed and neglected systems, they risk becoming data swamps.
A Data Swamp is the term that describes the failure to document the stored data
accurately, resulting in the inability to analyze and exploit the data efficiently; the
original data may remain, but the data swamp cannot retrieve it without the metadata
that gives it context.
Data Cube: A Data Cube is an application that puts data into matrices of three or more
dimensions. Transformations of the data are expressed as tables, i.e. arrays of processed
information. Where tables match rows of data strings with columns of data types,
a data cube cross-references tables from single or multiple data sources to increase
the detail associated with each data point. This transformation connects the data to a
position in rows and columns of more than one table. The benefit is that knowledge
workers can use data cubes to create data volumes to drill down into and discover the
deepest insights possible.
Partitioning Method
Hierarchical Method
Density-based Method
Grid-Based Method
Model-Based Method
Constraint-based Method
Data mining is used for market basket analysis to provide information on what
product combinations were purchased together when they were bought and in
what sequence. This information helps businesses promote their most
profitable products and maximize profit. In addition, it encourages customers
to purchase related products that they may have missed or overlooked.
Retail companies use data mining to identify customers' buying behavior and
patterns.
Several data mining techniques e.g., distributed data mining have been
researched, modeled and developed to help credit card fraud detection.
Data mining is used to identify customers' loyalty by analyzing the data of
customers' purchasing activities, such as the frequency of purchases in a period
of time, the total monetary value of all purchases and when the last purchase
was made. After analyzing those dimensions, a relative measure is generated for
each customer. The higher the score, the more loyal the customer is.
To help the bank to retain credit card customers, data mining is applied. By
analyzing past data, data mining can help banks predict customers that are likely
to change their credit card affiliation, so they can plan and launch special offers
to retain those customers.
Credit card spending by customer groups can be identified by using data
mining.
The hidden correlations between different financial indicators can be
discovered by using data mining.
From historical market data, data mining makes it possible to identify stock trading rules.
The growth of the insurance industry entirely depends on the ability to convert data into
the knowledge, information or intelligence about customers, competitors, and its
markets. Data mining has been applied in the insurance industry only recently, but it has
brought tremendous competitive advantages to the companies that have implemented it
successfully. The data mining applications in the insurance industry are listed below:
Data mining helps determine the distribution schedules among warehouses and
outlets and analyze loading patterns.
Characteristics of SRS:
Correct
Complete and Unambiguous
Verifiable
Consistent
Traceable
Modifiable
Software Life Cycle Models:
A software life cycle model (also called process model) is a descriptive and diagrammatic
representation of the software life cycle. A life cycle model represents all the activities
required to make a software product transit through its life cycle phases. It also
captures the order in which these activities are to be undertaken. In other words, a
life cycle model maps the different activities performed on a software product from its
inception to retirement. Different life cycle models may map the basic development
activities to phases in different ways. Thus, no
matter which life cycle model is followed, the basic activities are included in all
life cycle models though the activities may be carried out in different orders in
different life cycle models. During any life cycle phase, more than one activity may also
be carried out. A software life cycle model is a particular abstraction representing a
software life cycle.Such a model may be:
Waterfall Model:
The Waterfall Model was the first Process Model to be introduced.
The Waterfall Model is a linear sequential flow. In which progress is seen as flowing
steadily downwards (like a waterfall) through the phases of software implementation.
This means that any phase in the development process begins only if the previous
phase is complete. The waterfall approach does not define the process to go back to
the previous phase to handle changes in requirement. The waterfall approach is the
earliest approach that was used for software development.
This model is used only when the requirements are very well known, clear and
fixed.
Product definition is stable.
Technology is understood.
There are no ambiguous requirements
Ample resources with required expertise are available freely
The project is short.
Very little customer interaction is involved during the development of the product.
Only once the product is ready can it be demoed to the end users. Once the product is
developed, if any failure occurs then the cost of fixing such issues is very high, because
we need to update everything from the documents to the logic.
Business Modeling
Data Modeling
Process Modeling
Application Generation
Testing and Turnover
1)Business Modeling: The business model for the product under development is designed
in terms of flow of information and the distribution of information between various business
channels. A complete business analysis is performed to find the vital information for
business, how it can be obtained, how and when is the information processed and what
are the factors driving successful flow of information.
2)Data Modeling: Once the business modeling phase is over and the business analysis is
complete, all the data required by the business analysis is identified in the data modeling
phase.
3)Process modeling: Data objects defined in data modeling are converted to achieve the
business information flow needed to achieve specific business objectives. Descriptions are
identified and created for CRUD (create, read, update and delete) operations on the data objects.
4)Application Generation: The actual system is built and coding is done by using
automation tools to convert process and data models into actual prototypes.
5)Testing and turnover: All the testing activities are performed to test the developed
application.
Iterative Model: This model leads the software development process in iterations. It
projects the process of development in cyclic manner repeating every step after every
cycle of SDLC process. The software is first developed on very small scale and all the
steps are followed which are taken into consideration. Then, on every next iteration,
more features and modules are designed, coded, tested, and added to the software.
Every cycle produces a software, which is complete in itself and has more features and
capabilities than that of the previous one. After each iteration, the management team
can do work on risk management and prepare for the next iteration. Because a cycle
includes small portion of whole software process, it is easier to manage the
development process but it consumes more resources.
In the iterative model we create only a high-level design of the application before we
actually begin to build the product, rather than defining the design solution for the entire
product. Later on we design and build a skeleton version of it, and then evolve the
design based on what has been built.
In iterative model we are building and improving the product step by step. Hence we
can track the defects at early stages. This avoids the downward flow of the defects.
In iterative model we can get the reliable user feedback. When presenting sketches
and blueprints of the product to users for their feedback, we are effectively asking them
to imagine how the product will work.
In iterative model less time is spent on documenting and more time is given for
designing.
V Model or Verification and Validation Model. Every testing execution should follow
some sequence and V Model is the perfect way to perform the testing approaches. In
V Model there are some steps or sequences specified which should be followed during
performing test approach. Once one step completes we should move to the next step.
Test execution sequences are followed in V shape. In software development life cycle,
V Model testing should start at the beginning of the project when requirement analysis
starts. In V Model project development and testing should go parallel. Verification
phase should be carried out from SDLC where validation phase should be carried out
from STLC (Software Testing Life Cycle)
Steps in V Model
Basically there are 4 steps involved in STLC while performing V Model testing strategy.
Unit Testing.
Integration Testing.
System Testing.
Acceptance Testing.
Advantages of V Model
If project is small and easy to understand, V Model is the best approach as its
easy and simple to use.
Many testing activities are performed in the beginning like planning and design
which saves lots of testing time.
Most of the defects and bugs are found in the beginning of the project
development, so there is less chance of a defect or bug occurring in the final
testing phase.
Disadvantages of V Model
Guessing the error in the beginning of the project could take more time.
Less flexibility.
Any unplanned change made in the middle of development can make it difficult
to update all the affected places, such as the test documents and requirements.
In case of some software deliverables, especially the large ones, it is difficult to assess
the effort required at the beginning of the software development life cycle.
There is lack of emphasis on necessary designing and documentation.
The project can easily get taken off track if the customer representative is not clear
about the final outcome they want.
Only senior programmers are capable of taking the kind of decisions required during
the development process. Hence it has no place for newbie programmers, unless
combined with experienced resources.
When to use Agile model:
When new changes are needed to be implemented. The freedom agile gives to
change is very important. New changes can be implemented at very little cost because
of the frequency of new increments that are produced.
To implement a new feature the developers need to lose only the work of a few days,
or even only hours, to roll back and implement it.
Unlike the waterfall model in agile model very limited planning is required to get started
with the project. Agile assumes that end users' needs are ever changing in a
dynamic business and IT world. Changes can be discussed and features can be newly
added or removed based on feedback. This effectively gives the customer the
finished system they want or need.
Both system developers and stakeholders alike, find they also get more freedom of
time and options than if the software was developed in a more rigid sequential way.
Having options gives them the ability to leave important decisions until more or better
data or even entire hosting programs are available; meaning the project can continue
to move forward without fear of reaching a sudden standstill.
Generates working software quickly and early during the software life cycle.
More flexible; less costly to change scope and requirements.
Easier to test and debug during a smaller iteration.
Easier to manage risk because risky pieces are identified and handled during its
iteration.
Each iteration is an easily managed milestone.
Disadvantages of Incremental Model
Such models are used where requirements are clear and can be implemented phase
wise. The requirements are divided into increments R1, R2, ..., Rn and delivered
accordingly.
Mostly such model is used in web applications and product based companies.
The Prototyping Model is applied when detailed information related to input and
output requirements of the system is not available. In this model, it is assumed that all
the requirements may not be known at the start of the development of the system. It is
usually used when a system does not exist or in case of a large and complex system
where there is no manual process to determine the requirements. This model allows
the users to interact and experiment with a working model of the system known
as prototype. The prototype gives the user an actual feel of the system.
Prototype model should be used when the desired system needs to have a lot of
interaction with the end users.
Typically, online systems, web interfaces have a very high amount of interaction with
end users, are best suited for Prototype model. It might take a while for a system to be
built that allows ease of use and needs minimal training for the end user.
Prototyping ensures that the end users constantly work with the system and provide a
feedback which is incorporated in the prototype to result in a useable system. They are
excellent for designing good human computer interface systems.
Big Bang Model This model is the simplest model in its form. It requires little planning,
lots of programming and lots of funds. This model is conceptualized around the big
bang of universe. As scientists say that after big bang lots of galaxies, planets, and
stars evolved just as an event. Likewise, if we put together lots of programming and
funds, you may achieve the best software product. This model is not suitable for large
software projects but good one for learning and experimenting.
COCOMO Model: The Constructive Cost Model was developed by Barry Boehm; it is a
software cost estimation model. It works by combining a regression formula with
predetermined parameters that are derived from the data of past projects. Its main
advantages are: you can determine the costs that will be incurred when investing in a
particular project; the estimates and related information obtained are factual, so the
results are accurate; the structure of the model can be customized to your convenience;
and it can be repeated any number of times, which means you can calculate the cost of a
project initially and then determine how changes and modifications will affect the initial
estimates. Ease of use is what has made this model popular; it allows users to be in full
control of their projects and all the costs entailed. It is also well documented and
calibrated, offering precise calculations. The basic effort and schedule equations are
sketched below.
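For reference, basic COCOMO estimates effort in person-months from size in KLOC as Effort = a*(KLOC)^b and development time as Time = c*(Effort)^d. The sketch below uses the commonly quoted coefficients for the organic project mode (a=2.4, b=1.05, c=2.5, d=0.38); treat the numbers as illustrative.

def basic_cocomo_organic(kloc: float) -> tuple:
    # Basic COCOMO, organic-mode coefficients (commonly quoted values)
    a, b, c, d = 2.4, 1.05, 2.5, 0.38
    effort = a * (kloc ** b)          # effort in person-months
    time = c * (effort ** d)          # development time in months
    staff = effort / time             # average number of people needed
    return effort, time, staff

effort, time, staff = basic_cocomo_organic(32)   # e.g. a hypothetical 32 KLOC project
print(f"Effort: {effort:.1f} person-months, "
      f"Time: {time:.1f} months, Avg staff: {staff:.1f}")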
Gantt Chart: A Gantt chart is a horizontal bar chart developed as a production control
tool in 1917 by Henry L. Gantt, an American engineer and social scientist. Frequently
used in project management, a Gantt chart provides a graphical illustration of a
schedule that helps to plan, coordinate, and track specific tasks in a project. Gantt
charts may be simple versions created on graph paper or more complex automated
versions created using project management applications such as Microsoft Project or
Excel. They can also be used for scheduling production processes and employee
rostering. In the latter context, they may also be known as timebar schedules. Gantt
charts can be used to track shifts or tasks and also vacations or other types of out-of-
office time. Specialized employee scheduling software may output schedules as a
Gantt chart, or they may be created through popular desktop publishing software.
VERIFICATION
Verification is the process to make sure the product satisfies the conditions imposed at
the start of the development phase. In other words, to make sure the product behaves
the way we want it to.
VALIDATION
Validation is the process to make sure the product satisfies the specified requirements
at the end of the development phase. In other words, to make sure the product is built
as per customer requirements.
Functionality testing
Implementation testing
When functionality is being tested without taking the actual implementation in concern
it is known as black-box testing. The other side is known as white-box testing where
not only functionality is tested but the way it is implemented is also analyzed.
There are many types of Black Box Testing, but the following are the prominent ones.
Black box testing has its own life cycle called Software Test Life Cycle (STLC)
and it is relative to every stage of Software Development Life Cycle.Some
famous Black Box testing techniques are Boundary value analysis, state
transition testing, equivalence partitioning.
2)White Box Testing:It is also known as Clear Box Testing, Open Box Testing, Glass
Box Testing, Transparent Box Testing, Code-Based Testing or Structural Testing.
It is a software testing method in which the internal structure/ design/ implementation
of the item being tested is known to the tester. The tester chooses inputs to exercise
paths through the code and determines the appropriate outputs. Programming know-
how and the implementation knowledge is essential. White box testing is testing
beyond the user interface and into the nitty-gritty of a system.
This method is named so because the software program, in the eyes of the tester, is
like a white/ transparent box; inside which one clearly sees.
White box testing, on its own, cannot identify problems caused by mismatches
between the actual requirements or specification and the code as implemented but it
can help identify some types of design weaknesses in the code. Examples include
control flow problems (e.g., closed or infinite loops or unreachable code), and data flow
problems (e.g., trying to use a variable which has no defined value). Static code
analysis (by a tool) may also find these sorts of problems, but doesn't help the
tester/developer understand the code to the same degree that personally designing
white-box test cases does.
Gray Box Testing is named so because the software program, in the eyes of the tester
is like a gray/ semi-transparent box; inside which one can partially see.
Gray Box Testing gives the ability to test both sides of an application, presentation
layer as well as the code part. It is primarily useful in Integration Testing and
Penetration Testing. Grey-box testing is a perfect fit for Web-based applications. Grey-box
testing is also a good approach for functional or domain testing.
Matrix Testing: This testing technique involves defining all the variables that
exist in their programs.
Regression Testing: To check whether the change in the previous version has
regressed other aspects of the program in the new version. It will be done by
testing strategies like retest all, retest risky use cases, retest within firewall.
Orthogonal Array Testing or OAT: It provides maximum code coverage with
minimum test cases.
Pattern Testing: This testing is performed on the historical data of the previous
system defects. Unlike black box testing, gray box testing digs within the code
and determines why the failure happened
Usually, the grey box methodology uses automated software testing tools to conduct the
testing. Stubs and module drivers are created to relieve the tester from having to
manually generate code.
There are many other types of testing like:
Acceptance Testing
Acceptance testing is often done by the customer to ensure that the delivered product
meets the requirements and works as the customer expected. It falls under the class of
black box testing.
Regression Testing
Regression testing is the testing after modification of a system, component, or a group
of related units to ensure that the modification is working correctly and is not damaging
or imposing other modules to produce unexpected results. It falls under the class of
black box testing.
Beta Testing
Beta testing is the testing which is done by end users, a team outside development, or
publicly releasing full pre-version of the product which is known as beta version. The
aim of beta testing is to cover unexpected errors. It falls under the class of black box
testing.
Unit Testing
Unit testing is the testing of an individual unit or group of related units. It falls under the
class of white box testing. It is often done by the programmer to test that the unit
he/she has implemented is producing expected output against given input.Statements,
functions, methods, interfaces i.e units of the code are individually tested for proper
execution. It can be automated or can be done manually. Usually small data is used
for unit testing.
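As a small illustration of automated unit testing, the sketch below uses Python's built-in unittest module to test a hypothetical add() function against expected outputs; the function and values are made up for the example.

import unittest

def add(a, b):
    # unit under test: a deliberately simple, hypothetical function
    return a + b

class TestAdd(unittest.TestCase):
    def test_positive_numbers(self):
        self.assertEqual(add(2, 3), 5)      # expected output for the given input

    def test_negative_numbers(self):
        self.assertEqual(add(-2, -3), -5)   # a negative scenario is also covered

if __name__ == "__main__":
    unittest.main()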
Integration Testing
Integration testing is testing in which a group of components are combined to produce
output. Also, the interaction between software and hardware is tested in integration
testing if software and hardware components have any relation. It may fall under both
white box testing and black box testing. Different approaches used in integration
testing are: top down & bottom up integration testing, sandwich
testing (combination of both).
Stress Testing
Stress testing is the testing to evaluate how system behaves under unfavorable
conditions. Testing is conducted at beyond limits of the specifications. It falls under the
class of black box testing.
Performance Testing
Performance testing is the testing to assess the speed and effectiveness of the system
and to make sure it is generating results within a specified time as in performance
requirements. It falls under the class of black box testing.
Functional Testing
Functional testing is the testing to ensure that the specified functionality required in the
system requirements works. It falls under the class of black box testing.
System Testing
System testing is the testing to ensure that by putting the software in different
environments (e.g., Operating Systems) it still works. System testing is done with full
system implementation and environment. It falls under the class of black box testing. It
is performed after integration testing. Various approaches used are: load testing,
smoke testing, security testing, migration testing etc.
Usability Testing
Usability testing is performed from the perspective of the client, to evaluate how
user-friendly the GUI is. How easily can the client learn it? After learning how to use it,
how proficiently can the client perform? How pleasing is the design to use? This falls
under the class of black box testing.
Data Flow Diagram: Data Flow Diagram (DFD) is a graphical representation of flow
of data in an information system. It is capable of depicting incoming data flow, outgoing
data flow, and stored data. The DFD does not mention anything about how data flows
through the system. There is a prominent difference between DFD and Flowchart. The
flowchart depicts flow of control in program modules. DFDs depict flow of data in the
system at various levels. It does not contain any control or branch elements.
Logical DFD - This type of DFD concentrates on the system process, and flow of
data in the system. For example in a banking software system, how data is moved
between different entities.
Physical DFD - This type of DFD shows how the data flow is actually implemented in
the system. It is more specific and close to the implementation.
DFD Components
DFD can represent Source, destination, storage and flow of data using the following
set of components -
Entities - Entities are the source and destination of information data. Entities are
represented by rectangles with their respective names.
Process - Activities and action taken on the data are represented by Circle or
Round-edged rectangles.
Data Storage - There are two variants of data storage - it can either be
represented as a rectangle with absence of both smaller sides or as an open-
sided rectangle with only one side missing.
Data Flow - Movement of data is shown by pointed arrows. Data movement is
shown from the base of arrow as its source towards head of the arrow as
destination.
Levels of DFD:
DFD Level 0 is also called a Context Diagram. It's a basic overview of the whole
system or process being analyzed or modeled. It's designed to be an at-a-glance view,
showing the system as a single high-level process, with its relationship to external
entities. It should be easily understood by a wide audience, including stakeholders,
business analysts, data analysts and developers.
DFD Level 1 provides a more detailed breakout of pieces of the Context Level
Diagram. You will highlight the main functions carried out by the system, as you break
down the high-level process of the Context Diagram into its subprocesses.
DFD Level 2 then goes one step deeper into parts of Level 1. It may require more text
to reach the necessary level of detail about the system's functioning.
Structure chart is a chart derived from Data Flow Diagram. It represents the system
in more detail than DFD. It breaks down the entire system into lowest functional
modules, describes functions and sub-functions of each module of the system to a
greater detail than DFD. Structure chart represents hierarchical structure of modules.
At each layer a specific task is performed.
Decision Table Testing is a good way to deal with combinations of inputs which
produce different results. It helps reduce the test effort in verifying each and every
combination of test data, while at the same time ensuring complete coverage. A small
sketch follows.
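A minimal sketch of the idea, assuming a hypothetical login rule with two conditions (valid user, valid password): each row of the table is one combination of inputs with its expected outcome, and the test loops over every combination.

# Decision table for a hypothetical login check:
# (valid_user, valid_password) -> expected outcome
decision_table = [
    (True,  True,  "login succeeds"),
    (True,  False, "login fails"),
    (False, True,  "login fails"),
    (False, False, "login fails"),
]

def login(valid_user: bool, valid_password: bool) -> str:
    # system under test (hypothetical rule)
    return "login succeeds" if valid_user and valid_password else "login fails"

# Execute one test per row, covering every combination exactly once
for valid_user, valid_password, expected in decision_table:
    assert login(valid_user, valid_password) == expected
print("all combinations covered")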
Active Data Dictionary: Any changes to the database object structure via DDLs will
have to be reflected in the data dictionary. But updating the data dictionary tables for
these changes is the responsibility of the database in which the data dictionary exists. If the
data dictionary is created in the same database, then the DBMS software will
automatically update the data dictionary. Hence there will not be any mismatch
between the actual structure and the data dictionary details. Such data dictionary is
called active data dictionary.
User can change the structure of database objects by using DDLs. But users can not
change the structure/content of data dictionary tables/views. All the data dictionary
tables/views are controlled and managed by DBMS. Users do not have any
modification rights on them.
UML stands for Unified Modeling Language. UML 2.0 helped extend the original
UML specification to cover a wider portion of software development efforts including
agile practices.
Improved integration between structural models like class diagrams and behavior
models like activity diagrams.
Added the ability to define a hierarchy and decompose a software system into
components and sub-components.
The original UML specified nine diagrams; UML 2.x brings that number up to 13.
The four new diagrams are called: communication diagram, composite structure
diagram, interaction overview diagram, and timing diagram. It also renamed
statechart diagrams to state machine diagrams, also known as state diagrams.
Types of UML:
Structural UML diagrams
Class diagram
Class diagrams are the backbone of almost every object-oriented method,
including UML. They describe the static structure of a system.
Package diagram
Package diagrams are a subset of class diagrams, but developers sometimes
treat them as a separate technique. Package diagrams organize elements of a
system into related groups to minimize dependencies between packages.
Object diagram
Object diagrams describe the static structure of a system at a particular time.
They can be used to test class diagrams for accuracy.
Component diagram
Component diagrams describe the organization of physical software
components, including source code, run-time (binary) code, and executables.
Composite structure diagram
Composite structure diagrams show the internal part of a class.
Deployment diagram
Deployment diagrams depict the physical resources in a system, including
nodes, components, and connections.
Behavioral UML diagrams
Activity diagram
Activity diagrams illustrate the dynamic nature of a system by modeling the flow
of control from activity to activity. An activity represents an operation on some
class in the system that results in a change in the state of the system. Typically,
activity diagrams are used to model workflow or business processes and internal
operation.
Sequence diagram
Sequence diagrams describe interactions among classes in terms of an
exchange of messages over time.
Use case diagram
Use case diagrams model the functionality of a system using actors and use
cases.
State diagram
Statechart diagrams, now known as state machine diagrams or state diagrams,
describe the dynamic behavior of a system in response to external stimuli. State
diagrams are especially useful in modeling reactive objects whose states are
triggered by specific events.
Communication diagram
Communication diagrams model the interactions between objects in sequence.
They describe both the static structure and the dynamic behavior of a system.
Interaction overview diagram
Interaction overview diagrams are a combination of activity and sequence
diagrams. They model a sequence of actions and let you deconstruct more
complex interactions into manageable occurrences.
Timing diagram
A timing diagram is a type of behavioral or interaction UML diagram that focuses
on processes that take place during a specific period of time. They're a special
instance of a sequence diagram, except time is shown to increase from left to
right instead of top down.
Software Quality:
Quality software is reasonably bug or defect free, delivered on time and within budget,
meets requirements and/or expectations, and is maintainable.
ISO 8402-1986 standard defines quality as the totality of features and characteristics
of a product or service that bear on its ability to satisfy stated or implied needs.
Once the processes have been defined and implemented, Quality Assurance has the
following responsibilities:
The quality management system under which the software system is created is
normally based on one or more of the following models/standards:
CMMI
Six Sigma
ISO 9000
Note: There are many other models/standards for quality management but the ones
mentioned above are the most popular.
Software Quality Assurance encompasses the entire software development life cycle
and the goal is to ensure that the development and/or maintenance processes are
continuously improved to produce products that meet specifications/requirements. The
process of Software Quality Control (SQC) is also governed by Software Quality
Assurance (SQA).SQA is generally shortened to just QA.
Software Quality Control (SQC) is a set of activities for ensuring quality
in software products.
It includes the following activities:
Reviews
o Requirement Review
o Design Review
o Code Review
o Deployment Plan Review
o Test Plan Review
o Test Cases Review
Testing
o Unit Testing
o Integration Testing
o System Testing
o Acceptance Testing
Test Case:
A test case is a document, which has a set of test data, preconditions, expected
results and post conditions, developed for a particular test scenario in order to verify
compliance against a specific requirement. Test Case acts as the starting point for the
test execution, and after applying a set of input values, the application has a definitive
outcome and leaves the system at some end point or also known as execution post
condition.
Test Scenario
Test Steps
Prerequisite
Test Data
Expected Result
Test Parameters
Actual Result
Environment Information
Comments
As far as possible, write test cases in such a way that you test only one thing at
a time. Do not overlap or complicate test cases. Attempt to make your test
cases atomic.
Ensure that all positive scenarios and negative scenarios are covered.
Language:
o Write in simple and easy to understand language.
o Use active voice: Do this, do that.
o Use exact and consistent names (of forms, fields, etc).
Characteristics of a good test case:
o Accurate: Exacts the purpose.
o Economical: No unnecessary steps or words.
o Traceable: Capable of being traced to requirements.
o Repeatable: Can be used to perform the test over and over.
o Reusable: Can be reused if necessary.
Type of Beta:
Developers release either a closed beta or an open beta; closed beta versions are
released to a select group of individuals for a user test and are invitation only, while
open betas are available to a larger group, the general public, and anyone interested.
The testers report any bugs that they find, and sometimes suggest additional features
they think should be available in the final version.
Gamma Check:
Gamma check is performed when the application is ready for release to the specified
requirements; this check is performed directly, without going through all the in-house
testing activities.
Elements of TQM:
Root Cause Analysis
Customer-focused
Process-oriented
Continuous improvement
Effective Communication
Quality Control Tools:
Cause - Effect Diagram
Checklists
Histogram
Graphs
Pareto Charts
Tree Diagram
Arrow Diagram
A data structure is said to be linear if its elements form a sequence in a specific order.
There are basically two techniques of representing such linear structure within
memory.
First way is to provide the linear relationships among all the elements
represented by means of linear memory location. These linear structures are
termed as arrays.
The second technique is to provide the linear relationship among all the
elements represented by using the concept of pointers or links. These linear
structures are termed as linked lists.
Arrays
Queue
Stacks
Linked List
This structure is mostly used for representing data that contains a hierarchical
relationship among various elements.
Tree
Graph
1. Input Step
2. Assignment Step
3. Decision Step
4. Repetitive Step
5. Output Step
1. To save time (Time Complexity): A program that runs faster is a better program.
2. To save space (Space Complexity): A program that saves space over a competing
program is considered desirable.
Following are the commonly used asymptotic notations to calculate the running
time complexity of an algorithm.
Big Oh Notation (O)
Omega Notation (Ω)
Theta Notation (Θ)
1)Big Oh Notation, O: The notation O(n) is the formal way to express the upper bound of
an algorithm's running time. It measures the worst case time complexity, or the longest
amount of time an algorithm can possibly take to complete.
2)Omega Notation, Ω: The notation Ω(n) is the formal way to express the lower bound of
an algorithm's running time. It measures the best case time complexity, or the minimum
amount of time an algorithm can possibly take to complete.
3)Theta Notation, Θ: The notation Θ(n) is the formal way to express both the lower bound
and the upper bound of an algorithm's running time.
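Formally, these notations can be stated as follows (standard definitions, given here in LaTeX for reference):

f(n) = O(g(n)) \iff \exists\, c > 0,\ n_0 > 0 \ \text{such that}\ 0 \le f(n) \le c\,g(n) \ \text{for all } n \ge n_0
f(n) = \Omega(g(n)) \iff \exists\, c > 0,\ n_0 > 0 \ \text{such that}\ 0 \le c\,g(n) \le f(n) \ \text{for all } n \ge n_0
f(n) = \Theta(g(n)) \iff f(n) = O(g(n)) \ \text{and}\ f(n) = \Omega(g(n))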
Linked List: A linked list is a linear collection of data elements, called nodes, where
the linear order is given by means of pointers. Each node is divided into two parts:
1. Singly Linked List/Linear Linked List : It is also called One Way List or Singly
Linked List. It is linear collection of data elements which are called Nodes. The
elements may or may not be stored in consecutive memory locations. So pointers are
used to maintain the linear order. Each node is divided into two parts. The first part contains
the information of the element and is called INFO Field. The second part contains the
address of the next node and is called LINK Field or NEXT Pointer Field. The
START contains the starting address of the linked list i.e. it contains the address of the
first node of the linked list. The LINK Field of last node contains NULL Value which
indicates that it is the end of linked list. The operations we can perform on singly linked
lists are insertion, deletion and traversal.
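A minimal Python sketch of a singly linked list (class and method names are illustrative), showing the INFO and NEXT fields, insertion at the end, and traversal:

class Node:
    def __init__(self, info):
        self.info = info      # INFO field: the element's data
        self.next = None      # LINK/NEXT field: reference to the next node (None = end)

class SinglyLinkedList:
    def __init__(self):
        self.start = None     # START: reference to the first node

    def insert_at_end(self, info):
        node = Node(info)
        if self.start is None:           # empty list: new node becomes the first node
            self.start = node
            return
        current = self.start
        while current.next is not None:  # walk to the last node
            current = current.next
        current.next = node

    def traverse(self):
        current = self.start
        while current is not None:       # follow NEXT pointers until NULL
            print(current.info)
            current = current.next

lst = SinglyLinkedList()
for value in [10, 20, 30]:
    lst.insert_at_end(value)
lst.traverse()                           # prints 10, 20, 30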
2. Doubly Linked List : In this type of Linked list, there are two references associated
with each node, One of the reference points to the next node and one to the previous
node. The advantage of this data structure is that we can traverse in both directions,
and for deletion we don't need to have explicit access to the previous node.
3. Circular Linked List : Circular linked list is a linked list where all nodes are
connected to form a circle. There is no NULL at the end. A circular linked list can be a
singly circular linked list or doubly circular linked list. Advantage of this data structure is
that any node can be made as starting node. This is useful in implementation of
circular queue in linked list. Circular Doubly Linked Lists are used for implementation of
advanced data structures like Fibonacci Heap.
They are dynamic in nature, allocating memory when required.
Insertion and deletion operations can be easily implemented.
Stacks and queues can be easily implemented using linked lists.
Linked lists reduce the access time.
Stack:
At all times, we maintain a pointer to the last PUSHed data on the stack. As this
pointer always represents the top of the stack, hence named top.
The top pointer provides top value of the stack without actually removing it.
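A minimal sketch using a Python list as a stack, with the top implied by the end of the list (push, pop and peek are the usual operation names):

stack = []

stack.append(10)        # PUSH: 10 becomes the top
stack.append(20)        # PUSH: 20 is now the top

print(stack[-1])        # PEEK/top: returns 20 without removing it
print(stack.pop())      # POP: removes and returns 20 (LIFO order)
print(stack.pop())      # POP: removes and returns 10
print(len(stack) == 0)  # the stack is now empty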
2)Queue:
Enqueue
This operation is used to add an item to the queue at the rear (tail) end. The rear count is
incremented by one after the addition of each item, until the queue becomes full. This
operation is performed at the rear end of the queue.
Dequeue
This operation is used to remove an item from the queue at the front (head) end. The front
count is incremented by one each time an item is removed from the queue, until the queue
becomes empty. This operation is performed at the front end of the queue.
Initialize
This operation is used to initialize the queue by representing the head and tail
positions in the memory allocation table (MAT).
Few more functions are required to make the above-mentioned queue operation
efficient. These are
peek() - Gets the element at the front of the queue without removing it.
isfull() - Checks if the queue is full.
isempty() - Checks if the queue is empty.
In a queue, we always dequeue (or access) data pointed to by the front pointer, and while
enqueuing (or storing) data in the queue we take the help of the rear pointer.
Insert: O(1)
Remove: O(1)
Size: O(1)
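A minimal sketch using collections.deque from Python's standard library, which gives O(1) enqueue and dequeue:

from collections import deque

queue = deque()

queue.append("A")      # enqueue at the rear
queue.append("B")
queue.append("C")

print(queue[0])        # peek: front element ("A") without removing it
print(queue.popleft()) # dequeue from the front: "A" (FIFO order)
print(queue.popleft()) # "B"
print(len(queue))      # size: 1 element ("C") remains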
Circular Queue: In a standard queue data structure, a re-buffering problem occurs for
each dequeue operation. This problem is solved by joining the front and rear ends of the
queue to make it a circular queue. A circular queue is a linear data structure and follows
the FIFO principle.
In a circular queue the last node is connected back to the first node to make a circle.
A circular queue follows the First In First Out principle.
Elements are added at the rear end and deleted at the front end of the queue.
Initially, both the front and the rear pointers point to the beginning of the array.
It is also called a Ring Buffer.
Items can be inserted into and deleted from the queue in O(1) time.
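A minimal fixed-size circular queue sketch over a Python list, with the front index and element count wrapping around using the modulo operator (names are illustrative):

class CircularQueue:
    def __init__(self, capacity):
        self.buffer = [None] * capacity   # fixed-size ring buffer
        self.capacity = capacity
        self.front = 0                    # index of the front element
        self.count = 0                    # number of stored elements

    def is_full(self):
        return self.count == self.capacity

    def is_empty(self):
        return self.count == 0

    def enqueue(self, item):
        if self.is_full():
            raise OverflowError("queue is full")
        rear = (self.front + self.count) % self.capacity  # wrap around the end
        self.buffer[rear] = item
        self.count += 1

    def dequeue(self):
        if self.is_empty():
            raise IndexError("queue is empty")
        item = self.buffer[self.front]
        self.front = (self.front + 1) % self.capacity     # wrap around the end
        self.count -= 1
        return item

q = CircularQueue(3)
for x in [1, 2, 3]:
    q.enqueue(x)
print(q.dequeue(), q.dequeue())  # 1 2
q.enqueue(4)                     # reuses the freed slot at the start of the array
print(q.dequeue(), q.dequeue())  # 3 4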
Sorting
Sorting is nothing but storage of data in sorted order, it can be in ascending or
descending order. The term Sorting comes into picture with the term Searching. There
are so many things in our real life that we need to search, like a particular record in
database, roll numbers in merit list, a particular telephone number, any particular page
in a book etc. Sorting arranges data in a sequence which makes searching easier.
Every record which is going to be sorted will contain one key. Based on the key the
record will be sorted.
1. Bubble Sort
2. Insertion Sort
3. Selection Sort
4. Quick Sort
5. Merge Sort
6. Heap Sort
Bubble Sort
Bubble Sort is probably one of the oldest, easiest, most straightforward, and most inefficient
sorting algorithms. It works by comparing each element of the list with the element next
to it and swapping them if required. With each pass, the largest value of the list is "bubbled"
to the end of the list whereas the smaller values sink towards the beginning. This way the
number of passes needed is equal to the size of the array minus 1.
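A short sketch of bubble sort in Python (ascending order), with an early exit when a pass makes no swaps:

def bubble_sort(arr):
    n = len(arr)
    for i in range(n - 1):               # at most n-1 passes
        swapped = False
        for j in range(n - 1 - i):       # the last i elements are already in place
            if arr[j] > arr[j + 1]:      # compare neighbours and swap if out of order
                arr[j], arr[j + 1] = arr[j + 1], arr[j]
                swapped = True
        if not swapped:                  # no swaps means the list is already sorted
            break
    return arr

print(bubble_sort([5, 1, 4, 2, 8]))      # [1, 2, 4, 5, 8]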
Selection Sort
The idea of selection sort is rather simple: we repeatedly find the next largest (or
smallest) element in the array and move it to its final position in the sorted array.
Assume that we wish to sort the array in increasing order, i.e. the smallest element at
the beginning of the array and the largest element at the end. We begin by selecting
the largest element and moving it to the highest index position. We can do this by
swapping the element at the highest index and the largest element. We then reduce
the effective size of the array by one element and repeat the process on the smaller
(sub)array. The process stops when the effective size of the array becomes 1 (an array
of 1 element is already sorted).
Insertion Sort
The Insertion Sort algorithm is a commonly used algorithm. Even if you haven't been a
programmer or a student of computer science, you may have used this algorithm. Try
recalling how you sort a deck of cards. You start from the beginning, traverse through
the cards and as you find cards misplaced by precedence you remove them and insert
them back into the right position. Eventually what you have is a sorted deck of cards.
The same idea is applied in the Insertion Sort algorithm.
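A minimal insertion sort sketch in C (illustrative only; the sample values are made up):

#include <stdio.h>

/* Insertion sort: like sorting playing cards, each new element is taken
 * out and inserted into its correct position among the already-sorted
 * elements to its left. */
void insertion_sort(int a[], int n) {
    for (int i = 1; i < n; i++) {
        int key = a[i];
        int j = i - 1;
        while (j >= 0 && a[j] > key) {   /* shift larger elements right */
            a[j + 1] = a[j];
            j--;
        }
        a[j + 1] = key;                  /* insert the key in its place */
    }
}

int main(void) {
    int a[] = {12, 11, 13, 5, 6};
    int n = sizeof a / sizeof a[0];
    insertion_sort(a, n);
    for (int i = 0; i < n; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}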
ShellSort
ShellSort is mainly a variation of Insertion Sort. In insertion sort, we move elements
only one position ahead. When an element has to be moved far ahead, many
movements are involved. The idea of shellSort is to allow exchange of far items. In
shellSort, we make the array h-sorted for a large value of h. We keep reducing the
value of h until it becomes 1. An array is said to be h-sorted if all sublists of every hth
element are sorted.
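A sketch of ShellSort in C, using the simple gap sequence n/2, n/4, ..., 1 (the gap sequence and sample data are assumptions made for the example, not prescribed by the notes):

#include <stdio.h>

/* Shell sort: perform gapped insertion sorts for a decreasing sequence of
 * gaps h, so far-apart elements can be exchanged early; the final pass
 * (h = 1) is a plain insertion sort on an almost-sorted array. */
void shell_sort(int a[], int n) {
    for (int h = n / 2; h > 0; h /= 2) {
        for (int i = h; i < n; i++) {
            int key = a[i];
            int j = i;
            while (j >= h && a[j - h] > key) {   /* h-sorted insertion */
                a[j] = a[j - h];
                j -= h;
            }
            a[j] = key;
        }
    }
}

int main(void) {
    int a[] = {23, 29, 15, 19, 31, 7, 9, 5, 2};
    int n = sizeof a / sizeof a[0];
    shell_sort(a, n);
    for (int i = 0; i < n; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}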
Heap Sort
Heap sort is a comparison based sorting technique based on Binary Heap data
structure. It is similar to selection sort where we first find the maximum element and
place it at the end. We repeat the same process for the remaining elements.
Merge Sort
MergeSort is a Divide and Conquer algorithm. It divides the input array into two halves,
calls itself for the two halves, and then merges the two sorted halves.
Quick sort
Like Merge Sort, QuickSort is a Divide and Conquer algorithm. It picks an element as
pivot and partitions the given array around the picked pivot. There are many different
versions of quickSort that pick pivot in different ways.
1) Always pick the first element as pivot.
2) Always pick the last element as pivot (used in the sketch below).
3) Pick a random element as pivot.
4) Pick the median as pivot.
The key process in quickSort is partition().
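Here is a sketch of quickSort in C with the last element chosen as the pivot (the partition scheme shown, sometimes called Lomuto partitioning, is one common choice; the sample data is illustrative):

#include <stdio.h>

static void swap(int *x, int *y) { int t = *x; *x = *y; *y = t; }

/* partition(): place the pivot (the last element here) into its final
 * position, with smaller elements on its left and larger on its right,
 * and return that position. */
int partition(int a[], int low, int high) {
    int pivot = a[high];
    int i = low - 1;                    /* boundary of the "smaller" region */
    for (int j = low; j < high; j++) {
        if (a[j] < pivot) swap(&a[++i], &a[j]);
    }
    swap(&a[i + 1], &a[high]);          /* put the pivot in the middle */
    return i + 1;
}

void quick_sort(int a[], int low, int high) {
    if (low < high) {
        int p = partition(a, low, high);
        quick_sort(a, low, p - 1);      /* sort the left part */
        quick_sort(a, p + 1, high);     /* sort the right part */
    }
}

int main(void) {
    int a[] = {10, 80, 30, 90, 40, 50, 70};
    int n = sizeof a / sizeof a[0];
    quick_sort(a, 0, n - 1);
    for (int i = 0; i < n; i++) printf("%d ", a[i]);
    printf("\n");
    return 0;
}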
A Linear Search is the most basic and simple search algorithm. A linear search looks for
an element or value in an array by scanning it in sequential order until the desired element
is found. It compares the element with all the other elements in the list; if the element is
matched it returns its index, otherwise it returns -1.
Linear Search is applied to an unsorted or unordered list when there are fewer elements
in the list. In complexity terms this is an O(n) search: the time taken to search the list
grows at the same rate as the list does.
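A minimal linear search sketch in C (illustrative only; the sample values are made up):

#include <stdio.h>

/* Linear search: compare the key with every element in turn.
 * Returns the index of the first match, or -1 if the key is absent. O(n). */
int linear_search(const int a[], int n, int key) {
    for (int i = 0; i < n; i++)
        if (a[i] == key) return i;
    return -1;
}

int main(void) {
    int a[] = {7, 3, 9, 1, 5};
    printf("%d\n", linear_search(a, 5, 9));   /* prints 2 */
    printf("%d\n", linear_search(a, 5, 4));   /* prints -1 */
    return 0;
}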
Binary Search is applied on a sorted array or list. In binary search, we first compare
the value with the element in the middle position of the array. If the value is matched,
then we return the value. If the value is less than the middle element, then it must lie in
the lower half of the array and if it's greater than the element then it must lie in the
upper half of the array. We repeat this procedure on the lower (or upper) half of the
array. Binary Search is useful when there are large numbers of elements in an array.
In complexity terms this is an O(log n) search - the number of search operations
grows more slowly than the list does, because you're halving the "search space" with
each operation.
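An iterative binary search sketch in C (illustrative only; the sorted sample array is made up):

#include <stdio.h>

/* Binary search on a sorted array: compare with the middle element and
 * discard half of the remaining range at each step. O(log n). */
int binary_search(const int a[], int n, int key) {
    int low = 0, high = n - 1;
    while (low <= high) {
        int mid = low + (high - low) / 2;   /* avoids overflow of (low + high) */
        if (a[mid] == key) return mid;
        if (key < a[mid]) high = mid - 1;   /* key lies in the lower half */
        else              low = mid + 1;    /* key lies in the upper half */
    }
    return -1;                              /* not found */
}

int main(void) {
    int a[] = {2, 5, 8, 12, 16, 23, 38};
    printf("%d\n", binary_search(a, 7, 16));  /* prints 4 */
    printf("%d\n", binary_search(a, 7, 7));   /* prints -1 */
    return 0;
}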
Hashing is a technique that is used to uniquely identify a specific object from a group
of similar objects. Some examples of how hashing is used in our lives include:
In universities, each student is assigned a unique roll number that can be used
to retrieve information about them.
In libraries, each book is assigned a unique number that can be used to
determine information about the book, such as its exact position in the library or
the users it has been issued to etc.
In both these examples the students and books were hashed to a unique number.
Assume that you have an object and you want to assign a key to it to make searching
easy. To store the key/value pair, you can use a simple array like a data structure
where keys (integers) can be used directly as an index to store values. However, in
cases where the keys are large and cannot be used directly as an index, you should
use hashing.
In hashing, large keys are converted into small keys by using hash functions. The
values are then stored in a data structure called hash table. The idea of hashing is to
distribute entries (key/value pairs) uniformly across an array. Each element is assigned
a key (converted key). By using that key you can access the element in O(1) time.
Using the key, the algorithm (hash function) computes an index that suggests where
an entry can be found or inserted.
hash = hashfunc(key)
index = hash % array_size
In this method, the hash is independent of the array size, and it is then reduced to an
index (a number between 0 and array_size - 1) by using the modulo operator (%).
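The two lines above can be illustrated with a toy string hash in C; the hash function below (summing character codes) and the table size are assumptions made for the example, and real hash tables use better functions, but the reduction to an index with the modulo operator works the same way.

#include <stdio.h>

#define TABLE_SIZE 16

/* Toy hash function: sum of the character codes of the key. */
unsigned int hashfunc(const char *key) {
    unsigned int hash = 0;
    while (*key) hash += (unsigned char)*key++;
    return hash;
}

int main(void) {
    const char *keys[] = {"apple", "banana", "cherry"};
    for (int i = 0; i < 3; i++) {
        unsigned int hash  = hashfunc(keys[i]);
        unsigned int index = hash % TABLE_SIZE;   /* index in [0, TABLE_SIZE - 1] */
        printf("%-7s -> hash %u -> slot %u\n", keys[i], hash, index);
    }
    return 0;
}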
Hash function
A hash function is any function that can be used to map a data set of an arbitrary size
to a data set of a fixed size, which falls into the hash table. The values returned by a
hash function are called hash values, hash codes, hash sums, or simply hashes.
Applications
Associative arrays: Hash tables are commonly used to implement many types of
in-memory tables. They are used to implement associative arrays (arrays whose
indices are arbitrary strings or other complicated objects).
Database indexing: Hash tables may also be used as disk-based data
structures and database indices (such as in dbm).
Caches: Hash tables can be used to implement caches i.e. auxiliary data tables
that are used to speed up the access to data, which is primarily stored in slower
media.
Object representation: Several dynamic languages, such as Perl, Python,
JavaScript, and Ruby use hash tables to implement objects.
Hash Functions are used in various algorithms to make their computing faster
Greedy algorithms work by recursively constructing a set of objects from the smallest
possible constituent parts. Recursion is an approach to problem solving in which the
solution to a particular problem depends on solutions to smaller instances of the same
problem. The advantage to using a greedy algorithm is that solutions to smaller
instances of the problem can be straightforward and easy to understand. The
disadvantage is that it is entirely possible that the most optimal short-term solutions
may lead to the worst possible long-term outcome.
Greedy techniques are used, for example, in many networking algorithms and in the
minimum spanning tree algorithms discussed later. The following well-known algorithms,
by contrast, are based on the divide and conquer approach, which breaks a problem into
independent sub-problems and combines their solutions:
Merge Sort
Quick Sort
Binary Search
Strassen's Matrix Multiplication
Closest pair (points)
Dynamic programming approach is similar to divide and conquer in breaking
down the problem into smaller and yet smaller possible sub-problems. But unlike
divide and conquer, these sub-problems are not solved independently.
Rather, results of these smaller sub-problems are remembered and used for
similar or overlapping sub-problems. Dynamic programming is used where we
have problems, which can be divided into similar sub-problems, so that their
results can be re-used. Mostly, these algorithms are used for optimization.
Before solving the in-hand sub-problem, dynamic algorithm will try to examine
the results of the previously solved sub-problems. The solutions of sub-
problems are combined in order to achieve the best solution. Dynamic
programming can be used in both top-down and bottom-up manner.
Graph
Tree: A tree is an ideal data structure for representing hierarchical data. A tree can be
theoretically defined as a finite set of one or more data items (nodes).
Path Path refers to the sequence of nodes along the edges of a tree.
Root The node at the top of the tree is called root. There is only one root per
tree and one path from the root node to any node.
Parent Any node except the root node has one edge upward to a node called
parent.
Child The node below a given node connected by its edge downward is
called its child node.
Leaf The node which does not have any child node is called the leaf node.
Visiting Visiting refers to checking the value of a node when control is on the
node.
Levels Level of a node represents the generation of a node. If the root node
is at level 0, then its next child node is at level 1, its grandchild is at level 2, and
so on.
Degree The degree of a tree is the maximum degree of a node in the given tree. A
node with degree zero is called a terminal node or a leaf.
For a binary tree to be a binary search tree, the data of all the nodes in the left sub-tree of
the root node should be less than or equal to the data of the root, and the data of all the
nodes in the right sub-tree of the root node should be greater than the data of the root.
Complete Binary Tree: A Binary Tree is complete Binary Tree if all levels are completely
filled except possibly the last level and the last level has all keys as left as possible
For example, consider a BST whose root node has data = 10: the subtree rooted at the
node with data = 5 and the subtree rooted at the node with data = 19 must each also
satisfy the same ordering. The BST property is recursive: every subtree must itself satisfy
the left and right subtree ordering.
Pre-order traversal
Post-order traversal
In-order traversal
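A small BST sketch in C with the three traversals listed above (the insert routine and the sample keys are illustrative assumptions; note that an in-order traversal of a BST visits the keys in sorted order):

#include <stdio.h>
#include <stdlib.h>

/* A binary search tree node: left subtree < data < right subtree. */
typedef struct Node {
    int data;
    struct Node *left, *right;
} Node;

Node *insert(Node *root, int data) {
    if (root == NULL) {
        Node *n = malloc(sizeof *n);
        n->data = data; n->left = n->right = NULL;
        return n;
    }
    if (data < root->data) root->left  = insert(root->left, data);
    else                   root->right = insert(root->right, data);
    return root;
}

/* In-order traversal (left, root, right) visits a BST in sorted order. */
void inorder(const Node *r)   { if (r) { inorder(r->left); printf("%d ", r->data); inorder(r->right); } }
/* Pre-order traversal: root, left, right. */
void preorder(const Node *r)  { if (r) { printf("%d ", r->data); preorder(r->left); preorder(r->right); } }
/* Post-order traversal: left, right, root. */
void postorder(const Node *r) { if (r) { postorder(r->left); postorder(r->right); printf("%d ", r->data); } }

int main(void) {
    int keys[] = {10, 5, 19, 3, 7, 15, 22};
    Node *root = NULL;
    for (int i = 0; i < 7; i++) root = insert(root, keys[i]);
    inorder(root);   printf("\n");   /* sorted: 3 5 7 10 15 19 22 */
    preorder(root);  printf("\n");
    postorder(root); printf("\n");
    return 0;
}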
Trees are useful and frequently used because they naturally represent hierarchical data
and, when kept balanced, allow fast search, insertion and deletion.
A degenerate (or pathological) tree: A Tree where every internal node has one child.
Such trees are performance-wise same as linked list.
AVL Tree: One of the more popular balanced trees, known as an AVL tree in data
structures, was introduced in 1962 by Adelson-Velsky and Landis. An AVL tree is a
binary search tree in which, for every node in the tree, the heights of the left and right
subtrees differ by at most 1.
Importance of Rotations :
The insert and delete operations of AVL tree are the same as binary search tree (BST)
Since an insertion (deletion) involves adding (deleting) a tree node, this can only
increase (decrease) the height of some subtree(s) by 1.
Thus, the AVL tree property may be violated.
If the AVL tree property is violated at a node x, it means that the heights of left(x) and
right(x) differ by exactly 2.
After the insertion or deletion operations, we need to examine the tree and see if any
node violates the AVL tree property
If the AVL tree property is violated at a node x, a single or double rotation is applied
to x to restore the AVL tree property.
Rotation will be applied in a bottom up manner starting at the place of
insertion(deletion)
Thus, when we perform a rotation at x, the AVL tree property is restored at all proper
descendants of x.
Spanning Tree: A spanning tree is a subset of Graph G, which has all the vertices
covered with minimum possible number of edges. Hence, a spanning tree does not
have cycles and it cannot be disconnected. A complete undirected graph can have
a maximum of n^(n-2) spanning trees, where n is the number of nodes.
All possible spanning trees of graph G, have the same number of edges and
vertices.
The spanning tree does not have any cycle (loops).
Removing one edge from the spanning tree will make the graph disconnected,
i.e. the spanning tree is minimally connected.
Adding one edge to the spanning tree will create a circuit or loop, i.e. the
spanning tree is maximally acyclic.
In a weighted graph, a minimum spanning tree is a spanning tree whose total weight is
less than or equal to that of every other spanning tree of the same graph. In real-world
situations, this
weight can be measured as distance, congestion, traffic load or any arbitrary value
denoted to the edges.
Two Important Minimum Spanning Tree:
1)Kruskal Algorithm
Kruskal's algorithm is a greedy algorithm in graph theory that finds a minimum
spanning tree for a connected weighted graph.
It finds a subset of the edges that forms a tree that includes every vertex, where the
total weight of all the edges in the tree is minimized.
This algorithm is directly based on the MST( minimum spanning tree) property.
2)Prims Algorithm
Prim's algorithm is a greedy algorithm that finds a minimum spanning tree for a
connected weighted undirected graph.It finds a subset of the edges that forms a tree
that includes every vertex, where the total weight of all the edges in the tree is
minimized.This algorithm is directly based on the MST( minimum spanning tree)
property.
BFS vs DFS:
BFS stands for Breadth First Search; DFS stands for Depth First Search.
BFS starts traversal from the root node and then explores the search level by level, i.e.
as close as possible to the root node. DFS starts traversal from the root node and
explores the search as far as possible from the root node, i.e. depth-wise.
Breadth First Search can be done with the help of a queue, i.e. a FIFO implementation.
Depth First Search can be done with the help of a stack, i.e. a LIFO implementation.
BFS works in a single stage: the visited vertices are removed from the queue and then
displayed at once. DFS works in two stages: in the first stage the visited vertices are
pushed onto the stack, and later, when there is no vertex further to visit, they are
popped off.
BFS requires more memory compared to DFS; DFS requires less memory compared to BFS.
BFS is useful in finding the shortest path: it can be used to find the shortest distance
between some starting node and the remaining nodes of the graph. DFS is not as useful
for finding the shortest path; it is used to perform a traversal of a general graph, and the
idea of DFS is to make a path as long as possible and then go back (backtrack) to add
other branches that are also as long as possible.
B Tree vs B+ Tree:
Description: A B tree is an organizational structure for information storage and retrieval
in the form of a tree in which all terminal nodes are at the same distance from the base,
and all non-terminal nodes have between n and 2n sub-trees or pointers (where n is an
integer). A B+ tree is an n-ary tree with a variable but often large number of children per
node; it consists of a root, internal nodes and leaves, and the root may be either a leaf
or a node with two or more children.
Search: O(log n) for a B tree; O(log_b n) for a B+ tree.
Insert: O(log n) for a B tree; O(log_b n) for a B+ tree.
Delete: O(log n) for a B tree; O(log_b n) for a B+ tree.
1. Processor management which involves putting the tasks into order and pairing
them into manageable size before they go to the CPU.
2. Memory management which coordinates data to and from RAM (random-access
memory) and determines the necessity for virtual memory.
3. Device management which provides interface between connected devices.
4. Storage management which directs permanent data storage.
5. Application interface, which allows standard communication between software and
your computer.
6. User interface which allows you to communicate with your computer.
The operating system makes the programming task easier. The common services
provided by the operating system are listed below.
o Program execution
o I/O operation
o File system manipulation
o Communications
o Error detection.
Resource allocation
Accounting
Protection
Process:
o The text section comprises the compiled program code, read in from non-
volatile storage when the program is launched.
o The data section stores global and static variables, allocated and
initialized prior to executing main.
o The heap is used for dynamic memory allocation, and is managed via
calls to new, delete, malloc, free, etc.
o The stack is used for local variables. Space on the stack is reserved for
local variables when they are declared ( at function entrance or
elsewhere, depending on the language ), and the space is freed up when
the variables go out of scope. Note that the stack is also used for function
return values, and the exact mechanisms of stack management may be
language specific.
o Note that the stack and the heap start at opposite ends of the process's
free space and grow towards each other. If they should ever meet, then
either a stack overflow error will occur, or else a call to new or malloc will
fail due to insufficient memory available.
When processes are swapped out of memory and later restored, additional
information must also be stored and restored. Key among them are the program
counter and the value of all program registers.
Ready - The process has all the resources it needs to run, but the CPU is not currently
working on this process's instructions.
Waiting - The process cannot run at the moment, because it is waiting for some resource
to become available or for some event to occur.
Process Scheduling:
Maximize CPU use, quickly switch processes onto CPU for time sharing
Process scheduler selects among available processes for next execution
on CPU
Maintains scheduling queues of processes
Schedulers
I/O-bound process spends more time doing I/O than computations, many
short CPU bursts
CPU-bound process spends more time doing computations; few very long
CPU bursts
Context Switch
When CPU switches to another process, the system must save the state of the
old process and load the saved state for the new process via a context switch.
Context of a process represented in the PCB
Context-switch time is overhead; the system does no useful work while
switching.The more complex the OS and the PCB -> longer the context switch
Time dependent on hardware support. Some hardware provides multiple sets of
registers per CPU -> multiple contexts loaded at once
What is a Thread?
A thread is a path of execution within a process. Also, a process can contain multiple
threads.
Why Multithreading?
A thread is also known as a lightweight process. The idea is to achieve parallelism by
dividing a process into multiple threads. For example, in a browser, multiple tabs can
be different threads. MS Word uses multiple threads: one thread to format the text,
another thread to process inputs, etc.
Process vs Thread?
The typical difference is that threads within the same process run in a shared memory
space, while processes run in separate memory spaces.
Unlike processes, threads are not independent of one another; as a result, threads share
their code section, data section and OS resources (such as open files and signals) with
other threads. But, like a process, a thread has its own program counter (PC), register
set, and stack space.
User Level Threads vs Kernel Level Threads:
User threads are implemented by users; kernel threads are implemented by the OS.
The OS does not recognize user level threads; kernel threads are recognized by the OS.
Implementation of user threads is easy; implementation of kernel threads is complicated.
User level threads need no hardware support; kernel level threads need hardware support.
If one user level thread performs a blocking operation then the entire process is blocked;
if one kernel thread performs a blocking operation then another thread can continue
execution.
Thread libraries provide programmers with an API for creating and managing threads.
Thread libraries may be implemented either in user space or in kernel space.
The user space involves API functions implemented solely within user space, with no
kernel support. The kernel space involves system calls, and requires a kernel with
thread library support.
Any solution to the critical section problem must satisfy three requirements:
Mutual Exclusion : If a process is executing in its critical section, then no other
process is allowed to execute in the critical section.
Progress : If no process is executing in its critical section, then a process that wishes
to enter its critical section must not be blocked indefinitely by processes that are
outside their critical sections.
Bounded Waiting : A bound must exist on the number of times that other
processes are allowed to enter their critical sections after a process has made a
request to enter its critical section and before that request is granted.
Peterson's Solution is a classical software-based solution to the critical section
problem.
TestAndSet is a hardware solution to the synchronization problem. In TestAndSet, we
have a shared lock variable which can take either of the two values, 0 or 1.
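On real hardware, TestAndSet is a single atomic instruction. The sketch below is a conceptual illustration in C, using C11's atomic_flag_test_and_set to play the role of that instruction; the acquire/release names are assumptions made for the example.

#include <stdatomic.h>
#include <stdio.h>

/* Conceptual TestAndSet-based lock: the flag is 0 when unlocked, 1 when locked. */
atomic_flag lock = ATOMIC_FLAG_INIT;

void acquire(void) {
    /* Atomically set the flag to 1 and return its old value;
     * spin (busy-wait) while some other thread already holds the lock. */
    while (atomic_flag_test_and_set(&lock))
        ;   /* busy wait */
}

void release(void) {
    atomic_flag_clear(&lock);   /* set the flag back to 0 */
}

int main(void) {
    acquire();
    printf("inside critical section\n");
    release();
    return 0;
}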
Semaphore:
A semaphore is hardware or a software tag variable whose value indicates the status
of a common resource. Its purpose is to lock the resource being used. A process which
needs the resource will check the semaphore for determining the status of the
resource followed by the decision for proceeding. In multitasking operating systems,
the activities are synchronized by using the semaphore techniques.
There are two types of semaphores : Binary Semaphores and Counting
Semaphores
Binary Semaphores : They can only be either 0 or 1. They are also known as
mutex locks, as the locks can provide mutual exclusion. All the processes can
share the same mutex semaphore, which is initialized to 1. A process has to
wait until the semaphore becomes 1; it then sets the semaphore to 0 and starts
its critical section. When it completes its critical section, it resets the value of
the semaphore to 1, and some other process can enter its critical
section.
Counting Semaphores : They can have any value and are not restricted over a
certain domain. They can be used to control access a resource that has a
limitation on the number of simultaneous accesses. The semaphore can be
initialized to the number of instances of the resource. Whenever a process wants
to use that resource, it checks if the number of remaining instances is more than
zero, i.e., the process has an instance available. Then, the process can enter its
critical section thereby decreasing the value of the counting semaphore by 1.
After the process is over with the use of the instance of the resource, it can leave
the critical section thereby adding 1 to the number of available instances of the
resource.
Semaphores are commonly use for two purposes: to share a common memory space
and to share access to files. Semaphores are one of the techniques for interprocess
communication (IPC). The C programming language provides a set of interfaces or
"functions" for managing semaphores.
Properties of Semaphores
1. Simple
2. Works with many processes
3. Can have many different critical sections with different semaphores
4. Each critical section has unique access semaphores
5. Can permit multiple processes into the critical section at once, if desirable
Deadlock: It is a state where two or more operations are waiting for each other, say a
computing action 'A' is waiting for action 'B' to complete, while action 'B' can only
execute when 'A' is completed. Such a situation is called a deadlock. In operating
systems, a deadlock situation arises when the computer resources required for the
completion of a computing task are held by another task that is itself waiting to execute.
The system thus goes into an indefinite wait, resulting in a deadlock. Deadlock in
operating systems is a common issue in multiprocessor systems and in parallel and
distributed computing setups.
The resources may be either physical or logical. Examples of physical resources are
Printers, Tape Drivers, Memory Space, and CPU Cycles. Examples of logical
resources are Files, Semaphores, and Monitors. The simplest example of deadlock is
where process 1 has been allocated non-shareable resource A, say, a tape drive, and
process 2 has been allocated non-shareable resource B, say, a printer. Now, if it turns out
that process 1 needs resource B (the printer) to proceed and process 2 needs
resource A (the tape drive) to proceed, and these are the only two processes in the
system, each blocks the other and all useful work in the system stops. This
situation is termed deadlock. The system is in a deadlock state because each process
holds a resource being requested by the other process, and neither process is willing to
release the resource it holds. Resources come in two flavors: preemptable and non
preemptable. A preemptable resource is one that can be taken away from the process
with no ill effects. Memory is an example of a preemptable resource. On the other
hand, a non preemptable resource is one that cannot be taken away from process
(without causing ill effect). For example, CD resources are not preemptable at an
arbitrary moment.Reallocating resources can resolve deadlocks that involve
preemptable resources. Deadlocks that involve non preemptable resources are difficult
to deal with.
Following three strategies can be used to remove deadlock after its occurrence:
1. Preemption: We can take a resource from one process and give it to another. This
will resolve the deadlock situation, but sometimes it causes problems.
2. Rollback: In situations where deadlock is a real possibility, the system can
periodically make a record of the state of each process; when deadlock occurs,
everything is rolled back to the last checkpoint and restarted, with resources
allocated differently so that the deadlock does not occur again.
3. Kill one or more processes: This is the simplest way, but it works.
Livelock: A situation in which two or more processes continuously change their states
in response to changes in the other process(es) without doing any useful work. It is
somewhat similar to deadlock, but the difference is that the processes are being "polite"
and letting each other do the work first. This can happen when a process is trying to
avoid a deadlock.
Dijkstra's Banker's Algorithm :
One reason this algorithm is not widely used in the real world is because to use it the
operating system must know the maximum amount of resources that every process is
going to need at all times. Therefore, for example, a just-executed program must
declare up-front that it will be needing no more than, say, 400K of memory. The
operating system would then store the limit of 400K and use it in the deadlock
avoidance calculations.The Banker's Algorithm seeks to prevent deadlock by
becoming involved in the granting or denying of system resources. Each time that a
process needs a particular non-sharable resource, the request must be approved by
the banker.
Memory Management:
Main Memory refers to a physical memory that is the internal memory to the computer.
The word main is used to distinguish it from external mass storage devices such as
disk drives. Main memory is also known as RAM. The computer is able to change only
data that is in main memory. Therefore, every program we execute and every file we
access must be copied from a storage device into main memory.
All the programs are loaded into the main memory for execution. Sometimes the complete
program is loaded into memory, but sometimes a certain part or routine of the program is
loaded into the main memory only when it is called by the program; this mechanism is
called Dynamic Loading, and it enhances performance. Also, at times one program is
dependent on some other program. In such a case, rather than loading all the dependent
programs, the CPU links the dependent programs to the main executing program when
they are required. This mechanism is known as Dynamic Linking.
Swapping
Swapping is a simple memory/process management technique used by the operating
system (OS) to increase the utilization of the processor by moving some blocked
processes from the main memory to the secondary memory (hard disk), thus forming a
queue of temporarily suspended processes, while execution continues with the newly
arrived process. After performing the swapping process, the operating system has two
options when selecting a process for execution: it can admit a newly created process, or
it can activate a suspended process from the swap memory.
If you have ever installed a Linux based operating system, you may have seen an option
or warning about the need for swap space. If you have enough primary memory (RAM),
e.g. greater than 2 GB, a desktop user may not need any swap space at all (I am using
Ubuntu 10.04 LTS with 4 GB of RAM and have no trouble without swap space), and in
some cases using swap memory may even slow down your computer's performance.
First Fit
The first hole that is big enough is allocated to program.
Best Fit
The smallest hole that is big enough is allocated to program.
Worst Fit
The largest hole that is big enough is allocated to program.
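A tiny sketch in C of the first-fit and best-fit strategies over a list of free holes (the hole sizes and the request value are made-up example numbers; worst fit would simply pick the largest hole instead):

#include <stdio.h>

/* First fit: scan the holes in order and take the first one big enough. */
int first_fit(const int holes[], int n, int request) {
    for (int i = 0; i < n; i++)
        if (holes[i] >= request) return i;   /* first hole that fits */
    return -1;                               /* no hole is big enough */
}

/* Best fit: pick the smallest hole that still fits the request. */
int best_fit(const int holes[], int n, int request) {
    int best = -1;
    for (int i = 0; i < n; i++)
        if (holes[i] >= request && (best == -1 || holes[i] < holes[best]))
            best = i;
    return best;
}

int main(void) {
    int holes[] = {100, 500, 200, 300, 600};   /* free block sizes in KB */
    printf("first fit for 212 KB -> hole %d\n", first_fit(holes, 5, 212));  /* 1 (500 KB) */
    printf("best  fit for 212 KB -> hole %d\n", best_fit(holes, 5, 212));   /* 3 (300 KB) */
    return 0;
}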
Fragmentation occurs in a dynamic memory allocation system when most of the free
blocks are too small to satisfy any request. It is generally termed the inability to use the
available memory. In such a situation processes are loaded into and removed from
memory. As a result, free holes exist that could satisfy a request, but they are
non-contiguous, i.e. the memory is fragmented into a large number of small holes. This
phenomenon is known as External Fragmentation. Also, at times the physical memory is
broken into fixed-size blocks and memory is allocated in units of block size. The memory
allocated to a process may then be slightly larger than the requested memory. The
difference between the allocated and the required memory is known as Internal
Fragmentation, i.e. memory that is internal to a partition but is of no use.
Paging: Computer memory is divided into small partitions that are all the same size,
referred to as page frames. When a process is loaded, it gets divided into pages which
are the same size as those frames. The process pages are then loaded into the frames.
A Page Table is the data structure used by a virtual memory system in a computer
operating system to store the mapping between virtual addresses and physical
addresses. A virtual address is also known as a logical address and is generated by the
CPU, while a physical address is the address that actually exists in memory.
Segmentation:
Involves programmer (allocates memory to specific function inside code)
Separate compiling
Separate protection
Share code
Each segment in this scheme is divided into pages and each segment is
maintained in a page table. So the logical address is divided into following 3
parts :
Segment numbers(S)
Page number (P)
The displacement or offset number (D)
Virtual Memory
Virtual memory is an approach to making use of secondary storage devices as an
extension of the primary storage of the computer. It is the process of increasing the
apparent size of a computer's RAM by using a section of the hard disk storage as an
extension of RAM: logically-assigned memory that may or may not exist physically.
Through the use of paging and the swap area, more memory can be referenced and
allocated than actually exists on the system, thus giving the appearance of a larger
main memory than actually exists. Virtual memory is commonly implemented by
demand paging. It can also be implemented in a segmentation system. Demand
segmentation can also be used to provide virtual memory.
Benefits of having Virtual Memory :
1. Large programs can be written, as virtual space available is huge compared to
physical memory.
2. Less I/O required, leads to faster and easy swapping of processes.
3. More physical memory is available, as programs are stored in virtual memory, so
they occupy very little space in actual physical memory.
Demand Paging
A demand paging system is quite similar to a paging system with swapping where
processes reside in secondary memory and pages are loaded only on demand, not in
advance. When a context switch occurs, the operating system does not copy any of
the old program's pages out to the disk or any of the new program's pages into the
main memory. Instead, it just begins executing the new program after loading the first
page and fetches that program's pages as they are referenced.
While executing a program, if the program references a page which is not available in
the main memory because it was swapped out a little while ago, the processor treats this
invalid memory reference as a page fault and transfers control from the program to
the operating system, which demands the page back into memory.
Advantages
Following are the advantages of Demand Paging
Easy to implement, keep a list, replace pages by looking back into time.
Thrashing: A process that is spending more time paging than executing is said to be
thrashing. In other words it means, that the process doesn't have enough frames to
hold all the pages for its execution, so it is swapping pages in and out very frequently
to keep executing. Sometimes, the pages which will be required in the near future have
to be swapped out. Initially, when CPU utilization is low, the process scheduling
mechanism loads multiple processes into memory at the same time to increase the level
of multiprogramming, allocating a limited number of frames to each process. As memory
fills up, processes start to spend a lot of time waiting for their required pages to be
swapped in, again leading to low CPU utilization because most of the processes are
waiting for pages. The scheduler therefore loads even more processes to try to increase
CPU utilization, and as this continues, at some point the complete system comes to a halt.
FILE DIRECTORIES:
Collection of files is a file directory. The directory contains information about the files,
including attributes, location and ownership. Much of this information, especially that is
concerned with storage, is managed by the operating system. The directory is itself a
file, accessible by various file management routines.
SINGLE-LEVEL DIRECTORY
In this a single directory is maintained for all the users.
Naming problem: Users cannot have same name for two files.
Grouping problem: Users cannot group files according to their need.
TWO-LEVEL DIRECTORY
The way that files are accessed and read into memory is determined by Access
methods. Usually a single access method is supported by systems while there are
OS's that support multiple access methods.
Sequential Access
Direct Access
Files are allocated disk spaces by operating system. Operating systems deploy
following three main ways to allocate disk space to files.
Contiguous Allocation
Linked Allocation
Indexed Allocation
Contiguous Allocation
Mutual Exclusion
A way of making sure that if one process is using shared modifiable data, the other
processes are excluded from doing the same thing. Formally, while one process is
executing on the shared variable, all other processes desiring to do so at the same
moment should be kept waiting; when that process has finished executing on the shared
variable, one of the waiting processes should be allowed to proceed.
In this fashion, each process executing the shared data (variables) excludes all others
from doing so simultaneously. This is called Mutual Exclusion.
Note that mutual exclusion needs to be enforced only when processes access shared
modifiable data - when processes are performing operations that do not conflict with
one another they should be allowed to proceed concurrently.
If we could arrange matters such that no two processes were ever in their critical
sections simultaneously, we could avoid race conditions. We need four conditions to
hold to have a good solution for the critical section problem (mutual exclusion).
No two processes may be simultaneously inside their critical sections.
No assumptions are made about the relative speeds of processes or the number of
CPUs.
No process outside its critical section should block other processes.
No process should have to wait arbitrarily long to enter its critical section.
System Call: System calls provide an interface between the process and the
operating system.
System calls allow user-level processes to request some services from the
operating system which process itself is not allowed to do.
In handling the trap, the operating system will enter in the kernel mode, where it
has access to privileged instructions, and can perform the desired service on
the behalf of user-level process.
It is because of the critical nature of operations that the operating system itself
does them every time they are needed.
For example, for I/O a process involves a system call telling the operating
system to read or write particular area and this request is satisfied by the
operating system.
Types of System calls
Process control
File management
Device management
Information maintenance
Communications
1) Process Control:
A running program needs to be able to stop execution either normally or
abnormally.
When execution is stopped abnormally, often a dump of memory is taken and
can be examined with a debugger.
Following are functions of process control:
i. end, abort
ii. load, execute
iii. create process, terminate process
iv. get process attributes, set process attributes
v. wait for time
vi. wait event, signal event
vii. allocate and free memory
2) File management :
We first need to be able to create and delete files. Either system call requires
the name of the file and perhaps some of the file's attributes.
Once the file is created, we need to open it and to use it. We may also read,
write, or reposition. Finally, we need to close the file, indicating that we are no
longer using it.
We may need these same sets of operations for directories if we have a
directory structure for organizing files in the file system.
In addition, for either files or directories, we need to be able to determine the
values of various attributes and perhaps to reset them if necessary. File
attributes include the file name, a file type, protection codes, accounting
information, and so on
Functions:
o create file, delete file
o open, close file
o read, write, reposition
o get and set file attributes
3) Device Management:
A process may need several resources to execute - main memory, disk drives,
access to files, and so on. If the resources are available, they can be granted,
and control can be returned to the user process. Otherwise, the process will
have to wait until sufficient resources are available.
The various resources controlled by the OS can be thought of as devices. Some
of these devices are physical devices (for example, tapes), while others can be
thought of as abstract or virtual devices (for example, files).
Once the device has been requested (and allocated to us), we can read, write,
and (possibly) reposition the device, just as we can with files.
In fact, the similarity between I/O devices and files is so great that many OSs,
including UNIX, merge the two into a combined file-device structure.
A set of system calls is used on files and devices. Sometimes, I/O devices are
identified by special file names, directory placement, or file attributes.
Functions:
o request device, release device
o read, write, reposition
o get device attributes, set device attributes
o logically attach or detach devices
Information Maintenance
Many system calls exist simply for the purpose of transferring information
between the user program and the OS. For example, most systems have a
system call to return the current time and date.
Other system calls may return information about the system, such as the
number of current users, the version number of the OS, the amount of free
memory or disk space, and so on.
In addition, the OS keeps information about all its processes, and system calls
are used to access this information. Generally, calls are also used to reset the
process information.
Functions:
get time or date, set time or date
get system data, set system data
get and set process, file, or device attributes
Communication
There are two common models of interprocess communication: the message-
passing model and the shared-memory model. In the message-passing model,
the communicating processes exchange messages with one another to transfer
information.
In the shared-memory model, processes use shared memory creates and
shared memory attaches system calls to create and gain access to regions of
memory owned by other processes.
Recall that, normally, the OS tries to prevent one process from accessing
another process's memory. Shared memory requires that two or more
processes agree to remove this restriction. They can then exchange information
by reading and writing data in the shared areas.
Message passing is useful for exchanging smaller amounts of data, because no
conflicts need be avoided. It is also easier to implement than is shared memory
for intercomputer communication.
Shared memory allows maximum speed and convenience of communication,
since it can be done at memory speeds when it takes place within a computer.
Problems exist, however, in the areas of protection and synchronization
between the processes sharing memory.
Functions:
o create, delete communication connection
o send, receive messages
o transfer status information
o Attach and Detach remote devices
The fork() system call is used to create processes. When a process (a program
in execution) makes a fork() call, an exact copy of the process is created. Now
there are two processes, one being the parent process and the other being
the child process.The process which called the fork() call is the parent process
and the process which is created newly is called the child process. The child
process will be exactly the same as the parent. Note that the process state of the
parent i.e., the address space, variables, open files etc. is copied into the child
process. This means that the parent and child processes have identical but
physically different address spaces. A change of values in the parent process
doesn't affect the child, and vice versa. Both processes start execution
from the next line of code, i.e., the line after the fork() call. The exec() system call
is also used to create processes, but there is one big difference
between the fork() and exec() calls. The fork() call creates a new process while
preserving the parent process, whereas an exec() call replaces the address space,
text segment, data segment etc. of the current process with the new program. This
means that, after an exec() call, only the new program exists; the process image that
made the system call no longer exists.
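A minimal fork()/exec() sketch using the POSIX API (fork, execl, waitpid); the choice of running /bin/ls with -l is just an illustrative example.

#include <stdio.h>
#include <unistd.h>
#include <sys/wait.h>

int main(void) {
    pid_t pid = fork();                 /* duplicate the current process */

    if (pid < 0) {
        perror("fork");                 /* fork failed */
        return 1;
    } else if (pid == 0) {
        /* Child: after execl succeeds, only the new program exists. */
        execl("/bin/ls", "ls", "-l", (char *)NULL);
        perror("execl");                /* reached only if exec fails */
        return 1;
    } else {
        /* Parent: its address space is untouched by the child's exec(). */
        int status;
        waitpid(pid, &status, 0);
        printf("parent: child %d finished\n", (int)pid);
    }
    return 0;
}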
Device Controller
Device drivers are software modules that can be plugged into an OS to handle a
particular device. Operating System takes help from device drivers to handle all I/O
devices.The Device Controller works like an interface between a device and a device
driver. I/O units (Keyboard, mouse, printer, etc.) typically consist of a mechanical
component and an electronic component where electronic component is called the
device controller.
There is always a device controller and a device driver for each device to
communicate with the Operating Systems. A device controller may be able to handle
multiple devices. As an interface its main task is to convert serial bit stream to block of
bytes, perform error correction as necessary.
Any device connected to the computer is connected by a plug and socket, and the
socket is connected to a device controller. Following is a model for connecting the
CPU, memory, controllers, and I/O devices where CPU and device controllers all use
a common bus for communication.
Polling I/O
Polling is the simplest way for an I/O device to communicate with the processor. The
process of periodically checking status of the device to see if it is time for the next I/O
operation, is called polling. The I/O device simply puts the information in a Status
register, and the processor must come and get the information. Most of the time,
devices will not require attention and when one does it will have to wait until it is next
interrogated by the polling program. This is an inefficient method, and much of the
processor's time is wasted on unnecessary polls.
Compare this method to a teacher continually asking every student in a class, one
after another, if they need help. Obviously the more efficient method would be for a
student to inform the teacher whenever they require assistance.
Interrupts I/O
An alternative scheme for dealing with I/O is the interrupt-driven method. An interrupt
is a signal to the microprocessor from a device that requires attention. A device
controller puts an interrupt signal on the bus when it needs the CPU's attention. When
the CPU receives an interrupt, it saves its current state and invokes the appropriate
interrupt handler using the interrupt vector (addresses of OS routines to handle
various events). When the interrupting device has been dealt with, the CPU continues
with its original task as if it had never been interrupted.
UNIX is an operating system which was first developed in the 1960s, and has
been under constant development ever since. By operating system, we mean the
suite of programs which make the computer work. It is a stable, multi-user, multi-
tasking system for servers, desktops and laptops.
UNIX systems also have a graphical user interface (GUI) similar to Microsoft
Windows which provides an easy to use environment. However, knowledge of
UNIX is required for operations which aren't covered by a graphical program, or for
when there is no windows interface available, for example, in a telnet session.
There are many different versions of UNIX, although they share common
similarities. The most popular varieties of UNIX are Sun Solaris, GNU/Linux, and
MacOS X. Redhat is the most popular distribution because it has been ported to a
large number of hardware platforms (including Intel, Alpha, and SPARC), it is easy
to use and install and it comes with a comprehensive set of utilities and applications
including the X Windows graphics system, GNOME and KDE GUI environments,
and the StarOffice suite (an open source MS-Office clone for Linux).
Support code which is not required to run in kernel mode is in the System Library. User
programs and other system programs work in User Mode, which has no access to
system hardware and kernel code. User programs and utilities use the system libraries
to access kernel functions to get the system's low-level tasks done.
Kernel
The Linux kernel includes device driver support for a large number of PC
hardware devices (graphics cards, network cards, hard disks etc.), advanced
processor and memory management features, and support for many different
types of filesystems (including DOS floppies and the ISO9660 standard for
CDROMs). In terms of the services that it provides to application programs and
system utilities, the kernel implements most BSD and SYSV system calls, as
well as the system calls described in the POSIX.1 specification.
The kernel (in raw binary form that is loaded directly into memory at system
startup time) is typically found in the file /boot/vmlinuz, while the source files can
usually be found in /usr/src/linux.
System Utilities
Virtually every system utility that you would expect to find on standard
implementations of UNIX (including every system utility described in the
POSIX.2 specification) has been ported to Linux. This includes commands such
as ls, cp, grep, awk, sed, bc, wc, more, and so on. These system utilities are
designed to be powerful tools that do a single task extremely well
(e.g. grep finds text inside files while wc counts the number of words, lines and
bytes inside a file). Users can often solve problems by interconnecting these
tools instead of writing a large monolithic application program.
Application programs
Linux distributions typically come with several useful application programs as
standard. Examples include the emacs editor, xv (an image viewer), gcc (a C
compiler), g++ (a C++ compiler), xfig (a drawing package), latex (a powerful
typesetting language) and soffice (StarOffice, which is an MS-Office style clone
that can read and write Word, Excel and PowerPoint files).
Redhat Linux also comes with rpm, the Redhat Package Manager which makes
it easy to install and uninstall application programs.
When you connect to a UNIX computer remotely (using telnet) or when you log in
locally using a text-only terminal, you will see the prompt:
login:
At this prompt, type in your username and press the enter/return key. Remember
that UNIX is case sensitive (i.e. Will, WILL and will are all different logins). You should
then be prompted for your password:
login: will
password:
Type your password in at the prompt and press the enter/return key. Note that
your password will not be displayed on the screen as you type it in.
If you mistype your username or password you will get an appropriate message from
the computer and you will be presented with the login: prompt again. Otherwise you
should be presented with a shell prompt which looks something like this:
To log out of a text-based UNIX shell, type "exit" at the shell prompt (or if that doesn't
work try "logout"; if that doesn't work press ctrl-d).
Graphical terminals:
If you're logging into a UNIX computer locally, or if you are using a remote login facility
that supports graphics, you might instead be presented with a graphical prompt with
login and password fields. Enter your user name and password in the same way as
above (N.B. you may need to press the TAB key to move between fields).
Once you are logged in, you should be presented with a graphical window manager
that looks similar to the Microsoft Windows interface. To bring up a window containing
a shell prompt look for menus or icons which mention the words "shell", "xterm",
"console" or "terminal emulator".
To log out of a graphical window manager, look for menu options similar to "Log out" or
"Exit".
Linux Commands
These commands will work with most (if not all) distributions of Linux as well as most
(?) implementations of Unix. They're the commands that everybody knows. To be able
to survive in Linux, you should know these. There aren't always handy-dandy tools for
X that shield you, especially if you're managing your own system, stuff often goes
wrong and you're forced to work with the bare minimum.
1. Navigation - how to get around
o cd - changing directories
o ls - listing files
o pwd - knowing where you are
2. File Management - who needs a graphical file manager?
o cp - copying files
o ln - creating symbolic links
o mv - moving and renaming files
o rm - removing files
3. Editing - using text editors for those nasty configuration files
o emacs - another widely used text editor
o pico - for wussies like myself
o vim - an improved version of the standard Unix text editor
4. Monitoring Your System - to satisfy your insatiable curiosity
o tail - follow a file as it grows
o top - a program to see how your memory and CPU are holding up
o w - look at who's logged on
Navigation
Navigating around the files and directories of your hard drive could be a dreaded task
for you, but it is necessary knowledge. If you were a user of command prompt
interfaces such as MS-DOS, you'll have little trouble adjusting. You'll only need to learn
a few new commands. If you're used to navigating using a graphical file manager, I
don't know how it'll be like, but some concepts might require a little more clarification.
Or maybe it'll be easier for you. Who knows? Everyone is different.
cd
As you might already have guessed, the cd command changes directories. It's a very
common navigation command that you'll end up using, just like you might have done in
MS-DOS.
You must put a space between cd and the ".." or else it won't work; Linux doesn't see
the two dots as an extension to the cd command, but rather a different command
altogether. It'll come to make sense if it doesn't already.
ls
The ls letters stand for list. It basically works the same way as the dir command in
DOS. Only being a Unix command, you can do more with it. :-)
Typing ls will give you a listing of all the files in the current directory. If you're new to
Linux, chances are that the directories you are commonly in will be empty, and after
the ls command is run, you aren't given any information and will just be returned to the
command prompt (the shell).
There are "hidden" files in Linux, too. Their file names start with a dot, and doing a
normal ls won't show them in a directory. Many configuration files start with a dot on
their file names because they would only get in the way of users who would like to see
more commonly used items. To view hidden files, use the -a flag with the ls command,
i.e. ls -a.
To view more information about the files in a directory, use the -l flag with ls. It will
show the file permissions as well as the file size, which are probably what are the most
useful things to know about files.
You might occasionally want to have a listing of all the subdirectories, also. A simple -
R flag will do, so you could look upon ls -R as a rough equivalent of the dir /s
command in MS-DOS.
You can put flags together, so to view all the files in a directory, show their
permissions/size, and view all the files that way through the subdirectories, you could
type ls -laR.
pwd
This command simply shows what directory you're in at the moment. It stands for "Print
Working Directory". It's useful for scripting in case you might ever want to refer to your
current directory.
File Management
A lot of people, surprisingly for me, prefer to use graphical file managers. Fortunately
for me, I wasn't spoiled like that and used commands in DOS. That made it a bit easier
for me to make the transition to Linux. Most of the file management Linux gurus do is
through the command line, so if you learn to use the commands, you can brag that
you're a guru. Well, almost.
cp
Copying works very much the same. The cp command can be used just like the MS-
DOS copy command, only remember that directories are separated with slashes (/)
instead of backslashes (\). So a basic command line is just cp filename1 filename2.
There are other flags for the cp command. You can use the -f flag to force the copy.
You can use the -p flag to preserve the permissions (as well as the ownership and
timestamps) of the file.
You can copy an entire directory to a new destination. Let's say you want to copy a
directory (and all of its contents) from where you are to /home/jack/newdirectory/.
You would type cp -rpf olddirectory /home/jack/newdirectory. To issue this
command you would have to be in the directory where the subdirectory "olddirectory"
is actually located.
ln
The most simple way that I've ever used ln to create symbolic links is ln -s
existing_file link. Evidently there's a hard link and a symbolic link; I've been using a
symbolic link all along. You can also use the -f flag to force the command line to
overwrite anything that might have the symbolic link's file name already.
To remove a symbolic link, simply type rm symbolic_link. It won't remove the file that
it's linked to.
mv
The mv command can be used both to move files and to rename them. The syntax is
mv fileone filetwo, where "fileone" is the original file name and "filetwo" will be the
new file name.
You can't move a directory that is located in one partition to another, unfortunately.
You can copy it, though, using cp -rpf, and then remove it with rm -rf later on. If you
have only a single partition that makes up your filesystem then you have very little to
worry about in this area.
rm
The rm command is used for removing files. You use it just like the del or delete
command in MS-DOS. Let's say you want to remove a file called foobar in your current
directory. To do that, simply type rm foobar. Note that there is no "Recycle Bin" like in
Windows 95. So when you delete a file, it's gone for good.
To delete something in some other directory, use the full path as the file name. For
example, if you want to delete a file called "windows" that's in the directory
/usr/local/src/, you would type rm /usr/local/src/windows.
To remove an entire directory and its contents, type rm -rf /directory where
"/directory" is the path to the directory that you want to delete. If you're wondering, the
"rf" stands for "recursive" and "force". Be very careful with this command, as it can
wreak havoc easily if misused.
Editing
If you haven't figured out how important a text editor is, you soon will. Graphical
interfaces can't shield you forever, and those utilities have their limits. Besides, if
you're reading this page, I'm inclined to think that you want to be able to customize
beyond the capabilities of graphical utilities. You want to work at the command prompt.
I know you do.
The basic syntax to invoke these text editors is the same. Type the name of the editor
followed by the file you want to edit, separated by a space in between. Non-existent
files will be blank. Blank files will be blank as well.
emacs
To use GNU Emacs (or its counterpart, XEmacs), there are really only two commands
you need to know. Heck, they're the only ones I know.
While you're editing a certain file with emacs or xemacs, you can save it with the
[Ctrl]-x [Ctrl]-s keystrokes. Then to exit, type [Ctrl]-x [Ctrl]-c.
pico
The instructions for using pico are located on the screen. You save the file by using
the [Ctrl]-o keystroke (for write-out) and exit with [Ctrl]-x.
As a permanent solution, you probably don't want to use pico. It lacks real power.
Since I am such a wuss, however, I still have the bad habit of using pico once in a
while. Why? By pressing [Ctrl]j I can get entire paragraphs wrapped into a nice
justified block. I don't know how to do that with the other text editors.
vim
Most modern distributions include vim, derived from the infamously arcane Unix editor,
vi. (It stands for vi Improved, as a matter of fact.)
Using vim is different in that there are several modes in which you use it. To do actual
editing of the files, press [ESC] i (both separately). Then to save it, press [ESC] : w.
Escape, the colon, and "w" should be keyed in one after the other. Finally, to quit, type
[ESC] : q. The same rules apply as in previous vim commands.
You can use "w" and "q" at the same time to enable yourself to write to the file and
then quit right afterwards. Just press [ESC] : w q.
An important part of system administration (especially with your own system) is being
able to know what's going on.
tail
The program tail allows you to follow a file as it is growing. Most often, I use it to follow
/var/log/messages. I do that by typing tail -f /var/log/messages. Of course, you can
use anything else, including the other logs in /var/log/. Another file you may want to
keep an eye out for is /var/log/secure.
If you want to leave that running all the time, I recommend having some sort of
terminal program in X, logged in as root through su.
Another program you may want to look at is head. It shows the beginning of the specified
file instead of the end (it does not follow the file as it grows).
top
This program shows a continuously updated summary of what is going on with your
system: the running processes and how much CPU and memory they use. Press q inside
the program to quit.
Typing w will tell you who is logged in. This can be helpful if you're the only one who
uses your computer and you see someone logged in that's not supposed to be.
To shut down your system, type shutdown -h now, which tells the shutdown program
to begin the system halt immediately. You can also tell it to halt the system at a later time
(for example, shutdown -h +10 waits ten minutes); consult the shutdown manual page for
the details (man shutdown).
To do a reboot, you can either type reboot or shutdown -r now. You can also use the
famous Ctrl-Alt-Delete combination to reboot, which you might already be familiar with.
Shutting down and restarting properly (as described above) will prevent your filesystem
from being damaged. Filesystem damage is the most obvious of the consequences,
but there are probably other things out there that I don't know about. The point is, shut
down your system properly.
There are (rare!) cases in which the machine might lock up entirely, and prevent you
from being able to access a command prompt. Only then will your last resort be to do a
forced reboot (just pressing the restart button on the case).
When you run the terminal, the Shell issues a command prompt (usually $), where
you can type your input, which is then executed when you hit the Enter key. The output
or the result is thereafter displayed on the terminal. The Shell wraps around the
delicate interior of an Operating system protecting it from accidental damage. Hence
the name Shell.
1. The Bourne Shell: The prompt for this shell is $ and its derivatives include the Korn shell (ksh) and the Bourne Again shell (bash).
2. The C shell: The prompt for this shell is % and its subcategories include the TENEX C shell (tcsh).
Writing a series of commands for the shell to execute is called shell scripting. It
can combine lengthy and repetitive sequences of commands into a single, simple
script, which can be stored and executed anytime. This reduces the effort
required by the end user. "#!" is an operator called the shebang which directs the script to
the interpreter's location. So, if we use "#!/bin/sh" the script is handed to the Bourne
shell. Variables store data in the form of characters and numbers. Similarly, shell
variables are used to store information, and they can be used by the shell only.
Command Description
bg Sends a process to the background
fg Runs a stopped process in the foreground
top Shows details on all active processes
ps Gives the status of processes running for a user
ps PID Gives the status of a particular process
pidof Gives the process ID (PID) of a process
kill PID Kills a process
nice Starts a process with a given priority
renice Changes the priority of an already running process
df Gives the free hard disk space on your system
free Gives the free RAM on your system
Any running program or a command given to a Linux system is called a
process
A process could run in foreground or background
The priority index of a process is called niceness in Linux. Its default value is 0
and it can vary between -20 and 19
The lower the Niceness index the higher would be priority given to that task
Some Commands:
1) mv
The mv command - move - allows a user to move a file to another folder or directory.
Just like dragging a file located on a PC desktop to a folder stored within the
"Documents" folder, the mv command functions in the same manner.
2) man
The man command - the manual command - is used to show the manual of the
inputted command. Just like a film on the nature of film, the man command is the meta
command of the Linux CLI. Inputting the man command will show you all information
about the command you are using.
man cd: The inputting command will show the manual or all relevant information for
the change directory command.
3) mkdir
The mkdir - make directory - command allows the user to make a new directory. Just
like making a new directory within a PC or Mac desktop environment, the mkdir
command makes new directories in a Linux environment.
4) rmdir
The rmdir - remove directory - command allows the user to remove an existing
directory using the Linux CLI.
Both the mkdir and rmdir commands make and remove directories. They do not
make files and they will also not remove a directory which has files in it. The
mkdir will make an empty directory and the rmdir command will remove an
empty directory.
5) touch
The touch command - a.k.a. the make file command - allows users to make files using
the Linux CLI. Just as the mkdir command makes directories, the touch command
makes files. Just as you would make a .doc or a .txt using a PC desktop, the touch
command makes empty files.
6) locate
The locate command is meant to find a file within the Linux OS. If you aren't sure
where a certain file is saved and stored, the locate command comes in handy.
Programming on Perl does not cause portability issues, which is common when
using different shells in shell scripting.
Error handling is very easy on Perl
You can write long and complex programs on Perl easily due to its vastness.
This is in contrast with the shell, which does not support namespaces, modules,
objects, inheritance etc.
The shell has fewer reusable libraries available, nothing compared to Perl's CPAN.
The shell is less secure: it calls external programs (commands like mv, cp etc.
depend on the shell being used). Perl, on the contrary, does useful work while
using internal functions.
C/C++
The C language was developed in 1972 by Dennis Ritchie at Bell Telephone
laboratories, primarily as a systems programming language. That is, a language to
write operating systems with. Ritchie's primary goals were to produce a minimalistic
language that was easy to compile, allowed efficient access to memory, produced
efficient code, and did not need extensive run-time support. Thus, for a high-level
language, it was designed to be fairly low-level, while still encouraging platform-
independent programming.
C++ (pronounced see plus plus) was developed by Bjarne Stroustrup at Bell Labs as
an extension to C, starting in 1979. C++ adds many new features to the C language,
and is perhaps best thought of as a superset of C, though this is not strictly true as
C99 introduced a few features that do not exist in C++. C++'s claim to fame results
primarily from the fact that it is an object-oriented language. As for what an object is
and how it differs from traditional programming methods, well, we'll cover that in
chapter 8 (Basic object-oriented programming).
C++ is an Object Oriented Programming language but is not purely object oriented.
Its features like friend and virtual violate some of the very important OOPS principles,
rendering the language unworthy of being called completely object oriented. It is a
middle-level language.
Header files are included at the beginning, just like in a C program. Here iostream is a
header file which provides us with input & output streams. Header files contain
predeclared function libraries, which can be used by users for their ease.
using namespace std tells the compiler to use the standard namespace. A namespace
collects the identifiers used for classes, objects and variables. A namespace can be used
in two ways in a program: either by a using statement at the beginning, as in the
above-mentioned program, or by using the name of the namespace as a prefix before the
identifier with the scope resolution (::) operator.
main() is the function which holds the executing part of the program; its return type is int.
cout << is used to print anything on screen, same as printf in C.
cin and cout are the counterparts of scanf and printf; the only difference is that you do not
need to mention format specifiers like %d for int etc. with cout & cin.
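For illustration, here is a minimal sketch of a complete C++ program that uses the iostream
header, the std namespace, main(), cout and cin as described above (the variable name age
is just an example):

#include <iostream>     // provides the input/output streams cout and cin
using namespace std;    // lets us write cout instead of std::cout

int main()
{
    int age;                        // variable to hold the user's input
    cout << "Enter your age: ";     // prints a prompt on the screen
    cin >> age;                     // reads an integer from the keyboard
    cout << "You entered " << age << endl;
    return 0;                       // main() returns an int
}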
A library is a collection of precompiled code (e.g. functions) that has been packaged
up for reuse in many different programs. Libraries provide a common way to extend
what your programs can do. The C++ core language is actually very small and
minimalistic (and you'll learn most of it in these tutorials). However, C++ also comes
with a library called the C++ standard library that provides additional functionality for
your use. The C++ standard library is divided into areas (sometimes also called
libraries, even though they're just parts of the standard library), each of which focuses on
providing a specific type of functionality. One of the most commonly used parts of the
C++ standard library is the iostream library, which contains functionality for writing to
the screen and getting input from a console user.
Variables:
"Variable is a memory location in C++ Programming language".Variable are used
to store data on memory.
o Variable name can consist of letter, alphabets and start with underscore
character.
o First character of variable should always be alphabet and cannot be digit.
o Blank spaces are not allowed in variable name.
o Special characters like #, $ are not allowed.
o A single variable can only be declared for only 1 data type in a program.
o As C++ is case sensitive language so if we declare a variable name and one
more NAME both are two different variables.
o C++ has certain keywords which cannot be used for variable name.
o A variable name can be consist of 31 characters only if we declare a variable
more than 1 characters compiler will ignore after 31 characters.
1. int
2. float
3. double
4. char
Type sensitivity:-
If we choose the char data type, the value should be written in single quotes: '5' is
stored as the character 5 (its ASCII code), which is not the same as the integer 5
stored in an int variable.
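A small sketch showing variables of the four basic data types, and the difference between
the integer 5 and the character '5' (the variable names are only illustrative):

#include <iostream>
using namespace std;

int main()
{
    int count = 10;        // whole number
    float price = 4.5f;    // single-precision decimal number
    double pi = 3.14159;   // double-precision decimal number
    char grade = 'A';      // a single character in single quotes

    char digit = '5';      // stores the character '5' (ASCII code 53)
    int number = 5;        // stores the integer value 5

    cout << count << " " << price << " " << pi << " " << grade << endl;
    cout << "digit as character: " << digit
         << ", its ASCII code: " << (int)digit
         << ", number: " << number << endl;
    return 0;
}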
ASCII VALUES OF CHARACTERS
Integers are those values which have no decimal part; they can be positive or negative,
like 12 or -12.
1. int
2. short int
3. long int
int
Signed int
The range of a signed int variable (on a 16-bit compiler) is -32768 to 32767. It can hold
both positive and negative values.
Unsigned int
This type of integer cannot hold negative values. Its range (on a 16-bit compiler) is 0 to 65535.
Long int
float
Scope of Variables
All variables have their area of functioning, and outside that boundary they don't hold
their value; this boundary is called the scope of the variable. In most cases a variable
exists only between the curly braces in which it is declared, not outside them. We will
study the storage classes later, but as of now, we can broadly divide variables into two
main types,
Global Variables
Local variables
Global variables
Global variables are those which are declared once and can be used throughout the
lifetime of the program by any class or any function. They must be declared outside
the main() function. If only declared, they can be assigned different values at different
times in the program's lifetime. But even if they are declared and initialized at the same
time outside the main() function, they can still be assigned any value at any point in
the program.
Local Variables
Local variables are the variables which exist only between the curly braces in which they
are declared. Outside that they are unavailable, and using them there leads to a
compile-time error.
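A short sketch of global versus local scope (the names total and show() are only for
illustration):

#include <iostream>
using namespace std;

int total = 100;       // global variable: declared outside main(), visible everywhere

void show()
{
    int local = 5;     // local variable: exists only inside this function's braces
    cout << "inside show(): total = " << total << ", local = " << local << endl;
}

int main()
{
    show();
    total = 200;       // the global can be changed from any function
    cout << "inside main(): total = " << total << endl;
    // cout << local;  // error: 'local' is not visible here
    return 0;
}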
A constant in C++ means an unchanging value; each constant has a type but
does not have a memory location, except the string constant.
Integer Constants
Integer constants consist of one or more digits such as 0,1,2,3,4 or -115. Floating
point constants contain a decimal point such as 4.15, -10.05. It can also be written in
scientific notation such as 1E-35 means 1*10^-35 or -1E35 means 1*10^35.
Character Constants
Character constants specify the numeric value of that particular character: 'a', for
example, has the ASCII value of the letter a. Some special constants are in the following table.
String Constants
String constants consist of characters enclosed in double quotes, such as
"Hello, World".
The string is stored in the memory and the numeric value of that constant is the
address of this memory. The string constant is suffixed by \0, (the null character) by
the compiler.
Both C and C++ use escape sequences in the same manner, such as the \n character to
produce a new line. All escape sequences are preceded by a backslash, which
indicates a special sequence to the compiler. The compiler views each escape sequence
as a single character. It may seem that an escape sequence occupies 2 bytes, but that is
wrong; it occupies only one byte.
Operators:
Operators are special symbols used for specific purposes. C++ provides many
operators for manipulating data.
Generally, there are six type of operators : Arithmetical operators, Relational
operators, Logical operators, Assignment operators, Conditional operators, Comma
operator.
Arithmetical operators
Arithmetical operators +, -, *, /, and % are used to perform arithmetic (numeric)
operations.
Operator Meaning
+ Addition
- Subtraction
* Multiplication
/ Division
% Modulus
You can use the operators +, -, *, and / with both integral and floating-point data types.
Modulus or remainder % operator is used only with the integral data type.
Binary operators
Operators that have two operands are called binary operators.
Unary operators
C++ provides two unary operators for which only one variable is required.
For Example
a = - 50;
a = + 50;
Here plus sign (+) and minus sign (-) are unary because they are not used between
two variables.
Relational operators
The relational operators are used to test the relation between two values. All relational
operators are binary operators and therefore require two operands. A relational
expression returns zero when the relation is false and a non-zero when it is true. The
following table shows the relational operators.
Relational Operators Meaning
< Less than
<= Less than or equal to
== Equal to
> Greater than
>= Greater than or equal to
!= Not equal to
Logical operators
The logical operators are used to combine one or more relational expression. The
logical operators are
Operators Meaning
|| OR
&& AND
! NOT
Assignment operator
The assignment operator '=' is used for assigning a variable to a value. This operator
takes the expression on its right-hand-side and places it into the variable on its left-
hand-side. For example:
m = 5;
The operator takes the expression on the right, 5, and stores it in the variable on the
left, m.
x = y = z = 32;
This code stores the value 32 in each of the three variables x, y, and z.
Conditional operator
The conditional operator ?: is called ternary operator as it requires three operands. The
format of the conditional operator is:
Conditional_ expression ? expression1 : expression2;
If the value of conditional expression is true then the expression1 is evaluated,
otherwise expression2 is evaluated.
int a = 5, b = 6;
big = (a > b) ? a : b;
The condition evaluates to false, therefore big gets the value from b and it becomes 6.
The comma operator
The comma operator gives left to right evaluation of expressions. When the set of
expressions has to be evaluated for a value, only the rightmost expression is
considered.
int a = 1, b = 2, c = 3, i; // comma acts as separator, not as an operator
i = (a, b); // stores b into i
This first evaluates a (its value is discarded) and then evaluates b and stores it in i. So, at
the end, variable i contains the value 2.
The sizeof operator determines the amount of memory required for an object at compile
time rather than at run time.
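A sketch showing sizeof applied to a few types, plus integer division and the modulus
operator (the exact sizes printed depend on the compiler and platform):

#include <iostream>
using namespace std;

int main()
{
    cout << "sizeof(char)   = " << sizeof(char)   << " byte(s)" << endl;
    cout << "sizeof(int)    = " << sizeof(int)    << " byte(s)" << endl;
    cout << "sizeof(float)  = " << sizeof(float)  << " byte(s)" << endl;
    cout << "sizeof(double) = " << sizeof(double) << " byte(s)" << endl;

    int a = 7, b = 3;
    cout << "7 / 3 = " << a / b << ", 7 % 3 = " << a % b << endl;  // integer division and remainder
    return 0;
}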
1. while loop
2. for loop
3. do-while loop
while loop
The while loop can be described as an entry-controlled loop. It is completed in 3 steps.
Syntax:
variable(initialization);
while(condition)
{
statements;
variable increment or decrement;
}
for loop
The for loop is used to execute a set of statements repeatedly until a particular condition is
satisfied; we can call it an open-ended loop. Its general format is,
Syntax:
for(initialization; condition; increment/decrement)
{
statement;
}
do while loop
In some situations it is necessary to execute body of the loop before testing the
condition. Such situations can be handled with the help of do-while loop. do statement
evaluates the body of the loop first and at the end, the condition is checked
using while statement. General format of do-while loop is,
Syntax:
do{
.
}
while(condition);
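The three loops side by side in one small sketch; each prints the numbers 1 to 3:

#include <iostream>
using namespace std;

int main()
{
    int i = 1;                 // initialization
    while (i <= 3)             // condition tested before each pass
    {
        cout << "while: " << i << endl;
        i++;                   // increment
    }

    for (int j = 1; j <= 3; j++)   // initialization; condition; increment
        cout << "for: " << j << endl;

    int k = 1;
    do
    {
        cout << "do-while: " << k << endl;
        k++;
    } while (k <= 3);          // condition tested after the body has run at least once
    return 0;
}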
C language allows jumping from one statement to another within a loop as well as
jumping out of the loop.
1) break statement
When break statement is encountered inside a loop, the loop is immediately exited
and the program continues with the statement immediately following the loop.
2) continue statement
It causes the control to go directly to the test condition and then continue the loop
process. On encountering continue, control leaves the current cycle of the loop and
starts with the next cycle.
Storage class of a variable defines the lifetime and visibility of a variable. Lifetime
means the duration till which the variable remains active and visibility defines in which
module of the program the variable is accessible. There are five types of storage
classes in C++. They are:
1. Automatic
2. External
3. Static
4. Register
5. Mutable
auto: This is the default storage class for all the variables declared inside a function or
a block. Hence, the keyword auto is rarely used while writing programs in C language.
Auto variables can be only accessed within the block/function they have been declared
and not outside them (which defines their scope). Of course, these can be accessed
within nested blocks within the parent block/function in which the auto variable was
declared. However, they can be accessed outside their scope as well using the
concept of pointers given here by pointing to the very exact memory location where the
variables resides. They are assigned a garbage value by default whenever they are
declared.
extern: Extern storage class simply tells us that the variable is defined elsewhere and
not within the same block where it is used. Basically, the value is assigned to it in a
different block and this can be overwritten/changed in a different block as well. So an
extern variable is nothing but a global variable initialized with a legal value where it is
declared in order to be used elsewhere. It can be accessed within any function/block.
Also, a normal global variable can be made extern as well by placing the extern
keyword before its declaration/definition in any function/block. This basically signifies
that we are not initializing a new variable but instead we are using/accessing the global
variable only. The main purpose of using extern variables is that they can be accessed
between two different files which are part of a large program.
static: This storage class is used to declare static variables which are popularly used
while writing programs in C language. Static variables have a property of preserving
their value even after they are out of their scope! Hence, static variables preserve the
value of their last use in their scope. So we can say that they are initialized only once
and exist till the termination of the program. Thus, no new memory is allocated
because they are not re-declared. Their scope is local to the function to which they
were defined. Global static variables can be accessed anywhere in the program. By
default, they are assigned the value 0 by the compiler.
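A sketch of the static storage class: the variable calls keeps its value between invocations
of counter(), while the automatic variable temp is re-created every time (the function name
is illustrative):

#include <iostream>
using namespace std;

void counter()
{
    static int calls = 0;   // initialized once, preserved across calls
    int temp = 0;           // automatic variable: re-created on every call
    calls++;
    temp++;
    cout << "calls = " << calls << ", temp = " << temp << endl;
}

int main()
{
    counter();   // prints calls = 1, temp = 1
    counter();   // prints calls = 2, temp = 1
    counter();   // prints calls = 3, temp = 1
    return 0;
}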
register: This storage class declares register variables which have the same
functionality as that of the auto variables. The only difference is that the compiler tries
to store these variables in the register of the microprocessor if a free register is
available. This makes the use of register variables to be much faster than that of the
variables stored in the memory during the runtime of the program. If a free register is
not available, these are then stored in the memory only. Usually few variables which
are to be accessed very frequently in a program are declared with the register keyword
which improves the running time of the program. An important and interesting point to
be noted here is that we cannot obtain the address of a register variable using
pointers.
mutable: The mutable specifier applies only to class objects, which are discussed
later in this tutorial. It allows a member of an object to override constness. That is, a
mutable member can be modified by a const member function.
Function:
A function is a block of code that performs some operation. A function can optionally
define input parameters that enable callers to pass arguments into the function. A
function can optionally return a value as output. Functions are useful for encapsulating
common operations in a single reusable block, ideally with a name that clearly
describes what the function does. Every C++ program has at least one function, which
is main(), and all but the most trivial programs define additional functions.
1. Library Function: a function that is already defined in the standard library and can
simply be called from the program.
2. User-defined Function: C++ allows programmers to define their own functions. A user-
defined function groups code to perform a specific task and that group of code is given
a name(identifier).When the function is invoked from any part of program, it all
executes the codes defined in the body of function.
Syntax:
return-type function-name(parameters)
{
//function body;
}
return-type : suggests what the function will return. It can be int, char, some
pointer or even a class object. There can be functions which do not return
anything; they are declared with void.
Function Name : is the name of the function; the function is called using this name.
Parameters : are variables to hold values of arguments passed while function is
called. A function may or may not contain parameter list.
Function body : is the part where the code statements are written.
Functions are called by their names. If the function is without argument, it can be
called directly using its name. But for functions with arguments, we have two ways to
call them,
1.Call by Value :In this calling technique we pass the values of arguments which are
stored or copied into the formal parameters of functions. Hence, the original values are
unchanged only the parameters inside function changes.
2. Call by Reference: In this we pass the address of the variable as arguments. In this
case the formal parameter can be taken as a reference or a pointer, in both the case
they will change the values of the original variable.
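A small sketch contrasting the two calling techniques (the function names byValue and
byReference are only illustrative):

#include <iostream>
using namespace std;

void byValue(int n)       // n is a copy; the caller's variable is unchanged
{
    n = n + 10;
}

void byReference(int &n)  // n refers to the caller's variable; changes are visible outside
{
    n = n + 10;
}

int main()
{
    int x = 5;
    byValue(x);
    cout << "after call by value: x = " << x << endl;       // still 5
    byReference(x);
    cout << "after call by reference: x = " << x << endl;   // now 15
    return 0;
}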
Default Value of Parameters: When you define a function, you can specify a default
value for each of the last parameters. This value will be used if the corresponding
argument is left blank when calling to the function. This is done by using the
assignment operator and assigning values for the arguments in the function definition.
If a value for that parameter is not passed when the function is called, the default
given value is used, but if a value is specified, this default value is ignored and the
passed value is used instead.
When a function is called within the same function, it is known as recursion in C++. The
function which calls itself is known as a recursive function.
A function that calls itself, and doesn't perform any task after function call, is known as
tail recursion. In tail recursion, we generally call the same function with return
statement.
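A classic sketch of recursion: factorial() calls itself with a smaller value until it reaches the
base case:

#include <iostream>
using namespace std;

// factorial(n) = n * factorial(n - 1), with factorial(0) = factorial(1) = 1 as the base case
long factorial(int n)
{
    if (n <= 1)
        return 1;                   // base case stops the recursion
    return n * factorial(n - 1);    // recursive call on a smaller problem
}

int main()
{
    cout << "5! = " << factorial(5) << endl;   // prints 120
    return 0;
}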
Arrays: C++ provides a data structure, the array, which stores a fixed-size sequential
collection of elements of the same type. An array is used to store a collection of data,
but it is often more useful to think of an array as a collection of variables of the same
type. Instead of declaring individual variables, such as number0, number1, ..., and
number99, you declare one array variable such as numbers and use numbers[0],
numbers[1], and ..., numbers[99] to represent individual variables. A specific element
in an array is accessed by an index. All arrays consist of contiguous memory
locations. The lowest address corresponds to the first element and the highest
address to the last element.
Declaring Arrays
To declare an array in C++, the programmer specifies the type of the elements and
the number of elements required by an array as follows:
type arrayName[arraySize];
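A sketch of declaring, initializing and indexing an array (the name numbers and its size are
illustrative):

#include <iostream>
using namespace std;

int main()
{
    int numbers[5] = {10, 20, 30, 40, 50};   // type arrayName[arraySize] with an initializer list

    numbers[2] = 35;                          // indexes run from 0 to arraySize - 1

    for (int i = 0; i < 5; i++)               // visit every element in order
        cout << "numbers[" << i << "] = " << numbers[i] << endl;
    return 0;
}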
Strings in C++
String is a collection of characters. There are two types of strings commonly used in
C++ programming language:
Strings that are objects of string class (The Standard C++ Library string class)
C-strings (C-style Strings)
C-strings
In C programming, the collection of characters is stored in the form of arrays, this is
also supported in C++ programming. Hence it's called C-strings.
C-strings are arrays of type char terminated with null character, that is, \0 (ASCII value
of null character is 0).
String Object:
In C++, you can also create a string object for holding strings.
Unlike char arrays, string objects have no fixed length, and can be extended as
per your requirement.
1 strcpy(s1, s2);
Copies string s2 into string s1.
2 strcat(s1, s2);
Concatenates string s2 onto the end of string s1.
3 strlen(s1);
Returns the length of string s1.
4 strcmp(s1, s2);
Returns 0 if s1 and s2 are the same; less than 0 if s1<s2; greater than 0 if
s1>s2.
5 strchr(s1, ch);
Returns a pointer to the first occurrence of character ch in string s1.
6 strstr(s1, s2);
Returns a pointer to the first occurrence of string s2 in string s1.
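A sketch using a few of the C-string functions listed above (they are declared in the
<cstring> header):

#include <iostream>
#include <cstring>    // strcpy, strcat, strlen, strcmp, strchr, strstr
using namespace std;

int main()
{
    char s1[50] = "Hello";
    char s2[]   = ", World";

    strcat(s1, s2);                            // s1 becomes "Hello, World"
    cout << s1 << " (length " << strlen(s1) << ")" << endl;

    char s3[50];
    strcpy(s3, s1);                            // copy s1 into s3
    cout << "strcmp(s1, s3) = " << strcmp(s1, s3) << endl;   // 0 means the strings are equal
    return 0;
}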
Pointer in C++:
A pointer in the C++ language is a variable, also known as a locator or indicator, that
holds the address of a value.
Advantage of pointer
1) Pointer reduces the code and improves the performance, it is used to retrieving
strings, trees etc. and used with arrays, structures and functions.
2) We can return multiple values from a function using pointers.
3) It makes you able to access any memory location in the computer's memory.
Usage of pointer
Pointers in the C/C++ language are widely used in arrays, functions and structures. They
reduce the code and improve the performance.
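A small pointer sketch: p stores the address of the variable value, and *p reads or changes
whatever is stored at that address (the names are illustrative):

#include <iostream>
using namespace std;

int main()
{
    int value = 42;
    int *p = &value;            // p holds the address of value

    cout << "value   = " << value << endl;
    cout << "address = " << p << endl;       // the memory location
    cout << "*p      = " << *p << endl;      // the data stored at that location

    *p = 100;                   // writing through the pointer changes value itself
    cout << "value is now " << value << endl;
    return 0;
}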
OOPs:
Object means a real world entity such as pen, chair, table etc. Object-Oriented
Programming is a methodology or paradigm to design a program using classes and
objects. It simplifies the software development and maintenance by providing some
concepts:
o Object
o Class
o Inheritance
o Polymorphism
o Abstraction
o Encapsulation
Object: Any entity that has state and behavior is known as an object. For
example: chair, pen, table, keyboard, bike etc. It can be physical and logical.
Inheritance: When one object acquires all the properties and behaviours of a parent
object, it is known as inheritance. It provides code reusability. It is used to achieve
runtime polymorphism.
Encapsulation: Binding (or wrapping) code and data together into a single unit is
known as encapsulation. For example: capsule, it is wrapped with different
medicines.
Classes:
1. A class name should start with an uppercase letter (although this is not mandatory). If
the class name is made of more than one word, then the first letter of each word should
be uppercase. Example,
class StudyRegular
2. Classes contain, data members and member functions, and the access of these
data members and variable depends on the access specifiers (discussed in next
section).
3. Class's member functions can be defined inside the class definition or outside the
class definition.
4. Classes in C++ are similar to structures in C, the only difference being that a class
defaults to private access control, whereas a structure defaults to public.
5. All the features of OOPS, revolve around classes in C++. Inheritance,
Encapsulation, Abstraction etc.
6. Objects of a class hold separate copies of the data members. We can create as many
objects of a class as we need.
7. Classes do possess more characteristics, like we can create abstract classes,
immutable classes, all this we will study later.
Objects
A class is merely a blueprint or a template. No storage is assigned when we define a
class. Objects are instances of a class; they hold the data variables declared in the class,
and the member functions work on these class objects. Each object has different data
variables. Objects are initialised using special class functions called Constructors. We
will study about constructors later. And whenever the object is out of its scope, another
special class member function called Destructor is called, to release the memory
reserved by the object. C++ doesn't have Automatic Garbage Collector like in JAVA, in
C++ Destructor performs this task.
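A sketch of a class acting as a blueprint and two objects holding separate copies of the
data members (the class name Student and its members are only illustrative):

#include <iostream>
#include <string>
using namespace std;

class Student
{
private:                    // data members are private by default in a class
    string name;
    int marks;

public:
    void setData(string n, int m)   // member functions work on the object they are called on
    {
        name = n;
        marks = m;
    }
    void show()
    {
        cout << name << " scored " << marks << endl;
    }
};

int main()
{
    Student a, b;                  // two objects, each with its own name and marks
    a.setData("Asha", 91);
    b.setData("Ravi", 78);
    a.show();
    b.show();
    return 0;
}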
Member functions are part of C++ classes. Member functions represent behavior
of a class. All member functions can be divided into the following categories:
1. Simple functions: Simple functions are functions that do not have any
specific keyword used in declaration. They do not have any special behavior
and they manipulate the data members of a class. The syntax used for the
declaration of a simple member function is:
return_type FunctionName(parameter_list);
3. Static functions: Static functions have class scope. They can't modify any
non-static data members or call non-static member functions. Static member
functions do not have an implicit this argument. That's why they can work only
with static members of the class. Static member functions can be declared using
the following format:
static return_type FunctionName(parameter_list);
4. Inline functions: Inline functions are declared by using inline keyword. The
purpose of inline functions is discussed in detail in the Inline functions section. All the
functions that are implemented inside the class declaration are inline member
functions.
Although friend functions are not member functions, we will discuss the use of friend
functions too. Friend functions can access even private members of a class. A friend
function is a function that is not a member function of a class, but it has access to the
private and protected members of that class. A friend function is declared and
implemented outside of class as a simple function. But the class has to grant friend
privileges by declaring this function with friend keyword inside of the class
declaration.
Function overloading :
A feature in C++ that enables several functions with the same name to be defined with
different types of parameters or a different number of parameters. This feature is called
function overloading. The appropriate function is identified by the compiler by
examining the number or the types of parameters / arguments in the overloaded
functions. Function overloading avoids having to invent different function names when
more than one function performs similar functionality.
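A sketch of function overloading: three functions share the name area() and the compiler
picks one by looking at the arguments (the function name is illustrative):

#include <iostream>
using namespace std;

int area(int side)                 // square
{
    return side * side;
}

int area(int length, int breadth)  // rectangle: same name, different number of parameters
{
    return length * breadth;
}

double area(double radius)         // circle: same name, different parameter type
{
    return 3.14159 * radius * radius;
}

int main()
{
    cout << area(4) << endl;        // calls area(int)
    cout << area(4, 6) << endl;     // calls area(int, int)
    cout << area(2.5) << endl;      // calls area(double)
    return 0;
}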
Operator overloading :
A feature in C++ that enables the redefinition of operators. This feature operates on
user-defined objects. All overloaded operators provide syntactic sugar for function
calls that are equivalent. Without changing the fundamentals of the language,
operator overloading provides a pleasant facade.
Constructor
It is a member function having same name as its class and which is used to initialize
the objects of that class type with a legal initial value. A constructor is automatically
called when object is created.
Types of Constructor
Default Constructor-: A constructor that accepts no parameters is known as default
constructor. If no constructor is defined then the compiler supplies a default
constructor.
Circle :: Circle()
{
radius = 0;
}
Parameterized Constructor -: A constructor that receives arguments/parameters, is
called parameterized constructor.
Circle :: Circle(double r)
{
radius = r;
}
Copy Constructor-: A constructor that initializes an object using values of another
object passed to it as parameter, is called copy constructor. It creates the copy of the
passed object.
Circle :: Circle(Circle &t)
{
radius = t.radius;
}
There can be multiple constructors of the same class, provided they have different
signatures.
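Putting the three constructors above into one runnable sketch of the Circle class:

#include <iostream>
using namespace std;

class Circle
{
    double radius;
public:
    Circle()                { radius = 0; }          // default constructor
    Circle(double r)        { radius = r; }          // parameterized constructor
    Circle(const Circle &t) { radius = t.radius; }   // copy constructor
    double getRadius()      { return radius; }
};

int main()
{
    Circle a;            // default constructor, radius = 0
    Circle b(2.5);       // parameterized constructor
    Circle c(b);         // copy constructor, copies b's radius
    cout << a.getRadius() << " " << b.getRadius() << " " << c.getRadius() << endl;
    return 0;
}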
Destructor
A destructor is a member function having the same name as that of its class preceded by
a ~ (tilde) sign, and it is used to destroy the objects that have been created by a
constructor. It gets invoked when an object's scope is over.
~Circle() {}
Both of the functions have the same name as that of the class, destructor function
having (~) before its name.
Both constructor and destructor functions should not be preceded by any data type
(not even void).
These functions do not (and cannot) return any values.
We can have only the constructor function in a class without destructor function or
vice-versa.
Constructor function can take arguments but destructors cannot.
Constructor function can be overloaded as usual functions.
Explicit call to the constructor: - An explicit call to the constructor means that the
constructor is explicitly declared by the programmer inside the class.
Implicit call to the constructor: - An implicit call to the constructor means that the
constructor is implicitly provided by the compiler when an object of the class is
created and there is no explicit constructor defined inside the class.
C++ Enumeration:
Enum in C++ is a data type that contains fixed set of constants. It can be used for days
of the week (SUNDAY, MONDAY, TUESDAY, WEDNESDAY, THURSDAY, FRIDAY
and SATURDAY), directions (NORTH, SOUTH, EAST and WEST) etc. The C++
enum constants are compile-time constants that cannot be changed at run time. C++
enums can be thought of as types that have a fixed set of named constants.
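A sketch of a C++ enum for directions (the enumerator values default to 0, 1, 2, 3):

#include <iostream>
using namespace std;

enum Direction { NORTH, SOUTH, EAST, WEST };   // NORTH = 0, SOUTH = 1, EAST = 2, WEST = 3

int main()
{
    Direction d = EAST;
    if (d == EAST)
        cout << "heading east (value " << d << ")" << endl;
    return 0;
}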
Purpose of Inheritance
1. Code Reusability
2. Method Overriding (Hence, Runtime Polymorphism.)
3. Use of Virtual Keyword
1. Single Inheritance
2. Multiple Inheritance
3. Hierarchical Inheritance
4. Multilevel Inheritance
5. Hybrid Inheritance (also known as Virtual Inheritance)
Single Inheritance
In this type of inheritance one derived class inherits from only one base class. It is the
simplest form of inheritance.
Multiple Inheritance
In this type of inheritance a single derived class may inherit from two or more than two
base classes.
Hierarchical Inheritance
In this type of inheritance, more than one derived class inherits from a single base class.
Multilevel Inheritance
In this type of inheritance the derived class inherits from a class, which in turn inherits
from some other class. The Super class for one, is sub class for the other.
Hybrid (Virtual) Inheritance
Hybrid Inheritance is a combination of Hierarchical and Multilevel Inheritance.
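A sketch of single inheritance: Car derives from Vehicle and reuses its member function
(the class names are only illustrative):

#include <iostream>
using namespace std;

class Vehicle                      // base class
{
public:
    void start() { cout << "Vehicle started" << endl; }
};

class Car : public Vehicle         // derived class: inherits start() from Vehicle
{
public:
    void honk() { cout << "Car horn!" << endl; }
};

int main()
{
    Car c;
    c.start();   // inherited from the base class (code reusability)
    c.honk();    // defined in the derived class
    return 0;
}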
Upcasting and downcasting are an important part of C++. Upcasting and downcasting
gives a possibility to build complicated programs with a simple syntax. It can be
achieved by using Polymorphism. C++ allows that a derived class pointer (or
reference) to be treated as base class pointer. This is upcasting. Downcasting is an
opposite process, which consists in converting base class pointer (or reference) to
derived class pointer.
In C++ programming you can achieve compile-time polymorphism in two ways, which
are given below;
Method overloading
Method overriding
Whenever the same method name exists multiple times in the same class with a
different number of parameters, a different order of parameters or different types of
parameters, it is known as method overloading.
Defining a method in both the base class and the derived class with the same name and
the same parameters or signature is known as method overriding.
In C++, run-time polymorphism can be achieved by using virtual functions. A virtual
function is a function in the base class which is overridden in the derived class, and
which tells the compiler to perform late binding on this function.
Virtual Keyword is used to make a member function of the base class Virtual.
1. Only the Base class Method's declaration needs the Virtual Keyword, not the
definition.
2. If a function is declared as virtual in the base class, it will be virtual in all its derived
classes.
3. The address of the virtual function is placed in the VTABLE and the compiler
uses the VPTR (vpointer) to point to the virtual function.
1. An abstract class cannot be instantiated, but pointers and references of abstract class
type can be created.
2. Abstract class can have normal functions and variables along with a pure virtual
function.
3. Abstract classes are mainly used for Upcasting, so that its derived classes can use
its interface.
4. Classes inheriting an Abstract Class must implement all pure virtual functions, or
else they will become Abstract too.
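A sketch tying these ideas together: Shape is an abstract class with a pure virtual function,
Circle and Square override it, and a base-class pointer (upcasting) picks the right version at
run time (all class names are illustrative):

#include <iostream>
using namespace std;

class Shape                                // abstract class: cannot be instantiated
{
public:
    virtual void draw() = 0;               // pure virtual function
    virtual ~Shape() {}                    // virtual destructor for safe deletion through a base pointer
};

class Circle : public Shape
{
public:
    void draw() { cout << "drawing a circle" << endl; }   // overrides the pure virtual function
};

class Square : public Shape
{
public:
    void draw() { cout << "drawing a square" << endl; }
};

int main()
{
    Shape *s = new Circle();   // upcasting: derived object through a base-class pointer
    s->draw();                 // late binding calls Circle::draw()
    delete s;

    s = new Square();
    s->draw();                 // now calls Square::draw()
    delete s;
    return 0;
}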
Exception in C++: Exception is an event that happens when unexpected circumstances appear. It
can be a runtime error or you can create an exceptional situation programmatically. Exception
handling consists in transferring control from the place where exception happened to the special
functions (commands) called handlers. Exceptions are handled by using try/catch block. The code
that can produce an exception is surrounded with try block. The handler for this exception is placed
in catch block.
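A sketch of try/catch: the code that may fail sits in the try block, and control jumps to the
matching catch handler when an exception is thrown (the divide() function is illustrative):

#include <iostream>
#include <stdexcept>
using namespace std;

double divide(double a, double b)
{
    if (b == 0)
        throw runtime_error("division by zero");   // creates the exceptional situation
    return a / b;
}

int main()
{
    try
    {
        cout << divide(10, 2) << endl;   // fine, prints 5
        cout << divide(10, 0) << endl;   // throws; control jumps to the catch block
    }
    catch (const runtime_error &e)       // handler receives control
    {
        cout << "caught exception: " << e.what() << endl;
    }
    return 0;
}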
Java team members (also known as Green Team), initiated a revolutionary task to
develop a language for digital devices such as set-top boxes, televisions etc.
For the Green Team members, it was an advanced concept at that time. But it was best
suited for internet programming. Later, Java technology was incorporated by Netscape.
Features of Java:
1) Simple
Java is easy to learn and its syntax is quite simple, clean and easy to understand. The
confusing and ambiguous concepts of C++ are either left out in Java or they have been
re-implemented in a cleaner way.
Eg : Pointers and Operator Overloading are not there in java but were an important
part of C++.
2) Object Oriented
In java everything is Object which has some data and behaviour. Java can be easily
extended as it is based on Object Model.
3) Robust
Java makes an effort to eliminate error prone codes by emphasizing mainly on compile
time error checking and runtime checking. But the main areas which Java improved
were Memory Management and mishandled Exceptions by introducing
automatic Garbage Collector and Exception Handling.
4) Platform Independent
Unlike other programming languages such as C and C++, which are compiled into
platform-specific machine code, Java is guaranteed to be a write-once, run-anywhere
language.
On compilation Java program is compiled into bytecode. This bytecode is platform
independent and can be run on any machine, plus this bytecode format also provide
security. Any machine with Java Runtime Environment can run Java Programs.
5) Secure
When it comes to security, Java is always the first choice. Java's security features
enable us to develop virus-free, tamper-free systems. A Java program always runs in the
Java runtime environment with almost no interaction with the system OS, hence it is more
secure.
6) Multi Threading
Java's multithreading feature makes it possible to write programs that can do many tasks
simultaneously. The benefit of multithreading is that it utilizes the same memory and other
resources to execute multiple threads at the same time; for example, while typing,
grammatical errors can be checked in parallel.
7) Architectural Neutral
8) Portable
Java bytecode can be carried to any platform. There are no implementation-dependent
features; everything related to storage is predefined, for example the size of primitive data types.
9) High Performance
Java Application:
There are mainly 4 type of applications that can be created using java:-
1) Standalone Application/Desktop Application
2) Web Application
3) Enterprise Application
4) Mobile Application
1) Standalone Application/Desktop Application :-
It is also known as desktop application or window-based application. An application
that we need to install on every machine such as media player, antivirus etc. AWT and
Swing are used in java for creating standalone applications.
2) Web Application :-
An application that runs on the server side and creates dynamic page, is called web
application. Currently, servlet, jsp, struts, jsf etc. technologies are used for creating
web applications in java.
3) Enterprise Application :-
An application that is distributed in nature, such as banking applications etc. It has the
advantage of high level security, load balancing and clustering. In java, EJB is used for
creating enterprise applications.
4) Mobile Application :-
An application that is created for mobile devices. Currently Android and Java ME are
used for creating mobile applications.
i) Instance Variables (Non-Static Fields):- Non-static fields are also known as
instance variables because their values are unique
to each instance of a class (to each object, in other words); the currentSpeed of one
bicycle is independent from the currentSpeed of another.
ii) Class Variables (Static Fields):- A class variable is any field declared with the
static modifier; this tells the compiler that there is exactly one copy of this variable in
existence, regardless of how many times the class has been instantiated.
A field defining the number of gears for a particular kind of bicycle could be marked as
static since conceptually the same number of gears will apply to all instances. The
code static int numGears = 6; would create such a static field. Additionally, the
keyword final could be added to indicate that the number of gears will never change.
iii) Local Variables:- Similar to how an object stores its state in fields, a method will
often store its temporary state in local variables. The syntax for declaring a local
variable is similar to declaring a field (for example, int count = 0;).
There is no special keyword designating a variable as local; that determination comes
entirely from the location in which the variable is declared which is between the
opening and closing braces of a method.
As such, local variables are only visible to the methods in which they are declared;
they are not accessible from the rest of the class.
JVM:
Java virtual Machine(JVM) is a virtual Machine that provides runtime environment to
execute java byte code. The JVM doesn't understand Java source code; that's why you
compile your *.java files to obtain *.class files that contain the bytecodes
understandable by the JVM. The JVM controls the execution of every Java program. It enables
features such as automated exception handling, Garbage-collected heap.
JRE : The Java Runtime Environment (JRE) provides the libraries, the Java Virtual
Machine, and other components to run applets and applications written in the Java
programming language. JRE does not contain tools and utilities such as compilers or
debuggers for developing applets and applications.
JDK : The JDK also called Java Development Kit is a superset of the JRE, and
contains everything that is in the JRE, plus tools such as the compilers and debuggers
necessary for developing applets and applications.
JIT: It is a set of programs developed by Sun Microsystems and added as a part of the
JVM to speed up the interpretation phase.
In older versions of Java the compilation phase was much faster than the interpretation
phase, and the industry complained to Sun Microsystems that compilation was very fast
while interpretation was very slow.
To solve this issue, Sun Microsystems developed a program called JIT (just-in-time
compiler) and added it as a part of the JVM to speed up the interpretation phase. In the
current version of Java the interpretation phase is faster than the compilation phase. Hence
Java is one of the highly interpreted programming languages.
Class Loader : Class loader loads the Class for execution.
Method area : Stores per-class structures such as the constant pool.
Heap : The heap is the runtime data area in which objects are allocated.
Stack : Local variables and partial results are stored here. Each thread has a private JVM stack,
created when the thread is created.
Program register : The program register holds the address of the JVM instruction currently being
executed.
Native method stack : It contains all the native methods used in the application.
Execution Engine : The execution engine controls the execution of instructions contained in the
methods of the classes.
Native Method Interface : Native method interface gives an interface between java code and
native code during execution.
Native Method Libraries : Native Libraries consist of files required for the execution of native
code.
/jdk1.5.0/jre/bin Executable files for tools and libraries used by the Java
platform. The executable files are identical to files in
/jdk1.5.0/bin. The java launcher tool serves as an
application launcher, in place of the old jre tool that
shipped with 1.1 versions of the JDK software. This
directory does not need to be in the PATH environment
variable.
/jdk1.5.0/jre/lib Code libraries, property settings, and resource files used
by the Java runtime environment.
/jdk1.5.0/jre/lib/i386/client Contains the .so file used by the Java HotSpot Client
Virtual Machine, which is implemented with Java
HotSpot technology. This is the default VM.
/jdk1.5.0/jre/lib/i386/server Contains the .so file used by the Java HotSpot Server
Virtual Machine.
java
The launcher for Java applications.
javadoc
API documentation generator.
appletviewer
Run and debug applets without a web browser.
jar
Create and manage Java Archive (JAR) files.
jdb
The Java Debugger.
javah
C header and stub generator. Used to write native
methods.
javap
Class file disassembler
extcheck
Utility to detect Jar conflicts.
The Java path setting is required for using tools such as javac, java etc. If you are saving
the java file in the jdk/bin folder, setting the path is not required. But if your java file is
outside the jdk/bin folder, it is necessary to set the path of the JDK.
There are two ways to set the path of the JDK:
1. Temporary
2. Permanent
this keyword
Garbage Collection
In Java destruction of object from memory is done automatically by the JVM. When
there is no reference to an object, then that object is assumed to be no longer needed
and the memory occupied by the object are released. This technique is
called Garbage Collection. This is accomplished by the JVM.
Unlike C++ there is no explicit need to destroy object.
Can the Garbage Collection be forced explicitly ?
No, the Garbage Collection can not be forced explicitly. We may request JVM
for garbage collection by calling the System.gc() method, but this does not guarantee
that the JVM will perform the garbage collection.
Advantages of Garbage Collection:
finalize() method
Sometime an object will need to perform some specific task before it is destroyed such
as closing an open connection or releasing any resources held. To handle such
situation finalize() method is used. finalize()method is called by garbage collection
thread before collecting the object. It's the last chance for any object to perform cleanup
utility.
gc() Method
The gc() method is used to call the garbage collector explicitly. However, the gc() method
does not guarantee that the JVM will perform the garbage collection. It only requests the
JVM for garbage collection. This method is present in the System and Runtime classes.
Object Cloning:
The object cloning is a way to create exact copy of an object. For this purpose,
clone() method of Object class is used to clone an object. The java.lang.Cloneable
interface must be implemented by the class whose object clone we want to create. If
we don't implement Cloneable interface, clone() method
generates CloneNotSupportedException.
The clone() method saves the extra processing task of creating an exact copy of an
object. If we performed it by using the new keyword, a lot of processing would need to
be done; that is why we use object cloning.
Wrapper class in java provides the mechanism to convert primitive into object and
object into primitive.
Since J2SE 5.0, autoboxing and unboxing feature converts primitive into object and
object into primitive automatically. The automatic conversion of primitive into object is
known as autoboxing and vice-versa unboxing.
The eight classes of the java.lang package are known as wrapper classes in Java. The
eight wrapper classes are Boolean, Character, Byte, Short, Integer, Long, Float and Double.
Instanceof Operator:
The java instanceof operator is used to test whether the object is an instance of the
specified type (class or subclass or interface). The instanceof in java is also known as
type comparison operator because it compares the instance with type. It returns either
true or false. If we apply the instanceof operator with any variable that has null value, it
returns false.
Java Package
Package are used in Java, in-order to avoid name conflicts and to control access of
class, interface and enumeration etc. A package can be defined as a group of similar
types of classes, interface, enumeration and sub-package. Using package it becomes
easier to locate the related classes.
Packages are categorized into two forms: built-in packages and user-defined packages.
What is Abstraction
Abstraction is process of hiding the implementation details and showing only the
functionality.
Abstraction in Java is achieved by using interfaces and abstract classes. An interface
gives 100% abstraction and an abstract class gives 0-100% abstraction.
Syntax:
abstract class <class-name>{}
An abstract class is something which is incomplete and you cannot create instance of
abstract class.
If you want to use it you need to make it complete or concrete by extending it.
A class is called concrete if it does not contain any abstract method and
implements all abstract methods inherited from the abstract class or interface it has
implemented or extended.
A method that is declared as abstract and does not have an implementation is known as
an abstract method.
If you define an abstract method then the class must be abstract.
Syntax:
abstract return-type <method-name>(parameters);
- Encapsulated Code is more flexible and easy to change with new requirements.
- By providing only getter and setter method access, you can make the class read
only.
- Encapsulation in Java makes unit testing easy.
- A class can have total control over what is stored in its fields. Suppose you want to
ensure that the value of the marks field is always positive; then you can write the
logic for positive values in the setter method.
- Encapsulation also helps to write immutable class in Java which are a good choice in
multi-threading environments.
- Encapsulation allows you to change one part of code without affecting other part of
code.
Java String:
The following is the class signature of the String class defined in the java.lang package:
public final class String implements java.io.Serializable, Comparable<String>, CharSequence
1. valueOf( parameter ) :
The valueOf() method is static and is overloaded many times in the String class. Its job is
to convert any primitive data type or object, passed as a parameter, into string form.
It functions similarly to the toString() method of the Object class, but toString() converts
only objects into string form.
2. length( ) :
length( ) is an instance method in String class which returns an int value. It must be
called with an instance of String class and returns the number of characters present in
the string instance.
3.equals( ) :
equals( ) method is inherited from Object class and is overridden in String class. It
returns a boolean value of true if the strings are same or false, if the strings are
different. In the comparison, case( upper or lower) of the letters is considered.
Important Note:
String is a final class; i.e. once created, the value cannot be altered. Thus
String objects are called immutable.
The Java Virtual Machine (JVM) creates a memory area especially for
Strings called the String Constant Pool. That's why a String can be initialized
without the new keyword.
The String class belongs to the java.lang package, but there is no need to
import this class; the Java platform provides it automatically.
A String reference can be reassigned, but that does not delete the content.
Multiple references can be used for the same String, and they all refer to the
same object in the pool.
A string that can be modified or changed is known as mutable string. StringBuffer and
StringBuilder classes are used for creating mutable string.
Java Multithreading:
Thread :-
A thread is a single sequential (separate) flow of control within a program. Sometimes it
is called an execution context or a lightweight process. A thread itself is not a program.
A thread cannot run on its own (as it is a part of a program); rather, it runs within a
program. A program can be divided into a number of packets of code, each
representing a thread having its own separate flow of control.
Light weight process: A thread is considered a light weight process because it runs
within the context of a program and takes advantage of the resources allocated to that
program.
Heavy weight process: In the heavy weight process, the control changes in between
threads belonging to different processes. ( In light weight process, the control changes
in between threads belonging to same(one) process ).
Execution context: A thread will have its own execution stack and program counter.
The code running within the thread works only within that context.
One of the strengths of Java is its support for multithreading. All the classes needed to
write a multithreaded program are included in the default imported package java.lang
through class Object, class Thread and interface Runnable.
Synchronization: At times when more than one thread try to access a shared
resource, we need to ensure that resource will be used by only one thread at a time.
The process by which this is achieved is called synchronization. The synchronization
keyword in java creates a block of code referred to as critical section.
Syntax:
synchronized(object)
{
//statement to be synchronized
}
Every Java object with a critical section of code gets a lock associated with the object.
To enter critical section a thread need to obtain the corresponding object's lock.
Why do we use synchronization?
If we do not use synchronization, and let two or more threads access a shared resource
at the same time, it will lead to distorted results.
Consider an example, Suppose we have two different threads T1 and T2, T1 starts
execution and save certain values in a file temporary.txt which will be used to calculate
some result when T1 returns. Meanwhile, T2 starts and before T1 returns, T2 change
the values saved by T1 in the file temporary.txt (temporary.txt is the shared resource).
Now obviously T1 will return wrong result.
To prevent such problems, synchronization was introduced. With synchronization in
above case, once T1 starts using temporary.txt file, this file will be locked(LOCK
mode), and no other thread will be able to access or modify it until T1 returns.
Using Synchronized Methods
Using synchronized methods is a way to accomplish synchronization. But let's first see
what happens when we do not use synchronization in our program.
In Java, the synchronized keyword has a performance cost. A synchronized method in
Java can be slow and degrade performance. So we should use the synchronized keyword
in Java only when it is necessary; otherwise, we should use a Java synchronized block,
which synchronizes the critical section only.
Interthread Communication
Java provides the benefit of avoiding thread polling through inter-thread communication.
The wait(), notify(), and notifyAll() methods of the Object class are used for this purpose.
These methods are implemented as final methods in Object, so all classes have
them. All three methods can be called only from within a synchronized context.
wait() tells calling thread to give up monitor and go to sleep until some other thread
enters the same monitor and call notify.
notify() wakes up a thread that called wait() on same object.
notifyAll() wakes up all the thread that called wait() on same object.
wait() vs sleep():
wait() gets awakened when the notify() or notifyAll() method is called; sleep() does not
get awakened when notify() or notifyAll() is called.
wait() is generally used on a condition; sleep() is simply used to put your thread to sleep.
Polling
Polling is usually implemented by a loop, i.e. checking some condition repeatedly. Once
the condition is true, the appropriate action is taken. This wastes CPU time.
i) Bytecode:
When we compile a .java file, we get a .class file. The .class file can run on any
operating system irrespective of platform on which it was compiled. For this reason,
Java is called platform independent. But the .exe file of C language is not platform
independent.
.exe file contains binary code. Java's .class file contains bytecode. This bytecode
makes Java cross platform. Java compiler produces bytecodes. Any JVM, can run
these bytecode and produce output.
ii) Unicode:
The ASCII (extended) character range is 0 to 255. We cannot add one more character
even if we want to. Only the English alphabet has corresponding ASCII values. That is
why we cannot write a C program in any language other than English.
Java's motto is internationalization. That is, it supports many world languages, like
Telugu, Kannada, Greek, Japanese etc. That is, there is a corresponding
Unicode value in Java for the characters of all these international languages.
This is possible due to the size of character of 2 bytes. That is, the character can
represent values ranging from 0 to 65,535. This range is called Unicode. We can say
ASCII is a subset of Unicode.
Upto 255, Unicode represents ASCII range and afterwards it adds its own values for
the alphabets of many world languages. Unicode is already includes up to 34,128
characters.
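A small sketch showing that a Java char is a 16-bit Unicode value (the particular code points are only examples):
public class UnicodeDemo {
    public static void main(String[] args) {
        char latinA = 'A';           // code point 65, same as ASCII
        char greekAlpha = '\u03B1';  // Greek small letter alpha
        char teluguA = '\u0C05';     // Telugu letter A
        System.out.println((int) latinA);               // 65
        System.out.println(greekAlpha + " " + teluguA);
        System.out.println((int) Character.MAX_VALUE);  // 65535, top of the char range
    }
}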
Type wrapper:
Java uses primitive data types such as int, double and float to hold basic values for the
sake of performance. Despite the performance benefits offered by primitive types, there are
situations where you need an object representation of a primitive value. For example, many
data structures in Java operate on objects, so you cannot use primitive types with them
directly. To handle such situations, Java provides type wrappers: classes that encapsulate a
primitive type within an object.
Autoboxing and Unboxing:
1. Autoboxing/unboxing lets us use primitive types and wrapper class objects
interchangeably.
2. We do not have to perform explicit typecasting.
3. It helps prevent errors, but may sometimes lead to unexpected results, so it must be
used with care.
4. Auto-unboxing also allows you to mix different types of numeric objects in an
expression: when the values are unboxed, the standard type conversions are applied
(see the sketch after this list).
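A short sketch of wrappers with autoboxing and unboxing (the list contents and values are illustrative):
import java.util.ArrayList;
import java.util.List;

public class BoxingDemo {
    public static void main(String[] args) {
        Integer boxed = 10;       // autoboxing: int -> Integer
        int unboxed = boxed;      // unboxing:   Integer -> int

        // Collections store objects, so primitives are boxed automatically.
        List<Integer> numbers = new ArrayList<>();
        numbers.add(5);                          // 5 is autoboxed
        int sum = numbers.get(0) + unboxed;      // unboxed for arithmetic
        System.out.println(sum);                 // 15

        // A caution from point 3: == compares references for wrapper objects,
        // so equals() should be used for value comparison.
        Integer a = 1000, b = 1000;
        System.out.println(a.equals(b));         // true (a == b may be false)
    }
}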
Java programs are of two types:
i) Applications and
ii) Applets.
Applications are programs that contain a main() method, while applets are programs that do
not contain a main() method. Applications can be executed with the Java interpreter from
the command line (with the java command); applets need a browser to execute.
Before JDBC, the ODBC API was the database API used to connect to and execute queries
against a database. But the ODBC API uses an ODBC driver written in the C language (i.e.
platform dependent and less secure). That is why Java defined its own API (the JDBC API),
which uses JDBC drivers written in Java.
A JDBC driver is a software component that enables a Java application to interact with
a database. There are 4 types of JDBC drivers.
The JDBC-ODBC bridge driver uses an ODBC driver to connect to the database: it converts
JDBC method calls into ODBC function calls. This approach is now discouraged in favour of
the thin (pure Java) driver, as in the sketch below.
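A hedged sketch of querying a database through a JDBC driver; the URL, credentials and table are placeholders, not values from these notes, and the matching JDBC driver jar must be on the classpath.
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class JdbcDemo {
    public static void main(String[] args) throws Exception {
        // Placeholder connection details - replace with your own database.
        String url = "jdbc:mysql://localhost:3306/testdb";
        try (Connection con = DriverManager.getConnection(url, "user", "password");
             Statement stmt = con.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT id, name FROM employees")) {
            while (rs.next()) {
                System.out.println(rs.getInt("id") + " " + rs.getString("name"));
            }
        }   // try-with-resources closes the connection, statement and result set
    }
}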
Java Regex:
The Java Regex (regular expression) API is used to define patterns for searching or
manipulating strings, as in the sketch below.
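A small sketch using the java.util.regex classes (the pattern and input text are illustrative):
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RegexDemo {
    public static void main(String[] args) {
        // Pattern: three digits, a dash, then four digits.
        Pattern pattern = Pattern.compile("\\d{3}-\\d{4}");
        Matcher matcher = pattern.matcher("Call 555-1234 or 555-9876");
        while (matcher.find()) {
            System.out.println("Found: " + matcher.group());
        }
        System.out.println("abc123".matches("[a-z]+\\d+"));  // true
    }
}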
RMI:
RMI (Remote Method Invocation) is an API that provides a mechanism for creating
distributed applications in Java. RMI allows an object to invoke methods on an object
running in another JVM, and it provides remote communication between applications
using two objects: the stub (on the client) and the skeleton (on the server).
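A minimal sketch of only the remote-interface side of RMI (the interface and method names are illustrative); a complete application would also need a server class that exports this object and a client that looks it up in the RMI registry.
import java.rmi.Remote;
import java.rmi.RemoteException;

// Methods of a remote interface can be invoked from another JVM
// and must declare RemoteException.
public interface HelloService extends Remote {
    String sayHello(String name) throws RemoteException;
}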
Networking Ports:
Port Service name Transport protocol
20, 21 File Transfer Protocol (FTP) TCP
22 Secure Shell (SSH) TCP and UDP
23 Telnet TCP
25 Simple Mail Transfer Protocol (SMTP) TCP
50, 51 IPSec
53 Domain Name Server (DNS) TCP and UDP
67, 68 Dynamic Host Configuration Protocol (DHCP) UDP
69 Trivial File Transfer Protocol (TFTP) UDP
80 Hyper Text Transfer Protocol (HTTP) TCP
110 Post Office Protocol (POP3) TCP
119 Network News Transport Protocol (NNTP) TCP
123 Network Time Protocol (NTP) UDP
135-139 NetBIOS TCP and UDP
143 Internet Message Access Protocol (IMAP4) TCP and UDP
161, 162 Simple Network Management Protocol (SNMP) TCP and UDP
389 Lightweight Directory Access Protocol TCP and UDP
443 HTTP with Secure Sockets Layer (SSL) TCP and UDP
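A small sketch of how a port number from the table is used when opening a TCP connection in Java (the host name is a placeholder and the program needs network access):
import java.net.Socket;

public class PortDemo {
    public static void main(String[] args) throws Exception {
        // Port 80 is the well-known HTTP port listed above.
        try (Socket socket = new Socket("example.com", 80)) {
            System.out.println("Connected to " + socket.getInetAddress()
                    + " on port " + socket.getPort());
        }
    }
}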
Generation of Computers:
First Generation (vacuum tubes):
These early computers used vacuum tubes as circuitry and magnetic drums for
memory. As a result they were enormous, literally taking up entire rooms and costing a
fortune to run. The vacuum tubes were inefficient, consumed huge amounts of electricity
and generated a lot of heat, which caused ongoing breakdowns.
These first generation computers relied on machine language (the most basic
programming language, which can be understood directly by computers) and were limited
to solving one problem at a time. Input was based on punched cards and paper tape, and
output came out on printouts. The two notable machines of this era were the UNIVAC and
ENIAC; the UNIVAC was the first commercial computer, purchased in 1951 by a business
customer, the US Census Bureau.
Second Generation (transistors):
The replacement of vacuum tubes by transistors saw the advent of the second
generation of computing. Although first invented in 1947, transistors were not used
significantly in computers until the end of the 1950s. They were hugely superior to vacuum
tubes, making computers smaller, faster, cheaper and less power-hungry, although they still
subjected computers to damaging levels of heat. These machines still relied on punched
cards for input and printouts for output.
Third Generation (integrated circuits):
By this phase, transistors were being miniaturised and put on silicon chips (called
semiconductors), which led to a massive increase in the speed and efficiency of these
machines. These were the first computers where users interacted using keyboards and
monitors that interfaced with an operating system, a significant leap from punched cards
and printouts. This enabled the machines to run several applications at once, with a central
program monitoring memory.
As a result of these advances, which again made machines cheaper and smaller, a new
mass market of users emerged during the 1960s.
Fourth Generation (microprocessors):
This revolution can be summed up in one word: Intel. The chip-maker developed the Intel
4004 chip in 1971, which placed all the computer components (CPU, memory, input/output
controls) onto a single chip. What filled a room in the 1940s now fit in the palm of the
hand. The Intel chip housed thousands of integrated circuits. The year 1981 saw the first
computer (from IBM) specifically designed for home use, and 1984 saw the Macintosh
introduced by Apple. Microprocessors even moved beyond the realm of computers and into
an increasing number of everyday products.
The increased power of these small computers meant they could be linked together to form
networks, which ultimately led to the development, birth and rapid evolution of the Internet.
Other major advances during this period were the graphical user interface (GUI), the mouse
and, more recently, the astounding advances in laptop capability and handheld devices.
Fifth Generation (artificial intelligence):
Computer devices with artificial intelligence are still in development, but some of these
technologies, such as voice recognition, are already beginning to emerge and be used.
The essence of the fifth generation is using these technologies to ultimately create
machines which can process and respond to natural language, and which have the
capability to learn and organise themselves.
TYPES OF COMPUTER:
Personal computer:
1. Notebook
2. Tower computer
3. Laptop
4. Subnotebook
5. Handheld
6. Palmtop
7. PDA
Mini Computer:
It is a midsize computer that can support around 200 users simultaneously.
Workstation:
A workstation is designed for engineering applications, SDLC and other kinds of
applications requiring moderate computing power and good graphics capabilities. It
generally has high-capacity storage media along with a large amount of RAM. Workstations
commonly run the UNIX and Linux operating systems and come in both diskless and
disk-drive variants.
Supercomputer and Mainframe:
A supercomputer is the fastest type of computer in the world and is very expensive. It is
designed for intensive mathematical calculations. For example, weather forecasting
requires a supercomputer. Other uses of supercomputers include scientific simulations,
(animated) graphics, fluid dynamics calculations, nuclear energy research, electronic
design, and analysis of geological data (e.g. in petrochemical prospecting). Perhaps the
best known supercomputer manufacturer is Cray Research.
Embedded SQL is a method of inserting inline SQL statements or queries into the
code of a programming language, which is known as a host language. Because the
host language cannot parse SQL, the inserted SQL is parsed by an embedded SQL
preprocessor.
Embedded SQL is a robust and convenient method of combining the computing power
of a programming language with SQL's specialized data management and
manipulation capabilities.
Every Oracle Database has a control file, which is a small binary file that records the
physical structure of the database. The control file includes:
Checkpoint information
The control file must be available for writing by the Oracle Database server
whenever the database is open. Without the control file, the database cannot be
mounted and recovery is difficult.
The control file of an Oracle Database is created at the same time as the database.
By default, at least one copy of the control file is created during database creation.
On some operating systems the default is to create multiple copies. You should
create two or more copies of the control file during database creation. You can also
create control files later, if you lose control files or want to change particular settings
in the control files.
6)Mirroring in Oracle
8)A group of servers in which, if one server fails, its users are switched instantly to the
other servers, is called a Cluster.
Microsoft has three technologies for clustering: Microsoft Cluster Service (MSCS, a HA
clustering service), Component Load Balancing (CLB) (part of Application Center
2000), and Network Load Balancing Services (NLB). In Windows Server
2008 and Windows Server 2008 R2 the MSCS service has been renamed to Windows
Server Failover Clustering and the Component Load Balancing (CLB) feature has been
deprecated.
9)Conversion of a message into a form that cannot be easily understood by
unauthorized people is called encryption.
Encryption is the conversion of electronic data into another form, called ciphertext,
which cannot be easily understood by anyone except authorized parties. Network
encryption (sometimes called network layer or network level encryption) is a network
security process that applies crypto services at the network transfer layer - above the
data link level, but below the application level. The network transfer layers are layers 3
and 4 of the Open Systems Interconnection (OSI) reference model, the layers
responsible for connectivity and routing between two end points. Using the existing
network services and application software, network encryption is invisible to the end
user and operates independently of any other encryption processes used. Data is
encrypted only while in transit, existing as plaintext on the originating and receiving
hosts.
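A hedged sketch of encryption at the application level using the standard Java crypto API (the message and the 128-bit AES key size are illustrative); it only shows that ciphertext is unreadable without the key, not network-layer encryption itself.
import java.util.Base64;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;

public class EncryptionDemo {
    public static void main(String[] args) throws Exception {
        // Generate a random AES key.
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(128);
        SecretKey key = keyGen.generateKey();

        // Encrypt the plaintext into ciphertext.
        Cipher cipher = Cipher.getInstance("AES");
        cipher.init(Cipher.ENCRYPT_MODE, key);
        byte[] ciphertext = cipher.doFinal("a secret message".getBytes("UTF-8"));
        System.out.println(Base64.getEncoder().encodeToString(ciphertext));

        // Only a holder of the key can decrypt it back to plaintext.
        cipher.init(Cipher.DECRYPT_MODE, key);
        System.out.println(new String(cipher.doFinal(ciphertext), "UTF-8"));
    }
}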
Since in this methodology a working model of the system is provided, the users get a
better understanding of the system being developed.
Practically, this methodology may increase the complexity of the system, as the scope of
the system may expand beyond the original plans.
12) Term used in networks for a unit of data which has a header and trailer: Packet
A data packet consists of three elements. The first element is the header, which contains
the information needed to get the packet from the source to the destination. The second
element is the data area, which contains the information of the user who caused the
creation of the packet. The third element is the trailer, which often contains error-checking
information (such as a checksum or CRC) used to detect errors that occur during
transmission. During communication, the sender appends the header and passes the
packet to the lower layer, while the receiver removes the header and passes it to the upper
layer. Headers are added at layers 6, 5, 4, 3 and 2, while the trailer is added at layer 2.
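A hedged sketch of the header/data/trailer idea (the field layout is made up for illustration): the trailer carries a CRC-32 checksum computed over the header and data so the receiver can detect transmission errors.
import java.nio.ByteBuffer;
import java.util.zip.CRC32;

public class PacketDemo {
    public static void main(String[] args) {
        byte[] data = "user payload".getBytes();   // data area

        // Header: illustrative 8-byte header with source and destination fields.
        ByteBuffer header = ByteBuffer.allocate(8);
        header.putInt(1);   // pretend source address
        header.putInt(2);   // pretend destination address

        // Trailer: CRC-32 checksum over header + data for error detection.
        CRC32 crc = new CRC32();
        crc.update(header.array());
        crc.update(data);

        ByteBuffer packet = ByteBuffer.allocate(8 + data.length + 8);
        packet.put(header.array()).put(data).putLong(crc.getValue());
        System.out.println("Packet length = " + packet.capacity() + " bytes");
    }
}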
13) Project Management Tools. A Gantt chart, Logic Network, PERT chart, Product
Breakdown Structure and Work Breakdown Structure are standard tools used
in project planning.
15) If you are on an intranet and cannot access the Internet, what will you
check? Proxy settings
A proxy or proxy server is basically another computer which serves as a hub through
which internet requests are processed. By connecting through one of these servers, your
computer sends your requests to the proxy server, which then processes your request and
returns what you requested. In this way it serves as an intermediary between your machine
and the rest of the computers on the internet. Proxies are used for a number of reasons,
such as to filter web content, to get around restrictions such as parental blocks, to screen
downloads and uploads, and to provide anonymity when surfing the internet.
DNS: Domain Name System is an Internet service that translates domain names into IP
addresses.
The DNS has a distributed database that resides on multiple machines on the
Internet.
DNS has some protocols that allow the client and servers to communicate with each
other.
When the Internet was small, mapping was done by using hosts.txt file.
The host file was located at host's disk and updated periodically from a master host
file.
When any program or any user wanted to map domain name to an address, the host
consulted the host file and found the mapping.
Now that the Internet is no longer small, it is impossible to have only one host file relating
every address to a name and vice versa.
The solution used today is to divide the host file into smaller parts and store each part
on a different computer.
In this method, the host that needs mapping can call the closest computer holding the
needed information.
This method is used in Domain Name System (DNS).
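A small sketch of a DNS lookup from Java (the host name is a placeholder and the program needs network access): the resolver configured on the machine maps the domain name to an IP address.
import java.net.InetAddress;

public class DnsDemo {
    public static void main(String[] args) throws Exception {
        InetAddress address = InetAddress.getByName("example.com");
        System.out.println(address.getHostName() + " -> " + address.getHostAddress());
    }
}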
Name space
The names assigned to the machines must be carefully selected from a name space
with complete control over the binding between the names and IP addresses.
There are two types of name spaces: flat name spaces and hierarchical name spaces.
Keylogger:
A keylogger is a type of surveillance software (considered to be
either software or spyware) that has the capability to record every keystroke you make
to a log file, usually encrypted. A keylogger recorder can record instant messages, e-
mail, and any information you type at any time using your keyboard. The log file
created by the keylogger can then be sent to a specified receiver. Some
keylogger programs will also record any e-mail addresses you use and the Web
site URLs you visit.
Keyloggers, as a surveillance tool, are often used by employers to ensure employees
use work computers for business purposes only. Unfortunately, keyloggers can also be
embedded in spyware, allowing your information to be transmitted to an unknown third
party.
Cloud Computing: Cloud computing is a type of computing that relies on sharing
computing resources rather than having local servers or personal devices to
handle applications.
In cloud computing, the word cloud (also phrased as "the cloud") is used as a
metaphor for "the Internet," so the phrase cloud computing means "a type of Internet-
based computing," where different services such as servers, storage and
applications are delivered to an organization's computers and devices through the
Internet.
Here are a few of the things you can do with the cloud:
How it Works
Cloud computing applies traditional supercomputing, or high-performance
computing power, normally used by military and research facilities, to perform tens of
trillions of computations per second. In consumer-oriented applications, this power is used,
for example, to manage financial portfolios, to deliver personalized information, to provide
data storage, or to power large, immersive online computer games.
To do this, cloud computing uses networks of large groups of servers typically running
low-cost consumer PC technology with specialized connections to spread data-
processing chores across them. This shared IT infrastructure contains large pools of
systems that are linked together. Often, virtualization techniques are used to maximize
the power of cloud computing.
Infrastructure-as-a-service (IaaS)
The most basic category of cloud computing services. With IaaS, you rent IT
infrastructure (servers and virtual machines (VMs), storage, networks, operating
systems) from a cloud provider on a pay-as-you-go basis.
There are three different ways to deploy cloud computing resources: public
cloud, private cloud and hybrid cloud.
Public cloud
Public clouds are owned and operated by a third-party cloud service provider, which
delivers its computing resources, such as servers and storage, over the Internet. Microsoft
Azure is an example of a public cloud. With a public cloud, all hardware, software and
other supporting infrastructure is owned and managed by the cloud provider. You
access these services and manage your account using a web browser.
Private cloud
Hybrid cloud
Hybrid clouds combine public and private clouds, bound together by technology that
allows data and applications to be shared between them. By allowing data and
applications to move between private and public clouds, hybrid cloud gives businesses
greater flexibility and more deployment options.
MIS is the use of information technology, people, and business processes to record,
store and process data to produce information that decision makers can use to make
day to day decisions.
The following are some of the justifications for having an MIS system
Components of MIS
The type of information system that a user uses depends on their level in the
organization; each of the major levels of users in an organization (operational, tactical and
strategic) has a corresponding type of information system. At the operational level, the
Transaction Processing System is used.
This type of information system is used to record the day to day transactions of a
business. An example of a Transaction Processing System is a Point of Sale (POS)
system. A POS system is used to record the daily sales.
Management Information Systems are used to help tactical managers make semi-
structured decisions. The output from the transaction processing system is used as
input to the MIS system.
A manual information system does not use any computerized devices. The recording,
storing and retrieving of data is done manually by the people, who are responsible for
the information system.
1. Control key is used in combination with another key to perform a specific task
2. Scanner will translate images of text, drawings and photos into digital form
3. CPU is the brain of the computer
4. Something which has easily understood instructions is said to be user friendly
5. Information on a computer is stored as digital data
6. For creating a document, you use new command at file menu
7. The programs and data are kept in main memory while the processor is using them
8. Ctrl + A command is used to select the whole document
9. Sending an e-mail is the same as writing a letter
10. A Website address is a unique name that identifies a specific website on the web
11. Answer sheets in bank POs/Clerks examinations are checked by using Optical
Mark Reader
12. Electronic data interchange provides strategic and operational business opportunity
13. Digital signals used in ISDN have whole number values
30. The COPY command in MS-DOS is used to copy one or more files from one disk drive to
another, or from one directory to another directory
31. REN command is Internal command
32. Tim Berners-Lee propounded the concept of the World Wide Web
33. The set of wires over which the memory address is sent from the CPU to the main
memory is called the address bus
34. A MODEM is an electronic device required for the computer to connect to the INTERNET
35. A source program is a program which is to be translated into machine language
36. Virus in computer relates to program
37. Floppy is not a storage medium in the computer related hardware
38. DOS floppy disk does not have a boot record
39. The CPU in a computer comprises the store, arithmetic and logic unit, and control
unit
40. In computer parlance, a mouse is a pointing device
41. OMR is used to read choice filled up by the student in common entrance test
42. A network that spreads over cities is WAN
43. File Manager is not a part of a standard office suite
44. The topology of a computer network refers to the cabling layout between PCs
45. In UNIX command Ctrl + Z is used to suspend current process or command
46. Word is the word processor in MS Office
47. Network layer of an ISO-OSI reference model is for networking support
48. Telnet helps in remote login
49. MS Word allows creation of .DOC type of documents by default
50. In case of MS-access, the rows of a table correspond to records
51. Record maintenance in database is not a characteristic of E-mail
52. In a SONET system, an add/drop multiplexer can extract signals from and insert signals
into the stream and can also add/remove headers
53. The WWW standard allows programs on many different computer platforms to
show the information held on a server. Such programs are called Web Browsers
54. One of the oldest calculating devices was the abacus
55. Paint art is not a special program in MS Office
56. Outlook Express is an e-mail client, scheduler and address book
57. The first generation computers had vacuum tubes and magnetic drum
58. Office Assistant is an animated character that gives help in MS Office
59. AltaVista was created by the research facility of Digital Equipment Corporation of the USA
61. Search engines continuously send out spiders that start on the homepage of a
server and pursue all its links stepwise
62. Static keys make a network insecure
63. Joy Stick is an input device that cannot be used to work in MS Office
64. Artificial intelligence can be used in every sphere of life because of its ability to
think like human beings
65. To avoid wastage of memory, the instruction length should be of word size,
which is a multiple of character size
Set-2
1. A set of computer programs used for a certain function such as word processing is
the best definition of a software package
2. You can start Microsoft word by using start button
3. A blinking symbol on the screen that shows where the next character will appear
is a cursor
4. Highlight and delete is used to remove a paragraph from a report you had written
5. Date and time are available on the desktop at the taskbar
6. A directory within a directory is called sub directory
7. Testing is the process of finding errors in software code
8. In Excel, charts are created using chart wizard option
11. A tool bar contains buttons and menus that provide quick access to commonly
used commands
14. A programming language contains specific rules and words that express the
logical steps of an algorithm
15. One advantage of dial-up internet access is that it utilizes existing telephone lines
16. Protecting data by copying it from the original source is backup
17. Network components are connected to the same cable in the bus topology
18. Two or more computers connected to each other for sharing information form a network
19. A computer checks the database of user names and passwords for a match
before granting access
20. Computers that are portable and convenient for users who travel are known as
laptops
21. Spam is the term for unsolicited e-mail
22. The operating system controls the various computer parts and allows the
user to interact with the computer
23. Each cell in a Microsoft Office Excel document is referred to by its cell address,
which is the cell's row and column labels
24. Eight digit binary number is called a byte
25. Office LANs that are spread geographically apart on a large scale can be
connected using a corporate WAN
26. Installation is the process of copying software programs from secondary storage
media to the hard disk
27. The code for a web page is written using Hyper Text Markup Language
28. Small application programs that run on a Web page and may ensure a
form is completed properly or provide animation are known as applets
29. In a relational database, table is a data structure that organizes the information
about a single topic into rows and columns
30. The first computers were programmed using machine language
31. When the pointer is positioned on a hyperlink it is shaped like a hand
32. Booting process checks to ensure the components of the computer are operating
and connected properly
33. By checking the existing files saved on the disk, the user can determine what programs
are available on a computer
34. Special effects used to introduce slides in a presentation are called animation
35. Computers send and receive data in the form of digital signals
36. Most World Wide Web pages contain commands in the HTML language
37. Icons are graphical objects used to represent commonly used applications
38. UNIX is not owned and licensed by a company
39. In any window, the maximize button, the minimize button and the close buttons
appear on the title bar
40. Dial-up Service is the slowest internet connection service
41. Every component of your computer is either hardware or software
42. Checking that a pin code number is valid before it is entered into the system is
an example of data validation
43. A compiler translates higher level programs into a machine language program,
which is called object code
44. The ability to find an individual item in a file immediately is called direct access
50. A spreadsheet works like a calculator for keeping track of money and
making budgets
51. To take information from one source and bring it to your computer is referred to
as download
52. Each box in a spread sheet is called a cell
53. Network components are connected to the same cable in the bus topology
54. Two or more computers connected to each other for sharing information
form a network
55. A computer checks the database of user names and passwords for a match
before granting access.
56. Spam is the other name for unsolicited e-mail
57. Operating system controls the various computer parts and allows the user to
interact with the computer
58. Each cell in a Microsoft Office Excel document is referred to by its cell address,
which is the cell's row and column labels
59. Installation is the process of copying software programs from secondary storage
media to the hard disk
60. The code for a web page is written using Hypertext Markup Language
61. Small application programs that run on a web page and may ensure a form
is completed properly or provide animation are known as applets
62. A filename is a unique name that you give to a file of information
63. For seeing the output, you use monitor
64. CDs are round in shape
65. Control key is used in combination with another key to perform a specific task
66. Scanner will translate images of text, drawings and photos into digital form
67. CPU is the brain of the computer
68. Something which has easily understood instructions is said to be user friendly
69. Information on a computer is stored as digital data
70. For creating a document, you use new command at file menu
71. The programs and data are kept in main memory while the processor is using them
72. Ctrl + A command is used to select the whole document
73. Sending an e-mail is the same as writing a letter
74. A Website address is a unique name that identifies a specific website on the web
75. Answer sheets in bank POs/Clerks examinations are checked by using Optical
Mark Reader
76. Electronic data interchange provides strategic and operational business opportunity
77. Digital signals used in ISDN have whole number values
78. Assembler is language translation software
79. Manual data can be put into computer by scanner
80. In a bank, after computerization cheques are taken care of by MICR
81. The banks use MICR device to minimize conversion process
82. Image can be sent over telephone lines by using scanner
83. Microchip elements are unique to a smart card
84. MS-DOS is a single user operating system
85. Basic can be used for scientific and commercial purpose
Set-3
7. Storage that retains its data after the power is turned off is referred to as non-
volatile storage
8. Virtual memory is memory on the hard disk that the CPU uses as an extended
RAM
9. To move to the beginning of a line of text, press the home key
10. When sending and e-mail, the subject line describes the contents of the message
11. Microsoft Office is an application suite
12. Information travels between components on the motherboard through buses
13. One advantage of dial-up internet access is that it utilizes existing telephone lines
14. Network components are connected to the same cable in the bus topology
15. Booting checks to ensure the components of the computer are operating
and connected properly
16. Control key is used in combination with another key to perform a specific task
17. Scanner will translate images of text, drawings, and photos into digital form
18. Information on a computer is stored as digital data
19. The programs and data are kept in main memory while the processor is using them
20. The storage unit provides storage for information and instructions
21. The Help menu button exists at Start
22. Microsoft company developed MS Office 2000
23. Charles Babbage is called the father of modern computing
24. The data link layer of the OSI reference model provides the service of error detection
and control to the layer above it
25. Optical fiber is not a network
26. OMR is used to read choice filled up by the student in common entrance test
27. A network that spreads over cities is WAN
28. File Manager is not a part of a standard office suite
29. A topology of computer network means cabling between PCs
30. In UNIX command Ctrl + Z is used to suspend current process or command
31. Word is the word processor in MS Office
32. Network layer of an ISO-OSI reference model is for networking support
33. Telnet helps in remote login
34. MS Word allows creation of .DOC type of documents by default
35. In case of MS-access, the rows of a table correspond to records
36. Record maintenance in database is not a characteristic of E-mail
37. In a SONET system, an add/drop multiplexer can extract signals from and insert signals
into the stream and can also add/remove headers
38. The WWW standard allows programs on many different computer platforms to
show the information held on a server. Such programs are called Web Browsers
39. One of the oldest calculating devices was the abacus
40. Paint art is not a special program in MS Office
41. Outlook Express is an e-mail client, scheduler and address book
42. The first generation computers had vacuum tubes and magnetic drum
43. Office Assistant is an animated character that gives help in MS Office
44. AltaVista was created by the research facility of Digital Equipment Corporation of the USA
46. Search engines continuously send out spiders that start on the homepage of a
server and pursue all its links stepwise
47. Static keys make a network insecure
48. Joy Stick is an input device that cannot be used to work in MS Office
49. Artificial intelligence can be used in every sphere of life because of its ability to
think like human beings
50. To avoid wastage of memory, the instruction length should be of word size, which
is a multiple of character size
51. Electronic fund transfer is the exchange of money from one account to another
52. The Format menu in MS Word can be used to change page size and typeface
53. Assembly language programs are written using Mnemonics
54. DMA module can communicate with CPU through cycle stealing
55. A stored link to a web page, in order to have a quick and easy access to it later, is called
bookmark
56. B2B type of commerce is characterized by low volume and high value transactions in banking
57. Advanced is not a standard MS Office edition
58. Workstation is single user computer with many features and good processing power
59. History list is the name of list that stores the URLs of web pages and links visited in past few
days
60. FDDI access mechanism is similar to that of IEEE 802.5
61. MS Office 2000 included a full-fledged web designing software called FrontPage
62. 2000
63. Macintosh is IBMs microcomputer
64. X.21 is physical level standard for X.25
65. Enter key should be pressed to start a new paragraph in MS Word
66. Main frame is most reliable, robust and has a very high processing power.
67. The Formatting toolbar allows changing of fonts and their sizes
68. The ZZ command is used to quit the vi editor after saving
69. The program supplied by VSNL when you ask for internet connection for the e-mail access is
pine
70. The convenient place to store contact information for quick, retrieval is address book
71. Digital cash is not a component of an e-wallet
72. For electronic banking, we should ensure the existence of procedures with regard to the
identification of customers who become members electronically
73. John von Neumann developed the stored-program concept
74. Hardware and software are mandatory parts of complete PC system
75. Firewall is used in PC for security
76. Two rollers are actually responsible for the movement of the cursor in a mouse