CSC 222 Lect I
Prerequisite: CSC201
Data storage and retrieval; information capture and representation; management applications;
analysis and indexing; search and retrieval; privacy, integrity, security, scalability, efficiency and
effectiveness; data types, records and files. File processing methods; mapping logical organization
onto physical storage. Backup procedures and file processing.
DATA MANAGEMENT
Data Management is the practice of collecting, storing, organizing, verifying and processing data
securely, efficiently, and cost-effectively.
Data management can also be defined as an administrative process that includes acquiring,
validating, storing, protecting, and processing required data to ensure the accessibility, reliability,
and timeliness of the data for its users.
The goal of data management is to help people, organizations, and connected things optimize the
use of data within the bounds of policy and regulation so that they can make decisions and take
actions that maximize the benefit to the organization.
Operations of Data Management
1. Create, access, and update data across a diverse data tier
2. Store data across multiple clouds and on premises
3. Provide high availability and disaster recovery
4. Use data in a growing variety of apps, analytics, and algorithms
5. Ensure data privacy and security
6. Archive and destroy data in accordance with retention schedules and compliance
requirements
SRAM | DRAM
Stands for Static Random Access Memory | Stands for Dynamic Random Access Memory
Does not require refresh cycles to retain data | Requires periodic refresh cycles to retain data
Requires less time to access data | Requires more time to access data
ROM (Read-Only Memory): This memory is used when the computer begins to boot up. Small programs
called firmware are often stored in ROM chips on hardware devices (like a BIOS chip), and they
contain instructions the computer can use in performing some of the most basic operations required
to operate hardware devices. Like RAM, ROM can be read directly at any address (random access);
the difference is that its contents are not normally written during operation. It is generally less
expensive per bit than RAM. ROM is non-volatile and cannot be easily or quickly overwritten or
modified.
Types of ROM include:
Programmable ROM, or PROM, is essentially a blank version of ROM that you can
purchase and program once with the help of a special tool called a programmer. A special
PROM programmer is employed to enter the program on the PROM. Once the chip has been
programmed, information on the PROM can’t be altered. PROM is non-volatile, that is data
is not lost when power is switched off.
Erasable programmable ROM (EPROM) - A type of ROM that is programmed using high
voltages and erased by exposure to ultraviolet light for about 20 minutes.
Electrically-erasable programmable ROM (EEPROM) - Often used in older computer
chips and to control BIOS, EEPROM can be erased and reprogrammed several times while
enabling the erase and writing of only one location at a time. Flash memory is an updated
version of EEPROM that allows numerous memory locations to be changed at the same time.
The main difference between PROM, EPROM and EEPROM is that PROM is programmable only
once, while EPROM is reprogrammable after erasure with ultraviolet light and EEPROM is
reprogrammable using an electric charge.
*Note: RAM and ROM are both located on the computer motherboard, but on separate chips.
RAM | ROM
RAM can be modified (allows reading and writing) | ROM can’t be modified (allows reading only)
Large size with higher capacity | Small size with less capacity
Requires flow of electricity to retain data | Does not require flow of electricity to retain data
Cache Memory: Cache (pronounced "cash") memory is extremely fast memory that is built into a
computer’s central processing unit (CPU), or located next to it on a separate chip. It is a small,
fast, and expensive memory that stores copies of the data that is accessed frequently from the main
memory. Before reading data from or writing data to the main memory, the processor checks for the
same data in the cache memory. If it finds the data there, the processor reads from or writes to the
cache itself, because its access time is much shorter than that of the main memory. The transfer of
data between the processor and the cache memory is bidirectional. Finding the requested data in the
cache is known as a cache hit; not finding it is a cache miss. The effectiveness of a cache is
measured by its hit ratio, the fraction of accesses that are cache hits. The advantage of cache
memory is that the CPU does not have to use the motherboard’s system bus for data transfer.
Whenever data must be passed through the system bus, the data transfer speed slows to the
motherboard’s capability. The CPU can process data much faster by avoiding the bottleneck
created by the system bus.
In the beginning, the program and the data associated with the program lie in the main
memory, and the cache is empty.
When the processor starts executing the program, it reads the instruction from the main
memory and places it on the processor chip (registers). Along with this, it places a copy of
each instruction on to the cache.
If execution of a particular instruction requires any associated data, the processor accesses
it from the main memory and also places a copy of it in the cache memory.
Now consider that these instructions have to be executed repeatedly (as in the case of a loop).
If the instructions are available in the cache, then the processor will access them directly
from the cache memory. As the cache is faster than the main memory, this will ultimately
speed up execution.
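The loop behaviour described in the steps above can be sketched with a small simulation. This is a minimal sketch, not a model of any real processor: the function name simulate_cache, the address values and the least-recently-used eviction rule are assumptions made for illustration only.

```python
from collections import OrderedDict


def simulate_cache(addresses, cache_size):
    """Count cache hits and misses for a sequence of memory addresses.

    Assumption: when the cache is full, the least recently used
    entry is evicted (the notes do not name a replacement policy).
    """
    cache = OrderedDict()  # keys are addresses, ordered by recency of use
    hits = 0
    misses = 0
    for addr in addresses:
        if addr in cache:
            hits += 1
            cache.move_to_end(addr)  # mark as most recently used
        else:
            misses += 1
            cache[addr] = True  # copy fetched from main memory
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict least recently used
    return hits, misses


# A loop of 4 instructions (hypothetical addresses 100-103) executed 5 times.
loop_trace = [100, 101, 102, 103] * 5
hits, misses = simulate_cache(loop_trace, cache_size=8)
hit_ratio = hits / (hits + misses)
print(hits, misses, hit_ratio)
```

Only the first pass through the loop misses; every later pass is served from the cache, which is why loops benefit so much from caching.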
Another example: a browser cache holds copies of recently accessed data such as a web page and
the pictures on it. It keeps this data ready to "swap" onto your screen within fractions of a
second. So, instead of requiring your computer to go to the original web page and photos on a
server in, say, Denmark, the cache simply offers you the latest copy from your own hard drive.
This caching-and-swapping speeds up page viewing because the next time you request that page,
it is accessed from the cache on your computer instead of from the distant Web server.
(ii) Secondary Cache – It is also known as level 2 (L2) cache or external cache. The
secondary cache is located outside the CPU. It is normally positioned on the
motherboard of a computer. The secondary cache is larger than the primary cache but
slower.
(iii) L3 Cache: It is a specialized memory developed to improve the performance of L1 and
L2. It is larger and slower than both the L1 and L2 cache.
Cache | RAM
Holds data frequently used by the CPU | Holds programs and data that are currently being executed by the CPU
Registers
Registers are memory units built into the processor chip. They are the smallest and fastest
memory in a computer. They are not part of the main memory; they are located in the CPU.
A register temporarily holds frequently used data, instructions and memory addresses that are to
be used by the CPU. Registers hold the instructions that are currently being processed by the CPU.
All data must pass through registers before it can be processed, so the CPU uses them to process
the data entered by users.
A register is very small: it typically stores one word of data (for example, up to 64 bits). As it is
the nearest memory to the processor, it has the fastest access time.
All CPUs have some registers that store instructions, variables, and temporary results. CPUs also
have some special registers for storing special data.
A program is a set of instructions that are brought into the main memory for execution. Accessing
an instruction from the main memory takes longer than executing it. The CPU therefore uses
registers to hold instructions, key variables and temporary results: during program execution, each
instruction or word needed from the main memory is first brought into a register. The CPU then
accesses the instruction from the register and performs the desired action. The CPU also stores
temporary and final results in registers, and from the registers they are written back to the main
memory.
Types of Registers
1. General Purpose Registers: Also referred to as processor registers. They serve a variety
of functions, including holding operands that have been loaded from memory for
processing.
2. Memory Buffer Register (MBR): It stores a word fetched from the main memory or I/O
unit. It even stores the word that the process has to send back to the main memory or I/O
unit.
3. Memory Address Register (MAR): It specifies the address in a memory from where the
word will be read into MBR or where the word from MBR will be written into memory.
4. Instruction Register (IR): It holds an 8-bit opcode (machine instruction) that is currently
being executed.
5. Instruction Buffer Register (IBR): The IBR register temporarily holds the right-hand
instruction from the word in the memory.
6. Program Counter (PC): PC holds the memory address of the instruction that has to be
fetched next for execution.
7. Accumulator (AC): Accumulator holds temporary operands and results of any ALU
operations.
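How the registers listed above cooperate can be sketched with a toy fetch-and-execute loop. This is an illustrative sketch only: the four opcodes (LOAD, ADD, STORE, HALT), the memory layout and the function name run are hypothetical and do not correspond to any real instruction set.

```python
def run(memory):
    """Execute a toy program, using variables named after the registers."""
    pc = 0  # Program Counter: address of the next instruction
    ac = 0  # Accumulator: holds operands and ALU results
    while True:
        mar = pc           # MAR: address to read from
        mbr = memory[mar]  # MBR: word fetched from main memory
        ir = mbr           # IR: instruction currently being executed
        pc += 1            # PC now points at the next instruction
        opcode, operand = ir
        if opcode == "LOAD":
            mar = operand
            mbr = memory[mar]
            ac = mbr
        elif opcode == "ADD":
            mar = operand
            mbr = memory[mar]
            ac = ac + mbr
        elif opcode == "STORE":
            mar = operand
            mbr = ac
            memory[mar] = mbr  # word in MBR is written back to memory
        elif opcode == "HALT":
            return ac
        else:
            raise ValueError(f"unknown opcode: {opcode}")


# Hypothetical layout: instructions at addresses 0-3, data at 10-12.
memory = {
    0: ("LOAD", 10),
    1: ("ADD", 11),
    2: ("STORE", 12),
    3: ("HALT", None),
    10: 7,
    11: 5,
    12: 0,
}
result = run(memory)
print(result, memory[12])
```

Every word moving between the CPU and memory passes through the MBR, and every address used goes through the MAR, which matches the descriptions in the list.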
No. | Cache | Register
1 | Cache is a smaller and faster memory unit of a computer system | Register is the smallest and fastest memory, integrated into the computer's processor
2 | Access time is comparatively longer | Access time is shorter than that of the cache
3 | Cache memory is exactly a memory unit | It is located on the CPU
4 | It stores recently used data | It stores data that the CPU is currently processing
5 | Size: 2 KB to a few MB | Size: one word of data, i.e. up to 64 bits
6 | Cache can be located on the system's motherboard or within the CPU | Registers are part of the computer's CPU
7 | Whenever the processor reads some data from the main memory, it places a copy of it in the cache | Whenever the processor identifies operands from the memory, it places them in registers
8 | Types: L1, L2 and L3 | Types: MAR, MBR, PC, AC, etc.
9 | Web page cache, database query cache, prefetch cache, etc. are examples of caches | A loop counter is an example of data held in a register
In conclusion, only the primary cache (L1) and the registers are present on the processor.
Registers are the smallest and fastest memory component of any computer. Although caches and
registers are both small memory units, they are used for different purposes: the cache stores
recently used instructions and data, whereas the processor uses registers to hold the instructions
and data that it is currently processing.
Secondary Storage
Secondary storage devices are used to store data for future use or as backup. It is the storage area
that allows the user to save and store data permanently. This type of memory does not lose the
data due to any power failure or system crash. That's why we also call it non-volatile storage.
Secondary storage includes memory devices that are not a part of the CPU chipset or motherboard,
for example, magnetic disks, optical disks (DVD, CD, etc.), hard disks, flash drives, and magnetic
tapes.
Solid State Storage: Examples of solid state storage include solid state drives (SSD), memory
cards, and USB flash drives. Solid state and flash storage use electrical circuits to store data. If an
electrical circuit is high, it represents a binary 1, and if it is low, it represents a 0.
Magnetic Storage: This type of storage media is also known as online storage media. A magnetic
disk is used for storing data for a long time and is capable of storing an entire database. It is the
responsibility of the computer system to make the data on a disk available to the main memory for
further access. Also, if the system performs any operation on the data, the modified data should be
written back to the disk. The great strength of a magnetic disk is that its data is not affected by a
system crash or power failure, although a disk failure can easily destroy the stored data. Examples
include hard disks (internal and external), magnetic tape, and floppy disks.
Tertiary Storage
It is the storage type that is external to the computer system. Tertiary storage is used to store
huge volumes of data. Since such storage devices are external to the computer system, they are the
slowest, but they are capable of storing a large amount of data. Tertiary storage is generally
used for data backup. The following tertiary storage devices are available:
Optical Storage: An optical storage can store megabytes or gigabytes of data. A Compact
Disk (CD) can store 700 megabytes of data with a playtime of around 80 minutes. On the
other hand, a Digital Video Disk or a DVD can store 4.7 or 8.5 gigabytes of data on each
side of the disk. Examples of optical storage include CD-ROM, CD-R, CD-RW, DVD, and
Blu-ray Discs.
Tape Storage: Tape is a cheaper storage medium than disk. Generally, tapes are used for
archiving or backing up data. Tape provides slow access to data because it accesses data
sequentially from the start. Thus, tape storage is also known as sequential-access storage.
Disk storage is known as direct-access storage, as we can directly access the data from any
location on disk.
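The difference between sequential access (tape) and direct access (disk) can be sketched with fixed-length records held in an in-memory byte buffer. This is only an illustration: the 16-byte record size, the sample names and the helper functions are assumptions, and a real tape or disk is far more involved.

```python
import io

RECORD_SIZE = 16  # assumed fixed record length in bytes


def make_store(names):
    """Write each name as one fixed-length, space-padded record."""
    store = io.BytesIO()
    for name in names:
        store.write(name.encode("ascii").ljust(RECORD_SIZE, b" "))
    return store


def read_sequential(store, index):
    """Tape-like access: start at the beginning and read every record
    up to and including the one wanted. Returns (value, records_read)."""
    store.seek(0)
    record = b""
    reads = 0
    for _ in range(index + 1):
        record = store.read(RECORD_SIZE)
        reads += 1
    return record.decode("ascii").rstrip(), reads


def read_direct(store, index):
    """Disk-like access: jump straight to the record's byte offset."""
    store.seek(index * RECORD_SIZE)
    record = store.read(RECORD_SIZE)
    return record.decode("ascii").rstrip(), 1


store = make_store(["Ada", "Bola", "Chidi", "Dayo", "Emeka"])
print(read_sequential(store, 3))
print(read_direct(store, 3))
```

Both calls return the same record, but the sequential version has to read four records to reach it while the direct version reads one. This is also a small instance of mapping a logical organization (record number) onto physical storage (byte offset).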
Storage Hierarchy
Besides the above, various other storage devices reside in the computer system. These storage
media are organized on the basis of data access speed, cost per unit of data, and reliability of the
medium. Arranging the storage media described above according to speed and cost gives a storage
hierarchy, from top (fastest) to bottom (slowest): registers, cache, main memory, secondary storage
(solid state and magnetic disk), and tertiary storage (optical disk and magnetic tape).
In this hierarchy, the higher levels are expensive but fast. Moving down, the cost per bit
decreases and the access time increases. Also, the storage media from the main memory upwards
are volatile, while all the devices below the main memory are non-volatile.
An information retrieval system is a system used to store items of information that need to be
processed, searched, retrieved and disseminated to various user populations.
Functions of Information Storage and Retrieval/Information Retrieval System (ISAR/IRS):
To identify sources of information relevant to the areas of interest of the target user
community.
To analyze the contents of the sources (information).
To represent the contents of the analyzed sources in a way that will be suitable for
matching users’ queries.
To analyze users’ queries and to represent them in a form that will be suitable for
matching the database.
To match the search statement with the stored database.
To retrieve the information that is relevant.
To make necessary adjustments in the system based on feedback from the users.
Kinds of Information Retrieval System:
1. Offline Search: In offline search, users can get the required information with or without
the help of a computer and the internet, for example from libraries, CD-ROMs, etc.
2. Online Search: The search of a remotely located database through interactive
communication, with the help of a computer and a communication channel. Online
databases can be accessed through a vendor or directly. Examples: OPAC, databases,
the Internet, etc.
Retrieval Techniques:
Retrieval techniques are designed to help users to locate the information they need effectively and
efficiently. These techniques help users to find out the required information easily. There are two
types of retrieval techniques: Basic Retrieval Techniques and Advanced Retrieval Techniques
Basic Retrieval Techniques:
i. Boolean Search
ii. Truncation Searching
iii. Proximity Searching
iv. Range Searching
v. Case Sensitive Searching
Advanced Retrieval Technique:
i. Fuzzy Searching
ii. Query Expansion
iii. Multiple Database Searching
NOT: narrows your search by telling the database to exclude from your search results all
records that contain the term that follows it. This can be useful when you are interested in
a very specific aspect of a topic (letting you weed out the issues that you're not planning to
write about).
Example: searching for sex education NOT abstinence-only will return articles on sex
education, but not those dealing with abstinence-only approaches.
OR: broadens a search by including more concepts. It allows users to combine two or more
search terms; the system will retrieve all those records that contain either one or all of the
constituent terms.
This is particularly helpful when you are searching for synonyms, such as “death penalty”
OR “capital punishment.” So, if you type in death penalty OR capital punishment, your
results will include articles with either term, but not necessarily both.
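The effect of the OR and NOT operators described above can be sketched with Python sets, where each search term maps to the set of documents that contain it. The four sample documents and the helper name matching are made up for illustration; a real retrieval system would use an index rather than scanning every text.

```python
# Hypothetical mini-collection: document id -> text.
documents = {
    1: "death penalty debate in parliament",
    2: "capital punishment abolished",
    3: "sex education in schools",
    4: "abstinence-only sex education programme",
}


def matching(phrase):
    """Return the set of document ids whose text contains the phrase."""
    return {doc_id for doc_id, text in documents.items() if phrase in text}


# OR broadens the search: set union.
or_result = matching("death penalty") | matching("capital punishment")

# NOT narrows the search: set difference.
not_result = matching("sex education") - matching("abstinence-only")

print(sorted(or_result))
print(sorted(not_result))
```

OR corresponds to set union, so a document with either term is returned, while NOT corresponds to set difference, so document 4 is dropped even though it mentions sex education.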
C. Proximity Searching: A proximity search allows users to specify how close two (or more)
words must be to each other in order to register a match.
For example, a search could be used to find "red brick house", and match phrases such as "red
house of brick" or "house made of red brick". By limiting the proximity, these phrases can be
matched while avoiding documents where the words are scattered or spread across a page or in
unrelated articles in an anthology.
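A proximity match like the "red brick house" example can be sketched as follows. This is a minimal sketch under stated assumptions: the function name within_proximity and the rule "all terms must fall inside a window of max_span consecutive words" are illustrative choices, not the syntax of any particular search system.

```python
import re
from itertools import product


def within_proximity(text, terms, max_span):
    """Return True if every term occurs inside some window of at most
    max_span consecutive words (word order does not matter)."""
    words = re.findall(r"[a-z]+", text.lower())
    # For each term, list the word positions where it occurs.
    positions = [[i for i, w in enumerate(words) if w == term] for term in terms]
    if any(not p for p in positions):
        return False  # at least one term is missing entirely
    # Try every combination of one position per term.
    return any(
        max(combo) - min(combo) + 1 <= max_span
        for combo in product(*positions)
    )


terms = ["red", "brick", "house"]
near_1 = within_proximity("a red house of brick", terms, max_span=5)
near_2 = within_proximity("house made of red brick", terms, max_span=5)
scattered = within_proximity(
    "The red car stopped. Many pages later, a house appeared beside a pile of brick.",
    terms,
    max_span=5,
)
print(near_1, near_2, scattered)
```

The first two phrases match because the three words fall within five consecutive words; the last text contains all three words but they are spread too far apart.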
D. Range Searching: It is very useful in numerical searching. It is important in selecting records
within certain data ranges.
The following options are usually available for range searching:
Greater than (>)
Less than (<)
Equal to (=)
Not equal to (!= or <>)
Greater than or equal to (>=)
Less than or equal to (<=)
Example: To search for documents or items that contain numbers within a range, type
your search term and the range of numbers separated by two periods (..) with no spaces.
E.g. to search for pencils that cost between N150 and N250, type the following:
Pencils N150..N250
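The N150..N250 range query above can be sketched in a few lines. The item list, the prices and the helper name parse_range are invented for illustration, and treating both ends of the range as inclusive is an assumption.

```python
def parse_range(expr):
    """Split a range such as 'N150..N250' into two integers.

    Assumption: 'N' is a currency prefix (naira) and both bounds are inclusive.
    """
    low, high = expr.split("..")
    return int(low.lstrip("N")), int(high.lstrip("N"))


# Hypothetical catalogue: (item name, price in naira).
items = [
    ("pencil HB", 120),
    ("pencil 2B", 150),
    ("pencil set", 240),
    ("pencil deluxe", 300),
]

low, high = parse_range("N150..N250")
in_range = [name for name, price in items if low <= price <= high]
print(in_range)
```

The comparison low <= price <= high combines the "greater than or equal to" and "less than or equal to" operators from the list of options.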
E. Case Sensitive Searching: Text sometimes exhibits case sensitivity; that is, words can differ
in meaning based on differing use of uppercase and lowercase letters. Words with capital letters
do not always have the same meaning when written with lowercase letters. For example, Bill is
the first name of the former U.S. president William (Bill) Clinton, who could sign a bill. Search
systems differ in how they handle case: Google searches, for example, are generally case
insensitive, whereas passwords such as a Gmail password are case sensitive.
C. Multiple Database Searching: It means searching more than one Information Retrieval
System. The need for searching multiple databases is threefold.
i. First, searching in a single information retrieval system may not find what the user is
looking for.
ii. Second, multiple database searching can serve as a selection tool if the user is not sure
which system would be the best choice for a given query.
iii. Third, results obtained from multiple database searching can suggest or indicate suitable
systems for the user to conduct further searches.
Data and Information Capture and Representation
Data is represented on modern storage media using the binary numeral system.
All data stored on storage media – whether that’s hard disk drives (HDDs), solid state drives
(SSDs), external hard drives, USB flash drives, SD cards etc – can be converted to a string of bits,
otherwise known as binary digits. These binary digits have a value of 1 or 0, and the strings can
make up photos, documents, audio and video. A byte is the most common unit of storage and is
equal to 8 bits.
All data in a computer is stored as a number. For example, letters become numbers; the Complete
Works of Shakespeare is around 1250 pages in print and, at one byte per letter, totals about five
megabytes (5 MB), or 40 million bits. Photographs are converted to a set of numbers that indicate
the location, colour and brightness of each pixel. Whereas conventional (decimal) numbers use ten
digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), binary numbers use two digits to represent all possible values.
The decimal numbers 0-8 translate into binary numbers as: 0, 1, 10, 11, 100, 101, 110, 111 and
1000. With binary numbers, any value can be stored as a series of items which are either true (1)
or false (0).
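The figures in the paragraph above can be checked with a few lines of Python. This sketch assumes 1 MB = 1,000,000 bytes and uses the standard ASCII code for the letter "A" as an example of a letter becoming a number; the variable names are illustrative.

```python
# Decimal numbers 0-8 written in binary.
binary_0_to_8 = [format(n, "b") for n in range(9)]
print(binary_0_to_8)

# Five megabytes expressed in bits (assuming 1 MB = 1,000,000 bytes).
size_bytes = 5_000_000
size_bits = size_bytes * 8
print(size_bits)

# A letter stored as a number, then as one byte (8 bits).
letter_code = ord("A")
letter_bits = format(letter_code, "08b")
print(letter_code, letter_bits)
```

The built-in format function with the "b" format code performs the decimal-to-binary conversion; "08b" pads the result with leading zeros to a full byte.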
Binary data is primarily stored on the hard disk drive (HDD). The device is made up of a spinning
disk (or disks) with magnetic coatings and heads that can both read and write information in the
form of magnetic patterns. In addition to hard disk drives, floppy disks and tapes also store data
magnetically. Newer laptops, as well as mobile phones, tablets, USB flash drives and SD cards,
use solid state (or flash) storage. With this storage medium, the binary numbers are instead stored
as a series of electrical charges within the NAND flash chips. Because all data is made up of a
string of binary numbers, just one number out of place can cause a file to become corrupt.
Bits, Bytes, Nibble and Word
The terms bit, byte, nibble and word are used widely in reference to computer memory and data
size.
Bit: a binary digit, which can be either 0 or 1. It is the basic unit of data or information in
digital computers.
Byte: a group of bits (8 bits) used to represent a character. A byte is considered the basic unit
for measuring memory size in a computer.
Nibble: half a byte, that is, a grouping of 4 bits.
Word: Two or more bytes make a word. The term word length is used as the measure of the
number of bits in each word. For example, a word can have a length of 16 bits, 32 bits, 64 bits,
etc.
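The relationship between a byte, its two nibbles and a word can be sketched with bit operations. The sample byte value and the 32-bit word length below are arbitrary choices for illustration.

```python
byte = 0b10110100  # one byte (8 bits); decimal value 180

high_nibble = byte >> 4    # upper 4 bits
low_nibble = byte & 0x0F   # lower 4 bits
print(format(high_nibble, "04b"), format(low_nibble, "04b"))

# Joining the two nibbles gives back the original byte.
rebuilt = (high_nibble << 4) | low_nibble
print(rebuilt == byte)

# A word is made of whole bytes: an assumed 32-bit word holds 4 bytes.
word_length_bits = 32
bytes_per_word = word_length_bits // 8
print(bytes_per_word)
```

Shifting right by 4 discards the low nibble, and masking with 0x0F (binary 00001111) keeps only the low nibble.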
Types of Data/Information Representation
Computers not only process numbers, letters and special symbols but also complex types of data
such as sound and pictures. However, these complex types of data produce very long streams of
binary digits when coded in binary form. This creates the need for better ways of handling long
streams of binary digits. Higher number systems such as octal and hexadecimal are used in
computing to write these streams of binary digits in a shorter, more manageable form, which makes
long binary values easier to read, write and check.
Number System and their Representation
A number system is a set of symbols used to represent values derived from a common base or
radix. As far as computers are concerned, number systems can be classified into four major
categories:
Decimal Number System
Binary Number System
Octal Number System
Hexadecimal Number System
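Hexadecimal | Octal | Decimal | Binary
0 | 0 | 0 | 0000
1 | 1 | 1 | 0001
2 | 2 | 2 | 0010
3 | 3 | 3 | 0011
4 | 4 | 4 | 0100
5 | 5 | 5 | 0101
6 | 6 | 6 | 0110
7 | 7 | 7 | 0111
8 | 10 | 8 | 1000
9 | 11 | 9 | 1001
A | 12 | 10 | 1010
B | 13 | 11 | 1011
C | 14 | 12 | 1100
D | 15 | 13 | 1101
E | 16 | 14 | 1110
F | 17 | 15 | 1111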
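The table above can be regenerated with Python's built-in format function, which is a convenient way to check conversions between the four number systems. The variable names are illustrative.

```python
# Each row: (hexadecimal, octal, decimal, 4-bit binary) for 0 to 15.
rows = [
    (format(n, "X"), format(n, "o"), str(n), format(n, "04b"))
    for n in range(16)
]
for row in rows:
    print(" | ".join(row))

# Converting back: int() accepts a base as its second argument.
from_hex = int("F", 16)
from_octal = int("17", 8)
from_binary = int("1111", 2)
print(from_hex, from_octal, from_binary)
```

The format codes "X", "o" and "b" select hexadecimal, octal and binary output, and int with an explicit base performs the reverse conversion.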
Binary Coded Decimal (BCD): This is a 4-bit code used to represent numeric data only. For
example, a number like 9 can be represented using Binary Coded Decimal as 1001 (base 2).
Numbers larger than 9, having two or more digits in the decimal system, are expressed digit by
digit using the following codes:
Decimal | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9
BCD | 0000 | 0001 | 0010 | 0011 | 0100 | 0101 | 0110 | 0111 | 1000 | 1001
Thus, the BCD encoding of the base-10 number 127 is: 0001 0010 0111
The BCD code of a number is not the same as its simple binary representation. In binary form,
for example, the decimal quantity 127 appears as 1111111.
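The digit-by-digit BCD encoding can be sketched as a short function and compared with plain binary. The function name to_bcd is illustrative, and the sketch assumes a non-negative whole number.

```python
def to_bcd(number):
    """Encode a non-negative integer in BCD, one 4-bit group per decimal digit."""
    return " ".join(format(int(digit), "04b") for digit in str(number))


bcd_127 = to_bcd(127)
binary_127 = format(127, "b")
print(bcd_127)
print(binary_127)
```

BCD needs 12 bits for 127 (three digits of 4 bits each), whereas plain binary needs only 7, which shows that the two representations are different.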
Sector | Use
Banking | For customer information, account activities, payments, deposits, loans, etc.
Telecommunication | For keeping call records, monthly bills, account balances, etc.
Finance | For storing information about stock, sales, and purchases of financial instruments like stocks and bonds
Manufacturing | For management of the supply chain, tracking production of items, and inventory status in warehouses
HR Management | For information about employees, salaries, payroll, deductions, generation of paychecks, etc.