CSC 222 Lect I

CSC222: Data Management focuses on the collection, storage, organization, and processing of data securely and efficiently. It covers the importance of data management in eliminating redundancy, ensuring privacy and security, and maintaining data integrity, as well as various types of data storage and retrieval systems. The course also discusses primary storage types, including RAM and ROM, and the role of cache memory in enhancing data access speed.

Uploaded by

David Victor


CSC222: DATA MANAGEMENT (2 UNITS)

Prerequisite: CSC201
Course content: Data storage and retrieval; information capture and representation;
management applications; analysis and indexing; search and retrieval; privacy, integrity, security,
scalability, efficiency and effectiveness; data types, records and files; file processing methods;
mapping logical organization onto physical storage; backup procedures and file processing.

DATA MANAGEMENT
Data Management is the practice of collecting, storing, organizing, verifying and processing data
securely, efficiently, and cost-effectively.
Data management can also be defined as an administrative process that includes acquiring,
validating, storing, protecting, and processing required data to ensure the accessibility, reliability,
and timeliness of the data for its users.
The goal of data management is to help people, organizations, and connected things optimize the
use of data within the bounds of policy and regulation so that they can make decisions and take
actions that maximize the benefit to the organization.
Operations of Data Management
1. Create, access, and update data across a diverse data tier
2. Store data across multiple clouds and on premises
3. Provide high availability and disaster recovery
4. Use data in a growing variety of apps, analytics, and algorithms
5. Ensure data privacy and security
6. Archive and destroy data in accordance with retention schedules and compliance
requirements

Importance of Data Management

1. Eliminate Data Redundancy: When data is processed in file-based data management
systems, duplicate files are often created, and multiple copies of the same file may be stored in
different locations in a system or across multiple systems. This leads to data redundancy, and
finding and removing the duplicates requires additional manpower and storage space.
A data management system (DMS) reduces such repetition by integrating all the files into a single
database. While the scattered data is being converted into a single database, the system deletes
the duplicate values. In addition, any change to an entry is reflected almost immediately.
Controlling data redundancy through a DMS results in more accurate data and large savings in
resources and productive time.
2. Data Sharing and Privacy: DMS allows you to share the data among the authorized users of
the database. In a database, complete access belongs to the management, and only an authorized
person can assign a level of access to other users after verifying all the protocols. Once users
have permission, they can view and modify the data files on their own as their tasks require.
3. Data Integrity and Security: DMS ensures the integrity and safety of your data. Data integrity
relates to data accuracy and consistency, which play a major role because there are large volumes
of data in multiple databases. These databases are visible to different users who use the
information available to make business-related decisions. Thus, it is essential to include only
correct and consistent data for all users.
Safety is another aspect that is important to organizations. DMS allows only authorized
users to access the database, helping to keep your data tamper-proof, secure and safe from theft.
It verifies your identity by assigning a username and password to you.
4. Backup and Recovery: Data loss is one of the major concerns for organizations. In the usual
file processing system, you need to back up the database regularly, which takes a lot of time and
resources, especially if you have large volumes of data.
With DMS, you don’t need to back up your data manually. It takes care of the backup and
recovery process by automatically backing up your data at regular intervals. You also don’t
need to worry if your system crashes in the middle of a process, or you have a system
failure: DMS restores the database to its last saved condition.
5. Data Consistency: There are multiple users who access the data for their respective tasks. Thus,
data consistency is a must for accurate business decisions. As DMS ensures no data redundancy,
data consistency is fairly easy to maintain.
All the data remains consistent for all the users. Even the minutest change to the database is
reflected in the database and visible to all who are using the database.
6. Operational Nimbleness: The speed at which a company can make decisions and change
direction is a key factor in determining how successful it can be. If a company takes too
long to react to the market or its competitors, it can spell disaster for the company.
With a good data management system, employees will be able to access information and be
notified of market or competitor changes faster. As a result, companies will be able to make
decisions and take action significantly faster than companies that have poor data management.
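
The redundancy point in item 1 can be illustrated with a short sketch. It assumes, purely for illustration, that each record carries a unique "id" field and that the most recent copy of a duplicate should win; the function name consolidate and the sample records are hypothetical and not part of the course material.

```python
def consolidate(*files):
    """Merge several record lists into one 'database' keyed by record id."""
    database = {}
    duplicates = 0
    for records in files:
        for record in records:
            key = record["id"]          # assumption: "id" uniquely identifies a record
            if key in database:
                duplicates += 1         # the same record was stored in another file
            database[key] = record      # assumption: the latest copy wins
    return database, duplicates


# Two hypothetical departmental files that both hold a record for id 2.
sales_file = [{"id": 1, "name": "Ada"}, {"id": 2, "name": "Bola"}]
accounts_file = [{"id": 2, "name": "Bola"}, {"id": 3, "name": "Chidi"}]

database, duplicates = consolidate(sales_file, accounts_file)
print(len(database), "unique records,", duplicates, "duplicate removed")
```

Because every record now lives in exactly one place, a later update touches a single copy, which is what keeps the data consistent for all users.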

Data and Information Storage and Retrieval

Data are raw representations of unprocessed facts, figures, concepts or instructions. Data can
exist in the form of text, numbers, videos, speech, images, audio, etc.
Examples of data are an MP3 music file, a video file, a spreadsheet, a web page, etc.
Information Storage and Retrieval is a branch of computer science relating to the storage,
location, searching and selection, upon demand, of relevant data on a given subject.
Data Storage
Data storage is the collective term for the methods and technologies that capture and retain
digital information on electromagnetic, optical or silicon-based media.
Data storage is a general term for archiving data in electromagnetic or other forms for use by a
computer or device. Different types of data storage play different roles in a computing
environment. In addition to forms of hard data storage, there are now new options for remote data
storage, such as cloud computing, that can revolutionize the ways that users access data.
Consider these questions to better understand what data storage is:
• What type of thing holds the data? For example, data can sit on hard disks, flash drives,
Non-Volatile Memory Express (NVMe) systems, etc.
• Where is the data stored? For example, data can be stored on-premises, in server farms, on
the Internet of Things (IoT), or through a data storage service, such as a cloud provider.
• How is the data stored? For example, solid-state drives use “electronically programmable and
erasable memory microchips” to store data. Other storage devices may use LightStore, an
environmentally friendly technology, or flash memory, an “electronic, non-volatile data storage
medium that is erased and reprogrammed electrically” to store data.

Types of Data/Information Storage

For storing data, there are different types of storage options available. These storage types
differ from one another in speed and accessibility. Storage devices can be broadly
categorized into three types: primary, secondary and tertiary storage.
Primary Storage
Primary storage, also known as main memory, is the memory that is directly accessible to the
CPU. The CPU's internal memory (registers), fast memory (cache), and main memory (RAM) come
under this category, as they are all placed on the motherboard or CPU chipset. This storage is
typically very small, ultra-fast, and volatile. Primary storage requires a continuous power supply
in order to maintain its state. In case of a power failure, all its data is lost.

Types of Primary Storage

• Main memory (RAM and ROM)
• Cache
Main Memory: It is responsible for operating on the data made available by the
storage medium. The main memory handles each instruction of a computer. This type of
memory can store gigabytes of data on a system but is usually too small to hold an entire
database. Finally, the main memory loses its whole content if the system shuts down because of
power failure or other reasons.
RAM: Random Access Memory, or RAM, is the primary storage of a computer.
When you’re working on a file on your computer, it will temporarily store data in your RAM.
RAM allows you to perform everyday tasks like opening applications, loading webpages, editing
a document or playing games. It also allows you to jump from one task to another without losing
your progress.
The memory space of RAM is limited, so not all files and instructions can be stored
in it. They are normally stored in a different location known as secondary
storage and are copied from there to RAM before execution. This technique is referred to as
swapping. The memory space available in RAM also affects the speed of a computer system: the
larger the RAM, the more data and instructions it can hold at once, so the computer system need
not read the data from secondary storage again and again, which makes processing faster. In
essence, the larger the RAM of your computer, the smoother and quicker it is for you to multitask.
The main memory is also responsible for holding intermediate data transferred between the CPU
and I/O devices.
RAM is a volatile memory, meaning it cannot hold onto information once the system turns off.
For example, if you copy a block of text, restart your computer, and then attempt to paste that
block of text into a document, you’ll find that your computer has forgotten the copied text. This is
because it was only stored temporarily in your RAM.
RAM makes it possible for a computer to access data in a random order, and thus reads and writes
much faster than a computer’s secondary storage.
• SRAM: It stands for Static Random Access Memory. It consists of circuits that retain stored
information as long as the power supply is on, so it is a volatile memory. It is
used to build cache memory. The access time of SRAM is much shorter than that of
DRAM, but SRAM is more costly than DRAM.
• DRAM: It stands for Dynamic Random Access Memory. It stores binary bits in
the form of electrical charges held on capacitors. The access time of DRAM is
longer than that of SRAM, but DRAM is cheaper and has a higher packaging density.
• SDRAM: It stands for Synchronous Dynamic Random Access Memory. It is faster than
conventional DRAM and is widely used in computers and other devices. After SDRAM was
introduced, upgraded double data rate (DDR) versions, i.e., DDR1, DDR2, DDR3, and DDR4,
entered the market and became widely used in home/office desktops and laptops.

SRAM | DRAM
A type of semiconductor memory that uses bi-stable latching circuitry (flip-flops) to store each bit | A type of random access semiconductor memory that stores each bit of data in a separate tiny capacitor within an integrated circuit
Stands for Static Random Access Memory | Stands for Dynamic Random Access Memory
Very fast | Not as fast as SRAM
Does not require refresh cycles to retain data | Requires periodic refresh cycles to retain data
Has more complex circuitry and timing requirements | Not as complex as SRAM
Used for CPU cache | Used for computer main memory
Requires minimum time to access data | Requires more time to access data
Complex structure; has flip-flops | Simple structure; has a transistor and a capacitor
Has a lower density | Has a higher density
Expensive | Less expensive

ROM: Read Only Memory (ROM) is used as the computer begins to boot up. Small programs called
firmware are often stored in ROM chips on hardware devices (like a BIOS chip), and they contain
instructions the computer can use in performing some of the most basic operations required to
operate hardware devices. Like RAM, ROM can be read at any address in any order; the
difference is that ROM cannot be easily or quickly overwritten or modified. It is less expensive
than RAM.
Types of ROM include:
• Programmable ROM (PROM) - Essentially a blank version of ROM that you can
purchase and program once with the help of a special tool called a PROM programmer.
Once the chip has been programmed, the information on the PROM can’t be altered. PROM is
non-volatile, that is, data is not lost when power is switched off.
• Erasable programmable ROM (EPROM) - A type of ROM that is programmed using high
voltages and erased by exposure to ultraviolet light for about 20 minutes, after which it can
be reprogrammed.
• Electrically-erasable programmable ROM (EEPROM) - Often used in older computer
chips and to control BIOS, EEPROM can be erased and reprogrammed several times while
enabling the erasing and writing of only one location at a time. Flash memory is an updated
version of EEPROM that allows numerous memory locations to be changed at the same time.
The main difference between PROM, EPROM and EEPROM is that PROM is programmable only
once, while EPROM is reprogrammable after erasure with ultraviolet light and EEPROM is
reprogrammable using an electric charge.
*Note: RAM and ROM are both located on the computer motherboard, but each in a separate
plug-in chipset.

RAM | ROM
Random Access Memory | Read Only Memory
RAM is a volatile memory | ROM is a non-volatile memory
RAM can be modified (allows reading and writing) | ROM can’t be modified (allows reading only)
Temporary storage | Permanent storage
Used in normal operations | Used for the startup process of the computer
It is a high-speed memory | It is much slower than RAM
Large size with higher capacity | Small size with less capacity
Uses a lot of power | Uses less power
Used in CPU cache, primary memory | Used in firmware, microcontrollers
Requires flow of electricity to retain data | Does not require flow of electricity to retain data

Cache: Cache (pronounced "cash") memory is extremely fast memory that is
built into a computer’s central processing unit (CPU), or located next to it on a separate chip. It is
a small, fast, and expensive memory that stores copies of the data that needs to be accessed
frequently from the main memory. Before reading data from or writing data to the
main memory, the processor checks for the same data in the cache memory. If it finds the data
there, the processor reads from or writes to the cache itself, because its access
time is much shorter than that of the main memory. The transfer of data between the processor
and the cache memory is bidirectional. Finding the requested data in the cache is known as a
cache hit, and the performance of a cache memory is measured by its hit ratio, that is, the
fraction of accesses that are hits. The advantage of cache
memory is that the CPU does not have to use the motherboard’s system bus for data transfer.
Whenever data must be passed through the system bus, the data transfer speed slows to the
motherboard’s capability. The CPU can process data much faster by avoiding the bottleneck
created by the system bus.

How Does Cache Memory Work?

• In the beginning, the program and the data associated with the program lie in the main
memory, and the cache is empty.

• When the processor starts executing the program, it reads each instruction from the main
memory and places it on the processor chip (registers). Along with this, it places a copy of
each instruction in the cache.

• If execution of a particular instruction requires any associated data, the processor accesses it
from the main memory and places a copy of it in the cache memory also.

• Now consider that these instructions have to be executed repeatedly (as in the case of a loop).
If the instructions are available in the cache, then the processor will directly access them
from the cache memory. As the cache is faster than the main memory, this will ultimately
speed up the execution.
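
The steps above can be sketched as a small simulation. This is a simplified model rather than a description of real hardware: the cache is an unbounded Python dictionary (a real cache has a fixed size and an eviction policy), and the names run_with_cache, main_memory and trace are made up for the example.

```python
def run_with_cache(addresses, main_memory):
    """Read each address in turn, going to main memory only on a cache miss."""
    cache = {}                          # the cache starts empty
    hits = misses = 0
    for addr in addresses:
        if addr in cache:
            hits += 1                   # cache hit: served from the cache
            _value = cache[addr]
        else:
            misses += 1                 # cache miss: fetch from main memory
            _value = main_memory[addr]
            cache[addr] = _value        # keep a copy for next time
    return hits, misses


# A hypothetical loop body of three instructions executed four times.
main_memory = {0: "LOAD", 1: "ADD", 2: "JUMP 0"}
trace = [0, 1, 2] * 4

hits, misses = run_with_cache(trace, main_memory)
hit_ratio = hits / (hits + misses)
print(f"hits={hits} misses={misses} hit ratio={hit_ratio:.2f}")
```

Only the first pass through the loop touches main memory; the remaining three passes are served from the cache, which is why loops benefit so much from caching.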

Another Example: A browser cache holds copies of recently accessed data such as web pages
and the pictures on them. It keeps this data ready to "swap" onto your screen within
fractions of a second. So, instead of requiring your computer to go to the original web page and
photos on a distant server (say, in Denmark), the cache simply offers you the latest copy from
your own hard drive.
This caching-and-swapping speeds up page viewing because the next time you request that page,
it is accessed from the cache on your computer instead of from the distant Web server.

There are three types of cache memory in a computer system:

(i) Primary Cache – It is also known as level 1 (L1) cache or internal cache. The primary
cache is located inside the CPU. It is the smallest but fastest type of
cache, providing quick access to the data most frequently accessed by the
microprocessor.

(ii) Secondary Cache – It is also known as level 2 (L2) cache or external cache. The
secondary cache is located outside the CPU. It is normally positioned on the
motherboard of a computer. The secondary cache is larger than the primary cache but
slower.

(iii) L3 Cache – It is a specialized memory developed to improve the performance of L1 and
L2. It is larger and slower than both the L1 and L2 cache.

Feature | Cache | RAM
Purpose | Holds data frequently used by the CPU | Holds programs and data that are currently being executed by the CPU
Speed | Faster | Not as fast
Cost | Expensive | Not as expensive as cache
Capacity | Lower | Higher

Registers

Registers are built-in memory units on the processor chip. They are the smallest and fastest
memory in a computer. They are not part of the main memory and are located in the CPU.
A register temporarily holds frequently used data, instructions and memory addresses that are to
be used by the CPU. Registers hold the instructions that are currently being processed by the CPU.
All data is required to pass through registers before it can be processed, so they are used by the
CPU to process the data entered by users.
A register can store one word of data, which is typically up to 64 bits. As it is
the nearest memory to the processor, it has the fastest access time.
All CPUs have some registers that store instructions, variables, and temporary results. CPUs also
have some special registers for storing special data.

How Do Registers Work?

A program is a set of instructions that are brought to the main memory for execution.
Accessing an instruction from the main memory takes longer than executing it. Thus, the
CPU uses registers to hold the instructions, key variables and temporary results. During
program execution, each instruction or word needed from the main memory is brought into a
register. The CPU then accesses the instruction from the register and performs the desired action.
The CPU also stores temporary and final results in registers, and writes them from the registers
back to the main memory.

Types of Registers

1. General Purpose Registers: Also referred to as processor registers. They serve a variety
of functions, including holding operands that have been loaded from memory for
processing.
2. Memory Buffer Register (MBR): It stores a word fetched from the main memory or I/O
unit. It also stores the word that the processor has to send back to the main memory or I/O
unit.
3. Memory Address Register (MAR): It specifies the address in memory from which the
word will be read into the MBR, or to which the word from the MBR will be written.
4. Instruction Register (IR): It holds the 8-bit opcode (machine instruction) that is currently
being executed.
5. Instruction Buffer Register (IBR): The IBR temporarily holds the right-hand
instruction from the word in the memory.
6. Program Counter (PC): The PC holds the memory address of the instruction that has to be
fetched next for execution.
7. Accumulator (AC): The accumulator holds temporary operands and results of any ALU
operations.
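
How these registers cooperate during program execution can be sketched with a toy machine. The instruction set (LOAD, ADD, STORE, HALT), the memory layout and the function name run are invented for illustration and do not correspond to any real CPU; the IBR is left out to keep the sketch short.

```python
def run(memory):
    """Execute a toy program stored in memory and return the accumulator."""
    pc, ac = 0, 0                       # Program Counter and Accumulator
    while True:
        mar = pc                        # MAR: address of the next instruction
        mbr = memory[mar]               # MBR: word fetched from that address
        ir, operand = mbr               # IR: opcode currently being executed
        pc += 1                         # PC now points at the following instruction
        if ir == "LOAD":
            mar = operand               # address of the data word
            mbr = memory[mar]
            ac = mbr                    # operand loaded into the accumulator
        elif ir == "ADD":
            mar = operand
            mbr = memory[mar]
            ac = ac + mbr               # result of the ALU operation stays in AC
        elif ir == "STORE":
            mar = operand
            mbr = ac                    # word to send back to main memory
            memory[mar] = mbr
        elif ir == "HALT":
            return ac
        else:
            raise ValueError(f"unknown opcode: {ir}")


# Hypothetical program: memory[12] = memory[10] + memory[11]
memory = {
    0: ("LOAD", 10), 1: ("ADD", 11), 2: ("STORE", 12), 3: ("HALT", None),
    10: 5, 11: 7, 12: 0,
}
result = run(memory)
print(result, memory[12])
```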

Differences between Cache Memory and Register

No. | Cache | Register
1 | Cache is a smaller and faster memory unit of a computer system. | Register is the smallest and fastest memory, integrated into the computer's processor.
2 | Access time is comparatively longer. | Access time is shorter than that of the cache.
3 | Cache memory is exactly a memory unit. | It is located on the CPU.
4 | It stores recently used data. | It stores data that the CPU is currently processing.
5 | Size: 2 KB to a few MB. | Size: one word of data, i.e. up to 64 bits.
6 | Cache can be located on the system's motherboard or within the CPU. | Registers are part of the computer's CPU.
7 | Whenever the processor reads some data from the main memory, it places a copy of it in the cache. | Whenever the processor identifies operands from the memory, it places them in registers.
8 | Types: L1, L2 and L3. | Types: MAR, MBR, PC, AC, etc.
9 | Web page cache, database query cache, prefetch cache, etc. are examples of cache memory. | A loop counter is an example of register use.

In conclusion, only the primary cache (L1) and all kinds of registers are present on the processor.
However, the registers are the smallest and fastest component of any computer. Although
both are small memory units, they are used for different purposes. The
cache is used for storing recently used instructions and data, whereas the processor uses registers
to store the instructions and data that it is currently processing.

Secondary Storage
Secondary storage devices are used to store data for future use or as backup. It is the storage area
that allows the user to save and store data permanently. This type of memory does not lose the
data due to any power failure or system crash. That's why we also call it non-volatile storage.
Secondary storage includes memory devices that are not a part of the CPU chipset or motherboard,
for example, magnetic disks, optical disks (DVD, CD, etc.), hard disks, flash drives, and magnetic
tapes.
Solid State Storage: Examples of solid state storage include solid state drives (SSD), memory
cards, and USB flash drives. Solid state and flash storage use electrical circuits to store data. If an
electrical circuit is high, it represents a binary 1, and if it is low, it represents a 0.
Magnetic Storage: This type of storage media is also known as online storage media. A magnetic
disk is used for storing data for a long time. It is capable of storing an entire database. It is the
responsibility of the computer system to make the data on a disk available to the main
memory for further access. Also, if the system performs any operation on the data, the
modified data should be written back to the disk. The great strength of a magnetic disk is
that its data is not affected by a system crash or failure, although a disk failure can easily
destroy the stored data. Examples include hard disks (internal and external),
magnetic tape and floppy disks.

Tertiary Storage
It is the storage type that is external to the computer system. Tertiary storage is used to store
huge volumes of data. Since such storage devices are external to the computer system, they are the
slowest in speed, but they are capable of storing a large amount of data. Tertiary storage is
generally used for data backup. The following tertiary storage devices are available:
• Optical Storage: An optical disk can store megabytes or gigabytes of data. A Compact
Disk (CD) can store 700 megabytes of data with a playtime of around 80 minutes. On the
other hand, a Digital Video Disk (DVD) can store 4.7 or 8.5 gigabytes of data on each
side of the disk. Examples of optical storage include CD-ROM, CD-R, CD-RW, DVD, and
Blu-ray Discs.
• Tape Storage: Tape is a cheaper storage medium than disk. Generally, tapes are used for
archiving or backing up data. Tape provides slow access to data as it accesses data
sequentially from the start. Thus, tape storage is also known as sequential-access storage.
Disk storage is known as direct-access storage as we can directly access the data from any
location on disk.
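
The difference between sequential and direct access can be sketched with an in-memory byte stream standing in for the storage medium. The fixed record size, the record contents and the two function names are assumptions made for the example; real tape and disk drives are of course far more involved.

```python
import io

RECORD_SIZE = 8   # assumption: every record is exactly 8 bytes long

# 100 fixed-length records, "rec00000" ... "rec00099".
records = [f"rec{i:05d}".encode() for i in range(100)]
storage = io.BytesIO(b"".join(records))


def read_sequential(f, index):
    """Tape-like access: rewind, then read every record up to the one wanted."""
    f.seek(0)
    for reads in range(1, index + 2):
        data = f.read(RECORD_SIZE)
    return data, reads


def read_direct(f, index):
    """Disk-like access: jump straight to the record's position."""
    f.seek(index * RECORD_SIZE)
    return f.read(RECORD_SIZE), 1


seq_data, seq_reads = read_sequential(storage, 42)
dir_data, dir_reads = read_direct(storage, 42)
print(seq_data, seq_reads, dir_data, dir_reads)
```

Both calls return the same record, but the sequential version has to pass over every record before it, which is why tape suits archiving and backup rather than frequent lookups.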

Storage Hierarchy
Besides the above, various other storage devices reside in the computer system. These storage
media are organized on the basis of data access speed, cost per unit of data, and the medium's
reliability. Thus, we can arrange the storage media described above in a hierarchy according to
speed and cost, as shown in the image below:

In the image, the higher levels are expensive but fast. On moving down, the cost per bit
decreases and the access time increases. Also, the storage media from the main memory
upwards are volatile, while all those below the main memory are non-volatile.

Other Examples of Storage Devices

Cloud Storage: Cloud storage is a cloud computing model that stores data on the Internet through
a cloud computing provider who manages and operates data storage as a service. It’s delivered on
demand with just-in-time capacity and costs, and eliminates buying and managing your own data
storage infrastructure. This gives you agility, global scale and durability, with “anytime,
anywhere” data access. Data is stored in and accessible from multiple distributed and connected
resources that comprise a cloud.
Cloud storage can provide the benefits of greater accessibility and reliability; rapid deployment;
strong protection for data backup, archival and disaster recovery purposes.
Cloud storage services include Google Drive, iCloud, Dropbox, Microsoft OneDrive, IDrive,
Amazon Drive, pCloud, Flickr, etc.
Information Retrieval

An information retrieval system is a system used to store items of information that need to be
processed, searched, retrieved and disseminated to various user populations.
Functions of Information Storage and Retrieval/Information Retrieval System (ISAR/IRS):
• To identify sources of information relevant to the areas of interest of the target
user community.
• To analyze the contents of the sources (information).
• To represent the contents of the analyzed sources in a way that will be suitable for
matching users’ queries.
• To analyze users’ queries and to represent them in a form that will be suitable for
matching the database.
• To match the search statement with the stored database.
• To retrieve the information that is relevant.
• To make necessary adjustments in the system based on feedback from the users.
Kinds of Information Retrieval System:
1. Offline Search: In an offline search, users can get the required information with or without
the help of a computer and the internet, for example from libraries, CD-ROMs, etc.
2. Online Search: An online search is the search of a remotely located database through
interactive communication with the help of a computer and a communication channel. Online
databases can be accessed through a vendor or directly. Examples: OPAC, databases, the
Internet, etc.
Retrieval Techniques:
Retrieval techniques are designed to help users locate the information they need easily,
effectively and efficiently. There are two types of retrieval techniques: Basic Retrieval
Techniques and Advanced Retrieval Techniques.
Basic Retrieval Techniques:
i. Boolean Search
ii. Truncation Searching
iii. Proximity Searching
iv. Range Searching
v. Case Sensitive Searching
Advanced Retrieval Techniques:
i. Fuzzy Searching
ii. Query Expansion
iii. Multiple Database Searching

1. Basic Retrieval Techniques:

A. Boolean Searching: Boolean search is a type of search that allows users to combine keywords
with operators (or modifiers) such as AND, NOT and OR to produce more relevant results.
• AND: It combines two different concepts to narrow down the search. It
retrieves only those items in which all the constituent terms occur.
For instance, if you're interested in reading articles about how young people feel about
politics, you can search for youth AND politics. All articles in your results will
include both keywords. This gives you fewer results, because each result must contain
both search terms.

• NOT: It narrows your search by telling the database to eliminate from your search results
all items containing the term that follows it. This can be useful when you are interested in a
very specific aspect of a topic (letting you weed out the issues that you're not planning to
write about).
Example: searching for sex education NOT abstinence-only will return articles on sex
education, but not those dealing with abstinence-only approaches.

• OR: It includes more concepts in order to broaden a search. It allows users to combine
two or more search terms; the system will retrieve all
items that contain any one or all of the constituent terms.
This is particularly helpful when you are searching for synonyms, such as “death penalty”
OR “capital punishment.” So, if you type in death penalty OR capital punishment, your
results will include articles with either term, but not necessarily both.

Example Boolean Search Terms

• AND: Include two search terms. Example: network AND administrator
• OR: Broaden your search with multiple terms. Example: “network administrator”
OR “network manager”
• NOT: Use to exclude a specific term. Example: administrator NOT manager
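
The three operators map naturally onto set operations over an inverted index (a table from each word to the documents that contain it). The sketch below assumes a tiny made-up document collection and single-word search terms; phrase searching such as “network administrator” is deliberately left out.

```python
# Hypothetical document collection: id -> text.
docs = {
    1: "youth and politics in the city",
    2: "politics of the network",
    3: "network administrator handbook",
    4: "network manager and administrator roles",
}


def build_index(documents):
    """Map every word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index


index = build_index(docs)


def term(word):
    """Documents containing the word (empty set if the word is unknown)."""
    return index.get(word.lower(), set())


and_result = term("network") & term("administrator")   # both terms must occur
or_result = term("youth") | term("manager")            # either term may occur
not_result = term("administrator") - term("manager")   # exclude the second term

print(sorted(and_result), sorted(or_result), sorted(not_result))
```

AND corresponds to set intersection, OR to union and NOT to difference, which is why AND and NOT narrow a search while OR broadens it.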

B. Truncation Searching (also known as Wildcard): Truncation allows a search to be conducted
for all the different forms of a word having the same common root. To truncate a search term, do
a keyword search in a database, but remove the ending of the word and add an asterisk (*) to the
end of the word. The database will retrieve results that include every word that begins with the
letters you entered.
For example, if you type in the keyword interact*, the database will search for interact,
interacting, interaction, and interactivity.
There are three types of truncation.
i. Right truncation: Truncation is on the right side of the term. For example, Network* will
retrieve documents containing networks and networking.
ii. Left truncation: Truncation is on the left side of the term. For example, *hyl will retrieve
words such as methyl and ethyl.
iii. Middle truncation: Truncation is in the middle of the term, between its left and right
parts. For example, Colo*r will retrieve both colour and color.
Wildcard: Wildcards are useful when multiple spellings of a word can affect your search. Some
databases use !, ? or # as the wildcard symbol.
Example: Wom!n = Woman, Women
Coloni?e = Colonize, Colonise
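
Truncation and wildcard matching can be tried out with Python's standard fnmatch module, whose patterns use * for any run of characters and ? for exactly one character. The word list is made up for the example, and the choice of ? as the single-character symbol follows fnmatch rather than any particular database.

```python
from fnmatch import fnmatchcase

# Hypothetical index terms.
words = ["interact", "interacting", "interaction", "internet",
         "methyl", "ethyl", "color", "colour", "woman", "women"]


def search(pattern):
    """Return the words that match a truncation or wildcard pattern."""
    return [w for w in words if fnmatchcase(w, pattern)]


right = search("interact*")    # right truncation
left = search("*hyl")          # left truncation
middle = search("colo*r")      # middle truncation
single = search("wom?n")       # single-character wildcard

print(right, left, middle, single)
```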

C. Proximity Searching: A proximity search allows users to specify how close two (or more)
words must be to each other in order to register a match.
For example, a search could be used to find "red brick house", and match phrases such as "red
house of brick" or "house made of red brick". By limiting the proximity, these phrases can be
matched while avoiding documents where the words are scattered or spread across a page or in
unrelated articles in an anthology.
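
A minimal sketch of the idea, assuming that "closeness" means all the search words fall inside a sliding window of a given number of words; the function name within_proximity is hypothetical, and punctuation handling is ignored for simplicity.

```python
def within_proximity(text, terms, max_span):
    """True if all terms occur together inside some window of max_span words."""
    words = text.lower().split()
    terms = [t.lower() for t in terms]
    for start in range(len(words)):
        window = words[start:start + max_span]
        if all(t in window for t in terms):
            return True
    return False


query = ["red", "brick", "house"]
close_1 = within_proximity("red house of brick", query, 5)
close_2 = within_proximity("a house made of red brick", query, 5)
scattered = within_proximity(
    "the red car was parked far away from the old brick wall of the house",
    query, 5)
print(close_1, close_2, scattered)
```

The first two texts match because the three words sit within five words of each other; the third contains all three words but too far apart.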
D. Range Searching: Range searching is very useful in numerical searching. It is important for
selecting records within certain data ranges.
The following options are usually available for range searching:
• Greater than (>)
• Less than (<)
• Equal to (=)
• Not equal to (!= or <>)
• Greater than or equal to (>=)
• Less than or equal to (<=)
Example: To search for documents or items that contain numbers within a range, type
your search term and the range of numbers separated by two periods (..) with no spaces.
E.g., to search for pencils that cost between N150 and N250, type the following:
Pencils N150..N250
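The range query above can be illustrated with a small Python sketch. The parsing rule, the function names and the price list are assumptions for the example only; in particular, stripping a trailing "s" is a crude stand-in for proper plural handling.

```python
import re

def parse_range_query(query: str) -> tuple[str, int, int]:
    """Split a query such as 'Pencils N150..N250' into (term, low, high)."""
    match = re.fullmatch(r"\s*(.+?)\s+N?(\d+)\.\.N?(\d+)\s*", query)
    if match is None:
        raise ValueError(f"not a range query: {query!r}")
    return match.group(1), int(match.group(2)), int(match.group(3))

def range_search(query: str, items: list[tuple[str, int]]) -> list[str]:
    """Return the names of items that mention the term and whose price
    lies in the inclusive range low..high."""
    term, low, high = parse_range_query(query)
    stem = term.lower().rstrip("s")  # crude plural handling, for illustration
    return [name for name, price in items
            if stem in name.lower() and low <= price <= high]

# Hypothetical price list in naira.
items = [("HB pencil", 100), ("Colour pencil set", 200),
         ("Mechanical pencil", 250), ("Ball pen", 180),
         ("Drawing pencil", 300)]

print(range_search("Pencils N150..N250", items))
```

The comparison low <= price <= high combines the "greater than or equal to" and "less than or equal to" operators from the list above.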

E. Case Sensitive Searching: Text sometimes exhibits case sensitivity; that is, words can differ
in meaning based on differing use of uppercase and lowercase letters. Words with capital letters
do not always have the same meaning when written with lowercase letters. For example, Bill is
the first name of the former U.S. president William Clinton, who could sign a bill. Systems differ
in how they treat case: Google searches are generally case insensitive, whereas passwords (for
example, a Gmail password) are case sensitive.

2. Advanced Retrieval Techniques.

1. Fuzzy Searching
2. Query Expansion
3. Multiple Database Searching
A. Fuzzy Searching: Fuzzy searching is designed to find terms that are spelled incorrectly at
the point of data entry or query. For example, the term computer could be misspelled as compter,
compiter or comyter. Optical Character Recognition (OCR) and text compression can also
introduce errors. Fuzzy searching is therefore designed to detect and correct spelling
errors, including those that result from OCR and text compression.
Example: When you type “Comyter”, the system asks “Do you mean Computer?”
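A minimal sketch of the "Do you mean ...?" behaviour can be written with Python's standard difflib module, which ranks candidate words by string similarity. The vocabulary and the function name are hypothetical, and the similarity cutoff of 0.6 is simply the library default; production search engines use more elaborate methods.

```python
import difflib

# Hypothetical list of correctly spelled index terms.
VOCABULARY = ["computer", "compiler", "printer", "database"]

def did_you_mean(term, vocabulary=VOCABULARY, cutoff=0.6):
    """Return the closest vocabulary word to a (possibly misspelled) term,
    or None if nothing is similar enough."""
    matches = difflib.get_close_matches(term.lower(), vocabulary,
                                        n=1, cutoff=cutoff)
    return matches[0] if matches else None

suggestion = did_you_mean("Comyter")
if suggestion is not None:
    print(f"Do you mean {suggestion}?")
```

Raising the cutoff makes the system stricter (fewer suggestions), while lowering it makes the system more tolerant of badly garbled input such as OCR output.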
B. Query Expansion: Query expansion (QE) is a process in Information Retrieval which consists
of selecting and adding terms to the user's query with the goal of minimizing query-document
mismatch and thereby improving retrieval performance. Example: the query “vp” becomes “VP or
Vice President”.
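One simple way to picture query expansion is a lookup table of equivalent terms. The sketch below assumes a small hand-made synonym table (SYNONYMS) and an OR operator in the query language; both are illustrative choices, not part of any specific retrieval system.

```python
# Hypothetical synonym table; a real system might derive this from a thesaurus.
SYNONYMS = {
    "vp": ["vice president"],
    "hdd": ["hard disk drive"],
}

def expand_query(query: str) -> str:
    """Add known alternative terms to each word of the query, joined by OR."""
    expanded = []
    for term in query.split():
        alternatives = [term] + SYNONYMS.get(term.lower(), [])
        if len(alternatives) > 1:
            quoted = " OR ".join(f'"{alt}"' for alt in alternatives)
            expanded.append(f"({quoted})")
        else:
            expanded.append(term)
    return " ".join(expanded)

print(expand_query("vp salary"))
```

Terms without an entry in the table pass through unchanged, so the expansion only ever widens the search.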

C. Multiple Database Searching: This means searching more than one information retrieval
system. The need for searching multiple databases is threefold:
i. First, searching a single information retrieval system may not find what the user is looking
for.
ii. Second, multiple database searching can serve as a selection tool if the user is not sure
which system would be the best choice for a given query.
iii. Third, results obtained from multiple database searching can suggest or indicate suitable
systems for the user to conduct further searches.
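The three points above can be illustrated with a toy example in which each "database" is just a list of document titles. The database names, their contents and both function names are invented for this sketch; real systems would query separate servers.

```python
def search_all(query: str, databases: dict[str, list[str]]) -> dict[str, list[str]]:
    """Run the same keyword query against every database,
    grouping the hits by the database they came from."""
    needle = query.lower()
    return {name: [doc for doc in docs if needle in doc.lower()]
            for name, docs in databases.items()}

def best_sources(results: dict[str, list[str]]) -> list[str]:
    """Rank databases by number of hits, leaving out those with no hits."""
    ranked = sorted(results.items(), key=lambda item: len(item[1]), reverse=True)
    return [name for name, hits in ranked if hits]

# Hypothetical collections of document titles.
databases = {
    "catalogue": ["Data management basics", "Network design"],
    "journals": ["Data storage media", "Data retrieval methods", "Soil science"],
    "theses": ["Plant biology"],
}

results = search_all("data", databases)
print(results)
print(best_sources(results))
```

The ranking returned by best_sources corresponds to the third point: the databases with the most hits are the natural candidates for further searching.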
Data and Information Capture and Representation
Data is represented on modern storage media using the binary numeral system.
All data stored on storage media – whether that’s hard disk drives (HDDs), solid state drives
(SSDs), external hard drives, USB flash drives, SD cards etc – can be converted to a string of bits,
otherwise known as binary digits. These binary digits have a value of 1 or 0, and the strings can
make up photos, documents, audio and video. A byte is the most common unit of storage and is
equal to 8 bits.
All data in a computer is stored as a number. For example, letters become numbers; the Complete
Works of Shakespeare is around 1250 pages in print and, at one byte per letter, totals about
five megabytes (5 MB), or 40 million bits. Photographs are converted to a set of numbers that indicate
the location, colour and brightness of each pixel. Whereas conventional (decimal) numbers use ten
digits (0, 1, 2, 3, 4, 5, 6, 7, 8, 9), binary numbers use two digits (0 and 1) to represent all possible
values. The decimal numbers 0 to 8 translate into binary numbers as: 0, 1, 10, 11, 100, 101, 110, 111 and
1000. With binary numbers, any value can be stored as a series of items which are either true (1)
or false (0).
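The conversion from decimal to binary can be checked with a short Python sketch that uses repeated division by 2. The function name is chosen for this example; Python's built-in bin() does the same job.

```python
def to_binary(n: int) -> str:
    """Convert a non-negative integer to binary by repeated division by 2."""
    if n < 0:
        raise ValueError("only non-negative integers are handled here")
    if n == 0:
        return "0"
    digits = []
    while n > 0:
        digits.append(str(n % 2))  # the remainder is the next binary digit
        n //= 2
    return "".join(reversed(digits))  # remainders come out lowest digit first

# The decimal numbers 0 to 8 in binary.
print([to_binary(n) for n in range(9)])

# Five megabytes at 8 bits per byte (taking 1 MB as one million bytes).
bits = 5_000_000 * 8
print(bits)
```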
Binary data is primarily stored on the hard disk drive (HDD). The device is made up of a spinning
disk (or disks) with magnetic coatings and heads that can both read and write information in the
form of magnetic patterns. In addition to hard disk drives, floppy disks and tapes also store data
magnetically. Newer laptops, as well as mobile phones, tablets, USB flash drives and SD cards,
use solid state (or flash) storage. With this storage medium, the binary numbers are instead stored
as a series of electrical charges within the NAND flash chips. Because all data is made up of a
string of binary digits, just one bit out of place can cause a file to become corrupt.
Bits, Bytes, Nibble and Word
The terms bit, byte, nibble and word are used widely in reference to computer memory and data
size.
Bit: a binary digit, which can be either 0 or 1. It is the basic unit of data or
information in digital computers.
Byte: a group of 8 bits used to represent a character. The byte is considered the basic unit
for measuring memory size in computers.
Nibble: half a byte, that is, a group of 4 bits.
Word: Two or more bytes make a word. The term word length is used as the measure of the
number of bits in each word. For example, a word can have a length of 16 bits, 32 bits, 64 bits,
etc.
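The relationship between these units can be shown with simple bit operations. This is an illustrative sketch; the function names are made up for the example.

```python
def split_nibbles(byte: int) -> tuple[int, int]:
    """Split one byte (0 to 255) into its high and low 4-bit nibbles."""
    if not 0 <= byte <= 0xFF:
        raise ValueError("a byte must be between 0 and 255")
    return byte >> 4, byte & 0x0F

def bytes_per_word(word_length_bits: int) -> int:
    """Number of 8-bit bytes in a word of the given length."""
    return word_length_bits // 8

high, low = split_nibbles(0b10100110)
print(format(high, "04b"), format(low, "04b"))
print([bytes_per_word(bits) for bits in (16, 32, 64)])
```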
Types of Data/Information Representation
Computers not only process numbers, letters and special symbols but also complex types of data
such as sound and pictures. However, these complex types of data take a lot of memory and
processor time when coded in binary form. This limitation makes it necessary to develop better
ways of handling long streams of binary digits. Higher number systems are used in computing to
reduce these streams of binary digits into manageable form. This helps to improve the processing
speed and optimize memory usage.
Number System and their Representation
A number system is a set of symbols used to represent values derived from a common base or
radix. As far as computers are concerned, number systems can be classified into four major
categories:
• Decimal Number System (base 10)
• Binary Number System (base 2)
• Octal Number System (base 8)
• Hexadecimal Number System (base 16)

Hexadecimal digit   Octal equivalent   Decimal equivalent   Binary equivalent
0                   0                  0                    0000
1                   1                  1                    0001
2                   2                  2                    0010
3                   3                  3                    0011
4                   4                  4                    0100
5                   5                  5                    0101
6                   6                  6                    0110
7                   7                  7                    0111
8                   10                 8                    1000
9                   11                 9                    1001
A                   12                 10                   1010
B                   13                 11                   1011
C                   14                 12                   1100
D                   15                 13                   1101
E                   16                 14                   1110
F                   17                 15                   1111
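The table above can be reproduced with Python's built-in format() function, which is a convenient way to check a conversion. The helper names below are chosen for this sketch only.

```python
def number_row(n: int) -> tuple[str, str, str, str]:
    """Return n written in hexadecimal, octal, decimal and 4-bit binary."""
    return format(n, "X"), format(n, "o"), format(n, "d"), format(n, "04b")

def binary_to_hex(bits: str) -> str:
    """Shorten a string of binary digits by rewriting it in hexadecimal."""
    return format(int(bits, 2), "X")

for n in range(16):
    print(*number_row(n), sep="\t")

# Each hexadecimal digit stands for exactly four binary digits.
print(binary_to_hex("101011110000"))
```

The last line shows why higher number systems are used: twelve binary digits collapse into three hexadecimal digits.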

Symbolic Representation Using Coding Schemes

In computing, a single character such as a letter, a number or a symbol is represented by a group
of bits. Computers process numbers easily, but they cannot handle text directly.
Therefore, the characters are encoded as numbers. These coding schemes help to represent text in
computers, telecommunication devices, and other electronic devices. The number of bits per character
depends on the coding scheme used. There are various character encoding standards. The most common
coding schemes are:
• Binary Coded Decimal (BCD)
• Extended Binary Coded Decimal Interchange Code (EBCDIC)
• American Standard Code for Information Interchange (ASCII)

Binary Coded Decimal (BCD): This is a 4-bit code used to represent numeric data only. For
example, a number like 9 can be represented using Binary Coded Decimal as 1001₂.
Numbers larger than 9, having two or more digits in the decimal system, are expressed digit by
digit, using the following table:

Decimal  0     1     2     3     4     5     6     7     8     9
BCD      0000  0001  0010  0011  0100  0101  0110  0111  1000  1001

Thus, the BCD encoding for the base-10 number 127 would be: 0001 0010 0111
The BCD code of a number is not the same as its simple binary representation. In binary form,
for example, the decimal quantity 127 appears as 1111111.

Example: Convert (123)₁₀ to BCD

From the table above,
1 : 0001
2 : 0010
3 : 0011
thus, (123)₁₀ becomes: 0001 0010 0011
BCD is mostly used in simple electronic devices such as calculators and microwaves. This is because
it is easier to process and display individual digits on their Liquid Crystal Display (LCD)
screens.
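The digit-by-digit rule for BCD is easy to express in Python. The sketch below is an illustration with a made-up function name; it also shows the difference between BCD and plain binary for the number 127.

```python
def to_bcd(number: int) -> str:
    """Encode a non-negative integer in BCD: each decimal digit
    becomes its own 4-bit group."""
    if number < 0:
        raise ValueError("only non-negative integers are handled here")
    return " ".join(format(int(digit), "04b") for digit in str(number))

print(to_bcd(127))        # BCD: one 4-bit group per decimal digit
print(format(127, "b"))   # plain binary, for comparison
print(to_bcd(123))
```

BCD needs 12 bits for 127 where plain binary needs only 7, which is the price paid for being able to read off each decimal digit directly.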
Extended Binary Coded Decimal Interchange Code (EBCDIC): EBCDIC is an 8-bit binary code in
which symbols, letters and numbers are represented in binary. It was developed in 1963 by IBM
and is widely used in IBM midrange and mainframe computers. EBCDIC was developed to
enhance the existing capabilities of binary-coded decimal code. This code is used in text files of
S/390 servers and OS/390 operating systems of IBM. A total of 256 (2⁸) characters can be coded
using the EBCDIC scheme. Each byte consists of two nibbles, each four bits wide. The first nibble
defines the class of character, while the second nibble defines the specific character inside that class.
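The two-nibble structure can be inspected with Python, whose standard library includes the codec cp037, one common EBCDIC code page (US/Canada). Using cp037 is an assumption made for this sketch, since several EBCDIC variants exist; the function name is also made up.

```python
def ebcdic_nibbles(char: str) -> tuple[int, int]:
    """Encode one character with the cp037 EBCDIC code page and
    return its (first nibble, second nibble)."""
    (byte,) = char.encode("cp037")  # exactly one byte per character
    return byte >> 4, byte & 0x0F

for ch in "AB12":
    first, second = ebcdic_nibbles(ch)
    print(ch, format(first, "04b"), format(second, "04b"))
```

In this code page the letters A and B share the first nibble 1100 and the digits 1 and 2 share the first nibble 1111, while the second nibble picks out the individual character.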
American Standard Code for Information Interchange (ASCII): ASCII is a 7-bit code, which
means that only 128 characters (2⁷) can be represented. It is an encoding standard that represents
digits, letters, and symbols using numbers. However, manufacturers have added an eighth bit to this
coding scheme, which can now provide for 256 characters. This 8-bit coding scheme is referred to
as 8-bit ASCII or Extended ASCII; it includes the standard ASCII characters together with additional
characters. ASCII codes represent text in computers, telecommunications equipment, and other devices.
Most modern character-encoding schemes are based on ASCII, so ASCII text remains compatible with
them. ASCII is mainly used in programming, data conversions, graphic arts and text files.
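A quick way to see ASCII codes is with Python's built-in ord() function and the "ascii" codec. The helper name below is chosen for this sketch.

```python
def ascii_bits(char: str) -> str:
    """Return the 7-bit ASCII code of a single character as a binary string."""
    code = ord(char)
    if code > 127:
        raise ValueError(f"{char!r} is not a 7-bit ASCII character")
    return format(code, "07b")

for ch in "Data":
    print(ch, ord(ch), ascii_bits(ch))

# Encoding a whole string gives one byte per character.
print(list("Data".encode("ascii")))
```

Characters outside the 128 standard codes raise an error here, which mirrors the limit of the 7-bit scheme described above.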

Data Management System Application

Sector              Use
Banking             Customer information, account activities, payments, deposits, loans, etc.
Airlines            Reservations and schedule information
Universities        Student information, course registrations, colleges and grades
Telecommunication   Call records, monthly bills, maintaining balances, etc.
Finance             Information about stock, sales, and purchases of financial instruments like stocks and bonds
Sales               Customer, product and sales information
Manufacturing       Supply chain management, tracking production of items, and inventory status in warehouses
HR Management       Information about employees, salaries, payroll, deductions, generation of paychecks, etc.
