
CSC 204: Fundamentals of Data Structures

Study Session 1: Basic Concepts of Data Structure

Introduction
Computers can store and process vast amounts of data. Considering the large amount of data stored
on a computer system, how do you organize information so that you can find, update, add or delete
portions of it efficiently? This is done through the use of an appropriate data structure. Data structures
enable a programmer to mentally structure large amounts of data into conceptually manageable
relationships.

In this study session, you will learn about the basic concepts of data structures and their usefulness
in the field of computer science. You will get to know about the system life cycle, data types and
data abstraction. Just as Mathematics is the “Queen and servant of science”, the contents covered
by this study session are fundamental to all aspects of computer science.

Learning Outcomes for Study Session 1


On completion of this study session, you should be able to:
1.1 Familiarize yourself with the overview of Data Structures
1.2 Describe the overview of System Life-Cycle
1.3 Explain Data Abstraction.
1.4 Briefly explain Performance Analysis


1.1 Overview of Data Structures


In computer science, a data structure is a particular way of storing and organizing data in a
computer so that it can be used efficiently. Data structures provide a means to manage huge
amounts of data efficiently for uses such as large databases and internet indexing services.

Usually, efficient data structures are a key to designing efficient algorithms. Some formal design
methods and programming languages emphasize data structures, rather than algorithms, as the key
organizing factor in software design.

1.1.1 Basic Principles


Data structures are generally based on the ability of a computer to fetch and store data at any place
in its memory, specified by an address—a bit string that can be stored in memory and manipulated
by the program. For example, record and array data structures are based on computing the
addresses of data items with arithmetic operations; while the linked data structures are based on
storing addresses of data items within the structure itself. Many data structures use both principles,
sometimes combined in non-trivial ways (as in XOR linking).
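
As a rough illustration of the two principles, the Python sketch below first computes an element's address by arithmetic, as an array would, and then follows stored references, as a linked structure would. The names element_address, Node and to_list, together with the base address and element size used in the example calls, are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass
from typing import Optional


def element_address(base: int, index: int, element_size: int) -> int:
    """Address of array element `index`, found by arithmetic alone."""
    return base + index * element_size


@dataclass
class Node:
    """A linked-structure cell: the reference to the next item is stored in the cell."""
    value: int
    next: Optional["Node"] = None


def to_list(head: Optional[Node]) -> list:
    """Collect values by following stored references from node to node."""
    values = []
    while head is not None:
        values.append(head.value)
        head = head.next
    return values


# Array principle: element 3 of 4-byte items starting at (hypothetical) address 1000.
print(element_address(1000, 3, 4))  # 1012

# Linked principle: each node stores a reference to its successor.
chain = Node(10, Node(20, Node(30)))
print(to_list(chain))  # [10, 20, 30]
```

Reaching element 3 of the array takes one calculation, whereas reaching the third node of the chain means visiting the nodes before it.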

The implementation of a data structure usually requires writing a set of procedures that create and
manipulate instances of that structure. The efficiency of a data structure cannot be analysed
separately from those operations. This observation motivates the theoretical concept of an abstract
data type, a data structure that is defined indirectly by the operations that may be performed on it,
and the mathematical properties of those operations (including their space and time cost).

Box 1.1: Data Structure

Data structure is a particular way of storing and organizing data in a computer so that it can be used
efficiently.

1.1.2 Language Support


Most assembly languages and some low-level languages, such as BCPL (Basic Combined
Programming Language), lack support for data structures. On the other hand, many high-level
programming languages, and some higher-level assembly languages such as MASM, have special
syntax or other built-in support for certain data structures, such as vectors (one-dimensional
arrays) in the C language or multi-dimensional arrays in Pascal.

Most programming languages feature some sort of library mechanism that allows data structure
implementations to be reused by different programs. Modern languages usually come with standard
libraries that implement the most common data structures. Examples are the C++ Standard
Template Library, the Java Collections Framework, and Microsoft's .NET Framework.

Modern languages also generally support modular programming, the separation between the
interface of a library module and its implementation. Some provide opaque data types that hide
implementation details from clients. Object-oriented programming languages, such as C++, Java
and the languages of the .NET Framework, may use classes for this purpose. Many known data structures have
concurrent versions that allow multiple computing threads to access the data structure
simultaneously.

Pilot Question 1.1


i. What is Data Structure?
ii. Give examples of modern programming languages that come with standard libraries that
implement data structures as discussed in this study session.

1.2 Overview of System Life-Cycle


The Systems Development Life Cycle (SDLC), also referred to as the application development
life-cycle, is a term used in systems engineering, information systems and software engineering
to describe a process for planning, creating, testing, and deploying an information system.

SDLC is used during the development of an IT project; it describes the different stages involved
in the project from the drawing board, through the completion of the project.


Figure 1.1: SDLC Phases (diagram of the System Development Life Cycle)

The system development life cycle framework provides a sequence of activities for system
designers and developers to follow. It consists of a set of steps or phases in which each phase of
the SDLC uses the results of the previous one. They are as follows:

1.2.1 Planning and Requirements Analysis


The planning and requirements analysis phase is the first phase of SDLC. The objective of this phase
is to specify the problem and the results expected. Here, the purpose of the software or system is
determined, the goals it needs to accomplish are established, and a set of definite
requirements is developed.
a. Conduct the preliminary analysis: In this step, you need to find out the organization's
objectives and the nature and scope of the problem under study. Even if a problem refers only
to a small segment of the organization, you need to find out what the objectives
of the organization itself are. Then, you need to see how the problem being studied fits in
with them.
b. Propose alternative solutions: In digging into the organization's objectives and specific
problems, you may have already uncovered some solutions. Alternative proposals may come
from interviewing employees, clients, suppliers, and/or consultants. You can also study
what competitors are doing. With these findings, you will have three choices: leave the system
as it is, improve it, or develop a new system.

1.2.2 System Analysis


This is the phase of SDLC where the entire system is defined in detail. In fact, it is at this stage that
a detailed blueprint of various processes of the software is developed. If needed, the system is
divided into smaller parts to make it easier and more manageable for the developers, designers,
testers, project managers and other professionals who are going to work on the software in the later
stages.

1.2.3 System Design


In this phase, the physical system is designed with the help of the logical design prepared by system
analysts. The analysts and designers work together and use certain tools and software to create the
overall system design, including the probable output.

1.2.4 Refinement and Coding


As the name implies, in this stage the software is coded with precision. A team of programmers is
assigned by the company to work on the software. More often than not, the work is sub-divided
under a sub-phase called Task Allocation, where each developer is assigned a part of the work
depending on his or her skill set(s). This helps complete the coding efficiently.

1.2.5 Testing
When the software is ready, it is sent to the testing department where Quality Analysts test it
thoroughly for different errors by forming various test cases. They either test the software manually
or use automated testing tools to ensure that each and every component of the software works fine.
Once the QA team is satisfied that the software is error-free, it goes to the next stage, which is
Implementation.

1.2.6 Implementation and Maintenance


This is the final stage of the software development life cycle. In this stage, the software is run on
various systems by users. If it runs smoothly on these systems without any flaw, then it is
considered ready to be launched.


During the maintenance stage of the SDLC, the system is assessed to ensure it does not become
obsolete. This is also where changes are made to the initial software. It involves continuous evaluation
of the system in terms of its performance.

Pilot Question 1.2


i. List the set of steps or phases of System Development Life Cycle.

1.3 Data Types and Data Abstraction


A data type is a term which refers to the kinds of data that variables may hold. Every
programming language has a set of built-in data types. This means that the language allows
variables to name data of that type and provides a set of operations which meaningfully
manipulate these variables.
Some data types are easy to provide because they are built into the computer’s machine language
instruction set, such as integer and character. Other data types require considerably more effort
to implement. In some languages, there are features which allow one to construct combinations of
the built-in types (like structures in ‘C’). Such a mechanism is necessary for creating
new complex data types which are not provided by the programming language. The new type
must also support meaningful manipulations. Such meaningful data types are referred to as abstract data
types.

When designing data structures and algorithms, it is desirable to avoid making decisions based on
the accident of how you first sketch out a piece of code. All design should be motivated by the
explicit needs of the application. The idea of an Abstract Data Type (ADT) is to support this.

Abstract Data Type (ADT) is a data type that separates specification of objects and operations
from representation of objects and implementation of operations. Abstraction captures only those
details about an object that are relevant to the current perspective.

Data abstraction enforces a clear separation between the abstract properties of a data type and the
concrete details of its implementation. The abstract properties are those that are visible to client
code that makes use of the data type (the interface to the data type), while the concrete
implementation is kept entirely private, and indeed can change, for example to incorporate
efficiency improvements over time.

The idea is that such changes are not supposed to have any impact on client code, since they
involve no difference in the abstract behaviour. Data abstraction allows data bits to be handled in
meaningful ways. Thus, it is a basic motivation behind data types.
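
The separation between interface and implementation can be sketched in Python with a stack, one of the ADTs named later in this session. This is only an illustrative sketch: the class names ListStack and LinkedStack and the client function reverse are assumptions made for the example, not part of any standard library.

```python
class ListStack:
    """Stack stored in a Python list; the top is the end of the list."""

    def __init__(self):
        self._items = []

    def push(self, item):
        self._items.append(item)

    def pop(self):
        return self._items.pop()

    def is_empty(self):
        return not self._items


class LinkedStack:
    """Same interface, but stored as a chain of (item, rest) pairs."""

    def __init__(self):
        self._top = None

    def push(self, item):
        self._top = (item, self._top)

    def pop(self):
        if self._top is None:
            raise IndexError("pop from empty stack")
        item, self._top = self._top
        return item

    def is_empty(self):
        return self._top is None


def reverse(values, stack):
    """Client code: relies only on push, pop and is_empty."""
    for v in values:
        stack.push(v)
    result = []
    while not stack.is_empty():
        result.append(stack.pop())
    return result


print(reverse([1, 2, 3], ListStack()))    # [3, 2, 1]
print(reverse([1, 2, 3], LinkedStack()))  # [3, 2, 1]
```

The client function gives the same result with either class, because it depends only on the abstract behaviour of the three operations and never on how the items are stored.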

1.3.1 Data Structures vs. Abstract Data Types (ADT)


Computer program design can be made much easier by organizing information into abstract data
structures (ADS) or abstract data types (ADTs). For example, one can model a table of numbers
that has three columns and an indeterminate number of rows, in terms of an array with two
dimensions: (1) a large number of rows, and (2) three columns.
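
A minimal Python sketch of such a table is given below, assuming a list of rows in which every row holds exactly three numbers. The helper names add_row and column are hypothetical and serve only to show how the two dimensions are used.

```python
# Each inner list is one row with exactly three columns.
table = [
    [1.0, 2.0, 3.0],
    [4.0, 5.0, 6.0],
]


def add_row(table, row):
    """Append a row, keeping the three-column shape of the table."""
    if len(row) != 3:
        raise ValueError("each row must have exactly three columns")
    table.append(list(row))


def column(table, j):
    """Return column j (0, 1 or 2) as a list, one entry per row."""
    return [row[j] for row in table]


add_row(table, [7.0, 8.0, 9.0])
print(table[2][1])       # 8.0  (row index 2, column index 1)
print(column(table, 0))  # [1.0, 4.0, 7.0]
```

The number of rows grows as needed, while the number of columns is fixed by the check in add_row.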

A key feature of modern computer programs is the ability to manipulate ADS using procedures or
methods that are predefined by the programmer or software designer. This requires that data
structures be specified carefully, with forethought, and in detail.

Pilot Question 1.3


i. What is Data Abstraction?

1.4 Standard Primitive Data types


Standard primitive types are those types that are available on most computers as built-in
features. They include the whole numbers, the logical truth values, and a set of printable
characters. On many computers, fractional numbers are also incorporated, together with the
standard arithmetic operations. We denote these types by the identifiers:
INTEGER, REAL, BOOLEAN, CHARACTER, SET

1.4.1 Integer Type


In computer science, an integer is a datum of integral data type, a data type which represents some
finite subset of the mathematical integers. Integers are commonly represented in a computer as a group
of binary digits (bits).


Box 2.1: Integer Type

An integer is a number that can be written without a fraction or decimal component.

The type INTEGER comprises a subset of the whole numbers whose size may vary among
individual computer systems. It is assumed that all operations on data of this type are exact and
correspond to the ordinary laws of arithmetic, and that the computation will be interrupted in the
case of a result lying outside the representable subset. This event is called overflow.
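
Python's own integers grow as needed and do not overflow, so the sketch below only imitates the behaviour described above. It assumes a 32-bit INTEGER type; the names INT_MIN, INT_MAX and checked_add are hypothetical.

```python
# Assumed range of a 32-bit signed INTEGER type.
INT_MIN = -2**31
INT_MAX = 2**31 - 1


def checked_add(a: int, b: int) -> int:
    """Add two integers, interrupting the computation on overflow."""
    result = a + b
    if not INT_MIN <= result <= INT_MAX:
        raise OverflowError(f"{a} + {b} lies outside the 32-bit range")
    return result


print(checked_add(2_000_000_000, 100))  # 2000000100

try:
    checked_add(INT_MAX, 1)
except OverflowError as err:
    print("overflow:", err)
```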

The standard operators are the four basic arithmetic operations of addition (+), subtraction (-),
multiplication (*), and division (/, DIV). Whereas the slash denotes ordinary division resulting in a
value of type REAL, the operator DIV denotes integer division resulting in a value of type
INTEGER.

If we define the quotient q = m DIV n and the remainder r = m MOD n, the following relations
hold, assuming n > 0:
q*n + r = m and 0 ≤ r < n
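
These relations can be checked with a short Python sketch. It relies on the fact that Python's floor division (//) and remainder (%) operators satisfy the same two relations when n > 0; the function name div_mod is a hypothetical helper introduced for the example.

```python
def div_mod(m: int, n: int):
    """Return (q, r) with q*n + r == m and 0 <= r < n, for n > 0."""
    if n <= 0:
        raise ValueError("n must be positive")
    q = m // n  # plays the role of m DIV n
    r = m % n   # plays the role of m MOD n
    return q, r


# Check both relations for positive, negative and zero values of m.
for m in (17, -17, 0, 5):
    q, r = div_mod(m, 5)
    assert q * 5 + r == m and 0 <= r < 5

print(div_mod(17, 5))   # (3, 2)
print(div_mod(-17, 5))  # (-4, 3)
```

Note that for a negative m the remainder is still non-negative, which is what the condition 0 ≤ r < n requires.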

Common integral data types are as follows:

a. Bytes and Octets


The term byte initially meant 'the smallest addressable unit of memory'. In the past, 5-, 6-, 7-, 8-,
and 9-bit bytes have all been used. There have also been computers that could address individual
bits ('bit-addressed machine'), or that could only address 16- or 32-bit quantities ('word-addressed
machine'). The term byte was usually not used at all in connection with bit- and word-addressed
machines.

The term octet always refers to an 8-bit quantity. It is mostly used in the field of computer
networking, where computers with different byte widths might have to communicate. In modern
usage, byte almost invariably means eight bits, since all other sizes have fallen into disuse. Thus,
byte has come to be synonymous with octet.


b. Words
The term 'word' is used for a small group of bits which are handled simultaneously by processors of
a particular architecture. The size of a word is thus CPU-specific. Many different word sizes have
been used, including 6-, 8-, 12-, 16-, 18-, 24-, 32-, 36-, 39-, 48-, 60-, and 64-bit.

Since the size of a word is architectural, it is usually set by the first CPU in a family, rather than the
characteristics of a later compatible CPU. The meanings of terms derived from word, such as long-
word, double-word, quad-word, and half-word, also vary with the CPU and OS.

1.4.2 The Type Real


The type REAL denotes a subset of the real numbers. Whereas arithmetic with operands of the
type INTEGER is assumed to yield exact results, arithmetic on values of type REAL is permitted
to be inaccurate within the limits of round-off errors caused by computation on a finite number of
digits. This is the principal reason for the explicit distinction between the types INTEGER and
REAL, as it is made in most programming languages.

The standard operators are the four basic arithmetic operations of addition (+), subtraction (-),
multiplication (*), and division (/). It is the essence of data typing that different types are
incompatible under assignment. An exception to this rule is made for assignment of integer values
to real variables, because here the semantics are unambiguous. After all, integers form a subset of
real numbers. However, the inverse direction is not permissible: Assignment of a real value to an
integer variable requires an operation such as truncation or rounding.
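
The short Python sketch below illustrates both points: converting between integer and real values, and round-off error in real arithmetic. Python is dynamically typed and does not enforce assignment compatibility, so the conversions are written out explicitly; the function names to_real, truncate and round_nearest are hypothetical.

```python
import math


def to_real(i: int) -> float:
    """Integer to real: every integer in range has a real counterpart."""
    return float(i)


def truncate(r: float) -> int:
    """Real to integer by discarding the fractional part."""
    return math.trunc(r)


def round_nearest(r: float) -> int:
    """Real to integer by rounding to the nearest whole number."""
    return round(r)


print(to_real(7))           # 7.0
print(truncate(2.75))       # 2
print(round_nearest(2.75))  # 3
print(truncate(-2.75))      # -2

# Round-off: the sum is not exactly 0.3, so compare with a tolerance.
print(0.1 + 0.2 == 0.3)              # False
print(math.isclose(0.1 + 0.2, 0.3))  # True
```

Truncation and rounding give different integers for the same real value, which is why the conversion has to be requested explicitly.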

Example 2.3

The following is an algorithm for the fast computation of y = x^n, where n is a non-negative
integer.
y := 1.0; i := n; (* 0 ≤ n *)
WHILE i > 0 DO (* x0^n = x^i * y, where x0 is the initial value of x *)
  IF ODD(i) THEN y := y*x END ;
  x := x*x; i := i DIV 2
END
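
The same loop can be sketched in Python as follows. The function name power is an assumption made for the example; the variables x, y, i and n play the same roles as in the algorithm above.

```python
def power(x: float, n: int) -> float:
    """Compute x raised to the power n by repeated squaring, for n >= 0."""
    if n < 0:
        raise ValueError("n must be non-negative")
    y = 1.0
    i = n
    while i > 0:
        if i % 2 == 1:  # ODD(i)
            y = y * x
        x = x * x
        i = i // 2      # i DIV 2
    return y


print(power(2.0, 10))  # 1024.0
print(power(3.0, 0))   # 1.0
```

Because i is halved on every pass, the loop body runs roughly log2(n) times rather than n times.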


1.4.3 The Type Boolean


The two values of the standard type BOOLEAN are denoted by the identifiers “TRUE” and
“FALSE”.
The Boolean operators are the logical conjunction, disjunction, and negation whose values are
defined in Table 2.1. The logical conjunction is denoted by the symbol &, the logical disjunction by
OR, and negation by “~”.
Note that comparisons are operations yielding a result of type BOOLEAN. Thus, the result of a
comparison may be assigned to a variable, or it may be used as an operand of a logical operator in a
Boolean expression.

For instance, given Boolean variables p and q and integer variables x = 5, y = 8, z = 10, the two
assignments
p := x = y

q:= (x ≤ y) & (y < z)


yield p = FALSE and q = TRUE.

Table 2.1: Boolean Operators.


p      q      p & q  p OR q  ~p
TRUE   TRUE   TRUE   TRUE    FALSE
TRUE   FALSE  FALSE  TRUE    FALSE
FALSE  TRUE   FALSE  TRUE    TRUE
FALSE  FALSE  FALSE  FALSE   TRUE

The Boolean operators & (AND) and OR have an additional property in most programming
languages, which distinguishes them from other dyadic operators. Whereas, for example, the sum x
+ y is not defined if either x or y is undefined, the conjunction p & q is defined even if q is
undefined, provided that p is FALSE.
This conditionality is an important and useful property. The exact definition of & and OR is
therefore given by the following equations:
p & q = if p then q else FALSE
p OR q = if p then TRUE else q
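
The two equations can be written out directly in Python, as in the sketch below. The helpers cond_and and cond_or are hypothetical: they take the second operand as a function so that it is evaluated only when needed. Python's built-in and and or operators behave the same way, which the function safe_ratio_exceeds (also a made-up name) relies on.

```python
def cond_and(p, q_thunk):
    """p & q = if p then q else FALSE; q is evaluated only when p is true."""
    return q_thunk() if p else False


def cond_or(p, q_thunk):
    """p OR q = if p then TRUE else q; q is evaluated only when p is false."""
    return True if p else q_thunk()


def safe_ratio_exceeds(a, b, limit):
    """The division is skipped when b is zero, so no error can occur."""
    return b != 0 and a / b > limit


print(cond_and(False, lambda: 1 / 0))  # False (1 / 0 is never evaluated)
print(cond_or(True, lambda: 1 / 0))    # True  (1 / 0 is never evaluated)
print(safe_ratio_exceeds(10, 0, 1))    # False
print(safe_ratio_exceeds(10, 2, 1))    # True
```

In each of the first three calls the second operand would be undefined (a division by zero), yet the whole expression still has a value.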


1.4.4 The Type Char


The standard type CHAR comprises a set of printable characters. Unfortunately, there is no
generally accepted standard character set used on all computer systems. Therefore, the use of the
predicate "standard" may in this case be almost misleading; it is to be understood in the sense of
"standard on the computer system on which a certain program is to be executed."
The character set defined by the International Organization for Standardization (ISO), and particularly its
American version ASCII (American Standard Code for Information Interchange), is the most widely
accepted set. ASCII consists of 95 printable (graphic) characters and 33 control characters; the latter are
mainly used in data transmission and for the control of printing equipment.

In order to be able to design algorithms involving characters (i.e., values of type CHAR) that are
system independent, we should like to be able to assume certain minimal properties of character
sets, namely:
1. The type CHAR contains the 26 capital Latin letters, the 26 lower-case letters, the 10
decimal digits and a number of other graphic characters, such as punctuation marks.
2. The subsets of letters and digits are ordered and contiguous, i.e.,
("A" ≤ x) & (x ≤ "Z") implies that x is a capital letter
("a" ≤ x) & (x ≤ "z") implies that x is a lower-case letter
("0" ≤ x) & (x ≤ "9") implies that x is a decimal digit.
3. The type CHAR contains a non-printing, blank character and a line-end character that may
be used as separators.
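
Under these assumptions, character tests reduce to simple comparisons, as the Python sketch below suggests. It assumes that x is a single-character string drawn from the ASCII set; the function names is_capital, is_lower, is_digit and digit_value are hypothetical.

```python
def is_capital(x: str) -> bool:
    """("A" <= x) & (x <= "Z") implies x is a capital letter."""
    return "A" <= x <= "Z"


def is_lower(x: str) -> bool:
    """("a" <= x) & (x <= "z") implies x is a lower-case letter."""
    return "a" <= x <= "z"


def is_digit(x: str) -> bool:
    """("0" <= x) & (x <= "9") implies x is a decimal digit."""
    return "0" <= x <= "9"


def digit_value(x: str) -> int:
    """Because the digits are contiguous, the value is an offset from "0"."""
    if not is_digit(x):
        raise ValueError("not a decimal digit")
    return ord(x) - ord("0")


print(is_capital("Q"))   # True
print(is_lower("Q"))     # False
print(is_digit("7"))     # True
print(digit_value("7"))  # 7
```

The digit_value function works only because property 2 guarantees that the digit characters are ordered and contiguous.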

1.4.5 The Type Set


The type SET denotes sets whose elements are integers in the range 0 to a small number, typically
31 or 63.

Given, for example, variables

VAR r, s, t: SET

Possible assignments are

r := {5}; s := {x, y .. z}; t := {}


Here, the value assigned to r is the singleton set consisting of the single element 5; t is assigned the
empty set and s the elements x, y, y+1,…, z-1, z.

The following elementary operators are defined on variables of type SET:


* set intersection
+ set union
- set difference
/ symmetric set difference
IN set membership
Constructing the intersection or the union of two sets is often called set multiplication or set
addition, respectively; the priorities of the set operators are defined accordingly, with the
intersection operator having priority over the union and difference operators, which in turn have
priority over the membership operator, which is classified as a relational operator. The following
are examples of set expressions and their fully parenthesized equivalents:
r * s + t = (r*s) + t
r - s * t = r - (s*t)
r - s + t = (r-s) + t
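
For comparison, the same expressions are sketched below with Python's built-in sets, where intersection, union, difference, symmetric difference and membership are written &, |, -, ^ and in. The values chosen for r, s and t are arbitrary. Python's operator priorities differ from those described above (in Python, - binds more tightly than &), so every expression is fully parenthesized.

```python
r = {1, 2, 3}
s = {2, 3, 4}
t = {4, 5}

# r * s + t  =  (r*s) + t
print(sorted((r & s) | t))  # [2, 3, 4, 5]

# r - s * t  =  r - (s*t)
print(sorted(r - (s & t)))  # [1, 2, 3]

# r - s + t  =  (r-s) + t
print(sorted((r - s) | t))  # [1, 4, 5]

# r / s: symmetric set difference
print(sorted(r ^ s))        # [1, 4]

# 2 IN r: set membership
print(2 in r)               # True
```

The results are sorted before printing because a Python set does not promise any particular order of its elements.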

In this class, we will concentrate only on data structures called arrays, sets, records and on ADTs
called lists, stacks, queues, heaps, graphs, and trees. Further details on them will be discussed in
forthcoming study sessions.


Summary of Study Session 1: Basic Concepts of Data Structures

In this Study Session 1, you have learnt that:

1. Data structure is a particular way of storing and organizing data in a computer so that
it can be used efficiently. Data structures provide a means to manage huge amounts
of data efficiently for uses such as large databases and internet indexing services.
2. The systems development life cycle (SDLC), also referred to as the application
development life-cycle, is a term used in systems engineering, information systems
and software engineering to describe a process for planning, creating, testing, and
deploying an information system.
3. The phases of SDLC are: Planning and Requirements Analysis, System Analysis, System Design,
Refinement and Coding, Testing, and Implementation and Maintenance.
4. A data type is a term which refers to the kinds of data that variables may hold. With
every programming language there is a set of built-in data types.
5. Abstract Data Type (ADT) is a data type that separates specification of objects and
operations from representation of objects and implementation of operations.
6. Abstraction captures only those details about an object that are relevant to the current
perspective.
7. Data abstraction enforces a clear separation between the abstract properties of a data
type and the concrete details of its implementation.
8. Standard primitive types are those types that are available on most computers as
built-in features. They include the whole numbers, the logical truth values, and a set
of printable characters.
9. Standard primitive data types include integer, real, boolean, character and set.


Pilot Answers

Pilot Answer 1.1


i. Data structure is a particular way of storing and organizing data in a computer so that it can
be used efficiently. It provides a means to manage huge amounts of data efficiently, such as
large databases and internet indexing services.
ii. C++ Standard Template Library, the Java Collections Framework, and Microsoft's .NET
Framework.

Pilot Answer 1.2


i. The steps or phases of the System Development Life Cycle are:
a. Planning and Requirements Analysis
b. System Analysis
c. System Design
d. Refinement and Coding
e. Testing
f. Implementation and Maintenance

Pilot Answer 1.3


i. Data Abstraction captures only those details about an object that are relevant to the current
perspective. It enforces a clear separation between the abstract properties of a data type and
the concrete details of its implementation.


Glossary of Terms

Abstract Data Type (ADT) is a data type that separates specification of objects and operations from
representation of objects and implementation of operations.

Data Structure is a particular way of storing and organizing data in a computer so that it can be
used efficiently.


Self-Assessment Questions (SAQs) for Study Session 1


Now that you have completed this study session, you can assess how well you have achieved its
Learning Outcomes by answering the following questions. Write your answers in your Study Diary
and discuss them with your Tutor at the next Study Support Meeting. You can check your answers
with the Notes on the Self-Assessment Questions at the end of this Module.

i. What is Software Development Life-Cycle?


ii. What are the properties of an algorithm?
iii. Define Recursive Algorithm.
iv. Describe five (5) algorithm representations you know.
v. Explain the term performance analysis of an algorithm as discussed in this study session.
