SDT Topic 3
Software Development Techniques
Topic 3: Data Representation
Scope of Topic
• In this topic we discuss data representation.
– Both in pseudocode and in traditional computer
languages
• Data representation is one of the most important
things to get right when designing an algorithm.
– A good structure will make it much easier to
manipulate.
• In this topic, we also discuss some of the ways in
which you can decide how your data is going to be
structured.
Data Types - 1
• Most programming languages come with a suite of
data types.
– Each holds a different category of information, and
this usually impacts on what can be done with it.
• We have seen thus far:
– Whole numbers (often known as integers)
– Real numbers (often known as floats)
– Strings of text
• Other more specialised kinds of data exist.
Data Types - 2
• The following data types are also available to you
in your pseudocode programs:
– Boolean
• A data type which contains either true or false. It
can hold nothing else.
– Character
• A data type which contains a single Unicode
character.
• Later on in this module, we will also see how to
create our own data types.
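As a rough illustration, the five kinds of data listed so far can be sketched in Python. The variable names and values are made up for this example, and Python has no separate character type, so a one-character string stands in for it here.

```python
# Hypothetical example values, one per data type from the slides.
whole_number = 42      # whole number (integer)
real_number = 3.14     # real number (float)
text = "hello"         # string of text
key_pressed = True     # Boolean: only True or False
initial = "A"          # character: a one-character string in Python

# Show the type Python assigns to each value.
for value in (whole_number, real_number, text, key_pressed, initial):
    print(type(value).__name__, value)
```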
Computer Memory
• Every piece of data that is used in an algorithm
must be stored somewhere.
– It gets stored in the computer’s memory.
• There are real physical constraints that impact on
how we design algorithms.
– We have finite amounts of computer memory.
– We have finite amounts of CPU cycles.
• For very small and simple algorithms, neither of
these is very limiting.
Scaling
• Programmers, however, spend a lot of time
thinking about how scalable their programs are.
– If my algorithm works for ten pieces of data, will it work for
one hundred? One thousand? One million?
• The seemingly minor decisions you make when
designing your algorithm will impact on that.
– We do not design algorithms to work at only one time on
one piece of data. We need to be mindful of scaling.
• Choosing the wrong data type at the start will impact on
this.
Scaling Example
• Consider a program which checks to see whether a
key on a keyboard was pressed in the past sixty
seconds.
– We could store that as a Boolean.
– We could store that as a whole number.
– We could store it as a string.
• This program is going to be running for a full year.
– On one hundred computers
• Does it make a difference which we use?
Memory Requirements
• You should always be able to estimate just how
much memory your pseudocode representation will
take up.
• You can do this with basic arithmetic.
– You can at any stage in your desk-check tot up the
memory cost with regards to a particular language.
– You already have a column for each bit of data.
• We will talk about how to make that representation
truly general a little later.
Scaling Example - 1
• Let us go back to our scaling example. First, we
need to work out how much data we are going to
be storing.
– One key press every sixty seconds for a year.
• Sixty an hour
• Twenty four hours a day
• 365 days a year
– 525600 units of data per computer.
• We need to store 52560000 units of data across our
one hundred computers for that year.
Scaling Example - 2
• What does that cost us in Java with different data
types?
Data Type      Cost in Bytes   Cost in Kilobytes
Whole number   210240000       205312
Boolean        52560000        51328
String         157680000       153984
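The arithmetic behind the table can be sketched in a few lines of Python. The per-value sizes are assumptions inferred from the ratios between the table rows (4 bytes per whole number, 1 byte per Boolean, 3 bytes per one-character string), and the function names are hypothetical. Note that 525,600 units per computer across one hundred computers comes to 52,560,000 units in total.

```python
# Assumed per-value sizes in bytes, inferred from the table's ratios.
BYTES_PER_VALUE = {"Whole number": 4, "Boolean": 1, "String": 3}

def units_per_year(computers):
    """One stored value per minute, per computer, for 365 days."""
    return 60 * 24 * 365 * computers

def cost_table(computers):
    """Map each data type to (bytes, whole kilobytes) for one year."""
    units = units_per_year(computers)
    return {name: (units * size, units * size // 1024)
            for name, size in BYTES_PER_VALUE.items()}

# Print one row per data type for one hundred computers.
for name, (n_bytes, n_kb) in cost_table(100).items():
    print(f"{name:<13}{n_bytes:>12}{n_kb:>10}")
```

Changing the argument to `cost_table` shows how quickly the gap between the data types grows as more computers are added.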
Why do we do this?
• One of the common things that causes problems
when using algorithms is the memory leak.
• Memory leaks are caused by memory being used
up but never freed when the program has finished with it.
• By having columns that indicate how many
variables we have in use, we can see if we have
flaws in our logic that result in memory leaks.
• It also allows people to decide if your algorithm is
appropriate for the language they are using.
Pseudocode Example
1 data num1 as whole number
2 data num2 as whole number
3 data sum as whole number
4 data usertext as string
5 num1 = 10
6 num2 = 20
7 sum = num1 + num2
8 usertext = "The answer is"
9 output usertext
10 output sum
Null Values
• If you attempt to perform any kind of operation on a
null value, the program will usually crash.
– You cannot use them in calculations.
– You cannot output them.
• The only thing you can do with null values is input
data into them.
– Either via user input
• That can be complex, and we will talk about that in a
later lecture.
– Or via explicitly setting the value with =
• Answer = “The answer is”
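A minimal Python sketch of the same idea, using None as a stand-in for the null value. The variable name is hypothetical, and the exact failure differs by language: Python raises a TypeError on the calculation rather than crashing outright.

```python
answer = None  # declared, but holding no data yet

try:
    total = answer + 1   # arithmetic on a null value fails
except TypeError as error:
    print("Cannot calculate with a null value:", error)

answer = "The answer is"  # explicitly setting the value with =
print(answer)
```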
Memory Usage - 1
• Because our desk-check includes the contents of
all our data as well as the types, we can work out
our memory usage at any particular step.
• At step seven, we have three whole numbers, so
our memory usage in Java is 12 bytes.
• At step eight, we have those three whole numbers
plus the string (thirteen characters including
spaces, so 15 bytes) – 27 bytes in total for Java.
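The tally can be reproduced with a few lines of Python. The sizes are assumptions taken from the figures on this slide: 4 bytes per whole number, and one byte per character plus 2 bytes of fixed overhead per string (inferred from thirteen characters costing 15 bytes). The string used is only an example that happens to be thirteen characters long.

```python
WHOLE_NUMBER_BYTES = 4     # assumed size of one whole number
STRING_OVERHEAD_BYTES = 2  # assumed fixed cost per string

def string_bytes(text):
    """Assumed cost: one byte per character plus the fixed overhead."""
    return len(text) + STRING_OVERHEAD_BYTES

# Step seven: three whole numbers are in use.
step_seven = 3 * WHOLE_NUMBER_BYTES

# Step eight: the same three whole numbers plus a 13-character string.
step_eight = step_seven + string_bytes("The answer is")

print(step_seven)
print(step_eight)
```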
Memory Usage - 2
• The memory cost of complex data types is
dependent on what other data types they contain.
– Thus, while they exist as variables when we use the
data keyword to create them, they only take up
memory when we put data into them.
– The memory is allocated, but not used at that
point.
• For now, the difference between primitive and
complex data is related to what happens when they
are created.
– It will become more important later on.
Phone Number - 1
• Why is a phone number best stored as a string?
– It says number right there in the name!
• It is to do with how the data gets manipulated and
output.
– You hardly ever do arithmetic on a phone number.
– You often need to structure a phone number in
chunks, such as 123-456-7890.
– Phone numbers often have a leading 0.
• 0123-456-7890
Phone Number - 2
• Strings are best for phone numbers, because on
the whole, we do not treat phone numbers as
numbers.
– They are unique codes that just happen to be
made up entirely of digits.
• However, there is always a trade-off.
– Easier to structure as a string.
– More difficult to arithmetically manipulate
• We choose the path of least resistance.
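The trade-off can be seen in a short Python sketch. The phone number is the example from the slides with its dashes removed, and the helper function is hypothetical, written only for the 4-3-4 pattern shown there.

```python
phone_text = "01234567890"

# Stored as a whole number, the leading 0 is lost.
phone_number = int(phone_text)
print(phone_number)

def in_chunks(digits):
    """Split an 11-digit string into a 4-3-4 pattern (hypothetical helper)."""
    return f"{digits[:4]}-{digits[4:7]}-{digits[7:]}"

# Stored as a string, chunking is a simple slicing job.
print(in_chunks(phone_text))
```

Doing the same chunking on the whole-number version would first require converting it back to text and restoring the missing zero, which is the extra work the string representation avoids.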
Conclusion
• In this lecture, we discussed the following key
subjects:
– Data representation
– Memory use of data types
– The scaling of data within algorithms
– Complex and primitive data types
– The methods of choosing the right variable
Terminology - 1
• The following new pieces of terminology were
introduced in this lecture:
– Scaling
• Making an algorithm work for large amounts of
data and operations.
– Variable
• The formal name for a piece of data stored within
an algorithm
Terminology - 2
• The following new pieces of terminology were
introduced in this lecture:
– Primitive data type
• The simplest kinds of variable, such as whole
and real numbers.
– Complex data type
• Variables which are made up of other variables.
Any Questions?