
SDT Topic 3

Uploaded by

aubreyndisale22
Copyright
© All Rights Reserved

Topic 3 – Data Representation Software Development Techniques

Software Development
Techniques
Topic 3:
Data Representation

V1.0 © NCC Education Limited

Data Representation Topic 3 - 3.2

Scope of Topic
• In this topic we discuss data representation.
– Both in pseudocode and in traditional computer
languages
• Data representation is one of the most important
things to get right when designing an algorithm.
– A good structure will make the data much easier to
manipulate.
• In this topic, we also discuss some of the ways in
which you can decide how your data is going to be
structured.


Data Representation Topic 3 - 3.3

Data and Algorithms


• Algorithms represent the ‘moving parts’ of your
code.
– The parts are always fixed, although the order in
which they move is not.
• Data represents the raw material that those moving
parts are manipulating.
– In technical terms, these are our variable parts, or
just variables.
• You need to make sure the moving parts are suited
to the size and shape of the data.


V1.0 Visuals Handout – Page 1



Data Representation Topic 3 - 3.4

Data Types - 1
• Most programming languages come with a suite of
data types.
– Each holds a different category of information, and
this usually impacts on what can be done with it.
• So far we have seen:
– Whole numbers (often known as integers)
– Real numbers (often known as floats)
– Strings of text
• Other more specialised kinds of data exist.


Data Representation Topic 3 - 3.5

Data Types - 2
• The following data types are also available to you
in your pseudocode programs:
– Boolean
• A data type which contains either true or false. It
can hold nothing else.
– Character
• A data type which contains a single Unicode
character.
• Later in this module, we will also see how to
create our own data types.
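As a rough illustration, here is how the pseudocode data types seen so far might map onto Java. Choosing `int` for whole numbers and `double` for real numbers is an assumption (Java also has `long` and `float`), and the class name `PseudocodeTypes` is hypothetical.

```java
// A sketch mapping each pseudocode data type onto one possible Java type.
class PseudocodeTypes {
    // Builds one line of text showing a value of each type.
    static String describe() {
        int wholeNumber = 42;       // whole number (integer)
        double realNumber = 3.75;   // real number (floating point)
        String text = "Hello";      // string of text
        boolean keyPressed = true;  // Boolean: only true or false
        char initial = 'A';         // character: a single UTF-16 code unit
        return wholeNumber + " " + realNumber + " " + text
                + " " + keyPressed + " " + initial;
    }

    public static void main(String[] args) {
        System.out.println(describe());  // 42 3.75 Hello true A
    }
}
```

One Java detail worth knowing: a `char` holds a single 16-bit UTF-16 code unit, so a few Unicode characters (many emoji, for instance) need two `char` values, and therefore a `String`, to store.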


Data Representation Topic 3 - 3.6

Computer Memory
• Every piece of data that is used in an algorithm
must be stored somewhere.
– It gets stored in the computer’s memory.
• There are real physical constraints that impact on
how we design algorithms.
– We have finite amounts of computer memory.
– We have finite amounts of CPU cycles.
• For very small and simple algorithms, neither of
these is very limiting.


Data Representation Topic 3 - 3.7

Scaling
• Programmers, however, spend a lot of time
thinking about how scalable their programs are.
– If my algorithm works for ten pieces of data, will it work for
one hundred? One thousand? One million?
• The seemingly minor decisions you make when
designing your algorithm will impact on that.
– We do not design algorithms to run only once, on a
single piece of data. We need to be mindful of scaling.
• Choosing the wrong data type at the start will impact on
this.


Data Representation Topic 3 - 3.8

Scaling Example
• Consider a program which checks to see whether a
key on a keyboard was pressed in the past sixty
seconds.
– We could store that as a Boolean.
– We could store that as a whole number.
– We could store it as a string.
• This program is going to be running for a full year.
– On one hundred computers
• Does it make a difference which we use?


Data Representation Topic 3 - 3.9

Data and Computer Memory - 1


• Pseudocode lets us ignore the implementation
details that go along with an algorithm.
• One of those implementation details is how much
space is taken up by different kinds of data.
• However, we still need to know what the relative
differences are between different kinds of data.
– Otherwise we cannot make sensible choices when
we start designing.


Data Representation Topic 3 - 3.10

Data and Computer Memory - 2


• A data type is a wrapper around some part of the
computer’s memory.
• When we create a piece of data, we say to the
computer: ‘Set aside some memory, big enough to
hold some data of this type. Let me refer to the
address of that memory by the name I give it.’
– For example: data myAge as whole number
• The computer handles the rest, such as making
sure that one piece of data does not overwrite
another.


Data Representation Topic 3 - 3.11

Data and Computer Memory - 3


• Because the data type is a wrapper, different
languages can handle data types differently.
– Some languages handle a String as a list of characters.
– Some languages handle a String as a custom data type.
• Real numbers in particular have a very complex
representation in a computer’s memory.
• Pseudocode lets us ignore this for the most part.
– But a bad data choice is bad regardless of language, so
we need to bear it in mind.
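Java is an example of a language that handles a String as its own class (a reference type), while C stores a string as an array of `char` ending in a `'\0'` terminator. The sketch below, with the hypothetical helper names `letters` and `rebuild`, shows that a Java `String` can still be viewed as a sequence of characters.

```java
// Views a Java String as a sequence of char values and back again.
class StringAsCharacters {
    // Copies the characters of text into a new char array.
    static char[] letters(String text) {
        return text.toCharArray();
    }

    // Builds a new String from an array of characters.
    static String rebuild(char[] chars) {
        return new String(chars);
    }

    public static void main(String[] args) {
        char[] chars = letters("data");
        System.out.println(chars.length);    // 4
        System.out.println(chars[0]);        // d
        System.out.println(rebuild(chars));  // data
    }
}
```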


Data Representation Topic 3 - 3.12

Sizes of Data Types


• The size differs from language to language; this
table shows typical sizes for data types in Java and
C++.

Type            Java                          C++
Whole number    4 bytes                       4 bytes
Real number     8 bytes                       4 bytes
Boolean         1 byte                        1 byte
Character       2 bytes                       1 byte
String          number of letters + 2 bytes   number of letters + 1 byte
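Java exposes the sizes of its primitive types as constants, so part of the Java column can be checked in code. This is a minimal sketch; mapping the pseudocode types to `int`, `double` and `char` is an assumption, and `JavaTypeSizes` is a hypothetical name. Java has no `Boolean.BYTES`, because the language does not fix how much memory a `boolean` uses, and the String row cannot be checked this way either, since a `String` is an object whose memory use depends on the JVM.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Reports the size Java defines for each primitive chosen for a pseudocode type.
class JavaTypeSizes {
    static Map<String, Integer> sizes() {
        Map<String, Integer> table = new LinkedHashMap<>();  // keeps insertion order
        table.put("whole number (int)", Integer.BYTES);      // 4
        table.put("real number (double)", Double.BYTES);     // 8
        table.put("character (char)", Character.BYTES);      // 2
        // No entry for boolean: Java defines no Boolean.BYTES constant.
        return table;
    }

    public static void main(String[] args) {
        sizes().forEach((type, bytes) ->
                System.out.println(type + ": " + bytes + " bytes"));
    }
}
```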


Data Representation Topic 3 - 3.13

Memory Requirements
• You should always be able to estimate just how
much memory your pseudocode representation will
take up.
• You can do this with basic arithmetic.
– At any stage in your desk-check, you can tot up the
memory cost for a particular language.
– You already have a column for each piece of data.
• We will talk about how to make that representation
truly general a little later.


Data Representation Topic 3 - 3.14

Scaling Example - 1
• Let us go back to our scaling example. First, we
need to work out how much data we are going to
be storing.
– One value (was a key pressed?) every sixty seconds for a year.
• Sixty an hour
• Twenty-four hours a day
• 365 days a year
– 60 × 24 × 365 = 525,600 units of data per computer.
• Across our one hundred computers, we need to store
100 × 525,600 = 52,560,000 units of data for that year.

Data Representation Topic 3 - 3.15

Scaling Example - 2
• What does that cost us in Java with different data
types?

Data type                       Cost in bytes   Cost in kilobytes
Whole number (4 bytes)            210,240,000           205,312.5
Boolean (1 byte)                   52,560,000            51,328.1
String, one letter (3 bytes)      157,680,000           153,984.4

(1 kilobyte = 1,024 bytes; kilobytes rounded to one decimal place)

• Data types can make a huge difference. Choose
the smallest data type that meets your needs.
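The arithmetic above can be reproduced in a few lines of Java. This sketch assumes the per-value sizes from the sizes table (4 bytes for a whole number, 1 for a Boolean, 3 for a one-letter string) and 1,024 bytes per kilobyte; `ScalingCost` and its methods are hypothetical names.

```java
// Recomputes the scaling example: one value stored every minute for a
// year on one hundred computers, costed with the table's Java sizes.
class ScalingCost {
    static final long MINUTES_PER_YEAR = 60L * 24 * 365;  // 525,600 values per computer
    static final long COMPUTERS = 100;

    // Total number of values stored across all computers in one year.
    static long totalUnits() {
        return MINUTES_PER_YEAR * COMPUTERS;
    }

    // Total cost in bytes when each value takes bytesPerValue bytes.
    static long costInBytes(long bytesPerValue) {
        return totalUnits() * bytesPerValue;
    }

    // Converts bytes to kilobytes, taking 1 KB = 1,024 bytes.
    static double toKilobytes(long bytes) {
        return bytes / 1024.0;
    }

    public static void main(String[] args) {
        String[] names = {"Whole number", "Boolean", "String (one letter)"};
        long[] sizes = {4, 1, 3};  // one-letter string: 1 letter + 2 bytes
        for (int i = 0; i < sizes.length; i++) {
            long bytes = costInBytes(sizes[i]);
            System.out.printf("%-20s %,12d bytes %,14.1f KB%n",
                    names[i], bytes, toKilobytes(bytes));
        }
    }
}
```

Keeping the sizes as parameters makes it easy to re-run the comparison with the C++ column instead.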


Data Representation Topic 3 - 3.16

Data Representation and Your Desk-Check - 1
• When doing your desk-check, you should include a
column that gets incremented whenever you set up
a new piece of data.
– To be pseudocode, it should not make any assumptions
about the language or size of variables.
• Instead, include columns for each type of data
present in the algorithm.
– Data sizes can then be calculated easily by whoever
might want to make use of the algorithm.


Data Representation Topic 3 - 3.17

Data Representation and Your Desk-Check - 2
• When doing your desk-check, you should make a
note of the type each time you set up a new piece
of data.
– To be pseudocode, it should not make any assumptions
about the language or size of variables.
• At the final step, the total memory usage can be
worked out by summing the contents of the columns.
– Data sizes for particular languages can then be calculated
easily by whoever might want to make use of the algorithm.


Data Representation Topic 3 - 3.18

Adapted Desk Check


Code                             myAge       myNewAge    Notes
                                 (integer)   (integer)
Data myAge as whole number       0           0
Data myNewAge as whole number    0           0
Output “please enter your age”   0           0           User output
Input myAge                      21          0           User enters 21
myNewAge = myAge + 1             21          22
Output “In a year you will be”   21          22          User output
Output myNewAge                  21          22          User output


Data Representation Topic 3 - 3.19

Why do we do this?
• One common cause of problems when using
algorithms is the memory leak.
• Memory leaks are caused by memory being used
up but never freed when the program has finished
with it.
• By having columns that indicate how many
variables we have in use, we can see if we have
flaws in our logic that result in memory leaks.
• It also allows people to decide if your algorithm is
appropriate for the language they are using.


Data Representation Topic 3 - 3.20

Primitive and Complex Data - 1


• There are two kinds of data type in most modern
programming languages.
• Primitive data types are the building blocks that are
used to build all other data types.
– Whole numbers, real numbers, Booleans, characters
• Complex data types are those made up of
combinations of primitive data types.
– Strings


Data Representation Topic 3 - 3.21

Primitive and Complex Data - 2


• These usually get treated differently in terms of
how they are allocated memory space.
– Primitive data types are also known as value data
types.
– Complex data types are also known as reference
data types.
• This is going to become important in the coming
weeks.
– For now, it just has an impact on what happens
when we create a new variable.
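A short Java sketch of the difference: copying a primitive copies the value itself, while copying a reference-type variable copies only the reference, so both names point at the same object. `StringBuilder` is used instead of `String` because Java strings cannot be changed after creation, which would hide the sharing; the class name `ValueVersusReference` is hypothetical.

```java
// Contrasts copying a value (primitive) with copying a reference.
class ValueVersusReference {
    // Copying an int copies the value itself.
    static int copyPrimitive() {
        int original = 5;
        int copy = original;  // copy holds its own 5
        copy = 6;             // changing copy leaves original alone
        return original;      // still 5
    }

    // Copying a StringBuilder variable copies the reference, not the object.
    static String copyReference() {
        StringBuilder original = new StringBuilder("hi");
        StringBuilder copy = original;  // both names refer to the same object
        copy.append("!");               // a change made through copy...
        return original.toString();     // ...is visible through original: "hi!"
    }

    public static void main(String[] args) {
        System.out.println(copyPrimitive());  // 5
        System.out.println(copyReference());  // hi!
    }
}
```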


Data Representation Topic 3 - 3.22

Default Variable Values


• When we create a number in our pseudocode, the
first thing we do in our desk-check is set its value to
0.
– This is its default value.
– It is a convention of our pseudocode; it is not
necessarily honoured by all programming languages.
• For complex data types, we do not do that.
– Complex data types have no default value; they
start off as null values.
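Java follows both conventions for fields, though not for local variables. This minimal sketch (the class name `DefaultValues` is hypothetical) shows that an `int` field starts at 0 and a `String` field starts as `null`. A local variable gets no default at all: the Java compiler refuses to compile code that reads one before it has been assigned.

```java
// Shows the default values Java gives to fields.
class DefaultValues {
    int num1;         // primitive field: defaults to 0
    String usertext;  // reference field: defaults to null

    public static void main(String[] args) {
        DefaultValues values = new DefaultValues();
        System.out.println(values.num1);              // 0
        System.out.println(values.usertext == null);  // true
    }
}
```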


Data Representation Topic 3 - 3.23

Pseudocode Example
1 data num1 as whole number
2 data num2 as whole number
3 data sum as whole number
4 data usertext as string

5 num1 = 10
6 num2 = 20

7 sum = num1 + num2

8 usertext = "The answer is"

9 output usertext
10 output sum
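For comparison, here is one possible Java translation of the pseudocode, with each comment giving the pseudocode line it corresponds to. Returning the two output lines from `run` (a hypothetical name) rather than printing them directly is a choice made only so the result is easy to check.

```java
// One possible Java translation of the pseudocode example.
class SumExample {
    // Runs the algorithm and returns what it outputs, one entry per output line.
    static String[] run() {
        int num1;                    // 1 data num1 as whole number
        int num2;                    // 2 data num2 as whole number
        int sum;                     // 3 data sum as whole number
        String usertext;             // 4 data usertext as string

        num1 = 10;                   // 5
        num2 = 20;                   // 6
        sum = num1 + num2;           // 7
        usertext = "The answer is";  // 8
        return new String[] {usertext, String.valueOf(sum)};  // 9 and 10
    }

    public static void main(String[] args) {
        for (String line : run()) {
            System.out.println(line);  // output usertext, then output sum
        }
    }
}
```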


Data Representation Topic 3 - 3.24

Desk-Check with Null Values


Line   num1        num2        sum         usertext
       (integer)   (integer)   (integer)   (string)
1      0
2      0           0
3      0           0           0
4      0           0           0           null
5      10          0           0           null
6      10          20          0           null
7      10          20          30          null
8      10          20          30          “The answer is”
9      10          20          30          “The answer is”
10     10          20          30          “The answer is”


Data Representation Topic 3 - 3.25

Null Values
• If you attempt to perform any kind of operation on a
null value, the program will usually fail with an error.
– You cannot use them in calculations.
– You cannot output them.
• The only thing you can do with a null variable is put
data into it.
– Either via user input
• That can be complex, and we will talk about that in a
later lecture.
– Or by explicitly setting the value with =
• usertext = “The answer is”
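In Java, calling a method on a `null` reference throws a `NullPointerException`, which stops the program unless it is handled. The sketch below (hypothetical names `NullDemo` and `safeLength`) catches the exception only so the demonstration keeps running; real code would normally test for `null` first. Java is more forgiving about output than our pseudocode, though: printing a `null` String prints the text `null`.

```java
// Demonstrates what happens when a null reference is used in Java.
class NullDemo {
    // Returns the length of text, or -1 if calling length() failed.
    static int safeLength(String text) {
        try {
            return text.length();  // throws NullPointerException if text is null
        } catch (NullPointerException e) {
            return -1;             // the operation failed
        }
    }

    public static void main(String[] args) {
        String usertext = null;                    // declared, but holds no data
        System.out.println(safeLength(usertext));  // -1
        usertext = "The answer is";                // explicitly set with =
        System.out.println(safeLength(usertext));  // 13
    }
}
```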


Data Representation Topic 3 - 3.26

Memory Usage - 1
• Because our desk-check includes the contents of
all our data as well as the types, we can work out
our memory usage at any particular step.
• At step seven, we have three whole numbers, so
our memory usage in Java is 12 bytes.
• At step eight, we have those three whole numbers
plus the string (thirteen characters including
spaces, so 15 bytes): 27 bytes in total for Java.
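The same tally can be written as a small function. This sketch assumes the simplified Java sizes from the sizes table (4 bytes per whole number, number of letters + 2 bytes per string) rather than measuring a real JVM, and treats a null string as using no memory for its contents; `DeskCheckMemory` and `bytesUsed` are hypothetical names.

```java
// Totals the memory in use at one desk-check step, using the document's
// simplified Java sizes (not measurements of a real JVM).
class DeskCheckMemory {
    static final int WHOLE_NUMBER_BYTES = 4;     // from the sizes table
    static final int STRING_OVERHEAD_BYTES = 2;  // "number of letters + 2 bytes"

    // wholeNumbers: how many whole-number variables exist at this step.
    // strings: the current contents of each string variable (null if unset).
    static int bytesUsed(int wholeNumbers, String... strings) {
        int total = wholeNumbers * WHOLE_NUMBER_BYTES;
        for (String s : strings) {
            if (s != null) {  // a null string has no contents to store yet
                total += s.length() + STRING_OVERHEAD_BYTES;
            }
        }
        return total;
    }

    public static void main(String[] args) {
        System.out.println(bytesUsed(3, (String) null));    // step 7: 12
        System.out.println(bytesUsed(3, "The answer is"));  // step 8: 27
    }
}
```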


Data Representation Topic 3 - 3.27

Memory Usage - 2
• The memory cost of complex data types is
dependent on what other data types they contain.
– Thus, while they exist as variables when we use the
data keyword to create them, they only take up
memory when we put data into them.
– The memory is allocated, but not used, at that
point.
• For now, the difference between primitive and
complex data is related to what happens when they
are created.
– It will become more important later on.


Data Representation Topic 3 - 3.28

Choosing the Right Data Type - 1


• Choosing the right data type is important, because
it makes everything else easier.
• You need to consider:
– What kind of information you need to store
– What kind of manipulations you are going to do to
the data.
– What kind of format will be used for output.
– How often you might need to change the
representation.


Data Representation Topic 3 - 3.29

Choosing the Right Data Type - 2


• What is the best data type for...
– A phone number?
– An address?
– The gender of a student?
– The age of a person?


Data Representation Topic 3 - 3.30

Choosing the Right Data Type - 3


• It is often dependent on context.
– A phone number is usually best stored as a string.
– An address is also best stored as a string.
– A Boolean or a character might best represent
gender.
– The age can be a whole or real number.


Data Representation Topic 3 - 3.31

Phone Number - 1
• Why is a phone number best stored as a string?
– It says number right there in the name!
• It is to do with how the data gets manipulated and
output.
– You hardly ever do arithmetic on a phone number.
– You often need to structure a phone number in
chunks, such as 123-456-7890.
– Phone numbers often have a leading 0.
• 0123-456-7890
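A short Java sketch of two of these points: converting the digits to a number silently drops the leading 0, while keeping them as a string makes chunking easy. The 11-digit layout and the helper names `viaNumber` and `formatChunks` are assumptions for illustration.

```java
// Contrasts storing a phone number as a number with storing it as a string.
class PhoneNumberDemo {
    // Round-trips the digits through a long: the leading 0 is lost.
    static String viaNumber(String digits) {
        long asNumber = Long.parseLong(digits);
        return String.valueOf(asNumber);
    }

    // Splits an 11-digit string into 4-3-4 chunks separated by hyphens.
    static String formatChunks(String digits) {
        return digits.substring(0, 4) + "-"
                + digits.substring(4, 7) + "-"
                + digits.substring(7);
    }

    public static void main(String[] args) {
        String phone = "01234567890";
        System.out.println(viaNumber(phone));     // 1234567890
        System.out.println(formatChunks(phone));  // 0123-456-7890
    }
}
```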


Data Representation Topic 3 - 3.32

Phone Number - 2
• Strings are best for phone numbers, because on
the whole, we do not treat phone numbers as
numbers.
– They are unique codes that just happen to be
made up entirely of digits.
• However, there is always a trade-off.
– Easier to structure as a string.
– More difficult to manipulate arithmetically.
• We choose the path of least resistance.


Data Representation Topic 3 - 3.33

Conclusion
• In this lecture, we discussed the following key
subjects:
– Data representation
– Memory use of data types
– The scaling of data within algorithms
– Complex and primitive data types
– The methods of choosing the right data type


Data Representation Topic 3 - 3.34

Terminology - 1
• The following new pieces of terminology were
introduced in this lecture:
– Scaling
• Making an algorithm work for large amounts of
data and operations.
– Variable
• The formal name for a piece of data stored within
an algorithm


Data Representation Topic 3 - 3.35

Terminology - 2
• The following new pieces of terminology were
introduced in this lecture:
– Primitive data type
• The simplest kinds of variables, such as whole
and real numbers
– Complex data type
• Variables which are made up of other variables.


Data Representation Topic 3 - 3.36

Topic 3 – Data Representation

Any Questions?
