Bits, Bytes and Data Types
In this lecture
Computer Languages
Assembly Language
The compiler
Operating system
Data and program instructions
Bits, Bytes and Data Types
ASCII table
Data Types
Bit Representation of integers
Base conversions
1’s complement, 2’s complement and negative numbers
Computer Languages
A computer language is a language that is used to communicate with a machine. Like all languages,
computer languages have syntax (form) and semantics (meaning). High level languages such as
Java are designed to make the process of programming easier, but the programmer typically has little
control over how efficiently the code will run on the hardware. On the other hand, assembly
language programs are harder to write but are designed so that the programmer can optimize the
performance of the code. Then there is machine language, the language the machine really
understands. All computer languages are ultimately designed to communicate with hardware, but
programs written in high level languages may go through many steps of translation before being
executed.
Programs written in C are first converted to an assembly program (designed for the underlying
hardware), which in turn is converted to machine language, the language understood by
the hardware. There may be many steps in between. Machine language “defines” the machine and
vice versa. Machine language instructions are simple: they typically do things such as adding two
numbers, moving data, or jumping from one instruction to another. However, it is of course very
difficult to write and debug programs in machine language.
Assembly Language
Programs written in a high level language such as C go through a process of translation that
eventually leads to a set of instructions that can be executed by the underlying hardware. One layer
of this program translation is the assembly language. A high level language is translated into
assembly language, and each CPU/processor has its own assembly language. Assembly code is then
translated into the target machine code. Assembly languages are human readable and contain very
simple instructions: for example, add two numbers, move data from one place to another, or jump
from one place to another.
Eventually this assembly code is mapped into the corresponding machine language so that the
underlying hardware can carry out the instructions.
The Compiler
A compiler (such as gcc – originally the GNU C compiler, now the GNU Compiler Collection) translates a program
written in a high level language to object code that can be interpreted and executed by the
underlying system. Compilers go through multiple stages of processing, such as syntax checking,
preprocessing of macros and libraries, object code generation, linking, and optimization, among many
other things. A course in compiler design will expose you to many of the tasks a compiler typically
does. Writing a compiler is a substantial undertaking, one that requires a lot of attention to
detail and an understanding of many theoretical concepts in computer science.
Operating System
Each machine needs an Operating System (OS). An operating system is a software program that
manages coordination between application programs and the underlying hardware. The OS manages
devices such as printers, disks, and monitors, and manages multiple tasks such as processes. UNIX is an
operating system. The following figure demonstrates a high level view of the layers of a
computer system. It shows that end users interface with the computer at the
application level, while programmers deal with the utilities and operating system levels. On the other
hand, an OS designer must understand how to interface with the underlying hardware
architecture.
[Figure: layers of a computer system, from top to bottom: End user / Programmer → Utilities → Operating System → Computer Hardware]
Bits, Bytes and Data Types
A bit is the smallest unit of memory, and is basically a switch. It can be in one of two states, "0" or
"1". These states are sometimes referenced as "off and on", or "no and yes"; but these are simply
alternate designations for the same concept. Given that each bit is capable of holding two possible
values, the number of possible different combinations of values that can be stored in n bits is 2^n. For
example, 2 bits can store 2^2 = 4 different combinations: 00, 01, 10, and 11.
Each data byte can be represented using an ASCII (or extended ASCII) value. An ASCII table is
given below. The standard ASCII table assigns each character a numerical value; for example, ‘A’ =
65 and ‘a’ = 97. Printable standard ASCII values are between 32 and
126. The 8th bit in the byte may be used for parity checking in communication or other device-
specific functions.
Standard Data Types - Many standard kinds of data occupy either 1, 2, 4, or 8 bytes, which happen
to be the data sizes that today’s typical processor chips are designed to manipulate most efficiently.
1 byte = 8 bits:
o A single character of text (for most character sets). Thus, an MS Access field with
datatype Text and field width n consumes n bytes. Example: Text(40) consumes 40 bytes.
o A whole number from –128 to +127 (a signed byte).
o A whole number from 0 to 255. This is what you get in the MS Access Number/Byte
datatype.
o MS Access Yes/No fields also consume 1 byte. In principle, you only need a
single bit, but one byte is the minimum size for a field.
2 bytes = 16 bits:
o A whole number between about –32,000 and +32,000; this is MS Access’
Number/Integer datatype, often also called a “short” integer.
o A single character from a large Asian character set.
4 bytes = 32 bits:
o A whole number between roughly –2 billion and +2 billion. This is MS
Access’ Number/Long Integer datatype.
o A “single precision” floating-point number. “Floating point” is basically scientific
notation, although the computer’s internal representation uses powers of 2 instead of powers of
10. This is MS Access’ Number/Single datatype, with the equivalent of about 6 decimal digits of accuracy.
8 bytes = 64 bits:
o A “double precision” floating-point number with the equivalent of about
15 digits of accuracy. This is MS Access’ Number/Double datatype, and is the most common way of
storing numbers that can contain fractions.
o Really massive whole numbers (in the range of + or – 9 quintillion). This is
essentially the way MS Access stores the following datatypes:
Date/Time
Currency
Because computers tend to work in powers of 2, computer engineers have taken liberty
with the above by substituting the multiplier 1024 (= 2^10) for 1000. As a result, for many
applications:
1 kilobit (kb) or kilobyte (kB) = 1024 bits or 1024 bytes, respectively
1 megabit (Mb) or megabyte (MB) = 1024 kilobits or 1024 kilobytes, respectively
1 gigabit (Gb) or gigabyte (GB) = 1024 megabits or 1024 megabytes, respectively
1 terabit (Tb) or terabyte (TB) = 1024 gigabits or 1024 gigabytes, respectively
We’ll call these two different systems “decimal-style” and “binary-style”, respectively.
Which one gets used depends on the convention for marketing or measuring a particular
component.
When you buy a 128 MB RAM chip for a computer, you actually get 128 binary megabytes,
or about 134.22 million bytes (128 MB x 1024 KB/MB x 1024 B/KB). Your computer’s BIOS will report the
RAM as 128 MB (134.22 / (1.024 x 1.024)). When you buy a 15 GB hard drive, however, you might
well get 15 decimal gigabytes, so when the drive is formatted, your computer’s operating system
might state its size as 13.97 binary GB (15 / (1.024 x 1.024 x 1.024)). You haven’t lost 1 GB; the size
was measured using two different systems.
Each ASCII value can be represented using 7 bits. 7 bits can represent the numbers from 0 =
0000 0000 to 127 = 0111 1111 (a total of 128 = 2^7 numbers).
Data Types
C has all the standard data types found in any high level language: int, short, long, char,
float, and double. C has no Boolean or string type. In place of a Boolean, 0 can
be used for false and any nonzero value for true. A C string is simply a sequence of
characters ending with the null character ‘\0’. We will discuss strings in more detail later. You
can read more about data types in K&R, page 36.
An integer is typically represented by 4 bytes (or 32 bits). However, this depends on the
compiler/machine you are using: some architectures may use 2 bytes while
others may use 8 bytes to represent an integer. Generally it is 4 bytes of memory. You
can use sizeof(int) to find out the number of bytes assigned to the int data type.
For example:
printf("The size of int is %zu\n", sizeof(int));
prints the size of an integer on the system you are working on. (sizeof yields a value of type
size_t, which is why the %zu format specifier is used.)
The highest-order bit is the sign bit. Unsigned numbers use the highest-order bit to store part of the value
as well, and hence double the range of values that can be represented. To understand this concept, consider a
number represented using 8 bits:
1111 1111 – What is the highest value that can be represented if all 8 bits are used for the number?
(As an unsigned number this pattern is 255; interpreted as a signed number it is –1.)
Base Conversions
Understanding different bases is critical to understanding how data is represented in the
memory. We consider base-2 (binary), base-8 (octal), base-10(decimal) and base-
16(hexadecimal). A number can be represented in any of the bases stated above. The following
digits are used in each base.
Base-2 - 0, 1
Base-8 - 0, 1, 2, 3,…, 7
Base-10 – 0, 1, 2, …, 9
Base-16 - 0,1, 2,…., 9, A, B, C, D, E, F
A number that is in base-10 such as 97 can be written in base-2 (binary) as follows. To convert the
number to binary, find the sums of the powers of 2 that makes up the number, that is 97 = 64 + 32 +
1, and then represent this number using a binary pattern such as follows.
97 = 01100001
Each number can be converted from binary (base-2) to any other base such as
octal(base-8), decimal(base-10) or hex (base-16).
Examples: 0000 1010 = 10 (decimal) = 12 (octal) = A (hex)
0101 1110 = 94 (decimal) = 136 (octal) = 5E (hex)
Examples: 70 = 0100 0110
300 = 1 0010 1100
Let’s now look at the encoding method. The table below shows the bit combinations required for each
character.
A computer usually stores information in eight bits. The eighth bit is unused in ASCII and thus is
normally set to 0. Some systems may use the eighth bit to implement graphics or different language
symbols, e.g., Greek characters.
Control codes are used in communications and printers. They may be generated from an ASCII
keyboard by holding down the CTRL (control) key and pressing another key (A to Z, plus [, \, ], ^,
_ ).
Example Code the text string 'Hello.' in ASCII using hexadecimal digits.
H = 48
e = 65
l = 6C
l = 6C
o = 6F
. = 2E
The two’s complement of a number is obtained by adding 1 to its one’s complement. For example, the
two’s complement of 30 (in 16 bits, 00000000 00011110) is obtained as follows.
11111111 11100001   (one’s complement of 30)
+1
11111111 11100010
Hence -30 is represented as its two’s complement, that is ~30 + 1 = 11111111 11100010
Exercise: Perform binary addition of 34 + (-89) using two’s complement of the negative number.