Bits Bytes and Data Types

The document provides an overview of computer languages, focusing on the differences between high-level languages, assembly language, and machine language, as well as the role of compilers and operating systems. It explains data representation in memory, including bits, bytes, ASCII encoding, and various data types in C, along with methods for handling negative numbers using one’s and two’s complement. Additionally, it covers base conversions and the significance of understanding different number systems in computing.
Bits, Bytes and Data Types

In this lecture
• Computer Languages
• Assembly Language
• The compiler
• Operating system
• Data and program instructions
• Bits, Bytes and Data Types
• ASCII table
• Data Types
• Bit Representation of integers
• Base conversions
• 1’s complement, 2’s complement and negative numbers

Computer Languages
A computer language is a language that is used to communicate with a machine. Like all languages,
computer languages have syntax (form) and semantics (meaning). High level languages such as
Java are designed to make the process of programming easier, but the programmer typically has little
control over how efficiently the code will run on the hardware. On the other hand, assembly
language programs are harder to write but give the programmer more room to optimize the
performance of the code. Then there is machine language, the language the machine really
understands. In the end, all computer languages are designed to communicate with hardware, but
programs written in high level languages may go through many steps of translation before being
executed.
Programs written in C are first converted to an assembly program (designed for the underlying
hardware), which then in turn is converted to the machine language, the language understood by
the hardware. There may be many steps in between. Machine language “defines” the machine and
vice versa. Machine language instructions are very simple: typically operations such as adding
two numbers, moving data, or jumping from one instruction to
another. However, it is very difficult to write and debug programs in machine language.

Assembly Language
Programs written in a high level language such as C go through a process of translations that
eventually leads to a set of instructions that can be executed by the underlying hardware. One layer
of this program translation is the assembly language. A high level language is translated into
assembly language. Each CPU/processor has its own assembly language. Assembly code is then
translated into the target machine code. Assembly languages are human readable and contain very
simple instructions, for example instructions that add two numbers, move data from one place
in memory to another, or jump from one instruction to another.

A high level instruction written in C such as

A = A + 1

could be translated into (hypothetical) assembly as follows.

Mov R1, A // move A to register 1
Inc R1    // increment R1 by 1
Mov A, R1 // move R1 to A

Eventually this assembly code is mapped into the corresponding machine language so that the
underlying hardware can carry out the instructions.

The Compiler
A compiler (such as gcc, originally the GNU C Compiler and now the GNU Compiler Collection) translates a program
written in a high level language to object code that can be executed by the
underlying system. Compilers go through multiple levels of processing such as syntax checking,
pre-processing macros and libraries, object code generation, linking, and optimization, among many
other things. A course in compiler design will expose you to many of the tasks a compiler typically
does. Writing a compiler is a substantial undertaking and one that requires a lot of attention to
detail and understanding of many theoretical concepts in computer science.

Operating System
Each machine needs an Operating System (OS). An operating system is a software program that
manages coordination between application programs and the underlying hardware. The OS manages
devices such as printers, disks, and monitors, and manages multiple tasks such as processes. UNIX is an
operating system. The following figure demonstrates the high level view of the layers of a
computer system. It shows that end users interface with the computer at the
application level, while programmers deal with the utilities and operating system level. On the other
hand, an OS designer must understand how to interface with the underlying hardware
architecture.
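[Figure: layers of a computer system, from top to bottom: Application programs, Utilities, Operating System, Computer Hardware. The end user works at the application level, the programmer at the utilities and operating system levels, and the OS designer at the boundary with the hardware.]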

Data and Program Instructions


All data and program instructions are stored as sequences of bytes in the memory called Random
Access Memory (RAM). Typically data and instructions are stored in specific parts of the RAM as
directed by the OS/compiler. As programs are executed, each instruction is fetched from memory
and executed to produce the results. To increase the speed of execution of a program, a compiler
may use fast-access memory locations such as registers and cache memory. For example, a machine
might have 8 registers, with one called the zero register (containing the value zero for
initializations). The following figure demonstrates the architecture of a uni-processor machine
that contains a CPU, memory and IO modules.
Bits, Bytes and Data Types
A bit is the smallest unit of storage, represented by 0 or 1. A byte is typically 8 bits. The C char data
type requires one byte of storage. A file is a sequence of bytes. The size of a file is the number of
bytes within the file. Although all files are sequences of bytes, files can be regarded as text files or
binary files. Text files are human readable (they consist of a sequence of ASCII characters) and binary
files may not be human readable (e.g., an image file such as a bitmap file).

A bit is the smallest unit of memory, and is basically a switch. It can be in one of two states, "0" or
"1". These states are sometimes referenced as "off and on", or "no and yes"; but these are simply
alternate designations for the same concept. Given that each bit is capable of holding two possible
values, the number of possible different combinations of values that can be stored in n bits is 2^n. For
example:

1 bit can hold 2 = 2^1 possible values (0 or 1)
2 bits can hold 2 × 2 = 2^2 = 4 possible values (00, 01, 10, or 11)
3 bits can hold 2 × 2 × 2 = 2^3 = 8 possible values (000, 001, 010, 011, 100, 101, 110, or 111)
4 bits can hold 2 × 2 × 2 × 2 = 2^4 = 16 possible values
5 bits can hold 2 × 2 × 2 × 2 × 2 = 2^5 = 32 possible values
6 bits can hold 2 × 2 × 2 × 2 × 2 × 2 = 2^6 = 64 possible values
7 bits can hold 2 × 2 × 2 × 2 × 2 × 2 × 2 = 2^7 = 128 possible values
8 bits can hold 2 × 2 × 2 × 2 × 2 × 2 × 2 × 2 = 2^8 = 256 possible values

n bits can hold 2^n possible values

Standard units of memory

1000 bytes = 1 kilobyte (KB)
1000 KB = 1 megabyte (MB)
1000 MB = 1 gigabyte (GB)
1000 GB = 1 terabyte (TB)
1000 TB = 1 petabyte (PB)

Each data byte can be represented using an ASCII (or extended ASCII) value. An ASCII table is
given below. The standard ASCII table assigns a numerical value to each character. For example, ‘A’ =
65 and ‘a’ = 97. Printable standard ASCII values are between 32 and 126. The 8th bit in the byte
may be used for parity checking in communication or other device-specific functions.

Standard Datatypes - Many standard kinds of data occupy either 1, 2, 4, or 8 bytes, which happen
to be the data sizes that today’s typical processor chips are designed to manipulate most efficiently.
• 1 byte = 8 bits:
  o A single character of text (for most character sets). Thus, an MS Access field with datatype Text and field width n consumes n bytes. Example: Text(40) consumes 40 bytes.
  o A whole number from –128 to +127 (signed)
  o A whole number from 0 to 255 (unsigned). This is what you get in the MS Access Number/Byte datatype.
  o MS Access Yes/No fields also consume 1 byte. In principle, you only need a single bit, but one byte is the minimum size for a field.
• 2 bytes = 16 bits:
  o A whole number between about –32,000 and +32,000; this is MS Access’ Number/Integer datatype, often also called a “short” integer
  o A single character from a large Asian character set
• 4 bytes = 32 bits:
  o A whole number between roughly –2 billion and +2 billion. This is MS Access’ Number/Long Integer datatype.
  o A “single precision” floating-point number. “Floating point” is basically scientific notation, although the computer’s internal representation uses powers of 2 instead of powers of 10. This is MS Access’ Number/Single datatype, with the equivalent of about 6 decimal digits of accuracy.
• 8 bytes = 64 bits:
  o A “double precision” floating-point number with the equivalent of about 15 digits of accuracy. This is MS Access’ Number/Double datatype, and is the most common way of storing numbers that can contain fractions.
  o Really massive whole numbers (in the range of + or – 9 quintillion). This is essentially the way MS Access stores the following datatypes:
    ▪ Date/Time
    ▪ Currency
• Because computers tend to work in powers of 2, computer engineers have taken liberty
with the standard units of memory by substituting the multiplier 1024 (= 2^10) for 1000. As a result, for many
applications:

  o 1 kilobit (kb) or kilobyte (kB) = 1024 bits or 1024 bytes, respectively
  o 1 megabit (Mb) or megabyte (MB) = 1024 kilobits or 1024 kilobytes, respectively
  o 1 gigabit (Gb) or gigabyte (GB) = 1024 megabits or 1024 megabytes, respectively
  o 1 terabit (Tb) or terabyte (TB) = 1024 gigabits or 1024 gigabytes, respectively

• We’ll call these two different systems “decimal-style” and “binary-style”, respectively.
Which one gets used depends on the convention for marketing or measuring a particular
component.

• When you buy a 128 MB RAM chip for a computer, you actually get 128 binary megabytes,
or about 134.22 million bytes (128 MB x 1024 KB/MB x 1024 B/KB). Your computer BIOS will read the
RAM as 128 MB (134.22 / (1.024 x 1.024)). When you buy a 15 GB hard drive, however, you might
well get 15 decimal gigabytes, so when the drive is formatted, your computer's operating system
might state its size as 13.97 binary GB (15 / (1.024 x 1.024 x 1.024)). You haven't lost 1 GB; the size
was measured using two different systems.
Each ASCII value can be represented using 7 bits. 7 bits can represent numbers from 0 =
0000 0000 to 127 = 0111 1111 (total of 128 numbers or 2^7)

Data Types
C has the standard data types found in most high level languages: int, short, long, char,
float, and double. Classic C, as described in K&R, has no Boolean data type and no string type
(C99 later added _Bool through <stdbool.h>). Instead, 0 is used for false and any nonzero value
for true. A C string is considered a sequence of
characters ending with the null character ‘\0’. We will discuss more about strings later. You
can read more about data types in K&R page 36.

An integer is typically represented by 4 bytes (or 32-bits). However this depends on the
compiler/machine you are using. It is possible some architectures may use 2 bytes while
others may use 8 bytes to represent an integer. But generally it is 4 bytes of memory. You
can use sizeof(int) to find out the number of bytes assigned for int data type.

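For example:
printf("The size of int is %zu\n", sizeof(int));
prints the size of an integer, in bytes, on the system you are working on. The sizeof operator
yields a value of type size_t, which is why the %zu conversion is used rather than %d.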

Bit Representation of Integers


If you take a low level look at an integer, this is how integer with value 10 is represented using 4
bytes (32-bits) in the memory:
00000000 00000000 00000000 00001010

The highest-order bit is the sign bit. Unsigned numbers use the highest-order bit as well to store the value
of the number, doubling the range of non-negative values. To understand this concept, assume a
signed number represented using 8 bits.

0111 1111 - What is the value of this?

1111 1111 – What is the highest value that can be represented if all 8 bits are used for the number?

Base Conversions
Understanding different bases is critical to understanding how data is represented in
memory. We consider base-2 (binary), base-8 (octal), base-10 (decimal) and base-16 (hexadecimal).
A number can be represented in any of the bases stated above. The following
digits are used in each base.

Base-2 - 0, 1
Base-8 - 0, 1, 2, 3,…, 7
Base-10 – 0, 1, 2, …, 9
Base-16 - 0,1, 2,…., 9, A, B, C, D, E, F

A number that is in base-10 such as 97 can be written in base-2 (binary) as follows. To convert the
number to binary, find the sum of the powers of 2 that make up the number, that is 97 = 64 + 32 +
1, and then represent this number using a binary pattern as follows.
97 = 01100001
Each number can be converted from binary (base-2) to any other base such as
octal(base-8), decimal(base-10) or hex (base-16).
Examples: 0000 1010 =
0101 1110 =

Examples: 70 =
300 =

Exercise: Write 456 in base-2, base-8, and base-16

The American Standard Code for Information Interchange


ASCII is a computer code which uses 128 different encoding combinations of a group of seven bits
(2^7 = 128) to represent

• characters A to Z, both upper and lower case
• special characters, < . ? : etc
• numbers 0 to 9
• special control codes used for device control

Let's now look at the encoding method. The table below shows the bit combinations required for each
character.
A computer usually stores information in eight bits. The eighth bit is unused in ASCII, and thus is
normally set to 0. Some systems may use the eighth bit to implement graphics or different language
symbols, e.g., Greek characters.

Control codes are used in communications and printers. They may be generated from an ASCII
keyboard by holding down the CTRL (control) key and pressing another key (A to Z, plus {, \, ], ^,
<- ).

Example: Code the text string 'Hello.' in ASCII using hexadecimal digits.

H = 48
e = 65
l = 6C
l = 6C
o = 6F
. = 2E

thus the string is represented by the byte sequence


48 65 6C 6C 6F 2E

Negative Numbers, One’s Complement and Two’s Complement


Signed data is generally represented in the computer in two’s complement form. The two’s
complement of a number is obtained by adding 1 to its one’s complement. So how do we find the one’s
complement of a number? Here is the definition:
The one’s complement of x is given by ~x. Obtain the one’s complement of a number by inverting each of
its binary bits. For example, the one’s complement of 30 (as a 16-bit short) is found as follows.
30 = 16 + 8 + 4 + 2 = 00000000 00011110 ➔ binary 30

~30 = 11111111 11100001 ➔ its one’s complement

The two’s complement of the number is obtained by adding 1 to its one’s complement.
That is, the two’s complement of 30 is obtained as follows.
  11111111 11100001
+                 1
-------------------
  11111111 11100010
Hence -30 is represented as its two’s complement, that is ~30 + 1 = 11111111 11100010

Exercise: Perform binary addition of 34 + (-89) using two’s complement of the negative number.
