Lesson4-Language Fundamentals
(Character set, keywords, identifiers, constants, variables)
Chapter Outline
Introduction
Character Set
Tokens
Keywords
Identifiers
Literals
Data types
Variables
Type qualifiers
Conclusion
“First, master the fundamentals.”
–Larry Bird
“I long to accomplish a great and noble task, but it is my chief duty to accomplish small tasks as
if they were great and noble.”
–Helen Keller
Character Set: A character is any letter, digit, white space or special symbol that is used to
represent information. A character set is a collection of characters.
Token: A token is the smallest individual unit of a program.
Instruction: An instruction is a statement given to the computer to perform a specific
operation.
Function: A function is a collection of instructions that performs a particular task.
Program: A program is a well-organized collection of instructions that is used to communicate with
the computer system to accomplish a desired objective.
Alphabets: abcdefghijklmnopqrstuvwxyz
ABCDEFGHIJKLMNOPQRSTUVWXYZ
Digits: 0123456789
Special Symbols:
Symbol  Meaning                  Symbol  Meaning                  Symbol  Meaning
{       Opening curly brace      '       Apostrophe               ^       Caret or exclusive OR
}       Closing curly brace      "       Double quotation mark    &       Ampersand
(       Opening parenthesis      ~       Negation or tilde        *       Asterisk
)       Closing parenthesis      !       Exclamation              +       Plus
[       Opening square bracket   #       Pound, number or hash    -       Minus or hyphen
]       Closing square bracket   %       Mod (percent)            /       Forward slash
.       Dot or period            ;       Semi-colon               \       Backward slash
?       Question mark            :       Colon                    >       Greater than
|       Pipe                     ,       Comma                    <       Less than
_       Underscore               =       Assigns to
White space characters: blank space, horizontal tab, new line,
carriage return, vertical tab, form feed
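The character classes above map closely onto the tests in the standard header <ctype.h>. The
following is a minimal sketch, not part of the original lesson: the type char_counts and the
functions classify_characters and classify_example are hypothetical names chosen for illustration.
It counts how many letters, digits, white space characters and special symbols a piece of text
contains.

```c
#include <assert.h>
#include <ctype.h>

/* Counts for each class of character (hypothetical type for this sketch). */
struct char_counts {
    int letters;
    int digits;
    int spaces;
    int symbols;
};

/* Places every character of text into one of the four classes. */
struct char_counts classify_characters(const char *text)
{
    struct char_counts n = {0, 0, 0, 0};

    for (; *text != '\0'; text++) {
        unsigned char ch = (unsigned char)*text;

        if (isalpha(ch))
            n.letters++;
        else if (isdigit(ch))
            n.digits++;
        else if (isspace(ch))   /* blank, \t, \n, \r, \v, \f */
            n.spaces++;
        else
            n.symbols++;        /* everything else: special symbols */
    }
    return n;
}

/* Example: the statement "a = b1 + 2;" followed by a new line. */
void classify_example(void)
{
    struct char_counts n = classify_characters("a = b1 + 2;\n");

    assert(n.letters == 2);   /* a b */
    assert(n.digits == 2);    /* 1 2 */
    assert(n.spaces == 5);    /* four blanks and one new line */
    assert(n.symbols == 3);   /* = + ; */
}
```

Calling classify_example() from main runs the checks. Note that the underscore counts as a special
symbol here, even though identifiers are allowed to contain it.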
4.3. Tokens
A token is the smallest individual unit (or element) of a program. The tokens used in a program are:
Keywords.
Identifiers.
Literals (constants).
Variables.
Operators.
4.3.1. Keywords
Each language comes with a set of words. As these words play a key role in developing a program,
they are often termed keywords.
Keywords are the built-in words whose meanings are already known to the compiler.
Keywords are pre-defined: each keyword has its own meaning, fixed by the language developers. A C
compiler recognizes a keyword and applies its meaning wherever it appears. Keywords are also called
reserved words. Each keyword has its own purpose and should be used only for that purpose. There
are 3 types of keywords:
Keywords
int        auto       if
float      static     else
char       extern     switch
double     register   default
long                  case
short                 while
signed                do
unsigned              for
void                  break
struct                continue
union                 goto
typedef               return
sizeof
enum
const
volatile
It is important to note that all keywords are written in lowercase. Some compilers may also
include some or all of the following non-standard keywords:
ada asm entry far
fortran huge near pascal
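Because keywords are lowercase and C distinguishes upper case from lower case, a name that differs
from a keyword only in case is not a keyword. The sketch below illustrates this; the function name
keyword_case_demo and the variable names are hypothetical, and such names are shown only to make
the point, not as recommended style.

```c
#include <assert.h>

/* Int, While and Return are ordinary (if confusing) identifiers,
   because only the lowercase spellings are keywords. */
int keyword_case_demo(void)
{
    int Int = 10;
    int While = 20;
    int Return = Int + While;

    assert(Return == 30);
    return Return;   /* "return" in lowercase is the keyword */
}
```

Replacing Int with int in the first declaration would make the compiler reject the program,
since a keyword cannot be used as a variable name.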
4.3.2. Identifiers
Just as pre-defined names (i.e., keywords) are needed to develop a C program, user-defined names
are also needed. These user-defined names are called identifiers.
Identifiers are the names given to various program elements such as variables, constants,
arrays, functions, pointers, etc.
These names are chosen by the user as and when needed. Meaningful identifiers make a program
easy to understand. To define an identifier one should follow these rules:
Rule #2: The first character of an identifier must be a letter or an underscore (_). The subsequent
characters may be letters, digits or underscores. Special symbols are not allowed.
Ex: pradeep K123 Ravi_varma _abc (valid)
1Raj (invalid)
Rule #3: No special symbol is used except underscore (_). No spaces are allowed in an identifier.
Ex: gross_sal Rect_area (valid)
Gross salary s.i. profit&loss (invalid)
Rule #4: Upper and lower case letters in an identifier are distinct (or different).
Ex: The names amount, Amount, aMOUnt and AMOUNT are not the same identifiers.
Rule #5: An identifier can be arbitrarily long. Some implementations of C recognize only the first eight
characters, though most compilers recognize more (typically, 31 characters).
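The spelling rules above can be expressed as a short check. This is a minimal sketch under stated
assumptions: is_valid_identifier, same_identifier and check_identifier_examples are hypothetical
helper names, and the check looks only at the characters of the name. It does not test whether the
name is a keyword.

```c
#include <assert.h>
#include <ctype.h>
#include <string.h>

/* Returns 1 if name follows the spelling rules for identifiers, else 0. */
int is_valid_identifier(const char *name)
{
    size_t i;

    if (name == NULL || name[0] == '\0')
        return 0;

    /* First character: a letter or an underscore. */
    if (!(isalpha((unsigned char)name[0]) || name[0] == '_'))
        return 0;

    /* Remaining characters: letters, digits or underscores only. */
    for (i = 1; name[i] != '\0'; i++) {
        if (!(isalnum((unsigned char)name[i]) || name[i] == '_'))
            return 0;
    }
    return 1;
}

/* Two names look the same to a compiler that keeps only the first
   `significant` characters when those characters match exactly.
   The comparison is case-sensitive, as identifiers are. */
int same_identifier(const char *a, const char *b, size_t significant)
{
    return strncmp(a, b, significant) == 0;
}

/* Examples taken from the rules, written as checks. */
void check_identifier_examples(void)
{
    assert(is_valid_identifier("Ravi_varma"));
    assert(is_valid_identifier("_abc"));
    assert(!is_valid_identifier("1Raj"));          /* starts with a digit */
    assert(!is_valid_identifier("Gross salary"));  /* contains a space */
    assert(!is_valid_identifier("profit&loss"));   /* contains & */
    assert(!same_identifier("amount", "Amount", 31));
}
```

With a limit of 8 significant characters, same_identifier("Master_minds", "Master_mi", 8) reports
the two names as the same, which is why long names that differ only near the end can clash on
such a compiler.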
QUESTIONS
1) Which of the following are valid and invalid identifiers? Give reasons if not valid.
   1) record1 2) $tax 3) name 4) name-and-address 5) 1record
   6) name and address 7) name_and_address 8) 123-45-6789
   9) return 10) file_3 11) _master 12) _123 13) Ravi&Bro.
2) Assume that your C compiler recognizes only the first 8 characters of an identifier.
   Which of the following are valid and invalid identifiers?
   1) Master_minds 2) char 3) s.i. 4) SimpleInterest 5) string 6) char1
   7) identifier_1 8) ANSWER 9) answer 10) number#1
A constant or literal is a fixed value that the user writes directly into a program. The value may
be a character, a string, an integer or a floating-point number.
There are two types of constants: numeric constants and non-numeric constants. As the names
imply, a numeric constant is a collection of digits and a non-numeric constant is a collection of
characters from the character set.
Constants
    Numeric constants
    Non-numeric constants
An integer constant may be suffixed by the letter u or U to specify that it is unsigned (it cannot
be negative). It may also be suffixed by the letter l or L to specify that it is long. In the
absence of any suffix, the data type of an integer constant is derived from its value.
Examples of integer constants:
Integer constant   Description
5000U              Unsigned decimal integer constant
123456789L         Long decimal integer constant
0235353l           Long octal integer constant
0x23FA3dU          Unsigned hexadecimal integer constant
0XFFFFFFFUL        Unsigned long hexadecimal integer constant
0243UL             Unsigned long octal integer constant
123245353UL        Unsigned long decimal integer constant
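The effect of the prefixes and suffixes can be checked in a small program. The following sketch
(the function name integer_constant_demo is hypothetical) stores several constants from the table
and confirms two points: octal and hexadecimal are only different spellings of a value, and a
suffix fixes the type of the constant.

```c
#include <assert.h>

void integer_constant_demo(void)
{
    unsigned int  a = 5000U;        /* unsigned decimal */
    long          b = 123456789L;   /* long decimal */
    long          c = 0235353l;     /* long octal */
    unsigned long d = 0XFFFFFFFUL;  /* unsigned long hexadecimal */
    unsigned long e = 0243UL;       /* unsigned long octal */

    assert(a == 5000U);
    assert(b == 123456789L);

    /* The same values written in decimal. */
    assert(c == 80619L);
    assert(d == 268435455UL);       /* 16 to the power 7, minus 1 */
    assert(e == 163UL);             /* 2*64 + 4*8 + 3 */

    /* The suffix decides the type of the constant itself. */
    assert(sizeof(5000U) == sizeof(unsigned int));
    assert(sizeof(123456789L) == sizeof(long));
    assert(sizeof(0243UL) == sizeof(unsigned long));
}
```

A leading 0 marks an octal constant and a leading 0x or 0X marks a hexadecimal one, so 0243 and
243 are different values.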
Decimal notation: In this notation, the floating-point number is represented as a whole number
followed by a decimal point and a fractional part. It is possible to omit the digits before or
after the decimal point, but not both. A floating-point constant can include one of the suffixes
f or F (float) or l or L (long double).
Note: It should be understood that integer constants are exact quantities, whereas floating-point
constants are approximations. For example, the floating-point constant 0.1 might be represented
within the computer's memory as approximately 0.1000000000000000055, even though it appears as 0.1
when it is displayed on the screen (because of automatic rounding). Therefore, floating-point
values cannot be used for certain purposes, such as counting, indexing, etc., where exact values
are required.
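The practical consequence is that floating-point values are compared with a tolerance rather than
with ==, and that counting is done with integers. The sketch below illustrates both, along with the
decimal notation and the suffixes. The names nearly_equal, sum_of_tenths and floating_constant_demo
are hypothetical, and the tolerance 1e-9 is an arbitrary choice for this example.

```c
#include <assert.h>

/* Returns 1 if x and y differ by no more than tolerance. */
int nearly_equal(double x, double y, double tolerance)
{
    double diff = x - y;

    if (diff < 0.0)
        diff = -diff;
    return diff <= tolerance;
}

/* Adds 0.1 ten times; the result need not be exactly 1.0. */
double sum_of_tenths(void)
{
    double sum = 0.0;
    int i;                       /* the counter is an exact integer */

    for (i = 0; i < 10; i++)
        sum += 0.1;
    return sum;
}

void floating_constant_demo(void)
{
    /* Compare with a tolerance instead of testing sum == 1.0. */
    assert(nearly_equal(sum_of_tenths(), 1.0, 1e-9));

    /* Digits may be omitted on one side of the decimal point. */
    assert(.5 == 0.5);
    assert(2. == 2.0);

    /* The suffix selects the type; no suffix means double. */
    assert(sizeof(1.5f) == sizeof(float));
    assert(sizeof(1.5) == sizeof(double));
    assert(sizeof(1.5L) == sizeof(long double));
}
```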
It is important to note that a character constant cannot directly contain the ' (single quote)
character or a new line. In order to represent these and certain other characters, the following
escape sequences (or backslash character constants) may be used:
Backslash character constant   Description
\n                             New line
\t                             Horizontal tab
\v                             Vertical tab
\b                             Back space
\r                             Carriage return
\f                             Form feed
\a                             Audible alert (bell)
\\                             Backslash
\?                             Question mark
\'                             Single quote
\"                             Double quote
\ooo                           Octal number (1 to 3 octal digits)
\xhh                           Hexadecimal number
The octal escape sequence \ooo consists of a backslash followed by 1, 2 or 3 octal digits, which
are taken to specify the value of the desired character. A common example of this construction is
\0 (not followed by any digit), which specifies the character NUL.
The escape sequence \xhh consists of a backslash followed by x, followed by hexadecimal digits,
which are taken to specify the value of the desired character. There is no limit on the number of
digits, but the behavior is undefined if the resulting character value exceeds that of the largest
character.
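A few checks make the escape sequences concrete. This sketch assumes an ASCII execution character
set, in which the letter A has the value 65; the function name escape_sequence_demo is hypothetical.

```c
#include <assert.h>
#include <string.h>

void escape_sequence_demo(void)
{
    /* \0 is the NUL character, whose value is zero. */
    assert('\0' == 0);

    /* Assuming ASCII: octal 101 and hexadecimal 41 both equal 65, i.e. 'A'. */
    assert('\101' == 'A');
    assert('\x41' == 'A');

    /* An escape sequence takes two or more characters to write,
       but it stands for a single character. */
    assert(strlen("a\tb\n") == 4);
    assert(strlen("\\") == 1);
    assert(strlen("\'\"") == 2);
}
```

The quote escapes are what allow a single quote inside a character constant ('\'') and a double
quote inside a string ("\"").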
QUESTIONS
1) Which of the following are valid and invalid integer constants? Give reasons if not valid.
   1) 123.34 2) 0893 3) -2345 4) 0x123 5) 3458UL 6) 2345l 7) 0124 8) 0XFAGE
2) Which of the following are valid and invalid floating-point constants? Give reasons if not
   valid.
   1) -934 2) 0345 3) -89.34 4) 9E+3 5) 67.84L 6) 89.342f 7) 0.3E-4 8) 89. 9) .89
3) Which of the following are valid and invalid character constants? Give reasons if not valid.
   1) 'a' 2) '{' 3) '0' 4) ' ' ' 5) '\m' 6) '\023' 7) '\x3456' 8) ',' 9) '134.3' 10) '435'
4) Which of the following are valid and invalid string constants? Give reasons if not valid.
   1) "Master minds" 2) "234-567-466" 3) '"King & queen" 4) "C" "is brilliant"
   5) "he told-" I miss you"" 6) "Ravi's friend"
In simple terms, a data type is a set of values together with the operations allowed on those values.
4.3.4.1. Primitive data types: There are 5 basic data types in C. The size and range of each of these
data types may vary among processor types and compilers. The following table shows the primitive data
types in C:
Data type Size (in bytes) Range
16-bit 2 2 4
32-bit 2 4 4
Sign modifiers: If the unsigned type modifier precedes a primitive data type, then variables of
that type accept only non-negative values (zero and positive). If the signed type modifier precedes
a primitive data type, then variables of that type accept both positive and negative values.
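Since sizes and ranges vary among processors and compilers, they are best read from the compiler
itself. The standard header <limits.h> provides the limits as named constants. The sketch below
prints a few of them; the function names print_integer_ranges and unsigned_reaches_higher are
hypothetical, and no particular output is shown because it depends on the machine.

```c
#include <assert.h>
#include <limits.h>
#include <stdio.h>

/* Prints the size and range of some integer types on this compiler. */
void print_integer_ranges(void)
{
    printf("char          : %u byte(s), %d to %d\n",
           (unsigned)sizeof(char), CHAR_MIN, CHAR_MAX);
    printf("short         : %u byte(s), %d to %d\n",
           (unsigned)sizeof(short), SHRT_MIN, SHRT_MAX);
    printf("int           : %u byte(s), %d to %d\n",
           (unsigned)sizeof(int), INT_MIN, INT_MAX);
    printf("unsigned int  : %u byte(s), 0 to %u\n",
           (unsigned)sizeof(unsigned int), UINT_MAX);
    printf("long          : %u byte(s), %ld to %ld\n",
           (unsigned)sizeof(long), LONG_MIN, LONG_MAX);
    printf("unsigned long : %u byte(s), 0 to %lu\n",
           (unsigned)sizeof(unsigned long), ULONG_MAX);
}

/* An unsigned type gives up negative values and, on typical machines,
   reaches about twice as high as its signed counterpart. */
int unsigned_reaches_higher(void)
{
    assert(sizeof(unsigned int) == sizeof(int));  /* same storage size */
    return UINT_MAX > (unsigned int)INT_MAX;
}
```

Whatever the machine, the language guarantees that sizeof(char) is 1 and that short is no larger
than int, which in turn is no larger than long.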
The following table specifies various data types including type modifiers: (16-bit compiler)
A variable is a named location in memory that holds a value, and that value may change during the
execution of a program.
Ex: f=1.8*c+32
In this formula, 1.8 and 32 are fixed values: they do not change from one evaluation to the next.
The values of f and c, however, may change each time. Hence, f and c are treated as variables.
Syntax: [storage class] <data type> <variable name(s)>;
In this syntax, the content in square brackets is optional and the content in angle brackets is
mandatory. The parts should be separated by spaces. The declarative instruction should always end
with a semi-colon.
The storage class specifies the default value a variable holds, its storage location, and its
scope and lifetime. The storage classes are: auto, extern, register, static.
The data type is a keyword that specifies the type of data held by the variable(s).
The variable name is any legal identifier; in other words, it must follow the rules for
identifiers. If there is more than one variable of the same type, separate them with commas.
In these two syntaxes, we observe an operator, the assignment operator (=), which assigns the
value of its right operand to its left operand. In the second syntax, the variable must already
have been declared.
Ex: int a=20; is equivalent to
int a;
a=20;
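Both forms, and the temperature formula used earlier, can be put together in a short sketch. The
function names celsius_to_fahrenheit and declaration_forms_demo are hypothetical and chosen only
for this illustration.

```c
#include <assert.h>

/* f = 1.8*c + 32: celsius and fahrenheit are variables,
   while 1.8 and 32 are constants. */
double celsius_to_fahrenheit(double celsius)
{
    double fahrenheit;                 /* declaration only */

    fahrenheit = 1.8 * celsius + 32;   /* assignment to a declared variable */
    return fahrenheit;
}

int declaration_forms_demo(void)
{
    int a = 20;   /* declaration and initialization in one instruction */
    int b;        /* declaration ... */

    b = 20;       /* ... followed by a separate assignment */

    assert(a == b);   /* both forms leave the same value in the variable */
    return a + b;
}
```

Calling celsius_to_fahrenheit with different arguments changes the values of the variables on each
call, while the constants 1.8 and 32 stay the same.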
4.4. Conclusion
Every C program is typically a collection of functions. A function is a collection of instructions
that performs a specific task. The instructions in a function are made up of words and characters.
These are collectively known as tokens. Hence, tokens are the smallest individual units of a program.