Using Pointers, Arrays, Structures and Unions in 8051 C Compilers
by Olaf Pfeiffer, based on the C51 Primer by Mike Beach, Hitex UK
Although both the Keil and Raisonance 8051 C compiler systems allow you to use pointers,
arrays, structures and unions as in any PC-based C dialect, they add several important extensions
that allow the compiler to generate more efficient code.
; addressed in DPTR
    CLR  A
    MOV  DPTR,#0040    ; Put off-chip address to be indirectly
    MOVC A,@A+DPTR     ; addressed in DPTR
In each case the data is held in a memory location indicated by the value in registers to the right
of the '@'.
Pointers In C
The C equivalent of the indirect instruction is the pointer. A pointer plays the role of the
register that holds the address in the assembler examples: it is an ordinary C variable, except
that its purpose is to hold an address rather than a variable or constant data value.
It is declared by:
    unsigned char *pointer0 ;
Note the asterisk prefix, indicating that the data held in this variable is an address rather than a
piece of data that might be used in a calculation.
In all cases in the assembler example two distinct operations are required:
1. Place address to be indirectly addressed in a register.
2. Use the appropriate indirect addressing instruction to access data held at chosen address.
Fortunately in C the same procedure is necessary, although the indirect register must be
explicitly defined, whereas in assembler the register exists in hardware.
    /* 1 - Define a variable which will hold an address */
    unsigned char *pointer ;
    /* 2 - Load pointer variable with address to be accessed indirectly */
    pointer = &c_variable ;
    /* 3 - Put data '0xff' indirectly into c_variable via pointer */
    *pointer = 0xff ;
Taking each operation in turn...
1. Reserve RAM to hold pointer. In practice the compiler attaches a symbolic name to a
RAM location, just as with a normal variable.
2. Load reserved RAM with address to be accessed, equivalent to 'MOV R0,#40'. In English
this C statement means: "take the 'address of' c_variable and put it into the reserved
RAM, i.e, the pointer" In this case the pointer's RAM corresponds to R0 and the '&'
equates loosely to the assembler '#'.
3. Move the data indirectly into the pointed-at C variable, as per the assembler 'MOV @R0,A'.
The ability to access data either directly, x = y, or indirectly, x = *y_ptr, is extremely useful. Here
is a C example:
    /* Demonstration Of Using A Pointer */
    void function(void)
    {
        unsigned char c_variable ;   // 1 - Declare a C variable
        unsigned char *ptr ;         // 2 - Declare a pointer (not yet pointing at anything)

        c_variable = 0xff ;          // 3 - Set the variable directly
        ptr = &c_variable ;          // 4 - Make ptr point at c_variable at run time
        *ptr = 0xff ;                // 5 - Write 0xff into c_variable indirectly
    }
Note: the statement 'ptr = &c_variable ;' causes the pointer to point at the variable. An
alternative way of doing this is to initialize the pointer where it is declared, thus:
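    /* Demonstration Of Using A Pointer */
    void function (void)
    {
        unsigned char c_variable ;           // 1 - Declare a C variable
        unsigned char *ptr = &c_variable ;   // 2 - Declare a pointer already pointing at c_variable

        c_variable = 0xff ;
        *ptr = 0xff ;
    }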
Pointers with their asterisk prefix can be used exactly as per normal data types. The statement:
    x = y + 3 ;
could equally well be performed with pointers, as per
    char x, y ;
    char *x_ptr = &x ;
    char *y_ptr = &y ;
    *x_ptr = *y_ptr + 3 ;
or:
    x = y * 25 ;
    *x_ptr = *y_ptr * 25 ;
The most important thing to understand about pointers is that
    *ptr = var ;
means "set the value of the pointed-at address to value var", whereas
    ptr = &var ;
means "make ptr point at var by putting the address of (&) in ptr, but do not move any data out of
var itself".
Thus the rule is to initialize a pointer:
    ptr = &var ;
To access the data indicated by *ptr:
    var = *ptr ;
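The two rules can be exercised in a few lines of standard C. This is only a minimal sketch: the function names `write_then_read` and `demo_pointer_rules` and the values used are illustrative assumptions, not part of the original text.

```c
/* Store a value through a pointer, then read it back through the same pointer. */
unsigned char write_then_read(unsigned char *ptr, unsigned char value)
{
    *ptr = value;      /* "*ptr = var": changes the pointed-at location */
    return *ptr;       /* "var = *ptr": fetches the pointed-at data */
}

/* Shows that "ptr = &var" only changes where ptr points; no data moves. */
unsigned char demo_pointer_rules(void)
{
    unsigned char var = 0x12;
    unsigned char other = 0x34;
    unsigned char *ptr;

    ptr = &var;        /* ptr holds the address of var; var is still 0x12 */
    *ptr = 0xff;       /* var becomes 0xff via the pointer */
    ptr = &other;      /* re-aim the pointer; var stays 0xff, other stays 0x34 */

    return (unsigned char)(var - *ptr);   /* 0xff - 0x34 */
}
```

Re-aiming the pointer at `other` leaves `var` untouched, which is the distinction between assigning to `ptr` and assigning to `*ptr`.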
Pointers To Absolute Addresses
In embedded C, ROM, RAM and peripherals are at fixed addresses. This immediately raises the
question of how to make pointers point at absolute addresses rather than just variables whose
address is unknown (and largely irrelevant).
The simplest method is to determine the pointed-at address at compile time:
where the initial values are put in place before the program gets to "main()". Note that the size of
this initialized array is not given in the square brackets - the compiler works out the size
automatically upon compilation.
Another common instance of an array is analogous to the BASIC string as per:
    A$ = "HELLO!"
In C this equates to:
    char test_array[] = { "HELLO!" } ;
In C there is no real distinction between strings and arrays, as a C array is just a series of
sequential bytes occupied either by a string or a series of numbers. In fact the realms of pointers
and arrays overlap with strings by virtue of:
    char test_array[] = { "HELLO!" } ;
    char *string_ptr = { "HELLO!" } ;
The first case creates a sequence of bytes containing the ASCII equivalent of "HELLO!". Likewise the
second case allocates the same sequence of bytes but in addition creates a separate pointer called
string_ptr to it. Notice that the "unsigned char" used previously has become "char", literally an
ASCII character.
The second is really equivalent to:
    char test_array[] = { "HELLO!" } ;
Then at run time:
    char *arr_ptr = test_array ;       // Array treated as pointer - or;
    char *arr_ptr = &test_array[0] ;   // Put address of first element of array into pointer
This again shows the partial interchangeability of pointers and arrays. In English, the first means
"transfer address of test_array into arr_ptr". Stating an array name in this context causes the array
to be treated as a pointer to the first location of the array. Hence there is no "address of" (&) or
'*' to be seen.
The second case reads as "get the address of the first element of the array and put it into
arr_ptr". No implied pointer conversion is employed, just the return of the address of the array
base.
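That the two forms yield the same address can be checked in standard C. The sketch below is an illustration only; the helper names `array_name_is_first_element` and `array_size` are assumptions introduced for the example.

```c
#include <stddef.h>

static char test_array[] = { "HELLO!" };

/* Returns 1 if the bare array name and &test_array[0] give the same address. */
int array_name_is_first_element(void)
{
    char *arr_ptr  = test_array;       /* array name treated as a pointer */
    char *arr_ptr2 = &test_array[0];   /* explicit address of element 0 */
    return arr_ptr == arr_ptr2;
}

/* Bytes reserved for the array: six characters plus the terminating '\0'. */
size_t array_size(void)
{
    return sizeof test_array;
}
```

The size of seven bytes reported by `array_size` is the reason a destination array for this string needs seven elements rather than six.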
The new pointer arr_ptr now exactly corresponds to string_ptr, except that the physical
"HELLO!" they point at is at a different address.
Using Arrays
Arrays are typically used like this:
    /* Copy The String HELLO! Into An Empty Array */
    unsigned char source_array[] = { "HELLO!" } ;
    unsigned char dest_array[7] ;
    unsigned char array_index ;

    array_index = 0 ;                  // First character index
    while(array_index < 7)             // Check for end of array
    {
        dest_array[array_index] = source_array[array_index] ;
        array_index++ ;                // Move on to the next character
    }
The variable array_index shows the offset of the character to be fetched (and then stored) from
the starts of the arrays.
As has been indicated, pointers and arrays are closely related. Indeed the above program could
be re-written as:
    /* Copy The String HELLO! Into An Empty Array */
    char *string_ptr = { "HELLO!" } ;
    unsigned char dest_array[7] ;
    unsigned char array_index ;

    array_index = 0 ;                  // First character index
    while(array_index < 7)
    {
        dest_array[array_index] = string_ptr[array_index] ;
        array_index++ ;
    }
The point to note is that only the definition of string_ptr (previously source_array) has changed.
By removing the '*' on string_ptr and appending a '[ ]' pair, this pointer can be turned back into
an array!
However in this case there is an alternative way of scanning along the HELLO! string, using the
*ptr++ convention:
    /* Copy The String HELLO! Into An Empty Array */
    char *string_ptr = { "HELLO!" } ;
    unsigned char dest_array[7] ;
    unsigned char array_index ;

    array_index = 0 ;                  // First character index
    while(array_index < 7)
    {
        dest_array[array_index] = *string_ptr++ ;
        array_index++ ;
    }
This is an example of C being somewhat inconsistent; this *ptr++ statement does not mean
"increment the thing being pointed at" but rather, increment the pointer itself, so causing it to
point at the next sequential address. Thus in the example the character is obtained and then the
pointer moved along to point at the next higher address in memory.
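The difference between advancing the pointer and incrementing the pointed-at value can be sketched in standard C as follows. The function names `copy_string` and `bump_first_char` are hypothetical, chosen only for this illustration.

```c
/* Copy a NUL-terminated string using the *ptr++ idiom on both pointers.
   Returns the number of characters copied, not counting the terminator.
   The caller must supply a destination large enough for the string. */
unsigned char copy_string(char *dest, const char *src)
{
    unsigned char count = 0;

    /* Fetch via src, store via dest, then advance both pointers by one. */
    while ((*dest++ = *src++) != '\0')
    {
        count++;
    }
    return count;
}

/* With parentheses, (*p)++ increments the pointed-at value, not the pointer. */
char bump_first_char(char *p)
{
    (*p)++;
    return *p;
}
```

Because the loop stops on the terminating '\0', no separate index variable or fixed length of 7 is needed.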
Structures
Structures are perhaps what makes C such a powerful language for creating very complex
programs with huge amounts of data. They are basically a way of grouping together related data
items under a single symbolic name.
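A template for the data describing a single sensor might be declared as:
    struct SENSOR_DESC
    {
        unsigned char gain ;
        unsigned char offset ;
        unsigned char temp_coeff ;
        unsigned char span ;
        unsigned char amp_gain ;
    } ;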
This does not physically do anything to memory. At this stage it merely creates a template which
can now be used to put real data into memory.
This is achieved by:
    struct SENSOR_DESC sensor_database ;
This reads as "use the template SENSOR_DESC to lay out an area of memory named
sensor_database, reflecting the mix of data types stated in the template". Thus a group of 5
unsigned chars will be created in the form of a structure.
The individual elements of the structure can now be accessed as:
    sensor_database.gain = 0x30 ;
    sensor_database.offset = 0x50 ;
    sensor_database.temp_coeff = 0x60 ;
    sensor_database.span = 0xC4 ;
    sensor_database.amp_gain = 0x21 ;
Arrays Of Structures
In the example though, information on many sensors is required and, as with individual chars
and ints, it is possible to declare an array of structures. This allows many similar groups of data
to have different sets of values.
    struct SENSOR_DESC sensor_database[4] ;
This creates four identical structures in memory, each with an internal layout determined by the
structure template. Accessing this array is performed simply by appending an array index to the
structure name:
    /* Operate On Elements In First Structure Describing Sensor 0 */
    sensor_database[0].gain = 0x30 ;
    sensor_database[0].offset = 0x50 ;
    sensor_database[0].temp_coeff = 0x60 ;
    sensor_database[0].span = 0xC4 ;
    sensor_database[0].amp_gain = 0x21 ;
    /* Operate On Elements In Second Structure Describing Sensor 1 */
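An array of structures can also be given its initial values where it is declared:
    struct SENSOR_DESC sensor_database[4] =
    {
        {0x20,0x40,0x50,0xA4,0x21},
        {0x33,0x52,0x65,0xB4,0x2F},
        {0x30,0x50,0x48,0xC4,0x3A},
        {0x32,0x56,0x56,0xC5,0x28}
    } ;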
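Once the sensor data is held as an array of structures, a loop can treat every sensor alike. The following is a sketch in standard C under stated assumptions: the five unsigned char members follow the member names used in the text, and `highest_gain_sensor` and `NUM_SENSORS` are hypothetical names added for the illustration.

```c
/* Template: five unsigned char members, named as in the text. */
struct SENSOR_DESC
{
    unsigned char gain;
    unsigned char offset;
    unsigned char temp_coeff;
    unsigned char span;
    unsigned char amp_gain;
};

#define NUM_SENSORS 4

/* One initializer row per sensor, one value per member, in template order. */
static struct SENSOR_DESC sensor_database[NUM_SENSORS] =
{
    {0x20, 0x40, 0x50, 0xA4, 0x21},
    {0x33, 0x52, 0x65, 0xB4, 0x2F},
    {0x30, 0x50, 0x48, 0xC4, 0x3A},
    {0x32, 0x56, 0x56, 0xC5, 0x28}
};

/* Hypothetical helper: index of the sensor with the largest gain value. */
unsigned char highest_gain_sensor(void)
{
    unsigned char i;
    unsigned char best = 0;

    for (i = 1; i < NUM_SENSORS; i++)
    {
        if (sensor_database[i].gain > sensor_database[best].gain)
        {
            best = i;
        }
    }
    return best;
}
```

The array index selects the sensor and the member name selects the field, so the same loop body works however many sensors are declared.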
Placing Structures At Absolute Addresses
It is sometimes necessary to place a structure at an absolute address. Typical examples are CAN
interfaces or other peripheral chips that offer arrays of data groups.
For example, the registers of a memory-mapped real time clock chip are to be grouped together
as a structure. The template in this instance might be:
    // Contents Of RTCBYTES.C Module
    struct RTC
    {
        unsigned char seconds ;
        unsigned char minutes ;
        /* ... further RTC registers ... */
    } ;
    struct RTC xdata RTC_chip ;   // Create xdata structure
A trick using the linker is required here so the structure creation must be placed in a dedicated
module. This module's XDATA segment, containing the RTC structure, is then fixed at the
required address at link time.
Using the absolute structure could be:
    /* MAIN.C Module */
    /* Structure located at base of RTC Chip */
    extern struct RTC xdata RTC_chip ;
    /* Other XDATA Objects */
    xdata unsigned char time_secs, time_mins ;
    void main(void)
    {
        time_secs = RTC_chip.seconds ;
        time_mins = RTC_chip.minutes ;
    }
The linker input line to locate the RTC_chip structure over the real RTC registers is:
    l51 main.obj,rtcbytes.obj XDATA(?XD?RTCBYTES(0h))
Pointers To Structures
Pointers can be used to access structures, just as with simple data items. Here is an example:
    /* Define a structure and a pointer to it; the pointer must point */
    /* at a real structure before it is used */
    struct SENSOR_DESC sensor ;
    struct SENSOR_DESC *sensor_database = &sensor ;
    /* Use Pointer To Access Structure Elements */
    sensor_database->gain = 0x30 ;
    sensor_database->offset = 0x50 ;
    sensor_database->temp_coeff = 0x60 ;
    sensor_database->span = 0xC4 ;
    sensor_database->amp_gain = 0x21 ;
Note that the '*' which normally indicates a pointer has been replaced by appending '->' to the
pointer name. Thus '(*name).member' and 'name->member' are equivalent.
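The equivalence of the two notations can be shown with a pair of one-line functions. This is a sketch in standard C; `set_gain_arrow` and `set_gain_star` are hypothetical names, and the template repeats the member names used in the text.

```c
struct SENSOR_DESC
{
    unsigned char gain;
    unsigned char offset;
    unsigned char temp_coeff;
    unsigned char span;
    unsigned char amp_gain;
};

/* Arrow form: member access through a pointer. */
void set_gain_arrow(struct SENSOR_DESC *p, unsigned char g)
{
    p->gain = g;
}

/* Dereference-then-dot form. The parentheses are required because
   '.' binds more tightly than the unary '*'. */
void set_gain_star(struct SENSOR_DESC *p, unsigned char g)
{
    (*p).gain = g;
}
```

Both functions write the same byte; the arrow form is simply the shorter spelling.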
Passing Structure Pointers To Functions
A common use for structure pointers is to allow structures to be passed to functions without huge
amounts of parameter passing; a typical structure might contain 20 data bytes, and passing it by
value would require all 20 bytes either to be pushed onto the stack or to be copied into an
abnormally large parameter passing area. By using a pointer to the structure, only the two or
three bytes that constitute the pointer need be passed. This approach is recommended for C51, as
the overhead of passing whole structures can tie the poor old 8051 CPU in knots!
This would be achieved by:
    struct SENSOR_DESC sensor ;
    struct SENSOR_DESC *sensor_database = &sensor ;
    sensor_database->gain = 0x30 ;
    sensor_database->offset = 0x50 ;
    sensor_database->temp_coeff = 0x60 ;
    sensor_database->span = 0xC4 ;
    sensor_database->amp_gain = 0x21 ;
    test_function(sensor_database) ;   // Pass the pointer, not the whole structure
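On the receiving side, the function declares a structure pointer as its parameter and reads the members through it. The sketch below is standard C; `scale_reading` and the arithmetic it performs are purely hypothetical, chosen only to show several members being read through one pointer argument.

```c
struct SENSOR_DESC
{
    unsigned char gain;
    unsigned char offset;
    unsigned char temp_coeff;
    unsigned char span;
    unsigned char amp_gain;
};

/* Hypothetical receiver: only the pointer is passed, yet every member
   of the caller's structure is reachable through it. 'const' documents
   that this function does not modify the structure. */
unsigned int scale_reading(const struct SENSOR_DESC *s, unsigned char raw)
{
    return (unsigned int)raw * s->gain + s->offset;
}
```

A call such as `scale_reading(&sensor, raw)` passes one pointer regardless of how many bytes the structure contains.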
    struct RTC
    {
        char seconds ;
        char mins ;
        char hours ;
        char days ;
    } ;

    /* Create A Pointer To Structure */
    struct RTC xdata *rtc_ptr ;   // 'xdata' tells C51 that this
    ...
        rtc_ptr->mins = 0x01 ;
    }
This general technique can be used in any situation where a pointer-addressed structure needs to
be placed over a specific IO device. However it is the user's responsibility to make sure that the
address given is not likely to be allocated by the linker as general variable RAM!
To summarize, the procedure is:
1. Define template
2. Declare structure pointer as normal
3. At run time, force pointer to required absolute address in the normal way.
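The three steps can be tried on a desktop compiler by letting an ordinary buffer stand in for the device. This is a sketch under stated assumptions: `fake_rtc_registers`, `fake_rtc_base`, `rtc_at` and `set_minutes` are hypothetical names, the 'xdata' keyword is omitted because it is not standard C, and on a real target the base address would be the chip's fixed address rather than a RAM buffer.

```c
/* Step 1: define the template. */
struct RTC
{
    char seconds;
    char mins;
    char hours;
    char days;
};

/* Stand-in for the memory-mapped chip (assumption for host testing only). */
static unsigned char fake_rtc_registers[4];

unsigned char *fake_rtc_base(void)
{
    return fake_rtc_registers;
}

/* Step 3: force a structure pointer to a chosen address at run time.
   On a real target, accesses to device registers would normally also
   be qualified 'volatile'. */
struct RTC *rtc_at(void *base_address)
{
    return (struct RTC *)base_address;
}

/* Write the minutes register through the structure pointer. */
void set_minutes(struct RTC *rtc_ptr, char mins)
{
    rtc_ptr->mins = mins;
}
```

Step 2 is the declaration `struct RTC *rtc_ptr = rtc_at(fake_rtc_base());` at the point of use; after `set_minutes(rtc_ptr, 0x01)` the second byte of the stand-in buffer holds 0x01.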
Unions
Unions allow you to define different datatype references for the same physical address. This way
you can address a 32-bit word as a "long" OR as two "ints" OR as an array of 4 bytes.
A union is similar in concept to a structure except that rather than creating sequential locations to
represent each of the items in the template, it places each item at the same address. A union
declaring four separate char members therefore still occupies only a single byte. A union may
consist of a combination of longs, chars and ints all based at the same physical address.
The number of bytes of RAM used by a union is simply determined by the size of the largest
element, so:
    union test
    {
        char x ;
        int y ;
        char a[3] ;
        long z ;
    } ;
requires 4 bytes, this being the size of a long. The physical location of each element is the base
address plus the following offsets:
    Offset   0              +1          +2          +3
    x        byte
    y        high byte      low byte
    a        a[0]           a[1]        a[2]
    z        highest byte   mid byte    mid byte    lowest byte
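The size and the shared base address can be checked on a desktop compiler. In this sketch the fixed-width types from `<stdint.h>` stand in for C51's 8-bit char, 16-bit int and 32-bit long, which is an assumption made so the sizes match on a host machine; `union_size` and `members_share_base` are hypothetical helper names.

```c
#include <stddef.h>
#include <stdint.h>

/* Same shape as 'union test', using fixed-width stand-in types. */
union test
{
    int8_t  x;
    int16_t y;
    int8_t  a[3];
    int32_t z;
};

/* The union is as large as its largest member. */
size_t union_size(void)
{
    return sizeof(union test);
}

/* Returns 1 if every member starts at the base address of the union. */
int members_share_base(void)
{
    union test u;

    return (void *)&u.x == (void *)&u
        && (void *)&u.y == (void *)&u
        && (void *)&u.a[0] == (void *)&u
        && (void *)&u.z == (void *)&u;
}
```

Which byte of `y` or `z` sits at offset 0 depends on the byte order of the machine, so the high-byte-first layout tabulated above should not be assumed on a desktop host.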
In embedded C the commonest use of a union is to allow fast access to individual bytes of longs
or ints. These might be 16 or 32 bit real time counters, as in this example:
    /* Declare Union */
    union clock
    {
        long real_time_count ;
        int real_time_words[2] ;              // Reserve four bytes as int array
        unsigned char real_time_bytes[4] ;    // Reserve four bytes as char array
                                              // (unsigned so that 0x80 can match)
    } clock ;                                 // Create the union variable itself

    /* Real Time Interrupt */
    void timer0_int(void) interrupt 1 using 1
    {
        clock.real_time_count++ ;                  // Increment clock
        if(clock.real_time_words[1] == 0x8000)     // Check low word only
        {
            /* Do something! */
        }
        if(clock.real_time_bytes[3] == 0x80)       // Check lowest byte only
        {
            /* Do something! */
        }
    }
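The byte-access trick can be tried on a desktop compiler, with one caveat: the position of the lowest byte depends on byte order. The sketch below detects that position at run time instead of hard-coding index 3. The names `clock32`, `low_byte_index` and `low_byte` are hypothetical, and fixed-width types are assumed so that the counter is 32 bits on any host.

```c
#include <stdint.h>

/* A 32-bit counter overlaid with its four individual bytes. */
union clock32
{
    uint32_t real_time_count;
    uint8_t  real_time_bytes[4];
};

/* Index of the least significant byte: 3 when the high byte is stored
   first (the layout tabulated in the text), 0 when the low byte is first. */
unsigned char low_byte_index(void)
{
    union clock32 probe;

    probe.real_time_count = 1;
    return (unsigned char)(probe.real_time_bytes[0] == 1 ? 0 : 3);
}

/* Read the lowest byte of the counter without any shifting or masking. */
uint8_t low_byte(const union clock32 *c)
{
    return c->real_time_bytes[low_byte_index()];
}
```

On a target with a fixed, known byte order the probe is unnecessary and the index can be written as a constant, as in the interrupt routine above.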
Generic Pointers
C51 offers two basic types of pointer, the spaced (memory-specific) and the generic.
As has been mentioned, the 8051 has many physically separate memory spaces, each addressed
by special assembler instructions. Such characteristics are not peculiar to the 8051 - for example,
the 8086 has data instructions which operate on a 16 bit (within segment) and a 20 bit basis.
For the sake of simplicity, and to hide the real structure of the 8051 from the programmer, C51
uses three-byte generic pointers, rather than the single- or two-byte pointers that might be
expected. The end result is that pointers can be used without regard to the actual location of
the data.
For example:
    xdata char buffer[10] ;
    code char message[] = { "HELLO" } ;
    void main(void)
    {
        char *s ;
        char *d ;

        s = message ;
        d = buffer ;
        while(*s != '\0')
        {
            *d++ = *s++ ;
        }
    }
Yields the following code:
            RSEG    ?XD?T1
    buffer:
            DS      10
            RSEG    ?CO?T1
    message:
    ;
    ;
    ; xdata char buffer[10] ;
    ; code char message[] = { "HELLO" } ;
    ;
    ; void main(void) {
            RSEG    ?PR?main?T1
            USING
    main:
                                    ; SOURCE LINE # 6
    ;
    ; char *s ;
    ; char *d ;
    ;
    ;
    ; s = message ;
                                    ; SOURCE LINE # 11
            MOV     s?02,#05H
            MOV     s?02+01H,#HIGH message
            MOV     s?02+02H,#LOW message
    ;
    ; d = buffer ;
                                    ; SOURCE LINE # 12
            MOV     d?02,#02H
            MOV     d?02+01H,#HIGH buffer
            MOV     d?02+02H,#LOW buffer
    ?C0001:
    ;
    ;
    ; while(*s != '\0') {
                                    ; SOURCE LINE # 14
            MOV     R3,s?02
            MOV     R2,s?02+01H
            MOV     R1,s?02+02H
            LCALL   ?C_CLDPTR
            JZ      ?C0003
    ; *d++ = *s++ ;
                                    ; SOURCE LINE # 15
            INC     s?02+02H
            MOV     A,s?02+02H
            JNZ     ?C0004
            INC     s?02+01H
    ?C0004:
            DEC     A
            MOV     R1,A
            LCALL   ?C_CLDPTR
            MOV     R7,A
            MOV     R3,d?02
            INC     d?02+02H
            MOV     A,d?02+02H
            MOV     R2,d?02+01H
            JNZ     ?C0005
            INC     d?02+01H
    ?C0005:
            DEC     A
            MOV     R1,A
            MOV     A,R7
            LCALL   ?C_CSTPTR
    ;
                                    ; SOURCE LINE # 16
            SJMP    ?C0001
    ;
                                    ; SOURCE LINE # 17
    ?C0003:
            RET
            END
As can be seen, the pointers s and d are composed of three bytes, not two as might be
expected. In making s point at the message in the code space, an '05' is loaded into the first byte
of s ahead of the actual address to be pointed at. In the case of d, '02' is loaded. These additional
bytes are how C51 knows which assembler addressing mode to use. The library function
?C_CLDPTR checks the value of the first byte and loads the data, using the addressing
instructions appropriate to the memory space being used.
This means that every access via a generic pointer requires a library function to be called
(?C_CLDPTR to load and ?C_CSTPTR to store in the listing above). The memory space codes
used by C51 are:
    CODE  - 05
    XDATA - 02
    PDATA - 03
    DATA  - 04
    IDATA - 01
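The run-time dispatch on the first byte of a generic pointer can be modelled in standard C. This is only a simplified model, not the actual C51 library code: the struct layout, the two simulated memory arrays and the function names `generic_load` and `generic_store` are all assumptions made for illustration, and only the CODE (05) and XDATA (02) codes seen in the listing are handled.

```c
#include <stdint.h>

#define SIM_SIZE 16

/* Memory-space codes taken from the listing: 05 for code, 02 for xdata. */
enum { MEM_XDATA = 2, MEM_CODE = 5 };

/* Model of a three-byte generic pointer: one type byte plus a 16-bit address. */
struct generic_ptr
{
    uint8_t  mem_type;
    uint16_t offset;
};

/* Simulated memory spaces (assumptions; real spaces are separate hardware). */
static uint8_t sim_xdata[SIM_SIZE];
static const uint8_t sim_code[SIM_SIZE] = "HELLO";

/* Model of a load helper: inspect the type byte, then read from the
   matching space. Returns -1 for an unhandled type or out-of-range offset. */
int generic_load(struct generic_ptr p)
{
    if (p.offset >= SIM_SIZE)
    {
        return -1;
    }
    switch (p.mem_type)
    {
    case MEM_CODE:  return sim_code[p.offset];
    case MEM_XDATA: return sim_xdata[p.offset];
    default:        return -1;
    }
}

/* Model of a store helper: only the xdata space is writable in this model. */
int generic_store(struct generic_ptr p, uint8_t value)
{
    if (p.mem_type != MEM_XDATA || p.offset >= SIM_SIZE)
    {
        return -1;
    }
    sim_xdata[p.offset] = value;
    return 0;
}
```

Every load and store pays for the type test before the data moves, which is the overhead that the library calls in the listing represent.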
xdata
*ext_ptr ;
            CLR     A
            MOV     R1,A
    ?C0004:
    ;
    ; while ((s1[i++] = *s2++) != 0);
            INC     s2?10+01H
            MOV     A,s2?10+01H
            MOV     R4,s2?10
            JNZ     ?C0008
            INC     s2?10
    ?C0008:
            DEC     A
            MOV     DPL,A
            MOV     DPH,R4
            CLR     A
            MOVC    A,@A+DPTR
            MOV     R5,A
            MOV     R4,AR1
            INC     R1
            MOV     A,R7
            ADD     A,R4
            MOV     DPL,A
            CLR     A