I came across the definition of usize, which is currently defined as an unsigned pointer-sized integer, and a question arose: the size of which pointer? A function pointer? A pointer to constant data? A pointer to mutable data?
For most platforms, the answer is simple: There is only one address space.
But as Zig tries to target all platforms, we should bear in mind that this is not true for all platforms.
Case Study:
Zig currently supports AVR, which has two memory spaces:
- Data
- Code
Both memory spaces have different addressing modes, which can be used with the Z register, a 16-bit register. Thus, we could conclude that the pointer size is 16 bits. But the AVR instruction set also has a RAMPZ register that is prepended to the Z register to extend the memory space to 24 bits. Some modern AVRs have more than 128k ROM (e.g. Mega2560). This means that the effective pointer size is 24 bits.
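To make the RAMPZ:Z combination concrete, here is a minimal sketch of how the two registers form one 24-bit address. The helper name farAddress is hypothetical and only models the arithmetic; it is not how Zig or LLVM actually emit AVR code.

```zig
const std = @import("std");

/// Hypothetical helper: prepend the 8-bit RAMPZ value to the 16-bit Z
/// register value, giving a 24-bit address.
fn farAddress(rampz: u8, z: u16) u24 {
    return (@as(u24, rampz) << 16) | @as(u24, z);
}

test "RAMPZ:Z reaches beyond the first 64 KiB" {
    // RAMPZ = 0x01, Z = 0xF000 gives address 0x01F000.
    try std.testing.expectEqual(@as(u24, 0x01F000), farAddress(0x01, 0xF000));
    // Any non-zero RAMPZ yields an address that no longer fits in 16 bits.
    try std.testing.expect(farAddress(0x01, 0x0000) > std.math.maxInt(u16));
}
```

With RAMPZ set to zero the result is just the Z value, which is why code that stays within the first 64k can get away with 16-bit pointers.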
The same problem arises when targeting the 8086 CPU with segmentation. The actual pointer is a 20 bit value that is calculated by combining two 16 bit values (segment + offset).
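The 8086 calculation can be sketched the same way. This is a minimal model of real-mode address translation (segment shifted left by four bits plus offset, wrapped to 20 address lines); the function name physicalAddress is hypothetical. It also shows that two different segment/offset pairs can refer to the same physical address.

```zig
const std = @import("std");

/// Hypothetical helper: real-mode 8086 translation,
/// physical = segment * 16 + offset, limited to 20 address lines.
fn physicalAddress(segment: u16, offset: u16) u32 {
    const linear = (@as(u32, segment) << 4) + @as(u32, offset);
    return linear & 0xFFFFF; // the 8086 has only 20 address lines
}

test "segment:offset pairs can alias" {
    try std.testing.expectEqual(@as(u32, 0x12345), physicalAddress(0x1234, 0x0005));
    // A different pair that lands on the same physical address.
    try std.testing.expectEqual(physicalAddress(0x1234, 0x0005), physicalAddress(0x1000, 0x2345));
}
```

So an integer that stores the segment and offset pair needs 32 bits, while a linear address needs only 20, and the pair form is not a unique representation.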
Problem:
usize communicates that it stores the size of something, not an address. Right now, usize can contain values larger than the biggest (contiguously) addressable object in the language, and it takes up more space than needed.
C has two distinct types for that reason:
- size_t (can store the size of an addressable object)
- uintptr_t (can store any pointer)
AVR-GCC solves the problem of 24-bit pointers by ignoring it and creating shims for functions that are linked beyond the 128k boundary. Data beyond the 64k boundary cannot be addressed, and afaik LLVM has the same restriction. I don't think Zig should ignore such platform specifics; it should be able to represent them correctly.
Proposal:
Redefine usize to mean "can store the size of any object or array" and introduce a new type upointer that is a pointer-sized integer. The same applies to isize and ipointer.
It should also be discussed whether a upointer will have a guaranteed unique representation or may be ambiguous ("storing a linear address or storing segment + descriptor").
Changes that should be made as well:
- @ptrToInt and @intToPtr should now return and accept upointer, respectively, instead of usize
- @sizeOf will still return usize
Pro:
- Communicates intent more precisely by using distinct types for int-encoded pointers and object sizes / indices
- Saves memory as object sizes may be 50% smaller than pointers
Con:
- One more type
- May spark confusion for people who assume that pointer size is always object size
Example:
// AVR:
const usize = u16;
const upointer = u24;
// 8086:
const usize = u16;
const upointer = u32;
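As a rough illustration of why the two types would differ on AVR, the sketch below uses hypothetical stand-in names (AvrSize, AvrPointer, fitsInSize), since usize itself cannot be redefined in user code and upointer does not exist in Zig. It only checks whether an address value would still fit in the narrower size type.

```zig
const std = @import("std");

// Hypothetical stand-ins for the proposed AVR types; Zig does not define these.
const AvrSize = u16; // proposed usize: large enough for any object size
const AvrPointer = u24; // proposed upointer: large enough for any address

/// Hypothetical helper: true if the address would also fit in the size type.
fn fitsInSize(address: AvrPointer) bool {
    return address <= std.math.maxInt(AvrSize);
}

test "addresses can exceed the largest object size" {
    try std.testing.expect(fitsInSize(0x00FFFF));
    try std.testing.expect(!fitsInSize(0x01F000));
}
```

Under these assumptions, an address above the 64k boundary cannot be stored in the size type, which is the reason the proposal has @ptrToInt use the wider type.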
Note:
I'm not quite sure about all of this yet, as this is a very special case that only affects some platforms, whereas most platforms don't have the "object size is not pointer size" restriction.
Resources:
- AVR Instruction Set
- Accessing Memory Outside of the 64K Range
- 8086 Addressing
- Explanation/Discussion on SO
Edit: Included answer to the question of @LemonBoy, added pro/con discussion, added example