Proposal: usize definition should be refined #5185

@ikskuh

I came across the definition of usize, which is currently defined as an unsigned pointer-sized integer, and a question arose: the size of which pointer? A function pointer? A pointer to constant data? A pointer to mutable data?

For most platforms, the answer is simple: There is only one address space.

But since Zig aims to target every platform, we should bear in mind that this does not hold everywhere.

Case Study:
Zig currently supports AVR, which has two memory spaces:

  • Data
  • Code

Both memory spaces have different addressing modes that can be used with the Z register, which is a 16 bit register. Thus, we could conclude that the pointer size is 16 bit. But the AVR instruction set also has a RAMPZ register that is prepended to the Z register to extend the memory space to 24 bit. Some modern AVRs have more than 128k ROM (e.g. Mega2560). This means that the effective pointer size is 24 bit.
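
To make the RAMPZ:Z combination concrete, here is a minimal sketch in C that models the two registers as plain integers. The function names are hypothetical and this is host-side arithmetic only, not code that touches the real registers:

```c
#include <stdint.h>

/* Models how the 8 bit RAMPZ register is prepended to the 16 bit
 * Z register to form a 24 bit address. Hypothetical helper, for
 * illustration only. */
uint32_t avr_extended_address(uint8_t rampz, uint16_t z)
{
    return ((uint32_t)rampz << 16) | (uint32_t)z;
}

/* Returns 1 if an address can be reached through Z alone,
 * i.e. it fits in 16 bits and needs no RAMPZ contribution. */
int avr_fits_in_z(uint32_t address)
{
    return address <= 0xFFFFu;
}
```

With these definitions, avr_extended_address(0x01, 0x2345) yields 0x012345, an address that a 16 bit integer cannot hold.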

The same problem arises when targeting the 8086 CPU with segmentation. The actual pointer is a 20 bit value that is calculated by combining two 16 bit values (segment + offset).
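
As a rough illustration of the 8086 case, the sketch below computes the linear address from a segment:offset pair, assuming the usual real-mode rule (segment shifted left by 4 bits, plus offset, wrapped to 20 bits). The function names are made up for this example:

```c
#include <stdint.h>

/* Linear address of a real-mode 8086 segment:offset pair.
 * The segment is shifted left by 4 bits and added to the offset.
 * The 8086 has 20 address lines, so the sum wraps at 1 MiB. */
uint32_t linear_address(uint16_t segment, uint16_t offset)
{
    uint32_t sum = ((uint32_t)segment << 4) + (uint32_t)offset;
    return sum & 0xFFFFFu;
}

/* Returns 1 if two segment:offset pairs name the same byte. */
int same_location(uint16_t seg_a, uint16_t off_a,
                  uint16_t seg_b, uint16_t off_b)
{
    return linear_address(seg_a, off_a) == linear_address(seg_b, off_b);
}
```

Note that 0x1234:0x0010 and 0x1235:0x0000 both map to 0x12350, so an integer that stores the raw segment and offset does not identify a location uniquely.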

Problem:
usize communicates that it stores the size of something, not an address. Right now, usize can contain values larger than the biggest (continuously) addressable object in the language, and it takes up more space than needed.

C has two distinct types for that reason:

  • size_t (can store the size of an addressable object)
  • uintptr_t (can store any object pointer)
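
A small sketch of how the two C types can be inspected separately. The helper names are hypothetical, and uintptr_t is an optional type in the C standard, so this assumes a platform that provides it:

```c
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Width in bits of the type that holds object sizes. */
unsigned size_type_bits(void)
{
    return (unsigned)(sizeof(size_t) * CHAR_BIT);
}

/* Width in bits of the integer type that can hold any object pointer.
 * Assumption: the platform defines uintptr_t at all. */
unsigned pointer_int_bits(void)
{
    return (unsigned)(sizeof(uintptr_t) * CHAR_BIT);
}

/* Round-trips an object pointer through uintptr_t. The result
 * compares equal to the original pointer. */
const void *round_trip(const void *p)
{
    uintptr_t as_int = (uintptr_t)p;
    return (const void *)as_int;
}
```

On common flat-memory hosts both widths come out equal, which is why the distinction is easy to overlook. The guarantee for uintptr_t covers pointers to objects only, not function pointers.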

AVR-GCC solves the problem of 24 bit pointers by ignoring it and creating shims for functions that are linked beyond the 128k boundary. Data beyond the 64k boundary cannot be addressed, and afaik LLVM has the same restriction. I don't think Zig should ignore such platform specifics; it should be able to represent them correctly.

Proposal:
Redefine usize as a type that can store the size of any object or array, and introduce a new type upointer that is a pointer-sized integer. The same applies to isize and ipointer.

It should also be discussed whether a upointer will have a guaranteed unique representation or may be ambiguous ("storing a linear address or storing segment + descriptor").

Changes that should be made as well:

  • @ptrToInt should now return upointer, and @intToPtr should accept upointer, instead of usize
  • @sizeOf will still return usize

Pro:

  • Communicates intent more precisely by using distinct types for int-encoded pointers and object sizes / indices
  • Saves memory, as object sizes may be up to 50% smaller than pointers (u16 instead of u32 in the 8086 case)

Con:

  • One more type
  • May spark confusion for people who assume that pointer size is always object size

Example:

// AVR:
const usize = u16;
const upointer = u24;

// 8086:
const usize = u16;
const upointer = u32;

Note:
I'm not quite sure about all of this yet, as this is a very special case that only affects some platforms, whereas on most platforms the object size and the pointer size are the same.

Resources:

Edit: Included answer to the question of @LemonBoy, added pro/con discussion, added example
