Modern C++ Basics - Lifetime and Type Safety
Jiaming Liang, undergraduate from Peking University
• Lifetime
• Inheritance Extension
• Oops, the temporary variable goes out of its lifetime suddenly after this
statement, and the string is invalid afterwards!
• They may be more subtle than this, e.g.
• Return reference to local variable, which is destroyed after exiting the function!
• Accessing a reference to variables of another thread that has already been joined.
• You capture local variables in a lambda by reference, but the lambda outlives the
current scope (e.g. it is assigned to a std::function& owned by the caller). ATTENTION!
• Keep in mind references passed to std::bind (e.g. via std::ref) too!
• …
• So let’s see what lifetime is.
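A minimal sketch of the lambda pitfall listed above. The name make_counter_safe is hypothetical and not from the slides. Capturing by value copies the local into the closure, so the returned std::function stays valid; the by-reference variant is kept as a comment only, because calling it would be UB.

```cpp
#include <cassert>
#include <functional>

// Hypothetical helper: returns a callable that outlives this function.
std::function<int()> make_counter_safe() {
    int count = 0;
    // Capture by value: the closure owns its own (mutable) copy of count.
    return [count]() mutable { return ++count; };
    // return [&count] { return ++count; };  // BAD: count is destroyed when the
    //                                       // function returns; later calls are UB.
}
```

Each call of the returned functor increments the copy stored inside the closure, not the long-gone local.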
Storage duration
• Storage duration means how long the storage used by the object
will exist.
• There are four kinds of storage duration:
• Static storage duration: global variables, static variables in a function/class.
Globals and static class members are constructed before entering main
(function-local statics when control first reaches their declaration), and all of
them are destructed after exiting main. std::abort will not call destructors.
• Automatic storage duration: variables that belong to a block scope or
function parameters. They’ll be constructed when they’re defined, and
destructed when the current scope exits (e.g. the function exits).
• Dynamic storage duration: you can create objects by using new or some
other allocation functions. You can only possess the corresponding pointer,
and destroying the pointer will not destroy the dynamic memory
automatically.
• Thread storage duration: constructed when the thread is created, and
destructed when it exits; we’ll cover it in the future.
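The four kinds can be seen side by side in a small sketch. All names here (g_calls, t_calls, count_calls, dynamic_demo) are made up for illustration.

```cpp
#include <cassert>

int g_calls = 0;                 // static storage duration (global)
thread_local int t_calls = 0;    // thread storage duration (one copy per thread)

int count_calls() {
    static int local_calls = 0;  // static storage duration, initialized on the first call
    int scratch = 1;             // automatic storage duration, gone at the closing brace
    local_calls += scratch;
    ++g_calls;
    ++t_calls;
    return local_calls;
}

int dynamic_demo() {
    int* p = new int{42};        // dynamic storage duration
    int value = *p;
    delete p;                    // must be released explicitly; letting the pointer
                                 // go out of scope would leak the int
    return value;
}
```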
Lifetime
• The lifetime of an object begins when storage with proper
alignment and size is obtained and its initialization is complete.
• This includes default-initialization that does nothing (e.g. int, or a class
with a trivial default constructor).
• The lifetime of an object ends when it’s destroyed (for a class type, when the
dtor call starts), or its storage is released or reused by a non-nested object.
• If a is a member of b, then a is nested within b.
• In particular, although a reference itself only ends its lifetime when it
goes out of scope, using it after the underlying object’s lifetime has
ended is invalid; this is called a dangling reference.
• This is because accessing a reference is the same as accessing the underlying
object, and accessing an object outside its lifetime is UB.
To be exact, some operations are still allowed, e.g. binding the reference to another reference or accessing static
members, but relying on them is ill-advised.
Lifetime
• Temporary objects, as we’ve seen, are only alive until the end of the
full-expression that creates them (roughly, until the ;).
• That’s why const& or std::function_ref is safe to be used as parameter,
even if the passed functor is temporary.
• Also, the lifetime of returned temporaries can be extended by some
references, e.g. we’ve learnt const&.
• NOTE AGAIN: this requires “returned temporaries”; returned
reference or pointer to local variable is still wrong.
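A short sketch of the difference, with hypothetical function names. Binding a returned temporary to a const& extends its lifetime to that of the reference, while returning a reference to a local (shown only as a comment) does not.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

std::string make_name() { return "temporary"; }   // returns a temporary by value

std::size_t extended_length() {
    // The returned temporary is bound to a const&, so it lives as long as name does.
    const std::string& name = make_name();
    return name.size();
}

// const std::string& broken() {
//     std::string local = "local";
//     return local;   // BAD: reference to a destroyed local; nothing is extended here
// }
```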
• The storage still exists, until it exits the current scope; but the original
objects in the array (i.e. int) have been dead, and lifetime of new object
begins.
• This is just “the storage is reused”.
• You need to manually destruct A (without releasing the buffer!) by calling ~A()
before exiting scope.
• In practice, we’ve learnt std::vector; it will allocate memory (i.e. capacity)
before inserting elements (i.e. size). You cannot use vec[size] when capacity
> size since the lifetime of that element has not begun.
Lifetime
• So, can you use buffer[0] to access int where new objects are
constructed as if it’s still a complete int array?
• Of course not, the lifetime of element has ended since its storage is
reused, and accessing out-of-lifetime object is invalid.
• So in modern C++, a pointer is far more than an address; it has type T*, and
you can hardly ever access an address through it when no underlying
object of type T is alive there.
• But you can still use the original array buffer to access other elements
whose storage is not reused.
• Similarly, for a union type, it’s illegal to access a
member that’s not in its lifetime (it’s only allowed in C)!
• Here u.a is in its lifetime, while u.b is not.
• You should use std::memcpy, or std::bit_cast since C++20, to copy the
bits into an object of the other type.
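A hedged sketch of the std::memcpy route (float_bits is a hypothetical helper). It assumes float is a 32-bit IEEE-754 type, which is common but not guaranteed by the standard.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Reads the bit pattern of a float without touching an inactive union member.
std::uint32_t float_bits(float f) {
    static_assert(sizeof(std::uint32_t) == sizeof(float),
                  "this sketch assumes a 32-bit float");
    std::uint32_t bits;
    std::memcpy(&bits, &f, sizeof bits);   // copies the object representation
    return bits;
}
```

With C++20, std::bit_cast<std::uint32_t>(f) expresses the same thing in one call.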
Placement new
• The ConstructOnBuffer is in fact placement new, which won’t
allocate memory, but only create the object at the place.
• new(buffer) Type Initializer, where Initializer is optional.
• Compared with new, it only adds a (buffer).
• Of course, you need to make sure the alignment satisfies the requirement
of the type, so you can use keyword alignas.
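The original ConstructOnBuffer code is not reproduced in this text, so the following is only a sketch of the same idea with a hypothetical Tracker type: aligned raw storage, placement new, and a manual dtor call.

```cpp
#include <cassert>
#include <new>      // placement new

struct Tracker {                  // hypothetical type with a non-trivial dtor
    static inline int alive = 0;  // counts objects currently in their lifetime
    int value;
    explicit Tracker(int v) : value(v) { ++alive; }
    ~Tracker() { --alive; }
};

int placement_demo() {
    // Raw storage with the right size and alignment; no Tracker exists yet.
    alignas(Tracker) unsigned char buffer[sizeof(Tracker)];

    Tracker* t = new (buffer) Tracker{7};  // constructs in place, allocates nothing
    int seen = t->value;
    t->~Tracker();                         // ends the lifetime; the buffer itself is not freed
    return seen;
}
```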
Preparation…
• Before we go on, we need some preparations…
• std::byte: defined in <cstddef> since C++17; it’s just a scoped enumeration
that explicitly represents a byte (previously we would use unsigned char).
• Trivial dtor:
• We say a class has a trivial dtor if:
• It’s implicitly declared or declared with =default.
• It’s non-virtual, and all base classes and non-static data members have trivial dtors.
• For example, they have a trivial dtor:
• struct A{ int a; float b; };.
• class A{ int a; public: float b; ~A() = default; };.
• And they don’t have:
• class A{ int a; public: float b; virtual ~A() = default; };.
• class B : public A{};, since the dtor is still virtual.
• class A{ ~A() {}; };, though the dtor does nothing and just behaves like the default one, it’s
user-provided so it’s not trivial.
• class A{ std::unique_ptr<int> ptr; };, though the dtor is default, there is at least
one data member that doesn’t have trivial dtor.
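These examples can be checked with std::is_trivially_destructible_v. The class names below are renamed variants of the ones above so they can coexist in one file; the dtors are made public so that the trait tests triviality rather than accessibility.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>

struct Plain      { int a; float b; };
class  Defaulted  { int a; public: float b; ~Defaulted() = default; };
class  VirtualD   { public: virtual ~VirtualD() = default; };
class  Derived    : public VirtualD {};
class  UserDtor   { public: ~UserDtor() {} };
class  HoldsPtr   { std::unique_ptr<int> ptr; };

static_assert( std::is_trivially_destructible_v<Plain>);
static_assert( std::is_trivially_destructible_v<Defaulted>);
static_assert(!std::is_trivially_destructible_v<VirtualD>);   // virtual dtor
static_assert(!std::is_trivially_destructible_v<Derived>);    // base dtor is virtual
static_assert(!std::is_trivially_destructible_v<UserDtor>);   // user-provided body
static_assert(!std::is_trivially_destructible_v<HoldsPtr>);   // non-trivial member
```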
Lifetime
• Corner cases:
• Case1: if you construct an object that has the same type as the original
object (ignoring cv-qualifiers), and they occupy exactly the same storage,
then the original name, pointers and references are still valid (much like
using an assignment operator…).
• Particularly, if the alignment or padding is not the same, then “exactly same storage” is
violated, and it’s still invalid.
• Besides, the original object should not be const complete object (i.e. const T that is not
a normal member, but just a static member or a local or global variable), otherwise
compilers will utilize its const to optimize (i.e. assume the object never change)!
• Case2: It’s best to reuse storage of plain types like int or classes that
have trivial dtor. For other types, since exiting the scope will call the dtor,
you need to guarantee original objects are still there.
• So you have to record the original objects before constructing new objects, and
restore them after you destruct the new objects. But why bother???
• Case3: It’s illegal to reuse memory of const objects whose value is determined
at compile time;
Lifetime
• We’ve learnt in ICS that they may exist in read-only segment of the program,
which forbids writing.
• However, const variables whose value cannot be determined at compile time
(e.g. a const member constructed in a ctor, or one allocated on the heap) are still
reusable.
• Case4: unsigned char/std::byte array is explicitly regulated to be able to
provide storage.
• The only difference is that new object is seen as nested within the array, so the
array doesn’t end its lifetime even if you occupy the storage by other objects!
• This property is important for classes that hold storage in which an
object of another type is constructed later.
• Ending the lifetime of some members ends the lifetime of the whole object.
• So, if you choose to use e.g. an array data member as the buffer, but
it’s not an unsigned char/std::byte array, then when you construct the new object,
the buffer’s lifetime ends, so the whole class object is out of its lifetime! Then it’s illegal
to continue to use the object.
Lifetime
• Case5: It’s legal to access the underlying object by pointers without the
same type in these cases (type punning/aliasing):
• 1. add/remove cv-qualification, of course.
• 2. decayed arrays, i.e. a pointer can be used to access array.
• 3. If the underlying type is integer, then using the pointer of its
signed/unsigned variant to access it is OK.
• 4. If you convert it to (unsigned) char*/std::byte*, i.e. it’s legal to view an object
as a byte array.
• However, in this case, it’s possibly illegal to write the element, which may end the
lifetime of the original object because of storage reuse.
• Case6: If you have an old pointer where you’ve constructed a new object,
but you want to use the old pointer to get the new pointer, you can use
std::launder defined in <new> since C++17.
• E.g. for our ConstructOnBuffer() example, you can also use
std::launder(reinterpret_cast<A*>(buffer)) to get the actual valid pointer.
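A minimal sketch of Case6. The struct A below is a hypothetical stand-in for the A used on the slides, and launder_demo is a made-up name.

```cpp
#include <cassert>
#include <new>      // placement new, std::launder

struct A { int a; };   // hypothetical stand-in with a trivial dtor

int launder_demo() {
    alignas(A) unsigned char buffer[sizeof(A)];
    new (buffer) A{5};   // the pointer returned by new is deliberately discarded

    // buffer still names the unsigned char array; std::launder yields a pointer
    // to the A object that now lives at that address.
    A* p = std::launder(reinterpret_cast<A*>(buffer));
    return p->a;         // A has a trivial dtor, so no explicit ~A() is needed here
}
```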
Strict aliasing rules
• Based on lifetime, we can do optimizations on pointers.
• For example, it’s impossible for int* a and float* b to refer to the same
object, thus the compiler can assume they’re different.
• So *a += *b; *a += *b; can be optimized as *a += 2 * *b;, without worrying
that they refer to the same object so that it’s finally *a += 3 * *b.
• This is called the strict aliasing rule, i.e. if two pointer types may not
alias (per the cases above) and neither type is contained in the other as a
member, then compilers can freely assume that they point to different objects.
• For instance, a pointer to a class and a pointer to one of its member types will not be
optimized this way, e.g. class A{ int a; float b; }; A* with float*.
• Compilers may perform this optimization in the future even if they currently
do not, e.g. for the Stack Overflow question, Clang 14 does it.
Strict aliasing rules doesn’t overlap with other things, e.g. uint8_t* restrict
target.
Credit: https://stackoverflow.com/questions/26295216/using-this-pointer-causes-strange-deoptimization-in-hot-loop
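The *a += *b example from this slide can be sketched as two hypothetical functions. In the first, a compiler is permitted (not required) to load *b once; in the second, both pointers may refer to the same int, so the second read must observe the first store.

```cpp
#include <cassert>

// int and float cannot alias, so the compiler may load *b once and reuse it.
void add_twice(int* a, const float* b) {
    *a += static_cast<int>(*b);
    *a += static_cast<int>(*b);
}

// Both parameters may point at the same int, so *b must be re-read after the first store.
void add_twice_same_type(int* a, const int* b) {
    *a += *b;
    *a += *b;
}
```

Calling add_twice_same_type(&y, &y) with y == 1 gives 4, i.e. the original value plus three times itself, which is the aliased outcome the slide describes.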
Lifetime
• Wait, there is one more thing…what about std::malloc?
• Normally, we use malloc like this:
• You get a void* from the function, what’s the underlying object?
• It’s not A, since malloc only allocates memory!
• So how can you use arr[i].a? There is no object in its lifetime!
• It’s really a shame to say that it’s UB before C++20…
• C++20 adds a small patch, i.e. operations like std::malloc/calloc/realloc or
allocators will implicitly begin the lifetime, and then you can use operations like
above to make objects suitably constructed.
• Particularly, array of unsigned char/std::byte can be used to implicitly create
objects too, which means such code is Okay:
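The original snippets are not reproduced in this text, so here is a hedged sketch of the std::malloc usage being discussed. The struct A and the function name are stand-ins; the code compiles under older standards as well, but it is only formally well-defined from C++20 on.

```cpp
#include <cassert>
#include <cstdlib>

struct A { int a; float b; };   // implicit-lifetime type (hypothetical stand-in)

int malloc_demo() {
    // Since C++20, std::malloc implicitly creates the objects needed to make
    // the accesses below valid; before that, this was formally UB.
    A* arr = static_cast<A*>(std::malloc(3 * sizeof(A)));
    if (arr == nullptr) return -1;

    for (int i = 0; i < 3; ++i) arr[i].a = i * 10;   // write before any read
    int result = arr[2].a;

    std::free(arr);
    return result;
}
```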
Lifetime
• But such code is still not legal:
• The reason is still lifetime; a float cannot be accessed through an int*.
• But you can int* ptr2 = new(arr) int{2}, which then begins the lifetime of int and
ends lifetime of float (so ptr is illegal). Then you can normally read *ptr2.
• In a word: read before beginning lifetime is still UB.
• Notice that this may cause some astonishment; You use A& to
change its member, but members of B& also change!
• Virtual base usually has worse space & time performance, too.
• Anyway, it’s usually discouraged to use multiple inheritance with
such complexity; Simple ones can simplify your design enough.
• One typical and useful pattern is Mixin Pattern.
Multiple inheritance
• That is, you define many ABCs (abstract base classes), each of
which keeps data members and non-pure-virtual member
functions to a minimum.
• Ideally, all member functions are pure virtual (except for the virtual dtor)
and there are no data members; this is the simplest and most useful form.
• Interface in e.g. C# and Java is like this.
• They usually denote “-able” functionality, i.e. some kind of ability
in one dimension.
• For example, consider a war simulation game.
• When you want to show the ability of “attack and defense”, then
use Attacker and Defender to refer to the members.
• When it’s time for users to move, then only movable ones can do it, and
non-movable ones may have other functionalities.
Multiple inheritance
• A possible way is:
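The slide's own code is not reproduced in this text. One possible sketch, using hypothetical classes (Attacker, Defender and Movable as pure interfaces, Soldier and Tower as concrete units), could look like this:

```cpp
#include <cassert>

// Hypothetical "-able" interfaces: only pure virtual functions plus a virtual dtor.
class Attacker {
public:
    virtual ~Attacker() = default;
    virtual int attack() const = 0;
};
class Defender {
public:
    virtual ~Defender() = default;
    virtual int defend() const = 0;
};
class Movable {
public:
    virtual ~Movable() = default;
    virtual void move(int steps) = 0;
};

// A soldier can fight and move.
class Soldier : public Attacker, public Defender, public Movable {
    int position_ = 0;
public:
    int attack() const override { return 5; }
    int defend() const override { return 2; }
    void move(int steps) override { position_ += steps; }
    int position() const { return position_; }
};

// A tower can fight but is not Movable.
class Tower : public Attacker, public Defender {
public:
    int attack() const override { return 8; }
    int defend() const override { return 6; }
};

// Game logic depends only on the ability it needs.
int damage(const Attacker& a, const Defender& d) {
    int dealt = a.attack() - d.defend();
    return dealt > 0 ? dealt : 0;
}
void advance(Movable& unit, int steps) { unit.move(steps); }
```

Passing a Tower to advance would not compile, which is exactly the "only movable ones can do it" restriction.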
• Finally, for e.g. if(…), it’s in fact a contextual conversion, i.e. the
context needs it to be bool. This also applies to &&/||/! and
some other operations that obviously need bool.
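A small sketch of contextual conversion with a hypothetical Handle class: its operator bool is explicit, yet if, ! and && may still use it.

```cpp
#include <cassert>

class Handle {   // hypothetical type
    int fd_;
public:
    explicit Handle(int fd) : fd_(fd) {}
    explicit operator bool() const { return fd_ >= 0; }
};

bool usable(const Handle& a, const Handle& b) {
    // if, !, && and || contextually convert their operands to bool,
    // so the explicit operator bool is allowed here.
    if (!a) return false;
    return a && b;
    // bool copy = a;   // would not compile: copy-initialization is not contextual
}
```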
Type Safety
• Type Safety
• Implicit conversion
• static_cast
• dynamic_cast and RTTI
• const_cast
• reinterpret_cast
• C-style cast
static_cast
• To make the purpose of each cast explicit so that users can
review it carefully, C++ divides the C-style cast into four kinds.
• static_cast is the most powerful one, which can process almost all of
normal conversions.
• dynamic_cast is used to process polymorphism specially.
• const_cast is dangerous, since you are trying to see a const thing as non-
const.
• reinterpret_cast is even more dangerous; it’s used to convert pointers or
references to different types, although lifetime rules still make the
insane uses UB.
• C-style conversions mix them all up, which may let you overlook the
dangers to const-correctness and lifetime.
static_cast
• So let’s demystify static_cast<TargetType>(Exp) first!
• 1. It’s a kind of explicit conversions, so of course all implicit
conversions can be explicitly denoted by static_cast.
• You can also do the inverse operations, even if they’re narrowing (e.g. int->short,
double->float). We’ve said how C++ processes narrowing numeric
conversions.
• The first and third classes of standard conversion are not invertible, i.e. you
cannot turn a pointer back into a function/array.
• 2. Scoped enumeration can be converted to/from integer or
floating point, which behaves the same as converting its underlying integer type.
• Remember? You can e.g. enum class A: std::uint8_t{…}.
• Notice that an unscoped enumeration without a fixed underlying type has a
limited range of values, not the whole range of its underlying type; one with a fixed
underlying type (including every scoped enumeration) can hold any value of that type.
• Comparison is also the underlying type comparison.
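A brief sketch of the enumeration conversions, using a hypothetical Color enumeration and helper names.

```cpp
#include <cassert>
#include <cstdint>

enum class Color : std::uint8_t { Red = 1, Green = 2, Blue = 4 };  // hypothetical

// Scoped enumerations never convert implicitly; static_cast is required both ways.
std::uint8_t color_code(Color c) { return static_cast<std::uint8_t>(c); }
Color color_from_int(int v)      { return static_cast<Color>(v); }
double color_as_double(Color c)  { return static_cast<double>(c); }
```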
static_cast
• 3. Inheritance-related conversions
• There are two kinds of conversions: upcast and downcast.
• As the names suggest, upcast means conversion to the base class, while downcast
means conversion to the derived class.
• Here we refer to reference or pointer conversion; otherwise it’s a new object.
• Obviously, upcast is safer since a derived object is always a base object; so
this is also implicit conversion.
• But only public inheritance is convertible; private one is not.
• Downcast is dangerous so it needs explicit
conversion; you must ensure the original
object is actually a derived object, so that
the pointer/reference is safe.
• If e.g. the original one is just a base object, then it’s UB (of course, since there is no
derived object in its lifetime!).
• It won’t do any check, so be certain that this is what you want!
• A virtual base or an ambiguous base cannot be downcast either.
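A minimal sketch of both directions with a hypothetical Base/Derived pair. The downcast is only valid because the caller guarantees that the object really is a Derived.

```cpp
#include <cassert>

struct Base    { int id = 1; };          // hypothetical hierarchy
struct Derived : Base { int extra = 2; };

int read_id(const Base& b) { return b.id; }   // accepts a Derived via implicit upcast

int downcast_extra(Base& b) {
    // The caller must guarantee that b really refers to a Derived;
    // static_cast performs no run-time check.
    return static_cast<Derived&>(b).extra;
}
```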
Preparation…
• Before going on, we first introduce another property of class…
• A class is said to be standard-layout, if:
• All non-static data members have the same accessibility and are also
standard-layout.
• This is because the layout of members that have different accessibility is
unspecified (before C++23); e.g. it may follow the order of declaration, or place all
public members first and all private members second.
• No virtual functions, virtual base classes, or non-standard-layout base classes.
• No base class has the same type as the first non-static data member.
• There is at most one class in the inheritance hierarchy that has non-static
data members.
• That’s because the layout of inheritance is not regulated.
• Purpose of this property: If a class is standard-layout, then its
layout is same as struct in C (as we’ve learnt in ICS!).
Preparation…
• You can use std::is_standard_layout_v<T> to check whether T is
standard-layout.
• Examples:
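The slide's own examples are not reproduced in this text; the following hypothetical types illustrate the rules and are checked with std::is_standard_layout_v.

```cpp
#include <cassert>
#include <type_traits>

struct CLike       { int a; float b; };                 // same access, no virtuals
class  MixedAccess { public: int a; private: float b; };
struct WithVirtual { virtual void f() {} int a; };
struct Base1       { int x; };
struct TwoLevels   : Base1 { int y; };                  // data members in two classes
struct Empty       {};
struct OneLevel    : Empty { int y; };                  // data members in one class only

static_assert( std::is_standard_layout_v<CLike>);
static_assert(!std::is_standard_layout_v<MixedAccess>);
static_assert(!std::is_standard_layout_v<WithVirtual>);
static_assert(!std::is_standard_layout_v<TwoLevels>);
static_assert( std::is_standard_layout_v<OneLevel>);
```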
• Summary:
• Implicit conversion and their inverse operation, except for array/function
decay.
• Enumeration conversion.
• Inheritance-related conversion.
• Pointer conversion.
Dynamic cast and RTTI
• We’ve seen that static_cast doesn’t check validity along
inheritance chain; you can cast to the derived class even if there
is no actual one in its lifetime.
• It just uses UB to regulate, but that’s too weak!
• dynamic_cast tries to solve that; the conversion will fail when it’s
inappropriate.
• To be specific, reference conversion failure will throw std::bad_cast
exception, while pointer conversion failure will return nullptr.
• This is stronger than UB, and more convenient to find bugs!
• So, to do type check in run time, RTTI (Run-Time Type
Information/Identification) is preserved.
• Notice that dynamic_cast can only be used with polymorphic types, since
RTTI relies on things like the virtual table pointer.
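Both failure modes can be sketched with a hypothetical Shape hierarchy: the pointer form yields nullptr, while the reference form throws std::bad_cast.

```cpp
#include <cassert>
#include <typeinfo>   // std::bad_cast

struct Shape  { virtual ~Shape() = default; };   // polymorphic base (hypothetical)
struct Circle : Shape { int radius = 3; };
struct Square : Shape { int side = 4; };

// Pointer form: failure is reported as nullptr.
int radius_or_minus_one(Shape& s) {
    if (auto* c = dynamic_cast<Circle*>(&s)) return c->radius;
    return -1;
}

// Reference form: failure is reported by throwing std::bad_cast.
bool reference_cast_throws(Shape& s) {
    try {
        (void)dynamic_cast<Circle&>(s);
        return false;
    } catch (const std::bad_cast&) {
        return true;
    }
}
```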
dynamic_cast
• However, safety comes with cost.
• A dynamic_cast that uses RTTI is at least 10 times slower than static_cast,
and can even be hundreds or thousands of times slower in some cases!
• The former happens when it’s a single inheritance chain and converts to the
exact underlying type (usual case); the latter happens for a “sidecast” in an
inheritance graph (i.e. multiple inheritance).
• Often, dynamic_cast means some defects in design, i.e. the original type
cannot represent all behaviors by polymorphism; so it’s usually better to
not use it frequently.
• Performance-critical projects may write their own downcast (e.g. LLVM).
• For construction:
• By default, the first alternative is value-initialized.
• You can also assign a value with the same type as some alternative; then that becomes the
active alternative.
• You can also construct the alternative in place, i.e. by (std::in_place_type<T>,
args…).
• This is similar to std::move_only_function, and suits construction-only classes.
• If there is more than one alternative of this type, then the above two are disabled.
• You can construct by index, i.e. (std::in_place_index<Index>, args…).
• E.g. (std::in_place_index<3>, 4, 1) to construct a vector with four 1.
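The construction forms above can be sketched with std::variant. The alias Value and its list of alternatives are assumptions chosen so that index 3 is a std::vector<int>, matching the (std::in_place_index<3>, 4, 1) example.

```cpp
#include <cassert>
#include <string>
#include <variant>
#include <vector>

// Hypothetical variant whose fourth alternative (index 3) is a vector.
using Value = std::variant<int, double, std::string, std::vector<int>>;

Value default_value() { return Value(); }       // first alternative, value-initialized

Value from_double() {
    Value v;
    v = 1.5;                                     // double becomes the active alternative
    return v;
}

Value from_type()  { return Value(std::in_place_type<std::string>, 3, 'x'); }  // std::string(3, 'x')
Value four_ones()  { return Value(std::in_place_index<3>, 4, 1); }             // std::vector<int>(4, 1)
```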