Jump to content

Null pointer: Difference between revisions

From Wikipedia, the free encyclopedia
Content deleted Content added
Edited Lisp item to make it more accurate. In particular, "therefore" suggested something other than a convention.
Restored revision 1235993528 by 195.50.170.106 (talk): Rv youtube sourcing
(47 intermediate revisions by 26 users not shown)
Line 3: Line 3:
In [[computing]], a '''null pointer''' or '''null reference''' is a value saved for indicating that the [[Pointer (computer programming)|pointer]] or [[reference (computer science)|reference]] does not refer to a valid [[Object (computer science)|object]]. Programs routinely use null pointers to represent conditions such as the end of a [[List (computing)|list]] of unknown length or the failure to perform some action; this use of null pointers can be compared to [[nullable type]]s and to the ''Nothing'' value in an [[option type]].
In [[computing]], a '''null pointer''' or '''null reference''' is a value saved for indicating that the [[Pointer (computer programming)|pointer]] or [[reference (computer science)|reference]] does not refer to a valid [[Object (computer science)|object]]. Programs routinely use null pointers to represent conditions such as the end of a [[List (computing)|list]] of unknown length or the failure to perform some action; this use of null pointers can be compared to [[nullable type]]s and to the ''Nothing'' value in an [[option type]].


A null pointer should not be confused with an [[uninitialized variable|uninitialized pointer]]: a null pointer is guaranteed to compare unequal to any pointer that points to a valid object. However, depending on the language and implementation, an uninitialized pointer may not have any such guarantee. It might compare equal to other, valid pointers; or it might compare equal to null pointers. It might do both at different times; or the comparison might be [[Undefined behavior|undefined behaviour]].
A null pointer should not be confused with an [[uninitialized variable|uninitialized pointer]]: a null pointer is guaranteed to compare unequal to any pointer that points to a valid object. However, in general, most languages do not offer such guarantee for uninitialized pointers. It might compare equal to other, valid pointers; or it might compare equal to null pointers. It might do both at different times; or the comparison might be [[Undefined behavior|undefined behaviour]]. Also, in languages offering such support, the correct use depends on the individual experience of each developer and linter tools. Even when used properly, null pointers are [[semantic completeness|semantically incomplete]], since they do not offer the possibility to express the difference between a "Not Applicable" value versus a "Not known" value or versus a "Future" value.

Because a null pointer does not point to a meaningful object, an attempt to access the data stored at that (invalid) memory location may cause a run-time error or immediate program crash. This is the '''null pointer error'''. It is one of the most common types of software weaknesses,<ref>{{cite web |url=https://cwe.mitre.org/data/definitions/476.html |title=CWE-476: NULL Pointer Dereference |website=MITRE}}</ref> and [[Tony Hoare]], who introduced the concept, has referred to it as a "billion dollar mistake".


== C ==
== C ==


In [[C (programming language)|C]], two null pointers of any type are guaranteed to compare equal.<ref>[[#c-std|ISO/IEC 9899]], clause 6.3.2.3, paragraph 4.</ref> The preprocessor macro <code>NULL</code> is defined as an implementation-defined null pointer constant,<ref>[[#c-std|ISO/IEC 9899]], clause 7.17, paragraph 3: ''NULL... which expands to an implementation-defined null pointer constant...''</ref> which in [[C99]] can be portably expressed as <code>((void *)0)</code> which means that the integer value <code>0</code> converted to the type <code>void*</code> (pointer to [[void type|void]]).<ref>[[#c-std|ISO/IEC 9899]], clause 6.3.2.3, paragraph 3.</ref> The C standard does not say that the null pointer is the same as the pointer to [[memory address]]&nbsp;0, though that may be the case in practice. [[Dereferencing]] a null pointer is [[undefined behavior]] in C,<ref name="ub"/> and a conforming implementation is allowed to assume that any pointer that is dereferenced is not null.
In [[C (programming language)|C]], two null pointers of any type are guaranteed to compare equal.<ref>[[#c-std|ISO/IEC 9899]], clause 6.3.2.3, paragraph 4.</ref> The preprocessor macro <code>NULL</code> is defined as an implementation-defined null pointer constant in <code><[[stdlib.h]]></code>,<ref>[[#c-std|ISO/IEC 9899]], clause 7.17, paragraph 3: ''NULL... which expands to an implementation-defined null pointer constant...''</ref> which in [[C99]] can be portably expressed as <code>((void *)0)</code>, the integer value <code>0</code> [[Type conversion|converted]] to the type <code>void*</code> (see [[pointer to void type]]).<ref>[[#c-std|ISO/IEC 9899]], clause 6.3.2.3, paragraph 3.</ref> The C standard does not say that the null pointer is the same as the pointer to [[memory address]]&nbsp;0, though that may be the case in practice. [[Dereferencing]] a null pointer is [[undefined behavior]] in C,<ref name="ub"/> and a conforming implementation is allowed to assume that any pointer that is dereferenced is not null.


In practice, dereferencing a null pointer may result in an attempted read or write from [[computer memory|memory]] that is not mapped, triggering a [[segmentation fault]] or memory access violation. This may manifest itself as a program crash, or be transformed into a software [[exception handling|exception]] that can be caught by program code. There are, however, certain circumstances where this is not the case. For example, in [[Intel x86|x86]] [[real mode]], the address <code>0000:0000</code> is readable and also usually writable, and dereferencing a pointer to that address is a perfectly valid but typically unwanted action that may lead to undefined but non-crashing behavior in the application. There are occasions when dereferencing the pointer to address zero ''is'' intentional and well-defined; for example, [[BIOS]] code written in C for 16-bit real-mode x86 devices may write the [[interrupt descriptor table]] (IDT) at physical address&nbsp;0 of the machine by dereferencing a null pointer for writing. It is also possible for the compiler to optimize away the null pointer dereference, avoiding a segmentation fault but causing other [https://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html undesired behavior].
In practice, dereferencing a null pointer may result in an attempted read or write from [[computer memory|memory]] that is not mapped, triggering a [[segmentation fault]] or memory access violation. This may manifest itself as a program crash, or be transformed into a software [[exception handling|exception]] that can be caught by program code. There are, however, certain circumstances where this is not the case. For example, in [[Intel x86|x86]] [[real mode]], the address <code>0000:0000</code> is readable and also usually writable, and dereferencing a pointer to that address is a perfectly valid but typically unwanted action that may lead to undefined but non-crashing behavior in the application. There are occasions when dereferencing the pointer to address zero ''is'' intentional and well-defined; for example, [[BIOS]] code written in C for 16-bit real-mode x86 devices may write the [[interrupt descriptor table]] (IDT) at physical address&nbsp;0 of the machine by dereferencing a null pointer for writing. It is also possible for the compiler to optimize away the null pointer dereference, avoiding a segmentation fault but causing other undesired behavior.<ref>{{Cite web |last=Lattner |first=Chris |date=2011-05-13 |title=What Every C Programmer Should Know About Undefined Behavior #1/3 |url=https://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html |url-status=live |archive-url=https://web.archive.org/web/20230614124121/https://blog.llvm.org/2011/05/what-every-c-programmer-should-know.html |archive-date=2023-06-14 |access-date=2023-06-14 |website=blog.llvm.org |language=en}}</ref>


== C++ ==
== C++ ==


In C++, while the <code>NULL</code> macro was inherited from C, the integer literal for zero has been traditionally preferred to represent a null pointer constant.<ref>{{cite book |last1=Stroustrup |first=Bjarne |authorlink1=Bjarne_Stroustrup |title=[[The C++ Programming Language]] |edition=14th printing of 3rd |date=March 2001 |publisher=Addison–Wesley |location=United States and Canada |isbn=0-201-88954-4 |page=[https://archive.org/details/cprogramminglang00stro_0/page/88 88] |chapter=Chapter 5: <br />The <code>const</code> qualifier (§5.4) prevents accidental redefinition of <code>NULL</code> and ensures that <code>NULL</code> can be used where a constant is required. }}</ref> However, [[C++11]] introduced the explicit null pointer constant <code>nullptr</code> to be used instead.
In C++, while the <code>NULL</code> macro was inherited from C, the integer literal for zero has been traditionally preferred to represent a null pointer constant.<ref>{{cite book |last1=Stroustrup |first1=Bjarne |authorlink1=Bjarne Stroustrup |title=[[The C++ Programming Language]] |edition=14th printing of 3rd |date=March 2001 |publisher=Addison–Wesley |location=United States and Canada |isbn=0-201-88954-4 |page=[https://archive.org/details/cprogramminglang00stro_0/page/88 88] |chapter=Chapter 5: <br />The <code>const</code> qualifier (§5.4) prevents accidental redefinition of <code>NULL</code> and ensures that <code>NULL</code> can be used where a constant is required. }}</ref> However, [[C++11]] introduced the explicit null pointer constant <code>[[nullptr]]</code> and type <code>[[nullptr_t]]</code> to be used instead.


== Other languages ==
== Other languages ==
Line 25: Line 27:
== Null dereferencing ==
== Null dereferencing ==


Because a null pointer does not point to a meaningful object, an attempt to [[dereference]] (i.e., access the data stored at that memory location) a null pointer usually (but not always) causes a run-time error or immediate program crash.
Because a null pointer does not point to a meaningful object, an attempt to [[dereference]] (i.e., access the data stored at that memory location) a null pointer usually (but not always) causes a run-time error or immediate program crash. [[MITRE]] lists the null pointer error as one of the most commonly exploited software weaknesses.<ref>{{cite web |url=https://cwe.mitre.org/data/definitions/476.html |title=CWE-476: NULL Pointer Dereference |website=MITRE}}</ref>
* In [[C (programming language)|C]], dereferencing a null pointer is [[undefined behavior]].<ref name="ub">[[#c-std|ISO/IEC 9899]], clause 6.5.3.2, paragraph 4, esp. footnote 87.</ref> Many implementations cause such code to result in the program being halted with an [[access violation]], because the null pointer representation is chosen to be an address that is never allocated by the system for storing objects. However, this behavior is not universal. It's also not guaranteed, since compilers are permitted to optimize programs under the assumption that they're free of undefined behaviour.


* In [[C (programming language)|C]], dereferencing a null pointer is [[undefined behavior]].<ref name="ub">[[#c-std|ISO/IEC 9899]], clause 6.5.3.2, paragraph 4, esp. footnote 87.</ref> Many implementations cause such code to result in the program being halted with an [[access violation]], because the null pointer representation is chosen to be an address that is never allocated by the system for storing objects. However, this behavior is not universal. It is also not guaranteed, since compilers are permitted to optimize programs under the assumption that they are free of undefined behavior.
* In [[Delphi (programming language)|Delphi]] and many other Pascal implementations, the constant <code>nil</code> represents a null pointer to the first address in memory which is also used to initialize managed variables. Dereferencing it raises an external OS exception which is being mapped onto a Pascal {{code|EAccessViolation}}'' exception instance if the {{code|System.SysUtils}} unit is linked in the uses clause.
* In [[Delphi (programming language)|Delphi]] and many other Pascal implementations, the constant {{code|nil}} represents a null pointer to the first address in memory which is also used to initialize managed variables. Dereferencing it raises an external OS exception which is mapped onto a Pascal {{code|EAccessViolation}} exception instance if the {{code|System.SysUtils}} unit is linked in the {{code|uses}} clause.
* In [[Java (programming language)|Java]], access to a null reference triggers a {{Javadoc:SE|java/lang|NullPointerException}} (NPE), which can be caught by error handling code, but the preferred practice is to ensure that such exceptions never occur.
* In [[Java (programming language)|Java]], access to a null reference causes a {{Javadoc:SE|java/lang|NullPointerException}} (NPE), which can be caught by error handling code, but the preferred practice is to ensure that such exceptions never occur.
* In [[Lisp_(programming_language)|Lisp]], {{code|nil}} is a [[first class object]]. By convention, <code>(first nil)</code> is {{code|nil}}, as is <code>(rest nil)</code>. So dereferencing {{code|nil}} in these contexts will not cause an error, but poorly written code can get into an infinite loop.
* In [[Lisp_(programming_language)|Lisp]], {{code|nil}} is a [[first class object]]. By convention, <code>(first nil)</code> is {{code|nil}}, as is <code>(rest nil)</code>. So dereferencing {{code|nil}} in these contexts will not cause an error, but poorly written code can get into an infinite loop.
* In [[.Net (programming language)|.NET]], access to null reference triggers a {{code|NullReferenceException}} to be thrown. Although catching these is generally considered bad practice, this exception type can be caught and handled by the program.
* In [[.Net (programming language)|.NET]], access to null reference causes a {{code|NullReferenceException}} to be thrown. Although catching these is generally considered bad practice, this exception type can be caught and handled by the program.
* In [[Objective-C]], messages may be sent to a <code>nil</code> object (which is a null pointer) without causing the program to be interrupted; the message is simply ignored, and the return value (if any) is <code>nil</code> or <code>0</code>, depending on the type.<ref>''The Objective-C 2.0 Programming Language'', [https://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/ObjectiveC/Chapters/ocObjectsClasses.html#//apple_ref/doc/uid/TP30001163-CH11-SW7 section "Sending Messages to nil"].</ref>
* In [[Objective-C]], messages may be sent to a {{code|nil}} object (which is a null pointer) without causing the program to be interrupted; the message is simply ignored, and the return value (if any) is {{code|nil}} or {{code|0}}, depending on the type.<ref>''The Objective-C 2.0 Programming Language'', [https://developer.apple.com/library/mac/#documentation/Cocoa/Conceptual/ObjectiveC/Chapters/ocObjectsClasses.html#//apple_ref/doc/uid/TP30001163-CH11-SW7 section "Sending Messages to nil"].</ref>
* Before the introduction of [[Supervisor Mode Access Prevention]] (SMAP), a null pointer dereference bug could be exploited by mapping [[Zero page|pagezero]] into the attacker's [[address space]] and hence causing the null pointer to point to that region. This could lead to [[Arbitrary code execution|code execution]] in some cases.<ref>[https://bugs.chromium.org/p/project-zero/issues/detail?id=782&can=1&q=null%20pointer&sort=-reported&colspec=ID%20Status%20Restrict%20Reported%20Vendor%20Product%20Finder%20Summary OS X exploitable kernel NULL pointer dereference in AppleGraphicsDeviceControl]</ref>
* Before the introduction of [[Supervisor Mode Access Prevention]] (SMAP), a null pointer dereference bug could be exploited by mapping [[Zero page|page zero]] into the attacker's [[address space]] and hence causing the null pointer to point to that region. This could lead to [[Arbitrary code execution|code execution]] in some cases.<ref>[https://bugs.chromium.org/p/project-zero/issues/detail?id=782&can=1&q=null%20pointer&sort=-reported&colspec=ID%20Status%20Restrict%20Reported%20Vendor%20Product%20Finder%20Summary "OS X exploitable kernel NULL pointer dereference in AppleGraphicsDeviceControl"]</ref>


== Mitigation ==
== Mitigation ==


There are techniques to facilitate debugging null pointer dereferences.<ref name="BondNethercote2007">{{cite book|last1=Bond|first1=Michael D.|last2=Nethercote|first2=Nicholas|last3=Kent|first3=Stephen W.|last4=Guyer|first4=Samuel Z.|last5=McKinley|first5=Kathryn S.|title=Proceedings of the 22nd annual ACM SIGPLAN conference on Object oriented programming systems and applications - OOPSLA '07|chapter=Tracking bad apples|year=2007|pages=405|doi=10.1145/1297027.1297057|isbn=9781595937865|s2cid=2832749}}</ref><ref name="Cornu2016">{{cite journal|last1=Cornu|first1=Benoit|last2=Barr|first2=Earl T.|last3=Seinturier|first3=Lionel|last4=Monperrus|first4=Martin|title=Casper: Automatic tracking of null dereferences to inception with causality traces|journal=Journal of Systems and Software|volume=122|year=2016|pages=52–62|issn=0164-1212|doi=10.1016/j.jss.2016.08.062|url=https://hal.archives-ouvertes.fr/hal-01354090/document|arxiv=1502.02004}}</ref> Bond et al.<ref name="BondNethercote2007"/> suggest to modify the Java Virtual Machine (JVM) in order to keep track of null propagation. The idea of the Casper system<ref name="Cornu2016"/> is to use source code transformation in order to track this propagation, without modifying the JVM. In some cases, it is possible to automatically generate a patch to fix null pointer exceptions.<ref>{{Cite journal|last1=Durieux|first1=Thomas|last2=Cornu|first2=Benoit|last3=Seinturier|first3=Lionel|last4=Monperrus|first4=Martin|date=2017|title=Dynamic patch generation for null pointer exceptions using metaprogramming|url=https://hal.archives-ouvertes.fr/hal-01419861/file/main.pdf|journal=2017 IEEE 24th International Conference on Software Analysis, Evolution and Reengineering (SANER)|publisher=IEEE|pages=349–358|doi=10.1109/SANER.2017.7884635|isbn=978-1-5090-5501-2|arxiv=1812.00409|s2cid=2736203}}</ref>
There are techniques to facilitate debugging null pointer dereferences.<ref name="BondNethercote2007">{{cite book|last1=Bond|first1=Michael D.|last2=Nethercote|first2=Nicholas|last3=Kent|first3=Stephen W.|last4=Guyer|first4=Samuel Z.|last5=McKinley|first5=Kathryn S.|title=Proceedings of the 22nd annual ACM SIGPLAN conference on Object oriented programming systems and applications - OOPSLA '07|chapter=Tracking bad apples|year=2007|pages=405|doi=10.1145/1297027.1297057|isbn=9781595937865|s2cid=2832749}}</ref> Bond et al.<ref name="BondNethercote2007"/> suggest to modify the Java Virtual Machine (JVM) in order to keep track of null propagation.


Pure [[functional languages]] and user code run in many [[interpreter (computing)|interpreted]] or virtual-machine languages do not suffer the problem of null pointer dereferencing, since no direct access is provided to pointers and, in the case of pure functional languages, all code and data is immutable.
Pure [[functional languages]] and user code run in many [[interpreter (computing)|interpreted]] or virtual-machine languages do not suffer the problem of null pointer dereferencing, since no direct access is provided to pointers and, in the case of pure functional languages, all code and data is immutable.


Where a language does provide or utilise pointers which could otherwise become void, it may be possible to mitigate or avoid runtime null dereferences by providing compilation-time checking via [[static analysis]] or other techniques, with a burgeoning movement toward syntactic assistance from language features such as those seen in modern versions of the [[Eiffel programming language]],<ref>{{cite web | url=https://www.eiffel.org/doc/eiffel/Void-safety-_Background%2C_definition%2C_and_tools |title=Void-safety: Background, definition, and tools|access-date=2021-11-24 }}</ref> [[D programming language|D]],<ref name="SafeD">{{cite web |title=SafeD – D Programming Language |url=http://dlang.org/safed.html |author=Bartosz Milewski |access-date=17 July 2014}}</ref> and [[Rust programming language|Rust]].<ref>{{cite web|title=Fearless Security: Memory Safety|url=https://hacks.mozilla.org/2019/01/fearless-security-memory-safety/|access-date=4 November 2020|archive-date=8 November 2020|archive-url=https://web.archive.org/web/20201108003116/https://hacks.mozilla.org/2019/01/fearless-security-memory-safety/|url-status=live}}</ref>
Where a language does provide or utilise pointers which could otherwise become void, it may be possible to mitigate or avoid runtime null dereferences by providing [[void safety|compilation-time checking]] via [[static analysis]] or other techniques, with a burgeoning movement toward syntactic assistance from language features such as those seen in modern versions of the [[Eiffel programming language]],<ref>{{cite web | url=https://www.eiffel.org/doc/eiffel/Void-safety-_Background%2C_definition%2C_and_tools |title=Void-safety: Background, definition, and tools|access-date=2021-11-24 }}</ref> [[D programming language|D]],<ref name="SafeD">{{cite web |title=SafeD – D Programming Language |url=http://dlang.org/safed.html |author=Bartosz Milewski |access-date=17 July 2014}}</ref> and [[Rust programming language|Rust]].<ref>{{cite web|title=Fearless Security: Memory Safety|url=https://hacks.mozilla.org/2019/01/fearless-security-memory-safety/|access-date=4 November 2020|archive-date=8 November 2020|archive-url=https://web.archive.org/web/20201108003116/https://hacks.mozilla.org/2019/01/fearless-security-memory-safety/|url-status=live}}</ref>


Similar analysis can be performed using external tools, in some languages.
Similar analysis can be performed using external tools, in some languages.
== Alternatives to null pointers ==
As a rule of thumb, for each type of struct or class, define some objects representing some state of the business logic replacing the undefined behaviour on null. For example, "future" to indicate a field inside an structure that will not be available right now (but for which we known in advance that in a future it will defined), "not applicable" to indicate a field in a non-normalized structure, "error", "timeout" to indicate that the field could not be initialized (probably stopping normal execution of the full program, thread, request or command).


== History ==
== History ==
Line 50: Line 54:
|url=http://www.infoq.com/presentations/Null-References-The-Billion-Dollar-Mistake-Tony-Hoare
|url=http://www.infoq.com/presentations/Null-References-The-Billion-Dollar-Mistake-Tony-Hoare
|title=Null References: The Billion Dollar Mistake
|title=Null References: The Billion Dollar Mistake
|publisher=InfoQ.com
|date=2009-08-25
|author=Tony Hoare
|author-link=Tony Hoare
}}</ref><ref>{{cite web
|url=https://qconlondon.com/london-2009/qconlondon.com/london-2009/presentation/Null%2BReferences_%2BThe%2BBillion%2BDollar%2BMistake.html
|title=Presentation: "Null References: The Billion Dollar Mistake"
|publisher=InfoQ.com
|publisher=InfoQ.com
|date=2009-08-25
|date=2009-08-25
Line 61: Line 72:
* [[Memory debugger]]
* [[Memory debugger]]
* [[Zero page]]
* [[Zero page]]

== Notes==

{{refs}}


== References ==
== References ==
=== Citations ===
{{Reflist}}


=== Sources ===
{{refbegin}}
{{refbegin}}
* {{cite book |ref = c-std |author = Joint Technical Committee ISO/IEC JTC 1, Subcommittee SC 22, Working Group WG 14 |title = International Standard ISO/IEC 9899 |date = 2007-09-08<!-- 19:22:32 +0000 -->|url = http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf |type = Committee Draft }}
* {{cite book |ref = c-std |author = Joint Technical Committee ISO/IEC JTC 1, Subcommittee SC 22, Working Group WG 14 |title = International Standard ISO/IEC 9899 |date = 2007-09-08<!-- 19:22:32 +0000 -->|url = http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1256.pdf |type = Committee Draft }}
{{refend}}
{{refend}}


{{-}}
{{Nulls}}
{{Nulls}}


[[Category:Data types]]
[[Category:Pointers (computer programming)]]

Revision as of 16:21, 23 July 2024

In computing, a null pointer or null reference is a value saved for indicating that the pointer or reference does not refer to a valid object. Programs routinely use null pointers to represent conditions such as the end of a list of unknown length or the failure to perform some action; this use of null pointers can be compared to nullable types and to the Nothing value in an option type.

A null pointer should not be confused with an uninitialized pointer: a null pointer is guaranteed to compare unequal to any pointer that points to a valid object. However, in general, most languages do not offer such guarantee for uninitialized pointers. It might compare equal to other, valid pointers; or it might compare equal to null pointers. It might do both at different times; or the comparison might be undefined behaviour. Also, in languages offering such support, the correct use depends on the individual experience of each developer and linter tools. Even when used properly, null pointers are semantically incomplete, since they do not offer the possibility to express the difference between a "Not Applicable" value versus a "Not known" value or versus a "Future" value.

Because a null pointer does not point to a meaningful object, an attempt to access the data stored at that (invalid) memory location may cause a run-time error or immediate program crash. This is the null pointer error. It is one of the most common types of software weaknesses,[1] and Tony Hoare, who introduced the concept, has referred to it as a "billion dollar mistake".

C

In C, two null pointers of any type are guaranteed to compare equal.[2] The preprocessor macro NULL is defined as an implementation-defined null pointer constant in <stdlib.h>,[3] which in C99 can be portably expressed as ((void *)0), the integer value 0 converted to the type void* (see pointer to void type).[4] The C standard does not say that the null pointer is the same as the pointer to memory address 0, though that may be the case in practice. Dereferencing a null pointer is undefined behavior in C,[5] and a conforming implementation is allowed to assume that any pointer that is dereferenced is not null.

In practice, dereferencing a null pointer may result in an attempted read or write from memory that is not mapped, triggering a segmentation fault or memory access violation. This may manifest itself as a program crash, or be transformed into a software exception that can be caught by program code. There are, however, certain circumstances where this is not the case. For example, in x86 real mode, the address 0000:0000 is readable and also usually writable, and dereferencing a pointer to that address is a perfectly valid but typically unwanted action that may lead to undefined but non-crashing behavior in the application. There are occasions when dereferencing the pointer to address zero is intentional and well-defined; for example, BIOS code written in C for 16-bit real-mode x86 devices may write the interrupt descriptor table (IDT) at physical address 0 of the machine by dereferencing a null pointer for writing. It is also possible for the compiler to optimize away the null pointer dereference, avoiding a segmentation fault but causing other undesired behavior.[6]

C++

In C++, while the NULL macro was inherited from C, the integer literal for zero has been traditionally preferred to represent a null pointer constant.[7] However, C++11 introduced the explicit null pointer constant nullptr and type nullptr_t to be used instead.

Other languages

In some programming language environments (at least one proprietary Lisp implementation, for example),[citation needed] the value used as the null pointer (called nil in Lisp) may actually be a pointer to a block of internal data useful to the implementation (but not explicitly reachable from user programs), thus allowing the same register to be used as a useful constant and a quick way of accessing implementation internals. This is known as the nil vector.

In languages with a tagged architecture, a possibly null pointer can be replaced with a tagged union which enforces explicit handling of the exceptional case; in fact, a possibly null pointer can be seen as a tagged pointer with a computed tag.

Programming languages use different literals for the null pointer. In Python, for example, a null value is called None. In Pascal and Swift, a null pointer is called nil. In Eiffel, it is called a void reference.

Null dereferencing

Because a null pointer does not point to a meaningful object, an attempt to dereference (i.e., access the data stored at that memory location) a null pointer usually (but not always) causes a run-time error or immediate program crash. MITRE lists the null pointer error as one of the most commonly exploited software weaknesses.[8]

  • In C, dereferencing a null pointer is undefined behavior.[5] Many implementations cause such code to result in the program being halted with an access violation, because the null pointer representation is chosen to be an address that is never allocated by the system for storing objects. However, this behavior is not universal. It is also not guaranteed, since compilers are permitted to optimize programs under the assumption that they are free of undefined behavior.
  • In Delphi and many other Pascal implementations, the constant nil represents a null pointer to the first address in memory which is also used to initialize managed variables. Dereferencing it raises an external OS exception which is mapped onto a Pascal EAccessViolation exception instance if the System.SysUtils unit is linked in the uses clause.
  • In Java, access to a null reference causes a NullPointerException (NPE), which can be caught by error handling code, but the preferred practice is to ensure that such exceptions never occur.
  • In Lisp, nil is a first class object. By convention, (first nil) is nil, as is (rest nil). So dereferencing nil in these contexts will not cause an error, but poorly written code can get into an infinite loop.
  • In .NET, access to null reference causes a NullReferenceException to be thrown. Although catching these is generally considered bad practice, this exception type can be caught and handled by the program.
  • In Objective-C, messages may be sent to a nil object (which is a null pointer) without causing the program to be interrupted; the message is simply ignored, and the return value (if any) is nil or 0, depending on the type.[9]
  • Before the introduction of Supervisor Mode Access Prevention (SMAP), a null pointer dereference bug could be exploited by mapping page zero into the attacker's address space and hence causing the null pointer to point to that region. This could lead to code execution in some cases.[10]

Mitigation

There are techniques to facilitate debugging null pointer dereferences.[11] Bond et al.[11] suggest to modify the Java Virtual Machine (JVM) in order to keep track of null propagation.

Pure functional languages and user code run in many interpreted or virtual-machine languages do not suffer the problem of null pointer dereferencing, since no direct access is provided to pointers and, in the case of pure functional languages, all code and data is immutable.

Where a language does provide or utilise pointers which could otherwise become void, it may be possible to mitigate or avoid runtime null dereferences by providing compilation-time checking via static analysis or other techniques, with a burgeoning movement toward syntactic assistance from language features such as those seen in modern versions of the Eiffel programming language,[12] D,[13] and Rust.[14]

Similar analysis can be performed using external tools, in some languages.

Alternatives to null pointers

As a rule of thumb, for each type of struct or class, define some objects representing some state of the business logic replacing the undefined behaviour on null. For example, "future" to indicate a field inside an structure that will not be available right now (but for which we known in advance that in a future it will defined), "not applicable" to indicate a field in a non-normalized structure, "error", "timeout" to indicate that the field could not be initialized (probably stopping normal execution of the full program, thread, request or command).

History

In 2009, Tony Hoare stated[15][16] that he invented the null reference in 1965 as part of the ALGOL W language. In that 2009 reference Hoare describes his invention as a "billion-dollar mistake":

I call it my billion-dollar mistake. It was the invention of the null reference in 1965. At that time, I was designing the first comprehensive type system for references in an object oriented language (ALGOL W). My goal was to ensure that all use of references should be absolutely safe, with checking performed automatically by the compiler. But I couldn't resist the temptation to put in a null reference, simply because it was so easy to implement. This has led to innumerable errors, vulnerabilities, and system crashes, which have probably caused a billion dollars of pain and damage in the last forty years.

See also

Notes

  1. ^ "CWE-476: NULL Pointer Dereference". MITRE.
  2. ^ ISO/IEC 9899, clause 6.3.2.3, paragraph 4.
  3. ^ ISO/IEC 9899, clause 7.17, paragraph 3: NULL... which expands to an implementation-defined null pointer constant...
  4. ^ ISO/IEC 9899, clause 6.3.2.3, paragraph 3.
  5. ^ a b ISO/IEC 9899, clause 6.5.3.2, paragraph 4, esp. footnote 87.
  6. ^ Lattner, Chris (2011-05-13). "What Every C Programmer Should Know About Undefined Behavior #1/3". blog.llvm.org. Archived from the original on 2023-06-14. Retrieved 2023-06-14.
  7. ^ Stroustrup, Bjarne (March 2001). "Chapter 5:
    The const qualifier (§5.4) prevents accidental redefinition of NULL and ensures that NULL can be used where a constant is required.". The C++ Programming Language (14th printing of 3rd ed.). United States and Canada: Addison–Wesley. p. 88. ISBN 0-201-88954-4.
  8. ^ "CWE-476: NULL Pointer Dereference". MITRE.
  9. ^ The Objective-C 2.0 Programming Language, section "Sending Messages to nil".
  10. ^ "OS X exploitable kernel NULL pointer dereference in AppleGraphicsDeviceControl"
  11. ^ a b Bond, Michael D.; Nethercote, Nicholas; Kent, Stephen W.; Guyer, Samuel Z.; McKinley, Kathryn S. (2007). "Tracking bad apples". Proceedings of the 22nd annual ACM SIGPLAN conference on Object oriented programming systems and applications - OOPSLA '07. p. 405. doi:10.1145/1297027.1297057. ISBN 9781595937865. S2CID 2832749.
  12. ^ "Void-safety: Background, definition, and tools". Retrieved 2021-11-24.
  13. ^ Bartosz Milewski. "SafeD – D Programming Language". Retrieved 17 July 2014.
  14. ^ "Fearless Security: Memory Safety". Archived from the original on 8 November 2020. Retrieved 4 November 2020.
  15. ^ Tony Hoare (2009-08-25). "Null References: The Billion Dollar Mistake". InfoQ.com.
  16. ^ Tony Hoare (2009-08-25). "Presentation: "Null References: The Billion Dollar Mistake"". InfoQ.com.

References

  • Joint Technical Committee ISO/IEC JTC 1, Subcommittee SC 22, Working Group WG 14 (2007-09-08). International Standard ISO/IEC 9899 (PDF) (Committee Draft).{{cite book}}: CS1 maint: multiple names: authors list (link) CS1 maint: numeric names: authors list (link)