NULL and Nothing
Introduction
In databases and in QlikView, nothingness is represented by the concept of NULL, i.e. a field
value to show that there is no value assigned to the field in this record. Strictly speaking, NULL
is not a value – it is a lack of value. Although I know this well, I sometimes call it “NULL value”
anyway.
NULLs have certain basic properties:
• All QlikView fields and all data types in a SQL database are NULL-able.
• In SQL, NULL does not have a data type. In QlikView, this corresponds to the fact that
  NULLs are not dual, i.e. they do not have both a string representation and a numeric
  representation.
• NULLs propagate. If you use a NULL in an expression, it will not cause an error.
  Rather, it will propagate through the expression and yield a result which often – but not
  always – is NULL.
• NULLs cannot be used as key values to join or link tables.
• NULLs are neither visible nor clickable in QlikView list boxes, unless you make them
  visible and clickable using a method described below. This means that NULLs are not
  selectable in QlikView.
A consequence of the above is that if you select all values in a list box, you will get a different
result in other list boxes than if you select no values.
The reason for this is that the first case excludes records with NULLs, which could mean that
you exclude real values in other fields that potentially are used in calculations.
But there are also other types of nothingness…
Numerical zero
If a field has a numerical zero in it – 0 – then this of course represents a numerical nothingness,
but it is certainly not the same as NULL. The field has a value and is hence not NULL. The
IsNull() function will return FALSE and the record will be included in the calculation of both
Avg() and Count().
For an empty string or a string of white space, the IsNull() function will return FALSE and the
record will be included in the calculation of Count() but not in Sum() and Avg(), since the value
is not numeric.
Note that there are several types of white space characters: soft blanks (Chr(32)), hard
blanks (Chr(160)), the tab character (Chr(9)) and the ideographic space (U+3000, Chr(12288)). If
you use the trim functions, be aware that they remove only soft blanks and no
other characters.
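A minimal sketch of an emptiness test that also catches the other white space characters is shown here. The file name Data.txt and the field name Field are hypothetical; PurgeChar() strips the listed characters before Trim() removes the soft blanks.

```qlikview
// Hypothetical source file and field name, for illustration only.
// PurgeChar() removes hard blanks, tabs and ideographic spaces;
// Trim() then removes the remaining soft blanks.
// A length of zero means the value held nothing but white space.
Data:
LOAD
    Field,
    If( Len( Trim( PurgeChar( Field, Chr(160) & Chr(9) & Chr(12288) ) ) ) = 0,
        'Blank', 'Has content' ) as [Field status]
FROM Data.txt (txt, utf8, embedded labels, delimiter is ',');
```

Since Len() of a NULL is 0, a test of this kind should classify true NULLs as blank too.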
True NULLs
True NULLs are cases when an existing record in the data model has a field where the value
is missing and it hence is marked as NULL. These NULLs have all the properties described in
the introduction.
There are several ways that true NULLs can get into the QlikView data model:
• Databases often contain NULLs and if you use ODBC or OLE DB to load data into
  QlikView, you will most likely get NULLs in your application.
• If you have joins or concatenations in the QlikView script, the missing values will be
  converted to NULLs. This is true both for the external SQL commands (“JOIN” and
  “UNION”) and for the internal QlikView prefixes (“Join” and “Concatenate”).
• Some functions, e.g. an If() with only two parameters, Only(), and most notably Null(),
  may return NULL and if these functions are used in the QlikView script, you will most
  likely get NULLs in your application.
A text file cannot contain any NULLs in itself, so unless you load it using functions that can
generate NULLs, or the table is part of a concatenate or a join operation, QlikView will not have
any NULLs in the table.
The IsNull() function will return TRUE for NULLs and the record will not be included in the
calculation of any aggregation function, except NullCount().
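As an illustration of how a function can introduce true NULLs when loading a text file, the sketch below uses an If() with only two parameters. The file name People.txt and the field names are hypothetical.

```qlikview
// Hypothetical file and field names, for illustration only.
// An empty Phone in the text file is loaded as an empty string, not as NULL.
// The two-parameter If() has no ELSE expression, so it returns NULL
// whenever the phone value is empty or contains only soft blanks.
People:
LOAD
    PersonID,
    If( Len( Trim( Phone ) ) > 0, Phone ) as Phone
FROM People.txt (txt, utf8, embedded labels, delimiter is ',');
```

After such a load, records without a phone number would be excluded from aggregations on Phone, in line with the properties described in the introduction.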
In the example in the picture, the customer with the CustomerID “PARIS” is not represented in
the Orders table and values for OrderID and OrderDate are missing for this specific customer.
If, in such a situation, you create a chart that uses fields from both tables, the chart will for
calculation purposes generate an internal virtual table – the combination of the two data tables.
In this virtual table, a missing field value will be treated as NULL and the IsNull() function will
return TRUE. The record will not be included in the calculation of any aggregation function
using this field, except NullCount().
Missing values – Type two: Cross table with incomplete Cartesian product
The second type of missing value occurs if you have a QlikView chart with two or more
dimensions. The best example is a cross table, i.e. a pivot table with at least one vertical
dimension and at least one horizontal dimension. Then you might get a situation where a
specific combination of dimensional values is not represented in the data but still has a cell in
the pivot table. Both dimensional fields may even exist in the same data model table. Hence,
you can potentially get this type of missing values with just one data model table and two
dimensional fields.
This type of missing value is also treated as a NULL when possible. It is, however, not always
calculated. An example of this situation is a data table with amounts per quarter
that also holds a field Year.
In the example shown in the picture, there are amounts for all quarters in 2011 but not for Q3
and Q4 2012. When the pivot table is calculated, an algorithm loops over all records in the
database. Since there are no data records for the last two quarters, the consequence is that the
expression is never calculated for these cells (middle and rightmost tables). So basically it
looks as if IsNull() has returned NULL, when it in fact has not been calculated at all.
Kleenean logic
Before we look at how NULLs propagate through expressions, we need to look at Kleenean
logic. (From Stephen Cole Kleene, a US mathematician.)
Most people know well what Boolean logic is: It is the algebra you use when you want to deal
with concepts that are TRUE or FALSE. Using Boolean logic, you can set up truth tables to
define how to propagate values through an expression. Below you see the Boolean truth tables
for the OR and AND operators.
However, SQL databases and QlikView do not use Boolean logic. Instead, they use Kleenean
logic, also called three-valued logic or ternary logic. The reason for this is that although
Boolean fields are defined as having two possible values, they have in reality three possible
states: TRUE, FALSE or NULL. As a consequence, you need to define how to propagate a
NULL just as well as propagating a TRUE or a FALSE. Below you find the truth tables for the
OR and AND operators, but using Kleenean logic.
In these tables, the third value is referred to as MAYBE, but it could just as well be denoted as
“Undefined”, “Unknown”, “Indeterminate” or NULL.
Kleenean logic is a tool that helps us define how NULL should be propagated through different
operators and functions. The two truth tables above (MAYBE replaced with NULL) define how
the OR and AND operators work in QlikView.
The tables above show that [NULL OR TRUE] evaluates to TRUE, whereas [NULL AND TRUE]
evaluates to NULL.
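One way to check these rules is a small script that applies both operators to a NULL. This is only a sketch: the table and field names are made up, and the expected results in the comments are simply read off the Kleenean truth tables described above.

```qlikview
// Expected results according to the Kleenean truth tables:
//   NULL OR TRUE   -> TRUE      NULL OR FALSE   -> NULL
//   NULL AND TRUE  -> NULL      NULL AND FALSE  -> FALSE
// The upper (preceding) load turns each result into readable text.
// IsNull() is tested first, because an If() with a NULL condition
// would otherwise fall through to its ELSE expression.
Kleene:
LOAD
    If( IsNull( OrTrue ),   'NULL', If( OrTrue,   'TRUE', 'FALSE' ) ) as [NULL OR TRUE],
    If( IsNull( OrFalse ),  'NULL', If( OrFalse,  'TRUE', 'FALSE' ) ) as [NULL OR FALSE],
    If( IsNull( AndTrue ),  'NULL', If( AndTrue,  'TRUE', 'FALSE' ) ) as [NULL AND TRUE],
    If( IsNull( AndFalse ), 'NULL', If( AndFalse, 'TRUE', 'FALSE' ) ) as [NULL AND FALSE];
LOAD
    Null() or  True()  as OrTrue,
    Null() or  False() as OrFalse,
    Null() and True()  as AndTrue,
    Null() and False() as AndFalse
AutoGenerate 1;
```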
NULL propagation
NULL propagates in all expressions. This means that the calculation will be made although
some operands or function parameters are NULL. The result is often NULL, but sometimes not.
Here follow the main rules for how NULL propagates.
• The NOT operator will return NULL only if the operand is NULL.
• If the condition (the first parameter) of an If() function evaluates to NULL, the ELSE
  expression (the third parameter) will be returned. I.e. the If() function does not
  necessarily return NULL just because the condition is NULL.
• Boolean operators return NULL when appropriate. For the AND operator it is enough
  that one of the operands is FALSE for the operation to return FALSE, and for the OR
  operator it is enough that one of the operands is TRUE for the operation to return
  TRUE. See below.
• The equality and inequality operators will return NULL if both operands are NULL. But if
  only one of the operands is NULL, the comparison will always deny equality, i.e. the
  equality operator will return FALSE and the inequality operator will return TRUE.
  So, in principle, relational operators to test for NULL should be avoided. Such a test will
  never confirm the existence of NULL. Use the function IsNull(Field) instead. Examples:
    o Field = Null() evaluates to FALSE or NULL. It will never evaluate to TRUE.
    o IsNull( Field ) evaluates to TRUE or FALSE depending on the field value.
• Other relational operators will always return NULL if any of the operands is NULL.
• The LIKE operator is special. It is not a symmetric comparison – it is a test of the first
  operand only to see whether this resembles the second operand. It returns NULL if the
  second operand is NULL, and only then.
• The string concatenation operator returns NULL only if both operands are NULL. This
  means that string concatenation works fine if one of the operands is NULL.
  Examples:
    o Null() & Null() evaluates to NULL.
    o 'xyz' & Null() evaluates to 'xyz'.
    o '' & Null() evaluates to an empty string and not to NULL.
• Any arithmetic operation having NULL on one side will evaluate to NULL. If you use the
  Range functions instead, you can get an expression that disregards the NULL.
  Examples:
    o Null() + 5 evaluates to NULL
    o RangeSum( Null(), 5 ) evaluates to 5
• Most numeric functions return NULL if the parameter is NULL. The Range functions are
  an exception. Examples:
    o Sqrt( Null() ) returns NULL
    o Sqrt( RangeSum( Null() ) ) returns 0
• Most string functions return NULL if the parameter is NULL. If you want an empty string
  instead, you can use a string concatenation inside the string function. Examples:
    o Left( Null(), 1 ) evaluates to NULL
    o Left( '' & Null(), 1 ) evaluates to an empty string
    o Len( Null() ) evaluates to 0, which means that NULL has zero length
    o Index( Null(), 'x' ) evaluates to 0, which means that the string 'x' could not be
      found in the first parameter. Hence the index function works normally also with
      NULL.
• An If() function where only two parameters are used will use Null() as ELSE
  expression.
• Aggregation functions are normally not affected by NULLs, i.e. they do not use the
  record with the NULL in the calculation. Except NullCount() which counts the records
  having NULL.
• The Only() aggregation function returns NULL if there are no or several possible values
  of the parameter. It returns the parameter value if there is only one possible value. This
  includes the case when there are several records with only one possible value, but at
  the same time there are some records with NULL.
• The NullCount() aggregation function returns the number of records with NULLs. When
  used in charts, it returns 1 also for missing values of type one (although there are no
  records in data) and 0 for missing values of type two.
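To show how some of these rules could be applied in a load script, here is a small sketch. The table, file and field names are hypothetical, and Freight and Comment are assumed to be NULL for some records.

```qlikview
// Hypothetical table, file and field names, for illustration only.
// TotalStrict:        arithmetic with a NULL operand yields NULL.
// TotalLenient:       RangeSum() disregards the NULL and returns Amount.
// CommentStart:       a string function with a NULL parameter yields NULL.
// CommentStartNoNull: concatenating '' first yields an empty string instead.
Orders:
LOAD
    OrderID,
    Amount + Freight             as TotalStrict,
    RangeSum( Amount, Freight )  as TotalLenient,
    Left( Comment, 10 )          as CommentStart,
    Left( '' & Comment, 10 )     as CommentStartNoNull
FROM Orders.qvd (qvd);
```

Which variant is preferable depends on whether a missing value should be carried through as NULL or treated as zero or an empty string in later calculations.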
How does QlikView display NULL?
How does QlikView show the concept of nothing – when this is the relevant answer to the
user’s click? To investigate this, let’s again use an example with data from an orders database
with two tables: Customers and Orders, linked by CustomerID.
List box
In the picture below, you have a selection of two customers that haven’t placed any orders, so
NULL is the only possible “value” for the OrderID. As a result, all values in the OrderID list box
are gray.
In other words, for a list box it is simple: NULL is not visible as an explicit list box entry. If all
entries are marked as gray, NULL is the answer to the click.
If a list box has a blank entry that you can click on and select, then that entry is not a NULL.
NULLs are never visible in list boxes and can never be
selected. Instead, you have an empty string or some kind of white space.
Table box
In a table box, as well as in all other places where a NULL can occur, e.g. labels, text boxes,
buttons, etc., NULL is displayed as a dash. In these places, NULL is visible but not clickable.
Chart Dimensions
For a chart, it becomes more complicated. First of all, a NULL can occur either as a
dimensional value or in the expression. These are two very different cases and should not be
confused.
With the above data, it would be reasonable to make a chart that shows sales per customer. If
there are orders that are not attributed to any customer, or if the order contains an unknown
CustomerID, then you will get a NULL in the dimension of the chart – a NULL which is
displayed as a dash. Below you can see that order nr 10874 has no customer associated:
If you don’t want to show lines with NULL, you can suppress these on the Dimensions tab in
the chart properties.
If you don’t want to show lines with zero as expression value, you can suppress these on the
Presentation tab in the chart properties.
Here you should note two things: First of all, when Amount is NULL, the Sum(Amount) is
marked with a dash and not with a zero. The reason is that the expression is never calculated
for these cells, since the combination of the dimensional values does not exist in the data.
Secondly, since NULL is suppressed in the product column, customers that have not bought
anything are also suppressed. This is a consequence of the logical inference.
If you do not want to show dashes for NULL – you’d rather have a zero – then you can turn on
a chart setting that populates missing cells (Chart Properties -> Presentation -> Populate
missing cells). Then an internal virtual table will be generated and missing values will be
populated with NULLs.
Some tips
within the QVD and will hence not convert the NULLs. You will need to force QlikView to load
the data unoptimized. A simple where-clause will do the trick, but I prefer to write code that is
more explicit, so that I remember why I did it when I look at the script a year later. My suggestion is
hence:
Load If( Len( Trim( Field ) ) > 0, Field, '$(NullValue)' ) as NewField …
Another problem is that NULLs created through the Join and Concatenate prefixes will only be
partially converted. The only way to make these values visible is to run a second pass through
the table, using Load resident after the join or the concatenate has been performed. Example:
// -------- Temporary table --------
Temp:
LOAD ... FROM A;
Concatenate
LOAD ... FROM B;
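The second pass described above could look roughly like the sketch below. It is not necessarily the author's exact script: the table name Final and the NULL symbol '<NULL>' are assumptions, and the NullAsValue and NullValue settings may already have been made earlier in the script.

```qlikview
// Assumption: NULLs in all fields should be converted to a visible value.
NullAsValue *;
Set NullValue = '<NULL>';

// -------- Final table (second pass) --------
// NoConcatenate prevents the new table from being appended to Temp,
// which has exactly the same set of fields.
Final:
NoConcatenate
LOAD * Resident Temp;

Drop Table Temp;
```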
How to search and select NULLs or missing values?
NULLs cannot be selected explicitly, so to find the records with NULLs, the selection must
always be made in another field. For example, if you have a list of people and you know that
some of the records lack phone numbers, i.e. the phone field is sometimes NULL, you need to
select the unique person identifiers for which the phone number is NULL.
If your users make such a selection often, it may be a good idea to put the second step, the
selection of excluded values, in a button. Label it clearly and use the action “Select Excluded”.
This simplifies a lot for the user.
This method works for both NULLs and missing values of type one.
One problem with this approach is that it is not obvious which expression to use. Most people
would probably use '=IsNull(phone)'. However, in my view this expression is not correct: it
sometimes yields the wrong result. This can clearly be seen in the example in the picture
below. The search result displays two people, “Y” and “Z”, although only “Z” has NULL as phone
number:
To understand this, you must first understand that there potentially may exist several phone
numbers per person. Hence, QlikView must use an aggregation function to evaluate the
expression. In fact, in advanced searches, QlikView always uses some aggregation function. If
there is no explicit aggregation function in the expression, QlikView assumes the use of the
Only() function. Thus, QlikView uses '=IsNull(Only(phone))' in the search.
The Only() function returns NULL for two different cases: First, when there is no value, and
secondly, when there are several possible values.
Hence, if there is only one record per person, this expression will correctly return TRUE for
people that have no phone number and FALSE for those who have a phone number. But for
people that have several records, the expression may return an unwanted value: If a person
has two different phone numbers, the expression will return TRUE, which is not what you want.
Further, if a person has two records, one with NULL and the other with a phone number, the
Only() function will return the phone number and the expression will return FALSE. Hence, this
search string will not find such a person, although you probably want it to.
Unfortunately, there is a widespread misconception that the search string '=IsNull(Field)' should
be used as advanced search. My recommendation is instead to use the NullCount()
aggregation function in the following search expression:
'=NullCount(Field)>0'
If your users make such a selection often, it may be a good idea to save the advanced search
in a bookmark. The bookmark will remember the advanced search and perform this every time
that it is applied. This simplifies a lot for the user.
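The difference between the two search expressions can also be examined in the script. The sketch below uses made-up sample data and evaluates both expressions per person with Group By. The results listed in the comments are what the rules above would lead one to expect, not verified output.

```qlikview
// Hypothetical sample data. Inline tables cannot hold NULLs, so empty
// phone values are converted to NULL with a two-parameter If().
People:
LOAD
    PersonID,
    If( Len( Trim( Phone ) ) > 0, Phone ) as Phone
INLINE [
PersonID, Phone
A, 111
B, 222
B, 333
C, 444
C,
D,
];

// Expected according to the rules above:
//   A: one phone number          -> IsNullOnly FALSE, HasNullPhone FALSE
//   B: two different numbers     -> IsNullOnly TRUE,  HasNullPhone FALSE
//   C: one number and one NULL   -> IsNullOnly FALSE, HasNullPhone TRUE
//   D: NULL only                 -> IsNullOnly TRUE,  HasNullPhone TRUE
NullCheck:
LOAD
    PersonID,
    IsNull( Only( Phone ) ) as IsNullOnly,
    NullCount( Phone ) > 0  as HasNullPhone
Resident People
Group By PersonID;
```

In this sample, only the NullCount() test would find exactly the people that have a record with a NULL phone number.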
Create a field in the data model that allows selecting excluded values
A third possibility is to prepare logic for selecting NULLs and missing values in the script, i.e. by
creating fields that hold the information necessary.
For the missing phone numbers (true NULLs), you could add a Boolean field that indicates
whether the record has a phone number or not:
People:
LOAD PersonID,
If( Len(Trim(Phone))>0, 'Yes', 'No') as [Has phone],
... FROM People;
For the customers that have not placed any orders (missing values type one), you could add a
Boolean field that indicates whether the customer has placed any orders or not (whether the
key is represented in another table):
Orders:
LOAD CustomerID,
... FROM Orders;
Customers:
LOAD CustomerID,
If( Exists(CustomerID), Dual('Yes', True()), Dual('No', False()) ) as [Has orders],
... FROM Customers;
The Exists() function compares the value of CustomerID in the Customers table with all
previously loaded values of this field. If the value has been loaded before, the Exists() function
returns TRUE; if not, it returns FALSE. Note that in order for this to work, the Orders table must
be loaded first.
The Dual() functions are not necessary, but by using them the created field can be used directly
as a Boolean flag in conditions.
This approach has the advantage that it is obvious for the user how to find the missing values
and at the same time all other search possibilities are maintained.
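Once such a flag exists, it could be used in chart expressions along the following lines. Both expressions are sketches that assume the [Has orders] field created above.

```qlikview
// Number of customers without orders, using the Dual() flag
// directly as a Boolean condition.
Count( DISTINCT If( not [Has orders], CustomerID ) )

// The same count, using the text representation in a set modifier.
Count( {$<[Has orders] = {'No'}>} DISTINCT CustomerID )
```

The first form relies on the numeric part of the Dual() value, the second on its text part.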
Empty element sets, either explicitly e.g. <Product = {}> or implicitly by a search with no hits
<Product = {"Perpetuum Mobile"}>, mean no product, i.e. they will result in an empty element
set. Note that the set modifier <Product = > is not the same as <Product = {} >. The former
merely removes the existing selection in the field, whereas the latter returns an empty set.
Set operators
In set analysis, set operators can be used to find the complement to a selection, i.e. the
excluded records. For example, the set expression
{$<OrderID={"*"}>}
will pick out all possible OrderIDs – but not the NULLs – and consequently
{1-$<OrderID={"*"}>}
will return the complement: it will pick out all customers that have an empty OrderID set, i.e.
that have not placed any orders.
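Used inside an aggregation, the complement could for instance count the customers without orders. This is a sketch, assuming the Customers and Orders tables are linked by CustomerID as in the earlier example.

```qlikview
// Customers that have at least one OrderID.
Count( {$<OrderID={"*"}>} DISTINCT CustomerID )

// Complement: customers that have no OrderID at all.
Count( {1-$<OrderID={"*"}>} DISTINCT CustomerID )
```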
HIC
Lund, Oct 13, 2016