Ex11-Dealing With NULL Values
NULLs in a database can cause headaches. Moreover, the SQL standards are ambiguous about how to handle them: the standards documents do
not make clear exactly how NULLs should be handled in all circumstances.
Sometimes we can avoid NULLs altogether by setting the NOT NULL constraint when we create a table. However, it is worth bearing in mind that
making fields NOT NULL does not always work and can create more headaches than it cures. Not every NULL means there is a problem
with the data.
In SQLite, NULL is the term used to represent a missing value. A NULL value in a table is a value in a field that appears to be blank. However, a
NULL value should not simply be thought of as 0 (zero) or an empty string like ''. Rather, it marks the value as absent or undefined.
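The difference between NULL, 0, and an empty string can be checked directly. The following is a minimal sketch using Python's built-in sqlite3 module and a throwaway in-memory database (not the demo database used in this exercise); the variable names are only illustrative.

```python
import sqlite3

# Throwaway in-memory database, used only for this illustration.
conn = sqlite3.connect(":memory:")

# typeof() reports the storage class of a value; NULL has a class of its own.
types = conn.execute("SELECT typeof(NULL), typeof(0), typeof('')").fetchone()
print(types)  # ('null', 'integer', 'text')

# Comparing anything with NULL yields NULL (None in Python), not true or false.
comparisons = conn.execute("SELECT NULL = 0, NULL = '', NULL = NULL").fetchone()
print(comparisons)  # (None, None, None)

# IS NULL is the reliable test for a missing value.
is_null = conn.execute("SELECT NULL IS NULL, 0 IS NULL, '' IS NULL").fetchone()
print(is_null)  # (1, 0, 0)

conn.close()
```

Note that even NULL = NULL does not evaluate to true, which is why the queries in this exercise use IS NULL and IS NOT NULL.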
%load_ext sql
%sql sqlite:///data/demo.db3
u'Connected: @data/demo.db3'
If you do not remember which tables the demo data contains, you can always query the database for a list of them.
* sqlite:///data/demo.db3
Done.
name
rch
hru
sub
sed
watershed_daily
watershed_monthly
watershed_yearly
channel_dimension
hru_info
sub_info
rch_info
ave_plant
ave_annual_hru
ave_monthly_basin
ave_annual_basin
sqlite_sequence
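One common way to obtain a listing like the one above is to query SQLite's built-in catalog table, sqlite_master. The sketch below assumes nothing about the demo database: it uses an in-memory database with two hypothetical empty tables named after entries in the listing, so it shows the technique rather than the exact cell used in this notebook.

```python
import sqlite3

# In-memory stand-in for the demo database; the two tables are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rch (id INTEGER)")
conn.execute("CREATE TABLE hru (id INTEGER)")

# sqlite_master is SQLite's catalog of tables, indexes, views, and triggers.
rows = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
).fetchall()
table_names = [name for (name,) in rows]
print(table_names)  # ['hru', 'rch']

conn.close()
```

In a notebook cell the same SELECT statement can be run through the %sql magic instead of the sqlite3 module.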
https://colab.research.google.com/drive/1qIJf8VNNemQmdunNnSVWqppFhLRXjWWn#printMode=true 1/5
7/25/24, 4:17 PM ex11-Dealing with NULL Values.ipynb - Colab
The SQLite CREATE TABLE AS statement creates a new table from an existing one by copying the columns and rows returned by a SELECT on the existing table.
%%sql sqlite://
DROP TABLE IF EXISTS watershed_yearly_bk;
CREATE TABLE watershed_yearly_bk AS SELECT * FROM watershed_yearly
Done.
Done.
[]
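It may help to know that CREATE TABLE AS copies column names and data but not constraints, so a column declared NOT NULL in the source table becomes nullable in the copy. The sketch below illustrates this with Python's sqlite3 module; the tables src and src_bk and their columns are hypothetical and are not part of the demo database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical source table with a NOT NULL constraint on prec.
conn.execute("CREATE TABLE src (yr INTEGER PRIMARY KEY, prec REAL NOT NULL)")
conn.executemany("INSERT INTO src VALUES (?, ?)", [(1981, 895.6), (1982, 884.7)])

# Same pattern as the cell above: drop any old copy, then copy the table.
conn.execute("DROP TABLE IF EXISTS src_bk")
conn.execute("CREATE TABLE src_bk AS SELECT * FROM src")

copied = conn.execute("SELECT yr, prec FROM src_bk ORDER BY yr").fetchall()
print(copied)  # [(1981, 895.6), (1982, 884.7)]

# PRAGMA table_info rows are (cid, name, type, notnull, dflt_value, pk).
src_notnull = {row[1]: row[3] for row in conn.execute("PRAGMA table_info(src)")}
bk_notnull = {row[1]: row[3] for row in conn.execute("PRAGMA table_info(src_bk)")}
print(src_notnull["prec"], bk_notnull["prec"])  # 1 0

conn.close()
```

This is why values in a copy made this way can be set to NULL even if the source table forbids it.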
%%sql sqlite://
SELECT YR, PREC_mm
FROM watershed_yearly_bk
-- LIMIT 3
Done.
YR PREC_mm
1981 895.605102539
1982 884.670654297
1983 816.660522461
1984 867.57434082
1985 637.725524902
1986 733.841247559
1987 1007.89447021
1988 895.846618652
1989 930.10546875
1990 751.455383301
1991 984.470336914
1992 907.946350098
1993 1057.77331543
1994 802.126220703
1995 696.852783203
1996 799.967468262
1997 689.377502441
1998 843.460205078
1999 644.301635742
2000 497.951629639
2001 512.250915527
2002 702.02935791
2003 729.944213867
2004 818.378112793
2005 855.009216309
2006 612.290344238
2007 822.174682617
2008 740.08996582
2009 1040.90124512
2010 905.668457031
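The SQLite UPDATE statement modifies existing records in a table. Use a WHERE clause with UPDATE to modify only the selected rows;
without one, every row in the table is updated. Here we set PREC_mm to NULL for every year with less than 850 mm of precipitation, which gives us some missing values to work with.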
%%sql sqlite://
UPDATE watershed_yearly_bk
SET PREC_mm = NULL
WHERE
PREC_mm < 850.0
18 rows affected.
[]
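The "rows affected" message corresponds to the number of rows matched by the WHERE clause. As a rough sketch of the same UPDATE pattern outside the notebook, the example below uses Python's sqlite3 module with a small hypothetical table (yearly_bk, with rounded values) and reads the count from the cursor's rowcount attribute.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical four-row table standing in for watershed_yearly_bk.
conn.execute("CREATE TABLE yearly_bk (yr INTEGER, prec_mm REAL)")
conn.executemany(
    "INSERT INTO yearly_bk VALUES (?, ?)",
    [(1981, 895.6), (1982, 884.7), (1983, 816.7), (1985, 637.7)],
)

# Only rows matching the WHERE clause change; the threshold is passed as a parameter.
cur = conn.execute(
    "UPDATE yearly_bk SET prec_mm = NULL WHERE prec_mm < ?", (850.0,)
)
rows_affected = cur.rowcount
print(rows_affected)  # 2

# The rows at or above the threshold keep their values.
remaining = conn.execute(
    "SELECT yr FROM yearly_bk WHERE prec_mm IS NOT NULL ORDER BY yr"
).fetchall()
print(remaining)  # [(1981,), (1982,)]

conn.close()
```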
%%sql sqlite://
SELECT YR, PREC_mm
FROM watershed_yearly_bk
WHERE PREC_mm IS NULL
Done.
YR PREC_mm
1983 None
1985 None
1986 None
1990 None
1994 None
1995 None
1996 None
1997 None
1998 None
1999 None
2000 None
2001 None
2002 None
2003 None
2004 None
2006 None
2007 None
2008 None
%%sql sqlite://
SELECT COUNT(YR) AS MISSING
FROM watershed_yearly_bk
WHERE PREC_mm IS NULL
Done.
MISSING
18
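The queries above use IS NULL rather than = NULL for a reason: a comparison with NULL evaluates to NULL, which a WHERE clause treats as not true, so = NULL matches no rows. A minimal sketch with Python's sqlite3 module and a hypothetical three-row table t:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table with two missing precipitation values.
conn.execute("CREATE TABLE t (yr INTEGER, prec_mm REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1981, 895.6), (1983, None), (1985, None)],
)

# "= NULL" never matches, because the comparison itself evaluates to NULL.
eq_count = conn.execute(
    "SELECT COUNT(*) FROM t WHERE prec_mm = NULL"
).fetchone()[0]

# "IS NULL" is the correct test for missing values.
is_count = conn.execute(
    "SELECT COUNT(*) FROM t WHERE prec_mm IS NULL"
).fetchone()[0]

print(eq_count, is_count)  # 0 2

conn.close()
```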
In general, there are two main strategies for handling NULLs within a query session without changing the original data in the table: filter the NULLs out, or replace them with sensible values.
The first strategy is quite simple, as we can always filter the data with a WHERE ... IS NOT NULL condition. However, in practice, the data may not be
usable at all if the ratio of NULLs is too high.
%%sql sqlite://
SELECT YR, PREC_mm
FROM watershed_yearly_bk
WHERE PREC_mm IS NOT NULL
Done.
YR PREC_mm
1981 895.605102539
1982 884.670654297
1984 867.57434082
1987 1007.89447021
1988 895.846618652
1989 930.10546875
1991 984.470336914
1992 907.946350098
1993 1057.77331543
2005 855.009216309
2009 1040.90124512
2010 905.668457031
Calculate the counts of NULLs, NOT NULLs, and the total. Keep in mind that COUNT applied to a column ignores NULL values.
%%sql sqlite://
SELECT SUM(CASE WHEN PREC_mm IS NULL THEN 1 ELSE 0 END) AS COUNT_NULLs,
       COUNT(PREC_mm) AS COUNT_NOT_NULLs,
       COUNT(YR) AS TOTAL
FROM watershed_yearly_bk
Done.
COUNT_NULLs COUNT_NOT_NULLs TOTAL
18 12 30
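The same counts give the ratio of NULLs, which is one way to judge whether filtering would leave enough usable data. The sketch below relies on COUNT(*) counting every row while COUNT(column) skips NULLs. It uses Python's sqlite3 module and a hypothetical five-row table t; any threshold for "too high" is left to the reader.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: two known values and three missing ones.
conn.execute("CREATE TABLE t (yr INTEGER, prec_mm REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1981, 895.6), (1982, 884.7), (1983, None), (1985, None), (1986, None)],
)

# COUNT(*) counts all rows; COUNT(prec_mm) counts only non-NULL values.
total, not_null = conn.execute(
    "SELECT COUNT(*), COUNT(prec_mm) FROM t"
).fetchone()
null_ratio = (total - not_null) / total
print(total, not_null, null_ratio)  # 5 2 0.6

conn.close()
```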
Before replacing NULL values with sensible values, first check the database documentation to make sure that each nullable column (a column that is allowed to hold
NULL values) has a documented meaning for NULL from a business perspective.
SQLite provides a more elegant way of handling NULL values: the COALESCE() function, which accepts two or more arguments
and returns the first non-NULL argument. This lets a query substitute a specified default value wherever a column is NULL. If all the arguments are NULL, COALESCE returns
NULL.
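The behaviour of COALESCE can be seen on literal values before applying it to a table. A minimal sketch using Python's sqlite3 module and an in-memory database:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# COALESCE returns its first non-NULL argument, or NULL if there is none.
row = conn.execute(
    "SELECT COALESCE(NULL, 5.0), "
    "       COALESCE(NULL, NULL, 'x'), "
    "       COALESCE(1, 2), "
    "       COALESCE(NULL, NULL)"
).fetchone()
print(row)  # (5.0, 'x', 1, None)

conn.close()
```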
Here we want every NULL in PREC_mm to be treated as the climatological mean of the non-NULL values. AVG ignores NULLs, so it returns exactly that mean.
%%sql sqlite://
SELECT avg(PREC_mm)
From watershed_yearly_bk
Done.
avg(PREC_mm)
936.122131348
%%sql sqlite://
SELECT YR, COALESCE(PREC_mm, 936.122131348) as Precipitation
From watershed_yearly_bk
Done.
YR Precipitation
1981 895.605102539
1982 884.670654297
1983 936.122131348
1984 867.57434082
1985 936.122131348
1986 936.122131348
1987 1007.89447021
1988 895.846618652
1989 930.10546875
1990 936.122131348
1991 984.470336914
1992 907.946350098
1993 1057.77331543
1994 936.122131348
1995 936.122131348
1996 936.122131348
1997 936.122131348
1998 936.122131348
1999 936.122131348
2000 936.122131348
2001 936.122131348
2002 936.122131348
2003 936.122131348
2004 936.122131348
2005 855.009216309
2006 936.122131348
2007 936.122131348
2008 936.122131348
2009 1040.90124512
2010 905.668457031
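Instead of copying the mean into the query by hand, the average can be computed inside the same statement with a scalar subquery. This is one possible variation rather than the approach taken in this notebook. The sketch uses Python's sqlite3 module and a hypothetical three-row table t with round numbers so the result is easy to check.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: two known values and one missing value.
conn.execute("CREATE TABLE t (yr INTEGER, prec_mm REAL)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1981, 900.0), (1982, 1000.0), (1983, None)],
)

# AVG ignores NULLs, so the subquery returns the mean of the known values,
# and COALESCE substitutes it wherever prec_mm is NULL.
filled = conn.execute(
    """
    SELECT yr,
           COALESCE(prec_mm, (SELECT AVG(prec_mm) FROM t)) AS precipitation
    FROM t
    ORDER BY yr
    """
).fetchall()
print(filled)  # [(1981, 900.0), (1982, 1000.0), (1983, 950.0)]

conn.close()
```

Because the substitution happens only in the SELECT, the stored values in the table remain NULL.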
Summary
Dealing with NULL values is a complicated task. It is best to get assistance from domain experts unless you know very clearly what the
NULL values stand for.