
7/25/24, 4:17 PM ex11-Dealing with NULL Values.ipynb - Colab

ex11-Dealing with NULL Values


The example data in the demo.db3 tables shown earlier are all accurate and complete: every row has a value for each attribute. However,
real data is usually not so clean and tidy, and you will often find NULL values in some tables.

Nulls in a database can cause a few headaches. Moreover, the descriptions in the SQL standards on how to handle NULLs seem ambiguous. It
is not clear from the standards documents exactly how NULLs should be handled in all circumstances.

Sometimes we can avoid NULLs by setting the NOT NULL constraint when we create a table. However, it is worth bearing in mind that
making fields NOT NULL does not always work and could create more headaches than it cures. Not every NULL means there is a problem
with the data.

In SQLite, NULL is the term used to represent a missing value. A NULL value in a table is a value in a field that appears to be blank. However, a
NULL value should not be thought of as 0 (zero) or an empty string like ''. It marks a value as either empty or undefined.
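
The difference between NULL, 0 and an empty string can be checked directly. The following is a minimal sketch using Python's built-in sqlite3 module and a throwaway in-memory database (an assumption for illustration; it does not touch demo.db3).

```python
import sqlite3

# Throwaway in-memory database, used only for this illustration.
conn = sqlite3.connect(":memory:")

# Comparing NULL with = yields NULL (None in Python), neither true nor false.
# Only IS NULL gives a definite answer, and typeof() reports the 'null' type.
null_eq_zero, null_eq_empty, null_is_null, null_type = conn.execute(
    "SELECT NULL = 0, NULL = '', NULL IS NULL, typeof(NULL)"
).fetchone()

print(null_eq_zero, null_eq_empty, null_is_null, null_type)
conn.close()
```

The first two results are None because a comparison with NULL is itself unknown, which is why NULL cannot be treated as just another value.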

This notebook will present:

How to DROP a table IF EXISTS
How to CREATE a new table from an existing table
How to UPDATE a table with a WHERE condition
How to COUNT NULL values with IS NULL
How to give NULLs default values with the SQLite COALESCE function

%load_ext sql

1. Connect to database


As mentioned before, demo.db3 was extracted from a hydrological model. As a result, the data in each table are tidy and complete,
without NULL values. However, we can create a table with NULL values for demonstration.

%sql sqlite:///data/demo.db3

u'Connected: @data/demo.db3'

If you do not remember the tables in the demo data, you can always list them with the following query.

%sql SELECT name FROM sqlite_master WHERE type='table'

* sqlite:///data/demo.db3
Done.
name
rch
hru
sub
sed
watershed_daily
watershed_monthly
watershed_yearly
channel_dimension
hru_info
sub_info
rch_info
ave_plant
ave_annual_hru
ave_monthly_basin
ave_annual_basin
sqlite_sequence
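
Outside the notebook magics, the same catalogue query can be run from plain Python. This is only a sketch: it uses an in-memory database with two stand-in tables whose names are borrowed from the listing above, and their column definitions are hypothetical.

```python
import sqlite3

# In-memory stand-in for demo.db3; the column lists are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE rch (RCH INTEGER, FLOW_OUTcms REAL)")
conn.execute("CREATE TABLE hru (HRU INTEGER, AREAkm2 REAL)")

# sqlite_master holds one row per schema object; keep only the tables.
table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

print(table_names)
conn.close()
```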

2. Create a table with NULL values from an existing table


Take the watershed_yearly table as an example.

First, make a backup table.

The SQLite CREATE TABLE AS statement is used to create a table from an existing table by copying the existing table's columns.

%%sql sqlite://
DROP TABLE IF EXISTS watershed_yearly_bk;
CREATE TABLE watershed_yearly_bk AS SELECT * FROM watershed_yearly

Done.
Done.
[]

Have a quick check of the backup table

%%sql sqlite://
SELECT YR, PREC_mm
FROM watershed_yearly_bk
---LIMIT 3

Done.
YR PREC_mm
1981 895.605102539
1982 884.670654297
1983 816.660522461
1984 867.57434082
1985 637.725524902
1986 733.841247559
1987 1007.89447021
1988 895.846618652
1989 930.10546875
1990 751.455383301
1991 984.470336914
1992 907.946350098
1993 1057.77331543
1994 802.126220703
1995 696.852783203
1996 799.967468262
1997 689.377502441
1998 843.460205078
1999 644.301635742
2000 497.951629639
2001 512.250915527
2002 702.02935791
2003 729.944213867
2004 818.378112793
2005 855.009216309
2006 612.290344238
2007 822.174682617
2008 740.08996582
2009 1040.90124512
2010 905.668457031

Second, set some values to NULL.

The SQLite UPDATE statement is used to modify existing records in a table. You can add a WHERE clause to an UPDATE statement to update
only the selected rows; otherwise all rows are updated.

%%sql sqlite://
UPDATE watershed_yearly_bk
SET PREC_mm = NULL
WHERE
PREC_mm < 850.0

18 rows affected.
[]
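
The backup-then-update pattern above can be sketched end to end with Python's sqlite3 module. The in-memory table below is an assumption for illustration: it holds only the first five rows of the real table, so the row count differs from the 18 reported by the notebook.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Small stand-in for watershed_yearly (first five rows of the real data).
conn.execute("CREATE TABLE watershed_yearly (YR INTEGER, PREC_mm REAL)")
conn.executemany(
    "INSERT INTO watershed_yearly VALUES (?, ?)",
    [
        (1981, 895.605102539),
        (1982, 884.670654297),
        (1983, 816.660522461),
        (1984, 867.57434082),
        (1985, 637.725524902),
    ],
)

# Recreate the backup from scratch, then blank out the low values in the copy.
conn.execute("DROP TABLE IF EXISTS watershed_yearly_bk")
conn.execute("CREATE TABLE watershed_yearly_bk AS SELECT * FROM watershed_yearly")
cur = conn.execute(
    "UPDATE watershed_yearly_bk SET PREC_mm = NULL WHERE PREC_mm < 850.0"
)
rows_affected = cur.rowcount

count_nulls = "SELECT COUNT(*) FROM {} WHERE PREC_mm IS NULL"
original_nulls = conn.execute(count_nulls.format("watershed_yearly")).fetchone()[0]
backup_nulls = conn.execute(count_nulls.format("watershed_yearly_bk")).fetchone()[0]

print(rows_affected, original_nulls, backup_nulls)
conn.close()
```

Only the copy is modified, so the original table stays free of NULLs. Note that a table created with CREATE TABLE AS carries over column names and data but no constraints or primary key.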

3. Find NULLs


NULL values cannot be found with =. We need to use the IS NULL or IS NOT NULL operators to identify them. So, to get all records
with no recorded PREC_mm, we could run this query.

%%sql sqlite://
SELECT YR, PREC_mm
FROM watershed_yearly_bk
WHERE PREC_mm IS NULL

Done.
YR PREC_mm
1983 None
1985 None
1986 None
1990 None
1994 None
1995 None
1996 None
1997 None
1998 None
1999 None
2000 None
2001 None
2002 None
2003 None
2004 None
2006 None
2007 None
2008 None
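
To see why = does not work here, the two filters can be compared side by side. This is a small sketch on a three-row in-memory table whose values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE watershed_yearly_bk (YR INTEGER, PREC_mm REAL)")
conn.executemany(
    "INSERT INTO watershed_yearly_bk VALUES (?, ?)",
    [(1981, 895.6), (1983, None), (1985, None)],  # None is stored as NULL
)

# '= NULL' evaluates to NULL for every row, so the WHERE clause keeps nothing.
rows_with_equals = conn.execute(
    "SELECT YR FROM watershed_yearly_bk WHERE PREC_mm = NULL ORDER BY YR"
).fetchall()

# 'IS NULL' is the operator that actually tests for missing values.
rows_with_is_null = conn.execute(
    "SELECT YR FROM watershed_yearly_bk WHERE PREC_mm IS NULL ORDER BY YR"
).fetchall()

print(rows_with_equals, rows_with_is_null)
conn.close()
```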

Count the years with NULLs.

%%sql sqlite://
SELECT COUNT(YR) AS MISSING
FROM watershed_yearly_bk
WHERE PREC_mm IS NULL

Done.
MISSING
18

:) This is exactly the number of rows we updated.

4. Handle NULLs


NULLs can be ambiguous and annoying as they are identified differently depending on the data source. Tables can have NULL values for a number of
reasons, such as observations that were not recorded and data corruption.

In general, there are two main strategies for handling NULLs at query time without changing the original data in the table.

4.1 Do not use rows with NULL values

This strategy is quite simple, as we can always filter the data with a WHERE ... IS NOT NULL condition. However, in practice, the data might not be
usable at all if the ratio of NULLs is too high.

%%sql sqlite://
SELECT YR, PREC_mm
FROM watershed_yearly_bk
WHERE PREC_mm IS NOT NULL

Done.
YR PREC_mm
1981 895.605102539
1982 884.670654297
1984 867.57434082
1987 1007.89447021
1988 895.846618652
1989 930.10546875
1991 984.470336914
1992 907.946350098
1993 1057.77331543
2005 855.009216309
2009 1040.90124512
2010 905.668457031

Calculate the counts of NULLs, NOT NULLs and the total. Keep in mind that COUNT applied to a column ignores NULL values in that column.

%%sql sqlite://
SELECT SUM(CASE WHEN PREC_mm IS NULL THEN 1 ELSE 0 END) AS COUNT_NULLs,
       COUNT(PREC_mm) AS COUNT_NOT_NULLs,
       COUNT(YR) AS TOTAL
FROM watershed_yearly_bk

Done.
COUNT_NULLs COUNT_NOT_NULLs TOTAL
18 12 30
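
Since COUNT(column) skips NULLs while COUNT(*) counts every row, the two can also be combined to get the ratio of NULLs, which helps decide whether dropping rows is acceptable. The sketch below assumes a made-up five-row in-memory table with three NULLs, chosen to give the same 60% ratio as the 18 out of 30 above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE watershed_yearly_bk (YR INTEGER, PREC_mm REAL)")
conn.executemany(
    "INSERT INTO watershed_yearly_bk VALUES (?, ?)",
    [(1981, 895.6), (1982, 884.7), (1983, None), (1985, None), (1986, None)],
)

# COUNT(*) counts rows; COUNT(PREC_mm) counts only rows where PREC_mm is not NULL.
total, not_nulls, nulls, null_ratio = conn.execute(
    """
    SELECT COUNT(*),
           COUNT(PREC_mm),
           COUNT(*) - COUNT(PREC_mm),
           1.0 * (COUNT(*) - COUNT(PREC_mm)) / COUNT(*)
    FROM watershed_yearly_bk
    """
).fetchone()

print(total, not_nulls, nulls, null_ratio)
conn.close()
```

Multiplying by 1.0 forces floating-point division; without it SQLite would perform integer division and the ratio would come out as 0.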

4.2 Replace NULL values with sensible values

Before replacing NULL values with sensible values, first check the database documentation to make sure that it states what a NULL means,
from a business perspective, in each nullable column (a column that is allowed to contain NULL values).

SQLite provides an elegant way of handling NULL values: the COALESCE() function, which accepts two or more arguments
and returns the first non-NULL argument. Passing a default value as the last argument therefore substitutes that default whenever the preceding
arguments are NULL. If all the arguments are NULL, COALESCE returns NULL.

The following illustrates the syntax of the COALESCE function:


COALESCE(parameter1, parameter2, …);
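
The behaviour described above can be checked on literal values. This is a minimal sketch using Python's sqlite3 module with an in-memory database; no table is needed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# COALESCE returns its first non-NULL argument, or NULL if there is none.
first_non_null, all_null, already_set = conn.execute(
    "SELECT COALESCE(NULL, 5), COALESCE(NULL, NULL), COALESCE(1, 2)"
).fetchone()

print(first_non_null, all_null, already_set)
conn.close()
```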

Here we want all NULLs of PREC_mm to be treated as the climatological mean of the non-NULL values.

Calculate the mean of the non-NULL values (AVG ignores NULLs).

%%sql sqlite://
SELECT avg(PREC_mm)
From watershed_yearly_bk

Done.
avg(PREC_mm)
936.122131348

Replace NULLs with the mean of the non-NULL values calculated above.

%%sql sqlite://
SELECT YR, COALESCE(PREC_mm, 936.122131348) as Precipitation
From watershed_yearly_bk

Done.
YR Precipitation
1981 895.605102539
1982 884.670654297
1983 936.122131348
1984 867.57434082
1985 936.122131348
1986 936.122131348
1987 1007.89447021
1988 895.846618652
1989 930.10546875
1990 936.122131348
1991 984.470336914
1992 907.946350098
1993 1057.77331543
1994 936.122131348
1995 936.122131348
1996 936.122131348
1997 936.122131348
1998 936.122131348
1999 936.122131348
2000 936.122131348
2001 936.122131348
2002 936.122131348
2003 936.122131348
2004 936.122131348
2005 855.009216309
2006 936.122131348
2007 936.122131348
2008 936.122131348
2009 1040.90124512
2010 905.668457031
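
Typing the mean in by hand works, but the literal goes stale if the data change. One possible alternative, sketched here with Python's sqlite3 module, is to let a scalar subquery compute the mean inside COALESCE. The three-row in-memory table and its values are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE watershed_yearly_bk (YR INTEGER, PREC_mm REAL)")
conn.executemany(
    "INSERT INTO watershed_yearly_bk VALUES (?, ?)",
    [(1981, 900.0), (1982, None), (1983, 800.0)],
)

# The scalar subquery computes the mean of the non-NULL values (AVG skips NULLs),
# and COALESCE substitutes it wherever PREC_mm is NULL.
filled = conn.execute(
    """
    SELECT YR,
           COALESCE(PREC_mm, (SELECT AVG(PREC_mm) FROM watershed_yearly_bk))
               AS Precipitation
    FROM watershed_yearly_bk
    ORDER BY YR
    """
).fetchall()

print(filled)
conn.close()
```

As in the notebook query, this only changes the query result; the stored NULLs remain in the table.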

Summary

Dealing with NULL values is a complicated task. It is best to get assistance from domain experts, or to be sure you know exactly what the
NULL values stand for.
