Parsing and Manipulating Text PDF
Brian Piccolo
Sr. Director, Digital Strategy
Topics
Reformatting string and character data.
+------------+-----------+-------------------+
| first_name | last_name | full_name         |
|------------|-----------|-------------------|
| MARY       | SMITH     | MARY SMITH        |
| LINDA      | WILLIAMS  | LINDA WILLIAMS    |
+------------+-----------+-------------------+
+------------+-----------+-------------------+
| first_name | last_name | full_name         |
|------------|-----------|-------------------|
| MARY       | SMITH     | MARY SMITH        |
| LINDA      | WILLIAMS  | LINDA WILLIAMS    |
+------------+-----------+-------------------+
+-------------------+
| full_name |
|-------------------|
| 1: MARY SMITH |
| 2: LINDA WILLIAMS |
+-------------------+
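The three result sets above can be reproduced with string concatenation. The following is a minimal sketch, not the original slide queries: the inline VALUES rows are a hypothetical stand-in for the customer table, and the column aliases are illustrative.

```sql
-- Inline rows stand in for the customer table (assumption for illustration)
SELECT
    -- The || operator joins strings; any NULL operand makes the result NULL
    first_name || ' ' || last_name AS full_name_operator,
    -- CONCAT() joins its arguments and ignores NULL arguments
    CONCAT(first_name, ' ', last_name) AS full_name_concat,
    -- A non-string value such as an integer id can be mixed into || directly
    customer_id || ': ' || first_name || ' ' || last_name AS numbered_name
FROM (VALUES
    (1, 'MARY', 'SMITH'),
    (2, 'LINDA', 'WILLIAMS')
) AS customer (customer_id, first_name, last_name);
```

The first two columns return the same text for these rows; they differ only when one of the inputs is NULL.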
+-------------------------------------+
| UPPER(email)                        |
|-------------------------------------|
| MARY.SMITH@SAKILACUSTOMER.ORG       |
| PATRICIA.JOHNSON@SAKILACUSTOMER.ORG |
| LINDA.WILLIAMS@SAKILACUSTOMER.ORG   |
+-------------------------------------+
+-------------------+
| LOWER(title) |
|-------------------|
| academy dinosaur |
| ace goldfinger |
| adaptation holes |
+-------------------+
+-------------------+
| INITCAP(title) |
|-------------------|
| Academy Dinosaur |
| Ace Goldfinger |
| Adaptation Holes |
+-------------------+
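As a rough sketch of the case functions behind the three tables above, the query below applies UPPER(), LOWER(), and INITCAP() to one sample row. The inline row and the aliases are assumptions; the email value is assembled from the user name and domain shown in later results.

```sql
SELECT
    UPPER(email)   AS upper_email,    -- MARY.SMITH@SAKILACUSTOMER.ORG
    LOWER(title)   AS lower_title,    -- academy dinosaur
    INITCAP(title) AS initcap_title   -- Academy Dinosaur
FROM (VALUES
    ('MARY.SMITH@sakilacustomer.org', 'ACADEMY DINOSAUR')
) AS sample (email, title);
```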
+---------------------------------------------------------+
| description |
|---------------------------------------------------------|
| A Epic Drama of a Feminist And a Mad Scientist... |
| A Astounding Epistle of a Database Administrator... |
| A Astounding Reflection of a Lumberjack And a Car... |
| A Fanciful Documentary of a Frisbee And a Lumberjack... |
| A Fast-Paced Documentary of a Pastry Chef And a... |
+---------------------------------------------------------+
+---------------------------------------------------------+
| description |
|---------------------------------------------------------|
| A Epic Drama of a Feminist And a Mad Scientist... |
| An Astounding Epistle of a Database Administrator... |
| An Astounding Reflection of a Lumberjack And a Car... |
+---------------------------------------------------------+
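The before and after tables above suggest a REPLACE() call that swaps 'A Astounding' for 'An Astounding'. A minimal sketch under that assumption, with inline rows standing in for the film table:

```sql
SELECT
    description,
    -- REPLACE(source, search, replacement) changes every occurrence of the
    -- search string; rows without a match are returned unchanged
    REPLACE(description, 'A Astounding', 'An Astounding') AS description_fixed
FROM (VALUES
    ('A Epic Drama of a Feminist And a Mad Scientist'),
    ('A Astounding Epistle of a Database Administrator')
) AS film (description);
```

The first row keeps its 'A Epic' prefix because only the exact search string is replaced, which matches the second table above.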
+------------------+------------------+
| title            | reverse(title)   |
|------------------+------------------|
| ACADEMY DINOSAUR | RUASONID YMEDACA |
| ACE GOLDFINGER   | REGNIFDLOG ECA   |
+------------------+------------------+
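A query along these lines would produce the table above. It is a sketch only, with inline rows in place of the film table:

```sql
SELECT
    title,
    REVERSE(title) AS reversed_title  -- characters in reverse order
FROM (VALUES
    ('ACADEMY DINOSAUR'),
    ('ACE GOLDFINGER')
) AS film (title);
```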
Determining the length of a string
SELECT
title,
CHAR_LENGTH(title)
FROM film;
+-------------------+---------------------+
| title             | CHAR_LENGTH(title)  |
|-------------------+---------------------|
| ACADEMY DINOSAUR  | 16                  |
| ACE GOLDFINGER    | 14                  |
| ADAPTATION HOLES  | 16                  |
+-------------------+---------------------+
+-------------------+----------------+
| title             | LENGTH(title)  |
|-------------------+----------------|
| ACADEMY DINOSAUR  | 16             |
| ACE GOLDFINGER    | 14             |
| ADAPTATION HOLES  | 16             |
+-------------------+----------------+
+-------------------+----------------+
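Both tables above report the same counts because, for text input in PostgreSQL, CHAR_LENGTH() and LENGTH() both return the number of characters. A small sketch that puts the two side by side; the inline rows and aliases are illustrative:

```sql
SELECT
    title,
    CHAR_LENGTH(title) AS title_char_length,
    LENGTH(title)      AS title_length
FROM (VALUES
    ('ACADEMY DINOSAUR'),   -- 16 characters, including the space
    ('ACE GOLDFINGER'),     -- 14
    ('ADAPTATION HOLES')    -- 16
) AS film (title);
```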
+-------------------------------------+------------------------+
| email                               | POSITION('@' IN email) |
|-------------------------------------|------------------------|
| MARY.SMITH@sakilacustomer.org       | 11                     |
| PATRICIA.JOHNSON@sakilacustomer.org | 17                     |
| LINDA.WILLIAMS@sakilacustomer.org   | 15                     |
+-------------------------------------+------------------------+
+-------------------------------------+--------------------+
| email                               | STRPOS(email, '@') |
|-------------------------------------|--------------------|
| MARY.SMITH@sakilacustomer.org       | 11                 |
| PATRICIA.JOHNSON@sakilacustomer.org | 17                 |
| LINDA.WILLIAMS@sakilacustomer.org   | 15                 |
+-------------------------------------+--------------------+
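POSITION() and STRPOS() do the same job with different syntax: both return the 1-based index of the first match, or 0 when there is none. The sketch below uses inline sample emails, assembled from the user names and domain shown in the later results, in place of the customer table.

```sql
SELECT
    email,
    POSITION('@' IN email) AS at_position,  -- search string first, then IN
    STRPOS(email, '@')     AS at_strpos     -- source string first
FROM (VALUES
    ('MARY.SMITH@sakilacustomer.org'),        -- 11
    ('PATRICIA.JOHNSON@sakilacustomer.org'),  -- 17
    ('LINDA.WILLIAMS@sakilacustomer.org')     -- 15
) AS customer (email);
```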
SELECT
LEFT(description, 50)
FROM film;
+----------------------------------------------------+
| description |
|----------------------------------------------------|
| A Epic Drama of a Feminist And a Mad Scientist who |
| A Astounding Epistle of a Database Administrator A |
| A Astounding Reflection of a Lumberjack And a Car |
+----------------------------------------------------+
+----------------------------------------------------+
| description |
|----------------------------------------------------|
| who must Battle a Teacher in The Canadian Rockies |
| nd a Explorer who must Find a Car in Ancient China |
| Car who must Sink a Lumberjack in A Baloon Factory |
+----------------------------------------------------+
+----------------------------------------------------+
| description |
|----------------------------------------------------|
| ama of a Feminist And a Mad Scientist who must Bat |
| ing Epistle of a Database Administrator And a Expl |
| ing Reflection of a Lumberjack And a Car who must |
+----------------------------------------------------+
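The three tables above look like the output of LEFT(), RIGHT(), and SUBSTRING() applied to the same column. The sketch below runs all three on one description; the full string is pieced together from the overlapping outputs above and should be treated as an assumption.

```sql
WITH film (description) AS (
    VALUES ('A Epic Drama of a Feminist And a Mad Scientist who must Battle a Teacher in The Canadian Rockies')
)
SELECT
    LEFT(description, 50)          AS first_50,        -- first 50 characters
    RIGHT(description, 50)         AS last_50,         -- last 50 characters
    SUBSTRING(description, 10, 50) AS from_10_for_50   -- 50 characters starting at position 10
FROM film;
```

Positions are 1-based, so starting at 10 skips the nine characters of 'A Epic Dr' and begins at 'ama'.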
+----------------------------------------------------+
| SUBSTRING(email FROM 0 FOR POSITION('@' IN email)) |
|----------------------------------------------------|
| MARY.SMITH |
| PATRICIA.JOHNSON |
| LINDA.WILLIAMS |
+----------------------------------------------------+
+-----------------------------------------------------------------------+
| SUBSTRING(email FROM POSITION('@' IN email)+1 FOR CHAR_LENGTH(email)) |
|-----------------------------------------------------------------------|
| sakilacustomer.org |
| sakilacustomer.org |
| sakilacustomer.org |
+-----------------------------------------------------------------------+
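Combining the two expressions in the column headers above splits an email address into user name and domain. A minimal sketch with inline sample emails standing in for the customer table:

```sql
SELECT
    -- Starting at 0 for N characters returns the first N - 1 characters,
    -- so the '@' itself is excluded
    SUBSTRING(email FROM 0 FOR POSITION('@' IN email)) AS username,
    -- Start one past the '@'; a FOR length longer than the remainder is allowed
    SUBSTRING(email FROM POSITION('@' IN email) + 1 FOR CHAR_LENGTH(email)) AS domain
FROM (VALUES
    ('MARY.SMITH@sakilacustomer.org'),
    ('PATRICIA.JOHNSON@sakilacustomer.org')
) AS customer (email);
```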
SELECT
SUBSTR(description, 10, 50)
FROM
film AS f;
+----------------------------------------------------+
| description |
|----------------------------------------------------|
| ama of a Feminist And a Mad Scientist who must Bat |
| ing Epistle of a Database Administrator And a Expl |
| ing Reflection of a Lumberjack And a Car who must |
+----------------------------------------------------+
Let's practice!
Functions for Manipulating Data in PostgreSQL
Truncating and padding string data
Removing whitespace from strings
TRIM([leading | trailing | both] [characters] from string)
+--------+
| TRIM |
|--------|
| padded |
+--------+
+------------+
| LTRIM |
|------------|
| padded |
+------------+
+----------+
| RTRIM |
|----------|
| padded |
+----------+
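Because trimmed whitespace is hard to see in a result table, the sketch below wraps each result in square brackets. The input literals and aliases are illustrative and not taken from the slides.

```sql
SELECT
    '[' || TRIM('  padded  ')  || ']' AS trim_both,    -- [padded]
    '[' || LTRIM('  padded  ') || ']' AS ltrim_only,   -- [padded  ]
    '[' || RTRIM('  padded  ') || ']' AS rtrim_only,   -- [  padded]
    -- The long form can name which side and which characters to remove
    '[' || TRIM(LEADING '#' FROM '##padded##') || ']' AS trim_leading_hash;  -- [padded##]
```

With no arguments beyond the string, TRIM() removes spaces from both ends, which is the `both` default in the syntax line above.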
+-------------+
| LPAD |
|-------------|
| ####padded |
+-------------+
+-------------+
| LPAD        |
|-------------|
|     padded  |
+-------------+
+------------+
| LPAD       |
|------------|
| padde      |
+------------+
When the length parameter is less than the original length of the string, the result
is truncated.
+-------------+
| RPAD |
|-------------|
| padded#### |
+-------------+
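The padding results above can be reproduced with literal inputs. This is a sketch; the exact arguments used on the slides are assumed from the outputs shown.

```sql
SELECT
    LPAD('padded', 10, '#') AS lpad_hash,   -- ####padded
    LPAD('padded', 10)      AS lpad_space,  -- four spaces, then padded (fill defaults to a space)
    LPAD('padded', 5)       AS lpad_short,  -- padde (target length shorter than the input)
    RPAD('padded', 10, '#') AS rpad_hash;   -- padded####
```

Both functions take the target length as the second argument and an optional fill string as the third.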
To accomplish this, we use the REVERSE() function to find the position of the last whitespace
character within the first 50 characters of the description. Subtracting that position from 50
gives a cut-off point that is at most 50 characters and does not fall in the middle of a word.
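To make the arithmetic concrete, the sketch below breaks the calculation into steps for a single description. The sample string is assembled from the result tables earlier in these slides and is an assumption, as are the aliases.

```sql
WITH sample (description) AS (
    VALUES ('A Epic Drama of a Feminist And a Mad Scientist who must Battle a Teacher in The Canadian Rockies')
)
SELECT
    -- Step 1: hard cut at 50 characters
    LEFT(description, 50) AS first_50,
    -- Step 2: distance from the end of that cut back to the nearest space (4 here)
    POSITION(' ' IN REVERSE(LEFT(description, 50))) AS chars_back_to_space,
    -- Step 3: keep 50 - 4 = 46 characters, ending on a complete word
    LEFT(description, 50 - POSITION(' ' IN REVERSE(LEFT(description, 50)))) AS truncated
FROM sample;
```

For this row the result is 'A Epic Drama of a Feminist And a Mad Scientist'. The 50th character happens to complete the word 'who', yet that word is still dropped, because the technique only looks inside the first 50 characters and always cuts back to the last space it finds there.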
Determine the position of the last whitespace character of the truncated description column and
subtract it from the number 50 as the second parameter in the first LEFT() function.
SELECT
  UPPER(c.name) || ': ' || f.title AS film_category,
  -- Truncate the description without cutting off a word
  LEFT(description, 50 -
    -- Subtract the position of the first whitespace character
    POSITION(
      ' ' IN REVERSE(LEFT(description, 50))
    )
  )
FROM
  film AS f
  INNER JOIN film_category AS fc
    ON f.film_id = fc.film_id
  INNER JOIN category AS c
    ON fc.category_id = c.category_id;
Let's practice!