Cleaning Text: Ron de Bruin and Norman Harker, March 2005

This document provides formulas for cleaning and standardizing text data in Excel. It presents formulas to remove excess spaces, non-printing characters, and terminating punctuation from cell values. The formulas combine multiple text functions like TRIM, CLEAN, SUBSTITUTE, and LEFT to clean the data in a single formula. Examples are provided to demonstrate how the formulas work on sample text values.

Copyright
© Attribution Non-Commercial (BY-NC)

Cleaning Text

Ron de Bruin and Norman Harker, March 2005


Note: These formulas were developed for use with the DataRefiner Excel Addin.

Remove Excess Spaces, Substitute " " for all CHAR(160) and non-printing characters (especially CHAR(10)), and remove ".", "?" and "!" from end.

Cell  Source text              Result
A7    Norman John⏎Harker       Norman John Harker
A8    Norman⏎Harker!           Norman Harker
A9    Norman⏎Harker?           Norman Harker
A10   Norman⏎Harker.           Norman Harker
A11   Norman Harker?           Norman Harker
A12   Norman Harker            Norman Harker
A13   Norman John Ron Harker   Norman John Ron Harker

(⏎ marks a line break inside the source cell; surrounding spaces are not shown.)

Formula for A7, filled down with the matching cell reference through A18 (A14 to A18 are empty):
=TRIM(CLEAN(SUBSTITUTE(LEFT(TRIM(A7),LEN(TRIM(A7))-OR(RIGHT(TRIM(A7))={"?","!","."})),CHAR(160)," ")))

Formula Comments
Wow! 7 Text functions and a coerced logical formula using an array structure in a calculation (of length)!
Look Mum! No VBA!
The above formula is a composite of the two formulas below.
Note that if used on a large data range, the result will be increased size of workbook and slower re-calculation.
In most cases this formula will be used once only and after use you should use Copy > Paste Special > Values > OK.
The source data can then be deleted or you can copy result over source data.
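
For readers who want to trace the composite formula step by step, here is a rough Python sketch of the same order of operations. It is only an approximation: excel_trim, excel_clean and clean_text are hypothetical helper names, CHAR(160) is assumed to be the non-breaking space U+00A0, and the sample strings are made up because the exact spacing of the source cells is not visible.

```python
import re

NBSP = "\u00a0"                  # assumed equivalent of CHAR(160)
TERMINATORS = ("?", "!", ".")    # stands in for the array constant {"?","!","."}


def excel_trim(text: str) -> str:
    """Approximate TRIM: collapse runs of spaces and drop leading/trailing ones."""
    return re.sub(" +", " ", text).strip(" ")


def excel_clean(text: str) -> str:
    """Approximate CLEAN: drop characters with codes 0 to 31 (includes CHAR(10))."""
    return "".join(ch for ch in text if ord(ch) >= 32)


def clean_text(text: str) -> str:
    """Follow the composite formula from the inside out."""
    trimmed = excel_trim(text)
    # LEN(TRIM(x)) - OR(RIGHT(TRIM(x)) = {...}): the boolean counts as 1 or 0.
    keep = len(trimmed) - (trimmed[-1:] in TERMINATORS)
    without_terminator = trimmed[:keep]              # LEFT(TRIM(x), keep)
    return excel_trim(excel_clean(without_terminator.replace(NBSP, " ")))


print(repr(clean_text("Norman \nHarker! ")))         # 'Norman Harker'
print(repr(clean_text("Norman" + NBSP + "Harker?"))) # 'Norman Harker'
```

One thing the sketch makes visible: the punctuation test runs before CHAR(160) is replaced, so in this emulation a terminator followed by a non-breaking space (for example "Harker." followed by CHAR(160)) keeps its full stop.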

Just Remove Excess Spaces and Substitute " " for CHAR(160) and non-printing characters (especially CHAR(10))

Cell  Source text              Result
A30   Norman John⏎Harker       Norman John Harker
A31   Norman⏎Harker!           Norman Harker!
A32   Norman⏎Harker?           Norman Harker?
A33   Norman⏎Harker.           Norman Harker.
A34   Norman Harker?           Norman Harker?
A35   Norman Harker            Norman Harker
A36   Norman John Ron Harker   Norman John Ron Harker

(⏎ marks a line break inside the source cell; surrounding spaces are not shown.)

Formula for A30, filled down with the matching cell reference through A41 (A37 to A41 are empty):
=TRIM(CLEAN(SUBSTITUTE(A30,CHAR(160)," ")))

Formula Comments
The formula replaces the non-breaking space character CHAR(160) with an ordinary space. CHAR(160) often causes trouble with data imported from HTML sources.
CLEAN removes non-printing characters (character codes 0 to 31), with the most common "culprit" in Excel being CHAR(10).
CHAR(10) is inserted when you use Alt-Enter to force a line wrap. It is not "seen" in the cell it is entered in but will appear as a box in cells that reference it.
The logic of the formula is that we TRIM after CLEANing after SUBSTITUTE of CHAR(160). TRIM only removes ordinary spaces, so CHAR(160) has to be replaced first.
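
The ordering described above can be illustrated with a small Python sketch. This is an emulation under stated assumptions, not Excel itself: excel_trim, excel_clean and clean_spaces are hypothetical names, CHAR(160) is taken to be U+00A0, and the sample input is invented.

```python
import re

NBSP = "\u00a0"  # assumed equivalent of CHAR(160), the non-breaking space


def excel_trim(text: str) -> str:
    """Approximate TRIM: only ordinary spaces are collapsed and stripped."""
    return re.sub(" +", " ", text).strip(" ")


def excel_clean(text: str) -> str:
    """Approximate CLEAN: drop characters with codes 0 to 31."""
    return "".join(ch for ch in text if ord(ch) >= 32)


def clean_spaces(text: str) -> str:
    """Mirror =TRIM(CLEAN(SUBSTITUTE(A30,CHAR(160)," ")))."""
    return excel_trim(excel_clean(text.replace(NBSP, " ")))


sample = "  Norman \nHarker!" + NBSP
print(repr(clean_spaces(sample)))                      # 'Norman Harker!'

# Trimming first and substituting afterwards leaves a trailing space behind,
# because the non-breaking space is not something TRIM removes.
print(repr(excel_trim("Harker" + NBSP).replace(NBSP, " ")))  # 'Harker '
```

The second print shows why SUBSTITUTE sits innermost: once CHAR(160) has become an ordinary space, the outer TRIM can deal with it.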

Just Remove Terminating Punctuation and TRIM

Cell  Source text              Result
A50   Norman John⏎Harker       Norman John⏎Harker
A51   Norman⏎Harker!           Norman⏎Harker
A52   Norman⏎Harker?           Norman⏎Harker
A53   Norman⏎Harker.           Norman⏎Harker
A54   Norman Harker?           Norman Harker
A55   Norman Harker            Norman Harker
A56   Norman John Ron Harker   Norman John Ron Harker

(⏎ marks a line break inside the cell; surrounding spaces are not shown.)

Formula for A50, filled down with the matching cell reference through A61 (A57 to A61 are empty):
=LEFT(TRIM(A50),LEN(TRIM(A50))-OR(RIGHT(TRIM(A50))={"?","!","."}))

Formula Comments
First look at how we determine if there are terminating punctuation characters.

Cell  Source text              OR result
A65   Norman John⏎Harker       FALSE
A66   Norman⏎Harker!           TRUE
A67   Norman⏎Harker?           TRUE
A68   Norman⏎Harker.           TRUE
A69   Norman Harker?           TRUE
A70   Norman Harker            FALSE
A71   Norman John Ron Harker   FALSE

(⏎ marks a line break inside the source cell. A72 to A76 are empty and give FALSE.)

Element as used in the main formula, shown for A65 and filled down through A76:
=-OR(RIGHT(TRIM(A65))={"?","!","."})

Note the "-" before the OR. This forces a return of -1 for TRUE and 0 for FALSE.
We've used an internal array within the OR function to check whether the TRIMmed target cell ends in "?", "!" or ".".
We have to TRIM the target cell in this OR function and elsewhere in the main formula just in case some darned fool has put a space after the punctuation.
The use of the internal array structure allows us to cycle through the options efficiently and without a long OR function that tests different values of the same parameter.
The OR function will normally return TRUE or FALSE but we negate the function and force it to return -1 or 0.
Forcing to -1 or 0 allows us to use this element to calculate the LENgth of the TRIMmed text for use by the LEFT function.
This forcing of logical expressions to numbers is commonly used where addition, subtraction or multiplication by 1 or zero can serve a given objective.
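
The same coercion trick can be sketched in Python, where a boolean also behaves as 1 or 0 in arithmetic. This is an analogy rather than Excel behaviour: excel_trim, ends_with_terminator and strip_terminator are hypothetical names, and the tuple stands in for the array constant.

```python
import re

TERMINATORS = ("?", "!", ".")  # stands in for the array constant {"?","!","."}


def excel_trim(text: str) -> str:
    """Approximate TRIM: collapse runs of spaces and drop leading/trailing ones."""
    return re.sub(" +", " ", text).strip(" ")


def ends_with_terminator(text: str) -> bool:
    """Like OR(RIGHT(TRIM(x)) = {...}): compare the last character to each option."""
    last = excel_trim(text)[-1:]   # empty string when the cell is empty
    return any(last == mark for mark in TERMINATORS)


def strip_terminator(text: str) -> str:
    """Like LEFT(TRIM(x), LEN(TRIM(x)) - OR(...))."""
    trimmed = excel_trim(text)
    keep = len(trimmed) + -ends_with_terminator(text)  # negated bool is -1 or 0
    return trimmed[:keep]


print(-ends_with_terminator("Norman Harker? "))   # -1
print(-ends_with_terminator("Norman Harker"))     # 0
print(repr(strip_terminator("Norman Harker? ")))  # 'Norman Harker'
```

Trimming before the test is what lets "Norman Harker? " (with a trailing space) be recognised, which matches the reason given above for the repeated TRIM calls.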

Acknowledgements
Dave McRitchie's website at:
http://www.mvps.org/dmcritchie/excel/strings.htm
A general resource and information repository on string manipulation.
We highly recommend a visit to Dave's web site Index at:
http://www.mvps.org/dmcritchie/excel/xlindex.htm

Ron de Bruin
Norman Harker
March 2005

Further References
See especially the DataRefiner Addin downloadable free from:
http://www.rondebruin.nl/win/addins/sectionaddins.htm
Also visit Ron's Home Page for links to other addins:
http://www.rondebruin.nl/index.html
And if you like these tips, you might be interested in a lot more at:
http://www.rondebruin.nl/tips.htm
