Changing Us7ascii To We8mswin1252

This document provides guidance on changing the database character set from US7ASCII or WE8ISO8859P1 to WE8MSWIN1252. It recommends using the Csscan tool to check for invalid data and potential data loss during the change. The steps include verifying prerequisites are met, checking for database issues, using Csscan to identify any non-convertible 'lossy' data, and performing the actual character set alteration if Csscan finds no issues. Special considerations are given for different character set conversions based on the source and destination encodings.

Uploaded by

Abhijit Satam
Copyright
© Attribution Non-Commercial (BY-NC)

Changing US7ASCII or WE8ISO8859P1 to WE8MSWIN1252 [ID 555823.1]
Modified 22-SEP-2010 Type BULLETIN Status PUBLISHED

In this Document
Purpose
Scope and Application
Changing US7ASCII or WE8ISO8859P1 to WE8MSWIN1252
1) Prerequisites
US7ASCII versus WE8MSWIN1252
WE8ISO8859P1 versus WE8MSWIN1252
2) Check the source database for:
2.a) Invalid objects.
2.b) Orphaned Datapump master tables (10g and up)
2.c) Unneeded sample schemas/users.
2.d) Objects in the recyclebin (10g and up)
3) Check if there are no invalid code points in the database for the current NLS_CHARACTERSET:
4) Csscan lists "Lossy" data in the scan performed in step 3.
5) Final Csscan run when going to WE8MSWIN1252
6) Performing the actual character set change:
7) Make sure clients are using the correct NLS_LANG setting:
8) Notes:
References

Applies to:

Oracle Server - Enterprise Edition - Version: 8.1.7.4 to 11.2.0.1.0 - Release: 8.1.7 to 11.2
Information in this document applies to any platform.

Purpose

To provide a guide to change the NLS_CHARACTERSET from US7ASCII or WE8ISO8859P1 to
WE8MSWIN1252.

We strongly advise following this note also when using export/import to go from a US7ASCII or
WE8ISO8859P1 database to a WE8MSWIN1252 database.

The current NLS_CHARACTERSET is seen in NLS_DATABASE_PARAMETERS.

select value from NLS_DATABASE_PARAMETERS where parameter='NLS_CHARACTERSET';

For other characterset conversions please see Note 225912.1 Changing the Database Character
Set ( NLS_CHARACTERSET )

Scope and Application

Any DBA wanting to change the current NLS_CHARACTERSET from US7ASCII or
WE8ISO8859P1 to WE8MSWIN1252.

Changing US7ASCII or WE8ISO8859P1 to WE8MSWIN1252


1) Prerequisites

This note uses the Csscan tool. Please install it first:
Note 458122.1 Installing and configuring CSSCAN in 8i and 9i
Note 745809.1 Installing and configuring CSSCAN in 10g and 11g
For an overview of the Csscan output and what it means, please read Note 444701.1 Csscan output
explained

US7ASCII versus WE8MSWIN1252

All characters included in the US7ASCII character set are defined in WE8MSWIN1252 with the
same code point. This means WE8MSWIN1252 is a binary or "strict" superset of US7ASCII.

However there are a few possible problems. While US7ASCII only defines characters like
a-z/A-Z and 0-9, US7ASCII is often (ab)used as the database characterset for storing
non-US7ASCII data.

These are all the printable US7ASCII characters:
!"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]^_`abcdefghijklmnopqrstuvwxyz{|}~
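
The superset relation can be checked outside the database. The following is only a minimal sketch in Python; it assumes that Python's "ascii" and "cp1252" codecs are acceptable stand-ins for the Oracle US7ASCII and WE8MSWIN1252 definitions, which remain the authority.

```python
# Assumption: Python's "ascii" and "cp1252" codecs stand in for Oracle's
# US7ASCII and WE8MSWIN1252 character set definitions.

# Every US7ASCII code point (0x00-0x7F), decoded under both character sets.
ascii_bytes = bytes(range(0x80))
same_in_1252 = ascii_bytes.decode("ascii") == ascii_bytes.decode("cp1252")

# The printable US7ASCII characters are code points 0x21 through 0x7E.
printable = "".join(chr(code_point) for code_point in range(0x21, 0x7F))

print(same_in_1252)
print(printable)
```

If the comparison is True, every 7-bit byte has the same meaning under both codecs, so relabeling pure US7ASCII data as 1252 does not change what it represents.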

A common problem is that in an environment using English and West European or Latin American
(French, Spanish, Portuguese, Dutch, Italian, ...) Windows clients, a lot of setups use an
NLS_LANG set to US7ASCII or do NOT define an NLS_LANG on the client side, which in that case
defaults to US7ASCII. For Windows systems this is not correct, and in most cases it means that
WE8MSWIN1252 codes are actually stored in the US7ASCII database. The most
commonly seen characters are the € symbol and these quotes: ‘’“”. These are the 1252 "smart
quotes" used in Microsoft Office. They look similar to the "normal" US7ASCII quote " in most fonts
but are different characters, which often results in confusion. The Courier New font, for example,
shows the difference quite clearly. This is further documented in step 4).
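
To make the euro and "smart quote" case concrete, here is a small illustrative sketch in Python. It is not part of Csscan. The function name find_1252_only_bytes and the sample bytes are hypothetical, and Python's "cp1252" codec is assumed to match WE8MSWIN1252 for these code points.

```python
def find_1252_only_bytes(raw: bytes) -> list:
    """Return (offset, byte value, cp1252 character) for each byte in 0x80-0x9F.

    Hypothetical helper: these are the code points where Windows-1252 places
    printable characters such as the euro sign and the smart quotes.
    """
    hits = []
    for offset, value in enumerate(raw):
        if 0x80 <= value <= 0x9F:
            try:
                char = bytes([value]).decode("cp1252")
            except UnicodeDecodeError:
                # Python's cp1252 table leaves 0x81, 0x8D, 0x8F, 0x90, 0x9D undefined.
                char = "?"
            hits.append((offset, value, char))
    return hits


# Hypothetical sample: text typed on a Windows client and stored unconverted.
sample = b"Price: \x80 10, \x93quoted\x94 and \x91single\x92"
for offset, value, char in find_1252_only_bytes(sample):
    print(f"offset {offset}: 0x{value:02X} -> {char}")
```

None of these bytes is a valid US7ASCII code point, which is why data like this is reported as "Lossy" when the database is scanned against its declared character set.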

WE8ISO8859P1 versus WE8MSWIN1252

All characters included in the WE8ISO8859P1 character set are defined in WE8MSWIN1252 with
the same code point. This means WE8MSWIN1252 is a binary or "strict" superset of
WE8ISO8859P1.
Note 341676.1 Difference between WE8MSWIN1252 and WE8ISO8859P1 characterset
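
As a rough illustration of where the two character sets differ, the sketch below compares Python's "latin-1" and "cp1252" codecs byte by byte. These codecs are an assumption standing in for the Oracle definitions; Note 341676.1 remains the reference for the Oracle character sets themselves.

```python
# Assumption: Python's "latin-1" and "cp1252" codecs approximate
# WE8ISO8859P1 and WE8MSWIN1252.
differing = []
for value in range(256):
    raw = bytes([value])
    latin1 = raw.decode("latin-1")
    try:
        win1252 = raw.decode("cp1252")
    except UnicodeDecodeError:
        win1252 = None  # undefined in Python's cp1252 table
    if latin1 != win1252:
        differing.append(value)

print(f"0x{differing[0]:02X}-0x{differing[-1]:02X}, {len(differing)} code points")
```

This prints 0x80-0x9F, 32 code points: in that range ISO 8859-1 has no printable characters, while 1252 defines the euro sign, the smart quotes and others. All other byte values decode identically under both codecs.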

However there are a few possible problems. While WE8ISO8859P1 only defines West European
language characters, WE8ISO8859P1 is often (ab)used as database characterset for non-
western data. So make sure that you are only storing English and West European or Latin
American ( French, Spanish, Portuguese, Dutch, Italian,...) data.

There are, for example, customers storing Hebrew in a WE8ISO8859P1 database. In that case do
NOT go to WE8MSWIN1252 but check the "lossy" section in Note 444701.1 Csscan output
explained and use Note 225938.1 Database Character Set Healthcheck. If you have ANY
questions following Note 225938.1, then log an SR.

Another common problem is that in an environment using English and West European or Latin
American (French, Spanish, Portuguese, Dutch, Italian, ...) Windows clients, a lot of setups use an
NLS_LANG set to WE8ISO8859P1 on the client side. For Windows systems this is not correct,
and in most cases it means that WE8MSWIN1252 codes are actually stored in the
WE8ISO8859P1 database. The most commonly seen characters are the € symbol and these
quotes: ‘’“”. These are the 1252 "smart quotes" used in Microsoft Office. They look similar to the
"normal" US7ASCII quote " in most fonts but are different characters, which often results in
confusion. The Courier New font, for example, shows the difference quite clearly. This is further
documented in step 4).

2) Check the source database for:

2.a) Invalid objects.

select owner, object_name, object_type, status from dba_objects where status ='INVALID';

If there are any invalid objects, resolve / drop those before going further.

2.b) Orphaned Datapump master tables (10g and up)

SELECT o.status, o.object_id, o.object_type,
o.owner||'.'||object_name "OWNER.OBJECT"
FROM dba_objects o, dba_datapump_jobs j
WHERE o.owner=j.owner_name AND o.object_name=j.job_name
AND j.job_name NOT LIKE 'BIN$%' ORDER BY 4,2;

Note 336014.1 How To Cleanup Orphaned DataPump Jobs In DBA_DATAPUMP_JOBS ?

2.c) Unneeded sample schemas/users.

The 'HR', 'OE', 'SH', 'PM', 'IX', 'BI' and 'SCOTT' users are by default sample schemas. There is
no point in having these sample schemas in a production system. If the sample schemas exist,
drop them.
This note is useful to identify Oracle provided users in your database: Note 160861.1 Oracle
Created Database Users: Password, Usage and Files

Another user that might be removed is SQLTXPLAIN from Note 215187.1

2.d) Objects in the recyclebin (10g and up)

conn / as sysdba
SELECT OWNER, ORIGINAL_NAME, OBJECT_NAME, TYPE from dba_recyclebin order by 1,2;

If there are objects in the recyclebin then perform:

conn / as sysdba
PURGE DBA_RECYCLEBIN;

This will remove unneeded objects. If they are not purged, an ORA-38301 will be seen during Csalter.

3) Check if there are no invalid code points in the database for the current NLS_CHARACTERSET:

Run csscan with the following syntax for a WE8ISO8859P1 database:

$ csscan \"sys/<syspassword>@<TNSalias> as sysdba\" FULL=Y FROMCHAR=WE8ISO8859P1 TOCHAR=WE8ISO8859P1 LOG=P1check CAPTURE=Y ARRAY=1000000 PROCESS=2

This will create 3 files:

P1check.out a log of the csscan output
P1check.txt a Database Scan Summary Report
P1check.err a log file that normally should contain the rowids of the rows of the tables reported in P1check.txt

Run csscan with the following syntax for a US7ASCII database:

$ csscan \"sys/<syspassword>@<TNSalias> as sysdba\" FULL=Y FROMCHAR=US7ASCII TOCHAR=US7ASCII LOG=US7check CAPTURE=Y ARRAY=1000000 PROCESS=2

This will create 3 files:

US7check.out a log of the csscan output
US7check.txt a Database Scan Summary Report
US7check.err a log file that normally should contain the rowids of the rows of the tables reported in US7check.txt

Always run Csscan connecting with a 'sysdba' connection/user; do not use the "system" or "csmig"
user.

The PROCESS= parameter influences the load on your system: the higher it is (6 or 8 for
example) the faster Csscan will finish; the lower it is the less impact it will have on your
system. Adapt if needed.

Because you've entered the same TO and FROM character set you cannot have any
"convertible" data.
If you have entries in P1check.txt / US7check.txt under the "Lossy" column then proceed to
point 4).
Note that in this case you CANNOT use Export / Import to do the characterset change.
If you have NO entries in P1check.txt / US7check.txt under the "Lossy" column then proceed to
point 5).
Note that in this case you CAN use Export / Import to do the characterset change.

4) Csscan lists "Lossy" data in the scan performed in step 3.

As said before, the first thing you need to make sure of is that ALL your current data is English
and West European or Latin American (French, Spanish, Portuguese, Dutch, Italian, ...). There is
no 100% automated way to detect this using csscan or other "scan" methods. You will need to
double check the clients that are used.

Once you have established that only English and West European / Latin American languages are
stored you need to find out the characterset of the "Lossy" data. The best way to do this is to
open the P1check.err on a US/West European/Latin American Windows system in wordpad or
notepad.

If you can then correctly "see" in notepad, for example, the Euro symbol and other reported
"Lossy" data, you know this is 1252 data. This is the most common case. How this is possible is
explained in detail in Note 252352.1 Euro Symbol Turns up as Upside-Down Questionmark

If the data is "funny" or has "weird" symbols when you open the P1check.err / US7check.err in
notepad/wordpad then you can open P1check.err / US7check.err in the "edit" editor in a DOS box. If the
data is then shown correctly you have US8PC437 or WE8PC850 data stored (depending on the
chcp value). More information about "lossy" is in Note 444701.1 Csscan output explained
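
The notepad versus DOS box comparison amounts to decoding the same bytes under different code pages and seeing which one reads correctly. A minimal Python sketch of that idea follows. The mapping of Oracle names to Python codec names is an assumption, and render_candidates and the sample bytes are hypothetical; the judgment of which rendering looks right still has to be made by a person who knows the data.

```python
# Assumed mapping from Oracle character set names to Python codec names.
CANDIDATES = {
    "WE8MSWIN1252": "cp1252",
    "US8PC437": "cp437",
    "WE8PC850": "cp850",
}


def render_candidates(raw: bytes) -> dict:
    """Decode the same bytes under each candidate code page for visual comparison."""
    return {
        name: raw.decode(codec, errors="replace")
        for name, codec in CANDIDATES.items()
    }


# Two hypothetical "Lossy" values: the first was entered from a 1252 client,
# the second from a DOS (OEM code page) client.
for sample in (b"caf\xe9", b"caf\x82"):
    for oracle_name, text in render_candidates(sample).items():
        print(f"{sample!r} as {oracle_name}: {text}")
```

The first sample reads as "café" only under cp1252, while the second reads as "café" under both cp437 and cp850, which is why the chcp value of the client is needed to tell those two apart.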

If you log an SR then please provide the 3 csscan files. Please do not copy and paste the
output in metalink; upload the files themselves.

5) Final Csscan run when going to WE8MSWIN1252

5.a) If you had "Lossy" data in point 3 and are sure the "Lossy" data is actually WE8MSWIN1252
data then perform a last check with csscan:

Run csscan with the following syntax:

$ csscan \"sys/<syspassword>@<TNSalias> as sysdba\" FULL=Y FROMCHAR=WE8MSWIN1252 TOCHAR=WE8MSWIN1252 LOG=1252check CAPTURE=Y ARRAY=1000000 PROCESS=2

The FROMCHAR=WE8MSWIN1252 is not a typo.

If there was "Lossy" data in point 3 you cannot use export/import yet and you NEED to use Alter
database character set/Csalter to go to WE8MSWIN1252 BEFORE doing any export/import into a
database with another NLS_CHARACTERSET than the current one (US7ASCII or WE8ISO8859P1).

If this FROMCHAR=WE8MSWIN1252 scan gives only "Changeless" then this means the current
CHAR, VARCHAR2, LONG and CLOB data in this database (even in an US7ASCII or
WE8ISO8859P1 database) is all within the defined code range of 1252. If the csscan output is
"Changeless" then running Csalter / Alter database character set WE8MSWIN1252 will correct
the definition of this database to WE8MSWIN1252 without actually touching the stored data. This
is not a conversion as such, this is a correction of the characterset declaration
(NLS_CHARACTERSET) to match the characterset of the actual data in this database.
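
The difference between a correction of the declaration and a conversion can be pictured with a few lines of Python. This is only an analogy under the assumption that Python's "ascii" and "cp1252" codecs behave like the Oracle character sets for this byte; the sample value is hypothetical.

```python
stored = b"100 \x80"  # bytes as they sit in the database; never modified below

# Read under the old declaration (US7ASCII), byte 0x80 is not a valid code point.
try:
    stored.decode("ascii")
    valid_as_ascii = True
except UnicodeDecodeError:
    valid_as_ascii = False

# Read under the corrected declaration, the same bytes give the intended text.
as_1252 = stored.decode("cp1252")

print(valid_as_ascii, as_1252, stored.hex())
```

Only the label used to interpret the bytes changes. The stored bytes are identical before and after, which is why the change completes without touching the data.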

Note that although csscan can give a good indication, we stress that you should not blindly assume all your data is
WE8MSWIN1252 because Csscan gives a "Changeless" report. The reason is documented in Note
444701.1 Csscan output explained

Go to point 5.c)

5.b) If there was no "Lossy" data in point 3 then perform a last check with csscan:

Run csscan with the following syntax:

$ csscan \"sys/<syspassword>@<TNSalias> as sysdba\" FULL=Y TOCHAR=WE8MSWIN1252 LOG=1252check CAPTURE=Y ARRAY=1000000 PROCESS=2

If there was no "Lossy" in point 3 then you can now use export/import (if wanted) to go to
WE8MSWIN1252.
Note:227332.1 NLS considerations in Import/Export - Frequently Asked Questions

5.c) Both scans will create 3 files:

1252check.out a log of the csscan output
1252check.txt a Database Scan Summary Report
1252check.err a log file that normally should contain the rowids of the rows of the tables reported in 1252check.txt

5.c.1) The needed csscan output for 8i/9i to use "Alter Database Character Set".

To use "Alter Database Character Set" the Csscan output needs to be changeless for all CHAR,
VARCHAR2, CLOB and LONG data (Data Dictionary and user).

In order to use "Alter Database Character Set" you need to see in the 1252check.txt file under
[Scan Summary]:
All character type data in the data dictionary remain the same in the new character set
All character type application data remain the same in the new character set

A 'clean' Csscan run must have been completed prior to running "Alter Database Character Set".
A 'clean' scan means that there is no convertible, truncation or lossy data in the database, only
changeless data.

5.c.2) The needed csscan output for 10g and up to use Csalter.

To use Csalter the Csscan output needs to be 'clean', meaning it needs to be:

* changeless for all CHAR, VARCHAR2 and LONG data (Data Dictionary and Application (user) Data)
* changeless for all Application (user) Data CLOB
* changeless and/or convertible for all Data Dictionary CLOB

In order to run Csalter you need to see in the 1252check.txt file under [Scan Summary]:
All character type application data remain the same in the new character set
and under [Data Dictionary Conversion Summary]:
The data dictionary can be safely migrated using the CSALTER script

If you run Csalter without these conditions met then you will see messages like " Unrecognized
convertible data found in scanner result " in the Csalter output and Csalter will abort.

Before you can run Csalter you need to have a 'clean' FULL=Y csscan result that must have been
completed in the past 7 days prior to running Csalter. A 'clean' scan means that there is no
convertible (except Data Dictionary CLOB data which can be convertible and will be handled by
Csalter), truncation or lossy data in the database.

The Csalter script itself takes no arguments. If the above conditions are met then Csalter will change
the NLS_CHARACTERSET to the one specified in the TOCHAR of the last Csscan run.
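
The check of the summary report for the 10g and up case can be scripted. The sketch below is a hypothetical helper, not an Oracle tool: it only looks for the two sentences quoted in 5.c.2 and assumes they appear verbatim in the report, possibly wrapped across lines. The example text is a made-up fragment, not real Csscan output.

```python
# The two sentences quoted in 5.c.2 for 10g and up.
REQUIRED_10G = (
    "All character type application data remain the same in the new character set",
    "The data dictionary can be safely migrated using the CSALTER script",
)


def missing_statements(report_text: str, required=REQUIRED_10G) -> list:
    """Return the required summary sentences that do not appear in the report."""
    # Collapse whitespace so a sentence wrapped across lines still matches.
    normalized = " ".join(report_text.split())
    return [sentence for sentence in required if sentence not in normalized]


# Hypothetical, heavily trimmed fragment of a summary report.
example = """[Scan Summary]
All character type application data remain the same in the new
character set
[Data Dictionary Conversion Summary]
The data dictionary can be safely migrated using the CSALTER script
"""
print(missing_statements(example))
```

In practice the text would be read from the scan summary file (1252check.txt in this note). An empty list is only a first indication; the full report still needs to be reviewed as described above.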

6) Performing the actual character set change:

Perform a backup of the database. Check the backup. Double-check the backup.

6.a) For 9i and 8i

Shutdown the listener and any application that connects locally to the database.
There should be only ONE connection to the database during the WHOLE time and that's the
sqlplus session where you do the change.

1. Make sure the PARALLEL_SERVER (8i) and CLUSTER_DATABASE parameters are set to
false or are not set at all. If you are using RAC you will need to start the database in single
instance mode with CLUSTER_DATABASE = FALSE

conn / as sysdba
sho parameter CLUSTER_DATABASE
sho parameter PARALLEL_SERVER

2. Execute the following commands in sqlplus connected as "/ AS SYSDBA":

conn / as sysdba
SPOOL Nswitch.log
SHUTDOWN IMMEDIATE;
STARTUP MOUNT;
ALTER SYSTEM ENABLE RESTRICTED SESSION;
ALTER SYSTEM SET JOB_QUEUE_PROCESSES=0;
ALTER SYSTEM SET AQ_TM_PROCESSES=0;
ALTER DATABASE OPEN;
ALTER DATABASE CHARACTER SET WE8MSWIN1252;
SHUTDOWN IMMEDIATE;
-- in 8i you need to do another startup/shutdown
STARTUP;
SHUTDOWN;

An alter database typically takes only a few minutes or less. The time depends on the number of columns
in the database, not the amount of data.

3. Restore the PARALLEL_SERVER (8i) and CLUSTER_DATABASE parameters if necessary and
start the database. For RAC start the other instances.

6.b) For 10g and up:

Csalter.plb needs to be used within 7 days after the Csscan run, otherwise you will get a 'The
CSSCAN result has expired' message.

Shutdown the listener and any application that connects locally to the database.
There should be only ONE connection to the database during the WHOLE time and that's the
sqlplus session where you do the change. RAC systems need to be started as single instance.

Use sqlplus connected as "/ AS SYSDBA":


conn / as sysdba
-- Make sure the CLUSTER_DATABASE parameter is set
-- to false or it is not set at all.
-- If you are using RAC you will need to start the database in single instance
-- with CLUSTER_DATABASE = FALSE
sho parameter CLUSTER_DATABASE
-- if you are using spfile, note the values shown by
sho parameter job_queue_processes
sho parameter aq_tm_processes
-- (this is Bug 6005344 fixed in 11g )
-- then do

shutdown
startup restrict
SPOOL Nswitch.log

-- do this alter system or you might run into "ORA-22839: Direct updates on
-- SYS_NC columns are disallowed"
-- This is only needed in 11.1.0.6, fixed in 11.1.0.7, not applicable to 10.2
-- or lower
-- ALTER SYSTEM SET EVENTS '22838 TRACE NAME CONTEXT LEVEL 1,FOREVER';

-- then run Csalter.plb

@?/rdbms/admin/csalter.plb

-- Csalter will ask for confirmation - do not copy and paste all the actions at
-- one time
-- sample Csalter output:

-- 3 rows created.
...
-- This script will update the content of the Oracle Data Dictionary.
-- Please ensure you have a full backup before initiating this procedure.
-- Would you like to proceed (Y/N)?y
-- old 6: if (UPPER('&conf') <> 'Y') then
-- New 6: if (UPPER('y') <> 'Y') then
-- Checking data validility...
-- begin converting system objects

-- PL/SQL procedure successfully completed.

-- Alter the database character set...


-- CSALTER operation completed, please restart database

-- PL/SQL procedure successfully completed.


...
-- Procedure dropped.

-- if you are using spfile then you need to also

-- ALTER SYSTEM SET job_queue_processes=<original value> SCOPE=BOTH;


-- ALTER SYSTEM SET aq_tm_processes=<original value> SCOPE=BOTH;

shutdown
startup

and the database will be WE8MSWIN1252.

Note: in 10.1 you will see csalter asking for "Enter value for 1: ".

-- Would you like to proceed ?(Y/N)?Y


-- old 5: if (UPPER('&conf') <> 'Y') then
-- new 5: if (UPPER('Y') <> 'Y') then
-- Enter value for 1:

-> simply hit enter.

7) Make sure clients are using the correct NLS_LANG setting:

Note 158577.1 NLS_LANG Explained (How does Client-Server Character Conversion Work?)
1.2 What is this NLS_LANG thing anyway?
Note 179133.1 The correct NLS_LANG in a Windows Environment
Note 264157.1 The correct NLS_LANG setting in Unix Environments
Note 229786.1 NLS_LANG and webservers explained.
Note 115001.1 NLS_LANG Client Settings and JDBC Drivers Notes

8) Notes:

* The inverse operation (WE8MSWIN1252 -> WE8ISO8859P1 or WE8MSWIN1252 -> US7ASCII) is
normally NOT possible without losing data!

* You can create a database on Unix with a "Windows" character set like WE8MSWIN1252.
Oracle does not depend on the OS for the DATABASE (or national) character set. The only
restriction is that you cannot use EBCDIC character sets (like used on AS400 etc.) on ASCII
based platforms (like used on Unix and Windows) (or inverse) for the database character set.

* An often asked question is what the correct LANG and NLS_LANG setting is on a Unix
client. The LANG and NLS_LANG used on the server do NOT affect client (listener)
connections. Most likely you are using the telnet/ssh env to do administrative tasks on the
database, not to actually enter user data. In that case you can set the LANG to iso-8859-1 and
the NLS_LANG to AMERICAN_AMERICA.WE8MSWIN1252 in the Unix profile.
While this is technically not 100% correct it's a good solution.
Note that you most likely will not be able to see the euro in the telnet env. If you want to double
check the data we advise using SqlDeveloper, which is a "known good client" that needs no NLS
configuration.
You can download it from http://www.oracle.com/technology/products/database/sql_developer/
If the data is displayed correctly in SqlDeveloper then you are sure it's correct in the database.
As said, the fact that you cannot see the euro in the telnet env does NOT affect Windows clients
who connect through the listener.

If you DO want to see all the symbols known by 1252 (like the Euro) on the Unix prompt we
suggest using a LANG set to UTF-8 and an NLS_LANG set to AMERICAN_AMERICA.UTF8. See Note
264157.1 The correct NLS_LANG setting in Unix Environments
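
The first note above, on the inverse operation, can be illustrated with a short Python sketch. Python's codecs are again only an assumed stand-in for the Oracle character sets, and the substitution character Python uses ("?") is not necessarily the one Oracle uses. The sample text is hypothetical.

```python
# Text that is representable in 1252 but not in ISO 8859-1 or 7-bit ASCII.
text = "Total: 10 \u20ac \u201capprox\u201d"

# Going to 1252 and back is lossless.
round_trip = text.encode("cp1252").decode("cp1252")

# Going to the smaller character sets replaces what they cannot represent.
to_latin1 = text.encode("latin-1", errors="replace")
to_ascii = text.encode("ascii", errors="replace")

print(round_trip == text)
print(to_latin1)
print(to_ascii)
```

The euro sign and the smart quotes have no code point in the smaller character sets, so they come out as replacement characters and the original values cannot be recovered afterwards.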

References

NOTE:225912.1 - Changing the Database Character Set ( NLS_CHARACTERSET )
NOTE:444701.1 - Csscan output explained
NOTE:458122.1 - Installing and Configuring Csscan in 8i and 9i (Database Character Set Scanner)
NOTE:745809.1 - Installing and configuring Csscan in 10g and 11g (Database Character Set Scanner)

Related

Products

• Oracle Database Products > Oracle Database > Oracle Database > Oracle Server -
Enterprise Edition

Keywords
CSSCAN; LOSSY; CHARACTERSET; WE8ISO8859P1; US7ASCII; WE8MSWIN1252
Errors
ORA-38301; ORA-22839
