Character set migration changes a database's existing character set to a new one, a concept most DBAs are familiar with. It requires selecting an appropriate target character set and dealing with potential data issues such as:
— Data truncation.
— Lossy data.

Most production databases run in a Unicode character set, and migrations from single-byte character sets to Unicode, such as WE8DEC to AL32UTF8, are common. Here we see how to handle lossy data. Two utilities are available for the migration: CSSCAN and CSALTER. CSSCAN identifies the data that is convertible, truncated, and lossy. From 11.2 onward, however, Oracle recommends the Database Migration Assistant for Unicode (DMU), a Java-based GUI tool, instead.

This post covers migrations using CSSCAN and CSALTER. The link below provides information on using the tools:
http://docs.oracle.com/cd/B19306_01/server.102/b14225/ch11charsetmig.htm
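For reference, a typical full-database scan might look like the sketch below. The password, log name, and parallelism are placeholders; csminst.sql must have been run once beforehand to create the CSSCAN schema objects, and CAPTURE=Y is what records the individual row IDs the script below relies on.

```
-- Run once as SYS to install the CSSCAN schema objects:
-- SQL> @?/rdbms/admin/csminst.sql

-- Then scan the whole database against the target character set:
$ csscan \"sys/<password> as sysdba\" FULL=Y TOCHAR=AL32UTF8 LOG=dbcheck CAPTURE=Y PROCESS=2
```

CSSCAN writes its findings to dbcheck.txt (summary), dbcheck.out (per-table detail), and dbcheck.err (individual exceptions).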

Once you get the final report, you will have information on the data that is convertible, lossy, and truncated. The report lists all the reported (convertible, lossy, truncated) rows along with their row IDs.
The script below helps you identify the rows:
###################################################################################

SET serveroutput ON
SET trimspool ON

-- Script to identify lossy data in CHAR and VARCHAR2 columns.
-- Use at your own RISK; this needs testing before execution.

DECLARE
  STMT_1    varchar2(8000);
  STDA_1    varchar2(6000);
  dbcharset varchar2(40);

BEGIN

  -- Get the current database character set.

  STMT_1 := 'select VALUE from NLS_DATABASE_PARAMETERS where parameter = ''NLS_CHARACTERSET''';
  execute immediate STMT_1 into dbcharset;

  DBMS_OUTPUT.PUT_LINE('Current NLS Charset: ' || dbcharset);

  -- Loop over all lossy data from the CSSCAN results.

  FOR Record_1 IN
    ( SELECT DISTINCT e.owner_name, e.table_name, e.column_name
        FROM csmv$errors e
       WHERE e.error_type = 'DATA_LOSS'
         AND e.column_type IN ('CHAR', 'VARCHAR2')
         AND e.owner_name NOT IN ('SYS', 'SYSTEM')
       ORDER BY e.owner_name, e.table_name, e.column_name )
  LOOP

    DBMS_OUTPUT.PUT_LINE('Current Row: ' || Record_1.owner_name || '.' || Record_1.table_name || ' (' || Record_1.column_name || ')');

    -- Loop over the rowids of the rows in that column which are listed as LOSSY.

    FOR Row_1 IN
      ( SELECT data_rowid
          FROM csmv$errors ce
         WHERE ce.error_type = 'DATA_LOSS'
           AND ce.column_type IN ('CHAR', 'VARCHAR2')
           AND ce.owner_name = Record_1.owner_name
           AND ce.table_name = Record_1.table_name
           AND ce.column_name = Record_1.column_name )
    LOOP

      -- If you want per-row details, uncomment the line below.
      -- DBMS_OUTPUT.PUT_LINE('For each row ' || Record_1.owner_name || '.' || Record_1.table_name || ' (' || Record_1.column_name || ') - ' || Row_1.data_rowid);

      STMT_1 := 'select "' || Record_1.column_name || '" from "' || Record_1.owner_name || '"."' || Record_1.table_name;
      STMT_1 := STMT_1 || '" WHERE ROWID = ''' || Row_1.data_rowid || '''';
      execute immediate STMT_1 into STDA_1;

      DBMS_OUTPUT.PUT_LINE(STDA_1);

    END LOOP;
  END LOOP;

  DBMS_OUTPUT.PUT_LINE('Execution Completed!!');
END;
/

#########################################################################################

Note: The script must be executed as the CSSCAN table owner, since it queries the csmv$errors view.

Once you get the data, it's easy to identify the affected rows. If you want to see which invalid characters exist in the current database, use the CONVERT function from the client with NLS_LANG set to the target character set.
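As a minimal sketch of that CONVERT check (the SCOTT.EMP table and ENAME column here are made-up placeholders, not objects from the migration above):

```sql
-- Hypothetical example: SCOTT.EMP and its ENAME column are placeholders.
-- Show each value as it would appear after conversion to the target
-- character set; replacement characters (typically '?') in the output
-- mark characters that have no mapping and would be lossy.
SELECT ename,
       CONVERT(ename, 'AL32UTF8') AS converted_value
  FROM scott.emp;
```

CONVERT takes the destination character set as its second argument and defaults the source to the database character set, so comparing the original column with the converted value highlights exactly which rows carry the problem characters.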