Oracle Character Sets
Aino Andriessen

Demo 1

NLS_LENGTH_SEMANTICS
• Initialization parameter: CHAR or BYTE (default BYTE)
• Relevant for multibyte character sets
• Defines the unit (characters or bytes) in which the length of character columns and variables is expressed
• Can be set per session:
    alter session set nls_length_semantics=CHAR;
• Not retroactive: existing columns keep their length semantics; PL/SQL may need to be recompiled
• Can also be set with alter system

NLS_LENGTH_SEMANTICS (2)
• Specify the length semantics of character columns and variables explicitly:
    create table demo (naam varchar2(4 char));
    create table demo (naam varchar2(4 byte));
• In PL/SQL:
    t_naam varchar2(4 char);
    t_naam demo2.naam%TYPE;

Demo 2

Character encoding

Character set
• A character set defines the mapping between a binary/hexadecimal code and a character
• Examples: UTF8, WE8MSWIN1252, WE8ISO8859P1, JA16EUC, US7ASCII, WE8DEC, ...
• Code pages: IBM/Windows terminology, roughly equivalent to a character set; one code page per language

Character sets (2)
• ASCII: 1 byte, 128 characters; the standard English letters without accents
• ISO 8859 and Latin-1 (ISO 8859-1): 1 byte (8 bits), 256 characters
• CP-1252: Windows variant of Latin-1
• UTF-8: variable-length multibyte, max 4 bytes per character
  • ~100,000 characters defined
  • ~1 million code points available
  • multilingual
  • ASCII codes are identical

Examples
Character set     € (hex)
AL32UTF8          E282AC
WE8MSWIN1252      80
ASCII             -
WE8ISO8859P1      -
WE8ISO8859P15     A4 (decimal 164)

Character set     é (hex)
AL32UTF8          C3A9 (decimal 50089)
WE8MSWIN1252      E9 (decimal 233)
ASCII             -
WE8ISO8859P1      E9
WE8ISO8859P15     E9

(- = character not available in this character set)

Unicode / UTF-8 example
The image (not reproduced) shows the number of bytes needed to store different kinds of characters in the UTF-8 character set:
• ASCII characters (C, t and d): 1 byte
• Latin and Greek characters (á, ö and Ø): 2 bytes
• Asian character: 3 bytes
• Supplementary character (treble clef sign): 4 bytes

Diacritics and special characters
• Diacritics are accents placed above, below or even through a letter to change its pronunciation, giving a (modified) letter a language-specific sound: àÿęňĜş, etc.
• Special characters: ßæ¿

Diacritics and special characters (2)
• Single-byte character sets
  • 1 byte for a composed character
  • Not all combinations are possible
  • Code pages
• UTF-8
  • A diacritic has its own code
  • A composed character has its own code
    • Usually equivalent to the base character + the combining diacritic

Database functions
• Character functions
  • substr, substrb, substrc, substr2 (b = bytes, c = complete Unicode characters, 2 = UCS2 code points)
  • instr, ...
  • length, lengthb
• chr(n): returns the character corresponding to the number n in the database character set
    select chr(50089) from dual;   -- é in an AL32UTF8 database (0xC3A9 = 50089)
• dump: returns a VARCHAR2 value containing the datatype code, the length in bytes and the internal representation of expr. The result is always in the database character set.
    select dump(naam, 1017) from demo2;   -- 1017 = bytes shown as characters, plus the character set name
• convert(text, destination_charset, source_charset): converts a character string from one character set to another
• utl_raw: cast_to_raw recasts a VARCHAR2 to RAW without modifying the bytes
    select utl_raw.cast_to_raw(naam) from demo2;
• unistr(): takes a string that may contain Unicode escapes (\xxxx) and returns it in the national character set
    select unistr('Ren\00e9') from dual;

Demo 3

NLS_LANG
• Client character set
• When the client NLS_LANG character set is set to the same value as the database character set, Oracle assumes that the data being sent or received is in the same (correct) encoding, so for performance reasons no conversion or validation takes place. The data is stored exactly as delivered by the client, bit by bit.

NLS_LANG (2)
• Format: language_territory.characterset
    american_america.UTF8
    dutch_the netherlands.WE8MSWIN1252
    american_THE NETHERLANDS.WE8MSWIN1252
• Set as the environment variable NLS_LANG
• The correct setting differs between the Windows GUI (WE8MSWIN1252) and the Windows command line (WE8PC850)
• Not used by Java clients

Demo 4

National character set
• Support for a second character set next to the database character set
• E.g. to allow Japanese in a WE8MSWIN1252 or ISO 8859 database
• Less necessary in a UTF-8 database
• Multibyte
• NVARCHAR2, NCHAR, NCLOB, etc.
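The byte-level claims in the slides above can be checked outside the database: the € and é table, the UTF-8 byte counts, precomposed versus decomposed diacritics, and character versus byte length. This is a minimal Python sketch rather than SQL, so it runs without an Oracle instance. It assumes that Python codecs behave like the Oracle character sets for these particular characters (AL32UTF8 ≈ utf-8, WE8MSWIN1252 ≈ cp1252, US7ASCII ≈ ascii, WE8ISO8859P1 ≈ latin-1, WE8ISO8859P15 ≈ iso8859_15). The helpers hex_code and fits_in_column are hypothetical illustrations, not Oracle functions.

```python
import unicodedata

# Assumed mapping from Oracle character set names to Python codecs.
# Good enough for the characters on the slides; not an official equivalence.
ORACLE_TO_PYTHON = {
    "AL32UTF8": "utf-8",
    "WE8MSWIN1252": "cp1252",
    "US7ASCII": "ascii",
    "WE8ISO8859P1": "latin-1",
    "WE8ISO8859P15": "iso8859_15",
}


def hex_code(char: str, charset: str) -> str:
    """Hex byte code of char in an (assumed) Oracle charset, '-' if not representable."""
    try:
        return char.encode(ORACLE_TO_PYTHON[charset]).hex().upper()
    except UnicodeEncodeError:
        return "-"


def fits_in_column(value: str, size: int, semantics: str) -> bool:
    """Hypothetical check: would value fit in VARCHAR2(size CHAR|BYTE) in an AL32UTF8 database?"""
    if semantics == "CHAR":
        return len(value) <= size                # code points, comparable to LENGTH
    return len(value.encode("utf-8")) <= size    # bytes, comparable to LENGTHB


if __name__ == "__main__":
    # The euro sign and e-acute table from the slides (names printed to stay ASCII-safe).
    for char in ("\u20ac", "\u00e9"):
        for charset in ORACLE_TO_PYTHON:
            print(unicodedata.name(char), charset.ljust(14), hex_code(char, charset))

    # UTF-8 byte counts: ASCII, Latin, CJK, supplementary (treble clef U+1D11E).
    for char in ("C", "\u00e1", "\u4e2d", "\U0001d11e"):
        print(f"U+{ord(char):04X}", unicodedata.name(char), len(char.encode("utf-8")), "byte(s)")

    # Precomposed vs decomposed e-acute: same text, different bytes (c3a9 vs 65cc81).
    precomposed = "\u00e9"
    decomposed = unicodedata.normalize("NFD", precomposed)  # 'e' + U+0301
    print(precomposed.encode("utf-8").hex(), decomposed.encode("utf-8").hex())

    # 'René' is 4 characters but 5 bytes in UTF-8.
    print(fits_in_column("Ren\u00e9", 4, "CHAR"), fits_in_column("Ren\u00e9", 4, "BYTE"))
```

hex_code plays roughly the role of dump(..., 16) for a single character. fits_in_column shows why 'René' fits in varchar2(4 char) but not in varchar2(4 byte) in an AL32UTF8 database.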
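The pass-through rule on the NLS_LANG slide is easiest to see in a small model. This is a simplified, hypothetical sketch. store() assumes that Oracle either stores the client's bytes unchanged (when the declared client character set equals the database character set) or converts them from the declared set to the database set, replacing unconvertible data. It ignores Oracle's real validation details. Python codec names stand in for Oracle character sets (utf-8 ≈ AL32UTF8, cp1252 ≈ WE8MSWIN1252).

```python
# Hypothetical model of the NLS_LANG pass-through rule. Python codec names
# stand in for Oracle character sets ("utf-8" ~ AL32UTF8, "cp1252" ~ WE8MSWIN1252).


def store(text: str, client_actual: str, client_declared: str, db_charset: str) -> bytes:
    """Bytes that end up in the column when a client sends text.

    client_actual:   encoding the client application really uses
    client_declared: character set announced through NLS_LANG
    db_charset:      database character set
    """
    sent = text.encode(client_actual)
    if client_declared == db_charset:
        # Same character set declared: no conversion, no validation, stored bit by bit.
        return sent
    # Otherwise the bytes are converted from the declared set to the database set;
    # undecodable bytes become a replacement character (simplified).
    return sent.decode(client_declared, errors="replace").encode(db_charset)


if __name__ == "__main__":
    e_acute = "\u00e9"
    cases = {
        "correct: cp1252 client, NLS_LANG says cp1252": ("cp1252", "cp1252"),
        "pass-through: cp1252 client, NLS_LANG says utf-8": ("cp1252", "utf-8"),
        "double conversion: utf-8 client, NLS_LANG says cp1252": ("utf-8", "cp1252"),
    }
    for label, (actual, declared) in cases.items():
        stored = store(e_acute, actual, declared, "utf-8")
        # How a correctly configured UTF-8 client would display the stored bytes.
        shown = stored.decode("utf-8", errors="replace")
        print(label, "->", stored.hex(), ascii(shown))
```

In this model the second case stores a lone E9 byte, which is not valid UTF-8, so a correctly configured client sees a replacement character. The third case stores the two characters 'Ã©' instead of 'é'. In both cases the client that wrote the data reads it back correctly, because the same (wrong) setting is applied on the way out. This makes the problem easy to miss.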
Case
• The TELETEX character set no longer exists in Oracle
    select convert(naam, 'TELETEX', 'UTF8') from tabel;
• Locale Builder

    SQL> select name from emp;
    SQL> select name from emp@db;
    SQL> select utl_raw.cast_to_varchar2(utl_raw.cast_to_raw(name)) from emp@db;
    SQL> select utl_raw.cast_to_varchar2(utl_raw.cast_to_raw@db(name)) from emp@db;

Question
• Diacritic-insensitive search
• Case-insensitive search

Summary
• nls_length_semantics: always define character columns with explicit length semantics (CHAR or BYTE)
• Oracle performs automatic character set conversion
• WYSINAWYG: what you see is not always what you get
• Use a Java client
• Working with character sets can be confusing
• UTF-8 (in Oracle: AL32UTF8) is often the preferred character set

References
• Unicode and UltraEdit: http://www.ultraedit.com/support/tutorials_power_tips/ultraedit/unicode.html
• NLS_LANG FAQ: http://www.oracle.com/technology/tech/globalization/htdocs/nls_lang%20faq.htm
• Oracle Globalization Support Guide: http://download.oracle.com/docs/cd/B28359_01/server.111/b28298/toc.htm
• Wikipedia
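The open question about diacritic-insensitive and case-insensitive searching can be illustrated with the precomposed/decomposed idea from the diacritics slides. This is a minimal Python sketch, not an Oracle solution. search_key and matches are hypothetical helpers that decompose text (NFD), drop combining diacritics and case-fold the result. Inside Oracle the usual route is linguistic comparison, for example nls_comp=LINGUISTIC with nls_sort=BINARY_AI (accent- and case-insensitive) or BINARY_CI (case-insensitive only). Verify this against the Globalization Support Guide for your release.

```python
import unicodedata


def search_key(text: str) -> str:
    """Hypothetical normalization for diacritic- and case-insensitive matching."""
    # NFD splits precomposed letters into base letter + combining diacritic (é -> e + U+0301).
    decomposed = unicodedata.normalize("NFD", text)
    # Drop the combining marks, keeping the base letters.
    base_only = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # casefold() is a more aggressive lower(): it also maps ß to "ss".
    return base_only.casefold()


def matches(needle: str, haystack: str) -> bool:
    """True if needle occurs in haystack, ignoring case and diacritics."""
    return search_key(needle) in search_key(haystack)


if __name__ == "__main__":
    names = ["Ren\u00e9", "REN\u00c9E", "Rene", "Renata", "Stra\u00dfe"]
    print(ascii([n for n in names if matches("rene", n)]))
    print(matches("strasse", "Stra\u00dfe"))
```

Characters without a Unicode decomposition, such as æ and ø from the special-characters slide, pass through this sketch unchanged (apart from case folding). A real implementation would need an explicit mapping for them.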