NAAM
Oracle Character sets
Aino Andriessen

Demo1



nls_length_semantics

• Initialization parameter
• CHAR or BYTE (default)
• Relevant for multibyte character sets
• Defines whether the length of character columns and variables is counted in characters or in bytes
• alter session set nls_length_semantics=CHAR;
  • not retroactive: existing columns keep their length semantics
  • PL/SQL may need to be recompiled
• Can also be set for the whole instance with alter system

nls_length_semantics 2

• Specify the length semantics of character columns and variables explicitly
  • create table demo (naam varchar2(4 char));
  • create table demo (naam varchar2(4 byte));
  • t_naam varchar2(4 char);
  • t_naam demo2.naam%TYPE;
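The difference between the two declarations only shows once a character needs more than one byte. A minimal Python sketch, assuming an AL32UTF8 database (modeled with Python's `utf-8` codec); `fits` is a hypothetical helper, not an Oracle API, and it ignores the VARCHAR2 byte ceiling that still applies under CHAR semantics.

```python
# Assumption: database character set AL32UTF8, modeled with Python's "utf-8" codec.

def fits(value: str, n: int, semantics: str, codec: str = "utf-8") -> bool:
    """Hypothetical helper: True if value fits in VARCHAR2(n CHAR) or VARCHAR2(n BYTE)."""
    if semantics.upper() == "CHAR":
        # CHAR semantics: the declared length counts characters
        return len(value) <= n
    if semantics.upper() == "BYTE":
        # BYTE semantics: the declared length counts bytes in the database character set
        return len(value.encode(codec)) <= n
    raise ValueError(f"unknown length semantics: {semantics!r}")


for name in ["Jan", "René", "Renée"]:
    print(name, fits(name, 4, "CHAR"), fits(name, 4, "BYTE"))
```

This prints `Jan True True`, `René True False` and `Renée False False`: "René" has 4 characters but needs 5 bytes, because é takes 2 bytes in UTF-8.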

Demo2



Character encoding



Character set

• A character set defines the mapping between a binary (hexadecimal) code and a character
  • e.g. UTF8, WE8MSWIN1252, WE8ISO8859P1, JA16EUC, US7ASCII, WE8DEC, ...
• Code pages
  • IBM / Windows terminology
  • roughly analogous to a character set
  • one code page per language
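Because a character set is only a mapping, the same byte can stand for different characters. A small Python sketch; the correspondence between Oracle character set names and Python codec names is an assumption of this example (Python's codecs approximate Oracle's definitions, they are not the same thing), and `decode_as` is a hypothetical helper.

```python
# Assumed correspondence between Oracle character set names and Python codecs
ORACLE_TO_PYTHON = {
    "US7ASCII": "ascii",
    "WE8ISO8859P1": "latin-1",
    "WE8ISO8859P15": "iso8859_15",
    "WE8MSWIN1252": "cp1252",
    "AL32UTF8": "utf-8",
}


def decode_as(raw: bytes, oracle_charset: str):
    """Return the text raw represents in the given character set, or None if it is invalid there."""
    try:
        return raw.decode(ORACLE_TO_PYTHON[oracle_charset])
    except UnicodeDecodeError:
        return None


for code in (b"\x41", b"\x80", b"\xa4"):
    print(code.hex().upper(), {cs: decode_as(code, cs) for cs in ORACLE_TO_PYTHON})
```

Byte 41 is `A` everywhere, whereas 80 is € only in WE8MSWIN1252, and A4 is € in WE8ISO8859P15 but ¤ in WE8ISO8859P1 and WE8MSWIN1252.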

Character sets 2

• ASCII
  • 1 byte
  • 128 characters
  • standard English letters without accents
• ISO 8859 and Latin-1
  • 1 byte (8 bits)
  • 256 characters
• CP-1252
  • Windows variant of Latin-1
• UTF8
  • variable-width, multibyte
  • max 4 bytes
  • ~100,000 characters
    • ~1 million available
  • multilingual
  • ASCII codes are identical
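The properties in this list can be checked with Python's codecs. A minimal sketch in which Python's `latin-1` and `utf-8` codecs stand in for the Oracle character sets; Python's `utf-8` corresponds to Oracle's AL32UTF8, while Oracle's older UTF8 character set stores supplementary characters differently.

```python
import sys

# ASCII: 128 characters; UTF-8 stores each of them as the same single byte
ascii_identical = all(chr(i).encode("utf-8") == bytes([i]) for i in range(128))

# ISO 8859-1 (Latin-1): exactly one byte for each of its 256 characters
latin1_one_byte = all(len(chr(i).encode("latin-1")) == 1 for i in range(256))

# UTF-8 is variable-width: 1 to 4 bytes depending on the code point
widths = {
    hex(cp): len(chr(cp).encode("utf-8"))
    for cp in (0x41, 0x7F, 0x80, 0x7FF, 0x800, 0xFFFF, 0x10000, sys.maxunicode)
}

# Size of the Unicode code space: the "~1 million available" code points
code_space = sys.maxunicode + 1

print(ascii_identical, latin1_one_byte)
print(widths)
print(code_space)  # 1114112
```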

Examples

Character set     Euro (€), hex (decimal)    é, hex (decimal)
AL32UTF8          E282AC                     C3A9 (50089)
WE8MSWIN1252      80 (128)                   E9 (233)
ASCII             -                          -
WE8ISO8859P1      -                          E9 (233)
WE8ISO8859P15     A4 (164)                   E9 (233)
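The codes in this table can be reproduced with Python's codecs. A sketch that assumes each Oracle character set corresponds to the listed Python codec; `code_of` is a hypothetical helper.

```python
# Assumed correspondence between the table's character sets and Python codecs
ORACLE_TO_PYTHON = {
    "AL32UTF8": "utf-8",
    "WE8MSWIN1252": "cp1252",
    "ASCII": "ascii",  # Oracle's name for this set is US7ASCII
    "WE8ISO8859P1": "latin-1",
    "WE8ISO8859P15": "iso8859_15",
}


def code_of(ch: str, charset: str) -> str:
    """Hex code of ch in the given character set, or '-' if the set has no code for it."""
    try:
        return ch.encode(ORACLE_TO_PYTHON[charset]).hex().upper()
    except UnicodeEncodeError:
        return "-"


for charset in ORACLE_TO_PYTHON:
    print(f"{charset:<14} {code_of('€', charset):<8} {code_of('é', charset)}")

# The decimal value 50089 is simply the two UTF-8 bytes C3 A9 read as one number
print(int(code_of("é", "AL32UTF8"), 16))  # 50089
```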

Unicode / UTF-8 example


The image shows the number of bytes needed to store different kinds of characters in the
UTF-8 character set. The ASCII characters (C, t, and d) require one byte. The Latin and
Greek characters (á, ö, and Ø) require 2 bytes. The Asian character requires 3 bytes.
The supplementary character (treble clef sign) requires 4 bytes of storage.
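The byte counts in this caption can be reproduced with Python's UTF-8 codec. The Asian character in the image cannot be recovered from the text, so 中 (U+4E2D) is used as an assumed example; `utf8_len` is a hypothetical helper.

```python
def utf8_len(ch: str) -> int:
    """Number of bytes ch occupies in UTF-8."""
    return len(ch.encode("utf-8"))


samples = {
    "C": "ASCII", "t": "ASCII", "d": "ASCII",
    "á": "Latin", "ö": "Latin", "Ø": "Latin",
    "中": "Asian (assumed example character)",
    "\U0001D11E": "supplementary (MUSICAL SYMBOL G CLEF, the treble clef)",
}

for ch, kind in samples.items():
    print(f"U+{ord(ch):04X} {kind}: {utf8_len(ch)} byte(s)")
```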

Diacritics and special characters

• Diacritics are marks placed with a letter (above, below, or even through it) to change its pronunciation, giving the (modified) letter a language-specific sound.
  • àÿęňĜş etc.
• Special characters
  • ßæ¿

Diacritics and special characters 2

• Single-byte character sets
  • 1 byte per composed character
  • not every combination is available
  • code pages
• UTF-8
  • the diacritic has its own code
  • the composed character has its own code
    • usually decomposable into the original character + diacritic
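The UTF-8 points can be checked with Python's `unicodedata` module. A minimal sketch; the `cp1252` codec stands in for a single-byte character set such as WE8MSWIN1252, and `in_single_byte_charset` is a hypothetical helper.

```python
import unicodedata

precomposed = "\u00e9"  # é as one code point: LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"  # base letter e followed by COMBINING ACUTE ACCENT

print(precomposed == decomposed)                                # False: different code points
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True: composed form
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True: decomposed form
print(precomposed.encode("utf-8").hex(), decomposed.encode("utf-8").hex())  # c3a9 65cc81

# The composition is recorded in the Unicode data, but not every letter has one
print(unicodedata.decomposition("\u00e9"))  # 0065 0301
print(unicodedata.decomposition("\u00d8"))  # empty string: Ø has no canonical decomposition


def in_single_byte_charset(ch: str, codec: str = "cp1252") -> bool:
    """True if ch has a code in the given single-byte character set."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


print(in_single_byte_charset("é"), in_single_byte_charset("ę"))  # True False
```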

Database functions

• Character functions
  • substr - substrb - substrc - substr2
  • instr - ...
  • length - lengthb
• chr(n)
  Returns the character whose binary code in the database character set equals the number passed as the argument
  select chr(50089) from dual;
• dump
  Returns a VARCHAR2 value containing the datatype code, length in bytes, and internal representation of expr. The returned result is always in the database character set.
  select dump(naam, 1017) from demo2;
  (1017: single-character notation plus the character set name)
• convert
  Converts a character string from one character set to another
• utl_raw
  select utl_raw.cast_to_raw(naam) from demo2;
• unistr()
  Returns x in the national character set; Unicode escapes of the form \xxxx in x are replaced by the corresponding character
  select unistr('Ren\00e9') from dual;
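The byte-versus-character distinction behind these functions can be mimicked outside the database. A rough Python sketch, assuming an AL32UTF8 database; all function names are hypothetical stand-ins that only approximate the Oracle functions of the same name (NULL handling, other character sets and error cases are ignored).

```python
import re

# Assumption: the database character set is AL32UTF8, modeled with Python's utf-8 codec
DB_CODEC = "utf-8"


def length(s: str) -> int:
    """Like LENGTH: the number of characters."""
    return len(s)


def lengthb(s: str) -> int:
    """Like LENGTHB: the number of bytes in the database character set."""
    return len(s.encode(DB_CODEC))


def chr_db(n: int) -> str:
    """Like CHR(n): the character whose binary code in the database character set is n."""
    raw = n.to_bytes(max(1, (n.bit_length() + 7) // 8), "big")
    return raw.decode(DB_CODEC)


def cast_to_raw(s: str) -> str:
    """Like UTL_RAW.CAST_TO_RAW, returned as the hex string a SQL client typically shows."""
    return s.encode(DB_CODEC).hex().upper()


def unistr(s: str) -> str:
    r"""Like UNISTR: replace \xxxx escapes by that UTF-16 code unit and \\ by a backslash.

    Simplification: surrogate pairs (two consecutive \xxxx escapes) are not combined.
    """
    return re.sub(
        r"\\\\|\\([0-9A-Fa-f]{4})",
        lambda m: "\\" if m.group(1) is None else chr(int(m.group(1), 16)),
        s,
    )


print(length("René"), lengthb("René"))  # 4 5
print(chr_db(50089))                     # é
print(cast_to_raw("René"))               # 52656EC3A9
print(unistr(r"Ren\00e9"))               # René
```

In an AL32UTF8 database, select chr(50089) from dual should likewise return é, since C3A9 is its UTF-8 code.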

Demo3



nls_lang


 Client character set
 When the client NLS_LANG character set is set to
the same value as the database character set,
Oracle assumes that the data being sent or
received are of the same (correct) encoding, so no
conversions or validations may occur for
performance reasons. The data is just stored as
delivered by the client, bit by bit.
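The behavior described in this quote can be modeled in a few lines. A simplified Python sketch, not Oracle's implementation: `stored_bytes` is a hypothetical helper, the character set names map to Python codecs by assumption, and Python's `errors="replace"` (which writes `?`) only approximates Oracle's replacement character.

```python
# Assumed correspondence between Oracle character set names and Python codecs
ORACLE_TO_PYTHON = {"WE8MSWIN1252": "cp1252", "AL32UTF8": "utf-8", "US7ASCII": "ascii"}


def stored_bytes(client_bytes: bytes, client_charset: str, db_charset: str) -> bytes:
    """Hypothetical model of the bytes that end up in the database."""
    if client_charset == db_charset:
        # Same declared character set: no conversion and no validation, stored bit by bit
        return client_bytes
    # Different character sets: decode from the client set, re-encode in the database set;
    # characters without a mapping become '?' here
    text = client_bytes.decode(ORACLE_TO_PYTHON[client_charset])
    return text.encode(ORACLE_TO_PYTHON[db_charset], errors="replace")


sent = "é".encode("cp1252")                           # b'\xe9'
print(stored_bytes(sent, "WE8MSWIN1252", "AL32UTF8"))  # b'\xc3\xa9': converted
print(stored_bytes(sent, "WE8MSWIN1252", "US7ASCII"))  # b'?': no mapping in ASCII
print(stored_bytes(sent, "AL32UTF8", "AL32UTF8"))      # b'\xe9': stored unchecked, invalid UTF-8
```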

nls_lang 2

• Format: language_territory.charset
  • american_america.UTF8
  • dutch_the netherlands.WE8MSWIN1252
  • american_THE NETHERLANDS.WE8MSWIN1252
• Environment variable NLS_LANG
• Differs between the Windows GUI (WE8MSWIN1252) and the command line (WE8PC850)
• Not used by Java clients
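A sketch of the GUI versus command-line pitfall, preceded by a small NLS_LANG parser. `parse_nls_lang` is a hypothetical helper; the scenario assumes a WE8MSWIN1252 database, so that the character set claimed in the console matches it and no conversion takes place.

```python
def parse_nls_lang(value: str):
    """Hypothetical helper: split LANGUAGE_TERRITORY.CHARSET into its three parts."""
    language_territory, _, charset = value.partition(".")
    language, _, territory = language_territory.partition("_")
    return language, territory, charset


print(parse_nls_lang("american_THE NETHERLANDS.WE8MSWIN1252"))
# ('american', 'THE NETHERLANDS', 'WE8MSWIN1252')

# Windows command line: the console produces CP850 bytes ...
typed = "é"
sent = typed.encode("cp850")  # b'\x82'
# ... but NLS_LANG claims WE8MSWIN1252, equal to the (assumed) database character set,
# so the bytes are stored unconverted. A GUI client then reads them as CP1252:
read_back = sent.decode("cp1252")
print(sent, read_back)  # U+201A SINGLE LOW-9 QUOTATION MARK instead of é
```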

Demo4



National character set

• Support for a second character set alongside the database character set
  • e.g. to allow Japanese in a database with a WE8MSWIN1252 or ISO 8859 character set
  • less necessary in a UTF8 database
• Multibyte
• nvarchar2, nclob, etc.
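Why a national character set helps in a single-byte database, as a minimal Python sketch. The sample text is an assumption; AL16UTF16 (the usual national character set) is modeled as big-endian UTF-16 via Python's `utf-16-be`, and `encodable` is a hypothetical helper.

```python
def encodable(text: str, codec: str) -> bool:
    """True if every character of text has a code in the given character set."""
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


japanese = "日本語"  # assumed sample text

print(encodable(japanese, "cp1252"))      # False: does not fit a WE8MSWIN1252 VARCHAR2
print(len(japanese.encode("utf-16-be")))  # 6: bytes as AL16UTF16 in an NVARCHAR2
print(len(japanese.encode("utf-8")))      # 9: bytes in a UTF8 database
```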

Case

• TELETEX character set
  • no longer exists in Oracle
  • select convert(naam, 'TELETEX', 'UTF8') from tabel;
• Locale Builder

sql> select name from emp
sql> select name from emp@db
sql> select utl_raw.cast_to_varchar2(utl_raw.cast_to_raw(name)) from emp@db
sql> select utl_raw.cast_to_varchar2(utl_raw.cast_to_raw@db(name)) from emp@db

Question

• Accent-insensitive search
• Case-insensitive search
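In Oracle, accent- and case-insensitive comparison is usually configured through the NLS_COMP and NLS_SORT parameters (for example NLS_COMP=LINGUISTIC with NLS_SORT=BINARY_AI). The underlying idea can be sketched in Python: decompose, drop the combining diacritics, then fold case. `search_key` is a hypothetical helper, and this is not how Oracle implements it.

```python
import unicodedata


def search_key(s: str) -> str:
    """Hypothetical helper: a key for accent- and case-insensitive comparison."""
    decomposed = unicodedata.normalize("NFD", s)  # é -> e + combining acute accent
    without_marks = "".join(c for c in decomposed if not unicodedata.combining(c))
    return without_marks.casefold()


names = ["René", "Rene", "RENÉ", "Renée"]
print([n for n in names if search_key(n) == search_key("rene")])  # ['René', 'Rene', 'RENÉ']
```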

Summary

• nls_length_semantics
• Always define a character column explicitly with its length semantics (CHAR or BYTE)
• Oracle performs automatic character set conversion
• wysinawyg (what you see is not always what you get)
• Use a Java client
• Working with character sets can be confusing
• UTF8 is often the preferred character set

References

• Unicode and UltraEdit
  http://www.ultraedit.com/support/tutorials_power_tips/ultraedit/unicode.html
• nls_lang
  http://www.oracle.com/technology/tech/globalization/htdocs/nls_lang%20faq.htm
• Oracle globalization support
  http://download.oracle.com/docs/cd/B28359_01/server.111/b28298/toc.htm
• Wikipedia
