Oracle® Database Globalization Support Guide
11g Release 2 (11.2)

Part Number E10729-07

Glossary

accent

A mark that changes the sound of a character. Because the common meaning of accent is associated with the stress or prominence of the character's sound, the preferred word in Oracle Database Globalization Support Guide is diacritic.

See also diacritic.

accent-insensitive linguistic sort

A linguistic sort that uses information only about base letters, not diacritics or case.

See also linguistic sort, base letter, diacritic, case.

AL16UTF16

The default Oracle Database character set for the SQL NCHAR data type, which is used for the national character set. It encodes Unicode data in the UTF-16 encoding.

See also national character set.

AL32UTF8

An Oracle Database character set for the SQL CHAR data type, which is used for the database character set. It encodes Unicode data in the UTF-8 encoding.

See also database character set.

ASCII

American Standard Code for Information Interchange. A common encoded 7-bit character set for English. ASCII includes the letters A-Z and a-z, as well as digits, punctuation symbols, and control characters. The Oracle Database character set name is US7ASCII.

base letter

A character without diacritics. For example, the base letter for a, A, ä, and Ä is a.

See also diacritic.

binary sorting

Ordering character strings based on their binary coded values.
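
As a minimal sketch of binary sorting, the following Python snippet orders strings by their encoded byte values. The sample words and the choice of UTF-8 as the encoding are assumptions made for illustration; they are not taken from this guide.

```python
# Binary sorting: compare strings by their encoded byte values only.
# The word list and the UTF-8 encoding are illustrative assumptions.
words = ["zebra", "Apple", "äpfel", "apple"]

binary_order = sorted(words, key=lambda s: s.encode("utf-8"))

# Uppercase A (0x41) sorts before lowercase a (0x61), and the
# accented letter (lead byte 0xC3) sorts after every ASCII letter.
print(binary_order)
```

The result places "äpfel" after "zebra", which is the kind of ordering that a linguistic sort is meant to avoid.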

byte semantics

Treatment of strings as a sequence of bytes.

See also character semantics and length semantics.

canonical equivalence

A basic equivalence between characters or sequences of characters. For example, ç is equivalent to the combination of c and the combining cedilla (U+0327). They cannot be distinguished when they are correctly rendered.
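
Canonical equivalence can be checked with Unicode normalization. The sketch below uses Python's standard `unicodedata` module, which is not part of Oracle Database, to compare the precomposed ç (U+00E7) with the sequence c followed by the combining cedilla (U+0327).

```python
import unicodedata

precomposed = "\u00e7"   # ç as a single code point
decomposed = "c\u0327"   # c followed by COMBINING CEDILLA

# The two strings differ code point by code point...
same_code_points = precomposed == decomposed

# ...but normalizing to a common form shows they are canonically equivalent.
equal_after_nfc = unicodedata.normalize("NFC", decomposed) == precomposed
equal_after_nfd = unicodedata.normalize("NFD", precomposed) == decomposed

print(same_code_points, equal_after_nfc, equal_after_nfd)
```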

case

Refers to the condition of being uppercase or lowercase. For example, in a Latin alphabet, A is the uppercase glyph for a, the lowercase glyph.

case conversion

Changing a character from uppercase to lowercase or vice versa.

case-insensitive linguistic sort

A linguistic sort that uses information about base letters and diacritics but not case.

See also base letter, case, diacritic, linguistic sort.
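
The difference between a case-insensitive and an accent-insensitive comparison can be sketched with two comparison keys. The helper names `case_insensitive_key` and `accent_insensitive_key` are hypothetical, and the approach (Unicode decomposition plus case folding) is a simplification for illustration, not the algorithm that Oracle Database uses.

```python
import unicodedata

def case_insensitive_key(text: str) -> str:
    """Keep base letters and diacritics, ignore case (illustrative only)."""
    return unicodedata.normalize("NFC", text).casefold()

def accent_insensitive_key(text: str) -> str:
    """Keep base letters only, ignoring diacritics and case (illustrative only)."""
    decomposed = unicodedata.normalize("NFD", text)
    # Combining marks carry the diacritics; dropping them leaves base letters.
    base = "".join(c for c in decomposed if not unicodedata.combining(c))
    return base.casefold()

words = ["resume", "R\u00e9sum\u00e9", "RESUME", "r\u00e9sum\u00e9"]

ci_groups = {case_insensitive_key(w) for w in words}
ai_groups = {accent_insensitive_key(w) for w in words}

# Case-insensitive: accented and unaccented spellings stay distinct.
# Accent-insensitive: all four spellings compare as equal.
print(len(ci_groups), len(ai_groups))
```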

character

A character is an abstract element of text. A character is different from a glyph, which is a specific representation of a character. For example, the first character of the English uppercase alphabet can be displayed as A in many different typefaces and styles. These forms are different glyphs that represent the same character. A character, a character code, and a glyph are related as follows:

character --(encoding)--> character code --(font)--> glyph

For example, the first character of the English uppercase alphabet is represented in computer memory as a number. The number is called the encoding or the character code. The character code for the first character of the English uppercase alphabet is 0x41 in the ASCII encoding scheme. The character code is 0xc1 in the EBCDIC encoding scheme.

You must choose a font to display or print the character. The available fonts depend on which encoding scheme is being used. The character can be printed or displayed as A in several different typefaces, for example. These forms are different glyphs that represent the same character.

See also character code and glyph.

character classification

Information that provides details about the type of character associated with each character code. For example, a character can be an uppercase letter, a lowercase letter, a punctuation mark, or a control character.

character code

A character code is a number that represents a specific character. The number depends on the encoding scheme. For example, the character code of the first character of the English uppercase alphabet is 0x41 in the ASCII encoding scheme, but it is 0xc1 in the EBCDIC encoding scheme.

See also character.
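
The dependence of a character code on the encoding scheme can be observed directly. In the sketch below, Python's `cp037` codec is assumed as one representative member of the EBCDIC family; other EBCDIC code pages may assign different codes to some characters.

```python
ch = "A"

# Encode the same character under two encoding schemes and
# read back the numeric character code of the single byte produced.
ascii_code = ch.encode("ascii")[0]
ebcdic_code = ch.encode("cp037")[0]  # cp037: one EBCDIC code page (assumption)

print(hex(ascii_code), hex(ebcdic_code))
```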

character encoding scheme

A rule that assigns numbers (character codes) to all characters in a character set. Encoding scheme, encoding method, and encoding also mean character encoding scheme.

character repertoire

The characters that are available to be used, or encoded, for a specific character set.

character semantics

Treatment of strings as a sequence of characters.

See also byte semantics and length semantics.

character set

A collection of elements that represent textual information for a specific language or group of languages. One language can be represented by more than one character set.

A character set does not always imply a specific character encoding scheme. A character encoding scheme is the assignment of a character code to each character in a character set.

In this manual, a character set usually does imply a specific character encoding scheme. Therefore, a character set is the same as an encoded character set in this manual.

character set migration

Changing the character set of an existing database.

character string

An ordered group of characters.

A character string can also contain no characters. In this case, the character string is called a null string. The number of characters in a null string is 0 (zero).

client character set

The encoded character set used by the client. A client character set can differ from the server character set. The server character set is called the database character set. If the client character set is different from the database character set, then character set conversion must occur.

See also database character set.

code point

The numeric representation of a character in a character set. For example, the code point of A in the ASCII character set is 0x41. The code point of a character is also called the encoded value of a character.

See also Unicode code point.

code unit

The unit of encoded text for processing and interchange. The size of the code unit varies depending on the character encoding scheme. In most character encodings, a code unit is 1 byte. Important exceptions are UTF-16 and UCS-2, which use 2-byte code units, and wide character, which uses 4 bytes.

See also character encoding scheme.
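
To make the notion of a code unit concrete, the following sketch counts code units for the same text under three encodings. The helper `code_units` is hypothetical, and UTF-32 is used here as a stand-in for a 4-byte wide character format, which is an assumption.

```python
def code_units(text: str, encoding: str, unit_size: int) -> int:
    """Number of code units of unit_size bytes needed to encode text."""
    return len(text.encode(encoding)) // unit_size

# U+0041 (A), U+20AC (euro sign), U+1D11E (a supplementary character)
sample = "A\u20ac\U0001d11e"

utf8_units = code_units(sample, "utf-8", 1)       # 1-byte code units
utf16_units = code_units(sample, "utf-16-le", 2)  # 2-byte code units
utf32_units = code_units(sample, "utf-32-le", 4)  # 4-byte code units

print(utf8_units, utf16_units, utf32_units)
```

Only the 4-byte format uses exactly one code unit per character for this sample; the other two need more than one code unit for at least one of the characters.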

collation

Ordering of character strings according to rules about sorting characters that are associated with a language in a specific locale. Also called linguistic sort.

See also linguistic sort, monolingual linguistic sort, multilingual linguistic sort, accent-insensitive linguistic sort, case-insensitive linguistic sort.

data scanning

The process of identifying potential problems with character set conversion and truncation of data before migrating the database character set.

database character set

The encoded character set that is used to store text in the database. This includes CHAR, VARCHAR2, LONG, and fixed-width CLOB column values and all SQL and PL/SQL text.

diacritic

A mark near or through a character or combination of characters that indicates a different sound than the sound of the character without the diacritical mark. For example, the cedilla in façade is a diacritic. It changes the sound of c.

EBCDIC

Extended Binary Coded Decimal Interchange Code. EBCDIC is a family of encoded character sets used mostly on IBM systems.

encoded character set

A character set with an associated character encoding scheme. An encoded character set specifies the number (character code) that is assigned to each character.

See also character encoding scheme.

encoded value

The numeric representation of a character in a character set. For example, the code point of A in the ASCII character set is 0x41. The encoded value of a character is also called the code point of a character.

font

An ordered collection of character glyphs that provides a graphical representation of characters in a character set.

globalization

The process of making software suitable for different linguistic and cultural environments. Globalization should not be confused with localization, which is the process of preparing software for use in one specific locale (for example, translating error messages or user interface text from one language to another).

glyph

A glyph (font glyph) is a specific representation of a character. A character can have many different glyphs. For example, the first character of the English uppercase alphabet can be printed or displayed as A in many different typefaces and styles. These forms are different glyphs that represent the same character.

See also character.

ideograph

A symbol that represents an idea. Chinese is an example of an ideographic writing system.

ISO

International Organization for Standardization. A worldwide federation of national standards bodies from 130 countries. The mission of ISO is to develop and promote standards in the world to facilitate the international exchange of goods and services.

ISO 8859

A family of 8-bit encoded character sets. The most common one is ISO 8859-1 (also known as ISO Latin1), which is used for Western European languages.

ISO 14651

A multilingual linguistic sort standard that is designed for almost all languages of the world.

See also multilingual linguistic sort.

ISO/IEC 10646

A universal character set standard that defines the characters of most major scripts used in the modern world. In 1993, ISO adopted Unicode version 1.1 as ISO/IEC 10646-1:1993. ISO/IEC 10646 has two formats: UCS-2 is a 2-byte fixed-width format, and UCS-4 is a 4-byte fixed-width format. There are three levels of implementation, all relating to support for composite characters.

ISO currency

The 3-letter abbreviation used to denote a local currency, based on the ISO 4217 standard. For example, USD represents the United States dollar.

ISO Latin1

The ISO 8859-1 character set standard. It is an 8-bit extension to ASCII that adds 128 characters that include the most common Latin characters used in Western Europe. The Oracle Database character set name is WE8ISO8859P1.

See also ISO 8859.

length semantics

Length semantics determines how the length of a character string is measured. The length can be counted in characters or in bytes.

See also character semantics and byte semantics.
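
The practical effect of length semantics can be sketched by checking one string against the same numeric limit under both interpretations. The helper `fits` and the use of UTF-8 as the storage encoding are assumptions for illustration; the comparison is loosely analogous to declaring a column as `VARCHAR2(6 CHAR)` versus `VARCHAR2(6 BYTE)`.

```python
def fits(value: str, limit: int, semantics: str) -> bool:
    """Check value against limit using 'char' or 'byte' semantics (illustrative)."""
    if semantics == "char":
        size = len(value)                  # count characters
    else:
        size = len(value.encode("utf-8"))  # count bytes in the assumed encoding
    return size <= limit

word = "fa\u00e7ade"  # 6 characters; the c with cedilla takes 2 bytes in UTF-8

fits_char = fits(word, 6, "char")
fits_byte = fits(word, 6, "byte")

print(fits_char, fits_byte)
```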

linguistic index

An index built on a linguistic sort order.

linguistic sort

An ordering of strings based on requirements from a locale instead of the binary representation of the strings.

See also multilingual linguistic sort and monolingual linguistic sort.

locale

A collection of information about the linguistic and cultural preferences from a particular region. Typically, a locale consists of language, territory, character set, linguistic, and calendar information defined in NLS data files.

localization

The process of providing language-specific or culture-specific information for software systems. Translation of an application's user interface is an example of localization. Localization should not be confused with globalization, which is the process of making software suitable for different linguistic and cultural environments.

monolingual linguistic sort

An Oracle Database sort that has two levels of comparison for strings. Most European languages can be sorted with a monolingual sort, but it is inadequate for Asian languages.

See also multilingual linguistic sort.

monolingual support

Support for only one language.

multibyte

Two or more bytes.

When character codes are assigned to all characters in a specific language or a group of languages, one byte (8 bits) can represent 256 different characters. Two bytes (16 bits) can represent up to 65,536 different characters. Two bytes are not enough to represent all the characters for many languages. Some characters require 3 or 4 bytes.

One example is the UTF8 Unicode encoding. In UTF8, there are many 2-byte and 3-byte characters.

Another example is Traditional Chinese, used in Taiwan. It has more than 80,000 characters. Some character encoding schemes that are used in Taiwan use 4 bytes to encode characters.

See also single byte.

multibyte character

A character whose character code consists of two or more bytes under a certain character encoding scheme.

Note that the same character may have different character codes under different encoding schemes. Oracle Database cannot tell whether a character is a multibyte character without knowing which character encoding scheme is being used. For example, Japanese Hankaku-Katakana (half-width Katakana) characters are one byte in the JA16SJIS encoded character set, two bytes in JA16EUC, and three bytes in UTF8.

See also single-byte character.
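
The half-width Katakana example can be reproduced approximately with Python codecs. The codecs `shift_jis`, `euc_jp`, and `utf-8` are assumed here as stand-ins for the Oracle Database character sets JA16SJIS, JA16EUC, and UTF8; they are not guaranteed to match those character sets in every detail.

```python
# U+FF71 is HALFWIDTH KATAKANA LETTER A.
katakana = "\uff71"

# Encoded length of the same character under three encoding schemes.
sizes = {enc: len(katakana.encode(enc)) for enc in ("shift_jis", "euc_jp", "utf-8")}

print(sizes)
```

Whether this character counts as a multibyte character therefore depends entirely on the encoding scheme in use.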

multibyte character string

A character string that consists of one of the following strings:

multilingual linguistic sort

An Oracle Database sort that evaluates strings on three levels. Asian languages require a multilingual linguistic sort even if data exists in only one language. Multilingual linguistic sorts are also used when data exists in several languages.

national character set

A character set, separate from the database character set, that can be specified for NCHAR, NVARCHAR2, and NCLOB columns. National character sets are in Unicode only.

NLB files

Binary files used by the Locale Builder to define locale-specific data. They define all of the locale definitions that are shipped with a specific release of Oracle Database. You can create user-defined NLB files with Oracle Locale Builder.

See also Oracle Locale Builder and NLT files.

NLS

National Language Support. NLS enables users to interact with the database in their native languages. It also enables applications to run in different linguistic and cultural environments. The term is somewhat obsolete because Oracle Database supports multiple global users at one time.

NLSRTL

National Language Support Runtime Library. This library is responsible for providing locale-independent algorithms for internationalization. The locale-specific information (that is, NLSDATA) is read by the NLSRTL library during run-time.

NLT files

Text files used by the Locale Builder to define locale-specific data. Because they are in text, you can view the contents.

null string

A character string that contains no characters.

Oracle Locale Builder

A GUI utility that offers a way to view, modify, or define locale-specific data. You can also create your own formats for language, territory, character set, and linguistic sort.

replacement character

A character used during character conversion when the source character is not available in the target character set. For example, ? (question mark) is often used as the default replacement character for Oracle Database.
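
The behavior can be imitated outside the database. In this sketch, Python's `errors="replace"` option is assumed as an analogue of the conversion described above: the euro sign does not exist in ISO 8859-1 (`latin-1`), so the codec substitutes a question mark.

```python
source_text = "Price: 5\u20ac"  # ends with the euro sign, U+20AC

# Convert to a target character set that lacks the euro sign.
# errors="replace" substitutes "?" for each unencodable character.
converted = source_text.encode("latin-1", errors="replace")

print(converted)
```

The substitution is lossy: after conversion, the original character cannot be recovered from the target data.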

restricted multilingual support

Multilingual support that is restricted to a group of related languages. Western European languages can be represented with ISO 8859-1, for example. If multilingual support is restricted, then Thai cannot be added to the group.

SQL CHAR data types

Includes CHAR, VARCHAR, VARCHAR2, CLOB, and LONG data types.

SQL NCHAR data types

Includes NCHAR, NVARCHAR, NVARCHAR2, and NCLOB data types.

script

A particular system of writing. A collection of related graphic symbols that are used in a writing system. Some scripts can represent multiple languages, and some languages use multiple scripts. Examples of scripts include Latin, Arabic, and Han.

single byte

One byte. One byte usually consists of 8 bits. When character codes are assigned to all characters for a specific language, one byte (8 bits) can represent 256 different characters.

See also multibyte.

single-byte character

A single-byte character is a character whose character code consists of one byte under a specific character encoding scheme. Note that the same character may have different character codes under different encoding schemes. Oracle Database cannot tell which character is a single-byte character without knowing which encoding scheme is being used. For example, the euro currency symbol is one byte in the WE8MSWIN1252 encoded character set, two bytes in AL16UTF16, and three bytes in UTF8.

See also multibyte character.

single-byte character string

A single-byte character string consists of one of the following strings:

supplementary characters

The first version of Unicode was a 16-bit, fixed-width encoding that used two bytes to encode each character. This enabled 65,536 characters to be represented. However, more characters need to be supported because of the large number of Asian ideograms.

Unicode 3.1 defined supplementary characters to meet this need. Unicode 3.1 began using two 16-bit code units (also known as surrogate pairs) to represent a single character. This enabled an additional 1,048,576 characters to be defined. The Unicode 3.1 standard added the first group of 44,944 supplementary characters. More were added with Unicode 4.0 versions, and 1,369 more have been added with Unicode 5.0.
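
The figure of 1,048,576 additional characters follows from the surrogate mechanism: each half of a pair carries 10 bits, giving 1,024 × 1,024 combinations. The sketch below shows the standard UTF-16 arithmetic; the function name `to_surrogate_pair` is hypothetical.

```python
def to_surrogate_pair(code_point: int) -> tuple[int, int]:
    """Split a supplementary code point into high and low surrogate code units."""
    if not 0x10000 <= code_point <= 0x10FFFF:
        raise ValueError("not a supplementary character")
    offset = code_point - 0x10000           # 20-bit value
    high = 0xD800 + (offset >> 10)          # top 10 bits
    low = 0xDC00 + (offset & 0x3FF)         # bottom 10 bits
    return high, low

high, low = to_surrogate_pair(0x1D11E)
pair_capacity = 1024 * 1024  # combinations of two 10-bit halves

print(hex(high), hex(low), pair_capacity)
```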

surrogate pairs

See also supplementary characters.

syllabary

A syllabary provides a mechanism for communicating phonetic information along with the ideographic characters used by languages such as Japanese.

UCS-2

A 1993 ISO/IEC standard character set. It is a fixed-width, 16-bit Unicode character set. Each character occupies 16 bits of storage. The ISO Latin1 characters are the first 256 code points, so it can be viewed as a 16-bit extension of ISO Latin1.

UCS-4

A fixed-width, 32-bit Unicode character set. Each character occupies 32 bits of storage. The UCS-2 characters are the first 65,536 code points in this standard, so it can be viewed as a 32-bit extension of UCS-2. This is also sometimes referred to as ISO-10646.

Unicode

Unicode is a universal encoded character set that enables information from any language to be stored by using a single character set. Unicode provides a unique code value for every character, regardless of the platform, program, or language.

Unicode database

A database whose database character set is UTF-8.

Unicode code point

A value in the Unicode codespace, which ranges from 0 to 0x10FFFF. Unicode assigns a unique code point to every character.

Unicode data type

A SQL NCHAR data type (NCHAR, NVARCHAR2, and NCLOB). You can store Unicode characters in columns of these data types even if the database character set is not Unicode.

unrestricted multilingual support

The ability to use as many languages as desired. A universal character set, such as Unicode, helps to provide unrestricted multilingual support because it supports a very large character repertoire, encompassing most modern languages of the world.

UTFE

A Unicode 5.0 UTF-8 Oracle Database character set with 6-byte supplementary character support. It is used only on EBCDIC platforms.

UTF8

The UTF8 Oracle Database character set encodes characters in one, two, or three bytes. It is for ASCII-based platforms. The UTF8 character set supports Unicode 5.0 and is compliant with the CESU-8 standard. Although specific supplementary characters were not assigned code points in Unicode until version 3.1, the code point range was allocated for supplementary characters in Unicode 3.0. Supplementary characters are treated as two separate, user-defined characters that occupy 6 bytes.
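
The 6-byte figure can be illustrated with a small CESU-8 style encoder. This is a sketch of the general CESU-8 scheme, in which each UTF-16 code unit is encoded separately in up to three bytes; the function `cesu8_encode` is hypothetical and is not Oracle Database code.

```python
def cesu8_encode(text: str) -> bytes:
    """Encode text CESU-8 style: each UTF-16 code unit becomes 1 to 3 bytes."""
    utf16 = text.encode("utf-16-be")
    out = bytearray()
    for i in range(0, len(utf16), 2):
        unit = int.from_bytes(utf16[i:i + 2], "big")
        # surrogatepass lets a lone surrogate be written as a 3-byte sequence.
        out += chr(unit).encode("utf-8", errors="surrogatepass")
    return bytes(out)

supplementary = "\U0001d11e"  # one supplementary character

cesu8_length = len(cesu8_encode(supplementary))    # two surrogates, 3 bytes each
utf8_length = len(supplementary.encode("utf-8"))   # standard UTF-8 form

print(cesu8_length, utf8_length)
```

For characters in the Basic Multilingual Plane, the two encodings produce identical bytes; they differ only for supplementary characters.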

UTF-8

The 8-bit encoding of Unicode. It is a variable-width encoding. One Unicode character can be 1 byte, 2 bytes, 3 bytes, or 4 bytes in UTF-8 encoding. Characters from the European scripts are represented in either 1 or 2 bytes. Characters from most Asian scripts are represented in 3 bytes. Supplementary characters are represented in 4 bytes. The Oracle Database character set that supports UTF-8 is AL32UTF8.

UTF-16

The 16-bit encoding of Unicode. It is an extension of UCS-2 and supports the supplementary characters defined in Unicode by using a pair of UCS-2 code points. One Unicode character can be 2 bytes or 4 bytes in UTF-16 encoding. Characters (including ASCII characters) from European scripts and most Asian scripts are represented in 2 bytes. Supplementary characters are represented in 4 bytes. The Oracle Database character set that supports UTF-16 is AL16UTF16.

wide character

A fixed-width character format that is useful for extensive text processing because it enables data to be processed in consistent, fixed-width chunks. Wide characters are intended to support internal character processing.
