Introduction
Netscape 7.1 is the first commercial browser with built-in support for Internationalized Domain Names under the new IETF RFCs established in 2003.
An Internationalized Domain Name (IDN) is a domain/host name which uses non-ASCII characters. Until recently, domain names allowed only a subset of 7-bit ASCII characters. As the Internet has spread to non-English speaking people around the world, it has become increasingly clear that forcing them to use domain names written only in a subset of the Latin alphabet is not ideal.
Many European languages are written in the basic Latin alphabet plus additional accented characters, but those accented characters could not be used in domain names. Many other languages use writing scripts that are not based on the Latin alphabet at all. Speakers of these languages were not able to use familiar names in their native languages as part of Internet domain/host names.
For the past few years, there has been a flurry of IETF activity to standardize the protocols involved in domain names so that they can handle non-ASCII characters. In March of 2003, three important RFCs were approved by the IETF (RFCs 3490, 3491, and 3492). These new RFCs now make it possible for domain name servers to register non-ASCII domain names and for application/client vendors to implement standardized support for handling non-ASCII characters in domain names.
How IDN Works
When a browser sees a URL such as http://developer.mozilla.org, it passes the host name (developer.mozilla.org) to the DNS resolver service (usually built into the OS), which in turn asks a nearby domain name server to return the IP address that corresponds to the host name. This IP address is then used to connect to the web server in question.
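The lookup step can be sketched in Python with the standard library's socket.getaddrinfo, which hands the host name to the operating system's resolver. This only illustrates the general flow and is not how any particular browser performs lookups. The helper name resolve_host and its lookup parameter are assumptions made for this sketch.

```python
import socket


def resolve_host(host, port=80, lookup=socket.getaddrinfo):
    """Return the distinct IP addresses the resolver reports for host.

    `lookup` is a hypothetical injection point (defaulting to the OS
    resolver) so the function can be exercised without network access.
    """
    addresses = []
    for family, socktype, proto, canonname, sockaddr in lookup(
        host, port, proto=socket.IPPROTO_TCP
    ):
        ip = sockaddr[0]  # sockaddr is (ip, port) or (ip, port, flow, scope)
        if ip not in addresses:
            addresses.append(ip)
    return addresses
```

On a machine with a working resolver, resolve_host("developer.mozilla.org") returns whatever addresses the configured name server reports. The browser would then open a connection to one of them.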
IDN allows host/domain names with non-ASCII characters to be entered in a browser's location bar or used in URLs embedded in web pages. At the network protocol level, there is no change in the restriction that only a subset of ASCII characters may be used in a URL/URI. If end users input non-ASCII characters as part of a domain name or if a web page contains a link using a non-ASCII domain name, the application must convert such input into a special encoded format using only the usual ASCII subset characters. RFC 3490 (Internationalizing Domain Names in Applications (IDNA)) defines the characters used in IDN to be drawn from Unicode Standard 3.2. It also defines how an application should process non-ASCII characters so that the result conforms to the existing host name character restrictions.
How Mozilla Browsers Handle Non-ASCII Domain Names
Unicode and Nameprep
When Mozilla receives IDN input from the user via the location bar or a request to process non-ASCII host name links, it first turns them into Unicode, then normalizes the input string to make it conform to general URI requirements.
The process will convert uppercase characters to lowercase ones (case folding), unify characters with multiple representations, e.g. conversion of half-width Kana characters in Japanese into full-width ones (normalization), eliminate prohibited characters (e.g. space), eliminate ambiguities in bi-directional text (e.g. Arabic and Hebrew), and check whether or not unassigned characters in the Unicode repertoire are used -- allowing them for "query strings" but disallowing them for "stored strings" such as the data input for domain name registration.
This process is called "Nameprep" and is performed according to RFC 3491 (Nameprep: A Stringprep Profile for Internationalized Domain Names (IDN)) and RFC 3454 (Preparation of Internationalized Strings ("stringprep")).
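As a rough illustration of these steps, Python's standard library ships a Nameprep helper, encodings.idna.nameprep, which its documentation describes as treating input as a query string. The sketch below uses it to show case folding, width normalization, and rejection of a prohibited character. It is not Mozilla's implementation, and the sample labels are chosen here for illustration.

```python
from encodings.idna import nameprep

# Case folding: uppercase letters become lowercase.
print(nameprep("Bücher"))          # bücher

# Normalization (NFKC): half-width kana become full-width kana.
print(nameprep("ｼﾞｪｰﾋﾟｰﾆｯｸ"))      # ジェーピーニック

# Full-width Roman letters are folded and normalized to plain ASCII.
print(nameprep("ＪＰ"))             # jp

# Prohibited characters are rejected; U+E000 is a private-use code point.
try:
    nameprep("a\ue000b")
except UnicodeError as exc:
    print("rejected:", exc)
```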
ASCII-compatible encoding (ACE)
The next step is to convert the Unicode string into a form that uses only the restricted set of 7-bit ASCII characters. During the discussion phase of the IDN protocol development, several competing ASCII-compatible encoding (ACE) schemes were proposed, but agreement was eventually reached to standardize on a type of ACE called "Punycode". This is defined in RFC 3492 (Punycode: A Bootstring encoding of Unicode for Internationalized Domain Names in Applications (IDNA)).
The Punycode proposal uses only the restricted ASCII letters and digits (a-z, 0-9) and the hyphen (-). It was shown to be language independent, superior in compression, compact in code size, round-trip safe, and superior for encoding Chinese/Japanese/Korean characters.
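To make the encoding concrete, the following sketch uses the "punycode" codec from Python's standard library on a single label. The sample label "bücher" is an assumption chosen for illustration. The codec performs only the raw RFC 3492 encoding, with no Nameprep and no prefix.

```python
# Encode one already-normalized label with the raw Punycode algorithm.
encoded = "bücher".encode("punycode")
print(encoded)                        # b'bcher-kva'

# The encoding is round-trip safe: decoding restores the original label.
print(encoded.decode("punycode"))     # bücher

# The same round trip holds for a label written entirely in Japanese kana.
label = "ジェーピーニック"
assert label.encode("punycode").decode("punycode") == label
```

The ASCII letters of the label are copied through first, and the position and value of each non-ASCII character are packed into the letters and digits after the final hyphen.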
The final step of the process is to affix the ACE prefix to the output string from the Nameprep/stringprep and Punycode processing. Since Punycode output contains only ASCII characters, it is possible, though unlikely, that an output string coincides with an existing domain name. To avoid such a complication, RFC 3490 defines a special prefix "xn--" for the ACE (Punycode) output. Other encodings used different prefixes, e.g. "bq--" for RACE, but all except the standard ACE prefix "xn--" are now disallowed in IDN.
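The three steps described so far (Nameprep, Punycode, prefix) can be chained in a few lines of Python. This is a simplified sketch under stated assumptions, not a complete RFC 3490 ToASCII: the function name to_ace_label is hypothetical, and the length limits and the check for labels that already carry the prefix are omitted.

```python
from encodings.idna import nameprep

ACE_PREFIX = "xn--"  # the prefix RFC 3490 reserves for Punycode output


def to_ace_label(label: str) -> str:
    """Convert one label to its ACE form (simplified sketch)."""
    prepared = nameprep(label)      # step 1: Nameprep
    if prepared.isascii():
        return prepared             # plain ASCII labels need no encoding
    # step 2: Punycode, step 3: affix the ACE prefix
    return ACE_PREFIX + prepared.encode("punycode").decode("ascii")


print(to_ace_label("Bücher"))       # xn--bcher-kva
print(to_ace_label("example"))      # example
```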
As an example, the string sent to a DNS server for the Japanese domain name in "http://ジェーピーニック.jp" will look like the following in ACE form:
http://xn--hckqz9bzb1cyrb.jp
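A conversion of this kind can be tried with Python's built-in "idna" codec, which implements the RFC 3490 processing for a whole host name. The sketch below applies it to the host part of a URL and leaves the rest untouched. The helper name to_ace_url and the bücher.example address are assumptions for illustration, and user names or passwords embedded in a URL are not handled.

```python
from urllib.parse import urlsplit, urlunsplit


def to_ace_url(url: str) -> str:
    """Return url with its host name converted to ACE form."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    ace_host = host.encode("idna").decode("ascii")  # Nameprep + Punycode + prefix
    netloc = ace_host if parts.port is None else f"{ace_host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=netloc))


print(to_ace_url("http://bücher.example/path?q=1"))
# http://xn--bcher-kva.example/path?q=1

# The Japanese example from the text can be passed in the same way.
print(to_ace_url("http://ジェーピーニック.jp"))
```

Only the host name is rewritten because the ACE conversion applies to domain labels, not to the path or query portion of a URL.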
Domain Name Registration
After the technical standards were established by the IETF, the last remaining issue was for domain name registrars to agree on an international guideline on the use of IDN characters. This was accomplished by the publication of the ICANN (Internet Corporation for Assigned Names and Numbers) guideline for IDN in June of 2003. The guideline allows domain name registrars in each country to limit the characters used in domain names. Since the Unicode repertoire contains characters no longer used in any living language, and most languages in current use also have characters that are not suitable for URI/URL creation, the ICANN guideline allows the governing body of each country's domain registrars to set appropriate limitations on the use of characters.
With this last obstacle to standardization out of the way, domain name registrars are expected to move forward quickly on implementing the new RFCs for existing and future IDN registrations.
JPRS (Japan Registry Service) decided to move to the new RFC implementation on July 10, 2003, only a few weeks after the ICANN guideline was published. This makes it possible for Netscape 7.1/Mozilla 1.4 users to access Japanese host names under the .jp top domain without any additional setup, using just the built-in IDN functionality.
Real World Examples
Punycode
There are real-world examples of IDN that you can test with Netscape 7.1, which uses Punycode as the default IDN encoding. For example, most sample links on the following test pages can be used without any further setting:
- http://www.nunames.nu/eu-lang-test.htm (Domain names with Latin 1 accented characters)
- http://www.nunames.nu/lldemo/default.htm (Domain names in other languages)
On July 10, 2003 and thereafter, you can access a large number of Japanese domain name sites under the .jp top domain with no further setting on Netscape 7.1/Mozilla 1.4.
RACE (Row-based ASCII Compatible Encoding)
Almost all IDN registration data are expected to change to Punycode by the end of 2003. Some countries will complete the conversion quickly, e.g. Japan as mentioned above, but other registries, such as the ones for the .com and .net top domains, may take longer.
Most of the existing sites currently use the ASCII-compatible encoding known as RACE or Row-based ASCII Compatible Encoding, which was not accepted as a standard by IETF. If you find IDN test sites under the .com and .net top domains, and if you cannot successfully access these sites, you can use the following workaround until the conversion to Punycode is completed for these top domains:
Using Netscape 7.1 or Mozilla 1.4:
- Type about:config into the location/URL bar. This will list all the preferences for your current profile. These preferences can be modified or new ones can be created without quitting the browser using the steps described below.
- Create a new preference item using the menu New > String via a right-mouse click. The name of the preference is: network.IDN_prefix. The value should be "bq--". This will change the default from Punycode to RACE.
- Next create another new preference item using the right-mouse click menu New > Boolean. The name of the preference is: network.IDN_testbed. The value should be "true".
- Now access IDN sites under the .com and .net top domains. You should succeed in reaching the sample sites.
- Don't forget to reset these preferences to their default values once you are finished with testing!
Caveats and Conclusions
Netscape 7.1/Mozilla 1.4 has solid support for Internationalized Domain Names and is the first browser with built-in support for the new IDN RFCs established by the IETF. This means that there is no longer any need to use a plug-in to process non-ASCII domain names.
Netscape/Mozilla's support for IDN is not without bugs. One notable bug is that non-ASCII names are not always displayed correctly in some UI areas such as Preference panels, Bookmarks and History. Non-ASCII names are also not always displayed correctly in the location bar because ACE-to-Unicode conversion is not implemented yet. Of particular concern for Japanese users is a bug in which full-width Japanese Roman characters are not normalized to ASCII Roman characters (bug 210734). This forces Japanese users to switch out of Japanese input mode to type top domain names such as .jp, which is inconvenient. For other bugs, see this link.
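For reference, an ACE-to-Unicode conversion of the kind a location bar would need for display can be sketched with Python's "idna" codec. This only illustrates the reverse mapping and does not describe how Mozilla implements or plans to implement it. The helper name display_host and the sample host names are assumptions.

```python
def display_host(host: str) -> str:
    """Return a Unicode form of an ACE host name for display purposes."""
    try:
        # Labels starting with "xn--" are decoded; other labels pass through.
        return host.encode("ascii").decode("idna")
    except UnicodeError:
        # Not valid ASCII or not valid ACE: show the name unchanged.
        return host


print(display_host("xn--bcher-kva.example"))   # bücher.example
print(display_host("www.example.com"))         # www.example.com
```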
IDN is a global trend and is likely to be adopted by a large number of sites, making it easier for average Internet users to find web sites. Many web sites around the world are expected to register native-language host names with the appropriate domain name registrars for their top domains. Netscape 7.1 and Mozilla 1.4 are playing a significant role in furthering the development of IDN.
Original Document Information
- Author(s): Katsuhiko Momoi
- Last Updated Date: 03 Jul 2003