UTF-8 (UCS Transformation Format 8) is the World Wide Web's most common character encoding. Each character is represented by one to four bytes. UTF-8 is backward-compatible with ASCII and can represent any standard Unicode character.
The first 128 UTF-8 characters precisely match the first 128 ASCII characters (numbered 0-127), meaning that existing ASCII text is already valid UTF-8. All other characters use two to four bytes. Each byte has some bits reserved for encoding purposes. Since non-ASCII characters require more than one byte for storage, they run the risk of being corrupted if the bytes are separated and not recombined.
Learn more
General knowledge
- UTF-8 on Wikipedia
- FAQ about UTF-8 on Unicode website
Document Tags and Contributors
    
    Tags: 
    
  
                    
                       Contributors to this page: 
        sideshowbarker, 
        haingh, 
        sebastien-bartoli, 
        r-o-b, 
        hbloomer, 
        Andrew_Pfeiffer, 
        Sheppy, 
        klez, 
        sandeepmishraxp
                    
                    
                       Last updated by:
                      sideshowbarker,