Hangul is the Korean writing system, and its primary unit is the syllable. Unlike syllabaries such as hiragana and katakana, which are used to write Japanese, each Hangul syllable can be decomposed into simpler letters that specify the sounds that make it up. Every syllable must have an initial consonant and a vowel (a syllable that begins with a vowel sound uses the silent ㅇ as its initial); some syllables also have a final consonant (patchim), or even two final consonants (double patchim).
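Python's standard `unicodedata` module can show this decomposition directly. This is only an illustration: canonical decomposition (NFD) yields the *conjoining* jamo from the U+1100 block, not the compatibility jamo such as ㅁ (U+3141) that a keyboard displays.

```python
import unicodedata

syllable = "만"  # one precomposed syllable block (U+B9CC)

# NFD splits the block into its initial, medial and final jamo
jamo = unicodedata.normalize("NFD", syllable)

for ch in jamo:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```

It should print three lines: HANGUL CHOSEONG MIEUM (ㅁ), HANGUL JUNGSEONG A (ㅏ) and HANGUL JONGSEONG NIEUN (ㄴ).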
In a string, each syllable block occupies a single character, but when typing, each letter (jamo) is entered individually and must be composed into a syllable block. This composition would be hard to implement if the codes of the jamo and the syllables were unrelated. That is, if the codes for ㅁ, 마 and 만 had no relationship, you'd need a table to find the code of 마 when adding ㅏ to ㅁ, the code of 만 when typing ㄴ after 마, and so on. With 11,172 possible syllable blocks, such a table would be quite large.
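To see the composition step in action, Python's NFC normalization recombines conjoining jamo into syllable blocks. Feeding it one jamo at a time mimics typing ㅁ, then ㅏ, then ㄴ. This is a sketch of the idea, not how any particular input method is implemented; the variable names are hypothetical.

```python
import unicodedata

# Conjoining jamo for ㅁ (initial), ㅏ (medial) and ㄴ (final)
keystrokes = ["\u1106", "\u1161", "\u11ab"]

text = ""
states = []
for jamo in keystrokes:
    # Append the new jamo and let NFC merge it into the current block
    text = unicodedata.normalize("NFC", text + jamo)
    states.append(text)

print(states)
```

After the three keystrokes the string holds a single character, 만, just as the paragraph above describes.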
Luckily, the Unicode Consortium was smarter than that: the code of each syllable block is computed from the positions of its jamo by a simple formula. It is described here:
Korean language and computers - Wikipedia
The important part is:
The precomposed hangul syllables in the Hangul Syllables block in Unicode are algorithmically defined, using the following formula:
- [(initial) × 588 + (medial) × 28 + (final)] + 44032
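- Initial consonants (19, indices 0 to 18): ㄱ ㄲ ㄴ ㄷ ㄸ ㄹ ㅁ ㅂ ㅃ ㅅ ㅆ ㅇ ㅈ ㅉ ㅊ ㅋ ㅌ ㅍ ㅎ
- Medial vowels (21, indices 0 to 20): ㅏ ㅐ ㅑ ㅒ ㅓ ㅔ ㅕ ㅖ ㅗ ㅘ ㅙ ㅚ ㅛ ㅜ ㅝ ㅞ ㅟ ㅠ ㅡ ㅢ ㅣ
- Final consonants (28, indices 0 to 27, where 0 means no final): (none) ㄱ ㄲ ㄳ ㄴ ㄵ ㄶ ㄷ ㄹ ㄺ ㄻ ㄼ ㄽ ㄾ ㄿ ㅀ ㅁ ㅂ ㅄ ㅅ ㅆ ㅇ ㅈ ㅊ ㅋ ㅌ ㅍ ㅎ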
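Here is a minimal Python sketch of the formula. It assumes the inputs are the compatibility jamo a keyboard shows (ㄱ, ㅏ, ...), listed in Unicode index order; `compose` and `decompose` are hypothetical helper names, not a library API. Each of (initial), (medial) and (final) is a zero-based index, with final index 0 meaning "no final consonant". 588 = 21 × 28 is the number of syllables sharing one initial, and 44032 (0xAC00) is the code of 가.

```python
# Compatibility jamo in Unicode index order (hypothetical tables for this sketch)
INITIALS = list("ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ")  # 19 initials
MEDIALS = list("ㅏㅐㅑㅒㅓㅔㅕㅖㅗㅘㅙㅚㅛㅜㅝㅞㅟㅠㅡㅢㅣ")  # 21 vowels
# 28 finals; index 0 ("") means the syllable has no final consonant
FINALS = [""] + list("ㄱㄲㄳㄴㄵㄶㄷㄹㄺㄻㄼㄽㄾㄿㅀㅁㅂㅄㅅㅆㅇㅈㅊㅋㅌㅍㅎ")

BASE = 0xAC00  # 44032, the code of 가
SYLLABLE_COUNT = len(INITIALS) * len(MEDIALS) * len(FINALS)  # 11172


def compose(initial: str, medial: str, final: str = "") -> str:
    """Build a syllable block from its jamo (ValueError for unknown jamo)."""
    i = INITIALS.index(initial)
    m = MEDIALS.index(medial)
    f = FINALS.index(final)
    return chr(BASE + i * 588 + m * 28 + f)


def decompose(syllable: str) -> tuple[str, str, str]:
    """Split a precomposed syllable block into (initial, medial, final)."""
    offset = ord(syllable) - BASE
    if not 0 <= offset < SYLLABLE_COUNT:
        raise ValueError(f"{syllable!r} is not a precomposed Hangul syllable")
    i, rest = divmod(offset, 588)  # quotient = initial index
    m, f = divmod(rest, 28)        # medial and final indices
    return INITIALS[i], MEDIALS[m], FINALS[f]


print(compose("ㅁ", "ㅏ"))        # 마
print(compose("ㅁ", "ㅏ", "ㄴ"))  # 만
print(decompose("한"))            # ('ㅎ', 'ㅏ', 'ㄴ')
```

With this layout, typing ㄴ after 마 simply adds 4 (the index of ㄴ among the finals) to the code of 마, so no lookup table of syllables is needed.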