ICU 50.1.2
C API: Unicode Script Information.
#include "unicode/utypes.h"
Typedefs | |
typedef enum UScriptCode | UScriptCode |
Constants for ISO 15924 script codes.
Functions | |
int32_t | uscript_getCode (const char *nameOrAbbrOrLocale, UScriptCode *fillIn, int32_t capacity, UErrorCode *err) |
Gets the script codes associated with the given locale, ISO 15924 abbreviation, or script name.
const char * | uscript_getName (UScriptCode scriptCode) |
Gets the long script name associated with the given script code.
const char * | uscript_getShortName (UScriptCode scriptCode) |
Gets the short script name (the ISO 15924 four-letter code) associated with the given script code.
UScriptCode | uscript_getScript (UChar32 codepoint, UErrorCode *err) |
Gets the script code associated with the given code point.
UBool | uscript_hasScript (UChar32 c, UScriptCode sc) |
Do the Script_Extensions of code point c contain script sc? If c does not have explicit Script_Extensions, then this tests whether c has the Script property value sc.
int32_t | uscript_getScriptExtensions (UChar32 c, UScriptCode *scripts, int32_t capacity, UErrorCode *errorCode) |
Writes code point c's Script_Extensions as a list of UScriptCode values to the output scripts array and returns the number of script codes.
Definition in file uscript.h.
typedef enum UScriptCode UScriptCode |
Constants for ISO 15924 script codes.
Many of these script codes (those from Unicode's ScriptNames.txt) are character property values for Unicode's Script property. See UAX #24 Script Names (http://www.unicode.org/reports/tr24/).
Starting with ICU 3.6, constants for most ISO 15924 script codes are included (currently excluding the private-use codes Qaaa..Qabx). Scripts that have ISO 15924 codes but are not used in the Unicode Character Database (UCD) have no Unicode characters associated with them.
For example, no character has a UCD script code of Hans or Hant: all Han ideographs have the Hani script code. The Hans and Hant script codes are used with CLDR data.
ISO 15924 script codes are included for use with CLDR and similar data sources.
enum UScriptCode |
Enumerator | Description |
---|---|
USCRIPT_INVALID_CODE | |
USCRIPT_COMMON | |
USCRIPT_INHERITED | |
USCRIPT_ARABIC | |
USCRIPT_ARMENIAN | |
USCRIPT_BENGALI | |
USCRIPT_BOPOMOFO | |
USCRIPT_CHEROKEE | |
USCRIPT_COPTIC | |
USCRIPT_CYRILLIC | |
USCRIPT_DESERET | |
USCRIPT_DEVANAGARI | |
USCRIPT_ETHIOPIC | |
USCRIPT_GEORGIAN | |
USCRIPT_GOTHIC | |
USCRIPT_GREEK | |
USCRIPT_GUJARATI | |
USCRIPT_GURMUKHI | |
USCRIPT_HAN | |
USCRIPT_HANGUL | |
USCRIPT_HEBREW | |
USCRIPT_HIRAGANA | |
USCRIPT_KANNADA | |
USCRIPT_KATAKANA | |
USCRIPT_KHMER | |
USCRIPT_LAO | |
USCRIPT_LATIN | |
USCRIPT_MALAYALAM | |
USCRIPT_MONGOLIAN | |
USCRIPT_MYANMAR | |
USCRIPT_OGHAM | |
USCRIPT_OLD_ITALIC | |
USCRIPT_ORIYA | |
USCRIPT_RUNIC | |
USCRIPT_SINHALA | |
USCRIPT_SYRIAC | |
USCRIPT_TAMIL | |
USCRIPT_TELUGU | |
USCRIPT_THAANA | |
USCRIPT_THAI | |
USCRIPT_TIBETAN | |
USCRIPT_CANADIAN_ABORIGINAL | Canadian_Aboriginal script. |
USCRIPT_UCAS | Canadian_Aboriginal script (alias). |
USCRIPT_YI | |
USCRIPT_TAGALOG | |
USCRIPT_HANUNOO | |
USCRIPT_BUHID | |
USCRIPT_TAGBANWA | |
USCRIPT_BRAILLE | |
USCRIPT_CYPRIOT | |
USCRIPT_LIMBU | |
USCRIPT_LINEAR_B | |
USCRIPT_OSMANYA | |
USCRIPT_SHAVIAN | |
USCRIPT_TAI_LE | |
USCRIPT_UGARITIC | |
USCRIPT_KATAKANA_OR_HIRAGANA | New script code in Unicode 4.0.1. |
USCRIPT_BUGINESE | |
USCRIPT_GLAGOLITIC | |
USCRIPT_KHAROSHTHI | |
USCRIPT_SYLOTI_NAGRI | |
USCRIPT_NEW_TAI_LUE | |
USCRIPT_TIFINAGH | |
USCRIPT_OLD_PERSIAN | |
USCRIPT_BALINESE | |
USCRIPT_BATAK | |
USCRIPT_BLISSYMBOLS | |
USCRIPT_BRAHMI | |
USCRIPT_CHAM | |
USCRIPT_CIRTH | |
USCRIPT_OLD_CHURCH_SLAVONIC_CYRILLIC | |
USCRIPT_DEMOTIC_EGYPTIAN | |
USCRIPT_HIERATIC_EGYPTIAN | |
USCRIPT_EGYPTIAN_HIEROGLYPHS | |
USCRIPT_KHUTSURI | |
USCRIPT_SIMPLIFIED_HAN | |
USCRIPT_TRADITIONAL_HAN | |
USCRIPT_PAHAWH_HMONG | |
USCRIPT_OLD_HUNGARIAN | |
USCRIPT_HARAPPAN_INDUS | |
USCRIPT_JAVANESE | |
USCRIPT_KAYAH_LI | |
USCRIPT_LATIN_FRAKTUR | |
USCRIPT_LATIN_GAELIC | |
USCRIPT_LEPCHA | |
USCRIPT_LINEAR_A | |
USCRIPT_MANDAIC | |
USCRIPT_MANDAEAN | |
USCRIPT_MAYAN_HIEROGLYPHS | |
USCRIPT_MEROITIC_HIEROGLYPHS | |
USCRIPT_MEROITIC | |
USCRIPT_NKO | |
USCRIPT_ORKHON | |
USCRIPT_OLD_PERMIC | |
USCRIPT_PHAGS_PA | |
USCRIPT_PHOENICIAN | |
USCRIPT_PHONETIC_POLLARD | |
USCRIPT_RONGORONGO | |
USCRIPT_SARATI | |
USCRIPT_ESTRANGELO_SYRIAC | |
USCRIPT_WESTERN_SYRIAC | |
USCRIPT_EASTERN_SYRIAC | |
USCRIPT_TENGWAR | |
USCRIPT_VAI | |
USCRIPT_VISIBLE_SPEECH | |
USCRIPT_CUNEIFORM | |
USCRIPT_UNWRITTEN_LANGUAGES | |
USCRIPT_UNKNOWN | |
USCRIPT_CARIAN | |
USCRIPT_JAPANESE | |
USCRIPT_LANNA | |
USCRIPT_LYCIAN | |
USCRIPT_LYDIAN | |
USCRIPT_OL_CHIKI | |
USCRIPT_REJANG | |
USCRIPT_SAURASHTRA | |
USCRIPT_SIGN_WRITING | |
USCRIPT_SUNDANESE | |
USCRIPT_MOON | |
USCRIPT_MEITEI_MAYEK | |
USCRIPT_IMPERIAL_ARAMAIC | |
USCRIPT_AVESTAN | |
USCRIPT_CHAKMA | |
USCRIPT_KOREAN | |
USCRIPT_KAITHI | |
USCRIPT_MANICHAEAN | |
USCRIPT_INSCRIPTIONAL_PAHLAVI | |
USCRIPT_PSALTER_PAHLAVI | |
USCRIPT_BOOK_PAHLAVI | |
USCRIPT_INSCRIPTIONAL_PARTHIAN | |
USCRIPT_SAMARITAN | |
USCRIPT_TAI_VIET | |
USCRIPT_MATHEMATICAL_NOTATION | |
USCRIPT_SYMBOLS | |
USCRIPT_BAMUM | |
USCRIPT_LISU | |
USCRIPT_NAKHI_GEBA | |
USCRIPT_OLD_SOUTH_ARABIAN | |
USCRIPT_BASSA_VAH | |
USCRIPT_DUPLOYAN_SHORTAND | |
USCRIPT_ELBASAN | |
USCRIPT_GRANTHA | |
USCRIPT_KPELLE | |
USCRIPT_LOMA | |
USCRIPT_MENDE | |
USCRIPT_MEROITIC_CURSIVE | |
USCRIPT_OLD_NORTH_ARABIAN | |
USCRIPT_NABATAEAN | |
USCRIPT_PALMYRENE | |
USCRIPT_SINDHI | |
USCRIPT_WARANG_CITI | |
USCRIPT_AFAKA | |
USCRIPT_JURCHEN | |
USCRIPT_MRO | |
USCRIPT_NUSHU | |
USCRIPT_SHARADA | |
USCRIPT_SORA_SOMPENG | |
USCRIPT_TAKRI | |
USCRIPT_TANGUT | |
USCRIPT_WOLEAI | |
USCRIPT_ANATOLIAN_HIEROGLYPHS | |
USCRIPT_KHOJKI | |
USCRIPT_TIRHUTA | |
USCRIPT_CODE_LIMIT | |
int32_t uscript_getCode(const char *nameOrAbbrOrLocale, UScriptCode *fillIn, int32_t capacity, UErrorCode *err)
Gets the script codes associated with the given locale, ISO 15924 abbreviation, or script name.
Fills in USCRIPT_MALAYALAM given "Malayalam" or "Mlym". Fills in USCRIPT_LATIN given "en" or "en_US". If the required capacity is greater than the capacity of the destination buffer, the error code is set to U_BUFFER_OVERFLOW_ERROR and the required capacity is returned.
Note: To search by short or long script alias only, use u_getPropertyValueEnum(UCHAR_SCRIPT, alias) instead; it performs a fast lookup without accessing locale data.
nameOrAbbrOrLocale | the script name as given in PropertyValueAliases.txt, an ISO 15924 code, or a locale ID |
fillIn | the UScriptCode buffer that receives the script codes |
capacity | the capacity (size) of the UScriptCode buffer passed in |
err | the error status code. |
const char *uscript_getName(UScriptCode scriptCode)
Gets the long script name associated with the given script code.
Returns "Malayalam" given USCRIPT_MALAYALAM.
scriptCode | UScriptCode enum |
UScriptCode uscript_getScript(UChar32 codepoint, UErrorCode *err)
Gets the script code associated with the given codepoint.
Returns USCRIPT_MALAYALAM given 0x0D02
codepoint | UChar32 codepoint |
err | the error status code. |
int32_t uscript_getScriptExtensions(UChar32 c, UScriptCode *scripts, int32_t capacity, UErrorCode *errorCode)
Writes code point c's Script_Extensions as a list of UScriptCode values to the output scripts array and returns the number of script codes. If c has no explicit Script_Extensions, its single Script property value is written instead.
Some characters are commonly used in multiple scripts. For more information, see UAX #24: http://www.unicode.org/reports/tr24/.
If there are more than capacity script codes to write, U_BUFFER_OVERFLOW_ERROR is set and the total number of Script_Extensions is returned (the usual ICU buffer-handling behavior).
The Script_Extensions property is provisional. It may be modified or removed in future versions of the Unicode Standard, and thus in ICU.
c | code point |
scripts | output script code array |
capacity | capacity of the scripts array |
errorCode | Standard ICU error code. Its input value must pass the U_SUCCESS() test, or else the function returns immediately. Check for U_FAILURE() on output or use with function chaining. (See User Guide for details.) |
const char *uscript_getShortName(UScriptCode scriptCode)
Gets the short script name (the ISO 15924 four-letter code) associated with the given script code.
Returns "Mlym" given USCRIPT_MALAYALAM.
scriptCode | UScriptCode enum |
UBool uscript_hasScript(UChar32 c, UScriptCode sc)
Do the Script_Extensions of code point c contain script sc? If c does not have explicit Script_Extensions, then this tests whether c has the Script property value sc.
Some characters are commonly used in multiple scripts. For more information, see UAX #24: http://www.unicode.org/reports/tr24/.
The Script_Extensions property is provisional. It may be modified or removed in future versions of the Unicode Standard, and thus in ICU.
c | code point |
sc | script code |