Charset Conversion Tables


The files in this directory contain conversion tables from a number of charsets (corresponding to the filenames) to UNICODE.
These tables are used by the TYPO3 class "t3lib_cs" when it is asked to convert a string from one charset to another.

Conversion tables are reproduced from http://www.microsoft.com/globaldev/reference/cphome.mspx
An overview is found on this page: http://www.i18nguy.com/unicode/codepages.html
A mirror of Czyborra's pages: http://aspell.net/charsets/

Many more mapping tables can be found here as well:
http://www.unicode.org/Public/MAPPINGS/


PARSING:
The conversion table files are parsed line by line, extracting values using either of these formats:

Syntax 1:
[Local charset value, hex] = U+[UNICODE hex number] : [descriptive text, ignored]

Example:
A0 = U+00A0 : NO-BREAK SPACE
(e.g. iso-8859-1.tbl)

Syntax 2:
0x[Local charset value, hex] 0x[UNICODE hex number] [descriptive text, ignored]

Example:
0xA0 0x00A0 NO-BREAK SPACE
(e.g. big5.tbl)


Lines beginning with "#" and empty lines are ignored.
The syntax is auto-detected based on the first line found in the file.
Syntax 2 comes directly from http://www.unicode.org/Public/MAPPINGS/, so you can probably take any charmap from there, copy it into the csconvtbl/ folder, and it will be ready for use.
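
The parsing rules above can be sketched in a few lines. This is only an illustration in Python, not the code of t3lib_cs itself: the function names parse_table and to_unicode are hypothetical, and the behaviour for malformed lines and unmapped bytes (skip them, pass them through unchanged) is an assumption that the description above does not specify.

```python
import re

# Syntax 1: "A0 = U+00A0 : NO-BREAK SPACE"
SYNTAX_1 = re.compile(r"^([0-9A-Fa-f]+)\s*=\s*U\+([0-9A-Fa-f]+)")
# Syntax 2: "0xA0 0x00A0 NO-BREAK SPACE"
SYNTAX_2 = re.compile(r"^0x([0-9A-Fa-f]+)\s+0x([0-9A-Fa-f]+)")


def parse_table(text):
    """Return {local charset value: Unicode code point} for one table file."""
    table = {}
    pattern = None  # chosen once, from the first data line in the file
    for raw in text.splitlines():
        line = raw.strip()
        # Lines beginning with "#" and empty lines are ignored.
        if not line or line.startswith("#"):
            continue
        if pattern is None:
            pattern = SYNTAX_2 if line.startswith("0x") else SYNTAX_1
        match = pattern.match(line)
        if match:  # assumption: lines that do not fit the syntax are skipped
            local, code_point = (int(group, 16) for group in match.groups())
            table[local] = code_point
    return table


def to_unicode(data, table):
    """Decode bytes of a single-byte charset with a parsed table.

    Assumption: bytes missing from the table are passed through unchanged.
    Multibyte tables (e.g. big5.tbl) would need lead-byte handling instead.
    """
    return "".join(chr(table.get(byte, byte)) for byte in data)
```

For example, parse_table("A4 = U+20AC : EURO SIGN") yields a table with which to_unicode(b"\xa4", table) returns the euro sign. Detecting the syntax once per file, rather than per line, follows the rule that the first line found decides the format.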


INDEX:

iso-8859-1.tbl
ISO Character Set 8859-1 (Latin 1)
http://www.microsoft.com/globaldev/reference/iso/28591.htm

iso-8859-2.tbl
ISO Character Set 8859-2 (Latin 2)
http://www.microsoft.com/globaldev/reference/iso/28592.htm

iso-8859-3.tbl
ISO Character Set 8859-3 (Latin 3)
http://www.microsoft.com/globaldev/reference/iso/28593.htm

iso-8859-4.tbl
ISO Character Set 8859-4 (Baltic)
http://www.microsoft.com/globaldev/reference/iso/28594.htm

iso-8859-5.tbl
ISO Character Set 8859-5 (Cyrillic)
http://www.microsoft.com/globaldev/reference/iso/28595.htm

iso-8859-6.tbl
ISO Character Set 8859-6 (Arabic)
http://www.microsoft.com/globaldev/reference/iso/28596.htm

iso-8859-7.tbl
ISO Character Set 8859-7 (Greek)
http://www.microsoft.com/globaldev/reference/iso/28597.htm

iso-8859-8.tbl
ISO Character Set 8859-8 (Hebrew)
http://www.microsoft.com/globaldev/reference/iso/28598.htm

iso-8859-9.tbl
ISO Character Set 8859-9 (Turkish)
http://www.microsoft.com/globaldev/reference/iso/28599.htm

iso-8859-10.tbl
ISO Character Set 8859-10
http://www.unicode.org/Public/MAPPINGS/ISO8859/8859-10.TXT

iso-8859-11.tbl
ISO Character Set 8859-11 (Thai)
http://czyborra.com/charsets/iso8859.html
http://aspell.net/charsets/iso8859.html
http://www.unicode.org/Public/MAPPINGS/ISO8859/8859-11.TXT

iso-8859-13.tbl
ISO Character Set 8859-13 (Lithuanian)
http://www.unicode.org/Public/MAPPINGS/ISO8859/8859-13.TXT

iso-8859-14.tbl
ISO Character Set 8859-14 (Celtic)
http://www.unicode.org/Public/MAPPINGS/ISO8859/8859-14.TXT

iso-8859-15.tbl
ISO Character Set 8859-15 (Latin 9)
http://www.microsoft.com/globaldev/reference/iso/28605.htm

iso-8859-16.tbl
ISO Character Set 8859-16 (Romanian)
http://www.unicode.org/Public/MAPPINGS/ISO8859/8859-16.TXT

windows-1250.tbl
Microsoft Windows Codepage : 1250 (Central Europe)
http://www.microsoft.com/globaldev/reference/sbcs/1250.htm

windows-1251.tbl
Microsoft Windows Codepage : 1251 (Cyrillic)
http://www.microsoft.com/globaldev/reference/sbcs/1251.htm

windows-1252.tbl
Microsoft Windows Codepage : 1252 (Latin I)
http://www.microsoft.com/globaldev/reference/sbcs/1252.htm

windows-1253.tbl
Microsoft Windows Codepage : 1253 (Greek)
http://www.microsoft.com/globaldev/reference/sbcs/1253.htm

windows-1254.tbl
Microsoft Windows Codepage : 1254 (Turkish)
http://www.microsoft.com/globaldev/reference/sbcs/1254.htm

windows-1255.tbl
Microsoft Windows Codepage : 1255 (Hebrew)
http://www.microsoft.com/globaldev/reference/sbcs/1255.htm

windows-1256.tbl
Microsoft Windows Codepage : 1256 (Arabic)
http://www.microsoft.com/globaldev/reference/sbcs/1256.htm

windows-1257.tbl
Microsoft Windows Codepage : 1257 (Baltic)
http://www.microsoft.com/globaldev/reference/sbcs/1257.htm

windows-1258.tbl
Microsoft Windows Codepage : 1258 (Viet Nam)
http://www.microsoft.com/globaldev/reference/sbcs/1258.htm

windows-874.tbl
Microsoft Windows Codepage : 874 (Thai)
http://www.microsoft.com/globaldev/reference/sbcs/874.htm

shift_jis.tbl
Microsoft Windows Codepage : 932 (Japanese Shift-JIS)
http://www.microsoft.com/globaldev/reference/dbcs/932.htm
(Multibyte)

gb2312.tbl
Microsoft Windows Codepage : 936 (Simplified Chinese GBK)
gb2312 936 Chinese Simplified (GB2312)
gb_2312-80 936 Chinese Simplified (GB2312)
http://www.microsoft.com/globaldev/reference/dbcs/936.htm
(Multibyte)
Note: this is an MS-specific superset of the real GB2312

euc-kr.tbl
Microsoft Windows Codepage : 949 (Korean EUC-KR)
http://www.microsoft.com/globaldev/reference/dbcs/949.htm
(Multibyte)
Note: this is an MS-specific superset of the real EUC-KR

big5.tbl
Microsoft Windows Codepage : 950 (Traditional Chinese Big5)
http://www.microsoft.com/globaldev/reference/dbcs/950.htm
(Multibyte)
Note: this is an MS-specific superset of the real Big5


koi8-r.tbl
Cyrillic (Russian)
http://www.unicode.org/Public/MAPPINGS/VENDORS/MISC/KOI8-R.TXT
Generated: Thu Aug 11 10:00:09 2016 | Cross-referenced by PHPXref 0.7.1 |