Tag | (0004,1142) |
---|---|
Type | Conditionally Required (1C) |
Keyword | SpecificCharacterSetOfFileSetDescriptorFile |
Value Multiplicity | 1 |
Value Representation | Code String (CS) |
Character set used in the File-set Descriptor File with a File ID as specified in File-set Descriptor File ID (0004,1141). Required to specify the expanded or replacement character set. If absent, only the Basic Graphic set is used. See Section C.12.1.1.2 for Defined Terms.
Specific Character Set (0008,0005) identifies the Character Set that expands or replaces the Basic Graphic Set (ISO 646) for values of Data Elements that have Value Representation of SH, LO, ST, PN, LT, UC or UT. See PS3.5.
If the Attribute Specific Character Set (0008,0005) is not present or has only a single value, Code Extension techniques are not used. Defined Terms for the Attribute Specific Character Set (0008,0005), when single valued, are derived from the International Registration Number as per ISO 2375 (e.g., ISO_IR 100 for Latin alphabet No. 1). See Table C.12-2.
The Specific Character Set value does not indicate the character set version in use at the time of SOP Instance creation. Updates to character sets designated by a Specific Character Set value are expected to be backward compatible.
This Standard does not specify the language associated with a specific character set. Language and character set selection are defined by local and regulatory requirements.
Table C.12-2. Defined Terms for Single-Byte Character Sets Without Code Extensions
| Character Set Description | Defined Term | ISO Registration Number | Number of Characters | Code Element | Character Set |
|---|---|---|---|---|---|
| Default repertoire | none | ISO-IR 6 | 94 | G0 | |
| Latin alphabet No. 1 | ISO_IR 100 | ISO-IR 100 | 96 | G1 | |
| | | ISO-IR 6 | 94 | G0 | |
| Latin alphabet No. 2 | ISO_IR 101 | ISO-IR 101 | 96 | G1 | |
| | | ISO-IR 6 | 94 | G0 | |
| Latin alphabet No. 3 | ISO_IR 109 | ISO-IR 109 | 96 | G1 | |
| | | ISO-IR 6 | 94 | G0 | |
| Latin alphabet No. 4 | ISO_IR 110 | ISO-IR 110 | 96 | G1 | |
| | | ISO-IR 6 | 94 | G0 | |
| Cyrillic | ISO_IR 144 | ISO-IR 144 | 96 | G1 | |
| | | ISO-IR 6 | 94 | G0 | |
| Arabic | ISO_IR 127 | ISO-IR 127 | 96 | G1 | |
| | | ISO-IR 6 | 94 | G0 | |
| Greek | ISO_IR 126 | ISO-IR 126 | 96 | G1 | |
| | | ISO-IR 6 | 94 | G0 | |
| Hebrew | ISO_IR 138 | ISO-IR 138 | 96 | G1 | |
| | | ISO-IR 6 | 94 | G0 | |
| Latin alphabet No. 5 | ISO_IR 148 | ISO-IR 148 | 96 | G1 | |
| | | ISO-IR 6 | 94 | G0 | |
| Latin alphabet No. 9 | ISO_IR 203 | ISO-IR 203 | 96 | G1 | |
| | | ISO-IR 6 | 94 | G0 | |
| Japanese | ISO_IR 13 | ISO-IR 13 | 94 | G1 | [JIS X 0201]: Katakana |
| | | ISO-IR 14 | 94 | G0 | [JIS X 0201]: Romaji |
| Thai | ISO_IR 166 | ISO-IR 166 | 88 | G1 | |
| | | ISO-IR 6 | 94 | G0 | |
To use the single-byte code table of JIS X 0201, value 1 of the Attribute Specific Character Set (0008,0005) should be ISO_IR 13. This designates ISO-IR 13 as the G1 code element, which is invoked in the GR area. In addition, ISO-IR 14 is designated as the G0 code element, which is invoked in the GL area.
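As an illustration of the single-valued case, the following sketch maps the Defined Terms of Table C.12-2 to Python codec names and decodes a value without any escape-sequence handling. The codec names and the helper `decode_single_valued` are assumptions of this sketch and are not defined by this Standard. In particular, Python's `shift_jis` codec only approximates JIS X 0201, because it treats bytes below 0x80 as ASCII rather than as Romaji.

```python
# Assumed mapping from Defined Terms (Table C.12-2) to Python codec names.
# The codec choices are illustrative; the Standard does not define them.
SINGLE_BYTE_CODECS = {
    "": "ascii",  # attribute absent or empty: default repertoire (ISO-IR 6)
    "ISO_IR 100": "latin_1",
    "ISO_IR 101": "iso8859_2",
    "ISO_IR 109": "iso8859_3",
    "ISO_IR 110": "iso8859_4",
    "ISO_IR 144": "iso8859_5",
    "ISO_IR 127": "iso8859_6",
    "ISO_IR 126": "iso8859_7",
    "ISO_IR 138": "iso8859_8",
    "ISO_IR 148": "iso8859_9",
    "ISO_IR 203": "iso8859_15",
    "ISO_IR 13": "shift_jis",  # approximation of JIS X 0201 (see lead-in)
    "ISO_IR 166": "tis_620",
}


def decode_single_valued(raw: bytes, defined_term: str = "") -> str:
    """Decode raw bytes for a single-valued Specific Character Set."""
    try:
        codec = SINGLE_BYTE_CODECS[defined_term.strip()]
    except KeyError:
        raise ValueError(f"unsupported Defined Term: {defined_term!r}") from None
    return raw.decode(codec)


latin = decode_single_valued(b"J\xe9r\xf4me", "ISO_IR 100")
katakana = decode_single_valued(b"\xd4\xcf\xc0\xde", "ISO_IR 13")
```

Here the bytes 0xE9 and 0xF4 fall in the GR area and are read from ISO-IR 100, while the four Katakana bytes are read from ISO-IR 13.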
If the Attribute Specific Character Set (0008,0005) has more than one value, Code Extension techniques are used and Escape Sequences may be encountered in all character sets. Requirements for the use of Code Extension techniques are specified in PS3.5. In order to indicate the presence of Code Extension, the Defined Terms for the repertoires have the prefix "ISO 2022", e.g., ISO 2022 IR 100 for the Latin Alphabet No. 1. See Table C.12-3 and Table C.12-4. Table C.12-3 describes single-byte character sets for value 1 to value n of the Attribute Specific Character Set (0008,0005), and Table C.12-4 describes multi-byte character sets for value 2 to value n of the Attribute Specific Character Set (0008,0005).
A prefix other than "ISO 2022" may be needed in the future if other Code Extension techniques are adopted.
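The effect of escape sequences on the G0 and G1 code elements can be sketched as follows. This is a minimal illustration that assumes only a few of the single-byte sets from Table C.12-3, assumes the Python codec names shown, and ignores the additional requirements of PS3.5 (for example, any resetting of the active character sets at delimiters). The names `DESIGNATIONS` and `decode_with_code_extensions` are hypothetical.

```python
ESC = 0x1B

# Bytes following ESC -> (code element, assumed Python codec).
# Byte values follow Table C.12-3, e.g. 02/13 04/06 is 0x2D 0x46 ("-F").
DESIGNATIONS = {
    b"(B": ("G0", "ascii"),      # ISO-IR 6
    b"-A": ("G1", "latin_1"),    # ISO-IR 100
    b"-F": ("G1", "iso8859_7"),  # ISO-IR 126
    b"-L": ("G1", "iso8859_5"),  # ISO-IR 144
}


def decode_with_code_extensions(raw: bytes, g1_codec: str = "ascii") -> str:
    """Decode raw bytes, switching G0/G1 at each known escape sequence.

    g1_codec is the codec assumed for the GR area before any escape
    sequence is seen; with the default, high bytes are rejected.
    """
    g0, g1 = "ascii", g1_codec
    out = []
    i = 0
    while i < len(raw):
        byte = raw[i]
        if byte == ESC:
            seq = raw[i + 1:i + 3]
            if seq not in DESIGNATIONS:
                raise ValueError(f"unsupported escape sequence: {seq!r}")
            element, codec = DESIGNATIONS[seq]
            if element == "G0":
                g0 = codec
            else:
                g1 = codec
            i += 3
            continue
        # Bytes below 0x80 come from G0 (GL area), the rest from G1 (GR area).
        out.append(bytes([byte]).decode(g0 if byte < 0x80 else g1))
        i += 1
    return "".join(out)


text = decode_with_code_extensions(
    b"Dionysios=\x1b-F\xc4\xe9\xef\xed\xf5\xf3\xe9\xef\xf2"
)
```

The sketch tracks G0 and G1 separately, so an escape sequence that designates a new G1 set leaves the G0 set unchanged, and vice versa.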
The same character set shall not be used more than once in Specific Character Set (0008,0005).
For example, the values "ISO 2022 IR 100\ISO 2022 IR 100" or "ISO_IR 100\ISO 2022 IR 100" are redundant and not permitted.
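A check for this rule might look like the sketch below. It assumes that two Defined Terms name the same character set when they carry the same ISO registration number, regardless of the "ISO_IR" or "ISO 2022 IR" prefix, which matches the two examples above. The function names are hypothetical, and terms without a registration number (such as GB18030) are outside the scope of the sketch.

```python
import re


def registration_number(defined_term: str) -> str:
    """Return the ISO registration number embedded in a Defined Term."""
    match = re.fullmatch(r"(?:ISO_IR|ISO 2022 IR) (\d+)", defined_term.strip())
    if match is None:
        raise ValueError(f"not an ISO_IR / ISO 2022 IR term: {defined_term!r}")
    return match.group(1)


def find_repeated_character_sets(specific_character_set: str) -> list[str]:
    """List registration numbers that occur more than once in the value."""
    seen, repeated = set(), []
    for term in specific_character_set.split("\\"):
        if not term.strip():
            continue  # empty values are not compared in this sketch
        number = registration_number(term)
        if number in seen and number not in repeated:
            repeated.append(number)
        seen.add(number)
    return repeated


repeats = find_repeated_character_sets("ISO_IR 100\\ISO 2022 IR 100")
```

A non-empty result indicates a value that is not permitted.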
Table C.12-3. Defined Terms for Single-Byte Character Sets with Code Extensions
| Character Set Description | Defined Term | Standard for Code Extension | ESC Sequence | ISO Registration Number | Number of Characters | Code Element | Character Set |
|---|---|---|---|---|---|---|---|
| Default repertoire | ISO 2022 IR 6 | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | |
| Latin alphabet No. 1 | ISO 2022 IR 100 | ISO 2022 | ESC 02/13 04/01 | ISO-IR 100 | 96 | G1 | |
| | | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | |
| Latin alphabet No. 2 | ISO 2022 IR 101 | ISO 2022 | ESC 02/13 04/02 | ISO-IR 101 | 96 | G1 | |
| | | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | |
| Latin alphabet No. 3 | ISO 2022 IR 109 | ISO 2022 | ESC 02/13 04/03 | ISO-IR 109 | 96 | G1 | |
| | | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | |
| Latin alphabet No. 4 | ISO 2022 IR 110 | ISO 2022 | ESC 02/13 04/04 | ISO-IR 110 | 96 | G1 | |
| | | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | |
| Cyrillic | ISO 2022 IR 144 | ISO 2022 | ESC 02/13 04/12 | ISO-IR 144 | 96 | G1 | |
| | | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | |
| Arabic | ISO 2022 IR 127 | ISO 2022 | ESC 02/13 04/07 | ISO-IR 127 | 96 | G1 | |
| | | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | |
| Greek | ISO 2022 IR 126 | ISO 2022 | ESC 02/13 04/06 | ISO-IR 126 | 96 | G1 | |
| | | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | |
| Hebrew | ISO 2022 IR 138 | ISO 2022 | ESC 02/13 04/08 | ISO-IR 138 | 96 | G1 | |
| | | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | |
| Latin alphabet No. 5 | ISO 2022 IR 148 | ISO 2022 | ESC 02/13 04/13 | ISO-IR 148 | 96 | G1 | |
| | | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | |
| Latin alphabet No. 9 | ISO 2022 IR 203 | ISO 2022 | ESC 02/13 06/02 | ISO-IR 203 | 96 | G1 | |
| | | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | |
| Japanese | ISO 2022 IR 13 | ISO 2022 | ESC 02/09 04/09 | ISO-IR 13 | 94 | G1 | [JIS X 0201]: Katakana |
| | | ISO 2022 | ESC 02/08 04/10 | ISO-IR 14 | 94 | G0 | [JIS X 0201]: Romaji |
| Thai | ISO 2022 IR 166 | ISO 2022 | ESC 02/13 05/04 | ISO-IR 166 | 88 | G1 | |
| | | ISO 2022 | ESC 02/08 04/02 | ISO-IR 6 | 94 | G0 | |
If the Attribute Specific Character Set (0008,0005) has more than one value and value 1 is empty, it is assumed that value 1 is ISO 2022 IR 6.
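The rule for an empty value 1 can be sketched as a small normalization step. The function name `normalize_values` is hypothetical, and the sketch assumes the attribute value is available as a single string with backslash-delimited values.

```python
def normalize_values(specific_character_set: str) -> list[str]:
    """Split the attribute value and fill in an empty value 1."""
    values = [value.strip() for value in specific_character_set.split("\\")]
    # Only a multi-valued attribute triggers the ISO 2022 IR 6 assumption.
    if len(values) > 1 and values[0] == "":
        values[0] = "ISO 2022 IR 6"
    return values


values = normalize_values("\\ISO 2022 IR 87")
```

A single empty value is left unchanged, since in that case Code Extension techniques are not used.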
Table C.12-4. Defined Terms for Multi-Byte Character Sets with Code Extensions
| Character Set Description | Defined Term | Standard for Code Extension | ESC Sequence | ISO Registration Number | Number of Characters | Code Element | Character Set |
|---|---|---|---|---|---|---|---|
| Japanese | ISO 2022 IR 87 | ISO 2022 | ESC 02/04 04/02 | ISO-IR 87 | 94² | G0 | [JIS X 0208]: Kanji |
| | ISO 2022 IR 159 | ISO 2022 | ESC 02/04 02/08 04/04 | ISO-IR 159 | 94² | G0 | [JIS X 0212]: Supplementary Kanji set |
| Korean | ISO 2022 IR 149 | ISO 2022 | ESC 02/04 02/09 04/03 | ISO-IR 149 | 94² | G1 | [KS X 1001]: Hangul and Hanja |
| Simplified Chinese | ISO 2022 IR 58 | ISO 2022 | ESC 02/04 02/09 04/01 | ISO-IR 58 | 6,763 | G1 | |
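The ESC Sequence columns of Table C.12-3 and Table C.12-4 use the column/row notation of ISO 2022, in which "cc/rr" denotes the byte with value 16 × cc + rr. The sketch below converts that notation to bytes; the function name `esc_sequence_to_bytes` is hypothetical.

```python
def esc_sequence_to_bytes(notation: str) -> bytes:
    """Convert e.g. 'ESC 02/13 04/01' to the bytes 0x1B 0x2D 0x41."""
    tokens = notation.split()
    if not tokens or tokens[0] != "ESC":
        raise ValueError(f"expected a sequence starting with ESC: {notation!r}")
    out = bytearray([0x1B])  # ESC itself is 01/11
    for token in tokens[1:]:
        column, row = (int(part) for part in token.split("/"))
        if not (0 <= column <= 15 and 0 <= row <= 15):
            raise ValueError(f"column/row out of range: {token!r}")
        out.append(column * 16 + row)
    return bytes(out)


latin1 = esc_sequence_to_bytes("ESC 02/13 04/01")   # ISO 2022 IR 100
kanji = esc_sequence_to_bytes("ESC 02/04 04/02")    # ISO 2022 IR 87
```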
The following multi-byte character sets prohibit the use of Code Extension Techniques:
The Unicode character set used in [ISO/IEC 10646], when encoded in UTF-8
The [GB 18030] character set, when encoded per the rules of [GB 18030]
These character sets may only be specified as value 1 in the Specific Character Set (0008,0005) Attribute and there shall only be one value. The minimal length UTF-8 encoding shall always be used for [ISO/IEC 10646].
[ISO/IEC 10646] now prohibits the use of anything but the minimum length encoding for UTF-8. UTF-8 permits multiple different encodings, but when used to encode Unicode characters in accordance with ISO 10646-1 and 10646-2 (with extensions) only the minimal encodings are legal.
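As an illustration of the minimal-length requirement, the sketch below relies on Python's built-in strict UTF-8 decoder, which rejects overlong (non-minimal) encodings. The function name `is_valid_utf8` is hypothetical, and the check covers UTF-8 well-formedness in general, not only minimal length.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True if raw is well-formed UTF-8 (overlong forms are rejected)."""
    try:
        raw.decode("utf-8", errors="strict")
    except UnicodeDecodeError:
        return False
    return True


minimal = is_valid_utf8(b"/")          # one-byte encoding of SOLIDUS
overlong = is_valid_utf8(b"\xc0\xaf")  # two-byte non-minimal form of SOLIDUS
```

In this example `minimal` is true and `overlong` is false, because 0xC0 0xAF is a non-minimal encoding of a character that fits in one byte.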
The representation for the characters in the DICOM Default Character Repertoire is the same single byte value for the Default Character Repertoire, [ISO/IEC 10646] in UTF-8, [GB 18030] and [GBK]. It is also the 7-bit US-ASCII encoding.
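This property can be spot-checked with Python's built-in codecs, as in the sketch below. The codec names are Python's, not Defined Terms of this Standard, and the range 0x20 to 0x7E (SPACE plus the 94 graphic characters) is this sketch's assumption for the characters to test.

```python
CODECS = ("ascii", "utf_8", "gb18030", "gbk")


def encodes_identically(char: str) -> bool:
    """True if every codec encodes char as the single byte ord(char)."""
    encodings = {char.encode(codec) for codec in CODECS}
    return encodings == {bytes([ord(char)])}


# SPACE and the graphic characters of the default repertoire.
candidates = [chr(code) for code in range(0x20, 0x7F)]
all_identical = all(encodes_identically(char) for char in candidates)
```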
The [GBK] character set is a subset of the [GB 18030] character set, restricted to its one- and two-byte code points. Within this subset, [GBK] follows exactly the same encoding rules as [GB 18030].
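The subset relationship can be illustrated with Python's `gbk` and `gb18030` codecs, which are assumed here to be adequate stand-ins for the two character sets. The function name `same_in_gbk_and_gb18030` is hypothetical.

```python
def same_in_gbk_and_gb18030(text: str) -> bool:
    """True if text is encodable in GBK and yields the same bytes in GB 18030."""
    try:
        return text.encode("gbk") == text.encode("gb18030")
    except UnicodeEncodeError:
        return False  # text lies outside the GBK subset


two_byte = same_in_gbk_and_gb18030("\u4e2d\u6587")     # two-byte code points
four_byte = same_in_gbk_and_gb18030(chr(0x20000))      # needs four bytes in GB 18030
```

Characters that require a four-byte code in [GB 18030] fall outside [GBK], so the second call returns false.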
Table C.12-5. Defined Terms for Multi-Byte Character Sets Without Code Extensions
| Character Set Description | Defined Term | Character Set |
|---|---|---|
| Unicode in UTF-8 | ISO_IR 192 | |
| GB18030 | GB18030 | |
| GBK | GBK | |