---
tags: HTML4 SPEC
---
# Section 8: Language information and text direction
- [document link](https://www.w3.org/TR/html401/struct/dirlang.html)
- purpose: internationalization of HTML (the `lang` and `dir` attributes)
## 8.1 Specifying the language of content: the `lang` attribute
This attribute specifies the base language of an element's **attribute values** and **text content**. Author-supplied language information is helpful for:
- Assisting search engines
- Assisting speech synthesizers
- Helping a **user agent** select glyph variants for high quality typography
- Helping a **user agent** choose a set of quotation marks
- Helping a **user agent** make decisions about hyphenation, ligatures, and spacing
- Assisting spell checkers and grammar checkers
### 8.1.1 Language codes
The `lang` attribute's value is a language code. Language codes consist of a **primary code** and a possibly empty series of **subcodes**:
```
language-code = primary-code ( "-" subcode )*
```
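Example:
- `en`: English
- `en-US`: the U.S. version of English
- `en-cockney`: the Cockney version of English

Two-letter codes have reserved meanings:
- Two-letter primary codes are reserved for ISO 639 language abbreviations
- Any two-letter subcode is understood to be an ISO 3166 country code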
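
The grammar above can be sketched as a small parser. This is a minimal illustration, not a validator: the function name `parse_language_code` is hypothetical, and the sketch only splits on `-` without checking the codes against any registry.

```python
def parse_language_code(code: str) -> tuple[str, list[str]]:
    """Split a language code into (primary_code, subcodes).

    Follows the shape  primary-code ( "-" subcode )*  and rejects
    empty tokens such as "en-" or "-US".
    """
    primary, *subcodes = code.split("-")
    if not primary or any(not sub for sub in subcodes):
        raise ValueError(f"malformed language code: {code!r}")
    return primary, subcodes


if __name__ == "__main__":
    for sample in ("en", "en-US", "en-cockney"):
        print(sample, "->", parse_language_code(sample))
```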
### 8.1.2 Inheritance of language codes
An element **inherits** language code information according to the following order of precedence (highest to lowest):
1. The `lang` attribute set for the **element itself**
2. The **closest parent** element that has the `lang` attribute set
3. The **HTTP "Content-Language" header** (which may be configured in a server)
4. **User agent** default values and user preferences
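
The precedence order above can be sketched as a simple lookup. The function `effective_language` and its parameters are hypothetical names for illustration; the sketch assumes the caller has already collected the `lang` values from the element outward to the root, with `None` where the attribute is not set.

```python
from typing import Optional, Sequence


def effective_language(
    lang_chain: Sequence[Optional[str]],
    content_language: Optional[str] = None,
    user_agent_default: Optional[str] = None,
) -> Optional[str]:
    """Resolve an element's language by order of precedence.

    lang_chain[0] is the element's own lang attribute, followed by
    its parent, grandparent, and so on (None means "not set").
    """
    # 1 and 2: the element itself, then the closest ancestor with lang set
    for lang in lang_chain:
        if lang is not None:
            return lang
    # 3: the HTTP Content-Language header, if any
    if content_language is not None:
        return content_language
    # 4: user agent defaults and user preferences
    return user_agent_default
```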
### 8.1.3 Interpretation of language codes
- A language code should be interpreted by user agents as a **hierarchy of tokens** rather than a single token.
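- When matching language codes, a user agent should always favor an **exact match**, but should also treat a matching primary code as sufficient
Example: for `<HTML lang="en-US">`, a user agent should prefer information that matches, in this order: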
1. `en-US`
2. `en`
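
One way to picture the token hierarchy is to expand a code into its fallback candidates, most specific first. This is a rough sketch: `match_candidates` and `best_match` are hypothetical helpers, and the comparison is done on the strings as written, which is a simplifying assumption.

```python
from typing import Iterable, Optional


def match_candidates(code: str) -> list[str]:
    """Expand a language code into candidates, most specific first."""
    tokens = code.split("-")
    return ["-".join(tokens[:n]) for n in range(len(tokens), 0, -1)]


def best_match(code: str, available: Iterable[str]) -> Optional[str]:
    """Pick the most specific available entry for the given code."""
    available = set(available)
    for candidate in match_candidates(code):
        if candidate in available:
            return candidate
    return None
```

For instance, `best_match("en-US", {"en", "fr"})` falls back to `"en"` because no exact `en-US` entry is available.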
## 8.2 Specifying the direction of text and tables: the `dir` attribute
This attribute specifies the base **direction** of directionally neutral text in an element's content and attribute values.
- LTR: Left-to-right text
- RTL: Right-to-left text
Example: to express a Hebrew quotation, it is more intuitive to write
`<Q lang="he" dir="rtl">...a Hebrew quotation...</Q>`
```
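The Hebrew script is the writing system used for Hebrew and other Jewish languages,
and it predates English by thousands of years. Hebrew is an example of bidirectional text:
Hebrew letters are read and written right to left, while numbers run left to right.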
```
[Example](https://www.w3schools.com/tags/tryit.asp?filename=tryhtml5_global_dir)
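### 8.2.0 [A brief introduction to Hebrew](http://languagemystery.blogspot.com/2015/01/blog-post_10.html)
Hebrew was the common language of the ancient Jewish people and is one of the oldest languages still in use. It holds an exalted position in religion: in ancient times, the Bible and the scriptures of Judaism were written in Hebrew. Today, Israel is the country that has Hebrew as an official language.
- Hebrew belongs to the Semitic languages, and its script is written from right to left
- Written Hebrew has only consonants and no vowels
For example, suppose you see this English sentence:
You should love your parents with all your heart.
To write it the way Hebrew is written, first strip the vowels from every word:
Y shld lv yr prnts wth ll yr hrt.
Then write it from right to left:
.trh ry ll htw stnrp ry vl dlhs Y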
### 8.2.1 Introduction to the bidirectional algorithm
Consider the following example text:
```
english1 HEBREW2 english3 HEBREW4 english5 HEBREW6
```
The characters are stored in the computer in logical order, that is, the order in which they are read:
```
0 => e
1 => n
...
49 => 6
```
The way to display this sentence depends on which language is **predominant**.
- English
```
english1 2WERBEH english3 4WERBEH english5 6WERBEH
         <------          <------          <------
            H                H                H
------------------------------------------------->
                        E
```
- Hebrew
```
6WERBEH english5 4WERBEH english3 2WERBEH english1
        ------->         ------->         ------->
           E                E                E
<-------------------------------------------------
                        H
```
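
The two renderings above can be reproduced with a toy model. This is only a sketch and not the Unicode bidirectional algorithm: it assumes the note's convention that words starting with an uppercase letter stand for Hebrew (right-to-left) and everything else is left-to-right, and the name `visual_order` is hypothetical.

```python
from itertools import groupby


def is_rtl(word: str) -> bool:
    """Toy convention from the example: UPPERCASE words stand for Hebrew."""
    return word[:1].isupper()


def visual_order(logical: str, base: str = "ltr") -> str:
    """Turn logically ordered text into its left-to-right display string."""
    runs = []
    # Group consecutive words that share a direction into runs.
    for rtl, words in groupby(logical.split(), key=is_rtl):
        text = " ".join(words)
        # A right-to-left run is displayed with its characters reversed.
        runs.append(text[::-1] if rtl else text)
    # With a right-to-left base direction, the runs themselves
    # are laid out from right to left.
    if base == "rtl":
        runs.reverse()
    return " ".join(runs)


if __name__ == "__main__":
    sample = "english1 HEBREW2 english3 HEBREW4 english5 HEBREW6"
    print(visual_order(sample, "ltr"))
    print(visual_order(sample, "rtl"))
```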
### 8.2.2 Inheritance of text direction information
The bidirectional algorithm requires a **base text direction** for text blocks. To specify the base direction of a block-level element, **set the element's `dir` attribute**. The default value of the `dir` attribute is `ltr` (left-to-right text).
When the `dir` attribute is set for a **block-level** element, it **remains in effect** for the duration of the element and any nested block-level elements.
**Inline elements**, on the other hand, do not inherit the `dir` attribute: an inline element without a `dir` attribute does not open an additional level of embedding for the bidirectional algorithm.
### 8.2.3 Setting the direction of embedded text
The bidirectional algorithm automatically reverses embedded character sequences according to their inherent directionality, see [8.2.1](#821-Introduction-to-the-bidirectional-algorithm).
However, in general only **one level** of embedding can be accounted for.
```
english1 HEBREW2 english3 HEBREW4 english5 HEBREW6
```
- English
```
english1 2WERBEH english3 4WERBEH english5 6WERBEH
         <------          <------          <------
            H                H                H
------------------------------------------------->
                        E
```
- Hebrew
```
6WERBEH english5 4WERBEH english3 2WERBEH english1
        ------->         ------->         ------->
           E                E                E
<-------------------------------------------------
                        H
```
- **English main sentence containing a Hebrew phrase that itself contains English**: the algorithm cannot detect this second level of embedding on its own, so we must supply additional information by delimiting the embedded Hebrew phrase explicitly
```
english1 4WERBEH english3 2WERBEH english5 6WERBEH
                 ------->
                    E
         <-----------------------
                    H
------------------------------------------------->
                        E
```
```html
english1 <SPAN dir="RTL">HEBREW2 english3 HEBREW4</SPAN> english5 HEBREW6
```
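
The effect of the explicit `SPAN` can be sketched by extending the toy model with nesting. This is an illustration under stated assumptions, not the real algorithm: uppercase words stand for Hebrew, an embedding is modeled as a hypothetical `(direction, items)` tuple, and `render` is a made-up name.

```python
from itertools import groupby


def direction(item) -> str:
    """Embeddings carry an explicit direction; words use the case convention."""
    if isinstance(item, tuple):
        return item[0]
    return "rtl" if item[:1].isupper() else "ltr"


def render(items, base: str = "ltr") -> str:
    """Return the left-to-right display string for logically ordered items."""
    runs = []
    for run_dir, group in groupby(items, key=direction):
        shown = []
        for item in group:
            if isinstance(item, tuple):
                # An explicit embedding is laid out with its own base direction.
                shown.append(render(item[1], base=item[0]))
            else:
                shown.append(item[::-1] if run_dir == "rtl" else item)
        # Items inside a right-to-left run appear in reverse order.
        if run_dir == "rtl":
            shown.reverse()
        runs.append(" ".join(shown))
    # A right-to-left base lays the runs out from right to left.
    if base == "rtl":
        runs.reverse()
    return " ".join(runs)


if __name__ == "__main__":
    nested = ["english1", ("rtl", ["HEBREW2", "english3", "HEBREW4"]),
              "english5", "HEBREW6"]
    print(render(nested))
```

Without the tuple, the same words render as in 8.2.1; with it, `english3` stays inside the Hebrew phrase instead of splitting it.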
### 8.2.4 Overriding the bidirectional algorithm: the BDO element
```
<!ELEMENT BDO - - (%inline;)* -- I18N BiDi over-ride -->
<!ATTLIST BDO
%coreattrs; -- id, class, style, title --
lang %LanguageCode; #IMPLIED -- language code --
dir (ltr|rtl) #REQUIRED -- directionality --
>
```
`dir` is a mandatory attribute that specifies the base direction of the element's text content.
- LTR: Left-to-right text.
- RTL: Right-to-left text.
Situations may arise in which the bidirectional algorithm results in **incorrect presentation**. The BDO element allows authors to **turn off** the bidirectional algorithm for selected fragments of text.
Consider a document containing the same text as before:
```
english1 HEBREW2 english3 HEBREW4 english5 HEBREW6
```
Now assume the text has already been put in visual order, for example in an email, so that it is formatted, including line breaks, as:
```
english1 2WERBEH english3
4WERBEH english5 6WERBEH
```
This conflicts with the bidirectional algorithm, because that algorithm would **invert** 2WERBEH, 4WERBEH, and 6WERBEH **a second time**, displaying the Hebrew words left-to-right instead of right-to-left.
Solution:
```htmlembedded
<PRE>
<BDO dir="LTR">english1 2WERBEH english3</BDO>
<BDO dir="LTR">4WERBEH english5 6WERBEH</BDO>
</PRE>
```
### 8.2.5 Character references for directionality and joining control
Since ambiguities sometimes arise as to the directionality of certain characters, the specification includes characters to enable their proper resolution.
Some directional entities:
```
<!ENTITY zwnj CDATA "&#8204;"--=zero width non-joiner-->
<!ENTITY zwj CDATA "&#8205;"--=zero width joiner-->
<!ENTITY lrm CDATA "&#8206;"--=left-to-right mark-->
<!ENTITY rlm CDATA "&#8207;"--=right-to-left mark-->
```
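
Because these characters are invisible, it can help to inspect them programmatically. The sketch below uses Python's standard `html` and `unicodedata` modules to resolve each entity; the helper name `entity_info` is hypothetical.

```python
import html
import unicodedata


def entity_info(name: str) -> tuple[int, str, str]:
    """Resolve a named entity to (code point, Unicode name, bidi class)."""
    char = html.unescape(f"&{name};")
    return ord(char), unicodedata.name(char), unicodedata.bidirectional(char)


if __name__ == "__main__":
    for entity in ("zwnj", "zwj", "lrm", "rlm"):
        code_point, uni_name, bidi_class = entity_info(entity)
        print(f"&{entity}; = &#{code_point}; {uni_name} (bidi class {bidi_class})")
```

The bidi class shows why the marks work: `&lrm;` behaves like a strong left-to-right character and `&rlm;` like a strong right-to-left one, although neither has a visible glyph.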
### 8.2.6 The effect of style sheets on bidirectionality
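Because the bidirectional algorithm relies on the inline/block-level distinction, special care is needed when a style sheet changes an element's rendering from one to the other.
When an **inline element** that does not have a `dir` attribute is transformed to the style of a **block-level element** by a style sheet, it inherits the `dir` attribute from its closest parent block element to define the base direction of the block.
When a **block element** that does not have a `dir` attribute is transformed to the style of an **inline element** by a style sheet, the resulting presentation should be equivalent, in terms of bidirectional formatting, to explicitly adding a `dir` attribute (assigned the inherited value) to the transformed element.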