# Language detection

Each time a post is created, the body content is analyzed to determine which language(s) it contains.\
The result is stored as JSON in the `body_language` field of the [Comments](https://docs.hivesql.io/technical-informations/state-tables/comments) table.\
As a post can contain multiple languages, the result is an array.

Each item in the array contains the following values:

* Language code
* Confidence score
* Is reliable - true/false

If the language cannot be determined (e.g., a post containing only pictures), the array is left empty.

The result is something like this:

![](https://2883586225-files.gitbook.io/~/files/v0/b/gitbook-legacy-files/o/assets%2F-MWGLUXcek_KgwMLAO9R%2F-MZPwHdH5GpLeA7Tf7s1%2F-MZQ3LDrTztrzUo0G829%2Fimage.png?alt=media\&token=9723f618-2cbc-4841-9732-aca9f1128245)

A post with several languages will have its `body_language` field set to something like:

```
[{"language":"es","isReliable":true,"confidence":8.22},
 {"language":"en","isReliable":false,"confidence":5.12}]
```
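
As a minimal sketch of how a client could consume this value, the Python snippet below parses a `body_language` string into typed records. The `DetectedLanguage` class and the `parse_body_language` function are hypothetical names introduced for illustration; only the JSON keys come from the example above. It assumes the field has already been fetched as text and that an undetermined language is stored as an empty array (or is missing).

```python
import json
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DetectedLanguage:
    """One item of the body_language array (hypothetical helper type)."""
    language: str
    is_reliable: bool
    confidence: float


def parse_body_language(raw: Optional[str]) -> List[DetectedLanguage]:
    """Parse the JSON text of body_language; empty or missing input gives []."""
    if not raw:
        return []
    return [
        DetectedLanguage(
            language=item["language"],
            is_reliable=bool(item["isReliable"]),
            confidence=float(item["confidence"]),
        )
        for item in json.loads(raw)
    ]


# Sample value copied from the example above.
sample = (
    '[{"language":"es","isReliable":true,"confidence":8.22},'
    ' {"language":"en","isReliable":false,"confidence":5.12}]'
)
langs = parse_body_language(sample)
for entry in langs:
    print(entry.language, entry.is_reliable, entry.confidence)
```

Keeping the parsing in one place means the rest of the client code can work with named attributes rather than raw dictionary keys.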

The `confidence` value depends on how much text the post contains: the more text analyzed, the better the language analysis and the higher the `confidence` value. Confidence is not a ratio and can exceed 100.

If the post contains words in several languages, `isReliable` is set to `true` on the most probable language, even if its `confidence` value is lower than that of another entry.

If only one language is detected and `isReliable` is set to `false`, the `confidence` is too low for the result to be trusted.
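
The rules above suggest one way to pick a single language per post: prefer an entry flagged `isReliable`, and only fall back to the highest `confidence` when no entry is flagged. The Python sketch below illustrates this; `primary_language` is a hypothetical helper, and the fallback behaviour is an assumption of this example rather than something HiveSQL prescribes.

```python
import json
from typing import Optional, Tuple


def primary_language(raw: Optional[str]) -> Optional[Tuple[str, bool]]:
    """Return (language code, is_reliable) for a body_language JSON string.

    Returns None when the array is empty or missing (language undetermined).
    """
    entries = json.loads(raw) if raw else []
    if not entries:
        return None
    # Prefer entries flagged as reliable, even if another entry has a
    # higher confidence value.
    reliable = [e for e in entries if e["isReliable"]]
    candidates = reliable or entries
    best = max(candidates, key=lambda e: e["confidence"])
    return best["language"], bool(best["isReliable"])


# The reliable entry wins although its confidence is lower.
mixed = (
    '[{"language":"en","isReliable":false,"confidence":5.12},'
    ' {"language":"es","isReliable":true,"confidence":3.0}]'
)
print(primary_language(mixed))

# A single unreliable entry is returned with its flag so the caller can
# decide whether to trust it.
short = '[{"language":"fr","isReliable":false,"confidence":0.4}]'
print(primary_language(short))
```

Returning the flag alongside the code lets callers discard low-confidence results instead of silently treating them as correct.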

Be aware that the language detector works with probabilities and can be inaccurate on very short texts. The same applies when the languages used in a post share similar words.

The language detector can also be misled when a post contains a lot of *"technical noise"* such as pictures, source code, or edit tags.

The language codes that can appear in the result are listed below:

| code | language       | code | language           | code    | language            | code | language          |
| ---- | -------------- | ---- | ------------------ | ------- | ------------------- | ---- | ----------------- |
| aa   | Afar           | ab   | Abkhazian          | af      | Afrikaans           | ak   | Akan              |
| am   | Amharic        | ar   | Arabic             | as      | Assamese            | ay   | Aymara            |
| az   | Azerbaijani    | ba   | Bashkir            | be      | Belarusian          | bg   | Bulgarian         |
| bh   | Bihari         | bi   | Bislama            | bn      | Bengali             | bo   | Tibetan           |
| br   | Breton         | bs   | Bosnian            | bug     | Buginese            | ca   | Catalan           |
| ceb  | Cebuano        | chr  | Cherokee           | co      | Corsican            | crs  | Seselwa           |
| cs   | Czech          | cy   | Welsh              | da      | Danish              | de   | German            |
| dv   | Dhivehi        | dz   | Dzongkha           | egy     | Egyptian            | el   | Greek             |
| en   | English        | eo   | Esperanto          | es      | Spanish             | et   | Estonian          |
| eu   | Basque         | fa   | Persian            | fi      | Finnish             | fj   | Fijian            |
| fo   | Faroese        | fr   | French             | fy      | Frisian             | ga   | Irish             |
| gd   | Scots\_Gaelic  | gl   | Galician           | gn      | Guarani             | got  | Gothic            |
| gu   | Gujarati       | gv   | Manx               | ha      | Hausa               | haw  | Hawaiian          |
| hi   | Hindi          | hmn  | Hmong              | hr      | Croatian            | ht   | Haitian Creole    |
| hu   | Hungarian      | hy   | Armenian           | ia      | Interlingua         | id   | Indonesian        |
| ie   | Interlingue    | ig   | Igbo               | ik      | Inupiak             | is   | Icelandic         |
| it   | Italian        | iu   | Inuktitut          | iw      | Hebrew              | ja   | Japanese          |
| jw   | Javanese       | ka   | Georgian           | kha     | Khasi               | kk   | Kazakh            |
| kl   | Greenlandic    | km   | Khmer              | kn      | Kannada             | ko   | Korean            |
| ks   | Kashmiri       | ku   | Kurdish            | ky      | Kyrgyz              | la   | Latin             |
| lb   | Luxembourgish  | lg   | Ganda              | lif     | Limbu               | ln   | Lingala           |
| lo   | Laothian       | lt   | Lithuanian         | lv      | Latvian             | mfe  | Mauritian Creole  |
| mg   | Malagasy       | mi   | Maori              | mk      | Macedonian          | ml   | Malayalam         |
| mn   | Mongolian      | mr   | Marathi            | ms      | Malay               | mt   | Maltese           |
| my   | Burmese        | na   | Nauru              | ne      | Nepali              | nl   | Dutch             |
| no   | Norwegian      | nr   | Ndebele            | nso     | Pedi                | ny   | Nyanja            |
| oc   | Occitan        | om   | Oromo              | or      | Oriya               | pa   | Punjabi           |
| pl   | Polish         | ps   | Pashto             | pt      | Portuguese          | qu   | Quechua           |
| rm   | Rhaeto Romance | rn   | Rundi              | ro      | Romanian            | ru   | Russian           |
| rw   | Kinyarwanda    | sa   | Sanskrit           | sco     | Scots               | sd   | Sindhi            |
| sg   | Sango          | si   | Sinhalese          | sk      | Slovak              | sl   | Slovenian         |
| sm   | Samoan         | sn   | Shona              | so      | Somali              | sq   | Albanian          |
| sr   | Serbian        | ss   | Siswant            | st      | Sesotho             | su   | Sundanese         |
| sv   | Swedish        | sw   | Swahili            | syr     | Syriac              | ta   | Tamil             |
| te   | Telugu         | tg   | Tajik              | th      | Thai                | ti   | Tigrinya          |
| tk   | Turkmen        | tl   | Tagalog            | tlh     | Klingon             | tn   | Tswana            |
| to   | Tonga          | tr   | Turkish            | ts      | Tsonga              | tt   | Tatar             |
| ug   | Uighur         | uk   | Ukrainian          | ur      | Urdu                | uz   | Uzbek             |
| ve   | Venda          | vi   | Vietnamese         | vo      | Volapuk             | war  | Waray Philippines |
| wo   | Wolof          | xh   | Xhosa              | yi      | Yiddish             | yo   | Yoruba            |
| za   | Zhuang         | zh   | Chinese Simplified | zh-hant | Chinese Traditional | zu   | Zulu              |

