Brainspace Language Support

Brainspace’s patented algorithms work with all tokenized languages. For the most common business languages, Brainspace also provides stop words, which substantially improves the analytics experience. For each supported language, Brainspace can automatically detect terms and phrases, group documents and cluster terms on the Cluster Wheel, and execute concept searches. Brainspace also automatically detects the primary and secondary languages within each document and provides a set of fields that store the language-detection information.

Note

Brainspace identifies the languages in a document and then applies the language-specific stop words to that document. The Common stop-word list is empty by default. You can create a custom stop-word list and upload it to Common if you want certain stop words to be applied to all languages. For example, Brainspace does not provide a stop-word list for Estonian. If you have a large Estonian population, it might be useful to upload an Estonian stop-word list to Common; however, any tokens that overlap with other languages will be stopped in those languages as well. For example, if the word “face” is a stop word in Estonian, that word will be stopped in English documents as well.
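
The interaction between language-specific lists and the Common list can be illustrated with a minimal Python sketch. It is not Brainspace code: the dictionary contents, the variable names, and the `remove_stop_words` helper are all hypothetical, and the sketch only assumes that a token is stopped when it appears in either the document's language list or the Common list.

```python
# Hypothetical stop-word lists; the real Brainspace lists are not reproduced here.
LANGUAGE_STOP_WORDS = {
    "english": {"the", "a", "of"},
    "estonian": set(),  # no list is provided for Estonian
}

# Assumption: a custom list uploaded to Common applies to every language.
COMMON_STOP_WORDS = {"face"}


def remove_stop_words(tokens, language):
    """Drop tokens found in the language's own list or in the Common list."""
    stops = LANGUAGE_STOP_WORDS.get(language, set()) | COMMON_STOP_WORDS
    return [t for t in tokens if t.lower() not in stops]


# "face" was added for the Estonian documents, but it is stopped in English too.
english = remove_stop_words("the face of the company".split(), "english")
estonian = remove_stop_words("face kiri".split(), "estonian")
print(english)
print(estonian)
```

Because the Common set is merged into every language's set, the only way to limit a stop word to one language in this sketch is to keep it out of Common, which mirrors the overlap caveat described above.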

The following is a support summary of all Brainspace 6 languages. For languages that only have identification support, Brainspace still provides the following analysis:

  • Tokenize a document using space as the separator between terms (English-based).

  • Use n-gram phrase detection.

  • Index original token along with an English-based normalization token.

    Note

    This can at times lead to inconsistent results.
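
The first two steps in the list above can be sketched in Python as follows. This is only an illustration of space-separated tokenization and frequency-based n-gram phrase detection in general; the function names, the lowercasing step, and the `min_count` threshold are assumptions and do not describe Brainspace's actual implementation.

```python
from collections import Counter


def tokenize(text):
    """Split on whitespace and lowercase each term (an assumed normalization)."""
    return text.lower().split()


def ngram_phrases(documents, n=2, min_count=2):
    """Return n-grams that occur at least min_count times across all documents."""
    counts = Counter()
    for doc in documents:
        tokens = tokenize(doc)
        # Slide a window of n tokens across the document.
        counts.update(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    return {" ".join(gram): c for gram, c in counts.items() if c >= min_count}


docs = [
    "purchase order approved today",
    "the purchase order was delayed",
    "new purchase order received",
]
phrases = ngram_phrases(docs)
print(phrases)
```

A purely statistical approach like this treats any frequent token sequence as a phrase, and splitting on spaces does not work for languages that do not separate words with spaces, which is one reason results can be inconsistent for identification-only languages.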

Phrase detection using parts of speech is generally more meaningful than n-gram phrase detection because it is tailored to the grammar of the specific language. Phrase detection using n-grams is statistically based and does not incorporate language-specific customization.

The following table describes the level of support that Brainspace 6 provides for different languages. In addition to default stop words, you can upload custom stop words to any language included in the Languages and Stop Words pane (see Manage Stop Words).

Note

Of the supported languages listed below, Chinese and Icelandic are the only two that receive no stemming, lemmatization, or other language-specific handling during indexing.

Table 1. Feature Support

Language

Language Identification

Stop Words

Phrase Detection (Parts of Speech)

Phrase Detection (n-gram)

Entity Extraction

Albanian

x

Arabic

x

x

x

x

x

Bengali

x

Bulgarian

x

Catalan

x

Chinese

x

x

x

x

x

Croatian

x

Czech

x

x

x

Danish

x

x

Dutch

x

x

x

x

English

x

x

x

x

Estonian

x

Finnish

x

x

French

x

x

x

x

German

x

x

x

x

Greek

x

x

x

Gujarati

x

Hebrew

x

x

x

x

Hindi

x

Hungarian

x

x

x

Icelandic

x

x

Indonesian

x

x

Italian

x

x

x

x

Japanese

x

x

x

x

Kannada

x

Korean

x

x

x

x

Kurdish

x

Latvian

x

Lithuanian

x

Macedonian

x

Malay

x

x

Malayalam

x

Norwegian

x

x

Pashto

x

x

Persian

x

x

x

x

Polish

x

x

x

Portuguese

x

x

x

x

Romanian

x

x

x

Russian

x

x

x

x

Serbian

x

Slovak

x

Slovenian

x

Somali

x

Spanish

x

x

x

x

Swedish

x

x

Tagalog

x

Tamil

x

Telugu

x

Thai

x

Turkish

x

Ukrainian

x

Urdu

x

x

x

x

Uzbek

x

Vietnamese

x

x