Lucy Languages

When Lucy's Translation features are enabled, users can ask questions and find Answers in any of the supported languages configured for their company. Translation is a paid feature with an annual subscription and support cost, and your company’s usage is limited by the number of characters sent for translation.

Lucy Engineering resources are required to configure a company and its sources for Translation. This is a one-time setup, and Translation will work seamlessly after that. Please reach out to your Lucy Customer Success Manager for support in setting up the Translation service.

What is included in Translation

  1. Language detection of documents available to the company.

  2. Language detection of queries.

    1. To optimize performance, the user’s language is detected for the first query of their session.

    2. The user may select a different language from the Language Selector dropdown.

  3. Questions in any language return results from all languages.

    1. The best result is chosen for its content rather than for its source language.

  4. Content filter for language.

    1. Users may narrow their results to those written in a specific language.

  5. Show Translate button for Answers.

    1. The Show Translate and Select Text buttons appear in the Answer View when the Query Language differs from the Document Language.

    2. When a user clicks the Show Translate button, the entire Answer text is translated and returned to the user. The translation is also stored against the Answer for faster retrieval in the future.

    3. When the Select Text button is selected, the Answer Preview enters a text selection mode. The user may then draw a box around the text they would like to be translated. The text of their selection is translated and just that portion of the translation is presented back to the user.
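
The button and storage behavior described in item 5 can be sketched roughly as follows. This is a minimal illustration, not Lucy's actual implementation: the names `show_translate_buttons` and `AnswerTranslations`, the `translate` callable, and the choice to key stored translations by Answer and target language are all assumptions made for the example.

```python
from typing import Callable, Dict, Tuple


def show_translate_buttons(query_language: str, document_language: str) -> bool:
    """The buttons appear only when the two languages differ."""
    return query_language != document_language


class AnswerTranslations:
    """Stores full-Answer translations so repeat requests skip the provider.

    `translate` is a hypothetical stand-in for the Translation Service call.
    """

    def __init__(self, translate: Callable[[str, str], str]) -> None:
        self._translate = translate
        self._stored: Dict[Tuple[str, str], str] = {}

    def full_translation(self, answer_id: str, text: str, target_language: str) -> str:
        key = (answer_id, target_language)
        if key not in self._stored:
            # First request: call the provider and store the result against the Answer.
            self._stored[key] = self._translate(text, target_language)
        return self._stored[key]
```

Under these assumptions, a second click on Show Translate for the same Answer and language is served from the stored copy and does not consume additional characters.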

What is not included in Translation

  1. The Lucy UI itself is not translated; it remains in English.

  2. Documents themselves are not translated. The translation appears next to the rendering of the document.

  3. Companies must pay for their character usage (see Usage Expectations).

How it works

Ingestion 

The language of each document is detected from a sample of its text. We take a set number of characters from page 2 (or page 1 if page 2 is not available) and send it to the Translation Service for language detection. All Answers from the document are tagged with the detected language, and if that language is not English, all Answers are translated to English for search indexing.
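
The sampling step can be sketched as below. This is an illustration under stated assumptions, not the production code: the sample size of 500 characters is hypothetical (the actual number is not given here), `detect` is a stand-in for the Translation Service's detection call, and "not available" is interpreted as the document having only one page.

```python
from typing import Callable, Dict, Sequence

SAMPLE_CHARS = 500  # hypothetical value; the real sample size is not stated


def sample_for_detection(pages: Sequence[str], sample_chars: int = SAMPLE_CHARS) -> str:
    """Return the text sample to send for language detection.

    Prefers page 2 (index 1) and falls back to page 1 when there is no page 2.
    """
    if not pages:
        return ""
    page = pages[1] if len(pages) > 1 else pages[0]
    return page[:sample_chars]


def tag_document(pages: Sequence[str], detect: Callable[[str], str]) -> Dict[str, object]:
    """Detect the document language from the sample and decide on translation."""
    sample = sample_for_detection(pages)
    language = detect(sample)  # hypothetical provider call returning a code like "fr"
    return {
        "language": language,
        # Non-English documents have their Answers translated for indexing.
        "needs_english_translation": language != "en",
        "chars_sent": len(sample),
    }
```

Only the sample counts toward detection usage, which is why `chars_sent` is bounded by the sample size regardless of document length.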

A Threshold tracks character usage per company, and the Translation Service stops running once the limit is reached.
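
A per-company Threshold of this kind might look like the following sketch. The class name `CharacterThreshold`, the helper `translate_if_allowed`, and the decision to count characters with `len` are assumptions for illustration; the provider's own character counting may differ.

```python
from typing import Callable, Optional


class CharacterThreshold:
    """Tracks characters sent for one company against a fixed limit."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def can_translate(self) -> bool:
        # Translation stops once usage has reached the limit.
        return self.used < self.limit

    def record(self, text: str) -> None:
        self.used += len(text)


def translate_if_allowed(
    text: str,
    threshold: CharacterThreshold,
    translate: Callable[[str], str],
) -> Optional[str]:
    """Translate only while the company is under its limit; otherwise return None."""
    if not threshold.can_translate():
        return None
    threshold.record(text)
    return translate(text)
```

In this sketch the request that crosses the limit is still served, and every later request is refused. Whether the real service checks before or after sending is not specified here.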

Search

The first question a user asks is sent to the Translation Provider for language detection. The detected language becomes their Preferred Language until they change it or Switch Company. They can change it using the Language Selector dropdown, which appears either in the filter bar at the top of their search results or in the navigation bar, depending on instance configuration:

Figures 1 & 2 – Language Selector Dropdown
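
The Preferred Language behavior described above can be sketched as a small session object. The class and method names are hypothetical, and `detect` stands in for the Translation Provider's detection call. The sketch also assumes that switching company clears the preference so the next question is detected again.

```python
from typing import Callable, Optional


class SearchSession:
    """Holds a user's Preferred Language for the current session."""

    def __init__(self, detect: Callable[[str], str]) -> None:
        self._detect = detect
        self.preferred_language: Optional[str] = None

    def language_for_query(self, query: str) -> str:
        # Only the first question is sent for detection; later ones reuse the result.
        if self.preferred_language is None:
            self.preferred_language = self._detect(query)
        return self.preferred_language

    def select_language(self, code: str) -> None:
        """Called when the user picks a language from the Language Selector."""
        self.preferred_language = code

    def switch_company(self) -> None:
        # Assumption: the preference resets and detection runs again on the next question.
        self.preferred_language = None
```

Detecting once per session keeps the number of characters sent for query detection small, which matches the performance note in the feature list.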

If a user opens an Answer whose Answer Language differs from their Preferred Language, a Translate button appears. After clicking it, the user can select specific text within the document to translate.

Figures 3 & 4 – Translate Button and Select Text Functionality

After an area of text is selected, its translation is displayed on top of the Answer.

Figures 5 & 6 – Select and Translated Text

Additional considerations

Supported languages

The languages supported for Translation are set by our Translation Provider. The supported languages list can be found here. Note that the Service requires the Provider to support both language detection and text translation for a language. As of November 28, 2023, we support the 111 languages below (including English).


| Language | ISO Language Code |
| --- | --- |
| Afrikaans | af |
| Albanian | sq |
| Amharic | am |
| Arabic | ar |
| Armenian | hy |
| Assamese | as |
| Azerbaijani (Latin) | az |
| Bangla | bn |
| Bashkir | ba |
| Basque | eu |
| Bosnian (Latin) | bs |
| Bulgarian | bg |
| Cantonese (Traditional) | yue |
| Catalan | ca |
| Chinese Simplified | zh-Hans |
| Chinese Traditional | zh-Hant |
| Croatian | hr |
| Czech | cs |
| Danish | da |
| Dari | prs |
| Divehi | dv |
| Dutch | nl |
| English | en |
| Estonian | et |
| Faroese | fo |
| Fijian | fj |
| Finnish | fi |
| French | fr |
| Galician | gl |
| Georgian | ka |
| German | de |
| Greek | el |
| Gujarati | gu |
| Haitian Creole | ht |
| Hausa | ha |
| Hebrew | he |
| Hindi | hi |
| Hmong Daw (Latin) | mww |
| Hungarian | hu |
| Icelandic | is |
| Igbo | ig |
| Indonesian | id |
| Inuktitut | iu |
| Inuktitut (Latin) | iu-Latn |
| Irish | ga |
| Italian | it |
| Japanese | ja |
| Kannada | kn |
| Kazakh | kk |
| Khmer | km |
| Kinyarwanda | rw |
| Klingon | tlh-Latn |
| Klingon (plqaD) | tlh-Piqd |
| Korean | ko |
| Kurdish (Central) | ku |
| Kyrgyz (Cyrillic) | ky |
| Lao | lo |
| Latvian | lv |
| Lithuanian | lt |
| Macedonian | mk |
| Malagasy | mg |
| Malay (Latin) | ms |
| Malayalam | ml |
| Maltese | mt |
| Maori | mi |
| Marathi | mr |
| Mongolian (Cyrillic) | mn-Cyrl |
| Myanmar | my |
| Nepali | ne |
| Norwegian | nb |
| Odia | or |
| Pashto | ps |
| Persian | fa |
| Polish | pl |
| Portuguese (Brazil) | pt |
| Punjabi | pa |
| Queretaro Otomi | otq |
| Romanian | ro |
| Russian | ru |
| Samoan (Latin) | sm |
| Serbian (Cyrillic) | sr-Cyrl |
| Serbian (Latin) | sr-Latn |
| Sindhi | sd |
| Sinhala | si |
| Slovak | sk |
| Slovenian | sl |
| Somali (Arabic) | so |
| Spanish | es |
| Swahili (Latin) | sw |
| Swedish | sv |
| Tahitian | ty |
| Tamil | ta |
| Tatar (Latin) | tt |
| Telugu | te |
| Thai | th |
| Tibetan | bo |
| Tigrinya | ti |
| Tongan | to |
| Turkish | tr |
| Turkmen (Latin) | tk |
| Ukrainian | uk |
| Upper Sorbian | hsb |
| Urdu | ur |
| Uyghur (Arabic) | ug |
| Uzbek (Latin) | uz |
| Vietnamese | vi |
| Welsh | cy |
| Xhosa | xh |
| Yoruba | yo |
| Yucatec Maya | yua |
| Zulu | zu |


Usage expectations

Translation usage is priced by the number of characters sent to the Translation Service. The Service counts usage in millions of characters, and our backend tracks that usage against a per-company Threshold.

The language of each document is detected from a sample of its text, which reduces the number of characters sent to the Translation Service. Because of this architecture, repositories with predominantly English documents use very few characters compared to repositories that require a lot of translation.

User queries are relatively short, and Language Detection only happens on the first question per session, so Language Detection of queries may continue after the Threshold is met. This is of minimal cost to Lucy and provides a better user experience (no interruption of service).

What to ask for when your company wants Translation

  1. Which languages would you like?

  2. What percent of your documents do you expect to be non-English?

    1. Use this to estimate usage.

    2. Compare to how many documents are available total.

  3. Review what is and is not included (see above).

  4. Is budget available to cover the implementation?
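
The usage estimate in question 2 can be worked through with a rough calculation like the one below. It is a back-of-the-envelope sketch, not a pricing formula: the function name, the 500-character detection sample, and the average document length are all hypothetical inputs, and it assumes every document is sampled once and every non-English document is translated in full.

```python
def estimate_characters(
    total_documents: int,
    non_english_percent: float,
    avg_chars_per_document: int,
    sample_chars: int = 500,  # hypothetical detection sample size
) -> int:
    """Rough estimate of characters sent to the Translation Service at ingestion."""
    # Every document contributes one detection sample.
    detection = total_documents * sample_chars
    # Only non-English documents have their Answers translated to English.
    non_english_documents = round(total_documents * non_english_percent / 100)
    translation = non_english_documents * avg_chars_per_document
    return detection + translation


# Example: 10,000 documents, 20% non-English, about 30,000 characters each.
total = estimate_characters(10_000, 20, 30_000)
print(f"{total / 1_000_000:.1f} million characters")  # prints "65.0 million characters"
```

In this example translation accounts for 60 of the 65 million characters, which illustrates why the share of non-English documents dominates the estimate.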

Non-Latin alphabet languages

While supported, languages that do not use the Latin alphabet have some additional considerations. Specifically, OCR may have accuracy issues when identifying characters from non-Latin alphabet texts. Expectations should be set with clients that translations will not be perfect and some accuracy issues may arise.

Also, the Lucy UI may require updates to handle the rendering of text in other alphabets. As of this writing, we have invested in this work for the Cyrillic alphabet (Russian, Ukrainian, etc.). More work would be required to enable other character sets, and additional scope should be considered for these languages.

Note: Filename search does not support special characters, and this includes characters from non-Latin alphabet languages.

