Lucy Languages
When Lucy's Translation features are enabled, users can ask questions and find Answers in any of the supported languages configured for their company. Translation is a paid feature with an annual subscription and support cost, and each company's usage is limited by the number of characters used.
Lucy Engineering resources are required to configure a company and its sources for Translation. This is a one-time setup, after which Translation works without further configuration. Please reach out to your Lucy Customer Success Manager for support in setting up the Translation service.
What is included in Translation
Language detection of documents available to the company.
Language detection of queries.
To optimize performance, the user’s language is detected for the first query of their session.
The user may select a different language from the Language Selector dropdown.
Questions in any language return results from all languages.
The best result is chosen for its content rather than for its source language.
Content filter for language.
Users may narrow their results to those written in a specific language.
Show Translate button for Answers.
The Show Translate and Select Text buttons appear in the Answer View when the Query Language is different from the Document Language.
When a user clicks the Show Translate button, the entire Answer text is translated and returned to the user. The translation is also stored against the Answer for faster retrieval in the future.
When a user clicks the Select Text button, the Answer Preview enters a text selection mode. The user may then draw a box around the text they would like translated. Only the selected text is translated, and only that translation is presented back to the user.
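The button visibility rule and the stored-translation behavior described above can be sketched as follows. This is a minimal illustration, not Lucy's actual implementation: the `Answer` class, its field names, and the `translate` callable are all hypothetical stand-ins for the real Answer record and Translation Provider call.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Answer:
    """Hypothetical Answer record; the field names are illustrative only."""
    text: str
    language: str  # ISO code of the source document, e.g. "fr"
    translations: Dict[str, str] = field(default_factory=dict)


def show_translate_buttons(query_language: str, answer: Answer) -> bool:
    """The buttons appear only when the query and document languages differ."""
    return query_language != answer.language


def translate_answer(answer: Answer, target_language: str,
                     translate: Callable[[str, str], str]) -> str:
    """Return the stored translation if there is one; otherwise translate and store it."""
    cached = answer.translations.get(target_language)
    if cached is not None:
        return cached  # no characters are sent to the provider on a repeat view
    translated = translate(answer.text, target_language)
    answer.translations[target_language] = translated
    return translated
```

Under this sketch, a second click on Show Translate for the same Answer and language is served from the stored copy, so it sends no additional characters to the provider.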
What is not included in Translation
The entire Lucy UI remains in English.
Documents themselves are not translated. The translation appears next to the rendering of the document.
Companies must pay for their character usage (see Usage Expectations).
How it works
Ingestion
Document language is detected from a sample of the document's text. We take a set number of characters from page 2 (or from page 1 if page 2 is not available) and send them to the Translation Service for language detection. All Answers from the document are tagged with the detected language, and if that language is not English, all Answers are translated to English for search indexing.
A Threshold keeps track of character usage per company, and the Translation Service stops running once the limit is reached.
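The ingestion step above can be sketched roughly as below. The sample size of 500 characters is an assumed value (the actual amount is not stated here), and `detect`, `translate_to_english`, and the returned dictionary keys are hypothetical names standing in for the Translation Service calls and the index record.

```python
from typing import Callable, Dict, List

SAMPLE_CHARS = 500  # assumed value; the real sample size is not documented here


def sample_for_detection(pages: List[str], limit: int = SAMPLE_CHARS) -> str:
    """Take up to `limit` characters from page 2, or from page 1 if there is no page 2."""
    if not pages:
        return ""
    page = pages[1] if len(pages) > 1 else pages[0]
    return page[:limit]


def index_document(pages: List[str], answers: List[str],
                   detect: Callable[[str], str],
                   translate_to_english: Callable[[str], str]) -> List[Dict[str, str]]:
    """Tag every Answer with the detected language and prepare English text for indexing."""
    language = detect(sample_for_detection(pages))
    indexed = []
    for text in answers:
        # Only non-English Answers are sent for translation.
        search_text = text if language == "en" else translate_to_english(text)
        indexed.append({"text": text, "language": language, "search_text": search_text})
    return indexed
```

Because only the sample is sent for detection, an English document costs at most `SAMPLE_CHARS` characters in this sketch, regardless of its length.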
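A per-company Threshold of this kind could look like the sketch below. The class and function names are hypothetical, and one detail is an assumption: a request that starts while usage is under the limit is allowed to complete even if it pushes usage past the limit. The real service may enforce the limit differently.

```python
from typing import Callable, Optional


class CharacterThreshold:
    """Hypothetical per-company counter of characters sent for translation."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.used = 0

    def limit_reached(self) -> bool:
        return self.used >= self.limit


def translate_within_threshold(threshold: CharacterThreshold, text: str,
                               translate: Callable[[str], str]) -> Optional[str]:
    """Translate `text` and count its characters, or return None once the limit is reached."""
    if threshold.limit_reached():
        return None  # Translation stops for this company
    threshold.used += len(text)
    return translate(text)
```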
Search
The first question a user asks is sent to the Translation Provider for language detection. This sets their Preferred Language until they change it or Switch Company. They can change it using the Language Selector dropdown, which appears either in the filter bar at the top of their search results or in the navigation bar, depending on instance configuration:
Figures 1 & 2 – Language Selector Dropdown
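The Preferred Language behavior described above can be illustrated with a small sketch. The `Session` class and the function names are hypothetical. The sketch also assumes that switching company clears the Preferred Language so that the next query is detected again, which is one reading of the behavior rather than a confirmed detail.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Session:
    """Hypothetical user session; only the Preferred Language is modeled."""
    preferred_language: Optional[str] = None


def language_for_query(session: Session, query: str,
                       detect: Callable[[str], str]) -> str:
    """Detect the language on the first query only; later queries reuse the result."""
    if session.preferred_language is None:
        session.preferred_language = detect(query)
    return session.preferred_language


def select_language(session: Session, code: str) -> None:
    """The user picked a language in the Language Selector dropdown."""
    session.preferred_language = code


def switch_company(session: Session) -> None:
    """Assumed behavior: switching company clears the Preferred Language."""
    session.preferred_language = None
```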
If a user opens an Answer whose Answer Language is different from their Preferred Language, a button called Translate appears. After clicking it, the user can select specific text within the document to translate.
Figures 3 & 4 – Translate Button and Select Text Functionality
After a specific area of text is selected, its translation is displayed on top of the Answer.
Figures 5 & 6 – Select and Translated Text
Additional considerations
Supported languages
The languages supported for Translation are set by our Translation Provider. The supported languages list can be found here. Please note that the Service requires the Provider to support both language detection and text translation for a language. As of November 28, 2023, we support the 111 languages below (including English).
Language | ISO Language Code |
---|---|
Afrikaans | af |
Albanian | sq |
Amharic | am |
Arabic | ar |
Armenian | hy |
Assamese | as |
Azerbaijani (Latin) | az |
Bangla | bn |
Bashkir | ba |
Basque | eu |
Bosnian (Latin) | bs |
Bulgarian | bg |
Cantonese (Traditional) | yue |
Catalan | ca |
Chinese Simplified | zh-Hans |
Chinese Traditional | zh-Hant |
Croatian | hr |
Czech | cs |
Danish | da |
Dari | prs |
Divehi | dv |
Dutch | nl |
English | en |
Estonian | et |
Faroese | fo |
Fijian | fj |
Finnish | fi |
French | fr |
Galician | gl |
Georgian | ka |
German | de |
Greek | el |
Gujarati | gu |
Haitian Creole | ht |
Hausa | ha |
Hebrew | he |
Hindi | hi |
Hmong Daw (Latin) | mww |
Hungarian | hu |
Icelandic | is |
Igbo | ig |
Indonesian | id |
Inuktitut | iu |
Inuktitut (Latin) | iu-Latn |
Irish | ga |
Italian | it |
Japanese | ja |
Kannada | kn |
Kazakh | kk |
Khmer | km |
Kinyarwanda | rw |
Klingon | tlh-Latn |
Klingon (plqaD) | tlh-Piqd |
Korean | ko |
Kurdish (Central) | ku |
Kyrgyz (Cyrillic) | ky |
Lao | lo |
Latvian | lv |
Lithuanian | lt |
Macedonian | mk |
Malagasy | mg |
Malay (Latin) | ms |
Malayalam | ml |
Maltese | mt |
Maori | mi |
Marathi | mr |
Mongolian (Cyrillic) | mn-Cyrl |
Myanmar | my |
Nepali | ne |
Norwegian | nb |
Odia | or |
Pashto | ps |
Persian | fa |
Polish | pl |
Portuguese (Brazil) | pt |
Punjabi | pa |
Queretaro Otomi | otq |
Romanian | ro |
Russian | ru |
Samoan (Latin) | sm |
Serbian (Cyrillic) | sr-Cyrl |
Serbian (Latin) | sr-Latn |
Sindhi | sd |
Sinhala | si |
Slovak | sk |
Slovenian | sl |
Somali (Arabic) | so |
Spanish | es |
Swahili (Latin) | sw |
Swedish | sv |
Tahitian | ty |
Tamil | ta |
Tatar (Latin) | tt |
Telugu | te |
Thai | th |
Tibetan | bo |
Tigrinya | ti |
Tongan | to |
Turkish | tr |
Turkmen (Latin) | tk |
Ukrainian | uk |
Upper Sorbian | hsb |
Urdu | ur |
Uyghur (Arabic) | ug |
Uzbek (Latin) | uz |
Vietnamese | vi |
Welsh | cy |
Xhosa | xh |
Yoruba | yo |
Yucatec Maya | yua |
Zulu | zu |
Usage expectations
Translation usage is priced based on the number of characters sent to the Translation Service. The service counts usage in millions of characters, and our backend tracks a Threshold against that count.
Document language is detected from a sample of the document's text, which reduces the number of characters sent to the Translation Service. Because of this architecture, repositories with predominantly English documents use very few characters compared to repositories that require a lot of translation.
Because user queries are relatively short and Language Detection only happens on the first question per session, Language Detection may continue after the Threshold is reached. This is of minimal cost to Lucy and provides a better user experience (no interruption of service).
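As a rough worked example of why predominantly English repositories use few characters, the sketch below estimates ingestion usage from a document count and an expected non-English percentage. All figures are illustrative assumptions (a 500-character detection sample and an average of 20,000 characters of Answer text per document), not Lucy's actual numbers or pricing, and the function name is hypothetical.

```python
def estimate_ingestion_characters(total_documents: int,
                                  non_english_percent: int,
                                  avg_chars_per_document: int,
                                  sample_chars: int = 500) -> int:
    """Estimate the characters sent during ingestion.

    Every document contributes one detection sample; only non-English
    documents contribute their full Answer text for translation to English.
    """
    detection = total_documents * sample_chars
    translation = total_documents * non_english_percent * avg_chars_per_document // 100
    return detection + translation


# 10,000 documents, 20% non-English, 20,000 characters each (assumed figures).
mixed = estimate_ingestion_characters(10_000, 20, 20_000)
english_only = estimate_ingestion_characters(10_000, 0, 20_000)
print(mixed / 1_000_000, english_only / 1_000_000)  # millions of characters
```

With these assumed inputs, the mixed repository comes to 45 million characters, while the all-English repository comes to 5 million, all of it from detection samples.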
What to ask for when your company wants Translation
Which languages would you like?
What percent of your documents do you expect to be non-English?
Use this to estimate usage.
Compare this to the total number of documents available.
Review what is and is not included (see above).
Is budget available to cover the implementation?
Non-Latin alphabet languages
While supported, languages that do not use the Latin alphabet carry some additional considerations. Specifically, OCR may have accuracy issues when identifying characters in non-Latin alphabet texts. Expectations should be set with clients that translations will not be perfect and that some accuracy issues may arise.
Also, the Lucy UI may require updates to handle the rendering of text in other alphabets. As of this writing, we have invested in this work for the Cyrillic alphabet (Russian, Ukrainian, etc.). More work would be required to enable other character sets, and additional scope should be considered for these languages.
Note: Filename search does not support special characters, and this includes characters from non-Latin alphabet languages.