Lucy Languages

When Lucy's Translation features are enabled, users can ask questions and find Answers in any of the supported languages configured for their company. Translation is a paid feature with an annual subscription and support cost, and each company’s usage is limited by the number of characters used.

Lucy Engineering resources are required to configure a company and its sources for Translation. This is a one-time setup, after which Translation works without further configuration. Please reach out to your Lucy Customer Success Manager for support in setting up the Translation service.

What is included in Translation

Language detection of documents available to the company.
Language detection of queries.
1. To optimize performance, the user’s language is detected for the first query of their session.
2. The user may select a different language from the Language Selector dropdown.
Questions in any language return results from all languages.
1. The best result is chosen for its content rather than for its source language.
Content filter for language.
1. Users may narrow their results to those written in a specific language.
Show Translate button for Answers.
1. The Show Translate and Select Text buttons appear in the Answer View when the Query Language is different than the Document Language.
2. When a user clicks the Show Translate button, the entire Answer text is translated and returned to the user. The translation is also stored against the Answer for faster retrieval in the future.
3. When the Select Text button is selected, the Answer Preview enters a text selection mode. The user may then draw a box around the text they would like to be translated. The text of their selection is translated and just that portion of the translation is presented back to the user.
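The Show Translate behavior described above can be sketched in a few lines. This is only an illustration under stated assumptions: the `Answer` class, its field names, and the `translate` callable are hypothetical stand-ins, not Lucy's actual data model or the Translation Provider's API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict


@dataclass
class Answer:
    """Hypothetical Answer record; field names are illustrative only."""
    text: str
    language: str  # ISO code detected at ingestion, e.g. "de"
    translations: Dict[str, str] = field(default_factory=dict)  # keyed by target language


def show_translate_buttons(query_language: str, answer: Answer) -> bool:
    """The buttons appear only when the query and document languages differ."""
    return query_language != answer.language


def translate_answer(answer: Answer, target: str,
                     translate: Callable[[str, str], str]) -> str:
    """Translate the whole Answer once, then reuse the stored translation."""
    if target not in answer.translations:
        answer.translations[target] = translate(answer.text, target)
    return answer.translations[target]


# Stand-in translator that records how often it is called.
calls = []


def fake_translate(text: str, target: str) -> str:
    calls.append(target)
    return f"[{target}] {text}"


answer = Answer(text="Guten Tag", language="de")
if show_translate_buttons("en", answer):
    first = translate_answer(answer, "en", fake_translate)
    second = translate_answer(answer, "en", fake_translate)  # served from the stored copy
```

In this sketch the second request does not reach the translator, which is how storing the translation against the Answer avoids spending characters twice on the same text.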

What is not included in Translation

The Lucy UI itself remains in English.
Documents themselves are not translated. The translation appears next to the rendering of the document.
Companies must pay for their character usage (see Usage Expectations).

How it works

Ingestion

Each document's language is detected from a sample of its text. A set number of characters is taken from page 2 (or from page 1 if page 2 is not available) and sent to the Translation Service for language detection. All Answers from the document are tagged with the detected language, and if that language is not English, all Answers are translated to English for search indexing.
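As a rough sketch of the sampling step, assuming a document is available as a list of page texts: the function names and the `SAMPLE_CHARS` value are hypothetical, since the actual sample size is not stated here.

```python
from typing import List

SAMPLE_CHARS = 500  # assumed value; the real sample size is not documented here


def detection_sample(pages: List[str], sample_chars: int = SAMPLE_CHARS) -> str:
    """Return the text that would be sent for language detection.

    Prefers page 2 and falls back to page 1 when the document has only
    one page. Returns an empty string for a document with no pages.
    """
    if len(pages) >= 2:
        page = pages[1]
    elif pages:
        page = pages[0]
    else:
        return ""
    return page[:sample_chars]


def needs_english_translation(detected_language: str) -> bool:
    """Answers from non-English documents are translated for indexing."""
    return detected_language != "en"
```

Page 2 is presumably preferred because first pages are often covers or title pages with little running text, but that rationale is an inference rather than something the article states.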

A Threshold tracks character usage per company, and the Translation Service stops running once the limit is reached.
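One way to picture the Threshold is as a per-company character counter that refuses further work once the limit would be exceeded. The class below is a hypothetical sketch of that idea, not the backend's actual implementation, and the refusal policy is an assumption.

```python
class UsageThreshold:
    """Per-company character counter (hypothetical; names are illustrative)."""

    def __init__(self, limit_chars: int) -> None:
        self.limit_chars = limit_chars
        self.used_chars = 0

    def try_consume(self, text: str) -> bool:
        """Record usage and return True only if the text fits under the limit."""
        if self.used_chars + len(text) > self.limit_chars:
            return False  # limit reached: skip the translation call
        self.used_chars += len(text)
        return True


threshold = UsageThreshold(limit_chars=10)
results = [
    threshold.try_consume("hello"),  # 5 of 10 characters used
    threshold.try_consume("world"),  # 10 of 10 characters used
    threshold.try_consume("!"),      # would exceed the limit, so refused
]
```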

Search

The first question a user asks is sent to the Translation Provider for language detection. This sets their Preferred Language until they change it or Switch Company. They can change it using the Language Selector dropdown, which appears either in the filter bar at the top of their search results or in the navigation bar, depending on instance configuration:

Figures 1 & 2 – Language Selector Dropdown
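The Preferred Language behavior described above can be sketched as a small piece of session state. The `SearchSession` class and the `detect` callable are hypothetical, and clearing the preference when the user switches company is an assumption about how detection is re-triggered.

```python
from typing import Callable, Optional


class SearchSession:
    """Hypothetical per-session state; not Lucy's actual data model."""

    def __init__(self, detect: Callable[[str], str]) -> None:
        self._detect = detect
        self.preferred_language: Optional[str] = None

    def ask(self, query: str) -> str:
        """Detect the language on the first query only, then reuse it."""
        if self.preferred_language is None:
            self.preferred_language = self._detect(query)
        return self.preferred_language

    def select_language(self, code: str) -> None:
        """The user picks a language from the Language Selector dropdown."""
        self.preferred_language = code

    def switch_company(self) -> None:
        """Assumed behavior: the preference is cleared and detected again."""
        self.preferred_language = None


# Stand-in detector that records every query it receives.
detected = []


def fake_detect(query: str) -> str:
    detected.append(query)
    return "fr" if "bonjour" in query.lower() else "en"


session = SearchSession(fake_detect)
first = session.ask("Bonjour, où est le rapport ?")
second = session.ask("quarterly report")  # no second detection call
session.select_language("de")
third = session.ask("another question")
```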

If a user opens an Answer whose Answer Language is different from their Preferred Language, a Translate button appears. After clicking it, the user can select specific text within the document to translate.

Figures 3 & 4 – Translate Button and Select Text Functionality

After an area of text is selected, its translation is displayed on top of the Answer.

Figures 5 & 6 – Select and Translated Text

Additional considerations

Supported languages

The languages supported for Translation are set by our Translation Provider. The supported languages list can be found here. Please note that the Service requires the Provider to support both language detection and text translation for a language. As of November 28, 2023, we support the 111 languages below (including English).

Language	ISO Language Code
Afrikaans	af
Albanian	sq
Amharic	am
Arabic	ar
Armenian	hy
Assamese	as
Azerbaijani (Latin)	az
Bangla	bn
Bashkir	ba
Basque	eu
Bosnian (Latin)	bs
Bulgarian	bg
Cantonese (Traditional)	yue
Catalan	ca
Chinese Simplified	zh-Hans
Chinese Traditional	zh-Hant
Croatian	hr
Czech	cs
Danish	da
Dari	prs
Divehi	dv
Dutch	nl
English	en
Estonian	et
Faroese	fo
Fijian	fj
Finnish	fi
French	fr
Galician	gl
Georgian	ka
German	de
Greek	el
Gujarati	gu
Haitian Creole	ht
Hausa	ha
Hebrew	he
Hindi	hi
Hmong Daw (Latin)	mww
Hungarian	hu
Icelandic	is
Igbo	ig
Indonesian	id
Inuktitut	iu
Inuktitut (Latin)	iu-Latn
Irish	ga
Italian	it
Japanese	ja
Kannada	kn
Kazakh	kk
Khmer	km
Kinyarwanda	rw
Klingon	tlh-Latn
Klingon (pIqaD)	tlh-Piqd
Korean	ko
Kurdish (Central)	ku
Kyrgyz (Cyrillic)	ky
Lao	lo
Latvian	lv
Lithuanian	lt
Macedonian	mk
Malagasy	mg
Malay (Latin)	ms
Malayalam	ml
Maltese	mt
Maori	mi
Marathi	mr
Mongolian (Cyrillic)	mn-Cyrl
Myanmar	my
Nepali	ne
Norwegian	nb
Odia	or
Pashto	ps
Persian	fa
Polish	pl
Portuguese (Brazil)	pt
Punjabi	pa
Queretaro Otomi	otq
Romanian	ro
Russian	ru
Samoan (Latin)	sm
Serbian (Cyrillic)	sr-Cyrl
Serbian (Latin)	sr-Latn
Sindhi	sd
Sinhala	si
Slovak	sk
Slovenian	sl
Somali (Arabic)	so
Spanish	es
Swahili (Latin)	sw
Swedish	sv
Tahitian	ty
Tamil	ta
Tatar (Latin)	tt
Telugu	te
Thai	th
Tibetan	bo
Tigrinya	ti
Tongan	to
Turkish	tr
Turkmen (Latin)	tk
Ukrainian	uk
Upper Sorbian	hsb
Urdu	ur
Uyghur (Arabic)	ug
Uzbek (Latin)	uz
Vietnamese	vi
Welsh	cy
Xhosa	xh
Yoruba	yo
Yucatec Maya	yua
Zulu	zu

Usage expectations

Translation usage is priced by the number of characters sent to the Translation Service. The service counts usage in millions of characters, and our backend tracks the same count against a Threshold.

Document language is detected from a sample of the text to reduce the number of characters sent to the Translation Service. Because of this architecture, repositories with predominantly English documents use far fewer characters than repositories that require a lot of translation.

Because user queries are relatively short and Language Detection only happens on the first question per session, Language Detection may continue after the Threshold is reached. This is of minimal cost to Lucy and provides a better user experience (no interruption of service).

What to ask for when your company wants Translation

Which languages would you like?
What percent of your documents do you expect to be non-English?
1. Use this to estimate usage.
2. Compare to how many documents are available total.
Review what is and is not included (see above).
Is budget available to cover the implementation?
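The answers to the questions above can feed a back-of-the-envelope usage estimate. The sketch below is an assumption-heavy illustration: the function name, the average document length, and the detection sample size are all hypothetical inputs, and it covers ingestion only (query detection and on-demand Answer translation are left out).

```python
def estimate_ingestion_characters(
    total_documents: int,
    non_english_percent: float,
    avg_chars_per_document: int,
    detection_sample_chars: int,
) -> int:
    """Rough character estimate for ingestion (all inputs are assumptions).

    Every document contributes one detection sample; only non-English
    documents contribute their full text for translation to English.
    """
    non_english_docs = round(total_documents * non_english_percent / 100)
    detection = total_documents * detection_sample_chars
    translation = non_english_docs * avg_chars_per_document
    return detection + translation


# Example inputs are made up for illustration.
estimate = estimate_ingestion_characters(
    total_documents=10_000,
    non_english_percent=20,
    avg_chars_per_document=15_000,
    detection_sample_chars=500,
)
millions = estimate / 1_000_000
```

With these made-up inputs, detection accounts for 5 million characters and translation for 30 million, which shows why the share of non-English documents dominates the estimate.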

Non-Latin alphabet languages

While supported, languages that do not use the Latin alphabet have some additional considerations. Specifically, OCR may have accuracy issues when identifying characters from non-Latin alphabet texts. Expectations should be set with clients that translations will not be perfect and some accuracy issues may arise.

Also, the Lucy UI may require updates to handle the rendering of text in other alphabets. As of this writing, we have invested in this work for the Cyrillic alphabet (Russian, Ukrainian, etc.). More work would be required to enable other character sets, and additional scope should be considered for these languages.

Note: Filename search does not support special characters, including characters from non-Latin alphabet languages.
