How We Lowered Our Article Localization Costs by a Full Order of Magnitude with LLMs

I recently joined the executive leadership team at the non-profit Waha as their Localization Director. I have committed to working with them for the next 3 years, with the goal of shipping their app and curriculum in 80+ new languages during that time.

I’ve been really loving this role. It has been stretching and demanding in all the best ways. It requires a high degree of competence in online communication, a deep appreciation of the nuances of languages and cultures, and a broad understanding of current technologies and how they can simplify the workflows of our volunteers and partners.

I’ve already had the joy of spearheading a number of projects in my work with Waha. One of these projects is translating our library of articles. We have a growing number of resources in our ecosystem (articles, testimonials, tutorials, etc.) that haven’t yet been translated out of English. We want the whole Waha ecosystem to be massively multilingual, so my task has been to figure out a workflow to translate these articles quickly and affordably for our users.

We requested quotes from several translation companies for localizing these articles, and every one of them was prohibitively expensive for a small non-profit like ours, especially given the number of articles we need to translate.

So I set out to find a workflow that would give us top-quality translations at a fraction of the cost. Here’s what we came up with!

👷🏽‍♀️ What we did and how we did it:

Like so many others, I’ve been eagerly following this year’s rapid developments in AI. Working in localization in the age of language-capable computers means there is a lot of value to be unlocked by building workflows that put these new tools to good use.

Such was the case with this project!

So we decided to run an experiment: could we build a workflow with AI tools that produced really high-quality machine translations of our articles, which a fantastic human proofreader we contracted could then check, correct, and verify for quality?

Basically, could we make an awesome AI-Assisted Translation workflow?

One key problem we foresaw was that Waha has some very specific vocabulary, terminology, and acronyms that we use throughout the app and curriculum. Most of those terms are recorded in an internal Glossary, which is the first thing we translate whenever we start working on a new language. So we knew that getting these key terms right in our translated articles would be vital.

This more or less ruled out the best-known translation tools, such as DeepL and Google Translate. We needed a translation tool we could actually give instructions to: something that could adapt to the specific needs of our project, instead of just spitting out an attempt at a translation without understanding the context…

Yup! This would be a job for an LLM!

Because I had an OpenAI developer account with access to gpt-4-turbo-preview, we decided to use that model to create the base translation. We figured that if we gave it a few stylistic instructions along with our glossary file, we could get a pretty good result. So we tried it!

Here’s the prompt we used:

Act as a professional translator, translating from English to {TARGET LANGUAGE}.

Use phrasing and terminology that makes the article sound like it was originally
written by someone from {COUNTRY/REGION WHERE TARGET LANGUAGE IS SPOKEN}.

Use natural sounding {TARGET LANGUAGE} instead of a literal English translation.
I want dynamic equivalence meaning, instead of a word-for-word translation.

Use the following translation glossary whenever the English terms appear:

```csv
Term [en],Term [target_language]
[ENGLISH TERM],[PREFERRED TARGET LANGUAGE USAGE]
[...ONE COMMA-SEPARATED ROW PER GLOSSARY TERM]
```

When translating the English second person pronoun "you",
use the {TARGET LANGUAGE} singular pronoun
(such as "tu" in Spanish, or "sen" in Turkish),
instead of the plural ("ustedes" or "siz"),
unless context clearly suggests a plural should be used.

Translate the article below:

--- English Source Article ---

[ARTICLE HERE]

--- Translation of above article into {TARGET LANGUAGE} ---

We pasted in the original English article (and, of course, replaced the {PLACEHOLDERS} with the appropriate target language, etc.), pressed “send”, and waited for the text to generate.
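Filling in the placeholders by hand is fine for a one-off, but it is easy to script. The sketch below is only an illustration, not our actual tooling: it assumes the glossary is kept as a Python dict mapping English terms to preferred target-language terms, the helper names `render_glossary_csv` and `fill_prompt` are hypothetical, and the template is a shortened version of the prompt above (the “you” instruction is left out for brevity). It uses the standard `csv` module so that terms containing commas or quotes are escaped correctly.

```python
import csv
import io

FENCE = "`" * 3  # built indirectly so this example's own code fence stays intact


def render_glossary_csv(glossary: dict, target_language: str) -> str:
    """Render {english_term: target_term} as CSV, with the header row from the prompt."""
    buf = io.StringIO()
    # csv.writer quotes any term that itself contains a comma or a quote character
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Term [en]", f"Term [{target_language}]"])
    for english, target in glossary.items():
        writer.writerow([english, target])
    return buf.getvalue()


def fill_prompt(target_language: str, region: str, glossary: dict, article: str) -> str:
    """Substitute the placeholders in a shortened (hypothetical) version of the prompt."""
    glossary_block = f"{FENCE}csv\n{render_glossary_csv(glossary, target_language)}{FENCE}"
    return (
        f"Act as a professional translator, translating from English to {target_language}.\n\n"
        "Use phrasing and terminology that makes the article sound like it was originally\n"
        f"written by someone from {region}.\n\n"
        "Use the following translation glossary whenever the English terms appear:\n\n"
        f"{glossary_block}\n\n"
        "Translate the article below:\n\n"
        f"--- English Source Article ---\n\n{article}\n\n"
        f"--- Translation of above article into {target_language} ---\n"
    )


prompt = fill_prompt("Spanish", "Mexico", {"app": "aplicación"}, "Welcome to the app!")
```

Keeping the glossary as structured data rather than pasted text means the same source can feed every target language's prompt.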

For this experiment, we set our proofreader up with ChatKit, which let her use my gpt-4-turbo-preview API key through a friendly interface. We could have used a paid ChatGPT Plus account, but the GPT-4 model in ChatGPT Plus has a limit of about 8,000 tokens (about 6,000 words), whereas gpt-4-turbo-preview through the API gives us up to 128,000 tokens (over 90,000 words). While 6,000 words would have been plenty for the experiment so far, those extra 84,000+ words of context were about to become really useful to us.
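The word counts above rest on a common rule of thumb: one token is roughly three-quarters of an English word. A minimal sketch of that conversion, assuming the 0.75 ratio (real counts depend on the tokenizer and the language, and non-English text often needs more tokens per word):

```python
# Rule of thumb: one token is roughly 0.75 English words. This is an approximation;
# actual counts depend on the tokenizer and on the language of the text.
WORDS_PER_TOKEN = 0.75


def tokens_to_words(tokens: int) -> int:
    """Approximate number of English words that fit in `tokens` tokens."""
    return round(tokens * WORDS_PER_TOKEN)


chatgpt_plus_words = tokens_to_words(8_000)   # 6000
api_words = tokens_to_words(128_000)          # 96000
extra_words = api_words - chatgpt_plus_words  # 90000
```

By this estimate 128,000 tokens is closer to 96,000 words, so “over 90,000” and “84,000+” are conservative figures.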

After ChatKit / gpt-4-turbo-preview finished generating the translation, our proofreader checked it. When she was done, we debriefed. She said the translation was good (better than she would have expected from a machine) but still far from perfect.

This was a good step in the right direction, but there was still a lot of room to improve.

The resulting translation had a large number of small problems, and they were hard to turn into clear instructions for the prompt. (We were looking for rules like the singular/plural “you” instruction in the prompt above: clear directions about how the articles should be translated, “this way, and not that way.” But the issues weren’t clear-cut enough to describe that specifically, and there were so many small ones that explaining each of them that way would have taken a long time.)

If only there was a way to give the AI more context, so it could give even better translations. 🤔

🤖 Making the bot BETTER

We mulled over the project for a bit, and then had a realization:
We have an example of exactly how we want these articles translated, in the form of the article we just finished. And we’re about to have several more, as this translation project continues!

And, most importantly, we can easily tell the AI to make use of these examples!

So we added a couple of simple lines to the prompt, just before the article we wanted translated:

Here are example English articles,
and their ideal {TARGET LANGUAGE} translations.
Use a similar style of {TARGET LANGUAGE} in your translation.

--- English Example #1 ---

[ORIGINAL ARTICLE]

--- {TARGET LANGUAGE} Translation #1 ---

[FINISHED/PROOF-READ Target Language Article]

We then put the first English article and its proofread translation into the prompt, followed by the next article we wanted translated, and pressed “send” to translate article #2.
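Mechanically, this is few-shot prompting: finished translation pairs go into the prompt as worked examples, ahead of the new article. A minimal sketch of assembling that prompt, assuming the proofread pairs are kept as a list of `(english, translation)` tuples; `build_examples_section` and `build_full_prompt` are hypothetical helper names, and the header wording mirrors the example lines above.

```python
def build_examples_section(pairs: list[tuple[str, str]], target_language: str) -> str:
    """Format proofread (English, translation) pairs as numbered examples."""
    if not pairs:
        return ""  # the very first article is translated without examples
    parts = [
        "Here are example English articles,\n"
        f"and their ideal {target_language} translations.\n"
        f"Use a similar style of {target_language} in your translation."
    ]
    for i, (english, translation) in enumerate(pairs, start=1):
        parts.append(f"--- English Example #{i} ---\n\n{english}")
        parts.append(f"--- {target_language} Translation #{i} ---\n\n{translation}")
    return "\n\n".join(parts)


def build_full_prompt(instructions: str, pairs: list[tuple[str, str]],
                      target_language: str, new_article: str) -> str:
    """Instructions first, then the examples, then the article to translate."""
    sections = [
        instructions,
        build_examples_section(pairs, target_language),
        f"Translate the article below:\n\n--- English Source Article ---\n\n{new_article}",
        f"--- Translation of above article into {target_language} ---",
    ]
    return "\n\n".join(s for s in sections if s)  # skip the empty examples section
```

Placing the examples before the new article matches how we inserted them by hand; the model sees the target style before it sees the text it has to translate.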

Our proofreader started working through the new article, and she was shocked.

“Wow! This is so much better than last time!” she told me.

She finished the second article much faster than the first, because there were far fewer things she needed to correct. There were still a number of things to fix, but it was a big improvement.

So we repeated this process for the 3rd article!

We added the 1st English article, the 1st translation, the 2nd English article, and the 2nd translation to the prompt, before finally giving it the next article we wanted translated.

Then, our proofreader went to work again… and again she finished much faster than before.

We kept repeating this process for each new article, adding the completed, proofread articles to the new prompt to get better and better base translations for each subsequent one. With each article, our proofreader beat her previous record for how quickly she could correct the machine translation, because each generated translation had fewer problems for her to fix.
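The loop itself is simple: translate, have a human proofread, then feed the corrected pair back in as an example for the next article. A minimal sketch, assuming hypothetical callables: `make_prompt` assembles a prompt from the examples and the new article, `translate` stands in for the LLM call, and `proofread` stands in for the human review step. None of these are real APIs, and the optional `max_examples` cap is an illustrative addition for when the context window runs out.

```python
from typing import Callable, List, Optional, Tuple

Pair = Tuple[str, str]  # (English article, proofread translation)


def translate_series(
    articles: List[str],
    make_prompt: Callable[[List[Pair], str], str],
    translate: Callable[[str], str],
    proofread: Callable[[str, str], str],
    max_examples: Optional[int] = None,
) -> List[Pair]:
    """Translate articles in order, feeding each proofread result back in as an example."""
    examples: List[Pair] = []
    for article in articles:
        if max_examples is None:
            context = examples
        else:
            # keep only the most recent examples if the context window is limited
            context = examples[max(0, len(examples) - max_examples):]
        draft = translate(make_prompt(context, article))  # machine base translation
        final = proofread(article, draft)                 # human-corrected version
        examples.append((article, final))
    return examples
```

Because each call sees every earlier corrected pair, the quality of the draft improves as the series goes on, which is the effect our proofreader noticed.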

This is where gpt-4-turbo-preview really shone. With its 90,000+ word context window, we could fit more than 40 completed articles of stylistic context into each prompt, which made for some extremely context-rich translations.

(Each article was ~1,000 words, and we gave both the English and the target-language version as context, so each example pair took up ~2,000 words. That means we could have included as many as ~90,000 words / ~2,000 words per pair ≈ 45 articles in the prompt.)
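The estimate above divides the usable context by the size of one example pair. A small sketch in words, using the ~1,000-words-per-article figure. In practice some of the window must also be reserved for the instructions, glossary, new article, and the model’s reply; the `reserved_words` value in the second call is illustrative, not a number from our project. Target-language text also tends to take more tokens per word than English, so the real ceiling is somewhat lower.

```python
def max_example_pairs(context_words: int, words_per_article: int, reserved_words: int = 0) -> int:
    """How many (English, translation) example pairs fit in the context window.

    Each pair costs about 2 * words_per_article; reserved_words covers everything
    else the prompt and the reply need (instructions, glossary, new article, output).
    """
    usable = max(context_words - reserved_words, 0)
    return usable // (2 * words_per_article)


print(max_example_pairs(90_000, 1_000))                        # 45
print(max_example_pairs(90_000, 1_000, reserved_words=4_000))  # 43
```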

💲 Final Math

We translated about a dozen articles in this experiment. At the end, we sat down to debrief the project. Our final translation prompt was a behemoth, with 12 articles of context in it. But it worked fantastically well!

When we broke down the final costs for the project, we found that translating and proofreading the last 3-4 articles with this new workflow (and our refined, context-heavy prompt) cost a full 10x less than any of the quotes we had received from translation organizations, and our per-article cost was still falling as the AI got more context to work with.
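One reason a “behemoth” prompt stays affordable is that API usage is billed per token, so each extra example adds a modest, predictable amount to every call. A rough per-request estimate is sketched below. The token counts are loose approximations for a prompt with about a dozen example pairs, and the per-1K-token rates are placeholders rather than quoted prices, so substitute your provider’s current pricing. This covers only the API side of the cost; proofreading time is the other part.

```python
def call_cost(input_tokens: int, output_tokens: int,
              usd_per_1k_input: float, usd_per_1k_output: float) -> float:
    """Estimated API cost of one translation request, in US dollars."""
    return (input_tokens / 1000) * usd_per_1k_input + (output_tokens / 1000) * usd_per_1k_output


# Rough sizes: ~12 example pairs + instructions + new article in, one translated article out.
# The rates below are placeholders for illustration only.
estimate = call_cost(input_tokens=34_000, output_tokens=1_500,
                     usd_per_1k_input=0.01, usd_per_1k_output=0.03)  # ≈ 0.385
```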

This was exciting to us. As a non-profit, new high-efficiency workflows like this mean that we can have a lot more impact-per-donor-dollar than we had before.

We’ve immediately started integrating this workflow, and the lessons we learned from designing it, into our wider localization initiative. We’re excited about how it will both increase the speed at which we ship translation projects and lower their costs. We’re hopeful that workflows and tools like this will play a significant part in helping us get our curriculum, app, and supplemental training content into 80+ languages in the next 3 years!

If you’re interested in learning more about the work we’re involved in, or maybe even partnering with us in that work, check out the following links:
