"

Disclosure on the Use of Artificial Intelligence in Translation

Disclosure on the Use of Artificial Intelligence in Translation

I translated Hugo Becker’s Mechanics and Aesthetics of Cello Playing with the assistance of an artificial intelligence translation methodology created by Luke Sheneman and John Brunsfeld of the Institute for Interdisciplinary Data Sciences at the University of Idaho. This project was supported by the University of Idaho Fellowship in Interdisciplinary Data Sciences.

The Mechanics and Aesthetics of Cello Playing is written in complicated academic language with a lot of technical terminology. Some of it is cello- and music-specific, and some of it comes from medical fields. The AI translation methodology took these aspects of language into account when creating a resolved translation. The resulting document, while not publication-ready, was a useful and motivating starting point. As I worked through it line by line against the original, I realized with gratitude that it was much easier to make corrections to an existing translation than to create one from scratch.

I am not a fluent German speaker, despite having grown up in a German-speaking home, but my language skills from some long-ago undergraduate classes and a few visits to Germany were enough for me to navigate the project. Over the time it took me to translate and edit Mechanics and Aesthetics, AI technology changed rapidly. By the time I got to my final round of edits, large language models had become increasingly good at explaining German idioms. This was a great help, because Becker’s writing is not easy to read even for speakers of perfect German. Even then, there were occasions when AI and I both got hopelessly lost in one of Becker’s long, clause-filled sentences, and that was before I even attempted to grapple with his love of literary allusions and florid metaphors. When I ran into such difficulties, I was thankful for the help of my German-speaking father, Roger Wilson, who was always able to untangle them for me and to identify the sources of quotations.

Below is a description of the AI translation methodology by Luke Sheneman.

We created an end-to-end generative AI workflow where we used various AI models in discrete stages to perform the translation of Becker’s book to English.  Here is a description of the stages and methods for each stage:

  1. Multi-Modal Computer Vision

We used Computer Vision to convert images of the pages of Becker’s book to German text: The book was provided as a 144-page PDF. Each page of the PDF was a scanning of two pages of the book (left and right sides), for a total of approximately 288 total pages. The challenge was to accurately convert this collection of images into German text. For this, we used OpenAI’s GPT-4o model, which is a multimodal large-language model (LLM) with very strong vision capabilities. It was able to convert the scanned images into text files. We wrote a Python script that:

  • Saved every page of the 144-page PDF to a separate PNG file.
  • Used the OpenAI API to submit the image to the GPT-4o and asked the model to return the German text.
  • Saved the German text as files locally.

2. Ensemble Translation

We then used a collection of five Large Language Models (LLMs) to perform individual translations of the extracted German text to English. The idea was to have each model independently attempt the translation, thinking that each would have unique strengths and weaknesses. In a later step, we use a powerful reasoning model to analyze the collective set of translations from all of these LLMs to look at where they agree and disagree (and why) and derive some insight into alternative translations and result in a preferred final translation.

We used these specific models:

  • Aya: A 35-billion parameter model from Cohere that specializes in multilingual tasks
  • GPT4o: A frontier model from OpenAI
  • Llama 3.1: A 70-billion parameter LLM from Meta
  • Mistral Large: A 123-billion parameter model from Mistral.AI
  • Qwen 2.5: A 72-billion parameter model from Alibaba (China)

Each of these models were given this simple prompt:

“Translate the following German text to English. Retain the separator line of ******** characters as-is:\n{german_text}”

We create individual translations for each page of Becker’s book by each of these models and then aggregate the results into one set of five AI translations per page.

3. AI as Judge

We then rely on OpenAI’s o1 reasoning model to perform a careful analysis of the five independent AI translations for each page that were produced in step 2 above. The instructions to o1 were to compare and contrast the translations, and identify and analyze specific important places where the AI models disagreed. We also asked o1 to resolve each specific disagreement and explain its reasoning for how it resolved each discrete translation conflict.

Finally, we asked o1 to consider all of the five independent translations and reflect on its own analysis of where they differ to produce a final resolved translation. It is important to note that we provided context about the nature of the original book and the current audience, which influenced decisions on translation, grammar, and word choice.

This is the prompt used with o1:

“Analyze the following translations from multiple models. For context, the original text was from a 1929 German book for musicians about playing the cello, focusing on human anatomical mechanics. First, list and describe all important translation differences across models. For each difference, suggest the best way to resolve that discrepancy based on context and the original meaning.

Finally, in a last section called “RESOLVED TRANSLATION:”, output the best possible complete translation of the text given the multiple translations and overall analyses of the translations. Here is the aggregated translations produced from multiple models: \n\n{translations_text}”

The final result is a set of 144 PDFs, each one corresponding to each scanned page of the original 144-page PDF. These PDFs contained:

  • The original scanned image
  • The five translations from the ensemble of AI models
  • The detailed comparative analysis of these five translations by the OpenAI o1 reasoning model
  • The final resolved translation provided by the o1 model

Running this workflow was more-or-less fully automated by a set of Python scripts, and took about an hour to complete the full translation workflow.

The comprehensive workflow leveraged the strengths of many AI models to produce the best possible translation that was also conditioned based on the explicit context (background, final audience, etc.). Given the way this was done, this translation is superior to generic tools like Google Translate and can go from scanned image to translation in a handful of automated steps.

License

Icon for the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License

Mechanics and Aesthetics of Cello Playing Copyright © 2025 by Miranda Wilson is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License, except where otherwise noted.

Share This Book