Generating drafts

Introduction

SIL International’s Natural Language Processing team has developed a tool that assists in Bible translation by creating a rough first draft for translators to edit and refine. As of January 1, 2024, this tool is available within Scripture Forge, a scripture editing platform that is closely integrated with Paratext.

All of the drafts that are created contain errors that need to be corrected by the translation team. For some projects, the quality will be too low to be useful. However, extensive field testing has shown that a significant number of teams find the drafts to be very helpful in their work, and of sufficient quality to use as a starting point for the team to edit.

Ultimately, the measure of success for the drafts is in their usefulness as a tool to assist the translation team in their work, not their ability to stand alone as a finished product.

How it works

In order to use the tool effectively, it’s important to understand how it works. The drafts are created using a two-step process:

  1. Learn the language.
  2. Translate the text.

The most important step is the first one, the learning of the language. The quality of the draft that is created depends almost entirely on how well this step goes.

The system learns by seeing the same sentence written in two languages: one that it already understands, and the one you are translating into.

The most important point to remember is that these sentences need to say the same thing in both languages. This is the same principle that allowed scholars to decipher hieroglyphs using the Rosetta Stone: the same text was written in three scripts (hieroglyphic, Demotic, and Ancient Greek), allowing comparison between a language that was already understood and one that was not.

Text that is available in both the language you are translating from, and the language you are translating to, is known as “parallel text”. In general, the more parallel text you have, the better the system will learn the language. For most projects, this parallel text will be your reference text, and the translation work you have already completed. At this time, we recommend that you have at least the New Testament translated.
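
The idea of parallel text can be sketched in a few lines of Python. This is only an illustration, not how the drafting system is implemented: the dictionary layout, the verse references, and the `build_parallel_text` helper are all assumptions made for the example. It pairs verses that are present in both texts and skips any verse that has not been translated yet.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: verse reference -> verse text.
Verses = Dict[str, str]


def build_parallel_text(source: Verses, target: Verses) -> List[Tuple[str, str, str]]:
    """Return (reference, source text, target text) for verses present in both."""
    pairs = []
    for ref, source_text in source.items():
        target_text = target.get(ref, "")
        # A verse only counts as parallel text if both sides have content.
        if source_text.strip() and target_text.strip():
            pairs.append((ref, source_text.strip(), target_text.strip()))
    return pairs


# Placeholder strings stand in for real verse text.
reference_text = {
    "MAT 1:1": "reference verse one",
    "MAT 1:2": "reference verse two",
    "GEN 1:1": "reference verse three",
}
translation = {
    "MAT 1:1": "translated verse one",
    "MAT 1:2": "translated verse two",
    # GEN 1:1 is not translated yet, so it cannot be used for learning.
}

parallel = build_parallel_text(reference_text, translation)
print(len(parallel))  # 2
```

In this sketch, more completed verses in the translation means more pairs, which mirrors the general rule that more parallel text gives the system more to learn from.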

Example use cases

A basic example

Suppose a translation team is translating the Bible into a local language, using the English NIV as a reference text. The team has completed the entire New Testament, and is beginning work on Genesis. The system would generate a draft as follows:

  1. Compare the English NIV New Testament with the local language New Testament, in order to learn the language.
  2. Having learned the language, translate Genesis from the English NIV into the local language.

After generating the book of Genesis, the team would edit the draft to correct any errors. Afterwards, they would be able to generate a draft of the next book they plan to work on, and the system would use the translation of Genesis to improve the quality of the next draft.
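
The way each finished book feeds into the next draft can be sketched as follows. The `next_draft_plan` function and the three-letter book codes are assumptions for illustration, and the New Testament is abbreviated to three books to keep the example short. In practice, training books are selected through the Scripture Forge interface.

```python
from typing import Dict, List


def next_draft_plan(completed_books: List[str], book_to_draft: str) -> Dict[str, object]:
    """Describe one drafting round: learn from completed books, then draft one book."""
    if book_to_draft in completed_books:
        raise ValueError(f"{book_to_draft} is already translated")
    # Copy the list so that later changes do not alter an earlier plan.
    return {"learn_from": list(completed_books), "draft": book_to_draft}


completed = ["MAT", "MRK", "LUK"]  # stand-in for the full New Testament

first_round = next_draft_plan(completed, "GEN")

# Once the team has edited and corrected the Genesis draft,
# it becomes training data for the following round.
completed.append("GEN")
second_round = next_draft_plan(completed, "EXO")

print(second_round["learn_from"])  # ['MAT', 'MRK', 'LUK', 'GEN']
```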

A more complex example

In the first example, the team was using the English NIV as a reference text, and the system learned the language by comparing the English NIV with the local language New Testament. However, in many cases a better quality draft can be generated by learning from a different text than the one the team is translating from.

For example, it is often possible to improve the quality of the draft by using a back translation of the local language text into the source language. If the team has back translated the project into English, the back translation can be used for the learning step in place of the English NIV. A back translation is usually much more literal than a normal translation, which makes it easier for the system to see how the local language maps to English. In this example, the system would generate a draft as follows:

  1. Compare the English back translation with the local language New Testament, in order to learn the language.
  2. Having learned the language, translate Genesis from the English NIV into the local language.
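
What separates this example from the basic one is that the text used for learning is no longer the text used for drafting. A minimal sketch, assuming a hypothetical `DraftSetup` record (the field names and project labels are invented for illustration and do not correspond to actual Scripture Forge settings):

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DraftSetup:
    learn_source: str  # text compared with the translation to learn the language
    draft_source: str  # text that the new book is translated from
    target: str        # the team's own translation

    def uses_separate_learning_text(self) -> bool:
        """True when learning and drafting rely on different source texts."""
        return self.learn_source != self.draft_source


basic = DraftSetup(
    learn_source="English NIV",
    draft_source="English NIV",
    target="Local language",
)
complex_setup = DraftSetup(
    learn_source="English back translation",
    draft_source="English NIV",
    target="Local language",
)

print(basic.uses_separate_learning_text())          # False
print(complex_setup.uses_separate_learning_text())  # True
```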

Determining the ideal setup for a project is a complex process, and it’s not something you will need to learn. The SIL Natural Language Processing team is developing tools to determine the ideal setup, and can assist teams during the onboarding process.

Generating back translations

In addition to creating drafts into the vernacular language, the system can also generate back translations into supported source languages. In order to generate a back translation draft, the team needs to already have back translated at least a few books from the vernacular language into the source language.

In this example, suppose a team has translated the four gospels into the vernacular language, and has back translated Matthew, Mark, and Luke into English. To generate a back translation draft of John, the system would do the following:

  1. Compare the English back translations of Matthew, Mark, and Luke with the vernacular language versions of Matthew, Mark, and Luke, in order to learn the language.
  2. Having learned the language, translate John from the vernacular language into English.
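
Choosing which books to learn from and which to draft can be sketched as a comparison of two book lists. The `plan_back_translation` helper and the book codes are assumptions for illustration only.

```python
from typing import List, Tuple


def plan_back_translation(
    vernacular_books: List[str],
    back_translated_books: List[str],
) -> Tuple[List[str], List[str]]:
    """Split vernacular books into (books to learn from, books to draft)."""
    done = set(back_translated_books)
    # Books that exist in both languages are parallel text for learning.
    learn_from = [book for book in vernacular_books if book in done]
    # Books with no back translation yet are candidates for drafting.
    to_draft = [book for book in vernacular_books if book not in done]
    return learn_from, to_draft


learn_from, to_draft = plan_back_translation(
    vernacular_books=["MAT", "MRK", "LUK", "JHN"],
    back_translated_books=["MAT", "MRK", "LUK"],
)
print(learn_from)  # ['MAT', 'MRK', 'LUK']
print(to_draft)    # ['JHN']
```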

Back translation drafts will also contain errors and need to be edited, but the quality is usually substantially higher than for the vernacular drafts.

Getting started

Generating back translation drafts is currently open and available to all Paratext users. Generating drafts into the vernacular, due to the complexity involved in setup, requires a team to be onboarded by the SIL Natural Language Processing team. Please fill out the translation drafting registration form, and a member of the team will assess whether your project is a good candidate for generating drafts.

Regardless of whether you are generating back translation drafts or vernacular drafts, you can begin by connecting your Paratext project to Scripture Forge by following these steps:

  1. Log in to Scripture Forge, using your Paratext credentials.
  2. Connect your Paratext project by following the Connect a Paratext Project guide. When you connect the project, select your reference text as the source. For a back translation, the source text should be the vernacular.
  3. After connecting your project, click “Generate draft” in the sidebar.
  4. If you are generating a draft into the vernacular, this is as far as you can go on your own, and you will need to fill out the translation drafting registration form. If your project has already been onboarded, or you are working with a back translation, click “Generate draft” to start the process.
  5. Select the books you want to translate, and then select the books you want to use as training data.
  6. Click “Generate draft” to start the process.

The draft generating process can take anywhere from several hours to several days.

Once you have a draft generated, you can preview the draft and import individual chapters into your project.

Supported languages for back translation drafting

Back translation drafts can be generated from any language, but the language they are translated into must be one of the following.

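| Language name | ISO 639-2 (T) code |
| --- | --- |
| Achinese | ace |
| Mesopotamian Arabic | acm |
| Ta’izzi-Adeni Arabic | acq |
| Tunisian Arabic | aeb |
| Afrikaans | afr |
| South Levantine Arabic | ajp |
| Akan | aka |
| Amharic | amh |
| North Levantine Arabic | apc |
| Arabic | arb |
| Najdi Arabic | ars |
| Moroccan Arabic | ary |
| Egyptian Arabic | arz |
| Assamese | asm |
| Asturian | ast |
| Awadhi | awa |
| Aymara | ayr |
| South Azerbaijani | azb |
| Azerbaijani | azj |
| Bashkir | bak |
| Bambara | bam |
| Balinese | ban |
| Belarusian | bel |
| Bemba | bem |
| Bangla | ben |
| Bhojpuri | bho |
| Banjar | bjn |
| Tibetan | bod |
| Bosnian | bos |
| Buginese | bug |
| Bulgarian | bul |
| Catalan | cat |
| Cebuano | ceb |
| Czech | ces |
| Chokwe | cjk |
| Central Kurdish | ckb |
| Crimean Turkish | crh |
| Welsh | cym |
| Danish | dan |
| German | deu |
| Dinka | dik |
| Dyula | dyu |
| Dzongkha | dzo |
| Greek | ell |
| English | eng |
| Esperanto | epo |
| Estonian | est |
| Basque | eus |
| Ewe | ewe |
| Faroese | fao |
| Fijian | fij |
| Finnish | fin |
| Fon | fon |
| French | fra |
| Friulian | fur |
| Nigerian Fulfulde | fuv |
| Scottish Gaelic | gla |
| Irish | gle |
| Galician | glg |
| Guarani | grn |
| Gujarati | guj |
| Haitian Creole | hat |
| Hausa | hau |
| Hebrew | heb |
| Hindi | hin |
| Chhattisgarhi | hne |
| Croatian | hrv |
| Hungarian | hun |
| Armenian | hye |
| Igbo | ibo |
| Iloko | ilo |
| Indonesian | ind |
| Icelandic | isl |
| Italian | ita |
| Javanese | jav |
| Japanese | jpn |
| Kabyle | kab |
| Kachin | kac |
| Kamba | kam |
| Kannada | kan |
| Kashmiri | kas |
| Georgian | kat |
| Kanuri | knc |
| Kazakh | kaz |
| Kabiye | kbp |
| Kabuverdianu | kea |
| Khmer | khm |
| Kikuyu | kik |
| Kinyarwanda | kin |
| Kyrgyz | kir |
| Kimbundu | kmb |
| Kurdish | kmr |
| Kongo | kon |
| Korean | kor |
| Lao | lao |
| Ligurian | lij |
| Limburgish | lim |
| Lingala | lin |
| Lithuanian | lit |
| Lombard | lmo |
| Latgalian | ltg |
| Luxembourgish | ltz |
| Luba-Lulua | lua |
| Ganda | lug |
| Luo | luo |
| Mizo | lus |
| Latvian | lvs |
| Magahi | mag |
| Maithili | mai |
| Malayalam | mal |
| Marathi | mar |
| Minangkabau | min |
| Macedonian | mkd |
| Malagasy | plt |
| Maltese | mlt |
| Manipuri | mni |
| Mongolian | khk |
| Mossi | mos |
| Māori | mri |
| Burmese | mya |
| Dutch | nld |
| Norwegian Nynorsk | nno |
| Norwegian Bokmål | nob |
| Nepali | npi |
| Northern Sotho | nso |
| Nuer | nus |
| Nyanja | nya |
| Occitan | oci |
| Oromo | gaz |
| Odia | ory |
| Pangasinan | pag |
| Punjabi | pan |
| Papiamento | pap |
| Persian | pes |
| Polish | pol |
| Portuguese | por |
| Dari | prs |
| Southern Pashto | pbt |
| Quechua | quy |
| Romanian | ron |
| Rundi | run |
| Russian | rus |
| Sango | sag |
| Sanskrit | san |
| Santali | sat |
| Sicilian | scn |
| Shan | shn |
| Sinhala | sin |
| Slovak | slk |
| Slovenian | slv |
| Samoan | smo |
| Shona | sna |
| Sindhi | snd |
| Somali | som |
| Southern Sotho | sot |
| Spanish | spa |
| Albanian | als |
| Sardinian | srd |
| Serbian | srp |
| Swazi | ssw |
| Sundanese | sun |
| Swedish | swe |
| Swahili | swh |
| Silesian | szl |
| Tamil | tam |
| Tatar | tat |
| Telugu | tel |
| Tajik | tgk |
| Filipino | tgl |
| Thai | tha |
| Tigrinya | tir |
| Tamashek | taq |
| Tok Pisin | tpi |
| Tswana | tsn |
| Tsonga | tso |
| Turkmen | tuk |
| Tumbuka | tum |
| Turkish | tur |
| Akan | twi |
| Tamazight | tzm |
| Uyghur | uig |
| Ukrainian | ukr |
| Umbundu | umb |
| Urdu | urd |
| Uzbek | uzn |
| Venetian | vec |
| Vietnamese | vie |
| Waray | war |
| Wolof | wol |
| Xhosa | xho |
| Yiddish | ydd |
| Yoruba | yor |
| Cantonese | yue |
| Chinese | zho |
| Malay | zsm |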