Understanding Drafting
Introduction
SIL International’s Natural Language Processing team has developed a tool that can assist in Bible translation by creating a rough first draft for translators to edit and refine. Since January 1st, 2024, this tool has been available within Scripture Forge, a scripture editing platform that is closely integrated with Paratext.
All of the drafts that are created contain errors that need to be corrected by the translation team. For some projects, the quality will be too low to be useful. However, extensive field testing has shown that a significant number of teams find the drafts to be very helpful in their work, and of sufficient quality to use as a starting point for the team to edit.
Ultimately, the measure of success for the drafts is in their usefulness as a tool to assist the translation team in their work, not their ability to stand alone as a finished product.
How it works
In order to use the tool effectively, it’s important to understand how it works. The drafts are created using a two-step process:
- Learn the language.
- Translate the text.
The most important step is the first one, the learning of the language. The quality of the draft that is created depends almost entirely on how well this step goes.
The system learns by seeing the same sentence written in two languages: one that it already understands, and the one you are translating into.
The most important point to remember is that these sentences need to say the same thing in both languages. This is the same principle that allowed scholars to decipher Egyptian hieroglyphs from the Rosetta Stone: the same text was written in three different scripts (hieroglyphic, Demotic, and ancient Greek), allowing comparison between a language that was already understood and one that was not.
Text that is available in both the language you are translating from and the language you are translating to is known as “parallel text”. In general, the more parallel text you have, the better the system will learn the language. For most projects, this parallel text will be your reference text paired with the translation work you have already completed. At this time, we recommend that you have at least the New Testament translated.
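As an illustration of what “parallel text” means in practice, the following is a minimal sketch that pairs verses by reference and keeps only those present in both texts. The function name, the verse reference format, and the sample sentences are all hypothetical; this is not how Scripture Forge stores or processes text internally.

```python
def build_parallel_text(source_verses, target_verses):
    """Pair verses that are present and non-empty in both texts.

    Both arguments are dicts keyed by a verse reference such as "MRK 1:1"
    (a hypothetical format chosen for this sketch).
    """
    pairs = []
    for ref, source_text in source_verses.items():
        source_text = source_text.strip()
        target_text = target_verses.get(ref, "").strip()
        # Only verses that exist in both languages can teach the system anything.
        if source_text and target_text:
            pairs.append((ref, source_text, target_text))
    return pairs


# Placeholder sentences, not real scripture text.
reference_text = {
    "MRK 1:1": "Reference sentence one.",
    "MRK 1:2": "Reference sentence two.",
    "GEN 1:1": "Reference sentence three.",
}
translation = {
    "MRK 1:1": "Vernacular sentence one.",
    "MRK 1:2": "Vernacular sentence two.",
    "GEN 1:1": "",  # not translated yet, so it cannot be used for learning
}

pairs = build_parallel_text(reference_text, translation)
for ref, source_text, target_text in pairs:
    print(ref, "|", source_text, "|", target_text)
```

In this sketch the untranslated verse is simply skipped, which mirrors the point above: only text that exists in both languages counts as parallel text.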
Example use cases
A basic example
Suppose a translation team is translating the Bible into a local language, using the English NIV as a reference text. The team has completed the entire New Testament, and is beginning work on Genesis. The system would generate a draft as follows:
- Compare the English NIV New Testament with the local language New Testament, in order to learn the language.
- Having learned the language, translate Genesis from the English NIV into the local language.
After generating the book of Genesis, the team would edit the draft to correct any errors. Afterwards, they would be able to generate a draft of the next book they plan to work on, and the system would use the translation of Genesis to improve the quality of the next draft.
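The way each edited book feeds into the next draft can be sketched as a simple loop. This is only an illustration under the assumption that the team finishes editing one book before requesting the next draft; the function name and book codes are hypothetical.

```python
def plan_drafts(completed_books, planned_books):
    """Return (book_to_draft, training_books) for each planned book, in order."""
    training = list(completed_books)
    plan = []
    for book in planned_books:
        plan.append((book, tuple(training)))
        # Assumption: the edited draft of this book is added to the
        # training data before the next draft is generated.
        training.append(book)
    return plan


plan = plan_drafts(["MAT", "MRK", "LUK", "JHN"], ["GEN", "EXO"])
for book, training_books in plan:
    print(f"Draft {book} after learning from: {', '.join(training_books)}")
```

Each later draft is trained on more parallel text than the one before it, which is why quality is expected to improve as the team progresses.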
A more complex example
In the first example, the team was using the English NIV as a reference text, and the system learned the language by comparing the English NIV with the local language New Testament. However, in many cases a better quality draft can be generated by using a different text than the one the team is translating from.
For example, it is often possible to improve the quality of the draft by using a back translation of the local language into the source language. If the team has back translated the project into English, that back translation could be used for the learning step instead of the English NIV. A back translation is usually much more literal than a normal translation, which makes it easier for the system to understand how the local language maps to English. In this example, the system would generate a draft as follows:
- Compare the English back translation with the local language New Testament, in order to learn the language.
- Having learned the language, translate Genesis from the English NIV into the local language.
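The difference between the two examples can be summarised in a small data structure: the text used for learning and the text used for drafting are separate choices. The class and field names below are hypothetical and exist only to make that distinction explicit; they do not correspond to actual Scripture Forge settings.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DraftingSetup:
    training_source: str   # text compared with the local language translation
    training_books: tuple  # books already translated into the local language
    drafting_source: str   # text the new draft is translated from
    drafting_books: tuple  # books to draft

    def steps(self):
        """Describe the two-step process for this setup."""
        trained = ", ".join(self.training_books)
        drafted = ", ".join(self.drafting_books)
        return [
            f"Learn: compare the {self.training_source} with the local language {trained}.",
            f"Translate: draft {drafted} from the {self.drafting_source} into the local language.",
        ]


basic = DraftingSetup("English NIV", ("New Testament",), "English NIV", ("Genesis",))
with_back_translation = DraftingSetup(
    "English back translation", ("New Testament",), "English NIV", ("Genesis",)
)

for setup in (basic, with_back_translation):
    print("\n".join(setup.steps()))
```

Only the learning step changes between the two setups; in both cases Genesis is still drafted from the English NIV.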
Determining the ideal setup for a project is a complex process, and it’s not something you will need to learn. The SIL Natural Language Processing team is developing tools to determine the ideal setup, and can assist teams during the onboarding process.
Generating back translations
In addition to creating drafts into the vernacular language, the system can also generate back translations into supported source languages. In order to generate a back translation draft, the team needs to already have back translated at least a few books from the vernacular language into the source language.
In this example, suppose a team has translated the four gospels into the vernacular language, and has back translated Matthew, Mark, and Luke into English. To generate a back translation draft of John, the system would do the following:
- Compare the English back translations of Matthew, Mark, and Luke with the vernacular language versions of Matthew, Mark, and Luke, in order to learn the language.
- Having learned the language, translate John from the vernacular language into English.
Back translation drafts will also contain errors and need to be edited, but the quality is usually substantially higher than for the vernacular drafts.
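The back translation example above can be sketched as a split of the vernacular books into those that already have a back translation (usable for learning) and those that do not (candidates for drafting). The function name and book codes are hypothetical.

```python
def back_translation_plan(vernacular_books, back_translated_books):
    """Split vernacular books into training books and books still to draft."""
    training = [b for b in vernacular_books if b in back_translated_books]
    to_draft = [b for b in vernacular_books if b not in back_translated_books]
    return training, to_draft


# The four gospels are translated; Matthew, Mark, and Luke are back translated.
training, to_draft = back_translation_plan(
    ["MAT", "MRK", "LUK", "JHN"], {"MAT", "MRK", "LUK"}
)
print("Learn from:", training)
print("Draft into English:", to_draft)
```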
Getting started
Generating back translation drafts is currently open and available to all Paratext users. Generating drafts into the vernacular, due to the complexity involved in setup, requires a team to be onboarded by the SIL Natural Language Processing team. Please fill out the translation drafting registration form, and a member of the team will assess whether your project is a good candidate for generating drafts.
Regardless of whether you are generating back translation drafts or vernacular drafts, you can begin by connecting your Paratext project to Scripture Forge by following these steps:
- Log in to Scripture Forge, using your Paratext credentials.
- Connect your Paratext project by following the Connect a Paratext Project guide. When you connect the project, select your reference text as the source. For a back translation, the source text should be the vernacular.
- After connecting your project, click “Generate draft” in the sidebar.
- If you are generating a draft into the vernacular, this is as far as you can go on your own, and you will need to fill out the translation drafting registration form by clicking on “Sign up for drafting”. If your project has already been onboarded, or you are working with a back translation, click “Generate draft” to start the process.
- Select the books you want to translate, and then select the books you want to use as training data.
- Click “Generate draft” to start the process.
The draft generation process can take anywhere from several hours to several days.
Once you have a draft generated, you can preview the draft and import individual chapters into your project.
Supported languages for back translation drafting
Back translation drafts can be generated from any vernacular language, but the language you back translate into must be one of the following.
Language name | ISO 639-1 | ISO 639-3 | ISO 639-2 |
---|---|---|---|
Achinese | ace | ||
Mesopotamian Arabic | acm | ||
Ta'izzi-Adeni Arabic | acq | ||
Tunisian Arabic | aeb | ||
Afrikaans | af | afr | afr |
South Levantine Arabic | ajp | ||
Akan | ak | aka | |
Amharic | am | amh | amh |
North Levantine Arabic | apc | ||
Standard Arabic | ar | arb | ara |
Najdi Arabic | ars | ||
Moroccan Arabic | ary | ||
Egyptian Arabic | arz | ||
Assamese | as | asm | asm |
Asturian | ast | ||
Awadhi | awa | ||
Aymara | ayr | ||
South Azerbaijani | azb | ||
North Azerbaijani | azj | ||
Bashkir | ba | bak | bak |
Bambara | bm | bam | bam |
Balinese | ban | ||
Belarusian | be | bel | bel |
Bemba | bem | ||
Bengali | bn | ben | ben |
Bhojpuri | bho | ||
Banjar | bjn | ||
Tibetan | bo | bod | tib |
Bosnian | bs | bos | bos |
Buginese | bug | ||
Bulgarian | bg | bul | bul |
Catalan | ca | cat | cat |
Cebuano | ceb | ||
Czech | cs | ces | cze |
Chokwe | cjk | ||
Central Kurdish | ckb | ||
Crimean Turkish | crh | ||
Welsh | cy | cym | wel |
Danish | da | dan | dan |
German | de | deu | ger |
Dinka | dik | ||
Dyula | dyu | ||
Dzongkha | dz | dzo | dzo |
Greek | el | ell | gre |
English | en | eng | eng |
Esperanto | eo | epo | epo |
Estonian | et | est | est |
Basque | eu | eus | baq |
Ewe | ee | ewe | ewe |
Faroese | fo | fao | fao |
Fijian | fj | fij | fij |
Finnish | fi | fin | fin |
Fon | fon | ||
French | fr | fra | fre |
Friulian | fur | ||
Nigerian Fulfulde | fuv | ||
Scottish Gaelic | gd | gla | gla |
Irish | ga | gle | gle |
Galician | gl | glg | glg |
Guarani | gn | grn | grn |
Gujarati | gu | guj | guj |
Haitian Creole | hat | ||
Hausa | ha | hau | hau |
Hebrew | he | heb | heb |
Hindi | hi | hin | hin |
Chhattisgarhi | hne | ||
Croatian | hr | hrv | hrv |
Hungarian | hu | hun | hun |
Armenian | hy | hye | arm |
Igbo | ig | ibo | ibo |
Iloko | ilo | ||
Indonesian | id | ind | ind |
Icelandic | is | isl | ice |
Italian | it | ita | ita |
Javanese | jv | jav | jav |
Japanese | ja | jpn | jpn |
Kabyle | kab | ||
Kachin | kac | ||
Kamba | kam | ||
Kannada | kn | kan | kan |
Kashmiri | ks | kas | kas |
Georgian | ka | kat | geo |
Central Kanuri | knc | ||
Kazakh | kk | kaz | kaz |
Kabiye | kbp | ||
Kabuverdianu | kea | ||
Khmer | km | khm | khm |
Kikuyu | ki | kik | kik |
Kinyarwanda | rw | kin | kin |
Kyrgyz | ky | kir | kir |
Kimbundu | kmb | ||
Northern Kurdish | kmr | ||
Kongo | kg | kon | kon |
Korean | ko | kor | kor |
Lao | lo | lao | lao |
Ligurian | lij | ||
Limburgish | li | lim | lim |
Lingala | ln | lin | lin |
Lithuanian | lt | lit | lit |
Lombard | lmo | ||
Latgalian | ltg | ||
Luxembourgish | lb | ltz | ltz |
Luba-Lulua | lua | ||
Luganda | lg | lug | lug |
Luo | luo | luo | |
Lushai | lus | lus | |
Latvian | lv | lvs | lav |
Magahi | mag | mag | |
Maithili | mai | mai | |
Malayalam | ml | mal | mal |
Marathi | mr | mar | mar |
Minangkabau | min | min | |
Macedonian | mk | mkd | mac |
Plateau Malagasy | plt | plt | |
Maltese | mt | mlt | mlt |
Manipuri | mni | mni | |
Halh Mongolian | khk | khk | |
Mossi | mos | mos | |
Maori | mi | mri | mao |
Burmese | my | mya | bur |
Dutch | nl | nld | dut |
Norwegian Nynorsk | nn | nno | nno |
Norwegian Bokmål | nb | nob | nob |
Nepali | npi | npi | |
Northern Sotho | nso | nso | |
Nuer | nus | nus | |
Chichewa | ny | nya | nya |
Occitan | oc | oci | oci |
West Central Oromo | gaz | gaz | |
Odia | ory | ory | |
Pangasinan | pag | pag | |
Punjabi | pa | pan | pan |
Papiamento | pap | pap | |
Persian | fa | pes | per |
Polish | pl | pol | pol |
Portuguese | pt | por | por |
Dari | prs | prs | |
Southern Pashto | pbt | pbt | |
Quechua | quy | quy | |
Romanian | ro | ron | rum |
Rundi | rn | run | run |
Russian | ru | rus | rus |
Sango | sg | sag | sag |
Sanskrit | sa | san | san |
Santali | sat | sat | |
Sicilian | scn | scn | |
Shan | shn | shn | |
Sinhala | si | sin | sin |
Slovak | sk | slk | slo |
Slovenian | sl | slv | slv |
Samoan | sm | smo | smo |
Shona | sn | sna | sna |
Sindhi | sd | snd | snd |
Somali | so | som | som |
Sotho, Southern | st | sot | sot |
Spanish | es | spa | spa |
Tosk Albanian | sq | als | als |
Sardinian | sc | srd | srd |
Serbian | sr | srp | srp |
Swazi | ss | ssw | ssw |
Sundanese | su | sun | sun |
Swedish | sv | swe | swe |
Swahili | sw | swh | swh |
Silesian | szl | szl | szl |
Tamil | ta | tam | tam |
Tatar | tt | tat | tat |
Telugu | te | tel | tel |
Tajik | tg | tgk | tgk |
Tagalog | tl | tgl | tgl |
Thai | th | tha | tha |
Tigrinya | ti | tir | tir |
Tamashek | tmh | taq | taq |
Tok Pisin | tpi | tpi | tpi |
Tswana | tn | tsn | tsn |
Tsonga | ts | tso | tso |
Turkmen | tk | tuk | tuk |
Tumbuka | tum | tum | tum |
Turkish | tr | tur | tur |
Twi | tw | twi | twi |
Tamazight | tzm | tzm | tzm |
Uighur | ug | uig | uig |
Ukrainian | uk | ukr | ukr |
Umbundu | umb | umb | umb |
Urdu | ur | urd | urd |
Uzbek | uz | uzn | uzn |
Venetian | vec | vec | vec |
Vietnamese | vi | vie | vie |
Waray | war | war | war |
Wolof | wo | wol | wol |
Xhosa | xh | xho | xho |
Yiddish | yi | ydd | yid |
Yoruba | yo | yor | yor |
Cantonese | zh | yue | yue |
Chinese | zh | zho | chi |
Malay | ms | zsm | zsm |
Zulu | zu | zul | zul |
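To check whether a given language code appears in the table above, the rows can be read into a lookup. The sketch below is only an illustration: the function name is hypothetical, the sample contains just four rows copied from the table, and it accepts a code from any column rather than assuming which ISO standard a column holds.

```python
def parse_rows(table_text):
    """Map every code found in a pipe-separated row to its language name."""
    code_to_name = {}
    for line in table_text.strip().splitlines():
        cells = [cell.strip() for cell in line.split("|")]
        name = cells[0]
        # Empty cells are ignored, so rows with a single code still parse.
        codes = [cell for cell in cells[1:] if cell]
        for code in codes:
            code_to_name.setdefault(code, name)
    return code_to_name


SAMPLE_ROWS = """
Achinese | ace | ||
Afrikaans | af | afr | afr |
English | en | eng | eng |
French | fr | fra | fre |
"""

lookup = parse_rows(SAMPLE_ROWS)
for code in ("en", "fre", "ace", "xx"):
    print(code, "->", lookup.get(code, "not in the sample"))
```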