2023-10-06

Context-Aware Localization of iOS Apps with ChatGPT

L10n-GPT on GitHub

TL;DR: Usually, automated translation of terms used in user interfaces gives poor results, because correct translation depends on the context in which a term or sentence occurs. The approach discussed here uses ChatGPT to generate additional information about the context in which a term is shown in a UI to then create a translation that takes this context into account, as a human translator would to.


Localization describes the process of adapting a software’s UI (user interface) for multiple languages, e.g. making an iOS app available in English and German. As it’s a rather long word, the abbreviation l10n (l, then ten letters, then n) for localization is common as well as i18n as a short form of internationalization.

L10n usually has the problem that especially short sentences or terms, as they occur a lot in user interfaces, lack context to be translated properly. Therefore, localization has long been a task that was fulfilled by humans that are able to look at the app’s UI and know what the purpose of an app is. These context information distinguish them from digital translators like Google Translate or DeepL.

Context-Aware l10n of iOS apps as discussed here, will first extract additional information from the source code that builds the user interface. These information may state that a term that is to be translated, is shown in a list in the settings of an app or that it’s a button’s title in an overlay that asks for the user’s confirmation. This context information can then be used in a next step to provide better translations as compared to translations that are just generated from the term itself.


Starting from an iOS project in Xcode that is not localized, i.e. only available in a single language, the following steps will be performed to automatically localize this app:

  1. Augment the existing source code files with String(localized: .., comment: ..) constructors to not only provide a string but also a comment that provides information about where this term appears in the UI.
  2. Generate a String Catalog in Xcode to provide translations.
  3. Let ChatGPT do the translation, using context information from the comment: ... fields and write the results back to the String Catalog.


I am starting with an existing iOS app that is already available on the AppStore but only in one language, German in this case. For a future release, I want to make the app available in English as well but did not think about localization while developing the app so far. Therefore, I just used normal strings to build the user interface with SwiftUI.

The next section deals with augmenting the swift source code to provide some comments in String constructors that will make translating the string catalog easier. These helping comments are useful for human translators and the proposed ChatGPT-based approach alike.

1 Generating Context Information for Strings

As already stated in the introduction, I want to get ChatGPT to modify the Swift source code of my iOS app so that I don’t use any normal strings to build the user interface but only string initializers that hold comments as well to provide some context when translating the app.

I am interfacing ChatGPT using the OpenAI Python API. You need to create an account with OpenAI and get an API token which you can use in your Python script to authenticate yourself. OpenAI has tutorials on this, therefore I’m not going to explain the process here.

Prompt structure for generating String constructors with comments. For every non-localized file, a new prompt is generated. It contains info about the task and an example of how the localization looks for another file.

The prompt that I gave to ChatGPT consists of the following parts:

  1. An introduction to the task and an explanation of what I want to achieve, i.e. translating an app.
  2. A sample Swift file that is not localized, as well as a version of the same file that has already been rewritten by hand to use String(..) constructors.
  3. A file that has not been localized along with the ask to prepare this file in the same way as seen in the example.

This workflow kind of resembles the training of a machine learning algorithm, where example data is used to show a machine what we want to achieve. The difference here ist, that a single file as an input is enough and we don’t need a huge database of non-localize files and their rewritten counterparts.

I will go over the individual steps in detail and then discuss the results that were achieved.

1.1 Explaining the Task

The first part of the prompt that is given to ChatGPT via the API is the following explanation of the task:

I want you to look for strings in a Swift file and replace them with calls to the `String` initializer with the arguments `localized` and `comment` so that the file can be easier used for localization of an app.
  In the `comment` field of the initializer, include information about where the text will be visible, e.g. in a footer in the user interface, as part of a row in a table, as a heading for the whole page, etc.
  Do not remove any other text from the input file.
  Just output the content of the modified file. Do not add introductory text like "Here is the updated file" or similar.
  If the lines get very long, do not put `String(localized: "...", comment: "...")` in one line, but do it like this:
  ```
  String(
      localized: "...",
      comment: "..."
  )
  ```

  You will be given three files.
  First, an example of a file that has not been adapted with the string constructors.
  Second, the same file, but with the required changes already applied.
  Third, a file without the calls to the `String` initializer where it's your task to make those modifications and to return the whole file modified.

I came up with this prompt after a couple of unsuccessful runs. The false responses only outputted the changed lines, not the full code, left out some comments or similar.

1.2 Example Swift Files

Next, the contents of two Swift files were added to the prompt, each with a small introductory comment as shown in the following structure. The two version of the file I chose were not just arbitrarily selected from all of the file in my Swift project. I specifically chose a file which covers as many edge cases as possible, e.g. long strings for which I want ChatGPT to output the String(..) instantiation over multiple.

First, a file without the modifications:

  <contents of non localized file>

  Now the same file, but with the required changes already applied:

  <content of localized file>

1.3 3. Provide the Unseen File

The last step in generating the prompt is to append the file that I want ChatGPT to process. This was a single Swift file from my project folder. For each Swift file that needs to be converted to use localized strings with comments, one prompt was generated and one API call to ChatGPT was made.

Now the last file, for which you should make those changes and respond with the whole file in updated form.
          
  <content of new file which is not yet localized>

1.4 Prompt Response

First of all, the response quality was strongly dependent on the model that was used. Using GPT 3.5 (gpt-3.5-turbo-16k, the standard 3.5 model’s token limit is too low) gave bad results that were not usable. They had just too many errors in them. However, the results with GPT 4 model (gpt-4-0613 in the OpenAI API) were very good right from the start. I made some small adjustments to my prompt for tuning but very little improvement iterations were needed from my side.

The figure below shows a good example where the model achieved exactly what I wanted it to do.

Positive example of localization performed by ChatGPT. The text that appears in the UI is used in a string constructor and the comment provides context about the strings usage in the UI, i.e. in this case a list in the settings.

However, there were still some few cases in which the model did not output code I could just commit in my project. For example when it tried to localize the empty string, i.e. "". The picture below shows the model’s suggested localization for this case. While this is technically correct, I don’t want to localize the empty string if this is used to judge whether a user typed a valid name.

One case were localization was not needed.

I do see some blame on my side as well, since it would have been cleaner if my code checked for !pinName.strip().isEmpty in the first place.

Note: strip() is an extension for Swift strings which I added to resemble the behavior that is known from the equally named method in Python.

Another file I handed to the model did not have any code related to the user interface in it. That probably tricked the model a bit as there was not a single line to change and it started instantiating strings from comments that documented the code. In another file without UI-elements, it started to localize UserDefaults keys, which is absolutely not what we want since this would cause the app to access different data once the language was changed.

In a file that does not build a user interface, there was nothing to localize. ChatGPT did not like that and localized a comment.

Apart from that, the model was kind enough to correct a few spelling mistakes in comments.

The few mistakes I described here were minor and about the only problems I had. Way more than 100 strings were reliably augmented with comments as requested! I will definitely use this approach again, but the next time only provide files that actually hold some code related to the user interface.

2 Translating the String Catalog

Since my project did not have any form of localization so far, I started with setting up a String Catalog in the project. This process is described in Apple’s documentation.

This brought me to a state where I had a string catalog with two languages in my project, one for German that was already fully populated because it is the app’s default language and another one in English which has only the keys but no English translation yet.

Localization Catalog in Xcode showing the original German terms along with the comments.

In the string catalog, there’s a column for the comments that are given in the String(..) initializers in the code. This comes in handy now, since especially for short strings, it might be context-dependent how they’re translated. In my app, there are multiple buttons labeled with “Abbrechen” (engl.: abort) and the string catalog shows all of the comments from the different occurrences of this word.

Localization Catalog in Xcode showing the new language and only 1% completion after one translation was added by hand. Note the State in the right-most column.

From here on, one could translate this by hand or pass this task to a professional translator. I would like ChatGPT to do this for me as well. Therefore I needed to dig into how the string catalog’s data structure is built so that I could parse it, have ChatGPT do the translations and then augment the existing string catalog with the information I got from ChatGPT.

Prompt structure for generating the translations. It does not only contain the text that is to be translated but also the description of the task, info about the app and the comment that was generated in the previous step.

Why not just use Google Translate or DeepL to do the translation? Because a lot of translations depend on the context in which the text occurs. This is the reason why I wanted ChatGPT to write comments in the first step while he had access to the source code of the app. By doing so, we can now hand the German terms and the context information in the form of comments to ChatGPT so that more qualitative translations are created.

2.1 Xcode String Catalog Format

The string catalog Localizable.xcstrings in Xcode is luckily not a proprietary data structure but a json-formatted file which can easily be parsed and edited with Python. The file starts with an info about the source language and then holds a dictionary called strings, which contains another dictionary for each of the keys (i.e. German terms in my case).

{
  "sourceLanguage" : "de",
  "strings" : {

    "Abbrechen" : {
      "comment" : "Text for the cancel button in alerts.\nButton in the alert box where the user can exit without making any changes.\nButton in alert box to cancel the current operation.",
      "localizations" : {
        "en" : {
          "stringUnit" : {
            "state" : "translated",
            "value" : "Abort"
          }
        }
      }
    },

    "Aktionen" : {
      "comment" : "Tab item name"
    },

    "App Version: " : {
      "comment" : "Prefix for the app version string in the about page."
    },

  },
  "version" : "1.0"
}

2.2 ChatGPT Translation Prompt

Parsing this and providing it to ChatGPT is straight forward. For each term we would like to have a translation for, we will pass the key, e.g. “Aktionen” and the provided comment, which tells us that this string appears in the app’s tab bar.

The prompt that I handed to ChatGPT consisted again of some form of explanatory text, that describes the task and an overview of the data that I will provide. The text I used is given below.

I want you to translate some text from German to English.
  This text will be used to offer an iOS app in different languages.
  The input given to you will consist of three lines for each phrase that needs to be translated.
  First, the phrase in German.
  Second, a comment that describes in which context the phrase is occurring in the application's UI. Make sure that the translation you provide fits this context.
  Thrid, a line starting with "translation: " in which you should add your translation.

  Please return only the lines starting with "translation:" with your added translation after the colon. Do not include the comments in the translations, those are only to add context.

In addition to this description and the comments from the Localizable.xcstrings file, I noticed that some additional context information is necessary. I am translating an app that deals with electricity meters and the German word for that is “Stromzähler.” Oftentimes, this word is abbreviated with “Zähler” and I used this abbreviation in the app’s UI. But this usually translates back to counter which is of course not synonymous with power meter or electricity meter.

That’s why I additionally included the following information in the prompt:

In the data you will be given, the word "Zähler" refers to a digital power meter or an electricity meter. Do not translate it with "counter".

Next, I appended the information from the Localizable.xcstrings file.

key: Gespeicherte Pins verwalten
  comment: Text of the navigation link to manage pins in the about page.
  translation:

For each prompt, I used 30 of those statements and left the translation empty, while using the key and comment values that were parsed from Localizable.xcstrings. The prompts were passed to OpenAI’s gpt-3.5-turbo model.

2.3 Handling the Translated Response

ChatGPT’s response was a new line separated string where each line started with translation: and held the context-aware translation for the German text. I just took this data and wrote it back to the Localizable.xcstrings file while making sure to stick to the structure that Xcode expects.

translation: Manage Saved Pins

And that’s it! This is how arrived at the green check mark in Xcode using context-aware translation of my user interface.

App fully localized in English and German

The script I came up with allows reading a Localizable.xcstrings file that is already partly localized. If the app grows and new terms and sentences are introduced, the script can be re-run and only provide the translation for new terms. Xcode will automatically add new strings from the Swift code to Localizable.xcstrings when the app is build.

3 Current Limitations

One limitation to the system is the missing localization of words that can occur in a plural form. This, however, is at the moment just a shortcoming of my implementation and if the discussed scripts were adjusted accordingly, ChatGPT could assist with this as well.

Another problem is the additional context information that I had to provide during the translation step. While the information that described the user interface were used by ChatGPT to generate translations that are correct in the app’s context, I still needed to provide some additional info that told ChatGPT to not translate “Zähler” with “counter.” I see two possible steps to mitigate this problem:

  1. Provide a detailed description of what the app is doing as part of the query.
  2. Use the more advanced GPT-4 model for the translation step as well as I found it to be more capable in understanding complex relationships.

4 Outlook

Using the context information that ChatGPT extracted from the source code to provide translations that respect how the user interface is structured, is something that is only possible with large language models like ChatGPT and was not previously possible with networks that would only do translation from one language to another.

I was surprised by just how good the results were that ChatGPT created and I ended up using all of the generated translations in the app without the need to make any modifications.