Translation Impact: GPT-4 vs. GPT-3.5
OpenAI just released GPT-4, an upgraded version of its state-of-the-art generative AI model.
According to ChatGPT itself, the difference between the models is:
In plain language, it’s bigger, better and meaner. What does this mean for translation and localization? According to OpenAI's research:
In simple terms: Imagine you are really good at taking tests in school. You have always done well on tests in English, but you want to see if you can do well on tests in other languages too. So, you take a big test with 14,000 questions in many different subjects like math, science, and history.
To make it even harder, you take the test in different languages using a special tool that translates the questions into other languages for you. You try it in 26 different languages.
Surprisingly, in 24 of those 26 languages you score higher than the previous top students scored on the original English version of the test. You even do really well in languages that aren't used very much, like Latvian, Welsh, and Swahili.
That's kind of what happened when they tested a computer program called GPT-4. It's really good at understanding questions and answering them, even when they are in different languages. It did better than other similar programs like GPT-3.5 and Chinchilla, even for the harder languages.
Let’s compare a few real-world examples in a few languages.
A quick analysis of this limited sample of 10 idiomatic sentences in US English reveals no major differences between the output of GPT-3 and GPT-4. This demonstrates that:
GPT-3 was already quite impressive to begin with.
One must pay close attention to details in order to discern the differences between GPT-3 and GPT-4.
Looking at the subtle differences between the two, GPT-4 was significantly better than GPT-3 in one case and notably worse in another. In most cases, its output was either identical or slightly better.
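Because the differences are so subtle, spotting them by eye is error-prone. A word-level diff can help. The following is only a minimal sketch using Python's standard `difflib` module. The function names `word_diff` and `similarity` and the two sample strings are hypothetical illustrations, not actual output from either model.

```python
import difflib


def word_diff(a: str, b: str) -> list[tuple[str, str, str]]:
    """Return (operation, words_from_a, words_from_b) for each span that differs."""
    a_words, b_words = a.split(), b.split()
    matcher = difflib.SequenceMatcher(a=a_words, b=b_words, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":  # keep only replaced, inserted, or deleted spans
            changes.append((op, " ".join(a_words[i1:i2]), " ".join(b_words[j1:j2])))
    return changes


def similarity(a: str, b: str) -> float:
    """Word-level similarity ratio between 0.0 (nothing shared) and 1.0 (identical)."""
    return difflib.SequenceMatcher(a=a.split(), b=b.split(), autojunk=False).ratio()


if __name__ == "__main__":
    # Made-up placeholder translations, NOT real model output.
    older_model = "il pleut des cordes"
    newer_model = "il pleut à verse"
    for op, old, new in word_diff(older_model, newer_model):
        print(f"{op}: '{old}' -> '{new}'")
    print(f"similarity: {similarity(older_model, newer_model):.2f}")
```

A diff like this only shows where two translations diverge. Judging which version is better still requires a human reviewer who knows the target language.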
The concepts that applied to GPT-3 hold even truer for GPT-4. With more parameters and data, prompts become increasingly important. The slightest variation in the wording of a prompt can yield substantially different results. As the AI model becomes more refined, greater precision is required from the user.
Regarding other languages mentioned in OpenAI's research, we will soon conduct our own investigation into their performance and return with findings about translation quality and opportunities in the context of GPT-3 vs. GPT-4.