The Way of the Translator Towards AI Annotation
I Wish It Had Been the Rabbit Hole from Alice in Wonderland
A portal into a whole new world. This past February I took a weekend off from everything and everyone, and I went to Montevideo, Uruguay. Quite a lovely weekend, I might add. Of course, I am still craving real time away from stress, and a proper holiday in a dreamy location of my choosing. So I considered that weekend a tiny appetizer. The last couple of years have been tough for me in the health department and, as a consequence, I failed to stay on top of worldwide industry advancements. That is why the lightning-fast rise of AI took me completely by surprise. I should have known better. But I was otherwise engaged, having had two whole parts of my body removed. No details required.
And there I was. Back from the lovely Carrasco area in Montevideo with friends, lovely beaches, palm trees, and English-style country houses… That was the exact moment when I felt pushed off a frightful and endless cliff into an abyss flooded with freezing waters. No warnings. No heads-up. Nothing. On this side of the world, and after being away from my desk for so long, perhaps I should have expected to be this out of the loop.
And this is how my more than 20 years of experience and love for my work as a translator went down the AI-driven drain. Just like that. I had an awful period of 15 days when I went from anger, through despair, sheer hate for the whole new paradigm and everything around it, and flashes of hope, to utter and complete silence. Then I began to rise from my own professional ashes. I am 49, and I still need to pay my bills, help support my family, somewhat enjoy life, and if we get a little crazy, perhaps even save a few bucks now and then.
No way, José. Not AI itself, but hardcore AI enthusiasts kept telling me it was over. They kept telling me it should have been over sooner! The nerve. They still do. Still, I got up with new ideas most days. And I started talking to colleagues about this situation.
We Find and Even Make Our Own Beacons
Then I began reading everything I could find that sounded half intelligent about AI, its uses, applications, main companies, relevant roles, etc. Although I am ready and eager to learn new stuff, I must confess I wouldn’t go as far as starting a new university degree. That is neither my goal nor a priority for me at this stage. Nevertheless, I remain open to adding new skills to my tool belt. As I have always been.
I figured that, if I had to set aside my role as a scientific-literary translator, perhaps I could find another way of working with words in this new scheme of things. I can adapt, I can do many things, I can even start over, but I will not surrender 100% of the things I love doing. I can flex, but I won’t break. Just like bamboo. For many years I have been a willing victim, but a victim nonetheless, of forcing myself to be someone else in order to fit societal norms. Not another day of that. Not on any level.
A Different Journey for Everyone - My Field Notes
So, there I went. First, I updated my resume so that it would reflect my now multifaceted professional profile. I looked into the specialized training currently on offer that would fit my schedule, my goals, and my budget. And I enrolled in the courses I chose. I am, of course, sailing in uncharted waters. But I am loving every minute of it.
I analyzed which AI companies are the most relevant right now. Of course, even this can change really quickly. We are talking about a business in constant movement.
The main actors in today’s AI industry are OpenAI with ChatGPT, Google with Gemini, Anthropic with Claude, Nvidia, and Meta with Llama. I am sure you probably already know that, but the idea here is for you to come with me on the trip that I made to “AI AI Land”.
Basic Terminology to Get My Drift
What is an AI Data Annotator - AI Data Annotators serve as a vital bridge, turning raw, unstructured data into machine-readable information, which is the lifeblood of functional AI models.
Text annotation marks features of a text and labels its semantics, composition, context, purpose, emotion, and other attributes, helping machines recognize human intentions and emotions so that they can understand language accurately.
What are third-party companies - Third-party companies are subcontracted to fulfill a task for the main company. In this scenario, many former translation agencies or even remote work companies are subcontracted to recruit and onboard staff to fulfill different roles related to AI.
What are LLMs - LLMs (large language models) are AI systems used to model and process human language. They are called “large” because these types of models are normally made of hundreds of millions or even billions of parameters that define the model's behavior. The models are pre-trained on a massive corpus of text data.
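To make the idea of text annotation a little more concrete, here is a minimal sketch in Python of what one annotated sentence might look like. Everything in it is an assumption made for illustration: the class names (Span, AnnotatedText), the label names ("complaint", "frustration", "delay", "unanswered_contact"), and the sample sentence are all hypothetical, and real annotation platforms define their own formats and guidelines.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class Span:
    """A labeled stretch of text, stored as character offsets (hypothetical format)."""
    start: int
    end: int
    label: str


@dataclass
class AnnotatedText:
    """One text sample plus the tags a human annotator attached to it."""
    text: str
    intent: str   # what the writer wants, e.g. "complaint"
    emotion: str  # how the writer feels, e.g. "frustration"
    spans: List[Span] = field(default_factory=list)

    def tag(self, fragment: str, label: str) -> None:
        """Label the first occurrence of `fragment`; raises ValueError if it is absent."""
        start = self.text.index(fragment)
        self.spans.append(Span(start, start + len(fragment), label))

    def labeled_fragments(self) -> List[Tuple[str, str]]:
        """Return (label, fragment) pairs so a reviewer can check each tag."""
        return [(s.label, self.text[s.start:s.end]) for s in self.spans]


# Raw, unstructured input: just a sentence.
record = AnnotatedText(
    text="My order arrived two weeks late and nobody answered my emails.",
    intent="complaint",
    emotion="frustration",
)

# The annotator's work: attaching machine-readable labels to parts of the text.
record.tag("two weeks late", "delay")
record.tag("nobody answered my emails", "unanswered_contact")

for label, fragment in record.labeled_fragments():
    print(f"{label}: {fragment}")
```

The sentence itself is the raw data; the intent, the emotion, and the labeled fragments are the machine-readable layer that the annotator adds on top of it.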
High Rollers All The Way
For the most part, AI firms do not recruit data annotators themselves. They outsource this task to companies specialized in data collection, annotation, and other areas. Clearly, they are busy developing new stuff every second of every day.
Based on their own information, these are the current connections between AI companies and their resource suppliers (a.k.a. third-party companies).
OpenAI, the company behind ChatGPT, hires human annotators to train and fine-tune the model. These annotators are typically employed through a combination of in-house staff and third-party contracting firms. Their daily work involves labeling data, reviewing outputs, and providing feedback to improve the model's accuracy, coherence, and safety. OpenAI's primary contracting firm is Scale AI.
Google collaborates with third-party firms to hire annotators who train its AI models, including Gemini. One of the known firms involved in this process is Appen. Appen provides data annotation and other AI training services, which help improve the performance and accuracy of AI models like Gemini through high-quality labeled data.
In 2024, Scale AI is responsible for hiring AI annotators for Anthropic. Scale AI specializes in providing data labeling and annotation services that are critical for training and validating AI models.
This year, companies hiring AI annotators for Nvidia include TELUS International and Appen. TELUS International offers data annotation, transcription, and content moderation, crucial for training AI models used by Nvidia and other tech companies. Appen, a well-known leader in data annotation services, provides high-quality training data essential for machine learning and AI applications.
Meta uses a combination of internal resources and external partnerships for the annotation and fine-tuning of their models.
Dipping My Pen in the Real Ink
While connecting all these dots and checking their websites, I contacted the above-mentioned data annotation companies. Of course, not all of them share the same HR policies, or any other policies, for that matter.
As expected, since this whole revolution really began a couple of years ago, most of the data annotation companies already have annotator teams numbering in the thousands. Clearly, I am much more than a late bloomer in this game.
Nevertheless, for the sake of curiosity and exercise, I did contact all these companies that provide services to the most powerful AI giants. As I imagined, none of them even bothered to reply, not even with an automated rejection message. And I do understand. Really.
As you might have guessed, there are many more companies in the same business. Some are simply less well known, and others are mutating companies that came to serve AI firms from another industry. In the first group I found Outlier. And in the second one, coming from the localization/translation business, we have e2f.
Of course, on many occasions, these kinds of companies do not reveal the AI client for whom they are working. And that is fully understandable as part of their confidentiality policies and agreements with each other. This is the reason why I can neither assume nor infer for which AI company they work.
All That Glitters…
As in every other human trade, sometimes companies begin their journey within one industry and then morph into another market segment, due to countless reasons. The first one is survival and the second one, the power of their will to progress.
After I sent both companies my resume (I had actually worked for e2f as a translator in the past), I received an email from each of them to proceed with the onboarding process.
I had to fill out and sign NDAs, complete my profile in the collaborator portal on their websites, take several tests, pass all of them, and then attend numerous training sessions. In fact, in one of these companies, there are countless training sessions. The reason is that they require annotators to complete training courses for every single new project. Of course, I have always taken additional training sessions for many translation clients in the past. The issue here is that there are too many training sessions and they are too long. When you do start working on the tasks for a project, many of them pay less than half of the full hourly rate, which is rather low to begin with. The main reason is that the company considers you to be in training all the time. Get it?
The actual annotation tasks can be repetitive in some cases, monotonous in others, and quite complicated in still others. There are a few specific tasks which appealed to me a bit more, for instance, writing imaginative prompts for the AI model with several constraints, then reviewing the AI response, and finally offering feedback.
As we all know, humans make different kinds of mistakes along the way. The problem here is that annotation companies have almost no room for error. And whenever you make a mistake, even a tiny one, you might be removed from the annotation team.
Another negative aspect of this kind of work is that, since everything in the AI world is in constant movement at an overwhelming speed, you need to attend feedback sessions every single day of the week. Even on weekends.
A specific requirement applies to this kind of service. The main hiring companies that I mentioned above (Appen, for example) do have a Careers section. The thing is, the annotation process needs their human teams to be located within a specific region, even if they work remotely. This is due to cultural reasons, background, general knowledge, etc.
The Extremely Cheap Elephant in the Room
And finally, we come to a sensitive subject in any service industry: rates. If I am being honest, after researching this market with a focus exclusively on Data Annotation prospects, I found that most companies offer the same rates. They might divide the annotator profiles they seek according to how specialized the subject matter is. Those who, due to their background, are qualified to annotate very specific fields of knowledge might get better rates.
Otherwise, from my very personal viewpoint, the rates they are offering to newcomers to the annotation arena are monstrously low. Almost non-existent. And these tasks do require a lot of attention to detail, reading comprehension, and lots of your extra time in order to fulfill the training and feedback session requirements.
On top of that, most of the time, these companies are not the best in terms of organization either. You are thrown into a Slack channel that floods your inbox with hundreds of cryptic messages that no one responds to. This takes even more of your time, and it is not useful at all. Quite the contrary.
In order to check other points of view, I contacted two colleagues regarding their own experience with the data annotation companies I briefly worked for. They both agreed with me about the lack of organization, the faulty communication with annotator teams, the excessive number of tests, and the feedback meetings that take too much time and jeopardize deadlines. Of course, we all think rates are too low, but one of them considers that it is worth the effort, if you have the time to devote to these tasks.
The Takeaways
After all this reading, research, process, trial and error, and writing, I found the answer to my original question. Is it possible and productive for a professional translator to readjust her/his career course, and begin a new path in Data Annotation?
It all depends on each translator. As simple as that. There are no absolutes here, as in most things in life. I know now that Data Annotation is not the new path I will follow. But I really enjoyed the process of finding this out.
Nevertheless, this option would work well for translators with a different personal profile: newly graduated translators, young professional linguists with few or no financial or family responsibilities, or those looking for a supplementary income because they already have a more convenient main source of income.
The most important feature of the work of a Data Annotator is the chance to make a small contribution to the quality of AI models' responses. But for my particular profile, the cost is too high, and the pay too low. In any case, I highly value the training experience, and all the knowledge I am eager to incorporate from now on.