Group portrait UvA Language Technology Lab

Language technology for the social good

The University of Amsterdam’s Language Technology Lab focuses on text generation for machine translation, summarisation and question answering, with a keen eye for user control and multilingualism.

Since the launch of ChatGPT in late 2022 and its subsequent spectacularly rapid adoption, language technology has attracted more attention than ever before. ‘Nowadays, one vacancy attracts more than a hundred applicants,’ says professor Christof Monz, leader of the Language Technology Lab at the University of Amsterdam’s Informatics Institute. Despite the seemingly impressive performance of ChatGPT, both language processing and generation are far from being solved problems. As everyone has been able to experience by now, ChatGPT writes falsehoods with great conviction, writes impersonal, cliched and sometimes even harmful text, offers little control over the text and works only for major languages. Monz’s lab is working to improve some of these weaknesses.

‘In general, our lab focuses on the generation of text in the form of machine translation, summarisation, question answering, and control of the generated text’, says Monz. ‘As for the latter, we want to give users more control over aspects such as quality, formality and toxicity. Does a sentence flow well? Is the content appropriate? Should you translate the English ‘you’ into Dutch as ‘jij’ or as ‘u’? That depends on the context. How do you translate slang? How do you avoid discriminatory and other harmful language?’

For machine translation, Monz’s lab is focusing on smaller languages, for which the well-known translation engines such as Google Translate or DeepL do not work well, if at all. Monz: ‘Automatic translation from Bengali to Swahili, for example, is currently dramatically poor. Because we value inclusiveness, providing language technology for smaller languages is important too. Therefore, we are developing techniques that are able to translate languages for which little or no data exist.’ This is also the focus of Monz’s ongoing NWO Vici project.

One of the interesting applications of the lab’s work is the translation of documents from the City of Amsterdam for various minorities in the city, work done with a larger consortium called Language Sciences for Social Good. Monz: ‘Translating such documents happened a lot during the COVID-19 pandemic. With official city documents, quality and accuracy are, of course, extra important. Some citizens also need the text in official documents to be presented in a simplified manner. That’s something we want to work on in the coming years.’

Controllability by design

Assistant professor Vlad Niculae joined the Language Technology Lab when it was established in 2020. ‘I was really excited about the opportunity to shape a new group’s direction,’ says Niculae. Whereas Monz has a background in linguistics, Niculae comes from computer science. Niculae: ‘Christof told me that he was looking for somebody as different to him as possible but with the same values and the same drive. We both aim for a deep understanding of language technology problems but take different approaches. I am looking more for generalisations and finding mathematical answers is what gets me excited.’

In October 2022, Niculae started working on the NWO Veni project ‘Intelligent interactive natural language systems you can trust and control’. Niculae: ‘In this project, I propose a redesign of the dominant paradigm that currently underlies language generation systems like ChatGPT. I argue that in that paradigm, you cannot build in controllability and that we need a new paradigm that includes controllability by design. To give an example: one of my students is working on the generation of subtitles. That is not just about automatically recognising audio but also about the timing of the subtitle, and about the maximum number of words before the subtitle becomes unreadable. These are some of the parameters that you want to control. Every application domain has its own specific control parameters.’

Improving dialogue

Kata Naszádi worked for four years at Amazon on the automatic speech recognition system for its Alexa personal assistant before starting her PhD in 2020 at the Language Technology Lab. In recent years, the number of PhD students in the lab has grown to thirteen. ‘What is special in this group is that we are a foodie team’, says Naszádi. ‘There is this stereotype that PhD students go for the cheapest food options, but we actually like to go to really good restaurants together. An Iranian PhD student took us to an Iranian restaurant, a Chinese student took us to a Chinese restaurant. As a Hungarian, I cooked a Hungarian meal for the group.’

Naszádi’s PhD research is part of the Gravitation programme Hybrid Intelligence. ‘I am trying to improve the dialogue between a human and an artificial agent in which they have to achieve some goal together’, she says. ‘I use a virtual environment based on the game Minecraft, in which an agent needs to build things and the human gives directions on what to build. They use natural language in order to coordinate their actions and understand each other better.’

She also collaborates with researchers from TU Delft and Erasmus MC on developing a dialogue system that allows a microsurgeon to communicate with the tiny camera used during surgery on, for example, blood vessels. Naszádi: ‘We want to make the conversation during the surgery flow naturally, so that the surgeon can tell the camera things like “go a bit more to the left” or “a bit more down”. That would give the surgeon a better view and thus improve the quality of the surgery.’

Group passport – Language Technology Lab

Research fields: natural language processing, machine translation, summarisation, question answering, language modelling, image captioning
Institution: Informatics Institute of the University of Amsterdam (UvA)

By Bennie Mols
Images Ivar Pel