Motivated by giving meaning
Giving meaning to data. That is the golden thread in the work of Geert-Jan Houben, Professor of Web Information Systems and Pro Vice Rector Magnificus Artificial Intelligence, Data and Digitalisation at Delft University of Technology. As chair of the Data Science Platform Netherlands, his mission is to stimulate fundamental research around data to create systems, techniques and methods to make data science practitioners more effective.
How did you end up in computer science?
‘In high school, I was good at Mathematics, which I also liked a lot. I had already set my mind on studying Mathematics at Eindhoven University of Technology, when my uncle, who worked there at the time, pointed out that they were going to start a new programme in Computer Science. The programme promised to be a combination of logic, reasoning and structure, which immediately appealed to me. So, I decided to enrol.
In fact, my first year at TU/e turned out to be the first year ever that Computer Science was taught as a separate field of study at Dutch universities. It was a lot of fun to be part of that first generation: there were many interactions with the lecturers, and everyone was dedicated to making this a success.
During my studies, I got intrigued by databases and how to use them as efficiently as possible. I performed my final project in a research group that worked on information systems in a broad sense. The necessary translations between the unstructured, real world and the structured world of databases fascinated me. So, I decided to pursue a PhD in this topic.’
When and why did you decide you wanted to become a scientist?
‘For several years, I combined my work in academia with positions in consultancy around information systems. In the end, I realised that if I wanted to make a real career, I would have to choose one or the other. I went for academia, since it provides you with the freedom and responsibility to pursue your own interests and to change course completely every once in a while. In addition, what I like about being a scientist is that you are in constant interaction with younger generations, whom you can guide along their personal career paths.’
What is your research about?
‘Over the years, the golden thread in my research has been semantics: assigning meaning to data. The central question is: How can we best retrieve, process and interpret data generated by humans and (software) machines? I started out working on databases, and that quite naturally evolved towards web-based systems. But essentially, the challenges and research questions haven’t changed: I still make connections between what has been coded in software and how people interpret that. I want to understand what human- and machine-generated web data represents in terms of people’s actions, interests, intents, and behaviours on the web, and to develop new solutions to meet the fundamental challenges in how systems effectively attribute and exploit semantics for data.
In my group at Delft University of Technology, we look both at what data is needed to transfer certain information, and at the demands and interpretations of the users of that data. For example, we looked into MOOCs and how best to support students in navigating them. The main question on the data science side is often whether it is possible to deliver the right data and insights efficiently and at scale. To do that, it is essential to determine what data is needed in the first place, and whether the system is able to learn along the way about the context of usage. For data to be used effectively, meaning and context are key. Especially in the current era, with its proliferation of AI-based solutions, understanding the setting and context in which data is collected and used is of the utmost importance.
Since data science is essentially about the transfer of meaning and insight, we also look at the user side of the equation. For example, we look at chatbots, where we use dialogues to better understand queries and enhance information retrieval. The hardest question there is how we can gain information from the implicit, unspoken part of the dialogue.’
You have been one of the founders and still are the chair of the Data Science Platform Netherlands. What do you aim to achieve with that organisation?
‘DSPN was the first Special Interest Group of IPN. We observed how data science as a separate subdiscipline of computer science was gaining traction. Computer science researchers were curious how the field would develop, and how they should relate to the topic. We launched a joint platform to exchange best practices and viewpoints on this emerging field. It was only logical to do this together with IPN.
Our initial aim was to distinguish the scientific approach from the hype around what was initially called big data, and to unite scientists who are working on data, ranging from databases to data processing, data analysis and information retrieval. That has worked out rather well. People have found each other, and new connections have been made.’
What is on your agenda for the coming years?
‘The demands on data management have changed drastically as a result of the rise of machine learning techniques. Analytics and queries, data processing, data retrieval, and data repositories are now seen in new contexts, and new preconditions apply. At the moment, data science and AI are often mentioned in one breath. It is great what we can do with AI on a technical level, but now the challenge is to understand the role of data in this process. Can people determine where the data underlying an AI model comes from, and how the AI leads to a certain effect? At present, AI is often based on data sets that happened to be available. The question is whether and how AI improves if you base it on better data sets.
Other questions that have become more urgent are whether we can make data more accessible at large scale, and whether we can improve on aspects such as the security of data.
At the moment, an increasing number of people are working with data. But to make the most of it, you need to know what you are doing when it comes to data collection, processing and analytics. That is where data scientists come in. So, my message to the broader computer science community would be: involve us in your plans to profit from our know-how, and let us help you make your data-driven software components and systems better.’
Photo: Sjoerd van der Hucht
