ICT with Industry 2019 will be held from January 21 to 25 in the Lorentz centre in Leiden.
The call for participation has been published here.
The preliminary programme can be found here.
We have selected five cases for this edition of ICTwI, which are detailed below.
KB (National Library of the Netherlands)
(Semi-) Automatic Cataloguing of Textual Cultural Heritage Objects
Juliette Lonij and Martijn Kleppe
In collaboration with Iris Hendrickx, Radboud University
The KB | National Library of the Netherlands has been digitizing its collections at a rapid pace for a number of years now. Large amounts of scans and machine-readable text created from e.g. historical newspapers, periodicals and books are made available to end users through portals such as Delpher. At the same time, the amount of content deposited by publishers or harvested from the web in digital form, such as e-books, e-journals, and web pages, is growing quickly as well.
Rich and accurate descriptive metadata, ranging from title and author on the one hand to specialist scientific subject headings on the other, form an essential prerequisite for enabling users to effectively navigate these collections. The current practice of creating such metadata manually, however, has become prohibitively time-consuming and, in some cases, prone to error. We therefore invite researchers to explore possibilities for automatically extracting relevant metadata from the objects in our digitized and born digital collections, using methods and techniques from the field of Artificial Intelligence – the subfields of Machine Learning and Natural Language Processing in particular.
Detecting and classifying damage to traffic signs from images
Jeroen Delcour and Frank Thomson
In collaboration with Efstratios Gavves, University of Amsterdam
At 139,294 km, the Dutch road network is one of the world’s densest. Maintaining such a large network, which includes not only roads but also surrounding infrastructure such as traffic signs, can be very time-consuming and expensive if not done efficiently. In the case of traffic signs, video is recorded from a car, along with the camera’s position in the world. Trained inspectors look through these videos, annotating any damage to the traffic signs they see. Needless to say, this is a very arduous task. Additionally, there is a constant shortage of trained inspectors.
The goal is to automate the detection of damaged traffic signs from video. It is a challenging case since it involves a wide range of possible types of damage. The signs could be poorly legible, e.g. obscured by vegetation, covered in stickers, or faded over time. Additionally, some types of damage aren’t visible on the sign itself, but involve the angle of the sign relative to the road it’s associated with (i.e. it should face the road) or even the angle of its supporting pole (such as after a collision with a car). Perhaps most challenging to detect is deformation of the sign, often as a result of fireworks.
We believe reliably detecting this wide range of damage in a real-world scenario to be a challenge involving a number of sub-disciplines. Most notable are computer vision and machine learning, but we are open to creative approaches from other fields. As such, the lessons learnt from this challenge may be beneficial to an equally large number of sub-disciplines and could provide inspiration for further research into solving several computer vision tasks, such as classification problems where classes are very similar or their differences are poorly defined, and orientation estimation both in 2D and 3D.
An interesting consideration is the practical application of the result: the goal is to reduce the amount of video trained inspectors have to look at. As such, it is more important to achieve high recall than high precision. This provides a twist to the challenge and sets it apart from traditional image classification tasks common in academia.
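To illustrate the recall-over-precision trade-off described above, here is a minimal sketch, assuming a hypothetical detector that outputs a damage score per video frame. It picks the highest score threshold that still meets a target recall, so that inspectors review as few frames as possible without missing damaged signs. The function name, scores, and labels are made up for illustration and are not part of the case description.

```python
from typing import List, Tuple


def threshold_for_recall(scores: List[float], labels: List[int],
                         target_recall: float) -> Tuple[float, float, float]:
    """Return (threshold, recall, precision) for the highest threshold
    whose recall is at least target_recall.

    scores: hypothetical damage scores in [0, 1], one per frame.
    labels: 1 if the frame shows a damaged sign, else 0.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one damaged example")
    # Try candidate thresholds from high to low; recall can only grow.
    for t in sorted(set(scores), reverse=True):
        flagged = [y for s, y in zip(scores, labels) if s >= t]
        tp = sum(flagged)
        recall = tp / total_pos
        if recall >= target_recall:
            precision = tp / len(flagged)
            return t, recall, precision
    # Only reached when target_recall is greater than 1.
    raise ValueError("target recall not reachable")


# Illustrative data only: 3 damaged frames out of 8.
scores = [0.95, 0.80, 0.70, 0.60, 0.40, 0.30, 0.20, 0.10]
labels = [1, 0, 1, 0, 0, 1, 0, 0]
threshold, recall, precision = threshold_for_recall(scores, labels, 1.0)
print(threshold, recall, precision)
```

With these made-up numbers, full recall requires a threshold of 0.30, so six of the eight frames would still be flagged for review, at a precision of 0.5. Lowering the target recall would shrink the review load at the cost of missed damage.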
Opening the black box of user profiles in content-based recommender systems
David Graus, Dung Manh Chu, Maya Sappelli, Bahadir Cambel, Philippe Bressers
In collaboration with Nava Tintarev, TU Delft
Personalized experiences powered by recommender systems have, after years of being a mostly academic endeavor, finally permeated our daily lives, whether through personalized recommendations in web shops (e.g., Amazon), personalized media consumption (e.g., Spotify, Netflix), search engines, or virtual assistants (e.g., Apple Siri, Amazon Echo). However, driven by data breach incidents and ad-driven business models, we’ve recently seen a rise in distrust and skepticism around the collection of personal data, which is a requirement for recommender systems. In addition, the GDPR has generated an increased interest in aspects of explainability and transparency of black-box machine learning algorithms and models.
In the wake of the aforementioned developments, explaining recommendations is a task that is generating a lot of interest in the recommender systems community. Typically, (content-based) recommender systems operate by matching item-features (i.e., attributes of items such as topics, authors, tags) to a user profile, which is in turn constructed from the aggregated item-features of the items consumed by a user. There are multiple purposes for explaining recommendations. Earlier efforts focused on explanations for transparency (i.e., explaining how the system works), e.g., by exposing structured representations of user profiles, whereas more recent efforts focus on effectiveness (i.e., helping users to make decisions), e.g., by showing a user why an item matches their profile. Further criteria for designing recommendation explanations are scrutability (allowing users to tell the system it is wrong) and trust (increasing a user’s confidence in the system).
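The matching step described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the recommender system used in this case: the feature strings, the count-based aggregation, the cosine similarity, and the function names build_profile, score, and explain are all hypothetical choices made for the example.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List, Tuple


def build_profile(consumed: List[List[str]]) -> Dict[str, float]:
    """Aggregate the item-features of consumed items into a user profile
    (relative frequency of each feature)."""
    counts = Counter(f for item in consumed for f in item)
    total = sum(counts.values())
    return {f: c / total for f, c in counts.items()}


def score(profile: Dict[str, float], item_features: List[str]) -> float:
    """Cosine similarity between the profile and a binary item vector."""
    feats = set(item_features)
    if not profile or not feats:
        return 0.0
    dot = sum(w for f, w in profile.items() if f in feats)
    norm_p = sqrt(sum(w * w for w in profile.values()))
    norm_i = sqrt(len(feats))
    return dot / (norm_p * norm_i)


def explain(profile: Dict[str, float], item_features: List[str],
            k: int = 3) -> List[Tuple[str, float]]:
    """Return up to k shared features with the largest profile weight,
    i.e. a simple answer to 'why does this item match my profile?'."""
    feats = set(item_features)
    shared = [(f, w) for f, w in profile.items() if f in feats]
    return sorted(shared, key=lambda fw: (-fw[1], fw[0]))[:k]


# Hypothetical reading history: each item is a list of feature strings.
history = [
    ["topic:soccer", "tag:ajax"],
    ["topic:soccer", "tag:eredivisie"],
    ["topic:politics", "tag:election"],
]
profile = build_profile(history)
candidate = ["topic:soccer", "tag:transfer"]
match = score(profile, candidate)
reasons = explain(profile, candidate)
print(match, reasons)
```

Here the explanation is simply the list of features the item shares with the profile, ordered by profile weight. A real system would likely use weighted or learned features, but the same structure (profile, match score, contributing features) is what an explanation for effectiveness would expose to the user.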
In this project, we aim to explore methods of explaining one aspect of how our content-based recommender system works: the user profile. More specifically, we aim to automatically summarize and visualize the recommender system’s high-dimensional internal representations of users. These profiles are automatically constructed from their reading behavior, by leveraging attributes of items, e.g., topics, entities, and tags. Effectively summarizing and visualizing the recommender system’s internal user profiles may find applications in increasing or enabling:
Urgent, or can it wait? Personalising push for Algemeen Dagblad
In collaboration with Menno van Zaanen, Tilburg University
Every day, the Algemeen Dagblad (AD) produces over 500 news articles. When authors finish writing an article, they typically publish it online, through websites and apps. Currently, that is also when a team of editors decides whether the article should be pushed to our users. These editors have a tough decision to make, for at least two reasons.
First, our AD app users are currently subscribed to very broad categories for which they wish to receive push notifications. For instance, users can express that they want to receive push notifications for “Sport”. So any time an article about “Formula 1” gets published, our editors have to decide whether this would be interesting for all our “Sport” subscribers, many of whom are subscribed to “Sport” because they are interested in “Soccer” instead. We are solving this by inferring much more fine-grained user preferences from user behaviour and by recommending to users only articles that match these inferred preferences.
Second, articles are push-worthy when they are urgent and only relevant for a short period of time. For users who are interested in “Soccer”, the outcome of a match of their favorite soccer club may be such an article. Other articles that very closely match users’ interests may not be push-worthy simply because they are not urgent. Examples of these are food recipes or travel stories. Deciding whether something is urgent or whether it can wait is a hard and unsolved problem that we want to tackle in this challenge.
Concretely, the challenge is this: given an article (with its metadata), predict whether it is push-worthy. We expect this to be based on 1) the predicted urgency, 2) the predicted expiry date of the article, and 3) the predicted popularity in general.
Part of the task is figuring out whether push-worthiness is dependent on the user (and their interests) as well, and/or whether it can be determined independently of the user (and their interests).
We expect this to be a supervised classification machine learning task to which the ICT community would very naturally contribute.
Our articles have the following metadata: named entities, IPTC tags, linked entities, locations, authors, location in printed newspaper, sentiment, readability, etc. Additionally, we will have the plain text available. NLP experience is not strictly required, but may be useful for implementing additional NLP enrichments that would serve as good predictors for the task at hand.
Machine learning experience, on the other hand, would certainly be relevant for this task. Experience with deep learning on text might be helpful. We will be able to experiment with different classifiers in an online experimentation setting at the Algemeen Dagblad.
Captioning News Footage
In collaboration with Thomas Mensink, University of Amsterdam
The RTL Nieuws team produces hours of video content every day. A typical eight o’clock broadcast contains around ten edited clips, and with hourly broadcasts and online shorts this adds up to more than 30 edited clips every day. These are each compiled from multiple sources of raw footage, both from newswires and from our own reporters.
All raw footage and edited clips are manually archived for future reuse. RTL Nieuws’ documentation department spends over 50 hours every week on adding annotations and metadata to all this footage. On the other end, editors search the archive for relevant footage around current news events.
We believe that with the right ICT solution we can make this a less labor-intensive process and potentially even increase the retrievability of the content. For this, we need a deep understanding of the video content and how to automatically generate (textual) metadata from this. We are looking for experts in computer vision and natural language generation to join our team.
In a first pilot project, we have found that current cloud solutions such as Google Cloud Video Intelligence and Microsoft Video Indexer can be leveraged to provide additional metadata tagging based on audiovisual content. However, these do not reach the level of visual detail that our archivists deliver and that our editors are using to find the context.
We are seeking to close this gap in vocabulary between the searching editors and the automatically generated metadata. A solution might be found in automatically generated video captions. As part of the ACM MM Challenge in 2016 and 2017, Microsoft Research ran the Video to Language challenge on their Video to Text (MSR-VTT) dataset, a large-scale video benchmark for bridging video and language. The dataset contains about 50 hours and 260K clip-sentence pairs in total, covering the most comprehensive categories and diverse visual content, and representing the largest dataset in terms of sentence and vocabulary. Given an input video clip, the goal is to automatically generate a complete and natural sentence to describe video content, ideally encapsulating its most informative dynamics.