The ongoing pandemic has reinforced the importance of data in making informed decisions. Data, however, comes in many forms. Very broadly, we distinguish between structured and unstructured data. Structured data comprises numerical data, such as operational figures in hospitals, weather-related measurements, financial reporting, or any other domain where numbers play a role. Unstructured data comprises data derived from text, speech, images, or any other form of human communication.
Increased interest in Artificial Intelligence (AI) derives largely from the growing success over recent decades in automatically analysing such unstructured data with machine learning techniques. The field of Natural Language Processing (NLP) has developed increasingly powerful methods to automatically analyse large collections of unstructured textual data, originating from a wide variety of sources such as news reports, patient health records, scientific articles, enterprise documents, emails, and social media.
NLP is, in fact, one of the oldest branches of AI research. It started back in the 1950s, first in the USA, with a specific interest in automatic methods for translating Russian documents into English. The first international scientific meetings of NLP researchers took place in the early 1960s. Since then, NLP has grown into a broad field of research that has contributed to the development of everyday household and business tools such as search engines, chatbots, machine translation services, and sentiment analysis and opinion mining tools, among many others.
At the Data Science Institute, we are developing innovative approaches to improve such applications even further, as well as creating completely new ones. For instance, in the context of the EU-funded Pandem-2 project, we are developing methods for the automatic extraction of suggestions from social media to improve two-way communication between government agencies and the public.