Impact & Opinions | Tionchar & Tuairimí

Data ethics – who cares?
28 April 2021

We know Artificial Intelligence and big data can impact equality, diversity and inclusion, but we must face up to the reality that data now exists at a scale we can no longer fully control.

While new approaches to data science and data analytics have a lot of positive impact, they need to properly reflect the diversity of the world so as not to further amplify existing inequalities.

Data is everywhere: In our pockets, in boardrooms, in court, on the road. More data is produced every day than ever before, and while in many cases it is not clear if any of it will ever be used, more data is also being processed than ever before.

Actually, that’s great. As scientists, we are trained to believe in data, as a cold, objective, un-manipulated, unpolitical thing. It is the raw material we work with. There is nothing outside of data: All has to come from the data and be justified by the data. Sure, we know that no data is ever created perfect, but we can control how it is collected and what is done with it so that those minute imperfections can be safely ignored. Or at least we could.

The thing with data now is that it is not controlled. The scale of it, the speed at which it is created and the environment in which it emerges means that we can no longer treat data as something we carefully collect so it is safe to use.

It has become a liability. Because it is so vast, the small issues of noise, imprecision and bias that we could previously control and integrate into our protocols are now amplified to a level that cannot really be handled.
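To see why scale changes everything, here is a back-of-the-envelope sketch in Python (all numbers invented): a systematic skew worth just 1% of a standard deviation is statistically invisible in a small sample, but at web scale it becomes a "signal" that no algorithm will miss.

```python
from statistics import NormalDist

# Back-of-the-envelope illustration (all numbers invented): a tiny
# systematic skew in how data is collected. In a small sample it is
# indistinguishable from noise; at web scale it dominates.
bias = 0.01  # assumed 1% skew, in units of the data's standard deviation

for n in (100, 10_000, 100_000_000):
    z = bias * n ** 0.5                 # z-score of the skew at sample size n
    p = 2 * (1 - NormalDist().cdf(z))   # two-sided p-value
    print(f"n={n:>11,}  z={z:6.1f}  p={p:.2g}")

# n=        100  z=   0.1  p=0.92   -> pure noise
# n=     10,000  z=   1.0  p=0.32   -> still unremarkable
# n=100,000,000  z= 100.0  p=0      -> p underflows to zero: at this
#                                      scale the skew looks like an
#                                      undeniable pattern
```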

Real-life examples of how this affects what is done with data are everywhere.

The US justice system has been known to use prediction algorithms that are clearly biased against ethnic minorities. Face detection and emotion recognition systems are known to work much better for adult, white, neurotypical men than for anybody else. Even the recent debacle around calculated grades for A Level examinations in the UK is an undeniable example of using data at a scale that makes it uncontrollable.

When those things happen, it is easy to blame the “algorithm”. But the algorithm is really only there to find signals in the data that it can use. Either those signals are there, or they are not. If they are, the data is to blame.

The issue is that we can no longer justify ourselves by saying "garbage in, garbage out". The data is not garbage, but it exists at a scale that makes any minute imbalance, any noise, any assumption made during its collection, and any societal bias it reflects, a very big deal. And this phenomenon is not confined to highly visible cases like the ones above: it affects everyday life.

Try typing “nurse” on Google Image search and count the number of men you can see in the results. It might depend on where you are, but I would bet that there won’t be any in the first 10 results. Now try “surgeon”.

It is hard to blame Google for this. The data is the data, and the data says women are nurses and (white, middle-aged) men are surgeons. We are to blame for this, since it reflects our prejudices. That signal emerged from our behaviour. The algorithm is only amplifying it, but that amplification can have devastating consequences in the long term.

It is a vicious cycle: How can we ever get away from those stereotypes, or ever move to a more diverse society, if those stereotypes are constantly being restated whenever we go online?

We are recommended movies, music or books based on what other people like us have enjoyed, but how can we ever enjoy anything else if we can't even see it?

Our kids are recommended toys, videos and social media accounts based on what others of their gender, age and location bought, watched or followed, but how can they ever grow out of those categories if it is all that is ever available to them?
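To make that amplification concrete, here is a toy simulation (every number in it is invented): a recommender that always promotes the currently most popular item, fed by users who mostly click whatever they are shown.

```python
import random

random.seed(0)

# Toy model of a recommendation feedback loop (all numbers invented).
# Item "A" starts with a minute head start; the recommender always
# promotes the current leader, and users mostly click what they see.
clicks = {"A": 55, "B": 45}  # a 55/45 split: barely an imbalance

for _ in range(10_000):
    shown = max(clicks, key=clicks.get)   # promote the most-clicked item
    other = "B" if shown == "A" else "A"
    # Assumed behaviour: 90% of users click whatever is put in front of them.
    clicks[shown if random.random() < 0.9 else other] += 1

total = sum(clicks.values())
print({item: f"{count / total:.0%}" for item, count in clicks.items()})
# The 55/45 split drifts towards roughly 90/10: the loop has turned a
# small imbalance into the dominant signal.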

Answering those questions is at the forefront of research in data science and AI at the moment, and it has to be done through embedding diversity in research methods.

Data ethics is a societal challenge to be addressed in a multidisciplinary way. It involves devising new methodologies that integrate ethical questions at the time of designing the process that will manipulate the data, i.e. methodologies that effectively ask the question “what’s the worst that can happen?” so we can include safeguards and warnings into those processes. It also requires the use of the very same data science approaches to inspect the data and identify biases that could lead to ethical misuse of those data.
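As a minimal sketch of what "inspecting the data" can look like in practice, assuming a tabular dataset with a protected attribute and a binary outcome (the column names and figures below are hypothetical), one of the simplest checks is to compare outcome rates across groups:

```python
import pandas as pd

# Minimal sketch of a pre-deployment bias audit, assuming a dataset
# with a protected attribute ("gender") and a binary outcome
# ("selected") -- both columns and all values are hypothetical.
df = pd.DataFrame({
    "gender":   ["f", "f", "f", "f", "m", "m", "m", "m"],
    "selected": [0,    1,   0,   0,   1,   1,   0,   1],
})

# Outcome rate per group, and the gap between groups
# (a simple "demographic parity" check).
rates = df.groupby("gender")["selected"].mean()
print(rates)
print("parity gap:", rates.max() - rates.min())
# f: 25%, m: 75% -- a gap this large is a warning to investigate the
# data (and how it was collected) before any model is trained on it.
```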

In a more technical way, part of those challenges is what the area of Explainable Artificial Intelligence (XAI) is trying to address. XAI is about providing an explanation for every decision made by an AI or data analytics process, so the validity of the decision can be verified.

This is, however, a very complicated thing to do: Since those models increasingly rely on thousands of data points, in networks of millions of connections, automatically coming up with an explanation which is both valid and understandable can be a challenge.
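As a generic illustration of the kind of technique XAI builds on (permutation importance here, chosen for simplicity rather than because it is the specific method discussed above), the sketch below scores how much a model's predictions depend on each input feature:

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Illustration of one simple explanation technique: shuffle each input
# feature in turn and measure how much the model's score drops. A big
# drop means the decision leans heavily on that feature.
X, y = make_classification(n_samples=500, n_features=5,
                           n_informative=2, random_state=0)
model = RandomForestClassifier(random_state=0).fit(X, y)

result = permutation_importance(model, X, y, n_repeats=10, random_state=0)
for i, score in enumerate(result.importances_mean):
    print(f"feature {i}: importance {score:.3f}")
# The scores say *which* inputs drive the decision; turning that into
# an explanation a domain expert can actually verify is the hard part.
```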

An explanation has to be grounded. It has to connect what is being explained with known phenomena in the domain. This is why artificial intelligence and data science have to be developed in a multidisciplinary way, by involving experts not only in computer science and statistics, but also the people who understand the domain, the data and the potential impact the use of the data can have.

There has to be someone asking “does it make sense?”

Because it might not, and that might be an issue for Equality, Diversity and Inclusion.

Originally from France, Mathieu D'Aquin is Professor of Informatics specialising in data analytics and semantic technologies at the Data Science Institute and the Insight Centre for Data Analytics at NUI Galway. He is also a member of the Irish Epidemiological Modelling Advisory Group, which provides advice and forecast models to the National Public Health Emergency Team.
