
Open Source Data Science: How to Reduce Bias in AI

The following is an article written by Abby Seneor, Chief Technology Officer, and Matteo Mezzanotte, Head of Communications. This article was published on the 14th of October, 2022 on the World Economic Forum Blog.

  • Bias is an inherent human trait and can be reflected and embedded in everything we create, particularly when it comes to technology.
  • Open-source data science could help address the issue of bias in the development of artificial intelligence (AI).
  • Taking an open and collaborative approach to data science can pave the way for a fairer and more equitable world by reducing bias in AI.

In his book Don’t Think of an Elephant, the linguist and philosopher George Lakoff argues that: “Frames are mental structures that shape the way we see the world. As a result they shape the goals we seek, the plans we make, the way we act, and what counts as a good or bad outcome of our actions.”

In this light, biases can act as an inherent frame for human beings and are present in real life. Here are some examples:

  • A recent study identified bias in the dataset of pulse oximetry sensors, which estimate the amount of oxygen in a person’s blood. In Black patients, the sensors did not accurately detect low blood oxygenation, leading to a potentially increased risk of hypoxemia – a below-normal level of oxygen. Indeed, the study showed that Black patients had nearly three times the frequency of occult hypoxemia not detected by pulse oximetry as White patients.
  • In a novel loan application experiment conducted by the World Bank in 77 Turkish banks, 35% of the loan officers were biased against female applicants, with women receiving loan amounts $14,000 lower on average than men.

The “nature vs nurture” argument is an age-old debate, especially when it comes to the subject of human bias. And while the answer is not simple, one thing is for certain: we are all biased and we embed those very same biases in almost everything we create.

Bias can even be found in technology, specifically, in artificial intelligence (AI). Look at what DALL-E mini gives you when asked to represent a “painting of a CEO founding a start-up in Europe” and count how many women you see.

AI-generated images for the prompt “painting of a CEO founding a start-up in Europe” highlight bias in AI: none of the images depicts a woman as the CEO. Image: DALL·E mini

Lakoff continues: “To change our frames is to change all of this. Reframing is social change.”

Following the frame-bias model, addressing bias in AI is therefore key to changing society via a new, ethical and responsible approach to technology. It is then necessary to point out the big ‘elephant in the room’ – one that has always been there, though ignored – and deal with it.

Artificial intelligence and bias

Bias in AI is when the machine gives consistently different outputs for one group of people compared to another, as outlined in the blood oxygenation example above.

Typically, these biased outputs follow classic lines of societal bias such as race, gender, biological sex, nationality or age.
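
One way to make this definition concrete is to compare an error rate across groups. The sketch below is illustrative only: the function name `miss_rate_by_group`, the record layout and the toy numbers are assumptions made for this example, not data from the studies mentioned above.

```python
from collections import defaultdict


def miss_rate_by_group(records):
    """Return, per group, the share of truly positive cases the system missed.

    Each record is a (group, actual, predicted) tuple with boolean labels.
    """
    positives = defaultdict(int)  # truly positive cases per group
    missed = defaultdict(int)     # positives the system failed to flag
    for group, actual, predicted in records:
        if actual:
            positives[group] += 1
            if not predicted:
                missed[group] += 1
    return {group: missed[group] / positives[group] for group in positives}


# Made-up toy records: (group, condition present, condition detected)
records = (
    [("A", True, True)] * 9 + [("A", True, False)] * 1
    + [("B", True, True)] * 7 + [("B", True, False)] * 3
)

rates = miss_rate_by_group(records)
print(rates)
```

In this toy data the system misses positive cases in group B three times as often as in group A. A consistent gap of that kind between groups is what the definition above describes.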

Bias in AI algorithms can emanate from unrepresentative or incomplete training data or the reliance on flawed information that reflects historical inequalities. If left unchecked, biased algorithms can lead to decisions which can have a collective, disparate impact on certain groups of people even without the programmer’s intention to discriminate.
— Nicol Turner Lee, Paul Resnick, and Genie Barton, writing for the Brookings Institution

Facial recognition and racial biases in health care are two classic examples. The data used to train these systems lack examples of people with dark skin colour, so the systems do a poor job of recognizing people of colour. If you put bad and poor data in, you get bad and poor data out.

The human factor relating to bias in AI

When we talk about AI, we are talking about people. Humans design AI algorithms, and humans are still the main beneficiaries of all the AI applications we use on a daily basis.

This explains why we should start considering biased AI not just as a technology problem, but rather as a human problem and thereby adopt a new perspective.

When it comes to algorithmic bias, there are two main issues that must be addressed: the data and the definition of success. Is the available data complete? Is it representative of all people? If that is not the case, the prediction made by the algorithm will inevitably be biased.
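
The representativeness question can be checked roughly by comparing each group's share of a dataset with its share of a reference population. This is a minimal sketch under stated assumptions: `representation_gaps`, the group labels and the reference shares are all hypothetical and chosen only for illustration.

```python
from collections import Counter


def representation_gaps(sample_groups, reference_shares):
    """Compare each group's share of a dataset with a reference share.

    Returns {group: dataset_share - reference_share}. Negative values
    mean the group is under-represented relative to the reference.
    """
    total = len(sample_groups)
    if total == 0:
        raise ValueError("sample_groups must not be empty")
    counts = Counter(sample_groups)
    return {
        group: counts.get(group, 0) / total - share
        for group, share in reference_shares.items()
    }


# Made-up toy dataset and hypothetical reference population shares
sample = ["A"] * 80 + ["B"] * 15 + ["C"] * 5
reference = {"A": 0.60, "B": 0.25, "C": 0.15}

gaps = representation_gaps(sample, reference)
under_represented = sorted(g for g, gap in gaps.items() if gap < 0)
print(under_represented)
```

A check like this only covers the first issue, the data. It says nothing about whether the definition of success used to train the model is itself fair.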

But in fact, the real issue is that we, as humans and as the ones responsible for designing AI systems, are naturally and unconsciously biased. And debiasing humans is harder than debiasing AI systems.

Openness and collaboration are the answer

The answer to this might be open source data science (OSDS), which is maturing and changing the field of data science. Just as open source software has revolutionized the world of software, opening the code to a community of developers, OSDS can do the same in the field of data science by opening the models and the data they have been trained on.

Hugging Face is a good example of a company built around openness and collaboration. After open-sourcing the model behind its technology – a chatbot – it became a platform for democratizing machine learning.

But what if OSDS could be used, apart from collaboration, talent attraction, and openness, for mitigating bias?

How open source data science can reduce bias in AI

As an example, let’s consider two companies using face recognition for two different purposes: one for disease diagnosis and the other for fashion recommendation.

These companies would definitely not be competitors. Yet, today, both are most likely developing a crucial part of their system – face recognition – in parallel. This means duplicate work and a waste of data scientists' time and money.

In an OSDS world, both companies would work on an open face detection project, contributing code and data to help battle edge cases and create a more robust solution. By doing so, they could dedicate fewer data scientists to the task compared to the current state, and have them focus on tougher, more critical problems.

Data scientists wanting to learn, contribute and showcase their skills to improve their portfolio would jump in to identify bugs and inefficiencies, and to create alternative models that prioritize various metrics for different use cases, as seen in the Ethical AI Community project, created to support, or in other initiatives such as Open Future.

Collaboration key to a more ethically-driven AI

Bias and fairness are an emerging issue in the field of AI, and both the recognition of their importance and the interest in solving them are growing. OSDS can be a major part of the solution to this worrisome phenomenon.

For example, users can access datasets and find data bugs, like missing data for underrepresented minorities. Then, collaborators can help "fix" the dataset by submitting appropriate data points, or even start crowdsourcing campaigns to contribute to them and remove the extant biases in the dataset.
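
As a rough illustration of how such a data bug might be spotted in an open dataset, the sketch below measures how often a field is missing for each group. The helper `missing_share_by_group`, the field names and the rows are hypothetical and invented for this example.

```python
def missing_share_by_group(rows, group_key, field):
    """Return, per group, the share of rows where `field` is absent or None."""
    totals = {}
    missing = {}
    for row in rows:
        group = row[group_key]
        totals[group] = totals.get(group, 0) + 1
        if row.get(field) is None:
            missing[group] = missing.get(group, 0) + 1
    return {group: missing.get(group, 0) / totals[group] for group in totals}


# Made-up toy rows; "income" is missing for half of group B
rows = [
    {"group": "A", "income": 52000},
    {"group": "A", "income": 48000},
    {"group": "B", "income": None},
    {"group": "B", "income": 39000},
]

shares = missing_share_by_group(rows, "group", "income")
print(shares)
```

A group whose share of missing values is much higher than the others is a natural candidate for the kind of community fix or crowdsourcing campaign described above.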

In brief, the benefits of OSDS are:

  • Making high-quality resources accessible to the broader community
  • Solving the problem of reproducibility, by creating reliable open source data science projects for better results, fairer models, and more reliable performance
  • More efficient use of data scientists across projects
  • Guaranteeing a positive impact on AI transparency, diversity and inclusion

Just as Cubism was a revolutionary approach to representing reality, showing different views of a subject together in the same picture, OSDS brings together different perspectives, angles, and points of view of the same reality at the same time, for a more representative and inclusive model.

Through openness and collaboration, open source data science, while far from being the only answer for a more ethically-driven AI, can help reduce bias and bring more fairness and equity to the world.
