>> Ideas, Problems, and Resistance Methods Surrounding Data Biases


Intentions, Experiences, and Encounters with Data

This piece of writing results from four-ish years of working, researching academically, teaching, organizing, and sharing my knowledge about data and computational technologies. It is essential to situate myself and this work within a larger context, as all my experiences and views came from somewhere or someone. This is for my family and friends and for whoever comes across it. Data is not just for computer scientists to think about: your data is something you should have control over, and it is important to talk about. I hope this helps us have more conversations.

My interest in data started in 2010, when I worked part-time in a data entry position. The repetition and monotony of data entry created a peaceful place for me. Mathematics (one of my undergraduate majors) is about order and patterns, and I found peace there too, because I understood numbers and data to be disconnected from my own life.

In my first experience working as a data analyst, I was tasked with analyzing environmental data and creating a more consistent way of collecting it. Carla, an Environmental Specialist and my first supervisor as a data analyst, made sure that I knew both the data I was working with and the people who created it. I was in the Environmental Protection Department, and I did not understand why I needed to meet with people to work with environmental data: the data should be neutral, there should be no bias, it’s not about people, right? I was working with data of all kinds, and before I could access any of it, I was sent to meet the researchers who had collected it. Looking at the measurement tools and where they were placed around the plant showed me that data is not just numbers and could never be neutral. Somebody decided it was important to measure the temperature at the plant. Somebody also decided to measure it with one specific device, made in the 1960s, with an instruction manual only in German, and somebody decided where to place this device.

Data does not exist without us. We create it, give it meaning, and use it for a reason. Data, and the uses of data about people, are never purely technical or neutral. They are deeply intertwined with the cultural, socioeconomic, and political contexts in which they were produced. Data collection, too, is inherently political: the act itself, the appropriation of life, objects, and more in the form of numbers, can be seen as part of an ideology.

My background and experiences with data go much further back, and so do yours. Data is all around us. If you have a birth certificate, you have experience with data. From the moment you were born, you were already categorized into a binary category of “male or female”; if not, maybe you were categorized into a binary “woman or man”. For some, this may be fine, but for the 1.7% of the population who are Intersex (It’s Intersex Awareness Day, 2018), being categorized and constrained into this binary can be harmful and have lifelong effects. Biological sex can be seen as “not a coherent category” (Albert and Delano, 2021), yet a decision was made to treat it as one. This is just one example of an encounter with data and of a problem with modern data practices. The rest of this writing will discuss where data is and isn’t, and the steps we can take towards better futures alongside data.

An Example: DataGénero

DataGénero is an organisation that seeks to build a sustainable and inclusive data future from and for Argentina. They focus on the data practices that directly affect the lives of women and LGBTQ+ people in Latin America and advise individuals, governments, and organisations that work with data. Their work focuses on the specific cultural contexts of Argentina and Latin America, and they work with una perspectiva de género, a gender perspective (DataGénero, no date).

A gender perspective in data work means considering gender and its intersections with other identities, including race, class, sexuality, and ability, at every stage of working with data. This is important when collecting data about people because you will almost always encounter gendered experiences. Social and gender norms can be represented, resisted, and reproduced in all social relations, so it is crucial to consider the dynamics of gender and its intersection with other inequalities when working with people and data about people. The work DataGénero is doing in Argentina and Latin America is important because it makes certain identities, experiences, and labours visible in the form of data. It is also necessary for policymaking, sharing resources, and providing access, particularly in Argentina, where much data grouped by gender does not yet exist. Because the people in charge tend to see numbers as more valid than individual stories, collecting data can be instrumental when advocating for policy and policy changes. Knowing the data can also help direct funding and resources more appropriately.

Considering different intersections of identity when working with data, as DataGénero does, is crucial because many of us do not live “single-issue lives” (Lorde, 1984). When intersectional data is missing, it can have real-world consequences for funding, policy, and access decisions, but it does not mean that people are missing. I bring this up because there is some discussion around data and “invisible people”. The term “invisible” refers to the lack of data about a certain group of people. For example, in a call to collect more gender data, Plan International put out a report entitled “Counting the Invisible.” Similarly, a book about the lack of data about women was titled “Invisible Women” (Criado Perez, 2019). This lack of data is understood to render people invisible. I do not agree with this term, nor do I completely agree with the statement by King et al. that “where data are missing, people are missing” (2020). Instead, I believe that it is not the people who are missing but rather a representation of their experiences. The difference is that even without data, a person can be seen, heard, and respected. Again, these are all decisions that people make about data, about who is represented and who isn’t, and these decisions are all biases.

When talking about data, the lack of data grouped by gender is referred to as a lack of “gender-disaggregated data”. To aggregate data means to group data based on a specific characteristic, often to produce a summary. To disaggregate data is to separate the data based on a certain characteristic. Closer to home in Canada, there was no effective pandemic response for racialised communities and people in 2020, the first year of the COVID-19 pandemic, because race-disaggregated data was lacking (McKenzie, 2020). Race is just one identity intersection that can be considered when taking an intersectional approach. Other identity intersections are also often not collected, and with this layering of missing identities, even more experiences go unrepresented. In France, the constitutional principle of Laïcité “guarantees the neutrality of the State” (Gilbert & Keane, 2016), and because of this principle, the state does not collect data disaggregated by ethnicity or religion. Again, the decision not to include certain identities reflects a bias.
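To make the difference between aggregated and disaggregated data concrete, here is a minimal Python sketch. The records, field names, and the `aggregate` and `disaggregate` helpers are all hypothetical, invented for illustration rather than drawn from any real dataset. It shows how one overall figure can hide differences between groups, and how a characteristic that was never collected cannot be used to disaggregate at all.

```python
from collections import defaultdict

# Hypothetical records, invented for illustration only.
# "served" marks whether the person received some (hypothetical) service.
# No record has an "ethnicity" field: it was never collected.
records = [
    {"region": "North", "gender": "woman", "served": True},
    {"region": "North", "gender": "man", "served": True},
    {"region": "North", "gender": "nonbinary", "served": False},
    {"region": "South", "gender": "woman", "served": False},
    {"region": "South", "gender": "man", "served": True},
    {"region": "South", "gender": "woman", "served": False},
]


def aggregate(rows):
    """Aggregate: summarise all rows as one share served (rounded to 2 places)."""
    served = sum(1 for row in rows if row["served"])
    return round(served / len(rows), 2)


def disaggregate(rows, characteristic):
    """Disaggregate: separate rows by one characteristic, then summarise each group.

    Rows that lack the characteristic all land in a "not collected" group.
    """
    groups = defaultdict(list)
    for row in rows:
        groups[row.get(characteristic, "not collected")].append(row)
    return {value: aggregate(group) for value, group in groups.items()}


print(aggregate(records))                  # 0.5
print(disaggregate(records, "gender"))     # {'woman': 0.33, 'man': 1.0, 'nonbinary': 0.0}
print(disaggregate(records, "region"))     # {'North': 0.67, 'South': 0.33}
print(disaggregate(records, "ethnicity"))  # {'not collected': 0.5}
```

The last line mirrors the point about France: when a characteristic is never recorded, everyone falls into the same group, and no later analysis can recover the differences that were never captured.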

The work that DataGénero is doing is a positive example of how to work with data about people. It is local and contextual, and the organisers have experience in Argentina. The work also raises a crucial point about data bias: decisions about what data to collect, how to collect it, how to share it, and how to interpret it all carry an inherent bias. The interpretation of data can be a very powerful tool. Creating and collecting data can be used to create positive change, as DataGénero is doing, for liberatory movements, or just for fun. However, data collection and interpretation can also be used in harmful or straight-up creepy ways. Until you start considering and critically engaging with the data around you, you might not even be aware of how much of your life is being collected as data.

When collecting data about people, many decisions need to be made, and we know this creates bias. Data about people can also be harmful because it is often thought of, and treated as, a commodity. The saying “if the product is free, you are the product” rings true of how social media and advertising are structured today, and it is closely linked to data collection.

In 2017, 98.5 percent of Facebook’s revenue reportedly came from advertisements (Dillet, 2018). This is linked to data because Facebook collects enormous amounts of data from you, puts you into categories based on your online behaviour, and then sells this information to advertisers. It is not sold in a way that looks like buying data about people; instead, it is sold as a way for advertisers to choose whom they would like to advertise to. And because advertisers can choose whom to advertise to, they have also been able to choose whom not to advertise to. This can hide job, housing, and credit opportunities from marginalized groups, which is a form of discrimination (Angwin, Tobin and Varner, no date). Having nothing to hide does not protect you from the effects of so much data collection. And even if it doesn’t affect you, that does not mean you should not care about it.
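The include-and-exclude logic described above can be sketched in a few lines of Python. This is a hypothetical model, not Facebook’s actual advertising system: the `User` class, the `audience` function, and the category labels are all invented for illustration. It shows how a single exclusion rule quietly removes someone from the audience for an opportunity.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    user_id: str
    # Categories a platform has inferred from online behaviour (hypothetical labels).
    inferred_categories: frozenset


def audience(users, include, exclude):
    """Return users in at least one `include` category and in no `exclude` category."""
    return [
        user
        for user in users
        if user.inferred_categories & include
        and not (user.inferred_categories & exclude)
    ]


users = [
    User("u1", frozenset({"job seeker", "age 25-34"})),
    User("u2", frozenset({"job seeker", "age 55+"})),
    User("u3", frozenset({"parent"})),
]

# A hypothetical job ad that targets job seekers but excludes one age group.
shown_to = audience(users, include={"job seeker"}, exclude={"age 55+"})
print([user.user_id for user in shown_to])  # ['u1']
```

Nothing in this process tells u2 that the job ad ever existed, which is part of why this kind of exclusion is so hard to notice from the outside.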

Being grouped into categories based on data about you is very common. For example, Amazon, an e-commerce company, uses “real-time personalization and recommendation” by collecting a lot of data about you, from how many seconds you spend looking at an item to where you are located, and much more. The decisions about which things to collect about you were made with the company’s goals in mind. Even though you might get a more personalized experience, you are really being tricked into buying more. Through these choices about what to collect, Amazon can recommend products to you, sometimes even before you know you want them. This isn’t quite machine learning or artificial intelligence yet; it groups people into categories based on the data that is collected.
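The kind of grouping described here, which sorts people into categories rather than learning anything sophisticated, can be sketched as a few hand-written rules. This is a hypothetical Python illustration, not how Amazon’s systems actually work: the segments, products, 60-second threshold, and function names are all assumptions made up for the example. Notice that the shopper never chooses a category; rules written with the seller’s goals in mind choose it for them.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from a shopper segment to the products pushed to it.
SEGMENT_PRODUCTS = {
    "outdoor": ["tent", "hiking boots"],
    "home": ["lamp", "throw blanket"],
    "general": ["gift card"],
}


@dataclass
class Shopper:
    name: str
    # Seconds spent viewing items, keyed by product category (hypothetical signal).
    seconds_viewed: dict = field(default_factory=dict)


def assign_segment(shopper, threshold=60):
    """Place a shopper in the category they viewed longest, if they viewed it for
    at least `threshold` seconds and it is a known segment; otherwise "general"."""
    if not shopper.seconds_viewed:
        return "general"
    category, seconds = max(shopper.seconds_viewed.items(), key=lambda item: item[1])
    if seconds >= threshold and category in SEGMENT_PRODUCTS:
        return category
    return "general"


def recommend(shopper):
    """Recommend whatever products are attached to the shopper's segment."""
    return SEGMENT_PRODUCTS[assign_segment(shopper)]


print(recommend(Shopper("Ana", {"outdoor": 240, "home": 30})))  # ['tent', 'hiking boots']
print(recommend(Shopper("Ben", {"home": 45})))                  # ['gift card']
```

Changing a single number, such as the threshold, silently moves people between categories, which is one small example of how decisions about data shape what people are shown.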

Something similar happened over ten years ago at Target, a department store chain in the United States. From the consumer data Target had collected, its data analysts identified a pattern: pregnant people often switch to unscented lotion. Based on patterns like this, Target sent coupons for baby items to customers it predicted might be pregnant because of changes in their recent shopping. This left customers feeling “creeped out”, and in one reported case Target figured out that a teen girl was pregnant before her father did (Hill, 2012). It would be a bit creepy for a company to know something so personal about a consumer without them sharing it, right? Unfortunately, the data about us today goes much further than our shopping habits, and much of it is collected without our awareness. Data about us is seen as a commodity; some have even called data “the new oil” (Joris Toonders, 2014). Yikes. And in the same way that oil is extracted from the land in harmful ways, data is also taken without consent.

Consent is an essential topic in any interaction, and it is also crucial when thinking about data in our lives. Depending on our location, our jobs, our health, the companies we engage with, and more, our data is constantly being collected by many different groups. Data can be collected and used with or without our consent. Facebook, for example, collects far more data than what we decide to share on the platform, so that it can sell more to advertisers. An example of data collection with consent is sharing your email address to receive updates from an organization or company you are interested in. In this scenario, you know which data you are sharing and with whom, and we assume here that it is not taken from you without your knowledge. However, this consent is limited: you are probably not told whether your email will be shared with others, whether your interest in this company could prevent you from accessing another company, or how and when you can withdraw your consent and get your data back. These are important questions to ask when sharing your data, or when collecting data.

In the EU, the General Data Protection Regulation (GDPR), in force since 2018 and mirrored in the UK by the UK GDPR, governs the processing of personal data (General Data Protection Regulation (GDPR), 2016). It is meant to work towards more equitable data practices and to make sure people can withdraw their consent and remove the data they have shared with others. Because of this regulation, one often sees a pop-up when visiting a website asking for consent to collect different types of data from the user. The user should be able either to refuse to have their data collected or to accept it. However, in many of these pop-ups, the button or option to refuse is much harder to find than the one to accept. As a result, users who are frustrated while looking for information they need from the website may accept data sharing reluctantly rather than enthusiastically and voluntarily. For example, when searching on Google, it takes four more clicks to refuse data collection than it does to accept it. While asking before taking data is a step in the right direction, clicking through complicated policies to decline data collection can be annoying enough to push the user into sharing their data unenthusiastically. Consent should be enthusiastic and positive, and as a friend of mine, Cheska, said, “consent should be easy” (Lotherington, 2022).

So much of what we consume and what we do is represented in data now. It is hard to avoid it, but it is better to be aware of what is going on.

Critical Approaches to Data

To summarise, I believe that contextualising the data and your relation to it, and recognising the various powers and biases involved, can be the best way to resist extractive data collection and to be aware of its potential harms. In the same way that I needed to be able to read the Table of Nuclides (I would call it the Hulk version of the periodic table) before working with the data at Canadian Nuclear Laboratories, thinking about the bigger picture, your relationship to the data, and who is collecting it is how to think critically about data. Thinking about Data Justice is another way.

“Data Justice” goes further than just collecting data from more people. Data Justice demands that the lives and experiences of those represented in the data be prioritised. This means taking a gender perspective and considering different intersecting identities. Data Justice also requires that everyone involved in a data project is held accountable and that power relations are considered when working with data. This means considering data biases, consent, and the commodification of data, among the other concerns mentioned earlier. Finally, Data Justice works to make data collection less extractive, so that it is not like taking oil from the ground.

Data Justice may seem a bit abstract, but it can be applied to all data practices and interactions to engage critically with data. Just remember: data does not exist without us, and data is never neutral. Try to have more conversations about data, or just more conversations about anything, because nothing can replace human interaction, not even all the data in the world.

Reference List

Albert, K. and Delano, M. (2021) “This whole thing smacks of gender: Algorithmic exclusion in bioimpedance-based body composition analysis,” in Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency. New York, NY, USA: ACM.

Angwin, J., Tobin, A. and Varner, M. (no date) Facebook (still) letting housing advertisers exclude users by race, ProPublica. Available at: https://www.propublica.org/article/facebook-advertising-discrimination-housing-race-sex-national-origin (Accessed: March 31, 2022).

Criado Perez, C. (2019) Invisible women: Exposing data bias in a world designed for men. London, England: Chatto & Windus.

DataGénero (no date) Datagenero.org. Available at: https://www.datagenero.org/datag%C3%A9nero (Accessed: March 30, 2022).

D’Ignazio, C. and Klein, L. F. (2020) Data Feminism. Cambridge, MA: MIT Press.

Dillet, R. (2018) Facebook knows literally everything about you, TechCrunch. Available at: https://techcrunch.com/2018/03/23/facebook-knows-literally-everything-about-you/?guccounter=1 (Accessed: March 31, 2022).

Escobar, A. (2018) Designs for the pluriverse: Radical interdependence, autonomy, and the making of worlds. Durham, NC: Duke University Press.

Gates, M. F. (2020) Sexist and incomplete data hold back the world’s COVID-19 response, Bill & Melinda Gates Foundation. Available at: https://www.gatesfoundation.org/ideas/articles/stat-melinda-gates-sexist-covid19-data (Accessed: March 30, 2022).

Gebru, T. et al. (2021) “Datasheets for datasets,” Communications of the ACM, 64(12), pp. 86–92. doi: 10.1145/3458723.

General Data Protection Regulation (GDPR) – official legal text (2016) General Data Protection Regulation (GDPR). Available at: https://gdpr-info.eu/ (Accessed: March 30, 2022).

Gilbert, J. and Keane, D. (2016) “Equality versus fraternity? Rethinking France and its minorities,” International Journal of Constitutional Law, 14(4), pp. 883–905. doi: 10.1093/icon/mow059.

Hill, K. (2012) How Target figured out a teen girl was pregnant before her father did, Forbes. Available at: https://www.forbes.com/sites/kashmirhill/2012/02/16/how-target-figured-out-a-teen-girl-was-pregnant-before-her-father-did/?sh=66a059936668 (Accessed: March 31, 2022).

It’s Intersex Awareness Day - here are 5 myths we need to shatter (2018) Amnesty International. Available at: https://www.amnesty.org/en/latest/news/2018/10/its-intersex-awareness-day-here-are-5-myths-we-need-to-shatter/ (Accessed: March 31, 2022).

Joris Toonders, Y. (2014) Data is the new oil of the digital economy, WIRED. Available at: https://www.wired.com/insights/2014/07/data-new-oil-digital-economy/ (Accessed: March 31, 2022).

King, C. et al. (2020) “Addressing missing data in substance use research: A review and data justice-based approach,” Journal of Addiction Medicine, 14(6), pp. 454–456. doi: 10.1097/ADM.0000000000000644.

Lorde, A. (1984) Sister Outsider: Essays and Speeches. Penguin Classics.

Lotherington, C. (2022) Personal communication.

McKenzie, K. (2020) Race and ethnicity data collection during COVID-19 in Canada: If you are not counted you cannot count on the pandemic response. Royal Society of Canada. Available at: https://rsc-src.ca/en/race-and-ethnicity-data-collection-during-covid-19-in-canada-if-you-are-not-counted-you-cannot-count.

Real-time personalization and recommendation (no date) Amazon.com. Available at: https://aws.amazon.com/personalize/ (Accessed: March 31, 2022).

Thompson, E. et al. (2021) “COVID-19: A case for the collection of race data in Canada and abroad,” Relevé des maladies transmissibles au Canada [Canada Communicable Disease Report], 47(7–8), pp. 300–304. doi: 10.14745/ccdr.v47i78a02.

Tierra Común (no date) Tierra Común. Available at: https://www.tierracomun.net/en/home (Accessed: March 30, 2022).

Zuboff, S. (2020) The age of surveillance capitalism: The fight for a human future at the new frontier of power. New York, NY: PublicAffairs.