Discrimination and Biases in the Digital Age: Examining the Concept of Community Non-Personal Data
Updated: Oct 25, 2020
- Sriya Sridhar*
On July 22, 2020, the Committee of Experts on Non-Personal Data Framework (‘Committee’) released a draft report containing its recommendations on the governance of non-personal data (‘NPD Report’ or ‘Report’). Briefly, non-personal data (‘NPD’), in contradistinction to ‘personal data’, is defined as data that cannot identify a natural person or does not relate to an identifiable person. It may also include personal data or aggregated data that has been anonymized, to the extent that “individual specific events are no longer identifiable”. The NPD Report comes at a time of intense public debate surrounding the Personal Data Protection Bill, 2019, which has raised serious questions about the scope and adequacy of personal data protection in India.
The Committee recommends a separate governance framework for NPD, given the monopolization of user data by large technology companies and the potential gains from sharing large datasets with citizens, MSMEs, and startups. In turn, this would lead to a more competitive data landscape, promoting innovation and increasing transparency.
The Committee proposes the division of NPD into three categories: private, public, and community NPD. Private NPD is defined as “NPD collected or produced by persons or entities other than the governments, the source or subject of which relates to assets and processes that are privately-owned by such person or entity, and includes those aspects of derived and observed data that result from private effort”. Public NPD is defined as “NPD collected or generated by the governments, or by any agency of the governments, and includes data collected or generated in the course of execution of all publicly funded works”. Community NPD is defined as “any NPD collected or produced by anonymized personal data, and non-personal data about inanimate and animate things or phenomena – whether natural, social or artefactual, whose source or subject pertains to a community of natural persons which shall not include Private Non-Personal Data.” The concepts of community NPD and group privacy have been introduced in India for the first time and have far-reaching legal and social implications.
Currently, both the fundamental right to privacy and consent frameworks in India are conceptualized at an individual level. However, provisions for ‘informed consent’ from an individual lose meaning where there is a power imbalance between the individual and a corporation that stores data permanently, aggregates it, and uses it for a variety of purposes. Individuals do not have a sense of the true externalities that come with sharing their data: they are not aware of how one instance of data collection affects broader society, how it can be used to create patterns, or the biases and discrimination that the input of one’s data can lead to. Moreover, an individual does not truly have consenting power when their only option is to opt out of a system that is biased against them; an opt-out does not affect the data-processing entity in any meaningful way. In this situation, recognizing data privacy at a collective level can help address existing power imbalances and more meaningfully tackle discrimination and bias in digital spaces.
This essay will examine the concept of community NPD and collective rights to data, and critique their conceptualization, which leaves much room for ambiguity and does not truly address the drawbacks and discrimination that result from a framework of purely individual consent. The framework must ensure that collective privacy and consent are safeguarded and that the rights of communities, including religious and gender minorities and other marginalized groups, are simultaneously protected.
I. Defining a ‘Collective’ in the Age of Algorithms
In the current digital landscape, much data, whether personal or non-personal, is processed at an aggregate level. Aggregation often eliminates the individual element, which in turn limits the control that individuals have over the processing of their data in terms of providing informed consent. For example, social media platforms and search engines aggregate data for targeted advertising. The process of ‘anonymization’ is therefore a key source of revenue for corporations, and of data for governments. Anonymization and aggregation pose a risk to individuals, as one may not be aware of how the submission of one’s data has a broader effect on a community. For example, an act of engagement with a certain ‘liberal’ article or viewpoint on social media can be aggregated to determine the kind of products that are marketed to an individual. When such data is anonymized at scale, algorithms create patterns and predictions about groups of people and communities that may be far removed from the data that was processed at the individual level.
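The dynamic described above can be illustrated with a minimal sketch (the records, attributes, and figures below are entirely hypothetical, chosen only to show the mechanism): even after direct identifiers are stripped, the remaining attributes can be aggregated into group-level profiles that no individual ever consented to as such.

```python
from collections import Counter

# Hypothetical records: the "name" field is a direct identifier;
# "locality" is a group attribute that survives anonymization.
raw_records = [
    {"name": "A", "locality": "North Ward", "clicked_loan_ad": False},
    {"name": "B", "locality": "North Ward", "clicked_loan_ad": False},
    {"name": "C", "locality": "South Ward", "clicked_loan_ad": True},
    {"name": "D", "locality": "South Ward", "clicked_loan_ad": True},
    {"name": "E", "locality": "South Ward", "clicked_loan_ad": False},
]

def anonymize(records):
    """Strip the direct identifier; group attributes remain intact."""
    return [{k: v for k, v in r.items() if k != "name"} for r in records]

def group_profile(records):
    """Per-locality click-through rate: a community-level inference
    built entirely from 'anonymized' data."""
    clicks, totals = Counter(), Counter()
    for r in records:
        totals[r["locality"]] += 1
        clicks[r["locality"]] += r["clicked_loan_ad"]  # True counts as 1
    return {loc: clicks[loc] / totals[loc] for loc in totals}

# No record identifies a person, yet an advertiser or lender could now
# target, or exclude, entire localities on the basis of this profile.
profile = group_profile(anonymize(raw_records))
```

The point of the sketch is that the harm attaches to the group attribute, not the identifier: removing names changes nothing about what the aggregate reveals about the community.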
A study conducted by researchers at Northeastern University in the USA found that, even without the element of human bias, Facebook’s advertisement delivery algorithm determined the audiences shown job advertisements on the basis of gender and race. For example, advertisements for nurses and cleaners, among others, were shown mostly to women, while advertisements for fast-food workers and cashiers, among others, were shown mostly to male, African-American users. Data analytics and algorithms can therefore define groups not only on the basis of commonalities such as gender or caste, but also on the basis of ‘ad-hoc’ categories such as consumption patterns and online behaviour. It is reasonable to conclude, for instance, that an individual user who purchases women’s shoes or cosmetic products may be placed into such an ad-hoc category, which the algorithm then uses to determine which employment advertisements are relevant to that user. This, in turn, perpetuates gender biases and discrimination in employment, which is more damaging for marginalized communities, gender minorities, and those from lower socio-economic backgrounds.
While the Committee’s recognition of the possibility of collective rights based on the collection of NPD is welcome, there is much left to be desired with respect to defining such a community. The definition provided in the NPD Report is wide, with communities defined as groups of people “bound by common interests and purposes” and “involved in social and/or economic interactions”. Such a “common interest” could range from a shared religion to data about travel on a particular roadway, or residence in a locality affected by COVID-19. When read with the definition of private NPD, several overlaps are possible. For example, a private entity, such as a ride-sharing application, may collect data from several individuals, a group of whom may constitute a community on the basis of any number of commonalities. Such an overlap is particularly problematic where access to housing, loans, or employment is at issue, or in the context of surveillance and law enforcement.
For example, discrimination in access to housing persists on the basis of religion, marital status, gender, caste, and even food consumption rooted in cultural differences. Article 15(2) of the Constitution prohibits discrimination on the grounds of religion, race, caste, sex, or place of birth with respect to access to shops, establishments, and other public spaces. However, lawyers and activists have noted that the constitutional right to equality is not supplemented by a comprehensive anti-discrimination law covering both direct and indirect discrimination. Therefore, even though direct discrimination is prohibited, such biases can permeate the algorithms of real-estate or housing platforms and applications. These biases can prove especially exclusionary for minority groups that already face other forms of discrimination, such as religious minorities facing bias from law enforcement, or women in access to financing.
Data analytics has led to the creation of indeterminate categories that may not even fall within the ambit of the existing framework for protection against discrimination. In a framework where an anti-discrimination law does not exist even for determinate categories, the definition of a ‘collective’ in the digital space becomes all the more important. While the Personal Data Protection Bill recognizes that personal data may include analytical inferences, a clear demarcation of the rights of communities to make decisions about their data is crucial. This is especially relevant when the data goes through a process of anonymization and entities use ‘historic data’ for significant revenue generation, for example through marketing initiatives and targeted advertising. It is important that the Committee engage with the question of defining a collective, determine the rights a collective would have over its data, and regulate the use of community NPD when it is integrated into algorithms that have the potential to produce biased outcomes.
If the rights of a collective can be effectively demarcated and enforced, there is great potential for communities of people with common interests to exercise meaningful consent against a corporation or government body that may be discriminating against them, directly or indirectly. Collective consent therefore bears on a collective’s ability to enforce its right to privacy and to have its data handled in a responsible, non-discriminatory manner for a particular purpose. To this end, the following section will discuss the concept of data trusts introduced in the NPD Report, analyse why such a model can be beneficial, and highlight the need for more comprehensive legislation.
II. Data Trusts and Meaningful Collective Consent
The idea of data trusts has been explored across the world. The United Nations Conference on Trade and Development’s (‘UNCTAD’) Digital Economy Report speaks of the advantages of treating data as part of the commons, given its non-rivalrous nature and its potential use as a public good. Essentially, a data trust is an institutional structure in which data is placed under the stewardship of a trustee or board of trustees. Trustees would owe a fiduciary responsibility to the beneficiaries of the trust, that is, the community of people who have consented to keep their data in it. The concept promotes the beneficial use of data for purposes catered towards the public interest.
There are examples of the data trust model being implemented by both private and government entities. The UK Government has established the Open Data Institute, with the objective of managing data gathered from public spaces, for causes relating to the public interest. The UK Biobank is a repository of medical data on more than 500,000 people who have consented to their data being used for research into cures, treatment, and diagnosis of life-threatening diseases. Especially in the backdrop of the COVID-19 pandemic, there seem to be emerging applications of data trusts.
The crucial feature of a data trust is that a collective of users can consent more meaningfully and take back control in a system currently skewed in favour of corporations and other entities that hold their data. Such a structure has promising applications in tackling discriminatory algorithmic bias and giving more vulnerable communities a say in how their data is used. Agricultural data, for example, would be a large source of non-personal data in the Indian context. Theoretically, if a collective of farmers entrusted this data to a trust that works for their benefit, the data could be conveyed back to them in the right context, helping to optimize processes such as fertilizer use or planting for optimal weather conditions; used this way, the data could aid the enforcement of their socio-economic rights. There are similar applications for other vulnerable or low-income groups. For example, the government and civil society organizations have attempted to collect data on the prevalence of HIV among the LGBTQ community and sex workers. This data is often under-reported or not collected at all, owing to the stigma and difficulty such vulnerable communities face in providing their health data without any representation or assurance of privacy and confidentiality. Theoretically, an independent data trustee with stewardship over the data could represent such communities and ensure that their data is used for research and public health interventions, working with the community so that the data can be collected without jeopardizing its rights. However, the NPD Report does not engage with these socially-oriented applications of data trusts.
Under the NPD Report, data trusts would be constituted to manage “important data for sector-specific purposes” in the event that mandatory sharing of the data is required by the government or by the data trustees chosen to manage it. Currently, the Report is not clear on the exact composition of data trusts or the extent of the obligations of public and private entities to place data within them. Nor is it clear what duties the data trustee would owe with respect to the data held in the trust. Much is left to be desired regarding the concrete shape and form of the trusts, the rules that would apply, and whether data trustees would owe fiduciary duties to beneficiary communities. The most pertinent question for anti-discrimination is what duties the trustee would owe to the particular community whose data is within the trust, and the extent to which the government and corporations could draw on the trust for ‘sector-specific purposes’. Without clear identification and definition of beneficiary communities, there is concern about protecting collective privacy and ensuring that the data is not used for discriminatory purposes. The Committee has therefore missed a crucial opportunity to engage with questions of meaningful collective consent, the use of data for the protection of communities, and their right to equality and the enforcement of socio-economic rights.
An additional cause for concern is that data trustees and the government may independently determine the importance of community data in consultation with sector-specific authorities, and then request such data directly from private entities to place it within the trust. If the data pertains to a specific community or group of people, that community must necessarily be part of any consultation process for requesting the data, processing it, or placing it within a trust for public access.
The lack of a transparent framework or understanding of collective rights is worrying, given the increasing collection of both personal and non-personal data for uses, such as predictive policing and facial recognition for law enforcement. Therefore, procedures relating to consent mechanisms, rights of communities, and duties towards such communities are crucial for framing legislation around group privacy.
It bears repeating that the Committee must begin with clear definitions and a clear scope of operation for the collection of community NPD and the formation of data trusts. To understand whether group privacy and collective consent would fit within our regulatory framework, policy-makers must engage with questions of anti-discrimination, algorithmic bias, indeterminate communities, and a more robust law to protect existing categories of minority communities. By grappling with how collective agency is affected in the digital economy and further fleshing out the concept of community NPD, there is a possibility of moving towards a more democratic data framework that can meaningfully address the biases, avenues for discrimination, and inequalities perpetuated by the lack of control users have over their data.
*Sriya Sridhar is a lawyer specializing in intellectual property and technology law. She is also a pro bono researcher on Artificial Intelligence and Fairness with the Institute for Internet and the Just Society in Berlin.
1. Ministry of Electronics and Information Technology, Report by the Committee of Experts on Non-Personal Data Governance Framework <https://static.mygov.in/rest/s3fs-public/mygov_159453381955063671.pdf> accessed 16 September 2020.
2. Report (n 1), s 4.1(ii).
3. Report (n 1), ss 3.8, 3.9.
4. Report (n 1), Recommendation 1, p 14.
5. Report (n 1), s 4.4.
6. Report (n 1), s 4.2.
7. Report (n 1), s 4.3.
8. Anouk Ruhaak, ‘When One Affects Many: The Case for Collective Consent’ (Mozilla Foundation, 13 February 2020) <https://foundation.mozilla.org/en/blog/when-one-affects-many-case-collective-consent/> accessed 16 September 2020.
9. Muhammad Ali and others, ‘Discrimination through optimization: How Facebook's ad delivery can lead to skewed outcomes’ (2019) arXiv preprint, Cornell University <https://arxiv.org/abs/1904.02095> accessed 17 September 2020.
10. ibid.
11. Brent Mittelstadt, ‘From Individual to Group Privacy in Big Data Analytics’ (2017) 30 Philosophy & Technology 475 <https://link.springer.com/article/10.1007/s13347-017-0253-7> accessed 17 September 2020.
12. Report (n 1), ss 4.1–4.4.
13. Rakesh Kumar, ‘India needs to bring an algorithm transparency bill to combat bias’ (Observer Research Foundation, 9 September 2019) <https://www.orfonline.org/expert-speak/india-needs-to-bring-an-algorithm-transparency-bill-to-combat-bias-55253/> accessed 16 September 2020.
14. Suhrith Parthasarathy, ‘The need for an anti-discrimination law’ (The Hindu, 15 June 2020) <https://www.thehindu.com/opinion/lead/the-need-for-an-anti-discrimination-law/article31828372.ece> accessed 17 September 2020.
15. Divij Joshi, ‘Non-Personal Data Regulation: Interrogating “Group Privacy”’ (Center for Law and Policy Research, 30 July 2020) <https://clpr.org.in/blog/non-personal-data-regulation-interrogating-group-privacy/> accessed 16 September 2020. Analytical inferences include any data derived from the collection of personal data sets. For example, a search engine would use large personal data sets to continually improve its proprietary algorithms and the speed of search results; the derived data would constitute an analytical inference.
16. United Nations Conference on Trade and Development, Digital Economy Report 2019, UNCTAD/DER/2019 <https://unctad.org/en/pages/PublicationWebflyer.aspx?publicationid=2466> accessed 17 September 2020.
17. Anouk Ruhaak, ‘Data Trusts: Why, What and How’ (Medium, 12 November 2019) <https://medium.com/@anoukruhaak/data-trusts-why-what-and-how-a8b53b53d34> accessed 17 September 2020.
18. Open Data Institute, ‘Data Trusts: Lessons from Three Pilots (Report)’ (15 April 2019) <https://theodi.org/article/odi-data-trusts-report/> accessed 18 September 2020.
19. Anouk Ruhaak, ‘Data Commons & Data Trusts: What they are and how they relate’ (Medium, 15 May 2019) <https://medium.com/@anoukruhaak/data-commons-data-trust-63ac64c1c0c2> accessed 17 September 2020.
20. Report (n 1), ss 4.9, 4.10.
21. Divij Joshi, ‘Non-Personal Data: Examining Data Trusts?’ (Center for Law and Policy Research, 11 August 2020) <https://clpr.org.in/blog/non-personal-data-what-is-data-trusts/> accessed 18 September 2020.
22. Aditi Agrawal, ‘Summary: Report on Non-Personal Data Framework released by MEITY's Committee of Experts’ (Medianama, 13 July 2020) <https://www.medianama.com/2020/07/223-summary-non-personal-data-report-meity/> accessed 16 September 2020.
23. Kumar (n 13).