Cecil Abungu
12 min read · Mar 16, 2020


PROTECTING REPUBLICAN FREEDOM WHILE SPURRING THE ADVANCEMENT OF ARTIFICIAL INTELLIGENCE IN A DATA ECONOMY

Introduction

In the last five years, investment in the research and development of artificial intelligence (AI) has taken a new and urgent turn in several countries[1] as its many possibilities and advantages become increasingly apparent. The heady predictions made years ago[2] now seem within reach as AI continues to make a radical difference in innumerable aspects of our lives, from work[3] to the management of healthcare[4] and the relentless drive to build driverless vehicles.[5] Those sectors give us an idea of why analysts seem to agree that AI will in future have a monumental influence on which economies become dominant and which ones flounder,[6] all of it shaped by the network effects that are known to secure a dominant perch for early movers.[7]

Apart from our day-to-day lives, it seems clear now that AI will also play a crucial role in national security.[8] A recent Harvard Belfer Centre study concluded that AI could have an impact on national security as sweeping as that of the development of nuclear weapons,[9] a finding made even more concerning by a Brookings report highlighting several digital information warfare risks that the development of AI carries.[10] Many believe that the only way states can prepare for such risks is by further developing their own AI capabilities, and the upshot is that AI development has inevitably become a pressing concern for the world's leading states.

The Stakes for Republican Freedom

The most promising modern fields of AI are machine learning and its offshoot, deep learning. Machine learning involves computers learning functions from a dataset, while deep learning tops that off with the use of deep neural networks.[11] The two have driven the resurgence of attention to AI as well as its advances in functions such as problem solving, game playing, pattern perception and semantic information processing.[12] In both, a program learns in a supervised, unsupervised or reinforcement manner, and with the limited exception of reinforcement learning, these lessons require massive training datasets.[13] Indeed, data is such a critical part of modern AI development that some experts have altogether rejected the chicken-and-egg argument and fully credited the era of big data for the AI boom.[14]
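
To make the supervised case concrete, here is a minimal sketch, assuming Python with scikit-learn installed and using data invented purely for illustration, of a program learning a function from a labelled dataset:

```python
# A toy supervised learning example: the program learns a function
# mapping inputs to labels from a training dataset. All data invented.
from sklearn.linear_model import LogisticRegression

# Each row is (hours studied, previous score); labels are pass (1) / fail (0).
X_train = [[1, 40], [2, 50], [3, 55], [8, 85], [9, 80], [10, 90]]
y_train = [0, 0, 0, 1, 1, 1]

model = LogisticRegression()
model.fit(X_train, y_train)        # "learning": fit a function to the data

print(model.predict([[7, 75]]))    # apply the learned function to a new case
```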

Consider that Facebook’s facial recognition algorithm required training on nearly four million facial images spanning four thousand identities.[15] Training like this is the first stage of machine learning, and the second stage (inference) cannot take place without it, so large datasets are an indispensable part of the whole enterprise. This insatiable appetite for data is made worse by the fact that a common strategy for solving an ill-posed problem (one where the learning program cannot settle on a single function from the data it has been fed, and instead comes up with several) is to feed the program even more data in the hope that this will allow it to find patterns with more nuance. And although it is true that algorithms using reinforcement learning require less data, they are not very useful when confronted with real-world problems.[16]
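
A small numerical sketch of that point, assuming Python with NumPy and toy data only: with only a handful of points, a flexible model can "explain" the sample in many incompatible ways, while more data pins the learned function down.

```python
# With few points, a flexible (degree-5) polynomial fits the sample perfectly
# yet generalizes badly; more data constrains it toward the true function.
import numpy as np

rng = np.random.default_rng(0)
true_f = lambda x: 2.0 * x + 1.0                  # the function we hope to learn

def off_sample_error(n_points: int) -> float:
    x = rng.uniform(0, 10, n_points)
    y = true_f(x) + rng.normal(0, 1, n_points)    # noisy observations
    coeffs = np.polyfit(x, y, deg=5)              # flexible model, many candidate fits
    x_test = np.linspace(0, 10, 200)
    return float(np.mean((np.polyval(coeffs, x_test) - true_f(x_test)) ** 2))

for n in (6, 60, 600):
    print(n, off_sample_error(n))                 # error typically shrinks with more data
```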

Given how important data is to the advancement of AI, it is no surprise then that coordinated collection of big data has assumed a scale never seen before.[17] Indeed, the collection of everything from health records and social security numbers to social media posts and search-engine queries has become inescapable.[18]

The risk that such collection of personal data poses to republican freedom in a liberal society is significant. I mean here Friedrich Hayek's conception of freedom as the condition of men in which coercion of some by others is reduced as much as is possible in a society,[19] understood alongside Elizabeth Anderson's definition of freedom as the sociologically complex condition of nondomination.[20] One of the crucial ways that this kind of freedom is protected is by limiting the collection of personal data. Many people are only able to freely execute their roles as citizens (speak, vote) and move through civil society (worship centers, medical clinics et cetera) because they feel assured that some private parts of their lives will remain so. Consider especially those who do things that a community considers abnormal: the coordinated collection of their personal data, whether or not it is abused, leaves them structurally vulnerable to the arbitrary and coercive exercise of power, which is exactly what republican freedom stands against.

Yet limiting such collection of data by requiring privacy by design (as Europe can effectively be argued to have done through Article 25(1) of the General Data Protection Regulation (GDPR)) brings up its own understudied and underdiscussed problems for republican freedom. Specifically, regulations that require privacy by design could become a vehicle for paternalistic technological management which forecloses the possibility of continuous negotiation of an individual's or community's informational interests.[21] Isn't this the sort of domination that republican freedom is inherently opposed to?

The pressure on liberal western states to protect this republican freedom by assiduously regulating big data collection is further complicated by the knowledge that China is collecting data on an even more imposing scale without any such limitations,[22] and this fear of China is deftly played on by tech leaders who have their own interests to protect. In testimony before the US Congress in early 2018, for example, Facebook CEO Mark Zuckerberg claimed that regulating the use of personal data would effectively yield the lead in developing AI to China.[23] A recently released report also claimed that the GDPR has cost Europe a competitive advantage in AI advancement.[24]

In this realm of personal data, can liberal states defend republican freedom from privacy infringement and domination yet still maintain a competitive edge in AI advancement? I believe so, and will in the next part propose some ideas for action.

Putting the Person back into Personal Data

1. A departure from an analytical starting point that completely commodifies personal data

Internet-based personal data is unique because it is not created by just one party: it is the joint product of the site user and the entity whose site turns what the user does into useful data. For that reason, the ownership of the data is not as clear-cut as many allege. Yet at the moment, the prevalent and flawed idea upon which personal data is contracted is that the data is a commodity which the user sells outright each time they use a site. This foundation makes it easier for republican freedom to be jeopardized, and a defense of that freedom first requires the reconceptualization of that foundation.

The answer in my view lies in the use of scholar Bill Maurer's kinship thesis.[25] Only an analytical framework which sees data as kin can adequately capture the nature of personal data. From this starting point, data would effectively be a 'child' born of the interaction between the individual and the entity's program. Thus, the Hohfeldian rights over the data would not be considered to have been 'sold' as soon as one starts to use the site, which is why I think that scholars, data companies and regulators should all adopt this justificatory framework. The proposals that follow explain how this can be further realized.

2. Towards Personal Data Contracts that Respect Individual Autonomy

Once we assent to an analytical framework that appreciates the complex nature of personal data, I propose that the next step ought to be a re-evaluation of personal data contracting towards more simplicity, accessibility, transparency and individual agency. The very basis of contract law is respect for individual autonomy,[26] yet internet users in liberal states are at the moment stuck between universalistic regimes that compel privacy by design and others that effectively leave the decision to the big tech companies. In both regimes, contracts for personal data are complicated, and users drastically misperceive what they are consenting to.

I propose that each site be required, on first visit (and at least every twelve months thereafter), to present the user with a personal data contract that mirrors one constructed by the jurisdiction's regulator. The contract constructed by the regulator should be written with a lay person in mind and in a tiered manner that allows the user to decide which information to share. The level of access to the site can be directly tied to the tier that the person chooses, but the contract should nudge the individual towards choosing to share most data (for the sake of AI advancement). Although this may appear difficult to police, Europe's policing of GDPR compliance is proof that it is possible. Such an approach would protect republican freedom not through blanket paternalistic requirements but by devolving the decision to the person and empowering them to understand and make it. And of course, the nudge would be an attempt to push people towards providing the data that gives a competitive edge in AI advancement.
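
To illustrate, here is a purely hypothetical sketch in Python of what such a regulator-constructed tiered contract might look like as a data structure. The tier names, data categories and access levels are all invented for the example; no real regulator's template is being described.

```python
# A hypothetical tiered personal data contract. The default pre-selects the
# most data-sharing tier (the 'nudge'), but the user may choose any tier.
from dataclasses import dataclass
from typing import Optional

@dataclass
class ConsentTier:
    name: str
    shared_data: tuple      # categories the user agrees to share
    access_level: str       # site access unlocked by this tier

TIERS = (
    ConsentTier("minimal", ("session cookies",), "basic, read-only access"),
    ConsentTier("standard", ("session cookies", "usage analytics"), "full features"),
    ConsentTier("full", ("session cookies", "usage analytics", "profile data"),
                "full features plus personalization"),
)

NUDGED_DEFAULT = TIERS[-1]  # presented first, for the sake of AI advancement

def conclude_contract(chosen_index: Optional[int] = None) -> ConsentTier:
    """Return the tier the user actively chose, else the nudged default."""
    return TIERS[chosen_index] if chosen_index is not None else NUDGED_DEFAULT

print(conclude_contract())   # user accepts the default: the 'full' tier
print(conclude_contract(0))  # user opts down to the 'minimal' tier
```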

3. Establishment of Synthetic Data Institutes

Synthetic data institutes would go a long way towards solving two intractable problems through which AI advancement threatens republican freedom. The first is the sensitivity of any data generated with a human being playing a crucial role in the loop: wherever this is the case, privacy risks will always exist. The second problem has to do with the creeping monopolism of the big tech companies. Let me explain. Limited access to datasets is a significant entry barrier for startups building AI tools to compete with the big tech giants. Most new entrants cannot match the pull and traffic that the big tech behemoths can boast, so they cannot collect data on the same scale and therefore cannot compete,[27] leaving internet users structurally vulnerable to domination, which is exactly what republican freedom aims to minimize.

Synthetic datasets are effectively fabricated datasets that have all the attributes of real datasets. They can be generated in three ways, but I will focus here on the most promising one: generation by generative adversarial networks built on deep neural networks.[28] The use of deep neural networks allows synthetic datasets to be immune to the typical identification-reidentification problems that come with other data, and they therefore pose no threat to privacy (and by extension republican freedom). This makes sense once you understand deep neural networks as networks of simple, brain-like information processing units that interact to model complex relationships which human beings themselves cannot readily explain.
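
As a concrete illustration, here is a minimal sketch of the generative adversarial setup, assuming Python with PyTorch installed. The "real" data here is a toy two-dimensional distribution standing in for an actual dataset, and the network sizes and training budget are deliberately simplified; a production system would look very different.

```python
# A toy GAN: the generator learns to fabricate records resembling the real
# ones, while the discriminator learns to tell real from fabricated.
import torch
import torch.nn as nn

real_data = torch.randn(1000, 2) * 0.5 + torch.tensor([3.0, -1.0])  # toy "real" records

G = nn.Sequential(nn.Linear(8, 32), nn.ReLU(), nn.Linear(32, 2))    # noise -> fake record
D = nn.Sequential(nn.Linear(2, 32), nn.ReLU(), nn.Linear(32, 1))    # record -> real/fake logit

opt_g = torch.optim.Adam(G.parameters(), lr=1e-3)
opt_d = torch.optim.Adam(D.parameters(), lr=1e-3)
loss_fn = nn.BCEWithLogitsLoss()

for step in range(2000):
    real = real_data[torch.randint(0, len(real_data), (64,))]
    fake = G(torch.randn(64, 8))

    # Discriminator step: score real records as 1, fabricated ones as 0.
    d_loss = (loss_fn(D(real), torch.ones(64, 1))
              + loss_fn(D(fake.detach()), torch.zeros(64, 1)))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Generator step: produce records the discriminator scores as real.
    g_loss = loss_fn(D(fake), torch.ones(64, 1))
    opt_g.zero_grad()
    g_loss.backward()
    opt_g.step()

with torch.no_grad():
    synthetic = G(torch.randn(500, 8))  # fabricated records, tied to no individual
print(synthetic.mean(dim=0))            # should approach the real mean [3.0, -1.0]
```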

Synthetic datasets would also democratize access to data and spur the innovation and growth of AI startups, beginning to address the creeping monopolism of the big tech companies that has become a growing risk to republican freedom. Finally, concerns need not arise regarding lost utility, since evaluations have found synthetic data to be just as useful as real data.[29] It is because of these incredible possibilities that I press the case for large-scale, targeted investment in synthetic data, starting with the establishment of research institutes entirely focused on it.

4. Serious Investment into Provable Data Deidentification Techniques

The 2007 Netflix privacy fiasco showed us that personal data apparently protected through longstanding methods of deidentification like suppression and generalization remains vulnerable to reidentification.[30] Those methods thus leave personal data privacy, and with it republican freedom, open to breach. The advent of methods such as differential privacy has created fresh hope, and I am convinced that more serious investment in them could lead to innovations that keep the way open for the advancement of AI while at the same time reinforcing protections for republican freedom.
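
For readers unfamiliar with those methods, here is a toy sketch in Python, with records invented for the example, of what suppression and generalization do, and of why the residue they leave can still be linked to outside data:

```python
# Suppression drops direct identifiers; generalization coarsens the
# quasi-identifiers that remain. All records here are invented.
records = [
    {"name": "A. Otieno",  "zip": "00100", "age": 34, "diagnosis": "flu"},
    {"name": "B. Wanjiru", "zip": "00105", "age": 37, "diagnosis": "asthma"},
]

def deidentify(record: dict) -> dict:
    out = dict(record)
    del out["name"]                            # suppression: remove the identifier
    out["zip"] = out["zip"][:3] + "**"         # generalization: coarsen the ZIP code
    out["age"] = f"{out['age'] // 10 * 10}s"   # generalization: bucket the age
    return out

for r in records:
    print(deidentify(r))
# Netflix-style reidentification links the surviving quasi-identifiers
# (ZIP prefix, age bracket) with an auxiliary dataset to name individuals.
```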

The task, of course, is to build tools that assure watertight deidentification without destroying the signal in the data, especially since personalization is one of the key areas of usefulness for machine learning (its tools use an individual person's data to adapt their experience on a platform, assign them a risk score, et cetera). Differential privacy is a giant step in this direction but remains flawed, because it can still be circumvented if enough identical queries are asked or if the questions asked require a high level of specificity.[31] Its guarantees also remain difficult to prove in practice, and provability is crucial to building public trust in new AI tools. While the ultimate goal remains elusive, it is not unattainable, especially because industry investment in this sort of technique remains relatively insignificant. Given the promise such techniques offer, there needs to be a serious push for their research and development in liberal states.
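
To see the repeated-query weakness concretely, here is a minimal sketch of the Laplace mechanism, one common way differential privacy is implemented, assuming Python with NumPy and invented counts: a single noisy answer protects the individual, but averaging many answers to the identical query washes the noise out unless a privacy budget is enforced.

```python
# The Laplace mechanism adds noise scaled to sensitivity/epsilon. Repeating
# the identical query and averaging defeats it if the budget is not tracked.
import numpy as np

rng = np.random.default_rng(42)

def private_count(true_count: int, epsilon: float) -> float:
    """Release a count with Laplace noise (sensitivity 1, budget epsilon)."""
    return true_count + rng.laplace(loc=0.0, scale=1.0 / epsilon)

TRUE_COUNT = 120    # e.g. patients with some diagnosis (invented)
answers = [private_count(TRUE_COUNT, epsilon=0.1) for _ in range(10_000)]

print(round(answers[0], 1))               # one query: heavily noised
print(round(float(np.mean(answers)), 1))  # 10,000 queries: noise averages away
```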

Conclusion

Protecting personal data in this age of big data is a critical part of defending people from domination. Yet as we defend our republican freedom, it is just as important not to hamstring AI advancement. I hope that the proposals offered in this essay meet this important moment.

[1] Government of the United Kingdom, Press Release, ‘Funding for 84 million euros for artificial intelligence and robotics research and smart energy innovation announced’ 8 November 2017; Mathieu Rosemain and Michel Rose, ‘France to spend $1.8 billion on AI to compete with US, China’ Reuters, March 29 2018; State of European Technology Report, 30 November 2017; Daniel Araya, ‘Who will lead in the age of Artificial Intelligence’ Forbes, January 1 2019; Ralf Llanasas, ‘A look at the US/China battle for AI leadership’ Design News, June 7 2019; Paul Mozur, ‘Beijing wants AI to be made in China by 2030’ New York Times, July 20 2017; National Science Foundation, Press Statement 18–005, ‘Statement on Artificial Intelligence for American Industry’ May 10 2018.

[2] See Ray Kurzweil, The Age of Intelligent Machines, MIT Press, Cambridge Massachusetts, 1990, page 425–449.

[3] Paul R Daugherty and H James Wilson, Human + Machine: Reimagining Work in the Age of AI, Harvard Business Review Press, Boston, 2018, pages 23–76.

[4] Eric Topol, Deep Medicine: How Artificial Intelligence Can Make Healthcare Human Again, Basic Books, New York, 2019, page 59–67.

[5] Hod Lipson and Melba Kurman, Driverless: Intelligent Cars and the Road Ahead, The MIT Press, Cambridge Massachusetts, 2016, page 85–125.

[6] PwC estimates that AI will add nearly 15.7 trillion USD to the global economy by 2030, all of which will be unevenly split between the countries in the lead. See https://www.pwc.com/gx/en/news-room/press-releases/2017/ai-to-drive-gdp-gains-of-15_7-trillion-with-productivity-personalisation-improvements.html (last seen October 22 2019).

[7] Brad Smith and Carol Ann Browne, Tools and Weapons: The Promise and Peril of the Digital Age, Penguin Press, New York, 2019, page 270.

[8] Tom Simonite, ‘For Superpowers, Artificial Intelligence fuels new global arms race’ Wired, September 8 2017.

[9] Greg Allen and Taniel Chan, Artificial Intelligence and National Security, Harvard Kennedy School Belfer Centre Study, July 2017, page 102.

[10] Alina Polyakova, ‘Weapons of the weak: Russia and AI-driven asymmetric warfare’ Brookings Report, November 15 2018.

[11] John D Kelleher, Deep Learning, The MIT Press, Cambridge Massachusetts, 2019, page 6; 20–21.

[12] John D Kelleher, Deep Learning, The MIT Press, Cambridge Massachusetts, 2019, page 35.

[13] Ian Goodfellow, Yoshua Bengio and Aaron Courville, Deep Learning, The MIT Press, Cambridge Massachusetts, 2017, page 104–106; 112; 240–243; 280; 305; 330–332.

[14] John D Kelleher, Deep Learning, The MIT Press, Cambridge Massachusetts, 2019, page 21.

[15] Yaniv Taigman et al, ‘DeepFace: Closing the Gap to Human-Level Performance in Face Verification’, 2014 IEEE Conference on Computer Vision and Pattern Recognition, June 23 2014.

[16] While you can train a reinforcement learning agent how to play a mean game of chess without training data, that is mostly because we can formally explain the chess ruleset (what the agent is allowed to do on a given turn, how to win or lose) to the machine and let it experiment (i.e. play against itself) until it converges on a good policy for winning. We cannot explain most phenomena we would want to predict or model with reinforcement learning quite so easily (imagine trying to completely formalize the mechanics of a stock market or climate system) — and that is why reinforcement learning agents are so good at playing games, but have trouble with most real-world problems.

[17] Stuart Thompson and Charlie Warzel, ‘Twelve Million Phones, One Dataset, Zero Privacy’ New York Times December 19 2019.

[18] Louise Matsakis ‘The Wired guide to your personal data (and who is using it)’ Wired, February 15 2019.

[19] FA Hayek, The Constitution of Liberty, University of Chicago Press, Chicago, 1960, page 11.

[20] Elizabeth Anderson ‘Freedom and Equality’ in David Schmidtz and Carmen E Pavel (editors), The Oxford Handbook of Freedom, Oxford University Press, New York, 2018, page 4.

[21] Roger Brownsword ‘Law, Liberty and Technology’ in Roger Brownsword, Eloise Scotford and Karen Yeung (editors), The Oxford Handbook of Law, Regulation and Technology, Oxford University Press, New York, 2017, page 63–64.

[22] Shafi Mussadique, ‘China’s edge in the tech race is vast amounts of data’ CNBC, December 9 2018; Kai-Fu Lee, AI Superpowers: China, Silicon Valley and the New World Order, Houghton Mifflin Harcourt, Boston, 2018, page 16–17; page 55–66.

[23] Cecilia Kang et al, ‘Mark Zuckerberg Testimony: Day 2 brings tougher questioning’ New York Times, April 11 2018.

[24] Eline Chivot and Daniel Castro ‘The EU Needs to Reform the GDPR to Remain Competitive in the Algorithmic Economy’, Centre for Data Innovation Report, May 13 2019.

[25] Bill Maurer, ‘Principles of Descent and Alliance for Big Data’ in Tom Boellstorff and Bill Maurer (editors) Data, Now and Bigger! Prickly Paradigm Press, Chicago, 2015, page 67–86.

[26] Charles Fried, Contract as Promise: A Theory of Contractual Obligation, Oxford University Press, New York, 2015.

[27] For example, only two out of the top 20 companies changing lives and making money out of big data are European, see the European Commission Policy on Big Data Value Public-Private Partnership, 13 July 2018.

[28] Steven M Bellovin, Preetam K Dutta and Nathan Reitinger ‘Privacy and Synthetic Datasets’, Stanford Technology Law Review, Vol 22 Issue 1, 2019, page 31–33.

[29] Steven M Bellovin, Preetam K Dutta and Nathan Reitinger ‘Privacy and Synthetic Datasets’, Stanford Technology Law Review, Vol 22 Issue 1, 2019, page 35–36.

[30] Arvind Narayanan and Vitaly Shmatikov ‘Robust De-anonymization of Large Sparse Datasets’ University of Texas at Austin, 2008, page 8–12.

[31] Paola Bonizzoni, Gianluca Della Vedova and Riccardo Dondi ‘The K-Anonymity Problem is Hard’, Proceedings of the 17th International Symposium on Fundamentals of Computation Theory, 2009, page 26–37; Andreas Haeberlen, Benjamin Pierce and Arjun Narayan ‘Differential Privacy Under Fire’, University of Pennsylvania Computer and Information Science Paper, page 1.
