AI is separating privacy from the personal
After 25 years studying data privacy, Heng Xu says researchers need to start over.
Published: October 26, 2025 / Author: Courtney Ryan

Back in 2001, when U.S. consumers were still adopting flip phones, countries on the other side of the world, such as Japan, South Korea and Singapore, were witnessing the earliest proliferation of mobile commerce. Using Short Message Service (SMS) commerce applications, customers could book concert tickets, order a cab or purchase soft drinks from a nearby convenience store, all from their phones.
At the time, these innovations were novel and exciting, and indicative of the new internet age. A quarter of a century later, they seem prophetic, given how much of modern life is now “lived” on smartphones.
For Heng Xu, professor of Information Technology, Analytics, and Operations at the University of Notre Dame’s Mendoza College of Business, the burgeoning mobile commerce industry inspired her research agenda. “I was fascinated by the possibility that everything we do can possibly be done through a tiny device,” she said. “I decided to do my dissertation on it and be part of this revolution.”

Xu was earning her Ph.D. at the National University of Singapore. As she began interviewing industry experts, she landed on a question that she is still asking today: How do we address the fundamental issue of protecting user privacy? As Xu has interrogated this question over the years, the concept of information privacy has changed as drastically as the technology that threatens it.
Throughout her career, Xu has sought to define what privacy is and probe ways in which it can be protected or circumvented. What constitutes private information and what is its value? How is it exploited and by whom? Much of her research has concerned data, particularly individuals’ personal data, though that is no longer the whole story. Today, privacy concerns are as relevant as ever, but over the past 10 years their focus has shifted from protecting personal data to determining how collective knowledge can be used to infer individual behavior.
“I want to challenge our previous way of doing everything in terms of privacy research,” she said. “The traditional approach won’t work anymore.”
The “traditional approach” is explored in and even influenced by Xu’s 2011 paper, “Information Privacy Research: An Interdisciplinary Review,” published in MIS Quarterly and co-authored by Tamara Dinev of Florida Atlantic University and H. Jeff Smith of Miami University. The paper, which won the MIS Quarterly Impact Award in 2021, responded to the era’s privacy outcries from internet users and policymakers who were appalled to learn that companies such as AOL and Facebook collected, stored and even profited from individuals’ sensitive information.
Xu and her co-authors examined what made data collection acceptable in some cases and objectionable in others. They investigated how privacy was defined in scholarly literature, whether privacy concerns could be measured and whether privacy theory could be generalized. After determining that contextual factors — such as whether the information was used by a health insurance company versus a social media platform — prevented privacy theory from being generalized across all situations, the authors argued that researchers and practitioners should move away from a normative focus that emphasized ethical theories and morals to an empirical one rooted in observation. Thanks in part to the influence of this paper, empirical studies on information privacy multiplied over the ensuing decade.
Underpinning much of this scholarship was how organizational policies and legislation could govern flows of personal data. In a data flow, information is collected from an individual, then processed, stored, analyzed and shared. Policymakers attempting to govern this flow regulate what data can be collected, how it is stored and processed, and with whom it can be shared. They scrutinize data breaches by pinpointing precisely where in the data flow cycle information was vulnerable.
“Carefully analyzing data flows to understand which of these steps raised red flags or led to privacy breaches was my career until around 2015,” said Xu. “But then, after the arrival of generative AI, everything changed.”
With AI, an organization does not need to collect personal data from an individual to learn about them. In some cases, previously collected data can be lawfully retained through data-anonymization techniques, which dissociate records from their subjects so that none of the information can be traced back to any individual. In other situations, organizations can ask for permission to collect specific data from individuals. Generative AI memorizes and analyzes the resulting massive amounts of legally available data and then infers how an individual might behave based on that data.
“Machine learning algorithms don’t need to exactly pinpoint who I am,” she explained. “They just need to know statistically what I might prefer in order to send me targeted ads or influence or even change my decision making.”
Through machine learning algorithms, AI can rifle through millions of data points and group them in various ways to create profiles. These collective entities are then used to infer characteristics about individuals. If someone creates a profile on Facebook, for example, and engages with the app by watching videos or making purchases, Facebook’s AI can approximate which products are likely to appeal to that person or which politicians they are likely to vote for based on how their behavior aligns with collective entities.
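To make this concrete, here is a minimal Python sketch of the kind of knowledge inference described above. Everything in it is hypothetical: the behavioral features, the three latent populations and the cluster-to-ad mapping are invented for illustration, and real platforms use far richer models. The point is that no name, email or identifier ever appears; collective profiles alone are enough to target an individual.

```python
# Minimal sketch of "knowledge inference": no identities are collected,
# yet an individual's likely preference is inferred from collective profiles.
# All features, populations and ad labels are hypothetical.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# Anonymous behavior vectors: [videos watched, purchases, ad clicks],
# drawn from three invented user populations. No names or IDs anywhere.
population = np.vstack([
    rng.normal(loc=[30, 1, 2], scale=2.0, size=(100, 3)),   # casual viewers
    rng.normal(loc=[5, 10, 1], scale=2.0, size=(100, 3)),   # heavy shoppers
    rng.normal(loc=[15, 3, 12], scale=2.0, size=(100, 3)),  # ad engagers
])

# Group the pool into collective profiles ("collective entities").
profiles = KMeans(n_clusters=3, n_init=10, random_state=0).fit(population)

# Hypothetical ad preference attached to each collective profile.
preferred_ad = {0: "streaming service", 1: "retail promotion", 2: "sponsored post"}

# A user the platform never collected identifying data from: one session's
# behavior is enough to map them to a profile and target them.
new_user = np.array([[6.0, 9.0, 2.0]])
cluster = int(profiles.predict(new_user)[0])
print(f"Targeted ad: {preferred_ad[cluster]}")
```

Notably, the pool of training vectors here could be fully anonymized and the prediction would be unaffected, which is precisely the gap in data-flow-based governance that Xu identifies.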
Xu calls this “knowledge inference” because an AI can infer insightful information about an individual without knowing anything specific about them. She explained that this has upended any laws or policies that were created to protect privacy based on personal data flows.
“Our current laws are very weak and don’t address this situation because data privacy law is based on collecting my data,” she said. “Facebook can accurately target me and make me feel intruded upon and annoyed, but if I file a lawsuit against them, they can prove that they never collected any data from me.”
AI’s extrapolation of knowledge from collective entities rather than singular persons creates a quandary for current laws such as the European Union’s General Data Protection Regulation (GDPR), which establishes privacy rights and protections for individuals. Further, though data-anonymization techniques might erase any links to personal information, AI is still inferring information from anonymized or “erased” individuals by mining the collective data pool.
Given the fundamental issues at the center of data privacy governance, Xu suggested that scholars and policymakers start from scratch in how they define and address the problems they are trying to solve.
“I don’t have a magical term yet, but generally I think ‘data protection’ is more meaningful than saying ‘privacy protection,’” said Xu. “Because our brain associates privacy with personal protection and with the new generation of deep learning, it’s all statistical inference based on a population rather than an individual. So I think our focus should be on data itself rather than on personal data.”
With AI upending the prevailing wisdom around data and privacy, Xu said it is an especially provocative time to research privacy. In a guest editorial for MIS Quarterly, “Reflections on the 2021 Impact Award: Why Privacy Still Matters,” she and co-author Dinev contemplate the 10 years since they published “Information Privacy Research: An Interdisciplinary Review.” They conclude that privacy concerns have shifted from an individual to a collective focus and that future research should emphasize the technical design of privacy protection techniques as well as their societal impacts.
“I am excited for the fact that privacy and privacy research won’t disappear [due to AI],” said Xu. “Instead, its focus will shift from data control (‘don’t misuse my data’) to inference control (‘don’t make that prediction about me’).”
She believes that Mendoza is an ideal environment to explore the interdisciplinary nuances inherent to privacy research, especially with the school’s emphasis on ethical business leadership. In her class Ethics of Data Analytics, Xu is often struck by how many of her undergraduate students have already taken several ethics courses. That grounding will set them apart as future business leaders. It will also keep them busy, given that the legal definition of privacy needs to be refined and that AI is challenging long-held assumptions about how businesses leverage technology.
Xu recently advanced research into data ethics with the paper, “Implications of Data Anonymization on the Statistical Evidence of Disparity,” published in Management Science. Co-authored with Nan Zhang, professor of IT, Analytics, and Operations at Mendoza, the study examines how data anonymization disparately impacts privacy protection for underprivileged subpopulations.
Though data anonymization has been studied to determine its limitations, this paper highlighted a different potential pitfall. Could obscuring individual identities in datasets also hide existing disparities, thus perpetuating societal inequalities? Xu and Zhang tested two common mechanisms of data anonymization: data removal and noise insertion. Data removal entails removing a combination of variables that could be used to identify an individual. Noise insertion, on the other hand, involves adding random values to the dataset to conceal the impact of any single person’s data. By testing and comparing both methods, they demonstrated that data anonymization can indeed inadvertently mask, or even fabricate, evidence of disparate impact.
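To make the two mechanisms concrete, here is a minimal Python sketch. All counts, group labels and noise scales are invented, and it simplifies rather than reproduces the paper's analysis: a chi-square test finds clear evidence of a gap in approval rates between two groups, data removal makes the gap untestable, and noise insertion makes the statistical evidence unstable.

```python
# Minimal sketch of how anonymization can weaken evidence of disparity.
# All numbers are hypothetical; this illustrates the general mechanism only.
import numpy as np
from scipy.stats import chi2_contingency

rng = np.random.default_rng(0)

# Invented loan decisions: a small minority group with a lower approval rate.
#                  approved  denied
table = np.array([[900, 100],    # majority group (90% approved)
                  [ 60,  40]])   # minority group (60% approved)

# On the raw data, the disparity is unmistakable.
print(f"raw p-value: {chi2_contingency(table)[1]:.2e}")

# Mechanism 1, data removal: if the group attribute is suppressed as a
# quasi-identifier, only pooled totals survive and the gap is untestable.
print(f"after removal, only pooled totals remain: {table.sum(axis=0)}")

# Mechanism 2, noise insertion: Laplace noise on each cell (a common
# differential-privacy-style mechanism) makes the evidence unstable; a
# nontrivial share of draws no longer reaches significance at 0.05,
# even though the underlying disparity never changed.
p_noisy = [chi2_contingency(np.clip(table + rng.laplace(scale=80, size=(2, 2)),
                                    1, None))[1]
           for _ in range(1000)]
print(f"noisy draws losing significance: {np.mean(np.array(p_noisy) >= 0.05):.0%}")
```

In a deployed system the noise scale would be set by a privacy budget rather than chosen by hand, but the tension is the same: the perturbation that protects individuals also perturbs the statistical signal that a disparity test relies on.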
Xu said that the goal for the paper was not just to reveal the unintended consequences of privacy protection, but to advocate for disparity-aware data protection. She remarked that these approaches should “protect people’s privacy without erasing their ability to be seen.”
“There’s a huge gap between what is legal versus what is ethical in terms of using data and using data for training today’s machine learning algorithms,” she said. “This can become a competitive advantage for AI and data ethics education … I think this is a completely new field now.”