2022.08.11

How to do Phone Privacy (Pretty Good)

Today we do a deeper dive into the thinking behind PGPP (Pretty Good Phone Privacy), which we announced this week.

The Core of Privacy, and Our Goals

At its core, privacy is being free from observation. For most of the thousands of years of human history, it’s not something we’ve had to think much about: there was no way for anyone other than those physically near us to observe us, and there was no way for information about specific people, let alone the mass of humanity, to be permanently gathered. This has changed in the span of perhaps a decade, and few people have grappled with the profound historical significance of our collective loss of privacy.

With privacy come many things. At the individual level, privacy provides peace of mind: a freedom from chilling effects that allows one to innovate, experiment, and speak one’s mind. At the societal level, privacy enables society as a whole to advance. However, we live in a world of ubiquitous surveillance – data collection about every aspect of our lives, largely by companies whose names we have never heard of and with which we have no relationship. Critics of these practices are right that this surveillance is not only widespread but also hard to fight: perfect privacy in the year 2022 is nearly impossible to achieve. Even when users try to protect themselves, companies are incentivized to adapt, finding new ways to keep tracking them and collecting this lucrative data.

It’s in this context that we developed PGPP. The name is an homage to a classic privacy technology, PGP (Pretty Good Privacy). It is also a reminder to ourselves and to the world that, when it comes to privacy in 2022, pretty good rather than perfect is worth striving for, and that we can keep improving on it with each passing month.

User Identifiers, Network Identifiers, Other Identifiers

Data collection, even in the pre-computerized era, has always hinged on data identifiers and data classification. Without the ability to categorize and classify data about the world – in the words of anthropologist James C. Scott, to “see like a state” – the data broker ecosystem, the surveillance state, and many other mechanisms that harm our privacy would not be possible. When a data broker aggregates information and sells virtual dossiers about individuals built from its ill-gotten data, it assembles a cohesive picture of a person by piecing together records tagged with various identifiers. The more accurate and comprehensive that dataset, the more valuable it is to data brokers and their customers, and the more that data can be turned against us. Given the largely (though not exclusively) commercial nature of the privacy risks we face today, we can frame our goal as reducing the value – the accuracy and breadth – of the datasets they collect until collecting them is no longer worth the effort. This is not a black-and-white matter, so the fight for privacy is an incremental one, waged against both the shadowy data brokers and the larger entities – social media companies, governments, etc. – that do the same data collection out in the open.

All modern technologies use identifiers to track us – these are the keys into the databases, the entries in the routing tables – but this tracking often begins with a more innocuous purpose. To route Internet traffic to us, a device needs an IP address at which others on the Internet can reach it. To connect to a mobile network, a SIM needs an IMSI that is associated with a billing account. These are network identifiers. There are also numerous human identifiers meant to uniquely identify a person: national ID numbers, passport numbers, driver’s license numbers, and much more, down to something as simple as a full name. And finally, we live in an era of ubiquitous devices, everything from laptops and phones to RFID tags embedded in ordinary objects, and these all have identifiers of their own.

Our goal in PGPP is to decouple, to the extent possible, the human identifiers associated with a person from the network identifiers associated with a phone. In doing so, we chip away at the data that can be meaningfully aggregated at scale by those who seek to do so. While data collectors will always adapt to technologies like PGPP, we view restoring privacy as a complex, layered, incremental process that involves technical advances like PGPP alongside political and social change.
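To make the decoupling idea concrete, here is a minimal Python sketch, under simplifying assumptions, of why a shared identifier is what makes aggregation possible. The IMSIs, names, cell IDs, and the `build_dossiers` function are all hypothetical and exist only for illustration. This is not how any particular network or data broker actually stores data. The same location records turn into a dossier only when some database links the network identifier to a human identifier.

```python
from collections import defaultdict

# Hypothetical location records a mobile network might hold, keyed by IMSI.
location_pings = [
    ("310150123456789", "cell-17", "08:00"),
    ("310150123456789", "cell-42", "12:30"),
    ("310150987654321", "cell-03", "09:15"),
]

# Coupled (hypothetical): the billing table maps each IMSI to a named account.
coupled_billing = {
    "310150123456789": "Alice Example",
    "310150987654321": "Bob Example",
}

# Decoupled (hypothetical): billing knows who paid but holds no IMSI to join on.
decoupled_billing: dict[str, str] = {}


def build_dossiers(pings, imsi_to_name):
    """Join location pings to human identities via a shared network identifier."""
    dossiers = defaultdict(list)
    for imsi, cell, timestamp in pings:
        name = imsi_to_name.get(imsi)
        # The join succeeds only if this IMSI is linked to a person.
        if name is not None:
            dossiers[name].append((timestamp, cell))
    return dict(dossiers)


print(build_dossiers(location_pings, coupled_billing))
print(build_dossiers(location_pings, decoupled_billing))  # → {}
```

With the coupled table, both pings from the first IMSI end up under one name. With the decoupled table, there is nothing to attach the pings to, and the result is empty.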

Decoupling on a Phone

Two network identifiers have long been used as the keys in databases about us: the IP address and the IMSI. The former is used on the Internet, where some companies boast of the breadth of Internet traffic they can see, even across VPNs. The latter, the IMSI, is today permanently associated with a SIM and a phone number. It is used as a database key in the core software of mobile networks, not only to keep mundane information (such as for connectivity and mobility support) but also as an easy means of aggregating location data. Alongside these two network identifiers are the human identifiers we present to companies: our full names (along with other information, such as home addresses) that uniquely identify us. In those companies’ databases, these are permanently linked to our accounts, and they are often inadvertently leaked through negligence or sold for profit.

PGPP’s mechanisms decouple a user’s human identity from their network identity so that the two are difficult to put back together using existing databases and methods. When a user connects to the Internet using PGPP Relay, all that our INVISV first-hop server learns is that some subscriber is using our system, and nothing more. Fastly, the second hop, learns that some subscriber has requested a connection to some IP address on the Internet, but not who or what. Thus the user’s human identity stays separate from their network identity. With PGPP mobile connectivity, the user’s account is associated with a credit card at subscription time, though that card could be prepaid and not associated with a person’s name. We then use Chaum’s blind signatures to issue blinded tokens to the phone. These tokens let it prove that it is entitled to data-only connectivity when its IMSI changes, without revealing which user holds which token.
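Chaum’s blind signature scheme can be illustrated with textbook RSA. The following is a toy Python sketch of the general technique, not PGPP’s actual token protocol. The key parameters, the `token_message` hashing step, and all function names are assumptions chosen for illustration, and textbook RSA with a key this small is not secure. The phone blinds a random token before the issuer signs it. The issuer can later check the signature but cannot link it back to the signing request.

```python
import hashlib
import math
import secrets

# Toy issuer key (assumption). These Mersenne primes are public knowledge, so
# the key is NOT secure; real systems use 2048+-bit keys and a vetted scheme.
P = 2**61 - 1
Q = 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))  # issuer's private exponent


def token_message(serial: bytes) -> int:
    """Map a random token serial to an integer mod N (simplified full-domain hash)."""
    return int.from_bytes(hashlib.sha256(serial).digest(), "big") % N


def blind(m: int) -> tuple[int, int]:
    """Phone side: hide m behind a random factor r before sending it to the issuer."""
    while True:
        r = secrets.randbelow(N - 2) + 2
        if math.gcd(r, N) == 1:
            return (m * pow(r, E, N)) % N, r


def issuer_sign(blinded: int) -> int:
    """Issuer side: sign (after checking the account); it never sees m itself."""
    return pow(blinded, D, N)


def unblind(blind_sig: int, r: int) -> int:
    """Phone side: remove r, leaving an ordinary RSA signature on m."""
    return (blind_sig * pow(r, -1, N)) % N


def verify(m: int, sig: int) -> bool:
    """Anyone holding the public key (N, E) can check a presented token."""
    return pow(sig, E, N) == m


serial = secrets.token_bytes(32)
m = token_message(serial)
blinded, r = blind(m)
sig = unblind(issuer_sign(blinded), r)
print(verify(m, sig))  # True
```

The issuer only ever sees `blinded`, which is the token multiplied by `r` raised to `E`. It therefore cannot match the unblinded signature it later sees to the account that requested it. A production system would use a standardized construction such as the RSA blind signatures in RFC 9474. In token schemes generally, the verifier also records redeemed serials so that each token can be used only once.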

Every personal electronic device carries many hardware identifiers. Laptops have MAC addresses, CPU serial numbers, and many more. Phones have IMEIs, EIDs, serial numbers, MAC addresses, and more. Fortunately, networks typically cannot rely on these identifiers to be unique and unchanging, so they are not the primary identifiers networks use, and Bluetooth and WiFi MAC addresses in particular can easily be randomized. Hardware identifiers are also not directly associated with the user’s identity. The information about them, when it exists (such as in the stolen handset database), is split across multiple distinct parties rather than being readily usable for bulk data collection. We imagine this will change with time, but for now hardware identifiers are not the primary vector for data collection. And even eliminating hardware identifiers would be insufficient: as our colleagues at UCSD found in recent work, phones can be identified by their physical Bluetooth signals alone, without any unique hardware identifier. Nor is this unique to Bluetooth or recent hardware: it has long been known that devices can be uniquely identified by the imperfections left over from the manufacturing process.

Those who take privacy very seriously (and we count ourselves among them) will note that the existence of any permanent identifier reduces a user’s privacy. This is an unfortunate consequence of the centralization of manufacturing, standardization, and operation of modern networks and devices. Each year we learn of new hidden identifiers and communications buried deep within onboard chips that users have little visibility into. Unfortunately, the ability to identify devices on and off the network seems unlikely to ever go away. So our aim should continue to be to decouple people’s human identities from their device and network identities.

Practical Objectives and the Strongest Mobile Privacy

A few years ago, we reached a point where perfect privacy seemed impossible – that’s the reality of it. So we started to ask ourselves: what practical protections can we deploy for ordinary users against bulk data collection, which sweeps users up into massive datasets without their consent? That’s the dominant mode of data collection right now, and the one where we felt a defense could have broad-based impact even if it was just pretty good, not perfect.

Unfortunately, mobile networks pose inherent challenges to privacy, because they are designed and built as a walled garden by a consortium of major operators and device manufacturers. Technologies like WiFi, on the other hand, are far more open and involve less inherent surveillance, since anyone can set up a WiFi network with cheap and relatively open hardware. Right now mobile network access is a necessity of life for many people, and for such users we view PGPP Mobile Pro/Core as a pretty good option, because it raises the privacy bar against data collection by mobile networks and others. However, those data collectors can and will adapt to PGPP and find other means to collect user data.

Users who are able to opt out of mobile networks entirely can instead use PGPP Relay on WiFi networks and avoid nearly all of the tracking that can be done within a mobile network itself. Such a user can hop from WiFi network to WiFi network, combining local WiFi MAC randomization with PGPP Relay’s IP address decoupling. Together with a judicious choice of apps and Android variant, this can provide a high degree of privacy and security. (This still doesn’t prevent device identification based on physical signals, but we are not aware of any consumer-grade techniques for mitigating such attacks.)
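MAC randomization itself is simple to illustrate. The Python sketch below generates a random, locally administered, unicast MAC address of the kind a device might present to each new WiFi network. The function name and the SSIDs are hypothetical. It does not reproduce how Android or any specific operating system implements the feature, and actually applying the address to a network interface requires OS-level support not shown here.

```python
import secrets


def random_private_mac() -> str:
    """Return a random, locally administered, unicast MAC address (hypothetical helper)."""
    octets = bytearray(secrets.token_bytes(6))
    # In the first octet, bit 0x02 marks the address as locally administered
    # (not a vendor-assigned, OUI-based address) and bit 0x01 marks multicast.
    # Set the former and clear the latter.
    octets[0] = (octets[0] | 0x02) & 0xFE
    return ":".join(f"{b:02x}" for b in octets)


# A fresh address for each (hypothetical) network the device joins.
for ssid in ["cafe-wifi", "library-guest", "airport-free"]:
    print(ssid, random_private_mac())
```

Setting the locally administered bit marks the address as one not assigned by a manufacturer. Clearing the multicast bit keeps it a valid unicast address. As a result, each network sees a different address that has no connection to the device’s burned-in one.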

The Decoupling Principle Writ Large

INVISV’s goal is to take practical steps to ensure privacy for ordinary people, and the Decoupling Principle we mention above is a key part of our approach. We’ll have more to say on that in an upcoming post.