App got your data?

Who you gonna call?

Hi. Can here. Let’s talk about…you guessed it, data. I wrote this piece a few weeks ago for an opinion section, but plans have changed. Pardon the unusually formal tone, and the length.

Countless tech companies are built on the data that comes from our phones. Find a date with Tinder, take an Uber to a restaurant suggested to you by Yelp, consult Clue for some, let’s say, romantic advice. And maybe, if all goes well, there’s Glow to check on your baby’s progress. Even for the birds and the bees, there’s an app for that! True, but what they often don’t tell you is that there’s also a database behind that app.

Think about it. Would you tell a person you just met on the street where you were last night? You might, for example, share your financials with an institution that is bound by some regulation, but divulging your medical history to a teenager who just learned how to code by following an online tutorial would be insane. Yet, as the tools for building technology advance and the bar for creating a compelling, useful app lowers, new risks that we never thought about arise.

The untold reality is that you don’t know whether all the intimate details you put into your phone are going into a secure computing system whose access is tightly controlled, or into a random computer sitting in some guy’s basement. On your phone’s screen, they all look the same.

From Clue’s blog post about the privacy of health data

In my past as a software engineer, I worked at several tech companies, small and large. I helped build a social news website at Digg, and was an early employee at both tutorial platform Snapguide and cloud storage company Upthere. Then, I spent around three years at Uber, working on both the infrastructure and security teams. Throughout my career, the teams I worked on collected data about people’s reading habits, built instant messaging features where people could meet and build relationships, and helped millions of people take billions of Uber trips.

One thing I learned in Silicon Valley was that even the smallest start-ups can grow very quickly. A weekend project to store some files online can become a service used by millions of people in the span of weeks. This growth doesn’t just come in the form of users, but also employees: It’s not unheard of for a company in so-called “growth mode” to double its number of employees every six months (at smaller companies, that rate could be even faster).

With Growth Come Problems

With this fast growth come two significant problems. The first is that, for many firms, internal data controls are hardly top-of-mind concerns. They are secondary at best, if not much later considerations. At some level, this is rational. Unless you’re working in a regulated industry, the pitch deck you send to venture capitalists isn’t made more attractive by showing how sophisticated your access control systems are.

And, realistically, product managers and engineering leaders at fast-growing start-ups aren’t going to tighten and limit data access for their engineers when the mission is to increase user growth at all costs. Almost by definition, building systems to monitor, restrict, and audit data access introduces friction to product development. Unless you have an internal company culture that values privacy from day one, it’s tough to change “the way things are done around here”. Culture eats strategy for breakfast, but it is also much harder to remold. No one likes it when you start throwing sand into the gears of their workflow. People join companies for their culture, and don’t like it when it changes under them.

I’ve been on both sides of this. When I was at a small company, and our users were all friends and family, I got used to investigating error reports by logging into other users’ accounts with little hesitation. Yet, when we launched the product to external (still beta) users, and we had to build sophisticated access control systems, I could feel my blood boiling. I knew, at an academic level, that limiting access was the right thing to do, but all I could think about was how unproductive I would become. Lots of grumbling ensued, and many angry emails followed.

Then, a few years later, I happened to find myself building the same kind of systems that blocked others’ access. It was my adjacent team, at a much bigger company, that was limiting thousands of employees’ access to customer data. We were the recipients of much grumbling, and some of those many angry emails were addressed to me. Some argued the blocks would make our service less safe, while others straight-up complained about how unproductive they would become. My responses about how I’d been there, and how everything would be okay, fell on deaf ears. The teams marched along anyway, and we built the required systems to limit access.

Uber’s God View circa 2011.

I’ve also seen how it can go wrong when companies are loath to limit employee access to user data. Although God View predated me at Uber, stories of its misuse were a matter of both public and private record. Named as a small nod to a specific type of video game where you play God, the tool allowed employees to track down users in real time. Its heavenly allure, however, proved too powerful for most mere mortals to handle. First, there were the allegations that Uber employees used the tool as a neat party trick, and then the news hit that an executive had texted a journalist who was en route about the journalist’s location. Uber retired the free-for-all access to God View (if it ever had it) and settled with the FTC over allegations of improper access to user data.

Building roadblocks at a quickly growing company is a tough pill to swallow. But a more insidious issue is how few people, the end users of these apps and products, know how much data, how much private information, is collected on them each and every day. According to analytics firm App Annie, the average user in the US, South Korea, and Australia has over 100 apps on their phone. People in the US spend around 3 hours of their waking time on those apps every day, but your phone doesn’t sleep. According to the Washington Post’s research, more than 5400 apps still talk to some servers when you are catching some Zs.

Dude, Where’s My Data?

Once the data leaves your phone (or computer), it acquires a quasi-public quality where it is visible to anyone but you. It could be sitting on a hard drive, in a computer that is sold by accident one day. Or it could get mixed and matched with billions of other data points from thousands of other apps and sold to the highest bidder, and there will be thousands, if not hundreds of thousands, of people who’ll have the ability to peer into it, on a whim. This is not some far-fetched, dystopian fantasy but the reality that we live in.

A common defense for this dragnet data collection is that users simply do not care about their privacy, or they are okay with providing data for the added convenience. This is possible, but it’s incredibly naïve. For many users, the type and quantity of data they hand over to the apps are shrouded behind pages and pages of legalese. It’s unfair to expect people who simply want to get from point A to point B, or who simply want to share a photo with their friends, to decipher complicated regulations. 

Maybe more importantly, for most users, there’s not much of a choice. A company that can monetize such private data, be it through advertising or some other data-sharing scheme, will always have an advantage over one that doesn’t. Combine that with the infamous tech monopolies, and it’s hard to argue that people’s revealed preferences, which the free-market ideologues love to point to, really reveal that much.

Even technically sophisticated users like myself find it impossible to imagine where their data will end up. Small tech companies rarely have the required access controls or the resources to build them. Worse, inside the company, it’s a free-for-all. For the start-ups that make it to the big leagues, most of the internal technology is developed in such a hodge-podge manner that it’s hard to keep track of where any piece of data will end up. It’s not uncommon, for example, for a well-meaning analyst to run a database query for a weekly report and end up including some user’s private data in a PowerPoint. Once the data is in the system, it gets used, and abused.

Make Things Explicit

Maybe he reads Margins?

Not all hope is lost, though. As data privacy makes its way from a nerdy concern to a common topic in national debates, we should see more companies take internal security more seriously, especially big companies, where the fear of embarrassment is a strong motivator. For smaller firms, however, things are murkier; most of them operate in a regulatory vacuum, and by the time they are big enough, it might be too late.

A good starting point would be making data collection and storage more explicit, instead of the default. Europe’s Right to be Forgotten and GDPR regulations have their problems, but they do force companies to keep track of how and where users’ data is stored. California’s own GDPR-lite (or GDPR Pro), as kneecapped as it might be for now, might help. Bringing that accountability forward, when products and the technologies that underlie them are designed — instead of trying to shoehorn a deletion system into the process later — would be good for everyone.

A more involved approach would be figuring out what kinds of data require special care. Personal contact information and location data are good starting points. There’s some precedent here, with HIPAA regulating the storage and portability of individuals’ medical data, as well as PCI compliance, an industry standard that sets rules on storing credit card information.

Initially, it might be tricky to regulate without stifling innovation. But technology firms can also stay true to their DNA and innovate their way out and up. For example, new technologies like differential privacy can enable analysts to run their reports on aggregate data while preserving individuals’ privacy. Doing more of the data processing on users’ devices, instead of transferring it to a server, might limit the risk of massive data breaches. And new companies might pop up that can do the heavy lifting of handling regulation and compliance while allowing smaller firms to focus on their core businesses.
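For the curious, here is roughly what the differential privacy idea looks like in code. This is only a minimal sketch of one common approach, the Laplace mechanism applied to a counting query, and not a description of how any particular company does it. The trip records, the `private_count` helper, and the choice of `epsilon` are all made up for illustration.

```python
import random


def laplace_noise(scale, rng):
    """Draw one sample from a Laplace distribution centred on zero."""
    # The difference of two independent exponential draws, each with mean
    # `scale`, follows a Laplace(0, scale) distribution.
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)


def private_count(records, predicate, epsilon, rng=None):
    """Return a noisy count of the records that match `predicate`.

    Hypothetical helper: a smaller `epsilon` means more noise and
    therefore stronger privacy, at the cost of a less accurate report.
    """
    if rng is None:
        rng = random.Random()
    true_count = sum(1 for record in records if predicate(record))
    # Adding or removing one person changes a count by at most 1, so the
    # sensitivity is 1 and the noise scale is 1 / epsilon.
    return true_count + laplace_noise(1 / epsilon, rng)


# Made-up trip records standing in for a real table.
trips = [
    {"user": "a", "city": "SF"},
    {"user": "b", "city": "SF"},
    {"user": "c", "city": "NYC"},
    {"user": "d", "city": "SF"},
]

# The analyst sees only the noisy number, never the individual rows.
weekly_report = private_count(trips, lambda t: t["city"] == "SF", epsilon=0.5)
print(f"Trips in SF this week (noisy): {weekly_report:.1f}")
```

The analyst still gets a number that is useful in aggregate, but any single person's presence in the table is masked by the noise. Picking `epsilon`, and keeping track of how many such queries get answered, is the hard part that this sketch leaves out.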

What I’m Reading

Inside R/Relationships, the Unbearably Human Corner of Reddit: Reddit is a dark place, but there are diamonds in the rough. In an age where every other social network tries to eschew any and all human moderation, r/relationships may offer some clues on how to build, grow, and sustain a real community. Let’s make this woman the next CEO of Facebook.

Anne’s rules forbid gendered insults, including bitch, obviously, but also dick, somewhat perplexingly. They forbid alpha and beta, because that dichotomy attracts the Red Pill crowd. They forbid external links or images of any kind. (“People will go through a breakup and post revenge porn, and we’re not going to have that,” Anne explained. “Or they’ll post 15 pictures of a text-message exchange. I would rather roll naked in my own vomit.”)

Silicon Valley billionaires’ strange new respect for Elizabeth Warren: I can neither vote in the US (yet), nor am I a billionaire (also, yet?) but as a Silicon Valley person, I am, like most people, surprised and impressed by Warren’s ascent in the race for the Democratic Party nomination.

Also cutting $2,800 checks to Warren in recent months are former Y Combinator chief Sam Altman; the founder of Sonos, John MacFarlane; and Chris Sacca, a billionaire investor who runs a network of Silicon Valley liberal donors. Just as rank-and-file employees at Google are surprisingly pro-Warren, counterintuitive cracks are beginning to show at the elite level as well.