Conspicuously missing data

by John, from John D. Cook

I was working on a report for a client this afternoon when I remembered this comic from Spiked Math.

[Image: three_logicians.png]

I needed to illustrate the point that revealing information about one person or group can reveal information on other people or other groups. If you give your genetic information to a company, for example, you also give that company (and every entity they share your data with) information about your relatives.

This comic doesn't illustrate the point I had in mind, but it does illustrate a related point. The third logician didn't reveal the preferences of the first two, though it looks like that at first. Actually, the first two implicitly reported their own preferences.

If the first logician did not want a beer, he or she could have said "No" to the question "Does everyone want a beer?" Answering this question with "I don't know" is tantamount to answering the question "Do you want a beer?" with "Yes." What appears to be a non-committal answer is a definite answer on closer examination.
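The reasoning is mechanical enough to sketch in Python. This is only an illustration under stated assumptions: every logician answers truthfully and hears all earlier answers, and the function names `answers` and `inferred` are made up for this example.

```python
def answers(prefs):
    """Each logician in turn answers "Does everyone want a beer?"

    prefs is a list of booleans: True means that logician wants a beer.
    """
    out = []
    for i, wants in enumerate(prefs):
        if not wants or "No" in out:
            # Someone (possibly me) does not want one, so not everyone does.
            out.append("No")
        elif i == len(prefs) - 1:
            # Everyone before me implied yes, and I want one too.
            out.append("Yes")
        else:
            # I want one, but I cannot speak for those after me.
            out.append("I don't know")
    return out


def inferred(given):
    """What a listener learns about each speaker's own preference.

    True/False is a definite preference; None means nothing was revealed.
    """
    result = []
    for i, a in enumerate(given):
        if a in ("I don't know", "Yes"):
            result.append(True)
        elif "No" in given[:i]:
            result.append(None)   # merely repeating an earlier "No"
        else:
            result.append(False)  # the first "No" is the speaker's own
    return result


said = answers([True, True, True])
print(said)
print(inferred(said))
```

Every "I don't know" decodes to a personal "Yes," which is the point of the comic: the listener recovers each preference without anyone stating it outright.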

One of the Safe Harbor provisions under HIPAA is that data may not contain sparsely populated three-digit zip codes, meaning those covering 20,000 or fewer people. Sometimes databases will replace sparse zip codes with nulls. But if the same database reports a person's state, and the state has only one sparse zip code, then the data effectively lists all zip codes. Here the suppressed zip code is conspicuous by its absence. The null value itself didn't reveal the zip code, nor did the state, but the combination did.
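Here is a minimal sketch of that inference in Python. The state names, zip prefixes, and the `SPARSE_ZIP3_BY_STATE` table are all hypothetical placeholders, not real data; the only assumption is that an attacker knows which three-digit zip codes are sparse in each state.

```python
# Hypothetical lookup: the sparse three-digit zip codes in each state.
SPARSE_ZIP3_BY_STATE = {
    "State A": {"111"},          # only one sparse zip code
    "State B": {"222", "223"},   # two sparse zip codes
}


def candidate_zip3(record):
    """Return the set of zip codes consistent with a record.

    A null zip code signals that the true value was sparse, so the
    candidates are the sparse zip codes in the record's state.
    """
    if record["zip3"] is not None:
        return {record["zip3"]}
    return SPARSE_ZIP3_BY_STATE.get(record["state"], set())


suppressed = {"state": "State A", "zip3": None}
print(candidate_zip3(suppressed))
```

When the candidate set has a single element, as it does for the hypothetical "State A," the null has hidden nothing. For "State B" the null still narrows the possibilities to two.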

A naive approach to removing sensitive data can be about as effective as bleeping out profanity: it's not hard to infer what was removed.

The post Conspicuously missing data first appeared on John D. Cook.