## Monday, October 28, 2019

### Anonymous Summary Statistics or How to protect nerds from bullies

School is out for the semester and it’s time for report cards! Each student receives their report card in the mail and opens it to see how their semester has gone. The report card contains a list of subjects, each with the student's mark along with some summary statistics (the class average and standard deviation). Given the summary statistics a student can determine how well they've performed relative to their peers and whether they are an exceptional student.
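To make that concrete, here's a minimal sketch of how a student might gauge their standing from those two numbers. The marks here are invented for illustration:

```python
# Hypothetical report-card line: the student's own mark plus the
# class average and standard deviation printed beside it.
mark, class_avg, class_std = 92.0, 74.0, 8.0

# A z-score tells the student how many standard deviations they sit
# above (positive) or below (negative) the class average.
z = (mark - class_avg) / class_std
print(f"z-score: {z:.2f}")  # prints "z-score: 2.25" -> an exceptional student
```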

Now let's say that there are some bullies in the class. These bullies got their report cards and weren't too happy with the results. They were far below the class average! The bullies want to know who to direct their anger towards. They know some nerd got great marks and they want to know who that nerd is!

If you are the principal of the school and need to create classes, how can you do so to protect the nerds from bullies?

Let's go over some cases.

### Case 1 - Home Schooled

*As if being homeschooled wasn't punishment enough*

This case sucks. If you are the nerd in a home school of two, then the bully is going to know who you are very quickly. In a case like this, the only way to protect the nerd would be to forgo sharing any summary statistics at all.

### Case 2 - Well Mannered School

Let's suppose the school is largely full of good kids. If the frequency of bullies among the kids is low, then we don't need to make too many considerations for the nerd's anonymity. As long as well-mannered kids outnumber the bullies, there is little risk to the nerd.

### Case 3 - Rough School

*oof*

If the school is a rough one, then there are likely a lot of bullies. With a high frequency of bullies, class sizes would need to be large in order to ensure that the nerd has some other good-natured students in their class.

In reality most schools would fall into either case 2 or 3. The only issue is that in real life people aren't just bullies or well mannered; they have good days and bad days. This means that in any class, the number of bullies won't be static.

Knowing all of this how can we design classes and report cards in order to protect nerds?

### Security through obscurity

In this design some key information is kept from the bullies. If a group of bullies didn't know how many other students were in the class, they would be unable to make use of any of the summary statistics. By obfuscating some key information, the identity of the nerd is kept safe. In practice this is a poor security measure: key information that is static is likely to be eventually determined by the bullies.

### Security through large classes

For this design we make class sizes very large in the hopes that not everyone is a bully. If classes are large enough, then the likelihood of every student except the nerd being a bully is low. This is a decent security measure, but a faulty one, as large class sizes come with trade-offs of their own.

### Security through fuzzy metrics

For this design we don't give students the whole picture. Instead of printing the average on a report card, we provide a fuzzy metric: something like an arrow showing whether you are above or below average. If the class's marks are normally distributed, then we expect roughly half the students to be above average, protecting the identity of the nerd. This is a great security measure as it protects anonymity without too large a trade-off.
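A quick sketch of the idea, using invented names and marks. Several students share the same arrow, so an ↑ alone can't single out the nerd:

```python
import statistics

# Hypothetical class marks; names are made up for illustration.
marks = {"Avery": 71, "Blake": 78, "Casey": 80, "Drew": 65, "Elliot": 95}
avg = statistics.mean(marks.values())  # 77.8

# Replace the exact average with a fuzzy arrow: above or below.
arrows = {name: ("↑" if mark > avg else "↓") for name, mark in marks.items()}
print(arrows)
# Three students show "↑", so the top mark stays hidden among them.
```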

#### Requirements to determine the nerd's marks

In a case where bullies want to determine the marks of the nerd, all of the following conditions must hold:

- All students except one must be bullies
- All bullies must work together and share report cards communally
- All bullies must be in the same class as the nerd
- Bullies must know the total number of students
- Bullies must know who in the class isn't a bully

### Breaking Character

The example above can be used as a way to conceptualize bad actors in data science. If summary statistics are to be shared between untrusting parties in the presence of bad actors, then how can risk be quantified?

To my knowledge there is no magic formula to determine the threat of anonymity being broken. Only strong domain knowledge and an understanding of the risk avenues can lead to the design of safe products.