A short talk about cryptography at the Berkman Klein Center

On the 7th of July, Aaron and I, as interns at the Berkman Klein Center for Internet and Society, gave a presentation on the basics of cryptography and a quick overview of the essential tools.

What follows is a short summary of that presentation. The slides are available here.

Whose Security?

Let's define what security is. Security is the possibility of being set free from structural constraints, and as such we can distinguish various levels of security depending on who we are.

If we want to investigate security we should also define our threats: our target can be security, in the sense of being set free, from intelligence surveillance. Our concerns are different if we consider instead security from censorship or from corporate data mining.

[Images: the Uber God View; the Facebook law enforcement form]

What is shown above is the Uber God View, a tool Uber used to track the location of a BuzzFeed journalist, and the standard Facebook form that is given to law enforcement on request.

Security is a state of mind

Security is hard. It is really rare to reach a state of complete security and even in that case, it depends on our target.

What is important is to train ourselves in security. Security is a state of mind and there are no tools that automatically protect us without our active participation.

Let's explore that in detail.

The layers of security

We can distinguish four layers of security:

  • Device Security
  • Network Security
  • Message Security
  • Human Security

Device Security, where everything happens

Device security is related to the "physical host".

If the computer we use has been tampered with at the hardware level, or the phone is bugged, there is no way to escape using higher level tools.

In other words, it doesn't matter if we use a super secure password if our computer is recording all our keystrokes and sending them to a third party.

Device security also matters because our device can fall into the hands of attackers who may be able to trace back all our activities.

Some precautions for this purpose:

  • full disk encryption
  • a minimal set of installed applications
  • open source operating systems

Network Security

The network is the infrastructure that our device is attached to. In most cases, for a computer, this is the internet (and the GSM network in the case of mobile phones).

Network security is essential to evade censorship, behavioural tracking and identity theft.

Some tools that may help in this case:

  • VPN
  • TOR
  • P2P networks
  • mesh networks

And for the web:

  • open source web browsers (such as Firefox)
  • no Google apps on Android phones
  • HTTPS

Message Security

Message security is the level of protection regarding the content that you want to send or receive.

Message security is essential if you want to avoid third party snooping and preserve the confidentiality of your messages.

The tools we can use in this context:

  • OTR
  • open source messaging protocols (XMPP, Matrix)
  • Signal
  • PGP

Also, always remember that encrypting the content of the message doesn't guarantee that your identity and the metadata are hidden.

Human Security

Everything comes down to the human level at a certain point.

This is why it is important to train ourselves in security.

If we consider Kevin Mitnick's history, or the recent FBI deputy director hack, we see that social engineering plays a big role when someone wants to undermine the security of an individual of interest.

But security matters even if we are not a target of interest.

For example, let's consider our passwords. If we use the same password on every site and a cracker manages to gain access to just one of them, all our online activities can be exposed and our identity stolen. This is not a remote scenario. Myspace had its database breached and Zuckerberg's password (a rather simple one) was exposed. Given that he used the same password on Twitter and other sites, several of his accounts were compromised.

What is TOR and how it works

When you visit a website with your mobile phone or a browser on your computer, lots of things go on under the hood.

Your computer, as a client, performs what is called a handshake with the server.

After telling the server that the client is interested in its content, a series of packets containing data is exchanged.

That is the content of a connection. Inside these packets there is information of two kinds:

  • the web page or the content we are trying to view
  • information on the status of both the server and the client

The information contained in every packet can be analyzed to understand the "identity" of the client that is requesting the content from the server, first of all the IP address, which is a sort of address that every computer on the net has.

Not only that: during the transmission of these packets, various entities on the communication channel can analyze the content and mine our data.

Cute infographic

TOR still uses this kind of routine to fetch the content of a web page, but instead of connecting directly to the destination server it goes through a series of other servers called relays: instead of going directly from A to B, it goes from A to C to D to E to F to B.

If the web were a kindergarten, Alice, instead of telling her phrase directly to Bob, would tell it to a friend who in turn would tell it to another friend and so on, until Bob hears it without knowing that Alice said it in the first place.

At this point you should ask yourself: is the data more protected if it goes through a network of relays? It actually is, given that every time you send a packet through the TOR network it gets encrypted, once for each relay on the path, so that no relay along the way can read its content.

To tell the truth, the last relay (called the exit node), which sends the packet to the destination server, does know the content of the packet but does not know its origin.
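To make the layering concrete, here is a toy sketch in Python of how a message could be wrapped once per relay and then peeled one layer at a time. The relay keys and the XOR-based keystream_xor cipher are made up for illustration and are not what TOR actually uses; the only point is that each relay removes exactly one layer and only the last one sees the request.

```python
import hashlib


def keystream_xor(key, data):
    """Toy cipher: XOR the data with a SHA-256 based keystream. NOT secure."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream.extend(hashlib.sha256(key + counter.to_bytes(4, "big")).digest())
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def wrap(message, relay_keys):
    """Client side: add one layer per relay, the innermost one for the exit node."""
    for key in reversed(relay_keys):
        message = keystream_xor(key, message)
    return message


def peel(packet, key):
    """Relay side: remove the single layer that belongs to this relay."""
    return keystream_xor(key, packet)


# Hypothetical keys shared between the client and each relay on the path.
relay_keys = [b"guard-key", b"middle-key", b"exit-key"]

packet = wrap(b"GET /index.html", relay_keys)
for key in relay_keys:
    packet = peel(packet, key)  # each hop strips its own layer

print(packet)  # only after the exit node's layer is removed is the request readable
```

Because this toy cipher is a plain XOR, applying it twice with the same key undoes it, which is why peel can reuse the same function as wrap.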

Finally, a website can be hosted entirely inside the TOR network, called the onion network, so that the packets never leave the relays and the relays don't know the physical location of the server. In this way every party on the network reaches a very high level of anonymity.

Who owns the relays?

Actually anyone can host and own a relay if they are willing to do so. I personally host one right now and there are many others who share a small fraction of their network connection.

My little raspi is moving some packets right now

Running a relay node is very easy and everybody should do so. Running an exit node is more troublesome, and I don't suggest it unless you are a big entity that can handle the occasional trouble that comes with it.

Don't play the fool on the TOR network

Of course TOR doesn't guarantee you perfect anonymity. In the end it all comes down to the human layer.

It's no use surfing the web through TOR if we then log in to our personal blog or our personal Facebook page.

But there are other subtle factors that can be exploited by web companies to gather info and track their users.

Such factors are:

  • the size of the screen and the colors supported by it
  • the timezone
  • canvas and images that the server asks your computer to generate
  • information about your OS that is sent along with the packets
  • the fonts available on your system
  • touch support
  • cookies
  • ads and cross site requests

In particular, most of these can be exploited using JavaScript, a web programming language that lots of web pages use to render content. TOR users should avoid the use of JavaScript.
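As a rough illustration of why these factors matter, the sketch below combines a handful of them into a single identifier. The attribute names and values are invented for the example, and real trackers use many more signals, but the idea is the same: the identifier survives even if cookies are cleared or the IP address changes.

```python
import hashlib
import json


def fingerprint(attributes):
    """Hash a set of browser attributes into a short, stable identifier."""
    canonical = json.dumps(attributes, sort_keys=True)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


# Hypothetical values a page could collect through JavaScript.
first_visit = {
    "screen": "1366x768x24",
    "timezone": "Europe/Rome",
    "fonts": ["DejaVu Sans", "Liberation Mono"],
    "touch": False,
    "os": "Linux x86_64",
}

# Same browser later on: no cookies, a different exit node, same attributes.
second_visit = dict(first_visit)

# A different user who only differs by timezone.
someone_else = dict(first_visit, timezone="America/New_York")

print(fingerprint(first_visit) == fingerprint(second_visit))   # same browser is recognised
print(fingerprint(first_visit) == fingerprint(someone_else))   # a different setup is told apart
```

This is also why it helps when many users share the same default browser configuration: identical attributes give identical fingerprints.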

Public Private Key Encryption

While TOR is a recent technology, public key encryption is a much older concept.

What happens when we use public / private key encryption tools is conceptually similar to what happens with our physical correspondence.

A public key is similar to our mailbox.

Anyone who knows the location of a person's mailbox can write a message and put it inside, but only the owner of that mailbox, using his own key, can open it and read the messages.

When we use PGP or GPG (implementations of the public key encryption concept) we generate a pair of keys.

A public one that we should broadcast or at least share with our social circle, and a private key that must remain secret at any cost.

Anyone can encrypt any kind of digital content using our public key (which is just a really long string) and only the owner of the private key can decrypt it.

This also means that we know who is going to be able to read a message encrypted with this kind of technology.

One easy tool for GPG encryption is GPA (the GNU Privacy Assistant).
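To see the mailbox idea at work, here is a minimal sketch of textbook RSA in Python. The primes are the usual tiny classroom values and are far too small to be secure, and GPG does a lot more than this (much larger keys, padding, a symmetric cipher for the actual content), so take it only as an illustration of the public/private split.

```python
# Key generation: two (tiny, insecure) primes chosen only for illustration.
p, q = 61, 53
n = p * q                    # part of both keys
phi = (p - 1) * (q - 1)
e = 17                       # public exponent
d = pow(e, -1, phi)          # private exponent (modular inverse, Python 3.8+)

public_key = (e, n)          # the "mailbox": safe to publish
private_key = (d, n)         # the "mailbox key": must stay secret


def encrypt(message, key):
    exponent, modulus = key
    return pow(message, exponent, modulus)


def decrypt(ciphertext, key):
    exponent, modulus = key
    return pow(ciphertext, exponent, modulus)


message = 65                              # a message encoded as a number below n
ciphertext = encrypt(message, public_key)
print(ciphertext)                         # 2790
print(decrypt(ciphertext, private_key))   # 65
```

Anyone holding public_key can produce the ciphertext, but getting the message back requires d, which only the owner knows.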


What would you do if you were asked to put one person under surveillance?

For sure placing a bug with microphone and recording capabilities would be the best option.

But what if, instead of recording everything the subject does, we just take note of all his actions, without caring about the content? For example, if the subject speaks to someone, we record the time, the place, the duration of the conversation and all the info about the person he is talking with. What if, when the person walks into a mall, we record the time, the location, the shops he entered, the money he spent and the number of things he bought, but not what he bought in detail?

You can see that you can have a fairly precise idea of the habits of the person under your surveillance.

Actually it is easy to extract all kinds of personal information from metadata. And if even a tiny portion of the information we have on the subject is more detailed (for example social network photos) we get a picture clearer than ever.

This is not just one of the biggest concerns that should pop into your mind when you are talking about nationwide mass surveillance, it is also the core business of corporations like Facebook and Google.

WhatsApp does not read the content of your messages but it stores every single bit of metadata that comes with them.

Metadata is enough to build a complete profile of a user, and it is even more dangerous in the hands of an evil state agency.
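The note-taking kind of surveillance described above is easy to sketch. In the example below the call records are entirely made up and contain no content at all, only who was called, when and for how long; a few lines of counting are already enough to suggest habits and relationships.

```python
from collections import Counter

# Hypothetical call records: (weekday, hour, duration in minutes, contact).
records = [
    ("Mon", 9, 2, "office"),
    ("Mon", 23, 41, "alex"),
    ("Tue", 23, 38, "alex"),
    ("Wed", 14, 6, "oncology clinic"),
    ("Thu", 23, 52, "alex"),
    ("Fri", 14, 4, "oncology clinic"),
]

# How often each contact is called.
calls_per_contact = Counter(contact for _, _, _, contact in records)

# How much time is spent with each contact.
minutes_per_contact = Counter()
for _, _, minutes, contact in records:
    minutes_per_contact[contact] += minutes

# Who gets called late at night.
late_night = Counter(contact for _, hour, _, contact in records if hour >= 22)

print(calls_per_contact.most_common(1))   # [('alex', 3)]
print(minutes_per_contact["alex"])        # 131
print(dict(late_night))                   # {'alex': 3}
```

Without reading a single word of any conversation, these records already hint at a close relationship and a possible health issue.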

Nothing to hide

Even if we have nothing to hide, we have much to fear.

The "nothing to hide" argument is something that every one of us in this room has heard at least once.

We should fear this sentence because it is the ultimate admission of a big misunderstanding of the whole debate.

Privacy, first of all, is control over our data, not only the right to secrecy.

Transparency should be for everyone

There is a big incoherence in asking your citizens to hand over their data.

Transparency should be a two-way thing, while in the current state of affairs the big three-letter agencies, and high-ranking people as well, cover their tracks and are not transparent about their reports.

This reinforces a situation of great inequality between the people and the State.

Even worse, it is not the citizen himself who gets to decide whether he has something to hide, but the authority.

This may seem a little naive to say, but in the words quoted by Bruce Schneier and attributed to Cardinal Richelieu:

If you give me six lines written by the hand of the most honest of men, I will find something in them which will hang him.

This is true even without considering social discrimination and mass media manipulation.

The fundamentals of society

Every action can be seen as either legal or illegal. When we make a decision this is one of our first, implicit concerns.

This is not true in a surveillance system: when you are doing something your concern is all about the possibility of raising suspicion.

In such a dystopian condition an idea, not an action, is all that is needed to prove a citizen guilty.

Sometimes two wrongs make a right

In America we are now discussing weed legalization.

Do you think that such a debate would have been possible if no one had had the possibility, even against the law, to try that substance and show other citizens the real implications of their actions?

The same goes for gay marriage, which we are discussing in Italy. Challenging the law, breaking it if needed, is a way to improve the current system.

Inside the panopticon every potential criminal would be prosecuted and this kind of advancement would not be possible.

To hide is to care

A simple truth is that we don't close the windows to cover up our crimes.

Our innermost experiences take shape in our intimacy, which is the most sacred place.

Phone messaging apps comparison

I made this chart for a presentation at the Berkman Klein Center

                              | WhatsApp         | Telegram                    | Signal
Source code                   | closed source    | open source                 | open source
API                           | none             | various                     | library
Encryption protocol           | state of the art | self made                   | state of the art
Contact list location         | cloud            | cloud                       | cloud, encrypted
Forward Secrecy               | yes              | yes                         | yes
Database                      | phone storage    | cloud                       | phone storage
Backup capability             | iCloud or Gcloud | builtin                     | none
Revenue                       | ads (Facebook)   | donation based              | donation based
Federation                    | no               | no                          | no
Alternative download location | website          | F-Droid                     | none
Uses third party services     | no               | no                          | Google Cloud Messaging
Servers location              | US               | Russia                      | US
Tied to mobile number         | yes              | yes, but nickname available | yes
Desktop client                | no               | yes                         | no

Arduino Uno as HID keyboard

Turin is the hometown of Arduino. I have been at the fablab multiple times but I had to come all the way to America to get my hands on a simple Arduino Uno.

For $60 I bought a cheap (but still good!) mechanical keyboard by Qisan, a clone of the Arduino Uno and a USB host shield.

I have been using a Dvorak layout for 3 years now and it's a pain to change the layout on every machine I have to use. As you can imagine, with these three pieces of hardware I put together a hardware key mapper for the keyboard.

I had never had any experience with Arduino before, but it was not that difficult to make it do simple things like blinking the LED or sending data to a serial monitor.

It took me half an hour to wear down all my excitement: the USB Host Shield library broke compatibility with all the similar projects I found while wandering online.

In particular this blog has the most precious information, and its author wrote an HID driver that allows the Uno to be seen as an HID device.

It was a noob error, but I didn't check the various Arduino alternatives and I discovered too late that just a few have the HID capabilities that would make this work easier. Maybe I should have bought an Arduino Due or Leonardo.

Also, the various guides about flashing with a DFU tool are specific to older models of the Uno and it took me some time to figure out the names of the new components so that I could flash a new firmware.

A small journey in the Arduino world

It feels pretentious to write a little guide for this kind of work, given also the fact that I have roughly 10 hours of experience with the Arduino. But the other resources are really outdated so I hope this piece can be useful to someone out there.

All the files I have used today are on my repos and I included also an outdated version of the USB Host Shield library that I used.

The original code from this blog post works like a charm but just as a simple passthrough.

It was not difficult at all to examine the code: during each iteration of the loop a char array gets read from the shield and, if it contains information, the Arduino sends the data to the host with the Serial.write method.

The buffer is a simple array of length 8 and the first two positions are reserved. In particular the first one represents the various modifier keys.

The Dvorak layout has the same symbol pairs as the US layout, but over time I got used to having the '@' in the same place as 'Q' (on qwerty) and '"' over the '2'. Also, I am an avid vim user (I should thank Simone Basso for that) and I swapped some keys on the new 65-key keyboard. The modifier bits at the beginning of the array came in handy for my code.

A hardware key remapper is a simple but long C switch statement, but I decided to take the modifier bits into account too: in this way certain keys, like the Windows (UGH!) key, switch to a different layer of keys. I got all the codes for the HID events here.
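The real remapper runs on the Arduino as a C switch statement; purely to illustrate the logic, here is a minimal Python sketch that works on the same 8-byte report (modifier byte, reserved byte, six key codes). The two lookup tables are hypothetical and are not the mapping used in the actual project; the key codes are the standard HID usage IDs.

```python
LEFT_GUI = 0x08  # bit of the modifier byte set while the left "Windows" key is held

# Hypothetical remap tables: HID usage ID in -> HID usage ID out.
BASE_LAYER = {0x14: 0x04}            # 'q' position sends 'a'
GUI_LAYER = {
    0x0B: 0x50,                      # 'h' -> left arrow
    0x0D: 0x51,                      # 'j' -> down arrow
    0x0E: 0x52,                      # 'k' -> up arrow
    0x0F: 0x4F,                      # 'l' -> right arrow
}


def remap(report):
    """Translate one 8-byte keyboard report into the report sent to the host."""
    modifiers, reserved, keys = report[0], report[1], report[2:8]
    if modifiers & LEFT_GUI:
        table = GUI_LAYER
        modifiers &= ~LEFT_GUI & 0xFF   # swallow the key that selected the layer
    else:
        table = BASE_LAYER
    # Key codes without an entry in the table pass through unchanged.
    return bytes([modifiers, reserved] + [table.get(k, k) for k in keys])


# Windows key + 'h' becomes a plain left arrow.
print(remap(bytes([0x08, 0, 0x0B, 0, 0, 0, 0, 0])).hex())
# Left shift + 'q' becomes left shift + 'a'.
print(remap(bytes([0x02, 0, 0x14, 0, 0, 0, 0, 0])).hex())
```

Using the modifier byte to pick the table is what turns a single key into a layer switch, at the cost of losing that key's original function.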

The process of flashing the code on the Uno goes like this:

  • write the looping code;
  • push it to the Arduino using the IDE;
  • short-circuit the board so that it goes into DFU mode;
  • flash the .hex HID firmware;
  • try your code;
  • repeat until it's right.

Everything fits in one picture

Flashing the firmware

The firmware is in my repo but I got it from here (http://hunt.net.nz/users/darran/weblog/a6d52/Arduino_UNO_Keyboard_HID_version_02.html). The tool I used to flash it is dfu-programmer (version 0.62). Every time you want to flash a new firmware the Arduino must be put in DFU mode (you can see the difference with lsusb). To do that, simply short-circuit the two pins near the reset button with a small metal wire and an LED will blink. This video shows the method briefly (no real need for a jumper). The commands are the following, and there is no risk of bricking the Uno:

dfu-programmer atmega16u2 erase
dfu-programmer atmega16u2 flash Arduino-keyboard-0.2.hex
dfu-programmer atmega16u2 reset

After each flash the device needs to be disconnected once. Of course you can flash the original firmware back: it is included in my repo and in the official ones.

Arduino and the shield

That's it. As you can see, it is not difficult at all. The worst part is gathering the various bits of information that lie dormant in blogs and forums.

Lifehacks (2)

  • If you're at a party and you don't know anyone, make it a point to meet the host and introduce yourself. The host can introduce you to other guys/girls and it scores you points.

  • Never buy high-end cables, and never buy cables at retail. Cables have higher profit margins than almost everything except extended warranties. Despite what the marketing and sales people will tell you, there is no difference. Need a computer cable? Order it from a wholesaler online. That USB cable that your printer requires will cost you $25 at Staples and $1.50 at Newegg.

  • Never quote an entire post unless it's shorter than the one you write in response.

  • Don't eat food after 6pm.

  • In college, always check the library to see if the teacher is using a test bank.

Interpolation using a genetic algorithm

This weekend I was in Milan to get a visa and I had the opportunity to work with a friend, Michele, on genetic algorithms. It was the first time I dug into this field and it was very exciting. In this post I want to explain some bits of our work.

A brief introduction to GA

A genetic algorithm is a search/optimization algorithm that uses a heuristic approach to reduce the search space and evolve gradually towards a solution.

It is an algorithm that has its roots in the theory of natural selection by Charles Darwin. The main components of a GA are:

  • the population, which gathers all the available solutions at a given time;
  • the fitness function, that gives an approximation of the quality of the solution codified by a given member of the population.

In a GA the first thing to do is to generate a population.

A population is a group of objects, each of which carries a candidate solution in some encoded form (usually a string); the first population is randomly generated and contains a large number of solutions, but not every possible solution (this is not a brute-force approach).

After this step the fitness function evaluates the quality of the solution that each member carries: the evaluation should be considered from a bottom-up point of view.

Now, as in Darwin's theory of evolution, the members of the population are going to "reproduce": two members are coupled to generate a new member of the second generation, and every child will contain a solution that is the product of the genes of its parents.

This time the generation of the new population is not entirely random. The fitness function gives us an approximation of the quality of the genes that a member carries and, by the rule of the "survival of the fittest", the probability that a member is going to reproduce with another one is proportional to the quality of its genes.
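Selection proportional to fitness can be sketched in a few lines with the standard library. The function name select_parents and the sample members and scores are assumptions made for this example; it is one common way to do it (often called roulette wheel selection), not necessarily what our script does.

```python
import random


def select_parents(population, fitnesses, rng=random):
    """Pick two parents, each with probability proportional to its fitness."""
    return rng.choices(population, weights=fitnesses, k=2)


members = ["a", "b", "c"]      # stand-ins for real population members
scores = [0.1, 0.3, 0.6]       # their (hypothetical) fitness values

rng = random.Random(42)        # seeded so the run is repeatable
draws = [select_parents(members, scores, rng)[0] for _ in range(10_000)]
counts = {m: draws.count(m) for m in members}
print(counts)                  # "c" is picked most often, "a" least often
```

Note that this samples with replacement, so a member can end up paired with itself, and that random.choices needs at least one positive weight.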

When we have a second generation of members we can run the same steps again and generate a third generation. From this point we can repeat until we converge to a solution that is common to every member, or at least one that suits our needs.

In some cases a mutation function can be added so that, like in the real world, sometimes the genes are "scrambled" independently of the fitness function.
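Reproduction and mutation can be sketched as follows, assuming the genes are stored as a list of numbers. The single-point crossover, the function names and the default parameters are choices made for this example and may differ from what a given GA uses.

```python
import random


def crossover(gene_a, gene_b, rng=random):
    """Single-point crossover: the head comes from one parent, the tail from the other."""
    cut = rng.randrange(1, len(gene_a))   # needs genes with at least two values
    return gene_a[:cut] + gene_b[cut:]


def mutate(gene, probability=0.1, limit=5.0, rng=random):
    """Replace each value with a random one in [-limit, limit] with the given probability."""
    return [
        rng.uniform(-limit, limit) if rng.random() < probability else value
        for value in gene
    ]


parent_a = [1.0, 1.0, 1.0]
parent_b = [2.0, 2.0, 2.0]

child = crossover(parent_a, parent_b)
print(child)             # starts with values from parent_a, ends with values from parent_b
print(mutate(child))     # usually identical, occasionally with a scrambled value
```

Keeping the mutation probability low lets good genes survive while still giving the population a chance to escape a solution that is only locally good.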

There is more to a GA: for example, we could talk about possible ways of storing the genes inside a member or about when to use mutation. Anyway, I want to stop here and continue with an analysis of my problem.

Interpolating a function using a GA

Michele and I decided to spend some time developing a little Python script to explore the capabilities of GAs, and we decided to interpolate some points on a Cartesian plane.

Our program, which is available here, uses a class to define the members of the population, with a string for the genes, and another class for the points on the plane.

The fitness function is not as precise as it should be because this is only a proof of concept:

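mutationProbability = 0.1  # used elsewhere in the script
rangeLimit = 5             # distance at which a point stops contributing

def fitness(item, pointList, n):
    # n is both the number of points and the number of polynomial coefficients
    value = 0
    for p in pointList:
        # evaluate the polynomial encoded in the genes at p.x
        y = 0
        for i in range(n):
            y += item.gene[i] * pow(p.x, i)
        # 1 when the curve passes through the point, 0 when it is rangeLimit or more away
        result = 1 - (abs(p.y - y) / rangeLimit)
        if result < 0:
            result = 0
        value += result
    # average score over the points, between 0 and 1
    return value / n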

item is just a member of the population, pointList is the list of points and n is the number of points (n - 1 is the degree of the polynomial).

for i in range(n):
    y += item.gene[i] * pow(p.x, i)

This piece of code gives us the value, at each point of pointList, of the function encoded in the genes;

result = 1 - (abs(p.y - y) / rangeLimit)
if result < 0:
    result = 0

while here the script stores 1 minus the normalized distance, because if the GA has yielded a good result the distance between the evaluated function and the points should be 0. If this is the case, the fitness function should assign that member the highest possible reproduction probability. At the end the fitness function returns the total value divided by the number of points evaluated.

As you can see, this fitness function is by no means an optimal one. The reproduction probability can be higher for functions that cross some points and are really distant from others than for functions that are close to every point but cross none. Anyway, for simple cases the GA yields good results: as an example, for the points (0 0), (1 4), (2 9) one of the members with the highest reproduction probability has this function in its genes:

-0.0487839869993989 * x^0 + 4.600339125358671 * x^1 + -0.2780958075230644 * x^2

which passes through these points: (0 -0.0488), (1 4.2735), (2 8.0395), given 80 iterations, an initial population of 600 members and a two-digit approximation.
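The values above can be checked by evaluating the polynomial directly. The short sketch below uses the coefficients reported in the post; the helper name evaluate is introduced only for this example.

```python
# Coefficients of x^0, x^1 and x^2 as reported above.
coefficients = [-0.0487839869993989, 4.600339125358671, -0.2780958075230644]
targets = [(0, 0), (1, 4), (2, 9)]


def evaluate(coefficients, x):
    """Value of the polynomial whose i-th coefficient multiplies x**i."""
    return sum(c * x ** i for i, c in enumerate(coefficients))


for x, wanted in targets:
    got = evaluate(coefficients, x)
    print(x, round(got, 4), "target:", wanted, "error:", round(abs(wanted - got), 4))
```

The curve is very close at x = 0 and x = 1 and about one unit off at x = 2, which is the kind of result to expect from a short run.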

For a more precise computation a higher population size and a really high number of iterations should be used.
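The weakness of the fitness function mentioned earlier can be shown with two hand-made candidates. The sketch below reimplements the same scoring formula on plain coefficient lists and compares it with the negative mean squared error, a common alternative that is not what our script uses; points and candidates are invented for the example.

```python
RANGE_LIMIT = 5


def evaluate(gene, x):
    return sum(c * x ** i for i, c in enumerate(gene))


def clipped_fitness(gene, points):
    """Same scoring idea as the script: 1 on the point, 0 at RANGE_LIMIT or further."""
    total = 0.0
    for x, y in points:
        total += max(0.0, 1 - abs(y - evaluate(gene, x)) / RANGE_LIMIT)
    return total / len(points)


def negative_mse(gene, points):
    """Alternative score: large errors keep being penalised, no clipping."""
    return -sum((y - evaluate(gene, x)) ** 2 for x, y in points) / len(points)


points = [(0, 0), (1, 0), (2, 0)]
crosses_two = [0, -50, 50]   # exact at x=0 and x=1, off by 100 at x=2
close_to_all = [2, 0, 0]     # off by 2 everywhere

print(clipped_fitness(crosses_two, points), clipped_fitness(close_to_all, points))
print(negative_mse(crosses_two, points), negative_mse(close_to_all, points))
```

With the clipped score the first candidate wins even though it misses one point by 100, because any error beyond RANGE_LIMIT costs the same; with the squared error the ranking is reversed.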