Green party politician Malte Spitz sued to have German telecoms giant Deutsche Telekom hand over six months of his phone data, which he then made available to ZEIT ONLINE. We combined this geolocation data with information relating to his life as a politician, such as Twitter feeds, blog entries and websites, all of which is freely available on the internet.
By pushing the play button, you will set off on a trip through Malte Spitz's life. The speed controller allows you to adjust how fast you travel, the pause button will let you stop at interesting points. In addition, a calendar at the bottom shows when he was in a particular location and can be used to jump to a specific time period.

Turns out it's a lot. Full story here.

Author: Don Dini

Amazing article on Obama's reelection big data operation over at the New York Times. 

As the denizens of the cave were setting out to do that, the digital-analytics team, led by Rayid Ghani, a 35-year-old research scientist from Accenture Labs, developed an idea: Why not try sifting through self-described supporters’ Facebook pages in search of friends who might be on the campaign’s list of the most persuadable voters? Then the campaign could ask the self-identified supporters to bring their undecided friends along. The technique, as they saw it, could also get supporters to urge friends to register to vote, to vote early or to volunteer and donate.


Full story here. 

Author: Don Dini

"Access Main Computer File", a delightful site with stills of UIs appearing in movies over the years. Computer UI in film is almost always terrible, as it is inherently about how to make something not cinematic appear exciting to the viewer. Over the years, however, there are definite gems that inspire actual software engineers with the kernel of a great new idea.

And the Minority Report interface is a terrible user interface.


So first things first, let’s get something straight: cars are vehicles, but bicycles are vehicles as well. When you are riding your bicycle you should follow the rules of the road the same way cars do. Creating your own rules out of thin air creates confusion for everybody. You must integrate yourself with the rest of traffic as if you are a slow-moving vehicle, because that is what you are.

Full post here.

I've been psyching myself up to get into bicycling (there are two cycles, you say?). If this thread on Reddit is anything to go by, bikers in LA are tough as nails.


Recently GigaOM carried an article written by Derrick Harris entitled: "Facebook trapped in MySQL fate worse than death". The gist of the article is:

  • Facebook is trapped using MySQL as their persistence data layer
  • This is unfortunate, because MySQL is inappropriate
  • It is inappropriate because MySQL is poor at large scales

The greatest counterargument, I suppose, is, to quote John Siracusa (of Ars Technica and Hypercritical), that Facebook works. Can you imagine an architecture of greater scale? They have 700 million active users!

The article quotes Michael Stonebraker, who, as the creator of the Ingres database (and later the Postgres database; you'd think the greater eminence of Postgres would ensure its presence outside these parentheses, but look how wrong you are!), certainly knows what he's talking about when it comes to databases.

The trouble with the article, however, and with many like it, is that it paints scalability in an absolute sense, i.e. that data storage technology X either "scales" or "doesn't scale", from which one is led to infer that a service can or cannot be implemented upon said technology.

In truth, the scalability of data storage technologies can only be described in a relative sense - they are more scalable, or less scalable. Saying that something "doesn't scale" is lazy at best and misleading at worst.

The scaling problem in a nutshell

The scalability of distributed systems, and of web applications in particular, is typically talked about in quite vague terms. One can arrive at a reasonable definition with a bit of investigation.

It seems clear that scalability contains within it this notion of what is happening to one's infrastructure as more clients are requesting resources from it.

As applied to distributed systems, scalability is used in two senses:

  • How the ability to handle load increases as you add resources
  • Ability to handle increasing load in a graceful manner (for a fixed set of resources)

Notice that scaling is not performance.

Performance is about the rate of computation for fixed circumstances. Scalability is what happens as demand upon a system increases. For example, the maxsort algorithm has high performance for 5 numbers but poor scalability. The quicksort algorithm by contrast has high scalability.
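To make the distinction concrete, here is a rough sketch that counts comparisons for both algorithms as the input grows. I'm assuming "maxsort" means the selection-style sort that repeatedly moves the largest remaining item to the end, and the function names are my own. The quicksort is a simple list-building version with a random pivot, chosen for readability rather than speed.

```python
import random


def maxsort_comparisons(values):
    """Sort by repeatedly moving the largest remaining item to the end.

    Returns (sorted_list, number_of_comparisons).
    """
    items = list(values)
    comparisons = 0
    for end in range(len(items) - 1, 0, -1):
        largest = 0
        for i in range(1, end + 1):
            comparisons += 1
            if items[i] > items[largest]:
                largest = i
        items[largest], items[end] = items[end], items[largest]
    return items, comparisons


def quicksort_comparisons(values, rng):
    """Quicksort with a random pivot.

    Counts one (three-way) comparison against the pivot per element
    at each partitioning step. Returns (sorted_list, comparisons).
    """
    items = list(values)
    if len(items) <= 1:
        return items, 0
    pivot = rng.choice(items)
    smaller = [x for x in items if x < pivot]
    equal = [x for x in items if x == pivot]
    larger = [x for x in items if x > pivot]
    left, c_left = quicksort_comparisons(smaller, rng)
    right, c_right = quicksort_comparisons(larger, rng)
    return left + equal + right, len(items) + c_left + c_right


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so runs are repeatable
    for n in (5, 50, 500, 2000):
        data = [rng.random() for _ in range(n)]
        _, c_max = maxsort_comparisons(data)
        _, c_quick = quicksort_comparisons(data, rng)
        print(f"n={n:5d}  maxsort={c_max:9d}  quicksort={c_quick:7d}")
```

At 5 numbers the two counts are close, so either is "fast enough". The gap only opens up as n grows, and that is the scalability part.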

Let's consider a web server, serving static content (e.g. a page containing only the text "Hello World!") and conduct an experiment - what happens as the number of requests upon it starts to increase?

One sees that two important things happen:

  • Amount of time required per-request goes up
  • Failed requests start to appear

What is the explanation for this?

Your web server has a maximum number of concurrent client requests it can serve. While connections are filling up toward this limit, no slowdown is yet noticeable on the client side. After the limit is reached, the OS starts holding incoming connection requests (the SYN packets of the TCP handshake) in a backlog queue, where they wait to be processed. The size of this backlog queue is OS dependent.

As this backlog queue starts to fill, client requests take longer to be fulfilled, until ultimately, once the queue is full, requests start to get dropped. Now, this is totally OS and machine context dependent, but a typical value for Apache on a commodity machine when this starts happening is ~10^4 connections.
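The behaviour described above can be reproduced with a toy model. This is only a sketch under simplifying assumptions of my own: time moves in discrete ticks, the server finishes a fixed number of requests per tick, and waiting requests sit in a bounded queue. It is not a model of Apache or of the real TCP backlog, and all the names and numbers are made up for illustration.

```python
from collections import deque


def simulate(arrivals_per_tick, capacity_per_tick, backlog_limit, ticks=1000):
    """Toy server: bounded backlog queue, fixed service capacity per tick.

    Returns (mean ticks waited by served requests, fraction dropped).
    """
    queue = deque()  # holds the arrival tick of each waiting request
    served = waited = dropped = 0
    for now in range(ticks):
        # New requests join the backlog, or are dropped if it is full.
        for _ in range(arrivals_per_tick):
            if len(queue) < backlog_limit:
                queue.append(now)
            else:
                dropped += 1
        # The server works through as many queued requests as it can.
        for _ in range(min(capacity_per_tick, len(queue))):
            waited += now - queue.popleft()
            served += 1
    total = arrivals_per_tick * ticks
    mean_wait = waited / served if served else 0.0
    return mean_wait, dropped / total


if __name__ == "__main__":
    for load in (50, 100, 110, 150, 300):
        wait, drop = simulate(load, capacity_per_tick=100, backlog_limit=500)
        print(f"load={load:3d}/tick  mean wait={wait:5.2f} ticks  dropped={drop:6.1%}")
```

Below capacity nothing waits and nothing is dropped. Once the load passes capacity, the queue fills, waits grow, and then drops appear, which are the same two symptoms as in the experiment.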

Recall this is simply for requesting a static page. As more complex requests occur, such as dynamically synthesizing the response to an SQL query, you can imagine the time + memory required to fulfill a request grows, and so the number of requests the server can receive before they start being dropped becomes smaller.

In a nutshell, then, the scaling problem is this: we wish to prevent slow responses and dropped requests by the server. The way to do this is to prevent the request queues from filling up, and to do that one spreads requests across multiple machines. If your storage solution (such as MySQL) is better than another, it will require fewer nodes to spread requests across. If it is poorer, it will require more.
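As back-of-the-envelope arithmetic, the point looks like this. The function name and the throughput figures are hypothetical, not measurements of MySQL or of anything else, and the sketch assumes load spreads perfectly evenly across nodes.

```python
import math


def nodes_needed(peak_requests_per_sec, per_node_requests_per_sec):
    """Smallest number of nodes that keeps every node under its limit,
    assuming requests are spread perfectly evenly (a big simplification)."""
    return math.ceil(peak_requests_per_sec / per_node_requests_per_sec)


if __name__ == "__main__":
    peak = 100_000  # hypothetical peak load, requests per second
    # Hypothetical per-node throughput for a "poorer" and a "better" store.
    for label, per_node in (("store A", 2_000), ("store B", 5_000)):
        print(f"{label}: {nodes_needed(peak, per_node)} nodes")
```

Both stores handle the load. One just needs more machines to do it, which is a statement about cost and not about whether it "scales".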

Now, one can make the argument that were Facebook to use a different storage technology, they might need fewer resources. But this notion that they're on the brink of collapse is hyperbole. They simply have the requisite resources to make MySQL respond to their experienced load. Facebook pumped a sufficient number of machines into their MySQL based architecture to make their system work.

Technologies are more scalable, or less scalable. They are not scalable in an absolute sense. One can say, "well, we don't have the hardware to make this data storage system work" or, "Hey, we switched to data storage system X and now we can serve twice as many requests per hour" but it is not meaningful simply to say "X does not scale." It is a lazy answer.


Swift as a deer. Quiet as a shadow. Fear cuts deeper than swords. Quick as a snake. Calm as still water. Fear cuts deeper than swords. Strong as a bear. Fierce as a wolverine. Fear cuts deeper than swords. The man who fears losing has already lost. Fear cuts deeper than swords. Fear cuts deeper than swords. Fear cuts deeper than swords.

From A Song of Ice and Fire.


Jorge Cham of PhD Comics conducted an interview with physicists Daniel Whiteson and Jonathan Feng, then set the audio of the interview to this amazing video comic type thing. Can't really do it justice - you need to go and watch this thing. Watch it and be inspired and amazed.

Let's look at what the universe is made out of, like a pie chart. 5% of it - stuff we know. 20% - dark matter. 75% of it - we have no idea.

That's a lot of stuff.

Too many people think "Yeah scientists mostly have it figured out. They're down studying the details of the details of the details." But we have no idea! We're only now, by looking at the details, realizing what the questions we should be asking are.

When I obtained an undergrad degree in Physics during the first Earth age, that was certainly the prevailing sentiment. I think perhaps it was due to Physics grad students being a very, very depressed lot.

In the vid, Whiteson and Feng discuss the state of modern particle physics, what is really known about the universe, and most significantly the vast uncharted future.

Check it out at PhD Comics here.

Don't miss it.