"...Facebook and Twitter are treated as a goldmine where people’s thoughts are concerned. Scientists believe that the heaps of data that these social media platforms gather can correctly portray what users are thinking. What scientists overlook, however, is to correct inherent biases that datasets contain."

Full story here

Posted by Don Dini

https://www.youtube.com/watch?v=2NzA4XLjRaM

Hi, I'm up late working. What's happening, internet?

Posted by Don Dini

Full story here.

Quite a triumph for the IPython notebook. By publishing the notebook along with a paper, scientists give others the ability to reproduce their work immediately. This is quite... amazing.

Posted by Don Dini
Green party politician Malte Spitz sued to have German telecoms giant Deutsche Telekom hand over six months of his phone data, which he then made available to ZEIT ONLINE. We combined this geolocation data with information relating to his life as a politician, such as Twitter feeds, blog entries and websites, all of which is freely available on the internet.
By pushing the play button, you will set off on a trip through Malte Spitz's life. The speed controller allows you to adjust how fast you travel, the pause button will let you stop at interesting points. In addition, a calendar at the bottom shows when he was in a particular location and can be used to jump to a specific time period.

Turns out it's a lot. Full story here.

Posted by Don Dini

Amazing article on Obama's reelection big data operation over at the New York Times. 

As the denizens of the cave were setting out to do that, the digital-analytics team, led by Rayid Ghani, a 35-year-old research scientist from Accenture Labs, developed an idea: Why not try sifting through self-described supporters’ Facebook pages in search of friends who might be on the campaign’s list of the most persuadable voters? Then the campaign could ask the self-identified supporters to bring their undecided friends along. The technique, as they saw it, could also get supporters to urge friends to register to vote, to vote early or to volunteer and donate.


Full story here. 

Posted by Don Dini

"Access Main Computer File", a delightful site with stills of UIs appearing in movies over the years. Computer UI in film is almost always terrible, as it is inherently about how to make something not cinematic appear exciting to the viewer. Over the years, however, there are definite gems that inspire actual software engineers with the kernel of a great new idea.

And the Minority Report interface is a terrible user interface.

Posted by ddini in Uncategorized

So first things first, let’s get something straight: cars are vehicles, but bicycles are vehicles as well. When you are riding your bicycle you should follow the rules of the road the same way cars do. Creating your own rules out of thin air creates confusion for everybody. You must integrate yourself with the rest of traffic as if you are a slow-moving vehicle, because that is what you are.

Full post here.

I've been psyching myself up to get into bicycling (there are two cycles, you say?). If this thread on Reddit is anything to go by, bikers in LA are tough as nails.

Posted by ddini in Uncategorized

Recently GigaOM carried an article written by Derrick Harris entitled: "Facebook trapped in MySQL fate worse than death". The gist of the article is:

  • Facebook is trapped using MySQL as its persistence layer
  • This is unfortunate, because MySQL is inappropriate for the job
  • It is inappropriate because MySQL performs poorly at large scale

The greatest counterargument, I suppose, is, to quote John Siracusa (of Ars Technica and Hypercritical), that Facebook works. Can you imagine an architecture of greater scale? They have 700 million active users!

The article quotes Michael Stonebraker, who, as the creator of the Ingres database (and later the Postgres database. You'd think the greater eminence of the Postgres DB would ensure its presence outside these parentheses, but look how wrong you are!), certainly knows what he's talking about when it comes to databases.

The trouble with the article, however, and with many like it, is that it paints scalability in an absolute sense - i.e. that a data storage technology X either "scales" or "doesn't scale", from which one is led to infer that a service can or cannot be implemented upon said technology.

In truth, the scalability of data storage technologies can only be described in a relative sense - they are more scalable, or less scalable. Saying that something "doesn't scale" is lazy at best and misleading at worst.

The scaling problem in a nutshell

The scalability of distributed systems, and of web applications in particular, is typically talked about in quite vague terms. One can make a reasonable definition with a bit of investigation.

It seems clear that scalability contains within it the notion of what happens to one's infrastructure as more clients request resources from it.

As applied to distributed systems, scalability is used in two senses:

  • How the ability to handle load increases as you add resources
  • How gracefully it handles increasing load (for a fixed set of resources)

Notice that scaling is not performance.

Performance is about the rate of computation under fixed circumstances. Scalability is about what happens as demand upon a system increases. For example, the maxsort algorithm has high performance for 5 numbers but poor scalability, doing O(n^2) work overall. The quicksort algorithm, by contrast, has high scalability, at O(n log n) on average.
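
To make the contrast concrete, here is a minimal sketch in Python, assuming "maxsort" means sorting by repeatedly selecting the maximum (the name isn't standard, so that reading is my assumption; Python's built-in Timsort stands in for quicksort):

```python
import random
import time

def maxsort(xs):
    """Repeatedly pull out the current maximum: fast on tiny inputs, O(n^2) overall."""
    xs = list(xs)
    result = []
    while xs:
        result.append(xs.pop(xs.index(max(xs))))
    result.reverse()  # we collected largest-first, so flip to ascending order
    return result

for n in (5, 500, 5000):
    data = [random.random() for _ in range(n)]

    t0 = time.perf_counter()
    maxsort(data)
    t_slow = time.perf_counter() - t0

    t0 = time.perf_counter()
    sorted(data)  # Timsort, O(n log n): stands in for quicksort here
    t_fast = time.perf_counter() - t0

    print(f"n={n:>5}  maxsort={t_slow:.4f}s  sorted={t_fast:.4f}s")
```

At n = 5 both finish essentially instantly - performance is fine either way. It's only as n grows that the O(n^2) strategy falls off a cliff, and that gap is what scalability measures.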

Let's consider a web server serving static content (e.g. a page containing only the text "Hello World!") and conduct an experiment - what happens as the number of requests upon it starts to increase?
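
As a rough sketch of how one might run the experiment (a real load test would use a dedicated tool like ab or wrk; the URL below is a hypothetical local test server):

```python
import time
import urllib.request
from concurrent.futures import ThreadPoolExecutor

URL = "http://localhost:8080/hello.html"  # hypothetical "Hello World!" page

def one_request(_):
    """Time a single GET; count anything that isn't a clean response as a failure."""
    t0 = time.perf_counter()
    try:
        with urllib.request.urlopen(URL, timeout=5) as resp:
            resp.read()
        return time.perf_counter() - t0, True
    except Exception:
        return time.perf_counter() - t0, False

# Fire increasingly large batches of concurrent requests and watch what happens.
for concurrency in (10, 100, 1000):
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        results = list(pool.map(one_request, range(concurrency)))
    latencies = [t for t, ok in results if ok]
    failures = sum(1 for _, ok in results if not ok)
    avg = sum(latencies) / len(latencies) if latencies else float("nan")
    print(f"concurrency={concurrency:>4}  avg latency={avg:.4f}s  failures={failures}")
```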

One sees that two important things happen:

  • The amount of time required per request goes up
  • Failed requests start to appear

What is the explanation for this?

Your web server has a maximum number of concurrent client requests it can serve. While connections remain under this limit, no slowdown is noticeable client-side. After the limit is reached, the OS maintains a backlog queue of connection attempts from clients (SYN packets, in TCP terms) waiting to be processed. The size of this backlog queue is OS-dependent.

As this backlog queue fills, client requests take longer to be fulfilled, until ultimately, once the queue is full, requests start to get dropped. Now, this is totally OS- and machine-dependent, but a typical figure for Apache on a commodity machine is on the order of 10^4 connections before this starts happening.
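
The queue in question is the one sized by the backlog argument to listen(). Here is a small sketch of filling it up, using Python's socket module (exactly when new connects start to stall is OS-dependent, as noted above, and the kernel imposes its own caps, e.g. somaxconn on Linux):

```python
import socket

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
server.listen(5)               # a deliberately tiny backlog
host, port = server.getsockname()

# Connect repeatedly without ever calling accept(), so the queue only grows.
clients = []
for i in range(20):
    c = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    c.settimeout(2)
    try:
        c.connect((host, port))
        clients.append(c)
        print(f"connection {i}: queued")
    except (socket.timeout, ConnectionRefusedError):
        print(f"connection {i}: stalled - the backlog is full")
        break
```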

Recall this is simply for requesting a static page. For more complex requests, such as dynamically synthesizing the response to an SQL query, the time and memory required to fulfill each request grow, and so the number of requests the server can absorb before they start being dropped becomes smaller.

In a nutshell, then, the scaling problem is this: we wish to prevent slow responses and dropped requests. The way to do this is to prevent the request queues from filling up - and to do that, one spreads requests across multiple machines. If your storage solution (such as MySQL) is better than another, it will require fewer nodes to spread requests across; if it is poorer, more.
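
A toy model of that idea - the node names and per-node capacity below are invented for illustration:

```python
from itertools import cycle

NODES = ["db-1", "db-2", "db-3"]   # hypothetical storage nodes
PER_NODE_CAPACITY = 10_000         # hypothetical requests/sec one node absorbs

def nodes_needed(expected_load, per_node_capacity=PER_NODE_CAPACITY):
    """Ceiling of load / capacity: a more scalable store shrinks this number."""
    return -(-expected_load // per_node_capacity)

# Naive round-robin dispatch: each request goes to the next node in turn,
# so no single machine's request queue is the one that fills up.
dispatch = cycle(NODES)
for request_id in range(6):
    print(f"request {request_id} -> {next(dispatch)}")

print("nodes needed at 55,000 req/s:", nodes_needed(55_000))  # -> 6
```

The point is that "more scalable" just moves the constant: a better store raises the per-node capacity and shrinks the node count, but it does not change the shape of the problem.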

Now, one can argue that were Facebook to use a different storage technology, they might need fewer resources. But the notion that they're on the brink of collapse is hyperbole. They simply pumped a sufficient number of machines into their MySQL-based architecture to make their system respond to the load they experience.

Technologies are more scalable, or less scalable. They are not scalable in an absolute sense. One can say, "well, we don't have the hardware to make this data storage system work" or, "Hey, we switched to data storage system X and now we can serve twice as many requests per hour," but it is not meaningful simply to say "X does not scale." It is a lazy answer.

Posted by ddini in Uncategorized