So first things first, let’s get something straight: cars are vehicles, but bicycles are vehicles as well. When you are riding your bicycle you should follow the rules of the road the same way cars do. Creating your own rules out of thin air creates confusion for everybody. You must integrate yourself with the rest of traffic as if you are a slow-moving vehicle, because that is what you are.
I've been psyching myself up to get into bicycling (there are two cycles, you say?). If this thread on Reddit is anything to go by, bikers in LA are tough as nails.
Recently GigaOM carried an article written by Derrick Harris entitled: "Facebook trapped in MySQL fate worse than death". The gist of the article is:
- Facebook is trapped using MySQL as their persistence data layer
- This is unfortunate, because MySQL is inappropriate
- It is inappropriate because MySQL is poor at large scales
The greatest counterargument, I suppose is, to quote John Siracusa (of Ars Technica and Hypercritical), that Facebook works. Can you imagine an architecture of greater scale? They have 700 million active users!
The article quotes Michael Stonebraker, who, as the creator of the Ingres database (and later the Postgres database. You'd think the greater eminence of the Postgres DB would ensure it's presence outside these parentheses, but look how wrong you are!), certainly knows what he's talking about when it comes to databases.
The tone of the article, however, and many like it, is this issue of painting scalability in an absolute sense - i.e. that data storage technology X either "scales" or "doesn't scale", from which one is led to infer that a service can or cannot be implemented upon said technology.
In truth, the scalability of data storage technologies can only be described in a relative sense - they are more scalable, or less scalable. Saying that something "doesn't scale" is lazy at best and misleading at worst.
The scaling problem in a nutshell The scalability of distributed systems and web applications in particular is typically talked about on quite vague terms. One can make a reasonable definition with a bit of investigation.
It seems clear that scalability contains within it this notion of what is happening to one's infrastructure as more clients are requesting resources from it.
As applied to distributed systems, scalability is used in two senses:
- How ability to handle load increases as you add resources
- Ability to handle increasing load in a graceful manner (for a fixed set of resources)
Notice that scaling is not performance.
Performance is about the rate of computation for fixed circumstances. Scalability is what happens as demand upon a system increases. For example, the maxsort algorithm has high performance for 5 numbers but poor scalability. The quicksort algorithm by contrast has high scalability.
Let's consider a web server, serving static content (e.g. a page containing only the text "Hello World!") and conduct an experiment - what happens as the number of requests upon it starts to increase?
One sees that two important things happen:
- Amount of time required per-request goes up
- Failed requests start to appear
What is the explanation for this?
Your web server has a maximum number of concurrent client requests it can serve. As connections are filling up this limit - no slowdown is yet noticeable client side. After the limit is reached, the OS initiates a backlog queue, of signals from clients (SYN signals from the TCP protocol) waiting to be processed. The size of this backlog queue is OS dependent.
As this backlog queue starts getting filled, client requests take longer to be fulfilled. Until ultimately, after the queue is filled, requests start to get dropped. Now, this is totally OS and machine context dependent, but typical values for Apache on a commodity machine when this starts happening are ~10^4 connections.
Recall this is simply for requesting a static page. As more complex requests occur, such as dynamically synthesizing the response to an SQL query, you can imagine the time + memory required to fulfill a request grows, and so the number of requests the server can receive before they start being dropped becomes smaller.
In a nutshell, then, the scaling problem is thus: we wish to prevent slow responses / dropped requests by the server. The way to do this is to prevent the request queues from filling up - and to do that one spreads requests across multiple machines. If your storage solution (such as MySQL) is better than another, then it will require fewer nodes to spread requests across. If it is poorer, more nodes to spread requests across.
Now, one can make the argument that were Facebook to use a different storage technology, they might need fewer resources. But this notion that they're on the brink of collapse is hyperbole. They simply have the requisite resources to make MySQL respond to their experienced load. Facebook pumped a sufficient number of machines into their MySQL based architecture to make their system work.
Technologies are more scalable, or less scalable. They are not scalable in an absolute sense. One can say, "well, we don't have the hardware to make this data storage system work" or, "Hey, we switched to data storage system X and now we can serve twice as many requests per hour" but it is not meaningful simply to say "X does not scale." It is a lazy answer.
Swift as a deer. Quiet as a shadow. Fear cuts deeper than swords.Quick as a snake. Calm as still water. Fear cuts deeper than swords. Strong as a bear. Fierce as a wolverine. Fear cuts deeper than swords. The man who fears losing has already lost. Fear cuts deeper than swords. Fear cuts deeper than swords. Fear cuts deeper than swords.
From A Song of Ice and Fire.
Jorge Cham of PhD Comics conducted an interview with physicists Daniel Whiteson and Jonathan Feng, then set the audio of the interview to this amazing video comic type thing. Can't really do it justice - you need to go and watch this thing. Watch it and be inspired and amazed.
Let's look at what the universe is made out of, like a pie chart. 5% of it - stuff we know. 20% - dark matter. 75% of it - we have no idea.
That's a lot of stuff.
Too many people think "Yeah scientists mostly have it figured out. They're down studying the details of the details of the details." But we have no idea! We're only now, by looking at the details, realizing what the questions we should be asking are.
When I obtained an undergrad degree in Physics during the first Earth age, that was certainly the prevailing sentiment. I think perhaps it was due to Physics grad students being a very, very depressed lot.
In the vid, Whiteson and Feng discuss the state of modern particle physics, what is really known about the universe, and most significantly the vast uncharted future.
Dont miss it.