In the first part of my answer to this question I pointed out two big facts: that almost all languages default to SVO or SOV order, and that the differences in frequency of word order preferences can be neatly explained with three principles that describe specific tendencies across languages when choosing the order of words - "Theme First," "Animated First," and "Verb-Object Bonding" - which amount to putting subjects early and keeping verbs and objects adjacent. These principles describe the data pretty well - default orders with the subject in between the verb and the object are about five times less frequent, and default orders with the object before the subject are about fifteen times less frequent.
I briefly explained verb-object bonding two weeks ago but left as an open question why subjects tend to come earlier in the sentence throughout the world. The category of a "subject" is in many ways a combination of two others: the agent (the doer that performs an action) and the topic (the thing which is being discussed). Neatly, the "Animated First" principle says that the more animate or agent-like arguments usually come first, and the "Theme First" principle says that topics usually come first. With two principles both telling us that subjects should come before objects, it's hardly a wonder that they almost always do. The reasons behind these principles are the mystery, though, which a 2010 paper by Luke Maurits attempts to answer.
Before you can understand the purpose of the experiment you need to be familiar with a certain concept: speakers and listeners are in an eternal battle. Speakers want to say as little as possible, so as not to waste everyone's time and energy, but listeners want to understand them. As a result listeners have to depend a lot on thinking ahead - a speaker can throw 20 phonemic segments at them every second, and if they have to spend extra processing time dealing with an unexpected word they can lose the whole sentence. If you've ever tried listening to someone speak a language you only kinda know, you've had that experience, when spending half a second thinking about a word you've never heard before fatally interrupts your grasp on the rest. What runs out in that moment is working memory. It's a finite resource and listening to even the most well-constructed sentence uses a lot. The ideal sentence has a constant rate of information flow, with minimal unexpected information.
Maurits is a computer scientist by training and took a modeling approach, using a caricature world of ducks and cheese and three-word sentences to approximate the way that word order affects information density. At first it seems counterintuitive that the order of subject, verb, and object should affect the rate at which the listener receives information, but the model revealed that some word orders are better than others at managing unpredictability. The explanation hinges on two key points: some arguments are more predictable than others, and each argument implies things about what the other arguments might be.
Subjects are very predictable. In English subjects are definite about 96% of the time, meaning that they've been mentioned in a recent sentence, so before the listener hears a single word of a sentence they know there's a pretty small applicant pool for the role of subject. In fact, the subject is usually the subject of previous sentences too, so it's practically a freebie for the listener. Notice that we usually throw in a whole extra clause if we're planning on using a previously unmentioned subject ("There was this guy, and he...").
Verbs are somewhat predictable too. The number of verbs in a language is usually smaller than the number of nouns, and if you know some other parts of the sentence you have a pretty good shot at guessing the verb.
Objects are the real wild cards. Objects are much more likely than subjects to be indefinite, never introduced previously in the conversation, and the number of objects a verb could conceivably take is usually large due to the sheer variety of nouns in the world. If you know the subject and the verb already it might not be so tough to guess, but without that information it can get pretty unpredictable. It's definitely kinder to your listener to put it closer to the end, after the subject for sure.
So this at last tells us why the most prevalent word orders put nice tame subjects before prickly, unpredictable beasts like objects. Systems that don't are unstable - syntactic operations like moving topics to the beginning, which are common in many languages, might ultimately put the kibosh on a default word order that listeners don't want to hear.
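To make the idea of "information flow" a bit more concrete, here is a minimal sketch in Python. It is not Maurits's actual model: the toy world, the event probabilities, and the function names are all made up for illustration. It takes a handful of three-word events, reorders them according to each of the six possible word orders, and computes how many bits of surprise the listener can expect at each position, given the words already heard. The total is the same for every order (the sentence carries the same information no matter how you arrange it), so the only thing word order changes is how evenly that information is spread out.

```python
import math
import statistics
from collections import defaultdict

# Hypothetical toy world: (subject, verb, object) events.
# The probabilities are invented for illustration, not taken from any paper.
EVENTS = {
    ("duck", "eats", "cheese"): 0.30,
    ("duck", "eats", "bread"): 0.20,
    ("duck", "sees", "cheese"): 0.10,
    ("duck", "sees", "mouse"): 0.10,
    ("mouse", "eats", "cheese"): 0.20,
    ("mouse", "sees", "duck"): 0.10,
}
ROLE = {"S": 0, "V": 1, "O": 2}


def positional_information(events, order):
    """Expected surprisal (in bits) at each position for an order such as 'SOV'."""
    idx = [ROLE[r] for r in order]
    # Probability of hearing each possible sentence prefix (0 to 3 words long).
    prefix_prob = defaultdict(float)
    for event, p in events.items():
        words = tuple(event[i] for i in idx)
        for k in range(4):
            prefix_prob[words[:k]] += p
    info = []
    for k in range(3):
        expected = 0.0
        for event, p in events.items():
            words = tuple(event[i] for i in idx)
            # Surprisal of word k given the k words that came before it.
            cond = prefix_prob[words[: k + 1]] / prefix_prob[words[:k]]
            expected += p * -math.log2(cond)
        info.append(expected)
    return info


def unevenness(info):
    """Variance of per-position information; 0 would be a perfectly even flow."""
    return statistics.pvariance(info)


for order in ("SOV", "SVO", "VSO", "VOS", "OVS", "OSV"):
    info = positional_information(EVENTS, order)
    print(order, [round(x, 3) for x in info], round(unevenness(info), 3))
```

Which order comes out most even depends entirely on the probabilities you feed in, so treat the ranking from this toy table as a property of the invented numbers rather than a finding about real languages. Using variance as the measure of unevenness is also just one simple choice among several.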
Another Maurits paper about word order evolution