Basically the way it works is we have a master database server and a batch of slave servers. The idea is that on pages where we only need to read data (especially slower lookups like keyword searches), we use a slave server. The slaves are load balanced, so if one is busy the site picks a different one, which keeps page loads nice and quick.
When we need to add any information, like posting a message or submitting content, it gets written to the master database. We never write data directly to the slave servers; instead, they use a sort of query queue. When a new item gets added to the master database, that query is added to each slave's queue, and the queued queries all get run in order.
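To make that concrete, here's a minimal Python sketch of that read/write split. The host names and the simple round-robin picker are made up for illustration; the real site has its own load balancer and connection handling.

```python
import itertools

# Hypothetical host names, purely for illustration.
MASTER = "db-master.example.com"
SLAVES = ["db-slave-1.example.com", "db-slave-2.example.com", "db-slave-3.example.com"]

# Simple round-robin load balancing across the read-only slaves.
_slave_pool = itertools.cycle(SLAVES)

def pick_server(query_is_write: bool) -> str:
    """Route writes to the master; spread reads across the slaves."""
    if query_is_write:
        return MASTER
    return next(_slave_pool)

# Reads (e.g. a keyword search) hit a slave; writes hit the master.
print(pick_server(query_is_write=False))  # db-slave-1.example.com
print(pick_server(query_is_write=True))   # db-master.example.com
```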
The problems we have been experiencing come from this queue on the slave servers falling far behind the master. Normally the queued queries run almost immediately, but some of our big automated jobs take a while, and the slaves fall behind. The site still runs fast because everything is doing what it's supposed to, but new data (such as a newly submitted movie) doesn't exist on the slave server yet, so the submission page throws an error. After a few minutes, the slave catches up and the submission magically works.
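For the curious, here's a toy Python sketch of why the error happens, along with one standard mitigation (often called "read your own writes": route a user's reads to the master for a short window after they write, since the slave may not have the row yet). The in-memory dicts, the lag constant, and the function names are all hypothetical stand-ins, not our actual code.

```python
from __future__ import annotations

import time

# Hypothetical in-memory stores standing in for the master and a
# lagging slave; in reality these are separate database servers.
master_db: dict[int, str] = {}
slave_db: dict[int, str] = {}
REPLICATION_LAG = 5.0  # seconds we assume the slave may be behind

last_write_at: dict[str, float] = {}  # per-user time of last write

def submit_movie(user: str, movie_id: int, title: str) -> None:
    """Writes always go to the master."""
    master_db[movie_id] = title
    last_write_at[user] = time.time()

def get_movie(user: str, movie_id: int) -> str | None:
    """If this user wrote recently, the slave may not have the row
    yet, so read from the master instead of a slave."""
    if time.time() - last_write_at.get(user, 0.0) < REPLICATION_LAG:
        return master_db.get(movie_id)  # slave might still be behind
    return slave_db.get(movie_id)

submit_movie("alice", 42, "A New Hope")
print(get_movie("alice", 42))  # "A New Hope", served from the master
```

Without that check, the read would go to the slave, find nothing, and the submission page would error out exactly the way described above.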
Anyway, this latency is something that obviously should not have been happening, and that's why we had the downtime yesterday.