Among all the Facebook papers I have read so far, the Unicorn paper was a particularly rough read. I'm not sure whether that is because it is more of a search optimization paper than a pure distributed systems paper. I ended up with a very wordy mind map, which sucks.
One interesting thing in the paper is the inner apply truncation problem. It's pretty similar to the expensive JOIN problem that all relational databases face. I wonder how hard it would be for Facebook to apply the optimizations it developed for inner apply truncation to the MySQL database it uses as backing storage. Of course, going down this path, Facebook would end up building a big wrapper layer on top of MySQL, which is partly what Unicorn is. But Unicorn is apparently a lot more than that, and a lot more tailored to Facebook's social use cases than a simple wrapper layer that only solves the scale-out and JOIN problems.
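To make the truncation problem concrete, here is a minimal Python sketch. It is not Unicorn's code or API. It assumes that apply works as a two-step query: an inner query fetches a set of ids (here, a user's friends), and an outer query unions the lists looked up for each of those ids (friends of friends). The `FRIENDS` dictionary, the function names, and the `inner_limit` parameter are all hypothetical, made up for illustration.

```python
from typing import Dict, List, Set

# Hypothetical toy adjacency list standing in for a "friend" index.
FRIENDS: Dict[int, List[int]] = {
    1: [2, 3, 4],
    2: [1, 5],
    3: [1, 6],
    4: [1, 7],
}


def friends_of(user_id: int) -> List[int]:
    """Look up one user's friend list (empty if the user is unknown)."""
    return FRIENDS.get(user_id, [])


def friends_of_friends(user_id: int, inner_limit: int) -> Set[int]:
    """Two-step lookup: inner query, truncated, then outer union."""
    # Inner query: the user's friends, cut off at inner_limit results.
    inner = friends_of(user_id)[:inner_limit]
    # Outer query: union of the friend lists of every inner result.
    outer: Set[int] = set()
    for friend_id in inner:
        outer.update(friends_of(friend_id))
    outer.discard(user_id)  # the user is not their own friend-of-friend
    return outer


full = friends_of_friends(1, inner_limit=3)
truncated = friends_of_friends(1, inner_limit=2)
missed = full - truncated
print(full, truncated, missed)
```

With the limit set to 2, friend 4 is dropped from the inner result, so user 7 never shows up in the outer result even though they are a valid friend of a friend. This is the same shape as a JOIN where one side is cut short before the join runs.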
Besides the paper itself, there are two posts on Facebook's engineering blog:
Indexing and ranking in Graph Search
Building out the infrastructure for Graph Search
They are useful for understanding this magical unicorn. The first one gives some background on the problem they want to solve. The second gives a nice explanation of the tier structure, the job of the vertical aggregators and top aggregators, the forward index, etc. In the paper, some of these components and concepts are scattered across sections, so it's a bit hard to keep track of and understand everything. The second post helped me reorganize that information after I read the paper.