Sinandra: A blog engine using Sinatra and Cassandra

At GroupDock, we use Cassandra to store our Activity Streams data, a choice we made after evaluating several storage options. While making this decision, we built a simple blog application on Cassandra to become familiar with its data model. We understand that people get tired of 'blog' applications, but the data model for a blog is so simple and so familiar that most developers could write a blog engine in their sleep. That lets us focus on Cassandra and avoids distractions that could blur what we are trying to learn.

Sinatra is a great Ruby web framework for building simple web applications. I find it particularly useful for small examples and prototypes because it lets you define everything in one file.

You can find the code for our Cassandra-backed Sinatra blog at github.com/groupdock/sinandra. Yes, we know! We got reaaaally creative with the name.

We've released this code as open source so that people learning Cassandra have one more example to look at. You probably shouldn't use this code in production as is. The code is short and simple, so we won't explain how everything works in this post. We will, however, briefly go over the Cassandra data model we designed for the blog.

Below is the Cassandra storage configuration for the blog.

We store the actual content of every blog post in the BlogEntries column family. We set CompareWith for this column family to BytesType because we do not care about the order in which Cassandra stores this data. We keep track of the order ourselves using the TaggedPosts column family, whose CompareWith is TimeUUIDType. When we insert a new blog entry, we always add an entry to the TaggedPosts column family under the __notag__ tag. This is our catch-all tag. Whenever we want every blog post (as we do on the blog home page), we use the __notag__ tag to fetch them from the TaggedPosts column family. Because this column family uses a CompareWith of TimeUUIDType, the entries come back in chronological order. What we store in the TaggedPosts column family is the UUIDs of the actual blog entries, which we then use to look up the data in the BlogEntries column family.
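To make the two-step lookup concrete, here is a minimal in-memory sketch in Ruby. It is not the Sinandra code and does not use the Cassandra client. The class name PostIndex, the 'title' and 'body' column names, and the time_key helper are all hypothetical. In particular, time_key is only a stand-in for a TimeUUID: a [microseconds, counter] pair that sorts chronologically, which is the property we rely on from TimeUUIDType.

```ruby
require 'securerandom'

# In-memory stand-in for the BlogEntries and TaggedPosts column families.
# Illustrative only: real reads and writes would go through a Cassandra client.
class PostIndex
  CATCH_ALL_TAG = '__notag__'

  def initialize
    @blog_entries = {}                                        # post id => columns
    @tagged_posts = Hash.new { |rows, tag| rows[tag] = {} }   # tag => { time key => post id }
    @counter = 0
  end

  # Store the content, then index the post id under the catch-all tag.
  def add_post(title, body, now = Time.now)
    id = SecureRandom.uuid
    @blog_entries[id] = { 'title' => title, 'body' => body }
    @tagged_posts[CATCH_ALL_TAG][time_key(now)] = id
    id
  end

  # Home page read: ids come back in time order, then each id is resolved
  # against the content store.
  def all_posts
    @tagged_posts[CATCH_ALL_TAG]
      .sort_by { |key, _id| key }
      .map { |_key, id| @blog_entries[id] }
  end

  private

  # Hypothetical substitute for a TimeUUID (assumption, see lead-in).
  def time_key(now)
    @counter += 1
    [(now.to_f * 1_000_000).to_i, @counter]
  end
end
```

The content store never needs to be ordered; only the index does, which is why BytesType is enough for BlogEntries.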

If the user added tags to a blog post, we store an additional entry for each tag in the TaggedPosts column family. We also use the Lists column family to keep track of all the tags. We query it to show the list of tags in the sidebar of our blog.
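The tagging scheme can be sketched the same way. Again this is an in-memory illustration, not the code from the repository. The TagIndex class, the 'tags' key in the Lists stand-in, and the choice to store each tag as a column name with an empty value are assumptions made for the example.

```ruby
# In-memory sketch of tagging with TaggedPosts and Lists. Illustrative only.
class TagIndex
  CATCH_ALL_TAG = '__notag__'

  def initialize
    @tagged_posts = Hash.new { |rows, tag| rows[tag] = {} }   # tag => { time key => post id }
    @lists = Hash.new { |rows, name| rows[name] = {} }        # list name => { item => '' }
    @counter = 0
  end

  # Index a post under the catch-all tag plus every tag the user supplied,
  # and remember each user tag in the 'tags' list (assumed key name).
  def index_post(post_id, tags, now = Time.now)
    ([CATCH_ALL_TAG] + tags).uniq.each do |tag|
      @tagged_posts[tag][time_key(now)] = post_id
      @lists['tags'][tag] = '' unless tag == CATCH_ALL_TAG
    end
  end

  # Tag names for the sidebar.
  def all_tags
    @lists['tags'].keys.sort
  end

  # Post ids for one tag, oldest first.
  def post_ids_for(tag)
    @tagged_posts[tag].sort_by { |key, _id| key }.map { |_key, id| id }
  end

  private

  # Hypothetical stand-in for a TimeUUID: sorts chronologically.
  def time_key(now)
    @counter += 1
    [(now.to_f * 1_000_000).to_i, @counter]
  end
end
```

Keeping the tag names in a separate list means the sidebar can be rendered without scanning every key in TaggedPosts.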

To build an archives page that displays posts by the month they were created, we create a key for each month that combines the month and the year (e.g. October 2010). We then use the Archives column family in much the same way as the TaggedPosts column family, inserting an entry there for each blog post. In essence, we are tagging the posts by month. We could have reused the TaggedPosts column family, but we created a separate one so that the intent is clearer. We use the Lists column family to keep a list of the month-year combinations that have posts. When displaying the archives page, we first read that list from the Lists column family, then use each entry to query the Archives column family for the UUIDs of the blog entries.
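As a rough sketch of the archives lookup, the month key can be produced with Time#strftime, and the page can be built with the two reads described above. The ArchiveIndex class, the 'archives' list name, and the in-memory hashes are assumptions for illustration, not the repository's code.

```ruby
# In-memory sketch of the Archives and Lists column families. Illustrative only.
class ArchiveIndex
  def initialize
    @archives = Hash.new { |rows, month| rows[month] = {} }   # month key => { time key => post id }
    @lists = Hash.new { |rows, name| rows[name] = {} }        # list name => { item => '' }
    @counter = 0
  end

  # Month-year key in the "October 2010" form.
  def self.month_key(time)
    time.strftime('%B %Y')
  end

  # Index the post under its month and record the month in the list.
  def index_post(post_id, created_at)
    month = self.class.month_key(created_at)
    @counter += 1
    # [microseconds, counter] is a hypothetical stand-in for a TimeUUID.
    @archives[month][[(created_at.to_f * 1_000_000).to_i, @counter]] = post_id
    @lists['archives'][month] = ''
  end

  # Two-step read: list the months first, then fetch each month's post ids.
  def archive_page
    @lists['archives'].keys.map do |month|
      ids = @archives[month].sort_by { |key, _id| key }.map { |_key, id| id }
      [month, ids]
    end
  end
end
```

In this sketch the months appear in the order they were first seen; how the real list is ordered depends on the comparator configured for the Lists column family.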

Finally, we use a Comments super-column family to keep track of the comments on the blog entries. This allows us to keep comments stored for each blog post in chronological order.
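One way to picture a super-column family is as a hash nested one level deeper: blog entry, then comment, then the comment's fields. The sketch below assumes time-ordered super-column names and 'author' and 'body' sub-columns. Both are guesses made for illustration, and CommentStore is a hypothetical in-memory class rather than anything from the repository.

```ruby
# In-memory sketch of a Comments super-column family. Illustrative only.
class CommentStore
  def initialize
    # post id => { time key (super column) => { 'author' => ..., 'body' => ... } }
    @comments = Hash.new { |rows, post_id| rows[post_id] = {} }
    @counter = 0
  end

  # Add one comment as a new super column under the post's row.
  def add_comment(post_id, author, body, now = Time.now)
    @counter += 1
    key = [(now.to_f * 1_000_000).to_i, @counter]   # stand-in for a TimeUUID
    @comments[post_id][key] = { 'author' => author, 'body' => body }
  end

  # All comments for a post, oldest first.
  def comments_for(post_id)
    @comments[post_id].sort_by { |key, _comment| key }.map { |_key, comment| comment }
  end
end
```

Because every comment lives under its blog entry's key, a single read for that key returns the whole thread in order.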

We hope that open-sourcing this example application helps someone out there understand Cassandra and its data model.


