The View From Mount Olympus

What did we do?
Here at Gibraltar Labs, we’ve been working on a Social Media analyser that we call Socialyze. This cloud-based application can analyse a feed from a given Social Media account, say Twitter, and provide you with a number of real-time and aggregated graphs that help you visualize the vast amounts of information contained within your stream.

For the duration of the Olympic Games, we have turned Socialyze on and told it to capture all tweets that contain the official hashtags #Olympics and #London2012. At the time of writing, our database contains some 19 million tweets.

Why did we do it?
One of the reasons that we turned Socialyze loose on the Olympic feed was to demonstrate to potential customers that Socialyze can handle vast quantities of information. Currently, the database grows at around 1.6 million tweets per day. There can’t be many potential customers out there who have a Social Media stream as large as that. If you do, let us know; we’d love to turn Socialyze loose on it to see how it copes.

How did we do it?
Socialyze is a cloud-based application. The Olympic instance runs on Amazon Web Services, but there is no reason why it couldn’t run on Azure or any other cloud provider.

The tweets are streamed to the instance via the Twitter Streaming API and stored in a MongoDB database. For business reasons, this database is neither sharded nor replicated.

The results page is driven by HTML and JavaScript (jQuery) and talks to a RESTful API on the Olympic instance that is written in Node.js. In turn, Node.js passes the calculation off to an engine that is written in Python. This engine checks whether the calculation is already available in the cache. If it is, the cached version is returned; if not, the calculation is made and stored in the cache. Redis provides this cache.
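
To make the cache check concrete, here is a minimal sketch of that "look in the cache first, otherwise calculate and store" pattern. It is an illustration only, not the Socialyze engine itself: the names `cached_calculation` and `InMemoryCache` are hypothetical, results are assumed to be JSON-serialisable, and the 60-second expiry is an arbitrary choice. The stand-in cache mimics just the `get` and `set(..., ex=...)` calls of a redis-py client so that the example runs without a Redis server.

```python
import json


class InMemoryCache:
    """Hypothetical stand-in for a Redis client; only get/set are mimicked."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        # Like redis-py, return None when the key is absent.
        return self._store.get(key)

    def set(self, key, value, ex=None):
        # `ex` (expiry in seconds) is accepted but ignored by this stand-in.
        self._store[key] = value


def cached_calculation(cache, key, compute, ttl_seconds=60):
    """Return the cached result for `key`, computing and storing it on a miss."""
    raw = cache.get(key)
    if raw is not None:
        # Cache hit: decode the stored JSON and skip the calculation.
        return json.loads(raw)
    # Cache miss: run the calculation, then store it for later requests.
    result = compute()
    cache.set(key, json.dumps(result), ex=ttl_seconds)
    return result
```

With a real Redis server, a `redis.Redis()` client could be passed in place of `InMemoryCache()`, since it offers `get` and `set` with an `ex` argument as well.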

What do the graphs mean?
We’ve chosen to show a number of graphs on the results page. They are:

  1. Top 5 Days by Posting Volume
    1. This graph shows you the days on which your audience was most active.
  2. Top 5 Countries by Posting Volume %
    1. This graph shows you where your top-posting audience is located geographically.
  3. Posting Acceleration – Last Hour
    1. The acceleration over the last hour shows you how fast interest is waxing (or waning in the case of negative acceleration) through your audience.
  4. What’s Hot – Last Hour
    1. This is a frequency distribution of the words used in the last hour and shows you what the “hot” topics have been.
  5. Top Posters by Volume – Last Hour
    1. Tells you who your most active posters have been in the last hour. Monitoring this graph will show you who is best at pushing out your marketing message at specific times of the day.
  6. Top Posters by Mentions – Last Hour
    1. This shows you who has been most talked about during the last hour.
  7. Top 20 Words by Frequency – Daily Summary
    1. As 4 above, but this shows you which topics have been “hot” throughout the day.
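
For readers who like to see measures as code, here is a rough sketch of how two of these graphs could be computed. It rests on assumptions, since the exact Socialyze formulas are not described here: "acceleration" is taken to be the change in posting count between consecutive fixed-size time buckets, and "what's hot" is taken to be a simple word count over the text of the posts. The function names, the 10-minute bucket size, and the tokenising pattern are all hypothetical.

```python
import re
from collections import Counter


def top_words(texts, n=20, stop_words=frozenset()):
    """Return the n most frequent words across the given post texts."""
    counts = Counter()
    for text in texts:
        # Lower-case, then keep runs of letters, digits, #, @ and apostrophes.
        for word in re.findall(r"[a-z0-9#@']+", text.lower()):
            if word not in stop_words:
                counts[word] += 1
    return counts.most_common(n)


def posting_acceleration(timestamps, window_start, bucket_seconds=600, buckets=6):
    """Return the change in post count between consecutive time buckets.

    Timestamps are seconds (for example, Unix time). A positive value means
    posting sped up from one bucket to the next; a negative value means it
    slowed down.
    """
    counts = [0] * buckets
    for ts in timestamps:
        index = int((ts - window_start) // bucket_seconds)
        if 0 <= index < buckets:  # ignore posts outside the window
            counts[index] += 1
    return [later - earlier for earlier, later in zip(counts, counts[1:])]
```

With the defaults, `posting_acceleration` covers one hour in six 10-minute buckets and returns five differences, one for each pair of neighbouring buckets.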

Stay tuned over the next few days and we’ll dig a little deeper into how we built this instance, how we calculate the measures, and what it all means.

