Using the Strategy Pattern to Control Grouping of Data

By Allen Fung and Ajay Lakshminarayanarao

An important ShareThis product is the Social Data Feed, which our customers use to access social signals across the internet and consume country-level data about social events generated around the world. The Social Data Feed is a file that contains one event for each webpage that our users have seen, about 500 million events daily. Each event contains information such as the URL, geo, and timestamp. A customer recently requested that we split the events into multiple files by country of origin instead of delivering a single file.

We used the Strategy pattern to solve this problem, which let us support the new functionality while preserving the original behavior. The Strategy pattern allows an object to have some of its behavior defined by another object that follows a common interface. We applied it by creating two new classes that share the same method. One returns the country of an event. The other always returns an empty string, no matter what event is provided.

Here’s the Strategy that returned the country:

    class GroupByCountry:
        def doOperation(self, json):
            return json['geo']['ISO']

Here’s the Strategy that always returned the empty string:

    class NoGroupBy:
        def doOperation(self, json):
            return ""
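As a quick illustration of the shared interface, the sketch below calls both Strategies on the same event. The `geo` and `ISO` keys come from the code above; the rest of the sample event (the `url` and `timestamp` keys and all values) is made up for the example.

```python
class GroupByCountry:
    def doOperation(self, json):
        return json['geo']['ISO']


class NoGroupBy:
    def doOperation(self, json):
        return ""


# Hypothetical event; only the ['geo']['ISO'] path is taken from the post.
event = {
    "url": "http://example.com/page",
    "geo": {"ISO": "US"},
    "timestamp": 1400000000,
}

# Both classes expose doOperation, so the caller does not need to know
# which one it was given.
for strategy in (GroupByCountry(), NoGroupBy()):
    key = strategy.doOperation(event)
    print(type(strategy).__name__, repr(key))
```

Because the caller only relies on `doOperation`, swapping one Strategy for the other changes the grouping key without touching the calling code.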

The code below shows how these Strategies are used in the application that creates the Social Data Feed. This application takes a log file from a web server, adds information to each event, and outputs files with the added information. You can see the Strategy pattern used in line 5 of the code. The application will produce either multiple files or a single file depending on which Strategy is passed into the run method.

    def run(file, writer, worker, outFilename, strategy):
        with open(file) as f:
            for line in f:
                json = worker.addInfo(line)
                country = strategy.doOperation(json)
                writer.write(outFilename + "." + country, json)
        writer.close()
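To see how the choice of Strategy changes the output, here is a minimal end-to-end sketch. The real worker and writer are not shown in this post, so `JsonLineWorker` and `InMemoryWriter` are hypothetical stand-ins: the worker assumes each log line is a JSON object, and the writer collects events per filename in memory instead of writing to disk. The `feed` filename and the sample events are also made up.

```python
import json as jsonlib  # aliased so the post's variable name `json` can be kept
import os
import tempfile
from collections import defaultdict


class GroupByCountry:
    def doOperation(self, json):
        return json['geo']['ISO']


class NoGroupBy:
    def doOperation(self, json):
        return ""


class JsonLineWorker:
    """Hypothetical worker: assumes each log line is a JSON object."""
    def addInfo(self, line):
        return jsonlib.loads(line)


class InMemoryWriter:
    """Hypothetical writer: groups events by filename in memory."""
    def __init__(self):
        self.files = defaultdict(list)
        self.closed = False

    def write(self, filename, event):
        self.files[filename].append(event)

    def close(self):
        self.closed = True


def run(file, writer, worker, outFilename, strategy):
    with open(file) as f:
        for line in f:
            json = worker.addInfo(line)
            country = strategy.doOperation(json)
            writer.write(outFilename + "." + country, json)
    writer.close()


def demo(strategy):
    """Run the pipeline on three sample events; return event counts per file."""
    events = [{"geo": {"ISO": "US"}}, {"geo": {"ISO": "IN"}}, {"geo": {"ISO": "US"}}]
    with tempfile.NamedTemporaryFile("w", suffix=".log", delete=False) as tmp:
        tmp.write("\n".join(jsonlib.dumps(e) for e in events) + "\n")
    try:
        writer = InMemoryWriter()
        run(tmp.name, writer, JsonLineWorker(), "feed", strategy)
        return {name: len(items) for name, items in writer.files.items()}
    finally:
        os.unlink(tmp.name)


grouped = demo(GroupByCountry())  # one output name per country
single = demo(NoGroupBy())        # every event under the same name
print(grouped)
print(single)
```

In this sketch the `NoGroupBy` case produces a name ending in a trailing dot, because the code concatenates an empty key onto the base filename.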

The main alternative to Strategies is lambdas. Lambdas are more concise than Strategies, but they do not easily support adding methods or state. We also wanted to put each Strategy in its own module and allow the Strategy to be specified on the command line. With these requirements, lambdas became just as verbose as Strategies, so we used Strategies in the end.
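The trade-off can be sketched as follows. This is only an illustration under assumptions, not the application's actual code: the `--group-by` flag, the `STRATEGIES` registry, and the `parse_strategy` helper are hypothetical names, and both Strategies are kept in one file here for self-containment rather than in separate modules.

```python
import argparse


class GroupByCountry:
    def doOperation(self, json):
        return json['geo']['ISO']


class NoGroupBy:
    def doOperation(self, json):
        return ""


# Lambda equivalents: shorter, but there is no natural place to add
# further methods or state later.
group_by_country = lambda json: json['geo']['ISO']
no_group_by = lambda json: ""

# Hypothetical registry mapping a command-line value to a Strategy class.
STRATEGIES = {"country": GroupByCountry, "none": NoGroupBy}


def parse_strategy(argv):
    """Build the Strategy named by the (hypothetical) --group-by flag."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--group-by", choices=sorted(STRATEGIES), default="none")
    args = parser.parse_args(argv)
    return STRATEGIES[args.group_by]()


strategy = parse_strategy(["--group-by", "country"])
event = {"geo": {"ISO": "DE"}}  # made-up sample event
print(strategy.doOperation(event))
print(group_by_country(event))
```

Once each lambda needs a name, a home, and a lookup entry so it can be chosen at startup, most of its brevity is gone, which is consistent with the reasoning above.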