Model Building/Retraining Automation @ShareThis Using the H2O Platform

Dr. Changyi Zhu, Principal Engineer

An H2O (version 2, h2o-2) REST API interface in Java, shown below, has been created at ShareThis to automate model building and retraining.

To improve campaign performance, machine-learned models must be refreshed continuously so that they are tuned on recent data, and that requires an automated pipeline. We use the H2O platform ( http://h2o.ai/ ) to build models. Its UI is handy for experimenting with new models, but once a model algorithm has been chosen, retraining with more recent data and pushing the result to production should be much more automated (API driven). Since most of our infrastructure is in Java, we wanted to build a Java interface that makes this easy.

image


Prepare Model Builder Input Properties

A default input property file (for a GBM or GLM model) is provided by this interface. The JSON shown below can be modified for a specific (GBM) model.
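As a rough illustration of how a default property file can be combined with model-specific overrides, here is a minimal sketch using java.util.Properties. The class name GbmPropertiesSketch and the property names and values (ntrees, max_depth, learn_rate) are assumptions made for this example. They are not taken from the ShareThis interface, whose actual property names are not shown here.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

final class GbmPropertiesSketch {
    // Hypothetical defaults; the real default property file may use other names and values.
    static Properties defaults() {
        Properties d = new Properties();
        d.setProperty("ntrees", "50");
        d.setProperty("max_depth", "5");
        d.setProperty("learn_rate", "0.1");
        return d;
    }

    // Loads model-specific overrides on top of the defaults.
    // Keys missing from the overrides fall back to the default values.
    static Properties load(Reader overrides) throws IOException {
        Properties p = new Properties(defaults());
        p.load(overrides);
        return p;
    }

    public static void main(String[] args) throws IOException {
        Properties p = load(new StringReader("ntrees=200\n"));
        System.out.println(p.getProperty("ntrees"));    // prints 200 (overridden)
        System.out.println(p.getProperty("max_depth")); // prints 5 (default)
    }
}
```

Keeping the defaults in a separate Properties object means a model-specific file only has to list the values that differ.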


image



image




Build the Model

The default bash script shown below can be modified to run the build (for a GBM model).

#!/bin/bash
# Run the GBM model builder with the given property file.
java -cp "$CLASSPATH" com.sharethis.service.h2o.model.GBM bin/res/gbm.properties

The interface generates a jar file for the model, which an application can load dynamically for optimization. Both the REST requests and the responses are saved in a log file for further analysis, in order to automate the model building/retraining process.
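One common way for an application to load such a jar at runtime is java.net.URLClassLoader. The sketch below is an illustration only: the class name ModelJarLoaderSketch and the method loadModel are hypothetical, and it assumes the model class has a public no-argument constructor, which the generated jar may or may not provide.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.file.Path;

final class ModelJarLoaderSketch {
    // Loads the named class from the given jar and creates an instance of it.
    // Assumption: the class has a public no-argument constructor.
    static Object loadModel(Path jar, String className)
            throws ReflectiveOperationException, MalformedURLException {
        URL[] urls = { jar.toUri().toURL() };
        // The loader is deliberately left open: the model class may need to
        // load further classes from the same jar after it has been created.
        URLClassLoader loader =
                new URLClassLoader(urls, ModelJarLoaderSketch.class.getClassLoader());
        Class<?> cls = Class.forName(className, true, loader);
        return cls.getDeclaredConstructor().newInstance();
    }
}
```

A caller would pass the path of the generated jar and the fully qualified name of the model class, then cast the result to whatever scoring interface the application uses. Retraining can then be picked up by loading the new jar with a fresh class loader.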

Analyze the REST Response for Model Building/Retraining

The log file can be loaded by an analyzer using java.util.Properties, and the REST responses, which are saved as key/value pairs, can be accessed through the following keys:

public static final String IMPORTFILERESPONSE
public static final String PARSEFILERESPONSE
public static final String MODELBUILDRESPONSE
public static final String MODELPREDICTRESPONSE
public static final String MODELINSPECTRESPONSE
public static final String MODELCALCAUC_RESPONSE
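Reading the log with java.util.Properties could look roughly like the sketch below. The key string "model.build.response" is a placeholder invented for this example, since the values of the constants above are not given here, and the class ResponseLogReaderSketch is hypothetical as well.

```java
import java.io.IOException;
import java.io.Reader;
import java.io.StringReader;
import java.util.Properties;

final class ResponseLogReaderSketch {
    // Placeholder key; the real key strings are defined by the interface's constants.
    static final String MODEL_BUILD_KEY = "model.build.response";

    // Parses the log, which is assumed to be in java.util.Properties key/value format.
    static Properties load(Reader log) throws IOException {
        Properties p = new Properties();
        p.load(log);
        return p;
    }

    // Returns the raw response text for the model build step, or "" if it is absent.
    static String modelBuildResponse(Properties log) {
        return log.getProperty(MODEL_BUILD_KEY, "");
    }

    public static void main(String[] args) throws IOException {
        Properties log = load(new StringReader("model.build.response={\"status\":\"done\"}\n"));
        System.out.println(modelBuildResponse(log)); // prints {"status":"done"}
    }
}
```

In a real analyzer the Reader would come from the log file on disk, for example through java.nio.file.Files.newBufferedReader.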

The response values can be deserialized into a list of Java objects using the classes included in the interface. An analyzer, planned as part of a future project, can then check those responses against a given set of rules to tune the input properties for a specific model.