Batch Updating Plugins
Batch updating allows your plugin to update or set one or more user properties in the current project's data in large batches. An example would be a plugin that predicts, for every user in the current project, how likely that user is to convert.
For each user it saves a number from 0.0 to 1.0 based on how likely it is that the user will convert. This property can then be reused everywhere on the platform, such as in segments, report filters and more.
Configuring plugin as batch updating
To make your plugin support batch updating, the JSON results of one of its stages need to include an object with the key batches at the root. This can be added to the output JSON of the initial stage or of any additional stage. If multiple stages output a batches object, the one from the last stage is used.
Here is a basic batches object specification that is added on the trainMore stage:
As can be seen in the example above, the batches object is pretty straightforward:
- maxBatchSize: The number of user objects to score in a single batch. Defaults to 10,000; the minimum is 1,000 and the maximum is 10,000,000.
- options: Optional. This object is passed on to the manifest of the batch stage, so you can easily share parameters from, for example, the model training stage with the batch stage.
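A minimal Python sketch of how a stage such as trainMore might attach the batches object to its results before writing them out. The helper names are hypothetical, the client-side range check is an assumption (the platform may enforce the limits on its own), and the options key in the usage line is made up for illustration; only the limits themselves come from the list above.

```python
import json
from typing import Any, Optional

# Documented limits for maxBatchSize.
DEFAULT_BATCH_SIZE = 10_000
MIN_BATCH_SIZE = 1_000
MAX_BATCH_SIZE = 10_000_000


def build_batches(max_batch_size: int = DEFAULT_BATCH_SIZE,
                  options: Optional[dict[str, Any]] = None) -> dict[str, Any]:
    """Return a batches object, rejecting sizes outside the documented range."""
    if not MIN_BATCH_SIZE <= max_batch_size <= MAX_BATCH_SIZE:
        raise ValueError(
            f"maxBatchSize must be between {MIN_BATCH_SIZE} and "
            f"{MAX_BATCH_SIZE}, got {max_batch_size}"
        )
    batches: dict[str, Any] = {"maxBatchSize": max_batch_size}
    if options is not None:
        # options is optional; it is copied into the batch stage manifest.
        batches["options"] = options
    return batches


def add_batches_to_results(results: dict[str, Any], **kwargs: Any) -> dict[str, Any]:
    """Return a copy of a stage's results with batches at the root."""
    updated = dict(results)
    updated["batches"] = build_batches(**kwargs)
    return updated


if __name__ == "__main__":
    # Hypothetical options shared with the batch stage.
    results = add_batches_to_results({}, max_batch_size=5_000,
                                     options={"modelFile": "model.pkl"})
    print(json.dumps(results))
```

Keeping the limits in one helper means a typo in maxBatchSize fails during the training stage rather than later in the batch run.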
Batch updating plugin manifest
When a plugin supports batch updating, it receives a JSON manifest to read in the batch stage as well, just as in any other stage such as the initial one.
During local development, get your batch stage manifest by doing a GET request like this. Note that the last path segment, which specifies the stage, needs to be batch:
Store the result in a file called batch-manifest.json, as we'll use it later to launch the plugin's batch process.
Here is what the JSON manifest will look like:
First, note that stage will always be batch for the batch updating part of a plugin. When the plugin loads the JSON manifest and finds that manifest.stage == 'batch', it simply starts the batch scoring part of the plugin code.
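The stage check might be sketched in Python as follows. The handler names run_batch and run_other_stage are hypothetical, and how the manifest path reaches your plugin (here simply a path passed to main) is an assumption; options defaults to an empty object in case the batches object had none. The handlers return the stage name only so the routing is easy to see.

```python
import json
from typing import Any


def load_manifest(path: str) -> dict[str, Any]:
    """Read the JSON manifest handed to the plugin."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


def run_batch(manifest: dict[str, Any]) -> str:
    # Hypothetical batch scoring entry point. options is the copy of the
    # options given in the batches object of an earlier stage.
    options = manifest.get("options", {})
    # Fetch the batch from dataUrls, score it with options, write the output files.
    _ = options
    return "batch"


def run_other_stage(manifest: dict[str, Any]) -> str:
    # Hypothetical handler for initial, trainMore and other non-batch stages.
    return manifest["stage"]


def dispatch(manifest: dict[str, Any]) -> str:
    """Route to batch scoring only when the manifest says stage == 'batch'."""
    if manifest.get("stage") == "batch":
        return run_batch(manifest)
    return run_other_stage(manifest)


def main(manifest_path: str) -> str:
    return dispatch(load_manifest(manifest_path))
```

Your plugin entry point would then call something like main("batch-manifest.json") with whatever path it receives.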
Data URLs are available under dataUrls, because the batch stage needs to fetch a batch of, say, 1,000 user records with their features and score each of them. Any of the datasets requested in the initial or additional stages are available here. Note that each data URL has range_start_gt_or_eq and range_end_lt query parameters appended automatically, so that each batch contains roughly the number of users requested in earlier stages through the batches object.
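A small Python sketch for inspecting and fetching a batch. It assumes dataUrls maps dataset names to URLs and returns raw bytes, since the data format is not described in this section; the helper names and the example URL are hypothetical. The URLs are used as-is, because the range parameters are already appended.

```python
from typing import Any
from urllib.parse import parse_qs, urlparse
from urllib.request import urlopen


def batch_range(data_url: str) -> tuple[str, str]:
    """Return the (range_start_gt_or_eq, range_end_lt) values of a data URL."""
    query = parse_qs(urlparse(data_url).query)
    return query["range_start_gt_or_eq"][0], query["range_end_lt"][0]


def fetch_batch(data_url: str) -> bytes:
    """Download one batch as raw bytes, using the URL exactly as given."""
    with urlopen(data_url) as response:
        return response.read()


def fetch_all_batches(manifest: dict[str, Any]) -> dict[str, bytes]:
    # Assumes dataUrls maps dataset names to URLs; decode each result
    # according to the format of the dataset you requested.
    return {name: fetch_batch(url) for name, url in manifest["dataUrls"].items()}
```

For example, batch_range("https://example.com/users?range_start_gt_or_eq=0&range_end_lt=1000") returns ("0", "1000"), which is handy for logging which slice of users a run handled.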
To update the actual user properties, we need to upload a JSON file with the updated properties for each user. This is done via the batch property of getUploadUrls; more on this later.
You can download files uploaded to storage in previous stages using downloadUrls: take the URL for the relevant stage and append the file path used during uploading, for example ${manifest['downloadUrls']['initial']}/model.pkl.
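For a Python plugin, the URL construction from the example above might look like this. The sketch assumes downloadUrls maps stage names to base URLs, as the example suggests, and that the initial stage wrote model.pkl with pickle; download_url and load_pickled_model are hypothetical helper names.

```python
import pickle
from typing import Any
from urllib.request import urlretrieve


def download_url(manifest: dict[str, Any], stage: str, file_path: str) -> str:
    """Build <downloadUrls[stage]>/<file_path>, as in the model.pkl example."""
    return f"{manifest['downloadUrls'][stage]}/{file_path.lstrip('/')}"


def load_pickled_model(manifest: dict[str, Any]) -> Any:
    # Assumes the initial stage uploaded a pickled model as model.pkl.
    # Only unpickle files your own plugin produced: unpickling can run code.
    local_path, _ = urlretrieve(download_url(manifest, "initial", "model.pkl"))
    with open(local_path, "rb") as f:
        return pickle.load(f)
```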
The options object is an exact copy of what was specified under the batches object in the earlier stage. The metadata object is exactly the same as in the previous stages.
Batch updating user properties
A batch updating plugin should generate two files.
The first is the usual results.json, which, unlike in regular plugin runs, has no data or similar properties but only the status object, like below:
If any of the batches doesn't have a success code, the plugin run as a whole will fail.
The other file is data.json, which contains the actual users and the property values that should be updated or set:
As can be seen in the example above, the data.json file is pretty straightforward:
- category: Optional, but recommended. Each time this plugin runs, the properties are saved under the name of the plugin plus the timestamp of the run. If category is given, those properties end up under their own sub-menu in the user properties menu. If it is not used, the properties appear at the root of the user properties.
- properties: An array of strings. Each element is the name of a user property being set or updated. Needs to have at least one element.
- updates: An array of arrays. Each element starts with the user_id (see Dataset and features), and each value after that corresponds, in order, to a name in the properties array. So in this example the values "A", "A", "B", "C" are for the "Class" property, while 0.43, 0.59, 0.37, 0.01 are for the "Rating" property. The first element thus updates user_1 and sets Class for that user to A and Rating to 0.43.
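A minimal Python sketch that assembles and writes data.json in this shape, checking that every row has exactly one value per property after the user_id. The function names and the category value are hypothetical; the row for user_1 reuses the values described above.

```python
import json
from typing import Any, Optional, Sequence


def build_data(properties: Sequence[str],
               updates: Sequence[Sequence[Any]],
               category: Optional[str] = None) -> dict[str, Any]:
    """Assemble the data.json payload, validating the shape of each row."""
    if len(properties) < 1:
        raise ValueError("properties needs at least one element")
    for row in updates:
        # Each row is [user_id, value for properties[0], value for properties[1], ...].
        if len(row) != 1 + len(properties):
            raise ValueError(f"row {row!r} does not match properties {list(properties)}")
    data: dict[str, Any] = {
        "properties": list(properties),
        "updates": [list(row) for row in updates],
    }
    if category is not None:
        data["category"] = category
    return data


def write_data(path: str, data: dict[str, Any]) -> None:
    """Write the payload as JSON, e.g. to data.json."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(data, f)
```

Usage might be write_data("data.json", build_data(["Class", "Rating"], [["user_1", "A", 0.43]], category="Predictions")). Validating row length up front catches an off-by-one between properties and values before anything is uploaded.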
Running the plugin
Once you're ready to test your batch updating plugin, start it with the command below, where batch-manifest.json is the file generated earlier:
Once the run has finished, you should have a results.json and a data.json file.
Your code should automatically upload the data.json file using the batch key of getUploadUrls, so you'll have to add that part before your plugin is complete. First, get a signed upload URL by doing a GET request on the batch URL of getUploadUrls with /data.json appended. Then make a PUT request on that signed upload URL as-is, sending the data.json file. Here's the flow using curl:
You can implement this either with an HTTP request library for your language, or by calling curl through a system exec from your plugin code, as curl is available in the plugin runner environment.
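Since curl is available in the plugin runner environment, the system-exec route might look like this in Python. It assumes getUploadUrls maps keys such as batch to URLs and that the GET response body is the signed upload URL as plain text; that response format is not spelled out here, so verify it before relying on this sketch. The helper names are hypothetical; --upload-file makes curl send a PUT over HTTP(S).

```python
import subprocess
from typing import Any

CURL = ["curl", "--silent", "--show-error", "--fail"]


def signed_url_request_cmd(manifest: dict[str, Any]) -> list[str]:
    """curl command for the GET on the batch upload URL with /data.json appended."""
    return CURL + [f"{manifest['getUploadUrls']['batch']}/data.json"]


def upload_cmd(signed_url: str, file_path: str = "data.json") -> list[str]:
    """curl command that PUTs the file to the signed URL, used as-is."""
    return CURL + ["--upload-file", file_path, signed_url]


def upload_data_json(manifest: dict[str, Any], file_path: str = "data.json") -> None:
    # Assumption: the GET response body is the signed upload URL in plain text.
    signed_url = subprocess.run(
        signed_url_request_cmd(manifest),
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    # check=True raises CalledProcessError if curl reports an HTTP error (--fail).
    subprocess.run(upload_cmd(signed_url, file_path), check=True)
```

Building the argument lists separately from running them keeps the URLs easy to log and avoids shell quoting issues, since no shell is involved.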