RedisAI: Thor’s Stormbreaker for Deep Learning deployment

Sherin Thomas · tensorwerk · Apr 20, 2019

Well, kinda :-)

TLDR; Do you need to deploy a deep learning model to production? Do you know how to operate Redis? There might be something in it for you.

2017–2019: We have seen the rise of a few model servers over the last three years from different tech giants and other organizations. A few promising ones are Google’s TensorFlow Serving, Amazon’s MXNet Model Server, Nvidia’s TensorRT and Berkeley’s Clipper. While all of these represent great options for deploying deep learning to production, the production landscape is still rife with opportunities for addressing the challenges of reliability and scale. And it’s not just a matter of scale: a multitude of questions pop into the heads of developers, data scientists and DevOps engineers when it’s time to move from a working prototype to production. We did an informal survey with friends and colleagues who have production deployments of deep learning models, in order to understand those questions and what the answers might sound like. A few of them are:

  • Which deep learning framework: the majority went with TensorFlow; PyTorch is on the rise;
  • Which programming language: Python won this one, obviously. Interesting fact - we even met people who had the entire backend stack built on Erlang but had to hire Python devs to manage the deep learning service;
  • What to do to scale: even though TensorFlow Serving + k8s won here, quite a good number of companies could handle their scale with single-node infrastructure;
  • Need for microservice-ization: yes, without a second thought;
  • GPU or CPU for inference: most people use GPU for training and CPU for inference;
  • How to monitor: the majority were satisfied with Python loggers and print functions. They felt that a fine-grained monitoring system around their deep learning execution code was not readily available and that setting one up would have required significant effort;
  • More than one deep-learning/machine-learning framework: although some found it useful to combine different frameworks, especially for having traditional machine learning algorithms in the same deep learning pipeline, it wasn’t perceived as an easy thing to implement.
Top seven from the survey

That gives us some insight into the way things are handled right now. These are definitely great advancements from where we were a couple of years back. However, the process of deploying to production is painful and cumbersome even with all the progress we have made so far. Some of us want to use PyTorch for development and TorchScript for deployment, but then where do we deploy? Some want to use Haskell for their app, but they definitely need a microservice written in Python (perhaps C/C++/Swift is possible). Scalability and high availability are a whole other story. DevOps needs to reach out to tools like k8s, along with the required skill set and time (to optimize the cost, considering the server instances for running models and storing data).

The most viable choices for deploying deep learning to production in the industry as of today can be summarized as:

  • Deploy with a cloud platform like SageMaker or GCP Cloud API - they are designed to scale, but we’ll be depending on the cloud and locked in to a specific cloud provider;
  • Deploy TensorFlow Serving, MXNet Model Server or another serving engine on your own - operating these runtimes at scale requires setting up clusters on k8s or similar orchestration systems;
  • A more novice-friendly and streamlined approach is to put your Python inference code behind a minimal API server like Flask or Django. While this is easy, managing scale with this setup is not a cakewalk. For starters, each inference call takes a non-negligible amount of time compared to your standard API call, and you want to build back-pressure mechanisms using a queuing system from day one.

In the past we have been faced with choosing one of these three options. We now have another, with the release of RedisAI.

So, what is RedisAI? In a nutshell, it is a new option for productionizing deep learning models, born from a collaboration between [tensor]werk and RedisLabs. We think it represents an opportunity to strike a new balance between operational simplicity, industrial-grade reliability and the ability to serve small applications up to large deployments at scale.

In this blog post, we walk through the features of RedisAI with a working example and discuss the upcoming features we are excited about. If you are new to Redis itself, you might want to go back and understand why Redis exists and how it works.

RedisAI started out as RedisTF, which was born in 2016 as an experiment to connect Tensorflow’s C API to the then-new Redis module API. Development resumed in the fall of 2018 and it quickly grew into a full-fledged deep learning runtime with multi-backend support. Before starting to run through the features of RedisAI, let’s answer the WHY question. Why another runtime server when high-performance alternatives are available? To answer this question, we have to connect back to the survey results given above.

Why?

With yet another runtime that may or may not handle scale automatically, developers would need to learn one more tool to address production deployment and get to know how it behaves at scale. The interesting but obvious fact is that a good percentage of companies and people already use Redis to fill several needs, such as caching, queuing, etc. Those who already know how to use Redis can do deep learning deployment with very little extra effort, since RedisAI is built as a Redis module that can be loaded into an existing Redis server. Keeping data and runtime close helps us save cost and operational complexity: fewer moving parts mean less cost and less headache.

Distributing data across clusters or ensuring high availability is no easy feat. Stay tuned for the example below that shows how you can set up a high availability infra with RedisAI with a minimal amount of components and moving parts, out of the box.

RedisAI lets you store tensors in a cluster, use them as input to a model and run it on CPU or GPU to create new tensors, which you can then get. RedisAI understands how to run PyTorch and TensorFlow models, and shortly also models saved in the ONNX interchange format from almost any framework, thanks to ONNXRuntime (an upcoming feature). RedisAI can actually execute models from multiple frameworks as part of a single pipeline. If your tech stack is in some other language and you don’t really want to introduce Python into it, as long as the language of your choice has a Redis client (very likely), you can deploy your model into RedisAI and use your Redis client to control the execution with little to no overhead, with back-pressure taken care of; you won’t need one or more additional microservices. As an interesting new feature, you can write SCRIPTs to manipulate your data on CPU or GPU before or after the execution with your model. SCRIPTs are written in TorchScript, a subset of the Python language sporting a rich tensor API, which does not need a Python interpreter but instead runs on the highly optimized libtorch runtime. With that and with the features we have in the pipeline, we hope we’ve caught your attention, as we believe RedisAI makes a good contender for the go-to DL/ML serving solution in your stack.

What and How?

Enough with WHY, now we take a look into WHAT and HOW. As I mentioned before, RedisAI is built as a module that can be loaded into a Redis server with the --loadmodule switch. With the RedisAI module, Redis can store another data type, the Tensor (another word for multi-dimensional arrays). For the C-savvy readers, tensors in Redis are represented as DLPack structs, which is an RFC for a common in-memory tensor structure and operator interface for deep learning systems. Here’s what a DLPack tensor looks like:

typedef struct {
    void* data;
    DLContext ctx;
    int ndim;
    DLDataType dtype;
    int64_t* shape;
    int64_t* strides;
    uint64_t byte_offset;
} DLTensor;

A user can send multidimensional arrays to RedisAI using the Redis client and store them in Redis as DLPack tensors. Once there, RedisAI can use those tensors as inputs to PyTorch and TensorFlow models, through a thin interoperability layer. Shortly, RedisAI will also be able to run deep learning models from other frameworks through ONNXRuntime. For those who haven’t heard about it, it is a cross-platform, high-performance inference engine built by Microsoft to execute ONNX models. Thus, as long as your model is convertible to the ONNX format, you can run it on RedisAI. That includes machine learning models exported from sklearn; read on.
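Coming back to tensors: on the wire, a tensor is just a data type, a shape and a payload. Besides the VALUES form shown later, the raw bytes of a NumPy array can be sent as a BLOB. Here is a minimal sketch, assuming the plain Python Redis client and an arbitrary key name foo:

# A minimal sketch: setting a tensor from a NumPy array as a BLOB
# (plain Python Redis client; key name 'foo' is arbitrary)
import numpy as np
import redis

r = redis.Redis(host='localhost', port=6379)
arr = np.array([[1, 2], [3, 4]], dtype=np.float32)
# the dtype FLOAT and the shape 2 2 must match the array; the payload is the raw bytes
r.execute_command('AI.TENSORSET', 'foo', 'FLOAT', 2, 2, 'BLOB', arr.tobytes())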

Apart from storing tensors and running models, RedisAI can also execute TorchScript SCRIPTs on CPU and GPU, as I mentioned before. We’ll see a SCRIPT in action as we work through the example below. RedisAI does all the above while still being Redis, with all the features and operational flexibility we love it for.

RedisAI is architected in a way that enables users to keep the data local, keep the model hot and keep the stack short. This equips users to save cost and optimize the use of resources.

Now we’ll jump into an example implementation of object detection with Yolo V3 on RedisAI step by step.

  • We’ll start by installing Redis and RedisAI
  • We’ll then look into setting up the client
  • After that, we’ll deploy the model and implement the preprocessing code as a SCRIPT
  • Then we’ll see it all in action, with setting tensors, executing the model, reading the output and post-processing

Installing Redis and RedisAI

RedisAI requires you to install Redis as a prerequisite and then load RedisAI as a module. But to make developers’ lives easier, we have created a Docker image that gets you up and running with just:

docker run -p 6379:6379 -it --rm redisai/redisai

However, if you want to go through the process step by step, you should install Redis first. Installing Redis is as simple as executing a couple of shell commands, as given in the official doc. Building RedisAI is also possible with a few commands (clone, get dependencies and build):

git clone https://github.com/RedisAI/RedisAI.git
cd RedisAI
bash get_deps.sh cpu
mkdir build
cd build
cmake -DDEPS_PATH=../deps/install ..
make
cd ~

The only line that might make you uncomfortable if you are a regular cmake user is the third line. get_deps.sh is used to install the backends (libtorch for PyTorch, libtensorflow for TensorFlow). While it is possible to install and run the GPU version of the backends on Linux by executing bash get_deps.sh (without passing any arguments), we are trying out the CPU version for these examples. If you are setting up a remote server, you might need to add exceptions in the firewall and install some dependency packages as well. Here is the link to the installation.sh script we used while setting up RedisAI on a new DigitalOcean cloud machine running Ubuntu 18.04.

Now you should have a Redis server installed and RedisAI built inside the build folder in the RedisAI directory. Running a Redis server loaded with RedisAI can be done with a one-liner:

LD_LIBRARY_PATH=<path to RedisAI>/deps/install/lib redis-server --loadmodule <path to RedisAI>/build/redisai.so

You should see a screen similar to this which indicates you have a Redis server running and the RedisAI module loaded.

That’s it! Your RedisAI inference server is up and running.
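If you want to double-check that the module actually loaded, the standard MODULE LIST command should report the RedisAI module; a quick sketch from Python:

# Quick sanity check that the server is up and the RedisAI module is loaded
import redis

r = redis.Redis(host='localhost', port=6379)
print(r.ping())                             # True if the server is reachable
print(r.execute_command('MODULE', 'LIST'))  # the reply should include an entry for the RedisAI module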

Client Utility

One of the main advantages of using RedisAI as an inference server over the other choices for inferencing in production is its support for client libraries. As I mentioned before, if your language has a client package for the Redis server, then you can communicate with RedisAI using the same client. Here I show setting a tensor as an example using redis-cli, the Python Redis client, the Python RedisAI client (which is just a convenience wrapper for the RedisAI-specific commands around the Python Redis client), the NodeJS Redis client and the Go Redis client. Although I am showing simple examples here, the example repo contains sophisticated deep learning examples using clients from different languages. Also, you can find the extensive list of all RedisAI commands in the official doc.

redis-cli

# Setting a tensor using redis-cli
AI.TENSORSET foo FLOAT 2 2 VALUES 1 2 3 4

Python Redis client

# Setting a tensor using python Redis client
import redis
r = redis.Redis(host='localhost', port=6379)
r.execute_command('AI.TENSORSET', 'foo', 'FLOAT', 2, 2, 'VALUES', 1, 2, 3, 4)

Python RedisAI client

# Setting a tensor using Python RedisAI client
import redisai as rai
con = rai.Client(host='localhost', port=6379)
foo = rai.Tensor(rai.DType.float, [2, 2], [1, 2, 3, 4])
con.tensorset(foo)

NodeJS Redis client

// Setting a tensor using NodeJS Redis client
var Redis = require('ioredis')
let redis = new Redis({ parser: 'javascript' });
redis.call('AI.TENSORSET', 'foo', 'FLOAT', 2, 2, 'VALUES', 1, 2, 3, 4)

Go Redis client

// Setting a tensor using the Go Redis client
package main

import "github.com/go-redis/redis"

func main() {
    client := redis.NewClient(&redis.Options{
        Addr:     "localhost:6379",
        Password: "",
    })
    client.Do("AI.TENSORSET", "foo", "FLOAT", 2, 2, "VALUES", 1, 2, 3, 4)
}
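Reading a tensor back is just as uniform across clients. A small sketch with the two Python clients (the raw AI.TENSORGET ... VALUES form also appears later in the DAG examples):

# A sketch: reading the tensor 'foo' back, with either Python client
import redis
import redisai as rai

r = redis.Redis(host='localhost', port=6379)
print(r.execute_command('AI.TENSORGET', 'foo', 'VALUES'))

con = rai.Client(host='localhost', port=6379)
print(con.tensorget('foo', as_type=rai.BlobTensor).to_numpy())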

Pipeline

Now that we have a basic idea of the different client utilities, let’s start building the pipeline. As I mentioned, we are deploying a Yolo V3 model built with TensorFlow into RedisAI. I have chosen a TensorFlow model for this example to show how a SCRIPT (which calls into TorchScript from PyTorch) and a TensorFlow model can be plugged together into a single pipeline. We also do some preprocessing in our Python program before passing the input image to RedisAI. With all of this chained together, you should get a succinct view of how flexible RedisAI is and how organized your pipeline can be with it.

As part of this example, we’ll give an image of Arya Stark to our Yolo model and expect the model to predict the bounding box for a person, since we don’t have other classes in the image. If you don’t know what Yolo is, it’s probably the most popular object detection neural network architecture (courtesy of pjreddie). A detailed explanation is available here. Our Yolo implementation in TensorFlow expects input images to be square-shaped, hence we run the image through a letter_box function in Python which pads the shorter side with constant pixels and makes it square. The square image, with pixel values ranging from 0 to 255, is then passed to RedisAI, where a SCRIPT normalizes the integer pixel values to the 0–1 range (a prerequisite for our Yolo model). The normalized image is then passed through the actual TensorFlow model. After that, we get the output, which in this case is the predicted bounding box coordinates, from RedisAI and draw a new image with the bounding box using Pillow, a Python imaging library.

The complete pipeline for our Yolo model deployment

Preprocessing from Python

The letter_box function can be defined as below. We use OpenCV for resizing and padding the border with constant-value pixels. Before returning the padded image, we expand its three-dimensional tensor representation (a 2D image can be represented as a three-dimensional tensor with 2D spatial dimensions for each of the R, G, B channels) to four dimensions. This is because our TensorFlow model expects a four-dimensional input with the zeroth dimension as batch size.

import numpy as np
import cv2


def letter_box(numpy_image, height):
    shape = numpy_image.shape[:2]
    ratio = float(height) / max(shape)
    new_shape = (int(round(shape[1] * ratio)), int(round(shape[0] * ratio)))
    dw = (height - new_shape[0]) / 2
    dh = (height - new_shape[1]) / 2
    top, bottom = int(round(dh - 0.1)), int(round(dh + 0.1))
    left, right = int(round(dw - 0.1)), int(round(dw + 0.1))
    numpy_image = cv2.resize(numpy_image, new_shape, interpolation=cv2.INTER_AREA)
    numpy_image = cv2.copyMakeBorder(
        numpy_image, top, bottom, left, right, cv2.BORDER_CONSTANT,
        value=(127.5, 127.5, 127.5))
    return np.expand_dims(numpy_image, axis=0)

The letter_box() function resizes the above-given picture of Arya Stark to the given size and pads it with the required number of pixels to make it square. The output of letter_box with height=416 looks like this.
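To try the function out yourself, here is a minimal usage sketch (assuming the picture is saved locally as aryastark.jpg; note that OpenCV reads images in BGR order, while the full pipeline below uses Pillow, which reads RGB):

# Usage sketch for letter_box (assumes the image is saved as aryastark.jpg)
import cv2

img = cv2.imread('aryastark.jpg')            # H x W x 3 uint8 array
squared = letter_box(img, 416)
print(squared.shape)                         # (1, 416, 416, 3): batch, height, width, channels
cv2.imwrite('letterboxed.jpg', squared[0])   # write the padded square image for inspection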

SCRIPT

The second part of the preprocessing is implemented as a SCRIPT. Quoting myself again, a SCRIPT is written in a subset of the Python language that is natively interpreted by libtorch. The AI.SCRIPTSET command accepts a string, which is supposed to be a TorchScript code snippet containing one or more functions operating on tensors using the TorchScript tensor API. You can pass more than one function as part of a single script and call one function from the others, or you can call them separately from the RedisAI client. Here, for simplicity, we save only one function as a SCRIPT and call it from the client. The main use case for SCRIPT is running pre/post-processing operations on tensor data, but it is not the only one. Having the option to do this outside the model graph, but on CPU/GPU and inside the runtime itself, is a big win.

def pre_process(image):
    image = image.float()
    image /= 255
    return image

We save the above function as script.txt (make sure you have a newline character at the end and that you are not using import statements, since it is not executed by a Python interpreter) and run the code below to set the SCRIPT in RedisAI (make sure you have a Redis server running with RedisAI loaded):

import redisai as rai

con = rai.Client(host='localhost', port=6379)
with open('script.txt') as f:
    script = f.read()
con.scriptset('script', rai.Device.cpu, script)
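To double-check what was stored, the script can be read back from the server; a small sketch, assuming AI.SCRIPTGET returns the device and the TorchScript source stored under the key:

# A sketch: reading the stored script back
# (assumption: AI.SCRIPTGET returns the device and the source for 'script')
import redis

r = redis.Redis(host='localhost', port=6379)
print(r.execute_command('AI.SCRIPTGET', 'script'))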

Setup the Model

We have made the first two blocks of our pipeline and now it’s time to set up the model. The pretrained TensorFlow Yolo model is available to download here. It is a protobuf file containing a TensorFlow frozen graph, exported using https://github.com/tensorflow/tensorflow/blob/master/tensorflow/python/tools/freeze_graph.py. Utilities that support users in exporting graphs are part of the RedisAI Python client and are available here (keep an eye on the README for more updates). Once you have the model file downloaded as yolo.pb, setting the model in RedisAI can be done like this:

with open('yolo.pb', 'rb') as f:
    model = f.read()
con.modelset(
    'model', rai.Backend.tf, rai.Device.cpu, model,
    input=['input_1', 'input_image_shape'],
    output=['concat_11', 'concat_12', 'concat_13'])

The TensorFlow graph needs input_1 and input_image_shape as input tensors and has placeholders for those two. Since tensors are referenced by node name in the graph, we also need to know the names of the output nodes we are interested in. This information needs to be passed to RedisAI while setting up the model, so that RedisAI knows where in the graph to fetch the output tensors from. We could either inspect the graph nodes and figure out the input and output names ourselves, or use a tool like Netron to visualize the frozen graph and figure it out.

Above is a screenshot from the Netron visualizer that I used to find the node names. The outputs we are interested in are concat_11, which contains the coordinates of the boxes, and concat_13, which contains the predicted class of each box.
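If you prefer inspecting the graph programmatically rather than visually, a short sketch along these lines (assuming TensorFlow 1.x is installed) lists candidate node names from the frozen graph:

# A sketch for listing node names in the frozen graph (assumes TensorFlow 1.x)
import tensorflow as tf

graph_def = tf.GraphDef()
with open('yolo.pb', 'rb') as f:
    graph_def.ParseFromString(f.read())
for node in graph_def.node:
    if node.name.startswith(('input', 'concat')):
        print(node.name, node.op)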

Post Processing

This is the final step in our pipeline, where RedisAI is done with the execution and the outputs have been saved to corresponding keys. We can now fetch tensors from these keys and post-process the input image to superimpose the bounding boxes generated by Yolo.

def post_process(classes, boxes, shapes):
    pad_x = max(shapes[0] - shapes[1], 0) * (new_shape / max(shapes))
    pad_y = max(shapes[1] - shapes[0], 0) * (new_shape / max(shapes))
    unpad_h = new_shape - pad_y
    unpad_w = new_shape - pad_x
    for ind, class_val in enumerate(classes):
        top, left, bottom, right = boxes[ind]
        top = ((top.astype('int32') - pad_y // 2) / unpad_h) * shapes[0]
        left = ((left.astype('int32') - pad_x // 2) / unpad_w) * shapes[1]
        bottom = ((bottom.astype('int32') - pad_y // 2) / unpad_h) * shapes[0]
        right = ((right.astype('int32') - pad_x // 2) / unpad_w) * shapes[1]
        yield left, top, right, bottom

The post_process function given above takes the classes and boxes from RedisAI and the shape of the input image. It then determines the coordinates on the original image based on the predicted coordinates from RedisAI. This is required because we need to plot the bounding box onto the original image, with the size and aspect ratio it had at the beginning. We could actually save the post-processing part as a SCRIPT, or even as a PyTorch ScriptModule model, and do these operations inside RedisAI, but we’ll keep it outside for now.
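For the curious, the ScriptModule route could look roughly like the sketch below. It is purely illustrative: the clamp-only step and the file name postproc.pt are made up, and the saved file would then be loaded with AI.MODELSET using the TORCH backend, just like any other model.

# A hypothetical sketch: exporting a post-processing step as a TorchScript module
import torch


class PostProc(torch.jit.ScriptModule):
    @torch.jit.script_method
    def forward(self, boxes):
        # illustrative only: clamp box coordinates to be non-negative
        return torch.clamp(boxes, min=0.0)


PostProc().save('postproc.pt')  # loadable by the TORCH backend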

Executing the Pipeline

The pipeline is now ready, hot and craving inputs! The snippet below reads the image from disk, converts it to a NumPy array and executes the first stage of the pipeline. The output is then passed to RedisAI as a tensor, with values passed in as a BLOB. The tensor is saved in Redis under the key image. We then run the SCRIPT and use the output of SCRIPTRUN to run the model. The output of the model is saved under three different keys, which are then fetched by the client and sent to the post-processing function.

from PIL import Image, ImageDraw

new_shape = 416
image_path = <path to aryastark.jpg>
pil_image = Image.open(image_path)
numpy_img = np.array(pil_image)
image = letter_box(numpy_img, new_shape)
image = rai.BlobTensor.from_numpy(image)
con.tensorset('image', image)
input_shape = rai.Tensor(rai.DType.float, shape=[2], value=[new_shape, new_shape])
con.tensorset('input_shape', input_shape)
con.scriptrun('script', 'pre_process', input=['image'], output=['normalized_image'])
con.modelrun(
    'model',
    input=['normalized_image', 'input_shape'],
    output=['boxes', 'scores', 'classes'])
boxes = con.tensorget('boxes', as_type=rai.BlobTensor).to_numpy()
classes = con.tensorget('classes', as_type=rai.BlobTensor).to_numpy()
draw = ImageDraw.Draw(pil_image)
shapes = numpy_img.shape
for left, top, right, bottom in post_process(classes, boxes, shapes):
    draw.rectangle(((left, top), (right, bottom)), outline='green')
pil_image.save('out_{}.jpg'.format(image_path.split('.')[0]), "JPEG")

If you noticed, the RedisAI client has functionality to convert NumPy arrays to RedisAI tensors and to convert RedisAI tensors back to NumPy arrays. The final line saves the image with the bounding box to disk. This is what it looks like:

RedisAI at Scale

As you already know, with RedisAI we can utilize the whole Redis ecosystem and hence scale our production runtime to a multi-node cluster setup with failover through Redis Sentinel. We’ll publish another blog post showing an example of using Sentinel, but here we give a brief peek into a failover scenario through Sentinel. To run a Sentinel example, take a look at the example repo.

Shown above is a typical master-replica setup with four Redis instances. In the first picture, we have three replicas (faded red) connected to one master node. Replicas are read-only instances (unless you configure them otherwise). With an upcoming feature called DAGRUN (read about it in the next section and here), we will have the possibility of having a pipeline of commands act as read-only, if we wish so, and thus run on replicas. In the second picture, the black node represents a Redis instance that went down. At this point we don’t have a master, and Sentinel is in the process of electing a new master from the replicas. The third picture shows that one of the replicas has become the new master. The master-replica switch is handled by Sentinel and the Redis client without the user noticing the downtime, and Sentinel brings the whole infrastructure back. With Redis Cluster, we can also distribute and shard the tensor data across different nodes. In the next blog post, we will delve into more detail on high availability and persistence.
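To give a taste of what the client side looks like, here is a minimal redis-py sketch for talking to such a setup through Sentinel (the Sentinel address and the master name mymaster are placeholders that depend on your configuration):

# A minimal sketch: routing RedisAI commands through Sentinel
# (Sentinel address and master name 'mymaster' are placeholders for your own setup)
from redis.sentinel import Sentinel

sentinel = Sentinel([('localhost', 26379)], socket_timeout=0.5)
master = sentinel.master_for('mymaster', socket_timeout=0.5)    # writes and model runs
replica = sentinel.slave_for('mymaster', socket_timeout=0.5)    # read-only commands (e.g. the upcoming DAGRUNRO)

# Sentinel transparently re-resolves the master after a failover
master.execute_command('AI.TENSORSET', 'foo', 'FLOAT', 2, 2, 'VALUES', 1, 2, 3, 4)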

Roadmap

Personally, what I am most excited about are some of the features on the roadmap. We’ll start with DAG. As you have probably already guessed, DAG lets you define a directed acyclic graph of RedisAI operations, where individual commands in the DAG can operate on standard Redis keys or on “volatile keys” (keys that do not touch the keyspace; they only exist in the context of the command). A DAGRUN command where all output keys are volatile is effectively a read-only command and, as such, it can be run on replicas so that the master is relieved of pressure. Volatile keys are gone once the execution of a DAGRUN is finished. Following is an example of a DAG. Volatile keys are recognizable because their names are surrounded by ~.

AI.DAGRUN
TENSORSET ~img~ FLOAT 1 3 224 224 BLOB blob_data
SCRIPTRUN preproc normalize INPUTS ~img~ OUTPUTS ~in~
MODELRUN resnet18 INPUTS ~in~ OUTPUTS ~out~
SCRIPTRUN postproc probstolabel INPUTS ~out~ OUTPUTS label

Given above is an example of DAGRUN with two SCRIPTs and one model in the pipeline. The TENSORSET command saves the data to a volatile key ~img~, which is used by SCRIPTRUN. The output of the first SCRIPTRUN is saved to a volatile key ~in~, which is then used in the MODELRUN. The output of MODELRUN is saved to another volatile key and that is used in the last SCRIPTRUN. The final output is then saved into the key label. But here we are saving the output to a non-volatile key, and hence the command can’t execute on a replica. A slightly changed version of the above command that could run on a replica is given below. Notice that the DAGRUN command is replaced with DAGRUNRO, which lets RedisAI know that it’s a read-only command.

AI.DAGRUNRO
TENSORSET ~img~ FLOAT 1 3 224 224 BLOB blob_data
SCRIPTRUN preproc normalize INPUTS ~img~ OUTPUTS ~in~
MODELRUN resnet18 INPUTS ~in~ OUTPUTS ~out~
SCRIPTRUN postproc probstolabel INPUTS ~out~ OUTPUTS ~label~
TENSORGET ~label~ VALUES

Apart from the flexibility and the ability to run on replicas, the DAG command gives us the capability to run individual commands in parallel when they can execute independently. Look at the example below, with one TF model and one PyTorch model running in an ensemble pipeline. Since the two MODELRUN commands inside the DAGRUN don’t have any dependency between them, RedisAI can execute them in parallel.

AI.MODELSET resnet18a TF GPU0 ...
AI.MODELSET resnet18b TORCH GPU1 ...
AI.TENSORSET img ...
AI.DAGRUN
SCRIPTRUN preproc normalize INPUTS img OUTPUTS ~in~
MODELRUN resnet18a INPUTS ~in~ OUTPUTS ~out1~
MODELRUN resnet18b INPUTS ~in~ OUTPUTS ~out2~
SCRIPTRUN postproc ensemble INPUTS ~out1~ ~out2~ OUTPUTS ~out~
SCRIPTRUN postproc probstolabel INPUTS ~out~ OUTPUTS label

ONNXRuntime as a backend is another big feature under construction. With ONNXRuntime, we’ll be able to run models from probably all the popular deep learning frameworks that support ONNX export. Users of Apache MXNet and Microsoft’s CNTK are probably the people who will benefit most from this. And it’s not just deep learning models we can run on RedisAI through ONNXRuntime: since scikit-learn supports exporting to a subset of ONNX called ONNX-ML, which ONNXRuntime supports, we will be able to run traditional machine learning algorithms like random forests, SVMs, k-means, etc. on RedisAI.
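To give a flavor of what that could look like once the ONNX backend lands, here is a sketch of exporting a scikit-learn model to ONNX-ML, assuming the skl2onnx package (the data and model here are purely illustrative):

# A sketch: exporting a scikit-learn model to ONNX-ML with skl2onnx (illustrative data)
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from skl2onnx import convert_sklearn
from skl2onnx.common.data_types import FloatTensorType

X = np.random.rand(100, 4).astype(np.float32)
y = np.random.randint(0, 2, 100)
clf = RandomForestClassifier(n_estimators=10).fit(X, y)

onnx_model = convert_sklearn(clf, initial_types=[('input', FloatTensorType([None, 4]))])
with open('rf.onnx', 'wb') as f:
    f.write(onnx_model.SerializeToString())

The resulting rf.onnx file is the kind of artifact the upcoming ONNXRuntime backend is meant to consume.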

Autobatching of input data, more ways to monitor the runtime, dynamic loading of backends, Redis Streams integration, RedisAI Enterprise, more Redis module integrations like RediSearch and RedisTS, etc. are among the most anticipated features we can expect in the next few months, leading us to General Availability.

Conclusion

I know, I know, this is a long post and I don’t intend to make the conclusion lengthy. You have already learned enough about RedisAI, and now it’s time to go and explore it yourself if you’re interested. Whether you work on computer vision, natural language processing, or any other deep neural networks or traditional ML algorithms: if you have a model that you need to serve in production, chances are RedisAI can make a difference in the way you operate and might make your life easier. We have some exciting months ahead of us, with a bunch of features in the backlog to develop as well as more ideas that could become features of RedisAI later. We hope to make RedisAI the go-to solution in a dev’s stack for serving DL/ML at scale. Let us know your feedback, and we promise we’ll make sure it ends up feeding into the development cycle.
