Run your PyTorch model on Android GPU using libMACE

How to speed up PyTorch model on Android using GPU inference and MACE library

Vitaliy Hramchenko
4 min read · Jun 24, 2020

In recent years, there has been a trend towards using GPU inference on mobile phones. In TensorFlow, GPU support on mobile devices is built into the standard library, but PyTorch does not yet provide it out of the box, so we need a third-party library. In this article, we will look at the PyTorch➔ONNX➔libMACE bundle.

Installation of libMACE

In this section, we will go through the steps for installing libMACE.

Let’s create a separate virtual environment that will contain the library and everything it needs to work.
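
A minimal sketch of this step; the ~/VENV path is the one used throughout this article, so adjust it to your setup:

```bash
# Create and activate a virtual environment under ~/VENV
python3 -m venv ~/VENV
source ~/VENV/bin/activate

# A place for third-party tools (Android NDK, MACE sources, etc.)
mkdir -p ~/VENV/opt
```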

Then we need to install the Bazel build system, following the official installation guide.

Building the MACE library requires Android NDK. We need the r15c version, which we will download to the ~/VENV/opt/android-ndk-r15c/ directory.
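
For example, on Linux x86_64 (the download URL may change over time):

```bash
cd ~/VENV/opt
wget https://dl.google.com/android/repository/android-ndk-r15c-linux-x86_64.zip
unzip android-ndk-r15c-linux-x86_64.zip   # unpacks into ~/VENV/opt/android-ndk-r15c/
rm android-ndk-r15c-linux-x86_64.zip
```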

Next, we need to install additional libraries.
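
The exact list is given in the MACE installation guide; a typical set looks roughly like this (package names and versions are illustrative):

```bash
# Python packages used by the MACE conversion tools
pip install numpy pyyaml jinja2 sh six

# System tools used during the build and deployment
sudo apt-get install cmake android-tools-adb
```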

At the last stage, we will create a script for configuring environment variables android_env.sh (shown below).
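
The script simply exports the paths that the MACE build scripts expect; a sketch, assuming the layout from the previous steps:

```bash
#!/bin/bash
# android_env.sh - environment variables for building MACE models
export ANDROID_NDK_HOME=~/VENV/opt/android-ndk-r15c
```

Source it (source android_env.sh) in every shell where you build MACE or convert models.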

Converting a model

MACE uses its own format for neural network representation, so we need to transform the original model. The conversion process consists of several stages. We will walk through it using ResNet-50 from the torchvision library as an example.

In the first stage, we convert the PyTorch model to ONNX format.
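
A sketch of the export; the file name resnet_model.onnx and the tensor names input and output are choices made here and must match the configuration file created in the next step:

```python
import torch
import torchvision

# Load a pretrained ResNet-50 and switch it to inference mode
model = torchvision.models.resnet50(pretrained=True)
model.eval()

# ONNX export is trace-based, so we need a dummy input of the right shape (NCHW)
dummy_input = torch.randn(1, 3, 224, 224)

torch.onnx.export(model, dummy_input, "resnet_model.onnx",
                  input_names=["input"], output_names=["output"])
```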

After conversion, the working folder should contain the exported ONNX file.

In the second stage, we need to save the model in libMACE's own format. Let's create a configuration file according to the official guide.

The file must specify the absolute path to the ONNX file (model_file_path), the SHA256 checksum (model_sha256_checksum), the geometry and names of the input and output tensors (input_tensors, input_shapes, output_tensors, and output_shapes), and the data format, in our case NCHW.
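
An abbreviated sketch of resnet_model.yml; the field names follow the MACE model deployment guide, so double-check them against the guide for your MACE version:

```yaml
library_name: resnet_model
target_abis: [arm64-v8a, armeabi-v7a]
model_graph_format: file
model_data_format: file
models:
  resnet_model:
    platform: onnx
    model_file_path: /absolute/path/to/resnet_model.onnx
    model_sha256_checksum: <output of sha256sum, see below>
    runtime: cpu+gpu
    subgraphs:
      - input_tensors:
          - input
        input_shapes:
          - 1,3,224,224
        input_data_formats:
          - NCHW
        output_tensors:
          - output
        output_shapes:
          - 1,1000
```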

The checksum can be calculated using the sha256sum utility.
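
For example:

```bash
sha256sum /absolute/path/to/resnet_model.onnx
```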

After creating the configuration file, we can run the model conversion script.
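
The conversion is run from the MACE source tree, which is assumed here to be checked out in ~/VENV/opt/mace:

```bash
cd ~/VENV/opt/mace
source /path/to/android_env.sh   # environment from the installation section
python tools/converter.py convert --config=/absolute/path/to/resnet_model.yml
```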

If the conversion completes without errors, the tool reports that the model was converted successfully.

The result of the conversion will be the files resnet_model.data and resnet_model.pb in ~/VENV/opt/mace/build/resnet_model/model/.

Configuring the Android Studio project

For our app in Android Studio, we need to choose the Native C++ project template.

Creating new Android Studio project

Next we need a binary build of the MACE library from the repository.

Let's create a directory app/libmace with the folders arm64-v8a and armeabi-v7a, and copy the cpu_gpu builds of libmace.so from libmace-v0.13.0/lib/ into them.
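
For example, from the unpacked release archive (the exact layout inside lib/ may differ slightly between releases):

```bash
mkdir -p app/libmace/arm64-v8a app/libmace/armeabi-v7a
cp libmace-v0.13.0/lib/arm64-v8a/cpu_gpu/libmace.so    app/libmace/arm64-v8a/
cp libmace-v0.13.0/lib/armeabi-v7a/cpu_gpu/libmace.so  app/libmace/armeabi-v7a/
```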

In CMakeLists.txt, we need to add the MACE include directory.
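
Assuming the headers from the release archive (its include/ folder) are copied to app/libmace/include and that CMakeLists.txt lives in app/src/main/cpp (the default for recent Native C++ templates), the line could look like this:

```cmake
# Headers of the MACE public API (mace/public/mace.h)
include_directories(${CMAKE_SOURCE_DIR}/../../../libmace/include)
```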

Next, in CMakeLists.txt, we need to create a lib_mace imported library and add it to the target_link_libraries list. We also need to add -ljnigraphics to this list for JNI Bitmap support.
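
A sketch, using the target names generated by the default Native C++ template (native-lib and log-lib):

```cmake
# Import the prebuilt libmace.so that matches the ABI being built
add_library(lib_mace SHARED IMPORTED)
set_target_properties(lib_mace PROPERTIES
        IMPORTED_LOCATION ${CMAKE_SOURCE_DIR}/../../../libmace/${ANDROID_ABI}/libmace.so)

target_link_libraries(native-lib
        lib_mace
        -ljnigraphics   # JNI Bitmap support
        ${log-lib})
```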

In the app/build.gradle file, we need to add the abiFilters and externalNativeBuild subsections to the defaultConfig.
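
Roughly like this:

```groovy
android {
    defaultConfig {
        // ...
        ndk {
            // the ABIs for which we copied libmace.so
            abiFilters 'armeabi-v7a', 'arm64-v8a'
        }
        externalNativeBuild {
            cmake {
                cppFlags "-std=c++11"
            }
        }
    }
}
```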

In the android section, we need to add the sourceSets entry (only needed for older versions of Android Studio).
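
For example:

```groovy
android {
    // ...
    sourceSets {
        main {
            // pick up the prebuilt libmace.so files from app/libmace/<abi>/
            jniLibs.srcDirs = ['libmace']
        }
    }
}
```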

In AndroidManifest.xml, we add permission to read the file system.
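
For example:

```xml
<uses-permission android:name="android.permission.READ_EXTERNAL_STORAGE" />
```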

Next, let's create an assets folder (File➔New➔Folder➔Assets Folder) and copy resnet_model.pb and resnet_model.data to it.

Model loading

First, we will add the model loading function to MainActivity.java.
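
The exact signature depends on how the rest of the app is organized; a sketch could look like this (the parameters are illustrative):

```java
// MainActivity.java (fragment)
static {
    System.loadLibrary("native-lib");   // default library name from the Native C++ template
}

// Loads the converted model and returns a native pointer packed into a long.
// modelFile/weightsFile point to resnet_model.pb/.data, storageDir is a
// writable directory where MACE can cache compiled OpenCL binaries.
public native long loadModel(String modelFile, String weightsFile, String storageDir);
```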

We will also add its implementation to native_lib.cpp.

The MACE library requires a special startup configuration, which is shown below.
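
Roughly, using the MACE C++ API (check the exact names against the headers of your MACE version):

```cpp
#include <mace/public/mace.h>

// storage_path: a writable directory (passed in from the Java side) where
// MACE caches the compiled OpenCL binaries between runs
std::shared_ptr<mace::GPUContext> gpu_context =
    mace::GPUContextBuilder()
        .SetStoragePath(storage_path)
        .Finalize();

// Run on the GPU, with modest priority so the UI thread is not starved
mace::MaceEngineConfig config(mace::DeviceType::GPU);
config.SetGPUContext(gpu_context);
config.SetGPUHints(mace::GPUPerfHint::PERF_NORMAL,
                   mace::GPUPriorityHint::PRIORITY_LOW);
```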

Let's look at the code in more detail. First, we need to select a device for computing (device_type). In the case of the GPU, MACE prepares OpenCL binaries, so it requires a storage_path directory where the library can save them. Next, we specify the priority of the task (GPUPerfHint and GPUPriorityHint). If we select a high priority, the interface may freeze.

After creating the startup configuration, we load the neural network into memory and create a MaceEngine.
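
A sketch; the input and output names must match those in resnet_model.yml (here the names chosen during the ONNX export):

```cpp
// Graph definition (.pb) and weights (.data), read into memory beforehand,
// e.g. from the files copied into the app's assets
std::vector<unsigned char> model_graph;    // contents of resnet_model.pb
std::vector<unsigned char> model_weights;  // contents of resnet_model.data

std::vector<std::string> input_names  = {"input"};
std::vector<std::string> output_names = {"output"};

std::shared_ptr<mace::MaceEngine> engine;
mace::MaceStatus status = mace::CreateMaceEngineFromProto(
    model_graph.data(), model_graph.size(),
    model_weights.data(), model_weights.size(),
    input_names, output_names, config, &engine);
```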

The neural network is returned as a shared_ptr, which we can't pass directly to MainActivity, so we'll introduce an intermediate class ModelData (shown below).
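
A minimal version only needs to keep the engine alive:

```cpp
// Keeps the engine (and anything else we need between JNI calls) alive
struct ModelData {
    std::shared_ptr<mace::MaceEngine> engine;
};
```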

The result of loadModel is a pointer to an object of this type, which we return to Java as a long.
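
In the JNI function this can look like:

```cpp
// Hand ownership to the Java side as an opaque handle
ModelData *model_data = new ModelData{engine};
return reinterpret_cast<jlong>(model_data);
```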

Model inference

For model inference, we declare the classification function in MainActivity.
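
A possible declaration (the signature is illustrative):

```java
// Takes the handle returned by loadModel and an image, returns the class index
public native int classify(long modelHandle, Bitmap image);
```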

We also add its definition to native_lib.cpp.

Later in this section, you will find a step-by-step description of this function.

First, we restore the model's pointer from the long value.

Next, we need to prepare the input data in NCHW format (sample code for this and the previous step is provided below).
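
A sketch of both steps; model_handle, bitmap, and env are the arguments of the native classify function, the bitmap is assumed to be 224x224 in RGBA_8888 format, and the normalization constants are the usual ImageNet mean/std used by torchvision models:

```cpp
#include <android/bitmap.h>
#include <memory>

// Restore the ModelData pointer that loadModel packed into a long
ModelData *model_data = reinterpret_cast<ModelData *>(model_handle);

// Read the Bitmap pixels through the jnigraphics API
AndroidBitmapInfo info;
AndroidBitmap_getInfo(env, bitmap, &info);   // expected: 224x224, RGBA_8888
uint32_t *pixels = nullptr;
AndroidBitmap_lockPixels(env, bitmap, reinterpret_cast<void **>(&pixels));

// Repack the RGBA pixels into a normalized float buffer in NCHW order
const int height = info.height, width = info.width;
const float mean[3]   = {0.485f, 0.456f, 0.406f};   // ImageNet statistics
const float stddev[3] = {0.229f, 0.224f, 0.225f};
auto buffer = std::shared_ptr<float>(new float[3 * height * width],
                                     std::default_delete<float[]>());
for (int y = 0; y < height; ++y) {
  for (int x = 0; x < width; ++x) {
    uint32_t p = pixels[y * width + x];           // RGBA_8888: R is the lowest byte
    float rgb[3] = {(p & 0xFF) / 255.0f,          // R
                    ((p >> 8) & 0xFF) / 255.0f,   // G
                    ((p >> 16) & 0xFF) / 255.0f}; // B
    for (int c = 0; c < 3; ++c)
      buffer.get()[c * height * width + y * width + x] = (rgb[c] - mean[c]) / stddev[c];
  }
}
AndroidBitmap_unlockPixels(env, bitmap);
```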

After loading the data, we need to pack the parameters for the neural network into a dictionary of tensors, as in the code below. The shapes of the tensors must match the ones specified in the resnet_model.yml file.
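
A sketch, assuming the MaceTensor constructor and DataFormat enum of recent MACE versions; buffer is the NCHW buffer prepared above:

```cpp
// Input/output names and shapes must match resnet_model.yml
std::map<std::string, mace::MaceTensor> inputs, outputs;
inputs["input"] = mace::MaceTensor({1, 3, 224, 224}, buffer,
                                   mace::DataFormat::NCHW);

// Buffer for the 1000 ResNet class scores
auto output_buffer = std::shared_ptr<float>(new float[1000],
                                            std::default_delete<float[]>());
outputs["output"] = mace::MaceTensor({1, 1000}, output_buffer);
```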

Now the data is ready. Let's launch the model's inference.
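
With the tensors in place, the call is a single line:

```cpp
// Execute the network; Run fills the buffers of the output tensors
mace::MaceStatus run_status = model_data->engine->Run(inputs, &outputs);
```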

The predicted class is the index of the output with the maximum value.
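
A simple argmax over the output buffer gives the result (assuming MaceTensor exposes its buffer through data()):

```cpp
#include <algorithm>

// The class with the highest score wins
const float *scores = outputs["output"].data().get();
int predicted_class = static_cast<int>(
    std::max_element(scores, scores + 1000) - scores);
return predicted_class;
```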

Now the main part of the application is complete. We also need some Java logic related to user interaction; it is not described in this article, but it can be found in the project repository. After adding it, we can build the program and run it on a mobile phone.

The full source code can be downloaded here.
