Deploy Transformers models with confidentiality

Learn how to deploy Transformers models with privacy guarantees, thanks to Confidential Computing!

I. Presentation of BlindAI

Our goal at Mithril Security is to democratize Confidential AI, so that any data scientist can leverage sensitive data with privacy guarantees for data owners.

This is the reason why we have built BlindAI, an open-source, fast and accessible inference solution for AI.

By using BlindAI, data scientists can deploy neural networks for various scenarios, from BERT models to analyze confidential documents, to medical imaging with ConvNets, through speech-to-text with WaveNet.

BlindAI's added confidentiality guarantees

Our solution provides end-to-end protection by leveraging Confidential Computing, with the use of Intel SGX. Confidential Computing enables the creation of enclaves, secure environments, where sensitive data can be processed with guarantees that no outsider can have access to it. In addition, a proof ensuring that the right code is loaded inside can be given, so that data owners know that only trusted code will be executed on their data, for transparency and to avoid backdoors.
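
To make the idea of "a proof that the right code is loaded" more concrete, here is a deliberately simplified sketch. It is not BlindAI's or Intel SGX's actual attestation protocol: real attestation relies on a hardware-signed report that the client verifies, whereas this example only illustrates the final comparison between a measurement (here a plain SHA-256 hash) and the value the data owner expects. The function names and byte strings are hypothetical.

```python
import hashlib
import hmac


def measure(enclave_code: bytes) -> str:
    """Return a SHA-256 digest standing in for an enclave measurement."""
    return hashlib.sha256(enclave_code).hexdigest()


def is_trusted(reported_measurement: str, expected_measurement: str) -> bool:
    """Check that the reported measurement equals the expected one."""
    # compare_digest avoids leaking information through timing differences.
    return hmac.compare_digest(reported_measurement, expected_measurement)


# Hypothetical byte strings standing in for an audited and a tampered build.
audited_build = b"inference server build reviewed by the data owner"
tampered_build = audited_build + b" plus a backdoor"

# The data owner pins the measurement of the build they trust.
expected = measure(audited_build)

print(is_trusted(measure(audited_build), expected))   # True
print(is_trusted(measure(tampered_build), expected))  # False
```

Any change to the loaded code changes its measurement, so a client that pins the expected value can refuse to send data to a modified server.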

Therefore, by using BlindAI, we help data scientists deploy models on sensitive data, for instance medical or biometric data, and unlock markets blocked by security, privacy or regulatory constraints.

We provide more information about Confidential Computing in our series Confidential Computing explained.

Our solution comes in two parts:

  • An inference server written in Rust with Intel SGX. It loads ONNX models, exported from PyTorch or TensorFlow, inside the enclave and serves state-of-the-art AI models securely.
  • A client SDK in Python, which allows users to consume AI models securely. Before sending anything, it checks that data will only be shared with services that load the right code and have the right security features, including end-to-end protection.

II. Getting started with Transformers and BlindAI


In this tutorial, we walk through a quick start: deploying a state-of-the-art model, DistilBERT, for a simple classification task with confidentiality guarantees, using BlindAI.

Because we leverage Confidential Computing with Intel SGX under the hood, specific hardware is required on the server side to benefit from this technology's security guarantees. You can check whether a given Intel CPU supports SGX by looking it up on Intel Ark.

In this post, we will show you how to use BlindAI in simulation mode. This mode is not secure but it will work regardless of the hardware you are using. To launch BlindAI in hardware mode, check this documentation page.

The two modes follow very similar steps; hardware mode adds a few more and requires a compatible CPU.

Workflow of BlindAI

Deploying a DistilBERT model for classification takes three steps:

  • Launch the inference server
  • Upload the model
  • Send data for prediction

A - Launch server

BlindAI's server has an ONNX-based inference engine, written in Rust and leveraging Intel SGX to serve models with confidentiality. It comes with the networking layer needed to upload the model securely inside the enclave and to send data for prediction.

We provide pre-built Docker images of BlindAI at

You can build the server yourself following the guidelines at

To launch the server in simulation mode, use this Docker image:

docker run -p 50051:50051 -p 50052:50052 mithrilsecuritysas/blindai-server-sim
Run the inference server in simulation mode
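
Before moving on, it can help to confirm that the container is actually listening. The following is a minimal sketch using only the Python standard library; it assumes the server runs on the same machine and uses the two ports published by the docker run command above. The helper name port_is_open is ours, not part of BlindAI, and a successful TCP connection only shows that something is listening, not that the service is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


# Ports published by the docker run command above.
for port in (50051, 50052):
    status = "reachable" if port_is_open("localhost", port) else "not reachable"
    print(f"localhost:{port} is {status}")
```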

B - Install BlindAI client SDK

Now that the confidential inference server is running, it expects a model to be loaded first; after that, we can start sending data for prediction. Both of these tasks are covered by our Python SDK.

You can install it from PyPI with:

pip install blindai
Install BlindAI

or you can build it from source using our repository.

C - Upload the model

For this tutorial, we want to deploy a DistilBERT model for classification within our confidential inference server. This could be useful, for instance, to analyze medical records in a privacy-friendly and compliant way.

Because our inference server loads ONNX models, we first have to export DistilBERT to ONNX format. PyTorch or TensorFlow models can easily be exported to ONNX.

Step 1: Load the BERT model

from transformers import DistilBertForSequenceClassification

# Load the model
model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")
Load the BERT model

For simplicity, we will take a pre-trained DistilBERT without fine-tuning it, as the purpose is to show how to deploy a model with confidentiality. In future articles we will show examples that go from training to deployment.

Step 2: Export it in ONNX format

Because the PyTorch ONNX exporter uses tracing behind the scenes, we need to feed it an example input.

from transformers import DistilBertTokenizer
import torch

# Create dummy input for export
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
sentence = "I love AI and privacy!"
inputs = tokenizer(sentence, padding = "max_length", max_length = 8, return_tensors="pt")["input_ids"]

# Export the model; the exporter traces it on the dummy input
torch.onnx.export(
	model, inputs, "./distilbert-base-uncased.onnx",
	export_params=True, opset_version=11,
	input_names = ['input'], output_names = ['output'],
	dynamic_axes={'input' : {0 : 'batch_size'},
	'output' : {0 : 'batch_size'}})
Export the model in ONNX format

Now that we have an ONNX file, we are ready to upload it to our inference server. At this point, the API differs slightly between simulation and hardware mode, as the latter involves additional steps to fully check all security properties of the remote server.

from blindai.client import BlindAiClient, ModelDatumType

# Launch client
client = BlindAiClient()

client.connect_server(addr="localhost", simulation=True)

client.upload_model(model="./distilbert-base-uncased.onnx", shape=inputs.shape, dtype=ModelDatumType.I64)
Upload the model to a server in simulation mode

The client is straightforward. It requires an address, so if you have launched the inference server on the same machine, simply pass "localhost" as we did. For simplicity, in simulation mode connect_server just creates an insecure channel to the server.

For the upload_model method, we need to specify the ONNX file, the shape of the inputs, and the type of data. Because we run a BERT model here, the inputs are integers representing the tokens sent to the model.
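
To see where the shape and the integer type come from, here is a small stand-alone sketch in plain Python. It does not use the Transformers tokenizer: the token IDs, the padding ID and the helper names are illustrative assumptions. It only mimics the effect of padding to max_length = 8 and wrapping a single sentence into a batch, which is what gives the (1, 8) input shape of integer IDs in the export step above.

```python
from typing import List, Tuple

PAD_ID = 0  # assumption: padding token ID, used here for illustration only


def pad_to_max_length(token_ids: List[int], max_length: int,
                      pad_id: int = PAD_ID) -> List[int]:
    """Right-pad a list of token IDs to exactly max_length entries."""
    if len(token_ids) > max_length:
        raise ValueError("sequence is longer than max_length")
    return token_ids + [pad_id] * (max_length - len(token_ids))


def batch_shape(batch: List[List[int]]) -> Tuple[int, int]:
    """Return (batch_size, sequence_length) for a rectangular batch."""
    lengths = {len(row) for row in batch}
    if len(lengths) != 1:
        raise ValueError("all rows must have the same length")
    return len(batch), lengths.pop()


# Hypothetical token IDs; a real tokenizer would produce different values.
token_ids = [101, 7, 8, 9, 102]
padded = pad_to_max_length(token_ids, max_length=8)

print(padded)                 # [101, 7, 8, 9, 102, 0, 0, 0]
print(batch_shape([padded]))  # (1, 8)
```

Every value is an integer, which is why the upload call declares a 64-bit integer input type.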

Now that the model is loaded inside, we just have to send data to get a prediction.

D - Get prediction

The process is as straightforward as before: simply tokenize the input before sending it. For now, tokenization must happen on the client side, but we will implement it shortly on the server side so that the client interface remains lightweight.

from transformers import DistilBertTokenizer

# Prepare the inputs
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
sentence = "I love AI and privacy!"
inputs = tokenizer(sentence, padding = "max_length", max_length = 8)["input_ids"]
Prepare the inputs to be sent

In the same fashion as before, we create a client, connect to the server, and send the data to be analysed over the proper communication channel.

from blindai.client import BlindAiClient

# Load the client
client = BlindAiClient()
client.connect_server("localhost", simulation=True)

# Get prediction
response = client.run_model(inputs)
Get prediction with a server in simulation mode
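
For a classification task, the raw scores are usually turned into class probabilities with a softmax. The sketch below uses only the standard library and assumes that the response holds raw logits, which the comparison with the PyTorch model's .logits later in this article suggests. The two values are the ones reported in that comparison. Since the classification head was not fine-tuned, the resulting probabilities carry no real meaning; the point is only to show the post-processing step.

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    # Subtracting the maximum keeps the exponentials numerically stable.
    highest = max(logits)
    exps = [math.exp(x - highest) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


# Values taken from the output shown in this article.
logits = [0.0005601687589660287, 0.06354495882987976]

probs = softmax(logits)
predicted_class = max(range(len(probs)), key=probs.__getitem__)

print(probs)
print(predicted_class)  # 1
```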

And voila! We can benefit from a state-of-the-art model with confidentiality guarantees: no need to fear data exposure anymore when using external AI solutions!

You can check the correctness of the prediction by comparing it to the results from the original PyTorch model provided by the Transformers library:

>>> response.output
[0.0005601687589660287, 0.06354495882987976]
Results with BlindAI
>>> model(torch.tensor(inputs).unsqueeze(0)).logits.detach()
tensor([[0.0006, 0.0635]])
Results with original model in PyTorch
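
Rather than comparing the numbers by eye, the check can be automated with a tolerance. This is a minimal sketch using the two outputs shown above; the helper name outputs_match and the tolerance of 1e-4 are our own choices, picked because the PyTorch printout is rounded to four decimals.

```python
import math
from typing import Sequence

# Values copied from the two outputs shown above.
blindai_output = [0.0005601687589660287, 0.06354495882987976]
pytorch_output = [0.0006, 0.0635]  # rounded to four decimals by the printout


def outputs_match(a: Sequence[float], b: Sequence[float],
                  abs_tol: float = 1e-4) -> bool:
    """Return True if both sequences have equal length and close values."""
    return len(a) == len(b) and all(
        math.isclose(x, y, abs_tol=abs_tol) for x, y in zip(a, b)
    )


print(outputs_match(blindai_output, pytorch_output))  # True
```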

The notebook can be found here.

We hope you enjoyed our first article! We have many more articles on the way, from confidential Speech-to-text to medical image analysis. To support Mithril Security, please star our GitHub repository!