Deploy Transformers models with confidentiality

Learn how to deploy Transformers models, with privacy guarantees thanks to Confidential Computing!

Deploy Transformers models with confidentiality

I. Presentation of BlindAI

Our goal at Mithril Security is to democratize Confidential AI, so that any data scientist can leverage sensitive data with privacy guarantees for data owners.

This is the reason why we have built BlindAI, an open-source, fast and accessible inference solution for AI.

By using BlindAI, data scientists can deploy neural networks for various scenarios, from BERT models to analyze confidential documents, to medical imaging with ConvNets, through speech-to-text with WaveNet.

BlindAI's added confidentiality guarantees

Our solution provides end-to-end protection by leveraging Confidential Computing, with the use of Intel SGX. Confidential Computing enables the creation of enclaves, secure environments, where sensitive data can be processed with guarantees that no outsider can have access to it. In addition, a proof ensuring that the right code is loaded inside can be given, so that data owners know that only trusted code will be executed on their data, for transparency and to avoid backdoors.

Therefore, by using BlindAI, we help data scientists deploy models on sensitive data, for instance medical or biometric data, and unlock markets blocked by security, privacy or regulatory constraints.

We provide more information about Confidential Computing in our series Confidential Computing explained.

Our solution comes in two parts:

  • An inference server using Rust SGX. It loads ONNX models, exported from Pytorch or Tensorflow inside the enclave, and serves state-of-the-art AI models securely.
  • A client SDK in Python, which allows users to securely consume AI models. It will check before sending anything that data will be only shared with services loading the right code, and with the right security features, including end-to-end protection.

II. Getting started with Transformers and BlindAI


In this tutorial, we propose a quick start with the deployment of a state-of-the-art model, DistilBERT for a simple classification task, with confidentiality guarantees, using BlindAI.

To reproduce this article, we provide a Colab notebook for you to try.

In this article, we'll be using the managed Mithril Cloud as our BlindAI server, which means you don't have to launch anything locally.

A - Register on Mithril Cloud

Querying models can be done without signing up, but uploading a model will require you to be registered on Mithril Cloud. First, go there and register.

Once you are registered, go to the Settings page and create an API key to query our Cloud.

Please write down your API key somewhere and don’t lose it, otherwise, you will have to create another one!

Once you have your API key, we can move to the next step: install our library before uploading your model.

B - Install BlindAI

You will need our Python SDK to upload a model and query it securely.

You can install it using PyPI with:

pip install blindai
Install BlindAI

Or you can build it from source using our repository.

C - Upload the model

For this tutorial, we want to deploy a DistilBERT model for classification, within our confidential inference server. This could be useful for instance to analyze medical records in a privacy-friendly manner and compliant way.

Because our inference server loads ONNX models, we have to first export a DistilBERT in ONNX format. Pytorch or Tensorflow models can be easily exported to ONNX.

Step 1: Load the BERT model

from transformers import DistilBertForSequenceClassification

# Load the model
model = DistilBertForSequenceClassification.from_pretrained("distilbert-base-uncased")
Load the BERT model

For simplicity, we will take a pre-trained DistilBERT without fine-tuning it, as the purpose is to show how to deploy a model with confidentiality. In future articles we will show examples that go from training to deployment.

Step 2: Export it in ONNX format

Because it uses tracing behind the scenes, we need to feed it an example input.

from transformers import DistilBertTokenizer
import torch

# Create dummy input for export
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
sentence = "I love AI and privacy!"
inputs = tokenizer(sentence, padding = "max_length", max_length = 8, return_tensors="pt")["input_ids"]

# Export the model
	model, inputs, "./distilbert-base-uncased.onnx",
	export_params=True, opset_version=11,
	input_names = ['input'], output_names = ['output'],
	dynamic_axes={'input' : {0 : 'batch_size'},
	'output' : {0 : 'batch_size'}})
Export the model in ONNX format

Now that we have an ONNX file, we are ready to upload it to our inference server.

import blindai
import uuid

api_key = "YOUR_API_KEY" # Enter your API key here
model_id = "distilbert-" + str(uuid.uuid4())

# Upload the ONNX file to the remote enclave
with blindai.Connection(api_key=api_key) as client:
    response = client.upload_model("distilbert.onnx", model_id=model_id)
Upload the model to our BlindAI Cloud

Now that the model is loaded inside, we just have to send data to get a prediction.

D - Get prediction

The process is as straightforward as before, simply tokenize the input you want before sending it.

from transformers import DistilBertTokenizer

# Prepare the inputs
tokenizer = DistilBertTokenizer.from_pretrained("distilbert-base-uncased")
sentence = "I love AI and privacy!"
inputs = tokenizer(sentence, padding = "max_length", max_length = 8, return_tensors="pt")["input_ids"]
Prepare the inputs to be sent

Now we simply have to create our client, connect and send data to be analysed. In the same fashion as before, we will create a client, and simply send data to be analysed with the proper communication channel.

import blindai

with blindai.connect() as client:
  response = client.predict(model_id, inputs)
Get prediction with a server in simulation mode

And voila! We can benefit from a state-of-the-art model with confidentiality guarantees: no need to fear data exposure anymore when using external AI solution!

You can check the correctness of the prediction by comparing it to results from the original Pytorch model provided by the Transformers library:

>>> response.output[0].as_flat()
[0.0005601687589660287, 0.06354495882987976]
Results with BlindAI
>>> model(inputs).logits.detach()
tensor([[0.0006, 0.0635]])
Results with original model in Pytorch

The notebook can be found here.

We hope you enjoyed our first article! We have many more articles on the way, from confidential Speech-to-text to medical image analysis. To support Mithril Security, please star our GitHub repository!