Our Journey To Democratize Confidential AI

This article shares Mithril Security's journey to make AI more trustworthy, our perspective on addressing privacy concerns in AI, and our vision for the future.

Daniel Huynh

Introduction

Our main goal at Mithril Security is to make privacy-first AI solutions accessible so individuals and organizations can use AI without worrying their data will be exposed or misused to train someone else’s model.

We aim to achieve this through our flagship project, BlindChat - an open-source and private alternative to AI assistants like ChatGPT. BlindChat will allow anyone to benefit from advanced conversational AI capabilities while keeping their data fully confidential.

Our Origins

To provide some background, I thought it would be useful to share details on the team behind this ambitious project so you can get to know the people driving this mission.

Who is Mithril Security?

Mithril Security was founded in April 2021 by myself (Daniel), Raphaël (COO), and Mehdi (CTO). We are a group of AI and privacy enthusiasts based in Paris, determined to make AI more trustworthy.

Before starting Mithril, I worked at Microsoft in 2020 on Privacy Enhancing Technologies. These advanced techniques allow users to securely share sensitive data with AI providers without the provider accessing the real data, thanks to encryption.

In practice, this means that people who are not comfortable sending data to AI providers, e.g., OpenAI or Anthropic, could still benefit from their AI without fear of data exposure: the data is protected end to end, so not even admins at the AI provider could see or leak it!

I found these techniques fascinating, so I pitched the idea of democratizing this cutting-edge work, often buried in obscure GitHub repositories, to Raphaël, my soon-to-be COO.

When I explained these techniques to my friend Raphaël, he immediately grasped the potential. As a consultant helping banks adopt AI services, he struggled to evaluate SaaS solutions without either compromising security or making significant investments in on-prem deployments.

My proposal offered a solution to this challenge.

We brought Mehdi on as CTO later on. We had worked together previously, and when I told him about democratizing privacy-preserving AI, he was excited by both the technical challenge and the societal impact. His skills would be crucial to making this a reality.

And that is how we got started: in April 2021, Mithril Security was founded! Not in a garage but entirely remotely due to COVID. And come on, we lived in Paris; we don't have garages there. We have bakeries.

Corentin then joined us as our first intern, but he quickly demonstrated an aptitude for security, so we made him head of security (which proved a good choice, as he got our first product out successfully in less than a year).

Fast forward: following a Pre-Seed round in 2022, we hired more full-time tech employees, with Laura joining us as head of developer content and Shannon and Yassine joining the security team. So this is us!

Our Journey

Like any early-stage startup, we've had to try many different approaches, pivoting frequently - especially given our cutting-edge work in AI and privacy.

  • BlindAI was one of our early confidential AI projects - an inference server written purely in Rust, focused on small models running on Intel CPUs. We had the code audited by Quarkslab for security. However, we moved away from BlindAI because of some key limitations: it relied on Intel SGX, which only supports Intel CPUs and was too restrictive for our needs, and those CPUs also lacked the performance we needed for fast inference. Given these constraints around hardware support and speed, we continued to explore new directions.
  • BastionLab was our exploration into remote data science with built-in access control and privacy protections. It aimed to let organizations like hospitals securely give external data scientists, such as researchers at pharmaceutical companies, access to their data for analysis. BastionLab would sanitize all shared data and enforce differential privacy, ensuring that anything the external researchers saw was privacy-preserving (a minimal sketch of the idea follows this list). This was a promising concept but ultimately faced some challenges. On the technical side, implementing differential privacy required making every operator differentially private, which introduced significant complexity. The market also wasn't mature enough yet for privacy-preserving data sharing with external parties, so adoption was limited. Given these technical hurdles and the market timing, BastionLab did not get the traction we needed to make it viable. But it was a valuable learning experience in the intricacies of privacy-preserving data science.
  • AICert was designed to cryptographically prove the provenance of AI models. It binds a given set of model weights to the specific code and data used to create them, providing immutable evidence of the procedures involved in developing a model (a toy provenance manifest is sketched a few paragraphs below). AICert could also prove that models were not backdoored - we actually tested it by subtly backdooring some models ourselves. It could further certify that models were not trained on copyrighted data.
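
To give a flavour of what making every operator differentially private involves, here is a minimal sketch of a single operator, a mean query, protected with the Laplace mechanism. This is not BastionLab's actual implementation; the dataset, bounds, and epsilon value are illustrative assumptions.

```python
import numpy as np

def dp_mean(values, lower, upper, epsilon, rng=None):
    """Differentially private mean via the Laplace mechanism.

    Values are clipped to [lower, upper] so a single record can only shift
    the result by a bounded amount (the sensitivity); calibrated noise is
    then added before the answer leaves the secure environment.
    """
    rng = rng or np.random.default_rng()
    clipped = np.clip(values, lower, upper)
    sensitivity = (upper - lower) / len(clipped)  # max influence of one record on the mean
    noise = rng.laplace(loc=0.0, scale=sensitivity / epsilon)
    return clipped.mean() + noise

# Illustrative example: patient ages in a hospital dataset (made-up numbers).
ages = [34, 51, 29, 62, 45, 38, 57]
print(dp_mean(ages, lower=0, upper=100, epsilon=1.0))
```

Every operator a data scientist might call - means, counts, joins, aggregations - needs its own sensitivity analysis like this one, which is exactly the source of the complexity mentioned above.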

Unfortunately, at the time we developed AICert, awareness of AI cybersecurity risks was still fairly low, and regulations around AI ethics were fragmented. We felt the market was not mature enough yet to fully take advantage of AICert's capabilities.

Given the lack of demand, we decided not to launch AICert as a product yet. But we still believe strong cryptographic provenance and auditing will be critical for secure, ethical AI in the future as awareness and regulations evolve. AICert remains a valuable technical exercise in this important space.
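
To make the provenance idea concrete, here is a toy manifest that binds model weights to the code and data that produced them by hashing all three together. This is a simplified sketch, not AICert's actual format or signing scheme, and the file names are hypothetical.

```python
import hashlib
import json

def sha256_file(path: str) -> str:
    """Hash a file in chunks so large weight files never need to fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.hexdigest()

def build_manifest(code_path: str, data_path: str, weights_path: str) -> str:
    """Bind a set of weights to the exact code and data used to create them."""
    manifest = {
        "training_code_sha256": sha256_file(code_path),
        "training_data_sha256": sha256_file(data_path),
        "model_weights_sha256": sha256_file(weights_path),
    }
    # In a real system the manifest would be signed by trusted hardware;
    # here we simply serialize it as canonical JSON.
    return json.dumps(manifest, sort_keys=True, indent=2)

# Hypothetical file names, purely for illustration:
# print(build_manifest("train.py", "dataset.tar", "model.safetensors"))
```

Anyone holding the weights can recompute the hashes and compare them against the published manifest, which is the kind of check that makes tampered or undisclosed training pipelines detectable.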

While these efforts taught us a great deal, we needed to find the right product fit with value in the near term. This became clear when we saw the explosion of adoption and hype around Large Language Models (LLMs) like ChatGPT, combined with what we knew about the privacy risks they pose.

Our Perspective on LLMs

We believe LLMs have robust fundamentals:

  • Huge user appeal, as evidenced by the viral growth of conversational AI solutions such as ChatGPT, Claude.ai, and Bard. For example, ChatGPT became the fastest product to reach 1 million users.
  • Tangible productivity gains. Studies show that workers paired with LLMs complete more tasks, faster, and with higher quality. For instance, this paper from BCG and Wharton shows that consultants working in symbiosis with a vanilla GPT-4 completed 12.2% more tasks, 25.1% faster, and with 40% better quality than the control group.

However, they also introduce serious privacy risks, which we've explored in depth separately. Not only is there the inherent risk of sharing data with a provider, as with any SaaS, but fine-tuning on user data means any confidential information could later be extracted by others querying the model - this is what happened in the Samsung case.

Currently, the only way enterprises can address these privacy and compliance risks is to deploy AI models in their own Virtual Private Cloud (VPC) infrastructure. But this equates to reverting to the days of on-premise solutions, losing many cloud benefits.

While VPC deployment is more flexible than pure on-premise, it has notable drawbacks:

  • Engineering teams must be heavily involved in deploying in-house models. Non-technical staff lose the ability to leverage external AI services on their own.
  • In-house development is expensive, time-consuming, and requires significant expertise, rather than leveraging specialized providers.

In essence, organizations face a dichotomy today between:

  1. External AI with no privacy guarantees or control over data practices
  2. Complex and costly in-house deployment unable to leverage external innovation

Neither of these is an acceptable long-term option.

Ideally, organizations could benefit from advanced cloud AI without relinquishing privacy or control.

What We Are Building Now

Our ambition is to democratize privacy-first AI solutions that anyone can use, starting with BlindChat, which aims to provide an open-source, privacy-by-design alternative to AI assistants like ChatGPT. A live demo of our product running inside your browser for full privacy is available here: BlindChat Local Demo, or you can sign up for the Alpha of the more powerful enclave-based solution here: BlindChat Enclave Alpha Sign Up.

This approach provides an option that marries the privacy strengths of in-house VPC deployments with the accessibility and flexibility of SaaS AI solutions. The chart below compares the two current approaches with the alternative proposed by Mithril Security:

BlindChat will allow anyone to benefit from the power of LLMs without compromising privacy, as data remains encrypted end to end. This is the first step towards allowing AI SaaS vendors to offer confidential versions of their services to privacy-sensitive enterprises such as banks, insurance companies, pharmaceutical firms, and healthcare providers, to name a few. If you want to learn more about our near-term plans, check out our roadmap.

We believe this will unlock the responsible use of AI in privacy-sensitive fields like healthcare, finance, and more. If you are interested, do not hesitate to contact us, come to our GitHub, or chat with us on Discord.