When it comes to data confidentiality and protection, no data is more important than personal data, whether medical, financial, or even social. Discussions about access to our data, or even our metadata, come down to who knows what and whether my personal data is safe. Today’s announcement from Intel, Microsoft, and DARPA covers a program designed to keep information secure and encrypted while still using that data to build better models or provide better statistical analysis, without revealing the underlying data. The technique is called Fully Homomorphic Encryption, but it is so computationally intensive that the concept is almost useless in practice. This program between the three organizations is a driver for providing IP and silicon to accelerate the computation, enabling a more secure environment for collaborative data analysis.
Keep your data in mind
Data protection is one of the most important aspects of the future of computing. The volume of personal data is constantly growing, as is the value of that data and the number of legal protections required. This complicates any processing of personal, private, and confidential data, and often leads to dedicated data silos, because each processing step requires data transfer combined with encryption and decryption, which involves a level of trust that is not always possible. All it takes is for one key in the chain to be lost or leaked for the data set to be compromised.
There is a way around this, known as Fully Homomorphic Encryption (FHE). FHE makes it possible to take encrypted data, transfer it to where it needs to be, perform calculations on it, and get results without ever knowing the underlying data.
Take, for example, the analysis of medical records. If a researcher needs to process a certain data set for an analysis, the traditional method would be to encrypt the data, send it, decrypt it, and process it. However, giving the researcher access to the specifics of the records may not be legal, or may face regulatory challenges. With FHE, that researcher can take the encrypted data, perform the analysis, and obtain a result without ever knowing any specifics of the data set. This may include a combined statistical analysis of a population across multiple encrypted data sets, or using those encrypted data sets as additional inputs for training machine learning algorithms, increasing accuracy by bringing more data to bear. Of course, the researcher must have confidence that the data provided is complete and genuine, but that is a separate topic from enabling calculations on encrypted data.
One of the reasons this matters is that the best insights come from the largest data sets. This includes the ability to train neural networks: the best networks run into problems with a lack of sufficient data, or face regulatory obstacles because of the sensitive nature of that data. That is why Fully Homomorphic Encryption, the ability to analyze data without knowing its contents, is important.
Fully Homomorphic Encryption has existed as a concept for several decades, but it has only been realized in practice in the last 20 years or so. In that time, a number of partially homomorphic encryption (PHE) schemes have been introduced, and since 2010 several PHE/FHE designs have been developed that can handle basic operations on encrypted data, or ciphertexts, with a number of libraries developed against industry standards. Some of them are open source. Many of these methods are computationally complex, for the obvious reason that they operate on encrypted data, although efforts are being made with SIMD-like packing and other features to speed up processing. Even though FHE schemes are being accelerated, this is not the same as decryption, because the math does not decrypt the data. Since the data always remains encrypted, it can (arguably) be handled by untrusted third parties, as the underlying information is never exposed. (It can be argued that a sufficiently large data set can reveal more than expected, even when encrypted.)
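To make the idea of calculating on ciphertexts concrete, here is a minimal sketch of one partially homomorphic scheme, Paillier, which supports addition on encrypted values only. It is not FHE, and it is not what the DPRIVE program or any particular library implements. The function names, the toy primes, and the sample readings are all assumptions made for illustration, and the key size is far too small to be secure.

```python
import math
import secrets

def keygen(p=1009, q=1013):
    # Toy primes (assumption, illustration only); real keys use far larger primes.
    n = p * q
    lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)  # lcm(p-1, q-1)
    mu = pow(lam, -1, n)  # modular inverse; valid because we fix g = n + 1
    return n, (lam, mu)   # n is the public key, (lam, mu) the private key

def encrypt(n, m):
    n2 = n * n
    while True:
        r = secrets.randbelow(n - 1) + 1  # random r in [1, n-1]
        if math.gcd(r, n) == 1:
            break
    # c = g^m * r^n mod n^2, with g = n + 1
    return (pow(n + 1, m, n2) * pow(r, n, n2)) % n2

def decrypt(n, priv, c):
    lam, mu = priv
    u = pow(c, lam, n * n)
    return ((u - 1) // n * mu) % n

def add_encrypted(n, c1, c2):
    # Multiplying ciphertexts adds the underlying plaintexts (mod n).
    return (c1 * c2) % (n * n)

# The data owner encrypts three hypothetical readings.
n, priv = keygen()
readings = [120, 135, 110]
ciphertexts = [encrypt(n, m) for m in readings]

# A third party sums them without ever holding the private key.
encrypted_total = ciphertexts[0]
for c in ciphertexts[1:]:
    encrypted_total = add_encrypted(n, encrypted_total, c)

# Only the data owner can read the result.
print(decrypt(n, priv, encrypted_total))  # 365
```

The party doing the addition only ever sees large integers that look random. A fully homomorphic scheme extends this so that both addition and multiplication work on ciphertexts, which is what allows arbitrary analysis and also what drives up the computational cost.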
Today’s announcement: Custom silicon for FHE
When measuring the performance of an FHE calculation, the result is compared against the same analysis run on the plaintext version. Because of the computational complexity of FHE, current methods are significantly slower. The encryption methods that enable FHE can increase the data size by 100x to 1000x, and calculations on that data are then 10,000x to 1 million times slower than conventional calculations. This means that one second of computation on raw data can take anywhere from roughly 3 hours to 12 days.
So whether it means combining hospital medical records across a state or personalizing a service using metadata collected on a user’s smartphone, FHE at this scale is not a viable solution. Enter the DARPA DPRIVE program.
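The "3 hours to 12 days" figure follows directly from the quoted slowdown range. The short sketch below just redoes that arithmetic; the function name `fhe_runtime` is made up for illustration, and the only inputs are the 10,000x and 1,000,000x factors given in the text.

```python
from datetime import timedelta

def fhe_runtime(plaintext_seconds, slowdown):
    # Scale a plaintext runtime by an assumed FHE slowdown factor.
    return timedelta(seconds=plaintext_seconds * slowdown)

low = fhe_runtime(1, 10_000)       # best case quoted in the text
high = fhe_runtime(1, 1_000_000)   # worst case quoted in the text

print(low)   # 2:46:40
print(high)  # 11 days, 13:46:40
```

Rounded, that is just under 3 hours at the low end and just under 12 days at the high end for a single second of plaintext work.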
- DARPA: Defense Advanced Research Projects Agency
- DPRIVE: Data Protection in Virtual Environments
Intel has announced that, as part of the DPRIVE program, it has signed an agreement with DARPA to develop custom IP, leading to silicon, that enables faster FHE in the cloud, specifically with Microsoft on both the Azure and JEDI clouds, initially for the US government. As part of this multi-year project, expertise from Intel Labs, Intel Design Engineering, and the Intel Data Platforms Group will come together to create a dedicated ASIC that reduces the computational overhead of FHE compared with existing CPU-based methods. The press release states that the aim is to reduce processing time by five orders of magnitude relative to current methods, cutting compute times from days to minutes.
Intel already has a foot in the door when it comes to FHE, with an Intel Labs research team dedicated to the problem. That work has so far focused primarily on software, standards, and regulatory hurdles, but will now extend to hardware design, cloud software stacks, and collaborative deployment in the Azure and JEDI clouds for the US government. Other target markets highlighted include healthcare, insurance, and finance.
During Intel Labs Day in December 2020, Intel detailed some of the directions it was already taking with this work, including standards development running in parallel with traditional encryption, but at an international level given the additional regulatory hurdles. Microsoft will now be part of that discussion through the DPRIVE program, alongside Intel’s continued investment at the academic level.
Apart from the ‘five orders of magnitude’ target, today’s announcement does not set any final goals, nor does it provide a time frame, saying only that this is a “multi-year” agreement. It will be interesting to see how much Intel or its academic partners discuss the topic from here on, beyond the standardization work.