Data is the most valuable resource we have today when it comes to solving our greatest challenges. With the right data, and enough of it, there is no limit to the compelling use cases we can create. Imagine a world where we could stop financial crimes like
money laundering, help reduce the number of deaths due to breast cancer, and more accurately track and balance the global import and export of goods between countries. Solving problems like these has the potential to save innumerable lives and revolutionise
industries. However, creating the solutions to such problems requires massive amounts of data that is distributed across the globe.
This data is available, but not accessible in a central fashion. It is hidden in the databases of individual hospitals, small bank branches, manufacturing facilities, and in the trenches of other siloed databases across the public and private sectors.
This brings up the concept of centralisation. Could we hypothetically centralise the world's data, or centralise only the data relevant to a particular use case? The natural follow-up to this question is, of course: should we? There is some nuance to this debate, but the short answer is no. While incredibly useful if used only for good, one massive database is a huge risk if it becomes accessible to bad actors.
So, if one massive database is too risky, then how can we get the world’s data into the hands of data scientists and machine learning engineers to hasten the development of revolutionary solutions? The answer could lie in an open source library called PySyft.
Our ability to develop models and answer difficult questions is limited because data is distributed across the globe, siloed, and made entirely inaccessible by legal contracts and stringent partnership agreements. PySyft champions privacy-enhancing technologies (PETs) that allow data scientists to compute on information they do not own, without ever receiving a copy of the data, on machines they do not fully control. It removes the need to move potentially sensitive data to a remote server: data owners keep their data on their own machines, while data scientists can still derive value and innovate solutions. PySyft is developing the future of data sharing through federated data networks powered by PETs, allowing data scientists to leverage more data than ever.
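The core pattern here, sending the computation to the data and releasing only approved results, can be sketched in plain Python. To be clear, this is an illustrative simulation of the idea, not PySyft's actual API; the `DataOwner` class, `submit_computation` method, and the scalar-only release policy are all hypothetical names invented for this sketch:

```python
# Illustrative sketch of the remote-computation pattern PETs enable.
# NOTE: this is NOT PySyft's API. DataOwner and submit_computation are
# hypothetical names simulating the idea that a data scientist ships
# code to the data, and only an approved aggregate result comes back.

class DataOwner:
    """A hospital or bank that keeps its raw data on its own machine."""

    def __init__(self, name, records):
        self.name = name
        self._records = records  # raw data never leaves this object

    def submit_computation(self, fn):
        # The owner runs the scientist's function locally and releases
        # only the aggregate result, never the underlying records.
        result = fn(self._records)
        if isinstance(result, (int, float)):  # crude "aggregates only" policy
            return result
        raise PermissionError("only scalar aggregates may be released")


# Two siloed owners, each holding private measurements.
hospital_a = DataOwner("Hospital A", [2.1, 3.5, 2.8])
hospital_b = DataOwner("Hospital B", [3.0, 2.6])

# The data scientist ships a computation to each silo...
sums = [o.submit_computation(sum) for o in (hospital_a, hospital_b)]
counts = [o.submit_computation(len) for o in (hospital_a, hospital_b)]

# ...and combines only the released aggregates into a global statistic.
global_mean = sum(sums) / sum(counts)
print(round(global_mean, 2))  # 2.8
```

A real deployment adds the machinery this toy version omits: network transport, authentication, differential-privacy budgets, and owner-side review of submitted code, but the asymmetry is the same; the scientist sees only the answer, never the records.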
To conceptualise how PySyft could deliver truly revolutionary results, let's go back to our breast cancer use case. Currently, top-performing machine learning models for breast cancer detection use less than 0.1% of the world's data. Worldwide, more than 750 million mammography images are taken over a decade. If a data scientist wanted access to even a fraction of a fraction of these images, they would have to sign partnership agreements, go through governance reviews, deploy secure data stores, manage access, and much more. In both time and monetary cost, this approach does not scale and does not give us enough data to work with.
However, with federated data networks, hospitals all over the globe could share their data in a safe and secure manner and allow data scientists and developers to securely compute and develop models that vastly improve our understanding of the disease, its
progression and diagnostics, saving lives. Those using the data would have no physical access to the medical datasets, would not be able to store the data on their machines and, instead of going through the process of securing five to ten partnership agreements,
they could access a network of hundreds or thousands of hospitals.
PySyft, in my opinion, is creating the future of data sharing through federated data networks. The world has enough data to solve many important, unsolved problems, but stringent restrictions on accessing and centralising that data are holding back progress. We have the computational power, we have the data; PySyft could give us the necessary access.