3 Tips To Accelerate The Detection Of COVID-19 With Machine Learning

Geo-Distributed System
3 min read · Jul 8, 2021
Global crisis

COVID-19 has caused an unprecedented global crisis. Alongside nucleic acid testing, machine learning analysis of medical diagnostic images is increasingly vital for early detection and subsequent treatment.

This scenario poses a challenge for standard machine learning approaches: sharing diagnostic images can be problematic under regulations like GDPR, yet a model cannot be trained well on an insufficient local dataset.

In practice, engineers turn to federated learning to solve this problem. Federated learning enables edge devices to collaboratively learn a shared model (even globally) while keeping all the training data on the device, decoupling the ability to do machine learning from the need to store the data in the cloud.

The new data stack challenge

This approach, however, causes another critical problem: high communication cost.

According to Google, each device downloads the current model, improves it by learning from its local data on the edge, and then summarizes the changes as a small, focused update. Only this update is sent to the cloud, over encrypted communication, where it is immediately averaged with other devices' updates to improve the shared model. All the training data remains on the devices, and no individual updates are stored in the cloud.
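The loop Google describes can be sketched as federated averaging (FedAvg): clients train locally, send only a weight delta, and the server averages the deltas weighted by each client's dataset size. This is a minimal sketch, not Google's implementation. It assumes a toy linear model trained with plain SGD, represents weights as Python lists, and leaves out encryption and secure aggregation; the helpers `local_update` and `federated_round` are hypothetical.

```python
from typing import List, Tuple

Vector = List[float]
Example = Tuple[Vector, float]  # (features, label)


def local_update(global_w: Vector, data: List[Example],
                 lr: float = 0.1, epochs: int = 5) -> Vector:
    """Hypothetical on-device step: train a linear model with plain SGD on
    local data and return only the weight delta (the 'focused update')."""
    w = list(global_w)
    for _ in range(epochs):
        for x, y in data:
            err = sum(wi * xi for wi, xi in zip(w, x)) - y
            # Gradient of 0.5 * err**2 with respect to w is err * x.
            w = [wi - lr * err * xi for wi, xi in zip(w, x)]
    return [wi - gi for wi, gi in zip(w, global_w)]


def federated_round(global_w: Vector, clients: List[List[Example]]) -> Vector:
    """Hypothetical server step: average client deltas, weighted by the
    number of local examples, and apply the average to the shared model.
    The raw training data never leaves the client lists."""
    total = sum(len(data) for data in clients)
    avg_delta = [0.0] * len(global_w)
    for data in clients:
        delta = local_update(global_w, data)
        share = len(data) / total
        avg_delta = [a + share * d for a, d in zip(avg_delta, delta)]
    return [g + a for g, a in zip(global_w, avg_delta)]


if __name__ == "__main__":
    # Two hypothetical clients whose private data both follow y = 2x.
    clients = [
        [([1.0], 2.0), ([2.0], 4.0)],
        [([0.5], 1.0), ([1.5], 3.0), ([3.0], 6.0)],
    ]
    w = [0.0]
    for _ in range(20):
        w = federated_round(w, clients)
    print(w)  # converges toward [2.0]
```

Each round costs one model download and one update upload per device, which is where the communication cost discussed next comes from.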

In short, such highly iterative algorithms require low-latency, high-throughput connections to the training data.

One further condition deserves attention: in this setting, uploading weighs as much as downloading. Most devices sit on unreliable and relatively slow network connections, and upload speeds are typically 2x to 4x slower than download speeds.
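A back-of-the-envelope calculation shows why the upload side dominates each round. This sketch assumes, hypothetically, a 50 MB update that is as large as the downloaded model and a 40 Mbit/s download link, and it ignores latency, retransmissions, and protocol overhead; `round_trip_cost` is a hypothetical helper.

```python
from typing import Tuple


def transfer_seconds(size_mb: float, rate_mbit_s: float) -> float:
    # 1 megabyte = 8 megabits; latency and protocol overhead are ignored.
    return size_mb * 8 / rate_mbit_s


def round_trip_cost(update_mb: float, download_mbit_s: float,
                    upload_slowdown: float) -> Tuple[float, float]:
    """Return (download_s, upload_s) for one federated round in which the
    model is downloaded and an update of the same size is uploaded over a
    link whose upload speed is `upload_slowdown` times slower."""
    upload_mbit_s = download_mbit_s / upload_slowdown
    return (transfer_seconds(update_mb, download_mbit_s),
            transfer_seconds(update_mb, upload_mbit_s))


if __name__ == "__main__":
    for slowdown in (2, 4):
        down, up = round_trip_cost(update_mb=50, download_mbit_s=40,
                                   upload_slowdown=slowdown)
        print(f"upload {slowdown}x slower: download {down:.0f} s, upload {up:.0f} s")
```

With these numbers the download takes 10 s while the upload takes 20 s or 40 s, and that cost repeats in every training round.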

So, beyond improving the algorithms themselves, how can we reduce this cost?

Time, Time, Time

Time really matters in this scenario: data on devices must be transferred in real time with low latency, or accurate diagnosis cannot be guaranteed.

QUIC may help here. QUIC originally stood for Quick UDP Internet Connections and was first deployed by Google in 2013.

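One of the main differences between QUIC and TCP is performance on relatively poor network connections: QUIC runs like a car while TCP runs like a carriage (details here). QUIC needs fewer round trips to set up a connection, and a lost packet stalls only the stream it belongs to rather than the whole connection.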
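Connection setup is one concrete place where the difference shows up. The counts below are the standard handshake round trips before a client can send its first request: QUIC combines the transport and TLS 1.3 handshakes into one round trip and can resume a previous session with 0-RTT. The 200 ms RTT and the 100 fresh connections in the usage example are hypothetical, and the sketch ignores TCP Fast Open and TLS 1.3 early data over TCP.

```python
# Round trips spent on setup before the first request can be sent,
# assuming a fresh connection (TCP Fast Open / TLS early data over TCP ignored).
SETUP_RTTS = {
    "TCP + TLS 1.2": 3,            # 1 RTT TCP handshake + 2 RTT TLS 1.2
    "TCP + TLS 1.3": 2,            # 1 RTT TCP handshake + 1 RTT TLS 1.3
    "QUIC (new connection)": 1,    # transport and TLS 1.3 handshakes combined
    "QUIC (0-RTT resumption)": 0,  # request sent with the first flight
}


def setup_delay_ms(protocol: str, rtt_ms: float, connections: int = 1) -> float:
    """Total handshake delay when `connections` new connections are opened."""
    return SETUP_RTTS[protocol] * rtt_ms * connections


if __name__ == "__main__":
    # Hypothetical high-latency link: 200 ms RTT, one new connection per round.
    for name in SETUP_RTTS:
        total_s = setup_delay_ms(name, rtt_ms=200, connections=100) / 1000
        print(f"{name:<24} {total_s:5.1f} s of handshake over 100 rounds")
```

Under these assumptions TCP with TLS 1.3 spends 40 s on handshakes alone, while QUIC spends 20 s, or none with 0-RTT resumption.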

In this paper, researchers studied approaches to medical diagnostic image analysis with federated learning. The medical images range from 10 MB to 1000 MB, and uploading the updates takes at least 65 seconds.

According to a QUIC benchmark report, under high delay, packet loss, and high bandwidth, QUIC performs much better than TCP in both transfer time and throughput.

For use cases like this, the following toolkits may be helpful.

1. QUIC

QUIC replaces TCP as the transport protocol for HTTP/3. You can follow the Google Blog closely and find more news here.

2. Streaming Framework

Streaming image recognition, run by a framework on top of QUIC, can reduce the combined time of uploading and AI inference.

3. Building a distributed app in seconds

Macrometa and Cloudflare make it possible to build, quickly and intuitively, apps that would typically require massive investment and carry high technical risk and complexity.
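The streaming idea in item 2 can be illustrated with a simple two-stage pipeline model: instead of uploading a whole image and only then running inference, the image is split into chunks and inference starts on each chunk as soon as it arrives. This is a hypothetical timing model, not any particular framework. It assumes the model can process chunks independently (for example, individual slices of a scan) and that chunks arrive back-to-back over a single link; the function names are illustrative.

```python
from typing import List


def sequential_seconds(upload_s: List[float], infer_s: List[float]) -> float:
    """Upload every chunk first, then run inference on all of them."""
    return sum(upload_s) + sum(infer_s)


def streamed_seconds(upload_s: List[float], infer_s: List[float]) -> float:
    """Two-stage pipeline: chunk i is processed once it has arrived and the
    inference worker has finished chunk i - 1."""
    arrived = 0.0      # time the current chunk finishes uploading
    worker_free = 0.0  # time the inference worker becomes idle
    for up, inf in zip(upload_s, infer_s):
        arrived += up                      # chunks arrive back-to-back
        start = max(arrived, worker_free)  # wait for data and for the worker
        worker_free = start + inf
    return worker_free


if __name__ == "__main__":
    # Hypothetical image split into 10 chunks: 6 s upload, 2 s inference each.
    ups, infs = [6.0] * 10, [2.0] * 10
    print(sequential_seconds(ups, infs))  # 80.0
    print(streamed_seconds(ups, infs))    # 62.0
```

The saving is bounded by the shorter stage: when uploading dominates, as on slow uplinks, streaming hides almost all of the inference time behind the transfer.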

An emerging technology stack

COVID-19 detection has already challenged the traditional technology stack. Transporting massive volumes of device-generated data to the cloud is a poor choice, especially when that data must be processed in real time.

Macrometa’s Approach To Solving Complex, Geo Distributed Data Challenges

The next decade will bring a new age of decentralized data generated by all kinds of devices. We believe geo-distributed computing can become a counterpart to cloud computing. We hope you will join the discussion here.
