</description>
<link>https://codersherlock.github.com//</link>
<atom:link href="https://codersherlock.github.com//feed.xml" rel="self" type="application/rss+xml"/>
<pubDate>Wed, 13 Oct 2021 18:33:50 -0400</pubDate>
<lastBuildDate>Wed, 13 Oct 2021 18:33:50 -0400</lastBuildDate>
<generator>Jekyll v4.1.1</generator>

<item>
<title>EDDL: How do we train neural networks on limited edge devices - PART 1</title>
<description><p>Now that our paper “EDDL: A Distributed Deep Learning System for Resource-limited Edge Computing Environment” has been published, this post introduces that earlier milestone of our “Edge trainer” project.
As the first part of this series, it focuses only on the motivation and a summary of our work.
More details on the design and implementation will follow in later posts.</p>

<p><img src="/static/2021-10/edgelearn-1.png" height="250" /></p>

<h2 id="why-do-we-need-training-on-edge">Why do we need training on the edge?</h2>

<p>The cloud can no longer be fully trusted. Growing evidence shows that data breaches in the cloud happen more frequently than before.
Meanwhile, as more and more personal, sensitive data is uploaded to cloud data centers, tech companies may know users better than the users know themselves.</p>

<p>Researchers in both industry and academia are looking for ways to keep learning from users’ data while leaving the raw sensitive data under the users’ control.
Many publications have already shown that it is feasible to share only the trained model instead of the raw data.
A popular recent example is Google’s <a href="https://ai.googleblog.com/2017/04/federated-learning-collaborative.html">federated learning</a>.</p>

<p>While investigating this problem, we found that letting end users train on their own data is safe but sacrifices efficiency.
Because a single end device has limited resources, its training time and power consumption can be disappointing.
We believe that, in some target scenarios, there is a workable trade-off between privacy and efficiency.</p>

<p>Fortunately, we observed that users who belong to the same campus, plant, firm, or community tend to share similar interests.
As a result, these co-located users have similar demands for AI-powered routines.
Co-located users are also often targeted by the same types of threats, such as ransomware aimed at financial practitioners.</p>

<p>Consider sending the features of a new malware app to a cloud service in order to train the neural network used by an antivirus program.
This process may take a long time, and a small number of samples may not be recognized by the global model.
A customized local model trained and deployed on the edge can counter this problem.
Using edge training as a supplement to cloud training achieves better response times and makes the whole system more flexible.</p>

<h2 id="why-training-on-edge-is-hard">Why is training on the edge hard?</h2>

<p>Since any co-located user’s device can take part in edge training, deploying such a distributed system raises several issues and challenges.</p>

<p>The first challenge is <strong>straggling workers</strong>.
Training devices are heterogeneous, ranging from resource-limited IoT cameras to high-end media centers with powerful GPUs.
These devices were not designed for machine learning.
A good edge-based distributed learning framework must therefore handle widely varying training speeds.</p>

<p>The second challenge is how to <strong>scale up</strong> clusters.
On a campus, thousands of devices or more may contribute computing resources to the same training task.
However, these devices may be far apart, both physically and in the network topology.
How to use them efficiently without being bogged down by long transmission times remains a challenge.</p>

<p>The third issue is the frequent <strong>joining and exiting</strong> of devices.
We cannot rely on every device to keep working on training tasks instead of its original workload.
Smartly scheduling the workload and handling devices that join or exit must also be taken into consideration.</p>

<h2 id="our-proposal">Our proposal</h2>

<ul>
<li>
<p>Dynamic training data distribution and runtime profiler</p>

<p>We designed a dynamic training data distribution mechanism that addresses both the first and the third challenges.
Preprocessed data can be transmitted without leaking raw sensitive information.
This helps with straggling workers, which can train on smaller batches so that they upload their parameters after about the same training time as the others.
For extremely slow devices, and for devices that join or exit, dynamic data distribution together with the runtime profiler helps keep the global training parameters from becoming polluted or stale.</p>

<p>To further counter heterogeneity, we applied more approaches in our later research, which also describes the runtime profiler in more detail.</p>
</li>
<li>
<p>Support for both asynchronous and synchronous aggregation</p>

<p>We found that asynchronous and synchronous parameter updates each have their pros and cons.
Staying synchronous all the time leaves the straggling-worker problem unsolved, because every round must wait for the slowest worker.
However, the harm that asynchronous updates do to accuracy and convergence time also needs attention.
We therefore propose choosing carefully between these two update policies at runtime, to make use of the advantages of each.</p>
</li>
<li>
<p>Leader role splitting</p>

<p>The idea is to let worker devices with higher bandwidth take on the leader role during training.
Parameter updating does not require much computation; it mainly needs bandwidth.
Devices with sufficient bandwidth can therefore also act as virtual leader devices.
This approach minimizes the number of physical devices we use, and adding more leaders further raises the limit on how many workers the cluster can scale to.</p>
</li>
</ul>
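
<p>The dynamic data distribution idea can be illustrated with a minimal Python sketch. It assumes that the runtime profiler measures each worker’s throughput in samples per second, smooths it with an exponential moving average, and that each iteration’s global batch is split in proportion to that throughput so fast and slow workers finish at about the same time. The names <code>RuntimeProfiler</code> and <code>split_batch</code>, the smoothing factor <code>alpha</code>, and the rounding rule are hypothetical; this is not the implementation from the EDDL paper.</p>

```python
from typing import Dict

# Hypothetical sketch of profiler-driven data distribution, not EDDL's code.


class RuntimeProfiler:
    """Tracks a smoothed training throughput (samples/second) per worker."""

    def __init__(self, alpha: float = 0.3) -> None:
        self.alpha = alpha                 # weight of the newest measurement
        self.rate: Dict[str, float] = {}   # worker id -> samples per second

    def record(self, worker: str, samples: int, seconds: float) -> None:
        if not seconds > 0:
            raise ValueError("seconds must be positive")
        observed = samples / seconds
        prev = self.rate.get(worker)
        # A joining worker starts from its first measurement; known workers
        # are smoothed with an exponential moving average.
        if prev is None:
            self.rate[worker] = observed
        else:
            self.rate[worker] = self.alpha * observed + (1 - self.alpha) * prev

    def leave(self, worker: str) -> None:
        # An exiting worker is forgotten and receives no data on the next split.
        self.rate.pop(worker, None)


def split_batch(global_batch: int, rate: Dict[str, float]) -> Dict[str, int]:
    """Split one global batch in proportion to each worker's throughput,
    so that all workers need roughly the same time per iteration."""
    total = sum(rate.values())
    if not (global_batch > 0 and total > 0):
        raise ValueError("need a positive batch size and an active worker")
    shares = {w: global_batch * r / total for w, r in rate.items()}
    alloc = {w: int(s) for w, s in shares.items()}
    # Hand out samples lost to rounding down, largest remainder first.
    leftover = global_batch - sum(alloc.values())
    by_remainder = sorted(shares, key=lambda k: shares[k] - alloc[k], reverse=True)
    for w in by_remainder[:leftover]:
        alloc[w] += 1
    return alloc


if __name__ == "__main__":
    prof = RuntimeProfiler()
    prof.record("iot-camera", samples=32, seconds=4.0)    # 8 samples/s
    prof.record("media-center", samples=32, seconds=0.5)  # 64 samples/s
    print(split_batch(72, prof.rate))  # {'iot-camera': 8, 'media-center': 64}
```

<p>In this sketch, joining and exiting are handled implicitly: a new worker gets a share as soon as it reports its first measurement, and a worker removed with <code>leave</code> stops receiving data on the next split.</p>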
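
<p>Choosing between synchronous and asynchronous updates at runtime could, for example, be driven by how uneven the profiled iteration times are. The sketch below assumes a hypothetical rule: use asynchronous updates when the slowest worker takes more than <code>straggler_ratio</code> times the median iteration time, otherwise stay synchronous. The function name, the threshold, and the rule itself are illustrative assumptions, not the policy used in EDDL.</p>

```python
from statistics import median
from typing import Dict

# Hypothetical sketch of a runtime sync/async switch, not EDDL's actual policy.


def choose_update_policy(iter_seconds: Dict[str, float],
                         straggler_ratio: float = 2.0) -> str:
    """Return "sync" or "async" from each worker's recent iteration time.

    A synchronous round lasts as long as its slowest worker, so when one
    worker is much slower than the typical (median) worker, asynchronous
    updates avoid leaving every other worker idle.
    """
    if not iter_seconds:
        raise ValueError("no active workers")
    slowest = max(iter_seconds.values())
    typical = median(iter_seconds.values())
    return "async" if slowest > straggler_ratio * typical else "sync"


if __name__ == "__main__":
    print(choose_update_policy({"camera": 1.0, "laptop": 1.1, "desktop": 1.2}))  # sync
    print(choose_update_policy({"camera": 5.0, "laptop": 1.0, "desktop": 1.0}))  # async
```

<p>A real policy would also have to weigh the accuracy and convergence cost of asynchronous updates discussed above, for instance by bounding parameter staleness; this sketch only captures the straggler side of the trade-off.</p>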
</description>
<pubDate>Wed, 13 Oct 2021 16:53:20 -0400</pubDate>
<link>https://codersherlock.github.com//archivers/eddl-how-do-we-train-on-limited-edge-devices</link>
<guid isPermaLink="true">https://codersherlock.github.com//archivers/eddl-how-do-we-train-on-limited-edge-devices</guid>

<category>Research</category>

</item>

<item>
<title>Generate Word Cloud Figures with Chinese-Tokenization and WordCloud python libraries</title>
<description><p>Let’s generate a word cloud like this.
