Add new posts about eddl

2026-06-13 08:08:10 -07:00 · 2021-10-13 18:34:16 -04:00
parent 1595ec54ad
commit c2f2b702c5
16 changed files with 621 additions and 40 deletions
@@ -6,10 +6,99 @@
 </description>
    <link>https://codersherlock.github.com//</link>
    <atom:link href="https://codersherlock.github.com//feed.xml" rel="self" type="application/rss+xml"/>
-    <pubDate>Tue, 12 Oct 2021 18:31:37 -0400</pubDate>
-    <lastBuildDate>Tue, 12 Oct 2021 18:31:37 -0400</lastBuildDate>
+    <pubDate>Wed, 13 Oct 2021 18:33:50 -0400</pubDate>
+    <lastBuildDate>Wed, 13 Oct 2021 18:33:50 -0400</lastBuildDate>
    <generator>Jekyll v4.1.1</generator>
    
+      <item>
+        <title>EDDL: How do we train neural networks on limited edge devices - PART 1</title>
+        <description>&lt;p&gt;This post introduces our previous milestone in project “Edge trainer”, as the paper “EDDL: A Distributed Deep Learning System for Resource-limited Edge Computing Environment.” was published.
+As the first part of the introductions, I focus only on the motivation and summary of our works.
+More details in design and implementation can be found in late posts.&lt;/p&gt;
+
+&lt;p&gt;&lt;img src=&quot;/static/2021-10/edgelearn-1.png&quot; height=&quot;250&quot; /&gt;&lt;/p&gt;
+
+&lt;h2 id=&quot;why-do-we-need-training-on-edge&quot;&gt;Why do we need training on edge?&lt;/h2&gt;
+
+&lt;p&gt;Cloud is not trustworthy anymore. More and more facts supports that breach on cloud happens frequently than before.
+Nowadays, with more generated personal sensitive data has been uploaded to the cloud center, tech company know better to someones than user themselves.&lt;/p&gt;
+
+&lt;p&gt;Researchers, no matter in industry on academia, are working in a way that still learning from users’ data but also keeping raw sensitive data under users’ control.
+Many publications already showed feasibility of only sharing after-trained model instead of raw data.
+One recent popular study on this is google’s &lt;a href=&quot;https://ai.googleblog.com/2017/04/federated-learning-collaborative.html&quot;&gt;federated learning&lt;/a&gt;.&lt;/p&gt;
+
+&lt;p&gt;During investigated this problem, we found that let end user train their own data is safe, but sacrifice efficiency.
+Since one end device has limited resources, training time and power consumption can be disappointing.
+We believe there must have a leverage between privacy and efficiency in some target scenarios.&lt;/p&gt;
+
+&lt;p&gt;Fortunately, we observed that users who belongs to the same campus, plant, firm and community always share similar interests.
+Therefore, these co-located users have similar demands in using AI-involved routines.
+Also, co-located users are easily targeted by same type of threats, such as ransomware to financial practitioners.&lt;/p&gt;
+
+&lt;p&gt;Think about this, sending features of a new malware app to cloud services in order to train a neural networks used by antivirus program.
+This process may takes long time and small amount of samples may not be recognized by the global neural networks model.
+With a customized local model trained and deployed on the edge can successfully counter the problem.
+With edge training as a supplement of cloud training can achieve better response time and let the whole system more flexible.&lt;/p&gt;
+
+&lt;h2 id=&quot;why-training-on-edge-is-hard&quot;&gt;Why training on edge is hard?&lt;/h2&gt;
+
+&lt;p&gt;Since all co-located users’ device can be used for an edge training, issues and challenges occur as deploying this distributed system.&lt;/p&gt;
+
+&lt;p&gt;The first challenge is &lt;strong&gt;struggling workers&lt;/strong&gt;.
+Training devices are heterogeneity, from limited IoT camera to high-end media center with powerful GPU.
+They are not designed to do machine learnings.
+So, a good edge-based distributed learning framework must can handle variety speeds in training tasks.&lt;/p&gt;
+
+&lt;p&gt;The second challenge is how to &lt;strong&gt;scale up&lt;/strong&gt; clusters.
+In a campus, thousands and more devices may contribute computing resources to the same training tasks.
+However, these devices may located in far not matter in physical or in network topology. 
+How can we well use them well, without struggled with endless transmission time remains a challenge.&lt;/p&gt;
+
+&lt;p&gt;The third issue is frequently &lt;strong&gt;joining and exiting&lt;/strong&gt; of devices.
+We can’t rely on each devices to faithfully working on training tasks rather than their original workload.
+Smartly schedule work balance and handle join/exit issues also need under consideration.&lt;/p&gt;
+
+&lt;h2 id=&quot;our-proposal&quot;&gt;Our proposal&lt;/h2&gt;
+
+&lt;ul&gt;
+  &lt;li&gt;
+    &lt;p&gt;Dynamic training data distribution and runtime profiler&lt;/p&gt;
+
+    &lt;p&gt;We design a dynamic training data distribution mechanism that helps to both the first and the third challenges.
+  Preprocessing data can be transmitted without leakage of raw sensitive information. 
+  This can helps with struggling workers who can train small batches in order to upload parameters with a similar training time.
+  Also, for extremely slow devices, join and exit of devices cases, dynamic data distribution and profiler can helps with keep global training parameters from polluted and staleness.&lt;/p&gt;
+
+    &lt;p&gt;To counter heterogeneity’s, more approaches were applied in our later research.
+  More details were introduced to runtime profiler in the later works.&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;Asynchronous and synchronous aggregation enabled&lt;/p&gt;
+
+    &lt;p&gt;In our findings, asynchronous and synchronous parameter update have their pros and cons. 
+  Keeping sync all the time leads struggling worker issue unsolvable.
+  However, async’s harm to accuracy and convergence time also need attentions.
+  To carefully chose between these two update policies at the runtime is what we proposed to make use of their own advantages.&lt;/p&gt;
+  &lt;/li&gt;
+  &lt;li&gt;
+    &lt;p&gt;Leader role splitting&lt;/p&gt;
+
+    &lt;p&gt;The idea is to let worker devices with higher bandwidth taking leader role during training.
+  Parameter updating does not require much computation but only need bandwidth. 
+  Devices with sufficient bandwidth can also work as virtual leader devices.
+  This approach helps with minimize physical devices we used and more leaders can further scale up workers limits.&lt;/p&gt;
+  &lt;/li&gt;
+&lt;/ul&gt;
+</description>
+        <pubDate>Wed, 13 Oct 2021 16:53:20 -0400</pubDate>
+        <link>https://codersherlock.github.com//archivers/eddl-how-do-we-train-on-limited-edge-devices</link>
+        <guid isPermaLink="true">https://codersherlock.github.com//archivers/eddl-how-do-we-train-on-limited-edge-devices</guid>
+        
+        
+        <category>Research</category>
+        
+      </item>
+    
      <item>
        <title>Generate Word Cloud Figures with Chinese-Tokenization and WordCloud python libraries</title>
        <description>&lt;p&gt;Let’s generate a word cloud like this.