Add new post about eddl part 2

2021-10-31 18:18:46 -04:00
parent 6f4c32d4fb
commit f86bce1774
20 changed files with 1572 additions and 153 deletions
+1 -44
@@ -1,46 +1,3 @@
My Personal Blog
LANG: en_US
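What should I write? I feel like I am a serial starter of new series.
I plan to start three new series: besides the research progress updates I am still posting, I will start two new series introducing two of my projects, plus a photo series for my photography.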
+1
@@ -202,6 +202,7 @@ exclude:
- /screenshots
- /test
- /vendor
- configure.sh
defaults:
- scope:
+1
@@ -3,6 +3,7 @@ title: "Xv6 introduction"
date: 2017-07-28 14:56:55 -0400
tags: xv6
author: Pengzhan Hao
cover: '/static/2021-10/Xv6_LS_Command_Output.png'
---
In this post, you will learn a few basic concepts of xv6. Learning path will be closed coupled to first project assignment I gave when I assisted in teaching OS classes.
@@ -14,66 +14,66 @@ More details in design and implementation can be found in late posts.
## Why do we need training on edge?
Cloud is not trustworthy anymore. More and more facts supports that breach on cloud happens frequently than before. Cloud is not trustworthy anymore. More and more facts support that breach on the cloud happens frequently than before.
Nowadays, with more generated personal sensitive data has been uploaded to the cloud center, tech company know better to someones than user themselves. Nowadays, with more generated personal sensitive data has been uploaded to the cloud center, tech companies know better to someones than the user.
Researchers, no matter in industry on academia, are working in a way that still learning from users' data but also keeping raw sensitive data under users' control. Researchers, no matter in the industry on academia, are working in a way that still learning from users' data but also keeping raw sensitive data under users' control.
Many publications already showed feasibility of only sharing after-trained model instead of raw data. Many publications have already shown the feasibility of only sharing the after-trained model instead of raw data.
One recent popular study on this is google's [federated learning](https://ai.googleblog.com/2017/04/federated-learning-collaborative.html).
During investigated this problem, we found that let end user train their own data is safe, but sacrifice efficiency. During investigating this problem, we found that letting end-user train their data is safe, but sacrifice efficiency.
Since one end device has limited resources, training time and power consumption can be disappointing.
We believe there must have a leverage between privacy and efficiency in some target scenarios. We believe there must have leverage between privacy and efficiency in some target scenarios.
Fortunately, we observed that users who belongs to the same campus, plant, firm and community always share similar interests. Fortunately, we observed that users who belong to the same campus, plant, firm, and community always share similar interests.
Therefore, these co-located users have similar demands in using AI-involved routines.
Also, co-located users are easily targeted by same type of threats, such as ransomware to financial practitioners. Also, co-located users are easily targeted by the same type of threats, such as ransomware to financial practitioners.
Think about this, sending features of a new malware app to cloud services in order to train a neural networks used by antivirus program. Think about this, sending features of a new malware app to cloud services to train neural networks used by antivirus programs.
This process may takes long time and small amount of samples may not be recognized by the global neural networks model. This process may take a long time and a small number of samples may not be recognized by the global neural networks model.
With a customized local model trained and deployed on the edge can successfully counter the problem. A customized local model trained and deployed on the edge can successfully counter the problem.
With edge training as a supplement of cloud training can achieve better response time and let the whole system more flexible. With edge training as a supplement to the cloud training can achieve better response time and let the whole system more flexible.
## Why training on edge is hard?
Since all co-located users' device can be used for an edge training, issues and challenges occur as deploying this distributed system. Since all co-located users' devices can be used for edge training, issues and challenges occur as deploying this distributed system.
The first challenge is **struggling workers**.
Training devices are heterogeneity, from limited IoT camera to high-end media center with powerful GPU. Training devices are heterogeneous, from limited IoT cameras to high-end media centers with powerful GPUs.
They are not designed to do machine learnings. They are not designed to do machine learning.
So, a good edge-based distributed learning framework must can handle variety speeds in training tasks. So, a good edge-based distributed learning framework must be able to handle a variety of speeds in training tasks.
The second challenge is how to **scale up** clusters.
In a campus, thousands and more devices may contribute computing resources to the same training tasks. On a campus, thousands and more devices may contribute computing resources to the same training tasks.
However, these devices may located in far not matter in physical or in network topology. However, these devices may be located far no matter in physical or in network topology.
How can we well use them well, without struggled with endless transmission time remains a challenge. The question of how can we well use them well, without struggling with endless transmission time remains a challenge.
The third issue is frequently **joining and exiting** of devices.
We can't rely on each devices to faithfully working on training tasks rather than their original workload. We can't rely on each device to faithfully work on training tasks rather than their original workload.
Smartly schedule work balance and handle join/exit issues also need under consideration.
## Our proposal
- Dynamic training data distribution and runtime profiler
We design a dynamic training data distribution mechanism that helps to both the first and the third challenges. We design a dynamic training data distribution mechanism that helps both the first and the third challenges.
Preprocessing data can be transmitted without leakage of raw sensitive information. Preprocessing data can be transmitted without leakage of raw and sensitive information.
This can helps with struggling workers who can train small batches in order to upload parameters with a similar training time. This can help struggling workers who can train small batches in order to upload parameters with a similar training time.
Also, for extremely slow devices, join and exit of devices cases, dynamic data distribution and profiler can helps with keep global training parameters from polluted and staleness. Also, for extremely slow devices, join and exit of devices cases, dynamic data distribution and profiler can help with keeping global training parameters from pollution and staleness.
To counter heterogeneity's, more approaches were applied in our later research. To counter heterogeneity, more approaches were applied in our later research.
More details were introduced to runtime profiler in the later works. More details were introduced to the runtime profiler in the later works.
- Asynchronous and synchronous aggregation enabled
In our findings, asynchronous and synchronous parameter update have their pros and cons.
Keeping sync all the time leads struggling worker issue unsolvable. Keeping sync all the time leads to struggling worker issues unsolvable.
However, async's harm to accuracy and convergence time also need attentions. However, async's harm to accuracy and convergence time also needs attention.
To carefully chose between these two update policies at the runtime is what we proposed to make use of their own advantages. To carefully choose between these two update policies at the runtime is what we proposed to make use of their own advantages.
- Leader role splitting
The idea is to let worker devices with higher bandwidth taking leader role during training. The idea is to let worker devices with higher bandwidth take leader-role during training.
Parameter updating does not require much computation but only need bandwidth. Parameter updating does not require much computation but only needs a great of bandwidth.
Devices with sufficient bandwidth can also work as virtual leader devices.
This approach helps with minimize physical devices we used and more leaders can further scale up workers limits. This approach helps minimize physical devices we used and more leaders can further scale up workers' limits.
@@ -0,0 +1,109 @@
---
title: "EDDL: How do we train neural networks on limited edge devices - PART 2"
date: 2021-10-31 13:01:14 -0400
tags: Research
author: Pengzhan Hao
cover: '/static/2021-10/f.5_Impl_leader_worker.png'
mathjax: true
---
In the last post (Part 1), I gave a general overview of our idea of distributed learning in edge environments.
I explained why edge distributed learning is needed and what improvements it can achieve.
In this post, I will talk about our motivation study and how our framework works.
## How does the data support training on the edge?
Before designing and implementing our framework, we first needed to confirm that training on resource-limited edge devices is worthwhile.
We used a malware detection neural network to show why a small, customized neural network is better.
We collected features from more than 32,000 mobile apps as the global dataset.
With these data records, we trained a multilayer perceptron called "PerNet" to determine whether a given feature vector belongs to a benign or a malware app.
We call this **detection**.
PerNet can also classify malware apps into different types of attacks.
We call this **classification**.
The global model achieves a recall rate above 93% and an accuracy above 96.93%.
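For readers less familiar with the two metrics, the sketch below shows how accuracy and recall are computed for the binary detection task, treating malware as the positive class. The function name `detection_metrics` and the toy labels are illustrative only; they are not PerNet's code or data from our study.

```python
from typing import Sequence

def detection_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> tuple[float, float]:
    """Return (accuracy, recall) for binary detection, where 1 = malware and 0 = benign."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # malware correctly flagged
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # malware missed
    correct = sum(1 for t, p in pairs if t == p)
    accuracy = correct / len(pairs)
    recall = tp / (tp + fn) if tp + fn else 0.0
    return accuracy, recall

# Toy labels, not data from the study.
acc, rec = detection_metrics([1, 1, 1, 0, 0], [1, 1, 0, 0, 1])
print(acc, rec)
```

Recall only looks at the malware samples, so a detector can have high accuracy on a benign-heavy test set while still missing many malware apps; that is why both numbers are reported.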
From this data, we selected two community app-usage sub-datasets to generate local models.
- Large categories (Scenario 1)
We chose the 5 largest app categories (Entertainment, Tools, Brain & Puzzle, Lifestyle, and Education), as well as the 5 largest malware categories.
Altogether, this sub-dataset includes more than 12,000 apps, split almost 50/50 between benign and malware.
- Campus-community categories (Scenario 2)
We chose the 5 categories most downloaded by college students as the benign groups, as well as a comparable amount of data from 5 malware categories.
To ensure that malware apps also appear within the 5 benign categories, we also considered synthesizing additional malware apps that fall into the 5 most downloaded (benign) categories.
With these two sub-datasets, we used the same PerNet architecture to generate multiple local models.
In each scenario, we compared the global and local models on the held-out test dataset.
For classification, the local models beat the global model in every scenario.
For detection, the local models achieve the same accuracy as the global model.
![Inference results](/static/2021-10/t.3_inference_result.png)
In summary, local models are trained for specific scenarios.
Under the same circumstances, a global model achieves no better accuracy than the local models.
One possible reason the local models do better is that they overfit to their specific scenario.
I believe the machine learning community has also recognized this issue, which is why it introduced [transfer learning](https://en.wikipedia.org/wiki/Transfer_learning),
a technique that adapts a global model to a specific scenario by training it further once it is shipped to the local site.
## Design and Implementation
### Overall design
The basic EDDL distributed training setup consists of three parts:
- **EDDL training cluster**: a cluster of edge or mobile devices that participate in training.
- **EDDL manager**: the initial driver program, which collects training data, relays it to the training devices, and initializes the training cluster.
- **Training data entry (TDE)**: the storage for all training data.
### Dynamic training data distribution
Existing distributed DNN training solutions usually partition the training data statically among workers.
This becomes a problem when training nodes join and exit.
We therefore designed our framework to distribute training data dynamically during learning.
Before each training batch starts, a batch of data from the TDE is sent to the device.
In our experiments, this design shortened overall training time.
Especially with a large number of devices, training time can be 50% less than with static partitioning.
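To make the difference concrete, here is a toy simulation comparing static partitioning with a pull-based scheme in which each idle worker fetches the next batch from a shared queue standing in for the TDE. It is an illustration under simplifying assumptions, not EDDL's scheduler: it models compute time only and ignores transmission, the per-batch times are made-up numbers, and the function names are hypothetical.

```python
import heapq

def static_makespan(num_batches: int, secs_per_batch: list[float]) -> float:
    """Static partitioning: data is split evenly up front, so the slowest shard decides."""
    n = len(secs_per_batch)
    base, extra = divmod(num_batches, n)
    # The first `extra` workers receive one additional batch.
    return max((base + (1 if i < extra else 0)) * s
               for i, s in enumerate(secs_per_batch))

def dynamic_makespan(num_batches: int, secs_per_batch: list[float]) -> float:
    """Dynamic distribution: each idle worker pulls the next batch from a shared queue."""
    # Min-heap of (time the worker becomes idle, worker id).
    idle_at = [(0.0, i) for i in range(len(secs_per_batch))]
    heapq.heapify(idle_at)
    finish = 0.0
    for _ in range(num_batches):
        t, i = heapq.heappop(idle_at)  # earliest idle worker takes the next batch
        done = t + secs_per_batch[i]
        finish = max(finish, done)
        heapq.heappush(idle_at, (done, i))
    return finish

speeds = [1.0, 1.0, 4.0]  # seconds per batch: two fast workers, one straggler (made up)
print(static_makespan(12, speeds), dynamic_makespan(12, speeds))
```

With these numbers, `static_makespan` returns 16.0 because the straggler must finish its full shard of 4 batches, while `dynamic_makespan` returns 8.0 because the fast workers keep pulling batches. The pull model also tolerates a worker leaving: batches it has not fetched simply stay in the queue for the others.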
### Scaling up cluster size
Our framework was designed to support both synchronous and asynchronous parameter aggregation.
Asynchronous aggregation allows a higher throughput of training batches, but at the cost of convergence time.
Synchronous aggregation converges in fewer epochs, but cannot guarantee performance when there is a straggling worker.
Based on our experiments, we chose sync as the default because convergence time dominates overall training time.
However, we also considered the possibility that async with more workers can achieve a similar overall training time.
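The throughput side of this trade-off can be sketched with simple arithmetic. The sketch assumes that sync waits for the slowest worker every round while async lets each worker push independently; the number of batches each policy needs to converge is a hypothetical input you would have to measure, since stale async updates typically need more of them. The function names are hypothetical, not EDDL's API.

```python
def sync_batches_per_sec(secs_per_batch: list[float]) -> float:
    """Sync: every round waits for the slowest worker, then n batches are aggregated."""
    return len(secs_per_batch) / max(secs_per_batch)

def async_batches_per_sec(secs_per_batch: list[float]) -> float:
    """Async: each worker pushes an update as soon as its own batch is done."""
    return sum(1.0 / s for s in secs_per_batch)

def prefer_async(secs_per_batch: list[float],
                 sync_batches_to_converge: int,
                 async_batches_to_converge: int) -> bool:
    """True if async converges sooner in wall-clock time, given measured batch counts."""
    sync_time = sync_batches_to_converge / sync_batches_per_sec(secs_per_batch)
    async_time = async_batches_to_converge / async_batches_per_sec(secs_per_batch)
    return async_time < sync_time

speeds = [1.0, 1.0, 4.0]  # two fast workers and one straggler (made-up numbers)
print(sync_batches_per_sec(speeds), async_batches_per_sec(speeds))
print(prefer_async(speeds, sync_batches_to_converge=300, async_batches_to_converge=600))
```

For these per-batch times, sync processes 0.75 batches/s and async 2.25 batches/s, so async pays off only if it needs fewer than 3 times as many batches to converge. With equally fast workers the throughputs match and sync wins whenever it converges in fewer batches.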
We introduced a formula to determine whether adding more training nodes helps.
Here we use the bandwidth usage coefficient (BUC), defined as
$$ BUC = \dfrac{n}{T_{sync}} $$
In this formula, $$n$$ is the number of devices, and $$T_{sync}$$ is the transmission time of parameters.
As the number of workers increases, $$n$$ grows linearly, but the transmission time does not grow linearly with it.
As long as $$BUC$$ increases, the cluster can shorten training time by adding workers.
Otherwise, adding more workers won't reduce the overall training time.
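A minimal sketch of this rule, assuming $$T_{sync}$$ has already been profiled for a few candidate cluster sizes: keep growing the cluster while BUC keeps increasing. The measurements in the example are invented for illustration, and `pick_cluster_size` is a hypothetical helper, not part of EDDL.

```python
def buc(n_workers: int, t_sync: float) -> float:
    """Bandwidth usage coefficient: BUC = n / T_sync."""
    if n_workers <= 0 or t_sync <= 0:
        raise ValueError("n_workers and t_sync must be positive")
    return n_workers / t_sync

def pick_cluster_size(t_sync_by_n: dict[int, float]) -> int:
    """Grow the cluster while BUC keeps increasing; stop at the first drop."""
    sizes = sorted(t_sync_by_n)
    best = sizes[0]
    for n in sizes[1:]:
        if buc(n, t_sync_by_n[n]) > buc(best, t_sync_by_n[best]):
            best = n
        else:
            break  # BUC stopped increasing: more workers no longer shorten training
    return best

# Hypothetical profiled transmission times (seconds) for three cluster sizes.
measured = {4: 2.0, 8: 3.0, 16: 9.0}
print(pick_cluster_size(measured))
```

With these made-up measurements, BUC goes from 2.0 to about 2.67 and then drops to about 1.78, so the helper stops at 8 workers.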
### Adaptive leader role splitting
The idea of role splitting is simple: a device can work as both a worker and a leader.
The advantage is straightforward: one fewer parameter transfer is needed, so training time is shortened.
However, in our current setting it does not help much, since there is only one leader in a cluster.
We expect to benefit more from this in future work.
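A back-of-the-envelope count shows why a single co-located leader helps little. The sketch assumes that each remote worker uploads its parameters to the leader and downloads the aggregated parameters once per round, and that the worker sharing a device with the leader exchanges parameters locally for free. These are simplifying assumptions, and the function names are hypothetical.

```python
def transfers_per_round(n_workers: int, leader_is_worker: bool) -> int:
    """Parameter transfers per round: one upload and one download per remote worker."""
    if n_workers < 1:
        raise ValueError("need at least one worker")
    # When the leader also trains, its own worker's exchange stays on-device.
    remote_workers = n_workers - 1 if leader_is_worker else n_workers
    return 2 * remote_workers

def traffic_saving(n_workers: int) -> float:
    """Fraction of per-round transfers saved when the leader is also a worker."""
    separate = transfers_per_round(n_workers, leader_is_worker=False)
    split = transfers_per_round(n_workers, leader_is_worker=True)
    return 1 - split / separate

for n in (2, 4, 16):
    print(n, traffic_saving(n))
```

Under these assumptions the saving is 1/n of the traffic: 25% with 4 workers but only 6.25% with 16, which is consistent with the observation above that one leader per cluster does not help much.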
### Overall architecture
![Implementation](/static/2021-10/f.5_Impl_leader_worker.png)
The figure above shows the details.
### Prototype hardware and software
EDDL was designed to run on two embedded single-board computer platforms.
One such platform is [ODROID-XU4](https://www.hardkernel.com/shop/odroid-xu4-special-price/), which is equipped with a 2.1/1.4 GHz 32-bit ARM processor and 2GB memory.
The other platform is the [Raspberry Pi 3 Model B board](https://www.raspberrypi.com/products/raspberry-pi-3-model-b/), which comes with an ARM 1.2 GHz 64-bit quad-core processor and 1GB memory.
The operating system running on the above platforms is Ubuntu 18.04 with Linux kernel 4.14.
We used [Dlib](http://dlib.net/), a C++ library that provides implementations for a wide range of machine learning algorithms.
We chose Dlib because it is written in C/C++ and can easily be used natively on embedded devices.
+5 -3
@@ -428,7 +428,7 @@ c13 9 26 20 30 26 7 11 -9 26 -27 26 -5 0 -3 -5 5 -10 9 -6 10 -10 3 -10 -24
<ul class="menu"> <ul class="menu">
<li> <li>
<button type="button" class="button button--secondary button--pill tag-button tag-button--all" data-encode=""> <button type="button" class="button button--secondary button--pill tag-button tag-button--all" data-encode="">
Show All<div class="tag-button__count">6</div> Show All<div class="tag-button__count">7</div>
</button> </button>
</li><li><button type="button" class="button button--pill tag-button tag-button-1" data-encode="Network"> </li><li><button type="button" class="button button--pill tag-button tag-button-1" data-encode="Network">
<span>Network</span><div class="tag-button__count">1</div> <span>Network</span><div class="tag-button__count">1</div>
@@ -436,8 +436,8 @@ c13 9 26 20 30 26 7 11 -9 26 -27 26 -5 0 -3 -5 5 -10 9 -6 10 -10 3 -10 -24
</li><li><button type="button" class="button button--pill tag-button tag-button-1" data-encode="Nonsense"> </li><li><button type="button" class="button button--pill tag-button tag-button-1" data-encode="Nonsense">
<span>Nonsense</span><div class="tag-button__count">1</div> <span>Nonsense</span><div class="tag-button__count">1</div>
</button> </button>
</li><li><button type="button" class="button button--pill tag-button tag-button-2" data-encode="Research"> </li><li><button type="button" class="button button--pill tag-button tag-button-3" data-encode="Research">
<span>Research</span><div class="tag-button__count">2</div> <span>Research</span><div class="tag-button__count">3</div>
</button> </button>
</li><li><button type="button" class="button button--pill tag-button tag-button-1" data-encode="visualization"> </li><li><button type="button" class="button button--pill tag-button tag-button-1" data-encode="visualization">
<span>visualization</span><div class="tag-button__count">1</div> <span>visualization</span><div class="tag-button__count">1</div>
@@ -448,6 +448,8 @@ c13 9 26 20 30 26 7 11 -9 26 -27 26 -5 0 -3 -5 5 -10 9 -6 10 -10 3 -10 -24
</li></ul> </li></ul>
</div> </div>
<div class="js-result layout--archive__result d-none"><div class="article-list items"><section><h2 class="article-list__group-header">2021</h2><ul class="items"><li class="item" itemscope itemtype="http://schema.org/BlogPosting" data-tags="Research"> <div class="js-result layout--archive__result d-none"><div class="article-list items"><section><h2 class="article-list__group-header">2021</h2><ul class="items"><li class="item" itemscope itemtype="http://schema.org/BlogPosting" data-tags="Research">
<div class="item__content"><span class="item__meta">Oct 31</span><a itemprop="headline" class="item__header" href="/posts/eddl-how-do-we-train-on-limited-edge-devices-part2">EDDL: How do we train neural networks on limited edge devices - PART 2</a></div>
</li><li class="item" itemscope itemtype="http://schema.org/BlogPosting" data-tags="Research">
<div class="item__content"><span class="item__meta">Oct 13</span><a itemprop="headline" class="item__header" href="/posts/eddl-how-do-we-train-on-limited-edge-devices">EDDL: How do we train neural networks on limited edge devices - PART 1</a></div> <div class="item__content"><span class="item__meta">Oct 13</span><a itemprop="headline" class="item__header" href="/posts/eddl-how-do-we-train-on-limited-edge-devices">EDDL: How do we train neural networks on limited edge devices - PART 1</a></div>
</li></ul></section><section><h2 class="article-list__group-header">2020</h2><ul class="items"><li class="item" itemscope itemtype="http://schema.org/BlogPosting" data-tags="visualization"> </li></ul></section><section><h2 class="article-list__group-header">2020</h2><ul class="items"><li class="item" itemscope itemtype="http://schema.org/BlogPosting" data-tags="visualization">
<div class="item__content"><span class="item__meta">Sep 15</span><a itemprop="headline" class="item__header" href="/posts/generate-word-cloud-with-chinese-fenci">Generate Word Cloud Figures with Chinese-Tokenization and WordCloud python libraries</a></div> <div class="item__content"><span class="item__meta">Sep 15</span><a itemprop="headline" class="item__header" href="/posts/generate-word-cloud-with-chinese-fenci">Generate Word Cloud Figures with Chinese-Tokenization and WordCloud python libraries</a></div>
+1 -1
@@ -1 +1 @@
window.TEXT_SEARCH_DATA={'posts':[{'title':"Stop Talking is the worst title of one blog",'url':"/posts/welcome-to-my-blog"},{'title':"Using charles proxy to monitor mobile SSL traffics",'url':"/posts/charles-is-not-a-good-tool"},{'title':"Some of my previews experiment works: 2016",'url':"/posts/some-of-my-previews-exper-work"},{'title':"Xv6 introduction",'url':"/posts/intro-xv6"},{'title':"Generate Word Cloud Figures with Chinese-Tokenization and WordCloud python libraries",'url':"/posts/generate-word-cloud-with-chinese-fenci"},{'title':"EDDL: How do we train neural networks on limited edge devices - PART 1",'url':"/posts/eddl-how-do-we-train-on-limited-edge-devices"}]}; window.TEXT_SEARCH_DATA={'posts':[{'title':"Stop Talking is the worst title of one blog",'url':"/posts/welcome-to-my-blog"},{'title':"Using charles proxy to monitor mobile SSL traffics",'url':"/posts/charles-is-not-a-good-tool"},{'title':"Some of my previews experiment works: 2016",'url':"/posts/some-of-my-previews-exper-work"},{'title':"Xv6 introduction",'url':"/posts/intro-xv6"},{'title':"Generate Word Cloud Figures with Chinese-Tokenization and WordCloud python libraries",'url':"/posts/generate-word-cloud-with-chinese-fenci"},{'title':"EDDL: How do we train neural networks on limited edge devices - PART 1",'url':"/posts/eddl-how-do-we-train-on-limited-edge-devices"},{'title':"EDDL: How do we train neural networks on limited edge devices - PART 2",'url':"/posts/eddl-how-do-we-train-on-limited-edge-devices-part2"}]};
+139 -35
File diff suppressed because one or more lines are too long
+16 -2
@@ -424,7 +424,21 @@ c13 9 26 20 30 26 7 11 -9 26 -27 26 -5 0 -3 -5 5 -10 9 -6 10 -10 3 -10 -24
<div class="col-main cell cell--auto"><!-- start custom main top snippet --> <div class="col-main cell cell--auto"><!-- start custom main top snippet -->
<!-- end custom main top snippet --> <!-- end custom main top snippet -->
<article itemscope itemtype="http://schema.org/WebPage"><header style="display:none;"><h1>Home</h1></header><meta itemprop="headline" content="Home"><meta itemprop="author" content="Pengzhan Hao"/><div class="js-article-content"><div class="layout--articles"><div class="article-list items items--divided"><article class="item" itemscope itemtype="http://schema.org/BlogPosting"><div class="item__image" style="vertical-align: middle"><img class="image" src="/static/2021-10/edgelearn-1.png" /></div><div class="item__content"> <article itemscope itemtype="http://schema.org/WebPage"><header style="display:none;"><h1>Home</h1></header><meta itemprop="headline" content="Home"><meta itemprop="author" content="Pengzhan Hao"/><div class="js-article-content"><div class="layout--articles"><div class="article-list items items--divided"><article class="item" itemscope itemtype="http://schema.org/BlogPosting"><div class="item__image" style="vertical-align: middle"><img class="image" src="/static/2021-10/f.5_Impl_leader_worker.png" /></div><div class="item__content">
<header><a href="/posts/eddl-how-do-we-train-on-limited-edge-devices-part2"><h2 itemprop="headline" class="item__header">EDDL: How do we train neural networks on limited edge devices - PART 2</h2></a></header>
<div class="item__description"><div class="article__content" itemprop="description articleBody">In the last post, part1, our idea of distributed learning on edge environment was generally addressed.
I introduced the reason why edge distributed learning is needed and what improvements it can achieve.
In this post, I will talk about our motivation study and how our framework works.
How does data support us training on edge?
Before designin...</div><p><a href="/posts/eddl-how-do-we-train-on-limited-edge-devices-part2">Read more</a></p></div><div class="article__info clearfix"><ul class="left-col menu"><li>
<a class="button button--secondary button--pill button--sm"
href="/archive.html?tag=Research">Research</a>
</li></ul><ul class="right-col menu"><li><i class="fas fa-user"></i> <span>Pengzhan Hao</span></li><li><i class="far fa-calendar-alt"></i> <span>Oct 31, 2021</span>
</li></ul></div><meta itemprop="author" content="Pengzhan Hao"/><meta itemprop="datePublished" content="2021-10-31T13:01:14-04:00">
<meta itemprop="keywords" content="Research"></div>
</article><article class="item" itemscope itemtype="http://schema.org/BlogPosting"><div class="item__image" style="vertical-align: middle"><img class="image" src="/static/2021-10/edgelearn-1.png" /></div><div class="item__content">
<header><a href="/posts/eddl-how-do-we-train-on-limited-edge-devices"><h2 itemprop="headline" class="item__header">EDDL: How do we train neural networks on limited edge devices - PART 1</h2></a></header> <header><a href="/posts/eddl-how-do-we-train-on-limited-edge-devices"><h2 itemprop="headline" class="item__header">EDDL: How do we train neural networks on limited edge devices - PART 1</h2></a></header>
<div class="item__description"><div class="article__content" itemprop="description articleBody">This post introduces our previous milestone in project “Edge trainer”, as the paper “EDDL: A Distributed Deep Learning System for Resource-limited Edge Computing Environment.” was published. <div class="item__description"><div class="article__content" itemprop="description articleBody">This post introduces our previous milestone in project “Edge trainer”, as the paper “EDDL: A Distributed Deep Learning System for Resource-limited Edge Computing Environment.” was published.
As the first part of the introductions, I focus only on the motivation and summary of our works. As the first part of the introductions, I focus only on the motivation and summary of our works.
@@ -444,7 +458,7 @@ If your written language is based on latin alphabet(or other language has space
</li></ul><ul class="right-col menu"><li><i class="fas fa-user"></i> <span>Pengzhan Hao</span></li><li><i class="far fa-calendar-alt"></i> <span>Sep 15, 2020</span> </li></ul><ul class="right-col menu"><li><i class="fas fa-user"></i> <span>Pengzhan Hao</span></li><li><i class="far fa-calendar-alt"></i> <span>Sep 15, 2020</span>
</li></ul></div><meta itemprop="author" content="Pengzhan Hao"/><meta itemprop="datePublished" content="2020-09-15T22:00:14-04:00"> </li></ul></div><meta itemprop="author" content="Pengzhan Hao"/><meta itemprop="datePublished" content="2020-09-15T22:00:14-04:00">
<meta itemprop="keywords" content="visualization"></div> <meta itemprop="keywords" content="visualization"></div>
</article><article class="item" itemscope itemtype="http://schema.org/BlogPosting"><div class="item__content"> </article><article class="item" itemscope itemtype="http://schema.org/BlogPosting"><div class="item__image" style="vertical-align: middle"><img class="image" src="/static/2021-10/Xv6_LS_Command_Output.png" /></div><div class="item__content">
<header><a href="/posts/intro-xv6"><h2 itemprop="headline" class="item__header">Xv6 introduction</h2></a></header> <header><a href="/posts/intro-xv6"><h2 itemprop="headline" class="item__header">Xv6 introduction</h2></a></header>
<div class="item__description"><div class="article__content" itemprop="description articleBody">In this post, you will learn a few basic concepts of xv6. Learning path will be closed coupled to first project assignment I gave when I assisted in teaching OS classes. <div class="item__description"><div class="article__content" itemprop="description articleBody">In this post, you will learn a few basic concepts of xv6. Learning path will be closed coupled to first project assignment I gave when I assisted in teaching OS classes.
Understand system call and know how to implement a simple one will be coved as the first half. Understand system call and know how to implement a simple one will be coved as the first half.
File diff suppressed because it is too large
@@ -441,42 +441,42 @@ More details in design and implementation can be found in late posts.</p>
<h2 id="why-do-we-need-training-on-edge">Why do we need training on edge?</h2> <h2 id="why-do-we-need-training-on-edge">Why do we need training on edge?</h2>
<p>Cloud is not trustworthy anymore. More and more facts supports that breach on cloud happens frequently than before. <p>Cloud is not trustworthy anymore. More and more facts support that breach on the cloud happens frequently than before.
Nowadays, with more generated personal sensitive data has been uploaded to the cloud center, tech company know better to someones than user themselves.</p> Nowadays, with more generated personal sensitive data has been uploaded to the cloud center, tech companies know better to someones than the user.</p>
<p>Researchers, no matter in industry on academia, are working in a way that still learning from users data but also keeping raw sensitive data under users control. <p>Researchers, no matter in the industry on academia, are working in a way that still learning from users data but also keeping raw sensitive data under users control.
Many publications already showed feasibility of only sharing after-trained model instead of raw data. Many publications have already shown the feasibility of only sharing the after-trained model instead of raw data.
One recent popular study on this is googles <a href="https://ai.googleblog.com/2017/04/federated-learning-collaborative.html">federated learning</a>.</p> One recent popular study on this is googles <a href="https://ai.googleblog.com/2017/04/federated-learning-collaborative.html">federated learning</a>.</p>
<p>During investigated this problem, we found that let end user train their own data is safe, but sacrifice efficiency. <p>During investigating this problem, we found that letting end-user train their data is safe, but sacrifice efficiency.
Since one end device has limited resources, training time and power consumption can be disappointing. Since one end device has limited resources, training time and power consumption can be disappointing.
We believe there must have a leverage between privacy and efficiency in some target scenarios.</p> We believe there must have leverage between privacy and efficiency in some target scenarios.</p>
<p>Fortunately, we observed that users who belongs to the same campus, plant, firm and community always share similar interests. <p>Fortunately, we observed that users who belong to the same campus, plant, firm, and community always share similar interests.
Therefore, these co-located users have similar demands in using AI-involved routines. Therefore, these co-located users have similar demands in using AI-involved routines.
Also, co-located users are easily targeted by same type of threats, such as ransomware to financial practitioners.</p> Also, co-located users are easily targeted by the same type of threats, such as ransomware to financial practitioners.</p>
<p>Think about this, sending features of a new malware app to cloud services in order to train a neural networks used by antivirus program. <p>Consider sending the features of a new malware app to a cloud service to train the neural network used by an antivirus program.
This process may takes long time and small amount of samples may not be recognized by the global neural networks model. This process may take a long time, and a small number of samples may not be recognized by the global neural network model.
With a customized local model trained and deployed on the edge can successfully counter the problem. A customized local model trained and deployed on the edge can counter this problem.
With edge training as a supplement of cloud training can achieve better response time and let the whole system more flexible.</p> Using edge training as a supplement to cloud training can achieve better response times and make the whole system more flexible.</p>
<h2 id="why-training-on-edge-is-hard">Why training on edge is hard?</h2> <h2 id="why-training-on-edge-is-hard">Why is training on the edge hard?</h2>
<p>Since all co-located users’ device can be used for an edge training, issues and challenges occur as deploying this distributed system.</p> <p>Since any co-located user’s device can be used for edge training, several challenges arise when deploying this distributed system.</p>
<p>The first challenge is <strong>struggling workers</strong>. <p>The first challenge is <strong>struggling workers</strong>.
Training devices are heterogeneity, from limited IoT camera to high-end media center with powerful GPU. Training devices are heterogeneous, from limited IoT cameras to high-end media centers with powerful GPUs.
They are not designed to do machine learnings. They were not designed for machine learning.
So, a good edge-based distributed learning framework must can handle variety speeds in training tasks.</p> A good edge-based distributed learning framework must therefore handle widely varying training speeds.</p>
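To make the straggler problem concrete, here is a small Python sketch, assuming fully synchronous data-parallel training in which every worker must finish its batch before the shared update is applied. The device names and per-batch times are made up for illustration; they are not measurements from our system.

```python
# Hypothetical per-batch training times in seconds; illustrative numbers only,
# not measurements from EDDL.
step_times = {
    "iot_camera": 4.0,
    "laptop": 1.0,
    "media_center_gpu": 0.25,
}


def sync_round_time(times):
    """In fully synchronous training, the aggregated update waits for every
    worker, so one round lasts as long as the slowest worker."""
    return max(times.values())


def idle_fraction(times):
    """Share of total worker time spent waiting for the slowest worker."""
    round_time = sync_round_time(times)
    busy = sum(times.values())
    return 1 - busy / (round_time * len(times))


print(sync_round_time(step_times))  # 4.0
print(idle_fraction(step_times))    # 0.5625
```

Even with only three devices, more than half of the available compute sits idle in each round under these assumed timings, which is why the framework has to adapt to per-device speed.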
<p>The second challenge is how to <strong>scale up</strong> clusters. <p>The second challenge is how to <strong>scale up</strong> clusters.
In a campus, thousands and more devices may contribute computing resources to the same training tasks. On a campus, thousands of devices or more may contribute computing resources to the same training task.
However, these devices may located in far not matter in physical or in network topology. However, these devices may be far apart, both physically and in the network topology.
How can we well use them well, without struggled with endless transmission time remains a challenge.</p> How to use them efficiently without being bogged down by long transmission times remains a challenge.</p>
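A back-of-the-envelope calculation shows why transmission time dominates as the cluster grows. The sketch below assumes a single aggregation point whose link is shared by all workers, and that every worker uploads a full copy of the model each round; the 10 MB model size and 100 Mbit/s link are illustrative values, not figures from our deployment.

```python
def aggregation_time_s(num_workers, model_mb, bandwidth_mbps):
    """Seconds for one aggregator to receive a full-model update from every
    worker over a single shared link (megabytes -> megabits via * 8)."""
    total_megabits = num_workers * model_mb * 8
    return total_megabits / bandwidth_mbps


for n in (10, 100, 1000):
    # 10 MB model, 100 Mbit/s link: illustrative values only.
    print(n, aggregation_time_s(n, model_mb=10, bandwidth_mbps=100))
```

This prints 8.0, 80.0 and 800.0 seconds: under these assumptions the per-round cost grows linearly with the number of workers, so a single aggregation point cannot keep up with a campus-sized cluster.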
<p>The third issue is frequently <strong>joining and exiting</strong> of devices. <p>The third issue is the frequent <strong>joining and exiting</strong> of devices.
We can’t rely on each devices to faithfully working on training tasks rather than their original workload. We can’t rely on every device to keep working on training tasks instead of its own original workload.
Smartly schedule work balance and handle join/exit issues also need under consideration.</p> Scheduling a balanced workload and handling devices that join or exit must also be considered.</p>
<h2 id="our-proposal">Our proposal</h2> <h2 id="our-proposal">Our proposal</h2>
@@ -485,29 +485,29 @@ Smartly schedule work balance and handle join/exit issues also need under consid
<li> <li>
<p>Dynamic training data distribution and runtime profiler</p> <p>Dynamic training data distribution and runtime profiler</p>
<p>We design a dynamic training data distribution mechanism that helps to both the first and the third challenges. <p>We designed a dynamic training data distribution mechanism that addresses both the first and the third challenges.
Preprocessing data can be transmitted without leakage of raw sensitive information. Preprocessed data can be transmitted without leaking raw, sensitive information.
This can helps with struggling workers who can train small batches in order to upload parameters with a similar training time. This helps struggling workers, who can train on smaller batches so that they upload parameters after a training time similar to that of the other workers.
Also, for extremely slow devices, join and exit of devices cases, dynamic data distribution and profiler can helps with keep global training parameters from polluted and staleness.</p> For extremely slow devices, and for devices that join or exit, the dynamic data distribution and the profiler also help keep the global training parameters from being polluted or going stale.</p>
<p>To counter heterogeneitys, more approaches were applied in our later research. <p>To counter heterogeneity, we applied additional approaches in our later research.
More details were introduced to runtime profiler in the later works.</p> Our later work adds more details to the runtime profiler.</p>
</li> </li>
<li> <li>
<p>Asynchronous and synchronous aggregation enabled</p> <p>Asynchronous and synchronous aggregation enabled</p>
<p>In our findings, asynchronous and synchronous parameter update have their pros and cons. <p>We found that asynchronous and synchronous parameter updates each have their pros and cons.
Keeping sync all the time leads struggling worker issue unsolvable. Always staying synchronous leaves the struggling-worker problem unsolved.
However, async’s harm to accuracy and convergence time also need attentions. However, the harm that async updates do to accuracy and convergence time also needs attention.
To carefully chose between these two update policies at the runtime is what we proposed to make use of their own advantages.</p> We therefore propose choosing carefully between the two update policies at runtime to exploit the advantages of each.</p>
</li> </li>
<li> <li>
<p>Leader role splitting</p> <p>Leader role splitting</p>
<p>The idea is to let worker devices with higher bandwidth taking leader role during training. <p>The idea is to let worker devices with higher bandwidth take the leader role during training.
Parameter updating does not require much computation but only need bandwidth. Parameter updating requires little computation but a great deal of bandwidth.
Devices with sufficient bandwidth can also work as virtual leader devices. Devices with sufficient bandwidth can also work as virtual leader devices.
This approach helps with minimize physical devices we used and more leaders can further scale up workers limits.</p> This approach minimizes the number of physical devices we need, and adding more leaders further raises the limit on the number of workers.</p>
</li> </li>
</ul> </ul>
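The dynamic training data distribution idea from the first item can be sketched as follows. This is a simplified illustration, not our actual implementation: it assumes a runtime profiler reports each worker’s throughput in samples per second, splits a global batch in proportion to those throughputs so that every worker finishes at about the same time, and drops workers slower than a hypothetical `min_share` cutoff for the current round.

```python
from typing import Dict


def distribute_batch(throughput: Dict[str, float], global_batch: int,
                     min_share: float = 0.05) -> Dict[str, int]:
    """Split `global_batch` samples across workers in proportion to their
    profiled throughput (samples/second). Workers slower than `min_share`
    of the fastest worker are left out of this round to avoid stale updates."""
    fastest = max(throughput.values())
    active = {w: t for w, t in throughput.items() if t >= min_share * fastest}
    total = sum(active.values())
    # Ideal fractional share for each active worker, proportional to speed.
    ideal = {w: global_batch * t / total for w, t in active.items()}
    alloc = {w: int(v) for w, v in ideal.items()}
    # Hand out leftover samples by largest remainder so the sum is exact.
    leftover = global_batch - sum(alloc.values())
    by_remainder = sorted(ideal, key=lambda w: ideal[w] - alloc[w], reverse=True)
    for w in by_remainder[:leftover]:
        alloc[w] += 1
    return alloc


# Hypothetical profiler readings in samples/second.
profiled = {"iot_camera": 2.0, "laptop": 40.0, "gpu_box": 158.0}
print(distribute_batch(profiled, global_batch=200))
# {'laptop': 40, 'gpu_box': 160}
```

With these made-up readings, both active workers need roughly one second for their share, while the very slow camera sits out the round instead of delaying it.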
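The second item can be illustrated with a hypothetical switching rule, an assumption made for illustration rather than our exact policy: stay synchronous while workers finish at similar speeds, and switch to asynchronous updates once the slowest worker exceeds `straggler_ratio` times the median step time. The staleness-based learning-rate damping is a common mitigation for asynchronous SGD that is swapped in here to show how async’s accuracy cost can be contained; the threshold is likewise an assumed value.

```python
from statistics import median


def choose_update_policy(step_times, straggler_ratio=2.0):
    """Hypothetical rule: use synchronous updates while workers run at
    similar speeds, and asynchronous updates once the slowest worker takes
    more than `straggler_ratio` times the median step time."""
    if max(step_times) > straggler_ratio * median(step_times):
        return "async"
    return "sync"


def staleness_scaled_lr(base_lr, staleness):
    """Damp an asynchronous update computed on parameters that are
    `staleness` versions old, limiting the damage stale gradients do."""
    return base_lr / (1 + staleness)


print(choose_update_policy([1.0, 1.1, 1.2]))   # sync
print(choose_update_policy([0.25, 1.0, 4.0]))  # async
print(staleness_scaled_lr(0.1, staleness=3))
```

The median is used instead of the mean so that a single straggler does not inflate the baseline it is being compared against.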
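Leader role splitting from the third item can be sketched like this, assuming each device reports an estimated bandwidth. The highest-bandwidth devices become virtual leaders while still training, and the remaining devices are assigned round-robin so that each leader aggregates a similar number of workers. The round-robin assignment and the device names are illustrative assumptions.

```python
from typing import Dict, List


def split_leaders(bandwidth_mbps: Dict[str, float],
                  num_leaders: int) -> Dict[str, List[str]]:
    """Pick the `num_leaders` highest-bandwidth devices as leaders and spread
    the other devices across them round-robin. Each leader also appears in its
    own group because it keeps training as a worker (a virtual leader), so no
    dedicated parameter-server machine is required."""
    ranked = sorted(bandwidth_mbps, key=bandwidth_mbps.get, reverse=True)
    leaders = ranked[:num_leaders]
    groups: Dict[str, List[str]] = {leader: [leader] for leader in leaders}
    for i, device in enumerate(ranked[num_leaders:]):
        groups[leaders[i % num_leaders]].append(device)
    return groups


# Hypothetical bandwidth estimates in Mbit/s.
devices = {"a": 900.0, "b": 50.0, "c": 300.0, "d": 20.0, "e": 100.0}
print(split_leaders(devices, num_leaders=2))
# {'a': ['a', 'e', 'd'], 'c': ['c', 'b']}
```

Raising `num_leaders` shrinks each group, which is how adding leaders lifts the ceiling on the number of workers in this sketch.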
</div><section class="article__sharing d-print-none"></section><div class="d-print-none"><footer class="article__footer"><meta itemprop="dateModified" content="2021-10-13T16:53:20-04:00"><!-- start custom article footer snippet --> </div><section class="article__sharing d-print-none"></section><div class="d-print-none"><footer class="article__footer"><meta itemprop="dateModified" content="2021-10-13T16:53:20-04:00"><!-- start custom article footer snippet -->
@@ -515,7 +515,7 @@ Smartly schedule work balance and handle join/exit issues also need under consid
<!-- end custom article footer snippet --> <!-- end custom article footer snippet -->
<div class="article__subscribe"><div class="subscribe"><i class="fas fa-rss"></i> <a type="application/rss+xml" href="/feed.xml">Subscribe</a></div> <div class="article__subscribe"><div class="subscribe"><i class="fas fa-rss"></i> <a type="application/rss+xml" href="/feed.xml">Subscribe</a></div>
</div><div class="article__license"></div></footer> </div><div class="article__license"></div></footer>
<div class="article__section-navigator clearfix"><div class="previous"><span>PREVIOUS</span><a href="/posts/generate-word-cloud-with-chinese-fenci">Generate Word Cloud Figures with Chinese-Tokenization and WordCloud python libraries</a></div></div></div> <div class="article__section-navigator clearfix"><div class="previous"><span>PREVIOUS</span><a href="/posts/generate-word-cloud-with-chinese-fenci">Generate Word Cloud Figures with Chinese-Tokenization and WordCloud python libraries</a></div><div class="next"><span>NEXT</span><a href="/posts/eddl-how-do-we-train-on-limited-edge-devices-part2">EDDL: How do we train neural networks on limited edge devices - PART 2</a></div></div></div>
</div> </div>
+4
@@ -25,6 +25,10 @@
<lastmod>2021-10-13T16:53:20-04:00</lastmod> <lastmod>2021-10-13T16:53:20-04:00</lastmod>
</url> </url>
<url> <url>
<loc>https://codersherlock.github.com//posts/eddl-how-do-we-train-on-limited-edge-devices-part2</loc>
<lastmod>2021-10-31T13:01:14-04:00</lastmod>
</url>
<url>
<loc>https://codersherlock.github.com//about.html</loc> <loc>https://codersherlock.github.com//about.html</loc>
</url> </url>
<url> <url>
Binary file not shown.
Binary file not shown.
Binary file not shown.
Executable
+3
@@ -0,0 +1,3 @@
export PATH="$HOME/gems/bin:$PATH"
export GEM_HOME="$HOME/gems"
export JEKYLL_ENV=production
Binary file not shown.
Binary file not shown.
Binary file not shown.