mirror of
https://github.com/CoderSherlock/CoderSherlock.github.io.git
synced 2026-06-12 23:58:11 -07:00
Add new post about eddl part 2
@@ -1,46 +1,3 @@
|
|||||||
My Personal Blog
|
My Personal Blog
|
||||||
|
|
||||||
|
LANG: en_US
|
||||||
What should I write about? I feel like I really am addicted to starting new series.
|
|
||||||
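I plan to start three new series: besides the research progress updates I am still posting, I will start two new series introducing two of my own projects, plus a photo series for my photography.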
|
|
||||||
@@ -202,6 +202,7 @@ exclude:
|
|||||||
- /screenshots
|
- /screenshots
|
||||||
- /test
|
- /test
|
||||||
- /vendor
|
- /vendor
|
||||||
|
- configure.sh
|
||||||
|
|
||||||
defaults:
|
defaults:
|
||||||
- scope:
|
- scope:
|
||||||
|
|||||||
@@ -3,6 +3,7 @@ title: "Xv6 introduction"
|
|||||||
date: 2017-07-28 14:56:55 -0400
|
date: 2017-07-28 14:56:55 -0400
|
||||||
tags: xv6
|
tags: xv6
|
||||||
author: Pengzhan Hao
|
author: Pengzhan Hao
|
||||||
|
cover: '/static/2021-10/Xv6_LS_Command_Output.png'
|
||||||
---
|
---
|
||||||
|
|
||||||
In this post, you will learn a few basic concepts of xv6. The learning path will be closely coupled to the first project assignment I gave when I assisted in teaching OS classes.
|
In this post, you will learn a few basic concepts of xv6. The learning path will be closely coupled to the first project assignment I gave when I assisted in teaching OS classes.
|
||||||
|
|||||||
@@ -14,66 +14,66 @@ More details in design and implementation can be found in late posts.
|
|||||||
|
|
||||||
## Why do we need training on edge?
|
## Why do we need training on edge?
|
||||||
|
|
||||||
Cloud is not trustworthy anymore. More and more facts supports that breach on cloud happens frequently than before.
|
The cloud is not trustworthy anymore. More and more evidence shows that breaches in the cloud happen more frequently than before.
|
||||||
Nowadays, with more generated personal sensitive data has been uploaded to the cloud center, tech company know better to someones than user themselves.
|
Nowadays, with more and more sensitive personal data uploaded to cloud data centers, tech companies know users better than the users know themselves.
|
||||||
|
|
||||||
Researchers, no matter in industry on academia, are working in a way that still learning from users' data but also keeping raw sensitive data under users' control.
|
Researchers, whether in industry or academia, are working on ways to keep learning from users' data while keeping the raw sensitive data under users' control.
|
||||||
Many publications already showed feasibility of only sharing after-trained model instead of raw data.
|
Many publications have already shown the feasibility of sharing only the trained model instead of raw data.
|
||||||
One recent popular study on this is Google's [federated learning](https://ai.googleblog.com/2017/04/federated-learning-collaborative.html).
|
One recent popular study on this is Google's [federated learning](https://ai.googleblog.com/2017/04/federated-learning-collaborative.html).
|
||||||
|
|
||||||
During investigated this problem, we found that let end user train their own data is safe, but sacrifice efficiency.
|
While investigating this problem, we found that letting end users train on their own data is safe but sacrifices efficiency.
|
||||||
Since one end device has limited resources, training time and power consumption can be disappointing.
|
Since one end device has limited resources, training time and power consumption can be disappointing.
|
||||||
We believe there must have a leverage between privacy and efficiency in some target scenarios.
|
We believe there must be a balance between privacy and efficiency in some target scenarios.
|
||||||
|
|
||||||
Fortunately, we observed that users who belongs to the same campus, plant, firm and community always share similar interests.
|
Fortunately, we observed that users who belong to the same campus, plant, firm, and community always share similar interests.
|
||||||
Therefore, these co-located users have similar demands in using AI-involved routines.
|
Therefore, these co-located users have similar demands in using AI-involved routines.
|
||||||
Also, co-located users are easily targeted by same type of threats, such as ransomware to financial practitioners.
|
Also, co-located users are easily targeted by the same type of threats, such as ransomware aimed at financial practitioners.
|
||||||
|
|
||||||
Think about this, sending features of a new malware app to cloud services in order to train a neural networks used by antivirus program.
|
Think about sending the features of a new malware app to cloud services to train the neural networks used by antivirus programs.
|
||||||
This process may takes long time and small amount of samples may not be recognized by the global neural networks model.
|
This process may take a long time, and a small number of samples may not be recognized by the global neural network model.
|
||||||
With a customized local model trained and deployed on the edge can successfully counter the problem.
|
A customized local model trained and deployed on the edge can successfully counter the problem.
|
||||||
With edge training as a supplement of cloud training can achieve better response time and let the whole system more flexible.
|
Edge training as a supplement to cloud training can achieve better response time and make the whole system more flexible.
|
||||||
|
|
||||||
## Why training on edge is hard?
|
## Why training on edge is hard?
|
||||||
|
|
||||||
Since all co-located users' device can be used for an edge training, issues and challenges occur as deploying this distributed system.
|
Since all co-located users' devices can be used for edge training, issues and challenges arise when deploying this distributed system.
|
||||||
|
|
||||||
The first challenge is **struggling workers**.
|
The first challenge is **struggling workers**.
|
||||||
Training devices are heterogeneity, from limited IoT camera to high-end media center with powerful GPU.
|
Training devices are heterogeneous, from limited IoT cameras to high-end media centers with powerful GPUs.
|
||||||
They are not designed to do machine learnings.
|
They are not designed to do machine learning.
|
||||||
So, a good edge-based distributed learning framework must can handle variety speeds in training tasks.
|
So, a good edge-based distributed learning framework must be able to handle a variety of speeds in training tasks.
|
||||||
|
|
||||||
The second challenge is how to **scale up** clusters.
|
The second challenge is how to **scale up** clusters.
|
||||||
In a campus, thousands and more devices may contribute computing resources to the same training tasks.
|
On a campus, thousands and more devices may contribute computing resources to the same training tasks.
|
||||||
However, these devices may located in far not matter in physical or in network topology.
|
However, these devices may be far apart, whether physically or in the network topology.
|
||||||
How can we well use them well, without struggled with endless transmission time remains a challenge.
|
How to use them well without struggling with endless transmission time remains a challenge.
|
||||||
|
|
||||||
The third issue is the frequent **joining and exiting** of devices.
|
The third issue is the frequent **joining and exiting** of devices.
|
||||||
We can't rely on each devices to faithfully working on training tasks rather than their original workload.
|
We can't rely on each device to faithfully work on training tasks rather than their original workload.
|
||||||
Smartly balancing the workload and handling join/exit issues also need to be considered.
|
Smartly balancing the workload and handling join/exit issues also need to be considered.
|
||||||
|
|
||||||
## Our proposal
|
## Our proposal
|
||||||
|
|
||||||
- Dynamic training data distribution and runtime profiler
|
- Dynamic training data distribution and runtime profiler
|
||||||
|
|
||||||
We design a dynamic training data distribution mechanism that helps to both the first and the third challenges.
|
We designed a dynamic training data distribution mechanism that helps with both the first and the third challenges.
|
||||||
Preprocessing data can be transmitted without leakage of raw sensitive information.
|
Preprocessed data can be transmitted without leaking raw, sensitive information.
|
||||||
This can helps with struggling workers who can train small batches in order to upload parameters with a similar training time.
|
This helps struggling workers, which can train on smaller batches so that they upload parameters after a similar training time.
|
||||||
Also, for extremely slow devices, join and exit of devices cases, dynamic data distribution and profiler can helps with keep global training parameters from polluted and staleness.
|
Also, for extremely slow devices and for devices that join and exit, dynamic data distribution and the profiler help keep the global training parameters from being polluted or going stale.
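
As a rough illustration of how a runtime profile could drive batch sizes, the sketch below gives each worker a batch it can finish in about the same wall-clock time. The function name `batch_sizes`, the `samples_per_sec` profile, and the device names are hypothetical and only meant to show the idea; this is not the EDDL implementation.

```python
def batch_sizes(samples_per_sec: dict, round_seconds: float) -> dict:
    """Give each worker a batch it can finish in roughly round_seconds.

    samples_per_sec maps a worker name to its profiled training speed
    (an assumed measurement, not something EDDL necessarily records).
    """
    if round_seconds <= 0:
        raise ValueError("round_seconds must be positive")
    return {
        worker: max(1, int(speed * round_seconds))  # never hand out an empty batch
        for worker, speed in samples_per_sec.items()
    }


# Made-up profile: a slow IoT camera and a fast media center.
profile = {"iot-camera": 5.0, "media-center": 80.0}
print(batch_sizes(profile, round_seconds=2.0))
```

With this kind of sizing, a slow device trains fewer samples per round instead of holding everyone else back.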
|
||||||
|
|
||||||
To counter heterogeneity's, more approaches were applied in our later research.
|
To counter heterogeneity, more approaches were applied in our later research.
|
||||||
More details were introduced to runtime profiler in the later works.
|
More details were introduced to the runtime profiler in the later works.
|
||||||
|
|
||||||
- Asynchronous and synchronous aggregation enabled
|
- Asynchronous and synchronous aggregation enabled
|
||||||
|
|
||||||
In our findings, asynchronous and synchronous parameter updates each have their pros and cons.
|
In our findings, asynchronous and synchronous parameter updates each have their pros and cons.
|
||||||
Keeping sync all the time leads struggling worker issue unsolvable.
|
Staying in sync all the time leaves the struggling-worker issue unsolvable.
|
||||||
However, async's harm to accuracy and convergence time also need attentions.
|
However, async's harm to accuracy and convergence time also needs attention.
|
||||||
To carefully chose between these two update policies at the runtime is what we proposed to make use of their own advantages.
|
We propose carefully choosing between these two update policies at runtime to make use of the advantages of each.
|
||||||
|
|
||||||
- Leader role splitting
|
- Leader role splitting
|
||||||
|
|
||||||
The idea is to let worker devices with higher bandwidth taking leader role during training.
|
The idea is to let worker devices with higher bandwidth take the leader role during training.
|
||||||
Parameter updating does not require much computation but only need bandwidth.
|
Parameter updating does not require much computation; it only needs a great deal of bandwidth.
|
||||||
Devices with sufficient bandwidth can also work as virtual leader devices.
|
Devices with sufficient bandwidth can also work as virtual leader devices.
|
||||||
This approach helps with minimize physical devices we used and more leaders can further scale up workers limits.
|
This approach helps minimize the number of physical devices we use, and more leaders can further raise the limit on the number of workers.
|
||||||
|
|||||||
@@ -0,0 +1,109 @@
|
|||||||
|
---
|
||||||
|
title: "EDDL: How do we train neural networks on limited edge devices - PART 2"
|
||||||
|
date: 2021-10-31 13:01:14 -0400
|
||||||
|
tags: Research
|
||||||
|
author: Pengzhan Hao
|
||||||
|
cover: '/static/2021-10/f.5_Impl_leader_worker.png'
|
||||||
|
mathjax: true
|
||||||
|
---
|
||||||
|
|
||||||
|
In the last post, part 1, I gave a general overview of our idea of distributed learning in an edge environment.
|
||||||
|
I introduced the reason why edge distributed learning is needed and what improvements it can achieve.
|
||||||
|
In this post, I will talk about our motivation study and how our framework works.
|
||||||
|
|
||||||
|
## How does data support us training on edge?
|
||||||
|
|
||||||
|
Before designing and implementing our framework, we first needed to confirm that training on resource-limited edge devices is worthwhile.
|
||||||
|
We used a malware detection neural network to show why a small, customized neural network can be better.
|
||||||
|
|
||||||
|
We collected features from 32,000+ mobile apps as the global dataset.
|
||||||
|
With these data records, we trained a multilayer perceptron called "PerNet" to determine whether a given set of features belongs to a benign app or a malware app.
|
||||||
|
We called this **detection**.
|
||||||
|
PerNet can also classify malware apps into different types of attacks.
|
||||||
|
We called this **classification**.
|
||||||
|
The global model achieves a recall above 93% and an accuracy above 96.93%.
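
For reference, the two metrics quoted above can be computed from a binary confusion matrix as sketched below. The counts in the example are made up purely for illustration and are not taken from our experiments.

```python
def recall(tp: int, fn: int) -> float:
    """Fraction of actual malware apps that were flagged as malware."""
    return tp / (tp + fn)


def accuracy(tp: int, tn: int, fp: int, fn: int) -> float:
    """Fraction of all apps (benign and malware) that were labeled correctly."""
    return (tp + tn) / (tp + tn + fp + fn)


# Hypothetical counts, for illustration only.
tp, tn, fp, fn = 90, 95, 5, 10
print(f"recall={recall(tp, fn):.2f}, accuracy={accuracy(tp, tn, fp, fn):.3f}")
```

Recall matters most for detection, since a missed malware app (a false negative) is usually more costly than a false alarm.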
|
||||||
|
|
||||||
|
From this data, we selected two community app-usage sub-datasets for generating local models.
|
||||||
|
|
||||||
|
- Large categories (Scenario 1)
|
||||||
|
We chose the 5 largest categories of apps (Entertainment, Tools, Brain & Puzzle, Lifestyle, and Education) as well as the 5 largest malware categories.
|
||||||
|
Altogether, 12,000+ apps were included in this sub-dataset, split almost evenly between benign and malware.
|
||||||
|
|
||||||
|
- Campus-community categories (Scenario 2)
|
||||||
|
We chose the 5 categories most downloaded by college students as the benign group, together with a similar number of apps from 5 malware categories.
|
||||||
|
To ensure that malware apps also appear in these 5 benign categories, we also considered synthesizing additional malware apps within the 5 most downloaded (benign) categories.
|
||||||
|
|
||||||
|
With these two types of sub-datasets, we used the same PerNet to generate multiple local models.
|
||||||
|
In each scenario's experiment, we compared the global and local models on a held-out test dataset.
|
||||||
|
For classification, the local model beat the global model in every scenario.
|
||||||
|
For detection, the local model matched the accuracy of the global model.
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
In summary, local models are trained for specific scenarios.
|
||||||
|
In those same scenarios, a global model achieves no better accuracy than the local models.
|
||||||
|
The local models might perform better because of overfitting.
|
||||||
|
I believe the machine learning community has considered this issue as well, which is why it introduced [transfer learning](https://en.wikipedia.org/wiki/Transfer_learning),
|
||||||
|
a technique that adapts a global model to a specific scenario by performing more training on the global model once it is shipped to the local site.
|
||||||
|
|
||||||
|
## Design and Implementation
|
||||||
|
|
||||||
|
### Overall design
|
||||||
|
|
||||||
|
The basic EDDL distributed training setup consists of 3 parts.
|
||||||
|
**EDDL training cluster**, a device cluster that consists of edge or mobile devices that are participating in training.
|
||||||
|
**EDDL manager**, the initial driver program that collects training data, relays data to training devices, and initializes training clusters.
|
||||||
|
**Training data entry (TDE)**, a data storage for all training data.
|
||||||
|
|
||||||
|
### Dynamic training data distribution
|
||||||
|
|
||||||
|
Existing distributed DNN training solutions usually statically partition training data among workers.
|
||||||
|
This becomes a problem when training nodes join and exit.
|
||||||
|
We designed our framework so that it can dynamically distribute training data during learning.
|
||||||
|
Before every training batch starts, a batch of data from the TDE is sent to the devices.
|
||||||
|
|
||||||
|
In our experiments, we found that applying this design shortened the overall training time.
|
||||||
|
Especially with a large number of devices, training time with this optimization can be 50% less than with static partitioning.
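
The per-round handout described above can be sketched as follows. This is a minimal illustration, not the EDDL code: the `TrainingDataEntry` class, the `distribute` function, and the worker names are hypothetical, and joins and exits are modeled simply as a different list of active workers in each round.

```python
from collections import deque


class TrainingDataEntry:
    """Hypothetical stand-in for the TDE: a queue of not-yet-trained samples."""

    def __init__(self, samples):
        self._pending = deque(samples)

    def has_data(self) -> bool:
        return bool(self._pending)

    def next_batch(self, batch_size: int) -> list:
        """Pop up to batch_size samples for one worker."""
        n = min(batch_size, len(self._pending))
        return [self._pending.popleft() for _ in range(n)]


def distribute(tde, workers_per_round, batch_size):
    """Hand out one batch per active worker before every training round.

    workers_per_round lists the active workers for each round; the last
    entry is reused once the list runs out.
    """
    assignments = []  # (round number, worker, batch)
    round_no = 0
    while tde.has_data():
        active = workers_per_round[min(round_no, len(workers_per_round) - 1)]
        if not active:
            raise ValueError("no active workers left to train on the data")
        for worker in active:
            batch = tde.next_batch(batch_size)
            if batch:
                assignments.append((round_no, worker, batch))
        round_no += 1
    return assignments


# Worker "c" exits after the first round; no data is stranded on it.
plan = distribute(TrainingDataEntry(range(10)), [["a", "b", "c"], ["a", "b"]], 2)
for round_no, worker, batch in plan:
    print(round_no, worker, batch)
```

With a static partition, the samples assigned to "c" would have to be redistributed after it leaves; here they were never assigned in the first place.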
|
||||||
|
|
||||||
|
### Scaling up cluster size
|
||||||
|
|
||||||
|
Our framework was designed to have both sync and async parameter aggregation.
|
||||||
|
Asynchronous aggregation allows a high throughput of training batches, but at the cost of convergence time.
|
||||||
|
Synchronous aggregation converges in fewer epochs; however, it can't ensure performance when there is a struggling worker.
|
||||||
|
|
||||||
|
As shown in our experiments, we chose sync as the default because convergence time dominates the overall training time.
|
||||||
|
However, we also considered the possibility that async with more workers can achieve a similar overall training time.
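
To make the difference between the two policies concrete, here is a minimal sketch on a plain list of parameters. It assumes simple gradient averaging with a fixed learning rate, which this post does not specify; `sync_step`, `async_step`, and the gradient values are illustrative only.

```python
def sync_step(params, worker_grads, lr):
    """Synchronous aggregation: wait for every worker, average, update once."""
    n = len(worker_grads)
    avg = [sum(g[i] for g in worker_grads) / n for i in range(len(params))]
    return [p - lr * a for p, a in zip(params, avg)]


def async_step(params, grad, lr):
    """Asynchronous aggregation: apply one worker's gradient as soon as it arrives."""
    return [p - lr * g for p, g in zip(params, grad)]


params = [1.0, 2.0]
grads = [[0.5, 1.0], [1.5, 3.0]]  # made-up gradients from two workers

print("sync: ", sync_step(params, grads, lr=0.5))

p = params
for g in grads:  # arrival order matters in the async case
    p = async_step(p, g, lr=0.5)
print("async:", p)
```

In a real async run the later gradients would be computed against parameters that are already out of date, which is where the extra convergence time comes from. The sync version never sees stale gradients, but it can't finish a round until the slowest worker reports.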
|
||||||
|
|
||||||
|
We introduced a formula to determine whether adding more training nodes can help or not.
|
||||||
|
Here we define the bandwidth usage coefficient (BUC) as
|
||||||
|
|
||||||
|
$$ BUC = \dfrac{n}{T_{sync}} $$
|
||||||
|
|
||||||
|
In this formula, $$n$$ is the number of devices, and $$T_{sync}$$ is the transmission time of parameters.
|
||||||
|
As the number of workers grows, $$n$$ increases linearly but the transmission time does not.
|
||||||
|
When $$BUC$$ increases, the cluster can speed up training time by adding workers.
|
||||||
|
Otherwise, adding more workers won't help with overall training time.
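
The check can be sketched in a few lines. The helper names and the measurements below are hypothetical; only the formula $$BUC = n / T_{sync}$$ comes from the text above.

```python
def buc(n_workers: int, t_sync: float) -> float:
    """Bandwidth usage coefficient: BUC = n / T_sync."""
    if t_sync <= 0:
        raise ValueError("t_sync must be positive")
    return n_workers / t_sync


def adding_workers_helps(n_now, t_sync_now, n_next, t_sync_next) -> bool:
    """True if BUC grows when going from n_now to n_next workers."""
    return buc(n_next, t_sync_next) > buc(n_now, t_sync_now)


# Hypothetical measurements: (number of workers, parameter transmission time in seconds).
measurements = [(4, 2.0), (8, 3.0), (16, 10.0)]
for (n0, t0), (n1, t1) in zip(measurements, measurements[1:]):
    print(f"{n0} -> {n1} workers: helps={adding_workers_helps(n0, t0, n1, t1)}")
```

In this made-up example, going from 4 to 8 workers raises BUC (2.0 to about 2.67), while going from 8 to 16 lowers it to 1.6 because the transmission time grows faster than the worker count.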
|
||||||
|
|
||||||
|
### Adaptive leader role splitting
|
||||||
|
|
||||||
|
The idea of role splitting is simple: a device can work as both a worker and a leader.
|
||||||
|
The advantage is straightforward: we need one fewer parameter transfer, so training time is shortened.
|
||||||
|
|
||||||
|
However, in our current setting it doesn't help much, since a cluster has only 1 leader role.
|
||||||
|
We expect to benefit from this in our future work.
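
One way to read the saving is to count parameter uploads per round. The sketch below assumes a single leader and one upload per worker per round, and it assumes a worker that also holds the leader role hands its parameters over locally; the function name is hypothetical and the model is deliberately simplified.

```python
def uploads_per_round(n_workers: int, leader_is_worker: bool) -> int:
    """Count parameter uploads that cross the network in one round.

    Assumes one leader and one upload per worker. A worker that is also
    the leader hands over its parameters locally, so it costs no upload.
    """
    if n_workers < 1:
        raise ValueError("need at least one worker")
    return n_workers - 1 if leader_is_worker else n_workers


for n in (2, 4, 8):
    print(n, uploads_per_round(n, False), uploads_per_round(n, True))
```

The saving is a constant one upload per round per leader, which is why it matters little with a single leader and more once several leaders share the role.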
|
||||||
|
|
||||||
|
### Overall architecture
|
||||||
|
|
||||||
|

|
||||||
|
|
||||||
|
Details were given in the image.
|
||||||
|
|
||||||
|
### Prototype hardware and software
|
||||||
|
|
||||||
|
EDDL was designed to run on two embedded single-board computer platforms.
|
||||||
|
One such platform is [ODROID-XU4](https://www.hardkernel.com/shop/odroid-xu4-special-price/), which is equipped with a 2.1/1.4 GHz 32-bit ARM processor and 2GB memory.
|
||||||
|
The other platform is the [Raspberry Pi 3 Model B board](https://www.raspberrypi.com/products/raspberry-pi-3-model-b/), which comes with an ARM 1.2 GHz 64-bit quad-core processor and 1GB memory.
|
||||||
|
|
||||||
|
The operating system running on the above platforms is Ubuntu 18.04 with Linux kernel 4.14.
|
||||||
|
We used [Dlib](http://dlib.net/), a C++ library that provides implementations for a wide range of machine learning algorithms.
|
||||||
|
We chose the Dlib library because it is written in C/C++, and can be easily and natively used in embedded devices.
|
||||||
+5
-3
@@ -428,7 +428,7 @@ c13 9 26 20 30 26 7 11 -9 26 -27 26 -5 0 -3 -5 5 -10 9 -6 10 -10 3 -10 -24
|
|||||||
<ul class="menu">
|
<ul class="menu">
|
||||||
<li>
|
<li>
|
||||||
<button type="button" class="button button--secondary button--pill tag-button tag-button--all" data-encode="">
|
<button type="button" class="button button--secondary button--pill tag-button tag-button--all" data-encode="">
|
||||||
Show All<div class="tag-button__count">6</div>
|
Show All<div class="tag-button__count">7</div>
|
||||||
</button>
|
</button>
|
||||||
</li><li><button type="button" class="button button--pill tag-button tag-button-1" data-encode="Network">
|
</li><li><button type="button" class="button button--pill tag-button tag-button-1" data-encode="Network">
|
||||||
<span>Network</span><div class="tag-button__count">1</div>
|
<span>Network</span><div class="tag-button__count">1</div>
|
||||||
@@ -436,8 +436,8 @@ c13 9 26 20 30 26 7 11 -9 26 -27 26 -5 0 -3 -5 5 -10 9 -6 10 -10 3 -10 -24
|
|||||||
</li><li><button type="button" class="button button--pill tag-button tag-button-1" data-encode="Nonsense">
|
</li><li><button type="button" class="button button--pill tag-button tag-button-1" data-encode="Nonsense">
|
||||||
<span>Nonsense</span><div class="tag-button__count">1</div>
|
<span>Nonsense</span><div class="tag-button__count">1</div>
|
||||||
</button>
|
</button>
|
||||||
</li><li><button type="button" class="button button--pill tag-button tag-button-2" data-encode="Research">
|
</li><li><button type="button" class="button button--pill tag-button tag-button-3" data-encode="Research">
|
||||||
<span>Research</span><div class="tag-button__count">2</div>
|
<span>Research</span><div class="tag-button__count">3</div>
|
||||||
</button>
|
</button>
|
||||||
</li><li><button type="button" class="button button--pill tag-button tag-button-1" data-encode="visualization">
|
</li><li><button type="button" class="button button--pill tag-button tag-button-1" data-encode="visualization">
|
||||||
<span>visualization</span><div class="tag-button__count">1</div>
|
<span>visualization</span><div class="tag-button__count">1</div>
|
||||||
@@ -448,6 +448,8 @@ c13 9 26 20 30 26 7 11 -9 26 -27 26 -5 0 -3 -5 5 -10 9 -6 10 -10 3 -10 -24
|
|||||||
</li></ul>
|
</li></ul>
|
||||||
</div>
|
</div>
|
||||||
<div class="js-result layout--archive__result d-none"><div class="article-list items"><section><h2 class="article-list__group-header">2021</h2><ul class="items"><li class="item" itemscope itemtype="http://schema.org/BlogPosting" data-tags="Research">
|
<div class="js-result layout--archive__result d-none"><div class="article-list items"><section><h2 class="article-list__group-header">2021</h2><ul class="items"><li class="item" itemscope itemtype="http://schema.org/BlogPosting" data-tags="Research">
|
||||||
|
<div class="item__content"><span class="item__meta">Oct 31</span><a itemprop="headline" class="item__header" href="/posts/eddl-how-do-we-train-on-limited-edge-devices-part2">EDDL: How do we train neural networks on limited edge devices - PART 2</a></div>
|
||||||
|
</li><li class="item" itemscope itemtype="http://schema.org/BlogPosting" data-tags="Research">
|
||||||
<div class="item__content"><span class="item__meta">Oct 13</span><a itemprop="headline" class="item__header" href="/posts/eddl-how-do-we-train-on-limited-edge-devices">EDDL: How do we train neural networks on limited edge devices - PART 1</a></div>
|
<div class="item__content"><span class="item__meta">Oct 13</span><a itemprop="headline" class="item__header" href="/posts/eddl-how-do-we-train-on-limited-edge-devices">EDDL: How do we train neural networks on limited edge devices - PART 1</a></div>
|
||||||
</li></ul></section><section><h2 class="article-list__group-header">2020</h2><ul class="items"><li class="item" itemscope itemtype="http://schema.org/BlogPosting" data-tags="visualization">
|
</li></ul></section><section><h2 class="article-list__group-header">2020</h2><ul class="items"><li class="item" itemscope itemtype="http://schema.org/BlogPosting" data-tags="visualization">
|
||||||
<div class="item__content"><span class="item__meta">Sep 15</span><a itemprop="headline" class="item__header" href="/posts/generate-word-cloud-with-chinese-fenci">Generate Word Cloud Figures with Chinese-Tokenization and WordCloud python libraries</a></div>
|
<div class="item__content"><span class="item__meta">Sep 15</span><a itemprop="headline" class="item__header" href="/posts/generate-word-cloud-with-chinese-fenci">Generate Word Cloud Figures with Chinese-Tokenization and WordCloud python libraries</a></div>
|
||||||
|
|||||||
@@ -1 +1 @@
|
|||||||
window.TEXT_SEARCH_DATA={'posts':[{'title':"Stop Talking is the worst title of one blog",'url':"/posts/welcome-to-my-blog"},{'title':"Using charles proxy to monitor mobile SSL traffics",'url':"/posts/charles-is-not-a-good-tool"},{'title':"Some of my previews experiment works: 2016",'url':"/posts/some-of-my-previews-exper-work"},{'title':"Xv6 introduction",'url':"/posts/intro-xv6"},{'title':"Generate Word Cloud Figures with Chinese-Tokenization and WordCloud python libraries",'url':"/posts/generate-word-cloud-with-chinese-fenci"},{'title':"EDDL: How do we train neural networks on limited edge devices - PART 1",'url':"/posts/eddl-how-do-we-train-on-limited-edge-devices"}]};
|
window.TEXT_SEARCH_DATA={'posts':[{'title':"Stop Talking is the worst title of one blog",'url':"/posts/welcome-to-my-blog"},{'title':"Using charles proxy to monitor mobile SSL traffics",'url':"/posts/charles-is-not-a-good-tool"},{'title':"Some of my previews experiment works: 2016",'url':"/posts/some-of-my-previews-exper-work"},{'title':"Xv6 introduction",'url':"/posts/intro-xv6"},{'title':"Generate Word Cloud Figures with Chinese-Tokenization and WordCloud python libraries",'url':"/posts/generate-word-cloud-with-chinese-fenci"},{'title':"EDDL: How do we train neural networks on limited edge devices - PART 1",'url':"/posts/eddl-how-do-we-train-on-limited-edge-devices"},{'title':"EDDL: How do we train neural networks on limited edge devices - PART 2",'url':"/posts/eddl-how-do-we-train-on-limited-edge-devices-part2"}]};
|
||||||
|
|||||||
+139
-35
File diff suppressed because one or more lines are too long
+16
-2
@@ -424,7 +424,21 @@ c13 9 26 20 30 26 7 11 -9 26 -27 26 -5 0 -3 -5 5 -10 9 -6 10 -10 3 -10 -24
|
|||||||
<div class="col-main cell cell--auto"><!-- start custom main top snippet -->
|
<div class="col-main cell cell--auto"><!-- start custom main top snippet -->
|
||||||
|
|
||||||
<!-- end custom main top snippet -->
|
<!-- end custom main top snippet -->
|
||||||
<article itemscope itemtype="http://schema.org/WebPage"><header style="display:none;"><h1>Home</h1></header><meta itemprop="headline" content="Home"><meta itemprop="author" content="Pengzhan Hao"/><div class="js-article-content"><div class="layout--articles"><div class="article-list items items--divided"><article class="item" itemscope itemtype="http://schema.org/BlogPosting"><div class="item__image" style="vertical-align: middle"><img class="image" src="/static/2021-10/edgelearn-1.png" /></div><div class="item__content">
|
<article itemscope itemtype="http://schema.org/WebPage"><header style="display:none;"><h1>Home</h1></header><meta itemprop="headline" content="Home"><meta itemprop="author" content="Pengzhan Hao"/><div class="js-article-content"><div class="layout--articles"><div class="article-list items items--divided"><article class="item" itemscope itemtype="http://schema.org/BlogPosting"><div class="item__image" style="vertical-align: middle"><img class="image" src="/static/2021-10/f.5_Impl_leader_worker.png" /></div><div class="item__content">
|
||||||
|
<header><a href="/posts/eddl-how-do-we-train-on-limited-edge-devices-part2"><h2 itemprop="headline" class="item__header">EDDL: How do we train neural networks on limited edge devices - PART 2</h2></a></header>
|
||||||
|
<div class="item__description"><div class="article__content" itemprop="description articleBody">In the last post, part 1, I gave a general overview of our idea of distributed learning in an edge environment.
|
||||||
|
I introduced the reason why edge distributed learning is needed and what improvements it can achieve.
|
||||||
|
In this post, I will talk about our motivation study and how our framework works.
|
||||||
|
|
||||||
|
How does data support us training on edge?
|
||||||
|
|
||||||
|
Before designin...</div><p><a href="/posts/eddl-how-do-we-train-on-limited-edge-devices-part2">Read more</a></p></div><div class="article__info clearfix"><ul class="left-col menu"><li>
|
||||||
|
<a class="button button--secondary button--pill button--sm"
|
||||||
|
href="/archive.html?tag=Research">Research</a>
|
||||||
|
</li></ul><ul class="right-col menu"><li><i class="fas fa-user"></i> <span>Pengzhan Hao</span></li><li><i class="far fa-calendar-alt"></i> <span>Oct 31, 2021</span>
|
||||||
|
</li></ul></div><meta itemprop="author" content="Pengzhan Hao"/><meta itemprop="datePublished" content="2021-10-31T13:01:14-04:00">
|
||||||
|
<meta itemprop="keywords" content="Research"></div>
|
||||||
|
</article><article class="item" itemscope itemtype="http://schema.org/BlogPosting"><div class="item__image" style="vertical-align: middle"><img class="image" src="/static/2021-10/edgelearn-1.png" /></div><div class="item__content">
|
||||||
<header><a href="/posts/eddl-how-do-we-train-on-limited-edge-devices"><h2 itemprop="headline" class="item__header">EDDL: How do we train neural networks on limited edge devices - PART 1</h2></a></header>
|
<header><a href="/posts/eddl-how-do-we-train-on-limited-edge-devices"><h2 itemprop="headline" class="item__header">EDDL: How do we train neural networks on limited edge devices - PART 1</h2></a></header>
|
||||||
<div class="item__description"><div class="article__content" itemprop="description articleBody">This post introduces our previous milestone in project “Edge trainer”, as the paper “EDDL: A Distributed Deep Learning System for Resource-limited Edge Computing Environment.” was published.
|
<div class="item__description"><div class="article__content" itemprop="description articleBody">This post introduces our previous milestone in project “Edge trainer”, as the paper “EDDL: A Distributed Deep Learning System for Resource-limited Edge Computing Environment.” was published.
|
||||||
As the first part of the introductions, I focus only on the motivation and summary of our works.
|
As the first part of the introductions, I focus only on the motivation and summary of our works.
|
||||||
@@ -444,7 +458,7 @@ If your written language is based on latin alphabet(or other language has space
|
|||||||
</li></ul><ul class="right-col menu"><li><i class="fas fa-user"></i> <span>Pengzhan Hao</span></li><li><i class="far fa-calendar-alt"></i> <span>Sep 15, 2020</span>
|
</li></ul><ul class="right-col menu"><li><i class="fas fa-user"></i> <span>Pengzhan Hao</span></li><li><i class="far fa-calendar-alt"></i> <span>Sep 15, 2020</span>
|
||||||
</li></ul></div><meta itemprop="author" content="Pengzhan Hao"/><meta itemprop="datePublished" content="2020-09-15T22:00:14-04:00">
|
</li></ul></div><meta itemprop="author" content="Pengzhan Hao"/><meta itemprop="datePublished" content="2020-09-15T22:00:14-04:00">
|
||||||
<meta itemprop="keywords" content="visualization"></div>
|
<meta itemprop="keywords" content="visualization"></div>
|
||||||
</article><article class="item" itemscope itemtype="http://schema.org/BlogPosting"><div class="item__content">
|
</article><article class="item" itemscope itemtype="http://schema.org/BlogPosting"><div class="item__image" style="vertical-align: middle"><img class="image" src="/static/2021-10/Xv6_LS_Command_Output.png" /></div><div class="item__content">
|
||||||
<header><a href="/posts/intro-xv6"><h2 itemprop="headline" class="item__header">Xv6 introduction</h2></a></header>
|
<header><a href="/posts/intro-xv6"><h2 itemprop="headline" class="item__header">Xv6 introduction</h2></a></header>
|
||||||
<div class="item__description"><div class="article__content" itemprop="description articleBody">In this post, you will learn a few basic concepts of xv6. The learning path will be closely coupled to the first project assignment I gave when I assisted in teaching OS classes.
|
<div class="item__description"><div class="article__content" itemprop="description articleBody">In this post, you will learn a few basic concepts of xv6. The learning path will be closely coupled to the first project assignment I gave when I assisted in teaching OS classes.
|
||||||
Understanding system calls and knowing how to implement a simple one will be covered in the first half.
|
Understanding system calls and knowing how to implement a simple one will be covered in the first half.
|
||||||
|
|||||||
File diff suppressed because it is too large
@@ -441,42 +441,42 @@ More details in design and implementation can be found in late posts.</p>
|
|||||||
|
|
||||||
<h2 id="why-do-we-need-training-on-edge">Why do we need training on edge?</h2>
|
<h2 id="why-do-we-need-training-on-edge">Why do we need training on edge?</h2>
|
||||||
|
|
||||||
<p>Cloud is not trustworthy anymore. More and more facts supports that breach on cloud happens frequently than before.
|
<p>The cloud is no longer trustworthy. Growing evidence shows that cloud breaches happen more frequently than before.
|
||||||
Nowadays, with more generated personal sensitive data has been uploaded to the cloud center, tech company know better to someones than user themselves.</p>
|
Nowadays, as more sensitive personal data is uploaded to cloud data centers, tech companies know users better than the users know themselves.</p>
|
||||||
|
|
||||||
<p>Researchers, no matter in industry on academia, are working in a way that still learning from users’ data but also keeping raw sensitive data under users’ control.
|
<p>Researchers, whether in industry or academia, are working on ways to keep learning from users’ data while keeping raw sensitive data under users’ control.
|
||||||
Many publications already showed feasibility of only sharing after-trained model instead of raw data.
|
Many publications have already shown the feasibility of sharing only the trained model instead of raw data.
|
||||||
One recent popular study on this is google’s <a href="https://ai.googleblog.com/2017/04/federated-learning-collaborative.html">federated learning</a>.</p>
|
One recent popular study on this is google’s <a href="https://ai.googleblog.com/2017/04/federated-learning-collaborative.html">federated learning</a>.</p>
|
||||||
|
|
||||||
<p>During investigated this problem, we found that let end user train their own data is safe, but sacrifice efficiency.
|
<p>While investigating this problem, we found that letting end users train on their own data is safe but sacrifices efficiency.
|
||||||
Since one end device has limited resources, training time and power consumption can be disappointing.
|
Since one end device has limited resources, training time and power consumption can be disappointing.
|
||||||
We believe there must have a leverage between privacy and efficiency in some target scenarios.</p>
|
We believe there must be a balance between privacy and efficiency in some target scenarios.</p>
|
||||||
|
|
||||||
<p>Fortunately, we observed that users who belongs to the same campus, plant, firm and community always share similar interests.
|
<p>Fortunately, we observed that users who belong to the same campus, plant, firm, and community always share similar interests.
|
||||||
Therefore, these co-located users have similar demands in using AI-involved routines.
|
Therefore, these co-located users have similar demands in using AI-involved routines.
|
||||||
Also, co-located users are easily targeted by same type of threats, such as ransomware to financial practitioners.</p>
|
Also, co-located users are easily targeted by the same type of threats, such as ransomware aimed at financial practitioners.</p>
|
||||||
|
|
||||||
<p>Think about this, sending features of a new malware app to cloud services in order to train a neural networks used by antivirus program.
|
<p>Consider sending features of a new malware app to cloud services to train the neural networks used by antivirus programs.
|
||||||
This process may takes long time and small amount of samples may not be recognized by the global neural networks model.
|
This process may take a long time, and a small number of samples may not be recognized by the global neural network model.
|
||||||
With a customized local model trained and deployed on the edge can successfully counter the problem.
|
A customized local model trained and deployed on the edge can successfully counter the problem.
|
||||||
With edge training as a supplement of cloud training can achieve better response time and let the whole system more flexible.</p>
|
Using edge training as a supplement to cloud training can achieve better response time and make the whole system more flexible.</p>
|
||||||
|
|
||||||
<h2 id="why-training-on-edge-is-hard">Why training on edge is hard?</h2>
|
<h2 id="why-training-on-edge-is-hard">Why training on edge is hard?</h2>
|
||||||
|
|
||||||
<p>Since all co-located users’ device can be used for an edge training, issues and challenges occur as deploying this distributed system.</p>
|
<p>Since all co-located users’ devices can be used for edge training, issues and challenges arise when deploying this distributed system.</p>
|
||||||
|
|
||||||
<p>The first challenge is <strong>struggling workers</strong>.
|
<p>The first challenge is <strong>struggling workers</strong>.
|
||||||
Training devices are heterogeneity, from limited IoT camera to high-end media center with powerful GPU.
|
Training devices are heterogeneous, from limited IoT cameras to high-end media centers with powerful GPUs.
|
||||||
They are not designed to do machine learnings.
|
They are not designed to do machine learning.
|
||||||
So, a good edge-based distributed learning framework must can handle variety speeds in training tasks.</p>
|
So, a good edge-based distributed learning framework must be able to handle a variety of speeds in training tasks.</p>
|
||||||
|
|
||||||
<p>The second challenge is how to <strong>scale up</strong> clusters.
|
<p>The second challenge is how to <strong>scale up</strong> clusters.
|
||||||
In a campus, thousands and more devices may contribute computing resources to the same training tasks.
|
On a campus, thousands and more devices may contribute computing resources to the same training tasks.
|
||||||
However, these devices may located in far not matter in physical or in network topology.
|
However, these devices may be far apart, whether physically or in the network topology.
|
||||||
How can we well use them well, without struggled with endless transmission time remains a challenge.</p>
|
How to use them well without being bogged down by endless transmission time remains a challenge.</p>
|
||||||
|
|
||||||
<p>The third issue is frequently <strong>joining and exiting</strong> of devices.
|
<p>The third issue is frequently <strong>joining and exiting</strong> of devices.
|
||||||
We can’t rely on each devices to faithfully working on training tasks rather than their original workload.
|
We can’t rely on each device to faithfully work on training tasks rather than their original workload.
|
||||||
Smartly schedule work balance and handle join/exit issues also need under consideration.</p>
|
Smartly schedule work balance and handle join/exit issues also need under consideration.</p>
|
||||||
|
|
||||||
<h2 id="our-proposal">Our proposal</h2>
|
<h2 id="our-proposal">Our proposal</h2>
|
||||||
@@ -485,29 +485,29 @@ Smartly schedule work balance and handle join/exit issues also need under consid
|
|||||||
<li>
|
<li>
|
||||||
<p>Dynamic training data distribution and runtime profiler</p>
|
<p>Dynamic training data distribution and runtime profiler</p>
|
||||||
|
|
||||||
<p>We design a dynamic training data distribution mechanism that helps to both the first and the third challenges.
|
<p>We design a dynamic training data distribution mechanism that helps with both the first and the third challenges.
|
||||||
Preprocessing data can be transmitted without leakage of raw sensitive information.
|
Preprocessed data can be transmitted without leaking raw, sensitive information.
|
||||||
This can helps with struggling workers who can train small batches in order to upload parameters with a similar training time.
|
This helps struggling workers, which can train on smaller batches so that they upload parameters after a training time similar to that of other workers.
|
||||||
Also, for extremely slow devices, join and exit of devices cases, dynamic data distribution and profiler can helps with keep global training parameters from polluted and staleness.</p>
|
Also, for extremely slow devices and for devices joining or exiting, dynamic data distribution and the profiler help keep the global training parameters from becoming polluted or stale.</p>
|
||||||
|
|
||||||
<p>To counter heterogeneity’s, more approaches were applied in our later research.
|
<p>To counter heterogeneity, more approaches were applied in our later research.
|
||||||
More details were introduced to runtime profiler in the later works.</p>
|
More details were introduced to the runtime profiler in the later works.</p>
|
||||||
</li>
|
</li>
|
||||||
<li>
|
<li>
|
||||||
<p>Asynchronous and synchronous aggregation enabled</p>
|
<p>Asynchronous and synchronous aggregation enabled</p>
|
||||||
|
|
||||||
<p>In our findings, asynchronous and synchronous parameter update have their pros and cons.
|
<p>In our findings, asynchronous and synchronous parameter update have their pros and cons.
|
||||||
Keeping sync all the time leads struggling worker issue unsolvable.
|
Staying synchronous all the time leaves the struggling-worker issue unsolvable.
|
||||||
However, async’s harm to accuracy and convergence time also need attentions.
|
However, async’s harm to accuracy and convergence time also needs attention.
|
||||||
To carefully chose between these two update policies at the runtime is what we proposed to make use of their own advantages.</p>
|
We propose carefully choosing between these two update policies at runtime to exploit the advantages of each.</p>
|
||||||
</li>
|
</li>
|
||||||
<li>
|
<li>
|
||||||
<p>Leader role splitting</p>
|
<p>Leader role splitting</p>
|
||||||
|
|
||||||
<p>The idea is to let worker devices with higher bandwidth taking leader role during training.
|
<p>The idea is to let worker devices with higher bandwidth take the leader role during training.
|
||||||
Parameter updating does not require much computation but only need bandwidth.
|
Parameter updating does not require much computation, but it needs a great deal of bandwidth.
|
||||||
Devices with sufficient bandwidth can also work as virtual leader devices.
|
Devices with sufficient bandwidth can also work as virtual leader devices.
|
||||||
This approach helps with minimize physical devices we used and more leaders can further scale up workers limits.</p>
|
This approach helps minimize the number of physical devices we use, and more leaders can further raise the limit on the number of workers.</p>
|
||||||
</li>
|
</li>
|
||||||
</ul>
|
</ul>
|
||||||
</div><section class="article__sharing d-print-none"></section><div class="d-print-none"><footer class="article__footer"><meta itemprop="dateModified" content="2021-10-13T16:53:20-04:00"><!-- start custom article footer snippet -->
|
</div><section class="article__sharing d-print-none"></section><div class="d-print-none"><footer class="article__footer"><meta itemprop="dateModified" content="2021-10-13T16:53:20-04:00"><!-- start custom article footer snippet -->
|
||||||
@@ -515,7 +515,7 @@ Smartly schedule work balance and handle join/exit issues also need under consid
|
|||||||
<!-- end custom article footer snippet -->
|
<!-- end custom article footer snippet -->
|
||||||
<div class="article__subscribe"><div class="subscribe"><i class="fas fa-rss"></i> <a type="application/rss+xml" href="/feed.xml">Subscribe</a></div>
|
<div class="article__subscribe"><div class="subscribe"><i class="fas fa-rss"></i> <a type="application/rss+xml" href="/feed.xml">Subscribe</a></div>
|
||||||
</div><div class="article__license"></div></footer>
|
</div><div class="article__license"></div></footer>
|
||||||
<div class="article__section-navigator clearfix"><div class="previous"><span>PREVIOUS</span><a href="/posts/generate-word-cloud-with-chinese-fenci">Generate Word Cloud Figures with Chinese-Tokenization and WordCloud python libraries</a></div></div></div>
|
<div class="article__section-navigator clearfix"><div class="previous"><span>PREVIOUS</span><a href="/posts/generate-word-cloud-with-chinese-fenci">Generate Word Cloud Figures with Chinese-Tokenization and WordCloud python libraries</a></div><div class="next"><span>NEXT</span><a href="/posts/eddl-how-do-we-train-on-limited-edge-devices-part2">EDDL: How do we train neural networks on limited edge devices - PART 2</a></div></div></div>
|
||||||
|
|
||||||
</div>
|
</div>
|
||||||
|
|
||||||
|
|||||||
@@ -25,6 +25,10 @@
|
|||||||
<lastmod>2021-10-13T16:53:20-04:00</lastmod>
|
<lastmod>2021-10-13T16:53:20-04:00</lastmod>
|
||||||
</url>
|
</url>
|
||||||
<url>
|
<url>
|
||||||
|
<loc>https://codersherlock.github.com//posts/eddl-how-do-we-train-on-limited-edge-devices-part2</loc>
|
||||||
|
<lastmod>2021-10-31T13:01:14-04:00</lastmod>
|
||||||
|
</url>
|
||||||
|
<url>
|
||||||
<loc>https://codersherlock.github.com//about.html</loc>
|
<loc>https://codersherlock.github.com//about.html</loc>
|
||||||
</url>
|
</url>
|
||||||
<url>
|
<url>
|
||||||
|
|||||||
Binary file not shown.
|
After Width: | Height: | Size: 8.8 KiB |
Binary file not shown.
|
After Width: | Height: | Size: 87 KiB |
Binary file not shown.
|
After Width: | Height: | Size: 37 KiB |
Executable
+3
@@ -0,0 +1,3 @@
|
|||||||
|
export PATH="$HOME/gems/bin:$PATH"
|
||||||
|
export GEM_HOME="$HOME/gems"
|
||||||
|
export JEKYLL_ENV=production
|
||||||
Binary file not shown.
|
After Width: | Height: | Size: 8.8 KiB |
Binary file not shown.
|
After Width: | Height: | Size: 87 KiB |
Binary file not shown.
|
After Width: | Height: | Size: 37 KiB |