<?xml version="1.0" encoding="UTF-8"?>
|
||
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
|
||
<channel>
|
||
<title>Stop Talking, Start Doing</title>
|
||
<description>My personal blog, with some boring research stuff and some tricks I fancy. I'll try my best to make this blog fun and useful, not just a place where I complain about everything that happens in my lab.
|
||
</description>
|
||
<link>https://codersherlock.github.com//</link>
|
||
<atom:link href="https://codersherlock.github.com//feed.xml" rel="self" type="application/rss+xml"/>
|
||
<pubDate>Wed, 13 Oct 2021 19:10:30 -0400</pubDate>
|
||
<lastBuildDate>Wed, 13 Oct 2021 19:10:30 -0400</lastBuildDate>
|
||
<generator>Jekyll v4.1.1</generator>
|
||
|
||
<item>
|
||
<title>EDDL: How do we train neural networks on limited edge devices - PART 1</title>
|
||
<description><p>This post introduces a milestone in our “Edge trainer” project, following the publication of the paper “EDDL: A Distributed Deep Learning System for Resource-limited Edge Computing Environment.”
As the first part of the introduction, I focus only on the motivation and a summary of our work.
More details on design and implementation can be found in later posts.</p>
|
||
|
||
<p><img src="/static/2021-10/edgelearn-1.png" height="250" /></p>
|
||
|
||
<h2 id="why-do-we-need-training-on-edge">Why do we need training on edge?</h2>
|
||
|
||
<p>The cloud is no longer as trustworthy as it once seemed. Growing evidence suggests that cloud breaches happen more frequently than before.
Now that more and more sensitive personal data is uploaded to cloud data centers, tech companies may know users better than the users know themselves.</p>
|
||
|
||
<p>Researchers, whether in industry or academia, are looking for ways to keep learning from users’ data while keeping the raw sensitive data under users’ control.
Many publications have already shown the feasibility of sharing only the trained model instead of the raw data.
One popular recent study on this is Google’s <a href="https://ai.googleblog.com/2017/04/federated-learning-collaborative.html">federated learning</a>.</p>
|
||
|
||
<p>While investigating this problem, we found that letting end users train on their own data is safe but sacrifices efficiency.
Since a single end device has limited resources, training time and power consumption can be disappointing.
We believe there must be a balance between privacy and efficiency in some target scenarios.</p>
|
||
|
||
<p>Fortunately, we observed that users who belong to the same campus, plant, firm, or community often share similar interests.
Therefore, these co-located users have similar demands when using AI-involved routines.
Co-located users are also easily targeted by the same types of threats, such as ransomware aimed at financial practitioners.</p>
|
||
|
||
<p>Consider sending the features of a new malware app to a cloud service in order to train a neural network used by an antivirus program.
This process may take a long time, and a small number of samples may not be recognized by the global neural network model.
A customized local model trained and deployed on the edge can counter this problem.
Using edge training as a supplement to cloud training can achieve better response times and make the whole system more flexible.</p>
|
||
|
||
<h2 id="why-training-on-edge-is-hard">Why is training on edge hard?</h2>
|
||
|
||
<p>Since all co-located users’ devices can be used for edge training, issues and challenges arise when deploying this distributed system.</p>
|
||
|
||
<p>The first challenge is <strong>struggling workers</strong> (stragglers).
Training devices are heterogeneous, ranging from limited IoT cameras to high-end media centers with powerful GPUs.
They are not designed for machine learning.
So a good edge-based distributed learning framework must be able to handle a variety of speeds in training tasks.</p>
|
||
|
||
<p>The second challenge is how to <strong>scale up</strong> clusters.
On a campus, thousands of devices or more may contribute computing resources to the same training task.
However, these devices may be far apart, whether physically or in the network topology.
How to use them well without being stuck with endless transmission time remains a challenge.</p>
|
||
|
||
<p>The third issue is devices frequently <strong>joining and exiting</strong>.
We can’t rely on each device to faithfully work on training tasks rather than its original workload.
Smartly balancing work and handling joins and exits also need consideration.</p>
|
||
|
||
<h2 id="our-proposal">Our proposal</h2>
|
||
|
||
<ul>
|
||
<li>
|
||
<p>Dynamic training data distribution and runtime profiler</p>
|
||
|
||
<p>We designed a dynamic training data distribution mechanism that helps with both the first and the third challenges.
Preprocessed data can be transmitted without leaking raw sensitive information.
This helps struggling workers, which can train on smaller batches so that they upload parameters after a similar training time.
Also, for extremely slow devices and for devices that join and exit, dynamic data distribution and the profiler help keep the global training parameters from being polluted or stale.</p>
|
||
|
||
<p>To counter heterogeneity, more approaches were applied in our later research.
The runtime profiler was described in more detail in that later work.</p>
|
||
</li>
|
||
<li>
|
||
<p>Asynchronous and synchronous aggregation enabled</p>
|
||
|
||
<p>In our findings, asynchronous and synchronous parameter updates each have pros and cons.
Staying synchronous all the time makes the struggling-worker issue unsolvable.
However, the harm asynchronous updates do to accuracy and convergence time also needs attention.
We propose carefully choosing between these two update policies at runtime to make use of the advantages of each.</p>
|
||
</li>
|
||
<li>
|
||
<p>Leader role splitting</p>
|
||
|
||
<p>The idea is to let worker devices with higher bandwidth take the leader role during training.
Parameter updating does not require much computation; it mainly needs bandwidth.
Devices with sufficient bandwidth can also work as virtual leader devices.
This approach helps minimize the number of physical devices used, and more leaders can further raise the limit on the number of workers.</p>
|
||
</li>
|
||
</ul>
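<p>To make the first proposal more concrete, here is a minimal Python sketch of how a batch could be split across workers in proportion to their measured speed, so that slow and fast devices finish a step at roughly the same time. This is only an illustration, not the EDDL implementation: the function name <em>allocate_batch_sizes</em>, the throughput numbers, and the device names are all assumptions made for the example.</p>

```python
def allocate_batch_sizes(throughputs, total_batch, min_batch=1):
    """Split total_batch samples across workers in proportion to throughput.

    throughputs: dict mapping worker name -> measured samples per second
                 (a hypothetical output of a runtime profiler).
    Returns a dict mapping worker name -> batch size for the next step.
    """
    total_speed = sum(throughputs.values())
    if total_speed <= 0:
        raise ValueError("at least one worker must report a positive throughput")
    # Each worker gets a share proportional to its speed, but never less than min_batch.
    return {
        name: max(min_batch, int(total_batch * speed / total_speed))
        for name, speed in throughputs.items()
    }


if __name__ == "__main__":
    # Hypothetical profile of three co-located devices.
    profile = {"iot-camera": 5.0, "laptop": 45.0, "media-center": 150.0}
    print(allocate_batch_sizes(profile, total_batch=200))
```

<p>Because the shares are rounded down, the sizes may not add up exactly to the requested total. A real scheduler would also have to re-profile devices over time and deal with workers that join or leave.</p>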
|
||
</description>
|
||
<pubDate>Wed, 13 Oct 2021 16:53:20 -0400</pubDate>
|
||
<link>https://codersherlock.github.com//archivers/eddl-how-do-we-train-on-limited-edge-devices</link>
|
||
<guid isPermaLink="true">https://codersherlock.github.com//archivers/eddl-how-do-we-train-on-limited-edge-devices</guid>
|
||
|
||
|
||
<category>Research</category>
|
||
|
||
</item>
|
||
|
||
<item>
|
||
<title>Generate Word Cloud Figures with Chinese-Tokenization and WordCloud python libraries</title>
|
||
<description><p>Let’s generate a word cloud like this.
Not understanding the language is not a big deal.
If your written language uses the Latin alphabet (or is another language with spaces between words), skip the tokenization step.</p>
|
||
|
||
<p><img src="/static/2020-09/2020-06-28.png" height="250" /></p>
|
||
|
||
<h2 id="background">Background</h2>
|
||
|
||
<p>Recently, I set up a web-based RSS client for retrieving and organizing everyday news. I used <a href="https://tt-rss.org/">TinyTinyRSS</a> (ttrss), a popular RSS client that is Docker-friendly. Thanks to developer <a href="https://ttrss.henry.wang/#about">HenryQW</a>, a well-written Nginx-based Docker configuration is already available on Docker Hub. As I added more feeds, I found that some feeds do not need to be checked every day. So I decided to write a script that automatically lists all keywords that appeared over a recent period and generates a heat-map-like figure from them.</p>
|
||
|
||
<p>Before going further, I’ll describe my setup to give readers a general overview.</p>
|
||
|
||
<p>My first step is to read all text-based information from TTRSS’s PostgreSQL database. With that information, I used a Chinese NLP library, <a href="https://github.com/fxsjy/jieba">jieba</a>, to extract all keywords with their occurrence frequencies. Using <a href="https://github.com/amueller/word_cloud">WordCloud</a>, a Python library, the word cloud figure is generated and displayed. More details are discussed in later sections.</p>
|
||
|
||
<h2 id="get-rss-feeds-text">Get RSS feeds’ text</h2>
|
||
|
||
<p>My first thought was to generate a keyword heat map for the past week’s economy news. Since this post focuses on Chinese tokenization and drawing the word cloud figure, I’ll just leave my code here for reference. The SQL connector I used is <a href="https://pypi.org/project/psycopg2/">psycopg2</a>, an easy-to-use PostgreSQL library.</p>
|
||
|
||
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
|
||
<span class="bp">self</span><span class="p">.</span><span class="n">dbe</span> <span class="o">=</span> <span class="n">psycopg2</span><span class="p">.</span><span class="n">connect</span><span class="p">(</span>
|
||
<span class="n">host</span><span class="o">=</span><span class="n">DB_HOST</span><span class="p">,</span> <span class="n">port</span><span class="o">=</span><span class="n">DB_PORT</span><span class="p">,</span> <span class="n">database</span><span class="o">=</span><span class="n">DB_NAME</span><span class="p">,</span> <span class="n">user</span><span class="o">=</span><span class="n">DB_USER</span><span class="p">,</span> <span class="n">password</span><span class="o">=</span><span class="n">DB_PASS</span><span class="p">)</span>
|
||
|
||
<span class="k">def</span> <span class="nf">get_1w_of_feed_byid</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">list</span><span class="p">:</span>
|
||
<span class="n">cur</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">dbe</span><span class="p">.</span><span class="n">cursor</span><span class="p">()</span>
|
||
<span class="n">cur</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="s">'SELECT content FROM public.ttrss_entries </span><span class="se">\
|
||
</span><span class="s"> where date_updated &gt; now() - interval </span><span class="se">\'</span><span class="s">1 week</span><span class="se">\'</span><span class="s"> AND id in ( </span><span class="se">\
|
||
</span><span class="s"> select int_id from DB_TABLE_NAME </span><span class="se">\
|
||
</span><span class="s"> where feed_id='</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="nb">id</span><span class="p">)</span> <span class="o">+</span> <span class="s">' </span><span class="se">\
|
||
</span><span class="s"> ) </span><span class="se">\
|
||
</span><span class="s"> ORDER BY id ASC '</span>
|
||
<span class="p">)</span>
|
||
<span class="n">rows</span> <span class="o">=</span> <span class="n">cur</span><span class="p">.</span><span class="n">fetchall</span><span class="p">()</span>
|
||
<span class="k">return</span> <span class="n">rows</span>
|
||
</code></pre></div></div>
|
||
|
||
<p>Most arguments are intuitive and easy to understand. The only exception is the argument of the function <em>get_1w_of_feed_byid</em>: this <strong>id</strong> is the feed index among my subscriptions.</p>
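<p>The query above builds its SQL by string concatenation. That works for a trusted integer id, but a bound parameter is generally the safer habit. The sketch below shows one possible shape: a hypothetical helper <em>build_feed_query</em> (not part of my original script) returns the SQL text with a <em>%s</em> placeholder plus the parameter tuple, which psycopg2's <em>cursor.execute(sql, params)</em> accepts. <em>DB_TABLE_NAME</em> is the same placeholder as in the code above.</p>

```python
def build_feed_query(feed_id, table_name="DB_TABLE_NAME"):
    """Return (sql, params) for the past week's entries of one feed.

    table_name is the same placeholder used in the post. Identifiers cannot be
    bound as query parameters, so it is formatted into the string and must come
    from trusted code, never from user input.
    """
    sql = (
        "SELECT content FROM public.ttrss_entries "
        "WHERE date_updated > now() - interval '1 week' AND id IN ("
        f"SELECT int_id FROM {table_name} WHERE feed_id = %s"
        ") ORDER BY id ASC"
    )
    # The feed id travels separately from the SQL text as a bound parameter.
    return sql, (int(feed_id),)


if __name__ == "__main__":
    sql, params = build_feed_query(1)
    print(sql)
    print(params)
    # With an open psycopg2 connection this would be run as:
    #   cur.execute(sql, params)
```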
|
||
|
||
<h2 id="tokenize-with-frequency">Tokenize with frequency</h2>
|
||
|
||
<p>I tried two popular tokenization libraries and chose <a href="https://github.com/fxsjy/jieba">jieba</a> after a few comparisons. Before cutting the sentences, we first need to remove all punctuation marks.</p>
|
||
|
||
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">remove_biaodian</span><span class="p">(</span><span class="n">text</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
|
||
<span class="n">punct</span> <span class="o">=</span> <span class="nb">set</span><span class="p">(</span><span class="s">u''':!),.:;?]}¢'"、。〉》」』】〕〗〞︰︱︳﹐、﹒
|
||
﹔﹕﹖﹗﹚﹜﹞!),.:;?|}︴︶︸︺︼︾﹀﹂﹄﹏、~¢
|
||
々‖•·ˇˉ―--′’”([{£¥'"‵〈《「『【〔〖([{£¥〝︵︷︹︻
|
||
︽︿﹁﹃﹙﹛﹝({“‘-—_…'''</span><span class="p">)</span>
|
||
<span class="n">ret</span> <span class="o">=</span> <span class="s">""</span>
|
||
<span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">text</span><span class="p">:</span>
|
||
<span class="k">if</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">punct</span><span class="p">:</span>
|
||
<span class="n">ret</span> <span class="o">+=</span> <span class="s">''</span>
|
||
<span class="k">else</span><span class="p">:</span>
|
||
<span class="n">ret</span> <span class="o">+=</span> <span class="n">x</span>
|
||
<span class="k">return</span> <span class="n">ret</span>
|
||
</code></pre></div></div>
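<p>As a side note, the same filtering can be written more compactly with <em>str.translate</em>. The following is only a sketch under assumptions: <em>PUNCT</em> is a shortened punctuation set chosen for illustration (the full set from the function above would be used in practice), and <em>remove_punct</em> is a hypothetical name.</p>

```python
# A shortened punctuation set for illustration only.
PUNCT = ",.!?;:、。《》“”‘’()"

# With three arguments, str.maketrans maps nothing and deletes every
# character that appears in the third string.
_DELETE_TABLE = str.maketrans("", "", PUNCT)


def remove_punct(text: str) -> str:
    """Return text with every character in PUNCT removed."""
    return text.translate(_DELETE_TABLE)


if __name__ == "__main__":
    print(remove_punct("你好、世界。"))
```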
|
||
|
||
<p>Once we have a string consisting only of characters, we can call jieba. Using the function <em>jieba.posseg.cut</em>, with or without paddle, we get a list of words and their “part of speech”. As you can see in the following code, I also did two more things.</p>
|
||
|
||
<p>First, in the if statement, I kept only nouns in certain categories. Category abbreviations such as “nr” and “ns” represent different parts of speech; the categories I used are listed in the table below. More details can be found at this <a href="https://github.com/fxsjy/jieba">link</a>.</p>
|
||
|
||
<p>Second, I keep only words that are at least 2 characters long. Unlike Latin writing systems, Chinese has no spaces between words. As a result, some single-character words, such as conjunctions, are easily misrecognized as proper nouns, and this misrecognition causes even more single characters to be treated as proper nouns. I am not able to improve the NLP method, so I used an easy fix: removing any word shorter than 2 characters.</p>
|
||
|
||
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">jieba.posseg</span> <span class="k">as</span> <span class="n">pseg</span>
|
||
|
||
<span class="k">def</span> <span class="nf">get_noun_jieba</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">content</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">list</span><span class="p">:</span>
|
||
<span class="n">content</span> <span class="o">=</span> <span class="n">remove_biaodian</span><span class="p">(</span><span class="n">content</span><span class="p">)</span>
|
||
<span class="n">words</span> <span class="o">=</span> <span class="n">pseg</span><span class="p">.</span><span class="n">cut</span><span class="p">(</span><span class="n">content</span><span class="p">)</span> <span class="c1"># Invoking jieba.posseg.cut function
|
||
</span>
|
||
<span class="n">ret</span> <span class="o">=</span> <span class="p">[]</span>
|
||
<span class="k">for</span> <span class="n">word</span><span class="p">,</span> <span class="n">flag</span> <span class="ow">in</span> <span class="n">words</span><span class="p">:</span>
|
||
<span class="c1"># print(word, flag)
|
||
</span> <span class="k">if</span> <span class="n">flag</span> <span class="ow">in</span> <span class="p">[</span><span class="s">'nr'</span><span class="p">,</span> <span class="s">'ns'</span><span class="p">,</span> <span class="s">'nt'</span><span class="p">,</span> <span class="s">'nw'</span><span class="p">,</span> <span class="s">'nz'</span><span class="p">,</span> <span class="s">'PER'</span><span class="p">,</span> <span class="s">'ORG'</span><span class="p">,</span> <span class="s">'x'</span><span class="p">]:</span> <span class="c1"># LOC
|
||
</span> <span class="n">ret</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">word</span><span class="p">)</span>
|
||
<span class="k">return</span> <span class="p">[</span><span class="n">remove_biaodian</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">ret</span> <span class="k">if</span> <span class="n">i</span><span class="p">.</span><span class="n">strip</span><span class="p">()</span> <span class="o">!=</span> <span class="s">""</span> <span class="ow">and</span> <span class="nb">len</span><span class="p">(</span><span class="n">remove_biaodian</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">strip</span><span class="p">()))</span> <span class="o">&gt;=</span> <span class="mi">2</span><span class="p">]</span>
|
||
</code></pre></div></div>
|
||
|
||
<ul>
|
||
<li>Word category names and abbreviations</li>
|
||
</ul>
|
||
|
||
<table>
|
||
<thead>
|
||
<tr>
|
||
<th>Abbreviation</th>
|
||
<th>Category name/ Part of speech</th>
|
||
</tr>
|
||
</thead>
|
||
<tbody>
|
||
<tr>
|
||
<td>nr</td>
|
||
<td>People name noun</td>
|
||
</tr>
|
||
<tr>
|
||
<td>ns</td>
|
||
<td>Location name noun</td>
|
||
</tr>
|
||
<tr>
|
||
<td>nt</td>
|
||
<td>Organization name noun</td>
|
||
</tr>
|
||
<tr>
|
||
<td>nw</td>
|
||
<td>Artwork title noun</td>
|
||
</tr>
|
||
<tr>
|
||
<td>nz</td>
|
||
<td>Other noun</td>
|
||
</tr>
|
||
<tr>
|
||
<td>PER</td>
|
||
<td>People name noun</td>
|
||
</tr>
|
||
<tr>
|
||
<td>ORG</td>
|
||
<td>Organization name noun</td>
|
||
</tr>
|
||
<tr>
|
||
<td>x</td>
|
||
<td>Non-morpheme word</td>
|
||
</tr>
|
||
</tbody>
|
||
</table>
|
||
|
||
<p>With all words extracted, we can easily calculate their frequencies. After that, we can use the following line of code to print a sorted result and verify correctness.</p>
|
||
|
||
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">noun</span> <span class="o">=</span> <span class="n">seg</span><span class="p">.</span><span class="n">get_noun_jieba</span><span class="p">(</span><span class="n">test_content</span><span class="p">)</span>
|
||
<span class="c1"># ... Calculate frequency of above word list ...
|
||
</span><span class="k">print</span><span class="p">(</span><span class="nb">sorted</span><span class="p">(</span><span class="n">a_dict</span><span class="p">.</span><span class="n">items</span><span class="p">(),</span> <span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="p">[</span><span class="mi">1</span><span class="p">]))</span>
|
||
</code></pre></div></div>
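<p>The frequency calculation elided in the snippet above can be done in several ways. One minimal sketch uses <em>collections.Counter</em> from the standard library. The helper name <em>count_frequencies</em> and the sample word list are made up for illustration; <em>a_dict</em> matches the name used in the print statement above.</p>

```python
from collections import Counter


def count_frequencies(words):
    """Map each word to the number of times it appears in the list."""
    return dict(Counter(words))


# Hypothetical output of get_noun_jieba for a short piece of text.
nouns = ["北京", "新冠", "北京", "经济", "新冠", "北京"]
a_dict = count_frequencies(nouns)

# Same check as in the post: print the words sorted by ascending frequency.
print(sorted(a_dict.items(), key=lambda x: x[1]))
```

<p>The resulting dictionary can be passed directly to the word cloud step in the next section.</p>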
|
||
|
||
<h2 id="draw-word-cloud">Draw word cloud</h2>
|
||
|
||
<p>With a keyword-to-frequency dictionary (the data structure), we can simply call built-in functions of the wordcloud library to generate the figure.</p>
|
||
|
||
<p>First we need to initialize an instance of the WordCloud class. As you can see in my code, I set 6 parameters: the width and height of the canvas, the maximum number of words used to generate the figure, the font, the background color, and the margin between any two words.</p>
|
||
|
||
<p>After creating the instance, we call the function <em>generate_from_frequencies</em> and pass the keyword dictionary to it. The function returns the WordCloud object itself, which can be rendered as a bitmap image, so we can use <a href="https://matplotlib.org/">matplotlib</a> to plot it to the screen.</p>
|
||
|
||
<p>I tested my plot on the Ubuntu subsystem on Windows 10. Unfortunately, matplotlib under the subsystem depends on an X11 server, which is not available by default on Windows, so we need to install one. <a href="https://sourceforge.net/projects/xming/">Xming</a> is the one I used.</p>
|
||
|
||
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">wordcloud</span> <span class="kn">import</span> <span class="n">WordCloud</span>
|
||
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
|
||
|
||
<span class="n">font_path</span> <span class="o">=</span> <span class="s">"./font/haipai.ttf"</span>
|
||
<span class="n">output_path</span> <span class="o">=</span> <span class="s">"./font/out.png"</span>
|
||
|
||
|
||
<span class="k">def</span> <span class="nf">show_figure_with_frequency</span><span class="p">(</span><span class="n">keywords</span><span class="p">:</span> <span class="nb">dict</span><span class="p">):</span>
|
||
<span class="n">wc</span> <span class="o">=</span> <span class="n">WordCloud</span><span class="p">(</span><span class="n">width</span><span class="o">=</span><span class="mi">828</span><span class="p">,</span> <span class="n">height</span><span class="o">=</span><span class="mi">1792</span><span class="p">,</span> <span class="n">max_words</span><span class="o">=</span><span class="mi">200</span><span class="p">,</span> <span class="n">font_path</span><span class="o">=</span><span class="n">font_path</span><span class="p">,</span>
|
||
<span class="n">background_color</span><span class="o">=</span><span class="s">"white"</span><span class="p">,</span> <span class="n">margin</span><span class="o">=</span><span class="mi">1</span><span class="p">).</span><span class="n">generate_from_frequencies</span><span class="p">(</span><span class="n">keywords</span><span class="p">)</span>
|
||
<span class="n">plt</span><span class="p">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">wc</span><span class="p">)</span>
|
||
<span class="n">plt</span><span class="p">.</span><span class="n">axis</span><span class="p">(</span><span class="s">'off'</span><span class="p">)</span>
|
||
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
|
||
</code></pre></div></div>
|
||
|
||
<p>If everything works fine, a word cloud figure will show up in a new window. My version looks like this.</p>
|
||
|
||
<p><img src="/static/2020-09/2020-06-28.png" height="150" /></p>
|
||
|
||
<p>This generated word cloud figure reflects the most popular keywords in economy news for the week starting 06-28-2020. The two largest words in the figure are “新冠” and “新冠病毒”, both meaning “Covid-19” (this figure is from the week of the second Covid surge in Beijing, China). The size of the image fits my phone screen, and I can use an app to automatically sync it as my phone’s wallpaper. However, too many location nouns appear in this image. This is something I can improve in the future.</p>
|
||
|
||
</description>
|
||
<pubDate>Tue, 15 Sep 2020 22:00:14 -0400</pubDate>
|
||
<link>https://codersherlock.github.com//archivers/generate-word-cloud-with-chinese-fenci</link>
|
||
<guid isPermaLink="true">https://codersherlock.github.com//archivers/generate-word-cloud-with-chinese-fenci</guid>
|
||
|
||
|
||
<category>visualization</category>
|
||
|
||
</item>
|
||
|
||
<item>
|
||
<title>Xv6 introduction</title>
|
||
<description><p>In this post, you will learn a few basic concepts of xv6. The learning path is closely coupled to the first project assignment I gave when I assisted in teaching OS classes.
The first half covers understanding system calls and implementing a simple one.
In the second half, I discuss in a bit more detail how to debug xv6 using gdb.</p>
|
||
|
||
<h2 id="xv6-systemcall">Xv6 System Call</h2>
|
||
|
||
<p>To invoke a system call, we first have to declare a user-mode function in <em>user.h</em> that serves as the interface to the kernel.</p>
|
||
|
||
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">function</span> <span class="p">(</span><span class="kt">void</span><span class="p">);</span>
|
||
</code></pre></div></div>
|
||
|
||
<p>The name of this interface-like function, in this case function, is then passed to <em>usys.S</em>. When a program calls the user-mode function, <em>usys.S</em> generates a reference to SYS_function and loads the system call number of this function into %eax. After that, the kernel looks the number up in <em>syscall.c</em> to determine whether this system call is available. We must define a system function with the same name and add it to <em>syscall.h</em> and <em>syscall.c</em>.</p>
|
||
|
||
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define SYS_function ## // ## is the system call number
|
||
</span><span class="p">[</span><span class="n">SYS_function</span><span class="p">]</span> <span class="n">sys_function</span> <span class="c1">// real system function name</span>
|
||
<span class="k">extern</span> <span class="kt">int</span> <span class="nf">sys_function</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="c1">// real system function declaration</span>
|
||
</code></pre></div></div>
|
||
|
||
<p>After adding these lines to the syscall files, we can implement the real function wherever it fits best in the kernel.</p>
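<p>To illustrate the dispatch described above, here is a toy Python model. It is not xv6 code; it only mimics the idea that the number placed in %eax indexes a table of kernel functions. The system call number 22 is a made-up value, and the names simply mirror the placeholders used in this post.</p>

```python
SYS_function = 22  # hypothetical system call number, as defined in syscall.h


def sys_function():
    """Stand-in for the kernel-side implementation of the system call."""
    return 0


# Counterpart of the table in syscall.c: system call number -> kernel function.
syscalls = {SYS_function: sys_function}


def syscall(eax):
    """Dispatch on the number that the user-side stub placed in %eax."""
    handler = syscalls.get(eax)
    if handler is None:
        return -1  # unknown system call number
    return handler()


if __name__ == "__main__":
    print(syscall(SYS_function))  # registered call
    print(syscall(999))           # unregistered number
```

<p>If the entry is missing from the table, the call fails, which is why both the number and the table entry have to be added.</p>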
|
||
|
||
<p>Sometimes we need to pass arguments to system calls. In this case, argument values are not, and in fact cannot be, passed directly into sys_function. When a system call is invoked, all of its arguments are pushed onto the current process’s stack. In <em>syscall.c</em>, several functions are provided to fetch these arguments from the process. I won’t spend time explaining how to use these functions, especially since the source code already has elegant and detailed comments. However, I will explain how processes are organized and work in xv6 in future articles.</p>
|
||
|
||
<h2 id="debug-xv6-with-gdb">Debug xv6 with gdb</h2>
|
||
|
||
<p>Please make sure that you have used gdb before.
If you have never used gdb, you may want to write a simple 50-100 line C program and practice with gdb first.</p>
|
||
|
||
<ul>
|
||
<li><a href="https://sourceware.org/gdb/current/onlinedocs/gdb/">GDB Manual</a></li>
|
||
<li><a href="https://darkdust.net/files/GDB%20Cheat%20Sheet.pdf">GDB cheatsheet (pdf)</a></li>
|
||
</ul>
|
||
|
||
<p>To make sure gdb support is enabled in xv6, check that the <em>.gdbinit.tmpl</em> file exists.
This file is used to generate the <em>.gdbinit</em> file, which you can later treat as a configuration file for gdb.</p>
|
||
|
||
<p>Before running the xv6 instance in QEMU, one more thing you need to know is that gdb must attach remotely to debug xv6.
This is because xv6 runs inside QEMU, and the emulator is isolated from the host device.
Later, when you start debugging, QEMU will open a gdb server for the gdb client to connect to.</p>
|
||
|
||
<p>When you are ready to start, use the following command to compile and run xv6:</p>
|
||
|
||
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>make qemu-nox-gdb
|
||
<span class="k">***</span> Now run <span class="s1">'gdb'</span><span class="nb">.</span>
|
||
qemu-system-i386 <span class="nt">-nographic</span> <span class="nt">-drive</span> <span class="nv">file</span><span class="o">=</span>fs.img,index<span class="o">=</span>1,media<span class="o">=</span>disk,format<span class="o">=</span>raw <span class="nt">-drive</span> <span class="nv">file</span><span class="o">=</span>xv6.img,index<span class="o">=</span>0,media<span class="o">=</span>disk,format<span class="o">=</span>raw <span class="nt">-smp</span> 2 7
|
||
</code></pre></div></div>
|
||
|
||
<p>At this point xv6 appears to be stuck; this is because QEMU is waiting for the gdb client to connect.
You can use <em>.gdbinit</em> to set up this remote connection automatically by simply typing the following command in another terminal.</p>
|
||
|
||
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">$ </span>gdb <span class="nt">-x</span> .gdbinit
|
||
GNU gdb <span class="o">(</span>Debian 8.2.1-2+b3<span class="o">)</span> 8.2.1
|
||
|
||
...
|
||
|
||
The target architecture is assumed to be i8086
|
||
<span class="o">[</span>f000:fff0] 0xffff0: ljmp <span class="nv">$0x3630</span>,<span class="nv">$0xf000e05b</span>
|
||
0x0000fff0 <span class="k">in</span> ?? <span class="o">()</span>
|
||
+ symbol-file kernel
|
||
warning: A handler <span class="k">for </span>the OS ABI <span class="s2">"GNU/Linux"</span> is not built into this configuration
|
||
of GDB. Attempting to <span class="k">continue </span>with the default i8086 settings.
|
||
|
||
<span class="o">(</span>gdb<span class="o">)</span>
|
||
</code></pre></div></div>
|
||
|
||
<p>Now, within this gdb client shell, type ‘c’ to continue xv6, and you will see xv6 start executing in the first terminal.</p>
|
||
|
||
<p>At this point, you can add breakpoints to check whether your code is implemented correctly.</p>
|
||
|
||
<p><strong>One more thing</strong>: if you open the <em>.gdbinit</em> file, you’ll find that by default it connects to a localhost target.
If you are working in an environment where the target and the client are not on the same device, change localhost to the corresponding IP address.
Connecting over ssh to the same domain name may land you on different physical devices when a load balancer is used. To check the IP address, look up the <em>ip</em> command.</p>
|
||
|
||
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>target remote localhost:28467
|
||
<span class="c"># target remote [ip-addr]:28467</span>
|
||
</code></pre></div></div>
|
||
</description>
|
||
<pubDate>Fri, 28 Jul 2017 14:56:55 -0400</pubDate>
|
||
<link>https://codersherlock.github.com//archivers/intro-xv6</link>
|
||
<guid isPermaLink="true">https://codersherlock.github.com//archivers/intro-xv6</guid>
|
||
|
||
|
||
<category>xv6</category>
|
||
|
||
</item>
|
||
|
||
<item>
|
||
<title>Some of my previous experimental work: 2016</title>
|
||
<description><p>This post contains only a basic record of my work. For details, I will write separate posts on specific topics.</p>
|
||
|
||
<h1 id="2016-10">2016-10</h1>
|
||
|
||
<h2 id="time-experiment-of-rsync">Time Experiment of rsync</h2>
|
||
|
||
<p>The patch is based on rsync version 3.1.2. [<a href="https://download.samba.org/pub/rsync/rsync-3.1.2.tar.gz">Rsync</a>|<a href="/static/2016-10/rsync/rsync-3.1.2-time.patch">Patch</a>]</p>
|
||
|
||
<h3 id="how-to-collect-data">How to collect data</h3>
|
||
|
||
<p>Basically, the transmission time, the computation time, and the overall time are printed to the console.
But we also need some bash scripts to collect data across random files of different sizes, with different modifications applied to them.</p>
|
||
|
||
<ul>
|
||
<li>Start from 8K to 64M, modify at beginning, [<a href="/static/2016-10/rsync/small2Big_change_at_begin.sh">Bash script</a>]</li>
|
||
<li>Start from 8K to 64M, modify at last, [<a href="/static/2016-10/rsync/small2Big_change_at_last.sh">Bash script</a>]</li>
  <li>From 8K to 64M, modify at a random place with a (slow) python script, [<a href="/static/2016-10/rsync/small2Big_change_at_anyplace.sh">Bash script</a>|<a href="/static/2016-10/rsync/addbyte.py">Python program</a>]</li>
</ul>
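<p>The collection loop described above can be sketched roughly as follows. This is only a minimal sketch, not the linked scripts: the helper names (<code>size_ladder</code>, <code>make_random_file</code>, <code>prepend_byte</code>), the doubling step between sizes, and the choice of inserting a single byte are all assumptions made for illustration.</p>

```bash
#!/usr/bin/env bash

# Print file sizes in bytes, starting at $1 and multiplying by $3
# until the size would exceed $2. (The doubling used in the usage
# note is an assumption.)
size_ladder() {
    local size=$1 max=$2 factor=$3
    while [ "$size" -le "$max" ]; do
        echo "$size"
        size=$((size * factor))
    done
}

# Create a file at path $1 holding $2 random bytes.
make_random_file() {
    head -c "$2" /dev/urandom > "$1"
}

# Insert one byte at the beginning of file $1
# (a hypothetical stand-in for the "modify at beginning" step).
prepend_byte() {
    local tmp
    tmp=$(mktemp)
    { printf 'X'; cat "$1"; } > "$tmp" && mv "$tmp" "$1"
}
```

<p>A run from 8K to 64M would then loop over <code>size_ladder 8192 67108864 2</code>, call <code>make_random_file</code> for each size, sync the file once with the patched rsync, call <code>prepend_byte</code>, and sync again to record the timings.</p>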
<h2 id="time-experiment-of-seafile">Time Experiment of seafile</h2>
<p>The patch is based on seafile 5.1.4. You can find the release in the <a href="https://github.com/haiwen/seafile/releases">official seafile repo</a>. You may follow the official compile instructions <a href="https://manual.seafile.com/build_seafile/linux.html">here</a>. [<a href="">Patch <strong>no longer available, new version in the following sections</strong></a>]</p>
<h3 id="how-to-collect-data-1">How to collect data</h3>
<p>Everything here is scripted as well. This time, though, I only widened the gap between consecutive file sizes.</p>
<ul>
  <li>From 8K to 16M, increasing 4 times, modify at the beginning or at 1024 different places with a python script. [<a href="/static/2016-11/seafile/trans.sh">Bash Script</a>|<a href="/static/2016-11/seafile/addbyte.py">Python program</a>]</li>
  <li>After running this automated test script, all output is recorded in seafile’s log file, located at <strong>~/.ccnet/log/seafile.log</strong></li>
  <li>We then use this simple awk command, plus some vim editing, to extract the data.</li>
</ul>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># CDC: content-defined chunks</span>
<span class="c"># HUT: HTTP upload traffic</span>
<span class="c"># ALL: overall time of one commit &amp; upload</span>
<span class="nb">awk</span> <span class="s1">'/CDC|HUT|ALL/ {print $4,$5}'</span> ~/.ccnet/log/seafile.log <span class="o">&gt;</span> results.stat
</code></pre></div></div>
<h3 id="install-seafile-on-odroid-xu">Install Seafile on odroid xu</h3>
<p>Because my attempt to cross-compile seafile for Android failed, I used a development board as a replacement platform for testing seafile on ARM. I used an <a href="http://www.hardkernel.com/main/products/prdt_info.php?g_code=G137510300620">odroid xu</a> as the hardware. Since all I need is an ARM platform, ARM Ubuntu is enough for me. Prototyping on a board is much more fun than coding, but I won’t go into it much this time. I’ll start another post about some really cool stuff I made for a strange purpose.</p>
<p>Installing Ubuntu with a GUI is all the preparation I need. I found two ways to do this.</p>
<ul>
<li>
    <p><a href="http://www.armhf.com/boards/odroid-xu/">armhf</a> is a website for ARM-based Ubuntu. It has detailed instructions <a href="http://www.armhf.com/boards/odroid-xu/odroid-sd-install/">here</a>. They provide Ubuntu 12.04/14.04 and Debian 7.5 to choose from. Unfortunately, the odroid xu’s HDMI output isn’t supported by Ubuntu’s native firmware, so after installing ubuntu-desktop the board might boot without video output.</p>
</li>
<li>
    <p>Burning an image is a much easier way to install a pre-compiled Ubuntu system. On the odroid xu forum I found an xubuntu image [<a href="http://odroid.in/ubuntu_14.04lts/ubuntu-14.04lts-xubuntu-odroid-xu-20140714.img.xz">download</a>] for the odroid xu. With this image, you just need to use the dd command to write the whole system image to the SD card.</p>
</li>
</ul>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># If the image file ends with .xz, decompress it first</span>
unxz ubuntu-14.04lts-xubuntu-odroid-xu-20140714.img.xz
<span class="c"># Burn the image to the SD card (make sure /dev/sdb really is the SD card)</span>
<span class="nb">sudo dd </span><span class="k">if</span><span class="o">=</span>ubuntu-14.04lts-xubuntu-odroid-xu-20140714.img <span class="nv">of</span><span class="o">=</span>/dev/sdb <span class="nv">bs</span><span class="o">=</span>1M <span class="nv">conv</span><span class="o">=</span>fsync
<span class="nb">sync</span>
</code></pre></div></div>
<h1 id="2016-11">2016-11</h1>
<h2 id="android-kernel">Android Kernel</h2>
<h3 id="how-to-build-an-android-kernel">How to build an Android Kernel?</h3>
<p>I won’t explain much in this part; I’ll just list some related links and point out some mistakes and fixes for errors.</p>
<ul>
<li>
    <p><a href="http://source.android.com/source/building-kernels.html#figuring-out-which-kernel-to-build">Google Official Guide</a>
– If you don’t have the AOSP sources, you have to download prebuilt toolchains, and the ones recommended in this guide might not be correct. Pick the tools that fit from the <a href="https://android.googlesource.com/?format=HTML">AOSP git root</a>, under “/platform/prebuilts/gcc”.</p>
</li>
<li>
    <p><a href="https://softwarebakery.com/building-the-android-kernel-on-linux">Packing and Flashing a Boot.img</a> <strong>[highly recommended]</strong></p>
</li>
</ul>
<h1 id="2016-12">2016-12</h1>
<h2 id="android-kernel-1">Android Kernel</h2>
<h3 id="how-to-compile-with-ftrace">How to compile with ftrace?</h3>
<p>If we want to debug on Android, ftrace is a great tool. But ftrace is not available if we use the default configuration file. The Android kernel configuration is in <strong>arch/arm64/kernel/configs</strong>. We need to add a few lines to it.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">CONFIG_STRICT_MEMORY_RWX</span><span class="o">=</span>y
<span class="nv">CONFIG_FUNCTION_TRACER</span><span class="o">=</span>y
<span class="nv">CONFIG_FUNCTION_GRAPH_TRACER</span><span class="o">=</span>y
<span class="nv">CONFIG_DYNAMIC_FTRACE</span><span class="o">=</span>y
<span class="nv">CONFIG_PERSISTENT_TRACER</span><span class="o">=</span>y
<span class="nv">CONFIG_IRQSOFF_TRACER</span><span class="o">=</span>y
<span class="nv">CONFIG_PREEMPT_TRACER</span><span class="o">=</span>y
<span class="nv">CONFIG_SCHED_TRACER</span><span class="o">=</span>y
<span class="nv">CONFIG_STACK_TRACER</span><span class="o">=</span>y
</code></pre></div></div>
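<p>To confirm that the options above actually ended up in a configuration file, a small check like the one below may help. It is a minimal sketch: the function name <code>missing_tracer_options</code> is hypothetical, and it assumes the file uses plain <code>OPTION=y</code> lines, one per line.</p>

```bash
#!/usr/bin/env bash

# The tracer-related options listed in this post.
TRACER_OPTIONS="CONFIG_STRICT_MEMORY_RWX CONFIG_FUNCTION_TRACER
CONFIG_FUNCTION_GRAPH_TRACER CONFIG_DYNAMIC_FTRACE CONFIG_PERSISTENT_TRACER
CONFIG_IRQSOFF_TRACER CONFIG_PREEMPT_TRACER CONFIG_SCHED_TRACER
CONFIG_STACK_TRACER"

# Print every option from TRACER_OPTIONS that is NOT set to "y"
# in the configuration file given as $1.
missing_tracer_options() {
    local config=$1 opt
    for opt in $TRACER_OPTIONS; do
        # -x matches the whole line, so "# CONFIG_X=y" does not count.
        grep -qx "${opt}=y" "$config" || echo "$opt"
    done
}
```

<p>Running <code>missing_tracer_options</code> on the edited configuration file should print nothing once every line has been added.</p>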
<h3 id="how-to-extract-android-images-dump-an-image">How to extract android images: Dump an image</h3>
<p>If we want to keep root after flashing boot, we need to extract an image from the Android device. We can first use the following command to find which block device each partition belongs to. <a href="http://forum.xda-developers.com/showthread.php?t=2450045">This article</a> provides three ways to dump an image; I picked the easiest one.</p>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>adb shell
<span class="nb">ls</span> <span class="nt">-al</span> /dev/block/platform/<span class="nv">$SOME_DEVICE</span>../../by-name <span class="c"># {Partitions} -&gt; {Device Block}</span>
<span class="c"># dump file</span>
su
<span class="nb">dd </span><span class="k">if</span><span class="o">=</span>/dev/block/mmcblk0p37 <span class="nv">of</span><span class="o">=</span>/sdcard/boot.img
</code></pre></div></div>
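<p>Reading the partition-to-block mapping by eye is error-prone, so a small filter can pick out the block device for one partition. This is a minimal sketch: the function name <code>block_for_partition</code> is hypothetical, and it assumes each line of the <code>ls -al</code> output ends in the form <code>name -&gt; /dev/block/...</code>, which may differ between devices.</p>

```bash
#!/usr/bin/env bash

# Read "ls -al <by-name directory>" output on stdin and print the block
# device that the partition named $1 links to.
# Assumes each symlink line ends with: <partition> -> <block device>
block_for_partition() {
    awk -v name="$1" \
        'NF >= 3 && $(NF-2) == name && $(NF-1) == "->" { print $NF }'
}
```

<p>The listing can be saved from the device and piped in, for example <code>block_for_partition boot &lt; by-name.txt</code>, where <code>by-name.txt</code> is a hypothetical file holding the <code>ls -al</code> output. The printed path is then what goes into <code>if=</code> in the dd command.</p>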
</description>
<pubDate>Fri, 28 Oct 2016 12:27:33 -0400</pubDate>
<link>https://codersherlock.github.com//archivers/some-of-my-previews-exper-work</link>
<guid isPermaLink="true">https://codersherlock.github.com//archivers/some-of-my-previews-exper-work</guid>
<category>Research</category>
</item>
<item>
<title>Using Charles proxy to monitor mobile SSL traffic</title>
<description><p>In this post, I will talk about how to use the proper tools to monitor the SSL traffic of a mobile device. Currently, I can only deal with SSL traffic that uses an obvious certificate. Some applications may not use the system root certificates, or they don’t give us a way to modify their own certificates. For these cases, I still haven’t found a good solution, but I’ll update this post if I find one.<br />
My current solution is to use an AP to forward all SSL traffic to a proxy. <a href="https://www.charlesproxy.com/">Charles proxy</a> is my first choice (my professor asked for it). It’s non-free software that is still actively updated. So I’ll mainly talk about how to use Charles as an SSL proxy.</p>
<h3 id="preparations">Preparations</h3>
<ul>
  <li>Monitoring device: a Linux machine with a wireless adapter</li>
  <li>Download the newest version (4.0.1) of Charles</li>
  <li>Target Android device with root privilege</li>
</ul>
<h3 id="install-charles-and-configuration">Install Charles and Configuration</h3>
<ul>
  <li>Install Charles first: after downloading Charles proxy, unzip it and configure some basic settings.</li>
</ul>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># open charles first</span>
./bin/charles
</code></pre></div></div>
<ul>
<li>Save charles’ private key and public key</li>
</ul>
<p>In Help -&gt; SSL Proxying -&gt; Export Charles Root Certificate and Private Key, enter a password and save the public and private key in *.p12 format.<br />
You also need to save the Charles root certificate, which is in the same menu. For convenience, save it in *.pem format.</p>
<ul>
<li>Set Proxy and SSL Proxy</li>
</ul>
</description>
<pubDate>Thu, 27 Oct 2016 22:50:33 -0400</pubDate>
<link>https://codersherlock.github.com//archivers/charles-is-not-a-good-tool</link>
<guid isPermaLink="true">https://codersherlock.github.com//archivers/charles-is-not-a-good-tool</guid>
<category>Network</category>
</item>
<item>
<title>Stop Talking is the worst title of one blog</title>
<description>
</description>
<pubDate>Wed, 26 Oct 2016 22:50:33 -0400</pubDate>
<link>https://codersherlock.github.com//archivers/hello</link>
<guid isPermaLink="true">https://codersherlock.github.com//archivers/hello</guid>
<category>Nonsense</category>
</item>
</channel>
</rss>