mirror of
https://github.com/CoderSherlock/CoderSherlock.github.io.git
synced 2026-06-13 08:08:10 -07:00
389 lines
42 KiB
XML
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
<channel>
<title>Stop Talking, Start Doing - 停止空想,开始行动</title>
<description>My personal blog, with some boring research stuff and some tricks I have taken a fancy to. I'll try my best to make this blog fun and useful, not just a place where I complain about everything that happens in my lab.
</description>
<link>https://codersherlock.github.com//</link>
<atom:link href="https://codersherlock.github.com//feed.xml" rel="self" type="application/rss+xml"/>
<pubDate>Tue, 15 Sep 2020 22:22:06 -0400</pubDate>
<lastBuildDate>Tue, 15 Sep 2020 22:22:06 -0400</lastBuildDate>
<generator>Jekyll v4.1.1</generator>
<item>
<title>Generate Word Cloud Figures with Chinese Tokenization and the WordCloud Python Library</title>
<description><p><img src="/static/2020-09/2020-06-28.png" height="350" /></p>
<h2 id="background">Background</h2>

<p>Recently, I set up a web-based RSS client to retrieve and organize everyday news. I used <a href="https://tt-rss.org/">TinyTinyRSS</a> (ttrss), a popular RSS client that is Docker-friendly. Thanks to developer <a href="https://ttrss.henry.wang/#about">HenryQW</a>, a well-written Nginx-based Docker configuration is already available on Docker Hub. As I added more feeds, I found that some of them do not need to be checked every day. So I decided to write a script that automatically lists all keywords that appeared over a recent period and generates a heat-map-like figure of them.</p>
<p>Before going further, here is a general overview of my setup.</p>

<p>My first step is to read all text-based information from TTRSS’s PostgreSQL database. With this text, I used a Chinese NLP library, <a href="https://github.com/fxsjy/jieba">jieba</a>, to extract all keywords with their occurrence frequencies. Then <a href="https://github.com/amueller/word_cloud">WordCloud</a>, a Python library, generates and displays the word cloud figure. More details are discussed in later sections.</p>
<h2 id="get-rss-feeds-text">Get RSS feeds’ text</h2>

<p>My first idea was to generate a keyword heat map for the past week’s economy news. Since this post focuses on Chinese tokenization and drawing the word cloud figure, I’ll just leave my database code here for reference. The SQL connector I used is <a href="https://pypi.org/project/psycopg2/">psycopg2</a>, an easy-to-use PostgreSQL library.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">psycopg2</span>

<span class="c1"># Both functions are methods of a helper class. DB_HOST, DB_PORT, DB_NAME,
# DB_USER, DB_PASS and DB_TABLE_NAME are placeholders for your own settings.
</span>
<span class="k">def</span> <span class="nf">__init__</span><span class="p">(</span><span class="bp">self</span><span class="p">):</span>
    <span class="bp">self</span><span class="p">.</span><span class="n">dbe</span> <span class="o">=</span> <span class="n">psycopg2</span><span class="p">.</span><span class="n">connect</span><span class="p">(</span>
        <span class="n">host</span><span class="o">=</span><span class="n">DB_HOST</span><span class="p">,</span> <span class="n">port</span><span class="o">=</span><span class="n">DB_PORT</span><span class="p">,</span> <span class="n">database</span><span class="o">=</span><span class="n">DB_NAME</span><span class="p">,</span> <span class="n">user</span><span class="o">=</span><span class="n">DB_USER</span><span class="p">,</span> <span class="n">password</span><span class="o">=</span><span class="n">DB_PASS</span><span class="p">)</span>

<span class="k">def</span> <span class="nf">get_1w_of_feed_byid</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="nb">id</span><span class="o">=</span><span class="mi">1</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">list</span><span class="p">:</span>
    <span class="n">cur</span> <span class="o">=</span> <span class="bp">self</span><span class="p">.</span><span class="n">dbe</span><span class="p">.</span><span class="n">cursor</span><span class="p">()</span>
    <span class="n">cur</span><span class="p">.</span><span class="n">execute</span><span class="p">(</span><span class="s">'SELECT content FROM public.ttrss_entries </span><span class="se">\
</span><span class="s"> where date_updated &gt; now() - interval </span><span class="se">\'</span><span class="s">1 week</span><span class="se">\'</span><span class="s"> AND id in ( </span><span class="se">\
</span><span class="s"> select int_id from DB_TABLE_NAME </span><span class="se">\
</span><span class="s"> where feed_id='</span> <span class="o">+</span> <span class="nb">str</span><span class="p">(</span><span class="nb">id</span><span class="p">)</span> <span class="o">+</span> <span class="s">' </span><span class="se">\
</span><span class="s"> ) </span><span class="se">\
</span><span class="s"> ORDER BY id ASC '</span>
    <span class="p">)</span>
    <span class="n">rows</span> <span class="o">=</span> <span class="n">cur</span><span class="p">.</span><span class="n">fetchall</span><span class="p">()</span>
    <span class="k">return</span> <span class="n">rows</span>
</code></pre></div></div>
<p>Most of the code is self-explanatory. The only exception is the argument of the function <em>get_1w_of_feed_byid</em>: <strong>id</strong> is the index of a feed in my subscriptions.</p>
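<p>The query above builds its SQL by string concatenation. The following is a minimal sketch of the same query with a bound parameter instead. The function name <em>fetch_last_week</em> is made up for this example, <em>DB_TABLE_NAME</em> is still a placeholder, and the only assumption about the connection is that it follows the Python DB-API with <em>%s</em> placeholders, as psycopg2 does.</p>

```python
# Placeholder, as in the code above; replace it with your own table name.
DB_TABLE_NAME = "DB_TABLE_NAME"

WEEKLY_QUERY = (
    "SELECT content FROM public.ttrss_entries "
    "WHERE date_updated > now() - interval '1 week' AND id IN ( "
    f"SELECT int_id FROM {DB_TABLE_NAME} WHERE feed_id = %s "
    ") ORDER BY id ASC"
)


def fetch_last_week(conn, feed_id: int = 1) -> list:
    """Run the weekly query on a DB-API connection that uses %s placeholders."""
    cur = conn.cursor()
    # The driver binds feed_id itself, so no str() call or manual quoting is needed.
    cur.execute(WEEKLY_QUERY, (feed_id,))
    return cur.fetchall()
```

<p>With the connection created in <em>__init__</em>, this would be called as <em>fetch_last_week(self.dbe, 1)</em>.</p>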
<h2 id="tokenize-with-frequency">Tokenize with frequency</h2>

<p>I tried two popular tokenization libraries and chose <a href="https://github.com/fxsjy/jieba">jieba</a> after a few comparisons. Before cutting the sentences, we first need to remove all punctuation marks.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1"># "biaodian" is pinyin for punctuation; this drops every character found in the set below.
</span><span class="k">def</span> <span class="nf">remove_biaodian</span><span class="p">(</span><span class="n">text</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">str</span><span class="p">:</span>
    <span class="n">punct</span> <span class="o">=</span> <span class="nb">set</span><span class="p">(</span><span class="s">u''':!),.:;?]}¢'"、。〉》」』】〕〗〞︰︱︳﹐、﹒
﹔﹕﹖﹗﹚﹜﹞!),.:;?|}︴︶︸︺︼︾﹀﹂﹄﹏、~¢
々‖•·ˇˉ―--′’”([{£¥'"‵〈《「『【〔〖([{£¥〝︵︷︹︻
︽︿﹁﹃﹙﹛﹝({“‘-—_…'''</span><span class="p">)</span>
    <span class="n">ret</span> <span class="o">=</span> <span class="s">""</span>
    <span class="k">for</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">text</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">x</span> <span class="ow">in</span> <span class="n">punct</span><span class="p">:</span>
            <span class="n">ret</span> <span class="o">+=</span> <span class="s">''</span>
        <span class="k">else</span><span class="p">:</span>
            <span class="n">ret</span> <span class="o">+=</span> <span class="n">x</span>
    <span class="k">return</span> <span class="n">ret</span>
</code></pre></div></div>
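<p>The same filtering can also be sketched with <em>str.translate</em>, which removes every listed character in a single pass. This is an alternative technique, not the function used in this post, and the punctuation list <em>PUNCT</em> below is a shortened, hypothetical one for illustration only.</p>

```python
# Hypothetical, shortened punctuation list; extend it with the full set as needed.
PUNCT = ",。!?、;:“”‘’()《》【】,.!?;:()[]{}'\"-_…"

# With a third argument, str.maketrans maps every listed character to None,
# so translate() deletes all of them in one pass.
_DELETE_PUNCT = str.maketrans("", "", PUNCT)


def strip_punct(text: str) -> str:
    """Return text with every character from PUNCT removed."""
    return text.translate(_DELETE_PUNCT)
```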
<p>Once we have a string with no punctuation, we can call jieba. Using the function <em>jieba.posseg.cut</em>, with or without paddle mode, we get a list of words along with their “part of speech”. As you can see in the following code, I also did two more things.</p>

<p>First, in the if statement, I kept only nouns of certain categories. Category abbreviations such as “nr” and “ns” represent different parts of speech; the categories I used are listed in the table after the code. For more details, see this <a href="https://github.com/fxsjy/jieba">link</a>.</p>

<p>Second, I kept only words that are at least 2 characters long. Unlike Latin writing systems, Chinese has no spaces between words. As a result, single-character words such as conjunctions are easily misrecognized as proper nouns, and one such error tends to cause more single characters to be treated as proper nouns. I am not able to improve the NLP method itself, so I used a simple fix: removing every word shorter than 2 characters.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">jieba.posseg</span> <span class="k">as</span> <span class="n">pseg</span>

<span class="k">def</span> <span class="nf">get_noun_jieba</span><span class="p">(</span><span class="bp">self</span><span class="p">,</span> <span class="n">content</span><span class="p">:</span> <span class="nb">str</span><span class="p">)</span> <span class="o">-&gt;</span> <span class="nb">list</span><span class="p">:</span>
    <span class="n">content</span> <span class="o">=</span> <span class="n">remove_biaodian</span><span class="p">(</span><span class="n">content</span><span class="p">)</span>
    <span class="n">words</span> <span class="o">=</span> <span class="n">pseg</span><span class="p">.</span><span class="n">cut</span><span class="p">(</span><span class="n">content</span><span class="p">)</span> <span class="c1"># Invoking jieba.posseg.cut function
</span>
    <span class="n">ret</span> <span class="o">=</span> <span class="p">[]</span>
    <span class="k">for</span> <span class="n">word</span><span class="p">,</span> <span class="n">flag</span> <span class="ow">in</span> <span class="n">words</span><span class="p">:</span>
        <span class="c1"># print(word, flag)
</span>        <span class="k">if</span> <span class="n">flag</span> <span class="ow">in</span> <span class="p">[</span><span class="s">'nr'</span><span class="p">,</span> <span class="s">'ns'</span><span class="p">,</span> <span class="s">'nt'</span><span class="p">,</span> <span class="s">'nw'</span><span class="p">,</span> <span class="s">'nz'</span><span class="p">,</span> <span class="s">'PER'</span><span class="p">,</span> <span class="s">'ORG'</span><span class="p">,</span> <span class="s">'x'</span><span class="p">]:</span> <span class="c1"># LOC
</span>            <span class="n">ret</span><span class="p">.</span><span class="n">append</span><span class="p">(</span><span class="n">word</span><span class="p">)</span>
    <span class="k">return</span> <span class="p">[</span><span class="n">remove_biaodian</span><span class="p">(</span><span class="n">i</span><span class="p">)</span> <span class="k">for</span> <span class="n">i</span> <span class="ow">in</span> <span class="n">ret</span> <span class="k">if</span> <span class="n">i</span><span class="p">.</span><span class="n">strip</span><span class="p">()</span> <span class="o">!=</span> <span class="s">""</span> <span class="ow">and</span> <span class="nb">len</span><span class="p">(</span><span class="n">remove_biaodian</span><span class="p">(</span><span class="n">i</span><span class="p">.</span><span class="n">strip</span><span class="p">()))</span> <span class="o">&gt;=</span> <span class="mi">2</span><span class="p">]</span>
</code></pre></div></div>
<ul>
<li>Word category names and abbreviations</li>
</ul>

<table>
<thead>
<tr>
<th>Abbreviation</th>
<th>Category name / part of speech</th>
</tr>
</thead>
<tbody>
<tr>
<td>nr</td>
<td>Person name</td>
</tr>
<tr>
<td>ns</td>
<td>Location name</td>
</tr>
<tr>
<td>nt</td>
<td>Organization name</td>
</tr>
<tr>
<td>nw</td>
<td>Title of a work</td>
</tr>
<tr>
<td>nz</td>
<td>Other proper noun</td>
</tr>
<tr>
<td>PER</td>
<td>Person name</td>
</tr>
<tr>
<td>ORG</td>
<td>Organization name</td>
</tr>
<tr>
<td>x</td>
<td>Non-morpheme word</td>
</tr>
</tbody>
</table>
<p>With all words extracted, we can easily calculate their frequencies. After that, the following code prints a sorted result so that we can verify its correctness.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">noun</span> <span class="o">=</span> <span class="n">seg</span><span class="p">.</span><span class="n">get_noun_jieba</span><span class="p">(</span><span class="n">test_content</span><span class="p">)</span>
<span class="c1"># ... Calculate frequency of above word list ...
</span><span class="k">print</span><span class="p">(</span><span class="nb">sorted</span><span class="p">(</span><span class="n">a_dict</span><span class="p">.</span><span class="n">items</span><span class="p">(),</span> <span class="n">key</span><span class="o">=</span><span class="k">lambda</span> <span class="n">x</span><span class="p">:</span> <span class="n">x</span><span class="p">[</span><span class="mi">1</span><span class="p">]))</span>
</code></pre></div></div>
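<p>The frequency calculation elided in the comment above could be sketched with <em>collections.Counter</em> from the standard library. The helper name <em>count_keywords</em> and the sample word list are assumptions for illustration; in practice the list would come from <em>get_noun_jieba</em>.</p>

```python
from collections import Counter


def count_keywords(words: list) -> dict:
    """Map each keyword to the number of times it appears in the list."""
    return dict(Counter(words))


# Hypothetical sample standing in for the output of get_noun_jieba.
sample = ["新冠", "北京", "新冠", "疫情", "新冠", "北京"]
a_dict = count_keywords(sample)

# Sort by count, most frequent keyword first.
ranked = sorted(a_dict.items(), key=lambda x: x[1], reverse=True)
print(ranked)  # [('新冠', 3), ('北京', 2), ('疫情', 1)]
```

<p>The resulting dictionary has the shape that <em>generate_from_frequencies</em> expects in the next section.</p>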
<h2 id="draw-word-cloud">Draw word cloud</h2>

<p>With a dictionary that maps keywords to frequencies, we can simply call built-in functions of the wordcloud library to generate the figure.</p>

<p>First, we need to create an instance of the WordCloud class. As you can see in my code, I set six parameters: the width and height of the canvas, the maximum number of words used in the figure, the font, the background color, and the margin between words.</p>

<p>With the instance in hand, we call the function <em>generate_from_frequencies</em> and pass the keyword dictionary to it. This function returns the WordCloud object itself, which <a href="https://matplotlib.org/">matplotlib</a> can render as a bitmap image on your screen.</p>

<p>I tested my plot on the Ubuntu subsystem on Windows 10. Unfortunately, matplotlib under the subsystem depends on an X11 server, which is not available on Windows by default, so we need to install one. <a href="https://sourceforge.net/projects/xming/">Xming</a> is the one I used.</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">wordcloud</span> <span class="kn">import</span> <span class="n">WordCloud</span>
<span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>

<span class="n">font_path</span> <span class="o">=</span> <span class="s">"./font/haipai.ttf"</span>
<span class="n">output_path</span> <span class="o">=</span> <span class="s">"./font/out.png"</span>


<span class="k">def</span> <span class="nf">show_figure_with_frequency</span><span class="p">(</span><span class="n">keywords</span><span class="p">:</span> <span class="nb">dict</span><span class="p">):</span>
    <span class="n">wc</span> <span class="o">=</span> <span class="n">WordCloud</span><span class="p">(</span><span class="n">width</span><span class="o">=</span><span class="mi">828</span><span class="p">,</span> <span class="n">height</span><span class="o">=</span><span class="mi">1792</span><span class="p">,</span> <span class="n">max_words</span><span class="o">=</span><span class="mi">200</span><span class="p">,</span> <span class="n">font_path</span><span class="o">=</span><span class="n">font_path</span><span class="p">,</span>
                   <span class="n">background_color</span><span class="o">=</span><span class="s">"white"</span><span class="p">,</span> <span class="n">margin</span><span class="o">=</span><span class="mi">1</span><span class="p">).</span><span class="n">generate_from_frequencies</span><span class="p">(</span><span class="n">keywords</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="n">imshow</span><span class="p">(</span><span class="n">wc</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="n">axis</span><span class="p">(</span><span class="s">'off'</span><span class="p">)</span>
    <span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
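<p>On a machine without an X11 server, one option is to skip the preview window and write the image to disk. The sketch below relies on the <em>to_file</em> method of the WordCloud class and reuses the parameters from the code above; the function name <em>save_word_cloud</em> is made up for this example.</p>

```python
def save_word_cloud(keywords: dict, font_path: str, output_path: str) -> str:
    """Render the word cloud straight to an image file, without opening a window."""
    # Imported lazily so that this sketch can be loaded without the library installed.
    from wordcloud import WordCloud

    wc = WordCloud(width=828, height=1792, max_words=200, font_path=font_path,
                   background_color="white", margin=1)
    wc.generate_from_frequencies(keywords)
    # to_file saves the rendered image directly, so matplotlib is not involved.
    wc.to_file(output_path)
    return output_path
```

<p>It could be called as <em>save_word_cloud(a_dict, "./font/haipai.ttf", "./font/out.png")</em>. A font with Chinese glyphs is still required, as in the original code.</p>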
<p>If everything works fine, a word cloud figure will show up in a new window. Mine looks like this.</p>

<p><img src="/static/2020-09/2020-06-28.png" height="150" /></p>

<p>This word cloud figure reflects the most popular keywords in economy news for the week starting 06-28-2020. The two largest words in the figure are “新冠” and “新冠病毒”, both of which refer to “Covid-19” (the figure covers the week of the second Covid surge in Beijing, China). The image size fits my phone screen, and I can use an app to automatically sync it as my phone’s wallpaper. However, too many location nouns appear in this image. This is something I can improve in the future.</p>
</description>
<pubDate>Tue, 15 Sep 2020 22:00:14 -0400</pubDate>
<link>https://codersherlock.github.com//archivers/generate-word-cloud-with-chinese-fenci</link>
<guid isPermaLink="true">https://codersherlock.github.com//archivers/generate-word-cloud-with-chinese-fenci</guid>
<category>visualization</category>
</item>

<item>
<title>Xv6 introduction</title>
<description><p>I hate xv6, a stupid, useless education-oriented system. In this article, I will give a general overview of how to add a system call to this operating system.</p>

<h2 id="xv6-systemcall">Xv6 System Call</h2>
<p>To invoke a system call, we first have to declare a user-mode function in the file <em>user.h</em> to serve as the interface to the kernel.</p>

<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kt">void</span> <span class="nf">function</span> <span class="p">(</span><span class="kt">void</span><span class="p">);</span>
</code></pre></div></div>
<p>The name of this interface function, in this case function, must then be added to <em>usys.S</em>. When a program calls the user-mode function, the stub generated in <em>usys.S</em> refers to SYS_function, moves the system call number into %eax, and traps into the kernel. The kernel then looks the number up in <em>syscall.c</em> to determine whether this system call is available. We must therefore define a kernel function with the matching name and register it in <em>syscall.h</em> and <em>syscall.c</em>.</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#define SYS_function ## // syscall.h: ## is the system call number
</span><span class="p">[</span><span class="n">SYS_function</span><span class="p">]</span> <span class="n">sys_function</span><span class="p">,</span> <span class="c1">// syscall.c: entry in the system call table</span>
<span class="k">extern</span> <span class="kt">int</span> <span class="nf">sys_function</span><span class="p">(</span><span class="kt">void</span><span class="p">);</span> <span class="c1">// syscall.c: declaration of the real system function</span>
</code></pre></div></div>
<p>After adding these lines to the syscall files, we can implement the real function wherever it best fits in the kernel.</p>
<p>Sometimes we need to pass arguments to a system call. These values are not, and in fact cannot be, passed directly to sys_function. When a system call is invoked, all of its arguments are pushed onto the current process’s stack. The file <em>syscall.c</em> provides several functions to fetch these arguments from the process. I won’t spend time explaining how to use them, especially since the source code has elegant and detailed comments. However, I will explain the concepts and how processes are organized and work in xv6 in future articles.</p>
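<p>To make the flow concrete, here is a toy model in Python, not xv6 code. It assumes a made-up system call number 22 and a made-up handler that adds two integers; <em>Proc</em>, <em>SYSCALLS</em> and the flat memory layout are simplified stand-ins for the trap frame, the table in <em>syscall.c</em> and the user stack. The <em>argint</em> helper mirrors the idea of the xv6 function of the same name: the n-th argument sits at <em>esp + 4 + 4*n</em>.</p>

```python
import struct

SYS_function = 22  # hypothetical system call number


class Proc:
    """Toy stand-in for a process: saved registers (eax, esp) plus user memory."""

    def __init__(self, memory: bytes, esp: int, eax: int):
        self.memory, self.esp, self.eax = memory, esp, eax


def argint(proc: Proc, n: int) -> int:
    """Fetch the n-th 32-bit argument; esp points at the return address, so skip 4 bytes."""
    addr = proc.esp + 4 + 4 * n
    return struct.unpack_from("<i", proc.memory, addr)[0]


def sys_function(proc: Proc) -> int:
    # The handler takes no direct parameters; it pulls its arguments off the user stack.
    return argint(proc, 0) + argint(proc, 1)


SYSCALLS = {SYS_function: sys_function}  # plays the role of the table in syscall.c


def syscall(proc: Proc) -> None:
    """Dispatch on the number in eax and store the result back into eax."""
    handler = SYSCALLS.get(proc.eax)
    # Unknown numbers yield -1; known ones store the handler's return value.
    proc.eax = handler(proc) if handler else -1
```

<p>For example, with a stack holding a return address followed by the arguments 40 and 2, calling <em>syscall</em> on a process whose eax is 22 leaves 42 in eax, while an unregistered number leaves -1.</p>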
</description>
<pubDate>Fri, 28 Jul 2017 14:56:55 -0400</pubDate>
<link>https://codersherlock.github.com//archivers/intro-xv6</link>
<guid isPermaLink="true">https://codersherlock.github.com//archivers/intro-xv6</guid>
<category>xv6</category>
</item>

<item>
<title>Some of my previous experiment work: 2016</title>
<description><p>This post contains only a basic record of my work. For details, I will write separate posts on specific topics.</p>

<h1 id="2016-10">2016-10</h1>

<h2 id="time-experiment-of-rsync">Time Experiment of rsync</h2>

<p>The patch is based on rsync version 3.1.2. [<a href="https://download.samba.org/pub/rsync/rsync-3.1.2.tar.gz">Rsync</a>|<a href="/static/2016-10/rsync/rsync-3.1.2-time.patch">Patch</a>]</p>
<h3 id="how-to-collect-data">How to collect data</h3>

<p>Basically, the transmission time and computation time, along with the overall time, are printed on the console.
But we also need some bash scripts to collect data across random files of different sizes and with different modifications to them.</p>

<ul>
<li>Sizes from 8K to 64M, modified at the beginning, [<a href="/static/2016-10/rsync/small2Big_change_at_begin.sh">Bash script</a>]</li>
<li>Sizes from 8K to 64M, modified at the end, [<a href="/static/2016-10/rsync/small2Big_change_at_last.sh">Bash script</a>]</li>
<li>Sizes from 8K to 64M, modified at a random position with a (slow) Python script, [<a href="/static/2016-10/rsync/small2Big_change_at_anyplace.sh">Bash script</a>|<a href="/static/2016-10/rsync/addbyte.py">Python program</a>]</li>
</ul>
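<p>The idea behind these scripts can be sketched in Python as follows. This is not the content of the linked scripts: the doubling step between sizes and all function names are assumptions made for illustration.</p>

```python
import os


def size_steps(start: int = 8 * 1024, stop: int = 64 * 1024 * 1024, factor: int = 2) -> list:
    """File sizes from start up to stop (inclusive), multiplied by factor at each step."""
    sizes = []
    size = start
    while size <= stop:
        sizes.append(size)
        size *= factor
    return sizes


def make_random_file(path: str, size: int) -> None:
    """Create a file of the given size filled with random bytes."""
    with open(path, "wb") as f:
        f.write(os.urandom(size))


def insert_byte(path: str, offset: int, value: bytes = b"\x00") -> None:
    """Insert one byte at offset (0 is the beginning, the file size is the end)."""
    with open(path, "rb") as f:
        data = f.read()
    with open(path, "wb") as f:
        f.write(data[:offset] + value + data[offset:])
```

<p>A test run would create one file per entry of <em>size_steps()</em>, sync it once, call <em>insert_byte</em> at the beginning, the end, or a random offset, and then time the second sync.</p>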
<h2 id="time-experiment-of-seafile">Time Experiment of seafile</h2>

<p>The patch is based on seafile 5.1.4. You can find the release in the <a href="https://github.com/haiwen/seafile/releases">seafile official repo</a>, and you can follow the official build instructions <a href="https://manual.seafile.com/build_seafile/linux.html">here</a>. [<a href="">Patch <strong>no longer available, new version in the following sections</strong></a>]</p>

<h3 id="how-to-collect-data-1">How to collect data</h3>

<p>Everything is again done by scripts, but this time I increased the gap between consecutive file sizes.</p>
<ul>
<li>Sizes from 8K to 16M, increasing 4 times at each step, modified at the beginning or at 1024 different places with a Python script. [<a href="/static/2016-11/seafile/trans.sh">Bash Script</a>|<a href="/static/2016-11/seafile/addbyte.py">Python program</a>]</li>
<li>After running this automated test script, all output is recorded in seafile’s log file, located at <strong>~/.ccnet/log/seafile.log</strong></li>
<li>We use this simple awk command and some vim operations to extract the data.</li>
</ul>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># CDC: content defined chunks</span>
<span class="c"># HUT: Http upload traffic</span>
<span class="c"># ALL: overall time of one commit &amp; upload</span>
<span class="nb">awk</span> <span class="s1">'/CDC|HUT|ALL/ {print $4,$5}'</span> ~/.ccnet/log/seafile.log <span class="o">&gt;</span> results.stat
</code></pre></div></div>
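<p>For readers who prefer Python, a rough equivalent of the awk one-liner might look like this. The function name <em>extract_stats</em> is hypothetical, and the log line in the usage note is invented only to show the field positions; the real seafile log format may differ.</p>

```python
import re

PATTERN = re.compile(r"CDC|HUT|ALL")


def extract_stats(lines) -> list:
    """Mimic awk '/CDC|HUT|ALL/ {print $4,$5}': keep matching lines, emit fields 4 and 5."""
    out = []
    for line in lines:
        if PATTERN.search(line):
            fields = line.split()  # whitespace splitting, like awk's default
            # awk prints an empty string for a missing field; do the same here.
            f4 = fields[3] if len(fields) > 3 else ""
            f5 = fields[4] if len(fields) > 4 else ""
            out.append(f"{f4} {f5}")
    return out
```

<p>Given a made-up line such as <em>[10/28/16 12:00:00] file.c(10): CDC 0.123</em>, the function yields <em>CDC 0.123</em>. To process the real log, pass it an open file object for <strong>~/.ccnet/log/seafile.log</strong>.</p>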
<h3 id="install-seafile-on-odroid-xu">Install Seafile on odroid xu</h3>

<p>Because my attempt to cross-compile seafile for Android failed, I used a development board as a replacement platform for testing seafile on ARM. I used an <a href="http://www.hardkernel.com/main/products/prdt_info.php?g_code=G137510300620">odroid xu</a> as the hardware. Because all I need is an ARM platform, ARM Ubuntu alone is enough for me. Prototyping on a board is much more fun than coding, but I won’t go into much detail this time. I’ll write a separate post about some really cool stuff I made for a strange purpose.</p>

<p>Installing Ubuntu with a GUI is all the preparation I need. I found two ways to do this.</p>
<ul>
<li>
<p><a href="http://www.armhf.com/boards/odroid-xu/">armhf</a> is a website for ARM-based Ubuntu. It has detailed instructions to follow <a href="http://www.armhf.com/boards/odroid-xu/odroid-sd-install/">here</a>. It also offers Ubuntu 12.04/14.04 and Debian 7.5 to choose from. Unfortunately, the odroid xu’s HDMI output isn’t supported by Ubuntu’s native firmware, so a system with ubuntu-desktop installed might boot without any video output.</p>
</li>
<li>
<p>Burning an image is a much easier way to install a pre-compiled Ubuntu system. On the odroid xu forum I found an xubuntu image [<a href="http://odroid.in/ubuntu_14.04lts/ubuntu-14.04lts-xubuntu-odroid-xu-20140714.img.xz">download</a>] for the odroid xu. With this image, you just need to use the dd command to write the whole system image to the SD card.</p>
</li>
</ul>
<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># If the image file ends with .xz, decompress it first</span>
unxz ubuntu-14.04lts-xubuntu-odroid-xu-20140714.img.xz
<span class="c"># Burn the image to the SD card (make sure /dev/sdb really is the SD card)</span>
<span class="nb">sudo dd </span><span class="k">if</span><span class="o">=</span>ubuntu-14.04lts-xubuntu-odroid-xu-20140714.img <span class="nv">of</span><span class="o">=</span>/dev/sdb <span class="nv">bs</span><span class="o">=</span>1M <span class="nv">conv</span><span class="o">=</span>fsync
<span class="nb">sync</span>
</code></pre></div></div>
<h1 id="2016-11">2016-11</h1>

<h2 id="android-kernel">Android Kernel</h2>

<h3 id="how-to-build-an-android-kernel">How to build an Android Kernel?</h3>

<p>In general, I won’t explain much in this part; I’ll just note some related links and point out some mistakes and error fixes.</p>

<ul>
<li>
<p><a href="http://source.android.com/source/building-kernels.html#figuring-out-which-kernel-to-build">Google Official Guide</a>
– If you don’t have the AOSP sources, you have to download a prebuilt toolchain, and the one recommended in this guide might not be correct. Use the following link to choose a suitable one.
— <a href="https://android.googlesource.com/?format=HTML">AOSP git root</a>, under “/platform/prebuilts/gcc”</p>
</li>
<li>
<p><a href="https://softwarebakery.com/building-the-android-kernel-on-linux">Packing and Flashing a Boot.img</a> <strong>[highly recommended]</strong></p>
</li>
</ul>
<h1 id="2016-12">2016-12</h1>

<h2 id="android-kernel-1">Android Kernel</h2>

<h3 id="how-to-compile-with-ftrace">How to compile with ftrace?</h3>

<p>If we want to debug on Android, ftrace is a great tool. However, ftrace is not available on Android with the default configuration file. The Android kernel configuration is in <strong>arch/arm64/kernel/configs</strong>. We need to add a few lines to it.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="nv">CONFIG_STRICT_MEMORY_RWX</span><span class="o">=</span>y
<span class="nv">CONFIG_FUNCTION_TRACER</span><span class="o">=</span>y
<span class="nv">CONFIG_FUNCTION_GRAPH_TRACER</span><span class="o">=</span>y
<span class="nv">CONFIG_DYNAMIC_FTRACE</span><span class="o">=</span>y
<span class="nv">CONFIG_PERSISTENT_TRACER</span><span class="o">=</span>y
<span class="nv">CONFIG_IRQSOFF_TRACER</span><span class="o">=</span>y
<span class="nv">CONFIG_PREEMPT_TRACER</span><span class="o">=</span>y
<span class="nv">CONFIG_SCHED_TRACER</span><span class="o">=</span>y
<span class="nv">CONFIG_STACK_TRACER</span><span class="o">=</span>y
</code></pre></div></div>
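<p>Before building, it can help to check which of these options are still missing from a configuration file. A minimal sketch, assuming the file uses the usual <em>NAME=y</em> lines; the function name <em>missing_options</em> is made up for this example.</p>

```python
REQUIRED = [
    "CONFIG_STRICT_MEMORY_RWX",
    "CONFIG_FUNCTION_TRACER",
    "CONFIG_FUNCTION_GRAPH_TRACER",
    "CONFIG_DYNAMIC_FTRACE",
    "CONFIG_PERSISTENT_TRACER",
    "CONFIG_IRQSOFF_TRACER",
    "CONFIG_PREEMPT_TRACER",
    "CONFIG_SCHED_TRACER",
    "CONFIG_STACK_TRACER",
]


def missing_options(config_text: str, required=REQUIRED) -> list:
    """Return the required options that are not set to 'y' in the config text."""
    enabled = set()
    for line in config_text.splitlines():
        line = line.strip()
        if line.startswith("#") or "=" not in line:
            continue  # skips comments such as '# CONFIG_FOO is not set'
        name, value = line.split("=", 1)
        if value == "y":
            enabled.add(name)
    return [opt for opt in required if opt not in enabled]
```

<p>Calling <em>missing_options(open(path).read())</em> on the configuration file, where <em>path</em> is wherever your kernel tree keeps it, lists the lines that still have to be added.</p>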
<h3 id="how-to-extract-android-images-dump-an-image">How to extract android images: Dump an image</h3>

<p>If we want to keep root after flashing the boot image, we need to extract an image from the Android device. We can first use the following command to find which block device each partition belongs to. <a href="http://forum.xda-developers.com/showthread.php?t=2450045">This article</a> provides three ways to dump an image; I picked the easiest one.</p>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code>adb shell
<span class="nb">ls</span> <span class="nt">-al</span> /dev/block/platform/<span class="nv">$SOME_DEVICE</span>../../by-name <span class="c"># {Partitions} -&gt; {Device Block}</span>

<span class="c"># dump file</span>
su
<span class="nb">dd </span><span class="k">if</span><span class="o">=</span>/dev/block/mmcblk0p37 <span class="nv">of</span><span class="o">=</span>/sdcard/boot.img
</code></pre></div></div>
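<p>The symlink listing printed by the <em>ls -al</em> command can be turned into a partition map with a few lines of Python. This is only a sketch: the function name <em>parse_by_name</em> is hypothetical, and the sample line in the usage note is invented, with the block device taken from the dd command above.</p>

```python
def parse_by_name(listing: str) -> dict:
    """Map partition names to block devices from 'ls -al' symlink lines ('name -> target')."""
    mapping = {}
    for line in listing.splitlines():
        if " -> " not in line:
            continue  # not a symlink entry, e.g. the 'total' line or a directory
        left, target = line.rsplit(" -> ", 1)
        name = left.split()[-1]  # last whitespace-separated field before the arrow
        mapping[name] = target.strip()
    return mapping
```

<p>For a line such as <em>lrwxrwxrwx root root 2016-12-01 10:00 boot -&gt; /dev/block/mmcblk0p37</em>, the result maps <em>boot</em> to <em>/dev/block/mmcblk0p37</em>, which is the path to pass to dd.</p>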
</description>
<pubDate>Fri, 28 Oct 2016 12:27:33 -0400</pubDate>
<link>https://codersherlock.github.com//archivers/some-of-my-previews-exper-work</link>
<guid isPermaLink="true">https://codersherlock.github.com//archivers/some-of-my-previews-exper-work</guid>
<category>Research</category>
</item>

<item>
<title>Using Charles proxy to monitor mobile SSL traffic</title>
<description><p>In this post, I will give a general overview of how to use the proper tools to monitor the SSL traffic of a mobile device. Currently, I can only deal with SSL traffic that uses an obvious certificate. Some applications may not use the system root certificate, or they don’t provide a way to modify their own certificates. For these situations, I still haven’t found a good solution, but I’ll keep this post updated if I find one.<br />
My current solution is to use an AP to forward all SSL traffic to a proxy. <a href="https://www.charlesproxy.com/">Charles proxy</a> is my first choice (my professor asked for it). It’s non-free software that is still receiving new versions. So I’ll mainly talk about how to set up Charles as an SSL proxy.</p>
<h3 id="preparations">Preparations</h3>
<ul>
<li>Monitoring device: a Linux machine with a wireless adapter</li>
<li>Download the newest version (4.0.1) of Charles</li>
<li>Target Android devices with root privileges</li>
</ul>
<h3 id="install-charles-and-configuration">Install Charles and Configuration</h3>

<ul>
<li>You have to install Charles first. After downloading Charles proxy, unzip it and configure some basic settings.</li>
</ul>

<div class="language-bash highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c"># open charles first</span>
./bin/charles
</code></pre></div></div>
<ul>
<li>Save Charles’ private key and public key</li>
</ul>

<p>In Help -&gt; SSL Proxying -&gt; Export Charles Root Certificate and Private Key, enter a password and save the public and private key in *.p12 format.<br />
You also need to save the Charles Root Certificate, which is available in the same menu. For convenience, save it in *.pem format.</p>

<ul>
<li>Set Proxy and SSL Proxy</li>
</ul>
</description>
<pubDate>Thu, 27 Oct 2016 22:50:33 -0400</pubDate>
<link>https://codersherlock.github.com//archivers/charles-is-not-a-good-tool</link>
<guid isPermaLink="true">https://codersherlock.github.com//archivers/charles-is-not-a-good-tool</guid>
<category>Network</category>
</item>

<item>
<title>Stop Talking is the worst title of one blog</title>
<description>
</description>
<pubDate>Wed, 26 Oct 2016 22:50:33 -0400</pubDate>
<link>https://codersherlock.github.com//archivers/hello</link>
<guid isPermaLink="true">https://codersherlock.github.com//archivers/hello</guid>
<category>Nonsense</category>
</item>

</channel>
</rss>