CoderSherlock.github.io/_site/feed.xml

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom">
  <channel>
    <title>Stop Talking, Start Doing - 停止空想，开始行动</title>
    <description>My personal blog, with some boring research stuff and some tricks I have taken a fancy to. I'll try my best to make this blog fun and useful, not just a place where I complain about everything that happens in my lab.
</description>
    <link>https://codersherlock.github.com//</link>
    <atom:link href="https://codersherlock.github.com//feed.xml" rel="self" type="application/rss+xml"/>
    <pubDate>Tue, 15 Sep 2020 22:22:06 -0400</pubDate>
    <lastBuildDate>Tue, 15 Sep 2020 22:22:06 -0400</lastBuildDate>
    <generator>Jekyll v4.1.1</generator>

      <item>
        <title>Generate Word Cloud Figures with Chinese-Tokenization and WordCloud python libraries</title>
        <description>&lt;p&gt;&lt;img src=&quot;/static/2020-09/2020-06-28.png&quot; height=&quot;350&quot; /&gt;&lt;/p&gt;

&lt;h2 id=&quot;background&quot;&gt;Background&lt;/h2&gt;

&lt;p&gt;Recently, I set up a web-based RSS client for retrieving and organizing everyday news. I used &lt;a href=&quot;https://tt-rss.org/&quot;&gt;TinyTinyRSS&lt;/a&gt; (ttrss for short), a popular RSS client that works well with Docker. Thanks to developer &lt;a href=&quot;https://ttrss.henry.wang/#about&quot;&gt;HenryQW&lt;/a&gt;, a well-written Nginx-based Docker configuration is already available on Docker Hub. As I added more feeds, I found that some of them do not need to be checked every day. So I decided to write a script that lists all keywords that appeared over a recent period and generates a heat-map-like figure from them.&lt;/p&gt;

&lt;p&gt;Before going further, here is a general overview of my setup.&lt;/p&gt;

&lt;p&gt;My first step is to read all text content from TTRSS’s PostgreSQL database. From that text, I used a Chinese NLP library, &lt;a href=&quot;https://github.com/fxsjy/jieba&quot;&gt;jieba&lt;/a&gt;, to extract all keywords along with their occurrence frequencies. Then &lt;a href=&quot;https://github.com/amueller/word_cloud&quot;&gt;WordCloud&lt;/a&gt;, a Python library, generates and displays the word cloud figure. More details are discussed in later sections.&lt;/p&gt;

&lt;h2 id=&quot;get-rss-feeds-text&quot;&gt;Get RSS feeds’ text&lt;/h2&gt;

&lt;p&gt;My first idea was to generate a keyword heat map for the past week of economy news. Since this post focuses on Chinese tokenization and drawing the word cloud, I’ll just leave my database code here for reference. The SQL connector I used is &lt;a href=&quot;https://pypi.org/project/psycopg2/&quot;&gt;psycopg2&lt;/a&gt;, an easy-to-use PostgreSQL library.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;__init__&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
	&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dbe&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;psycopg2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;connect&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;
    	&lt;span class=&quot;n&quot;&gt;host&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_HOST&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;port&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_PORT&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;database&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_NAME&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;user&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_USER&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;password&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;DB_PASS&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;get_1w_of_feed_byid&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;dbe&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cursor&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;execute&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'SELECT content FROM public.ttrss_entries &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\
&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;    	where date_updated &amp;gt; now() - interval &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\'&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;1 week&lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\'&lt;/span&gt;&lt;span class=&quot;s&quot;&gt; AND id in ( &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\
&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;        select int_id from DB_TABLE_NAME &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\
&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;        where feed_id='&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;id&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;' &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\
&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;        ) &lt;/span&gt;&lt;span class=&quot;se&quot;&gt;\
&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;        ORDER BY id ASC '&lt;/span&gt;
        &lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;rows&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;cur&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;fetchall&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;rows&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;Most arguments are self-explanatory. The only exception is the argument of the function &lt;em&gt;get_1w_of_feed_byid&lt;/em&gt;: &lt;strong&gt;id&lt;/strong&gt; is the index of the feed among my subscriptions.&lt;/p&gt;
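&lt;p&gt;As a side note, the query above is built by string concatenation. A minimal sketch of the same lookup with a bound parameter follows, assuming any DB-API connection such as the one returned by &lt;em&gt;psycopg2.connect&lt;/em&gt;. The helper names &lt;em&gt;build_week_query&lt;/em&gt; and &lt;em&gt;fetch_week_of_feed&lt;/em&gt; are hypothetical, and DB_TABLE_NAME stays the same placeholder as above.&lt;/p&gt;

```python
# Placeholder carried over from the post; the real table name is not given there.
DB_TABLE_NAME = "DB_TABLE_NAME"


def build_week_query(feed_id) -> tuple:
    """Return (sql, params) in the form expected by cursor.execute(sql, params)."""
    sql = (
        "SELECT content FROM public.ttrss_entries "
        "WHERE date_updated > now() - interval '1 week' "
        "AND id IN (SELECT int_id FROM " + DB_TABLE_NAME + " WHERE feed_id = %s) "
        "ORDER BY id ASC"
    )
    # The feed id travels as a bound parameter instead of being pasted into the SQL.
    return sql, (int(feed_id),)


def fetch_week_of_feed(conn, feed_id=1) -> list:
    """Run the query on an open connection and return all rows."""
    with conn.cursor() as cur:
        cur.execute(*build_week_query(feed_id))
        return cur.fetchall()
```

&lt;p&gt;Only the feed id is bound here. The table name cannot be passed as a query parameter, so it is still spliced in from a constant.&lt;/p&gt;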

&lt;h2 id=&quot;tokenize-with-frequency&quot;&gt;Tokenize with frequency&lt;/h2&gt;

&lt;p&gt;I tried two popular tokenization libraries and chose &lt;a href=&quot;https://github.com/fxsjy/jieba&quot;&gt;jieba&lt;/a&gt; after a few comparisons. Before cutting the sentences, we first need to remove all punctuation marks.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;remove_biaodian&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;punct&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;set&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;u''':!),.:;?]}¢'&quot;、。〉》」』】〕〗〞︰︱︳﹐､﹒
                ﹔﹕﹖﹗﹚﹜﹞！），．：；？｜｝︴︶︸︺︼︾﹀﹂﹄﹏､～￠
                々‖•·ˇˉ―--′’”([{£¥'&quot;‵〈《「『【〔〖（［｛￡￥〝︵︷︹︻
                ︽︿﹁﹃﹙﹛﹝（｛“‘-—_…'''&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;ret&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;&quot;&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;text&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;punct&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;ret&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;''&lt;/span&gt;
        &lt;span class=&quot;k&quot;&gt;else&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
            &lt;span class=&quot;n&quot;&gt;ret&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;+=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;
    &lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ret&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
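&lt;p&gt;The same character-by-character removal can also be written with Python’s built-in &lt;em&gt;str.translate&lt;/em&gt;. This is only a sketch of an alternative, not the code I ran: the name &lt;em&gt;strip_punct&lt;/em&gt; is hypothetical, and PUNCT is a shortened subset of the punctuation set above.&lt;/p&gt;

```python
# Shortened, illustrative subset of the punctuation set used above.
PUNCT = ":!),.;?]}'\"、。》」』】！），：；？“”‘’（《「『【…"

# Translation table that maps every character in PUNCT to "deleted".
_DELETE_PUNCT = str.maketrans("", "", PUNCT)


def strip_punct(text: str) -> str:
    """Return text with every character listed in PUNCT removed."""
    return text.translate(_DELETE_PUNCT)


print(strip_punct("你好，世界！"))
```

&lt;p&gt;Building the table once avoids the repeated string concatenation inside the loop.&lt;/p&gt;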

&lt;p&gt;Once we have a string containing only characters, we can call jieba. The function &lt;em&gt;jieba.posseg.cut&lt;/em&gt;, with or without paddle mode, returns a list of words together with their parts of speech. As the following code shows, I also did two more things.&lt;/p&gt;

&lt;p&gt;First, in the if statement, I kept only nouns from certain categories. Abbreviations such as “nr” and “ns” stand for different parts of speech; the categories I used are listed in the table after the code. More details can be found at this &lt;a href=&quot;https://github.com/fxsjy/jieba&quot;&gt;link&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;Second, I kept only words that are at least 2 characters long. Unlike Latin writing systems, Chinese has no spaces between words. As a result, single-character words such as conjunctions are easily misrecognized as proper nouns, and each such error tends to cause more single characters to be tagged as proper nouns. I am not able to improve the NLP method itself, so I took the easy way out and removed every word shorter than 2 characters.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;jieba.posseg&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pseg&lt;/span&gt;

&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;get_noun_jieba&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;bp&quot;&gt;self&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;content&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;str&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;-&amp;gt;&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;list&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;content&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;remove_biaodian&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;content&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;words&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;pseg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;cut&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;content&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;	&lt;span class=&quot;c1&quot;&gt;# Invoking jieba.posseg.cut function
&lt;/span&gt;
	&lt;span class=&quot;n&quot;&gt;ret&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[]&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;word&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;flag&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;words&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt;
		&lt;span class=&quot;c1&quot;&gt;# print(word, flag)
&lt;/span&gt;		&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;flag&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'nr'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'ns'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'nt'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'nw'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'nz'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'PER'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'ORG'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;'x'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]:&lt;/span&gt;   &lt;span class=&quot;c1&quot;&gt;# LOC
&lt;/span&gt;			&lt;span class=&quot;n&quot;&gt;ret&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;append&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;word&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
	&lt;span class=&quot;k&quot;&gt;return&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;remove_biaodian&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;for&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;in&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;ret&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;if&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;strip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;!=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;&quot;&lt;/span&gt; &lt;span class=&quot;ow&quot;&gt;and&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;len&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;remove_biaodian&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;i&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;strip&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()))&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;&amp;gt;=&lt;/span&gt; &lt;span class=&quot;mi&quot;&gt;2&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
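&lt;p&gt;To make the two filters easier to see in isolation, here is a small sketch that works on plain (word, tag) pairs, so it runs without jieba installed. The function name &lt;em&gt;filter_nouns&lt;/em&gt; and the sample pairs are made up for illustration; real pairs would come from &lt;em&gt;jieba.posseg.cut&lt;/em&gt;.&lt;/p&gt;

```python
# Tags kept by the if statement in the code above.
KEEP_FLAGS = {"nr", "ns", "nt", "nw", "nz", "PER", "ORG", "x"}


def filter_nouns(pairs, keep_flags=KEEP_FLAGS, min_len=2):
    """Keep words whose tag is in keep_flags and whose stripped length is at least min_len."""
    kept = []
    for word, flag in pairs:
        word = word.strip()
        if flag in keep_flags and len(word) >= min_len:
            kept.append(word)
    return kept


# Hand-written (word, tag) pairs standing in for tokenizer output.
sample = [("北京", "ns"), ("和", "c"), ("新冠病毒", "nz"), ("李", "nr"), ("增长", "v")]
print(filter_nouns(sample))
```

&lt;p&gt;Here “和” and “增长” are dropped by the tag filter, and “李” is dropped by the length filter even though its tag is accepted.&lt;/p&gt;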

&lt;ul&gt;
  &lt;li&gt;Word category names and abbreviations&lt;/li&gt;
&lt;/ul&gt;

&lt;table&gt;
  &lt;thead&gt;
    &lt;tr&gt;
      &lt;th&gt;Abbreviation&lt;/th&gt;
      &lt;th&gt;Category name / Part of speech&lt;/th&gt;
    &lt;/tr&gt;
  &lt;/thead&gt;
  &lt;tbody&gt;
    &lt;tr&gt;
      &lt;td&gt;nr&lt;/td&gt;
      &lt;td&gt;People name noun&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ns&lt;/td&gt;
      &lt;td&gt;Location name noun&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;nt&lt;/td&gt;
      &lt;td&gt;Organization name noun&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;nw&lt;/td&gt;
      &lt;td&gt;Arts work noun&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;nz&lt;/td&gt;
      &lt;td&gt;Other proper noun&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;PER&lt;/td&gt;
      &lt;td&gt;People name noun&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;ORG&lt;/td&gt;
      &lt;td&gt;Organization name noun&lt;/td&gt;
    &lt;/tr&gt;
    &lt;tr&gt;
      &lt;td&gt;x&lt;/td&gt;
      &lt;td&gt;Non-morpheme word&lt;/td&gt;
    &lt;/tr&gt;
  &lt;/tbody&gt;
&lt;/table&gt;

&lt;p&gt;With all words extracted, we can easily calculate their frequencies. After that, we can use the following code to print a sorted result and verify correctness.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;n&quot;&gt;noun&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;seg&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;get_noun_jieba&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;test_content&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
&lt;span class=&quot;c1&quot;&gt;# ... Calculate frequency of above word list ...
&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;print&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;nb&quot;&gt;sorted&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;a_dict&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;items&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(),&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;key&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;k&quot;&gt;lambda&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;x&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]))&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
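&lt;p&gt;The frequency step elided in the snippet above can be done in many ways. One possible sketch uses &lt;em&gt;collections.Counter&lt;/em&gt; from the standard library; the helper name &lt;em&gt;count_frequencies&lt;/em&gt; and the sample word list are assumptions for illustration.&lt;/p&gt;

```python
from collections import Counter


def count_frequencies(words: list) -> dict:
    """Map each word to the number of times it occurs in words."""
    return dict(Counter(words))


# Made-up word list standing in for the output of get_noun_jieba.
nouns = ["新冠", "北京", "新冠", "经济"]
freq = count_frequencies(nouns)

# Most frequent first, which is handy when checking the result by eye.
top = sorted(freq.items(), key=lambda kv: kv[1], reverse=True)
print(top)
```

&lt;p&gt;The resulting dictionary has the word-to-count shape that &lt;em&gt;generate_from_frequencies&lt;/em&gt; takes in the next section.&lt;/p&gt;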

&lt;h2 id=&quot;draw-word-cloud&quot;&gt;Draw word cloud&lt;/h2&gt;

&lt;p&gt;With a dictionary that maps keywords to frequencies, we can simply call the functions provided by the wordcloud library to generate the figure.&lt;/p&gt;

&lt;p&gt;First we need to create an instance of the WordCloud class. As you can see in my code, I set 6 parameters: the width and height of the canvas, the maximum number of words used to generate the figure, the font, the background color, and the margin between any two words.&lt;/p&gt;

&lt;p&gt;With the instance ready, we call the function &lt;em&gt;generate_from_frequencies&lt;/em&gt; and pass the keyword dictionary to it. The function returns the WordCloud object itself, which now holds the rendered image, and we can pass it to &lt;a href=&quot;https://matplotlib.org/&quot;&gt;matplotlib&lt;/a&gt; to display it on screen.&lt;/p&gt;

&lt;p&gt;I tested my plot on the Ubuntu subsystem on Windows 10. Unfortunately, matplotlib under the subsystem needs an X11 server to open a window, and none is available on Windows by default, so we need to install one. &lt;a href=&quot;https://sourceforge.net/projects/xming/&quot;&gt;Xming&lt;/a&gt; is the one I used.&lt;/p&gt;

&lt;div class=&quot;language-python highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kn&quot;&gt;from&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;wordcloud&lt;/span&gt; &lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WordCloud&lt;/span&gt;
&lt;span class=&quot;kn&quot;&gt;import&lt;/span&gt; &lt;span class=&quot;nn&quot;&gt;matplotlib.pyplot&lt;/span&gt; &lt;span class=&quot;k&quot;&gt;as&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;

&lt;span class=&quot;n&quot;&gt;font_path&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;./font/haipai.ttf&quot;&lt;/span&gt;
&lt;span class=&quot;n&quot;&gt;output_path&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;s&quot;&gt;&quot;./font/out.png&quot;&lt;/span&gt;


&lt;span class=&quot;k&quot;&gt;def&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;show_figure_with_frequency&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;keywords&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;:&lt;/span&gt; &lt;span class=&quot;nb&quot;&gt;dict&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;):&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;wc&lt;/span&gt; &lt;span class=&quot;o&quot;&gt;=&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;WordCloud&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;width&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;828&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;height&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1792&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;max_words&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;200&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;font_path&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;font_path&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt;
                   &lt;span class=&quot;n&quot;&gt;background_color&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;&quot;white&quot;&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;,&lt;/span&gt; &lt;span class=&quot;n&quot;&gt;margin&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;&lt;span class=&quot;mi&quot;&gt;1&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;).&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;generate_from_frequencies&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;keywords&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;imshow&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;wc&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;axis&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;s&quot;&gt;'off'&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;)&lt;/span&gt;
    &lt;span class=&quot;n&quot;&gt;plt&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;.&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;show&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;()&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
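&lt;p&gt;The code above defines output_path but only shows the figure in a window. If no X11 server is available, one option is to write the image straight to a file with the &lt;em&gt;to_file&lt;/em&gt; method of WordCloud. This is a sketch under that assumption; the function name &lt;em&gt;save_wordcloud&lt;/em&gt; is hypothetical.&lt;/p&gt;

```python
def save_wordcloud(keywords: dict, font_path: str, output_path: str) -> None:
    """Render keywords (word -> frequency) and write the image to output_path."""
    # Imported lazily so this sketch can be loaded without the library installed.
    from wordcloud import WordCloud

    wc = WordCloud(width=828, height=1792, max_words=200, font_path=font_path,
                   background_color="white", margin=1)
    wc.generate_from_frequencies(keywords)
    wc.to_file(output_path)  # writes the image file; no window is opened


# Example call, using the paths defined above:
# save_wordcloud({"新冠": 2, "北京": 1}, "./font/haipai.ttf", "./font/out.png")
```

&lt;p&gt;A font that contains Chinese glyphs is still required, which is why font_path is passed explicitly.&lt;/p&gt;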

&lt;p&gt;If everything works, a word cloud figure will show up in a new window. Mine looks like this.&lt;/p&gt;

&lt;p&gt;&lt;img src=&quot;/static/2020-09/2020-06-28.png&quot; height=&quot;150&quot; /&gt;&lt;/p&gt;

&lt;p&gt;This word cloud reflects the most frequent keywords in economy news for the week starting 06-28-2020. The two largest words in the figure are “新冠” and “新冠病毒”, both of which refer to the novel coronavirus behind Covid-19 (this was the week of the second Covid wave in Beijing, China). The image is sized to fit my phone screen, so I can use an app to automatically sync it as my phone’s wallpaper. However, too many location nouns appear in this image. This is something I can improve in the future.&lt;/p&gt;

</description>
        <pubDate>Tue, 15 Sep 2020 22:00:14 -0400</pubDate>
        <link>https://codersherlock.github.com//archivers/generate-word-cloud-with-chinese-fenci</link>
        <guid isPermaLink="true">https://codersherlock.github.com//archivers/generate-word-cloud-with-chinese-fenci</guid>


        <category>visualization</category>

      </item>

      <item>
        <title>Xv6 introduction</title>
        <description>&lt;p&gt;I hate xv6, a stupid, useless education-oriented system. In this article, I will give a general walkthrough of how to add a system call to this operating system.&lt;/p&gt;

&lt;h2 id=&quot;xv6-systemcall&quot;&gt;Xv6 Systemcall&lt;/h2&gt;
&lt;p&gt;To invoke a system call, we first have to declare a user-mode function in the file &lt;em&gt;user.h&lt;/em&gt; that serves as the interface to the kernel.&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;function&lt;/span&gt; &lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;The name of this interface function, in this case function, must also be registered in &lt;em&gt;usys.S&lt;/em&gt;. When a program calls the user-mode function, the stub in &lt;em&gt;usys.S&lt;/em&gt; refers to SYS_function and loads the system call number of this function into %eax. The kernel then looks that number up in &lt;em&gt;syscall.c&lt;/em&gt; to determine whether the system call is available. So we must define a system function with the matching name and add it to &lt;em&gt;syscall.h&lt;/em&gt; and &lt;em&gt;syscall.c&lt;/em&gt;.&lt;/p&gt;

&lt;div class=&quot;language-c highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;cp&quot;&gt;#define SYS_function ##		// ## is the system call number
&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;[&lt;/span&gt;&lt;span class=&quot;n&quot;&gt;SYS_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;]&lt;/span&gt;  &lt;span class=&quot;n&quot;&gt;sys_function&lt;/span&gt;	&lt;span class=&quot;c1&quot;&gt;// real system function name&lt;/span&gt;
&lt;span class=&quot;k&quot;&gt;extern&lt;/span&gt; &lt;span class=&quot;kt&quot;&gt;int&lt;/span&gt; &lt;span class=&quot;nf&quot;&gt;sys_function&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;(&lt;/span&gt;&lt;span class=&quot;kt&quot;&gt;void&lt;/span&gt;&lt;span class=&quot;p&quot;&gt;);&lt;/span&gt;	&lt;span class=&quot;c1&quot;&gt;// real system function declaration&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;After adding these lines to the syscall files, we can implement the real function wherever it fits best in the kernel source.&lt;/p&gt;

&lt;p&gt;Sometimes we need to pass arguments to a system call. Arguments are not, and in fact cannot be, passed directly into sys_function, which is declared with a void parameter list. Instead, when a system call is invoked, all of its arguments are pushed onto the current process’s stack. The file &lt;em&gt;syscall.c&lt;/em&gt; provides several functions to fetch these arguments from the process. I won’t spend time explaining how to use them, since the source code already has elegant and detailed comments. However, I will explain the concepts behind processes and how they are organized in xv6 in future articles.&lt;/p&gt;
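&lt;p&gt;The flow described above can be pictured with a toy model. The sketch below is plain Python, not xv6 code: the dictionary standing in for a process, the system call number 22, and the doubling behavior of the handler are all assumptions made up for illustration. It only mimics the idea of dispatching on the number in %eax and reading arguments from the stack.&lt;/p&gt;

```python
SYS_function = 22  # hypothetical system call number


def argint(proc, n):
    """Fetch the n-th integer argument from the modeled user stack.

    Slot 0 models the return address, so arguments start at slot 1.
    """
    return proc["stack"][1 + n]


def sys_function(proc):
    """Kernel-side handler: receives no direct arguments, reads them from the stack."""
    return argint(proc, 0) * 2


# Models the table in syscall.c that maps numbers to handlers.
SYSCALLS = {SYS_function: sys_function}


def syscall(proc):
    """Dispatch on the number in eax and store the result back into eax."""
    handler = SYSCALLS.get(proc["eax"])
    proc["eax"] = handler(proc) if handler else -1
    return proc["eax"]


proc = {"eax": SYS_function, "stack": [0, 21]}
print(syscall(proc))
```

&lt;p&gt;An unknown number finds no handler in the table and yields -1, which models the availability check mentioned earlier.&lt;/p&gt;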
</description>
        <pubDate>Fri, 28 Jul 2017 14:56:55 -0400</pubDate>
        <link>https://codersherlock.github.com//archivers/intro-xv6</link>
        <guid isPermaLink="true">https://codersherlock.github.com//archivers/intro-xv6</guid>


        <category>xv6</category>

      </item>

      <item>
        <title>Some of my previous experimental work: 2016</title>
        <description>&lt;p&gt;This post contains only a basic record of my work. I will write dedicated posts for the details of specific topics.&lt;/p&gt;

&lt;h1 id=&quot;2016-10&quot;&gt;2016-10&lt;/h1&gt;

&lt;h2 id=&quot;time-experiment-of-rsync&quot;&gt;Time Experiment of rsync&lt;/h2&gt;

&lt;p&gt;The patch is based on rsync version 3.1.2. [&lt;a href=&quot;https://download.samba.org/pub/rsync/rsync-3.1.2.tar.gz&quot;&gt;Rsync&lt;/a&gt;|&lt;a href=&quot;/static/2016-10/rsync/rsync-3.1.2-time.patch&quot;&gt;Patch&lt;/a&gt;]&lt;/p&gt;

&lt;h3 id=&quot;how-to-collect-data&quot;&gt;How to collect data&lt;/h3&gt;

&lt;p&gt;Basically, the transmission time and computation time, along with the overall time, are printed to the console.
But we also need some bash scripts to collect data across random files of different sizes, with different modifications applied to them.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;From 8K to 64M, modified at the beginning, [&lt;a href=&quot;/static/2016-10/rsync/small2Big_change_at_begin.sh&quot;&gt;Bash script&lt;/a&gt;]&lt;/li&gt;
  &lt;li&gt;From 8K to 64M, modified at the end, [&lt;a href=&quot;/static/2016-10/rsync/small2Big_change_at_last.sh&quot;&gt;Bash script&lt;/a&gt;]&lt;/li&gt;
  &lt;li&gt;From 8K to 64M, modified at a random place with a (slow) Python script, [&lt;a href=&quot;/static/2016-10/rsync/small2Big_change_at_anyplace.sh&quot;&gt;Bash script&lt;/a&gt;|&lt;a href=&quot;/static/2016-10/rsync/addbyte.py&quot;&gt;Python program&lt;/a&gt;]&lt;/li&gt;
&lt;/ul&gt;

&lt;h2 id=&quot;time-experiment-of-seafile&quot;&gt;Time Experiment of seafile&lt;/h2&gt;

&lt;p&gt;The patch is based on seafile 5.1.4. You can find the release in the &lt;a href=&quot;https://github.com/haiwen/seafile/releases&quot;&gt;seafile official repo&lt;/a&gt;. You may follow the official compile instructions &lt;a href=&quot;https://manual.seafile.com/build_seafile/linux.html&quot;&gt;here&lt;/a&gt;. [&lt;a href=&quot;&quot;&gt;Patch &lt;strong&gt;no longer available, new version in following sections&lt;/strong&gt;&lt;/a&gt;]&lt;/p&gt;

&lt;h3 id=&quot;how-to-collect-data-1&quot;&gt;How to collect data&lt;/h3&gt;

&lt;p&gt;Here, too, everything is done with scripts. But this time I used larger steps between successive file sizes.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;From 8K to 16M, increasing 4x at each step, modified at the beginning or at 1024 different places with a Python script. [&lt;a href=&quot;/static/2016-11/seafile/trans.sh&quot;&gt;Bash Script&lt;/a&gt;|&lt;a href=&quot;/static/2016-11/seafile/addbyte.py&quot;&gt;Python program&lt;/a&gt;]&lt;/li&gt;
  &lt;li&gt;After running this automated test script, all output is recorded in the seafile log file, located at &lt;strong&gt;~/.ccnet/log/seafile.log&lt;/strong&gt;&lt;/li&gt;
  &lt;li&gt;We then use a simple awk command, plus some vim editing, to extract the data.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# CDC: content-defined chunks&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# HUT: HTTP upload traffic&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# ALL: overall time of one commit &amp;amp; upload&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;awk&lt;/span&gt; &lt;span class=&quot;s1&quot;&gt;'/CDC|HUT|ALL/ {print $4,$5}'&lt;/span&gt; ~/.ccnet/log/seafile.log &lt;span class=&quot;o&quot;&gt;&amp;gt;&lt;/span&gt; results.stat
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
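
&lt;p&gt;To get a quick summary out of results.stat, an awk one-liner like the following could work. This is only a sketch: it assumes every line of results.stat has the tag (CDC, HUT or ALL) in the first column and a numeric time in the second, which depends on how the patch formats its log lines.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# Assumed layout of results.stat: TAG TIME, one pair per line&lt;/span&gt;
&lt;span class=&quot;c&quot;&gt;# Prints, for each tag: tag, number of samples, mean time&lt;/span&gt;
awk '{ sum[$1] += $2; n[$1]++ } END { for (t in sum) printf &quot;%s %d %.3f\n&quot;, t, n[t], sum[t] / n[t] }' results.stat
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;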

&lt;h3 id=&quot;install-seafile-on-odroid-xu&quot;&gt;Install Seafile on odroid xu&lt;/h3&gt;

&lt;p&gt;Because my attempt to cross-compile Seafile for Android failed, I used a development board as a substitute platform for testing Seafile on ARM. I used an &lt;a href=&quot;http://www.hardkernel.com/main/products/prdt_info.php?g_code=G137510300620&quot;&gt;odroid xu&lt;/a&gt; as the hardware. Since all I need is an ARM platform, an ARM build of Ubuntu is enough for me. Prototyping on a board is much more fun than coding, but I won’t go into it this time. I’ll start a separate blog post about some really cool stuff I made for a strange purpose.&lt;/p&gt;

&lt;p&gt;Installing Ubuntu with a GUI is all the preparation I need. I found two ways to do this.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;http://www.armhf.com/boards/odroid-xu/&quot;&gt;armhf&lt;/a&gt; is a website for ARM-based Ubuntu. It has detailed instructions to follow &lt;a href=&quot;http://www.armhf.com/boards/odroid-xu/odroid-sd-install/&quot;&gt;here&lt;/a&gt;, and offers Ubuntu 12.04/14.04 and Debian 7.5 to choose from. Unfortunately, the odroid xu’s HDMI output isn’t supported by Ubuntu’s native firmware, so after installing ubuntu-desktop the board might not boot with working video output.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;Burning a prebuilt image is a much easier way to install a pre-compiled Ubuntu system. I found one on the odroid xu forum: an xubuntu image [&lt;a href=&quot;http://odroid.in/ubuntu_14.04lts/ubuntu-14.04lts-xubuntu-odroid-xu-20140714.img.xz&quot;&gt;download&lt;/a&gt;] for the odroid xu. With this image, you just need the dd command to write the whole system image onto the SD card.&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# If the image file ends with .xz, decompress it first&lt;/span&gt;
unxz ubuntu-14.04lts-xubuntu-odroid-xu-20140714.img.xz
&lt;span class=&quot;c&quot;&gt;# Write the image to the SD card (/dev/sdb here; double-check the device name, dd overwrites it)&lt;/span&gt;
&lt;span class=&quot;nb&quot;&gt;sudo dd &lt;/span&gt;&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;ubuntu-14.04lts-xubuntu-odroid-xu-20140714.img &lt;span class=&quot;nv&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;/dev/sdb &lt;span class=&quot;nv&quot;&gt;bs&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;1M &lt;span class=&quot;nv&quot;&gt;conv&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;fsync
&lt;span class=&quot;nb&quot;&gt;sync&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
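
&lt;p&gt;Since dd overwrites whatever device it is pointed at, it is worth checking which device really is the SD card before writing. A minimal sketch, assuming the card shows up as /dev/sdb with a first partition /dev/sdb1 (both names are only examples and will differ between machines):&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# List block devices; run once before and once after inserting the card and compare&lt;/span&gt;
lsblk -o NAME,SIZE,TRAN,MOUNTPOINT
&lt;span class=&quot;c&quot;&gt;# Unmount the card's partition if the desktop auto-mounted it (example device name)&lt;/span&gt;
sudo umount /dev/sdb1
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;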

&lt;h1 id=&quot;2016-11&quot;&gt;2016-11&lt;/h1&gt;

&lt;h2 id=&quot;android-kernel&quot;&gt;Android Kernel&lt;/h2&gt;

&lt;h3 id=&quot;how-to-build-an-android-kernel&quot;&gt;How to build an Android Kernel?&lt;/h3&gt;

&lt;p&gt;I won’t go into detail in this part. I’ll just list some related links and point out a few pitfalls and their fixes.&lt;/p&gt;

&lt;ul&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;http://source.android.com/source/building-kernels.html#figuring-out-which-kernel-to-build&quot;&gt;Google Official Guide&lt;/a&gt;
– If you don’t have the AOSP sources, you have to download a prebuilt toolchain, and the one recommended in this guide might not be the right one. Use the following link to pick a toolchain that fits:
&lt;a href=&quot;https://android.googlesource.com/?format=HTML&quot;&gt;AOSP git root&lt;/a&gt;, under “/platform/prebuilts/gcc”.&lt;/p&gt;
  &lt;/li&gt;
  &lt;li&gt;
    &lt;p&gt;&lt;a href=&quot;https://softwarebakery.com/building-the-android-kernel-on-linux&quot;&gt;Packing and Flashing a Boot.img&lt;/a&gt; &lt;strong&gt;[highly recommended]&lt;/strong&gt;&lt;/p&gt;
  &lt;/li&gt;
&lt;/ul&gt;

&lt;h1 id=&quot;2016-12&quot;&gt;2016-12&lt;/h1&gt;

&lt;h2 id=&quot;android-kernel-1&quot;&gt;Android Kernel&lt;/h2&gt;

&lt;h3 id=&quot;how-to-compile-with-ftrace&quot;&gt;How to compile with ftrace?&lt;/h3&gt;

&lt;p&gt;ftrace is a great tool for debugging on Android. However, it is not available if the kernel is built with the default configuration file. The Android kernel configuration files are in &lt;strong&gt;arch/arm64/configs&lt;/strong&gt;. We need to add a few lines to the one we build with.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;nv&quot;&gt;CONFIG_STRICT_MEMORY_RWX&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;y
&lt;span class=&quot;nv&quot;&gt;CONFIG_FUNCTION_TRACER&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;y
&lt;span class=&quot;nv&quot;&gt;CONFIG_FUNCTION_GRAPH_TRACER&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;y
&lt;span class=&quot;nv&quot;&gt;CONFIG_DYNAMIC_FTRACE&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;y
&lt;span class=&quot;nv&quot;&gt;CONFIG_PERSISTENT_TRACER&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;y
&lt;span class=&quot;nv&quot;&gt;CONFIG_IRQSOFF_TRACER&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;y
&lt;span class=&quot;nv&quot;&gt;CONFIG_PREEMPT_TRACER&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;y
&lt;span class=&quot;nv&quot;&gt;CONFIG_SCHED_TRACER&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;y
&lt;span class=&quot;nv&quot;&gt;CONFIG_STACK_TRACER&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;y
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
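
&lt;p&gt;Once a kernel built with these options is running, ftrace is driven through the tracing directory in debugfs. The following is a rough sketch of one tracing session, assuming a rooted device and that debugfs can be mounted at /sys/kernel/debug; the output path /sdcard/trace.txt is just an example.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;adb shell
su
&lt;span class=&quot;c&quot;&gt;# Mount debugfs (skip this if it is already mounted)&lt;/span&gt;
mount -t debugfs none /sys/kernel/debug
cd /sys/kernel/debug/tracing
&lt;span class=&quot;c&quot;&gt;# Check which tracers this kernel actually provides&lt;/span&gt;
cat available_tracers
&lt;span class=&quot;c&quot;&gt;# Pick a tracer and start tracing&lt;/span&gt;
echo function_graph &amp;gt; current_tracer
echo 1 &amp;gt; tracing_on
&lt;span class=&quot;c&quot;&gt;# ... run the workload to be traced, then stop and save the result&lt;/span&gt;
echo 0 &amp;gt; tracing_on
cat trace &amp;gt; /sdcard/trace.txt
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;

&lt;p&gt;If function_graph does not show up in available_tracers, the matching option was probably not picked up by the build.&lt;/p&gt;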

&lt;h3 id=&quot;how-to-extract-android-images-dump-an-image&quot;&gt;How to extract android images: Dump an image&lt;/h3&gt;

&lt;p&gt;If we want to keep root after flashing a new boot image, we first need to extract the existing image from the Android device. The first command below shows which block device each partition maps to. &lt;a href=&quot;http://forum.xda-developers.com/showthread.php?t=2450045&quot;&gt;This article&lt;/a&gt; describes three ways to dump an image; I picked the one that is easiest to use.&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;adb shell
&lt;span class=&quot;nb&quot;&gt;ls&lt;/span&gt; &lt;span class=&quot;nt&quot;&gt;-al&lt;/span&gt; /dev/block/platform/&lt;span class=&quot;nv&quot;&gt;$SOME_DEVICE&lt;/span&gt;../../by-name &lt;span class=&quot;c&quot;&gt;# {Partitions} -&amp;gt; {Device Block}&lt;/span&gt;

&lt;span class=&quot;c&quot;&gt;# Dump the boot partition; replace mmcblk0p37 with the block device found above&lt;/span&gt;
su
&lt;span class=&quot;nb&quot;&gt;dd &lt;/span&gt;&lt;span class=&quot;k&quot;&gt;if&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;/dev/block/mmcblk0p37 &lt;span class=&quot;nv&quot;&gt;of&lt;/span&gt;&lt;span class=&quot;o&quot;&gt;=&lt;/span&gt;/sdcard/boot.img
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
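
&lt;p&gt;The dumped image still sits on the device at this point. A small sketch of copying it to the host, assuming the dump was written to /sdcard/boot.img as above (the local file name is arbitrary):&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# Run on the host, after leaving the adb shell&lt;/span&gt;
adb pull /sdcard/boot.img boot.img
&lt;span class=&quot;c&quot;&gt;# Quick sanity check: the size should match the size of the boot partition&lt;/span&gt;
ls -l boot.img
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;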
</description>
        <pubDate>Fri, 28 Oct 2016 12:27:33 -0400</pubDate>
        <link>https://codersherlock.github.com//archivers/some-of-my-previews-exper-work</link>
        <guid isPermaLink="true">https://codersherlock.github.com//archivers/some-of-my-previews-exper-work</guid>


        <category>Research</category>

      </item>

      <item>
        <title>Using Charles Proxy to monitor mobile SSL traffic</title>
        <description>&lt;p&gt;In this post, I will talk about how to use the right tools to monitor the SSL traffic of a mobile device. Currently, I can only deal with SSL traffic whose certificates are easy to get at. Some applications may not use the system root certificates, or may not give us a way to modify their own certificates. For these situations I still haven’t found a good solution, but I’ll keep this post updated if I find one.&lt;br /&gt;
My current solution is to use an AP to forward all SSL traffic to a proxy. &lt;a href=&quot;https://www.charlesproxy.com/&quot;&gt;Charles proxy&lt;/a&gt; is my first choice (my professor asked for it). It’s non-free software that is still being updated. So I’ll mainly talk about how to set up Charles as an SSL proxy.&lt;/p&gt;

&lt;h3 id=&quot;preparations&quot;&gt;Preparations&lt;/h3&gt;
&lt;ul&gt;
  &lt;li&gt;Monitoring device: a Linux machine with a wireless adapter&lt;/li&gt;
  &lt;li&gt;The newest version (4.0.1) of Charles, downloaded&lt;/li&gt;
  &lt;li&gt;A target Android device with root privileges&lt;/li&gt;
&lt;/ul&gt;

&lt;h3 id=&quot;install-charles-and-configuration&quot;&gt;Install Charles and Configuration&lt;/h3&gt;

&lt;ul&gt;
  &lt;li&gt;Install Charles first. After downloading Charles proxy, unpack the archive and configure some basic settings.&lt;/li&gt;
&lt;/ul&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# open charles first&lt;/span&gt;
./bin/charles
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;
&lt;ul&gt;
  &lt;li&gt;Save charles’ private key and public key&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;In Help -&amp;gt; SSL Proxying -&amp;gt; Export Charles Root Certificate and Private Key, enter a password and save the public and private key in *.p12 format.&lt;br /&gt;
You also need to save the Charles Root Certificate, which is available from the same menu. For convenience, save it in *.pem format.&lt;/p&gt;
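
&lt;p&gt;Before going on, it can help to check that both exported files are readable. A minimal sketch with openssl, assuming the files were saved as charles.pem and charles.p12 (both file names are only examples):&lt;/p&gt;

&lt;div class=&quot;language-bash highlighter-rouge&quot;&gt;&lt;div class=&quot;highlight&quot;&gt;&lt;pre class=&quot;highlight&quot;&gt;&lt;code&gt;&lt;span class=&quot;c&quot;&gt;# Show the subject and validity period of the exported root certificate&lt;/span&gt;
openssl x509 -in charles.pem -noout -subject -dates
&lt;span class=&quot;c&quot;&gt;# Check the key bundle; this prompts for the password entered during export&lt;/span&gt;
openssl pkcs12 -in charles.p12 -info -noout
&lt;/code&gt;&lt;/pre&gt;&lt;/div&gt;&lt;/div&gt;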

&lt;ul&gt;
  &lt;li&gt;Set Proxy and SSL Proxy&lt;/li&gt;
&lt;/ul&gt;
</description>
        <pubDate>Thu, 27 Oct 2016 22:50:33 -0400</pubDate>
        <link>https://codersherlock.github.com//archivers/charles-is-not-a-good-tool</link>
        <guid isPermaLink="true">https://codersherlock.github.com//archivers/charles-is-not-a-good-tool</guid>


        <category>Network</category>

      </item>

      <item>
        <title>Stop Talking is the worst title of one blog</title>
        <description>
</description>
        <pubDate>Wed, 26 Oct 2016 22:50:33 -0400</pubDate>
        <link>https://codersherlock.github.com//archivers/hello</link>
        <guid isPermaLink="true">https://codersherlock.github.com//archivers/hello</guid>


        <category>Nonsense</category>

      </item>

  </channel>
</rss>