Jekyll 2024-01-03T02:59:55+00:00 https://h313.info/feed.xml
some guy’s blog
just a wack systems dude tbh
Haoda Wang harry@h313.info
The Promised Neverland: A Marxist Analysis
2022-09-05T09:33:13+00:00 https://h313.info/2022/09/05/the-promised-neverland-a-marxist-analysis
<p>In 2019, Fuji TV premiered <em>The Promised Neverland</em>, an anime series revolving around a group of
kids living in an orphanage. Nothing bad happens, and they all live very happy lives. You should
watch it!</p>
<p>Now that you’ve (probably) finished the series, everything below is going to contain spoilers.
Sorry.</p>
<p>From what we see of the “House,” the factory’s operation clearly mirrors Marx’s model of economics,
introduced in chapter 7 of <em>Capital</em>. In this chapter, Marx shows that “the elementary factors of
the labour-process are the personal activity of man, i.e., work itself, the subject of that work,
and, its instruments.” The “personal activity of man” here consists of children of the House going
about their day, each “setting in motion […] the natural forces of his body, in order to
appropriate Nature’s productions in a form adapted to his own wants.” <em>The Promised Neverland</em> takes
this quite literally as well, since the “personal activity” of the children, in studying and
playing, can increase the value of “the subject of that work,” which also happens to be the
children, except in an unalived form.</p>
<p>Marx describes instruments of labour as “a thing, or a complex of things, which the labourer
interposes between himself and the subject of his labour, and which serves as the conductor of his
activity.” In this case, we see that the house and its surroundings are quite literally the
instrument of labour, which allows the final goods (the children) to be “refined” from their raw
products.</p>
<p>Exploitation of our workers is also very clear-cut, with their deaths being the requirement for
production. We also see zero self-determination afforded to the children, which mirrors Marx’s
principle for maximizing the productivity of workers as described in chapter 14 of <em>Capital</em>: “In
manufacture, in order to make the collective labourer, and through him capital, rich in social
productive power, each labourer must be made poor in individual productive powers.”</p>
<p>Class in this show is very clearly defined as well, with demons being the bourgeoisie and the humans
reflecting the proletariat. However, the show also broaches the concept of class traitors, in
Isabella and Krone. Despite being humans, which the show portrays as members of the proletariat,
they are actively working against the interests of their own species and serving the demons for
self-preservation. This mirrors modern-day examples of class traitors such as “the blue-collar man
who becomes a security guard employed to harass striking workers” (Ehrenreich 154).</p>
<p>Just one problem, though: <em>Capital</em>’s thesis operates on an economy where workers contribute labor
to goods and services. However, in this case, the workers <em>are</em> the product. This seemingly major
oversight highlights a central point of the show’s thesis: modern capitalism has reached a
point where everything about us, including our entire selves, has been reduced to a product, and
every interaction is governed by market logic.</p>
<p>We can see this commodification in many aspects of everyday life as well. Take, for example, the
ecosystem of messaging apps such as Facebook Messenger or WhatsApp. Message data in these apps are
aggregated and sold to advertisers (Doyle). To Doyle, this is exploitation of affective labor –
work that affects people’s emotions. While traditionally affective labor has been used to refer to
work done by those such as fast-food workers and flight attendants (Negri et al. 108), technology
has created a whole new industry for this type of work.</p>
<p>An increasingly widespread form of affective labor exploitation is seen in the dating app industry.
While the premium features offered by dating apps are a clear-cut example of exploitation of this
type of labor, Hobbs et al. examine a similar commodification done among users: “Tim’s use […]
denotes a sales technique designed to encourage other Tinder users to ‘buy’ the profile.”
Indirectly, this work done by the user also encourages more users to sign up, thus providing extra
value to the company through that user’s labor.</p>
<p>Similarly, we can see this use of affective labor as a prime method of production within the world
of <em>The Promised Neverland</em>. The quality, and thus the value, of the goods produced in the plants
(the children), is implied to be directly linked with their intelligence. Thus, it is in the plants’
interest to ensure that the maximal amount of affective labor is added to the children to increase
their final value. And thus we see that the method of production described in <em>Capital</em> is
replicated once again, but making use of affective labor rather than physical and mental labor.</p>
<p>Perhaps unintentionally, <em>The Promised Neverland</em>’s critique of capitalist modes of production hits
upon some interesting points regarding modern society. As methods of production shift,
more and more industries are leveraging affective labor. However, the show
suggests that exploitation of affective labor is still unethical, as the final product is
ultimately consumed by the system which created it.</p>
<h5 id="sources">Sources</h5>
<p>Doyle, K. “Facebook, Whatsapp and the commodification of affective labour.” <em>Communication,
politics & culture</em> 48, no. 1 (2015): 51-65.</p>
<p>Ehrenreich, Barbara. Fear of falling: The inner life of the middle class. Hachette UK, 2020.</p>
<p>Hobbs, Mitchell, Stephen Owen, and Livia Gerber. “Liquid love? Dating apps, sex, relationships and
the digital transformation of intimacy.” Journal of Sociology 53, no. 2 (2017): 271-284.</p>
<p>Marx, Karl. Capital: volume I. Vol. 1. Penguin UK, 2004.</p>
<p>Negri, Antonio, Michael Hardt, and David Camfield. “Multitude: war and democracy in the age of
empire.” <em>Labour</em> 56 (2005): 359.</p>
Haoda Wang harry@h313.info
Where does Liyue’s food supply come from?
2022-02-12T09:33:13+00:00 https://h313.info/2022/02/12/where-does-liyues-food-supply-come-from
<p>Liyue is a fictional nation in MiHoYo’s 2020 ARPG Genshin Impact. In-game, the
nation consists of a large amount of uncultivated wilderness and ruins, in which
the player spends most of their time. However, it also includes three more
locations of importance: a large city, a border crossing, and an inn built into a
tree.</p>
<p>The primary city of Liyue, Liyue Harbor, is quite large and contains a
significant number of NPCs. However, from the large number of guards both in and
out of the city, as well as from their interactions with the player character at
various locations, even within the wilderness, we can also safely assume that
these guards form part of a larger standing army. Due to the logistical issues
involved in fielding such an army, we can thus conclude that the population of
this nation must be larger than the number of NPCs within the country. This raises
the question: where does their food supply come from?</p>
<p><img src="https://h313.info/blog/assets/img/liyue.png" alt="Image of Liyue Harbor" /></p>
<p>To reason about Liyue’s food supply first requires us to find a suitable
analogue for this nation within our own history. First, we can reasonably assume
that Liyue is an analogue for China, given that the names of the playable
characters from this region are Pinyin for Chinese names (e.g. 甘雨 or Ganyu,
胡桃 or Hu Tao). Furthermore, we can accurately align the time period the player
experiences in Liyue to a time period within Chinese history with a variety of
techniques.</p>
<p>A significant clue to date Liyue can be found within the ships in the harbor. We
can see that the larger ships contain cannon ports. This is also confirmed in
the lore text for Beidou, another playable character, which states: “With
cannons and harpoons, arrows and ropes the fleet would assail Haishan.” The
existence of cannons on these ships rules out any possibility of Liyue existing
at a time before the Ming dynasty.</p>
<p>We can now use the architecture of Liyue Harbor to further narrow down its time
period. In particular, we notice significant usage of stone bricks within the
foundations of the buildings, as well as the walls surrounding the city.
Furthermore, we also see that the buildings above these foundations are built
with timber and feature extensive use of the hip-gable roof. This is highly
consistent with the Ming and Qing dynasty design styles studied in Chapter 14 of
Nancy Steinhardt’s “Chinese Architecture: A History.” Thus, we must date Liyue
to the late Ming or early Qing dynasty.</p>
<p>With this baseline, we can examine the socioeconomic situation of Liyue Harbor.
Zhihong Shi’s paper “Agricultural Development in Qing China: A Quantitative
Study” shows that around the time of the early Qing dynasty (1661 CE), 549
million mu, or 366,000 square kilometers of land was allocated as farmland.
Cross-referencing this with a population estimate of about 120 million, we get
roughly 4.6 mu, or 3,050 square meters of farmland used to sustain each person.</p>
<p>As Liyue is a major port city and trade hub, we can compare it to Shanghai,
which was also a major trade hub and shipping port during the Qing dynasty. At
this time, the province containing Shanghai had a population of 3,453,524
persons. Thus, using our figure for farmland sustaining each person from above,
we find that Liyue requires about 10,533 square kilometers of farmland to
support this population.</p>
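<p>The arithmetic above is easy to double-check. Here is a minimal sketch that uses only the figures
quoted in this post; the constant and function names are made up for illustration.</p>

```python
# Figures quoted in the post (early Qing dynasty, 1661 CE).
FARMLAND_MU = 549_000_000        # total farmland, in mu
FARMLAND_KM2 = 366_000           # the same farmland, in square kilometers
POPULATION = 120_000_000         # population estimate
PROVINCE_POPULATION = 3_453_524  # province containing Shanghai

M2_PER_KM2 = 1_000_000


def farmland_per_person_m2() -> float:
    """Square meters of farmland sustaining one person."""
    return FARMLAND_KM2 * M2_PER_KM2 / POPULATION


def farmland_needed_km2(population: int) -> float:
    """Square kilometers of farmland needed to feed `population` people."""
    return population * farmland_per_person_m2() / M2_PER_KM2


if __name__ == "__main__":
    print(f"{FARMLAND_MU / POPULATION:.1f} mu per person")
    print(f"{farmland_per_person_m2():,.0f} square meters per person")
    print(f"{farmland_needed_km2(PROVINCE_POPULATION):,.0f} square kilometers needed")
```

<p>Swapping in a different population estimate for Liyue only requires changing the argument to
<code class="language-plaintext highlighter-rouge">farmland_needed_km2</code>.</p>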
<p>We also find that the existence of staple goods such as almond tofu, which
implies some amount of soy farming, or the “Lantern Rite Special Come and Get
It,” which includes noodles and thus implies some form of rice farming, confirms
that Liyue cannot rely on hunting and gathering for its food source.
Furthermore, game text notes that Liyue has existed for thousands of years, and
sustaining such a large population without overhunting would be impossible.</p>
<p>While some amount of the food might come from trade, this cannot account for the
use of livestock as a source of food, since players can also cook a “Jueyun
Guoba” or a “Jueyun Chili Chicken” dish. In particular, the flavor text for the
chili chicken notes that it “[retains] the freshness of the delightful juice
contained within the chicken.” This implies that the chicken is fresh, and thus
must come from a nearby farm.</p>
<p>As we have seen, there must be some sort of large-scale agriculture happening
within Liyue to sustain the city. However, we see a curious lack of it in the
game world. This clearly resulted from gross mismanagement of the country’s
economy, and if not rectified soon may present a significant problem to the
stability of the country.</p>
Haoda Wang harry@h313.info
Tweeting with LaTeX
2021-06-15T16:00:00+00:00 https://h313.info/latex/2021/06/15/tweeting-with-latex
<p>Some time ago, I stumbled across WolframConnect, a set of libraries for Mathematica that supported posting and querying from a bunch of social media sites. So I tried it on Twitter, and it worked pretty well. However, apparently apathetic to this discovery, someone mentioned that “I’ll be impressed when I can tweet in LaTeX.” I believe I can impress him.</p>
<h3 id="write18-and-immediate"><code class="language-plaintext highlighter-rouge">\write18</code> and <code class="language-plaintext highlighter-rouge">\immediate</code></h3>
<p>LaTeX provides a set of output streams that can be written to using the <code class="language-plaintext highlighter-rouge">\write</code> command. Of particular interest here is stream 18, which passes whatever is written to it to the system shell as a command. Because a malicious document could easily exploit this, most TeX distributions require you to add the <code class="language-plaintext highlighter-rouge">--shell-escape</code> argument before accepting <code class="language-plaintext highlighter-rouge">\write18</code> commands.</p>
<p>We will also need to force the <code class="language-plaintext highlighter-rouge">\write</code> to run once the parser reaches it rather than when the page is finished being created. This is where the <code class="language-plaintext highlighter-rouge">\immediate</code> command comes in. So we could run something like <code class="language-plaintext highlighter-rouge">cowsay</code> in LaTeX by simply adding a line <code class="language-plaintext highlighter-rouge">\immediate\write18{cowsay yes}</code>. When compiling the TeX file, this will then happen:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>❯ pdflatex --shell-escape meme.tex
This is pdfTeX, Version 3.141592653-2.6-1.40.22 (TeX Live 2021/Arch Linux) (preloaded format=pdflatex)
\write18 enabled.
entering extended mode
(./meme.tex
LaTeX2e <2020-10-01> patch level 4
L3 programming layer <2021-02-18>
(/usr/share/texmf-dist/tex/latex/base/article.cls
Document Class: article 2020/04/10 v1.4m Standard LaTeX document class
(/usr/share/texmf-dist/tex/latex/base/size11.clo)) _____
< yes >
 -----
        \   ^__^
         \  (oo)\_______
            (__)\       )\/\
                ||----w |
                ||     ||
(/usr/share/texmf-dist/tex/latex/l3backend/l3backend-pdftex.def) (./meme.aux)
[1{/var/lib/texmf/fonts/map/pdftex/updmap/pdftex.map}] (./meme.aux) )</usr/shar
e/texmf-dist/fonts/type1/public/amsfonts/cm/cmbx12.pfb></usr/share/texmf-dist/f
onts/type1/public/amsfonts/cm/cmr10.pfb></usr/share/texmf-dist/fonts/type1/publ
ic/amsfonts/cm/cmr12.pfb></usr/share/texmf-dist/fonts/type1/public/amsfonts/cm/
cmr17.pfb>
Output written on meme.pdf (1 page, 40293 bytes).
Transcript written on meme.log.
</code></pre></div></div>
<h3 id="twitters-api">Twitter’s API</h3>
<p>Requesting an API key was easy enough, though if a Twitter employee goes through the list of reasons why somebody would request an API key they would also happen to find “I’m trying to prove someone wrong by tweeting in LaTeX” now. However, Twitter’s API uses OAuth, which is famously a pain to deal with by hand. So we’ll use the <a href="https://github.com/twitter/twurl">Twurl</a> package, which provides a <code class="language-plaintext highlighter-rouge">curl</code>-like interface to Twitter’s API.</p>
<p>To log in, we will run <code class="language-plaintext highlighter-rouge">twurl authorize --consumer-key <key> --consumer-secret <secret></code> and follow the instructions to generate the required OAuth keys. That’s now enough for us to call a <code class="language-plaintext highlighter-rouge">twurl</code> instruction to post a tweet.</p>
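<p>Before wiring anything into TeX, it can help to see what the command line looks like on its own. The following is a rough sketch, not part of the original setup: the helper name <code class="language-plaintext highlighter-rouge">build_twurl_command</code> and the example tweet ID are made up, and it assumes the same <code class="language-plaintext highlighter-rouge">-r</code> flag and <code class="language-plaintext highlighter-rouge">/1.1/statuses/update.json</code> endpoint used later in this post.</p>

```python
import shlex
from urllib.parse import urlencode

# Endpoint taken from this post; everything else here is illustrative.
UPDATE_ENDPOINT = "/1.1/statuses/update.json"


def build_twurl_command(status, reply_to_status_id=None):
    """Return a shell-quoted twurl command line that would post `status`."""
    params = {"status": status}
    if reply_to_status_id is not None:
        params["in_reply_to_status_id"] = str(reply_to_status_id)
    # urlencode escapes spaces, '&' and '?' so the request body stays intact.
    body = urlencode(params)
    # shlex.join quotes the body so the shell does not interpret the '&'.
    return shlex.join(["twurl", "-r", body, UPDATE_ENDPOINT])


if __name__ == "__main__":
    # 1234567890 is a placeholder tweet ID, not a real one.
    print(build_twurl_command("Are you impressed now?", reply_to_status_id=1234567890))
```

<p>One thing to keep in mind: <code class="language-plaintext highlighter-rouge">%</code> starts a comment in TeX, so a percent-encoded body like this one cannot be pasted into a <code class="language-plaintext highlighter-rouge">\write18</code> line verbatim.</p>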
<h3 id="the-tex-file">The TeX File</h3>
<p>The final thing we’ll have to do is call <code class="language-plaintext highlighter-rouge">twurl</code> on <code class="language-plaintext highlighter-rouge">/1.1/statuses/update.json</code> as a <code class="language-plaintext highlighter-rouge">\write18</code> command in LaTeX. So we can create a <code class="language-plaintext highlighter-rouge">nice.tex</code> file like the following:</p>
<div class="language-latex highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">\documentclass</span><span class="na">[11pt]</span><span class="p">{</span>article<span class="p">}</span>
<span class="k">\title</span><span class="p">{</span>Just a Tweet<span class="p">}</span>
<span class="k">\author</span><span class="p">{</span>some guy<span class="p">}</span>
<span class="k">\date</span><span class="p">{}</span>
<span class="k">\newcommand</span><span class="p">{</span><span class="k">\tweetText</span><span class="p">}{</span>Are you impressed now?<span class="p">}</span>
<span class="k">\newcommand</span><span class="p">{</span><span class="k">\inReplyToStatusId</span><span class="p">}{</span><tweet<span class="p">_</span>id><span class="p">}</span>
<span class="k">\newcommand</span><span class="p">{</span><span class="k">\username</span><span class="p">}{</span><reply<span class="p">_</span>to<span class="p">_</span>username><span class="p">}</span>
<span class="k">\immediate\write</span>18<span class="p">{</span>twurl -r 'status=are you impressed now <span class="k">\username</span><span class="p">&</span>username=<span class="k">\username</span><span class="p">&</span>in<span class="p">_</span>reply<span class="p">_</span>to<span class="p">_</span>status<span class="p">_</span>id=<span class="k">\inReplyToStatusId</span>' /1.1/statuses/update.json<span class="p">}</span>
<span class="c">%\immediate\write18{curl -XPOST --url 'https://api.twitter.com/1.1/statuses/update.json?status=\tweetText' --header 'authorization: OAuth oauth_consumer_key="\oauthCustomerKey", oauth_nonce="\generatedOauthNonce", oauth_signature="\generatedOauthSignature", oauth_signature_method="HMAC-SHA1", oauth_timestamp="\generatedTimestamp", oauth_token="\oauthToken", oauth_version="1.0"'}</span>
<span class="nt">\begin{document}</span>
<span class="k">\maketitle</span>
<span class="k">\section</span><span class="p">{</span>Introduction<span class="p">}</span>
Here's what we tweeted: <span class="k">\tweetText</span>
<span class="nt">\end{document}</span>
</code></pre></div></div>
<p>And then we’ll run it through TeX to get a PDF with <code class="language-plaintext highlighter-rouge">pdflatex --shell-escape nice.tex</code>, which results in our tweet and an informative PDF:</p>
<p><img src="https://h313.info/blog/assets/img/tweet.png" alt="Tweet and PDF output" /></p>
Haoda Wang harry@h313.info
Examining Air Pollution in Los Angeles
2021-04-21T13:00:00+00:00 https://h313.info/2021/04/21/examining-air-pollution-in-los-angeles
<p>Study after study has shown the relationship between environmental pollution and the population’s
wealth and race. For example, Andrew Hurley’s study of pollution in Gary, Indiana found that “The
skewed social distribution of toxic waste sites represented the most marked example of an
environmental regime that discriminated along the lines of race and class” (Hurley 172). Another
study of the same effect in a Chinese province showed that “townships in Jiangsu province with large
populations of rural migrants are disproportionately exposed to industrial pollution” (Schoolman).
The effect of wealth on a person’s environment reaches across cultures and
countries. However, very few have examined these effects in Los Angeles County. Thus, we will take
a look at the pollution levels around various areas of the county, examine whether the same effect
is present here, and speculate on its causes.</p>
<p>In this examination of pollution we will first need to define a measurable index for pollution.
Luckily, the US Environmental Protection Agency provides indices to quantify the amount of different
particle pollutants in the air (George et al.). The primary measurement we will pull from will be
the PM2.5 index, which indicates the density of fine particulates in the air that are 2.5
micrometers or smaller. We will also make use of indexes provided by the National Air Toxics
Assessment (NATA), which measure the amount of toxins in the air (George et al.).</p>
<p>The EPA provides data on these pollutants divided into census block groups, as determined by the US
Census. The Census (and all federal agencies) divides and identifies geographical areas of the US
using ANSI codes, which are unique 12-digit codes representing a geographical area. The first two
digits represent the state, and the next three a county within that state. The remaining 7 digits
represent an area within that county. In this case, we will focus on the ANSI codes within Los
Angeles County, of which there are 6,425. These begin with the county’s code, <code class="language-plaintext highlighter-rouge">06037</code>.</p>
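<p>To make the code layout concrete, here is a small sketch that splits a 12-digit code into the three
parts described above. The names <code class="language-plaintext highlighter-rouge">AreaCode</code>, <code class="language-plaintext highlighter-rouge">split_area_code</code>, and
<code class="language-plaintext highlighter-rouge">in_los_angeles_county</code> are made up for this example, as are the IDs you would pass in.</p>

```python
from typing import NamedTuple


class AreaCode(NamedTuple):
    state: str   # first 2 digits
    county: str  # next 3 digits
    area: str    # remaining 7 digits


def split_area_code(raw_id) -> AreaCode:
    """Split a 12-digit area code into its state, county, and area parts."""
    # IDs parsed as integers lose their leading zero, so pad back to 12 digits.
    code = str(raw_id).zfill(12)
    if len(code) != 12 or not code.isdigit():
        raise ValueError(f"expected a 12-digit code, got {raw_id!r}")
    return AreaCode(code[:2], code[2:5], code[5:])


def in_los_angeles_county(raw_id) -> bool:
    """True if the code starts with 06037 (California, Los Angeles County)."""
    parts = split_area_code(raw_id)
    return parts.state + parts.county == "06037"
```

<p>The zero-padding matters because California’s state code is <code class="language-plaintext highlighter-rouge">06</code>: once an ID is parsed as a
number, it is only 11 digits long.</p>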
<h3 id="the-dataset">The Dataset</h3>
<p>The EPA packages the data discussed above into a user-friendly map-based format called the EJSCREEN
Mapping Tool, which is available <a href="https://ejscreen.epa.gov/mapper/">here</a> (Corrales). The following
image uses this tool to overlay the hazardous waste proximity of various census tracts in Los
Angeles County onto their population densities.</p>
<p><img src="https://h313.info/blog/assets/img/ejscreen.png" alt="EJSCREEN screenshot" /></p>
<p>While this map is easy for a human to use, the interface is unreadable to a machine, which
makes it hard to draw statistical conclusions from. Thus, we will use the raw data from the
<code class="language-plaintext highlighter-rouge">EJSCREEN_2020_USPR.csv</code> file available <a href="https://www.epa.gov/ejscreen/download-ejscreen-data">here</a>,
which contains absolute numerical data for each census block group.</p>
<p>This data table contains data for every single county in every single state, which is far more data
than we need. Therefore, we can run the following to import the CSV as a <code class="language-plaintext highlighter-rouge">pandas</code> dataframe and then
filter out ANSI codes not in Los Angeles county:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">pandas</span> <span class="k">as</span> <span class="n">pd</span>
<span class="c1"># low_memory=False is needed as it's a pretty big CSV
</span><span class="n">df</span> <span class="o">=</span> <span class="n">pd</span><span class="p">.</span><span class="n">read_csv</span><span class="p">(</span><span class="s">'EJSCREEN_2020_USPR.csv'</span><span class="p">,</span> <span class="n">low_memory</span><span class="o">=</span><span class="bp">False</span><span class="p">)</span>
<span class="n">df</span> <span class="o">=</span> <span class="n">df</span><span class="p">.</span><span class="n">loc</span><span class="p">[(</span><span class="n">df</span><span class="p">[</span><span class="s">'ID'</span><span class="p">]</span> <span class="o">></span> <span class="mi">60370000000</span><span class="p">)</span> <span class="o">&</span> <span class="p">(</span><span class="n">df</span><span class="p">[</span><span class="s">'ID'</span><span class="p">]</span> <span class="o"><</span> <span class="mi">60380000000</span><span class="p">)]</span>
</code></pre></div></div>
<p>Here’s what the data looks like when we’ve filtered it:</p>
<p><img src="https://h313.info/blog/assets/img/ejscreen_la.png" alt="los angeles jupyter data" /></p>
<p>This table includes pollutant data such as the PM2.5 and NATA indexes, as well as
demographic data such as the percentage of residents who are of color and the percentage of
residents under the poverty line. We’ll examine this data for insights.</p>
<h3 id="demographic-analysis">Demographic Analysis</h3>
<p>Let’s start by looking at the demographics of the people who live within Los Angeles. We’ll look at
the <code class="language-plaintext highlighter-rouge">MINORPCT</code> and <code class="language-plaintext highlighter-rouge">LOWINCPCT</code> columns in the dataset, which measure the percentage of the
population who are minorities, as well as the percentage of the population who are low income. We
can’t use the raw counts of these values, since these counts will tend to increase as the population
increases, while percentages will stay relatively constant.</p>
<p>Running a regression will show if there’s a statistical relationship between these two variables. In
this case, we’ll do a linear regression, which will show if the data tends to trend in a direction.
This can be done easily using the <code class="language-plaintext highlighter-rouge">statsmodels</code> package:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">statsmodels.api</span> <span class="k">as</span> <span class="n">sm</span>
<span class="n">MINORPCT</span> <span class="o">=</span> <span class="n">df</span><span class="p">[[</span><span class="s">'MINORPCT'</span><span class="p">]].</span><span class="nb">apply</span><span class="p">(</span><span class="n">pd</span><span class="p">.</span><span class="n">to_numeric</span><span class="p">).</span><span class="n">values</span><span class="p">.</span><span class="n">reshape</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">LOWINCPCT</span> <span class="o">=</span> <span class="n">df</span><span class="p">[[</span><span class="s">'LOWINCPCT'</span><span class="p">]].</span><span class="nb">apply</span><span class="p">(</span><span class="n">pd</span><span class="p">.</span><span class="n">to_numeric</span><span class="p">).</span><span class="n">values</span><span class="p">.</span><span class="n">reshape</span><span class="p">(</span><span class="o">-</span><span class="mi">1</span><span class="p">,</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">ols</span> <span class="o">=</span> <span class="n">sm</span><span class="p">.</span><span class="n">OLS</span><span class="p">(</span><span class="n">LOWINCPCT</span><span class="p">,</span> <span class="n">MINORPCT</span><span class="p">)</span>
<span class="n">result</span> <span class="o">=</span> <span class="n">ols</span><span class="p">.</span><span class="n">fit</span><span class="p">()</span>
<span class="n">prediction</span> <span class="o">=</span> <span class="n">result</span><span class="p">.</span><span class="n">predict</span><span class="p">(</span><span class="n">MINORPCT</span><span class="p">)</span>
</code></pre></div></div>
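<p>One caveat on the fit above: <code class="language-plaintext highlighter-rouge">sm.OLS</code> does not add an intercept term unless you ask for one
(for example with <code class="language-plaintext highlighter-rouge">sm.add_constant</code>), so the fitted line is forced through the origin. As a
rough sketch of what a fit with an intercept computes, here is the same idea in plain Python, along
with R², which is easier to interpret than a raw error value. The data points are made up and only
stand in for the real <code class="language-plaintext highlighter-rouge">MINORPCT</code> and <code class="language-plaintext highlighter-rouge">LOWINCPCT</code> columns.</p>

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def r_squared(xs, ys, intercept, slope):
    """Share of the variance in ys explained by the fitted line."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up fractions standing in for MINORPCT and LOWINCPCT.
minority_pct = [0.10, 0.30, 0.50, 0.70, 0.90]
low_income_pct = [0.12, 0.20, 0.31, 0.38, 0.52]

intercept, slope = fit_line(minority_pct, low_income_pct)
r2 = r_squared(minority_pct, low_income_pct, intercept, slope)
print(f"intercept={intercept:.3f} slope={slope:.3f} R^2={r2:.3f}")
```

<p>An R² close to 1 means the line explains most of the variation in the data, while a value near 0
means it explains almost none of it.</p>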
<p>Once we have a function, we can graph it with the <code class="language-plaintext highlighter-rouge">matplotlib</code> library and the following snippet:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">import</span> <span class="nn">matplotlib.pyplot</span> <span class="k">as</span> <span class="n">plt</span>
<span class="n">plt</span><span class="p">.</span><span class="n">figure</span><span class="p">(</span><span class="n">figsize</span><span class="o">=</span><span class="p">(</span><span class="mi">25</span><span class="p">,</span> <span class="mi">15</span><span class="p">))</span>
<span class="n">plt</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">MINORPCT</span><span class="p">,</span> <span class="n">LOWINCPCT</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">scatter</span><span class="p">(</span><span class="n">MINORPCT</span><span class="p">,</span> <span class="n">prediction</span><span class="p">,</span> <span class="n">color</span><span class="o">=</span><span class="s">'red'</span><span class="p">)</span>
<span class="n">plt</span><span class="p">.</span><span class="n">show</span><span class="p">()</span>
</code></pre></div></div>
<p><img src="https://h313.info/blog/assets/img/la_race_and_income.png" alt="graph of race and income" /></p>
<p>With a mean-squared error of just 883.80, we can safely conclude that a person’s race in Los Angeles
correlates with their wealth. In the same vein, a Duke University study also “revealed major
disparities in wealth accumulation across various racial and ethnic groups in Los Angeles” (De La
Cruz-Viesca et al. 5). The study also shows that the median net worth of Mexicans and blacks in the county
is between $3500 and $4000, while the median net worth of a white household is $355,000, a whole two
orders of magnitude higher (De La Cruz-Viesca et al. 40).</p>
<h3 id="airborne-pollutant-analysis">Airborne Pollutant Analysis</h3>
<p>We can now also check for correlations between environmental pollution, race, and class. We’ll start
with the PM2.5 index which we discussed above. This data is stored in the <code class="language-plaintext highlighter-rouge">PM25</code> column. Also, we
will use a <em>demographic index</em> calculated by EJSCREEN to be the average of the low income percentage
and the minority population percentage. Graphing the relation between the PM2.5 index and this new
index, we can see that this data is significantly messier than before, with nearly equal
distributions of particulate matter pollution throughout the county.</p>
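<p>For reference, the demographic index described above is just a two-term average. Here is a tiny
sketch of it; the function name is made up, and it assumes both inputs are fractions between 0 and 1
rather than percentages between 0 and 100.</p>

```python
def demographic_index(low_income_pct: float, minority_pct: float) -> float:
    """Average of the low-income and minority shares, both fractions in [0, 1]."""
    for name, value in (("low_income_pct", low_income_pct),
                        ("minority_pct", minority_pct)):
        # Assumption: shares are stored as fractions, not 0-100 percentages.
        if not 0.0 <= value <= 1.0:
            raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return (low_income_pct + minority_pct) / 2
```

<p>So an area where a quarter of residents are low income and three quarters are minorities gets an
index of 0.5.</p>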
<p><img src="https://h313.info/blog/assets/img/ejscreen_pm25.png" alt="los angeles jupyter data" /></p>
<p>This is to be expected, however, since air pollution doesn’t usually stay put in a single area,
but rather tends to move along with the wind (Lu et al. 1500). In Los Angeles’ case, these winds tend to
blow east from the Pacific Ocean towards Riverside County (Lu et al. 1504). Such a pattern can be
seen when the data is examined on a map of Los Angeles, as shown below. Coastal areas are
significantly less polluted than their inland counterparts, with particulate matter levels seeing a
large drop across mountainous areas.</p>
<p><img src="https://h313.info/blog/assets/img/ejscreen_pm25_screenshot.png" alt="los angeles ejscreen data" /></p>
<p>This lack of correlation with race and class plays out for every statistic
provided by the EPA on airborne pollution. For example, the EPA detected significantly higher amounts
of diesel particulates in the Long Beach Dock area, which is most likely due to the high amount of
shipping and diesel-powered container trucks passing through the port. Similarly, the NATA respiratory
hazard and cancer risk indices almost perfectly overlap, with significantly higher results in areas
of heavy vehicle traffic.</p>
<p>Note that this analysis only examines data from 2020. A similar analysis of NATA data from 1996
found a significant correlation between air pollution, wealth, and minority population (Pastor et
al. 144). The significant decrease in airborne pollutant concentrations may be due to international
treaties aiming to limit emissions signed by the US in the period, such as the Kyoto Protocol and
the Paris Accords.</p>
<h3 id="other-hazardous-pollutants">Other Hazardous Pollutants</h3>
<p>Of course, air pollution is far from the only type of pollution posing a risk to our health. The EPA
also provides data on each census tract’s proximity to various areas posing risks to the population,
including:</p>
<ul>
<li>Treatment, Storage, and Disposal Facilities (TSDFs), which is just a fancy term for a landfill</li>
<li>Risk Management Plan (RMP) facilities, which are locations that deal with highly hazardous
substances</li>
<li>National Priorities List (NPL) sites, which are hazardous waste sites being cleaned up
under the Superfund program. They are more commonly known as “Superfund sites.”</li>
</ul>
<p>When we run the same analysis we did in the last section here, we see a much stronger correlation
between distance from these locations and the demographic index.</p>
<p><img src="https://h313.info/blog/assets/img/ejscreen_land_pollutants.png" alt="los angeles jupyter data" /></p>
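<p>For anyone who wants to reproduce this kind of comparison, here is a minimal sketch of a Pearson correlation coefficient, the usual way to put a number on how strongly two columns move together. It is not the code used to produce the plots in this post; the function name <code class="language-plaintext highlighter-rouge">pearson</code> is made up for illustration, and any inputs you feed it (say, distance to the nearest site and demographic index per census tract) are your own assumption.</p>

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Pearson correlation coefficient between two equally sized samples.
// Returns a value in [-1, 1]; the result is NaN if either sample has
// zero variance. (Hypothetical helper, not part of any EPA tooling.)
double pearson(const std::vector<double>& x, const std::vector<double>& y) {
    assert(x.size() == y.size() && !x.empty());
    const double n = static_cast<double>(x.size());

    // First pass: sample means.
    double mean_x = 0.0, mean_y = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        mean_x += x[i];
        mean_y += y[i];
    }
    mean_x /= n;
    mean_y /= n;

    // Second pass: covariance and variances (unnormalized, the 1/n cancels).
    double cov = 0.0, var_x = 0.0, var_y = 0.0;
    for (std::size_t i = 0; i < x.size(); ++i) {
        const double dx = x[i] - mean_x;
        const double dy = y[i] - mean_y;
        cov += dx * dy;
        var_x += dx * dx;
        var_y += dy * dy;
    }
    return cov / std::sqrt(var_x * var_y);
}
```

<p>A value near -1 for distance versus demographic index would mean that tracts closer to a site tend to have a higher index, while a value near 0 would match what we saw for airborne pollution.</p>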
<p>Let’s take a closer look at the Montrose Chemical Co. Superfund site. This was the site of a DDT
manufacturing facility from 1947 to 1982 which contaminated the soil and groundwater in the
surrounding neighborhood (US Environmental Protection Agency). An <em>extremely interesting</em> fact to
note is that the former site of this plant corresponds almost perfectly with a marked increase in
the demographic index of the area.</p>
<p><img src="https://h313.info/blog/assets/img/ejscreen_superfund_comparison.png" alt="los angeles jupyter data" /></p>
<p>The same effect can be seen in multiple other Superfund sites, including the three sites in the
city of South Gate, just southeast of downtown Los Angeles. There, industrial activities polluted
the surrounding area, which also houses a significant minority population.</p>
<p>This correlation can be traced back to the same phenomenon that caused the demographics of Compton
to shift from a middle-class white population to a lower-class minority population in the 1950s.
Compton was close to many of the factory jobs that sustained the middle class of the city, and
middle-class black workers began moving into the area, while white workers fled for farther
suburbs. As the Superfund program was only established in 1980, it would follow that most of the
Superfund sites would be in industrial areas, and since minority populations were more likely to
live near these areas, they would also be more likely to be affected by these sites.</p>
<p>Looking at RMP facilities and TSDFs tells a more nuanced story. There are only 16 sites in the
entire county that process waste received from other areas, all of which are in areas with a high
demographic index (Envirofact). This obviously cannot be a coincidence. One contributing factor
may be that riskier jobs are taken by those with less schooling, which is strongly
negatively correlated with being a minority (Leigh 63). It may be that poorer minorities are forced
to take jobs in places that are at a higher risk of contaminating the area, and to live near these
jobs.</p>
<h3 id="conclusion">Conclusion</h3>
<p>We have shown that there is a strong negative correlation between wealth and minority status caused
by a history of discrimination and racism. Furthermore, atmospheric pollution in Los Angeles is
uniform across the county due to eastward winds that disperse strong concentrations of it.
We have also discovered a significantly higher low-income and minority population near
RMP, TSDF, and NPL sites with dangerous levels of pollution, which may either be due to historical
demographic shifts that coincided with a period of industrial irresponsibility or be the inevitable
outcome of an economic and governmental system that prioritizes white working power.</p>
<h3 id="works-cited">Works Cited</h3>
<p>Lu, Rong, and Richard P. Turco. “Air pollutant transport in a coastal environment—II.
Three-dimensional simulations over Los Angeles basin.” <em>Atmospheric Environment</em> 29, no. 13 (1995):
1499-1518.</p>
<p>Corrales, Mark. “EJSCREEN: EPA’s Environmental Justice Mapping Tool.” In <em>APHA 2016 Annual Meeting &
Expo (Oct. 29-Nov. 2, 2016)</em>. American Public Health Association, 2016.</p>
<p>De La Cruz-Viesca, Melany, Zhenxiang Chen, Paul M. Ong, Darrick Hamilton, and William A. Darity Jr.
“The Color of Wealth in Los Angeles.” <em>Durham, NC/New York/Los Angeles: Duke University/The New
School/University of California, Los Angeles</em> (2016).</p>
<p>“Envirofact.” EPA. US Environmental Protection Agency. https://enviro.epa.gov/index.html.</p>
<p>George, Barbara Jane, Bradley D. Schultz, Ted Palma, Alan F. Vette, Donald A. Whitaker, and Ronald
W. Williams. “An evaluation of EPA’s National-Scale Air Toxics Assessment (NATA): comparison with
benzene measurements in Detroit, Michigan.” <em>Atmospheric Environment</em> 45, no. 19 (2011): 3301-3308.</p>
<p>Hurley, Andrew. <em>Environmental inequalities: Class, race, and industrial pollution in Gary, Indiana,
1945-1980</em>. Univ of North Carolina Press, 1995.</p>
<p>Leigh, J. Paul. “Who chooses risky jobs?” <em>Social Science & Medicine</em> 23, no. 1 (1986): 57-64.</p>
<p>Office of Environmental Justice, EJSCREEN Environmental Justice Mapping and Screening Tool: EJSCREEN
Technical Documentation (2019).</p>
<p>US Environmental Protection Agency, Montrose & Del Amo Superfund Sites Fact Sheet (2018).</p>
<p>US Environmental Protection Agency, South Gate Superfund Sites Fact Sheet (2017).</p>
<p>Pastor Jr, Manuel, Rachel Morello-Frosch, and James L. Sadd. “The air is always cleaner on the other
side: Race, space, and ambient air toxics exposures in California.” <em>Journal of urban affairs</em> 27,
no. 2 (2005): 127-148.</p>
<p>Schoolman, Ethan D., and Chunbo Ma. “Migration, class and environmental inequality: Exposure to
pollution in China’s Jiangsu Province.” <em>Ecological Economics</em> 75 (2012): 140-151.</p>
<p><em>This post was written as part of the final project for USC’s AMST 101 class.</em></p>Haoda Wangharry@h313.infoStudy after study has shown the relationship between environmental pollution and the population’s wealth and race. For example, Andrew Hurley’s study of pollution in Gary, Indiana found that “The skewed social distribution of toxic waste sites represented the most marked example of an environmental regime that discriminated along the lines of race and class” (Hurley 172). Another study of the same effect in a Chinese province showed that “townships in Jiangsu province with large populations of rural migrants are disproportionately exposed to industrial pollution” (Schoolman). The effect wealth has on a person’s environment is an effect that reaches across cultures and countries. However, very few have examined these effects in Los Angeles county. Thus, we will take a look at the pollution levels around various areas of the county and examine if the same effect will be present here, and speculate on the causes of it.Improve Software Debugging with Binary Analysis2020-11-06T17:33:13+00:002020-11-06T17:33:13+00:00https://h313.info/cpp/security/binary-analysis/2020/11/06/improve-software-debugging-with-binary-analysis<p>One of the seriously underutilized tools of the trade in the software development world, at least in my experience, has been binary analysis. We have linters, unit tests, correctness proofs, and static analysis tools to help catch bugs in our software. However, when a bug inevitably pops up that escapes all these checks, it could be hard to fix. Binary analysis can enhance our debugging toolkit by catching bugs that stem from the compiler. While most binary analysis is done in the field of security, many of those principles can be brought into normal software development to fix hard-to-detect problems as well.</p>
<p>For example, I came across a problem recently where a program’s floating-point output would differ across different compilers and operating systems. The code compiled in both cases was the same, except that the computers were running different operating systems and had different versions of the compiler. While we only had support for one operating system, we also used the other operating system for testing purposes, since it had packages in its repo that we needed and that weren’t available on the other one. So, Dockerizing the program was not practical.</p>
<p>Since the only major difference between the two builds was the compiler and CPU, I suspected that it had something to do with the optimization flags, and the binary that the compiler creates. So, I opened up a binary analysis tool to take a look at the offending <code class="language-plaintext highlighter-rouge">.so</code> files.</p>
<p>The two files were virtually identical, with similar control flow graphs and instruction sets. What stood out, however, was that the original binary was using lots of <code class="language-plaintext highlighter-rouge">movapd</code> where the other binary was using <code class="language-plaintext highlighter-rouge">fmul</code>. Hmm, <code class="language-plaintext highlighter-rouge">movapd</code> is an <a href="https://en.wikipedia.org/wiki/SSE2">SSE2 instruction</a>, while <code class="language-plaintext highlighter-rouge">fmul</code> is an <a href="https://en.wikipedia.org/wiki/X87">x87 instruction</a>. SSE2 is a much newer (and faster, if used correctly) instruction set than x87, with the former being released in 2000 and the latter being released in 1980. It seems like our two compiler versions had understood <code class="language-plaintext highlighter-rouge">-O2</code> to mean different things.</p>
<p>As it turns out, GCC enables SSE2 instructions by default for 64-bit programs, as if it had been given the argument <code class="language-plaintext highlighter-rouge">-msse2</code>. However, since we were building a 32-bit program, it defaulted to x87 instructions. This makes sense for general purpose computing, since 32-bit x86 was introduced in 1985, and there’s no guarantee that a 32-bit processor would support the instruction set. SSE2, on the other hand, is part of the baseline AMD64 instruction set, so every 64-bit x86 CPU since the first AMD64 processor was released in 2003 is able to handle it. The difference here arises from how SSE2 calculates floating point operations at double precision, while x87 uses 80-bit extended-precision values for intermediate operations.</p>
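<p>To make the precision difference concrete, here is a minimal sketch (not code from the project in question) that computes the same expression twice: once with every intermediate rounded to a 64-bit <code class="language-plaintext highlighter-rouge">double</code>, as SSE2 would do, and once with a <code class="language-plaintext highlighter-rouge">long double</code> intermediate standing in for the x87 register. The function names are made up, and the second function only behaves differently on platforms where <code class="language-plaintext highlighter-rouge">long double</code> is wider than <code class="language-plaintext highlighter-rouge">double</code>, such as GCC on x86 Linux.</p>

```cpp
#include <cassert>  // only needed for the sanity checks mentioned below
#include <limits>

// (large + tiny) - large with the intermediate sum rounded to a 64-bit
// double, mimicking SSE2 scalar arithmetic. `volatile` forces the sum to
// be stored as a double even if the build keeps values in x87 registers.
double sum_then_subtract_double(double large, double tiny) {
    volatile double sum = large + tiny;
    return sum - large;
}

// The same expression with a long double intermediate, standing in for
// the wider x87 register. Assumption: long double has more mantissa bits
// than double; where it does not, this matches the function above.
long double sum_then_subtract_extended(double large, double tiny) {
    volatile long double sum = static_cast<long double>(large) + tiny;
    return sum - large;
}

// True if long double can hold at least a 64-bit mantissa on this platform.
bool has_extended_precision() {
    return std::numeric_limits<long double>::digits >= 64;
}
```

<p>With <code class="language-plaintext highlighter-rouge">large = 1e16</code> and <code class="language-plaintext highlighter-rouge">tiny = 1.0</code>, the double version returns 0 because 1e16 + 1 is not representable as a double, while the extended version returns 1 whenever <code class="language-plaintext highlighter-rouge">has_extended_precision()</code> is true. A long chain of operations can accumulate many small differences like this one.</p>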
<p>This was a long aside, but this little bug was easily solved by adding a <code class="language-plaintext highlighter-rouge">-msse2</code> to all builds, since we were <em>pretty</em> sure that none of our machines was running on a pre-2000 CPU. However, we would’ve never found this bug, and probably would have resorted to building all our dependencies manually, if we didn’t realize that the compilers created different binaries. So, hopefully having convinced you of the importance of binary analysis, I’ll do my best to run through the basics of it.</p>
<h3 id="frameworks-apps-and-more">Frameworks, apps, and more</h3>
<p>There are quite a few binary analysis frameworks out there, each with its own strengths. Some of the more common ones are:</p>
<ul>
<li><a href="https://www.hex-rays.com/products/ida/">IDA Pro</a>, a popular and powerful proprietary tool with a limited free version. It’s the first tool I use to analyze a binary, though often I will have to use another tool to continue working with it, since the free version doesn’t have a decompiler. I’ve heard good things about the paid addons though.</li>
<li><a href="https://ghidra-sre.org/">Ghidra</a>, a software reverse engineering tool with a very good decompiler. I also really like the binary diff feature, which works with assembly instructions. It’s made by the NSA.</li>
<li><a href="https://angr.io/">angr</a>, which is more of a Python library that happens to work well in a Python shell too. It’s great for writing automated code that works with binary code. It also has a very powerful solver engine that allows you to figure out the inputs of a program given the output. There’s a GUI frontend in development now, but it’s in its early stages.</li>
<li><a href="https://www.radare.org/">radare2</a>, which works very well in a CLI, and happens to have a GUI frontend called Cutter. I use it to run through code with the debugging feature.</li>
</ul>
<p>While I personally use these tools for some specific features that they have, each of these tools has nearly the same functionality, though it may be buried behind a different menu bar or shell command. It’s probably best to try each one out, and select the ones that you like best.</p>
<h3 id="the-binary">The Binary</h3>
<p>On Linux, the binary output of <code class="language-plaintext highlighter-rouge">gcc</code> or <code class="language-plaintext highlighter-rouge">clang</code> (or any compiler that compiles to machine code) is an <a href="https://en.wikipedia.org/wiki/Executable_and_Linkable_Format">ELF executable</a>. By default this executable will be named <code class="language-plaintext highlighter-rouge">a.out</code>, but that can be overridden with <code class="language-plaintext highlighter-rouge">-o</code>. Simply put, ELF is a common standard for how data is arranged in a binary program that divides the file into several parts. These include (in order):</p>
<ul>
<li>The <strong>ELF Header</strong>, which contains basic information about the executable, such as the target architecture, the entry point, and where the program and section headers are located.</li>
<li>The <strong>Program Header</strong>, which contains information about how to execute the program, such as where each segment should be loaded in memory and how large it is.</li>
<li><strong>Sections</strong>, which are of different types. For example, the <code class="language-plaintext highlighter-rouge">.data</code> section contains initialized global and static variables, while the <code class="language-plaintext highlighter-rouge">.rodata</code> section contains read-only data such as constant strings. <code class="language-plaintext highlighter-rouge">.bss</code> sections contain static variables that have not been initialized. The most important section is the <code class="language-plaintext highlighter-rouge">.text</code> section, which contains the program’s machine instructions.
<ul>
<li><strong>Debug Sections</strong> can be found in some ELF executables as well. These are DWARF debug sections that can be used with GDB or Valgrind. They’re identified with headers named <code class="language-plaintext highlighter-rouge">.debug_*</code>.</li>
</ul>
</li>
</ul>
<p><a href="https://github.com/corkami">@corkami</a> created a very nice visualization of the layout of an ELF file here:</p>
<p><a href="https://h313.info/blog/assets/img/elf_poster.png"><img src="https://h313.info/blog/assets/img/elf_poster.png" alt="" /></a></p>
<p>Program files are not the only type of files that utilize ELF. Shared libraries with <code class="language-plaintext highlighter-rouge">.so</code> extensions are ELF files too, and static libraries with <code class="language-plaintext highlighter-rouge">.a</code> extensions are archives of ELF object files. This format plays an important part in getting the dynamic linker to work properly: the <code class="language-plaintext highlighter-rouge">.interp</code> section names the dynamic linker to use, and the <code class="language-plaintext highlighter-rouge">.dynamic</code> section lists the required libraries.</p>
<p>By looking into specific sections of the ELF executable, we can find constants that are used in the code, as well as check for debug data in the executable.</p>
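<p>As a small illustration of how regular this layout is, here is a sketch that inspects the first 16 bytes of a file (the identification bytes at the start of the ELF header) and reports whether it is an ELF file, its class, and its byte order. The type and function names are made up for this example; the byte offsets and values come from the ELF format itself. Reading the bytes from disk is left out to keep it short.</p>

```cpp
#include <array>
#include <cassert>  // only needed for the sanity checks mentioned below
#include <cstdint>
#include <string>

// The first 16 bytes of an ELF file (the e_ident array).
using ElfIdent = std::array<std::uint8_t, 16>;

// Bytes 0-3 hold the magic number: 0x7f followed by "ELF".
bool is_elf(const ElfIdent& ident) {
    return ident[0] == 0x7f && ident[1] == 'E' &&
           ident[2] == 'L' && ident[3] == 'F';
}

// Byte 4 (EI_CLASS): 1 means 32-bit, 2 means 64-bit.
std::string elf_class(const ElfIdent& ident) {
    if (!is_elf(ident)) return "not ELF";
    switch (ident[4]) {
        case 1: return "ELF32";
        case 2: return "ELF64";
        default: return "unknown";
    }
}

// Byte 5 (EI_DATA): 1 means little endian, 2 means big endian.
std::string elf_endianness(const ElfIdent& ident) {
    if (!is_elf(ident)) return "not ELF";
    switch (ident[5]) {
        case 1: return "little endian";
        case 2: return "big endian";
        default: return "unknown";
    }
}
```

<p>For the bytes <code class="language-plaintext highlighter-rouge">7f 45 4c 46 02 01 01</code> followed by zeroes, which is what a typical 64-bit Linux binary starts with, this reports <code class="language-plaintext highlighter-rouge">ELF64</code> and <code class="language-plaintext highlighter-rouge">little endian</code>. Tools like <code class="language-plaintext highlighter-rouge">readelf</code> do the same thing with far more fields.</p>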
<h3 id="the-control-flow-graph">The Control Flow Graph</h3>
<p>What we saw above was the physical layout of the executable, which is how it is stored on the disk. However, a more interesting layout of the executable (at least for binary analysis) is the program’s control flow graph, or CFG. The CFG is a representation of the program’s behaviour, with blocks of instructions as nodes, and instructions like <code class="language-plaintext highlighter-rouge">jmp</code>, <code class="language-plaintext highlighter-rouge">ret</code>, and <code class="language-plaintext highlighter-rouge">call</code> being the edges. This gives us an easy way to understand the behaviour of the program, and we can often match functions to nodes in the program.</p>
<p>For example, take a look at this program, compiled with <code class="language-plaintext highlighter-rouge">g++</code>:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include <iostream>
#include <cstdlib>
</span>
<span class="kt">void</span> <span class="nf">do_thing</span><span class="p">()</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"thing"</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">void</span> <span class="nf">do_other_thing</span><span class="p">()</span> <span class="p">{</span>
<span class="n">std</span><span class="o">::</span><span class="n">cout</span> <span class="o"><<</span> <span class="s">"beep boop"</span> <span class="o"><<</span> <span class="n">std</span><span class="o">::</span><span class="n">endl</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="c1">// Generate and compare 2 random numbers</span>
<span class="k">auto</span> <span class="n">a</span> <span class="o">=</span> <span class="n">rand</span><span class="p">()</span> <span class="o">%</span> <span class="mi">10</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">auto</span> <span class="n">b</span> <span class="o">=</span> <span class="n">rand</span><span class="p">()</span> <span class="o">%</span> <span class="mi">10</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="n">a</span> <span class="o"><</span> <span class="n">b</span><span class="p">)</span>
<span class="n">do_thing</span><span class="p">();</span>
<span class="k">else</span>
<span class="n">do_other_thing</span><span class="p">();</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>When we open the control flow graph of <code class="language-plaintext highlighter-rouge">main()</code> in IDA, we see this:</p>
<p><img src="https://h313.info/blog/assets/img/cfg/gcc_default.png" alt="" /></p>
<p>You can see how each set of instructions generates <code class="language-plaintext highlighter-rouge">a</code> and <code class="language-plaintext highlighter-rouge">b</code>, and particularly the call to <code class="language-plaintext highlighter-rouge">rand()</code> with the <code class="language-plaintext highlighter-rouge">call _rand</code> assembly instruction. We also see that the function splits into two at the if statement, jumping to a call to either the <code class="language-plaintext highlighter-rouge">do_thing()</code> or <code class="language-plaintext highlighter-rouge">do_other_thing()</code> function, and then finally returning and ending the program after the function is called. The CFG is the easiest way to examine how an unknown binary works, and provides good intuition into what a program does.</p>
<p>Now, let’s see what happens when we run it again, but this time with <code class="language-plaintext highlighter-rouge">g++ -O3</code>:</p>
<p><img src="https://h313.info/blog/assets/img/cfg/gcc_o3.png" alt="" /></p>
<p>This time, we see that the random number generation takes far fewer instructions than before. In this case, <code class="language-plaintext highlighter-rouge">-O3</code> compared the two random numbers directly in registers, instead of storing them on the stack and then loading them again. That gets rid of the <code class="language-plaintext highlighter-rouge">mov</code> instructions using the <code class="language-plaintext highlighter-rouge">rbp</code> register, so in essence <code class="language-plaintext highlighter-rouge">main()</code> in the assembly code has become:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">if</span> <span class="p">(</span><span class="n">rand</span><span class="p">()</span> <span class="o">%</span> <span class="mi">10</span> <span class="o">+</span> <span class="mi">1</span> <span class="o"><</span> <span class="n">rand</span><span class="p">()</span> <span class="o">%</span> <span class="mi">10</span> <span class="o">+</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">do_thing</span><span class="p">();</span>
<span class="k">else</span>
<span class="nf">do_other_thing</span><span class="p">();</span>
</code></pre></div></div>
<p>Let’s take another look at it, this time using the Intel C++ Compiler. We compile it this time with <code class="language-plaintext highlighter-rouge">icpc -march=icelake -O3</code>:</p>
<p><img src="https://h313.info/blog/assets/img/cfg/icpc_o3.png" alt="" /></p>
<p>We can see that this time, our compiler inlined the single-line <code class="language-plaintext highlighter-rouge">do_thing()</code> and <code class="language-plaintext highlighter-rouge">do_other_thing()</code> functions. We also see it using registers like <code class="language-plaintext highlighter-rouge">r9d</code>, which are only available in 64-bit mode, and an AVX instruction too, <code class="language-plaintext highlighter-rouge">vstmxcsr</code>. Intel’s compiler does this often since it’s used heavily in high performance computing.</p>
<p>These three compilations all ended up with different binaries, but as we can see the control flow graphs of these programs are almost the same. This shows why the CFG is so important in binary analysis. Not only does it make assembly easier to read, but we can spot similar functions using this as well.</p>
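<p>If you want to play with the idea outside of a GUI, a CFG is just a directed graph. Here is a minimal sketch of one, with basic blocks as numbered nodes and a depth-first search that reports whether the graph contains a cycle, which is how a loop in the source shows up. The struct and function names are made up for illustration; real frameworks like angr or radare2 expose much richer graph objects.</p>

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// A toy control flow graph: block i is a basic block, and successors[i]
// lists the blocks that control can transfer to from it.
struct ControlFlowGraph {
    std::vector<std::vector<std::size_t>> successors;

    // Adds an empty basic block and returns its index.
    std::size_t add_block() {
        successors.emplace_back();
        return successors.size() - 1;
    }

    // Adds an edge such as a jump or fall-through between two blocks.
    void add_edge(std::size_t from, std::size_t to) {
        assert(from < successors.size() && to < successors.size());
        successors[from].push_back(to);
    }
};

enum class Visit { Unseen, InProgress, Done };

// Depth-first search; an edge into a block that is still on the current
// DFS path is a back edge, which means the graph has a cycle.
bool finds_back_edge(const ControlFlowGraph& cfg, std::size_t block,
                     std::vector<Visit>& state) {
    state[block] = Visit::InProgress;
    for (std::size_t next : cfg.successors[block]) {
        if (state[next] == Visit::InProgress) return true;
        if (state[next] == Visit::Unseen && finds_back_edge(cfg, next, state))
            return true;
    }
    state[block] = Visit::Done;
    return false;
}

// True if any block can reach itself again, i.e. the code contains a loop.
bool has_loop(const ControlFlowGraph& cfg) {
    std::vector<Visit> state(cfg.successors.size(), Visit::Unseen);
    for (std::size_t b = 0; b < cfg.successors.size(); ++b) {
        if (state[b] == Visit::Unseen && finds_back_edge(cfg, b, state))
            return true;
    }
    return false;
}
```

<p>The <code class="language-plaintext highlighter-rouge">main()</code> from the example above would be a diamond of four blocks (the comparison, the two branches, and the return), for which <code class="language-plaintext highlighter-rouge">has_loop</code> returns false. Add an edge from the last block back to the first and it returns true.</p>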
<h3 id="analyzing-a-real-cfg">Analyzing a Real CFG</h3>
<p>I’m going to analyze <a href="https://yx7.cc/code/">hyx</a>, a command-line hex editor with vim keybindings, as an example here. This is mostly due to the small size of the code base, the fact that the source code is in C so it’s easier to analyze, and the fact that it compiles into a single binary with no library dependencies. To start, we’ll take a look at the ELF headers with <code class="language-plaintext highlighter-rouge">readelf</code>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>~/hyx-2020.06.09
⟩ readelf -hl hyx
ELF Header:
Magic: 7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
Class: ELF64
Data: 2's complement, little endian
Version: 1 (current)
OS/ABI: UNIX - System V
ABI Version: 0
Type: DYN (Shared object file)
Machine: Advanced Micro Devices X86-64
Version: 0x1
Entry point address: 0xd450
Start of program headers: 64 (bytes into file)
Start of section headers: 231864 (bytes into file)
Flags: 0x0
Size of this header: 64 (bytes)
Size of program headers: 56 (bytes)
Number of program headers: 11
Size of section headers: 64 (bytes)
Number of section headers: 34
Section header string table index: 33
Program Headers:
Type Offset VirtAddr PhysAddr
FileSiz MemSiz Flags Align
PHDR 0x0000000000000040 0x0000000000000040 0x0000000000000040
0x0000000000000268 0x0000000000000268 R 0x8
INTERP 0x00000000000002a8 0x00000000000002a8 0x00000000000002a8
0x000000000000001c 0x000000000000001c R 0x1
[Requesting program interpreter: /lib64/ld-linux-x86-64.so.2]
LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x000000000000caf0 0x000000000000caf0 R 0x1000
LOAD 0x000000000000d000 0x000000000000d000 0x000000000000d000
0x0000000000011cf1 0x0000000000011cf1 R E 0x1000
LOAD 0x000000000001f000 0x000000000001f000 0x000000000001f000
0x0000000000002110 0x0000000000002110 R 0x1000
LOAD 0x0000000000021b88 0x0000000000022b88 0x0000000000022b88
0x000000000000a600 0x000000000000a858 RW 0x1000
DYNAMIC 0x0000000000021b98 0x0000000000022b98 0x0000000000022b98
0x0000000000000200 0x0000000000000200 RW 0x8
NOTE 0x00000000000002c4 0x00000000000002c4 0x00000000000002c4
0x0000000000000044 0x0000000000000044 R 0x4
GNU_EH_FRAME 0x000000000001ffdc 0x000000000001ffdc 0x000000000001ffdc
0x000000000000033c 0x000000000000033c R 0x4
GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
0x0000000000000000 0x0000000000000000 RW 0x10
GNU_RELRO 0x0000000000021b88 0x0000000000022b88 0x0000000000022b88
0x0000000000000478 0x0000000000000478 R 0x1
Section to Segment mapping:
Segment Sections...
00
01 .interp
02 .interp .note.gnu.build-id .note.ABI-tag .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt
03 .init .plt .text .fini
04 .rodata .eh_frame_hdr .eh_frame
05 .init_array .fini_array .dynamic .got .data .bss
06 .dynamic
07 .note.gnu.build-id .note.ABI-tag
08 .eh_frame_hdr
09
10 .init_array .fini_array .dynamic .got
</code></pre></div></div>
<p>We quickly see that this file is dynamically linked, since we have a <code class="language-plaintext highlighter-rouge">.dynsym</code>, <code class="language-plaintext highlighter-rouge">.dynstr</code>, and <code class="language-plaintext highlighter-rouge">.dynamic</code> section. That helps us, since we will be able to find API or C standard library functions that are called in the binary and match them to the source. The <code class="language-plaintext highlighter-rouge">.eh_frame_hdr</code> and <code class="language-plaintext highlighter-rouge">.eh_frame</code> sections hold DWARF-format call frame information used for stack unwinding; they show up in most binaries, so on their own they don’t tell us whether full debug information is present (that would live in <code class="language-plaintext highlighter-rouge">.debug_*</code> sections). Note that in this case I used <code class="language-plaintext highlighter-rouge">make debug</code> to build hyx instead of just <code class="language-plaintext highlighter-rouge">make</code>, so we can preserve function names when we do the analysis. That’s why we will see <code class="language-plaintext highlighter-rouge">ubsan</code> function calls later.</p>
<p>We’re going to use the Cutter frontend of radare2 instead of IDA to examine the control flow graph this time, just because I like the color scheme better. Our <code class="language-plaintext highlighter-rouge">main()</code> function looks like a tentacle monster:</p>
<p><img src="https://h313.info/blog/assets/img/cfg/hyx_main_cfg.png" alt="" /></p>
<p>Instantly, we can tell that there are two loops somewhere in this function, since we see a green arrow on the left column and a blue arrow on the right column that create cycles in the control flow graph. Looking in the source code, we can see those two loops in <code class="language-plaintext highlighter-rouge">hyx.c:142-151</code> and <code class="language-plaintext highlighter-rouge">hyx.c:177-190</code>. This is helpful since it shows us that everything else in that cycle is part of the loop. Let’s take a closer look at the right side of the CFG:</p>
<p><img src="https://h313.info/blog/assets/img/cfg/hyx_strcmp.png" alt="" /></p>
<p>Note that the only appearance of <code class="language-plaintext highlighter-rouge">strcmp</code> in the function is in the <code class="language-plaintext highlighter-rouge">for</code> loop at <code class="language-plaintext highlighter-rouge">hyx.c:142-151</code>. Therefore, we can assume that this entire cycle is what reads the arguments:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="p">(</span><span class="kt">size_t</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">1</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="p">(</span><span class="kt">size_t</span><span class="p">)</span> <span class="n">argc</span><span class="p">;</span> <span class="o">++</span><span class="n">i</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">strcmp</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="s">"-h"</span><span class="p">)</span> <span class="o">||</span> <span class="o">!</span><span class="n">strcmp</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="s">"--help"</span><span class="p">))</span>
<span class="n">help</span><span class="p">(</span><span class="mi">0</span><span class="p">);</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">strcmp</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="s">"-v"</span><span class="p">)</span> <span class="o">||</span> <span class="o">!</span><span class="n">strcmp</span><span class="p">(</span><span class="n">argv</span><span class="p">[</span><span class="n">i</span><span class="p">],</span> <span class="s">"--version"</span><span class="p">))</span>
<span class="n">version</span><span class="p">();</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">filename</span><span class="p">)</span>
<span class="n">filename</span> <span class="o">=</span> <span class="n">argv</span><span class="p">[</span><span class="n">i</span><span class="p">];</span>
<span class="k">else</span>
<span class="n">help</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Now, on to the left side, which features an if statement:</p>
<p><img src="https://h313.info/blog/assets/img/cfg/hyx_ifstatement.png" alt="" /></p>
<p>We can match the calls to <code class="language-plaintext highlighter-rouge">fileno</code> and <code class="language-plaintext highlighter-rouge">isatty</code> in the topmost node of the CFG, as well as the <code class="language-plaintext highlighter-rouge">blob_load</code> at the bottommost node of the CFG, to this snippet:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">blob_init</span><span class="p">(</span><span class="o">&</span><span class="n">blob</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">isatty</span><span class="p">(</span><span class="n">fileno</span><span class="p">(</span><span class="n">stdin</span><span class="p">)))</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">filename</span><span class="p">)</span> <span class="n">help</span><span class="p">(</span><span class="n">EXIT_FAILURE</span><span class="p">);</span>
<span class="n">blob_load_stream</span><span class="p">(</span><span class="o">&</span><span class="n">blob</span><span class="p">,</span> <span class="n">stdin</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="o">!</span><span class="n">freopen</span><span class="p">(</span><span class="s">"/dev/tty"</span><span class="p">,</span> <span class="s">"r"</span><span class="p">,</span> <span class="n">stdin</span><span class="p">))</span>
<span class="n">pdie</span><span class="p">(</span><span class="s">"could not reopen controlling TTY"</span><span class="p">);</span>
<span class="p">}</span>
<span class="k">else</span> <span class="p">{</span>
<span class="n">blob_load</span><span class="p">(</span><span class="o">&</span><span class="n">blob</span><span class="p">,</span> <span class="n">filename</span><span class="p">);</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Notice here that passing <code class="language-plaintext highlighter-rouge">blob</code> by reference to <code class="language-plaintext highlighter-rouge">blob_init</code> was done with the <code class="language-plaintext highlighter-rouge">lea rax, blob</code> instruction and then calling the function. That instruction stores the address of <code class="language-plaintext highlighter-rouge">blob</code> in <code class="language-plaintext highlighter-rouge">rax</code>, much like a pointer in normal C. This resemblance is by design, since x86 was made to support languages like C.</p>
<p>The other sections of the if statement are self-explanatory. The middle two nodes with only two instructions check whether a filename was given, and if so call <code class="language-plaintext highlighter-rouge">help(EXIT_FAILURE)</code>. <code class="language-plaintext highlighter-rouge">blob_load_stream</code> and the following if statement are merged into one node.</p>
<p>As you can see, it’s surprisingly easy to figure out how the program works in assembly, even when we don’t build it with debug symbols. This really helps in debugging a program without those symbols as well, since we can still trace through the program even when GDB fails us. As we will soon see, symbolic execution can complement debuggers and sometimes even replace them when debug symbols are absent.</p>
<h3 id="symbolic-execution">Symbolic Execution</h3>
<p>This is one of the most important tools in a reverse engineering arsenal. It allows us to take an expected output and work out the inputs needed to produce it. This is especially useful for debugging edge cases where the output is known but the input is not. Let’s take a look at a program like this one:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include <cstdio>
#include <cstring>
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">(</span><span class="kt">int</span> <span class="n">argc</span><span class="p">,</span> <span class="kt">char</span> <span class="o">**</span><span class="n">argv</span><span class="p">)</span> <span class="p">{</span>
  <span class="c1">// Fail if any arguments were passed (argc counts the program name itself)</span>
<span class="k">if</span> <span class="p">(</span><span class="n">argc</span> <span class="o">!=</span> <span class="mi">1</span><span class="p">)</span>
<span class="k">return</span> <span class="mi">1</span><span class="p">;</span>
<span class="kt">char</span> <span class="n">nice</span><span class="p">[</span><span class="mi">7</span><span class="p">]</span> <span class="o">=</span> <span class="s">"TOASTY"</span><span class="p">,</span> <span class="n">input</span><span class="p">[</span><span class="mi">100</span><span class="p">];</span>
  <span class="c1">// We don't want to change the null terminator</span>
<span class="k">for</span> <span class="p">(</span><span class="kt">int</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">6</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span>
<span class="n">nice</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">nice</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">+</span> <span class="n">i</span><span class="p">;</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"Enter something: </span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="n">scanf</span><span class="p">(</span><span class="s">"%s"</span><span class="p">,</span> <span class="n">input</span><span class="p">);</span>
<span class="k">if</span> <span class="p">(</span><span class="n">strcmp</span><span class="p">(</span><span class="n">nice</span><span class="p">,</span> <span class="n">input</span><span class="p">)</span> <span class="o">!=</span> <span class="mi">0</span><span class="p">)</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"u suck</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="k">else</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"good job</span><span class="se">\n</span><span class="s">"</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Now, if we put in the correct string, we will get the program to print out <code class="language-plaintext highlighter-rouge">good job</code>. However, the string is not exactly <code class="language-plaintext highlighter-rouge">TOASTY</code> anymore after the addition we did. Maybe the CFG can help us figure out the right input. Let’s use radare2 again to generate an aesthetic CFG:</p>
<p><img src="https://h313.info/blog/assets/img/cfg/problem_show.png" alt="" /></p>
<p>Here we see the if statement branching into one of two possible branches depending on the result of a call to the <code class="language-plaintext highlighter-rouge">strcmp</code> function, as expected. But the string that <code class="language-plaintext highlighter-rouge">rsi</code> points to has been modified before the comparison, so it no longer reads <code class="language-plaintext highlighter-rouge">TOASTY</code>. We’ll have to use symbolic execution to get the correct input.</p>
<p>First, let’s find the address of the instruction that loads the “good job” string by right-clicking on the <code class="language-plaintext highlighter-rouge">lea rdi, str.good_job</code> assembly instruction. This is at <code class="language-plaintext highlighter-rouge">0x0000122c</code> for me. The same method shows that the <code class="language-plaintext highlighter-rouge">lea rdi, str.u_suck</code> instruction for the incorrect input is at <code class="language-plaintext highlighter-rouge">0x0000121e</code>. That gives us the address that the symbolic executor should find and the one that it should avoid. Now, we can do a symbolic execution run using <code class="language-plaintext highlighter-rouge">angr</code> in a Python 3 shell:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>>>> import angr
>>> p = angr.Project('toasty-test')
WARNING | 2020-11-05 18:08:11,453 | cle.loader | The main binary is a position-independent executable. It is being loaded with a base address of 0x400000.
>>> p.factory.block(p.entry).pp()
0x401130: endbr64
0x401134: xor ebp, ebp
0x401136: mov r9, rdx
0x401139: pop rsi
0x40113a: mov rdx, rsp
0x40113d: and rsp, 0xfffffffffffffff0
0x401141: push rax
0x401142: push rsp
0x401143: lea r8, [rip + 0x156]
0x40114a: lea rcx, [rip + 0xdf]
0x401151: lea rdi, [rip - 0xe8]
0x401158: call qword ptr [rip + 0x2e8a]
>>> sm = p.factory.simulation_manager()
>>> sm.explore(find=0x40122c, avoid=0x40121e)
WARNING | 2020-11-05 18:08:58,679 | angr.storage.memory_mixins.default_filler_mixin | The program is accessing memory or registers with an unspecified value. This could indicate unwanted behavior.
WARNING | 2020-11-05 18:08:58,679 | angr.storage.memory_mixins.default_filler_mixin | angr will cope with this by generating an unconstrained symbolic variable and continuing. You can resolve this by:
WARNING | 2020-11-05 18:08:58,679 | angr.storage.memory_mixins.default_filler_mixin | 1) setting a value to the initial state
WARNING | 2020-11-05 18:08:58,679 | angr.storage.memory_mixins.default_filler_mixin | 2) adding the state option ZERO_FILL_UNCONSTRAINED_{MEMORY,REGISTERS}, to make unknown regions hold null
WARNING | 2020-11-05 18:08:58,679 | angr.storage.memory_mixins.default_filler_mixin | 3) adding the state option SYMBOL_FILL_UNCONSTRAINED_{MEMORY,REGISTERS}, to suppress these messages.
WARNING | 2020-11-05 18:08:58,680 | angr.storage.memory_mixins.default_filler_mixin | Filling memory at 0x7fffffffffeff40 with 8 unconstrained bytes referenced from 0xa8f900 (strcmp+0x0 in libc.so.6 (0x8f900))
<SimulationManager with 1 found, 1 avoid>
>>> sm.found[0].posix.stdin.concretize()[0]
b'TPCVX^\x00\x02I\xe0\x02\x89\x80\x02\xa4\x08\x02)\x08\x89\x8aI*J\x02\x89I\x89\x19\x01\x89I\x10\x19\x89\x02\x02\x02\x02\x01\x08\x02\x02\x08\x01\x02\x00\x02\x01\x01\x02\x01\x01\x01\x02\x01\x01\x01\x00\x02'
</code></pre></div></div>
<p>And there you have it. The last line tells us that entering <code class="language-plaintext highlighter-rouge">TPCVX^</code> will give us the <code class="language-plaintext highlighter-rouge">good job</code> message. Everything from the first <code class="language-plaintext highlighter-rouge">\x00</code> byte onward can be dropped, since that byte is the null terminator that ends the string. Notice how we found that the instruction for an incorrect input was at <code class="language-plaintext highlighter-rouge">0x0000121e</code>, but the address we gave angr was <code class="language-plaintext highlighter-rouge">0x40121e</code>. That’s because the <code class="language-plaintext highlighter-rouge">cle</code> loader that angr uses loads the executable at a base address of <code class="language-plaintext highlighter-rouge">0x400000</code>, so the instruction offset we got above becomes <code class="language-plaintext highlighter-rouge">0x400000</code> + <code class="language-plaintext highlighter-rouge">0x121e</code> = <code class="language-plaintext highlighter-rouge">0x40121e</code> when we use it in angr.</p>
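<p>As a quick sanity check on those two details, here is a minimal Python sketch (not part of the original session) that trims the concretized bytes at the first null byte and rebases a radare2 offset onto the base address reported by <code class="language-plaintext highlighter-rouge">cle</code>. The helper names <code class="language-plaintext highlighter-rouge">trim_at_nul</code> and <code class="language-plaintext highlighter-rouge">rebase</code> are made up for illustration; only the constants come from the run above.</p>

```python
# Hypothetical helpers for illustration; only the constants come from the run above.
BASE_ADDR = 0x400000  # base address cle reported for the position-independent binary


def trim_at_nul(data: bytes) -> bytes:
    """Keep only the bytes before the first null byte, like a C string would."""
    return data.split(b"\x00", 1)[0]


def rebase(offset: int, base: int = BASE_ADDR) -> int:
    """Translate an offset shown by radare2 into the address angr expects."""
    return base + offset


# A shortened copy of the concretized stdin from the session above.
concretized = b"TPCVX^\x00\x02I\xe0\x02\x89"

print(trim_at_nul(concretized).decode("ascii"))
print(hex(rebase(0x122C)), hex(rebase(0x121E)))

# Cross-check against the source: the loop adds i to the i-th letter of "TOASTY".
expected = "".join(chr(ord(ch) + i) for i, ch in enumerate("TOASTY"))
print(expected)
```

<p>Since we happen to have the source for this toy program, the last few lines re-derive the answer directly, which is a handy way to confirm what the solver produced.</p>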
<p>While the above run might look like brute-forcing inputs until the right instruction was hit, the way angr and other symbolic execution frameworks solve problems like this is much more ingenious. They interpret the assembly code while keeping every input to the program as a symbol (in the algebraic sense). Each conditional branch taken along a path adds a constraint on those symbols, so by the time a path reaches the target we have an expression for each input, with the constraints on the expression being the conditional statements that the program went through.</p>
<p>Testing software is an especially good use case for symbolic execution, since it’s often hard to find the inputs that trigger the outputs we want to test. It’s kind of like a correctness proof but with less mathematical rigor. Automating symbolic execution and CFG analysis is easy too, especially with angr and radare2. Ghidra’s automation requires Java and a GUI, which is much harder, while IDA Pro’s automation costs a lot. This provides lots of opportunities for new ways to test code, especially in places where the code coverage is low.</p>
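<p>To make the idea a bit more concrete, here is a toy sketch in plain Python. It is not how angr works internally; it only mimics the bookkeeping for this one program. Every input byte is treated as an unknown, the path to <code class="language-plaintext highlighter-rouge">good job</code> contributes one constraint per byte, and a naive search over byte values stands in for the real constraint solver. All names here are hypothetical.</p>

```python
# Toy model of the "good job" path; all names here are hypothetical.
SEED = "TOASTY"


def good_job_constraints():
    """Return one predicate per input byte that must hold for strcmp to return 0."""
    constraints = []
    for i, ch in enumerate(SEED):
        target = ord(ch) + i  # what nice[i] holds after the loop has run
        constraints.append(lambda byte, target=target: byte == target)
    # strcmp also requires the input to end exactly where nice ends
    constraints.append(lambda byte: byte == 0)
    return constraints


def solve(constraints):
    """Pick one byte value satisfying each constraint (a stand-in for a solver)."""
    solution = bytearray()
    for holds in constraints:
        solution.append(next(b for b in range(256) if holds(b)))
    return bytes(solution)


model = solve(good_job_constraints())
print(model)
```

<p>The point of the sketch is the shape of the problem: the program is never run on guessed inputs. The constraints are collected first, and only then are concrete bytes chosen to satisfy them.</p>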
<h3 id="conclusion">Conclusion</h3>
<p>Binary analysis is already an invaluable tool for malware researchers and is pretty active as a research area. But it also has its uses for developers, and comes in especially handy when dealing with bugs caused by compilers. It can also be used for working out a bug without debug symbols, where GDB may not be of much use. It’s worth it to learn if only to understand how to better optimize code for compilers. It’s also very interesting (and free) to poke around at binaries, so why not try it?</p>Haoda Wangharry@h313.infoOne of the seriously underutilized tools of the trade in the software development world, at least in my experience, has been binary analysis. We have linters, unit tests, correctness proofs, and static analysis tools to help catch bugs in our software. However, when a bug inevitably pops up that escapes all these checks, it could be hard to fix. Binary analysis can enhance our debugging toolkit by catching bugs that stem from the compiler. While most binary analysis is done in the field of security, many of those principles can be brought into normal software development to fix hard-to-detect problems as well.Does Having an Anime Profile Picture Make You a Better Programmer?2020-07-31T06:33:13+00:002020-07-31T06:33:13+00:00https://h313.info/github/anime/google-cloud/2020/07/31/does-having-an-anime-profile-picture-make-you-a-better-programmer<p>In her 2001 book <em>Anime from Akira to Princess Mononoke</em>, Professor Napier showed that many fans of
anime work in computer science and its related fields. The survey also happened to show that “over
70 percent had a grade point average of 3.0 or higher, which is especially impressive when one
considers the academic rigor of scientific fields.”</p>
<p>Anime has a pretty well-known reputation for creating <a href="https://youtu.be/755BDwzxv5c?t=3">men of culture</a>. That’s a clear
indication that anime fans can be profoundly affected by the medium. In addition, many prolific
open source contributors have anime characters as their profile picture. So that got me to thinking,
does being a fan of anime also make you a more intelligent person?</p>
<p><img src="https://raw.githubusercontent.com/laynH/Anime-Girls-Holding-Programming-Books/master/C%2B%2B/Sakura_Nene_CPP.jpg" alt="sakura nene cpp" /></p>
<p>Of course, a question like that is nearly impossible to answer directly. After all, there are
countless ways to measure intelligence, and anime fandom is so broad that no one definition can fit
all cases. For example, should we consider someone who has only watched <em>Spirited Away</em>, and liked
it very much, but has no exposure to other forms of anime, to be an anime fan? What about people who
only read manga? Or those who exclusively watch <a href="https://en.wikipedia.org/wiki/The_Leader_(web_series)">whatever this is supposed to be</a>?</p>
<p>A smaller question that’s easily answerable would be to see if having an anime profile picture
correlates with you being a better programmer. After all, if someone takes the effort to set their
profile picture to a waifu, they clearly have some fondness for anime. As for being a “better
programmer,” we’ll just equate being better with having more activity on GitHub. And being good at
programming does require a certain amount of critical reasoning and logic skill, which should equate to
higher intelligence. Of course, this metric could easily be abused by having a <code class="language-plaintext highlighter-rouge">cron</code> job make a
ton of commits, but it’s a measure of programming activity that should be Good Enough™.</p>
<p>Luckily, Google provides their <a href="https://cloud.google.com/vision/">image labelling API</a> for very cheap (or free, if you
have GCP credit). As an example, putting in an image of best girl Mai Sakurajima from <em>Rascal Does
Not Dream of Bunny Girl Senpai</em> into the demo provided, I’ll get this list of labels back from it:</p>
<p><img src="https://h313.info/blog/assets/img/mai_google_vision.png" alt="mai" /></p>
<p>Notice how one of the labels is “Anime”? That’s a surprise tool that will help us later :) Google
also provides a Python API, which makes it even easier to check images, since all you have to do now
is check if “Anime” is one of the tags:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">def</span> <span class="nf">anime_or_not</span><span class="p">(</span><span class="n">image</span><span class="p">):</span>
    <span class="c1"># Ask the Vision API for labels and look for the "Anime" tag</span>
    <span class="n">response</span> <span class="o">=</span> <span class="n">client</span><span class="p">.</span><span class="n">label_detection</span><span class="p">(</span><span class="n">image</span><span class="o">=</span><span class="n">image</span><span class="p">)</span>
    <span class="n">labels</span> <span class="o">=</span> <span class="n">response</span><span class="p">.</span><span class="n">label_annotations</span>
    <span class="k">for</span> <span class="n">item</span> <span class="ow">in</span> <span class="n">labels</span><span class="p">:</span>
        <span class="k">if</span> <span class="n">item</span><span class="p">.</span><span class="n">description</span> <span class="o">==</span> <span class="s">"Anime"</span><span class="p">:</span>
            <span class="k">return</span> <span class="bp">True</span>
    <span class="k">return</span> <span class="bp">False</span>
</code></pre></div></div>
<p>As for GitHub commits, we can use the <a href="https://docs.github.com/v3/activity/event_types/">events API</a>, which covers much of the same activity as the
contribution history graph of a user. We’ll be measuring user activity just by the number of events
for each user, so each event (opening a PR, creating a repo, etc.) is given equal weight. That’s
roughly analogous to how green a user’s contribution heatmap is.</p>
<p><img src="https://h313.info/blog/assets/img/github_contribution_graph.png" alt="contribution map" /></p>
<p><a href="https://pygithub.readthedocs.io/en/latest/">PyGithub</a> wraps the GitHub API into an easy-to-use library, so getting the number of
events for a user, as well as their profile picture’s URL, is pretty simple:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="n">users</span> <span class="o">=</span> <span class="n">g</span><span class="p">.</span><span class="n">get_users</span><span class="p">()</span>
<span class="k">for</span> <span class="n">user</span> <span class="ow">in</span> <span class="n">users</span><span class="p">:</span>
    <span class="n">event_count</span> <span class="o">=</span> <span class="mi">0</span>
    <span class="k">for</span> <span class="n">event</span> <span class="ow">in</span> <span class="n">user</span><span class="p">.</span><span class="n">get_events</span><span class="p">():</span>
        <span class="n">event_count</span> <span class="o">+=</span> <span class="mi">1</span>
    <span class="n">is_anime_image</span> <span class="o">=</span> <span class="n">check_if_weeb</span><span class="p">(</span><span class="n">user</span><span class="p">.</span><span class="n">avatar_url</span><span class="p">)</span>
</code></pre></div></div>
<p>GitHub does rate limit the API to 5000 requests per hour for authenticated users. Since each profile
takes more than one request, that’s enough to process about 2000 profiles per hour. To work within that limit, we can take advantage of how GitHub profile
IDs are numbered sequentially and process profiles in batches of 1000:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="k">for</span> <span class="n">github_id</span> <span class="ow">in</span> <span class="nb">range</span><span class="p">(</span><span class="mi">1200000</span><span class="p">,</span> <span class="mi">1201000</span><span class="p">):</span>
    <span class="k">try</span><span class="p">:</span>
        <span class="n">user</span> <span class="o">=</span> <span class="n">g</span><span class="p">.</span><span class="n">get_user</span><span class="p">(</span><span class="n">github_id</span><span class="p">)</span>
    <span class="k">except</span> <span class="nb">Exception</span><span class="p">:</span>
        <span class="k">continue</span>
    <span class="c1"># do user stuff here
</span></code></pre></div></div>
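<p>One way to organise those batches is sketched below. The 5000-request budget and the batch size of 1000 come from the text above; <code class="language-plaintext highlighter-rouge">REQUESTS_PER_PROFILE</code> is an assumed average (not a measured value) picked to match the rough 2000-per-hour estimate, and <code class="language-plaintext highlighter-rouge">id_batches</code> is a hypothetical helper.</p>

```python
from typing import Iterator

HOURLY_REQUEST_LIMIT = 5000  # authenticated rate limit quoted above
REQUESTS_PER_PROFILE = 2.5   # assumption: user lookup plus pages of events
BATCH_SIZE = 1000            # batch size used in the post


def id_batches(start: int, stop: int, size: int = BATCH_SIZE) -> Iterator[range]:
    """Yield consecutive ranges of GitHub user IDs, each at most `size` long."""
    for first in range(start, stop, size):
        yield range(first, min(first + size, stop))


# How many profiles fit into one hour of API budget under the assumption above.
profiles_per_hour = int(HOURLY_REQUEST_LIMIT / REQUESTS_PER_PROFILE)
batches = list(id_batches(1_200_000, 1_203_500))

print(profiles_per_hour)
print([(b.start, b.stop) for b in batches])
```

<p>Each range can then be fed to the loop above, pausing between batches whenever the hourly budget runs out.</p>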
<p>I’ve modified the <code class="language-plaintext highlighter-rouge">get_user</code> function here to use the undocumented <code class="language-plaintext highlighter-rouge">/user/:id</code> endpoint. This hasn’t
been implemented in PyGithub yet, but <a href="https://github.com/PyGithub/PyGithub/issues/1615">this issue</a> seems to be tracking it.</p>
<p>All that’s left is to link these APIs up and save the data. It’s trivial to just loop through all
users using the <code class="language-plaintext highlighter-rouge">/users</code> GitHub API endpoint, send their image over to the Google Vision API, note
down whether they had an anime profile picture and the number of events for that user, and finally
log it into a CSV for analysis later. That’s exactly what I did, and you can see my code
<a href="https://github.com/h313/anime-face">here</a>. It’s very much research-quality code, so don’t expect much.</p>
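<p>The logging step might look roughly like the sketch below. It is not the code from the repository. The column names <code class="language-plaintext highlighter-rouge">is_anime_face</code> and <code class="language-plaintext highlighter-rouge">contribs</code> are borrowed from the analysis snippet later in this post, while <code class="language-plaintext highlighter-rouge">ProfileRecord</code>, <code class="language-plaintext highlighter-rouge">write_records</code>, and the sample rows are placeholders made up for illustration.</p>

```python
import csv
import io
from dataclasses import astuple, dataclass


@dataclass
class ProfileRecord:
    github_id: int
    is_anime_face: bool
    contribs: int  # number of events seen for the user


def write_records(records, out):
    """Write a header plus one CSV row per profile to the file-like object `out`."""
    writer = csv.writer(out)
    writer.writerow(["github_id", "is_anime_face", "contribs"])
    for record in records:
        writer.writerow(astuple(record))


# Placeholder rows, not real measurements.
sample = [ProfileRecord(1200000, False, 12), ProfileRecord(1200001, True, 40)]
buffer = io.StringIO()
write_records(sample, buffer)
print(buffer.getvalue())
```

<p>Writing to a file instead only requires passing an object opened with <code class="language-plaintext highlighter-rouge">open(path, "w", newline="")</code> in place of the in-memory buffer.</p>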
<p>So now I’ve got a table of 3497 GitHub profiles, of which only 23 have anime profile pictures.
Here’s a box plot that displays the distribution of user activity by profile picture type:</p>
<p><img src="https://h313.info/blog/assets/img/github_boxplot.png" alt="box plot" /></p>
<p>Hmm, the users with an anime profile picture do seem to have a higher average number of
activities. But we can’t stop here. Keep in mind that there are far more samples of users without
anime profile pictures than with them, and that both groups have a comparatively high number of
outliers. To be sure that the difference here is statistically significant,
we’ll need to do a t-test:</p>
<div class="language-python highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="kn">from</span> <span class="nn">scipy.stats</span> <span class="kn">import</span> <span class="n">ttest_ind</span>
<span class="n">cat1</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="n">df</span><span class="p">[</span><span class="s">'is_anime_face'</span><span class="p">]</span> <span class="o">==</span> <span class="bp">True</span><span class="p">]</span>
<span class="n">cat2</span> <span class="o">=</span> <span class="n">df</span><span class="p">[</span><span class="n">df</span><span class="p">[</span><span class="s">'is_anime_face'</span><span class="p">]</span> <span class="o">==</span> <span class="bp">False</span><span class="p">]</span>
<span class="n">ttest_ind</span><span class="p">(</span><span class="n">cat1</span><span class="p">[</span><span class="s">'contribs'</span><span class="p">],</span> <span class="n">cat2</span><span class="p">[</span><span class="s">'contribs'</span><span class="p">])</span>
</code></pre></div></div>
<p>That provides a p-value of <code class="language-plaintext highlighter-rouge">0.2371</code>. We now have to conclude that the higher average we got isn’t
statistically significant, since our p-value of 23.7% doesn’t meet the traditional 5% cutoff.
Therefore, we must once again acquiesce to <a href="https://en.wikipedia.org/wiki/Betteridge's_law_of_headlines">Betteridge’s law</a> and fail to reject our null
hypothesis: this data gives us no evidence that having an anime profile picture correlates with your abilities
as a programmer.</p>
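<p>Given the small anime group and the outliers in both groups, a permutation test would be a reasonable cross-check, since it makes no assumption about how the event counts are distributed. The sketch below is a swapped-in technique, not what was run for this post; <code class="language-plaintext highlighter-rouge">permutation_p_value</code> is a hypothetical helper and the two lists are made-up numbers, not the collected data.</p>

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(a, b, rounds=10_000, seed=0):
    """Two-sided permutation test on the difference of group means."""
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(rounds):
        # Reassign group labels at random and see how extreme the gap gets.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (rounds + 1)


# Made-up event counts purely to exercise the function.
anime = [12, 40, 3, 55, 8]
other = [5, 9, 14, 2, 30, 7, 11, 4, 21, 6]
print(permutation_p_value(anime, other))
```

<p>With the real CSV, the two lists would be the <code class="language-plaintext highlighter-rouge">contribs</code> column split by <code class="language-plaintext highlighter-rouge">is_anime_face</code>, and the resulting p-value is read against the same 5% cutoff.</p>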
<p>Further work into this topic can be done, however. Since this project only looked at a small
number of users, who were among the first to register, it is not a representative slice of the
GitHub user population. In addition, it may also be enlightening to include the inactive users
skipped in this experiment.</p>Haoda Wangharry@h313.infoIn her 2001 book Anime from Akira to Princess Mononoke, Professor Napier showed that many fans of anime work in computer science and its related fields. The survey also happened to show that “over 70 percent had a grade point average of 3.0 or higher, which is especially impressive when one considers the academic rigor of scientific fields.”This is What Peak Hello World Looks Like2020-05-17T08:15:00+00:002020-05-17T08:15:00+00:00https://h313.info/cpp/2020/05/17/this-is-what-peak-hello-world-looks-like<p>Everybody’s done a Hello World program before. But now that I’ve got a few years of experience with the language, I set out to ask one of the most pressing questions out there - how do we make Hello World in C as convoluted and hard to understand as possible? This post documents the final result of a sleep-deprived me trying to do exactly that.</p>
<p>I quickly realized I had to set a few ground rules first so there would be a sensible limit for lines of code that are “useless”:</p>
<ul>
<li>Since we’re using pure C, we won’t have classes (so no <code class="language-plaintext highlighter-rouge">HelloWorldFactoryFactoryFactorySingleton</code>s)</li>
<li>All <code class="language-plaintext highlighter-rouge">#include</code> directives should be essential to the program (so no chaining a billion <code class="language-plaintext highlighter-rouge">.h</code> files)</li>
<li>Each function must do something essential to the program (so no functions that just call another or useless <code class="language-plaintext highlighter-rouge">#defines</code>)</li>
<li>The program takes no input, and writes <code class="language-plaintext highlighter-rouge">Hello World!</code> exactly to the terminal (so no other calculations are done)</li>
</ul>
<p>With that, let’s start by building our string in a separate function with <code class="language-plaintext highlighter-rouge">malloc</code> instead of using a string literal:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include <stdio.h>
#include <stdlib.h>
</span>
<span class="kt">char</span><span class="o">*</span> <span class="nf">generate_words</span><span class="p">()</span> <span class="p">{</span>
<span class="kt">char</span><span class="o">*</span> <span class="n">ret</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="mi">13</span><span class="p">);</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="sc">'H'</span><span class="p">;</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="sc">'e'</span><span class="p">;</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="sc">'l'</span><span class="p">;</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="o">=</span> <span class="sc">'l'</span><span class="p">;</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span> <span class="o">=</span> <span class="sc">'o'</span><span class="p">;</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">=</span> <span class="sc">' '</span><span class="p">;</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">6</span><span class="p">]</span> <span class="o">=</span> <span class="sc">'W'</span><span class="p">;</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">7</span><span class="p">]</span> <span class="o">=</span> <span class="sc">'o'</span><span class="p">;</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">8</span><span class="p">]</span> <span class="o">=</span> <span class="sc">'r'</span><span class="p">;</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">9</span><span class="p">]</span> <span class="o">=</span> <span class="sc">'l'</span><span class="p">;</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">10</span><span class="p">]</span> <span class="o">=</span> <span class="sc">'d'</span><span class="p">;</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">11</span><span class="p">]</span> <span class="o">=</span> <span class="sc">'!'</span><span class="p">;</span>
  <span class="n">ret</span><span class="p">[</span><span class="mi">12</span><span class="p">]</span> <span class="o">=</span> <span class="sc">'\0'</span><span class="p">;</span> <span class="c1">// Can't forget our null terminator!</span>
<span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="kt">char</span> <span class="o">*</span><span class="n">ret</span> <span class="o">=</span> <span class="n">generate_words</span><span class="p">();</span>
<span class="n">printf</span><span class="p">(</span><span class="n">ret</span><span class="p">);</span>
<span class="n">free</span><span class="p">(</span><span class="n">ret</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>That’s already much more complicated than before, but we can make it even worse! Replacing our direct assignments of letters with bitwise operations gets us this monstrosity:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include <stdio.h>
#include <stdlib.h>
</span>
<span class="kt">char</span><span class="o">*</span> <span class="nf">generate_words</span><span class="p">()</span> <span class="p">{</span>
  <span class="kt">char</span><span class="o">*</span> <span class="n">ret</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="mi">13</span><span class="p">);</span> <span class="c1">// 12 characters plus the null terminator</span>
<span class="kt">char</span> <span class="n">c</span> <span class="o">=</span> <span class="mh">0x01</span><span class="p">;</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">c</span> <span class="o"><<</span> <span class="mi">6</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">c</span> <span class="o"><<</span> <span class="mi">3</span><span class="p">);</span> <span class="c1">// H is 0x48</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="n">c</span> <span class="o">|</span> <span class="p">(</span><span class="n">c</span> <span class="o"><<</span> <span class="mi">2</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">c</span> <span class="o"><<</span> <span class="mi">5</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">c</span> <span class="o"><<</span> <span class="mi">6</span><span class="p">);</span> <span class="c1">// 'e' is 101</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="p">((</span><span class="n">c</span> <span class="o">|</span> <span class="p">(</span><span class="n">c</span> <span class="o"><<</span> <span class="mi">1</span><span class="p">))</span> <span class="o"><<</span> <span class="mi">2</span><span class="p">)</span> <span class="o">|</span> <span class="p">((</span><span class="n">c</span> <span class="o">|</span> <span class="p">(</span><span class="n">c</span> <span class="o"><<</span> <span class="mi">1</span><span class="p">))</span> <span class="o"><<</span> <span class="mi">5</span><span class="p">);</span> <span class="c1">// 'l' is 108</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">3</span><span class="p">]</span> <span class="o">=</span> <span class="p">((</span><span class="n">c</span> <span class="o">|</span> <span class="p">(</span><span class="n">c</span> <span class="o"><<</span> <span class="mi">1</span><span class="p">))</span> <span class="o"><<</span> <span class="mi">2</span><span class="p">)</span> <span class="o">|</span> <span class="p">((</span><span class="n">c</span> <span class="o">|</span> <span class="p">(</span><span class="n">c</span> <span class="o"><<</span> <span class="mi">1</span><span class="p">))</span> <span class="o"><<</span> <span class="mi">5</span><span class="p">);</span> <span class="c1">// 'l' is 108</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">c</span> <span class="o">|</span> <span class="p">(</span><span class="n">c</span> <span class="o"><<</span> <span class="mi">1</span><span class="p">))</span> <span class="o">|</span> <span class="p">((</span><span class="n">c</span> <span class="o">|</span> <span class="p">(</span><span class="n">c</span> <span class="o"><<</span> <span class="mi">1</span><span class="p">))</span> <span class="o"><<</span> <span class="mi">2</span><span class="p">)</span> <span class="o">|</span> <span class="p">((</span><span class="n">c</span> <span class="o">|</span> <span class="p">(</span><span class="n">c</span> <span class="o"><<</span> <span class="mi">1</span><span class="p">))</span> <span class="o"><<</span> <span class="mi">5</span><span class="p">);</span> <span class="c1">// 'o' is 111</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">c</span> <span class="o"><<</span> <span class="mi">5</span><span class="p">);</span> <span class="c1">// ' ' is 32</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">6</span><span class="p">]</span> <span class="o">=</span> <span class="n">c</span> <span class="o">|</span> <span class="p">(</span><span class="n">c</span> <span class="o"><<</span> <span class="mi">2</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">c</span> <span class="o"><<</span> <span class="mi">1</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">c</span> <span class="o"><<</span> <span class="mi">1</span><span class="p">)</span> <span class="o"><<</span> <span class="mi">3</span> <span class="o">|</span> <span class="p">(</span><span class="n">c</span> <span class="o"><<</span> <span class="mi">1</span><span class="p">)</span> <span class="o"><<</span> <span class="mi">5</span><span class="p">;</span> <span class="c1">// 'W' is 87</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">7</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">c</span> <span class="o">|</span> <span class="p">(</span><span class="n">c</span> <span class="o"><<</span> <span class="mi">1</span><span class="p">))</span> <span class="o">|</span> <span class="p">((</span><span class="n">c</span> <span class="o">|</span> <span class="p">(</span><span class="n">c</span> <span class="o"><<</span> <span class="mi">1</span><span class="p">))</span> <span class="o"><<</span> <span class="mi">2</span><span class="p">)</span> <span class="o">|</span> <span class="p">((</span><span class="n">c</span> <span class="o">|</span> <span class="p">(</span><span class="n">c</span> <span class="o"><<</span> <span class="mi">1</span><span class="p">))</span> <span class="o"><<</span> <span class="mi">5</span><span class="p">);</span> <span class="c1">// 'o' is 111</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">8</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">c</span> <span class="o"><<</span> <span class="mi">1</span><span class="p">)</span> <span class="o">|</span> <span class="p">((</span><span class="n">c</span> <span class="o"><<</span> <span class="mi">1</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">c</span> <span class="o"><<</span> <span class="mi">2</span><span class="p">)</span> <span class="o">|</span> <span class="n">c</span><span class="p">)</span> <span class="o"><<</span> <span class="mi">4</span><span class="p">;</span> <span class="c1">// 'r' is 114</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">9</span><span class="p">]</span> <span class="o">=</span> <span class="p">((</span><span class="n">c</span> <span class="o">|</span> <span class="p">(</span><span class="n">c</span> <span class="o"><<</span> <span class="mi">1</span><span class="p">))</span> <span class="o"><<</span> <span class="mi">2</span><span class="p">)</span> <span class="o">|</span> <span class="p">((</span><span class="n">c</span> <span class="o">|</span> <span class="p">(</span><span class="n">c</span> <span class="o"><<</span> <span class="mi">1</span><span class="p">))</span> <span class="o"><<</span> <span class="mi">5</span><span class="p">);</span> <span class="c1">// 'l' is 108</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">10</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">c</span> <span class="o"><<</span> <span class="mi">2</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">c</span> <span class="o"><<</span> <span class="mi">5</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">c</span> <span class="o"><<</span> <span class="mi">6</span><span class="p">);</span> <span class="c1">// 'd' is 100</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">11</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">c</span> <span class="o"><<</span> <span class="mi">5</span><span class="p">)</span> <span class="o">|</span> <span class="n">c</span><span class="p">;</span> <span class="c1">// '!' is 33</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">12</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="kt">char</span> <span class="o">*</span><span class="n">ret</span> <span class="o">=</span> <span class="n">generate_words</span><span class="p">();</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"%s"</span><span class="p">,</span> <span class="n">ret</span><span class="p">);</span>
<span class="n">free</span><span class="p">(</span><span class="n">ret</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>As you can see, I’ve added comments and some extra brackets for readability’s sake. However, what if we remove our <code class="language-plaintext highlighter-rouge">c</code> variable to save space and only depend on previously defined values in <code class="language-plaintext highlighter-rouge">ret</code> for our calculations?</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
</span>
<span class="k">volatile</span> <span class="n">u_int8_t</span> <span class="n">a</span> <span class="o">=</span> <span class="mi">0</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">volatile</span> <span class="n">u_int8_t</span> <span class="o">*</span> <span class="n">z</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="n">u_int8_t</span><span class="o">*</span> <span class="nf">generate_words</span><span class="p">()</span> <span class="p">{</span>
<span class="n">u_int8_t</span> <span class="o">*</span> <span class="n">ret</span> <span class="o">=</span> <span class="n">malloc</span><span class="p">(</span><span class="mi">13</span><span class="p">);</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span> <span class="o"><<</span> <span class="mi">6</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="mi">1</span> <span class="o"><<</span> <span class="mi">3</span><span class="p">);</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">ret</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">&</span> <span class="p">(</span><span class="mi">1</span> <span class="o"><<</span> <span class="mi">6</span><span class="p">))</span> <span class="o">|</span> <span class="p">(</span><span class="n">ret</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">>></span> <span class="mi">1</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">ret</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">>></span> <span class="mi">6</span><span class="p">);</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="n">ret</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">|</span> <span class="n">ret</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">&</span> <span class="o">~</span><span class="p">(</span><span class="n">ret</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">>></span> <span class="mi">6</span><span class="p">);</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span> <span class="o">=</span> <span class="n">ret</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">|</span> <span class="n">ret</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">|</span> <span class="p">(</span><span class="n">ret</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">>></span> <span class="mi">5</span><span class="p">);</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">ret</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">&</span> <span class="n">ret</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">&</span> <span class="n">ret</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="o">>></span> <span class="mi">1</span><span class="p">;</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">6</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">ret</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span> <span class="o">&</span> <span class="o">~</span><span class="n">ret</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">|</span> <span class="p">(</span><span class="n">ret</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">>></span> <span class="mi">2</span><span class="p">))</span> <span class="o">&</span> <span class="o">~</span><span class="p">(</span><span class="n">ret</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">>></span> <span class="mi">4</span> <span class="o"><<</span> <span class="mi">1</span><span class="p">);</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">8</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="o">~</span><span class="n">ret</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">&</span> <span class="n">ret</span><span class="p">[</span><span class="mi">6</span><span class="p">])</span> <span class="o">|</span> <span class="n">ret</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">|</span> <span class="n">ret</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o"><<</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">10</span><span class="p">]</span> <span class="o">=</span> <span class="n">ret</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">&</span> <span class="o">~</span><span class="p">(</span><span class="n">ret</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">>></span> <span class="mi">5</span><span class="p">);</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">11</span><span class="p">]</span> <span class="o">=</span> <span class="n">ret</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">|</span> <span class="p">((</span><span class="n">ret</span><span class="p">[</span><span class="mi">10</span><span class="p">]</span> <span class="o">&</span> <span class="n">ret</span><span class="p">[</span><span class="mi">8</span><span class="p">])</span> <span class="o">>></span> <span class="mi">6</span><span class="p">);</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">12</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">ret</span> <span class="o">+</span> <span class="mi">3</span><span class="p">,</span> <span class="n">ret</span> <span class="o">+</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">ret</span> <span class="o">+</span> <span class="mi">9</span><span class="p">,</span> <span class="n">ret</span> <span class="o">+</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">ret</span> <span class="o">+</span> <span class="mi">7</span><span class="p">,</span> <span class="n">ret</span> <span class="o">+</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">12</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="n">u_int8_t</span> <span class="o">*</span><span class="n">ret</span> <span class="o">=</span> <span class="n">generate_words</span><span class="p">();</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"%s"</span><span class="p">,</span> <span class="n">ret</span><span class="p">);</span>
<span class="n">free</span><span class="p">(</span><span class="n">ret</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
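<p>If you don’t trust the bit-twiddling, here’s a minimal sketch that re-derives every byte and compares the result against the plain string. The names <code class="language-plaintext highlighter-rouge">derive_hello</code> and <code class="language-plaintext highlighter-rouge">first_mismatch</code> are made up for this check, and it uses the standard <code class="language-plaintext highlighter-rouge">uint8_t</code> from stdint.h instead of <code class="language-plaintext highlighter-rouge">u_int8_t</code>. The expressions are the ones from the listing above, with extra brackets and the three <code class="language-plaintext highlighter-rouge">memcpy</code> calls written as plain assignments.</p>

```c
#include <stdint.h>

/* Fill out[0..12] using only earlier entries of out, mirroring the listing above. */
void derive_hello(uint8_t out[13]) {
    out[0]  = (1 << 6) | (1 << 3);                                  /* 'H' = 72  */
    out[1]  = (out[0] & (1 << 6)) | (out[0] >> 1) | (out[0] >> 6);  /* 'e' = 101 */
    out[2]  = out[0] | (out[1] & ~(out[0] >> 6));                   /* 'l' = 108 */
    out[3]  = out[2];                                               /* 'l' (was a memcpy) */
    out[4]  = out[2] | out[1] | (out[0] >> 5);                      /* 'o' = 111 */
    out[5]  = (out[1] & out[2] & out[0]) >> 1;                      /* ' ' = 32  */
    out[6]  = ((out[4] & ~out[5]) | (out[0] >> 2))
              & ~((out[0] >> 4) << 1);                              /* 'W' = 87  */
    out[7]  = out[4];                                               /* 'o' (was a memcpy) */
    out[8]  = (~out[1] & out[6]) | out[5] | (out[5] << 1);          /* 'r' = 114 */
    out[9]  = out[3];                                               /* 'l' (was a memcpy) */
    out[10] = out[1] & ~(out[5] >> 5);                              /* 'd' = 100 */
    out[11] = out[5] | ((out[10] & out[8]) >> 6);                   /* '!' = 33  */
    out[12] = 0;                                                    /* terminator */
}

/* Return the index of the first byte that differs from "Hello World!",
 * or -1 if all thirteen bytes (including the terminator) match. */
int first_mismatch(void) {
    static const char expected[13] = "Hello World!";
    uint8_t out[13];
    derive_hello(out);
    for (int i = 0; i < 13; i++) {
        if (out[i] != (uint8_t)expected[i])
            return i;
    }
    return -1;
}
```

<p>Calling <code class="language-plaintext highlighter-rouge">first_mismatch()</code> from <code class="language-plaintext highlighter-rouge">main</code> returns <code class="language-plaintext highlighter-rouge">-1</code> when every byte lines up, which is a cheap way to catch a slipped shift before making things any worse.</p>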
<p>Now our code’s starting to look good (bad?). The only problem is that it takes way too long to run: my super-powerful laptop processor from two years ago needed nearly 5 milliseconds! Well, we can optimize it to be cache-aligned for <strong>maximum performance</strong>:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
#include <sys/types.h>
</span>
<span class="n">u_int8_t</span><span class="o">*</span> <span class="nf">generate_words</span><span class="p">()</span> <span class="p">{</span>
<span class="n">u_int8_t</span> <span class="o">*</span> <span class="n">ret</span> <span class="o">=</span> <span class="n">aligned_alloc</span><span class="p">(</span><span class="n">sysconf</span><span class="p">(</span><span class="n">_SC_LEVEL1_DCACHE_LINESIZE</span><span class="p">),</span> <span class="mi">13</span><span class="p">);</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span> <span class="o"><<</span> <span class="mi">6</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="mi">1</span> <span class="o"><<</span> <span class="mi">3</span><span class="p">);</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">ret</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">&</span> <span class="p">(</span><span class="mi">1</span> <span class="o"><<</span> <span class="mi">6</span><span class="p">))</span> <span class="o">|</span> <span class="p">(</span><span class="n">ret</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">>></span> <span class="mi">1</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">ret</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">>></span> <span class="mi">6</span><span class="p">);</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">=</span> <span class="n">ret</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">|</span> <span class="n">ret</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">&</span> <span class="o">~</span><span class="p">(</span><span class="n">ret</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">>></span> <span class="mi">6</span><span class="p">);</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span> <span class="o">=</span> <span class="n">ret</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">|</span> <span class="n">ret</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">|</span> <span class="p">(</span><span class="n">ret</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">>></span> <span class="mi">5</span><span class="p">);</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">ret</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">&</span> <span class="n">ret</span><span class="p">[</span><span class="mi">2</span><span class="p">]</span> <span class="o">&</span> <span class="n">ret</span><span class="p">[</span><span class="mi">0</span><span class="p">])</span> <span class="o">>></span> <span class="mi">1</span><span class="p">;</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">6</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">ret</span><span class="p">[</span><span class="mi">4</span><span class="p">]</span> <span class="o">&</span> <span class="o">~</span><span class="n">ret</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">|</span> <span class="p">(</span><span class="n">ret</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">>></span> <span class="mi">2</span><span class="p">))</span> <span class="o">&</span> <span class="o">~</span><span class="p">(</span><span class="n">ret</span><span class="p">[</span><span class="mi">0</span><span class="p">]</span> <span class="o">>></span> <span class="mi">4</span> <span class="o"><<</span> <span class="mi">1</span><span class="p">);</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">8</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="o">~</span><span class="n">ret</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">&</span> <span class="n">ret</span><span class="p">[</span><span class="mi">6</span><span class="p">])</span> <span class="o">|</span> <span class="n">ret</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">|</span> <span class="n">ret</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o"><<</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">10</span><span class="p">]</span> <span class="o">=</span> <span class="n">ret</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">&</span> <span class="o">~</span><span class="p">(</span><span class="n">ret</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">>></span> <span class="mi">5</span><span class="p">);</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">11</span><span class="p">]</span> <span class="o">=</span> <span class="n">ret</span><span class="p">[</span><span class="mi">5</span><span class="p">]</span> <span class="o">|</span> <span class="p">((</span><span class="n">ret</span><span class="p">[</span><span class="mi">10</span><span class="p">]</span> <span class="o">&</span> <span class="n">ret</span><span class="p">[</span><span class="mi">8</span><span class="p">])</span> <span class="o">>></span> <span class="mi">6</span><span class="p">);</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">12</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">ret</span> <span class="o">+</span> <span class="mi">3</span><span class="p">,</span> <span class="n">ret</span> <span class="o">+</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">ret</span> <span class="o">+</span> <span class="mi">9</span><span class="p">,</span> <span class="n">ret</span> <span class="o">+</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">ret</span> <span class="o">+</span> <span class="mi">7</span><span class="p">,</span> <span class="n">ret</span> <span class="o">+</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">12</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="k">return</span> <span class="n">ret</span><span class="p">;</span>
<span class="p">}</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="n">u_int8_t</span> <span class="o">*</span><span class="n">ret</span> <span class="o">=</span> <span class="n">generate_words</span><span class="p">();</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"%s"</span><span class="p">,</span> <span class="n">ret</span><span class="p">);</span>
<span class="n">free</span><span class="p">(</span><span class="n">ret</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
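<p>One catch with the allocation: <code class="language-plaintext highlighter-rouge">sysconf</code> can return <code class="language-plaintext highlighter-rouge">-1</code> or <code class="language-plaintext highlighter-rouge">0</code> when the cache line size is unknown, and C11 as originally published wanted the size passed to <code class="language-plaintext highlighter-rouge">aligned_alloc</code> to be a multiple of the alignment. A more defensive version might look like the sketch below. The helper names and the 64-byte fallback are assumptions for illustration, not something the code above does.</p>

```c
#include <stdint.h>
#include <stdlib.h>
#include <unistd.h>

/* Assumed fallback when the line size cannot be queried.
 * 64 bytes is common on current hardware, but it is not guaranteed. */
#define FALLBACK_LINE_SIZE 64

/* Best-effort L1 data cache line size in bytes. */
size_t cache_line_size(void) {
    long line = -1;
#ifdef _SC_LEVEL1_DCACHE_LINESIZE
    /* Not part of POSIX; available on glibc and some other libcs. */
    line = sysconf(_SC_LEVEL1_DCACHE_LINESIZE);
#endif
    /* Reject errors (-1), "unknown" (0), and values that are not a power of two. */
    if (line <= 0 || (line & (line - 1)) != 0)
        return FALLBACK_LINE_SIZE;
    return (size_t)line;
}

/* Round n up to the next multiple of align (align must be a power of two). */
size_t round_up(size_t n, size_t align) {
    return (n + align - 1) & ~(align - 1);
}

/* Check whether p sits on an align-byte boundary (align must be a power of two). */
int is_aligned(const void *p, size_t align) {
    return ((uintptr_t)p & (align - 1)) == 0;
}

/* Allocate at least n bytes starting on a cache-line boundary.
 * The size is rounded up to a whole number of lines; release with free(). */
void *alloc_cache_aligned(size_t n) {
    size_t line = cache_line_size();
    return aligned_alloc(line, round_up(n, line));
}
```

<p>With these, <code class="language-plaintext highlighter-rouge">alloc_cache_aligned(13)</code> could stand in for the <code class="language-plaintext highlighter-rouge">aligned_alloc</code> call above, and the result is still released with <code class="language-plaintext highlighter-rouge">free</code>.</p>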
<p>Well, now we’ve got <em>way</em> too many lines. Let’s trim it down by getting rid of the function and putting our code into a <code class="language-plaintext highlighter-rouge">for</code> loop:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="n">u_int8_t</span> <span class="o">*</span> <span class="n">ret</span> <span class="o">=</span> <span class="n">aligned_alloc</span><span class="p">(</span><span class="n">sysconf</span><span class="p">(</span><span class="n">_SC_LEVEL1_DCACHE_LINESIZE</span><span class="p">),</span> <span class="mi">13</span><span class="p">);</span>
<span class="k">for</span> <span class="p">(</span><span class="n">__auto_type</span> <span class="n">i</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="n">i</span> <span class="o"><</span> <span class="mi">13</span><span class="p">;</span> <span class="n">i</span><span class="o">++</span><span class="p">)</span> <span class="p">{</span>
<span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">==</span> <span class="mi">0</span><span class="p">)</span>
<span class="n">ret</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span> <span class="o"><<</span> <span class="p">(</span><span class="n">i</span> <span class="o">+</span> <span class="mi">6</span><span class="p">))</span> <span class="o">|</span> <span class="p">(</span><span class="mi">1</span> <span class="o"><<</span> <span class="p">(</span><span class="n">i</span> <span class="o">+</span> <span class="mi">3</span><span class="p">));</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">==</span> <span class="mi">1</span><span class="p">)</span>
<span class="n">ret</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">ret</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span> <span class="o">&</span> <span class="p">(</span><span class="mi">1</span> <span class="o"><<</span> <span class="p">(</span><span class="n">i</span> <span class="o">+</span> <span class="mi">5</span><span class="p">)))</span> <span class="o">|</span> <span class="p">(</span><span class="n">ret</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span> <span class="o">>></span> <span class="n">i</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">ret</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span> <span class="o">>></span> <span class="p">(</span><span class="n">i</span> <span class="o">+</span> <span class="mi">5</span><span class="p">));</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">==</span> <span class="mi">2</span><span class="p">)</span>
<span class="n">ret</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">ret</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">2</span><span class="p">]</span> <span class="o">|</span> <span class="n">ret</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">&</span> <span class="o">~</span><span class="p">(</span><span class="n">ret</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">2</span><span class="p">]</span> <span class="o">>></span> <span class="p">(</span><span class="n">i</span> <span class="o">+</span> <span class="mi">4</span><span class="p">));</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">==</span> <span class="mi">4</span><span class="p">)</span>
<span class="n">ret</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">ret</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">2</span><span class="p">]</span> <span class="o">|</span> <span class="n">ret</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">3</span><span class="p">]</span> <span class="o">|</span> <span class="p">(</span><span class="n">ret</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">4</span><span class="p">]</span> <span class="o">>></span> <span class="p">(</span><span class="n">i</span> <span class="o">+</span> <span class="mi">1</span><span class="p">));</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">==</span> <span class="mi">5</span><span class="p">)</span>
<span class="n">ret</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">ret</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">4</span><span class="p">]</span> <span class="o">&</span> <span class="n">ret</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">3</span><span class="p">]</span> <span class="o">&</span> <span class="n">ret</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">5</span><span class="p">])</span> <span class="o">>></span> <span class="p">(</span><span class="n">i</span> <span class="o">-</span> <span class="mi">4</span><span class="p">);</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">==</span> <span class="mi">6</span><span class="p">)</span>
<span class="n">ret</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="n">ret</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">2</span><span class="p">]</span> <span class="o">&</span> <span class="o">~</span><span class="n">ret</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span> <span class="o">|</span> <span class="p">(</span><span class="n">ret</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">6</span><span class="p">]</span> <span class="o">>></span> <span class="p">(</span><span class="n">i</span> <span class="o">-</span> <span class="mi">4</span><span class="p">)))</span> <span class="o">&</span> <span class="o">~</span><span class="p">(</span><span class="n">ret</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">6</span><span class="p">]</span> <span class="o">>></span> <span class="p">(</span><span class="n">i</span> <span class="o">-</span> <span class="mi">2</span><span class="p">)</span> <span class="o"><<</span> <span class="p">(</span><span class="n">i</span> <span class="o">-</span> <span class="mi">5</span><span class="p">));</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">==</span> <span class="mi">8</span><span class="p">)</span>
<span class="n">ret</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="o">~</span><span class="n">ret</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">7</span><span class="p">]</span> <span class="o">&</span> <span class="n">ret</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">2</span><span class="p">])</span> <span class="o">|</span> <span class="n">ret</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">3</span><span class="p">]</span> <span class="o">|</span> <span class="n">ret</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">3</span><span class="p">]</span> <span class="o"><<</span> <span class="p">(</span><span class="n">i</span> <span class="o">-</span> <span class="mi">7</span><span class="p">);</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">==</span> <span class="mi">10</span><span class="p">)</span>
<span class="n">ret</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">ret</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">9</span><span class="p">]</span> <span class="o">&</span> <span class="o">~</span><span class="p">(</span><span class="n">ret</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">5</span><span class="p">]</span> <span class="o">>></span> <span class="p">(</span><span class="n">i</span> <span class="o">-</span> <span class="mi">5</span><span class="p">));</span>
<span class="k">else</span> <span class="k">if</span> <span class="p">(</span><span class="n">i</span> <span class="o">==</span> <span class="mi">11</span><span class="p">)</span>
<span class="n">ret</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">ret</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">6</span><span class="p">]</span> <span class="o">|</span> <span class="p">((</span><span class="n">ret</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span> <span class="o">&</span> <span class="n">ret</span><span class="p">[</span><span class="n">i</span> <span class="o">-</span> <span class="mi">3</span><span class="p">])</span> <span class="o">>></span> <span class="p">(</span><span class="n">i</span> <span class="o">-</span> <span class="mi">5</span><span class="p">));</span>
<span class="k">else</span>
<span class="n">ret</span><span class="p">[</span><span class="n">i</span><span class="p">]</span> <span class="o">=</span> <span class="n">i</span> <span class="o">-</span> <span class="n">i</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">ret</span> <span class="o">+</span> <span class="mi">3</span><span class="p">,</span> <span class="n">ret</span> <span class="o">+</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">ret</span> <span class="o">+</span> <span class="mi">9</span><span class="p">,</span> <span class="n">ret</span> <span class="o">+</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">ret</span> <span class="o">+</span> <span class="mi">7</span><span class="p">,</span> <span class="n">ret</span> <span class="o">+</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="n">ret</span><span class="p">[</span><span class="mi">12</span><span class="p">]</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"%s"</span><span class="p">,</span> <span class="n">ret</span><span class="p">);</span>
<span class="n">free</span><span class="p">(</span><span class="n">ret</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Wait, that <code class="language-plaintext highlighter-rouge">for</code> loop we added there could be written in an even more convoluted way! Let’s do so:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
</span>
<span class="k">volatile</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">a</span> <span class="o">=</span> <span class="mi">0</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">volatile</span> <span class="n">u_int8_t</span> <span class="o">*</span> <span class="n">z</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="k">switch</span><span class="p">(</span><span class="n">a</span><span class="p">)</span> <span class="p">{</span>
<span class="k">case</span> <span class="mi">0</span> <span class="o">-</span> <span class="mi">1</span><span class="p">:</span>
<span class="n">z</span> <span class="o">=</span> <span class="n">aligned_alloc</span><span class="p">(</span><span class="n">sysconf</span><span class="p">(</span><span class="n">_SC_LEVEL1_DCACHE_LINESIZE</span><span class="p">),</span> <span class="mi">1</span> <span class="o">*</span> <span class="mi">10</span> <span class="o">+</span> <span class="mi">3</span><span class="p">);</span>
<span class="k">break</span><span class="p">;</span>
<span class="k">case</span> <span class="mi">1</span> <span class="o">*</span> <span class="mi">10</span> <span class="o">+</span> <span class="mi">3</span><span class="p">:</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">z</span> <span class="o">+</span> <span class="mi">3</span><span class="p">,</span> <span class="n">z</span> <span class="o">+</span> <span class="mi">2</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">z</span> <span class="o">+</span> <span class="mi">9</span><span class="p">,</span> <span class="n">z</span> <span class="o">+</span> <span class="mi">3</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="n">memcpy</span><span class="p">(</span><span class="n">z</span> <span class="o">+</span> <span class="mi">7</span><span class="p">,</span> <span class="n">z</span> <span class="o">+</span> <span class="mi">4</span><span class="p">,</span> <span class="mi">1</span><span class="p">);</span>
<span class="n">printf</span><span class="p">(</span><span class="s">"%s"</span><span class="p">,</span> <span class="n">z</span><span class="p">);</span>
<span class="n">free</span><span class="p">(</span><span class="n">z</span><span class="p">);</span>
<span class="k">return</span> <span class="mi">0</span><span class="p">;</span>
<span class="nl">default:</span>
<span class="n">z</span><span class="p">[</span><span class="n">a</span><span class="p">]</span> <span class="o">=</span> <span class="n">a</span> <span class="o">==</span> <span class="mi">0</span> <span class="o">?</span> <span class="n">z</span><span class="p">[</span><span class="n">a</span><span class="p">]</span> <span class="o">=</span> <span class="p">(</span><span class="mi">1</span> <span class="o"><<</span> <span class="p">(</span><span class="n">a</span> <span class="o">+</span> <span class="mi">6</span><span class="p">))</span> <span class="o">|</span> <span class="p">(</span><span class="mi">1</span> <span class="o"><<</span> <span class="p">(</span><span class="n">a</span> <span class="o">+</span> <span class="mi">3</span><span class="p">))</span> <span class="o">:</span> <span class="n">a</span> <span class="o">==</span> <span class="mi">1</span> <span class="o">?</span> <span class="p">(</span><span class="n">z</span><span class="p">[</span><span class="n">a</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span> <span class="o">&</span> <span class="p">(</span><span class="mi">1</span> <span class="o"><<</span> <span class="p">(</span><span class="n">a</span> <span class="o">+</span> <span class="mi">5</span><span class="p">)))</span> <span class="o">|</span> <span class="p">(</span><span class="n">z</span><span class="p">[</span><span class="n">a</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span> <span class="o">>></span> <span class="n">a</span><span class="p">)</span> <span class="o">|</span> <span class="p">(</span><span class="n">z</span><span class="p">[</span><span class="n">a</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span> <span class="o">>></span> <span class="p">(</span><span class="n">a</span> <span class="o">+</span> <span class="mi">5</span><span class="p">))</span> <span class="o">:</span> <span class="n">a</span> <span class="o">==</span> 
<span class="mi">2</span> <span class="o">?</span> <span class="n">z</span><span class="p">[</span><span class="n">a</span> <span class="o">-</span> <span class="mi">2</span><span class="p">]</span> <span class="o">|</span> <span class="n">z</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span> <span class="o">&</span> <span class="o">~</span><span class="p">(</span><span class="n">z</span><span class="p">[</span><span class="n">a</span> <span class="o">-</span> <span class="mi">2</span><span class="p">]</span> <span class="o">>></span> <span class="p">(</span><span class="n">a</span> <span class="o">+</span> <span class="mi">4</span><span class="p">))</span> <span class="o">:</span> <span class="n">a</span> <span class="o">==</span> <span class="mi">4</span> <span class="o">?</span> <span class="n">z</span><span class="p">[</span><span class="n">a</span> <span class="o">-</span> <span class="mi">2</span><span class="p">]</span> <span class="o">|</span> <span class="n">z</span><span class="p">[</span><span class="n">a</span> <span class="o">-</span> <span class="mi">3</span><span class="p">]</span> <span class="o">|</span> <span class="p">(</span><span class="n">z</span><span class="p">[</span><span class="n">a</span> <span class="o">-</span> <span class="mi">4</span><span class="p">]</span> <span class="o">>></span> <span class="p">(</span><span class="n">a</span> <span class="o">+</span> <span class="mi">1</span><span class="p">))</span> <span class="o">:</span> <span class="n">a</span> <span class="o">==</span> <span class="mi">5</span> <span class="o">?</span> <span class="p">(</span><span class="n">z</span><span class="p">[</span><span class="n">a</span> <span class="o">-</span> <span class="mi">4</span><span class="p">]</span> <span class="o">&</span> <span class="n">z</span><span class="p">[</span><span class="n">a</span> <span class="o">-</span> <span class="mi">3</span><span class="p">]</span> <span class="o">&</span> <span 
class="n">z</span><span class="p">[</span><span class="n">a</span> <span class="o">-</span> <span class="mi">5</span><span class="p">])</span> <span class="o">>></span> <span class="p">(</span><span class="n">a</span> <span class="o">-</span> <span class="mi">4</span><span class="p">)</span> <span class="o">:</span> <span class="n">a</span> <span class="o">==</span> <span class="mi">6</span> <span class="o">?</span> <span class="p">(</span><span class="n">z</span><span class="p">[</span><span class="n">a</span> <span class="o">-</span> <span class="mi">2</span><span class="p">]</span> <span class="o">&</span> <span class="o">~</span><span class="n">z</span><span class="p">[</span><span class="n">a</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span> <span class="o">|</span> <span class="p">(</span><span class="n">z</span><span class="p">[</span><span class="n">a</span> <span class="o">-</span> <span class="mi">6</span><span class="p">]</span> <span class="o">>></span> <span class="p">(</span><span class="n">a</span> <span class="o">-</span> <span class="mi">4</span><span class="p">)))</span> <span class="o">&</span> <span class="o">~</span><span class="p">(</span><span class="n">z</span><span class="p">[</span><span class="n">a</span> <span class="o">-</span> <span class="mi">6</span><span class="p">]</span> <span class="o">>></span> <span class="p">(</span><span class="n">a</span> <span class="o">-</span> <span class="mi">2</span><span class="p">)</span> <span class="o"><<</span> <span class="p">(</span><span class="n">a</span> <span class="o">-</span> <span class="mi">5</span><span class="p">))</span> <span class="o">:</span> <span class="n">a</span> <span class="o">==</span> <span class="mi">8</span> <span class="o">?</span> <span class="p">(</span><span class="o">~</span><span class="n">z</span><span class="p">[</span><span class="n">a</span> <span class="o">-</span> <span class="mi">7</span><span class="p">]</span> <span 
class="o">&</span> <span class="n">z</span><span class="p">[</span><span class="n">a</span> <span class="o">-</span> <span class="mi">2</span><span class="p">])</span> <span class="o">|</span> <span class="n">z</span><span class="p">[</span><span class="n">a</span> <span class="o">-</span> <span class="mi">3</span><span class="p">]</span> <span class="o">|</span> <span class="n">z</span><span class="p">[</span><span class="n">a</span> <span class="o">-</span> <span class="mi">3</span><span class="p">]</span> <span class="o"><<</span> <span class="p">(</span><span class="n">a</span> <span class="o">-</span> <span class="mi">7</span><span class="p">)</span> <span class="o">:</span> <span class="n">a</span> <span class="o">==</span> <span class="mi">10</span> <span class="o">?</span> <span class="n">z</span><span class="p">[</span><span class="n">a</span> <span class="o">-</span> <span class="mi">9</span><span class="p">]</span> <span class="o">&</span> <span class="o">~</span><span class="p">(</span><span class="n">z</span><span class="p">[</span><span class="n">a</span> <span class="o">-</span> <span class="mi">5</span><span class="p">]</span> <span class="o">>></span> <span class="p">(</span><span class="n">a</span> <span class="o">-</span> <span class="mi">5</span><span class="p">))</span> <span class="o">:</span> <span class="n">a</span> <span class="o">==</span> <span class="mi">11</span> <span class="o">?</span> <span class="n">z</span><span class="p">[</span><span class="n">a</span> <span class="o">-</span> <span class="mi">6</span><span class="p">]</span> <span class="o">|</span> <span class="p">((</span><span class="n">z</span><span class="p">[</span><span class="n">a</span> <span class="o">-</span> <span class="mi">1</span><span class="p">]</span> <span class="o">&</span> <span class="n">z</span><span class="p">[</span><span class="n">a</span> <span class="o">-</span> <span class="mi">3</span><span class="p">])</span> <span class="o">>></span> <span 
class="p">(</span><span class="n">a</span> <span class="o">-</span> <span class="mi">5</span><span class="p">))</span> <span class="o">:</span> <span class="n">a</span> <span class="o">-</span> <span class="n">a</span><span class="p">;</span>
<span class="p">}</span>
<span class="n">a</span> <span class="o">=</span> <span class="n">a</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
<span class="n">main</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>
<p>Well, our switch there is taking quite a bit of space, isn’t it? Let’s fix that and clean up the code a bit:</p>
<div class="language-c highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="cp">#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>
</span>
<span class="k">volatile</span> <span class="kt">unsigned</span> <span class="kt">int</span> <span class="n">a</span> <span class="o">=</span> <span class="mi">0</span> <span class="o">-</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">volatile</span> <span class="n">u_int8_t</span> <span class="o">*</span> <span class="n">z</span> <span class="o">=</span> <span class="nb">NULL</span><span class="p">;</span>
<span class="kt">int</span> <span class="nf">main</span><span class="p">()</span> <span class="p">{</span>
<span class="n">a</span><span class="o">==</span><span class="mi">0</span><span class="o">-</span><span class="mi">1</span><span class="o">?</span><span class="n">z</span><span class="o">=</span><span class="n">aligned_alloc</span><span class="p">(</span><span class="n">sysconf</span><span class="p">(</span><span class="n">_SC_LEVEL1_DCACHE_LINESIZE</span><span class="p">),</span><span class="mi">1</span><span class="o">*</span><span class="mi">10</span><span class="o">+</span><span class="mi">3</span><span class="p">)</span><span class="o">:</span><span class="n">a</span><span class="o">!=</span><span class="mi">1</span><span class="o">*</span><span class="mi">10</span><span class="o">+</span><span class="mi">3</span><span class="o">?</span><span class="n">z</span><span class="p">[</span><span class="n">a</span><span class="p">]</span><span class="o">=</span><span class="n">a</span><span class="o">==</span><span class="mi">0</span><span class="o">?</span><span class="n">z</span><span class="p">[</span><span class="n">a</span><span class="p">]</span><span class="o">=</span><span class="p">(</span><span class="mi">1</span><span class="o"><<</span><span class="p">(</span><span class="n">a</span><span class="o">+</span><span class="mi">6</span><span class="p">))</span><span class="o">|</span><span class="p">(</span><span class="mi">1</span><span class="o"><<</span><span class="p">(</span><span class="n">a</span><span class="o">+</span><span class="mi">3</span><span class="p">))</span><span class="o">:</span><span class="n">a</span><span class="o">==</span><span class="mi">1</span><span class="o">?</span><span class="p">(</span><span class="n">z</span><span class="p">[</span><span class="n">a</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="o">&</span><span class="p">(</span><span class="mi">1</span><span class="o"><<</span><span class="p">(</span><span class="n">a</span><span class="o">+</span><span class="mi">5</span><span 
class="p">)))</span><span class="o">|</span><span class="p">(</span><span class="n">z</span><span class="p">[</span><span class="n">a</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="o">>></span><span class="n">a</span><span class="p">)</span><span class="o">|</span><span class="p">(</span><span class="n">z</span><span class="p">[</span><span class="n">a</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="o">>></span><span class="p">(</span><span class="n">a</span><span class="o">+</span><span class="mi">5</span><span class="p">))</span><span class="o">:</span><span class="n">a</span><span class="o">==</span><span class="mi">2</span><span class="o">?</span><span class="n">z</span><span class="p">[</span><span class="n">a</span><span class="o">-</span><span class="mi">2</span><span class="p">]</span><span class="o">|</span><span class="n">z</span><span class="p">[</span><span class="mi">1</span><span class="p">]</span><span class="o">&~</span><span class="p">(</span><span class="n">z</span><span class="p">[</span><span class="n">a</span><span class="o">-</span><span class="mi">2</span><span class="p">]</span><span class="o">>></span><span class="p">(</span><span class="n">a</span><span class="o">+</span><span class="mi">4</span><span class="p">))</span><span class="o">:</span><span class="n">a</span><span class="o">==</span><span class="mi">4</span><span class="o">?</span><span class="n">z</span><span class="p">[</span><span class="n">a</span><span class="o">-</span><span class="mi">2</span><span class="p">]</span><span class="o">|</span><span class="n">z</span><span class="p">[</span><span class="n">a</span><span class="o">-</span><span class="mi">3</span><span class="p">]</span><span class="o">|</span><span class="p">(</span><span class="n">z</span><span class="p">[</span><span class="n">a</span><span class="o">-</span><span class="mi">4</span><span class="p">]</span><span 
class="o">>></span><span class="p">(</span><span class="n">a</span><span class="o">+</span><span class="mi">1</span><span class="p">))</span><span class="o">:</span><span class="n">a</span><span class="o">==</span><span class="mi">5</span><span class="o">?</span><span class="p">(</span><span class="n">z</span><span class="p">[</span><span class="n">a</span><span class="o">-</span><span class="mi">4</span><span class="p">]</span><span class="o">&</span><span class="n">z</span><span class="p">[</span><span class="n">a</span><span class="o">-</span><span class="mi">3</span><span class="p">]</span><span class="o">&</span><span class="n">z</span><span class="p">[</span><span class="n">a</span><span class="o">-</span><span class="mi">5</span><span class="p">])</span><span class="o">>></span><span class="p">(</span><span class="n">a</span><span class="o">-</span><span class="mi">4</span><span class="p">)</span><span class="o">:</span><span class="n">a</span><span class="o">==</span><span class="mi">6</span><span class="o">?</span><span class="p">(</span><span class="n">z</span><span class="p">[</span><span class="n">a</span><span class="o">-</span><span class="mi">2</span><span class="p">]</span><span class="o">&~</span><span class="n">z</span><span class="p">[</span><span class="n">a</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="o">|</span><span class="p">(</span><span class="n">z</span><span class="p">[</span><span class="n">a</span><span class="o">-</span><span class="mi">6</span><span class="p">]</span><span class="o">>></span><span class="p">(</span><span class="n">a</span><span class="o">-</span><span class="mi">4</span><span class="p">)))</span><span class="o">&~</span><span class="p">(</span><span class="n">z</span><span class="p">[</span><span class="n">a</span><span class="o">-</span><span class="mi">6</span><span class="p">]</span><span class="o">>></span><span class="p">(</span><span class="n">a</span><span 
class="o">-</span><span class="mi">2</span><span class="p">)</span><span class="o"><<</span><span class="p">(</span><span class="n">a</span><span class="o">-</span><span class="mi">5</span><span class="p">))</span><span class="o">:</span><span class="n">a</span><span class="o">==</span><span class="mi">8</span><span class="o">?</span><span class="p">(</span><span class="o">~</span><span class="n">z</span><span class="p">[</span><span class="n">a</span><span class="o">-</span><span class="mi">7</span><span class="p">]</span><span class="o">&</span><span class="n">z</span><span class="p">[</span><span class="n">a</span><span class="o">-</span><span class="mi">2</span><span class="p">])</span><span class="o">|</span><span class="n">z</span><span class="p">[</span><span class="n">a</span><span class="o">-</span><span class="mi">3</span><span class="p">]</span><span class="o">|</span><span class="n">z</span><span class="p">[</span><span class="n">a</span><span class="o">-</span><span class="mi">3</span><span class="p">]</span><span class="o"><<</span><span class="p">(</span><span class="n">a</span><span class="o">-</span><span class="mi">7</span><span class="p">)</span><span class="o">:</span><span class="n">a</span><span class="o">==</span><span class="mi">10</span><span class="o">?</span><span class="n">z</span><span class="p">[</span><span class="n">a</span><span class="o">-</span><span class="mi">9</span><span class="p">]</span><span class="o">&~</span><span class="p">(</span><span class="n">z</span><span class="p">[</span><span class="n">a</span><span class="o">-</span><span class="mi">5</span><span class="p">]</span><span class="o">>></span><span class="p">(</span><span class="n">a</span><span class="o">-</span><span class="mi">5</span><span class="p">))</span><span class="o">:</span><span class="n">a</span><span class="o">==</span><span class="mi">11</span><span class="o">?</span><span class="n">z</span><span class="p">[</span><span class="n">a</span><span 
class="o">-</span><span class="mi">6</span><span class="p">]</span><span class="o">|</span><span class="p">((</span><span class="n">z</span><span class="p">[</span><span class="n">a</span><span class="o">-</span><span class="mi">1</span><span class="p">]</span><span class="o">&</span><span class="n">z</span><span class="p">[</span><span class="n">a</span><span class="o">-</span><span class="mi">3</span><span class="p">])</span><span class="o">>></span><span class="p">(</span><span class="n">a</span><span class="o">-</span><span class="mi">5</span><span class="p">))</span><span class="o">:</span><span class="n">a</span><span class="o">-</span><span class="n">a</span><span class="o">:</span><span class="mi">0</span><span class="p">;</span><span class="k">if</span><span class="p">(</span><span class="n">a</span><span class="o">==</span><span class="mi">1</span><span class="o">*</span><span class="mi">10</span><span class="o">+</span><span class="mi">3</span><span class="p">){</span><span class="n">memcpy</span><span class="p">(</span><span class="n">z</span><span class="o">+</span><span class="mi">3</span><span class="p">,</span><span class="n">z</span><span class="o">+</span><span class="mi">2</span><span class="p">,</span><span class="mi">1</span><span class="p">);</span><span class="n">memcpy</span><span class="p">(</span><span class="n">z</span><span class="o">+</span><span class="mi">9</span><span class="p">,</span><span class="n">z</span><span class="o">+</span><span class="mi">3</span><span class="p">,</span><span class="mi">1</span><span class="p">);</span><span class="n">memcpy</span><span class="p">(</span><span class="n">z</span><span class="o">+</span><span class="mi">7</span><span class="p">,</span><span class="n">z</span><span class="o">+</span><span class="mi">4</span><span class="p">,</span><span class="mi">1</span><span class="p">);</span><span class="n">printf</span><span class="p">(</span><span class="s">"%s"</span><span class="p">,</span><span 
class="n">z</span><span class="p">);</span><span class="n">free</span><span class="p">(</span><span class="n">z</span><span class="p">);</span><span class="k">return</span> <span class="mi">0</span><span class="p">;}</span><span class="n">a</span><span class="o">=</span><span class="n">a</span><span class="o">+</span><span class="mi">1</span><span class="p">;</span><span class="n">main</span><span class="p">();</span>
<span class="p">}</span>
</code></pre></div></div>
<p>And there we go - a program that prints <code class="language-plaintext highlighter-rouge">Hello World</code>, in just a single line of logic!</p>
<p>I’m sure there are more ways to improve the unreadability of the project, so if you see something that could be improved, please let me know!</p>Haoda Wangharry@h313.infoEverybody’s done a Hello World program before. But now that I’ve got a few years of experience with the language, I set out to ask one of the most pressing questions out there - how do we make Hello World in C as convoluted and hard to understand as possible? This post documents the final result of a sleep-deprived me trying to do exactly that.Figuring Out Where and When You Are, Except it’s in Space2020-05-11T04:33:13+00:002020-05-11T04:33:13+00:00https://h313.info/aerospace/nasa/spice/2020/05/11/figuring-out-where-and-when-you-are-except-its-in-space<p>Space is huge - our entire solar system is about 9.09 billion kilometers in diameter, and at those scales even the radius of the sun at 695,508 km seems tiny in comparison. So how is it that we can communicate with probes like New Horizons, <a href="https://solarsystem.nasa.gov/missions/new-horizons/in-depth/">4.5 light-hours away</a>, with unerring accuracy?</p>
<p>During my work with Mars 2020, I was able to play around with <a href="https://naif.jpl.nasa.gov/naif/index.html">SPICE</a>, a NASA toolkit that has become the standard for positioning and timing in space. SPICE is also used by ESA, JAXA, ISRO, and KARI. The best part is, it’s open source and the toolkit is open for anyone to download! However, the concepts took a while to understand. That’s why I thought it would be a good idea to document the “big ideas” used in the toolkit, which I found extremely interesting. Let’s start with time.</p>
<h2 id="when-you-are">When you are</h2>
<p>Time in SPICE is specified in several different systems, all of which revolve around epochs, which are events we can set a time to. For example, an epoch may be set to the oscillations of a quartz crystal, or Earth’s rotation around its axis. Multiple systems exist since we have different ways to represent time, and different epochs we can set our time to.</p>
<p>Everyone uses Coordinated Universal Time (UTC) in everyday life. We specify it as a string like this: <code class="language-plaintext highlighter-rouge">1999-03-21T12:28:29.702</code>. It’s not the best representation for space missions, however. Earth’s rotation is not uniform, so UTC noon drifts away from astronomical noon (as tracked by UT1, where the sun is directly over the Greenwich meridian). To keep the two close, leap seconds are added at the end of either June 30 or December 31. Having a non-uniform time system is a recipe for disaster, which is where <a href="https://en.wikipedia.org/wiki/International_Atomic_Time">International Atomic Time</a> (TAI) comes in.</p>
<p>The format of International Atomic Time is similar to the UNIX epoch. TAI measures atomic seconds from UTC midnight on 1 Jan 1958 (in UNIX it’s from midnight on 1 Jan 1970). As it uses an <a href="https://en.wikipedia.org/wiki/Atomic_clock">atomic clock</a> for timekeeping, our epochs are fully uniform, which lets us calculate time without worrying about offsets the way we have to with UTC. We specify it similarly to a UNIX timestamp you can get with <code class="language-plaintext highlighter-rouge">date +%s</code>.</p>
<p>The most basic system for counting time in SPICE is Barycentric Dynamical Time (TDB), which is also known as Ephemeris Time (ET). It is defined as an offset and a scale applied to <a href="https://en.wikipedia.org/wiki/Barycentric_Coordinate_Time">Barycentric Coordinate Time</a> (TCB), a clock at rest but outside the Solar system’s gravity well. Sitting outside that gravity well, TCB ticks about 490 milliseconds faster per year than TAI. ET, however, advances on average at almost the same rate as TAI, which is good enough for navigation in the solar system. ET is also specified as a plain count of seconds, much like a UNIX timestamp (SPICE counts them from the J2000 epoch).</p>
<p>The spacecraft also carries its own clock to schedule operations, which counts clock-dependent “ticks” from some reference tick. Since these clocks aren’t very stable, we have to account for drift in the duration of the “tick.” In SPICE, spacecraft time (SCLK) is encoded as a double-precision number called “ticks” for ease of conversion, but it’s often expressed as a character string whose format differs by mission and is specified in kernels. That’s because each clock may have a “tick” that denotes a different amount of time passing, or ticks of varying accuracy. That’s why Cassini’s SCLK format (<code class="language-plaintext highlighter-rouge">1/4294967295.255</code>) is completely different from that of Galileo’s (<code class="language-plaintext highlighter-rouge">1/16777215:90:09:07</code>).</p>
<p>SPICE contains a utility (<code class="language-plaintext highlighter-rouge">chronos</code>) to convert between UTC, ET, and SCLK. But how would we store all the offsets, or how much time each “tick” actually is?</p>
<h4 id="lsk-and-sclk-kernels">LSK and SCLK Kernels</h4>
<p>A leapsecond kernel (LSK) is a file listing the date of <strong>every</strong> leap second that has ever occurred, and it gets updated whenever a new one is announced. Each entry pairs the running TAI - UTC offset in seconds with the date it took effect. For example, <a href="https://naif.jpl.nasa.gov/pub/naif/MSL/kernels/lsk/msl.tls">here’s</a> the LSK used by Mars Science Laboratory (MSL):</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>DELTET/DELTA_AT = ( 10, @1972-JAN-1
11, @1972-JUL-1
12, @1973-JAN-1
13, @1974-JAN-1
14, @1975-JAN-1
15, @1976-JAN-1
16, @1977-JAN-1
17, @1978-JAN-1
18, @1979-JAN-1
19, @1980-JAN-1
20, @1981-JUL-1
21, @1982-JUL-1
22, @1983-JUL-1
23, @1985-JUL-1
24, @1988-JAN-1
25, @1990-JAN-1
26, @1991-JAN-1
27, @1992-JUL-1
28, @1993-JUL-1
29, @1994-JUL-1
30, @1996-JAN-1
31, @1997-JUL-1
32, @1999-JAN-1
33, @2006-JAN-1
34, @2009-JAN-1
35, @2012-JUL-1
36, @2015-JUL-1
37, @2017-JAN-1 )
</code></pre></div></div>
<p>SCLK kernels allow us to translate from a spacecraft clock to other time systems. You can find one of MSL’s <a href="https://naif.jpl.nasa.gov/pub/naif/MSL/kernels/sclk/msl_lmst_gc120806_v3.tsc">here</a>:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>SCLK_PARTITION_START_76900 = ( 0.00000000000000E+00 )
SCLK_PARTITION_END_76900 = ( 3.15576000000000E+14 )
SCLK01_COEFFICIENTS_76900 = (
0.0000000000000E+00 397446666.183 88775.24400000
)
</code></pre></div></div>
<p>The partitions mentioned in the kernel mark where the spacecraft clock rolls over or is reset, at which point the SCLK would go from <code class="language-plaintext highlighter-rouge">1/**</code> to <code class="language-plaintext highlighter-rouge">2/**</code>. The coefficients are the interesting part. <code class="language-plaintext highlighter-rouge">88775.24400000</code> is the number of ET seconds per Martian day (or Sol). <code class="language-plaintext highlighter-rouge">397446666.183</code> is the ET, in seconds past the J2000 epoch (noon on Jan 1 2000), at which this clock reads zero. That works out to roughly <code class="language-plaintext highlighter-rouge">2012-08-05T13:50</code> in UTC. Since this kernel tracks local mean solar time, that is presumably local midnight at the start of MSL’s landing sol rather than the touchdown itself, which came at about <code class="language-plaintext highlighter-rouge">2012-08-06T05:17</code> in UTC.</p>
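<p>As a rough illustration of what those two coefficients let you do, here’s a C++ sketch that converts between ET and a fractional sol count. It assumes a single partition and a single coefficient record, as in the kernel above, and the function names are my own rather than anything from SPICE.</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;cassert&gt;

// Values copied from SCLK01_COEFFICIENTS_76900 above.
constexpr double kEpochEt = 397446666.183;    // ET (s past J2000) where the clock reads zero
constexpr double kSecondsPerSol = 88775.244;  // ET seconds per sol

// Fractional sols elapsed at a given ET (hypothetical helper).
double sols_since_epoch(double et) {
    return (et - kEpochEt) / kSecondsPerSol;
}

// Inverse mapping: the ET at which a given sol count is reached.
double et_from_sols(double sols) {
    assert(sols &gt;= 0.0);  // this sketch ignores times before the clock started
    return kEpochEt + sols * kSecondsPerSol;
}
</code></pre></div></div>
<p>A real SCLK kernel can have many coefficient records to model clock drift, in which case you would first find the record covering the time of interest and then apply the same linear formula.</p>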
<h2 id="where-you-are">Where you are</h2>
<p>To know your position, you’ll need two things:</p>
<ul>
<li>Another object to measure your position from (an origin)</li>
<li>A way to measure how far you are from that object (a unit of distance)</li>
</ul>
<p>In SPICE, this translates to a reference frame and a coordinate system. A reference frame can be an inertial frame, where the origin is the barycenter of the Solar system. The most common inertial reference frame used is <a href="https://en.wikipedia.org/wiki/International_Celestial_Reference_Frame">ICRF</a>, which is defined by extragalactic radio sources, and coincides closely with the <a href="https://en.wikipedia.org/wiki/Earth-centered_inertial#J2000">J2000</a> frame, which is based on Earth’s equator and equinox at <code class="language-plaintext highlighter-rouge">2000-01-01T12:00:00</code>. This is useful for travel through the Solar system.</p>
<p><img src="https://h313.info/blog/assets/img/spice_icrf_frame.png" alt="" /></p>
<p>A body-fixed frame, on the other hand, is tied to a celestial body and rotates with it, which is useful to position parts on a spacecraft.</p>
<p><img src="https://h313.info/blog/assets/img/spice_body_fixed_frame.png" alt="" /></p>
<p>Then there are topocentric frames, which are attached to the surface of a celestial body. This is useful for landers and rovers.</p>
<p>Finally, we have dynamic frames, where the orientations change with time. This is useful for knowing where to point antennas to communicate with probes.</p>
<p><img src="https://h313.info/blog/assets/img/spice_gse_frame.png" alt="" /></p>
<p>SPICE also supports multiple coordinate systems, including planetocentric, planetodetic, and planetographic ones. Our Longitude/Latitude system is an example of a planetocentric system, where we assume the planet is spherical and measure using the Prime Meridian and the Equator. A planetodetic system keeps the same longitude, but its latitude is the angle between the X-Y plane and the surface normal at the point of interest. A planetographic system is similar to a planetodetic system, but longitude increases against the direction of the planet’s rotation, except for the Earth, Moon and Sun, where longitude is positive east by default. The planetographic system is only fixed for planets and satellites, as there are conflicting standards for dwarf planets, asteroids and comets. SPICE comes with utilities to convert between these systems.</p>
<p><img src="https://h313.info/blog/assets/img/spice_coordinates.png" alt="" /></p>
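<p>To show how the two latitude definitions differ in practice, here’s a small C++ sketch that converts a planetocentric latitude into a planetodetic one. It only handles points on the surface of an oblate spheroid, using the standard relation tan(detic) = tan(centric) / (1 - f)<sup>2</sup>; the function name is mine, and for real work SPICE’s own conversion routines are the better choice.</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;cassert&gt;
#include &lt;cmath&gt;

// Converts planetocentric latitude to planetodetic latitude for a point
// ON THE SURFACE of an oblate spheroid. Angles are in radians and
// flattening is f = (a - b) / a. (Hypothetical helper.)
double centric_to_detic(double lat_centric, double flattening) {
    assert(flattening &gt;= 0.0 &amp;&amp; flattening &lt; 1.0);
    const double k = (1.0 - flattening) * (1.0 - flattening);
    // atan2 keeps the result well-defined at the poles, where tan() blows up.
    return std::atan2(std::sin(lat_centric), k * std::cos(lat_centric));
}
</code></pre></div></div>
<p>For an Earth-like flattening of about 1/298.257, a planetocentric latitude of 45 degrees comes out a little under 0.2 degrees larger in planetodetic terms, and the two agree at the equator and the poles.</p>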
<p>Each spacecraft often has multiple body-fixed frames for its instruments, in addition to an inertial frame or topocentric frame to locate the spacecraft. For example, a high-gain antenna may have its own frame and orientation data. Data for these frames is stored in <code class="language-plaintext highlighter-rouge">spk</code>, <code class="language-plaintext highlighter-rouge">ck</code>, and <code class="language-plaintext highlighter-rouge">fk</code> files. How do these files work?</p>
<h4 id="spk-ck-fk-and-pck-files">SPK, CK, FK, and PCK files</h4>
<p>An <code class="language-plaintext highlighter-rouge">spk</code> file contains position and velocity data for objects. Each <code class="language-plaintext highlighter-rouge">spk</code> kernel can contain multiple objects. When resolving the position of an object, SPICE can also look through all its loaded kernels to compute the vectors needed. You can find one used for MSL <a href="https://naif.jpl.nasa.gov/pub/naif/MSL/kernels/spk/msl_atls_gc120806_v3.bsp">here</a>. Each object is stored in a simple binary format:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>Target, Ref Frame ID, Center of Motion, T_start, T_stop
epoch_1, x1, y1, z1, vx1, vy1, vz1
...
epoch_n, xn, yn, zn, vxn, vyn, vzn
</code></pre></div></div>
<p>A <code class="language-plaintext highlighter-rouge">pck</code> file contains orientation and shape models for natural celestial bodies. It can be in text form, or a binary form if high accuracy data is available. Polynomials are used to describe the declination and rotation of the pole, as well as the prime meridian. You can find one used by MSL <a href="https://naif.jpl.nasa.gov/pub/naif/MSL/kernels/pck/pck00008.tpc">here</a>. An entry in a text-based PCK file would look like this:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>BODY610_POLE_RA = ( 40.58 -0.036 0. )
BODY610_POLE_DEC = ( 83.52 -0.004 0. )
BODY610_PM = ( 58.83 518.2359876 0. )
BODY610_LONG_AXIS = ( 0. )
BODY610_NUT_PREC_RA = ( 0. -1.623 0. 0. 0. 0. 0. 0. 0.023 )
BODY610_NUT_PREC_DEC = ( 0. -0.183 0. 0. 0. 0. 0. 0. 0.001 )
BODY610_NUT_PREC_PM = ( 0. 1.613 0. 0. 0. 0. 0. 0. -0.023 )
</code></pre></div></div>
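<p>Here’s a hedged C++ sketch of how the first three entries above could be evaluated. It assumes the usual PCK convention that the pole’s right ascension and declination are polynomials in Julian centuries past J2000, while the prime meridian is a polynomial in days past J2000, all in degrees. The <code class="language-plaintext highlighter-rouge">NUT_PREC</code> trigonometric terms are deliberately left out, so this is only the polynomial part, and the type and function names are my own.</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;cassert&gt;
#include &lt;cmath&gt;

// Orientation of a body in degrees. (Hypothetical type.)
struct Orientation {
    double pole_ra_deg;
    double pole_dec_deg;
    double prime_meridian_deg;
};

// Polynomial part of the BODY610 entries above; nutation/precession
// terms (BODY610_NUT_PREC_*) are ignored in this sketch.
Orientation body610_orientation(double et_seconds_past_j2000) {
    const double days = et_seconds_past_j2000 / 86400.0;
    const double centuries = days / 36525.0;

    Orientation o{};
    o.pole_ra_deg = 40.58 - 0.036 * centuries;   // BODY610_POLE_RA
    o.pole_dec_deg = 83.52 - 0.004 * centuries;  // BODY610_POLE_DEC

    // BODY610_PM, wrapped into the range [0, 360).
    double pm = std::fmod(58.83 + 518.2359876 * days, 360.0);
    if (pm &lt; 0.0) pm += 360.0;
    o.prime_meridian_deg = pm;
    return o;
}
</code></pre></div></div>
<p>The prime meridian rate of roughly 518 degrees per day means this body rotates about 1.4 times per Earth day, which is why the wrap into [0, 360) matters.</p>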
<p>A <code class="language-plaintext highlighter-rouge">ck</code> file contains orientation data for a spacecraft or a component on that spacecraft. It’s stored in a rather complicated binary format, so I won’t go into detail on it.</p>
<p>A <code class="language-plaintext highlighter-rouge">fk</code> file contains frame data that allows for easy translation between each frame’s coordinate system. Here’s what an entry looks like:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>FRAME_DAWN_SPACECRAFT = -203000
FRAME_-203000_NAME = 'DAWN_SPACECRAFT'
FRAME_-203000_CLASS = 3
FRAME_-203000_CLASS_ID = -203000
FRAME_-203000_CENTER = -203
</code></pre></div></div>
<p>These entries note that frame ID <code class="language-plaintext highlighter-rouge">-203000</code> has the name <code class="language-plaintext highlighter-rouge">DAWN_SPACECRAFT</code> and class <code class="language-plaintext highlighter-rouge">3</code>, which means it is a CK-based frame. The <code class="language-plaintext highlighter-rouge">CLASS_ID</code> tells us that the CK structure for this frame has ID <code class="language-plaintext highlighter-rouge">-203000</code>, and the last entry tells us the frame is centered at the object <code class="language-plaintext highlighter-rouge">-203</code>.</p>
<h2 id="aberration-corrections">Aberration Corrections</h2>
<p>Positions can now be determined given a state vector and a reference frame. SPICE provides many features relating to positioning, including conversion between coordinate systems and reference frames, as well as a cool feature called aberration corrections. These are corrections made to state vectors to account for the travel time of light and <a href="https://en.wikipedia.org/wiki/Aberration_(astronomy)">stellar aberration</a>.</p>
<p><img src="https://h313.info/blog/assets/img/spice_positioning.png" alt="" /></p>
<p>For example, let’s say our spacecraft is trying to image a moving object 8 light-seconds away. SPICE would show that the object is 8 light-seconds away, but is also moving in another direction. Because the light we see left the object 8 seconds ago, taking a picture with the object centered means pointing our camera at where the object was 8 seconds ago. There is one problem though: this assumes that our own spacecraft isn’t moving. Its velocity shifts the apparent direction of the incoming light (stellar aberration), and we can correct for this by pointing our camera slightly toward the spacecraft’s direction of travel.</p>
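<p>The light-time half of that correction can be sketched in a few lines of C++. This is a toy model, not SPICE’s implementation: it assumes the target moves in a straight line at constant velocity relative to the observer, ignores stellar aberration entirely, and uses names I invented for the example.</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;array&gt;
#include &lt;cassert&gt;
#include &lt;cmath&gt;

using Vec3 = std::array&lt;double, 3&gt;;

constexpr double kSpeedOfLightKmPerSec = 299792.458;

// Position (km) of a target moving in a straight line, t seconds after
// the reference time. p0 is its position at t = 0, v its velocity (km/s).
Vec3 position_at(const Vec3&amp; p0, const Vec3&amp; v, double t) {
    return {p0[0] + v[0] * t, p0[1] + v[1] * t, p0[2] + v[2] * t};
}

double norm(const Vec3&amp; p) {
    return std::sqrt(p[0] * p[0] + p[1] * p[1] + p[2] * p[2]);
}

// One-way light time (s) for an observation made at t_obs: repeatedly
// look up where the target was when the light left it, then recompute
// how long that light took to arrive.
double light_time(const Vec3&amp; p0, const Vec3&amp; v, double t_obs, int iterations = 3) {
    assert(iterations &gt;= 1);
    double lt = 0.0;
    for (int i = 0; i &lt; iterations; ++i) {
        const Vec3 p = position_at(p0, v, t_obs - lt);
        lt = norm(p) / kSpeedOfLightKmPerSec;
    }
    return lt;
}
</code></pre></div></div>
<p>The camera should then be pointed at <code class="language-plaintext highlighter-rouge">position_at(p0, v, t_obs - lt)</code> rather than at the target’s position at <code class="language-plaintext highlighter-rouge">t_obs</code>. A few iterations are plenty here, since each pass only changes the answer by a factor of roughly the target’s speed over the speed of light.</p>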
<p>SPICE provides a utility for this, called <code class="language-plaintext highlighter-rouge">STATES</code>. Usage is simple:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>~
⟩ ./cspice/exe/states
Welcome to STATES
This program demonstrates the use of NAIF S- and P-
Kernel (SPK) files and subroutines by computing the
state of a target body as seen from an observing
body at a number of epochs within a given time
interval.
Enter the name of a leapseconds kernel file: naif0012.tls
Enter the name of a binary SPK ephemeris file: de425s.bsp
Enter the name of the observing body: Mars
Enter the name of a target body: Earth
Enter the number of states to be calculated: 24
Enter the beginning UTC time: 1 jan 2020
Enter the ending UTC time: 5 jan 2020
Enter the inertial reference frame (e.g.:J2000): J2000
Type of correction Type of state
-------------------------------------------------------------
'LT+S' Light-time and stellar aberration Apparent state
'LT' Light-time only True state
'NONE' No correction Geometric state
Enter LT+S, LT, or NONE: LT+S
Working ... Please wait
For time 1 of 24, the state of:
Body : Earth
Relative to body: Mars
In Frame : J2000
At UTC time : 2020 JAN 01 00:00:00
Position (km) Velocity (km/s)
----------------------- -----------------------
X: 1.7264994106068176e+08 -4.4257725250749800e+01
Y: 2.5540840124038413e+08 1.1538086331635352e+01
Z: 1.0847062666574109e+08 5.8006544004475851e+00
MAGNITUDE: 3.2681390793796480e+08 4.6103375928890053e+01
Continue? (Enter Y or N): y
</code></pre></div></div>
<p>Here are the results without aberration correction:</p>
<div class="language-plaintext highlighter-rouge"><div class="highlight"><pre class="highlight"><code>For time 1 of 24, the state of:
Body : Earth
Relative to body: Mars
In Frame : J2000
At UTC time : 2020 JAN 01 00:00:00
Position (km) Velocity (km/s)
----------------------- -----------------------
X: 1.7259725326375243e+08 -4.4256182271235772e+01
Y: 2.5541439777498752e+08 1.1529246064486195e+01
Z: 1.0847415490990619e+08 5.7967956829135803e+00
MAGNITUDE: 3.2679193488880628e+08 4.6099197652776560e+01
</code></pre></div></div>
<p>Remember that J2000 is actually ICRF here. Also, because SPICE uses vectors for the location of each planet, the choice of reference frame won’t change our final answer since the component of the vector in the direction of the origin cancels out. We also see very minimal changes due to aberration (the distance shifts by about 22,000 km out of roughly 327 million km) since Earth and Mars are relatively close together, but if we look at the state of an object farther away, like Pluto, we’ll get a much more noticeable distortion.</p>
<p>I was actually surprised and very impressed at the amount of thought put into something as simple as knowing your position and time in space. Even this post only scratches the surface of what SPICE does. Aside from command-line utilities, it also includes APIs for C, FORTRAN, IDL, Java, and MATLAB. Bindings are also available for other languages, including <a href="https://spiceypy.readthedocs.io/en/master/index.html">Python</a>. If you’re interested in playing around with it, you can download the <a href="https://naif.jpl.nasa.gov/naif/toolkit.html">toolkit</a> as well as existing <a href="https://naif.jpl.nasa.gov/naif/data_operational.html">mission data</a> and try it out yourself!</p>Haoda Wangharry@h313.infoSpace is huge - our entire solar system is about 9.09 billion kilometers in diameter, and at those scales even the radius of the sun at 695,508 seems tiny in comparison. So how is it that we can communicate with probes like New Horizons, 4.5 light-hours away, with unerring accuracy?An Analysis of the LEGO City Deep Space Rocket2020-05-09T19:33:13+00:002020-05-09T19:33:13+00:00https://h313.info/aerospace/2020/05/09/an-analysis-of-the-lego-city-deep-space-rocket<p>The LEGO City <a href="https://www.lego.com/en-us/product/deep-space-rocket-and-launch-control-60228">Deep Space Rocket and Launch Control</a> is a “modular, multi-stage rocket
with cockpit, booster and payload storage modules.” Prominently featured on the page are images of
the launch control tower, launchpad, and various extra equipment such as a lunar rover. But can this
rocket really fly?</p>
<p><img src="https://h313.info/blog/assets/img/lego_rocket.jpeg" alt="" /></p>
<p>LEGO helpfully provides an “Explore in 3D” option for us, which provides us a good measure of the
exact size of the rocket. We will first assume that:</p>
<ul>
<li>LEGO City is on Earth</li>
<li>A minifigure, 3 blocks high, is representative of an average American adult, at 1.753 meters tall
(<a href="https://www.cdc.gov/nchs/data/series/sr_03/sr03_039.pdf">US Dept. of Health and Human Services; et al.</a>)</li>
<li>The two boosters are similar to the 4-segment <a href="https://en.wikipedia.org/wiki/Space_Shuttle_Solid_Rocket_Booster">solid rocket boosters</a> used on the
space shuttle</li>
<li>The first stage uses a cryogenic rocket engine like the RS-25</li>
</ul>
<p>Given these, we can find these measurements for each of the respective components:</p>
<ul>
<li>Main stage: 20 blocks (11.69m) tall, 6 blocks (3.51m) wide</li>
<li>SRB: 13 blocks (7.60m) tall, 4 blocks (2.34m) wide</li>
<li>Payload stage: 12 blocks (7.01m) tall, 6 blocks (3.51m) wide</li>
<li>Space Capsule: 18 blocks (10.52m) tall, 8 blocks (4.68m) wide, cone-shaped</li>
</ul>
<p>An interesting aspect of this design is that it removes the commonly-used second stage booster and
replaces it with a payload. While this increases cargo space, it also increases the total mass that
needs to be put into orbit, which makes it harder to accelerate as well.</p>
<p>Using these figures, we can start calculating the amount of fuel in our main stage. Let’s assume
that the main stage contains fuel and oxidizer similar to that in the space shuttle’s external
fuel tank. For comparison, the <a href="https://en.wikipedia.org/wiki/Space_Shuttle_external_tank">shuttle’s external fuel tank</a> holds 735,601
kilograms of fuel (both LOX and LH2), in a volume of 2,050,798 litres. That’s roughly
0.36 kg of fuel per litre. As our main stage has volume (assuming it’s a cylinder) 113,115 litres,
it can hold about 40,573 kilograms of fuel.</p>
<p>As for our solid rocket boosters, they hold 500,000 kg of rocket fuel in a volume of about 491.43
cubic meters (assuming a fully cylindrical booster), or about 1017 kg of fuel per cubic meter.
Knowing that our SRB has volume 27.94 cubic meters, we can assume that each of our SRBs holds 28,410
kg of propellant.</p>
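<p>For anyone who wants to check the main stage numbers, here’s a short C++ sketch of the same scaling argument: work out the volume of a cylinder, then multiply by the propellant density of a reference tank. The function names are mine, and treating the stage as a perfect cylinder filled entirely with propellant is the same simplification used above.</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;cassert&gt;
#include &lt;cmath&gt;

const double kPi = std::acos(-1.0);

// Volume of a cylinder in litres, given its diameter and height in meters.
double cylinder_volume_litres(double diameter_m, double height_m) {
    assert(diameter_m &gt; 0.0 &amp;&amp; height_m &gt; 0.0);
    const double radius = diameter_m / 2.0;
    return kPi * radius * radius * height_m * 1000.0;  // 1 m^3 = 1000 L
}

// Scales a reference tank's propellant density to a tank of another size.
double propellant_mass_kg(double volume_litres,
                          double reference_mass_kg,
                          double reference_volume_litres) {
    assert(reference_volume_litres &gt; 0.0);
    return volume_litres * (reference_mass_kg / reference_volume_litres);
}
</code></pre></div></div>
<p>Plugging in the main stage (3.51 m wide, 11.69 m tall) and the external tank figures (735,601 kg in 2,050,798 litres) gives roughly 113,000 litres and 40,600 kg, in line with the numbers above.</p>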
<p>For simplicity, let’s assume that our space capsule is similar in mass to the
<a href="https://en.wikipedia.org/wiki/Orion_(spacecraft)">Orion crew module</a>, or 10,400 kg, and that the payload stage has the same
mass. We now have the information we need to use the
<a href="https://en.wikipedia.org/wiki/Tsiolkovsky_rocket_equation">rocket equation</a> to examine its motion:</p>
<p><img src="https://h313.info/blog/assets/img/rocket_equation.svg" alt="" /></p>
<p>We know our initial mass (m<sub>0</sub>) to be 107,793 kg (we’re assuming the stages are 100%
propellant), and we take our final mass (m<sub>f</sub>) to be 20,800 kg (the crew module plus the
equally heavy payload stage, which is the value the delta-v below works out from). Standard gravity
(g<sub>0</sub>) is just 9.81 m/s<sup>2</sup>.</p>
<p>We’ll now have to calculate our specific impulse (I<sub>sp</sub>). Since these engines also have
different mass flow, we’ll need to get final I<sub>sp</sub> with <a href="https://wiki.kerbalspaceprogram.com/wiki/Specific_impulse#Multiple_engines">this equation</a>. Plugging
in those numbers to Mathematica gives us:</p>
<p><img src="https://h313.info/blog/assets/img/isp_calculation.png" alt="" /></p>
<p>That gives us a final I<sub>sp</sub> of 263.1 seconds. Plugging this back into the rocket equation, we
get our final delta-v:</p>
<p><img src="https://h313.info/blog/assets/img/delta_v_calculation.png" alt="" /></p>
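<p>If you’d rather not squint at a screenshot, the same calculation is easy to sketch in C++. One caveat: the delta-v quoted in this post is reproduced by a final mass of 20,800 kg (the crew module plus the equally heavy payload stage) together with the 107,793 kg initial mass, so those are the inputs used in the usage note. The function name is my own.</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;cassert&gt;
#include &lt;cmath&gt;

constexpr double kStandardGravity = 9.81;  // m/s^2, the value used in this post

// Tsiolkovsky rocket equation: delta-v = Isp * g0 * ln(m0 / mf).
// Masses are in kg, Isp in seconds, and the result is in m/s.
double delta_v(double isp_seconds, double initial_mass_kg, double final_mass_kg) {
    assert(final_mass_kg &gt; 0.0 &amp;&amp; initial_mass_kg &gt;= final_mass_kg);
    return isp_seconds * kStandardGravity *
           std::log(initial_mass_kg / final_mass_kg);
}
</code></pre></div></div>
<p>Calling <code class="language-plaintext highlighter-rouge">delta_v(263.1, 107793.0, 20800.0)</code> lands within a couple of meters per second of the figure discussed next.</p>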
<p>That comes out to 4246.43 meters per second, or 4.25 km/s. That’s not high enough for even LEO, which
<a href="https://en.wikipedia.org/wiki/Delta-v_budget#Budget">requires about 9.4 km/s</a>, though it would definitely cross the Kármán line. This
raises troubling questions for LEGO City leadership. Why does the set include a rover and a
grappling arm, if it will never reach the moon? What’s the satellite used for if it doesn’t have
the delta-v to reach even low-earth orbit? LEGO, we need answers!</p>Haoda Wangharry@h313.infoThe LEGO City Deep Space Rocket and Launch Control is a “modular, multi-stage rocket with cockpit, booster and payload storage modules.” Prominently featured on the page are images of the launch control tower, launchpad, and various extra equipment such as a lunar rover. But can this rocket really fly?Dissecting DNS Packets at Line Rate2020-05-05T01:33:13+00:002020-05-05T01:33:13+00:00https://h313.info/networking/dpdk/dns/2020/05/05/dissecting-dns-packets-at-line-rate<p>A couple months ago, my advisor asked me if I wanted to develop a small part of the <a href="https://ant.isi.edu/ddidd/index.html">DDiDD</a>
project, which would check incoming DNS packets and reply to any packets with an invalid domain
automatically, which would free up the DNS server from responding to those. Sounds simple, right?
There’s one catch - the packets needed to be processed at line rate, which in my case meant 40
gigabits per second.</p>
<p>40 gigabits per second of pure DNS packets, assuming packet sizes of about 80 bytes per packet,
means that the program would have to process 62.5 million packets every second. That gives me 16
nanoseconds to process each packet, or 67 CPU cycles given a 4.2GHz processor with a single core
(assuming that the bridge between my NIC and my processor has zero latency). This is not enough time
for the Linux kernel’s network stack to even send a packet (<a href="http://info.iet.unipi.it/~luigi/papers/20120503-netmap-atc12.pdf">L. Rizzo, 2012, p. 3</a>).</p>
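<p>The arithmetic behind that budget is simple enough to sketch in C++, which makes it easy to rerun for other link speeds or packet sizes. The struct and function names here are made up for the example, and like the estimate above it ignores Ethernet framing overhead and assumes a single core.</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;cassert&gt;

// Per-packet processing budget. (Hypothetical type.)
struct PacketBudget {
    double packets_per_second;
    double ns_per_packet;
    double cycles_per_packet;
};

// Computes how much time and how many CPU cycles are available per packet
// when a link is saturated with packets of a fixed size.
PacketBudget packet_budget(double link_bits_per_second,
                           double packet_bytes,
                           double cpu_hz) {
    assert(link_bits_per_second &gt; 0.0 &amp;&amp; packet_bytes &gt; 0.0 &amp;&amp; cpu_hz &gt; 0.0);
    PacketBudget b{};
    b.packets_per_second = link_bits_per_second / (packet_bytes * 8.0);
    b.ns_per_packet = 1e9 / b.packets_per_second;
    b.cycles_per_packet = cpu_hz / b.packets_per_second;
    return b;
}
</code></pre></div></div>
<p>For a 40 Gbps link, 80-byte packets and a 4.2 GHz core, this gives 62.5 million packets per second, 16 ns per packet and about 67 cycles per packet, matching the numbers above.</p>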
<p>So, what I needed was a library that could provide:</p>
<ul>
<li>Fast packet I/O at 40Gbps</li>
<li>Ability to create virtual interfaces to communicate with the DNS server on the same machine</li>
<li>Ability to read and modify packets before forwarding them</li>
</ul>
<h2 id="the-competition">The Competition</h2>
<p>There are many libraries out there that promise higher packet processing speeds than the Linux kernel.
Most of them rely on hardware on the NIC or kernel bypass techniques. It’s interesting to note that
most of these methods rely on polling, rather than interrupts, since at such high network speeds
interrupts would actually slow down packet processing.</p>
<h3 id="p4">P4</h3>
<p>Luckily, some of the NICs in the testbed I’m using contain FPGAs inside and support <a href="https://p4.org/">P4</a>, which
essentially turns our NIC into a switch. However, P4 did not support an easy way of looking at the
packet’s contents, only the headers. This also requires buying expensive, specialized hardware,
which would limit where we could deploy the software.</p>
<h3 id="mellanox-vma">Mellanox VMA</h3>
<p>Mellanox’s <a href="https://www.mellanox.com/products/software/accelerator-software/vma">VMA</a> runs by using an <code class="language-plaintext highlighter-rouge">LD_PRELOAD</code> to override the kernel’s network calls with
their own, which lowers the number of costly <code class="language-plaintext highlighter-rouge">memcpy</code>s, interrupts, and context switches you have to
do.</p>
<h3 id="solarflare-ef_vi">Solarflare EF_VI</h3>
<p>Cloudflare does a much better job of explaining this than I will on <a href="https://blog.cloudflare.com/kernel-bypass/">their blog</a>.
It works in a similar way to Mellanox’s solution.</p>
<h3 id="netmap">Netmap</h3>
<p><a href="https://github.com/luigirizzo/netmap">Netmap</a> is a collection of kernel modules that allow for fast packet I/O. It also lets you
create virtual network interfaces for non-netmap programs to use. However, it requires patched
drivers, and supports fewer NICs than DPDK does.</p>
<h3 id="dpdk">DPDK</h3>
<p><a href="https://www.dpdk.org/">DPDK</a> is another high-speed kernel bypass library sponsored by Intel which supports
a wide array of network interfaces, but has a troublesome API which requires rewriting the network
stack for everything beyond the physical layer. While far from ideal, since this was the library
I was most familiar with, I ended up using this for the project.</p>
<h2 id="the-kernel-nic-interface">The Kernel NIC Interface</h2>
<p>One of the more interesting modules in DPDK is the <a href="https://doc.dpdk.org/guides/prog_guide/kernel_nic_interface.html">kernel NIC interface</a>, which lets you
create a virtual interface for non-DPDK programs to use. It’s also faster than traditional
virtual interfaces, since it cuts out some of the costly transitions between kernelspace and
userspace. This requires a kernel module, <code class="language-plaintext highlighter-rouge">kmod/rte_kni.ko</code>. For my use case, I’ll be setting
<code class="language-plaintext highlighter-rouge">carrier=on</code> so I won’t have to bother with <code class="language-plaintext highlighter-rouge">rte_kni_update_link()</code>.</p>
<p>The developers also provided a handy <a href="https://github.com/DPDK/dpdk/tree/master/examples/kni">example application</a> in their repos, which
forwards incoming data from a physical NIC interface to a KNI interface, and vice versa. The magic
here happens in the <code class="language-plaintext highlighter-rouge">kni_egress</code> and <code class="language-plaintext highlighter-rouge">kni_ingress</code> functions, which work similarly.</p>
<p>Each interface has RX and TX ring buffers, which store packets until they’re read. That makes
sending and receiving packets without processing them rather simple. To transmit a packet, just push
a <code class="language-plaintext highlighter-rouge">rte_mbuf</code> containing the packet into the TX buffer, and to receive a packet, read the RX buffer
into a different <code class="language-plaintext highlighter-rouge">rte_mbuf</code>. These operations are achieved with <code class="language-plaintext highlighter-rouge">rte_eth_tx_burst</code> and
<code class="language-plaintext highlighter-rouge">rte_eth_rx_burst</code> for our physical interface, and <code class="language-plaintext highlighter-rouge">rte_kni_tx_burst</code> and <code class="language-plaintext highlighter-rouge">rte_kni_rx_burst</code> for our
KNI one. So, all the program needs to do is read data in with <code class="language-plaintext highlighter-rouge">rte_*_rx_burst</code> and then write those
<code class="language-plaintext highlighter-rouge">rte_mbuf</code>s out to the other interface using <code class="language-plaintext highlighter-rouge">rte_*_tx_burst</code>.</p>
<p>Since this example doesn’t handle headers, it’s fully transparent to the end user, except that all
traffic is now routed through <code class="language-plaintext highlighter-rouge">vEth*</code> instead of <code class="language-plaintext highlighter-rouge">eth*</code>.</p>
<h2 id="fun-wth-ring-buffers">Fun With Ring Buffers</h2>
<p>DPDK also exposes the ring buffers directly to the user, which are a core component of this project.
By having <code class="language-plaintext highlighter-rouge">kni_ingress</code> write to a new ring buffer instead of the KNI TX ring, I can have another
thread running to do work on those packets. Here’s what that looks like:</p>
<p><img src="https://h313.info/blog/assets/img/dpdk_dns_layout.png" alt="" /></p>
<p>For this, I’ll need 4 threads. One forwards the packets from our NIC to the <code class="language-plaintext highlighter-rouge">WORKER_RX_RING</code>.
There, another thread reads <code class="language-plaintext highlighter-rouge">WORKER_RX_RING</code> and parses each packet. All DNS packets with
an invalid TLD as determined by <a href="https://www.icann.org/resources/pages/tlds-2012-02-25-en">ICANN</a> are then passed to the <code class="language-plaintext highlighter-rouge">WORKER_TX_RING</code>, while
the rest continue to the KNI interface. Finally, one thread passes invalid TLD responses to
the NIC, while the last one passes outgoing packets from the KNI.</p>
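<p>To give a feel for what sits between each pair of threads, here’s a toy single-producer, single-consumer ring in C++ with burst-style enqueue and dequeue. To be clear, this is not DPDK’s <code class="language-plaintext highlighter-rouge">rte_ring</code> or its API, just a stand-in I wrote to show the idea: one thread only ever writes the head index, the other only ever writes the tail index, so no locks are needed.</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;array&gt;
#include &lt;atomic&gt;
#include &lt;cassert&gt;
#include &lt;cstddef&gt;

// Toy lock-free ring for exactly one producer thread and one consumer
// thread. N must be a power of two so indices can be wrapped with a mask.
template &lt;typename T, std::size_t N&gt;
class SpscRing {
    static_assert(N &gt; 0 &amp;&amp; (N &amp; (N - 1)) == 0, "N must be a power of two");

public:
    // Producer side: stores up to n items and returns how many fit.
    std::size_t enqueue_burst(const T* items, std::size_t n) {
        const std::size_t head = head_.load(std::memory_order_relaxed);
        const std::size_t tail = tail_.load(std::memory_order_acquire);
        const std::size_t free_slots = N - (head - tail);
        if (n &gt; free_slots) n = free_slots;
        for (std::size_t i = 0; i &lt; n; ++i)
            slots_[(head + i) &amp; (N - 1)] = items[i];
        head_.store(head + n, std::memory_order_release);  // publish the items
        return n;
    }

    // Consumer side: copies out up to n items and returns how many it got.
    std::size_t dequeue_burst(T* out, std::size_t n) {
        const std::size_t tail = tail_.load(std::memory_order_relaxed);
        const std::size_t head = head_.load(std::memory_order_acquire);
        const std::size_t used = head - tail;
        if (n &gt; used) n = used;
        for (std::size_t i = 0; i &lt; n; ++i)
            out[i] = slots_[(tail + i) &amp; (N - 1)];
        tail_.store(tail + n, std::memory_order_release);  // free the slots
        return n;
    }

private:
    std::array&lt;T, N&gt; slots_{};
    std::atomic&lt;std::size_t&gt; head_{0};  // written only by the producer
    std::atomic&lt;std::size_t&gt; tail_{0};  // written only by the consumer
};
</code></pre></div></div>
<p>In the layout above, each ring would carry packet pointers rather than the packets themselves, and a burst that only partially fits tells the producer it has to drop or retry the remainder.</p>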
<h2 id="decoding-a-dns-packet">Decoding a DNS Packet</h2>
<p>Now that we have our ring workers passing data between each other, we’ll also have to parse the
incoming DNS packets and read them into our program. Here’s an example of a DNS query packet
opened in Wireshark:</p>
<p><img src="https://h313.info/blog/assets/img/dns_packet.png" alt="" /></p>
<p>What we’re focusing on here is the DNS section, so we can skip the first 42 bytes, which are the
layer 2-4 headers (14 bytes of Ethernet, 20 of IPv4, and 8 of UDP). At the beginning, we have 2 bytes
that act as an identifier for the client to match up replies. After that, we have two bytes of flags,
which <a href="https://tools.ietf.org/html/rfc1035">RFC 1035</a> goes into detail on in section 4.1.1. The next 4 sets of 2 bytes list
the number of questions, answers, name server resource records (RRs), and additional resource records
in the packet. It’s important to note that these fields are big-endian, which means you’ll have to
byte-swap them when running on a little-endian architecture like x86_64. We’ll skip the additional
RRs and focus on the question RRs.</p>
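<p>Here’s a small C++ sketch of reading that 12-byte header in a way that doesn’t care about the host’s endianness. It’s a standalone illustration rather than the code from this project, and the <code class="language-plaintext highlighter-rouge">DnsHeader</code> struct and helper names are my own.</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;cassert&gt;
#include &lt;cstddef&gt;
#include &lt;cstdint&gt;

// The six 16-bit fields of a DNS header, in host byte order.
struct DnsHeader {
    std::uint16_t id;
    std::uint16_t flags;
    std::uint16_t qdcount;  // questions
    std::uint16_t ancount;  // answers
    std::uint16_t nscount;  // name server (authority) RRs
    std::uint16_t arcount;  // additional RRs
};

// Reads a big-endian 16-bit value byte by byte, so the result is correct
// on both little-endian and big-endian hosts.
std::uint16_t read_be16(const std::uint8_t* p) {
    return static_cast&lt;std::uint16_t&gt;((p[0] &lt;&lt; 8) | p[1]);
}

// Parses the header at the start of the DNS payload (i.e. after the
// layer 2-4 headers). Returns false if the buffer is too short.
bool parse_dns_header(const std::uint8_t* data, std::size_t len, DnsHeader* out) {
    assert(data != nullptr &amp;&amp; out != nullptr);
    if (len &lt; 12) return false;
    out-&gt;id = read_be16(data);
    out-&gt;flags = read_be16(data + 2);
    out-&gt;qdcount = read_be16(data + 4);
    out-&gt;ancount = read_be16(data + 6);
    out-&gt;nscount = read_be16(data + 8);
    out-&gt;arcount = read_be16(data + 10);
    return true;
}
</code></pre></div></div>
<p>Reading byte by byte avoids both the manual byte swap and any alignment concerns from casting the packet buffer to a struct pointer, at the cost of a few extra instructions.</p>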
<p>Each query (or question resource record) is split up into substrings by domain, so in our case
<code class="language-plaintext highlighter-rouge">ns5.SPOTIFY.COM</code> will become <code class="language-plaintext highlighter-rouge">ns5</code>, <code class="language-plaintext highlighter-rouge">SPOTIFY</code>, and <code class="language-plaintext highlighter-rouge">COM</code>. Preceding each string is the length of
the string, so <code class="language-plaintext highlighter-rouge">ns5</code> would be <code class="language-plaintext highlighter-rouge">03 6e 73 35</code>. The same applies to the other two domains. The name
ends with the null terminator <code class="language-plaintext highlighter-rouge">00</code>. Following that we’ve got indicators for query type (<code class="language-plaintext highlighter-rouge">0x0001</code> in
this case for an A record), and query class (<code class="language-plaintext highlighter-rouge">0x0001</code> for Internet addresses).</p>
<p>Knowing this, it’s trivial to implement an algorithm to go through the query name until we
find the TLD:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="c1">// Loop until the end of the query name</span>
<span class="n">std</span><span class="o">::</span><span class="n">string</span> <span class="n">query</span><span class="p">;</span>
<span class="kt">int</span> <span class="n">str_len</span><span class="p">,</span> <span class="n">offset</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="kt">char</span> <span class="o">*</span><span class="n">qname_start</span> <span class="o">=</span> <span class="n">qname</span><span class="p">;</span>
<span class="k">while</span> <span class="p">(</span><span class="nb">true</span><span class="p">)</span> <span class="p">{</span>
<span class="c1">// Read in the string length</span>
<span class="n">str_len</span> <span class="o">=</span> <span class="o">*</span><span class="n">qname</span><span class="p">;</span>
<span class="n">qname</span> <span class="o">=</span> <span class="n">qname</span> <span class="o">+</span> <span class="n">str_len</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">if</span> <span class="p">(</span><span class="o">*</span><span class="n">qname</span> <span class="o">!=</span> <span class="mh">0x0</span><span class="p">)</span>
<span class="n">offset</span> <span class="o">+=</span> <span class="n">str_len</span> <span class="o">+</span> <span class="mi">1</span><span class="p">;</span>
<span class="k">else</span>
<span class="k">break</span><span class="p">;</span>
<span class="p">}</span>
</code></pre></div></div>
<p>In this case, the TLD is <code class="language-plaintext highlighter-rouge">COM</code>. That happens to be a valid TLD, so we’ll push this
packet into the KNI’s RX_QUEUE and continue. But what if it isn’t?</p>
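<p>Since the loop above trusts the packet to be well-formed, here’s a hedged sketch of a bounds-checked variant that returns the last label of the name. It’s written for clarity rather than line rate (building a <code class="language-plaintext highlighter-rouge">std::string</code> per label is too slow for the real thing), and the function name is my own.</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;cassert&gt;
#include &lt;cstddef&gt;
#include &lt;cstdint&gt;
#include &lt;string&gt;

// Walks the length-prefixed labels of a query name and returns the last
// non-empty one (the TLD). Returns an empty string if the name is
// malformed: a label runs past the buffer, a length byte is over 63
// (which includes compression pointers), or the terminator is missing.
std::string extract_tld(const std::uint8_t* qname, std::size_t len) {
    assert(qname != nullptr);
    std::size_t pos = 0;
    std::string last;
    while (pos &lt; len) {
        const std::uint8_t label_len = qname[pos];
        if (label_len == 0) return last;           // zero length ends the name
        if (label_len &gt; 63) return {};             // pointer or invalid length
        if (pos + 1 + label_len &gt; len) return {};  // label overruns the buffer
        last.assign(reinterpret_cast&lt;const char*&gt;(qname + pos + 1), label_len);
        pos += 1 + static_cast&lt;std::size_t&gt;(label_len);
    }
    return {};  // ran out of bytes before the terminating zero
}
</code></pre></div></div>
<p>For the <code class="language-plaintext highlighter-rouge">ns5.SPOTIFY.COM</code> query this returns <code class="language-plaintext highlighter-rouge">COM</code>, which can then be checked against the list of valid TLDs. DNS names are case-insensitive, so that comparison should ignore case.</p>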
<h2 id="building-a-packet">Building a Packet</h2>
<p>Thankfully, for our use case, I didn’t need to include authority sections or anything extra.
Therefore all I had to do was modify the existing packet (thus saving on <code class="language-plaintext highlighter-rouge">malloc</code>s and <code class="language-plaintext highlighter-rouge">memcpy</code>s)
by swapping the destination address and ports with the source address and ports. This had to be done
on the Ethernet, IPv4, and UDP headers. Following that, I set the response bit with a simple bitmask,
which keeps the rest of that flags byte the same, and write the NXDOMAIN response code:</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code><span class="c1">// Modify DNS headers</span>
<span class="o">*</span><span class="p">(</span><span class="n">dns_hdr</span> <span class="o">+</span> <span class="mi">2</span><span class="p">)</span> <span class="o">|=</span> <span class="mb">0b10000000</span><span class="p">;</span> <span class="c1">// Set QR to mark this as a response; the opcode,</span>
<span class="c1">// AA, TC and RD bits are kept from the query</span>
<span class="o">*</span><span class="p">(</span><span class="n">dns_hdr</span> <span class="o">+</span> <span class="mi">3</span><span class="p">)</span> <span class="o">=</span> <span class="mb">0b00000011</span><span class="p">;</span> <span class="c1">// Name error</span>
</code></pre></div></div>
<p>One last thing to do now: generate our IPv4 checksums (and ignore the UDP ones):</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code> <span class="c1">// Set IPv4 checksum</span>
<span class="n">ip_hdr</span><span class="o">-></span><span class="n">hdr_checksum</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span>
<span class="n">ip_hdr</span><span class="o">-></span><span class="n">hdr_checksum</span> <span class="o">=</span> <span class="n">rte_ipv4_cksum</span><span class="p">(</span><span class="n">ip_hdr</span><span class="p">);</span>
<span class="n">udp_hdr</span><span class="o">-></span><span class="n">dgram_cksum</span> <span class="o">=</span> <span class="mi">0</span><span class="p">;</span> <span class="c1">// Ignore UDP checksum</span>
</code></pre></div></div>
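<p>For the curious, the header checksum itself is just a ones’ complement sum over the header’s 16-bit words. Here’s a standalone C++ sketch of that calculation; it’s an illustration of the algorithm, not DPDK’s implementation, and it returns the checksum as a host-order number rather than bytes ready to be written into the packet.</p>
<div class="language-c++ highlighter-rouge"><div class="highlight"><pre class="highlight"><code>#include &lt;cassert&gt;
#include &lt;cstddef&gt;
#include &lt;cstdint&gt;

// Internet checksum over an IPv4 header. The checksum field (bytes 10
// and 11) must be zero when computing a new checksum. Running this over
// a header that already contains a valid checksum returns 0.
std::uint16_t ipv4_checksum(const std::uint8_t* hdr, std::size_t len) {
    assert(hdr != nullptr);
    std::uint32_t sum = 0;
    // Add up the header as a sequence of big-endian 16-bit words.
    for (std::size_t i = 0; i + 1 &lt; len; i += 2)
        sum += static_cast&lt;std::uint32_t&gt;((hdr[i] &lt;&lt; 8) | hdr[i + 1]);
    // Pad a trailing odd byte with zero (IPv4 headers are always even).
    if (len &amp; 1) sum += static_cast&lt;std::uint32_t&gt;(hdr[len - 1] &lt;&lt; 8);
    // Fold the carries back into the low 16 bits.
    while (sum &gt;&gt; 16) sum = (sum &amp; 0xFFFF) + (sum &gt;&gt; 16);
    return static_cast&lt;std::uint16_t&gt;(~sum);
}
</code></pre></div></div>
<p>Because addition is commutative, swapping the source and destination addresses doesn’t change this sum on its own. Recomputing it is still the safe choice whenever other header fields might be touched.</p>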
<p>And that’s it! All that’s left to do is push the packet to the NIC’s TX queue, and it’ll be sent
back.</p>
<p>The code for this project is available <a href="https://github.com/steelisi/dns-proxy">here</a></p>Haoda Wangharry@h313.infoA couple months ago, my advisor asked me if I wanted to develop a small part of the DDiDD project, which would check incoming DNS packets and reply to any packets with an invalid domain automatically, which would free up the DNS server from responding to those. Sounds simple, right? There’s one catch - the packets needed to be processed at line rate, which in my case meant 40 gigabits per second.