This blog can be found on https://hackmd.io/@0xff07/lkmp-sharing
I’ve always been curious about how Linux achieved its world dominance. Look around at the routers, refrigerators, laptops, and cloud services that make it possible for this page to exist. Linux is what makes our daily life possible. Yet one way or another, they all derive from that one and only mainline Linux kernel.
It’s a huge world to explore, and it’s been my dream to have that ability to explore it, and to be part of that community. I’ve done some hobbyist kernel development in the past, but still, there's a difference.
I thought that only very niche developers get the chance to be a part of that community.
This was my belief for years, until one day this mentorship came into sight, and I thought to myself that this might be the opportunity.
So after working on a few prerequisite tasks, I joined the mentorship!
The first challenge is to know how at least some parts of the kernel work. The kernel is already open source, so just read it, right? Well, it's a bit more complicated.
Although source code of the kernel is publicly available, it's more than 30 million lines of code. Skimming each line for 10 seconds for 12 hours a day will take more than 19 years to finish, and that's under the assumption that the kernel doesn't grow over the years you're reading it.
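The estimate above can be checked with a few lines of Python. This is only a back-of-the-envelope sketch: the 30 million figure is taken as a lower bound, and the 365-day year and the function name `years_to_skim` are my own assumptions.

```python
# Rough estimate of how long it would take to skim the whole kernel.
LINES_OF_CODE = 30_000_000     # "more than 30 million lines", used as a lower bound
SECONDS_PER_LINE = 10          # skimming speed
READING_HOURS_PER_DAY = 12     # reading time per day


def years_to_skim(lines=LINES_OF_CODE,
                  seconds_per_line=SECONDS_PER_LINE,
                  hours_per_day=READING_HOURS_PER_DAY):
    """Return the number of years needed, assuming 365 reading days a year."""
    total_seconds = lines * seconds_per_line
    seconds_per_day = hours_per_day * 3600
    days = total_seconds / seconds_per_day
    return days / 365


print(f"{years_to_skim():.1f} years")
```

Since the line count is a lower bound and the kernel keeps growing, the real number would only be larger.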
Also, reading code without understanding proper context often leads to misinterpretation. And when it happens, it happens silently, and often takes great effort to correct afterwards.
Fortunately, the source code is not the only source of knowledge. Here are some tips I learned during the mentorship that work for me.
Watching conference talks is particularly useful for catching up on recent topics. It also works very effectively when the kernel documentation doesn't say much about a specific topic.
Videos of past events held by the Linux Foundation can be found on their YouTube channel. They cover everything from core kernel functions like IRQs and bottom-half handling (tasklets, workqueues, and threaded IRQs), to memory management and the internals of SLUB (the kernel's slab allocator), to the storage stack and the network stack. This truly is what makes the Linux kernel community wonderful. There is also the LF Live: Mentorship Series, whose sessions make great tutorials.
I made a list that categorizes the videos by topic:
(It can be found on hackmd.io/@0xff07/kernel-pocket-index)
Those videos greatly enhanced my understanding of the kernel, not only for this mentorship, but also for my job. Exploring topics, listening to stories told by people in the community, and sharing them with others all bring tremendous joy to me!
It also helps to learn the hardware or protocol outside the kernel first. For example, a sensor supported by the Linux kernel may also be supported by a simpler environment, say, Arduino. If you're already familiar with some of those MCU platforms, you can learn how the hardware works there, and then use that knowledge to learn how the kernel handles the same piece of hardware. This greatly speeds up the learning process.
Other examples are commonly used protocols like USB, PCIe, and I2C. They have standalone specifications that are independent of the Linux kernel, which means they can be learned without any prior knowledge of the kernel.
The mailing list archive has a really powerful query function. The hints can be found in the tiny little "help" link beside the search box of every mailing list:
Clicking into it shows some mysterious symbols:
s: match within Subject e.g. s:"a quick brown fox"
d: match date-time range, git "approxidate" formats supported
Open-ended ranges such as `d:last.week..' and
`d:..2.days.ago' are supported
b: match within message body, including text attachments
nq: match non-quoted text within message body
q: match quoted text within message body
n: match filename of attachment(s)
t: match within the To header
c: match within the Cc header
f: match within the From header
a: match within the To, Cc, and From headers
tc: match within the To and Cc headers
l: match contents of the List-Id header
bs: match within the Subject and body
dfn: match filename from diff
dfa: match diff removed (-) lines
dfb: match diff added (+) lines
dfhh: match diff hunk header context (usually a function name)
dfctx: match diff context lines
dfpre: match pre-image git blob ID
dfpost: match post-image git blob ID
dfblob: match either pre or post-image git blob ID
patchid: match `git patch-id --stable' output
rt: match received time, like `d:' if sender's clock was correct
forpatchid: the `X-For-Patch-ID' mail header e.g. forpatchid:stable
changeid: the `X-Change-ID' mail header e.g. changeid:stable
This is the Xapian query syntax, which I learned from "Doing more with lore and b4". It allows one to filter the mails in a mailing list archive. For example, to find mails containing a diff of `drivers/gpu/drm/i915/display/intel_dp_mst.c`, simply copy and paste the following into the search box:
(dfn:"drivers/gpu/drm/i915/display/intel_dp_mst.c")
And you'll be able to see mails regarding this file!
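The same search can also be turned into a link to bookmark or share. The sketch below assumes the archive lives at lore.kernel.org, that the combined `all` list is the one to search, and that the search box maps to a `q` URL parameter; the helper name `lore_search_url` is hypothetical.

```python
from urllib.parse import urlencode

# Assumption: the archive is lore.kernel.org and the search box submits a `q` parameter.
LORE_BASE = "https://lore.kernel.org"


def lore_search_url(query, mailing_list="all"):
    """Build a search URL for one mailing list from a Xapian query string."""
    return f"{LORE_BASE}/{mailing_list}/?{urlencode({'q': query})}"


# The query from the text: mails containing a diff of this file.
url = lore_search_url('dfn:"drivers/gpu/drm/i915/display/intel_dp_mst.c"')
print(url)
```

Passing a specific list name as `mailing_list` narrows the search to that list instead of the combined archive.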
Other than learning, this is particularly useful for seeing whether anyone has had a similar idea before and, more importantly, why that idea didn't work in the first place.
Fixing typos in the documentation is rather straightforward: spot the typo and send a patch to fix it. Fixing inconsistent style is similar. Note that some documentation is extracted from comments in the source code, so those comments may also be worth fixing.
Simply do a `make` and run `git status` to see if there are any untracked files after `make`. Try to understand why they are there. Sometimes it's because of a missing `.gitignore` entry. The `tools/` directory or the kselftests might be an easier starting point for this.
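A minimal sketch of automating that check: it reads the output of `git status --porcelain`, where untracked entries are prefixed with `??`. The function names and the sample paths are hypothetical, and `untracked_after_build` assumes it is run inside a git checkout after the build has finished.

```python
import subprocess


def untracked_paths(porcelain_output):
    """Return the paths that `git status --porcelain` marks as untracked ('??')."""
    return [line[3:] for line in porcelain_output.splitlines()
            if line.startswith("?? ")]


def untracked_after_build(repo_dir="."):
    """Run `git status --porcelain` in repo_dir and list untracked paths."""
    result = subprocess.run(["git", "status", "--porcelain"],
                            cwd=repo_dir, check=True,
                            capture_output=True, text=True)
    return untracked_paths(result.stdout)


# Hypothetical output, as it might look after building something under tools/.
sample = " M Makefile\n?? tools/example/generated.h\n?? tools/example/example-bin\n"
leftovers = untracked_paths(sample)
print(leftovers)
```

Each path that shows up is a candidate to investigate: either a build artifact that a `.gitignore` should cover, or a file that should not have been generated in the source tree at all.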
Kconfig help text is somewhat similar to documentation (e.g. typos).
There are other scenarios where you might be able to propose a change. For example, sometimes a vendor adds support for new devices to a driver but forgets to mention that in the Kconfig description. This might also be something that could be fixed.
Do check the mailing list (the Xapian query syntax would be very helpful in this case), `git log`, and the driver match tables to confirm this.
Static check tools try to catch issues by analyzing the source code. They are "static" in the sense that the analysis happens before or during compilation, not while the kernel is actually running.
The Syzbot dashboard regularly shows results of fuzzing tests on the kernel. A report contains the kernel image, `vmlinux`, the kernel configuration, and sometimes a C program to reproduce the bug. See "Fixing bugs in the Linux kernel with Syzbot, Qemu and GDB" for detailed instructions.
This is also a place where I find exercises for myself to read output from various sanitizers.
After making a change, tests ensue. Although the kernel community surely has its own culture in terms of development, common dev practices apply.
Verifying a change is crucial, even if it's not actual C code. For example, after fixing a typo in the documentation, make sure that it still builds and renders as expected. If a .gitignore is fixed, it’s a good idea to ask whether the pattern is both sufficient and necessary: does it cover the files it should, and will it ignore files that it’s not supposed to exclude? As unique as the kernel is, common sense for software development still applies. Do your best to verify things.
Justifying that a change is necessary is also important. For example,
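For the .gitignore case, the "sufficient and necessary" question can be sketched as two lists: files the pattern should ignore and files it must leave alone. The code below is only an approximation under stated assumptions: it handles slash-free patterns by comparing them against the file name with Python's `fnmatch`, which is not git's real matcher, and all paths and patterns are hypothetical. The authoritative answer comes from running `git check-ignore -v <path>` in the tree.

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def matches(pattern, path):
    """Approximate a slash-free ignore pattern: compare it to the file name only."""
    return fnmatchcase(PurePosixPath(path).name, pattern)


def review_pattern(pattern, should_ignore, must_keep):
    """Return (missed, overreach): files not covered, and files wrongly ignored."""
    missed = [p for p in should_ignore if not matches(pattern, p)]
    overreach = [p for p in must_keep if matches(pattern, p)]
    return missed, overreach


# Hypothetical selftest directory: one built binary and its source file.
binaries = ["tools/testing/selftests/example/test_foo"]
sources = ["tools/testing/selftests/example/test_foo.c"]

too_broad = review_pattern("test_*", binaries, sources)
exact = review_pattern("test_foo", binaries, sources)
print(too_broad)
print(exact)
```

Here the broad pattern covers the binary but would also hide the source file, while the exact name does neither too little nor too much.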
This is a really eye-opening experience, and also a dream come true!
The most important thing I learned is what interacting with the kernel community feels like, and I started to believe that upstreaming some simple patches is doable by mere mortals like me.
I used to be very afraid of sending any mail to the mailing list, thinking that any mistake I made would doom me. The mailing list never falls short of heated debates, after all. It turned out that it also has some very inclusive aspects. The community often provides very helpful suggestions, and listening to the community's feedback is a really nice way to grow.
Special thanks to Shuah Khan and Ricardo B. Marliere for teaching everything. Not just for answering questions, but also for teaching mentalities, for showing how to join an open source community, and for building a welcoming culture for us mentees. You truly are the role models that I’m aiming to become. Although this is the end of this mentorship, I’m sure that we'll meet again in the community in the future. Let's further the world dominance of Linux together!