Cell Line to Command Line, Version 2 Change Log

Book update

Notes on Ming's book

Thank you, Harris, for such a detailed review. I made the changes accordingly.

Preface

Thank you so much, Ming, for sharing your personal story and your motivation for becoming a computational biologist. It is truly moving and inspiring! I can definitely relate.

Notes

Chapter 2

Notes

Excellent book collection. I would also add Sobell's book on Linux and SQL for Mere Mortals.
Coursera, edX, and DataCamp offer free options and certification; these are excellent ways to learn if you prefer videos over books. O'Reilly is also terrific if readers happen to have access. (I mentioned Coursera and edX in Chapter 1 as part of my story; I mentioned them again here per your suggestion.)

I would also mention Dash and similar tools for easy access to documentation for various programming languages.

Including containers later would be nice, given that installing software takes so much time. The section on workflow managers may be the best place for it. Actually, you already cover them in Chapter 12, which is great. I would mention Chapter 12 here, since it describes ways to deal with installation problems!

Errors

In section 2.3, Hadley's name should be corrected.
In 2.3.4, correct the title to "Modern Statistics for Modern Biology".
On page 17, it should read "Unix proceeded to become one of the most used operating systems".

Chapter 3

I love the definitions and the Unix resources. The command line can be confusing for beginners, and they may get lost in the vast number of resources available for learning it.
I really like the short introduction to file permissions
I love the ack reference and the link to your one-liners post.

Notes

A few extra words on zcat and related tools (e.g., zless, zgrep) would be useful, since we deal with compressed files very often.
I also find the tree command useful; you may want to include it.
In 3.17, I would say "Regular Expressions" instead of "Regular Expression". Also, it should read "Remember what computational biologists do" instead of "biologist".

Chapter 4

The notes on purrr and how to batch import files are gold!
Excellent job on introducing pivot_wider and pivot_longer. DataCamp also offers excellent courses on the tidyverse that you might want to link to.
Figure 4.12 is extremely helpful

Notes

A few words about Julia, the new kid on the block, would also be useful. (I already had it.)
In 4.8.4, it should read "Do not reinvent the wheel".
In 4.8.4, "Did you notice" instead of "noticed".
In 4.12, it should read "Awesome Quarto".

Chapter 5

I particularly like the emphasis on the experimental design and why orthogonal data and sanity checks are important.
I also enjoyed the advanced use of the tidyverse in this chapter.
A potential issue with this chapter is that it tries to cover several different topics, which affects the flow of information.

Notes

In 5.3, "deferentially" should read "differentially".
In 5.6, "we should not be obsessed with p-value as shown…" instead of "show".

Chapter 6

I totally agree with the choice to include PCA as a separate chapter, given its central importance in bioinformatics.
t-SNE and UMAP are key dimensionality reduction methods for single-cell data, so it is great that you include them here.

Notes

It might be worth mentioning other R packages for publication-quality PCA plots (e.g., ggpubr). Although it is useful and I use it myself, I believe I read somewhere that it may contain some errors.

Chapter 7

It's great that you devote a whole chapter to heatmaps. They are indeed ubiquitous, and making them is an essential skill for computational biologists.
I also like the fact that you list the commonly used packages for heatmap generation, including ComplexHeatmap.
Using the Polychrome package to get discrete colors that improve the dendrogram is a useful suggestion.

Notes

"In many genomics papers you will see heatmaps" (instead of "heatmap").

"Heatmap is of no mystery" could be omitted.

"A very simple using case for heatmap" should read "A very simple use case for heatmap".

Chapter 8

Working efficiently with spreadsheets is an important skill, so I agree with dedicating a whole chapter to it.

Notes

I have found csvkit (csvkit.readthedocs.io) extremely useful for dealing with Excel files and converting them to .csv files. You mention it in the Tools section, along with other great tools such as Miller, which I had never heard of!

I would replace "wet biologists" with "wet-lab biologists" or "experimental biologists". (I did not change this.)
I would replace "to the benefit of your own sake" with "for your own sake".
I would replace "Tidy a spreadsheet" with "Tidy up a spreadsheet".

Chapter 11

Domain-specific languages and workflow managers become increasingly important as pipelines grow more complex and we have to deal with various tool dependencies across different compute environments, so this chapter is really important.
The chapter's content is unbalanced at the moment. Snakemake is very well covered, but there are only a couple of links for Nextflow and almost nothing on WDL (see notes for suggestions).

Notes

I am glad that you include a reference to Brown's tutorial on Snakemake. In my personal experience, it is difficult to find high-quality learning material for Snakemake. This is one of the reasons I chose Nextflow instead.

The chapter content looks unbalanced at the moment, as there is a lot about Snakemake but only a couple of links for Nextflow in 11.19 and nothing about WDL. I would add a few things about Nextflow (at least what processes and channels are, and how data are passed from one process to another through channels). It would also be very helpful to link the official Nextflow documentation. A similar approach should be used for WDL. (I fully agree with you. Snakemake is the one I am familiar with, so I focused on it. I added the links for Nextflow and WDL.)
You could also add this perspective by Wratten et al. to the readings: "Reproducible, scalable, and shareable analysis pipelines with bioinformatics workflow managers". I think it is another excellent resource for anyone interested in understanding the differences between scripts, pipelines, and workflows. (Added.)

Chapter 12

This chapter is fantastic: its instructions on how to name files and organize computational projects are invaluable. The notes on cron and crontab are also very useful.

Notes

There are a few misspellings of GitHub and Singularity. Also, "rocker" should read "Docker" (Rocker is Docker for R).

Chapter 14

I totally agree with the "learn by doing" approach. Unfortunately, experimental biologists are often eager to get results and do not give us enough time to experiment. This has to change!
I also agree with the reproducibility comment.
