# Joel's Digest in 2020

> Mostly from newsletters and arxiv.org, including
> #AI #ML #deeplearning #python #programming #technews

# September

## Sep 14

### [Google’s AI flood warnings now cover all of India and have expanded to Bangladesh](https://www.theverge.com/2020/9/1/21410252/google-ai-flood-warnings-india-bangladesh-coverage-prediction)

I am curious about why Google builds this kind of service.

## Sep 12

### [Writing More Idiomatic and Pythonic Code](https://towardsdatascience.com/writing-more-idiomatic-and-pythonic-code-c22e900eaf83)

`contextlib` has some useful functions. Try not to use `map` and `reduce`.

### [Instagram filters in python](https://medium.com/@travis.hoppe/instagram-filters-in-python-acc1ee7e67bc)

Reverse engineering the filters by learning the color mapping from pairs of original and filtered photos.

### [How Duolingo uses AI in every part of its app](https://venturebeat.com/2020/08/18/how-duolingo-uses-ai-in-every-part-of-its-app/)

Presents the practical AI solutions Duolingo uses to improve the user experience in its app.

### [Language-Agnostic BERT Sentence Embedding](https://ai.googleblog.com/2020/08/language-agnostic-bert-sentence.html)

Uses sentence pairs and a dual encoder fine-tuned from an MLM- and TLM-pretrained LM to build a language-agnostic sentence embedding.

## Sep 11

### [Being OK With Not Being Extraordinary](https://www.tiffanymatthe.com/not-extraordinary)

Being extraordinary should not be the end goal. Set a feasible goal to climb toward and reassess it frequently.

### [karpathy/minGPT](https://github.com/karpathy/minGPT)

Minimal GPT implementation in PyTorch.

## Sep 7

### [How to stop procrastinating by using the Fogg Behavior Model](https://www.deprocrastination.co/blog/how-to-stop-procrastinating-by-using-the-fogg-behavior-model)

Behavior = Motivation + Ability + Trigger. Find out which element you are missing and create it to stop procrastinating.
### [Apple, Epic, and the App Store](https://stratechery.com/2020/apple-epic-and-the-app-store/)

Apple extends its vertical integration through app installation, payment processing and customer management, which potentially involves anticompetitive behavior.

### [Entropy Explained, With Sheep](https://aatishb.com/entropy)

Entropy is all about arrangements. For an object with a huge number of molecules, the distribution over arrangements becomes extremely sharp, and you're guaranteed to be right near the peak.

# August

## Aug 20

### [A college kid’s fake, AI-generated blog fooled tens of thousands. This is how he made it.](https://www.technologyreview.com/2020/08/14/1006780/ai-gpt-3-fake-blog-reached-top-of-hacker-news/)

Given an interesting title and intro, anyone can use GPT-3 to generate content that fools people. Note that this is only feasible for topics that don't require much rigorous reasoning.

### [REALM: Integrating Retrieval into Language Representation Models](https://ai.googleblog.com/2020/08/realm-integrating-retrieval-into.html)

Retrieves reference sentences from a database to improve performance and interpretability. The model is very straightforward, combining accessible submodels such as ScaNN and BERT.

### [Google open-sources LIT, a toolset for evaluating natural language models](https://venturebeat.com/2020/08/14/google-open-sources-lit-a-toolset-for-evaluating-natural-language-models)

A visual, interactive and user-friendly tool for analyzing model behavior and comparing models. It focuses on finding samples on which a model performs poorly, explaining model outcomes and understanding the consistency of model predictions.

### [What to Expect in Python 3.9](https://livecodestream.dev/post/2020-08-15-what-to-expect-in-python-39/)

New parser, dictionary union operators, built-in string methods for removing a prefix or suffix, generic type hints on built-in collections and `graphlib`.
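A quick sketch of the Python 3.9 features listed above, assuming Python 3.9 or later. The variable names and sample data are made up for illustration and do not come from the article.

```python
from graphlib import TopologicalSorter

# Dictionary union operators (PEP 584): the right-hand operand wins on duplicate keys.
defaults = {"host": "localhost", "port": 8080}
overrides = {"port": 9090}
merged = defaults | overrides

# str.removeprefix / str.removesuffix (PEP 616) remove an exact substring, not a set of characters.
stem = "test_parser.py".removeprefix("test_").removesuffix(".py")


# Built-in collections as generic types (PEP 585): no need to import typing.List.
def total_length(words: list[str]) -> int:
    return sum(len(word) for word in words)


# graphlib.TopologicalSorter: keys are nodes, values are the nodes they depend on.
dependencies = {"deploy": {"test"}, "test": {"build"}, "build": set()}
build_order = list(TopologicalSorter(dependencies).static_order())

print(merged, stem, total_length(["ab", "cde"]), build_order)
```

Unlike `lstrip` and `rstrip`, which strip any of the given characters, `removeprefix` and `removesuffix` only remove the exact string when it is present.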
## Aug 18

### [dorking (how to find anything on the Internet)](https://www.alec.fyi/dorking-how-to-find-anything-on-the-internet.html)

Advanced Google search cheatsheet.

### [3 Classic Books for Tech Leads (or those aspiring to be)](https://sourcelevel.io/blog/3-classic-books-for-tech-leads-or-those-aspiring-to-be)

"Peopleware", "Driving Technical Change" and "The Mythical Man-Month".

### [Why teams should stop obsessing over their velocity](https://medium.com/serious-scrum/why-teams-should-stop-obsessing-over-their-velocity-826616c9b9b)

Instead of obsessing over the delivery of features, we should obsess over the delivery of outcomes.

## Aug 14

### [Announcing the new Jupyter Book](https://blog.jupyter.org/announcing-the-new-jupyter-book-cbf7aa8bc72e)

Jupyter Book is a good choice for generating books programmatically.

### [A Guide to Python Lambda Functions](https://adamj.eu/tech/2020/08/10/a-guide-to-python-lambda-functions/) <br> [Guido was right, there should be no lambda in Python](http://www.paulbrownmagic.com/blog/vslambda.html)

Lambda functions are convenient but need to be used carefully because of readability and other issues. They originate from lambda calculus; in Python, however, we treat them as syntactic sugar.

## Aug 13

### [DeText: A Deep Text Ranking Framework with BERT](https://arxiv.org/pdf/2008.02460v1.pdf)

BERT as a representation model for ranking. Pretraining with in-domain data can help improve performance.

## Aug 12

### [The Art of Not Thinking](http://tiffanymatthe.com/not-thinking)

Making the decision in advance and doing a small part first means we need less motivation to start a task.

### [Ask HN: What are the least competitive consumer and enterprise markets?](https://news.ycombinator.com/item?id=24066842)

How to find a business problem to solve?

### [Awesome System Design](https://github.com/madd86/awesome-system-design)

Lots of materials for improving system design skills.
## Aug 8

### [Instagram clones TikTok with new short-form ‘Reels’ video feature launching today](https://9to5mac.com/2020/08/05/instagram-clones-tiktok-with-new-short-form-reels-video-feature-launching-today/)

Instagram finally clones TikTok with a feature named "Reels".

### [Passing your senior engineering coding interview](https://medium.com/@stevenheidel/passing-your-senior-engineering-coding-interview-5a6b30261f68)

Preparing for coding interviews with a plan and principles is really important.

## Aug 6

### [Powered by AI: Instagram’s Explore recommender system](https://instagram-engineering.com/powered-by-ai-instagrams-explore-recommender-system-7ca901d2a882)

Domain-specific query language, account-level embeddings, and a combination of a distilled, lightweight ranking model with a high-performance ranking model.

## Aug 4

### [TheirTube](http://www.their.tube/)

What do the YouTube home pages of people with different personas look like?

### [GitHub public roadmap](https://github.com/github/roadmap)

GitHub shows us its public roadmap in a GitHub way.

### [Why It's Easier to Manage 4 People Than It Is to Manage 1 Person](https://staysaasy.com/management/2020/07/24/Managing-One-Person.html)

A first-time manager with a junior hire can be a serious disaster; however, this happens frequently.

### [One company's plan to build a search engine Google can't beat](https://www.protocol.com/neeva-search)

Neeva tries to offer a subscription-based, ad-free search engine to avoid competing with Google directly.

## Aug 3

### [Everything you need to know from the tech antitrust hearing](https://www.theverge.com/2020/7/29/21335706/antitrust-hearing-highlights-facebook-google-amazon-apple-congress-testimony)

Google's search engine favors its own services and products. Facebook has issues with its content policy. Amazon seems to steal third-party sellers' data to build its own services. Apple might interfere with competition by controlling the App Store.
# July

## Jul 31

### [What's new in TensorFlow 2.3?](https://blog.tensorflow.org/2020/07/whats-new-in-tensorflow-2-3.html)

Profiler, Keras preprocessing and an improved data pipeline.

## Jul 30

### [Announcing ScaNN: Efficient Vector Similarity Search](https://ai.googleblog.com/2020/07/announcing-scann-efficient-vector.html)

Modifies the vector quantization objective to align with the goal of the ANN task (MIPS).

### [Shortcuts: How Neural Networks Love to Cheat](https://thegradient.pub/shortcuts-neural-networks-love-to-cheat/)

Discusses the generalization problem in deep learning. Emphasizes the importance of o.o.d. test sets.

## Jul 29

### [Beyond the Cache with Python](https://redislabs.com/blog/beyond-the-cache-with-python/)

Use Redis as a queue, a streaming platform, a database and a search engine. I would prefer reading the reference documentation directly.

### [Design Docs at Google](https://www.industrialempathy.com/posts/design-docs-at-google/)

Guidelines for using design docs, including structure and useful discussion.

### [22 Principles for Great Product Managers](https://reeve.blog/blog/principles/)

Great teamwork requires a leader or PM who keeps these principles in mind.

### [I compiled book recommendations from 1300+ leaders](https://www.readthistwice.com/people)

Leaders' book recommendations, organized by category.

## Jul 28

### [SQLAlchemy ORM Tutorial for Python Developers](https://auth0.com/blog/sqlalchemy-orm-tutorial-for-python-developers/)

SQLAlchemy ORM 101. Quite detailed and lengthy.

### [Advanced SQLAlchemy Features You Need To Start Using](https://martinheinz.dev/blog/28)

Very advanced SQLAlchemy features, including hybrid properties, mixins and metadata.

## Jul 27

### [Yes You Can Use GitHub Pages with Python Sphinx](https://www.docslikecode.com/articles/github-pages-python-sphinx/)

Tutorial on setting up Sphinx docs on GitHub Pages.
## Jul 23

### [OpenAI’s new language generator GPT-3 is shockingly good—and completely mindless](https://www.technologyreview.com/2020/07/20/1005454/openai-machine-learning-language-generator-gpt-3-nlp/)

Once people gave GPT-3 a try, it showed both genius engineering work and serious problems still to be solved.

## Jul 21

### [When your coworker does great work, tell their manager](https://jvns.ca/blog/2020/07/14/when-your-coworker-does-great-work-tell-their-manager/)

It would be great if a company had a culture that allows everyone's efforts to be valued.

### [How To Understand Things](https://nabeelqu.co/understanding)

Don't stop at an unsatisfactory answer. You are the easiest person to fool, so keep asking yourself "Do I really understand this?". Visualize knowledge and don't be afraid to look stupid.

### [The TikTok War](https://stratechery.com/2020/the-tiktok-war/)

The media trend from text, feeds and photos to video marks the rise of TikTok. Not depending on a social network, an algorithmic feed and readily available video editing are the keys to the new medium's success.

## Jul 20

### [How To Use the pathlib Module to Manipulate Filesystem Paths in Python 3](https://www.digitalocean.com/community/tutorials/how-to-use-the-pathlib-module-to-manipulate-filesystem-paths-in-python-3)

It's more Pythonic to use `Path` instead of `os.path`.

### [Hashing it Out](https://akshayr.me/blog/articles/python-dictionaries)

Python implements dictionaries using a hash table with open addressing. This article digs into the implementation of Python dictionaries and provides experiments for illustration.

### [Too many objects: Reducing memory overhead from Python instances](https://pythonspeed.com/articles/python-object-memory/)

Python `__slots__` magic for memory reduction.

## Jul 17

### [Duality — A New Approach to Reinforcement Learning](https://ai.googleblog.com/2020/07/duality-new-approach-to-reinforcement.html)

Reinforcement learning can be described by objectives and constraints.
Using convex duality and regularization can transform them into an easy-to-optimize version.

### [A new, state-of-the-art voice separation model that distinguishes multiple speakers simultaneously](https://ai.facebook.com/blog/a-new-state-of-the-art-voice-separation-model-that-distinguishes-multiple-speakers-simultaneously)

This model works well without knowing the number of voices in a mixed speech signal. It detects the number of active voices first, then feeds the signal into the model specific to the detected number.

## Jul 16

### [Linear: The issue tracking tool you'll enjoy using](https://linear.app/)

Project management tool for software engineering that tracks issues and work in progress.

### [How Developers Stop Learning: Rise of the Expert Beginner](https://daedtech.com/how-developers-stop-learning-rise-of-the-expert-beginner/)

A little bit too lengthy. The expert beginner phenomenon occurs frequently in software engineering. The key to getting out of it is to have a big picture of what you are doing.

### [Actual 1950s Proposal: Nuke Alaska](https://www.atlasobscura.com/articles/actual-1950s-proposal-nuke-alaska)

Interesting story revealing the crazy idea of using nuclear bombs for construction.

### [How to finish your side project](https://hugozap.com/posts/how-to-finish-your-side-project/)

Finishing a side project is really hard. There are several ways to keep going. For me, taking context notes and keeping steps small are the most helpful pieces of advice.

### [De-escalating Social Media](https://nickpunt.com/blog/deescalating-social-media)

The author proposes a solution for dealing with mistakes made on social media, especially how to correct the influence of a mistake.

## Jul 10

### [Python and Go : Part I - gRPC](https://www.ardanlabs.com/blog/2020/06/python-go-grpc.html)

Python shines in data science while Go provides high-throughput services. Use gRPC to combine the best of both worlds.
### [Four over-engineered examples of how type hints can improve your code](https://www.capitalone.com/tech/software-engineering/fizz-buzz-python-type-hints/)

The over-engineering part doesn't provide much information. However, it's helpful to add type hints and use static type checkers like `mypy`.

### [Serverless Web Apps in Python](https://www.sanjaysiddhanti.com/2020/07/05/serverless/)

Use a combination of `AWS Lambda`, `Zappa` and `virtualenv` to build a serverless web app. Note that `Zappa` does not natively support `docker`, which might be a pain point for development.

### [Interfaces in Python: Protocols and ABCs](http://masnun.rocks/2017/04/15/interfaces-in-python-protocols-and-abcs/)

Clear introduction to duck typing and ABCs in Python.

### [StephenChou/Surprisify-Playlist-Generator](https://github.com/StephenChou/Surprisify-Playlist-Generator)

Playlist generator with a configurable level of surprise.

## Jul 9

### [jbesomi/texthero](https://github.com/jbesomi/texthero)

Pandas-based text processing toolkit. It's a handy tool for analyzing text data in a short time.

### [Apple machine learning in 2020: What’s new?](https://machinethink.net/blog/new-in-apple-machine-learning-2020/)

Deep learning support in the Apple ecosystem matured in 2020. ML models built on other platforms can be converted easily with its tools, and encryption and model serving are also available. Some off-the-shelf tools make building ML features into an app really simple.

## Jul 7

### [kotartemiy/pygooglenews](https://github.com/kotartemiy/pygooglenews)

A nice tool to help you crawl Google News.

### [Python pattern matching](https://ncik-roberts.github.io/posts/pep622.html)

PEP 622 proposes adding a pattern matching construct to Python. In my opinion, basic usage is nice and helpful, but it becomes un-Pythonic when taken too far.
### [FastAPI for Flask Users](https://amitness.com/2020/06/fastapi-vs-flask/)

Easy to switch from Flask and enjoy data validation, modularization, documentation and other high-level functionality.

## Jul 6

### [Transformers are RNNs: Fast Autoregressive Transformers with Linear Attention](https://arxiv.org/abs/2006.16236)

After generalizing the similarity function in the transformer's self-attention module, linear attention can be achieved by modifying the similarity function and exploiting the associative property of matrix multiplication. Also, causal masked attention turns the transformer into an RNN and speeds up inference.

## Jul 3

### [What I learned from looking at 200 machine learning tools](https://huyenchip.com/2020/06/22/mlops.html)

The data pipeline is a major part of ML, especially in industry. Most startups focus on applications rather than tools; only big companies can afford researchers, and the situation has become worse recently.

### [How to Performance Test Python Code: timeit, cProfile, and More](https://therenegadecoder.com/code/how-to-performance-test-python-code/)

Basic Python profiling.

### [JaidedAI/EasyOCR](https://github.com/JaidedAI/EasyOCR)

Supports multiple languages. The recognition model is a CRNN: ResNet + LSTM + CTC.

## Jul 2

### [Clinging to memory: how Python function calls can increase your memory usage](https://pythonspeed.com/articles/function-calls-prevent-garbage-collection/)

Very detailed illustration showing that it's possible to reduce peak memory usage through proper variable assignment.

### [Pickle’s nine flaws](https://nedbatchelder.com/blog/202006/pickles_nine_flaws.html)

After learning about the dark side of pickle, you will avoid using it in any production code.

## Jul 1

### [Image GPT](https://openai.com/blog/image-gpt/)

Shows that the GPT model is domain-agnostic and works well at generating unrolled image sequences.

### [Big Self-Supervised Models are Strong Semi-Supervised Learners](https://arxiv.org/abs/2006.10029v1)

Proposes a three-step training process consisting of pretraining, fine-tuning and distillation.
Fully leverages both unlabeled data (agreement between augmentations) and labeled data.

### [Google Brain Rethinks Pre-training and Self-training](https://syncedreview.com/2020/06/17/google-brain-rethinks-pre-training-and-self-training)

Pre-training sometimes hurts performance when we have lots of labeled data; however, self-training always works well in all setups.

# June

## Jun 30

### [Written communication is remote work super power](https://snir.dev/blog/remote-async-communication)

Async writing can solve problems that occur in synchronous communication.

### [Do the Real Thing](https://www.scotthyoung.com/blog/2020/05/04/do-the-real-thing/)

The key is that real things have real difficulty, so we prefer to do fake activities to make ourselves feel better.

### [RegexOne](https://regexone.com/)

Learn regular expressions with simple, interactive exercises.

## Jun 21

### [Collected Notes](https://collectednotes.com/)

Note-taking and blogging app for minimalists.

## Jun 20

### [CVPR 2020 Underway, Best Papers Announced](https://syncedreview.com/2020/06/16/cvpr-2020-underway-best-papers-announced)

"Unsupervised" is the new buzzword at CVPR 2020. All three best papers do away with supervised learning.

### [Can you remove 99% of a neural network without losing accuracy?](https://towardsdatascience.com/can-you-remove-99-of-a-neural-network-without-losing-accuracy-915b1fab873b)

Besides the intuitive approaches to pruning weights, this article mentions work called "Weight Agnostic Neural Networks", which takes a bottom-up approach and builds the model incrementally.

### [What 6.5 million of #coronavirus tweets and Deep Topological Analysis reveal about people’s thoughts during the pandemic](https://datarefiner.com/feed/covid-twitter)

Uses "DataRefiner" to analyze tweets about coronavirus. Mostly cluster analysis with human interpretation.

## Jun 19

### [The State of Developer Ecosystem 2020](https://www.jetbrains.com/lp/devecosystem-2020/)

Programming ecosystem survey results.
Lots of people develop on and for Windows.

### [Async Python is not faster](http://calpaterson.com/async-python-is-not-faster.html)

Analyzes the throughput of sync and async web service frameworks. One precondition is that the sync framework must be given enough workers, which is quite realistic for a web service.

## Jun 18

### [Step-by-step guide to contributing on GitHub](https://www.dataschool.io/how-to-contribute-on-github/)

Super basic step-by-step guide.

### [Python 101 – Exception Handling](https://www.blog.pythonlibrary.org/2020/06/17/python-101-exception-handling-2/)

`try`, `except`, `else` and `finally`.

## Jun 16

### [Facebook contest reveals deepfake detection is still an "unsolved problem"](https://www.theverge.com/21289164/facebook-deepfake-detection-challenge-unsolved-problem-ai)

Facebook created a dataset of deepfake (face replacement) clips for the contest. However, the results show that detection is still quite a challenging task, especially for unseen clips.

### [Linformer: Self-Attention with Linear Complexity](https://arxiv.org/pdf/2006.04768.pdf)

The authors observe that the transformer's context mapping matrix is low rank; hence, we can project the Key and Value matrices to a lower dimension, which is shown to be a constant independent of sequence length, reducing time and space complexity from O(n^2) to O(n).

## Jun 14

### [Eloston/ungoogled-chromium: Google Chromium, sans integration with Google](https://github.com/Eloston/ungoogled-chromium)

This repository tries to strip out Google dependencies and build a drop-in replacement for Chromium.

### [Compare Benefits](https://www.levels.fyi/benefits/)

Comparison of benefits across tech giants.

## Jun 12

### [Make Your Code Great, Python Style](https://livecodestream.dev/post/2020-06-08-make-your-code-great-python-style/)

Fundamental Pythonic style guide.
### [Why You Should Use More Enums In Python](https://florian-dahlitz.de/blog/why-you-should-use-more-enums-in-python)

Using Enums is a good way to group magic numbers into meaningful names, providing readability, comparison and bitwise operations.

### [How async should have been](https://sobolevn.me/2020/06/how-async-should-have-been)

The author tries to remove the `async` and `await` keywords from Python. It uses the concept of `Abstraction`, functional programming and `dry-python/returns`.

## Jun 11

### [Google Meet noise cancellation is rolling out now](https://venturebeat.com/2020/06/08/google-meet-noise-cancellation-ai-cloud-denoiser-g-suite/)

Supervised method for cancelling noise, including barking, keyboard strokes, door slamming, etc. The crucial parts are corpus collection, noise definition and latency tuning.

### [PEGASUS: A State-of-the-Art Model for Abstractive Text Summarization](https://ai.googleblog.com/2020/06/pegasus-state-of-art-model-for.html)

Hypothesizes that the closer the pretraining method is to a task, the better the model performs. Therefore, it pretrains the model (PEGASUS) by predicting missing sentences, especially important ones.

### [Exploration Strategies in Deep Reinforcement Learning](https://lilianweng.github.io/lil-log/2020/06/07/exploration-strategies-in-deep-reinforcement-learning.html)

Useful summary article covering most exploration strategies.

### [How big should my language model be?](https://huggingface.co/calculator/)

Big models are surprisingly efficient. It’s a trade-off between model size and training steps under a given computation budget. We should consider increasing model size more than training steps.

## Jun 8

### [Stop Taking Regular Notes; Use a Zettelkasten Instead](https://hackernewsletter.us1.list-manage.com/track/click?u=faa8eb4ef3a111cef92c4f3d4&id=82fb71d7ad&e=f47ff4767b)

Connecting ideas is how knowledge forms. Summarize an article into sentences and tag them to connect different ideas.
Finally, classify ideas into topics for easy browsing.

### [dry-python/returns: Make your functions return something meaningful, typed, and safe!](https://github.com/dry-python/returns)

Wrapper for writing functional-style code in Python. However, I think it’s a little over-engineered; Python’s type annotations and careful design can handle most problems.

## Jun 5

### [Practical Python Programming](https://dabeaz-course.github.io/practical-python/)

Well-structured tutorial for learning Python from zero.

### [Video summary as a service](https://github.com/PicardParis/cherry-on-py)

A step-by-step guide to building a video summary service on gcloud.

### [Our Python Monorepo](https://medium.com/opendoor-labs/our-python-monorepo-d34028f2b6fa)

A monorepo can solve problems like code reuse, CI/CD and deployment. However, it still takes extra effort to adapt tools to work with a monorepo.

## Jun 4

### [Microsoft researchers say NLP bias studies must consider role of social hierarchies like racism](https://venturebeat.com/2020/06/01/microsoft-researchers-say-nlp-bias-studies-must-consider-role-of-social-hierarchies-like-racism/)

Studies lack clear descriptions of bias and fail to explain how, why and to whom that bias is harmful.

### [Netflix Builds Proof-of-Concept AI Model to Simplify Subtitles for Translation](https://news.developer.nvidia.com/how-netflix-uses-ai-to-simplify-subtitles-for-translation/) <br/> [Simplify-then-Translate: Automatic Preprocessing for Black-Box Machine Translation](https://arxiv.org/abs/2005.11197)

Uses back-translation to train an automatic preprocessing model that simplifies source sentences for better translation results, especially for low-resource languages.
## Jun 2

### [GPT-3: Language Models are Few-Shot Learners](https://arxiv.org/pdf/2005.14165.pdf) <br/> [GPT-3: The New Mighty Language Model from OpenAI](https://towardsdatascience.com/gpt-3-the-new-mighty-language-model-from-openai-a74ff35346fc)

"GPT-3" is just a bigger GPT-2, a huge task-agnostic model that performs zero-, one- and few-shot learning.

# May

## May 31

### [Untools: Tools for better thinking](https://untools.co/)

Interesting tools/models for thinking. Should be useful when stuck on a complex problem.

### [Hypermodern Python](https://cjolowicz.github.io/posts/hypermodern-python-01-setup/)

Great guide covering everything from setup, testing, linting, typing and documentation to CI/CD. You can try building a project with this guide.

### [10 Reading habits that changed my life](https://medium.com/@manjotpahwa/10-reading-habits-that-changed-my-life-5c7673bc34bc)

Short principles for reading books.

### [Habits of High-Functioning Teams](https://deniseyu.io/2020/05/23/habits-of-high-performing-teams.html)

High psychological safety, good hygiene practices, active redistribution of “experience points” and generous communication.