
# Fast.ai Asia Study Group Pre-Class Meeting

Date: 2018-10-20 19:30 GMT+8

Attendees:

- Sanyam
- George
- Haider
- Cedric

Agenda: pre-class meeting and defining the study plan.

## Setup

- Find a machine, local or Cloud, that is suitable for the class.
- Install the fastai v1.0.7 library.
- Everyone shared their setup and current status:
  - Haider: planning to upgrade his local machine from Ubuntu 16.04 to 18.04 LTS.
  - Sanyam: I prefer 18.04. The UI is slightly cleaner and it works without issues. I also don't have to go through the pain of installing drivers for my PCIe card and LAN cards, since those aren't supported on 16.04 but work fine on 18.04. The same applies to any system built after Dec 2018, since the kernel on 16.04 doesn't support the new Wifi+BE cards. Currently, I'm using fastai 0.7 and 1.0 on Ubuntu 18.04. Everything is working okay.
  - Cedric: a tip: it's recommended to use one version before the latest, so that if you run into an issue, you can find answers more easily online through StackOverflow, the fastai forums, etc.
- Cloud vs. local
  - Cedric: if you are a heavy user and particularly interested in cutting down the cost, go local. Bear in mind that for local, you still have to account for electricity. You can cut down the cost on Cloud by using:
    - AWS Spot instances (US price as low as US$0.27 per hour).
    - GCP preemptible instances: customers can save 50% off on-demand prices by using GPUs with preemptible instances (US/Europe/Asia preemptible price as low as US$0.135 per hour).
    - Salamander (US$0.36 per hour).
    - ...and many more.
  - Pros and cons of Cloud and local GPU(s):
    - Cloud gives you flexibility. For example, you can switch to the latest GPU like the V100 on demand, or scale out to tens or hundreds of GPUs. Quoting Jeremy's advice: start local for development and scale out to Cloud for testing and deployment.
    - A good local GPU like the GTX 1080Ti can take you very far.
- In this course, we will learn all sorts of creative ways to "make deep learning uncool again", so you can fit and train a complex model faster, even with limited compute power.

## Logistics and Context

- For those of you who will be watching the live stream together: if you are shy about asking questions on the fast.ai forums, you can ask them in our Slack channel.
- Everyone shared their commitment to this course, for example, how much time they can allocate to studying it.

## How to go through each class

- Run through all the code without running into a bug.
- Replicate the notebook by yourself.
- Dig into the code you don't understand (as deep as you like):
  - Read the docs.
  - Read related books on other libraries, like scikit-learn, Pandas, NumPy, and PyTorch.
  - Read the implementation.
- Try an already finished Kaggle competition related to the week's topics.
- Read blogs and tutorials to dive deeper or to explore related topics.

## Projects

Possible projects which we can take on together or even individually:

- Join a Kaggle competition as a team, or replicate a winning solution and compete with each other, so we can practice what we have learned up to a certain lesson.
- Port the ML course to the new version of the fastai library.
- Work on a project together and publish a blog post series. For example, Convolutional Neural Networks for mobile devices, if you are interested:
  - Train, deploy, and run a PyTorch model on mobile operating systems such as iOS and Android.
  - Research efficient convolutional neural networks for mobile vision applications.
    - Example: MobileNetV2 vs. ShuffleNet vs. SqueezeNet
- Haider: has anyone really tried applying things from fast.ai outside of the course, like in their work, or gone the extra mile applying it to projects that are not part of the lessons?
- Cedric: I have applied what I learned to help a startup company build their data products, such as medical diagnostics for MRI images.
  This is a complex project and the fastai library is holding up OK. So far, I have not found a need to ditch the fastai abstractions. I can create customized networks and models with the help of PyTorch's beautifully designed API, which fastai sits on top of. If you complete Part 2 of the course, you will learn that the fastai library can be stretched very far (for example: PyTorch hooks, callbacks, fastai custom heads, multiple heads, and many more).

## Working on Challenging Problems

- Haider: touched a little on his challenging encounter with [TextRay](https://arxiv.org/abs/1806.02121) research.
- Cedric: if you have a little patience and study Part 2, you will learn how to achieve what you plan to achieve.

## Should We Learn PyTorch

- Haider: should I spend time learning PyTorch at this point in the course?
- Cedric: it's not even a question. You will naturally learn PyTorch starting from Part 1, lesson 7, when Jeremy starts to "peel off the layers" (open the hood of the car engine) and build a neural network from scratch.

## Developer Tools

### Tmux/Screen

- Tmux is a terminal multiplexer.
- Haider: can it save our work? Can it remember our DL model training progress?
- Sanyam: yes. It keeps your Jupyter Notebook process running in the background on the Cloud VM when you disconnect SSH from the VM.
- George: the reason we lose our work when the SSH connection breaks is that the server terminates the processes started by the SSH shell. Because Tmux keeps the processes alive, nothing will be lost.
- Tmux is not just for window tiling in your CLI/Bash shell.
- Cedric: if you are a more serious DL practitioner with many long-running Python jobs, don't do the training in Jupyter Notebook. Create a project with Python scripts in it and run them as Python processes from your terminal. That way, if you lose your connection to the VM, you don't lose your work.

## On the Challenging Nature of DL or ML in General

- George: as we know, the world of DL/ML is moving at a pace that is hard to keep up with.
  Take me, for example: I have to learn Pandas and NumPy on top of learning PyTorch. That's a lot to take in. Phew!
- Cedric: learn as you go along. Don't go deep on a topic, like attempting to master Pandas, when you are just getting started. Most importantly, have the appetite to learn. (Sanyam shared the same opinion.)

## The Considerations of Taking the 2016/17 (Keras+TF) and 2018 (PyTorch) Versions of the Course

- Cedric: to quote Jeremy's advice, "learning frameworks or libraries like Keras or PyTorch is easy. Learning the DL concepts is hard". You can still learn useful things from either the 2017 or the 2018 version. A few examples:
  - ideas and concepts
  - best practices
  - tips & tricks
  - students' stories
  - Jeremy's story
- Cedric: another thing to note: if you take Part 2, "Cutting-Edge DL", you will learn about new research that the other version does not cover, such as:
  - ULMFiT (2018)
  - CycleGAN (2018)
  - K-means (2017): "the treasure of old things".

## Tips on Debugging

- Use `pdb`, and turn on verbose mode with `%xmode Verbose` in Jupyter Notebook.
  - Haider: `pdb` is not nice to use.
- Remote debugging:
  - Haider: PyCharm connected to the remote machine?
  - Cedric: [Visual Studio Code (VS Code) + Python ptvsd library for remote debugging Python processes](https://code.visualstudio.com/docs/python/debugging#_remote-debugging), with features like setting breakpoints, stepping to the next breakpoint, etc. However, I have not personally tested this yet.

## Recommended Editor or IDE

- Use whatever editor you are comfortable with.
- Cedric: I personally prefer a lightweight editor like Sublime Text or VS Code on my local PC, and Vim/Nano when I am connected to a remote computer.
- George: an IDE is more for software engineering; you need it when you build large software. Data science is more about testing ideas and fast modeling, so a lightweight editor suffices.
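On the debugging tips above: `%xmode Verbose` is an IPython magic, so it is not available when a long job runs as a plain Python script, as Cedric suggests for long-running training. A rough standard-library substitute, sketched here under the assumption of Python 3, is `traceback.TracebackException.from_exception(..., capture_locals=True)`, which renders the traceback together with each frame's local variables. `normalize` and `verbose_traceback` are hypothetical names made up for this illustration.

```python
import traceback


def normalize(values):
    """Hypothetical helper: scale values so they sum to 1 (fails if they sum to 0)."""
    total = sum(values)
    return [v / total for v in values]


def verbose_traceback(exc):
    """Format `exc` with every frame's local variables, similar in spirit to %xmode Verbose."""
    te = traceback.TracebackException.from_exception(exc, capture_locals=True)
    return "".join(te.format())


def main():
    try:
        normalize([0, 0])  # deliberately triggers ZeroDivisionError
    except ZeroDivisionError as exc:
        # The printed traceback includes lines such as "total = 0" for the failing frame.
        print(verbose_traceback(exc))


if __name__ == "__main__":
    main()
```

To step through the failing frame interactively instead, calling `pdb.post_mortem()` inside the `except` block, or launching the script with `python -m pdb`, are the standard-library options.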
## Tips on Jupyter Notebook

### Persisting Jupyter Notebook and Python State

- Haider: the problem: getting disconnected and losing the Jupyter Notebook training state/progress.
- Cedric: this is no longer an issue if you are using Jupyter Lab. When you get disconnected and reconnect to the IPython process, the Jupyter client (the JavaScript web GUI) will catch up with the training progress (the state is stored as a log on the server), and things will resume properly.

## Viewing Fastai Documentation

- Sanyam: shared his attempt to find a way to view the fastai v1 docs from within Jupyter Notebook.
- Cedric: tried to understand Sanyam's question by clarifying what type of docs we were referring to: the [Python code documentation](https://realpython.com/documenting-python-code/) or the [fastai docs](https://docs.fast.ai/)?
  - Sanyam: the fastai docs, inline (or side by side) inside Jupyter Notebook.
- George: I believe he means docs written in Jupyter Notebook format that can be run interactively.

## Life Pro Tips on Getting the Most Out of this Course

- Cedric: no doubt, you can learn a lot of DL from the course. Personally, one of my takeaways from the courses is "learning how to learn". To get the most out of the course, you have to come up with a plan of attack and stick to it. Diligently write notes as you progress through the course. Put down your thoughts, what you learned, what works, what doesn't work, etc. in your notes, and share them either as blog posts or as documentation (a personal wiki/knowledge base). Have some tenacity. Stay focused. Cut down on distractions. Most importantly, enjoy the process and have fun!

## Should I Go for a PhD (Academia) or Industry?

- ICYMI, read Rachel's blog post, ["What You Need to Know Before Considering a PhD"](http://www.fast.ai/2018/08/27/grad-school/).
- George: shared his recent experience and his background in Computer Science and Maths.
  Advanced pure math is aesthetically enjoyable, but often too far removed from most practical applications. I now find myself enjoying the feeling of building something and solving problems with code more. I still enjoy studying math as a leisure activity. Research is more like higher math than development. It really depends on your personal taste and your goals in life.
- Sanyam: shared a bit about his undergrad status: he is completing his final year and deciding what to do next.
- Haider: currently doing his PhD. In his opinion, industry is more practical because he thinks it's a better way to spend his time. This is what opportunity cost means.
- Cedric: talked about how disconnected academia and industry are, and reiterated Jeremy's opinion on this topic:
  - PhD: if you are willing to spend long years (5+) doing research to push the state of the art and go deep on a research topic.
  - Industry: if you are interested in applying research from academia to the real world.

## Do you recommend studying the Machine Learning (ML) course?

- Sanyam: I recommend it. In fact, everything from Jeremy is very useful.
- Cedric: I have studied the ML for Coders course up to lesson 7, as time didn't allow for more. The ML course covers the foundations of modern ML, such as decision-tree-based models (particularly random forests), what makes a good train/validation split, and model interpretation. From lesson 8 onward, Jeremy started to teach neural networks and other topics. I stopped not because it is not useful, but because I had other priorities.
- George: the ML and DL courses complement each other. ML is mostly about structured data, the kind that fits into neat rows and columns with clear labels, while DL is about unstructured data like images, sound, and language. ML also covers more data preparation and preprocessing.

## Has anyone tried the first lesson notebook?

- Cedric: I have tried the notebook with fastai 1.0.7 and it works.
- Sanyam: how do you update the fastai package to the latest version?
- Cedric: I ran the command `conda update --all`. You can update a specific package using the command `conda update -c fastai fastai`.
- Haider: perhaps it's better to wait for Jeremy's first lesson or his recommendation in the forum. With `conda update --all`, we run the risk of breaking dependencies. Jeremy [did not like it](https://forums.fast.ai/t/these-4-lines-will-solve-80-of-your-problems/8280/6?u=hayder78) for the previous version of fastai either.
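Before and after running either `conda update` command, it helps to confirm which fastai version is actually installed, for example to check that you are on 1.0.7 as discussed above. A minimal sketch, assuming Python 3.8+ for `importlib.metadata`: `installed_version`, `version_tuple`, and `at_least` are hypothetical helper names, and the comparison is deliberately simple (pre-release tags such as `rc1` or `.dev0` are ignored). The `packaging` library's `Version` class is the more robust choice for real use.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is not installed."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def version_tuple(v):
    """Convert '1.0.7' to (1, 0, 7). Only the leading digits of each part are kept,
    and parsing stops at the first part without digits, so '1.0.7rc1' and
    '1.0.7.dev0' both become (1, 0, 7) (simplification: pre-release tags ignored)."""
    parts = []
    for part in v.split("."):
        match = re.match(r"\d+", part)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def at_least(v, minimum):
    """True if version `v` >= `minimum`, comparing numerically and padding with zeros."""
    a, b = version_tuple(v), version_tuple(minimum)
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    current = installed_version("fastai")
    if current is None:
        print("fastai is not installed in this environment")
    elif at_least(current, "1.0.7"):
        print(f"fastai {current} (1.0.7 or newer)")
    else:
        print(f"fastai {current} is older than 1.0.7")
```

Comparing numeric tuples rather than raw strings matters here: as strings, "1.0.10" sorts before "1.0.7", which would wrongly flag a newer release as outdated.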