---
tags: BigData-MS-2019, BigData-BS-2019
title: Labs. Prerequisites.
---

# Labs: Prerequisites

Welcome to the Big Data course. This document lists the software you will need in this course. Straight ahead!

## Software used in the course:

- [VirtualBox](https://www.virtualbox.org/wiki/Downloads)
- [Vagrant](https://www.vagrantup.com/intro/getting-started/install.html)
- [Docker](https://docs.docker.com/get-docker/)
- [Hadoop 3.3.0](https://archive.apache.org/dist/hadoop/common/hadoop-3.3.0/)
- [Spark 3.0.0](https://spark.apache.org/downloads.html)

## Hardware Requirements

**Processor**: Virtualization must be enabled ([Check Virtualization on Windows](https://www.thewindowsclub.com/check-intel-amd-processor-supports-hyper-v); on Linux, run `lscpu`). It is sometimes disabled in the BIOS.

**Memory**: We recommend at least 8 GB of RAM.

**Storage**: To complete the first half of the course, you will need roughly 20 GB of free space on your hard disk.

**OS**: macOS or Linux (latest Ubuntu LTS) are fine. Hadoop and Spark will work on Windows, but expect some additional issues. If you want to use Linux, install it directly on hardware; do not use nested virtualization.

:::info
If you cannot meet these hardware requirements, please consult your TA.
:::

## Recommended Reading

Refresh your knowledge of [bash](https://learnxinyminutes.com/docs/bash/).

## First Lab

For the first lab, install VirtualBox, Vagrant, and Docker. Download the Vagrant box image from the [Vagrant Repository](https://cloud-images.ubuntu.com/vagrant/trusty/current/trusty-server-cloudimg-amd64-vagrant-disk1.box).

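
Before the first lab, it may help to confirm that the required tools are installed and that enough disk space is free. The following is a minimal sketch, not an official course script. It assumes the command-line names `VBoxManage`, `vagrant`, and `docker` (on Windows, `VBoxManage` is often not on `PATH` even when VirtualBox is installed), and the helper names `missing_tools` and `free_disk_gb` are made up for illustration.

```python
import shutil

# Assumed CLI names for VirtualBox, Vagrant, and Docker.
REQUIRED_TOOLS = ("VBoxManage", "vagrant", "docker")
# Rough storage requirement for the first half of the course.
MIN_FREE_GB = 20


def missing_tools(tools=REQUIRED_TOOLS):
    """Return the tools that cannot be found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


def free_disk_gb(path="."):
    """Return free space, in GB, on the disk that holds `path`."""
    return shutil.disk_usage(path).free / 1024**3


if __name__ == "__main__":
    missing = missing_tools()
    print("Missing tools:", ", ".join(missing) if missing else "none")

    free = free_disk_gb()
    status = "ok" if free >= MIN_FREE_GB else "below the recommended 20 GB"
    print(f"Free disk space: {free:.1f} GB ({status})")
```

A tool reported as missing may simply not be on `PATH`, so treat the output as a hint rather than a verdict. The script does not check RAM or CPU virtualization support; use the links and `lscpu` from the hardware requirements for those.
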
## Useful Software

- Terminal Emulators (highly recommended for Windows): [Terminus](https://eugeny.github.io/terminus/), [Hyper](https://hyper.is/), [Cmder (Windows Only)](https://cmder.net/), [ConEmu (Windows Only)](https://conemu.github.io/)
- Modern Editors: [Atom](https://atom.io/), [Visual Studio Code](https://code.visualstudio.com/)
- Bash tools on Windows (highly recommended): [GNU on Windows](https://github.com/bmatzelle/gow/wiki), [Git Bash](https://www.atlassian.com/git/tutorials/git-bash), [Windows Subsystem for Linux](https://docs.microsoft.com/en-us/windows/wsl/install-win10)
- IDE (highly recommended): [IntelliJ IDEA](https://www.jetbrains.com/idea/download/)

## Books:

- [Designing Data-Intensive Applications](https://dataintensive.net/)
- [Programming in Scala](https://booksites.artima.com/programming_in_scala_3ed)
- *Extra: [Hadoop: The Definitive Guide](https://www.oreilly.com/library/view/hadoop-the-definitive/9781491901687/)*
- *Extra: [Spark: The Definitive Guide](http://shop.oreilly.com/product/0636920034957.do)*