BigDataInnopolis

@BigDataInnopolis

Joined on Aug 8, 2019

  • Lab Block 1: VMs with Vagrant. This tutorial guides you through creating your first Vagrant project. Vagrant is virtual machine management software. It lets us write instructions for the automatic configuration and deployment of virtual machines at scale, and these configurations are easy to understand. We start with this topic and later move on to learn how containers can replace virtual machines. We will do the following: spin up a generic Ubuntu VM, install the Apache server, perform port forwarding, learn how to create a multi-machine environment, and connect multiple virtual machines with a VPN.
  • Spark, Recommender System. The goal of this task is to finalize the implementation of a movie recommendation system. In the process, you will get more experience with programming in Scala and working with RDDs. Dataset: modified MovieLens dataset (Mirror), also available in HDFS at hdfs://namenode:9000/movielens-mod. Project Template: download the project and try compiling it; compilation should complete without errors (Mirror). Extra files: download the list of movies used for collecting user preferences. Inspecting and Running the Template: inspect the project template content.
  • Lab Block 1: Docker. This tutorial guides you through creating your first Docker container. It draws heavily on the official Docker Getting Started Guide. This tutorial comes with a troubleshooting section that lists the most common problems. How is Docker different from VMs? A virtual machine emulates a fully working, isolated OS. It requires the same resources from the host as a normal OS would. It loads its own kernel into memory, then all the necessary kernel modules and the libraries the software needs, and only then allocates resources for the user application. If you run 100 identical VMs, they occupy roughly 100 times the resources of a single VM.
  • Labs: Prerequisites. Welcome to the Big Data course. This document lists the software you will need in this course. Let's get started! Software used in the course: VirtualBox, Vagrant, Docker, Hadoop 3.3.0, Spark 3.0.0.
  • Assignment №2. Introduction to Big Data. Stream Processing with Spark. Due Date: 8 October 2019, 23:55. Teams: three to four students. A team representative should send the list of team members' names to your TA. Rule for new teams: no former teammates. Assignment Details: the assignment should be implemented in Scala. The cluster address is the same. Using this link, you can check the cluster status. In general, this task is not computationally intensive (but we recommend performing hyperparameter search and cross-validation on the cluster). Stream Address: 10.90.138.32:8989. Report: read the Non-Technical Guide to Writing a Report to understand how to present your work in the best way. Submission Format: a report, a link to GitHub, compiled binaries, and an example of the output. Store your full outputs and your trained model in your group's folder in HDFS. Grading policy: each member's individual grade is based on their role in the team (contribution). The report should describe each member's personal contribution.
  • Big Data. Bash Refresher by Boris. Unix/OS/bash. Introduction. Here you will become familiar with the tools that the "red-eyes" community (a.k.a. Linuxoids) uses every day. Whenever you read manuals, the symbol $ usually marks the beginning of a bash command. For example: $ echo hello
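For the "Lab Block 1: VMs with Vagrant" note, here is a minimal sketch of its first steps as standard Vagrant CLI commands. It assumes the `ubuntu/focal64` box stands in for "a generic Ubuntu VM" and that forwarding guest port 80 to host port 8080 is the goal. Both choices are illustrative, not taken from the lab. The commands are only printed unless `DRY_RUN=0` is set, so you can review them before running anything. Multi-machine environments (`config.vm.define` blocks) and the network between VMs are also configured in the Vagrantfile and are left out here.

```bash
#!/usr/bin/env bash
# Sketch of the Vagrant lab workflow. By default the commands are only printed;
# set DRY_RUN=0 to actually execute them (requires Vagrant and VirtualBox).
set -euo pipefail

DRY_RUN=${DRY_RUN:-1}
# Assumption: ubuntu/focal64 (Ubuntu 20.04) stands in for "a generic Ubuntu VM".
BOX=${BOX:-ubuntu/focal64}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

vagrant_lab() {
  # Write a Vagrantfile for the chosen box and boot the VM.
  run vagrant init "$BOX"
  run vagrant up
  # Install Apache inside the guest over SSH.
  run vagrant ssh -c "sudo apt-get update && sudo apt-get install -y apache2"
  # Port forwarding is declared in the Vagrantfile, for example:
  #   config.vm.network "forwarded_port", guest: 80, host: 8080
  # `vagrant reload` restarts the VM so the new setting takes effect.
  run vagrant reload
  # Destroy the VM without a confirmation prompt when you are done.
  run vagrant destroy -f
}

vagrant_lab
```

With the forwarding line above in place, Apache's default page should be reachable from the host at http://localhost:8080.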
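For the "Spark, Recommender System" note, a quick pre-flight check is to list the dataset in HDFS and compile the template before touching any Scala code. This sketch uses standard `hdfs dfs` subcommands on the HDFS path given in the note. It assumes the template is an sbt project compiled from its root directory; the note does not name the build tool, so adjust that line if it differs. Commands are printed unless `DRY_RUN=0` is set.

```bash
#!/usr/bin/env bash
# Pre-flight checks for the recommender task. Commands are printed by default;
# set DRY_RUN=0 to run them on a machine that has the Hadoop client and sbt.
set -euo pipefail

DRY_RUN=${DRY_RUN:-1}
DATASET=hdfs://namenode:9000/movielens-mod

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

recommender_checks() {
  # List the files of the modified MovieLens dataset and their total size.
  run hdfs dfs -ls "$DATASET"
  run hdfs dfs -du -s -h "$DATASET"
  # Assumption: the template is an sbt project; run this from its root directory.
  run sbt compile
}

recommender_checks
```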
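The "Lab Block 1: Docker" note says a VM loads its own kernel. A container, by contrast, runs on the host's kernel. One way to see this is to compare `uname -r` on the host with `uname -r` inside a container. This sketch assumes the public `alpine` image can be pulled and skips the container step if Docker is missing or its daemon is not running. On Docker Desktop (macOS, Windows) the releases usually differ, because Docker itself runs inside a Linux VM there.

```bash
#!/usr/bin/env bash
# Compare the kernel release seen on the host with the one seen inside a container.
set -euo pipefail

host_kernel=$(uname -r)
echo "host kernel:      $host_kernel"

# Assumption: the public alpine image can be pulled (or is already cached).
if ! command -v docker > /dev/null 2>&1; then
  echo "docker not found; skipping the container check"
elif container_kernel=$(docker run --rm alpine uname -r 2> /dev/null); then
  echo "container kernel: $container_kernel"
  if [ "$container_kernel" = "$host_kernel" ]; then
    echo "same release: the container runs on the host kernel"
  else
    echo "different release: Docker itself is probably running inside a Linux VM"
  fi
else
  echo "could not start a container (is the Docker daemon running?)"
fi
```

On a Linux host both lines should show the same release. This is why 100 containers do not pay for 100 kernels the way 100 VMs do.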
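For the "Labs: Prerequisites" note, a small script can report which of the listed tools are on your `PATH` and what version each one prints. You can then compare the output against Hadoop 3.3.0 and Spark 3.0.0. The executable names and version flags are the usual ones for each tool. `spark-submit` writes its banner to stderr, so the script captures both streams and extracts the `version X.Y.Z` part of that banner.

```bash
#!/usr/bin/env bash
# Report which course tools are on PATH and what version each one reports.
set -uo pipefail

# Each entry: label|executable|version argument|regex selecting the version text.
tools=(
  "VirtualBox|VBoxManage|--version|.*[0-9].*"
  "Vagrant|vagrant|--version|.*[0-9].*"
  "Docker|docker|--version|.*[0-9].*"
  "Hadoop|hadoop|version|.*[0-9].*"
  "Spark|spark-submit|--version|version [0-9.]+"
)

check_tools() {
  local entry label exe arg pattern version
  for entry in "${tools[@]}"; do
    IFS='|' read -r label exe arg pattern <<< "$entry"
    if command -v "$exe" > /dev/null 2>&1; then
      # Capture stdout and stderr, then keep the first match of the pattern.
      version=$("$exe" "$arg" 2>&1 | grep -m 1 -oE -- "$pattern")
      echo "$label: found (${version:-no version output})"
    else
      echo "$label: MISSING ($exe not on PATH)"
    fi
  done
}

check_tools
```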
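For "Assignment №2", it can help to look at a few raw records from the stream at 10.90.138.32:8989 before writing the Spark job. This sketch assumes the stream is newline-delimited text over plain TCP and that `nc` (netcat) is installed; the assignment text states neither. Under the same assumption, Spark's socket sources take this host and port directly: `StreamingContext.socketTextStream(host, port)` in the DStream API, or the `socket` format with `host` and `port` options in Structured Streaming. The command is printed unless `DRY_RUN=0` is set.

```bash
#!/usr/bin/env bash
# Peek at the first lines of the assignment stream before writing Spark code.
# Assumption: the stream is newline-delimited text over plain TCP.
# The command is printed by default; set DRY_RUN=0 to connect (needs nc).
set -euo pipefail

DRY_RUN=${DRY_RUN:-1}
STREAM_HOST=10.90.138.32
STREAM_PORT=8989
N_LINES=${N_LINES:-5}

peek_stream() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ nc $STREAM_HOST $STREAM_PORT | head -n $N_LINES"
  else
    # head exits after N_LINES lines; nc then dies on SIGPIPE, which makes the
    # pipeline's status non-zero under pipefail, so that status is ignored here.
    nc "$STREAM_HOST" "$STREAM_PORT" | head -n "$N_LINES" || true
  fi
}

peek_stream
```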
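In the "Bash Refresher" note, `$` marks the shell prompt, so `$ echo hello` means: type `echo hello` and press Enter. The script below strings a few everyday commands together without the `$`, since a script is not an interactive session. It shows output redirection (`>`, `>>`), counting lines, and a pipe that filters and transforms text. It works in a temporary directory that is removed on exit.

```bash
#!/usr/bin/env bash
# A few everyday commands; in a manual each line would follow a `$` prompt.
set -euo pipefail

# Scratch directory, removed automatically when the script exits.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

# echo prints its arguments; > writes them to a file, >> appends.
echo "hello" > "$workdir/greeting.txt"
echo "hello again" >> "$workdir/greeting.txt"

# cat prints the file; wc -l counts its lines (reading stdin, so no file name is shown).
cat "$workdir/greeting.txt"
wc -l < "$workdir/greeting.txt"

# A pipe | feeds one command's output into the next: keep matching lines, then uppercase them.
grep "again" "$workdir/greeting.txt" | tr '[:lower:]' '[:upper:]'
```

Typing any of these lines after the `$` prompt in a terminal does the same thing interactively.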