# Done: Big Data & ML on GCP Fundamentals - 12 Feb 2021
https://hackmd.io/@ajinkyakolhe112/SJc3UK7Z_/edit or https://**is.gd/uhujen**
Click on edit from menu on top left corner
Slides: https://1drv.ms/b/s!Aq6hYeVV5o6DhstxiE85WHGWV5bISQ?e=VRad8W
Learn better by doing these 3 things
1. Relate what's being taught to what you already know
- Make it engaging by relating it to movies
2. Make detailed **Notes** to understand deeply
3. Revise via **Notes** to increase retention
- After revising 5 to 10 times, the information stays in long-term memory
## Collaborative Notes
### Intro Cloud Computing
* Old recording of the class: https://www.youtube.com/playlist?list=PLY7sQ59Bufns3VafkhnHpbdbGBrTxSXwi
Importance of Cloud: https://cloudwars.co/special-report-2020/
Cloud is an abstraction over computing. We request the hardware we need and run our code there.
Instead of owning the hardware and managing it ourselves, we use hardware that the cloud provider owns and manages.
**Free Labs**: https://go.qwiklabs.com/qwiklabs-free
Types of Jobs on Cloud
1. **Migration**. We move the original codebase from on-prem to the cloud
* Google Cloud has a partner ecosystem, and partners are heavily involved in migration.
* Migration Jobs
1. **Step 1:** Move the application as is (Lift & Shift). We simply run the same application in a different place
- Most demand for Cloud Enabled people
2. **Step 2:** Improve the application to take advantage of Cloud Products.
3. **Step 3:** (Cloud Native / Serverless) Rewrite everything to be cloud native. No server maintenance required from the user
4. Many guides can be found here: https://cloud.google.com/docs/tutorials/
2. **Cloud Native Development or New application**: Generally startups do this
- Cloud Architect designs & develops the system
- Designing for scale, security and future growth is all part of the architect's job
- Cloud Developer just develops the system
- Architect vs Civil Engineer
- Civil Engineer Builds the building according to the blueprint.
- Architect designs the blueprint considering the feasibility of the building.
- Movie reference: in Inception, the dream designer is the Architect
3. Feature addition on a pre-existing cloud-hosted product
4. Maintenance of the cloud-hosted product
OnPrem vs Cloud vs Serverless Cloud
- OnPrem - User configured, user managed and user maintained
- Cloud - User configured, provider managed and provider maintained
* Different Ways of Using Cloud
* (on prem is user bought, user configured & user maintained)
* Infrastructure as a Service: provider provided, user configured & user maintained
* Platform as a Service / Managed Product: user configured, provider managed & maintained, but some work is still needed from the user
* Fully Managed / Serverless: everything is done by the provider. User just codes
* Example restaurant
* IaaS: You cook in the restaurant.
* PaaS: Buffet self service
* Serverless: Waiter serves you prepared food
* https://www.episerver.com/articles/pizza-as-a-service & https://www.bmc.com/blogs/saas-vs-paas-vs-iaas-whats-the-difference-and-how-to-choose/
- Serverless Cloud - Fully automated and no configuration required.
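
To make "user just codes" concrete, here is a minimal sketch of what an HTTP-triggered serverless function can look like. It assumes the platform calls the function with a Flask-style request object that exposes `.args` (the query parameters), which is how the Cloud Functions Python runtime behaves as far as these notes assume. `hello_http` and `FakeRequest` are hypothetical names used only for illustration.

```python
# Sketch of an HTTP-triggered serverless function.
# Assumption: the platform passes a Flask-style request object with `.args`.
# `hello_http` and `FakeRequest` are hypothetical names for illustration.


def hello_http(request):
    """Return a greeting. Servers, scaling and routing are the provider's job."""
    name = request.args.get("name", "World")
    return f"Hello, {name}!"


class FakeRequest:
    """Stand-in for the provider's request object, for local testing only."""

    def __init__(self, args=None):
        self.args = args or {}


if __name__ == "__main__":
    # Locally we fake the request; in the cloud the provider supplies it.
    print(hello_http(FakeRequest({"name": "GCP"})))
```

Notice that the code contains no server setup, port, or machine size. That is the part the provider handles in the serverless model.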
Details about Scale
- Horizontal Scaling - increase the number of machines. No hard limit in principle. Distributed Computing
- Vertical Scaling - increase the power of each machine, not the number of machines. Has limits. Moore's law (Super Computer)
- AutoScale: when scaling happens automatically
- Scale: when you have to execute a command to scale
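
The horizontal scaling decision above can be sketched as a small calculation: given the current load and the capacity of one machine, how many identical machines are needed? This is only an illustration of the idea, not how any particular autoscaler works. The function name, the parameters, and the `max_machines` budget cap are all assumptions made for the example.

```python
import math


def machines_needed(requests_per_sec, capacity_per_machine,
                    min_machines=1, max_machines=10):
    """Horizontal scaling: how many identical machines the load requires.

    min_machines and max_machines are hypothetical limits (for example a
    budget cap), not something defined in these notes.
    """
    if capacity_per_machine <= 0:
        raise ValueError("capacity_per_machine must be positive")
    wanted = math.ceil(requests_per_sec / capacity_per_machine)
    # Clamp the result between the configured minimum and maximum.
    return max(min_machines, min(max_machines, wanted))


if __name__ == "__main__":
    # An autoscaler would re-run this calculation whenever the load changes.
    for load in (0, 250, 5000):
        print(load, "req/s ->", machines_needed(load, 100), "machines")
```

With AutoScale the platform runs a calculation like this for you. With manual scaling you compute the number yourself and issue the command.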
### Intro to Big Data & Cloud
Data Scientist vs Data Engineer
- Overlapping skillset but different specialization
- Data Science is overcrowded; if you are really good, go for it. Otherwise choose Data Engineering if interested.
- Demand ratio is 3:1, whereas supply ratio is 1:3.
- Focus on less crowded areas; otherwise be the best in the crowd.
Big Data
- Data analysed is just 1% of all data produced, so there is huge potential for growth in this field
- Cloud makes analyzing data at such a large scale easier than on-prem systems
Different roles in Data Team
- 1x Infra (Physical, IaaS or Cloud)
- 1x DevOps (Stack automation, Containers and Platform as a Service)
- 2x **Data Engineer** (Data Pipelines, Data Automation, Data as a Service, Data Ingestion)
- 2x **Analytics** (1x Batch Analytics, 1x Real-Time Analytics and Predictive APIs)
- 1x AI and ML **Data Scientist** (Machine Learning and AI algorithms)
- 1x Front-End Dev (Web and JS developer, web and mobile apps)
- Specialized Roles
- 1x Network Architect
- 1x Security Engineer
- 1x Community Writer
- 1x Data Viz Developer
DEMAND V/S SUPPLY: Consider future demand v/s supply, not present or past demand v/s supply.
### GCP Resources
* http://comparecloud.in/
* List of All products on cloud https://dynalist.io/d/fBmbDZD2dT2VXnq2OtHW8w7G
* https://www.gcpweekly.com/gcp-resources/
* https://cloud.google.com/training#learning-paths
* https://github.com/gregsramblings/google-cloud-4-words
* GCP Certifications Details https://www.evernote.com/shard/s295/sh/ab8acf7b-98b0-46b3-afbd-3756b46a825e/ffb53c4f70d0fe7fb85f56a9a80bad2f
### Module 2: Hadoop on GCP - Dataproc
- https://mattturck.com/data2020/ & https://mattturck.com/data2019/
1. Dataproc: Managed Hadoop. Nothing but Hadoop on GCP. (Called EMR / Elastic MapReduce in AWS and HDInsight in Azure)
- Cluster: Group of machines that work together in parallel. The work is divided and each part is done in parallel. That is the basis of big data
- Migrate on-prem Hadoop to Dataproc on cloud
- https://cloud.google.com/solutions/migration/hadoop/hadoop-gcp-migration-jobs
- https://cloud.google.com/solutions/migration/hadoop/migrating-apache-spark-jobs-to-cloud-dataproc
- https://cloud.google.com/solutions/migration/hadoop/hadoop-gcp-migration-overview
2. Cloud SQL: Managed RDBMS (MySQL, SQL Server and PostgreSQL)
- OLTP
- Migrate on-prem SQL to Cloud SQL
- Migrate Oracle to Cloud SQL: https://cloud.google.com/solutions/migrating-data-from-oracle-to-cloud-sql-for-mysql or migrate MySQL: https://cloud.google.com/solutions/migrating-mysql-to-cloudsql-concept
- Migrate others
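
The "divide the work and do it in parallel" idea behind a cluster (see the Dataproc item above) can be sketched on a single machine. In this rough illustration, threads stand in for the machines of a cluster, and a word count is split into a "map" step per worker and a "reduce" step that merges the partial results. The function names are hypothetical, and a real Hadoop or Spark job on Dataproc distributes data and work quite differently.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """'Map' step: one worker counts the words in its own chunk of lines."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def split_into_chunks(lines, n_workers):
    """Divide the input into roughly equal chunks, one per worker."""
    return [lines[i::n_workers] for i in range(n_workers)]


def parallel_word_count(lines, n_workers=3):
    """Divide the work, count each chunk in parallel, then merge the results."""
    chunks = split_into_chunks(lines, n_workers)
    # Threads stand in for cluster machines here (an assumption for the demo).
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_words, chunks))
    total = Counter()
    for partial in partial_counts:
        total.update(partial)  # 'Reduce' step: add up the partial counts
    return total


if __name__ == "__main__":
    lines = ["big data on cloud", "big data is big", "cloud is elastic"]
    print(parallel_word_count(lines).most_common(3))
```

Adding more workers (horizontal scaling) shortens the map step, which is why clusters suit big data workloads.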
Noise cancellation via AI: https://ref.krisp.ai/u/u90448ea3b
### Module 3: DataWarehouse BigQuery
- Uses SQL to query big data. It is a fully managed / serverless product.
- It is a Data Warehouse, not a Database. Data looks like an RDBMS, but a data warehouse is optimized for read queries, not updates or deletes.
- Reddit BigQuery datasets (references): https://www.reddit.com/r/bigquery/wiki/datasets
- BigQuery Syntax: https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax
- Advanced features
- Supports geospatial data
- Supports Machine learning
- BigQuery
- https://cloud.google.com/blog/products/gcp/anatomy-of-a-bigquery-query
- https://cloud.google.com/blog/products/data-analytics/new-blog-series-bigquery-explained-overview
- BigQuery ML
- https://cloud.google.com/bigquery-ml/docs/reference/standard-sql/bigqueryml-syntax-create
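
As a small illustration of "uses SQL to query big data", the sketch below builds a read-only aggregate query against the public `bigquery-public-data.samples.shakespeare` table and, optionally, runs it with the `google-cloud-bigquery` Python client. Running it assumes the client library is installed and GCP credentials are configured, neither of which is covered in these notes. The helper names `build_top_words_query` and `run_query` are hypothetical.

```python
def build_top_words_query(limit=10):
    """Build a standard-SQL query for the most frequent words in Shakespeare."""
    if not isinstance(limit, int) or limit <= 0:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT word, SUM(word_count) AS total\n"
        "FROM `bigquery-public-data.samples.shakespeare`\n"
        "GROUP BY word\n"
        "ORDER BY total DESC\n"
        f"LIMIT {limit}"
    )


def run_query(sql):
    """Run the query on BigQuery.

    Assumption: google-cloud-bigquery is installed and credentials are set up.
    The import is done lazily so the rest of the sketch loads without it.
    """
    from google.cloud import bigquery

    client = bigquery.Client()
    return [dict(row.items()) for row in client.query(sql).result()]


if __name__ == "__main__":
    # Printing the SQL needs no GCP access; call run_query(sql) to execute it.
    print(build_top_words_query(5))
```

The query is a typical data warehouse workload: it only reads and aggregates, with no updates or deletes. No cluster size appears anywhere because BigQuery is serverless.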
### Module 5: Machine Learning
Get O'Reilly membership via ACM. (O'Reilly is $60 per month, whereas ACM professional membership for developing countries is just $1 per month)
- https://go.oreilly.com/acm
- https://services.acm.org/public/qj/proflevel/countryListing.cfm?promo=PWEBTOP&form_type=Professional
- Certification guide books on O'Reilly
- https://learning.oreilly.com/library/view/official-google-cloud/9781119564416/
- https://learning.oreilly.com/library/view/official-google-cloud/9781119602446/
- https://learning.oreilly.com/library/view/official-google-cloud/9781119618430/
---
## Questions
- [ ] Confused about the difference between cloud computing and serverless computing. Can you please explain how serverless computing works with an example, and how we implement it?
- [ ] What is meant by no configuration required in serverless?
- [ ] Is serverless cloud related to PaaS or SaaS?
- [ ] Does data in serverless cloud not require someone to maintain it? What actually is automated and what is not?
- [ ] Since cloud providers also have physical servers behind these offerings, is it possible that at some time a server might be clogged up?
- [ ] It could be. But it's the job of the load balancer to ensure that no traffic is allocated to such a clogged-up server
- [ ] Is it possible to do performance tuning in the various layers of cloud architecture?
- [ ] Step 1 is lift and shift. Step 2 is optimizing further by changing IaaS to managed products. You can do different optimizations in different layers. Step 3 will probably be cloud native
- [ ] Sorry if this question is outside the scope of this session. I am from a legacy Mainframe background with 10+ years of IT experience. Will doing the ML certification with no hands-on experience be helpful in changing career stream?
- [ ] I don't think it will be helpful. Specialize in mainframe migrations
- [ ] How to prepare for ML engineer
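
As a follow-up to the load balancer answer above, here is a minimal sketch of how a balancer can avoid sending traffic to a clogged-up server: it cycles through the servers in round-robin order and skips any that are marked unhealthy. The class and method names are hypothetical. Real cloud load balancers use health checks and far more sophisticated policies.

```python
class HealthAwareBalancer:
    """Round-robin load balancer that skips servers marked unhealthy."""

    def __init__(self, servers):
        self.servers = list(servers)
        self.healthy = set(self.servers)
        self._next = 0  # index of the next server to try

    def mark_unhealthy(self, server):
        """Record that a server is clogged up and should get no traffic."""
        self.healthy.discard(server)

    def mark_healthy(self, server):
        """Put a recovered server back into rotation."""
        if server in self.servers:
            self.healthy.add(server)

    def pick(self):
        """Return the next healthy server, trying each server at most once."""
        for _ in range(len(self.servers)):
            server = self.servers[self._next]
            self._next = (self._next + 1) % len(self.servers)
            if server in self.healthy:
                return server
        raise RuntimeError("no healthy servers available")


if __name__ == "__main__":
    lb = HealthAwareBalancer(["a", "b", "c"])
    lb.mark_unhealthy("b")  # pretend server "b" is clogged up
    print([lb.pick() for _ in range(4)])
```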