# Perform Foundational Data, ML, and AI Tasks in Google Cloud

**Vertex AI: Qwik Start**

:::success
**Insight**
1. Enable Google Cloud services
2. Create a Vertex AI custom service account for Vertex TensorBoard integration
3. Launch a Vertex AI Workbench notebook
4. Clone the lab repository
5. Install lab dependencies
:::

---

**Dataprep: Qwik Start**

:::success
**Insight**
1. Create a Cloud Storage bucket in your project
2. Initialize Cloud Dataprep
3. Create a flow
4. Import datasets
5. Prep the Candidates file
6. Wrangle the Contributions file and join it to the Candidates file
7. Summarize the data
8. Rename columns
:::

---

**Dataflow: Qwik Start - Templates**

:::success
**Insight**
1. Create a BigQuery dataset and table using Cloud Shell
2. Create a BigQuery dataset and table using the Cloud Console
3. Run the pipeline
4. Submit a query
:::

* Google Cloud Dataflow supports batch processing. True
* Which Dataflow template is used in the lab to run the pipeline? Pub/Sub to BigQuery

---

**Dataflow: Qwik Start - Python**

:::success
**Insight**
1. Create a Cloud Storage bucket
2. Install pip and the Cloud Dataflow SDK
3. Run an example pipeline remotely
4. Check that your job succeeded
:::

* The Dataflow `temp_location` must be a valid Cloud Storage URL. True

---

**Dataproc: Qwik Start - Console**

:::success
**Insight**
1. Create a cluster
2. Submit a job
3. View the job output
4. Update a cluster
:::

* Which type of Dataproc job is submitted in the lab? Spark
* Dataproc helps users process, transform, and understand vast quantities of data. True

---

**Dataproc: Qwik Start - Command Line**

:::success
**Insight**
1. Create a cluster
2. Submit a job
3. Update a cluster
:::

* Clusters can be created and scaled quickly with a variety of virtual machine types, disk sizes, and numbers of nodes. True

---

**Cloud Natural Language API: Qwik Start**

:::success
**Insight**
1. Create an API key
2.
Make an entity analysis request
:::

---

**Google Cloud Speech API: Qwik Start**

:::success
**Insight**
1. Create an API key
2. Create your Speech API request
3. Call the Speech API
:::

---

**Video Intelligence: Qwik Start**

:::success
**Insight**
1. Enable the Video Intelligence API
2. Set up authorization
3. Make an annotate video request
:::

---

**Perform Foundational Data, ML, and AI Tasks in Google Cloud: Challenge Lab**

[Challenge Lab](https://medium.com/@adhwaithchandrann00b/perform-foundational-data-ml-and-ai-tasks-in-google-cloud-challenge-lab-b7cc723ae1e8)

Fill in the following values from your Quest:

```
REGION=
Dataset=
TABLE=
TASK_3=
TASK_4=

PROJECT_ID=$(gcloud config get-value project)
target=$Dataset.$TABLE
bucket_name=$PROJECT_ID-marking

bq mk $Dataset
gsutil mb gs://$bucket_name
```

```
cat > table.py <<EOF
from google.cloud import bigquery

# Construct a BigQuery client object.
client = bigquery.Client()

table_id = "$PROJECT_ID.$Dataset.$TABLE"

schema = [
    bigquery.SchemaField("guid", "STRING", mode="NULLABLE"),
    bigquery.SchemaField("isActive", "BOOLEAN", mode="NULLABLE"),
    bigquery.SchemaField("firstname", "STRING", mode="NULLABLE"),
    bigquery.SchemaField("surname", "STRING", mode="NULLABLE"),
    bigquery.SchemaField("company", "STRING", mode="NULLABLE"),
    bigquery.SchemaField("email", "STRING", mode="NULLABLE"),
    bigquery.SchemaField("phone", "STRING", mode="NULLABLE"),
    bigquery.SchemaField("address", "STRING", mode="NULLABLE"),
    bigquery.SchemaField("about", "STRING", mode="NULLABLE"),
    bigquery.SchemaField("registered", "TIMESTAMP", mode="NULLABLE"),
    bigquery.SchemaField("latitude", "FLOAT", mode="NULLABLE"),
    bigquery.SchemaField("longitude", "FLOAT", mode="NULLABLE"),
]

table = bigquery.Table(table_id, schema=schema)
table = client.create_table(table)  # Make an API request.
print(
    "Created table {}.{}.{}".format(table.project, table.dataset_id, table.table_id)
)
EOF
```

```
python3 table.py

gcloud dataflow jobs run lab-transform \
  --gcs-location gs://dataflow-templates-$REGION/latest/GCS_Text_to_BigQuery \
  --worker-machine-type e2-standard-2 \
  --region $REGION \
  --staging-location gs://$PROJECT_ID-marking/temp \
  --parameters javascriptTextTransformGcsPath=gs://cloud-training/gsp323/lab.js,JSONPath=gs://cloud-training/gsp323/lab.schema,javascriptTextTransformFunctionName=transform,outputTable=$PROJECT_ID:$Dataset.$TABLE,inputFilePattern=gs://cloud-training/gsp323/lab.csv,bigQueryLoadingTemporaryDirectory=gs://$PROJECT_ID-marking/bigquery_temp
```

```
gcloud dataproc clusters create cluster-f357 \
  --region $REGION \
  --zone $REGION-a \
  --master-machine-type e2-standard-2 \
  --master-boot-disk-size 500 \
  --num-workers 2 \
  --worker-machine-type e2-standard-2 \
  --worker-boot-disk-size 500 \
  --image-version 2.0-debian10 \
  --project $PROJECT_ID
```

```
gcloud beta compute ssh cluster-f357-w-0 -- -vvv
```

```
hdfs dfs -cp gs://cloud-training/gsp323/data.txt /data.txt
exit
```

```
gcloud config set dataproc/region $REGION

gcloud dataproc jobs submit spark \
  --cluster cluster-f357 \
  --class org.apache.spark.examples.SparkPageRank \
  --jars file:///usr/lib/spark/examples/jars/spark-examples.jar \
  -- /data.txt
```

```
gcloud services enable apikeys.googleapis.com

gcloud alpha services api-keys create --display-name="testname"

KEY_NAME=$(gcloud alpha services api-keys list --format="value(name)" --filter "displayName=testname")
API_KEY=$(gcloud alpha services api-keys get-key-string $KEY_NAME --format="value(keyString)")
echo $API_KEY
```

```
gcloud iam service-accounts create techvine \
  --display-name "my natural language service account"

gcloud iam service-accounts keys create ~/key.json \
  --iam-account techvine@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com

export GOOGLE_APPLICATION_CREDENTIALS="/home/$USER/key.json"

gcloud auth activate-service-account \
  techvine@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com \
  --key-file=$GOOGLE_APPLICATION_CREDENTIALS

gcloud ml language analyze-entities \
  --content="Old Norse texts portray Odin as one-eyed and long-bearded, frequently wielding a spear named Gungnir and wearing a cloak and a broad hat." > result.json

gcloud auth login --no-launch-browser
```

Open the link printed in the output, log in, copy the authorization code, and paste it back into the command line.

```
gsutil cp result.json $TASK_4

cat > request.json <<EOF
{
  "config": {
    "encoding": "FLAC",
    "languageCode": "en-US"
  },
  "audio": {
    "uri": "gs://cloud-training/gsp323/task3.flac"
  }
}
EOF

curl -s -X POST -H "Content-Type: application/json" --data-binary @request.json \
  "https://speech.googleapis.com/v1/speech:recognize?key=${API_KEY}" > result.json

gsutil cp result.json $TASK_3

gcloud iam service-accounts create quickstart

gcloud iam service-accounts keys create key.json \
  --iam-account quickstart@${GOOGLE_CLOUD_PROJECT}.iam.gserviceaccount.com

gcloud auth activate-service-account --key-file key.json

export ACCESS_TOKEN=$(gcloud auth print-access-token)

cat > request.json <<EOF
{
  "inputUri": "gs://spls/gsp154/video/train.mp4",
  "features": [
    "TEXT_DETECTION"
  ]
}
EOF
```

```
curl -s -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  'https://videointelligence.googleapis.com/v1/videos:annotate' \
  -d @request.json

curl -s -H 'Content-Type: application/json' \
  -H "Authorization: Bearer $ACCESS_TOKEN" \
  'https://videointelligence.googleapis.com/v1/operations/OPERATION_FROM_PREVIOUS_REQUEST' > result1.json
```
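The `videos:annotate` call above replies with only a long-running operation name, which you then substitute for `OPERATION_FROM_PREVIOUS_REQUEST`. A minimal sketch of pulling that name out of the response without a `jq` dependency (the `RESPONSE` value and operation ID here are illustrative placeholders, not output from a real run):

```
# Illustrative annotate response shape; a real response carries a
# project/location-qualified operation name.
RESPONSE='{ "name": "projects/123/locations/us-east1/operations/456789" }'

# Capture the value of the "name" field with a basic sed expression.
OPERATION=$(echo "$RESPONSE" | sed -n 's/.*"name": *"\([^"]*\)".*/\1/p')
echo "$OPERATION"

# The follow-up poll would then be (commented out; requires live credentials):
# curl -s -H "Authorization: Bearer $ACCESS_TOKEN" \
#   "https://videointelligence.googleapis.com/v1/${OPERATION}" > result1.json
```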
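Similarly, the Speech API `result.json` nests the recognized text under `results[].alternatives[].transcript`. A quick way to eyeball just the transcript, sketched against a made-up sample response (the sentence and confidence value are not actual lab output):

```
# Illustrative speech:recognize response; the real one comes from result.json.
RESULT='{"results":[{"alternatives":[{"transcript":"how old is the Brooklyn Bridge","confidence":0.98}]}]}'

# Extract the first "transcript" value.
TRANSCRIPT=$(echo "$RESULT" | sed -n 's/.*"transcript": *"\([^"]*\)".*/\1/p')
echo "$TRANSCRIPT"
```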