# Lab 0
Group members: Donghan Yu, Ruohong Zhang, Zhiqing Sun
Questions:
1. What device(s) are you setting up?
* We set up the Raspberry Pi 4.
2. Did you run into any roadblocks following the instructions? What happened, and what did you do to fix the problem?
* We had an unstable internet connection during setup. We moved from the classroom to our own lab, where the connection was better.
3. Are all group members now able to ssh in to the device from their laptops? If not, why not? How will this be resolved?
* Yes, all group members can ssh into the device.
4. What is your group's hardware management plan? For example: Where will the device(s) be stored throughout the semester? What will happen if a device needs physical restart or debugging? What will happen in the case of COVID lockdown?
* We plan to keep the device in our own office, which has stable power and Wi-Fi. We keep the office locked so the device stays secure. In case of a COVID lockdown, we will move it to a group member's residence.
5. Now, you should be able to take a picture, record audio, run a basic computer vision model, and run a basic NLP model. Write a script that pipes I/O to models. For example, write a script that takes a picture and then runs a detection model on that image. Include the script at the end of your lab report.
6. Describe what the script you wrote does (document it).
* Capture Audio: record 10 seconds of audio and save it to output.wav
* Convert Audio: convert the floating-point recording to 32-bit integer PCM so the speech recognizer can read it
* Audio2Translation: recognize the speech and use a machine translation model from Transformers to translate it into Chinese
* Capture Image: capture an image of size 1920 × 1280
* Rotate Image: rotate the image by 180 degrees
* Object Detection: use a YOLOv5 model to detect objects in the rotated image (a driver sketch chaining all six steps follows this list)
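
The six snippets at the end of this report are standalone scripts; a minimal driver sketch that chains them in order is below. The filenames are illustrative assumptions, not files from our repo — save each snippet under the matching name for this to run.

```python
import subprocess

# Hypothetical driver: run each stage of the pipeline as its own script.
# Filenames are assumptions; rename to match however the snippets are saved.
steps = [
    "capture_audio.py",      # record 10 s of audio -> output.wav
    "convert_audio.py",      # output.wav -> output_int.wav (int32 PCM)
    "audio2translation.py",  # speech recognition + en->zh translation
    "capture_image.py",      # camera frame -> output.png
    "rotate_image.py",       # output.png -> output_rotate.png
    "object_detection.py",   # YOLOv5 detection on output_rotate.png
]
for script in steps:
    subprocess.run(["python3", script], check=True)  # stop on first failure
```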
7. Did you have any trouble getting this running? If so, describe what difficulties you ran into, and how you tried to resolve them.
* One problem we encountered is that the Raspberry Pi 4 does not support PyTorch 1.9: importing it hit illegal instructions and the process core dumped. Downgrading to PyTorch 1.8 solved the issue.
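* A quick sanity check along these lines (our own suggestion, not part of the lab) confirms that the installed build actually runs on the Pi:

```python
import torch

# A tiny tensor op trips the same illegal-instruction crash immediately
# if the installed wheel does not match the Pi's CPU.
print("PyTorch version:", torch.__version__)  # expect a 1.8.x build
x = torch.rand(2, 2)
print(x @ x)
```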
8. Demo
Audio2Translation

Object Detection

Code:
1. Capture Audio
```python
import sounddevice as sd
from scipy.io.wavfile import write
fs = 44100 # Sample rate
seconds = 10 # Duration of recording
myrecording = sd.rec(int(seconds * fs), samplerate=fs, channels=1)
sd.wait() # Wait until recording is finished
write('output.wav', fs, myrecording) # Save as WAV file
```
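
A quick way to verify the microphone actually captured something is to play the file back; this check is our own suggestion, not part of the graded pipeline:

```python
import sounddevice as sd
from scipy.io import wavfile

# Play the recording back through the default output device
rate, data = wavfile.read('output.wav')
sd.play(data, rate)
sd.wait()  # block until playback finishes
```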
2. Convert Audio
```python
from scipy.io import wavfile
import numpy as np

rate, data = wavfile.read('output.wav')
# Normalize to [-1, 1], then scale to the full 32-bit integer range.
# (Assumes the recording is not pure silence, i.e. the max is nonzero.)
myrecording = (np.iinfo(np.int32).max * (data / np.abs(data).max())).astype(np.int32)
wavfile.write('output_int.wav', rate, myrecording)
```
3. Audio2Translation
```python
import speech_recognition as sr
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

wav_path = "output_int.wav"
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-zh")
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-zh")

r = sr.Recognizer()
with sr.AudioFile(wav_path) as source:
    audio = r.record(source)

try:
    # Recognize speech using the Google Web Speech API
    text = r.recognize_google(audio)
    print("Google Speech Recognition thinks you said \"%s\"" % text)
    # Translate the recognized English text into Chinese
    inputs = tokenizer(text, max_length=64, return_tensors='pt', truncation=True)
    sequences = model.generate(**inputs, early_stopping=True, max_length=64, num_beams=2)
    outputs = tokenizer.batch_decode(sequences, skip_special_tokens=True)[0]
    print("Helsinki-NLP thinks the translation is \"%s\"" % outputs)
except sr.UnknownValueError:
    print("Speech recognition could not understand the audio")
except sr.RequestError as e:
    print("Speech recognition error; {0}".format(e))
except KeyboardInterrupt:
    pass
```
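
Because the Google recognizer needs network access (which was flaky for us), the speech_recognition library also ships an offline backend. A minimal sketch of the swap, assuming the pocketsphinx package is installed:

```python
import speech_recognition as sr

r = sr.Recognizer()
with sr.AudioFile("output_int.wav") as source:
    audio = r.record(source)
# Offline recognition via CMU Sphinx (requires: pip install pocketsphinx)
text = r.recognize_sphinx(audio)
print(text)
```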
4. Capture Image
```python
import cv2

def gstreamer_pipeline(capture_width=1280, capture_height=720,
                       display_width=1280, display_height=720,
                       framerate=60, flip_method=0):
    # GStreamer pipeline string for a CSI camera (Jetson Nano path)
    return (
        "nvarguscamerasrc ! "
        "video/x-raw(memory:NVMM), "
        f"width=(int){capture_width}, height=(int){capture_height}, "
        f"format=(string)NV12, framerate=(fraction){framerate}/1 ! "
        f"nvvidconv flip-method={flip_method} ! "
        f"video/x-raw, width=(int){display_width}, height=(int){display_height}, format=(string)BGRx ! "
        "videoconvert ! "
        "video/x-raw, format=(string)BGR ! appsink"
    )

HEIGHT = 1280
WIDTH = 1920
center = (WIDTH / 2, HEIGHT / 2)
# Rotation matrix; only used by the commented-out warpAffine call below
M = cv2.getRotationMatrix2D(center, 180, 1.0)

nano = False
if nano:
    cam = cv2.VideoCapture(gstreamer_pipeline(), cv2.CAP_GSTREAMER)
else:
    # Start camera via the default device on the Raspberry Pi
    print("start camera")
    cam = cv2.VideoCapture(0)
    # cam.set(cv2.CAP_PROP_FRAME_WIDTH, WIDTH)   # 3280
    # cam.set(cv2.CAP_PROP_FRAME_HEIGHT, HEIGHT) # 2464

if cam.isOpened():
    val, img = cam.read()
    print(f"cam success {val}")
    if val:
        fname = "output.png"
        print(f"save to {fname}")
        cv2.imwrite(fname, img)
        # cv2.imwrite('output.png', cv2.warpAffine(img, M, (WIDTH, HEIGHT)))
cam.release()
```
5. Rotate Image
```python
import cv2
import numpy as np

img = cv2.imread('output.png')
h, w, c = img.shape
# Rotate by 180 degrees: each output pixel mirrors the input across both axes
empty_img = np.zeros([h, w, c], dtype=np.uint8)
for i in range(h):
    for j in range(w):
        empty_img[i, j] = img[h - i - 1, w - j - 1]
cv2.imwrite("output_rotate.png", empty_img)
```
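
The pixel-by-pixel loop is slow on the Pi. OpenCV has a built-in 180-degree rotation that produces the same result in one vectorized call; a minimal alternative sketch:

```python
import cv2

# Same 180-degree rotation, using OpenCV's built-in instead of a Python loop
img = cv2.imread('output.png')
cv2.imwrite("output_rotate.png", cv2.rotate(img, cv2.ROTATE_180))
```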
6. Object Detection
```python
import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5l')  # or yolov5s, yolov5m, yolov5x, custom

# Images
img = 'output_rotate.png'  # or file, Path, PIL, OpenCV, numpy, list

# Inference
results = model(img)

# Results
results.print()  # or .show(), .save(), .crop(), .pandas(), etc.
results.save()   # writes annotated images to runs/detect/exp*
```
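
To inspect the detections programmatically rather than just saving annotated images, the YOLOv5 hub model exposes results as a pandas DataFrame; a short usage sketch:

```python
# Detections as a DataFrame: xmin, ymin, xmax, ymax, confidence, class, name
df = results.pandas().xyxy[0]
print(df[["name", "confidence"]])
```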