# Lab 0
Group members: Donghan Yu, Ruohong Zhang, Zhiqing Sun
Questions:
1. What device(s) are you setting up?
* We set up the Raspberry Pi 4.
2. Did you run into any roadblocks following the instructions? What happened, and what did you do to fix the problem?
* We had an unstable internet connection during setup. We moved from the classroom to our own lab, where the connection was better.
3. Are all group members now able to ssh in to the device from their laptops? If not, why not? How will this be resolved?
* Yes, all group members can ssh into the device.
4. What is your group's hardware management plan? For example: Where will the device(s) be stored throughout the semester? What will happen if a device needs physical restart or debugging? What will happen in the case of COVID lockdown?
* We plan to keep the device in our own office, which has stable power and Wi-Fi. We keep the office locked so the device stays secure. In case of a COVID lockdown, we will move it to a group member's residence.
5. Now, you should be able to take a picture, record audio, run a basic computer vision model, and run a basic NLP model. Write a script that pipes I/O to models. For example, write a script that takes a picture and then runs a detection model on that image. Include the script at the end of your lab report.
6. Describe what the script you wrote does (document it).
* Capture Audio: record 10 seconds of audio and save it to output.wav
* Convert Audio: convert the floating-point recording to 32-bit integer PCM so the speech recognizer can read it
* Audio2Translation: recognize the speech and use a machine translation model from Transformers to translate it into Chinese
* Capture Image: capture an image of size 1920 × 1280
* Rotate Image: rotate the image by 180 degrees
* Object Detection: use a YOLOv5 model to detect objects in the rotated image (a driver sketch chaining all six steps follows this list)
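
The six snippets at the end of this report are standalone scripts; a minimal driver sketch that chains them in order is below. The filenames are illustrative assumptions, not files from our repo — save each snippet under the matching name for this to run.

```python
import subprocess

# Hypothetical driver: run each stage of the pipeline as its own script.
# Filenames are assumptions; rename to match however the snippets are saved.
steps = [
    "capture_audio.py",      # record 10 s of audio -> output.wav
    "convert_audio.py",      # output.wav -> output_int.wav (int32 PCM)
    "audio2translation.py",  # speech recognition + en->zh translation
    "capture_image.py",      # camera frame -> output.png
    "rotate_image.py",       # output.png -> output_rotate.png
    "object_detection.py",   # YOLOv5 detection on output_rotate.png
]
for script in steps:
    subprocess.run(["python3", script], check=True)  # stop on first failure
```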
7. Did you have any trouble getting this running? If so, describe what difficulties you ran into, and how you tried to resolve them.
* One problem we encountered is that the Raspberry Pi 4 does not support PyTorch 1.9: importing it hit illegal instructions and the process core dumped. Downgrading to PyTorch 1.8 solved the issue.
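* A quick sanity check along these lines (our own suggestion, not part of the lab) confirms that the installed build actually runs on the Pi:

```python
import torch

# A tiny tensor op trips the same illegal-instruction crash immediately
# if the installed wheel does not match the Pi's CPU.
print("PyTorch version:", torch.__version__)  # expect a 1.8.x build
x = torch.rand(2, 2)
print(x @ x)
```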
8. Demo
Audio2Translation

Object Detection

Code:
1. Capture Audio
```python
import sounddevice as sd
from scipy.io.wavfile import write
fs = 44100 # Sample rate
seconds = 10 # Duration of recording
myrecording = sd.rec(int(seconds * fs), samplerate=fs, channels=1)
sd.wait() # Wait until recording is finished
write('output.wav', fs, myrecording) # Save as WAV file
```
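
A quick way to verify the microphone actually captured something is to play the file back; this check is our own suggestion, not part of the graded pipeline:

```python
import sounddevice as sd
from scipy.io import wavfile

# Play the recording back through the default output device
rate, data = wavfile.read('output.wav')
sd.play(data, rate)
sd.wait()  # block until playback finishes
```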
2. Convert Audio
```python
from scipy.io import wavfile
import numpy as np

rate, data = wavfile.read('output.wav')
# Normalize to [-1, 1], then scale to the full 32-bit integer range.
# (Assumes the recording is not pure silence, i.e. the max is nonzero.)
myrecording = (np.iinfo(np.int32).max * (data / np.abs(data).max())).astype(np.int32)
wavfile.write('output_int.wav', rate, myrecording)
```
3. Audio2Translation
```python
import speech_recognition as sr
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

wav_path = "output_int.wav"
tokenizer = AutoTokenizer.from_pretrained("Helsinki-NLP/opus-mt-en-zh")
model = AutoModelForSeq2SeqLM.from_pretrained("Helsinki-NLP/opus-mt-en-zh")

r = sr.Recognizer()
with sr.AudioFile(wav_path) as source:
    audio = r.record(source)

try:
    # Recognize speech using the Google Web Speech API
    text = r.recognize_google(audio)
    print("Google Speech Recognition thinks you said \"%s\"" % text)
    # Translate the recognized English text into Chinese
    inputs = tokenizer(text, max_length=64, return_tensors='pt', truncation=True)
    sequences = model.generate(**inputs, early_stopping=True, max_length=64, num_beams=2)
    outputs = tokenizer.batch_decode(sequences, skip_special_tokens=True)[0]
    print("Helsinki-NLP thinks the translation is \"%s\"" % outputs)
except sr.UnknownValueError:
    print("Speech recognition could not understand the audio")
except sr.RequestError as e:
    print("Speech recognition error; {0}".format(e))
except KeyboardInterrupt:
    pass
```
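
Because the Google recognizer needs network access (which was flaky for us), the speech_recognition library also ships an offline backend. A minimal sketch of the swap, assuming the pocketsphinx package is installed:

```python
import speech_recognition as sr

r = sr.Recognizer()
with sr.AudioFile("output_int.wav") as source:
    audio = r.record(source)
# Offline recognition via CMU Sphinx (requires: pip install pocketsphinx)
text = r.recognize_sphinx(audio)
print(text)
```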
4. Capture Image
```python
import cv2

def gstreamer_pipeline(capture_width=1280, capture_height=720,
                       display_width=1280, display_height=720,
                       framerate=60, flip_method=0):
    # GStreamer pipeline string for a CSI camera (Jetson Nano path)
    return (
        "nvarguscamerasrc ! "
        "video/x-raw(memory:NVMM), "
        f"width=(int){capture_width}, height=(int){capture_height}, "
        f"format=(string)NV12, framerate=(fraction){framerate}/1 ! "
        f"nvvidconv flip-method={flip_method} ! "
        f"video/x-raw, width=(int){display_width}, height=(int){display_height}, format=(string)BGRx ! "
        "videoconvert ! "
        "video/x-raw, format=(string)BGR ! appsink"
    )

HEIGHT = 1280
WIDTH = 1920
center = (WIDTH / 2, HEIGHT / 2)
# Rotation matrix; only used by the commented-out warpAffine call below
M = cv2.getRotationMatrix2D(center, 180, 1.0)

nano = False
if nano:
    cam = cv2.VideoCapture(gstreamer_pipeline(), cv2.CAP_GSTREAMER)
else:
    # Start camera via the default device on the Raspberry Pi
    print("start camera")
    cam = cv2.VideoCapture(0)
    # cam.set(cv2.CAP_PROP_FRAME_WIDTH, WIDTH)   # 3280
    # cam.set(cv2.CAP_PROP_FRAME_HEIGHT, HEIGHT) # 2464

if cam.isOpened():
    val, img = cam.read()
    print(f"cam success {val}")
    if val:
        fname = "output.png"
        print(f"save to {fname}")
        cv2.imwrite(fname, img)
        # cv2.imwrite('output.png', cv2.warpAffine(img, M, (WIDTH, HEIGHT)))
cam.release()
```
5. Rotate Image
```python
import cv2
import numpy as np

img = cv2.imread('output.png')
h, w, c = img.shape
# Rotate by 180 degrees: each output pixel mirrors the input across both axes
empty_img = np.zeros([h, w, c], dtype=np.uint8)
for i in range(h):
    for j in range(w):
        empty_img[i, j] = img[h - i - 1, w - j - 1]
cv2.imwrite("output_rotate.png", empty_img)
```
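
The pixel-by-pixel loop is slow on the Pi. OpenCV has a built-in 180-degree rotation that produces the same result in one vectorized call; a minimal alternative sketch:

```python
import cv2

# Same 180-degree rotation, using OpenCV's built-in instead of a Python loop
img = cv2.imread('output.png')
cv2.imwrite("output_rotate.png", cv2.rotate(img, cv2.ROTATE_180))
```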
6. Object Detection
```python
import torch

# Model
model = torch.hub.load('ultralytics/yolov5', 'yolov5l')  # or yolov5s, yolov5m, yolov5x, custom

# Images
img = 'output_rotate.png'  # or file, Path, PIL, OpenCV, numpy, list

# Inference
results = model(img)

# Results
results.print()  # or .show(), .save(), .crop(), .pandas(), etc.
results.save()   # writes annotated images to runs/detect/exp*
```
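
To inspect the detections programmatically rather than just saving annotated images, the YOLOv5 hub model exposes results as a pandas DataFrame; a short usage sketch:

```python
# Detections as a DataFrame: xmin, ymin, xmax, ymax, confidence, class, name
df = results.pandas().xyxy[0]
print(df[["name", "confidence"]])
```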