# Google API Research

###### tags: `程式`

* [Package com.google.cloud.speech.v1 (2.5.4)](https://cloud.google.com/java/docs/reference/google-cloud-speech/latest/com.google.cloud.speech.v1)
* [How to Use RESTful APIs In Unity (Unity Tutorial)](https://www.youtube.com/watch?v=XIbZDz_p6vE)
* https://github.com/GlitchEnzo/NuGetForUnity
* [Google.Cloud.Speech.V1](https://www.nuget.org/packages/Google.Cloud.Speech.V1/)
  * Really??? It turns out NuGet can be used straight from an IDE (e.g. Visual Studio)... and here I spent ages digging around for nothing.

pruss (程式): to install a NuGet package for Unity, run the NuGet command inside the IDE. The IDE downloads the DLLs into the packages folder; copy those DLLs into the project directory and you are done. Thanks a lot, pruss!

![](https://i.imgur.com/iDqQJSU.png)

So when there is a version conflict, the DLL error itself tells you which version needs to be installed:

![](https://i.imgur.com/UZMGt8E.png)

* [How to: Reference a strong-named assembly](https://learn.microsoft.com/en-us/dotnet/standard/assembly/reference-strong-named)
* [Manage packages with the Visual Studio Package Manager Console (PowerShell)](https://learn.microsoft.com/en-us/nuget/consume-packages/install-use-packages-powershell)

## Speech Recognition and Speech Synthesis Version Notes

### Brief notes on the GoogleCloudSpeechToText version

* Google.Cloud.Speech

### Self-built version

* Google.Cloud.TextToSpeech

### TextToSpeech-related resources

* [Create voice audio files](https://cloud.google.com/text-to-speech/docs/create-audio)

Reference code for converting the synthesized speech into something Unity can play:

```
/// <summary>
/// This method is called once by the Unity coroutine once the speech is successfully synthesized.
/// It will then attempt to play that audio file.
/// Note that the playback will fail if the output audio format is not pcm encoded.
/// </summary>
/// <param name="sender">The source of the event.</param>
/// <param name="args">The <see cref="GenericEventArgs{Stream}"/> instance containing the event data.</param>
//private void PlayAudio(object sender, GenericEventArgs<Stream> args)
private void PlayAudio(Stream audioStream)
{
    Debug.Log("Playing audio stream");

    // Play the audio using Unity AudioSource, allowing us to benefit from effects,
    // spatialization, mixing, etc.

    // Get the size of the original stream
    var size = audioStream.Length;

    // Don't playback if the stream is empty
    if (size > 0)
    {
        try
        {
            Debug.Log($"Creating new byte array of size {size}");
            // Create buffer
            byte[] buffer = new byte[size];

            Debug.Log($"Reading stream to the end and putting in bytes array.");
            buffer = ReadToEnd(audioStream);

            // Convert raw WAV data into Unity audio data
            Debug.Log($"Converting raw WAV data of size {buffer.Length} into Unity audio data.");
            int sampleCount = 0;
            int frequency = 0;
            var unityData = AudioWithHeaderToUnityAudio(buffer, out sampleCount, out frequency);

            // Convert data to a Unity audio clip
            Debug.Log($"Converting audio data of size {unityData.Length} to Unity audio clip with {sampleCount} samples at frequency {frequency}.");
            var clip = ToClip("Speech", unityData, sampleCount, frequency);

            // Set the source on the audio clip
            audioSource.clip = clip;

            Debug.Log($"Trigger playback of audio clip on AudioSource.");
            // Play audio
            audioSource.Play();
        }
        catch (Exception ex)
        {
            Debug.Log("An error occurred during audio stream conversion and playback."
                + Environment.NewLine + ex.Message);
        }
    }
}
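
// --------------------------------------------------------------------
// Added note (not part of the original sample): a minimal, hedged sketch
// of one way the Stream handed to PlayAudio() could be produced with the
// Google.Cloud.TextToSpeech.V1 package installed through NuGet. The
// method name SynthesizeToStream and the chosen language/voice are
// placeholders of mine; only the client calls themselves come from the
// library.
private Stream SynthesizeToStream(string text)
{
    // Create() picks up credentials from GOOGLE_APPLICATION_CREDENTIALS.
    var client = Google.Cloud.TextToSpeech.V1.TextToSpeechClient.Create();

    var response = client.SynthesizeSpeech(
        new Google.Cloud.TextToSpeech.V1.SynthesisInput { Text = text },
        new Google.Cloud.TextToSpeech.V1.VoiceSelectionParams
        {
            LanguageCode = "en-US", // pick the language/voice you actually need
            SsmlGender = Google.Cloud.TextToSpeech.V1.SsmlVoiceGender.Neutral
        },
        new Google.Cloud.TextToSpeech.V1.AudioConfig
        {
            // LINEAR16 output includes a WAV header, which is what
            // AudioWithHeaderToUnityAudio() below expects.
            AudioEncoding = Google.Cloud.TextToSpeech.V1.AudioEncoding.Linear16
        });

    // AudioContent is a ByteString; turn it into bytes and wrap it in a
    // MemoryStream so it can be passed straight to PlayAudio().
    return new MemoryStream(response.AudioContent.ToByteArray());
}
// --------------------------------------------------------------------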

/// <summary>
/// Reads a stream from beginning to end, returning an array of bytes
/// </summary>
/// <param name="stream"></param>
/// <returns></returns>
public static byte[] ReadToEnd(Stream stream)
{
    long originalPosition = 0;

    if (stream.CanSeek)
    {
        originalPosition = stream.Position;
        stream.Position = 0;
    }

    try
    {
        byte[] readBuffer = new byte[4096];
        int totalBytesRead = 0;
        int bytesRead;

        while ((bytesRead = stream.Read(readBuffer, totalBytesRead, readBuffer.Length - totalBytesRead)) > 0)
        {
            totalBytesRead += bytesRead;

            if (totalBytesRead == readBuffer.Length)
            {
                int nextByte = stream.ReadByte();
                if (nextByte != -1)
                {
                    byte[] temp = new byte[readBuffer.Length * 2];
                    Buffer.BlockCopy(readBuffer, 0, temp, 0, readBuffer.Length);
                    Buffer.SetByte(temp, totalBytesRead, (byte)nextByte);
                    readBuffer = temp;
                    totalBytesRead++;
                }
            }
        }

        byte[] buffer = readBuffer;
        if (readBuffer.Length != totalBytesRead)
        {
            buffer = new byte[totalBytesRead];
            Buffer.BlockCopy(readBuffer, 0, buffer, 0, totalBytesRead);
        }
        return buffer;
    }
    finally
    {
        if (stream.CanSeek)
        {
            stream.Position = originalPosition;
        }
    }
}

/// <summary>
/// Converts two bytes to one float in the range -1 to 1.
/// </summary>
/// <param name="firstByte">The first byte.</param>
/// <param name="secondByte">The second byte.</param>
/// <returns>The converted float.</returns>
private static float BytesToFloat(byte firstByte, byte secondByte)
{
    // Convert two bytes to one short (little endian)
    short s = (short)((secondByte << 8) | firstByte);

    // Convert to range from -1 to (just below) 1
    return s / 32768.0F;
}

/// <summary>
/// Converts an array of bytes to an integer.
/// </summary>
/// <param name="bytes">The byte array.</param>
/// <param name="offset">An offset to read from.</param>
/// <returns>The converted int.</returns>
private static int BytesToInt(byte[] bytes, int offset = 0)
{
    int value = 0;
    for (int i = 0; i < 4; i++)
    {
        value |= ((int)bytes[offset + i]) << (i * 8);
    }
    return value;
}

/// <summary>
/// Dynamically creates an <see cref="AudioClip"/> that represents raw Unity audio data.
/// </summary>
/// <param name="name">The name of the dynamically generated clip.</param>
/// <param name="audioData">Raw Unity audio data.</param>
/// <param name="sampleCount">The number of samples in the audio data.</param>
/// <param name="frequency">The frequency of the audio data.</param>
/// <returns>The <see cref="AudioClip"/>.</returns>
private static AudioClip ToClip(string name, float[] audioData, int sampleCount, int frequency)
{
    var clip = AudioClip.Create(name, sampleCount, 1, frequency, false);
    clip.SetData(audioData, 0);
    return clip;
}
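
// --------------------------------------------------------------------
// Added note (not part of the original sample): the standard RIFF/WAVE
// layout assumed by AudioWithHeaderToUnityAudio() below, for reference:
//   bytes 0-3   "RIFF"        bytes 8-11  "WAVE"
//   bytes 22-23 channel count inside the fmt chunk (the code reads the low byte at 22)
//   bytes 24-27 sample rate, little endian
//   from byte 12 onward the file is a sequence of
//   [4-byte chunk id][4-byte little-endian size][data] blocks; they are
//   skipped until the id "data" (0x64 0x61 0x74 0x61) is found, and the
//   16-bit PCM samples start 8 bytes after that id.
// --------------------------------------------------------------------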

/// <summary>
/// Converts raw WAV data into Unity formatted audio data.
/// </summary>
/// <param name="wavAudio">The raw WAV data.</param>
/// <param name="sampleCount">The number of samples in the audio data.</param>
/// <param name="frequency">The frequency of the audio data.</param>
/// <returns>The Unity formatted audio data.</returns>
private static float[] AudioWithHeaderToUnityAudio(byte[] wavAudio, out int sampleCount, out int frequency)
{
    // Determine if mono or stereo
    int channelCount = wavAudio[22]; // Speech audio data is always mono but read actual header value for processing
    Debug.Log($"Audio data has {channelCount} channel(s).");

    // Get the frequency
    frequency = BytesToInt(wavAudio, 24);
    Debug.Log($"Audio data frequency is {frequency}.");

    // Get past all the other sub chunks to get to the data subchunk:
    int pos = 12; // First subchunk ID from 12 to 16

    // Keep iterating until we find the data chunk (i.e. 0x64 0x61 0x74 0x61, which is 100 97 116 97 in decimal)
    while (!(wavAudio[pos] == 100 && wavAudio[pos + 1] == 97 && wavAudio[pos + 2] == 116 && wavAudio[pos + 3] == 97))
    {
        pos += 4;
        int chunkSize = wavAudio[pos] + wavAudio[pos + 1] * 256 + wavAudio[pos + 2] * 65536 + wavAudio[pos + 3] * 16777216;
        pos += 4 + chunkSize;
    }
    pos += 8;

    // Pos is now positioned to start of actual sound data.
    sampleCount = (wavAudio.Length - pos) / 2;  // 2 bytes per sample (16 bit sound mono)
    if (channelCount == 2) { sampleCount /= 2; }  // 4 bytes per sample (16 bit stereo)
    Debug.Log($"Audio data contains {sampleCount} samples. Starting conversion");

    // Allocate memory (supporting left channel only)
    var unityData = new float[sampleCount];

    try
    {
        // Write to double array/s:
        int i = 0;
        while (pos < wavAudio.Length)
        {
            unityData[i] = BytesToFloat(wavAudio[pos], wavAudio[pos + 1]);
            pos += 2;
            if (channelCount == 2) { pos += 2; }
            i++;
        }
    }
    catch (Exception ex)
    {
        Debug.Log($"Error occurred converting audio data to float array of size {wavAudio.Length} at position {pos}. {ex.Message}");
    }

    return unityData;
}

/// <summary>
/// Converts raw (headerless) audio data into Unity formatted audio data.
/// </summary>
/// <param name="wavAudio">The raw audio data.</param>
/// <param name="channelCount">The number of channels in the audio data.</param>
/// <param name="resolution">The bit resolution of the audio data (e.g. 16).</param>
/// <param name="sampleCount">The number of samples in the audio data.</param>
/// <returns>The Unity formatted audio data.</returns>
private static float[] FixedRAWAudioToUnityAudio(byte[] wavAudio, int channelCount, int resolution, out int sampleCount)
{
    // Pos is now positioned to start of actual sound data.
    int bytesPerSample = resolution / 8; // e.g. 2 bytes per sample (16 bit sound mono)
    sampleCount = wavAudio.Length / bytesPerSample;
    if (channelCount == 2) { sampleCount /= 2; }  // 4 bytes per sample (16 bit stereo)
    Debug.Log($"Audio data contains {sampleCount} samples. Starting conversion");

    // Allocate memory (supporting left channel only)
    var unityData = new float[sampleCount];
    int pos = 0;

    try
    {
        // Write to double array/s:
        int i = 0;
        while (pos < wavAudio.Length)
        {
            unityData[i] = BytesToFloat(wavAudio[pos], wavAudio[pos + 1]);
            pos += 2;
            if (channelCount == 2) { pos += 2; }
            i++;
        }
    }
    catch (Exception ex)
    {
        Debug.Log($"Error occurred converting audio data to float array of size {wavAudio.Length} at position {pos}. {ex.Message}");
    }

    return unityData;
}
```

### SpeechToText-related notes

```
// If this exception is caught, execution is forcibly stopped
catch (TaskCanceledException)
{
```

### Google Cloud-related setup

* [Service accounts](https://cloud.google.com/iam/docs/service-accounts)
  * The mechanism for managing permissions
* [Install the Google Cloud CLI](https://cloud.google.com/sdk/docs/install-sdk)
  * I thought I would have to install this again when testing on the school computer... OK, turns out the Cloud Shell can be invoked without that anyway.
* Google Cloud
  * Cloud Shell
* [Apply for monthly invoiced billing](https://cloud.google.com/billing/docs/how-to/invoiced-billing)
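
Since the client libraries authenticate through a service account by default, here is a minimal, hedged sketch of pointing them at an exported key file from Unity code. The helper name and `keyPath` are placeholders of mine, and `TextToSpeechClient` can be swapped for `SpeechClient` in exactly the same way:

```
using System;
using Google.Cloud.TextToSpeech.V1;

public static class GoogleCredentialSetup
{
    // keyPath is a placeholder for the JSON key exported from the service account.
    public static TextToSpeechClient CreateClient(string keyPath)
    {
        // The client libraries look for this environment variable when no
        // explicit credential is supplied.
        Environment.SetEnvironmentVariable("GOOGLE_APPLICATION_CREDENTIALS", keyPath);
        return TextToSpeechClient.Create();
    }
}
```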
Starting conversion"); // Allocate memory (supporting left channel only) var unityData = new float[sampleCount]; int pos = 0; try { // Write to double array/s: int i = 0; while (pos < wavAudio.Length) { unityData[i] = BytesToFloat(wavAudio[pos], wavAudio[pos + 1]); pos += 2; if (channelCount == 2) { pos += 2; } i++; } } catch (Exception ex) { Debug.Log($"Error occurred converting audio data to float array of size {wavAudio.Length} at position {pos}."); } return unityData; } ``` ### SpeechToText相關紀錄 ``` //如果接到例外會強制停止 catch (TaskCanceledException) { ``` ### Google cloud相關設定 * [Service accounts](https://cloud.google.com/iam/docs/service-accounts) * 管理權限的東西 * [Install the Google Cloud CLI](https://cloud.google.com/sdk/docs/install-sdk) * 去學校電腦測試時應該也要再裝一次...好吧不需要網路就可以呼叫此Cloud shell了Orz * Google Cloud * Cloud shell * [Apply for monthly invoiced billing](https://cloud.google.com/billing/docs/how-to/invoiced-billing) ``` while (!(wavAudio[pos] == 100 && wavAudio[pos + 1] == 97 && wavAudio[pos + 2] == 116 && wavAudio[pos + 3] == 97)) { pos += 4; int chunkSize = wavAudio[pos] + wavAudio[pos + 1] * 256 + wavAudio[pos + 2] * 65536 + wavAudio[pos + 3] * 16777216; pos += 4 + chunkSize; } pos += 8; ``` * ByteString轉成Byte在經過一連串轉成Unity可以播的資料 ## 數字轉換 * [Google cloud speech to text - How to get numbers in digit ](https://stackoverflow.com/questions/66206056/google-cloud-speech-to-text-how-to-get-numbers-in-digit) * [Supproted class tokens](https://cloud.google.com/speech-to-text/docs/class-tokens) ## 雜項 * [Naudio](https://github.com/naudio/NAudio) * 用來轉換MP3格式的API * [Send a recognition request with model adaptation](https://cloud.google.com/speech-to-text/docs/adaptation) * https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/projects.locations.customClasses * https://cloud.google.com/speech-to-text/docs/class-tokens * https://stackoverflow.com/questions/66206056/google-cloud-speech-to-text-how-to-get-numbers-in-digit * https://groups.google.com/g/cloud-speech-discuss/c/tocHI0uQ2rE?pli=1 * 更進階的功能可能要用Python寫...Orz