# Google API Research
###### tags: `Programming`
* [Package com.google.cloud.speech.v1 (2.5.4)](https://cloud.google.com/java/docs/reference/google-cloud-speech/latest/com.google.cloud.speech.v1)
* [How to Use RESTful APIs In Unity [Unity Tutorial]](https://www.youtube.com/watch?v=XIbZDz_p6vE)
* https://github.com/GlitchEnzo/NuGetForUnity
* [Google.Cloud.Speech.V1](https://www.nuget.org/packages/Google.Cloud.Speech.V1/)
* really???
Turns out you can just use NuGet through an IDE (e.g. Visual Studio), Orz... and here I spent forever digging around.
pruss (from the Programming group):
To install NuGet packages for Unity, run the NuGet command inside the IDE; the IDE downloads the DLLs into the Packages folder, and then you just copy the DLLs into the project directory (a Package Manager Console sketch follows below).
Many thanks to pruss, Orz

Turns out that if there is a version mismatch, the DLL error pops up by itself and tells you which version needs to be installed, Orz

[How to: Reference a strong-named assembly](https://learn.microsoft.com/en-us/dotnet/standard/assembly/reference-strong-named)
[Manage packages with the Visual Studio Package Manager Console (PowerShell)](https://learn.microsoft.com/en-us/nuget/consume-packages/install-use-packages-powershell)
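For reference, a rough sketch of the workflow pruss described, using the Package Manager Console (the package name is the one linked above; the paths are just examples, not from these notes):
```
# Visual Studio: Tools > NuGet Package Manager > Package Manager Console
Install-Package Google.Cloud.Speech.V1

# The IDE restores the DLLs into the solution's packages folder.
# Copy the managed DLLs (and their dependencies) into the Unity project, e.g.:
#   <UnityProject>/Assets/Plugins/Google.Cloud.Speech.V1.dll
```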
## Speech Recognition and Text-to-Speech Version Notes
### Brief notes on the GoogleCloudSpeechToText version
* Google.Cloud.Speech
### The version I set up myself
* Google.Cloud.TextToSpeech
### TextToSpeech references
* [Create voice audio files](https://cloud.google.com/text-to-speech/docs/create-audio)
Reference code for converting synthesized speech into playable audio:
```
/// <summary>
/// This method is called by the Unity coroutine once the speech has been successfully synthesized.
/// It will then attempt to play that audio file.
/// Note that the playback will fail if the output audio format is not pcm encoded.
/// </summary>
/// <param name="sender">The source of the event.</param>
/// <param name="args">The <see cref="GenericEventArgs{Stream}"/> instance containing the event data.</param>
//private void PlayAudio(object sender, GenericEventArgs<Stream> args)
private void PlayAudio(Stream audioStream)
{
Debug.Log("Playing audio stream");
// Play the audio using Unity AudioSource, allowing us to benefit from effects,
// spatialization, mixing, etc.
// Get the size of the original stream
var size = audioStream.Length;
// Don't playback if the stream is empty
if (size > 0)
{
try
{
Debug.Log($"Creating new byte array of size {size}");
// Create buffer
byte[] buffer = new byte[size];
Debug.Log($"Reading stream to the end and putting in bytes array.");
buffer = ReadToEnd(audioStream);
// Convert raw WAV data into Unity audio data
Debug.Log($"Converting raw WAV data of size {buffer.Length} into Unity audio data.");
int sampleCount = 0;
int frequency = 0;
var unityData = AudioWithHeaderToUnityAudio(buffer, out sampleCount, out frequency);
// Convert data to a Unity audio clip
Debug.Log($"Converting audio data of size {unityData.Length} to Unity audio clip with {sampleCount} samples at frequency {frequency}.");
var clip = ToClip("Speech", unityData, sampleCount, frequency);
// Set the source on the audio clip
audioSource.clip = clip;
Debug.Log($"Trigger playback of audio clip on AudioSource.");
// Play audio
audioSource.Play();
}
catch (Exception ex)
{
Debug.Log("An error occurred during audio stream conversion and playback."
+ Environment.NewLine + ex.Message);
}
}
}
/// <summary>
/// Reads a stream from beginning to end, returning an array of bytes
/// </summary>
/// <param name="stream"></param>
/// <returns></returns>
public static byte[] ReadToEnd(Stream stream)
{
long originalPosition = 0;
if (stream.CanSeek)
{
originalPosition = stream.Position;
stream.Position = 0;
}
try
{
byte[] readBuffer = new byte[4096];
int totalBytesRead = 0;
int bytesRead;
while ((bytesRead = stream.Read(readBuffer, totalBytesRead, readBuffer.Length - totalBytesRead)) > 0)
{
totalBytesRead += bytesRead;
if (totalBytesRead == readBuffer.Length)
{
int nextByte = stream.ReadByte();
if (nextByte != -1)
{
byte[] temp = new byte[readBuffer.Length * 2];
Buffer.BlockCopy(readBuffer, 0, temp, 0, readBuffer.Length);
Buffer.SetByte(temp, totalBytesRead, (byte)nextByte);
readBuffer = temp;
totalBytesRead++;
}
}
}
byte[] buffer = readBuffer;
if (readBuffer.Length != totalBytesRead)
{
buffer = new byte[totalBytesRead];
Buffer.BlockCopy(readBuffer, 0, buffer, 0, totalBytesRead);
}
return buffer;
}
finally
{
if (stream.CanSeek)
{
stream.Position = originalPosition;
}
}
}
/// <summary>
/// Converts two bytes to one float in the range -1 to 1.
/// </summary>
/// <param name="firstByte">The first byte.</param>
/// <param name="secondByte"> The second byte.</param>
/// <returns>The converted float.</returns>
private static float BytesToFloat(byte firstByte, byte secondByte)
{
// Convert two bytes to one short (little endian)
short s = (short)((secondByte << 8) | firstByte);
// Convert to range from -1 to (just below) 1
return s / 32768.0F;
}
/// <summary>
/// Converts an array of bytes to an integer.
/// </summary>
/// <param name="bytes"> The byte array.</param>
/// <param name="offset"> An offset to read from.</param>
/// <returns>The converted int.</returns>
private static int BytesToInt(byte[] bytes, int offset = 0)
{
int value = 0;
for (int i = 0; i < 4; i++)
{
value |= ((int)bytes[offset + i]) << (i * 8);
}
return value;
}
/// <summary>
/// Dynamically creates an <see cref="AudioClip"/> that represents raw Unity audio data.
/// </summary>
/// <param name="name"> The name of the dynamically generated clip.</param>
/// <param name="audioData">Raw Unity audio data.</param>
/// <param name="sampleCount">The number of samples in the audio data.</param>
/// <param name="frequency">The frequency of the audio data.</param>
/// <returns>The <see cref="AudioClip"/>.</returns>
private static AudioClip ToClip(string name, float[] audioData, int sampleCount, int frequency)
{
var clip = AudioClip.Create(name, sampleCount, 1, frequency, false);
clip.SetData(audioData, 0);
return clip;
}
/// <summary>
/// Converts raw WAV data into Unity formatted audio data.
/// </summary>
/// <param name="wavAudio">The raw WAV data.</param>
/// <param name="sampleCount">The number of samples in the audio data.</param>
/// <param name="frequency">The frequency of the audio data.</param>
/// <returns>The Unity formatted audio data. </returns>
private static float[] AudioWithHeaderToUnityAudio(byte[] wavAudio, out int sampleCount, out int frequency)
{
// Determine if mono or stereo
int channelCount = wavAudio[22]; // Speech audio data is always mono but read actual header value for processing
Debug.Log($"Audio data has {channelCount} channel(s).");
// Get the frequency
frequency = BytesToInt(wavAudio, 24);
Debug.Log($"Audio data frequency is {frequency}.");
// Get past all the other sub chunks to get to the data subchunk:
int pos = 12; // First subchunk ID from 12 to 16
// Keep iterating until we find the data chunk (i.e. 64 61 74 61 ...... (i.e. 100 97 116 97 in decimal))
while (!(wavAudio[pos] == 100 && wavAudio[pos + 1] == 97 && wavAudio[pos + 2] == 116 && wavAudio[pos + 3] == 97))
{
pos += 4;
int chunkSize = wavAudio[pos] + wavAudio[pos + 1] * 256 + wavAudio[pos + 2] * 65536 + wavAudio[pos + 3] * 16777216;
pos += 4 + chunkSize;
}
pos += 8;
// Pos is now positioned to start of actual sound data.
sampleCount = (wavAudio.Length - pos) / 2; // 2 bytes per sample (16 bit sound mono)
if (channelCount == 2) { sampleCount /= 2; } // 4 bytes per sample (16 bit stereo)
Debug.Log($"Audio data contains {sampleCount} samples. Starting conversion");
// Allocate memory (supporting left channel only)
var unityData = new float[sampleCount];
try
{
// Write to double array/s:
int i = 0;
while (pos < wavAudio.Length)
{
unityData[i] = BytesToFloat(wavAudio[pos], wavAudio[pos + 1]);
pos += 2;
if (channelCount == 2)
{
pos += 2;
}
i++;
}
}
catch (Exception ex)
{
Debug.Log($"Error occurred converting audio data to float array of size {wavAudio.Length} at position {pos}."
+ Environment.NewLine + ex.Message);
}
return unityData;
}
/// <summary>
/// Converts fixed-format raw PCM data (no WAV header) into Unity formatted audio data.
/// </summary>
/// <param name="wavAudio">The raw PCM audio data.</param>
/// <param name="channelCount">The number of channels in the audio data.</param>
/// <param name="resolution">The bit depth of the audio data (e.g. 16).</param>
/// <param name="sampleCount">The number of samples in the audio data.</param>
/// <returns>The Unity formatted audio data.</returns>
private static float[] FixedRAWAudioToUnityAudio(byte[] wavAudio, int channelCount, int resolution, out int sampleCount)
{
// Pos is now positioned to start of actual sound data.
int bytesPerSample = resolution / 8; // e.g. 2 bytes per sample (16 bit sound mono)
sampleCount = wavAudio.Length / bytesPerSample;
if (channelCount == 2) { sampleCount /= 2; } // 4 bytes per sample (16 bit stereo)
Debug.Log($"Audio data contains {sampleCount} samples. Starting conversion");
// Allocate memory (supporting left channel only)
var unityData = new float[sampleCount];
int pos = 0;
try
{
// Write to double array/s:
int i = 0;
while (pos < wavAudio.Length)
{
unityData[i] = BytesToFloat(wavAudio[pos], wavAudio[pos + 1]);
pos += 2;
if (channelCount == 2)
{
pos += 2;
}
i++;
}
}
catch (Exception ex)
{
Debug.Log($"Error occurred converting audio data to float array of size {wavAudio.Length} at position {pos}."
+ Environment.NewLine + ex.Message);
}
return unityData;
}
```
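For context, here is a minimal sketch of the synthesis request that produces the audio consumed by `PlayAudio` above, assuming the Google.Cloud.TextToSpeech.V1 client library (the voice, language code, and text are just example values):
```
using System.IO;
using Google.Cloud.TextToSpeech.V1;

// Minimal sketch: request LINEAR16 (PCM WAV) output so the Unity playback code above can decode it.
var client = TextToSpeechClient.Create();
var response = client.SynthesizeSpeech(
    new SynthesisInput { Text = "Hello from Unity" },
    new VoiceSelectionParams { LanguageCode = "en-US", SsmlGender = SsmlVoiceGender.Neutral },
    new AudioConfig { AudioEncoding = AudioEncoding.Linear16 });

// response.AudioContent is a ByteString; wrap it in a stream and hand it to PlayAudio
// (assuming this code runs in the same class as the methods above).
using (var stream = new MemoryStream(response.AudioContent.ToByteArray()))
{
    PlayAudio(stream);
}
```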
### SpeechToText notes
```
// If this exception is caught, the recognition task is force-stopped
catch (TaskCanceledException)
{
```
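A minimal sketch of where that catch fits, assuming the non-streaming `RecognizeAsync` overload of Google.Cloud.Speech.V1 that takes a CancellationToken (the config values are just examples):
```
using System.Threading;
using System.Threading.Tasks;
using Google.Cloud.Speech.V1;
using UnityEngine;

// Hypothetical helper: run recognition and stop quietly when the token is cancelled.
private static async Task RecognizeAsyncWithCancel(RecognitionAudio audio, CancellationToken token)
{
    var client = SpeechClient.Create();
    var config = new RecognitionConfig
    {
        Encoding = RecognitionConfig.Types.AudioEncoding.Linear16,
        SampleRateHertz = 16000,
        LanguageCode = "zh-TW",
    };
    try
    {
        var response = await client.RecognizeAsync(config, audio, token);
        foreach (var result in response.Results)
        {
            Debug.Log(result.Alternatives[0].Transcript);
        }
    }
    catch (TaskCanceledException)
    {
        // Cancellation was requested: force-stop recognition without treating it as an error.
        Debug.Log("Recognition cancelled.");
    }
}
```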
### Google Cloud setup
* [Service accounts](https://cloud.google.com/iam/docs/service-accounts)
* For managing permissions and credentials (see the credentials sketch after this list)
* [Install the Google Cloud CLI](https://cloud.google.com/sdk/docs/install-sdk)
* Will probably need to install it again when testing on the school computer... well actually, Cloud Shell can be invoked without any local install, Orz
* Google Cloud
* Cloud shell
* [Apply for monthly invoiced billing](https://cloud.google.com/billing/docs/how-to/invoiced-billing)
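A minimal sketch of wiring up the service-account credentials mentioned above, assuming the standard GOOGLE_APPLICATION_CREDENTIALS environment variable (the key path is hypothetical):
```
using System;
using Google.Cloud.TextToSpeech.V1;

// Hypothetical key path: point the Google client libraries at a service-account key
// before creating any client; the libraries pick it up automatically.
Environment.SetEnvironmentVariable(
    "GOOGLE_APPLICATION_CREDENTIALS",
    @"C:\keys\my-service-account.json");

var client = TextToSpeechClient.Create();
```
The block below (the same data-chunk scan used in the playback code above) skips past the other WAV sub-chunks to find the `data` chunk: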
```
while (!(wavAudio[pos] == 100 && wavAudio[pos + 1] == 97 && wavAudio[pos + 2] == 116 && wavAudio[pos + 3] == 97))
{
pos += 4;
int chunkSize = wavAudio[pos] + wavAudio[pos + 1] * 256 + wavAudio[pos + 2] * 65536 + wavAudio[pos + 3] * 16777216;
pos += 4 + chunkSize;
}
pos += 8;
```
* Convert the ByteString to bytes, then run it through the chain of conversions above into data Unity can play
## Number Conversion
* [Google cloud speech to text - How to get numbers in digit](https://stackoverflow.com/questions/66206056/google-cloud-speech-to-text-how-to-get-numbers-in-digit)
* [Supported class tokens](https://cloud.google.com/speech-to-text/docs/class-tokens)
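A minimal sketch of the approach those links describe, assuming speech adaptation through `SpeechContext` class tokens in Google.Cloud.Speech.V1 (the token and language code are example values):
```
using Google.Cloud.Speech.V1;

// Hypothetical: bias recognition so spoken numbers come back as digit sequences.
var config = new RecognitionConfig
{
    Encoding = RecognitionConfig.Types.AudioEncoding.Linear16,
    SampleRateHertz = 16000,
    LanguageCode = "en-US",
    SpeechContexts =
    {
        new SpeechContext { Phrases = { "$OOV_CLASS_DIGIT_SEQUENCE" } }
    }
};
```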
## Miscellaneous
* [Naudio](https://github.com/naudio/NAudio)
* A library used to convert MP3 audio (see the sketch at the end of this section)
* [Send a recognition request with model adaptation](https://cloud.google.com/speech-to-text/docs/adaptation)
* https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/projects.locations.customClasses
* https://cloud.google.com/speech-to-text/docs/class-tokens
* https://stackoverflow.com/questions/66206056/google-cloud-speech-to-text-how-to-get-numbers-in-digit
* https://groups.google.com/g/cloud-speech-discuss/c/tocHI0uQ2rE?pli=1
* More advanced features may have to be written in Python... Orz
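A minimal sketch of the MP3 conversion mentioned above, assuming NAudio's `Mp3FileReader` and `WaveFileWriter` (file names are just examples):
```
using NAudio.Wave;

// Hypothetical: decode an MP3 file to a PCM WAV file so the Unity playback code above can use it.
using (var reader = new Mp3FileReader("speech.mp3"))
{
    WaveFileWriter.CreateWaveFile("speech.wav", reader);
}
```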