## OpenL3

OpenL3 is an open-source Python library for computing deep audio and image embeddings. It uses the pysoundfile library for audio, the scikit-image library for images, and moviepy for video.

### Audio Embeddings Computation

```python
import openl3
import soundfile as sf

audio, sr = sf.read('/path/to/file.wav')
emb, ts = openl3.get_audio_embedding(audio, sr)
```

`emb` is a T-by-D numpy array:
> T is the number of embedding frames
> D is the dimensionality of the embedding

`ts` is a length-T numpy array:
> timestamps for each embedding frame

### Get a list of embeddings and timestamp arrays for each of the input arrays

```python
import openl3
import soundfile as sf

audio1, sr1 = sf.read('/path/to/file1.wav')
audio2, sr2 = sf.read('/path/to/file2.wav')
audio3, sr3 = sf.read('/path/to/file3.wav')
audio_list = [audio1, audio2, audio3]
sr_list = [sr1, sr2, sr3]

# Pass in a list of audio arrays and a list of sample rates
emb_list, ts_list = openl3.get_audio_embedding(audio_list, sr_list, batch_size=32)

# If all arrays share the same sample rate, you can pass in a single sample rate
emb_list, ts_list = openl3.get_audio_embedding(audio_list, sr1, batch_size=32)
```

### How does OpenL3 get embeddings?

By default, OpenL3:

1. Uses a mel-spectrogram time-frequency representation with 128 bands
2. Returns an embedding of dimensionality 6144 for each embedding frame

To change these parameters:

```python
emb, ts = openl3.get_audio_embedding(
    audio, sr,
    content_type="env",    # model trained on environmental videos
    input_repr="linear",   # spectrogram with a linear frequency axis
    embedding_size=512,
)
```

### To compute embeddings for an audio file and save them directly to disk, use `process_audio_file`

```python
import openl3

audio_filepath = '/path/to/file.wav'

# Save the embedding to '/path/to/file.npz'
openl3.process_audio_file(audio_filepath)
```

Load the embeddings from disk:

```python
import numpy as np

data = np.load('/path/to/file.npz')
emb, ts = data['embedding'], data['timestamps']
```

### Choosing an Audio Frontend (CPU / GPU)

OpenL3 provides two different audio frontends to choose from:

1. Kapre (GPU)
2. Librosa (CPU)
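
The choice between the two frontends can be sketched as follows. This is a minimal illustration, assuming the installed OpenL3 version accepts a `frontend` keyword argument with the values `'kapre'` and `'librosa'`; check the documentation for your version before relying on it. The helper names `choose_frontend` and `embed_with_frontend`, and the `has_gpu` flag, are hypothetical and not part of OpenL3.

```python
def choose_frontend(has_gpu: bool) -> str:
    """Map hardware availability to a frontend name: Kapre for GPU, Librosa for CPU."""
    return "kapre" if has_gpu else "librosa"


def embed_with_frontend(audio, sr, has_gpu: bool):
    """Compute audio embeddings with the frontend selected by choose_frontend.

    Assumption: openl3.get_audio_embedding accepts a `frontend` keyword.
    """
    # Imported lazily so choose_frontend can be used without OpenL3 installed.
    import openl3

    return openl3.get_audio_embedding(audio, sr, frontend=choose_frontend(has_gpu))
```

On a CPU-only machine, one would call `embed_with_frontend(audio, sr, has_gpu=False)` with `audio` and `sr` read via `sf.read`, as in the earlier examples.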