# activeloopai/Hub

https://github.com/activeloopai/Hub

This project mainly addresses the fact that in typical ML/AI work relatively little time goes into modeling and most of it goes into data preprocessing; the project grew out of that problem. It includes a simple example:

https://debomastet335.medium.com/exploring-hub-activeloop-a-tale-of-uploading-pokemon-data-568b0028f7b3

At its simplest, the project lets you load a dataset into TensorFlow or PyTorch with a few lines of code.

https://app.activeloop.ai/datasets/popular/?utm_source=github&utm_medium=repo&utm_campaign=readme

Datasets can also be inspected visually:

https://app.activeloop.ai/dataset/activeloop/mnist

![](https://i.imgur.com/TkD3Myd.png)

It also offers a repository-like way to share data.

![](https://i.imgur.com/RmhoCho.png)

One of the issues asks for stricter checks:

https://github.com/activeloopai/Hub/issues/427

For that, I think the following method could be used to check the video or audio data format:

https://gist.github.com/x213212/38f998c51e4ddd19309ef9a881c400e2

# type checker

> Requires ffmpeg to be installed.
> pip install FFmpeg
> mutagen or ffprobe can be used to check the data format.

https://mutagen.readthedocs.io/en/latest/user/gettingstarted.html

https://stackoverflow.com/questions/53144494/get-duration-from-multiple-video-files

So it should be possible to add a type checker here:

![](https://i.imgur.com/9OpKdOL.png)

```python
import subprocess


def type_check(filepath):
    """Probe a single path, or every path in a list of paths."""
    print(type(filepath))
    print("-----------------")
    if isinstance(filepath, list):
        for path in filepath:
            probe_file(path)
    else:
        probe_file(filepath)


def probe_file(filename):
    """Print the container format information that ffprobe reports."""
    cmnd = ['ffprobe', '-show_format', '-pretty', '-loglevel', 'quiet', filename]
    p = subprocess.Popen(cmnd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    print(filename)
    out, err = p.communicate()
    print("==========output==========")
    # Decode the raw bytes rather than splitting str(out) on a literal "\\n".
    for line in out.decode(errors="replace").splitlines():
        print(line)
    if err:
        print("========= error ========")
        print(err.decode(errors="replace"))
```

![](https://i.imgur.com/koUq8UE.png)

```python
# `schema` and `ds` are defined in the surrounding code (not shown here).
for key, value in schema.items():
    if type(value).__name__ == "Audio":
        print(type(value).__name__)
        type_check(ds)
```

This gives us the file's actual format information.

![](https://i.imgur.com/rgr0sCG.png)
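
The probe above only prints what ffprobe reports. To turn it into an actual check, the `key=value` lines that `ffprobe -show_format` emits could be parsed and compared with the format the schema expects. The following is a minimal sketch under that assumption; `parse_format_section`, `format_matches`, and `probe_format` are hypothetical helper names used for illustration and are not part of Hub.

```python
import subprocess


def parse_format_section(text):
    """Turn ffprobe's `key=value` lines into a dict, skipping [FORMAT] markers."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("["):
            continue
        key, sep, value = line.partition("=")
        if sep:
            info[key] = value
    return info


def format_matches(info, expected):
    """True if `expected` is one of the comma-separated names in format_name."""
    names = info.get("format_name", "").split(",")
    return expected in names


def probe_format(filename):
    """Run ffprobe (assumed to be on PATH) and return the parsed format fields."""
    cmnd = ["ffprobe", "-show_format", "-loglevel", "quiet", filename]
    result = subprocess.run(cmnd, capture_output=True, text=True)
    return parse_format_section(result.stdout)
```

A call such as `format_matches(probe_format(path), "wav")` could then gate an upload. `format_name` is split on commas because ffprobe reports some containers as a list of names (for example `mov,mp4,m4a,3gp,3g2,mj2`).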