# activeloopai/Hub
https://github.com/activeloopai/Hub
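This project grew out of the observation that in ML/AI work relatively little time goes into modeling; most of it goes into data preprocessing. A simple example is walked through here: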
https://debomastet335.medium.com/exploring-hub-activeloop-a-tale-of-uploading-pokemon-data-568b0028f7b3
At its simplest, the project lets you load a dataset into TensorFlow or PyTorch with a few lines of code:
https://app.activeloop.ai/datasets/popular/?utm_source=github&utm_medium=repo&utm_campaign=readme
Datasets can also be inspected visually:
https://app.activeloop.ai/dataset/activeloop/mnist
![](https://i.imgur.com/TkD3Myd.png)
It also offers a repository-like way to share data:
![](https://i.imgur.com/RmhoCho.png)
One of the issues, https://github.com/activeloopai/Hub/issues/427,
asks for strict security control. I think the approach below could be used to check the video or audio data format:
https://gist.github.com/x213212/38f998c51e4ddd19309ef9a881c400e2
# type checker
> Requires ffmpeg (the code below calls the `ffprobe` binary, so it must be on the PATH):
> pip install FFmpeg
>
Either mutagen or ffprobe can be used to check the data format:
https://mutagen.readthedocs.io/en/latest/user/gettingstarted.html
https://stackoverflow.com/questions/53144494/get-duration-from-multiple-video-files
So a type checker could be added here:
![](https://i.imgur.com/9OpKdOL.png)
```python
import subprocess


def type_check(filepath):
    """Probe a single path, or every path in a list."""
    print(type(filepath))
    print("-----------------")
    if isinstance(filepath, list):
        for x in filepath:
            probe_file(x)
    else:
        probe_file(filepath)


def probe_file(filename):
    """Print the container-level format info that ffprobe reports for a file."""
    cmnd = ['ffprobe', '-show_format', '-pretty', '-loglevel', 'quiet', filename]
    p = subprocess.Popen(cmnd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
    print(filename)
    out, err = p.communicate()
    print("==========output==========")
    # Decode the bytes first so the output splits on real newlines
    for line in out.decode(errors="replace").splitlines():
        print(line)
    if err:
        print("========= error ========")
        print(err.decode(errors="replace"))
```
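`probe_file` only prints what ffprobe reports. To act on the result, the `[FORMAT]` section that `-show_format` emits (plain `key=value` lines) can be parsed into a dict. This is a minimal sketch: `parse_ffprobe_format` is a hypothetical helper, and the sample text is hand-written for illustration, not real ffprobe output.

```python
def parse_ffprobe_format(text):
    """Parse the [FORMAT] section of ffprobe's default output into a dict."""
    info = {}
    in_section = False
    for line in text.splitlines():
        line = line.strip()
        if line == "[FORMAT]":
            in_section = True
        elif line == "[/FORMAT]":
            in_section = False
        elif in_section and "=" in line:
            # Split on the first "=" only, since values may contain "="
            key, _, value = line.partition("=")
            info[key] = value
    return info


# Hand-written sample in the shape ffprobe prints (illustrative values)
sample = """[FORMAT]
filename=clip.mp3
nb_streams=1
format_name=mp3
duration=0:00:03.030000
[/FORMAT]"""

info = parse_ffprobe_format(sample)
print(info["format_name"])  # mp3
```

With this in place, `probe_file` could return `parse_ffprobe_format(out.decode())` instead of printing each line.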
![](https://i.imgur.com/koUq8UE.png)
```python
# `schema` and `ds` are assumed to be defined by the surrounding Hub code
for key, value in schema.items():
    if type(value).__name__ == "Audio":
        print(type(value).__name__)
        type_check(ds)
```
This gives us the file's true format information:
![](https://i.imgur.com/rgr0sCG.png)
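Once the real format is known, the check itself can be a comparison against an allow-list. The sketch below assumes the format info is already available as a dict with a `format_name` key. `AUDIO_FORMATS` and `is_allowed_format` are hypothetical names, and the allow-list is only an example. ffprobe can report several comma-separated names for one container (for example `mov,mp4,m4a,3gp,3g2,mj2`), so the value is split before comparing.

```python
# Hypothetical allow-list for an "Audio" schema entry; adjust as needed
AUDIO_FORMATS = {"mp3", "wav", "flac", "ogg"}


def is_allowed_format(info, allowed=AUDIO_FORMATS):
    """Return True if any reported format name is in the allow-list."""
    raw = info.get("format_name", "")
    names = {name.strip() for name in raw.split(",") if name.strip()}
    return bool(names & set(allowed))


print(is_allowed_format({"format_name": "mp3"}))                      # True
print(is_allowed_format({"format_name": "mov,mp4,m4a,3gp,3g2,mj2"}))  # False
```

A file whose extension says `.mp3` but whose probed format is not in the list would be rejected, which is the point of checking the content rather than the file name.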