activeloopai/Hub

https://github.com/activeloopai/Hub
This project grew out of the observation that in typical ML/AI work relatively little time goes into modeling; most of it goes into data preprocessing. A simple example:
https://debomastet335.medium.com/exploring-hub-activeloop-a-tale-of-uploading-pokemon-data-568b0028f7b3

The most obvious feature is that a few lines of code are enough to load a dataset into TensorFlow or PyTorch:
https://app.activeloop.ai/datasets/popular/?utm_source=github&utm_medium=repo&utm_campaign=readme
Datasets can also be inspected visually:
https://app.activeloop.ai/dataset/activeloop/mnist

It also offers a repository-like way to share data.

In one of the issues, https://github.com/activeloopai/Hub/issues/427,
which concerns strict safety control, I think the format of video or audio data can be checked with the following approach:
https://gist.github.com/x213212/38f998c51e4ddd19309ef9a881c400e2

type checker

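ffmpeg needs to be installed (the code below calls the ffprobe command-line tool, so ffprobe must be available on the PATH):
pip install FFmpeg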

Either mutagen or ffprobe can be used to check the data format:

https://mutagen.readthedocs.io/en/latest/user/gettingstarted.html
https://stackoverflow.com/questions/53144494/get-duration-from-multiple-video-files

So it should be possible to add a type checker there:

    import subprocess


    def type_check(filepath):
        """Probe a single path, or every path in a list."""
        print(type(filepath))
        print("-----------------")
        if isinstance(filepath, list):
            for x in filepath:
                probe_file(x)
        else:
            probe_file(filepath)


    def probe_file(filename):
        """Print the container/format metadata that ffprobe reports for a file."""
        cmnd = ['ffprobe', '-show_format', '-pretty', '-loglevel', 'quiet', filename]
        p = subprocess.Popen(cmnd, stdout=subprocess.PIPE, stderr=subprocess.PIPE)
        print(filename)
        out, err = p.communicate()
        print("==========output==========")
        # ffprobe returns bytes; decode before splitting into lines
        for line in out.decode(errors="replace").splitlines():
            print(line)
        if err:
            print("========= error ========")
            print(err.decode(errors="replace"))
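
The probe above prints ffprobe's text output. As a possible variant, the sketch below asks ffprobe for JSON (`-print_format json`) and returns the `format` section as a dict, which is easier to check programmatically. The function names `build_ffprobe_cmd`, `parse_format`, and `probe_format` are hypothetical and not part of Hub, and the sample JSON string at the bottom is hand-written for illustration, not captured output.

```python
import json
import shutil
import subprocess


def build_ffprobe_cmd(filename):
    """Build an ffprobe command that emits container metadata as JSON."""
    return ["ffprobe", "-v", "quiet", "-print_format", "json",
            "-show_format", filename]


def parse_format(ffprobe_json):
    """Return the 'format' section of ffprobe's JSON output, or {} if absent."""
    try:
        data = json.loads(ffprobe_json)
    except json.JSONDecodeError:
        return {}
    if not isinstance(data, dict):
        return {}
    return data.get("format", {})


def probe_format(filename):
    """Run ffprobe on a file; return its format dict, or None if unavailable."""
    if shutil.which("ffprobe") is None:
        return None  # ffprobe is not installed or not on PATH
    result = subprocess.run(build_ffprobe_cmd(filename),
                            capture_output=True, text=True)
    if result.returncode != 0:
        return None  # ffprobe could not read the file
    return parse_format(result.stdout)


if __name__ == "__main__":
    # Hand-written sample in the shape of ffprobe's JSON output (illustrative only)
    sample = '{"format": {"format_name": "wav", "duration": "1.000000"}}'
    print(parse_format(sample)["format_name"])
```

Returning a dict (or None on failure) instead of printing lets the caller decide what to do with a file that ffprobe cannot identify.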

        for key, value in schema.items():
            if type(value).__name__ == "Audio":
                print(type(value).__name__)
                type_check(ds)

This way we can get the true data format information.
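
Once the real format is known, a type checker could compare it against what the schema expects. A minimal sketch, assuming the value comes from ffprobe's `format_name` field, which may list several comma-separated names for one container family (for example `mov,mp4,m4a,3gp,3g2,mj2`). The helper name `is_allowed_format` and the allow-list `ALLOWED_AUDIO` are hypothetical and not part of Hub.

```python
def is_allowed_format(format_name, allowed):
    """Return True if any name in ffprobe's format_name is in the allowed set."""
    detected = {name.strip().lower()
                for name in format_name.split(",") if name.strip()}
    return bool(detected & {a.lower() for a in allowed})


# Hypothetical allow-list for an Audio field
ALLOWED_AUDIO = {"wav", "mp3", "flac", "ogg"}

print(is_allowed_format("wav", ALLOWED_AUDIO))
print(is_allowed_format("mov,mp4,m4a,3gp,3g2,mj2", ALLOWED_AUDIO))
```

Matching on set intersection rather than string equality means a file is accepted if any of the names ffprobe reports for its container is allowed.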
