huggingface
===
###### tags: `huggingface`

<br>

[TOC]

<br>

## [Official] Push your dataset files
### CLI
```bash=
# Install the Hugging Face CLI
pip install -U "huggingface_hub[cli]"

# Login with your Hugging Face credentials
huggingface-cli login

# Push your dataset files
huggingface-cli upload tsungjung411/snapshots_for_public . --repo-type=dataset
```

<br>

### Python
```python=
import os

from huggingface_hub import HfApi

# Read the access token from the HF_TOKEN environment variable
api = HfApi(token=os.getenv("HF_TOKEN"))
api.upload_folder(
    folder_path="/path/to/local/dataset",
    repo_id="tsungjung411/snapshots_for_public",
    repo_type="dataset",
)
```

<br>

### HTTPS
```bash=
# Make sure git-lfs is installed (https://git-lfs.com)
git lfs install

git remote add origin https://huggingface.co/datasets/tsungjung411/snapshots_for_public

# You'll be prompted for your HF credentials
git push -u origin main
```

<br>

### SSH
```bash=
# Make sure git-lfs is installed (https://git-lfs.com)
git lfs install

git remote add origin git@hf.co:datasets/tsungjung411/snapshots_for_public

# Make sure SSH key is set in your user settings (https://huggingface.co/settings/keys)
git push -u origin main
```

<br>

## Gated user access
### Intro
- [Gated models](https://huggingface.co/docs/hub/models-gated)

### Settings -> Gated user access
- ### disabled
    [![](https://hackmd.io/_uploads/Bk44rlZ0kx.png)](https://hackmd.io/_uploads/Bk44rlZ0kx.png)

- ### enabled
    [![](https://hackmd.io/_uploads/HkJwrWZ0yl.png)](https://hackmd.io/_uploads/HkJwrWZ0yl.png)
    - Automatic approval
    - Manual review
    - Notifications frequency
        - Once a day
        - Real-time

### Accessible?
[![](https://hackmd.io/_uploads/ry2MmWWRJx.png)](https://hackmd.io/_uploads/ry2MmWWRJx.png)

:::info
### You need to agree to share your contact information to access this dataset
- This repository is publicly accessible, but you have to accept the conditions to access its files and content.
- By agreeing you accept to share your contact information (email and username) with the repository authors.
[ ] Agree and send request to access repo
:::

:::info
**Gated dataset**
You can list files but not access them
:::

<br>

### access / requested
[![](https://hackmd.io/_uploads/rytB4WbC1g.png)](https://hackmd.io/_uploads/rytB4WbC1g.png)

:::info
### You need to agree to share your contact information to access this dataset
- This repository is publicly accessible, but you have to accept the conditions to access its files and content.
- **Your request to access this repository has been submitted and is awaiting a review from the repository authors. You can check the status of all your access requests in [your settings](https://huggingface.co/settings/gated-repos).**
:::

- ### your settings
    > https://huggingface.co/settings/gated-repos

    [![](https://hackmd.io/_uploads/rJ564-ZCkg.png)](https://hackmd.io/_uploads/rJ564-ZCkg.png)

- ### Notification received
    ![](https://hackmd.io/_uploads/H1zaBWZCye.png)

:::info
tj-tsai has requested access to your dataset tsungjung411/snapshots_for_public on huggingface.co.
Visit your [repo settings](https://huggingface.co/datasets/tsungjung411/snapshots_for_public/settings) to approve or reject their request.
:::

- ### repo settings
    > https://huggingface.co/datasets/tsungjung411/snapshots_for_public/settings

    ![](https://hackmd.io/_uploads/SJePIbWCyx.png)
    - Manage access requests
        - **pending**
            [![](https://hackmd.io/_uploads/S1HDhWWRkl.png)](https://hackmd.io/_uploads/S1HDhWWRkl.png)
        - **accepted**
            [![](https://hackmd.io/_uploads/rJBxh--0yx.png)](https://hackmd.io/_uploads/rJBxh--0yx.png)

<br>

## DEMO
### Install the package
```
pip install huggingface_hub
```

### Common errors
- ### Token is invalid or has expired
    HfHubHTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/api/datasets/tsungjung411/snapshots_for_public/tree/main?recursive=True&expand=False (Request ID: Root=1-67f4973f-5f3e4f423eb118f807570ff8;31a7d269-9444-4cc5-98cf-2a1d8725ead7)

    Invalid credentials in Authorization header

- ### Repo does not exist
    Repository Not Found for url: https://huggingface.co/api/datasets/tsungjung411/snapshots_for_public/tree/main?recursive=True&expand=False.
    Please make sure you specified the correct `repo_id` and `repo_type`.
    If you are trying to access a private or gated repo, make sure you are authenticated.
    For more details, see https://huggingface.co/docs/huggingface_hub/authentication

<br>

### List the repo files
- ### Method 1
    ```python=
    # pip install huggingface_hub
    import os

    from huggingface_hub import HfApi

    # Read the token from the environment instead of hardcoding it in the note
    hf_token = os.getenv("HF_TOKEN")
    repo_id = "tsungjung411/snapshots_for_public"

    api = HfApi(token=hf_token)
    files = api.list_repo_files(repo_id, repo_type="dataset")

    print("Files in repo:")
    for file in files:
        print(f"- {file}")
    ```
    ![image](https://hackmd.io/_uploads/rJolDff01e.png)

- ### Method 2
    ```python=
    # pip install huggingface_hub
    import os

    from huggingface_hub import list_repo_files

    # Read the token from the environment instead of hardcoding it in the note
    hf_token = os.getenv("HF_TOKEN")
    repo_id = "tsungjung411/snapshots_for_public"

    files = list_repo_files(repo_id, token=hf_token, repo_type="dataset")

    print("Files in repo:")
    for file in files:
        print(f"- {file}")
    ```
    ![image](https://hackmd.io/_uploads/rJolDff01e.png)

<br>

### Download a file from the repo
```python=
# pip install huggingface_hub
import os

from huggingface_hub import hf_hub_download

# Read the token from the environment instead of hardcoding it in the note
hf_token = os.getenv("HF_TOKEN")
repo_id = "tsungjung411/snapshots_for_public"
filename = "20250407-snapshot.zip"

# File URL: https://huggingface.co/datasets/{repo_owner}/{repo_name}/resolve/main/{filename}
local_file_path = hf_hub_download(
    token=hf_token,
    repo_type="dataset",
    repo_id=repo_id,
    filename=filename)

print(f"Downloaded file at: {local_file_path}")
```

- ### Common parameters
    - Specify the download folder
        `local_dir="./downloads"`
        Default: `~/.cache/huggingface/hub/`

- ### Download result
    ![](https://hackmd.io/_uploads/ryB8kXfCkl.png)

- ### Common error 1: gated status: awaiting a review
    GatedRepoError: 403 Client Error. (Request ID: Root=1-67f49d0f-7dea010d736c2d6c7d07cdf4;0dd87351-58b8-4f19-a3da-c47b4997a638)

    Cannot access gated repo for url https://huggingface.co/datasets/tsungjung411/snapshots_for_public/resolve/main/20250407-snapshot.zip.
    Your request to access dataset tsungjung411/snapshots_for_public is awaiting a review from the repo authors.

- ### Common error 2: gated status: rejected
    GatedRepoError: 403 Client Error.
    (Request ID: Root=1-67f49ce8-4e415e33145a54cd1d759e43;f143bc45-5a0d-4959-bc56-939371948df9)

    Cannot access gated repo for url https://huggingface.co/datasets/tsungjung411/snapshots_for_public/resolve/main/20250407-snapshot.zip.
    Your request to access dataset tsungjung411/snapshots_for_public has been rejected by the repo's authors.

<br>

{%hackmd vaaMgNRPS4KGJDSFG0ZE0w %}
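
<br>

### Handling the common errors in code
The error cases listed above (invalid token, missing repo, gated access pending or rejected) can be told apart by exception type. The following is a minimal sketch, not the official recipe: it assumes the token is stored in the `HF_TOKEN` environment variable, and `get_hf_token` and `download_snapshot` are hypothetical helper names introduced here for illustration. The exception classes are imported from `huggingface_hub.utils`.

```python
import os
from typing import Optional


def get_hf_token(env_var: str = "HF_TOKEN") -> str:
    """Return the access token stored in an environment variable (hypothetical helper)."""
    token = os.getenv(env_var, "").strip()
    if not token:
        raise RuntimeError(f"Set the {env_var} environment variable to a valid access token.")
    return token


def download_snapshot(repo_id: str, filename: str,
                      local_dir: Optional[str] = None) -> Optional[str]:
    """Download one file from a dataset repo; return its local path, or None on failure."""
    # Imported here so that get_hf_token() stays usable without the library installed.
    from huggingface_hub import hf_hub_download
    from huggingface_hub.utils import (
        GatedRepoError,
        HfHubHTTPError,
        RepositoryNotFoundError,
    )

    try:
        return hf_hub_download(
            repo_id=repo_id,
            filename=filename,
            repo_type="dataset",
            token=get_hf_token(),
            local_dir=local_dir,  # None keeps the default cache location
        )
    except GatedRepoError as err:
        # GatedRepoError is a subclass of RepositoryNotFoundError, so catch it first.
        print(f"Gated repo: access is pending or was rejected.\n{err}")
    except RepositoryNotFoundError as err:
        print(f"Repo not found, or the token cannot see it.\n{err}")
    except HfHubHTTPError as err:
        # Any other HTTP failure, such as the 401 shown above for invalid credentials.
        print(f"HTTP error from the Hub.\n{err}")
    return None
```

Usage would look like `download_snapshot("tsungjung411/snapshots_for_public", "20250407-snapshot.zip")`. Returning `None` instead of re-raising is only a convenience for this sketch; a script that must stop on failure could re-raise after printing.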