huggingface
===
###### tags: `huggingface`
<br>
[TOC]
<br>
## [Official] Push your dataset files
### CLI
```bash=
# Install the Hugging Face CLI
pip install -U "huggingface_hub[cli]"
# Login with your Hugging Face credentials
huggingface-cli login
# Push your dataset files
huggingface-cli upload tsungjung411/snapshots_for_public . --repo-type=dataset
```
<br>
### Python
```python=
import os

from huggingface_hub import HfApi

# The token is read from the HF_TOKEN environment variable
api = HfApi(token=os.getenv("HF_TOKEN"))
api.upload_folder(
    folder_path="/path/to/local/dataset",
    repo_id="tsungjung411/snapshots_for_public",
    repo_type="dataset",
)
```
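To push a single file instead of a whole folder, `HfApi.upload_file` can be used. The sketch below assumes snapshots are named `YYYYMMDD-snapshot.zip` (inferred from the file `20250407-snapshot.zip` used later in this note). `snapshot_name` and `upload_snapshot` are hypothetical helpers, and the `api` parameter only exists so a stub can stand in for `HfApi` when testing.

```python
import os
from datetime import date


def snapshot_name(day: date) -> str:
    """Build a name such as 20250407-snapshot.zip (assumed naming scheme)."""
    return f"{day:%Y%m%d}-snapshot.zip"


def upload_snapshot(local_path: str, day: date,
                    repo_id: str = "tsungjung411/snapshots_for_public",
                    api=None) -> str:
    """Upload one local file to the dataset repo under its dated name."""
    if api is None:
        # Imported lazily so snapshot_name works without huggingface_hub installed
        from huggingface_hub import HfApi
        api = HfApi(token=os.getenv("HF_TOKEN"))
    path_in_repo = snapshot_name(day)
    api.upload_file(
        path_or_fileobj=local_path,
        path_in_repo=path_in_repo,
        repo_id=repo_id,
        repo_type="dataset",
    )
    return path_in_repo
```

For example, `upload_snapshot("/path/to/local/snapshot.zip", date.today())` would upload the file as today's snapshot.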
<br>
### HTTPS
```bash=
# Make sure git-lfs is installed (https://git-lfs.com)
git lfs install
git remote add origin https://huggingface.co/datasets/tsungjung411/snapshots_for_public
# You'll be prompted for your HF credentials
git push -u origin main
```
<br>
### SSH
```bash=
# Make sure git-lfs is installed (https://git-lfs.com)
git lfs install
git remote add origin git@hf.co:datasets/tsungjung411/snapshots_for_public
# Make sure SSH key is set in your user settings (https://huggingface.co/settings/keys)
git push -u origin main
```
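The HTTPS and SSH remotes above differ only in their prefix. A hypothetical helper can derive either one from a `repo_id`, assuming (as both examples show) that dataset repos live under `datasets/`, and that on the Hub model repos have no prefix while Spaces use `spaces/`.

```python
def hf_remote_url(repo_id: str, repo_type: str = "dataset",
                  protocol: str = "https") -> str:
    """Return the git remote URL for a Hub repo (hypothetical helper)."""
    prefixes = {"model": "", "dataset": "datasets/", "space": "spaces/"}
    if repo_type not in prefixes:
        raise ValueError(f"unknown repo_type: {repo_type!r}")
    path = prefixes[repo_type] + repo_id
    if protocol == "https":
        return f"https://huggingface.co/{path}"
    if protocol == "ssh":
        return f"git@hf.co:{path}"
    raise ValueError(f"unknown protocol: {protocol!r}")


# Print the two remotes used in the HTTPS and SSH sections
for proto in ("https", "ssh"):
    print(hf_remote_url("tsungjung411/snapshots_for_public", protocol=proto))
```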
<br>
## Gated user access
### Intro
- [Gated models](https://huggingface.co/docs/hub/models-gated)
### Settings -> Gated user access
- ### disabled
  ![](https://hackmd.io/_uploads/Bk44rlZ0kx.png)
- ### enabled
  ![](https://hackmd.io/_uploads/HkJwrWZ0yl.png)
  - Automatic approval
  - Manual review
  - Notifications frequency
    - Once a day
    - Real-time
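The same setting can also be changed from Python. This sketch assumes a recent `huggingface_hub` release that provides `HfApi.update_repo_settings` with a `gated` argument (`"auto"` for automatic approval, `"manual"` for manual review, `False` to disable); the notification frequency is not covered. `set_gating` and the label mapping are hypothetical, and `api` is injectable for testing.

```python
import os

# Map the options shown in Settings -> Gated user access to API values (assumed)
GATED_MODES = {
    "disabled": False,
    "automatic approval": "auto",
    "manual review": "manual",
}


def set_gating(mode: str, repo_id: str = "tsungjung411/snapshots_for_public",
               api=None):
    """Change the gating mode of a dataset repo (needs a write token)."""
    gated = GATED_MODES[mode.lower()]
    if api is None:
        from huggingface_hub import HfApi
        api = HfApi(token=os.getenv("HF_TOKEN"))
    api.update_repo_settings(repo_id=repo_id, gated=gated, repo_type="dataset")
    return gated
```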
### Accessible?
![](https://hackmd.io/_uploads/ry2MmWWRJx.png)
:::info
### You need to agree to share your contact information to access this dataset
- This repository is publicly accessible, but you have to accept the conditions to access its files and content.
- By agreeing you accept to share your contact information (email and username) with the repository authors.
[ ] Agree and send request to access repo
:::
:::info
**Gated dataset** You can list files but not access them
:::
<br>
### access / requested
![](https://hackmd.io/_uploads/rytB4WbC1g.png)
:::info
### You need to agree to share your contact information to access this dataset
- This repository is publicly accessible, but you have to accept the conditions to access its files and content.
- **Your request to access this repository has been submitted and is awaiting a review from the repository authors. You can check the status of all your access requests in [your settings](https://huggingface.co/settings/gated-repos).**
:::
- ### your settings
> https://huggingface.co/settings/gated-repos
  ![](https://hackmd.io/_uploads/rJ564-ZCkg.png)
- ### Notification received

:::info
tj-tsai has requested access to your dataset tsungjung411/snapshots_for_public on huggingface.co.
Visit your [repo settings](https://huggingface.co/datasets/tsungjung411/snapshots_for_public/settings) to approve or reject their request.
:::
- ### repo settings
> https://huggingface.co/datasets/tsungjung411/snapshots_for_public/settings

  - Manage access requests
    - **pending**
      ![](https://hackmd.io/_uploads/S1HDhWWRkl.png)
    - **accepted**
      ![](https://hackmd.io/_uploads/rJBxh--0yx.png)
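Pending requests can also be handled from Python instead of the settings page. Recent `huggingface_hub` releases provide `HfApi.list_pending_access_requests` and `HfApi.accept_access_request` (plus counterparts such as `reject_access_request`). The sketch below accepts only users on a hypothetical allow-list; `approve_allowed` is a hypothetical helper and `api` is injectable so a stub can replace `HfApi` in tests.

```python
import os


def approve_allowed(allowed_users, repo_id="tsungjung411/snapshots_for_public",
                    api=None):
    """Accept pending access requests whose username is in allowed_users."""
    if api is None:
        from huggingface_hub import HfApi
        api = HfApi(token=os.getenv("HF_TOKEN"))
    approved = []
    for request in api.list_pending_access_requests(repo_id, repo_type="dataset"):
        if request.username in allowed_users:
            api.accept_access_request(repo_id, request.username, repo_type="dataset")
            approved.append(request.username)
    return approved
```

Requests from users not on the list stay pending, so they can still be reviewed manually.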
<br>
## DEMO
### Install the package
```bash=
pip install huggingface_hub
```
### Common errors
- ### Invalid or expired token:
HfHubHTTPError: 401 Client Error: Unauthorized for url: https://huggingface.co/api/datasets/tsungjung411/snapshots_for_public/tree/main?recursive=True&expand=False (Request ID: Root=1-67f4973f-5f3e4f423eb118f807570ff8;31a7d269-9444-4cc5-98cf-2a1d8725ead7)
Invalid credentials in Authorization header
- ### Repository does not exist
Repository Not Found for url: https://huggingface.co/api/datasets/tsungjung411/snapshots_for_public/tree/main?recursive=True&expand=False.
Please make sure you specified the correct `repo_id` and `repo_type`.
If you are trying to access a private or gated repo, make sure you are authenticated. For more details, see https://huggingface.co/docs/huggingface_hub/authentication
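The two messages above call for different fixes. A hypothetical heuristic that maps an error's text to a hint, based only on the messages quoted in this note. Matching on message text is fragile; in real code, catching the exception classes from `huggingface_hub` is the sturdier option.

```python
def diagnose(message: str) -> str:
    """Suggest a fix for a Hub error message (text-based heuristic)."""
    # Checked first: a not-found error can also carry a 401 status for private repos
    if "Repository Not Found" in message:
        return "Check repo_id and repo_type, and authenticate for private or gated repos."
    if "Invalid credentials" in message or "401 Client Error" in message:
        return "Token is wrong or expired: create a new one and update HF_TOKEN."
    return "Unrecognized error: see the full traceback."
```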
<br>
### List files in a repo
- ### Method 1
```python=
# pip install huggingface_hub
import os

from huggingface_hub import HfApi

# Read the token from the environment instead of hard-coding it
hf_token = os.getenv("HF_TOKEN")
repo_id = "tsungjung411/snapshots_for_public"
api = HfApi(token=hf_token)
files = api.list_repo_files(repo_id, repo_type="dataset")
print("Files in repo:")
for file in files:
    print(f"- {file}")
```

- ### Method 2
```python=
# pip install huggingface_hub
import os

from huggingface_hub import list_repo_files

# Read the token from the environment instead of hard-coding it
hf_token = os.getenv("HF_TOKEN")
repo_id = "tsungjung411/snapshots_for_public"
files = list_repo_files(repo_id, token=hf_token, repo_type="dataset")
print("Files in repo:")
for file in files:
    print(f"- {file}")
```
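Since `list_repo_files` returns a flat list of paths, standard-library tools can filter its result. A sketch that keeps only files matching a shell-style pattern; `matching_files` is a hypothetical helper and the `api` parameter exists only so a stub can be injected.

```python
import fnmatch
import os


def matching_files(pattern, repo_id="tsungjung411/snapshots_for_public", api=None):
    """Return repo file paths matching a glob pattern such as '*.zip', sorted."""
    if api is None:
        from huggingface_hub import HfApi
        api = HfApi(token=os.getenv("HF_TOKEN"))
    files = api.list_repo_files(repo_id, repo_type="dataset")
    return sorted(fnmatch.filter(files, pattern))
```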

<br>
### Download a file from the repo
```python=
# pip install huggingface_hub
import os

from huggingface_hub import hf_hub_download

hf_token = os.getenv("HF_TOKEN")
repo_id = "tsungjung411/snapshots_for_public"
filename = "20250407-snapshot.zip"
# Resolved URL: https://huggingface.co/datasets/{repo_id}/resolve/main/{filename}
local_file_path = hf_hub_download(
    token=hf_token, repo_type="dataset", repo_id=repo_id, filename=filename)
print(f"Downloaded file at: {local_file_path}")
```
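If snapshots keep being added, the newest one can be picked from the file list before downloading. Assuming every snapshot follows the `YYYYMMDD-snapshot.zip` naming used above, the date prefix sorts lexicographically, so `max` finds the latest. `latest_snapshot` and `download_latest` are hypothetical helpers.

```python
import os
import re

# Assumed naming scheme: 8-digit date followed by -snapshot.zip
SNAPSHOT_RE = re.compile(r"^\d{8}-snapshot\.zip$")


def latest_snapshot(files):
    """Return the newest YYYYMMDD-snapshot.zip name, or None if there is none."""
    snapshots = [f for f in files if SNAPSHOT_RE.match(f)]
    return max(snapshots) if snapshots else None


def download_latest(repo_id="tsungjung411/snapshots_for_public", local_dir="./downloads"):
    """Download the newest snapshot into local_dir and return its local path."""
    from huggingface_hub import hf_hub_download, list_repo_files

    token = os.getenv("HF_TOKEN")
    filename = latest_snapshot(list_repo_files(repo_id, token=token, repo_type="dataset"))
    if filename is None:
        raise FileNotFoundError(f"no snapshot found in {repo_id}")
    return hf_hub_download(repo_id=repo_id, filename=filename, repo_type="dataset",
                           token=token, local_dir=local_dir)
```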
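- ### Common parameters
  - Specify the download directory:
    `local_dir="./downloads"`
    If omitted, the file is stored in the cache, by default `~/.cache/huggingface/hub/`.
- ### Download result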

- ### Common error 1: gated status: awaiting a review
GatedRepoError: 403 Client Error. (Request ID: Root=1-67f49d0f-7dea010d736c2d6c7d07cdf4;0dd87351-58b8-4f19-a3da-c47b4997a638)
Cannot access gated repo for url https://huggingface.co/datasets/tsungjung411/snapshots_for_public/resolve/main/20250407-snapshot.zip.
Your request to access dataset tsungjung411/snapshots_for_public is awaiting a review from the repo authors.
- ### Common error 2: gated status: rejected
GatedRepoError: 403 Client Error. (Request ID: Root=1-67f49ce8-4e415e33145a54cd1d759e43;f143bc45-5a0d-4959-bc56-939371948df9)
Cannot access gated repo for url https://huggingface.co/datasets/tsungjung411/snapshots_for_public/resolve/main/20250407-snapshot.zip.
Your request to access dataset tsungjung411/snapshots_for_public has been rejected by the repo's authors.
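While a request is still awaiting review, retrying later is reasonable; once it has been rejected, retrying is pointless. A sketch of that policy, keyed on the two message texts above. The retry count, the interval, and the `download` callable are assumptions; in practice `download` could wrap the `hf_hub_download` call shown earlier.

```python
import time


def download_when_approved(download, retries=3, interval=60.0):
    """Call download() until it succeeds, retrying only while the request is pending."""
    for attempt in range(retries):
        try:
            return download()
        except Exception as err:  # e.g. huggingface_hub's GatedRepoError
            message = str(err)
            if "has been rejected" in message:
                raise RuntimeError("Access request rejected; contact the repo authors.") from err
            # Re-raise unrelated errors, or give up after the last attempt
            if "awaiting a review" not in message or attempt == retries - 1:
                raise
            time.sleep(interval)  # still pending: wait before trying again
```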
<br>
{%hackmd vaaMgNRPS4KGJDSFG0ZE0w %}