# plan-for-hackmd-airtable-sync
## command line tool
Here are two **implementation plans** for your locally running Next.js app: each (a) enriches every note with YAML front‑matter and (b) pushes notes + metadata into Airtable. I'll keep the UI light and focus on **API routes**, **file layout**, and **Airtable interactions**.
> **Key facts we’ll rely on**
>
> * HackMD API quotas: **2,000 calls/month** (10,000 on Prime) and **100 calls / 5 minutes**. ([HackMD][1])
> * Notes list & note-content endpoints (`GET /v1/notes`, `GET /v1/notes/:noteId`). ([HackMD][2])
> * “Download all notes” is available in Settings (manual ZIP). ([HackMD][3])
> * Historical **`/download`** path returns raw Markdown for a note URL (works mainly for **public/published** links; it’s undocumented—use as best-effort fallback only). ([GitHub][4])
> * Airtable REST API rate limit: **5 req/sec per base**; bulk update/upsert up to **10 records per request**. ([Airtable][5])
> * New interactive HackMD **Swagger docs** link (for endpoint shapes). ([HackMD][6])
---
## Shared groundwork (used by both plans)
**Environment variables**
```
HACKMD_TOKEN=... # Bearer token
HACKMD_TEAM_PATH=... # optional (e.g., '@bok-center'); empty => personal
NOTES_DIR=./content/hackmd # where enriched .md files land
AIRTABLE_TOKEN=... # PAT
AIRTABLE_BASE_ID=appXXXXXXXX # your Base
AIRTABLE_TABLE=Notes # table name (or tblXXXXXXXX)
```
**Directory layout**
```
/app/api/...
/lib/hackmd.ts # API wrapper + rate limit handling
/lib/files.ts # read/write .md, hashing, safe filenames
/lib/yaml.ts # add/merge YAML via gray-matter
/lib/airtable.ts # batched upsert (10 per request)
/content/hackmd/ # enriched markdown output
/export/manual/ # (Plan 2) unzipped manual download of .md files
/state/index.json # cache: { noteId -> { lastChangedAt, sha256, path } }
```
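A minimal sketch of helpers for the `/state/index.json` cache, assuming the `{ noteId -> { lastChangedAt, sha256, path } }` shape above. The `/lib/state.ts` module is not in the layout; add it, or fold these into `/lib/files.ts`:
```ts
// /lib/state.ts — load/save the local sync cache (illustrative names).
import fs from 'node:fs/promises';
import path from 'node:path';

export interface IndexEntry {
  lastChangedAt: number | string; // as returned by the HackMD API
  sha256: string;                 // hash of the enriched file on disk
  path: string;                   // local path of the enriched file
}
export type StateIndex = Record<string, IndexEntry>;

const STATE_FILE = path.join(process.cwd(), 'state', 'index.json');

export async function loadIndex(): Promise<StateIndex> {
  try {
    return JSON.parse(await fs.readFile(STATE_FILE, 'utf8'));
  } catch {
    return {}; // first run, or the file doesn't exist yet
  }
}

export async function saveIndex(index: StateIndex): Promise<void> {
  await fs.mkdir(path.dirname(STATE_FILE), { recursive: true });
  await fs.writeFile(STATE_FILE, JSON.stringify(index, null, 2), 'utf8');
}
```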
**Airtable schema (suggested)**
Table **Notes** (primary field = `HackMD ID`):
* `HackMD ID` (single line) — **unique key**
* `Title` (single line)
* `Short ID` (single line)
* `Tags` (multiple select)
* `Created At` (date-time)
* `Last Changed At` (date-time)
* `Publish Type` (single select)
* `Read Permission` / `Write Permission` (single select)
* `Team Path` / `User Path` (single line)
* `Permalink` (URL), `Publish Link` (URL)
* `Local Path` (single line)
* `SHA256` (single line)
* `YAML` (long text)
* `Content` (long text) — optional; include if you want raw MD in Airtable
* `Status` (single select: New/Updated/Unchanged/Failed)
* `Last Sync At` (date-time)
Use **bulk upsert** (≤10 records/request) with `performUpsert` on field `HackMD ID`. ([Airtable][7])
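If you want the rows type-checked in TypeScript, a small interface can mirror the schema above (a sketch; the field names are the suggested column names and must match whatever you actually create in the base):
```ts
// /lib/airtable-types.ts — illustrative; keep in sync with your base's columns.
export interface NoteFields {
  'HackMD ID': string;            // unique key used for upsert
  'Title'?: string;
  'Short ID'?: string;
  'Tags'?: string[];              // multiple select
  'Created At'?: string;          // ISO date-time
  'Last Changed At'?: string;
  'Publish Type'?: string;
  'Read Permission'?: string;
  'Write Permission'?: string;
  'Team Path'?: string;
  'User Path'?: string;
  'Permalink'?: string;
  'Publish Link'?: string;
  'Local Path'?: string;
  'SHA256'?: string;
  'YAML'?: string;                // long text
  'Content'?: string;             // optional raw Markdown
  'Status'?: 'New' | 'Updated' | 'Unchanged' | 'Failed';
  'Last Sync At'?: string;
}
```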
**YAML front‑matter shape (example)**
```yaml
---
title: "<derived from content or API>"
tags: ["tag1", "tag2"]
created: "2025-01-11T15:31:56.000Z"
updated: "2025-03-07T19:14:22.000Z"
source: "hackmd"
slug: "<permalink or shortId>"
hackmd:
  id: "<noteId>"
  shortId: "<shortId>"
  readPermission: "owner|signed_in|guest"
  writePermission: "owner|signed_in|guest"
  publishType: "view|slide|book"
  publishedAt: "2024-... (optional)"
  permalink: "<string|null>"
  publishLink: "<string|null>"
  teamPath: "<@team or null>"
  userPath: "<userPath>"
  lastChangeUserName: "<string>"
  url: "https://hackmd.io/<shortId or permalink or id>"
file:
  path: "content/hackmd/<safe-title>--<shortId>.md"
  sha256: "<hash>"
  size: 12345
apiDownloadedAt: "2025-08-27T..."
---
```
(*HackMD titles are derived from content; don’t be surprised if the first `#` heading is the title.*) ([HackMD][2])
---
# Plan 1 — **List notes via API, try `/<...>/download` first, fall back to API content**
### Flow
1. **List metadata**
* Call `GET https://api.hackmd.io/v1/notes` (and optionally `GET /v1/teams/{teamPath}/notes` if you need a team; these older “Teams” list endpoints exist but are marked deprecated—prefer Swagger docs to confirm your variant). Cache `id`, `shortId`, `title`, `tags`, `createdAt`, `lastChangedAt`, `readPermission`, `publishType`, `permalink`, `publishLink`, `teamPath`, `userPath`. ([HackMD][2])
2. **For each note, attempt a direct Markdown download**
Build a **candidate download URL**:
* If `publishLink` exists (published): use `publishLink + '/download'`.
* Else if `readPermission==='guest'` and `shortId` exists: try `https://hackmd.io/{shortId}/download` (or `https://hackmd.io/@{userOrTeam}/{shortId}/download`).
* This is **best‑effort**, not guaranteed; if 4xx/other, fall back to step 3. ([GitHub][4])
3. **Fallback to API for content**
`GET https://api.hackmd.io/v1/notes/:noteId` → `.content` (raw MD). Rate‑limit to **≤100 calls per 5 minutes**. ([HackMD][1])
4. **Enrich with YAML**
Use `gray-matter` to parse/merge. Prepend or merge the YAML block above (avoid overwriting any existing YAML; nest HackMD‑specific fields under `hackmd:`). ([GitHub][8])
5. **Write to disk**
* File path: `${NOTES_DIR}/${safeTitle}--${shortId || id}.md`
* Compute `sha256` of the final file; store/update `/state/index.json` (`id -> {lastChangedAt, sha256, path}`) to avoid reprocessing unchanged notes (a change-detection sketch follows this flow).
6. **Upsert in Airtable**
Batch records in chunks of 10; include: `HackMD ID`, `Title`, `Short ID`, timestamps, perms, links, `Local Path`, `SHA256`, and optionally `YAML` + `Content`. Respect **5 req/sec**. ([Airtable][5])
7. **Attachments (optional)**
If you’ll need embedded images later, HackMD exposes an **Attachments API**—requests must hit `hackmd.io` (not `api.hackmd.io`) and **follow redirects**. Keep that in mind for a Phase 2. ([HackMD][9])
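A possible change-detection helper for step 5, building on the state-index sketch from the shared groundwork (`loadIndex`, `saveIndex`, and `StateIndex` are assumptions from that sketch, not existing code):
```ts
// Sketch: skip notes whose lastChangedAt hasn't advanced since the last run.
import { loadIndex, type StateIndex } from '@/lib/state';

// Return the cached index plus only the notes that are new or changed.
export async function filterChanged(notes: { id: string; lastChangedAt: number | string }[]) {
  const index = await loadIndex();
  const changed = notes.filter(n => {
    const prev = index[n.id];
    return !prev || String(prev.lastChangedAt) !== String(n.lastChangedAt);
  });
  return { index, changed };
}

// After a note is written to disk, record it so the next run can skip it.
export function markSynced(
  index: StateIndex,
  n: { id: string; lastChangedAt: number | string },
  outPath: string,
  hash: string
) {
  index[n.id] = { lastChangedAt: n.lastChangedAt, sha256: hash, path: outPath };
}
```
In the Plan 1 orchestrator you would call `filterChanged(notes)` right after `listNotes()`, loop only over `changed`, call `markSynced(...)` after each write, and `saveIndex(index)` once the run completes.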
### Next.js routes (suggested)
* `GET /api/hackmd/list` → returns note metadata cache.
* `POST /api/hackmd/sync-plan1` → orchestrates: list → download/fallback → YAML → write → Airtable.
* `POST /api/hackmd/rebuild-yaml` → re‑merges YAML for all local files (no network; sketched after the `sync-plan1` orchestrator below).
### Minimal code skeletons (TypeScript)
**`/lib/hackmd.ts`**
```ts
// fetch with Bearer auth; simple throttling (1 call / 3s ≈ 100 / 5 min)
const sleep = (ms: number) => new Promise(r => setTimeout(r, ms));

export async function listNotes() {
  const res = await fetch('https://api.hackmd.io/v1/notes', {
    headers: { Authorization: `Bearer ${process.env.HACKMD_TOKEN}` }
  });
  if (!res.ok) throw new Error(`List failed: ${res.status}`);
  return res.json();
}

export async function getNoteContent(noteId: string) {
  await sleep(3000);
  const res = await fetch(`https://api.hackmd.io/v1/notes/${noteId}`, {
    headers: { Authorization: `Bearer ${process.env.HACKMD_TOKEN}` }
  });
  if (!res.ok) throw new Error(`Get note failed: ${res.status}`);
  const j = await res.json();
  return j.content as string;
}

export function candidateDownloadUrls(n: any): string[] {
  const urls: string[] = [];
  if (n.publishLink) urls.push(`${n.publishLink}/download`);
  if (n.readPermission === 'guest' && n.shortId) {
    urls.push(`https://hackmd.io/${n.shortId}/download`);
    if (n.userPath || n.teamPath) {
      const owner = n.teamPath?.replace(/^@/, '') || n.userPath;
      urls.push(`https://hackmd.io/@${owner}/${n.shortId}/download`);
    }
  }
  return urls;
}
```
**`/lib/yaml.ts`**
```ts
import matter from 'gray-matter';
import crypto from 'node:crypto';

export function enrichMarkdown(rawMd: string, meta: any, outPath: string) {
  const parsed = matter(rawMd);
  const base = parsed.data || {};
  // merge without clobbering the user's existing keys
  const merged = {
    ...base,
    title: base.title ?? meta.title ?? parsed.content.match(/^#\s+(.*)$/m)?.[1],
    tags: base.tags ?? meta.tags ?? [],
    created: base.created ?? new Date(meta.createdAt).toISOString(),
    updated: new Date(meta.lastChangedAt).toISOString(),
    source: 'hackmd',
    slug: base.slug ?? meta.permalink ?? meta.shortId ?? meta.id,
    hackmd: { ...(base.hackmd || {}), ...pickHackmd(meta) },
    file: { ...(base.file || {}), path: outPath },
    apiDownloadedAt: new Date().toISOString()
  };
  return matter.stringify(parsed.content, merged);
}

function pickHackmd(m: any) { // trim to useful fields
  const { id, shortId, readPermission, writePermission, publishType,
          publishedAt, permalink, publishLink, teamPath, userPath,
          lastChangeUser } = m;
  return {
    id, shortId, readPermission, writePermission, publishType,
    publishedAt, permalink, publishLink, teamPath, userPath,
    lastChangeUserName: lastChangeUser?.name
  };
}

export function sha256(s: string) {
  return crypto.createHash('sha256').update(s).digest('hex');
}
```
**`/lib/airtable.ts`** (batched upsert)
```ts
const BASE = process.env.AIRTABLE_BASE_ID!;
const TABLE = process.env.AIRTABLE_TABLE!;
const API = `https://api.airtable.com/v0/${BASE}/${encodeURIComponent(TABLE)}`;

async function batchUpsert(records: any[]) {
  // Up to 10 per request; respect 5 req/sec
  const body = {
    performUpsert: { fieldsToMergeOn: ['HackMD ID'] },
    records: records.map(r => ({ fields: r }))
  };
  const res = await fetch(API, {
    method: 'PATCH',
    headers: {
      Authorization: `Bearer ${process.env.AIRTABLE_TOKEN}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(body)
  });
  if (!res.ok) throw new Error(`Airtable upsert failed ${res.status}`);
  return res.json();
}

export async function upsertInChunks(rows: any[]) {
  for (let i = 0; i < rows.length; i += 10) {
    await batchUpsert(rows.slice(i, i + 10));
    await new Promise(r => setTimeout(r, 220)); // ~4.5 req/sec
  }
}
```
**`/app/api/hackmd/sync-plan1/route.ts`** (orchestrator)
```ts
import { NextRequest, NextResponse } from 'next/server';
import fs from 'node:fs/promises';
import path from 'node:path';
import { listNotes, getNoteContent, candidateDownloadUrls } from '@/lib/hackmd';
import { enrichMarkdown, sha256 } from '@/lib/yaml';
import { upsertInChunks } from '@/lib/airtable';

export async function POST(_req: NextRequest) {
  const notes = await listNotes();
  const outDir = process.env.NOTES_DIR || './content/hackmd';
  await fs.mkdir(outDir, { recursive: true });

  const toAirtable: any[] = [];
  for (const n of notes) {
    let content: string | null = null;

    // 1) try /download (best-effort for public notes)
    for (const u of candidateDownloadUrls(n)) {
      try {
        const r = await fetch(u, { redirect: 'follow' });
        if (r.ok) { content = await r.text(); break; }
      } catch {}
    }

    // 2) fall back to the API
    if (!content) content = await getNoteContent(n.id);

    // 3) write enriched MD
    const safe = (s: string) => s.replace(/[^\w\-]+/g, '-').replace(/-+/g, '-').replace(/^-|-$/g, '');
    const fname = `${safe(n.title || 'untitled')}--${n.shortId || n.id}.md`;
    const outPath = path.join(outDir, fname);
    const md = enrichMarkdown(content, n, outPath);
    await fs.writeFile(outPath, md, 'utf8');
    const hash = sha256(md);

    // 4) prepare Airtable row
    toAirtable.push({
      'HackMD ID': n.id,
      'Title': n.title,
      'Short ID': n.shortId,
      'Tags': n.tags || [],
      'Created At': new Date(n.createdAt).toISOString(),
      'Last Changed At': new Date(n.lastChangedAt).toISOString(),
      'Publish Type': n.publishType,
      'Read Permission': n.readPermission,
      'Write Permission': n.writePermission,
      'Team Path': n.teamPath,
      'User Path': n.userPath,
      'Permalink': n.permalink,
      'Publish Link': n.publishLink,
      'Local Path': outPath,
      'SHA256': hash,
      'Status': 'Updated'
    });
  }

  await upsertInChunks(toAirtable);
  return NextResponse.json({ ok: true, processed: toAirtable.length });
}
```
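For completeness, here is one way the `rebuild-yaml` route from the list above could look. It only re-parses and re-serializes the front-matter of files already on disk; treat it as a sketch and adjust the field handling to taste:
```ts
// /app/api/hackmd/rebuild-yaml/route.ts — no-network re-merge of local YAML (sketch).
import { NextResponse } from 'next/server';
import fs from 'node:fs/promises';
import path from 'node:path';
import matter from 'gray-matter';

export async function POST() {
  const dir = process.env.NOTES_DIR || './content/hackmd';
  const names = (await fs.readdir(dir)).filter(f => f.endsWith('.md'));
  let rebuilt = 0;
  for (const name of names) {
    const p = path.join(dir, name);
    const parsed = matter(await fs.readFile(p, 'utf8'));
    // Re-serialize the existing front-matter; tweak parsed.data here if you
    // want to rename or backfill fields before writing.
    await fs.writeFile(p, matter.stringify(parsed.content, parsed.data), 'utf8');
    rebuilt++;
  }
  return NextResponse.json({ ok: true, rebuilt });
}
```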
---
# Plan 2 — **List notes via API; pair with a folder from manual “Download all notes”; enrich + Airtable**
This minimizes per‑note API **content** calls (you still make **one** list call) and then merges metadata with files from the manual export. Manual export instructions live in HackMD Settings → **Download all notes**. ([HackMD][3])
### Flow
1. **User action**: Download the ZIP from HackMD Settings and unzip it to `${PROJECT_ROOT}/export/manual/`. (The export provides the raw `.md` for your notes; names typically reflect titles—ID presence is not guaranteed.)
2. **List metadata**
Same `GET /v1/notes` call as Plan 1; cache `id`, `shortId`, `title`, `lastChangedAt`, etc. ([HackMD][2])
3. **Pairing heuristic (file ↔ note)**
Because exported filenames may not contain IDs, pair using a cascade:
* **Exact title match** (normalized): first `# Heading` from file vs. API `title` (HackMD titles derive from content) ([HackMD][2])
* **Fuzzy title match** (Levenshtein) if needed.
* **Content prefix similarity**: compare first 300–1,000 chars of file to `GET /v1/notes/:id` **only** for the small subset still unpaired (keeps API calls low).
* **Last resort**: manual review list for any still-unmatched files (emit a JSON report).
4. **YAML enrichment + write**
For each **paired** file, read file content, merge YAML like in Plan 1 (do **not** fetch content via API unless necessary). Write the **enriched file** to `${NOTES_DIR}` with a stable filename `${safeTitle}--${shortId || id}.md`.
5. **Airtable upsert**
Same as Plan 1 (batched 10 at a time).
6. **Report**
Return a JSON payload listing: `{ pairedCount, unmatchedFiles: [...], unmatchedNotes: [...] }` so you can quickly handle edge cases.
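An illustrative shape for that report (names are suggestions, not an existing schema):
```ts
// Shape of the step-6 pairing report (illustrative).
export interface PairingReport {
  pairedCount: number;
  unmatchedFiles: string[];                        // paths under export/manual/ with no API match
  unmatchedNotes: { id: string; title: string }[]; // API notes with no matching local file
  generatedAt: string;                             // ISO timestamp of this run
}
```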
### Next.js routes (suggested)
* `POST /api/hackmd/sync-plan2` → expects nothing; it reads the folder `${PROJECT_ROOT}/export/manual/`, lists API notes, pairs, enriches, and upserts.
* `GET /api/hackmd/pairing-report` → returns the last pairing report.
**Pairing helper (sketch)**
```ts
// /lib/pair.ts
import fs from 'node:fs/promises';
import path from 'node:path';

export async function loadManualFiles(dir: string) {
  const entries = await fs.readdir(dir, { withFileTypes: true });
  const files = entries.filter(e => e.isFile() && e.name.endsWith('.md'));
  return Promise.all(files.map(async f => {
    const p = path.join(dir, f.name);
    const text = await fs.readFile(p, 'utf8');
    const h1 = text.match(/^#\s+(.*)$/m)?.[1]?.trim();
    return { path: p, name: f.name, h1, text };
  }));
}

export function matchNotesToFiles(notes: any[], files: any[]) {
  const norm = (s: string) => s?.toLowerCase().replace(/\s+/g, ' ').trim() || '';
  const byTitle = new Map(files.map(f => [norm(f.h1), f]));
  const pairs: any[] = [], unmatchedNotes: any[] = [], unmatchedFiles = new Set(files.map(f => f.path));
  for (const n of notes) {
    const f = byTitle.get(norm(n.title));
    if (f) { pairs.push({ note: n, file: f }); unmatchedFiles.delete(f.path); }
    else unmatchedNotes.push(n);
  }
  return { pairs, unmatchedFiles: [...unmatchedFiles], unmatchedNotes };
}
```
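The exact-title matcher above covers the common case; for the fuzzy step in the pairing cascade, a plain Levenshtein distance is enough. A sketch with an arbitrary 20% distance threshold (swap in a library if you prefer; none is assumed here):
```ts
// /lib/fuzzy.ts — fuzzy fallback for step 3 of the pairing heuristic (sketch).
export function levenshtein(a: string, b: string): number {
  const m = a.length, n = b.length;
  if (m === 0) return n;
  if (n === 0) return m;
  let prev = Array.from({ length: n + 1 }, (_, j) => j);
  for (let i = 1; i <= m; i++) {
    const curr = [i];
    for (let j = 1; j <= n; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(curr[j - 1] + 1, prev[j] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[n];
}

// Return the closest file by H1 title, but only if it's "close enough"
// (distance ≤ 20% of the longer string — an arbitrary threshold to tune).
export function bestFuzzyMatch(title: string, files: { h1?: string; path: string }[]) {
  const norm = (s?: string) => (s || '').toLowerCase().replace(/\s+/g, ' ').trim();
  const t = norm(title);
  let best: { file: (typeof files)[number]; dist: number } | null = null;
  for (const f of files) {
    const d = levenshtein(t, norm(f.h1));
    if (!best || d < best.dist) best = { file: f, dist: d };
  }
  if (!best) return null;
  const maxLen = Math.max(t.length, norm(best.file.h1).length);
  return best.dist <= Math.ceil(maxLen * 0.2) ? best.file : null;
}
```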
---
## Rate limiting and durability
* **HackMD**: throttle to **≤100/5 min**; remember each **note content fetch** (Plan 1 fallback) is **one request per note**. Use your metadata list + `lastChangedAt` to skip unchanged notes on subsequent runs and stay under the **2,000 calls/month** cap. ([HackMD][1])
* **Airtable**: batch updates (10/req) and sleep \~200–250ms between requests to remain under **5 req/sec per base**. ([Airtable][5])
* **Direct `/download`**: treat as opportunistic; works best for **public/published** notes. If it fails, fall back to the API (reliable). ([GitHub][4])
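The two throttles above are proactive; as a safety net you can also retry reactively on 429s. A minimal, provider-agnostic sketch (the `Retry-After` header is honoured only if a response happens to include it; otherwise it falls back to exponential backoff):
```ts
// Retry-on-429/5xx fetch wrapper (sketch); use for both HackMD and Airtable calls.
export async function fetchWithRetry(url: string, init: RequestInit = {}, attempts = 4): Promise<Response> {
  let delay = 1000;
  for (let i = 0; i < attempts; i++) {
    const res = await fetch(url, init);
    if (res.status !== 429 && res.status < 500) return res; // success or non-retryable error
    const retryAfter = Number(res.headers.get('retry-after'));
    const waitMs = Number.isFinite(retryAfter) && retryAfter > 0 ? retryAfter * 1000 : delay;
    await new Promise(r => setTimeout(r, waitMs));
    delay *= 2; // exponential backoff
  }
  return fetch(url, init); // final attempt; surface whatever comes back
}
```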
---
## Nice-to-haves you can add later
* **Local index** (`/state/index.json`) to skip any `id` whose `lastChangedAt` hasn’t advanced—turns N calls into “only changed notes.”
* **Attachments pass** using HackMD Attachments API (remember to use `hackmd.io` domain and follow redirects). ([HackMD][9])
* **Team workspaces**: if you use Teams, list team notes using the relevant endpoint (see Developer Portal + Swagger to confirm the current path; older “Teams API” pages are marked deprecated). ([HackMD][6])
---
### Why two plans?
* **Plan 1** gives you a fully automated pipeline end‑to‑end, leaning on `/download` when possible to reduce API content calls, with a robust API fallback for private notes.
* **Plan 2** minimizes content API calls by reusing the **manual ZIP**, while still capturing authoritative **metadata** via the API and pushing a clean, YAML‑enriched corpus into Airtable.
If you want, I can adapt these skeletons to your exact repo layout (e.g., monorepo with `apps/web`), wire in your **pnpm** scripts, and add a tiny admin page to trigger either plan and show the pairing report.
[1]: https://hackmd.io/%40hackmd-api/api-policy "HackMD API Policy"
[2]: https://hackmd.io/%40hackmd-api/user-notes-api "HackMD User Notes API"
[3]: https://hackmd.io/%40hackmd-blog/batch-download-all-your-notes "Batch Download All Your Notes"
[4]: https://github.com/hackmdio/codimd/issues/448 "Download URL for raw Markdown file - hackmdio/codimd"
[5]: https://www.airtable.com/developers/web/api/rate-limits "Rate limits - Airtable Web API"
[6]: https://hackmd.io/%40hackmd-api/developer-portal "HackMD Developer Portal"
[7]: https://www.airtable.com/developers/web/api/update-multiple-records "Update multiple records - Airtable Web API"
[8]: https://github.com/jonschlinkert/gray-matter "jonschlinkert/gray-matter - GitHub"
[9]: https://hackmd.io/%40hackmd-api/attachments-api "HackMD Attachments API"