# plan-for-hackmd-airtable-sync
## command line tool
Here are two **implementation plans** for your locally running Next.js app: each (a) enriches every note with YAML front‑matter and (b) pushes notes + metadata into Airtable. I'll keep the UI light and focus on **API routes**, **file layout**, and **Airtable interactions**.
> **Key facts we’ll rely on**
>
> * HackMD API quotas: **2,000 calls/month** (10,000 on Prime) and **100 calls / 5 minutes**. ([HackMD][1])
> * Notes list & note-content endpoints (`GET /v1/notes`, `GET /v1/notes/:noteId`). ([HackMD][2])
> * “Download all notes” is available in Settings (manual ZIP). ([HackMD][3])
> * Historical **`/download`** path returns raw Markdown for a note URL (works mainly for **public/published** links; it’s undocumented—use as best-effort fallback only). ([GitHub][4])
> * Airtable REST API rate limit: **5 req/sec per base**; bulk update/upsert up to **10 records per request**. ([Airtable][5])
> * New interactive HackMD **Swagger docs** link (for endpoint shapes). ([HackMD][6])
---
## Shared groundwork (used by both plans)
**Environment variables**
```
HACKMD_TOKEN=... # Bearer token
HACKMD_TEAM_PATH=... # optional (e.g., '@bok-center'); empty => personal
NOTES_DIR=./content/hackmd # where enriched .md files land
AIRTABLE_TOKEN=... # PAT
AIRTABLE_BASE_ID=appXXXXXXXX # your Base
AIRTABLE_TABLE=Notes # table name (or tblXXXXXXXX)
```
**Directory layout**
```
/app/api/...
/lib/hackmd.ts # API wrapper + rate limit handling
/lib/files.ts # read/write .md, hashing, safe filenames
/lib/yaml.ts # add/merge YAML via gray-matter
/lib/airtable.ts # batched upsert (10 per request)
/content/hackmd/ # enriched markdown output
/export/manual/ # (Plan 2) unzipped manual download of .md files
/state/index.json # cache: { noteId -> { lastChangedAt, sha256, path } }
```
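A minimal sketch of helpers for the `/state/index.json` cache, assuming the `{ noteId -> { lastChangedAt, sha256, path } }` shape above. The `/lib/state.ts` module is not in the layout; add it, or fold these into `/lib/files.ts`:
```ts
// /lib/state.ts — load/save the local sync cache (illustrative names).
import fs from 'node:fs/promises';
import path from 'node:path';

export interface IndexEntry {
  lastChangedAt: number | string; // as returned by the HackMD API
  sha256: string;                 // hash of the enriched file on disk
  path: string;                   // local path of the enriched file
}
export type StateIndex = Record<string, IndexEntry>;

const STATE_FILE = path.join(process.cwd(), 'state', 'index.json');

export async function loadIndex(): Promise<StateIndex> {
  try {
    return JSON.parse(await fs.readFile(STATE_FILE, 'utf8'));
  } catch {
    return {}; // first run, or the file doesn't exist yet
  }
}

export async function saveIndex(index: StateIndex): Promise<void> {
  await fs.mkdir(path.dirname(STATE_FILE), { recursive: true });
  await fs.writeFile(STATE_FILE, JSON.stringify(index, null, 2), 'utf8');
}
```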
**Airtable schema (suggested)**
Table **Notes** (primary field = `HackMD ID`):
* `HackMD ID` (single line) — **unique key**
* `Title` (single line)
* `Short ID` (single line)
* `Tags` (multiple select)
* `Created At` (date-time)
* `Last Changed At` (date-time)
* `Publish Type` (single select)
* `Read Permission` / `Write Permission` (single select)
* `Team Path` / `User Path` (single line)
* `Permalink` (URL), `Publish Link` (URL)
* `Local Path` (single line)
* `SHA256` (single line)
* `YAML` (long text)
* `Content` (long text) — optional; include if you want raw MD in Airtable
* `Status` (single select: New/Updated/Unchanged/Failed)
* `Last Sync At` (date-time)
Use **bulk upsert** (≤10 records/request) with `performUpsert` on field `HackMD ID`. ([Airtable][7])
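If you want the rows type-checked in TypeScript, a small interface can mirror the schema above (a sketch; the field names are the suggested column names and must match whatever you actually create in the base):
```ts
// /lib/airtable-types.ts — illustrative; keep in sync with your base's columns.
export interface NoteFields {
  'HackMD ID': string;            // unique key used for upsert
  'Title'?: string;
  'Short ID'?: string;
  'Tags'?: string[];              // multiple select
  'Created At'?: string;          // ISO date-time
  'Last Changed At'?: string;
  'Publish Type'?: string;
  'Read Permission'?: string;
  'Write Permission'?: string;
  'Team Path'?: string;
  'User Path'?: string;
  'Permalink'?: string;
  'Publish Link'?: string;
  'Local Path'?: string;
  'SHA256'?: string;
  'YAML'?: string;                // long text
  'Content'?: string;             // optional raw Markdown
  'Status'?: 'New' | 'Updated' | 'Unchanged' | 'Failed';
  'Last Sync At'?: string;
}
```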
**YAML front‑matter shape (example)**
```yaml
---
title: "<derived from content or API>"
tags: ["tag1", "tag2"]
created: "2025-01-11T15:31:56.000Z"
updated: "2025-03-07T19:14:22.000Z"
source: "hackmd"
slug: "<permalink or shortId>"
hackmd:
  id: "<noteId>"
  shortId: "<shortId>"
  readPermission: "owner|signed_in|guest"
  writePermission: "owner|signed_in|guest"
  publishType: "view|slide|book"
  publishedAt: "2024-... (optional)"
  permalink: "<string|null>"
  publishLink: "<string|null>"
  teamPath: "<@team or null>"
  userPath: "<userPath>"
  lastChangeUserName: "<string>"
  url: "https://hackmd.io/<shortId or permalink or id>"
file:
  path: "content/hackmd/<safe-title>--<shortId>.md"
  sha256: "<hash>"
  size: 12345
apiDownloadedAt: "2025-08-27T..."
---
```
(*HackMD titles are derived from content; don’t be surprised if the first `#` heading is the title.*) ([HackMD][2])
---
# Plan 1 — **List notes via API, try `/<...>/download` first, fall back to API content**
### Flow
1. **List metadata**
* Call `GET https://api.hackmd.io/v1/notes` (and optionally `GET /v1/teams/{teamPath}/notes` if you need a team; these older “Teams” list endpoints exist but are marked deprecated—prefer Swagger docs to confirm your variant). Cache `id`, `shortId`, `title`, `tags`, `createdAt`, `lastChangedAt`, `readPermission`, `publishType`, `permalink`, `publishLink`, `teamPath`, `userPath`. ([HackMD][2])
2. **For each note, attempt a direct Markdown download**
Build a **candidate download URL**:
* If `publishLink` exists (published): use `publishLink + '/download'`.
* Else if `readPermission==='guest'` and `shortId` exists: try `https://hackmd.io/{shortId}/download` (or `https://hackmd.io/@{userOrTeam}/{shortId}/download`).
* This is **best‑effort**, not guaranteed; if 4xx/other, fall back to step 3. ([GitHub][4])
3. **Fallback to API for content**
`GET https://api.hackmd.io/v1/notes/:noteId` → `.content` (raw MD). Rate‑limit to **≤100 calls per 5 minutes**. ([HackMD][1])
4. **Enrich with YAML**
Use `gray-matter` to parse/merge. Prepend or merge the YAML block above (avoid overwriting any existing YAML; nest HackMD‑specific fields under `hackmd:`). ([GitHub][8])
5. **Write to disk**
* File path: `${NOTES_DIR}/${safeTitle}--${shortId || id}.md`
* Compute `sha256` of the final file; store/update `/state/index.json` (`id -> {lastChangedAt, sha256, path}`) to avoid reprocessing unchanged notes (a change-detection sketch follows this flow).
6. **Upsert in Airtable**
Batch records in chunks of 10; include: `HackMD ID`, `Title`, `Short ID`, timestamps, perms, links, `Local Path`, `SHA256`, and optionally `YAML` + `Content`. Respect **5 req/sec**. ([Airtable][5])
7. **Attachments (optional)**
If you’ll need embedded images later, HackMD exposes an **Attachments API**—requests must hit `hackmd.io` (not `api.hackmd.io`) and **follow redirects**. Keep that in mind for a Phase 2. ([HackMD][9])
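A possible change-detection helper for step 5, building on the state-index sketch from the shared groundwork (`loadIndex`, `saveIndex`, and `StateIndex` are assumptions from that sketch, not existing code):
```ts
// Sketch: skip notes whose lastChangedAt hasn't advanced since the last run.
import { loadIndex, type StateIndex } from '@/lib/state';

// Return the cached index plus only the notes that are new or changed.
export async function filterChanged(notes: { id: string; lastChangedAt: number | string }[]) {
  const index = await loadIndex();
  const changed = notes.filter(n => {
    const prev = index[n.id];
    return !prev || String(prev.lastChangedAt) !== String(n.lastChangedAt);
  });
  return { index, changed };
}

// After a note is written to disk, record it so the next run can skip it.
export function markSynced(
  index: StateIndex,
  n: { id: string; lastChangedAt: number | string },
  outPath: string,
  hash: string
) {
  index[n.id] = { lastChangedAt: n.lastChangedAt, sha256: hash, path: outPath };
}
```
In the Plan 1 orchestrator you would call `filterChanged(notes)` right after `listNotes()`, loop only over `changed`, call `markSynced(...)` after each write, and `saveIndex(index)` once the run completes.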
### Next.js routes (suggested)
* `GET /api/hackmd/list` → returns note metadata cache.
* `POST /api/hackmd/sync-plan1` → orchestrates: list → download/fallback → YAML → write → Airtable.
* `POST /api/hackmd/rebuild-yaml` → re‑merges YAML for all local files (no network; sketched after the `sync-plan1` orchestrator below).
### Minimal code skeletons (TypeScript)
**`/lib/hackmd.ts`**
```ts
// fetch with Bearer auth; simple throttling (1 call / 3s ≈ 100 / 5 min)
const sleep = (ms: number) => new Promise(r => setTimeout(r, ms));

export async function listNotes() {
  const res = await fetch('https://api.hackmd.io/v1/notes', {
    headers: { Authorization: `Bearer ${process.env.HACKMD_TOKEN}` }
  });
  if (!res.ok) throw new Error(`List failed: ${res.status}`);
  return res.json();
}

export async function getNoteContent(noteId: string) {
  await sleep(3000);
  const res = await fetch(`https://api.hackmd.io/v1/notes/${noteId}`, {
    headers: { Authorization: `Bearer ${process.env.HACKMD_TOKEN}` }
  });
  if (!res.ok) throw new Error(`Get note failed: ${res.status}`);
  const j = await res.json();
  return j.content as string;
}

export function candidateDownloadUrls(n: any): string[] {
  const urls: string[] = [];
  if (n.publishLink) urls.push(`${n.publishLink}/download`);
  if (n.readPermission === 'guest' && n.shortId) {
    urls.push(`https://hackmd.io/${n.shortId}/download`);
    if (n.userPath || n.teamPath) {
      const owner = n.teamPath?.replace(/^@/, '') || n.userPath;
      urls.push(`https://hackmd.io/@${owner}/${n.shortId}/download`);
    }
  }
  return urls;
}
```
**`/lib/yaml.ts`**
```ts
import matter from 'gray-matter';
import crypto from 'node:crypto';

export function enrichMarkdown(rawMd: string, meta: any, outPath: string) {
  const parsed = matter(rawMd);
  const base = parsed.data || {};
  // merge without clobbering the user's existing keys
  const merged = {
    ...base,
    title: base.title ?? meta.title ?? parsed.content.match(/^#\s+(.*)$/m)?.[1],
    tags: base.tags ?? meta.tags ?? [],
    created: base.created ?? new Date(meta.createdAt).toISOString(),
    updated: new Date(meta.lastChangedAt).toISOString(),
    source: 'hackmd',
    slug: base.slug ?? meta.permalink ?? meta.shortId ?? meta.id,
    hackmd: { ...(base.hackmd || {}), ...pickHackmd(meta) },
    file: { ...(base.file || {}), path: outPath },
    apiDownloadedAt: new Date().toISOString()
  };
  return matter.stringify(parsed.content, merged);
}

function pickHackmd(m: any) { // trim to useful fields
  const { id, shortId, readPermission, writePermission, publishType,
          publishedAt, permalink, publishLink, teamPath, userPath,
          lastChangeUser } = m;
  return {
    id, shortId, readPermission, writePermission, publishType,
    publishedAt, permalink, publishLink, teamPath, userPath,
    lastChangeUserName: lastChangeUser?.name
  };
}

export function sha256(s: string) {
  return crypto.createHash('sha256').update(s).digest('hex');
}
```
**`/lib/airtable.ts`** (batched upsert)
```ts
const BASE = process.env.AIRTABLE_BASE_ID!;
const TABLE = process.env.AIRTABLE_TABLE!;
const API = `https://api.airtable.com/v0/${BASE}/${encodeURIComponent(TABLE)}`;

async function batchUpsert(records: any[]) {
  // Up to 10 per request; respect 5 req/sec
  const body = {
    performUpsert: { fieldsToMergeOn: ['HackMD ID'] },
    records: records.map(r => ({ fields: r }))
  };
  const res = await fetch(API, {
    method: 'PATCH',
    headers: {
      Authorization: `Bearer ${process.env.AIRTABLE_TOKEN}`,
      'Content-Type': 'application/json'
    },
    body: JSON.stringify(body)
  });
  if (!res.ok) throw new Error(`Airtable upsert failed ${res.status}`);
  return res.json();
}

export async function upsertInChunks(rows: any[]) {
  for (let i = 0; i < rows.length; i += 10) {
    await batchUpsert(rows.slice(i, i + 10));
    await new Promise(r => setTimeout(r, 220)); // ~4.5 req/sec
  }
}
```
**`/app/api/hackmd/sync-plan1/route.ts`** (orchestrator)
```ts
import { NextRequest, NextResponse } from 'next/server';
import fs from 'node:fs/promises';
import path from 'node:path';
import { listNotes, getNoteContent, candidateDownloadUrls } from '@/lib/hackmd';
import { enrichMarkdown, sha256 } from '@/lib/yaml';
import { upsertInChunks } from '@/lib/airtable';

export async function POST(_req: NextRequest) {
  const notes = await listNotes();
  const outDir = process.env.NOTES_DIR || './content/hackmd';
  await fs.mkdir(outDir, { recursive: true });

  const toAirtable: any[] = [];
  for (const n of notes) {
    let content: string | null = null;

    // 1) try /download (best-effort for public notes)
    for (const u of candidateDownloadUrls(n)) {
      try {
        const r = await fetch(u, { redirect: 'follow' });
        if (r.ok) { content = await r.text(); break; }
      } catch {}
    }

    // 2) fall back to the API
    if (!content) content = await getNoteContent(n.id);

    // 3) write enriched MD
    const safe = (s: string) => s.replace(/[^\w\-]+/g, '-').replace(/-+/g, '-').replace(/^-|-$/g, '');
    const fname = `${safe(n.title || 'untitled')}--${n.shortId || n.id}.md`;
    const outPath = path.join(outDir, fname);
    const md = enrichMarkdown(content, n, outPath);
    await fs.writeFile(outPath, md, 'utf8');
    const hash = sha256(md);

    // 4) prepare Airtable row
    toAirtable.push({
      'HackMD ID': n.id,
      'Title': n.title,
      'Short ID': n.shortId,
      'Tags': n.tags || [],
      'Created At': new Date(n.createdAt).toISOString(),
      'Last Changed At': new Date(n.lastChangedAt).toISOString(),
      'Publish Type': n.publishType,
      'Read Permission': n.readPermission,
      'Write Permission': n.writePermission,
      'Team Path': n.teamPath,
      'User Path': n.userPath,
      'Permalink': n.permalink,
      'Publish Link': n.publishLink,
      'Local Path': outPath,
      'SHA256': hash,
      'Status': 'Updated'
    });
  }

  await upsertInChunks(toAirtable);
  return NextResponse.json({ ok: true, processed: toAirtable.length });
}
```
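For completeness, here is one way the `rebuild-yaml` route from the list above could look. It only re-parses and re-serializes the front-matter of files already on disk; treat it as a sketch and adjust the field handling to taste:
```ts
// /app/api/hackmd/rebuild-yaml/route.ts — no-network re-merge of local YAML (sketch).
import { NextResponse } from 'next/server';
import fs from 'node:fs/promises';
import path from 'node:path';
import matter from 'gray-matter';

export async function POST() {
  const dir = process.env.NOTES_DIR || './content/hackmd';
  const names = (await fs.readdir(dir)).filter(f => f.endsWith('.md'));
  let rebuilt = 0;
  for (const name of names) {
    const p = path.join(dir, name);
    const parsed = matter(await fs.readFile(p, 'utf8'));
    // Re-serialize the existing front-matter; tweak parsed.data here if you
    // want to rename or backfill fields before writing.
    await fs.writeFile(p, matter.stringify(parsed.content, parsed.data), 'utf8');
    rebuilt++;
  }
  return NextResponse.json({ ok: true, rebuilt });
}
```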
---
# Plan 2 — **List notes via API; pair with a folder from manual “Download all notes”; enrich + Airtable**
This minimizes per‑note API **content** calls (you still make **one** list call) and then merges metadata with files from the manual export. Manual export instructions live in HackMD Settings → **Download all notes**. ([HackMD][3])
### Flow
1. **User action**: Download the ZIP from HackMD Settings and unzip it to `${PROJECT_ROOT}/export/manual/`. (The export provides the raw `.md` for your notes; names typically reflect titles—ID presence is not guaranteed.)
2. **List metadata**
Same `GET /v1/notes` call as Plan 1; cache `id`, `shortId`, `title`, `lastChangedAt`, etc. ([HackMD][2])
3. **Pairing heuristic (file ↔ note)**
Because exported filenames may not contain IDs, pair using a cascade:
* **Exact title match** (normalized): first `# Heading` from file vs. API `title` (HackMD titles derive from content) ([HackMD][2])
* **Fuzzy title match** (Levenshtein) if needed.
* **Content prefix similarity**: compare first 300–1,000 chars of file to `GET /v1/notes/:id` **only** for the small subset still unpaired (keeps API calls low).
* **Last resort**: manual review list for any still-unmatched files (emit a JSON report).
4. **YAML enrichment + write**
For each **paired** file, read file content, merge YAML like in Plan 1 (do **not** fetch content via API unless necessary). Write the **enriched file** to `${NOTES_DIR}` with a stable filename `${safeTitle}--${shortId || id}.md`.
5. **Airtable upsert**
Same as Plan 1 (batched 10 at a time).
6. **Report**
Return a JSON payload listing: `{ pairedCount, unmatchedFiles: [...], unmatchedNotes: [...] }` so you can quickly handle edge cases.
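An illustrative shape for that report (names are suggestions, not an existing schema):
```ts
// Shape of the step-6 pairing report (illustrative).
export interface PairingReport {
  pairedCount: number;
  unmatchedFiles: string[];                        // paths under export/manual/ with no API match
  unmatchedNotes: { id: string; title: string }[]; // API notes with no matching local file
  generatedAt: string;                             // ISO timestamp of this run
}
```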
### Next.js routes (suggested)
* `POST /api/hackmd/sync-plan2` → expects nothing; it reads the folder `${PROJECT_ROOT}/export/manual/`, lists API notes, pairs, enriches, and upserts.
* `GET /api/hackmd/pairing-report` → returns the last pairing report.
**Pairing helper (sketch)**
```ts
// /lib/pair.ts
import fs from 'node:fs/promises';
import path from 'node:path';

export async function loadManualFiles(dir: string) {
  const entries = await fs.readdir(dir, { withFileTypes: true });
  const files = entries.filter(e => e.isFile() && e.name.endsWith('.md'));
  return Promise.all(files.map(async f => {
    const p = path.join(dir, f.name);
    const text = await fs.readFile(p, 'utf8');
    const h1 = text.match(/^#\s+(.*)$/m)?.[1]?.trim();
    return { path: p, name: f.name, h1, text };
  }));
}

export function matchNotesToFiles(notes: any[], files: any[]) {
  const norm = (s: string) => s?.toLowerCase().replace(/\s+/g, ' ').trim() || '';
  const byTitle = new Map(files.map(f => [norm(f.h1), f]));
  const pairs: any[] = [], unmatchedNotes: any[] = [], unmatchedFiles = new Set(files.map(f => f.path));
  for (const n of notes) {
    const f = byTitle.get(norm(n.title));
    if (f) { pairs.push({ note: n, file: f }); unmatchedFiles.delete(f.path); }
    else unmatchedNotes.push(n);
  }
  return { pairs, unmatchedFiles: [...unmatchedFiles], unmatchedNotes };
}
```
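The exact-title matcher above covers the common case; for the fuzzy step in the pairing cascade, a plain Levenshtein distance is enough. A sketch with an arbitrary 20% distance threshold (swap in a library if you prefer; none is assumed here):
```ts
// /lib/fuzzy.ts — fuzzy fallback for step 3 of the pairing heuristic (sketch).
export function levenshtein(a: string, b: string): number {
  const m = a.length, n = b.length;
  if (m === 0) return n;
  if (n === 0) return m;
  let prev = Array.from({ length: n + 1 }, (_, j) => j);
  for (let i = 1; i <= m; i++) {
    const curr = [i];
    for (let j = 1; j <= n; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(curr[j - 1] + 1, prev[j] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[n];
}

// Return the closest file by H1 title, but only if it's "close enough"
// (distance ≤ 20% of the longer string — an arbitrary threshold to tune).
export function bestFuzzyMatch(title: string, files: { h1?: string; path: string }[]) {
  const norm = (s?: string) => (s || '').toLowerCase().replace(/\s+/g, ' ').trim();
  const t = norm(title);
  let best: { file: (typeof files)[number]; dist: number } | null = null;
  for (const f of files) {
    const d = levenshtein(t, norm(f.h1));
    if (!best || d < best.dist) best = { file: f, dist: d };
  }
  if (!best) return null;
  const maxLen = Math.max(t.length, norm(best.file.h1).length);
  return best.dist <= Math.ceil(maxLen * 0.2) ? best.file : null;
}
```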
---
## Rate limiting and durability
* **HackMD**: throttle to **≤100/5 min**; remember each **note content fetch** (Plan 1 fallback) is **one request per note**. Use your metadata list + `lastChangedAt` to skip unchanged notes on subsequent runs and stay under the **2,000 calls/month** cap. ([HackMD][1])
* **Airtable**: batch updates (10/req) and sleep \~200–250ms between requests to remain under **5 req/sec per base**. ([Airtable][5])
* **Direct `/download`**: treat as opportunistic; works best for **public/published** notes. If it fails, fall back to the API (reliable). ([GitHub][4])
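The two throttles above are proactive; as a safety net you can also retry reactively on 429s. A minimal, provider-agnostic sketch (the `Retry-After` header is honoured only if a response happens to include it; otherwise it falls back to exponential backoff):
```ts
// Retry-on-429/5xx fetch wrapper (sketch); use for both HackMD and Airtable calls.
export async function fetchWithRetry(url: string, init: RequestInit = {}, attempts = 4): Promise<Response> {
  let delay = 1000;
  for (let i = 0; i < attempts; i++) {
    const res = await fetch(url, init);
    if (res.status !== 429 && res.status < 500) return res; // success or non-retryable error
    const retryAfter = Number(res.headers.get('retry-after'));
    const waitMs = Number.isFinite(retryAfter) && retryAfter > 0 ? retryAfter * 1000 : delay;
    await new Promise(r => setTimeout(r, waitMs));
    delay *= 2; // exponential backoff
  }
  return fetch(url, init); // final attempt; surface whatever comes back
}
```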
---
## Nice-to-haves you can add later
* **Local index** (`/state/index.json`) to skip any `id` whose `lastChangedAt` hasn’t advanced—turns N calls into “only changed notes.”
* **Attachments pass** using HackMD Attachments API (remember to use `hackmd.io` domain and follow redirects). ([HackMD][9])
* **Team workspaces**: if you use Teams, list team notes using the relevant endpoint (see Developer Portal + Swagger to confirm the current path; older “Teams API” pages are marked deprecated). ([HackMD][6])
---
### Why two plans?
* **Plan 1** gives you a fully automated pipeline end‑to‑end, leaning on `/download` when possible to reduce API content calls, with a robust API fallback for private notes.
* **Plan 2** minimizes content API calls by reusing the **manual ZIP**, while still capturing authoritative **metadata** via the API and pushing a clean, YAML‑enriched corpus into Airtable.
If you want, I can adapt these skeletons to your exact repo layout (e.g., monorepo with `apps/web`), wire in your **pnpm** scripts, and add a tiny admin page to trigger either plan and show the pairing report.
[1]: https://hackmd.io/%40hackmd-api/api-policy "HackMD API Policy"
[2]: https://hackmd.io/%40hackmd-api/user-notes-api "HackMD User Notes API"
[3]: https://hackmd.io/%40hackmd-blog/batch-download-all-your-notes "Batch Download All Your Notes"
[4]: https://github.com/hackmdio/codimd/issues/448 "Download URL for raw Markdown file - hackmdio/codimd"
[5]: https://www.airtable.com/developers/web/api/rate-limits "Rate limits - Airtable Web API"
[6]: https://hackmd.io/%40hackmd-api/developer-portal "HackMD Developer Portal"
[7]: https://www.airtable.com/developers/web/api/update-multiple-records "Update multiple records - Airtable Web API"
[8]: https://github.com/jonschlinkert/gray-matter "jonschlinkert/gray-matter - GitHub"
[9]: https://hackmd.io/%40hackmd-api/attachments-api "HackMD Attachments API"