# 📜 Downloading Ethereum History with a Parallel Downloader for `.era` and `.era1` Files

Ethereum is evolving, and with that evolution comes a major shift in how we access historical blockchain data. With the upcoming implementation of [EIP-4444](https://eips.ethereum.org/EIPS/eip-4444), Ethereum clients will **no longer be required to serve pre-merge historical block data via the P2P layer**. Instead, users will depend on alternative archival distribution channels like the [Portal Network](https://portal.network), decentralized content-addressable storage, and prepackaged historical snapshots like `.era` and `.era1` files.

---

## ⚠️ Why This Matters

> 🔥 **EIP-4444 deprecates full sync from Genesis.**

Traditionally, Ethereum clients could sync the entire chain from Genesis by connecting to peers and requesting old blocks. This is no longer sustainable:

- Old block data takes up **hundreds of gigabytes**.
- Very few peers reliably serve ancient history.
- Syncing from Genesis takes **days** or even **weeks**.

With EIP-4444:

- Execution clients **stop serving block bodies older than 1 year**.
- **Pre-merge history is removed** from the standard sync path.
- Developers and node operators must rely on **external data providers**, archives, or **structured history dumps**.

One such structured and performant format is the **`era1` file**: a compact, indexable snapshot format developed by the Nimbus team. It is quickly becoming the **de facto standard** for fast and efficient full-history syncs post-EIP-4444.

## 🔍 What Are `.era` and `.era1` Files?

`.era` and `.era1` files are compressed binary dumps of Ethereum history, including blocks, receipts, and sometimes state. They are:

- ⚡ Optimized for fast ingestion by Ethereum clients
- 🌐 Served over HTTP, torrent, and IPFS
- 🏗️ Used for syncing execution clients like [Nimbus](https://nimbus.team) without relying on peers

They are crucial for:

- Bootstrapping archival nodes
- Syncing testnets like **Sepolia** or **Holesky**
- Auditing and reprocessing historical data
- Supporting state reconstruction for zk-rollups or proving systems

## 🧩 The Challenge

Most `.era` files are hosted as **plain HTTP directory listings**: there is no API, no manifest, and no standardized tool for downloading hundreds of large files efficiently.

So what's the problem?

- ❌ Manual downloading is slow and error-prone
- ❌ Some `.era` files are hundreds of megabytes
- ❌ No per-directory progress bar
- ❌ No resume or retry by default
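You can see the scraping problem for yourself before committing to a multi-gigabyte download. A minimal sketch, assuming the Sepolia host used in the usage example further down (substitute your own), counts how many `.era1` links a listing exposes:

```bash
# Scrape a plain directory listing and count the .era1 links it exposes.
# Same scrape-and-filter idea the downloader script below relies on.
curl -s https://sepolia.era1.nimbus.team/ | \
  grep -Eo 'href="[^"]+"' | \
  cut -d'"' -f2 | \
  grep -Eic '\.era1$'
```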
## ✅ Our Solution: The Era Downloader Script

We created a **cross-platform Bash script** that:

- 🚀 Downloads all `.era`, `.era1`, and `.txt` files from a directory listing
- ⏱️ Uses `aria2c` for **fast parallel downloads**
- 📈 Shows a global progress bar (percentage of files downloaded)
- ♻️ Is resumable and restart-safe
- 🐧 Works on both **Linux and macOS**

It's perfect for:

- Researchers archiving history
- Node operators doing a full sync
- Projects building zk-verifiers from raw data
- Developers contributing to Portal or history sync infrastructure

## 🖥️ Prerequisites

You'll need:

- [`aria2`](https://aria2.github.io/) installed:
  - **macOS**: `brew install aria2`
  - **Ubuntu/Debian**: `sudo apt install aria2`
- Standard Unix tools: `bash`, `awk`, `find`, `grep`, `curl`

## 📜 The Script (Save as `download_era.sh`)

```bash
#!/bin/bash
# Copyright (c) 2025 Status Research & Development GmbH. Licensed under
# either of:
# - Apache License, version 2.0
# - MIT license
# at your option. This file may not be copied, modified, or distributed except
# according to those terms.

# Usage: ./download_era.sh <download_url> <download_path>

set -eo pipefail

if [ $# -ne 2 ]; then
  echo "Usage: $0 <download_url> <download_path>"
  exit 1
fi

DOWNLOAD_URL="$1"
DOWNLOAD_DIR="$2"

if ! command -v aria2c > /dev/null 2>&1; then
  echo "❌ aria2c is not installed. Install via: brew install aria2 (macOS) or sudo apt install aria2 (Linux)"
  exit 1
fi

mkdir -p "$DOWNLOAD_DIR"
cd "$DOWNLOAD_DIR" || exit 1

# Generate safe temp files for URL lists
URLS_RAW_FILE=$(mktemp)
URLS_FILE=$(mktemp)

# Scrape the listing and keep only .era/.era1/.txt links. The trailing
# "|| true" keeps "set -eo pipefail" from aborting before the friendly
# error message below when grep finds no matches.
curl -s "$DOWNLOAD_URL" | \
  grep -Eo 'href="[^"]+"' | \
  cut -d'"' -f2 | \
  grep -Ei '\.(era|era1|txt)$' | \
  sort -u > "$URLS_RAW_FILE" || true

# 🔧 Normalize base URL (handle trailing slash or index.html)
case "$DOWNLOAD_URL" in
  */index.html) BASE_URL="${DOWNLOAD_URL%/index.html}" ;;
  */)           BASE_URL="${DOWNLOAD_URL%/}" ;;
  *)            BASE_URL="$DOWNLOAD_URL" ;;
esac

# Prepend the base URL to each scraped filename
awk -v url="$BASE_URL" '{ print url "/" $0 }' "$URLS_RAW_FILE" > "$URLS_FILE"

# tr strips the padding that macOS wc adds around its output
TOTAL_FILES=$(wc -l < "$URLS_FILE" | tr -d ' ')
if [ "$TOTAL_FILES" -eq 0 ]; then
  echo "❌ No .era, .era1, or .txt files found at $DOWNLOAD_URL"
  exit 1
fi

# Download in the background: up to 5 files in parallel (-j 5),
# 8 connections per file (-x 8), resuming partial files (-c)
aria2c -x 8 -j 5 -c -i "$URLS_FILE" \
  --dir="." \
  --console-log-level=warn \
  --quiet=true \
  --summary-interval=0 \
  > /dev/null 2>&1 &

ARIA_PID=$!

# Count fully downloaded files only: aria2c keeps a "<name>.aria2"
# control file next to each in-progress download and removes it on
# completion, so files that still have one are not finished yet
count_completed() {
  local n=0 f
  for f in *.era *.era1 *.txt; do
    [ -e "$f" ] && [ ! -e "$f.aria2" ] && n=$((n + 1))
  done
  echo "$n"
}

echo "📥 Starting download of $TOTAL_FILES files..."

while kill -0 "$ARIA_PID" 2> /dev/null; do
  COMPLETED=$(count_completed)
  PERCENT=$(awk "BEGIN { printf \"%.1f\", ($COMPLETED/$TOTAL_FILES)*100 }")
  echo -ne "📦 Download Progress: $PERCENT% complete ($COMPLETED / $TOTAL_FILES files) \r"
  sleep 1
done

COMPLETED=$(count_completed)
echo -ne "📦 Download Progress: 100% complete ($COMPLETED / $TOTAL_FILES files) \n"

# ✅ Cleanup temp files
rm -f "$URLS_RAW_FILE" "$URLS_FILE"

echo "✅ All files downloaded to: $DOWNLOAD_DIR"
```

### 🔧 Example Usage

```bash
chmod +x download_era.sh
./download_era.sh https://sepolia.era1.nimbus.team ~/Downloads/sepolia
```

This will:

- Fetch all `.era`, `.era1`, and `.txt` files
- Store them in your chosen directory
- Show a clean global progress update every second
- Resume partial downloads if interrupted
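Once the run completes, it is worth verifying what you got. Era hosts often publish a checksum manifest alongside the archives, which is one reason the script grabs `.txt` files too. A minimal sketch, assuming the manifest is a standard SHA-256 list named `checksums.txt` (an assumed name; inspect the `.txt` files your host actually serves):

```bash
# Verify downloads against a SHA-256 manifest, if the host provides one.
# "checksums.txt" is an assumed filename; check the listing's .txt files.
cd ~/Downloads/sepolia
sha256sum -c checksums.txt       # Linux
shasum -a 256 -c checksums.txt   # macOS, which lacks sha256sum by default
```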
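For a deeper sanity check, you can peek at a file's structure. Era files build on Nimbus's e2store format, where, per the spec, each record starts with an 8-byte header (a 2-byte type, a 4-byte little-endian length, and 2 reserved bytes) and the first record is a version entry with type bytes `0x65 0x32` (ASCII `e2`). A quick look, with an illustrative filename:

```bash
# Dump the first record header of a downloaded .era1 file; the first
# two bytes should read "65 32" ("e2"), the e2store version record.
xxd -l 8 mainnet-00000-5ec1ffb8.era1
```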