# Introducing Schematron: Structured HTML Extraction 80-160x cheaper than Frontier LLMs
We're thrilled to introduce Schematron, a family of specialized models that transform messy HTML into clean, structured JSON.
Schematron-8B and Schematron-3B deliver frontier-level extraction quality at 1-2% of the cost and 10x+ faster inference than large, general-purpose LLMs. By specializing in HTML-to-JSON extraction, they handle malformed HTML and complex JSON schemas, and excel across varying context lengths.
These models unlock workloads that would previously have been economically infeasible, and make web agents faster and more accurate.
Here's what processing 1 million pages daily looks like (assuming 3,000 input + 150 output tokens per page):
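As a back-of-the-envelope sketch using the Schematron list prices quoted at the end of this post, the Schematron side of that math works out to roughly $1,080/day on Schematron-8B and $540/day on Schematron-3B:

```python
# Daily cost at the list prices quoted at the end of this post
# (1M pages/day, 3,000 input + 150 output tokens per page, prices per 1M tokens)
PAGES = 1_000_000
IN_TOKENS, OUT_TOKENS = 3_000, 150

prices = {"schematron-8b": (0.30, 1.20), "schematron-3b": (0.15, 0.60)}

for model, (price_in, price_out) in prices.items():
    cost = PAGES * (IN_TOKENS * price_in + OUT_TOKENS * price_out) / 1e6
    print(f"{model}: ${cost:,.0f}/day")
# schematron-8b: $1,080/day
# schematron-3b: $540/day
```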

Both models are open source and available today through our [serverless API](https://inference.net/models).
## The Challenge of Structured Web Data
While today's strongest LLMs are extremely effective at extracting data from messy HTML, they're often impractical for real-world scraping workflows, where processing millions of pages with GPT-4.1 or Claude Sonnet can cost thousands of dollars per run.
Our team has experienced this firsthand through building dozens of scrapers and crawlers, and acutely felt the need for a more affordable HTML parsing model that doesn’t sacrifice extraction quality.
We built Schematron to deliver frontier-model accuracy at prices that make sense for production workloads. Monitoring competitor prices, aggregating product catalogs, and building research agents all require extracting clean, typed data from messy HTML, a task that Schematron solves at a price point low enough to unlock new use cases.
## Meet the Schematron Family
We trained two versions of Schematron: the highly accurate Schematron-8B and the marginally less accurate but significantly faster Schematron-3B.
**Schematron-8B** excels at complex extraction tasks, handling intricate nested schemas and very long documents with remarkable accuracy. It's your go-to choice when quality matters most.
**Schematron-3B** offers exceptional value for simpler extraction tasks, processing standard web pages in milliseconds while maintaining impressive accuracy for common use cases.
Both models share core capabilities:
- **128K token context window**: for processing long pages (a pre-processing sketch for oversized pages follows this list)
- **Strict JSON mode**: guaranteed parseable, schema-compliant output
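Even with a 128K window, raw pages padded with scripts, inline styles, and embedded SVGs can run long. One simple way to keep such pages inside the window is to strip nodes that never carry extractable content before sending them; here is a minimal sketch using BeautifulSoup (an optional helper on our part, not something the API requires):

```python
from bs4 import BeautifulSoup  # optional dependency: pip install beautifulsoup4

def shrink_html(raw_html: str) -> str:
    """Remove nodes that never carry extractable data to cut token count."""
    soup = BeautifulSoup(raw_html, "html.parser")
    for tag in soup(["script", "style", "noscript", "svg", "iframe"]):
        tag.decompose()  # drop the node and everything inside it
    return str(soup)
```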
Despite being a fraction of the size, these models match the extraction quality of frontier models:

## Best-in-class Speed and Cost
Schematron's speed unlocks new agentic use cases, allowing LLMs to navigate and parse the web an order of magnitude faster. The 10x lower latency also saves developer time and server resources, which is essential for large crawling and scraping jobs.
Some scraping and parsing tasks are extremely time-sensitive, such as parsing financial documents for live trading. For these use cases, a fast model like Schematron gives you a significant advantage.

Perhaps more importantly, Schematron is significantly cheaper than a frontier model. Schematron-8B and Schematron-3B are 60x and 120x more affordable than GPT-5, respectively. This unlocks internet-scale scraping jobs that would previously have been impossible for all but the best-funded teams, and makes it cost-effective to monitor large groups of pages for real-time data that depends on external sites.

## How We Built Schematron
To create models that truly understand both HTML structure and JSON schemas, we started by assembling a massive corpus of real-world web pages from Common Crawl. Using frontier models as teachers, we created hundreds of thousands of extraction examples across diverse domains—from e-commerce product pages to technical documentation.
We then clustered similar websites and generated schemas that could apply broadly across each cluster. This mirrors real-world usage where you define a schema once and use it on multiple pages.
Our curriculum learning approach progressively exposed the models to longer documents throughout training, ensuring stable performance even at the 128K token limit. The result was models that truly understand the relationship between schemas and HTML structure.
## Built for Developers
Using Schematron is straightforward: define your schema with Pydantic, Zod, or JSON Schema, pass in raw HTML, and get back validated, type-safe data that's ready to use in your application.
Here's how simple it is to extract product information from any e-commerce page:
```python
from pydantic import BaseModel
from openai import OpenAI

# Define the shape of the data you want extracted
class Product(BaseModel):
    name: str
    price: float
    availability: str
    specifications: dict

client = OpenAI(
    base_url="https://api.inference.net/v1",
    api_key="your-api-key",
)

# Raw HTML from your crawler or HTTP client
html = open("product_page.html").read()

response = client.chat.completions.create(
    model="inference-net/schematron-8b",
    messages=[{"role": "user", "content": html}],
    response_format={"type": "json_object"},
    # Provider-specific field: pass the JSON Schema via extra_body so the
    # OpenAI SDK forwards it in the request body
    extra_body={"schema": Product.model_json_schema()},
)

# Validate the model's output against the schema
product = Product.model_validate_json(response.choices[0].message.content)
```
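Because the schema is defined once, the same request shape can be reused across an entire crawl. A minimal sketch, assuming the pages have already been fetched into a list of HTML strings:

```python
# Reuse one schema and one client across many pages (crawled_pages is
# assumed to be a list of raw HTML strings fetched elsewhere)
products = []
for page_html in crawled_pages:
    response = client.chat.completions.create(
        model="inference-net/schematron-8b",
        messages=[{"role": "user", "content": page_html}],
        response_format={"type": "json_object"},
        extra_body={"schema": Product.model_json_schema()},
    )
    products.append(Product.model_validate_json(response.choices[0].message.content))
```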
## Open Source and API Access
We believe specialized models like Schematron represent the future of AI applications—carefully trained small models that excel at specific tasks rather than expensive generalists trying to do everything.
That's why we're making Schematron available through multiple channels:
**Inference.net API**: Start extracting immediately with our hosted API. Pay only for what you use with no upfront costs.
**Open Source**: Both Schematron-3B and Schematron-8B are available on Hugging Face under permissive licenses. Deploy them on your own infrastructure, fine-tune for your specific domain, or integrate directly into your applications.
**Batch Processing**: Our Batch API delivers significant savings on large-scale extraction—ideal for parsing workloads that aren’t time-sensitive. We also recommend a webhook workflow: submit an async request, and results are posted to your webhook (typically within seconds) to upsert your database, while retaining the same batch-level savings.
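As an illustrative sketch of the webhook side (the exact payload shape below is an assumption for illustration, not documented here), a receiver that validates results against your schema and upserts them might look like:

```python
from flask import Flask, request
from pydantic import BaseModel

# Same schema used when submitting the extraction job
class Product(BaseModel):
    name: str
    price: float
    availability: str
    specifications: dict

app = Flask(__name__)

def upsert_product(product: Product) -> None:
    ...  # placeholder: write to your datastore (e.g. an INSERT ... ON CONFLICT)

@app.post("/webhooks/schematron")
def handle_result():
    payload = request.get_json()
    # Hypothetical payload shape: assumes the extracted JSON arrives under "output"
    product = Product.model_validate(payload["output"])
    upsert_product(product)
    return "", 204
```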
## Start Extracting
We'd love to hear what you'd like to build with this model. Schedule a meeting with us or get started with these resources:
- Try the [Inference.net API](https://inference.net) with $10 free credits
- Download models from [Hugging Face](https://huggingface.co/inference-net)
- Read the [documentation](https://docs.inference.net/schematron)
We can't wait to see what you'll build with truly affordable, high-quality extraction.
---
*The Schematron models are now generally available via the Inference.net API at $0.30/1M input tokens and $1.20/1M output tokens for Schematron-8B, and $0.15/1M input tokens and $0.60/1M output tokens for Schematron-3B.*