Below is a practical, production‑ready plan for a **single “q” search** that works well for type‑ahead UX, scales on Postgres, and is easy to evolve.
---
## TL;DR (recommended approach)
1. **Turn on Postgres extensions**: `unaccent` (accent‑insensitive) and `pg_trgm` (fuzzy/trigram).
2. **Add per‑table generated `tsvector` columns + GIN indexes** on:
* `customers` (name, company, title)
* `customer_contacts` (value/email/phone)
* `locations` (full mailing string)
This keeps the matching step **index-driven and join-free**.
3. **Search pipeline** (one API):
* Build a **tsquery** from `q` and hit each table’s `search_vector` separately (index scan).
* **UNION** the matching `customer_id`s with a weighted rank from each source.
* Add a **fuzzy fallback** (trigram similarity) for partial/prefix/misspells.
* Aggregate to one row per customer, sort by final score, paginate.
4. **UX**:
* Debounce 250–300ms, return top 10–20 with a score.
* Smarter heuristics: detect email/phone/ZIP, bias results near **primary location**.
This gives excellent recall/precision without pulling in an external search engine—and you can add a materialized view later if you want even faster lookups.
---
## 1) Postgres extensions & indexes (Alembic snippets)
> Create extensions once in an early migration (Postgres only):
```sql
CREATE EXTENSION IF NOT EXISTS unaccent;
CREATE EXTENSION IF NOT EXISTS pg_trgm;

-- unaccent() is only STABLE, but generated columns and index expressions
-- require IMMUTABLE functions. Wrap it once (safe as long as the unaccent
-- dictionary is left alone) and use the wrapper in all DDL below:
CREATE OR REPLACE FUNCTION immutable_unaccent(text)
RETURNS text LANGUAGE sql IMMUTABLE PARALLEL SAFE STRICT
AS $$ SELECT public.unaccent('public.unaccent', $1) $$;
```
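Since the heading promises Alembic, here is a minimal migration sketch wrapping that DDL; the revision identifiers are placeholders:

```python
"""Sketch of the search-extensions migration; revision ids are placeholders."""
from alembic import op

revision = "0001_search_extensions"  # placeholder
down_revision = None


def upgrade() -> None:
    # op.execute runs raw SQL; these statements are Postgres-only.
    op.execute("CREATE EXTENSION IF NOT EXISTS unaccent")
    op.execute("CREATE EXTENSION IF NOT EXISTS pg_trgm")
    op.execute(
        """
        CREATE OR REPLACE FUNCTION immutable_unaccent(text)
        RETURNS text LANGUAGE sql IMMUTABLE PARALLEL SAFE STRICT
        AS $$ SELECT public.unaccent('public.unaccent', $1) $$
        """
    )


def downgrade() -> None:
    op.execute("DROP FUNCTION IF EXISTS immutable_unaccent(text)")
    op.execute("DROP EXTENSION IF EXISTS pg_trgm")
    op.execute("DROP EXTENSION IF EXISTS unaccent")
```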
### Generated `tsvector` columns (per table)
**Customers**
```sql
ALTER TABLE customers
  ADD COLUMN IF NOT EXISTS search_vector tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('simple', immutable_unaccent(coalesce(company, ''))), 'A') ||
    setweight(to_tsvector('simple', immutable_unaccent(coalesce(last_name, ''))), 'A') ||
    setweight(to_tsvector('simple', immutable_unaccent(coalesce(first_name, ''))), 'B') ||
    setweight(to_tsvector('simple', immutable_unaccent(coalesce(title, ''))), 'D')
  ) STORED;

CREATE INDEX IF NOT EXISTS ix_customers_search_vector
  ON customers USING GIN (search_vector);

-- Useful fuzzy/prefix indexes
CREATE INDEX IF NOT EXISTS ix_customers_company_trgm
  ON customers USING GIN ((immutable_unaccent(lower(coalesce(company, '')))) gin_trgm_ops);
CREATE INDEX IF NOT EXISTS ix_customers_name_trgm
  ON customers USING GIN (
    (immutable_unaccent(lower(coalesce(first_name, '') || ' ' || coalesce(last_name, '')))) gin_trgm_ops
  );
```
**Contacts**
```sql
ALTER TABLE customer_contacts
  ADD COLUMN IF NOT EXISTS search_vector tsvector
  GENERATED ALWAYS AS (
    setweight(to_tsvector('simple', immutable_unaccent(coalesce(type, ''))), 'D') ||
    setweight(to_tsvector('simple', immutable_unaccent(coalesce(value, ''))), 'B')
  ) STORED;

CREATE INDEX IF NOT EXISTS ix_customer_contacts_search_vector
  ON customer_contacts USING GIN (search_vector);

-- Fuzzy for email/phone substrings
CREATE INDEX IF NOT EXISTS ix_customer_contacts_value_trgm
  ON customer_contacts USING GIN ((immutable_unaccent(lower(value))) gin_trgm_ops);

-- Fast digit-only phone contains
CREATE INDEX IF NOT EXISTS ix_customer_contacts_digits_trgm
  ON customer_contacts USING GIN ((regexp_replace(value, '\D', '', 'g')) gin_trgm_ops);
```
**Locations**
```sql
ALTER TABLE locations
  ADD COLUMN IF NOT EXISTS search_vector tsvector
  GENERATED ALWAYS AS (
    setweight(
      to_tsvector('simple',
        immutable_unaccent(
          -- concat_ws() is only STABLE, so spell out the concatenation
          coalesce(line1, '') || ' ' || coalesce(line2, '') || ' ' ||
          coalesce(city, '') || ' ' || coalesce(state, '') || ' ' ||
          coalesce(zip, '') || ' ' || coalesce(country, '')
        )
      ), 'C')
  ) STORED;

CREATE INDEX IF NOT EXISTS ix_locations_search_vector
  ON locations USING GIN (search_vector);

-- Fuzzy address substring
CREATE INDEX IF NOT EXISTS ix_locations_addr_trgm
  ON locations USING GIN (
    (immutable_unaccent(lower(
      coalesce(line1, '') || ' ' || coalesce(line2, '') || ' ' ||
      coalesce(city, '') || ' ' || coalesce(state, '') || ' ' ||
      coalesce(zip, '') || ' ' || coalesce(country, '')
    ))) gin_trgm_ops
  );
```
> Notes
>
> * We use the `'simple'` text search config to avoid English stemming on names/addresses.
> * `unaccent()` and `concat_ws()` are only STABLE in Postgres, while generated columns and index expressions require IMMUTABLE functions; hence the `immutable_unaccent` wrapper and the plain `||` concatenation in the locations DDL.
> * Weights: **A** (strong) for `company`/`last_name`, **B** for `first_name`/contact value, **C** for address, **D** for misc.
---
## 2) ORM mappings (optional but recommended)
Map the generated columns so SQLAlchemy can reference them:
```python
import sqlalchemy as sa
from sqlalchemy.dialects.postgresql import TSVECTOR
from sqlalchemy.orm import Mapped, mapped_column


class Customer(DbEntityBase):
    __tablename__ = "customers"
    # ... existing fields ...

    # FetchedValue marks the column as database-generated, so SQLAlchemy
    # never tries to write it on INSERT/UPDATE.
    search_vector: Mapped[str] = mapped_column(
        TSVECTOR, nullable=False, server_default=sa.FetchedValue()
    )


class CustomerContact(DbEntityBase):
    __tablename__ = "customer_contacts"
    # ... existing fields ...
    search_vector: Mapped[str] = mapped_column(
        TSVECTOR, nullable=False, server_default=sa.FetchedValue()
    )


class Location(DbEntityBase):
    __tablename__ = "locations"
    # ... existing fields ...
    search_vector: Mapped[str] = mapped_column(
        TSVECTOR, nullable=False, server_default=sa.FetchedValue()
    )
```
If you prefer not to modify models, you can reference these columns with `sa.literal_column("search_vector")`, but mapping is cleaner.
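For reference, a minimal sketch of that unmapped approach (qualifying the column with the table name is the only addition, so it stays unambiguous once joins are involved):

```python
import sqlalchemy as sa

# Ad-hoc reference to the unmapped generated column; the text is injected
# into the SQL verbatim, so keep it table-qualified.
search_vector = sa.literal_column("customers.search_vector")

stmt = sa.select(Customer.id).where(
    search_vector.op("@@")(sa.func.websearch_to_tsquery("simple", "acme"))
)
```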
---
## 3) The search function (SQLAlchemy 2.x)
This function:
* Searches all relevant fields with full‑text first
* Adds fuzzy similarity fallback
* Handles email/phone substrings
* Scopes by `office_id` / `office_region_id` if provided
* Returns customers + a combined `score`, ordered and paginated
```python
from __future__ import annotations

import re
from typing import Optional

import sqlalchemy as sa
from sqlalchemy import select, func, literal, or_, and_, desc
from sqlalchemy.orm import Session

TRIGRAM_THRESHOLD = 0.25  # tune between 0.2 and 0.35 for your data


def search_customers(
    session: Session,
    q: str,
    *,
    office_id: Optional[int] = None,
    office_region_id: Optional[int] = None,
    status_in: Optional[list] = None,  # e.g. [CustomerStatus.ACTIVE]
    limit: int = 20,
    offset: int = 0,
):
    q = (q or "").strip()
    if not q:
        return []

    lower_q = q.lower()
    digits = re.sub(r"\D", "", q)
    is_email = "@" in q and " " not in q
    has_digits = len(digits) >= 4

    # Build tsquery; unaccent the query to match our unaccented documents.
    ts_query = func.websearch_to_tsquery("simple", func.unaccent(lower_q))

    # ---- 1) Index-driven full-text matches on each table --------------------

    # Customers FTS
    cust_fts = select(
        Customer.id.label("customer_id"),
        func.ts_rank_cd(Customer.search_vector, ts_query).label("rank"),
        literal(1.00).label("w"),  # strongest source
    ).where(Customer.search_vector.op("@@")(ts_query))

    # Contacts FTS
    contact_fts = select(
        CustomerContact.customer_id.label("customer_id"),
        func.ts_rank_cd(CustomerContact.search_vector, ts_query).label("rank"),
        literal(0.65).label("w"),
    ).where(CustomerContact.search_vector.op("@@")(ts_query))

    # Locations FTS
    loc_fts = select(
        Location.customer_id.label("customer_id"),
        func.ts_rank_cd(Location.search_vector, ts_query).label("rank"),
        literal(0.45).label("w"),
    ).where(Location.search_vector.op("@@")(ts_query))

    fts_union = sa.union_all(cust_fts, contact_fts, loc_fts).subquery("fts_union")
    fts_scored = (
        select(
            fts_union.c.customer_id,
            # take the best weighted rank among sources per customer
            func.max(fts_union.c.rank * fts_union.c.w).label("fts_score"),
        )
        .group_by(fts_union.c.customer_id)
        .subquery("fts_scored")
    )

    # ---- 2) Fuzzy similarity fallback (trigram) -----------------------------

    # Name/company similarity
    full_name = func.unaccent(
        func.lower(func.concat_ws(" ", Customer.first_name, Customer.last_name))
    )
    company_l = func.unaccent(func.lower(Customer.company))

    # Address similarity (across any location rows)
    addr_concat = func.unaccent(
        func.lower(
            func.concat_ws(
                " ",
                Location.line1,
                Location.line2,
                Location.city,
                Location.state,
                Location.zip,
                Location.country,
            )
        )
    )

    # Per-customer fuzzy score (max across related rows). Grouping by the
    # primary key lets Postgres accept the ungrouped customer expressions.
    fuzzy_scores = (
        select(
            Customer.id.label("customer_id"),
            func.greatest(
                func.similarity(company_l, lower_q),
                func.similarity(full_name, lower_q),
                func.coalesce(func.max(func.similarity(addr_concat, lower_q)), 0.0),
            ).label("fuzzy_score"),
        )
        .join(Location, Location.customer_id == Customer.id, isouter=True)
        .group_by(Customer.id)
        .subquery("fuzzy_scores")
    )

    # ---- 3) Email/phone heuristics (ZIPs already hit the locations FTS) -----

    # Email contains (fast path). DISTINCT so a customer with several
    # matching contacts does not yield duplicate rows after the outer join.
    email_match = None
    if is_email:
        email_match = (
            select(CustomerContact.customer_id)
            .distinct()
            .where(
                func.lower(CustomerContact.value).contains(lower_q),
                # optional: restrict to type = 'email'
            )
            .subquery("email_match")
        )

    # Phone digits contains (fast path)
    phone_match = None
    if has_digits:
        phone_match = (
            select(CustomerContact.customer_id)
            .distinct()
            .where(
                func.regexp_replace(CustomerContact.value, r"\D", "", "g").contains(digits)
            )
            .subquery("phone_match")
        )

    # ---- 4) Assemble final query --------------------------------------------

    # Base filter scope
    scope_conditions = []
    if office_id is not None:
        scope_conditions.append(Customer.office_id == office_id)
    if office_region_id is not None:
        scope_conditions.append(Customer.office_region_id == office_region_id)
    if status_in:
        scope_conditions.append(Customer.status.in_(status_in))

    # Join scored subqueries
    scored = (
        select(
            Customer.id,
            Customer.first_name,
            Customer.last_name,
            Customer.company,
            # coalesce: keep ordering well-defined when only one side matched
            func.coalesce(fts_scored.c.fts_score, 0.0).label("fts_score"),
            func.coalesce(fuzzy_scores.c.fuzzy_score, 0.0).label("fuzzy_score"),
            # Simple linear blend; tune weights to taste
            (
                func.coalesce(fts_scored.c.fts_score, 0.0) * 1.0
                + func.coalesce(fuzzy_scores.c.fuzzy_score, 0.0) * 0.35
            ).label("score"),
        )
        .outerjoin(fts_scored, fts_scored.c.customer_id == Customer.id)
        .outerjoin(fuzzy_scores, fuzzy_scores.c.customer_id == Customer.id)
    )

    # Require at least one source to match (FTS, fuzzy, or heuristics)
    where_any_match = [
        fts_scored.c.fts_score.is_not(None),
        fuzzy_scores.c.fuzzy_score >= TRIGRAM_THRESHOLD,
    ]
    if email_match is not None:
        scored = scored.outerjoin(email_match, email_match.c.customer_id == Customer.id)
        where_any_match.append(email_match.c.customer_id.is_not(None))
    if phone_match is not None:
        scored = scored.outerjoin(phone_match, phone_match.c.customer_id == Customer.id)
        where_any_match.append(phone_match.c.customer_id.is_not(None))

    if scope_conditions:
        scored = scored.where(and_(*scope_conditions))
    scored = scored.where(or_(*where_any_match))

    # Sorting & pagination
    scored = (
        scored.order_by(desc(sa.literal_column("score")), desc(sa.literal_column("fts_score")))
        .limit(limit)
        .offset(offset)
    )

    rows = session.execute(scored).all()

    # Post-process into plain dicts; swap in your DTOs as needed.
    return [
        {
            "id": r.id,
            "full_name": f"{r.first_name or ''} {r.last_name or ''}".strip(),
            "company": r.company,
            "score": float(r.score or 0.0),
            "fts_score": float(r.fts_score or 0.0),
            "fuzzy_score": float(r.fuzzy_score or 0.0),
        }
        for r in rows
    ]
```
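Calling it is then a one-liner per debounced keystroke; an illustrative invocation (session wiring depends on your application):

```python
# Illustrative call; `session` comes from your app's session factory.
results = search_customers(
    session,
    "jane acme",
    office_id=42,  # optional scoping, assumed field
    limit=10,
)
for hit in results:
    print(f"{hit['score']:.3f}  {hit['full_name'] or hit['company']}")
```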
### Why this shape?
* **Index‑friendly**: matching happens on each table’s `tsvector`/trigram indexes, then we union customer IDs. We avoid big ORs over many columns and avoid cartesian blowups from joining first.
* **Ranking that feels right**: Full‑text rank dominates, fuzzy gives helpful lift for typos/short prefixes.
* **Heuristics for special tokens**: Email and phone detection use narrow, fast filters to surface the obvious hit.
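One caveat for type-ahead specifically: `websearch_to_tsquery` has no prefix syntax, so a half-typed final token like `acm` won't full-text-match `acme` (the trigram fallback papers over this). If you want true prefix matching, a common technique is to build the tsquery yourself with `to_tsquery` and a `:*` suffix on the last token. A minimal sketch of that helper (not part of the function above):

```python
import re

from sqlalchemy import func


def prefix_tsquery(q: str):
    """Illustrative alternative to websearch_to_tsquery for type-ahead:
    AND all tokens together and prefix-match the one still being typed."""
    tokens = [t for t in re.split(r"\W+", q.lower()) if t]
    if not tokens:
        return None
    tokens[-1] += ":*"  # prefix operator; supported by to_tsquery only
    return func.to_tsquery("simple", func.unaccent(" & ".join(tokens)))
```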
---
## 4) UX affordances (front‑end)
* **Debounce** input 250–300ms; show spinner; return top N (10–20).
* **Highlight**: you can return the original `q` and let the FE highlight in the displayed name/company/address.
* **Bias primary location**: optionally boost customers whose `primary_location.city/state` match the user's chosen region by adding a small bonus to `score` (see the sketch after this list).
* **Empty state**: display guidance like “Try name, company, email, phone, or address”.
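For the location-bias bullet above, a server-side sketch: a small conditional bonus folded into the blended score. The `Location.is_primary` flag, the `0.15` bonus, and the state comparison are assumptions; adapt them to however you model primary locations:

```python
import sqlalchemy as sa
from sqlalchemy import case, func

user_state = "CA"  # e.g. taken from the requesting user's region

# Correlated scalar subquery: grant a 0.15 bonus when the customer's
# primary location (assumed `is_primary` flag) sits in the user's state.
location_bonus = func.coalesce(
    sa.select(case((func.upper(Location.state) == user_state, 0.15), else_=0.0))
    .where(Location.customer_id == Customer.id, Location.is_primary.is_(True))
    .correlate(Customer)
    .limit(1)
    .scalar_subquery(),
    0.0,
)

# ...then add `+ location_bonus` into the blended `score` expression
# from section 3.
```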
---
## 5) Performance notes & knobs
* **Thresholds**: `TRIGRAM_THRESHOLD` is the main recall/precision dial. Start at `0.25`; lower increases recall, higher increases precision. Note that bare `similarity()` calls do not use the trigram indexes; to get index-assisted candidate filtering, pre-filter with the `%` operator (which honors `pg_trgm.similarity_threshold`) against the same expressions the indexes were built on.
* **GIN memory**: If tables are large, ensure `maintenance_work_mem` and `work_mem` are adequate when building indexes.
* **Selective partial indexes**: If most customers are ACTIVE, add partial indexes with `WHERE status = 'ACTIVE'` to speed common queries.
* **Pagination**: `LIMIT/OFFSET` is fine for type‑ahead. For deep pagination, switch to keyset (seek) pagination on `(score, id)`, as sketched below.
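A sketch of that keyset page, assuming the client echoes back the last row's `score` and `id`, applied to the `scored` select from section 3 before its limit/offset. The computed label has to be wrapped in a subquery, since SQL allows labels in ORDER BY but not in WHERE:

```python
import sqlalchemy as sa


def apply_keyset(scored_stmt, last_score: float, last_id: int, page_size: int = 20):
    # Wrap the scored select so the computed `score` label is addressable
    # in WHERE; ties on score are broken by id descending.
    page = scored_stmt.subquery("page")
    return (
        sa.select(page)
        .where(
            sa.or_(
                page.c.score < last_score,
                sa.and_(page.c.score == last_score, page.c.id < last_id),
            )
        )
        .order_by(page.c.score.desc(), page.c.id.desc())
        .limit(page_size)
    )
```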
---
## 6) Optional: materialized search view (later)
For very large datasets or strict p95 latency budgets, create a denormalized `customer_search` **materialized view** with:
* `customer_id`
* a single merged `tsvector` (customers + primary contact + primary location)
* GIN index on that vector
Then query just that MV with FTS + trigram. Schedule `REFRESH MATERIALIZED VIEW CONCURRENTLY` or maintain a regular table via triggers. This trades freshness for speed but can cut tail latency further.
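A rough Alembic sketch of that MV (the `is_primary` flags are assumptions about your schema; the unique index is what `REFRESH ... CONCURRENTLY` requires):

```python
from alembic import op


def upgrade() -> None:
    # Merge the per-table vectors from section 1 into one row per customer.
    op.execute(
        """
        CREATE MATERIALIZED VIEW customer_search AS
        SELECT c.id AS customer_id,
               c.search_vector
                 || coalesce(ct.search_vector, '')
                 || coalesce(l.search_vector, '') AS search_vector
        FROM customers c
        LEFT JOIN customer_contacts ct ON ct.customer_id = c.id AND ct.is_primary
        LEFT JOIN locations l ON l.customer_id = c.id AND l.is_primary
        """
    )
    # The unique index makes REFRESH MATERIALIZED VIEW CONCURRENTLY possible.
    op.execute("CREATE UNIQUE INDEX ux_customer_search_id ON customer_search (customer_id)")
    op.execute("CREATE INDEX ix_customer_search_vector ON customer_search USING GIN (search_vector)")


def downgrade() -> None:
    op.execute("DROP MATERIALIZED VIEW IF EXISTS customer_search")
```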
---
### Summary
* The **per‑table tsvector + union** strategy delivers fast, accurate, and maintainable search entirely in Postgres with SQLAlchemy.
* The provided function is drop‑in: it does full‑text first, fuzzy fallback, and special handling for email/phone—then ranks, dedupes, and paginates cleanly.
* You can tune weights/thresholds without schema changes and optionally evolve to a materialized view if/when you need more speed.