Evan's List — E-Commerce Brand Acquisition Case Study

The Problem

Before & After Evan's List

Finding acquisition candidates was manual, inconsistent, and impossible to scale.

Before

Brand discovery done by hand — browsing a research tool one page at a time
No consistent scoring criteria — gut feel, not data
Leads scattered across CSVs; no pipeline visibility
Outreach copy written from scratch each time
No record of which brands had already been contacted
Scoring rules informal — locked in one person's head
No way to run recurring discovery at scale

After Evan's List

43,981 brands automatically discovered and scored, refreshed every month
43,981 brands ingested from the SmartScout API and scored, with a monthly cron and resumable long pulls
Every brand ranked into one of 6 target segments using transparent, agreed criteria
One source of truth tracks every brand, score, and conversation
A 13-table Cloudflare D1 state store tracks every brand, run, score version, and contact
A human review queue approves every batch before outreach goes out
Business team owns scoring rules and email copy through live web tools — no engineer needed
A brand can never be contacted twice: the system makes it impossible
Idempotent by design: unique indexes and suppression tables make double-contact impossible
Replies land in the CRM automatically; brands that convert are flagged as won

Architecture

The Intelligence Pipeline

From market data all the way to a warm conversation in the CRM — with exactly one human checkpoint, right where judgment matters.

Seven serverless stages from marketplace ingestion to CRM sync — every stage live, every stage idempotent and re-runnable.

🔍 Discovery: Market Data (SmartScout API). Status: Live
⚙️ Ingestion: Clean & Store (Dedup + D1 Upsert). Status: Live
📊 Scoring: Rank & Segment (6 Segments, Config-Driven). Status: Live
🧬 Enrichment: Find the Right Person (Apollo + Fallback Waterfall). Status: Live
👀 Human Review: Approve Every Batch (Review Queue UI). Status: Live
✉️ Outreach: Personalized Email (Smartlead, 6 Campaigns). Status: Live
🤝 CRM: Warm Replies + Wins (HubSpot + Win Detection). Status: Live

The whole flow runs itself on a monthly schedule, with retries, error handling, and alerts included. No servers to maintain, no engineer on call.

Orchestrated by trigger.dev: durable background tasks handle fan-out, retries with exponential backoff, and the monthly cron. State lives in Cloudflare D1; the client-facing tools and API run on Cloudflare Pages + Functions behind Zero Trust.

Engineering Decisions

Where We Did Things Differently

The challenges that required real architectural thinking — not just plugging in tools.

Intelligent Workaround

The Client's Data Warehouse Wasn't Ready. We Built Around It.

The client's long-term plan was an enterprise data warehouse — but provisioning it would take months. Instead of waiting, we stood up a lightweight serverless database as a stand-in, designed so the eventual warehouse can slot in later without rebuilding anything.

The target warehouse was Snowflake — but access wasn't provisioned and wouldn't be for months. Blocking the build on vendor onboarding wasn't acceptable, so we made Cloudflare D1 the interim state store, written against a swappable interface shaped like the eventual warehouse client.

The result: the entire system was built, tested, and launched months before the warehouse existed. The client lost zero velocity, and the upgrade path is already designed.

The result: the full pipeline shipped months ahead of warehouse provisioning. Post-launch, the migration was designed properly: a warehouse-only split where the transactional store keeps the hot path and Snowflake is fed in batch for analytics, never blocking a run (see What's Next).
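As a rough illustration of what "written against a swappable interface" can mean, here is a minimal Python sketch. The names StateStore, InMemoryStore, and record_brand are hypothetical and not taken from the project; the point is only that pipeline code depends on the interface, so a D1-backed or warehouse-backed class could be substituted without touching callers.

```python
from typing import Optional, Protocol


class StateStore(Protocol):
    """Hypothetical interface the pipeline codes against (an assumption, not the project's API)."""

    def upsert_brand(self, brand_id: str, fields: dict) -> None: ...

    def get_brand(self, brand_id: str) -> Optional[dict]: ...


class InMemoryStore:
    """Stand-in backend; a D1- or warehouse-backed class would expose the same methods."""

    def __init__(self) -> None:
        self._brands: dict = {}

    def upsert_brand(self, brand_id: str, fields: dict) -> None:
        # Merge into any existing row so repeated calls are idempotent.
        self._brands.setdefault(brand_id, {}).update(fields)

    def get_brand(self, brand_id: str) -> Optional[dict]:
        return self._brands.get(brand_id)


def record_brand(store: StateStore, brand_id: str, name: str) -> None:
    # Pipeline code sees only the interface, never a concrete backend.
    store.upsert_brand(brand_id, {"name": name})


store = InMemoryStore()
record_brand(store, "abc123", "Acme")
record_brand(store, "abc123", "Acme Renamed")  # same ID, updated display name
```

Swapping backends then means writing one new class with the same two methods.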

Business-Owned Logic

Scoring Is Usually Buried in Code. We Built a UI for It.

In most systems, the rules for what makes a "good lead" live in code only an engineer can change. We built a live web tool where the commercial team defines those rules themselves — which criteria are must-haves, which are dealbreakers, which are nice-to-haves — and instantly sees how real brands re-rank as they adjust.

The scoring model is fully config-driven: each of the 6 segments is defined by must-have gates, dealbreakers, and nice-to-have criteria, edited in an authenticated web configurator with a live preview of real brands re-scoring as you type. Saves are versioned rows in the database — the pipeline picks up the active config on its next run. No deploy required.

43 saved versions of the scoring rules were iterated through with the client's leadership before sign-off, with every change visible, previewable, and reversible. A "Why suppressed?" explainer shows exactly why any brand was filtered out.

43 config versions iterated to calibration sign-off. Re-scoring writes a new set of score rows tagged with its config_version, so a tweaked config can be compared against the same cohort without re-ingesting. The "Why suppressed?" tooltip traces every filter decision.
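The version-tagged score rows can be pictured with a small sketch. It uses Python's built-in sqlite3 as a stand-in for D1; the table and column names (scores, brand_id, config_version, score) and the helper names are assumptions for illustration, not the project's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE scores (
           brand_id       TEXT    NOT NULL,
           config_version INTEGER NOT NULL,
           score          REAL    NOT NULL,
           PRIMARY KEY (brand_id, config_version)
       )"""
)


def write_scores(version, scores):
    """Store one score row per brand for this config version (re-runnable)."""
    conn.executemany(
        "INSERT OR REPLACE INTO scores VALUES (?, ?, ?)",
        [(brand, version, value) for brand, value in scores.items()],
    )


def compare(old_version, new_version):
    """Return the score change per brand between two config versions."""
    rows = conn.execute(
        """SELECT a.brand_id, a.score, b.score
           FROM scores a JOIN scores b ON a.brand_id = b.brand_id
           WHERE a.config_version = ? AND b.config_version = ?
           ORDER BY a.brand_id""",
        (old_version, new_version),
    ).fetchall()
    return {brand: new - old for brand, old, new in rows}


write_scores(42, {"b1": 75.0, "b2": 50.0})
write_scores(43, {"b1": 100.0, "b2": 50.0})
write_scores(43, {"b1": 100.0, "b2": 50.0})  # re-running does not duplicate rows
```

Because old rows are never overwritten by a new version, the same cohort can be compared across configs with a single join.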

Infrastructure-Free Orchestration

Background Jobs with Retries, Fan-Out, and Cron — No Servers.

Scoring and enriching tens of thousands of brands can't happen in one click. Normally that means servers, job queues, and someone on call. Instead, the pipeline runs as managed background jobs that retry themselves when a vendor hiccups and fire automatically every month.

Enriching hundreds of brands against rate-limited vendor APIs can't happen synchronously. trigger.dev provides durable task execution: the monthly cron fires on the first weekend of each month and fans out ingestion → scoring → enrichment tasks, each with maxAttempts: 3 and exponential backoff.
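The retry behaviour described here is provided by trigger.dev in the real system. Purely to illustrate the idea of "three attempts with exponential backoff", the following generic Python sketch shows the pattern; run_with_retries and its parameters are hypothetical and are not trigger.dev's API.

```python
import time


def run_with_retries(task, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call task(); on failure wait base_delay * 2**(attempt - 1) seconds and retry."""
    for attempt in range(1, max_attempts + 1):
        try:
            return task()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(base_delay * 2 ** (attempt - 1))
```

Passing the sleep function as a parameter keeps the helper testable without real waiting.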

Self-healing by default: if a data provider rate-limits or times out mid-run, the system backs off and resumes on its own. Nobody gets paged.

Long pulls got their own lane: full-cohort ingestion and enrichment runs exceed serverless time limits, so resumable watchdog scripts checkpoint progress and pick up where they left off. The 43,981-brand cohort was pulled without a single manual restart.
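A checkpointed, resumable pull can be sketched in a few lines. This is an illustration under assumptions: the checkpoint is a small JSON file, pages are numbered from zero, and resumable_pull, fetch_page, and sink are hypothetical names rather than the project's watchdog scripts.

```python
import json
import os


def resumable_pull(fetch_page, total_pages, checkpoint_path, sink):
    """Fetch pages in order, persisting progress after each one.

    Returns the page index this call started from (0 on a fresh run).
    """
    start = 0
    if os.path.exists(checkpoint_path):
        with open(checkpoint_path) as fh:
            start = json.load(fh)["next_page"]
    for page in range(start, total_pages):
        sink.extend(fetch_page(page))  # may raise on a timeout or rate limit
        with open(checkpoint_path, "w") as fh:
            json.dump({"next_page": page + 1}, fh)  # progress survives a crash
    return start
```

If the process dies after a page is stored but before the checkpoint is written, that page is fetched again on resume, which is harmless only when the downstream writes are idempotent upserts.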

Safe-by-Default Design

The Pipeline Can't Send an Email — or Spend a Dollar — by Accident.

A system that can email thousands of prospects or rack up vendor charges needs hard brakes. We built two independent safety switches — one for spending on contact data, one for sending email — and both default to "off." Going live was a deliberate human decision, twice.

Two independent environment gates protect the blast radius: ENRICH_MODE gates paid enrichment API spend, PIPELINE_MODE gates live sends. Unless a human explicitly sets production in the deployment config, tasks return early with a logged {skipped: true} — dry-run is the default everywhere.
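A minimal sketch of the two gates, assuming each is read from an environment variable and that only the exact value "production" enables live behaviour. The variable names ENRICH_MODE and PIPELINE_MODE and the {skipped: true} result come from the text; the function names and the extra result fields are hypothetical.

```python
import os

PRODUCTION = "production"  # any other value, or no value at all, means dry-run


def is_live(gate_name, environ=None):
    """True only when the named gate is explicitly set to production."""
    env = os.environ if environ is None else environ
    return env.get(gate_name) == PRODUCTION


def enrich_contact(brand_id, environ=None):
    if not is_live("ENRICH_MODE", environ):
        return {"skipped": True}  # no paid API call is made
    return {"skipped": False, "brand_id": brand_id, "action": "enrich"}


def send_email(brand_id, environ=None):
    if not is_live("PIPELINE_MODE", environ):
        return {"skipped": True}  # no email leaves the system
    return {"skipped": False, "brand_id": brand_id, "action": "send"}
```

Because each function checks its own variable, enabling enrichment spend does not enable sending, and vice versa.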

Deliverability protected too: if email bounces start creeping up, the system raises an alert before the sending platform would penalize the domains, protecting the client's sender reputation automatically.

Bounce guardrail: webhook-driven bounce tracking alerts before Smartlead's 5% auto-pause threshold is reached, so deliverability issues are caught while they're still cheap to fix. Sends are throttled to ~15–20/day across 6 warmed mailboxes on 3 domains.
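The guardrail amounts to comparing a running bounce rate against a threshold set below the platform's own. In the sketch below the 5% pause level comes from the text, while the 3% alert level, the function name, and the status strings are assumptions chosen for illustration.

```python
def bounce_status(sent, bounced, alert_at=0.03, pause_at=0.05):
    """Classify the current bounce rate; alert_at is an assumed early-warning level."""
    if sent == 0:
        return "ok"  # nothing sent yet, so there is no rate to judge
    rate = bounced / sent
    if rate >= pause_at:
        return "critical"  # at or past the platform's auto-pause level
    if rate >= alert_at:
        return "alert"  # early warning while there is still room to act
    return "ok"
```

In practice the counts would be updated from bounce webhooks, with the alert routed to whoever owns the sending domains.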

Data Model Discipline

Stable Identity: A Brand Is Never Contacted Twice.

Brands rename themselves, storefronts get rebranded, and data refreshes monthly. The system recognizes the same brand across all of it — so a prospect who already replied, or asked to be left alone, can never accidentally be emailed again. That protection is built into the database itself, not a checklist.

Brand identity is a content hash of an immutable natural key — sha256(storefront_url)[:12] — so records survive display-name changes across monthly runs. Outreach state is a strict truth table: replied and unsubscribed brands are permanently suppressed, enforced by unique indexes rather than application logic.

Why it matters: in cold outreach, double-emailing someone who said "no thanks" burns trust and domains. Here, it's structurally impossible.

Every run is re-runnable: upserts are idempotent, side-effects (notifications, status writes) are best-effort and never fail the pipeline, and every execution is logged to an auditable run history.
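The two ideas in this section, a content-hash ID and database-enforced single contact, can be sketched together. The expression sha256(storefront_url)[:12] is from the text; using the hex digest, hashing the URL without normalization, and the table, index, and function names are assumptions. sqlite3 stands in for D1 here.

```python
import hashlib
import sqlite3


def brand_id(storefront_url):
    """First 12 hex characters of the SHA-256 of the storefront URL."""
    return hashlib.sha256(storefront_url.encode("utf-8")).hexdigest()[:12]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE outreach (brand_id TEXT NOT NULL, status TEXT NOT NULL)")
# The unique index, not application code, is what forbids a second contact row.
conn.execute("CREATE UNIQUE INDEX one_contact_per_brand ON outreach (brand_id)")


def try_contact(storefront_url):
    """Record a contact; return False if this brand was already contacted."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO outreach VALUES (?, 'contacted')",
                (brand_id(storefront_url),),
            )
        return True
    except sqlite3.IntegrityError:
        return False
```

Since the ID depends only on the storefront URL, a display-name change does not produce a new brand, and the second insert for the same URL is rejected by the index.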

Correctness Engineering

One Scoring Model, Two Runtimes — Kept Honest by Tests.

The same scoring rules run in two places: the monthly pipeline, and the live preview the business team sees in their browser. If those ever drifted apart, the team would be approving one thing and sending another. Automated checks compare both against the same examples on every change — they cannot silently disagree.

The scoring engine exists in Python (pipeline) and JavaScript (live preview + API). Dual implementations of one business rule is a drift bug waiting to happen — so two parity test suites run both engines against shared fixtures and fail the build on any divergence, down to segment assignment and suppression reasons.

And we said so out loud: in our post-launch review we flagged this duplication as the first thing to consolidate in the next version (see What's Next).

Honest assessment: parity tests are the mitigation, not the cure. The v2 architecture collapses this into a single shared implementation, with one source of truth per business rule (see What's Next).

What We Built

6 Production Systems

Every component is deployed, documented, and running in production — in daily use by the client's team.

🔄

Monthly Discovery Engine

Automatically pulls the latest marketplace data every month, filters out brands below the revenue floor, removes anyone already known or previously contacted, and files the rest for scoring.

Paginated SmartScout ingestion with a $1M+ revenue floor, SHA-256 content-hash deduplication, and suppression-list cross-reference on every run. Fires on a monthly cron; long pulls run as resumable, checkpointed jobs.
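The filtering steps named above (revenue floor, deduplication, suppression cross-reference) can be sketched as one generator over pages. The record shape, with "id" and "revenue" keys, and the function name are assumptions; no real SmartScout API call is shown.

```python
REVENUE_FLOOR = 1_000_000  # the $1M+ floor from the text


def filter_candidates(pages, known_ids, suppressed_ids):
    """Yield each qualifying brand once, skipping known or suppressed ones."""
    seen = set()
    for page in pages:
        for brand in page:
            bid = brand["id"]
            if brand["revenue"] < REVENUE_FLOOR:
                continue  # below the revenue floor
            if bid in seen or bid in known_ids or bid in suppressed_ids:
                continue  # duplicate, already in the store, or never to be contacted
            seen.add(bid)
            yield brand
```

Writing it as a generator means pages can be processed as they arrive instead of holding the whole cohort in memory.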

43,981 Brands Ingested

Monthly Auto-Refresh

$1M+ Revenue Floor

🎛️

Scoring Configurator

A live web tool where the commercial team defines what a great prospect looks like — for each of 6 target segments — and watches real brands re-rank instantly. No engineer required to change strategy.

Authenticated configurator for the criteria-based scoring model: must-have gates, dealbreakers, and nice-to-have bands per segment, with a live preview scoring real brands client-side and versioned config saves the pipeline reads at runtime.

6 Segments

43 Config Versions

Live Preview

🗄️

Edge State Store & API

One source of truth for everything — every brand, every score, every email sent, every reply — served globally with effectively zero infrastructure cost.

Cloudflare D1 as the pipeline's source of truth: 13 tables covering brands, score versions, runs, outreach history, review queue, contacts, and config — evolved through 9 schema migrations, fronted by ~30 serverless API endpoints on Pages Functions.

13 Tables

~30 API Endpoints

$0 Infra Cost

🧬

Contact Enrichment Waterfall

For every promising brand, the system finds the actual decision-maker — name, title, and verified email — trying a second data provider automatically when the first comes up empty.

Apollo.io decision-maker discovery with a BetterContact email-reveal fallback, persona-mapped titles, full audit logging per enrichment attempt, and multi-contact support per brand. Spend-gated behind its own environment switch.
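The waterfall reduces to trying providers in order and logging every attempt. In this sketch the providers are plain callables standing in for the vendor integrations; the function name, the audit-entry fields, and the stub return values are hypothetical, and no real Apollo.io or BetterContact API is invoked.

```python
def enrich(brand_id, providers, audit_log):
    """Try each (name, lookup) provider in order; return the first email found."""
    for name, lookup in providers:
        email = lookup(brand_id)
        # Every attempt is recorded, whether it hit or missed.
        audit_log.append(
            {"brand_id": brand_id, "provider": name, "hit": email is not None}
        )
        if email is not None:
            return email  # stop at the first hit; later providers are not called
    return None


audit = []
providers = [
    ("primary", lambda brand_id: None),  # stub: first provider misses
    ("fallback", lambda brand_id: "buyer@example.com"),  # stub: second one hits
]
email = enrich("abc123", providers, audit)
```

Stopping at the first hit keeps the paid fallback from being called when the primary provider already succeeded.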

620+ Contacts Found

2-Step Waterfall

Logged Every Attempt

👀

Review Queue & Sequence Editor

Humans stay in charge: a review dashboard where the team approves or rejects each prospect before any email goes out, plus an editor for writing the outreach copy for each segment — all in the browser.

Human-in-the-loop review UI with per-brand notes, segment overrides, and approval state feeding the send queue — plus a segment-tabbed sequence editor whose copy is versioned in the database and wired into the sending platform on approval.

100% Human-Approved

6 Sequences

Team Editable

🤝

Outreach → CRM, Closed Loop

Approved prospects get personalized email from warmed-up sending addresses. Replies flow straight into the client's CRM, and when a prospect becomes a partner, the system spots it and marks the win automatically — so results are measured, not guessed.

Six Smartlead campaigns (one per segment) across 6 mailboxes / 3 domains; reply, bounce, and unsubscribe webhooks write back to the state store; warm replies sync to HubSpot; automatic win detection flags converted brands. Live dashboard reports funnel stats end-to-end.

6 Campaigns

Auto Win Detection

Live Dashboard

By The Numbers

What's Running in Production

Real figures from the live system at handoff.

🔍

43,981

Brands Ingested & Scored

Full marketplace cohort, refreshed monthly

🔥

0+

Hot-Tier Prospects Identified

Ranked by the client's own criteria

🧬

620+

Decision-Maker Contacts

Name, title & verified email

🎛️

43

Scoring Config Iterations

Calibrated with client leadership to sign-off

📬

6

Warmed Sending Mailboxes

Across 3 dedicated domains

🗄️

13

Database Tables

Evolved through 9 zero-downtime migrations

🛡️

0

Brands Contacted Twice

Enforced by the data model, not a checklist

🚨

0

Accidental Sends or Spend

Dry-run by default, dual production gates

How We Got There

10 Weeks, Idea to Production

A phase-by-phase breakdown of the engagement.

Weeks 1–2 — Discovery & Data Design

🔍 Map the Data Landscape Before Writing Code

Audited the client's existing assets — brand lists, a partially populated CRM, and their market-data subscription. Defined the revenue floor, the brand identity scheme, and the full database design before any pipeline code existed.

✓ Complete Schema Design Data Audit

Key decisions made in this phase:

Revenue floor set at $1M/year — filters noise, keeps the dataset focused on actionable targets
Brand IDs as sha256(storefront_url)[:12] — stable across name changes, portable across future data stores
Every brand gets a permanent fingerprint, so it's recognized even if it renames itself
Warehouse access confirmed unavailable → D1 interim-store decision made here, behind a swappable interface
The client's enterprise warehouse wasn't ready — the stand-in database decision was made here, with the upgrade path designed in
Deduplication rules defined against the existing CRM and past outreach history

Weeks 2–4 — Ingestion Engine & State Store

⚙️ Market Data Flowing, Automatically

Built the complete ingestion layer: paginated market-data pulls with revenue filtering, deduplication, and full run logging — deployed as scheduled background jobs with the database live at the edge.

✓ Complete Python 3.11 Cloudflare D1 trigger.dev

Key deliverables:

Ingestion with discover (audit-only) and ingest (upsert) modes — safe to inspect before writing
A "preview mode" that audits what would change before anything is written
Python D1 HTTP client — parameterized queries, full error handling, run audit trail via start_run()/finish_run()
Every pipeline execution logged with an auditable start/finish record
First end-to-end run validated in under 20 seconds, discovery to state store
Monthly cron deployed with a production-only guard; fan-out into downstream batch tasks

Weeks 3–5 — Business-Owned Tools

🎛️ Scoring Configurator + Sequence Editor

Built the browser tools that put strategy in the business team's hands: the live scoring configurator and the email sequence editor — both deployed behind enterprise access control, both in daily use since.

✓ Complete Scoring UI Sequence Editor Zero Trust

Highlights:

Live preview — sample brands re-score and re-segment in real time as criteria change
"Why suppressed?" explainer on every filtered brand — full transparency into the model
Config persisted via the scoring-config API — versioned in D1, instantly readable by the pipeline; sendBeacon fallback saves on page close
Changes save to the cloud instantly and take effect on the next run — no deploys, no tickets
Sequence editor: one pane per segment, multi-step copy, fully owned by the commercial team

Weeks 5–7 — Scoring Model v2 & Calibration

🧠 From Weighted Sliders to Criteria the Client Actually Thinks In

The first scoring model used abstract weights — mathematically sound, but not how the client reasoned about prospects. We rebuilt it around their real language: must-haves, dealbreakers, and nice-to-haves per segment — then calibrated it with leadership through 43 saved iterations to sign-off.

✓ Complete Criteria Model Parity Tests

What changed and why:

Score = must-have gates → dealbreaker filters → 50 + 50 × (nice-to-haves hit ÷ known); brands match to their best-fit segment A–F
Brands must pass the non-negotiables, get excluded for dealbreakers, then earn points for every bonus criterion — and land in whichever of the 6 segments fits best
Unknown data handled honestly — a brand isn't penalized for missing information, and the math says so
Python and JS engines locked in step by two parity suites running shared fixtures in CI
6 target segments finalized with the client's growth leadership; hot/warm/cold tiers defined
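The scoring rule listed above can be written out as a short function. This is a sketch under stated assumptions: criteria are (field, predicate) pairs, a must-have on missing data fails the gate, a dealbreaker on missing data does not trigger, and a brand with no known nice-to-have data scores 50. Those choices and all names are illustrative, not the project's engine.

```python
from typing import Callable

Criterion = tuple[str, Callable[[object], bool]]  # (field name, predicate on its value)


def _holds(brand: dict, criterion: Criterion) -> bool:
    field, predicate = criterion
    value = brand.get(field)
    return value is not None and predicate(value)


def score_brand(brand, must_haves, dealbreakers, nice_to_haves):
    """Return (score, reason); score is None when the brand is suppressed."""
    for criterion in must_haves:
        if not _holds(brand, criterion):
            return None, "must-have failed: " + criterion[0]
    for criterion in dealbreakers:
        if _holds(brand, criterion):
            return None, "dealbreaker: " + criterion[0]
    # Only criteria with data count toward the denominator, so unknowns do not penalize.
    known = [c for c in nice_to_haves if brand.get(c[0]) is not None]
    if not known:
        return 50.0, "scored"  # assumption: base score when nothing is known
    hits = sum(1 for c in known if _holds(brand, c))
    return 50 + 50 * hits / len(known), "scored"


must = [("revenue", lambda v: v >= 1_000_000)]
deal = [("category", lambda v: v == "restricted")]
nice = [("reviews", lambda v: v >= 100), ("rating", lambda v: v >= 4.5)]
```

The returned reason string is the kind of value a "Why suppressed?" explainer could display. Assigning a brand to its best-fit segment would mean running this once per segment config and keeping the highest score, which is left out here.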

Weeks 7–8 — Enrichment & Human Review

🧬 Finding the Right Person, Then Asking Permission

Wired up the two-provider contact enrichment waterfall and the human review queue — 620+ decision-maker contacts resolved across the hot tier, every one routed through the review dashboard before becoming eligible for outreach.

✓ Complete Apollo.io Review Queue

Key deliverables:

Enrichment waterfall: Apollo.io discovery → BetterContact email reveal on miss; every attempt logged to an enrichment audit table
If the first data provider can't find a verified email, a second one tries automatically — and every attempt is logged
Persona mapping — titles matched to the decision-maker profile each segment targets
Review dashboard with approval state, notes, and segment overrides per brand
Spend gate (ENRICH_MODE) kept enrichment in dry-run until unit economics were confirmed

Weeks 8–9 — Outreach Goes Live

🚀 Six Campaigns, Warmed Domains, Guardrails On

Flipped the switch — deliberately. Six segment-specific campaigns launched across six warmed mailboxes on three dedicated domains, with reply, bounce, and unsubscribe events flowing back into the system in real time.

✓ Complete Smartlead Webhooks

Launch architecture:

Approved brands enroll into the campaign matching their segment — copy tailored to their profile
Reply/bounce/unsubscribe webhooks write to outreach_history; merge-field integrity covered by tests
Every reply, bounce, and unsubscribe updates the system's memory instantly
Send volume deliberately throttled (~15–20/day) to protect domain reputation while scaling
Bounce guardrail alerts before the sending platform's automatic penalty threshold

Weeks 9–10 — CRM Loop & Handoff

🤝 Close the Loop, Hand Over the Keys

Warm replies sync into the client's CRM automatically, converted brands are detected and marked as wins, and a live dashboard reports the full funnel. Delivered with complete documentation, runbooks, and a walkthrough for the client's team.

✓ Complete HubSpot Dashboard Handoff Docs

Delivered at handoff:

HubSpot sync on warm reply + automatic win detection reconciling CRM state against the brand ledger
When a prospect becomes a partner, the system notices on its own and records the win
Live funnel dashboard: ingested → scored → enriched → reviewed → contacted → replied → won
Runbooks, troubleshooting guides, and a full handoff document — maintained in the repo, not a PDF that rots
Mid-project the client consolidated CRMs — the pipeline absorbed the switch without missing a run

Technology

Tech Stack

Every tool chosen for a reason — reliability, fit, and zero unnecessary infrastructure.

🐍 Python 3.11 📘 TypeScript ⚡ trigger.dev ☁️ Cloudflare Pages + Functions 🗄️ Cloudflare D1 🔐 Cloudflare Zero Trust 🔍 SmartScout API 🧬 Apollo.io 📧 BetterContact ✉️ Smartlead 🤝 HubSpot 📨 Resend 🏔️ Snowflake (analytics roadmap) 🤖 Claude Code 📋 REST / Webhooks

What's Next

We Audit Our Own Work

After launch, we ran a formal review of our own architecture — what earned its keep, and what we'd sharpen. That honesty is the point: the client got a written evolution plan, not a black box.

Post-launch, we ran a formal architecture retrospective and designed the v2 path. The verdict: the data model was the right investment; the seams between runtimes are where the cost lived. Here's the evolution plan the client received.

⚡ Pipeline

discover → score → enrich → outreach, monthly

trigger.dev orchestration — ingest → score → enrich → outreach

↓ hot-path reads & writes

🗄️ Transactional Store

the fast, operational memory the pipeline runs on

operational state: brands, scores, review queue, outreach history

↓ batch load after each run

🏔️ Analytics Warehouse

where the client's analysts explore history and report on results

Snowflake: cohort & score history, campaign events, conversions — fed, never blocking

Evolution 1

The Warehouse Split

The enterprise warehouse finally arrives — but as an analytics layer, not a replacement. The fast operational database keeps running the day-to-day pipeline; the warehouse receives a copy of everything for the client's analysts to slice. Each system does what it's best at.

Snowflake is a columnar analytics warehouse, not a transactional store — so the design splits responsibilities rather than migrating wholesale. The SQLite-class store keeps the hot path; Snowflake is batch-fed cohort history, score versions, campaign events, and conversions after each run. The pipeline never blocks on a warehouse query.

Guiding rule: prove the new setup produces identical results to the current one before changing anything else. Migrate first, improve second, and never both at once.

Guiding rule: migrate to parity first, refactor second. The new infrastructure must reproduce identical scores, counts, and events against the current baseline before any logic changes ship.

Evolution 2

One Language, End to End

The system was built in three programming languages — each fine on its own, but every translation point between them added friction. Our retrospective's biggest lesson: next time, one language everywhere. Simpler to maintain, cheaper to extend, easier for any future engineer to pick up.

Python pipeline, TypeScript orchestration wrappers, JS API — every meaningful pain point traced to a boundary between languages, not to any one of them: results parsed off stdout via a sentinel line, and a pipeline that reached its own database through its own HTTP API because the runtimes couldn't share bindings. The v2 default: TypeScript end-to-end with native database bindings — or, if the data work demands Python, a Python-reachable store so the hop disappears.

Also on the list: collapsing the duplicated scoring logic into a single shared implementation, so the rule exists in exactly one place.

Also on the list: one source of truth per business rule. The Python/JS scoring twins (today kept honest by parity tests) collapse into a single implementation every consumer imports.

Evolution 3

What We'd Do Again, Verbatim

Not everything changes. The foundation — how brands are identified, how every version of the scoring rules is preserved, how the system protects prospects from double-contact — proved itself and carries into v2 untouched. Getting the data foundation right early is what made everything else fixable.

The retrospective's other half: the data model earned its keep. Content-hash IDs from immutable natural keys, config-versioned score rows enabling cohort-level A/B of scoring changes, and idempotent runs with best-effort side-effects all carry forward as standing patterns for every future build.

The takeaway: invest early in the parts a rewrite can't easily fix. Everything else can evolve.

The takeaway: invest early in the ID strategy and schema, because it's the part a rewrite can't cheaply fix. Runtimes, frameworks, and vendors are all swappable above a sound data model.

The Approach

Why This Works

The best automation systems aren't the ones where the engineer did everything — they're the ones where the business team can change what matters without calling an engineer. Scoring criteria, email copy, and campaign approval all live in tools the commercial team owns. Engineering owns the infrastructure. Strategy stays where it belongs.

— The Evan's List Approach

Engagement Scope

What's Included

⏱️

~200 Hours, 10 Weeks

Scoped in phases with a fixed cap, billed against actuals. From first discovery call to production handoff — including a mid-project CRM switch absorbed without a change order.

📋

Full Source Code

Every script, task, function, and UI delivered to the client's own repository. No vendor lock-in on the intelligence layer — they own the logic.

📖

Living Documentation

Runbooks, troubleshooting guides, a changelog, and a full handoff document — maintained in the repository and kept current through the final week, not written after the fact.

🧭

An Honest Roadmap

A written post-launch retrospective and a costed evolution plan — what to build next, what to leave alone, and what we'd do differently. Clients deserve the real answer.

From Gut Feel
to a Brand Intelligence
Engine.
