On 25 April, the first filter was finance.
iGent had a paid cloud coding agent with a specific advantage: long-horizon work, the kind where landing closer to the target is worth more than a cheap demo. The first London users could be startups, but not the “noodles level” founder who likes the idea and cannot pay for it. The product needed companies where engineering throughput already had a budget line.
Quant research was the obvious answer. London has funds, technical teams, and software that touches money quickly enough to make better tools feel rational. A lazy lead agent could have stopped there and produced a handsome list: hedge funds, quant shops, infrastructure-heavy fintechs, all ranked with confident little explanations.
Research became useful when it disappointed that instinct.
One attractive row had everything a shallow search likes: London office, visible funding, AI language, engineering jobs, and a plausible story about tools making expensive developers faster. Then the evidence column opened. The jobs were mostly product application roles. The AI language belonged to marketing. The integration surface was thin. Nobody obvious owned developer tooling, platform engineering, or engineering productivity.
No outreach. Monitor.
Lead generation becomes interesting only when the research changes the map you would have drawn by hand. Finance stayed in the set; it did not keep the crown. Late-stage fintech, AI infrastructure, developer tools, applied automation, regtech, and bioinformatics looked less glamorous in some cases, but often had better first-run properties: public engineering pain, inspectable product surfaces, and a buyer a human could name without writing fan fiction.
Search after fit
The original deliverable was not a lead list. It was architecture plus an opinionated ICP and scoring rubric, grounded in London market research, running on Cloudflare’s agents stack. That wording mattered. Architecture without an ICP would only automate wandering. Search before fit lets the web write the scorecard: strong SEO becomes market fit, a careers page fond of the word platform becomes engineering intensity, and a vague funding quote turns into urgency.
For a paid coding agent, the dimensions were plain enough: London presence, engineering intensity, technical surface area, regulated or security-sensitive pressure, urgency around scale, and buyer accessibility. Each dimension had to carry evidence. A missing dimension stayed missing.
The first rubric was deliberately boring:
{
  "dimensions": {
    "local_presence": 15,
    "engineering_intensity": 20,
    "category_fit": 25,
    "regulated_or_security_need": 15,
    "scale_urgency": 15,
    "buyer_accessibility": 10
  },
  "hard_disqualifiers": [
    "no observable product or engineering surface",
    "no plausible buyer",
    "procurement path incompatible with sales motion",
    "technical constraint makes the product unusable"
  ]
}
Arithmetic was only a handle. It gave the agent something to be wrong against. Without a scorecard, the run produced confidence. With one, it produced a claim that could be inspected, challenged, and lowered.
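To make the arithmetic concrete, here is a minimal sketch of how the rubric could be totalled. The weights are copied from the rubric; the function name `total_score` and the shape of its result are assumptions for illustration, not the code the run used.

```python
# Weights copied from the rubric; everything else here is a hypothetical sketch.
WEIGHTS = {
    "local_presence": 15,
    "engineering_intensity": 20,
    "category_fit": 25,
    "regulated_or_security_need": 15,
    "scale_urgency": 15,
    "buyer_accessibility": 10,
}


def total_score(scores, disqualifiers=()):
    """Sum per-dimension scores. A dimension with no entry contributes zero."""
    unknown = set(scores) - set(WEIGHTS)
    if unknown:
        raise ValueError(f"unknown dimensions: {sorted(unknown)}")
    for name, value in scores.items():
        if not 0 <= value <= WEIGHTS[name]:
            raise ValueError(f"{name}={value} is outside 0..{WEIGHTS[name]}")
    return {
        "score": sum(scores.values()),
        "max": sum(WEIGHTS.values()),
        # Missing dimensions are reported, never silently filled in.
        "missing": [name for name in WEIGHTS if name not in scores],
        # Any hard disqualifier flags the row regardless of the total.
        "disqualified": len(disqualifiers) > 0,
    }
```

The six weights add up to 100, so the total reads as a percentage. Keeping `missing` and `disqualified` beside the number is the point of the sketch: the total stays a claim to inspect rather than a verdict.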
Receipts before rank
Collection could not begin with “top ten companies in London.” It had to begin with candidate evidence: source trails, not summaries; primary sources where possible, not recycled market reports; failed searches kept in the notes instead of silently discarded.
A careers page supports engineering intensity. A security page supports regulated pressure. A public integration guide supports technical surface area. A funding announcement may support runway, but it does not prove buyer access. A vague article about digital transformation mostly supports the continuing employment of people who write vague articles about digital transformation.
Weak evidence is allowed. It just has to stay weak. Agent research often goes soft here. A third-party profile says a company is expanding, the agent repeats “rapid growth,” and by the end of the run the phrase has become a fact with a score attached.
Slow the work down at that exact point. Mark source quality. Separate direct evidence from inference. Make unsupported dimensions expensive.
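One way to make unsupported dimensions expensive is to cap each score by the quality of its best source. The sketch below assumes four quality labels and illustrative cap percentages; none of these numbers come from the original run.

```python
# Hypothetical caps: the share of a dimension's maximum that each source
# quality is allowed to earn. The labels and percentages are assumptions.
CAP_PERCENT = {
    "primary": 100,   # company's own careers page, docs, security page
    "secondary": 60,  # third-party profile or press coverage
    "inferred": 30,   # the agent's reasoning, no direct source
    "missing": 20,    # nothing found: a low score, not a guess
}


def capped_score(proposed, maximum, source_quality):
    """Clamp a proposed score to the ceiling its source quality allows."""
    if source_quality not in CAP_PERCENT:
        raise ValueError(f"unknown source quality: {source_quality}")
    ceiling = maximum * CAP_PERCENT[source_quality] // 100
    return max(0, min(proposed, ceiling))
```

With these caps, a third-party "rapid growth" claim cannot lift a 20-point dimension above 12, however confident the summary sounds.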
That is why the first downgrade mattered. The source trail did not merely decorate the score. It changed the next action.
Downgrading was not a conservative flourish. It was the product doing its job.
Score in public
Every candidate needed four fields per dimension: score, maximum score, evidence used, and the reason that evidence was enough or not enough. No evidence means a low score. Indirect evidence means a cautious score. A hard disqualifier kills the candidate even when the rest of the row looks tempting.
Bad output reads like this:
{
  "company": "ExampleCo",
  "fit": "Strong",
  "reason": "ExampleCo is growing quickly and investing in AI."
}
Better output exposes the check (two of the six dimensions shown):
{
  "company": "ExampleCo",
  "tier": "monitor",
  "score": 61,
  "breakdown": {
    "engineering_intensity": {
      "score": 14,
      "max": 20,
      "evidence": ["hiring page lists application engineering roles"],
      "source_quality": "primary"
    },
    "buyer_accessibility": {
      "score": 2,
      "max": 10,
      "evidence": ["no platform, DevEx, infra, or tooling owner found"],
      "source_quality": "missing"
    }
  },
  "next_action": "Verify ownership before outreach."
}
Version two is uglier, which is part of its charm. It gives the operator something specific to check. It also prevents the agent from turning a missing buyer into a beautifully phrased outreach hook, one of those small acts of violence that sales software commits against everyone involved.
Plain tiers are easier to trust: A, B, monitor, disqualify. No “intent heat.” No “market resonance.” No funnel taxonomy with names that sound as though a dashboard swallowed a thesaurus. A category exists to decide the next action, not to make uncertainty sound funded.
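A tier rule this plain fits in a few lines. The thresholds below are assumptions chosen only so that the ExampleCo row (61) lands in monitor; the original cut-offs are not stated.

```python
# Hypothetical cut-offs; tune them against reviewed rows, not intuition.
A_THRESHOLD = 80
B_THRESHOLD = 65


def assign_tier(score, disqualifiers=()):
    """Map a total score to one of the four plain tiers."""
    if disqualifiers:
        # A hard disqualifier overrides the total, however tempting the row.
        return "disqualify"
    if score >= A_THRESHOLD:
        return "A"
    if score >= B_THRESHOLD:
        return "B"
    return "monitor"
```

Checking disqualifiers first is the design choice that matters: a high total never gets the chance to argue with a hard stop.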
Give the row an adversary
Scoring was not the final authority. By the end of the run, the agent has spent too much time making the candidate legible to be trusted as the judge.
A reviewer pass receives the scorecard, candidate notes, source trail, and scores, then looks for unsupported claims, ignored disqualifiers, and overconfident inference. It reduces scores, downgrades tiers, and adds verification tasks. If the reviewer cannot find the evidence for a score, the score loses its nice shoes.
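As a rough sketch of that reviewer pass, assuming rows shaped like the ExampleCo output: any dimension with no evidence, or with `source_quality` of `missing`, is cut to a low ceiling, a verification task is added, and the tier drops one step if a score was lowered. The 20 percent ceiling, the one-step downgrade, and the `verification_tasks` field are all assumptions.

```python
import copy

# Hypothetical one-step downgrade path; monitor and disqualify stay put.
DOWNGRADE = {"A": "B", "B": "monitor"}


def review(candidate, missing_cap_percent=20):
    """Return a revised copy of the row; the input is left untouched."""
    revised = copy.deepcopy(candidate)
    tasks = []
    lowered = False
    for name, dim in revised["breakdown"].items():
        unsupported = not dim.get("evidence") or dim.get("source_quality") == "missing"
        if not unsupported:
            continue
        ceiling = dim["max"] * missing_cap_percent // 100
        if dim["score"] > ceiling:
            # Take the unsupported points out of the total as well.
            revised["score"] -= dim["score"] - ceiling
            dim["score"] = ceiling
            lowered = True
        tasks.append(f"verify {name}: no supporting source found")
    if lowered:
        revised["tier"] = DOWNGRADE.get(revised["tier"], revised["tier"])
    revised["verification_tasks"] = tasks
    return revised
```

Returning a copy keeps the scorer's row and the reviewer's row side by side, so the disagreement between them stays visible.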
Splitting the work across agents helps for practical reasons. One agent builds the fit model. Another searches and collects candidates. Others extract evidence, score rows, review weak claims, and write the market brief. Elaborate, maybe. The alternative is a single agent carrying its first assumptions all the way into the final prose, where they arrive freshly laundered and very pleased with themselves.
Cloudflare was a reasonable substrate for that loop: Workers for endpoints, Durable Objects for run state, Queues for collection and review stages, D1 or R2 for evidence and outputs, Workflows for long-running coordination. The stack mattered less than the separation of duties. Search, extraction, scoring, review, and summary leave different fingerprints.
Let the map change
Ranked lists are not the prize. Market maps are.
After enough candidates have been scored, the agent aggregates the run: which clusters produced strong fits, which dimensions failed repeatedly, which disqualifiers appeared most often, where evidence was thin, and which buyer paths looked reachable. Research changes the next run only after that aggregation exists.
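The aggregation can start as plain counting. This sketch assumes each scored row carries `cluster`, `tier`, `disqualifiers`, and a `breakdown` of per-dimension `score` and `max`; treating a dimension as weak when it earns under half its maximum is an arbitrary illustrative cut-off.

```python
from collections import Counter, defaultdict


def summarize(candidates):
    """Roll scored rows up into a small market summary."""
    tiers_by_cluster = defaultdict(Counter)
    disqualifiers = Counter()
    weak_dimensions = Counter()
    for row in candidates:
        tiers_by_cluster[row["cluster"]][row["tier"]] += 1
        disqualifiers.update(row.get("disqualifiers", []))
        for name, dim in row.get("breakdown", {}).items():
            # Assumption: under half of the maximum counts as a weak dimension.
            if dim["score"] * 2 < dim["max"]:
                weak_dimensions[name] += 1
    return {
        "tiers_by_cluster": {c: dict(t) for c, t in tiers_by_cluster.items()},
        "top_disqualifiers": disqualifiers.most_common(3),
        "weak_dimensions": weak_dimensions.most_common(),
    }
```

A summary like this is what lets the next run change: a cluster that keeps landing in monitor on the same weak dimension is a finding about the market, not only about the individual rows.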
By the end of the April brief, the answer was not “go after quant funds because they have money.” It was closer to: keep quant in the research set, but do not let budget impersonate access. Late-stage fintech and AI infrastructure may produce better first conversations because the engineering pain is public, the product surface is inspectable, and the owner is easier to name.
Next action is non-negotiable. A lead without one is not a lead; it is a label. Contact mapping, source verification, procurement-risk checking, pilot hypothesis writing, a page to read, or a decision to leave the company alone until the market changes are all valid actions. The agent hands back the next piece of work, not a ranked object to admire.
Keep the surface boring
Overbuilding this workflow is tempting. It does not require a cinematic agent interface or a sales-intelligence cathedral. A solid first surface is an endpoint that returns structured evidence and a reviewable table. Schema, source trail, scoring discipline, and reviewer pass matter before the theatre around them.
Minimum contract:
- every candidate has a cluster, tier, score, evidence list, disqualifiers, and next action;
- every score is traceable to source-backed evidence or explicitly marked as weak;
- every disqualifier can override the total score;
- every run produces a market summary as well as a prospect list;
- every output is stable enough to compare with the next run.
That contract is enough if a tempting row can be downgraded in public: no buyer found, weak source trail, hard disqualifier, next action changed from outreach to verification.
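The per-candidate parts of the minimum contract can be checked mechanically. The sketch below is one possible validator; the field names follow the contract list, while the evidence item shape (`claim`, `source`, `weak`) is an assumption made for illustration.

```python
REQUIRED_FIELDS = ("cluster", "tier", "score", "evidence", "disqualifiers", "next_action")
TIERS = {"A", "B", "monitor", "disqualify"}


def contract_violations(candidate):
    """Return a list of problems; an empty list means the row passes."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in candidate]
    if problems:
        return problems
    if candidate["tier"] not in TIERS:
        problems.append(f"unknown tier: {candidate['tier']}")
    if candidate["disqualifiers"] and candidate["tier"] != "disqualify":
        # Every disqualifier can override the total score.
        problems.append("hard disqualifier present but tier is not disqualify")
    if not candidate["next_action"].strip():
        problems.append("next_action is empty")
    for item in candidate["evidence"]:
        # Evidence is either source-backed or explicitly marked as weak.
        if not item.get("source") and not item.get("weak", False):
            problems.append(f"unsourced evidence not marked weak: {item.get('claim', '?')}")
    return problems
```

Run-level requirements, such as the market summary and stable output across runs, sit outside a per-row check and would need their own tests.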
Without that loop, the agent produces a faster version of the old problem: a list of names with confidence applied after the fact, like varnish.