What Data Do You Need Before You Can Start Tracking Cost Per Case?

Before you can track cost per case, you need the right data in the right place. That sounds obvious. But the majority of PI firms that try to build cost-per-case tracking without a structured data foundation end up with numbers they can't trust — and decisions made on unreliable numbers are worse than decisions made on no numbers at all.

This post walks through exactly what data you need, where it typically lives, and what to do when some of it is missing or messy. Think of it as the pre-flight checklist before you start tracking cost per case for real.

Related guide: See our definitive guide to cost per case for PI firms — calculation formula, benchmarks by firm size and lead source, and step-by-step tracking methodology.

The Three Categories of Data You Need

Cost per case connects two inputs: what you spent and what you got. Simple in theory. Complicated in practice because PI firms typically have their spending data in one place, their case data in another, and the connection between the two nowhere.

Here's what you need in each category:

1. Marketing Spend Data

This is your total investment by source. Not an estimate — actual spend. For each lead vendor and marketing channel you use, you need:

Monthly spend by vendor — what you actually paid, not what was quoted or budgeted
Start and end dates for each vendor relationship — so you can align spend to the correct time periods
How billing works for each vendor — flat monthly retainer, per-lead pricing, performance-based, or hybrid
Any credits, refunds, or adjustments — these affect your true cost per case and matter at scale

Most firms have this data, but it's scattered. Invoices in email, some entries in QuickBooks, some in a marketing spreadsheet, some tracked only in the vendor's portal. Step one is consolidating all of it into a single ledger — even a rough spreadsheet — before you try to connect it to case outcomes.
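As a rough illustration of what that single ledger might look like, here is a minimal Python sketch. The row layout, vendor names, and amounts are all hypothetical; the only idea taken from the list above is that credits and refunds are netted against invoices so the total reflects what was actually paid.

```python
from collections import defaultdict
from decimal import Decimal

# Hypothetical ledger rows: (month, vendor, amount, kind).
# Credits and refunds are entered as negative amounts.
LEDGER = [
    ("2024-01", "Vendor A", Decimal("5000.00"), "invoice"),
    ("2024-01", "Vendor A", Decimal("-400.00"), "credit"),
    ("2024-01", "Vendor B", Decimal("3200.00"), "invoice"),
    ("2024-02", "Vendor A", Decimal("5000.00"), "invoice"),
]


def net_spend_by_vendor_month(rows):
    """Sum actual spend per (vendor, month), net of credits and refunds."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for month, vendor, amount, _kind in rows:
        totals[(vendor, month)] += amount
    return dict(totals)


NET_SPEND = net_spend_by_vendor_month(LEDGER)
```

Keeping the month and vendor on every row is what later lets spend be lined up against signed cases for the same period.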

2. Lead and Case Data

This is the output side of cost per case. For each lead that entered your pipeline, you need to know:

Lead source — which vendor or channel sent this lead
Lead received date — when it entered your system
Lead disposition — did it become a signed case, a rejection, or a withdrawal?
Signed case date — when the case was signed, not just when the lead arrived
Case type or severity — if your firm tracks case severity, this enriches your analysis significantly

This data should live in your case management software — LeadDocket, Filevine, Clio, MyCase, or whatever your firm uses. The question is whether it's structured or unstructured. If your intake team records lead source as free text (“TV ad” vs. “TV” vs. “television”), you have a data cleaning problem before you have a tracking problem.

3. Attribution Data

Attribution is the bridge between spend and cases. It answers: which spend produced which cases? For most PI firms, attribution is the weakest link.

Attribution data can come from several sources:

Call tracking (CallRail or similar) — ties inbound calls to their marketing source
Form tracking — web forms with UTM parameters or source fields that capture where the lead came from
Intake questions — intake specialists asking “How did you hear about us?” and recording the answer in a structured field
Vendor-assigned lead IDs — some vendors tag each lead with a unique identifier that lets you trace it back to their campaign

Perfect attribution isn't the goal. The goal is consistent, good-enough attribution — where the same methodology is applied across all vendors so you can compare them fairly. A firm that asks “How did you hear about us?” consistently and records it in a structured field has better attribution than a firm that uses call tracking for some sources and intake questions for others.
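For the form-tracking option, one possible way to capture the source is to read the utm_source parameter from the landing-page URL and store it in a structured field. The sketch below uses only Python's standard library; the function name, the example URL, and the "unknown" fallback are assumptions for illustration, not part of any particular form tool.

```python
from urllib.parse import urlparse, parse_qs


def source_from_landing_url(url, default="unknown"):
    """Return the lowercased utm_source from a URL, or a default if absent."""
    query = parse_qs(urlparse(url).query)
    values = query.get("utm_source")
    if values and values[0].strip():
        return values[0].strip().lower()
    return default
```

Lowercasing the value at capture time is one small way to keep the same source from showing up under several spellings.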

The Three Data Categories for Cost Per Case

Marketing Spend: actual cost by vendor/month
Lead & Case Data: source, disposition, signed date
Attribution Data: connecting spend to cases

How Much Historical Data Do You Need?

You can start tracking cost per case with as little as one month of clean, connected data. But you need at least three months of data before patterns become meaningful — and six months before you can make reliable vendor decisions based on trends rather than snapshots.

Here's why this matters for PI firms specifically: the 6 to 18-month settlement lag means your financial intelligence layer, which connects spend to settlement revenue, won't be reliable for at least 6 to 9 months after you start tracking. But your cost-per-case data (spend to signed cases) can be meaningful much faster.

If you have 12 months of historical spend data and a case management system with structured lead source and disposition data, you can retroactively build a cost-per-case baseline for the past year. That's valuable — it means you're not starting from zero.

What to Do When Your Data Is Messy

Messy data is the norm, not the exception. Here's how to handle the most common problems:

Inconsistent Lead Source Labels

If your case management system has “TV ad,” “TV” and “television” as three separate sources, you need to consolidate before you can analyze. Create a source mapping document that standardizes your vendor names across all systems. Going forward, lock intake specialists to a dropdown field rather than free-text entry.
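A source mapping document can be as simple as a lookup table. The following Python sketch assumes a hypothetical mapping built from the three labels mentioned above; the canonical name "TV" and the "UNMAPPED" marker are illustrative choices.

```python
# Hypothetical mapping from raw intake labels to one canonical source name.
SOURCE_MAP = {
    "tv ad": "TV",
    "tv": "TV",
    "television": "TV",
}


def normalize_source(raw_label):
    """Return the canonical source name, or 'UNMAPPED' so it can be reviewed."""
    # Lowercase and collapse stray whitespace before looking up the label.
    key = " ".join(raw_label.strip().lower().split())
    return SOURCE_MAP.get(key, "UNMAPPED")
```

Returning an explicit "UNMAPPED" value, rather than guessing, keeps unknown labels visible until someone adds them to the mapping.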

Missing Spend Records

If you don't have exact spend numbers for historical periods, use estimates from bank records, accounting software exports, or vendor invoices. An estimate within 10% is close enough to build a baseline. Record where estimates were used so you can refine them later.

No Lead Source Data for Older Cases

For cases that predate your tracking improvements, attribute them in aggregate. If you know that 60% of your historical spend went to Vendor A, and you signed 100 cases in that period, use proportional attribution as a starting estimate: assign roughly 60 of those cases to Vendor A. It's not perfect, but it gives you a usable baseline while your new tracking practices accumulate clean data.
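Proportional attribution can be sketched in a few lines of Python. The function name and the dollar figures below are hypothetical; the 60% share and the 100 signed cases mirror the example in the paragraph above.

```python
def proportional_attribution(spend_by_vendor, total_signed_cases):
    """Split signed cases across vendors in proportion to their spend."""
    total_spend = sum(spend_by_vendor.values())
    if total_spend == 0:
        raise ValueError("total spend must be greater than zero")
    return {
        vendor: total_signed_cases * spend / total_spend
        for vendor, spend in spend_by_vendor.items()
    }


# Hypothetical figures: Vendor A received 60% of historical spend.
ESTIMATE = proportional_attribution(
    {"Vendor A": 60_000, "All other vendors": 40_000},
    total_signed_cases=100,
)
```

Note that this method assumes every vendor turns spend into cases at the same rate, so the estimated cost per case comes out identical for every vendor by construction. It is a placeholder baseline, not a basis for comparing vendors.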

Cases Without Signed Dates

If your case management system records case status but not the date the case was signed, you may need to use case creation date as a proxy. Document this substitution. As you clean up your intake process, start capturing signed dates explicitly — they're essential for time-based cost-per-case analysis.
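One way to document the substitution is to carry a flag alongside the date, so proxied records can be found and corrected later. This is a minimal sketch; the function name and the idea of returning a flag are assumptions, not a feature of any specific case management system.

```python
from datetime import date


def effective_signed_date(signed_date, created_date):
    """Return (date_to_use, is_proxy).

    Falls back to the case creation date when no signed date was recorded,
    and marks the result so the substitution stays documented.
    """
    if signed_date is not None:
        return signed_date, False
    return created_date, True
```

For example, a case with no signed date and a creation date of March 4, 2024 would be returned as that creation date with the proxy flag set.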

The Data Minimum Viable Product

You don't need perfect data to start. You need a minimum viable dataset:

Total spend per vendor for the past 6 to 12 months (estimated is fine)
Total leads received per vendor for the same period
Total signed cases attributable to each vendor for the same period
A standardized vendor naming convention going forward

With those four inputs, you can calculate a rough cost per case by vendor today. It won't be perfect. It will tell you things you didn't know. And it will give you a baseline to improve against as your tracking gets cleaner.
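The rough calculation might look like the following Python sketch, assuming cost per case is computed as total spend divided by signed cases for the same vendor and period. The vendor names and figures are hypothetical, and the lead-to-case rate is an extra convenience the lead counts make possible.

```python
def cost_per_case_by_vendor(records):
    """Compute rough cost per case and lead-to-case rate for each vendor.

    `records` maps vendor name -> dict with 'spend', 'leads', 'signed_cases'.
    A vendor with zero signed cases gets None rather than a division error.
    """
    results = {}
    for vendor, r in records.items():
        signed = r["signed_cases"]
        leads = r["leads"]
        results[vendor] = {
            "cost_per_case": r["spend"] / signed if signed else None,
            "lead_to_case_rate": signed / leads if leads else None,
        }
    return results


# Hypothetical inputs for one period.
ROUGH = cost_per_case_by_vendor({
    "Vendor A": {"spend": 60_000, "leads": 400, "signed_cases": 40},
    "Vendor B": {"spend": 24_000, "leads": 120, "signed_cases": 0},
})
```

In this made-up example, Vendor A comes out at 1,500 per signed case, while Vendor B has spend but no signed cases yet, which is itself worth knowing.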

Olivia, an intake manager at a 25-attorney PI firm, put it this way: “We thought we needed six months to clean up our data before we could start. We actually started with what we had — imperfect data — and the process of connecting it to cost per case immediately showed us where the gaps were. Cleaning the data became a business priority once we could see what the clean version would tell us.”

The Minimum Viable Dataset

Spend Per Vendor: 6-12 months, estimated is fine
Leads Per Vendor: same period, from intake records
Signed Cases: by vendor, attributed to source
Naming Convention: standardized, going forward

A Pre-Launch Data Checklist

Before you start tracking cost per case, work through this checklist:

Historical spend by vendor: 6–12 months minimum, actual or estimated
Standardized vendor/source names across all systems
Lead disposition data in a structured field (not free text) in your case management software (CMS)
Attribution mechanism in place for all active lead sources (call tracking, UTMs, or intake question + structured field)
Signed case date captured in your CMS (or a reliable proxy)
Defined cost-per-case (CPC) target for your firm (what does a “good” CPC look like?)

If you can check all six boxes, you're ready to start tracking cost per case with confidence. If you can check four or five, you're ready to start with appropriate caveats. If you can only check two or three, fix the data foundation first — it will save you from drawing wrong conclusions from bad inputs.
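The scoring rule above can be written down as a small Python sketch. The item keys are hypothetical shorthand for the six checklist entries, and treating zero or one checked box the same as two or three is an assumption, since the text does not address that case.

```python
# Hypothetical short keys for the six checklist items.
CHECKLIST = [
    "historical_spend",
    "standardized_names",
    "structured_disposition",
    "attribution_mechanism",
    "signed_case_date",
    "cost_per_case_target",
]


def readiness(checked):
    """Map the set of checked items to a readiness verdict."""
    count = sum(1 for item in CHECKLIST if item in checked)
    if count == len(CHECKLIST):
        return "ready"
    if count >= 4:
        return "ready with caveats"
    # Assumption: fewer than four boxes (including zero or one) means fix first.
    return "fix the data foundation first"
```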

Pre-Launch Data Readiness Assessment

Data Element	Ready	Needs Work	Priority
Historical spend by vendor (6+ months)			Critical
Standardized vendor names			Critical
Lead disposition in structured fields			High
Attribution mechanism (tracking #s/UTMs)			High
Signed case dates in CMS			Medium
Defined CPC target for firm			Medium

Ready to Connect Your Data?

RevenueScale is built to work with the data PI firms actually have — including the messy, gap-filled spreadsheets and partially structured case management exports that represent real-world starting points.

Book a demo and we'll walk through your specific data situation, identify what's ready to use now and what needs cleanup, and show you what your cost-per-case tracking could look like within 30 days.