Identity Resolution for Sports: Why Email Matching Isn't Enough

The email problem

Ask most organizations how they match records across systems, and the answer is almost always: "We use email address as the primary key." It's a reasonable choice. Email is unique, it's something people provide consistently, and it's the connective tissue between most marketing and CRM systems.

The problem is that people change email addresses, and they do so often. They switch from a college email to a personal one when they graduate. They change providers when an inbox gets too cluttered. They create a new account to take advantage of a promotion. They use their spouse's email for one purchase and their own for another.

Industry estimates commonly put the average consumer at 2–3 active email addresses at any given time and 4–6 over a decade. For a fan who's been buying tickets since 2012, that means your systems may hold three or four records for the same person, each with a completely different email address and each treated as a unique individual.

What deterministic matching misses

Most fan data platforms use what's called deterministic matching: if two records share an exact email address (or phone number, or some other unique identifier), they're considered the same person. If they don't share an exact match, they're considered different people.

This approach has two major failure modes in the athletic department context:

False negatives: Two records that represent the same person don't get matched because they use different emails. Your ticketing system has John Smith at jsmith@gmail.com. Your CRM has him at john.smith@company.com. They're the same person, but your system treats them as two different fans. John has a ticket purchase history, donation history, and email engagement history — but your team can only see one piece of it at a time, depending on which system they're looking at.

False positives: Two records that represent different people share some data element and get incorrectly merged. A common example is a family whose members share one email address for ticketing but donate separately. Naive matching can collapse the family's giving history into a single record that credits one person with every family member's gifts.
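To make both failure modes concrete, here is a minimal Python sketch of email-only deterministic matching. The records, IDs, and field names are hypothetical, and the only normalization applied is trimming whitespace and lowercasing the address.

```python
from collections import defaultdict

# Hypothetical records from three systems; IDs and field names are illustrative.
RECORDS = [
    {"id": "tix-1", "system": "ticketing", "name": "John Smith", "email": "jsmith@gmail.com"},
    {"id": "crm-7", "system": "crm", "name": "John Smith", "email": "john.smith@company.com"},
    {"id": "don-3", "system": "donations", "name": "Mary Henderson", "email": "Hendersons@example.com"},
    {"id": "don-4", "system": "donations", "name": "Tom Henderson", "email": "hendersons@example.com "},
]


def deterministic_match(records):
    """Group record IDs whose email addresses match exactly after trimming and lowercasing."""
    groups = defaultdict(list)
    for rec in records:
        key = rec["email"].strip().lower()  # exact-match key; nothing else is compared
        groups[key].append(rec["id"])
    return list(groups.values())


clusters = deterministic_match(RECORDS)
for cluster in clusters:
    print(cluster)
```

John Smith's two records stay in separate clusters (a false negative), while Mary's and Tom's donation records land in one cluster (a false positive).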

How probabilistic matching works differently

Probabilistic matching doesn't require an exact match on any single field. Instead, it calculates a confidence score based on how many signals two records share and how distinctive those signals are.

The process works something like this: two records with the same last name, same street address, similar first names (Bob vs. Robert), and purchasing history that shows similar sport preferences get a high confidence score — even if their email addresses are completely different. Two records with the same common last name and same city get a low confidence score and stay separate.
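One well-known way to turn "how many signals, and how distinctive" into a number is the Fellegi–Sunter model, sketched below in Python. This is a swapped-in textbook technique, not necessarily what any particular platform uses, and the m/u probabilities, the prior, the nickname table, and the field names are illustrative assumptions rather than calibrated values. Each field adds log(m/u) to the score when it agrees and log((1-m)/(1-u)) when it disagrees, so a rare agreement (a shared street address) counts for far more than a common one (a shared city).

```python
import math

# Illustrative (uncalibrated) probabilities for each compared field:
# m = P(field agrees | same person), u = P(field agrees | different people).
FIELD_PROBS = {
    "last_name": (0.95, 0.01),
    "first_name": (0.90, 0.02),
    "street": (0.85, 0.001),
    "city": (0.95, 0.20),
    "sport": (0.80, 0.30),
}
NICKNAMES = {"bob": "robert", "bill": "william", "liz": "elizabeth"}  # tiny hypothetical sample


def canonical_first(name):
    """Lowercase a first name and map known nicknames to the formal name."""
    n = name.strip().lower()
    return NICKNAMES.get(n, n)


def agreements(a, b):
    """Compare two records field by field. Email is deliberately not compared."""
    return {
        "last_name": a["last_name"].lower() == b["last_name"].lower(),
        "first_name": canonical_first(a["first_name"]) == canonical_first(b["first_name"]),
        "street": a["street"].lower() == b["street"].lower(),
        "city": a["city"].lower() == b["city"].lower(),
        "sport": a["sport"] == b["sport"],
    }


def match_probability(a, b, prior=0.001):
    """Sum Fellegi-Sunter log-likelihood weights and convert them to a posterior probability."""
    log_odds = math.log(prior / (1 - prior))  # prior odds that a random pair is the same person
    for field, agrees in agreements(a, b).items():
        m, u = FIELD_PROBS[field]
        log_odds += math.log(m / u) if agrees else math.log((1 - m) / (1 - u))
    return 1 / (1 + math.exp(-log_odds))


bob = {"first_name": "Bob", "last_name": "Miller", "street": "12 Elm St",
       "city": "Springfield", "sport": "football", "email": "bob.m@gmail.com"}
robert = {"first_name": "Robert", "last_name": "Miller", "street": "12 elm st",
          "city": "Springfield", "sport": "football", "email": "rmiller@company.com"}
karen = {"first_name": "Karen", "last_name": "Miller", "street": "480 Oak Ave",
         "city": "Springfield", "sport": "volleyball", "email": "kmiller@gmail.com"}

print(f"Bob vs Robert: {match_probability(bob, robert):.4f}")
print(f"Bob vs Karen:  {match_probability(bob, karen):.4f}")
```

Bob and Robert score near 1 despite different email addresses, while Karen Miller, who shares only a common surname and a city with Bob, scores near 0. Production systems usually estimate m and u from the data itself (often with expectation-maximization) and make u value-specific, so that agreeing on a common surname counts for less than agreeing on a rare one.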

The signals that matter most in the athletic department context:

Name similarity: Not just exact matches, but nickname resolution (Bob/Robert, Bill/William, Liz/Elizabeth), hyphenated names, and common name variations
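Address matching: Same address is a very strong signal. Unit numbers need special handling: one unit can be written several ways (Apt 4B, #4B), while different units at the same street address usually mean different households in the same building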
Phone number: Highly stable over time, but requires normalization, because the same number can be written as (555) 123-4567, 555.123.4567, or +1 555 123 4567
Purchase behavior: Same sport, same venue section, same payment method — behavioral fingerprints that survive email changes
Temporal patterns: Records created within a short window of each other, across different systems, are more likely to be the same person
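Several of these signals only work after fields are normalized. The Python sketch below shows one way to do that for phone numbers, names, addresses, and record timestamps. It assumes US/Canadian ten-digit phone numbers, uses tiny hypothetical nickname and street-suffix tables, and picks a seven-day window for "created close together" purely for illustration; real systems rely on much larger reference tables and tuned windows.

```python
import re
from datetime import datetime, timedelta

# Hypothetical lookup tables; real deployments use far larger reference lists.
NICKNAMES = {"bob": "robert", "rob": "robert", "bill": "william",
             "will": "william", "liz": "elizabeth", "beth": "elizabeth"}
SUFFIXES = {"street": "st", "avenue": "ave", "road": "rd", "drive": "dr"}

# Trailing unit designator such as "Apt 4B", "Unit 5", "Ste. 200" or "#7C" (lowercased input).
UNIT_RE = re.compile(r"[\s,]*(?:\b(?:apt|apartment|unit|suite|ste)\b\.?\s*#?|#)\s*(\w+)\s*$")


def normalize_phone(raw):
    """Return '+1' plus ten digits for a North American number, or None if it doesn't fit."""
    digits = re.sub(r"\D", "", raw)
    if len(digits) == 11 and digits.startswith("1"):
        digits = digits[1:]  # drop the country code
    return "+1" + digits if len(digits) == 10 else None


def name_tokens(full_name):
    """Lowercase, split on spaces and hyphens, and map nicknames to formal names."""
    parts = re.split(r"[\s-]+", full_name.strip().lower())
    return {NICKNAMES.get(p, p) for p in parts if p}


def split_address(street):
    """Split a street line into (normalized base, unit or None)."""
    s = street.strip().lower()
    m = UNIT_RE.search(s)
    unit = m.group(1) if m else None
    base = s[: m.start()] if m else s
    tokens = re.sub(r"[.,]", " ", base).split()
    return " ".join(SUFFIXES.get(t, t) for t in tokens), unit


def address_signal(a, b):
    """Classify two street lines as the same address, the same building, or different."""
    base_a, unit_a = split_address(a)
    base_b, unit_b = split_address(b)
    if base_a != base_b:
        return "different"
    if unit_a and unit_b and unit_a != unit_b:
        return "same building, different unit"
    # Assumption: a unit missing on one side is treated as compatible.
    return "same address"


def created_close_together(t1, t2, window=timedelta(days=7)):
    """True if two records were created within `window` of each other."""
    return abs(t1 - t2) <= window


print(normalize_phone("(555) 123-4567"), normalize_phone("555.123.4567"))
print(address_signal("12 Elm Street, Apt 4B", "12 elm st #4b"))
print(address_signal("12 Elm St Apt 4B", "12 Elm St Apt 7C"))
```

The address helper returns a three-way answer rather than a boolean because "same building, different unit" is exactly the case that should lower, not raise, the match score.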

The match rate difference is significant

The practical difference between email-only matching and probabilistic matching is substantial. When we run probabilistic identity resolution against a program's data for the first time, we typically find that 15–25% of what looked like unique individual records are actually duplicates or cross-system representations of the same fan.

For a program with 200,000 fan records across all systems, that could mean 30,000–50,000 surplus records: second or third copies of fans who already exist elsewhere in your data. Every one of those duplicate records is a relationship being managed as if it belonged to a separate person, with all the coordination failures that implies.
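Surplus records and affected fans are easy to conflate, so here is a small Python calculation that separates them. It assumes the 15–25% figure means surplus records divided by total records (every record beyond a fan's first), and the two cluster-size mixes are hypothetical: in one, every duplicated fan has exactly two records; in the other, every duplicated fan has three.

```python
def summarize(size_counts):
    """size_counts maps cluster size (records per fan) -> number of fans with that size."""
    records = sum(size * n for size, n in size_counts.items())
    fans = sum(size_counts.values())
    surplus = records - fans  # every record beyond a fan's first
    return {
        "records": records,
        "distinct_fans": fans,
        "surplus_records": surplus,
        "fans_with_duplicates": sum(n for size, n in size_counts.items() if size > 1),
        "duplicate_rate": surplus / records,
    }


# Two hypothetical ways a 200,000-record database can hide 30,000 surplus records (15%).
all_pairs = summarize({1: 140_000, 2: 30_000})
all_triples = summarize({1: 155_000, 3: 15_000})
print(all_pairs)
print(all_triples)
```

Both mixes yield 170,000 distinct fans and the same 15% rate, but 30,000 fans have duplicates in the first case and only 15,000 in the second.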

The match review process

Good probabilistic matching isn't fully automated. High-confidence matches (90% and above) can be merged automatically. Moderate-confidence matches (60% up to 90%) get surfaced for human review. Low-confidence matches (below 60%) stay separate.
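The three-tier policy maps directly onto a small routing function. In this Python sketch the 90% and 60% thresholds come from the text; sending a score of exactly 0.90 to auto-merge and exactly 0.60 to review is an assumption, as are the `Decision` names.

```python
from enum import Enum


class Decision(Enum):
    AUTO_MERGE = "auto_merge"
    HUMAN_REVIEW = "human_review"
    KEEP_SEPARATE = "keep_separate"


AUTO_MERGE_AT = 0.90  # 90% and above: merge without review
REVIEW_AT = 0.60      # 60% up to 90%: queue for a person


def route(confidence: float) -> Decision:
    """Map a match confidence in [0, 1] to what happens to the candidate pair."""
    if not 0.0 <= confidence <= 1.0:
        raise ValueError(f"confidence must be in [0, 1], got {confidence}")
    if confidence >= AUTO_MERGE_AT:
        return Decision.AUTO_MERGE
    if confidence >= REVIEW_AT:
        return Decision.HUMAN_REVIEW
    return Decision.KEEP_SEPARATE


for score in (0.97, 0.75, 0.40):
    print(score, route(score).value)
```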

The review process is where athletic department staff provide real value. Your season ticket coordinator who knows that "the Hendersons" always buy tickets together and donate separately can make a judgment call that no algorithm would make. That institutional knowledge, when captured in the identity resolution system, improves every future match.
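One way to capture that institutional knowledge is to store each reviewer decision as a pairwise override that the matcher consults before trusting its own output. The Python sketch below is hypothetical: the `ReviewLedger` class, the record IDs, and the decision strings are illustrative, and a production system would also need to propagate these constraints across whole clusters, not just individual pairs.

```python
from dataclasses import dataclass, field


@dataclass
class ReviewLedger:
    """Reviewer decisions keyed by unordered record-ID pairs (hypothetical design)."""
    same_person: set = field(default_factory=set)
    different_people: set = field(default_factory=set)

    def record(self, id_a: str, id_b: str, is_same: bool) -> None:
        """Save a reviewer's call; a later call on the same pair replaces an earlier one."""
        pair = frozenset((id_a, id_b))  # order-independent key
        if is_same:
            self.same_person.add(pair)
            self.different_people.discard(pair)
        else:
            self.different_people.add(pair)
            self.same_person.discard(pair)

    def decide(self, id_a: str, id_b: str, model_decision: str) -> str:
        """Return the reviewer's decision if one exists, otherwise the model's."""
        pair = frozenset((id_a, id_b))
        if pair in self.same_person:
            return "merge"
        if pair in self.different_people:
            return "keep_separate"
        return model_decision


ledger = ReviewLedger()
# The season ticket coordinator knows these two Henderson donation records belong
# to different people, even though the model scored them high enough to merge.
ledger.record("don-3", "don-4", is_same=False)
print(ledger.decide("don-4", "don-3", model_decision="merge"))  # keep_separate
```

Because the override is checked before the model's decision, every later run respects the coordinator's call without anyone re-reviewing the pair.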

See your identity resolution rate

Connect your systems to athvin and see how many of your "unique" fan records are actually the same person.
