How to approach deduplicating contacts in your database without overhauling everything

Dominik Soanes

Head of NFP

esynergy

Managing supporter, volunteer, and donor data effectively is critical for charities. One of the most common challenges is dealing with duplicate records in databases. While the idea of a centralised “golden record” is appealing, it’s not always feasible to achieve upfront due to the complexity and effort involved. 

A practical approach focuses on creating a single customer view that identifies potential duplicates across systems and provides traceability to the original source systems. This method avoids prematurely combining data into a single “golden record” and instead highlights how likely records are to be duplicates, enabling deeper investigation and remediation at the source.

Understanding the origin and nature of duplication allows charities to address the root cause rather than attempting to merge data without full confidence. Since creating a reliable golden record often requires external data enrichment or manual validation—both of which can be resource-intensive—this approach offers a more achievable path to improving data quality and managing duplication effectively. 
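
As a rough illustration (the field and system names here are hypothetical, not a prescribed schema), a single customer view entry can link records rather than merge them, keeping a likelihood score alongside pointers back to the source systems:

    # A minimal sketch of one single customer view entry: it does not merge the
    # records, it only links them, scores how likely they are to be duplicates,
    # and keeps traceability back to the systems they came from.
    # Field names are illustrative assumptions, not a prescribed schema.
    from dataclasses import dataclass

    @dataclass
    class CandidateMatch:
        scv_id: str               # identifier within the single customer view
        source_records: list      # pointers back to source systems, e.g. [("CRM", "C-10432")]
        match_probability: float  # 0.0 to 1.0: how likely the records are the same person
        matched_on: list          # fields that drove the score, e.g. ["email", "surname"]

    example = CandidateMatch(
        scv_id="SCV-000123",
        source_records=[("CRM", "C-10432"), ("EmailPlatform", "E-77812")],
        match_probability=0.92,
        matched_on=["email", "surname"],
    )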

If this resonates with your organisation, here are four key steps to get started:

 

Step 1: Understand the problem 

Before jumping into solutions, take time to grasp the scope of the issue. 

  • Quantify the impact: How many duplicate records exist in your systems? Are they affecting reporting accuracy, supporter experience, or staff efficiency? 
  • Identify pain points: Duplication can lead to wasted resources, mismatched communications, and frustrated supporters. Where do these challenges hit hardest? 

This step is about understanding both the scale of the problem and its implications, which will help you prioritise efforts. 
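
One low-effort way to get a first estimate is to count records that share a normalised email address. The sketch below assumes contacts can be exported to a CSV with an email column; the file and column names are placeholders, not a requirement of any particular system.

    # Rough first estimate of duplication, assuming a CSV export of contacts
    # with an "email" column (file name and column name are placeholders).
    import pandas as pd

    contacts = pd.read_csv("contacts_export.csv")

    # Normalise before counting so "Jane@Example.org " and "jane@example.org" match.
    emails = contacts["email"].str.strip().str.lower()

    duplicates = emails.duplicated(keep=False) & emails.notna()
    print(f"{duplicates.sum()} of {len(contacts)} records share an email address with another record")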

  

Step 2: Assess your current data and architecture

Charities often manage data across multiple systems—CRMs, email platforms, volunteer management tools, and more. Understanding both the tools and the data they handle is critical for tackling duplication. 

  • Audit your systems: Map out the tools in use and the types of data they manage. 
  • Understand your data: Identify and define the fields critical for detecting duplicates, such as names, email addresses, and phone numbers. 
  • Identify duplication points: Locate where duplicates are most likely to occur, such as when donors appear separately in different platforms. 
  • Engage key stakeholders: Collaborate with teams across the organisation to capture how data is created, used, and shared. 

This step provides the foundation for identifying duplicates systematically and integrating efforts within the existing technology landscape. 
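
One lightweight way to record the outcome of this audit is a mapping from each system's field names onto a shared set of match-critical fields. The system and field names below are illustrative assumptions; replace them with whatever your audit actually finds.

    # Illustrative mapping of each system's fields onto shared match-critical fields.
    MATCH_FIELDS = ["first_name", "last_name", "email", "phone"]

    FIELD_MAP = {
        "crm": {"first_name": "FirstName", "last_name": "LastName",
                "email": "EmailAddress", "phone": "MobilePhone"},
        "email_platform": {"first_name": "fname", "last_name": "lname",
                           "email": "email", "phone": None},  # phone not captured here
    }

    def to_common_schema(system: str, record: dict) -> dict:
        """Project one system's record onto the shared match-critical fields."""
        mapping = FIELD_MAP[system]
        return {field: record.get(mapping[field]) if mapping[field] else None
                for field in MATCH_FIELDS}

    print(to_common_schema("crm", {"FirstName": "Ada", "LastName": "Lovelace",
                                   "EmailAddress": "ada@example.org",
                                   "MobilePhone": "07123 456789"}))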

 
Step 3: Leverage the right process and tools 

Once you’ve assessed the scope of the problem and understood your current setup, the next step is to explore practical ways to address duplication. Successful deduplication requires well-defined processes supported by effective tools. 

  • Define rules and thresholds: Establish clear criteria for identifying duplicates and assign probabilities for potential matches within the single customer view. 
  • Clean your data: Flag and address duplicates within each system using built-in tools or custom processes. 
  • Integrate systems: Link platforms to synchronise data updates across the organisation, reducing duplication risks. 
  • Automate standardisation: Apply workflows to standardise data entry and maintenance, minimising manual effort and errors. 

Focusing on processes ensures a structured approach that can evolve and scale, avoiding the pitfalls of one-off deduplication efforts.
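
To make the “rules and thresholds” point concrete, here is a minimal sketch using only the Python standard library. It standardises a few fields (mirroring the automation point above), combines simple rules into a rough likelihood, and buckets the result against example thresholds; the weights and cut-offs are placeholders to adapt, not recommendations.

    # Minimal rule-based matching sketch using only the standard library.
    # Weights and thresholds are illustrative placeholders, not recommendations.
    from difflib import SequenceMatcher

    def normalise(record: dict) -> dict:
        """Standardise the fields used for matching."""
        return {
            "name": " ".join(record.get("name", "").lower().split()),
            "email": record.get("email", "").strip().lower(),
            "phone": "".join(ch for ch in record.get("phone", "") if ch.isdigit()),
        }

    def match_probability(a: dict, b: dict) -> float:
        """Combine simple rules into a rough likelihood that two records are the same person."""
        a, b = normalise(a), normalise(b)
        score = 0.0
        if a["email"] and a["email"] == b["email"]:
            score += 0.6  # an exact email match is strong evidence
        if a["phone"] and a["phone"] == b["phone"]:
            score += 0.2
        score += 0.2 * SequenceMatcher(None, a["name"], b["name"]).ratio()
        return min(score, 1.0)

    def classify(score: float) -> str:
        if score >= 0.8:
            return "likely duplicate"
        if score >= 0.5:
            return "needs review"
        return "distinct"

    a = {"name": "Jane  Smith", "email": "Jane@Example.org", "phone": "07123 456789"}
    b = {"name": "J. Smith", "email": "jane@example.org", "phone": ""}
    score = match_probability(a, b)
    print(round(score, 2), "->", classify(score))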

Step 4: Monitor and refine the process 

Treat deduplication as an ongoing effort rather than a one-time project. 

  • Validate on a controlled dataset: Test the accuracy of matching algorithms using a defined dataset. Investigate mismatches and refine rules instead of manually correcting individual records. 
  • Improve matching algorithms: Use insights from validation to enhance rules and thresholds, improving accuracy across datasets. 
  • Develop a pipeline: Build an automated, repeatable pipeline for identifying and resolving duplicates. Iterations will refine the process, increasing reliability over time. 

Continuous monitoring and iteration lead to better data quality and a more robust system for addressing duplicates across the organisation.
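
As one way to approach the validation step, the sketch below assumes a small hand-labelled set of pairs (known duplicates and known non-duplicates) and a match_probability function like the one sketched in step 3 (a simple stand-in is included so the snippet runs on its own). Counting where the rules disagree with the labels shows which rules or thresholds need refining, rather than which individual records need hand-editing.

    # Sketch of validating matching rules against a small hand-labelled dataset.
    # match_probability is a stand-in; plug in your real rules (e.g. the step 3 sketch).
    def match_probability(a: dict, b: dict) -> float:
        return 1.0 if a["email"].strip().lower() == b["email"].strip().lower() else 0.0

    THRESHOLD = 0.8  # illustrative cut-off

    labelled_pairs = [
        # (record_a, record_b, known_duplicate) -- labels are illustrative only
        ({"name": "Jane Smith", "email": "jane@example.org"},
         {"name": "Jane Smyth", "email": "jane@example.org"}, True),
        ({"name": "Jane Smith", "email": "jane@example.org"},
         {"name": "Jane Smith", "email": "j.smith@work.example"}, True),  # a duplicate these rules miss
        ({"name": "John Smith", "email": "john.s@example.org"},
         {"name": "Joan Smith", "email": "joan@example.org"}, False),
    ]

    mismatches = [(a, b, known) for a, b, known in labelled_pairs
                  if (match_probability(a, b) >= THRESHOLD) != known]

    print(f"{len(mismatches)} of {len(labelled_pairs)} labelled pairs disagree with the rules")
    for a, b, known in mismatches:
        # Investigate these and refine the rules, rather than hand-correcting records.
        print("review:", a["name"], "vs", b["name"], "known duplicate:", known)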