name: duplicate-detective description: Identifies potential duplicate Accounts, Contacts, or Leads in Dataverse using intelligent fuzzy matching that catches nicknames, abbreviations, phone format variations, and address similarities. Use when user says "find duplicates", "check for duplicate accounts", "are there any duplicate contacts", "duplicate detection", "clean up duplicates", "merge duplicates", or "data quality check". metadata: author: Dataverse version: 1.0.0 category: data-quality
Duplicate Detective
Sales and marketing teams often accumulate duplicate records due to data entry variations, imports from multiple sources, and inconsistent naming conventions. Standard Dataverse duplicate detection rules are limited (max 5 rules per table) and miss subtle duplicates. This skill provides intelligent duplicate detection to maintain data quality and prevent wasted effort on duplicate outreach.
Instructions
Step 1: Determine Scope of Duplicate Check
Ask the user or infer from context:
- Target Table: Account, Contact, or Lead (or all three)
- Scope:
- Full scan of all records
- Specific subset (e.g., records created in last 30 days)
- Check against a specific record (e.g., "Find duplicates of Contoso")
- Matching Threshold: Strict (high confidence only) or Relaxed (include possible matches)
Step 2: Query Source Records
Based on scope, retrieve records to analyze:
For Accounts:
SELECT accountid, name, telephone1, telephone2, emailaddress1,
address1_line1, address1_city, address1_stateorprovince, address1_postalcode,
websiteurl, numberofemployees, industrycode
FROM account
WHERE statecode = 0
For Contacts:
SELECT contactid, firstname, lastname, fullname, emailaddress1, emailaddress2,
telephone1, mobilephone, jobtitle, accountid,
address1_line1, address1_city, address1_stateorprovince, address1_postalcode
FROM contact
WHERE statecode = 0
For Leads:
SELECT leadid, firstname, lastname, fullname, companyname, emailaddress1,
telephone1, mobilephone, jobtitle, address1_line1, address1_city,
address1_stateorprovince, address1_postalcode, websiteurl
FROM lead
WHERE statecode = 0
Step 3: Apply Fuzzy Matching Algorithms
3.1 Company/Account Name Matching Detect duplicates even with variations:
- Abbreviation Handling: "International Business Machines" ↔ "IBM"
- Legal Suffix Normalization: "Contoso Inc." ↔ "Contoso Corporation" ↔ "Contoso LLC"
- Common Abbreviations: "Mfg" = "Manufacturing", "Intl" = "International"
- Word Order Variations: "Microsoft Corporation" ↔ "Corporation Microsoft"
- Punctuation & Spacing: "A.B.C. Company" ↔ "ABC Company"
- The/A Prefix Handling: "The Contoso Group" ↔ "Contoso Group"
Matching Logic:
- Normalize both names (lowercase, remove punctuation, expand abbreviations)
- Calculate string similarity using:
- Levenshtein distance (edit distance)
- Jaro-Winkler similarity (handles transpositions)
- Token-based matching (matching words regardless of order)
- Score: 90%+ = High confidence match, 75-89% = Possible match
3.2 Person Name Matching Detect duplicate contacts/leads:
- Nickname Resolution: "Robert" ↔ "Bob" ↔ "Rob" ↔ "Bobby"
- Name Variations: "William" ↔ "Will" ↔ "Bill" ↔ "Billy"
- Initial Matching: "J. Smith" ↔ "John Smith"
- Name Order: "John Smith" ↔ "Smith, John"
- Hyphenated Names: "Mary Smith-Jones" ↔ "Mary Jones"
- Middle Name Handling: "John Michael Smith" ↔ "John Smith"
Common Nickname Mappings:
| Formal Name | Nicknames |
|---|---|
| Robert | Bob, Rob, Bobby, Robbie |
| William | Will, Bill, Billy, Willy |
| Richard | Rick, Rich, Dick |
| Michael | Mike, Mikey |
| James | Jim, Jimmy, Jamie |
| Elizabeth | Liz, Beth, Betty, Lizzy |
| Jennifer | Jen, Jenny |
| Katherine | Kate, Katie, Kathy, Cathy |
3.3 Phone Number Matching Normalize and compare phone numbers:
- Strip all formatting: "(555) 123-4567" → "5551234567"
- Handle country code variations: "+1-555-123-4567" ↔ "555-123-4567"
- Match across fields: telephone1 vs mobilephone vs telephone2
- Consider area code changes for same base number
Phone Normalization Steps:
- Remove all non-numeric characters
- Remove leading country code if present (1 for US)
- Compare last 10 digits for US numbers
- Flag as match if any phone field matches
3.4 Email Address Matching Detect email-based duplicates:
- Domain Variations: Gmail.com aliases (+ addressing)
- Dot Variations: "john.smith@gmail.com" ↔ "johnsmith@gmail.com"
- Case Normalization: Always compare lowercase
- Typo Detection: "john@gmial.com" likely = "john@gmail.com"
- Corporate Email Patterns: Match same domain = same company
3.5 Address Matching Detect address duplicates with variations:
- Street Abbreviations: "Street" ↔ "St" ↔ "St."
- Direction Abbreviations: "North" ↔ "N" ↔ "N."
- Building/Suite Handling: "123 Main St Suite 100" ↔ "123 Main St"
- Postal Code Normalization: "12345-6789" ↔ "12345"
- State Variations: "California" ↔ "CA"
Address Normalization:
- Standardize street type abbreviations
- Standardize direction prefixes/suffixes
- Remove suite/apt/unit information for base comparison
- Normalize postal code to 5 digits
- Standardize state to 2-letter code
Step 4: Score and Rank Potential Duplicates
Composite Scoring Model: Calculate overall duplicate probability using weighted scores:
| Field Type | Weight (Accounts) | Weight (Contacts) | Weight (Leads) |
|---|---|---|---|
| Name | 35% | 30% | 30% |
| 25% | 30% | 30% | |
| Phone | 20% | 20% | 20% |
| Address | 15% | 15% | 15% |
| Website | 5% | - | 5% |
| Company (for contacts) | - | 5% | - |
Score Thresholds:
- High Confidence (90-100%): Almost certainly duplicates
- Medium Confidence (75-89%): Likely duplicates, review recommended
- Low Confidence (60-74%): Possible duplicates, investigate
Step 5: Group and Present Duplicate Sets
Organize Results:
Duplicate Set 1 (High Confidence - 95% match):
├── Record A: Contoso Inc. (accountid: xxx)
│ ├── Phone: (555) 123-4567
│ ├── Email: info@contoso.com
│ └── Address: 123 Main Street, Seattle, WA
└── Record B: Contoso Corporation (accountid: yyy)
├── Phone: 555.123.4567
├── Email: sales@contoso.com
└── Address: 123 Main St, Seattle, WA 98101
Matching Signals:
- Company name: 92% similar (legal suffix variation)
- Phone: 100% match (format difference only)
- Address: 95% match (abbreviation variation)
Step 6: Provide Actionable Recommendations
For Each Duplicate Set, Recommend:
-
Merge Recommendation:
- Identify the "surviving" record (more complete, older, or more activity)
- List fields to preserve from each record
- Identify child records that would need reassignment
-
Activity Analysis:
- Count activities on each record
- Identify which record has more recent engagement
- Flag if merging would consolidate significant history
-
Relationship Impact:
- List opportunities linked to each account
- Count contacts under each account
- Identify potential data loss risks
Output Format:
DUPLICATE DETECTION SUMMARY
===========================
Scan Parameters:
- Table: Account
- Records Scanned: 5,432
- Date Range: All active records
Results:
- High Confidence Duplicates: 23 sets (46 records)
- Medium Confidence Duplicates: 45 sets (98 records)
- Low Confidence Duplicates: 12 sets (26 records)
TOP PRIORITY DUPLICATES (High Confidence):
1. [Account] Contoso Inc. ↔ Contoso Corporation
Match Score: 95%
Recommendation: Merge into "Contoso Inc." (more activity)
Impact: 3 opportunities, 12 contacts would be consolidated
2. [Contact] Bob Smith ↔ Robert Smith (both at Contoso)
Match Score: 92%
Recommendation: Merge into "Robert Smith" (more complete record)
Impact: 8 activities would be consolidated
3. [Lead] ABC Manufacturing ↔ A.B.C. Mfg Company
Match Score: 91%
Recommendation: Review - may be parent/subsidiary
Impact: Different addresses - verify relationship first
Step 7: Execute Merge (If Requested)
If user wants to proceed with merge:
-
Document Current State:
- Export both records' data
- List all child records
-
Update Child Records:
- Reassign contacts to surviving account
- Update opportunity regardingobjectid
- Move activities to surviving record
-
Merge Field Values:
- Keep most complete values
- Concatenate notes/descriptions if both have content
- Preserve all email addresses and phone numbers
-
Deactivate Duplicate:
- Set statecode = 1 (Inactive) on duplicate record
- Add note explaining it was merged
Dataverse Tables Used
| Table | Purpose |
|---|---|
account | Primary entity for company duplicates |
contact | Primary entity for person duplicates |
lead | Primary entity for lead duplicates |
opportunity | Check for linked opportunities |
activitypointer | Count activities on each record |
annotation | Document merge decisions |
Key Fields Reference
account:
name(NVARCHAR 160) - Company name for matchingaccountnumber(NVARCHAR 20) - Account number for exact matchingtelephone1,telephone2(PHONE) - Phone matchingemailaddress1(EMAIL) - Email matchingaddress1_line1,address1_city,address1_stateorprovince,address1_postalcode- Address matchingwebsiteurl(URL) - Website matchingparentaccountid(LOOKUP → account) - Parent company relationshipstatecode(STATE) - Active(0), Inactive(1) - Filter active only
contact:
firstname,lastname(NVARCHAR) - Name componentsfullname(NVARCHAR) - Calculated full nameemailaddress1,emailaddress2(EMAIL) - Email matchingtelephone1,mobilephone(PHONE) - Phone matchingaccountid(LOOKUP → account) - Parent accountjobtitle(NVARCHAR) - Additional matching signaldepartment(NVARCHAR) - Department for disambiguationstatecode(STATE) - Active(0), Inactive(1)
lead:
companyname(NVARCHAR 100) - Company matchingfirstname,lastname,fullname(NVARCHAR) - Name matchingemailaddress1(EMAIL) - Email matchingtelephone1,mobilephone(PHONE) - Phone matchingaddress1_line1,address1_city,address1_stateorprovince- Address matchingwebsiteurl(URL) - Website matchingstatecode(STATE) - Open(0), Qualified(1), Disqualified(2)statuscode(STATUS) - New(1), Contacted(2) [Open]; Qualified(3) [Qualified]; Lost(4), Cannot Contact(5), No Longer Interested(6), Canceled(7) [Disqualified]
Fuzzy Matching Best Practices
-
Always normalize before comparing:
- Lowercase all text
- Remove punctuation
- Standardize abbreviations
-
Use multiple matching signals:
- Never flag as duplicate based on single field match
- Require 2+ strong signals for high confidence
-
Consider business context:
- Same company name in different cities may be branches
- Same person name at different companies = different people
- Parent/subsidiary relationships are not duplicates
-
Handle false positives:
- "John Smith" is common - require additional signals
- Generic company names need more verification
- Allow user to mark "Not a Duplicate"
Examples
Example 1: Find Account Duplicates
User says: "Check for duplicate accounts in my CRM"
Actions:
- Query all active accounts from Dataverse
- Apply fuzzy matching on name, phone, email, address
- Group potential duplicates by confidence score
- Present high-confidence matches first
Result:
DUPLICATE DETECTION SUMMARY
- Records Scanned: 5,432 accounts
- High Confidence Duplicates: 23 sets
- Recommendation: Start with "Contoso Inc." ↔ "Contoso Corporation" (95% match)
Example 2: Check Specific Record for Duplicates
User says: "Find duplicates of Fabrikam Inc."
Actions:
- Retrieve Fabrikam Inc. record details
- Compare against all other active accounts
- Apply weighted scoring across all matching fields
- Return ranked list of potential matches
Result:
Potential duplicates of "Fabrikam Inc.":
1. Fabrikam Incorporated (87% match) - Same phone, similar address
2. The Fabrikam Group (72% match) - Same website domain
Example 3: Contact Duplicate with Nickname
User says: "Are there duplicate contacts named Bob at Contoso?"
Actions:
- Search contacts at Contoso account
- Expand "Bob" to include Robert, Rob, Bobby
- Compare phone and email across matches
- Present findings with merge recommendation
Result:
Found potential duplicate:
- "Bob Smith" (created 2024-01-15) - 3 activities
- "Robert Smith" (created 2023-06-01) - 12 activities
Recommendation: Merge into "Robert Smith" (more complete record)
Troubleshooting
Error: Too many potential duplicates returned
Cause: Matching threshold too relaxed or common names in dataset Solution:
- Increase confidence threshold to 85%+
- Require 2+ matching signals for flagging
- Filter by specific criteria (industry, geography)
Error: Known duplicates not detected
Cause: Significant spelling variations or missing data in fields Solution:
- Check if key fields (email, phone) are populated
- Review nickname mappings for person names
- Consider adding custom abbreviation rules
Error: False positives flagged as duplicates
Cause: Common names or parent/subsidiary relationships Solution:
- Review address and account hierarchy
- Check if records are intentionally separate (branches, divisions)
- Mark as "Not a Duplicate" to exclude from future scans