FuzzyDupes:NET Assembly Documentation

Fast Duplicate Search in DataTable Data

Library: KS.FuzzyDupesNet.dll (DotNet 2.0 Assembly)
Version: 6.4

Dependencies: System/none

 
General Information
Download
License
Installation

Interface


General Information

The FuzzyDupes:NET Assembly enables you to find similar (duplicate) records in database data.

The data is processed in memory. For this purpose, the assembly implements optimized data structures in C#. The algorithms are universal and work on any string data; address data can be processed like any other data.

There is one main function that receives a System.Data.DataTable and returns another DataTable with the results. Some parameters can adjust the search.
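As a minimal sketch of the input side, the following builds a small in-memory System.Data.DataTable of the kind such a search would receive. The table name, column names and sample values are made up for illustration, and no FuzzyDupes:NET call is shown here.

```csharp
using System;
using System.Data;

public static class InputTableExample
{
    // Builds a small in-memory table of address-like records.
    // Column names and sample values are invented for this sketch.
    public static DataTable BuildSampleTable()
    {
        DataTable table = new DataTable("Customers");
        table.Columns.Add("LastName", typeof(string));
        table.Columns.Add("FirstName", typeof(string));
        table.Columns.Add("City", typeof(string));

        table.Rows.Add("Meyer", "Anna", "Hamburg");
        table.Rows.Add("Meier", "Anna", "Hamburg");        // near-duplicate spelling
        table.Rows.Add("Schmidt", DBNull.Value, "Berlin"); // missing first name (NULL)
        return table;
    }

    public static void Main()
    {
        DataTable table = BuildSampleTable();
        Console.WriteLine("{0} rows, {1} columns", table.Rows.Count, table.Columns.Count);
    }
}
```

The third row deliberately contains a NULL value, which is the situation the CompareNull option of SetColumnOption() is concerned with.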


Download

You can always find the current version at http://www.kroll-software.de/download/FuzzyDupesNETSetup.exe

The download filename is the same for all versions.


License

Disclaimer: FuzzyDupes:NET Assembly is provided "as is" with no warranty of any kind.

FuzzyDupes:NET Assembly is not freeware. The demo version shows an About screen before every search. With a valid license, the About screen is never shown.

The demo version is for testing purposes only and must not be used in a production environment. More information about licensing is available on the Kroll-Software website, or contact info@kroll-software.de


Installation

  1. Copy the files KS.FuzzyDupesNet.dll and \de\KS.FuzzyDupesNet.resources.dll to your hard drive
  2. Add a reference to this assembly to your .NET 2.0 project (VB.NET, C#, or any other .NET language)
  3. Enable Copy Local and set Specific Version to false


Methods

AboutBox()

Return Value: void
Parameters: none or [ParentForm]
 
Shows the AboutBox.

 

Reset()

Return Value: void
Parameters: none
 
Resets the settings for the next search. Call this function prior to every search.

 

Cancel()

Return Value: void
Parameters: none
 
Cancels a long-running process. You can call this from a progress event handler.

 

SetColumnOption()

Set options for each column
Return Value: void
Parameters: int ColumnIndex, bool Cluster, bool DupeSearch, double Weight, bool CompareNull, string ColumnName[, int ImportColumnIndex]
 
Sets the options for one column of your DataTable.

Call this prior to DupeSearch(), FuzzyMatch() and FuzzyMerge().

Important note: Set CompareNull to 'true' for columns that are filled with data in (nearly) all rows (e.g. Last Name, Street, ZIP, City), and set CompareNull to 'false' for columns that may contain NULL values in many rows (e.g. First Name, Phone Number, ...).

 

DupeSearch()

Search for duplicate values in a System.Data.DataTable
Return Value: System.Data.DataTable or null
Parameters: System.Data.DataTable T, double Threshold, double ClusterThreshold, DupeReturnResults ReturnResults, bool ShowProgress[, System.Windows.Forms.Form ParentForm]

The returned DataTable has an additional column containing a GUID for each record. Similar records share the same GUID.
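Because similar records share a GUID, a result table can be split into duplicate groups by that column. The sketch below does this with plain System.Data code. It runs against a hand-built stand-in table rather than a real search result, and both the column name "FuzzyDupesID" and the column's position are assumptions, so the index is passed in explicitly.

```csharp
using System;
using System.Collections.Generic;
using System.Data;

public static class DupeGroupExample
{
    // Groups rows by the value in the ID column; rows sharing an ID form one group.
    public static Dictionary<string, List<DataRow>> GroupById(DataTable result, int idColumnIndex)
    {
        Dictionary<string, List<DataRow>> groups = new Dictionary<string, List<DataRow>>();
        foreach (DataRow row in result.Rows)
        {
            if (row.IsNull(idColumnIndex))
            {
                continue; // skip rows that carry no ID
            }
            string id = Convert.ToString(row[idColumnIndex]);
            List<DataRow> members;
            if (!groups.TryGetValue(id, out members))
            {
                members = new List<DataRow>();
                groups.Add(id, members);
            }
            members.Add(row);
        }
        return groups;
    }

    public static void Main()
    {
        // Hand-built stand-in for a search result (not produced by the assembly).
        DataTable result = new DataTable();
        result.Columns.Add("LastName", typeof(string));
        result.Columns.Add("FuzzyDupesID", typeof(string)); // assumed column name
        string a = Guid.NewGuid().ToString();
        string b = Guid.NewGuid().ToString();
        result.Rows.Add("Meyer", a);
        result.Rows.Add("Meier", a);
        result.Rows.Add("Schmidt", b);

        foreach (KeyValuePair<string, List<DataRow>> g in GroupById(result, 1))
        {
            Console.WriteLine("{0}: {1} record(s)", g.Key, g.Value.Count);
        }
    }
}
```

Groups with more than one member are the candidate duplicates; single-member groups are records without a similar partner.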

 

FuzzyMerge()

Merges two DataTables with fuzziness
Return Value: System.Data.DataTable or null
Parameters: System.Data.DataTable T, System.Data.DataTable TMerge, double Threshold, double ClusterThreshold, MergeReturnResults ReturnResults, bool ShowProgress[, System.Windows.Forms.Form ParentForm]

The returned DataTable has up to two additional columns containing GUIDs (depending on ReturnResults). These GUIDs are identical for similar records.

 

FuzzyMatch()

Matches two DataTables with fuzziness (positive/negative match between two lists)
Return Value: System.Data.DataTable or null
Parameters: System.Data.DataTable T, System.Data.DataTable TMatch, double Threshold, double ClusterThreshold, MatchReturnResults ReturnResults, bool ShowProgress, System.Windows.Forms.Form ParentForm

The returned DataTable has one additional column containing 0 if no match was found, or the 1-based index of the row in TMatch where a match was found.
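The 1-based index is easy to get wrong, because DataTable.Rows is 0-based. The following sketch resolves a match index back to the row in TMatch. It uses hand-built stand-in tables; the column name "MatchIndex" and its position are assumptions made for the example.

```csharp
using System;
using System.Data;

public static class MatchLookupExample
{
    // Returns the row of tMatch referenced by the match column, or null for 0 (no match).
    public static DataRow ResolveMatch(DataRow resultRow, int matchColumnIndex, DataTable tMatch)
    {
        int index = Convert.ToInt32(resultRow[matchColumnIndex]);
        if (index == 0)
        {
            return null; // 0 means no match was found
        }
        return tMatch.Rows[index - 1]; // stored index is 1-based, Rows is 0-based
    }

    public static void Main()
    {
        DataTable tMatch = new DataTable();
        tMatch.Columns.Add("LastName", typeof(string));
        tMatch.Rows.Add("Meier");
        tMatch.Rows.Add("Schmidt");

        // Hand-built stand-in for a match result; "MatchIndex" is an assumed column name.
        DataTable result = new DataTable();
        result.Columns.Add("LastName", typeof(string));
        result.Columns.Add("MatchIndex", typeof(int));
        result.Rows.Add("Meyer", 1);
        result.Rows.Add("Mueller", 0);

        foreach (DataRow row in result.Rows)
        {
            DataRow match = ResolveMatch(row, 1, tMatch);
            string label = (match == null) ? "(no match)" : Convert.ToString(match[0]);
            Console.WriteLine("{0} -> {1}", row[0], label);
        }
    }
}
```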

 

ExactMatch()

Matches two DataTables exactly (without fuzziness) on a given column
Return Value: System.Data.DataTable or null
Parameters: System.Data.DataTable T, System.Data.DataTable TMatch, int MatchColumnIndex, ExactMatchReturnResults ReturnResult

 

DeleteDupes()

Deletes duplicate records from a DataTable (the result of another method) based on a FuzzyDupesID (GUID)
Return Value: System.Data.DataTable or null
Parameters: System.Data.DataTable T, int FIDColumnIndex, int SortColumnIndex, int MaxColumn, DeleteOrders DeleteOrder, DeleteReturnResults ReturnResults

Use this function to get a clean (dupe-free) search result:

  1. Search for dupes with DupeSearch() and ReturnResults=DRR_ALL
  2. Call this function to remove the dupes from the result

 

Canceled

This property returns true if a long-running process was canceled.
Return Value: boolean
Parameters: none
 

 


Working with Normalization Rules

The method SetNormalizeOption() assigns a normalization rule to a column.

AddNormalizeRule() adds a search/replace pair to the current normalize list.

You can use the class NormalizeList to create new, load and save lists for normalization.
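To illustrate the idea of search/replace normalization in general, the sketch below applies a list of search/replace pairs to a string before it would be compared. This is not the assembly's NormalizeList class or its actual behavior: the class name NormalizeSketch, the sample rules and the case-sensitive, in-order replacement are all assumptions made for the example.

```csharp
using System;
using System.Collections.Generic;

public static class NormalizeSketch
{
    // Applies each search/replace pair to the input, in list order.
    public static string Apply(string input, IList<KeyValuePair<string, string>> rules)
    {
        string output = input;
        foreach (KeyValuePair<string, string> rule in rules)
        {
            // String.Replace is ordinal and case-sensitive.
            output = output.Replace(rule.Key, rule.Value);
        }
        return output;
    }

    public static void Main()
    {
        // Made-up sample rules: expand an abbreviation, drop a title.
        List<KeyValuePair<string, string>> rules = new List<KeyValuePair<string, string>>();
        rules.Add(new KeyValuePair<string, string>("str.", "strasse"));
        rules.Add(new KeyValuePair<string, string>("Dr. ", ""));

        Console.WriteLine(Apply("Hauptstr. 5", rules));    // Hauptstrasse 5
        Console.WriteLine(Apply("Dr. Anna Meier", rules)); // Anna Meier
    }
}
```

Normalizing both spellings to one form before comparison means that purely conventional differences, such as abbreviations, do not lower the similarity of two records.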

--- End ---
