KSPhonet OCX Documentation

Fast Duplicate Search in Recordset Data

Library: KSPhonetLib (KSPhonet ActiveX Control module)
GUID: {BC5CFE93-56AE-11D4-80C6-0048546EA2D1}
Version: 5.7

Dependencies: none

 
General Information
Download
License
Installation

Interface

Sample Source Code


General Information

KSPhonet Control enables you to find similar records in database record data.

The data is processed in memory, the control implements optimized data structures in c++ for this purpose. The algorithms used are universal for any string data. Adress data can be processed as any other data as well.

There is one main function that receives a Variant-Array returned from ADODB.Recordset.GetRows() and returns another Variant-Array with the results. Some parameters can adjust the search.


Download

You can always find the current version at http://www.kroll-software.de/download/fuzzydupescom.zip

Different versions will have the same download filename.


License

Disclaimer: KSPhonet-Control is provided "as is" with no warranty of any kind.

KSPhonet-Control is not Freeware. The Demo Version returns a maximum of 50 records. With a valid license, the control returns all records found.

More information about licensing the control can be found at the Kroll-Software website. Or contact info@kroll-software.de


Installation

  1. Copy the file ksphonet.ocx to a directory with a short (DOS) pathname
  2. Open the windows command prompt
  3. Register the control with the following command
    regsvr32.exe <Drive:Path\>ksphonet.ocx


Methods

AboutBox()

Return Value: void
Parameters: none
 
Shows the AboutBox.

 

Reset()

Return Value: void
Parameters: none
 
Resets the settings for the next search. Call this function prior to every search.

 

Cancel()

Return Value: void
Parameters: none
 
Cancels a long running process. You can call this in a Progress-Event.

 

SetColumnOption()

Set options for each column
Return Value: void
Parameters: ColumnIndex AS Integer, Cluster AS Boolean, DupeSearch AS Boolean, Weight AS Double, CompareNull As Boolean, ColumnName As String
 
Set the Options for your Variant-Array.

Call this prior to DupeSearch() and FuzzyMerge()

*) Important note: Set CompareNull to 'true' for columns that are filled with data in (nearly) all rows (e.g. Last Name, Street, ZIP, City) and set CompareNull to 'false' for other columns which may contain NULL values in many rows (e.g. First Name, Phone Number, ...)

 

DupeSearch()

Search for duplicate values in a variant-array (record data)
Return Value: Variant-Array or Empty Variant (vt=VA_EMPTY)
Parameters: varArray AS Variant, IDColumnIndex AS Integer, Threshold As Double, ClusterThreshold As Double, Options As String, ReturnResults AS Enum, ShowProgress As Boolean

The returned Variant has an additional column containing GUIDs for each Record. These GUIDs are identical for similar records.

 

FuzzyMerge()

Merge two arrays (record data) with fuzzyness
Return Value: Variant-Array or Empty Variant (vt=VA_EMPTY)
Parameters: varArray AS Variant, varImport AS Variant, IDColumnIndex AS Integer, Threshold As Double, ClusterThreshold As Double, Options As String, ReturnResults As Enum, ShowProgress As Boolean, LogFilenameImported As String, LogFilenameRejected As String

The returned Variant has up to 2 additional columns (depending on ReturnResults)

These GUIDs are identical for similar records.
The log file exports use the settings from the properties CSVDelimiter, CSVTextInQuotes, CSVColumnHeaders

 

FuzzyMatch()

Matches two arrays (record data) with fuzzyness,
positive/negative match with external list
Return Value: Variant-Array or Empty Variant (vt=VA_EMPTY)
Parameters: varArray AS Variant, varMatch AS Variant, IDColumnIndex AS Integer, Threshold As Double, ClusterThreshold As Double, Options As String, ReturnResults As Enum, ShowProgress As Boolean, LogFilenamePositive As String, LogFilenameNegative As String

The returned Variant has 1 additional column containing 0=No-Match, or Index (1 based) from varMatch where a match was found
The log file exports use the settings from the properties CSVDelimiter, CSVTextInQuotes, CSVColumnHeaders

 

DeleteDupes()

Delete duplicate values from a variant array (result from another method) depending on a DupeID (GUID)
Return Value: Variant-Array or Empty Variant (vt=VA_EMPTY)
Parameters: varArray AS Variant, FIDColumnIndex AS Integer, MaxColumnIndex As Integer, DeleteOrder As Enum, ReturnResults As Enum, ShowProgress As Boolean, LogFilenameDeleted As String

Use this function, to get a clean (dupe free) search result:

  1. Search for dupes with DupeSearch() and ReturnResults=DRR_ALL
  2. Call this function to remove the dupes from the result

The log file exports use the settings from the properties CSVDelimiter, CSVTextInQuotes, CSVColumnHeaders

 

VariantArray2CSVFile()

Export a results from a search to a CSV text file
Return Value: BOOL
Parameters: varArray AS Variant, Filename AS String
 
Export a Variant-Array to a CSV file.
Returns true when successful.

 

ExactMatchByColumn()

Match two arrays exactly by a key-column
Return Value: Variant-Array or Empty Variant (vt=VA_EMPTY)
Parameters: varArray AS Variant, varMatch AS Variant, MatchColumnIndex AS Integer, ReturnResults As Enum, ShowProgress As Boolean
This function can be used to delete a list from the record data. The delete list is e.g. the result from DeleteDupes() with parameter DER_DELETED

 

CanceledByUser()

Returns true, when a long running process was canceled.
Return Value: BOOL
Parameters: none
 

 


Sample Source Code

Pseudo Code (VB6 like)

 
--- End ---

Back to Kroll-Software Website