Home > Excel > Advance Fuzzy lookup in Excel – Data cleaning

Advance Fuzzy lookup in Excel – Data cleaning


Data cleaning is no more tough task. But, we need to have a strategy and knowledge based mechanism. Like Data Quality Service.

But for now, we have data across various sources. one of the source is Excel, the most powerful data analysis tool.

Fuzzy lookup add-in for excel

Finding duplicate value with exact string or partial string between two table or single table in Excel.

This fuzzy lookup add-in using the advance technology “Jaccard similarity” to find the duplicates.

It can find errors including spelling mistakes, abbreviations, synonyms and added/missing data.

It add more power to the excel.

you can download the add-ins from the Microsoft site.

Implementation

Once you install the excel add-in setup, you will get a set of file along with the Fuzzy lookup add-in.

Portfolio.xlsx has a pre-build data sets in two work sheets. You can just click the Fuzzy lookup menu from the tool bar and click fuzzy lookup.

It will open a panel on the right side of the excel workbook. Here you can use Fuzzy join between  columns in two tables.

You can configure the fuzzy lookup properties such as similarity score and number of matches in this panel.

Thanks for Reading.

Advertisements
  1. Prasanna
    October 22, 2011 at 8:05 pm

    Installed the EXE from http://www.microsoft.com/download/en/confirmation.aspx?id=15011
    I did not see any Portfolio.xlsx file

    Any help?

    • October 22, 2011 at 8:47 pm

      Prasanna,

      If you search at the installed folder you could see this protfolio.xlsx file.

      But, I will send you a copy of my file to your email.

  2. Hemant
    December 27, 2011 at 11:15 am

    For reader’s benefit: Fuzzy Lookup is Add-in for Excel 2010. Though it is not tested for Excel 2007 it is learned that some people were successful in getting the result in 2007.

  3. sam
    December 2, 2012 at 6:42 pm

    Have you had any luck figuring out how to use the advanced configuration?
    e.g. defining transformation rules, “Inc” = “incorporated”, or “FC” = “football club”
    e.g. putting more weight towards a match for the “Name” column, than a match for say area code. because an area code of 29980, is not at all similar to 49980, although they are only out by one number..they are nothing alike.
    thanks.

  1. No trackbacks yet.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: