The Ancestor Hunt

When is Fuzzy Search Too Fuzzy? Elephind Tells Us!

7/16/2013

Approximately two weeks ago, in my never-ending quest for more resources and repositories for researching genealogy through newspapers, I wrote an article, "Elephind - One To Watch", about a dynamic service that searches multiple online newspaper repositories at one time. I was very interested in the service and have been in contact with its developers to learn more about their capabilities, technologies, etc.

Subsequent to reading my post, several genealogy bloggers have also written short articles about Elephind - attempting to inform their readership about this site. One of the articles was quite interesting in its comparison of results of a search using Elephind and the same search using the Library of Congress - Chronicling America newspaper research site. As a result of a comment penned by the Elephind folks in that article, I suggested that they write a guest post describing search technologies and the impact of fuzzy search on newspaper research.

So I am pleased to present the following guest post from Meredith Palmer of DL Consulting, the creators of Elephind:
______________________________________________________________________________________________

Search engine logic: When is “fuzzy” search a bit too fuzzy?

By Meredith Palmer, DL Consulting

As a genealogist, family historian, or ancestor hunter, the almighty search engine is likely your most important forensic tool in the initial stages of discovery. These days, almost every website you come across, especially websites housing historical records, operates under some type of search functionality. It is the only way to wade through pages of documents efficiently and organize information in ways that make finding that hidden gem possible.

Faceted, federated, fuzzy… these are all terms that describe the various functions search engines can perform, and each type has an important role to play in returning relevant results. Faceted search is something you are probably already familiar with: it filters search results by criteria such as date, title, or subject. Federated search is like an “uber-search”. It allows you to query multiple searchable resources at once. And finally, fuzzy search is… well… fuzzy. It is a technique that helps us out by searching for words similar to the word we query, broadening the search to include likely alternative spellings. But is it really helpful?
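For readers who like to see the idea in code, here is a minimal sketch of faceted filtering in Python. It is only an illustration: the `Page` record, the `facet_filter` function, and the sample newspaper titles are all made up for this example and do not come from any real collection or website.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Page:
    """A hypothetical newspaper page record (not a real site's data model)."""
    title: str    # newspaper title
    issued: date  # issue date
    text: str     # OCR text of the page


def facet_filter(pages, title=None, start=None, end=None):
    """Keep only the pages that satisfy every facet that was supplied."""
    kept = []
    for page in pages:
        if title is not None and page.title != title:
            continue
        if start is not None and page.issued < start:
            continue
        if end is not None and page.issued > end:
            continue
        kept.append(page)
    return kept


# Invented sample data, for illustration only.
PAGES = [
    Page("The Daily Example", date(1901, 5, 2), "Messrs Smith and Jones"),
    Page("The Daily Example", date(1923, 8, 14), "marriage license issued"),
    Page("Sample Gazette", date(1902, 1, 20), "arrived from Melbourne"),
]

# Narrow by title, then by a date range.
by_title = facet_filter(PAGES, title="The Daily Example")
by_decade = facet_filter(PAGES, start=date(1900, 1, 1), end=date(1910, 12, 31))
```

Each facet only ever narrows the result list, which is why faceted search is a good companion to a search that returns too much.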

That depends. As Stefan Boddie recently described in response to a blog post by Philip Trauring on his website, www.bloodandfrogs.com, fuzzy search can be very helpful, but it may also be too clever for its own good, generating lots of false positive results. As Stefan points out, fuzzy search is useful in sorting through poor OCR text because it is intended to find close matches, on the assumption that the words it finds are distorted versions of your query. The problem is that the results are also likely to include distorted versions of words that are similar to, but not the same as, your query. Suddenly, you have hundreds of thousands of results to sift through.

Chronicling America is an example of a website that uses fuzzy searching, at least for certain searches. To see how the search function on this site would behave, I experimented with a search for my grandfather’s family name, Meerse. Sixty pages of results were returned, most of them matching a word similar to Meerse, including “Monroe”, “Messrs”, “Melbourne”, “course”, “license” and many others. In a way, fuzzy search is like the wide-angle lens on a camera. Turning the lens widens your view of the landscape to include everything around you. If that’s too much to look at all at once, you need to narrow the view again, if the website you are using allows you to do that.
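To make the "wide-angle lens" idea concrete, here is a small Python sketch that treats fuzziness as an edit-distance threshold: the number of single-letter insertions, deletions, or substitutions allowed between the query and a word on the page. This is an assumption chosen for illustration, not a description of how Chronicling America actually matches words; the `edit_distance` and `fuzzy_find` functions and the word list (including the invented OCR misreading "Meerso") are hypothetical.

```python
def edit_distance(a, b):
    """Levenshtein distance: fewest single-character edits turning a into b."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or keep)
        prev = curr
    return prev[-1]


def fuzzy_find(query, words, max_distance=0):
    """Return the words within max_distance edits of the query (case-insensitive)."""
    q = query.lower()
    return [w for w in words if edit_distance(q, w.lower()) <= max_distance]


# Illustrative word list; "Meerso" stands in for an OCR misreading of "Meerse".
OCR_WORDS = ["Meerse", "Meerso", "Messrs", "Monroe", "course", "license", "Melbourne"]

exact = fuzzy_find("Meerse", OCR_WORDS)                   # lens fully narrowed
narrow = fuzzy_find("Meerse", OCR_WORDS, max_distance=1)  # tolerate one bad letter
wide = fuzzy_find("Meerse", OCR_WORDS, max_distance=3)    # wide-angle view
```

With a threshold of one, the sketch picks up the misread "Meerso" without any noise. Widen it to three and "Messrs", "Monroe", and "course" come along too, which is the false-positive problem described above.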

All historical news aggregators, such as Chronicling America, Trove, Papers Past, and CDNC, as well as the pay-per-view news banks, perform searches in slightly different ways. To make searching these sites easier and to provide a fast, federated search across all of them, Stefan Boddie and his colleagues built Elephind.com. Elephind incorporates many different digital newspaper collections and allows a user to query all of them at once. As Stefan says to Mr. Trauring, “Our goal is to make the search functions in Elephind better than those in the underlying collections like Chronicling America and Trove…” Therefore, the site is not set to conduct fuzzy search except on request.
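The combination of federated search with exact matching by default can be sketched in a few lines of Python. This is not Elephind's implementation, which is not described here; the collection names, the sample pages, the `federated_search` function, and the 0.8 similarity cutoff are all assumptions made for the example. The similarity score comes from `difflib.SequenceMatcher` in Python's standard library.

```python
from difflib import SequenceMatcher

# Hypothetical stand-ins for separate newspaper collections.
COLLECTIONS = {
    "collection_a": ["Mr Meerse arrived today", "Messrs Brown and Co"],
    "collection_b": ["Meerso family reunion", "a matter of course"],
}


def word_matches(query, word, fuzzy, cutoff=0.8):
    """Exact comparison by default; similarity-based only when fuzzy is requested."""
    q, w = query.lower(), word.lower()
    if not fuzzy:
        return q == w
    return SequenceMatcher(None, q, w).ratio() >= cutoff


def federated_search(query, collections, fuzzy=False):
    """Run one query against every collection and merge the hits."""
    hits = []
    for name, pages in collections.items():
        for page in pages:
            if any(word_matches(query, w, fuzzy) for w in page.split()):
                hits.append((name, page))
    return hits


exact_hits = federated_search("Meerse", COLLECTIONS)
fuzzy_hits = federated_search("Meerse", COLLECTIONS, fuzzy=True)
```

The design choice worth noticing is the `fuzzy=False` default: the user gets precise results first and widens the search only when asking for it.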

If you would like to test out the search capabilities of Elephind and are interested in giving your feedback, I will pose the same questions Stefan asked Mr. Trauring. Do you think fuzzy search as implemented in Chronicling America is a good idea? Would it be better if searches were for exact matches by default, with some sort of "search for similar words" option the user could choose when desired? What other search features would make it easier to use these collections? Feel free to leave a reply here or contact me at Meredith@dlconsulting.com with your comments.

2 Comments
Steve
7/17/2013 02:15:03 am

I find your site no better than the "fuzzy" search. My query was "Newnom" and as with every other site it mistook "Newsom" for "Newnom". Why OCR is incapable of distinguishing N from S is beyond me.
It would also be nice if the search results were highlighted on the page so we don't have to read every word on the page.

Dylan
8/10/2013 11:33:35 am

If you want to see next generation fuzzy search over both item names AND content, you need to check out a newcomer AIKIN HyperSearch www.grappledata.com/aikin

