Ben McCann

Co-founder of Connectifier.
Investor at C3 Ventures.
Google and CMU alum.

Ben McCann on LinkedIn Ben McCann on AngelList Ben McCann on Twitter

Scraping

HTML Parsing using the Firefox DLLs

03/21/2008

One of my first posts was a comparison of HTML parsers. Today I found a particularly challenging document to parse. None of the parsers I had compared earlier were able to handle the malformed HTML in this table where the td elements were prematurely ended. The behavior of Neko and HtmlCleaner made the most sense Read More