Diffbot today announced the release of a production version of its API for developers, which lets people navigate the hidden world of the Web visually. A developer or application only needs to submit a URL to Diffbot to see when content on a website has changed, or to identify the different sections of a page, such as important text, advertisements and headlines. Diffbot also helps distinguish the context of material, so that Apple the computer maker is clearly differentiated from Apple the fruit based on other nearby articles.
Diffbot has two API offerings, On Demand and Follow. On Demand was created to analyze home pages and index pages using common layout markers such as headlines, bylines, and images, with a separate feature set that can extract clean text and images from web pages. Follow tracks changes and updates made to a web page, similar to an RSS feed. With Diffbot, a developer can follow only the part of the page he or she is most interested in and extract its metadata in a meaningfully organized form.
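To make the idea behind Follow concrete, here is a minimal sketch of change tracking between two snapshots of a page's text. It is only an illustration under stated assumptions: it uses a plain line-by-line diff from Python's standard library, not Diffbot's API or its vision-based analysis, and the function name `added_lines` and the sample snapshots are hypothetical.

```python
import difflib


def added_lines(old_text: str, new_text: str) -> list[str]:
    """Return the lines that appear in new_text but not in old_text, in order.

    Hypothetical helper for illustration; a plain text diff stands in for
    whatever analysis a real page-monitoring service performs.
    """
    old_lines = old_text.splitlines()
    new_lines = new_text.splitlines()
    matcher = difflib.SequenceMatcher(a=old_lines, b=new_lines, autojunk=False)

    added = []
    for tag, _i1, _i2, j1, j2 in matcher.get_opcodes():
        # "insert" and "replace" both introduce new lines on the new side.
        if tag in ("insert", "replace"):
            added.extend(new_lines[j1:j2])
    return added


# Two made-up snapshots of a class web page, taken at different times.
old_snapshot = "CS 101\nHomework 1 posted\n"
new_snapshot = "CS 101\nHomework 1 posted\nHomework 2 posted\n"

print(added_lines(old_snapshot, new_snapshot))
```

A monitoring tool along these lines would fetch the page on a schedule, keep the previous snapshot, and notify the user whenever the returned list is non-empty.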
Like many powerful technologies, Diffbot emerged from a rather simple idea. “I was taking 8 CS courses one quarter, and created Diffbot as a tool for monitoring my class webpages,” says creator Mike Tung. “Anytime a professor posted a new homework assignment, lecture, or announcement, my phone would buzz and show me the new content. My friends wondered how I was always informed about everything in real-time and asked if they could use it, too. I realized, during my work in AI at Stanford, that techniques in computer vision and machine learning could be used to generalize my algorithm to not just analyzing class webpages, but any page on the web.”
Diffbot already has significant traction. It is being used by AOL Editions, which touts itself as “The magazine that reads you,” to extract user recommendations based on interactions with different content. Hacker News Radio, an Internet radio station for the blind, uses Diffbot to let users hear a webpage’s content while skipping extraneous material such as privacy policies and other non-crucial data.
Diffbot was launched from Stanford’s StartX program by Tung and co-founder Leith Abdullah, both on leave of absence from PhD programs at the school.