Purifyr strips web sites down to their basics and gets rid of irrelevant clutter, which makes it ideal for mobile browsing or printing, and for speeding up your experience on slow or shared connections. Take a look at how we look naked. Purifyr strips out headers and footers, and removes advertisements.
Click and drag this Purifyr bookmark and drop it into your browser toolbar to use the Purifyr service on any web page you visit.
Purifyr also has a full API that lets you make free use of the service. Paid options for faster processing include hosted use of the service, and the service is also offered for use behind firewalls, helping to save on data costs and improving the ‘semantic’ or ‘meaning’ value of web content.
The funding will help to scale the site search platform and grow the monthly audience of Quintura Site Search from its current 8 million to 50 million in 2009. That’s quite an ambition, but the search engine is doing a good job already. Quintura offers its services to men’s magazine Maxim as well as tech blog ReadWriteWeb (talk about a diverse group of customers). But further growth is needed anyhow. In order to achieve this, Russia-based Quintura has hired Dennis Szerszen as US-based Chief Marketing Officer.
When I interviewed Sadchikov a while ago, he told me he aims for his product to be the “iPhone of search”. The search results already look quite good, but the product isn’t as much of a visual spectacle as the famous shiny object yet. Sadchikov will need more money for that, which is why he will use Mangrove’s capital to bring in new venture capital investors as well.
This is a guest post by New Media student Edial Dekker
Science fiction writers, the visionaries whose books I consumed as a child, made me believe that within a few years shiny robots would handle all mundane tasks. There are many robots today, but no funny-whistling R2-D2s. Today’s robots are invisible and immaterial, reading and indexing millions of websites on a daily basis. They are robots built for speed and efficiency, mapping the Internet as fast and as accurately as possible. A few years ago we thought we could find anything that was out there on the Web. Today we realize the Web is fragmented, divided into four continents with ‘Terra Incognita’ islands: websites that are clustered and simply can’t be found, no matter how many times you click or how hard you try.
No round-trips
Most search engines do not even try to reach the full Web, because indexing as many websites as possible isn’t necessarily the best way to provide the best search results. The Web is big yet small, but the small world behind the Web is a bit misleading. The Web is a scale-free network, dominated by hubs: nodes with a very large number of links. The World Wide Web also has a directed structure: a link from one page to another does not imply a link back. Andrei Broder, Vice President of Emerging Search Technology for Yahoo!, was the first person to notice how this directed network had consequences for the topology of the Web itself. For example, if you want to go from website A to website D, you can start from node A, then go to node B, which has a link to node C, which points to D. But you can’t make a round-trip. Most likely you would have to find a different route to get from node D back to node A, if one exists at all.
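The A-to-D example can be sketched in a few lines of Python. This is only an illustration: the four-page `links` dictionary and the `reachable` helper are made up for this post, not taken from any crawler. Each page maps to the pages it links to, and a breadth-first walk follows links in their own direction only.

```python
from collections import deque

def reachable(graph, start):
    """Return every page you can reach from `start` by following links forward."""
    seen = {start}
    queue = deque([start])
    while queue:
        page = queue.popleft()
        for target in graph.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return seen

# Hypothetical toy web: A links to B, B to C, C to D, and D links nowhere.
links = {"A": ["B"], "B": ["C"], "C": ["D"], "D": []}

can_go = "D" in reachable(links, "A")      # follow A -> B -> C -> D
can_return = "A" in reachable(links, "D")  # D has no outgoing links
print(can_go, can_return)
```

On this toy graph the trip from A to D succeeds and the trip back fails, which is exactly the missing round-trip.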
The four different continents of the Web
Albert-László Barabási, a Hungarian scientist famous for his insights on network theory, has tried to map the Web into four different continents.
The first is the Strongly Connected Component, or Central Core (SCC): it contains about a quarter of all websites, gives a home to all indexed websites and is easily navigable. This does not mean there is a direct link between every pair of nodes, but paths exist that allow you to surf from any node to any other.
Then there are the IN and the OUT continents: these are just as large as the Central Core but are much harder to navigate. From the IN continent you can easily reach the SCC, but there is no path taking you back to the IN continent. In contrast, the OUT continent can easily be reached from the SCC, but has no links to take you back to the core (where all the magic happens). The OUT continent is mostly populated by corporate websites that can easily be reached from outside, but once you get in, there is no way out.
The fourth continent is made up of Tendrils and disconnected Islands: interlinked groups that are unreachable from the SCC and have no links back to it. These websites can contain thousands of documents. The location of these websites has nothing to do with their content, only with their relation to other documents.
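To make the four continents concrete, here is a rough Python sketch that sorts the pages of a small, invented link graph into them. It assumes we already know one page that sits in the Central Core (`core_seed`); the page names and the helpers `reach`, `reverse` and `continents` are hypothetical and only serve the illustration. The core is everything that can both be reached from the seed and reach it, OUT is reachable from the core only, IN can reach the core only, and whatever is left falls under Tendrils and Islands.

```python
def reach(graph, start):
    """All pages reachable from `start` by following links forward."""
    seen = {start}
    stack = [start]
    while stack:
        page = stack.pop()
        for target in graph.get(page, ()):
            if target not in seen:
                seen.add(target)
                stack.append(target)
    return seen

def reverse(graph):
    """Same pages, but with every link flipped."""
    flipped = {page: [] for page in graph}
    for source, targets in graph.items():
        for target in targets:
            flipped.setdefault(target, []).append(source)
    return flipped

def continents(graph, core_seed):
    """Split the pages into core, IN, OUT and the rest, relative to `core_seed`."""
    pages = set(graph) | {t for targets in graph.values() for t in targets}
    forward = reach(graph, core_seed)             # pages the core can get to
    backward = reach(reverse(graph), core_seed)   # pages that can get to the core
    core = forward & backward
    return {
        "core": core,
        "in": backward - core,
        "out": forward - core,
        "tendrils_and_islands": pages - forward - backward,
    }

# Hypothetical toy web (all names invented for this example).
web = {
    "c1": ["c2"], "c2": ["c3", "o1"], "c3": ["c1"],  # core cycle, plus a link out
    "i1": ["c1", "t1"],                              # IN page, with a tendril
    "o1": [], "t1": [],
    "x1": ["x2"], "x2": ["x1"],                      # disconnected island
}

result = continents(web, "c1")
for name, members in result.items():
    print(name, sorted(members))
```

The split depends on the seed really being a core page; a real crawl would first have to find the largest strongly connected component, which this sketch skips.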
There’s no way you can reach it
These four continents significantly limit the Web’s navigability. Where you can go depends on the continent where you start your search. No matter how many times you click, when you are in the Central Core there is no way you can reach the IN continent or the Islands that surround it. Ever wondered why search engines give users the option to submit websites? It’s because the crawlers can then sniff into those isolated islands that could otherwise never be found.
Is this fragmented structure here to stay? Barabási thinks it is. As long as links remain directed, homogenization will never occur. Tim Berners-Lee, one of the founding fathers of the Web, has for many years been stressing the importance of links that track back to where they are linked from. The way blogs use the track-back system could also be used for connecting the IN and OUT continents. The bottom line is that directed networks always break into the same four continents. The only way to organize the Web is to reorganize the relations documents have with each other. Semantic web, anyone?
Here is the full presentation that Nova Spivack, technology visionary and entrepreneur and currently CEO and Founder of Radar Networks, gave at The Next Web Conference 2008 titled “Making Sense of the Semantic Web”. Erick Schonfeld called it “one of the clearest explanations of the semantic Web I’ve heard so far” and wrote an excellent article about some of the issues discussed in the presentation. Our hard bloggin’ scientist Anne Helmond also wrote an insightful summary of this presentation and several blogs like ReadWriteWeb referred to this post when they covered the ongoing discussion about the semantic web.