This article was published on March 4, 2010

Why should you care if a search engine has access to the Twitter firehose?


I was curious about this Big News that some of my favorite search engines now had access to the Twitter “firehose,” so I checked out Collecta for ‘Sarah Palin.’ Here are some of the results:

“Sarah Palin’s new job comes with one tremendous benefit for the public…a remote control…ZAP!!!”

“Earthquakes, Huge waves, Kids on air traffic control, Sarah Palin talk show, what’s this world coming to?”

“Sarah Palin shopping a reality TV show. Show me to a vomitorium.”

“Anyone watch Sarah Palin’s stand up? I was like don’t quit your day job and then I remembered she already did.”

Cute perhaps, but is it news? I decided to put the question to Gerry Campbell, CEO of the realtime search engine Collecta.

Gerry, why should I care that Collecta has the firehose?
Well Charles, as you know, this week it was announced that Collecta and six other companies would get the Twitter firehose… For Collecta that means that we’ll see every single tweet instantaneously. Right when the tweeter tweets it.

It’s called a firehose for a good reason: extracting value from it is like drinking from the hydrant itself. Raw messages shooting by to the tune of fifty million a day, streaming continuously in a never-ending blast. It takes a lot of focus and expertise to be able to make something useful out of it.

But this is where the metaphor breaks down. Water is water. Undifferentiated. One ounce in the hose is the same as any other ounce.

Messages are different. Each tweet has the potential of being just the one you’re looking for. Just like in web search, the concepts of head and tail apply in spades here.

Comprehensiveness is critical in real-time search, just as in web search. Maybe even more so… Unless you’re @aplusk, you yourself are a tail-dweller. Don’t you expect to find yourself when you search Twitter?

So what does this mean for Collecta?

Our goal is to have everything on the web that is happening right now. Not just Twitter, not just social networks in general. Not even social networks and blogs. We have ten million sites that include traditional news, video, images, social messages, blog posts, blog comments… Everything imaginable. And we’re growing that all the time.

Why? Because that piece of information you want may be anywhere. And that story you’re following may evolve outside one single venue. I can guarantee it probably does.

To put a fine point on it, with the firehose, Collecta remains the single most comprehensive source of timely information in the world. Cool, huh? Try us out when you want to know what that smoke is down the street from you… I bet we can tell you. (True example)

Just as important as having everything possible to pick from and show you, we are also highly efficient at figuring out what NOT to show you.

As our content intake scales, we have more and more images, stories, comments and updates to scan for relevant results for you. In fact, Collecta’s adaptive filtering techniques thrive on huge datasets. We get better and better at delivering high-quality trends with more content flowing.
