I was curious about this Big News that some of my favorite search engines now had access to the Twitter “firehose,” so I checked out Collecta for ‘Sarah Palin.’ Here are some of the results:
“Sarah Palin’s new job comes with one tremendous benefit for the public…a remote control…ZAP!!!”
“Earthquakes, Huge waves, Kids on air traffic control, Sarah Palin talk show, what’s this world coming to?”
“Sarah Palin shopping a reality TV show. Show me to a vomitorium.”
“Anyone watch Sarah Palin’s stand up? I was like don’t quit your day job and then I remembered she already did.”
Cute perhaps, but is it news? I decided to put the question to Gerry Campbell, CEO of the realtime search engine Collecta.
Gerry, why should I care that Collecta has the firehose?
Well Charles, as you know, this week it was announced that Collecta and six other companies would get the Twitter firehose… For Collecta that means that we’ll see every single tweet instantaneously. Right when the tweeter tweets it.
It’s called a firehose for a good reason: extracting value from it is like drinking from the hydrant itself. Raw messages shoot by to the tune of fifty million a day, streaming continuously in a never-ending blast. It takes a lot of focus and expertise to be able to make something useful out of it.
But this is where the metaphor breaks down. Water is water. Undifferentiated. One ounce in the hose is the same as any other ounce.
Messages are different. Each tweet has the potential of being just the one you’re looking for. Just like in web search, the concepts of head and tail apply in spades here.
Comprehensiveness is critical in real-time search, just as in web search. Maybe even more so… Unless you’re @aplusk, you yourself are a tail-dweller. Don’t you expect to find yourself when you search Twitter?
So what does this mean for Collecta?
Our goal is to have everything on the web that is happening right now. Not just Twitter, not just social networks in general. Not even social networks and blogs. We have ten million sites that include traditional news, video, images, social messages, blog posts, blog comments… Everything imaginable. And we’re growing that all of the time.
Why? Because that piece of information you want may be anywhere. And that story you’re following may evolve outside one single venue. I can guarantee it probably does.
To put a fine point on it, with the firehose, Collecta remains the single most comprehensive source of timely information in the world. Cool, huh? Try us out when you want to know what that smoke is down the street from you… I bet we can tell you. (True example)
Just as important as having everything possible to pick from and show you, we are also highly efficient at figuring out what NOT to show you.
As our content intake scales, we have more and more images, stories, comments and updates to scan for relevant results for you. In fact, Collecta’s adaptive filtering techniques thrive on huge datasets. We get better and better at delivering high quality trends with more content flowing.
Great post, Charles. This is pretty huge news for Collecta, so congratulations to them. I met Gerry at SMX this week and he is justifiably pretty excited about what's going on over there.
I'd be curious to know what the bandwidth of the “fire hose” looks like. It's got to be quite large.
By the way, whatever happened to the good old days of searching for “absinthe”?
Not sure what you're saying here. Bit of a longish, drifting article, well below par for TNW.
I don't think I'd use Collecta and I'm certain S.P. would not show up this way. Why would I even search for her? She was mildly important in the early stages of the US presidential pre-elections. Now, globally? What?
We'll see, but I think the firehose will be just fine, for all of us.
I think Twitter in general is better for evaluating trending topics, or discovering the mood of the most vocal of internet users. However, I can see how tapping into this stream can help Collecta find real-time news, though that would be the gem amongst the dross. With your example, I might not turn to Twitter as a source for news or public opinion, but rather to blogs, where more nuanced responses may arise. I imagine that the big challenge is determining how we value the information from Twitter, and then how we place that data into an algorithm. Until we can find a useful way to assign authority to a tweet, we will have problems with real-time search.