Gigablast: Watch this Space

It's almost hard to believe that the massively impressive Gigablast search engine was built, and is maintained, by a single individual: Matt Wells. This month's edition of ACM Queue contains a great interview with Wells. But if you thought my previous entry -- about the interview with Yahoo's Search Manager -- contained lots of search technology jargon, then I should definitely warn you this time. Wells is a geek++, but he knows his stuff. Here are some of the highlights of what he had to say.

On Search Engine Spam: Spam is a huge problem for search engines today. Nobody is accountable on the Internet and anybody can publish anything they want. A large percentage of the content on the Web is spam.

On Google: "Google's strength is its cached Web pages, index size, and search speed. Cached Web pages allowed Google to generate dynamic summaries that contain the most query terms. It gives greater perceived relevance. It allows searchers to determine if a document is one they'd like to view. The other engines are starting to jump on board here, though. I do not think Google's results are the best anymore, but the other engines really are not offering enough for searchers to switch."

On PageRank: I do not think Google's success is due that much to its PageRank algorithm. It certainly touts that as a big deal, but it's really just marketing hype ... PageRank is just a silly idea in practice, but it is beautiful mathematically.

On the Similarities Between a Search Engine and a Human Brain: Now that the Internet is very large, it makes for some well-developed memory. I would suppose that the amount of information stored on the Internet is around the level of the adult human brain. Now we just need some higher-order functionality to really take advantage of it. At one point we may even discover the protocol used in the brain and extend it with an interface to an internet search engine.

On Gigablast: I'm hoping to build Gigablast up to 5 billion pages this year.

About

Mediajunk is Michael Heraghty's blog, with articles on web design, usability, online marketing, digital innovation, etc.