Archives for "April 2004"

One-Fifth of All Searches are Consumer Queries, Claims Google

At a search engine marketing conference held in Belfast a couple of days ago, a Google representative claimed that 65% of all searches were information-seeking (or, as he put it, "people trying to educate themselves"), while 20% were consumer queries. 20%! That seems unlikely. I suspect Google are conveniently using a very loose interpretation of the data. For example, if I type in "Jamaica", Google *could* suppose that I'm looking to purchase a sun holiday, when I may simply be looking for information on the country. Of course, the people at Google have more information at their disposal -- they can actually see which link I click on the results that are returned for my search on "Jamaica". In fact, via my Google toolbar data, they can probably tell how long I stay on any site I visit, and whether or not I make a purchase! In addition, searches are likely to be more specific, containing at least two words -- "package holidays in Jamaica" is more indicative of a consumer query than "history of Jamaica". Still, I don't believe that Google has done this, especially not when it comes out with such a high, round figure of 20%. The representative who quoted this, by the way, was promoting Google Adwords, so I think it's fair to assume he had a vested interest in giving a high figure. I'd say that less than 1% of my searches are consumer queries, and I suspect the same is true of the average *regular* user of the internet. The Google rep didn't explain, either, what the remaining 15% of queries are for. I guess that means they are porn searches!
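To make the "loose interpretation" point concrete, here is a rough sketch of how the same batch of searches can give very different consumer-query percentages depending on how the ambiguous ones are counted. Everything in it is my own assumption for illustration: the cue-word list, the function names and the sample queries are hypothetical, and it says nothing about how Google actually classifies searches.

```python
# Hypothetical cue words that hint at an intention to buy.
# This list is an assumption for illustration, not anything Google has published.
CONSUMER_CUES = {"buy", "price", "prices", "cheap", "deals", "package", "holidays", "flights"}


def classify_query(query: str) -> str:
    """Label a query as 'consumer', 'informational' or 'ambiguous'."""
    words = query.lower().split()
    if any(word in CONSUMER_CUES for word in words):
        return "consumer"
    # A one-word query such as "Jamaica" gives no clue either way.
    if len(words) == 1:
        return "ambiguous"
    return "informational"


def consumer_share(queries, loose=False):
    """Fraction of queries counted as consumer queries.

    With loose=True, ambiguous queries are also counted as consumer queries.
    """
    if not queries:
        return 0.0
    counted = {"consumer", "ambiguous"} if loose else {"consumer"}
    hits = sum(1 for q in queries if classify_query(q) in counted)
    return hits / len(queries)


sample = [
    "Jamaica",
    "history of Jamaica",
    "package holidays in Jamaica",
    "Jamaica weather",
    "cheap flights to Jamaica",
]

print("strict:", consumer_share(sample))
print("loose:", consumer_share(sample, loose=True))
```

The only difference between the two figures is whether a bare "Jamaica" is assumed to be a shopping trip, which is exactly the kind of choice that can inflate a headline number.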

Search Engine Spam

Among the services I offer are search engine optimization (SEO) and search engine marketing. The former involves constructing a website in such a way that it gets picked up for searches on relevant queries; the latter involves advising clients on other, paid-for ways to promote a site in search engines (such as Google Adwords, etc.). Both of these services are tolerated by the major search engines -- albeit reluctantly. What search engines can't stand is search engine spam -- pages that deceive either the search engines or web users. Search engine spam is becoming a major problem, even though we don't ever see most of it, as it gets filtered out of the results. However, a significant percentage of the pages on the web are now spam. Just like email spam, many spam web pages are computer-generated, and would be considered a nuisance by the average user. The search engines have been at war with spam for some time, but the research community have lately joined the fight. Check out, for example, this paper from Stanford University, which attempts a taxonomy of spam. I have the feeling, however, that the current relationship between the search engines and spammers is just the beginning of what's going to be a long, ever-complex stretch of evolutionary adaptation and counter-adaptation...

Gigablast: Watch this Space

It's almost hard to believe that the massively impressive Gigablast search engine was built, and is maintained, by a single individual: Matt Wells. This month's edition of ACM Queue contains a great interview with Wells. But if you thought my previous entry -- about the interview with Yahoo's Search Manager -- contained lots of search technology jargon, then I should definitely warn you this time. Wells is a geek++, but he knows his stuff. Here are some of the highlights of what he had to say. On Search Engine Spam: "Spam is a huge problem for search engines today. Nobody is accountable on the Internet and anybody can publish anything they want. A large percentage of the content on the Web is spam." On Google: "Google's strength is its cached Web pages, index size, and search speed. Cached Web pages allowed Google to generate dynamic summaries that contain the most query terms. It gives greater perceived relevance. It allows searchers to determine if a document is one they'd like to view. The other engines are starting to jump on board here, though. I do not think Google's results are the best anymore, but the other engines really are not offering enough for searchers to switch." On PageRank: "I do not think Google's success is due that much to its PageRank algorithm. It certainly touts that as a big deal, but it's really just marketing hype ... PageRank is just a silly idea in practice, but it is beautiful mathematically." On the Similarities Between a Search Engine and a Human Brain: "Now that the Internet is very large, it makes for some well-developed memory. I would suppose that the amount of information stored on the Internet is around the level of the adult human brain. Now we just need some higher-order functionality to really take advantage of it. At one point we may even discover the protocol used in the brain and extend it with an interface to an internet search engine." On Gigablast: "I'm hoping to build Gigablast up to 5 billion pages this year."

Yahoo Search Guru Reveals All

For those of you who, like me, are very interested in search engines (you'd never have guessed, right?), it's worth checking out this month's edition of e-marketing news. There's a fascinating, lengthy interview with Yahoo's Search Manager, John Glick. In it, Glick describes how Yahoo went about taking all the knowledge it had gained through its acquisitions of erstwhile competitor search engines, and put it together to create a new technology, aimed at taking on Google. Glick also provides many insights into what Yahoo regards as spam, and the measures it takes to prevent it (these often differ from Google's). I don't buy Glick's attempts to explain Yahoo's new paid inclusion programme (called SiteMatch), which has all the usual fudges we've been getting from Yahoo. (See my post about how SiteMatch inevitably means stale listings.) Nor do I buy any of the claims that Yahoo's future ambitions -- personalisation and local search -- are motivated by the need to better serve searchers. Rather, they are aimed at better serving merchants (or, from Yahoo's point of view, its customers). The examples that Glick provides make this clear, as they smack of overtly commercial concern, not genuine concern for the searcher's needs: "For example, if you been looking at a lot of travel sites and you type China, then you may want information on the country. If you've just been looking through jewellery sites, or wedding venue sites and that sort of thing, you may be planning a wedding and maybe you're looking for China dishes! So it's taking that type of information. "And it's not just which product you're looking for. It's where you are in that product buying cycle ... A person who simply types in: iPod - maybe looking to figure out what they are, should they be getting into this digital music download thing and dropping their CD player for the new thing. Whereas, further into the process, they may be looking for who's got the best price on iPods. Or maybe it's about, does the store down the street have the new one in stock yet. It's taking a lot of that context." Still, lots of useful info here, plus some juicy jargon, including "hill-climbing genetic algorithms" (as opposed to "tree-parsing/gradient-boosting" ones), and "de-aliasing tables". More marginally valuable information to clutter up my small, overcrowded brain...

The Network Is The Computer. Really.

An article by Faisal Islam in yesterday's Observer, entitled Great moments for rivals of Gates, contained many thought-provoking insights into the future shape of the IT industry. I was struck by this assessment of Google's forthcoming Gmail service: "Rather than using a Windows desktop, everything - software, photographs, documents, music - could be based on a remote supercomputer and accessed through the web using efficient 'slim' mini-computers and souped-up mobile phones. 'This is what Larry Ellison at Oracle and Sun Microsystems have been banging on about, but haven't actually executed,' says one former Microsoft executive. 'They did not have the crucial killer application - search - but Google does.' At work, and increasingly at home, an information overload puts a premium on effective ways of filtering, indexing and quickly retrieving filed documents, photographs, videos and music. Software programmes matter, and have made Microsoft its billions. But the really important function for many people is navigation and access. So the search methods Google has developed, for mapping six billion web pages around the world, will become potentially critical within corporations, besides being seen as crucial for home use." Of course, Gmail also opens up all sorts of privacy concerns ... but many have predicted for some time that the future of computing lies with network storage. Google is making that future a reality.

Website Findability: My Search Engine Optimization Book

I've finally finished my book on search engine optimization, which I've called Website Findability: How to Get Traffic from Google and Other Search Engines.

I got the idea to write a book on this subject around the end of last year, shortly after a spike in visitor traffic to this weblog. The surge in visitors was a result of a piece I'd written about Google's "Florida update". A lot of sites lost good positions in Google as a result of that update, and a lot of people posted anguished comments in response to my entry.

I helped some of the commenters out by giving tips as to how to improve their sites' positions. The more help I gave, the more questions I got -- soon I was inundated with emails!

Now, I normally provide search engine optimization services to corporate clients. But the people who were emailing me were Mom 'n' Pop businesses, who didn't have a budget for optimization. I wondered if there was a way I could help all these people and make a little money for myself.

That's when I decided to write a book, detailing everything I know about search optimization -- or website findability, as I call it.

I soon had a dilemma. Since my subject is a hot topic, and I'm a writer by "trade", I was confident that I could interest a commercial publisher. But even as I hacked out the first draft, changes in the search engine industry were rendering some of my text obsolete. How could I ever write a book that would still be relevant by the time it hit the shelves?

That's when I decided to fully embrace the web medium: I would publish the book myself, in PDF format, and commit to updating it every 6-8 weeks, so that the information would not go out of date.

So my readers are happy. And, by cutting out the middleman, I'm happy.

What I really like about being the author of a search engine optimization book is that it lets me help far more people with their websites than I could with my small business.

On the other hand, I might be creating a lot of competitors for myself -- since anyone who reads my book will know as much about search engine optimization as I do!

Man Sues Google for "Libel"

The battle of the search engine giants is being fought in the news as much as in the technology sector. A month ago, Yahoo was making all the headlines, but in the last fortnight, it’s been nothing but Google, Google, Google. Amidst this flurry of news, PR and gossip, most industry analysts have missed a small but crucial event that, I believe, carries great portent for the future of the relationship between search engines and private individuals. In Southern California a few weeks ago, a man called Mark Maughan Googled himself (that is, he looked up his own name in Google) and was miffed at what he found -- false claims, apparently, about him and his company*. Maughan has filed a lawsuit against Google. He has little chance of winning, since Google is not responsible for the content that appears in the results. It seems that Maughan does not understand very well how search engines work. Then again, who among the general population does? And there’s the rub. As search engines, especially Google, are coming to have more and more importance in our society, how can people protect their identities online? I have been making this point for some years now, and long ago made sure that any search for Michael Heraghty or Heraghty will lead to results over which I have at least some control. (I have managed to secure the top positions for these searches.) I have done the same thing for many of my clients, a service I call “online identity management.” I believe that this service will become increasingly important in the future. On the internet, our reputations really do precede us. The growth of weblogs and social networking sites is an indication that people are starting to create and manage their own virtual reputations, identities and “brands”. *Of course, a search for Mark Maughan now takes you to results that are predominantly related to the lawsuit story.

Google Gmail Raises Privacy Concerns

I have never before seen an announcement by an internet company cause such a stir. Google's recent promise to launch Gmail -- a free email service offering massive storage space and other enticing features -- has caused a huge storm about privacy concerns. Already, 28 privacy and civil liberty organisations in the US have joined together to call for Google to suspend its Gmail plans. "The 28 organizations are voicing their concerns about Google’s plan to scan the text of all incoming messages for the purposes of ad placement, noting that the scanning of confidential email for inserting third party ad content violates the implicit trust of an email service provider. The scanning creates lower expectations of privacy in the email medium and may establish dangerous precedents. Other concerns include the unlimited period for data retention that Google’s current policies allow, and the potential for unintended secondary uses of the information Gmail will collect and store." Meanwhile, Google points to its privacy policy, which states: "We serve highly relevant ads and other information as part of the service using our unique content-targeting technology. No human reads your email to target ads or related information to you without your consent." But Google will store all of your emails into the future -- even after you close your Gmail account, should you do so. And though you might find Google trustworthy right now, can you extend that trust indefinitely into the future? While it would be extremely impractical for Google employees to read the contents of individual emails (since its databases would hold such a huge number of individual messages, the majority of which would contain little of interest to a third party), your messages would certainly be searchable -- especially by Google. One misplaced trigger word and your message might raise a flag down at the Googleplex. On the other hand, is there any such thing as private email? Unless both parties to an email exchange are using public key encryption, it is pretty easy for any number of third parties to intercept your message. The bottom line is that email is *not* private. It never was. And it probably never will be.

Google & The World's Biggest Computer

Topix.net speculates on why Google launched Gmail. The post is fairly technical but the basic idea is that, in building the world's most powerful search engine, the guys at Google have created what is perhaps the world's largest proprietary, general-purpose computer, so it is inevitable that they should try to find other uses for it: "Google is a company that has built a single very large, custom computer ... They make their big computer even bigger and faster each month ... It's looking more like a general purpose platform than a cluster optimized for a single application. ... This computer is running the world's top search engine, a social networking service, a shopping price comparison engine, a new email service, and a local search/yellow pages engine. What will they do next with the world's biggest computer and most advanced operating system?" As to the latter question, Paul Ford has written a fictitious, speculative "short feature from a business magazine published in 2009," which examines how Google got to its position of being the "world's single largest marketplace."

Google To Offer Email

The Yahoo! vs. Google battle gets more intense by the day. Google has just announced that it is to launch an email service, called Gmail. According to the company's press release, Gmail will have the following features:
  • A Google-powered search feature that allows users to search all emails sent and received.
  • 1,000 megabytes (1 gigabyte) of free storage space. Yahoo, by contrast, offers a mere 6 megabytes.
  • Smart ordering of related emails into "conversations".
  • Ultra-advanced spam filters.
  • No banner or pop-up ads, only sponsored text links.
It is in the latter area -- sponsored text links (aside: I wonder how/if Google will target these links?) -- that Google expects to draw revenue from its Gmail service. Based on the storage space promise alone, I'd switch from Yahoo (my current email host provider) to Google in a flash. Indeed, the storage offer is so enticing, that many Google-watchers were today questioning whether the announcement was an April Fool's joke!