Archives for "Search Engines: 2004"

Google Suggest: Gift or Gimmick?

Google Suggest is a new feature from Google Labs that predicts words as you begin to enter your search query. It's a little like predictive text on cellphones, except that it's based on actual searches that other users have carried out.
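Google hasn't said how Suggest works internally. Purely as a rough illustration, the core idea of prefix completion ranked by popularity can be sketched in Python. The query log below is hypothetical, and a real service would precompute completions (for example in a trie) rather than scan the whole log on every keystroke.

```python
from collections import Counter


def suggest(prefix: str, query_log: list[str], limit: int = 3) -> list[str]:
    """Return up to `limit` logged queries that start with `prefix`, most frequent first."""
    counts = Counter(q.lower() for q in query_log)
    p = prefix.lower()
    matches = [(query, n) for query, n in counts.items() if query.startswith(p)]
    # Sort by descending frequency, then alphabetically so ties come out in a stable order.
    matches.sort(key=lambda item: (-item[1], item[0]))
    return [query for query, _ in matches[:limit]]


# Hypothetical log standing in for "actual searches carried out by other users".
log = ["paris hilton"] * 3 + ["pamela anderson"] * 2 + ["paris"]
print(suggest("pa", log))   # ['paris hilton', 'pamela anderson', 'paris']
print(suggest("pam", log))  # ['pamela anderson']
```

Ranking purely by frequency is also why the suggestions mirror whatever people search for most, for better or worse.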

It's definitely fun to use, but it's difficult to know at this stage whether people will find Google Suggest useful or annoying. It's a little intrusive. Perhaps it would serve best as a feature that could be toggled on or off.

For now, though, the blogosphere seems to like it:

"Wow" -- Joi Ito

"At PA it suggests 'Paris Hilton' and at PAM it suggests 'Pamela Anderson,' so clearly the technology works." -- Joho the Blog

"Seem to be filtering out some of the dirty stuff." -- Davenetics

"It works for phrases, too." -- Niel M. Bornstein

"MOST disgusting intrusion into privacy EVER" -- Joel Spolsky

Yahoo Video Search

Yahoo has beta-launched a video search feature, similar to Alltheweb's.

Google Sandbox Increases Domain Name Values

For almost a year now, webmasters have been complaining about what they call the "Google Sandbox".

Before then, webmasters with a good grasp of search engine optimization could launch a new site and expect it to show up in Google's results for certain keyphrases in a matter of weeks.

In early 2004, however, Google made a change to its algorithm to prevent such quick wins. Theories differ as to the specific nature of this change, but its outcome is clear: it is now extremely difficult to get a site with a new domain name listed prominently in Google’s results.

There are two probable reasons why Google altered its algorithm to produce this Sandbox effect:

a) Spammers profited from techniques that they knew would cause their sites to get banned. They didn’t care; when the sites got banned, they immediately launched new domains with the exact same content, and got high listings once again. The sandbox effect prevents this tactic from succeeding.

b) Age of a site is arguably an indication of quality. The longer established a particular domain name, the more likely it is to have been maintained, and to offer useful content. The sandbox effect prevents new sites from displacing more established competitors, even if the newer sites are otherwise better optimized.

The Google Sandbox causes new sites to go through a “probationary period” -- of indefinite duration -- before they can achieve good listings in Google. Worse, there is no guarantee that a site that's stuck in the sandbox will ever get out!

Google is, as ever, being coy about this change to its algorithm. At first, it denied outright that the Sandbox existed. But whether or not it exists as a deliberate policy, the Sandbox effect cannot be denied. To see it for yourself, launch a new website using a newly registered domain name, then try finding it by searching Google.

Or just read the accounts of those who have experienced the Sandbox first-hand.

For this reason, optimizers (and spammers) have resorted to purchasing existing sites and domain names, which in turn is pushing up the value of domain names. Expect them to continue rising; it seems that the Google Sandbox is here to stay.

MSN's Top 200

According to MSN Search, the query typed most often into Microsoft's search engine is ... Google!

Here are the first ten of MSN's top 200 searches, distilled by the kind folk at WebmasterWorld:

1 google
2 yahoo
3 internet explorer
4 ebay
5 overture monitor five bids nz20040513c
6 mapquest
7 hotmail
8 espn
9 msn
10 yahoo mail

Google Scholar: Academic Search

Another new beta offering from the guys at the 'plex is Google Scholar, an engine confined to the narrow search space of academic documents.

Google Scholar represents more than niche searching: it also gives users access to information that would normally be part of the "invisible web". Danny Sullivan writes: "Google has worked with publishers to gain access to some material that wouldn't ordinarily be accessible to search spiders, because it is locked behind subscription barriers."

However, not all of the information revealed by the search is available through the internet, Sullivan explains. "In some cases, the material is not actually online. Google may know about a paper only through references it has seen on other papers. In these cases, a Library Search and Web Search link will appear next to the paper or book's title."

So why show these results at all? Well, for those doing scholarly work, just being aware that a publication exists can be beneficial; they can then go and hunt the publication down elsewhere (in the offline world).

Another feature of Google Scholar is citation extraction and analysis: next to each document listed is a "cited by" link, along with the total number of citations. Interestingly, citation analysis -- long a feature of the world of academic information -- is the idea that Google adapted when it created its own PageRank concept.
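To make the citation analogy concrete: PageRank treats a link as a citation, so a page "cited" by many pages, or by important ones, scores highly. Below is a minimal power-iteration sketch in Python. The damping factor of 0.85 is the value commonly quoted for the original PageRank; the tiny link graph, the iteration count and the choice to spread the score of pages with no outgoing links evenly across all pages are assumptions for illustration, not Google's production algorithm.

```python
def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Power iteration over a link graph given as {page: [pages it links to]}."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start with equal scores
    for _ in range(iterations):
        # Every page gets a small baseline share (the "random jump").
        new = {p: (1.0 - damping) / n for p in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # A page passes its score, split evenly, to the pages it cites.
                share = damping * rank[page] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # Dangling page: spread its score over every page.
                for p in pages:
                    new[p] += damping * rank[page] / n
        rank = new
    return rank


scores = pagerank({"a": ["c"], "b": ["c"], "c": ["a"]})
print(sorted(scores, key=scores.get, reverse=True))
```

Here page c, cited by both a and b, ends up with the highest score, and a, cited by the well-regarded c, comes second.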

All Google-Watchers Now

When I first started my blog, certain people weren't afraid to point out to me that I seemed obsessed with Google. I was, of course. What can I say? I was ahead of the posse! ;)

These days, everyone seems to be speculating about Google; the Google IPO; what its next moves might be; whether it is a force for good or evil; whether it is going to challenge Microsoft; whether we can trust it with our privacy; whether we should love it or hate it; etc.

In this entry, I wanted to give you a flavour of some of that speculation.

David Weinberger considers what Google means when it says that it wants "to organize the world's information, and make it accessible":

What might [the Google application of the future] be like?

It would find pages on the Web, of course, but it'd also find the ones on my desktop (Google desktop). It would know about my email (Gmail). It would know that my own photos are categorically different from all the other jpgs on the planet (Picasa). It would let me browse the physical earth (Keyhole) and show on a map the documents that talk about any particular place (Keyhole + Google Local).

And it wouldn't be just a browser. It would let me work with the information I've found: Manage my photos (Picasa), manage my desktop files, translate documents (Google Languages), shop...

Hublog, meanwhile, lists everything that Google "knows" about you:

- Everything you search for using Google
- Every web page you visit that has Google Adsense ads on it
- Which country you're in
- Every Blogger page you visit, and the referring page
If you have an Adsense account
- Your full name, address and bank account details
- The IP address of everyone who visits your pages with Adsense ads on them
- The number of visitors to each of your pages with Adsense ads on them
If you use a GMail account
- Who you send emails to
- Who sends emails to you
- The contents of those emails
- The contents of all emails received from any mailing lists of which you are a member, even if they are private mailing lists.
Even if you don't use a GMail account
- The contents of any emails you send to anyone who does use a GMail account
- The contents of any emails you send to any mailing lists of which any one member uses a GMail account
If you're a member of Orkut
- Your online social network, interests and groups

It's hard to believe that a technology so new has become so important, so quickly. The same might be said of the internet as a whole. But as the internet has just celebrated its birthday, I'm glad to say that it is not that new -- in fact, it's (just about) older than me!

Google Aptitude Test

Interested in working for Google? Try the Google Aptitude Test.

Google Explores Clustering

Google is working on a new version of its search engine that clusters search results. The company gave a presentation of its clustering technology at a recent conference.

Clustering refers to a method of presenting search engine results in groups of related documents, based on a (machine) analysis of the documents' text.

For example, look at how search engines such as Clusty (a.k.a. Vivisimo) or Mooter present their results. Some believe that the threat from these emerging engines prompted Google to investigate the clustering method.
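Neither Google nor Vivisimo has published its method here, so the following is only a toy sketch of the general idea: label groups with the terms that recur most often across the result snippets (ignoring the query words and a small, assumed stopword list), assign each result to the first label it contains, and put anything left over in an "(other)" bucket. Real clustering engines use far richer text analysis; the function names and sample snippets are hypothetical.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stopword list.
STOPWORDS = {"a", "an", "and", "in", "is", "of", "on", "the", "to", "with"}


def terms(text: str) -> set[str]:
    """Lower-cased word set of a snippet, minus stopwords."""
    return set(re.findall(r"[a-z]+", text.lower())) - STOPWORDS


def cluster_results(query: str, snippets: list[str],
                    max_clusters: int = 3) -> dict[str, list[str]]:
    query_terms = terms(query)
    doc_terms = [terms(s) - query_terms for s in snippets]
    # Document frequency: how many snippets each term appears in.
    df = Counter(t for ts in doc_terms for t in ts)
    ranked = sorted(df.items(), key=lambda kv: (-kv[1], kv[0]))
    # Only terms shared by at least two snippets can become cluster labels.
    labels = [t for t, n in ranked if n >= 2][:max_clusters]
    clusters: dict[str, list[str]] = {label: [] for label in labels}
    clusters["(other)"] = []
    for snippet, ts in zip(snippets, doc_terms):
        label = next((lab for lab in labels if lab in ts), "(other)")
        clusters[label].append(snippet)
    return clusters


results = [
    "Jaguar car dealers and prices",
    "Used Jaguar car listings",
    "Jaguar habitat in the rainforest",
    "The jaguar is a big cat of the rainforest",
    "Jaguar band tour dates",
]
for label, group in cluster_results("jaguar", results).items():
    print(label, group)
```

The sample groups the car results under "car", the animal results under "rainforest", and leaves the band on its own.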

I'm not sure that I'm too keen on clustered results -- though if it were a feature that I could toggle on and off, I'd perhaps grow to like it.

Update: It seems that Google has been trying out a "related searches" feature for a while now. Here's a screenshot.

Also, check out this GoogleClusters logo from SEO Roundtable.

Google Desktop Search Proving Popular

Google has trumped Microsoft by releasing its new desktop search tool, which allows you to search documents on your own hard drive, and returns results in the familiar, intuitive Google web page format.

That's right: now you can Google your hard drive!

The tool is easy to install and is already proving popular, according to early feedback from users. On the downside, the programme needs quite a lot of disk space, which can be a problem, especially as you don't get to choose which drive partition to install it to.

My guess is that Google Desktop Search may be the company's first step towards a peer-to-peer search engine that allows users to search the contents of each other's hard drives.

Google Launches SMS Service

Snippets of Google's search results are now available via text message. The service, called Google SMS, is currently available only in the US.

Users can text a word or phrase to Google's number, and Google will text back the results of their query. So far as I can tell, the reply will be similar to the snippet you would see on a Google search results web page, before you click on any of the links to go to the source sites.

It is difficult to say how useful this service would be. Google explains that if, for example, you texted "G population San Francisco" to its number (you have to put "G" before the search terms), you would get a result texted back saying "San Francisco Population. Population: 728,921 (1992); Population ...".

Now, if you do a Google search on the web for San Francisco Population, you will see that this text matches the snippet for the first search result. Presumably, that is where the SMS reply comes from.
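Going only by the example Google gives, the request/reply handling can be sketched in Python. This is a hypothetical model, not Google's code: it assumes the "G" keyword is case-insensitive, that the reply is simply the top result's snippet, and that the reply must fit in one standard 160-character text message.

```python
from typing import Optional

SMS_LIMIT = 160  # characters in one standard GSM 7-bit text message


def parse_sms_query(message: str) -> Optional[str]:
    """Return the search terms if the message starts with the 'G' keyword, else None."""
    words = message.split()
    if len(words) < 2 or words[0].upper() != "G":
        return None
    return " ".join(words[1:])


def build_reply(snippet: str, limit: int = SMS_LIMIT) -> str:
    """Trim a result snippet so the reply fits in a single text message."""
    if len(snippet) <= limit:
        return snippet
    return snippet[: limit - 3].rstrip() + "..."


print(parse_sms_query("G population San Francisco"))  # population San Francisco
print(build_reply("San Francisco Population. Population: 728,921 (1992); Population ..."))
```

A model this literal also shows the weakness discussed below: the reply is only as good as whatever snippet happens to top the results.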

As one observer put it, this could signal the end of the pub quiz! But would it really? In my view, the snippets can't be relied upon to provide answers. For example, try typing "Ireland population" or "population of Ireland" or "population of Ireland 2004" into Google. The first result doesn't provide any real answer.

As well as snippets, however, users can make more specific requests for dictionary definitions, business addresses and product prices (taken from Froogle).

It remains to be seen whether many people will find this service useful. Personally, I'll be surprised if Google SMS takes off. Then again, I was surprised that SMS ever took off in the first place!

Yahoo Launches Personal Search

Yahoo has launched a personalised version of its search engine.

Personally (ahem) speaking, I don't see much value in this tool, for reasons I stated when Google launched a beta version of its own personalised search tool.

Yahoo Cuts to 10

Tidbit: This week, Yahoo switched from giving 20 results per page (as it had done for years, ever since it started) to giving 10. Why? Because Google gives 10 results per page.

If you can't beat 'em, copy 'em.

Google to Launch GBrowser?

Speculation is mounting that Google is planning to launch its own web browser. The New York Post reports that "based on the half-dozen hires in recent weeks, Google appears to be planning to launch its own Web browser and other software products to challenge Microsoft."

Possible clue: certain bloggers have reported seeing an as-yet unidentified browser from the Google domain visiting their sites.

Another clue: a WHOIS search on the domain reveals that the name is registered to Google.

Some in the internet industry have been crying out for Google to create a browser -- as a serious challenger to Internet Explorer, the first since the demise of Netscape -- for years. See usability consultant John Rhodes's article on the subject, written in 2001.

More recently, Jason Kottke wrote:

Google could use their JavaScript expertise (in the form of Gmail ubercoder Chris Wetherell) to build Mozilla applications. Built-in blogging tools. Built-in Gmail tools. Built-in search tools. A search pane that watches what you're browsing and suggests related pages and search queries or watches what you're blogging and suggests related pages, news items, or emails you've written.

Personally, I think we'll see the gBrowser soon. I expect it will follow HTML and CSS standards more closely than its Microsoft rival, and will probably come with a few bells and whistles that nobody else has yet thought of.

Google Marketing Gets Geekier

Google is as much a marketing phenomenon as an internet phenomenon. But sometimes, in its attempts to be innovative, it just gets too geeky (even by my high standards of geekiness tolerance).

Take the current example of the billboard on Northern California's Highway 101.

The billboard says:

{ First 10 digit prime in consecutive digits of e }.com

This translates to a web address.

On that page, users encounter a(n extremely feckin difficult) mathematical puzzle*. When they solve it, they are led to another page.

Sheesh. Like I said, a little too geeky, even for Google.

However, it demonstrates once more the company's uncanny ability to innovate in marketing. One single (albeit probably expensive and highly prominent) billboard, and soon there's a buzz on the blogosphere. Suddenly it's no billboard ad; it's a spreading-like-wildfire viral marketing campaign.


*Given the modest size of my brain, I didn't spend much time trying to solve this puzzle. But apparently it involves finding successive ten-digit sequences within the digits of e whose digits add up to 49. Or something like that.

f(6) = 2952605956
f(7) = 0753907774
f(8) = 0777449920
f(9) = 3069697720
f(10) = 1252389784
f(11) = 3163688923
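For the curious, the billboard riddle yields to brute force. The Python sketch below computes the digits of e with exact integer arithmetic (summing 1/k! at a fixed scale, with a few guard digits to absorb rounding), scans every run of ten consecutive digits for a prime, and lists the windows whose digits sum to 49, as in the footnote. The function names are illustrative, and treating a window with a leading zero as not being a ten-digit number is an assumption about the puzzle's wording. Trial division is used because it is simple and fast enough for ten-digit numbers.

```python
from math import isqrt


def e_digits(n: int) -> str:
    """Return '2' followed by the first n decimal digits of e (truncated)."""
    guard = 10  # extra digits so floor-division errors never reach the result
    term = 10 ** (n + guard)  # scale / 0!
    total, k = 0, 0
    while term:
        total += term
        k += 1
        term //= k  # now equals floor(scale / k!)
    return str(total)[: n + 1]


def is_prime(n: int) -> bool:
    """Trial division; divisors only go up to about 10**5 for ten-digit n."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True


def first_ten_digit_prime(digits: str):
    """First run of ten consecutive digits that forms a prime (no leading zero)."""
    for i in range(len(digits) - 9):
        window = digits[i : i + 10]
        if window[0] != "0" and is_prime(int(window)):
            return window
    return None


def windows_with_digit_sum(digits: str, target: int = 49, width: int = 10):
    """All consecutive windows, in order, whose digits add up to `target`."""
    return [
        digits[i : i + width]
        for i in range(len(digits) - width + 1)
        if sum(int(c) for c in digits[i : i + width]) == target
    ]


digits = e_digits(200)
print(first_ten_digit_prime(digits))
print(windows_with_digit_sum(digits)[:5])
```

The first line should print 7427466391, which starts at the 99th decimal digit of e. The second lists the opening terms of the digit-sum sequence, beginning with 7182818284 and 8182845904.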

Microsoft Search In User Trial Phase

Microsoft's long-awaited foray into the search engine market is entering its final stages, as today it launched a live test version of the new MSN Search technology.

Until now, MSN Search has been powered by technology from Inktomi, a company that Yahoo acquired a couple of years ago.

Factors such as the increased importance of search engines in internet user behaviour, the phenomenal rise of Google, and the success of various companies in generating revenues from search-related ventures have caused Microsoft -- just as Yahoo did a few months ago -- to launch its own proprietary search engine.

Unlike Yahoo, however, Microsoft has opted not to mix paid-for results with regular or "organic" results (see article in today's FT) -- an approach also championed by Google. (Microsoft, with desktop software remaining its core business, is less dependent on the revenue streams generated by paid-for listings than Yahoo, while Google finds other ways of displaying paid-for links, but never in its main search results.)

Another feature of Microsoft's new search is that its results are returned rapidly, with a clean page design -- again mimicking features for which Google is famous.

The big question remains, however: will the quality and relevance of MSN's search results rival Google's? Index spam is the search engine's enemy, and it is proliferating.

Early tests indicate that Microsoft has put too much emphasis on the simple use of keywords within elements such as domain names. This may make it a very easy target for spammers.

Google Goes Open-Source

... Well, sort of. Google is promising to release at least some of its code and, perhaps, other elements of its intellectual property to the general public, so that others can benefit from its R&D. "We need to have the tools out in the universities so the next generation can build on our work, too," explained Wayne Rosing, the company's vice-president of engineering.

Rosing was quoted by Australia's The Age while on a recruiting drive for the regional research and development centre that Google is expected to establish in Melbourne. The Age also reported that Google intends to triple its global workforce -- from 700 to 2,100 -- during the next 12 months.

Google Pushes Site-Based Search

Google launched two initiatives this week, both of which aim to introduce its search bar to other websites.

The first, a site search offering for sites in its AdSense network, received a full launch earlier in the week. Website owners who already display Google AdSense ads on their sites can now also have a Google search bar appear on the site, and will get revenue from any paid-for clicks that the search generates. Wired magazine reports: "The move to add its search feature on other publishers' sites is a logical move for Google, since many Internet users prefer to conduct queries on the websites they visit, rather than going to a dedicated search-engine site, said Joshua Stylman, managing partner of Reprise Media, a search marketing company."

Yesterday, Google made a beta version of a separate but similar tool, which it calls Site-Flavored Search, available to any website owners who want to try it (i.e. not just those in its AdSense network). From its FAQ page: "Site-Flavored Google Search uses a Google search box to deliver custom web search results, based on a profile filled out by a site's webmaster. The profile reflects the content of the website, and when the site-flavored search box is placed within the pages of that site, users are able to view search results that are 'flavored' to be more relevant to them."

Eight Percent of Web is Spam

Microsoft claims that, in its analysis of one billion web pages, eight per cent (some 80 million pages) were spam (a.k.a. "index spam" or "search engine spam", as opposed to email spam; see my article in the Sunday Business Post for a more detailed explanation).

I believe that Microsoft's estimate is quite conservative. I do not believe the researchers working on its much-anticipated entry into the search market are up to speed on what constitutes spam. At least, that's my impression after reading about how they identified spam pages: "Microsoft is incorporating a new filtering technology into its forthcoming MSN Search technology, aiming to offer results clear of web spam. The company unveiled a research project at its Silicon Valley campus in Mountain View which uses statistical analysis to locate spam web pages."

Clear of web spam? Yeah, right. Clearly, they have no clue. As long as there is email, there will be email spam. And as long as there is a web...
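The report says only that Microsoft's project "uses statistical analysis"; it doesn't say which statistics. Purely as a toy illustration of the general idea, and not Microsoft's method, the Python sketch below flags pages whose keyword repetition is a statistical outlier relative to the rest of a collection. The single feature (the share of a page's words taken up by its most repeated word) and the z-score threshold of 2 are assumptions chosen for the example.

```python
import re
from collections import Counter
from statistics import mean, pstdev


def top_term_density(text: str) -> float:
    """Share of a page's words taken up by its single most repeated word."""
    words = re.findall(r"[a-z]+", text.lower())
    if not words:
        return 0.0
    return Counter(words).most_common(1)[0][1] / len(words)


def flag_outliers(pages: dict[str, str], z_threshold: float = 2.0) -> list[str]:
    """Flag pages whose top-term density lies far above the collection's average."""
    densities = {url: top_term_density(text) for url, text in pages.items()}
    values = list(densities.values())
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return []  # every page looks the same, so nothing stands out
    return sorted(url for url, d in densities.items() if (d - mu) / sigma > z_threshold)


# Hypothetical collection: nine ordinary pages and one keyword-stuffed page.
pages = {f"site{i}.example": "the quick brown fox jumps over the lazy dog" for i in range(9)}
pages["spam.example"] = "cheap pills cheap pills cheap cheap cheap"
print(flag_outliers(pages))  # ['spam.example']
```

A spammer who knows the feature can simply stay under the threshold, which is exactly the cat-and-mouse problem with declaring results "clear of web spam".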

Are Google Users Fickle?

Bill Condie of the Evening Standard reports on the findings of a recent search engine survey: "A survey in the US commissioned by Standard & Poor's Equity Research Services found that 48% of search engine users say they use Google most overall, compared with 20% for Yahoo, 14% for Microsoft's MSN and 7% for AOL. But 60% of Google users said they would switch search engines if a better service were introduced."

On the face of these figures, it might seem that Google users are fickle. I would argue the opposite. Google "stole" users from other search engines by offering a faster, higher quality service, with more relevant results and no ads mixed in with the listings. As Google often points out, its success lay in concentrating on doing one thing -- search -- as well as it could, and in putting users first.

Google is free. If another search engine were to offer users, for free, similar services of an even higher quality, then of course you would expect users to migrate, just as they migrated to Google in the first place. The intriguing thing about this survey is that 40% of Google users would not migrate, even if a better search service were offered to them. Now that's what I call brand loyalty.

How to Get Googled

I had an article published in The Sunday Business Post (an Irish newspaper) last weekend, entitled How to Get Properly Googled.

Google "Puffin" Takes On Microsoft

Many have speculated that when Microsoft enters the search engine business (its Longhorn Search is expected to arrive in 2005), the software giant will blow rivals Google and Yahoo out of the water. I disagree. Microsoft's core skills and knowledge base have developed around desktop software. That's how it saw off Netscape in the "browser wars", despite its opponent's massive head start -- a browser, after all, is just another desktop application.

A search engine is a qualitatively different beast. A senior expert at Microsoft last year reportedly conceded that search is the most difficult real-world computer science problem anyone has ever faced, before announcing that Longhorn Search would take longer to develop than anticipated. Google has been tackling that problem with a team of 60+ out-and-out computer scientists (its much-vaunted PhD graduates) for over five years now. Other major search engines are, if a comparative study reported widely today is to be believed, not far behind.

One advantage that Microsoft's Longhorn search will offer users, claim the analysts, is the ability to search the contents of their hard drives. I have always been a little dubious of this proposition, since Windows users can already search their hard drives using the "Search" option in their Start menu. Presumably the new technology will provide a more sophisticated solution ... but do users really need one? In any case, Google has clearly moved to counteract this potential advantage by developing a desktop search application of its own. Details of the project, codenamed "Puffin", emerged in the second of two recent leaks -- or PR stunts? -- from the company (see my previous entry).

I believe that desktop search is a red herring. As an intensive user, I don't see much value in it. On the other hand -- and I believe the "experts" have overlooked this -- a desktop search application that becomes popular could easily be extended, as I have suggested in the past, to become a peer-to-peer search engine. Now that's something worth getting excited about.

Google Gets (Even More) Ethical

Every time I complain about Google losing its integrity, it goes and does something admirable. This week, it posted its proposed "Software Principles," which can be summarised as follows:
  • Software should not trick you into installing it.
  • When an application is installed or enabled, it should inform you of its principal and significant functions.
  • It should be easy for you to figure out how to disable or delete an application.
  • If an application collects or transmits your personal information such as your address, you should know.
  • Application providers should not allow their products to be bundled with applications that do not meet these guidelines.
Why, exactly, has Google posted these principles? To raise awareness? Possibly, but another guess is that it may, if it finds there is a large consensus out there, decide to penalise sites that peddle such software.

In any case, Google is certainly raising its profile as the self-appointed police force of the web, particularly as it emerged recently (through an employee leak) that the company has an internal "ethics committee". On the subject of what such an ethics committee might talk about, the BBC have launched, eh, a discussion about their discussions.

By the way, for the user who posted a comment recently asking how to shut down his Gmail account -- why not just sell or swap it? Wired magazine explains how...

Google Sell-Out Begins?

Some would say that the Google sell-out started a long time ago. I reserved judgment, even after the Florida update. Today, I'm reluctantly coming to believe that Google's integrity -- which it has (cynically?) turned into part of its brand (the company's motto is "don't be evil") -- is finally in doubt.

The introduction of banner ads -- or, as Google would like me to call them, image ads -- is a shock move in what I consider to be the wrong direction for the internet's poster boys, Sergey Brin and Larry Page. Banner ads are a throwback to old media thinking. They do not embrace the web medium. They are intrusive, and web users are sophisticated enough to ignore them. Banner ads haven't worked -- that's why so many dot-com companies that based their business plans (those that actually had them) on advertising revenues went bust a few years back. That's probably why Google is touting the "image ads" moniker. Sheesh. A banner ad by any other name...

Of course, Google isn't going to display these ads on its own site -- just on other people's, most likely cash-strapped Mom and Pop sites, which are the likeliest to opt in. Don't expect Google to tell these owners that banner ads create a poor user experience and significantly increase page download times.

The most disappointing aspect of this move is that Google itself revolutionised internet advertising by introducing its short, concise, neat, targeted and highly successful sponsored links, or AdWords. Talk about one step forward and two steps back...

One-Fifth of All Searches are Consumer Queries, Claims Google

At a search engine marketing conference held in Belfast a couple of days ago, a Google representative claimed that 65% of all searches were information-seeking (or, as he put it, "people trying to educate themselves"), while 20% were consumer queries.

20%! That seems unlikely. I suspect Google is conveniently using a very loose interpretation of the data. For example, if I type in "Jamaica", Google *could* suppose that I'm looking to purchase a sun holiday, when I may simply be looking for information on the country.

Of course, the people at Google have more information at their disposal -- they can actually see which link I click in the results returned for my search on "Jamaica". In fact, via my Google Toolbar data, they can probably tell how long I stay on any site I visit, and whether or not I make a purchase! In addition, genuine consumer queries are likely to be more specific, containing at least two words -- "package holidays in Jamaica" is more indicative of a consumer query than "history of Jamaica".

Still, I don't believe that Google has done this kind of analysis, especially when it comes out with such a high, round figure as 20%. The representative who quoted this, by the way, was promoting Google AdWords, so I think it's fair to assume he had a vested interest in giving a high figure. I'd say that less than 1% of my searches are consumer queries, and I suspect the same is true of the average *regular* user of the internet.

The Google rep didn't explain, either, what the remaining 15% of queries are for. I guess that means they are porn searches!

Search Engine Spam

Among the services I offer are search engine optimization (SEO) and search engine marketing. The former involves constructing a website in such a way that it gets picked up for searches on relevant queries; the latter involves advising clients on other, paid-for ways to promote a site in search engines (such as Google AdWords). Both of these services are tolerated by the major search engines -- albeit reluctantly.

What search engines can't stand is search engine spam: pages that deceive either the search engines or web users. Search engine spam is becoming a major problem, even though we never see most of it, as it gets filtered out of the results. Nonetheless, a significant percentage of the pages on the web are now spam. Just like email spam, many spam web pages are computer-generated, and would be considered a nuisance by the average user.

The search engines have been at war with spam for some time, but the research community has lately joined the fight. Check out, for example, this paper from Stanford University, which attempts a taxonomy of spam. I have the feeling, however, that the current relationship between the search engines and spammers is just the beginning of what's going to be a long, ever more complex stretch of evolutionary adaptation and counter-adaptation...

Gigablast: Watch this Space

It's almost hard to believe that the massively impressive Gigablast search engine was built, and is maintained, by a single individual: Matt Wells. This month's edition of ACM Queue contains a great interview with Wells. But if you thought my previous entry -- about the interview with Yahoo's Search Manager -- contained lots of search technology jargon, then I should definitely warn you this time. Wells is a geek++, but he knows his stuff. Here are some of the highlights of what he had to say.

On Search Engine Spam: "Spam is a huge problem for search engines today. Nobody is accountable on the Internet and anybody can publish anything they want. A large percentage of the content on the Web is spam."

On Google: "Google's strength is its cached Web pages, index size, and search speed. Cached Web pages allowed Google to generate dynamic summaries that contain the most query terms. It gives greater perceived relevance. It allows searchers to determine if a document is one they'd like to view. The other engines are starting to jump on board here, though. I do not think Google's results are the best anymore, but the other engines really are not offering enough for searchers to switch."

On PageRank: "I do not think Google's success is due that much to its PageRank algorithm. It certainly touts that as a big deal, but it's really just marketing hype ... PageRank is just a silly idea in practice, but it is beautiful mathematically."

On the Similarities Between a Search Engine and a Human Brain: "Now that the Internet is very large, it makes for some well-developed memory. I would suppose that the amount of information stored on the Internet is around the level of the adult human brain. Now we just need some higher-order functionality to really take advantage of it. At one point we may even discover the protocol used in the brain and extend it with an interface to an internet search engine."

On Gigablast: "I'm hoping to build Gigablast up to 5 billion pages this year."
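Wells credits cached pages with letting Google build "dynamic summaries that contain the most query terms". A minimal Python sketch of that idea, assuming the summary is simply the fixed-size window of words (a hypothetical 12 by default) that covers the most distinct query terms; real engines also weigh sentence boundaries, term proximity and more, which this ignores.

```python
import re


def _norm(word: str) -> str:
    """Lower-case a word and strip punctuation so 'Spam!' matches 'spam'."""
    return re.sub(r"[^a-z0-9]", "", word.lower())


def best_snippet(document: str, query: str, window: int = 12) -> str:
    """Return the window of `window` words containing the most distinct query terms."""
    words = document.split()
    query_terms = {_norm(t) for t in query.split()}
    best_start, best_hits = 0, -1
    # Slide the window across the cached text; keep the earliest best-scoring one.
    for start in range(max(1, len(words) - window + 1)):
        chunk = {_norm(w) for w in words[start : start + window]}
        hits = len(query_terms & chunk)
        if hits > best_hits:
            best_start, best_hits = start, hits
    return " ".join(words[best_start : best_start + window])


doc = "alpha beta gamma delta search engine spam filter epsilon zeta"
print(best_snippet(doc, "search engine spam", window=3))  # search engine spam
```

Because the summary is computed per query, the same page can show a different snippet for every search, which is the "greater perceived relevance" Wells describes.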

Yahoo Search Guru Reveals All

For those of you who, like me, are very interested in search engines (you'd never have guessed, right?), it's worth checking out this month's edition of e-marketing news. There's a fascinating, lengthy interview with Yahoo's Search Manager, John Glick. In it, Glick describes how Yahoo took all the knowledge it had gained through its acquisitions of erstwhile competitor search engines, and put it together to create a new technology aimed at taking on Google. Glick also provides many insights into what Yahoo regards as spam, and the measures it takes to prevent it (these often differ from Google's).

I don't buy Glick's attempts to explain Yahoo's new paid inclusion programme (called SiteMatch), which has all the usual fudges we've been getting from Yahoo. (See my post about how SiteMatch inevitably means stale listings.) Nor do I buy any of the claims that Yahoo's future ambitions -- personalisation and local search -- are motivated by the need to better serve searchers. Rather, they are aimed at better serving merchants (or, from Yahoo's point of view, its customers). The examples that Glick provides make this clear, as they smack of overtly commercial concern, not genuine concern for the searcher's needs:

"For example, if you been looking at a lot of travel sites and you type China, then you may want information on the country. If you've just been looking through jewellery sites, or wedding venue sites and that sort of thing, you may be planning a wedding and maybe you're looking for China dishes! So it's taking that type of information.

"And it's not just which product you're looking for. It's where you are in that product buying cycle ... A person who simply types in: iPod - maybe looking to figure out what they are, should they be getting into this digital music download thing and dropping their CD player for the new thing. Whereas, further into the process, they may be looking for who's got the best price on iPods. Or maybe it's about, does the store down the street have the new one in stock yet. It's taking a lot of that context."

Still, lots of useful info here, plus some juicy jargon, including "hill-climbing genetic algorithms" (as opposed to "tree-parsing/gradient-boosting" ones), and "de-aliasing tables". More marginally valuable information to clutter up my small, overcrowded brain...

Website Findability: My Search Engine Optimization Book

I've finally finished my book on search engine optimization, which I've called Website Findability: How to Get Traffic from Google and Other Search Engines.

I got the idea to write a book on this subject around the end of last year, shortly after a spike in visitor traffic to this weblog. The surge in visitors was a result of a piece I'd written about Google's "Florida update". A lot of sites lost good positions in Google as a result of that update, and a lot of people posted anguished comments in response to my entry.

I helped some of the commenters out by giving tips as to how to improve their sites' positions. The more help I gave, the more questions I got -- soon I was inundated with emails!

Now, I normally provide search engine optimization services to corporate clients. But the people who were emailing me were Mom 'n' Pop businesses, who didn't have a budget for optimization. I wondered if there was a way I could help all these people and make a little money for myself.

That's when I decided to write a book, detailing everything I know about search optimization -- or website findability, as I call it.

I soon had a dilemma. Since my subject is a hot topic, and I'm a writer by "trade", I was confident that I could interest a commercial publisher. But even as I hacked out the first draft, changes in the search engine industry were rendering some of my text obsolete. How could I ever write a book that would still be relevant by the time it hit the shelves?

That's when I decided to fully embrace the web medium: I would publish the book myself, in PDF format, and commit to updating it every 6-8 weeks, so that the information would not go out of date.

So my readers are happy. And, by cutting out the middleman, I'm happy.

What I really like about being the author of a search engine optimization book is that it lets me help far more people with their websites than I could with my small business.

On the other hand, I might be creating a lot of competitors for myself -- since anyone who reads my book will know as much about search engine optimization as I do!

Man Sues Google for "Libel"

The battle of the search engine giants is being fought in the news as much as in the technology sector. A month ago, Yahoo was making all the headlines, but in the last fortnight, it’s been nothing but Google, Google, Google. Amidst this flurry of news, PR and gossip, most industry analysts have missed a small but crucial event that, I believe, carries great portent for the future of the relationship between search engines and private individuals.

In Southern California a few weeks ago, a man called Mark Maughan Googled himself (that is, he looked up his own name in Google) and was miffed at what he found -- false claims, apparently, about him and his company*. Maughan has filed a lawsuit against Google. He has little chance of winning, since Google is not responsible for the content that appears in its results. It seems that Maughan does not understand very well how search engines work.

Then again, who among the general population does? And there’s the rub. As search engines, especially Google, come to have more and more importance in our society, how can people protect their identities online?

I have been making this point for some years now, and long ago made sure that any search for Michael Heraghty or Heraghty will lead to results over which I have at least some control. (I have managed to secure the top positions for these searches.) I have done the same thing for many of my clients, a service I call “online identity management.” I believe that this service will become increasingly important in the future.

On the internet, our reputations really do precede us. The growth of weblogs and social networking sites indicates that people are starting to create and manage their own virtual reputations, identities and “brands”.

*Of course, a search for Mark Maughan now takes you to results that are predominantly related to the lawsuit story.

Search Gets Personal

Google has launched a beta version of "Google Personalized". The area of personalised search results is seen as one worth chasing by the big players, and Yahoo are also going after this space. However, I have my doubts. When I search, I generally want to search all the web. I've never been convinced about the benefits of personalisation on the web, which have been touted for years now, but which have rarely borne fruit. Google's implementation of the feature -- you can increase or decrease the level of personalisation of the results using a slider bar -- is cool, though maybe a little gimmicky. -- Skeptical in Sligo.

Yahoo vs. Google: Arms Race Escalates

Yahoo is going all-out in its attempt to dethrone Google and become the web's most popular search engine. Adding to a flurry of recent activity and updates, Yahoo this week began beta-testing two new features that it has, eh, "borrowed" directly from its rival.

The new beta version of the Yahoo Companion Toolbar includes a feature called "WebRank", its answer to Google's highly successful (and trademarked!) PageRank. Users can download the toolbar with the WebRank feature turned on, in which case Yahoo collects data about their online behaviour (the sites they visit, how long they spend there, etc.). Those who do not wish to pass on such information can download the toolbar with WebRank turned off. This is exactly the same strategy Google has in place with its PageRank feature.

Another new feature at Yahoo is its beta News Search, which (surprise, surprise) is quite similar to the popular Google News Search (also in beta, incidentally, despite having been live for around two years now).

Google, meanwhile, has responded with a new beta test of its own, one that is not found on Yahoo. The new search-by-location feature is called Google Local Search. Currently, local search applies only to US queries, but Google intends to roll it out to other countries and regions over time.

Mediajunk On "Your" Yahoo

In its recent efforts to out-Google Google, Yahoo has added some innovative features to its search engine. One of those is the identification of RSS feeds associated with search results. For example, if you do a Yahoo search for mediajunk, you will see the following below the listing for this site:

RSS: View as XML - Add to My Yahoo! [Beta]

The "Add to My Yahoo" option allows you to add the syndicated summary version of this site to your My Yahoo page (which typically contains news headlines, TV listings, etc.). All part of Yahoo's strategy to "personalise" the search experience, we're told. Can search really be personalised? Stay tuned; that's another day's rant...

Google vs. Yahoo -- at a Glance

Now that Yahoo is returning different search results from Google's, those interested in how their websites perform for certain search queries will have to compare results on both engines. A great little utility has been created to do just that -- in a single, simple interface. Booked and marked!

Yahoo Introduces Own Algorithm, But Results May Prove Stale

Search engine watchers aren’t just trainspotter types (honest!). Search has become the single hottest sector on the internet, with the advent of search-based marketing and advertising, and the global appeal of search brands. Google, more than any other engine, has made search sexy.

The search sector has seen rapid conglomeration during the last year, as competition intensifies. Microsoft is threatening to enter the fray and, as anticipated, Yahoo finally dumped Google as its search technology provider this week. No longer will a search on Yahoo produce the same results as a search on Google. What few expected was that Yahoo would build an entirely new search algorithm. After all, Yahoo owns Inktomi, a company that provides search technology for MSN, AltaVista and others.

It is too early to say whether Yahoo’s new search results are as high-quality -- that is, as relevant to search queries -- as Google’s. Relevancy is not the only factor that will influence the battle of these two search giants. Another is the non-trivial matter of the pay-for-inclusion (PFI) service that Yahoo will soon offer.

PFI is not the same as paying for a Yahoo sponsored listing; nor is it the same as paying for a listing in the Yahoo directory. Rather, PFI allows a customer to pay to have a site quickly crawled by the Yahoo spider. PFI does not guarantee that the site will appear highly -- or even at all -- for particular searches in Yahoo’s results.

For example, let’s say I noticed that Mediajunk was not appearing in Yahoo’s results and, by looking through my site visitor statistics, I confirmed that Yahoo’s spider (which, incidentally, was recently renamed “Slurp”) was not crawling my site. I might then decide to pay for inclusion. This would guarantee that Slurp would visit my site, and return regularly to check for updates.

If I chose not to pay for inclusion, on the other hand, I could simply submit my site to Yahoo for free. My site might still appear at the top of the regular (non-sponsored) listings in Yahoo -- but not before Slurp has crawled it. If I were to update Mediajunk (as I do regularly), I might have to wait -- who knows how long? -- for Slurp to return, so that my updates are reflected in Yahoo’s listings. By contrast, Google crawls as many URLs as it can on a regular, almost daily basis -- and for free. Google does not offer a PFI service.

Why do I think that resisting PFI will ultimately work out to Google’s advantage? Because Google’s search results will be fresher than Yahoo’s. After all, the web now contains several billion pages. Only a tiny fraction of those (let’s say 0.05%) will be submitted to Yahoo via its PFI service. Thus, only 0.05% of web pages will be regularly crawled by Yahoo; the remainder will be crawled less frequently. The result: search results that look out-of-date. After all, if the majority of results aren’t stale, what advantage does PFI give? It will certainly be interesting to see how Yahoo’s results look several months from now.
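To put the "let's say 0.05%" figure in perspective, here is a small Python sketch of the arithmetic. The total of 4 billion pages is an assumption standing in for "several billion", and `pfi_split` is a hypothetical helper for illustration, not anything Yahoo publishes. Under those assumptions, PFI would cover about 2 million pages, leaving 99.95% of the web on the slower, unpaid crawl schedule.

```python
# Hypothetical figures for illustration: the post says only "several
# billion" pages and "let's say 0.05%"; the 4-billion total is assumed.
TOTAL_PAGES = 4_000_000_000
PFI_SHARE = 0.0005  # 0.05% expressed as a fraction


def pfi_split(total_pages: int, pfi_share: float) -> tuple[int, int]:
    """Split pages into (PFI pages crawled regularly, pages crawled less often)."""
    paid = round(total_pages * pfi_share)
    return paid, total_pages - paid


paid, unpaid = pfi_split(TOTAL_PAGES, PFI_SHARE)
print(f"Regularly crawled via PFI: {paid:,}")      # 2,000,000
print(f"Crawled less frequently:   {unpaid:,}")    # 3,998,000,000
print(f"Share on the slower schedule: {unpaid / TOTAL_PAGES:.2%}")  # 99.95%
```

Changing the assumed total only scales the absolute numbers; the 99.95% share on the slower schedule follows directly from the 0.05% estimate.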

Searchers Growing More Sophisticated

As we become more web-literate, our search habits are changing. According to OneStat, users are now much more likely to enter three-word keyphrases into search engines than at any time in the past. Studies by the internet marketing analysis company reveal that the typical breakdown of search types is as follows:

1. 2-word phrases -- 32.58%
2. 3-word phrases -- 25.61%
3. 1-word phrases -- 19.02%
4. 4-word phrases -- 12.83%
5. 5-word phrases -- 5.64%
6. 6-word phrases -- 2.32%
7. 7-word phrases -- 0.98%

As users become more acquainted with search engines, so the engines will become more sophisticated. This happens with all media. For example, if you could travel 80 years into the past and show someone a modern-day movie, they wouldn't understand the cutting and other techniques that we take for granted.

Of course, this "literacy" process is happening more quickly with the internet than with other media. The medium will continue to evolve, especially as those who are in school today take up internet-related jobs.
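As a quick sanity check on the OneStat figures, the short Python sketch below groups the quoted shares by phrase length. The dictionary simply re-keys the percentages from the breakdown; the `share` helper is a hypothetical name used for illustration. It shows that the listed shares add up to 98.98% (the remaining 1.02% is not broken out in the figures as quoted), and that phrases of three or more words make up 47.38% of searches, against 51.6% for one- and two-word phrases.

```python
# Percentage of searches by phrase length, as quoted from OneStat.
BREAKDOWN = {1: 19.02, 2: 32.58, 3: 25.61, 4: 12.83, 5: 5.64, 6: 2.32, 7: 0.98}


def share(breakdown: dict[int, float], min_words: int, max_words: int) -> float:
    """Sum the shares for phrases of min_words..max_words words, to 2 d.p."""
    total = sum(pct for words, pct in breakdown.items()
                if min_words <= words <= max_words)
    return round(total, 2)


print(share(BREAKDOWN, 1, 2))  # 51.6  (one- and two-word phrases)
print(share(BREAKDOWN, 3, 7))  # 47.38 (three words or more)
print(share(BREAKDOWN, 1, 7))  # 98.98 (everything listed)
```

Rounding to two decimal places keeps floating-point noise from the repeated additions out of the printed percentages.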

Why Microsoft (Still) Doesn’t Get the Internet

At the recent World Economic Forum in Davos, Bill Gates conceded that “Google kicked our butts.” He did this with all the confidence of a man who believes that, despite losing initial battles, he will ultimately win the “search wars” (as John Markoff calls them in today’s New York Times).

Gates certainly has precedent on his side. Microsoft (incredibly!) failed to see the potential of the internet -- and the browser as a killer app -- at the beginning of the 1990s. Nevertheless, the Seattle-based company managed to dominate the browser market by bundling the browser free with its OS, a strategy that led to an anti-trust suit. Many observers believe that Gates will be able to employ a similar strategy to see off Google and dominate the search market. Microsoft’s forthcoming operating system, Longhorn, will have built-in search functionality intended to surpass Google’s technology.

However, the search wars differ from the browser wars in significant ways. A browser is desktop software. Microsoft was able to quickly build and release a browser (Internet Explorer) that offered more or less the same functions as Netscape, because desktop applications were within its core strength. But an internet search engine is no desktop toy. It is a complex set of functions, performed remotely. Search is, according to a senior source at MSN Search, “the hardest computer-science problem [Microsoft have] ever had to face.” (See the Mediajunk entry entitled Search Engine Pack Chasing Hard.)

Google has achieved a significant head-start on this computer science problem. And, even as Microsoft struggles to play catch-up, Google will continue to develop its considerable knowledge of how to give searchers relevant results.

Despite Gates’s bravado at Davos, the media failed to pick up on what may have been his most significant statement: that Microsoft would deliver a better, next-generation internet search engine “as early as next year.” (Quote: Yahoo! News.) Early? Microsoft had been promising that Longhorn would be delivered by the spring of 2004. By next year, Google will be out of sight. Google is a brand, a destination -- a website. Users will ignore the add-ons MS includes in its new software, so long as Google is delivering the best results.

Microsoft just doesn’t get it. Then again, it has never got the internet. Remember all the hype about .Net and this wonderful vision of the company’s web presence that none of its founders could quite articulate? That the company is being accused of human rights abuses (Observer) -- by aiding the Chinese government to censor the web -- indicates how far removed it is from the culture and spirit that powers the internet. No wonder, either, that it was the laughing stock of the blogging community last week when, in an official press release on how to avoid malicious websites, Microsoft urged surfers to type URLs directly into browsers -- rather than clicking on links!

Google may not be perfect. It may be the next Netscape. It may even be the next Microsoft. But it has, most likely, already won the search wars.

Yahoo Dumps Google

"Yahoo is poised to end its long-time relationship with search-results provider Google," reports Keith Regan in today's E-Commerce Times. "The breakup has been expected for a while, as Google has become more of an all-around threat to Yahoo by offering paid search listings and news and shopping searches." Indeed. While the move will be a financial blow to Google, Yahoo's reputation as a search engine may well deteriorate, as Inktomi's (who will supplant Google as Yahoo's search provider) results are generally considered less accurate/relevant than Google's. It will be interesting to see if a reduction in quality of Yahoo's search results lead to a migration of traffic to Google. In the short term, this is unlikely, as Yahoo's users are probably loyal to its brand. But brand loyalty on the internet tends to be short-lived. Ultimately, ongoing improvement in Inktomi's results will be needed to maintain Yahoo's reputation as a major search engine.

Mediajunk is No Longer Updated

Visit Michael Heraghty's current blog at User Journeys


Mediajunk was Michael Heraghty's blog from 2002 to 2010, with articles on usability, UX, SEO, web design, online marketing, etc.