Google Category Archive
After so many Google features and products have gone to the Google Graveyard, it is nice to see new search features made available. Yesterday, Google announced on G+ that two new image search filters have been introduced: animations and transparent backgrounds. The filters are available both on the Google Advanced Image Search page (a pleasant change in that the advanced search pages have been hidden and ignored of late) and as Search Tools filters on image search. The screenshot from the advanced search shows both. The Transparent option is available under the Search Tools "Any color" drop down menu and on the advanced search "colors in image" choices. It restricts results to those with a transparent background.
The Animated filter is available under the Search Tools "Any type" drop down menu and on the advanced search "type of image" option. It restricts results to animated gifs. I have found this ability to limit to animated gifs useful at times. With the resurgent interest in animated gifs at Tumblr and other image intensive sites, there are many being created again, and so Google has added a way to find just animated gifs.
Previously, the only other options I had found for finding animated images were the old PicSearch (which has an "animation" filter on the left after running a search) and the newer Giphy (which is more limited to searching by tag and covers a much smaller selection of images). I am amazed that PicSearch, which seems relatively unknown with low traffic, has continued to be available for all these years and still offers the animation limit.
Google has a much larger database than Giphy or PicSearch, so the ability to search its larger database and to be able to restrict to animated gifs or images with transparent backgrounds is a nice new feature. I expect that I'll use both several times a year, so for me they will not be common search features, but nice to have when I need them. The thumbnails in the search results display are not animated. However, clicking on an animated result will open an inline image display, and (if you wait long enough) the image within that dark grey inline display will start its animation.
There are two main types of animated gifs. The older ones were mostly clip art style animations, often with smaller dimensions to keep the file size small. Many of the newer ones are larger, pixel-dense photos, and thus have larger file sizes. Since the animation limit is under the type section, there does not seem to be a way to combine "animation" with either "photo" or "clip art" for further narrowing by type.
Google Scholar has announced an update that brings Scholar more in line with other Google properties. Google Scholar is still not linked from the main Google page, but now (if you are getting the new look which is still being rolled out) Scholar has made a couple of changes to bring the look in line with the other Google revisions.
The main changes:
- The limits which had been along the top of the results are now on the left side
- Advanced Search is now hiding behind the small triangle in the search box like in News (see my older post on Advanced Search Changes)
The screenshot below, from the blog post, highlights these changes.
Somehow I missed the announcement last month from Google which included a note that Google has started indexing more punctuation marks. Based on this announcement, it looks like we can start searching with all of the following (some of which have been possible for years, like the dollar sign): %, $, \, ., @, #, and +.
It does not seem that Google has fully indexed all pages with these punctuation marks, but as it continues to refresh its indexing, more content should be indexed with these symbols and thus findable via a Google search. Already this change makes it possible to search for social tags like @bios and #cildc. I find it interesting that the top result for the last example is a link to a Twitter search for that hash tag (which by this point finds only 3 tweets). A much better search option is Topsy which claims over 1,000 tweets for the hash tag, but Google's new support of punctuation means its search finds the hash tag on other web pages. I need to explore more of these options, like using the + for the blood type ab+ or even b+ (which finds more uses than just blood type). Beware using old searches where the + is supposed to make the term an exact match. Now the search may find a Google+ user instead.
I have never understood why Google supports the $ for searching prices but still does not support other currency symbols like the Euro, Pound, or Yen. But maybe this expansion of searchable punctuation will lead the way to more punctuation search capabilities.
If a search using one of these punctuation marks finds no results, Google offers the search results without the punctuation and links to a new help page on punctuation searching.
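For anyone who builds search links by hand (in a bookmark, a guide, or a script), here is a minimal sketch of how the punctuation examples above could be turned into search URLs. The function name google_search_url is hypothetical, and the sketch assumes the familiar https://www.google.com/search?q= URL pattern. The point it illustrates is that symbols such as # and + have special meaning inside a URL and need to be percent-encoded so they reach the search engine as part of the query.

```python
from urllib.parse import quote_plus


def google_search_url(query: str) -> str:
    """Build a web search URL for a raw query string (hypothetical helper)."""
    # quote_plus percent-encodes characters such as '#', '+', and '@'
    # and turns spaces into '+', so the symbols stay part of the query text
    # instead of being read as URL syntax.
    return "https://www.google.com/search?q=" + quote_plus(query)


# The social tag and blood type examples mentioned in the post.
for q in ["@bios", "#cildc", "ab+"]:
    print(google_search_url(q))
```

Without the encoding, a browser would treat everything after the # as a page fragment and a bare + as a space, so the search would quietly run without the symbol.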
Google News has announced a few changes:
- "Realtime" news
- Larger images on main page
- Integration of Google+ content
But no, this is not a new version of Google Realtime search that searched Twitter and other social networks and was killed when Google was unable to renegotiate access to the Twitter feed.
After panning Google earlier today for non-search changes like the addition of Play and the disappearing search tools experiment, I should give them credit for a potentially useful search results change. Barry reported on SearchEngineLand that some results from the Google Discussion database subset (web forums and Google Groups) will include a new "Top Answer" tag line that highlights a potential answer to the search question. Searchers may not see this often. Clicking the Discussions database on the left will bring up more, but not all discussion records have it.
In addition to what Barry found, below is an example I was able to find when in "Everything." If you have not looked in the Discussions recently, also take a look at some of the other information that Google makes available on some Discussions results just under the URL like number of posts, number of authors, and date. While this information is not obvious on a quick look, I often find it very helpful in deciding which results to view.
Google Play, the new combined online Google store for music, books, and Android apps, is now being featured prominently in the top Google bar, highlighted with a "NEW" superscript. Clicking the Play link does not transfer a search that you have already done in Google (which is what happens with Images, Maps, and YouTube) but just opens the main Play page for buying something from Google. Sorry Google, but this is not a search-related enhancement. How about adding Scholar to the bar or at least the drop down list under More?
Meanwhile, Google OS shares a report and video of a royally stupid Google User Interface experiment: a collapsible search sidebar, further excluding the links to databases and search tools when Google already only shows some of the options. If I remember correctly, Google tried this experiment a few years ago when the sidebar was new. It stunk then and stinks now. If you want users to have these options, make them viewable! Better yet, give us the option in search settings to always have both the databases at the top and the search tools at the bottom fully expanded.
Remember when Google used to make improvements to search?
It is always interesting to hear people's rationale about leaving or joining a company. Yesterday a former Googler hit hard with "The Google I was passionate about was a technology company that empowered its employees to innovate. The Google I left was an advertising company with a single corporate-mandated focus." James Whittaker criticizes Google's heavy-handed move into Google+ and social networking to increase ad revenue by learning more about its users.
It is an interesting and seemingly very honest soul baring. "Google+ and me, we were simply never meant to be. Truth is I've never been much on advertising." He ends by giving Google a -1.
James is not alone. The recent Search Engine Use 2012 survey from the Pew Internet and American Life Project reports that "73% say they would NOT BE OKAY with a search engine keeping track of your searches and using that information to personalize your future search results because you feel it is an invasion of privacy." See also Danny Sullivan's analysis of the report.
On the other hand, I can see where Google's new emphasis and concerns originate. Facebook is changing the way the web operates. James Fallows' article in The Atlantic "Facebook, Google, and the Future of the Online 'Commons'" provides a great overview of those issues. If you have some time for search engine reading, this combination of links will likely pique your interest.
Back in November, Google launched and then withdrew a new advanced search page. The new version is back, if you can find it, and it is improved over the buggy November version. I do like the new look and the removal of the need to click the + button to get the full advanced options, but I am sorry that the link and related search options have been removed. Instead, Google only offers links to how to use the link: and related: commands. I still miss how the old form would show the commands as they are selected from the boxes below.
It is not only Google's web advanced search form which is changing. Other Google databases (Books, Groups, Patents, Scholar, Shopping, and Video) still have the older interface while Images has the new one. News takes a very different approach while Google services like Documents, Gmail, and Calendar have an even more different approach. My latest "On the Net" column "Advanced Search in Retreat" (in Online 36(2): 43-46, March 2012) discusses these issues in more detail, and is available online for free.
Some other search engines still have advanced search forms while other newer ones often do not (DuckDuckGo, Blekko, Wolfram|Alpha, Quixey). Speciality search engines like Hulu have sophisticated advanced search pages. For advanced searching fans, here is a list of links.
Google Advanced Search Links
Other Advanced Search Pages
Speaking of making Google Advanced Search hard to find, guess what happens when I try a Google search on advanced search? The Google Canada Advanced Search page ranks as #1 followed by other advanced search pages, but NOT the main Google Advanced Search page!
Google Operating System reports in Bring Back Keyword Highlighting to Google Cache that logged in users have lost the ability to have search words highlighted when looking at Google's cached copy of a web page. "For some reason, Google's caching feature is more and more difficult to use. The "cached" link is hidden inside the Instant Preview box and it's no longer available in the mobile interface. Now the keywords from cached pages aren't highlighted if you are logged in."
I had not realized that cached links were gone from the mobile interface. While I don't always use the highlighting when looking at cached pages, if needed, at least one option is to re-do the search in a non-logged-in browser or use the URL modification trick mentioned in the blog post. Since no keywords are highlighted, the header line of "These search terms are highlighted: . . . " is also missing. Also, as a reminder, the cache used to mention which search terms only showed up in links to the page. Now if you are not logged in and a search word does appear on the page, it is just not mentioned.
Google's expanded synonym operator, the tilde (which must be right before the word with no space as in ~cancer), can be used in some interesting ways. Garrett French's "The Link Prospector's Guide to the Tilde" from yesterday gives several ideas of use from an SEO (search engine optimization) perspective.
To review, unless using Google's verbatim search function (in the left side bar) or putting all search terms and phrases in quote marks, Google will search for the term entered plus grammatical variants and some synonyms. For example, searching art teaching statistic finds results with matches on words that were not entered like teachers, education, statistics, arts. Adding the tilde in front of a term makes Google look for even more synonyms. Changing the search to ~art teaching statistic also finds more synonyms for art including gallery and design.
It is not always easy to see the differences since exact matches may rise to the top, but some of French's ideas make it easier. He notes that you can use the ~ along with field prefixes like inurl: and intitle:. Since both titles and URLs are visible in the search results, it can help to see the synonyms, especially if you also NOT out the actual term. For example, intitle:~cancer -intitle:cancer shows that the synonyms include disease, leukemia, pain, and tumor.
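The "search the synonym, NOT out the term itself" trick can be written as a small, reusable pattern. This is only a sketch: the helper names synonym_probe and google_search_url are hypothetical, the URL pattern is an assumption, and it relies on the tilde, intitle:, and inurl: operators behaving as described in this post.

```python
from urllib.parse import quote_plus


def synonym_probe(term: str, field: str = "intitle") -> str:
    """Return a query that asks for synonyms of `term` in one field
    while excluding matches on the term itself (hypothetical helper)."""
    # "~term" requests expanded synonyms; "-field:term" removes the exact
    # word so only the synonym matches remain visible in the results.
    return f"{field}:~{term} -{field}:{term}"


def google_search_url(query: str) -> str:
    """Build a web search URL for the query (assumed URL pattern)."""
    return "https://www.google.com/search?q=" + quote_plus(query)


query = synonym_probe("cancer")
print(query)                      # the example query from the post
print(google_search_url(query))   # a link that runs it

# The same pattern applied to the URL field instead of the title.
print(synonym_probe("cancer", field="inurl"))
```

Swapping in other terms makes it quick to see which synonyms Google associates with a word, since the matching words show up in the titles or URLs of the results.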
Back in November, Google started launching a new interface it dubbed the Google Bar. At that time, in my comparison between the old and the new versions, I was dismayed by the removal of several databases from the drop down list. Over the months since that announcement, the new Google Bar was only very gradually being rolled out, and many users never saw it. Then, last week, Google announced a retreat. Instead of the new drop-down-from-the-logo Google Bar (like in the image if you'd not seen it), they are going back to the black menu bar at the top "with a consistent and expanded set of links running across the top of the page."
Unfortunately, the links are consistent with the newly abandoned design in that the following databases and products are lost from appearing in the top black bar and from the drop down under "More."
In addition, Video is now down in last place on the More drop down and YouTube is featured instead in the top bar. Google-owned YouTube is certainly the dominant web video provider, but this will make it less likely that Google users will find non-YouTube content. How much of a difference is that?
The database loss means that it is now more difficult and time consuming to take a search in Google and switch it to Scholar, Groups, or Blogs (all three databases that I use somewhat regularly, especially Scholar). You can still get to the Blogs database by switching to News (but you have to use the link on the left rather than the link in the top bar which just goes to the News home page) and then once the news results display, look for the Blogs link under All News on the left.
Does this mean Scholar may be destined for the Google graveyard or just that Google finds that most Scholar users do not start from a Google search and then switch to Scholar? We'll need to wait and see. What's gone or more hidden?
- Search switching to Google Scholar and Groups
- Video search demoted
- Google Sites link buried under "Even more"
- The "Web" database is now just called "Search"
So what is being featured instead?
- Mobile (which is just a direct link to the ad for Google Mobile and is of little use to computer users)
- Music (not a searchable database but another ad for another recent Google product)
- Offers (another ad for a newer Google product attempting to compete with Groupon and LivingSocial)
- Wallet (ad for new, recently-hacked Google payment scheme)
- Blogger (a long-time Google service not previously available from the main menus)
It seems like a continuing move towards increased Google product promotion and a diminished role for searchable databases.
Big changes are coming to the Google Advanced Search, but they have been intermittent in appearing and we may not yet see the final format. Right before Thanksgiving (here in the U.S.) there were numerous reports of the new advanced search but the appearance differed slightly (some had the descriptive text on the right and some did not, perhaps depending on browser and/or device). There were complaints about the loss of drop down menu functionality and the removal of the link search box, and many advanced search users were unhappy. The change seemed to roll in for most U.S. users at least between Nov. 19-23.
Then Google changed back to the former advanced search on Nov. 25 or 26. I have been checking since then to see if the new version might re-appear, and yesterday in one of my browsers (Internet Explorer 9) it came back (but not yet for me in other browsers or on other computers). At least there are now links at the bottom that mention link and similarity searching, but they just link to a page explaining how to use the link: and related: operators.
The new design looks more like some of Google's other newer designs and is perhaps more similar to a tablet interface. However, when I try actually using the new advanced search page, sorry Google, but it stinks. There are lots of little design and usability changes that make it more difficult to use, and it is definitely less instructive.
Tuesday, Google announced a new interface for its front page and search results, removing the black bar at the top with links to other databases and the options gear and moving those choices to a mouse-over, drop-down list from the Google logo. I still do not see the change on any of my computers at home or on campus, but thanks to a trick posted on the Google Operating System blog, you can make the new interface appear by setting options in a cookie. It sounds like the intent is to make this a common interface across all Google properties. As Google says in the announcement, their goal is "making navigation and sharing super simple for people." (Note the addition of "sharing" which means that Google+ links and notes will be more prominent).
So I tried the new interface and compared it to the old. How well does it work for searchers? The old black bar at the top and its previous incarnations have made it easy for me to switch searches from one Google database to another. The new bar sort of works that way, but it has added several new links (none of which I use often) and removed several others (including those that I do). What has changed?
Some wording is different: "Web" is now "Search" and "Gmail" is just "Mail." But several databases have been removed:
The gear icon in the upper right hand corner of the home page that links to Search Settings, Advanced Search, and Language Tools is also gone from the new home page design. The Options gear shows up once you do a search. I don't mind not having Sites, since I don't use that, but switching a web search to a search in the Scholar, Blog, or Groups databases is a royal pain. Especially as an academic searcher, the loss of Scholar is significant. The Blogs database is available on the left if you expand the databases, but neither Scholar nor Groups is available there. See the screencast below for a comparison of old and new.
Exploring this change made me realize that Google really offers two places to search images and two places to search video. You can transfer your search from the Image database to "Photos" which gives you the results from Picasa Web Albums without requiring a log in. Note how the Video in the new interface has been moved to the end of the list. Again, you can transfer a search from the Video database to YouTube (for only YouTube videos and different ranking), but you cannot yet transfer from YouTube to the larger Google Video database.
And what's up with the News link? Why does that not transfer the search terms but instead just go to the Google News home page? Clicking the News link in the left margin works properly. Oh well, since I'm using a hack to get the new interface, perhaps these are all bugs that Google will fix before it is fully rolled out. Or so I hope.
Google News has new search settings available that provide several new features. It lets searchers choose to specify certain news sources for exclusion or from which to get more or less news. In addition, for those news sources tagged as (blog) or (press release), searchers can also request exclusion of those whole groups, or fewer or more from either. While the blog limit has been available in the left hand margin for a while, there is no other location to limit to press releases or to exclude them (the advanced news search does not have these options). Some are concerned that these new settings will cause users to exclude blogs and press releases (see Danny's comments on this in his Look Out Blogs: Google News Gains Options To Drop Blogs & Press Releases post).
Want to see these settings? Unlike a Google web search, where search settings can be saved as a cookie and do not require the searcher to log in, the new News settings require first logging in to an account. You also must be using the U.S. version.
While I am not yet sure if I'll use any of these settings on a regular basis, there is another great reason to log in and explore these settings. It finally provides a way to search within the sources for Google News. While others have compiled various lists, such as the recent list of sources compiled at Digital Inspiration, they often list sources only by their URL and are typically incomplete (I checked four small news sources against that list and found at least one missing). Within the news search settings, just start typing the beginning of a source name (not just the URL), and a list of up to ten potential matches appears. Want to know if a news source is included in Google News? Log in, go to the News Settings, and start typing the first letters of the source name (and try any possible variants as well).
See below (or check full post) for a 2.5 minute screencast of how some of these options work.
Back in December, when Google first introduced its reading level limit on the advanced search page, I thought it strange that it appeared there and not in the "search tools" section of the sidebar since most of the new advanced search features were showing up there and not on the advanced search page. So after complaining about the reading level limit not being in the sidebar at last month's Computers in Libraries conference, and then discussing it with Tasha Bergson-Michelson (a Search Education Curriculum Fellow at Google), I was pleased to see the addition of the reading level limits to the sidebar today. First noticed last Friday at Google OS, it seems to be rolled out to all of the computers and browsers that I'm using.
It has been interesting to watch much of the coverage today, by many who think that the option is new. I guess that does reinforce my opinion that few people notice the options in the advanced search. At least a few more seem to notice it when it is in the sidebar, even though searchers still need to click on the "More search tools" link to see them.
If you've not used the reading level limits previously, just run a search, click on the "More search tools" in the left sidebar, and click "Reading Level." This will have the same result as using the advanced search page and choosing "Annotate results with reading level." It will display a percentage analysis of the results divided into basic, intermediate, and advanced levels and tag each result just below the title with a note identifying the reading level of that page. To limit results to just one of the reading levels, click the level by the bar graph at the top.
Wondering why some Wikipedia articles show up as "advanced" while some university pages are labeled "basic?" Take a look at Daniel Russell's (a Google developer) explanation.
Along with the new Reading Level search tool, Google has also added a "Dictionary" search tool. Google has had a long history of connecting search words to dictionary definitions. Searchers used to be able to click on their search terms in the now-gone blue bar at the top. The words (or the definition link) went to dictionary.com and then in 2005 to Answers.com. By the end of 2009, Google was using its own definitions, including automatically generated ones from the web. Now, the new dictionary search tool may retrieve information in these various sections:
- Definitions (probably from Oxford Pocket Dictionary of Current English)
- Web definitions
- Usage examples
- Related phrases
- Related languages
The fine print at the bottom states that "The usage examples, images and web definitions on this page were selected automatically by a computer program." The Web definitions seem to match the results searchers can get from using the define: operator (compare define:library with the new results). Of course, the librarian in me would like to know where all the other sections originate, and it would be even better to have citations to the sources on the page. For someone looking for usage and definitions, this is a useful source, but if it is a student that needs to cite the source, it is a problem, especially since Google might change it at any time.
The older Google Dictionary which used to be available at http://www.google.com/dictionary seems to be gone (redirecting to the new service), except in a Google cache copy for now and in this screenshot. I'll be curious to see how long the new Google Dictionary lasts.
About a year ago, Google killed off its SearchWiki experiment (not long after Jimmy Wales' WikiaSearch was scuttled) and introduced the ability to star results. SearchWiki let users block results, comment on results, and raise or lower rankings for specific pages. With the reintroduction of blocking, the continued use of stars, and the new +1 option, many of the SearchWiki features are back. Will more searchers use these new capabilities than used SearchWiki, and will they last any longer?
Promoting Results: +1
Last week Google introduced a new Google Labs Experimental Search option called +1. Read and see more about it on Google's +1 page, but for now at least, you must jump through several hoops to even see this option:
- Log in to a Google Account
- Be sure you have an upgraded Google Profile set up (according to the blog post)
- Go to Experimental Search and Join the +1 button experiment
- Then search Google and you should (may) see the +1 button after each result
According to Google, +1 is "browser-specific. . . . Also, it may take a while before you see the button in search results, and it may occasionally disappear as we make improvements." While the +1s are potentially going to be public, I have yet to see signs of that. If you have a Google Profile, you can see what pages you have added as +1s in the Profile. But it is very much still experimental. I first +1ed several of my sites from an account in Firefox, after joining the experiment as seen in the screenshot below. The "You Shared This" is separate, coming from sites in my Profile that I set up rather than from +1. If a page is +1ed but not in the profile, it will say "You +1'd this."
Stars and Browser Variance
Today, when I logged into the same account in IE9 (again after joining the +1 experiment), instead of seeing +1, the three sites showed up as starred results and there were no +1 buttons. Clicking the "Starred results for. . . " did not give me a list of any of these sites in my Google Bookmarks, although when I checked my Profile, I could see that the +1s were listed, even in IE.
Also, note that while the star appears for the site which I clicked a +1, it does not appear for the next hit. So stars are still not showing up in search results, and you'll need to use another way to add Google Bookmarks. The stars for unbookmarked sites are gone for me in Chrome, Firefox, and IE, and the +1 works in Firefox and Chrome (at least today).
As to the blocking, kudos to Mary Ellen Bates from whose March InfoTip I finally figured out how to consistently find the blocking option. You first have to click on a search result and then go back to the Google search engine results page (and be logged in to Google). Google actually stated as much in its initial announcement, "when you click a result and then return to Google, you'll find a new link next to 'Cached' that reads 'Block all example.com results,'" but many of us missed that part, myself included. Remember that you can also add pages to the block list by going directly to the Manage Blocked Sites page without first running a search.
When blocking a site after running a search (on all three browsers) I sometimes see the little explosion cloud that also used to be featured in SearchWiki.
I still do not always see the "Block all . . . " link even after visiting the site and returning, but it appears most of the time. Nor does it appear for a site that I visited recently. In other words, if I click on the first hit, go back to the results, then click on the second hit, and go back to the results, I only get an option to block the second, not the first.
Who Will Use These Features?
Given all the requirements to even use most of these resurrected SearchWiki features (log in, join the experiment, have a Google Profile), my hunch is that (like SearchWiki) only a small minority of searchers will ever even see these options and fewer yet will choose to use them. There is a small portion of users that will at least try these features, as I have, and I hope they find them easier to use than I do. But after trying them, blocking a few very annoying sites and adding a +1 to some favorites, what is the incentive to continue to use these features? Danny makes a great case that the +1 is Google's answer to Facebook's like button, but I think that Google needs to find even more compelling reasons to get searchers to use these. Time will tell!
After several more days of searching Google while logged in and not seeing any option to block, I finally saw it today, once! As you can see from the screenshot below, a search on 'glog' gave me a "block all results from . . ." message for glogster, but none of the other sites listed had the block option. I've highlighted the one I saw and the spot the others should have been. This was in Firefox. I tried several more searches, and none of the results had the block option. When I went back to the 'glog' search, the block option was gone. So I've not had the chance to try clicking on the one block link I've seen so far, but maybe the next one. So is Google slowly moving towards implementing the block option to more users, or do I just get a teaser once in awhile?
While I am still not seeing the block sites option at Google when logged in to any of my accounts in any browser, there is a way to force Google to start blocking sites, even if you do not see the "Block all whatever.com results" link after a search result. Since I've not yet seen it, the example Google gives looks like this.
If you don't see that on your results, just go to the Manage Blocked Sites page once you are logged in. You can enter up to 500 sites into your personal block list. Google asks for a reason, but that can be left blank. After entering eHow, I tried a few searches and then found, at the bottom of the results, the message letting me know that I blocked a result along with links for showing the blocked result and for managing my list.
I'm not sure how much I will use this, now that I have found a way to do so. First, I do not expect it to last for too long. Maybe Google will keep it for a few years at most, unless many searchers keep using it. Second, if it does last, a site that I block this year may well have very different content a few years hence, and I'll probably have forgotten that I've blocked it. In addition, at this point, the Manage Blocked Sites information is not linked from Search Settings or Account Settings, so I'll need to get a message like the one above to even remind me that it is there (or else I'll have to come back to this point and see if I still have blocked results).
Over the past week has come news of Bing and Google using whitelists, speculations of blacklists, Google's announcement of site blocking by signed in users, and the disappearance of the bookmarking stars which replaced SearchWiki. Just to confuse matters further, depending on which Google version you use, browser version, and apparently the toss of the dice, you may or may not be able to even see some of these changes.
So I'll try to summarize my understanding. First, Danny reports that at the SMX West conference last week, "both Google and Bing said they have "exception lists" for sites that might get hit by some algorithm signals." In other words, some of the spam identification algorithms that find and demote lots of sites in the rankings may also demote sites that were not supposed to have gotten hit. Those sites get put in the whitelist for that particular algorithm. An article from The Register claims that this contradicts statements from Google in certain lawsuits and continues with examples of similar actions that look like site blacklisting.
Most searchers can hope that the blacklisting and whitelisting improves search results and removes irrelevant hits, but if we don't know what has been listed, it is hard to determine how true that is. On the same day, Google announced a new ability for logged-in searchers to block results from specific domains. Don't like seeing so many eHow results? Just block them all. Or so the theory goes. Not only must you be logged in to a Google account to have the option, but you have to use Google.com in English with the right browser. According to the announcement on 3/10/11, "The new feature is rolling out today and tomorrow on google.com in English for people using Chrome 9+, IE8+ and Firefox 3.5+, and we'll be expanding to new regions, languages and browsers soon."
Maybe. Karen has some screenshots in her post from a few days later, but I have been trying since the announcement, using the right Google in the right language with several supported browsers and different accounts. Here in Montana, I still can't see it almost a week later. I've turned Instant on and off. I've tried from home and on campus. Still no luck. Comments on Karen's post also note both that others can't see it and that it may go away. On top of that, the stars that allowed signed-in users to star (bookmark) certain results, so that they'd come back at the top of results pages later, have been turned off. The bookmarked pages remain bookmarked, but for now at least, the ability to star new ones has vanished. Or as Barry puts it, "You Can Hate (Block) But No Longer Love (Star) Google's Search Results." For me, that means I can now neither star nor block results.
In the past few weeks, both Bing and Google have announced changes to their social searching. With Blekko also having social searching via Facebook connect, I thought I'd compare how successful and useful I found each of the three, and explore the new announcements in a bit more depth. First though, for most searchers, social search may be a waste of time. If you do not have a large Facebook network of friends (or don't use Facebook), avoid Twitter, and have not built a social network, there is nothing to search. Or if you have a large Facebook or Twitter network of friends and family, but you want to search professional topics that are not of interest to those friends and family, the social search results will offer little but amusement, if you even see them.
But if you do have a social network and are interested in searching the "likes" of Facebook friends, Tweets, or posts in Google Reader, read on to see what's new and how to find the social results.
Several search engine watchers and forums have noticed that the web page titles Google displays in its search results are different than the actual page titles. This seems to be a change that goes well beyond the traditional reasons why Google might change the page title (such as a missing title element, a different title in the Open Directory, or malformed HTML).
Over the past few weeks, I have noticed similar issues, and what surprised me was that Google will change the title in the search results for the exact same site depending on the search query. As Christopher Skyi comments (see 2/18/11 comment) "It appears Google is constructing the SERP on the fly as a combined function of the user's specific search query and what's on a ranking page."
Here's an example from my employer, Montana State University. I was checking out the Wikipedia entry, which still gives the name of the university as Montana State University-Bozeman (even though the '-Bozeman' was dropped several years ago). Search either montana state university or montana state university-bozeman and the first result is the MSU web site. But the title that Google lists depends on which way you search it. (And for at least the past four years, the HTML title on the page has been <title>Montana State University</title>.)
So for some searches at least, it seems that Google will display a title for a page based in part on some of the words in your search query.
Yesterday Google announced the launch of its new Recipe Search, which has also been added as another database in the left side bar. Initially, this will only be available on the U.S. and Japanese versions of Google.
Run a Recipe Search or choose the "Recipes" option on the left side, and Google provides specialized search facets on the side for ingredients, cook time, and calories, as in this example from a search for cherry rhubarb.
This should prove to be quite useful to many cooks, but what I find so fascinating is that Google is creating this database from sites that use Google's rich snippets markup which depends on content creators using the special coding on their pages so that Google will include the pages in Recipe Search and be able to better parse the fielded meta data in the recipe.
Searchers: this means that Recipe Search will also exclude many recipes that are not marked up with the special tagging.
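To make the point about markup concrete, here is a minimal sketch of how any program might pull fielded recipe data out of a page. It is not Google's parser. The sample HTML, the itemprop names (name, ingredient, cookTime, calories), and the helper names are all hypothetical and chosen only for illustration. Note that the second, unmarked recipe in the sample contributes nothing to the fielded data.

```python
from html.parser import HTMLParser

# Hypothetical page fragment; the itemprop names are illustrative assumptions,
# not a statement of what Google's Recipe Search actually requires.
SAMPLE_HTML = """
<div itemscope>
  <h1 itemprop="name">Cherry Rhubarb Crisp</h1>
  <span itemprop="ingredient">cherries</span>
  <span itemprop="ingredient">rhubarb</span>
  <span itemprop="cookTime">45 min</span>
  <span itemprop="calories">310</span>
</div>
<p>Grandma's cherry rhubarb pie: mix fruit and sugar, bake one hour.</p>
"""


class ItempropCollector(HTMLParser):
    """Collect the text of every element that carries an itemprop attribute."""

    def __init__(self):
        super().__init__()
        self.fields = {}      # property name -> list of text values
        self._current = None  # property whose text is currently being read

    def handle_starttag(self, tag, attrs):
        prop = dict(attrs).get("itemprop")
        if prop:
            self._current = prop

    def handle_data(self, data):
        # Text outside a marked-up element is ignored entirely.
        if self._current and data.strip():
            self.fields.setdefault(self._current, []).append(data.strip())

    def handle_endtag(self, tag):
        self._current = None


def extract_recipe_fields(html):
    """Return a dict of marked-up fields found in the HTML (hypothetical helper)."""
    parser = ItempropCollector()
    parser.feed(html)
    return parser.fields


fields = extract_recipe_fields(SAMPLE_HTML)
print(fields)
```

The plain paragraph about the pie is invisible to this kind of extraction, which is the same reason unmarked recipes would not show up with ingredient, cook time, or calorie facets.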
Last week, Google rolled out a new top toolbar, as confirmed by SearchEngineLand. It is supposed to be a few pixels shorter than the older toolbar and is a bit more colorful and interactive.
On the top left side, note the new darker blue bar above the current database (Web in this example). Also new is a background color change when you hover your cursor over one of the other databases (as shown with News in this example). The old toolbar had no hover color unless you chose something from the "more" drop-down, when the background color went dark blue. At this point, only the following databases use the new toolbar:
The old toolbar is still seen at the home page for these (but after doing a search you may see the new one):
More significant changes occur in the top right section, which changes depending on whether or not you are signed in to Google. The Search Settings, iGoogle, Web History, and Sign Out links are now more hidden, and each takes an extra click to reach.
Previously, when not signed in, you would see the following at the top right:
Here's the new version, after clicking on the gear icon to display the options:
After running a search, iGoogle is replaced with Web History. This change has removed all these links from the visible page, so if you want people to try changing their search settings, you must first tell them to click on the small gear icon in the upper right corner.
Why is Google making these changes? Putting these options in the drop down menu does not increase screen space. It just removes links that I am guessing are infrequently used. It also allowed Google to add Privacy (even though that is already linked at the bottom of every page).
Perhaps more telling is the removal of the Sign Out link. Once you have signed in, you used to always see a sign out link in the upper right section. Now, searchers need to click on their name to see a drop down choice to Sign Out. Perhaps few other users ever sign out or else Google finds a strategic advantage in making it more difficult to sign out.
For your amusement! (At least, I was amused.) In creating a shortened URL for use with a job posting, I wanted to do a quick Google search to see if my created word has been used by others. So what happened when I tried?
Apparently, Google thought that in the middle of February in Montana, I really wanted to search for 'surfing.' So confident were Google's algorithms that I really had meant 'surfing' that they decided to give me those results. After all, 4 of the 9 letters matched on surfing. At least, clicking the 'search instead' choice let me find what I wanted to know -- that no one else in Google's current web database was using msurefjob.
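For the curious, the "4 of the 9 letters" count can be checked with a longest common subsequence calculation. This is only a sketch of one way to count matching letters in order. How Google's spelling correction actually scores candidates is not public, and the function name lcs_length is my own.

```python
def lcs_length(a, b):
    """Length of the longest common subsequence of strings a and b."""
    prev = [0] * (len(b) + 1)  # LCS lengths for the previous row of the table
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            if ch_a == ch_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


typed = "msurefjob"
suggested = "surfing"
matched = lcs_length(typed, suggested)
print(f"{matched} of the {len(typed)} letters match, in order")
```

The shared letters are s, u, r, and f, which is where the 4 comes from.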
When Google first launched Google Instant, it was not available for all the Google databases. It still is not available in Google Scholar, for instance. Today, Google announced Instant's expansion to Google Shopping (also known as Product Search). See more coverage at SearchEngineLand where Barry notes that "Interestingly enough, this does not seem to work off the Google Product Search home page." It only works at this point after choosing "Shopping" in the left hand facets.
See my NewsBreak on Google Instant: Interactive Searching for an overview of the new, results-as-you-type, display at Google.
See my NewsBreak on Google's new Navigational Search Facets for an initial evaluation of the nice, new limits and database switching options on the left side of the Google results page.
After checking back with Nate and Google and asking several others to help out, I am getting reports that others at least can see the "View Customizations" notice and link. It is indeed down at the bottom of the results, much more hidden than when it was up at the top.
Here's the screen shot Nate sent me today. Note that this is from the bottom of the search results page, beneath the search box, along with other seldom noticed links.
I heard from several people elsewhere who also did not see this, including Chris and Mary Ellen in Colorado and Gary in DC, and then later this evening Danny in California and Barry in NYC were starting to see this. I especially like Chris' comment to me that "The new interface doesn't seem to be working on all cylinders yet." He sent a screen shot of the databases section offering "fewer" than none! Maybe we just need to wait a few more days (or weeks?) until the new interface settles down and is a bit more consistent.
Last week when I wrote about the new Google look, I listed several items that appeared to be missing. Most troubling to me is the loss of the "view customizations" notification and link. So I asked Google why the "view customizations" message is no longer visible and whether it will return. The response I got is that "The view customizations link is still there. It is now located at the bottom of the page below the search box."
OK, but why can't I see it then? I've done numerous searches, both logged in and not, in Firefox and in IE, and I still don't see it. I've tried different computers. I compared search results in different browsers and saw different result rankings but still no message.
I've sent a follow-up email to Google asking for a screenshot with no reply so far. So, is anyone else seeing this anywhere on the new search results page? If so, please post a screenshot at http://drop.io/showgoogle or at least post a note there that you've seen it.
Oh, and here is a screenshot of where the message used to be before the new interface changes last week.
Yesterday I posted about the recent changes at Google including a few lost (or at least temporarily missing) features. One more change that I have noticed is on the advanced search page. I expect that many advanced searchers will use the left hand search tools to limit and refine their searches, especially since it contains options not available on the advanced search page. What surprised me is what is now missing on the advanced search page.
Down at the bottom of the advanced search page, up until the redesign launched, Google had a list of "Topic-specific search engines from Google" including specialized searches for government, Microsoft, and Linux.
Here are the old choices and links. All but the "Universities" search (one of their earliest) still work. While the ones in the first are linked elsewhere (and Code is not new -- it launched in 2006), the others do not appear to be linked from Google elsewhere.
Google Code Search New!
Google News archive search
U.S. Government
And a screenshot of the bottom of the old advanced search page:
Google announced yesterday that it is rolling out several significant changes to its search results pages. Dubbed the Jazz user interface at SearchEngineLand, this has been in experimental mode since at least Nov. 2009. See Danny's excellent and exhaustive overview, which details many of the changes. Here are several of the key points and a few cosmetic changes:
- The Search options are now always displayed in the left margin
- The options are now divided into databases and search tools
- Click the down arrow to display more databases under "Everything" and more search options under "Show search tools"
- New "Something different" section under Search Tools links to related but different search suggestions
- Estimated number of results is now under the search box
- Search button is now directly attached to the search box
- Google logo changes and drop shadow is gone
Additional databases and search options may automatically show up for some searches. For example, in the search on 'blue' shown to the left, three other databases (books, images, & videos) display, along with two search options (2 months limit and sites with images). The full set of options will only be seen when the down arrow "more" links are clicked. Expand both to see all choices. The "Something different" search suggestions also only show up for some searches, but they are always displayed if available underneath the search tools.
The roll out will likely take a few days as it is expanded to more and more users. Some other country and language versions of Google are also seeing the roll-out, but it may not be available for all. It will be interesting to see if it is on the left or right for languages that read right to left like Arabic and Hebrew. Some other Google databases show a similar results display (Web, Images, Videos, Shopping, Books, Blogs) while others look more like the old "Show options" left hand facets (News, Finance), and others have no left hand choices (Scholar, Groups, Directory).
What is gone or missing for now at least:
- The "View Customizations" message that could previously alert searchers to when the ranking had been changed based on previous searches, geolocation, or search history is nowhere to be seen
- Search terms used to show up after the estimated number of results
- For single word searches, the link to a dictionary definition is gone
- For country specific versions, the option under the search box to limit to pages just from that country is gone. Instead it shows up after the search in the left margin. See discussion.
What I like, so far:
- The Something different links go beyond the usual search suggestions to more distantly related concepts using data from Google Squared
- Having options more visible should increase use of these search facets
What I do not like:
- More clicks to see all the faceted navigation options. Previously, I could just click on "Show options" to get a nearly full set of facets. Now I have to click twice: first on the more databases and then more search tools.
- Loss of any notice that results have been customized
- No ability to revert to old look. Many others are complaining about this
As others have noted, Ask 3D, Bing, and Yahoo! (among many others) have had left hand navigation with a variety of facets for years. Reading through the commentary, it has been interesting to see how many Google users had never noticed the Show options link in the past. This change will certainly make it more obvious. We will see how long it will last before the next big change. If you are not seeing it yet, try refreshing your results or clearing your Google cookies. I saw it on more computers yesterday and fewer of the same computers earlier today. This evening, I am seeing it more often on most computers. On one, just refreshing the page finally got the new version.
I have not seen others commenting on the loss of the "View customizations" notice, and I definitely hope that some such notification will re-appear.
P.S. For more about the changes from Google and links to some rejected designs, see today's post.
Know anyone who thinks of Google this way? From today's Pearls Before Swine:
For all the subtle and experimental changes Google has been making to its results display lately, such as the jump to links and enhanced page links, and that Google even announced the changes (which does not always happen), I don't think I'd previously seen this particular approach to (what should I call them?) site links, or subsite links, or indented results from the same site. It seems related but not the same as the contextual show more results change from July.
You can see it in the screen shot below or try it yourself. For the past two days, I've seen the same results for this search on both IE and Firefox, and on at least three different computers. What's different from the usual indented display?
Google has announced that it has acquired reCAPTCHA. reCAPTCHA is a clever use of scanned, poorly OCRed text as a Captcha that prevents bots from spamming forms and at the same time helps improve OCR (Optical Character Recognition - the process of taking a scanned image of a page of text and converting it into searchable text).
I had always liked the idea of reCAPTCHA, especially since it was reputedly helping the Internet Archive with their scanning of books which (unlike Google books) they make open to everyone and focus on clearly out of copyright works.
However, with the Google announcement, I saw very little mention of how this might impact the Internet Archive. I assumed that Google would switch the reCAPTCHA underlying data from the Internet Archive to the Google Books project (which is not open and it remains to be seen how willing, if at all, Google will be to let other search engines use the searchable data from all their scanning).
Then I was even more surprised to read at reddit that the Internet Archive had never received any correction data from reCAPTCHA. "I don't expect to get any data from the reCaptcha project, since we've asked several times and received no response."
Just another example of a great sounding project that failed to deliver the results it implied. I'm sure Google will make sure to have it help their scanning and OCR projects, but I, for one, am no longer interested in using it.
Sometimes I find the Google blog posts to be long winded, high on hype, and low on information value. Yesterday's post about Google Search Quality started out in a similar vein, but it quickly improved and contains a number of interesting points about how Google handles searches and ranking. And for all those who like to say, "Just make it more like Google" and expect that to be a simple fix, please note the way Google describes their hard work on search quality is that "more than one thousand programmer/scientist years have gone directly into their development."
Several extracts that I found of interest include:
- Ranking algorithms include many aspects beyond PageRank:
  - language models (the ability to handle phrases, synonyms, diacritics, spelling mistakes)
  - query models (how people use language today)
  - time models (some queries are best answered with a 30-minutes old page, and some are better answered with a page that stood the test of time)
  - personalized models (not all people want the same thing)
- Evaluation includes automated evaluations every minute (to make sure nothing goes wrong)
- Change Frequency: "In 2007, we launched more than 450 new improvements"
While these do not, perhaps, have any direct bearing on how we can better use Google, it does help to inform us about the rationale for changing results and different processing from one day to the next.
Earlier this month Google expanded the number of languages available in Google Translate. While the press release and most other coverage talked about ten new languages, the number of language pairs (from language X to language Y) increased far more substantially. Previously, Yahoo! Babel Fish had the most with 38 pairs. Google not only upped the number of possible languages, but every language listed can translate to every other. So depending on how you count, Google Translate now has over 500 language pairs available! That's a major increase. As Google Operating System notes, the total varies depending on how you count Chinese. Only one choice is given for input of "Chinese," but Google Translate seems to accept either the Simplified or Traditional version. Output can specify either Simplified or Traditional. So, if you count both versions of Chinese as one language, this means Google Translate can machine translate 506 language pairs. If you count them as two, it would be 552. And do note that you can input either version of Chinese characters and have it translated to the other.
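The two totals follow from simple counting: if every language can translate to every other, n languages give n x (n - 1) directed pairs. Here is a quick sketch of that arithmetic. The language counts of 23 and 24 are my inference from the totals of 506 and 552, not figures quoted from Google, and the function name is my own.

```python
def directed_pairs(n_languages):
    """Number of (from, to) pairs when every language translates to every other."""
    return n_languages * (n_languages - 1)


# Assumed counts, inferred from the totals above rather than stated by Google:
# 23 languages with Chinese counted once, 24 with Simplified and Traditional
# counted separately.
chinese_as_one = directed_pairs(23)
chinese_as_two = directed_pairs(24)
print(chinese_as_one, chinese_as_two)
```

The same formula shows why adding just one more language to a fully connected set adds dozens of pairs at once.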
Also note that Google has not only expanded its machine translation abilities but has augmented its Translated Search as well. Translated Search (also available on the Language Tools page as "Search Across Languages") will translate the query words and then display results in both the original language and in translation. Google translated search can machine translate query words and pages between the following languages. The following ten languages have been added along with the ability to translate between any of the possible language pairs.
Presumably, Google has been able to make such a major expansion of language translation pairs available by using statistical machine translation developed in house. This process is described in their FAQ: "we feed the computer billions of words of text, both monolingual text in the target language, and aligned text consisting of examples of human translations between the languages. We then apply statistical learning techniques to build a translation model." Moving to this approach certainly seems to have allowed such a major expansion. Bear in mind that all of this automatic translation is prone to error, although it should give some rough sense of the underlying meaning. I've updated my Online Translation and Translated Search pages with the new languages.
So maybe I missed this earlier, but today is the first time I noticed Google showing related search suggestions at the top of Google results. In this case, I just happened to run a search for talking heads, trying to get an example of integrated content. While it worked for that, it also gave this one line of "Related searches:" at the very top. This is the first time that I can recall seeing this at the top.
Whether this is just one of Google's many user interface tests that may just run a short time or may continue and be seen at more searches, I don't know. Maybe it only shows up for music groups, although it did not for a few other searches I just tried. So far, the only other search I found that showed "Related searches:" was beatles. Anyone else seeing this?
At SearchEngineLand, Barry noticed that Google is no longer alerting searchers that stop words are not searched. Previously, a stop word in a query that was not in phrase marks would usually prompt Google to tell searchers that the stop word is "a very common word and was not included in your search." Does this mean that Google no longer has any stop words? Based on a few of my tests with a small retrieval set, comparing a search with a stop word and another search with a + in front of the stop word, it does seem that Google will on occasion still ignore some stop words.
Dean posted a scathing review of Google Scholar's performance over the past year based on a 32% decline in unique visitors according to ComScore data. More data on the changes at various Google properties between Nov. 2006 and Nov. 2007 are available in a TechCrunch posting. While I am sure that this data does not fully reflect actual Google traffic (and at least one comment on Battelle's Searchblog post says "a staff member from Google . . . tells me that ComScore has some of their numbers wrong"), I still find it fascinating. To no one's surprise, Web search is by far the busiest Google property. Google Directory traffic went down, which is not surprising since Google has made it so much harder to find. But the huge declines in Product Search (down 73%), Scholar (down 32%), and Video Search (12% decline) surprised me. Book Search on the other hand has grown significantly in visitors (up 55%).
The chart showing which Google properties get the most visits is interesting as well. Web and Image search dominate and are both growing. After those two come Gmail and Google Maps, which both rank higher in visitors than Google News. Given its increased prominence on the Google News page, I was also surprised to see how few visitors ever went to Blog search.
For Google users who visit many of their services, this is a telling lesson about how others use or do not use so many of Google's search services and applications. I also agree with Dean that Google Scholar's drop in visitors (if that is indeed accurate) comes in part from their failure to improve the service. I have found general Web searches often more effective than Google Scholar searches for at least some scholarly documents.
A new Google experimental search option for moving or deleting search results has been spotted at SearchEngineLand and elsewhere. It is not available for everyone, and is not listed on either the Google Labs page or on the list of Experimental Search options page. The help page notes that "To see your changes next time, you must be signed in to your Google account," but I still do not have the option even when logged in.
Phil notes that other services like Eurekster have done this for awhile. Even Google has tried this before. In the first version of their experimental SearchMash searchers could click and drag results to re-organize the list.
A new Flash version of SearchMash was found by Google Operating System. I haven't seen many major changes in Google's experimental site, especially since the Google Universal Search launch back in May. The Flash version is a significant change. It includes options for tracking a search history (see the left side hidden panel for this and other options), results screen shots from Snap, and access to all the databases available from regular SearchMash. One additional database added to this Flash version is Maps.
There's an interview today with the product leader for the Google Custom Search Engine (CSE). I always find numbers interesting:
"We have more than 100K registered Custom Search Engines, and that's growing pretty rapidly." (Although it will be far more interesting to see how many of these are getting any traffic several months from now.) Since I just put up my Customize Your Own Search Engine page and built a State Libraries: Custom Search page, I might as well mention a few of my complaints with Google's CSE.
For one of my upcoming columns in Online, I compared the various custom search engines and other tools for building a topical search engine from a subset of a major search engine's database: tools like Gigablast Custom Topic Search, Google Custom Search Engine, Live Search Macros, Swickis, Rollyo, and Yahoo! Search Builder. I compared a number of features (including the maximum number of sites, whether they support subdirectories, and if they have usage statistics). This information can now be seen on my new Customize Your Own Search Engine page.
Back in January I mentioned that Google was expected to fix its filetype: search so that it would give results even when not combined with another search term. I checked today, and sure enough, filetype:qpw now works without needing any additional terms.
So why would anyone really want to do this? Most of the benefit of restricting the search results to a specific file type comes when it is combined with other search terms. A search like filetype:pdf might be useful to get a count for how many PDF documents Google has indexed, but Google's estimated number is so wildly inaccurate that this should certainly not be used for that. Two better uses for the plain file type search are for unusual file extensions and for teaching. Less commonly used file extensions like .qpw or .wps result in fewer than a thousand hits. Just bear in mind that the filetype: prefix does not just search file types. Any URL that has a dot followed by something else can be found. Try a search for .junk for example. For teaching, it is nice that we can now demonstrate how a filetype: search can work without having to use additional search terms.
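For anyone who wants to hand students a ready-made link, here is a small sketch that builds a search URL for a filetype: query, with or without extra terms. It only constructs the URL and does not fetch any results. The function name is my own, and I am assuming the standard www.google.com/search address with its q parameter.

```python
from urllib.parse import urlencode


def filetype_search_url(extension, terms=""):
    """Build a Google search URL for filetype:<extension>, optionally with extra terms."""
    query = f"filetype:{extension} {terms}".strip()
    # urlencode percent-encodes the colon and turns spaces into plus signs.
    return "https://www.google.com/search?" + urlencode({"q": query})


bare_url = filetype_search_url("qpw")
combined_url = filetype_search_url("xls", "budget")
print(bare_url)
print(combined_url)
```

The first URL is the bare file type search that now works on its own, and the second shows the more typical combination with a search term.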
Last week, Bowker announced an agreement with Microsoft that its Global Books In Print database will be used for "basic and value-added data that will enhance descriptions of books incorporated in the new Live Search Books." Considering that Live Books are primarily out-of-print, out-of-copyright books and that Global Books In Print covers, surprise, in-print books, it would be interesting to know how many matches between the two are found. I have yet to see any examples. Today, Google announced the addition of geographic data to its books. Books are analyzed for place names, and a Google Map with a list of names and text snippets appears on some books' "About this book" page. It includes some snippet, limited preview, and full text books. According to Google,
"When our automatic techniques determine that there are a good number of quality locations from a book to show you, you'll find a map on the "About this book" page." The only way to find out if a particular book has been so analyzed is to look at that book's "About this book" page.
At Search Engine Land, Danny has a long report about Google indexing and ranking issues. While other sections of the post talk about an update to the visible PageRank, issues with supplemental results, and duplicate content, I found the short section on the filetype: command most interesting. Like some of Google's other field search prefix commands, filetype: results in zero records unless it is combined with another search term. So filetype:xls finds nothing, but this is supposed to change sometime in the future and will finally let us run a filetype: search without requiring an additional term. Does this mean that other field searches will be able to be run separately as well? We'll have to wait and see. In the meantime, if you'd like to get all the results Google will give you for some unusual file type, there is an easy way around the additional term requirement.
I'm somewhat surprised that I've not heard more librarians complaining about this. I had not really considered all the ramifications of it. Philipp Lenssen on Google Blogoscoped posts about Freeing Google Books. Basically, he notes that Google scans public domain books available from libraries and then appears to add further restrictions for those books, including restricting commercial republication and the removal of the "digitized by Google" mark. Since that bothered him, he has pulled 100 titles from Google Books and "set them free" on his own Authorama Public Domain Books site (with the "digitized by Google" mark removed).
Apparently, there was enough outcry over Google's self-promotional "tips" that they have removed them. When even Google's own Matt Cutts complains about them, I am not at all surprised to see the tips removed. They were actually removed last week, and I'm finally getting around to posting about it. Whether or not the tips were over the top and too intrusive, Google responded well. They received a fair amount of criticism for these tips, even while others like Danny felt that the complaints went too far. In this case, Google seems to have decided that it would be best to remove the tips. So, kudos to them for listening and responding so rapidly.
I've been reading some criticism of Google's promotion of its own services on top of other search results. Blake Ross posts Trust is Hard to Gain, Easy to Lose and Phil Bradley says Google Admits It's Failing. Both criticize the "tips" that Google has introduced recently hawking Blogger, Picasa, and Calendar above regular results (but after the top ads, if there are any). One example that Blake uses seems especially egregious. A search for blogs.ca brings up that Canadian free blogging site, but right above it is Google's self-promoting "tip" to try Blogger.
On some top search results, Google adds additional links below the extract that point to subsections of the top ranked Web site. The official name from Google for these subsite links is Sitelinks. See below for a screen shot of Sitelinks for ALA.
Earlier this month, Yahoo! changed their "more" menu. Recently Google has been expanding theirs. A week or two ago I noticed that on my campus, Scholar had suddenly appeared in the more >> menu. I've checked that frequently, and it was never there or above the search box previously. Other campuses (and even certain other organizations) have reported that a Scholar link was above the search box, but it had not been available to me. It is still not showing up above the search box, but at least it is under
Using scanning technology from Google Books, Google launched a new searchable database of U.S. patents yesterday at google.com/patents. The blog post has been updated to note some problems with printing and saving, but this is an impressive collection of 7 million patents from the 1790s through to the middle of 2006, with plans to add more recent patents. While there have been many other free patent databases for well over a decade, Google's popularity may help push their version. It has few of the features that a professional patent searcher might want, but it can help the rest of us dig into the patent literature. Unlike some other Google databases, Patents starts out with an advanced interface for searching a few of the specific fields like
It also includes some specialized prefixes, listed on the about page, such as patent: (for patent number searching). Date limits for issue and filing date are available.
It looks like Google has rolled out a new feature within their general Web results. When a result is connected with a local business with a known address, a "plus box" will appear next to an address in blue after the snippet extract. Click the plus box to see the map, the address and phone, and a link to a larger map and directions. See Matt Cutts' example and screen shot in his explanatory blog post. It only covers the United States at this point.
Remember when the pay-for-answers Google Answers service launched, and how some in the information business thought it could spell the end of reference librarians? Today, Google announced that they are no longer accepting questions and will stop accepting new answers by the end of 2006. Google will keep the database, which is a good thing since there are some very useful answers within it. But it is ironic to see the demise of this fee-based service, which seemed to pay the answerers something less than minimum wage and cost the questioners a relatively minor fee. At the same time, Yahoo!'s newer, free answers service has been booming. While the answers are typically much shorter and less well researched, the Yahoo! service seems to have struck a chord with Internet users. After less than a year, Yahoo! reports that their service is available in 18 countries and 8 languages and that Yahoo! Answers has 60 million unique users worldwide with 160 million answers. Just within the US and English-speaking countries, they have 14.4 million unique users and 60 million answers.
Last week I finally experienced the experimental Google user interface (UI) that has the links to other databases displayed on the left side instead of along the top of the search box. Take a look at the first screenshot that shows the top of a regular results page. It includes links to Images, Maps, News, Groups, and more (which just links straight to the More Google Products page instead of being an Ajax pop-up). Note that Groups is still listed instead of the new default of Video. Also, there is no "Web" link, which since we are already in the Web database, makes sense.
You will only notice this once you click on a result, and in particular on a Full View or Limited Preview result. The Snippet view has changed a bit with the addition of 'Key words and phrases' at the top and a 'Contents' section and some other additional information depending on the book record.
But take a look at a Full View or Limited Preview record to see significantly more differences. The left frame lets the reader scroll down from one page to the next without clicking on a next page link. Google has finally added a zoom option. Many records also have a list of 'Related Books.'
SearchMash, Google's experimental search engine, has been updated. The Ajax re-arranging of results (which was fun but seemed otherwise useless) is gone. Now results from different databases are displayed in their own part of the page. A screenshot and discussion are available from Google Operating System. Note the boxes for Web, images, and Wikipedia along with the top box for suggested alternative searches.
Mick O'Leary has an excellent overview, "Google Book Search Has Far to Go," for his Nov. column in Information Today. In particular, he compares Google Book Search to Amazon's Search Inside the Book and notes that
. . . Amazon's feature has several critical advantages over Book Search. The most important is that Amazon has the latest books; Book Search does not. Perhaps because of differing licenses with the publishers, Book Search is often several years behind; Amazon has the latest releases and also lists forthcoming titles. For example, Amazon's feature has the latest books by Pat Buchanan, James Lee Burke, Ann Coulter, Jeffrey Deaver, Tom Friedman, and Robert B. Parker; Book Search does not (and is usually two or three books behind with these popular authors). This seriously devalues Book Search as a tool for finding, buying, or researching books.
Used Google Co-op much? No? Other than from developers and experimenters, I have heard of little use of Google Co-op, especially for searching. That has all changed with today's launch of Google Custom Search Engine, an application of Google Co-op. The Custom Search Engine was announced yesterday. Like other search engines that search a subset of a larger database, Google's Custom Search Engine lets users specify sites to be included or excluded, and some can be prioritized over others. Creators can also specify that certain words should be added to the query. Other subset (or vertical or custom) search engines include Yahoo! Search Builder, Rollyo (which uses the Yahoo! database), and Eurekster's Swickis.
This past weekend, I gave several workshops in Monterey at the Internet Librarian conference. It is always fun to give a workshop on Web searching on the weekend when the search engines tend to roll out new features or give otherwise unusual results. I had two such situations this weekend, both of which are back to normal today. First, I tried to demonstrate searching the Yahoo! directory from the main Yahoo! page by just clicking on the "Directory" tab above the search box. On both Saturday and Sunday, when searching "monterey" that way, Yahoo! said there were zero directory results, even though if I went straight to dir.yahoo.com and searched "monterey," I got plenty of results. Meanwhile, over at Google, I was demonstrating what happens when you click on the "more >>" link which used to bring up links to "Books," "Froogle," "Groups," and "even more," but no link to Google Scholar. On Sunday, "Scholar" had been added to that list of four. I was hoping that the Scholar addition would be permanent and that Yahoo! would fix the directory search. As of today when I tested both situations, the Yahoo! directory searching is fixed, but Scholar no longer appears under "more >>." Oh well.
It has some specialized field searches including
The old Usenet groups and DejaNews search engine became Google Groups. Now Google is announcing the launch of yet another version of Groups. The beta has a new interface and several new features. New capabilities include the ability for group owners to create a welcome message, upload a group logo, and customize fonts and colors. The new Pages feature lets users create web pages inside a group as well. Overall, it seems to be moving Google Groups further away from its Usenet origins and more towards what Yahoo! Groups offers.
Google usually runs all kinds of experimental user interfaces, changes to ranking algorithms, and tests of every type on their own site, just letting a very small percentage of their users see the changes and then evaluating the response. Now, Google has actually launched its own experimental search engine at a completely different URL with no Google logo. SearchMash is the new site. SearchMash presents Web results with Image matches off to the right side. Results are numbered (something I always prefer) and can be sorted. Just click and drag one to change the order of the results (although I have no idea why you might want to do this). Another interface change is that at the end of the page, searchers can just click on the "more web pages" link to retrieve another ten results. Instead of being shown on a new page, they just appear below the first ten, on the same page. Overall, I find no compelling reason to use this, but it is a site to watch for future Google experiments.
From the Official Google Webmaster Central blog comes this post on How search results may differ based on accented characters and interface languages. This highlights a change in the way Google handles diacritics and gives a good overview of how it still varies depending on the search interface language chosen.
The Google Blog, in a post entitled "Find the wealth in your library," talks about the expansion of links to national library union catalogs at Google Books. More than 15 union catalogs are included, not just Open WorldCat. It is not always easy to connect to each of these union catalogs, and I still find plenty of records without a "Find this book in a library" link, even when the books are listed in WorldCat. Gary makes some pointed comments as well.
What is a search engine to do when a searcher puts a URL in the search box? After years of giving a single match with links to other options, Google has done an about-face. Now, enter a URL and Google gives results for pages that match the URL as a text phrase. To get to the old display, just use the info: prefix before the URL. See Matt Cutts' more detailed explanation for why they made the change. It looks like this changed earlier this month, since it was noticed on Sept. 1 at Digital Point forums and Search Engine Roundtable.
The sharp-eyed folks in the WebMasterWorld forums noticed that Google has posted information about their subsite results, which they call Sitelinks, and how and why they appear in the results below certain site listings. See the image below for an example of these subsite links.
Google News announces the launch of a News archive search which is linked on the main Google News page (upper right). Instead of being an archive of what Google News has crawled in the past, beyond the 30 day limit of regular Google News, this new archive search is a combination of fee and free content. The fee based content comes from Newsbank, AccessMyLibrary.com, ThomsonGale, Factiva, HighBeam, LexisNexis and others. No list of news sources or vendors is available. Some sources are subscription-only while others offer pay per article options.
Even more interesting is the report of the availability of some of these scanned books from within the University of Michigan's online catalog, MIRLYN. Some of the government publications which Google only shows in snippet view are available in full text via Michigan. The problem is finding them. Try going to MIRLYN, click on the Advanced Search link near the top, change the Format limit to "electronic resources," and then you might find one. However, this does not limit results to just the Google scans. Try adding "Michigan Digitization Project" as a "Words Anywhere" search and look for records with links both to "Google Online" and "U-M Online." The latter gives the Michigan version.
In another incremental change, Google has changed the layout on the Google home page. The links above the search box connect to some of Google's other databases. Their version of the Open Directory used to be there before it was demoted. Today, Groups and Froogle have been demoted and no longer are visible links on the home page. Video (with a NEW! tag) has replaced them. In a sense, Groups and Froogle are still available on the home page, but only via a click. The more >> link at the right now gives a pop-up menu of other choices for Froogle, Groups, Books, and "even more." But a user must click the more >> link to see these choices pop up. The "even more" link goes to other Google databases. For academic researchers who are not on a campus where Google automatically displays a Scholar link, the one major absence from the new Google page remains Google Scholar.
In conjunction with StopBadware.org, Google now sometimes censors results. See the PC World news report for more details. Basically, Google gives a warning page when a searcher clicks on certain results. The warning page has a link to the StopBadware site along with a "Or you can continue to . . . " with a link to the potential malware site. I can get it to work for the asta-killer site with a search like asta-killer. On mousing over the title, Google shows the link URL of http://www.google.com/interstitial?url=http://asta-killer.com/ in the status bar. However, the sub-site links under the URL do not have the interstitial warning link. Nor does Google give the warning on a search that might take me to a particular section of the site such as winzip registration code.
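For anyone who wants to see where one of these warning links actually leads, the destination can be read straight out of the link. The following is a minimal Python sketch, assuming only the URL pattern visible in the status bar above (an /interstitial path with the target in a url parameter); the function name interstitial_target is made up for illustration and is not anything Google provides.

```python
from urllib.parse import urlsplit, parse_qs


def interstitial_target(link):
    """Return the destination wrapped in an interstitial warning link.

    Assumes the pattern seen in the status bar: a path of /interstitial
    and the real destination in the "url" query parameter. Returns None
    for any link that does not fit that pattern.
    """
    parts = urlsplit(link)
    if parts.path != "/interstitial":
        return None
    values = parse_qs(parts.query).get("url")
    return values[0] if values else None


if __name__ == "__main__":
    warned = "http://www.google.com/interstitial?url=http://asta-killer.com/"
    print(interstitial_target(warned))
```

A link that goes directly to the site, such as the sub-site links mentioned above, simply returns None, which is one quick way to check which results carry the warning.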
Google Reader has changed its default sort to date (in reverse chronological order) according to the Official Google Reader Blog in its Your Wish is Our Command. Google always seems to drag its feet with date sorts. With Web results, date sorting is quite problematic since most Web pages do not have a reliable date. So date sorting of Web results rarely is useful. But with news and other published sources, date sorting is easy and helpful. While Google gives the option for a date sort in Google News, it is not the default. Meanwhile, neither Google Books nor Google Scholar even offers the option. Google Scholar's strange "recent articles" addition a few months ago is not much of a substitute since it just limits to recent years and then does another relevance sort. So if a wish is really a command to Google, here's a wish for a real date sort at Scholar and Books and for a default date sort at Google News and Scholar.
Google's sitemaps blog, Inside Google Sitemaps, reports a change in its More Control Over Page Snippets posting. Previously, some sites that appeared in the Open Directory would have their Open Directory description show up after the page title in Google results listings instead of the more common keyword-in-context extract (or as Google calls it, a "snippet"). This could become a problem for sites and searchers when the Open Directory description no longer accurately reflected the content of the page. Now, site owners can determine whether or not the Open Directory description is used by inserting a meta tag. This is not a new idea. Microsoft introduced the possibility back on May 22. And both use the same syntax of META NAME="ROBOTS" CONTENT="NOODP".
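To check whether a page already carries this opt-out, a short script can look for the tag. This is a rough Python sketch using only the standard library; the class and function names (RobotsMetaParser, opts_out_of_odp) are invented here, and treating the content value as a comma-separated, case-insensitive list is an assumption based on how robots meta tags are commonly written, not something stated in the Google post.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects the directives found in <meta name="robots"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us.
        if tag != "meta":
            return
        attr = {name: (value or "") for name, value in attrs}
        if attr.get("name", "").lower() != "robots":
            return
        # Assumption: content is a comma-separated list such as "NOINDEX, NOODP".
        for directive in attr.get("content", "").split(","):
            if directive.strip():
                self.directives.add(directive.strip().upper())


def opts_out_of_odp(html):
    """Return True if the page asks engines not to use its Open Directory description."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return "NOODP" in parser.directives


if __name__ == "__main__":
    page = '<html><head><META NAME="ROBOTS" CONTENT="NOODP"></head></html>'
    print(opts_out_of_odp(page))
```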
For some time now, Google has been inserting suggested results part way down the page. (Danny reported on April 6, 2006 that Google confirmed that it was no longer experimental but an official part of Google.) It only happens occasionally, but when it does, they have a faint line above and below and start with "See results for" followed by the other suggested search term. Then three results from the alternate search are displayed within the faint lines. For example, a search on office has suggested results for 'office shoes' part way down the page. That is not the alternate search I would expect to be suggested, but at least it makes some sense.
On June 29, SEO Speedwagon reported a fascinating problem with these midpage results. For some strange reason, a search on therapy products resulted (and still does today) in a suggested search of 'yahoo.' That makes no sense at all. Try it yourself to see if it still happens, or see my screencast on YouTube.
Bill Tancer from HitWise has several fascinating posts derived from their analysis of Internet traffic patterns. He has one on The top 20 most visited Google sites along with the relative percentage of traffic to each. Due to the interest from that post, he followed up with a similar one for MSN and Yahoo! and then compared each of those three within specific categories. At Google, their Web database got about 80% of the traffic among those top 20 Google properties for the week in question while image search had about 10%. That left only 10% for the remaining properties. Google Directory had more traffic than Google Local. Many other interesting points can be seen in these charts and graphs.
Matt goes on at some length (about 2,000 words) to explain recent changes to Google's crawling and indexing process and the Bigdaddy roll-out earlier this year in the Indexing timeline post on his blog. The comments get even longer, but it is an interesting read which explains, in part at least, why very old supplementary records have hung around in the Google database for so long.
Tara reports on her experimenting with Two New Google Operators and Limited Google Clustering based on a report from India about using a type: prefix that would result in a response about the category the search term falls in along with a source citation. The examples are interesting, but as of a week later, it does not work for me. Presumably, this was another short-lived user interface (UI) experiment.
In the same post, Tara mentions two other posts about another UI test. This one has some "refine results" suggestions at the top of the results page, a feature other search engines have had for years. I wonder if Google will roll this one out or not?
With a "Yes, we are still all about search" title Google announces four new products that are supposed to "enhance and improve the search experience for our users." Try them out to see if you agree.
- Google Co-op is another foray into social networking and collaborative searching.
- Google Desktop 4 is yet another update to their desktop search with an emphasis on many new "Google Gadgets."
- Google Notebook (which is not even in beta yet -- it is due out next week) sounds like another bookmarking and clipping application similar to many others out there.
- Google Trends is initially the most interesting to me. It lets users search on a selection of Google search queries. You can compare search queries (separate them with a comma) or just see a graph of the search volume on a single query. Unusual queries give a "do not have enough search volume to show graphs" response.
The Depth and Breadth of Google Scholar: An Empirical Study from the April 2006 issue of Portal should be available to anyone on a campus with a Muse subscription.
For advanced search geeks, if you've not looked at the Google Hacking Database from "I'm j0hnny. I hack stuff," you are missing a fascinating collection of advanced search tricks. Bear in mind that many of these tricks are designed to find passwords and cracks, but the techniques are well worth perusing anyway.
It looks like both Alexa and A9 have switched from using an abbreviated Google Web database to using MSN's (although it is labeled Live.com which is more of a different front end to the older MSN Search database rather than a different underlying database). At the moment, there is no longer any image search at A9 (previous one was from Google). Nor do I see Google text ads on Amazon anymore.
Someone at Saint Louis University has reported that Google Scholar appears as an option on the Google home page from on campus. This has been going on at least for some since June 2005. I have never seen it at Montana State University, so it must only be some campuses but not all.
When do Google results contain the sublinks underneath the extract? Michael Nguyne explores this in a post Traffic Determines Google UI Snippet Links. See also Barry Schwartz' post at SearchEngineWatch. I am certainly seeing more examples of these sublinks in Google results. Even if these guesses are not correct as to the why of their appearance, at least we now have a name for them: sublinks. See the image below:
Danny has a summary of a French relevancy study which compares Google, MSN, Yahoo!, Exalead, Voila, and Dir.com. By one measure (best relevance of top five results), Google and Yahoo! tie for top relevancy scores. Using a different measure (at least one good result in top five), Yahoo! beats Google by a bit with MSN and Exalead not very far behind.
Like Google Scholar has done for some time now, Google has announced that Book Search will have more "Find it in a library" links to connect to OpenWorldCat records. While I'm glad to see that a librarian at Google makes the announcement, I was disappointed in that few of the records I found had the link. For the many I looked at, it was less than 10%. However, it sounds as if they are planning on expanding that number significantly, and I hope it does increase soon.
On a related issue, I am still disappointed that books scanned via the Google Library program do not give attribution to the library providing the book. Perhaps the libraries prefer this to protect them from legal concerns, but from a scholarly viewpoint it would be good to know which library copy was used. Then the record could be tied to more detailed bibliographic records about the item from that library. In the meantime, try looking at the first few pages for a library stamp if you'd like to know the originating library.
Google Scholar has announced the expansion of their library links options (available under scholar preferences) to include library union lists in Hungary, Iceland, Israel, Portugal, Sweden, and Switzerland.
With the latest version of Google Desktop, several new features and new concerns arise. Desktop 3, announced on their blog today, gives users the ability to drag panels from the sidebar to place them elsewhere on the desktop. Sidebar content can be sent via email or chat to other users. Desktop 3 now indexes zipped files, adds an advanced search form, and allows more advanced search commands. The controversial new feature is the use of Google servers to enable searching across multiple computers (say your home and office desktops). This is not enabled by default, but if it is turned on, data has to be sent to Google servers for this to work. Anyone with concerns about sharing such data should probably not enable this. New search commands include under: for searching a specific folder and machine: to specify which computer (if search across computers is enabled).
And on the search side, Desktop 3 allows searching across multiple computers by storing
Gary reports on the addition of related phrases at the top of the results when using Google's
A mere two and a half years after its initial launch, Google says that "We're taking Google News out of beta!" Not all 22 regional editions of Google News are out of beta, but at least many of the English-language ones are. With the move out of beta comes a feature that suggests stories for users of the personalized news page who keep search history and personalization enabled. The stories will show up under a "Recommended for. . . " heading.
It looks like Google has increased the font size for the headings on the ads that appear on the right-hand side of search results. Google Blogoscoped has screenshots comparing the before and after. Based on my own before and after comparisons, the ad text itself and the URLs are the same size, but the header, which is the only part of the side ad that is a link, is now larger. Previously (in 2003), the side ads had colored backgrounds and the entire colored box was a link. In those days, the ad text was also smaller and lighter colored. Now, the side ads are the same text size and color as regular results, but they just have less information (less descriptive text and no file size, for example). Probably made to increase click-through rates, the change makes it a bit more difficult to distinguish the advertisements from the regular Web search results.
Google (and other search engines) have long had a peculiar inability to count their results. It is always an estimate of "about" some rounded number. Repeating a search term can change those numbers, and Danny speculates on some of the reasons why. Just remember, we wouldn't want consistency from Google, would we?
For users of Google's Personalized Search and their search history, a new experiment allows for the removal of pages and entire sites from the results. Matt posts about the new remove result option. Unlike Yahoo!'s block site feature which this mimics, there is no listing of the sites and pages that have been blocked.
For those Google toolbar users who also use the Mozilla Firefox browser, there is now a Google Toolbar available for Firefox along with some other Google Extensions for Firefox. With the Firefox built-in search box in the upper right corner, many users may not feel a need to install it.
Want to see different results at the top than others do? Try Google's new Personalized Search (in beta of course). "Personalized Search orders your search results based on what you've searched for in the past." This new project from Google Labs was announced on the Google Blog. You may have to stay logged in for a while and build up a history pattern before you see any change in the results order.
At last Google Print now has its own search form, and you can get more than three book results at a time. Go to print.google.com for the search form.
After many years of promoting its "laser like focus on search" and success as an anti-portal, Google is making another step towards being a portal. With the new "Personalize Your Homepage" available from Google Labs, you can customize all kinds of additional information on your personalized version of the Google home page. See the Google Blog post for more.
Google Scholar has opened up the ability to add OpenURL link resolvers to Scholar and have them automatically turned on based on campus IP address ranges. According to their blog post, over 100 academic libraries are already included, and the Support for Libraries help page at Scholar has more details. Most libraries should also check with their link resolver company who may be able to create the appropriate files to enable the links.
According to Urchin, Google Agrees To Acquire Urchin, an analytics tool used to understand users' experiences, optimize content, and track marketing performance. According to the press release:
"We want to provide web site owners and marketers with the information they need to optimize their users' experience and generate a higher return-on-investment from their advertising spending," said Jonathan Rosenberg, vice president of product management, Google. "This technology will be a valuable addition to Google's suite of advertising and publishing products."
While Google still does not release a list of its sources for Google News (apparently secrecy is "not evil"), an interesting hack is available from PrivateRadio.org that runs a PHP script every 15 minutes and records the sources on the Google News home page. Started March 24, 2005, by today it lists over 1,000 sources which can be sorted alphabetically or by frequency of inclusion on the Google News home page.
Google announced the launch of Google X, an interface that looks like Mac OS X by putting icons above the search box that, when moused over, grow larger and name the service. This does make it easy to have more links above the search box. Unfortunately, after launching this, Google has subsequently removed it. No official word, but many presume it was removed due to copyright concerns or complaints from Apple. Anyway, for a while at least, there is an unofficial mirror in France for the curious.
Google has opened up a new site, Code.google.com, on which they are providing access to developer-oriented programming libraries and tools. This is intended as a site for "external developers interested in Google-related development." They plan to publish free source code and a listing of their API services. This will be of interest primarily to programmers or those who might play with the APIs. The initial projects include a Core Dumper, a Sparse Hashtable, and Perftools.
In its continuing move towards a portal, Google now lets users customize some aspects of the Google News front page. Users can re-arrange sections and even add customized sections with up to 9 stories based on a particular query. More information is available in the Google News: Customized News FAQ. This is available in the 9 languages and 22 local editions of Google News. While these changes certainly make Google News more useful as a starting point for news, it could use an option to reduce the size of each listing. There is a "show headlines only option" which removes the first sentence and images. But it is the extra links to other sources for the same or similar story that take up too much screen space. And as Chris notes, there is no RSS feed option.
Not only is there now an updated version of the Google Desktop Search client, but it is no longer in beta. It only runs on Windows XP or Windows 2000 SP 3 or above. But the new version indexes more content (but still not everything). New content types include Netscape Mail, Thunderbird Mail, Netscape/Firefox/Mozilla Web browsing, PDFs, and any meta tags associated with music, image, and video files.
A recent article "Google's Cookie and Hacking Google Print" describes techniques used to write a script that can create PDFs of entire copyrighted books from Google Print. [More comments on it at Kuro5hin.] The full code is not available and the author let Google know about the issue, but the point is that despite some clever programming on Google's part, there can be numerous ways of getting around the copyright restrictions once a book is in a publicly-accessible electronic format. Not a problem for the out of copyright books, but for those still under copyright . . .
There are problems reported with Google's Wildcard Word in a Phrase. The problem is that the asterisk seems to represent either zero or one word. It used to represent exactly one word. For example, "a little * * * mischief" used to find only "a little neglect may breed mischief" or a similar phrase of six words. Now it also finds pages with just "a little mischief." The cache copy on those pages says that the search terms only appear in pages pointing to the resulting page, but that does not seem accurate. I think that what now happens is that in addition to the way it used to work, Google now also ORs the results of the same search as if the asterisks were not in the query.
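The difference between the two wildcard behaviors can be modeled with ordinary regular expressions. The Python sketch below is only an illustration of the "exactly one word" versus "zero or one word" readings described above; it says nothing about how Google actually implements phrase matching, the function name phrase_regex is made up, and it assumes the phrase starts with a real word rather than an asterisk.

```python
import re


def phrase_regex(query, zero_or_one):
    """Build a regex for a phrase query in which * stands for a word.

    zero_or_one=False models the old behavior (each * is exactly one word).
    zero_or_one=True models the reported behavior (each * is zero or one word).
    Assumes the first token of the query is a literal word, not an asterisk.
    """
    pieces = []
    for position, token in enumerate(query.split()):
        word = r"\w+" if token == "*" else re.escape(token)
        # Every word after the first must be preceded by whitespace.
        unit = word if position == 0 else r"\s+" + word
        if token == "*" and zero_or_one:
            unit = "(?:" + unit + ")?"
        pieces.append(unit)
    return re.compile(r"\b" + "".join(pieces) + r"\b", re.IGNORECASE)


if __name__ == "__main__":
    query = "a little * * * mischief"
    old_style = phrase_regex(query, zero_or_one=False)
    new_style = phrase_regex(query, zero_or_one=True)
    for text in ("a little neglect may breed mischief", "a little mischief"):
        print(text, bool(old_style.search(text)), bool(new_style.search(text)))
```

Under the zero-or-one reading, a phrase with one or two words in the gap would also match, which is broader than simply ORing in the search without the asterisks, so the two explanations could be told apart by testing phrases of intermediate length.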
Continuing with its trend to add more portal features, Google now has quick access to current weather conditions and a four day forecast for U.S. cities and ZIP codes. Strangely enough, the weather information does not link to a source for more detailed weather information for the locality, even though the Weather Underground (from which Google gets its weather information) does have more detailed conditions and forecasts. This is also available via SMS by sending a text message to the U.S. five digit shortcode 46645 (GOOGL on most mobile phones) followed by the weather query. As Gary notes, Google is finally catching up with a feature that AltaVista offered back in 2002.
Awhile back, Google changed its default dictionary links (from the 'definition' link in the upper right corner that sometimes appears after a single term query or from the linked search terms that appear in the same spot for a multiple term query). Those used to go to Dictionary.com. Now they go to Answers.com, powered by Gurunet. Now, as Gary Stock reports, the links to definitions no longer appear if the word is a plural. Compare the search for test to the search for tests.
Google has just launched a search shortcut to help users access local movie showtimes in the U.S. along with film information and reviews. The service is available from any Google search box and via SMS to 46645 or GOOGL on many phones. To find information and reviews, use
movies: followed by a word or words from the movie's title. To find local movie listings use the shortcut
showtimes followed by a ZIP code or U.S. city name.
Google Scholar has added a Scholar Preferences page which lists a few dozen academic institutions. Up to three can be selected, and those institutions' OpenURL links will be shown on individual records. They say "Institutional access is currently a small pilot project," which means that if you are not on the list, you probably will not be able to get on it any time soon.
Google has a beta of the 3.0 version of its toolbar. This is still only available for Internet Explorer 5.5+ running on Windows 98 or higher. There is still no Mozilla Firefox version. New features include spelling correction, a word translator, and auto links. If enabled, the auto links will provide additional links from the page when an address (link to Google Maps) or certain numbers appear on a Web page. The numbers include ISBNs (links to Amazon) and package tracking and vehicle identification numbers (link to Google's search by numbers searches).
In its continuing drive towards providing more portal style information, Google has now launched its own Maps project. This beta version uses data from NAVTEQ like many other Web mapping tools. It allows for zooming and dragging the map. It only covers the U.S. and Canada at this point. It can be searched by ZIP code and can map directions between two points. Gary has a more detailed analysis.
The local search from Google is now on the main page, as one of the "tabs" above the search box that can lead to other Google databases. This move is good for the U.S. and Canadian versions of Google where the local database is available.
Gary notes a few changes that have appeared on Google's Advanced Search help page (not on the Advanced Search page itself). It has added more instructions and a section on search operators, changed the "~ search" heading to "Synonym Search," and renamed "Domain Restrict" to "Domain Search."
The definition links offered by Google that used to point to Dictionary.com entries now go to Answers.com instead.
Google has added a video search tool. Unlike the recently unveiled Yahoo! video search, which looks for available video files on the Web, Google's video search is more of a television search since it searches TV closed captioning from CSPAN and San Francisco TV stations. It does not provide access to either the transcripts or the video of the shows "at this time" but only includes some screen shots and KWIC text. Still, this could be useful for finding text occurrences within broadcast TV shows.
Google has finally upped its 10 word query limit to 32. As Tara reports, Google News retains the 10 word limit.
Google has changed its Google Print program for books. Formerly, extracts from books were included in regular search results. Today, Google has announced the expansion of its Google Print Program. While it sounds more similar to what Amazon has done for some time with its Search Within a Book capability, the results are now no longer within regular Google listings. Instead, the links are above the regular search results and only show up for certain queries. Try using
books about or
books about followed by some word or certain titles such as
king lear (but not yet
macbeth). At the top, "Book results for" will show with a four book icon and up to three book links. Click one of these to see how the new interface looks. I find several links that just go to a message saying a page is not available, but the service is still new and will likely change significantly in the weeks to come. See example of new Google Print posting for more examples.
Back in the depths of Google's history, their cached copy of Web pages included two dates: the date when Google crawled the page and the reported date stamp on the page at that time. Then, both dates disappeared as Google realized that they showed how old some parts of their database were. Now that they have greatly increased the freshness of their database and revisit more pages more frequently, they have finally added back some date information. The top line in the cache now gives the date Google last crawled the page. It is a welcome and useful addition.
To get a Google Glossary definition used to require using either the define: prefix or just adding a define in front of a query term. Now it will also appear when "What is" precedes a term.
Google has added a text only cache version. After displaying a regular cached page, look in the header for a "Click here for the cached text only" link to see the cached page with just the text and without any images. This is discussed in more detail in a Search Engine Watch forum posting.
Danny reports on a strange bug at Google which has allowed someone to remove the home page of several sites such as Microsoft and Adobe from the Google database. Danny received the following confirmation from Google: "We can confirm that less than 10 websites were inadvertently removed from Google's index for several hours [Thursday]. All of these sites have been restored and are accessible through a Google search. The removal occurred as the result of an outside attempt to abuse Google's automated web page removal tool -- a free service we provide webmasters who would like to remove web pages they own from Google's index. Upon discovering this bug, we fixed it immediately. We will also perform a thorough analysis to ensure additional web pages were not inappropriately removed."
Google has started up a weblog at www.google.com/googleblog, providing one more place to check for news.
Must be nice to be rich. Google figures it can afford to offer free email along with 1 GB of storage for each user, with all of it financed by context-sensitive text ads. Just what I want. I can read my spam and see ads for more. However, at this point, GMail is not open to everyone and is by invitation only.
Sometime recently, Yahoo! has dropped the Google image database and is using their own, which is basically the one that has been available at AltaVista and AlltheWeb for the past few months. Yahoo! UK is still showing pictures from Google's image database, but I'm guessing that the new Yahoo! image database will slowly be rolled out to all the other Yahoo!s soon.
After experimenting for several months, Google has launched its new look today. While the appearance is not too different, one significant change is the removal of the Directory tab and the addition of the Froogle shopping tab. Google's directory is still available, but it is much more difficult to get to. Froogle is still in beta, but now it is being emphasized much more. Other changes include the removal of the "tab" look (which makes the links to other Google databases a bit less obvious), the removal of the color background on the side ads, making less of the ads clickable, and putting the search query, definition, and count on the right of the header bar. Google News now includes some thumbshot images in the search results (something AltaVista had done a few years ago).
Two other new Google Labs initiatives are the Google Personalized Web Search and Google Web Alerts. The Personalized Web Search lets users choose certain preferences and then use a slider to re-rank their results based on those preferences (but not one at a time). The Web Alerts offer email alerts about new search results. These are run either once per day or once per week.
With all the cosmetic changes and bad news this week, I am pleased to see some new and potentially very useful syntax from Google. The number range search lets you search for a range of numbers, say for any number between 5 and 11. It even searches for numbers with and without commas and includes decimals such as 7.23. The number range command consists of a smaller number, two periods, and larger number which can be used in conjunction with another search word, as in
score 5..11. Adding a dollar sign invokes the price range search, which actually searches for the dollar sign (although it does not yet recognize the pound (£), yen (¥), or euro (€) characters), as in
good books $5..11. See the new number searching section of my Google review for more details.
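To make the number range syntax concrete, here is a rough Python sketch of how a term such as 5..11 or $5..11 could be interpreted against a piece of text. It is an illustration of the behavior described above, not Google's implementation; the names parse_range and text_matches are invented for the example, and it assumes U.S.-style numbers (commas as thousands separators, a period for decimals).

```python
import re

# A range term: optional dollar sign, smaller number, two periods, larger number.
RANGE_RE = re.compile(r"^(\$?)(\d+(?:\.\d+)?)\.\.(\d+(?:\.\d+)?)$")
# A number in page text: optional dollar sign, digits with or without commas, optional decimals.
NUMBER_RE = re.compile(r"(\$?)((?:\d{1,3}(?:,\d{3})+|\d+)(?:\.\d+)?)")

def parse_range(term: str):
    """Return (needs_dollar, low, high) for a range term, or None if it is not one."""
    match = RANGE_RE.match(term)
    if match is None:
        return None
    return bool(match.group(1)), float(match.group(2)), float(match.group(3))

def text_matches(term: str, text: str) -> bool:
    """True if the text contains a number inside the range given by the term."""
    parsed = parse_range(term)
    if parsed is None:
        raise ValueError("not a number range term: " + term)
    needs_dollar, low, high = parsed
    for match in NUMBER_RE.finditer(text):
        if needs_dollar and not match.group(1):
            continue  # price range search: the dollar sign must be present
        value = float(match.group(2).replace(",", ""))
        if low <= value <= high:
            return True
    return False

print(text_matches("5..11", "final score 7.23"))
print(text_matches("1000..2000", "population 1,250"))
print(text_matches("$5..11", "good books for 7 dollars"))
```

The last call shows the price range distinction: a bare 7 does not satisfy $5..11 in this sketch because no dollar sign precedes it.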
The beta Google Print project has added some magazine articles to the book extracts it has had previously. I am not sure when they first started adding these, but I have not seen them before today. It looks like it includes some short full text articles from several Reed Business Information publications such as Electronic News, Test & Measurement World, Library Journal, and Publishers Weekly. The title of each of these is preceded by [MAGAZINE] instead of [BOOK]. Try a search such as site:print.google.com magazine to see some more.
Compared to any of the major full-text databases from Gale, Ebsco, ProQuest, or Factiva (and often available at public and academic libraries), this is a very small collection of articles. Like most of the Google Print records, few searchers are likely to even come across these. Even the free (and significantly smaller) FindArticles.com and MagPortal databases are better than this, but it will depend upon how much Google expands their collection (and whether they offer a separate interface to just these articles and book extracts).
Why is Google launching its Local Search out from Google Labs and integrating it with general search results? Ad revenue opportunities certainly must have played a part in the decision. While their press release focuses on how it helps users find more local information, the ad revenue possibilities have been pushing many local search efforts. With the launch, if a search includes a U.S. location term (like a ZIP code or the name of a town or city) in addition to some other search term, the Local results will display near the top (in a similar location to news headlines), as on a search like springfield books. Or go direct to the Local Search to see just local results. Personally, I still find the paper (or even electronic) yellow pages much more useful and comprehensive. There have been many attempts at local search in the past (Northern Light having one of the more interesting implementations), so we will have to see how popular (profitable?) Google's will turn out to be.
As has long been expected, Yahoo! has announced the launch of its own search engine database and dropped Google. After using AltaVista, then Inktomi, and then Google to deliver search results after directory listings (and now that they own Inktomi, AltaVista, and AlltheWeb), Yahoo! now uses its own database. It appears to be primarily from Inktomi, but its results differ from MSN Search and HotBot which also use Inktomi. Several positive comments at first look:
- It still has cached copies of pages
- It is a large database, sometimes finding more than Google
- Most advanced search features still work
As noted at ResearchBuzz, Google's site field search no longer needs to be combined with another word. Previously, to search Google for all pages at whatever.com required using a search like
whatever site:whatever.com since the
site:whatever.com search would give an error result. Now
site:whatever.com will work.
In keeping with a sudden frenzy of new initiatives, Google is now starting to include records and extracts from published books along with a few connections into library holdings information. These two initiatives are currently separate from each other, and since they are both experimental, they may change or stop appearing at any time. Neither one tends to show up in search results very often, but here are a few links to see what they look like.
First, the Google Print inside the book content, which is not as useful as the Amazon Search Inside the Book since it only includes extracts. Note that the text actually resides on Google's servers. See the Google Print FAQ for more information.
Second, Google has some links to OCLC's Open WorldCat pilot project. It took a while to find a search that would retrieve one, but Maureen Whitebrook toads seems to work. After finding such a record, the user will need to enter a ZIP code to have it identify local libraries holding the particular book.
While at first glance, both of these book-related efforts seem like good ideas, they may well just confuse the sense of what Google is indexing. Most library patrons are still better served by checking directly with their library's own catalog. And book buyers are likely better served at Amazon or another book retailer.
Google is now featuring its shopping search engine more prominently, just in time for the end of the Christmas shopping season. It is not only advertising Froogle directly on the main Google page, but at the top of some search results (only for specific query words), Google will list "Product Search" results. They have been experimenting with this since earlier this month, but these results are now live. As with other recent Google initiatives, it is a bit of a guessing game when the Product results will show. A search on wooden spoons had no Product links while tea kettles had the standard three (even though wooden spoons has more than 100 results in Froogle).
Google has added a few more shortcuts for specific number searches and for airport travel conditions. Basically, five databases will have shortcuts: U.S. Patents, UPS Tracking Numbers, FedEx Tracking Numbers, FCC Equipment IDs, and FAA Airplane Registration Numbers. Note that some require a prefix like patent, fedex, or fcc while others do not and the airport weather needs the suffix of airport. Not all of the examples given work, or they only work at some data centers, but since it is a new feature, those bugs should be worked out soon.
Also, Google is trying out a new design and look on a very small portion of searchers. Whether Google will decide to implement the new look in this screen shot (or here or here) remains to be seen. But based on these samples, it looks like they are experimenting with doing away with the tabbed interface and moving those links above the search box, removing the color background on ads, and adding a "define" link after the search terms. Or is this just a response to Danny Sullivan's predictions of multiplying tabs?
Since the most recent Google Dance started around Nov. 15, this update of the Google database nicknamed Florida has created quite a stir in the ecommerce Webmaster community. The major complaint has been the significant change in the ranking of results and many pages no longer show up in the top of the search engine results. For those with time, read the thousands of postings about it in the Update Florida discussions at WebmasterWorld.
Certainly, the ranking changes will also have an impact on searchers, but even more significant to me is the experimentation that Google is now doing with automatic stemming. Discovered first in the Cre8asiteforums, Google changed its Basic Help page to announce that it is now using stemming.
Basically, Google now takes search terms and looks for grammatical variants of SOME of them. Unfortunately, Google does not make it clear which terms it stems and which it does not. I found no plural or singular variants but did find some examples of verb variants. For example, a search on drink water matches pages with 'drinking' and 'water' while run linux also finds 'running.'
You may be able to identify when it happens by looking for the highlighted terms in the search results, but it is not always obvious when this occurs especially if the hits do not rank high enough to appear on the first page. The stemming does not seem to occur on single word searches or on phrase searches (yet another reason to use quotes for phrase searching whenever possible).
Does this help relevance? Maybe for some searches and searchers, but for precision searching it can also be frustrating. Plus, the searcher is not given the choice of when to use it and when to turn it off. MSN Search has offered a stemming check box on its advanced search page for years. Since Google does not say when they will turn on the stemming and when they do not, they could at least give searchers the choice of when to use it (at least for those of us that use features like the preferences and advanced search options).
Expanding on its success with its toolbar, Google launched a new Google Labs experiment today: the Google Deskbar. Rather than a browser add-on, like the HotBot deskbar, it appears in the Windows taskbar and can function independent of the browser. It can be used for many Google functions, including the calculator, definitions, Web searches, news, groups, Froogle, and more.
Unfortunately, it still only works for those with Windows 98 or higher and requires Internet Explorer 5.5 or higher. It displays the results in a mini-viewer instead of the full browser, and because of that can be faster than opening up a browser to see results. However, the mini-viewer can send a page to a user's default browser which does not have to be Internet Explorer. Even so, like the toolbar, it is yet another way that Google tries to get users to rely primarily on its search services rather than others.
The announcement I received from Google, but which is not on their site, reads as follows:
Today, Google released a new Google Labs experiment called the Google Deskbar, a search application that enables PC users to perform Google searches at any time from any application.
The Google Deskbar is a free software download that appears as a search box in the Windows taskbar at the bottom right of most Windows-based PCs. Users enter queries into the search box and results are automatically displayed in a small pane that rises above the Deskbar and overlays a corner of the application they're using.
The Deskbar provides instant access to information on the web, from any application. For instance, a user in the midst of typing an e-mail can check facts or find definitions by simply entering words and phrases into the Google Deskbar. Additionally, typing Ctrl + Alt + G automatically positions the cursor into the deskbar search box, enabling users to search instantly without having to move the mouse. When users highlight text on a page and press Ctrl + Alt + G, the highlighted text is automatically inserted into the search box.
Forward and back buttons to the top left of the Deskbar pane enable users to easily click through results pages, and a small arrow-shaped link launches a browser for users who wish to view results in full screen. The deskbar menu offers links to all Google services and to helpful web resources such as definitions, stock quotes and other useful information. Users can customize these links via the Options menu. This is an English-language only product and is available for Windows users running IE 5.5 or higher versions.
We're excited about experimenting with new technologies that make it faster and easier for people to connect with the information they need. With the Google Deskbar, users get a great search experience without moving their fingers from the keyboard, from whatever application they're currently using.
With Amazon's launch of a searchable database of the full text of over 120,000 books, it comes as no surprise that Google is also in talks with publishers to do something similar. Publishers Weekly reports that Google has been in talks with publishers and that Google "has reached agreements that allow it to enter as many as 60,000 titles in its database and also presented extensive mock-ups to publishers of how book-relevant searches will look."
On top of talking to publishers, Google is also working with OCLC to include a subset of OCLC's WorldCat database of library holdings in regular Google results. According to an Information Today NewsBreak, these results could start appearing at Google in November. This is part of the Open WorldCat Project.
How Google will implement either initiative, if at all, will be interesting to see. If none of the other search engines do something similar, then Google will have a unique component to its database with library holdings records and/or full-text search access of books.
About.com's owner PRIMEDIA announces that it has entered into a four year agreement with Google to place Google AdWords ads on the About.com meta sites. In addition, part of the deal is that Google is buying About's Sprinks (the current pay per click ad network running ads on the About.com sites). The Google ads are not yet appearing on About, but if it cuts down on the pop-up ads and the very heavy advertising that currently appears on About.com sites, it will be a welcome relief for anyone that tries to view the quality text content on those sites. This deal also shows Google's continued movement into being an advertising network.
Google has moved one of its Google Labs projects into the mainstream. The Google Glossary function is now available directly from Google in two ways using "define." Enter a search that starts with "define" and the first Google Glossary result shows at the top. For example,
define environmental protection agency. To see all the definitions, use "define:" as in
define:environmental protection agency. For phrases, it makes no difference whether quotations are used or not. This can work well for acronyms, too.
Note that the definitions found come from an automatic pattern recognition program that tries to identify definitions on Web pages. Many of these are inaccurate and some are just plain wrong. Use this for getting a sense of common definitions on the Web, not for a definitive answer, unless you trust the originating Web site.
Google Alert announces new delivery options. "Results can now be delivered as email, HTML, RSS 1.0, RSS 2.0 or TrackBack feeds." It also now includes direct links to Google's cache.
Google announces that AOL has agreed to continue using both the Google Web database and Google's ads. Called a "multi-year alliance," this renews the AOL deal that started in May 2002 when AOL announced a switch from Overture ads and an Inktomi Web database to Google for both. The Google Web database did not go live on AOL until July 31, 2002, so it has only been a bit over a year since AOL switched to Google.
With the renewal comes several changes to AOL Search:
- Addition of the Google Images database (with "strict" filtering on) and as a separate tab
- New People Search tab for AOL members for searching AOL Chat Rooms, Message Boards, Home Pages and Groups
- Local searching capabilities for AOL members
- Popular Searches, in a box listed as "Hot Searches"
The directory continues to be the Open Directory. The Google Web database has the English language limit turned on by default. No advanced search page is available nor any ability to change defaults. Two pages per site are shown, but without the indentation available at Google.
AOL offers no compelling reason for professional searchers to go to their site rather than direct to Google. Of more interest is the AOL Hometown page which has a separate search interface to AOL "Journals" (i.e., blogs) and member home pages. While some of the member home pages show up in the regular AOL Search, Google, and other search engines, the journal pages and some of the member home pages do not. So you may find additional content via AOL Hometown that is not elsewhere searchable.
Usually, search engines will replace all punctuation marks with a space when they index Web pages. And if you use a punctuation mark between words in a query, the search becomes a phrase search. In other words, a search on
import-export is the same as
"import export". However, Google has a couple exceptions to this rule for two characters: the ampersand & and the underscore _. Both can be searched by themselves or as part of a character string. In other words, a search on
adv_search gets different results than
"adv search" and
&tc differs from
tc. And for programmers, while it would not search # or + in most cases, it does
c. It does not, however, differentiate
c and both
c+. Other punctuation marks may change the sorting of results. So Google does some different treatment of punctuation marks, and it has changed over time as well.
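The general rule and its two exceptions can be sketched as a tiny tokenizer in Python. This is only a model of the behavior described here, not how Google actually indexes pages: the names tokenize and same_search are invented, the sketch handles only ASCII letters and digits, and it leaves out the special handling of programming-language names.

```python
import re

# Assumption for this sketch: only & and _ survive as searchable characters;
# every other punctuation mark is treated like a space.
TOKEN_RE = re.compile(r"[a-z0-9&_]+")

def tokenize(query: str) -> list[str]:
    """Lowercase the query and break it at spaces and other punctuation marks."""
    return TOKEN_RE.findall(query.lower())

def same_search(query_a: str, query_b: str) -> bool:
    """Treat two queries as equivalent when they yield the same word sequence."""
    return tokenize(query_a) == tokenize(query_b)

print(same_search("import-export", '"import export"'))
print(same_search("adv_search", '"adv search"'))
print(same_search("&tc", "tc"))
```

In this model the hyphenated query and the quoted phrase reduce to the same two words, while the underscore and ampersand queries stay distinct from their plain counterparts.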
I've updated the "unique" section of my Google Review. I also updated the site search page by finally removing the defunct Northern Light search box and the defunct xrefer search box. I also updated the Reference Search Tools page.
Today, Google announced its acquisition of Kaltix Corporation. Formed just this past summer in June, Kaltix has been working on developing search technologies related to personalization and context-sensitive searching. What Google will actually do with this technology remains to be seen, and it will likely take a while before it is implemented for the public. And given Google's number of products, if the technology is used, it might be for their ads, news, or their shopping database rather than for their general Web search engine.
Google is now experimenting with a new Search by Location in Google Labs. They are finally catching up with a feature that the old Northern Light had years ago. Google has added a map of locations for the hits from MapQuest. It highlights matching addresses in the keyword in context (KWIC) display, but there is no cache link. At this point, it seems to be limited to U.S. addresses. Searches must include some address information. Full state names and ZIP codes appear to be normalized to a city, state abbreviation search. In other words, the address can be entered in a variety of formats, but the KWIC highlighting usually only highlights the city and state abbreviation in the record, at least on the searches I tried.
This is still very much an experiment. It may not always be available. It may change greatly. Certainly at this point, some of its matches for locations are quite inaccurate.
The ever-experimenting Google has added two more experiments that a very small portion of their users may see. First, they are finally experimenting with giving suggestions for "related searches." This is one feature that they could have added long ago and that many other search engines have offered for years. But the few early experiment reports have not been very impressive and seem to have poorly related suggestions. Presumably this will be much improved before it is released, if they ever release it at all.
Then there is Google's Spectrum, which is a Google search counter. Users can see how many Google searches they run each day. It is not yet publicly available, but according to Google is just "an experiment we've just started running with a small sample of Google users."
Both of these are not available to the vast majority of Google users at this time.
Well, it looks like it took AlltheWeb announcing a larger size than Google to get Google to finally update its claim to 3.3 billion. I don't know if this is related or not, but I have suddenly found a few hits that Google labels as "Supplemental Result" right before the cached link as in the last record on this search. I'm not sure what this "Supplemental Result" is supposed to be, but the URL is a dead link. I certainly hope Google did not just boost its numbers by adding a bunch of dead links. I rather doubt this is the case, and another such search found a record with the same label, but that one is not a dead link.
Wait . . . a little more searching turns up an answer on Google's How to Interpret your Search Results help page where it says that
"Google augments results for difficult queries by searching a supplemental collection of more web pages. Results from this index are marked in green as 'Supplemental.'" I am assuming that this is new, in part because doing a site search at Google for "supplemental" results in zero hits, even though they have a help page with that answer. But I'd like to see more description somewhere of what this supplemental collection consists of.
Back in May, Google's intitle: and inurl: were not working properly, as I posted earlier. Well, they now seem to be working again. A search that combines a general query term with these field searches, like "market research" intitle:tourism, now works. I've updated my Google Inconsistencies page to note that the problem has been fixed, but I added another report of a strange result for the simple query of 'cameras.'
Following in the footsteps of AlltheWeb, Google now has a built-in calculator function. It lets you use numbers or the word for the number for mathematical equations, unit conversions, and physical constants. Only a brief description of the functions is available on the calculator section of the help page. One "Easter Egg" in the calculator comes up when searching answer to life the universe and everything where it displays '42,' the answer from Douglas Adams' The Hitchhiker's Guide to the Galaxy.
Google has added an alert service for its news databases. The Google News Alerts is in beta and is also listed on the Google Labs page. With the demise of other free alert services, especially Northern Light's current news alerts, this is a great addition for anyone who wants to keep up with the latest news. Just be careful not to choose search terms that will return too many hits. The default "once a day" option should help if you do, but be careful with the "as it happens" choice.
Google has introduced a new operator, the tilde ~, for searching for synonyms. It should be placed immediately before a search term, with no space, for which you want Google to look for synonyms. For example, a search on query ~analysis finds matches with query statistics and query analyzer. A brief entry about ~ is available on their help page.
Using some of the technology behind the Google Sets, the ~ seems to include plural and singular forms as well as synonyms. Use the - operator to get a sense of what synonyms have been searched, as in ~hiking -hiking. Some of the automatically generated terms may not be helpful, but when you are not aware of the vocabulary in a field, this could be quite helpful.
Google review updated. I put the synonym operator under the Truncation section, since that is at least one use that can be made of it.
Google has finally added an advanced search page for its news database. It includes options for sorting by date, specifying the news source, a location limit, a date limit, and field searches for headline, body, and URL.
For more than a month now, the intitle: and inurl: field searches have been broken. I first heard of this on May 27, 2003. The advantage of intitle: and inurl: over the advanced search page Occurrences section or the allintitle: and allinurl: field searches was that they applied to only a single term and could be combined with other search terms that would look through the record. So now, searchers cannot do a search that looks for one word in the title and another in the body. A search that tries, like "market research" intitle:tourism, retrieves many results that do not include 'tourism' in the title.
At first I thought this was a temporary glitch from the strange May update, but it has persisted through the June update and has continued for some time. Hopefully it will be corrected sometime soon. I've updated the Google Inconsistencies page with this problem and several other long-term problems.
In addition, I updated several parts of the Google Review, including the addition of several language limits added in early 2002 that I had missed: Croatian, Indonesian, Serbian, Slovak, and Slovenian.
It appears that Google's spider is not only checking robots.txt files, it is also indexing and even caching some of them. Try a search on
allinurl:robots.txt to see some examples, or see the cached copy of the Salon.com file.
It would be interesting to know why they are doing this. Other search engines, like AlltheWeb will index robots.txt files that do not follow the protocol as in the search for
disallow user-agent url.all:robots.txt. (The results either have the robots.txt file not located in the root directory or the filename is not all lower case.) But with Google not only indexing the content of the files but also saving cached versions, this opens up some interesting applications for searching for sites that exclude specific bots and also to track changes in a robots.txt file for a specific site by comparing the cached version to the current version.
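Comparing a cached robots.txt with the current one is a plain text diff. The sketch below uses Python's standard difflib module. The two sample files are made up for illustration, and fetching the cached and live copies is left out, so treat it as one possible approach rather than a finished tool.

```python
import difflib


def robots_diff(cached: str, current: str) -> list:
    """Return unified-diff lines between a cached and a current robots.txt."""
    return list(
        difflib.unified_diff(
            cached.splitlines(),
            current.splitlines(),
            fromfile="cached",
            tofile="current",
            lineterm="",
        )
    )


# Hypothetical file contents, not taken from any real site.
cached_copy = "User-agent: *\nDisallow: /private/\n"
current_copy = "User-agent: *\nDisallow: /private/\nDisallow: /archive/\n"

for line in robots_diff(cached_copy, current_copy):
    print(line)
```

Lines that start with + were added since the cached copy was taken, and lines that start with - were removed.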
How long this may remain available will depend on whether this was intentional on Google's part or simply a mistake. Since some of the KWIC extracts (snippets) show some code such as
that are not actually in the original files, I suspect that it may be either a mistake or that it just still has some bugs that need to be worked out.
Following up on Hotbot's announcement yesterday and Infospace's the day before, here comes Google with a beta of version 2.0 of the Google Toolbar. The new version has several new features including a pop-up blocker (which counts how many it has blocked, something I really do not want to know), the ability to automatically fill out forms, and a BlogThis! button to instantly comment in your blog on the page you are viewing. Of course BlogThis! only works if you have a blog on Google-owned Blogger. The toolbar only works with Internet Explorer and on Windows.
Google the advertising company is now moving beyond search-related ads into content-based sites with an affiliate program called AdSense. The self-service program makes it easy for Web masters and Web publishers to put Google ads on their site and share the ad dollars. What ads get put on the participating sites? Google uses its link analysis techniques to try and match appropriate advertisers with the right publishers. How well that will work and how profitable it may turn out to be for both Google and the publishers remains to be seen.
FindWhat, another ad bidding engine like Overture and Google AdWords, is buying up Espotting, an ad bidding engine that has focused on Europe, for about 8.1 million shares of FindWhat.com stock and about $27 million in cash for a combined valuation of about $163 million according to their US press release. The combination of the two may help FindWhat become a more serious competitor for the search engine ad space to the two big companies: Overture and Google.
Pandia reports that "Google has started using automatic redirect scripts directing non-US users to the relevant national versions of the Google site." Even U.S. users should bear this in mind when traveling outside the country. Fortunately, as Pandia notes, you can still get to the main U.S. version with an address such as http://www.google.com/webhp?hl=en.
SearchKing had sued Google in Oct. 2002 due to a loss of PageRank and a subsequent drop in ranking for its site at Google. On May 27, 2003, the U.S. District Court for the Western District of Oklahoma granted Google's motion to dismiss. Although the court case had always seemed to many to be without merit, at least SearchKing deserves some credit for being willing to post the decision on its own site.
Reuters reported on May 5 that Google CEO Eric Schmidt says that "soon the company will also offer a service for searching Web logs." Andrew Orlowski then interpreted the comment to suggest that blogs may be separated into a distinct database. It is too early to call that anything more than a guess, but it will be interesting to see what Google ends up doing. In the meantime, Daypop, Feedster, and others listed on my Other Internet Search Engines page provide very useful searchable access of blogs.
OK, actually Google has purchased Applied Semantics (the company formerly known as Oingo). Their press release quotes Sergey Brin saying "This acquisition will enable Google to create new technologies that make online advertising more useful to users, publishers, and advertisers alike." So the purchase is helping Google the advertising agent and will likely be used for their content-targeted advertising. Will it impact search beyond the ads? We'll have to wait and see.
Benjamin Edelman from Harvard Law School offers up a well-researched paper, Empirical Analysis of Google SafeSearch, that shows that "SafeSearch blocks at least tens of thousands of web pages without any sexually-explicit content, whether graphical or textual. Blocked results include sites operated by educational institutions, non-profits, news media, and national and local governments." Filtered sites included Apple Support, NET Bible, Thomas (Congressional legislative system), and many more.
So how well are Google's new content targeted ads doing? According to a ContentBiz story, advertisers are not as pleased with the results of the new placement as Google may have hoped.
In Google News you used to be able to use advanced syntax like cache: followed by a URL to pull up a cached news story or site: to limit to a specific publication. Now this syntax no longer works and Google says "site:nytimes.com was dropped from your search because it is not supported for this type of search." For title searching, intitle: still works. Instead of site: try using source: which should be followed by the source title that Google shows in green: either the single word or, for multiple-word sources, the words joined with an underscore (_) character, as in
source:new_york_times. Google News could really use an advanced search form and the restoration of the cached copies.
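Turning a source title into the source: form is a simple string transformation. A minimal sketch follows. The function name news_source_query is hypothetical, and lowercasing the title is an assumption based only on the new_york_times example.

```python
def news_source_query(source_name: str, terms: str = "") -> str:
    """Join the words of a source title with underscores and prefix source:."""
    # Lowercasing is an assumption drawn from the source:new_york_times example.
    operator = "source:" + "_".join(source_name.lower().split())
    return f"{terms} {operator}".strip()


print(news_source_query("New York Times"))  # source:new_york_times
```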
With all the other recent changes, this caught me totally by surprise. Remember Infoseek that became Go after Disney bought it? It dumped its own search engine back in March of 2001 and replaced it with straight Overture searches. Today it now says "Powered by Google" and gives both Google AdWords results and regular Google results. The Google steamroller moves on.
An interesting tale of censorship at Google is told and documented by Seth Finkelstein. Basically, Google removed a page from its index in Feb. 2003 after pressure from the UK. The page seems to now be gone from the Web itself, but according to various reports, it was a very sick, twisted joke page, and not the pedophile page it was claimed to be.
Apparently, Google is moving more aggressively into the advertising business. They are starting a new text ad program, Google Content-Targeted Advertising, which will display ads on non-search related pages. The ads in this program, like those that Google now shows on its site, are not graphics or banner ads, but text ads with a colored background. They are free until March 12, but where they will be displayed after then remains to be seen. See also Google's FAQ for Content-Targeted Advertising.
For reasons known only inside the company at this point, Google has bought Pyra Labs, maker of the free Weblog site and software company Blogger. Fittingly enough, instead of announcing it in a press release, the news first showed up in a blog. No immediate known changes as a result to either Google or Blogger, but it will be interesting to see what comes of this acquisition.
Googlert and SearchAlert.net are two new free services that offer email alerts when new search engine results are available. Googlert was launched in January, but I'm not sure when SearchAlert.net started. Googlert works only on Google and does require registration for a free Google API key. SearchAlert.net says that it "continually monitors the big Web search engines" but does not specify which ones. Alerts page updated with both of these.
Finding Google's cached copy is not always trouble free. Take the recent example of an interesting story of journalistic confusion that gets even more confused. Apparently, a Computerworld reporter was fooled into believing that terrorists claimed responsibility for the recent "Slammer" worm. The original story was posted online but now states "Computerworld removed this story due to questions about its authenticity. An update about this situation has been posted."
So what does this have to do with Google's cache? Well, other reporters thought they might find the original story from Google's cache. Google Village, in their story Google Everflux Misses Slammer Terror states that "Google is good at getting the fresh stuff each day, but not good enough to capture a page, and cache it after such a page has appeared for a few hours." And The Register reports that the story "doesn't seem to have been around long enough to make it into Google cache."
Well, I beg to differ. For as long as it lasts, take a look here. Presumably, the reporters tried a search like cache:www.computerworld.com/securitytopics/security/virus/story/0,10801,78219,00.html which currently gives no results. If they had gone one step further and clicked on the "News" tab, they would have found the cached file. Note that the cached copy is missing the usual surrounding text and graphics. I think this is due to the way Google identifies news articles for indexing, leaving out the navigational and other surrounding text. Google News search results do not display a link to a cached copy of the story, but apparently they are there anyway. And in case the cached copy disappears from Google, I have a copy on my site.
Oh, and while I'm on the topic, I've noticed some other oddities with Google's cache. Google has two rather distinct crawls: the regular GoogleBot crawl, sometimes called DeepBot, and a smaller one that focuses on frequently refreshed content. The latter is often called FreshBot. Results from FreshBot usually have a date listed before the "Cached" link. These two crawls can have two separate cached copies at Google. For example, a search on lisnews today finds the top hit with a date of "Feb 10, 2003." Click on the "cached" link, and the latest story is actually from Feb. 9. But a direct search for cache:www.lisnews.com pulls up a page cached Jan. 11. Both pages are searchable in Google's index. But for hardcore cache users, the point is that there are two versions of the page accessible from Google, if you are willing to do a little digging.
HotBot has relaunched and now can search Inktomi, Google, FAST, and Teoma. Terra-Lycos, the owner of HotBot, says that with the new HotBot, they want to give the users control. It certainly makes it easy to check four of the major Web search engines from one interface. The front page no longer has ads and flashing banners and pop-ups should be gone from other pages. And the advanced features are readily available and properly translated for each of the four search engines, if they are supported. If they are not supported, HotBot will say that "These filters are not yet supported."
However, several advanced features are gone from the previous version of HotBot:
- Boolean option for Inktomi searches
- Name search (listed as "the person" previously)
- Truncation and word stemming
- The ability to choose more than 10 results on the advanced search page is gone (and with the Preferences, it will now only give up to 50 hits but Google and Teoma will not even give that many)
- And the "More results from this site" link does not always show up, meaning that when it does not the searcher can only find one page per site
And just in time for Christmas, today Google launches Froogle, named with pun firmly in cheek. Froogle contains just products. Sellers can get their products included for free, potentially using a data feed. See the About Froogle and Information for Merchants for more details. Unlike regular Google results, Froogle includes price, store name, and sometimes even a picture.
Google Labs has added two new initiatives. The Google Viewer provides a fancy way to have search results scroll by with views of the pages as well. I find WiseNut's Sneak-a-Peek and the MSN Search Preview easier to view but prefer basic text results to all three.
The second initiative, Google Webquotes, seems to have more information value. Enter a search, and for each of the top ten sites, several quotes from other pages that point to the top ten sites are listed. So a search on 'google' finds "Google sells paid listings. . . " and "If Google were to charge a fee. . . ." Perhaps interesting to play with, but not the most definitive quotations.
Jeff Dean, Distinguished Engineer at Google, gave a keynote address yesterday at Online Information 2002 in London. He mentioned some of Google's future plans, which include
- More comprehensive and fresher database
- Improved usability (will that be a new user interface?)
- Conceptual understanding (perhaps that Google will try to guess synonyms)
- More personalization
Gary Price reports that "You're now able to limit your search to a specific site for stories available via Google News. In other words, the site: syntax now works." He includes several examples. Maybe they will eventually add an advanced search page as well.
The main Google page claim has jumped from 2,469,940,685 web pages to 3,083,324,652 web pages. To get their number over 4 billion, they add in the 330 million images in their image database and the "nearly 800 million Usenet newsgroup postings" in Google Groups. The image number has remained static since Dec. 2001, but the Usenet postings have grown from 700 million then to "nearly" 800 million.
So what about their basic Web page growth? I am not sure what they are counting. On a few quick tests that I ran, Google did not seem to find that many more results than they did last March, and in some cases, they actually found fewer. It may be that the unindexed URLs and duplicates have increased substantially, but I have been unsuccessful in getting Google to comment on that. According to Google's Nate Tyler, "more than 40 percent of these 3 billion web pages are authored in non-English languages" and "more than 50 percent of Google's traffic comes from overseas," so perhaps much of the claimed growth comes in that sector.
An interesting article on News.com "The Google Gods: Does Search Engine's Power Threaten Web's Independence?" quotes Gary Price of The Virtual Acquisition Shelf and News Desk fame.
The Excite Networks (who run iWon and the portal portion of Excite) have launched a new portal. MyWay.com boasts that it has no banners or pop-ups. The portal content is similar to that at Excite and iWon, and the search engine and directory come from Google (and Google's implementation of the Open Directory). So how are they going to make money off of this one? They claim "My Way makes money through clearly identified sponsored listings and text links. . . . Does it work? Yes. In fact, we will be profitable in our first month of operation." Time will tell. I have not seen any sponsored listings yet, but the portal content is fairly slim and undifferentiated from other portals. At least their Google implementation includes the cached links as well.
Jonathan Zittrain and Benjamin Edelman have posted their study "Localized Google Search Result Exclusions: Statement of Issues and Call for Data" that has found that at least 100 sites have been excluded "in whole or in part from the French google.fr and German google.de compared with google.com." The sites were removed by Google to avoid legal problems with laws in those countries.
For anyone following the Google Answers commercial service, there are several recent interesting writings about it. Jessamyn West has an enlightening read in her Searcher article "Information for Sale: My Experience With Google Answers" [10(9): 14- , Oct. 2002.] But her story did not end there as her follow-up posting "How I Tried to Resign from Google Answers but Found I Was Already Fired" shows. And then there is a similar story "Silicon Samurai: Questions About Google Answers" posted at Geek.com on July 17.
I find the economics especially interesting. Maybe Google will make enough to continue Google Answers, but it certainly does not seem that the researchers are going to make much. Nor does it pose much threat to professional researchers, information professionals, or librarians. Oh, and if you want to link to Google Answers, don't use the currently non-functioning link on Google's Services and Tools page (http://answers.google.com). It needs to be an https link: https://answers.google.com. Maybe Google should pay a researcher to help them fix their own internal links?
After several months of waiting, Yahoo! announced today (during their conference call announcing third quarter profits) that they have extended Google as their search engine partner even though the "Powered by Google" logo and text are gone. In addition, they have mixed up Yahoo! directory entries with Google records in their search results. Instead of having Yahoo! directory entries under the 'Web Sites' heading and Google records under 'Web Pages,' they now both come under 'Web Matches.' Then there is a follow-up search option to just search the directory under the "Search in . . . Directory" link in the upper right corner. The advanced search has also changed significantly. It now looks much more like the Google advanced search. See their What's Changed with Yahoo! Search and Danny Sullivan's report for more details.
I can't say I'm impressed with the change. The division between the directory listing and search engine results was often useful and made it easier to teach the difference between directory results and search engine results. The mixture of results may help on some searches but will probably lead to more confusion on others. And with a greater emphasis on plain search engine results, what is the long term future of the Yahoo directory? If you liked the old Yahoo!, try one of the international versions like Yahoo UK or Yahoo Australia which at least for now still have the old separation.
The telegraph.co.uk reports that Google is considering user fees for some sections of its site like the new News search. According to the article, Google's Senior Vice President of Worldwide Sales and Field Operations, Omid Kordestani, said: "We may experiment with ways of monetising after we have got the service right. Charging would be one approach. So far we have found it better to keep the service free and charge for targetted advertising." It sounds like they are just floating the idea with no definite plans to start charging yet, but it is one more reason to be comfortable with the Google alternatives. First seen at Pandia.
The Google dance appears to have begun yesterday and there is much weeping and gnashing of teeth in the optimization community. The Webmaster World forum thread discussing the update already has over 430 posts since it started yesterday morning. What is the Google dance? It occurs when Google is launching a new database, first on www2.google.com or www3.google.com and then eventually on the main site. It can take several days for the whole dance to finish. So right now on the main www.google.com, the bulk of their database appears to have come from an early August crawl. The database on www2 is from a late August or early September crawl. So searchers take note. Try www2 for the more current records, but expect changes over the next few days from what you got at Google earlier this week.
So why all the frantic discussion in the forums? It seems that Google may have made a more significant change than usual to their relevance ranking algorithms. According to a related Webmaster World thread, the changes have moved Microsoft out of the top spot for a phrase search on "go to hell" and perhaps have increased the importance of anchor text from the Open Directory. Again, the point for searchers is that the results will likely change compared to what you have seen. Whether the relevance ranking is better or worse remains to be seen, but it will probably depend greatly on the search terms.
Google News has greatly expanded its number of news sources (to "approximately 4,000") and the depth of its archive. It also has a newly redesigned look and has finally added the News Tab on the main page and on search results pages. According to the About page
"Google News continuously crawls more than 4,000 news sources from around the world. This number will continue to grow as we develop the service further" and it now "includes articles that appeared within the past 30 days." There is still no advanced search, although Tara points out that adding
&num=100 to the end of a results URL will give 100 results at a time. Even easier, just change your regular Google preferences to default to 100, and you don't even need to add the special code.
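For anyone who wants to script the &num=100 trick, the sketch below appends or replaces the num parameter on a results URL with Python's standard urllib.parse module. The function name with_num and the example URL are illustrative assumptions, not documented Google News addresses.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_num(url: str, num: int = 100) -> str:
    """Return the URL with its num query parameter set to the given value."""
    parts = urlsplit(url)
    # Keep every existing parameter except an earlier num setting.
    query = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key != "num"
    ]
    query.append(("num", str(num)))
    return urlunsplit(parts._replace(query=urlencode(query)))


# Hypothetical results URL, used only to show the transformation.
print(with_num("http://news.google.com/news?q=libraries"))
```

Replacing an existing num value rather than blindly appending a second one avoids URLs that carry two conflicting settings.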
I can't say I'm impressed with the "Google News is highly unusual in that it offers a news service compiled solely by computer algorithms without human intervention" boast or the lack of a list of those 4,000 sources. However, the results are certainly much broader than what was offered before.
Gary Price points out that some changes are going on at the Google News search. Search engines like to experiment by giving one out of say a thousand queries the experimental interface or results and then gauging the users' reactions. That makes it hard for the rest of us to see the details of the experiment unless someone grabs a quick screen shot. Just earlier this week on Yahoo! I noticed that the "Web Pages" link was not highlighted unless you clicked on one of the other links first. And the Powered by Google had moved way down to the bottom. Was this the beginning of a change to another search engine or an attempt to lessen the amount they pay to Google? Or was it just Yahoo! experimenting with some different approach? Time may or may not tell.
Now completely moved from Inktomi and Overture to Google and Google AdWords, AOL introduces their "New AOL Search." AOL also notes that Google results are included on several AOL properties: "now available within the search areas of Netscape, AOL.COM and CompuServe, and for members in the United Kingdom, France, Germany, Netherlands, Brazil, Mexico, Argentina, Japan, Australia and Canada." On all of these, the search results may include a few links to specific material from AOL sites as well as the Google results.
In an initially surprising partnership, Google has announced that it will provide advertisements from its AdWords database to Ask Jeeves and Teoma users. Ask Jeeves will continue to use the Teoma database for its search engine results, but the ads will come from Google.
Slashdot has an interview with Craig Silverstein, Google's Director of Technology, in which he answers 10 questions from the Slashdot Linux-loving community.
The much-discussed, if underwhelming, Google Answers has added a search capability and classified past questions and answers. I wondered how long it would take Google, a search company, to add this search feature. The broad categories and the search box are at the bottom of the main Google Answers page.
Netscape Search is now serving Google results without Open Directory hits first. It does start with some Sponsored Links which are ads from either Overture or Google AdWords. However, the Netscape search buttons still point to a page that rotates among several search engines. Like Yahoo!'s version of Google, the Netscape Search version does not have all the advanced capabilities of Google.
Opening a peek into Google's current experiments, the new Google Labs shows ideas under development. At this point, there are four: a Glossary that offers definitions of words, phrases, and acronyms; Google Sets, which provides related terms; Voice Search for searching Google by telephone; and Keyboard Shortcuts for non-mouse navigation. Google also has a new version of the Google Toolbar. The old one must be uninstalled first to get the new experimental features to work, and they are hard to find under Toolbar Options. They include an ability to suppress some pop-up windows and some new navigation features.
AOL announces a switch to Google for both an ad database and its Web results. Right now, the sponsored links (formerly from Overture) are coming from the Google AdWords database. The Web search results continue to come from Inktomi for now, but they should be from Google by this summer. This includes AOL, CompuServe, and Netscape search sites. See also the Google press release.
Amazon-owned Alexa now offers Alexa Web Search, a strange amalgam of Alexa's information about Web pages combined with Google's Web search engine database. Gary Price offers a detailed analysis of the new tool.
Two fascinating uses of the free Google API programming could certainly be of use to advanced searchers: the Google API Proximity Search and the Fagan Finder Google Advanced Search, which adds the ability to choose specific dates.
Google has introduced a pay-for-research service in beta format called Google Answers. Take a look at the Answers FAQ and the Help & Tips section for details, but basically users pay $0.50 to post a question and agree to pay from $4 to $50 for the answer. Check out the site to gauge the quality of the answers.
If a search term gets zero hits on Google, it will now automatically try to guess the correct spelling and search that. Try a search on brjother and Google says "Your original search: brjother was misspelled" and then automatically searches for what it thinks you meant.
Google is providing access to its APIs for developers and programmers. There are limits on its use, capping the number of queries per day at 1,000 and requiring an account for use, but it is free. This may eventually offer more sophisticated uses of the Google database, especially if Google adopts the best of what outside developers can create. In the meantime, read Google's API FAQ and its Terms of Service.
The new beta Google News Search can now sort by date. However, it still defaults to a relevance sort first.
Google announces the launch of its news headlines database in beta version. It covers only about 100 English-language Web-based news sources. It also clusters related stories from different publications under one headline.
Several new additions to the Search Engine Statistics section. I have updated my Relative Size Showdown and the Total Size Estimate analyses with data from March 4-6, 2002. Using 25 search terms, and verifying the actual number of hits available for the largest search engines, Google has maintained a solid first place, followed by WiseNut and then AllTheWeb. I also updated the Database Change Over Time page which compares the same searches run on the search engines at various times. In addition, I have posted two new pages on Google: the Google Database Components which compares the components of the Google Web database based on the statistics analysis and one on Google's Unindexed URLs which has an explanation and example of Google's barely-indexed URLs. Google Review also updated.
Google has introduced another specialty search page: a Microsoft related sites search. It is also linked on the bottom of their advanced search page.
Google has announced a new pricing structure for their AdWords program, the one that puts the ads in the right hand margin for certain searches. Formerly, the ads were bought based on the number of impressions they received. The new AdWords Select program will be more like Overture in that the keywords are bid on based on a cost per click model. While this will not change the ranking of the regular Google results, it will provide Google with an ad database with which to compete with Overture.
iLor announces that as of today, it will begin using Ask Jeeves databases rather than Google's for its search tools. Initially it will be using Ask Jeeves' Direct Hit database but it will be switching to Teoma later this year.
Google announces its Search Appliance, a combination hardware and software product for site search on intranets and other Web sites. Pricing starts at $20,000.
Google previously had one unsearchable stop word --'the.' It is now searchable within phrases, like other stop words, as well as with the + symbol. Also, Google now supports using the asterisk * within a phrase to represent any full word, something AltaVista has long supported. However, the asterisk does not work for truncation anywhere else at Google.
Google is busy once again. This time, they have introduced Google News Headlines, a page which has summaries of top news stories. It is a rather lengthy page and lacks the easy to view organization of many other news sites and portals, but it provides several viewpoints on each news story from different publications. Unfortunately, they do not provide any archival access to the stories. In addition, it looks like Google is no longer using Moreover for the news headlines on a regular search page. Instead, it appears they are using their own crawled headlines.
Google releases a new database in beta: Google Catalog Search. There is a link to the new database at the bottom of the advanced search screen, and it is directly accessible at catalogs.google.com. This database consists of scanned pages from print mail-order catalogs. The database is text searchable, and it displays the full page images from the catalogs.
- Google Groups is out of beta
- Google Groups now goes back 20 years
- 700 million Usenet posts in Groups
- Image database is now 330+ million images
- Web database has 1.5 billion fully indexed documents
- That includes 35 million non-HTML docs like PDF, PS, DOC
- The total count includes 1/2 billion unindexed URLs
- Selected news crawling replaces Moreover
An alert reader has noticed a change in the way Google handles diacritics. In the past, words with no diacritics would match those with and vice versa, so either elephant or
éléphant would find both elephant and éléphant. Now,
elephant only matches the word without diacritics. To find the French version,
éléphant must be used. Note that this differs from AltaVista where the plain
elephant matches both but the diacritics version,
éléphant, only matches éléphant. The lesson for the multilingual searcher is that in Google, use all diacritic variants if you want more than an exact match.
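One way to follow that advice is to generate the unaccented form automatically and search for both. The sketch below uses Python's standard unicodedata module. The function names are hypothetical, joining the variants with Google's OR operator is my assumption about how to combine them, and only the fully accented and fully unaccented forms are produced, not every partial variant.

```python
import unicodedata


def strip_diacritics(word: str) -> str:
    """Remove combining accent marks, so éléphant becomes elephant."""
    decomposed = unicodedata.normalize("NFD", word)
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def variants_query(word: str) -> str:
    """Build a query matching the word both with and without diacritics."""
    plain = strip_diacritics(word)
    variants = [word] if plain == word else [plain, word]
    # Assumption: the variants are combined with the OR operator.
    return " OR ".join(variants)


print(variants_query("éléphant"))  # elephant OR éléphant
```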
Google now automatically searches for stop words when they are in phrases, without requiring the + sign in front of the stop words. Google has added and will be adding more stop words in non-English languages (Chinese now and German next week). This bears watching as articles in one language may have a different meaning in another, such as the German 'die.' This automatic stop word searching means that only the unsearchable stop word 'the' can be used for the full word wild card within phrases. Google also now has a "help us improve" link at the bottom of the results page where searchers can tell Google why they didn't like the results. Google Review updated.
A new beta of the Google Toolbar lets users vote on whether or not they like specific Google search results. The beta toolbar is available from Google.
Google now has an advanced image search page. It gives options to limit searches by file type (gif or jpeg), color (black and white, grayscale or full color), or to a specific domain. In its Web database, Google has added more file types beyond PDFs. It now indexes the following file types and searches can be limited by using
filetype: followed by a file type extension as in
- PostScript (ps)
- Lotus 1-2-3 (wk1, wk2, wk3, wk4, wk5, wki, wks, wku)
- Lotus WordPro (lwp)
- MacWrite (mw)
- Microsoft Excel (xls)
- Microsoft PowerPoint (ppt)
- Microsoft Word (doc)
- Microsoft Works (wks, wps, wdb)
- Microsoft Write (wri)
- Rich Text Format (rtf)
- Text (ans, txt)
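As a rough sketch, the list above can be turned into a small lookup that builds a filetype: query. The extensions come from the list; the function names and the choice to append the limit after the search terms, separated by a space, are assumptions made for illustration.

```python
# File types and extensions as listed above.
FILE_TYPES = {
    "PostScript": ["ps"],
    "Lotus 1-2-3": ["wk1", "wk2", "wk3", "wk4", "wk5", "wki", "wks", "wku"],
    "Lotus WordPro": ["lwp"],
    "MacWrite": ["mw"],
    "Microsoft Excel": ["xls"],
    "Microsoft PowerPoint": ["ppt"],
    "Microsoft Word": ["doc"],
    "Microsoft Works": ["wks", "wps", "wdb"],
    "Microsoft Write": ["wri"],
    "Rich Text Format": ["rtf"],
    "Text": ["ans", "txt"],
}


def types_for_extension(extension):
    """Return every listed file type that uses the given extension."""
    ext = extension.lower().lstrip(".")
    return [name for name, exts in FILE_TYPES.items() if ext in exts]


def filetype_query(terms, extension):
    """Build a query limited to one extension (hypothetical helper)."""
    ext = extension.lower().lstrip(".")
    if not types_for_extension(ext):
        raise ValueError(f"extension not in the list above: {ext}")
    # Assumption: the limit is simply appended to the search terms.
    return f"{terms} filetype:{ext}"


print(filetype_query("laser", "ppt"))
print(types_for_extension("wks"))
```

Note that wks appears under both Lotus 1-2-3 and Microsoft Works, so a limit on that extension may return files of either kind.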
Google has changed its design to incorporate a new tabbed interface. Its four databases (Web, images, Usenet news groups, and the Open Directory) are listed as separate tabs on the main search page and on subsequent pages as well. In addition, they have added a new Language Tools search option next to the advanced search link which gives choices on 66 interface languages, 26 language limits, and links to regional versions of Google.
Google announces the purchase of Outride, Inc., a small relevance technology firm. According to its Product Technology page, "Outride's technology is based on pioneering research done by the company's founders at Xerox PARC in the fields of data mining, pattern recognition, natural language semantic analysis, artificial intelligence, and information search and retrieval. Ultimately, our Outride Relevance Builder technology and the products on which it is based deliver real-time personal relevance." How Google will incorporate this, if at all, remains to be seen.
The Google Groups have been improved with an enhanced thread view capability that is similar to the thread capability that DejaNews used to have. In addition, the full, original, detailed header information for each posting is available in the original format display option.
Google announces that Eric Schmidt is their new CEO. In addition to Google's official press release, several news stories, such as a CNET story from Reuters, note that Schmidt states that "We are quite profitable."
Google has added a date limit to its Advanced Search page. However, the limit is itself quite limited: the only choices are Past 3 Months, Past 6 Months, or Past Year. And since Google neither displays the date in its results listing nor gives an option for displaying the date of pages, the Google date limit is not nearly as useful as AltaVista's or Northern Light's.
Google has finally made it a bit easier to find their Google Groups Usenet archive and search engine. It is now featured on the main Google page below the search box and to the right of the Google Directory (which uses the Open Directory) link.
Google announces Google.ca, a Canadian version of the Google search engine. About the only difference on the front page is an option to limit to Canadian Web pages. Surprisingly, the limit is not a simple .ca top level domain filter. A search on vancouver restaurants found several .com hits, although it still missed the top hit on the worldwide Google, www.vancouver-bc.com/Dining/, which seems like it should have also been included. A French language version is available at www.google.ca/fr which adds an option for searching only French language pages to the top page.
Continuing its efforts to bring back the functionality of DejaNews, Google has now added posting capabilities to its Google Groups Usenet archive and search engine.
Google has come through on their promise to bring up the full Deja archive of Usenet news postings. The Google Groups site now offers the archive back to March 29, 1995. Google claims that it has more than 650 million individual messages. Also, the Google Groups advanced search page has been expanded with options to sort by date, and to restrict language, message ID, author, subject, date, or newsgroup. Google also states that they plan to offer posting ability, like Deja used to have, sometime during May.
Google now has added an automated translation capability. The translate option currently only shows up next to results that are in Spanish, German, French, and Portuguese. See Google's help page for more details, and note that a new preference is available which will automatically translate title and KWIC extracts to English for some results.
Google announces that "Dr. Eric E. Schmidt, 45, currently chairman and CEO of Novell, Inc., has joined Google's board of directors as chairman. Schmidt succeeds Sergey Brin, 27, Google's founding chairman and current president."
Google has added a new feature that links to U.S. phone directory information. Enter a person's name followed by a U.S. state abbreviation, area code, city name, or ZIP code to see how it works. For example, john smith ca. It also works when just entering a phone number with area code. Google also has a removal form if you prefer not to be included.
I have updated my Google review as well as the feature chart and search engines by features pages to reflect Google's other new field searches. The intitle: and inurl: field searches are not available in the advanced search but can be used in the regular search box. These can also be combined with other search terms, unlike the allintitle: and allinurl: fields used by the advanced search.
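The difference between the two kinds of field search can be sketched as a pair of query builders. The helper names are hypothetical, and the exact spacing Google expects after the field label is an assumption; the only behavior taken from the text is that inurl: may be mixed with other terms while the allintitle: and allinurl: forms may not.

```python
ALL_FIELDS = ("allintitle", "allinurl")


def inurl_query(terms, url_word):
    """Ordinary search terms plus one inurl: restriction."""
    return " ".join(list(terms) + [f"inurl:{url_word}"])


def all_field_query(field, words, extra_terms=()):
    """Build an allintitle:/allinurl: query, which has to stand alone."""
    if field not in ALL_FIELDS:
        raise ValueError(f"unknown field: {field}")
    if extra_terms:
        # Per the text, these fields cannot be combined with other terms.
        raise ValueError(f"{field}: cannot be combined with other terms")
    # Assumption: every word after the label is looked for in the field.
    return f"{field}:" + " ".join(words)


print(inurl_query(["laser"], "edu"))
print(all_field_query("allinurl", ["google", "groups"]))
```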
In a major break with the tradition of other search engines, Google has begun indexing the full text of Adobe Portable Document Format (PDF) files. These are identified in Google search results with a [PDF] designation at the front. Instead of a cached copy of the full PDF file, Google offers a text version. Google does not offer a way to search only their indexed PDF files, but just adding PDF as an extra search term can often bring up some results. Try laser pdf to see an example of the documents that may now be found.
According to Chris Sherman's report, Google has already indexed 13 million PDF files. Full implementation on all Google's search clusters is not due until Feb. 5, so you may or may not find any PDF files in your results (and for the search above) until that time. Either way, this is a significant addition to their already large Web database, and it means that Google may well find even more documents not available from other search engines.
Google announces the addition of Greek as a language limit. It is available in both the Advanced Search and via the preferences.
Google has expanded its field search offerings. On the revised Advanced Search page, there is the Occurrences option which allows a title or a URL field search. These are also available on the regular Google search by using allintitle: and allinurl: as the field labels. It is unfortunate that Google does not just use the url: syntax of AltaVista and others, but they chose not to. In addition, these field searches cannot yet be combined with other search terms. Thus, Google does not provide a way to search for one word in a title and another word that occurs anywhere in the page text.
A new Boolean Searching on Google page has been added here which attempts to explain how to use Google's new OR operator to perform specific Boolean functions. This seems to be changeable, so please drop me an email if you discover that Google has changed how it processes any of these suggestions.
It looks like Google has finally introduced support for the Boolean OR operator. It must be entered in all upper case as in x OR y. Google still does not support nesting, AND, or NOT, so use of the OR operator involves some careful ordering of terms.
Google has made some other changes to its layout and color schemes on the results page. Most useful to searchers is that on the cached page copies, each search term is now highlighted with a different color and is so identified at the top of the page. Google review, search engine chart, and search engines by feature pages updated.
Google has finally introduced an Advanced Search form. It has domain limits, language limits, and the ability to request up to 100 hits at a time.
Google results now appear on Yahoo! (although this may not yet be the final switch). A few observations:
- Results are clustered with only one page per site as opposed to the two per site available on Google.
- Even with the greater clustering, Yahoo! finds fewer than regular Google.
- The ranking of results is also slightly different.
- Clicking on the [More results from . . .] doesn't work right yet. It actually brings up all results from that site, whether or not they contain the keyword.