
AltaVista

Review of AltaVista

Last updated Apr. 07, 2004.
by Greg R.

Dead Database!

As of March 25, 2004, AltaVista no longer uses its own database. Instead, it is using new owner Yahoo!'s database. Many of the advanced features listed below are now gone.

This review will continue to be available to provide an historical record of the search capabilities AltaVista used to have.

AltaVista was one of the three largest and most important search engines for many years, but it is no longer as popular as it used to be. It had two distinct search modes: Basic Search and Advanced Search. In August 2000, it introduced a third: the Power Search. In Feb. 2002, the Power Search features were added to the Advanced Search page and then in Nov. 2002 were also moved to the More Precision page. There are some significant differences between the Basic and Advanced search pages, as will be seen below. In Feb. 2003, AltaVista was bought by Overture. Overture expected to merge the AltaVista and AlltheWeb databases later in 2003, but once Yahoo! bought Overture, AltaVista's database was replaced by a Yahoo!/Inktomi database on March 25, 2004.

For News

See AltaVista News Stories

 

Databases:
AltaVista has a variety of databases:

  • Web database: AltaVista's own indexed Web pages including PDF files
  • Directory: Open Directory (formerly LookSmart)
  • News: AltaVista's own crawled pages (formerly from Moreover)
  • Ads: from Overture
  • Images: AltaVista's own crawled image files
  • Audio and Video: AltaVista's own crawled multimedia files

AltaVista has experimented with a variety of databases in addition to their regular Web page database. In the past, they have served results from Ask Jeeves, their own Usenet database, RemarQ Usenet, Overture (formerly GoTo) ads, RealNames Internet Keywords, and LookSmart categories. As of Dec. 2002, most of these additional databases are gone, except for the Overture paid positioning results which may appear at the top and bottom of results, labeled as "Sponsored Matches." AltaVista does have other databases available, including images, MP3/audio, video, directory, and News databases. In addition, there are the AltaVista Shortcuts which may show up at the top of regular search results. These provide quick links to selected popular information.

Also, as far as partners using the AltaVista database, Yahoo! used to use AltaVista as its back-end search engine before Inktomi and then Google. MSN Web search moved from Inktomi to AltaVista in Sept. 1999 but then changed back to Inktomi in December.

Strengths:
  * Powerful search features, some unique to AltaVista
  * Proximity searching, truncation, link searches
  * International coverage, interfaces, and foreign language handling
  * Indexes PDF files

Weaknesses: See also AltaVista Inconsistencies page
  * Database not as large as it used to be
  * Only indexes first 110K of a Web page and 750K of PDFs
  * No cached copies of pages or other file types beyond PDFs

Default Operation: The default Boolean operation for multiple terms has changed many times, and depends on which search form is used. In both forms, for any term containing a punctuation mark or symbol, the punctuation mark or symbol is removed and replaced with a space, and the resulting string is searched as a phrase. Thus, a search on cd-rom is equivalent to searching "cd rom".
Simple Search: At this point, the default is finally an AND. However, some automatic phrase recognition is still in force and certain phrases will automatically be searched as a phrase. Check the fine print at the bottom to see which phrases, if any, were automatically searched.

Previously, it was much more confusing. In early 2002, it was AND unless too few records were found, in which case it might be an OR. For the latter half of 2001, it was usually a default OR. Before May 2001, searches with four or fewer terms were an automatic AND while searches with five or more terms were an automatic OR. Before Jan. 2001, for multiple terms entered with no special markings or operators, the Simple Search ran a phrase search if the terms were recognized as a phrase. If it did not recognize a phrase, it processed multiple terms with an automatic OR operation.
Advanced: If no operators are used in the Boolean expression box, AltaVista interprets the search as a phrase search even if it does not match a phrase in their database of phrases. If multiple terms are entered in the ranking keywords box only, they are processed as an OR operation.
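
The punctuation rule described under Default Operation can be sketched in a few lines of Python. This only illustrates the behavior described in this review and is not AltaVista's own code. The function name normalize_term is hypothetical, and treating every character that is not a letter, digit, underscore, or whitespace as punctuation is an assumption.

```python
import re


def normalize_term(term: str) -> str:
    """Replace punctuation with spaces; quote the result if it became a phrase."""
    # Assumption: anything that is not a word character or whitespace is punctuation.
    spaced = re.sub(r"[^\w\s]", " ", term)
    words = spaced.split()
    if len(words) > 1:
        # More than one word remains, so the term is searched as a phrase.
        return '"' + " ".join(words) + '"'
    return " ".join(words)


print(normalize_term("cd-rom"))     # "cd rom"
print(normalize_term("altavista"))  # altavista
```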

Boolean Searching:
Simple Search: Allows only the use of a + for AND and - for NOT. As of May 1, 2001, the Simple Search also can use full Boolean searching, as long as the operators are in uppercase: AND, OR, AND NOT. (Actually, only one of the operators needs to be uppercase to make it work, but having them all in uppercase works as well.) Boolean searches can be nested using parentheses.
Advanced: AltaVista supports full Boolean searching with the operators AND, OR, and AND NOT. Searching can be nested using parentheses. Operators can be in lower or upper case. Also, symbols can be used: & for AND, | for OR, ! for AND NOT. Be sure to use AND NOT rather than just NOT.
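
To make the operator and symbol equivalents concrete, here is a minimal Python sketch that rewrites the word operators as their symbols. The function name to_symbols is hypothetical, and the sketch assumes that operator words inside double quotes are search terms and should be left alone.

```python
import re

# Word operators and their symbol equivalents, as listed for the Advanced Search.
# AND NOT is handled before AND so that it is not turned into "& NOT".
_OPERATORS = [
    (re.compile(r"\bAND\s+NOT\b", re.IGNORECASE), "!"),
    (re.compile(r"\bAND\b", re.IGNORECASE), "&"),
    (re.compile(r"\bOR\b", re.IGNORECASE), "|"),
]


def to_symbols(expression: str) -> str:
    """Rewrite AND / OR / AND NOT as & / | / !, leaving quoted phrases untouched."""
    pieces = re.split(r'("[^"]*")', expression)
    for i, piece in enumerate(pieces):
        if piece.startswith('"'):
            continue  # quoted phrase: operator words inside it are search terms
        for pattern, symbol in _OPERATORS:
            piece = pattern.sub(symbol, piece)
        pieces[i] = piece
    return "".join(pieces)


print(to_symbols('(cat or dog) AND NOT "bread and butter"'))
# (cat | dog) ! "bread and butter"
```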

Proximity Searching:
Simple Search: Phrase searching is available by using "double quotes" around a phrase. Phrase searching can also be designated by putting punctuation marks between words (like double-quotes or double,quotes). As of May 1, 2001, the Simple Search also supports the NEAR operator if it is in upper case. Like in the Advanced Search, NEAR means that the search terms must be within 10 words of each other.
Advanced: Phrase searching is available by using "double quotes" around a phrase. Phrase searching can also be designated by putting punctuation marks between words (like double-quotes or double,quotes). The operator NEAR (or the symbol ~) can also be used in the Boolean expression box to designate that the search terms must be within 10 words of each other and in any order. NEAR can be entered in upper or lower case and can be nested. There are also a few undocumented proximity commands available:

Function | Command | Details
Numbered proximity, no order | within # or ~~ # | The within operator must be used with a number to specify how many positions away from each other the two terms are located. It is order independent, meaning the two terms can be in either order. This can be used when NEAR's default of 10 is too few or too many. To specify seven words, any of the following will work: within 7, ~~7, or ~~ 7. However, within7 does not work. So be sure to put a space before the number when using within, but if you use the ~~ symbol, it will not matter if there is a space or not.
Order (before) | < | The before operation specifies the order of terms, in that term1 must occur before term2. The word 'before' cannot be used. Only the symbol for the before command, the < less than symbol, can be used. It is not actually a proximity command since it only specifies order, and not proximity, unless combined with near, as shown below.
Order (before) & proximity | <~ | By combining before and near, you can specify the order of terms that must be within 10 words of each other. Only the symbol for the before near command, the <~ less than and tilde symbols, can be used. For example, term1 <~ term2.
Order (after) | > | The after operation specifies the order of terms, in that term1 must occur after term2. The word 'after' cannot be used. Only the symbol for the after command, the > more than symbol, can be used. It is not actually a proximity command since it only specifies order, and not proximity, unless combined with near, as shown below.
Order (after) & proximity | >~ | By combining after and near, you can specify the order of terms that must be within 10 words of each other. Only the symbol for the after near command, the >~ more than and tilde symbols, can be used. For example, term1 >~ term2.
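
The proximity commands above can be illustrated with a small Python sketch that tests a piece of text rather than a search engine index. The function name near and its parameters are hypothetical. The sketch assumes that "within 10 words" means the word positions of the two terms differ by at most 10, which this review does not spell out.

```python
import re


def near(text: str, term1: str, term2: str, distance: int = 10, ordered: bool = False) -> bool:
    """Return True if term1 and term2 occur within `distance` word positions.

    With ordered=True, term1 must also come before term2 (like term1 <~ term2).
    """
    words = re.findall(r"\w+", text.lower())
    first = [i for i, w in enumerate(words) if w == term1.lower()]
    second = [i for i, w in enumerate(words) if w == term2.lower()]
    for i in first:
        for j in second:
            gap = j - i
            if ordered and gap < 0:
                continue  # term2 comes first, so the order requirement fails
            if 0 < abs(gap) <= distance:
                return True
    return False


page = "AltaVista added the power search features to the advanced search page"
print(near(page, "power", "advanced"))                # power NEAR advanced -> True
print(near(page, "power", "advanced", distance=3))    # power within 3 advanced -> False
print(near(page, "advanced", "power", ordered=True))  # advanced <~ power -> False
```

Swapping the two terms and keeping ordered=True gives the equivalent of the >~ command.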

Truncation:
Simple and Advanced: The truncation symbol, an asterisk *, can be used for truncation of unlimited extra characters. This can be used as internal truncation or end truncation. There is a minimum of three characters before the truncation symbol can be used. Truncation can be used in phrase searches. Also, within a phrase, the asterisk can be used to represent an entire word such as "addictive semiconscious * of biblioscopy". Within the phrase, it only replaces one word but can be combined with other terms that use the truncation symbol. NOTE that the truncation or wildcard does NOT work with numbers. It does always seem to work with uppercase letters and diacritics. Before late 2001, the asterisk represented 0-5 extra characters and a double asterisk ** was used for unlimited truncation, but that is no longer necessary.  
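
As a rough illustration of the truncation rules, the following Python sketch converts a truncated term into a regular expression. The function name truncation_to_regex is hypothetical. It covers only single-word terms with internal or end truncation and the three-character minimum, and it does not model the whole-word wildcard inside phrases or the caveat about numbers.

```python
import re


def truncation_to_regex(term: str) -> re.Pattern:
    """Turn a term such as librar* or colo*r into a regular expression."""
    star = term.find("*")
    if 0 <= star < 3:
        raise ValueError("at least three characters must precede the asterisk")
    # Each asterisk stands for any number of extra characters within one word.
    body = r"\w*".join(re.escape(part) for part in term.split("*"))
    return re.compile(body, re.IGNORECASE)


pattern = truncation_to_regex("colo*r")
print([w for w in ["color", "colour", "colors"] if pattern.fullmatch(w)])
# ['color', 'colour']
```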

Case Sensitivity:
Simple: Searches are generally not case sensitive. Search terms entered in lowercase, uppercase, or mixed case all get the same number of hits. HOWEVER, if a term with at least one upper case letter is within a phrase or just enclosed in double quotes, that term will only retrieve exact matches. All simple searches used to be case sensitive but only quoted terms are now.
Power and Advanced: Searches are case sensitive. If search terms are entered completely in lower case, all mixtures of upper and lower case are searched. If a search term contains one or more UPPER case letters, the search is limited to only records that exactly match the specified case.
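
The case rule for the Power and Advanced modes (and for quoted terms in the Simple Search) can be expressed as a short Python check. The function name case_matches is hypothetical, and the sketch compares single terms only.

```python
def case_matches(query_term: str, page_term: str) -> bool:
    """Apply the rule: lower-case queries match any case, otherwise match exactly."""
    if query_term == query_term.lower():
        # Entered completely in lower case: all mixtures of case are matched.
        return query_term == page_term.lower()
    # At least one upper-case letter: only an exact match of the case counts.
    return query_term == page_term


for query in ("altavista", "AltaVista"):
    hits = [p for p in ("altavista", "AltaVista", "ALTAVISTA") if case_matches(query, p)]
    print(query, hits)
# altavista ['altavista', 'AltaVista', 'ALTAVISTA']
# AltaVista ['AltaVista']
```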

Field Searching:
Simple and Advanced: Field searching is available by using the field name followed by a colon : followed by the field query. See AltaVista's field list or the descriptions below. For example, the Advanced Search link:www.notess.com and not host:notess.com should find all pages in the database that link to my Web site, excluding my own pages on this Web site. To search a phrase within a field, put the quotes after the colon (title:"phrase search"). Do not leave a space after the field name or after the colon.

Field | Explanation
anchor: | Term(s) located in the text of a hyperlink. anchor:"search engine showdown"
applet: | Pages containing a Java applet with the term in the name. applet:morph
domain: | For top-level domain only. domain:edu
host: | For a particular site. host:notess.com
image: | Pages have an image with term in filename. image:gull finds pages with gull.gif
link: | Hypertext links include the term(s). link:notess.com finds pages with links to this site.
text: | Pages include the term(s) somewhere other than in an image tag, link, or URL.
title: | Hits have the term(s) in the HTML title element. title:"search engines"
url: | Pages have the term(s) somewhere in the URL (host name, path, or filename). url:searchenginewatch
like: | Find similar pages to the submitted URL. Requires a complete URL, although the http:// can be omitted. Works in Simple Search, Advanced Search Sort by box, but not Advanced Search Boolean box. Is the same function as clicking on the Related pages link in the display. It cannot be combined with other search terms. like:notess.com
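
The field syntax rules (no space around the colon, quotes placed after the colon for phrases) can be captured in a small helper. This is a Python sketch; the function name field_query is hypothetical and it does not check that the field name is one AltaVista supports.

```python
def field_query(field: str, value: str) -> str:
    """Build field:value with no spaces around the colon, quoting phrases."""
    field = field.strip().rstrip(":")
    value = value.strip()
    if " " in value:
        value = f'"{value}"'  # quotes go after the colon for a phrase
    return f"{field}:{value}"


# The Advanced Search example from the text: pages linking to a site,
# excluding the site's own pages.
query = f"{field_query('link', 'www.notess.com')} and not {field_query('host', 'notess.com')}"
print(query)                                   # link:www.notess.com and not host:notess.com
print(field_query("title", "search engines"))  # title:"search engines"
```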

Limits:
Simple Search:

  • Language Limits: Many are available (25 in Aug. 2003). These can be changed from the default by clicking the default language limits. The default language is determined automatically as AltaVista tries to determine the country of origin of the searcher.
  • Region: When AltaVista guesses the country of origin of the searcher, it makes a regional identification available as well. For the U.S., it defaults to "worldwide" and then gives a radio button choice for U.S. For other countries, it usually defaults to that country. Just click the other radio button to change the region, or use the Settings to change the default.
  • File Type: A file type limit for PDF files is available by adding filetype:pdf after search terms, but this does not work with Boolean operators.
  • Other limits can be created by the use of field searches. For example, +"USS Trenton" +host:navy.mil will limit the search to records with the phrase USS Trenton (with an exact match of the case) that are also in the U.S. Navy's domain of navy.mil.
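
A Simple Search limit like the one in the last item can be assembled programmatically. The Python sketch below uses hypothetical helper names (quote_if_phrase, simple_query) and only covers the + and - prefixes described in this review.

```python
def quote_if_phrase(term: str) -> str:
    """Wrap a multi-word term in double quotes unless it is already quoted."""
    if " " in term and not term.startswith('"'):
        return f'"{term}"'
    return term


def simple_query(required=(), excluded=()) -> str:
    """Prefix required terms with + and excluded terms with -."""
    parts = ["+" + quote_if_phrase(t) for t in required]
    parts += ["-" + quote_if_phrase(t) for t in excluded]
    return " ".join(parts)


print(simple_query(required=["USS Trenton", "host:navy.mil"]))
# +"USS Trenton" +host:navy.mil
```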

Advanced:

  • Date Limit: Below the Boolean box is an option to limit by date, specifying a start date and/or an end date. This is the date the document was last modified, according to the document's server and at the time that the AltaVista spider last indexed this page. Dates should be entered in a format like day/month/year, or recent time periods can be chosen from a drop down list.
  • File Type: A file type limit for PDF files is available as a drop down menu choice. Note that the filetype:pdf syntax does not work in the advanced search Boolean box.
  • Domain Limit: Two boxes are available, for domains (as in top level domains) and for hosts or URLs (which can include path names).
  • Region: When AltaVista guesses the country of origin of the searcher, it makes a regional identification available as well. For the U.S., it defaults to "worldwide" and then gives a radio button choice for U.S. For other countries, it usually defaults to that country. Just click the other radio button to change the region, or use the Settings to change the default.
  • Other limits can be created by the use of Boolean operators and field searches. For example, "USS Trenton" and host:navy.mil will limit the search to records with the phrase USS Trenton (with an exact match of the case) that are also in the U.S. Navy's domain of navy.mil.
  • Language Limits: The default language is determined automatically as AltaVista tries to determine the country of origin of the searcher. To change it, use the Search Preferences page or the radio buttons below the search box. Added in mid-1997, the language limit originally covered about a dozen languages. It now covers 25, listed below:

[Note: Can choose any one of the following. Use the Search Preferences page to search more than one of these and to specify non-Roman alphabet data entry.]

  • Chinese
  • Czech
  • Danish
  • Dutch
  • English
  • Estonian
  • Finnish
  • French
  • German
  • Greek
  • Hebrew
  • Hungarian
  • Icelandic
  • Italian
  • Japanese
  • Korean
  • Latvian
  • Lithuanian
  • Norwegian
  • Polish
  • Portuguese
  • Romanian
  • Russian
  • Spanish
  • Swedish

Stop Words:
Simple Search: This has changed over time. Some stop words are searched and others are discarded and not searched. Stop words within a phrase are searched. So just put the quotation marks around a stop word to force it to be searched or use the advanced search. 
Advanced: No stop words in the advanced mode, although the AltaVista operators of "and", "or", "and not", and "near" need to be entered with the phrase marking to be searched.

Sorting:
Simple Search: The actual Web database results are ranked by AltaVista's relevance ranking formula. In Oct. 1999, AltaVista started clustering results by site, so that only one or two records per site appear on the main results page. This clustering is based on the exact host name. In other words, name.com and www.name.com would be two separate clusters. To see other hits from the site, select the [ More pages from this site ] option.
Advanced Search: The results are sorted by relevance, although putting one or more terms in the sort by box will make those terms the primary sort criteria. Before Feb. 2002, advanced search results were unsorted if the ranking keyword box was empty. Starting Oct. 25, 1999, all results from the Advanced Search were clustered by site. This clustering is based on the exact host name. In other words, name.com and www.name.com would be two separate clusters. In November 1999, AltaVista switched: the Advanced Search did not cluster by default unless the checkbox was used to get only one result per site. With the Feb. 2002 update, the default is again to cluster, displaying only one or two pages per site. Uncheck the site collapse box to uncluster the results and get all the results.
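
Clustering on the exact host name can be sketched in Python with the standard library's URL parser. The function name cluster_by_host and the per_site default of 2 are assumptions for illustration; how AltaVista actually chose which pages to show per site is not described here.

```python
from urllib.parse import urlsplit


def cluster_by_host(urls, per_site=2):
    """Group ranked result URLs by exact host name, keeping at most per_site each."""
    clusters = {}
    for url in urls:
        host = urlsplit(url).hostname or ""
        shown = clusters.setdefault(host, [])
        if len(shown) < per_site:
            shown.append(url)  # further hits would sit behind "More pages from this site"
    return clusters


results = [
    "http://www.name.com/a",
    "http://name.com/b",
    "http://www.name.com/c",
    "http://www.name.com/d",
]
for host, pages in cluster_by_host(results).items():
    print(host, pages)
# www.name.com ['http://www.name.com/a', 'http://www.name.com/c']
# name.com ['http://name.com/b']
```

Because the grouping key is the full host name, name.com and www.name.com end up in separate clusters, as the review notes.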

Display: AltaVista displays the title, URL, file size, language, and a two line extract for each hit. As of Feb. 2002, the extract will try to include keywords in context and may also display part of the description meta tag. AltaVista's default display is 10 records at a time. The Advanced Search and the Customization capabilities also give options for 20, 30, 40, or 50 results at a time. To see the file size and language information, the customization capability needs to be used. Date used to be displayed, and then was only available via the customization options. Unfortunately, sometime around the beginning of 2002, date was removed completely from the display.

Related search suggestions are offered in the right margin and use the AltaVista Prisma technology. They help to narrow search queries and go beyond just looking at additional search terms that others have used. Prisma also looks for synonyms and other related search terms.

At the end of each record, one of the following options may be displayed:

  • Related Pages
  • Translate
  • More pages from . . .

The Translate option will only display if AltaVista identifies the language of the page as one which it can translate. The More Pages option shows up only when additional hits have been clustered under that site. Introduced in Feb. 2000, the Related Pages link appears for some results and is the same as using like:[URL].

Running the exact same search on AltaVista can result in varying numbers of hits. If you want the most comprehensive retrieval, run the search several times and then check the same search again at different times. See the AltaVista Inconsistencies page for more information.

At the most, only 1,000 records can be displayed. For more, use the Advanced Search. By changing the final number in the URL, you may be able to get even more. Also, there is a query length limit of 800 characters, which should be more than enough for most queries.
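
The two limits above lend themselves to a quick arithmetic check. In this Python sketch the constant and function names are hypothetical; the numbers (800 characters, 1,000 records, 10 results per page by default) come from this review.

```python
MAX_QUERY_LENGTH = 800  # characters
MAX_RECORDS = 1000      # most records that can be displayed


def check_query(query: str) -> None:
    """Raise ValueError if the query exceeds the length limit."""
    if len(query) > MAX_QUERY_LENGTH:
        raise ValueError(f"query is {len(query)} characters; the limit is {MAX_QUERY_LENGTH}")


def viewable_pages(total_hits: int, per_page: int = 10) -> int:
    """Number of result pages that can actually be viewed."""
    viewable = min(total_hits, MAX_RECORDS)
    return -(-viewable // per_page)  # ceiling division


print(viewable_pages(25000))      # 100
print(viewable_pages(25000, 50))  # 20
print(viewable_pages(35))         # 4
```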

Special Features: AltaVista was the first to introduce a free translation service which currently offers translations between English and Chinese, French, German, Italian, Japanese, Korean, Spanish, and Portuguese. It also goes from Russian to English, German to French, and French to German.

Documentation:


A Notess.com Web Site
1999-2007 by Greg R. Notess, all rights reserved
Search Engine Showdown