by Steve Burks, www.smcvt.edu/library  

Reference & Instruction Librarian
Saint Michael's College Library
Colchester, VT 05439
802/654-2354; Fax 802/654-2630
sburks@smcvt.edu 
BONUS QUESTION:  Who was the person who first coined the phrase "Invisible Web" and when?

Assignment Due next class - Finish In-class Questions below

Web Search Engines Categorized

[Pie chart: Distribution of deep Web sites by type of content. From White Paper @ http://www.brightplanet.com/deepcontent/tutorials/DeepWeb/index.asp]

 

The Internet provides a number of search engines and subject sites, some of them specific to subject disciplines and rated by evaluative criteria.  Each search engine "harvests" web sites on the Internet and then ranks them by different methods.  Some search engines emphasize the quantity of hits returned, and others the quality of hits, to make their results as useful as possible.

What can't be searched by "traditional" search engines is referred to as the Invisible Web or Deep Web.  These terms describe the vast amount of digital information not found with "traditional" search engines (up to 1000 times the size of the searchable web). See http://www.brightplanet.com/deepcontent/index.asp for more information.

Coverage

Search Engines First Generation: The first-designed, "old" search engines. They use term relevancy ranking as the primary means of ranking results.
AltaVista, HotBot, Excite
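
To make "term relevancy ranking" concrete, here is a minimal sketch, assuming the simplest possible scheme: a page scores one point each time a query term appears in its text. The page names, page texts, and function names are made up for illustration; real first generation engines used more elaborate weighting.

```python
from collections import Counter

def term_relevancy_score(query, text):
    """Count how many times the query terms occur in the page text."""
    words = Counter(text.lower().split())
    return sum(words[term] for term in query.lower().split())

def rank_pages(query, pages):
    """Return page names ordered from highest to lowest score."""
    return sorted(
        pages,
        key=lambda name: term_relevancy_score(query, pages[name]),
        reverse=True,
    )

# Hypothetical pages, used only to show the idea.
pages = {
    "a.html": "vermont trains and vermont railroads",
    "b.html": "trains of the world",
    "c.html": "cooking with maple syrup",
}

ranking = rank_pages("vermont trains", pages)
print(ranking)
```

A page that repeats the search terms more often rises to the top, which is one reason later engines added other ranking signals.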

Second Generation and Semantic search: Use other schemes to rank search results, such as concept-processing tools, popularity, or number of links.  Second Generation search engines may also allow natural language queries instead of using Boolean terms (and, or, not)  
Ask Jeeves, Google, Northern Light, Oingo

Deep or Invisible Web:  The Invisible Web is thought to be over 500 times the estimated size of the "searchable" web (estimated to be 3 billion web sites).  "Deep" search engines can find "Invisible" sources of information such as PDF files and online databases (e.g. encyclopedias).
ProFusion, Google, Bright Planet, Search Adobe PDF Online

META Search Engines:  Not real search engines, these sites allow a user to search multiple search engines in one search
Dogpile, Profusion, SearchBug

Directories: These sites index materials "placed" on their directories.  Some directories are a combination of search engine and directory. Directories are usually human-compiled guides to the web, where sites are organized by category
Yahoo, Ask Jeeves

Subject Specific Search Engines: Sites often part of the Invisible Web
Britannica Internet Guide, Yahooligans, First Gov, Argos, FindArticles, Country-Specific Search Engines, Resources Song List (this is an endless list)

Archived Web Sites: Search engines that find old / defunct web sites

The Internet Archive: Way Back Machine

ShopBots: Shopbots are software agents that automatically gather and collate information from multiple on-line vendors about the price and quality of consumer goods and services.

Edge Gain, MySimon, ShopFind, Agent Land, Froogle

Graphics / Clip Art: "Free" public domain graphic sites and commercial sites

Google, Public Domain Pictures, Locating Public Domain Images

Subject Guides & Subject Directories

Broad Subject Guides to Internet sources contributed by subject specialists:

Argus Clearinghouse, WWW Virtual Library, Scout Report Signpost

These sites provide access to Internet information by categories such as LC Subject Headings or academic disciplines:

Web Gems,  Infomine, Internet Public Library, Librarians Index to the Internet,  Google Web Directory - Open Directory, Yahoo

Newsgroups: Usenet Newsgroups and scholarly Listservs/Newsgroups

Google Groups, Tile.net, Directory of Scholarly and Professional E-Conferences

Miscellaneous: FAQ's, Top 10 Reasons ...

In-Class Questions: Try these questions out!  First, analyze the question: what is the best tool listed above to address the TYPE of referral you are making? Then answer the question.

Search Engines

First Generation

AltaVista @ http://www.altavista.com/
AltaVista is extremely fast and comprehensive. It searches the entire full-text of web pages and Usenet articles. Users can search for exact phrases, require or prohibit words, search within the title field of an HTML document, search for documents that contain a link to a particular URL, use wildcards, and employ case sensitivity. The advanced query allows for the use of Boolean operators (AND, OR, AND NOT, NEAR) and lets users limit searches by date.
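
The Boolean operators mentioned above can be pictured as set operations on lists of matching pages. The following is a minimal sketch, not AltaVista's actual implementation: the tiny page collection and the helper names (`build_index`, `docs`) are made up for illustration, and NEAR is left out because it needs word positions.

```python
def build_index(pages):
    """Map each word to the set of page names that contain it."""
    index = {}
    for name, text in pages.items():
        for word in set(text.lower().split()):
            index.setdefault(word, set()).add(name)
    return index

# Hypothetical pages, used only to show the idea.
pages = {
    "p1": "vietnam war cost",
    "p2": "korea war history",
    "p3": "vietnam travel guide",
}
index = build_index(pages)

def docs(term):
    """Return the set of pages containing the term (empty if none)."""
    return index.get(term.lower(), set())

and_hits = docs("vietnam") & docs("war")      # vietnam AND war
or_hits = docs("vietnam") | docs("korea")     # vietnam OR korea
and_not_hits = docs("war") - docs("korea")    # war AND NOT korea

print(sorted(and_hits), sorted(or_hits), sorted(and_not_hits))
```

AND narrows a search, OR broadens it, and AND NOT removes unwanted pages from the result.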

HotBot @ http://www.hotbot.com/
HotBot is a fast, comprehensive, robot-based web index run by Wired.  The HotBot user interface supports basic searching features, such as Boolean operators with nesting. It also allows users to do weighted searches or to limit searches by date, location, media (image, text, sound, etc.), or page type.

Excite @ http://www.excite.com/
Excite is a feature-rich web site offering the benefits of both a powerful search engine and a well-organized index. Users can search for web pages, 

Second Generation Search Engines

Northern Light @ http://www.northernlight.com/
Northern Light is a robot-driven search engine of the worldwide Web. It has two main differences from other search engines. It dynamically organizes search results, creating folders of documents with similar subjects, sources, or types; and it contains a "special collection" of 2 million articles from over 2,900 journals, books, magazines, databases and newswires not available on any other search engine. These articles are sold for $1 to $4 each. Northern Light is especially good when looking for academic information.

Pros: automated subject-group "folders" (they work well!), indexes whole websites (not just first few levels), "special collection" of non World-Wide Web articles.
Cons: does not support complex Boolean searches

 
Google @ http://www.google.com/

Google is the biggest of all the surface web search engines.  Google tends to give you the most relevant and high-quality websites near the top of your results list. A site's importance is measured in part by how many other sites link to it -- the more important the site, the higher it ranks in Google's search results.
Example:

If your question is: What was the financial cost of the Vietnam War?
then try typing: "vietnam war" cost billion
Allows "field" searching, finds graphics and Newsgroups
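
The idea that a site's importance depends partly on how many other sites link to it can be sketched by simply counting inbound links. This is a deliberately simplified illustration, not Google's actual ranking method, and the site names and link data are made up.

```python
from collections import Counter

def inbound_link_counts(links):
    """Count, for each site, how many other sites link to it."""
    counts = Counter()
    for source, targets in links.items():
        for target in set(targets):      # count each linking site once
            if target != source:         # ignore links a site makes to itself
                counts[target] += 1
    return counts

# Hypothetical link data: each site maps to the sites it links to.
links = {
    "a.example": ["c.example"],
    "b.example": ["c.example", "a.example"],
    "c.example": ["a.example"],
    "d.example": ["c.example"],
}

counts = inbound_link_counts(links)
ranked = [site for site, _ in counts.most_common()]
print(ranked)
```

Sites that nobody links to never enter the count at all, which mirrors why brand-new or isolated pages can be hard to find with a link-based engine.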


Semantic Search

Ask Jeeves @ http://www.aj.com/  

Ask Jeeves lets you search the web by typing a simple question in plain English. Ask Jeeves contains links to more than 7 million answers to the most frequently-asked questions on the web. Also check out Ask Jeeves for Kids.
Tip: Phrase your question as simply as possible.
 How many countries are in the world?
Pros: Simple and intuitive.
Cons: Uncommon or long questions often return irrelevant results.

Teoma @ http://www.teoma.com/ 

"At Teoma, we've invented a whole new approach to search, and this allows us to achieve our mission of providing the best search results on the Web. Now, we could throw a lot of fancy terms at you, like dynamic ranking and advanced algorithms. And these concepts are a crucial part of what makes Teoma so powerful. But, what's really important for you to know is that Teoma adds a new dimension and level of authority to search results through its breakthrough approach, known as Subject-Specific Popularity(SM)."

Deep or Invisible Web 

Google @ http://www.google.com/

Indexes PDF files and Newsgroups

Profusion @ http://www.profusion.com/ 

Searches over 1,000 individual databases

Bright Planet @ http://www.completeplanet.com/ 

Bright Planet: Deep Query Manager @ http://www.brightplanet.com/products/dqm_cp_ads.asp Fee Based - But Free Trial Offered 
90,000 specialty databases:
     Government
     Corporate
     University
     Trade associations
Message boards
Chat rooms
Hard to find deep content

Search Adobe PDF Online @ http://searchpdf.adobe.com/ 
Now there's a way to search through more than a million summaries of Adobe® Portable Document Format (PDF) files on the Web. Your search results will allow you to see the summaries before deciding to view the original Adobe PDF.

Sites to look out for (up and coming):

http://www.teoma.com 

 

META-Search Engines

Dogpile @ http://www.dogpile.com/
Dogpile Searches: The Web: Yahoo!, Thunderstone, Lycos' A2Z, GoTo.com, Mining Co., Excite Guide, PlanetSearch, What U Seek, Magellan, Lycos, WebCrawler, InfoSeek, Excite & AltaVista.

Profusion @ http://www.profusion.com/
ProFusion Search sends your query to multiple search engines at the same time. Search results returned by all search engines are then combined with duplicates removed and relevance factors recalculated. Therefore, you can take advantage of multiple search engines and obtain documents that are more likely to meet your expectations. Current implementation of ProFusion Search uses the following search engines: Alta Vista, InfoSeek, Lycos, Excite, WebCrawler, and OpenText. InfoSeek, Lycos, WebCrawler, and OpenText are keyword-based search engines. Excite is a concept-based search engine.
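
The combining step described above (merge the lists, remove duplicates, recalculate relevance) can be sketched as follows. This is a minimal illustration under stated assumptions, not ProFusion's actual method: the engine names and URLs are made up, and the scoring rule (add 1/position for each engine that returns a page) is just one simple choice.

```python
def merge_results(results_by_engine):
    """Combine ranked result lists into one list without duplicates.

    Each URL earns 1/position from every engine that returns it,
    so pages ranked highly by several engines float to the top.
    """
    scores = {}
    for urls in results_by_engine.values():
        for position, url in enumerate(urls, start=1):
            scores[url] = scores.get(url, 0.0) + 1.0 / position
    # Highest score first; ties are broken alphabetically by URL.
    return sorted(scores, key=lambda url: (-scores[url], url))

# Hypothetical results from two engines for the same query.
results = {
    "engine_one": ["http://x.example/", "http://y.example/"],
    "engine_two": ["http://y.example/", "http://z.example/"],
}

merged = merge_results(results)
print(merged)
```

Here the page found by both engines ends up first even though only one engine ranked it at the top.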

SearchBug @ http://www.searchbug.com/ 


Directories

Yahoo @ http://www.yahoo.com/  

Yahoo is the most widely used Internet directory. In contrast to the search engines listed above, which use computer programs to scour the Internet for Web pages, Yahoo catalogs sites manually, depending largely on user submissions. The front page is an alphabetical list of broad subject areas which are then subdivided into smaller categories. Users can browse the hierarchical structure, use a search engine that searches the URLs, titles and comments within Yahoo, or use both features in tandem. Current news, stock quotes, sport scores, yellow pages and city maps are available as well as regional Yahoos and an excellent search site for kids, Yahooligans.

Cons: Can be hard to differentiate between categories and sites. Site listings are not very up-to-date (i.e. new sites take a long time to get listed.)

Google Directory @ http://www.google.com/dirhp?hl=en 
Volunteers submit information that is put into categories in the Google Directory

AskJeeves @ http://www.aj.com/


Subject Specific Search Engines - Sites often part of the Invisible Web (examples)

Yahooligans  
http://www.yahooligans.com/
Yahooligans! is a subject-oriented guide for Web surfers ages 7 to 12 on the World Wide Web and Internet. Yahooligans! lists the sites that these users want to see most and categorizes them into appropriate subject categories.

Britannica's Internet Guide @ http://search.eb.com/ 
Search Web sites chosen by Britannica's editors. 

First Gov  
http://firstgov.gov/ 
Official portal for U.S. government, with links to state and local governments.  Search by keyword, or use subject tree directory.

Argos  
http://argos.evansville.edu/
Argos is the first peer-reviewed, limited area search engine (LASE) on the World-Wide Web. It has been designed to cover the ancient and medieval worlds. Quality is controlled by a system of hyperlinked Internet indices which are managed by qualified professionals who serve as the Associate Editors of the project. The same procedures that govern quality also serve to limit the scope of Argos to the ancient world.

FindArticles 
http://www.findarticles.com/PI/index.jhtml 
Indexes "free" journals not typically found by standard search engines

Country-Specific search engines  http://www.searchenginecolossus.com/

Resources/Song List/Search the Web  http://www.pdinfo.com/resource.htm 
Resources for finding usable public domain music. Alphabetical lists of more than 3000 songs now in the public domain in the United States. Web search engine for individual song titles.

 


Archived Web Sites

The Internet Archive:

Way Back Machine @ http://www.archive.org/  
A digital library of Internet sites and other cultural artifacts in digital form. Like a paper library, we provide free access to researchers, historians, scholars, and the general public in accordance with use policy.


ShopBots: Comparison Shopping

Froogle @ http://froogle.google.com/

MySimon @ http://www.mysimon.com/ 

ShopFind @ http://www.shopfind.com/ 

Edge Gain @ http://www.edgegain.com/comparison_shopping.htm 

Agent Land @ http://www.agentland.com/ 
A Directory of BOT programs


Graphics / Clipart / Sound / Video

Google @ http://www.google.com/
Click on the IMAGES tab and search.  Remember to check for copyright use.

All The Web @ http://www.alltheweb.com/?_oldhost=alltheweb.com 

Singingfish @ http://www.singingfish.com/ 

Locating Public Domain Images @ http://www.ala.org/acrl/resjan98.html 
Great Article with links!!

The AccuNet/AP Multimedia Archive  "...perhaps the most important visual archive available today."


Subject Guides

Argus Clearinghouse @ http://www.clearinghouse.net/
The Argus Clearinghouse provides a central access point for value-added topical guides which identify, describe, and evaluate Internet-based information resources.  Guides are rated and authored in a peer reviewed fashion.

WWW Virtual Library @ http://www.vlib.org/Home.html
The VL is the oldest catalog of the web, started by Tim Berners-Lee, the creator of the web itself. Unlike commercial catalogs, it is run by a loose confederation of volunteers, who compile pages of key links for particular areas in which they are expert; even though it isn't the biggest index of the web, the VL pages are widely recognized as being amongst the highest-quality guides to particular sections of the web. Also compiled by VL is....

EJournal @ http://www.edoc.com/ejournal/
Includes peer reviewed academic online journals

Scout Report Signpost @ http://www.signpost.org/signpost/
 The Scout Report Signpost contains only the best Internet resources, as chosen by the editorial staff of the Scout Report, which have been cataloged and organized for efficient browsing and searching. SEARCHABLE BY SUBJECT HEADINGS


Subject Directories

Infomine: Scholarly Internet Resource Collection @ http://lib-www.ucr.edu/
Sponsored by the University of California, Infomine is intended for the introduction and use of Internet/Web resources of relevance to faculty, students, and research staff at the university level. It is being offered as a comprehensive showcase, virtual library and reference tool containing highly useful Internet/Web resources including databases, electronic journals, electronic books, bulletin boards, listservs, online library card catalogs, articles and directories of researchers, among many other types of information.

WebGems: A Guide to Substantive Web Resources
http://www.fpsol.com/gems/webgems.html
Sites listed in WebGEMS provide significant information useful to students and researchers

The Internet Public Library @ http://www.ipl.org/
An online library that provides access to full-text materials.  The IPL is self-described by the following:  The Library is hosted by the School of Information & Library Studies of the University of Michigan, and we acknowledge with gratitude their support. We actively seek the participation and collaboration of individuals and organizations around the world, but it is our intent to maintain the nucleus of the Library here. We are pleased now to be supported by grants from the School of Information via its grant from the W. K. Kellogg Foundation,

Google Web Directory @ http://directory.google.com/ 

Yahoo @ http://www.yahoo.com 


Newsgroups

Google Groups @ http://groups.google.com/ 
Deja News is the only searcher dedicated exclusively to searching Usenet newsgroups and is a great way to find esoteric information even too arcane for the World Wide Web. Users can search the Usenet as far back as March, 1995 with a simple interface. A very customizable page is also available where users can create their own query filters. Deja News also provides a great deal of basic information about newsgroups, newsreading and posting and allows users to post from their site.

Directory of Scholarly and Professional E-Conferences @ http://www.n2h2.com/KOVACS
A great site that reviews Newsgroups and Listservs by academic, scholarly, or professional criteria


Tile.net @ http://tile.net/
A good place to find out if a Newsgroup or Listserv is moderated, amount of traffic, sponsorship, etc.


Miscellaneous

FAQ's @ http://personalweb.smcvt.edu/sburks/faq.htm
FAQ stands for "frequently asked questions."   The term probably originated with Usenet Newsgroup listings.  People new to these discussion groups kept asking the same old questions on the topics covered.   FAQ's helped keep topic discussions from needlessly reviewing the same questions from "newbies."


Top 10 Reasons You Are or Aren't Successful in Searching the WWW @ http://academics.smcvt.edu/sburks/top10.htm


Name: ___________________________________

In-Class Questions/Homework: Try these questions out!  First, analyze the question: what is the best tool listed above to address the TYPE of referral you are making? Then answer the question. Give the source and answer.  Due next class (Thursday).

1. I need to find information listed on a foreign country's web site.  
Find a Search Engine based in Norway

 

 

2. I'm shopping for a product and want to find the best price.  
Find the best price for the following digital camera: Canon PowerShot A80

 

 

3. You are looking for the name of an old high school sweetheart and want to search more than one search engine in one search.  
 

 

 

4.  I want to find a discussion group (Usenet Newsgroups) where people discuss topics of interest to me.  
Find a (recent) message in a Newsgroup that discusses the safety of travel in Cuba.  Give the Newsgroup and message.

Find a Newsgroup that lists job openings.

 

5. I want to find images on the web using a search engine.  
Find a photo of a train taken in Vermont.   Successful - yes or no

 

 

6. I want to find old / defunct web sites.  
Find the old (original) St. Michael's Web page.

 

 

7. I want to search for a document on the Web, but I think it is in Adobe .pdf format.
What are the trends for fatherhood in Vermont?

 

 

8. I want to use a search engine that allows me to search using "natural language." Hint: try Ask Jeeves or a Semantic Search engine.
Find out how many countries are in the world.

 

 

9. I want to find common basic information about a topic
Find a FAQ on making beer

 

 

10. I want to find a "broad" subject guide to Internet Resources.
Find a guide to History sources.

 

 

11.  How do you evaluate a search engine?  Compare two Search Engines by the following criteria:

*One search engine should be your personal favorite. _________________________

* The second search engine can be one of the following choices. Circle your choice

A. Google - http://www.google.com/ 
B. Excite - http://my.excite.com/myexcite/my.jsp 
C. Alta Vista - http://www.altavista.com/ 
D. Hot Bot - http://hotbot.lycos.com/?query= 

 

Compare the two search engines by the following criteria:

1. Size of the Search Engine
- How many pages are found in a search? (See - http://www.searchenginewatch.com/reports/sizetest.html for ideas on how to do the comparison)

2. File Types - What file types can be found by your search engines?
    - Web Pages, Usenet News, gopher, FTP, PDF (Adobe)
    - Other [software, sound, images, video, MP3, etc]
    - Material type:  newspapers, magazines, books

3. Interface
    - modes: simple, complex, or guided; look for details of Boolean searching, etc. (see Search Engine Evaluation Part I)

4. Ranking of results:  See if you can determine which ranking criteria are used to order the results.  Do the two search engines differ in the top 5 sites they list?  Did one search engine lead you to better sources than the other?
   

5. Limitations
    - Language, Geography, Filters for language

 

6. Description of sources (annotations) found in hit list

 

7. Speed - how fast is the search - time a search.

 

After comparing the two search engines by the criteria above, give a short, one-paragraph explanation of which search engine is "better."