by Steve Burks, www.smcvt.edu/library
| Reference & Instruction Librarian Saint Michael's College Library Colchester, VT 05439 802/654-2354; Fax 802/654-2630 sburks@smcvt.edu |
BONUS QUESTION: Who was the person who first coined the phrase "Invisible Web" and when? |
Assignment Due next class - Finish In-class Questions below
Web Search Engines Categorized
Figure: Distribution of deep Web sites by type of content. From White Paper @ http://www.brightplanet.com/deepcontent/tutorials/DeepWeb/index.asp
|
The Internet offers a number of search engines and subject sites that are specific to subject disciplines and rated by evaluative criteria. Each search engine "harvests" web sites on the Internet and then ranks them by its own method. Some search engines rely on the quantity of hits returned and others on the quality of hits to make their results as useful as possible.
What can't be searched by "traditional" search engines is referred to as the Invisible Web or Deep Web. These terms describe the vast amount of digital information not found with "traditional" search engines (up to 1000 times the size). See http://www.brightplanet.com/deepcontent/index.asp for more information.
- Definition of a Search Engine
- Evaluating Search Engines
- What is the Invisible Web?
- Why is the Invisible Web so much Larger than the Surface Web?
- Top 60 Invisible web sites by size
- Top Ten Reasons You are or Aren't Successful Searching the WWW
Coverage |
| Search Engines | First Generation: The first-designed "old" search engines. Use term relevancy ranking as the primary means of ranking results: AltaVista, HotBot, Excite.
Second Generation and Semantic search: Use other schemes to rank search results, such as concept-processing tools, popularity, or number of links. Second Generation search engines may also allow natural language queries instead of using Boolean terms (and, or, not).
Deep or Invisible Web: The Invisible Web is thought to be over 500 times the estimated size of the "searchable" web (estimated to be 3 billion web sites). "Deep" search engines can find "Invisible" sources of information such as PDF files and online databases (e.g., encyclopedias).
META Search Engines: Not real search engines, these sites allow a user to query multiple search engines in one search.
Directories: These sites index materials "placed" on their directories. Some directories are a combination of search engine and directory. Directories are usually human-compiled guides to the web, where sites are organized by category. |
| Subject Specific Search Engines | Subject Specific Search Engines -
Sites often part of the Invisible Web: Britannica Internet Guide, Yahooligans, First Gov, Argos, FindArticles, Country-Specific Search Engines, Resources Song List (this is an endless list) |
| Archived Web Sites | Search engine that finds old / defunct
web sites
The Internet Archive: Way Back Machine |
| ShopBots | Shopbots
are software agents that automatically gather and collate information from
multiple on-line vendors about the price and quality of consumer goods and
services
Edge Gain, MySimon, ShopFind, Agent Land, Froogle |
| Graphics /Clip Art | "Free" in the Public Domain Graphic Sites
and Commercial Sites
Google, Public Domain Pictures, Locating Public Domain Images |
| Subject Guides
& |
Broad Subject Guides to Internet sources contributed by subject
specialists: Argus Clearinghouse, WWW Virtual Library, Scout Report Signpost. These sites provide access to Internet information
by categories such as LC Subject Headings and Academic Disciplines. |
| Newsgroups | Usenet Newsgroups: Scholarly
Listserv/Newsgroups Google Groups, Tile.net, Directory of Scholarly and Professional E-Conferences |
| Miscellaneous: FAQ's | FAQ's, Top 10 Reasons ... |
| In Class Questions | Try these questions out! First, analyze the question: what is the best tool listed above to address the TYPE of referral you are making? Then answer the question. |
First Generation
AltaVista @ http://www.altavista.com/
AltaVista is extremely fast and comprehensive. It searches the entire full-text of web
pages and Usenet articles. Users can search for exact phrases, require or prohibit words,
search within the title field of an HTML document, search for documents that contain a
link to a particular URL, use wildcards, and employ case sensitivity. The advanced
query allows for the use of Boolean operators (AND, OR, AND NOT, NEAR) and lets users
limit searches by date.
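The Boolean operators named above can be illustrated with a small sketch. This is a toy model only, not how AltaVista is implemented: the three sample documents, the `build_index` and `near` helpers, and the word-distance `window` are all made up for the example. AND, OR, and AND NOT map onto set intersection, union, and difference; NEAR is approximated as "within a few words of each other."

```python
# Toy illustration of Boolean retrieval (AND, OR, AND NOT, NEAR).
# The documents and helper names are hypothetical, chosen for this example.
docs = {
    1: "vermont train photo archive",
    2: "train schedule for vermont and new york",
    3: "photo of a steam engine",
}

def build_index(documents):
    """Map each word to the set of document ids that contain it."""
    index = {}
    for doc_id, text in documents.items():
        for word in text.lower().split():
            index.setdefault(word, set()).add(doc_id)
    return index

def near(documents, a, b, window=2):
    """Ids of documents where a and b occur within `window` words of each other."""
    hits = set()
    for doc_id, text in documents.items():
        words = text.lower().split()
        pos_a = [i for i, w in enumerate(words) if w == a]
        pos_b = [i for i, w in enumerate(words) if w == b]
        if any(abs(i - j) <= window for i in pos_a for j in pos_b):
            hits.add(doc_id)
    return hits

index = build_index(docs)
train, vermont, photo = (index.get(w, set()) for w in ("train", "vermont", "photo"))

and_result = train & vermont        # train AND vermont
or_result = train | photo           # train OR photo
and_not_result = photo - train      # photo AND NOT train
near_result = near(docs, "vermont", "train", window=1)  # vermont NEAR train
```

AND narrows a search, OR broadens it, and AND NOT excludes unwanted pages. NEAR is stricter than AND because the two words must also appear close together.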
HotBot @ http://www.hotbot.com/
HotBot is a fast, comprehensive, robot-based web index run by Wired.
The HotBot user
interface supports basic searching features, such as Boolean operators with nesting. It
also allows users to do weighted searches or to limit searches by date, location, media
(image, text, sound, etc.), or page type.
Excite @ http://www.excite.com/
Excite is a feature-rich web site offering the benefits of both a powerful search engine
and a well-organized index. Users can search for web pages,
Northern Light @ http://www.northernlight.com/
Northern Light is a robot-driven search engine of the worldwide Web. It has two main
differences from other search engines. It dynamically organizes search results, creating
folders of documents with similar subjects, sources, or types; and it contains a
"special collection" of 2 million articles from over 2,900 journals, books,
magazines, databases and newswires not available on any other search engine. These
articles are sold for $1 to $4 each. Northern Light is especially good when looking for
academic information.
Google @ http://www.google.com/
Semantic Search
Ask Jeeves @ http://www.aj.com/
Ask Jeeves lets you search the web by typing a simple question in plain English. Ask Jeeves contains links to more than 7 million answers to the most frequently-asked questions on the web. Also check out Ask Jeeves for Kids.
Tip: Phrase your question as simply as possible.
Example: How many countries are in the world?
Pros: Simple and intuitive.
Cons: Uncommon or long questions often return irrelevant results.
Teoma @ http://www.teoma.com/
"At Teoma, we've invented a whole new approach to search, and this allows us to achieve our mission of providing the best search results on the Web. Now, we could throw a lot of fancy terms at you, like dynamic ranking and advanced algorithms. And these concepts are a crucial part of what makes Teoma so powerful. But, what's really important for you to know is that Teoma adds a new dimension and level of authority to search results through its breakthrough approach, known as Subject-Specific Popularity℠."
Deep or Invisible Web
Google @ http://www.google.com/
Indexes PDF files and Newsgroups
Profusion @ http://www.profusion.com/
Searches over 1,000 individual databases
Bright Planet @ http://www.completeplanet.com/
Bright Planet: Deep Query Manager @ http://www.brightplanet.com/products/dqm_cp_ads.asp
Fee Based - But Free Trial Offered
90,000 specialty databases:
Government
Corporate
University
Trade associations
Message boards
Chat rooms
Hard to find deep content
Search Adobe PDF Online @ http://searchpdf.adobe.com/
Now there's a way to search through more than a million summaries of Adobe®
Portable Document Format (PDF) files on the Web. Your search results will allow
you to see the summaries before deciding to view the original Adobe PDF.
Sites to look out for (up and coming):
META-Search Engines
Dogpile @ http://www.dogpile.com/
Dogpile Searches: The Web: Yahoo!, Thunderstone, Lycos' A2Z, GoTo.com,
Mining Co., Excite Guide, PlanetSearch, What U Seek, Magellan, Lycos, WebCrawler,
InfoSeek, Excite & AltaVista.
Profusion @ http://www.profusion.com/
ProFusion Search sends your query to multiple search engines at the same time. Search
results returned by all search engines are then combined with duplicates removed and
relevance factors recalculated. Therefore, you can take advantage of multiple search
engines and obtain documents that are more likely to meet your expectations. Current
implementation of ProFusion Search uses the following search engines: Alta Vista,
InfoSeek, Lycos, Excite, WebCrawler, and OpenText. InfoSeek, Lycos, WebCrawler, and
OpenText are keyword-based search engines. Excite is a concept-based search engine.
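The combine, de-duplicate, and re-rank steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not ProFusion's actual method: the engine names, the sample URLs, the `normalize` and `merge_results` helpers, and the reciprocal-rank scoring rule are all hypothetical.

```python
from urllib.parse import urlsplit

def normalize(url):
    """Crude URL normalization so trivially different forms count as duplicates."""
    parts = urlsplit(url.lower())
    host = parts.netloc
    if host.startswith("www."):
        host = host[4:]
    return host + parts.path.rstrip("/")

def merge_results(results_by_engine):
    """Combine ranked URL lists, drop duplicates, and rescore.

    Score = sum over engines of 1 / rank (a simple reciprocal-rank
    scheme, assumed here purely for illustration).
    """
    scores = {}
    first_seen = {}
    for engine, urls in results_by_engine.items():
        for rank, url in enumerate(urls, start=1):
            key = normalize(url)
            scores[key] = scores.get(key, 0.0) + 1.0 / rank
            first_seen.setdefault(key, url)
    # Highest score first; ties broken alphabetically by normalized URL.
    ranked = sorted(scores, key=lambda k: (-scores[k], k))
    return [(first_seen[k], round(scores[k], 3)) for k in ranked]

# Hypothetical result lists from two engines, best hit first.
sample = {
    "engine_a": ["http://www.example.org/", "http://example.com/beer-faq"],
    "engine_b": ["http://example.com/beer-faq/", "http://example.net/history"],
}
merged = merge_results(sample)
```

A page returned by more than one engine accumulates score from each, so it can rise above a page that a single engine ranked first. That is one plausible reading of "relevance factors recalculated."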
SearchBug @ http://www.searchbug.com/
Directories
Yahoo @ http://www.yahoo.com/
Yahoo is the most widely used Internet directory. In contrast to the search engines listed above, which use computer programs to scour the Internet for Web pages, Yahoo catalogs sites manually, depending largely on user submissions. The front page is an alphabetical list of broad subject areas, which are then subdivided into smaller categories. Users can browse the hierarchical structure, use a search engine that searches the URLs, titles, and comments within Yahoo, or use both features in tandem. Current news, stock quotes, sports scores, yellow pages, and city maps are available, as well as regional Yahoos and an excellent search site for kids, Yahooligans.
Cons: Can be hard to differentiate between categories and sites. Site listings are not very up-to-date (i.e. new sites take a long time to get listed.)
Google Directory @ http://www.google.com/dirhp?hl=en
Volunteers submit information that is put into categories in the Google
Directory
AskJeeves @ http://www.aj.com/
Subject Specific Search Engines - Sites often part of the Invisible Web (examples)
Yahooligans
http://www.yahooligans.com/
Yahooligans! is a subject-oriented guide for Web surfers ages 7 to 12 on the
World Wide Web and Internet. Yahooligans! lists the sites that these users want
to see most and categorizes them into appropriate subject categories.
Britannica's Internet Guide @ http://search.eb.com/
Search Web sites chosen by Britannica's editors.
First Gov
http://firstgov.gov/
Official portal for the U.S. government, with links to state and local governments.
Search by keyword, or use subject tree directory.
Argos
http://argos.evansville.edu/
Argos is the first peer-reviewed, limited area search
engine (LASE) on the World-Wide Web. It has been designed to cover the ancient
and medieval worlds. Quality is controlled by a system of hyperlinked Internet
indices which are managed by qualified professionals who serve as the Associate
Editors of the project. The same procedures that govern quality also serve to
limit the scope of Argos to the ancient world
FindArticles
http://www.findarticles.com/PI/index.jhtml
Indexes "free" journals not typically found by standard search engines
Country-Specific search engines http://www.searchenginecolossus.com/
Resources/Song List/Search the Web
http://www.pdinfo.com/resource.htm
Resources for finding usable public domain music. Alphabetical lists of more
than 3000 songs now in the public domain in the United States. Web
search engine for individual song titles.
Archived Web Sites
The Internet Archive:
Way Back Machine @ http://www.archive.org/
A digital library of Internet sites and other cultural artifacts in digital
form. Like a paper library, we provide free access to researchers,
historians, scholars, and the general public in accordance with use policy
ShopBots: Comparison Shopping
Froogle @ http://froogle.google.com/
MySimon @ http://www.mysimon.com/
ShopFind @ http://www.shopfind.com/
Edge Gain @ http://www.edgegain.com/comparison_shopping.htm
Agent Land @ http://www.agentland.com/
A Directory of BOT programs
Graphics / Clipart / Sound / Video
Google @ http://www.google.com/
Click on the IMAGES tab and search. Remember to check for copyright use.
All The Web @ http://www.alltheweb.com/?_oldhost=alltheweb.com
Singingfish @ http://www.singingfish.com/
Locating Public Domain Images @ http://www.ala.org/acrl/resjan98.html
Great Article with links!!
The AccuNet/AP Multimedia Archive "...perhaps the most important visual archive available today."
Argus Clearinghouse @ http://www.clearinghouse.net/
The Argus Clearinghouse provides a central access point for value-added topical guides
which identify, describe, and evaluate Internet-based information resources. Guides
are rated and authored in a peer reviewed fashion.
WWW Virtual Library @ http://www.vlib.org/Home.html
The VL is the oldest catalog of the web, started by Tim Berners-Lee, the creator of the
web itself. Unlike commercial catalogs, it is run by a loose confederation of volunteers,
who compile pages of key links for particular areas in which they are expert; even though
it isn't the biggest index of the web, the VL pages are widely recognized as being amongst
the highest-quality guides to particular sections of the web. Also compiled by VL is....
EJournal @ http://www.edoc.com/ejournal/
Includes peer reviewed academic online journals
Scout Report Signpost @ http://www.signpost.org/signpost/
The Scout Report Signpost contains only the best Internet resources, as chosen by the
editorial staff of the Scout Report, which have been cataloged and organized for
efficient browsing and searching.
SEARCHABLE BY SUBJECT HEADINGS
Infomine: Scholarly Internet Resource Collection @ http://lib-www.ucr.edu/
Sponsored by the University of California, Infomine is intended for the introduction and
use of Internet/Web resources of relevance to faculty, students, and research staff at the
university level. It is being offered as a comprehensive showcase, virtual library and
reference tool containing highly useful Internet/Web resources including databases,
electronic journals, electronic books, bulletin boards, listservs, online library card
catalogs, articles and directories of researchers, among many other types of information.
WebGems: A Guide to Substantive Web Resources
@ http://www.fpsol.com/gems/webgems.html
Sites listed in WebGEMS provide significant information useful to students and researchers
The Internet Public Library @ http://www.ipl.org/
An online library that provides access to full-text materials. The IPL is
self-described by the following: The Library is hosted by the School of Information
& Library Studies of the University of Michigan, and we acknowledge with gratitude
their support. We actively seek the participation and collaboration of individuals and
organizations around the world, but it is our intent to maintain the nucleus of the
Library here. We are pleased now to be supported by grants from the School of Information
via its grant from the W. K. Kellogg Foundation,
Google Web Directory @ http://directory.google.com/
Yahoo @ http://www.yahoo.com
Google Groups @ http://groups.google.com/
Deja News is the only searcher dedicated exclusively to searching Usenet newsgroups and is
a great way to find esoteric information even too arcane for the World Wide Web. Users can
search the Usenet as far back as March, 1995 with a simple interface. A very customizable page is also available where users can create their own query filters. Deja
News also provides a great deal of basic information about newsgroups, newsreading and
posting and allows users to post from their site.
Directory of Scholarly and Professional E-Conferences @ http://www.n2h2.com/KOVACS
A great site that reviews Newsgroups and Listservs by academic, scholarly,
or professional criteria
Tile Net @ http://tile.net/
A good place to find out if a Newsgroup or Listserv is moderated, amount of traffic,
sponsorship, etc.
FAQ's @ http://personalweb.smcvt.edu/sburks/faq.htm
FAQ stands for "frequently asked questions." The term probably
originated with Usenet Newsgroup listings. People new to these discussion groups
kept asking the same old questions on the topics covered. FAQs helped keep
topic discussions from needlessly revisiting the same questions from "newbies."
Top 10 Reasons You Are or Aren't Successful in Searching the WWW @ http://academics.smcvt.edu/sburks/top10.htm
Name: ___________________________________
In-Class Questions/Homework :: Try these questions out! First, analyze the question: what is the best tool listed above to address the TYPE of referral you are making? Then answer the question. Give the source and answer. Due next class (Thursday).
1. I need to find information listed in a foreign country web
site.
Find a Search Engine based in Norway
2. I'm shopping for a product and want to find the best
price.
Find the best price for the following digital camera
Canon PowerShot A80
3. You are looking for the name of an old high school sweetheart and
want to search more than one search engine in one search.
4. I want to find a discussion group (Usenet Newsgroups) where people discuss topics of interest to me.
Find a Newsgroup that lists job openings.
5. I want to find images on the web using a search engine.
Find a photo of a train taken in Vermont. Successful - yes or no
6. I want to find old / defunct web sites.
Find the old (original) St. Michael's Web page.
7. I want to search for a document on the Web but I think it is in
an Adobe .pdf format.
What are the trends for fatherhood in Vermont?
8. I want to use a search engine that allows me to search
using "natural language." Hint: try Ask Jeeves or a Semantic Search
engine.
Find out how many countries are in the world.
9. I want to find common basic information about a topic
Find a FAQ on making beer
10. I want to find a "broad" subject guide to Internet
Resources.
Find a guide to History sources.
11. How do you evaluate a search engine? Compare two Search Engines by the following criteria:
*One search engine should be your personal favorite. _________________________
* The second search engine can be one of the following choices. Circle your choice
A. Google - http://www.google.com/
B. Excite - http://my.excite.com/myexcite/my.jsp
C. Alta Vista - http://www.altavista.com/
D. Hot Bot - http://hotbot.lycos.com/?query=
Compare the Two search engines by the following criteria:
1. Size of the Search Engine
- How many pages are found in a search? (See http://www.searchenginewatch.com/reports/sizetest.html
for ideas on how to do the comparison.)
2. File Types - What file types can be found by your search engines?
- Web Pages, Usenet News, gopher, FTP, PDF (Adobe)
- Other [software, sound, images, video, MP3, etc]
- Material type: newspapers, magazines, books
3. Interface
- Modes: simple, complex, or guided. Look over the interface for details
of Boolean searching, etc. (see Search Engine Evaluation Part I).
4. Ranking of results: See if you can determine the criteria by which
results are ranked. Do the two search engines differ in the top 5 sites
listed? Did one search engine lead you to better sources than the
other?
5. Limitations
- Language, Geography, Filters for language
6. Description of sources (annotations) found in hit list
7. Speed - how fast is the search - time a search.
After comparing the two search engines by the criteria above, give a short one-paragraph explanation of which search engine is "better."