At the February 2002 CENDI meeting, Mr. Lederman explained that the invisible web, or deep web, is information that is made available via the web but cannot be retrieved by web crawlers because it is in databases, behind firewalls, or available only for a fee or with other access restrictions. Distributed Explorit is the deep web search technology developed by Innovative Web Applications; it has been used on DOE's Environmental Science Network and the Energy Portal at DTIC, as well as other web sites including Science.gov. Mr. Lederman explained how Distributed Explorit functions and listed some enhancements that are not usually found in Web search engines or other applications that search the deep web. These include the ability to mark and download results, field searching (including date-range searching), access to log-in restricted Web sites, and navigation capabilities. Plans for future enhancements may include clustering of results, indexing and analysis of results, helping users identify the most relevant results, and connecting the product with collaborative work tools.
The overheads from this presentation describe the study of the requirements of the CENDI agencies and the methodology for reviewing the applicable search engine technologies.
Motivated by the need to keep up with Internet content of value to the Scout Report's Signpost catalog of Internet resources, the article describes an approach to linking distributed collections of metadata so that they can be searched as a single collection. The infrastructure, based on the Lightweight Directory Access Protocol (LDAP) and the Common Indexing Protocol (CIP), is described. The advantages of using linked metadata rather than keyword-index searching are discussed. Other architectures for metadata discovery are also outlined. Research issues and future directions for the project are included. The Internet Scout Project was funded by NSF.
Paper describing the conceptual design of the Next Generation Internet being supported by the Clinton Administration and various agencies including the National Library of Medicine. Intended to support the continued growth of networking technologies and to address concerns of researchers and government agencies for a high-speed network.
Draft of the specifications for the Resource Description Framework (RDF), which will allow multiple metadata formats to interoperate. It will also allow nesting of metadata to show parent-child relationships. This is a working draft and subject to change at any time.
The workshop addresses information server technologies, search technologies, and directory and online services. Participants are proponents of repository interface standards for distributed indexing and searching. The report begins with a two-page summary of topics and outcomes. Three technical sessions were held on Distributed Data Collection, Data Transfer Formats, and Distributed Search Architectures. Follow-on discussion covered indexing and the collection of information needed for indexing. Slides, session notes, outcomes, and quotes are included. Participant and position papers are available in PDF, PostScript, and Word format. Standards to support searching and metadata tags were among the topics discussed.
Newest version of the specifications for the Resource Description Framework (RDF), which allows multiple metadata formats to interoperate. It also allows nesting of metadata to show parent-child relationships. This version is the recommendation of W3C.
This paper discusses a new search technology called LexiBot that is capable of searching the vast untapped resources of the Web. The search engine can identify, retrieve, qualify, classify, and organize both "surface" and "deep" content from the Web. "Deep" content is that content that is included in databases, spreadsheets, tables and other non-text, non-HTML sources. Developed by the BrightPlanet team, LexiBot is a direct-access query engine. The author also points out some interesting facts about the "deep" Web including the fact that the deep Web contains nearly 550 billion individual documents compared to the 1 billion on the surface Web.
This is a draft of a white paper on the discovery and retrieval of networked information that is currently under preparation. Currently only the outlines for four chapters are available.
Minutes from a presentation by Dr. Walter Warnick and Vincent Dattoria, DOE/Office of Scientific and Technical Information, at CENDI on October 3, 2000, in Germantown, MD. This briefing was originally presented to the PITAC Digital Library Panel on September 19, 2000. The presentation emphasized the need for an implementation strategy for an information infrastructure for the physical sciences, which would require interagency cooperation and provide a common knowledge base for comprehensive access to, and use and reuse of, physical science information. It also discussed Distributed Explorer, a distributed search engine which has been used at OSTI to search web-based gray literature and R&D project descriptions across agencies. Documents in this network include over 100,000 technical reports that are publicly available from DTIC, NASA, EPA, and DOE. OSTI's digital library for energy science is proposed as a building block for part of the physical sciences infrastructure.
This article describes the joint efforts of the U.S. Department of Energy's Office of Scientific and Technical Information (OSTI) and Innovative Web Applications (IWA) to develop EnergyPortal Search, a Directed Query Engine used to search and retrieve content from the deep web. Deep web or invisible web content is described as documents in online databases that normal web crawlers are unable to reach. For example, the DOE Information Bridge, an online database of over 60,000 technical reports, is part of the deep web and served as the cornerstone for OSTI's efforts. The EnergyPortal Search is a government product that enables users to search across distributed, deep web database content with a single search query. Full text material, images, presentations, and other media that are essentially invisible to search engines are constantly being added to the deep web. OSTI has been very aggressive in digitizing DOE's gray literature. OSTI's collaboration with IWA enabled them to identify eleven of the most popular databases from EnergyFiles and then configure the Distributed Explorer Directed Query Engine to search these heterogeneous databases in parallel. A single interface was used to display the results. Thus the EnergyPortal Search was created. The format of the information and where it resides are no longer an issue. Building upon the successes of the EnergyPortal Search, OSTI created the PrePRINT Network in January of 2000. This is a searchable gateway to web-based collections of scientific preprints and reprints as provided by researchers. Again using the Directed Query Engine, the Network provides the ability to search across multiple preprint databases and servers simultaneously. PrePRINT Alerts is a new feature that provides patrons with personalized, profile-based notices of recent additions to any of the selected resources. Patrons set up their own interest profiles, and whenever new information is added, the patron is notified via email.
Two other tools recently developed by OSTI using the Distributed Explorer Directed Query Engine are the GrayLIT Network and the Federal R&D Project Summaries. The value of this new technology lies in the fact that a single query can search a number of databases in parallel. Also, no obligations are placed on the site owners to change their current processes or configurations, and there are no additional burdens placed on the information creators. The potential Next Generation architecture for the Physical Sciences Infrastructure (PSII) includes the Federal R&D Project Summaries and the GrayLIT Network as part of the content model. Efforts are still underway to make query results more accurate and relevant.
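The parallel, single-query search described in this entry can be illustrated with a minimal Python sketch. It is not the Distributed Explorer implementation; the source names, the records, the `search_source` adapter, and the choice to sort merged results by year are all hypothetical and used only for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical in-memory stand-ins for heterogeneous deep web databases.
SOURCES = {
    "reports": [
        {"title": "Solar cell efficiency report", "year": 1998},
        {"title": "Wind turbine design", "year": 1999},
    ],
    "preprints": [
        {"title": "Preprint on solar neutrinos", "year": 2000},
    ],
    "projects": [
        {"title": "Geothermal R&D project summary", "year": 1997},
    ],
}


def search_source(name, query):
    """Search one source by title substring.

    A real adapter would instead translate the query into that
    database's own search form or interface (assumption).
    """
    q = query.lower()
    return [
        {"source": name, **record}
        for record in SOURCES[name]
        if q in record["title"].lower()
    ]


def federated_search(query):
    """Send the same query to every source in parallel and merge the hits."""
    with ThreadPoolExecutor(max_workers=len(SOURCES)) as pool:
        futures = [pool.submit(search_source, name, query) for name in SOURCES]
        merged = []
        for future in futures:
            merged.extend(future.result())
    # Illustrative ordering only: newest first, then by title.
    return sorted(merged, key=lambda r: (-r["year"], r["title"]))


if __name__ == "__main__":
    for hit in federated_search("solar"):
        print(hit["source"], hit["year"], hit["title"])
```

Because each source is wrapped by its own adapter, nothing in the sketch requires the source itself to change, which mirrors the point above that no obligations are placed on site owners.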
Bielefeld University Library is developing the "Digital Library North Rhine-Westphalia," which offers integrated access to electronic resources in local and remote online library catalogues, and to licensed electronic resources. The Library has partnered with Fast Search & Transfer, the leading developer of enterprise search and real-time alerting technologies, to explore and develop tools to improve access to this and other digital collections.
This panel at the September 7-8, 2005, CENDI meeting covers several topics: use of search tools for R&D; value of exhibits in generating interest; use of expert panels; and use of geographic metadata.