Selected Readings in Computer Science, No. 23

Editor: yuyuyiyi, 2005-01-25

Topic Maps

Computers have so overloaded us with data that it has become increasingly difficult to find the information we seek. Beginning in the 1990s, powerful search engines like Yahoo, AltaVista and Google made the Web an incomparably valuable information resource, but the growth of available information has rendered even those remarkable tools far less useful. Google currently indexes more than 4 billion pages, and a query often returns tens of thousands of them, arranged in no discernible order.

One promising approach, still in its infancy, is called topic mapping. A topic map is a kind of data structure, just as an outline or a set of categories is. Topic maps were standardized by the International Organization for Standardization (ISO) in 2000 as ISO/IEC 13250, and the model is commonly expressed in an XML syntax called XML Topic Maps, or XTM. XTM uses XML tags to represent the structure of information resources, concepts and the relationships between them.
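To give a feel for what "XML tags representing topics and their relationships" might look like, here is a minimal sketch using Python's standard `xml.etree.ElementTree`. The element names are modeled on XTM 1.0, but the fragment is deliberately simplified and has not been validated against the standard. The topic ids and the helper functions `add_topic` and `add_association` are hypothetical.

```python
import xml.etree.ElementTree as ET

# XTM 1.0 points at other topics with XLink references.
XLINK = "http://www.w3.org/1999/xlink"
ET.register_namespace("xlink", XLINK)


def add_topic(topic_map, topic_id, name):
    """Append a <topic> with a single base name (hypothetical helper)."""
    topic = ET.SubElement(topic_map, "topic", {"id": topic_id})
    base_name = ET.SubElement(topic, "baseName")
    ET.SubElement(base_name, "baseNameString").text = name
    return topic


def add_association(topic_map, member_ids):
    """Append an <association> whose members reference existing topics."""
    assoc = ET.SubElement(topic_map, "association")
    for topic_id in member_ids:
        member = ET.SubElement(assoc, "member")
        ET.SubElement(member, "topicRef", {f"{{{XLINK}}}href": f"#{topic_id}"})
    return assoc


topic_map = ET.Element("topicMap")
add_topic(topic_map, "book-a", "Book A")
add_topic(topic_map, "author-b", "Author B")
add_association(topic_map, ["book-a", "author-b"])

xtm_text = ET.tostring(topic_map, encoding="unicode")
print(xtm_text)
```

A real XTM document would also declare the XTM namespace and usually give each association a type and each member a role; those are left out here to keep the sketch short.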

How It Works

Let's start with a subject: a real-world entity or an idea that we represent in our map by a topic. A subject can be almost anything, from an abstract concept to a specific document section, and the terms subject and topic are often used interchangeably.

The topic map model lets us attach three elements (called characteristics) to any given topic: its names, its associations with other topics and its occurrences (also called resources).

Names are mainly useful to people in dealing with topics, and a topic doesn't actually need a name: A typical cross-reference points to an unnamed topic. Also, we typically group topics according to some notion of type.

For example, if we're mapping an IT installation, we likely have topics for specific pieces of equipment, homegrown and purchased applications, data storage information and the like. Thus, our map would also include categorical topics such as hardware, software and data structures.
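The IT installation example can be sketched as a small in-memory structure. This is only an illustration under assumptions: the `Topic` class, its field names and the topic ids are invented here and are not part of any topic map standard. Note that one topic has no name at all, which the model permits.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Topic:
    topic_id: str
    names: list[str] = field(default_factory=list)  # may be empty
    types: set[str] = field(default_factory=set)    # ids of categorical topics


topics = [
    # Categorical topics are themselves ordinary topics.
    Topic("hardware", ["Hardware"]),
    Topic("software", ["Software"]),
    # Specific items, each typed by a categorical topic.
    Topic("mail-server", ["Mail server"], {"hardware"}),
    Topic("payroll-app", ["Payroll application"], {"software"}),
    Topic("xref-17", [], {"software"}),  # an unnamed topic
]


def group_by_type(items):
    """Map each type id to the ids of the topics that have that type."""
    groups = defaultdict(list)
    for topic in items:
        for type_id in topic.types:
            groups[type_id].append(topic.topic_id)
    return dict(groups)


by_type = group_by_type(topics)
print(by_type["hardware"])  # ['mail-server']
print(by_type["software"])  # ['payroll-app', 'xref-17']
```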

Associations are the conceptual heart of topic maps, indicating how one topic relates to another. For example, Book A (a topic) is written by (association) Author B (another topic).
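The "Book A is written by Author B" example might be modeled as follows. This is a sketch, not a standard API: the `Association` class, the role names `work` and `author`, and the `related` helper are assumptions introduced for illustration, and each role is assumed to have exactly one player.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    assoc_type: str
    roles: tuple  # ((role_name, topic_id), ...), one player per role


associations = [
    Association("written-by", (("work", "book-a"), ("author", "author-b"))),
    Association("written-by", (("work", "book-c"), ("author", "author-b"))),
]


def related(topic_id, assoc_type, from_role, to_role):
    """Return topics playing to_role wherever topic_id plays from_role."""
    found = []
    for assoc in associations:
        if assoc.assoc_type != assoc_type:
            continue
        roles = dict(assoc.roles)
        if roles.get(from_role) == topic_id and to_role in roles:
            found.append(roles[to_role])
    return found


print(related("book-a", "written-by", "work", "author"))    # ['author-b']
print(related("author-b", "written-by", "author", "work"))  # ['book-a', 'book-c']
```

Because the association records both ends, the same data answers "who wrote this book?" and "what did this author write?" without storing the relationship twice.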

Occurrences are the actual references: pointers to relevant information resources. Occurrences could include articles, books, images, audio and video segments, application code routines or even people. Typically, we refer to occurrences with uniform resource identifiers (URIs), an Internet Engineering Task Force standard for addressing and referencing resources. Web addresses (URLs) are one type of URI.
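A minimal sketch of occurrences as URI pointers attached to topics is shown below. The `Occurrence` class, the `kind` labels and every URI are placeholders invented for illustration. The scheme check relies on `urllib.parse.urlparse` from the standard library and is only a rough test, not full URI validation.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class Occurrence:
    topic_id: str
    kind: str  # e.g. "article" or "contact" (labels are assumptions)
    uri: str


def has_scheme(uri):
    """Rough check that a string at least starts with a URI scheme."""
    return bool(urlparse(uri).scheme)


occurrences = [
    Occurrence("book-a", "article", "https://example.org/reviews/book-a"),
    Occurrence("book-a", "identifier", "urn:example:book-a"),
    Occurrence("author-b", "contact", "mailto:author-b@example.org"),
]


def occurrences_of(topic_id):
    """Return the URIs of all occurrences attached to a topic."""
    return [occ.uri for occ in occurrences if occ.topic_id == topic_id]


book_uris = occurrences_of("book-a")
print(book_uris)
print(all(has_scheme(occ.uri) for occ in occurrences))
```

The three placeholder URIs use different schemes (`https`, `urn`, `mailto`), which illustrates the point that a Web URL is only one kind of URI.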

These characteristics of topics aren't universal. They exist within a limited context (called scope), where they are regarded as valid.
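One way to picture scope is to tag each characteristic with the set of contexts in which it holds. The sketch below does this for names only. The `ScopedName` class, the language codes used as scope themes, and the rule "an empty scope is valid everywhere" are assumptions made for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScopedName:
    value: str
    scope: frozenset = frozenset()  # empty scope: treated as always valid


hardware_names = [
    ScopedName("hardware"),                      # unscoped identifier-like name
    ScopedName("Hardware", frozenset({"en"})),   # valid in an English context
    ScopedName("Matériel", frozenset({"fr"})),   # valid in a French context
]


def names_in_context(names, context):
    """Keep the names whose scope themes all belong to the given context."""
    return [n.value for n in names if n.scope <= context]


print(names_in_context(hardware_names, frozenset({"en"})))  # ['hardware', 'Hardware']
print(names_in_context(hardware_names, frozenset({"fr"})))  # ['hardware', 'Matériel']
```

The same filtering idea would apply to associations and occurrences, since all three characteristics are valid only within their scope.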

The final concept is identity. Ideally, there should be one topic for each subject, and vice versa. In practice, multiple topics can represent a single subject, as when different topic maps are merged. And in a single topic map, we might find “William H. Bonney” and “Billy the Kid” as separate topic names referring to the same subject, a historical person.

But the topic name “Billy the Kid” might also refer to the ballet of that name. To get around these problems, we can unambiguously define the identity of a subject through resources called subject indicators.
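The role of subject indicators in merging can be sketched as follows: topics are merged when they share a subject indicator, never merely because they share a name. This is an illustrative sketch only. The `Topic` class, the `merge_by_identity` function and the two indicator URIs are hypothetical, and real topic map engines apply additional merging rules.

```python
from dataclasses import dataclass, field


@dataclass
class Topic:
    names: set = field(default_factory=set)
    subject_indicators: set = field(default_factory=set)


def merge_by_identity(topics):
    """Merge topics that share at least one subject indicator."""
    merged = []
    for topic in topics:
        current = Topic(set(topic.names), set(topic.subject_indicators))
        remaining = []
        for existing in merged:
            if existing.subject_indicators & current.subject_indicators:
                # Same subject: fold the existing topic into the current one.
                current.names |= existing.names
                current.subject_indicators |= existing.subject_indicators
            else:
                remaining.append(existing)
        remaining.append(current)
        merged = remaining
    return merged


# Placeholder indicator URIs, invented for this example.
PERSON = "https://example.org/subjects/billy-the-kid-person"
BALLET = "https://example.org/subjects/billy-the-kid-ballet"

result = merge_by_identity([
    Topic({"William H. Bonney"}, {PERSON}),
    Topic({"Billy the Kid"}, {PERSON}),
    Topic({"Billy the Kid"}, {BALLET}),
])

for topic in result:
    print(sorted(topic.names), sorted(topic.subject_indicators))
```

The two topics for the historical person collapse into one topic carrying both names, while the ballet stays separate even though it shares the name "Billy the Kid".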

The promise of topic maps is clear. Unfortunately, the idea of topic maps is still well ahead of its time. Tools for creating topic maps do exist, along with some implementations in specific subject areas, but these are primarily oriented toward representing and organizing content, and they don't yet adequately address the task of content creation.

But in a few more years, as Moore's Law continues to expand our computing capabilities, we may well see topic maps come into their own.
