Databases get a grip on XML
If you could do one thing to improve integration and automate processes with customers and business partners, it would be to implement XML, which has become the standard for exchanging information between disparate systems because it is easily transformed into any format. The good news is that the four leading relational databases, namely Oracle Database, IBM DB2, Sybase ASE, and Microsoft SQL Server, can not only store XML data but also hide much of the complexity of working with XML.
What does a fashionable XML database provide? Four basic functions: the ability to consume, store, search, and generate XML. The extent to which the database supports these functions and the methods it uses to accomplish them are what make for a successful implementation of XML in a database.
Relational databases and XML documents are both powerful ways to represent relationships among data, but they're powerful in different ways. For example, querying on a patient ID number in a relational database may allow you to quickly find the dates a certain patient visited the hospital, the conditions he was diagnosed with, and the treatments he was given. But it likely won't help you determine which treatments were provided for which conditions or what times the treatments took place, nor will it give you other useful information that XML versions of these records could provide.
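To make the contrast concrete, here is a minimal sketch of what an XML version of such a record might look like. The element and attribute names (`patient`, `visit`, `condition`, `treatment`, `time`) are invented for illustration and do not come from any real schema. Because each treatment is nested under the condition it was given for, the "which treatment for which condition, and when" question is answered by the document's structure itself.

```python
import xml.etree.ElementTree as ET

# Hypothetical patient record: every treatment is nested under the
# condition it was given for and carries its own time attribute.
RECORD = """
<patient id="1001">
  <visit date="2005-03-14">
    <condition name="fracture">
      <treatment time="09:30">x-ray</treatment>
      <treatment time="11:00">cast</treatment>
    </condition>
    <condition name="infection">
      <treatment time="11:45">antibiotics</treatment>
    </condition>
  </visit>
</patient>
"""


def treatments_by_condition(xml_text):
    """Map each condition name to a list of (time, treatment) pairs."""
    root = ET.fromstring(xml_text)
    result = {}
    for condition in root.iter("condition"):
        pairs = [(t.get("time"), t.text) for t in condition.findall("treatment")]
        result[condition.get("name")] = pairs
    return result


if __name__ == "__main__":
    for name, pairs in treatments_by_condition(RECORD).items():
        print(name, pairs)
```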
But whether you can combine the benefits of relational and XML data depends on how you store the XML. There are three methods for physically storing XML data in a relational database: shredded, unstructured, and structured. The shredded and unstructured methods are useful but limited. The structured method allows you to leverage the power of both relational data and XML hierarchies.
Shredding puts XML data into relational columns but strips it of its XMLness, meaning the hierarchical relationships among the data in the original XML document are lost. Shredding is useful when you're not concerned about keeping the data in XML format. For example, let's say you have a Web site that allows customers to place orders, and the order needs to go to a number of different database systems. Producing an XML file and having the different systems pick it up (that is, shred it) from a network share may be the most efficient and error-free way to get the data where you want it to go.
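As a rough sketch of what shredding amounts to, the following uses Python's built-in `sqlite3` and `xml.etree.ElementTree` modules as stand-ins for a real database and its XML loader. The order format, the table names, and the `shred_order` helper are all assumptions made for illustration. Note that once the rows are inserted, nothing in the database records that the items were once children of an `<order>` element, except the foreign key you chose to keep.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical order document, as a Web site might drop it on a share.
ORDER_XML = """
<order id="42" customer="C-7">
  <item sku="A100" qty="2"/>
  <item sku="B200" qty="1"/>
</order>
"""


def make_db():
    """Create an in-memory database with plain relational tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT)")
    conn.execute("CREATE TABLE order_items (order_id INTEGER, sku TEXT, qty INTEGER)")
    return conn


def shred_order(conn, xml_text):
    """Copy the values out of the XML into rows; the XML itself is discarded."""
    root = ET.fromstring(xml_text)
    order_id = int(root.get("id"))
    conn.execute(
        "INSERT INTO orders (id, customer) VALUES (?, ?)",
        (order_id, root.get("customer")),
    )
    rows = [
        (order_id, item.get("sku"), int(item.get("qty")))
        for item in root.findall("item")
    ]
    conn.executemany(
        "INSERT INTO order_items (order_id, sku, qty) VALUES (?, ?, ?)", rows
    )
    conn.commit()


if __name__ == "__main__":
    conn = make_db()
    shred_order(conn, ORDER_XML)
    print(conn.execute("SELECT * FROM order_items").fetchall())
```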
The unstructured method uses a data type called a CLOB (Character Large Object) to store an entire XML document as a single unit. Databases have been doing this for years with different types of documents, so this is nothing new. The unstructured method provides only limited search capabilities, but it is still quite useful. You can't base queries on the elements inside the document, but the structure of the original data is preserved. A good use for unstructured XML storage would be keeping original documents to comply with government regulations. For example, if a financial institution were to receive original loan documents in XML, unstructured storage would allow it to keep a relational record of each loan application and to store the original application with that record.
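The loan example might be sketched as follows, again with SQLite standing in for a full database server. The `loans` table, its columns, and the document format are hypothetical. SQLite treats a column declared as `CLOB` as text, which is enough to show the idea: a few values are copied into ordinary columns for querying, while the untouched original document rides along in the same row.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical loan application as received from the applicant.
LOAN_XML = "<loan><applicant>Jane Doe</applicant><amount>250000</amount></loan>"


def make_db():
    """Relational columns for querying, plus one CLOB for the original."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE loans ("
        "id INTEGER PRIMARY KEY, applicant TEXT, amount INTEGER, original_doc CLOB)"
    )
    return conn


def store_loan(conn, xml_text):
    """Insert a relational record and keep the XML verbatim beside it."""
    root = ET.fromstring(xml_text)
    cur = conn.execute(
        "INSERT INTO loans (applicant, amount, original_doc) VALUES (?, ?, ?)",
        (root.findtext("applicant"), int(root.findtext("amount")), xml_text),
    )
    conn.commit()
    return cur.lastrowid


def original_document(conn, loan_id):
    """Return the stored document exactly as it was received, or None."""
    row = conn.execute(
        "SELECT original_doc FROM loans WHERE id = ?", (loan_id,)
    ).fetchone()
    return row[0] if row else None


if __name__ == "__main__":
    conn = make_db()
    loan_id = store_loan(conn, LOAN_XML)
    print(original_document(conn, loan_id) == LOAN_XML)
```

The only search available against `original_doc` here is plain text matching (for example a `LIKE` pattern), which knows nothing about elements or nesting. That is the limitation the unstructured method trades for a byte-for-byte copy of the original.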
The structured method allows you to store XML data inside the database and preserve the hierarchy of the data. Structured storage, also known as "native XML" storage, is what every vendor is trying to achieve. The most obvious benefit of preserving the hierarchical relationships of XML data is being able to receive an XML document, combine it or manipulate it with relational data, and produce XML as a result. It isn't possible to produce such result sets with a relational query language alone.
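The receive, combine, and produce workflow can be illustrated in application code, with the caveat that this is only an emulation: a database with native XML storage would do the equivalent work inside the engine using its own XML query facilities, which differ by vendor and are not shown here. The `products` table, the `price_cents` column, and the `enrich_order` helper are assumptions made for the sketch.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical incoming document; prices live only in the relational table.
ORDER_XML = '<order id="42"><item sku="A100" qty="2"/><item sku="B200" qty="1"/></order>'


def make_db():
    """Relational side of the example: a small product price list."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE products (sku TEXT PRIMARY KEY, price_cents INTEGER)")
    conn.executemany(
        "INSERT INTO products (sku, price_cents) VALUES (?, ?)",
        [("A100", 950), ("B200", 1200)],
    )
    conn.commit()
    return conn


def enrich_order(conn, xml_text):
    """Receive XML, merge in relational data, and return XML.

    The document's hierarchy is kept intact; each <item> simply gains
    a price_cents attribute looked up from the products table.
    """
    root = ET.fromstring(xml_text)
    for item in root.iter("item"):
        row = conn.execute(
            "SELECT price_cents FROM products WHERE sku = ?", (item.get("sku"),)
        ).fetchone()
        if row is not None:
            item.set("price_cents", str(row[0]))
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(enrich_order(make_db(), ORDER_XML))
```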