Autumn 2003

INFO 320

Information Needs, Searching and Presentation

Introduction

Active Management of Information – EDITED BY RYAN PRINS

Premises:

  • Information Sharing
    • Geography
    • Time
    • Across applications
  • Modular Information
    • Document bursting (XHTML)
    • Information is initially constructed as XML

Abandon Assumptions:

  • Everyone will agree on a single view of data
  • Everyone will have the same use of information
  • Everyone will use the same technology

Managed Store of Information

One or more XML sources?

  • Technologically neutral: XML is text-based, yet it can be mapped into database structures
  • Can create XML schemas to define particular types of XML sources, e.g., a plant catalog or sales data
  • Attach schemas to XML documents with Word 2003
  • Schemas can be attached for later downstream use
  • Save the document as a plain XML document in a selected customer-defined schema, without any Word-specific markup.
  • Save the document as a rich Word XML document including all the information that is saved in the .doc format (such as custom property metadata and so on).

The goal is an information design that reflects my needs and my view of my information right now. Yesterday it was different, and tomorrow it may be different again. It may be unlike anyone else's view of the same or similar data. Newer technology gives you the flexibility to work with your information this way: your view may change day by day, but with advances in XML technology you can adapt to those changes with little effect on performance.

 

Input Update and Integration

I receive XML and blend it into my own information store

  • Sender's technological platform is of no concern to me
  • Sender's presentation preferences are of no concern to me
  • I can validate sender's data against an XML schema
  • I can bridge the semantics of sender's data to my own
  • Trap events after an element from a customer-defined schema is inserted, before an element is deleted, and when the user moves among elements in an XML document
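
As a rough illustration of receiving XML and blending it into my own store, here is a minimal Python sketch. It is not how Word 2003 does this. The element names, the SENDER_TO_MINE mapping, and the REQUIRED_FIELDS check are all hypothetical, and the required-field check is only a stand-in for validating against a real XML schema.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from the sender's element names to my own vocabulary.
SENDER_TO_MINE = {"item": "plant", "title": "common_name", "cost": "price"}

# Hypothetical required children of each sender <item>.
# This simple check stands in for validation against a real XML schema.
REQUIRED_FIELDS = ("title", "cost")


def check_item(item):
    """Return the names of required child elements missing from one sender item."""
    return [name for name in REQUIRED_FIELDS if item.find(name) is None]


def bridge(item):
    """Copy a sender element into my vocabulary, renaming any mapped tags."""
    mine = ET.Element(SENDER_TO_MINE.get(item.tag, item.tag), dict(item.attrib))
    mine.text = item.text
    for child in item:
        mine.append(bridge(child))
    return mine


def blend(store, incoming_xml):
    """Append every acceptable sender item to the store; return the rejected count."""
    rejected = 0
    for item in ET.fromstring(incoming_xml).iter("item"):
        if check_item(item):
            rejected += 1
        else:
            store.append(bridge(item))
    return rejected


store = ET.fromstring(
    "<catalog><plant><common_name>Fern</common_name><price>4.50</price></plant></catalog>"
)
incoming = (
    "<items>"
    "<item><title>Bloodroot</title><cost>2.44</cost></item>"
    "<item><title>Columbine</title></item>"  # no <cost>, so it is rejected
    "</items>"
)
rejected = blend(store, incoming)
```

Nothing in the sketch depends on the sender's platform or presentation preferences; only the structure and meaning of the incoming elements matter.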

Diverse origins of input

The new capabilities in Word 2003 extend this notion further. You can save important metadata within an XML file so that your company can later use it with a better understanding of what specific data means. The metadata can also be stripped from the file when needed, so that the stripped file can be passed along without the excess metadata attached. This benefits any company internally and makes its XML sources more useful.
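
To make the idea of stripping metadata concrete, here is a small Python sketch. It assumes, purely for illustration, that internal metadata is carried in elements named "meta"; Word 2003 stores its own metadata differently, and this is not its mechanism.

```python
import copy
import xml.etree.ElementTree as ET

META_TAG = "meta"  # hypothetical element name for internal metadata


def strip_metadata(root):
    """Return a copy of the tree with every META_TAG element removed."""
    clean = copy.deepcopy(root)  # leave the internal, metadata-rich copy untouched
    for parent in list(clean.iter()):
        for child in list(parent):
            if child.tag == META_TAG:
                parent.remove(child)
    return clean


internal = ET.fromstring(
    "<order>"
    "<meta><owner>sales</owner></meta>"
    "<plant><meta>reviewed</meta><name>Fern</name></plant>"
    "</order>"
)
shared = strip_metadata(internal)
```

The internal copy keeps its metadata for later use, while the stripped copy is what gets passed along.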

Styling Output

No output is pre-ordained; all output is contingent on time, taste, technology, etc.

  • Style output as XML to share with downstream information consumers
  • Style output to present on Web, Wireless, etc.
  • Style output as input to other technologies, e.g., databases
  • Output can include your own metadata for internal use or none at all
  • Apply customer-defined XSLT transforms before saving the XML document
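
The point that no output is pre-ordained can be sketched in Python: one XML source, styled three ways. Plain Python functions stand in here for XSLT transforms, since the Python standard library has no XSLT processor; the catalog content and function names are hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

SOURCE = (
    "<catalog>"
    "<plant><name>Fern</name><price>4.50</price></plant>"
    "<plant><name>Columbine</name><price>9.37</price></plant>"
    "</catalog>"
)


def to_html(root):
    """Style the catalog as an HTML list for presentation on the Web."""
    items = "".join(
        f"<li>{escape(p.findtext('name'))}: ${escape(p.findtext('price'))}</li>"
        for p in root.iter("plant")
    )
    return f"<ul>{items}</ul>"


def to_rows(root):
    """Style the catalog as (name, price) tuples, ready for a database insert."""
    return [(p.findtext("name"), float(p.findtext("price"))) for p in root.iter("plant")]


def to_shared_xml(root):
    """Style the catalog as compact XML for a downstream information consumer."""
    out = ET.Element("plants")
    for p in root.iter("plant"):
        ET.SubElement(out, "plant", name=p.findtext("name"))
    return ET.tostring(out, encoding="unicode")


root = ET.fromstring(SOURCE)
html_view = to_html(root)
db_rows = to_rows(root)
shared_xml = to_shared_xml(root)
```

Each function reads the same source and decides its own output; adding a new output later means adding a function, not changing the stored XML.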

 

You now have many options for styling your data for output. You can attach a style sheet from the beginning, so that the style sheet has already been applied by the time your XML is saved. This way a given XML file always produces the same output, which creates uniformity in your files that you can carry all the way down the line.

How this applies when using Microsoft Word 2003

 

With the release of Microsoft Word 2003, you can create, edit, and distribute XML files, schemas, and style sheets within one platform. You can also add your own metadata for internal use within your company. This way you always have a clean source to work from, with less risk of losing valuable data. The new XML features in Word 2003 are listed below:

  • Add schemas to the schema library.
  • Attach schemas to XML documents.
  • Manage elements and attributes in XML documents (add, change, delete, cut, copy, and so on).
  • Save the document as a plain XML document in a selected customer-defined schema, without any Word-specific markup.
  • Save the document as a rich Word XML document including all the information that is saved in the .doc format (such as custom property metadata and so on).
  • Apply customer-defined XSLT transforms before saving the XML document.
  • Trap events after an element from a customer-defined schema is inserted, before an element is deleted, and when the user moves among elements in an XML document.
  • Validate selected nodes on an as-needed basis against schemas.
  • Trap events after a violation of a customer-defined schema has occurred.
  • Provide custom XML document validation error handling actions.


Question: What's the best way to store large amounts of XML data in SQL Server? What are the performance implications of storing it in large chunks versus breaking it out into tables? (MSDN Magazine, Web Q&A, May 2003, p. 17)
Answer: Different criteria play a role in that decision. If the data in the XML document is highly structured and fits into a relational model, is often queried at a granular level, and rarely needs to be returned in its original form (in other words, order does not matter), then decomposition into columnar data is better. If you have more document-oriented XML where order matters and recomposition costs are high, a Character Large Object or XML datatype-like approach is better.
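
The two storage choices in the answer can be sketched side by side in Python. SQLite (through the standard sqlite3 module) stands in for SQL Server here, a TEXT column stands in for a Character Large Object, and the table and column names are hypothetical.

```python
import sqlite3
import xml.etree.ElementTree as ET

DOC = (
    "<catalog>"
    "<plant><name>Fern</name><price>4.50</price></plant>"
    "<plant><name>Columbine</name><price>9.37</price></plant>"
    "</catalog>"
)

conn = sqlite3.connect(":memory:")

# Approach 1: decompose the XML into columns.
# Granular queries are easy, but the original document order and form are lost.
conn.execute("CREATE TABLE plant (name TEXT, price REAL)")
conn.executemany(
    "INSERT INTO plant VALUES (?, ?)",
    [
        (p.findtext("name"), float(p.findtext("price")))
        for p in ET.fromstring(DOC).iter("plant")
    ],
)
cheap = conn.execute(
    "SELECT name FROM plant WHERE price < 5 ORDER BY name"
).fetchall()

# Approach 2: store the whole document as one large text value.
# The document comes back exactly as stored, but the database cannot query inside it.
conn.execute("CREATE TABLE document (id INTEGER PRIMARY KEY, body TEXT)")
conn.execute("INSERT INTO document (body) VALUES (?)", (DOC,))
restored = conn.execute("SELECT body FROM document WHERE id = 1").fetchone()[0]
```

The first table answers questions such as "which plants cost less than 5?" directly in SQL; the second returns the document intact without any recomposition step.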