Saving the Word 2003 document
Although Word 2003 saves documents by default as .doc files, a binary format,
a document can also be saved with the .docx extension, the default file format
of Word 2007 (Word 2003 needs the Microsoft Office Compatibility Pack installed
to do this). Consider a Word document that contains a single line, "This is a
test", followed immediately by an image (arrow.jpg, 855 bytes), as shown in
Figure 1. Saved as a .doc file, the document occupies 19.5 KB, including the
855-byte image. Saved as a .docx file, the same document occupies only 11.7 KB,
a reduction of about 40 percent.
Office Open XML (OOXML) Structure of the Word 2003 document
To see the components of this document, which is stored in Office Open XML,
the format standardized by Ecma as ECMA-376, you only need to change the file
extension to .zip. ZIP is a data compression and archival format that began
with the PKZIP and PKUNZIP utilities, and it is well suited to bundling folders
and files together and compressing them for archival purposes. An Office Open
XML document is in fact a ZIP archive whose parts are stored in several related
folders.
OriginalTest.doc (Word 2003), which contains a single line of text and an
image, was saved as OriginalTest.docx, and its extension was then changed to
give OriginalTest.zip. Renaming does not alter the file's contents; it only
lets ZIP tools recognize the file as an archive. When the archive is extracted
into a folder (here named Backup of OriginalTestZIP), it contains the folders
and files shown in Figure 2.
Figure 2
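Renaming is not strictly required to look inside the package. As a minimal sketch, assuming Python 3 and its standard zipfile module, the hypothetical helper list_package_parts below opens a .docx directly and lists its entries. The demo builds a small in-memory archive with placeholder contents that imitate a few of the entries in Figure 2; it is not a complete Word document.

```python
import io
import zipfile


def list_package_parts(package_file):
    """Return the entry names of an OOXML package, which is a ZIP archive."""
    # zipfile opens a .docx directly, so renaming it to .zip is optional here.
    with zipfile.ZipFile(package_file) as package:
        return package.namelist()


# Demo: a tiny in-memory archive mimicking a few entries from Figure 2.
# The entry contents are placeholders, not a valid Word document.
listing_demo = io.BytesIO()
with zipfile.ZipFile(listing_demo, "w", zipfile.ZIP_DEFLATED) as archive:
    archive.writestr("[Content_Types].xml", "<Types/>")
    archive.writestr("word/document.xml", "<document/>")
    archive.writestr("word/media/image1.jpeg", b"\xff\xd8\xff")

for name in list_package_parts(listing_demo):
    print(name)
```

Against a real file, list_package_parts("OriginalTest.docx") should list the same entries that appear after unzipping.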
[Content_Types].xml lists the content types of the parts in the
OriginalTest.docx package and is shown in Listing 1. It acts as a manifest for
the contained folders and files: each Default element maps a file extension to
a content type, and each Override element assigns a content type to one
specific part.
Listing 1: [Content_Types].xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
<Default Extension="jpeg" ContentType="image/jpeg"/>
<Default Extension="rels" ContentType="application/vnd.openxmlformats-package.relationships+xml"/>
<Default Extension="xml" ContentType="application/xml"/>
<Override PartName="/word/document.xml"
ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
<Override PartName="/word/styles.xml"
ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.styles+xml"/>
<Override PartName="/docProps/app.xml"
ContentType="application/vnd.openxmlformats-officedocument.extended-properties+xml"/>
<Override PartName="/word/settings.xml"
ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.settings+xml"/>
<Override PartName="/word/theme/theme1.xml"
ContentType="application/vnd.openxmlformats-officedocument.theme+xml"/>
<Override PartName="/word/fontTable.xml"
ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.fontTable+xml"/>
<Override PartName="/word/webSettings.xml"
ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.webSettings+xml"/>
<Override PartName="/docProps/core.xml"
ContentType="application/vnd.openxmlformats-package.core-properties+xml"/>
</Types>
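Listing 1 implies a lookup rule which, as we understand the Open Packaging Conventions (ECMA-376 Part 2), works like this: an Override for a part's exact name takes precedence, and otherwise the Default registered for the part's extension applies. The following is a sketch of that rule in Python using the standard xml.etree.ElementTree module; content_type_of is a hypothetical helper name, and the sample is an excerpt of Listing 1.

```python
import xml.etree.ElementTree as ET

CT_NS = "{http://schemas.openxmlformats.org/package/2006/content-types}"


def content_type_of(part_name, content_types_xml):
    """Look up a part's content type in [Content_Types].xml.

    An Override for the exact part name wins; otherwise the Default for the
    file extension is used. Comparisons are case-insensitive.
    """
    root = ET.fromstring(content_types_xml)
    for override in root.iter(CT_NS + "Override"):
        if override.get("PartName", "").lower() == part_name.lower():
            return override.get("ContentType")
    extension = part_name.rsplit(".", 1)[-1].lower()
    for default in root.iter(CT_NS + "Default"):
        if default.get("Extension", "").lower() == extension:
            return default.get("ContentType")
    return None  # no content type registered for this part


# An excerpt of Listing 1.
SAMPLE_TYPES = """<Types xmlns="http://schemas.openxmlformats.org/package/2006/content-types">
<Default Extension="jpeg" ContentType="image/jpeg"/>
<Default Extension="xml" ContentType="application/xml"/>
<Override PartName="/word/document.xml"
ContentType="application/vnd.openxmlformats-officedocument.wordprocessingml.document.main+xml"/>
</Types>"""

print(content_type_of("/word/document.xml", SAMPLE_TYPES))
print(content_type_of("/word/media/image1.jpeg", SAMPLE_TYPES))
```

With this excerpt, /word/media/image1.jpeg has no Override, so it falls back to the jpeg Default and resolves to image/jpeg.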
The word folder in the tree contains styles.xml, settings.xml, fontTable.xml,
webSettings.xml and document.xml, along with the theme subfolder that holds
theme1.xml. Together these parts supply the resources needed to display the
document.
The document.xml shown in Listing 2 contains
the body of the document.
Listing 2: document.xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<w:document
xmlns:ve="http://schemas.openxmlformats.org/markup-compatibility/2006"
xmlns:o="urn:schemas-microsoft-com:office:office"
xmlns:r="http://schemas.openxmlformats.org/officeDocument/2006/relationships"
xmlns:m="http://schemas.openxmlformats.org/officeDocument/2006/math"
xmlns:v="urn:schemas-microsoft-com:vml"
xmlns:wp="http://schemas.openxmlformats.org/drawingml/2006/wordprocessingDrawing"
xmlns:w10="urn:schemas-microsoft-com:office:word"
xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main"
xmlns:wne="http://schemas.microsoft.com/office/word/2006/wordml">
<w:body><w:p w:rsidR="00A605EA" w:rsidRDefault="00A605EA">
<w:r>
<w:t>This is a test</w:t>
</w:r>
</w:p>
<w:p w:rsidR="00A605EA" w:rsidRPr="00943E24" w:rsidRDefault="00A605EA"><w:r w:rsidRPr="00943E24">
<w:pict><v:shapetype id="_x0000_t75" coordsize="21600,21600"
o:spt="75" o:preferrelative="t"
path="m@4@5l@4@11@9@11@9@5xe" filled="f"
stroked="f">
<v:stroke joinstyle="miter"/>
<v:formulas>
<v:f eqn="if lineDrawn pixelLineWidth 0"/>
<v:f eqn="sum @0 1 0"/>
<v:f eqn="sum 0 0 @1"/>
<v:f eqn="prod @2 1 2"/>
<v:f eqn="prod @3 21600 pixelWidth"/>
<v:f eqn="prod @3 21600 pixelHeight"/>
<v:f eqn="sum @0 0 1"/><v:f eqn="prod @6 1 2"/>
<v:f eqn="prod @7 21600 pixelWidth"/>
<v:f eqn="sum @8 21600 0"/>
<v:f eqn="prod @7 21600 pixelHeight"/>
<v:f eqn="sum @10 21600 0"/></v:formulas>
<v:path o:extrusionok="f"
gradientshapeok="t" o:connecttype="rect"/><o:lock v:ext="edit" aspectratio="t"/>
</v:shapetype>
<v:shape id="_x0000_i1025"
type="#_x0000_t75" style="width:45pt;height:45pt">
<v:imagedata r:id="rId4" o:title=""/>
</v:shape>
</w:pict>
</w:r>
</w:p>
<w:p w:rsidR="00A605EA"
w:rsidRPr="002C481F" w:rsidRDefault="00A605EA"/>
<w:sectPr w:rsidR="00A605EA"
w:rsidRPr="002C481F" w:rsidSect="00A605EA"><w:pgSz w:w="11906" w:h="16838"/>
<w:pgMar w:top="1440"
w:right="1800" w:bottom="1440" w:left="1800"
w:header="720" w:footer="720" w:gutter="0"/>
<w:cols w:space="720"/>
<w:docGrid w:type="lines"
w:linePitch="360"/>
</w:sectPr>
</w:body>
</w:document>
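Most of Listing 2 is layout and VML picture markup; the visible text lives in w:t elements inside runs (w:r), which sit inside paragraphs (w:p). Below is a minimal Python sketch, assuming only the standard xml.etree.ElementTree module, that collects each paragraph's text. paragraph_texts is a hypothetical name, and the function makes no attempt to handle tabs, breaks, fields or text boxes.

```python
import xml.etree.ElementTree as ET

W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def paragraph_texts(document_xml):
    """Return the text of each w:p paragraph by joining its w:t elements."""
    root = ET.fromstring(document_xml)
    texts = []
    for paragraph in root.iter(W + "p"):
        texts.append("".join(t.text or "" for t in paragraph.iter(W + "t")))
    return texts


# A trimmed version of Listing 2 (the VML picture markup is reduced to <w:pict/>).
SAMPLE_DOCUMENT = """<w:document xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">
<w:body>
<w:p><w:r><w:t>This is a test</w:t></w:r></w:p>
<w:p><w:r><w:pict/></w:r></w:p>
<w:p/>
</w:body>
</w:document>"""

print(paragraph_texts(SAMPLE_DOCUMENT))
```

The second paragraph holds only the picture and the third is empty, so both yield empty strings.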
Similarly, the other XML files describe in detail the settings, fonts, styles,
theme and web settings that apply to the document. They are not listed in this
article, but they are easy to produce by opening them from the extracted folder.
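Because every part is ordinary XML, producing a readable copy of one of the unlisted parts takes only a few lines. The sketch below assumes Python's standard zipfile and xml.dom.minidom modules; pretty_part is a hypothetical helper, and the webSettings.xml content in the demo is a minimal illustrative stand-in, not the file Word produced.

```python
import io
import zipfile
from xml.dom import minidom


def pretty_part(package_file, part_name):
    """Read one XML part from an OOXML package and return it indented."""
    # Inside the ZIP, entry names have no leading "/" (unlike PartName values).
    with zipfile.ZipFile(package_file) as package:
        raw = package.read(part_name)
    return minidom.parseString(raw).toprettyxml(indent="  ")


# Demo: an in-memory package with an illustrative, minimal webSettings.xml.
settings_demo = io.BytesIO()
with zipfile.ZipFile(settings_demo, "w") as archive:
    archive.writestr(
        "word/webSettings.xml",
        '<w:webSettings xmlns:w="http://schemas.openxmlformats.org/'
        'wordprocessingml/2006/main"><w:optimizeForBrowser/></w:webSettings>',
    )

print(pretty_part(settings_demo, "word/webSettings.xml"))
```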
The media folder holds media files such as pictures. In this case, arrow.jpg
is stored in the media folder as image1.jpeg.
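Images can be pulled out of the package without unzipping everything. In this sketch (Python standard library only; media_files is a hypothetical name), every entry under word/media/ is returned together with its bytes. The demo image is placeholder data, not arrow.jpg.

```python
import io
import zipfile


def media_files(package_file):
    """Return {entry name: bytes} for every file stored under word/media/."""
    with zipfile.ZipFile(package_file) as package:
        return {
            name: package.read(name)
            for name in package.namelist()
            if name.startswith("word/media/") and not name.endswith("/")
        }


# Demo package: the image bytes are a placeholder, not a real JPEG.
media_demo = io.BytesIO()
with zipfile.ZipFile(media_demo, "w") as archive:
    archive.writestr("word/document.xml", "<document/>")
    archive.writestr("word/media/image1.jpeg", b"\xff\xd8\xff placeholder")

for name, data in media_files(media_demo).items():
    print(name, len(data), "bytes")
```

To save an image to disk, write its bytes to a file opened with mode "wb".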
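The _rels folder inside the word folder has a single file, document.xml.rels,
which is also XML (Listing 3). Each Relationship element pairs an Id with a
relationship type and a target part, and this is how the constituent parts of
the document are tied together. For example, the v:imagedata element in
Listing 2 refers to the picture as r:id="rId4", and Listing 3 maps rId4 to
media/image1.jpeg.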
Listing 3: document.xml.rels
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Relationships
xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="rId3"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/webSettings"
Target="webSettings.xml"/>
<Relationship Id="rId2"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/settings"
Target="settings.xml"/>
<Relationship Id="rId1"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles"
Target="styles.xml"/>
<Relationship Id="rId6"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/theme"
Target="theme/theme1.xml"/>
<Relationship Id="rId5"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/fontTable"
Target="fontTable.xml"/>
<Relationship Id="rId4"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
Target="media/image1.jpeg"/>
</Relationships>
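Resolving a reference such as rId4 therefore means looking up its Id in document.xml.rels and interpreting Target relative to the folder of the source part, here word/. The following Python sketch assumes that rule (our reading of the Open Packaging Conventions) and passes relationships marked TargetMode="External", such as hyperlinks, through unchanged; relationship_targets is a hypothetical helper and the sample is an excerpt of Listing 3.

```python
import posixpath
import xml.etree.ElementTree as ET

REL_NS = "{http://schemas.openxmlformats.org/package/2006/relationships}"


def relationship_targets(rels_xml, source_dir="word"):
    """Map each relationship Id to its target.

    Internal targets are resolved relative to the folder of the source part
    (word/ for document.xml); external targets (e.g. URLs) are kept as-is.
    """
    root = ET.fromstring(rels_xml)
    targets = {}
    for rel in root.iter(REL_NS + "Relationship"):
        target = rel.get("Target")
        if rel.get("TargetMode") != "External":
            joined = posixpath.join(source_dir, target)
            # normpath folds "../" segments; lstrip handles absolute targets.
            target = posixpath.normpath(joined).lstrip("/")
        targets[rel.get("Id")] = target
    return targets


# An excerpt of Listing 3.
SAMPLE_RELS = """<Relationships xmlns="http://schemas.openxmlformats.org/package/2006/relationships">
<Relationship Id="rId1"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/styles"
Target="styles.xml"/>
<Relationship Id="rId4"
Type="http://schemas.openxmlformats.org/officeDocument/2006/relationships/image"
Target="media/image1.jpeg"/>
</Relationships>"""

# Listing 2 refers to the picture as r:id="rId4".
print(relationship_targets(SAMPLE_RELS)["rId4"])
```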
The docProps folder contains two files, app.xml and core.xml. Listing 4 shows
app.xml, which holds document-wide (extended) properties such as the template
used, the application that wrote the file, page and word counts, and the
document security setting.
Listing 4: app.xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<Properties
xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties"
xmlns:vt="http://schemas.openxmlformats.org/officeDocument/2006/docPropsVTypes">
<Template>Normal_Wordconv.dotm</Template>
<TotalTime>1</TotalTime>
<Pages>1</Pages>
<Words>2</Words>
<Characters>16</Characters>
<Application>Microsoft Office Outlook</Application>
<DocSecurity>0</DocSecurity>
<Lines>0</Lines>
<Paragraphs>0</Paragraphs>
<ScaleCrop>false</ScaleCrop>
<Company> Hodentek</Company>
<LinksUpToDate>false</LinksUpToDate>
<CharactersWithSpaces>0</CharactersWithSpaces>
<SharedDoc>false</SharedDoc>
<HyperlinksChanged>false</HyperlinksChanged>
<AppVersion>12.0000</AppVersion>
</Properties>
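The values in app.xml are simple elements, so they are easy to read programmatically. A minimal sketch with Python's standard xml.etree.ElementTree module follows; extended_properties is a hypothetical helper, the sample is an excerpt of Listing 4, and every value comes back as a string.

```python
import xml.etree.ElementTree as ET


def extended_properties(app_xml):
    """Return the top-level elements of app.xml as a {local name: text} dict.

    Structured values (for example vt:vector lists) are not decoded here;
    they simply come back as empty strings.
    """
    root = ET.fromstring(app_xml)
    props = {}
    for child in root:
        # ElementTree tags look like "{namespace-uri}Pages"; keep "Pages".
        local_name = child.tag.split("}", 1)[-1]
        props[local_name] = (child.text or "").strip()
    return props


# An excerpt of Listing 4.
SAMPLE_APP = """<Properties xmlns="http://schemas.openxmlformats.org/officeDocument/2006/extended-properties">
<Template>Normal_Wordconv.dotm</Template>
<Pages>1</Pages>
<Words>2</Words>
<Company> Hodentek</Company>
<AppVersion>12.0000</AppVersion>
</Properties>"""

app = extended_properties(SAMPLE_APP)
print(app["Template"], int(app["Pages"]), int(app["Words"]))
```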
core.xml, shown in Listing 5, holds the core properties of the document: the
title, the creator, the revision number, the creation and modification dates,
and other values that were either supplied by the author or filled in by
default.
Listing 5: core.xml
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<cp:coreProperties
xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
xmlns:dc="http://purl.org/dc/elements/1.1/"
xmlns:dcterms="http://purl.org/dc/terms/"
xmlns:dcmitype="http://purl.org/dc/dcmitype/"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<dc:title>This is a test</dc:title>
<dc:subject>
</dc:subject>
<dc:creator>HP Authorized Customer</dc:creator>
<cp:keywords></cp:keywords>
<dc:description></dc:description>
<cp:lastModifiedBy>HP Authorized Customer</cp:lastModifiedBy>
<cp:revision>2</cp:revision>
<dcterms:created xsi:type="dcterms:W3CDTF">2007-04-10T14:49:00Z</dcterms:created>
<dcterms:modified xsi:type="dcterms:W3CDTF">2007-04-10T14:49:00Z</dcterms:modified>
</cp:coreProperties>
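Unlike app.xml, core.xml mixes several namespaces (cp, dc and dcterms), so the lookups need a prefix map. The sketch below, assuming Python 3.7 or later for datetime.fromisoformat and using a hypothetical core_properties helper, reads a few of the values from an excerpt of Listing 5.

```python
import xml.etree.ElementTree as ET
from datetime import datetime

NS = {
    "cp": "http://schemas.openxmlformats.org/package/2006/metadata/core-properties",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}


def core_properties(core_xml):
    """Pick a few core properties out of core.xml."""
    root = ET.fromstring(core_xml)

    def text(path):
        # findtext returns "" for an empty element and the default if missing.
        return root.findtext(path, default="", namespaces=NS).strip()

    created = text("dcterms:created")
    return {
        "title": text("dc:title"),
        "creator": text("dc:creator"),
        "revision": int(text("cp:revision") or 0),
        # W3CDTF timestamps end in "Z" (UTC); "+00:00" lets fromisoformat
        # parse them on Python versions before 3.11 as well.
        "created": datetime.fromisoformat(created.replace("Z", "+00:00")) if created else None,
    }


# An excerpt of Listing 5.
SAMPLE_CORE = """<cp:coreProperties
 xmlns:cp="http://schemas.openxmlformats.org/package/2006/metadata/core-properties"
 xmlns:dc="http://purl.org/dc/elements/1.1/"
 xmlns:dcterms="http://purl.org/dc/terms/"
 xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<dc:title>This is a test</dc:title>
<dc:creator>HP Authorized Customer</dc:creator>
<cp:revision>2</cp:revision>
<dcterms:created xsi:type="dcterms:W3CDTF">2007-04-10T14:49:00Z</dcterms:created>
</cp:coreProperties>"""

core = core_properties(SAMPLE_CORE)
print(core)
```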