The OpenDocument format (ODF) supports two ways of representing a document:
·
As a single XML document.
·
As a collection of several subdocuments within a package.
Office applications use the second approach, so we will explain
it in detail.
With this approach, an ODF file is a package (a ZIP file) that holds several
subdocuments, each of which stores one particular aspect of the document. For
example, one subdocument contains the style information and another subdocument
contains the content of the document.
This approach has the following benefits:
·
You don’t need to process the entire file in order to extract
specific data.
·
Images and multimedia are stored in their native format, not as
text streams.
·
Files are smaller as a result of compression and native
multimedia storage.
Four subdocuments in the package contain the
file’s data:
·
content.xml - Document content and
automatic styles used in the content.
·
styles.xml - Styles used in the
document content and automatic styles used in the styles themselves.
·
meta.xml - Document meta information,
such as the author or the time of the last save action.
·
settings.xml - Application-specific
settings, such as the window size or printer information.
Besides these, the package can contain many other parts, such as
a document thumbnail, images, etc.
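To see which of these subdocuments a given package actually contains, you can list its entries. The following is a minimal sketch using Python's standard zipfile module; the function name classify_parts and the constant STANDARD_PARTS are illustrative names chosen here, not part of any ODF library.

```python
import zipfile

# The four standard subdocuments named in the text above.
STANDARD_PARTS = ("content.xml", "styles.xml", "meta.xml", "settings.xml")


def classify_parts(odf_file):
    """Split the entries of an ODF package into standard and other parts.

    `odf_file` may be a path or a binary file-like object.
    (Hypothetical helper, written only for this illustration.)
    """
    with zipfile.ZipFile(odf_file) as package:
        names = package.namelist()
    # Standard subdocuments that are present, in the order listed above.
    standard = [name for name in STANDARD_PARTS if name in names]
    # Everything else: thumbnails, images, and so on.
    others = [name for name in names if name not in STANDARD_PARTS]
    return standard, others
```

Calling classify_parts on a document saved by an office application returns two lists: the standard subdocuments found and all remaining entries in the package.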
In order to read the data from an ODF file, you need to:
1.
Open the package as a ZIP archive.
2.
Find the parts that contain the data you want to read.
3.
Read the parts you are interested in.
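The three reading steps can be sketched in Python with the standard zipfile and xml.etree.ElementTree modules. This sketch assumes a text document whose paragraphs are stored as text:p elements in content.xml; the function name read_paragraphs is illustrative only.

```python
import zipfile
import xml.etree.ElementTree as ET

# XML namespace of the ODF text elements (text:p, text:h, ...).
TEXT_NS = "urn:oasis:names:tc:opendocument:xmlns:text:1.0"


def read_paragraphs(odf_file):
    """Return the plain text of every text:p element in content.xml.

    `odf_file` may be a path or a binary file-like object.
    (Hypothetical helper, written only for this illustration.)
    """
    # 1. Open the package as a ZIP archive.
    with zipfile.ZipFile(odf_file) as package:
        # 2. Find the part that contains the data we want.
        if "content.xml" not in package.namelist():
            raise KeyError("package has no content.xml part")
        # 3. Read only that part; the other parts are never decompressed.
        xml_bytes = package.read("content.xml")

    root = ET.fromstring(xml_bytes)
    # itertext() also collects text nested in child elements such as spans.
    return ["".join(p.itertext()) for p in root.iter(f"{{{TEXT_NS}}}p")]
```

Because only content.xml is read, the styles, settings, and any embedded images are never touched, which is the first benefit listed earlier.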
On the other hand, if you want to create a new ODF file, you
need to:
1.
Create or obtain all the necessary parts.
2.
Package everything into a ZIP file with the appropriate extension.
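The two creation steps can be sketched in Python as follows. This is a minimal illustration, not a complete writer. It relies on two details from the ODF packaging rules that the text above does not cover: a mimetype entry stored first and uncompressed, and a META-INF/manifest.xml part listing the package contents. The function name write_minimal_odt is illustrative, and the sketch has not been checked against any particular office application.

```python
import zipfile
from xml.sax.saxutils import escape

ODT_MIMETYPE = "application/vnd.oasis.opendocument.text"

# Step 1: the necessary parts, here as templates for a one-paragraph document.
CONTENT_XML = """<?xml version="1.0" encoding="UTF-8"?>
<office:document-content
    xmlns:office="urn:oasis:names:tc:opendocument:xmlns:office:1.0"
    xmlns:text="urn:oasis:names:tc:opendocument:xmlns:text:1.0"
    office:version="1.2">
  <office:body>
    <office:text>
      <text:p>{text}</text:p>
    </office:text>
  </office:body>
</office:document-content>
"""

MANIFEST_XML = """<?xml version="1.0" encoding="UTF-8"?>
<manifest:manifest
    xmlns:manifest="urn:oasis:names:tc:opendocument:xmlns:manifest:1.0"
    manifest:version="1.2">
  <manifest:file-entry manifest:full-path="/"
      manifest:media-type="application/vnd.oasis.opendocument.text"/>
  <manifest:file-entry manifest:full-path="content.xml"
      manifest:media-type="text/xml"/>
</manifest:manifest>
"""


def write_minimal_odt(target, text):
    """Write a one-paragraph text document to `target` (path or file object).

    (Hypothetical helper, written only for this illustration.)
    """
    content = CONTENT_XML.format(text=escape(text))
    # Step 2: package everything into a ZIP file.
    with zipfile.ZipFile(target, "w", zipfile.ZIP_DEFLATED) as package:
        # The mimetype entry goes first and is stored without compression.
        package.writestr("mimetype", ODT_MIMETYPE,
                         compress_type=zipfile.ZIP_STORED)
        package.writestr("META-INF/manifest.xml", MANIFEST_XML)
        package.writestr("content.xml", content)
```

For a text document the target path would end in .odt, for example write_minimal_odt("hello.odt", "Hello, ODF"), where hello.odt is a placeholder file name.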