Excel 2007 extends on the basis of Open Packaging
Conventions by adding its own application-specific XML types. Reference
schemas for all XML files used in Office can be downloaded from MSDN, but some
things are still open to change until the final Excel 2007 release.
We just want to write/read worksheet data, so we need to
look in the folder “\xl\worksheets” inside XLSX file where all the worksheets
are located. For every worksheet there is a separate XML file, “sheet1.xml,” “sheet2.xml”
and so on. When you open such a file you will notice that all of the sheet
data is inside the <sheetData> element. For every row there is a
<row> element and for every cell there is a <c> element. Finally, the
value of the cell is stored in a <v> element.
However, real world XML is never as simple as schoolbook
XML. You will notice that numbers get encoded as numbers inside <v>
element.
Listing 1
<c r="A1">
<v>100</v>
</c>
However, string value (like “John”) also gets encoded as a
number.
Listing 2
<c r="B1" t="s">
<v>0</v>
</c>
That is because MS Excel uses internal table of unique
strings (for performance reasons). Zero is an index of that string in an
internal table of strings and attribute t="s" tells us that the underlying
type is a string, not a number. So where is the table of unique strings
located? It is in “\xl\sharedStrings.xml” XML file and contains all strings
used in the entire workbook, not just a specific worksheet.
This approach is used for many other things: cell styles, borders,
charts, number formats, etc. In fact, that becomes the major programming
problem when working with XLSX files- updating and maintaining various tables
of some unique Excel objects. In this article we will just write/read data
values, but if you require some complex formatting you should probably be using
some commercial component which does all the tedious work for you.