I just worked on a project where we had to convert Word documents into HTML for a webpage. Each document had a table of contents laid out with nothing but tabs, and we needed it as an HTML table. The problem with simply converting the document into HTML is that you get a lot of extra HTML markup from MS Office. Also, none of the data was in a table. Here I will break down how I turned the tabbed TOC data into a table in Word and how I cleaned up most of the HTML markup (hint: I didn’t save the document as an HTML file in Word). Note: I am working in Office 2003, but these functions should also be available in the new Office 2007.
Part 1: Creating Tables in Word with Just Tabbed Data
1. Highlight all the data in the document (Control+A or Apple+A).
2. Go to Table -> Convert -> Text to Table.
3. In the popup box, I only wanted two columns, but I actually entered 1 in the text box labeled Number of columns. If I entered 2, I sometimes got an extra column. Play around with this number, because it might be affected by the way the document was created.
4. Under AutoFit Behavior, I used the defaults. Under Separate Text At, I chose the Tabs radio button (since that’s how the TOC was laid out and created).
5. You will now have a table around the data. You can also clean up some of the data (like merging cells) while the file is still in the Word document format. Save the file.
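If you would rather script this step, the same idea can be sketched in a few lines of Python. This is only an illustration, not what I did for the project: the function name `tabbed_lines_to_html_table` and its `columns` parameter are made up for the example, and it assumes the TOC text has already been copied out of Word as plain text with one entry per line. It also assumes that runs of repeated tabs should count as a single separator, which may or may not be why Word sometimes gave me an extra column.

```python
from html import escape


def tabbed_lines_to_html_table(text, columns=2):
    """Turn tab-separated lines into an HTML table string (hypothetical helper)."""
    rows = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        # Drop empty pieces so repeated tabs do not create empty cells.
        cells = [c.strip() for c in line.split("\t") if c.strip()]
        # Fold any extra cells into the last column.
        if len(cells) > columns:
            cells = cells[:columns - 1] + [" ".join(cells[columns - 1:])]
        # Pad short rows so every row has the same number of cells.
        cells += [""] * (columns - len(cells))
        tds = "".join(f"<td>{escape(c)}</td>" for c in cells)
        rows.append(f"  <tr>{tds}</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


if __name__ == "__main__":
    sample = "Introduction\t\t1\nChapter One\t5"
    print(tabbed_lines_to_html_table(sample))
```

Because the output has no Office markup at all, this route skips most of the cleanup described later, but you lose any formatting that Word would have carried over.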
Part 2: Saving as HTML File
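1. Download and install OpenOffice (a free, open source office suite).
2. Open the Word document in OpenOffice, then go to File -> Save As and choose HTML as the file type.
You will notice that there is some extra HTML markup added by OpenOffice, but not as much as if you had saved it out of Microsoft Word.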
Part 3: Further Clean Up of HTML
I used Dreamweaver and took out the font tags and the extra meta data. You can also do this with a plain text editor like TextEdit or Notepad, but in Dreamweaver you can use the find and replace features. Another HTML editor called Kompozer (free and open source) has a Search and Replace feature where you can find and replace the font tag. I didn’t really need to do that much cleanup in Dreamweaver, though.
I hope this helps!
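The same find-and-replace cleanup can also be scripted. Below is a minimal Python sketch, assuming the only things you want gone are the font tags and the meta tags; the function name `strip_font_and_meta` is made up for this example. It uses simple regular expressions, so it is not a real HTML parser and could misfire on unusual markup (for example, an attribute value that contains a `>` character).

```python
import re

# Matches opening and closing font tags, with any attributes.
FONT_TAG = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)
# Matches meta tags plus any whitespace that follows them.
META_TAG = re.compile(r"<meta\b[^>]*>\s*", re.IGNORECASE)


def strip_font_and_meta(html):
    """Remove font and meta tags but keep the text inside font tags."""
    html = FONT_TAG.sub("", html)
    html = META_TAG.sub("", html)
    return html


if __name__ == "__main__":
    cell = '<td><FONT FACE="Arial" SIZE=2>Chapter One</FONT></td>'
    print(strip_font_and_meta(cell))
```

To use it on a real file, read the saved HTML into a string, pass it through the function, and write the result to a new file so the original stays untouched.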