I just worked on a project where we had to convert Word documents into HTML tables for a webpage. Each document had a table of contents laid out with nothing but tabs. The problem with simply converting the document into HTML is that you get extra HTML markup from MS Office. Also, none of the data was in a table. Here I will break down how I turned the tabbed TOC data into a table in Word and how I cleaned up most of the HTML markup (hint: I didn’t save the document as an HTML file in Word). Note: I am working in Office 2003. The same features should be available in the new Office 2007, though the menus may be laid out differently.

Part 1: Creating Tables in Word from Tabbed Data

1. Highlight all the data in the document (Control+A, or Apple+A on a Mac)

2. Go to Table–> Convert–> Text to Table

3. Since I only wanted two columns, I actually entered 1 in the Number of columns box in the popup. When I entered 2, I sometimes got an extra column. Play around with this number, because it might be affected by the way the document was created.

4. Under AutoFit behavior, I used the defaults. Under Separate text at, I chose the Tabs radio button (since that is how the TOC was laid out and created).

5. You will now have a table around the data. Save the file. You can also clean up some of the data (merging cells, for example) while you are still in the Word document format.
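If you would rather script this step than click through Word, here is a minimal Python sketch of the same idea: split each line at its tabs and wrap the pieces in table cells. It is not what I did in the project. The function name is made up for illustration, and it assumes the TOC text has already been copied out of Word as plain text with the tabs intact. Runs of several tabs are collapsed so they don't produce the extra empty column mentioned in step 3.

```python
from html import escape


def tabbed_lines_to_html_table(text):
    """Turn tab-separated lines into a plain HTML table (hypothetical helper)."""
    html_rows = []
    for line in text.splitlines():
        # Split at tabs and drop empty pieces, so repeated tabs
        # do not create extra empty columns.
        cells = [cell.strip() for cell in line.split("\t") if cell.strip()]
        if not cells:
            continue  # skip blank lines
        tds = "".join(f"<td>{escape(cell)}</td>" for cell in cells)
        html_rows.append(f"  <tr>{tds}</tr>")
    return "<table>\n" + "\n".join(html_rows) + "\n</table>"


if __name__ == "__main__":
    sample = "Introduction\t1\nChapter One\t\t5\n"
    print(tabbed_lines_to_html_table(sample))
```

The output has no font tags or Office markup at all, so there is nothing to clean up afterwards, but you also lose any formatting the Word document had.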

Part 2: Saving as an HTML File

1. Download and install OpenOffice (a free, open source office suite)

2. Open the Word document in OpenOffice Writer, then go to File–> Save As and choose HTML Document as the file type.

You will notice that there is some extra HTML markup added by OpenOffice, but not as much as if you had saved it out of Microsoft Word.

Part 3: Further Clean Up of HTML

I used Dreamweaver to take out the font tags and the extra meta data. You can also do this by hand in a text editor like TextEdit or Notepad, but the find and replace features in Dreamweaver make it easier. Another HTML editor, Kompozer (free and open source), has a Search and Replace feature you can use to find and remove the font tags. I didn’t really need to do that much clean up in Dreamweaver, though.
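The same find and replace can be done with a short script. The following is only a rough Python sketch, not the Dreamweaver process I used. The function name is hypothetical, and it assumes the markup is simple enough for regular expressions to handle safely. Note that it removes every meta tag, including any that declares the character encoding, so you may want to narrow the pattern for your own files.

```python
import re

# Opening and closing font tags, with any attributes, in any letter case.
FONT_TAG = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)
# Meta tags, plus the whitespace that follows them.
META_TAG = re.compile(r"<meta\b[^>]*>\s*", re.IGNORECASE)


def strip_office_markup(html):
    """Remove font and meta tags but keep the text inside font tags."""
    html = FONT_TAG.sub("", html)
    html = META_TAG.sub("", html)
    return html


if __name__ == "__main__":
    sample = (
        '<META NAME="GENERATOR" CONTENT="OpenOffice.org">\n'
        '<P><FONT FACE="Arial">Introduction</FONT></P>'
    )
    print(strip_office_markup(sample))
```

Run it on a copy of the exported file first and compare the result in a browser before replacing the original.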

I hope this helps!
