The copy was received as Word 6 documents.
These were saved as Microsoft web archive (MHTML, .mht) files.
These in turn were processed to produce ordinary HTML files.
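
The note does not say how the .mht files were turned into HTML. Since an .mht file is a MIME multipart message, one possible approach is sketched below using Python's standard email package. The function name extract_html_from_mht is hypothetical, and this is an illustration under that assumption, not the process actually used.

```python
import email
from email import policy


def extract_html_from_mht(mht_bytes: bytes) -> str:
    """Return the first text/html part of an MHTML (.mht) archive as a string.

    Hypothetical helper: assumes the archive is a well-formed MIME message.
    """
    message = email.message_from_bytes(mht_bytes, policy=policy.default)
    for part in message.walk():
        if part.get_content_type() == "text/html":
            # get_content() undoes the transfer encoding (for example
            # quoted-printable) and decodes with the part's declared charset.
            return part.get_content()
    raise ValueError("no text/html part found in archive")
```

Images and other embedded resources in the archive are ignored here; only the main HTML body is returned.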
These were normalised using htmlparser to produce ASCII HTML for inclusion in the database.
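
As a rough illustration of what producing ASCII HTML can involve, the sketch below replaces every non-ASCII character with a numeric character reference. It uses Python's built-in codec error handler rather than the tool named above, and the function name to_ascii_html is hypothetical.

```python
def to_ascii_html(html_text: str) -> str:
    """Replace each non-ASCII character with a numeric character reference.

    Hypothetical helper: it does not touch markup, and it does not update
    any charset declaration inside the document.
    """
    return html_text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")
```

For example, an accented "e" (U+00E9) in the input comes out as `&#233;`, so the resulting file can be stored as plain ASCII and still display the same text in a browser.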