Converting unstructured files to structured documents

FrameMaker provides a mapping feature to help you convert unstructured documents to structured documents. Your results depend on the following factors:

  • Document consistency. Documents that implement a formatting template consistently, with few or no formatting overrides, will convert better than documents that are full of overrides and custom paragraph or character tags.

  • Similarity between unstructured and structured documents. A new document structure that is similar to the organization in the unstructured documents eases the conversion process.

Conversion workflow

The conversion process creates structured elements from FrameMaker formatting components, such as paragraph tags, character tags, markers, cross-references, and table components.

To begin the conversion process, select an unstructured document that is representative of your typical content. Ideally, this document contains examples of every formatting tag that occurs in your documents. The tags should appear in logical sequences, as they would in real documents. A formatting template that lists each paragraph tag in alphabetical order, for instance, is not a good example document.

  1. Open the example document.

  2. Import element definitions from the EDD into the example document.

  3. Select StructureTools > Generate Conversion Table. Select Generate New Conversion Table, then click Generate.

    FrameMaker scans the document and creates a list of the formatting components that occur in this document. Tags that are defined in the formatting catalogs but not used in the document are not included in the list.

    Note: FrameMaker assumes that the name of the formatting component will be the same as the name of the structure element.

  4. Modify the mapping rules to match your structure. In the preceding example, the Body paragraph (P:Body) is mapped by default to the Body element. To change this mapping, change the second column (“In this element”) to read Para instead of Body.

  5. Once you have mapped all of the formatting components, add entries to the table to create hierarchy. For example, if a Section element typically contains a Heading and one or more Para elements, add a row to the table that specifies how to create the Section element.

  6. Add a root element mapping that specifies the top-level tag in the document, as shown here:

    RE:RootElement    Proposal

  7. Save the conversion rules table.

  8. To test the conversion rules table, open your example document, then select StructureTools > Utilities > Structure Current Document. Select the conversion rules table document in the pop-up menu, and then click Add Structure.

    FrameMaker creates a new, untitled, structured document. Keep refining and testing your conversion rules until you are satisfied with the document produced. You can add tags to the conversion rules table by typing them or by scanning additional documents.

  9. To add tags automatically:

    1. Make sure that the conversion rules table is open. Open the file that contains additional formatting components.

    2. Select StructureTools > Generate Conversion Table. Select Update Conversion Table and select your conversion rules document in the pop-up menu.

    3. Click Generate. FrameMaker scans the second sample document and appends any new formatting components to the end of the conversion rules table.
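The generate and update behavior described in the steps above can be pictured with a small Python model. This is only an illustrative sketch, not FrameMaker's implementation: the function names and the list-of-tuples table are hypothetical, and the sketch assumes that each component is written as a type prefix plus a tag name (for example, P:Body).

```python
from typing import List, Tuple

Row = Tuple[str, str]  # (formatting component, element name)


def default_element_name(component: str) -> str:
    """Default mapping: reuse the tag name as the element name ('P:Body' -> 'Body')."""
    _, _, name = component.partition(":")
    return name or component


def generate_table(used_components: List[str]) -> List[Row]:
    """One row per component that is actually used in the scanned document."""
    rows: List[Row] = []
    seen = set()
    for component in used_components:
        if component not in seen:
            seen.add(component)
            rows.append((component, default_element_name(component)))
    return rows


def update_table(table: List[Row], used_components: List[str]) -> List[Row]:
    """Append components that are not yet in the table; existing rows are kept."""
    known = {component for component, _ in table}
    updated = list(table)
    for component in used_components:
        if component not in known:
            known.add(component)
            updated.append((component, default_element_name(component)))
    return updated


# Scan a first document, remap P:Body to Para by hand, then scan a second document.
table = generate_table(["P:Heading1", "P:Body", "P:Body", "C:Emphasis"])
table[1] = ("P:Body", "Para")
table = update_table(table, ["P:Body", "P:Bullet"])
print(table)
```

In this model, the manual change from Body to Para survives the update, and the new P:Bullet row lands at the end of the table, which mirrors the behavior described in step 9.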

Conversion rule examples

The order in which conversion rules are listed is significant. List the rules for lower-level elements before the rules for the higher-level elements that contain them. For example, assume that you have the following mapping rules:

G:                      Graphic
P:caption               Caption
E:Graphic, E:Caption    Figure

The rule in which Graphic and Caption are wrapped into a Figure element must occur after the rules in which Graphic and Caption are created.
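A quick way to catch ordering mistakes is to check that every element referenced with E: is created by an earlier row. The following Python sketch does this for a table held as a list of (first column, second column) pairs. The function name and data layout are assumptions made for illustration, and the check is a simple lint, not a reproduction of how FrameMaker evaluates rules.

```python
import re
from typing import List, Tuple


def find_order_problems(rows: List[Tuple[str, str]]) -> List[Tuple[int, str]]:
    """Return (row number, element) pairs where an E: reference precedes its creation."""
    produced = set()
    problems = []
    for number, (objects, element) in enumerate(rows, start=1):
        # Collect every element name referenced as E:Name in the first column.
        for ref in re.findall(r"E:\s*([A-Za-z_][\w.-]*)", objects):
            if ref not in produced:
                problems.append((number, ref))
        # Ignore a trailing option such as "(promote)" when recording the element.
        produced.add(element.split("(")[0].strip())
    return problems


good = [("G:", "Graphic"), ("P:caption", "Caption"), ("E:Graphic, E:Caption", "Figure")]
bad = [("E:Graphic, E:Caption", "Figure"), ("G:", "Graphic"), ("P:caption", "Caption")]

print(find_order_problems(good))  # no problems reported
print(find_order_problems(bad))   # both references in row 1 are flagged
```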

If you need to map several paragraph tags to the same element and then wrap them into different parents, you use the third column for a qualifier. It’s common, for example, to have a ListItem element that’s used for both bulleted lists and numbered lists. Once the bullet and step paragraphs are wrapped in the ListItem element, you need a way to distinguish whether they belong in OrderedList or UnorderedList. To make this distinction, you use the qualifier column, as shown in the following example:

bullet             ListItem         b
step1              ListItem         st
step2+             ListItem         st
E:ListItem[b]+     UnorderedList
E:ListItem[st]+    OrderedList
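To see how the qualifier keeps the two kinds of ListItem apart, the two stages can be modeled in a few lines of Python. This is a simplified sketch under stated assumptions: the dictionaries and the structure function are hypothetical, and the model only handles a flat sequence of list paragraphs.

```python
from itertools import groupby
from typing import List, Tuple

# Stage 1: paragraph tag -> (element, qualifier), as in the first three rows.
PARAGRAPH_RULES = {
    "bullet": ("ListItem", "b"),
    "step1": ("ListItem", "st"),
    "step2+": ("ListItem", "st"),
}

# Stage 2: a run of one or more qualified elements -> parent element.
WRAP_RULES = {
    ("ListItem", "b"): "UnorderedList",
    ("ListItem", "st"): "OrderedList",
}


def structure(paragraph_tags: List[str]) -> List[Tuple[str, int]]:
    """Return (parent element, number of wrapped ListItems) for each consecutive run."""
    items = [PARAGRAPH_RULES[tag] for tag in paragraph_tags]
    result = []
    # groupby collects consecutive items that share the same element and qualifier.
    for key, run in groupby(items):
        result.append((WRAP_RULES[key], len(list(run))))
    return result


lists = structure(["bullet", "bullet", "step1", "step2+", "step2+", "bullet"])
print(lists)
```

Without the qualifier, every item would simply be a ListItem, and the wrapping stage would have no way to choose between UnorderedList and OrderedList.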

To specify the root element of a document, you use the following:

RE:RootElement                                    Chapter

You can specify only one root element per conversion table.

Graphics and tables are often anchored into the preceding paragraph in the unstructured document. When you structure the document, the Graphic and Table elements end up as children of the preceding Para element.

If you want the Graphic element to be converted as a sibling of Para (shown in the preceding figure on the right) rather than a child, use the “promote” command:

G:    Graphic(promote)
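The effect of promotion on the element tree can be illustrated with a small Python model. The Element class and promote function here are hypothetical, and the sketch assumes that a promoted element is placed immediately after its former parent; it does not attempt to model where FrameMaker places the element when the anchor sits in the middle of a paragraph.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Element:
    name: str
    children: List["Element"] = field(default_factory=list)


def promote(parent: Element, promoted_name: str) -> None:
    """Lift grandchildren named promoted_name so they become siblings of their parent."""
    new_children: List[Element] = []
    for child in parent.children:
        kept = [g for g in child.children if g.name != promoted_name]
        lifted = [g for g in child.children if g.name == promoted_name]
        child.children = kept
        new_children.append(child)
        # Assumption: promoted elements follow the element that contained them.
        new_children.extend(lifted)
    parent.children = new_children


# Without promotion, the Graphic is a child of the first Para.
section = Element("Section", [Element("Para", [Element("Graphic")]), Element("Para")])
promote(section, "Graphic")
names = [child.name for child in section.children]
print(names)
```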