This section describes the Fiorano Text Schema Editor, a tool used to design file schemas that are stored as .tfl (Text Format Layout) files. The prebuilt Text2XML and XML2Text components use these TFL files to convert non-XML data to its corresponding XML format and XML back to non-XML data, respectively.
If your composite component flow needs to read or write data that exists as text or flat files in your data repository, you can use the FileReader component to read the flat file and the Text2XML component to transform the flat-file data into its corresponding XML.
The reverse can be done by combining the XML2Text component with the FileWriter component. However, before you can transform data from flat-file format into its corresponding XML or vice versa, you need to define a File Schema that describes the data. This File Schema can be understood as the format metadata that both of these transformations require.
The Text Schema Editor (TSE) is a tool that lets you define the format and hierarchy of non-XML data graphically. The format structure created with this editor is called the File Schema; it defines the structure of the non-XML data in terms of records and fields and is stored using XML grammar in .tfl (Text Format Layout) files.
Once this schema format is defined, it can be used by:
- Text2XML, which transforms flat-file data into its corresponding XML
- XML2Text, which transforms XML data into its corresponding flat-file format
The following diagram shows how the FileReader component uses the transformation components to read XML and non-XML data.
Figure 1: Using File Reader and Transformation components
The non-XML data mentioned above can be delimited, positional or a combination of both. TSE also provides a test facility for verifying the schema formats you create: you can generate sample data and transform sample non-XML data to XML and vice versa.
Text Format Layout Concepts
A tfl (Text Format Layout) document uses a specialized XML grammar to describe the structure of structured non-XML (delimited or positional) data. In a tfl document, the structure of the data is defined as a hierarchical tree of records and fields in a given order.
Figure 2: Structure of a tfl document
At the top of the tree is the Root Node. The schema of the structured data is added as a child node of this Root Node and is called the Schema Node. When you create a new schema in the Fiorano Schema Editor, the Root Node and the Schema Node are created automatically.
- Schema Node: In the schema structure, each opened schema file is shown as a Schema Node, which is a child of the Root Node. The Schema Node corresponds to the root tag of the output XML generated from the structured non-XML text, or of the input XML that is to be converted to structured non-XML text. The Schema Node can be renamed. Its properties are the default properties used during data transformation. You can add multiple record nodes to a Schema Node to represent the structure of the input/output data; adding fields directly to the Schema Node is not allowed.
- Record: Record represents a collection of information. It can contain a set of fields and/or other records.
- Field: Field represents items of information that are simple in nature, such as strings and numbers.
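The record/field hierarchy described above can be pictured as a small tree model. The following Java sketch is illustrative only: the class names (`TflTree`, `SchemaNode`, `Record`, `Field`) and the node name `EmployeeSchema` are hypothetical and not part of any Fiorano API. It encodes two of the stated rules: a Schema Node may contain only records, and a field holds simple data and has no children.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of a tfl tree; these classes are illustrative, not a Fiorano API.
class TflTree {
    // Common base for schema nodes, records and fields.
    abstract static class Node {
        final String name;
        Node(String name) { this.name = name; }
    }

    // A field holds simple data (strings, numbers) and has no children.
    static final class Field extends Node {
        Field(String name) { super(name); }
    }

    // A record can contain fields and/or other records.
    static class Record extends Node {
        final List<Node> children = new ArrayList<>();
        Record(String name) { super(name); }

        Record add(Node child) {
            children.add(child);
            return this;
        }
    }

    // The schema node is a child of the root node and may contain only records.
    static final class SchemaNode extends Record {
        SchemaNode(String name) { super(name); }

        @Override
        Record add(Node child) {
            if (child instanceof Field) {
                throw new IllegalArgumentException("Fields cannot be added to the schema node");
            }
            return super.add(child);
        }
    }

    // Builds a schema with one Employee record holding three fields.
    static SchemaNode employeeSchema() {
        SchemaNode schema = new SchemaNode("EmployeeSchema");
        schema.add(new Record("Employee")
                .add(new Field("Name"))
                .add(new Field("Age"))
                .add(new Field("Address")));
        return schema;
    }

    // Renders the tree with two spaces of indentation per level.
    static String render(Node node) {
        StringBuilder out = new StringBuilder();
        render(node, "", out);
        return out.toString();
    }

    private static void render(Node node, String indent, StringBuilder out) {
        out.append(indent).append(node.name).append('\n');
        if (node instanceof Record) {
            for (Node child : ((Record) node).children) {
                render(child, indent + "  ", out);
            }
        }
    }

    public static void main(String[] args) {
        System.out.print(render(employeeSchema()));
    }
}
```

Making `SchemaNode` a subclass of `Record` that rejects `Field` children is one simple way to express the "records only" rule in the type hierarchy.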
Creating Flat File Projects
Flat file projects can be created using the Flat File Schemas view in the Fiorano Tools perspective. In the eStudio menu bar, navigate to Windows > Open Perspective > Other, select Fiorano Tools and click OK. This opens the Fiorano Tools perspective, which contains a view called Flat File Schemas.
In this section, a sample Employee Schema is created to illustrate the usage of the Flat File Schema Editor. Assume a flat file containing Employee records (Employee Name, Age, Address), with each employee record on a new line and the individual fields separated by commas, as shown below.
Figure 3: Sample csv data
To define a schema for this data, first create a new Flat File Project:
- Right-click the Flat File Schemas node and select the New File Schema Project option.
Figure 4: New File Schema project
- This launches a wizard where the project details can be configured. Provide a name for the project and click the Finish button. A new project is created with the project name as the root node and "Empty Schema" as the default schema node.
- Rename Empty Schema to Employee Schema by selecting the node and editing the Name property in the Properties view. Properties such as Record Delimiter and Default Field Delimiter can also be configured here. Since the employee records are separated by a new line, set the Record Delimiter value to \r\n.
Figure 5: Flat file editor
- Right-click the Employee Schema node and select Add > Record. Enter Employee as the record name and click OK.
- Since the fields of an employee record are separated by commas, set the parsing type to Delimited and enter a comma (,) as the delimiter value.
Figure 6: Configuring Flat File Schema elements
- Right-click the Employee record and select Add > Fields.
- Enter Name, Age and Address as the field names and click OK. The three fields are added and the schema is updated accordingly.
Testing Flat File Schema
The generated flat file schema can be tested on the Test page.
- Click the Test tab in the schema editor to open the Test page.
Figure 7: Flat file schema test
- Generate sample data by clicking the Generate Sample Flat Format button, or paste your own sample into the Flat Format section.
Figure 8: Flat format sample data
- Click the Convert Flat Format to XML button. The flat format sample is converted to XML and displayed in the XML Format section.
Figure 9: XML output
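Conceptually, the Convert Flat Format to XML step splits the input on the record delimiter, splits each record on its field delimiter and emits one element per record and field. The Java sketch below illustrates this for the Employee schema only. It assumes that every field has XML Type Element, that the root element takes the hypothetical name `EmployeeSchema`, and it uses made-up sample values; it is not the Text2XML implementation and ignores properties such as escape and wrap characters.

```java
import java.util.List;
import java.util.regex.Pattern;

// Illustrative sketch of flat-to-XML conversion for the Employee schema.
// Assumptions: root element "EmployeeSchema", one <Employee> element per record,
// one child element per field (XML Type = Element). Not the Text2XML implementation.
class EmployeeFlatToXml {
    static final String RECORD_DELIMITER = "\r\n";
    static final String FIELD_DELIMITER = ",";
    static final List<String> FIELD_NAMES = List.of("Name", "Age", "Address");

    // Escapes the characters that may not appear literally in XML text content.
    static String escape(String text) {
        return text.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    static String convert(String flat) {
        StringBuilder xml = new StringBuilder("<EmployeeSchema>");
        // Pattern.quote makes split() treat the delimiters as literal text, not regex.
        for (String record : flat.split(Pattern.quote(RECORD_DELIMITER))) {
            if (record.isEmpty()) {
                continue; // ignore blank lines
            }
            // A limit of -1 keeps trailing empty fields, e.g. a blank Address.
            String[] values = record.split(Pattern.quote(FIELD_DELIMITER), -1);
            if (values.length != FIELD_NAMES.size()) {
                throw new IllegalArgumentException("Expected " + FIELD_NAMES.size() + " fields: " + record);
            }
            xml.append("<Employee>");
            for (int i = 0; i < values.length; i++) {
                String tag = FIELD_NAMES.get(i);
                xml.append('<').append(tag).append('>')
                   .append(escape(values[i]))
                   .append("</").append(tag).append('>');
            }
            xml.append("</Employee>");
        }
        return xml.append("</EmployeeSchema>").toString();
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not taken from Figure 3.
        System.out.println(convert("John,30,London\r\nMary,25,Paris\r\n"));
    }
}
```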
Generating Flat File Schema using sample data
Instead of configuring the flat file elements manually as described in the section Creating Flat File Projects, you can generate a flat file schema by loading sample flat-file data, as detailed in this section.
To define a schema for the Employee data in this way, first create a new Flat File Project:
- Right-click the Flat File Schemas node and select the New File Schema Project option.
Figure 10: New File Schema project
- This launches a wizard where project details can be configured.
- Provide the project name and select the Load From Flat File option. Select the flat file and click Next.
- Provide the Schema Node name and the record delimiter value. Since the employee records are separated by a new line, set the Record Delimiter value to \r\n. Click the Load Data button to load the data from the flat file. The data is loaded and the records are displayed based on the delimiter value.
Figure 11: Schema configuration
- Since every row has the same structure (an individual employee record), only one record definition is needed and the duplicate rows can be removed. To remove a row, right-click it and select Delete. Remove the duplicate rows Record2 and Record3.
- Rename Record1 as Employee and click the Next button.
Figure 12: Configuring Records
- The child elements can be configured on the Schema Configuration page. Select the Employee node in the tree viewer on the left; the details of the node are displayed.
- Enter a comma (,) as the Field Delimiter value.
Figure 13: Configuring Record child elements
- Click the Configure Child Elements button. The data is parsed using the child delimiter value and displayed in a table, where you can set each element's name, type and data type. Provide the details and click OK.
Figure 14: Defining Record child elements
- The individual fields are generated. The details of each node can be seen by selecting it in the tree viewer on the left. Click Finish to complete the configuration.
Figure 15: Employee Schema configuration
- The schema is generated and shown in the editor. It can be tested as described in the section Testing Flat File Schema.
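The Load Data and Configure Child Elements steps amount to splitting the sample on the record delimiter and then on the child (field) delimiter. The Java sketch below imitates that idea under stated assumptions: the method name `infer`, the placeholder names `Field1`, `Field2`, ... and the rule that a column whose values are all whole numbers is suggested as Integer are hypothetical heuristics, not TSE's actual behaviour. Rows with the same number of fields are treated as repeats of one record definition, much like removing the duplicate rows Record2 and Record3.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical sketch of deriving field definitions from sample flat data.
class SampleSchemaInference {
    static List<String> infer(String sample, String recordDelimiter, String fieldDelimiter) {
        // 1. Split the sample into rows using the record delimiter.
        List<String[]> rows = new ArrayList<>();
        for (String line : sample.split(Pattern.quote(recordDelimiter))) {
            if (!line.isEmpty()) {
                // 2. Split each row into values using the field (child) delimiter.
                rows.add(line.split(Pattern.quote(fieldDelimiter), -1));
            }
        }
        if (rows.isEmpty()) {
            throw new IllegalArgumentException("The sample contains no records");
        }
        // 3. Rows with the same shape are repeats of one record definition.
        int width = rows.get(0).length;
        for (String[] row : rows) {
            if (row.length != width) {
                throw new IllegalArgumentException("Rows have different numbers of fields");
            }
        }
        // 4. Suggest a placeholder name and a data type for every column.
        List<String> fields = new ArrayList<>();
        for (int c = 0; c < width; c++) {
            boolean allIntegers = true;
            for (String[] row : rows) {
                if (!row[c].trim().matches("-?\\d+")) {
                    allIntegers = false;
                    break;
                }
            }
            fields.add("Field" + (c + 1) + ":" + (allIntegers ? "Integer" : "String"));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Hypothetical sample; prints [Field1:String, Field2:Integer, Field3:String]
        System.out.println(infer("John,30,London\r\nMary,25,Paris\r\n", "\r\n", ","));
    }
}
```

In the wizard, the names and data types suggested this way would still be reviewed and edited by hand before clicking OK.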
Sample Schemas
This tool ships with five samples that represent various schema types. They are broadly classified under two categories: Delimited File Schema samples and Positional File Schema samples.
The prebuilt schema samples are given below:
- Delimited File Schema samples
  - CSV File Schema
  - Nested CSV File Schema
- Positional File Schema samples
  - Positional File Schema
  - Nested Positional File Schema
  - Positional in Delimited File Schema
To import the prebuilt schema samples, right-click the Flat File Schemas node and select the Import Sample Project option.
Figure 16: Import Sample Project
A dialog listing all the available samples is launched. Select a sample and click OK to load it in the editor.
Figure 17: Select sample projects
Points to note
- Records can be positional or delimited. A delimited record can contain a positional record as a child, but a positional record cannot contain a delimited record as a child. Delimiters must be provided for delimited records, whereas Start and End Positions must be provided for positional records.
- The field properties change based on the parsing type (Delimited/Positional) of the parent record.
- Whenever there is an error in the generated schema, an error badge is shown on the corresponding element. Hover the mouse pointer over the badge to see the error message.
Figure 18: Schema Errors
- The order of fields or records can be changed. To do so, select the parent element and select the Change Order option. A dialog is displayed in which the order can be altered.
Figure 19: Change elements order
- A flat file schema project can be exported using the Export option on the project's context menu. Similarly, a project can be imported using the Import Project option on the Flat File Schemas context menu.
Figure 20: Import/Export project
- A project can be closed and reopened later using the Open Closed Projects option on the Flat File Schemas node.
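The first point above, a delimited record containing a positional child record, can be illustrated with a small Java sketch. Everything in it is assumed for illustration: the `|`-delimited layout, the field names `Company`, `Date` and `Currency`, the class names, and the convention that Start and End Positions are 1-based and inclusive (check the convention TSE actually uses before relying on it).

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Pattern;

// Illustrative sketch: a delimited record whose second value is a positional record.
class PositionalInDelimited {
    // Hypothetical positional field; start/end are ASSUMED 1-based and inclusive.
    static final class PositionalField {
        final String name;
        final int start;
        final int end;

        PositionalField(String name, int start, int end) {
            this.name = name;
            this.start = start;
            this.end = end;
        }
    }

    // Cuts each positional field out of the text by its start and end positions.
    static Map<String, String> parsePositional(String text, PositionalField... fields) {
        Map<String, String> values = new LinkedHashMap<>();
        for (PositionalField f : fields) {
            if (f.start < 1 || f.start > f.end || f.end > text.length()) {
                throw new IllegalArgumentException("Field " + f.name + " is out of range");
            }
            values.put(f.name, text.substring(f.start - 1, f.end));
        }
        return values;
    }

    // Parses "Company|<positional data>": the outer record is delimited by '|'.
    static Map<String, String> parseLine(String line) {
        String[] parts = line.split(Pattern.quote("|"), -1);
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected 2 delimited values: " + line);
        }
        Map<String, String> result = new LinkedHashMap<>();
        result.put("Company", parts[0]);
        result.putAll(parsePositional(parts[1],
                new PositionalField("Date", 1, 8),
                new PositionalField("Currency", 9, 11)));
        return result;
    }

    public static void main(String[] args) {
        // Prints {Company=ACME, Date=20240131, Currency=USD}
        System.out.println(parseLine("ACME|20240131USD"));
    }
}
```

The delimited layer is split first and only then is the positional value cut by position, which is why the nesting works in this direction.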
Flat File Element Properties
The properties associated with flat file nodes are described in this section.
Schema Node Properties
The Schema Nodes of all file schemas have the same set of properties. These properties act as global properties of the file schema and are available to all descendant records and fields.
The following table lists all properties associated with the Schema Node:
Property | Description |
---|---|
Comment Start Identifier | An identifier which indicates the start of a comment in the source file. |
Comment End Identifier | An identifier which indicates the end of a comment in the source file. The data between the Comment Start and Comment End identifiers is ignored. |
Name | The name of the Root Node. |
Description | The description of the specification. |
Delimiter Value | Type or select a value for the delimiter. To specify a delimiter value, you must first set the Delimiter Type to Custom Delimiter. The delimiter can be multi-character. |
Escape Character | Specifies the default escape character for this schema instance. Type or select a character value for the escape character. To specify an escape character value, you must first set the Escape Character Type to Character. |
Delimiter Type | Specifies the delimiter for the records/fields directly below the current record. To type your own delimiter in Delimiter Value, select Custom Delimiter. |
Escape Character Type | The type of escape character; set it to Character to specify a value in Escape Character. An escape character is useful if your field data contains a character that is also used as the delimiter of the field's parent record. For example, if you have chosen a comma as the delimiter of the record that contains the field and the field data contains a comma after "Fiorano", TSE interprets that comma as a delimiter even if you intend it to be part of the field data. The solution is to place the escape character directly before the delimiter character that you want to include in the field data. For example, if the escape character is a backslash, place a backslash directly before the comma; TSE then interprets the comma after the backslash as field data rather than as a delimiter character. |
Escape Character | The escape character used with the field delimiter. |
Delimiter Value | Type or select a value for the delimiter. To specify a delimiter value, first set the Delimiter Type to Custom Delimiter. The delimiter can be multi-character. |
Escape Character Type | The type of escape character used with the field delimiter. It works as described for the Escape Character Type property above: place the escape character directly before a delimiter character that should be treated as field data, and TSE interprets that delimiter character as field data rather than as a delimiter. |
Delimiter Type | The field delimiter of this file schema. The delimiter can be multiple characters. |
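The escape-character behaviour described in the table can be sketched in Java as a delimiter-aware split. This is a hypothetical helper (`EscapedSplit.split`), not TSE's parser. It assumes that the escape character is dropped from the output when it precedes a delimiter and kept literally otherwise, and it supports multi-character delimiters because the Delimiter Value can be multi-character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper illustrating escaped delimiters; not TSE's parser.
class EscapedSplit {
    static List<String> split(String data, String delimiter, char escape) {
        if (delimiter.isEmpty()) {
            throw new IllegalArgumentException("The delimiter must not be empty");
        }
        List<String> fields = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        int i = 0;
        while (i < data.length()) {
            if (data.charAt(i) == escape && data.startsWith(delimiter, i + 1)) {
                // Escaped delimiter: keep the delimiter as field data, drop the escape character.
                current.append(delimiter);
                i += 1 + delimiter.length();
            } else if (data.startsWith(delimiter, i)) {
                // Unescaped delimiter: close the current field.
                fields.add(current.toString());
                current.setLength(0);
                i += delimiter.length();
            } else {
                current.append(data.charAt(i));
                i++;
            }
        }
        fields.add(current.toString()); // the last field has no trailing delimiter
        return fields;
    }

    public static void main(String[] args) {
        // The comma after "Fiorano" is escaped with a backslash.
        List<String> fields = split("Fiorano\\, Inc,30", ",", '\\');
        // Prints: 2 fields: Fiorano, Inc / 30
        System.out.println(fields.size() + " fields: " + fields.get(0) + " / " + fields.get(1));
    }
}
```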
Record Node Properties
Every file schema is a unique entity with its own set of records and fields. You can create a new schema by modifying an existing one: add and/or remove records, then specify the properties of the records you added. If you remove a record, its properties are removed along with all its child records and fields. In addition to adding and removing records, you can also rename them: select a record to edit its name and properties.
The following are some basic rules for records:
- Every new record you create is inserted as a descendant of the record that you selected.
- The name of a record or field must be unique. The tool displays an exception if you specify a name that has already been assigned to an existing record or field.
- When you delete a record, all child records and fields are also deleted.
The following table lists all the properties associated with the record node:
Property | Description |
---|---|
XML Type | The target XML type for the record. This value determines whether a tag is generated in the resultant XML. Its value can be either Element (default) or None. If None is selected, the record is NOT mapped to the resultant XML. |
Minimum Occurrences | The minimum number of occurrences specified for the record. If the record does not occur at least this many times, an exception is thrown. |
Maximum Occurrences | The maximum number of occurrences allowed for the record. Once the record has occurred this many times, the parser does not attempt to match it again and an exception is thrown. |
Parsing Type | Specifies whether the input data is to be treated as Positional or Delimited. |
Record Identifier Type | The type of identifier used to identify a record. |
Name | The name of the record. The name must be a valid XML name. A record cannot be given the same name as an existing record, and sibling records cannot share a name. |
Description | The description of the record. |
Escape Character Type | The type of escape character. An escape character is useful if your field data contains a character that is also used as the delimiter of the field's parent record. To include such a character in the field data, place the escape character directly before it. For example, if the escape character is a backslash, place a backslash directly before a comma; TSE interprets the comma after the backslash as field data rather than as a delimiter character. |
Delimiter Value | Type or select a value for the delimiter. To specify a delimiter value, first set the Delimiter Type to Custom Delimiter. The delimiter can be multi-character. |
Delimiter Type | The delimiter used to separate the records/fields directly below this record. To type your own delimiter in Delimiter Value, select Custom Delimiter. |
Escape Character | The escape character used with this record's delimiter. To specify a value, first set the Escape Character Type to Character. |
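The Minimum Occurrences and Maximum Occurrences rules can be sketched as a greedy matcher over the lines of a file. This Java sketch is hypothetical: it assumes a single record definition, identifies that record by a caller-supplied test (here an assumed `EMP` prefix, standing in for whatever Record Identifier Type is configured) and treats any line still unmatched once matching stops as an error.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical sketch of Minimum/Maximum Occurrences checking for one record.
class OccurrenceCheck {
    // Returns how many times the record matched, or throws if the limits are violated.
    static int parse(List<String> lines, Predicate<String> identifiesRecord, int min, int max) {
        int count = 0;
        // Match greedily, but stop trying once the maximum has been reached.
        while (count < lines.size() && count < max && identifiesRecord.test(lines.get(count))) {
            count++;
        }
        if (count < min) {
            throw new IllegalStateException("Record occurred " + count + " times; minimum is " + min);
        }
        if (count < lines.size()) {
            // Data remains that this record is no longer allowed to (or cannot) match.
            throw new IllegalStateException("Unmatched data at line " + (count + 1));
        }
        return count;
    }

    public static void main(String[] args) {
        List<String> lines = List.of("EMP,John,30", "EMP,Mary,25");
        System.out.println(parse(lines, line -> line.startsWith("EMP"), 1, 5)); // prints 2
        try {
            parse(lines, line -> line.startsWith("EMP"), 1, 1);
        } catch (IllegalStateException e) {
            // Maximum is 1, so the second line is left unmatched.
            System.out.println(e.getMessage());
        }
    }
}
```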
Field Node Properties
Depending on the type of file schema you are defining, you might need to add and/or remove fields. After adding fields to any schema, specify their properties. If you remove a field, its properties are also removed. You cannot add records or fields under a field.
When you add a field, you can immediately rename the field. You can edit the name of an existing field and its properties by selecting the field and editing it.
- If you right-click a record and select Add > Field from the popup menu, the new field is inserted as a descendant of the selected record.
- You cannot give an existing field the same name as an existing record.
- You cannot provide a new field instance the same name as an existing sibling field or record.
- Sibling fields cannot have the same name.
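The naming rules for fields and records (a valid XML name, no clash with a sibling field or record) can be checked before a node is added. The Java sketch below is illustrative: the class and method names are hypothetical, and the regular expression is a simplified ASCII-only approximation of an XML name (the XML specification also permits many Unicode characters).

```java
import java.util.List;
import java.util.regex.Pattern;

// Hypothetical pre-check for the name of a new field or record.
class NodeNameRules {
    // Simplified, ASCII-only approximation of an XML name; real XML names allow more characters.
    private static final Pattern SIMPLE_XML_NAME = Pattern.compile("[A-Za-z_][A-Za-z0-9._-]*");

    // Throws if newName is not a usable name among the given sibling names.
    static void validateNewChild(List<String> siblingNames, String newName) {
        if (!SIMPLE_XML_NAME.matcher(newName).matches()) {
            throw new IllegalArgumentException("Not a valid XML name: " + newName);
        }
        if (siblingNames.contains(newName)) {
            throw new IllegalArgumentException("A sibling field or record is already named " + newName);
        }
    }

    public static void main(String[] args) {
        List<String> siblings = List.of("Name", "Age");
        validateNewChild(siblings, "Address"); // accepted
        for (String candidate : List.of("Age", "Home Address", "1stLine")) {
            try {
                validateNewChild(siblings, candidate);
            } catch (IllegalArgumentException e) {
                System.out.println(e.getMessage());
            }
        }
    }
}
```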
Changes to the properties shown in the table apply to the currently selected node of the schema tree, which can be a record, a field or the root node.
The following parameters are associated with the field node:
Property | Description |
---|---|
XML Type | The XML type for the field. This value can be Element (default), Attribute or None. If None is selected, the field is NOT mapped to the resultant XML. |
Data Type | The data type of the field data. Set this property if you want to validate the field data against one of the supported data types: String (default), Integer, Numeric, Date and Byte. For the Numeric and Date data types, a Data Format can also be defined: for Numeric, based on the syntax rules of java.text.DecimalFormat; for Date, based on the syntax rules of java.text.SimpleDateFormat. |
Minimum Length | The minimum number of characters that the field can contain. |
Maximum Length | The maximum number of characters that the field can contain. |
Default value | The default value of the field. The field is matched only if its value is the same as the default value. This is useful for headers and column names. |
Map If Null | Whether the field should be included in the output XML when its value in the source file is null/blank. |
Wrap Character | The character used to enclose field data. This property is useful if your field data contains a character that is also used as the delimiter value of the field's parent node. The solution is to define a wrap character and enclose the field data in it. For example, set the wrap character of the first field to a double quotation mark and enclose that field's data in double quotation marks; a comma between the quotation marks is then interpreted by the TSE parser as field data rather than as a delimiter value. |
Padding character | This property applies to the FileWriter. If a field is shorter than the required size (the minimum length for delimited records or the field length for positional records), the FileWriter pads the field with the padding character. Padding is always added to the right of the field data. |
Valid Characters | The set of valid characters for the field value. If this value is set and the field data contains any character that does not belong to this set, a parsing error is thrown. |
Invalid Characters | The set of invalid characters for the field value. If this value is set and the field data contains any character that belongs to this set, a parsing error is thrown. |
Trim Spaces | Whether to trim spaces from the source field data before setting it in the output XML. You can choose the positions from which spaces are trimmed. |
Name | The name of the field. The name must be a valid XML name. |
Description | The description of the field. |
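Several field properties correspond to simple string operations. The Java sketch below shows one hypothetical reading of Wrap Character, Padding character, Valid Characters and Data Format; the class and method names are illustrative, and this is not TSE's implementation. The Data Format checks use the standard `java.text.SimpleDateFormat` and `java.text.DecimalFormat` classes named in the table; strict (non-lenient) date parsing and US number symbols are assumptions.

```java
import java.text.DecimalFormat;
import java.text.DecimalFormatSymbols;
import java.text.ParsePosition;
import java.text.SimpleDateFormat;
import java.util.Locale;

// Hypothetical illustrations of some field properties; not TSE's implementation.
class FieldPropertySketch {
    // Wrap Character: strips one pair of enclosing wrap characters, if present.
    static String unwrap(String data, char wrap) {
        if (data.length() >= 2 && data.charAt(0) == wrap && data.charAt(data.length() - 1) == wrap) {
            return data.substring(1, data.length() - 1);
        }
        return data;
    }

    // Padding character: pads on the right until the required length is reached.
    static String padRight(String data, int length, char pad) {
        StringBuilder padded = new StringBuilder(data);
        while (padded.length() < length) {
            padded.append(pad);
        }
        return padded.toString();
    }

    // Valid Characters: true only if every character belongs to the allowed set.
    static boolean onlyValidChars(String data, String validChars) {
        for (char c : data.toCharArray()) {
            if (validChars.indexOf(c) < 0) {
                return false;
            }
        }
        return true;
    }

    // Date data format: the whole value must parse strictly with SimpleDateFormat.
    static boolean matchesDateFormat(String data, String pattern) {
        SimpleDateFormat format = new SimpleDateFormat(pattern, Locale.US);
        format.setLenient(false); // reject impossible dates such as 2024-02-30
        ParsePosition position = new ParsePosition(0);
        return format.parse(data, position) != null && position.getIndex() == data.length();
    }

    // Numeric data format: the whole value must parse with DecimalFormat.
    static boolean matchesNumericFormat(String data, String pattern) {
        DecimalFormat format = new DecimalFormat(pattern, new DecimalFormatSymbols(Locale.US));
        ParsePosition position = new ParsePosition(0);
        return format.parse(data, position) != null && position.getIndex() == data.length();
    }

    public static void main(String[] args) {
        System.out.println(unwrap("\"Fiorano, Inc\"", '"'));               // Fiorano, Inc
        System.out.println(padRight("AB", 5, '*'));                       // AB***
        System.out.println(onlyValidChars("12a", "0123456789"));          // false
        System.out.println(matchesDateFormat("2024-02-30", "yyyy-MM-dd")); // false
        System.out.println(matchesNumericFormat("1,234.50", "#,##0.00"));  // true
    }
}
```

Checking that the `ParsePosition` index reaches the end of the value is what makes the format checks reject trailing garbage such as `12a`, since both `parse` methods otherwise stop quietly at the first unparseable character.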