generateDS -- Generate Data Structures from XML Schema

Author:	Dave Kuhlman
Contact:	dkuhlman (at) reifywork (dot) com
Address:	http://www.reifywork.com

revision:	2.44.3

date:	October 25, 2024

abstract: generateDS.py generates Python data structures (for example, class definitions) from an XML Schema document. These data structures represent the elements in an XML document described by the XML Schema. It also generates parsers that load an XML document into those data structures. In addition, a separate file containing subclasses (stubs) is optionally generated. The user can add methods to the subclasses in order to process the contents of an XML document.

copyright:	Copyright (c) 2004 Dave Kuhlman. This documentation and the software it describes are covered by The MIT License: http://www.opensource.org/licenses/mit-license.php.

Contents

1 Moving to a new repository host
2 Introduction
3 Where To find it
- 3.1 Download
- 3.2 Support and more information
4 How to build and install it
- 4.1 Requirements
- 4.2 Installation
5 Packaging your code
6 The command line interface -- How to use it
- 6.1 Running generateDS.py
- 6.2 Command line options
- 6.3 Name conflicts etc.
7 The graphical user interface -- How to use it
8 Common problems
- 8.1 Namespace prefix mis-match
- 8.2 Using multiple subclass modules with the same superclass module
9 Supported features of XML Schema
- 9.1 Attributes + no nested children
- 9.2 Mixed content
- 9.3 anyAttribute
- 9.4 Element extensions
- 9.5 Attribute groups
- 9.6 Substitution groups
- 9.7 Primitive types
- 9.8 simpleType
- 9.9 List values, optional values, maxOccurs, etc.
- 9.10 simpleType and validators
- 9.11 Include file processing
- 9.12 Abstract types
- 9.13 Types derived by extension
- 9.14 Duplicate unqualified type names
- 9.15 Mapping name spaces to defined types
10 The XML schema input to generateDS
- 10.1 Additional constructions
11 XMLBehaviors
- 11.1 The XMLBehaviors input file
- 11.2 Implementing other sources for implementation bodies
12 Additional features
- 12.1 GraphQL support
- 12.2 xsd:list element support
- 12.3 xsd:enumeration support
- 12.4 xsd:union support
- 12.5 Extended xsd:choice support
- 12.6 Arity, minOccurs, maxOccurs, etc
- 12.7 More thorough content type and base type resolution
- 12.8 Making top level simpleTypes available from XschemaHandler
- 12.9 Namespaces -- inserting namespace definition in exported documents
- 12.10 Support for xs:any
- 12.11 Generating Lxml Element tree
  - 12.11.1 Mapping generateDS objects to Lxml Elements and back
- 12.12 Specifying names for anonymous nested type definitions
- 12.13 Controlling and using simple type validation
- 12.14 Generating validator methods
13 How to use the generated source code
- 13.1 The parsing functions
- 13.2 Recognizing the top level element
- 13.3 The export methods
- 13.4 Building instances
- 13.5 Using the subclass module
- 13.6 Elements with attributes but no nested children
- 13.7 Mixed content
- 13.8 anyAttribute
- 13.9 User Methods
- 13.10 Overridable methods -- generatedssuper.py
- 13.11 The element name to class name dictionary
- 13.12 Adding custom exported attributes and namespace prefix definitions
- 13.13 Namespace prefixes, xsi:type attributes, and abstract extended types
14 "One Per" -- generating separate files from imported/included schemas
- 14.1 Approach 1 -- Command line option --one-file-per-xsd
- 14.2 Approach 2 -- Extraction and generation utilities
15 How to modify the generated code
- 15.1 Adding features to class definitions
16 Examples and demonstrations
- 16.1 Django -- Generating Models and Forms
  - 16.1.1 How to generate Django models and forms
  - 16.1.2 How it works
17 Sample code and extensions
- 17.1 Capturing xs:date elements as dates
18 Limitations of generateDS
- 18.1 XML Schema limitations
19 Includes -- The XML schema xs:include and xs:import elements
20 Processing RelaxNG schemas
21 Acknowledgments
22 See also

1 Moving to a new repository host

generateDS is moving to a new repository host.

The new repository location is here: https://sourceforge.net/projects/generateds/

You can clone the repository with the following:

```
hg clone http://hg.code.sf.net/p/generateds/code generateds
```

I thank Bitbucket for their excellent support. However, Bitbucket is discontinuing support for Mercurial, and I'd like to continue using Mercurial for our distributed revision-control tool.

2 Introduction

generateDS.py generates Python data structures (for example, class definitions) from an XML Schema document. These data structures represent the elements in an XML document described by the XML Schema. It also generates parsers that load an XML document into those data structures. In addition, a separate file containing subclasses (stubs) is optionally generated. The user can add methods to the subclasses in order to process the contents of an XML document.

The generated Python code contains:

A class definition for each element defined in the XML Schema document.
A main and driver function that can be used to test the generated code.
A parser that will read an XML document which satisfies the XML Schema from which the parser was generated. The parser creates and populates a tree structure of instances of the generated Python classes.
Methods in each class to export the instance back out to XML (method export) and to export the instance to a literal representing the Python data structure (method exportLiteral).

The generated classes contain the following:

A constructor method (__init__), with member variable initializers.
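Getter and setter methods for each member variable 'X': named 'get_X' and 'set_X' in the new (default) style, or 'getX' and 'setX' in the old style (see option --use-getter-setter). If the member variable is defined with maxOccurs="unbounded", additional methods for adding and inserting items are generated as well (in the old style, 'addX' and 'insertX').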
A "build" method that can be used to populate an instance of the class from an element node in a parsed (Lxml) element tree.
An "export" method that will write the instance (and any nested sub-instances) to a file object as XML text.
An "exportLiteral" method that will write the instance (and any nested sub-instances) to a file object as Python literals (text).
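To make the list above concrete, here is a hand-written, heavily simplified sketch of the general shape of such a class, for a hypothetical element `<person>` whose `name` child may repeat (maxOccurs="unbounded"). It is not actual generateDS output: the class name, the member, the extra accessor names, and the method bodies are assumptions, and it uses the standard library's `xml.etree.ElementTree` in place of Lxml so that it runs on its own (Python 3).

```python
import sys
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


class person:
    """Hand-written stand-in for a generated class (not real output)."""

    def __init__(self, name=None):
        # Member variable initializer; a maxOccurs="unbounded" child is a list.
        self.name = list(name) if name is not None else []

    # Getter/setter pair (new-style names).
    def get_name(self):
        return self.name

    def set_name(self, name):
        self.name = name

    # Extra accessors for a repeating member (names are illustrative).
    def add_name(self, value):
        self.name.append(value)

    def insert_name_at(self, index, value):
        self.name.insert(index, value)

    def build(self, node):
        # Populate this instance from an element node and return it.
        for child in node:
            if child.tag == "name":
                self.add_name(child.text or "")
        return self

    def export(self, outfile, level=0):
        # Write this instance back out as indented XML text.
        indent = "    " * level
        outfile.write(f"{indent}<person>\n")
        for value in self.name:
            outfile.write(f"{indent}    <name>{escape(value)}</name>\n")
        outfile.write(f"{indent}</person>\n")


if __name__ == "__main__":
    doc = ET.fromstring("<person><name>Ann</name><name>Bo</name></person>")
    person().build(doc).export(sys.stdout)
```

Real generated classes add much more (namespace handling, validators, exportLiteral, and so on); the sketch only shows how the pieces in the list relate.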

The generated subclass file contains one (sub-)class definition for each data representation class. If the subclass file is used, then the parser creates instances of the subclasses (instead of creating instances of the superclasses). This enables the user to extend the subclasses with "tree walk" methods, for example, that process the contents of the XML file. The user can also generate and extend multiple subclass files which use a single, common superclass file, thus implementing a number of different processes on the same XML document type.
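The mechanism that lets the parser create subclass instances can be sketched as follows. This is a simplified illustration of the pattern, under the assumption that each superclass carries a `subclass` class attribute that the subclass module sets, and a `factory` static method that the parser calls instead of calling the constructor directly. The class names (`person`, `personSub`) and the `walk` method are hypothetical.

```python
class person:
    subclass = None  # set by a subclass module, if one is imported

    def __init__(self, name=None):
        self.name = name

    @staticmethod
    def factory(*args, **kwargs):
        # Create an instance of the registered subclass if there is one,
        # otherwise an instance of this (super)class.
        if person.subclass is not None:
            return person.subclass(*args, **kwargs)
        return person(*args, **kwargs)


# --- in a subclass module (for example, one generated with -s) ---
class personSub(person):
    def walk(self, depth=0):
        # A user-added "tree walk" method.
        return "  " * depth + f"person: {self.name}"


person.subclass = personSub  # register the subclass with its superclass
```

In this sketch, registration is a single class attribute, so if two subclass modules built on the same superclass module are both imported, the one that registers last wins.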

generateDS.py can be run under either Python 2 or Python 3. The generated Python code (both superclass and subclass modules) can be run under either Python 2 or Python 3.

This document explains (1) how to use generateDS.py; (2) how to use the Python code and data structures that it generates; and (3) how to modify the generated code for special purposes.

There is also support for packaging the code you generate with generateDS.py. See Packaging your code.

3 Where To find it

3.1 Download

You can find the source distribution here:

Python Package Index -- http://pypi.python.org/pypi/generateDS/
Source Forge -- http://sourceforge.net/projects/generateds/. Use Mercurial to clone the repository. Do the following, but possibly change "generateds-code" to the directory of your choice:
```
hg clone http://hg.code.sf.net/p/generateds/code generateds-code
```
Bitbucket -- Bitbucket is discontinuing support for Mercurial. The new host for the generateDS repository is SourceForge.net (see above), but I'll keep the Bitbucket repository updated until it's removed. See: https://bitbucket.org/dkuhlman/generateds

3.2 Support and more information

There is a mailing list at SourceForge: generateds-discuss -- https://sourceforge.net/p/generateds/mailman/generateds-discuss/.

There is a tutorial in the distribution: tutorial/tutorial.html and at generateDS -- Introduction and Tutorial -- http://www.reifywork.com/generateds_tutorial.html.

4 How to build and install it

4.1 Requirements

Lxml is used both by generateDS.py and by the code it generates. Lxml is available at the Python Package Index https://pypi.python.org/pypi/lxml/ and at the Lxml project home site http://lxml.de/.

Older versions of Python XML support can sometimes cause problems. If you receive a traceback that includes "_xmlplus", then you will need to remove that _xmlplus package.

4.2 Installation

De-compress the generateDS distribution file. Use something like the following:

```
tar xzvf generateDS-x.xx.tar.gz
```

Then, the regular Distutils commands should work:

```
$ cd generateDS-x.xx
$ python setup.py build
$ python setup.py install        # probably as root
```

5 Packaging your code

There is some support for packaging the code you generate with generateDS.py. This support helps you to produce a directory structure with places to put sample code, sample XML instance documents, and utility code for use with your generated module. It also assists you in using Sphinx to generate documentation for your module. The Sphinx support is especially useful when the schema used to generate code contains "annotation" elements that document complexType definitions.

Instructions on how to use it are here: How to package a generateDS.py generated library -- librarytemplate_howto.html

And the package building support itself is here: LibraryTemplate -- http://www.reifywork.com/librarytemplate-1.0a.zip. It is also included in the generateDS distribution package.

6 The command line interface -- How to use it

6.1 Running generateDS.py

Run generateDS.py with a single argument, the XML Schema file that defines the data structures. For example, the following will generate Python source code for data structures described in people.xsd and will write it to the file people.py. In addition, it will write subclass stubs to the file peoplesubs.py:

```
python generateDS.py -o people.py -s peoplesubs.py people.xsd
```
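If you drive code generation from a Python build script, a small wrapper around `subprocess` can assemble the same command. This is a sketch: the helper names `generateds_command` and `run_generateds` are hypothetical, the script path assumes `generateDS.py` is in the current directory (adjust it for your installation), and only the -f, -o, -s, and --super options described in this document are used.

```python
import subprocess
import sys


def generateds_command(xsd_file, out_file, sub_file=None, super_module=None,
                       force=True, script="generateDS.py"):
    """Return the argument list for one generateDS.py run (hypothetical helper)."""
    cmd = [sys.executable, script]
    if force:
        cmd.append("-f")  # overwrite existing output files without asking
    cmd += ["-o", out_file]
    if sub_file is not None:
        cmd += ["-s", sub_file]
        if super_module is not None:
            # Name of the superclass module imported by the subclass module.
            cmd.append(f"--super={super_module}")
    cmd.append(xsd_file)
    return cmd


def run_generateds(*args, **kwargs):
    # check=True raises CalledProcessError if generateDS.py exits with an error.
    return subprocess.run(generateds_command(*args, **kwargs), check=True)
```

For example, `run_generateds("people.xsd", "people.py", "peoplesubs.py", super_module="people")` runs the command shown above, plus -f and --super=people.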

Here is the usage message displayed by generateDS.py:

Synopsis:
    Generate Python classes from XML schema definition.
    Input is read from in_xsd_file or, if "-" (dash) arg, from stdin.
    Output is written to files named in "-o" and "-s" options.
Usage:
    python generateDS.py [ options ] <xsd_file>
    python generateDS.py [ options ] -
Options:
    -h, --help               Display this help information.
    -o <outfilename>         Output file name for data representation classes
    -s <subclassfilename>    Output file name for subclasses
    -p <prefix>              Prefix string to be pre-pended to the class names
    -f                       Force creation of output files.  Do not ask.
    -a <namespaceabbrev>     Namespace abbreviation, e.g. "xsd:".
                             Default = 'xs:'.
    -b <behaviorfilename>    Input file name for behaviors added to subclasses
    -m                       Generate properties for member variables
    -c <xmlcatalogfilename>  Input file name to load an XML catalog
    --one-file-per-xsd       Create a python module for each XSD processed.
    --output-directory="XXX" Used in conjunction with --one-file-per-xsd.
                             The directory where the modules will be created.
    --module-suffix="XXX"    To be used in conjunction with --one-file-per-xsd.
                             Append XXX to the end of each file created.
    --subclass-suffix="XXX"  Append XXX to the generated subclass names.
                             Default="Sub".
    --root-element="XX"      When parsing, assume XX is root element of
    --root-element="XX|YY"   instance docs.  Default is first element defined
                             in schema.  If YY is added, then YY is used as the
                             top level class; if YY is omitted, XX is the default
                             class. Also see section "Recognizing the top level
                             element" in the documentation.
    --super="XXX"            Super module name in generated subclass
                             module. Default="???"
    --validator-bodies=path  Path to a directory containing files that provide
                             bodies (implementations) of validator methods.
    --use-old-simpletype-validators
                             Use the old style simpleType validator functions
                             stored in a specified directory, instead of the
                             new style validators generated directly from the
                             XML schema.  See option --validator-bodies.
    --use-getter-setter      Generate getter and setter methods.  Values:
                             "old" - Name getters/setters getVar()/setVar().
                             "new" - Name getters/setters get_var()/set_var().
                             "none" - Do not generate getter/setter methods.
                             Default is "new".
    --use-source-file-as-module-name
                             Used in conjunction with --one-file-per-xsd to
                             use the source XSD file names to determine the
                             module name of the generated classes.
    --use-regex-module       Generated modules should import module "regex",
                             not "re".  Default is False.
    --user-methods= <file_path>,
    -u <file_path>           Optional module containing user methods.  See
                             section "User Methods" in the documentation.
    --custom-imports-template=<file_path>
                             Optional file with custom imports directives
                             which can be used via the --user-methods option.
    --no-dates               Do not include the current date in the generated
                             files. This is useful if you want to minimize
                             the amount of (no-operation) changes to the
                             generated python code.
    --no-versions            Do not include the current version in the
                             generated files. This is useful if you want
                             to minimize the amount of (no-operation)
                             changes to the generated python code.
    --no-process-includes    Do not use process_includes.py to pre-process
                             included XML schema files.  By default,
                             generateDS.py will insert content from files
                             referenced by xs:include and xs:import elements
                             into the XML schema to be processed and perform
                             several other pre-processing tasks.  You likely do
                             not want to use this option; its use has been
                             reported to result in errors in generated modules.
                             Consider using --no-collect-includes and/or
                             --no-redefine-groups instead.
    --no-collect-includes    Do not (recursively) collect and insert schemas
                             referenced by xs:include and xs:import elements.
    --no-redefine-groups     Do not pre-process and redefine group definitions.
    --silence                Normally, the code generated with generateDS
                             echoes the information being parsed. To prevent
                             the echo from occurring, use the --silence switch.
                             Also note optional "silence" parameter on
                             generated functions, e.g. parse, parseString, etc.
    --namespacedef='xmlns:abc="http://www.abc.com"'
                             Namespace definition to be passed in as the
                             value for the namespacedef_ parameter of
                             the export() method by the generated
                             parse() and parseString() functions.
                             Default=''.
    --no-namespace-defs      Do not pass namespace definitions as the value
                             for the namespacedef_ parameter of the export
                             method, even if it can be extracted from the
                             schema.
    --external-encoding=<encoding>
                             Encode output written by the generated export
                             methods using this encoding.  Default, if omitted,
                             is the value returned by sys.getdefaultencoding().
                             Example: --external-encoding='utf-8'.
    --member-specs=list|dict
                             Generate member (type) specifications in each
                             class: a dictionary of instances of class
                             MemberSpec_ containing member name, type,
                             and array or not.  Allowed values are
                             "list" or "dict".  Default: do not generate.
    --export=<export-list>   Specifies export functions to be generated.
                             Value is a whitespace separated list of
                             any of the following:
                                 write -- write XML to file
                                 literal -- write out python code
                                 etree -- build element tree (can serialize
                                     to XML)
                                 django -- load XML to django database
                                 sqlalchemy -- load XML to sqlalchemy database
                                 validate -- call all validators for object
                                 generator -- recursive generator method
                             Example: "write etree"
                             Default: "write"
    --always-export-default  Always export elements and attributes that
                             have a default value even when the current value
                             is equal to the default.  Default: False.
    --disable-generatedssuper-lookup
                             Disables the generation of the lookup logic for
                             presence of an external module from which to load
                             a custom `GeneratedsSuper` base-class definition.
    --disable-xml            Disables generation of all XML build/export
                             methods and command line interface
    --enable-slots           Enables the use of slots for generated class
                             members.  Requires --member-specs=dict.
    --preserve-cdata-tags    Preserve CDATA tags.  Default: False
    --cleanup-name-list=<replacement-map>
                             Specifies list of 2-tuples used for cleaning
                             names.  First element is a regular expression
                             search pattern and second is a replacement.
                             Example: "[('[-:.]', '_'), ('^__', 'Special')]"
                             Default: "[('[-:.]', '_')]"
    --mixed-case-enums       If used, do not uppercase simpleType enums names.
                             Default is to make enum names uppercase.
    --create-mandatory-children
                             If a child is defined with minOccurs="1" and
                             maxOccurs="1" and the child is xs:complexType
                             and the child is not defined with
                             xs:simpleContent, then in the element's
                             constructor generate code that automatically
                             creates an instance of the child.  The default
                             is False, i.e. do not automatically create child.
    --import-path="string"   This value will be pre-pended to the name of
                             files to be imported by the generated module.
                             The default value is the empty string ("").
                             This enables the user to produce relative
                             import statements in the generated module that
                             restrict the import to some module in a
                             specific package in a package directory
                             structure containing the generated module.
    -q, --no-questions       Do not ask questions, for example,
                             force overwrite.
    --no-warnings            Do not print warning messages.
    --session=mysession.session
                             Load and use options from session file. You can
                             create session file in generateds_gui.py.  Or,
                             copy and edit sample.session from the
                             distribution.
    --fix-type-names="oldname1:newname1;oldname2:newname2;..."
                             Fix up (replace) complex type names.
    -g rootelement:rootclass, --graphql=rootelement:rootclass
                             Generate methods, functions, query, classes,
                             and schema for GraphQL.  Specify the root
                             element (tag) and root class for XML instance
                             docs.
    --version                Print version and exit.

Usage example:

    $ python generateDS.py -f -o sample_lib.py sample_api.xsd

creates (with force over-write) sample_lib.py from sample_api.xsd.

    $ python generateDS.py -o sample_lib.py -s sample_app1.py \
            --member-specs=dict sample_api.xsd

creates sample_lib.py superclass and sample_app1.py subclass modules;
also generates member specifications in each class (in a dictionary).
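The --cleanup-name-list value shown in the usage message is a list of (pattern, replacement) pairs. The sketch below shows one plausible way such a list is applied to a name, using `re.sub` for each pair in order. It assumes the pairs are applied sequentially and that the option string is a Python literal, as in the examples above; the function name `cleanup_name` is hypothetical and this is not generateDS's own code.

```python
import ast
import re

DEFAULT_CLEANUP = "[('[-:.]', '_')]"


def cleanup_name(name, cleanup_spec=DEFAULT_CLEANUP):
    """Apply each (pattern, replacement) pair to name, in order."""
    pairs = ast.literal_eval(cleanup_spec)  # parse the option string safely
    for pattern, replacement in pairs:
        name = re.sub(pattern, replacement, name)
    return name
```

With the default list, "my-type:v1.2" becomes "my_type_v1_2". With the example list from the usage message, "--x" first becomes "__x" and then "Specialx", which is why the order of the pairs matters.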

6.2 Command line options

The following command line options are recognized by generateDS.py:

-o <filename>

Write the data representation classes to file filename.

-s <filename>

Write the subclass stubs to file filename.

-p <prefix>

Prepend prefix to the name of each generated data structure (class).

-f

Force generation of output files even if they already exist. Do not ask before over-writing existing files.

-a <namespaceabbrev>

Namespace abbreviation, for example "xsd:". The default is 'xs:'. If the <schema> element in your XML Schema specifies something other than "xmlns:xs=", then you need to use this option. So, suppose you have the following at the beginning of your XML Schema file:

```
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
```

Then you can use the following command line option:

```
-a "xsd:"
```

But note that generateDS.py also tries to pick up the namespace prefix used in the XML Schema file automatically. If the <schema> element has an attribute "xmlns:xxx" whose value is "http://www.w3.org/2001/XMLSchema", then generateDS.py will use "xxx:" as the alias for the XML Schema namespace in the XML Schema document.
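The automatic detection described above amounts to finding which prefix the <schema> element binds to http://www.w3.org/2001/XMLSchema. Here is a minimal sketch using the standard library's `iterparse` with `start-ns` events. The function name `xsd_prefix` is hypothetical, it only looks at declarations on the root element, and it is not the code generateDS.py itself uses.

```python
import io
import xml.etree.ElementTree as ET

XSD_NS = "http://www.w3.org/2001/XMLSchema"


def xsd_prefix(source, default="xs:"):
    """Return e.g. "xsd:" if the root element binds that prefix to XSD_NS."""
    for event, data in ET.iterparse(source, events=("start-ns", "start")):
        if event == "start":
            break  # reached the root element; its declarations were all seen
        prefix, uri = data
        if uri == XSD_NS:
            # An empty prefix means XSD is the default namespace (no prefix).
            return prefix + ":" if prefix else ""
    return default
```

`xsd_prefix("mydef.xsd")` accepts a file name; the example below uses an in-memory document.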

-b <behaviorfilename>

Input file name for behaviors to be added to subclasses. Specifies the name of an XML document containing descriptions of methods to be added to subclasses generated with the -s flag. The -b flag requires the -s flag. See the section on XMLBehaviors below.

-m

Generate property members and new style classes. Causes generated classes to inherit from class object. Generates a call to the built-in property function for each pair of getters and setters. This is experimental.
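As an illustration of what -m adds, here is a minimal sketch of a class whose getter/setter pair is also exposed through the built-in property function. The member name `name` and the property name `nameProp` are assumptions made for this example; check a module generated with -m for the actual names.

```python
class person(object):  # -m: generated classes inherit from object
    def __init__(self, name=None):
        self.name = name

    def get_name(self):
        return self.name

    def set_name(self, name):
        self.name = name

    # Attribute-style access routed through the getter and setter.
    # The property name "nameProp" is an assumption for this sketch.
    nameProp = property(get_name, set_name)
```

Reading or assigning `p.nameProp` then calls `get_name()` or `set_name()`, so any logic added to those methods in a subclass also applies to attribute-style access.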

-c <xmlcatalogfilename>

Specify the file to be used as an XML catalog. This file will be used by process_includes.py if needed to resolve references in <xs:import> and <xs:include> elements in the XML Schema. For more information on XML catalogs, see: http://en.wikipedia.org/wiki/XML_Catalog

--one-file-per-xsd

Create a separate Python module for each XML Schema document processed (for example, using <xs:include> or <xs:import>). For help with using this option, see "One Per" -- generating separate files from imported/included schemas.

--output-directory <directory>

When used with --one-file-per-xsd, create generated output files in path <directory>.

--module-suffix <suffix>

When used with --one-file-per-xsd, append <suffix> to the end of each module name.

--subclass-suffix=<suffix>

Append suffix to the name of classes generated in the subclass file. The default, if omitted, is "Sub". For example, the following will append "_Action" to each generated subclass name:

```
generateDS.py --subclass-suffix="_Action" -s actions.py mydef.xsd
```

And the following will append nothing, making the superclass and subclass names the same:

```
generateDS.py --subclass-suffix="" -s actions.py mydef.xsd
```

--root-element=<element_name> -OR- <element_name>|<class_name>

Make element_name the assumed root of instance documents. The default is the name of the element whose definition is first in the XML Schema document. If class_name is also present (after a vertical bar), then class_name is assumed to be the name of the class to be created from the root (top level) element when parsing an XML instance document. If class_name is omitted, the default class name is the same as element_name. This flag affects the parsing functions (for example, parse(), parseString(), etc.).
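The rule for the option value can be sketched as follows. The function name `split_root_element` is hypothetical; the sketch only restates the behavior described above (the class name defaults to the element name when the `|class_name` part is omitted) and is not taken from generateDS.py.

```python
def split_root_element(value):
    """Split "element" or "element|class" into (element_name, class_name)."""
    element_name, sep, class_name = value.partition("|")
    if not sep or not class_name:
        class_name = element_name  # default: class named after the element
    return element_name, class_name
```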

--super=<module_name>

Make module_name the name of the superclass module imported by the subclass module. If this flag is omitted, the following is generated near the top of the subclass file:

```
import ??? as supermod
```

and you will need to hand edit this so the correct superclass module is imported.

--validator-bodies=<path>

Obtain the bodies (implementations) for validator methods for members defined as simpleType from files in directory specified by <path>. The name of the file in that directory should be the same as the simpleType name with an optional ".py" extension. If a file is not provided for a given type, an empty body (pass) is generated. In these files, lines with "##" in the first two columns are ignored and are not inserted.
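The lookup rules above (a file named after the simpleType, an optional ".py" extension, lines starting with "##" skipped, and `pass` when no file exists) can be sketched like this. The function name `load_validator_body` is hypothetical, and the order in which the two file names are tried is an assumption.

```python
import os


def load_validator_body(directory, type_name):
    """Return the validator body lines for type_name, or ["pass"]."""
    for candidate in (type_name, type_name + ".py"):
        path = os.path.join(directory, candidate)
        if os.path.isfile(path):
            with open(path) as infile:
                # Lines with "##" in the first two columns are not inserted.
                return [line.rstrip("\n") for line in infile
                        if not line.startswith("##")]
    return ["pass"]  # no file for this type: generate an empty body
```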

--use-old-simpletype-validators

generateDS.py is capable of generating validator bodies: the code that validates data content in an XML instance document and writes out warning messages if that data does not satisfy the facets in the xs:restriction in the xs:simpleType definition in the XML schema. Use this option if you want to use your own validation bodies/code defined in a specified directory instead; see option --validator-bodies for information on that. Without this option (--use-old-simpletype-validators), the validator code is generated directly from the XML schema, which is the default.

This option can also be used to generate code that does no validation. See simpleType and validators and Turning off validation of simpleType data for more information.
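To give an idea of what a generated validator does, here is a hand-written sketch for a hypothetical simpleType "sizeType" that restricts xs:string with maxLength="6" and an enumeration. It checks the facets and emits warnings rather than raising exceptions, in line with the description above. The class, method, and type names and the use of the `warnings` module are assumptions, not generateDS output.

```python
import warnings


class order:
    # Facets of the hypothetical simpleType "sizeType".
    SIZE_ENUM = ("small", "medium", "large")
    SIZE_MAX_LENGTH = 6

    def __init__(self, size=None):
        self.size = size

    def validate_sizeType(self, value):
        """Return True if value satisfies the facets; warn otherwise."""
        result = True
        if value is None:
            return result
        if len(value) > self.SIZE_MAX_LENGTH:
            warnings.warn(f"Value {value!r} exceeds maxLength {self.SIZE_MAX_LENGTH}")
            result = False
        if value not in self.SIZE_ENUM:
            warnings.warn(f"Value {value!r} is not in enumeration {self.SIZE_ENUM}")
            result = False
        return result
```

Reporting every failed facet, instead of stopping at the first one, makes a single validation pass more informative.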

--use-getter-setter

generateDS.py now generates getter and setter methods with names like get_abc() and set_abc() (for a variable "abc", for example), which I believe is a more Pythonic style, instead of getAbc() and setAbc(), which was the old behavior. Use this flag to generate getters and setters in the old style (getAbc() and setAbc()), in the newer style (get_abc() and set_abc()), which is the default, or to omit generation of getter and setter methods. Possible values are:

"old" - Name getters/setters getVar()/setVar().
"new" - Name getters/setters get_var()/set_var().
"none" - Do not generate getter/setter methods.

The default is "new".
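The three styles differ only in how accessor names are formed. Here is a sketch with a hypothetical helper, assuming, as in the example above, that "old" capitalizes the first letter of the member name and "new" inserts an underscore:

```python
def accessor_names(member, style="new"):
    """Return (getter, setter) names for member, or None for style "none"."""
    if style == "none":
        return None
    if style == "old":
        cap = member[:1].upper() + member[1:]
        return f"get{cap}", f"set{cap}"
    if style == "new":
        return f"get_{member}", f"set_{member}"
    raise ValueError(f"unknown getter/setter style: {style!r}")
```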

--use-source-file-as-module-name

This option has an effect only when used with --one-file-per-xsd. It causes the source XML schema file names to be used to determine the module names of the generated classes. Without this option, the first root element is used to construct module names. The default is False.

--use-regex-module

This option causes generateDS.py to generate modules that import the regex module instead of the re module. The default is False. There are some regular expressions that regex handles but that re does not, for example "\p{...}" (Unicode property classes). See https://pypi.org/project/regex/ and https://github.com/mrabarnett/mrab-regex.
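The difference is easy to see with a Unicode property class such as \p{L} (any letter). The sketch below prefers the third-party regex module when it is installed and falls back to re; with re, the same pattern is rejected at compile time. The helper name `compile_pattern` is hypothetical.

```python
import re

try:
    import regex  # third-party module: pip install regex
except ImportError:
    regex = None


def compile_pattern(pattern):
    """Compile with regex if it is available, else with the standard re."""
    if regex is not None:
        return regex.compile(pattern)
    return re.compile(pattern)


LETTERS = r"^\p{L}+$"  # one or more letters, in any script

if __name__ == "__main__":
    try:
        print(bool(compile_pattern(LETTERS).match("Größe")))
    except re.error as exc:
        # re does not understand \p{...} and reports a bad escape.
        print("re cannot compile this pattern:", exc)
```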

-u, --user-methods=<module>

If specified, generateDS.py will add methods to generated classes as specified in the indicated module. For more information, see section User Methods.

--custom-imports-template=<file_path>

Optional file with custom imports directives which can be used via the --user-methods option.

--no-dates

Do not include the current date in the generated files. This is useful if you want to minimize the amount of (no-operation) changes to the generated python code.

--no-versions

Do not include the current version in the generated files. This is useful if you want to minimize the amount of (no-operation) changes to the generated python code.

--no-process-includes

Do not use process_includes.py to pre-process included XML Schema files. By default, generateDS.py will insert content from files referenced by xs:include and xs:import elements into the XML Schema to be processed. See section Include file processing. Note that include processing, which is performed in process_includes.py, is required for generating validator bodies from the XML schema, because the Lxml ElementTree produced in process_includes.py is needed to generate the validator code. So, using this option also turns off automatic generation of validator code. Also note that process_includes.py performs additional tasks: it also (1) assigns names to each anonymous complexType, (2) processes (replaces) group definitions, and (3) possibly fixes complexType names (see command line option --fix-type-names). You likely do not want to use this option; its use has been reported to result in errors in generated modules. Consider using --no-collect-includes and/or --no-redefine-groups instead.

--no-collect-includes

Do not (recursively) collect and insert schemas referenced by xs:include and xs:import elements. This task is performed in process_includes.py.

--no-redefine-groups

Do not pre-process and redefine group definitions. This task is performed in process_includes.py.

--silence

Normally, the code generated with generateDS echoes the information being parsed. To prevent the echo from occurring, use the --silence switch. This switch causes generateDS.py, when it generates the boilerplate parsing functions (parse(), parseString(), parseLiteral()), to generate code that does not export the parsed document to stdout.

--namespacedef="<http://...>"

Namespace definition to be passed in as the value for the namespacedef_ parameter of the export() method by the generated parse() and parseString() functions. If this parameter is specified, then the export function will insert a namespace prefix definition attribute in the top-most (outer-most) element. (Actually, you can insert any attribute.) The default is an empty string.
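The effect on the output is an extra attribute on the outermost start tag. The sketch below mimics that with a hypothetical `write_root` function; it is not the generated export() method, whose full signature and behavior you should check in your generated module. As described above, the text is inserted verbatim, which is why any attribute can be passed this way.

```python
import io


def write_root(outfile, tag, children_xml="", namespacedef_=""):
    """Write a root element, inserting namespacedef_ into its start tag."""
    # Mirrors the idea behind the namespacedef_ parameter of generated
    # export() methods: the string is copied verbatim into the start tag.
    attrs = f" {namespacedef_}" if namespacedef_ else ""
    outfile.write(f"<{tag}{attrs}>{children_xml}</{tag}>\n")


buf = io.StringIO()
write_root(buf, "abc:people", "<abc:person/>",
           namespacedef_='xmlns:abc="http://www.abc.com"')
```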

--no-namespace-defs

Do not pass namespace definitions as the value for the namespacedef_ parameter of the export method, even if they can be extracted from the schema. The default is off. You might want to consider using this in combination with the ability to attach namespace prefix definitions to specific element types during export, as described here: Adding custom exported attributes and namespace prefix definitions.

--external-encoding=<encoding>

If an XML instance document contains character data or attribute values that are not in the ASCII character set, then that data may not be written out correctly, or writing it may raise an exception. This flag enables the user to specify a character encoding into which character data is encoded before it is written out by the export functions. The generated export methods encode data using this encoding. The default value, if this flag is omitted, is the value returned by sys.getdefaultencoding(). You can find a list of standard encodings here: http://docs.python.org/library/codecs.html#id3. Example use: --external-encoding='utf-8'.
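The underlying issue is ordinary Python text encoding: text containing non-ASCII characters cannot be encoded with the ASCII codec, but can be with UTF-8. A minimal illustration, independent of generated code (the sample string is arbitrary):

```python
text = "<name>Zoë Müller</name>"

# UTF-8 can represent any Unicode character, so this always succeeds.
utf8_bytes = text.encode("utf-8")

# ASCII cannot represent "ë" or "ü", so encoding raises UnicodeEncodeError.
try:
    text.encode("ascii")
    ascii_ok = True
except UnicodeEncodeError as exc:
    ascii_ok = False
    print("ascii encoding failed:", exc.reason)
```

Under Python 3, sys.getdefaultencoding() returns 'utf-8', so the default matters mostly when running under Python 2.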

member-specs

Generate member (type) specifications in each class. The specifications are a list or a dictionary (depending on the value given) of instances of class MemberSpec_, each containing the member name, its type, whether it is an array, and whether the item is optional (i.e. defined with minOccurs="0"). See User Methods section for more information about MemberSpec_. Allowed values are "list" or "dict". Default: do not generate member specifications (unless --user-methods specified).

export

Specify which of the export-related member methods are to be generated. The value is a whitespace-separated list of any of the following:

"write" -- Generate methods export, exportAttributes, and exportChildren. These methods write XML to a file.
"literal" -- Generate methods exportLiteral, exportLiteralAttributes, and exportLiteralChildren. These methods write out Python code.
"etree" -- Generate method to_etree. This method builds an lxml element tree, which can, for example, be serialized to XML using lxml's tostring function and searched with the lxml xpath capability. You can also iterate over nodes in the tree with the node's getiterator, iterchildren, etc, and use any of lxml's other capabilities.
"validate" -- Generate a validator method in each complex type class. When called, this method calls each applicable simple type validator method on the simple types (attributes and children) defined in this class. See Generating validator methods for more information.
"django" -- Generate models for Django databases.
"sqlalchemy" -- Generate models for SQLAlchemy databases.
"generator" -- Generate a Python generator method that can be used to produce an iterable over each object in the tree of objects that represent complex types.

For example: --export="write etree" and --export="write". The default is: --export="write".
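The "generator" option is described as producing an iterable over every object in the tree. The following is a minimal sketch of that idea in plain Python, assuming a hypothetical `Node` class whose children are kept in a list; the method actually generated by generateDS.py may differ in name and detail.

```python
class Node:
    """Hypothetical stand-in for a generated complex-type class."""

    def __init__(self, name, children=None):
        self.name = name
        self.children = children or []

    def generator(self):
        # Yield this object first, then every descendant, depth-first.
        yield self
        for child in self.children:
            yield from child.generator()

root = Node('people', [Node('person', [Node('address')]), Node('person')])
print([node.name for node in root.generator()])
# ['people', 'person', 'address', 'person']
```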

always-export-default

Always export elements and attributes that have a default value, even when the current value is equal to the default. Default: False.

disable-generatedssuper-lookup

Disables the generation of code implementing the lookup for presence of an external module from which to load a custom replacement for the default GeneratedsSuper base-class. With this flag, the generated code unconditionally uses the built-in implementation of GeneratedsSuper. (Suggestion: In order to get a picture of what difference this option makes, you might consider generating modules both with and without it, and then comparing the results with diff.) The default is False.

disable-xml

Disables generation of code that enables XML build/export methods and command line interface. Actually, the code is there, but is commented out. If you enable this option, the generated modules will not contain code for the following: (1) run as a script without explicitly running python (the #!/usr/bin/env python line is omitted); (2) import lxml.etree; (3) parse an XML file; (4) export an XML file. (Suggestion: In order to get a picture of what difference this option makes, you might consider generating modules both with and without it, and then comparing the results with diff.) The default is False.

enable-slots

Enables the use of slots for generated class members. Use of this option requires that you also use the following option: --member-specs=dict. Use of this option results in the generation of __slots__ class variables in classes generated from complex types. The effect is to reduce memory use and speed up processing.
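For readers unfamiliar with `__slots__`, here is a plain-Python illustration of its general effect (not code generated by generateDS.py; the class names are hypothetical). Instances of a slotted class have no per-instance `__dict__`, which is where the memory saving comes from, and assigning an attribute that is not listed in `__slots__` raises `AttributeError`.

```python
class WithDict:
    def __init__(self):
        self.name = None
        self.age = None

class WithSlots:
    # Only these attribute names can be set on instances.
    __slots__ = ('name', 'age')

    def __init__(self):
        self.name = None
        self.age = None

plain, slotted = WithDict(), WithSlots()
print(hasattr(plain, '__dict__'))    # True
print(hasattr(slotted, '__dict__'))  # False
try:
    slotted.nickname = 'Bob'         # not declared in __slots__
except AttributeError as exc:
    print('AttributeError:', exc)
```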

preserve-cdata-tags

Preserve CDATA tags. Normally, CDATA tags ("<![CDATA[ ... ]]>") are dropped while parsing an XML instance document. If this option is included, the generated code will preserve those tags and will write them out during export. The default is False.
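To see why this option exists: an ordinary XML parser does not keep CDATA markers. The section's content becomes plain text, and on output special characters are escaped instead. The sketch below uses Python's standard `xml.etree.ElementTree` only to demonstrate that general behavior; it is not the parser configuration used by generated code.

```python
import xml.etree.ElementTree as ET

doc = '<note><![CDATA[a < b & c]]></note>'
elem = ET.fromstring(doc)

# The CDATA markers are gone; only the character data remains.
print(repr(elem.text))                        # 'a < b & c'

# On output, special characters are escaped rather than re-wrapped in CDATA.
print(ET.tostring(elem, encoding='unicode'))  # <note>a &lt; b &amp; c</note>
```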

cleanup-name-list=<replacement-map>

Specifies replacement pairs to be used when cleaning up names. Must be a string representation of a Python list of 2-tuples. The values of each pair (2-tuple) must be strings. The first item of each pair is a pattern and must be a valid Python regular expression (see https://docs.python.org/2/library/re.html#module-re). The second item of each pair is a string that will replace anything matched by the pattern. Also see Cleaning up names with special characters etc.

The intention is to enable us to replace special characters in names that would cause the generation of invalid Python names, for example the names of generated classes. However, since a string replacement is performed, you can replace any single character or sequence of characters by any other single character or sequence of characters. Example: [(':', 'colon'), ('-', 'dash'), ('.', 'dot')].

The default when omitted is [('[-:.]', '_')].

mixed-case-enums

Do not uppercase the names of simpleType enums. The default (if this option is omitted) is to make generated enum names uppercase.

create-mandatory-children

If a child is defined with minOccurs="1" and maxOccurs="1" and the child is xs:complexType and the child is not defined with xs:simpleContent, then in the element's constructor generate code that automatically creates an instance of the child. The default is False, i.e. do not automatically create the child. Note that if a value for the child's parameter is passed to the constructor (which overrides the default value of None), then the constructor will not create an instance.
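A minimal sketch of the constructor behavior described above, using hypothetical classes `addressType` and `personType`. The real generated constructors take more parameters and differ in detail.

```python
class addressType:
    """Hypothetical complex type used as a mandatory child."""

    def __init__(self, city=None):
        self.city = city

class personType:
    def __init__(self, address=None):
        # With create-mandatory-children, a missing mandatory child
        # (minOccurs="1", maxOccurs="1") is created automatically.
        if address is None:
            address = addressType()
        self.address = address

auto = personType()
explicit = personType(address=addressType(city='Berkeley'))
print(type(auto.address).__name__)   # addressType
print(explicit.address.city)         # Berkeley
```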

import-path="string"

This value will be pre-pended to the name of files to be imported by the generated module. The default value is the empty string (""). This enables the user to produce relative import statements in the generated module that restrict the import to some module in a specific package in a package directory structure containing the generated module.

q, no-questions

Do not ask questions. For example, if the "-f" command line option is omitted and the output file exists, then generateDS.py will not ask whether the file should be overwritten. (In this case, when "-q" is used, "-f" must also be used to force the output file to be written.)

no-warnings

While running generateDS.py, do not print warning messages that would be written to stderr.

session=mysession.session

Load and use options from session file. You can create a session file in generateds_gui.py, the graphical front-end for generateDS.py. Additional options on the command line can be used to override options in the session file. A session file is an XML document, so you can modify it with a text editor.

fix-type-names="oldname1:newname1;oldname2:newname2;..."

Fix up (replace) complex type names. Using this option will replace the following: (1) the 'name' attribute of a complexType; (2) the 'type' attribute of each element that refers to the type; and (3) the 'base' attribute of each extension that refers to the type. These fixups happen before information is collected from the schema for code generation. Therefore, using this option is effectively equivalent to copying your schema, then editing it with your text editor, then generating code from the modified schema. If a new name is not specified, the default is to replace the old name with the old name plus an added "xx" suffix. Examples:

```
$ generateDS.py --fix-type-names="type1:type1Aux"
$ generateDS.py --fix-type-names="type1;type2:type2Repl"
```
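The option value is a semicolon-separated list of `oldname:newname` items. The following sketch shows one way to interpret such a value, including the default "xx" suffix when no new name is given. The function `parse_fix_type_names` is hypothetical and is not the parser used inside generateDS.py.

```python
def parse_fix_type_names(option_value):
    """Turn "old1:new1;old2;..." into a dict of old name -> new name."""
    mapping = {}
    for item in option_value.split(';'):
        item = item.strip()
        if not item:
            continue            # tolerate a trailing ";"
        old, sep, new = item.partition(':')
        # No new name given: default to the old name plus "xx".
        mapping[old] = new if sep and new else old + 'xx'
    return mapping

print(parse_fix_type_names('type1;type2:type2Repl'))
# {'type1': 'type1xx', 'type2': 'type2Repl'}
```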

g rootelement:rootclass, graphql=rootelement:rootclass

Generate methods, functions, query, classes, and schema for GraphQL. Specify the root element (tag) and root class for XML instance docs.

version

Print out the current version of generateDS.py and immediately exit.

6.3 Name conflicts etc.

6.3.1 Conflicts with Python keywords

In some cases the element and attribute names in an XML document will conflict with Python keywords. There are two solutions to fixing and avoiding name conflicts:

In an attempt to avoid these clashes, generateDS.py contains a table that maps names that might clash to acceptable names. This table is a Python dictionary named NameTable. The user can modify existing entries in this table within generateDS.py itself and add additional name-replacement pairs to this table, for example, if new conflicts occur.
Or, you can fix additional conflicts by following these steps:
1. Create a module named generateds_config.py.
2. Define a dictionary in that module named NameTable.
3. Place additional name mappings in that dictionary. Here is a sample:
```
NameTable = {
    'range': 'rangeType',
    }
```
4. Place that module where generateDS.py can import it, or place the directory containing that module on your PYTHONPATH environment variable.
generateDS.py will attempt to import that module (generateds_config.py) and will add the name mappings in it to the default set of mappings in NameTable in generateDS.py itself.
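The merge step can be pictured as an optional import followed by a dictionary update. This is a sketch only: `load_name_table` and the small `DEFAULT_NAME_TABLE` are hypothetical, and the real table and loading code in generateDS.py differ.

```python
import importlib

# Hypothetical subset of a built-in table of name replacements.
DEFAULT_NAME_TABLE = {'class': 'class_', 'type': 'type_'}

def load_name_table(config_module='generateds_config'):
    """Return the default mappings, extended by any found in config_module."""
    table = dict(DEFAULT_NAME_TABLE)   # copy, so the defaults stay unchanged
    try:
        config = importlib.import_module(config_module)
    except ImportError:
        return table                   # no configuration module: defaults only
    table.update(getattr(config, 'NameTable', {}))
    return table

print(load_name_table())
```

With the sample generateds_config.py shown above on the import path, the result would also contain the entry 'range': 'rangeType'.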

6.3.2 Conflicts between child elements and attributes

In some cases the name of a child element and the name of an attribute will be the same. (XML Schema permits this, because element declarations and attribute declarations are in separate symbol spaces.) Since generateDS.py treats both child elements and attributes as members of the generated class, this is a name conflict. Therefore, where such conflicts exist, generateDS.py modifies the name of the attribute by adding "_attr" to its name.
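The renaming rule can be sketched as follows. `resolve_member_names` is a hypothetical helper for illustration, not a function in generateDS.py.

```python
def resolve_member_names(child_names, attribute_names):
    """Map each attribute name to the member name used in the class.

    Attributes whose names collide with a child element get an "_attr" suffix.
    """
    children = set(child_names)
    return {
        name: name + '_attr' if name in children else name
        for name in attribute_names
    }

print(resolve_member_names(['name', 'address'], ['id', 'name']))
# {'id': 'id', 'name': 'name_attr'}
```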

6.3.3 Cleaning up names with special characters etc.

generateDS.py attempts to clean up names that contain special characters. For example, a complexType whose name contains a dash would generate a Python class with an invalid name. But, you can use this facility to make other changes to names as well.

The command line option --cleanup-name-list specifies replacement pairs to be used when cleaning up (and modifying) names. The value of this option must be a string representation of a Python list of 2-tuples. The values of each pair (2-tuple) must be strings. The first item of each pair is a pattern and must be a valid Python regular expression (see https://docs.python.org/2/library/re.html#module-re). The second item of each pair is a string that will replace anything matched by the pattern. The intention is to enable us to replace special characters in names that would cause the generation of invalid Python names, for example the names of generated classes. However, since a string replacement is performed, you can replace any single character or sequence of characters by any other single character or sequence of characters.

For example, the following option, in addition to performing the default replacements of "-", ":", and "." by an underscore, would also replace the string "Type" when it occurs at the end of a name, by "Class":

--cleanup-name-list="[('[-:.]', '_'), ('Type$', 'Class')]"

This would cause the name "big-data-Type" to become "big_data_Class".

The default when this option is omitted is [('[-:.]', '_')].

The order of replacements performed is the same as the order of the tuples in the list. So, replacements performed by pattern replacement pairs (2-tuples) later in the list (to the right) will be performed after those earlier (to the left), and may overwrite earlier replacements.
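The following sketch applies a replacement list in order and reproduces the "big-data-Type" example above. It assumes each pair is applied to the whole name with `re.sub`; `cleanup_name` is a hypothetical helper, not the function inside generateDS.py.

```python
import re

DEFAULT_CLEANUP = [('[-:.]', '_')]

def cleanup_name(name, replacements=None):
    """Apply (pattern, replacement) pairs to name, left to right."""
    if replacements is None:
        replacements = DEFAULT_CLEANUP
    for pattern, replacement in replacements:
        name = re.sub(pattern, replacement, name)
    return name

print(cleanup_name('big-data-Type'))
# big_data_Type
print(cleanup_name('big-data-Type', [('[-:.]', '_'), ('Type$', 'Class')]))
# big_data_Class

# Order matters: a later pair sees the output of the earlier ones.
print(cleanup_name('a-b', [('-', 'dash'), ('dash', '_')]))
# a_b
```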

See the notes on the command line option --cleanup-name-list for more on this. Or, run $ generateDS.py --help.

7 The graphical user interface -- How to use it

Note: The graphical user interface is no longer supported.

Here are a few notes on how to use the GUI front-end.

generateds_gui.py is installed when you do the standard installation:
```
$ python setup.py install
```
Run it by typing the following at the command line:
```
$ generateds_gui.py
```
For help with command line options, run:
```
$ generateds_gui.py --help
```
For a description of the values and flags that you can set, see section Running generateDS.py. There are also tool tips on the various widgets in the graphical user interface.
Generate the python bindings modules by using the Tools/Generate menu item or the Generate button at the bottom of the window.
Capture the command line generated by using the Tools/Capture command line menu item. You might consider copying and pasting that command line into a shell script or batch file for repeated reuse.
You can also save and later reload your values and flags in a session file. See the Save session, Save session as, and Load session items under the File menu. By default, a session file has the extension ".session".
You can load a session on start-up with the "-s" or "--session" command line options. For example:
```
$ generateds_gui.py --session=mybindingsjob.session
```
Or, use the "session" option in a configuration file.
If the command to be run when generating bindings is not standard, you can specify that command with the "--exec-path" command line option or with the "exec-path" option in a configuration file. The default is "generateDS.py".
Command line options can also be specified in a configuration file. generateds_gui.py checks for that configuration file in the following locations in this order:
1. ~/.generateds_gui.ini
2. ./generateds_gui.ini
Here is a sample configuration file:
```
[general]
exec-path: /usr/bin/python ~/bin/generateDS.py
impl-path: generateds_gui.glade
session: a1.session
```
Options on the command line override options in configuration files.

8 Common problems

8.1 Namespace prefix mis-match

generateDS.py is not very intelligent about detecting what prefix is used in the schema file for the XML Schema namespace. When this problem occurs, you may see the following when running generateDS.py:

AttributeError: 'NoneType' object has no attribute 'annotate'

generateDS.py assumes that the XML Schema namespace prefix in your schema is "xs:".

So, if the XML Schema namespace prefix in your schema is not "xs:", you will need to use the "-a" command line option when you run generateDS.py. Here is an example:

generateDS.py -a "xsd:" --super=mylib -o mylib.py -s myapp.py someschema.xsd
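If you are unsure which prefix your schema binds to the XML Schema namespace, Python's standard `xml.etree.ElementTree.iterparse` can report namespace declarations through its `start-ns` events. The helper `find_xsd_prefix` is a hypothetical convenience for choosing the value to pass with "-a"; it is not part of generateDS.py.

```python
import io
import xml.etree.ElementTree as ET

XSD_NAMESPACE = 'http://www.w3.org/2001/XMLSchema'

def find_xsd_prefix(source):
    """Return the prefix (with a trailing ":") bound to the XML Schema namespace.

    source is a file name or a binary file object.  Returns None if the
    namespace is not declared, and "" if it is the default namespace.
    """
    for event, (prefix, uri) in ET.iterparse(source, events=('start-ns',)):
        if uri == XSD_NAMESPACE:
            return prefix + ':' if prefix else ''
    return None

schema = b'''<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="person" type="xsd:string"/>
</xsd:schema>'''
print(find_xsd_prefix(io.BytesIO(schema)))   # xsd:
```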

8.2 Using multiple subclass modules with the same superclass module

Suppose that from a single XML schema, you have generated a superclass module and a subclass module (using the "-o" and "-s" command line options). Now you make a copy of the subclass module. Next you add special and different code to each of the subclass modules. You can run these two subclass modules separately and (after a bit of debugging) each works fine. And, you can import each subclass module in separate applications, and things are still good. However, if you import both subclass modules into a single application, you find that one of them is "ignored" by the superclass module when it parses XML instance documents and builds instances of the generated classes. Effectively, each subclass module, when it is imported, sets a class variable (subclass) in each superclass to the subclass to be used by that superclass, so the subclass module imported last wins.
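The following simplified model (plain Python with hypothetical class names, not the code generateDS.py actually generates) shows why the last subclass module imported wins: each subclass module assigns the same class variable on the shared superclass, and the superclass's factory builds whatever that variable currently names.

```python
class personType:
    """Stand-in for a class in the shared superclass module."""
    subclass = None

    @classmethod
    def factory(cls):
        # Build the registered subclass if there is one, else the class itself.
        return (cls.subclass or cls)()

# What importing subclass module A does:
class personTypeA(personType):
    pass
personType.subclass = personTypeA

# What importing subclass module B does afterwards:
class personTypeB(personType):
    pass
personType.subclass = personTypeB

print(type(personType.factory()).__name__)   # personTypeB: module A is ignored
```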

There are two alternative solutions to this problem:

Use the script/function provided by the distribution in file fix_subclass_refs.py. The doc string in that module explains how to use it and gives an example of its use.
Each generated superclass module (starting with generateDS.py version 2-19a) contains a global variable CurrentSubclassModule_. The value of this variable, if it is not None, overrides the value of the class variable subclass in each generated superclass. You can change the value of this variable before parsing an XML document and building instances of the generated classes to determine which subclass module is to be used during the "build" phase.

Here is an example of the use of this feature:
```
#!/usr/bin/env python

import lib01suba
import lib01subb

def test():
    lib01suba.supermod.CurrentSubclassModule_ = lib01suba
    roota = lib01suba.parse('test01.xml', silence=True)
    lib01subb.supermod.CurrentSubclassModule_ = lib01subb
    rootb = lib01subb.parse('test01.xml', silence=True)
    roota.show()
    print('-' * 50)
    rootb.show()

test()
```

The second alternative (above) is likely to be a more convenient solution in most cases. But, there are possibly use cases where the use of fix_subclass_refs.py or a modified version of it will be helpful.

9 Supported features of XML Schema

The following constructs, among others, in XML Schema are supported:

Attributes of types xs:string, xs:integer, xs:float, and xs:boolean.
Repeated sub-elements specified with maxOccurs="unbounded".
Sub-elements of simple types xs:string, xs:integer, and xs:float.
Sub-elements of complex types defined separately in the XML Schema document.

See file people.xsd for examples of the definition of data types and structures. Also see the section on The XML Schema Input to generateDS.

9.1 Attributes + no nested children

Element definitions that contain attributes but no nested child elements provide access to their data content through getter and setter methods getValueOf_ and setValueOf_ and member variable valueOf_.

9.2 Mixed content

Elements that are defined to contain both text and nested child elements have "mixed content". generateDS.py provides access to mixed content, but the generated data structures (classes) are fundamentally different from those generated for other elements. See section Mixed content for more details.

Note that elements defined with attributes but with no nested sub-elements do not need to be declared as "mixed". For these elements, character data is captured in a member variable valueOf_, and can be accessed with member methods getValueOf_ and setValueOf_.

9.3 anyAttribute

generateDS.py supports anyAttribute. For example, if an element is defined as follows:

```
<xs:element name="Tool">
   <xs:complexType>
      <xs:attribute name="PartNumber" type="xs:string" />
      <xs:anyAttribute processContents="skip" />
   </xs:complexType>
</xs:element>
```

Then generateDS.py will generate a class with a member variable anyAttributes_ containing a dictionary. Any attributes found in the instance XML document that are not explicitly defined for this element will be stored in this dictionary. generateDS.py also generates getters and setters as well as code for parsing and export. generateDS.py ignores processContents. See section anyAttribute for more details.
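The behavior described above can be modeled with the standard library: attributes declared in the schema become members, and everything else is collected in a dictionary. This sketch uses `xml.etree.ElementTree` and a hypothetical `Tool` class. The generated class also stores the extras in `anyAttributes_`, but its build code differs.

```python
import xml.etree.ElementTree as ET

class Tool:
    """Hypothetical model of a class generated for the "Tool" element."""

    declared_attributes = ('PartNumber',)

    def __init__(self):
        self.PartNumber = None
        self.anyAttributes_ = {}

    def build(self, node):
        for name, value in node.attrib.items():
            if name in self.declared_attributes:
                setattr(self, name, value)
            else:
                # Not declared in the schema: keep it, as xs:anyAttribute allows.
                self.anyAttributes_[name] = value
        return self

tool = Tool().build(ET.fromstring('<Tool PartNumber="A-17" color="red" size="9"/>'))
print(tool.PartNumber)      # A-17
print(tool.anyAttributes_)  # {'color': 'red', 'size': '9'}
```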

9.4 Element extensions

generateDS.py now generates subclasses for extensions, that is, when an element definition contains something like this:

<xs:extension base="sometag">

Limitation -- There is an important limitation, however: member names duplicated (overridden ?) in an extension generate erroneous code. Sigh. I guess I needed something more to do.

Several of the generated methods have been refactored so that subclasses can reuse the code in their superclasses. Take a look at the generated code to learn how to use it.

The Python compiler/interpreter requires that it has seen a superclass before it sees the subclass that uses it. Because of this, generateDS.py delays generating a subclass until after its superclass has been generated. Therefore, the order in which classes are generated may be different from what you expect.
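The ordering requirement amounts to emitting each base class before any class that extends it. Here is a minimal sketch of such an ordering, assuming a hypothetical mapping from each type name to its base (or None). generateDS.py's own implementation may differ.

```python
def generation_order(bases):
    """Return type names ordered so that every base precedes its extensions.

    bases maps type name -> base type name (None for types with no base
    defined in this schema).  Cycles are not expected and not handled.
    """
    order, seen = [], set()

    def visit(name):
        if name in seen:
            return
        seen.add(name)
        base = bases.get(name)
        if base is not None:
            visit(base)        # make sure the superclass comes first
        order.append(name)

    for name in bases:
        visit(name)
    return order

print(generation_order({'puppy': 'dog', 'dog': 'animal', 'animal': None}))
# ['animal', 'dog', 'puppy']
```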

9.5 Attribute groups

generateDS.py now handles definition and use of attribute groups. For example, consider an attribute group definition like the following:

```
<xs:attributeGroup name="favorites">
    <xs:attribute name="fruit" />
    <xs:attribute name="vegetable" />
</xs:attributeGroup>
```

And, a reference or use like the following:

```
<xs:element name="person" type="personType"/>
<xs:complexType name="personType" mixed="0">
    <xs:attributeGroup ref="favorites" />
</xs:complexType>
```

Results in generation of class personType that contains members fruit and vegetable.

Multiple levels of attributeGroups are supported, that is, attribute groups themselves can contain references to other attribute groups.
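Resolving nested attribute groups is a recursive expansion. The following sketch uses `xml.etree.ElementTree` on a small schema fragment. The function `attribute_names` and the extra "colors" group are hypothetical, and the sketch assumes `ref` values are unprefixed names.

```python
import xml.etree.ElementTree as ET

XS = '{http://www.w3.org/2001/XMLSchema}'

SCHEMA = '''<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:attributeGroup name="colors">
    <xs:attribute name="hair"/>
  </xs:attributeGroup>
  <xs:attributeGroup name="favorites">
    <xs:attribute name="fruit"/>
    <xs:attribute name="vegetable"/>
    <xs:attributeGroup ref="colors"/>
  </xs:attributeGroup>
  <xs:complexType name="personType">
    <xs:attributeGroup ref="favorites"/>
  </xs:complexType>
</xs:schema>'''

def attribute_names(node, groups):
    """Collect attribute names under node, expanding attributeGroup refs."""
    names = []
    for child in node:
        if child.tag == XS + 'attribute':
            names.append(child.get('name'))
        elif child.tag == XS + 'attributeGroup' and child.get('ref'):
            # Recurse: a group may itself refer to other groups.
            names.extend(attribute_names(groups[child.get('ref')], groups))
    return names

root = ET.fromstring(SCHEMA)
groups = {g.get('name'): g for g in root.findall(XS + 'attributeGroup')}
person = next(ct for ct in root.findall(XS + 'complexType')
              if ct.get('name') == 'personType')
print(attribute_names(person, groups))   # ['fruit', 'vegetable', 'hair']
```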

9.6 Substitution groups

generateDS.py now handles a limited range of substitution groups. There is an important limitation: generateDS.py handles substitution groups that involve complex types, but does not handle those that involve (substitute for) simple types (for example, xs:string, xs:integer, etc). This is because the code generated for members defined as simple types does not provide the needed information to handle substitution groups.

9.7 Primitive types

generateDS.py supports some, but not all, simple types defined in "XML Schema Part 0: Primer Second Edition" (http://www.w3.org/TR/xmlschema-0/; see section "Simple Types" and appendix B). Validation is performed for some simple types. When performed, validation is done while the XML document is being read and instances are created.

Here is a list of supported simple types:

xs:string -- No validation.
xs:token -- No validation. White space between tokens is coerced to a single blank between tokens.
xs:integer, xs:short, xs:long, xs:int -- All treated the same. Checked for valid integer.
xs:float, xs:double, xs:decimal -- All treated the same. Checked for valid float.
xs:positiveInteger -- Checked for valid range (> 0).
xs:nonPositiveInteger -- Checked for valid range (<= 0).
xs:negativeInteger -- Checked for valid range (< 0).
xs:nonNegativeInteger -- Checked for valid range (>= 0).
xs:date, xs:dateTime -- All treated the same. No validation.
xs:boolean -- Checked for one of 0, false, 1, true.
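Here is a rough sketch of a few of the checks listed above, in plain Python. The function names are hypothetical and the generated code uses its own methods. The sketch only mirrors the rules as described in this list.

```python
def check_boolean(text):
    # xs:boolean: one of 0, false, 1, true
    if text not in ('0', 'false', '1', 'true'):
        raise ValueError('invalid boolean: %r' % text)
    return text in ('1', 'true')

def check_non_negative_integer(text):
    # xs:nonNegativeInteger: a valid integer that is >= 0
    value = int(text)
    if value < 0:
        raise ValueError('must be >= 0: %r' % text)
    return value

def normalize_token(text):
    # xs:token: runs of white space collapse to a single blank
    return ' '.join(text.split())

print(check_boolean('true'), check_non_negative_integer('42'))   # True 42
print(repr(normalize_token('  red   green\tblue ')))            # 'red green blue'
```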

9.8 simpleType

generateDS.py generates minimal support for members defined as simpleType. However, the code generated by generateDS.py does not enforce restrictions. For notes on how to enforce restrictions, see section simpleType and validators.

A simpleType can be a restriction on a primitive type or on a defined element type. So, for example, the following will generate valid code:

```
<xs:element name="percent">
    <xs:simpleType>
        <xs:restriction base="xs:integer">
            <xs:minInclusive value="1"/>
            <xs:maxInclusive value="100"/>
        </xs:restriction>
    </xs:simpleType>
</xs:element>
```

And, the following will also generate valid code:

```
<xs:simpleType name="emptyString">
    <xs:restriction base="xs:string">
        <xs:whiteSpace value="collapse"/>
    </xs:restriction>
</xs:simpleType>

<xs:element name="merge">
    <xs:complexType>
        <xs:simpleContent>
            <xs:extension base="emptyString">
                <xs:attribute name="fromTag" type="xs:string"/>
                <xs:attribute name="toTag" type="xs:string"/>
            </xs:extension>
        </xs:simpleContent>
    </xs:complexType>
</xs:element>
```

9.9 List values, optional values, maxOccurs, etc.

For elements defined with maxOccurs="unbounded", generateDS.py generates code that processes a list of elements.

For elements defined with minOccurs="0" and maxOccurs="1", generateDS.py generates code that exports an element only if that element has a (non-None) value.
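A minimal sketch of these two export rules, assuming a hypothetical class with a repeated `item` child (maxOccurs="unbounded") and an optional `note` child (minOccurs="0"). It writes to any file-like object. The generated export methods are more elaborate (indentation, namespaces, and so on).

```python
import io
from xml.sax.saxutils import escape

class orderType:
    """Hypothetical class with one list member and one optional member."""

    def __init__(self, item=None, note=None):
        self.item = item if item is not None else []   # maxOccurs="unbounded"
        self.note = note                               # minOccurs="0"

    def export(self, outfile):
        outfile.write('<order>')
        for value in self.item:                        # one element per entry
            outfile.write('<item>%s</item>' % escape(value))
        if self.note is not None:                      # omitted when None
            outfile.write('<note>%s</note>' % escape(self.note))
        outfile.write('</order>')

out = io.StringIO()
orderType(item=['apple', 'pear']).export(out)
print(out.getvalue())   # <order><item>apple</item><item>pear</item></order>
```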

9.10 simpleType and validators

9.10.1 Generating validator bodies from XML schema

If you do not use the --use-old-simpletype-validators command line option, then generateDS.py will generate validation code directly from the restrictions specified inside the simpleType definitions in your XML schema.

Here is a bit of explanation of what that generated code will do.

The generated validation code checks the global variable Validate_simpletypes_. Set that variable to False to turn off validation.
In the case of some XML schema built-in simple types, the generated validation code calls gds_validate_xxx, where "xxx" is a base, simple type. In some cases, you will be able to add additional code to that method to perform custom checking. See section Overridable methods -- generatedssuper.py for information on how to use and override that class.
When validation finds data that fails to validate, it generates a warning (using the warnings module from the Python standard library), not an exception, so that processing continues.
The validation code is generated in a separate method named validate_xxx, where "xxx" is the name of the data element. This method is called in the build method as the input data is parsed and instances of the generated classes are created to hold it. Your own code can also call this method whenever you'd like to validate the data in that element/field.
There are rules for how checking should be performed (1) when there are multiple restrictions in a single simpleType and (2) when there are restrictions both in a simpleType and in its base simple types. generateDS.py attempts to follow those rules in generating validation code. For information about that, see: http://www.w3.org/TR/xmlschema-2/#rf-facets. Pattern facets are especially tricky, because pattern restrictions at the same level are OR-ed together, while pattern restrictions at different levels are AND-ed together. See: http://www.w3.org/TR/xmlschema-2/#rf-pattern.
The validation method also performs type conversion for some simple types, for example, string to int for integers, string to float for floats, etc.
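The behavior described in this list can be sketched in plain Python. Everything here is hypothetical and simplified: `validate_percent` corresponds to the "percent" example in section simpleType, `patterns_match` shows the OR-within-a-level, AND-across-levels rule for pattern facets, and the module-level `Validate_simpletypes_` only mimics the switch in generated modules. Note also that Python's `re` syntax is not identical to the XML Schema regular expression language, so real pattern facets may need translation.

```python
import re
import warnings

Validate_simpletypes_ = True   # mimics the switch in generated modules

def patterns_match(value, pattern_levels):
    """Check pattern facets: OR within a level, AND across levels.

    pattern_levels holds one list of patterns per derivation level.
    XML Schema patterns match the whole value, hence re.fullmatch.
    """
    return all(
        any(re.fullmatch(pattern, value) for pattern in level)
        for level in pattern_levels
    )

def validate_percent(value):
    """Hypothetical validator for "percent": an xs:integer between 1 and 100."""
    if not Validate_simpletypes_:
        return value                      # validation switched off
    try:
        value = int(value)                # type conversion, string to int
    except ValueError:
        warnings.warn('Value "%s" is not a valid integer' % (value,))
        return value
    if not 1 <= value <= 100:
        # Warn rather than raise, so that processing can continue.
        warnings.warn('Value %d is outside the range [1, 100] for percent' % value)
    return value
```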

9.10.2 User written validator bodies

This is the older, more manual method. In order to generate code that uses this method, use command line option --use-old-simpletype-validators.

Here are a few notes that should help you write your own validator methods to enforce restrictions.

Default behavior -- The generated code, by default, treats the value of a member whose type is a simpleType as if it were declared as type xs:string.
Validator method stubs -- For a member variable name declared as a simpleType named X, a validator method validate_X is generated. Example -- from:
```
<xs:simpleType name="tAnyName">
    <xs:restriction base="xs:string"/>
</xs:simpleType>
```
The class generated by generateDS.py will contain the following method definition:
```
def validate_tAnyName(self, value):
    # Validate type tAnyName, a restriction on xs:string.
    pass
```

Calls to validator methods -- For a member variable declared as a simpleType X, a call to validate_X is added to the build method. Example -- from:

```
<xs:element name="person">
    <xs:complexType mixed="0">
        <xs:sequence>
            <xs:element name="test2" type="tAnyName"/>
        </xs:sequence>
    </xs:complexType>
</xs:element>
```

generateDS.py produces the following call:

self.validate_tAnyName(self.test2)    # validate type tAnyName

Code bodies for validator methods can be added either (1) manually or (2) automatically from an external source. See command line option "--validator-bodies" and see below.

You can add code to the validator method stub to enforce the restriction for the base type and further restrictions imposed on that base type. This can be done in the following ways:

Add code manually after generation. I recommend that you use the "-s" command line option and override the validator method in the resulting subclass file.
Or, supply code bodies (implementations) in an external source and ask generateDS.py to insert those code bodies into generated validator methods. Here are notes on how to do this:
- Use the "--validator-bodies=path" command line option to specify a directory.
- In that directory, provide one file for each simpleType. The name of the file should be the same as the name of the simpleType with an optional extension ".py". generateDS.py looks for a file named type_name.py, first, and if not found, looks for a file named type_name.
- If the "--validator-bodies" is not on the command line or neither type_name.py nor type_name is found, an empty body (a pass statement) is generated.
- Lines from the file are inserted as is, except that lines containing "##" in the first two columns are omitted. Note that you will need to provide the correct indentation for a method in a class, specifically 8 spaces.
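The lookup and filtering rules above can be sketched like this. `load_validator_body` is a hypothetical helper written for illustration, not the code inside generateDS.py, and the body file contents are an invented example.

```python
import os
import tempfile

def load_validator_body(directory, type_name):
    """Return the lines to insert into the validator for type_name.

    Looks for <type_name>.py first, then <type_name>.  Lines that begin
    with "##" are omitted.  Without a directory or a matching file, the
    body is a single "pass" statement (indented 8 spaces).
    """
    if directory:
        for file_name in (type_name + '.py', type_name):
            path = os.path.join(directory, file_name)
            if os.path.isfile(path):
                with open(path) as infile:
                    return [line for line in infile
                            if not line.startswith('##')]
    return ['        pass\n']

# Usage: a body file for tAnyName in a temporary "validator bodies" directory.
with tempfile.TemporaryDirectory() as body_dir:
    with open(os.path.join(body_dir, 'tAnyName.py'), 'w') as outfile:
        outfile.write('## note for the author; not copied\n')
        outfile.write('        if not value:\n')
        outfile.write("            raise ValueError('empty tAnyName')\n")
    body_lines = load_validator_body(body_dir, 'tAnyName')

print(''.join(body_lines), end='')
```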

The support for simpleType in generateDS.py has the following limitations (among others, I'm sure):

It only works for simpleType defined with and referenced through a name. It does not work for "in-line" definitions. So, for example, the following works:

```
<xs:element name="person">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="test3" type="tAnyName"/>
        </xs:sequence>
    </xs:complexType>
</xs:element>

<xs:simpleType name="tAnyName">
    <xs:restriction base="xs:string"/>
</xs:simpleType>
```

But, the following does not work:

```
<xs:element name="person">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="test3">
                <xs:simpleType name="tAnyName">
                    <xs:restriction base="xs:string"/>
                </xs:simpleType>
            </xs:element>
        </xs:sequence>
    </xs:complexType>
</xs:element>
```

Attributes defined as a simple type are not supported.

9.10.3 Turning off validation of `simpleType` data

If you do not want validation performed on simpleType data, you have these options:

When generating your code, use the --use-old-simpletype-validators command line option but do not use the --validator-bodies command line option. This will result in validator methods that have empty bodies (only a pass statement).
Or, when you run your generated code, set the variable Validate_simpletypes_ to False. This global variable is near the top of your generated module. It can be set to True or False before and during processing to turn validation on and off.

9.10.4 Additional notes on `simpleType` validation

Don't forget that xmllint can also be used to perform validation against the XML schema. This validation includes checking against simpleType restrictions. See http://xmlsoft.org/ for more information on xmllint.

9.11 Include file processing

By default, generateDS.py will insert content from files referenced by include elements into the XML Schema to be processed. This behavior can be turned off by using the "--no-process-includes" command line option.

include elements are processed and the referenced content is inserted in the XML Schema by importing and using process_includes.py, which is included in the generateDS.py distribution.

The include file processing is capable of retrieving included files via the FTP and HTTP protocols as well as from the local file system.

9.12 Abstract types

generateDS.py has support for abstract types. For more on this, see: XML Schema Part 0: Primer Second Edition: Abstract Elements and Types -- http://www.w3.org/TR/xmlschema-0/#abstract.

9.13 Types derived by extension

This section describes some of the support for types derived by extension and also how to use the data bindings generated for those types in Python.

For example, suppose you have an XML schema that looks like this (example.xsd):

```
<?xml version="1.0"?>
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" version="1.0">

<xs:element name="animalCollection">
  <xs:complexType>
    <xs:sequence>
      <xs:element name="animal" type="animal" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>
</xs:element>

<xs:complexType name="animal" abstract="true"></xs:complexType>

<xs:complexType name="dog">
  <xs:complexContent>
    <xs:extension base="animal">
      <xs:sequence>
        <xs:element name="name" type="xs:string"/>
      </xs:sequence>
    </xs:extension>
  </xs:complexContent>
</xs:complexType>
</xs:schema>
```

An XML instance document for this document type might be the following:

```
<?xml version="1.0"?>
<animalCollection xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <animal xsi:type="dog">
        <name>fido</name>
    </animal>
</animalCollection>
```
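When a document like this is read, the derived type is identified by the `xsi:type` attribute, which lives in the XML Schema instance namespace, while the element tag stays "animal". This stdlib sketch (not generateDS.py code) reads it with `xml.etree.ElementTree`:

```python
import xml.etree.ElementTree as ET

XSI_TYPE = '{http://www.w3.org/2001/XMLSchema-instance}type'

DOC = '''<?xml version="1.0"?>
<animalCollection xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <animal xsi:type="dog">
        <name>fido</name>
    </animal>
</animalCollection>'''

root = ET.fromstring(DOC)
for animal in root.findall('animal'):
    # The element tag stays "animal"; xsi:type names the derived type.
    print(animal.tag, animal.get(XSI_TYPE), animal.findtext('name'))
# animal dog fido
```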

Question: How would you, in Python, using bindings generated by generateDS.py, create an instance of type dog (which is derived from type animal) that, when exported to XML, appears as an animal element with attribute xsi:type="dog"?

First, we need to generate our bindings:

$ generateDS.py -o example01.py example.xsd

And now, here is some Python code that creates those instances and exports them:

```
# sample01.py

import sys
import example01

def test():
    animal_collection = example01.animalCollection()
    animal = example01.dog(name='milicent')
    #
    # must set original_tagname_ and extensiontype_ for
    # type derived by extension.  See:
    # https://www.w3.org/TR/2004/REC-xmlschema-0-20041028/#DerivExt
    animal.original_tagname_ = 'animal'
    animal.extensiontype_ = 'dog'
    animal_collection.add_animal(animal)
    animal_collection.export(sys.stdout, 0)
    return animal_collection, animal

test()
```

Notes:

The above code creates an instance of class animalCollection and an instance of class dog.
Because we want the dog to be represented in XML as an "<animal>" element with an "xsi:type" attribute, we must set the original_tagname_ and extensiontype_ attributes in the instance of class dog.
Then we add our dog to the animalCollection, and finally, we export it.
We can get some clues about this by reading the code generated for classes animalCollection, animal, and dog.

When we run it, we'll see:

$ python sample01.py
<animalCollection>
    <animal xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="dog">
        <name>milicent</name>
    </animal>
</animalCollection>

For more information on types derived by extension, see "XML Schema Part 0: Primer Second Edition", specifically:

"Deriving Types by Extension" -- https://www.w3.org/TR/2004/REC-xmlschema-0-20041028/#DerivExt
"Using Derived Types in Instance Documents" -- https://www.w3.org/TR/2004/REC-xmlschema-0-20041028/#UseDerivInInstDocs
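Going in the other direction, when a document is parsed, the xsi:type attribute is what tells a reader which derived type an "<animal>" element actually holds. The following is a rough illustration of that idea using Python's standard xml.etree.ElementTree. It is not the code that generateDS.py generates; the classes animal and dog here are hand-written stand-ins, and TYPE_REGISTRY and build_animal are hypothetical names.

```python
import xml.etree.ElementTree as ET

# Clark-notation name of the xsi:type attribute, as ElementTree reports it.
XSI_TYPE = '{http://www.w3.org/2001/XMLSchema-instance}type'


class animal:
    """Hypothetical stand-in for the abstract base type."""


class dog(animal):
    """Hypothetical stand-in for the type derived by extension."""

    def __init__(self, name=None):
        self.name = name


# Hypothetical registry mapping xsi:type values to Python classes.
TYPE_REGISTRY = {'dog': dog}


def build_animal(node):
    """Create an object whose class is chosen by the node's xsi:type attribute."""
    cls = TYPE_REGISTRY.get(node.get(XSI_TYPE), animal)
    obj = cls()
    if isinstance(obj, dog):
        obj.name = node.findtext('name')
    return obj


DOC = '''<animalCollection xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
    <animal xsi:type="dog">
        <name>fido</name>
    </animal>
</animalCollection>'''

animals = [build_animal(node) for node in ET.fromstring(DOC).findall('animal')]
```

Here animals[0] is a dog whose name is "fido". Note that in real schemas with a target namespace, the xsi:type value is often a prefixed QName such as "ns:dog", which this sketch does not resolve.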

9.14 Duplicate unqualified type names

generateDS.py can handle schemas that define the same unqualified name in separate name spaces, although the solution is not, perhaps, ideal. This is done by renaming the duplicate name. Doing so is necessary because all definitions must be generated in a single module, and we cannot, for example, generate multiple classes with the same name.

A dictionary RenameMappings_ in the generated module contains the mapping of original qualified name to new name. Here is an example:

RenameMappings_ = {
    "{http://www.example.com/IPO_US}Address": "Address2",
    "{http://www.example.com/IPO_US}BaseAddress": "BaseAddress1",
    "{http://www.example.com/IPO_US}Postcode": "Postcode4",
    "{http://www.example.com/IPO_US}ProvinceState": "ProvinceState3",
}

If you need to look up names in the reverse direction, you can try (under Python 3) using something like the following:

[ins] In [11]: {k: v for (v, k) in my_module.RenameMappings_.items()}
Out[11]:
{'Address2': '{http://www.example.com/IPO_US}Address',
 'BaseAddress1': '{http://www.example.com/IPO_US}BaseAddress',
 'Postcode4': '{http://www.example.com/IPO_US}Postcode',
 'ProvinceState3': '{http://www.example.com/IPO_US}ProvinceState'}
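Building on that reverse dictionary, the following sketch recovers the original name space URI and unqualified name for a renamed class. The keys of RenameMappings_ use the "{namespace}localname" form shown above; the helper names split_qname and original_name are hypothetical, and the dictionary is copied from the example rather than imported from a generated module.

```python
RenameMappings_ = {
    "{http://www.example.com/IPO_US}Address": "Address2",
    "{http://www.example.com/IPO_US}BaseAddress": "BaseAddress1",
    "{http://www.example.com/IPO_US}Postcode": "Postcode4",
    "{http://www.example.com/IPO_US}ProvinceState": "ProvinceState3",
}

# Reverse mapping: generated (renamed) class name -> original qualified name.
ReverseRenameMappings = {new: orig for orig, new in RenameMappings_.items()}


def split_qname(qname):
    """Split a '{namespace}localname' string into (namespace, localname)."""
    if qname.startswith('{'):
        namespace, _, local = qname[1:].partition('}')
        return namespace, local
    return '', qname


def original_name(class_name):
    """Return (namespace, localname) for a renamed class, or None if it was not renamed."""
    qname = ReverseRenameMappings.get(class_name)
    return None if qname is None else split_qname(qname)
```

For example, original_name('Postcode4') gives ('http://www.example.com/IPO_US', 'Postcode'). With your own module, build the reverse dictionary from my_module.RenameMappings_ instead of the copy above.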

9.15 Mapping name spaces to defined types

Near the bottom of modules generated by generateDS.py (using "-o") is a generated global variable NamespaceToDefMappings_. This variable contains a mapping (a Python dictionary) from name space URIs to a list of the xs:complexType and xs:simpleType types defined under that name space. Actually, for each type there is a 3-tuple containing (1) the type name (for a xs:complexType this corresponds to the name of the Python class generated for that type), (2) the name of the XML schema file in which the type is defined, and (3) "CT" for a xs:complexType or "ST" for a xs:simpleType.
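Given the structure described above (a dictionary from name space URI to a list of 3-tuples), a few lines of Python are enough to query it. The sample data below is hypothetical and only shaped like the real variable; in practice you would pass my_module.NamespaceToDefMappings_ from your generated module. The helper names are also hypothetical.

```python
# Hypothetical sample data; a generated module defines its own NamespaceToDefMappings_.
NamespaceToDefMappings_ = {
    "http://www.example.com/IPO": [
        ("PurchaseOrderType", "ipo.xsd", "CT"),
        ("Address", "address.xsd", "CT"),
        ("SKU", "ipo.xsd", "ST"),
    ],
}


def defs_by_kind(mappings, namespace, kind):
    """Return the type names in `namespace` whose kind is "CT" or "ST"."""
    return [name for (name, _schema_file, def_kind) in mappings.get(namespace, [])
            if def_kind == kind]


def schema_files(mappings, namespace):
    """Return the sorted schema file names that define types in `namespace`."""
    return sorted({schema_file for (_name, schema_file, _kind)
                   in mappings.get(namespace, [])})
```

Because the first item of each "CT" tuple corresponds to the name of the generated Python class, defs_by_kind(..., "CT") gives you the class names defined for a name space.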

10 The XML schema input to generateDS

Note: Quite a bit of work has been done on generateDS.py since this section was written. So, it accepts and processes more features of XML Schema than it did earlier. The best advice is to give it a try on your schema. If it works, great. If it does not, post a message to the list: generateds-discuss -- https://sourceforge.net/p/generateds/mailman/generateds-discuss/.

generateDS.py actually accepts a subset of XML Schema. The sample XML Schema file should give you a picture of how to describe an XML file and the Python classes that you will generate. And here are some notes that should help:

Specify the tag in the XML file and the name of the generated Python class in the name attribute on the xs:element. For example, to generate a Python class named "person", which will be populated from an XML element/tag "person", use the following XML Schema snippet:
```
<xs:element name="person" ...
```
To specify a data member for a generated Python class that will be populated from an attribute in an element in an XML file, use the XML Schema xs:attribute. For attributes, generateDS recognizes the following types: "xs:string", "xs:integer", and "xs:float". For example, the following adds member data items "hobby" and "category" with types "xs:string" and "xs:integer":
```
<xs:element name="person">
    <xs:complexType>
        <xs:attribute name="hobby" type="xs:string" />
        <xs:attribute name="category" type="xs:integer" />
    </xs:complexType>
</xs:element>
```
To specify a data member for a generated Python class whose value is a string, integer, or float and which will be populated from a nested (simple) element, specify a nested XML Schema element whose type is "xs:string", "xs:integer", or "xs:float". Here is an example which defines a Python class "person" with a data member "description" which is a string and which is populated from a (simple) nested element:
```
<xs:element name="person">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="description" type="xs:string" />
        </xs:sequence>
    </xs:complexType>
</xs:element>
```
To specify a data member of a generated Python class that will be populated from a nested XML element, refer to the nested object in the "type" attribute and then define another element/type whose name is that type. For example, the following specifies that the person class will have a data member named "transportation" that will be populated from a nested XML element "transportation" and whose value will be an instance of the generated class "bicycle":
```
<xs:element name="person">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="transportation" type="bicycle" />
        </xs:sequence>
    </xs:complexType>
</xs:element>

<xs:element name="bicycle">
    o
    o
    o
</xs:element>
```

To specify a data member of a generated Python class that will contain a list of instances of a generated class and be populated from nested XML elements, add the "maxOccurs" attribute with value "unbounded". Here is an example:

<xs:element name="person">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="transportation" type="bicycle" maxOccurs="unbounded" />
            <xs:element name="description" type="xs:string" maxOccurs="unbounded" />
        </xs:sequence>
    </xs:complexType>
</xs:element>

<xs:element name="bicycle">
    o
    o
    o
</xs:element>

Here are a few additional rules that will help you to write XML Schema files for generateDS.py:

The first (top most) class definition (i.e. the first "xs:element" in the .xsd file) is assumed to be the root element in XML input files. Possibly XML Schema has another way to specify the root, but I was not able to find it in the spec. To specify the root element, see command line option "--root-element" in section Running generateDS.py.
The "name" attribute of the "xs:element" must match the tag in the XML file from which instances of this object will be populated. You can change the names of the generated classes by using the "-p<prefix>" option, which prepends a prefix to each class name.
The "type" attribute of the "xs:element" should match the "name" attribute of a (separately defined) type (i.e. an xs:element) in order to define a member data item that takes an instance or list of instances of a Python class.
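As a rough illustration of these rules (not the analysis that generateDS.py actually performs), the following sketch uses Python's standard xml.etree.ElementTree to treat the first top-level xs:element as the root and to report each member of that element with its type and whether maxOccurs="unbounded" would make it a list. The sample schema is a self-contained version of the person/bicycle snippets above, and the function name describe is hypothetical.

```python
import xml.etree.ElementTree as ET

XS = '{http://www.w3.org/2001/XMLSchema}'

SCHEMA = '''<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="person">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="transportation" type="bicycle" maxOccurs="unbounded"/>
        <xs:element name="description" type="xs:string"/>
      </xs:sequence>
      <xs:attribute name="hobby" type="xs:string"/>
    </xs:complexType>
  </xs:element>
  <xs:element name="bicycle"/>
</xs:schema>'''


def describe(schema_text):
    """Return (root name, {member: (type, is_list)}) for the first top-level element."""
    schema = ET.fromstring(schema_text)
    first = schema.find(XS + 'element')      # first top-level xs:element = root
    members = {}
    # Simplification: every nested xs:element and xs:attribute is treated as a member.
    for child in first.iter(XS + 'element'):
        if child is first:                   # iter() includes the element itself
            continue
        is_list = child.get('maxOccurs') == 'unbounded'
        members[child.get('name')] = (child.get('type'), is_list)
    for attr in first.iter(XS + 'attribute'):
        members[attr.get('name')] = (attr.get('type'), False)
    return first.get('name'), members
```

For this schema, describe(SCHEMA) reports "person" as the root, "transportation" as a list of "bicycle", and "description" and "hobby" as single "xs:string" values.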

10.1 Additional constructions

Here are a few additional constructions that generateDS.py understands.

10.1.1 <complexType> at top-level

You can use the <complexType> element at top level (instead of <element>) to define an element. So, for example, instead of:

<xs:element name="server-type">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="server-name" type="xs:string"/>
            <xs:element name="server-description" type="xs:string"/>
        </xs:sequence>
    </xs:complexType>
</xs:element>

you can use the following, which is equivalent:

<xs:complexType name="server-type">
    <xs:sequence>
        <xs:element name="server-name" type="xs:string"/>
        <xs:element name="server-description" type="xs:string"/>
    </xs:sequence>
</xs:complexType>

10.1.2 Use of "ref" instead of "name" and "type" attributes

You can use the "ref" attribute to refer to another element definition, instead of using the "name" and "type" attributes. So, for example, you can use the following:

<xs:element name="server-info">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="server-comment" type="xs:string"/>
            <xs:element ref="server-type" />
        </xs:sequence>
    </xs:complexType>
</xs:element>

in place of this:

<xs:element name="server-info">
    <xs:complexType>
        <xs:sequence>
            <xs:element name="server-comment" type="xs:string"/>
            <xs:element name="server-type" type="server-type"/>
        </xs:sequence>
    </xs:complexType>
</xs:element>

10.1.3 Extension types

generateDS.py generates a subclass for each element that is defined as the extension of a base element. So, for the following:

<xs:complexType name="BType">
    <xs:complexContent>
        <xs:extension base="AType">
            <xs:sequence>
                o
                o
                o

generateDS.py will generate something like the following:

class BType(AType):
    o
    o
    o

10.1.4 Elements containing mixed content

generateDS.py generates special code to handle elements defined as containing mixed content, that is elements defined with attribute mixed="true". See section Mixed content for more details.

11 XMLBehaviors

With the use of the "-b" command line option, generateDS.py will also accept as input an XML document instance that describes behaviors to be added to subclasses when the subclass file is generated with the "-s" command line option.

An example is provided in the Demos/Xmlbehavior sub-directory of the distribution.

The XMLBehaviors capability in generateDS.py was inspired and, for the most part, designed by gian paolo ciceri (gp.ciceri@suddenthinks.com). This work is part of our work on our application development project for Quixote.

11.1 The XMLBehaviors input file

This section describes the XMLBehavior XML document that is used as input to generateDS.py. The XMLBehavior XML document is an XML instance document (given as an argument to the "-b" command line flag) that describes behaviors (methods) to be added to class definitions in the subclass file (generated with the "-s" command line flag).

See file xmlbehavior_po.xml in the Demos/Xmlbehavior directory in the distribution for an example that you can use as a model.

The elements in the XMLBehavior document type are the following:

<xb:xml-behavior> -- The base element in the document.
- <xb:base-impl-url> -- The root (left-most portion) of the URL containing implementation bodies. Implementation URLs are appended to this base URL.
- <xb:behaviors> -- A list of behaviors.
  - <xb:behavior> -- Describes a single XMLBehavior.
    - <xb:class> -- The name of the class to which this behavior is to be added.
    - <xb:name> -- The name of the behavior/method. Must conform to Python name syntax.
    - <xb:args> -- A list of arguments to the behavior/method.
      - <xb:arg> -- A positional argument to the method.
        - <xb:name> -- The name of the argument.
        - <xb:data-type> -- The data-type of the argument.
    - <xb:return-type> -- The data-type of the value returned by the behavior/method.
    - <xb:impl-url> -- The URL of the implementation body. This value will be concatenated to the right-hand side of the base-impl-url.
    - <xb:ancillaries> -- A list of ancillary behaviors/methods. Each ancillary has a role, which defines how it is to be used.
      - <xb:ancillary> -- A specification of an ancillary behavior/method.
        - <xb:name> -- The name of the behavior/method. Must conform to Python name syntax.
        - <xb:role> -- The method's role. The following values are supported:
          - "DBC-precondition" -- A Design By Contract-style pre-condition check. This method will be called before the core behavior/method itself.
          - "DBC-postcondition" -- A Design By Contract-style post-condition check. This method will be called after the core behavior/method itself.
        - <xb:args> -- A list of arguments to the ancillary behavior/method. The element has the same content as the <xb:args> element for the core behavior/method.
        - <xb:return-type> -- The data-type of the value returned by the behavior/method.
        - <xb:impl-url> -- The URL of the implementation body. This value will be concatenated to the right-hand side of the base-impl-url.
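To make the structure above concrete, here is a sketch that reads an XMLBehavior document with Python's standard xml.etree.ElementTree and lists each behavior with its class, method name, argument names, and full implementation URL (impl-url appended to base-impl-url). The namespace URI, class name, and method name in the sample are hypothetical; use the namespace declared in xmlbehavior_po.xml in the Demos/Xmlbehavior directory. The function list_behaviors is also hypothetical and is not part of generateDS.py.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespace URI; substitute the one declared in xmlbehavior_po.xml.
NS = {'xb': 'http://www.example.com/xmlbehavior'}

SAMPLE = '''<xb:xml-behavior xmlns:xb="http://www.example.com/xmlbehavior">
  <xb:base-impl-url>http://www.example.com/impl/</xb:base-impl-url>
  <xb:behaviors>
    <xb:behavior>
      <xb:class>purchaseOrder</xb:class>
      <xb:name>get_total</xb:name>
      <xb:args>
        <xb:arg>
          <xb:name>tax_rate</xb:name>
          <xb:data-type>float</xb:data-type>
        </xb:arg>
      </xb:args>
      <xb:return-type>float</xb:return-type>
      <xb:impl-url>get_total.py</xb:impl-url>
    </xb:behavior>
  </xb:behaviors>
</xb:xml-behavior>'''


def list_behaviors(text):
    """Return one dict per <xb:behavior> in an XMLBehavior document."""
    root = ET.fromstring(text)
    base = root.findtext('xb:base-impl-url', default='', namespaces=NS)
    result = []
    for behavior in root.findall('xb:behaviors/xb:behavior', NS):
        args = [arg.findtext('xb:name', namespaces=NS)
                for arg in behavior.findall('xb:args/xb:arg', NS)]
        impl = behavior.findtext('xb:impl-url', default='', namespaces=NS)
        result.append({
            'class': behavior.findtext('xb:class', namespaces=NS),
            'name': behavior.findtext('xb:name', namespaces=NS),
            'args': args,
            # impl-url is concatenated to the right-hand side of base-impl-url.
            'impl_url': base + impl,
        })
    return result
```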

11.2 Implementing other sources for implementation bodies

generateDS.py contains a function get_impl_body() that implements the ability to retrieve implementation bodies. The current implementation retrieves implementation bodies from an Internet Web URL. Other sources for implementation bodies can be implemented by modifying get_impl_body().

As an example, the version that follows first tries to retrieve an implementation body from a Web address and, if that fails, attempts to obtain the implementation body from a file in the local file system using the <xb:base-impl-url> as a path to a directory containing files, each of which contains one implementation body and <xb:impl-url> as the file name. This implementation of get_impl_body was provided by Colin Dembovsky of Systemsfusion Inc. Thanks, Colin. (I've included it in the generateDS.py script, but commented out, for those who want to use and possibly extend it.):

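import requests

def get_impl_body(classBehavior, baseImplUrl, implUrl):
    impl = '        pass\n'
    if implUrl:
        trylocal = False
        if baseImplUrl:
            implUrl = '%s%s' % (baseImplUrl, implUrl)
        try:
            # First, try to retrieve the implementation body from a Web address.
            response = requests.get(implUrl)
            response.raise_for_status()
            impl = response.text
        except requests.RequestException:
            trylocal = True
        if trylocal:
            # Then fall back to a file in the local file system.
            try:
                with open(implUrl) as implFile:
                    impl = implFile.read()
            except OSError:
                print('*** Implementation at %s not found.' % implUrl)
    return impl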
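As another example of modifying get_impl_body(), here is a sketch of a variant that reads implementation bodies only from the local file system, with the same signature as above. Joining <xb:base-impl-url> and <xb:impl-url> with os.path.join, rather than plain string concatenation, is an assumption of this sketch; it makes a trailing path separator on the base directory optional.

```python
import os


def get_impl_body(classBehavior, baseImplUrl, implUrl):
    """Return an implementation body read from a local file, or a 'pass' stub."""
    impl = '        pass\n'
    if implUrl:
        # Treat base-impl-url as a directory and impl-url as a file name in it.
        path = os.path.join(baseImplUrl, implUrl) if baseImplUrl else implUrl
        try:
            with open(path) as implFile:
                impl = implFile.read()
        except OSError:
            print('*** Implementation at %s not found.' % path)
    return impl
```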

12 Additional features

Here are additional features. Note that some of this support was contributed by users such as Chris Allan. Many thanks.

12.1 GraphQL support

12.1.1 Introduction

You can generate modules that contain code to support GraphQL queries into the data contained in XML instance documents described by the XML schema from which you generated your module. For information about GraphQL see -- https://graphql.org/.

The generated code uses the Strawberry GraphQL Python module. See https://pypi.org/project/strawberry-graphql/ and https://strawberry.rocks.

In order to use this facility, you must install Strawberry. See the Strawberry getting started guide at https://strawberry.rocks/docs.

This facility supports GraphQL queries. There is no support for mutations.

There is a demo in the Demo/People directory of the source code repository. See README.txt and the scripts run-gen-graphql.sh and run-test-graphql.sh in that directory. The source code repository can be downloaded from https://sourceforge.net/p/generateds/ or can be cloned with the following:

$ hg clone http://hg.code.sf.net/p/generateds/code generateds-code

There is more information along with some examples of usage here -- http://reifywork.com/generateds-graphql.html.

12.1.2 Generating a module containing GraphQL support

In order to run the Strawberry GraphQL server for your data, you must first generate a generateDS.py module that contains Strawberry GraphQL support code. Such a module cannot be used as a normal generateDS.py module.

A module that contains this support code is generated with the "--graphql=" (or "-g") command line option when you run generateDS.py. For example:

$ generateDS.py -o my_module.py --graphql="tagname:typename" my_xml_schema.xsd

Where:

tagname is the tag of the top-most element in your XML instance documents.
typename is the name of the XML complexType for the top-most element in your XML instance documents.

You have now generated a GraphQL API against which you can apply GraphQL queries. These queries follow the nested structure of your XML instance document. At each level, you have access to the child nodes and the attributes of the nodes at that level.

12.1.3 Run the Strawberry server

After generating your module, you can run the Strawberry server that the module implements with something like the following:

$ strawberry server my_module

Important -- Before running the Strawberry server on your module, you must set the environment variable GRAPHQL_ARGS to specify the path/name of an XML instance document that contains the data to which you want to apply GraphQL queries. This XML document must be an instance of the document type of the schema used to generate your module and should validate against that schema.

I'm on Linux, and so I can do that with, for example, either of the following, assuming that "my_module.py" is the name of the module generated by generateDS.py:

$ export GRAPHQL_ARGS=my_input_data.xml
$ strawberry server my_module

Or, on a single line:

$ GRAPHQL_ARGS=my_input_data.xml strawberry server my_module

For help, run the following:

$ strawberry --help
$ strawberry server --help

12.1.4 Make requests and queries

After starting Strawberry, you can interact with your Strawberry server in the following ways (among others):

Visit http://0.0.0.0:8000/graphql in your Web browser and use the Strawberry interactive REPL (read–eval–print loop). Note that interactive Strawberry supports tab completion (use the Tab key and Ctrl-spacebar) and provides visual hints to guide you through your generated GraphQL API. And near the upper left of the screen (in your Web browser) are buttons that will display (1) a Documentation Explorer and (2) a GraphQL Explorer, which, among other things, will describe your GraphQL API and enable you to interactively point and click to generate queries for your API.

Use curl to make requests (see https://curl.se/) and receive JSON data. For example, here is a bash shell script that I can run from the Linux command line:

#!/usr/bin/bash -x

curl 'http://0.0.0.0:8000/graphql' \
  -X POST \
  -H 'content-type: application/json' \
  --data '{
    "query": "{ container { author { name book { author title date genre rating }}}}"
  }'

Use a Python script and the Python requests module (see https://pypi.org/project/requests/ and https://requests.readthedocs.io/en/latest/). For example:

#!/usr/bin/env python

"""
synopsis:
    Demonstration of Python requests to a Strawberry GraphQL server.
usage:
    python request01.py
"""

import requests
import pprint

Query01 = """
{ "query":
"{
  people {
    person {
      name
      interest
      promoter {
        firstname
        lastname
        clientHandler {
          fullname
        }
      }
    }
    programmer {
      name
      interest
      category
      fruit
      vegetable
      value
      range_
      elparam {
        name
        semantic
      }
    }
  }
}"
}
"""

def test():
    headers = {
        'content-type': 'application/json',
    }
    url = "http://0.0.0.0:8000/graphql"
    query = Query01
    query = query.replace('\n', ' ')
    response = requests.post(url, query, headers=headers)
    jsonobj = response.json()
    data = jsonobj["data"]
    print('-' * 40)
    pprint.pprint(data)
    print('-' * 40)

def main():
    test()

if __name__ == "__main__":
    main()
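If you would rather not depend on requests, the same kind of query can be sent with the Python standard library. This is a sketch that assumes the server is listening at the URL used above. Here json.dumps builds the request body, which avoids writing the JSON envelope by hand and then stripping newlines as the script above does. The function names build_graphql_request and run_query are hypothetical.

```python
import json
import urllib.request

URL = "http://0.0.0.0:8000/graphql"


def build_graphql_request(query, url=URL):
    """Build (but do not send) a POST request whose JSON body carries the query."""
    body = json.dumps({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"content-type": "application/json"},
        method="POST",
    )


def run_query(query, url=URL):
    """Send the query and return the "data" member of the JSON response."""
    with urllib.request.urlopen(build_graphql_request(query, url)) as response:
        return json.load(response)["data"]
```

With the Strawberry server running, you might call, for example, run_query("{ people { person { name } } }") and pretty-print the result as the script above does.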

12.2 xsd:list element support

An xsd:list element can contain a child xsd:simpleType, which can confuse the stack unrolling in XschemaHandler. The xsd:list element support should enable generateDS.py to handle the following XML Schema definition:

<xsd:attribute name="Foo">
    <xsd:simpleType>
        <xsd:list>
            <xsd:simpleType>
                ...
            </xsd:simpleType>
        </xsd:list>
    </xsd:simpleType>
</xsd:attribute>
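In an instance document, a value of a type derived by xsd:list is a whitespace-separated sequence of items. This sketch, in plain Python with xml.etree.ElementTree and not code generated by generateDS.py, shows how such an attribute value breaks into items. The element name Item and the integer item type are assumptions for illustration; the helper parse_list_value is hypothetical.

```python
import xml.etree.ElementTree as ET


def parse_list_value(text, item_type=str):
    """Split an xsd:list value on whitespace and convert each item with item_type."""
    return [item_type(item) for item in text.split()]


# Hypothetical element carrying the list-valued attribute "Foo" from the schema above.
node = ET.fromstring('<Item Foo="10 20  30"/>')
values = parse_list_value(node.get('Foo'), int)
```

Here values is [10, 20, 30]; runs of whitespace between items are ignored, as XML Schema list types require.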

12.3 xsd:enumeration support

The enumerated values for the parent element are resolved and made available through the instance attribute values.

12.4 xsd:union support

In order to properly resolve and query types which are unions in an XML Schema, an element's membership in an xsd:union is available through the instance attribute unionOf.

12.5 Extended xsd:choice support

When a parent xsd:choice exists, an element's "maxOccurs" and "minOccurs" values can be inherited from the xsd:choice rather than the element itself. xsd:choice elements have been added to the child element via the choice instance attribute and are now used in the "maxOccurs" and "minOccurs" attribute resolution. This should allow the following XML Schema definition to be supported in generateDS.py:

<xsd:element name="Foo">
    <xsd:complexType>
        <xsd:choice maxOccurs="unbounded">
            <xsd:element ref="Bar"/>
            <xsd:element ref="Baz"/>
        </xsd:choice>
    </xsd:complexType>
</xsd:element>
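The inheritance idea can be sketched with xml.etree.ElementTree as follows. This is a simplified rule of my own for illustration, not the resolution algorithm in generateDS.py: an element's own minOccurs/maxOccurs wins when present, otherwise the value comes from the enclosing xsd:choice, and both default to "1" as in XML Schema. The function name effective_occurs is hypothetical.

```python
import xml.etree.ElementTree as ET

XSD = '{http://www.w3.org/2001/XMLSchema}'

SCHEMA = '''<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="Foo">
    <xsd:complexType>
      <xsd:choice maxOccurs="unbounded">
        <xsd:element ref="Bar"/>
        <xsd:element ref="Baz"/>
      </xsd:choice>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>'''


def effective_occurs(schema_text):
    """Map each element directly inside an xsd:choice to (minOccurs, maxOccurs)."""
    schema = ET.fromstring(schema_text)
    result = {}
    for choice in schema.iter(XSD + 'choice'):
        # XML Schema defaults both attributes to "1".
        choice_min = choice.get('minOccurs', '1')
        choice_max = choice.get('maxOccurs', '1')
        for element in choice.findall(XSD + 'element'):
            name = element.get('ref') or element.get('name')
            result[name] = (element.get('minOccurs', choice_min),
                            element.get('maxOccurs', choice_max))
    return result
```

For the Foo definition, both Bar and Baz end up with a maxOccurs of "unbounded", taken from the xsd:choice.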

12.6 Arity, minOccurs, maxOccurs, etc

Some applications require information about the "minOccurs" and "maxOccurs" attributes in the XML Schema. Some of that information can be obtained by using the --member-specs=list or --member-specs=dict command line option, then looking at the member_data_items_ class variable that it generates in each data representation class. In particular, look at the get_container method (from class MemberSpec_).

12.7 More thorough content type and base type resolution

The previous content type and base type resolution is insufficient for some needs. Basically it was unable to handle more complex and shared element and simpleType definitions. This support has been extended to more correctly resolve the base type and properly indicate the content type of the element. This should provide the ability to handle more complex XML Schema definitions in generateDS.py. Documentation on the algorithm for how this is achieved is available as comments in the source code of generateDS.py -- see comments in method resolve_type in class XschemaElement.

12.8 Making top level simpleTypes available from XschemaHandler

Some developers working to extend the analysis and code generation in generateDS.py may be helped by additional information collected during the parsing of the XML Schema file.

Some applications need all the top level simpleTypes to be available for further queries after the SAX parser has completed its work and after all types have been resolved. These types are available as an instance attribute topLevelSimpleTypes inside XschemaHandler.

12.9 Namespaces -- inserting namespace definition in exported documents

In some cases, the document produced by a call to an export method will contain elements that have namespace prefixes. For example, the following snippet contains namespace prefix "abc":

<abc:people >
    <abc:person>
    o
    o
    o
    </abc:person>
</abc:people>

A way is needed to insert a namespace prefix definition into the generated document. Here is how generateDS.py fills that need.

Each generated export method takes an optional argument namespacedef_. If provided, the value of that parameter is inserted in the exported element. So, for example, the following call:

people.export(sys.stdout, 0,
    namespacedef_='xmlns:abc="http://www.abc.com/namespace"')

might produce:

<abc:people xmlns:abc="http://www.abc.com/namespace">
    <abc:person>
    o
    o
    o
    </abc:person>
</abc:people>

If this is an issue for you, then you may also want to consider using the "--namespacedef" command line option when you run generateDS.py. The value of this option will be passed in to the export function in the generated parse functions. So, for example, running generateDS.py as follows:

generateDS.py --namespacedef='xmlns:abc="http://www.abc.com/namespace.xsd"' \
    -o mylib.py -s myapp.py myschema.xsd

will generate parse methods that automatically add the namespacedef_ argument to the call to export.
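Because the value of namespacedef_ is inserted into the exported element as-is, one string can carry several namespace declarations. The following hypothetical helper, build_namespacedef, assembles that string from a dictionary of prefixes and URIs; an empty prefix produces a default namespace declaration. It is a convenience sketch, not part of the generated code.

```python
def build_namespacedef(prefixes):
    """Build a namespacedef_ string from a {prefix: uri} dict (hypothetical helper)."""
    parts = []
    for prefix, uri in prefixes.items():
        if prefix:
            parts.append('xmlns:%s="%s"' % (prefix, uri))
        else:
            # An empty prefix declares the default namespace.
            parts.append('xmlns="%s"' % uri)
    return ' '.join(parts)
```

For example: people.export(sys.stdout, 0, namespacedef_=build_namespacedef({'abc': 'http://www.abc.com/namespace'})).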

12.10 Support for xs:any

There is minimal support for the xs:any wild card declaration. Effectively, an element defined by an xs:complexType containing xs:any can contain any element type as a child element. Because generateDS.py does not know how to generate code to handle specific element types during the parsing and building of an XML instance document, it generates a call to a method gds_build_any in the GeneratedsSuper class. This method has a default implementation in the generated code. If your XML schema uses xs:any, you may need to add some code to that default implementation of gds_build_any. See section Overridable methods -- generatedssuper.py for guidance on how to provide an implementation of that method.

For more help with this, look at the code generated from an XML schema that uses xs:any. In particular, look at the code generated in the Python class corresponding to the xs:complexType containing xs:any and look at the default implementation of method gds_build_any in class GeneratedsSuper. Reading the code in the buildChildren and exportChildren methods of a class containing a child declared with xs:any should help you understand what is going on.

When you start developing your implementation of gds_build_any, look at the code generated in several buildChildren methods. It's likely that you will be able to copy, paste, and edit code from there.

12.11 Generating Lxml Element tree

Once you have built the tree of objects that are instances of the classes generated by generateDS.py, you can use it to produce a tree of Lxml Element instances. See http://lxml.de/ for more about Lxml. And, see the function parseEtree in the generated code for an example of how to produce the Lxml Element tree:

def parseEtree(inFileName):
    doc = parsexml_(inFileName)
    rootNode = doc.getroot()
    rootTag, rootClass = get_root_tag(rootNode)
    if rootClass is None:
        rootTag = 'test'
        rootClass = Test
    rootObj = rootClass.factory()
    rootObj.build(rootNode)
    # Enable Python to collect the space used by the DOM.
    doc = None
    mapping = {}
    rootElement = rootObj.to_etree(None, name_=rootTag, mapping_=mapping)
    reverse_mapping = rootObj.gds_reverse_node_mapping(mapping)
    content = etree_.tostring(
        rootElement, pretty_print=True,
        xml_declaration=True, encoding="utf-8")
    # tostring() with an encoding returns bytes; decode before writing text.
    sys.stdout.write(content.decode('utf-8'))
    sys.stdout.write('\n')
    return rootObj, rootElement, mapping, reverse_mapping

12.11.1 Mapping generateDS objects to Lxml Elements and back

Now suppose that you have produced the tree of instances of the generated classes, and suppose that you have used that to produce a tree of instances of the Element class from Lxml. It may be useful to have a dictionary that maps instances in one tree to the corresponding instances in the other. You can create that dictionary by passing an empty dictionary as the value of the optional parameter mapping_ in the call to the to_etree method. And, you can produce the reverse mapping by calling the convenience method gds_reverse_node_mapping from superclass GeneratedsSuper. Again, see the code above for an example.

12.12 Specifying names for anonymous nested type definitions

generateDS.py automatically assigns names for types (and the classes generated from them), when that type definition (for example, xs:complexType) does not have a name and it is nested inside another type definition. However, these assigned names, in part because of the need to make them unique, can be difficult to predict.

Therefore, generateDS.py provides a way to specify the names of the Python classes generated from these anonymous, nested types. To do so, follow these steps:

Create a module named gds_inner_name_map (gds_inner_name_map.py).
Place that module where Python can import it when you run generateDS.py. You can do this either by adding an additional directory to the environment variable PYTHONPATH or by placing gds_inner_name_map.py in a directory that is already on Python's search path for modules. You can check this by running the following Python code snippet:
```
import sys
print(sys.path)
```
Also see: https://docs.python.org/2/library/sys.html#sys.path
In module gds_inner_name_map, define a global variable Inner_name_map. The value of this variable should be a dictionary that maps 2-tuples containing (a) the name of the grandparent type and (b) the name of the parent type onto the new name.

If generateDS.py cannot import Inner_name_map from gds_inner_name_map, then it will, by default, generate unique names. In particular, it automatically generates names for anonymous, nested types when the following Python statement fails:

from gds_inner_name_map import Inner_name_map

Here is an example of module gds_inner_name_map.py:

Inner_name_map = {
    ("classAType", "inner"): "inner_001",
    ("classBType", "inner"): "inner_002",
}

Usage hints:

When generateDS.py succeeds in importing Inner_name_map from gds_inner_name_map, but cannot find one of the required mappings in that dictionary, it will throw an exception and print out the missing mapping. You can copy this line; paste it into your gds_inner_name_map.py; and edit it so as to specify the class name of your choice.
Make sure that the names you specify are unique within your XML schema.
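To help with the uniqueness hint, a quick check can be run over your mapping before running generateDS.py. This sketch only detects new names that are reused within Inner_name_map itself; it cannot tell whether a name collides with some other type in your schema. The function name duplicate_names is hypothetical.

```python
from collections import Counter

Inner_name_map = {
    ("classAType", "inner"): "inner_001",
    ("classBType", "inner"): "inner_002",
}


def duplicate_names(name_map):
    """Return the new class names that are assigned more than once, sorted."""
    counts = Counter(name_map.values())
    return sorted(name for name, count in counts.items() if count > 1)
```

An empty result, as for the example above, means no new name is reused within the map.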

12.13 Controlling and using simple type validation

Simple types that are defined with restrictions are validated: their restrictions are checked.

See any of the generated methods whose names are of the form validate_xxx_ and their calls in the build method in order to learn how these methods are used and how you can use them.

You can turn this checking off by setting the global variable Validate_simpletypes_ in your generated module to False.

Warning messages that are produced when the value of a simple type fails to validate against its restrictions are collected in an instance of class GdsCollector_. This class is defined in your generated module. See the parse* methods in your generated module for help with using a collector.
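The pattern of collecting warnings instead of raising exceptions can be sketched as follows. MessageCollector and validate_code are hypothetical names, not the generated GdsCollector_ class or validate_xxx_ methods, and the maxLength-style check merely stands in for whatever restrictions your simple types define. Look at GdsCollector_ in your generated module for its actual interface.

```python
class MessageCollector:
    """Hypothetical stand-in for a collector of validation warnings."""

    def __init__(self):
        self.messages = []

    def add_message(self, msg):
        self.messages.append(msg)


def validate_code(value, collector, max_length=5):
    """Check a maxLength-style restriction; record a warning instead of raising."""
    if len(value) > max_length:
        collector.add_message(
            'value %r is longer than maxLength %d' % (value, max_length))
        return False
    return True
```

Processing continues after a failed check, so a single parse can report every offending value at the end.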

12.14 Generating validator methods

generateDS.py generates a validator method in a complex type class for each use of a simple type that is defined with restrictions. During the build process, the generated code calls these methods to ensure that input data (from an XML instance document) validates against those restrictions. Note that this is done only if the global variable Validate_simpletypes_ is True, which is the default.

You can, of course, call any of these validator methods manually and programmatically.

generateDS.py also supports a feature that enables you to generate a validator method that calls all the specific validator methods on each attribute and child in the class for which that specific validator method is relevant.

To instruct generateDS to do this, add the word "validate" to the "--export" command line option. For example:

generateDS.py -o mylib.py --export="write validate" myschema.xsd

To learn how to perform validation, take a look at a generated validate_ method in your generated module.

A few usage hints:

In order to call the validate_ method, you will need to create an instance of class GdsCollector_. This class is defined in your generated module. You will pass this as an argument to the validate_ method and later can use this collector instance to retrieve any generated validation warning messages.
Notice that each validate_ method has a recursive parameter, which is True by default. You can use this to control whether (1) only a single instance is validated or (2) the current instance and all children are recursively validated.
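The effect of the recursive parameter can be pictured with a small hand-written tree. This is a sketch, not generated code: the Node class and its negative-value check are hypothetical, and a plain list stands in for the GdsCollector_ instance.

```python
class Node:
    """Hypothetical tree node; generated classes have their own validate_ methods."""

    def __init__(self, value, children=()):
        self.value = value
        self.children = list(children)

    def validate_(self, collector, recursive=True):
        ok = True
        if self.value < 0:      # stand-in for a simple type restriction
            collector.append('value %r is negative' % self.value)
            ok = False
        if recursive:
            for child in self.children:
                # Validate every child even after a failure, so all warnings are collected.
                ok = child.validate_(collector, recursive=True) and ok
        return ok
```

For Node(1, [Node(-2), Node(3, [Node(-4)])]), calling validate_ with recursive=True returns False and collects two warnings, while recursive=False checks only the root and returns True.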

13 How to use the generated source code

13.1 The parsing functions

The simplest use is to call one of the parsing functions in the generated source file. You may be able to use one of these functions without change, or you can modify one to fit your needs. generateDS.py generates the following parsing functions:

parse -- Parse an XML document from a file.
parseString -- Parse an XML document from a string.

These parsing functions are generated in both the superclass and the subclass files. Note the call to the export method. You may need to comment out or un-comment this call to export according to your needs.

For example, if the generated source is in people.py, then, from the command line, run something like the following:

python people.py people.xml

Or, from within other Python code, use something like the following:

import people
rootObject = people.parse('people.xml')

13.2 Recognizing the top level element

It might be that the generated module, when parsing an XML instance document, does not, by default, recognize the top level (root) element in an instance document. This might happen because generateDS.py does not detect the correct top level element from the XML schema or because you need to use the generated module to parse instance documents that have different top level elements. If this is the case, you might pick and use one of the following strategies:

In your schema, move the definition of the element type that defines the top level element in your instance documents to the top of the schema. By default, generateDS.py uses the first definition in the schema as the top level element when constructing the generated parse function.
Use the "--root-element" command line option to specify the top level element. Be aware, however, that this only works if the tag name and type name of the top level element are the same.

Modify the parse function in your generated module, replacing the class whose factory is called and the tag name passed in to the export method. For example, change:

def parse(inFileName):
    doc = minidom.parse(inFileName)
    rootNode = doc.documentElement
    rootObj = type1.factory()
    rootObj.build(rootNode)
    # Enable Python to collect the space used by the DOM.
    doc = None
    sys.stdout.write('<?xml version="1.0" ?>\n')
    rootObj.export(sys.stdout, 0, name_="type1",
        namespacedef_='')
    return rootObj

to:

def parse(inFileName):
    doc = minidom.parse(inFileName)
    rootNode = doc.documentElement
    rootObj = type2.factory()
    rootObj.build(rootNode)
    # Enable Python to collect the space used by the DOM.
    doc = None
    sys.stdout.write('<?xml version="1.0" ?>\n')
    rootObj.export(sys.stdout, 0, name_="type2",
        namespacedef_='')
    return rootObj

Notice that we've changed the two occurrences of "type1" to "type2".

Using the generated parse function as a model, create a separate module that imports your generated module. In the parse function in your module, make a change similar to that suggested above. And, of course, add any additional code needed by your application.

Write a separate module containing your own parse function that inspects the top level element of an input XML instance document and automatically determines which generated class should be used to parse it. Here is an example:

#!/usr/bin/env python

import argparse
import sys
from xml.dom import minidom
import mygeneratedmodule as gendsmod

def get_root_tag(node):
    tag = node.tagName
    # Strip a namespace prefix, for example "abc:people" -> "people".
    tags = tag.split(':')
    if len(tags) > 1:
        tag = tags[-1]
    # Look for a generated class with the same name as the tag.
    rootClass = getattr(gendsmod, tag, None)
    return tag, rootClass

def parse(inFilename, options):
    doc = minidom.parse(inFilename)
    rootNode = doc.documentElement
    rootTag, rootClass = get_root_tag(rootNode)
    if rootClass is None:
        sys.exit('No generated class found for root element "%s"' % rootTag)
    rootObj = rootClass.factory()
    rootObj.build(rootNode)
    # Enable Python to collect the space used by the DOM.
    doc = None
    sys.stdout.write('<?xml version="1.0" ?>\n')
    rootObj.export(sys.stdout, 0, name_=rootTag,
        namespacedef_='')
    return rootObj

USAGE_TEXT = """
    python %(prog)s [options] <somefile.xml>"""

def main():
    parser = argparse.ArgumentParser(usage=USAGE_TEXT)
    parser.add_argument('infilename', help='the XML instance document to parse')
    options = parser.parse_args()
    parse(options.infilename, options)

if __name__ == "__main__":
    main()

Notice the call to get_root_tag, which attempts to recognize the top level tag in the input XML document so that the parse function can parse and export it.

13.3 The export methods

The generated classes contain methods export and exportLiteral, which can be called to export instances to several text formats, in particular to an XML instance document and to a Python module containing Python literals. See the generated parse functions for examples showing how to call the export methods.

13.3.1 Method export

The export method in generated classes writes out an XML document that represents the instance that contains it and its child elements. So, for example, if your instance tree was created by one of the parsing functions described above, then calling export on the root element should reproduce the input XML document, differing only with respect to ignorable white space.

Arguments to the generated export method:

outfile -- A file like object open for writing.
level -- the indentation level. If the pretty_print argument is True, the (generated) function showIndent is used to prefix each exported line with 4 spaces for each level of indent.
namespace_ -- An empty string or an XML namespace prefix plus a colon, example "abc:". This value is printed immediately in front of the tag name.
name_ -- The element tag name. Note that the tag name can be overridden by the original_tagname_, which can be set by the class constructor.
namespacedef_ -- Zero or more namespace prefix definitions. Actually, its value can be any attribute-value pairs. Examples:

''
'xmlns:abc="http://www.abc.com"'
'xmlns:abc="http://www.abc.com" xmlns:def="http://www.def.com"'

or, because it is printed where the attributes occur, even:

'size="25" color="blue"'

For more on namespacedef_, see: Namespaces -- inserting namespace definition in exported documents
pretty_print -- If True, exported output is printed with indentation and newlines. If False, indentation and newlines are omitted, which produces a more compact representation.

Also see the comments on generatedsnamespaces and GenerateDSNamespaceDefs near the top of each generated module.
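Because outfile only needs to be a file-like object open for writing, an io.StringIO buffer works when you want the exported XML as a string. The following is a sketch that assumes the keyword arguments listed above (name_, namespacedef_, pretty_print). FakeNote is a hypothetical stand-in whose export method only imitates the general shape of generated output (4 spaces of indent per level, a newline when pretty printing, namespacedef_ written where attributes go).

```python
import io


def export_to_string(obj, tag, namespacedef='', pretty_print=True):
    """Return the XML written by obj.export() as a str."""
    buf = io.StringIO()
    obj.export(buf, 0, name_=tag, namespacedef_=namespacedef,
               pretty_print=pretty_print)
    return buf.getvalue()


class FakeNote:
    """Hypothetical stand-in for a generated class with text content."""

    def __init__(self, text):
        self.text = text

    def export(self, outfile, level, namespace_='', name_='note',
               namespacedef_='', pretty_print=True):
        # Imitates generated code: 4 spaces per level, newline when pretty.
        indent = '    ' * level if pretty_print else ''
        eol = '\n' if pretty_print else ''
        attrs = ' ' + namespacedef_ if namespacedef_ else ''
        outfile.write('%s<%s%s%s>%s</%s%s>%s' % (
            indent, namespace_, name_, attrs, self.text,
            namespace_, name_, eol))


print(export_to_string(FakeNote('hello'), 'note',
                       'xmlns:abc="http://www.abc.com"', pretty_print=False))
```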

13.3.2 Method `exportLiteral`

generateDS.py generates Python classes that represent the elements in an XML document, given an XML Schema definition of the XML document type. The exportLiteral method will export a Python literal representation of the Python instances of the classes that represent an XML document.

13.3.2.1 What It Does

When generateDS.py generates the Python source code for your classes, it also generates an exportLiteral method in each class. If you call this method on the root (top-most) object, it will write out a literal representation of your class instances as Python code.

generateDS.py also generates a function at top level (parseLiteral) that parses an XML document and calls the "exportLiteral" method on the root object to write the data structure (instances of your generated classes) as a Python module that you can import to (re-)create instances of the classes that represent your XML document.

13.3.2.2 Why You Might Care

generateDS.py was designed and built with the assumption that we are not interested in marking up text content at all. What we really want is a way to represent structured and nested data in text. It takes the statement, "I want to represent nested data structures in text.", entirely seriously. Given that assumption, there may be times when you want a more "Pythonic" textual representation of the Python data structures for which generateDS.py has generated code. exportLiteral enables you to produce that representation.

As a result, the classes that you generate from an XML schema support the interchangeability of XML and Python literals. Given classes generated by generateDS.py for your XML document type, you can perform the following transformations:

Translate an XML document into a Python module containing a literal definition of the contents of the XML document.
Translate the literal definition of a Python data structure into an XML instance document.

This capability enables you to:

Work with an XML (text) document, then exchange it for a Python text representation of the content of that document.
Work with a Python literal text representation of your XML document, then exchange that for an XML document that represents the same content.
"Freeze" your XML document as a Python module that you can import. The module can be edited with your text editor, so perhaps it would be better to say that it is frozen, but not too hard. The classes that you generate with generateDS.py can be used to:
1. Read in an XML document.
2. (Optionally) modify the Python instances that represent that XML document.
3. Write the instances out as a Python module that you can later import.

13.3.2.3 How to use it

See the generated function parseLiteral for an example of how to use exportLiteral.
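One way to put the "freeze" idea into practice is to capture what parseLiteral writes and import it later. This sketch assumes, as described above, that parseLiteral writes the module source to sys.stdout, and that the emitted source binds the root object to a variable named rootObj (the variable name is an assumption; check your generated parseLiteral). The frozen module imports your generated module, so that module must be importable when you load the frozen file.

```python
import contextlib
import importlib.util


def freeze(parse_literal, xml_path, py_path):
    """Run parse_literal(xml_path) and save everything it prints to py_path."""
    with open(py_path, 'w') as outfile, contextlib.redirect_stdout(outfile):
        parse_literal(xml_path)


def thaw(py_path, var_name='rootObj'):
    """Import the frozen module at py_path and return its root object."""
    spec = importlib.util.spec_from_file_location('frozen_document', py_path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return getattr(module, var_name)
```

For example, freeze(people.parseLiteral, 'people.xml', 'people_frozen.py') followed later by root = thaw('people_frozen.py'). Because the frozen file is ordinary Python source, you can edit it between the two steps.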

13.3.3 Exporting compact XML documents

You can also export "compact" XML documents. A compact document is one that is exported without the ignorable whitespace that is used to produce pretty printed documents. In contrast, a pretty printed document will have leading white space on most lines to show indentation.

To produce compact documents, pass the optional argument pretty_print=False to the export function. Check the "parse" functions generated near the bottom of modules generated by generateDS.py, where pretty_print=True is passed in by default.

13.4 Building instances

If you have an instance of a minidom node that represents an element in an XML document, you can also use the 'build' member function to populate an instance of the corresponding class. Here is an example:

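from xml.dom import minidom
from xml.dom import Node
# Assumes the generated module is people.py and defines class person.
from people import person

inFileName = 'people.xml'
doc = minidom.parse(inFileName)
# documentElement is the root element; childNodes[0] might be a comment.
rootNode = doc.documentElement
people = []
for child in rootNode.childNodes:
    # Skip text and comment nodes; build only <person> elements.
    if child.nodeType == Node.ELEMENT_NODE and child.nodeName == 'person':
        obj = person()
        obj.build(child)
        people.append(obj)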

13.5 Using the subclass module

If you choose to use the generated subclass module, and I encourage you to do so, you may need to edit and modify that file. Here are some of the things that you must do (look for "???"):

Edit the import statement at the top of the file. It should import the generated superclass file. Note that you can also use the "--super" command line option to insert this automatically.
Edit the USAGE_TEXT string so that it gives a help message appropriate for your use.
Edit the main function toward the bottom of the file. It should call a method (possibly one that you have added) on an instance of the root subclass.

You can also (and most likely will want to) add methods to the generated classes. See the section How to Modify the Generated Code for more on this.

The classes generated from each element definition provide getter and setter methods to access their attributes and child elements.

Elements that are referenced but not defined (i.e. that are simple, for example strings, integers, floats, and booleans) are accessed through getter and setter methods in the class in which they are referenced.

13.6 Elements with attributes but no nested children

13.7 Mixed content

The goal of generateDS.py is to support data structures represented in XML as opposed to text mark-up. However, it does provide some support for mixed content. Note, though, that for mixed content, the data structures and code generated by generateDS.py are fundamentally different from those for elements that do not contain mixed content.

There are limitations, of course. A known limitation is related to extension elements. Specifically, if an element contains mixed content, and this element extends a base class, then the base class and any classes it extends must be defined to contain mixed content. This is due to the fact that generateDS.py generates a data structure (class) for elements containing mixed content that is fundamentally different from that generated for other elements.

Here is an example of mixed content:

<note>This is a <bold>nice</bold> comment.</note>

When an element is defined with something like the following:

<xs:complexType mixed="true">
    <xs:sequence>
        ...

then, instead of generating a class whose named members refer to nested elements, a class containing a list of instances of class MixedContainer is generated. In order to process the content of a mixed content element, the code you write will need to walk this list of instances of MixedContainer and check the type of each item in that list. Basically, the structure becomes more DOM-like in the sense that it has a list of children, rather than named fields.

Instances of MixedContainer have the following methods:

getCategory -- Returns one of the following, depending on the content:
- CategoryText -- Text content.
- CategorySimple -- Simple elements, that is, elements defined as xs:string, xs:integer, etc. For these, the member variable content_type, accessible through method getContenttype will contain one of TypeString, TypeInteger, TypeFloat, TypeDecimal, TypeDouble, or TypeBoolean.
- CategoryComplex -- Complex elements represented by a generated class. For these, the member variable name, accessible through method getName will return the element/tag name and the member variable value, accessible through method getValue will return the instance.
getContenttype -- Returns one of TypeString, TypeInteger, TypeFloat, TypeDecimal, TypeDouble, or TypeBoolean. Valid only when category is CategorySimple.
getName -- For CategoryComplex, returns the name of the element.
getValue -- Returns the value of this chunk of content. Its type depends on the value returned by getCategory and getContenttype.
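Walking that list typically means dispatching on getCategory(). The sketch below gathers the text of a mixed-content element, recursing into complex children. It assumes that the generated class stores its MixedContainer items in a list attribute named content_ (an assumption; check your generated class) and takes the MixedContainer class as an argument only for its Category* constants. Complex children that are not themselves mixed content have no such list and contribute nothing here. FakeMixedContainer and FakeMixedNote are hypothetical stand-ins modelling the <note> example above.

```python
def mixed_text(obj, mc_class):
    """Return the concatenated text of a mixed-content object."""
    parts = []
    for item in getattr(obj, 'content_', None) or []:
        category = item.getCategory()
        if category == mc_class.CategoryText:
            parts.append(item.getValue())
        elif category == mc_class.CategorySimple:
            # Simple elements: the value may be a str, int, float, etc.
            parts.append(str(item.getValue()))
        elif category == mc_class.CategoryComplex:
            # getValue() returns an instance of a generated class; recurse.
            parts.append(mixed_text(item.getValue(), mc_class))
    return ''.join(parts)


# Hypothetical stand-ins modelling <note>This is a <bold>nice</bold> comment.</note>
class FakeMixedContainer:
    CategoryText, CategorySimple, CategoryComplex = 1, 2, 3

    def __init__(self, category, name, value):
        self.category, self.name, self.value = category, name, value

    def getCategory(self):
        return self.category

    def getName(self):
        return self.name

    def getValue(self):
        return self.value


class FakeMixedNote:
    def __init__(self, content):
        self.content_ = content


MC = FakeMixedContainer
note = FakeMixedNote([
    MC(MC.CategoryText, '', 'This is a '),
    MC(MC.CategorySimple, 'bold', 'nice'),
    MC(MC.CategoryText, '', ' comment.'),
])
print(mixed_text(note, MC))
```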

13.8 anyAttribute

For elements whose definitions include xs:anyAttribute, generateDS.py produces a class containing the following:

A member variable anyAttributes_ containing a Python dictionary. After parsing an XML instance document, this dictionary will contain name-value pairs for any attributes in the instance document not explicitly defined for that element.
The following getters and setters: getAnyAttributes_ and setAnyAttributes_.
Code to export the attribute names and values stored in the dictionary.
Code to parse attributes in addition to those explicitly defined for the element and store them in the dictionary.

Note: Attributes that are explicitly defined for an element are not stored in the dictionary anyAttributes_.

generateDS.py ignores the processContents attribute on the anyAttribute element in the XML Schema.
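Since anyAttributes_ is an ordinary Python dictionary, you can read or extend it before exporting. The following helper is a hypothetical sketch, not generated code: it merges extra attributes into anyAttributes_ without overwriting values that came from the parsed document, and it touches only the anyAttributes_ member described above.

```python
def update_any_attributes(obj, extra, overwrite=False):
    """Merge extra name/value pairs into obj.anyAttributes_ and return it.

    Existing entries are kept unless overwrite is True.
    """
    if getattr(obj, 'anyAttributes_', None) is None:
        obj.anyAttributes_ = {}
    for name, value in extra.items():
        if overwrite or name not in obj.anyAttributes_:
            obj.anyAttributes_[name] = value
    return obj.anyAttributes_
```

Attributes that are explicitly defined for the element should still be set through their ordinary members, since those are not stored in anyAttributes_.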

13.9 User Methods

generateDS.py provides a mechanism that enables you to attach user defined methods to specific generated classes. In order to do so, create a Python module containing specifications of those methods and indicate that module on the command line with the "--user-methods" option. Example:

python generateDS.py -f --super=people_sup -o people_sup.py -s people_sub.py --user-methods=/path/to/gends_user_methods.py people.xsd

The argument to the "--user-methods" (or "-u") command line option is a path to a Python module. It should include ".py" at the end. Examples:

-u gends_user_methods.py
-u path/to/methods_module.py

The module specified with the "--user-methods" flag should define a variable METHOD_SPECS which contains a list of instances of a class that implements methods match_name and get_interpolated_source.

See file gends_user_methods.py for an example of this specification file and the definition of class MethodSpec. Read the comments in that file for more guidance.

The member_data_items_ class variable -- User methods, especially those attached to more than one class, are likely to need a list of the members in the current class. Each generated class has a class variable containing a list of specifications of the members in the class. Each item in this list is an instance of class MemberSpec_, which is defined near the top of your generated (super-class) file. Use the following to access the information in each member specification:

m.get_name() -- Returns the name of the member variable (a string).
m.get_data_type() -- Returns the data type of the member variable (a string). If the data type is a list, returns the terminal type, which is the last string in the list. (Also see get_data_type_chain().)
m.get_data_type_chain() -- Returns the data type of the member variable (a string or list). When the data type is a simpleType that has another simpleType as its base or is a complexType that extends a simpleType, then the data type is a list of strings, for example:
```
['RelationType', 'xs:string']
```
The last string in the list is the terminal type, usually a built-in simple type. Note that m.get_data_type() returns the terminal (last) type.
m.get_container() -- (an integer) Indicates whether the member variable is a single item or a list/container (i.e. generated from maxOccurs > 1 or maxOccurs="unbounded"): 0 indicates a single item; 1 indicates a list.
m.get_optional() -- (an integer) Returns 0 (zero) if the item is optional (defined with minOccurs="0"), else returns 1.
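The tree-walking pattern used in the sample file can be reduced to a generic recursive walker driven by member_data_items_. This is a hypothetical sketch, not the code in gends_user_methods.py. It relies only on get_name() and get_container(), treats any child whose class also defines member_data_items_ as a generated object, and also accepts a dict of specs in case your generated code stores them that way. FakeSpec, FakePerson, and FakePeople are stand-ins for MemberSpec_ and generated classes.

```python
def walk(obj, visit, depth=0):
    """Call visit(obj, depth) on obj and, recursively, on every child object
    whose class also defines member_data_items_."""
    visit(obj, depth)
    specs = getattr(type(obj), 'member_data_items_', [])
    if isinstance(specs, dict):  # tolerate a dict of specs as well as a list
        specs = specs.values()
    for spec in specs:
        value = getattr(obj, spec.get_name(), None)
        if spec.get_container():
            children = value or []  # containers hold lists
        else:
            children = [value]      # single items hold one value (or None)
        for child in children:
            if hasattr(type(child), 'member_data_items_'):
                walk(child, visit, depth + 1)


class FakeSpec:
    """Hypothetical stand-in for MemberSpec_."""
    def __init__(self, name, container):
        self.name, self.container = name, container

    def get_name(self):
        return self.name

    def get_container(self):
        return self.container


class FakePerson:
    member_data_items_ = [FakeSpec('name', 0)]

    def __init__(self, name):
        self.name = name


class FakePeople:
    member_data_items_ = [FakeSpec('person', 1)]

    def __init__(self, persons):
        self.person = persons


root = FakePeople([FakePerson('Alice'), FakePerson('Bob')])
walk(root, lambda obj, depth: print('    ' * depth + type(obj).__name__))
```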

There are a number of things of interest in this sample file (gends_user_methods.py):

Although the MethodSpec class must be included in your user methods specification module, you can modify this class. For example, for special situations, it might be useful to modify either of the methods MethodSpec.match_name or MethodSpec.get_interpolated_source. These methods are called by generateDS.py. See comments on the definitions of these methods in gends_user_methods.py.
A method set_up is attached to the root class. (This user method specification module is intended to be used with people.xsd/people.xml in the Demos/People directory.) It performs initialization, before the walk method is called. In particular, set_up initializes a counter and imports the types module (which saves us from having to modify the generated code).
The walk_and_update and walk_and_show methods provide an example showing how to walk the entire document object tree.
The method walk_and_update uses the member_data_items_ class variable to obtain a list of members of the class. It's a list of instances of class MemberSpec_, which support the m.get_name(), m.get_data_type(), and m.get_container() methods described above.
In method walk_and_show, note the use of getattr to retrieve the value of a member variable and the use of setattr to set the value of a member variable.
The expression "%(class_name)s" is used to insert the class name into the generated source code.
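Notice how the sample file determines whether a member variable contains a simple type or an instance of a class. The sample uses types.InstanceType, which exists only in Python 2; under Python 3, test against GeneratedsSuper, the base class of the generated classes, instead. Example:
```
obj1 = getattr(self, member.get_name())
if isinstance(obj1, GeneratedsSuper):
    ...
```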
In string formatting operations, you will need to use double percent signs in order to "pass through" a single percent sign, for example:
```
print('%%d. class: %(class_name)s  depth: %%d' %% (counter, depth, ))
```
where the single percent signs are interpolated ("%(class_name)s" is replaced by the class name), and double percent signs are replaced by single percent signs ("%%d" becomes "%d").
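To see the interpolation rule in isolation, the sketch below applies the same kind of %-with-a-dictionary substitution to a hypothetical method template. It assumes, as described above, that the class name is supplied under the key class_name; the show method itself is only an illustration.

```python
# Hypothetical user-method source in the style of gends_user_methods.py.
# %(class_name)s is interpolated; each %% becomes a single % in the output.
SOURCE_TEMPLATE = """\
    def show(self, counter, depth):
        print('%%d. class: %(class_name)s  depth: %%d' %% (counter, depth, ))
"""

generated_source = SOURCE_TEMPLATE % {'class_name': 'person'}
print(generated_source)
```

The printed source contains a normal Python format expression, print('%d. class: person  depth: %d' % (counter, depth, )), ready to be placed inside the generated class.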

Suggestion -- How to begin:

Make a copy of gends_user_methods.py.
Modify the method specifications in that file. Replace the source code and the class_name pattern in each specification.
Run generateDS.py with the "--user-methods" (or "-u") flag.
Inspect the user methods in the generated classes.
Test your generated code.
Repeat as necessary.

13.10 Overridable methods -- generatedssuper.py

generateDS.py generates calls to several methods that each have a default implementation in a superclass. The default superclass with default implementations is included in the generated code. The user can replace this default superclass by implementing a module named generatedssuper.py containing a class named GeneratedsSuper.

What to look for in the generated code:

In the generated superclass file (generated with command line option "-o"), look for the import of module generatedssuper.py and the definition of the (default) class GeneratedsSuper.
Also look for calls to methods gds_format_integer(), gds_format_float(), gds_format_double(), etc.

To view the default implementation of class GeneratedsSuper, look in a generated superclass module (one generated by the "-o" command line option with generateDS.py). The default definition of class GeneratedsSuper is near the top of a generated module.

If you wish to modify the behavior of any of these methods, see below for instructions on how to do so.

Caution: Overriding any of the *_format_*() methods enables you to export invalid XML. So, use at your own risk, test before using, etc.

How to modify the behavior of the default methods:

Implement methods that override the default methods.
Look at the definition of the default methods in class GeneratedsSuper in order to learn the signature of the methods in that class.
Look at the definition of the default methods to determine what they do and what type of value they return, then do something similar in your overriding method.
Search for and look at the call to the method you are interested in modifying (for example gds_format_string) to learn where and when it is used and for what.

Where to put (implement) methods that override the default methods -- You can place the implementations of methods that override the default methods in the following places:

In a class named GeneratedsSuper in a separate module named generatedssuper. Since this class would replace the default implementations, you should provide implementations of all the default methods listed above in that class. To create your own version, copy and paste the default implementation of class GeneratedsSuper from your generated module into a file named generatedssuper.py, then modify that.
In individual generated (super) classes (the ones generated with the "-o" command line option) using the User Methods feature.
In individual classes in a subclass module generated with the "-s" command line option.

If you want to use the same method in more than one generated subclass, then you might consider putting that method in a "mix-in" class and inheriting that method in the generated subclass. With this approach, you must put the mix-in class containing your methods before the regular superclass, so that Python will find your custom methods before the default ones. That is, you must use:
```
class clientSub(MySpecialMethods, supermod.client):
```
not:
```
class clientSub(supermod.client, MySpecialMethods):
```
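The effect of base-class order is easy to check in isolation. In this sketch, GeneratedClient is a hypothetical stand-in for a generated superclass such as supermod.client, and MySpecialMethods overrides gds_format_string (the upper-casing is only an illustration).

```python
class GeneratedClient:
    """Hypothetical stand-in for a generated superclass (supermod.client)."""
    def gds_format_string(self, input_data, input_name=''):
        return input_data


class MySpecialMethods:
    """Mix-in holding custom overrides shared by several subclasses."""
    def gds_format_string(self, input_data, input_name=''):
        return input_data.upper()


# Mix-in first: the MRO finds MySpecialMethods.gds_format_string first.
class ClientSub(MySpecialMethods, GeneratedClient):
    pass


# Mix-in last: the default method wins and the override is never reached.
class ClientSubWrongOrder(GeneratedClient, MySpecialMethods):
    pass


print(ClientSub().gds_format_string('abc'))
print(ClientSubWrongOrder().gds_format_string('abc'))
```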

If you choose to implement module generatedssuper, here are a few instructions and suggestions:

Implement a module generatedssuper.py containing the definition of a class GeneratedsSuper.
Put this module in a location where it can be imported when your generated code is run. Note the try:except: block in your generated superclass module that attempts to import it and that uses the default implementation of GeneratedsSuper when it cannot.
An easy way to begin is to copy the default definition of the class GeneratedsSuper from a superclass module generated with the "-o" command line option into a module named generatedssuper.py. Then modify your (copied) implementation.

To implement a method that does a task specific to a particular class or a particular member of a class, do something like the following:

def gds_format_string(self, input_data, input_name=''):
    if self.__class__.__name__ == 'person':
        return '[[%s]]' % input_data
    else:
        return input_data

or:

def gds_format_string(self, input_data, input_name=''):
    if self.__class__.__name__ == 'booster' and input_name == 'lastname':
        return '[[%s]]' % input_data
    else:
        return input_data

Alternatively, to attach a method to a specific class, use the User Methods or a generated subclass module (command line option "-s"), as described above.

You can also add new methods that you call yourself (for example, in subclasses that you generate with the -s command line option for generateDS.py).

13.11 The element name to class name dictionary

generateDS.py automatically generates a dictionary that maps element/complexType names to the names of the class generated for that complexType definition. This dictionary is named GDSClassesMapping. You will find it in the module generated with the "-o" option.
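The get_root_tag example earlier looks up a module attribute with the same name as the tag, which fails when a class name differs from its tag. A lookup through GDSClassesMapping avoids that. This is a hypothetical sketch; so that it works whether a mapped value is a class name (as described above) or the class object itself, it accepts either, and it falls back to the same-name lookup.

```python
import types


def class_for_tag(module, tag):
    """Return the generated class for an element name, or None if unknown."""
    mapping = getattr(module, 'GDSClassesMapping', {}) or {}
    target = mapping.get(tag)
    if isinstance(target, str):
        # Mapped value is a class name: resolve it in the module.
        target = getattr(module, target, None)
    if target is None:
        # Fall back to a class with the same name as the tag.
        target = getattr(module, tag, None)
    return target if isinstance(target, type) else None


# Hypothetical stand-in for a generated module.
class peopleType:
    pass


fake_module = types.SimpleNamespace(
    GDSClassesMapping={'people': 'peopleType'}, peopleType=peopleType)
print(class_for_tag(fake_module, 'people'))
```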

13.12 Adding custom exported attributes and namespace prefix definitions

You can add additional attributes to exported XML content by (1) providing a module named generatedsnamespaces.py; (2) placing that module somewhere so that it can be imported when you "run" your generated module; and (3) including in generatedsnamespaces.py a global variable named GenerateDSNamespaceDefs whose value is a Python dictionary. The keys in this dictionary should be element type names in the generated module. And the values should be text strings that are attributes to be added to exported elements of that type.

Here is an example:

# file: generatedsnamespaces.py

GenerateDSNamespaceDefs = {
    "A1ElementType": 'xmlns:abc="http://www.abc.com/namespace_a1"',
    "A2ElementType": 'xmlns:abc="http://www.abc.com/namespace_a2"',
}

Notes:

While the original intention of this facility was to enable the user to add XML namespace prefix definitions to the XML content in exported files, you can use it to add other attribute definitions as well.
If you find that generateDS.py is adding a specific namespace prefix definition to many exported XML elements and you want to suppress this behavior, take a look at the --no-namespace-defs command line option. In particular, this command line option may be useful when used together with the capability described in this section (generatedsnamespaces.py).
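Because the keys must be element type names from the generated module, a misspelled key is easy to overlook. A quick consistency check can help. This is a hypothetical sketch that assumes each key should match the name of a class defined in the generated module; the module and dictionary below are stand-ins.

```python
import types


def unknown_namespace_def_keys(namespace_defs, generated_module):
    """Return the keys of namespace_defs that name no class in the module."""
    return sorted(
        key for key in namespace_defs
        if not isinstance(getattr(generated_module, key, None), type))


# Hypothetical stand-ins for generatedsnamespaces.py and a generated module.
GenerateDSNamespaceDefs = {
    "A1ElementType": 'xmlns:abc="http://www.abc.com/namespace_a1"',
    "A2ElemntType": 'xmlns:abc="http://www.abc.com/namespace_a2"',  # typo
}
fake_module = types.SimpleNamespace(
    A1ElementType=type('A1ElementType', (), {}),
    A2ElementType=type('A2ElementType', (), {}))
print(unknown_namespace_def_keys(GenerateDSNamespaceDefs, fake_module))
```

In practice you would pass the imported generatedsnamespaces.GenerateDSNamespaceDefs and your generated module instead of the stand-ins.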

13.13 Namespace prefixes, xsi:type attributes, and abstract extended types

In some cases where an instance of a type derived from an abstract type is exported and the type of the instance is specified with the attribute "xsi:type", you may need to specify the prefix for the type. Here are notes and an example on how to do that from the comments in a module generated by generateDS.py:

# Additionally, the generatedsnamespaces module can contain a python
# dictionary named GenerateDSNamespaceTypePrefixes that associates element
# types with the namespace prefixes that are to be added to the
# "xsi:type" attribute value.  See the exportAttributes method of
# any generated element type and the generation of "xsi:type" for an
# example of the use of this table.
# An example table:
#
#     # File: generatedsnamespaces.py
#
#     GenerateDSNamespaceTypePrefixes = {
#         "ElementtypeC": "aaa:",
#         "ElementtypeD": "bbb:",
#     }

14 "One Per" -- generating separate files from imported/included schemas

The generateDS.py project provides support for two approaches to this task:

The first (Approach 1 -- Command line option --one-file-per-xsd, below) is likely to be easier to use, but if it does not work for you as is, it is very difficult to customize.
The second method (Approach 2 -- Extraction and generation utilities, below) may require a little more work and understanding, but offers more options and customization, and, since the scripts that implement it are short and rather simple, may be easier to customize or even re-write for your specific needs.

14.1 Approach 1 -- Command line option --one-file-per-xsd

The --one-file-per-xsd command line option enables you to generate a separate Python module for each XML schema that is imported or included (using <xs:import> or <xs:include>) by a "master" schema. In your Python application, these modules can then be imported separately. Alternatively, these modules can be placed in a Python package (a directory containing a file named "__init__.py"). See http://docs.python.org/2/tutorial/modules.html#packages for more on Python packages.

Here is a sample use:

$ ../generateDS.py --one-file-per-xsd --output-directory="OnePer" --module-suffix="One" one_per.xsd

The above command does the following:

It generates one Python module for each XML schema that is included/imported by one_per.xsd.
It places the generated output files in the directory OnePer.
It adds "One" as a suffix to the name of each generated module.

Here are a few hints, guidelines, and suggestions:

At least one element definition in an included/imported schema must be a root (top level) element definition in order to cause a module to be generated for that schema. In other words, the schema must contain an element definition of the form:
```
<xs:element name="Sample" type="sampleType" />
```
You may want to write a separate "master" schema that includes each of the schemas for which you want to generate a separate Python module.
Use the --output-directory=<directory> command line option to tell generateDS.py to generate the Python modules in a specific directory. The directory must already exist.
Use the --module-suffix=<suffix> command line option to add a specific suffix to each module name (the part immediately before the extension). For example, the option --module-suffix=Abc causes generateDS.py to generate a file named "schema1Abc.py" instead of "schema1.py".
If you want to import files from the output directory and it is not in sys.path, add a file named "__init__.py" to that directory. The existence of the file __init__.py turns the directory into a Python package.
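Both preparation steps above (creating the output directory before running generateDS.py, and adding __init__.py) can be scripted. A minimal sketch using only the standard library; the directory name "OnePer" matches the sample command.

```python
from pathlib import Path


def prepare_output_package(directory):
    """Create the output directory and an empty __init__.py inside it."""
    path = Path(directory)
    path.mkdir(parents=True, exist_ok=True)      # must exist before generateDS.py runs
    (path / '__init__.py').touch(exist_ok=True)  # makes the directory a package
    return path


prepare_output_package('OnePer')
```

Calling it more than once is harmless. After generateDS.py has written its modules, code outside the directory can use from OnePer import <module name>, with the module names that generateDS.py actually produced.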

14.2 Approach 2 -- Extraction and generation utilities

The generateds/utils subdirectory contains two utility scripts that may help with this task. The procedure is as follows:

First, use utils/collect_schema_locations.py to collect a set of directives, one for each (included) schema and each module to be generated. This utility writes out a JSON file that contains the directives to be used in the next step.
Next, use utils/batch_generate.py to generate one module (or perhaps two modules, see below) for each directive in that JSON file.

Each of these modules gives a reasonable amount of usage information in response to the --help command line option.

A few hints and suggestions:

After generating the JSON directives file you can modify it with your text editor. For example, (1) you can add the name of sub-class modules to be generated by generateDS.py; (2) you can specify command line options to be used by generateDS.py when generating specific modules; and (3) you can add new directives to generate additional modules.
If you find yourself typing the same command line options to utils/batch_generate.py over and over, there is a facility to put command line options that have long names (i.e., not one character names) into a configuration file. The usage information produced by utils/batch_generate.py --help shows an example. Then use utils/batch_generate.py --config=myoptions.config ... to feed this configuration file to utils/batch_generate.py. The options in this configuration file can be overridden by those entered on the command line.
The JSON directives file can contain comments, even though this is not part of the JSON standard. A comment is any line that begins with "//" where the "//" is preceded only by white space characters. utils/batch_generate.py strips these lines out before parsing the JSON file. However, if you plan to process this JSON file with other tools, you will likely either not want to add comments or plan to pre-process them in some way.

15 How to modify the generated code

This section attempts to explain how to modify and add features to the generated code.

15.1 Adding features to class definitions

You can add new member definitions to a generated class. Look at the 'export' and 'exportLiteral' member functions for examples of how to access member variables and how to walk nested sub-elements.

Here are interesting places to look in each class definition:

The 'export' and 'exportLiteral' methods -- These methods walk the object tree. You can consider copying and renaming them to produce other tree walking methods.
The 'build' methods -- These methods extract information from the XML element node passed to them. You can inspect the 'build' methods to learn how to extract information for other purposes.

And, if you need methods that are common to and shared by several of the generated subclasses, you can put them in a new class and add that class to the superclass list for each of your subclasses.
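
A minimal sketch of such a shared class (a mixin), assuming two generated superclasses named person and address. These stand-ins and the describe method are hypothetical, not generated code.

```python
class ReportMixin:
    """Hypothetical helper methods shared by several subclasses."""

    def describe(self):
        # Works for any class whose instances keep their data in __dict__.
        fields = ', '.join(
            '%s=%r' % (key, value) for key, value in sorted(vars(self).items()))
        return '%s(%s)' % (type(self).__name__, fields)


# Stand-ins for two generated superclasses (hypothetical names).
class person:
    def __init__(self, name=None):
        self.name = name


class address:
    def __init__(self, city=None):
        self.city = city


# Each subclass lists the mixin alongside its generated superclass.
class personSub(ReportMixin, person):
    pass


class addressSub(ReportMixin, address):
    pass


print(personSub('Alberta').describe())
print(addressSub('Boise').describe())
```

Listing the mixin before the generated superclass puts its methods ahead of the superclass in the method resolution order, so a mixin method wins if both define the same name.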

Although you can add your own methods to the generated superclasses, I recommend that you add methods to the generated subclasses in the subclass module generated with the "-s" command line option, and then edit the subclass module in order to build your application. Why?

The superclasses are cluttered with other code. Using the subclass file enables you to keep your application code separate.
By putting your application code in the subclass file, you will be able to reuse the superclass file. You can generate multiple subclass files from the same XML Schema definition file. Each of these subclass files can import the same superclass file.
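
To see why one superclass file can serve several subclass files, it may help to look at a simplified sketch of the hook that connects them. As far as I can tell from generated code, each generated superclass has a subclass class attribute and a factory static method that the parser uses to create objects, and the subclass module assigns its class to that attribute. The sketch below is hand-written and much shorter than the real generated code; people and peopleSub are stand-ins.

```python
# --- stand-in for the generated superclass module (e.g. task1sup.py) ---
class people:
    subclass = None  # set by a subclass module, if one is imported

    def __init__(self, person=None):
        self.person = person if person is not None else []

    @staticmethod
    def factory(*args, **kwargs):
        # Objects are created through factory(), so whichever subclass
        # is registered is the class that actually gets instantiated.
        if people.subclass is not None:
            return people.subclass(*args, **kwargs)
        return people(*args, **kwargs)


# --- stand-in for a subclass module (e.g. task1sub.py) ---
class peopleSub(people):
    def count(self):
        # Application code lives here, not in the superclass module.
        return len(self.person)


people.subclass = peopleSub

obj = people.factory(person=['Alberta', 'Bill'])
print(type(obj).__name__, obj.count())
```

In this sketch subclass is a single class attribute, so if two subclass modules are imported into one process, the one imported last wins. Section 8.2 discusses using multiple subclass modules with the same superclass module.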

Here are some alternatives to using the subclass file:

Add more than one method to each generated (super-)class. Each method implements a separate task or "application". However, as the number of tasks grows, this creates maintenance difficulties.
Generate multiple (super-)class files, and add methods to the classes in each of these separate files to implement different tasks. Of course, this does not work well if you have modified a generated file (the parser, for example) after generating it, because each copy needs the same modification.

16 Examples and demonstrations

Under the directory Demos are several examples:

Demos/People provides a simple demonstration of generating Python data structures from XML Schema.
Demos/Outline contains another simple example. Also provided (in outline_extended.py) is an example of extending and adding to the generated code. Look at the show method in classes outline and node in file outline_extended.py. This extension walks the outline tree and writes out an outline.
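
I have not reproduced outline_extended.py here. As a rough idea of what a recursive show method can look like, here is a hypothetical sketch using a stand-in node class (a label plus a list of child nodes) that writes each node indented by its depth.

```python
import io


class node:
    """Stand-in for a generated outline node: a label plus child nodes."""

    def __init__(self, label, children=None):
        self.label = label
        self.children = children if children is not None else []

    def show(self, outfile, level=0):
        # Pre-order walk: write this node, then each child one level deeper.
        outfile.write('%s%s\n' % ('    ' * level, self.label))
        for child in self.children:
            child.show(outfile, level + 1)


tree = node('Outline', [
    node('Introduction'),
    node('Details', [node('Part A'), node('Part B')]),
])
buf = io.StringIO()
tree.show(buf)
print(buf.getvalue(), end='')
```

Because show decides when to write the node and when to visit the children, changing the traversal order (for example, writing children before their parent) is a matter of reordering two statements.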

Suggested uses:

Anything that requires a tree walk of the XML document structure.
The implementation of filters and transformations on XML documents. The following paper discusses and compares this technique with the use of XSLT: XSLT and generateDS -- Analysis, Comparison, and Evaluation -- http://www.reifywork.com/xsltvsgenerateds.html.
Anything that requires a customized tree walk of the XML document. Because you can add methods to the generated classes containing explicit control logic, the order in which nodes of the parsed XML document are visited is under your control.

16.1 Django -- Generating Models and Forms

generateDS.py can be used to generate Django models and Django forms that represent the data structures defined in an XML Schema.

Note: In order to use this capability, you must obtain the "source" distribution of generateDS.py. You can do this either (1) by downloading generateDS-x.xxy.tar.gz from the Python Package Index or (2) by downloading the distribution from Bitbucket at https://bitbucket.org/dkuhlman/generateds. In particular, installing generateDS.py using pip does not give you all the files you need in order to use this capability.

Note: You only need to obtain the source distribution (so that you can copy the files in the django/ directory, for example); you do not necessarily need to install from it. If you have already installed generateDS.py using pip or easy_install, you do not need to re-install from the source tree.

There are support files in the django directory in the source distribution (but not in the version installed using pip or easy_install).

Here is an overview of the process:

Step 1. Generate bindings -- Run generateDS.py.
Step 2. Extract simpleType definitions from schema -- Run gends_extract_simple_types.py.
Step 3. Generate models.py and forms.py -- Run gends_generate_django.py.

The script gends_run_gen_django.py performs these three steps.

In order to use the script gends_run_gen_django.py, you may need to tell it where the generateDS.py script is located. If so, use the "-p" command line option. For more information, do:

python gends_run_gen_django.py --help

16.1.1 How to generate Django models and forms

Warning: Running this script attempts to over-write the following files in the current directory:

<schema>lib.py
generateds_definedsimpletypes.py
models.py
forms.py

To over-write these files, use the -f (or --force) command line option.

So, it's a good idea to create a separate, new directory in which to do the following work.

Now, follow these steps:

Create an empty directory:
```
$ mkdir WorkDir
$ cd WorkDir
```
Copy the files in the django/ subdirectory of the source distribution of generateDS.py to the current directory:
```
$ cp /my_sources/generateDS-n.nn/django/* .
```
Copy the file process_includes.py in the distribution to the current directory:
```
$ cp /my_sources/generateDS-n.nn/process_includes.py .
```
Run the following:
```
$ ./gends_run_gen_django.py myschema.xsd
```
There are additional command line options for gends_run_gen_django.py. For help, run $ python gends_run_gen_django.py --help.
Copy the generated files models.py and forms.py to your Django application.

16.1.2 How it works

Here are a few notes that might be helpful if and when you need to do some debugging or extend the current capabilities or write a new "meta-app" that uses the same approach but does something new and even entirely different.

gends_run_gen_django.py uses Popen to run other scripts; specifically, it runs generateDS.py, gends_extract_simple_types.py, and gends_generate_django.py.
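
The chaining itself can be sketched as follows, assuming only that each step is a separate script that signals failure with a non-zero exit status. run_steps is a hypothetical helper, not the code in gends_run_gen_django.py, and the demo commands are placeholders rather than the real arguments to the three scripts.

```python
import subprocess
import sys


def run_steps(commands):
    """Run each command (a list of strings) in order; stop at the first failure."""
    for command in commands:
        proc = subprocess.Popen(command, stdout=subprocess.PIPE,
                                stderr=subprocess.PIPE, universal_newlines=True)
        out, err = proc.communicate()
        if proc.returncode != 0:
            raise RuntimeError('%s failed (exit %d): %s'
                               % (' '.join(command), proc.returncode, err.strip()))
        if out:
            sys.stdout.write(out)


# Placeholder steps; a real driver would run the three scripts instead.
run_steps([
    [sys.executable, '-c', 'print("step 1")'],
    [sys.executable, '-c', 'print("step 2")'],
])
```

Stopping at the first failure matters here because the final step reads the files written by the first two.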

gends_extract_simple_types.py scans the XML schema doc and extracts simpleType definitions. It writes descriptors of those definitions to the file generateds_definedsimpletypes.py.
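
The scanning part can be approximated with the standard library's ElementTree. This is a simplified, hypothetical sketch: it records only named top-level simpleType definitions that use xs:restriction, maps each one to its base type, and does not reproduce the descriptor format that gends_extract_simple_types.py writes to generateds_definedsimpletypes.py.

```python
import xml.etree.ElementTree as ET

XS = '{http://www.w3.org/2001/XMLSchema}'


def extract_simple_types(schema_text):
    """Map each named top-level simpleType to the base of its restriction."""
    root = ET.fromstring(schema_text)
    result = {}
    for simple_type in root.findall(XS + 'simpleType'):
        name = simple_type.get('name')
        restriction = simple_type.find(XS + 'restriction')
        if name and restriction is not None:
            result[name] = restriction.get('base')
    return result


schema = '''<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:simpleType name="zipCode">
    <xs:restriction base="xs:string">
      <xs:pattern value="[0-9]{5}"/>
    </xs:restriction>
  </xs:simpleType>
  <xs:simpleType name="percent">
    <xs:restriction base="xs:integer">
      <xs:minInclusive value="0"/>
      <xs:maxInclusive value="100"/>
    </xs:restriction>
  </xs:simpleType>
</xs:schema>'''
print(extract_simple_types(schema))
```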

gends_generate_django.py generates the models.py and forms.py files by calling the class method generate_model_ for each class in the list of classes in the variable __all__ in the generated bindings. __all__ is defined at the bottom of the generated bindings module.

The class method generate_model_ (along with some tables for predefined simple types, etc.) is defined in the generatedssuper.py from the django/ directory, which is imported by the generated bindings module. This version overrides the default version of the class GeneratedsSuper. generate_model_ is defined in GeneratedsSuper, which is used as the root superclass of all generated data representation classes.
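
The dispatch can be illustrated with stand-ins: because generate_model_ is a class method on the root superclass, every generated class inherits it, and cls is bound to the concrete class on each call. The signature and output of generate_model_ below are hypothetical, not those of generatedssuper.py, and bindings is a stand-in for the generated module.

```python
import types


class GeneratedsSuper:
    """Stand-in root superclass; the real one defines much more."""

    @classmethod
    def generate_model_(cls, out):
        # Hypothetical signature: append one model stub line per class.
        out.append('class %s_model(models.Model):' % cls.__name__)


class person(GeneratedsSuper):
    pass


class address(GeneratedsSuper):
    pass


# Stand-in for the generated bindings module and its __all__ list.
bindings = types.ModuleType('myschemalib')
bindings.person = person
bindings.address = address
bindings.__all__ = ['person', 'address']

lines = []
for class_name in bindings.__all__:
    getattr(bindings, class_name).generate_model_(lines)
print('\n'.join(lines))
```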

17 Sample code and extensions

17.1 Capturing xs:date elements as dates

The following extension employs a user method (see User Methods) in order to capture elements defined as xs:date as date objects.

Thanks to Lars Ericson for this code and explanation.

By default, generateDS.py treats elements declared as type xs:date as though they are strings.

To get xs:dates stored as dates, in your local copy, add the following user method (User Methods), a slight modification of the sample (in gends_user_methods.py):

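method1 = MethodSpec(name='walk_and_update',
    source='''\
    def walk_and_update(self):
        members = %(class_name)s.member_data_items_
        # Accept either a list or a dict of MemberSpec_ objects.
        if isinstance(members, dict):
            members = members.values()
        for member in members:
            name = member.get_name()
            obj1 = getattr(self, name)
            if obj1 is None:
                # Optional member that is absent from this document.
                continue
            if member.get_data_type() == 'xs:date':
                # Convert xs:date strings (or a list of them) to dates.
                if member.get_container():
                    newvalue = [date_calcs.date_from_string(v) for v in obj1]
                else:
                    newvalue = date_calcs.date_from_string(obj1)
                setattr(self, name, newvalue)
            elif member.get_container():
                # A list of children: recurse into each generated object.
                for child in obj1:
                    if hasattr(child, 'walk_and_update'):
                        child.walk_and_update()
            elif hasattr(obj1, 'walk_and_update'):
                # A single nested generated object.
                obj1.walk_and_update()
''',
    class_names=r'^.*$',
    )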

Then, define date_calcs.py as:

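#!/usr/bin/env python
# -*- coding: utf-8 -*-
"""date_calcs.py -- convert xs:date strings to datetime.date objects."""

import datetime


def date_from_string(value):
    """Convert an xs:date string such as "2007-09-01" to a datetime.date.

    Any timezone suffix (for example "Z" or "+05:00") is ignored.
    """
    year = int(value[:4])
    month = int(value[5:7])
    day = int(value[8:10])
    return datetime.date(year, month, day)


# Example:
#     print(date_from_string("2007-09-01"))    # prints 2007-09-01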
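
The slicing in date_from_string assumes a four-digit year and does not check the overall shape of the value, so, for example, "2007/09/01" is accepted. If you would rather reject bad values, a stricter variant could look like the following. date_from_xs_date is a hypothetical name. It accepts the optional timezone suffix that xs:date allows (for example Z or +05:00) and, like date_from_string, discards it.

```python
import datetime
import re

# xs:date lexical form: optional '-', a year of 4+ digits, month, day,
# then an optional timezone ('Z' or +hh:mm / -hh:mm).
_XS_DATE_RE = re.compile(r'^(-?\d{4,})-(\d{2})-(\d{2})(Z|[+-]\d{2}:\d{2})?$')


def date_from_xs_date(value):
    """Parse an xs:date string into a datetime.date, ignoring any timezone."""
    value = value.strip()
    match = _XS_DATE_RE.match(value)
    if match is None:
        raise ValueError('not an xs:date value: %r' % value)
    year, month, day = (int(part) for part in match.group(1, 2, 3))
    # datetime.date raises ValueError for years outside 1..9999 and for
    # impossible month/day combinations.
    return datetime.date(year, month, day)


print(date_from_xs_date('2007-09-01Z'))
```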

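Also, in generateDS.py, add a call to str() at the start of quote_xml, so that date objects are converted to strings before they are escaped on export: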

def quote_xml(inStr):
    s1 = str(inStr)
    s1 = s1.replace('&', '&amp;')
    s1 = s1.replace('<', '&lt;')
    s1 = s1.replace('"', '&quot;')
    return s1

Also, add these imports to TEMPLATE_HEADER in generateDS.py:

import date_calcs
import types

18 Limitations of generateDS

18.1 XML Schema limitations

Some features of XML Schema are not supported, so you may have to use a restricted subset of XML Schema to define your data structures. See section 9, Supported features of XML Schema, and see people.xsd and people.xml for examples.

Then try generateDS.py on your own XML Schema, and let me know what does not work.

19 Includes -- The XML schema xs:include and xs:import elements

While generateDS.py itself does not process XML Schema include elements, the distribution provides a script, process_includes.py, that can be used as a preprocessor. By default, generateDS.py calls process_includes.py automatically. This behavior can be turned off with the --no-process-includes command line option. However, doing so is not advised, because unexpected and undesirable behavior has been observed when this is done. Instead, consider using the --no-collect-includes and --no-redefine-groups command line options to selectively turn off specific processing done in process_includes.py.

The process_includes.py script scans your XML Schema document and, recursively, the documents it includes, looking for include elements; it inserts all of their content into a single document, which it writes out.
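
For a rough idea of what inlining involves, here is a greatly simplified sketch using the standard library's ElementTree. It handles only xs:include (not xs:import or xs:redefine), ignores the included schema's own attributes, and is not the code in process_includes.py. inline_includes and the in-memory schemas are hypothetical.

```python
import xml.etree.ElementTree as ET

XS = '{http://www.w3.org/2001/XMLSchema}'


def inline_includes(root, load, seen=None):
    """Replace each xs:include in root with the children of the included schema.

    load(location) must return the root Element of the named schema.
    Each location is inlined at most once, which also stops include cycles.
    """
    if seen is None:
        seen = set()
    # Walk backwards so that replacing one child does not shift the
    # positions of the children that have not been visited yet.
    for index in reversed(range(len(root))):
        child = root[index]
        if child.tag != XS + 'include':
            continue
        del root[index]
        location = child.get('schemaLocation')
        if location in seen:
            continue
        seen.add(location)
        included = load(location)
        inline_includes(included, load, seen)  # handle nested includes first
        for offset, item in enumerate(list(included)):
            root.insert(index + offset, item)
    return root


SCHEMAS = {
    'main.xsd': '''<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:include schemaLocation="types.xsd"/>
  <xs:element name="person" type="personType"/>
</xs:schema>''',
    'types.xsd': '''<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:complexType name="personType">
    <xs:sequence>
      <xs:element name="name" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>''',
}


def load(location):
    return ET.fromstring(SCHEMAS[location])


merged = inline_includes(load('main.xsd'), load)
print([child.get('name') for child in merged])
```

With files on disk, load could be something like lambda loc: ET.parse(os.path.join(base_dir, loc)).getroot(); note that this sketch resolves every location against a single directory rather than relative to the including document.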

Here are samples of how you might use process_includes.py, if your schema contains include elements.

Example 1:

$ python process_includes.py definitions1.xsd | \
    python generateDS.py -f --super=task1sup -o task1sup.py -s task1sub.py -

Example 2:

$ python process_includes.py definitions1.xsd tmp.xsd
$ python generateDS.py -f --super=task1sup -o task1sup.py -s task1sub.py tmp.xsd

For help and usage information, run the following:

$ python process_includes.py --help

20 Processing RelaxNG schemas

RelaxNG is a schema definition language and is an alternative to XML Schema. For more information on RelaxNG, see: http://relaxng.org/.

generateDS.py does not understand or process RelaxNG schemas. However, the trang application is able to convert RelaxNG into XML Schemas. I've tried it, and was able to convert a relatively small RelaxNG schema into an XML Schema, and then use generateDS.py to generate a bindings module from that. I have not done any serious testing to determine how complete or accurate this conversion is. trang is written in Java, so you will need Java and the JDK installed in order to compile and use it. For what it's worth, here are the steps I followed in order to use trang:

Clone jing-trang (which contains trang) from https://github.com/relaxng/jing-trang.git, and then build it with:

$ git clone https://github.com/relaxng/jing-trang.git
$ cd jing-trang
# set JAVA_HOME to location of JDK
$ export JAVA_HOME=/usr/lib/jvm/jdk-8-oracle-arm32-vfp-hflt
$ ./ant

Run trang with the following:

$ cd jing-trang/build
$ java -jar trang.jar -I rng -O xsd test01.rng test01.xsd

The above command produced test01.xsd.

Run the resulting XML Schema (test01.xsd, in the above case) through generateDS.py:

$ generateDS.py -o test01sup.py -s test01sub.py test01.xsd

You can learn more about trang at the RelaxNG web site and here: https://github.com/relaxng/jing-trang

21 Acknowledgments

Many thanks to those who have used generateDS.py and have contributed their comments and suggestions. These comments have been valuable both in teaching me about things I needed to know in order to continue work and in motivating me to do the work in the first place.

And, a special thanks to those of you who have contributed patches for fixes and new features. Recent help has been provided by the following among others:

Chris Allan -- for several feature additions.

22 See also

Python: The Python home page.

Dave's Page: My home page, which contains more Python stuff.