======================================================
generateDS -- Generate Data Structures from XML Schema
======================================================

:author: Dave Kuhlman
:contact: dkuhlman (at) reifywork (dot) com
:address:
    http://www.reifywork.com

.. Do not modify the following version comments.
   They are used by updateversion.py.

.. version

:revision: 2.44.3

.. version

:date: |date|

.. |date| date:: %B %d, %Y


:copyright: Copyright (c) 2004 Dave Kuhlman. This documentation
    and the software it describes are covered by The MIT License:
    http://www.opensource.org/licenses/mit-license.php.

:abstract: ``generateDS.py`` generates Python data structures (for
    example, class definitions) from an XML Schema document. These
    data structures represent the elements in an XML document
    described by the XML Schema. It also generates parsers that
    load an XML document into those data structures. In addition,
    a separate file containing subclasses (stubs) is optionally
    generated. The user can add methods to the subclasses in order
    to process the contents of an XML document.

.. sectnum::
    :depth: 4

.. contents::
    :depth: 4


Moving to a new repository host
===============================

``generateDS`` is moving to a new repository host.

The new repository location is here:
https://sourceforge.net/projects/generateds/

You can clone the repository with the following::

    hg clone http://hg.code.sf.net/p/generateds/code generateds

I thank Bitbucket for their excellent support.  However, Bitbucket
is discontinuing support for Mercurial, and I'd like to continue
using Mercurial for our distributed revision-control tool.


Introduction
============

``generateDS.py`` generates Python data structures (for example,
class definitions) from an XML Schema document. These data
structures represent the elements in an XML document described by
the XML Schema. It also generates parsers that load an XML
document into those data structures. In addition, a separate file
containing subclasses (stubs) is optionally generated. The user
can add methods to the subclasses in order to process the contents
of an XML document.

The generated Python code contains:

- A class definition for each element defined in the XML Schema
  document.

- A main and driver function that can be used to test the
  generated code.

- A parser that will read an XML document which satisfies the XML
  Schema from which the parser was generated. The parser creates
  and populates a tree structure of instances of the generated
  Python classes.

- Methods in each class to export the instance back out to XML
  (method ``export``) and to export the instance to a literal
  representing the Python data structure (method
  ``exportLiteral``).

The generated classes contain the following:

- A constructor method (``__init__``), with member variable
  initializers.

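- Methods with names 'getX' and 'setX' for each member variable
  'X' or, if the member variable is defined with
  maxOccurs="unbounded", methods with names 'getX', 'setX',
  'addX', and 'insertX'.  These are the old-style names; by
  default, the getters and setters are named 'get_X' and 'set_X'
  (see the ``--use-getter-setter`` command line option).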

- A "build" method that can be used to populate an instance of the
  class from a node in an element tree produced by ``lxml``.

- An "export" method that will write the instance (and any nested
  sub-instances) to a file object as XML text.

- An "exportLiteral" method that will write the instance (and any
  nested sub-instances) to a file object as Python literals (text).

The generated subclass file contains one (sub-)class definition
for each data representation class. If the subclass file is used,
then the parser creates instances of the subclasses (instead of
creating instances of the superclasses). This enables the user to
extend the subclasses with "tree walk" methods, for example, that
process the contents of the XML file. The user can also generate
and extend multiple subclass files which use a single, common
superclass file, thus implementing a number of different processes
on the same XML document type.
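
As a rough illustration of the "tree walk" idea described above, the
following sketch uses small hand-written stand-ins rather than real
generated code.  The names ``person``, ``personSub``, ``get_name``,
``get_children``, and ``walk`` are assumptions made for this example;
a real superclass module generated by ``generateDS.py`` contains
considerably more (for example, the ``build`` and ``export`` methods).

```python
# Hand-written stand-in for a class in a generated superclass module.
# Real generated classes are larger; this only mimics two getters.
class person(object):
    def __init__(self, name=None, children=None):
        self.name = name
        self.children = children if children is not None else []

    def get_name(self):
        return self.name

    def get_children(self):
        return self.children


# Stand-in for a class in the subclass module.  The user adds methods
# here instead of editing the superclass module.
class personSub(person):
    def walk(self, depth=0):
        """Yield (depth, name) for this node and all of its descendants."""
        yield depth, self.get_name()
        for child in self.get_children():
            for item in child.walk(depth + 1):
                yield item


if __name__ == "__main__":
    tree = personSub("Alice", [personSub("Bob"), personSub("Carol")])
    for depth, name in tree.walk():
        print("    " * depth + name)
```

Because the added method lives only in the subclass, the superclass
module can be regenerated from the schema without losing it.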

``generateDS.py`` can be run under either Python 2 or Python 3.  The
generated Python code (both superclass and subclass modules) can be
run under either Python 2 or Python 3.

This document explains (1) how to use ``generateDS.py``; (2) how
to use the Python code and data structures that it generates; and
(3) how to modify the generated code for special purposes.

There is also support for packaging the code you generate with
``generateDS.py``.  See `Packaging your code`_.


Where to find it
================

Download
--------

You can find the source distribution here:

- `Python Package Index --
  http://pypi.python.org/pypi/generateDS/
  <http://pypi.python.org/pypi/generateDS/>`_

- `SourceForge --
  http://sourceforge.net/projects/generateds/
  <http://sourceforge.net/projects/generateds/>`_.
  Use Mercurial to clone the repository.  Do the
  following, but possibly change "generateds-code" to the directory
  of your choice::

      hg clone http://hg.code.sf.net/p/generateds/code generateds-code

- Bitbucket -- Bitbucket is discontinuing support for Mercurial.
  The new host for the ``generateDS`` repository is
  ``SourceForge.net`` (see above), but I'll keep the Bitbucket
  repository updated until it's removed.  See:
  `https://bitbucket.org/dkuhlman/generateds
  <https://bitbucket.org/dkuhlman/generateds>`_


Support and more information
------------------------------

There is a mailing list at SourceForge:
`generateds-discuss --
https://sourceforge.net/p/generateds/mailman/generateds-discuss/
<https://sourceforge.net/p/generateds/mailman/generateds-discuss/>`_.

There is a tutorial in the distribution:
``tutorial/tutorial.html`` and at
`generateDS -- Introduction and Tutorial --
http://www.reifywork.com/generateds_tutorial.html
<http://www.reifywork.com/generateds_tutorial.html>`_.


How to build and install it
===========================

Requirements
--------------

``lxml`` is used both by ``generateDS.py`` and by the code it
generates.  ``lxml`` is available at the Python Package Index
https://pypi.python.org/pypi/lxml/ and at the ``lxml`` project home
site http://lxml.de/.

Older versions of Python XML support can sometimes cause problems.
If you receive a traceback that includes "_xmlplus", then you will
need to remove that ``_xmlplus`` package.
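
One way to check both points above before running ``generateDS.py``
is to ask Python where it would load each module from.  This is a
minimal sketch that uses only the standard library; the helper name
``module_origin`` is made up for this example.

```python
import importlib.util


def module_origin(name):
    """Return where module `name` would be loaded from, or None if absent."""
    try:
        spec = importlib.util.find_spec(name)
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    return spec.origin or "(no file origin)"


if __name__ == "__main__":
    # lxml should be found; _xmlplus should normally be absent.
    for name in ("lxml", "_xmlplus"):
        origin = module_origin(name)
        print("%-10s %s" % (name, origin if origin else "not installed"))
```

If ``_xmlplus`` is reported with a path, that path indicates which
installed package to remove.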


Installation
--------------

Decompress the ``generateDS`` distribution file. Use something
like the following::

    tar xzvf generateDS-x.xx.tar.gz

Then, the regular Distutils commands should work::

    $ cd generateDS-x.xx
    $ python setup.py build
    $ python setup.py install        # probably as root


Packaging your code
=====================

There is some support for packaging the code you generate with
``generateDS.py``.  This support helps you to produce a directory
structure with places to put sample code, sample XML instance
documents, and utility code for use with your generated module.  It
also assists you in using `Sphinx <http://sphinx.pocoo.org/>`_ to
generate documentation for your module.  The Sphinx support is
especially useful when the schema used to generate code contains
"annotation" elements that document complexType definitions.

Instructions on how to use it are here:
`How to package a generateDS.py generated library --
librarytemplate_howto.html
<librarytemplate_howto.html>`_

And the package building support itself is here:
`LibraryTemplate --
http://www.reifywork.com/librarytemplate-1.0a.zip
<http://www.reifywork.com/librarytemplate-1.0a.zip>`_.
It is also included in the generateDS distribution package.


The command line interface -- How to use it
============================================================

Running ``generateDS.py``
-------------------------

Run ``generateDS.py`` with a single argument, the XML Schema file
that defines the data structures. For example, the following will
generate Python source code for data structures described in
``people.xsd`` and will write it to the file ``people.py``. In
addition, it will write subclass stubs to the file
``peoplesubs.py``::

    python generateDS.py -o people.py -s peoplesubs.py people.xsd
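
If you regenerate code from a build script, the same command line can
be assembled in Python.  The sketch below is only an illustration: the
function names are hypothetical, and it assumes that ``generateDS.py``
is in the current directory.  It uses only the ``-f``, ``-o``, and
``-s`` options described in this document.

```python
import subprocess
import sys


def build_generateds_command(xsd_path, out_path, subclass_path=None,
                             force=False, script="generateDS.py"):
    """Return the argument list for one generateDS.py run."""
    command = [sys.executable, script]
    if force:
        command.append("-f")    # overwrite existing files without asking
    command.extend(["-o", out_path])
    if subclass_path is not None:
        command.extend(["-s", subclass_path])
    command.append(xsd_path)    # the schema is the single positional argument
    return command


def run_generateds(*args, **kwargs):
    """Build the command and run it, raising an error if it fails."""
    return subprocess.run(build_generateds_command(*args, **kwargs),
                          check=True)


if __name__ == "__main__":
    print(" ".join(build_generateds_command(
        "people.xsd", "people.py", subclass_path="peoplesubs.py")))
```

Passing the arguments as a list avoids shell quoting problems when a
file name contains spaces.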

Here is the usage message displayed by ``generateDS.py``::


    Synopsis:
        Generate Python classes from XML schema definition.
        Input is read from in_xsd_file or, if "-" (dash) arg, from stdin.
        Output is written to files named in "-o" and "-s" options.
    Usage:
        python generateDS.py [ options ] <xsd_file>
        python generateDS.py [ options ] -
    Options:
        -h, --help               Display this help information.
        -o <outfilename>         Output file name for data representation classes
        -s <subclassfilename>    Output file name for subclasses
        -p <prefix>              Prefix string to be pre-pended to the class names
        -f                       Force creation of output files.  Do not ask.
        -a <namespaceabbrev>     Namespace abbreviation, e.g. "xsd:".
                                 Default = 'xs:'.
        -b <behaviorfilename>    Input file name for behaviors added to subclasses
        -m                       Generate properties for member variables
        -c <xmlcatalogfilename>  Input file name to load an XML catalog
        --one-file-per-xsd       Create a python module for each XSD processed.
        --output-directory="XXX" Used in conjunction with --one-file-per-xsd.
                                 The directory where the modules will be created.
        --module-suffix="XXX"    To be used in conjunction with --one-file-per-xsd.
                                 Append XXX to the end of each file created.
        --subclass-suffix="XXX"  Append XXX to the generated subclass names.
                                 Default="Sub".
        --root-element="XX"      When parsing, assume XX is root element of
        --root-element="XX|YY"   instance docs.  Default is first element defined
                                 in schema.  If YY is added, then YY is used as the
                                 top level class; if YY omitted XX is the default.
                                 class. Also see section "Recognizing the top level
                                 element" in the documentation.
        --super="XXX"            Super module name in generated subclass
                                 module. Default="???"
        --validator-bodies=path  Path to a directory containing files that provide
                                 bodies (implementations) of validator methods.
        --use-old-simpletype-validators
                                 Use the old style simpleType validator functions
                                 stored in a specified directory, instead of the
                                 new style validators generated directly from the
                                 XML schema.  See option --validator-bodies.
        --use-getter-setter      Generate getter and setter methods.  Values:
                                 "old" - Name getters/setters getVar()/setVar().
                                 "new" - Name getters/setters get_var()/set_var().
                                 "none" - Do not generate getter/setter methods.
                                 Default is "new".
        --use-source-file-as-module-name
                                 Used in conjunction with --one-file-per-xsd to
                                 use the source XSD file names to determine the
                                 module name of the generated classes.
        --use-regex-module       Generated modules should import module "regex",
                                 not "re".  Default is False.
        --user-methods= <file_path>,
        -u <file_path>           Optional module containing user methods.  See
                                 section "User Methods" in the documentation.
        --custom-imports-template=<file_path>
                                 Optional file with custom imports directives
                                 which can be used via the --user-methods option.
        --no-dates               Do not include the current date in the generated
                                 files. This is useful if you want to minimize
                                 the amount of (no-operation) changes to the
                                 generated python code.
        --no-versions            Do not include the current version in the
                                 generated files. This is useful if you want
                                 to minimize the amount of (no-operation)
                                 changes to the generated python code.
        --no-process-includes    Do not use process_includes.py to pre-process
                                 included XML schema files.  By default,
                                 generateDS.py will insert content from files
                                 referenced by xs:include and xs:import elements
                                 into the XML schema to be processed and perform
                                 several other pre-procesing tasks.  You likely do
                                 not want to use this option; its use has been
                                 reported to result in errors in generated modules.
                                 Consider using --no-collect-includes and/or
                                 --no-redefine-groups instead.
        --no-collect-includes    Do not (recursively) collect and insert schemas
                                 referenced by xs:include and xs:import elements.
        --no-redefine-groups     Do not pre-process and redefine group definitions.
        --silence                Normally, the code generated with generateDS
                                 echoes the information being parsed. To prevent
                                 the echo from occurring, use the --silence switch.
                                 Also note optional "silence" parameter on
                                 generated functions, e.g. parse, parseString, etc.
        --namespacedef='xmlns:abc="http://www.abc.com"'
                                 Namespace definition to be passed in as the
                                 value for the namespacedef_ parameter of
                                 the export() method by the generated
                                 parse() and parseString() functions.
                                 Default=''.
        --no-namespace-defs      Do not pass namespace definitions as the value
                                 for the namespacedef_ parameter of the export
                                 method, even if it can be extraced from the
                                 schema.
        --external-encoding=<encoding>
                                 Encode output written by the generated export
                                 methods using this encoding.  Default, if omitted,
                                 is the value returned by sys.getdefaultencoding().
                                 Example: --external-encoding='utf-8'.
        --member-specs=list|dict
                                 Generate member (type) specifications in each
                                 class: a dictionary of instances of class
                                 MemberSpec_ containing member name, type,
                                 and array or not.  Allowed values are
                                 "list" or "dict".  Default: do not generate.
        --export=<export-list>   Specifies export functions to be generated.
                                 Value is a whitespace separated list of
                                 any of the following:
                                     write -- write XML to file
                                     literal -- write out python code
                                     etree -- build element tree (can serialize
                                         to XML)
                                     django -- load XML to django database
                                     sqlalchemy -- load XML to sqlalchemy database
                                     validate -- call all validators for object
                                     generator -- recursive generator method
                                 Example: "write etree"
                                 Default: "write"
        --always-export-default  Always export elements and attributes that
                                 a default value even when the current value
                                 is equal to the default.  Default: False.
        --disable-generatedssuper-lookup
                                 Disables the generatetion of the lookup logic for
                                 presence of an external module from which to load
                                 a custom `GeneratedsSuper` base-class definition.
        --disable-xml            Disables generation of all XML build/export
                                 methods and command line interface
        --enable-slots           Enables the use of slots for generated class
                                 members.  Requires --member-specs=dict.
        --preserve-cdata-tags    Preserve CDATA tags.  Default: False
        --cleanup-name-list=<replacement-map>
                                 Specifies list of 2-tuples used for cleaning
                                 names.  First element is a regular expression
                                 search pattern and second is a replacement.
                                 Example: "[('[-:.]', '_'), ('^__', 'Special')]"
                                 Default: "[('[-:.]', '_')]"
        --mixed-case-enums       If used, do not uppercase simpleType enums names.
                                 Default is to make enum names uppercase.
        --create-mandatory-children
                                 If a child is defined with minOccurs="1" and
                                 maxOccurs="1" and the child is xs:complexType
                                 and the child is not defined with
                                 xs:simpleContent, then in the element's
                                 constructor generate code that automatically
                                 creates an instance of the child.  The default
                                 is False, i.e. do not automatically create child.
        --import-path="string"   This value will be pre-pended to the name of
                                 files to be imported by the generated module.
                                 The default value is the empty string ("").
                                 This enables the user to produce relative
                                 import statements in the generated module that
                                 restrict the import to some module in a
                                 specific package in a package directory
                                 structure containing the generated module.
        -q, --no-questions       Do not ask questions, for example,
                                 force overwrite.
        --no-warnings            Do not print warning messages.
        --session=mysession.session
                                 Load and use options from session file. You can
                                 create session file in generateds_gui.py.  Or,
                                 copy and edit sample.session from the
                                 distribution.
        --fix-type-names="oldname1:newname1;oldname2:newname2;..."
                                 Fix up (replace) complex type names.
        -g rootelement:rootclass, --graphql=rootelement:rootclass
                                 Generate methods, functions, query, classes,
                                 and schema for GraphQL.  Specify the root
                                 element (tag) and root class for XML instance
                                 docs.
        --version                Print version and exit.

    Usage example:

        $ python generateDS.py -f -o sample_lib.py sample_api.xsd

    creates (with force over-write) sample_lib.py from sample_api.xsd.

        $ python generateDS.py -o sample_lib.py -s sample_app1.py \
                --member-specs=dict sample_api.xsd

    creates sample_lib.py superclass and sample_app1.py subclass modules;
    also generates member specifications in each class (in a dictionary).


Command line options
----------------------

The following command line options are recognized by ``generateDS.py``:

o <filename>
    Write the data representation classes to file filename.

s <filename>
    Write the subclass stubs to file filename.

p <prefix>
    Prepend prefix to the name of each generated data structure
    (class).

f
    Force generation of output files even if they already exist.
    Do not ask before over-writing existing files.

a <namespaceabbrev>
    Namespace abbreviation, for example "xsd:". The default is
    'xs:'. If the ``<schema>`` element in your XML Schema
    specifies something other than "xmlns:xs=", then you need to
    use this option. So, suppose you have the following at the
    beginning of your XML Schema file::

        <xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">

    Then you can use the following command line option::

        -a "xsd:"

    Note, however, that ``generateDS.py`` also tries to pick up the
    namespace prefix used in the XML Schema file automatically. If
    the ``<schema>`` element has an attribute "xmlns:xxx" whose value
    is "http://www.w3.org/2001/XMLSchema", then ``generateDS.py``
    will use "xxx:" as the alias for the XML Schema namespace in
    the XML Schema document.
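
To see which prefix a given schema file binds to the XML Schema
namespace (and therefore what value ``-a`` would need), something
like the following can be used.  This is a sketch built on the
standard library, not the detection code used inside
``generateDS.py``; the function name ``find_schema_prefix`` is made up
for this example.

```python
import io
import xml.etree.ElementTree as ET

XSD_NAMESPACE = "http://www.w3.org/2001/XMLSchema"


def find_schema_prefix(xsd_source, default="xs:"):
    """Return the prefix (with trailing colon) bound to the XSD namespace.

    `xsd_source` is a file name or a binary file object.  Falls back to
    `default` if no named prefix is bound to the XML Schema namespace.
    """
    # "start-ns" events deliver (prefix, uri) pairs as declarations are read.
    for _event, (prefix, uri) in ET.iterparse(xsd_source, events=("start-ns",)):
        if uri == XSD_NAMESPACE and prefix:
            return prefix + ":"
    return default


if __name__ == "__main__":
    sample = b'<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema"/>'
    print(find_schema_prefix(io.BytesIO(sample)))
```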

b <behaviorfilename>
    Input file name for behaviors to be added to subclasses.
    Specifies the name of an XML document containing
    descriptions of methods to be added to subclasses generated
    with the -s flag.  The -b flag requires the -s flag.  See the
    section on `XMLBehaviors`_ below.

m
    Generate property members and new style classes.  Causes
    generated classes to inherit from class object.  Generates
    a call to the built-in property function for each pair of
    getters and setters.  This is experimental.

c <xmlcatalogfilename>
    Specify the file to be used as an XML catalog.  This file will
    be used by process_includes.py if needed to resolve references
    in <xs:import> and <xs:include> elements in the XML Schema.  For
    more information on XML catalogs, see:
    http://en.wikipedia.org/wiki/XML_Catalog

one-file-per-xsd
    Create a separate Python module for each XML Schema document
    processed (for example, using <xs:include> or <xs:import>).  For
    help with using this option, see `"One Per" -- generating
    separate files from imported/included schemas`_.

output-directory <directory>
    When used with ``one-file-per-xsd``, create generated output
    files in path ``<directory>``.

module-suffix <suffix>
    When used with ``one-file-per-xsd``, append ``<suffix>`` to the
    end of each module name.

subclass-suffix=<suffix>
    Append suffix to the name of classes generated in the subclass
    file.  The default, if omitted, is "Sub".  For example, the
    following will append "_Action" to each generated subclass
    name::

        generateDS.py --subclass-suffix="_Action" -s actions.py mydef.xsd

    And the following will append nothing, making the superclass
    and subclass names the same::

        generateDS.py --subclass-suffix="" -s actions.py mydef.xsd

root-element=<element_name> -OR- <element_name>|<class_name>
    Make ``element_name`` the assumed root of instance documents.
    The default is the name of the element whose definition is first
    in the XML Schema document.  If ``class_name`` is also present
    (after a vertical bar), then ``class_name`` is assumed to be the
    name of the class to be created from the root (top level)
    element when parsing an XML instance document.  If
    ``class_name`` is omitted, the default class name is the same as
    ``element_name``.  This flag affects the parsing functions (for
    example, parse(), parseString(), etc.).
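
The two accepted forms of the ``root-element`` value can be
summarized in a few lines of Python.  This is an illustration of the
rule stated above, not the parsing code from ``generateDS.py``; the
function name is hypothetical.

```python
def split_root_element(value):
    """Split "element" or "element|class" into (element_name, class_name)."""
    element_name, separator, class_name = value.partition("|")
    if not separator or not class_name:
        # No class name given: the class name defaults to the element name.
        class_name = element_name
    return element_name, class_name


if __name__ == "__main__":
    print(split_root_element("people"))
    print(split_root_element("people|peopleType"))
```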

super=<module_name>
    Make module_name the name of the superclass module imported
    by the subclass module.  If this flag is omitted, the
    following is generated near the top of the subclass file::

        import ??? as supermod

    and you will need to hand edit this so the correct superclass
    module is imported.

validator-bodies=<path>
    Obtain the bodies (implementations) for validator methods for
    members defined as ``simpleType`` from files in directory
    specified by ``<path>``.  The name of the file in that
    directory should be the same as the ``simpleType`` name with
    an optional ".py" extension.  If a file is not provided for a
    given type, an empty body (``pass``) is generated.  In these
    files, lines with "##" in the first two columns are ignored
    and are not inserted.
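
The lookup rules for ``validator-bodies`` described above can be
sketched as follows.  This is an approximation written for this
document, not the code ``generateDS.py`` itself uses, and both
function names are hypothetical.

```python
from pathlib import Path


def filter_body_lines(lines):
    """Drop lines that have "##" in the first two columns."""
    return [line for line in lines if not line.startswith("##")]


def load_validator_body(directory, simple_type_name):
    """Return the validator body for a simpleType, or "pass" if none exists."""
    base = Path(directory)
    # The file is named after the simpleType, with an optional ".py" extension.
    for candidate in (base / simple_type_name,
                      base / (simple_type_name + ".py")):
        if candidate.is_file():
            lines = candidate.read_text().splitlines()
            return "\n".join(filter_body_lines(lines))
    return "pass"
```

Note that only lines with "##" in the first two columns are dropped;
an indented "##" comment is kept as part of the body.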

use-old-simpletype-validators
    ``generateDS.py`` is capable of generating validator bodies --
    the code that validates data content in an XML instance
    document and writes out warning messages if that data does not
    satisfy the facets in the ``xs:restriction`` in the
    ``xs:simpleType`` definition in the XML schema.  Use this option
    if you want to use your own validation bodies/code defined in a
    specified directory.  See option ``--validator-bodies`` for
    information on that.  *Without* this option
    (``--use-old-simpletype-validators``), the validator code will
    be generated directly from the XML schema, which is the default.

    This option can also be used to generate code that does *no*
    validation.  See `simpleType and validators`_ and `Turning off
    validation of simpleType data`_ for more information.

use-getter-setter
    ``generateDS.py`` now generates getter and setter methods (for
    variable "abc", for example) with the names get_abc() and
    set_abc(), which I believe is a more Pythonic style, instead of
    getAbc() and setAbc(), which was the old behavior.  Use this
    flag to generate getters and setters in the old style (getAbc()
    and setAbc()), to generate them in the newer style (get_abc()
    and set_abc()), which is the default, or to omit generation of
    getter and setter methods.  Possible values are:

    - "old" - Name getters/setters getVar()/setVar().
    - "new" - Name getters/setters get_var()/set_var().
    - "none" - Do not generate getter/setter methods.

    The default is "new".
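
The naming difference between the three values can be shown with a
small helper.  This only mirrors the "abc" example given above; the
function ``accessor_names`` is hypothetical and is not part of
``generateDS.py``.

```python
def accessor_names(member, style="new"):
    """Return (getter, setter) names for `member`, or None for style "none"."""
    if style == "new":
        return "get_" + member, "set_" + member
    if style == "old":
        capitalized = member[:1].upper() + member[1:]
        return "get" + capitalized, "set" + capitalized
    if style == "none":
        return None
    raise ValueError("style must be 'old', 'new', or 'none'")


if __name__ == "__main__":
    for style in ("old", "new", "none"):
        print(style, accessor_names("abc", style))
```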

use-source-file-as-module-name
    This option only has an effect when used with
    ``--one-file-per-xsd``.  It causes ``generateDS.py`` to use the
    source XML schema file names to determine the module names of the
    generated classes.  Without this option, the first root element
    is used to construct module names.  The default is False.
    
use-regex-module
    This option causes ``generateDS.py`` to generate modules that
    import the ``regex`` module instead of the ``re`` module.  The
    default is False. There are some regular expressions that
    ``regex`` handles but that ``re`` does not, for example
    ``\p{...}``.  See https://pypi.org/project/regex/ and
    https://github.com/mrabarnett/mrab-regex.
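
To find out whether a pattern from your schema needs
``use-regex-module``, you can try compiling it with the standard
``re`` module.  The helper below is a sketch with a made-up name; it
assumes a recent Python 3, where ``re`` rejects unknown escapes such
as ``\p``.

```python
import re


def compiles_with_re(pattern):
    """Return True if the standard library `re` module accepts `pattern`."""
    try:
        re.compile(pattern)
    except re.error:
        return False
    return True


if __name__ == "__main__":
    for pattern in (r"[a-z]+", r"\p{L}+"):
        print("%-10s %s" % (pattern, compiles_with_re(pattern)))
```

A pattern that fails here but is valid for the third-party ``regex``
module is a candidate for generating code with this option.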

u, user-methods=<module>
    If specified, ``generateDS.py`` will add methods to generated
    classes as specified in the indicated module.  For more
    information, see section `User Methods`_.

custom-imports-template=<file_path>
    Optional file with custom import directives
    which can be used via the ``--user-methods`` option.

no-dates
    Do not include the current date in the generated files.  This is
    useful if you want to minimize the number of (no-operation)
    changes to the generated Python code.

no-versions
    Do not include the current version in the generated files.  This is
    useful if you want to minimize the number of (no-operation)
    changes to the generated Python code.

no-process-includes
    Do not use ``process_includes.py`` to pre-process included XML
    Schema files.  By default, ``generateDS.py`` will insert content
    from files referenced by ``xs:include`` and ``xs:import``
    elements into the XML Schema to be processed.  See section
    `Include file processing`_.  Note that include processing, which
    is performed in ``process_includes.py``, is required for
    generating validator bodies from the XML schema, because the
    lxml ElementTree produced in ``process_includes.py`` is needed
    to generate the validator code.  So, using this option also
    turns off automatic generation of validator code.  Also note
    that ``process_includes.py`` performs additional tasks: it
    (1) assigns a name to each anonymous complexType, (2) processes
    (replaces) group definitions, and (3) possibly fixes complexType
    names (see command line option ``--fix-type-names``).  You likely
    do not want to use this option; its use has been reported to
    result in errors in generated modules.  Consider using
    ``--no-collect-includes`` and/or ``--no-redefine-groups`` instead.

no-collect-includes
    Do not (recursively) collect and insert schemas referenced by
    ``xs:include`` and ``xs:import`` elements.  This task is
    performed in ``process_includes.py``.

no-redefine-groups
    Do not pre-process and redefine group definitions.  This task is
    performed in ``process_includes.py``.

silence
    Normally, the code generated with generateDS echoes the
    information being parsed. To prevent the echo from occurring,
    use the --silence switch.  This switch causes generateDS.py,
    when it generates the boilerplate parsing functions (parse(),
    parseString(), parseLiteral()), to generate code that does
    *not* write export output to stdout.

namespacedef="<http://...>"
    Namespace definition to be passed in as the value for the
    ``namespacedef_`` parameter of the export() method by the generated
    parse() and parseString() functions.  If this parameter is
    specified, then the export function will insert a namespace
    prefix definition attribute in the top-most (outer-most)
    element.  (Actually, you can insert any attribute.) The default
    is an empty string.

no-namespace-defs
    Do not pass namespace definitions as the value for the
    ``namespacedef_`` parameter of the export method, even if they
    can be extracted from the schema.  The default is off.  You might
    want to consider using this in combination with the ability to
    attach namespace prefix definitions to specific element types
    during export, as described here: `Adding custom exported
    attributes and namespace prefix definitions`_.

external-encoding=<encoding>
    If an XML instance document contains character data or
    attribute values that are not in the ASCII character set, then
    that data will not be written out correctly or will throw an
    exception.  This flag enables the user to specify a character
    encoding into which character data will be encoded before it is
    written out by the export functions.  The generated export
    methods encode data using this encoding.  The default value, if
    this flag is omitted, is the value returned by
    sys.getdefaultencoding().  You can find a list of standard
    encodings here: http://docs.python.org/library/codecs.html#id3.
    Example use: --external-encoding='utf-8'.
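
The effect of the chosen encoding on non-ASCII data can be seen with
plain Python strings.  This sketch only demonstrates the encoding
step; it is not the code found in the generated export methods, and
``encode_for_export`` is a hypothetical helper.

```python
import sys


def encode_for_export(text, external_encoding=None):
    """Encode `text`, falling back to Python's default encoding."""
    encoding = external_encoding or sys.getdefaultencoding()
    return text.encode(encoding)


if __name__ == "__main__":
    value = u"caf\u00e9"    # contains one non-ASCII character
    print(repr(encode_for_export(value, "utf-8")))
    try:
        encode_for_export(value, "ascii")
    except UnicodeEncodeError as exc:
        print("ascii cannot represent this value: %s" % exc)
```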

member-specs=list|dict
    Generate member (type) specifications in each class:
    a dictionary (or list) of instances of class ``MemberSpec_``
    containing member name, type, array or not, and whether the
    item is optional (i.e. defined with minOccurs="0").  See the
    `User Methods`_ section for more information about
    ``MemberSpec_``.  Allowed values are "list" or "dict".
    Default: do *not* generate member specifications (unless
    ``--user-methods`` is specified).

export
    Specify which of the export related member methods are to be
    generated.  The value is a whitespace separated list of any of
    the following:

    - "write" -- Generate methods ``export``, ``exportAttributes``,
      and ``exportChildren``.  These methods write XML to a file.

    - "literal" -- Generate methods ``exportLiteral``,
      ``exportLiteralAttributes`` and ``exportLiteralChildren``.
      These methods write out python code.

    - "etree" -- Generate method ``to_etree``.  This method builds an
      lxml element tree, which can, for example, be serialized to
      XML using lxml's ``tostring`` function and searched with the
      lxml xpath capability.  You can also iterate over nodes in the
      tree with the node's ``getiterator``, ``iterchildren``, etc,
      and use any of lxml's other capabilities.

    - "validate" -- Generate a validator method in each complex type
      class.  When called, this method calls each applicable simple
      type validator methods on simple types (attributes and
      children) defined in this class.  See `Generating validator
      methods`_ for more information.

    - "django" -- Generate models for Django databases.

    - "sqlalchemy" -- Generate models for Sqlalchemy databases.

    - "generator" -- Generate a Python generator method that can be
      used to produce an iterable that produces each object in the
      tree of objects that represent complex types.

    For example: ``--export="write etree"`` and ``--export="write"``.  The
    default is: ``--export="write"``.

always-export-default
    Always export elements and attributes that have a default value,
    even when the current value is equal to the default.  Default: False.

disable-generatedssuper-lookup
    Disables the generation of code implementing the lookup for
    presence of an external module from which to load a custom
    replacement for the default ``GeneratedsSuper`` base-class.
    With this flag, unconditionally uses the built-in implementation
    of ``GeneratedsSuper``.  (Suggestion: In order to get a picture
    of what difference this option makes, you might consider
    generating modules both with and without it, and then comparing
    the results with ``diff``.)  The default is False.

disable-xml
    Disables generation of code that enables XML build/export
    methods and command line interface.  Actually, the code is
    there, but is commented out.  If you enable this option, the
    generated modules will *not* contain code for the following: (1)
    run as a script without explicitly running ``python`` (the
    ``#!/usr/bin/env python`` line is omitted); (2) import
    ``lxml.etree``; (3) parse an XML file; (4) export an XML file.
    (Suggestion: In order to get a picture of what difference this
    option makes, you might consider generating modules both with
    and without it, and then comparing the results with ``diff``.)
    The default is False.

enable-slots
    Enables the use of slots for generated class members.  Use of
    this option requires that you also use the following option:
    ``--member-specs=dict``.  Use of this option results in the
    generation of ``__slots__`` class variables in classes generated
    from complex types.  The effect is to reduce memory use and
    speed up processing.

preserve-cdata-tags
    Preserve CDATA tags.  Normally, CDATA tags ("<![CDATA[ ... ]]>")
    are dropped while parsing an XML instance document.  If this
    option is included, the generated code will preserve those tags
    and will write them out during export.  The default is False.

cleanup-name-list=<replacement-map>
    Specifies replacement pairs to be used when cleaning up names.
    Must be a string representation of a Python list of 2-tuples.
    The values of each pair (2-tuple) must be strings.  The first
    item of each pair is a pattern and must be a valid Python
    regular expression (see
    https://docs.python.org/2/library/re.html#module-re).  The second
    item of each pair is a string that will replace anything matched
    by the pattern.  Also see `Cleaning up names with special characters etc.`_

    The intention is to enable us to replace
    special characters in names that would cause the generation of
    invalid Python names, for example the names of generated
    classes.  However, since a string replacement is
    performed, you can replace any single character or sequence of
    characters by any other single character or sequence of
    characters.  Example:
    ``[(':', 'colon'), ('-', 'dash'), ('.', 'dot')]``.

    The default when omitted is ``[('[-:.]', '_')]``.

mixed-case-enums
    Do not uppercase the names of simpleType enums.  The default (if
    this option is omitted) is to make generated enum names
    uppercase.

create-mandatory-children
    If a child is defined with minOccurs="1" and maxOccurs="1" and
    the child is xs:complexType and the child is not defined with
    xs:simpleContent, then in the element's constructor generate
    code that automatically creates an instance of the child.  The
    default is False, i.e. do not automatically create the child.
    Note that if a value for the child's parameter is passed to the
    constructor (which overrides the default value of None), then
    the constructor will not create an instance.

import-path="string"
    This value will be pre-pended to the name of files to be
    imported by the generated module.  The default value is the
    empty string ("").  This enables the user to produce relative
    import statements in the generated module that restrict the
    import to some module in a specific package in a package
    directory structure containing the generated module.

q, no-questions
    Do not ask questions.  For example, if the "-f" command line
    option is omitted and the output file exists, then generateDS.py
    will not ask whether the file should be overwritten.  (In this
    case, when "-q" is used, "-f" must be used to force the
    output file to be written.)

no-warnings
    While running generateDS.py, do not print warning messages that
    would be written to stderr.

session=mysession.session
    Load and use options from session file. You can create a
    session file in generateds_gui.py, the graphical front-end for
    generateDS.py.  Additional options on the command line can be
    used to override options in the session file.  A session file
    is an XML document, so you can modify it with a text editor.

fix-type-names="oldname1:newname1;oldname2:newname2;..."
    Fix up (replace) complex type names.  Using this option will
    replace the following: (1) the 'name' attribute of a
    complexType; (2) the 'type' attribute of each element that
    refers to the type; and (3) the 'base' attribute of each
    extension that refers to the type.  These fixups happen before
    information is collected from the schema for code generation.
    Therefore, using this option is effectively equivalent to
    copying your schema, then editing it with your text editor, then
    generating code from the modified schema.  If a new name is not
    specified, the default is to replace the old name with the old
    name plus an added "xx" suffix.  Examples::

        $ generateDS.py --fix-type-names="type1:type1Aux"
        $ generateDS.py --fix-type-names="type1;type2:type2Repl"
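
    The format of the option value can be parsed along the following
    lines.  This is only an illustrative sketch, not the code used by
    ``generateDS.py``; the function name ``parse_fix_type_names`` is
    hypothetical::

        def parse_fix_type_names(value):
            """Map old type names to new names from an option string."""
            mapping = {}
            for item in value.split(';'):
                item = item.strip()
                if not item:
                    continue
                old, sep, new = item.partition(':')
                # No replacement given: fall back to the "xx" suffix.
                mapping[old] = new if sep and new else old + 'xx'
            return mapping

        print(parse_fix_type_names("type1;type2:type2Repl"))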

g rootelement:rootclass, graphql=rootelement:rootclass
    Generate methods, functions, query, classes, and schema for GraphQL.
    Specify the root element (tag) and root class for XML instance
    docs.

version
    Print out the current version of ``generateDS.py`` and
    immediately exit.


Name conflicts etc.
---------------------

Conflicts with Python keywords
..............................

In some cases the element and attribute names in an XML document
will conflict with Python keywords.  There are two solutions to fixing
and avoiding name conflicts:

- In an attempt to avoid these clashes, ``generateDS.py`` contains a
  table that maps names that might clash to acceptable names. This
  table is a Python dictionary named ``NameTable``. The user can
  modify existing entries in this table within ``generateDS.py``
  itself and add additional name-replacement pairs to this table, for
  example, if new conflicts occur.

- Or, you can fix additional conflicts by following these steps:

  1. Create a module named ``generateds_config.py``.

  2. Define a dictionary in that module named ``NameTable``.

  3. Place additional name mappings in that dictionary.  Here is a
     sample::

         NameTable = {
             'range': 'rangeType',
             }

  4. And, place that module where ``generateDS.py`` can import it, or
     place the directory containing that module on your
     ``PYTHONPATH`` environment variable.

  ``generateDS.py`` will attempt to import that module
  (``generateds_config.py``) and will add the name mappings in it to
  the default set of mappings in ``NameTable`` in ``generateDS.py``
  itself.
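
The merge described above might be pictured as follows.  This is a
sketch only, not the code in ``generateDS.py``: the default entries
shown are illustrative assumptions, and the helper name
``merge_user_names`` is hypothetical::

    import importlib

    # Illustrative defaults; not the actual contents of NameTable.
    NameTable = {'type': 'type_', 'float': 'float_'}

    def merge_user_names(table, module_name='generateds_config'):
        """Add mappings from an optional config module to table."""
        try:
            module = importlib.import_module(module_name)
        except ImportError:
            return table  # no config module found: keep the defaults
        # Entries from the config module are added to the defaults.
        table.update(getattr(module, 'NameTable', {}))
        return table

If ``generateds_config.py`` cannot be imported, the sketch simply
leaves the default table unchanged.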


Conflicts between child elements and attributes
...............................................

In some cases the name of a child element and the name of an
attribute will be the same.  (I believe, but am not sure, that
this is allowed by XML Schema.) Since generateDS.py treats both
child elements and attributes as members of the generated class,
this is a name conflict.  Therefore, where such conflicts exist,
generateDS.py modifies the name of the attribute by adding "_attr"
to its name.


Cleaning up names with special characters etc.
................................................

``generateDS.py`` attempts to clean up names that contain special
characters.  For example, a complexType whose name contains a dash
would generate a Python class with an invalid name.  But, you can
use this facility to make other changes to names as well.

The command line option ``--cleanup-name-list`` specifies
replacement pairs to be used when cleaning up (and modifying) names.
The value of this option must be a string representation of a Python
list of 2-tuples.  The values of each pair (2-tuple) must be
strings.  The first item of each pair is a pattern and must be a
valid Python regular expression (see
https://docs.python.org/2/library/re.html#module-re).  The second item
of each pair is a string that will replace anything matched by the
pattern.  The intention is to enable us to replace special
characters in names that would cause the generation of invalid
Python names, for example the names of generated classes.  However,
since a string replacement is performed, you can replace any single
character or sequence of characters by any other single character or
sequence of characters.

For example, the following option, in addition to performing the
default replacements of "-", ":", and "." by an underscore, would
also replace the string "Type" when it occurs at the end of a name,
by "Class"::

    --cleanup-name-list="[('[-:.]', '_'), ('Type$', 'Class')]"

This would cause the name "big-data-Type" to become "big_data_Class".

The default when this option is omitted is ``[('[-:.]', '_')]``.

The order of replacements performed is the same as the order of the
tuples in the list.  So, replacements performed by pattern
replacement pairs (2-tuples) later in the list (to the right) will
be performed after those earlier (to the left), and may overwrite
earlier replacements.
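
The ordered replacement described above can be sketched with the
standard ``re`` module.  The function name ``cleanup_name`` is
hypothetical and this is not the code in ``generateDS.py``; it only
illustrates how the pairs are applied from left to right::

    import ast
    import re

    def cleanup_name(name, option_value):
        """Apply each (pattern, replacement) pair to name, in order."""
        # The option value is the string form of a list of 2-tuples.
        pairs = ast.literal_eval(option_value)
        for pattern, replacement in pairs:
            name = re.sub(pattern, replacement, name)
        return name

    option = "[('[-:.]', '_'), ('Type$', 'Class')]"
    print(cleanup_name('big-data-Type', option))  # big_data_Class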

See the notes on the command line option ``--cleanup-name-list`` for
more on this.  Or, run ``$ generateDS.py --help``.


The graphical user interface -- How to use it
==============================================================

**Note:** The graphical user interface is no longer supported.

Here are a few notes on how to use the GUI front-end.

- ``generateds_gui.py`` is installed when you do the standard
  installation::

      $ python setup.py install

- Run it by typing the following at the command line::

      $ generateds_gui.py

- For help with command line options, run::

      $ generateds_gui.py --help

- For a description of the values and flags that you can set, see
  section `Running generateDS.py`_.  There are also tool tips on
  the various widgets in the graphical user interface.

- Generate the python bindings modules by using the
  ``Tools/Generate`` menu item or the ``Generate`` button at the
  bottom of the window.

- Capture the command line generated by using the
  ``Tools/Capture command line`` menu item.  You might consider copying and
  pasting that command line into a shell script or batch file for
  repeated reuse.

- You can also save and later reload your values and
  flags in a session file.  See the ``Save session``,
  ``Save session as``, and ``Load session`` items under the ``File`` menu.
  By default, a session file has the extension ".session".

- You can load a session on start-up with the "-s" or "--session"
  command line options.  For example::

      $ generateds_gui.py --session=mybindingsjob.session

  Or, use the "session" option in a configuration file.

- If the command to be run when generating bindings is not
  standard, you can specify that command with the "--exec-path"
  command line option or with the "exec-path" option in a
  configuration file.  The default is "generateDS.py".

- Command line options can also be specified in a configuration
  file.  ``generateds_gui.py`` checks for that configuration file in
  the following locations in this order:

  1. ``~/.generateds_gui.ini``

  2. ``./generateds_gui.ini``

  Here is a sample configuration file::

      [general]
      exec-path: /usr/bin/python ~/bin/generateDS.py
      impl-path: generateds_gui.glade
      session: a1.session

  Options on the command line override options in configuration
  files.


Common problems
===============

Namespace prefix mis-match
--------------------------

``generateDS.py`` is not very intelligent about detecting what
prefix is used in the schema file for the XML Schema namespace.
When this problem occurs, you may see the following when running
``generateDS.py``::

    AttributeError: 'NoneType' object has no attribute 'annotate'

``generateDS.py`` assumes that the XML Schema namespace prefix in
your schema is "xs:".

So, if the XML Schema namespace prefix in your schema is not "xs:",
you will need to use the "-a" command line option when you run
``generateDS.py``.  Here is an example::

    generateDS.py -a "xsd:" --super=mylib -o mylib.py -s myapp.py someschema.xsd


Using multiple subclass modules with the same superclass module
-----------------------------------------------------------------

Suppose that from a single XML schema, you have generated a
superclass module and a subclass module (using the "-o" and "-s"
command line options).  Now you make a copy of the subclass module.
Next you add special and different code to each of the subclass
modules.  You can run these two subclass modules separately and
(after a bit of debugging) each works fine.  And, you can import
each subclass module in separate applications, and things are still
good.  However, if you import both subclass modules into a single
application, you find that one of them is "ignored" by the
superclass module when it parses XML instance documents and builds
classes.  Effectively, each subclass module, when it is imported,
sets a class variable (``subclass``) in each superclass to the
subclass to be used by the superclass, and the last subclass
module imported wins.

There are two alternative solutions to this problem:

1. Use the script/function provided by the distribution in file
   ``fix_subclass_refs.py``.  The doc string in that module explains
   how to use it and gives an example of its use.

2. Each generated superclass module (starting with ``generateDS.py``
   version 2-19a) contains a global variable
   ``CurrentSubclassModule_``.  The value of this variable, if it is
   not None, overrides the value of the class variable ``subclass``
   in each generated superclass.  You can change the value of this
   variable before parsing an XML document and building instances of
   the generated classes to determine which subclass module is to be
   used during the "build" phase.

   Here is an example of the use of this feature::

       #!/usr/bin/env python

       import lib01suba
       import lib01subb

       def test():
           lib01suba.supermod.CurrentSubclassModule_ = lib01suba
           roota = lib01suba.parse('test01.xml', silence=True)
           lib01subb.supermod.CurrentSubclassModule_ = lib01subb
           rootb = lib01subb.parse('test01.xml', silence=True)
           roota.show()
           print('-' * 50)
           rootb.show()

       test()

The second alternative (above) is likely to be a more convenient
solution in most cases.  But, there are possibly use cases where the
use of ``fix_subclass_refs.py`` or a modified version of it will be
helpful.


Supported features of XML Schema
================================

The following constructs, among others, in XML Schema are
supported:

- Attributes of types xs:string, xs:integer, xs:float, and
  xs:boolean.

- Repeated sub-elements specified with maxOccurs="unbounded".

- Sub-elements of simple types xs:string, xs:integer, and xs:float.

- Sub-elements of complex types defined separately in the XML
  Schema document.

See file people.xsd for examples of the definition of data types
and structures. Also see the section on `The XML Schema Input
to generateDS`_.


Attributes + no nested children
-------------------------------

Element definitions that contain attributes but *no* nested child
elements provide access to their data content through getter and
setter methods ``getValueOf_`` and ``setValueOf_`` and member
variable ``valueOf_``.


Mixed content
-------------

Elements that are defined to contain both text and nested child
elements have "mixed content".  ``generateDS.py`` provides access
to mixed content, but the generated data structures (classes) are
fundamentally different from those generated for other elements.
See section `Mixed content`_ for more details.

Note that elements defined with attributes but with *no* nested
sub-elements do not need to be declared as "mixed".  For these
elements, character data is captured in a member variable
``valueOf_``, and can be accessed with member methods
``getValueOf_`` and ``setValueOf_``.


anyAttribute
------------

``generateDS.py`` supports ``anyAttribute``.  For example, if an
element is defined as follows::

    <xs:element name="Tool">
       <xs:complexType>
          <xs:attribute name="PartNumber" type="xs:string" />
          <xs:anyAttribute processContents="skip" />
       </xs:complexType>
    </xs:element>

Then ``generateDS.py`` will generate a class with a member
variable ``anyAttributes_`` containing a dictionary.  Any
attributes found in the instance XML document that are not
explicitly defined for this element will be stored in this
dictionary.  ``generateDS.py`` also generates getters and setters
as well as code for parsing and export. ``generateDS.py`` ignores
``processContents``. See section `anyAttribute`_ for more details.
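
The behavior described above can be pictured with a small stand-in
that uses only the standard library.  It is not generated code: the
function ``split_attributes`` and the sample element are
hypothetical::

    import xml.etree.ElementTree as ET

    def split_attributes(node, declared):
        """Separate declared attributes from all the others."""
        known = {}
        any_attributes = {}
        for name, value in node.attrib.items():
            if name in declared:
                known[name] = value
            else:
                # Not declared for this element: keep it in the dict.
                any_attributes[name] = value
        return known, any_attributes

    node = ET.fromstring('<Tool PartNumber="A12" color="red" size="3"/>')
    known, anyAttributes_ = split_attributes(node, {'PartNumber'})
    print(known)           # {'PartNumber': 'A12'}
    print(anyAttributes_)  # {'color': 'red', 'size': '3'}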

Element extensions
------------------

``generateDS.py`` now generates subclasses for extensions, that is,
when an element definition contains something like this::

    <xs:extension base="sometag">

**Limitation** -- There is an important limitation, however:
member names duplicated (overridden ?) in an extension generate
erroneous code.  Sigh. I guess I needed something more to do.

Several of the generated methods have been refactored so that
subclasses can reuse the code in their superclasses.  Take a look
at the generated code to learn how to use it.

The Python compiler/interpreter requires that it has seen a
superclass before it sees the subclass that uses it.  Because of
this, ``generateDS.py`` delays generating a subclass until after
its superclass has been generated.  Therefore, the order in which
classes are generated may be different from what you expect.
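
The ordering constraint can be illustrated with a short sketch.  It
is not the algorithm used by ``generateDS.py``; the function
``generation_order`` and its input format are hypothetical, and it
assumes there are no circular extensions::

    def generation_order(bases):
        """Order class names so each base precedes its subclasses.

        bases maps a class name to its base class name (or None).
        """
        ordered = []
        done = set()

        def emit(name):
            if name in done:
                return
            base = bases.get(name)
            if base is not None:
                emit(base)  # generate the superclass first
            done.add(name)
            ordered.append(name)

        for name in bases:
            emit(name)
        return ordered

    print(generation_order({'dog': 'animal', 'animal': None}))
    # ['animal', 'dog']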


Attribute groups
----------------

``generateDS.py`` now handles definition and use of attribute
groups.  For example: the use of something like the following::


    <xs:attributeGroup name="favorites">
        <xs:attribute name="fruit" />
        <xs:attribute name="vegetable" />
    </xs:attributeGroup>

And, a reference or use like the following::

    <xs:element name="person" type="personType"/>
    <xs:complexType name="personType" mixed="0">
        <xs:attributeGroup ref="favorites" />
    </xs:complexType>

Results in generation of class ``personType`` that contains members
``fruit`` and ``vegetable``.

Multiple levels of attributeGroups are supported, that is, attribute
groups themselves can contain references to other attribute groups.


Substitution groups
-------------------

``generateDS.py`` now handles a limited range of substitution
groups, but there is an important **limitation**:
``generateDS.py`` handles substitution groups that involve complex
types, but does not handle those that involve (substitute for)
simple types (for example, xs:string, xs:integer, etc).  This is
because the code generated for members defined as simple types
does not provide the needed information to handle substitution
groups.


Primitive types
---------------

``generateDS.py`` supports some, but not all, simple types defined
in "XML Schema Part 0: Primer Second Edition"
(http://www.w3.org/TR/xmlschema-0/; see section "Simple Types" and
appendix B).  Validation is performed for some simple types.  When
performed, validation is done while the XML document is being read
and instances are created.

Here is a list of supported simple types:

- ``xs:string`` -- No validation.

- ``xs:token`` -- No validation.  White space between tokens is
  coerced to a single blank between tokens.

- ``xs:integer``, ``xs:short``, ``xs:long``, ``xs:int`` -- All
  treated the same.  Checked for valid integer.

- ``xs:float``, ``xs:double``, ``xs:decimal`` -- All treated the
  same.  Checked for valid float.

- ``xs:positiveInteger`` -- Checked for valid range (> 0).

- ``xs:nonPositiveInteger`` -- Checked for valid range (<= 0).

- ``xs:negativeInteger`` -- Checked for valid range (< 0).

- ``xs:nonNegativeInteger`` -- Checked for valid range (>= 0).

- ``xs:date``, ``xs:dateTime`` -- All treated the same.  No
  validation.

- ``xs:boolean`` -- Checked for one of ``0``, ``false``, ``1``,
  ``true``.
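
A few of the checks listed above can be sketched in plain Python.
These helper functions are hypothetical illustrations of the rules,
not the code that ``generateDS.py`` generates::

    def normalize_token(value):
        """Collapse runs of white space to single blanks (xs:token).

        Leading and trailing white space is also removed.
        """
        return ' '.join(value.split())

    def is_valid_boolean(value):
        """Accept only the four lexical forms listed above."""
        return value in ('0', 'false', '1', 'true')

    def is_valid_positive_integer(value):
        """Check for a valid integer in the range > 0."""
        try:
            return int(value) > 0
        except ValueError:
            return False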


simpleType
----------

``generateDS.py`` generates minimal support for members defined as
``simpleType``.  However, the code generated by ``generateDS.py``
does **not** enforce restrictions.  For notes on how to enforce
restrictions, see section `simpleType and validators`_.

A ``simpleType`` can be a restriction on a primitive type or on a
defined element type.  So, for example, the following will
generate valid code::

    <xs:element name="percent">
        <xs:simpleType>
            <xs:restriction base="xs:integer">
                <xs:minInclusive value="1"/>
                <xs:maxInclusive value="100"/>
            </xs:restriction>
        </xs:simpleType>
    </xs:element>

And, the following will also generate valid code::

    <xs:simpleType name="emptyString">
        <xs:restriction base="xs:string">
            <xs:whiteSpace value="collapse"/>
        </xs:restriction>
    </xs:simpleType>

    <xs:element name="merge">
        <xs:complexType>
            <xs:simpleContent>
                <xs:extension base="emptyString">
                    <xs:attribute name="fromTag" type="xs:string"/>
                    <xs:attribute name="toTag" type="xs:string"/>
                </xs:extension>
            </xs:simpleContent>
        </xs:complexType>
    </xs:element>


List values, optional values, maxOccurs, etc.
---------------------------------------------

For elements defined with ``maxOccurs="unbounded"``,
``generateDS.py`` generates code that processes a list of elements.

For elements defined with ``minOccurs="0"`` and ``maxOccurs="1"``,
``generateDS.py`` generates code that exports an element only if
that element has a (non-None) value.


simpleType and validators
-------------------------

Generating validator bodies from XML schema
.............................................

If you do *not* use the ``--use-old-simpletype-validators`` command line
option, then ``generateDS.py`` will generate validation code
directly from the restrictions specified inside the ``simpleType``
definitions in your XML schema.

Here is a bit of explanation of what that generated code will do.

- The generated validation code checks the global variable
  ``Validate_simpletypes_``.  Set that variable to ``False`` to turn
  off validation.

- In the case of some XML schema built-in simple types, the
  generated validation code calls ``gds_validate_xxx``, where "xxx"
  is a base, simple type.  In some cases, you will be able to add
  additional code to that method to perform custom checking.  See
  section `Overridable methods -- generatedssuper.py`_ for
  information on how to use and override that class.

- When validation finds data that fails to validate, it generates a
  warning (using the ``warnings`` module from the Python standard
  library), not an exception, so that processing continues.

- The validation code is generated in a separate method named
  ``validate_xxx``, where "xxx" is the name of the data element.
  This method is called in the ``build`` method as the input data is
  parsed and instances of the generated classes are created to hold
  it.  Your own code can also call this method whenever you'd like
  to perform validation on the data in that element/field.

- There are rules for how checking should be performed when (1)
  there are multiple restrictions in a single ``simpleType`` and
  (2) there are restrictions in a ``simpleType`` and its base simple
  types.  ``generateDS.py`` attempts to follow those rules in
  generating validation code.  For information about that, see:
  http://www.w3.org/TR/xmlschema-2/#rf-facets.  Pattern facets are
  especially tricky, because pattern restrictions at the same level
  are OR-ed together, while pattern restrictions at different levels
  are AND-ed together.  See:
  http://www.w3.org/TR/xmlschema-2/#rf-pattern.

- The validation method also performs type conversion for some
  simple types, for example, string to int for integers, string to
  float for floats, etc.
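
The OR/AND rule for pattern facets, combined with a warning in place
of an exception, can be sketched as follows.  The function names are
hypothetical and this is not the generated code.  The sketch also
assumes that the patterns are valid Python regular expressions,
although XML Schema regular expression syntax differs in some
details::

    import re
    import warnings

    def matches_patterns(value, levels):
        """Check value against pattern facets.

        levels holds one inner list of patterns per derivation
        level.  Patterns within a level are OR-ed together; the
        levels are AND-ed together.
        """
        return all(
            any(re.fullmatch(pattern, value) for pattern in level)
            for level in levels)

    def validate(value, levels):
        """Warn, but do not raise, when value fails to validate."""
        if not matches_patterns(value, levels):
            warnings.warn(
                'Value "%s" does not match pattern facets' % value)
        return value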


User written validator bodies
...............................

This is the older, more manual method.  In order to generate code
that uses this method, use command line option
``--use-old-simpletype-validators``.

Here are a few notes that should help you *write your own* validator
methods to enforce restrictions.

- Default behavior -- The generated code, by default, treats the
  value of a member whose type is a ``simpleType`` as if it were
  declared as type ``xs:string``.

- Validator method stubs -- For a member variable name declared as a
  ``simpleType`` named ``X``, a validator method ``validate_X`` is
  generated.  Example -- from::

      <xs:simpleType name="tAnyName">
          <xs:restriction base="xs:string"/>
      </xs:simpleType>

  The class generated by ``generateDS.py`` will contain the
  following method definition::

      def validate_tAnyName(self, value):
          # Validate type tAnyName, a restriction on xs:string.
          pass

- Calls to validator methods -- For a member variable declared as a
  ``simpleType`` ``X``, a call to ``validate_X`` is added to the
  build method.  Example -- from::

      <xs:element name="person">
          <xs:complexType mixed="0">
              <xs:sequence>
                  <xs:element name="test2" type="tAnyName"/>
              </xs:sequence>
          </xs:complexType>
      </xs:element>

  ``generateDS.py`` produces the following call::

      self.validate_tAnyName(self.test2)    # validate type tAnyName


- Code bodies for validator methods can be added either (1)
  manually or (2) automatically from an external source.  See
  command line option "--validator-bodies" and see below.

You can add code to the validator method stub to enforce the
restriction for the base type and further restrictions imposed on
that base type.  This can be done in the following ways:

1. Add code manually after generation.  I recommend that you use
   the "-s" command line option and override the validator method
   in the resulting subclass file.

2. Or, supply code bodies (implementations) in an external
   source and ask ``generateDS.py`` to insert those code bodies
   into generated validator methods.  Here are notes on how to do
   this:

   - Use the "--validator-bodies=path" command line option to specify
     a directory.

   - In that directory, provide one file for each ``simpleType``.
     The name of the file should be the same as the name of
     the ``simpleType`` with an optional extension ".py".
     ``generateDS.py`` looks for a file named ``type_name.py``,
     first, and if not found, looks for a file named
     ``type_name``.

   - If the "--validator-bodies" option is not on the command line
     or neither ``type_name.py`` nor ``type_name`` is found, an
     empty body (a ``pass`` statement) is generated.

   - Lines from the file are inserted as is, except that lines
     containing "##" in the first two columns are omitted.  Note
     that you will need to provide the correct indentation for a
     method in a class, specifically 8 spaces.
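
The lookup and filtering rules above can be summarized in a short
sketch.  The function ``load_validator_body`` is hypothetical and is
not taken from ``generateDS.py``; it only restates the rules in
code::

    import os

    def load_validator_body(directory, type_name):
        """Return the body lines for the validator of type_name."""
        if directory:
            # Look for type_name.py first, then for type_name.
            for filename in (type_name + '.py', type_name):
                path = os.path.join(directory, filename)
                if os.path.isfile(path):
                    with open(path) as infile:
                        lines = infile.read().splitlines()
                    # Omit lines with "##" in the first two columns.
                    return [line for line in lines
                            if not line.startswith('##')]
        # No option given or no file found: generate an empty body.
        return ['        pass']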

The support for ``simpleType`` in ``generateDS.py`` has the
following limitations (among others, I'm sure):

- It only works for ``simpleType`` defined with and referenced
  through a name.  It does not work for "in-line" definitions.
  So, for example, the following works::

      <xs:element name="person">
          <xs:complexType>
              <xs:sequence>
                  <xs:element name="test3" type="tAnyName"/>
              </xs:sequence>
          </xs:complexType>
      </xs:element>

      <xs:simpleType name="tAnyName">
          <xs:restriction base="xs:string"/>
      </xs:simpleType>

  But, the following does not work::

      <xs:element name="person">
          <xs:complexType>
              <xs:sequence>
                  <xs:element name="test3">
                      <xs:simpleType name="tAnyName">
                          <xs:restriction base="xs:string"/>
                      </xs:simpleType>
                  </xs:element>
              </xs:sequence>
          </xs:complexType>
      </xs:element>

- Attributes defined as a simple type are not supported.


Turning off validation of ``simpleType`` data
...............................................

If you do not want validation performed on ``simpleType`` data, you
have these options:

1. When generating your code, use the
   ``--use-old-simpletype-validators`` command line option but do
   *not* use the ``--validator-bodies`` command line option.  This
   will result in validator methods that have empty bodies (only a
   ``pass`` statement).

2. Or, when you run your generated code, set the variable
   ``Validate_simpletypes_`` to ``False``.  This global variable is
   near the top of your generated module.  It can be set to ``True``
   or ``False`` before and during processing to turn validation on
   and off.


Additional notes on ``simpleType`` validation
...............................................

Don't forget that ``xmllint`` can also be used to perform
validation against the XML schema.  This validation includes
checking against ``simpleType`` restrictions.  See
http://xmlsoft.org/ for more information on ``xmllint``.


Include file processing
-----------------------

By default, generateDS.py will insert content from files referenced by
``include`` elements into the XML Schema to be processed.  This
behavior can be turned off by using the "--no-process-includes"
command line option.

``include`` elements are processed and the referenced content is
inserted in the XML Schema by importing and using
``process_includes.py``, which is included in the ``generateDS.py``
distribution.

The include file processing is capable of retrieving included files
via FTP and HTTP internet protocols as well as from the local file
system.


Abstract types
--------------

``generateDS.py`` has support for abstract types.  For more on
this, see:
`XML Schema Part 0: Primer Second Edition: Abstract Elements and Types --
http://www.w3.org/TR/xmlschema-0/#abstract
<http://www.w3.org/TR/xmlschema-0/#abstract>`_.


Types derived by extension
--------------------------

This section describes some of the support for types derived by
extension and also how to use the data bindings generated for those
types in Python.

For example, suppose you have an XML schema that looks like this
(``example.xsd``)::

    <?xml version="1.0"?>
    <xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema" version="1.0">

    <xs:element name="animalCollection">
      <xs:complexType>
        <xs:sequence>
          <xs:element name="animal" type="animal" maxOccurs="unbounded"/>
        </xs:sequence>
      </xs:complexType>
    </xs:element>

    <xs:complexType name="animal" abstract="true"></xs:complexType>

    <xs:complexType name="dog">
      <xs:complexContent>
        <xs:extension base="animal">
          <xs:sequence>
            <xs:element name="name" type="xs:string"/>
          </xs:sequence>
        </xs:extension>
      </xs:complexContent>
    </xs:complexType>
    </xs:schema>

An XML instance document for this document type might be the
following::

    <?xml version="1.0"?>
    <animalCollection xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
        <animal xsi:type="dog">
            <name>fido</name>
        </animal>
    </animalCollection>

Question: How would you, in Python, using bindings generated by
``generateDS.py``, create an instance of type ``dog`` that is
derived from type ``animal`` and when exported to XML, appears as an
animal with attribute ``xsi:type="dog"``?

First, we need to generate our bindings::

    $ generateDS.py -o example01.py example.xsd

And, now, here is some Python code that creates those instances and
exports them::

    # sample01.py

    import sys
    import example01

    def test():
        animal_collection = example01.animalCollection()
        animal = example01.dog(name='milicent')
        #
        # must set original_tagname_ and extensiontype_ for
        # type derived by extension.  See:
        # https://www.w3.org/TR/2004/REC-xmlschema-0-20041028/#DerivExt
        animal.original_tagname_ = 'animal'
        animal.extensiontype_ = 'dog'
        animal_collection.add_animal(animal)
        animal_collection.export(sys.stdout, 0)
        return animal_collection, animal

    test()

Notes:

- The above code creates an instance of class ``animalCollection``
  and an instance of class ``dog``.

- Because we want the ``dog`` to be represented in XML as a
  "<animal>" with an "xsi:type" attribute, we must set the
  ``original_tagname_`` and ``extensiontype_`` attributes in the
  instance of class ``dog``.

- Then we add our ``dog`` to the ``animalCollection``, and finally,
  we export it.

- We can get some clues about this by reading the code generated for
  classes ``animalCollection``, ``animal``, and ``dog``.

When we run it, we'll see::

    $ python sample01.py
    <animalCollection>
        <animal xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="dog">
            <name>milicent</name>
        </animal>
    </animalCollection>
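
One way to check the result, sketched here with only the standard
library, is to parse the exported text and read the ``xsi:type``
attribute.  The XML string below is copied from the output above;
the variable names are illustrative only::

    import xml.etree.ElementTree as ET

    XSI_NS = 'http://www.w3.org/2001/XMLSchema-instance'

    exported = '''<animalCollection>
        <animal xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:type="dog">
            <name>milicent</name>
        </animal>
    </animalCollection>'''

    root = ET.fromstring(exported)
    animal = root.find('animal')
    # Namespaced attributes are keyed as "{namespace-uri}localname".
    derived_type = animal.get('{%s}type' % XSI_NS)
    pet_name = animal.findtext('name')
    print(derived_type, pet_name)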

For more information on types derived by extension, see "XML Schema
Part 0: Primer Second Edition", specifically:

- "Deriving Types by Extension" --
  https://www.w3.org/TR/2004/REC-xmlschema-0-20041028/#DerivExt

- "Using Derived Types in Instance Documents" --
  https://www.w3.org/TR/2004/REC-xmlschema-0-20041028/#UseDerivInInstDocs


Duplicate unqualified type names
----------------------------------

``generateDS.py`` can handle schemas that define the same
unqualified name in separate name spaces, although the solution is
not, perhaps, ideal.  This is done by renaming the duplicate name.
Doing so is necessary because all definitions must be generated in a
single module, and we cannot, for example, generate multiple classes
with the same name.  

A dictionary ``RenameMappings_`` in the generated module contains the
mapping of original qualified name to new name.  Here is an
example::

    RenameMappings_ = {
        "{http://www.example.com/IPO_US}Address": "Address2",
        "{http://www.example.com/IPO_US}BaseAddress": "BaseAddress1",
        "{http://www.example.com/IPO_US}Postcode": "Postcode4",
        "{http://www.example.com/IPO_US}ProvinceState": "ProvinceState3",
    }

If you need look-up in the reverse direction, you can try (under
Python 3) using something like the following::

  [ins] In [11]: {k: v for (v, k) in my_module.RenameMappings_.items()}
  Out[11]:
  {'Address2': '{http://www.example.com/IPO_US}Address',
   'BaseAddress1': '{http://www.example.com/IPO_US}BaseAddress',
   'Postcode4': '{http://www.example.com/IPO_US}Postcode',
   'ProvinceState3': '{http://www.example.com/IPO_US}ProvinceState'}
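
For look-up in the forward direction, a small helper can fall back
to the unqualified name when a type was not renamed.  This is a
sketch only: the function name ``class_name_for`` is made up for
illustration, the dictionary is a shortened copy of the example
above, and the fall-back assumes that types that were not renamed
keep their unqualified name::

    RenameMappings_ = {
        "{http://www.example.com/IPO_US}Address": "Address2",
        "{http://www.example.com/IPO_US}BaseAddress": "BaseAddress1",
    }

    def class_name_for(namespace, local_name, mappings=RenameMappings_):
        """Return the generated name for a qualified type name."""
        qualified = '{%s}%s' % (namespace, local_name)
        # Assumption: a type with no entry was not renamed.
        return mappings.get(qualified, local_name)

    renamed = class_name_for('http://www.example.com/IPO_US', 'Address')
    unchanged = class_name_for('http://www.example.com/IPO', 'Address')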


Mapping name spaces to defined types
--------------------------------------

Near the bottom of modules generated by ``generateDS.py`` (using
"-o") is a generated global variable ``NamespaceToDefMappings_``.
This variable contains a mapping (a Python dictionary) from name
space URIs to a list of the ``xs:complexType`` and ``xs:simpleType``
types defined under that name space.  Actually, for each type there
is a 3-tuple containing (1) the type name (for a ``xs:complexType``
this corresponds to the name of the Python class generated for that
type), (2) the name of the XML schema file in which the type is defined,
and (3) "CT" for a ``xs:complexType`` or "ST" for a
``xs:simpleType``.
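
To make the shape of this dictionary concrete, here is a sketch
that lists the complex types for one name space.  The sample data
is invented for illustration (the URI, type names, and file name do
not come from a real schema); only the layout follows the
description above, and ``complex_type_names`` is a hypothetical
helper::

    NamespaceToDefMappings_ = {
        'http://www.example.com/IPO': [
            ('PurchaseOrderType', 'ipo.xsd', 'CT'),
            ('SkuType', 'ipo.xsd', 'ST'),
        ],
    }

    def complex_type_names(mappings, namespace):
        """Return names of the xs:complexType definitions in a name space."""
        return [
            name
            for (name, schema_file, kind) in mappings.get(namespace, [])
            if kind == 'CT'
        ]

    names = complex_type_names(
        NamespaceToDefMappings_, 'http://www.example.com/IPO')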


The XML schema input to generateDS
==================================

**Note:** Quite a bit of work has been done on ``generateDS.py``
since this section was written.  So, it accepts and processes
more of the features in XML Schema than it did earlier.  The best advice is to
give it a try on your schema.  If it works, great.  If it does not,
post a message to the list:
`generateds-discuss --
https://sourceforge.net/p/generateds/mailman/generateds-discuss/
<https://sourceforge.net/p/generateds/mailman/generateds-discuss/>`_.


``generateDS.py`` actually accepts a subset of XML Schema.
The sample XML Schema file should give you a picture of how to
describe an XML file and the Python classes that you will
generate. And here are some notes that should help:

- Specify the tag in the XML file and the name of the generated
  Python class in the name attribute on the xs:element. For
  example, to generate a Python class named "person", which will
  be populated from an XML element/tag "person", use the following
  XML Schema snippet::

      <xs:element name="person" ...

- To specify a data member for a generated Python class that will
  be populated from an attribute in an element in an XML file,
  use the XML Schema xs:attribute. For attributes, generateDS
  recognizes the following types: "xs:string", "xs:integer", and
  "xs:float". For example, the following adds member data items
  "hobby" and "category" with types "xs:string" and "xs:integer"::

    <xs:element name="person">
        <xs:complexType>
            <xs:attribute name="hobby" type="xs:string" />
            <xs:attribute name="category" type="xs:integer" />
        </xs:complexType>
    </xs:element>

- To specify a data member for a generated Python class whose
  value is a string, integer, or float and which will be populated
  from a nested (simple) element, specify a nested XML Schema
  element whose type is "xs:string", "xs:integer", or "xs:float".
  Here is an example which defines a Python class "person" with a
  data member "description" which is a string and which is
  populated from a (simple) nested element::

    <xs:element name="person">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="description" type="xs:string" />
            </xs:sequence>
        </xs:complexType>
    </xs:element>

- To specify a data member of a generated Python class that will
  be populated from a nested XML element, refer to the nested
  object in the "type" attribute and then define another
  element/type whose name is that type. For example, the following
  specifies that the person class will have a data member named
  "transportation" that will be populated from a nested XML
  element "bicycle" and whose value will be an instance of the
  generated class "bicycle"::

    <xs:element name="person">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="transportation" type="bicycle" />
            </xs:sequence>
        </xs:complexType>
    </xs:element>

    <xs:element name="bicycle">
        o
        o
        o
    </xs:element>

- To specify a data member of a generated Python class that will
  contain a list of instances of a generated class and will be
  populated from nested XML elements, add the "maxOccurs" attribute
  with value "unbounded". Here is an example::

    <xs:element name="person">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="transportation" type="bicycle" maxOccurs="unbounded" />
                <xs:element name="description" type="xs:string" maxOccurs="unbounded" />
            </xs:sequence>
        </xs:complexType>
    </xs:element>

    <xs:element name="bicycle">
        o
        o
        o
    </xs:element>

Here are a few additional rules that will help you to write XML
Schema files for ``generateDS.py``:

- The first (top most) class definition (i.e. the first
  "xs:element" in the .xsd file) is assumed to be the root element
  in XML input files. Possibly XML Schema has another way to
  specify the root, but I was not able to find it in the spec.
  To specify the root element, see command line option
  "--root-element" in section `Running generateDS.py`_.

- The "name" attribute of the "xs:element" must match the tag in
  the XML file from which instances of this object will be
  populated. You can change the names of the generated class by
  using the "-p<prefix>" option, which prepends a prefix to each
  class name.

- The "type" attribute of the "xs:element" should match the "name"
  attribute of a (separately defined) type (i.e. an xs:element) in
  order to define a member data item that takes an instance or
  list of instances of a Python class.


Additional constructions
------------------------

Here are a few additional constructions that ``generateDS.py``
understands.


<complexType> at top-level
..........................


You can use the <complexType> element at top level (instead of
<element>) to define an element. So, for example, instead of::

    <xs:element name="server-type">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="server-name" type="xs:string"/>
                <xs:element name="server-description" type="xs:string"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>

you can use the following, which is equivalent::

    <xs:complexType name="server-type">
        <xs:sequence>
            <xs:element name="server-name" type="xs:string"/>
            <xs:element name="server-description" type="xs:string"/>
        </xs:sequence>
    </xs:complexType>

Use of "ref" instead of "name" and "type" attributes
....................................................

You can use the "ref" attribute to refer to another element
definition, instead of using the "name" and "type" attributes. So,
for example, you can use the following::

    <xs:element name="server-info">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="server-comment" type="xs:string"/>
                <xs:element ref="server-type" />
            </xs:sequence>
        </xs:complexType>
    </xs:element>

in place of this::

    <xs:element name="server-info">
        <xs:complexType>
            <xs:sequence>
                <xs:element name="server-comment" type="xs:string"/>
                <xs:element name="server-type" type="server-type"/>
            </xs:sequence>
        </xs:complexType>
    </xs:element>


Extension types
...............

``generateDS.py`` generates a subclass for each element that
is defined as the extension of a base element.  So, for the
following::

    <xs:complexType name="BType">
        <xs:complexContent>
            <xs:extension base="AType">
                <xs:sequence>
                    o
                    o
                    o

``generateDS.py`` will generate something like the following::

    class BType(AType):
        o
        o
        o


Elements containing mixed content
.................................

``generateDS.py`` generates special code to handle elements
defined as containing mixed content, that is, elements defined with
attribute ``mixed="true"``.  See section `Mixed content`_ for more
details.


.. _`XMLBehaviors`:

XMLBehaviors
============

With the use of the "-b" command line option, ``generateDS.py`` will
also accept as input an XML document instance that describes
behaviors to be added to subclasses when the subclass file is
generated with the "-s" command line option.

An example is provided in the Demos/Xmlbehavior sub-directory of
the distribution.

The XMLBehaviors capability in ``generateDS.py`` was inspired and,
for the most part, designed by gian paolo ciceri
(gp.ciceri@suddenthinks.com).  This work is part of our work on
our application development project for Quixote.


The XMLBehaviors input file
---------------------------

This section describes the XMLBehavior XML document that is used
as input to ``generateDS.py``.  The XMLBehavior XML document is an
XML instance document (given as an argument to the "-b" command
line flag) that describes behaviors (methods) to be added to class
definitions in the subclass file (generated with the "-s" command
line flag).

See file ``xmlbehavior_po.xml`` in the ``Demos/Xmlbehavior``
directory in the distribution for an example that you can use as a
model.

The elements in the XMLBehavior document type are the following:

- <xb:xml-behavior> -- The base element in the document.

  - <xb:base-impl-url> -- The root (left-most portion) of URL
    containing implementation bodies.  Implementation URLs are
    appended to this base URL.

  - <xb:behaviors> -- A list of behaviors.

    - <xb:behavior> -- Describes a single XMLBehavior.

      - <xb:class> -- The name of the class to which this behavior
        is to be added.

      - <xb:name> -- The name of the behavior/method.  Must
        conform to Python name syntax.

      - <xb:args> -- A list of arguments to the behavior/method.

        - <xb:arg> -- A positional argument to the method.

          - <xb:name> -- The name of the argument.

          - <xb:data-type> -- The data-type of the argument.

      - <xb:return-type> -- The data-type of the value returned by
        the behavior/method.

      - <xb:impl-url> -- The URL of the implementation body.  This
        value will be concatenated to the right-hand side of the
        base-impl-url.

      - <xb:ancillaries> -- A list of ancillary behaviors/methods.
        Each ancillary has a role, which defines how it is to be
        used.

        - <xb:ancillary> -- A specification of an ancillary
          behavior/method.

          - <xb:name> -- The name of the behavior/method.  Must
            conform to Python name syntax.

          - <xb:role> -- The method's role.  The following values
            are supported:

            - "DBC-precondition" -- A Design By Contract-style
              pre-condition check.  This method will be called
              *before* the core behavior/method itself.

            - "DBC-postcondition" -- A Design By Contract-style
              post-condition check.  This method will be called
              *after* the core behavior/method itself.

          - <xb:args> -- A list of arguments to the ancillary
            behavior/method.  The element has the same content as
            the <xb:args> element for the core behavior/method.

          - <xb:return-type> -- The data-type of the value returned by
            the behavior/method.

          - <xb:impl-url> -- The URL of the implementation body.
            This value will be concatenated to the right-hand side
            of the base-impl-url.


Implementing other sources for implementation bodies
----------------------------------------------------

``generateDS.py`` contains a function ``get_impl_body()`` that
implements the ability to retrieve implementation bodies.  The
current implementation retrieves implementation bodies from an
Internet Web URL.  Other sources for implementation bodies can be
implemented by modifying ``get_impl_body()``.

As an example, the version that follows first tries to retrieve an
implementation body from a Web address and, if that fails,
attempts to obtain the implementation body from a file in the
local file system using the <xb:base-impl-url> as a path to a
directory containing files, each of which contains one
implementation body and <xb:impl-url> as the file name.  This
implementation of ``get_impl_body`` was provided by Colin
Dembovsky of Systemsfusion Inc.  Thanks, Colin.  (I've included it
in the ``generateDS.py`` script, but commented out, for those who
want to use and possibly extend it.)::

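    import requests

    def get_impl_body(classBehavior, baseImplUrl, implUrl):
        # Default body, used when no implementation can be retrieved.
        impl = '        pass\n'
        if implUrl:
            if baseImplUrl:
                implUrl = '%s%s' % (baseImplUrl, implUrl)
            try:
                # First, try to retrieve the body from a Web address.
                response = requests.get(implUrl)
                response.raise_for_status()
                impl = response.text
            except requests.RequestException:
                # That failed, so try a file in the local file system.
                try:
                    with open(implUrl) as implFile:
                        impl = implFile.read()
                except OSError:
                    print('*** Implementation at %s not found.' % implUrl)
        return impl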


Additional features
===================

Here are additional features.  Note that some of this support was
contributed by users such as Chris Allan.  Many thanks.

GraphQL support
-----------------

Introduction
..............

You can generate modules that contain code to support GraphQL
queries into the data contained in XML instance documents described
by the XML schema from which you generated your module.  For
information about GraphQL see -- https://graphql.org/.

The generated code uses the Strawberry GraphQL Python module.  See
https://pypi.org/project/strawberry-graphql/ and
https://strawberry.rocks.

In order to use this facility, you must install Strawberry.
See the Strawberry getting started guide at https://strawberry.rocks/docs.

This facility supports GraphQL queries.  There is no support for
mutations.

There is a demo in the ``Demo/People`` directory of the source code
repository.  See ``README.txt`` and the scripts ``run-gen-graphql.sh`` and
``run-test-graphql.sh`` in that directory.  The source code repository can be
downloaded from https://sourceforge.net/p/generateds/ or can be
cloned with the following::

    $ hg clone http://hg.code.sf.net/p/generateds/code generateds-code

There is more information along with some examples of usage here --
http://reifywork.com/generateds-graphql.html.

Generating a module containing GraphQL support
................................................

In order to run the Strawberry GraphQL server for your data, you
must first generate a ``generateDS.py`` module that contains Strawberry
GraphQL support code.  Such a module **cannot** be used as a normal
``generateDS.py`` module.

A module that contains this support code is generated with the
"--graphql=" (or "-g") command line option when you run
``generateDS.py``.  For example::

    $ generateDS.py -o my_module.py --graphql="tagname:typename" my_xml_schema.xsd

Where:

- ``tagname`` is the tag of the top-most element in your XML
  instance documents.

- ``typename`` is the name of the XML ``complexType`` for the
  top-most element in your XML instance documents.

You have now generated a GraphQL API against which you can apply
GraphQL queries.  These queries follow the nested structure of your
XML instance document.  At each level, you have access to the
child nodes and the attributes of the nodes at that level.


Run the Strawberry server
...........................

After generating your module, you can run the Strawberry server that
the module implements with something like the following::

    $ strawberry server my_module

**Important** -- Before running the Strawberry server on your
module, you must set the environment variable ``GRAPHQL_ARGS`` to
specify the path/name of an XML instance document that contains the
data to which you want to apply GraphQL queries.  This XML document
must be an instance of the document type of the schema used to
generate your module and should validate against that schema.

I'm on Linux, and so I can do that with, for example, either of the
following, assuming that "my_module.py" is the name of the module
generated by ``generateDS.py``::

    $ export GRAPHQL_ARGS=my_input_data.xml
    $ strawberry server my_module

Or, on a single line::

    $ GRAPHQL_ARGS=my_input_data.xml strawberry server my_module

For help, run the following::

    $ strawberry --help
    $ strawberry server --help
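
Where the shell syntax shown above for setting ``GRAPHQL_ARGS`` is
not available, much the same effect can be had from Python by
passing a modified environment to the child process.  This is a
sketch only; the helper name ``server_invocation`` and the file
names are placeholders::

    import os
    import subprocess

    def server_invocation(module_name, xml_path):
        """Build the command and environment for the Strawberry server."""
        env = dict(os.environ)
        # Path/name of the XML instance document holding the data.
        env['GRAPHQL_ARGS'] = xml_path
        return ['strawberry', 'server', module_name], env

    command, env = server_invocation('my_module', 'my_input_data.xml')
    # Uncomment to start the server (requires Strawberry to be installed):
    # subprocess.run(command, env=env, check=True)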


Make requests and queries
.............................

After starting Strawberry, you can interact with your
Strawberry server in the following ways (among others):

- Visit ``http://0.0.0.0:8000/graphql`` in your Web browser and use
  the Strawberry interactive REPL (read–eval–print loop).  Note that
  interactive Strawberry supports tab completion (use the Tab key
  and Ctrl-spacebar) and provides visual hints to guide you through
  your generated GraphQL API.  And near the upper left of the screen
  (in your Web browser) are buttons that will display (1) a
  ``Documentation Explorer`` and (2) a ``GraphQL Explorer``, which,
  among other things, will describe your GraphQL API and enable you
  to interactively point and click to generate queries for your API.

- Use ``curl`` to make requests (see https://curl.se/) and receive
  JSON data.  For example, here is a ``bash`` shell script that I
  can run from the Linux command line::

      #!/usr/bin/bash -x

      curl 'http://0.0.0.0:8000/graphql' \
        -X POST \
        -H 'content-type: application/json' \
        --data '{
          "query": "{ container { author { name book { author title date genre rating }}}}"
        }'

- Use a Python script and the Python ``requests`` module (see
  https://pypi.org/project/requests/ and
  https://requests.readthedocs.io/en/latest/).  For example::

      #!/usr/bin/env python

      """
      synopsis:
          Demonstration of Python requests to a Strawberry GraphQL server.
      usage:
          python request01.py
      """

      import requests
      import pprint

      Query01 = """
      { "query":
      "{
        people {
          person {
            name
            interest
            promoter {
              firstname
              lastname
              clientHandler {
                fullname
              }
            }
          }
          programmer {
            name
            interest
            category
            fruit
            vegetable
            value
            range_
            elparam {
              name
              semantic
            }
          }
        }
      }"
      }
      """

      def test():
          headers = {
              'content-type': 'application/json',
          }
          url = "http://0.0.0.0:8000/graphql"
          query = Query01
          query = query.replace('\n', ' ')
          response = requests.post(url, query, headers=headers)
          jsonobj = response.json()
          data = jsonobj["data"]
          print('-' * 40)
          pprint.pprint(data)
          print('-' * 40)

      def main():
          test()

      if __name__ == "__main__":
          main()
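
The request body in the Python script above is written out by hand
as a JSON string.  As an alternative sketch, ``json.dumps`` from the
standard library can build the body from a plain GraphQL query,
which avoids the manual newline replacement and guards against
quoting mistakes.  The query is a shortened version of the one
above, and ``build_payload`` is a hypothetical helper name::

    import json

    QUERY = """
    {
      people {
        person {
          name
          interest
        }
      }
    }
    """

    def build_payload(query):
        """Return a JSON request body for a GraphQL query string."""
        # Collapse whitespace so the query travels on a single line.
        return json.dumps({'query': ' '.join(query.split())})

    payload = build_payload(QUERY)
    # Send with: requests.post(url, payload, headers=headers)
    print(payload)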


xsd:list element support
------------------------

xsd:list elements can be used with a child xsd:simpleType which
confuses the XschemaHandler stack unrolling.  xsd:list element
support should allow the following XML Schema definition to be
supported in ``generateDS.py``::

    <xsd:attribute name="Foo">
        <xsd:simpleType>
            <xsd:list>
                <xsd:simpleType>
                    ...
                </xsd:simpleType>
            </xsd:list>
        </xsd:simpleType>
    </xsd:attribute>


xsd:enumeration support
-----------------------

The enumerated values for the parent element are resolved and made
available through the instance attribute ``values``.


xsd:union support
-----------------

In order to properly resolve and query types which are unions in
an XML Schema, an element's membership in an xsd:union is
available through the instance attribute ``unionOf``.


Extended xsd:choice support
---------------------------

When a parent xsd:choice exists, an element's "maxOccurs" and
"minOccurs" values can be inherited from the xsd:choice rather
than the element itself. xsd:choice elements have been added to
the child element via the ``choice`` instance attribute and are
now used in the "maxOccurs" and "minOccurs" attribute resolution.
This should allow the following XML Schema definition to be
supported in ``generateDS.py``::

    <xsd:element name="Foo">
        <xsd:complexType>
            <xsd:choice maxOccurs="unbounded">
                <xsd:element ref="Bar"/>
                <xsd:element ref="Baz"/>
            </xsd:choice>
        </xsd:complexType>
    </xsd:element>


Arity, minOccurs, maxOccurs, etc
--------------------------------

Some applications require information about the "minOccurs" and
"maxOccurs" attributes in the XML Schema.  Some of that information
can be obtained by using the --member-specs= (list|dict) command line
option, then looking at the ``member_data_items_`` class variable
that it generates in each data representation class.  In particular,
look at the ``get_container`` method (from class ``MemberSpec_``).
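
As a rough sketch of how that information might be read, the helper
below accepts ``member_data_items_`` in either its list or its dict
form and reports the members whose container flag is set, assuming
that flag marks members with "maxOccurs" greater than one.  The
``MemberSpec_`` class here is a cut-down stand-in so that the
example runs by itself; the accessor ``get_name`` and the
constructor arguments are assumptions to check against your own
generated module, and ``repeated_members`` is a made-up name::

    class MemberSpec_(object):
        """Stand-in for the MemberSpec_ class in a generated module."""
        def __init__(self, name='', data_type='', container=0):
            self.name = name
            self.data_type = data_type
            self.container = container
        def get_name(self):
            return self.name
        def get_container(self):
            return self.container

    def repeated_members(member_data_items):
        """Return the names of members whose container flag is set."""
        if isinstance(member_data_items, dict):
            specs = member_data_items.values()
        else:
            specs = member_data_items
        return [spec.get_name() for spec in specs if spec.get_container()]

    specs = [
        MemberSpec_('description', 'xs:string', 0),
        MemberSpec_('transportation', 'bicycle', 1),
    ]
    repeated = repeated_members(specs)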


More thorough content type and base type resolution
---------------------------------------------------

The previous content type and base type resolution was insufficient
for some needs.  Basically, it was unable to handle more complex
and shared element and simpleType definitions.  This support has
been extended to more correctly resolve the base type and properly
indicate the content type of the element.  This should provide the
ability to handle more complex XML Schema definitions in
``generateDS.py``.  Documentation on the algorithm for how this is
achieved is available as comments in the source code of
``generateDS.py`` -- see comments in method ``resolve_type`` in
class ``XschemaElement``.


Making top level simpleTypes available from XschemaHandler
----------------------------------------------------------

Some developers working to extend the analysis and code generation
in ``generateDS.py`` may be helped by additional information
collected during the parsing of the XML Schema file.

Some applications need all the top level simpleTypes to be
available for further queries after the SAX parser has completed
its work and after all types have been resolved.  These types are
available as an instance attribute ``topLevelSimpleTypes`` inside
``XschemaHandler``.


Namespaces -- inserting namespace definition in exported documents
------------------------------------------------------------------

In some cases, the document produced by a call to an export method
will contain elements that have namespace prefixes.  For example,
the following snippet contains namespace prefix "abc"::

    <abc:people >
        <abc:person>
        o
        o
        o
        </abc:person>
    </abc:people>

A way is needed to insert a namespace prefix definition into the
generated document.  Here is how ``generateDS.py`` fills that need.

Each generated export method takes an optional argument
``namespacedef_``.  If provided, the value of that parameter is
inserted in the exported element.  So, for example, the following
call::

    people.export(sys.stdout, 0,
        namespacedef_='xmlns:abc="http://www.abc.com/namespace"')

might produce::

    <abc:people xmlns:abc="http://www.abc.com/namespace">
        <abc:person>
        o
        o
        o
        </abc:person>
    </abc:people>

If this is an issue for you, then you may also want to consider
using the "--namespacedef" command line option when you run
``generateDS.py``.  The value of this option will be passed in to
the export function in the generated parse functions.  So, for
example, running ``generateDS.py`` as follows::

    generateDS.py --namespacedef='xmlns:abc="http://www.abc.com/namespace.xsd"' \
        -o mylib.py -s myapp.py myschema.xsd

will generate parse methods that automatically add the
``namespacedef_`` argument to the call to export.


Support for xs:any
--------------------

There is minimal support for the ``xs:any`` wild card declaration.
Effectively, an element defined by an ``xs:complexType`` containing
``xs:any`` can contain any element type as a child element.  Because
``generateDS.py`` does not know how to generate code to handle
specific element types during the parsing and building of an XML
instance document, it generates a call to a method ``gds_build_any``
in the ``GeneratedsSuper`` class.  This method has a default
implementation in the generated code.  If your XML schema uses
``xs:any``, you may need to add some code to that default
implementation of ``gds_build_any``.  See section `Overridable
methods -- generatedssuper.py`_ for guidance on how to provide an
implementation of that method.

For more help with this, look at the code generated from an XML
schema that uses ``xs:any``.  In particular, look at the code
generated in the Python class corresponding to the
``xs:complexType`` containing ``xs:any`` and look at the default
implementation of method ``gds_build_any`` in class
``GeneratedsSuper``.  Reading the code in the ``buildChildren`` and
``exportChildren`` methods of a class containing a child declared
with ``xs:any`` should help you understand what is going on.

When you start developing your implementation of
``gds_build_any``, look at the code generated in several
``buildChildren`` methods.  It's likely that you will be able to
copy, paste, and edit code from there.


Generating Lxml Element tree
------------------------------

Once you have built the tree of objects that are instances of the
classes generated by ``generateDS.py``, you can use this to produce
a tree of Lxml Element instances.  See
http://lxml.de/ for more about Lxml.  And, see the function
``parseEtree`` in the generated code for an example of how to
produce the Lxml Element tree::

    def parseEtree(inFileName):
        doc = parsexml_(inFileName)
        rootNode = doc.getroot()
        rootTag, rootClass = get_root_tag(rootNode)
        if rootClass is None:
            rootTag = 'test'
            rootClass = Test
        rootObj = rootClass.factory()
        rootObj.build(rootNode)
        # Enable Python to collect the space used by the DOM.
        doc = None
        mapping = {}
        rootElement = rootObj.to_etree(None, name_=rootTag, mapping_=mapping)
        reverse_mapping = rootObj.gds_reverse_node_mapping(mapping)
        content = etree_.tostring(
            rootElement, pretty_print=True,
            xml_declaration=True, encoding="utf-8")
        sys.stdout.write(content)
        sys.stdout.write('\n')
        return rootObj, rootElement, mapping, reverse_mapping


Mapping generateDS objects to Lxml Elements and back
......................................................

Now suppose that you have produced the tree of instances of the
generated classes, and suppose that you have used that to produce a
tree of instances of the Element class from Lxml.  It may be useful
to have a dictionary that maps instances in one tree to the
corresponding instances in the other.  You can create that
dictionary by passing an empty dictionary as the value of the
optional parameter ``mapping_`` in the call to the ``to_etree``
method.  And, you can produce the reverse mapping by calling the
convenience method ``gds_reverse_node_mapping`` from superclass
``GeneratedsSuper``.  Again, see the code above for an example.
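
As a rough illustration of what the two dictionaries contain, here
is a self-contained sketch.  It is not the generated code: class
``Person`` and function ``reverse_node_mapping`` are hypothetical
stand-ins, the standard library's ``xml.etree.ElementTree`` is used
in place of Lxml, and the choice of keys and values is an assumption
made for this example.  Check ``to_etree`` and
``gds_reverse_node_mapping`` in your generated module for the actual
behavior::

    import xml.etree.ElementTree as etree  # stand-in for lxml.etree


    class Person(object):
        """Hypothetical stand-in for a class generated by generateDS.py."""

        def __init__(self, name):
            self.name = name

        def to_etree(self, parent_element=None, name_='person',
                     mapping_=None):
            # Create the element, attaching it to a parent if one is given.
            if parent_element is None:
                element = etree.Element(name_)
            else:
                element = etree.SubElement(parent_element, name_)
            element.text = self.name
            # Record which object produced which element (assumed behavior).
            if mapping_ is not None:
                mapping_[self] = element
            return element


    def reverse_node_mapping(mapping):
        # Invert the dictionary so that it maps element -> object.
        return dict((value, key) for key, value in mapping.items())


    mapping = {}
    person = Person('Alberta')
    element = person.to_etree(None, name_='person', mapping_=mapping)
    reverse_mapping = reverse_node_mapping(mapping)
    print(mapping[person] is element)          # True
    print(reverse_mapping[element] is person)  # True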


Specifying names for anonymous nested type definitions
--------------------------------------------------------

``generateDS.py`` automatically assigns names for types (and the
classes generated from them), when that type definition (for
example, ``xs:complexType``) does not have a name and it is nested
inside another type definition.  However, these assigned names, in
part because of the need to make them unique, can be difficult to
predict.

Therefore, ``generateDS.py`` provides a way to specify the names of
the Python classes generated from these anonymous, nested types.  To
do so, follow these steps:

1. Create a module named ``gds_inner_name_map``
   (``gds_inner_name_map.py``).

2. Place that module where Python can import it when you run
   ``generateDS.py``.  You can do this either by adding an
   additional directory to the environment variable ``PYTHONPATH``
   or by placing ``gds_inner_name_map.py`` in a directory that is
   already on Python's search path for modules.  You can check this
   by running the following Python code snippet::

       import sys
       print(sys.path)

   Also see: https://docs.python.org/2/library/sys.html#sys.path

3. In module ``gds_inner_name_map``, define a global variable
   ``Inner_name_map``.  The value of this variable should be a
   dictionary that maps 2-tuples containing (a) the name of the
   grandparent type and (b) the name of the parent type onto the new
   name.

If ``generateDS.py`` cannot import ``Inner_name_map`` from
``gds_inner_name_map``, then it will, by default, generate unique
names.  In particular, it automatically generates names for
anonymous, nested types when the following Python statement fails::

    from gds_inner_name_map import Inner_name_map

Here is an example of module ``gds_inner_name_map.py``::

    Inner_name_map = {
        ("classAType", "inner"): "inner_001",
        ("classBType", "inner"): "inner_002",
    }

Usage hints:

- When ``generateDS.py`` succeeds in importing ``Inner_name_map``
  from ``gds_inner_name_map``, but *cannot* find one of the required
  mappings in that dictionary, it will raise an exception and print
  out the missing mapping.  You can copy this line; paste it into
  your ``gds_inner_name_map.py``; and edit it so as to specify the
  class name of your choice.

- Make sure that the names you specify are unique within your XML
  schema.
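
The lookup behavior described above can be sketched roughly as
follows.  This is an illustration only, not the code in
``generateDS.py``: the helper ``choose_inner_name``, its parameters,
and the wording of the error message are hypothetical::

    try:
        # Succeeds only if gds_inner_name_map.py is on Python's search path.
        from gds_inner_name_map import Inner_name_map
    except ImportError:
        Inner_name_map = None


    def choose_inner_name(name_map, grandparent, parent, default_name):
        """Hypothetical helper: pick the class name for a nested type."""
        if name_map is None:
            # No map available: fall back to an automatically generated name.
            return default_name
        key = (grandparent, parent)
        if key not in name_map:
            # Report the missing entry in a form that can be pasted
            # into the map and then edited.
            raise KeyError('missing mapping: %r: "<new name>",' % (key,))
        return name_map[key]


    example_map = {
        ("classAType", "inner"): "inner_001",
        ("classBType", "inner"): "inner_002",
    }
    print(choose_inner_name(example_map, "classBType", "inner", "innerType1"))
    print(choose_inner_name(None, "classBType", "inner", "innerType1"))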


Controlling and using simple type validation
----------------------------------------------

Simple types that are defined with restrictions are validated:
their restrictions are checked.

See any of the generated methods whose names are of the form
``validate_xxx_`` and their calls in the build method in order to
learn how these methods are used and how you can use them.

You can turn this checking off by setting the global variable
``Validate_simpletypes_`` in your generated module to False.

Warning messages that are produced when the value of a simple type
fails to validate against its restrictions are collected in an
instance of class ``GdsCollector_``.  This class is defined in your
generated module.  See the ``parse*`` methods in your generated
module for help with using a collector.


Generating validator methods
------------------------------

``generateDS.py`` generates a validator method in a complex type
class for each use of a simple type that is defined with
restrictions.  During the build process, the generated code calls
these methods to ensure that input data (from an XML instance
document) validates against those restrictions.  Note that this is
done only if the global variable ``Validate_simpletypes_`` is True,
which is the default.

You can, of course, call any of these validator methods manually and
programmatically.

``generateDS.py`` also supports a feature that enables you to
generate a validator method that calls all the specific validator
methods on each attribute and child in the class for which that
specific validator method is relevant.

To instruct ``generateDS`` to do this, add the word "validate" to
the "--export" command line option.  For example::

    generateDS.py -o mylib.py --export="write validate" myschema.xsd

To learn how to perform validation, take a look at a generated
``validate_`` method in your generated module.

A few usage hints:

- In order to call the ``validate_`` method, you will need to create
  an instance of class ``GdsCollector_``.  This class is defined in
  your generated module.  You will pass this as an argument to the
  ``validate_`` method and later can use this collector instance to
  retrieve any generated validation warning messages.

- Notice that each ``validate_`` method has a ``recursive``
  parameter, which is True by default.  You can use this to control
  whether (1) only a single instance is validated or (2) the
  current instance and all children are recursively validated.
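
The following self-contained sketch shows the general pattern of
validating with a collector and a ``recursive`` flag.  Both classes
are hypothetical stand-ins: ``Collector`` plays the role of
``GdsCollector_`` (its method names are assumptions), and ``Item``
plays the role of a generated class with one restricted simple type.
Look at your generated module for the real signatures::

    class Collector(object):
        """Hypothetical stand-in for the generated class GdsCollector_."""

        def __init__(self):
            self.messages = []

        def add_message(self, msg):
            self.messages.append(msg)

        def get_messages(self):
            return self.messages


    class Item(object):
        """Hypothetical stand-in for a generated class whose quantity
        is restricted to non-negative values."""

        def __init__(self, quantity, children=None):
            self.quantity = quantity
            self.children = children or []

        def validate_quantity(self, value, collector):
            if value < 0:
                collector.add_message('quantity %d is less than 0' % value)
                return False
            return True

        def validate_(self, collector, recursive=True):
            result = self.validate_quantity(self.quantity, collector)
            if recursive:
                for child in self.children:
                    # Validate every child, even after an earlier failure.
                    result = child.validate_(collector, True) and result
            return result


    root = Item(3, [Item(-1), Item(5)])
    collector = Collector()
    print(root.validate_(collector))
    print(collector.get_messages())

With ``recursive=False``, only ``root`` itself would be checked, and
in this example no warning would be collected.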


How to use the generated source code
=======================================

The parsing functions
---------------------

The simplest use is to call one of the parsing functions in the
generated source file. You may be able to use one of these
functions without change, or can modify one to fit your needs.
``generateDS.py`` generates the following parsing functions:

- parse -- Parse an XML document from a file.

- parseString -- Parse an XML document from a string.

These parsing functions are generated in both the superclass and
the subclass files. Note the call to the export method. You may
need to comment out or un-comment this call to export according to
your needs.

For example, if the generated source is in people.py, then, from
the command line, run something like the following::

    python people.py people.xml

Or, from within other Python code, use something like the
following::

    import people
    rootObject = people.parse('people.xml')


Recognizing the top level element
-----------------------------------

It might be that the generated module, when parsing an XML instance
document, does not, by default, recognize the top level (root)
element in an instance document.  This might happen because
``generateDS.py`` does not detect the correct top level element
from the XML schema or because you need to use the generated module
to parse instance documents that have *different* top level
elements.  If this is the case, you might pick and use one of the
following strategies:

1. In your schema, move the definition of the element type that
   defines the top level element in your instance documents to the
   top of the schema.  By default, ``generateDS.py`` uses the first
   definition in the schema as the top level element when
   constructing the generated ``parse`` function.

2. Use the "--root-element" command line option to specify the top
   level element.  But, be aware that this only works if the tag name
   and type name of the top level element are the same.

3. Modify the ``parse`` function in your generated module, replacing
   the class whose factory is called and the tag name passed in to
   the export method.  For example, change::

       def parse(inFileName):
           doc = minidom.parse(inFileName)
           rootNode = doc.documentElement
           rootObj = type1.factory()
           rootObj.build(rootNode)
           # Enable Python to collect the space used by the DOM.
           doc = None
           sys.stdout.write('<?xml version="1.0" ?>\n')
           rootObj.export(sys.stdout, 0, name_="type1",
               namespacedef_='')
           return rootObj

   to::

       def parse(inFileName):
           doc = minidom.parse(inFileName)
           rootNode = doc.documentElement
           rootObj = type2.factory()
           rootObj.build(rootNode)
           # Enable Python to collect the space used by the DOM.
           doc = None
           sys.stdout.write('<?xml version="1.0" ?>\n')
           rootObj.export(sys.stdout, 0, name_="type2",
               namespacedef_='')
           return rootObj

   Notice that we've changed the two occurrences of "type1" to
   "type2".

4. Using the generated ``parse`` function as a model, create a
   separate module that imports your generated module.  In the
   ``parse`` function in your module, make a change similar to that
   suggested above.  And, of course, add any additional code needed
   by your application.

5. Write a separate module containing your own parse function that
   inspects the top level element of an input XML instance document
   and automatically determines which generated class should be used
   to parse it.  Here is an example::

       #!/usr/bin/env python

       import sys
       from optparse import OptionParser
       from xml.dom import minidom
       import mygeneratedmodule as gendsmod

       def get_root_tag(node):
           tag = node.tagName
           tags = tag.split(':')
           if len(tags) > 1:
               tag = tags[-1]
           rootClass = None
           if hasattr(gendsmod, tag):
               rootClass = getattr(gendsmod, tag)
           return tag, rootClass

       def parse(inFilename, options):
           doc = minidom.parse(inFilename)
           rootNode = doc.documentElement
           rootTag, rootClass = get_root_tag(rootNode)
           if rootClass is None:
               # No generated class matches the top level tag.
               sys.exit('Unrecognized top level element: %s' % rootTag)
           rootObj = rootClass.factory()
           rootObj.build(rootNode)
           # Enable Python to collect the space used by the DOM.
           doc = None
           sys.stdout.write('<?xml version="1.0" ?>\n')
           rootObj.export(sys.stdout, 0, name_=rootTag,
               namespacedef_='')
           return rootObj

       USAGE_TEXT = """
           python %prog [options] <somefile.xml>"""

       def usage(parser):
           parser.print_help()
           sys.exit(1)

       def main():
           parser = OptionParser(USAGE_TEXT)
           (options, args) = parser.parse_args()
           if len(args) == 1:
               infilename = args[0]
               parse(infilename, options)
           else:
               usage(parser)

       if __name__ == "__main__":
           main()

   Notice the call to ``get_root_tag``, which attempts to recognize
   the top level tag in the input XML document so that the ``parse``
   function can parse and export it.


The export methods
------------------

The generated classes contain methods ``export`` and
``exportLiteral`` which can be called to export classes to several
text formats, in particular to an XML instance document and a
Python module containing Python literals.  See the generated parse
functions for examples showing how to call the export methods.


Method export
.............

The export method in generated classes writes out an XML document
that represents the instance that contains it and its child
elements.  So, for example, if your instance tree was created by
one of the parsing functions described above, then calling
``export`` on the root element should reproduce the input XML
document, differing only with respect to ignorable white space.

Arguments to the generated export method:

- ``outfile`` -- A file like object open for writing.

- ``level`` -- the indentation level.  If the pretty_print argument is
  True, the (generated) function showIndent is used to prefix each
  exported line with 4 spaces for each level of indent.

- ``namespace_`` -- An empty string or an XML namespace prefix plus a
  colon, for example "abc:".  This value is printed immediately in
  front of the tag name.

- ``name_`` -- The element tag name.  Note that the tag name can be
  overridden by the ``original_tagname_``, which can be set by the
  class constructor.

- ``namespacedef_`` -- Zero or more namespace prefix definitions.
  Actually, its value can be any attribute-value pairs.
  Examples::

      ''
      'xmlns:abc="http://www.abc.com"'
      'xmlns:abc="http://www.abc.com" xmlns:def="http://www.def.com"'

  or, because it is printed where the attributes occur, even::

      'size="25" color="blue"'

  For more on ``namespacedef_``, see:
  `Namespaces -- inserting namespace definition in exported documents`_

- ``pretty_print`` -- If True, exported output is printed with
  indentation and newlines.  If False, indentation and newlines are
  omitted, which produces a more compact representation.

Also see the comments on ``generatedsnamespaces`` and
``GenerateDSNamespaceDefs`` near the top of each generated module.
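
To make the roles of these arguments concrete, here is a small,
self-contained sketch.  The functions ``show_indent`` and
``export_leaf`` are hypothetical stand-ins written for this example,
not the generated code; they only imitate how ``level``,
``namespace_``, ``name_``, ``namespacedef_``, and ``pretty_print``
are described above::

    import io


    def show_indent(outfile, level, pretty_print=True):
        # Four spaces per indentation level, only when pretty printing.
        if pretty_print:
            outfile.write('    ' * level)


    def export_leaf(outfile, level, text, namespace_='', name_='name',
                    namespacedef_='', pretty_print=True):
        """Hypothetical stand-in for the export method of a leaf element."""
        eol = '\n' if pretty_print else ''
        show_indent(outfile, level, pretty_print)
        # namespacedef_ is written where the attributes occur.
        attrs = ' ' + namespacedef_ if namespacedef_ else ''
        outfile.write('<%s%s%s>%s</%s%s>%s' % (
            namespace_, name_, attrs, text, namespace_, name_, eol))


    pretty = io.StringIO()
    export_leaf(pretty, 1, 'Alberta', namespace_='abc:', name_='name',
                namespacedef_='xmlns:abc="http://www.abc.com"')
    print(repr(pretty.getvalue()))

    compact = io.StringIO()
    export_leaf(compact, 1, 'Alberta', name_='name', pretty_print=False)
    print(repr(compact.getvalue()))

The first call writes an indented line ending in a newline; the
second writes ``<name>Alberta</name>`` with no surrounding white
space.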


Method ``exportLiteral``
........................

``generateDS.py`` generates Python classes that represent the
elements in an XML document, given an XML Schema definition of the
XML document type. The ``exportLiteral`` method will export a
Python literal representation of the Python instances of the
classes that represent an XML document.


What It Does
,,,,,,,,,,,,

When ``generateDS.py`` generates the Python source code for your
classes, it also generates an ``exportLiteral``
method in each class. If you call this method on the root
(top-most) object, it will write out a literal representation of
your class instances as Python code.

``generateDS.py`` also generates a function at top level
(parseLiteral) that parses an XML document and calls the
"exportLiteral" method on the root object to write the data
structure (instances of your generated classes) as a Python module
that you can import to (re-)create instances of the classes that
represent your XML document.


Why You Might Care
,,,,,,,,,,,,,,,,,,

``generateDS.py`` was designed and built with the assumption that
we are *not* interested in marking up text content at all.  What
we really want is a way to represent structured and nested data in
text.  It takes the statement, "I want to represent nested data
structures in text.", entirely seriously.  Given that assumption,
there may be times when you want a more "Pythonic" textual
representation of the Python data structures for which
``generateDS.py`` has generated code.  ``exportLiteral`` enables
you to produce that representation.


This feature means that the classes that you generate from an XML
schema support the interchangeability of XML and Python literals.
This means that, given classes generated by ``generateDS.py`` for
your XML document type, you can perform the following
transformations:

- Translate an XML document into a Python module containing a
  literal definition of the contents of the XML document.

- Translate the literal definition of a Python data structure into
  an XML instance document.

This capability enables you to:

- Work with an XML (text) document, then exchange it for a Python
  text representation of the content of that document.

- Work with a Python literal text representation of your XML
  document, then exchange that for an XML document that represents
  the same content.

- "Freeze" your XML document as a Python module that you can
  import. The module can be edited with your text editor, so
  perhaps it would be better to say that it is frozen, but not too
  hard. The classes that you generate with ``generateDS.py`` can
  be used to:

  1. Read in an XML document.

  2. (Optionally) modify the Python instances that represent that
     XML document.

  3. Write the instances out as a Python module that you can later
     import.

How to use it
,,,,,,,,,,,,,

See the generated function ``parseLiteral`` for an example of how
to use ``exportLiteral``.
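
The idea of a round trip through Python literals can be sketched as
follows.  Everything here is a hypothetical stand-in (the class
``Person``, its members, and the helper ``write_literal_module``);
the text actually written by a generated ``exportLiteral`` method
will differ in detail::

    import io


    class Person(object):
        """Hypothetical stand-in for a generated class."""

        def __init__(self, name=None, ratio=None):
            self.name = name
            self.ratio = ratio

        def exportLiteral(self, outfile, level, name_='person'):
            # Write each member as a keyword argument, one per line.
            indent = '    ' * (level + 1)
            outfile.write('%sname=%r,\n' % (indent, self.name))
            outfile.write('%sratio=%r,\n' % (indent, self.ratio))


    def write_literal_module(root_obj, outfile):
        # Wrap the literal in a constructor call so that the text is
        # valid Python source.
        outfile.write('rootObj = Person(\n')
        root_obj.exportLiteral(outfile, 0)
        outfile.write(')\n')


    buf = io.StringIO()
    write_literal_module(Person('Alberta', 3.2), buf)
    source = buf.getvalue()
    print(source)

    # Executing the source re-creates an equivalent instance.
    namespace = {'Person': Person}
    exec(source, namespace)
    restored = namespace['rootObj']
    print(restored.name, restored.ratio)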


Exporting compact XML documents
.................................

You can also export "compact" XML documents.  A compact document is
one that is exported *without* the ignorable whitespace that is used
to produce pretty printed documents.  In contrast, a pretty printed
document will have leading white space on most lines to show
indentation.

To produce compact documents, pass the optional argument
``pretty_print=False`` to the export method.  Check the "parse"
functions generated near the bottom of modules generated by
``generateDS.py``, where ``pretty_print=True`` is passed in by
default.


Building instances
------------------

If you have an instance of a minidom node that represents an
element in an XML document, you can also use the 'build' member
function to populate an instance of the corresponding class. Here
is an example::

    from xml.dom import minidom
    from xml.dom import Node

    doc = minidom.parse(inFileName)
    rootNode = doc.childNodes[0]
    people = []
    for child in rootNode.childNodes:
        if child.nodeType == Node.ELEMENT_NODE and child.nodeName == 'person':
            obj = person()
            obj.build(child)
            people.append(obj)


Using the subclass module
-------------------------

If you choose to use the generated subclass module, and I
encourage you to do so, you may need to edit and modify that
file. Here are some of the things that you must do (look for
"???"):

- Edit the import statement at the top of the file. It should
  import the generated superclass file.  Note that you can also
  use the "--super" command line option to insert this
  automatically.

- Edit the USAGE_TEXT string so that it gives a help message
  appropriate for your use.

- Edit the main function toward the bottom of the file. It should
  call a method in the root subclass, possibly one that you have
  added.

You can also (and most likely will want to) add methods to the
generated classes. See the section `How to Modify the Generated
Code`_ for more on this.

The classes generated from each element definition provide getter
and setter methods to access its attributes and child elements.

Elements that are referenced but not defined (i.e. that are
simple, for example strings, integers, floats, and booleans) are
accessed through getter and setter methods in the class in which
they are referenced.


Elements with attributes but no nested children
-----------------------------------------------

Element definitions that contain attributes but *no* nested child
elements provide access to their data content through getter and
setter methods ``getValueOf_`` and ``setValueOf_`` and member
variable ``valueOf_``.


.. _`Mixed content`:

Mixed content
-------------

The goal of ``generateDS.py`` is to support data structures
represented in XML as opposed to text mark-up.  However, it does
provide some support for mixed content.  But, for mixed content,
the data structures and code generated by ``generateDS.py`` are
fundamentally different from those for elements that do not
contain mixed content.

There are limitations, of course.  A known limitation is related
to extension elements.  Specifically, if an element contains mixed
content, and this element extends a base class, then the base
class and any classes it extends must be defined to contain mixed
content.  This is due to the fact that ``generateDS.py`` generates
a data structure (class) for elements containing mixed content
that is fundamentally different from that generated for other
elements.


Here is an example of mixed content::

    <note>This is a <bold>nice</bold> comment.</note>

When an element is defined with something like the following::

    <xs:complexType mixed="true">
        <xs:sequence>
            o
            o
            o

then, instead of generating a class whose named members refer to
nested elements, a class containing a list of instances of class
``MixedContainer`` is generated.  In order to process the content
of a mixed content element, the code you write will need to walk
this list of instances of ``MixedContainer`` and check the type of
each item in that list.  Basically, the structure becomes more
DOM-like in the sense that it has a list of children, rather than
named fields.

Instances of ``MixedContainer`` have the following methods:

- ``getCategory`` -- Returns one of the following, depending on
  the content:

  - CategoryText -- Text content.

  - CategorySimple -- Simple elements, that is, elements defined
    as xs:string, xs:integer, etc.  For these, the member variable
    ``content_type``, accessible through method ``getContenttype``
    will contain one of TypeString, TypeInteger, TypeFloat,
    TypeDecimal, TypeDouble, or TypeBoolean.

  - CategoryComplex -- Complex elements represented by a generated
    class.  For these, the member variable ``name``, accessible
    through method ``getName`` will return the element/tag name
    and the member variable ``value``, accessible through method
    ``getValue`` will return the instance.

- ``getContenttype`` -- Returns one of TypeString, TypeInteger,
  TypeFloat, TypeDecimal, TypeDouble, or TypeBoolean.  Valid only
  when category is CategorySimple.

- ``getName`` -- For CategoryComplex, returns the name of the
  element.

- ``getValue`` -- Returns the value of this chunk of content. Its
  type depends on the value returned by ``getCategory`` and
  ``getContenttype``.

Note that elements defined with attributes but with *no* nested
sub-elements do not need to be declared as "mixed".  For these
elements, character data is captured in a member variable
``valueOf_``, and can be accessed with member methods
``getValueOf_`` and ``setValueOf_``.
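
As an illustration of walking mixed content, the sketch below
collects the character data from a list of ``MixedContainer``
instances.  The class and the constants are reduced, hypothetical
stand-ins defined here so that the example runs on its own; in real
code, use the ones defined in your generated module::

    # Placeholder constants; the generated module defines its own values.
    CategoryText, CategorySimple, CategoryComplex = 1, 2, 3
    TypeString = 1


    class MixedContainer(object):
        """Hypothetical, reduced stand-in for the generated class."""

        def __init__(self, category, content_type, name, value):
            self.category = category
            self.content_type = content_type
            self.name = name
            self.value = value

        def getCategory(self):
            return self.category

        def getContenttype(self):
            return self.content_type

        def getName(self):
            return self.name

        def getValue(self):
            return self.value


    def collect_text(items):
        """Concatenate the character data in a list of mixed content."""
        chunks = []
        for item in items:
            category = item.getCategory()
            if category == CategoryText:
                chunks.append(item.getValue())
            elif category == CategorySimple:
                # getContenttype() tells you how to interpret the value.
                chunks.append(str(item.getValue()))
            elif category == CategoryComplex:
                # getValue() is an instance of a generated class; process
                # it as your application requires.  Here we note its name.
                chunks.append('<%s/>' % item.getName())
        return ''.join(chunks)


    # Content corresponding to the <note> example shown earlier.
    content = [
        MixedContainer(CategoryText, None, '', 'This is a '),
        MixedContainer(CategorySimple, TypeString, 'bold', 'nice'),
        MixedContainer(CategoryText, None, '', ' comment.'),
    ]
    print(collect_text(content))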


.. _`anyAttribute`:

anyAttribute
------------

For elements that specify ``anyAttribute``, ``generateDS.py``
produces a class containing the following:

- A member variable ``anyAttributes_`` containing a Python
  dictionary.  After parsing an XML instance document, this
  dictionary will contain name-value pairs for any attributes in
  the instance document not explicitly defined for that element.

- The following getters and setters: ``getAnyAttributes_`` and
  ``setAnyAttributes_``.

- Code to export the attribute names and values stored in the
  dictionary.

- Code to parse attributes in addition to those explicitly defined
  for the element and store them in the dictionary.

**Note:** Attributes that are explicitly defined for an element
are *not* stored in the dictionary ``anyAttributes_``.

``generateDS.py`` ignores the ``processContents`` attribute on the
``anyAttribute`` element in the XML Schema.
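
The split between explicitly defined attributes and the contents of
``anyAttributes_`` can be pictured with the following sketch.  Class
``Item``, its ``size`` attribute, and the body of ``buildAttributes``
are hypothetical and simplified, and the standard library's
``xml.etree.ElementTree`` stands in for the parser used by your
generated module::

    import xml.etree.ElementTree as etree


    class Item(object):
        """Hypothetical stand-in for a class generated from an element
        that declares one attribute ("size") plus xs:anyAttribute."""

        def __init__(self):
            self.size = None
            self.anyAttributes_ = {}

        def getAnyAttributes_(self):
            return self.anyAttributes_

        def setAnyAttributes_(self, anyAttributes_):
            self.anyAttributes_ = anyAttributes_

        def buildAttributes(self, node):
            attrs = dict(node.attrib)
            # Explicitly defined attributes go to their own members ...
            self.size = attrs.pop('size', None)
            # ... and everything else lands in the dictionary.
            self.anyAttributes_ = attrs


    node = etree.fromstring('<item size="25" color="blue" shape="round"/>')
    item = Item()
    item.buildAttributes(node)
    print(item.size)
    print(item.getAnyAttributes_())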


User Methods
------------

``generateDS.py`` provides a mechanism that enables you to attach
user defined methods to specific generated classes.  In order to
do so, create a Python module containing specifications of those
methods and indicate that module on the command line with the
"--user-methods" option.  Example::

    python generateDS.py -f --super=people_sup -o people_sup.py -s people_sub.py --user-methods=/path/to/gends_user_methods.py people.xsd

The argument to the "--user-methods" (or "-u") command line option
is a path to a Python module.  It should include ".py" at the end.
Examples::

    -u gends_user_methods.py
    -u path/to/methods_module.py

The module specified with the "--user-methods" flag should define
a variable ``METHOD_SPECS`` which contains a list of instances of
a class that implements methods ``match_name`` and
``get_interpolated_source``.

See file `gends_user_methods.py`_ for an example of this
specification file and the definition of class ``MethodSpec``.
Read the comments in that file for more guidance.
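
For orientation, here is a reduced sketch of what such a
specification module might contain.  It is written for this document
and is not a copy of the sample file: the constructor parameters, the
use of a regular expression for matching, and the ``show`` method are
assumptions.  Only the names ``METHOD_SPECS``, ``match_name``, and
``get_interpolated_source`` come from the description above::

    import re


    class MethodSpec(object):
        """Reduced sketch of a method specification class."""

        def __init__(self, name, source, class_names):
            self.name = name
            self.source = source
            # class_names is a regular expression matched against the
            # name of each generated class.
            self.class_names_compiled = re.compile(class_names)

        def match_name(self, class_name):
            # True means: insert this method into the class being generated.
            return self.class_names_compiled.search(class_name) is not None

        def get_interpolated_source(self, values_dict):
            # "%(class_name)s" is filled in; "%%" collapses to "%".
            return self.source % values_dict


    show_spec = MethodSpec(
        name='show',
        source='''\
        def show(self, counter, depth):
            print('%%d. class: %(class_name)s  depth: %%d' %% (counter, depth))
    ''',
        class_names=r'^person$|^booster$',
    )

    METHOD_SPECS = [show_spec]

    print(show_spec.match_name('person'))
    print(show_spec.get_interpolated_source({'class_name': 'person'}))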

The ``member_data_items_`` class variable -- User methods,
especially those attached to more than one class, are likely to
need a list of the members in the current class.  Each generated
class has a class variable containing a list of specifications of
the members in the class.  Each item in this list is an instance
of class ``MemberSpec_``, which is defined near the top of your
generated (super-class) file.  Use the following to access the
information in each member specification:

- ``m.get_name()`` -- Returns the name of the member variable (a string).

- ``m.get_data_type()`` -- Returns the data type of the member
  variable (a string).  If the data type is a list, returns the
  terminal type, which is the last string in the list.  (Also see
  ``get_data_type_chain()``.)

- ``m.get_data_type_chain()`` -- Returns the data type of the
  member variable (a string or list).  When the data type is a
  simpleType that has another simpleType as its base or is a
  complexType that extends a simpleType, then the data type is a
  list of strings, for example::

      ['RelationType', 'xs:string']

  The last string in the list is the terminal type, usually a
  built-in simple type.  Note that ``m.get_data_type()`` returns
  the terminal (last) type.

- ``m.get_container()`` -- (an integer) Indicates whether the
  member variable is a single item or a list/container (i.e.
  generated from maxOccurs > 1): 0 indicates a single item; 1
  indicates a list.

- ``m.get_optional()`` -- (an integer) Returns 0 (zero) if the item
  is optional (defined with minOccurs="0"), else returns 1.
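
The following sketch shows how a user method might walk
``member_data_items_``.  The classes ``MemberSpec_`` and ``person``
defined here are reduced, hypothetical stand-ins so that the example
is self-contained; the real ``MemberSpec_`` is defined in your
generated module and carries more information::

    class MemberSpec_(object):
        """Hypothetical, reduced stand-in for the generated MemberSpec_."""

        def __init__(self, name, data_type, container=0):
            self.name = name
            self.data_type = data_type
            self.container = container

        def get_name(self):
            return self.name

        def get_data_type_chain(self):
            return self.data_type

        def get_data_type(self):
            # For a chain (a list), the terminal type is the last item.
            if isinstance(self.data_type, list):
                return self.data_type[-1]
            return self.data_type

        def get_container(self):
            return self.container


    class person(object):
        """Hypothetical stand-in for a generated class."""
        member_data_items_ = [
            MemberSpec_('name', 'xs:string', 0),
            MemberSpec_('relation', ['RelationType', 'xs:string'], 0),
            MemberSpec_('interest', 'xs:string', 1),
        ]

        def __init__(self, name, relation, interest):
            self.name = name
            self.relation = relation
            self.interest = interest


    def describe_members(obj):
        lines = []
        for m in obj.member_data_items_:
            value = getattr(obj, m.get_name())
            kind = 'list' if m.get_container() else 'single'
            lines.append('%s (%s, %s): %r' % (
                m.get_name(), m.get_data_type(), kind, value))
        return lines


    for line in describe_members(
            person('Alberta', 'friend', ['gardening', 'chess'])):
        print(line)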

There are a number of things of interest in this sample file
(`gends_user_methods.py`_):

- Although the ``MethodSpec`` class must be included in your user
  methods specification module, you can modify this class. For
  example, for special situations, it might be useful to modify
  either of the methods ``MethodSpec.match_name`` or
  ``MethodSpec.get_interpolated_source``.  These methods are
  called by ``generateDS.py``.  See comments on the definitions of
  these methods in `gends_user_methods.py`_.

- A method ``set_up`` is attached to the root class.  (This user
  method specification module is intended to be used with
  ``people.xsd/people.xml`` in the ``Demos/People`` directory.)
  It performs initialization, before the ``walk`` method is
  called.  In particular, ``set_up`` initializes a counter and
  imports the ``types`` module (which saves us from having to
  modify the generated code).

- The ``walk_and_update`` and ``walk_and_show`` methods provide an
  example showing how to walk the entire document object tree.

- The method ``walk_and_update`` uses the ``member_data_items_``
  class variable to obtain a list of members of the class.  It's a
  list of instances of class ``MemberSpec_``, which support the
  ``m.get_name()``, ``m.get_data_type()``, and ``m.get_container()``
  methods described above.

- In method ``walk_and_show``, note the use of ``getattr`` to
  retrieve the value of a member variable and the use of
  ``setattr`` to set the value of a member variable.

- The expression "%(class_name)s" is used to insert the class name
  into the generated source code.

- Notice how the ``types`` module is used to determine whether a
  member variable contains a simple type or an instance of a
  class.  Example::

      obj1 = getattr(self, member[0])
      if type(obj1) == types.InstanceType:
          ...

- In string formatting operations, you will need to use double
  percent signs in order to "pass through" a single percent sign,
  for example::

        print '%%d. class: %(class_name)s  depth: %%d' %% (counter, depth, )

  where the single percent signs are interpolated
  ("%(class_name)s" is replaced by the class name), and double
  percent signs are replaced by single percent signs ("%%d" becomes
  "%d").

Suggestion -- How to begin:

1. Make a copy of `gends_user_methods.py`_.

2. Modify the method specifications in that file.  Replace the
   source code and the class_name pattern in each specification.

3. Run ``generateDS.py`` with the "--user-methods" (or "-u") flag.

4. Inspect the user methods in the generated classes.

5. Test your generated code.

6. Repeat as necessary.


.. _`gends_user_methods.py`: https://bitbucket.org/dkuhlman/generateds/src


Overridable methods -- generatedssuper.py
-------------------------------------------

``generateDS.py`` generates calls to several methods that each have
a default implementation in a superclass.  The default superclass
with default implementations is included in the generated code.
The user can replace this default superclass by implementing a
module named ``generatedssuper.py`` containing a class named
``GeneratedsSuper``.

What to look for in the generated code:

- In the generated superclass file (generated with command line
  option "-o"), look for the import of module
  ``generatedssuper.py`` and the definition of the (default)
  class ``GeneratedsSuper``.

- Also look for calls to methods ``gds_format_integer()``,
  ``gds_format_float()``, ``gds_format_double()``, etc.

To view the default implementation of class GeneratedsSuper, look in
a generated superclass module (one generated by the "-o" command
line option with generateDS.py). The default definition of class
GeneratedsSuper is near the top of a generated module.

If you wish to modify the behavior of any of these methods, see
below for instructions on how to do so.

**Caution:** Overriding any of the ``*_format_*()`` methods
enables you to export invalid XML.  So, use at your own risk, test
before using, etc.

How to modify the behavior of the default methods:

- Implement methods that override the default methods.

- Look at the definition of the default methods in class
  ``GeneratedsSuper`` in order to learn the signature of the
  methods in that class.

- Look at the definition of the default methods to determine what
  they do and what type of value they return, then do something
  similar in your overriding method.

- Search for and look at the call to the method you are interested
  in modifying (for example ``gds_format_string``) to learn where
  and when it is used and for what.

Where to put (implement) methods that override the default methods
-- You can place the implementations of methods that override the
default methods in the following places:

- In a class named ``GeneratedsSuper`` in a separate module named
  ``generatedssuper``.  Since this class would replace the
  default implementations, you should provide implementations of
  all the default methods listed above in that class.  To create
  your own version, copy and paste the default implementation of
  class ``GeneratedsSuper`` from your generated module into a file
  named ``generatedssuper.py``, then modify that.

- In individual generated (super) classes (the ones generated with
  the "-o" command line option) using the `User Methods`_ feature.

- In individual classes in a subclass module generated with the "-s"
  command line option.

  If you want to use the same method in more than one generated
  subclass, then you might consider putting that method in a
  "mix-in" class and inheriting that method in the generated
  subclass.  With this approach, you must put the mix-in class
  containing your methods before the regular superclass, so that
  Python will find your custom methods before the default ones.
  That is, you must use::

      class clientSub(MySpecialMethods, supermod.client):

  not::

      class clientSub(supermod.client, MySpecialMethods):
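
The effect of the base class order can be checked with a small,
self-contained experiment.  The classes below are hypothetical
stand-ins (``client`` plays the role of ``supermod.client``); the
behavior shown follows from Python's method resolution order::

    class client(object):
        """Hypothetical stand-in for a generated superclass."""

        def gds_format_string(self, input_data, input_name=''):
            return input_data


    class MySpecialMethods(object):
        """Mix-in holding methods shared by several subclasses."""

        def gds_format_string(self, input_data, input_name=''):
            return '[[%s]]' % input_data


    class clientSub(MySpecialMethods, client):
        """Mix-in listed first: its methods are found first."""


    class clientSubWrongOrder(client, MySpecialMethods):
        """Mix-in listed last: the default methods are found first."""


    print(clientSub().gds_format_string('abc'))            # [[abc]]
    print(clientSubWrongOrder().gds_format_string('abc'))  # abc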

If you choose to implement module ``generatedssuper``, here are
a few instructions and suggestions:

- Implement a module ``generatedssuper.py`` containing a definition
  of a class ``GeneratedsSuper``.  You can copy and paste the default
  implementation from a superclass module generated with the -o
  command line option for ``generateDS.py``.

- Put this module in a location where it can be imported when your
  generated code is run.  Note the ``try:except:`` block in your
  generated superclass module that attempts to import it and that
  uses the default implementation of ``GeneratedsSuper`` when it
  cannot.

- An easy way to begin is to copy the default definition of the
  class ``GeneratedsSuper`` from a superclass module generated with
  the "-o" command line option into a module named
  ``generatedssuper.py``. Then modify your (copied) implementation.

- To implement a method that does a task specific to a particular
  class or a particular member of a class, do something like the
  following::

      def gds_format_string(self, input_data, input_name=''):
          if self.__class__.__name__ == 'person':
              return '[[%s]]' % input_data
          else:
              return input_data

  or::

      def gds_format_string(self, input_data, input_name=''):
          if self.__class__.__name__ == 'booster' and input_name == 'lastname':
              return '[[%s]]' % input_data
          else:
              return input_data

  Alternatively, to attach a method to a specific class, use the
  `User Methods`_ feature or a generated subclass module (command
  line option "-s"), as described above.

- You can also add additional, new methods that you call (for
  example, in subclasses that you generate with the "-s" command
  line option for ``generateDS.py``).
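
The fallback described above can be pictured with the following
sketch.  It illustrates the ``try:except:`` pattern and is not a copy
of the generated code; the fallback class is reduced to a single
method here, whereas a generated module defines many more::

    try:
        # Used when generatedssuper.py can be found on sys.path.
        from generatedssuper import GeneratedsSuper
    except ImportError:
        # Otherwise fall back to a built-in default (abbreviated
        # for this sketch).
        class GeneratedsSuper(object):
            def gds_format_string(self, input_data, input_name=''):
                return input_data

    # Shows which module supplied the class that is actually in use.
    print(GeneratedsSuper.__module__)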


The element name to class name dictionary
-------------------------------------------

``generateDS.py`` automatically generates a dictionary that maps
element/complexType names to the names of the class generated for
that complexType definition.  This dictionary is named
``GDSClassesMapping``.  You will find it in the module generated
with the "-o" option.
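
A lookup through this dictionary might look like the following
sketch.  The mapping and the class below are stand-ins for what a
generated module would contain, and ``find_class`` is a hypothetical
helper, not part of the generated code.  The helper accepts either a
class object or a class name as the dictionary value, because this
section does not pin down which of the two a given version
generates::

    class personType(object):
        # Stand-in for a generated class.
        pass

    # Stand-in for the dictionary in the module generated with "-o".
    GDSClassesMapping = {'person': personType}

    def find_class(tag, mapping, namespace):
        """Return the class for an element tag, or None if unknown."""
        entry = mapping.get(tag, tag)
        if isinstance(entry, str):
            # A name: look the class up in the given namespace.
            return namespace.get(entry)
        return entry

    print(find_class('person', GDSClassesMapping, globals()))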


Adding custom exported attributes and namespace prefix definitions
--------------------------------------------------------------------

You can add additional attributes to exported XML content by (1)
providing a module named ``generatedsnamespaces.py``; (2) placing
that module somewhere so that it can be imported when you "run" your
generated module; and (3) including in ``generatedsnamespaces.py`` a
global variable named ``GenerateDSNamespaceDefs`` whose value is a
Python dictionary.  The keys in this dictionary should be element
type names in the generated module.  And the values should be text
strings that are attributes to be added to exported elements of that
type.

Here is an example::

    # file: generatedsnamespaces.py

    GenerateDSNamespaceDefs = {
        "A1ElementType": 'xmlns:abc="http://www.abc.com/namespace_a1"',
        "A2ElementType": 'xmlns:abc="http://www.abc.com/namespace_a2"',
    }

Notes:

- While the original intention of this facility was to enable the
  user to add XML namespace prefix definitions to the XML content in
  exported files, you can use it to add other attribute definitions
  as well.

- If you find that ``generateDS.py`` is adding a specific namespace
  prefix definition to many exported XML elements and you want to
  suppress this behavior, take a look at the ``--no-namespace-defs``
  command line option.  In particular, this command line option may
  be useful when used together with the capability described in this
  section (``generatedsnamespaces.py``).
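
To make the effect of ``GenerateDSNamespaceDefs`` concrete, the
following sketch shows how such a dictionary could be consulted when
the start tag of an element is written.  It is an illustration only:
``start_tag`` is a hypothetical helper, and the generated export code
is more involved::

    GenerateDSNamespaceDefs = {
        "A1ElementType": 'xmlns:abc="http://www.abc.com/namespace_a1"',
    }

    def start_tag(element_name, type_name, namespace_defs):
        """Build a start tag, adding any extra attributes for the type."""
        extra = namespace_defs.get(type_name, '')
        if extra:
            return '<%s %s>' % (element_name, extra)
        return '<%s>' % element_name

    print(start_tag('a1', 'A1ElementType', GenerateDSNamespaceDefs))
    print(start_tag('b1', 'B1ElementType', GenerateDSNamespaceDefs))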


Namespace prefixes, xsi:type attributes, and abstract extended types
----------------------------------------------------------------------

In some cases where an instance of a type derived from an abstract
type is exported and the type of the instance is specified with the
attribute "xsi:type", you may need to specify the prefix for the
type.  Here are notes and an example on how to do that from the
comments in a module generated by ``generateDS.py``::

    # Additionally, the generatedsnamespaces module can contain a python
    # dictionary named GenerateDSNamespaceTypePrefixes that associates element
    # types with the namespace prefixes that are to be added to the
    # "xsi:type" attribute value.  See the exportAttributes method of
    # any generated element type and the generation of "xsi:type" for an
    # example of the use of this table.
    # An example table:
    #
    #     # File: generatedsnamespaces.py
    #
    #     GenerateDSNamespaceTypePrefixes = {
    #         "ElementtypeC": "aaa:",
    #         "ElementtypeD": "bbb:",
    #     }


"One Per" -- generating separate files from imported/included schemas
=======================================================================

The ``generateDS.py`` project provides support for two approaches to
this task:

- The first (`Approach 1 -- Command line option
  --one-file-per-xsd`_, below) is likely to be easier to use, but if
  it does not work for you as is, it is very difficult to customize.

- The second method (`Approach 2 -- Extraction and generation
  utilities`_, below) may require a little more work and
  understanding, but offers more options and customization, and,
  since the scripts that implement it are short and rather simple,
  may be easier to customize or even re-write for your specific
  needs.


Approach 1 -- Command line option --one-file-per-xsd
----------------------------------------------------------

The ``--one-file-per-xsd`` command line option enables you to
generate a separate Python module for each XML schema that is
imported or included (using ``<xs:import>`` or ``<xs:include>``) by
a "master" schema.  In your Python application, these modules can
then be imported separately.  Alternatively, these modules can be
placed in a Python package (a directory containing a file named
"__init__.py").  See
http://docs.python.org/2/tutorial/modules.html#packages for more on
Python packages.

Here is a sample use::

    $ ../generateDS.py --one-file-per-xsd --output-directory="OnePer" --module-suffix="One" one_per.xsd

The above command does the following:

- It generates one Python module for each XML schema that is
  included/imported by ``one_per.xsd``.

- It places the generated output files in the directory ``OnePer``.

- It adds "One" as a suffix to the name of each generated module.

Here are a few hints, guidelines, and suggestions:

- At least one element definition in an included/imported module
  must be a root element definition in order to cause a
  module to be generated for that schema.  In other words, the
  module must contain an element definition of the form::

      <xs:element name="Sample" type="sampleType" />

- You may want to write a separate "master" schema that includes
  each of the schemas for which you want to generate a separate
  Python module.

- Use the ``--output-directory=<directory>`` command line option to
  tell ``generateDS.py`` to generate the Python modules in a
  specific directory.  The directory must already exist.

- Use the ``--module-suffix=<suffix>`` command line option to add a
  specific suffix to each module name (the part immediately before
  the extension).  For example, the option ``--module-suffix=Abc``
  causes ``generateDS.py`` to generate a file named "schema1Abc.py"
  instead of "schema1.py".

- If you want to import files from the output directory and it is
  not in ``sys.path``, add a file named "__init__.py" to that
  directory.  The existence of the file ``__init__.py`` turns the
  directory into a Python package.
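
The directory preparation described in the hints above can be
scripted.  The following sketch creates the output directory, adds
an ``__init__.py`` so that the directory can be imported as a
package, and assembles the command line from the sample use above.
Both helper names are hypothetical, and the sketch only builds the
command; it does not run ``generateDS.py``::

    import os

    def prepare_output_dir(path):
        """Create the output directory and make it a Python package."""
        os.makedirs(path, exist_ok=True)
        init_path = os.path.join(path, '__init__.py')
        if not os.path.exists(init_path):
            # An empty __init__.py is enough to mark a package.
            with open(init_path, 'w'):
                pass
        return init_path

    def build_command(schema, output_dir, suffix):
        """Assemble the argument list for a --one-file-per-xsd run."""
        return [
            'generateDS.py',
            '--one-file-per-xsd',
            '--output-directory=%s' % output_dir,
            '--module-suffix=%s' % suffix,
            schema,
        ]

    print(' '.join(build_command('one_per.xsd', 'OnePer', 'One')))

Call ``prepare_output_dir('OnePer')`` before running the command,
since the output directory must already exist.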


Approach 2 -- Extraction and generation utilities
---------------------------------------------------

The ``generateds/utils`` subdirectory contains two utility scripts
that may help with this task.  The procedure is as follows:

1. First, use ``utils/collect_schema_locations.py`` to collect a set
   of directives, one for each (included) schema and each module to
   be generated.  This utility writes out a JSON file that contains
   the directives to be used in the next step.

2. Next, use ``utils/batch_generate.py`` to generate one module (or
   perhaps two modules, see below) for each directive in that JSON
   file.

Each of these modules gives a reasonable amount of usage information
in response to the ``--help`` command line option.

A few hints and suggestions:

- After generating the JSON directives file you can modify it with
  your text editor.  For example, (1) you can add the name of
  sub-class modules to be generated by ``generateDS.py``; (2) you
  can specify command line options to be used by ``generateDS.py``
  when generating specific modules; and (3) you can add new
  directives to generate additional modules.

- If you find yourself typing the same command line options to
  ``utils/batch_generate.py`` over and over, there is a facility to
  put command line options that have long names (i.e., not one
  character names) into a configuration file.  The usage information
  produced by ``utils/batch_generate.py --help`` shows an example.
  Then use ``utils/batch_generate.py --config=myoptions.config ...``
  to feed this configuration file to ``utils/batch_generate.py``.
  The options in this configuration file can be overridden by those
  entered on the command line.

- The JSON directives file can contain comments, even though this is
  not part of the JSON standard.  A comment is any line that begins
  with "//" where the "//" is preceded only by white space
  characters.  ``utils/batch_generate.py`` strips these lines out
  before parsing the JSON file.  However, if you plan to process
  this JSON file with other processes, you will likely either not
  want to add comments or plan to pre-process them in some way.
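
If another process needs to read a directives file that contains
such comments, the pre-processing can be as small as the following
sketch.  This is not the code in ``utils/batch_generate.py``; the
function name and the keys in the sample are made up for the
illustration::

    import json

    def load_directives(text):
        """Parse JSON text after dropping whole-line "//" comments."""
        kept = [
            line for line in text.splitlines()
            if not line.lstrip().startswith('//')
        ]
        return json.loads('\n'.join(kept))

    sample = '''
    // Directives for one schema (hypothetical keys).
    {
        "directives": [
            // A comment preceded only by white space.
            {"schema": "one_per.xsd", "module": "one_per.py"}
        ]
    }
    '''
    print(load_directives(sample))

Only lines whose first non-blank characters are "//" are dropped, so
a "//" inside a string value (a URL, for example) is left alone.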


How to modify the generated code
================================

This section attempts to explain how to modify and add features to
the generated code.


Adding features to class definitions
------------------------------------

You can add new member definitions to a generated class. Look at
the 'export' and 'exportLiteral' member functions for examples of
how to access member variables and how to walk nested
sub-elements.

Here are interesting places to look in each class definition:

- The 'export' and 'exportLiteral' methods -- These methods walk
  the object tree. You can consider copying and renaming them to
  produce other tree walking methods.

- The 'build' method -- This method extracts information from the
  minidom node. You can inspect the 'build' methods to learn how
  to extract information for other purposes.

And, if you need methods that are common to and shared by several
of the generated subclasses, you can put them in a new class and
add that class to the superclass list for each of your subclasses.

Although you can add your own methods to the generated
superclasses, I recommend that you add methods to the generated
subclasses in the subclass module generated with the "-s" command
line option, and then edit the subclass module in order to build
your application. Why?

- The superclasses are cluttered with other code. Using the
  subclass file enables you to keep your application code
  separate.

- By putting your application code in the subclass file, you will
  be able to reuse the superclass file. You can generate multiple
  subclass files from the same XML Schema definition file. Each of
  these subclass files can import the same superclass file.

Here are some alternatives to using the subclass file:

- Add more than one method to each generated (super-)class. Each
  method implements a separate task or "application". If the
  number of tasks grows, this will create maintenance
  difficulties, however.

- Re-generate multiple (super-)class files. Add methods to the
  classes in these separate files to implement different tasks.
  This of course will not work well if you have had to modify the
  parser, for example, since generating the file.


Examples and demonstrations
===========================

Under the directory Demos are several examples:

- Demos/People provides a simple demonstration of generating
  Python data structures from XML Schema.

- Demos/Outline contains another simple example. Also provided (in
  outline_extended.py) is an example of extending and adding to
  the generated code. Look at the show method in classes outline
  and node in file outline_extended.py. This extension walks the
  outline tree and writes out an outline.

Suggested uses:

- Anything that requires a tree walk of the XML document
  structure.

- The implementation of filters and transformations on XML
  documents. The following paper discusses and compares this
  technique with the use of XSLT: `XSLT and generateDS --
  Analysis, Comparison, and Evaluation --
  http://www.reifywork.com/xsltvsgenerateds.html
  <http://www.reifywork.com/xsltvsgenerateds.html>`_.

- Anything that requires a *customized* tree walk of the XML
  document. Because you can add methods to the generated classes
  containing explicit control logic, the order in which nodes of
  the parsed XML document are visited is under your control.
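
A customized walk of this kind can be sketched as follows.  The class
``node`` is a stand-in for a generated class (its members and getter
names are assumptions made for this illustration), and ``walk`` is a
method added in a subclass.  It visits children in reverse order and
skips any node whose label starts with an underscore::

    class node(object):
        # Stand-in for a generated class with a repeated child element.
        def __init__(self, label, children=None):
            self.label = label
            self.children = children or []

        def get_label(self):
            return self.label

        def get_children(self):
            return self.children


    class nodeSub(node):
        def walk(self, visit, level=0):
            # Explicit control logic: prune this node and its subtree.
            if self.get_label().startswith('_'):
                return
            visit(level, self.get_label())
            # Visit the children last-to-first.
            for child in reversed(self.get_children()):
                child.walk(visit, level + 1)


    tree = nodeSub('root', [
        nodeSub('a', [nodeSub('a1')]),
        nodeSub('_hidden'),
        nodeSub('b'),
    ])
    seen = []
    tree.walk(lambda level, label: seen.append((level, label)))
    print(seen)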


Django -- Generating Models and Forms
---------------------------------------

``generateDS.py`` can be used to generate Django models and Django
forms that represent the data structures defined in an XML Schema.

**Note:** In order to use this capability, you must obtain the
"source" distribution of ``generateDS.py``.  You can do this either
(1) by downloading ``generateDS-x.xxy.tar.gz`` from the Python
Package Index or (2) by downloading the distribution from the
project repository at https://sourceforge.net/projects/generateds/.
In particular,
installing ``generateDS.py`` using ``pip`` does not give you all the
files you need in order to use this capability.

*Note:* You only need to obtain the source distribution (so that you
can copy the files in the ``django/`` directory, for example); you
do not necessarily need to install from it.  If you have already
installed ``generateDS.py`` using ``pip`` or ``easy_install``, you
do not need to re-install from the source tree.

There are support files in the ``django`` directory in the
source distribution (but *not* in the version installed using ``pip``
or ``easy_install``).

Here is an overview of the process:

- Step 1.  Generate bindings -- Run ``generateDS.py``.

- Step 2.  Extract simpleType definitions from schema -- Run
  ``gends_extract_simple_types.py``.

- Step 3.  Generate models.py and forms.py -- Run
  ``gends_generate_django.py``.

The script ``gends_run_gen_django.py`` performs these three steps.

In order to use the script ``gends_run_gen_django.py``, you may need
to tell it where the ``generateDS.py`` script is located.  If so,
use the "-p" command line option.  For more information, do::

    python gends_run_gen_django.py --help


How to generate Django models and forms
.........................................

**Warning:** Running this script attempts to over-write the
following files in the current directory:

- <schema>lib.py

- generateds_definedsimpletypes.py

- models.py

- forms.py

To over-write these files, use the ``-f`` (or ``--force``) command
line option.

So, it's a good idea to create a separate, new directory in which
to do the following work.

Now, follow these steps:

1. Create an empty directory::

       $ mkdir WorkDir
       $ cd WorkDir

2. Copy the files from the subdirectory ``django/`` of the source
   distribution of ``generateDS.py`` to the current directory::

       $ cp /my_sources/generateDS-n.nn/django/* .

3. Copy the file ``process_includes.py`` in the distribution to the
   current directory::

       $ cp /my_sources/generateDS-n.nn/process_includes.py .

4. Run the following::

       $ ./gends_run_gen_django.py myschema.xsd

   There are additional command line options for
   ``gends_run_gen_django.py``.  For help, run
   ``$ python gends_run_gen_django.py --help``.

5. Copy the generated files ``models.py`` and ``forms.py`` to your
   Django application.


How it works
..............

Here are a few notes that might be helpful if and when you need to
do some debugging or extend the current capabilities or write a new
"meta-app" that uses the same approach but does something new and
even entirely different.

``gends_run_gen_django.py`` uses ``Popen`` to run other scripts.
Specifically, it runs ``generateDS.py``,
``gends_extract_simple_types.py``, and
``gends_generate_django.py``.

``gends_extract_simple_types.py`` scans the XML schema doc and
extracts ``simpleType`` definitions.  It writes descriptors of those
definitions to the file ``generateds_definedsimpletypes.py``.

``gends_generate_django.py`` generates the ``models.py`` and
``forms.py`` files by calling the class method
``generate_model_`` for each class in the list of classes in
the variable ``__all__`` in the generated bindings.  ``__all__`` is
defined at the bottom of the generated bindings module.

The class method ``generate_model_`` (along with some tables for
predefined simple types etc) is defined in ``generatedssuper.py``,
which is imported by the generated bindings module.  We are
overriding the default version of that class.  ``generate_model_``
is defined in the class ``GeneratedsSuper``, which is used as the
root super class of all generated data representation classes.


Sample code and extensions
==========================

Capturing xs:date elements as dates
-----------------------------------

The following extension employs a user method (see `User
Methods`_) in order to capture elements defined as xs:date as date
objects.

Thanks to Lars Ericson for this code and explanation.

By default, ``generateDS.py`` treats elements declared as type
xs:date as though they are strings.

To get xs:dates stored as dates, in your local copy, add the
following user method (`User Methods`_), a slight modification of
the sample (in `gends_user_methods.py`_)::

    method1 = MethodSpec(name='walk_and_update',
        source='''\
        def walk_and_update(self):
            members = %(class_name)s.member_data_items_
            for member in members:
                obj1 = getattr(self, member.get_name())
                if member.get_data_type() == 'xs:date':
                    newvalue = date_calcs.date_from_string(obj1)
                    setattr(self, member.get_name(), newvalue)
                elif member.get_container():
                    # Repeated element: recurse into each child object.
                    for child in obj1:
                        if hasattr(child, 'walk_and_update'):
                            child.walk_and_update()
                else:
                    # Single element: recurse if it is a generated object.
                    if hasattr(obj1, 'walk_and_update'):
                        obj1.walk_and_update()
    ''',
        class_names=r'^.*$',
        )

Then, define ``date_calcs.py`` as::

    #!/usr/bin/env python
    # -*- mode: pymode; coding: latin1; -*-

    import datetime

    # 2007-09-01

    # test="2007-09-01"
    # print test
    # print date_from_string(test)

    def date_from_string(date_str):
        # Slice the year, month, and day out of "YYYY-MM-DD".
        year = int(date_str[:4])
        month = int(date_str[5:7])
        day = int(date_str[8:10])
        dt = datetime.date(year, month, day)
        return dt
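
As an alternative sketch, the same conversion can be written with
``datetime.datetime.strptime`` from the standard library, which also
rejects malformed input.  The slice to the first ten characters is an
assumption made here so that a trailing time zone designator (for
example, "Z") is ignored::

    import datetime

    def date_from_string(text):
        """Convert the leading YYYY-MM-DD of an xs:date value to a date."""
        return datetime.datetime.strptime(text[:10], '%Y-%m-%d').date()

    print(date_from_string('2007-09-01'))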

And, add a "str" here in generateDS.py::

    def quote_xml(inStr):
        s1 = str(inStr)
        s1 = s1.replace('&', '&amp;')
        s1 = s1.replace('<', '&lt;')
        s1 = s1.replace('"', '&quot;')
        return s1

Also, add these imports to TEMPLATE_HEADER in generateDS.py::

    import date_calcs
    import types


Limitations of generateDS
=========================

XML Schema limitations
----------------------

There are things in XML Schema that are not supported. You will
have to use a restricted sub-set of XML Schema to define your data
structures. See above for supported features. See people.xsd and
people.xml for examples.

And, then, try it on your XML Schema, and let me know about what
does not work.


Includes -- The XML schema xs:include and xs:import elements
==============================================================

While ``generateDS.py`` itself does not process XML Schema
``include`` elements, the distribution provides a script
``process_includes.py`` that can be used as a preprocessor.
``process_includes.py`` is called automatically and by default by
``generateDS.py``.  This behavior can be turned off with the
``--no-process-includes`` command line option.  **However**, doing
so is not advised, because unexpected and undesirable behavior has
been detected when this is done.  Instead consider using the
``--no-collect-includes`` and ``--no-redefine-groups`` command line
options to selectively turn off specific processing done in
``process_includes.py``.

The ``process_includes.py`` script scans your XML Schema document
and, recursively, documents that are included looking for
``include`` elements; it inserts all content into a single document,
which it writes out.
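
The idea can be illustrated with the following simplified sketch.  It
is not the implementation in ``process_includes.py``, which does
considerably more; the function name is hypothetical, only
``xs:include`` is handled, and the schemas are taken from a
dictionary rather than read from files::

    import xml.etree.ElementTree as ET

    XS = 'http://www.w3.org/2001/XMLSchema'
    INCLUDE = '{%s}include' % XS

    def inline_includes(name, sources, seen=None):
        """Return the root of schema `name` with each xs:include
        replaced, recursively, by the content of the included schema.

        `sources` maps a schemaLocation to schema text; `seen` keeps a
        schema from being inserted more than once.
        """
        if seen is None:
            seen = set()
        seen.add(name)
        root = ET.fromstring(sources[name])
        merged = []
        for child in list(root):
            if child.tag != INCLUDE:
                merged.append(child)
                continue
            location = child.get('schemaLocation')
            if location not in seen:
                included = inline_includes(location, sources, seen)
                merged.extend(list(included))
        # Replace the children of the root with the merged content.
        root[:] = merged
        return root

    sources = {
        'main.xsd': (
            '<xs:schema xmlns:xs="%s">'
            '<xs:include schemaLocation="types.xsd"/>'
            '<xs:element name="person" type="personType"/>'
            '</xs:schema>' % XS),
        'types.xsd': (
            '<xs:schema xmlns:xs="%s">'
            '<xs:complexType name="personType"/>'
            '</xs:schema>' % XS),
    }
    root = inline_includes('main.xsd', sources)
    print([child.get('name') for child in root])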

Here are samples of how you might use ``process_includes.py``, if
your schema contains ``include`` elements.

Example 1::

    $ python process_includes.py definitions1.xsd | \
          python generateDS.py -f --super=task1sup -o task1sup.py -s task1sub.py -

Example 2::

    $ python process_includes.py definitions1.xsd tmp.xsd
    $ python generateDS.py -f --super=task1sup -o task1sup.py -s task1sub.py tmp.xsd

For help and usage information, run the following::

    $ python process_includes.py --help


Processing RelaxNG schemas
============================

RelaxNG is a schema definition language and is an alternative to XML
Schema.  For more information on RelaxNG, see: http://relaxng.org/.

``generateDS.py`` does not understand or process RelaxNG schemas.
However, the ``trang`` application is able to convert RelaxNG into
XML Schemas.  I've tried it, and was able to convert a relatively
small RelaxNG schema into an XML Schema, and then use
``generateDS.py`` to generate a bindings module from that.  I have
not done any serious testing to determine how complete or accurate
this conversion is.  ``trang`` is written in Java, so you will need
Java and the JDK installed in order to compile and use it.  For what
it's worth, here are the steps I followed in order to use ``trang``:

1. Clone trang from https://github.com/relaxng/jing-trang.git,
   and then build it with::

       $ git clone https://github.com/relaxng/jing-trang.git
       $ cd jing-trang
       # set JAVA_HOME to location of JDK
       $ export JAVA_HOME=/usr/lib/jvm/jdk-8-oracle-arm32-vfp-hflt
       $ ./ant

2. Run ``trang`` with the following::

       $ cd jing-trang/build
       $ java -jar trang.jar -I rng -O xsd test01.rng test01.xsd

   The above command produced test01.xsd.

3. Run the resulting XML Schema (``test01.xsd``, in the above case)
   through ``generateDS.py``::

       $ generateDS.py -o test01sup.py -s test01sub.py test01.xsd

You can learn more about ``trang`` at the RelaxNG web site and here:
https://github.com/relaxng/jing-trang


Acknowledgments
=================

Many thanks to those who have used ``generateDS.py`` and have
contributed their comments and suggestions.  These comments have
been valuable both in teaching me about things I needed to know in
order to continue work and in motivating me to do the work in the
first place.

And, a special thanks to those of you who have contributed patches
for fixes and new features.  Recent help has been provided by the
following among others:

- Chris Allan -- for several feature additions.


See also
========

`Python`_: The Python home page.

.. _`Python`: http://www.python.org


`Dave's Page`_: My home page, which contains more Python stuff.

.. _`Dave's Page`: http://www.reifywork.com

.. _`ElementTree`: http://effbot.org/zone/element-index.htm

.. _`lxml`: http://lxml.de/index.html

.. vim:ft=rst: