Author: | Dave Kuhlman |
---|---|
Address: | dkuhlman (at) davekuhlman (dot) org http://www.reifywork.com |
Revision: | 1.1a |
Date: | October 05, 2014 |
Copyright: | Copyright (c) 2003 Dave Kuhlman. All Rights Reserved. This software is subject to the provisions of the MIT License http://www.opensource.org/licenses/mit-license.php. |
---|---|
Abstract: | This document is a self-learning document for a second course in Python programming. This course contains discussions of several advanced topics that are of interest to Python programmers. |
This document is intended as notes for a course on (slightly) advanced Python topics.
For more help on regular expressions, see:
A regular expression pattern is a sequence of characters that will match sequences of characters in a target.
The patterns or regular expressions can be defined as follows:
Because of the use of backslashes in patterns, you are usually better off defining regular expressions with raw strings, e.g. r"abc".
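As a small sketch of why raw strings help (written for Python 3; the sample target text is made up for illustration): in an ordinary string literal, `'\b'` is the backspace character, whereas in a raw string the backslash survives and the `re` module sees the word-boundary escape.

```python
import re

target = 'a cat sat on the catalog'   # hypothetical sample text

# In a normal string literal, '\b' is the backspace character (one char).
plain = '\bcat\b'
# In a raw string the backslash is kept, so re sees \b as a word boundary.
raw = r'\bcat\b'

plain_len = len(plain)                 # backspace + 'cat' + backspace
raw_len = len(raw)                     # each \b is two characters
plain_hits = re.findall(plain, target) # looks for literal backspaces
raw_hits = re.findall(raw, target)     # looks for the whole word 'cat'

print(plain_len, raw_len)
print(plain_hits, raw_hits)
```

The non-raw pattern finds nothing because the target contains no backspace characters, while the raw pattern matches the stand-alone word only.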
When a regular expression is to be used more than once, you should consider compiling it. For example:
    import sys, re

    pat = re.compile('aa[bc]*dd')

    while 1:
        line = raw_input('Enter a line ("q" to quit):')
        if line == 'q':
            break
        if pat.search(line):
            print 'matched:', line
        else:
            print 'no match:', line
Comments:
Use match() to match at the beginning of a string (or not at all).
Use search() to search a string and match the first string from the left.
Here are some examples:
    >>> import re
    >>> pat = re.compile('aa[0-9]*bb')
    >>> x = pat.match('aa1234bbccddee')
    >>> x
    <_sre.SRE_Match object at 0x401e9608>
    >>> x = pat.match('xxxxaa1234bbccddee')
    >>> x
    >>> type(x)
    <type 'NoneType'>
    >>> x = pat.search('xxxxaa1234bbccddee')
    >>> x
    <_sre.SRE_Match object at 0x401e9608>
Notes:
When a match or search is successful, it returns a match object. When it fails, it returns None.
You can also call the corresponding functions match and search in the re module, e.g.:
    >>> x = re.search(pat, 'xxxxaa1234bbccddee')
    >>> x
    <_sre.SRE_Match object at 0x401e9560>
For a list of functions in the re module, see Module Contents -- http://docs.python.org/library/re.html#module-contents.
Match objects enable you to extract matched sub-strings after performing a match. A match object is returned by a successful match or search. The parts of the target available from the match object are the portions matched by groups in the pattern, that is, by the parts of the pattern enclosed in parentheses. For example:
    In [69]: mo = re.search(r'height: (\d*) width: (\d*)', 'height: 123 width: 456')

    In [70]: mo.groups()
    Out[70]: ('123', '456')
Here is another example:
    import sys, re

    Targets = [
        'There are <<25>> sparrows.',
        'I see <<15>> finches.',
        'There is nothing here.',
        ]

    def test():
        pat = re.compile('<<([0-9]*)>>')
        for line in Targets:
            mo = pat.search(line)
            if mo:
                value = mo.group(1)
                print 'value: %s' % value
            else:
                print 'no match'

    test()
When we run the above, it prints out the following:
    value: 25
    value: 15
    no match
Explanation:
In addition, you can:
Use "values = mo.groups()" to get a tuple containing the strings matched by all groups.
Use "mo.expand()" to interpolate the group values into a string. For example, "mo.expand(r'value1: \1 value2: \2')" inserts the values of the first and second groups into a string. If the first group matched "aaa" and the second matched "bbb", then this call would produce "value1: aaa value2: bbb". Here is an interactive example:
    In [76]: mo = re.search(r'h: (\d*) w: (\d*)', 'h: 123 w: 456')

    In [77]: mo.expand(r'Height: \1 Width: \2')
    Out[77]: 'Height: 123 Width: 456'
You can extract multiple items with a single search. Here is an example:
    import sys, re

    pat = re.compile('aa([0-9]*)bb([0-9]*)cc')

    while 1:
        line = raw_input('Enter a line ("q" to quit):')
        if line == 'q':
            break
        mo = pat.search(line)
        if mo:
            value1, value2 = mo.group(1, 2)
            print 'value1: %s value2: %s' % (value1, value2)
        else:
            print 'no match'
Comments:
A simple way to perform multiple replacements using a regular expression is to use the re.subn() function. Here is an example:
    In [81]: re.subn(r'\d+', '***', 'there are 203 birds sitting in 2 trees')
    Out[81]: ('there are *** birds sitting in *** trees', 2)
For more complex replacements, use a function instead of a constant replacement string:
    import re

    def repl_func(mo):
        s1 = mo.group(1)
        s2 = '*' * len(s1)
        return s2

    def test():
        pat = r'(\d+)'
        in_str = 'there are 2034 birds in 21 trees'
        out_str, count = re.subn(pat, repl_func, in_str)
        print 'in: "%s"' % in_str
        print 'out: "%s"' % out_str
        print 'count: %d' % count

    test()
And when we run the above, it produces:
    in: "there are 2034 birds in 21 trees"
    out: "there are **** birds in ** trees"
    count: 2
Notes:
Here is an even more complex example -- You can locate sub-strings (slices) of a match and replace them:
    import sys, re

    pat = re.compile('aa([0-9]*)bb([0-9]*)cc')

    while 1:
        line = raw_input('Enter a line ("q" to quit): ')
        if line == 'q':
            break
        mo = pat.search(line)
        if mo:
            value1, value2 = mo.group(1, 2)
            start1 = mo.start(1)
            end1 = mo.end(1)
            start2 = mo.start(2)
            end2 = mo.end(2)
            print 'value1: %s start1: %d end1: %d' % (value1, start1, end1)
            print 'value2: %s start2: %d end2: %d' % (value2, start2, end2)
            repl1 = raw_input('Enter replacement #1: ')
            repl2 = raw_input('Enter replacement #2: ')
            newline = (line[:start1] + repl1 + line[end1:start2] +
                repl2 + line[end2:])
            print 'newline: %s' % newline
        else:
            print 'no match'
Explanation:
Alternatively, use "mo.span(1)" instead of "mo.start(1)" and "mo.end(1)" in order to get the start and end of a sub-match in a single operation. "mo.span(1)" returns a tuple: (start, end).
Put together a new string with string concatenation from pieces of the original string and replacement values. You can use string slices to get the sub-strings of the original string. In our case, the following gets the start of the string, adds the first replacement, adds the middle of the original string, adds the second replacement, and finally, adds the last part of the original string:
newline = line[:start1] + repl1 + line[end1:start2] + repl2 + line[end2:]
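The slicing approach described above can be condensed with `span()`. The following is a minimal sketch in Python 3; the helper name `replace_groups` and the sample input are hypothetical and not part of the original example.

```python
import re

pat = re.compile(r'aa([0-9]*)bb([0-9]*)cc')

def replace_groups(line, repl1, repl2):
    """Replace the text matched by groups 1 and 2; return None if no match."""
    mo = pat.search(line)
    if mo is None:
        return None
    # span(n) returns (start, end) for group n in one call.
    start1, end1 = mo.span(1)
    start2, end2 = mo.span(2)
    # Stitch together: head, replacement 1, middle, replacement 2, tail.
    return line[:start1] + repl1 + line[end1:start2] + repl2 + line[end2:]

result = replace_groups('xxaa11bb22ccyy', 'AAA', 'BBB')
print(result)
```

Returning None on failure mirrors the way `search()` itself signals that there was no match.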
You can also use the sub function or method to do substitutions. Here is an example:
    import sys, re

    pat = re.compile('[0-9]+')

    print 'Replacing decimal digits.'
    while 1:
        target = raw_input('Enter a target line ("q" to quit): ')
        if target == 'q':
            break
        repl = raw_input('Enter a replacement: ')
        result = pat.sub(repl, target)
        print 'result: %s' % result
Here is another example of the use of a function to insert calculated replacements.
    import sys, re, string

    pat = re.compile('[a-m]+')

    def replacer(mo):
        return string.upper(mo.group(0))

    print 'Upper-casing a-m.'
    while 1:
        target = raw_input('Enter a target line ("q" to quit): ')
        if target == 'q':
            break
        result = pat.sub(replacer, target)
        print 'result: %s' % result
Notes:
This is also a convenient use for a lambda instead of a named function, for example:
    import sys, re, string

    pat = re.compile('[a-m]+')

    print 'Upper-casing a-m.'
    while 1:
        target = raw_input('Enter a target line ("q" to quit): ')
        if target == 'q':
            break
        result = pat.sub(
            lambda mo: string.upper(mo.group(0)),
            target)
        print 'result: %s' % result
Note 1: You will need a sufficiently recent version of Python in order to use iterators and generators. Both were introduced in Python 2.2; in that version, generators had to be enabled with "from __future__ import generators".
Note 2: The iterator protocol changed slightly in Python 3.0: the next() method was renamed __next__(), and the built-in function next() is used to advance an iterator.
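To illustrate Note 2, here is a minimal sketch of the iterator protocol as it is spelled in Python 3. The `Countdown` class is a hypothetical example, not something taken from this course.

```python
class Countdown:
    """Hypothetical iterator that counts down from start to 1."""

    def __init__(self, start):
        self.current = start

    def __iter__(self):
        # An iterator returns itself from __iter__().
        return self

    def __next__(self):          # Python 2 spells this method next()
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

it = Countdown(3)
first = next(it)                 # the built-in next() calls it.__next__()
rest = list(it)                  # consumes the remaining items
print(first, rest)
```

The examples in the rest of this section use the Python 2 spelling, `next()`.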
Goals for this section:
Definitions:
A few additional basic points:
This section attempts to provide examples that illustrate the generator/iterator pattern.
Why is this important?
Examples - The remainder of this section provides a set of examples which implement and use iterators.
This function contains a yield statement. Therefore, when we call it, it produces an iterator:
    def generateItems(seq):
        for item in seq:
            yield 'item: %s' % item

    anIter = generateItems([])
    print 'dir(anIter):', dir(anIter)
    anIter = generateItems([111,222,333])
    for x in anIter:
        print x
    anIter = generateItems(['aaa', 'bbb', 'ccc'])
    print anIter.next()
    print anIter.next()
    print anIter.next()
    print anIter.next()
Running this example produces the following output:
    dir(anIter): ['__class__', '__delattr__', '__doc__', '__getattribute__',
    '__hash__', '__init__', '__iter__', '__new__', '__reduce__',
    '__reduce_ex__', '__repr__', '__setattr__', '__str__', 'gi_frame',
    'gi_running', 'next']
    item: 111
    item: 222
    item: 333
    item: aaa
    item: bbb
    item: ccc
    Traceback (most recent call last):
      File "iterator_generator.py", line 14, in ?
        print anIter.next()
    StopIteration
Notes and explanation:
An alternative and perhaps simpler way to create an iterator is to use a generator expression. This can be useful when you already have a collection or iterator to work with.
The following example implements a function that returns a generator object. The effect is to generate the objects in a collection while excluding the items in a separate collection:
    DATA = [
        'lemon',
        'lime',
        'grape',
        'apple',
        'pear',
        'watermelon',
        'canteloupe',
        'honeydew',
        'orange',
        'grapefruit',
        ]

    def make_producer(collection, excludes):
        gen = (item for item in collection if item not in excludes)
        return gen

    def test():
        iter1 = make_producer(DATA, ('apple', 'orange', 'honeydew', ))
        print '%s' % iter1
        for fruit in iter1:
            print fruit

    test()
When run, this example produces the following:
    $ python workbook063.py
    <generator object <genexpr> at 0x7fb3d0f1bc80>
    lemon
    lime
    grape
    pear
    watermelon
    canteloupe
    grapefruit
Notes:
Each time this method is called, it produces a (new) iterator object. This method is analogous to the iterkeys and itervalues methods in the dictionary built-in object:
    #
    # A class that provides an iterator generator method.
    #
    class Node:
        def __init__(self, name='<noname>', value='<novalue>', children=None):
            self.name = name
            self.value = value
            self.children = children
            if children is None:
                self.children = []
            else:
                self.children = children
        def set_name(self, name): self.name = name
        def get_name(self): return self.name
        def set_value(self, value): self.value = value
        def get_value(self): return self.value
        def iterchildren(self):
            for child in self.children:
                yield child
        #
        # Print information on this node and walk over all children and
        #   grandchildren ...
        def walk(self, level=0):
            print '%sname: %s value: %s' % (
                get_filler(level), self.get_name(), self.get_value(), )
            for child in self.iterchildren():
                child.walk(level + 1)

    #
    # A function that is the equivalent of the walk() method in
    #   class Node.
    #
    def walk(node, level=0):
        print '%sname: %s value: %s' % (
            get_filler(level), node.get_name(), node.get_value(), )
        for child in node.iterchildren():
            walk(child, level + 1)

    def get_filler(level):
        return ' ' * level

    def test():
        a7 = Node('gilbert', '777')
        a6 = Node('fred', '666')
        a5 = Node('ellie', '555')
        a4 = Node('daniel', '444')
        a3 = Node('carl', '333', [a4, a5])
        a2 = Node('bill', '222', [a6, a7])
        a1 = Node('alice', '111', [a2, a3])
        # Use the walk method to walk the entire tree.
        print 'Using the method:'
        a1.walk()
        print '=' * 30
        # Use the walk function to walk the entire tree.
        print 'Using the function:'
        walk(a1)

    test()
Running this example produces the following output:
    Using the method:
    name: alice value: 111
     name: bill value: 222
      name: fred value: 666
      name: gilbert value: 777
     name: carl value: 333
      name: daniel value: 444
      name: ellie value: 555
    ==============================
    Using the function:
    name: alice value: 111
     name: bill value: 222
      name: fred value: 666
      name: gilbert value: 777
     name: carl value: 333
      name: daniel value: 444
      name: ellie value: 555
Notes and explanation:
This class implements the iterator protocol: it provides both the next() and __iter__() methods. Therefore, instances of this class are iterators.
Note that when an iterator is "exhausted", it normally cannot be reused to iterate over the sequence again. However, in this example, we provide a refresh method which enables us to "rewind" and reuse the iterator instance:
    #
    # An iterator class that does *not* use ``yield``.
    #   This iterator produces every other item in a sequence.
    #
    class IteratorExample:
        def __init__(self, seq):
            self.seq = seq
            self.idx = 0
        def next(self):
            self.idx += 1
            if self.idx >= len(self.seq):
                raise StopIteration
            value = self.seq[self.idx]
            self.idx += 1
            return value
        def __iter__(self):
            return self
        def refresh(self):
            self.idx = 0

    def test_iteratorexample():
        a = IteratorExample('edcba')
        for x in a:
            print x
        print '----------'
        a.refresh()
        for x in a:
            print x
        print '=' * 30
        a = IteratorExample('abcde')
        try:
            print a.next()
            print a.next()
            print a.next()
            print a.next()
            print a.next()
            print a.next()
        except StopIteration, e:
            print 'stopping', e

    test_iteratorexample()
Running this example produces the following output:
    d
    b
    ----------
    d
    b
    ==============================
    b
    d
    stopping
Notes and explanation:
There may be times when the next method is easier and more straightforward to implement using yield. If so, then this class might serve as a model. If you do not feel the need to do this, then you should ignore this example:
    #
    # An iterator class that uses ``yield``.
    #   This iterator produces every other item in a sequence.
    #
    class YieldIteratorExample:
        def __init__(self, seq):
            self.seq = seq
            self.iterator = self._next()
            self.next = self.iterator.next
        def _next(self):
            flag = 0
            for x in self.seq:
                if flag:
                    flag = 0
                    yield x
                else:
                    flag = 1
        def __iter__(self):
            return self.iterator
        def refresh(self):
            self.iterator = self._next()
            self.next = self.iterator.next

    def test_yielditeratorexample():
        a = YieldIteratorExample('edcba')
        for x in a:
            print x
        print '----------'
        a.refresh()
        for x in a:
            print x
        print '=' * 30
        a = YieldIteratorExample('abcde')
        try:
            print a.next()
            print a.next()
            print a.next()
            print a.next()
            print a.next()
            print a.next()
        except StopIteration, e:
            print 'stopping', e

    test_yielditeratorexample()
Running this example produces the following output:
    d
    b
    ----------
    d
    b
    ==============================
    b
    d
    stopping
Notes and explanation:
Because the _next method uses yield, calling it (actually, calling the iterator object it produces) in an iterator context causes it to be "resumed" immediately after the yield statement. This reduces bookkeeping a bit.
However, with this style, we must explicitly produce an iterator. We do this by calling the _next method, which contains a yield statement, and is therefore a generator. The following code in our constructor (__init__) completes the set-up of our class as an iterator class:
self.iterator = self._next() self.next = self.iterator.next
Remember that we need both __iter__() and next() methods in YieldIteratorExample to satisfy the iterator protocol. The __iter__() method is already there and the above code in the constructor creates the next() method.
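As a point of comparison, here is a sketch of a different design, written for Python 3: instead of making the class itself an iterator, `__iter__()` is written as a generator method, so the class is an iterable that hands out a fresh iterator on every loop. The class name `EveryOther` is hypothetical. This is not the approach used in the example above, and it gives up the ability to call `next()` directly on the instance.

```python
class EveryOther:
    """Hypothetical iterable producing every other item of a sequence."""

    def __init__(self, seq):
        self.seq = seq

    def __iter__(self):
        # A generator method: each call returns a new generator object,
        # so no refresh() method is needed to iterate a second time.
        for index, item in enumerate(self.seq):
            if index % 2 == 1:
                yield item

a = EveryOther('edcba')
first_pass = list(a)
second_pass = list(a)            # works again without any "rewind"
print(first_pass, second_pass)
```

Because each `for` loop calls `__iter__()` again, the bookkeeping that `refresh()` performs in the class above is handled automatically.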
A list comprehension looks a bit like an iterator, but it produces a list. See: The Python Language Reference: List displays -- http://docs.python.org/reference/expressions.html#list-displays for more on list comprehensions.
Here is an example:
    In [4]: def f(x):
       ...:     return x * 3
       ...:

    In [5]: list1 = [11, 22, 33]

    In [6]: list2 = [f(x) for x in list1]

    In [7]: print list2
    [33, 66, 99]
A generator expression looks quite similar to a list comprehension, but is enclosed in parentheses rather than square brackets. Unlike a list comprehension, a generator expression does not produce a list; it produces a generator object. A generator object is an iterator.
For more on generator expressions, see The Python Language Reference: Generator expressions -- http://docs.python.org/reference/expressions.html#generator-expressions.
The following example uses a generator expression to produce an iterator:
    mylist = range(10)

    def f(x):
        return x*3

    genexpr = (f(x) for x in mylist)

    for x in genexpr:
        print x
Notes and explanation:
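One point worth illustrating is that a generator expression is evaluated lazily: the function is called only as items are requested, and the generator cannot be restarted once it is exhausted. The following is a minimal sketch in Python 3; the `calls` list is a hypothetical device added only to record when `f()` runs.

```python
calls = []   # records each argument that f() is actually called with

def f(x):
    calls.append(x)
    return x * 3

genexpr = (f(x) for x in range(4))
calls_before = list(calls)      # f() has not been called yet
first = next(genexpr)           # calls f(0) only
calls_after_one = list(calls)
rest = list(genexpr)            # calls f(1), f(2), f(3)
leftover = list(genexpr)        # the generator is now exhausted
print(calls_before, first, calls_after_one, rest, leftover)
```

A list comprehension over the same range would have called `f()` four times immediately and kept all the results in memory.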
Unit tests and the Python unit test framework provide a convenient way to define and run tests that ensure that a Python application produces specified results.
This section, while it will not attempt to explain everything about the unit test framework, will provide examples of several straight-forward ways to construct and run tests.
Some assumptions:
In the test class, implement a number of methods to perform your tests. Name your test methods with the prefix "test". Here is an example:
    import unittest

    class MyTest(unittest.TestCase):
        def test_one(self):
            # some test code
            pass
        def test_two(self):
            # some test code
            pass
Create a test harness. Here is an example:
    import sys
    import unittest

    # make the test suite.
    def suite():
        loader = unittest.TestLoader()
        testsuite = loader.loadTestsFromTestCase(MyTest)
        return testsuite

    # Make the test suite; run the tests.
    def test():
        testsuite = suite()
        runner = unittest.TextTestRunner(sys.stdout, verbosity=2)
        result = runner.run(testsuite)
Here is a more complete example:
    import sys, StringIO, string
    import unittest
    import webserv_example_heavy_sub

    # A comparison function for case-insensitive sorting.
    def mycmpfunc(arg1, arg2):
        return cmp(string.lower(arg1), string.lower(arg2))

    class XmlTest(unittest.TestCase):
        def test_import_export1(self):
            inFile = file('test1_in.xml', 'r')
            inContent = inFile.read()
            inFile.close()
            doc = webserv_example_heavy_sub.parseString(inContent)
            outFile = StringIO.StringIO()
            outFile.write('<?xml version="1.0" ?>\n')
            doc.export(outFile, 0)
            outContent = outFile.getvalue()
            outFile.close()
            self.failUnless(inContent == outContent)

    # make the test suite.
    def suite():
        loader = unittest.TestLoader()
        # Change the test method prefix: test --> trial.
        #loader.testMethodPrefix = 'trial'
        # Change the comparison function that determines the order of tests.
        #loader.sortTestMethodsUsing = mycmpfunc
        testsuite = loader.loadTestsFromTestCase(XmlTest)
        return testsuite

    # Make the test suite; run the tests.
    def test_main():
        testsuite = suite()
        runner = unittest.TextTestRunner(sys.stdout, verbosity=2)
        result = runner.run(testsuite)

    if __name__ == "__main__":
        test_main()
Running the above script produces the following output:
    test_import_export1 (__main__.XmlTest) ... ok

    ----------------------------------------------------------------------
    Ran 1 test in 0.035s

    OK
A few notes on this example:
This example tests the ability to parse an XML document, test1_in.xml, and export that document back to XML. The test succeeds if the input XML document and the exported XML document are the same.
The code which is being tested parses an XML document returned by a request to Amazon Web services. You can learn more about Amazon Web services at: http://www.amazon.com/webservices. This code was generated from an XML Schema document by generateDS.py. So we are, in effect, testing generateDS.py. You can find generateDS.py at: http://www.reifywork.com/#generateds-py.
Testing for success/failure and reporting failures -- Use the methods listed at http://www.python.org/doc/current/lib/testcase-objects.html to test for and report success and failure. In our example, we used "self.failUnless(inContent == outContent)" to ensure that the content we parsed and the content that we exported were the same.
Add additional tests by adding methods whose names have the prefix "test". If you prefer a different prefix for test names, add something like the following to the above script:
loader.testMethodPrefix = 'trial'
By default, the tests are run in the order of their names sorted by the cmp function. So, if needed, you can control the order of execution of tests by selecting their names, for example, using names like test_1_checkderef, test_2_checkcalc, etc. Or, you can change the comparison function by adding something like the following to the above script:
loader.sortTestMethodsUsing = mycmpfunc
As a bit of motivation for creating and using unit tests, while developing this example, I discovered several errors (or maybe "special features") in generateDS.py.
Extending vs. embedding -- They are different but related:
Documentation -- The two important sources for information about extending and embedding are the following:
Types of extensions:
Tools -- There are several tools that support the development of Python extensions:
Writing an extension module by hand -- What to do:
Implementing a wrapper function -- What to do:
Capture the arguments with PyArg_ParseTuple. The format string specifies how arguments are to be converted and captured. See 1.7 Extracting Parameters in Extension Functions. Here are some of the most commonly used types:
Use "i", "s", "f", etc to convert and capture simple types such as integers, strings, floats, etc.
Use "O" to get a pointer to Python "complex" types such as lists, tuples, dictionaries, etc.
Use items in parentheses to capture and unpack sequences (e.g. lists and tuples) of fixed length. Example:
    if (!PyArg_ParseTuple(args, "(ii)(ii)", &x, &y, &width, &height))
    {
        return NULL;
    } /* if */
A sample call might be:
    lowerLeft = (x1, y1)
    extent = (width1, height1)
    scan(lowerLeft, extent)
Use ":aName" (colon) at the end of the format string to provide a function name for error messages. Example:
    if (!PyArg_ParseTuple(args, "O:setContentHandler", &pythonInstance))
    {
        return NULL;
    } /* if */
Use ";an error message" (semicolon) at the end of the format string to provide a string that replaces the default error message.
Docs are available at: http://www.python.org/doc/current/ext/parseTuple.html.
Write the logic.
Handle errors and exceptions -- You will need to understand how to (1) clear errors and exceptions and (2) raise errors (exceptions).
Many functions in the Python C API raise exceptions. You will need to check for and clear these exceptions. Here is an example:
    char * message;
    int messageNo;

    message = NULL;
    messageNo = -1;
    /* Is the argument a string?
     */
    if (! PyArg_ParseTuple(args, "s", &message))
    {
        /* It's not a string.  Clear the error.
         * Then try to get a message number (an integer).
         */
        PyErr_Clear();
        if (! PyArg_ParseTuple(args, "i", &messageNo))
        {
            o
            o
            o
You can also raise exceptions in your C code that can be caught (in a "try:except:" block) back in the calling Python code. Here is an example:
    if (n == 0)
    {
        PyErr_SetString(PyExc_ValueError, "Value must not be zero");
        return NULL;
    }
See Include/pyerrors.h in the Python source distribution for more exception/error types.
And, you can test whether a function in the Python C API that you have called has raised an exception. For example:
    if (PyErr_Occurred())
    {
        /* An exception was raised.
         * Do something about it.
         */
        o
        o
        o
For more documentation on errors and exceptions, see: http://www.python.org/doc/current/api/exceptionHandling.html.
Create and return a value:
Note: Our discussion and examples are for SWIG version 1.3
SWIG will often enable you to generate wrappers for functions in an existing C function library. SWIG does not understand everything in C header files. But it does a fairly impressive job. You should try it first before resorting to the hard work of writing wrappers by hand.
More information on SWIG is at http://www.swig.org.
Here are some steps that you can follow:
Create an interface file -- Even when you are wrapping functions defined in an existing header file, creating an interface file is a good idea. Include your existing header file into it, then add whatever else you need. Here is an extremely simple example of a SWIG interface file:
    %module MyLibrary

    %{
    #include "MyLibrary.h"
    %}

    %include "MyLibrary.h"
Comments:
The "%{" and "%}" brackets are directives to SWIG. They say: "Add the code between these brackets to the generated wrapper file without processing it."
The "%include" statement says: "Copy the file into the interface file here." In effect, you are asking SWIG to generate wrappers for all the functions in this header file. If you want wrappers for only some of the functions in a header file, then copy or reproduce function declarations for the desired functions here. An example:
    %module MyLibrary

    %{
    #include "MyLibrary.h"
    %}

    float calcArea(float width, float height);
    float calcVolume(float radius);
This example will generate wrappers for only two functions.
You can find more information about the directives that are used in SWIG interface files in the SWIG User Manual, in particular at:
Generate the wrappers:
swig -python MyLibrary.i
Compile and link the library. On Linux, you can use something like the following:
    gcc -c MyLibrary.c
    gcc -c -I/usr/local/include/python2.3 MyLibrary_wrap.c
    gcc -shared MyLibrary.o MyLibrary_wrap.o -o _MyLibrary.so
Note that we produce a shared library whose name is the module name prefixed with an underscore. SWIG also generates a .py file, without the leading underscore, which we will import from our Python code and which, in turn, imports the shared library.
Use the extension module in your Python code:
    Python 2.3b1 (#1, Apr 25 2003, 20:36:09)
    [GCC 2.95.4 20011002 (Debian prerelease)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import MyLibrary
    >>> MyLibrary.calcArea(4.0, 5.0)
    20.0
Here is a makefile that will execute swig to generate wrappers, then compile and link the extension.
    CFLAGS = -I/usr/local/include/python2.3

    all: _MyLibrary.so

    _MyLibrary.so: MyLibrary.o MyLibrary_wrap.o
    	gcc -shared MyLibrary.o MyLibrary_wrap.o -o _MyLibrary.so

    MyLibrary.o: MyLibrary.c
    	gcc -c MyLibrary.c -o MyLibrary.o

    MyLibrary_wrap.o: MyLibrary_wrap.c
    	gcc -c ${CFLAGS} MyLibrary_wrap.c -o MyLibrary_wrap.o

    MyLibrary_wrap.c: MyLibrary.i
    	swig -python MyLibrary.i

    clean:
    	rm -f MyLibrary.py MyLibrary.o MyLibrary_wrap.c \
    		MyLibrary_wrap.o _MyLibrary.so
Here is an example of running this makefile:
    $ make -f MyLibrary_makefile clean
    rm -f MyLibrary.py MyLibrary.o MyLibrary_wrap.c \
            MyLibrary_wrap.o _MyLibrary.so
    $ make -f MyLibrary_makefile
    gcc -c MyLibrary.c -o MyLibrary.o
    swig -python MyLibrary.i
    gcc -c -I/usr/local/include/python2.3 MyLibrary_wrap.c -o MyLibrary_wrap.o
    gcc -shared MyLibrary.o MyLibrary_wrap.o -o _MyLibrary.so
And, here are C source files that can be used in our example.
MyLibrary.h:
    /* MyLibrary.h
    */

    float calcArea(float width, float height);
    float calcVolume(float radius);

    int getVersion();
    int getMode();
MyLibrary.c:
    /* MyLibrary.c
    */

    float calcArea(float width, float height)
    {
        return (width * height);
    }

    float calcVolume(float radius)
    {
        return (3.14 * radius * radius);
    }

    int getVersion()
    {
        return 123;
    }

    int getMode()
    {
        return 1;
    }
Pyrex is a useful tool for writing Python extensions. Because the Pyrex language is similar to Python, writing extensions in Pyrex is easier than doing so in C. Cython is a more recent fork of Pyrex.
More information on Pyrex and Cython is at:
Here is a simple function definition in Pyrex:
    # python_201_pyrex_string.pyx

    import string

    def formatString(object s1, object s2):
        s1 = string.strip(s1)
        s2 = string.strip(s2)
        s3 = '<<%s||%s>>' % (s1, s2)
        s4 = s3 * 4
        return s4
And, here is a make file:
    CFLAGS = -DNDEBUG -O3 -Wall -Wstrict-prototypes -fPIC \
        -I/usr/local/include/python2.3

    all: python_201_pyrex_string.so

    python_201_pyrex_string.so: python_201_pyrex_string.o
    	gcc -shared python_201_pyrex_string.o -o python_201_pyrex_string.so

    python_201_pyrex_string.o: python_201_pyrex_string.c
    	gcc -c ${CFLAGS} python_201_pyrex_string.c -o python_201_pyrex_string.o

    python_201_pyrex_string.c: python_201_pyrex_string.pyx
    	pyrexc python_201_pyrex_string.pyx

    clean:
    	rm -f python_201_pyrex_string.so python_201_pyrex_string.o \
    		python_201_pyrex_string.c
Here is another example. In this one, one function in the .pyx file calls another. Here is the implementation file:
    # python_201_pyrex_primes.pyx

    def showPrimes(int kmax):
        plist = primes(kmax)
        for p in plist:
            print 'prime: %d' % p

    cdef primes(int kmax):
        cdef int n, k, i
        cdef int p[1000]
        result = []
        if kmax > 1000:
            kmax = 1000
        k = 0
        n = 2
        while k < kmax:
            i = 0
            while i < k and n % p[i] <> 0:
                i = i + 1
            if i == k:
                p[k] = n
                k = k + 1
                result.append(n)
            n = n + 1
        return result
And, here is a make file:
    #CFLAGS = -DNDEBUG -g -O3 -Wall -Wstrict-prototypes -fPIC
    #    -I/usr/local/include/python2.3
    CFLAGS = -DNDEBUG -I/usr/local/include/python2.3

    all: python_201_pyrex_primes.so

    python_201_pyrex_primes.so: python_201_pyrex_primes.o
    	gcc -shared python_201_pyrex_primes.o -o python_201_pyrex_primes.so

    python_201_pyrex_primes.o: python_201_pyrex_primes.c
    	gcc -c ${CFLAGS} python_201_pyrex_primes.c -o python_201_pyrex_primes.o

    python_201_pyrex_primes.c: python_201_pyrex_primes.pyx
    	pyrexc python_201_pyrex_primes.pyx

    clean:
    	rm -f python_201_pyrex_primes.so python_201_pyrex_primes.o \
    		python_201_pyrex_primes.c
Here is the output from running the makefile:
    $ make -f python_201_pyrex_makeprimes clean
    rm -f python_201_pyrex_primes.so python_201_pyrex_primes.o \
            python_201_pyrex_primes.c
    $ make -f python_201_pyrex_makeprimes
    pyrexc python_201_pyrex_primes.pyx
    gcc -c -DNDEBUG -I/usr/local/include/python2.3 python_201_pyrex_primes.c -o python_201_pyrex_primes.o
    gcc -shared python_201_pyrex_primes.o -o python_201_pyrex_primes.so
Here is an interactive example of its use:
    $ python
    Python 2.3b1 (#1, Apr 25 2003, 20:36:09)
    [GCC 2.95.4 20011002 (Debian prerelease)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import python_201_pyrex_primes
    >>> dir(python_201_pyrex_primes)
    ['__builtins__', '__doc__', '__file__', '__name__', 'showPrimes']
    >>> python_201_pyrex_primes.showPrimes(5)
    prime: 2
    prime: 3
    prime: 5
    prime: 7
    prime: 11
This next example shows how to use Pyrex to implement a new extension type, that is, a new Python built-in type. Notice that the class is declared with the cdef keyword, which tells Pyrex to generate the C implementation of a type instead of a class.
Here is the implementation file:
    # python_201_pyrex_clsprimes.pyx

    """An implementation of a primes handling class
    for a demonstration of Pyrex.
    """

    cdef class Primes:
        """A class containing functions for
        handling primes.
        """

        def showPrimes(self, int kmax):
            """Show a range of primes.
            Use the method primes() to generate the primes.
            """
            plist = self.primes(kmax)
            for p in plist:
                print 'prime: %d' % p

        def primes(self, int kmax):
            """Generate the first kmax primes (at most 1000).
            """
            cdef int n, k, i
            cdef int p[1000]
            result = []
            if kmax > 1000:
                kmax = 1000
            k = 0
            n = 2
            while k < kmax:
                i = 0
                while i < k and n % p[i] <> 0:
                    i = i + 1
                if i == k:
                    p[k] = n
                    k = k + 1
                    result.append(n)
                n = n + 1
            return result
And, here is a make file:
    CFLAGS = -DNDEBUG -I/usr/local/include/python2.3

    all: python_201_pyrex_clsprimes.so

    python_201_pyrex_clsprimes.so: python_201_pyrex_clsprimes.o
    	gcc -shared python_201_pyrex_clsprimes.o -o python_201_pyrex_clsprimes.so

    python_201_pyrex_clsprimes.o: python_201_pyrex_clsprimes.c
    	gcc -c ${CFLAGS} python_201_pyrex_clsprimes.c -o python_201_pyrex_clsprimes.o

    python_201_pyrex_clsprimes.c: python_201_pyrex_clsprimes.pyx
    	pyrexc python_201_pyrex_clsprimes.pyx

    clean:
    	rm -f python_201_pyrex_clsprimes.so python_201_pyrex_clsprimes.o \
    		python_201_pyrex_clsprimes.c
Here is output from running the makefile:
    $ make -f python_201_pyrex_makeclsprimes clean
    rm -f python_201_pyrex_clsprimes.so python_201_pyrex_clsprimes.o \
            python_201_pyrex_clsprimes.c
    $ make -f python_201_pyrex_makeclsprimes
    pyrexc python_201_pyrex_clsprimes.pyx
    gcc -c -DNDEBUG -I/usr/local/include/python2.3 python_201_pyrex_clsprimes.c -o python_201_pyrex_clsprimes.o
    gcc -shared python_201_pyrex_clsprimes.o -o python_201_pyrex_clsprimes.so
And here is an interactive example of its use:
    $ python
    Python 2.3b1 (#1, Apr 25 2003, 20:36:09)
    [GCC 2.95.4 20011002 (Debian prerelease)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import python_201_pyrex_clsprimes
    >>> dir(python_201_pyrex_clsprimes)
    ['Primes', '__builtins__', '__doc__', '__file__', '__name__']
    >>> primes = python_201_pyrex_clsprimes.Primes()
    >>> dir(primes)
    ['__class__', '__delattr__', '__doc__', '__getattribute__', '__hash__',
    '__init__', '__new__', '__reduce__', '__reduce_ex__', '__repr__',
    '__setattr__', '__str__', 'primes', 'showPrimes']
    >>> primes.showPrimes(4)
    prime: 2
    prime: 3
    prime: 5
    prime: 7
Documentation -- Also notice that Pyrex preserves the documentation for the module, the class, and the methods in the class. You can show this documentation with pydoc, as follows:
$ pydoc python_201_pyrex_clsprimes
Or, in Python interactive mode, use:
    $ python
    Python 2.3b1 (#1, Apr 25 2003, 20:36:09)
    [GCC 2.95.4 20011002 (Debian prerelease)] on linux2
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import python_201_pyrex_clsprimes
    >>> help(python_201_pyrex_clsprimes)
Choose SWIG when:
Choose Pyrex when:
Here is a simple example that uses Cython to wrap a function implemented in C.
First the C header file:
    /* test_c_lib.h */

    int calculate(int width, int height);
And, the C implementation file:
    /* test_c_lib.c */

    #include "test_c_lib.h"

    int calculate(int width, int height)
    {
        int result;
        result = width * height * 3;
        return result;
    }
Here is a Cython file that calls our C function:
    # test_c.pyx

    # Declare the external C function.
    cdef extern from "test_c_lib.h":
        int calculate(int width, int height)

    def test(w, h):
        # Call the external C function.
        result = calculate(w, h)
        print 'result from calculate: %d' % result
We can compile our code using this script (on Linux):
    #!/bin/bash -x
    cython test_c.pyx
    gcc -c -fPIC -I/usr/local/include/python2.6 -o test_c.o test_c.c
    gcc -c -fPIC -I/usr/local/include/python2.6 -o test_c_lib.o test_c_lib.c
    gcc -shared -fPIC -I/usr/local/include/python2.6 -o test_c.so test_c.o test_c_lib.o
Here is a small Python file that uses the wrapper that we wrote in Cython:
    # run_test_c.py

    import test_c

    def test():
        test_c.test(4, 5)
        test_c.test(12, 15)

    if __name__ == '__main__':
        test()
And, when we run it, we see the following:
    $ python run_test_c.py
    result from calculate: 60
    result from calculate: 540
The goal -- A new built-in data type for Python.
Existing examples -- Objects/listobject.c, Objects/stringobject.c, Objects/dictobject.c, etc in the Python source code distribution.
In older versions of the Python source code distribution, a template for the C code was provided in Objects/xxobject.c. Objects/xxobject.c is no longer included in the Python source code distribution. However:
And, you can use Pyrex to generate a new built-in type. To do so, implement a Python/Pyrex class and declare the class with the Pyrex keyword cdef. In fact, you may want to use Pyrex to generate a minimal extension type, and then edit that generated code to insert and add functionality by hand. See the Pyrex section for an example.
Pyrex also goes some way toward giving you access to (existing) C structs and functions from Python.
Extension classes the easy way -- SWIG shadow classes.
Start with an implementation of a C++ class and its header file.
Use the following SWIG flags:
swig -c++ -python mymodule.i
More information is available with the SWIG documentation at: http://www.swig.org/Doc1.3/Python.html.
Extension classes the Pyrex way -- An alternative is to use Pyrex to compile a class definition that does not have the cdef keyword. Using cdef on the class tells Pyrex to generate an extension type instead of a class. You will have to determine whether you want an extension class or an extension type.
Python is an excellent language for text analysis.
In some cases, simply splitting lines of text into words will be enough. In these cases, use the string method split().
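A minimal sketch of this approach follows. It is written for Python 3 (the examples elsewhere in this document use Python 2 syntax), and the sample text and the count_words helper are made up for illustration.

```python
def count_words(lines):
    """Count how often each whitespace-separated word occurs."""
    counts = {}
    for line in lines:
        # split() with no arguments splits on runs of whitespace
        # and ignores leading and trailing whitespace.
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts


sample = ["the quick brown fox", "  jumps over the lazy dog  "]
print(count_words(sample)["the"])
```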
In other cases, regular expressions may be able to do the parsing you need. If so, see the section on regular expressions in this document.
However, in some cases, more complex analysis of input text is required. This section describes some of the ways that Python can help you with this complex parsing and analysis.
There are a number of special purpose parsers which you will find in the Python standard library:
XML parsers and XML tools -- There is lots of support for parsing and processing XML in Python. Here are a few places to look for support:
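As one small illustration of the XML support in the standard library, here is a sketch that uses xml.etree.ElementTree to parse a document held in a string. It is written for Python 3; the element names and the list_people helper are invented for this example.

```python
import xml.etree.ElementTree as ET

SAMPLE = """
<people>
  <person id="1"><name>Alice</name></person>
  <person id="2"><name>Bob</name></person>
</people>
"""


def list_people(xml_text):
    """Return (id, name) pairs, one for each <person> element."""
    root = ET.fromstring(xml_text.strip())
    return [(p.get("id"), p.findtext("name")) for p in root.iter("person")]


print(list_people(SAMPLE))
```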
For simple grammars, this is not so hard.
You will need to implement:
As an example, we'll implement a recursive descent parser written in Python for the following grammar:

    Prog ::= Command | Command Prog
    Command ::= Func_call
    Func_call ::= Term '(' Func_call_list ')'
    Func_call_list ::= Func_call | Func_call ',' Func_call_list
    Term = <word>
Here is an implementation of a recursive descent parser for the above grammar:
    #!/usr/bin/env python

    """
    A recursive descent parser example.

    Usage:
        python rparser.py [options] <inputfile>

    Options:
        -h, --help      Display this help message.

    Example:
        python rparser.py myfile.txt

    The grammar:
        Prog ::= Command | Command Prog
        Command ::= Func_call
        Func_call ::= Term '(' Func_call_list ')'
        Func_call_list ::= Func_call | Func_call ',' Func_call_list
        Term = <word>
    """

    import sys
    import string
    import types
    import getopt

    #
    # To use the IPython interactive shell to inspect your running
    #   application, uncomment the following lines:
    #
    ## from IPython.Shell import IPShellEmbed
    ## ipshell = IPShellEmbed((),
    ##     banner = '>>>>>>>> Into IPython >>>>>>>>',
    ##     exit_msg = '<<<<<<<< Out of IPython <<<<<<<<')
    #
    # Then add the following line at the point in your code where
    #   you want to inspect run-time values:
    #
    #       ipshell('some message to identify where we are')
    #
    # For more information see: http://ipython.scipy.org/moin/
    #

    #
    # Constants
    #

    # AST node types
    NoneNodeType = 0
    ProgNodeType = 1
    CommandNodeType = 2
    FuncCallNodeType = 3
    FuncCallListNodeType = 4
    TermNodeType = 5

    # Token types
    NoneTokType = 0
    LParTokType = 1
    RParTokType = 2
    WordTokType = 3
    CommaTokType = 4
    EOFTokType = 5

    # Dictionary to map node type values to node type names
    NodeTypeDict = {
        NoneNodeType: 'NoneNodeType',
        ProgNodeType: 'ProgNodeType',
        CommandNodeType: 'CommandNodeType',
        FuncCallNodeType: 'FuncCallNodeType',
        FuncCallListNodeType: 'FuncCallListNodeType',
        TermNodeType: 'TermNodeType',
        }

    #
    # Representation of a node in the AST (abstract syntax tree).
    #
    class ASTNode:
        def __init__(self, nodeType, *args):
            self.nodeType = nodeType
            self.children = []
            for item in args:
                self.children.append(item)
        def show(self, level):
            self.showLevel(level)
            print 'Node -- Type %s' % NodeTypeDict[self.nodeType]
            level += 1
            for child in self.children:
                if isinstance(child, ASTNode):
                    child.show(level)
                elif type(child) == types.ListType:
                    for item in child:
                        item.show(level)
                else:
                    self.showLevel(level)
                    print 'Child:', child
        def showLevel(self, level):
            for idx in range(level):
                print '   ',

    #
    # The recursive descent parser class.
    #   Contains the "recognizer" methods, which implement the grammar
    #   rules (above), one recognizer method for each production rule.
    #
    class ProgParser:
        def __init__(self):
            pass

        def parseFile(self, infileName):
            self.infileName = infileName
            self.tokens = None
            self.tokenType = NoneTokType
            self.token = ''
            self.lineNo = -1
            self.infile = file(self.infileName, 'r')
            self.tokens = genTokens(self.infile)
            try:
                self.tokenType, self.token, self.lineNo = self.tokens.next()
            except StopIteration:
                raise RuntimeError, 'Empty file'
            result = self.prog_reco()
            self.infile.close()
            self.infile = None
            return result

        def parseStream(self, instream):
            # genTokens() takes the input stream only.
            self.tokens = genTokens(instream)
            try:
                self.tokenType, self.token, self.lineNo = self.tokens.next()
            except StopIteration:
                raise RuntimeError, 'Empty file'
            result = self.prog_reco()
            return result

        def prog_reco(self):
            commandList = []
            while 1:
                result = self.command_reco()
                if not result:
                    break
                commandList.append(result)
            return ASTNode(ProgNodeType, commandList)

        def command_reco(self):
            if self.tokenType == EOFTokType:
                return None
            result = self.func_call_reco()
            return ASTNode(CommandNodeType, result)

        def func_call_reco(self):
            if self.tokenType == WordTokType:
                term = ASTNode(TermNodeType, self.token)
                self.tokenType, self.token, self.lineNo = self.tokens.next()
                if self.tokenType == LParTokType:
                    self.tokenType, self.token, self.lineNo = self.tokens.next()
                    result = self.func_call_list_reco()
                    if result:
                        if self.tokenType == RParTokType:
                            self.tokenType, self.token, self.lineNo = \
                                self.tokens.next()
                            return ASTNode(FuncCallNodeType, term, result)
                        else:
                            raise ParseError(self.lineNo, 'missing right paren')
                    else:
                        raise ParseError(self.lineNo, 'bad func call list')
                else:
                    raise ParseError(self.lineNo, 'missing left paren')
            else:
                return None

        def func_call_list_reco(self):
            terms = []
            while 1:
                result = self.func_call_reco()
                if not result:
                    break
                terms.append(result)
                if self.tokenType != CommaTokType:
                    break
                self.tokenType, self.token, self.lineNo = self.tokens.next()
            return ASTNode(FuncCallListNodeType, terms)

    #
    # The parse error exception class.
    #
    class ParseError(Exception):
        def __init__(self, lineNo, msg):
            # Initialize the actual base class (Exception).
            Exception.__init__(self, msg)
            self.lineNo = lineNo
            self.msg = msg
        def getLineNo(self):
            return self.lineNo
        def getMsg(self):
            return self.msg

    def is_word(token):
        for letter in token:
            if letter not in string.ascii_letters:
                return None
        return 1

    #
    # Generate the tokens.
    # Usage:
    #    gen = genTokens(infile)
    #    tokType, tok, lineNo = gen.next()
    #    ...
    def genTokens(infile):
        lineNo = 0
        while 1:
            lineNo += 1
            try:
                line = infile.next()
            except StopIteration:
                # Keep reporting end-of-file if the parser asks again.
                yield (EOFTokType, None, lineNo)
                continue
            toks = line.split()
            for tok in toks:
                if is_word(tok):
                    tokType = WordTokType
                elif tok == '(':
                    tokType = LParTokType
                elif tok == ')':
                    tokType = RParTokType
                elif tok == ',':
                    tokType = CommaTokType
                else:
                    # Unrecognized token.
                    tokType = NoneTokType
                yield (tokType, tok, lineNo)

    def test(infileName):
        parser = ProgParser()
        #ipshell('(test) #1\nCtrl-D to exit')
        result = None
        try:
            result = parser.parseFile(infileName)
        except ParseError, exp:
            sys.stderr.write('ParseError: (%d) %s\n' % \
                (exp.getLineNo(), exp.getMsg()))
        if result:
            result.show(0)

    def usage():
        print __doc__
        sys.exit(1)

    def main():
        args = sys.argv[1:]
        try:
            opts, args = getopt.getopt(args, 'h', ['help'])
        except:
            usage()
        for opt, val in opts:
            if opt in ('-h', '--help'):
                usage()
        if len(args) != 1:
            usage()
        inputfile = args[0]
        test(inputfile)

    if __name__ == '__main__':
        #import pdb; pdb.set_trace()
        main()
Comments and explanation:
And, here is a sample of the data we can apply this parser to:
    aaa ( )
    bbb ( ccc ( ) )
    ddd ( eee ( ) , fff ( ggg ( ) , hhh ( ) , iii ( ) ) )
And, if we run the parser on this input data, we see:
    $ python workbook045.py workbook045.data
    Node -- Type ProgNodeType
        Node -- Type CommandNodeType
            Node -- Type FuncCallNodeType
                Node -- Type TermNodeType
                    Child: aaa
                Node -- Type FuncCallListNodeType
        Node -- Type CommandNodeType
            Node -- Type FuncCallNodeType
                Node -- Type TermNodeType
                    Child: bbb
                Node -- Type FuncCallListNodeType
                    Node -- Type FuncCallNodeType
                        Node -- Type TermNodeType
                            Child: ccc
                        Node -- Type FuncCallListNodeType
        Node -- Type CommandNodeType
            Node -- Type FuncCallNodeType
                Node -- Type TermNodeType
                    Child: ddd
                Node -- Type FuncCallListNodeType
                    Node -- Type FuncCallNodeType
                        Node -- Type TermNodeType
                            Child: eee
                        Node -- Type FuncCallListNodeType
                    Node -- Type FuncCallNodeType
                        Node -- Type TermNodeType
                            Child: fff
                        Node -- Type FuncCallListNodeType
                            Node -- Type FuncCallNodeType
                                Node -- Type TermNodeType
                                    Child: ggg
                                Node -- Type FuncCallListNodeType
                            Node -- Type FuncCallNodeType
                                Node -- Type TermNodeType
                                    Child: hhh
                                Node -- Type FuncCallListNodeType
                            Node -- Type FuncCallNodeType
                                Node -- Type TermNodeType
                                    Child: iii
                                Node -- Type FuncCallListNodeType
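For readers who want to experiment without the Python 2 specifics, here is a condensed sketch of the same technique in Python 3. It is not the program above: it tokenizes with a regular expression, returns nested tuples instead of ASTNode objects, and explicitly allows an empty argument list so that input such as aaa ( ) is accepted. All of the names in it (tokenize, Parser, expect, and so on) are assumptions of this sketch.

```python
import re

TOKEN_RE = re.compile(r"\s*(?:([A-Za-z]+)|([(),]))")


def tokenize(text):
    """Yield (kind, value) pairs, ending with ('eof', None)."""
    text = text.rstrip()
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if not m:
            raise SyntaxError("bad character at offset %d" % pos)
        pos = m.end()
        if m.group(1):
            yield ("word", m.group(1))
        else:
            # Punctuation tokens use the character itself as their kind.
            yield (m.group(2), m.group(2))
    yield ("eof", None)


class Parser:
    """One recognizer method per grammar rule."""

    def __init__(self, text):
        self.tokens = tokenize(text)
        self.kind, self.value = next(self.tokens)

    def advance(self):
        self.kind, self.value = next(self.tokens)

    def expect(self, kind):
        if self.kind != kind:
            raise SyntaxError("expected %r, found %r" % (kind, self.kind))
        self.advance()

    def prog(self):
        # Prog ::= Command*
        commands = []
        while self.kind != "eof":
            commands.append(self.func_call())
        return ("prog", commands)

    def func_call(self):
        # Func_call ::= Term '(' Func_call_list ')'
        name = self.value
        self.expect("word")
        self.expect("(")
        args = self.func_call_list()
        self.expect(")")
        return ("call", name, args)

    def func_call_list(self):
        # Func_call_list ::= [ Func_call { ',' Func_call } ]
        args = []
        if self.kind == "word":
            args.append(self.func_call())
            while self.kind == ",":
                self.advance()
                args.append(self.func_call())
        return args


print(Parser("aaa ( ) bbb ( ccc ( ) )").prog())
```

Keeping a single token of lookahead in self.kind and self.value is what lets each recognizer decide which rule to apply without backtracking.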
Lexical analysis -- The tokenizer in our recursive descent parser example was (for demonstration purposes) overly simple. You can always write more complex tokenizers by hand. However, for more complex (and real) tokenizers, you may want to use a tool to build your tokenizer.
In this section we'll describe Plex and use it to produce a tokenizer for our recursive descent parser.
You can obtain Plex at http://www.cosc.canterbury.ac.nz/~greg/python/Plex/.
In order to use it, you may want to add Plex-1.1.4/Plex to your PYTHONPATH.
Here is a simple example from the Plex tutorial:
    #!/usr/bin/env python
    """
    Sample Plex lexer

    Usage:
        python plex_example.py inputfile
    """

    import sys
    import Plex

    def count_lines(scanner, text):
        scanner.line_count += 1
        print '-' * 60

    def test(infileName):
        letter = Plex.Range("AZaz")
        digit = Plex.Range("09")
        name = letter + Plex.Rep(letter | digit)
        number = Plex.Rep1(digit)
        space = Plex.Any(" \t")
        endline = Plex.Str('\n')
        #comment = Plex.Str('"') + Plex.Rep( Plex.AnyBut('"')) + Plex.Str('"')
        resword = Plex.Str("if", "then", "else", "end")
        lexicon = Plex.Lexicon([
            (endline,               count_lines),
            (resword,               'keyword'),
            (name,                  'ident'),
            (number,                'int'),
            ( Plex.Any("+-*/=<>"),  'operator'),
            (space,                 Plex.IGNORE),
            #(comment,               'comment'),
            (Plex.Str('('),         'lpar'),
            (Plex.Str(')'),         'rpar'),
            # comments surrounded by (* and *)
            (Plex.Str("(*"),        Plex.Begin('comment')),
            Plex.State('comment', [
                (Plex.Str("*)"), Plex.Begin('')),
                (Plex.AnyChar,   Plex.IGNORE),
                ]),
        ])
        infile = open(infileName, "r")
        scanner = Plex.Scanner(lexicon, infile, infileName)
        scanner.line_count = 0
        while True:
            token = scanner.read()
            if token[0] is None:
                break
            position = scanner.position()
            posstr = ('(%d, %d)' % (position[1], position[2], )).ljust(10)
            tokstr = '"%s"' % token[1]
            tokstr = tokstr.ljust(20)
            print '%s tok: %s tokType: %s' % (posstr, tokstr, token[0],)
        print 'line_count: %d' % scanner.line_count

    def usage():
        print __doc__
        sys.exit(1)

    def main():
        args = sys.argv[1:]
        if len(args) != 1:
            usage()
        infileName = args[0]
        test(infileName)

    if __name__ == '__main__':
        #import pdb; pdb.set_trace()
        main()
Here is a bit of data on which we can use the above lexer:
    mass = (height * (* some comment *) width * depth) / density
    totalmass = totalmass + mass
And, when we apply the above test program to this data, here is what we see:
    $ python plex_example.py plex_example.data
    (1, 0)     tok: "mass"               tokType: ident
    (1, 5)     tok: "="                  tokType: operator
    (1, 7)     tok: "("                  tokType: lpar
    (1, 8)     tok: "height"             tokType: ident
    (1, 15)    tok: "*"                  tokType: operator
    (1, 36)    tok: "width"              tokType: ident
    (1, 42)    tok: "*"                  tokType: operator
    (1, 44)    tok: "depth"              tokType: ident
    (1, 49)    tok: ")"                  tokType: rpar
    (1, 51)    tok: "/"                  tokType: operator
    (1, 53)    tok: "density"            tokType: ident
    ------------------------------------------------------------
    (2, 0)     tok: "totalmass"          tokType: ident
    (2, 10)    tok: "="                  tokType: operator
    (2, 12)    tok: "totalmass"          tokType: ident
    (2, 22)    tok: "+"                  tokType: operator
    (2, 24)    tok: "mass"               tokType: ident
    ------------------------------------------------------------
    line_count: 2
Comments and explanation:
And, here are some comments on constructing the patterns used in a lexicon:
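A similar pattern table can be approximated with the standard library alone. The following sketch is not Plex: it uses the re module with named groups to produce tokens like those in the Plex example above, and it is written for Python 3. The token type names mirror the Plex example, while TOKEN_SPEC, MASTER_RE, and scan are names assumed for this sketch. Unlike the Plex lexicon, it only handles (* ... *) comments that fit on one line.

```python
import re

# (token type, regular expression); with alternation, earlier entries win.
TOKEN_SPEC = [
    ("comment",  r"\(\*.*?\*\)"),               # (* ... *), skipped
    ("keyword",  r"\b(?:if|then|else|end)\b"),
    ("ident",    r"[A-Za-z][A-Za-z0-9]*"),
    ("int",      r"[0-9]+"),
    ("operator", r"[-+*/=<>]"),
    ("lpar",     r"\("),
    ("rpar",     r"\)"),
    ("space",    r"[ \t]+"),                    # skipped
    ("endline",  r"\n"),                        # counted, not reported
]
MASTER_RE = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def scan(text):
    """Yield (line, column, token type, token text) tuples."""
    line, line_start, pos = 1, 0, 0
    while pos < len(text):
        m = MASTER_RE.match(text, pos)
        if m is None:
            raise ValueError("unexpected character %r at offset %d"
                             % (text[pos], pos))
        kind = m.lastgroup
        if kind == "endline":
            line += 1
            line_start = m.end()
        elif kind not in ("space", "comment"):
            yield (line, m.start() - line_start, kind, m.group())
        pos = m.end()


data = ("mass = (height * (* some comment *) width * depth) / density\n"
        "totalmass = totalmass + mass\n")
for tok in scan(data):
    print(tok)
```

Placing the comment pattern before lpar matters here, because a regular expression alternation takes the first alternative that matches rather than the longest one.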
Now let's revisit our recursive descent parser, this time with a tokenizer built with Plex. The tokenizer is trivial, but will serve as an example of how to hook it into a parser:
    #!/usr/bin/env python

    """
    A recursive descent parser example using Plex.
    This example uses Plex to implement a tokenizer.

    Usage:
        python python_201_rparser_plex.py [options] <inputfile>

    Options:
        -h, --help      Display this help message.

    Example:
        python python_201_rparser_plex.py myfile.txt

    The grammar:
        Prog ::= Command | Command Prog
        Command ::= Func_call
        Func_call ::= Term '(' Func_call_list ')'
        Func_call_list ::= Func_call | Func_call ',' Func_call_list
        Term = <word>
    """

    import sys, string, types
    import getopt
    import Plex

    ## from IPython.Shell import IPShellEmbed
    ## ipshell = IPShellEmbed((),
    ##     banner = '>>>>>>>> Into IPython >>>>>>>>',
    ##     exit_msg = '<<<<<<<< Out of IPython <<<<<<<<')

    #
    # Constants
    #

    # AST node types
    NoneNodeType = 0
    ProgNodeType = 1
    CommandNodeType = 2
    FuncCallNodeType = 3
    FuncCallListNodeType = 4
    TermNodeType = 5

    # Token types
    NoneTokType = 0
    LParTokType = 1
    RParTokType = 2
    WordTokType = 3
    CommaTokType = 4
    EOFTokType = 5

    # Dictionary to map node type values to node type names
    NodeTypeDict = {
        NoneNodeType: 'NoneNodeType',
        ProgNodeType: 'ProgNodeType',
        CommandNodeType: 'CommandNodeType',
        FuncCallNodeType: 'FuncCallNodeType',
        FuncCallListNodeType: 'FuncCallListNodeType',
        TermNodeType: 'TermNodeType',
        }

    #
    # Representation of a node in the AST (abstract syntax tree).
    #
    class ASTNode:
        def __init__(self, nodeType, *args):
            self.nodeType = nodeType
            self.children = []
            for item in args:
                self.children.append(item)
        def show(self, level):
            self.showLevel(level)
            print 'Node -- Type %s' % NodeTypeDict[self.nodeType]
            level += 1
            for child in self.children:
                if isinstance(child, ASTNode):
                    child.show(level)
                elif type(child) == types.ListType:
                    for item in child:
                        item.show(level)
                else:
                    self.showLevel(level)
                    print 'Child:', child
        def showLevel(self, level):
            for idx in range(level):
                print '   ',

    #
    # The recursive descent parser class.
    #   Contains the "recognizer" methods, which implement the grammar
    #   rules (above), one recognizer method for each production rule.
    #
    class ProgParser:
        def __init__(self):
            self.tokens = None
            self.tokenType = NoneTokType
            self.token = ''
            self.lineNo = -1
            self.infile = None

        def parseFile(self, infileName):
            self.tokens = None
            self.tokenType = NoneTokType
            self.token = ''
            self.lineNo = -1
            self.infile = file(infileName, 'r')
            self.tokens = genTokens(self.infile, infileName)
            try:
                self.tokenType, self.token, self.lineNo = self.tokens.next()
            except StopIteration:
                raise RuntimeError, 'Empty file'
            result = self.prog_reco()
            self.infile.close()
            self.infile = None
            return result

        def parseStream(self, instream):
            self.tokens = None
            self.tokenType = NoneTokType
            self.token = ''
            self.lineNo = -1
            # Use the stream passed in; the caller is responsible for
            # closing it.
            self.tokens = genTokens(instream, '<stream>')
            try:
                self.tokenType, self.token, self.lineNo = self.tokens.next()
            except StopIteration:
                raise RuntimeError, 'Empty stream'
            result = self.prog_reco()
            return result

        def prog_reco(self):
            commandList = []
            while 1:
                result = self.command_reco()
                if not result:
                    break
                commandList.append(result)
            return ASTNode(ProgNodeType, commandList)

        def command_reco(self):
            if self.tokenType == EOFTokType:
                return None
            result = self.func_call_reco()
            return ASTNode(CommandNodeType, result)

        def func_call_reco(self):
            if self.tokenType == WordTokType:
                term = ASTNode(TermNodeType, self.token)
                self.tokenType, self.token, self.lineNo = self.tokens.next()
                if self.tokenType == LParTokType:
                    self.tokenType, self.token, self.lineNo = self.tokens.next()
                    result = self.func_call_list_reco()
                    if result:
                        if self.tokenType == RParTokType:
                            self.tokenType, self.token, self.lineNo = \
                                self.tokens.next()
                            return ASTNode(FuncCallNodeType, term, result)
                        else:
                            raise ParseError(self.lineNo, 'missing right paren')
                    else:
                        raise ParseError(self.lineNo, 'bad func call list')
                else:
                    raise ParseError(self.lineNo, 'missing left paren')
            else:
                return None

        def func_call_list_reco(self):
            terms = []
            while 1:
                result = self.func_call_reco()
                if not result:
                    break
                terms.append(result)
                if self.tokenType != CommaTokType:
                    break
                self.tokenType, self.token, self.lineNo = self.tokens.next()
            return ASTNode(FuncCallListNodeType, terms)

    #
    # The parse error exception class.
    #
    class ParseError(Exception):
        def __init__(self, lineNo, msg):
            # Initialize the actual base class (Exception).
            Exception.__init__(self, msg)
            self.lineNo = lineNo
            self.msg = msg
        def getLineNo(self):
            return self.lineNo
        def getMsg(self):
            return self.msg

    #
    # Generate the tokens.
    # Usage - example
    #    gen = genTokens(infile, infileName)
    #    tokType, tok, lineNo = gen.next()
    #    ...
    def genTokens(infile, infileName):
        letter = Plex.Range("AZaz")
        digit = Plex.Range("09")
        name = letter + Plex.Rep(letter | digit)
        lpar = Plex.Str('(')
        rpar = Plex.Str(')')
        comma = Plex.Str(',')
        comment = Plex.Str("#") + Plex.Rep(Plex.AnyBut("\n"))
        space = Plex.Any(" \t\n")
        lexicon = Plex.Lexicon([
            (name,      'word'),
            (lpar,      'lpar'),
            (rpar,      'rpar'),
            (comma,     'comma'),
            (comment,   Plex.IGNORE),
            (space,     Plex.IGNORE),
        ])
        scanner = Plex.Scanner(lexicon, infile, infileName)
        while 1:
            tokenType, token = scanner.read()
            name, lineNo, columnNo = scanner.position()
            if tokenType is None:
                tokType = EOFTokType
                token = None
            elif tokenType == 'word':
                tokType = WordTokType
            elif tokenType == 'lpar':
                tokType = LParTokType
            elif tokenType == 'rpar':
                tokType = RParTokType
            elif tokenType == 'comma':
                tokType = CommaTokType
            else:
                tokType = NoneTokType
            tok = token
            yield (tokType, tok, lineNo)

    def test(infileName):
        parser = ProgParser()
        #ipshell('(test) #1\nCtrl-D to exit')
        result = None
        try:
            result = parser.parseFile(infileName)
        except ParseError, exp:
            sys.stderr.write('ParseError: (%d) %s\n' % \
                (exp.getLineNo(), exp.getMsg()))
        if result:
            result.show(0)

    def usage():
        print __doc__
        sys.exit(-1)

    def main():
        args = sys.argv[1:]
        try:
            opts, args = getopt.getopt(args, 'h', ['help'])
        except:
            usage()
        for opt, val in opts:
            if opt in ('-h', '--help'):
                usage()
        if len(args) != 1:
            usage()
        infileName = args[0]
        test(infileName)

    if __name__ == '__main__':
        #import pdb; pdb.set_trace()
        main()
And, here is a sample of the data we can apply this parser to:
    # Test for recursive descent parser and Plex.
    # Command #1
    aaa()
    # Command #2
    bbb (ccc())    # An end of line comment.
    # Command #3
    ddd(eee(), fff(ggg(), hhh(), iii()))
    # End of test
And, when we run our parser, it produces the following:
    $ python plex_recusive.py plex_recusive.data
    Node -- Type ProgNodeType
        Node -- Type CommandNodeType
            Node -- Type FuncCallNodeType
                Node -- Type TermNodeType
                    Child: aaa
                Node -- Type FuncCallListNodeType
        Node -- Type CommandNodeType
            Node -- Type FuncCallNodeType
                Node -- Type TermNodeType
                    Child: bbb
                Node -- Type FuncCallListNodeType
                    Node -- Type FuncCallNodeType
                        Node -- Type TermNodeType
                            Child: ccc
                        Node -- Type FuncCallListNodeType
        Node -- Type CommandNodeType
            Node -- Type FuncCallNodeType
                Node -- Type TermNodeType
                    Child: ddd
                Node -- Type FuncCallListNodeType
                    Node -- Type FuncCallNodeType
                        Node -- Type TermNodeType
                            Child: eee
                        Node -- Type FuncCallListNodeType
                    Node -- Type FuncCallNodeType
                        Node -- Type TermNodeType
                            Child: fff
                        Node -- Type FuncCallListNodeType
                            Node -- Type FuncCallNodeType
                                Node -- Type TermNodeType
                                    Child: ggg
                                Node -- Type FuncCallListNodeType
                            Node -- Type FuncCallNodeType
                                Node -- Type TermNodeType
                                    Child: hhh
                                Node -- Type FuncCallListNodeType
                            Node -- Type FuncCallNodeType
                                Node -- Type TermNodeType
                                    Child: iii
                                Node -- Type FuncCallListNodeType
Comments:
For complex parsing tasks, you may want to consider the following tools:
And, for lexical analysis, you may also want to look here:
In the sections below, we give examples and notes about the use of PLY and pyparsing.
In this section we will show how to implement our parser example with PLY.
First, download PLY. It is available here: PLY (Python Lex-Yacc) -- http://www.dabeaz.com/ply/
Then add the PLY directory to your PYTHONPATH.
Learn how to construct lexers and parsers with PLY by reading doc/ply.html in the distribution of PLY and by looking at the examples in the distribution.
For those of you who want a more complex example, see A Python Parser for the RELAX NG Compact Syntax, which is implemented with PLY.
Now, here is our example parser. Comments and explanations are below:
    #!/usr/bin/env python
    """
    A parser example.
    This example uses PLY to implement a lexer and parser.

    The grammar:

        Prog ::= Command*
        Command ::= Func_call
        Func_call ::= Term '(' Func_call_list ')'
        Func_call_list ::= Func_call*
        Term = <word>

    Here is a sample "program" to use as input:

        # Test for recursive descent parser and Plex.
        # Command #1
        aaa()
        # Command #2
        bbb (ccc())    # An end of line comment.
        # Command #3
        ddd(eee(), fff(ggg(), hhh(), iii()))
        # End of test
    """

    import sys
    import types
    import getopt
    import ply.lex as lex
    import ply.yacc as yacc

    #
    # Globals
    #

    startlinepos = 0

    #
    # Constants
    #

    # AST node types
    NoneNodeType = 0
    ProgNodeType = 1
    CommandNodeType = 2
    CommandListNodeType = 3
    FuncCallNodeType = 4
    FuncCallListNodeType = 5
    TermNodeType = 6

    # Dictionary to map node type values to node type names
    NodeTypeDict = {
        NoneNodeType: 'NoneNodeType',
        ProgNodeType: 'ProgNodeType',
        CommandNodeType: 'CommandNodeType',
        CommandListNodeType: 'CommandListNodeType',
        FuncCallNodeType: 'FuncCallNodeType',
        FuncCallListNodeType: 'FuncCallListNodeType',
        TermNodeType: 'TermNodeType',
        }

    #
    # Representation of a node in the AST (abstract syntax tree).
    #
    class ASTNode:
        def __init__(self, nodeType, *args):
            self.nodeType = nodeType
            self.children = []
            for item in args:
                self.children.append(item)
        def append(self, item):
            self.children.append(item)
        def show(self, level):
            self.showLevel(level)
            print 'Node -- Type: %s' % NodeTypeDict[self.nodeType]
            level += 1
            for child in self.children:
                if isinstance(child, ASTNode):
                    child.show(level)
                elif type(child) == types.ListType:
                    for item in child:
                        item.show(level)
                else:
                    self.showLevel(level)
                    print 'Value:', child
        def showLevel(self, level):
            for idx in range(level):
                print '   ',

    #
    # Exception classes
    #
    class LexerError(Exception):
        def __init__(self, msg, lineno, columnno):
            self.msg = msg
            self.lineno = lineno
            self.columnno = columnno
        def show(self):
            sys.stderr.write('Lexer error (%d, %d) %s\n' % \
                (self.lineno, self.columnno, self.msg))

    class ParserError(Exception):
        def __init__(self, msg, lineno, columnno):
            self.msg = msg
            self.lineno = lineno
            self.columnno = columnno
        def show(self):
            sys.stderr.write('Parser error (%d, %d) %s\n' % \
                (self.lineno, self.columnno, self.msg))

    #
    # Lexer specification
    #
    tokens = (
        'NAME',
        'LPAR','RPAR',
        'COMMA',
        )

    # Tokens

    t_LPAR = r'\('
    t_RPAR = r'\)'
    t_COMMA = r'\,'
    t_NAME = r'[a-zA-Z_][a-zA-Z0-9_]*'

    # Ignore whitespace
    t_ignore = ' \t'

    # Ignore comments ('#' to end of line)
    def t_COMMENT(t):
        r'\#[^\n]*'
        pass

    def t_newline(t):
        r'\n+'
        global startlinepos
        startlinepos = t.lexer.lexpos - 1
        # Update the lexer's line counter so later tokens see it.
        t.lexer.lineno += t.value.count("\n")

    def t_error(t):
        global startlinepos
        msg = "Illegal character '%s'" % (t.value[0])
        columnno = t.lexer.lexpos - startlinepos
        raise LexerError(msg, t.lineno, columnno)

    #
    # Parser specification
    #
    def p_prog(t):
        'prog : command_list'
        t[0] = ASTNode(ProgNodeType, t[1])

    def p_command_list_1(t):
        'command_list : command'
        t[0] = ASTNode(CommandListNodeType, t[1])

    def p_command_list_2(t):
        'command_list : command_list command'
        t[1].append(t[2])
        t[0] = t[1]

    def p_command(t):
        'command : func_call'
        t[0] = ASTNode(CommandNodeType, t[1])

    def p_func_call_1(t):
        'func_call : term LPAR RPAR'
        t[0] = ASTNode(FuncCallNodeType, t[1])

    def p_func_call_2(t):
        'func_call : term LPAR func_call_list RPAR'
        t[0] = ASTNode(FuncCallNodeType, t[1], t[3])

    def p_func_call_list_1(t):
        'func_call_list : func_call'
        t[0] = ASTNode(FuncCallListNodeType, t[1])

    def p_func_call_list_2(t):
        'func_call_list : func_call_list COMMA func_call'
        t[1].append(t[3])
        t[0] = t[1]

    def p_term(t):
        'term : NAME'
        t[0] = ASTNode(TermNodeType, t[1])

    def p_error(t):
        global startlinepos
        if t is None:
            # PLY passes None when the error is at end of input.
            raise ParserError('Unexpected end of input', -1, -1)
        msg = "Syntax error at '%s'" % t.value
        columnno = t.lexer.lexpos - startlinepos
        raise ParserError(msg, t.lineno, columnno)

    #
    # Parse the input and display the AST (abstract syntax tree)
    #
    def parse(infileName):
        global startlinepos
        startlinepos = 0
        # Build the lexer
        lex.lex(debug=1)
        # Build the parser
        yacc.yacc()
        # Read the input
        infile = file(infileName, 'r')
        content = infile.read()
        infile.close()
        try:
            # Do the parse
            result = yacc.parse(content)
            # Display the AST
            result.show(0)
        except LexerError, exp:
            exp.show()
        except ParserError, exp:
            exp.show()

    USAGE_TEXT = __doc__

    def usage():
        print USAGE_TEXT
        sys.exit(-1)

    def main():
        args = sys.argv[1:]
        try:
            opts, args = getopt.getopt(args, 'h', ['help'])
        except:
            usage()
        for opt, val in opts:
            if opt in ('-h', '--help'):
                usage()
        if len(args) != 1:
            usage()
        infileName = args[0]
        parse(infileName)

    if __name__ == '__main__':
        #import pdb; pdb.set_trace()
        main()
Applying this parser to the following input:
    # Test for recursive descent parser and Plex.
    # Command #1
    aaa()
    # Command #2
    bbb (ccc())    # An end of line comment.
    # Command #3
    ddd(eee(), fff(ggg(), hhh(), iii()))
    # End of test
produces the following output:
    Node -- Type: ProgNodeType
        Node -- Type: CommandListNodeType
            Node -- Type: CommandNodeType
                Node -- Type: FuncCallNodeType
                    Node -- Type: TermNodeType
                        Value: aaa
            Node -- Type: CommandNodeType
                Node -- Type: FuncCallNodeType
                    Node -- Type: TermNodeType
                        Value: bbb
                    Node -- Type: FuncCallListNodeType
                        Node -- Type: FuncCallNodeType
                            Node -- Type: TermNodeType
                                Value: ccc
            Node -- Type: CommandNodeType
                Node -- Type: FuncCallNodeType
                    Node -- Type: TermNodeType
                        Value: ddd
                    Node -- Type: FuncCallListNodeType
                        Node -- Type: FuncCallNodeType
                            Node -- Type: TermNodeType
                                Value: eee
                        Node -- Type: FuncCallNodeType
                            Node -- Type: TermNodeType
                                Value: fff
                            Node -- Type: FuncCallListNodeType
                                Node -- Type: FuncCallNodeType
                                    Node -- Type: TermNodeType
                                        Value: ggg
                                Node -- Type: FuncCallNodeType
                                    Node -- Type: TermNodeType
                                        Value: hhh
                                Node -- Type: FuncCallNodeType
                                    Node -- Type: TermNodeType
                                        Value: iii
Comments and explanation:
pyparsing is a relatively new parsing package for Python. It was implemented and is supported by Paul McGuire, and it shows promise. It appears especially easy to use and seems particularly appropriate for quick parsing tasks, although it also has features that make some complex parsing tasks easy. It follows a very natural Python style for constructing parsers.
Good documentation comes with the pyparsing distribution (see the file HowToUsePyparsing.html), so I won't try to repeat it here. What follows is an attempt to provide several quick examples to help you solve simple parsing tasks as quickly as possible.
You will also want to look at the samples in the examples directory, which are very helpful. My examples below are fairly simple. You can see more of the ability of pyparsing to handle complex tasks in the examples.
Where to get it - You can find pyparsing at: Pyparsing Wiki Home -- http://pyparsing.wikispaces.com/
How to install it - Put the pyparsing module somewhere on your PYTHONPATH.
And now, here are a few examples.
Note: This example is for demonstration purposes only. If you really need to parse comma delimited fields, you can probably do so much more easily with the CSV (comma separated values) module in the Python standard library.
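For comparison, here is a sketch of that standard-library route, written for Python 3. It reads comma delimited lines with the csv module; the in-memory sample and the read_fields helper are assumptions made for illustration.

```python
import csv
import io


def read_fields(text):
    """Return a list of rows, each row a list of field strings."""
    return list(csv.reader(io.StringIO(text)))


sample = "abcd,defg\n11111,22222,33333\n"
for row in read_fields(sample):
    print(row)
```

Note that csv.reader returns only the fields; the delimiters themselves do not appear in the result.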
Here is a simple grammar for lines containing fields separated by commas:
    import sys
    from pyparsing import alphanums, ZeroOrMore, Word

    fieldDef = Word(alphanums)
    lineDef = fieldDef + ZeroOrMore("," + fieldDef)

    def test():
        args = sys.argv[1:]
        if len(args) != 1:
            print 'usage: python pyparsing_test1.py <datafile.txt>'
            sys.exit(-1)
        infilename = sys.argv[1]
        infile = file(infilename, 'r')
        for line in infile:
            fields = lineDef.parseString(line)
            print fields

    test()
Here is some sample data:
    abcd,defg
    11111,22222,33333
And, when we run our parser on this data file, here is what we see:
    $ python comma_parser.py sample1.data
    ['abcd', ',', 'defg']
    ['11111', ',', '22222', ',', '33333']
Notes and explanation:
Note how the grammar is constructed from normal Python calls to function and object/class constructors. I've constructed the parser in-line because my example is simple, but constructing the parser in a function or even a module might make sense for more complex grammars. pyparsing makes it easy to use these different styles.
Use "+" to specify a sequence. In our example, a lineDef is a fieldDef followed by ....
Use ZeroOrMore to specify repetition. In our example, a lineDef is a fieldDef followed by zero or more occurrences of comma and fieldDef. There is also OneOrMore when you want to require at least one occurrence.
Parsing comma delimited text happens so frequently that pyparsing provides a shortcut. Replace:
lineDef = fieldDef + ZeroOrMore("," + fieldDef)
with:
lineDef = delimitedList(fieldDef)
And note that delimitedList takes an optional argument delim used to specify the delimiter. The default is a comma.
This example parses expressions of the form func(arg1, arg2, arg3):
    from pyparsing import Word, alphas, alphanums, nums, ZeroOrMore, Literal

    lparen = Literal("(")
    rparen = Literal(")")
    identifier = Word(alphas, alphanums + "_")
    integer = Word( nums )
    functor = identifier
    arg = identifier | integer
    args = arg + ZeroOrMore("," + arg)
    expression = functor + lparen + args + rparen

    def test():
        content = raw_input("Enter an expression: ")
        parsedContent = expression.parseString(content)
        print parsedContent

    test()
Explanation:
This example parses expressions having the following form:
    Input format:
    [name]         [phone]       [city, state zip]
    Last, first    111-222-3333  city, ca 99999
Here is the parser:
    import sys
    from pyparsing import alphas, nums, ZeroOrMore, Word, Group, Suppress, Combine

    lastname = Word(alphas)
    firstname = Word(alphas)
    city = Group(Word(alphas) + ZeroOrMore(Word(alphas)))
    state = Word(alphas, exact=2)
    zip = Word(nums, exact=5)

    name = Group(lastname + Suppress(",") + firstname)
    phone = Combine(Word(nums, exact=3) + "-" + Word(nums, exact=3) + "-" + Word(nums, exact=4))
    location = Group(city + Suppress(",") + state + zip)

    record = name + phone + location

    def test():
        args = sys.argv[1:]
        if len(args) != 1:
            print 'usage: python pyparsing_test3.py <datafile.txt>'
            sys.exit(-1)
        infilename = sys.argv[1]
        infile = file(infilename, 'r')
        for line in infile:
            line = line.strip()
            if line and line[0] != "#":
                fields = record.parseString(line)
                print fields

    test()
And, here is some sample input:
    Jabberer, Jerry          111-222-3333   Bakersfield, CA 95111
    Kackler, Kerry           111-222-3334   Fresno, CA 95112
    Louderdale, Larry        111-222-3335   Los Angeles, CA 94001
Here is output from parsing the above input:
    [['Jabberer', 'Jerry'], '111-222-3333', [['Bakersfield'], 'CA', '95111']]
    [['Kackler', 'Kerry'], '111-222-3334', [['Fresno'], 'CA', '95112']]
    [['Louderdale', 'Larry'], '111-222-3335', [['Los', 'Angeles'], 'CA', '94001']]
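The nesting in this output mirrors the Group calls in the parser: one group for the name, one for the location, and an inner group for the words of the city. As a sketch of how calling code might consume such a result, the function below flattens one parsed record into a dictionary. It works on a plain nested list copied from the output above. The function name and the dictionary keys are made up for illustration.

```python
def record_to_dict(fields):
    # Layout taken from the parser output shown above:
    # [[last, first], phone, [[city words...], state, zip]]
    (last, first), phone, (city_words, state, zipcode) = fields
    return {
        'last': last,
        'first': first,
        'phone': phone,
        'city': ' '.join(city_words),   # rejoin multi-word city names
        'state': state,
        'zip': zipcode,
    }

sample = [['Louderdale', 'Larry'], '111-222-3335',
          [['Los', 'Angeles'], 'CA', '94001']]
record = record_to_dict(sample)
print(record['city'])
```

With the parser itself, passing fields.asList() to this function should work the same way, since asList() returns ordinary nested lists.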
Comments:
This example (thanks to Paul McGuire) parses a more complex structure and produces a dictionary.
Here is the code:
    from pyparsing import Literal, Word, Group, Dict, ZeroOrMore, alphas, nums,\
        delimitedList
    import pprint

    testData = """
    +-------+------+------+------+------+------+------+------+------+
    |       |  A1  |  B1  |  C1  |  D1  |  A2  |  B2  |  C2  |  D2  |
    +=======+======+======+======+======+======+======+======+======+
    | min   |   7  |  43  |   7  |  15  |  82  |  98  |   1  |  37  |
    | max   |  11  |  52  |  10  |  17  |  85  | 112  |   4  |  39  |
    | ave   |   9  |  47  |   8  |  16  |  84  | 106  |   3  |  38  |
    | sdev  |   1  |   3  |   1  |   1  |   1  |   3  |   1  |   1  |
    +-------+------+------+------+------+------+------+------+------+
    """

    # Define grammar for datatable
    heading = (Literal(
    "+-------+------+------+------+------+------+------+------+------+") +
    "|       |  A1  |  B1  |  C1  |  D1  |  A2  |  B2  |  C2  |  D2  |" +
    "+=======+======+======+======+======+======+======+======+======+").suppress()

    vert = Literal("|").suppress()
    number = Word(nums)
    rowData = Group( vert + Word(alphas) + vert + delimitedList(number,"|") +
        vert )
    trailing = Literal(
    "+-------+------+------+------+------+------+------+------+------+").suppress()

    datatable = heading + Dict( ZeroOrMore(rowData) ) + trailing

    def main():
        # Now parse data and print results
        data = datatable.parseString(testData)
        print "data:", data
        print "data.asList():",
        pprint.pprint(data.asList())
        print "data keys:", data.keys()
        print "data['min']:", data['min']
        print "data.max:", data.max

    if __name__ == '__main__':
        main()
When we run this, it produces the following:
    data: [['min', '7', '43', '7', '15', '82', '98', '1', '37'], ['max', '11', '52', '10', '17', '85', '112', '4', '39'], ['ave', '9', '47', '8', '16', '84', '106', '3', '38'], ['sdev', '1', '3', '1', '1', '1', '3', '1', '1']]
    data.asList():[['min', '7', '43', '7', '15', '82', '98', '1', '37'], ['max', '11', '52', '10', '17', '85', '112', '4', '39'], ['ave', '9', '47', '8', '16', '84', '106', '3', '38'], ['sdev', '1', '3', '1', '1', '1', '3', '1', '1']]
    data keys: ['ave', 'min', 'sdev', 'max']
    data['min']: ['7', '43', '7', '15', '82', '98', '1', '37']
    data.max: ['11', '52', '10', '17', '85', '112', '4', '39']
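Every value in this output is a string, because Word(nums) matches text and does not convert it. Here is a minimal sketch of one way to convert the rows after parsing, starting from the list form printed above. The helper name to_int_table is hypothetical, and only the first two rows are copied in to keep the sketch short.

```python
def to_int_table(rows):
    # Each row is [label, value, value, ...]; the label becomes the key.
    return {row[0]: [int(value) for value in row[1:]] for row in rows}

rows = [
    ['min', '7', '43', '7', '15', '82', '98', '1', '37'],
    ['max', '11', '52', '10', '17', '85', '112', '4', '39'],
]
table = to_int_table(rows)
print(table['max'])

# Integers compare numerically, so each minimum is at most its maximum.
# (As strings, '7' would sort after '11'.)
in_order = all(lo <= hi for lo, hi in zip(table['min'], table['max']))
print(in_order)
```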
Notes:
This section will help you to put a GUI (graphical user interface) in your Python program.
We will use a particular GUI library: PyGTK. We've chosen it because it is reasonably lightweight and our goal is to embed a lightweight GUI in a (possibly existing) application.
For simpler GUI needs, consider EasyGUI, which is also described below.
For more heavy-weight GUI needs (for example, complete GUI applications), you may want to explore WxPython. See the WxPython home page at: http://www.wxpython.org/
Information about PyGTK is here: The PyGTK home page -- http://www.pygtk.org//.
In this section we explain how to pop up a simple dialog box from your Python application.
To do this, do the following:
Here is a sample that displays a message box:
    #!/usr/bin/env python

    import sys
    import getopt
    import gtk

    class MessageBox(gtk.Dialog):
        def __init__(self, message="", buttons=(), pixmap=None,
                modal= True):
            gtk.Dialog.__init__(self)
            self.connect("destroy", self.quit)
            self.connect("delete_event", self.quit)
            if modal:
                self.set_modal(True)
            hbox = gtk.HBox(spacing=5)
            hbox.set_border_width(5)
            self.vbox.pack_start(hbox)
            hbox.show()
            if pixmap:
                # Note: Pixmap is not imported or defined in this listing,
                # so leave pixmap=None unless you supply that class.
                self.realize()
                pixmap = Pixmap(self, pixmap)
                hbox.pack_start(pixmap, expand=False)
                pixmap.show()
            label = gtk.Label(message)
            hbox.pack_start(label)
            label.show()
            # Add one button per label; remember the label as user data.
            for text in buttons:
                b = gtk.Button(text)
                b.set_flags(gtk.CAN_DEFAULT)
                b.set_data("user_data", text)
                b.connect("clicked", self.click)
                self.action_area.pack_start(b)
                b.show()
            self.ret = None
        def quit(self, *args):
            self.hide()
            self.destroy()
            gtk.main_quit()
        def click(self, button):
            self.ret = button.get_data("user_data")
            self.quit()

    # create a message box, and return which button was pressed
    def message_box(title="Message Box", message="", buttons=(), pixmap=None,
            modal= True):
        win = MessageBox(message, buttons, pixmap=pixmap, modal=modal)
        win.set_title(title)
        win.show()
        gtk.main()
        return win.ret

    def test():
        result = message_box(title='Test #1',
            message='Here is your message',
            buttons=('Ok', 'Cancel'))
        print 'result:', result

    USAGE_TEXT = """
    Usage:
        python simple_dialog.py [options]
    Options:
        -h, --help      Display this help message.
    Example:
        python simple_dialog.py
    """

    def usage():
        print USAGE_TEXT
        sys.exit(-1)

    def main():
        args = sys.argv[1:]
        try:
            opts, args = getopt.getopt(args, 'h', ['help'])
        except getopt.GetoptError:
            usage()
        for opt, val in opts:
            if opt in ('-h', '--help'):
                usage()
        if len(args) != 0:
            usage()
        test()

    if __name__ == '__main__':
        #import pdb; pdb.set_trace()
        main()
Some explanation:
And, here is an example that displays a text input dialog:
    #!/usr/bin/env python

    import sys
    import getopt
    import gtk

    class EntryDialog( gtk.Dialog):
        def __init__(self, message="", default_text='', modal=True):
            gtk.Dialog.__init__(self)
            self.connect("destroy", self.quit)
            self.connect("delete_event", self.quit)
            if modal:
                self.set_modal(True)
            box = gtk.VBox(spacing=10)
            box.set_border_width(10)
            self.vbox.pack_start(box)
            box.show()
            if message:
                label = gtk.Label(message)
                box.pack_start(label)
                label.show()
            # The text entry field, pre-filled with the default text.
            self.entry = gtk.Entry()
            self.entry.set_text(default_text)
            box.pack_start(self.entry)
            self.entry.show()
            self.entry.grab_focus()
            button = gtk.Button("OK")
            button.connect("clicked", self.click)
            button.set_flags(gtk.CAN_DEFAULT)
            self.action_area.pack_start(button)
            button.show()
            button.grab_default()
            button = gtk.Button("Cancel")
            button.connect("clicked", self.quit)
            button.set_flags(gtk.CAN_DEFAULT)
            self.action_area.pack_start(button)
            button.show()
            self.ret = None
        def quit(self, w=None, event=None):
            self.hide()
            self.destroy()
            gtk.main_quit()
        def click(self, button):
            self.ret = self.entry.get_text()
            self.quit()

    def input_box(title="Input Box", message="", default_text='',
            modal=True):
        win = EntryDialog(message, default_text, modal=modal)
        win.set_title(title)
        win.show()
        gtk.main()
        return win.ret

    def test():
        result = input_box(title='Test #2',
            message='Enter a valuexxx:',
            default_text='a default value')
        if result is None:
            print 'Canceled'
        else:
            print 'result: "%s"' % result

    USAGE_TEXT = """
    Usage:
        python simple_dialog.py [options]
    Options:
        -h, --help      Display this help message.
    Example:
        python simple_dialog.py
    """

    def usage():
        print USAGE_TEXT
        sys.exit(-1)

    def main():
        args = sys.argv[1:]
        try:
            opts, args = getopt.getopt(args, 'h', ['help'])
        except getopt.GetoptError:
            usage()
        for opt, val in opts:
            if opt in ('-h', '--help'):
                usage()
        if len(args) != 0:
            usage()
        test()

    if __name__ == '__main__':
        #import pdb; pdb.set_trace()
        main()
Most of the explanation for the message box example is relevant to this example, too. Here are some differences:
This example shows a file selection dialog box:
    #!/usr/bin/env python

    import sys
    import getopt
    import gtk

    class FileChooser(gtk.FileSelection):
        def __init__(self, modal=True, multiple=True):
            gtk.FileSelection.__init__(self)
            self.multiple = multiple
            self.connect("destroy", self.quit)
            self.connect("delete_event", self.quit)
            if modal:
                self.set_modal(True)
            self.cancel_button.connect('clicked', self.quit)
            self.ok_button.connect('clicked', self.ok_cb)
            if multiple:
                self.set_select_multiple(True)
            self.ret = None
        def quit(self, *args):
            self.hide()
            self.destroy()
            gtk.main_quit()
        def ok_cb(self, b):
            # Return a list of names or a single name, depending on mode.
            if self.multiple:
                self.ret = self.get_selections()
            else:
                self.ret = self.get_filename()
            self.quit()

    def file_sel_box(title="Browse", modal=False, multiple=True):
        win = FileChooser(modal=modal, multiple=multiple)
        win.set_title(title)
        win.show()
        gtk.main()
        return win.ret

    def file_open_box(modal=True):
        return file_sel_box("Open", modal=modal, multiple=True)

    def file_save_box(modal=True):
        return file_sel_box("Save As", modal=modal, multiple=False)

    def test():
        result = file_open_box()
        print 'open result:', result
        result = file_save_box()
        print 'save result:', result

    USAGE_TEXT = """
    Usage:
        python simple_dialog.py [options]
    Options:
        -h, --help      Display this help message.
    Example:
        python simple_dialog.py
    """

    def usage():
        print USAGE_TEXT
        sys.exit(-1)

    def main():
        args = sys.argv[1:]
        try:
            opts, args = getopt.getopt(args, 'h', ['help'])
        except getopt.GetoptError:
            usage()
        for opt, val in opts:
            if opt in ('-h', '--help'):
                usage()
        if len(args) != 0:
            usage()
        test()

    if __name__ == '__main__':
        main()
        #import pdb
        #pdb.run('main()')
A little guidance:
Note that there are also predefined dialogs for font selection (FontSelectionDialog) and color selection (ColorSelectionDialog).
If your GUI needs are minimalist (maybe a pop-up dialog or two) and your application is imperative rather than event driven, then you may want to consider EasyGUI. As the name suggests, it is extremely easy to use.
How to know when you might be able to use EasyGUI:
EasyGUI, along with documentation and examples, is available at the EasyGUI home page at SourceForge -- http://easygui.sourceforge.net/
EasyGUI provides functions for a variety of commonly needed dialog boxes, including:
See the documentation at the EasyGUI Web site for more features.
For a demonstration of EasyGUI's capabilities, run easygui.py as a Python script:
$ python easygui.py
Here is a simple example that prompts the user for an entry, then shows the response in a message box:
    import easygui

    def testeasygui():
        response = easygui.enterbox(msg='Enter your name:', title='Name Entry')
        easygui.msgbox(msg=response, title='Your Response')

    testeasygui()
This example presents a dialog to allow the user to select a file:
    import easygui

    def test():
        response = easygui.fileopenbox(msg='Select a file')
        print 'file name: %s' % response

    test()
Python offers a good range of structures for organizing an implementation. These range from statements and control structures (at a low level), through functions, methods, and classes (at an intermediate level), to modules and packages (at an upper level).
This section provides some guidance with the use of packages. In particular:
A Python package is a collection of Python modules in a disk directory.
In order to be able to import individual modules from a directory, the directory must contain a file named __init__.py. (Note that this requirement does not apply to directories that are listed in PYTHONPATH.) The __init__.py serves several purposes:
One simple way to enable the user to import and use a package is to instruct the user to import individual modules from the package.
A second, slightly more advanced way to enable the user to import the package is to expose those features of the package in the __init__ module. Suppose that module mod1 contains functions fun1a and fun1b and suppose that module mod2 contains functions fun2a and fun2b. Then file __init__.py might contain the following:
    from mod1 import fun1a, fun1b
    from mod2 import fun2a, fun2b
Then, if the following is evaluated in the user's code:
import testpackages
Then testpackages will contain fun1a, fun1b, fun2a, and fun2b.
For example, here is an interactive session that demonstrates importing the package:
    >>> import testpackages
    >>> print dir(testpackages)
    ['__builtins__', '__doc__', '__file__', '__name__', '__path__', 'fun1a', 'fun1b', 'fun2a', 'fun2b', 'mod1', 'mod2']
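The following sketch reproduces this arrangement in a throwaway directory, so that it can be tried without touching an existing project. The function bodies in mod1 and mod2 are invented for the example, as is the helper name build_package. One assumption to note: the sketch is written for Python 3, where the imports inside __init__.py must use the explicit relative form (from .mod1 import ...), whereas the form shown above relies on the implicit relative imports of Python 2.

```python
import importlib
import os
import sys
import tempfile

# Hypothetical module contents; only the function names come from the text.
FILES = {
    'mod1.py': (
        "def fun1a():\n    return 'fun1a'\n\n"
        "def fun1b():\n    return 'fun1b'\n"),
    'mod2.py': (
        "def fun2a():\n    return 'fun2a'\n\n"
        "def fun2b():\n    return 'fun2b'\n"),
    # Explicit relative imports, as Python 3 requires inside a package.
    '__init__.py': (
        "from .mod1 import fun1a, fun1b\n"
        "from .mod2 import fun2a, fun2b\n"),
}

def build_package(parent_dir, name='testpackages'):
    # Write the package directory and its three files under parent_dir.
    pkg_dir = os.path.join(parent_dir, name)
    os.mkdir(pkg_dir)
    for filename, source in FILES.items():
        with open(os.path.join(pkg_dir, filename), 'w') as outfile:
            outfile.write(source)
    return pkg_dir

parent = tempfile.mkdtemp()
build_package(parent)
sys.path.insert(0, parent)          # make the new package importable
importlib.invalidate_caches()
testpackages = importlib.import_module('testpackages')

# The names re-exported by __init__.py are attributes of the package.
exported = [name for name in dir(testpackages) if name.startswith('fun')]
print(exported)
print(testpackages.fun2a())
```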
Distutils (Python Distribution Utilities) has special support for distributing and installing packages. Learn more here: Distributing Python Modules -- http://docs.python.org/distutils/index.html.
As our example, imagine that we have a directory containing the following:
    Testpackages
    Testpackages/README
    Testpackages/MANIFEST.in
    Testpackages/setup.py
    Testpackages/testpackages/__init__.py
    Testpackages/testpackages/mod1.py
    Testpackages/testpackages/mod2.py
Notice the sub-directory Testpackages/testpackages containing the file __init__.py. This is the Python package that we will install.
We'll describe how to configure the above files so that they can be packaged as a single distribution file and so that the Python package they contain can be installed as a package by Distutils.
The MANIFEST.in file lists the files that we want included in our distribution. Here are the contents of our MANIFEST.in file:
    include README MANIFEST MANIFEST.in
    include setup.py
    include testpackages/*.py
The setup.py file describes to Distutils (1) how to package the distribution file and (2) how to install the distribution. Here are the contents of our sample setup.py:
    #!/usr/bin/env python

    from distutils.core import setup                    # [1]

    long_description = 'Tests for installing and distributing Python packages'

    setup(name = 'testpackages',                        # [2]
        version = '1.0a',
        description = 'Tests for Python packages',
        maintainer = 'Dave Kuhlman',
        maintainer_email = 'dkuhlman (at) davekuhlman (dot) org',
        url = 'http://www.reifywork.com',
        long_description = long_description,
        packages = ['testpackages']                     # [3]
        )
Explanation:
Now, to create a distribution file, we run the following:
python setup.py sdist --formats=gztar
which will create a file testpackages-1.0a.tar.gz under the directory dist.
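Before handing the file to anyone, it can be worth checking what actually went into it. The sketch below lists the members of a gzipped tar archive with the standard tarfile module. So that it runs on its own, it first builds a small stand-in archive in a temporary directory. The stand-in's contents and the helper names list_archive and make_stand_in are made up for illustration; in practice you would point list_archive at dist/testpackages-1.0a.tar.gz instead.

```python
import os
import tarfile
import tempfile

def list_archive(path):
    # Return the sorted member names of a .tar.gz file.
    with tarfile.open(path, 'r:gz') as archive:
        return sorted(archive.getnames())

def make_stand_in(workdir):
    # Build a tiny archive shaped like an sdist (contents are placeholders).
    top = os.path.join(workdir, 'testpackages-1.0a')
    os.makedirs(os.path.join(top, 'testpackages'))
    for relpath in ('setup.py', 'README', 'testpackages/__init__.py'):
        with open(os.path.join(top, *relpath.split('/')), 'w') as outfile:
            outfile.write('# placeholder\n')
    archive_path = os.path.join(workdir, 'testpackages-1.0a.tar.gz')
    with tarfile.open(archive_path, 'w:gz') as archive:
        archive.add(top, arcname='testpackages-1.0a')
    return archive_path

names = list_archive(make_stand_in(tempfile.mkdtemp()))
for name in names:
    print(name)
```

Every member sits under a single top-level directory named after the distribution and its version, which is the directory the user changes into after unpacking.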
Then, you can give this distribution file to a potential user, who can install it by doing the following:
    $ tar xvzf testpackages-1.0a.tar.gz
    $ cd testpackages-1.0a
    $ python setup.py build
    $ python setup.py install    # as root