The Artima Developer Community

Python Buzz Forum
Packing It All In

Ben Last


Ben Last is no longer using Python.
Packing It All In Posted: Dec 10, 2004 12:05 PM

This post originated from an RSS feed registered with Python Buzz by Ben Last.
Original Post: Packing It All In
Feed Title: The Law Of Unintended Consequences
Feed URL: http://benlast.livejournal.com/data/rss
Feed Description: The Law Of Unintended Consequences

Packing with Python, unpacking with Java

Two of the projects I'm working on right now are heavily J2ME-based.  The memory constraints in this environment are downright scary at times; takes me back to the days of the Z80, when men were real men and a sixteen-bit register counted as wide-open spaces.  But enough of the nostalgia.  The point of this entry is to make Useful Notes on using the power and simplicity of Python to prepare data for unpacking in the restricted environment of a J2ME MIDlet.  In this context, I'm talking about packing data into some binary format that takes up minimal space in the MIDlet (to save download time and installation space) and can be unpacked on demand, requiring minimal memory at runtime.

Why mess around with two languages like this?  I see it as a 'right tools for the job' approach.  There's no real alternative to J2ME for what I want to achieve on the handset, but Python easily beats Java when it comes to complex data processing, especially string manipulation.  Thus I use Python for the CPU-heavy task of building the data.

Let's take the Java side first and see how we can read stuff from a binary asset:

//Load up the metadata file and open a DataInputStream on it
DataInputStream dis = new DataInputStream(this.getClass().getResourceAsStream("data.bin"));
 
//Now read some basic types from it
byte b = dis.readByte();
int i = dis.readUnsignedShort();
String s = dis.readUTF();

Obvious enough.  A DataInputStream expects the data in the file to conform to a more-or-less standard format.  Now let's look at some Python code that can write the data in a compatible format so that Java can read it back.
import struct
 
#Note that we open in binary mode - forget this on Windows and you'll get unexpected results.
f = open('data.bin','wb')
 
#Write a byte, an unsigned short and a string in UTF format.
f.write(struct.pack("!B",1))
f.write(struct.pack("!H",2))
 
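s = 'This is my string, print me yours'
#Encode to UTF8 first; readUTF counts the length in encoded bytes, not characters.
#(No handling of embedded NULs here - see the note on modified UTF8 below.)
encoded = s.encode('utf8')
#First write the byte length (as an unsigned short, which is what readUTF expects)
f.write(struct.pack('!H',len(encoded)))
#Now write the encoded string itself.
f.write(encoded)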

You don't have to make each call to struct.pack separately; the format codes can be concatenated into a single format string (see the struct module documentation for more details):
#This is equivalent to the above lines that write a byte and a ushort
f.write(struct.pack("!BH",1,2))

The leading ! on the format string tells struct to pack in network byte order, which is what's expected by a DataInputStream (and may be different to the byte order of your machine).
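
To see what the ! prefix changes, here is a small sketch (assuming Python 3, where struct.pack returns bytes) that packs the same value in network order and in explicit little-endian order. The variable names big and little are only for illustration.

```python
import struct

# '!' (network order) is big-endian: most significant byte first.
big = struct.pack("!H", 2)
# '<' forces little-endian, which is the native order on x86 machines.
little = struct.pack("<H", 2)

print(big)     # b'\x00\x02'
print(little)  # b'\x02\x00'

# With '!', sizes are fixed and no padding is inserted between fields,
# so a byte followed by an unsigned short always takes three bytes.
print(struct.calcsize("!BH"))    # 3
print(struct.pack("!BH", 1, 2))  # b'\x01\x00\x02'
```

Only the first of these layouts is what readUnsignedShort will decode as 2; the little-endian bytes would come back as 512.
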
Writing the strings is a little more complex.  They must be preceded by a two-byte length (the number of encoded bytes, not characters) and are supposed to be in a modified form of UTF8 that guarantees there will never be an embedded NUL (zero) byte.  I've omitted any handling for this in the example above, since text strings don't tend to contain u'\x00'.  Depending on your application, you may want to check for them and do an appropriate replace; Java's modified UTF8 represents a NUL as the two bytes 0xC0 0x80.
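
If NULs can turn up in your data, the following is a minimal sketch of a writer that handles them, assuming Python 3. The function names encode_modified_utf8 and write_java_utf are made up for this example, and it deliberately rejects characters outside the Basic Multilingual Plane, which modified UTF8 encodes differently from standard UTF8.

```python
import io
import struct


def encode_modified_utf8(text):
    """Encode text in the form DataInputStream.readUTF expects (BMP only)."""
    out = bytearray()
    for ch in text:
        cp = ord(ch)
        if cp == 0:
            # NUL becomes the two-byte sequence C0 80, never a bare zero byte.
            out += b"\xc0\x80"
        elif cp > 0xFFFF:
            # Supplementary characters need surrogate-pair encoding;
            # this sketch refuses them rather than write bad data.
            raise ValueError("character outside the BMP: %r" % ch)
        else:
            out += ch.encode("utf-8")
    return bytes(out)


def write_java_utf(f, text):
    """Write a two-byte unsigned length followed by the encoded bytes."""
    data = encode_modified_utf8(text)
    if len(data) > 0xFFFF:
        raise ValueError("encoded string too long for a two-byte length")
    f.write(struct.pack("!H", len(data)))
    f.write(data)


buf = io.BytesIO()
write_java_utf(buf, "a\u0000b")
print(buf.getvalue())  # b'\x00\x04a\xc0\x80b'
```

The length prefix is 4 here, not 3, because the NUL takes two bytes once encoded.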

Storing strings like this is fine, but may take up more room than is needed if all you're dealing with is 7-bit ASCII.  For one project, I borrowed an old WordStar trick.  If you ever dumped a WordStar file, you'd notice that there were no spaces in the text.  Instead, the top bit of every character that preceded a space was set[0].  Here's Python code that writes out a series of ASCII strings, separating them by setting the top bit of the last character of each string.

#Assume we have an array of strings called 'stringsToWrite', and an open file 'f'
#Write the number of strings first as an unsigned short
f.write(struct.pack("!H",len(stringsToWrite)))
for s in stringsToWrite:
    #Each string is stored as a sequence of ASCII bytes; the final one has
    #the highest bit set.
    #This code expanded for simplicity - there are tighter ways to write it.
    if not s:
        #Empty strings saved as a single NUL byte
        f.write(struct.pack("!B",0))
    else:
        #A bytearray yields integers when indexed, on Python 2 and 3 alike
        asc = bytearray(s.encode('ascii'))
        ln = len(asc)
        for i in range(ln):
            #Clear the top bit
            b = asc[i] & 0x7f
            if i == (ln-1):
                #last char, so force top bit to be set
                b = b | 0x80
            f.write(struct.pack("!B",b))
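
One of the tighter ways to write the loop above might look like the sketch below, assuming Python 3. The helper name pack_strings is hypothetical; it builds the whole block in memory and returns it, so the caller can write it to the file in one go.

```python
import struct


def pack_strings(strings):
    """Return the packed bytes for a list of 7-bit ASCII strings."""
    out = bytearray(struct.pack("!H", len(strings)))
    for s in strings:
        if not s:
            # Same convention as above: an empty string is a single NUL byte.
            out.append(0)
            continue
        asc = bytearray(s.encode("ascii"))  # raises on non-ASCII input
        asc[-1] |= 0x80                     # flag the final character
        out += asc
    return bytes(out)


packed = pack_strings(["hi", "", "a"])
print(packed)  # b'\x00\x03h\xe9\x00\xe1'
```

Because encode('ascii') already guarantees seven-bit values, there is no need to mask each byte; only the last one is touched.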


And here's the Java that reads them back in.
//Read the number of strings and dimension the array.
int count = dis.readUnsignedShort();
String strings[] = new String[count];

//Choose an appropriate length for the StringBuffer
StringBuffer sb = new StringBuffer(8);
byte b;
//This code is explicit for simplicity - there are tighter ways to write this.
for(int i=0; i<count; i++) {
    do {
        b = dis.readByte();
        //Don't append NUL bytes - these mean empty strings.
        if (b!=0) {
            sb.append((char)(b & 0x7f));
        }
    //Stop at a NUL (empty string) or at a byte with the top bit set (last char).
    } while (b != 0 && (b & 0x80) == 0);
    strings[i] = sb.toString();
    sb.setLength(0); //clear buffer for next string
}
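
It can be handy to check a packed file on the build machine before it goes anywhere near a handset. Here is a sketch of a reader for the same format in Python 3; unpack_strings is a hypothetical helper, and it treats a lone NUL byte as a complete (empty) string.

```python
import struct


def unpack_strings(data):
    """Decode bytes in the top-bit-terminated string format."""
    (count,) = struct.unpack_from("!H", data, 0)
    pos = 2
    strings = []
    for _ in range(count):
        chars = []
        while True:
            b = data[pos]
            pos += 1
            if b == 0:
                break  # a lone NUL marks an empty string
            chars.append(chr(b & 0x7F))
            if b & 0x80:
                break  # top bit set: last character of this string
        strings.append("".join(chars))
    return strings


print(unpack_strings(b"\x00\x03h\xe9\x00\xe1"))  # ['hi', '', 'a']
```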

[0] You can find the true programmers in a crowd by asking them all to say what the most common character is in English text.  Non-programmers will say "e".  Programmers will say "space". :)



Copyright © 1996-2019 Artima, Inc. All Rights Reserved. - Privacy Policy - Terms of Use