Read my first binary file!

Wow. I am shocked that this worked on the first try. I created a script to read some of the header information from an S7K file. One thing that got me stuck for a bit was that the file’s byte order is little-endian. Turns out this is no big deal. I just had to use the BitString properties that end in ‘le’, for example s.uintle instead of s.uint.
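To see why the byte order matters, here is a minimal sketch using Python's built-in struct module rather than BitString. The two bytes are made up for illustration (they are not taken from a real S7K file); the only point is that the same bytes give different numbers depending on which end is read first.

```python
import struct

# Two made-up bytes, in the order they would sit in a file
raw = b"\x8c\x01"

# "<H" reads an unsigned 16-bit integer as little-endian (least significant byte first)
little = struct.unpack("<H", raw)[0]

# ">H" reads the same two bytes as big-endian (most significant byte first)
big = struct.unpack(">H", raw)[0]

print(little)  # 396
print(big)     # 35841
```

Reading a little-endian file with the wrong byte order does not raise an error, it just quietly produces the wrong values, which is why it is easy to get stuck on.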

So here’s my first ever script to read binary data:

[sourcecode language="python"]
#!/usr/bin/env python

# Purpose: to read the header of an s7k file
# Creator: Michelle

from bitstring import BitString

s = BitString(filename='sample.s7k')  # Read in the file sample.s7k, and create a BitString object from it
print s.pos  # Print the current read position (0 before anything has been read). For fun.

protocolversion = s.readbytes(2).uintle
offset = s.readbytes(2).uintle
syncpat = s.readbytes(4).uintle
size = s.readbytes(4).uintle
print size

odof = s.readbytes(4).uintle
odid = s.readbytes(4).uintle

year = s.readbytes(2).uintle
day = s.readbytes(2).uintle
seconds = s.readbytes(4).floatle
hours = s.readbytes(1).uintle
minutes = s.readbytes(1).uintle

print year,day,seconds,hours,minutes
[/sourcecode]

And the output of my first ever binary-data-reading script:

0
396
2009 353 55.1300010681 17 53
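The last line of output is easier to read as a calendar date. Here is a small sketch that converts it, assuming the day value is the day of the year with 1 meaning January 1st (the value 353 suggests this, but it is my assumption and should be checked against the DFD):

```python
from datetime import datetime, timedelta

# Values printed by the script
year, day, seconds, hours, minutes = 2009, 353, 55.1300010681, 17, 53

# Assumption: `day` counts days of the year, so day 1 is January 1st
timestamp = datetime(year, 1, 1) + timedelta(
    days=day - 1, hours=hours, minutes=minutes, seconds=seconds
)

print(timestamp.strftime("%Y-%m-%d %H:%M:%S"))  # 2009-12-19 17:53:55
```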

Installing BitString

I’m installing BitString right now. It seems like what I need. The first sentence on the BitString Google Code website is: “bitstring is a pure Python module designed to help make the creation, manipulation and analysis of binary data as simple and natural as possible.” Perfect. The first thing I did was download the manual (pdf) and the zipped installation file for my version of Python (2.6).

Unzipping files in Ubuntu is really easy: just right-click on the file in the File Browser, and choose to unzip it using Archive Manager. But I wanted to know how to do it using the command line. Here is an example:

[sourcecode language="bash"]
unzip bitstring-1.1.3.zip
[/sourcecode]

I installed the contents of the file into the appropriate Python Module locations on my computer by going to the directory where the unzipped files were, and typing the following into the command line:

[sourcecode language="bash"]
sudo python setup.py install
[/sourcecode]

Now I’m ready to dig in! Luckily, a friend of mine at work helped me out with some pointers on how to read in the s7k binary file. He explained the commands he’d used when creating his MATLAB scripts, and how the structure was defined. He’s not a Python user, so he couldn’t give me any specific Python tips, but it was enough for me to get started. So now I have to figure out the BitString part. I’m not going with struct or array because they seem to be meant for working with whole bytes and are clunky when it comes to parsing out individual bits. BitString is designed with more flexibility.
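To make the whole-bytes point concrete, here is a rough sketch of what the struct approach looks like. The field layout, the names and the sample values are all hypothetical, chosen only for illustration. Whole-byte fields unpack in one call, but pulling individual bits out of a byte means shifting and masking by hand.

```python
import struct

# Hypothetical layout: 2-byte version, 2-byte offset, 4-byte sync pattern, 4-byte size
HEADER_FORMAT = "<HHII"  # "<" means little-endian with no padding
HEADER_SIZE = struct.calcsize(HEADER_FORMAT)  # 12 bytes

# Made-up sample bytes standing in for the start of a file
sample = struct.pack(HEADER_FORMAT, 5, 64, 0xFFFF, 396)

# Whole-byte fields come out in a single call
version, offset, syncpat, size = struct.unpack(HEADER_FORMAT, sample[:HEADER_SIZE])
print(size)  # 396

# Individual bits need manual shifting and masking
flags = 0xB0                      # binary 1011 0000, an invented flags byte
top_bit = (flags >> 7) & 1        # highest bit
next_three = (flags >> 4) & 0b111 # the three bits below it
print(top_bit)     # 1
print(next_three)  # 3
```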

Fortunately BitString seems pretty straightforward. I had a very quick look through the manual, and if I understand correctly, I will start by converting the s7k file to a BitString object, then just read through it bit by bit (or byte by byte).
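The general idea of reading through binary data piece by piece can be sketched with nothing but the standard library. This is not BitString code: io.BytesIO stands in for an open file, the bytes are made up, and tell() reports the position in bytes.

```python
import io
import struct

# Made-up bytes standing in for the start of a binary file
stream = io.BytesIO(struct.pack("<HHI", 5, 64, 396))

start = stream.tell()
print(start)  # 0: nothing has been read yet

# Each read() consumes bytes and moves the position forward
version = struct.unpack("<H", stream.read(2))[0]
offset = struct.unpack("<H", stream.read(2))[0]

after_two_fields = stream.tell()
print(after_two_fields)  # 4: two 2-byte fields consumed

size = struct.unpack("<I", stream.read(4))[0]
print(size)  # 396
```

Because every read picks up where the last one stopped, the fields have to be read in exactly the order the format defines them.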

New project: use Python to read multibeam data

Am I getting in over my head before I’ve learned the basics?  Most likely.  But I find that if I set my goals high, I learn lots of unexpected things along the way.  This latest goal is probably not going to be completed from start to finish in any kind of linear fashion, and I will probably drop it and come back to it several times before its completion.

I’m hoping to figure out how to read the S7K multibeam data format. Not a simple challenge for someone like me who can barely piece together a print statement. This isn’t like reading an ASCII text file. I started reading the Data Format Definition document (DFD), and came upon some pretty daunting tasks right away, including reading headers, different number formats, and lots of bits and bytes stuff. Scary, but sort of exciting. When I took a computer science class way back in undergrad, I remember learning all the really basic stuff, but since I didn’t have a real application for it, it was sort of meaningless to me, and therefore did not stick in my brain. But now it’s fun! (I’m a nerd). Hopefully it’ll stick this time 🙂

These links might be helpful:

Reading and Writing data using Python’s input and output functionality

Understanding Big and Little Endian

The Learning Python book says (p. 901): “If you are processing image files, packed data created by other programs whose content you must extract, or some device data streams, chances are good that you will want to deal with it using bytes and binary-mode files.”