Dump a pickle file to a readable text file

by John D. Cook

I got a data file from a client recently in "pickle" format. I happen to know that pickle is a binary format for serializing Python objects, but trying to open a pickle file could be a puzzle if you didn't know this.

Be careful

There are a couple of problems with using pickle files for data transfer. First, it's a security risk: an attacker could craft a malicious pickle file that causes your system to run arbitrary code when the file is loaded. In the Python Cookbook, authors David Beazley and Brian K. Jones warn

It's essential that pickle only be used internally with interpreters that have some ability to authenticate one another.
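To make the risk concrete, here is a minimal sketch of why loading is the dangerous step. The class name Sneaky and its harmless payload are made up for illustration. The point is only that pickle's documented __reduce__ hook lets whoever wrote the file choose a callable that runs at load time.

```python
import pickle


class Sneaky:
    """Hypothetical class; its pickle tells the loader to call eval()."""

    def __reduce__(self):
        # On load, pickle calls eval("6 * 7") instead of rebuilding a Sneaky.
        # A real attacker would put something far less friendly here.
        return (eval, ("6 * 7",))


payload = pickle.dumps(Sneaky())

# Loading runs the callable chosen by the file's author.
result = pickle.loads(payload)
print(result)
```

What comes back is not a Sneaky object at all, but whatever the embedded call returned.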

The second problem is that the format could change. Again quoting the Cookbook,

Because of its Python-specific nature and attachment to source code, you probably shouldn't use pickle as a format for long-term storage. For example, if the source code changes, all of your stored data might break and become unreadable.
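The "attachment to source code" can be seen directly in the bytes. The following is a small sketch using a standard library class, Fraction, chosen here only as a convenient example: the pickle records the module and class name, and relies on that code being importable when the file is loaded.

```python
import pickle
from fractions import Fraction

data = pickle.dumps(Fraction(1, 3))

# The byte stream names the module and class, but contains none of their code.
print(b"fractions" in data, b"Fraction" in data)

# Loading works only because the fractions module is importable here.
restored = pickle.loads(data)
print(restored)
```

If the class were one of your own and you later renamed or moved it, the same load would likely fail because the recorded name would no longer resolve.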

Suppose someone gives you a pickle file and you're willing to take your chances and open it. It's from a trusted source, and it was created recently enough that the format probably hasn't changed. How do you open it?
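If you'd like a look before you leap, the standard library's pickletools module can list the instructions in a pickle without executing them. This is a sketch rather than a security check. The helper name describe_pickle and the sample data are assumptions for illustration, and the sample bytes stand in for the contents of a real file.

```python
import io
import pickle
import pickletools


def describe_pickle(data: bytes) -> str:
    """Return a text listing of the opcodes in a pickle, without loading it."""
    buffer = io.StringIO()
    pickletools.dis(data, out=buffer)
    return buffer.getvalue()


# Stand-in for the bytes of a real file, e.g. open("data.pickle", "rb").read()
sample = pickle.dumps({"name": "example", "values": [1, 2, 3]})
listing = describe_pickle(sample)
print(listing)
```

Opcodes such as GLOBAL, STACK_GLOBAL, and REDUCE in the listing mean the file will import and call something when loaded, which is a cue to look more closely at what it names.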

Unpickling

The following code will open the file data.pickle and read it into an object obj.

    import pickle

    # Only unpickle files from a source you trust.
    with open("data.pickle", "rb") as f:
        obj = pickle.load(f)

If the object in the pickle file is very small, you could simply print obj. But if the object is at all large, you probably want to save it to a file rather than dumping it at the command line, and you also want to "pretty" print it rather than simply printing it.

Pretty printing

The following code will dump a nicely-formatted version of our pickled object to a text file out.txt.

    import pickle
    import pprint

    with open("sample_data.pickle", "rb") as f:
        obj = pickle.load(f)

    # Mode "a" appends; use "w" to overwrite an existing out.txt.
    with open("out.txt", "a") as f:
        pprint.pprint(obj, stream=f)

In my case, the client's file contained a dictionary of lists of dictionaries. It printed as one incomprehensible line, but it pretty printed as 40,000 readable lines.
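The difference is easy to reproduce on a small scale. The data below is a made-up stand-in for a dictionary of lists of dictionaries, not the client's file, and pprint.pformat is used so the two versions can be compared as strings.

```python
import pprint

# Hypothetical stand-in for the client's data: a dictionary of lists of dictionaries.
obj = {
    group: [{"id": i, "label": f"{group}-{i}", "score": i / 10} for i in range(5)]
    for group in ("alpha", "beta", "gamma")
}

plain = repr(obj)             # what print(obj) would show
pretty = pprint.pformat(obj)  # what pprint.pprint(obj) would show

print(len(plain.splitlines()), "line plain")
print(len(pretty.splitlines()), "lines pretty printed")
```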

Prettier printing

Simon Brunning left a comment suggesting that the json module output is even easier to read.

    import json

    with open("out.txt", "a") as f:
        json.dump(obj, f, indent=2)

And he's right, at least in my case. The indentation json.dump produces is more what I'd expect, more like what you'd see if you were writing the structure in well-formatted source code.
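One caveat worth knowing: json only understands basic types, so json.dump may raise a TypeError if the unpickled object contains something like a date or a set. The sketch below uses a made-up object to show one possible workaround, the default parameter, which here falls back to str for anything json cannot serialize. Whether a string stand-in is acceptable depends on what you need the dump for.

```python
import datetime
import json

# Hypothetical unpickled object containing a type JSON does not know about.
obj = {"client": "example", "received": datetime.date(2022, 8, 1), "values": [1, 2, 3]}

try:
    json.dumps(obj, indent=2)
except TypeError as err:
    print("plain json.dumps failed:", err)

# default=str is called for anything json can't serialize; here it
# turns the date into the string "2022-08-01".
text = json.dumps(obj, indent=2, default=str)
print(text)
```

Since the goal here is a readable dump rather than a faithful round trip, losing the original type is usually a fair trade.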

The post Dump a pickle file to a readable text file first appeared on John D. Cook.