Working with the file system is too verbose. Let’s make it more like jQuery!

Working with the file system in Python is too verbose.


Every time I want to do something with files I need to remember whether it’s the `os` or `sys` library that I need to import. I need to remember or look up half a dozen other functions to grab particular bits of metadata from those files.

I’ve written countless nested loops or recursive functions to walk the tree of files over the years.

And often, when I want to do something quick to a bunch of files, I start writing a shell script. Then I realize that I hate shell script, can’t remember it, and think it would be so much easier in Python. Then I open the editor to write the code in Python, realize it’s too much trouble, and go back to mashing around in bash again.

And then I remembered jQuery, with its refreshingly easy abstractions for maneuvering in and manipulating a tagged, tree-shaped data structure. And the file system is just a tree, right? So why does it have to be hard? Why shouldn’t working with files be just like working with jQuery?

This is such an obvious idea, I’m sure someone must have done it.

But I couldn’t find it. So I had to write it myself a couple of months ago.

So, let me present … FSQuery. Now available on PyPI and GitHub.

Here’s what it looks like:

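from fsquery import FSQuery

path = "."  # root directory to search; any directory path works here

# All .js files anywhere except under directories named "vendor"
fsq = FSQuery(path).Match(r"\.js$").NoFollow("vendor").FileOnly()
for n in fsq:
    print(n.abs)  # absolute path of each matching node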

You create an FSQuery object with a path to the root of the files you’re interested in. You then narrow the query by chaining extra filters onto it.

Finally you can treat the whole thing as an iterable collection and loop through its results.

It returns objects of class FSNode, which represent nodes in the file system, either files or directories. The modifier FileOnly() restricts the query to return only files. You only need to add this once to the query.

The NoFollow() method tells the query to avoid directories that match the name, but it has no effect on file names. In the above example, “vendor.txt” would still be included in the results if that file is anywhere other than under the vendor directory. (This beats just piping find through grep in the terminal.) You can add as many NoFollow filters as you like to a query.

On the other hand, Match() is an inclusive filter. Only files whose names are explicitly matched end up in the results. However, this filter isn’t applied to directories. FSQuery will still explore and return directories whether they match or not. You will usually want to combine a Match with a FileOnly to get the effect you want (e.g. in this case, all the .js files anywhere except under the vendor directory).

We can even look inside files with the Contains filter, e.g.:
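To make those traversal rules concrete, here is a minimal standard-library sketch of the same behaviour. It is not FSQuery's implementation: the function name `find_files` and its parameters are made up for illustration, and it assumes both patterns are regular expressions searched against the bare file or directory name.

```python
import os
import re


def find_files(root, match, no_follow=()):
    """Yield absolute paths of files under root whose names match `match`,
    skipping any directory whose name matches one of `no_follow`.

    Hypothetical helper: mimics Match + NoFollow + FileOnly, nothing more.
    """
    name_re = re.compile(match)
    skip_res = [re.compile(p) for p in no_follow]
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune in place so os.walk never descends into skipped directories.
        dirnames[:] = [d for d in dirnames
                       if not any(r.search(d) for r in skip_res)]
        for name in filenames:
            # The skip patterns are deliberately not applied to file names,
            # so a file called vendor.js outside vendor/ is still returned.
            if name_re.search(name):
                yield os.path.abspath(os.path.join(dirpath, name))
```

Calling `find_files(".", r"\.js$", ["vendor"])` walks the current directory and yields every .js file that does not live under a vendor directory, which is roughly the loop I kept rewriting by hand.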

fsq = FSQuery(path).NoFollow("vendor").Ext("py").Contains("GNU Lesser General Public License").FileOnly()

Note that this is implemented purely in Python, in a very inefficient way. (I.e. I just open up each file and run through it looking for the string.) It can be slow on large chunks of the file system.
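For a sense of what "open each file and run through it" costs, a naive check along those lines might look like the sketch below. This is an illustration under my own assumptions, not the library's code: `file_contains` is a hypothetical name, and it scans line by line, so a string that spans a line break would be missed.

```python
def file_contains(path, needle, encoding="utf-8"):
    """Return True if any single line of the file at `path` contains `needle`.

    Hypothetical helper illustrating a naive, pure-Python content scan.
    """
    try:
        # errors="ignore" lets the scan survive binary or oddly encoded files.
        with open(path, encoding=encoding, errors="ignore") as handle:
            return any(needle in line for line in handle)
    except OSError:
        # Unreadable files (permissions, broken links) simply do not match.
        return False
```

Every candidate file is read until the first hit, or to the end when there is none, so the cost grows with the total size of the files that survive the earlier filters. Putting the cheap name-based filters first keeps that number down.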

Note also the Ext() filter for file-name extensions. This is easier than regexing the whole file-name if you’re just looking for files of type “py”. Be aware that if you try to have two Ext() filters on the same query, you will get no files returned. No file can have two different extensions at the same time.
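The reason two Ext() filters cancel out is that every filter on a query has to accept a file for it to be kept. The sketch below models that "all filters must pass" rule with plain functions. The names `has_ext` and `passes` are hypothetical, and this is a model of the behaviour described here rather than FSQuery's internals.

```python
import os


def has_ext(ext):
    """Build a predicate that is True when a file name ends in .<ext>."""
    def check(name):
        return os.path.splitext(name)[1] == "." + ext
    return check


def passes(name, filters):
    """A name is kept only if every filter accepts it (logical AND)."""
    return all(f(name) for f in filters)


names = ["build.py", "app.js", "README.md"]

# One extension filter keeps the files of that type.
only_py = [n for n in names if passes(n, [has_ext("py")])]

# Two different extension filters can never both be true for one name.
py_and_js = [n for n in names if passes(n, [has_ext("py"), has_ext("js")])]
```

Here `only_py` holds just the Python file, while `py_and_js` comes back empty, which is the same trap as chaining two Ext() calls.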

More documentation and more advanced tricks can be found on the GitHub site. This is also the first library I’ve put on PyPI, so installing it in your own project is as simple as:

pip install fsquery
