Tuesday, August 30, 2016

- Vasudev Ram - Online Python training and consulting

Get updates on my software products / ebooks / courses.

Jump to posts: Python   DLang   xtopdf Subscribe to my blog by email

My ActiveState recipes



file_sizes utility in D: print sizes of all files under a directory tree

By Vasudev Ram



Manila folder image attribution

Here is a command-line utility written in D (Dlang) that finds and prints the names and sizes of all regular files (i.e. files that are not directories) under a directory subtree, with the total file count and total size at the end. It is called file_sizes.d. It can be compiled with:
$ dmd file_sizes.d
and run with:
$ file_sizes dirName
Here is the code for file_sizes.d:
/*********************************************************************
File: file_sizes.d
---------------------------------------------
Author: Vasudev Ram
Copyright 2016 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: http://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
---------------------------------------------
Purpose: To find the sizes of all files (recursively, including in 
subdirectories) under a given directory tree.
Compile with:

$ dmd file_sizes.d

Run with:

$ file_sizes dirName

Description: To find the sizes of all files under a given directory
tree. The program will print both the name of the file and the file size
in bytes, separated by a tab character, one file per line. At the end,
it will also print the total number of files, and sum of their sizes.

*********************************************************************/

import std.stdio;
import std.file;
import std.uni;

void usage(string[] args) {
    stderr.writeln("Usage: ", args[0], " dirName");
    stderr.writeln(
        "Recursively find and print names and " ~
        "sizes of all files under dirName.");
}

int main(string[] args) {
    try {
        if (args.length != 2) {
            usage(args);
            return 1;
        }
        string dirName = args[1];
        // Check that dirName is not NUL or CON (DOS device names).
        if (dirName.toUpper() == "NUL" || dirName.toUpper() == "CON" ) {
            stderr.writeln("Error: ", dirName, " is not a directory. Exiting.");
            return 1;
        }
        if (!exists(dirName)) {
            stderr.writeln("Error: ", dirName, " not found. Exiting.");
            return 1;
        }
        // Check if dirName is actually a directory.
        if (!DirEntry(dirName).isDir()) {
            stderr.writeln("Error: ", dirName, " is not a directory. Exiting.");
            return 1;
        }
        ulong file_count = 0;
        ulong total_size = 0;
        ulong size;
        foreach(DirEntry de; dirEntries(dirName, SpanMode.breadth)) {
            // The isFile() check may be enough, also need to check for
            // Windows vs. POSIX behavior.
            if (de.isFile() && !de.isDir()) {
                file_count += 1;
                size = getSize(de.name());
                total_size += size;
                writeln(de.name(), "\t", size);
            }
        }
        writeln("Directory: ", args[1], "\tFiles: ", file_count, 
            " Size: ", total_size);

    } catch (FileException fe) {
        stderr.writeln("Got a FileException: ", fe.toString(), 
        "\n. Errno: ", fe.errno, ". Exiting.");
        return 1;
    } catch (Exception e) {
        stderr.writeln("Got an Exception: ", e.toString(), 
        "\n. Exiting.");
        return 1;
    }
    return 0;
}
Here are two example runs of file_sizes:
$ file_sizes test_dir
test_dir\a1 380
test_dir\a2 1215
test_dir\dir1\b1    10894
test_dir\dir1\b2    3871
Directory: test_dir Files: 4 Size: 16360

$ file_sizes d:\temp | grep Directory
Directory: d:\temp      Files: 2232 Size: 275511672
In the 2nd run above, I filter out all the file names and sizes using grep, so that only the summary/total line is shown, which is convenient when that is all you are interested in. Also, in the detail lines (the file names and sizes), the name and size are separated by a tab character, so that the output is easy to process with Unix filters like sed and awk.

I tested that by piping a few runs of the program to an awk script that computed the total by summing column 2 (the sizes). That was before I added code for the total line to the D program itself. You can still pipe the output to awk, etc., to do further processing on only the detail lines, by first piping it through "grep -v Directory:" (which works unless some file name or path in the output contains that string).
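The awk script itself is not shown in the post. As a rough stand-in, here is a minimal Python sketch of the same idea: read the tab-separated detail lines, skip the summary line, and sum the size column. The function name sum_sizes is made up for this illustration, and the sample data is copied from the first example run above. Note that it skips only lines that start with "Directory:", which is slightly stricter than "grep -v Directory:".

```python
import io

def sum_sizes(lines):
    """Sum the size column of file_sizes detail lines; return (count, total)."""
    count = 0
    total = 0
    for line in lines:
        line = line.rstrip("\r\n")
        # Skip blank lines and the summary line printed at the end.
        if not line or line.startswith("Directory:"):
            continue
        # Split on the last tab, so a tab inside a file name does no harm.
        # A line without a tab raises ValueError in int(); this sketch
        # does not try to recover from malformed input.
        _name, _sep, size = line.rpartition("\t")
        count += 1
        total += int(size)
    return count, total

# Sample input taken from the first example run (hypothetical stand-in
# for piping the real program's output into this script).
sample = io.StringIO(
    "test_dir\\a1\t380\n"
    "test_dir\\a2\t1215\n"
    "test_dir\\dir1\\b1\t10894\n"
    "test_dir\\dir1\\b2\t3871\n"
    "Directory: test_dir\tFiles: 4 Size: 16360\n"
)
print(sum_sizes(sample))  # (4, 16360)
```

To use it in a pipeline, pass sys.stdin to sum_sizes() instead of the sample text.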

file_sizes runs fairly fast on my machine.
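For readers more at home in Python, the core of the program (walk the tree, keep only regular files, print name and size separated by a tab, then a summary line) might be sketched as below. This is an illustration, not a port of the D code: the helper names file_sizes and report are invented here, the traversal order of os.walk can differ from SpanMode.breadth, and the error handling of the D version is left out.

```python
import os
import tempfile

def file_sizes(dir_name):
    """Yield (path, size_in_bytes) for each regular file under dir_name."""
    for root, _dirs, files in os.walk(dir_name):
        for name in files:
            path = os.path.join(root, name)
            # Similar in spirit to the isFile() check in the D program:
            # skip anything that is not a regular file (for example a
            # broken symlink).
            if os.path.isfile(path):
                yield path, os.path.getsize(path)

def report(dir_name):
    """Print one tab-separated line per file, then the summary line."""
    count = 0
    total = 0
    for path, size in file_sizes(dir_name):
        print(path, size, sep="\t")
        count += 1
        total += size
    print("Directory: ", dir_name, "\tFiles: ", count,
          " Size: ", total, sep="")
    return count, total

# Small self-contained demo on a throwaway directory tree.
with tempfile.TemporaryDirectory() as tmp:
    os.mkdir(os.path.join(tmp, "dir1"))
    for rel, data in [("a1", b"abc"), (os.path.join("dir1", "b1"), b"hello")]:
        with open(os.path.join(tmp, rel), "wb") as f:
            f.write(data)
    report(tmp)
```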

The image at the top of the post is of a stack of Manila paper, from which Manila folders are made. Physical file folders were the inspiration for folders in computer file systems (a.k.a. directories).

- Enjoy.




Friday, August 26, 2016

Square spiral - drawing with 3D effect (turtle graphics)

By Vasudev Ram

I was doing some work with Python turtle graphics for a project, and came up with this simple program that draws a square-ish spiral in multiple colors. It has a bit of a 3D effect. You can see it as a pyramid viewed from above, with the levels ascending toward you, or (again from above) as a well with steps going downward.

Here is the code and a screenshot of its output:
'''
square_spiral.py
A program that draws a "square spiral".
Author: Vasudev Ram
Copyright 2016 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: http://jugad2.blogspot.com
Product store: https://gumroad.com/vasudevram
'''

import turtle
t = turtle

colors = ['blue', 'green', 'yellow', 'orange', 'red']

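def pause():
    # raw_input() exists only in Python 2; fall back to input() on Python 3.
    try:
        read_line = raw_input
    except NameError:
        read_line = input
    read_line("Press Enter to exit:")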

def spiral(t, step, step_incr, angle):
    color_ind = 0
    colors_len = len(colors)
    t.pencolor(colors[color_ind])
    while True:
        t.forward(step)
        step = step + step_incr
        if step > 500:
            break
        t.right(angle)
        color_ind = (color_ind + 1) % colors_len
        t.pencolor(colors[color_ind])

    t.hideturtle()
    pause()

t.speed(0)
spiral(t, 20, 5, 90.2)
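One way to look at what the loop does, without opening a turtle window, is to compute the vertices of the path with plain trigonometry. The sketch below mirrors the loop in spiral(); the function name spiral_points is invented for this illustration, and it assumes turtle's default "standard" mode (start at the origin facing east, right() turning clockwise). My reading, not something the program states, is that the extra 0.2 degrees per turn slowly rotates each successive ring, which may be what gives the drawing its sense of depth.

```python
import math

def spiral_points(step=20, step_incr=5, angle=90.2, max_step=500):
    """Return (points, heading) for the path drawn by the spiral() loop."""
    x, y, heading = 0.0, 0.0, 0.0      # origin, facing east
    points = [(x, y)]
    while True:
        rad = math.radians(heading)
        x += step * math.cos(rad)      # same move as t.forward(step)
        y += step * math.sin(rad)
        points.append((x, y))
        step += step_incr
        if step > max_step:
            break
        heading -= angle               # same turn as t.right(angle)
    return points, heading

points, heading = spiral_points()
segments = len(points) - 1
turns = segments - 1
print(segments)                         # 97
# Accumulated difference from a spiral with exact 90 degree corners.
print(round(-heading - 90 * turns, 1))  # 19.2
```

With the values passed in the program (20, 5, 90.2), the pen draws 97 segments, and by the last one the heading has drifted about 19 degrees from where exact right angles would have left it.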





Tuesday, August 16, 2016

Count line frequencies with OrderedDict in Python

By Vasudev Ram


Python programs that count the frequencies of words in a string or a file are common examples. They are often written using dicts. Here is a small program that counts the frequencies of lines in its input. There are some uses for this functionality. I will show those, and also compare and contrast this program with other tools, later.

The program uses an OrderedDict from the collections module of the Python standard library.

The program could also be written using either a regular dict or a defaultdict (also from the collections module), or a collections.Counter, with slightly different code in each of those cases.
"""
linefreq.py
A program to find the frequencies of input lines.
Author: Vasudev Ram
Copyright 2016 Vasudev Ram
Web site: https://vasudevram.github.io
Blog: http://jugad2.blogspot.com
Product store: http://gumroad.com/vasudevram
"""
from __future__ import print_function
import sys
from collections import OrderedDict

def linefreq(in_fil):
    counts = OrderedDict()
    for line in in_fil:
        counts[line] = counts.get(line, 0) + 1
    print("Freq".rjust(8) + ": Line")
    for line, freq in counts.items():
        print(str(freq).rjust(8) + ": " + line, end="")
    print('-' * (10 + max(map(len, counts))))
    for line, freq in reversed(counts.items()):
        print(str(freq).rjust(8) + ": " + line, end="")

def main():
    sa, lsa = sys.argv, len(sys.argv)
    if lsa == 1:
        linefreq(sys.stdin)
    elif lsa == 2:
        with open(sa[1], "r") as in_fil:
            linefreq(in_fil)
    else:
        print("Only one filename argument supported.")

if __name__ == '__main__':
    main()
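As a point of comparison with the collections.Counter alternative mentioned before the listing, here is a minimal sketch of what that variant could look like. It is not the author's code: the function name linefreq_counter is made up, and it strips the trailing newline from each line, which linefreq.py does not do. Unlike the OrderedDict version, most_common() returns the lines ordered by frequency rather than by first appearance.

```python
import io
from collections import Counter

def linefreq_counter(in_fil):
    """Count identical lines; return (line, count) pairs, most frequent first."""
    counts = Counter(line.rstrip("\n") for line in in_fil)
    return counts.most_common()

# Hypothetical sample input, held in memory instead of a file.
sample = io.StringIO("line 1\nline 2\nline 2\nline 3\nline 3\nline 3\n")
for line, freq in linefreq_counter(sample):
    print(str(freq).rjust(8) + ": " + line)
```

The rest of this post uses the OrderedDict version, linefreq.py.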
I ran it on this input file:
line 1
line 2
line 2
line 3
line 3
line 3
line 4
line 4
line 4
line 4
where "line 1" occurs once, "line 2" occurs twice, etc., with this command:
$ python linefreq.py infile1.txt
and got this output:
    Freq: Line
       1: line 1
       2: line 2
       3: line 3
       4: line 4
-----------------
       4: line 4
       3: line 3
       2: line 2
       1: line 1
The reversed lines are output just to show that it is possible to use reversed() on an OrderedDict, unlike on a regular dict (regular dicts became reversible only in Python 3.8).
I also got the same output, as expected, when I ran this form of the command:
$ cat infile1.txt | python linefreq.py
This line:
    print('-' * (10 + max(map(len, counts))))
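prints a row of dashes sized to the longest output line above it: 10 characters for the right-justified count and the ": " separator, plus the length of the longest input line (which includes its trailing newline).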
The length of the longest line can also be computed inline in the first for loop.
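A minimal sketch of that inline approach, assuming the counting loop is split out into a helper (the name count_lines is invented for this illustration):

```python
import io
from collections import OrderedDict

def count_lines(in_fil):
    """Return (counts, longest), tracking the longest line while counting."""
    counts = OrderedDict()
    longest = 0
    for line in in_fil:
        counts[line] = counts.get(line, 0) + 1
        # len(line) includes the trailing newline, as in the
        # max(map(len, counts)) expression of the original program.
        longest = max(longest, len(line))
    return counts, longest

# Hypothetical sample input, held in memory instead of a file.
counts, longest = count_lines(io.StringIO("line 1\nline 2\nline 2\n"))
print('-' * (10 + longest))
```

A side effect of this form is that empty input gives a longest value of 0, whereas max() on an empty sequence raises a ValueError.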




