Windows command line text processing with Javascript

Or, technically, with JScript.

I recently had to write a script that would make global replacements in a text file on a client's machine. Much as I love python (I look forward to telling you about the fun I'm having with python-calais), it was one of those tasks that just cried out "perl". Unfortunately, I couldn't take perl's existence for granted on the client's computers, and having them install it was too much trouble.

I could, however, take everything in a typical Windows installation for granted, and it turned out that Microsoft's JScript implementation of Javascript could do everything I needed. (VBScript equivalents of what I describe here shouldn't be much different.) Because Javascript began as a web scripting language, I assumed that security restrictions left it with little file input capability and even less output capability. I learned that under the Windows scripting engine both are fairly easy, and that they work with standard input and output, so a command in a batch file can pipe content through JScript scripts just as it can with perl or python scripts.

The Windows utility that lets you run scripts from the command line is called cscript.exe. It assumes that a script file with an extension of "js" is a JScript program, although it can run VBScript programs as well.

The following script performs a global replacement using the target and replacement strings passed as command line parameters. It demonstrates a few nice things:

  • WScript.Echo writes to standard output when the script runs under cscript. CScript is essentially a less GUI-oriented version of the wscript.exe Windows scripting engine, so many basic library calls include the latter's name.

  • The WScript object has a StdIn property, a text stream with methods such as ReadLine and ReadAll for reading from standard input.

  • WScript.Arguments stores the command line parameters used when invoking the script.

  • You can use regular expressions with UNIXy syntax. To use a variable such as target in a regular expression, you need the RegExp object, as the sample script demonstrates, but the comment preceding that line shows how a hardcoded regular expression would not need this object.

// replace.js: globally replace one string with another.
// See directions for syntax.

function directions() {
    WScript.Echo("Enter\n");
    WScript.Echo("cscript //Nologo replace.js targetstring " +
                 "replstring < infile.txt > outfile.txt");
    WScript.Echo("\nEnclose strings that have spaces in quotation marks.");
}

function processTextStream() {
    var target = WScript.Arguments.Item(0);
    var newString = WScript.Arguments.Item(1);
    var line;
    // If I wasn't passing a variable as the pattern, I could
    // use normal regex syntax like
    // line.replace(/Robert/g,"Bob")
    var pattern = new RegExp(target, "g");
    // Read standard input one line at a time until it runs out.
    while (!WScript.StdIn.AtEndOfStream) {
        line = WScript.StdIn.ReadLine();
        // Echo writes the changed line followed by a line break.
        WScript.Echo(line.replace(pattern, newString));
    }
}

// --------------------------------------------------

if (WScript.Arguments.length < 2) {
    directions();
}
else {
    processTextStream();
}

As the directions show, you could run this at a Windows command line like this:

   cscript //Nologo replace.js Robert Bob < oldaddrbook.txt > newaddrbook.txt
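
One caveat with a command like this: the script hands the target string to RegExp as a pattern, so a target such as a.b or $5.00 is read as regular expression syntax rather than as literal text, and a replacement string containing sequences like $& gets special treatment too. Here is a minimal sketch of one way to force a literal match. The names escapeRegExp and replaceLiteral are hypothetical helpers, not part of JScript or of the script above, and the sketch assumes a scripting engine that accepts a function as the second argument of replace.

```javascript
// Hypothetical helper: backslash-escape regular expression
// metacharacters so that the text matches only itself.
function escapeRegExp(text) {
    return text.replace(/[\\^$.*+?()\[\]{}|]/g, "\\$&");
}

// Hypothetical helper: replace every literal occurrence of target.
// Returning the replacement from a function keeps sequences such as
// "$1" or "$&" in it from being expanded.
function replaceLiteral(text, target, replacement) {
    var pattern = new RegExp(escapeRegExp(target), "g");
    return text.replace(pattern, function () {
        return replacement;
    });
}
```

In replace.js, a call like replaceLiteral(line, target, newString) could stand in for the line.replace call when regular expression matching isn't wanted.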

If you're redirecting the output to a file and don't want the Microsoft cscript banner in that file, don't forget the //Nologo parameter. Several other parameters are available; running cscript //? lists them.
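
WScript.Echo adds a line break after whatever it writes. When the redirected output has to match the input character for character apart from the replacements, WScript.StdOut.Write is an alternative. Here is a minimal sketch, assuming the input is small enough to read in one piece with ReadAll. The name filterText is hypothetical, and the Robert/Bob pattern is just the example used above.

```javascript
// Hypothetical helper: the transformation itself, kept free of any
// WScript calls so it can be tried outside cscript.
function filterText(text, pattern, replacement) {
    return text.replace(pattern, replacement);
}

// Run as a filter only when the Windows scripting host object exists.
if (typeof WScript !== "undefined") {
    // Skip empty input, where ReadAll may raise an error.
    if (!WScript.StdIn.AtEndOfStream) {
        // Write, unlike Echo, adds no line break of its own.
        WScript.StdOut.Write(
            filterText(WScript.StdIn.ReadAll(), /Robert/g, "Bob"));
    }
}
```

Keeping the replacement logic in its own function also means it can be exercised without piping a file through cscript.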

Between the string manipulation functions and the regular expression, standard input, and standard output support, this combination of the cscript engine and the J(ava)Script programming language gives all Windows machines a powerful text processing tool right out of the box.

Comments

Nifty!

I guess you could run RDF stuff in a single-file too, using http://www.jibbering.com/rdf-parser/ :)

Haven't played with this at all yet; I'm mostly Mac OS X-based lately, where the addressbook and pubsub APIs are getting my attention. Is it possible to access the equivalent in Windows from .js?

I would suggest using awk95.exe:

[...]awk was chosen since it is a very small download (compared with Perl or WSH/VB) and accomplishes the task of modifying configuration files upon installation. Brian Kernighan's http://cm.bell-labs.com/cm/cs/who/bwk/ site has a compiled native Win32 binary, http://cm.bell-labs.com/cm/cs/who/bwk/awk95.exe which you must save with the name awk.exe rather than awk95.exe.

http://httpd.apache.org/docs/2.2/platform/win_compiling.html

I love awk, and actually pulled out my little gray book just recently--it's one of the few books that I ever owned two copies of so that I could keep one at work and one at home. (When I first discovered SGML, I was writing awk scripts to convert XyWrite files to online help source for Windows, OS/2, mainframes, and Unix flavors.) It's certainly easier to install than perl, but for the situation I described above, I was better off not telling this client to download and install anything.