Why I like Unix

In my research group, we like Unix for the expressive power of its command shells and the maturity of the multiuser and multitasking environment.

I will usually assume the (t)csh shell in the following examples. For normal use csh is simple and suitable, and it doesn't have the cluttered 'nineteen-sixties-teletype' look and feel that emanates from sh or bash syntax in scripts. (* See footnote)

Unix pipes will be formatted with \ continuations, which looks rather nice, imho (note: spaces or tabs should be removed after the \ of the examples below, if you pick the examples up with the mouse).

In the examples, each command (system input) is listed first, followed by its output (system output).


About light-weight computer use

Today, most people fire up huge applications (OpenOffice, emacs etc.). There is another style of computer use which is much more light-weight and which is liked by anyone who is interested in speed. Memory-hogging, bloated applications that sit in the window-manager loop (i.e., are controlled by mouse) prevent one from enjoying the sheer speed of current CPUs. Therefore, the real Unix user predominantly works from within terminal windows, in a light-weight window manager (fvwm, icewm etc.) which allows for multiple desktops (e.g., 3x3). This keeps memory and resources free for the program that you are writing and testing. The browser and image processing applications are usually heavy enough as it is.

For many tasks, such as programming and editing configuration files, one will use flat text editors in the terminal window. X-windows based flat text editors are an option as well, especially those that are syntax aware and use text attributes (color/bold) for the syntax elements of the file you are editing.

Using a window manager with multiple desktops allows different projects to stay in the air. By using VNCserver, one can keep these projects running for a year or more (because Linux is stable) and log into them from Linux or Windows using a VNC client, anywhere in the world. VNC runs solid as a rock, even with flaky connections. I have been known to control my experiments on a Blue Gene supercomputer with a Palm-Pilot VNC client, sitting on the couch at home, over wifi. After scoring some admiration from my wife, horror gradually set in as she realized the monstrosity of me sitting on the couch with my brain glued to Stella, our Blue Gene computer.


How to count brackets in a program (Tcl or C)


       cat myprog.c \
        | sed 's/{/ BRACKBEGIN /g' \
        | sed 's/}/ BRACKEND /g' \
        | tr ' ' '\012' \
        | sort \
        | uniq -c \
        | grep BRACK

      110 BRACKBEGIN
      109 BRACKEND


Gary Perlman's Unix |Stat is a must

This example shows how to extract and manipulate columns in a matrix (file d.dat). The natural log() function is applied to column 2 after multiplying its elements by 10. The tee command is used to save intermediate results in the Unix pipe to the file d.out.

    cat d.dat

     1 4
     2 3
     3 5
     4 2
     5 2

    cat d.dat \
     | dm s1 'log(x2*10.)' \
     | tee d.out \
     | colex 2 \
     | desc -h

       Midpt    Freq
       2.302       0 
       2.907       2 **
       3.512       2 **
       4.116       1 *

    cat d.out

    1       3.68888
    2       3.4012
    3       3.91202
    4       2.99573
    5       2.99573

And we haven't even used backquotes at this stage!
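If |Stat is not installed, the column arithmetic can be approximated with awk. This is a sketch only, in plain sh: the function name log10col is invented here, awk stands in for dm and colex (not for desc), and the five sample rows are chosen to match the d.out listing above. The log() function of awk is the natural logarithm.

```shell
#!/bin/sh
# log10col: print column 1 and ln(10 * column 2), separated by a tab.
# Hypothetical stand-in for: dm s1 'log(x2*10.)'
log10col() {
    awk '{ print $1 "\t" log($2 * 10) }'
}

# Sample data in place of d.dat; the last awk plays the role of colex 2.
printf '1 4\n2 3\n3 5\n4 2\n5 2\n' \
    | log10col \
    | awk '{ print $2 }'
```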


Backquotes

With backquotes (`), the text results (stdout) of any program can be put into a shell variable. Newlines are changed into spaces. The following example obtains the PIDs of your rlogin processes and puts this list of numbers in the shell variable $a (set creates a shell variable; an environment variable would need setenv). We use colex for obtaining the first column, but awk '{print $1}' could have been used as well.

    set a = `ps | grep rlogin | colex 1` 
    # Here are the PIDs of your rlogin processes: 
    echo $a 

    3264 3265 3327 30788 30789 32307 32308 


    (You could do a

        kill -9 $a

    at this stage.)

Note that ` is not '.
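For comparison, here is how the same capture looks in sh or bash. This is a sketch under assumptions: fake_ps is a made-up function that prints canned lines, so that the example runs anywhere, and the PIDs in it are invented. In sh the captured text keeps its newlines inside the variable; they turn into blanks when the variable is expanded without quotes.

```shell
#!/bin/sh
# fake_ps: canned stand-in for "ps" (hypothetical, PIDs are made up).
fake_ps() {
    printf '3264 pts/1 rlogin\n3300 pts/2 vi\n3265 pts/1 rlogin\n'
}

# Backquotes and $(...) both capture stdout; $(...) nests more easily.
a=`fake_ps | grep rlogin | awk '{print $1}'`
b=$(fake_ps | grep rlogin | awk '{print $1}')

# Unquoted expansion splits on the newlines, so echo prints one line.
echo $a
```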


Word frequencies

Other people buy expensive software packages. Unix users think. The following example is a coarse and quick method to obtain word-frequency lists. The E-text of Huckleberry Finn is used as the input file. The punctuation characters are transliterated into blanks. The blanks are transliterated into newlines. The 'words' are sorted. The words are counted. The output is sorted numerically. The tail, i.e., the top-twenty most frequently used words, is printed.

      cat hfinn10.txt \
        | tr '[:punct:]'  ' ' \
        | tr ' ' '\012' \
        | sort \
        | uniq -c \
        | sort -n \
        | tail -20

    796 all
    826 out
    836 for
    857 up
    875 on
   1091 s
   1131 that
   1325 you
   1405 in
   1556 he
   1621 of
   2035 was
   2082 t
   2290 it
   2928 to
   3094 a
   3664 I
   4507 the
   6120 and
  42323

(Note: not all versions of transliterate (tr) will have the nice [:punct:] syntax, but you can still achieve the same result with a little more verbosity).
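One way to do without the character class, sketched in plain sh: the function name word_freq and the sample sentence are made up for the illustration. Instead of listing the punctuation, it turns everything that is not a letter into a newline. Note that this also drops digits, which the pipeline above keeps.

```shell
#!/bin/sh
# word_freq: word-frequency list from stdin, most frequent word last.
# -c complements the set (everything that is not a letter),
# -s squeezes each run of such characters into a single newline.
word_freq() {
    tr -cs 'A-Za-z' '\012' \
        | sort \
        | uniq -c \
        | sort -n
}

# Short sample text standing in for hfinn10.txt.
printf 'the cat, the dog; the end.\n' | word_freq | tail -1
```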

Character frequencies


      cat hfinn10.txt \
        | sed 's/./&~/g' \
        | tr '~' '\012' \
        | sort -n \
        | uniq -c \
        | sort -n \
        | tail -20

   7481 c
   8037 ,
   9548 m
   9576 y
  10111 g
  11862 w
  13235 u
  14141
  16587 l
  18746 r
  22664 i
  22785 d
  23347 s
  24881 h
  31187 n
  34669 a
  34825 o
  39268 t
  46568 e
 116919  

The basic function used here is sed (stream editor), using the 'ditto' symbol '&', which stands for the text matched by the regular expression. The regular expression used here is '.', meaning one and only one character. A tilde is thus appended to each character, and is later transliterated into a newline. This means that the frequency of the tilde itself cannot be measured this way. For character frequencies of all characters 0-255 (dec), you may want to use

    od -b --address-radix=n

in the pipe and replace blanks by newlines. od does what it says: octal dump.
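A possible shape of such an od pipeline, as a sketch in plain sh: the function name byte_freq is invented, and the short options -An and -v are used here instead of the long GNU option. The -v flag matters, because without it od replaces repeated output lines by a single "*", which would spoil the counts.

```shell
#!/bin/sh
# byte_freq: frequency of every byte value (printed in octal) on stdin.
# -An drops the address column, -b prints one octal number per byte,
# -v keeps identical output lines instead of abbreviating them to "*".
byte_freq() {
    od -An -v -b \
        | tr -s ' ' '\012' \
        | grep . \
        | sort \
        | uniq -c \
        | sort -n
}

# 141 is 'a', 142 is 'b' and 176 is the tilde, which now counts too.
printf 'aab~' | byte_freq
```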


Network programming

Is a host in the air or not, that is the question. The following 'oneliner' says yes if a host is reachable, no if not. Pattern matching is on the basis of Linux ping output, which reports the packet loss: " 0%" appears somewhere in the output if ping succeeds, and "100%" if it fails.

ping -c 1 $host \
    | awk '/ 100%/ {exit 1;} / 0%/ {exit 0;}' \
             && echo yes || echo no

Pretty ugly, eh? Personally I would do this with several lines of code using a clearly legible if () then construct, but it is interesting that the functionality can be put into a single command.
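The several-lines variant could look as follows. This is a sketch in plain sh, not csh, and the function names loss_verdict and is_up are made up. It assumes the Linux summary line containing "N% packet loss". The verdict is kept apart from the ping call, so that it can be tried on canned lines without a network. When ping prints nothing at all (e.g., for an unknown host), this version answers no, because the pattern is never found.

```shell
#!/bin/sh
# loss_verdict: read ping output on stdin and print yes or no.
# Assumes the Linux summary line "... N% packet loss ...".
loss_verdict() {
    # The leading blank keeps " 0%" from matching inside "100%".
    if grep -q ' 0% packet loss'; then
        echo yes
    else
        echo no
    fi
}

# is_up HOST: hypothetical wrapper that runs a real ping.
is_up() {
    ping -c 1 "$1" 2>/dev/null | loss_verdict
}

# Canned summary lines, so the sketch runs without a network.
echo '1 packets transmitted, 1 received, 0% packet loss, time 0ms' | loss_verdict
echo '1 packets transmitted, 0 received, 100% packet loss, time 0ms' | loss_verdict
```

With a network at hand, the call would be something like: is_up localhost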


More string matching

The following is a csh construct with curly brackets which is little known but quite nice. Inside if ( ), a command between curly brackets evaluates to true if it exits with status 0. This csh script tells you whether a person was a president of the USA (if you provide the file presidents.usa with their names). The -q flag given to grep keeps it quiet.

#!/bin/csh
if ( { grep -q "$1" presidents.usa } ) then
          echo "$1" was a president of the usa
endif
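For readers who script in sh or bash: there the if statement tests the exit status of a command directly, so no curly brackets are needed. A small sketch, in which the function name was_president and the two-line name list are invented for the illustration:

```shell
#!/bin/sh
# was_president NAME FILE: succeed if NAME occurs somewhere in FILE.
was_president() {
    grep -q "$1" "$2"
}

# Tiny stand-in for presidents.usa, written to a temporary file.
list=$(mktemp)
printf 'George Washington\nAbraham Lincoln\n' > "$list"

# sh uses the exit status as the condition; csh needs { } for that.
if was_president "Lincoln" "$list"; then
    echo "Lincoln was a president of the USA"
fi
rm -f "$list"
```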


More string matching (B)

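Grep and regular expression patterns are very powerful. In this example we examine a data file d.dat. We extract all lines (records) which contain exactly two integer numbers separated by blanks. Then we take the second number and make a histogram of its distribution:

 cat d.dat \
    | grep '^ *[0-9][0-9]*  *[0-9][0-9]* *$' \
    | colex 2 \
    | desc -h


A construct like [0-9][0-9]* reads as: "one or more digits". Note that [0-9]* alone means: "zero or more digits". In the same way, a blank followed by ' *' demands at least one blank between the two numbers; otherwise a single number such as 42 would pass as the two numbers 4 and 2. The ' *' before the $ tolerates trailing blanks.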
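With extended regular expressions (grep -E), "one or more" can be written with a plus sign. The following sketch in plain sh tries such a pattern on a few test lines; the function name two_ints and the test lines are made up. Only the first and the third line should come through.

```shell
#!/bin/sh
# two_ints: keep only lines that hold exactly two unsigned integers.
# With grep -E, [0-9]+ means the same as [0-9][0-9]* ("one or more").
two_ints() {
    grep -E '^ *[0-9]+ +[0-9]+ *$'
}

# Test lines: two integers, one integer, two integers with extra
# blanks, a word plus a number, and three integers.
printf '1 4\n12\n 2 3 \nabc 5\n3 5 7\n' | two_ints
```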


Contributions by other shell-script programmers

Greeting in Portuguese according to the hour of the day:

#!/usr/bin/ksh
# Julio Cezar Neves (julio.neves @ writeme.com) 12-03-1999
Hora=`date +%H`
case $Hora in
    0?|1[01]) echo Bom Dia      # good morning 
                ;;
    1[2-7]  ) echo Boa Tarde    # good afternoon
                ;;
    *       ) echo Boa Noite    # good evening
                ;;
esac
exit

Is a given argument numeric?

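#!/usr/bin/ksh
# Julio Cezar Neves (julio.neves @ writeme.com) 12-03-1999
# expr exits with status 2 or higher when $1 is not an integer;
# status 1 only means that the sum is 0, i.e., that $1 is -1.
expr "$1" + 1 > /dev/null 2>&1
if [ $? -lt 2 ]
then
    echo "$1" is numeric
else
    echo "$1" is not numeric
fi
exit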

*Footnote: However, I have to admit that for script programming I have migrated to bash lately, because it does not have that ugly malloc limit on globbing (file name expansion). The loop

    foreach file ( * )
    end

chokes in large projects if performed by csh / tcsh, while

    for file in *; do
    done

performed by bash works smoothly even in huge directories (thanks, Marius Bulacu, for the tip).


schomaker(at)ai.rug.nl