Compatibility


Topics

Berkeley 4.3 Buglist problems have been fixed.
The language is more consistent and easier to use.
Modern compiler technology has been employed.
Extensions
Restrictions and unimplemented features
Adaptation for Windows
Berkeley Compatibility Mode
See also

This section details the specific differences between the Hamilton C shell and the original UNIX C shell. It also describes the Hamilton C shell's Berkeley compatibility mode, used for running Berkeley C shell scripts.

Berkeley 4.3 Buglist problems have been fixed.

Shell procedures have been provided as a more powerful alternative to the clumsy argument mechanism for aliases.
Commands typed within loops or other control structures are properly added to the history list.
Control structures are recursively parsed, allowing piping between them. For example:

1 C% foreach i (a b c) echo $i; end | wc
3 3 9
Any of the : editing modifiers can be used on any substitution. Also, a space inside the search string in a :s/.../.../ command will match the space between two words. In the UNIX C shell, only certain modifiers could be used on a given type of substitution, and it was not possible to perform a search/replace that crossed word boundaries.
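To make the word-boundary point concrete, the Python sketch below models the two behaviors. It is an illustration only, not Hamilton C shell's implementation. The function names are hypothetical, and the "per-word" version is a simplified stand-in for a search that cannot see across word boundaries.

```python
def subst_joined(words, old, new):
    """Model of the Hamilton behavior: search the words as one string,
    so a space in `old` can match the gap between two words."""
    text = " ".join(words)            # rejoin the words with single spaces
    text = text.replace(old, new, 1)  # :s/old/new/ edits only the first match
    return text.split()               # split the edited text back into words


def subst_per_word(words, old, new):
    """Simplified model of a per-word search: each word is examined on
    its own, so a pattern containing a space can never match."""
    result, done = [], False
    for word in words:
        if not done and old in word:
            word = word.replace(old, new, 1)
            done = True
        result.append(word)
    return result


words = ["echo", "hello", "world"]
print(subst_joined(words, "o w", "o, w"))    # ['echo', 'hello,', 'world']
print(subst_per_word(words, "o w", "o, w"))  # ['echo', 'hello', 'world']
```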

The language is more consistent and easier to use.

The set, setenv and alias commands now accept the same basic syntax. The UNIX C shell had a number of anomalies: an = sign was required for set but not for setenv and alias; parentheses were required around a word list for set but not for setenv and alias; the set statement ignored all but the first argument word but alias would not; etc.
Variables or word lists are always indexed counting the first word as element zero. The UNIX C shell counted from zero when indexing with :n notation but from one when using [n] notation. argv[0] is the first argument word, not the name of the shell script being executed. The name of the script is kept in the local variable $scriptname. This can be overridden by setting the inheritable per-thread variable bsdargv = 1, causing argv[0] to be the name of the script.
In keeping with the desire to consistently index from zero, the last command entered into the history list, !!, is considered the 0-th element; !-1 is the line before it. The UNIX C shell considered these to be the same. A builtin variable, bsdhistory, is provided for those whose fingers prefer the Berkeley numbering convention: if you set bsdhistory = 1, !! and !-1 are the same.
Where an expression is expected, conventional high-level language syntax is now acceptable. The UNIX C shell required spaces around expression operators, required a $ to introduce a variable reference, required parentheses to avoid confusing "less than" with i/o redirection, etc. What had to be typed as

@ i = ($j + 3 * $k < 10)

under the UNIX C shell can now be typed (for example) as

@ i = j+3*k < 10

(The original UNIX C shell expression syntax is still entirely acceptable and will still produce correct results.)
Inside a [...] array index, the shell always looks for an expression, never an editing-style word select. Syntax and keying rules are the same as with any expression.
The case statement now accepts an expression to be matched rather than only a pattern. (To specify a static pattern, enclose it in quotes.) To determine a match against a case clause, the case expression is evaluated, converted to a string and then used as a pattern to compare against the switch value.
The various end statements used by the UNIX C shell (end, endif and endsw) have been replaced by a single end statement. Similarly, the two break statements, break and breaksw, have been replaced with a single break statement. For compatibility with existing scripts, the obsolete keywords are implemented as aliases in the default startup.csh script supplied with the product.
Since Hamilton C shell is free format (i.e., new statements need not begin on a new line), the UNIX C shell convention of chaining if statements with a single endif, when else and if appear on the same line, isn't sensible (though it is supported in Berkeley Compatibility Mode explicitly for compatibility). Instead, an elif keyword has been added.
The obscure use of several break statements in a row on a single line to break out of several levels of control statements at once has been eliminated. In its place, a label may be specified as an operand to indicate the control structure out of which it should break.
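The zero-based argv and history numbering rules can be summarized with a small Python model. The function names are hypothetical and the code only restates the rules given above. By default argv[0] is the first argument word (the script name is kept separately in $scriptname); bsdargv = 1 puts the script name in argv[0]. For history, back=0 stands for !! and back=1 for !-1; with bsdhistory = 1, !! and !-1 refer to the same line.

```python
def argv_for_script(script_name, args, bsdargv=False):
    """Model of how argv is populated when a script runs."""
    if bsdargv:
        return [script_name, *args]  # Berkeley style: argv[0] is the script
    return list(args)                # default: argv[0] is the first argument


def history_event(history, back, bsdhistory=False):
    """Model of resolving a backward history reference.

    `history` is ordered oldest first. With the default numbering,
    back=0 is !! (the last line) and back=1 (!-1) is the line before it.
    With Berkeley numbering, !-1 is the last line, the same as !!.
    """
    if bsdhistory:
        if back < 1:
            raise ValueError("Berkeley numbering starts at !-1")
        back -= 1
    return history[-1 - back]


history = ["ls", "cd src", "make"]
print(argv_for_script("build.csh", ["a", "b"]))        # ['a', 'b']
print(argv_for_script("build.csh", ["a", "b"], True))  # ['build.csh', 'a', 'b']
print(history_event(history, 0))                       # make
print(history_event(history, 1))                       # cd src
print(history_event(history, 1, bsdhistory=True))      # make
```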

Modern compiler technology has been employed.

Statements are parsed and compiled into an internal form before any substitutions or other evaluation is attempted. This offers an enormous performance improvement, particularly when iteration is involved. (The UNIX C shell would actually reparse each statement inside a foreach loop each time through the loop.)

If command- or variable-substitution creates any of the following reserved words or tokens, the special semantic meaning will be lost since substitution is done after parsing of statement structure. Instead, they will simply be treated as character strings. These reserved words are:

Introducing a clause in a structured statement:

alias     elif      if        setkey    unproc
break     else      local     source    unset
by        end       onintr    switch    unsetenv
calc      eval      proc      then      unsetkey
case      exit      repeat    time      until
continue  for       return    to        while
default   foreach   set       unalias   @
do        goto      setenv    unlocal

Anywhere:

( ) < > &= | ;

In an expression:

+ - * / % =

Similarly, a label cannot be computed at run time: the label on a statement is determined when the statement is first parsed.
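The consequence of parsing before substitution can be sketched in Python. This is a deliberately simplified model: only ; separates statements and only $name substitution is handled. The helper names are hypothetical. Because the line is split into statements first, a ; produced by substitution is just a character in an ordinary word. A naive expand-then-parse approach is shown only for contrast; it is not a description of any particular shell.

```python
import re


def parse(line):
    """Model: split a line into statements and words BEFORE substitution."""
    statements = []
    for text in line.split(";"):
        words = text.split()
        if words:
            statements.append(words)
    return statements


def substitute(words, variables):
    """Model: expand $name after parsing. The expanded text is split into
    ordinary words and is never re-scanned for ';' or reserved words."""
    result = []
    for word in words:
        expanded = re.sub(r"\$(\w+)", lambda m: variables.get(m.group(1), ""), word)
        result.extend(expanded.split())
    return result


line = "echo $x; echo done"
variables = {"x": "a; if"}

# Parse first, then substitute: still two statements; 'a;' and 'if' are plain words.
print([substitute(s, variables) for s in parse(line)])
# [['echo', 'a;', 'if'], ['echo', 'done']]

# For contrast, textual expansion before parsing turns ';' into syntax.
print(parse(line.replace("$x", variables["x"])))
# [['echo', 'a'], ['if'], ['echo', 'done']]
```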

Extensions

Command line editing with the arrow keys, etc., and the setkey statements are new.
The procedure mechanism, the proc, unproc and return statements and the various builtin procedures are new.
Local variables and the local and unlocal statements are new.
The use of color highlighting to indicate exception situations in filename or command completion is new.
The for statement, providing numeric iteration, and the calc statement, which writes the result of expression evaluation to stdout, are new.
The ** and **= exponentiation operators are new.
Floating point arithmetic is new.
The path hashing mechanism is substantially less sensitive to blind spots caused by creating a new executable in one of the path directories without manually running rehash. The UNIX C shell would not be able to find the new file; this shell makes a second pass through the path directories whenever hashing fails, looking for this sort of problem before it reports failure. If it finds a blind spot, it automatically rehashes that directory.
History references are allowed in the inline text supplied with the << i/o redirection mechanism. Also, the inline text is remembered in the history list, each line as a single word. This avoids the user having to remember and retype the inline text any time one of these statements is recalled from the history list or if the history list is dumped for use in a script file.
Exclusion ranges, e.g., [^a-z], can be used in a wildcard pattern.
Escape sequences that encode special characters (e.g., ^a for audible bell or ^b for backspace) are recognized in the arguments to any command, not just echo. Because this processing is internal to the shell, it is not necessary to type two escapes in a row to use this feature. (Refer to the echo command help screen for a complete list.)
Argument lists passed to a child process can be much larger than those allowed under UNIX. The UNIX C shell allows only roughly 6K characters to be passed, depending on the revision level; this shell allows up to 32K to be passed to a child process, the kernel limit on Windows. There is no command line limit for an internal command such as echo. This is of particular importance when wildcarding is used heavily.
Quoted strings are shown in the history list exactly as they would have to be typed. (The Berkeley UNIX C shell marked a character as quoted by setting its high-order bit; setting aside portability issues, it had the side-effect of not being visible in the history list.)
Parentheses in an argument list to an executable statement need not be escaped, so long as they are matched. Semicolons, i/o redirection symbols, etc., inside these parentheses are treated simply as text and are passed straight through to the application.
The :b (base), :# (count), :A (alternate shortname), :L (longname), :m (mixedpath) and :M (mixedcase fullpath) editing operators are new.
The indefinite directory wildcard construct, ..., is new.
The ## ... ## embedded comment construct is new.
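The two-pass path lookup described above can be modeled as follows. This is a hypothetical Python sketch, not the shell's code. The class name and the injectable `listdir` parameter are assumptions made so the model can be exercised without a real filesystem, and extension matching is ignored. The first pass consults only the cached table. If that fails, the directories themselves are rescanned, and any directory that turns out to hold the name is rehashed.

```python
import os


class PathHash:
    """Hypothetical model of a hashed command search path with a second
    pass that detects and repairs blind spots."""

    def __init__(self, dirs, listdir=os.listdir):
        self.dirs = list(dirs)
        self.listdir = listdir               # injectable for testing
        self.table = {d: self._scan(d) for d in self.dirs}

    def _scan(self, directory):
        try:
            return set(self.listdir(directory))
        except OSError:
            return set()                     # unreadable or missing directory

    def find(self, name):
        # First pass: consult only the cached (hashed) directory contents.
        for d in self.dirs:
            if name in self.table[d]:
                return os.path.join(d, name)
        # Hashing failed: look at the directories themselves before giving up.
        for d in self.dirs:
            entries = self._scan(d)
            if name in entries:
                self.table[d] = entries      # blind spot found: rehash this directory
                return os.path.join(d, name)
        return None


# Simulated directories; "newtool" appears after the path was hashed.
fs = {"bin": ["ls"], "tools": []}
path = PathHash(["bin", "tools"], listdir=lambda d: fs[d])
fs["tools"].append("newtool")
print(path.find("newtool"))  # found on the second pass; separator depends on the OS
```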

Restrictions and unimplemented features

Job control is not supported. Job control is not feasible under Windows because once one thread from any process within a window has started to read the keyboard, the read cannot be interrupted. (Fortunately, one can always open more windows.)
The use of \! inside a prompt string to get the statement number is not supported. Use $@ or $stmtnumber instead.
The following statements, all fairly specific to UNIX, are not supported: alloc, glob, limit, notify, stop.

Adaptation for Windows

Win32 does not provide a fork() call for inexpensively cloning an independent copy of a running process, complete with its own separate memory image. Instead, Win32 provides a faster alternative called threads: each thread is a separately scheduled flow of control through the memory space of a single process.

In general, Hamilton C shell spawns a new thread anywhere the Berkeley UNIX C shell would have used a process. Using a new thread instead of a new invocation of Hamilton C shell saves over a second each time. Individual threads manage their own notions of the current directories, the current disk and certain per-thread variables, but the dictionary of aliases, procedures and most variables is shared among all threads.

The result is that background activities and C shell scripts can change variables, define procedures, etc., for use by the other threads. For example, procedures can be written as self-loading scripts. (See the whereis.csh file for an example.)
Windows conventions are followed: either the \ or the / character can be used in a filename; the ^ character is normally the escape character; directories in the PATH environment variable are separated by semicolons; etc.
Labels cannot be a single letter. (This avoids confusing the drive letter in the pathname of an executable file with a label.)
Executable files are recognized by their extension. The following extensions are recognized (in this order): .csh, .exe, .com, .cmd, .bat. .csh files are interpreted as C shell scripts by a new thread; .exe and .com files are executed with the CreateProcess kernel function under Windows; .cmd files are interpreted by a child process running cmd.exe; and .bat files are passed to cmd.exe.
The PROMPT1 and PROMPT2 variables are used to set the primary and secondary prompt strings. Using the UNIX C shell variable PROMPT would have conflicted with cmd.exe's use of the same name and would have meant a nonsense prompt string any time either command processor was invoked by the other.
The following startup or other files have been renamed to be more consistent with Windows filename conventions: ~/.cshrc as ~\startup.csh; ~/.login as ~\login.csh; ~/.logout as ~\logout.csh; and ~/.history as ~\history.csh. The ~\login.csh file is read before, rather than after, the ~\startup.csh file. When the shell is started as a new session, very little environmental information may be passed, so login.csh is more useful as the first file read in this situation. When a subshell is started, either from csh.exe or cmd.exe, the environment is presumably already set up.
The comment character, #, must be followed by some white space to be considered the start of a valid comment (except in Berkeley Compatibility Mode). (That's because # is a legal character in a filename under Windows.)
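The extension search order above can be pictured with a small Python model. It is a hypothetical sketch. It looks in a single directory, takes a command name given without an extension, and compares names case-insensitively, as Windows filenames are. The handler strings simply restate the text and are not real API calls.

```python
# Search order and handlers as listed in the text above.
EXTENSIONS = [".csh", ".exe", ".com", ".cmd", ".bat"]
HANDLERS = {
    ".csh": "C shell script run by a new thread",
    ".exe": "CreateProcess",
    ".com": "CreateProcess",
    ".cmd": "child process running cmd.exe",
    ".bat": "child process running cmd.exe",
}


def resolve_command(name, files_in_dir):
    """Hypothetical model: pick the file that would run for `name` in one
    directory, trying the extensions in order (case-insensitive)."""
    present = {f.lower() for f in files_in_dir}
    for ext in EXTENSIONS:
        candidate = name.lower() + ext
        if candidate in present:
            return candidate, HANDLERS[ext]
    return None


print(resolve_command("build", ["BUILD.BAT", "build.exe"]))
# ('build.exe', 'CreateProcess')
```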

Berkeley Compatibility Mode

Berkeley Compatibility Mode provides fairly strict compatibility with the original BSD C shell. It is triggered by a script that starts with #!/bin/csh or, interactively, by invoking the shell with the -B option, and it causes the C shell to process statements in a more fully Berkeley-compatible fashion. (Scripts that do not start with #!/bin/csh will still be processed according to Hamilton C shell rules, even if the -B option is used to request Berkeley compatibility interactively.) In compatibility mode:

The status variable will reflect the return code from the rightmost stage of a pipeline returning a non-zero code. The tailstatus variable will be ignored.
All the shell variables will be snapshotted and all new variables made local to the thread.
Berkeley-style $var[...] indexing notation will be used, where the indexing is by word selection operators (like the :-editing operators) rather than by expression.
All variable arrays (except argv) will start with element 1. Accessing element 0 will give a null.
$0 or $argv[0] will be the scriptname. $argv will be the rest of the argument vector. The bsdargv variable will be ignored.
The # character will not need to be followed by white space to be considered the start of a comment.
The patterns in a case test (inside a switch) will be strings, not arbitrary expressions, and need not be quoted. Also, the switch value is evaluated as a wordlist, which may contain variable or command substitutions and wildcards, and is then rendered as a string.
endif and endsw will be predefined aliases for end (but only when closing an if or switch, respectively). break will only break out of an enclosing loop (foreach or while). breaksw will only break out of a switch statement.
The special-case use of else if on a single line is recognized as a way to chain several if statements together with a single endif at the end.
set var and setenv var will set var to a null string (considered to be the "set" state), not dump its value. Also, set accepts a list of variables and sets each of them to the null string.
When the $?var (existence) construct is used with a predefined variable, it tests whether the variable is in the "set" state, not just whether it exists.
/ and /= will perform integer division.
The right operand of the =~ and !~ pattern matching operators will be taken as a word which may contain wildcards.
In an expression, a variable name must be preceded by $. If it isn't, it'll be taken as a literal string. Also, the right-hand side of a == comparison is taken as a string, not an expression.
:-style editing operators will not be recognized after a command substitution.
onintr expects a goto label, not a statement following. Also, onintr and onintr - are now recognized as enabling or disabling interrupts, respectively.
$var[n-] is recognized as referring to elements n through last.
Escape sequences are preserved, not interpreted, inside single and double quotes except when escaping escapes or newlines. In those cases where escapes are recognized, the Hamilton C shell enhancements for sequences such as ^r, ^n, etc., are disabled. Inside quotes, the Berkeley C shell required two escape characters to escape a newline. Hamilton C shell's Berkeley mode accepts either one or two escape characters inside quotes as a way of escaping a newline. Also, embedding a newline into a quoted string has different results (as it does in the Berkeley C shell) depending on whether the C shell is running a script or not. Inside a script, the newline turns into 2 spaces; otherwise, it's preserved as a literal newline.
When quotes are used in the eof-string for a here document, they must appear exactly as used in the original << statement.
Comments in the middle of a multi-line statement end at the end of that physical line. They do not run all the way to the end of the entire statement. This was a trick in the Berkeley C shell to embed comments into the middle of a statement. This feature is implemented by rewriting the comments as they're read by the C shell into embedded comments wherever they occur.
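Two of the compatibility-mode rules above, the pipeline status rule and 1-based $var[...] selection with the n- form, can be modeled in a few lines of Python. The function names are hypothetical. The selection model is simplified: it accepts only the "n" and "n-" forms for arrays other than argv, and it returns an empty list where the real shell might report an out-of-range subscript.

```python
def berkeley_status(stage_codes):
    """Model of $status in Berkeley Compatibility Mode: the return code of
    the rightmost pipeline stage that returned non-zero, else 0."""
    for code in reversed(stage_codes):
        if code != 0:
            return code
    return 0


def berkeley_select(words, selector):
    """Simplified model of Berkeley-mode $var[...] for non-argv arrays:
    indexing is 1-based, element 0 is null, and "n-" means n through last."""
    if selector.endswith("-"):
        start = int(selector[:-1])
        return words[max(start, 1) - 1:]
    n = int(selector)
    if n == 0:
        return []                 # element 0 gives a null
    return words[n - 1:n]


print(berkeley_status([0, 2, 0]))               # 2
print(berkeley_select(["a", "b", "c"], "2-"))   # ['b', 'c']
```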

These changes should allow most scripts to run without problems. However, there will still be a few differences:

The escape character will still be controlled by the escapesym variable (shared across all threads), which defaults to ^, not \.
Environmental variables will still be shared. Changing them in a script will change them as seen by the parent.
The special meaning of several break statements on one line will not be supported.
The following commands are not supported: bg, exec, fg, glob, jobs, limit, nice (but eval gives similar functionality), nohup, notify, stop, suspend, unlimit and %job.
No attempt is made to process any command-line arguments following the #!/bin/csh at the start of a script, nor is any attempt made to implement the generalized UNIX convention to allow other shells or language processors to be invoked based on the contents of that first line.

Hamilton Laboratories ▷ Hamilton C shell 2012 ▷ User guide