ls
Conventions
echo
mv, cp and rm
more
touch
chmod
du, vol and pwd
dirs, pushd, popd and rotd
fgrep and grep
fgrep
grep and regular expressions
sed
diff
head and tail
cut
split
tabs
tr
strings
Other utilities
See also
Hamilton C shell comes with a lot of utilities that form some of its vocabulary. They do small, but oft-needed functions, often in a novel, faster or more convenient way than you’d find in “plain vanilla” Windows. This section provides a quick tour, outlining some of the capabilities and conventions.
ls is a somewhat nicer way to list a directory:
Subdirectories are highlighted (shown here in bold.) If a file or directory has the system bit set, it’s still listed, displayed in green. Normally, ls lists everything in lower case for better readability. In long format:
Conventionally, ls lists things alphabetically, with directories ahead of files. There might be hidden files or directories, but to see them you have to ask:
To find out how any of the utilities work, just use the -h option. For example,
tells about options for more detailed listings, sorting the list by date or by size, selecting only certain types of files, etc. ls is a read-only activity; it never makes any changes to the file system; lists are always sorted in memory.
The names of the utilities were chosen to be consistent with the names of similar functions on UNIX, where they provided much of the vocabulary of the original UNIX C shell. But changing the name of a utility is a simple matter: just rename the corresponding .exe file or, better still, create an alias (discussed later.)
By convention, the utilities expect options to come ahead of any files you specify. Options are case-sensitive. We’ve tried to use mnemonic letters for options (e.g., h for help) and to use the same letter to mean the same thing across related utilities; achieving that is simply more feasible with 52, not just 26 characters to choose from.
Our examples generally show options introduced with -, but you could equally well follow the Windows convention of using / if you prefer. If indeed you want only - or only / interpreted as an option character, this can be set with the SWITCHCHARS environmental variable, which can be set either from the C shell or from the Control Panel (or in your autoexec.bat file under Windows 9x.) It won’t have any effect on the standard Windows commands like dir or xcopy or on applications you purchase elsewhere, but it will work on all the commands supplied with the C shell. For example, to have only - recognized as an option character, you might type this into the C shell:
You can type options in any order (except where one overrides another, in which case the last setting is used) and you group them together or type them separately as you choose. For example, ls -L -d -w is exactly the same as ls -dwL and produces a -L long format (very detailed) list of the current directory, sorted by -d date (newest ones last), with sizes of any directories filled in by -w walking down through the directory tree, adding up all the sizes of all the files found there.
You can always unambiguously end the options with -- in case you have a filename or an argument string that begins with one of option-introducing characters. Also, since the shell does the wildcard expansion, it’s a bit more convenient and faster for the utilities to look for any options right at the beginning of what could be a very long list – up to 32 kilobytes – of filenames or other command-line text.
We’ll always follow the Windows convention of using \ in filenames in this book and we generally advise that you do too, not so much because the C shell cares but because so much other Windows software does. To some fair degree, it’s a case of “when in Rome, doing as the Romans do.” But if you really do prefer, you can generally use / with the C shell and all the utilities. Do remember, however, that if you type a filename starting with / to mean the root, you have to be careful that it can’t be confused as the start of an option. (This is a good use for the -- option or the SWITCHCHARS variable.)
echo is a little different than the vanilla Windows echo. It does only one thing: it prints whatever arguments words you give it; there’s no echo on or echo off-style status reporting function. But it does offer much finer control over what gets printed: you can write binary values, choose not to append a new line and write to stderr instead stdout.
Here’s an example where the ANSI escape sequences turning brightness and color on and off are embedded into a string being echoed.
The mv (move), cp (copy) and rm (remove) trio allows files and directories to be treated as simple objects.
mv will move either files or directories treating them simply as objects, even across disk partitions. In this example, the two hello files are moved into a new directory, illustrating how mv understands that if there’s a many-to-one relationship, the destination has to be a directory.
Similarly, cp will copy a file or even an entire directory. The copies cp produces are always exact logical copies, with correct timestamps and attribute bits and including any hidden or system files.
cp does not consider it an error to copy over an existing file unless the file about to be overwritten has its read-only bit set.
Finally, rm can be used to remove a file or even an entire directory. But it does insist that you tell it you really mean it if you ask to remove a directory that’s not empty or anything that’s marked with the system bit.
As you can see from these examples, the general style of the utilities is fairly terse. Like the proverbial Vermont farmer, they don’t say anything unless they’ve got something to say. Even copying or removing a directory happens without fanfare as long as the appropriate “yes, I really mean it” options are supplied.
more is an especially fast browsing filter. There are two ways to use more. The first is in a pipeline, the way “vanilla” more might be used when you suspect the data may be longer than a screenful:
If the output turns out to be less than a screenful, it’s as though you’d just typed the ls command by itself. In fact, there’s not even a noticeable performance penalty. But if it’s more than a screenful, more switches to an interactive mode where you can use the arrow keys, etc., to browse up and down through the listing.
more can also be used for browsing a list of the files you give it on the command line:
more incorporates the Berkeley notion referred to, tongue-in-cheek, as “more is less”: it’s a good paging filter that lets you go forwards and backwards. It also offers a number of different ways of looking at or searching the data including binary, as control characters, line-numbered, etc. Perhaps most important, it’s fast.
Part of more’s speed comes from an internal cache coupled to an indexing structure that it builds on the fly as it reads the input. When you move forward or backward within the cache, screen redraw rates are the limiting factor in performance. Outside of range of the cache, if the input is from a disk file, the indexing structure, technically an ISAM, tells more how to seek to the new location.
The cache is sized dynamically depending on whether the input is from a file or a pipe. When reading a file, it uses a cache about 100K characters; otherwise, the cache is about 4M characters
touch lets you change the timestamps of individual files or directories or, using the -r (recursive) option, of everything in a whole directory tree.
If the desired timestamp isn’t given, touch uses the current time. If the filename doesn’t exist, it’s created as a zero-length file.
chmod lets you set a file’s attributes but leaves the timestamp alone. Here is an example, first setting the system bit (making it show up in green), then making it hidden:
Of course, the file is still there and you can continue to manipulate its attributes:
Many users will find that a file’s system bit is more useful than they’d thought before. With chmod, it’s easy to set or clear the bit and setting it doesn’t make the file hidden. Quite the contrary, ls makes it stands out in green. Also, a file marked “system” is a little safer from accidental deletion or overwriting. These are often convenient characteristics to attach a few specific files within a large directory. For example, the author tends to routinely mark make files within a C source code directory as “system” just so they’ll stand out.
du, vol and pwd provide quick snapshots of your disk partitions: du tells how much of the partition is used; vol displays the label; and pwd shows the current directory on each partition.
A common convention observed by the utilities is that if one entry on a list is more current or special than the others, it’s highlighted. du, vol and pwd each highlight the entry describing the current disk.
For the benefit of those who have lots of partitions, some of which they don’t want to bother listing all the time, du, vol and pwd look for a DRIVEMASK environmental variable which can be used to mask off just the drive you want. This is especially useful for excluding drives that take removable media; if they’re empty, they can waste a lot of time trying to read a diskette that’s not there.
The shell provides a builtin mechanism for keeping several directories “handy.” This mechanism is the directory stack, which always contains a list of fully-qualified directory pathnames with the current directory at the top. You can display the list with the dirs command:
Initially the list contains only your current directory. When you push a new directory on the stack with pushd, that becomes your new current disk and current directory. pushd also reports the resulting stack contents.
Calling pushd without any arguments just swaps the top two directories:
Popping elements off the stack is done with popd, which also reports the resulting stack.
The stack can also be rotated with rotd. (We’ll push another directory first so we can see that rotation is upward, with the top item going to the bottom of the stack.)
You can pop multiple directory entries at once, but if you ask to pop more than exist, you’ll get a message:
fgrep and grep are fast string search utilities. Their names (derived from “general regular expression pattern-matching”) and the regular expression syntax they implement are traditional; it’s an accepted standard and we’ve followed it.
fgrep and grep are used to scan through long lists of files or filter data coming through a pipe for strings or patterns you specify. They’ll quickly report all the matching lines. If you like, you can get more or less detail in the output, e.g., have line numbers shown or just get a total count of all the matches.
fgrep and grep both have the ability to look for a large number of patterns in parallel (using the -s or -f options) with almost no discernible performance degradation. They’re very fast. Both precompile and optimize their search patterns, use direct kernel API calls for all i/o and use a very high performance buffering structure to allow extremely fast scanning of large amounts of data.
fgrep is the simpler and slightly faster of the two search utilities. It does a simple string compare between the string you’re looking for and the characters on each line. If the search string is found anywhere on the line, it’s a match. There are some options for ignoring differences in upper/lower-case or in the amount of white space (spaces and tabs) between words but mostly it’s quite simple comparison.
Here’s an example of using fgrep to search a very simple personal phone directory where each record is just a line of text and we’ll search it . (Later we’ll learn how to package things like this up into aliases or shell procedures so you can call them with just a few keystrokes.)
grep looks for special patterns called regular expressions, which are similar to (but slightly different from) filename wildcarding. The grammar is recursive, meaning a regular expression to be matched can be written, in turn, as a nested series of regular expressions in decreasing precedence:
c | Any ordinary character matches itself. |
\c |
Match the literal character c. Certain characters are treated specially: |
\a |
Audible Alert (Bell) |
\b | Backspace |
\f | Form Feed |
\n | NewLine |
\r | Carriage Return |
\t | Tab |
\v | Vertical Tab |
\\ | Single BackSlash |
\x | The next one or two digits are treated as hex digits specifying the character code. |
^ | Beginning of line. |
$ | End of line. |
. | Match any single character. |
[...] | Match any single character in the list. |
[^...] | Match any single character not in the list. |
\n |
Match whatever literal text the n’th tagged \(...\) expression matched. |
r* | Match zero or more occurrences of r. |
r\{n\} |
Match exactly n occurrences of r, where n is an unsigned decimal integer. |
r\{n,\} |
Match at least n occurrences of r. |
r\{n,m\} |
Match at least n, but not more than m occurrences of r. |
r\{,m\} |
Match at most m occurrences of r. |
r1r2 |
Match expression r1 followed by r2. |
\(r\) |
Tagged regular expression. Match the pattern inside the |
At the lowest layer, you give a character or set of characters to be matched anchored, if you want, to match just the beginning or just the end of a line. At the next layer, the * character lets you match a variable number of repetitions of a pattern.
When you type a regular expression on the command line, keep in mind: (1) Many of the characters have special meaning to the C shell and have to be inside quotes. (2) You have to type ^^ to get just one circumflex because ^ is the shell’s default escape character. (3) * is a postfix operator. It operates on the preceding regular expression; by itself, it is not a “match zero or more characters” wildcard character as you may be used to with filenames.
Here’s an example of searching through all the source code for a large application, looking for all occurrences of lines that begin with statement followed by a y somewhere on the line and showing the line numbers of any matches. (The -s option tells pushd and popd to work silently.)
sed is a stream editor. Just as you might think of using a regular editor to edit a file, deleting or inserting lines, doing search/replace operations, etc., sed lets you edit a stream of data: individual lines are read from stdin, edited according to the script you give and written to stdout. A very simple sort of script might be given right on the command line. Here’s a simple search/replace:
sed uses the same regular expressions used by grep. It’s possible to pick up pieces of the input as tagged expressions and move them around. In this example, the two strings on either side of the space are tagged, then swapped around. Quotes are used around the search/replace command so the C shell will treat it as one long literal string to be passed to sed. Parentheses, spaces and asterisks otherwise have special meaning. Notice how the * construct, meaning match zero or more occurrences actually matches as many repetitions as possible.
For more complex operations, sed offers a wide array of operators including even conditional branches and a hold buffer where a string can be saved temporarily from one line to the next. If your script is very long, the -f option lets you specify it in a file.
diff is an extremely fast and flexible utility for quickly comparing ASCII files, looking for differences. In the simplest form, you simply give it two filenames corresponding to the old and new versions and let it go to work, reporting sections that have been deleted or added in a traditional format. For example, a software developer might use it to compare old and new versions of a C program:
Each change is reported in terms of the line number or range in the old version, whether it’s an addition, change or deletion, the line numbers in the new version and then the affected lines from each file, separated by a line of ---. diff supports the traditional options for ignoring differences in upper/lower-case or in the amount of white space on the line, for recursively comparing entire directory trees of files, etc.
One of diff’s most novel features is its ability with the -! option to generate a merged listing where text that’s deleted is shown in red, new text is shown in green and the rest is displayed normally. This makes it extremely easy to view your changes in context. (To use this option, remember that ! is a special character to the shell; type it at the end of the option list so there’ll be a space following.)
head and tail are used to display just the first or last few lines or characters of a file. Normally, they expand any tabs into spaces so you don’t need to filter them through more.
tail is particularly interesting. If all you want is the end of a very large file, tail doesn’t waste time reading the whole file from start to finish. Instead, it jumps right to the end and reads it backwards! If the file is truly large (on the order of several megabytes) and all you want is a little bit off the end, this is the difference between chugging along for several seconds versus getting an almost instantaneous response.
tail also has a -f follow option. What that means is that when it gets to the end of file, it enters an endless loop, sleeping for a second, then waking up to see if more has been added. This is particularly useful if, e.g., you have an operation, say a large make, active in one window with its output redirected to a file. From another window you can periodically check in on the progress by typing:
tail lets you watch lines get added without consuming much processor resource (since it sleeps in the kernel most of the time) so you can watch a background activity progress without affecting its performance. After you’ve watched for a while, just type Ctrl-C to interrupt and get out. The interrupt only goes to the tail program; the application off in the background or in another window creating the file is not affected and will go on about its business until you come back once again to check on it.
cut is a simple filter for selecting out just certain fields or character positions of each line of input. You choose what characters should be interpreted as the field delimiters and which fields should be copied to the output. For example, if you kept your phone book in phone.txt, you might strip off just the first word from each line to get everyone’s first names:
The -f option means you want to count by fields, selecting the first field and that the delimiter is a space character. (Notice the quotes around the space.)
split lets you break up a large file into smaller, fixed-size pieces counting either by lines or by characters. Each of the smaller files it creates are numbered, e.g., chunk.001, chunk.002, chunk.003, etc.
An example of where you might use split would be if you wanted to copy a file to CD, but it was too big. You could use split to break it up into 600 MB chunks.
tabs lets you expand or unexpand tab characters based on a set of tab settings you give it. Tab settings are religious. I like them every 3 spaces but you probably like something else.
tr is a another simple filter for translating characters from input to output. For example, you could translate everything from lower to upper case by typing:
We typed the first hello world and tr has just echoed it in upper case. ^Z is the end-of-file character defined by Windows. tr also has a number of options for squeezing out repeated sequences of the same character or editing out just certain characters and even for normalizing the text in a file, ensuring that every line ends with a carriage return/line feed combination. That’s handy if you’re importing a file from another operating system.
strings lets you simply list out all the ASCII strings in an otherwise binary file. A really simple use would be extract strings from a spreadsheet file. Various options are available to trimming the output so only strings of a minimum length, etc. In this example, we use the -a option to find all strings, even those that aren’t null-terminated:
strings can also let you extract Unicode strings:
Other utilities provide means for sleeping for a timed period, counting the number of words in a file and so on. Part of the appeal of Hamilton C shell is that it’s relatively easy to continue expanding the vocabulary with simple utilities that may each be only a few hundred lines long.
This has been a fast introduction. Fortunately, you don’t have to learn the utilities just from the book. All have on-line information available with -h. We encourage you to experiment.
We’ve been at this for over 20 years and we’re still giving thought to additional utilities. If you have favorites you’d like to see included or maybe offered as new products, please contact us.
Builtin utilities
External utilities
Wildcarding
Quoting