shell tips

Here's a tutorial: Advanced Bash-Scripting Guide.

Quick Tips

The history expansion character is “!”. To search the history for the most recent command containing “scp” and print it without running it, use the first line below. To find that command interactively instead, press <Ctrl>+r and type scp (shown as ^rscp on the second line).

$ !?scp?:p
$ ^rscp

bash expansion

$ cp file{,.bk}

expands to

$ cp file file.bk
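
A handy way to check what a brace expansion will produce is to put echo in front of the command. This is only a small sketch, assuming bash (brace expansion is not part of POSIX sh); the file names are placeholders.

```shell
#!/usr/bin/env bash
# Preview brace expansions with echo before running the real command.
# The file names are placeholders for illustration.
echo cp file{,.bk}          # prints: cp file file.bk
echo mv report.{txt,md}     # prints: mv report.txt report.md
echo img{1..3}.png          # prints: img1.png img2.png img3.png
```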

Rename all files that end with .JPG to .jpeg (quoting "$file" keeps names with spaces intact)

for file in *.JPG; do mv -- "$file" "${file%.JPG}.jpeg"; done
for file in *.JPG; do mv -- "$file" "${file/%JPG/jpeg}"; done
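
The loop can be wrapped in a function and tried on a scratch directory first. This is a sketch under assumptions: the function name rename_jpg_to_jpeg and the sample file names are made up for the example.

```shell
# Rename every *.JPG in the given directory to *.jpeg.
# Quoting "$file" keeps names with spaces intact; "--" guards against
# names that start with a dash.
rename_jpg_to_jpeg() {
    dir=${1:-.}
    for file in "$dir"/*.JPG; do
        [ -e "$file" ] || continue   # the glob matched nothing
        mv -- "$file" "${file%.JPG}.jpeg"
    done
}

# Try it on a scratch directory (hypothetical file names).
scratch=$(mktemp -d)
touch "$scratch/holiday photo.JPG" "$scratch/JPG_scan.JPG"
rename_jpg_to_jpeg "$scratch"
ls "$scratch"
```

The ${file%.JPG} form only strips the suffix, so a name like JPG_scan.JPG keeps its leading JPG. An unanchored ${file/JPG/jpeg} would replace the first JPG it finds anywhere in the name.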

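Then there are two different “rename” commands, depending on the distribution. The util-linux version takes a literal from/to pair; the Perl version takes a substitution expression:

rename .JPG .jpg *.JPG
rename 's/\.JPG$/.jpg/' *.JPG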

Command Template

Here's a template for shell commands that demonstrates checking the number of arguments, the length of an argument, etc. It could still stand a bit of clean-up according to the Google Shell Style Guide.

Another good resource is Better Bash Scripting in 15 minutes.

#!/usr/bin/env bash
set -eu -o pipefail # See: https://sipb.mit.edu/doc/safe-shell/
 
declare -r SCRIPT_NAME=$(basename "$BASH_SOURCE")
 
## exit the shell (default status code: 1) after printing the message to stderr
die() {
    echo >&2 "$1"
    exit ${2-1}
}
 
## the options used by this script
DISK=e
declare -i VERBOSE=0
 
## exit the shell (with status 2) after printing the message
usage() {
    echo "\
$SCRIPT_NAME -hv [Drive Letter] (default: $DISK)
    -h      Print this help text
    -v      Enable verbose output
"
    exit 2;
}
 
## Process the options
while getopts "hv" OPTION
do
  case $OPTION in
    h) usage;;
    v) VERBOSE=1;;
    \?) usage;;
  esac
done
 
## Process the arguments
shift $(($OPTIND - 1))
 
if [ $# -eq 0 ]; then
    : # Let the default be used
elif [ $# -eq 1 ]; then
    if [ ${#1} -eq 1 ]; then
        DISK=$1
    else
        # 64 is EX_USAGE from sysexits.h
        die "$SCRIPT_NAME: Drive Letter can only be one character long." 64
    fi
else
    usage;
fi
 
## Lock this if only one instance can run at a time
# UNIQUE_BASE=${TMPDIR:-/tmp}/"$SCRIPT_NAME".$$
LOCK_FILE=${TMPDIR:-/tmp}/"$SCRIPT_NAME"_"$DISK".lock
if [ -f "$LOCK_FILE" ]; then
  die "$SCRIPT_NAME is already running. ($LOCK_FILE was found.)"
fi
trap 'rm -f "$LOCK_FILE"' EXIT
touch "$LOCK_FILE"
 
## The main work of this script
 
if [ ! -d /cygdrive/"$DISK"/backup/Users ]; then
    mkdir -p /cygdrive/"$DISK"/backup/Users
fi
 
((VERBOSE==1)) && echo "Starting at $(date)"
rsync /cygdrive/c/Users/me /cygdrive/"$DISK"/backup/Users
 
# We add "|| true" because we don't want to stop 
# if the directory was already empty
rm -r /cygdrive/c/Users/me/tmp/* || true
 
# Note how we find the number of cores to use
make -C build_subdirectory all -j$(grep -c ^processor /proc/cpuinfo)
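
The lock section of the template checks for the file and then creates it, which leaves a small window where two instances can both pass the check. One possible alternative, sketched here with made-up function names (acquire_lock, release_lock), is to use mkdir, which either creates the directory or fails in a single step.

```shell
# Acquire a lock atomically: mkdir either creates the directory or fails,
# so there is no gap between "check" and "create".
acquire_lock() { mkdir "$1" 2>/dev/null; }
release_lock() { rmdir "$1"; }

# Demonstration in a scratch location; a real script would use one fixed
# path (like the template's LOCK_FILE) so that all instances agree on it.
lock_dir=$(mktemp -d)/example.lock
acquire_lock "$lock_dir" && echo "first attempt: acquired"
acquire_lock "$lock_dir" || echo "second attempt: already locked"
release_lock "$lock_dir"
```

In a real script, release_lock would go into the EXIT trap, just like the rm -f in the template.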

Miscellaneous Shell Tips

If you want a single column of just the file and path names, you can get it like so:

ls --format=single-column

But if you don't know what you're doing, you might construct something like so:

ls -Al | tr -s ' ' | cut -d ' ' -f10-

List “almost all” items in “long” format (one line per item)
Squeeze repeats of the space character
Cut out everything from before the 10th column and show everything afterwards.

Of course, if you could assert the following:

none of the first columns were repeats (awk would only identify the first repeated column)
the desired column didn't have delimiters in it (filenames with spaces)

…you could use awk

... | awk '{print $10}'

Anyway, given a list of directories, they can be inserted into a cp command with xargs if you need.

cat list_of_directories_at_one_level.txt | xargs -I {} cp -r $SOURCEDIRPREFIX:{} $DEST

Useful bash command for finding strings within python files…

find . -name \*.py -type f -print0 | xargs -0 grep -nI "timeit"
find . -type f \( -name \*.[ch]pp -or -name \*.[ch] \) -print0 | xargs -0 grep -nI printf

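Interesting way to use grep -v to remove paths from a list generated by find. The escaped | character (\|) is the alternation operator in GNU grep's basic regular expressions, and also in the default -regex syntax of GNU find.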

#!/bin/bash
find "$PWD" -regex ".*\.[hcHC]\(pp\|xx\)?" | \
    grep -v " \|unwantedpath/unwantedpath2\|unwantedpath3" > cscope.files
cscope -q -b
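
To see what the escaped | does, compare the basic and extended forms of the same filter. This is a sketch assuming GNU grep (\| in basic syntax is a GNU extension); the function names and paths are invented for the example.

```shell
# In GNU grep's basic syntax, \| means "or"; with -E the bare | does.
filter_basic()    { grep -v 'unwantedpath/\|unwantedpath3'; }
filter_extended() { grep -Ev 'unwantedpath/|unwantedpath3'; }

# Both filters drop the two unwanted paths and keep src/a.c and src/c.h.
printf '%s\n' src/a.c unwantedpath/b.c src/c.h unwantedpath3/d.c | filter_basic
printf '%s\n' src/a.c unwantedpath/b.c src/c.h unwantedpath3/d.c | filter_extended
```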

Here's how to find if a symbol is in a library, and how to search lots of object files and print the filename above the search…

nm obj-directory/libmyobject.a | c++filt | grep Initialize_my_obj
find bindirectory/ -name \*.a -exec nm /dev/null {} \; 2>/dev/null | \
    c++filt | grep -P "(^bindirectory.*\.a|T Initialize_my_obj)"

Also handy to merge two streams together…

( cat file1 && cat file2 ) | sort

When a little quick math is needed, use bc

$ bc <<< "obase=16;ibase=10;15"
F
$ bc -l <<< 1/3
.33333333333333333333
$ bc <<< "scale=2; 1/3"
.33
$ bc <<< "obase=10;ibase=16;B"
11

and, when converting from hex to dec…

echo $((0x2dec))

But, then again, does that really seem easier than,

python3 -c "print(int('B', 16))"
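
When neither bc nor Python is at hand, the shell's own printf can convert in both directions. A small sketch; the helper names hex_to_dec and dec_to_hex are made up here.

```shell
# Convert between hexadecimal and decimal with the shell's printf.
hex_to_dec() { printf '%d\n' "0x$1"; }
dec_to_hex() { printf '%x\n' "$1"; }

hex_to_dec 2dec    # prints 11756
hex_to_dec B       # prints 11
dec_to_hex 255     # prints ff
```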

There's a bash way to calculate how many days ago a date was:

$ echo $(( ($(date +%s) - $(date -d "2012-4-16" +%s)) / 86400 ))

And a Python way…

python3 -c "import datetime; print((datetime.date.today() - datetime.date(2012, 4, 16)).days)"
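
The bash calculation can also be wrapped in a function that takes both dates, which makes it easier to check against a known answer. This sketch assumes GNU date (the -d option; BSD/macOS date uses different flags), and days_between is a name invented for the example.

```shell
# Whole days between two dates, using GNU date's -d option.
# -u keeps both timestamps in UTC so daylight-saving shifts don't matter.
days_between() {
    start=$(date -u -d "$1" +%s)
    end=$(date -u -d "$2" +%s)
    echo $(( (end - start) / 86400 ))
}

days_between 2012-04-16 2012-04-26          # prints 10
days_between 2012-04-16 "$(date -u +%F)"    # days since that date
```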

And for displaying lines clipped at the right edge of the window instead of wrapped:

cat_one_line_per_row() {
  cat "$@" | expand | cut -b1-$COLUMNS
}

or a “clip” command like so:

alias clip="expand | cut -b1-\$COLUMNS"
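
One thing to watch: COLUMNS is normally set only in interactive shells, so inside a script it may be empty. A variation that accepts an explicit width and falls back to 80 could look like this (clip_to is a hypothetical name, and the default of 80 is an assumption).

```shell
# Clip each input line to a given width after expanding tabs.
# Falls back to $COLUMNS, then to 80, when no width is passed.
clip_to() {
    width=${1:-${COLUMNS:-80}}
    expand | cut -b1-"$width"
}

printf 'abcdefghij\n' | clip_to 4    # prints abcd
printf 'a\tb\n' | clip_to 9          # tab is expanded, so "b" lands in column 9
```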

ctags's man page says that one of its bugs is that it has too many options. Ain't that the truth. Make note of the obscure flag here, --c++-kinds=+p, which tells ctags to process prototypes and method declarations.

ctags -n --if0=yes --c++-kinds=+p --langmap=c++:+.inl.lst \
    --langmap=asm:+.inc --file-tags=yes -R --extra=fq \
    --exclude=unwanted_file.lst \
    --exclude='*unwanted-directory*/*' \
    --regex-C++='/^.*CINIT.(.+),.*,.*,.*/CURLOPT_\1/'

When you want to repeat a command a few times…

seq 1 50 | xargs -I{} echo '{} Hello World!'

When you've set up Perforce to use an application for diff with export P4DIFF='vim -d' , you can still do a regular diff like so:

$ P4DIFF= p4 diff hello-world.cpp

It's hard to be sure which Perforce changelist you sync'ed if you didn't explicitly sync to a changelist.

So, use p4_sync to sync to a specific changelist, and update a source file too.

p4_sync() {
    p4 changes -s submitted -m1 ... | tee p4_sync_to_change.txt
    changelist=$(cut -d " " -f 2 p4_sync_to_change.txt)
    changelist_filename=changelist.h
    p4 sync "...@$changelist"
    if [ -w "$changelist_filename" ]
    then
        sed -i 's/"[0-9]\+";/"'"$changelist"'";/' "$changelist_filename"
    fi
}

Note the use of “$@” vs “$*” in the next function, which automatically saves an archive of a telnet session: “$@” passes each argument to telnet unchanged, while “$*” joins them into one string for the file name. Also note that I remove spaces and colons (colons because they interfere with opening files directly at line numbers).

telnet_log() {
    curtime=$(date -Iseconds | tr : .)
    args=$(echo "$*" | tr ' ' '_')
    telnet "$@" | tee "$HOME/telnetlog/${args}_${curtime::-5}.log"
}
 
last_telnet_log() {
    ls -d1t $HOME/telnetlog/* | head -n 1
}
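
The difference between the two is easiest to see by counting arguments. A tiny sketch with invented helper names (count_args, show_difference):

```shell
# Print how many arguments were received.
count_args() { echo "$#"; }

show_difference() {
    count_args "$@"   # each original argument stays a separate word
    count_args "$*"   # all arguments joined into a single word
    count_args $*     # unquoted: re-split on whitespace
}

show_difference "my host" 23    # prints 2, then 1, then 3
```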

Of course if you do that, you'll want to occasionally (via cronjob?) delete old archives.

find $HOME/telnetlog/ -type f -mtime +6 -delete

Keywords: bash shell sh zsh

.vimrc tips

Here's an alternative way to automatically save backups (with dates in the filename) every time you save a file.

set backup
set backupdir=~/.vim/backup/
au BufWritePre * let &bex = '-' . strftime( "%Y%m%d-%H%M%S" )

That makes a lot of files, so you can clean out the backups with a cron job like this:

# at 3 in the morning on Mondays, delete files older than 30 days
0 3 * * 1 find $HOME/.vim/backup/ -type f -mtime +30 -delete

expect tips

What to do when you're not sure you're going to make a connection?

set times 0
set made_connection 0
set timeout 120
while { $times < 2 && $made_connection == 0 } {
    spawn nc $SERVER
    send "\r"
    expect {
        "login:" {
            send "john.doe\r"
            set made_connection 1
        } eof {
            sleep 1
            incr times
        } timeout {
            puts "Didn't expect to timeout."
            exit
        }
    }
}

I think the following is wrong-headed. It's not usually the case that spawn will fail.

set times 0;
while { $times < 2 && $made_connection == 0 } {
    if { [ catch { spawn nc $SERVER } pid ] } {
        set times [ expr $times + 1 ];
        sleep 1;
    } else {
        set made_connection 1
    }
}

Perl tips

The module Search::Dict has a “look” function that can be used to do a binary search in an ordered dictionary file (a logfile (or log file) that starts with timestamps works). File::SortedSeek might also be recommended.

Application Memory Usage

Use VM Resident Set Size. See VmRSS below. (Note the difference between RSS and VmRSS. If one process has memory mapped, it's not usable by any other process.)

host:# ps -ef | grep etflix
default   1532  1081  6 22:06 ?        00:01:21 pkg_/metflix 
root      2108  1046  0 22:26 ?        00:00:00 grep etflix
host:# pidof netflix
1532
host:# cat /proc/1532/status
Name:   MAIN
...
Groups:
VmPeak:   220776 kB
VmSize:   210096 kB
VmLck:         0 kB
VmHWM:     95168 kB
VmRSS:     74488 kB
...

Or, while running an application, to see how much is free over time, do this from another shell:

while [ 1 ]
do
    free -m | grep Mem
    sleep 3
done

Alternatively, to see the RSS use of that process alone:

while true; do sync; cat /proc/$(pidof yourprocess)/status | grep VmRSS; sleep 1; done

Measuring Available Memory

This note doesn't entirely make sense to me. Maybe need to study up on “cat /proc/meminfo” vs. “cat /proc/vmstat” vs. “vmstat”.

The best measure I've found for “available memory” is nr_inactive_file + nr_active_file + nr_free_pages from /proc/vmstat (all counted in pages). From that you have to subtract a heuristically determined value for the base system working set, which can be 30-40MB.

The command free just isn't a great indicator in general of how much memory is available because it doesn't account for the cached file-backed pages that could be dumped to make more memory available.
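
That sum can be scripted with awk. This is only a sketch: the function name estimate_available_kb is made up, the counter names (nr_free_pages, nr_inactive_file, nr_active_file) are what I'd expect a recent Linux kernel to expose in /proc/vmstat, and it does not subtract the working-set heuristic.

```shell
# Estimate free plus file-backed memory in kB from vmstat-style counters.
# $1: file in /proc/vmstat format (defaults to /proc/vmstat)
# $2: page size in kB (defaults to the system page size)
estimate_available_kb() {
    vmstat_file=${1:-/proc/vmstat}
    page_kb=${2:-$(( $(getconf PAGESIZE) / 1024 ))}
    awk -v page_kb="$page_kb" '
        $1 == "nr_free_pages" || $1 == "nr_inactive_file" || $1 == "nr_active_file" {
            pages += $2
        }
        END { print pages * page_kb }
    ' "$vmstat_file"
}

# Demonstration with sample counters (invented values) and 4 kB pages.
sample=$(mktemp)
cat > "$sample" <<'EOF'
nr_free_pages 1000
nr_inactive_anon 999
nr_inactive_file 200
nr_active_file 300
EOF
estimate_available_kb "$sample" 4    # (1000 + 200 + 300) pages * 4 kB = 6000
rm -f "$sample"
```

On a live Linux system, calling estimate_available_kb with no arguments reads /proc/vmstat directly.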

Shared Memory Usage

To increase the limit to 256MB from the command line (shmmax is in bytes; shmall is in pages, so with 4 KiB pages 256MB is 65536 pages):

echo "268435456" > /proc/sys/kernel/shmmax
echo "65536" > /proc/sys/kernel/shmall

Or, edit /etc/sysctl.conf:

kernel.shmmax = 268435456
kernel.shmall = 65536
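
Since kernel.shmall is counted in pages rather than bytes, the value depends on the system's page size. A small helper can do the conversion; bytes_to_pages is a name made up for this sketch, and it rounds up to a whole page.

```shell
# kernel.shmmax is a size in bytes; kernel.shmall is a count of pages.
# Convert a byte size to pages for a given page size (default: the system's).
bytes_to_pages() {
    bytes=$1
    page_size=${2:-$(getconf PAGE_SIZE)}
    echo $(( (bytes + page_size - 1) / page_size ))
}

bytes_to_pages 268435456 4096    # prints 65536
bytes_to_pages 268435456         # uses this machine's page size
```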

Performance Metrics

Use perf-timechart
gperftools

And you can scrape logs that start with timecodes to create Spreadsheet charts. Given logs like:

2016-10-13 19:54:44  memory 22a4

On a Macintosh:

grep memory devicelogs.txt | tr -s ' ' | cut -d " " -f 1,2,4 | \
sed 's/\([0-9\-]\+\) \([0-9:]\+\).[0-9]\+ \([0-9a-f]\+\)/\1,\2,=DATEVALUE("\1")+TIMEVALUE("\2"),=HEX2DEC("\3")/' >  heapinfo.csv; \
open heapinfo.csv -a "Microsoft Excel"

And on Linux, instead of opening Microsoft Excel, that last line would be:

libreoffice --calc heapinfo.csv

Cron

Keep tasks serialized with flock(1):

  (
       flock -n 9 || exit 1
       # ... commands executed under lock ...
  ) 9>/var/lock/mylockfile
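
To convince yourself that the lock really excludes a second instance, try to take it again while the subshell holds it. This sketch assumes the flock utility from util-linux is installed; lock_is_free is a helper invented for the demonstration, and the lock file is a temporary one.

```shell
# Returns 0 if the lock on file $1 can be taken right now, 1 if it is held.
lock_is_free() { flock -n "$1" true; }

lockfile=$(mktemp)
(
    flock -n 9 || exit 1
    # Inside the locked section a second non-blocking attempt must fail.
    if lock_is_free "$lockfile"; then echo "free"; else echo "busy"; fi
) 9>"$lockfile"
# Once the subshell exits, descriptor 9 is closed and the lock is released.
if lock_is_free "$lockfile"; then echo "free"; else echo "busy"; fi
rm -f "$lockfile"
```

This should print busy followed by free.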

Retrieving Symbols with addr2line

You can gather a backtrace (stacktrace) with this piped command to addr2line.

  $ cat << EOF | cut -d " " -f 3 | tr -d "[]" | \
    addr2line -e builds/austin/src/platform/gibbon/netflix | \
    xargs -d '\n' realpath --relative-to=.
  > 7/22 app() [0xf7878] (0xf7878)
  > 8/22 app() [0x39c2f8] (0x39c2f8)
  > 9/22 app() [0xe1964] (0xe1964)
  > EOF
  src/Application.h:106 (discriminator 3)
  src/platform/main.cpp:521
  src/Application.cpp:95

Sort by Frequency

I ran the following P4 command to find out who's been editing a file recently:

 $ find . -name fname.cpp | xargs p4 filelog -s -m 10 | \
   awk '/^\.\.\. #/ {print $9}' | cut -d @ -f 1 | sort | uniq -c | sort -nr
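
The tail of that pipeline (sort | uniq -c | sort -nr) is the general "sort by frequency" idiom and works on any list of lines. A sketch with invented user names standing in for the p4 output; frequency_table is a made-up name.

```shell
# Count how often each line occurs and list the most frequent first.
frequency_table() {
    sort | uniq -c | sort -nr
}

# Hypothetical user names standing in for the p4 filelog output.
printf '%s\n' alice bob alice carol alice bob | frequency_table
```

uniq -c only counts adjacent duplicates, which is why the first sort is needed.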

jq Tips

jq is really handy. Here's a tip for some processing I often do:

fruits.txt

{ "fruits":
  {
    "apple":
       {
          "name": "Apple",
          "price" : 2
       },
    "banana":
       {
          "name": "Banana",
          "price" : 3
       },
    "count": 2,
    "open": true
  }
}

$ jq '.fruits|del(.count,.open)|with_entries(.value |= .price)' fruits.txt
{
  "apple": 2,
  "banana": 3
}

# with_entries(f) is an alias for to_entries | map(f) | from_entries

jq '.fruits|del(.count,.open)|to_entries|map(.value |= .price)|from_entries' f
{
  "apple": 2,
  "banana": 3
}

$ jq '.fruits|del(.count,.open)|[to_entries[]|{(.key): .value.price}]|add' fruits.txt
{
  "apple": 2,
  "banana": 3
}

fruit_ip.txt

{
  "192.168.144.52": {
    "ipAddress": "192.168.144.52",
    "attributes": {
      "model": "apple",
      "name": "David's apple"
    }
  },
  "192.168.144.40": {
    "ipAddress": "192.168.144.40",
    "attributes": {
      "model": "banana",
      "name": "David's banana"
    }
  }
}

$ jq '[to_entries[]|{"key":.value.attributes.name,"value":.key}]|from_entries' fruit_ip.txt
{
  "David's apple": "192.168.144.52",
  "David's banana": "192.168.144.40"
}

$ jq '[.[]|{(.attributes.name):.ipAddress}]|add' fruit_ip.txt
{
  "David's apple": "192.168.144.52",
  "David's banana": "192.168.144.40"
}

$ jq -r "to_entries|map(\"\(.value.attributes.name) = \(.key)\")|.[]" fruit_ip.txt
David's apple = 192.168.144.52
David's banana = 192.168.144.40

Protips for find

How to use "sh -c" without {} in find's -exec.

Given a lib directory, I wanted to find all the actual .so files that needed libz.

find lib -type f -name \*.so\* -exec sh -c 'objdump -p "$1" | grep "NEEDED.*libz"' - {} \; -print

Note that you can pass -print (or -and -print) after a -exec argument. Also, the “ - ” is just a placeholder for $0 (usually the command name, in this case “sh”), we want $1 to be {}. It outputs results like:

  NEEDED               libz.so.1
lib/libprotoc.so.13.0.2

Additional Keywords

Linux, Unix, *nix
