Wednesday, April 24, 2013

Some notes on find and replace using Vim, rename, and perl on the command line


Using rename in bash script

As mentioned in the previous post [1], I needed to replace the dots in the filenames of a bunch of eps figures with other characters. The following script did the work [2]:

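#!/bin/sh
# Assumes the Perl-based rename, which takes a Perl substitution expression.
# Reading paths line by line keeps filenames with spaces intact.
find . -name "*.eps" -type f | while IFS= read -r f
do
    echo "found: $f"
    # option -n is useful to preview the renamed results:
    rename -v 's/(\d+)\.(\d+)\.(\d+)/$1-$2-$3/' "$f"
done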

From this I learned how to keep some parts of the old string while replacing others. The key is to wrap each part to keep in parentheses, forming a capture group, and then to refer to the nth group as $n in the replacement expression. Taking "91.1.8.eps" as the example:
  • \d matches a single digit
  • \d+ matches one or more digits
  • each (\d+) captures one run of digits: 91, 1, and 8 respectively
  • $1 corresponds to the first group, which is 91
  • $2 corresponds to the second group, which is 1
  • $3 corresponds to the third group, which is 8
Therefore, the dots between the digits are replaced by dashes, giving 91-1-8.eps. The dot before the eps extension is not part of the match and stays as it is.
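
To check the expression before renaming anything, the same substitution can be applied to a plain string. This is only a small sketch: it pipes the example filename through perl instead of calling rename, and the variable names old and new are made up for the illustration.

```shell
#!/bin/sh
# Apply the substitution to a sample filename without touching any file.
old="91.1.8.eps"
new=$(printf '%s\n' "$old" | perl -pe 's/(\d+)\.(\d+)\.(\d+)/$1-$2-$3/')
echo "$old -> $new"   # prints: 91.1.8.eps -> 91-1-8.eps
```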

Substitution in Vim

After renaming all the eps files, I had another, more complicated problem: all the corresponding filename strings in the tex files also had to be changed! At first I opened one of the tex files in Vim and played with the substitution command in it. The final command I used was [3][4]:

:%s/\(\d*\)\.\(\d*\)\.\(\d*\)/\1-\2-\3/gc

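Note that the expression is written slightly differently here. In Vim's default ("magic") mode the grouping parentheses must be escaped with backslashes, as in basic regular expressions (BRE), whereas Perl, like extended regular expressions (ERE), uses them unescaped [5]. The groups are also referenced as \1, \2, and \3 in the replacement instead of $1, $2, and $3. Be aware that \d* matches zero or more digits; the closer Vim equivalent of Perl's \d+ is \d\+. The g flag replaces every match on a line, and the c flag asks for confirmation before each replacement.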
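
The escaping difference between BRE and ERE can be seen outside Vim as well. The sketch below uses sed purely as an illustration (it is not what I used for the actual work), with a made-up path in the variable name. It assumes a sed that supports -E, which both GNU and BSD sed do.

```shell
#!/bin/sh
name="fig/91.1.8.eps"

# BRE: grouping parentheses are escaped, and "one or more" is spelled out.
bre=$(printf '%s\n' "$name" | sed 's/\([0-9][0-9]*\)\.\([0-9][0-9]*\)\.\([0-9][0-9]*\)/\1-\2-\3/')

# ERE (-E): parentheses group without backslashes, and + is available.
ere=$(printf '%s\n' "$name" | sed -E 's/([0-9]+)\.([0-9]+)\.([0-9]+)/\1-\2-\3/')

echo "$bre"
echo "$ere"
```

Both commands print fig/91-1-8.eps; only the way the pattern is written differs.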

Find and Replace in multiple files


Although I could do the find and replace in Vim, it was not a good idea when there might be hundreds of such files. Writing a shell script was my first thought, and with information found on the internet [6][7][8] I got a usable script as follows:

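#!/bin/sh
# Reading paths line by line keeps filenames with spaces intact.
find . -name "*.tex" -type f | while IFS= read -r f
do
    echo "found: $f"
    # -p loops over the lines of the file, -i edits the file in place
    perl -p -i -e 's/(\d+)\.(\d+)\.(\d+)/$1-$2-$3/' "$f"
done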

---
[1] XeTeX -- using dots in eps filenames would cause errors
[2] batch renaming with the rename command

[3] Vim Regular Expressions 101: Grouping and Backreferences
[4] Search/Replace in Vim

[5] Basic Regular Expressions and Extended Regular Expressions
[6] Easy Search and Replace in Multiple Files on Linux Command Line
[7] bash find directories
[8] Find file or directory in whole directory structure


