Wednesday, April 24, 2013

Some notes on find and replace using Vim, rename, and perl on the command line...


Using rename in bash script

As mentioned in the previous post [1], I needed to replace the dots in the filenames of a bunch of eps figures with other tokens. The following script did the work [2]:

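#!/bin/sh
# Rename every eps file below the current directory: N.N.N becomes N-N-N.
# Reading the names line by line keeps paths with spaces intact.
find . -name "*.eps" -type f | while IFS= read -r f
do
    echo "found: $f"
    # option -n is useful to preview the renamed results:
    rename -v 's/(\d+)\.(\d+)\.(\d+)/$1-$2-$3/' "$f"
done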

In this case, I learned how to keep some parts of the old string while replacing other parts. The key concept is to use parentheses to group the parts we want to keep, and then to use $n to refer to the nth group in the substitution expression. Using "91.1.8.eps" as the example:
  • \d matches a single digit
  • \d+ means at least ONE digit
  • each (\d+) captures one part of the name, which are respectively 91, 1, and 8 in this example
  • $1 corresponds to the first group, which is 91
  • $2 corresponds to the second group, which is 1
  • $3 corresponds to the third group, which is 8
Therefore, the dots between the digits will be replaced by dashes.

Substitution in Vim

After renaming all the eps files, I had another, more complicated problem. All the corresponding filename strings in the tex files also had to be changed! At first I opened one of the tex files in Vim and played with the substitution command there. The final command I used was [3][4]:

:%s/\(\d*\)\.\(\d*\)\.\(\d*\)/\1-\2-\3/gc

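Note that there are some minor differences in how the expressions are written. In Vim the grouping parentheses must be escaped with backslashes and the groups are referred to as \1, \2, \3 instead of $1, $2, $3. The escaping is similar to the difference between BRE and ERE [5]: Vim's default pattern syntax behaves like BRE in this respect, while the perl syntax is closer to ERE. Also note that \d* matches an empty string too, so \d\+ would be the closer equivalent of the \d+ used above. The g flag replaces every match in a line and the c flag asks for confirmation on each one.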
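
The BRE/ERE difference can also be seen outside Vim. The sketch below uses sed as a stand-in (it is not what I used on the tex files): plain sed takes BRE, and sed -E takes ERE. Since sed has no \d, [0-9] is used instead, and the sample path fig/91.1.8.eps is made up.

```shell
#!/bin/sh
sample='fig/91.1.8.eps'

# BRE (sed default): grouping parentheses and the interval braces
# need backslashes; \{1,\} means "one or more".
bre=$(printf '%s\n' "$sample" |
    sed 's/\([0-9]\{1,\}\)\.\([0-9]\{1,\}\)\.\([0-9]\{1,\}\)/\1-\2-\3/')

# ERE (sed -E): parentheses group without escaping and + works directly.
ere=$(printf '%s\n' "$sample" |
    sed -E 's/([0-9]+)\.([0-9]+)\.([0-9]+)/\1-\2-\3/')

echo "$bre"
echo "$ere"
```

Both commands should give the same result; only the amount of escaping differs.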

Find and Replace in multiple files


Although I could do the find and replace in Vim, that was not a good idea when there might be hundreds of such files. Writing a shell script was my first thought, and with information found on the internet [6][7][8] I got a usable script as follows:

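#!/bin/sh
# Edit every tex file below the current directory in place.
find . -name "*.tex" -type f | while IFS= read -r f
do
    echo "found: $f"
    # -p loops over the lines, -i edits in place, -e gives the expression;
    # the trailing g replaces every match in a line, like in the Vim command.
    perl -p -i -e 's/(\d+)\.(\d+)\.(\d+)/$1-$2-$3/g' "$f"
done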

---
[1] XeTeX -- using dots in eps filenames would cause errors
[2] batch renaming with the rename command

[3] Vim Regular Expressions 101: Grouping and Backreferences
[4] Search/Replace in Vim

[5] Basic Regular Expressions and Extended Regular Expressions
[6] Easy Search and Replace in Multiple Files on Linux Command Line
[7] bash find directories
[8] Find file or directory in whole directory structure



Monday, April 15, 2013

Ubuntu 12.04 doesn't print PDF file via network printer

My office PC is connected to two printers over the network (the intranet, I think...) and they had been working fine until recently. I have been using Ubuntu 12.04 for a while but have not printed files often. Several weeks ago I tried to print some documents, but the printers just gave me strange error messages and stopped working. Today I tried to print something again and had the same problem with the printers. This time I decided to make it work.

At the beginning I had only vague keywords and got no useful search results. I launched LibreOffice Writer and created a simple test file, which printed successfully, but after I saved it as PDF the printing failed. Then I noticed that the documents that had failed to print were also all PDF files. So the problem could be the file type.

I used PDF as one of the search keywords and found some Ubuntu bug reports. I followed some suggestions in one of the threads [1] and had a little success.

The approach of updating cups-filters from precise-proposed didn't work for me [2]. Actually I didn't see any updated packages after enabling the precise-proposed option.

The approach that worked was changing the printer settings via the command line [3]. I made one of the printers work with PDF documents by using the following settings:
$ lpadmin -p Hewlett-Packard-HP-LaserJet-P3005 -o pdftops-renderer-default=gs
$ lpadmin -p Hewlett-Packard-HP-LaserJet-P3005 -o pdftops-max-image-resolution-default=0
where Hewlett-Packard-HP-LaserJet-P3005 is the printer name. I deleted the second setting by using
$ lpadmin -p Hewlett-Packard-HP-LaserJet-P3005 -R pdftops-max-image-resolution-default
and the HP printer still worked fine when printing PDF documents.

The other Xerox printer, however, still didn't work after changing the settings.


---
[1] Printing on PostScript printers (or printers with PostScript-based driver) not working
[2] #20 of the above thread
[3] #17 of the above thread

Thursday, April 11, 2013

XeTeX -- using dots in eps filenames would cause errors

I had a set of tex files which included many eps figures and compiled successfully with the latex+dvips+ps2pdf commands. But due to some Unicode issues I switched to XeTeX several months ago [1]. A strange problem, however, emerged when I invoked xelatex to compile the same set of tex files. The error message was:
! Unable to load picture or PDF file './EPS_FILE_DIR/91.1.8.eps'.
That was strange, because I remembered having compiled other tex files with eps figures flawlessly with xelatex. I dug out those files to make sure they could still be compiled on my machine. They could. So the problem might be caused by 91.1.8.eps itself. Suddenly it occurred to me that maybe the dots confused the xelatex command, so I changed the filename to 91_1_8.eps and that solved the problem.

Although the problem has been solved, I still have no idea why the dots could cause such a problem. I also tested with a 91.1.8.jpg file, and to my surprise it compiled without the error message.

I don't know whether other files (such as png, bmp, pdf, ...) also have similar problems, but I decided not to use dots to name my files anymore.

---
[1] XeTeX -- using the system fonts for CJK tex file

Monday, April 08, 2013

Some problems caused by Vistalizator

In the previous post I said that using Vistalizator to change the display language in Win7 is easy. It was easy indeed, but there were some problems and I have only solved one of them.