A diary of a journey that started with debian.org

Check for duplicates

How do you find files that have different names but exactly the same content?
You could install the excellent fdupes, or you could just reinvent the wheel with bash, md5sum and awk:
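find path/ -type f -print0 | xargs -0 md5sum | awk '{
	# md5sum prints "HASH  NAME": keep the whole name, even if it contains spaces
	hash = $1
	name = substr($0, length(hash) + 3)
	if (hash in cache)
		print "Found: " cache[hash], name
	else
		cache[hash] = name
}'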

Replace path/ with the directory you want to search for duplicates. You can limit the depth of the search with find’s -maxdepth option, placed right after the path (e.g. find path/ -maxdepth 1 -type f).
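An alternative that prints whole groups of identical files is to sort the md5sum output and let uniq compare only the first 32 characters of each line (the MD5 hash). The sketch below wraps this in a hypothetical find_dups function; it assumes GNU coreutils, whose uniq supports -w and --all-repeated, and passes an optional second argument to find as -maxdepth.

```shell
# find_dups DIR [MAXDEPTH]  (hypothetical helper name, requires bash)
# Prints groups of files with identical content, one blank-line-separated
# group per hash. Assumes GNU find, xargs and coreutils (uniq -w, --all-repeated).
find_dups() {
	local dir=$1 depth=${2:-}
	local -a opts=()
	if [ -n "$depth" ]; then
		opts=(-maxdepth "$depth")
	fi
	find "$dir" "${opts[@]}" -type f -print0 |
		xargs -0 -r md5sum |
		sort |
		uniq -w32 --all-repeated=separate
}
```

For example, find_dups path/ 1 only looks at the files directly inside path/.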

Reverse dependencies of a package

In Debian most packages depend on others, and most packages are in turn required by at least one other package. Every once in a while you may need to know why a given package is installed on your Debian machine. Here is how:

Method 1: apt-cache
$ apt-cache rdepends package
This shows all the packages that depend on package, whether they are installed or not.


Method 2: aptitude

If, like me, you rarely use aptitude (i.e. never), you should first update its package database:
# aptitude update
Then:
$ aptitude search '~i~Dpackage'
This command shows all the installed packages which depend on package.

Method 3: hand-made bash script

#!/bin/bash

# usage: ./irdeps.sh package [package ...]
# show the installed packages which depend on each given package

if [ $# -lt 1 ]; then
	echo "Usage: $0 package [package ...]" >&2
	exit 1
fi
while [ $# -gt 0 ]; do
	echo "reverse dependencies for $1..."
	while read -r pack; do
		# alternatives are listed as "|name": drop the leading pipe
		pack=${pack#\|}
		if grep -qxF "Package: $pack" /var/lib/dpkg/status; then
			# find the package stanza, then check its Status line once
			awk -v pack="$pack" '
				/^Package: / && $2 == pack { flag = 1; next }
				flag && /^Status: / {
					if ($4 == "installed")
						print pack
					exit
				}' /var/lib/dpkg/status
		fi
	done < <(apt-cache rdepends "$1" | grep '^[[:space:]]')
	shift
done
This script does the same job as aptitude in about the same time, but it relies only on apt-cache and the dpkg database (plus bash and awk, of course).
References: algebraicthunk.net/~dburrows.
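The script above runs grep and awk over /var/lib/dpkg/status once per reverse dependency. A possible single-pass variant, sketched here as a hypothetical installed_rdeps function, reads the status file once to collect the installed packages and then filters the apt-cache rdepends output, printing each name only once. It assumes the usual layout of the status file, where each stanza has a Package: line followed by a Status: line such as “Status: install ok installed”.

```shell
# installed_rdeps STATUS_FILE  (hypothetical helper name)
# Reads "apt-cache rdepends" output on stdin and prints, once each,
# the reverse dependencies that STATUS_FILE marks as installed.
installed_rdeps() {
	awk '
		# first input: the dpkg status database
		NR == FNR {
			if ($1 == "Package:")
				name = $2
			else if ($1 == "Status:" && $4 == "installed")
				ok[name] = 1
			next
		}
		# second input (stdin): indented lines are reverse dependencies
		/^[ \t]/ {
			pkg = $1
			sub(/^\|/, "", pkg)   # alternatives are prefixed with "|"
			if ((pkg in ok) && !(pkg in seen)) {
				seen[pkg] = 1
				print pkg
			}
		}
	' "$1" -
}

# usage: apt-cache rdepends libc6 | installed_rdeps /var/lib/dpkg/status
```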

MySQL and regular expressions

The typical SQL statement is something like this:

SELECT name
FROM accounts
WHERE surname='Steele'
However, sometimes you need to find all the rows whose field matches a particular pattern. How can you do that? Simple: with regular expressions.
SELECT name
FROM accounts
WHERE surname REGEXP pattern
Examples: Get all tuples in which the surname begins with “Sm”:
    SELECT name
    FROM accounts
    WHERE surname REGEXP '^Sm'
Get all tuples in which the surname ends with “ith”:
    SELECT name
    FROM accounts
    WHERE surname REGEXP 'ith$'
Get all tuples in which the surname contains “all” or “All”:
    SELECT name
    FROM accounts
    WHERE surname REGEXP '[Aa]ll'
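And so on… Keep in mind that, unless the column uses a binary or case-sensitive collation, REGEXP ignores case: '^Sm' also matches “smith”, and 'all' alone would already match “All”.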

How to grep two strings in a file

Using grep, we can check whether a file contains a given string:
$ cat file
foo
bar
baz
if grep -q bar file; then
    echo file contains bar;
fi
If we need to check whether file contains string1 and string2 on the same line, in a given order, we can still use grep:
if grep -q 'string1.*string2' file; then
    echo "file contains string1 and string2 (the former preceding the latter)";
fi
However, if the order doesn’t matter, we can use awk:
if awk '/string1/ && /string2/ {found=1; exit} END{exit !found}' file; then
    echo file contains string1 and string2 in the same line;
fi
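grep alone can also handle the order-independent, same-line case by spelling out both orders in a single extended regex. A minimal sketch, wrapped in a hypothetical same_line function; note that, unlike the awk version, it treats the two arguments as regexes and only finds matches that do not overlap on the line.

```shell
# same_line FILE RE1 RE2  (hypothetical helper name)
# Succeeds if some line of FILE matches both extended regexes,
# in either order (the two matches must not overlap).
same_line() {
	grep -qE "($2).*($3)|($3).*($2)" "$1"
}
```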
Now, if we want to check whether file contains both string1 and string2, not necessarily on the same line, we can use two greps:
if grep -q string1 file && grep -q string2 file; then
    echo file contains string1 and string2;
fi
but this way we read the same file twice, which is wasteful, especially if the file is large. To read it only once, we can use awk again, remembering separately whether each string has been seen and stopping as soon as both have:
if awk '/string1/{a=1} /string2/{b=1} a && b{exit} END{exit !(a && b)}' file; then
    echo file contains string1 and string2;
fi
If we want to do the same for every file in a given directory, we can simply put it in a for loop:
for file in "$dir"/*; do
    if awk '/string1/{a=1} /string2/{b=1} a && b{exit} END{exit !(a && b)}' "$file"; then
        echo "$file contains string1 and string2";
    fi
done
or we can slightly change the awk script so that it handles all the files in a single run, checking the previous file whenever a new one starts (FNR==1) and the last file in END:
awk 'FNR==1{if (fn != "" && a && b) print fn " contains string1 and string2"; fn=FILENAME; a=b=0}
/string1/{a=1} /string2/{b=1}
END{if (fn != "" && a && b) print fn " contains string1 and string2"}' "$dir"/*
Last case: what if we need to check recursively (i.e. every file in every subdirectory)? We can let find pass the files to the same awk script:
find "$dir" -type f -exec awk 'FNR==1{if (fn != "" && a && b) print fn " contains string1 and string2"; fn=FILENAME; a=b=0}
/string1/{a=1} /string2/{b=1}
END{if (fn != "" && a && b) print fn " contains string1 and string2"}' {} +
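If you only need the names of the matching files, two chained greps can also do the recursive job: the first lists the files containing string1, the second keeps only those that also contain string2. The sketch below assumes GNU grep (-r, -l, -Z) and GNU xargs (-0, -r), treats both strings as fixed strings (-F), and uses a hypothetical function name, files_with_both.

```shell
# files_with_both DIR STR1 STR2  (hypothetical helper name)
# Prints, recursively, the files under DIR containing both fixed strings.
# Assumes GNU grep and GNU xargs.
files_with_both() {
	# first pass: NUL-separated list of files containing STR1
	grep -rlZF -e "$2" -- "$1" |
		# second pass: keep only those that also contain STR2
		xargs -0 -r grep -lF -e "$3" --
}
```

Files without string1 are read only once; the others are read a second time, but only by the second grep.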

Lock the screen and save the environment

If you lock the screen with Ctrl+Alt+L (in GNOME), a screensaver starts and you’ll be asked for your password the next time you press a key. That’s handy when you take a break from a public PC, but from an environmental point of view it changes nothing: your machine consumes the same power whether you are sitting in front of it or not. You can improve things with a command that turns your monitor off:
$ sleep 10 && xset dpms force off
Then you have 10 seconds to lock the screen in the usual way. A friendlier way to accomplish the same task is to use a keybinding and a little script:
#!/bin/bash
gnome-screensaver-command -l
xset dpms force off
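Now just run gconf-editor, set /apps/metacity/keybinding_commands/command_1 to the path of the script and /apps/metacity/global_keybindings/run_command_1 to the key combination you want; if you are using Compiz, run ccsm and edit the corresponding values in General Options. No, you can’t use Ctrl+Alt+L, since it is already bound to the lock action, but you may choose Ctrl+Alt+K (which I happily use).

The above script is very simple, but it’s far from perfect: the monitor turns back on as soon as a key is pressed or the mouse is moved, and nothing in the script switches it off again. Here is something more complex: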
#!/bin/bash

# lock the screen and start screensaver
gnome-screensaver-command -l

# store the string returned by "gnome-screensaver-command -q"
# We have to do this because the string is in your language
# which is not predictable :S
activestring=$(gnome-screensaver-command -q)

if [ -z "$activestring" ]; then
	echo "Error: null active string!" >&2;
	exit 1;
fi

sleep 1;

# while the screensaver is active and the user is not entering the password to unlock the screen,
# we force the monitor to stay turned off.
# If the password form is shown, the user has 30 seconds to type in the password. After this period the form
# is killed and we return to the screensaver + monitor switched off situation
while [ "$(gnome-screensaver-command -q)" = "$activestring" ]; do
	if ps aux | grep -q gnome-screensaver-[d]ialog; then
		sleep 30
		if [ "$(gnome-screensaver-command -q)" = "$activestring" ] && ps aux | grep -q gnome-screensaver-[d]ialog; then
			killall gnome-screensaver-dialog;
		else
			exit;
		fi
	fi
	xset dpms force off
	sleep 10;
done
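The above script comes with two major features: it keeps the monitor switched off for as long as the screen is locked (forcing it off again every 10 seconds), and it closes the password dialog if nobody has unlocked the screen within 30 seconds. On my home PC, it allows me to save about 30 W (~30% of the total power consumption) while I am away. Well, hibernation (suspend to disk) would be a lot better, but often you simply can’t use it.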

Know your system

What are the executables you use every day made of? Are they scripts or binaries? What do developers use? Python? Java? Or perhaps Ruby? It’s easy and fun to find out: you just need a few lines of bash with a pinch of awk:
#!/bin/bash

datafile=$(mktemp /tmp/bintypes.logXXX)

# scan /bin, /sbin, /usr/bin and /usr/sbin
for file in /{,s}bin/* /usr/{,s}bin/*; do
	# resolve symlinks to the real file
	rfile=$(readlink -f "$file")

	# file is a tool which tells what a file contains
	[ -f "$rfile" ] && file "$rfile" >>"$datafile"
done

# count all occurrences with awk
# in some cases the description starts with the article "a"
# (e.g. "a python script text executable"): we cut it away with sub()
awk -F "[:,]" '
{
	sub(/ a /, " ", $2)
	a[$2]++
}
END {
	for (el in a)
		print a[el] "\t" el
}' "$datafile" | sort -n
rm "$datafile"
Here is my output:
1	 ASCII English text
1	 awk script text executable
1	 /bin/loadkeys script text executable
2	 setgid ELF 32-bit LSB shared object
4	 /usr/bin/ruby1.8 script text executable
5	 setuid setgid ELF 32-bit LSB executable
13	 ELF 32-bit LSB shared object
14	 setgid ELF 32-bit LSB executable
35	 setuid ELF 32-bit LSB executable
61	 Bourne-Again shell script text executable
119	 python script text executable
225	 perl script text executable
377	 POSIX shell script text executable
1748	 ELF 32-bit LSB executable
Hence, apart from the binaries, the most common language is POSIX shell scripting, followed by Perl and then Python. That lonely awk script is /usr/sbin/mksmbpasswd, while the plain text file is just a mistake: it’s a shell script without a shebang (to whom it may concern, the file is /usr/bin/gnome-power-bugreport.sh). Do these results surprise you? It must be said that, of all the above tools, only 96 belong to GNU coreutils; the others come from a huge variety of programs (e.g. TeX, Java, console.tools, etc.).

To find out which package each file belongs to, modify the above script to run “dpkg -S $rfile” instead of file, and change the field counted by awk accordingly. That’s all.
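A minimal sketch of the dpkg -S variant, written as a hypothetical count_by_package function that reads dpkg -S output on stdin. dpkg -S prints lines such as “coreutils: /bin/ls”, so the package name is everything before the first “: ” (splitting on “: ” rather than “:” keeps multiarch names like libc6:amd64 intact). Files owned by several packages are counted under one comma-separated key, and diversion notes are skipped.

```shell
# count_by_package  (hypothetical helper name)
# Reads "dpkg -S" lines ("package: /path") on stdin and prints
# how many files each package owns, sorted by count.
count_by_package() {
	awk -F ': ' '
		# diversion notes are not ownership lines
		/^diversion / { next }
		{ n[$1]++ }
		END { for (p in n) print n[p] "\t" p }' | sort -n
}

# usage:
# for f in /{,s}bin/* /usr/{,s}bin/*; do dpkg -S "$(readlink -f "$f")"; done 2>/dev/null | count_by_package
```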