Wednesday, July 31, 2013

Trying sklearn for machine learning (with the face recognition sample)

I was evaluating the feasibility of a project, and Python was of course my first choice. While setting up the development environment, however, I got frustrated installing the scikit-learn package.

Quick tip: download the latest version of scikit-learn (0.14a1 when I wrote this) and play with the sample code included in the source package.

Installation by pip (failed)

The first frustration might have been caused by my own stupidity.

I followed the instructions on the scikit-learn page and used pip to complete the installation. Then I googled for some example code and found that it would not run. The Python interpreter always complained with
ImportError: cannot import name scikits.learn
I googled for a solution again and again, and all the answers pointed to "multiple versions of Python installed on the system." But I have only Python 2.7 on my Ubuntu machine!

All I did was uninstall scikit-learn and reinstall it. I also tried installing it from source, but nothing changed.

Then I thought of something and tried to find some sample code on the scikit-learn page. It turned out that the module should be sklearn instead of scikits.learn... Orz

So the sample code I had found was using the old module name.
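A quick way to see which of the two module names your installation actually provides is to try importing each one. This is only a minimal sketch: the helper name first_importable is made up for illustration, and the only thing it relies on is the standard importlib module.

```python
import importlib


def first_importable(candidates):
    """Return (name, version) for the first module name that imports cleanly.

    Hypothetical helper for illustration; returns (None, None) if none import.
    """
    for name in candidates:
        try:
            module = importlib.import_module(name)
        except ImportError:
            # This name is not available in the current interpreter.
            continue
        return name, getattr(module, "__version__", "unknown")
    return None, None


# Try the new name first, then the old one used by outdated sample code.
name, version = first_importable(["sklearn", "scikits.learn"])
print("importable module: %s (version %s)" % (name, version))
```

If this prints sklearn, any sample code that still imports scikits.learn needs its imports updated before it can run.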


Using version 0.13.1 (failed)

I am not sure whether this is a bug. I could not run the example code (face_recognition.py) from the version 0.13.1 source package, even though my installed scikit-learn was also version 0.13.1. The error message was:
ImportError: cannot import name column_or_1d
and I found that "import sklearn.datasets" would trigger this error.

I also tried to follow the traceback given by the interpreter, but all I could tell was that the error came from an import in label.py. My debugging skills couldn't take me any further.
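One check that might help in a situation like this is to confirm which version of the package the interpreter really imports, and from which directory. This is a guess at a useful diagnostic, not a confirmed explanation of the error above. The helper name module_info is made up for illustration.

```python
import importlib


def module_info(name):
    """Return the version and file location of a module, or None if the import fails.

    Hypothetical helper for illustration only.
    """
    try:
        module = importlib.import_module(name)
    except ImportError:
        return None
    return {
        "version": getattr(module, "__version__", "unknown"),
        "file": getattr(module, "__file__", "unknown"),
    }


# Shows where the package is loaded from; None means the import itself failed.
print(module_info("sklearn"))
```

If the reported location or version is not the one you expect, files from an earlier installation may still be on the import path.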


Version 0.14a1 (succeeded)

Okay, I had run out of ideas... I almost gave up, but then it occurred to me that the latest version might solve the problem. So I downloaded the source of version 0.14a1 and installed it. Finally, the sample code ran and produced the expected output.

Face recognition example test

If you have downloaded the source package, you can find the example at: YOUR_FOLDER/scikit-learn-0.14a1/examples/applications/face_recognition.py.

Frankly, I do not understand the output yet, but I would like to post the text output of running face_recognition.py along with the resulting figures.

Text output


===================================================
Faces recognition example using eigenfaces and SVMs
===================================================

The dataset used in this example is a preprocessed excerpt of the
"Labeled Faces in the Wild", aka LFW_:

  http://vis-www.cs.umass.edu/lfw/lfw-funneled.tgz (233MB)

.. _LFW: http://vis-www.cs.umass.edu/lfw/

Expected results for the top 5 most represented people in the dataset::

                     precision    recall  f1-score   support

  Gerhard_Schroeder       0.91      0.75      0.82        28
    Donald_Rumsfeld       0.84      0.82      0.83        33
         Tony_Blair       0.65      0.82      0.73        34
       Colin_Powell       0.78      0.88      0.83        58
      George_W_Bush       0.93      0.86      0.90       129

        avg / total       0.86      0.84      0.85       282




2013-07-31 08:04:43,243 Downloading LFW metadata: http://vis-www.cs.umass.edu/lfw/pairsDevTrain.txt
2013-07-31 08:04:46,028 Downloading LFW metadata: http://vis-www.cs.umass.edu/lfw/pairsDevTest.txt
2013-07-31 08:04:46,740 Downloading LFW metadata: http://vis-www.cs.umass.edu/lfw/pairs.txt
2013-07-31 08:04:48,140 Downloading LFW data (~200MB): http://vis-www.cs.umass.edu/lfw/lfw-funneled.tgz
2013-07-31 08:11:24,620 Decompressing the data archive to /home/thk/scikit_learn_data/lfw_home/lfw_funneled
2013-07-31 08:11:33,822 Loading LFW people faces from /home/thk/scikit_learn_data/lfw_home
2013-07-31 08:11:33,981 Loading face #00001 / 01288
2013-07-31 08:11:36,218 Loading face #01001 / 01288
Total dataset size:
n_samples: 1288
n_features: 1850
n_classes: 7
Extracting the top 150 eigenfaces from 966 faces
done in 0.806s
Projecting the input data on the eigenfaces orthonormal basis
done in 0.065s
Fitting the classifier to the training set
done in 16.244s
Best estimator found by grid search:
SVC(C=1000.0, cache_size=200, class_weight=auto, coef0=0.0, degree=3,
  gamma=0.001, kernel=rbf, max_iter=-1, probability=False,
  random_state=None, shrinking=True, tol=0.001, verbose=False)
Predicting people's names on the test set
done in 0.049s
                   precision    recall  f1-score   support

     Ariel Sharon       0.67      0.78      0.72        18
     Colin Powell       0.77      0.80      0.78        61
  Donald Rumsfeld       0.71      0.76      0.73        29
    George W Bush       0.90      0.89      0.89       134
Gerhard Schroeder       0.71      0.63      0.67        27
      Hugo Chavez       0.93      0.58      0.72        24
       Tony Blair       0.69      0.83      0.75        29

      avg / total       0.81      0.80      0.80       322

[[ 14   2   1   1   0   0   0]
 [  3  49   1   3   0   1   4]
 [  1   3  22   2   0   0   1]
 [  2   6   4 119   1   0   2]
 [  1   1   1   4  17   0   3]
 [  0   3   1   0   5  14   1]
 [  0   0   1   3   1   0  24]]
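As a rough guide to reading these numbers, the precision, recall and f1-score columns of the report can be recomputed from the matrix printed last (the confusion matrix). The sketch below assumes rows are the true people and columns are the predicted people; the row sums do match the support column, which is consistent with that reading. The function name per_class_scores is made up for illustration, and the code uses plain Python rather than scikit-learn.

```python
# Names and confusion matrix copied from the output above.
# Assumption: rows are true labels, columns are predicted labels.
names = ["Ariel Sharon", "Colin Powell", "Donald Rumsfeld", "George W Bush",
         "Gerhard Schroeder", "Hugo Chavez", "Tony Blair"]
matrix = [
    [14, 2, 1, 1, 0, 0, 0],
    [3, 49, 1, 3, 0, 1, 4],
    [1, 3, 22, 2, 0, 0, 1],
    [2, 6, 4, 119, 1, 0, 2],
    [1, 1, 1, 4, 17, 0, 3],
    [0, 3, 1, 0, 5, 14, 1],
    [0, 0, 1, 3, 1, 0, 24],
]


def per_class_scores(matrix):
    """Return a (precision, recall, f1, support) tuple for each class."""
    n = len(matrix)
    scores = []
    for i in range(n):
        true_positive = matrix[i][i]
        predicted_as_i = sum(matrix[r][i] for r in range(n))  # column sum
        actually_i = sum(matrix[i])                           # row sum = support
        precision = true_positive / float(predicted_as_i)
        recall = true_positive / float(actually_i)
        f1 = 2 * precision * recall / (precision + recall)
        scores.append((precision, recall, f1, actually_i))
    return scores


scores = per_class_scores(matrix)
for name, (p, r, f1, support) in zip(names, scores):
    print("%-18s %.2f  %.2f  %.2f  %d" % (name, p, r, f1, support))

# Overall accuracy: correct predictions (the diagonal) over all test faces.
correct = sum(matrix[i][i] for i in range(len(matrix)))
total = sum(sum(row) for row in matrix)
accuracy = correct / float(total)
print("accuracy: %.2f (%d of %d)" % (accuracy, correct, total))
```

For example, 14 of the 18 Ariel Sharon test faces were recognized (recall 0.78), while 21 faces in total were labeled as Ariel Sharon, 14 of them correctly (precision 0.67).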


Figures


