
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/cluster/plot_dict_face_patches.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_cluster_plot_dict_face_patches.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_cluster_plot_dict_face_patches.py:


Online learning of a dictionary of parts of faces
==================================================

This example uses a large dataset of faces to learn a set of 20 x 20
images patches that constitute faces.

From the programming standpoint, it is interesting because it shows how
to use the online API of the scikit-learn to process a very large
dataset by chunks. The way we proceed is that we load an image at a time
and extract randomly 50 patches from this image. Once we have accumulated
500 of these patches (using 10 images), we run the
:func:`~sklearn.cluster.MiniBatchKMeans.partial_fit` method
of the online KMeans object, MiniBatchKMeans.

The verbose setting on the MiniBatchKMeans enables us to see that some
clusters are reassigned during the successive calls to
partial-fit. This is because the number of patches that they represent
has become too low, and it is better to choose a random new
cluster.

.. GENERATED FROM PYTHON SOURCE LINES 22-85


.. rst-class:: sphx-glr-script-out

.. code-block:: pytb

    Traceback (most recent call last):
      File "/build/scikit-learn-ZSX7SD/scikit-learn-0.23.2/examples/cluster/plot_dict_face_patches.py", line 34, in <module>
        faces = datasets.fetch_olivetti_faces()
      File "/build/scikit-learn-ZSX7SD/scikit-learn-0.23.2/.pybuild/cpython3_3.10/build/sklearn/utils/validation.py", line 72, in inner_f
        return f(**kwargs)
      File "/build/scikit-learn-ZSX7SD/scikit-learn-0.23.2/.pybuild/cpython3_3.10/build/sklearn/datasets/_olivetti_faces.py", line 111, in fetch_olivetti_faces
        mat_path = _fetch_remote(FACES, dirname=data_home)
      File "/build/scikit-learn-ZSX7SD/scikit-learn-0.23.2/.pybuild/cpython3_3.10/build/sklearn/datasets/_base.py", line 1181, in _fetch_remote
        urlretrieve(remote.url, file_path)
      File "/usr/lib/python3.10/urllib/request.py", line 241, in urlretrieve
        with contextlib.closing(urlopen(url, data)) as fp:
      File "/usr/lib/python3.10/urllib/request.py", line 216, in urlopen
        return opener.open(url, data, timeout)
      File "/usr/lib/python3.10/urllib/request.py", line 519, in open
        response = self._open(req, data)
      File "/usr/lib/python3.10/urllib/request.py", line 536, in _open
        result = self._call_chain(self.handle_open, protocol, protocol +
      File "/usr/lib/python3.10/urllib/request.py", line 496, in _call_chain
        result = func(*args)
      File "/usr/lib/python3.10/urllib/request.py", line 1391, in https_open
        return self.do_open(http.client.HTTPSConnection, req,
      File "/usr/lib/python3.10/urllib/request.py", line 1351, in do_open
        raise URLError(err)
    urllib.error.URLError: <urlopen error [Errno -2] Name or service not known>






|

.. code-block:: default

    print(__doc__)

    import time

    import matplotlib.pyplot as plt
    import numpy as np


    from sklearn import datasets
    from sklearn.cluster import MiniBatchKMeans
    from sklearn.feature_extraction.image import extract_patches_2d

    faces = datasets.fetch_olivetti_faces()

    # #############################################################################
    # Learn the dictionary of images

    print('Learning the dictionary... ')
    rng = np.random.RandomState(0)
    kmeans = MiniBatchKMeans(n_clusters=81, random_state=rng, verbose=True)
    patch_size = (20, 20)

    buffer = []
    t0 = time.time()

    # The online learning part: cycle over the whole dataset 6 times
    index = 0
    for _ in range(6):
        for img in faces.images:
            data = extract_patches_2d(img, patch_size, max_patches=50,
                                      random_state=rng)
            data = np.reshape(data, (len(data), -1))
            buffer.append(data)
            index += 1
            if index % 10 == 0:
                data = np.concatenate(buffer, axis=0)
                data -= np.mean(data, axis=0)
                data /= np.std(data, axis=0)
                kmeans.partial_fit(data)
                buffer = []
            if index % 100 == 0:
                print('Partial fit of %4i out of %i'
                      % (index, 6 * len(faces.images)))

    dt = time.time() - t0
    print('done in %.2fs.' % dt)

    # #############################################################################
    # Plot the results
    plt.figure(figsize=(4.2, 4))
    for i, patch in enumerate(kmeans.cluster_centers_):
        plt.subplot(9, 9, i + 1)
        plt.imshow(patch.reshape(patch_size), cmap=plt.cm.gray,
                   interpolation='nearest')
        plt.xticks(())
        plt.yticks(())


    plt.suptitle('Patches of faces\nTrain time %.1fs on %d patches' %
                 (dt, 8 * len(faces.images)), fontsize=16)
    plt.subplots_adjust(0.08, 0.02, 0.92, 0.85, 0.08, 0.23)

    plt.show()


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  0.022 seconds)


.. _sphx_glr_download_auto_examples_cluster_plot_dict_face_patches.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example



  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_dict_face_patches.py <plot_dict_face_patches.py>`



  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_dict_face_patches.ipynb <plot_dict_face_patches.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
