
.. DO NOT EDIT.
.. THIS FILE WAS AUTOMATICALLY GENERATED BY SPHINX-GALLERY.
.. TO MAKE CHANGES, EDIT THE SOURCE PYTHON FILE:
.. "auto_examples/multioutput/plot_classifier_chain_yeast.py"
.. LINE NUMBERS ARE GIVEN BELOW.

.. only:: html

    .. note::
        :class: sphx-glr-download-link-note

        Click :ref:`here <sphx_glr_download_auto_examples_multioutput_plot_classifier_chain_yeast.py>`
        to download the full example code

.. rst-class:: sphx-glr-example-title

.. _sphx_glr_auto_examples_multioutput_plot_classifier_chain_yeast.py:


============================
Classifier Chain
============================
Example of using classifier chain on a multilabel dataset.

For this example we will use the `yeast
<https://www.openml.org/d/40597>`_ dataset which contains
2417 datapoints each with 103 features and 14 possible labels. Each
data point has at least one label. As a baseline we first train a logistic
regression classifier for each of the 14 labels. To evaluate the performance of
these classifiers we predict on a held-out test set and calculate the
:ref:`jaccard score <jaccard_similarity_score>` for each sample.

Next we create 10 classifier chains. Each classifier chain contains a
logistic regression model for each of the 14 labels. The models in each
chain are ordered randomly. In addition to the 103 features in the dataset,
each model gets the predictions of the preceding models in the chain as
features (note that by default at training time each model gets the true
labels as features). These additional features allow each chain to exploit
correlations among the classes. The Jaccard similarity score for each chain
tends to be greater than that of the set independent logistic models.

Because the models in each chain are arranged randomly there is significant
variation in performance among the chains. Presumably there is an optimal
ordering of the classes in a chain that will yield the best performance.
However we do not know that ordering a priori. Instead we can construct an
voting ensemble of classifier chains by averaging the binary predictions of
the chains and apply a threshold of 0.5. The Jaccard similarity score of the
ensemble is greater than that of the independent models and tends to exceed
the score of each chain in the ensemble (although this is not guaranteed
with randomly ordered chains).

.. GENERATED FROM PYTHON SOURCE LINES 34-114


.. rst-class:: sphx-glr-script-out

.. code-block:: pytb

    Traceback (most recent call last):
      File "/build/scikit-learn-ZSX7SD/scikit-learn-0.23.2/examples/multioutput/plot_classifier_chain_yeast.py", line 50, in <module>
        X, Y = fetch_openml('yeast', version=4, return_X_y=True)
      File "/build/scikit-learn-ZSX7SD/scikit-learn-0.23.2/.pybuild/cpython3_3.10/build/sklearn/utils/validation.py", line 72, in inner_f
        return f(**kwargs)
      File "/build/scikit-learn-ZSX7SD/scikit-learn-0.23.2/.pybuild/cpython3_3.10/build/sklearn/datasets/_openml.py", line 738, in fetch_openml
        data_info = _get_data_info_by_name(name, version, data_home)
      File "/build/scikit-learn-ZSX7SD/scikit-learn-0.23.2/.pybuild/cpython3_3.10/build/sklearn/datasets/_openml.py", line 381, in _get_data_info_by_name
        json_data = _get_json_content_from_openml_api(url, None, False,
      File "/build/scikit-learn-ZSX7SD/scikit-learn-0.23.2/.pybuild/cpython3_3.10/build/sklearn/datasets/_openml.py", line 161, in _get_json_content_from_openml_api
        return _load_json()
      File "/build/scikit-learn-ZSX7SD/scikit-learn-0.23.2/.pybuild/cpython3_3.10/build/sklearn/datasets/_openml.py", line 61, in wrapper
        return f(*args, **kw)
      File "/build/scikit-learn-ZSX7SD/scikit-learn-0.23.2/.pybuild/cpython3_3.10/build/sklearn/datasets/_openml.py", line 157, in _load_json
        with closing(_open_openml_url(url, data_home)) as response:
      File "/build/scikit-learn-ZSX7SD/scikit-learn-0.23.2/.pybuild/cpython3_3.10/build/sklearn/datasets/_openml.py", line 106, in _open_openml_url
        with closing(urlopen(req)) as fsrc:
      File "/usr/lib/python3.10/urllib/request.py", line 216, in urlopen
        return opener.open(url, data, timeout)
      File "/usr/lib/python3.10/urllib/request.py", line 519, in open
        response = self._open(req, data)
      File "/usr/lib/python3.10/urllib/request.py", line 536, in _open
        result = self._call_chain(self.handle_open, protocol, protocol +
      File "/usr/lib/python3.10/urllib/request.py", line 496, in _call_chain
        result = func(*args)
      File "/usr/lib/python3.10/urllib/request.py", line 1391, in https_open
        return self.do_open(http.client.HTTPSConnection, req,
      File "/usr/lib/python3.10/urllib/request.py", line 1351, in do_open
        raise URLError(err)
    urllib.error.URLError: <urlopen error [Errno -2] Name or service not known>






|

.. code-block:: default


    # Author: Adam Kleczewski
    # License: BSD 3 clause

    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.datasets import fetch_openml
    from sklearn.multioutput import ClassifierChain
    from sklearn.model_selection import train_test_split
    from sklearn.multiclass import OneVsRestClassifier
    from sklearn.metrics import jaccard_score
    from sklearn.linear_model import LogisticRegression

    print(__doc__)

    # Load a multi-label dataset from https://www.openml.org/d/40597
    X, Y = fetch_openml('yeast', version=4, return_X_y=True)
    Y = Y == 'TRUE'
    X_train, X_test, Y_train, Y_test = train_test_split(X, Y, test_size=.2,
                                                        random_state=0)

    # Fit an independent logistic regression model for each class using the
    # OneVsRestClassifier wrapper.
    base_lr = LogisticRegression()
    ovr = OneVsRestClassifier(base_lr)
    ovr.fit(X_train, Y_train)
    Y_pred_ovr = ovr.predict(X_test)
    ovr_jaccard_score = jaccard_score(Y_test, Y_pred_ovr, average='samples')

    # Fit an ensemble of logistic regression classifier chains and take the
    # take the average prediction of all the chains.
    chains = [ClassifierChain(base_lr, order='random', random_state=i)
              for i in range(10)]
    for chain in chains:
        chain.fit(X_train, Y_train)

    Y_pred_chains = np.array([chain.predict(X_test) for chain in
                              chains])
    chain_jaccard_scores = [jaccard_score(Y_test, Y_pred_chain >= .5,
                                          average='samples')
                            for Y_pred_chain in Y_pred_chains]

    Y_pred_ensemble = Y_pred_chains.mean(axis=0)
    ensemble_jaccard_score = jaccard_score(Y_test,
                                           Y_pred_ensemble >= .5,
                                           average='samples')

    model_scores = [ovr_jaccard_score] + chain_jaccard_scores
    model_scores.append(ensemble_jaccard_score)

    model_names = ('Independent',
                   'Chain 1',
                   'Chain 2',
                   'Chain 3',
                   'Chain 4',
                   'Chain 5',
                   'Chain 6',
                   'Chain 7',
                   'Chain 8',
                   'Chain 9',
                   'Chain 10',
                   'Ensemble')

    x_pos = np.arange(len(model_names))

    # Plot the Jaccard similarity scores for the independent model, each of the
    # chains, and the ensemble (note that the vertical axis on this plot does
    # not begin at 0).

    fig, ax = plt.subplots(figsize=(7, 4))
    ax.grid(True)
    ax.set_title('Classifier Chain Ensemble Performance Comparison')
    ax.set_xticks(x_pos)
    ax.set_xticklabels(model_names, rotation='vertical')
    ax.set_ylabel('Jaccard Similarity Score')
    ax.set_ylim([min(model_scores) * .9, max(model_scores) * 1.1])
    colors = ['r'] + ['b'] * len(chain_jaccard_scores) + ['g']
    ax.bar(x_pos, model_scores, alpha=0.5, color=colors)
    plt.tight_layout()
    plt.show()


.. rst-class:: sphx-glr-timing

   **Total running time of the script:** ( 0 minutes  0.030 seconds)


.. _sphx_glr_download_auto_examples_multioutput_plot_classifier_chain_yeast.py:


.. only :: html

 .. container:: sphx-glr-footer
    :class: sphx-glr-footer-example



  .. container:: sphx-glr-download sphx-glr-download-python

     :download:`Download Python source code: plot_classifier_chain_yeast.py <plot_classifier_chain_yeast.py>`



  .. container:: sphx-glr-download sphx-glr-download-jupyter

     :download:`Download Jupyter notebook: plot_classifier_chain_yeast.ipynb <plot_classifier_chain_yeast.ipynb>`


.. only:: html

 .. rst-class:: sphx-glr-signature

    `Gallery generated by Sphinx-Gallery <https://sphinx-gallery.github.io>`_
