Generalized Linear Models (Formula)
===================================


.. _glm_formula_notebook:

`Link to Notebook GitHub <https://github.com/statsmodels/statsmodels/blob/master/examples/notebooks/glm_formula.ipynb>`_

.. raw:: html

   
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>This notebook illustrates how you can use R-style formulas to fit Generalized Linear Models.</p>
   <p>To begin, we load the <code>Star98</code> dataset, construct a formula, and pre-process the data:</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[1]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">print_function</span>
   <span class="kn">import</span> <span class="nn">statsmodels.api</span> <span class="kn">as</span> <span class="nn">sm</span>
   <span class="kn">import</span> <span class="nn">statsmodels.formula.api</span> <span class="kn">as</span> <span class="nn">smf</span>
   <span class="n">star98</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">star98</span><span class="o">.</span><span class="n">load_pandas</span><span class="p">()</span><span class="o">.</span><span class="n">data</span>
   <span class="n">formula</span> <span class="o">=</span> <span class="s">&#39;SUCCESS ~ LOWINC + PERASIAN + PERBLACK + PERHISP + PCTCHRT + </span><span class="se">\</span>
   <span class="s">           PCTYRRND + PERMINTE*AVYRSEXP*AVSALK + PERSPENK*PTRATIO*PCTAF&#39;</span>
   <span class="n">dta</span> <span class="o">=</span> <span class="n">star98</span><span class="p">[[</span><span class="s">&#39;NABOVE&#39;</span><span class="p">,</span> <span class="s">&#39;NBELOW&#39;</span><span class="p">,</span> <span class="s">&#39;LOWINC&#39;</span><span class="p">,</span> <span class="s">&#39;PERASIAN&#39;</span><span class="p">,</span> <span class="s">&#39;PERBLACK&#39;</span><span class="p">,</span> <span class="s">&#39;PERHISP&#39;</span><span class="p">,</span>
                 <span class="s">&#39;PCTCHRT&#39;</span><span class="p">,</span> <span class="s">&#39;PCTYRRND&#39;</span><span class="p">,</span> <span class="s">&#39;PERMINTE&#39;</span><span class="p">,</span> <span class="s">&#39;AVYRSEXP&#39;</span><span class="p">,</span> <span class="s">&#39;AVSALK&#39;</span><span class="p">,</span>
                 <span class="s">&#39;PERSPENK&#39;</span><span class="p">,</span> <span class="s">&#39;PTRATIO&#39;</span><span class="p">,</span> <span class="s">&#39;PCTAF&#39;</span><span class="p">]]</span><span class="o">.</span><span class="n">copy</span><span class="p">()</span>
   <span class="c"># .copy() above makes dta an independent DataFrame, so the column edits below do not raise SettingWithCopyWarning</span>
   <span class="n">endog</span> <span class="o">=</span> <span class="n">dta</span><span class="p">[</span><span class="s">&#39;NABOVE&#39;</span><span class="p">]</span> <span class="o">/</span> <span class="p">(</span><span class="n">dta</span><span class="p">[</span><span class="s">&#39;NABOVE&#39;</span><span class="p">]</span> <span class="o">+</span> <span class="n">dta</span><span class="o">.</span><span class="n">pop</span><span class="p">(</span><span class="s">&#39;NBELOW&#39;</span><span class="p">))</span>
   <span class="k">del</span> <span class="n">dta</span><span class="p">[</span><span class="s">&#39;NABOVE&#39;</span><span class="p">]</span>
   <span class="n">dta</span><span class="p">[</span><span class="s">&#39;SUCCESS&#39;</span><span class="p">]</span> <span class="o">=</span> <span class="n">endog</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Then, we fit a GLM with a binomial family:</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[2]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="n">mod1</span> <span class="o">=</span> <span class="n">smf</span><span class="o">.</span><span class="n">glm</span><span class="p">(</span><span class="n">formula</span><span class="o">=</span><span class="n">formula</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">dta</span><span class="p">,</span> <span class="n">family</span><span class="o">=</span><span class="n">sm</span><span class="o">.</span><span class="n">families</span><span class="o">.</span><span class="n">Binomial</span><span class="p">())</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span>
   <span class="n">mod1</span><span class="o">.</span><span class="n">summary</span><span class="p">()</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt output_prompt">
       Out[2]:</div>
   
   <div class="output_html rendered_html output_subarea output_pyout">
   <table class="simpletable">
   <caption>Generalized Linear Model Regression Results</caption>
   <tr>
     <th>Dep. Variable:</th>       <td>SUCCESS</td>     <th>  No. Observations:  </th>  <td>   303</td> 
   </tr>
   <tr>
     <th>Model:</th>                 <td>GLM</td>       <th>  Df Residuals:      </th>  <td>   282</td> 
   </tr>
   <tr>
     <th>Model Family:</th>       <td>Binomial</td>     <th>  Df Model:          </th>  <td>    20</td> 
   </tr>
   <tr>
     <th>Link Function:</th>        <td>logit</td>      <th>  Scale:             </th>    <td>1.0</td>  
   </tr>
   <tr>
     <th>Method:</th>               <td>IRLS</td>       <th>  Log-Likelihood:    </th> <td> -189.70</td>
   </tr>
   <tr>
     <th>Date:</th>           <td>Thu, 21 May 2015</td> <th>  Deviance:          </th> <td>  380.66</td>
   </tr>
   <tr>
     <th>Time:</th>               <td>05:55:22</td>     <th>  Pearson chi2:      </th>  <td>  8.48</td> 
   </tr>
   <tr>
     <th>No. Iterations:</th>         <td>7</td>        <th>                     </th>     <td> </td>   
   </tr>
   </table>
   <table class="simpletable">
   <tr>
                 <td></td>                <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> 
   </tr>
   <tr>
     <th>Intercept</th>                <td>    0.4037</td> <td>   25.036</td> <td>    0.016</td> <td> 0.987</td> <td>  -48.665    49.472</td>
   </tr>
   <tr>
     <th>LOWINC</th>                   <td>   -0.0204</td> <td>    0.010</td> <td>   -1.982</td> <td> 0.048</td> <td>   -0.041    -0.000</td>
   </tr>
   <tr>
     <th>PERASIAN</th>                 <td>    0.0159</td> <td>    0.017</td> <td>    0.910</td> <td> 0.363</td> <td>   -0.018     0.050</td>
   </tr>
   <tr>
     <th>PERBLACK</th>                 <td>   -0.0198</td> <td>    0.020</td> <td>   -1.004</td> <td> 0.316</td> <td>   -0.058     0.019</td>
   </tr>
   <tr>
     <th>PERHISP</th>                  <td>   -0.0096</td> <td>    0.010</td> <td>   -0.951</td> <td> 0.341</td> <td>   -0.029     0.010</td>
   </tr>
   <tr>
     <th>PCTCHRT</th>                  <td>   -0.0022</td> <td>    0.022</td> <td>   -0.103</td> <td> 0.918</td> <td>   -0.045     0.040</td>
   </tr>
   <tr>
     <th>PCTYRRND</th>                 <td>   -0.0022</td> <td>    0.006</td> <td>   -0.348</td> <td> 0.728</td> <td>   -0.014     0.010</td>
   </tr>
   <tr>
     <th>PERMINTE</th>                 <td>    0.1068</td> <td>    0.787</td> <td>    0.136</td> <td> 0.892</td> <td>   -1.436     1.650</td>
   </tr>
   <tr>
     <th>AVYRSEXP</th>                 <td>   -0.0411</td> <td>    1.176</td> <td>   -0.035</td> <td> 0.972</td> <td>   -2.346     2.264</td>
   </tr>
   <tr>
     <th>PERMINTE:AVYRSEXP</th>        <td>   -0.0031</td> <td>    0.054</td> <td>   -0.057</td> <td> 0.954</td> <td>   -0.108     0.102</td>
   </tr>
   <tr>
     <th>AVSALK</th>                   <td>    0.0131</td> <td>    0.295</td> <td>    0.044</td> <td> 0.965</td> <td>   -0.566     0.592</td>
   </tr>
   <tr>
     <th>PERMINTE:AVSALK</th>          <td>   -0.0019</td> <td>    0.013</td> <td>   -0.145</td> <td> 0.885</td> <td>   -0.028     0.024</td>
   </tr>
   <tr>
     <th>AVYRSEXP:AVSALK</th>          <td>    0.0008</td> <td>    0.020</td> <td>    0.038</td> <td> 0.970</td> <td>   -0.039     0.041</td>
   </tr>
   <tr>
     <th>PERMINTE:AVYRSEXP:AVSALK</th> <td> 5.978e-05</td> <td>    0.001</td> <td>    0.068</td> <td> 0.946</td> <td>   -0.002     0.002</td>
   </tr>
   <tr>
     <th>PERSPENK</th>                 <td>   -0.3097</td> <td>    4.233</td> <td>   -0.073</td> <td> 0.942</td> <td>   -8.606     7.987</td>
   </tr>
   <tr>
     <th>PTRATIO</th>                  <td>    0.0096</td> <td>    0.919</td> <td>    0.010</td> <td> 0.992</td> <td>   -1.792     1.811</td>
   </tr>
   <tr>
     <th>PERSPENK:PTRATIO</th>         <td>    0.0066</td> <td>    0.206</td> <td>    0.032</td> <td> 0.974</td> <td>   -0.397     0.410</td>
   </tr>
   <tr>
     <th>PCTAF</th>                    <td>   -0.0143</td> <td>    0.474</td> <td>   -0.030</td> <td> 0.976</td> <td>   -0.944     0.916</td>
   </tr>
   <tr>
     <th>PERSPENK:PCTAF</th>           <td>    0.0105</td> <td>    0.098</td> <td>    0.107</td> <td> 0.915</td> <td>   -0.182     0.203</td>
   </tr>
   <tr>
     <th>PTRATIO:PCTAF</th>            <td>   -0.0001</td> <td>    0.022</td> <td>   -0.005</td> <td> 0.996</td> <td>   -0.044     0.044</td>
   </tr>
   <tr>
     <th>PERSPENK:PTRATIO:PCTAF</th>   <td>   -0.0002</td> <td>    0.005</td> <td>   -0.051</td> <td> 0.959</td> <td>   -0.010     0.009</td>
   </tr>
   </table>
   </div>
   
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Finally, we define a function that applies a custom data transformation and call it directly inside the formula:</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[3]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="k">def</span> <span class="nf">double_it</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
       <span class="k">return</span> <span class="mi">2</span> <span class="o">*</span> <span class="n">x</span>
   <span class="n">formula</span> <span class="o">=</span> <span class="s">&#39;SUCCESS ~ double_it(LOWINC) + PERASIAN + PERBLACK + PERHISP + PCTCHRT + </span><span class="se">\</span>
   <span class="s">           PCTYRRND + PERMINTE*AVYRSEXP*AVSALK + PERSPENK*PTRATIO*PCTAF&#39;</span>
   <span class="n">mod2</span> <span class="o">=</span> <span class="n">smf</span><span class="o">.</span><span class="n">glm</span><span class="p">(</span><span class="n">formula</span><span class="o">=</span><span class="n">formula</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">dta</span><span class="p">,</span> <span class="n">family</span><span class="o">=</span><span class="n">sm</span><span class="o">.</span><span class="n">families</span><span class="o">.</span><span class="n">Binomial</span><span class="p">())</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span>
   <span class="n">mod2</span><span class="o">.</span><span class="n">summary</span><span class="p">()</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt output_prompt">
       Out[3]:</div>
   
   <div class="output_html rendered_html output_subarea output_pyout">
   <table class="simpletable">
   <caption>Generalized Linear Model Regression Results</caption>
   <tr>
     <th>Dep. Variable:</th>       <td>SUCCESS</td>     <th>  No. Observations:  </th>  <td>   303</td> 
   </tr>
   <tr>
     <th>Model:</th>                 <td>GLM</td>       <th>  Df Residuals:      </th>  <td>   282</td> 
   </tr>
   <tr>
     <th>Model Family:</th>       <td>Binomial</td>     <th>  Df Model:          </th>  <td>    20</td> 
   </tr>
   <tr>
     <th>Link Function:</th>        <td>logit</td>      <th>  Scale:             </th>    <td>1.0</td>  
   </tr>
   <tr>
     <th>Method:</th>               <td>IRLS</td>       <th>  Log-Likelihood:    </th> <td> -189.70</td>
   </tr>
   <tr>
     <th>Date:</th>           <td>Thu, 21 May 2015</td> <th>  Deviance:          </th> <td>  380.66</td>
   </tr>
   <tr>
     <th>Time:</th>               <td>05:55:22</td>     <th>  Pearson chi2:      </th>  <td>  8.48</td> 
   </tr>
   <tr>
     <th>No. Iterations:</th>         <td>7</td>        <th>                     </th>     <td> </td>   
   </tr>
   </table>
   <table class="simpletable">
   <tr>
                 <td></td>                <th>coef</th>     <th>std err</th>      <th>z</th>      <th>P>|z|</th> <th>[95.0% Conf. Int.]</th> 
   </tr>
   <tr>
     <th>Intercept</th>                <td>    0.4037</td> <td>   25.036</td> <td>    0.016</td> <td> 0.987</td> <td>  -48.665    49.472</td>
   </tr>
   <tr>
     <th>double_it(LOWINC)</th>        <td>   -0.0102</td> <td>    0.005</td> <td>   -1.982</td> <td> 0.048</td> <td>   -0.020    -0.000</td>
   </tr>
   <tr>
     <th>PERASIAN</th>                 <td>    0.0159</td> <td>    0.017</td> <td>    0.910</td> <td> 0.363</td> <td>   -0.018     0.050</td>
   </tr>
   <tr>
     <th>PERBLACK</th>                 <td>   -0.0198</td> <td>    0.020</td> <td>   -1.004</td> <td> 0.316</td> <td>   -0.058     0.019</td>
   </tr>
   <tr>
     <th>PERHISP</th>                  <td>   -0.0096</td> <td>    0.010</td> <td>   -0.951</td> <td> 0.341</td> <td>   -0.029     0.010</td>
   </tr>
   <tr>
     <th>PCTCHRT</th>                  <td>   -0.0022</td> <td>    0.022</td> <td>   -0.103</td> <td> 0.918</td> <td>   -0.045     0.040</td>
   </tr>
   <tr>
     <th>PCTYRRND</th>                 <td>   -0.0022</td> <td>    0.006</td> <td>   -0.348</td> <td> 0.728</td> <td>   -0.014     0.010</td>
   </tr>
   <tr>
     <th>PERMINTE</th>                 <td>    0.1068</td> <td>    0.787</td> <td>    0.136</td> <td> 0.892</td> <td>   -1.436     1.650</td>
   </tr>
   <tr>
     <th>AVYRSEXP</th>                 <td>   -0.0411</td> <td>    1.176</td> <td>   -0.035</td> <td> 0.972</td> <td>   -2.346     2.264</td>
   </tr>
   <tr>
     <th>PERMINTE:AVYRSEXP</th>        <td>   -0.0031</td> <td>    0.054</td> <td>   -0.057</td> <td> 0.954</td> <td>   -0.108     0.102</td>
   </tr>
   <tr>
     <th>AVSALK</th>                   <td>    0.0131</td> <td>    0.295</td> <td>    0.044</td> <td> 0.965</td> <td>   -0.566     0.592</td>
   </tr>
   <tr>
     <th>PERMINTE:AVSALK</th>          <td>   -0.0019</td> <td>    0.013</td> <td>   -0.145</td> <td> 0.885</td> <td>   -0.028     0.024</td>
   </tr>
   <tr>
     <th>AVYRSEXP:AVSALK</th>          <td>    0.0008</td> <td>    0.020</td> <td>    0.038</td> <td> 0.970</td> <td>   -0.039     0.041</td>
   </tr>
   <tr>
     <th>PERMINTE:AVYRSEXP:AVSALK</th> <td> 5.978e-05</td> <td>    0.001</td> <td>    0.068</td> <td> 0.946</td> <td>   -0.002     0.002</td>
   </tr>
   <tr>
     <th>PERSPENK</th>                 <td>   -0.3097</td> <td>    4.233</td> <td>   -0.073</td> <td> 0.942</td> <td>   -8.606     7.987</td>
   </tr>
   <tr>
     <th>PTRATIO</th>                  <td>    0.0096</td> <td>    0.919</td> <td>    0.010</td> <td> 0.992</td> <td>   -1.792     1.811</td>
   </tr>
   <tr>
     <th>PERSPENK:PTRATIO</th>         <td>    0.0066</td> <td>    0.206</td> <td>    0.032</td> <td> 0.974</td> <td>   -0.397     0.410</td>
   </tr>
   <tr>
     <th>PCTAF</th>                    <td>   -0.0143</td> <td>    0.474</td> <td>   -0.030</td> <td> 0.976</td> <td>   -0.944     0.916</td>
   </tr>
   <tr>
     <th>PERSPENK:PCTAF</th>           <td>    0.0105</td> <td>    0.098</td> <td>    0.107</td> <td> 0.915</td> <td>   -0.182     0.203</td>
   </tr>
   <tr>
     <th>PTRATIO:PCTAF</th>            <td>   -0.0001</td> <td>    0.022</td> <td>   -0.005</td> <td> 0.996</td> <td>   -0.044     0.044</td>
   </tr>
   <tr>
     <th>PERSPENK:PTRATIO:PCTAF</th>   <td>   -0.0002</td> <td>    0.005</td> <td>   -0.051</td> <td> 0.959</td> <td>   -0.010     0.009</td>
   </tr>
   </table>
   </div>
   
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>As expected, doubling the regressor halves its coefficient: the <code>double_it(LOWINC)</code> coefficient in the second model is half the <code>LOWINC</code> coefficient from the first model:</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[4]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="k">print</span><span class="p">(</span><span class="n">mod1</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s">&#39;LOWINC&#39;</span><span class="p">])</span>
   <span class="k">print</span><span class="p">(</span><span class="n">mod2</span><span class="o">.</span><span class="n">params</span><span class="p">[</span><span class="s">&#39;double_it(LOWINC)&#39;</span><span class="p">]</span> <span class="o">*</span> <span class="mi">2</span><span class="p">)</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt"></div>
   <div class="output_subarea output_stream output_stdout output_text">
   <pre>
   -0.0203959871548
   -0.0203959871548
   
   </pre>
   </div>
   </div>
   
   </div>
   </div>
   
   </div>

   <script src="https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS_HTML" type="text/javascript"></script>
   <script type="text/javascript">
   init_mathjax = function() {
       if (window.MathJax) {
           // MathJax loaded
           MathJax.Hub.Config({
               tex2jax: {
               // I'm not sure about the \( and \[ below. It messes with the
               // prompt, and I think it's an issue with the template. -SS
                   inlineMath: [ ['$','$'], ["\\(","\\)"] ],
                   displayMath: [ ['$$','$$'], ["\\[","\\]"] ]
               },
               displayAlign: 'left', // Change this to 'center' to center equations.
               "HTML-CSS": {
                   styles: {'.MathJax_Display': {"margin": 0}}
               }
           });
           MathJax.Hub.Queue(["Typeset",MathJax.Hub]);
       }
   }
   init_mathjax();

   // since we have to load this in a ..raw:: directive we will add the css
   // after the fact
   function loadcssfile(filename){
       var fileref=document.createElement("link")
       fileref.setAttribute("rel", "stylesheet")
       fileref.setAttribute("type", "text/css")
       fileref.setAttribute("href", filename)

       document.getElementsByTagName("head")[0].appendChild(fileref)
   }
   // loadcssfile({{pathto("_static/nbviewer.pygments.css", 1) }})
   // loadcssfile({{pathto("_static/nbviewer.min.css", 1) }})
   loadcssfile("../../../_static/nbviewer.pygments.css")
   loadcssfile("../../../_static/ipython.min.css")
   </script>
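
The formula in this notebook names twelve variables, yet each summary table lists twenty coefficients plus an intercept. The extra rows come from the ``*`` operator: in R-style formulas, ``a*b*c`` stands for every main effect and every interaction among the three factors, with interactions written using ``:``. The sketch below reproduces that expansion in plain Python so the term count can be checked by hand. The helper ``expand_star`` is a hypothetical name introduced only for illustration; it is not part of statsmodels or of the formula parser, and it makes no claim about the order in which the real parser emits terms.

```python
from itertools import combinations


def expand_star(factors):
    """Return the terms implied by 'a*b*c': every non-empty subset of
    the factors, with the members of each subset joined by ':'.

    Hypothetical helper for illustration only; it does not parse formulas.
    """
    terms = []
    for size in range(1, len(factors) + 1):
        for combo in combinations(factors, size):
            terms.append(":".join(combo))
    return terms


# Variables that enter the notebook's formula on their own.
main_effects = ["LOWINC", "PERASIAN", "PERBLACK", "PERHISP", "PCTCHRT", "PCTYRRND"]

# The two three-way products from the notebook's formula.
teacher_terms = expand_star(["PERMINTE", "AVYRSEXP", "AVSALK"])
spending_terms = expand_star(["PERSPENK", "PTRATIO", "PCTAF"])

all_terms = main_effects + teacher_terms + spending_terms

print(teacher_terms)
print(len(all_terms))  # slope terms only; the intercept is counted separately
```

Each three-way product contributes seven terms (three main effects, three two-way interactions, one three-way interaction), so the total is 6 + 7 + 7 = 20, which matches the ``Df Model`` entry in the summary tables.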