Formulas: Fitting models using R-style formulas
===============================================


.. _formulas_notebook:

`Link to Notebook GitHub <https://github.com/statsmodels/statsmodels/blob/master/examples/notebooks/formulas.ipynb>`_

.. raw:: html

   
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Since version 0.5.0, <code>statsmodels</code> allows users to fit statistical models using R-style formulas. Internally, <code>statsmodels</code> uses the <a href="http://patsy.readthedocs.org/">patsy</a> package to convert formulas and data to the matrices that are used in model fitting. The formula framework is quite powerful; this tutorial only scratches the surface. A full description of the formula language can be found in the <code>patsy</code> docs: </p>
   <ul>
   <li><a href="http://patsy.readthedocs.org/">Patsy formula language description</a></li>
   </ul>
   <h2 id="loading-modules-and-functions">Loading modules and functions</h2>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[1]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="kn">from</span> <span class="nn">__future__</span> <span class="kn">import</span> <span class="n">print_function</span>
   <span class="kn">import</span> <span class="nn">numpy</span> <span class="kn">as</span> <span class="nn">np</span>
   <span class="kn">import</span> <span class="nn">statsmodels.api</span> <span class="kn">as</span> <span class="nn">sm</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <h4 id="Import-convention">Import convention<a class="anchor-link" href="#Import-convention">&#182;</a></h4>
   </div>
   </div>
   </div>
   
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>You can import explicitly from <code>statsmodels.formula.api</code>:</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[2]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="kn">from</span> <span class="nn">statsmodels.formula.api</span> <span class="kn">import</span> <span class="n">ols</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Alternatively, you can just use the <code>formula</code> namespace of the main <code>statsmodels.api</code>.</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[3]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="n">sm</span><span class="o">.</span><span class="n">formula</span><span class="o">.</span><span class="n">ols</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt output_prompt">
       Out[3]:</div>
   
   
   <div class="output_text output_subarea output_pyout">
   <pre>
   &lt;bound method type.from_formula of &lt;class &apos;statsmodels.regression.linear_model.OLS&apos;&gt;&gt;
   </pre>
   </div>
   
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Or you can use the following convention:</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[4]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="kn">import</span> <span class="nn">statsmodels.formula.api</span> <span class="kn">as</span> <span class="nn">smf</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>These names are just a convenient way to access each model&#39;s <code>from_formula</code> classmethod. See, for instance:</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[5]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="n">sm</span><span class="o">.</span><span class="n">OLS</span><span class="o">.</span><span class="n">from_formula</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   <div class="output_wrapper">
   <div class="output">
   
   
   <div class="output_area"><div class="prompt output_prompt">
       Out[5]:</div>
   
   
   <div class="output_text output_subarea output_pyout">
   <pre>
   &lt;bound method type.from_formula of &lt;class &apos;statsmodels.regression.linear_model.OLS&apos;&gt;&gt;
   </pre>
   </div>
   
   </div>
   
   </div>
   </div>
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>All of the lower-case models accept <code>formula</code> and <code>data</code> arguments, whereas the upper-case ones take <code>endog</code> and <code>exog</code> design matrices. <code>formula</code> accepts a string that describes the model in terms of a <code>patsy</code> formula. <code>data</code> takes a <a href="http://pandas.pydata.org/">pandas</a> data frame or any other data structure that defines <code>__getitem__</code> for variable names, such as a structured array or a dictionary of variables.</p>
   <p><code>dir(sm.formula)</code> lists the available models.</p>
   <p>Formula-compatible models have the following generic call signature: <code>(formula, data, subset=None, *args, **kwargs)</code></p>
   </div>
   </div>
   </div>
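   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>As a rough illustration of the two interfaces described above, the sketch below fits the same regression through <code>smf.ols</code> (formula plus data) and through <code>sm.OLS</code> (explicit <code>endog</code> and <code>exog</code>). The frame <code>df_demo</code> and its columns <code>x1</code>, <code>x2</code> and <code>y</code> are synthetic and hypothetical, not part of the tutorial&#39;s data. The last step assumes that a plain dictionary of arrays behaves like the data frame case.</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre>import numpy as np
   import pandas as pd
   import statsmodels.api as sm
   import statsmodels.formula.api as smf

   # Synthetic data for illustration only (hypothetical variables x1, x2, y).
   rng = np.random.RandomState(0)
   df_demo = pd.DataFrame({"x1": rng.normal(size=50), "x2": rng.normal(size=50)})
   noise = rng.normal(scale=0.1, size=50)
   df_demo["y"] = 1.0 + 2.0 * df_demo["x1"] - 0.5 * df_demo["x2"] + noise

   # Lower-case interface: a formula string plus a data frame.
   res_formula = smf.ols("y ~ x1 + x2", data=df_demo).fit()

   # Upper-case interface: endog and exog passed explicitly.
   # add_constant prepends a column named "const"; the formula route labels it "Intercept".
   exog = sm.add_constant(df_demo[["x1", "x2"]])
   res_arrays = sm.OLS(df_demo["y"], exog).fit()

   # The two routes should agree up to floating-point error.
   print(res_formula.params.values)
   print(res_arrays.params.values)

   # A dictionary of arrays used as `data` in place of the data frame.
   data_dict = {name: df_demo[name].values for name in df_demo.columns}
   res_dict = smf.ols("y ~ x1 + x2", data=data_dict).fit()
   print(res_dict.params.values)
   </pre></div>
   
   </div>
   </div>
   </div>
   
   </div>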
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <h2 id="ols-regression-using-formulas">OLS regression using formulas</h2>
   <p>To begin, we fit the linear model described on the <a href="gettingstarted.html">Getting Started</a> page. Download the data, subset columns, and list-wise delete to remove missing observations:</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[6]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="n">dta</span> <span class="o">=</span> <span class="n">sm</span><span class="o">.</span><span class="n">datasets</span><span class="o">.</span><span class="n">get_rdataset</span><span class="p">(</span><span class="s">&quot;Guerry&quot;</span><span class="p">,</span> <span class="s">&quot;HistData&quot;</span><span class="p">,</span> <span class="n">cache</span><span class="o">=</span><span class="bp">True</span><span class="p">)</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[7]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="n">df</span> <span class="o">=</span> <span class="n">dta</span><span class="o">.</span><span class="n">data</span><span class="p">[[</span><span class="s">&#39;Lottery&#39;</span><span class="p">,</span> <span class="s">&#39;Literacy&#39;</span><span class="p">,</span> <span class="s">&#39;Wealth&#39;</span><span class="p">,</span> <span class="s">&#39;Region&#39;</span><span class="p">]]</span><span class="o">.</span><span class="n">dropna</span><span class="p">()</span>
   <span class="n">df</span><span class="o">.</span><span class="n">head</span><span class="p">()</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   
   </div>
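   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>List-wise deletion drops every row that has a missing value in any of the selected columns. A minimal sketch on a small made-up frame follows; the values and the extra <code>Department</code> column are hypothetical stand-ins, not the downloaded data.</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre>import numpy as np
   import pandas as pd

   # Made-up values for illustration only; not the Guerry data.
   raw = pd.DataFrame({
       "Lottery": [41.0, 38.0, np.nan, 14.0],
       "Literacy": [37.0, np.nan, 13.0, 46.0],
       "Wealth": [73.0, 22.0, 61.0, 76.0],
       "Department": ["A", "B", "C", "D"],
   })

   # Keep only the model columns, then drop rows with any missing entry.
   subset = raw[["Lottery", "Literacy", "Wealth"]].dropna()
   print(subset)
   </pre></div>
   
   </div>
   </div>
   </div>
   
   </div>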
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Fit the model:</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[8]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="n">mod</span> <span class="o">=</span> <span class="n">ols</span><span class="p">(</span><span class="n">formula</span><span class="o">=</span><span class="s">&#39;Lottery ~ Literacy + Wealth + Region&#39;</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">df</span><span class="p">)</span>
   <span class="n">res</span> <span class="o">=</span> <span class="n">mod</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span>
   <span class="k">print</span><span class="p">(</span><span class="n">res</span><span class="o">.</span><span class="n">summary</span><span class="p">())</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <h2 id="categorical-variables">Categorical variables</h2>
   <p>In the model summary, notice that <code>patsy</code> determined that elements of <em>Region</em> were text strings, so it treated <em>Region</em> as a categorical variable. <code>patsy</code>&#39;s default is also to include an intercept, so one of the <em>Region</em> categories was automatically dropped and serves as the reference level.</p>
   <p>If <em>Region</em> had been an integer variable that we wanted to treat explicitly as categorical, we could have done so by using the <code>C()</code> operator: </p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[9]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="n">res</span> <span class="o">=</span> <span class="n">ols</span><span class="p">(</span><span class="n">formula</span><span class="o">=</span><span class="s">&#39;Lottery ~ Literacy + Wealth + C(Region)&#39;</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">df</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span>
   <span class="k">print</span><span class="p">(</span><span class="n">res</span><span class="o">.</span><span class="n">params</span><span class="p">)</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Patsy&#39;s more advanced features for categorical variables are discussed in: <a href="contrasts.html">Patsy: Contrast Coding Systems for categorical variables</a></p>
   </div>
   </div>
   </div>
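   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>The effect of <code>C()</code> on an integer-valued variable can be inspected without fitting a model by building the design matrix directly with <code>patsy.dmatrix</code>. The sketch below is only an illustration: the frame <code>df_codes</code> and its columns <code>group</code> and <code>x</code> are hypothetical.</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre>import pandas as pd
   from patsy import dmatrix

   # Hypothetical frame: `group` holds integer codes, not strings.
   df_codes = pd.DataFrame({
       "group": [1, 2, 3, 1, 2, 3],
       "x": [0.5, 1.0, 1.5, 2.0, 2.5, 3.0],
   })

   # Without C(), the integer column enters as a single numeric regressor.
   numeric = dmatrix("group + x", df_codes)
   print(numeric.design_info.column_names)

   # With C(), the column is dummy-coded; because an intercept is present,
   # one level is left out and acts as the reference.
   categorical = dmatrix("C(group) + x", df_codes)
   print(categorical.design_info.column_names)
   </pre></div>
   
   </div>
   </div>
   </div>
   
   </div>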
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <h2 id="operators">Operators</h2>
   <p>We have already seen that &quot;~&quot; separates the left-hand side of the model from the right-hand side, and that &quot;+&quot; adds new columns to the design matrix. </p>
   <h3 id="removing-variables">Removing variables</h3>
   <p>The &quot;-&quot; sign can be used to remove columns/variables. For instance, we can remove the intercept from a model by: </p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[10]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="n">res</span> <span class="o">=</span> <span class="n">ols</span><span class="p">(</span><span class="n">formula</span><span class="o">=</span><span class="s">&#39;Lottery ~ Literacy + Wealth + C(Region) -1 &#39;</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">df</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span>
   <span class="k">print</span><span class="p">(</span><span class="n">res</span><span class="o">.</span><span class="n">params</span><span class="p">)</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   
   </div>
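   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>As a rough illustration of what removing the intercept does to the design matrix, the sketch below builds two matrices with <code>patsy.dmatrix</code> and prints their column names. The data frame <code>toy</code> and its columns <code>x</code> and <code>g</code> are made up for this example and are not part of the data used elsewhere in this tutorial. Without the intercept, <code>patsy</code> should give the categorical variable one column per level instead of dropping a reference level.</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[&nbsp;]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre>import pandas as pd
   import patsy

   # Hypothetical toy data (an assumption for illustration only)
   toy = pd.DataFrame({&#39;x&#39;: [1.0, 2.0, 3.0, 4.0],
                       &#39;g&#39;: [&#39;a&#39;, &#39;b&#39;, &#39;a&#39;, &#39;b&#39;]})

   # Default formula: an Intercept column is added automatically
   with_intercept = patsy.dmatrix(&#39;x + C(g)&#39;, toy)
   # &quot;- 1&quot; removes the Intercept column
   without_intercept = patsy.dmatrix(&#39;x + C(g) - 1&#39;, toy)

   print(with_intercept.design_info.column_names)
   print(without_intercept.design_info.column_names)
   </pre></div>
   
   </div>
   </div>
   </div>
   
   </div>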
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <h3 id="multiplicative-interactions">Multiplicative interactions</h3>
   <p>&quot;:&quot; adds a single column to the design matrix containing the interaction of the two variables (for numeric variables, their elementwise product). &quot;*&quot; adds that interaction column and also the individual columns that were multiplied together:</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[11]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="n">res1</span> <span class="o">=</span> <span class="n">ols</span><span class="p">(</span><span class="n">formula</span><span class="o">=</span><span class="s">&#39;Lottery ~ Literacy : Wealth - 1&#39;</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">df</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span>
   <span class="n">res2</span> <span class="o">=</span> <span class="n">ols</span><span class="p">(</span><span class="n">formula</span><span class="o">=</span><span class="s">&#39;Lottery ~ Literacy * Wealth - 1&#39;</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">df</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span>
   <span class="k">print</span><span class="p">(</span><span class="n">res1</span><span class="o">.</span><span class="n">params</span><span class="p">,</span> <span class="s">&#39;</span><span class="se">\n</span><span class="s">&#39;</span><span class="p">)</span>
   <span class="k">print</span><span class="p">(</span><span class="n">res2</span><span class="o">.</span><span class="n">params</span><span class="p">)</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   
   </div>
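   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>The difference between the two operators can be checked directly on the design matrices. The following is a minimal sketch on a made-up data frame (<code>toy</code> and its numeric columns <code>a</code> and <code>b</code> are assumptions for illustration): it compares the columns produced by &quot;:&quot; and &quot;*&quot; and confirms that the interaction column is the elementwise product of the two variables.</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[&nbsp;]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre>import numpy as np
   import pandas as pd
   import patsy

   # Hypothetical toy data (an assumption for illustration only)
   toy = pd.DataFrame({&#39;a&#39;: [1.0, 2.0, 3.0], &#39;b&#39;: [4.0, 5.0, 6.0]})

   inter = patsy.dmatrix(&#39;a:b - 1&#39;, toy, return_type=&#39;dataframe&#39;)
   full = patsy.dmatrix(&#39;a*b - 1&#39;, toy, return_type=&#39;dataframe&#39;)

   print(list(inter.columns))  # [&#39;a:b&#39;]
   print(list(full.columns))   # [&#39;a&#39;, &#39;b&#39;, &#39;a:b&#39;]

   # The interaction column is the product of the two numeric columns
   print(np.allclose(inter[&#39;a:b&#39;], toy[&#39;a&#39;] * toy[&#39;b&#39;]))  # True
   </pre></div>
   
   </div>
   </div>
   </div>
   
   </div>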
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Many other things are possible with operators. Please consult the <a href="https://patsy.readthedocs.org/en/latest/formulas.html">patsy docs</a> to learn more.</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <h2 id="functions">Functions</h2>
   <p>You can apply vectorized functions to the variables in your model: </p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[12]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="n">res</span> <span class="o">=</span> <span class="n">smf</span><span class="o">.</span><span class="n">ols</span><span class="p">(</span><span class="n">formula</span><span class="o">=</span><span class="s">&#39;Lottery ~ np.log(Literacy)&#39;</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">df</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span>
   <span class="k">print</span><span class="p">(</span><span class="n">res</span><span class="o">.</span><span class="n">params</span><span class="p">)</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Define a custom function:</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[13]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="k">def</span> <span class="nf">log_plus_1</span><span class="p">(</span><span class="n">x</span><span class="p">):</span>
       <span class="k">return</span> <span class="n">np</span><span class="o">.</span><span class="n">log</span><span class="p">(</span><span class="n">x</span><span class="p">)</span> <span class="o">+</span> <span class="mf">1.</span>
   <span class="n">res</span> <span class="o">=</span> <span class="n">smf</span><span class="o">.</span><span class="n">ols</span><span class="p">(</span><span class="n">formula</span><span class="o">=</span><span class="s">&#39;Lottery ~ log_plus_1(Literacy)&#39;</span><span class="p">,</span> <span class="n">data</span><span class="o">=</span><span class="n">df</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span>
   <span class="k">print</span><span class="p">(</span><span class="n">res</span><span class="o">.</span><span class="n">params</span><span class="p">)</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>Any function that is in the calling namespace is available to the formula.</p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <h2 id="using-formulas-with-models-that-do-not-yet-support-them">Using formulas with models that do not (yet) support them</h2>
   <p>Even if a given <code>statsmodels</code> function does not support formulas, you can still use <code>patsy</code>&#39;s formula language to produce design matrices. Those matrices 
   can then be fed to the fitting function as <code>endog</code> and <code>exog</code> arguments. </p>
   <p>To generate <code>numpy</code> arrays: </p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[14]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="kn">import</span> <span class="nn">patsy</span>
   <span class="n">f</span> <span class="o">=</span> <span class="s">&#39;Lottery ~ Literacy * Wealth&#39;</span>
   <span class="n">y</span><span class="p">,</span><span class="n">X</span> <span class="o">=</span> <span class="n">patsy</span><span class="o">.</span><span class="n">dmatrices</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">df</span><span class="p">,</span> <span class="n">return_type</span><span class="o">=</span><span class="s">&#39;matrix&#39;</span><span class="p">)</span>
   <span class="k">print</span><span class="p">(</span><span class="n">y</span><span class="p">[:</span><span class="mi">5</span><span class="p">])</span>
   <span class="k">print</span><span class="p">(</span><span class="n">X</span><span class="p">[:</span><span class="mi">5</span><span class="p">])</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   
   </div>
   <div class="cell border-box-sizing text_cell rendered">
   <div class="prompt input_prompt">
   </div>
   <div class="inner_cell">
   <div class="text_cell_render border-box-sizing rendered_html">
   <p>To generate pandas data frames: </p>
   </div>
   </div>
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[15]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="n">f</span> <span class="o">=</span> <span class="s">&#39;Lottery ~ Literacy * Wealth&#39;</span>
   <span class="n">y</span><span class="p">,</span><span class="n">X</span> <span class="o">=</span> <span class="n">patsy</span><span class="o">.</span><span class="n">dmatrices</span><span class="p">(</span><span class="n">f</span><span class="p">,</span> <span class="n">df</span><span class="p">,</span> <span class="n">return_type</span><span class="o">=</span><span class="s">&#39;dataframe&#39;</span><span class="p">)</span>
   <span class="k">print</span><span class="p">(</span><span class="n">y</span><span class="p">[:</span><span class="mi">5</span><span class="p">])</span>
   <span class="k">print</span><span class="p">(</span><span class="n">X</span><span class="p">[:</span><span class="mi">5</span><span class="p">])</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   
   </div>
   <div class="cell border-box-sizing code_cell rendered">
   <div class="input">
   <div class="prompt input_prompt">
   In&nbsp;[16]:
   </div>
   <div class="inner_cell">
       <div class="input_area">
   <div class="highlight"><pre><span class="k">print</span><span class="p">(</span><span class="n">sm</span><span class="o">.</span><span class="n">OLS</span><span class="p">(</span><span class="n">y</span><span class="p">,</span> <span class="n">X</span><span class="p">)</span><span class="o">.</span><span class="n">fit</span><span class="p">()</span><span class="o">.</span><span class="n">summary</span><span class="p">())</span>
   </pre></div>
   
   </div>
   </div>
   </div>
   
   
   </div>

   <script src="https://c328740.ssl.cf1.rackcdn.com/mathjax/latest/MathJax.js?config=TeX-AMS_HTML" type="text/javascript"></script>
   <script type="text/javascript">
   init_mathjax = function() {
       if (window.MathJax) {
           // MathJax loaded
           MathJax.Hub.Config({
               tex2jax: {
               // I'm not sure about the \( and \[ below. It messes with the
               // prompt, and I think it's an issue with the template. -SS
                   inlineMath: [ ['$','$'], ["\\(","\\)"] ],
                   displayMath: [ ['$$','$$'], ["\\[","\\]"] ]
               },
               displayAlign: 'left', // Change this to 'center' to center equations.
               "HTML-CSS": {
                   styles: {'.MathJax_Display': {"margin": 0}}
               }
           });
           MathJax.Hub.Queue(["Typeset",MathJax.Hub]);
       }
   }
   init_mathjax();

   // Since this has to be loaded inside a ".. raw::" directive, the CSS
   // is added after the fact.
   function loadcssfile(filename){
       var fileref = document.createElement("link");
       fileref.setAttribute("rel", "stylesheet");
       fileref.setAttribute("type", "text/css");
       fileref.setAttribute("href", filename);

       document.getElementsByTagName("head")[0].appendChild(fileref);
   }
   // loadcssfile({{pathto("_static/nbviewer.pygments.css", 1) }})
   // loadcssfile({{pathto("_static/nbviewer.min.css", 1) }})
   loadcssfile("../../../_static/nbviewer.pygments.css")
   loadcssfile("../../../_static/ipython.min.css")
   </script>