Discrete Choice Models Overview

In [1]:
from __future__ import print_function
import numpy as np
import statsmodels.api as sm

Data

Load data from Spector and Mazzeo (1980). Examples follow Greene's Econometric Analysis Ch. 21 (5th Edition).

In [2]:
spector_data = sm.datasets.spector.load()
# Append a constant column to the regressors
spector_data.exog = sm.add_constant(spector_data.exog, prepend=False)

Inspect the data:

In [3]:
print(spector_data.exog[:5,:])
print(spector_data.endog[:5])
[[  2.66  20.     0.     1.  ]
 [  2.89  22.     0.     1.  ]
 [  3.28  24.     0.     1.  ]
 [  2.92  12.     0.     1.  ]
 [  4.    21.     0.     1.  ]]
[ 0.  0.  0.  0.  1.]
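
The three regressors are GPA, TUCE and PSI, and the outcome is GRADE. As a quick check of the labels (assuming the dataset object carries the usual statsmodels name attributes):

In [ ]:
print(spector_data.endog_name)
print(spector_data.exog_name)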

Linear Probability Model (OLS)

In [4]:
lpm_mod = sm.OLS(spector_data.endog, spector_data.exog)
lpm_res = lpm_mod.fit()
# Print the slope coefficients only; the constant is the last entry
print('Parameters: ', lpm_res.params[:-1])
Parameters:  [ 0.46385168  0.01049512  0.37855479]
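
A well-known caveat of the linear probability model is that its fitted values are not confined to the unit interval. A quick check along those lines (a sketch using the OLS fitted values):

In [ ]:
fitted = lpm_res.fittedvalues
# Count fitted "probabilities" that fall outside [0, 1]
print('Out-of-range fitted values:', ((fitted < 0) | (fitted > 1)).sum())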

Logit Model

In [5]:
logit_mod = sm.Logit(spector_data.endog, spector_data.exog)
logit_res = logit_mod.fit(disp=0)  # disp=0 silences the optimizer output
print('Parameters: ', logit_res.params)
Parameters:  [  2.82611259   0.09515766   2.37868766 -13.02134686]
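
The logit coefficients are on the log-odds scale, so exponentiating them yields odds ratios, for example:

In [ ]:
# exp(beta) gives the multiplicative change in the odds per unit of x
print('Odds ratios: ', np.exp(logit_res.params))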

Marginal Effects

In [6]:
margeff = logit_res.get_margeff()
print(margeff.summary())
        Logit Marginal Effects       
=====================================
Dep. Variable:                      y
Method:                          dydx
At:                           overall
==============================================================================
                dy/dx    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.3626      0.109      3.313      0.001       0.148       0.577
x2             0.0122      0.018      0.686      0.493      -0.023       0.047
x3             0.3052      0.092      3.304      0.001       0.124       0.486
==============================================================================
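
By default get_margeff averages the effects over the sample (At: overall, as shown above). Effects evaluated at the means of the regressors can be requested instead:

In [ ]:
# at='mean' evaluates dy/dx at the sample means of the regressors
print(logit_res.get_margeff(at='mean').summary())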

As with all of the discrete data models presented below, we can print a nice summary of the results:

In [7]:
print(logit_res.summary())
                           Logit Regression Results                           
==============================================================================
Dep. Variable:                      y   No. Observations:                   32
Model:                          Logit   Df Residuals:                       28
Method:                           MLE   Df Model:                            3
Date:                Tue, 28 Feb 2017   Pseudo R-squ.:                  0.3740
Time:                        21:32:53   Log-Likelihood:                -12.890
converged:                       True   LL-Null:                       -20.592
                                        LLR p-value:                  0.001502
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1             2.8261      1.263      2.238      0.025       0.351       5.301
x2             0.0952      0.142      0.672      0.501      -0.182       0.373
x3             2.3787      1.065      2.234      0.025       0.292       4.465
const        -13.0213      4.931     -2.641      0.008     -22.687      -3.356
==============================================================================
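
The results object also produces fitted probabilities. For instance, comparing in-sample predictions with the observed outcomes (predict() with no arguments scores the estimation sample):

In [ ]:
print('Predicted: ', logit_res.predict()[:5])
print('Observed:  ', spector_data.endog[:5])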

Probit Model

In [8]:
probit_mod = sm.Probit(spector_data.endog, spector_data.exog)
probit_res = probit_mod.fit()
probit_margeff = probit_res.get_margeff()
print('Parameters: ', probit_res.params)
print('Marginal effects: ')
print(probit_margeff.summary())
Optimization terminated successfully.
         Current function value: 0.400588
         Iterations 6
Parameters:  [ 1.62581004  0.05172895  1.42633234 -7.45231965]
Marginal effects: 
       Probit Marginal Effects       
=====================================
Dep. Variable:                      y
Method:                          dydx
At:                           overall
==============================================================================
                dy/dx    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1             0.3608      0.113      3.182      0.001       0.139       0.583
x2             0.0115      0.018      0.624      0.533      -0.025       0.048
x3             0.3165      0.090      3.508      0.000       0.140       0.493
==============================================================================
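
Although the logit and probit coefficients differ in scale, the two models typically imply nearly identical probabilities. A quick comparison of the fitted values:

In [ ]:
# Largest gap between the logit and probit in-sample probabilities
print(np.abs(logit_res.predict() - probit_res.predict()).max())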

Multinomial Logit

Load data from the American National Election Studies:

In [9]:
anes_data = sm.datasets.anes96.load()
anes_exog = anes_data.exog
anes_exog = sm.add_constant(anes_exog, prepend=False)

Inspect the data:

In [10]:
print(anes_data.exog[:5,:])
print(anes_data.endog[:5])
[[ -2.30258509   7.          36.           3.           1.        ]
 [  5.24755025   3.          20.           4.           1.        ]
 [  3.43720782   2.          24.           6.           1.        ]
 [  4.4200447    3.          28.           6.           1.        ]
 [  6.46162441   5.          68.           6.           1.        ]]
[ 6.  1.  1.  1.  0.]
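
The outcome is party identification on a seven-point scale, coded 0 (strong Democrat) through 6 (strong Republican):

In [ ]:
print(np.unique(anes_data.endog))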

Fit the multinomial logit (MNL) model:

In [11]:
mlogit_mod = sm.MNLogit(anes_data.endog, anes_exog)
mlogit_res = mlogit_mod.fit()
print(mlogit_res.params)
Optimization terminated successfully.
         Current function value: 1.548647
         Iterations 7
[[ -1.15359746e-02  -8.87506530e-02  -1.05966699e-01  -9.15567017e-02
   -9.32846040e-02  -1.40880692e-01]
 [  2.97714352e-01   3.91668642e-01   5.73450508e-01   1.27877179e+00
    1.34696165e+00   2.07008014e+00]
 [ -2.49449954e-02  -2.28978371e-02  -1.48512069e-02  -8.68134503e-03
   -1.79040689e-02  -9.43264870e-03]
 [  8.24914421e-02   1.81042758e-01  -7.15241904e-03   1.99827955e-01
    2.16938850e-01   3.21925702e-01]
 [  5.19655317e-03   4.78739761e-02   5.75751595e-02   8.44983753e-02
    8.09584122e-02   1.08894083e-01]
 [ -3.73401677e-01  -2.25091318e+00  -3.66558353e+00  -7.61384309e+00
   -7.06047825e+00  -1.21057509e+01]]
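
Each column of the parameter array corresponds to one of the six non-reference outcome categories. The fitted probabilities cover all seven categories and sum to one for every observation, e.g.:

In [ ]:
probs = mlogit_res.predict(anes_exog)
print(probs[:2])
# Row sums should be one
print(probs[:2].sum(axis=1))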

Poisson

Load the RAND Health Insurance Experiment data. Note that this example is similar to Cameron and Trivedi's Microeconometrics, Table 20.5, but differs slightly because of minor changes in the data.

In [12]:
rand_data = sm.datasets.randhie.load()
# Flatten the structured array of regressors into a plain 2-D float array
rand_exog = rand_data.exog.view(float).reshape(len(rand_data.exog), -1)
rand_exog = sm.add_constant(rand_exog, prepend=False)

Fit the Poisson model:

In [13]:
poisson_mod = sm.Poisson(rand_data.endog, rand_exog)
poisson_res = poisson_mod.fit(method="newton")
print(poisson_res.summary())
Optimization terminated successfully.
         Current function value: 3.091609
         Iterations 12
                          Poisson Regression Results                          
==============================================================================
Dep. Variable:                      y   No. Observations:                20190
Model:                        Poisson   Df Residuals:                    20180
Method:                           MLE   Df Model:                            9
Date:                Tue, 28 Feb 2017   Pseudo R-squ.:                 0.06343
Time:                        21:32:54   Log-Likelihood:                -62420.
converged:                       True   LL-Null:                       -66647.
                                        LLR p-value:                     0.000
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1            -0.0525      0.003    -18.216      0.000      -0.058      -0.047
x2            -0.2471      0.011    -23.272      0.000      -0.268      -0.226
x3             0.0353      0.002     19.302      0.000       0.032       0.039
x4            -0.0346      0.002    -21.439      0.000      -0.038      -0.031
x5             0.2717      0.012     22.200      0.000       0.248       0.296
x6             0.0339      0.001     60.098      0.000       0.033       0.035
x7            -0.0126      0.009     -1.366      0.172      -0.031       0.005
x8             0.0541      0.015      3.531      0.000       0.024       0.084
x9             0.2061      0.026      7.843      0.000       0.155       0.258
const          0.7004      0.011     62.741      0.000       0.678       0.722
==============================================================================
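
Because the model includes a constant, the average fitted rate reproduces the sample mean of the counts, which makes for a convenient sanity check:

In [ ]:
print('Mean fitted count:  ', poisson_res.predict().mean())
print('Mean observed count:', rand_data.endog.mean())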

Negative Binomial

The negative binomial model, which allows for overdispersion in the counts, gives slightly different results.

In [14]:
mod_nbin = sm.NegativeBinomial(rand_data.endog, rand_exog)
res_nbin = mod_nbin.fit(disp=False)
print(res_nbin.summary())
/private/tmp/statsmodels/statsmodels/base/model.py:496: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
  "Check mle_retvals", ConvergenceWarning)
                     NegativeBinomial Regression Results                      
==============================================================================
Dep. Variable:                      y   No. Observations:                20190
Model:               NegativeBinomial   Df Residuals:                    20180
Method:                           MLE   Df Model:                            9
Date:                Tue, 28 Feb 2017   Pseudo R-squ.:                 0.01845
Time:                        21:32:56   Log-Likelihood:                -43384.
converged:                      False   LL-Null:                       -44199.
                                        LLR p-value:                     0.000
==============================================================================
                 coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1            -0.0580      0.006     -9.517      0.000      -0.070      -0.046
x2            -0.2678      0.023    -11.802      0.000      -0.312      -0.223
x3             0.0412      0.004      9.937      0.000       0.033       0.049
x4            -0.0381      0.003    -11.219      0.000      -0.045      -0.031
x5             0.2690      0.030      8.981      0.000       0.210       0.328
x6             0.0382      0.001     26.081      0.000       0.035       0.041
x7            -0.0441      0.020     -2.200      0.028      -0.083      -0.005
x8             0.0172      0.036      0.477      0.633      -0.054       0.088
x9             0.1780      0.074      2.397      0.017       0.032       0.324
const          0.6636      0.025     26.787      0.000       0.615       0.712
alpha          1.2930      0.019     69.477      0.000       1.256       1.329
==============================================================================
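
The warning above recommends checking mle_retvals. If the optimizer stopped early, one option is to refit with a larger iteration limit (a sketch; 500 is an arbitrary choice):

In [ ]:
print('Converged:', res_nbin.mle_retvals['converged'])
res_nbin = mod_nbin.fit(disp=False, maxiter=500)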

Alternative solvers

The default method for fitting discrete data MLE models is Newton-Raphson. You can select an alternative solver with the method argument:

In [15]:
mlogit_res = mlogit_mod.fit(method='bfgs', maxiter=100)
print(mlogit_res.summary())
Warning: Maximum number of iterations has been exceeded.
         Current function value: 1.548650
         Iterations: 100
         Function evaluations: 106
         Gradient evaluations: 106
                          MNLogit Regression Results                          
==============================================================================
Dep. Variable:                      y   No. Observations:                  944
Model:                        MNLogit   Df Residuals:                      908
Method:                           MLE   Df Model:                           30
Date:                Tue, 28 Feb 2017   Pseudo R-squ.:                  0.1648
Time:                        21:32:56   Log-Likelihood:                -1461.9
converged:                      False   LL-Null:                       -1750.3
                                        LLR p-value:                1.827e-102
==============================================================================
       y=1       coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1            -0.0116      0.034     -0.338      0.735      -0.079       0.056
x2             0.2973      0.094      3.175      0.001       0.114       0.481
x3            -0.0250      0.007     -3.825      0.000      -0.038      -0.012
x4             0.0821      0.074      1.116      0.264      -0.062       0.226
x5             0.0052      0.018      0.294      0.769      -0.029       0.040
const         -0.3689      0.630     -0.586      0.558      -1.603       0.866
------------------------------------------------------------------------------
       y=2       coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1            -0.0888      0.039     -2.269      0.023      -0.166      -0.012
x2             0.3913      0.108      3.615      0.000       0.179       0.603
x3            -0.0229      0.008     -2.897      0.004      -0.038      -0.007
x4             0.1808      0.085      2.120      0.034       0.014       0.348
x5             0.0478      0.022      2.145      0.032       0.004       0.091
const         -2.2451      0.763     -2.942      0.003      -3.741      -0.749
------------------------------------------------------------------------------
       y=3       coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1            -0.1062      0.057     -1.861      0.063      -0.218       0.006
x2             0.5730      0.159      3.614      0.000       0.262       0.884
x3            -0.0149      0.011     -1.313      0.189      -0.037       0.007
x4            -0.0075      0.126     -0.060      0.952      -0.255       0.240
x5             0.0575      0.034      1.711      0.087      -0.008       0.123
const         -3.6592      1.156     -3.164      0.002      -5.926      -1.393
------------------------------------------------------------------------------
       y=4       coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1            -0.0914      0.044     -2.085      0.037      -0.177      -0.005
x2             1.2826      0.129      9.937      0.000       1.030       1.536
x3            -0.0085      0.008     -1.008      0.314      -0.025       0.008
x4             0.2012      0.094      2.136      0.033       0.017       0.386
x5             0.0850      0.026      3.240      0.001       0.034       0.136
const         -7.6589      0.960     -7.982      0.000      -9.540      -5.778
------------------------------------------------------------------------------
       y=5       coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1            -0.0934      0.039     -2.374      0.018      -0.170      -0.016
x2             1.3451      0.117     11.485      0.000       1.116       1.575
x3            -0.0180      0.008     -2.362      0.018      -0.033      -0.003
x4             0.2161      0.085      2.542      0.011       0.049       0.383
x5             0.0808      0.023      3.517      0.000       0.036       0.126
const         -7.0401      0.844     -8.344      0.000      -8.694      -5.387
------------------------------------------------------------------------------
       y=6       coef    std err          z      P>|z|      [0.025      0.975]
------------------------------------------------------------------------------
x1            -0.1409      0.042     -3.345      0.001      -0.224      -0.058
x2             2.0686      0.143     14.433      0.000       1.788       2.349
x3            -0.0095      0.008     -1.164      0.244      -0.025       0.006
x4             0.3216      0.091      3.532      0.000       0.143       0.500
x5             0.1087      0.025      4.299      0.000       0.059       0.158
const        -12.0913      1.059    -11.415      0.000     -14.167     -10.015
==============================================================================
/private/tmp/statsmodels/statsmodels/base/model.py:496: ConvergenceWarning: Maximum Likelihood optimization failed to converge. Check mle_retvals
  "Check mle_retvals", ConvergenceWarning)