Utils
This module contains utility functions for regression models.
utils
Functions:
- ols_to_markdown_table – Convert the summary table of an OLS regression result to a markdown table.
- run_regression_and_plot – Run a regression, bootstrap confidence intervals and p-values, and create a forest plot of the coefficients.
ols_to_markdown_table
ols_to_markdown_table(ols_result: RegressionResultsWrapper, use_bootstrapping: bool = True, predictor_rename_dict: Optional[Dict[str, str]] = None, exclude_predictors: Optional[List[str]] = [], column_rename_dict: Optional[Dict[str, str]] = None, round_dict: Optional[Dict[str, int]] = None, alpha_corr: float = None) -> str
Convert the summary table of an OLS regression result to a markdown table. The intercept is dropped from the output.
Parameters:
- ols_result (RegressionResultsWrapper) – The result object of an OLS regression.
- use_bootstrapping (bool, default: True) – Whether to replace CIs and p-values with bootstrapped values if available. Defaults to True.
- predictor_rename_dict (Optional[Dict[str, str]], default: None) – A dictionary to rename the predictors in the summary table. If not included, predictors will be tidied slightly instead. Defaults to None.
- exclude_predictors (Optional[List[str]], default: []) – A list of predictors to exclude from the summary table. Defaults to [].
- column_rename_dict (Optional[Dict[str, str]], default: None) – A dictionary to rename the summary table columns. Defaults to a pre-specified dictionary if not provided.
- round_dict (Optional[Dict[str, int]], default: None) – A dictionary to set the rounding precision for each column. Defaults to a pre-specified dictionary if not provided.
- alpha_corr (float, default: None) – The alpha level for multiple comparison correction. If provided, the p-values will be corrected using the Holm-Bonferroni method and a new column will be added to the table with the corrected p-values (i.e., multiplied by 0.05 / alpha_corr). Defaults to None.
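For readers unfamiliar with the Holm-Bonferroni procedure named in the alpha_corr description, the following is a minimal standard-library sketch of the textbook step-down adjustment. The helper name holm_adjust is hypothetical and not part of this module, and the module's own correction (described above as a multiplication by 0.05 / alpha_corr) may well be implemented differently.

```python
from typing import List


def holm_adjust(p_values: List[float]) -> List[float]:
    """Return Holm-Bonferroni adjusted p-values in the original order.

    Hypothetical helper for illustration; not taken from stats_utils.
    """
    m = len(p_values)
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, idx in enumerate(order):
        # The k-th smallest p-value (k = rank + 1) is multiplied by m - k + 1.
        candidate = min(1.0, (m - rank) * p_values[idx])
        # Enforce monotonicity: adjusted values never decrease with rank.
        running_max = max(running_max, candidate)
        adjusted[idx] = running_max
    return adjusted


# Illustrative raw p-values (made up for this sketch).
raw_p_values = [0.01, 0.04, 0.03]
print(holm_adjust(raw_p_values))
```

An adjusted p-value below the chosen alpha level then indicates significance after correction.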
Returns:
- str – The markdown table representing the summary table of the OLS regression result.
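Example
# Imports (module path taken from the source file location below)
import statsmodels.formula.api as smf
from stats_utils.regression.utils import ols_to_markdown_table
# Fit model (data is a DataFrame with columns Y, X1 and X2)
model = smf.ols("Y ~ X1 + X2", data).fit()
# Convert summary table to markdown
markdown_table = ols_to_markdown_table(model)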
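To make the roles of column_rename_dict and round_dict more concrete, here is a rough standard-library sketch of how a coefficient table could be rendered as markdown. The function rows_to_markdown, the column names, and the coefficient values are all hypothetical and for illustration only; the real ols_to_markdown_table works on a statsmodels result object and may format its output differently.

```python
from typing import Dict, List, Optional


def rows_to_markdown(
    rows: List[Dict[str, object]],
    column_rename_dict: Optional[Dict[str, str]] = None,
    round_dict: Optional[Dict[str, int]] = None,
) -> str:
    """Render a list of row dictionaries as a pipe-delimited markdown table.

    Hypothetical helper for illustration; not taken from stats_utils.
    """
    column_rename_dict = column_rename_dict or {}
    round_dict = round_dict or {}
    columns = list(rows[0].keys())
    # Header uses the renamed column labels where a rename is given.
    header = [column_rename_dict.get(col, col) for col in columns]
    lines = [
        "| " + " | ".join(header) + " |",
        "| " + " | ".join("---" for _ in columns) + " |",
    ]
    for row in rows:
        cells = []
        for col in columns:
            value = row[col]
            # Apply the per-column rounding precision to float values only.
            if col in round_dict and isinstance(value, float):
                value = f"{value:.{round_dict[col]}f}"
            cells.append(str(value))
        lines.append("| " + " | ".join(cells) + " |")
    return "\n".join(lines)


# Made-up coefficient rows, for illustration only.
example_rows = [
    {"predictor": "X1", "coef": 0.51234, "p": 0.0312},
    {"predictor": "X2", "coef": -0.20456, "p": 0.4},
]
table = rows_to_markdown(
    example_rows,
    column_rename_dict={"coef": "beta", "p": "p-value"},
    round_dict={"coef": 2, "p": 3},
)
print(table)
```

Keeping the rename and rounding settings in dictionaries keyed by column name means each column can be configured independently, which matches how the parameters above are described.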
Source code in stats_utils/regression/utils.py
run_regression_and_plot
run_regression_and_plot(data: DataFrame, var: str, predictors: str, num_bootstrap_samples: int = 2000, show_plot: bool = True, show_summary: bool = True, save_fig: bool = True, ax: Axes = None, figure_kwargs: dict = None, forest_plot_kwargs: dict = None) -> Tuple[RegressionResultsWrapper, Axes]
This function performs three steps:
1) Run a regression analysis using the statsmodels package.
2) Perform bootstrap resampling to estimate confidence intervals and p-values.
3) Create a forest plot of the regression coefficients (indicating significance and confidence intervals).
⚠️ NOTE: This function estimates confidence intervals and p-values using bootstrapping. The p-values and confidence intervals given in the model summary are not derived from the bootstrap samples, so they may not correspond to what is shown in the figure (which uses confidence intervals and significance derived from resampling). To obtain p-values and confidence intervals from the bootstrap samples, access them from the model object returned by this function. For example, model.conf_int_bootstrap(alpha=0.05) returns the 95% confidence intervals for all predictors, and model.pvalues_bootstrap returns the p-values.
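The bootstrap step described above can be sketched with the standard library alone. This example assumes case resampling (rows drawn with replacement), a percentile confidence interval, and a two-sided p-value based on the share of bootstrap slopes on either side of zero. These are common choices, not necessarily the ones this function uses internally. The helpers ols_slope and bootstrap_slope and the synthetic data are hypothetical.

```python
import random
from typing import List, Tuple


def ols_slope(x: List[float], y: List[float]) -> float:
    """Slope of a one-predictor least-squares fit with an intercept."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


def bootstrap_slope(
    x: List[float],
    y: List[float],
    num_samples: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[Tuple[float, float], float]:
    """Percentile CI and two-sided p-value for the slope via case resampling."""
    rng = random.Random(seed)
    n = len(x)
    slopes = []
    while len(slopes) < num_samples:
        # Draw n row indices with replacement and refit on the resample.
        idx = [rng.randrange(n) for _ in range(n)]
        xs = [x[i] for i in idx]
        if len(set(xs)) < 2:
            continue  # degenerate resample: the slope is undefined
        slopes.append(ols_slope(xs, [y[i] for i in idx]))
    slopes.sort()
    # Percentile interval: cut alpha / 2 from each tail of the sorted slopes.
    lower = slopes[int((alpha / 2) * num_samples)]
    upper = slopes[int((1 - alpha / 2) * num_samples) - 1]
    # Two-sided p-value: twice the smaller tail share relative to zero.
    frac_below = sum(s <= 0 for s in slopes) / num_samples
    frac_above = sum(s >= 0 for s in slopes) / num_samples
    p_value = min(1.0, 2 * min(frac_below, frac_above))
    return (lower, upper), p_value


# Synthetic data with a true slope of 2, for illustration only.
data_rng = random.Random(1)
x_values = [float(i) for i in range(30)]
y_values = [1.0 + 2.0 * xi + data_rng.gauss(0.0, 1.0) for xi in x_values]

(ci_low, ci_high), p_boot = bootstrap_slope(x_values, y_values)
print(f"slope = {ols_slope(x_values, y_values):.3f}")
print(f"95% bootstrap CI = ({ci_low:.3f}, {ci_high:.3f}), p = {p_boot:.4f}")
```

Because such an interval comes from the resampled slopes and not from the analytic standard error, it can differ slightly from the interval printed in a model summary, which is the discrepancy the note above warns about.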
Parameters:
- data (DataFrame) – The data frame containing the variables used in the model.
- var (str) – The dependent variable in the regression model.
- predictors (str) – A string representing predictor variables in the regression model.
- num_bootstrap_samples (int, default: 2000) – The number of bootstrap samples to use for estimating the sampling distribution. Defaults to 2000.
- show_plot (bool, default: True) – If True, shows the plot. Defaults to True.
- show_summary (bool, default: True) – If True, shows the model summary. Defaults to True.
- save_fig (bool, default: True) – If True, saves the figure. Defaults to True.
- ax (Axes, default: None) – The Axes object to plot on. If None, creates a new figure and axes. Defaults to None.
- figure_kwargs (dict, default: None) – Keyword arguments to pass to the plt.subplots function. Defaults to None.
- forest_plot_kwargs (dict, default: None) – Keyword arguments to pass to the forest_plot function. Defaults to None.
Returns:
- tuple (Tuple[RegressionResultsWrapper, Axes]) – A tuple containing:
  - model (statsmodels.regression.linear_model.RegressionResultsWrapper): The fitted model object.
  - ax (plt.Axes): The Axes object with the plot.
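Example
# Import (module path taken from the source file location below)
from stats_utils.regression.utils import run_regression_and_plot
# Fit model and plot (data is a DataFrame with columns Y, X1, X2 and X3)
model, ax = run_regression_and_plot(
    data,
    "Y",
    "X1 + X2 + X3",
)
# Get the 95% confidence intervals for the predictors
print(model.conf_int_bootstrap(alpha=0.05))
# Get the p-values for the predictors
print(model.pvalues_bootstrap)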
Source code in stats_utils/regression/utils.py