title: Improve your Modelsummary regression table with the kableExtra package in R
description: An example showing useful functions of the kableExtra package to improve your Modelsummary output in R
keywords: kableextra, kable, latex, modelsummary, regression, model, table, R
draft: true
weight: 6
author: Valerie Vossen
authorlink: https://nl.linkedin.com/in/valerie-vossen
aliases: /kableextra, /run/kableextra/, /run/kableextra

Overview

The main purpose of the kableExtra package is to simplify creating tables with custom styles and formatting in R. For instance, you can add fixed effect rows, labeling rows that indicate categories, and extra headers. In this building block, we walk through an example that uses some useful kableExtra functions to turn Modelsummary output into a LaTeX table, ready to use in your publishable paper!

The Modelsummary output used as the starting point is the replication of Table 1 of Eichholtz et al. (2010) from our Modelsummary building block (link). This table shows the results of regressing the rent of commercial office buildings on a dummy variable (1 if the building is rated as green) and other building characteristics. More information about this paper and the table can be found in the Modelsummary building block!

Starting from this Modelsummary table, we cover the following steps to get to a nicely formatted final output:

Output in LaTeX format
Exporting the table
Specify the column alignment
Add fixed effect rows
Add header
Group rows

Load packages and data

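# Load packages
library(modelsummary)
library(dplyr)
library(fixest)
library(stringr)
library(knitr)
library(kableExtra)

# modelsummary 2.0 and later draw tables with tinytable by default;
# switch back to kableExtra so the kableExtra functions below can be applied
options(modelsummary_factory_default = "kableExtra")

# Load data
data_url <- "https://github.com/tilburgsciencehub/website/blob/buildingblock/modelsummary/content/building-blocks/analyze-data/regressions/data_rent.Rda?raw=true"
load(url(data_url)) # creates the data frame data_rent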

Modelsummary table

models is a list of the five regressions (1) to (5). Please refer to the Modelsummary building block for a detailed overview of the regressions in the models list.
cm2 and gm2 specify, respectively, the variable labels and the goodness-of-fit statistics (with their labels and number of decimals) shown in the regression table.
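To see what a coefficient map such as cm2 does in isolation, here is a minimal sketch on R's built-in mtcars data rather than the rent data. The models, the object names (toy_models, toy_cm, toy_tab) and the labels are hypothetical. coef_map keeps only the listed coefficients, orders them as in the vector and replaces the raw names by the labels; output = "data.frame" returns the table as a data frame so the result can be inspected.

```r
library(modelsummary)

# Two toy regressions on the built-in mtcars data (illustration only)
toy_models <- list(
  lm(mpg ~ wt, data = mtcars),
  lm(mpg ~ wt + hp, data = mtcars)
)

# Hypothetical coefficient map: rename, reorder and subset the terms
toy_cm <- c("hp" = "Horsepower",
            "wt" = "Weight (1000 lbs)")

toy_tab <- msummary(toy_models,
                    coef_map = toy_cm,
                    gof_omit = ".*",        # drop all goodness-of-fit rows
                    output = "data.frame")

# The intercept is dropped and the terms follow the order of toy_cm
unique(toy_tab$term)
```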

cm2 = c('green_rating'    = 'Green rating (1 $=$ yes)',
        'energystar'    = 'Energystar (1 $=$ yes)',
        'leed'    = 'LEED (1 $=$ yes)',
        'size_new'    = 'Building size (millions of sq.ft)',
        'oocc_new' = 'Fraction occupied',
        'class_a' = 'A (1 $=$ yes)',
        'class_b' = 'B (1 $=$ yes)',
        'net' = 'Net contract (1 $=$ yes)',
        'age_0_10' = '$<$10 years',
        'age_10_20' = '10-20 years',
        'age_20_30' = '20-30 years',
        'age_30_40' = '30-40 years',
        'renovated' = 'Renovated (1 $=$ yes)',
        'story_medium' = 'Intermediate (1 $=$ yes)', 
        'story_high' = 'High (1 $=$ yes)', 
        'amenities' = 'Amenities (1 $=$ yes)')

gm2 <- list(
    list("raw" = "nobs", "clean" = "Sample size", "fmt" = 0),
    list("raw" = "r.squared", "clean" = "$R^2$", "fmt" = 2),
    list("raw" = "adj.r.squared", "clean" = "Adjusted $R^2$", "fmt" = 2))

msummary(models,
         vcov = "HC1",
         fmt = fmt_statistic(estimate = 3, std.error = 3),
         #stars  = c('*' = .1, '**' = 0.05, '***' = .01),
         estimate = "{estimate}",
         statistic = "[{std.error}]{stars}",
         coef_map = cm2, 
         gof_omit = 'AIC|BIC|RMSE|Within|FE',
         gof_map = gm2)

	(1)	(2)	(3)	(4)	(5)
Green rating (1 = yes)	0.035		0.033	0.028
	[0.009]		[0.009]	[0.009]
Energystar (1 = yes)		0.033
		[0.009]
LEED (1 = yes)		0.052
		[0.036]
Building size (millions of sq.ft)	0.113	0.113	0.102	0.110	0.110
	[0.019]	[0.019]	[0.019]	[0.021]	[0.021]
Fraction occupied	0.020	0.020	0.020	0.011	0.011
	[0.016]	[0.016]	[0.016]	[0.016]	[0.016]
A (1 = yes)	0.231	0.231	0.192	0.173	0.173
	[0.012]	[0.012]	[0.014]	[0.015]	[0.015]
B (1 = yes)	0.101	0.101	0.092	0.083	0.083
	[0.011]	[0.011]	[0.011]	[0.011]	[0.011]
Net contract (1 = yes)	-0.047	-0.047	-0.050	-0.051	-0.051
	[0.013]	[0.013]	[0.013]	[0.013]	[0.013]
<10 years			0.118	0.131	0.131
			[0.016]	[0.017]	[0.017]
10-20 years			0.079	0.084	0.084
			[0.014]	[0.014]	[0.014]
20-30 years			0.047	0.048	0.048
			[0.013]	[0.013]	[0.013]
30-40 years			0.043	0.044	0.044
			[0.011]	[0.011]	[0.011]
Renovated (1 = yes)			-0.008	-0.008	-0.008
			[0.009]	[0.009]	[0.009]
Intermediate (1 = yes)				0.010	0.010
				[0.009]	[0.009]
High (1 = yes)				-0.027	-0.027
				[0.015]	[0.015]
Amenities (1 = yes)				0.047	0.047
				[0.007]	[0.007]
Sample size	8105	8105	8105	8105	8105
R²	0.71	0.71	0.72	0.72	0.72
Adjusted R²	0.69	0.69	0.69	0.69	0.69

Output in LaTeX format

Next, we want the table in LaTeX format. To get this, set the output argument of msummary() to "latex". This generates LaTeX code that can be copied and pasted directly into a LaTeX document!

msummary(models,
         vcov = "HC1",
         fmt = 3,
         estimate = "{estimate}",
         statistic = "[{std.error}]",
         coef_map = cm2,
         gof_omit = 'AIC|BIC|RMSE|Within|FE',
         gof_map = gm2,
         output = "latex",
         escape = FALSE)

{{% tip %}} With escape = FALSE, special LaTeX characters in the table (such as $, ^ and \) are passed through unchanged, so the LaTeX math in our labels, for example $R^2$, is rendered as math in the final document. With escape = TRUE (the default), these characters are escaped so that they are printed literally: $R^2$ would show up as the text "$R^2$" instead of R². {{% /tip %}}
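The effect of escape can be checked on a small, made-up table. This sketch builds the LaTeX directly with kableExtra's kbl() instead of msummary(); the data frame toy and its labels are hypothetical, and the exact escaped text may differ slightly between versions.

```r
library(kableExtra)

# A toy table whose labels contain LaTeX math, like the labels in cm2 and gm2
toy <- data.frame(term = c("$R^2$", "Adjusted $R^2$"), value = c(0.71, 0.69))

escaped   <- kbl(toy, format = "latex", escape = TRUE)   # special characters escaped
unescaped <- kbl(toy, format = "latex", escape = FALSE)  # passed through unchanged

cat(escaped)    # the dollar signs come out as \$ and are printed literally
cat(unescaped)  # $R^2$ is kept, so LaTeX renders it as math
```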

Export the table

To save this LaTeX code to a .tex file, we use the cat() function. Its file argument specifies the file the output is written to, here "my_table.tex". Without a file argument, cat() prints the output to the console.

msummary(models,
         vcov = "HC1",
         fmt = 3,
         stars = c('*' = .1, '**' = 0.05, '***' = .01),
         estimate = "{estimate}",
         statistic = "[{std.error}]",
         coef_map = cm2,
         gof_omit = 'AIC|BIC|RMSE|Within|FE',
         gof_map = gm2,
         output = "latex",
         escape = FALSE) %>%
  cat(., file = "my_table.tex")

{{% tip %}} The %>% operator pipes the output of msummary() into the cat() function. The dot (.) is a placeholder for the output of the previous step in the pipeline, in this case msummary(). {{% /tip %}}
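A minimal sketch of the pipe and the dot placeholder, using a made-up table built with kbl() and a temporary file instead of my_table.tex (toy_tab and out_file are hypothetical names):

```r
library(dplyr)       # provides the %>% pipe (re-exported from magrittr)
library(kableExtra)

toy_tab <- kbl(data.frame(x = 1:2, y = c("a", "b")), format = "latex")

out_file <- tempfile(fileext = ".tex")   # temporary file instead of my_table.tex

# These two lines are equivalent: the dot stands for toy_tab
toy_tab %>% cat(., file = out_file)
cat(toy_tab, file = out_file)

# Without the file argument, cat() prints the LaTeX code to the console
cat(toy_tab)
```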

image: kabletable1

Specify the column alignment

You can set the horizontal alignment of the table columns with the align argument of msummary(). It takes one letter per column, so align = "lccccc" below specifies that:

The first column with the variable names is left-aligned (l)
The second through sixth columns with the coefficients of the five models are centered (c)
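The one-letter-per-column idea can be tried on a small table. This sketch uses kableExtra's kbl() on a hypothetical three-column data frame (toy, toy_tab are illustration names), so the alignment string has three letters instead of six:

```r
library(kableExtra)

toy <- data.frame(term = c("Green rating", "Building size"),
                  m1 = c(0.035, 0.113), m2 = c(0.033, 0.102))

# One letter per column: "l" for the label column, "c" for the two model columns
toy_tab <- kbl(toy, format = "latex", booktabs = TRUE,
               align = c("l", "c", "c"))

cat(toy_tab)  # the tabular column specification reads {lcc}
```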

msummary(models,
         vcov = "HC1",
         fmt = 3,
         #stars  = c('*' = .1, '**' = 0.05, '***' = .01),
         estimate = "{estimate}",
         statistic = "[{std.error}]",
         coef_map = cm2,
         gof_omit = 'AIC|BIC|RMSE|Within|FE',
         gof_map = gm2,
         #stars = NULL,
         align = "lccccc",
         output = "latex",
         escape = FALSE)

Add fixed effect rows

In this step, we will add rows indicating the fixed effects included in each regression model.

Creating a tibble

First, the additional rows are specified in a tibble. Within the tibble:

There are six columns: term, with the row labels, and one column per model, "(1)" to "(5)".
The two rows specify the two kinds of fixed effects: Location Fixed Effect and Green Building Fixed Effect. For each model, the cells indicate whether that fixed effect is included: \Checkmark (a check mark) means included, and \XSolidBrush (a cross) means not included. These symbols come from the LaTeX package bbding, so add \usepackage{bbding} to the preamble of your LaTeX document. In the R strings, the backslash is doubled ('\\Checkmark').

library(tibble)

rows <- tribble(~term,          ~"(1)", ~"(2)", ~"(3)",~"(4)", ~"(5)",
                'Location Fixed Effect', '\\Checkmark',   '\\Checkmark', '\\Checkmark', '\\Checkmark', '\\Checkmark',
                'Green Building Fixed Effect', '\\XSolidBrush',   '\\XSolidBrush', '\\XSolidBrush', '\\XSolidBrush', '\\Checkmark'
)

Position of Fixed Effect rows in table

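By default, add_rows appends the extra rows to the bottom of the table. To insert the fixed effect rows at the right position, we set the position attribute of the tibble: attr(rows, 'position') <- c(33, 34). The 16 coefficients in cm2 each take two rows (estimate and standard error), so rows 33 and 34 fall right between the coefficient estimates and the goodness-of-fit statistics.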
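Instead of hard-coding 33 and 34, the positions can be derived from the number of coefficients. A minimal sketch, with a hypothetical two-model tibble fe_rows and the coefficient count of cm2 (16) typed in by hand:

```r
library(tibble)

# Hypothetical fixed effect rows for two models (illustration only)
fe_rows <- tribble(~term,                   ~`(1)`, ~`(2)`,
                   "Location Fixed Effect", "Yes",  "Yes",
                   "Green Building FE",     "No",   "Yes")

n_coef <- 16                   # number of coefficients in cm2, i.e. length(cm2)
first_row <- 2 * n_coef + 1    # each coefficient takes two rows (estimate + std. error)

attr(fe_rows, "position") <- first_row:(first_row + nrow(fe_rows) - 1)
attr(fe_rows, "position")      # 33 34
```

If a coefficient is added to or removed from the coefficient map, only n_coef has to change.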

add_rows

To add the tibble with the fixed effect rows to the table, include add_rows = rows in msummary(), where rows is the name of our tibble.

row_spec

To separate the fixed effect rows from the summary statistics, we add a horizontal line below the second fixed effect row (row 34). This is done by piping the msummary() output into the row_spec() function: extra_latex_after = "\\midrule" inserts a \midrule after the specified row.
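To see where row_spec() puts the extra line, here is a sketch on a hypothetical three-row table built with kbl(); the names toy and toy_tab are for illustration only. Row numbers count body rows, so row 2 is the last fixed effect row here:

```r
library(magrittr)    # for the %>% pipe
library(kableExtra)

toy <- data.frame(term = c("Location FE", "Green Building FE", "Sample size"),
                  m1 = c("Yes", "No", "8105"))

# Add \midrule after body row 2, i.e. between the FE rows and the statistics
toy_tab <- kbl(toy, format = "latex", booktabs = TRUE) %>%
  row_spec(2, extra_latex_after = "\\midrule")

cat(toy_tab)
```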

# Place the fixed effect rows at rows 33 and 34
attr(rows, 'position') <- c(33, 34)

msummary(models,
         vcov = "HC1",
         fmt = 3,
         #stars  = c('*' = .1, '**' = 0.05, '***' = .01),
         estimate = "{estimate}",
         statistic = "[{std.error}]",
         coef_map = cm2,
         gof_omit = 'AIC|BIC|RMSE|Within|FE',
         gof_map = gm2,
         #stars = NULL,
         add_rows = rows,
         align = "lccccc",
         output = "latex",
         escape = FALSE) %>%
  # horizontal line below the second fixed effect row (row 34)
  row_spec(34, extra_latex_after = "\\midrule") %>%
  cat(., file = "my_table.tex")

image: kabletable2

Add header

The add_header_above() function adds a header row above the existing column headers of the regression table. We pipe the msummary() output into this function to add a title above the model columns.

The header is a named vector: each name is the header text (a character string) and each value is the number of columns it spans.
The first column does not need a header, so it gets an empty name (" " = 1). Setting our header to 5 makes it span columns 2 to 6. With escape = FALSE, the $log(rent)$ in the header is rendered as LaTeX math.
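The spans of a header must add up to the number of table columns. A minimal sketch on a hypothetical three-column table built with kbl() (toy, header and toy_tab are illustration names), where the header spans the two model columns:

```r
library(magrittr)    # for the %>% pipe
library(kableExtra)

toy <- data.frame(term = c("Green rating", "Building size"),
                  m1 = c(0.035, 0.113), m2 = c(0.033, 0.102))

# Named vector: names are the header texts, values the number of columns spanned.
# The spans must add up to the number of columns (here 1 + 2 = 3).
header <- c(" " = 1, "Dependent Variable: $log(rent)$" = 2)
stopifnot(sum(header) == ncol(toy))

toy_tab <- kbl(toy, format = "latex", booktabs = TRUE, escape = FALSE) %>%
  add_header_above(header, escape = FALSE)

cat(toy_tab)  # the header spans the two model columns via \multicolumn{2}
```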

msummary(models,
         vcov = "HC1",
         fmt = 3,
         #stars  = c('*' = .1, '**' = 0.05, '***' = .01),
         estimate = "{estimate}",
         statistic = "[{std.error}]",
         coef_map = cm2,
         gof_omit = 'AIC|BIC|RMSE|Within|FE',
         gof_map = gm2,
         #stars = NULL,
         add_rows = rows,
         align = "lccccc",
         output = "latex",
         escape = FALSE) %>%
  # header spanning the five model columns (columns 2 to 6)
  add_header_above(c(" " = 1,
                     "Dependent Variable: $log(rent)$" = 5),
                   escape = FALSE) %>%
  cat(., file = "my_table.tex")

image3

Group rows

Add a labeling row

The pack_rows() function inserts labeling rows. We use it to group our variables into three categories: Building Class, Age and Stories.

For each category, we specify the label and the first and last row to group under it. Row numbers count every row of the table body, so each variable takes two rows (estimate and standard error). With italic = TRUE and bold = FALSE, the category name is printed in italics instead of the default bold.
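A small sketch of pack_rows() on a hypothetical four-row table built with kbl(), where two variables each take an estimate row and a standard error row (toy and toy_tab are illustration names):

```r
library(magrittr)    # for the %>% pipe
library(kableExtra)

# Toy table: each variable has an estimate row and a standard error row
toy <- data.frame(term = c("A (1 = yes)", "", "B (1 = yes)", ""),
                  m1   = c("0.231", "[0.012]", "0.101", "[0.011]"))

# Group body rows 1 to 4 under an italic, non-bold label
toy_tab <- kbl(toy, format = "latex", booktabs = TRUE) %>%
  pack_rows("Building Class:", 1, 4, italic = TRUE, bold = FALSE)

cat(toy_tab)
```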

Indent subgroups

Energystar and LEED are subgroups of the variable Green rating (row 1). Instead of creating a labeling row for them, we use add_indent() to list Energystar (row 3) and LEED (row 5) under Green rating by indenting them.
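The indentation can be checked on a hypothetical six-row table built with kbl() that mimics the first three variables (toy and toy_tab are illustration names). add_indent() also accepts a vector of positions, so both rows are indented in one call here:

```r
library(magrittr)    # for the %>% pipe
library(kableExtra)

toy <- data.frame(term = c("Green rating (1 = yes)", "", "Energystar (1 = yes)", "",
                           "LEED (1 = yes)", ""),
                  m1   = c("0.035", "[0.009]", "0.033", "[0.009]", "0.052", "[0.036]"))

# Indent body rows 3 and 5 (the estimate rows of the two subgroups)
toy_tab <- kbl(toy, format = "latex", booktabs = TRUE) %>%
  add_indent(c(3, 5))

cat(toy_tab)  # the indented labels start with \hspace{1em}
```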

msummary(models,
         vcov = "HC1",
         fmt = 3,
         #stars  = c('*' = .1, '**' = 0.05, '***' = .01),
         estimate = "{estimate}",
         statistic = "[{std.error}]",
         coef_map = cm2,
         gof_omit = 'AIC|BIC|RMSE|Within|FE',
         gof_map = gm2,
         #stars = NULL,
         add_rows = rows,
         align = "lccccc",
         output = "latex",
         escape = FALSE) %>%
  # labeling rows; each variable spans an estimate row and a standard error row
  pack_rows("Building Class:", 11, 14, italic = TRUE, bold = FALSE) %>%
  pack_rows("Age:", 17, 24, italic = TRUE, bold = FALSE) %>%
  pack_rows("Stories:", 27, 30, italic = TRUE, bold = FALSE) %>%
  # indent Energystar (row 3) and LEED (row 5) under Green rating
  add_indent(3) %>%
  add_indent(5) %>%
  row_spec(34, extra_latex_after = "\\midrule") %>%
  cat(., file = "my_table.tex")

image4: final table

{{% summary %}} In this building block, we showed how to use some kableExtra functions to improve your standard Modelsummary table and output it in LaTeX format! The following steps are covered:

Output in LaTeX format
Export the table
Specify the column alignment
Add fixed effect rows
Add header
Group rows {{% /summary %}}
