If strings contain non-breaking spaces (character code 160, `U+00A0`), the argument `trim_ws = TRUE` in `read_csv()` (or `read_delim()`) does not strip them. This was unexpected to me, since `str_trim()` from stringr (also in the tidyverse) does handle them.
Reprex:

```r
library(tidyverse)

###### Example with regular spaces ######

# Create a vector of strings, spaces represented by character code 32:
x <- c(intToUtf8(c(32, 65, 32, 119, 111, 114, 100)),                    # leading space
       intToUtf8(c(65, 32, 115, 101, 110, 116, 101, 110, 99, 101, 32))) # trailing space
x
#> [1] " A word"     "A sentence "

# Save as a csv
write_csv(data.frame(x), "reg_spaces")

# Read back in as csv
x2 <- read_csv("reg_spaces")
#> Rows: 2 Columns: 1
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): x
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Check: no leading or trailing spaces! :D
x2$x
#> [1] "A word"     "A sentence"

# Check other functions:
trimws(x)   # Works
#> [1] "A word"     "A sentence"
str_trim(x) # Works
#> [1] "A word"     "A sentence"

###### Example with non-breaking spaces ######

# Create a vector of strings, spaces represented by character code 160
# (non-breaking spaces):
y <- c(intToUtf8(c(160, 65, 32, 119, 111, 114, 100)),
       intToUtf8(c(65, 32, 115, 101, 110, 116, 101, 110, 99, 101, 160)))
y
#> [1] " A word"     "A sentence "

# Write out as a csv and read back in:
write_csv(data.frame(y), "nonbreak_spaces")
y2 <- read_csv("nonbreak_spaces")
#> Rows: 2 Columns: 1
#> ── Column specification ────────────────────────────────────────────────────────
#> Delimiter: ","
#> chr (1): y
#>
#> ℹ Use `spec()` to retrieve the full column specification for this data.
#> ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.

# Check: still has leading and trailing spaces... :(
y2$y
#> [1] " A word"     "A sentence "

# Check other functions:
trimws(y)   # Does not work
#> [1] " A word"     "A sentence "
str_trim(y) # Works!
#> [1] "A word"     "A sentence"
```
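In the meantime, since `str_trim()` does handle `U+00A0`, a post-read cleanup works as a stopgap. A minimal sketch continuing the reprex session above (the file name and column come from the reprex; the base-R alternative assumes R >= 3.6.0, where `trimws()` gained a `whitespace` argument):

```r
library(tidyverse)

# Trim all character columns after reading, since str_trim()
# treats the non-breaking space (U+00A0) as whitespace:
y2 <- read_csv("nonbreak_spaces") %>%
  mutate(across(where(is.character), str_trim))
y2$y
#> [1] "A word"     "A sentence"

# Base R alternative: widen trimws()'s whitespace class to
# horizontal/vertical whitespace, which includes U+00A0:
trimws(y, whitespace = "[\\h\\v]")
#> [1] "A word"     "A sentence"
```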
IRL situation: I copied text from a website (a list of countries separated by commas) into an Excel (csv) spreadsheet and applied "Text to Columns", using a comma as the separator, to place each country name in a separate column. I ignored the leading white space, assuming `read_csv()` would take care of it, but it did not. After some research, it appears that the csv kept the non-breaking spaces from the website, and `read_csv()` does not remove them.
Looking at the underlying code, I think `parse.cpp` (in the function `parse_vector_`) could be adjusted to explicitly remove leading and trailing non-breaking spaces. Or it could be handled in the header file `Token.h`: at lines 119 and 121, treat `\u00A0` as whitespace to trim, in addition to `' '` and `'\t'`.
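One wrinkle with the `Token.h` route (my observation, not something from the readr source): in UTF-8 the non-breaking space is a two-byte sequence, so it cannot be matched by a single-`char` comparison the way `' '` and `'\t'` are; the trim loop would need to recognize the byte pair:

```r
# U+00A0 is encoded as two bytes in UTF-8, unlike the one-byte
# ' ' and '\t' that the tokenizer currently compares against:
charToRaw(intToUtf8(160))
#> [1] c2 a0
charToRaw(" ")
#> [1] 20
```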