Different behavior of column remove for separate_*() and separate() #1587

Open
Edgar-Zamora opened this issue Jan 24, 2025 · 0 comments
Issue

While updating code to move from separate() to separate_*(), I noticed that the final position of the column being separated differs between the two. This is not a breaking change, but the behavior is unexpected and could cause problems when columns are selected by position.
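To make the position concern concrete, here is a minimal sketch, assuming dplyr is loaded alongside tidyr. The names `old` and `new` are just illustrative. The same positional selection picks a different column depending on which function produced the data frame.

```r
library(tidyr)
library(dplyr)

df <- tibble(x = c("a_b", "c_d"), x2 = c("k_l", "m_n"))

# separate() keeps x2 where it was (second column)
old <- df |>
  separate(x2, into = c("xx1", "xx2"), sep = "_", remove = FALSE)

# separate_wider_delim() moves x2 after the new columns
new <- df |>
  separate_wider_delim(x2, names = c("xx1", "xx2"), delim = "_", cols_remove = FALSE)

# Selecting the second column by position gives different results
names(select(old, 2))
#> [1] "x2"
names(select(new, 2))
#> [1] "xx1"
```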

Proposed Change

I think the easiest non-breaking change would be to note in the documentation that the position of the column being separated is not preserved.

Example

  • In separate(), the original column stays in its original position when remove = FALSE.
  • In separate_wider_*(), setting cols_remove = FALSE places the original column after the newly created columns.
library(tidyverse) # attaches tidyr, which provides tibble() and the separate functions

df <- tibble(x = c("a_b", "c_d", "e_f", "g_h", "i_j"),
             x2 = c("k_l", "m_n", "o_p", "q_r", "x_y"))

# using separate
df |> 
  separate(x, into = c('xx1', 'xx2'), sep = '_', remove = FALSE)
#> # A tibble: 5 × 4
#>   x     xx1   xx2   x2   
#>   <chr> <chr> <chr> <chr>
#> 1 a_b   a     b     k_l  
#> 2 c_d   c     d     m_n  
#> 3 e_f   e     f     o_p  
#> 4 g_h   g     h     q_r  
#> 5 i_j   i     j     x_y


df |> 
  separate(x2, into = c('xx1', 'xx2'), sep = '_', remove = FALSE)
#> # A tibble: 5 × 4
#>   x     x2    xx1   xx2  
#>   <chr> <chr> <chr> <chr>
#> 1 a_b   k_l   k     l    
#> 2 c_d   m_n   m     n    
#> 3 e_f   o_p   o     p    
#> 4 g_h   q_r   q     r    
#> 5 i_j   x_y   x     y


# delim
df |> 
  separate_wider_delim(x2, names = c('xx1', 'xx2'), delim = '_', cols_remove = FALSE)
#> # A tibble: 5 × 4
#>   x     xx1   xx2   x2   
#>   <chr> <chr> <chr> <chr>
#> 1 a_b   k     l     k_l  
#> 2 c_d   m     n     m_n  
#> 3 e_f   o     p     o_p  
#> 4 g_h   q     r     q_r  
#> 5 i_j   x     y     x_y

# position
df |> 
  separate_wider_position(x2, widths = c(xx1 = 1, 1, xx2 = 1), cols_remove = FALSE)
#> # A tibble: 5 × 4
#>   x     xx1   xx2   x2   
#>   <chr> <chr> <chr> <chr>
#> 1 a_b   k     l     k_l  
#> 2 c_d   m     n     m_n  
#> 3 e_f   o     p     o_p  
#> 4 g_h   q     r     q_r  
#> 5 i_j   x     y     x_y

# regex
df |> 
  separate_wider_regex(x2, patterns = c(xx1 = ".", "_", xx2 = "."), cols_remove = FALSE)
#> # A tibble: 5 × 4
#>   x     xx1   xx2   x2   
#>   <chr> <chr> <chr> <chr>
#> 1 a_b   k     l     k_l  
#> 2 c_d   m     n     m_n  
#> 3 e_f   o     p     o_p  
#> 4 g_h   q     r     q_r  
#> 5 i_j   x     y     x_y

Created on 2025-01-23 with reprex v2.1.0
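
In the meantime, one possible workaround is to move the original column back explicitly. This is only a sketch: it assumes dplyr is available, and `out` is just an illustrative name. It uses `dplyr::relocate()` to put the source column in front of the first new column, which matches the layout separate() produces with remove = FALSE.

```r
library(tidyr)
library(dplyr)

df <- tibble(x = c("a_b", "c_d"), x2 = c("k_l", "m_n"))

out <- df |>
  separate_wider_delim(x2, names = c("xx1", "xx2"), delim = "_", cols_remove = FALSE) |>
  # Move the original column back in front of the first new column
  relocate(x2, .before = xx1)

names(out)
#> [1] "x"   "x2"  "xx1" "xx2"
```

Selecting columns by name instead of by position avoids the problem altogether, so the `relocate()` step only matters for code that depends on column order.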
