Convert fseq to function #106

Closed
wants to merge 16 commits into from

krlmlr
Member

@krlmlr krlmlr commented Oct 9, 2015

Example:

> as.function(. %>% rexp(n=10) %>% sort)
function (.) 
{
    . <- rexp(., n = 10)
    sort(.)
}

This gives much shorter call stacks and better error messages. Example with options(error = expression(traceback(1))):

> 5 %>% runif %>% paste(collapse=" ") %>% stop
Error in magrittr(.) : 
  0.668534632306546 0.386715253116563 0.194241847842932 0.261686375364661 0.222338400548324
3: stop(.)
2: magrittr(.) at pipe.R#39
1: 5 %>% runif %>% paste(collapse = " ") %>% stop

Another advantage is that visibility is handled implicitly.

Calling an fseq still uses freduce(). I think we can get rid of freduce() entirely and implement the fseq as a function right away. This requires some more work, so we should discuss it first.

A functional sequence is now a plain old function with class fseq and an attribute magrittr:function_list that holds the list of functions (previously env[["_function_list"]]). The env with all the underscore-prefixed variables has gone. I think this simplifies things a lot.
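
As a rough illustration of the unrolled form described above, here is a minimal base R sketch. It is not the code from this pull request: `unroll_pipe()` is a hypothetical helper, and the attribute here stores the unevaluated calls purely for illustration (the description above says it holds a list of functions).

```r
# Hypothetical helper: build an unrolled function from a list of
# unevaluated calls that each take `.` as their input.
unroll_pipe <- function(calls, env = parent.frame()) {
  force(env)
  n <- length(calls)
  # Every step but the last becomes `. <- step`; the last call is the result.
  steps <- lapply(calls[-n], function(cl) call("<-", quote(.), cl))
  body <- as.call(c(list(quote(`{`)), steps, calls[n]))
  # as.function() takes the formals followed by the body as the last element.
  fun <- as.function(c(alist(. = ), list(body)), envir = env)
  attr(fun, "magrittr:function_list") <- calls
  fun
}

f <- unroll_pipe(list(quote(rexp(., n = 10)), quote(sort(.))))
body(f)
length(f(2))
#> [1] 10
```

Because the result is an ordinary closure, a traceback through it shows the individual calls rather than the pipe machinery.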

All tests pass locally. Closes #107, which adds more tests and is included here. Also added documentation stub in vignette.

The idea is stolen from vadr's mkchain() function; the implementation is mine.

CC @crowding, @gaborcsardi.

Related: #94, #95

Probably breaks #70

@gaborcsardi
Member

Excellent idea! @smbache, did you consider this approach? There might be some drawbacks of course, but at first glance it looks very neat.

@smbache
Member

smbache commented Oct 18, 2015

From a conceptual point of view, I like the "purity" of each RHS being its own function, with the functions applied sequentially, rather than a sequence of

...

. <- rhs1(...)
. <- rhs2(...)

...

On the other hand I can see that debugging (and perhaps tamper) may benefit from this other approach.

@hadley (Dr. purrr) do you have an opinion about this?

Another, perhaps minor (not sure) point is that this would break indexing the functional sequences via [ and [[.

@krlmlr
Member Author

krlmlr commented Oct 18, 2015

I've been using this code since I created it, and I literally forgot how painful the debugging of pipes used to be.

We can easily keep the storage format for the fseq, and cache the evaluator function redundantly. If lhs is ., we may want to byte-compile the returned function. I'll do some more work here, perhaps also a benchmark, so that we can discuss further in a second iteration.

Kirill Müller added 9 commits October 18, 2015 16:43
- an fseq is now a function with an attribute magrittr:function_list that contains the list of functions
- the function contains an unrolled representation of the pipe, a sequence of assignments . <- f(., ...) followed by a final function call
- converting an fseq to a function happens simply by removing the fseq class
@krlmlr
Member Author

krlmlr commented Oct 18, 2015

Updated the original description. I'm still not sure what the overhead of assembling the function is, and probably there's no point in forcibly byte-compiling. Ready for review.

The freduce() function is not needed anymore, either.

@krlmlr krlmlr changed the title from "WIP: Convert fseq to function" to "Convert fseq to function" Oct 18, 2015
@hadley
Member

hadley commented Oct 19, 2015

I don't have any strong feelings either way, but I do like that you can see exactly what magrittr is doing behind the scenes. It will make clear to people why magrittr doesn't (by design) work well with functions that use NSE.
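
A small base R illustration of that point, using a made-up function `label_of()` that relies on non-standard evaluation. Inside the unrolled body every step receives the placeholder, so `substitute()` can only ever see `.`:

```r
# A function that uses non-standard evaluation to label its argument.
label_of <- function(x) deparse(substitute(x))

# Called directly, it sees the expression the user wrote.
label_of(mtcars)
#> [1] "mtcars"

# Inside an unrolled pipeline, the argument is always the placeholder.
unrolled <- function(.) {
  label_of(.)
}
unrolled(mtcars)
#> [1] "."
```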

@smbache
Member

smbache commented Oct 26, 2015

I have made a branch "simplified" with the following changes:

  • No more fseq and freduce: only composition of a single pipelined function.
  • No more aliases, per the request in "Drop aliases?" #108 (not sure how I feel about this, but let's play with the idea)
  • Removed tests related to the above
  • Removed vignette until settling on what's to be included, etc.

This has simplified the package a great deal, and several files were deleted.

cc @hadley @gaborcsardi @krlmlr

@krlmlr
Member Author

krlmlr commented Oct 26, 2015

I'm glad you like the idea, please feel free to take over at this point. Removal of [ and [[, and of the fseq class, is a breaking change, but this might have been a rarely used feature indeed.

krlmlr referenced this pull request Oct 26, 2015
Removed aliases.
Removed [ and [[ getters (as there are no more fseqs; they could be implemented, though)
Removed no-longer-needed tests
@smbache
Member

smbache commented Oct 26, 2015

I'm not really sure what I like best, but I probably wouldn't want to do both; too much complexity for little value. I guess the class could still be there, and a getter could be defined that subsets the pipeline. It wouldn't be too difficult.
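
One way such a getter could look, as a sketch only: the class name `pipeline_sketch`, the constructor `new_pipeline()`, and the attribute name `function_list` are all made up here and are not part of magrittr.

```r
# Hypothetical constructor: compose a list of one-argument functions
# into a single function and keep the list as an attribute.
new_pipeline <- function(funs) {
  force(funs)
  composed <- function(value) {
    for (f in funs) value <- f(value)
    value
  }
  structure(composed, class = "pipeline_sketch", function_list = funs)
}

# `[` returns a shorter pipeline, `[[` returns a single step.
`[.pipeline_sketch` <- function(x, i) {
  new_pipeline(attr(x, "function_list")[i])
}
`[[.pipeline_sketch` <- function(x, i) {
  attr(x, "function_list")[[i]]
}

p <- new_pipeline(list(function(.) . + 1, function(.) . * 2, sqrt))
p(7)        # sqrt((7 + 1) * 2)
#> [1] 4
p[1:2](7)   # (7 + 1) * 2
#> [1] 16
p[[3]](9)   # sqrt(9)
#> [1] 3
```

The subset is rebuilt from the stored list, so the composed function and the list cannot drift apart.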

@smbache
Member

smbache commented Oct 27, 2015

seems to be a bit faster than the classic version as well...

@krlmlr
Member Author

krlmlr commented Nov 12, 2015

@smbache @jimhester @kevinushey This breaks r-lib/lintr@4369aa80092. Here's a session using Stefan's "simplified" branch for an empty package with an empty source file:

> lintr::lint_package()

> lintr::lint_package()
.Error in "\t" %>% one_or_more() : could not find function "split_chain"
32: "\t" %>% one_or_more()
31: eval(expr, envir, enclos)
30: eval(x$expr, data, x$env) at eval.R#27
29: FUN(X[[i]], ...)
28: lapply(x, lazy_eval, data = data) at eval.R#21
27: lazyeval::lazy_eval(args, as.list(.rex$env))
26: escape(lazyeval::lazy_eval(args, as.list(.rex$env)))
25: paste(sep = "", collapse = "", ...)
24: structure(x, class = "regex")
23: regex(paste(sep = "", collapse = "", ...))
22: p(escape(lazyeval::lazy_eval(args, as.list(.rex$env))))
21: structure(x, class = "regex")
20: regex(p(escape(lazyeval::lazy_eval(args, as.list(.rex$env)))))
19: rex_(args, env)
18: rex("\t" %>% one_or_more())
17: add_options(pattern, options)
16: re_matches(source_file$lines, rex("\t" %>% one_or_more()), locations = TRUE, 
        global = TRUE)
15: linters[[linter]](expr)
14: inherits(x, class)
13: assign_item(x)
12: flatten_list(x, class = "lint")
11: structure(flatten_list(x, class = "lint"), class = "lints")
10: flatten_lints(linters[[linter]](expr))
9: lint(file, ..., parse_settings = FALSE)
8: FUN(X[[i]], ...)
7: lapply(files, function(file) {
       if (interactive()) {
     ...
6: inherits(x, class)
5: assign_item(x)
4: flatten_list(x, class = "lint")
3: structure(flatten_list(x, class = "lint"), class = "lints")
2: flatten_lints(lapply(files, function(file) {
       if (interactive()) {
     ...
1: lintr::lint_package()
> devtools::session_info()
Session info ------------------------------------------------------------------------------------------------------------------------------------------
 setting  value                       
 version  R version 3.2.2 (2015-08-14)
 system   x86_64, linux-gnu           
 ui       RStudio (0.99.486)          
 language en_US:en                    
 collate  en_US.UTF-8                 
 tz       <NA>                        
 date     2015-11-12                  

Packages ----------------------------------------------------------------------------------------------------------------------------------------------
 package    * version     date       source        
 devtools     1.9.1.9000  2015-11-03 local         
 digest       0.6.8       2014-12-31 CRAN (R 3.2.0)
 igraph       1.0.1       2015-06-26 CRAN (R 3.2.1)
 knitr        1.11        2015-08-14 CRAN (R 3.2.1)
 lazyeval     0.1.10.9000 2015-08-21 local         
 lintr        0.3.3       2015-11-12 local         
 magrittr   * 1.5         2015-11-12 local         
 memoise      0.2.99.9000 2015-10-08 local         
 rex          1.0.1       2015-04-28 CRAN (R 3.2.0)
 rstudioapi   0.3.1       2015-04-07 CRAN (R 3.2.0)
 ulimit       0.0-2       2015-04-14 local         

@kevinushey
Contributor

The error seems to imply an expression that depends on split_chain() is being evaluated in an environment where it's not available; no idea what could be causing that.

@gaborcsardi
Member

@smbache Seems like it. You could do this in .onLoad in rex, and then it happens at run time (well, load time, really).

@smbache
Member

smbache commented Nov 13, 2015

Yeah; I'm still not sure that this particular issue is an argument for safeguarding the private API, e.g. via the suggestion to have helper functions defined inside the pipe... It's more an issue on the importing side than on the exporting side, no?

@gaborcsardi
Member

I think so. If you just take an object from another package, and put it in your package, then the responsibility is yours....

@krlmlr
Member Author

krlmlr commented Nov 13, 2015

Agreed so far. Thanks for the insights.

@kevinushey: Can you call register() in .onLoad() in your package, as suggested by Gábor? Currently, it seems to happen during build time, and this means that your copy of the function (in the hidden environment) may have a different implementation (requiring internal APIs not available anymore) than the installed version.

@jimhester
Contributor

FWIW this same procedure, importing %>%, then exporting is used in dplyr, ggvis, rvest among many others, so this same issue will happen with all these packages as well until they are re-installed.

We can upload a new version of rex to CRAN after the magrittr release to encourage people to update. But because the user-side fix is straightforward I don't think it is worth changing the method of import.

@jimhester
Contributor

Actually I just read Gabor's comment about the register call, so I see this is actually a rex-specific issue. I can change the code to call register in .onLoad, which should fix it. Disregard previous message.

@smbache
Member

smbache commented Nov 13, 2015

but it is still an interesting discussion about where to place the "responsibility"...

@smbache
Member

smbache commented Nov 13, 2015

I guess if the pipe is to be re-exported, there is no choice but to live with "build-time import"?

@gaborcsardi
Member

@smbache You can export a dummy, and then replace it with the run time imported one in .onLoad.

@krlmlr
Member Author

krlmlr commented Nov 13, 2015

@smbache, @gaborcsardi: That's what I thought, too -- but reexporting happens via NAMESPACE, isn't this a different mechanism?

@gaborcsardi
Member

@krlmlr I am not sure what re-exporting does TBH.

EDIT: but it is easy to try.

@gaborcsardi
Member

In any case, if re-exporting is not good (I suspect it is not), then the dummy, replaced in .onLoad will still work imo.

@krlmlr
Member Author

krlmlr commented Nov 13, 2015

A reprex would be great. Anyone?

@smbache
Member

smbache commented Nov 13, 2015

I'm too old to know what that means 😆 well I have an idea, but not sure what you want exactly.

jimhester added a commit to r-lib/rex that referenced this pull request Nov 13, 2015
@jimhester
Contributor

jimhester commented Nov 13, 2015

A package with a NAMESPACE of

importFrom(magrittr,"%>%")
export("%>%")

And the following somewhere in R/

if ("split_chain" %in% codetools::findGlobals(`%>%`, merge = FALSE)$functions) message("using old magrittr") else message("using new magrittr")

Should be a simple reproducible example.

@krlmlr
Member Author

krlmlr commented Nov 13, 2015

My reprex tests:

  • Two versions of the same package AA with the same public API but different private APIs
  • One package BB that reexports the public API from AA
  • Installing first version of AA, BB, then second version of AA
  • Calling the reexported API from BB

I don't see irregularities: The correct private API is called without reinstalling BB.

@gaborcsardi
Member

@krlmlr Sure, this is fine, of course. You would need to do

test <- private_fun_a1

here: https://github.com/krlmlr/imexport.reprex/blob/master/A1/R/a.R#L2

And similarly in the other package.

The problem with rex is that it stores a copy of %>% in an environment. If it just called %>%, then it would be fine.
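
The difference between a stored copy and a call-time lookup can be simulated with plain environments. This is a simplified stand-in, not the actual rex or magrittr code: `ns` plays the role of a package namespace, and all function names are hypothetical.

```r
# Stand-in for a package namespace; every name here is hypothetical.
ns <- new.env()
ns$old_helper <- function(x) x + 1
ns$public <- local(function(x) old_helper(x), envir = ns)

stale_copy <- ns$public                    # copy of the object, taken once
call_by_name <- function(x) ns$public(x)   # looked up again on every call

# "Upgrade": the private helper is renamed and the public function rewritten.
rm("old_helper", envir = ns)
ns$new_helper <- function(x) x + 1
ns$public <- local(function(x) new_helper(x), envir = ns)

call_by_name(1)
#> [1] 2

# The stale copy still has the old body, which asks for a helper that is gone.
result <- try(stale_copy(1), silent = TRUE)
inherits(result, "try-error")
#> [1] TRUE
```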

@krlmlr
Member Author

krlmlr commented Nov 13, 2015

@gaborcsardi: You mean I'd need to <- to break it? I guess this would do, but I think most packages are rather interested in a working solution.

@gaborcsardi
Member

@krlmlr Yes, to break it. Or to see why rex is currently broken.

I am not sure why rex needs to store %>% locally? To reexport it?

If you really need to store an actual object (instead of a reference) from another package, then IMO the only working solution is to import/create it in .onLoad.
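
A minimal sketch of that pattern, under stated assumptions: `.cache` and the package name "mypkg" are invented, and `stats::median` stands in for the imported object only so that the snippet runs without magrittr installed. In a real package the call would fetch whatever object is needed, for example the pipe.

```r
# Package-level storage; the name `.cache` is made up for this sketch.
.cache <- new.env(parent = emptyenv())

.onLoad <- function(libname, pkgname) {
  # Fetch the object when the package is loaded, not when it is built,
  # so the stored copy always matches the installed version of the exporter.
  .cache$imported <- getExportedValue("stats", "median")
}

# Simulate what R does when the package is loaded.
.onLoad(NULL, "mypkg")
.cache$imported(c(1, 5, 3))
#> [1] 3
```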

@jimhester
Contributor

rex only exposes most of its functions within a rex() expression (including %>%). Exporting %>% is legacy behavior and should be removed.

@krlmlr
Member Author

krlmlr commented Nov 16, 2015

Instead of calling the constructed function, perhaps we should simply evaluate the body of that function in the parent frame? I think this would finally solve #38.

Downside~~/Feature~~: The . would escape to the calling environment.

@smbache
Member

smbache commented Nov 16, 2015

IMO that's an unacceptable downside.

@krlmlr
Member Author

krlmlr commented Nov 18, 2015

I agree that the leaking . can be surprising. But I can think of ways to fix that:

  • Restore the original value of . on exit if it existed before, remove otherwise
    • Is this possible if . is a promise?
  • Use a unique identifier in the generated function instead of .
    • We probably still want to remove this unique identifier from the caller's environment.
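
The first option could look roughly like the sketch below. `eval_steps_in_caller()` is a hypothetical helper, not magrittr code, and it sidesteps the promise question by forcing any existing `.` when saving it.

```r
# Hypothetical helper: run unrolled pipeline steps in the caller's frame
# and put `.` back the way it was afterwards.
eval_steps_in_caller <- function(steps, env = parent.frame()) {
  force(env)
  had_dot <- exists(".", envir = env, inherits = FALSE)
  if (had_dot) {
    old_dot <- get(".", envir = env, inherits = FALSE)  # forces a promise
  }
  on.exit({
    if (had_dot) {
      assign(".", old_dot, envir = env)
    } else if (exists(".", envir = env, inherits = FALSE)) {
      rm(list = ".", envir = env)
    }
  })
  result <- NULL
  for (step in steps) {
    result <- eval(step, env)          # each step can see the caller's variables
    assign(".", result, envir = env)   # and the previous result as `.`
  }
  result
}

demo <- function() {
  . <- "keep me"
  x <- c(3, 1, 2)
  out <- eval_steps_in_caller(list(quote(x), quote(sort(.)), quote(rev(.))))
  list(out = out, dot = .)
}
str(demo())
```

In `demo()`, the pipeline result is `3 2 1` and the caller's own `.` is still "keep me" afterwards.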

@gaborcsardi
Member

> Restore the original value of . on exit if it existed before, remove otherwise
> Is this possible if . is a promise?

I think this is messy. You can restore promises, but need to write C code for it imo.

> Use a unique identifier in the generated function instead of .
> We probably still want to remove this unique identifier from the caller's environment.

If you generate a random id every time that might work.

This said, I am not a fan of evaluating the body, it seems somewhat messy. You lose the nice call stack as well. I think it is better to call the function.

@smbache
Member

smbache commented Nov 18, 2015

It's a very impure approach. Perhaps it could work, but I'm very much against it.

@krlmlr
Member Author

krlmlr commented Dec 30, 2017

The update branch looks much better than this. Looking forward to seeing it released, because it also helps a lot with profiling code that uses pipes.

@krlmlr krlmlr closed this Dec 30, 2017
@wlandau

wlandau commented Jan 9, 2020

Does this story continue? Until I saw this thread, I had been using the following to de-pipe code for profiling, and I am looking for a better alternative.

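# Turn a pipe chain into a block of `. <- step` assignments.
# Relies on the unexported magrittr:::split_chain(), so it only works with
# magrittr versions that still have that internal function.
depipe <- function(expr) {
  expr <- substitute(expr)                 # capture the chain unevaluated
  chain <- magrittr:::split_chain(expr)    # split into lhs and right-hand sides
  calls <- c(chain$lhs, chain$rhs)
  # Wrap every piece as `. <- piece`
  calls <- purrr::map(calls, ~as.call(list(quote(`<-`), quote(.), .x)))
  as.call(c(quote(`{`), calls))            # assemble the `{ ... }` block
}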

depipe(
  mtcars %>%
    group_by(cyl) %>%
    summarize(mpg = mean(mpg)) %>%
    ungroup()
)
#> {
#>     . <- mtcars
#>     . <- group_by(., cyl)
#>     . <- summarize(., mpg = mean(mpg))
#>     . <- ungroup(.)
#> }
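
For comparison, a base R sketch that does not depend on magrittr internals. It is an illustration only: `uses_dot()`, `as_step()`, and `depipe_simple()` are hypothetical helpers, and they handle only plain `%>%` chains whose right-hand sides are function names or ordinary calls (no braces, lambdas, or other pipe operators).

```r
# Does a call mention `.` among its direct arguments?
uses_dot <- function(call) {
  any(vapply(as.list(call)[-1], identical, logical(1), quote(.)))
}

# Turn one right-hand side into a call on `.`.
as_step <- function(rhs) {
  if (is.symbol(rhs)) return(as.call(list(rhs, quote(.))))  # f       -> f(.)
  if (uses_dot(rhs)) return(rhs)                            # f(x, .) stays as is
  as.call(c(list(rhs[[1]], quote(.)), as.list(rhs)[-1]))    # f(x)    -> f(., x)
}

depipe_simple <- function(expr) {
  expr <- substitute(expr)
  steps <- list()
  # `%>%` is left-associative, so peel `lhs %>% rhs` from the outside in.
  while (is.call(expr) && identical(expr[[1]], quote(`%>%`))) {
    steps <- c(list(as_step(expr[[3]])), steps)
    expr <- expr[[2]]
  }
  assigns <- lapply(c(list(expr), steps), function(s) call("<-", quote(.), s))
  as.call(c(list(quote(`{`)), assigns))
}

depipe_simple(mtcars %>% subset(cyl == 4) %>% nrow)
#> {
#>     . <- mtcars
#>     . <- subset(., cyl == 4)
#>     . <- nrow(.)
#> }
```

The expression is only rewritten, never evaluated, so neither magrittr nor the piped functions need to be loaded to produce the block.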
