Grammar reference #66
Comments
You're right there's no formal grammar reference. The closest we have is the Getting started section in our docs. Nevertheless, formulae and formulaic should agree in the design matrices that are constructed from the same formula. The internals are different, but the results should match. One of the most relevant differences is that formulae implements the |
Thank you. How does it compare to R? We are in the process of migrating an R package to python, it would be useful to know them. |
Could you tell me which features you're interested in? That could help to point you to relevant differences. Formulae is very similar to model formulas in R. It still lacks the double pipe |
I was wondering about diff or lag operators for example. It is not clear to me how vector operators are handled. |
I think this is going to be clearer if you share a concrete R example with both the input and the output. In particular, how do you use |
Something like below:
As you can see diff and lag apply to the whole 'vector', they are not scalar functions. |
@teucer could you share a reproducible example? That's what I mean with both input and output. I understand the formula, but I don't know what is the expected behavior from that formula. What I would like to have is the following
As far as I know, the result of
> x <- 1:5
> length(lag(x))
# [1] 5
> length(diff(x))
# [1] 4 |
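For comparison, here is a minimal pure-Python sketch of the same length behavior. It assumes a lag that pads the front with a fill value; the helper names `lag` and `diff` are illustrative and are not part of formulae. It shows why a length-preserving lag can be aligned with the rows of a data set, while a plain difference returns one element fewer and has to be padded first.

```python
def lag(values, num=1, fill=None):
    """Shift values right by num positions (num >= 0), padding the front.

    The result always has the same length as the input.
    """
    n = len(values)
    num = min(num, n)  # a lag longer than the input yields only fill values
    return [fill] * num + list(values[: n - num])


def diff(values):
    """First differences; the result is one element shorter than the input."""
    return [b - a for a, b in zip(values, values[1:])]


x = [1, 2, 3, 4, 5]
print(len(lag(x)))   # 5
print(len(diff(x)))  # 4

# Padding the front restores row alignment with the original data.
padded_diff = [None] + diff(x)
print(len(padded_diff))  # 5
```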
Sorry for the lengthy discussion, I am not the author of the package that we are trying to convert. It seems that people have implemented their own
# this is not a scalar function
import numpy as np

def lag(arr, num=1, fill_value=np.nan):
    if num == 0:
        # arr[:-0] is an empty slice, so a zero lag needs its own branch
        return np.array(arr)
    if num > 0:
        # shift right: pad the front, drop the last `num` values
        return np.concatenate((np.full(num, fill_value), arr[:-num]))
    # shift left: drop the first `-num` values, pad the end
    return np.concatenate((arr[-num:], np.full(-num, fill_value)))

xs = np.arange(10)
print(lag(xs)) # > [nan 0. 1. 2. 3. 4. 5. 6. 7. 8.]
dm = design_matrices("y1 ~ lag(x)", data)
Now, it would be useful if we could pass our own additional transformations. I think you are using Environment to do that. What about passing a dictionary of transformations? It might be the case that we are defining them in another file.
# we want to avoid star import: "from .utils import *"
from .utils import lag

def myfun(x):
    return x + 1

trans_dict = {"myfun": myfun, "lag": lag}
dm = design_matrices("y1 ~ myfun(x)", data, transformations=trans_dict) |
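The `transformations=` argument in the snippet above is a proposal, not an existing parameter. A minimal sketch of the idea in plain Python, independent of formulae's internals: the function `evaluate_term` and its parameters are hypothetical, and the sketch only shows how names in a call expression could be looked up in a user-supplied dictionary alongside the data columns.

```python
# Hypothetical illustration only; none of these names are part of formulae.

def myfun(x):
    """Example user transformation: add one to every value."""
    return [value + 1 for value in x]


def evaluate_term(expression, data, transformations=None):
    """Evaluate a single expression such as "myfun(x)".

    Names are looked up in one namespace built from the user-supplied
    transformations and the data columns. Data columns are added last,
    so a column shadows a transformation with the same name (an
    arbitrary choice made for this sketch).
    """
    namespace = dict(transformations or {})
    namespace.update(data)
    # An empty __builtins__ keeps built-in names out of the expression.
    return eval(expression, {"__builtins__": {}}, namespace)


data = {"x": [1, 2, 3]}
trans_dict = {"myfun": myfun}
result = evaluate_term("myfun(x)", data, transformations=trans_dict)
print(result)  # [2, 3, 4]
```

Passing the dictionary explicitly means the transformations can live in another module and be imported by name, without relying on what happens to be in the caller's scope.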
Thanks for the comments clarifying my question!
Lines 524 to 533 in 38d7f42
formulae/formulae/transforms.py Lines 401 to 412 in 38d7f42
which is what Formulae consumes when looking for internal transformations. However I don't think this is a clean solution.
|
PS: could do PRs if required. |
It would be good if you could try to work on a PR for 2 (allowing to keep missing values) and 3, allowing to pass a dictionary with transformations. For 3, have a look at Line 513 in 38d7f42
and https://github.com/bambinos/formulae/blob/master/formulae/environment.py and feel free to ask questions. As a pointer, I would create a new Environment holding the transforms you want and/or use |
Yes. Can we do a release? |
The development version has a lot of breaking changes. I want to double check a couple of things before doing a release. |
Ok. Is there an ETA? PS: My main use case is behind corporate barriers, it would be difficult to install from github. We have an internal pypi proxy. |
I've been thinking about the release and I think it's not going to be a problem to have a release now. We need to update the Changelog first. I just wanted to add tests for some features we don't have covered yet and test whether this development version was OK for what I want to do in Bambi. I guess I can do another release if I need to include more changes for Bambi. |
Thank you for the support. Looking forward to the new release. PS: I would be happy to contribute further if you have more tasks. If you add tasks and tag them (e.g. "need contributor"), it would be easier for me to pick them up and send PRs. |
I'm having a problem building the docs, see https://github.com/bambinos/formulae/runs/5511845996?check_suite_focus=true I don't know what's going on. I tried to reproduce the problem locally (same version of everything) but I couldn't. Edit: It drove me nuts. I fixed the error by changing how |
@teucer there's a new release now :) |
I could not find any reference on the grammar that is supported. How does it compare to formulaic?