Document verbs and more operators #46

Merged 17 commits on Jan 2, 2025
3 changes: 3 additions & 0 deletions docs/source/conf.py
@@ -55,6 +55,9 @@
"member-order": "bysource",
}

autodoc_class_signature = "separated"
autodoc_default_options = {"exclude-members": "__new__"}

autosectionlabel_prefix_document = True

toc_object_entries_show_parents = "all"
1 change: 0 additions & 1 deletion docs/source/examples.md
@@ -11,7 +11,6 @@ Some examples how to use pydiverse.transform:
* [Best practices / beware the flatfile & embrace working with entities](/examples/best_practices_entities)

```{toctree}
/quickstart
/examples/joining
/examples/aggregations
/examples/window_functions
4 changes: 2 additions & 2 deletions docs/source/examples/aggregations.md
@@ -8,8 +8,8 @@ from pydiverse.transform.extended import *

tbl1 = pdt.Table(dict(a=[1, 1, 2], b=[4, 5, 6]))

-tbl1 >> summarize(sum_a=sum(a), sum_b=sum(b)) >> show()
-tbl1 >> group_by(tbl1.a) >> summarize(sum_b=sum(b)) >> show()
+tbl1 >> summarize(sum_a=a.sum(), sum_b=b.sum()) >> show()
+tbl1 >> group_by(tbl1.a) >> summarize(sum_b=b.sum()) >> show()
```

Typical aggregation functions are `sum()`, `mean()`, `count()`, `min()`, `max()`, `any()`, and `all()`.
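
To make the result of the two pipelines concrete, here is a plain-Python sketch of what an ungrouped and a grouped sum compute on the same data as `tbl1`. It uses only the standard library and is not pydiverse.transform code; the names `ungrouped` and `grouped` are made up for illustration.

```python
from collections import defaultdict

# Same data as tbl1 in the example above.
a = [1, 1, 2]
b = [4, 5, 6]

# Ungrouped summarize: a single output row with one value per aggregation.
ungrouped = {"sum_a": sum(a), "sum_b": sum(b)}

# group_by(a) followed by summarize(sum_b=...): one output row per distinct key.
sum_b_by_a = defaultdict(int)
for key, value in zip(a, b):
    sum_b_by_a[key] += value
grouped = dict(sum_b_by_a)

print(ungrouped)  # {'sum_a': 4, 'sum_b': 15}
print(grouped)    # {1: 9, 2: 6}
```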
21 changes: 19 additions & 2 deletions docs/source/reference/api.rst
@@ -8,10 +8,27 @@ API
verbs
operators/index
targets
types


.. currentmodule:: pydiverse.transform

Table
-----

.. currentmodule:: pydiverse.transform
.. autoclass:: Table
:noindex:

ColExpr
-------

.. autoclass:: ColExpr
:members: dtype
:exclude-members: __new__, __init__

Col
---

.. autoclass:: Col
:no-index:
:members: export
:exclude-members: __new__, __init__
10 changes: 10 additions & 0 deletions docs/source/reference/operators/aggregation.rst
@@ -2,6 +2,16 @@
Aggregation
===========

Aggregation functions accept the keyword arguments ``partition_by`` and ``filter``. The
``partition_by`` argument can only be given when used within ``mutate``. If a
``partition_by`` argument is given and there is a surrounding ``group_by`` /
``ungroup``, the ``group_by`` is ignored and the value of ``partition_by`` is used.

.. warning::
   The ``filter`` argument works similarly to ``Expr.filter`` in polars. In contrast
   to polars, however, if all values in a group are ``null`` or the group becomes empty
   after filtering, the value of every aggregation function for that group is ``null``, too.

.. currentmodule:: pydiverse.transform.ColExpr
.. autosummary::
:toctree: _generated/
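
The null rule from the warning above can be sketched in plain Python. This is only an illustration of the documented semantics, not the library's implementation: `filtered_sum` and `partitioned_sum` are hypothetical helpers, `None` stands in for `null`, and the sketch assumes that an aggregation with ``partition_by`` inside ``mutate`` yields one value per input row.

```python
from collections import defaultdict


def filtered_sum(values, keep):
    """Sum the kept, non-null values; return None if nothing remains."""
    kept = [v for v, k in zip(values, keep) if k and v is not None]
    return sum(kept) if kept else None


def partitioned_sum(keys, values, keep):
    """Per-row result of a filtered sum computed within each partition."""
    groups = defaultdict(list)
    for key, value, flag in zip(keys, values, keep):
        groups[key].append((value, flag))
    totals = {
        key: filtered_sum([v for v, _ in rows], [f for _, f in rows])
        for key, rows in groups.items()
    }
    # Every row receives the aggregate of the partition it belongs to.
    return [totals[key] for key in keys]


keys = ["x", "x", "y", "z"]
values = [1, 2, 3, None]
keep = [True, True, False, True]

result = partitioned_sum(keys, values, keep)
print(result)  # [3, 3, None, None]
```

Partition `y` becomes empty after filtering and partition `z` contains only a null, so both produce `None` rather than `0`.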
1 change: 1 addition & 0 deletions docs/source/reference/operators/arithmetic.rst
@@ -10,6 +10,7 @@ Arithmetic

__add__
__floordiv__
__mod__
__mul__
__neg__
__pos__
14 changes: 14 additions & 0 deletions docs/source/reference/operators/conditional_logic.rst
@@ -0,0 +1,14 @@
=================
Conditional Logic
=================

.. currentmodule:: pydiverse.transform

.. autosummary::
:toctree: _generated/
:template: autosummary/short_title.rst
:nosignatures:

when
coalesce
ColExpr.map
111 changes: 97 additions & 14 deletions docs/source/reference/operators/index.rst
@@ -16,26 +16,109 @@ Column Operations
window
sorting_markers
horizontal_aggregation
conditional_logic
type_conversion


.. currentmodule:: pydiverse.transform
Expression methods
------------------

.. autoclass:: ColExpr
:no-index:
:members: dtype
.. currentmodule:: pydiverse.transform.ColExpr

.. autosummary::
:toctree: _generated/
:template: autosummary/short_title.rst
:nosignatures:
:nosignatures:

__add__
__and__
__eq__
__floordiv__
__ge__
__gt__
__invert__
__le__
__lt__
__mod__
__mul__
__ne__
__neg__
__or__
__pos__
__pow__
__sub__
__truediv__
__xor__
abs
all
any
ascending
cast
ceil
count
dense_rank
descending
dt.day
dt.day_of_week
dt.day_of_year
dt.hour
dt.microsecond
dt.millisecond
dt.minute
dt.month
dt.second
dt.year
dur.days
dur.hours
dur.microseconds
dur.milliseconds
dur.minutes
dur.seconds
exp
fill_null
floor
is_in
is_inf
is_nan
is_not_inf
is_not_nan
is_not_null
is_null
log
map
max
mean
min
nulls_first
nulls_last
rank
round
shift
str.contains
str.ends_with
str.len
str.lower
str.replace_all
str.slice
str.starts_with
str.strip
str.to_date
str.to_datetime
str.upper
sum

lit
when
Global functions
----------------

.. currentmodule:: pydiverse.transform

.. autosummary::
:toctree: _generated/
:template: autosummary/short_title.rst
:nosignatures:
:nosignatures:

ColExpr.cast
ColExpr.map
coalesce
count
dense_rank
lit
max
min
rank
row_number
when
13 changes: 13 additions & 0 deletions docs/source/reference/operators/type_conversion.rst
@@ -0,0 +1,13 @@
===============
Type Conversion
===============

.. currentmodule:: pydiverse.transform

.. autosummary::
:toctree: _generated/
:template: autosummary/short_title.rst
:nosignatures:

lit
ColExpr.cast
28 changes: 28 additions & 0 deletions docs/source/reference/types.rst
@@ -0,0 +1,28 @@
=====
Types
=====

.. currentmodule:: pydiverse.transform
.. autosummary::
:toctree: _generated/
:nosignatures:
:template: autosummary/short_title.rst

Dtype
Bool
Date
Datetime
Decimal
Float
Float32
Float64
Int
Int8
Int16
Int32
Int64
String
Uint8
Uint16
Uint32
Uint64
1 change: 1 addition & 0 deletions docs/source/reference/verbs.rst
@@ -17,6 +17,7 @@ Verbs
filter
full_join
group_by
inner_join
join
left_join
mutate
58 changes: 56 additions & 2 deletions generate_col_ops.py
@@ -15,6 +15,7 @@

COL_EXPR_PATH = "./src/pydiverse/transform/_internal/tree/col_expr.py"
FNS_PATH = "./src/pydiverse/transform/_internal/pipe/functions.py"
API_DOCS_PATH = "./docs/source/reference/operators/index.rst"

NAMESPACES = ["str", "dt", "dur"]

@@ -78,7 +79,8 @@ def generate_fn_decl(
     }

     annotated_kwargs = "".join(
-        f", {kwarg}: {context_kwarg_annotation[kwarg]} | None = None"
+        f", {kwarg.name}: {context_kwarg_annotation[kwarg.name]}"
+        + f"{'' if kwarg.required else ' | None = None'}"
         for kwarg in op.context_kwargs
     )
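
The effect of the new `required` flag on the generated signature can be tried in isolation. The sketch below is self-contained and makes several assumptions: `ContextKwarg`, the two keyword names, and the annotation strings are hypothetical stand-ins, since the real objects in `op.context_kwargs` and the real `context_kwarg_annotation` table are not shown in this diff. Only the join expression is taken from the hunk above.

```python
from dataclasses import dataclass


@dataclass
class ContextKwarg:
    """Hypothetical stand-in for the entries of op.context_kwargs."""

    name: str
    required: bool


# Hypothetical annotation table; the real mapping is defined in generate_col_ops.py.
context_kwarg_annotation = {
    "arrange": "list[Col]",
    "filter": "Col",
}


def annotate(context_kwargs):
    """Build the keyword part of a generated function signature."""
    return "".join(
        f", {kwarg.name}: {context_kwarg_annotation[kwarg.name]}"
        + f"{'' if kwarg.required else ' | None = None'}"
        for kwarg in context_kwargs
    )


signature_tail = annotate(
    [ContextKwarg("arrange", required=True), ContextKwarg("filter", required=False)]
)
print(signature_tail)  # , arrange: list[Col], filter: Col | None = None
```

A required keyword gets a bare annotation, while an optional one keeps the `| None = None` suffix that every keyword received before this change.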

@@ -116,7 +118,7 @@ def generate_fn_body(
     args = add_vararg_star(args)

     if op.context_kwargs is not None:
-        kwargs = "".join(f", {kwarg}={kwarg}" for kwarg in op.context_kwargs)
+        kwargs = "".join(f", {kwarg.name}={kwarg.name}" for kwarg in op.context_kwargs)
     else:
         kwargs = ""

@@ -246,3 +248,55 @@ def indent(s: str, by: int) -> str:
file.truncate()

os.system(f"ruff format {FNS_PATH}")

with open(API_DOCS_PATH, "r+") as file:
    new_file_contents = ""

    for line in file:
        new_file_contents += line
        if line.startswith("Expression methods"):
            new_file_contents += (
                "------------------\n\n"
                ".. currentmodule:: pydiverse.transform.ColExpr\n\n"
                ".. autosummary::\n"
                " :nosignatures:\n\n "
            )

            new_file_contents += "\n ".join(
                sorted(
                    [
                        op.name
                        for op in ops.__dict__.values()
                        if isinstance(op, Operator) and op.generate_expr_method
                    ]
                    + ["rank", "dense_rank", "map", "cast"]
                )
            )

            new_file_contents += (
                "\n\nGlobal functions\n"
                "----------------\n\n"
                ".. currentmodule:: pydiverse.transform\n\n"
                ".. autosummary::\n"
                " :nosignatures:\n\n "
            )

            new_file_contents += (
                "\n ".join(
                    sorted(
                        [
                            op.name
                            for op in ops.__dict__.values()
                            if isinstance(op, Operator) and not op.generate_expr_method
                        ]
                        + ["when", "lit"]
                    )
                )
                + "\n"
            )

            break

    file.seek(0)
    file.write(new_file_contents)
    file.truncate()
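
The block above keeps `index.rst` up to the "Expression methods" heading and regenerates everything after it. A minimal, standalone sketch of that "keep up to a marker, regenerate the rest" idea follows. It deliberately swaps the in-place `r+` rewrite for a simpler read-then-write, and `regenerate_after_marker`, the temporary file, and the three entry names are assumptions made for the demonstration.

```python
import os
import tempfile


def regenerate_after_marker(path, marker, generated):
    """Keep the file up to and including the marker line, then append generated text."""
    with open(path) as file:
        lines = file.readlines()

    kept, found = [], False
    for line in lines:
        kept.append(line)
        if line.startswith(marker):
            found = True
            break  # everything after the marker is stale and gets regenerated

    with open(path, "w") as file:
        file.write("".join(kept) + (generated if found else ""))


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "index.rst")
    with open(path, "w") as file:
        file.write("Intro\n\nExpression methods\n------------------\n\nstale_entry\n")

    # Plain sorted() puts dunder names first, as in the generated listing.
    names = sorted(["sum", "abs", "__add__"])
    generated = "------------------\n\n" + "\n".join(names) + "\n"
    regenerate_after_marker(path, "Expression methods", generated)

    with open(path) as file:
        result = file.read()

print(result)
```

Because the marker line itself is kept, the generated text has to start with the heading underline, which is why the script's first appended string is the row of dashes.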
5 changes: 2 additions & 3 deletions src/pydiverse/transform/__init__.py
@@ -2,11 +2,10 @@

from ._internal.pipe.pipeable import verb
from ._internal.pipe.table import Table
-from ._internal.tree.col_expr import ColExpr
+from ._internal.tree.col_expr import Col, ColExpr
from .extended import *
from .extended import __all__ as __extended
from .types import *
from .types import __all__ as __types

-__all__ = ["Table", "ColExpr", "verb"]
-# __all__ += __extended + __types
+__all__ = ["Table", "ColExpr", "Col", "verb"] + __extended + __types
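
Extending `__all__` with the lists of the submodules changes what a star import of the package exposes. The following sketch shows the mechanism with two throwaway in-memory modules; `demo_pkg`, `demo_extended`, and the names inside them are hypothetical and only mimic the shape of the change.

```python
import sys
import types

# Hypothetical submodule that declares its public names in __all__.
extended = types.ModuleType("demo_extended")
extended.__all__ = ["mutate", "summarize"]
extended.mutate = lambda: "mutate"
extended.summarize = lambda: "summarize"
extended._private = object()  # not listed in __all__, so never re-exported

# Hypothetical top-level package that re-exports the submodule's public names.
pkg = types.ModuleType("demo_pkg")
pkg.Table = type("Table", (), {})
for name in extended.__all__:
    setattr(pkg, name, getattr(extended, name))
pkg.__all__ = ["Table"] + extended.__all__
sys.modules["demo_pkg"] = pkg

# A star import copies exactly the names listed in the package's __all__.
namespace = {}
exec("from demo_pkg import *", namespace)
exported = sorted(name for name in namespace if not name.startswith("__"))
print(exported)  # ['Table', 'mutate', 'summarize']
```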
8 changes: 1 addition & 7 deletions src/pydiverse/transform/_internal/backend/polars.py
@@ -71,14 +71,8 @@ def export(
lf.name = nd.name
return lf

raise AssertionError

@staticmethod
def export_col(expr: ColExpr, target: Target) -> pl.Series:
if isinstance(target, Polars):
...
elif isinstance(target, Pandas):
...
return lf.collect().to_pandas(use_pyarrow_extension_array=True)

raise AssertionError
