Describe the bug

To Reproduce

>>> from daft import DataFrame, col
>>> df = DataFrame.from_pydict({
... "A": [1, 2, 3, 4],
... "B": [1.5, 2.5, 3.5, 4.5]
... })
>>> df.select(col("A"), col("B")).show(2)
A B
0 1 1.5
1 2 2.5

For an expression like the following, the second output column is still named B:

>>> df.select(col("A"), col("B") * 2).show(2)
A B
0 1 3.0
1 2 5.0

I'm not sure whether this is by design (and whether this behavior will be kept in the future). In particular, the output column names look surprising for:

>>> df.select(col("B") * 2, col("A") + col("B")).show(2)
B A
0 3.0 2.5
1 5.0 4.5

Expected behavior
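Hi @wenleix, thanks for bringing this up! Indeed, for naming we currently default to using the left expression's name if no alias is specified. Specifically, for the example col("A") + col("B"), we keep "A" as the name.

We decided to keep this as the default behavior because the semantic meaning of a column is often preserved after performing some corrections on it. For example, after correcting the values in a "year" column, the result still semantically means a year, and can thus continue to be referred to as "year" in downstream operations.

When this default is not intended, users can provide an explicit name using .alias(), assigning …

That being said, I just played around with PostgreSQL, and it looks like they give the column an anonymous name. We like to adhere to having sensible defaults, and we deemed defaulting to the left expression's name to be the most sensible default here. Still, we're open to feedback as to why this might be unexpected behavior!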
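To make the default concrete, here is a minimal pure-Python sketch of the naming rule as described in this reply: an unaliased binary expression inherits its left operand's name, and .alias() overrides it. Everything in it (Expr, Col, Lit, BinOp, Alias, output_names, and the hypothetical "year" + 1900 correction) is an illustrative toy model, not Daft's actual Expression implementation; it only reproduces the column names seen in the question.

```python
from typing import List, Optional, Union


class Expr:
    """Toy expression node; a hypothetical stand-in, not Daft's Expression."""

    def name(self) -> Optional[str]:
        raise NotImplementedError

    def alias(self, new_name: str) -> "Expr":
        return Alias(self, new_name)

    def __add__(self, other: Union["Expr", int, float]) -> "Expr":
        return BinOp("+", self, _wrap(other))

    def __mul__(self, other: Union["Expr", int, float]) -> "Expr":
        return BinOp("*", self, _wrap(other))


class Col(Expr):
    def __init__(self, column: str) -> None:
        self.column = column

    def name(self) -> Optional[str]:
        return self.column


class Lit(Expr):
    def __init__(self, value: Union[int, float]) -> None:
        self.value = value

    def name(self) -> Optional[str]:
        # Literals carry no column name of their own in this toy model.
        return None


class BinOp(Expr):
    def __init__(self, op: str, left: Expr, right: Expr) -> None:
        self.op, self.left, self.right = op, left, right

    def name(self) -> Optional[str]:
        # Default rule from the reply: inherit the left operand's name.
        return self.left.name()


class Alias(Expr):
    def __init__(self, child: Expr, new_name: str) -> None:
        self.child, self.new_name = child, new_name

    def name(self) -> Optional[str]:
        # An explicit alias always wins over the default.
        return self.new_name


def _wrap(value: Union[Expr, int, float]) -> Expr:
    # Turn plain Python numbers into literal nodes.
    return value if isinstance(value, Expr) else Lit(value)


def col(column: str) -> Expr:
    return Col(column)


def output_names(*exprs: Expr) -> List[Optional[str]]:
    # Names a select(...) over these expressions would produce in this model.
    return [e.name() for e in exprs]


print(output_names(col("A"), col("B") * 2))                    # ['A', 'B']
print(output_names(col("B") * 2, col("A") + col("B")))         # ['B', 'A']
print(output_names(col("year") + 1900))                        # ['year']
print(output_names((col("A") + col("B")).alias("A_plus_B")))   # ['A_plus_B']
```

Reading the name off the left operand is what makes col("B") * 2 come out as "B" and col("A") + col("B") come out as "A" in the question's last example; in this toy model a literal on the left would yield no name at all, which is one reason an explicit alias is the safer choice when the default is not wanted.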
We're going to document this behavior better; tracking issue: #340. Thanks @wenleix!