hledger 1.40 renamed total column in csv export #2261

Open
zmanji opened this issue Oct 14, 2024 · 12 comments
Labels
balance csv The csv file format, csv output format, or generally CSV-related.

Comments

@zmanji

zmanji commented Oct 14, 2024

On 1.34 exporting the results of the balance command to CSV has a total column:

$ hledger -f all.journal bal -M '^income:.*:salary' --invert --transpose -O csv -e today | head -n 1
"account","income:uber:salary","total"

On 1.40 running the same command results in:

$ hledger -f all.journal bal -M '^income:.*:salary' --invert --transpose -O csv -e today | head -n 1
"account","income:uber:salary","Total:"

The last column changed from total to Total:. This seems like a regression.

@simonmichael
Owner

Hi @zmanji, I believe it was intentional, part of a number of cleanups and consistency improvements for tabular reports across different output formats. Why does it seem like a regression ?

@simonmichael simonmichael added csv The csv file format, csv output format, or generally CSV-related. balance labels Oct 14, 2024
@zmanji
Author

zmanji commented Oct 14, 2024

I had many scripts processing the csvs for graphing purposes. Renaming the column broke scripts that were accessing the 'total' column. If this is intentional, then this is fine by me, feel free to close this.
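
For scripts that have to work with both versions, one possible workaround is to rename the heading back before the rest of the pipeline sees it. This is only a sketch: the function name normalize_total_heading is made up for illustration, and it assumes the heading appears on the first line exactly as "Total:", as in the 1.40 output above.

```shell
# Hypothetical helper: rewrite the "Total:" heading on the first CSV line
# back to "total". All other lines pass through unchanged.
normalize_total_heading() {
  sed '1s/"Total:"/"total"/'
}

# Example with the 1.40 header from the report plus one made-up data row.
printf '%s\n' '"account","income:uber:salary","Total:"' '"2024-01","100","100"' \
  | normalize_total_heading
```

The hledger command from the report could be piped through the same function, in which case 1.34 output is left as it is.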

@simonmichael
Owner

Sorry about that.

I tend to favour simple lowercase to start with, but usually over time, with more real-world users, capitalisation and punctuation tend to win. (Probably the "account" heading here should be capitalised also. The colon might be debatable for CSV.)

@thielema
Contributor

That's possibly my fault. I will look into this. The totalRowHeading in Commands.Balance all use upper case Total. Shall I adapt account accordingly?

@thielema
Contributor

hledger/test/balance/layout.test uses account and Total: in these cases. So the output seems to be intended.

@thielema
Contributor

Maybe we can adapt the capitalization to the style of the account names? Say, if a majority of account names start with upper case, then "account" and "total" should do so as well.

@thielema
Contributor

thielema commented Oct 15, 2024 via email

@simonmichael
Owner

simonmichael commented Oct 15, 2024

This maybe isn't top priority, but when we are tweaking headings and need a policy, I would probably

  • Capitalise first words consistently (not necessarily all words in a multi-word heading)
  • Consider using a colon as seems best for each case. That's more of a presentation detail so maybe colons make less sense in CSV ?

I think it's better to have a simple fixed rule rather than adapt headings to data.

@the-solipsist
Collaborator

the-solipsist commented Dec 29, 2024

My vote would be for:

  1. Capitalization across all headings in viewing formats (like text and HTML).

    • This looks more professional in reports which are to be read, especially if they are to be read by others (e.g., accountants).
    • I'd be okay with capitalization in the CSV as well, but I realize that some (including myself) prefer lowercase headings in CSVs, and snake-case (e.g., account_name) is a convention / best practice for CSVs in many fields. So I'd prefer using lowercase in data formats like CSV (and possibly JSON).
  2. Removing : in headings like Total: in all reports, regardless of whether it's a viewing format or a data format.
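
To make the proposed rule concrete, here is a rough shell sketch. It is not hledger code; the function name heading_for and the format names passed to it are assumptions chosen for illustration only.

```shell
# Hypothetical illustration of the proposed rule:
#   - never keep a trailing colon
#   - data formats (csv, json): lowercase
#   - viewing formats (anything else): capitalise the first letter
heading_for() {
  format=$1
  word=${2%:}   # drop a trailing colon, if any
  case "$format" in
    csv|json)
      printf '%s\n' "$word" | tr '[:upper:]' '[:lower:]' ;;
    *)
      first=$(printf '%s' "$word" | cut -c1 | tr '[:lower:]' '[:upper:]')
      rest=$(printf '%s' "$word" | cut -c2-)
      printf '%s%s\n' "$first" "$rest" ;;
  esac
}

heading_for csv 'Total:'    # prints: total
heading_for html 'Total:'   # prints: Total
heading_for txt 'account'   # prints: Account
```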

@simonmichael
Owner

This sounds good to me also, except when people are dragging CSV into a spreadsheet wouldn't they like to see the same presentation-ready capitalised headings that they'd see in text or html output ?

(The FODS format is more specialised for that use case, but works only for Open Office/Libre Office users.)

@the-solipsist
Collaborator

the-solipsist commented Dec 29, 2024

when people are dragging CSV into a spreadsheet wouldn't they like to see the same presentation-ready capitalised headings

Given that CSV can be used both for sharing with accountants (meaning it needs to look professional) as well as for data processing, it falls in the middle.

However, given that all spreadsheet software have "Format > Case > Title Case" as an option, I don't think it should matter very much.

The same can also be said about sed -i '1s/.*/\L&/g' file.csv to change the column names to lowercase.

The question is which is a better default. I'd suggest that lowercase is the better default. I find that when I want to share a CSV, I tend to do a bit of cleaning up before sharing it with others. When doing that, changing the column names to title case would be part of the cleaning up. I don't expect the CSV that is output by hledger to be shareable without some cleaning up.

Given that, I think considering CSV as a data processing format first and as a presentation format second would be the right order. But I think it would be fine either way given the ease of "convert to title case" or a simple sed command.
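
The \L in the sed command is a GNU sed extension, so it may not work with other sed implementations. A possible portable variant, written as a filter with awk, is sketched here; the function name lowercase_header is made up for illustration. With --transpose the first line also contains account names, and those would be lowercased too.

```shell
# Hypothetical helper: lowercase only the first line of a CSV read from
# stdin; all other lines pass through unchanged.
lowercase_header() {
  awk 'NR == 1 { $0 = tolower($0) } { print }'
}

# Example with made-up data:
printf '%s\n' '"Account","Total:"' '"Assets:Bank","10"' | lowercase_header
# prints:
# "account","total:"
# "Assets:Bank","10"
```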

@simonmichael
Owner

simonmichael commented Dec 30, 2024

OP's report is an example of where someone's data processing was disrupted:

I had many scripts processing the csvs for graphing purposes. Renaming the column broke scripts that were accessing the 'total' column.

With this in mind I would agree with @the-solipsist that ideally we should omit colons and other presentation punctuation from "data" formats like CSV. And possibly I would omit capitalisation as well. Except possibly if it's too much of a headache to implement and maintain these variations and keep them consistent. I don't recall if that would be troublesome, @thielema might have thoughts.

(Yes I'm flip-flopping a bit between the presentation and data processing use cases.)

4 participants