Add CSV parsing functions #2361

Open: wants to merge 11 commits into base: main

GuntherRademacher
Member

These changes add implementations of the XQuery 4.0 functions

  • fn:csv-to-arrays,
  • fn:parse-csv, and
  • fn:csv-to-xml.

The implementation reuses the CSV parser behind csv:parse, extended with new options that cover the additional functionality.

There are six new CsvOptions, which csv:parse can now use as well:

  • ROW_DELIMITER
  • QUOTE_CHARACTER
  • TRIM_WHITESPACE
  • TRIM_ROWS
  • SELECT_COLUMNS
  • STRICT_QUOTING

All but STRICT_QUOTING are defined in the XQuery 4.0 specification. STRICT_QUOTING exists to separate the quoting behaviour of the new functions from that of csv:parse: STRICT_QUOTING = false selects the preserved csv:parse behaviour.

TRIM_WHITESPACE does not yet conform to qt4cg/qtspecs#1677, because it currently trims whitespace from quoted fields as well. In qt4cg/qtspecs#1675 I proposed additionally allowing whitespace outside of quotes. Once these issues have been resolved, I will adapt the implementation accordingly.

Empty-line handling had to change to conform to the XQuery 4.0 function specification. csv:parse used to skip empty lines; they are now preserved unconditionally, for csv:parse too, so that it treats empty lines the same way as the new functions. Tests have been added to CsvModuleTest, and the changed behaviour is annotated like this:

            // was: "<csv/>");
    parse("\n", "", "<csv><record/></csv>");
              // was: "<csv/>");
    parse("\n\n", "", "<csv><record/><record/></csv>");
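
For readers who want the rule spelled out, here is a minimal, self-contained sketch of the two empty-line policies. It is not the BaseX parser: the class `EmptyLineSketch`, the method `rows` and the `skipEmpty` flag are hypothetical names used only for illustration, the row delimiter is assumed to be `\n`, and quoted fields containing newlines are ignored. It assumes each row delimiter terminates one row, which is consistent with the two annotated tests above.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration class; not part of BaseX.
class EmptyLineSketch {

  // Splits input at '\n'. Each delimiter terminates one row, so an empty line
  // yields an empty row unless skipEmpty is set (the old csv:parse behaviour).
  static List<String> rows(String input, boolean skipEmpty) {
    List<String> rows = new ArrayList<>();
    int start = 0;
    for (int i = 0; i < input.length(); i++) {
      if (input.charAt(i) == '\n') {
        String row = input.substring(start, i);
        if (!(skipEmpty && row.isEmpty())) rows.add(row);
        start = i + 1;
      }
    }
    // Text after the last delimiter forms a final row only if it is non-empty.
    if (start < input.length()) rows.add(input.substring(start));
    return rows;
  }

  public static void main(String[] args) {
    System.out.println(rows("\n", true).size());    // old policy: 0 rows
    System.out.println(rows("\n", false).size());   // new policy: 1 empty row
    System.out.println(rows("\n\n", false).size()); // new policy: 2 empty rows
  }
}
```

Under the preserving policy, `"\n"` gives one empty row and `"\n\n"` gives two, mirroring `<csv><record/></csv>` and `<csv><record/><record/></csv>`; under the skipping policy both inputs give no rows, mirroring the former `<csv/>`.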

With these changes, BaseX passes most of the QT4 tests for the new functions. The remaining failures are cases where BaseX raises a different error code than the test expects, e.g.

parse-csv-907
fn:parse-csv('one,two', map{'row-delimiter':('|','||')})
Error : FOCV0002: The value of row-delimiter is not a single character: | ||.
Expect: XPTY0004

parse-csv-914
parse-csv("a,b,c,d,e,f|p,q,r,s,t,u", map{'row-delimiter':'|', 'select-columns':(4,3,1)})?get(-1, 2)
Error : FORG0001: Cannot convert xs:integer to xs:positiveInteger: -1.
Expect: XPTY0004
