[ENG-6284] render tsv/csv #834

aaxelb · 2024-12-02T15:43:54Z

allow rendering search responses as lines of tab-separated or comma-separated values

main point:

- add simple_tsv and simple_csv renderers in trove.render
  - can be seen with query param acceptMediatype=text/tab-separated-values or acceptMediatype=text/csv
  - get default columns from static DEFAULT_TABULAR_SEARCH_COLUMN_PATHS in trove.vocab.osfmap
- allow "download" responses -- add withFileName=foo query param to get a response with Content-Disposition: attachment and a filename based on "foo"
- allow absurd page sizes
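
For readers unfamiliar with the idea, here is a minimal sketch of rendering search results as lines of comma- or tab-separated values. It is not the code from this PR: `EXAMPLE_COLUMN_PATHS`, `render_tabular`, and the record shape are hypothetical stand-ins, and the real default columns live in trove.vocab.osfmap.

```python
import csv
import io

# Hypothetical stand-in for DEFAULT_TABULAR_SEARCH_COLUMN_PATHS; the real
# value lives in trove.vocab.osfmap and is not shown in this conversation.
EXAMPLE_COLUMN_PATHS = (("title",), ("creator", "name"), ("dateCreated",))


def _value_at(record, path):
    """Follow a key path through nested dicts; return '' if any step is missing."""
    current = record
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return ""
        current = current[key]
    return current


def render_tabular(records, column_paths=EXAMPLE_COLUMN_PATHS, delimiter=","):
    """Render records as delimited text: one header row, then one row per record."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    writer.writerow([".".join(path) for path in column_paths])
    for record in records:
        writer.writerow([_value_at(record, path) for path in column_paths])
    return buffer.getvalue()


# Assumed record shape, for illustration only.
records = [
    {"title": "A, with comma", "creator": {"name": "Ada"}, "dateCreated": "2024-12-02"},
    {"title": "B", "creator": {}},
]
csv_text = render_tabular(records)
tsv_text = render_tabular(records, delimiter="\t")
```

Sharing one function and varying only the delimiter keeps the two formats consistent; the csv module handles quoting of values that contain the delimiter.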

changes made along the way:

- introduce ProtoRendering as renderer output type, to better decouple rendering from view logic
  - include StreamableRendering for responses that might be streamed, like csv/tsv (tho it's not currently handled any differently from SimpleRendering)
  - reshape BaseRenderer (and each existing renderer) to have a consistent call signature (and return ProtoRendering)
    - replace trove.render.get_renderer with trove.render.get_renderer_type -- instantiate the renderer with response data
- add trove.views._responder with common logic for building a django HttpResponse for a ProtoRendering
  - consistently handles withFileName/Content-Disposition
- move some osf-specific constants to trove.vocab.osfmap for easier reuse
- pull out some abstractable logic:
  - from existing trove.render.simple_json into trove.render._simple_trovesearch (for renderers that include only the list of search results)
  - from existing tests.trove.derive._base into tests.trove._input_output_tests (for tests following the same simple input/output pattern as deriver and renderer tests)
- add tests.trove.render to cover the new renderers simple_tsv and simple_csv, as well as the existing renderers jsonapi, simple_json, jsonld, and turtle
  - minimally update existing renderers to create consistent output
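
To illustrate the decoupling described above, here is a rough sketch of what a renderer output type and a shared responder helper could look like. The class bodies, `build_response_headers`, and the extension mapping are assumptions made for illustration; the actual ProtoRendering classes and trove.views._responder are not shown in this conversation, and the sketch returns a plain dict of headers rather than a django HttpResponse.

```python
import dataclasses
import re
from typing import Iterator, Optional


@dataclasses.dataclass(frozen=True)
class SimpleRendering:
    """Assumed shape: a whole response body held in memory."""
    mediatype: str
    content: str

    def iter_content(self) -> Iterator[str]:
        yield self.content


@dataclasses.dataclass
class StreamableRendering:
    """Assumed shape: a response body produced chunk by chunk."""
    mediatype: str
    content_stream: Iterator[str]

    def iter_content(self) -> Iterator[str]:
        yield from self.content_stream


# Hypothetical mapping from mediatype to file extension.
_FILE_EXTENSIONS = {"text/csv": "csv", "text/tab-separated-values": "tsv"}


def build_response_headers(rendering, with_file_name: Optional[str] = None) -> dict:
    """Build headers for any rendering; add Content-Disposition for downloads."""
    headers = {"Content-Type": rendering.mediatype}
    if with_file_name:
        # Keep only conservative filename characters (an illustrative policy).
        safe_name = re.sub(r"[^A-Za-z0-9_.-]+", "_", with_file_name).strip("_") or "download"
        extension = _FILE_EXTENSIONS.get(rendering.mediatype)
        filename = f"{safe_name}.{extension}" if extension else safe_name
        headers["Content-Disposition"] = f'attachment; filename="{filename}"'
    return headers


headers = build_response_headers(SimpleRendering("text/csv", "a,b\n"), with_file_name="my file")
streamed = "".join(StreamableRendering("text/csv", iter(["a\n", "b\n"])).iter_content())
```

Because both rendering types expose the same `iter_content` method, view code can treat them alike and decide separately whether to stream.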

coveralls · 2024-12-06T18:27:34Z

coverage: 91.833% (+0.6%) from 91.22%
when pulling fb70351 on aaxelb:feat/render-tsv-csv
into 24bc70a on CenterForOpenScience:develop.

mfraezz

A couple nits or questions, but nothing blocking. Tests look sufficient, but behavior should still be confirmed manually on staging.

Pass complete

trove/render/html_browse.py

trove/render/simple_tsv.py

mfraezz · 2024-12-10T20:06:50Z

trove/trovesearch/page_cursor.py

@@ -14,10 +14,13 @@
 MANY_MORE = -1
 MAX_OFFSET = 9997

+DEFAULT_PAGE_SIZE = 13
+MAX_PAGE_SIZE = 10000


Minor: Is this maximum reasonable? Looks like it was previously 101.

Edit: I see the commit message called it "absurd," but I'm guessing it's also "justified for the sake of rendering files"?

aaxelb · 2024-12-11

yeah the need here is downloading all results in one response, but i hesitated to make that behavior automagic by mediatype... considered making withFileName obviate pagination whenever present, but overall i opted for consistent query param behavior, putting the onus on the client to string together all the params needed for the desired result (e.g. acceptMediatype=text/csv&page[size]=10000&withFileName=my-file-name for a full csv download with up to 10000 rows)
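
As a client-side illustration of stringing those params together, here is a small sketch. The three query params come from the comment above; `build_download_url`, the base URL, and the extra `q` param are hypothetical placeholders.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_download_url(base_url, mediatype="text/csv", page_size=10000,
                       file_name="my-file-name", extra_params=None):
    """Combine search params with the three params needed for a full download."""
    params = dict(extra_params or {})
    params.update({
        "acceptMediatype": mediatype,
        "page[size]": str(page_size),
        "withFileName": file_name,
    })
    # urlencode percent-encodes "/", "[" and "]" in keys and values.
    return f"{base_url}?{urlencode(params)}"


# Placeholder host, path, and search param, for illustration only.
url = build_download_url("https://search.example/cards", extra_params={"q": "climate"})
query = parse_qs(urlsplit(url).query)
```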

if 10000 at once turns out to be unreasonable in practice... a more complicated (but less costly all-at-once) alternative might be view logic that queries/renders smaller pages one at a time and streams the results
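
The alternative described above (query and render smaller pages one at a time, streaming as you go) could be sketched roughly like this. `stream_csv_pages`, the `fetch_page` callback, and the cursor values are hypothetical; this is not the view logic that was later added to the PR.

```python
import csv
import io
from typing import Callable, Iterator, Optional, Sequence


def stream_csv_pages(fetch_page: Callable[[Optional[str]], tuple],
                     columns: Sequence[str]) -> Iterator[str]:
    """Yield CSV text one chunk per page, holding only one page in memory."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")

    def _flush() -> str:
        # Return what has been written so far and reset the buffer.
        chunk = buffer.getvalue()
        buffer.seek(0)
        buffer.truncate(0)
        return chunk

    writer.writerow(columns)
    yield _flush()
    cursor = None
    while True:
        # fetch_page is assumed to return (rows, next_cursor_or_None).
        rows, cursor = fetch_page(cursor)
        for row in rows:
            writer.writerow([row.get(column, "") for column in columns])
        if rows:
            yield _flush()
        if cursor is None:
            break


# Fake two-page backend standing in for a real paginated search.
_FAKE_PAGES = {
    None: ([{"id": 1, "title": "first"}, {"id": 2, "title": "second"}], "page-2"),
    "page-2": ([{"id": 3, "title": "third"}], None),
}


def fake_fetch_page(cursor):
    return _FAKE_PAGES[cursor]


chunks = list(stream_csv_pages(fake_fetch_page, ["id", "title"]))
```

In a Django view, a generator like this could be handed to `StreamingHttpResponse`, which accepts an iterator of chunks; whether that avoids a given timeout depends on the server configuration.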

aaxelb · 2024-12-20

update: now streams, loading only one page (~100 rows) at a time, but streaming more than ~4000 items total still times out -- can further optimize or we can talk about increasing those timeouts for responses that are actively sending data...

mfraezz · 2024-12-23

We might not run into those same timeouts for ~4k items with production resourcing (or configuration -- unsure where you got that figure, but by default most nginx timeouts are between successive operations rather than the whole response), but I suspect it's fine for now and we can reevaluate if encountering that issue later.

aaxelb · 2025-01-02

am seeing timeouts at just after 30sec on production

looking at the .csv.part files, each one managed to stream a few pages before it died... unsure what's stopping it (maybe making it less slow would help...)

mfraezz

LGTM

aaxelb changed the title ~~[wip] render tsv/csv~~ [ENG-6284][wip] render tsv/csv Dec 2, 2024

aaxelb force-pushed the feat/render-tsv-csv branch 2 times, most recently from 2503bce to 47d2150 December 6, 2024 18:18

aaxelb added 13 commits December 6, 2024 13:19

wip: simple render

c806663

wip

85a159e

wip

1a65601

wip

b77588c

wip

acde994

wip

31c48d7

skip skippable columns in csv/tsv

0d856f2

move SKIPPABLE_COLUMNS into osfmap

disposition

47a47f5

clarify type

efe5766

wip:tests

89f79a7

renderer tests (sharing logic with deriver tests)

cbd8063

fix(jsonapi renderer): stable blanknode ids

cdcadf2

fix(jsonld renderer): stable ordering

a81abf7

aaxelb force-pushed the feat/render-tsv-csv branch from 47d2150 to a81abf7 December 6, 2024 18:20

aaxelb marked this pull request as ready for review December 6, 2024 18:31

aaxelb changed the title ~~[ENG-6284][wip] render tsv/csv~~ [ENG-6284] render tsv/csv Dec 6, 2024

aaxelb added 2 commits December 6, 2024 13:38

allow absurd page sizes

a04488b

remove unused code

ce20fc4

aaxelb force-pushed the feat/render-tsv-csv branch from b75d4e0 to ce20fc4 December 6, 2024 18:50

mfraezz approved these changes Dec 10, 2024

aaxelb added 6 commits December 11, 2024 15:28

consistent abbreviation (CSV, TSV)

60a9f2f

reverse inheritance of CSV and TSV

628618f

fix: rendering numbers in csv/tsv

9b7a24d

fix: handle bad mediatype request

743299e

rename non-response responses to "handles"

8cb115c

CardsearchResponse => CardsearchHandle ValuesearchResponse => ValuesearchHandle

wipwip

010ab7a

aaxelb added 5 commits December 19, 2024 09:40

wip: support 'include' for sparse paging

8e2d7db

wipwip

4be8709

wipwipwip

3b19790

wipwipwipwip

d6f9a63

wip?

f3def1e

aaxelb force-pushed the feat/render-tsv-csv branch from 88e566a to f3def1e December 20, 2024 21:17

aaxelb added 2 commits December 20, 2024 16:33

fix: no related properties on value-search

1038780

fix: give indexcards types

fb70351

mfraezz approved these changes Dec 23, 2024

mfraezz merged commit 75ab046 into CenterForOpenScience:develop Dec 23, 2024
3 checks passed

aaxelb deleted the feat/render-tsv-csv branch January 2, 2025 15:45
