forked from apache/drill
ToDo
Paul Rogers edited this page May 1, 2020
- Add link to jsonlines in various places. - Done
- Add image below to a doc file somewhere.
- Resolve Union issue: how to emulate old reader?
- Reapply new JSON reader and test revisions. Run all tests. Resolve issues.
- Ask for a full test suite run
- Retrofit other uses of older JSON reader
Branches:
- DRILL-6953-rev2 - Newest version
- DRILL-6953-rev - Prior version
- DRILL-6953 - Original version with a bunch of batch count fixes
- DRILL-7572 - JSON Structure Parser PR (Done)
- DRILL-7574 - Revised projection parser PR (Done)
- DRILL-7633: Fixes for union and repeated list accessors (Done)
- DRILL-7631: Updates to the Json Structure Parser
- json - Working branch
- DRILL-7601: Shift column conversion to reader from scan framework (Closed)
- DRILL-7640: EVF-based JSON Loader
Review comments suggest that the current approach needs adjustment. Maybe:
- JSON Format Plugin
- JSON Loader (also takes the Schema as input)
- JSON Structure Parser (also takes the JSON Projection as input)
- Jackson JSON Parser
- Basic SPI outline, in tentative location, with registry integration, storage plugin only
- Combine format plugins into core registry. Extension map.
- External structure: directories, jars, config files.
- Class loader & isolation design
- Revised spec for overall SPI
- Add format plugins to SPI
- Add session vars to SPI
- Refactor the plugin registry - DRILL-7590, and follow-on fixes - Done
- Fix format plugin immutability issues - DRILL-6168
- Refactor storage plugins (remove init(), ensure immutability)
- Decide on which plugin API changes are acceptable to third parties
- Reshuffle plugin files
- Fix secondary set of plugin registry issues - From lists of tasks
- Avoid scan of all plugins at scan time
- Avoid loading plugins at startup
- Gracefully handle bad configs
- Etc.
- Commit done
- Write up documentation somewhere
Prior work:
- DRILL-7458: Base framework for storage plugins (Abandoned)
- First PR: DRILL-7696 (Done)
- PR for scan framework (Done)
- Second scan framework revision (Open)
- Retrofit CSV reader (Pending. In csv branch.)
Series of moves to get scan ready for a planner-created schema.
- CSV Reader
- PR for revised schema handling
- Remove internal conversions in favor of shims
- Look into other readers
- Refactor Column Metadata
- Add wildcard column
- Add untyped column
- Modify scan framework to produce reader schema
- Rework projection parser to generate a schema
- Remove the projection set code in favor of schema
- Refine how projection set/schema is presented in schema negotiator
- JDBC DataSource Issue - From mail list
- Parquet Issue - From mail list/Slack
- Review current driver. Where is boundary between JDBC and Drill client?
- Review wire formats. Avro? Thrift?
- Update wire format from Jig.
- Review Avatica for its JSON wire format.
- Review Hive's client. Usable in this context?
- Resurrect Jig serializer, deserializers; update for column accessors
- Create a JDBC2
- Merge VectorContainer, VectorAccessible, RowSet, BatchAccessor, etc.
- How to split RSL?
ResultSetLoader | BatchSetLoader
-----------------+----------------
Vector Accessors, etc.
Tasks:
- PR: DRILL-7486: refactor reader creation - merged
- PR: DRILL-TBD: refactor tests to use new schema builder
- PR: DRILL-TBD: Bulk copy in RSL
On branch svr-exp3
- Get copier to work with bulk copy. (Done)
- Split copier tests into multiple files. (Done)
- Bulk copy tests for structured data types. (Done)
- PR for move of allocator into RSL options.
- PR for restructure of reader creator and indexes.
- PR for bulk copy feature in RSL
- PR for copier
Current problem: TestCsvWithSchema.testBlankCols() fails with SV4 from sort. Likely problem is batch ownership. Maybe first move merging into sort?
- Revise, PR ColumnMetadata
- Revise mock to keep its structure externally, use CMD internally
- Revise to use Base structure
- Refactor ExprTreeMaterializer to use schema, not vectors.
- Project code gen unit tests
- Unit tests for specific bits of code gen
- Begin process of thinking how to incorporate column readers/writers
- Review existing code, work out an approach
- Prepare a writeup, gathering recent comments.
- Look into Java Object support.
- Work out an evolution plan.
Wait for Abhishek.
- https://issues.apache.org/jira/browse/DRILL-7563
- https://hub.docker.com/repository/docker/gaucho84/drill
- https://github.com/paul-rogers/drill-docker See the docker folder for details.
- Remove NOT_YET result status
- Find fixed-size-block branch
- Support for DESCRIBE to get schema from non-catalog tables
- Support for data sets nested within a file-like or plugin-like object
- Follow up on local directory paths. See #1987: DRILL-7589.
- Retire unused data, vector types (MONEY, obsolete DECIMAL, etc.)
- Some plan for the problem-child data types (repeated list, etc.)
- Easier to work with files
- Command to create a workspace (not just edit JSON)
- CREATE TABLE x STORED AS y AS - or whatever the Hive/Impala syntax is
- Allow overwrite in CTAS
- Don't store the CRC on the local file system
- CTAS emits a CSVH file but names it CSV, so it can't easily be reread.
- Table scan mode (for counts, metadata, finding malformed records, etc.)
- UTCTime type
- Generic transform operator
- Get a test framework setup running
- Generic windowing function (round data to duration)
- Fill in missing values in a series
- DRILL-6953-rev - PR for JSON reader - Abandoned for now
- DRILL-7224
- DRILL-7311
- DRILL-7311-2
- DRILL-7311-debug
- DRILL-7333 - Abandoned, done via other PRs
- DRILL-7333-orig - Obsolete?
- DRILL-7439 - Abandoned, done via other PRs
- DRILL-7447
- DRILL-7456 - Merged
- DRILL-7458 - Base framework PR - Abandoned
- DRILL-7458-2
- DRILL-7572 - JSON Structure Parser - Open PR
- DRILL-7620: Fix plugin mutability issues
- DRILL-7631: Updates to the Json Structure Parser
- DRILL-7640: EVF-based JSON Loader
- Dec10
- Dec30
- Dec30b
- JavaObjRow - Quick & dirty Java object batch prototype
- July14
- June18
- June6
- Nov7
- Nov7b
- Nov7c
- Oct19
- Oct26
- Oct29
- RowSetRev4 - Probably obsolete
- cg-test
- cleanup-Dec1
- error
- error2
- error3
- json - working Json reader branch
- lastSetFix
- logrev
- logrev-exp1
- logrev-exp2
- logrev-exp3
- master
- md-type
- perf
- shim - Text reader schema revision
- svr-exp
- svr-exp2
- svr-exp3
- vectorcheck
- DRILL-7306 - Merged
- DRILL-7306-debug - Obsolete?
- DRILL-7324 - Merged
- DRILL-7327 - Merged
- DRILL-7358 - Merged
- DRILL-7377 - Merged
- DRILL-7377x - Obsolete?
- DRILL-7402 - Merged
- DRILL-7403 - Merged
- DRILL-7412 - Merged
- DRILL-7413 - Merged
- DRILL-7413x - Obsolete?
- DRILL-7414 - Merged
- DRILL-7424 - Merged
- DRILL-7436 - Merged
- DRILL-7441 - Merged
- DRILL-7442 - Merged
- DRILL-7445 - Merged
- DRILL-7446 - Merged
- DRILL-7476 - Merged
- DRILL-7479 - Merged
- DRILL-7486 - Merged
- DRILL-7487 - Merged
- DRILL-7502 - Merged
- DRILL-7503 - Merged
- DRILL-7506 - Merged
- DRILL-7507 - Merged
- DRILL-7574 - Revised projection parser - Merged
- DRILL-7576 - Fail fast for operator errors - Merged
- DRILL-7583 - Remove STOP - Merged
- DRILL-7590: Refactor plugin registry - Merged
- DRILL-7601: Shift column conversion to reader from scan framework - Merged
- DRILL-7617: Disabled plugins not showing in Web UI - Merged
- DRILL-7632: Improve user exception formatting - Merged
- DRILL-7633: Fixes for union and repeated list accessors
- DRILL-7634: Rollup of code cleanup changes
- stop - Work to retire STOP status
- zCountFix - Draft of batch count fixes
- zCountFix2 - Draft of batch count fixes
- zCountFix3 - Draft of batch count fixes
- DRILL-6951-1 - Probably obsolete; related to mock plugin
- DRILL-6953 - Probably obsolete
- DRILL-6953-2 - Probably obsolete
- DRILL-6953-orig - Probably obsolete
- DRILL-7293-orig