Batch Handling Upgrades
The batch handling framework consists of several layers that together let Drill control the size of each record batch. Controlling batch size, in turn, lets Drill implement effective memory management and admission control.
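The following minimal Java sketch makes the idea of a size-controlled batch concrete. It assumes a simple byte budget alongside a row limit. The classes `BoundedBatch` and `BatchSplitter`, and the one-byte-per-character size estimate, are hypothetical illustrations, not Drill's actual API. The batch refuses a row once either limit would be exceeded, which tells the reader to start a new batch.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch (not Drill's API): a batch that enforces both a
 * row-count limit and a byte budget, so every emitted batch has a known
 * upper bound on its memory footprint.
 */
class BoundedBatch {
  private final int maxRows;
  private final long maxBytes;
  private final List<String> rows = new ArrayList<>();
  private long usedBytes;

  BoundedBatch(int maxRows, long maxBytes) {
    this.maxRows = maxRows;
    this.maxBytes = maxBytes;
  }

  /** Adds the row if it fits; returns false when the caller must start a new batch. */
  boolean tryAdd(String row) {
    long rowBytes = row.length(); // assumption: one byte per character, for illustration
    boolean full = rows.size() >= maxRows || usedBytes + rowBytes > maxBytes;
    // An empty batch always accepts one row so an oversized row cannot stall the reader.
    if (full && !rows.isEmpty()) {
      return false;
    }
    rows.add(row);
    usedBytes += rowBytes;
    return true;
  }

  int rowCount() { return rows.size(); }

  long usedBytes() { return usedBytes; }
}

class BatchSplitter {
  /** Splits input rows into batches that respect both limits. */
  static List<BoundedBatch> split(List<String> input, int maxRows, long maxBytes) {
    List<BoundedBatch> batches = new ArrayList<>();
    BoundedBatch current = new BoundedBatch(maxRows, maxBytes);
    for (String row : input) {
      if (!current.tryAdd(row)) {
        batches.add(current); // the current batch is full: emit it downstream
        current = new BoundedBatch(maxRows, maxBytes);
        current.tryAdd(row);  // always succeeds on an empty batch
      }
    }
    if (current.rowCount() > 0) {
      batches.add(current);
    }
    return batches;
  }
}
```

With a 10-byte budget and 4-byte rows, each batch holds two rows even though the row limit would allow more. Bounding every batch this way lets a downstream operator budget for a known maximum batch size rather than for whatever a reader happens to produce.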
The material here starts with concepts, then tours the individual components. The code for each component is heavily commented, so after reading this material you should be able to get the remaining details from the code itself.
- Conceptual Overview
- Components
- Code
- Metadata
- Row Set Mechanism
- Column Accessors
- Column Readers
- Column Writers
- Result Set Loader
- Operator Framework
- Projection Framework
- Scan Framework
- Row set loader. Concept of overflow. Column states. Vector states. Overflow processing. Vector allocation. Vector cache and multi-reader model.
- Operator framework. Split of concerns. Protocol adapter. Schema change detection.
- Projection framework. Concepts. Project lists. Null columns. Implicit columns. Assembling the output batch. Column information in projection list. Recursive projection in maps. Schema smoothing and persistence.
- Mock reader. CSV reader. Easy format plugin. Concept of Parquet support.
- JSON concepts. JSON issues. Revised JSON parser. JSON semantics. Open issues. Possible opportunities.
- Future opportunities. Code generation. Plugin APIs. Reader retrofits. Fixed-size buffers.
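The row set loader topics include the concept of overflow. Roughly, overflow occurs when a column's buffer fills partway through writing a row. The Java sketch below shows one way to handle it, assuming a fixed per-column byte capacity. The values already written for the in-flight row move to a fresh "look-ahead" batch, and the full batch is handed off containing only complete rows. `ColumnarBatch` and `OverflowWriter` are hypothetical names; this is an illustration of the concept, not Drill's result set loader code.

```java
import java.util.ArrayList;
import java.util.List;

/** One batch of columnar data; each column is a list of string values. */
class ColumnarBatch {
  final List<List<String>> columns = new ArrayList<>();
  final long[] columnBytes;
  int rowCount;

  ColumnarBatch(int columnCount) {
    columnBytes = new long[columnCount];
    for (int i = 0; i < columnCount; i++) {
      columns.add(new ArrayList<>());
    }
  }
}

/**
 * Hypothetical sketch of mid-row overflow handling. Each column has a fixed
 * byte capacity. If a value would overflow its column partway through a row,
 * the values already written for that row move to a new "look-ahead" batch.
 */
class OverflowWriter {
  private final int columnCount;
  private final long columnCapacity; // assumed per-column buffer size, in bytes
  private final List<ColumnarBatch> harvested = new ArrayList<>();
  private ColumnarBatch current;

  OverflowWriter(int columnCount, long columnCapacity) {
    this.columnCount = columnCount;
    this.columnCapacity = columnCapacity;
    this.current = new ColumnarBatch(columnCount);
  }

  void writeRow(String... values) {
    for (int col = 0; col < columnCount; col++) {
      String value = values[col];
      boolean overflow = current.columnBytes[col] + value.length() > columnCapacity;
      // Roll over only if the batch has complete rows to hand off; otherwise
      // accept the oversized row rather than loop forever.
      if (overflow && current.rowCount > 0) {
        rollOver(col);
      }
      current.columns.get(col).add(value);
      current.columnBytes[col] += value.length();
    }
    current.rowCount++;
  }

  /** Moves the in-flight row's values for columns [0, overflowCol) to a new batch. */
  private void rollOver(int overflowCol) {
    ColumnarBatch next = new ColumnarBatch(columnCount);
    for (int c = 0; c < overflowCol; c++) {
      List<String> column = current.columns.get(c);
      String moved = column.remove(column.size() - 1);
      current.columnBytes[c] -= moved.length();
      next.columns.get(c).add(moved);
      next.columnBytes[c] += moved.length();
    }
    harvested.add(current); // now holds only complete rows
    current = next;
  }

  /** Harvests the final batch and returns all batches in order. */
  List<ColumnarBatch> finish() {
    if (current.rowCount > 0) {
      harvested.add(current);
    }
    return harvested;
  }
}
```

Moving the partial row instead of truncating it keeps every harvested batch rectangular: each column holds exactly `rowCount` values.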
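The operator framework topics include schema change detection. Here is a minimal Java sketch of the idea, assuming a batch schema is an ordered list of (name, type) pairs. The tracker compares each incoming schema with the last one sent downstream and reports whether it changed. The `Outcome` enum is a simplified stand-in for Drill's iterator outcomes (`OK_NEW_SCHEMA` and `OK` in `RecordBatch.IterOutcome`); `SchemaTracker` and `ColumnSchema` are hypothetical. The sketch uses a Java record, so it needs Java 16 or later.

```java
import java.util.List;

/** Simplified stand-in for Drill's iterator outcomes (assumption). */
enum Outcome { OK, OK_NEW_SCHEMA }

/** One column of a batch schema; records give value-based equals(). */
record ColumnSchema(String name, String type) {}

/** Hypothetical helper: tracks the last schema sent downstream. */
class SchemaTracker {
  private List<ColumnSchema> lastSchema; // null until the first batch arrives
  private int schemaVersion;

  /** Returns OK_NEW_SCHEMA for the first batch or any change in names, types, or order. */
  Outcome onBatch(List<ColumnSchema> schema) {
    if (!schema.equals(lastSchema)) {
      lastSchema = List.copyOf(schema);
      schemaVersion++;
      return Outcome.OK_NEW_SCHEMA;
    }
    return Outcome.OK;
  }

  int schemaVersion() { return schemaVersion; }
}
```

Keeping the comparison in one place means readers never have to signal schema changes themselves; they just produce batches, and the framework decides which outcome to report.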
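The projection framework topics include project lists and null columns. The Java sketch below resolves a project list against the columns a reader actually provides. Projected columns the reader supplies come from the reader; projected columns it lacks become null columns; unprojected reader columns are dropped. A lone `*` projects every reader column. `ProjectionResolver` is hypothetical, and the sketch assumes case-insensitive column-name matching while ignoring implicit columns and maps.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch of resolving a project list against a reader's schema. */
class ProjectionResolver {
  /** Where each output column's values come from. */
  enum Source { READER, NULL_COLUMN }

  /** Returns output columns in output order, each tagged with its source. */
  static Map<String, Source> resolve(List<String> projectList, List<String> readerColumns) {
    Map<String, Source> output = new LinkedHashMap<>();
    if (projectList.equals(List.of("*"))) {
      // Wildcard: the reader's own columns, in the reader's order.
      for (String col : readerColumns) {
        output.put(col, Source.READER);
      }
      return output;
    }
    for (String col : projectList) {
      // Assumption: column names match case-insensitively.
      boolean found = readerColumns.stream().anyMatch(c -> c.equalsIgnoreCase(col));
      output.put(col, found ? Source.READER : Source.NULL_COLUMN);
    }
    return output;
  }
}
```

For example, projecting `a, z` against a reader that provides `A, b` yields `a` from the reader, `z` as a null column, and drops `b`.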