Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

HDDS-11962. [Docs] Hive Integration #7596

Open
wants to merge 4 commits into
base: master
Choose a base branch
from
Open
Show file tree
Hide file tree
Changes from 3 commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
170 changes: 170 additions & 0 deletions hadoop-hdds/docs/content/integration/Hive.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,170 @@
---
title: Hive
weight: 4
menu:
main:
parent: "Application Integrations"
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

# Overview
Apache Hive has supported Apache Ozone since Hive 4.0. To enable Hive to work with Ozone paths, ensure that the `ozone-filesystem-hadoop3` JAR is added to the Hive classpath.

# Supported Access Protocols

Hive supports the following protocols for accessing Ozone data:

* ofs
* o3fs
* s3a

# Supported Replication Types

Hive is compatible with Ozone buckets configured with either:

* RATIS (Replication)
* Erasure Coding

# Accessing Ozone Data in Hive

Hive provides two methods to interact with data in Ozone:

* Managed Tables
* External Tables

## Managed Tables
### Configuring the Hive Warehouse Directory in Ozone
To store managed tables in Ozone, update the following properties in the `hive-site.xml` configuration file:

```xml
<property>
<name>hive.metastore.warehouse.dir</name>
<value>ofs://ozone1/vol1/bucket1/warehouse/</value>
</property>
```

### Creating a Managed Table
You can create a managed table with a standard `CREATE TABLE` statement:

```sql
CREATE TABLE myTable (
id INT,
name STRING
);
```

### Loading Data into a Managed Table
Data can be loaded into a Hive table from an Ozone location:

```sql
LOAD DATA INPATH 'ofs://ozone1/vol1/bucket1/table.csv' INTO TABLE myTable;
```

### Specifying a Custom Ozone Path
You can define a custom Ozone path for a database using the `MANAGEDLOCATION` clause:

```sql
CREATE DATABASE d1 MANAGEDLOCATION 'ofs://ozone1/vol1/bucket1/data';
```

Tables created in the database d1 will be stored under the specified path:
`ofs://ozone1/vol1/bucket1/data`
Comment on lines +81 to +86
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Managed Tables will be stored in that path, MANAGEDLOCATION is for Managed Tables, LOCATION is for External Tables

https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl#LanguageManualDDL-Create/Drop/Alter/UseDatabase


### Verifying the Ozone Path
You can confirm that Hive references the correct Ozone path using:

```sql
SHOW CREATE DATABASE d1;
```

Output Example:

```text
+----------------------------------------------------+
| createdb_stmt |
+----------------------------------------------------+
| CREATE DATABASE `d1` |
| LOCATION |
| 'ofs://ozone1/vol1/bucket1/external/d1.db' |
| MANAGEDLOCATION |
| 'ofs://ozone1/vol1/bucket1/data' |
+----------------------------------------------------+
```

## External Tables

Hive allows the creation of external tables to query existing data stored in Ozone.

### Creating an External Table
```sql
CREATE EXTERNAL TABLE external_table (
id INT,
name STRING
)
LOCATION 'ofs://ozone1/vol1/bucket1/table1';
```

* With external tables, the data is expected to be created and managed by another tool.
* Hive queries the data as-is.
* The metadata is stored under the external warehouse directory.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

data is stored in the external warehouse directory, if no LOCATION is specified when creating an external table, maybe if not needed we can drop the explanation of external table itself, that goes bit into Hive scope

* Note: Dropping an external table in Hive does not delete the associated data.

You can also have the metadata for the external tables stored in Ozone too by applying the following configuration in the `hive-site.xml` file:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Metadata is stored in HMS. only the Data is stored Storage layer, the below config will configure to store the data for External table by default in the specified path, if the path isn't explicitly specified while creating the external table.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the feedback! Please take a look at the updated doc again.

```xml
<property>
<name>hive.metastore.warehouse.external.dir</name>
<value>ofs://ozone1/vol1/bucket1/external/</value>
</property>
```

### Verifying the External Table Path
To confirm the table's metadata and location, use:

```sql
SHOW CREATE TABLE external_table;
```
Output Example:

```text
+----------------------------------------------------+
| createtab_stmt |
+----------------------------------------------------+
| CREATE EXTERNAL TABLE `external_table`( |
| `id` int, |
| `name` string) |
| ROW FORMAT SERDE |
| 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' |
| STORED AS INPUTFORMAT |
| 'org.apache.hadoop.mapred.TextInputFormat' |
| OUTPUTFORMAT |
| 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' |
| LOCATION |
| 'ofs://ozone1/vol1/bucket1/table1' |
| TBLPROPERTIES ( |
| 'bucketing_version'='2', |
| 'transient_lastDdlTime'='1734725573') |
+----------------------------------------------------+
```

# Using the S3A Protocol
In addition to ofs, Hive can access Ozone using the S3 Gateway via the S3A file system.

For more information, consult:

* The [S3 Protocol]({{< ref "interface/S3.md">}})
* The [Hadoop S3A](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html) documentation.
26 changes: 26 additions & 0 deletions hadoop-hdds/docs/content/integration/_index.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,26 @@
---
title: "Application Integrations"
menu:
main:
weight: 5
---
<!---
Licensed to the Apache Software Foundation (ASF) under one or more
contributor license agreements. See the NOTICE file distributed with
this work for additional information regarding copyright ownership.
The ASF licenses this file to You under the Apache License, Version 2.0
(the "License"); you may not use this file except in compliance with
the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
-->

{{<jumbotron>}}
Many applications can be integrated with Ozone through the Hadoop-compatible ofs interface or the S3 interface.
{{</jumbotron>}}
Loading