HDDS-11962. [Docs] Hive Integration #7596
base: master
Changes from 3 commits
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,170 @@ | ||
--- | ||
title: Hive | ||
weight: 4 | ||
menu: | ||
main: | ||
parent: "Application Integrations" | ||
--- | ||
<!--- | ||
Licensed to the Apache Software Foundation (ASF) under one or more | ||
contributor license agreements. See the NOTICE file distributed with | ||
this work for additional information regarding copyright ownership. | ||
The ASF licenses this file to You under the Apache License, Version 2.0 | ||
(the "License"); you may not use this file except in compliance with | ||
the License. You may obtain a copy of the License at | ||
http://www.apache.org/licenses/LICENSE-2.0 | ||
Unless required by applicable law or agreed to in writing, software | ||
distributed under the License is distributed on an "AS IS" BASIS, | ||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
See the License for the specific language governing permissions and | ||
limitations under the License. | ||
--> | ||
|
||
# Overview | ||
Apache Hive has supported Apache Ozone since Hive 4.0. To enable Hive to work with Ozone paths, ensure that the `ozone-filesystem-hadoop3` JAR is added to the Hive classpath. | ||
|
||
# Supported Access Protocols | ||
|
||
Hive supports the following protocols for accessing Ozone data: | ||
|
||
* ofs | ||
* o3fs | ||
* s3a | ||
|
||
# Supported Replication Types | ||
|
||
Hive is compatible with Ozone buckets configured with either: | ||
|
||
* RATIS (Replication) | ||
* Erasure Coding | ||
|
||
# Accessing Ozone Data in Hive | ||
|
||
Hive provides two methods to interact with data in Ozone: | ||
|
||
* Managed Tables | ||
* External Tables | ||
|
||
## Managed Tables | ||
### Configuring the Hive Warehouse Directory in Ozone | ||
To store managed tables in Ozone, set the following property in the `hive-site.xml` configuration file: | ||
|
||
```xml | ||
<property> | ||
<name>hive.metastore.warehouse.dir</name> | ||
<value>ofs://ozone1/vol1/bucket1/warehouse/</value> | ||
</property> | ||
``` | ||
|
||
### Creating a Managed Table | ||
You can create a managed table with a standard `CREATE TABLE` statement: | ||
|
||
```sql | ||
CREATE TABLE myTable ( | ||
id INT, | ||
name STRING | ||
); | ||
``` | ||
|
||
### Loading Data into a Managed Table | ||
Data can be loaded into a Hive table from an Ozone location: | ||
|
||
```sql | ||
LOAD DATA INPATH 'ofs://ozone1/vol1/bucket1/table.csv' INTO TABLE myTable; | ||
``` | ||
|
||
### Specifying a Custom Ozone Path | ||
You can define a custom Ozone path for a database using the `MANAGEDLOCATION` clause: | ||
|
||
```sql | ||
CREATE DATABASE d1 MANAGEDLOCATION 'ofs://ozone1/vol1/bucket1/data'; | ||
``` | ||
|
||
Tables created in the database d1 will be stored under the specified path: | ||
`ofs://ozone1/vol1/bucket1/data` | ||
|
||
### Verifying the Ozone Path | ||
You can confirm that Hive references the correct Ozone path using: | ||
|
||
```sql | ||
SHOW CREATE DATABASE d1; | ||
``` | ||
|
||
Output Example: | ||
|
||
```text | ||
+----------------------------------------------------+ | ||
| createdb_stmt | | ||
+----------------------------------------------------+ | ||
| CREATE DATABASE `d1` | | ||
| LOCATION | | ||
| 'ofs://ozone1/vol1/bucket1/external/d1.db' | | ||
| MANAGEDLOCATION | | ||
| 'ofs://ozone1/vol1/bucket1/data' | | ||
+----------------------------------------------------+ | ||
``` | ||
|
||
## External Tables | ||
|
||
Hive allows the creation of external tables to query existing data stored in Ozone. | ||
|
||
### Creating an External Table | ||
```sql | ||
CREATE EXTERNAL TABLE external_table ( | ||
id INT, | ||
name STRING | ||
) | ||
LOCATION 'ofs://ozone1/vol1/bucket1/table1'; | ||
``` | ||
|
||
* With external tables, the data is expected to be created and managed by another tool. | ||
* Hive queries the data as-is. | ||
* The metadata is stored under the external warehouse directory. | ||
Review comment: Data is stored in the external warehouse directory if no LOCATION is specified when creating an external table. If it is not needed, maybe we can drop the explanation of external tables itself; that goes a bit into Hive's scope.
|
||
* Note: Dropping an external table in Hive does not delete the associated data. | ||
|
||
You can also have the metadata for the external tables stored in Ozone too by applying the following configuration in the `hive-site.xml` file: | ||
Review comment: Metadata is stored in HMS; only the data is stored in the storage layer. The config below sets the path where data for external tables is stored by default, if the path isn't explicitly specified while creating the external table.
Reply: Thanks for the feedback! Please take a look at the updated doc again.
|
||
```xml | ||
<property> | ||
<name>hive.metastore.warehouse.external.dir</name> | ||
<value>ofs://ozone1/vol1/bucket1/external/</value> | ||
</property> | ||
``` | ||
|
||
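As a minimal sketch of the default-location behavior, assuming `hive.metastore.warehouse.external.dir` is set as shown above and using a hypothetical table name `external_default`: an external table created without a `LOCATION` clause should have its data directory placed under the configured external warehouse path, while the table metadata stays in the Hive Metastore.

```sql
-- Hypothetical table name; no LOCATION clause is given,
-- so Hive chooses the default external location.
CREATE EXTERNAL TABLE external_default (
  id INT,
  name STRING
);

-- Inspect the resolved path. The Location field is expected to point
-- under ofs://ozone1/vol1/bucket1/external/ (an assumption based on
-- the configuration above, not verified output).
DESCRIBE FORMATTED external_default;
```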
### Verifying the External Table Path | ||
To confirm the table's metadata and location, use: | ||
|
||
```sql | ||
SHOW CREATE TABLE external_table; | ||
``` | ||
Output Example: | ||
|
||
```text | ||
+----------------------------------------------------+ | ||
| createtab_stmt | | ||
+----------------------------------------------------+ | ||
| CREATE EXTERNAL TABLE `external_table`( | | ||
| `id` int, | | ||
| `name` string) | | ||
| ROW FORMAT SERDE | | ||
| 'org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe' | | ||
| STORED AS INPUTFORMAT | | ||
| 'org.apache.hadoop.mapred.TextInputFormat' | | ||
| OUTPUTFORMAT | | ||
| 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat' | | ||
| LOCATION | | ||
| 'ofs://ozone1/vol1/bucket1/table1' | | ||
| TBLPROPERTIES ( | | ||
| 'bucketing_version'='2', | | ||
| 'transient_lastDdlTime'='1734725573') | | ||
+----------------------------------------------------+ | ||
``` | ||
|
||
# Using the S3A Protocol | ||
In addition to ofs, Hive can access Ozone using the S3 Gateway via the S3A file system. | ||
|
||
For more information, consult: | ||
|
||
* The [S3 Protocol]({{< ref "interface/S3.md">}}) | ||
* The [Hadoop S3A](https://hadoop.apache.org/docs/current/hadoop-aws/tools/hadoop-aws/index.html) documentation. |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,26 @@ | ||
--- | ||
title: "Application Integrations" | ||
menu: | ||
main: | ||
weight: 5 | ||
--- | ||
<!--- | ||
Licensed to the Apache Software Foundation (ASF) under one or more | ||
contributor license agreements. See the NOTICE file distributed with | ||
this work for additional information regarding copyright ownership. | ||
The ASF licenses this file to You under the Apache License, Version 2.0 | ||
(the "License"); you may not use this file except in compliance with | ||
the License. You may obtain a copy of the License at | ||
|
||
http://www.apache.org/licenses/LICENSE-2.0 | ||
|
||
Unless required by applicable law or agreed to in writing, software | ||
distributed under the License is distributed on an "AS IS" BASIS, | ||
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. | ||
See the License for the specific language governing permissions and | ||
limitations under the License. | ||
--> | ||
|
||
{{<jumbotron>}} | ||
Many applications can be integrated with Ozone through the Hadoop-compatible ofs interface or the S3 interface. | ||
{{</jumbotron>}} |
Review comment: Managed tables will be stored in that path. `MANAGEDLOCATION` is for managed tables; `LOCATION` is for external tables. See https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl#LanguageManualDDL-Create/Drop/Alter/UseDatabase
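The distinction in this comment can be sketched as follows. This is only an illustration: the database name `d2`, the table names, and the paths under `bucket1` are hypothetical, and it assumes the deployment's defaults make a plain `CREATE TABLE` produce a managed table.

```sql
-- Hypothetical database with both clauses set explicitly.
-- LOCATION applies to external tables, MANAGEDLOCATION to managed tables.
CREATE DATABASE d2
LOCATION 'ofs://ozone1/vol1/bucket1/external/d2.db'
MANAGEDLOCATION 'ofs://ozone1/vol1/bucket1/data/d2';

USE d2;

-- Expected to be stored under the MANAGEDLOCATION path.
CREATE TABLE managed_t (id INT, name STRING);

-- No LOCATION clause, so expected to be stored under the database LOCATION path.
CREATE EXTERNAL TABLE external_t (id INT, name STRING);

-- Compare the Location field reported for each table.
DESCRIBE FORMATTED managed_t;
DESCRIBE FORMATTED external_t;
```

Comparing the two `DESCRIBE FORMATTED` results is one way to check which path each table type resolves to on a given cluster.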