HDDS-11962. [Docs] Hive Integration #7596

Open
wants to merge 4 commits into
base: master

jojochuang
Contributor

What changes were proposed in this pull request?

HDDS-11962. [Docs] Hive Integration

Please describe your PR in detail:

What is the link to the Apache JIRA

https://issues.apache.org/jira/browse/HDDS-11947

How was this patch tested?

./hadoop-ozone/dev-support/checks/docs.sh passed.

[Two screenshots attached, taken 2024-12-18 at 1:42 PM]

@jojochuang added the `documentation` label on Dec 18, 2024
@adoroszlai
Contributor

Thanks @jojochuang for the patch. The Hive doc looks good.

  • This doc is built on top of HDDS-11948 ... Will rebase once the DistCp user doc is merged.

If you would like to open separate PRs for the doc pages but not for the index page, create a PR for each content page and add the same index page to all of them, but do not add other content pages.

This way any PR can be merged first, and the rest only need the index removed, rather than requiring a specific merge order in the chain.

Contributor

@errose28 errose28 left a comment

Thanks for adding this @jojochuang. Disclaimer: I didn't actually test the steps, but I know you've done a lot of work on Ozone integration. Mostly minor comments on formatting. I appreciate how concise the content is!

hadoop-hdds/docs/content/integration/Hive.md

## Using the S3A Protocol
In addition to ofs, Hive can access Ozone via the S3 Gateway using the S3A file system. For more details, refer to the [S3 Protocol]({{< ref "interface/S3.md">}}) documentation.
Contributor

Let's also link to the Hadoop docs on S3A here, and maybe Hive's specific docs on S3A integration if they exist.

Contributor Author

Cloudera's user doc https://docs.cloudera.com/cdp-private-cloud-base/7.1.9/ozone-storing-data/topics/ozone-access-ozone-s3-using-s3a-filesystem.html is the most comprehensive and accurate one. However, I'll refrain from linking to a vendor's doc from the Apache documentation.

@jojochuang
Contributor Author

[Three screenshots attached, taken 2024-12-20 at 12:30 PM]

Change-Id: I4a90e652f4e6a8f9007252a014c2b4fb473b030f
hadoop-hdds/docs/content/integration/Hive.md
Comment on lines +85 to +90
```sql
CREATE DATABASE d1 MANAGEDLOCATION 'ofs://ozone1/vol1/bucket1/data';
```

Tables created in the database d1 will be stored under the specified path:
`ofs://ozone1/vol1/bucket1/data`
Member

Only managed tables will be stored in that path. MANAGEDLOCATION applies to managed tables, while LOCATION applies to external tables.

https://cwiki.apache.org/confluence/display/hive/languagemanual+ddl#LanguageManualDDL-Create/Drop/Alter/UseDatabase
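
To illustrate the distinction, here is a minimal sketch assuming Hive 4 DDL syntax, where `CREATE DATABASE` accepts both clauses. The `ext` and `managed` path suffixes and the table names are hypothetical and not taken from the patch:

```sql
-- LOCATION sets the default directory for external tables in the database;
-- MANAGEDLOCATION sets the default directory for managed tables.
-- (The 'ext' and 'managed' sub-paths are illustrative assumptions.)
CREATE DATABASE d1
  LOCATION 'ofs://ozone1/vol1/bucket1/ext'
  MANAGEDLOCATION 'ofs://ozone1/vol1/bucket1/managed';

-- Managed table: its data is expected under the MANAGEDLOCATION path.
CREATE TABLE d1.t_managed (id INT, name STRING);

-- External table without an explicit LOCATION: its data is expected
-- under the database LOCATION path.
CREATE EXTERNAL TABLE d1.t_external (id INT, name STRING);

-- Inspect where each table actually landed.
DESCRIBE FORMATTED d1.t_managed;
DESCRIBE FORMATTED d1.t_external;
```

Whether a plain `CREATE TABLE` yields a managed table can depend on the cluster's Hive configuration, so checking the `Location` and `Table Type` rows in the `DESCRIBE FORMATTED` output is the safer way to confirm.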

Change-Id: I6b62c2188cff644e11833546416135f75ee7e279
@jojochuang requested a review from ayushtkn on January 2, 2025 20:14
Member

@ayushtkn ayushtkn left a comment

Thanks @jojochuang. Minor comments; the rest LGTM.


* With external tables, the data is expected to be created and managed by another tool.
* Hive queries the data as-is.
* The metadata is stored under the external warehouse directory.
Member

The data (not the metadata) is stored in the external warehouse directory if no LOCATION is specified when creating an external table. If it isn't needed, maybe we can drop the explanation of external tables altogether, since that goes a bit into Hive's scope.

* The metadata is stored under the external warehouse directory.
* Note: Dropping an external table in Hive does not delete the associated data.

You can also have the metadata for the external tables stored in Ozone too by applying the following configuration in the `hive-site.xml` file:
Member

Metadata is stored in HMS (the Hive Metastore); only the data is stored in the storage layer. The config below sets the default path where the data for external tables is stored when a path isn't explicitly specified while creating the external table.
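
A rough sketch of that behavior, assuming the default external warehouse directory has already been pointed at an Ozone path (in stock Hive this is typically the `hive.metastore.warehouse.external.dir` property). The table names and the explicit path are hypothetical:

```sql
-- No LOCATION clause: the data directory is created under the
-- configured default external warehouse directory.
CREATE EXTERNAL TABLE ext_default (id INT, name STRING);

-- Explicit LOCATION: the data lives at the given path, regardless of the default.
-- (This path is an illustrative assumption.)
CREATE EXTERNAL TABLE ext_explicit (id INT, name STRING)
LOCATION 'ofs://ozone1/vol1/bucket1/ext_explicit';

-- In both cases the table definition (metadata) is recorded in HMS.
-- By default, dropping an external table removes only that metadata
-- and leaves the data files in place.
DROP TABLE ext_explicit;
```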

Contributor Author

Thanks for the feedback! Please take a look at the updated doc again.

Change-Id: Id937eef2cf13bd41fdb1afd0d6f8b8377bf6a785