-
Notifications
You must be signed in to change notification settings - Fork 517
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
HDDS-11072. Publish user-facing configs to the doc site #6916
base: master
Are you sure you want to change the base?
Conversation
…t.xml and ozone-default-generated.xml
@devabhishekpal could you please take a look at the changes, thanks. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sarvekshayr for working on this.
hadoop-hdds/common/src/main/java/com/google/protobuf/XmlToMarkdown.java
Outdated
Show resolved
Hide resolved
hadoop-hdds/common/src/main/java/com/google/protobuf/XmlToMarkdown.java
Outdated
Show resolved
Hide resolved
hadoop-hdds/common/src/main/java/com/google/protobuf/XmlToMarkdown.java
Outdated
Show resolved
Hide resolved
hadoop-hdds/common/src/main/java/com/google/protobuf/XmlToMarkdown.java
Outdated
Show resolved
Hide resolved
Thanks for working on this @sarvekshayr. The PR description mentions automation and publishing, but I don't see any automated way of running this in the PR currently. This is important to work out because it affects the implementation:
Since it looks like the configs require outputs from the Ozone build, not just the source, I'm assuming the plan for automation will look something like this:
|
Hi @sarvekshayr, thanks for adding this patch and @errose28 for your inputs.
@sarvekshayr please refer below on how you might be able to share the data between the workflows: And for pushing the commit you can check the changes that were done for dependabot if you want to add the generated markdown file to the same PR. This is the reference to the line where that step is being executed. Or you can use the bot to raise a separate PR for better tracking via GitHub actions. @errose28 @adoroszlai let me know your thoughts as well. |
Thanks for the input @devabhishekpal. Here's what I'm thinking based on the extra info you shared:
Some thoughts:
|
.github/workflows/doc.yml
Outdated
ref: 'HDDS-9225-website-v2' | ||
token: ${{ secrets.GITHUB_TOKEN }} | ||
path: ozone-site | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar hash check step needs to be aded here to check if the config file was outdated i.e some new config was added in some commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hi @sarvekshayr,
Thanks for the commit, I just had a few minor comments. Please take a look at them
Hello @errose28 @adoroszlai.
Which one do you think would be better? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates @sarvekshayr. When testing this out, you can modify the workflow to run against your forks of ozone and ozone-site to make sure the changes pass CI for the new ozone-site branch.
dev-support/ci/xml_to_md.py
Outdated
markdown.append("---\n") | ||
markdown.append("title: \"Ozone configurations\"\n") | ||
markdown.append("summary: Ozone configurations\n") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The current website doesn't require front matter. The new website will fail the build if disallowed front matter keys are used like title
and summary
. I would change this to use sidebar_label
only if the label in the docs sidebar should be different than the page title given by the first heading. sidebar_laebl
is only used by the new website. The current website will ignore this key.
I would check the hash if the file exists, and raise a PR if the file does not exist. That way the workflow is not dependent on the initial repo state it is running on in order to succeed. If we are doing PRs instead of commits we have a chance to fix it manually if needed. |
Thinking back on #6916 (comment), do you think we should also have this run for 1.4.0 branch or run for older release branches as well? |
This is a good question. I think defining a mapping between release branches and locations in the docs may be difficult to maintain. This is true whether the docs live in the ozone or ozone-site repo, since doc versions are stored in different directories. We could come up with some regex type thing but that seems fragile. There likely won't be many config changes to past release lines, so it is probably ok to leave it as just the master branch of ozone and ozone-site for now. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sarvekshayr for this patch.
I had a few questions/comments, could you please take a look?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @sarvekshayr for the patch, LGTM +1
@errose28 could you also take a look at this and check if I missed anything for the new website part?
@adoroszlai your inputs would also be greatly appreciated
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the updates @sarvekshayr. I just commented on the workflow file for now. Let's break this into the following steps for easier review of the PR:
- Get the workflow running properly.
- Modify the workflow to run on your forks of ozone and ozone-site, and post links to the runs and PRs created in your fork so we can review them.
- From the PRs we will have local previews of the pages generated for the current and new website. This will make it easier to review the python script.
.github/workflows/ci.yml
Outdated
- name: Extract the source tarball | ||
run: | | ||
mkdir -p ozone-bin/extracted | ||
tar -xzf ozone-bin/ozone-*-SNAPSHOT.tar.gz -C ozone-bin/extracted |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The workflow should succeed for commits that represent released versions as well. These will not have SNAPSHOT in their name.
.github/workflows/ci.yml
Outdated
git config --global user.email '[email protected]' | ||
git add hadoop-hdds/docs/content/tools/Configurations.md | ||
git commit -m "[Auto] Update Configurations.md from $GITHUB_SHA" | ||
git push origin HEAD:master |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this is adding the commit directly to master. This should actually go to a source branch based off the latest master with a unique name. In the following PR step, we should explicitly specify the source and target branches for the PR. I think we will have to manually delete the branch when the PR is merged. I'm not sure how to do something like dependabot where the branch gets automatically deleted when the PR is merged.
.github/workflows/ci.yml
Outdated
- name: Copy MD file to ozone-site repository | ||
if: env.hashes_differ == 'true' | ||
run: | | ||
TARGET_DIR=$(find ozone-site/docs -type d -name '*-administrator-guide' -exec find {} -type d -name '*-configuration' -print -quit \;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't think we need the recursive search provided by find, and it might actually be more predictable to leave it out. I think we can simplify this:
TARGET_DIR=$(find ozone-site/docs -type d -name '*-administrator-guide' -exec find {} -type d -name '*-configuration' -print -quit \;) | |
TARGET_DIR=$(ls docs/*-administrator-guide/*-configuration) |
.github/workflows/ci.yml
Outdated
echo "Target directory not found." | ||
exit 1 | ||
fi | ||
cp hadoop-hdds/docs/content/tools/Configurations.md "$TARGET_DIR/configurations.md" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This page is currently called "Configuration Key Appendix" in the new website. We should name the file <number>-appendix.md
. This will give it the URL <domain>/docs/administrator-guide/configuration/appendix
. The current set of changes would produce a more confusing URL: <domain>/docs/administrator-guide/configuration/configuration
.
The number at the beginning of the name is required by the new website's CI to define an absolute ordering of pages in the sdebar. We probably want this to be the last page in the section always. We could list all the current numbers and add 1 to make it the next, but there is actually no requirement that the numbers are consecutive, they work more like weights. To keep this simple I would suggest naming this file 99-appendix.md
so it is always last assuming there are less than 99 immediate children of the configuration
directory. This should always be true otherwise the website organization would be a mess, but the PR gives us a change to check this assumption if needed. Let's add a comment to the script here explaining this.
.github/workflows/ci.yml
Outdated
cd ozone-site | ||
git config --global user.name 'Github Actions' | ||
git config --global user.email '[email protected]' | ||
ADD_FILE=$(find docs -type d -name '*-administrator-guide' -exec find {} -type d -name '*-configuration' -print -quit \;) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We should save the absolute path to the file as a variable in the previous step instead of recomputing it.
.github/workflows/ci.yml
Outdated
ADD_FILE=$(find docs -type d -name '*-administrator-guide' -exec find {} -type d -name '*-configuration' -print -quit \;) | ||
git add "$ADD_FILE/configurations.md" | ||
git commit -m "[Auto] Update configurations.md page from ozone $GITHUB_SHA" | ||
git push origin HEAD:HDDS-9225-website-v2 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment from before about source and target branches applies here as well.
.github/workflows/ci.yml
Outdated
GH_TOKEN: ${{ secrets.GITHUB_TOKEN }} | ||
run: | | ||
cd ozone-site | ||
gh pr create --title "[Auto] Update configurations.md page" --body "This is an automated pull request from the ozone repository." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Similar comment about PR description from before applies here as well.
I was talking with @sarvekshayr, who was testing this out on her forks of ozone and ozone-site and getting permission errors when ozone tried to raise a PR to ozone-site, since it would need to create a new branch for the PR in ozone-site. After some research, we will need a personal access token (PAT) for apache/ozone to have permissions to create branches needed for PRs to apache/ozone-site. This needs ASF infra to approve, but it looks like there's already precedent for other projects doing it so we should be able to get approval. See INFRA-25629, INFRA-23946, INFRA-23220, and INFRA-23044 for some examples. @sarvekshayr will test the PAT approach on her fork, where she has permissions to create them. Once we have a POC running on the forks we can raise an infra ticket to get a PAT for ozone-site and store it as a secret in the Ozone repo. This workflow can then reference the secret. |
Thanks for the continued work on this @sarvekshayr. Looks like the PAT approach is working well on your fork. I'll file an infra ticket to get a PAT for the ozone-site repo, and we can continue work here in the mean time. After reviewing the auto generated PRs, here are some high level suggestions: PR Message
Overall I'm thinking the PR message could look like this, where HDDS-1234 is the hypothetical Ozone change that made the config modification: source branch: config-doc-update-from-22ddfb951a HDDS-1234. Update configurations.md page with changes from HDDS-1234What changes were proposed in this pull request?This is an automated pull request triggered by the workflow build-branch run from test-branch-2. Please delete the config-doc-update-from-22ddfb951a branch after this PR is merged or closed. What is the link to the Apache Jira?How was this patch tested?Reviewers should manually verify the correctness of this change. Webpage
Overall I'm thinking the webpage could look like this: Unrendered:---
sidebar_label: Appendix
---
# Configuration Key Appendix
This page provides a comprehensive overview of all configuration keys available in Ozone.
## Configuration Keys
| **Name** | **Value** | **Tags** | **Description** |
|-|-|-|-|
| `hadoop.hdds.db.rocksdb.WAL_size_limit_MB` | 0MB | `OM`<br/>`SCM`<br/>`DATANODE` | The total size limit of WAL log files. Once the total log file size exceeds this limit, the earliest files will be deleted.Default 0 means no limit. |
| `hadoop.hdds.db.rocksdb.WAL_ttl_seconds` | 1200 | `OM`<br/>`SCM`<br/>`DATANODE` | The lifetime of WAL log files. Default 1200 seconds. |
| `hdds.container.balancer.move.replication.timeout` | 50m | `BALANCER` | The amount of time to allow a single container's replication from source to target as part of container move. For example, if `hdds.container.balancer.move.timeout` is 65 minutes, then out of those 65 minutes 50 minutes will be the deadline for replication to complete. | Rendered:Configuration Key AppendixThis page provides a comprehensive overview of all configuration keys available in Ozone. Configuration Keys
|
I filed INFRA-26150 for adding a personal access token. |
Hi @sarvekshayr can you temporarily modify this change to use the |
@errose28 @sarvekshayr |
I think the biggest difference is that the docs content that is merged into the current website is automatically generated and we assume it does not need reviews. However for this config table I think we want the PR to be automatically created but an actual dev to verify the content lines up with the change that it came from, CI passes, etc before publishing. So the question is: should ozone-site pull the changes and create a PR, or should ozone push the changes and create a PR. Since changes are being reviewed, it would be good to have a way to trace back which ozone changes led to which config changes needed in the docs, and have one doc update PR per config change commit in Ozone. Easiest way I see to do this is to use a push model, so Ozone can attach information about the commit that triggered the push in the ozone-site pull request. |
In that case current approach is fine, I didn't know manual review was desired. |
What changes were proposed in this pull request?
This PR introduces a mechanism to automatically generate a markdown (.md) file from the ozone-default.xml and ozone-default-generated.xml files, with the entries sorted by configuration keys. The generated markdown file is then published to a new page under the documentation site.
What is the link to the Apache JIRA
https://issues.apache.org/jira/browse/HDDS-11072
How was this patch tested?
Verified the output and format of the markdown (.md) file.