HDDS-11898. design doc leader side execution #7583

422 changes: 422 additions & 0 deletions hadoop-hdds/docs/content/design/leader-execution/leader-execution.md

Large diffs are not rendered by default.

92 changes: 92 additions & 0 deletions hadoop-hdds/docs/content/design/leader-execution/obs-locking.md
@@ -0,0 +1,92 @@
---
title: Ozone Granular locking for OBS bucket
summary: Granular locking for OBS bucket
date: 2025-01-06
jira: HDDS-11898
status: draft
author: Sumit Agrawal
---
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->

# OBS locking

The OBS case involves only the volume, bucket, and key, so locking is simpler.

For example, a key commit operation needs:
1. Volume: no lock required (same as the existing behavior)
2. Bucket: read lock
3. Key: write lock

There will be two locks:
1. BucketStripLock: locks bucket operations
2. KeyStripLock: locks key operations

**Note**: When an operation locks multiple keys (such as deleting multiple keys or a rename), the locks must be taken in a fixed order, i.e. the striped locking order, to avoid deadlock.

Striped locking order:
- A stripe lock is obtained over a hash bucket.
- All keys are ordered by their hash bucket.
- The locks are then taken in that sequence.
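
The ordering rule above can be sketched as follows. This is a minimal illustration, not the proposed implementation: the class `KeyStripeLockSketch`, its method names, and the use of `String.hashCode()` to pick a hash bucket are all assumptions made for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.TreeSet;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch of a key stripe lock; not the actual Ozone class.
class KeyStripeLockSketch {
  private final ReentrantReadWriteLock[] stripes;

  KeyStripeLockSketch(int stripeCount) {
    stripes = new ReentrantReadWriteLock[stripeCount];
    for (int i = 0; i < stripeCount; i++) {
      stripes[i] = new ReentrantReadWriteLock();
    }
  }

  // Maps a key name to its hash bucket (stripe index). Assumed hash function.
  int stripeIndex(String key) {
    return Math.floorMod(key.hashCode(), stripes.length);
  }

  // Returns the distinct stripe indexes for the keys, in ascending order.
  // Two keys in the same hash bucket share one stripe, so it appears once.
  List<Integer> lockOrder(List<String> keys) {
    TreeSet<Integer> ordered = new TreeSet<>();
    for (String key : keys) {
      ordered.add(stripeIndex(key));
    }
    return new ArrayList<>(ordered);
  }

  // Takes the write locks stripe by stripe in ascending index order.
  List<Integer> writeLockAll(List<String> keys) {
    List<Integer> order = lockOrder(keys);
    for (int index : order) {
      stripes[index].writeLock().lock();
    }
    return order;
  }

  // Releases the stripes in reverse order of acquisition.
  void writeUnlockAll(List<Integer> held) {
    for (int i = held.size() - 1; i >= 0; i--) {
      stripes[held.get(i)].writeLock().unlock();
    }
  }
}
```

Because every caller sorts by stripe index before locking, two requests that touch the same keys in a different argument order still acquire the stripes in the same sequence, which is what rules out a lock cycle.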

## OBS operation
A bucket read lock is taken by default.

For key operations in OBS buckets, the following concurrency control is proposed:

| API Name | Locking Key | Notes |
|-------------------------|---------------------------------------|-----------------------------------------------------------------------------------------------------------|
| CreateKey               | `No Lock` (Only bucket read lock)      | Clients can create keys in parallel in the open key table; each open key entry is independent of the others |
| CommitKey               | WriteLock: Key Name                    | Only one commit can run at a time for a given key name; without locking, OM can leave dangling blocks |
| InitiateMultiPartUpload | `No Lock` (Only bucket read lock)      | No lock is required, as the key is created with an upload ID and uploads can be initiated in parallel |
| CommitMultiPartUpload   | WriteLock: PartKey Name                | Only one part can be committed at a time for a given part key name; without locking, OM can leave dangling blocks |
| CompleteMultiPartUpload | WriteLock: Key Name                    | Only one upload can be completed at a time for a given key name; without locking, OM can leave dangling blocks |
| AbortMultiPartUpload    | WriteLock: Key Name                    | Prevents abort and commit from running in parallel |
| DeleteKey               | WriteLock: Key Name                    | Only one delete can run at a time for a given key name; without locking, the write to the DB can fail |
| RenameKey               | WriteLock: sort(Key Name1, Key Name 2) | Only one rename can run at a time for a given key name; without locking, OM can leave dangling blocks |
| SetAcl                  | WriteLock: Key Name                    | Only one update can run at a time for a given key name |
| AddAcl                  | WriteLock: Key Name                    | Only one update can run at a time for a given key name |
| RemoveAcl               | WriteLock: Key Name                    | Only one update can run at a time for a given key name |
| AllocateBlock           | WriteLock: Key Name                    | Only one update can run at a time for a given key name |
| SetTimes                | WriteLock: Key Name                    | Only one update can run at a time for a given key name |

Batch operations:
1. deleteKeys: the batch is divided across multiple threads in the Execution Pool, which call DeleteKey in parallel.
2. RenameKeys: this API is deprecated, but for compatibility the batch is divided across multiple threads in the Execution Pool, which call RenameKey in parallel.

Atomicity is not guaranteed for the batch APIs above; this matches the behavior from the S3 perspective.
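
A minimal sketch of how a batch delete could fan out to per-key calls is shown below. The class `BatchDeleteSketch`, the `deleteKey` predicate (standing in for the single-key DeleteKey path), and the fixed thread pool are assumptions for illustration only; the design does not specify these details.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Predicate;

// Hypothetical sketch: a batch delete split into independent per-key deletes.
class BatchDeleteSketch {
  // Runs the single-key delete for every key in parallel and reports a
  // per-key outcome. A failure of one key does not roll back the others,
  // so the batch as a whole is not atomic.
  static Map<String, Boolean> deleteKeys(List<String> keys,
      Predicate<String> deleteKey, int threads) {
    ExecutorService pool = Executors.newFixedThreadPool(threads);
    try {
      Map<String, Future<Boolean>> pending = new LinkedHashMap<>();
      for (String key : keys) {
        Callable<Boolean> task = () -> deleteKey.test(key);
        pending.put(key, pool.submit(task));
      }
      Map<String, Boolean> results = new LinkedHashMap<>();
      for (Map.Entry<String, Future<Boolean>> entry : pending.entrySet()) {
        try {
          results.put(entry.getKey(), entry.getValue().get());
        } catch (InterruptedException e) {
          Thread.currentThread().interrupt();
          results.put(entry.getKey(), false);
        } catch (ExecutionException e) {
          // The per-key delete threw; only this key is marked as failed.
          results.put(entry.getKey(), false);
        }
      }
      return results;
    } finally {
      pool.shutdown();
    }
  }
}
```

Each per-key call would take its own key write lock, so the batch never holds more than one key lock per thread at a time.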

## Bucket and volume locking required for concurrent OBS key handling

### Volume Operation

| API Name | Locking Key | Notes |
|--------------|-------------------|-------------------------------------------------------------------------------------------------------------------------|
| CreateVolume | Volume Write lock | Volume-level write lock to avoid parallel volume creation and deletion |
| DeleteVolume | Volume Write lock | Volume-level write lock to avoid parallel volume creation and deletion |
| SetProperty  | Volume Write lock | Volume property update (such as owner and quota); the lock avoids parallel operations on the volume |
| SetAcl       | Volume Write lock | Volume ACL update; the lock avoids parallel operations on the volume |
| AddAcl       | Volume Write lock | Volume ACL update; the lock avoids parallel operations on the volume |
| RemoveAcl    | Volume Write lock | Volume ACL update; the lock avoids parallel operations on the volume |

### Bucket Operation

| API Name | Locking Key | Notes |
|--------------|-----------------------------------|--------------------------------------------------------------------------------------------------------------------------|
| CreateBucket | Volume Read and Bucket Write lock | Volume read lock plus bucket-level write lock to avoid parallel bucket creation and deletion |
| DeleteBucket | Bucket Write lock                 | Bucket-level write lock to avoid parallel bucket creation and deletion |
| SetProperty  | Bucket Write lock                 | Bucket property update (such as owner and quota); the lock avoids any operation happening inside the bucket at bucket and key level |
| SetAcl       | Bucket Write lock                 | Bucket ACL update; blocks any operation on the bucket and its keys |
| AddAcl       | Bucket Write lock                 | Bucket ACL update; blocks any operation on the bucket and its keys |
| RemoveAcl    | Bucket Write lock                 | Bucket ACL update; blocks any operation on the bucket and its keys |
Binary file not shown.
@@ -0,0 +1,75 @@
---
title: Commit key for OBS bucket request flow
summary: Commit key for OBS bucket request flow steps for leader side execution
date: 2025-01-06
jira: HDDS-11898
status: draft
author: Sumit Agrawal
---
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
# OBS Commit key flow

Utility classes:
- DbChangeRecorder: record db changes
- ExecutionContext: provides index and other resources

The hsync feature includes:
- hsync
- hsync recovery
- hsync overwrite handling


`class OMKeyCommitObsExecutor`


## preprocess

- validate the key format and reserved keywords
- normalize the key
- capture the original bucket and resolve the bucket (if different)
- validate hsync, hsync recovery, and the hsync feature


## authorize

ACL validation for the volume, the resolved and original buckets, and the key (via Ranger or native ACLs).

## lock
Read lock for bucket, write lock for key

## unlock
unlock bucket and key
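
The lock and unlock steps can be pictured as a nested try/finally, with the bucket read lock taken first and released last. This is only a sketch under simplifying assumptions: `CommitKeyLockingSketch` is a hypothetical class, and it uses one lock object per level instead of the striped locks described in the locking document.

```java
import java.util.concurrent.locks.ReentrantReadWriteLock;
import java.util.function.Supplier;

// Hypothetical sketch of the lock/unlock bracket around the process step.
class CommitKeyLockingSketch {
  private final ReentrantReadWriteLock bucketLock = new ReentrantReadWriteLock();
  private final ReentrantReadWriteLock keyLock = new ReentrantReadWriteLock();

  // Runs the process step under a bucket read lock and a key write lock.
  <T> T withCommitLocks(Supplier<T> process) {
    bucketLock.readLock().lock();
    try {
      keyLock.writeLock().lock();
      try {
        return process.get();
      } finally {
        keyLock.writeLock().unlock();
      }
    } finally {
      // Released last, mirroring the acquisition order.
      bucketLock.readLock().unlock();
    }
  }

  boolean isKeyWriteLocked() {
    return keyLock.isWriteLocked();
  }

  int bucketReadHolds() {
    return bucketLock.getReadLockCount();
  }
}
```

The finally blocks ensure both locks are released even when the process step throws.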

## process

- check whether the bucket changed after the bucket lock was taken
- retrieve the old key from keyTable
- get the key from openKeyTable
- validate hsync feature flags from the old key and the key commit args (Note: hsync is not currently used for the OBS flow)
- validate the key overwrite feature
- create the new key from the open key and the old key (for overwrite)
- prepare and validate quota changes, and record them in the ChangeRecorder
- record the key table addition and the open key table deletion in the ChangeRecorder
- record uncommitted blocks and blocks removed by an overwrite as deleteTable entries in the ChangeRecorder
- update metrics and the audit log
- prepare the response and return
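
The table updates in the steps above can be sketched as changes collected in a recorder rather than written directly. Everything here is hypothetical: `ChangeRecorderSketch` merely stands in for the DbChangeRecorder utility, the tables are plain in-memory maps, and validation, quota, hsync, and metrics steps are omitted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for DbChangeRecorder: collects changes in order
// instead of applying them to the DB.
class ChangeRecorderSketch {
  final List<String> changes = new ArrayList<>();

  void put(String table, String key) {
    changes.add("PUT " + table + " " + key);
  }

  void delete(String table, String key) {
    changes.add("DELETE " + table + " " + key);
  }
}

// Hypothetical sketch of the table-related part of the commit process step.
class CommitKeyProcessSketch {
  // key name -> block ids of the committed key (simplified value type)
  final Map<String, List<Long>> keyTable = new HashMap<>();
  // open key name -> block ids written so far (simplified value type)
  final Map<String, List<Long>> openKeyTable = new HashMap<>();

  ChangeRecorderSketch commit(String keyName, String openKeyName) {
    ChangeRecorderSketch recorder = new ChangeRecorderSketch();
    List<Long> oldBlocks = keyTable.get(keyName);          // old key, if any
    List<Long> openBlocks = openKeyTable.get(openKeyName); // key being committed
    if (openBlocks == null) {
      throw new IllegalStateException("open key not found: " + openKeyName);
    }
    recorder.put("keyTable", keyName);
    recorder.delete("openKeyTable", openKeyName);
    if (oldBlocks != null) {
      // Overwrite: blocks of the old key are queued for removal.
      recorder.put("deleteTable", keyName);
    }
    return recorder;
  }
}
```

Recording the changes instead of applying them keeps the process step free of DB side effects until the recorded changes are applied.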

# Old Flow comparison changes
Compared to the old flow, the following cases are removed:
1. open key re-prepare with overwrite case

# Testability

Behavior cases covered by the existing test code can be rewritten as new test classes with validation.
@@ -0,0 +1,69 @@
---
title: Create key for OBS bucket request flow
summary: Create key for OBS bucket request flow steps for leader side execution
date: 2025-01-06
jira: HDDS-11898
status: draft
author: Sumit Agrawal
---
<!--
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License. See accompanying LICENSE file.
-->
# OBS Create key flow

Utility classes:
- DbChangeRecorder: record db changes
- ExecutionContext: provides index and other resources

`class OMKeyCreateObsExecutor`

## preprocess

- validate the key format and reserved keywords
- normalize the key
- capture the original bucket and resolve the bucket (if different)


## authorize

ACL validation for the volume, the resolved and original buckets, and the key (via Ranger or native ACLs).

## lock
- Read lock for bucket
- key lock is not required as parallel key creation is allowed

## unlock
unlock bucket

## process

- check whether the bucket changed after the bucket lock was taken
- validate encryption info when the bucket has it but the key does not
- retrieve encryption info (MPU / normal case)
- prepare the key info
- get the replication config
- generate the object ID from the index
- add block info (if not MPU)
- validate quota at this point
- add the open key to the ChangeRecorder
- update metrics and the audit log
- prepare the response and return
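
A minimal sketch of the quota check and open key recording from the steps above follows. The class `CreateKeyProcessSketch`, the byte-based quota check, and the `keyName/clientId` naming of the open key entry are assumptions for illustration; the recorder is reduced to a list of strings.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of two create-key process steps: quota validation
// and recording the open key entry.
class CreateKeyProcessSketch {
  // Checks the request against the usage known at this moment; nothing is
  // reserved here, so the check is repeated when the key is committed.
  static void validateQuota(long usedBytes, long requestedBytes, long quotaBytes) {
    if (usedBytes + requestedBytes > quotaBytes) {
      throw new IllegalStateException("bucket quota exceeded");
    }
  }

  // Assumed naming: a per-client suffix gives every create request its own
  // open key entry, which is why no key lock is needed.
  static String openKeyName(String keyName, long clientId) {
    return keyName + "/" + clientId;
  }

  // Validates quota, then records the new open key in the caller's recorder.
  static String create(String keyName, long clientId, long usedBytes,
      long requestedBytes, long quotaBytes, List<String> recorder) {
    validateQuota(usedBytes, requestedBytes, quotaBytes);
    String openKey = openKeyName(keyName, clientId);
    recorder.add("PUT openKeyTable " + openKey);
    return openKey;
  }
}
```

Two clients creating the same key name end up with distinct open key entries, so neither request needs to wait for the other.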

# Old Flow comparison changes
Compared to the old flow, the following cases are removed:
1. retrieving the old key if it exists: not required, as the overwrite happens during commit
2. key-rewrite validation: not required at this point, as commit already performs this validation

# Testability

Behavior cases covered by the existing test code can be rewritten as new test classes with validation.