Add LocalTmpStorage for all services #17588

Merged
merged 1 commit into from
Jan 7, 2025

rohangarg
Member

Provides an interface for obtaining local temporary storage in all services. In future changes, code can directly inject a LocalTmpStorage object and use its temporary directory as a scratch pad.
Besides consolidating temp-directory handling and making it possible to track all temp data in the future, this also provides a common interface for all work that touches local temporary storage.
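
As a rough illustration of the intended usage, here is a minimal sketch. It assumes the interface has the single `File getTmpDir()` method quoted later in the review; `SpillWriter` and its `spill` method are hypothetical names invented for this example, and plain constructor injection stands in for whatever the injector does in the real services.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;

// Assumed shape of the interface, based on the signature quoted in the review.
interface LocalTmpStorage
{
  File getTmpDir();
}

// Hypothetical consumer: it receives the storage through its constructor
// and only ever asks it for the scratch location.
class SpillWriter
{
  private final LocalTmpStorage tmpStorage;

  SpillWriter(LocalTmpStorage tmpStorage)
  {
    this.tmpStorage = tmpStorage;
  }

  // Writes the bytes to a new scratch file under the configured tmp dir and returns that file.
  File spill(byte[] data)
  {
    try {
      File dir = tmpStorage.getTmpDir();
      // Assumption: the consumer makes sure the directory exists before using it.
      Files.createDirectories(dir.toPath());
      File out = File.createTempFile("spill-", ".bin", dir);
      Files.write(out.toPath(), data);
      return out;
    }
    catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Because the interface has a single method, a test can supply a location with a lambda such as `() -> new File("/some/test/dir")` instead of wiring up a full service.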

@rohangarg rohangarg force-pushed the test_dir_config branch 2 times, most recently from 2864ed2 to 7888593 Compare December 18, 2024 20:23
Contributor

@imply-cheddar imply-cheddar left a comment

Needs tests for coverage. Add those before merging please.

@LazySingleton
public LocalTmpStorage getLocalTmpStorage()
{
File tmpDir = new File(taskDirPath, "tmp");
Member Author

@cheddar: Should we create the temporary storage per attempt, or share it across attempts so that temp data can be reused?

Contributor

Let's share it. The old default for cases where this would've been used is just java.io.tmpdir which was shared as well, so it shouldn't cause problems...
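
To make the "share it" option concrete, a minimal sketch follows. It assumes the tmp dir is derived only from the task directory, as in the `new File(taskDirPath, "tmp")` line under review; the `TaskTmpStorage` class, the `forTask` method, and the eager `mkdirs()` call are assumptions made for illustration, not the code in this PR.

```java
import java.io.File;

// Assumed shape of the interface, based on the signature quoted in the review.
interface LocalTmpStorage
{
  File getTmpDir();
}

// Hypothetical helper showing a tmp dir that is keyed by task, not by attempt.
class TaskTmpStorage
{
  // The location depends only on the task directory, so every attempt
  // of the same task resolves to the same tmp dir.
  static LocalTmpStorage forTask(String taskDirPath)
  {
    File tmpDir = new File(taskDirPath, "tmp");
    // Assumption: create the directory eagerly and fail loudly if that is impossible.
    if (!tmpDir.isDirectory() && !tmpDir.mkdirs() && !tmpDir.isDirectory()) {
      throw new IllegalStateException("Cannot create tmp dir: " + tmpDir);
    }
    return () -> tmpDir;
  }
}
```

A per-attempt variant would have to add an attempt identifier to the path, which is exactly what this thread decides against.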


import java.io.File;

public interface LocalTmpStorage
Member

This is pretty similar to TempDirProducer; to serve this use case, maybe that class could be enhanced with lazy init and a getRoot?

Contributor

TempDirProducer appears to have semantics around cleaning up with close(). That class is different from this one, which is basically just a Config. It could be called a Config, but Rohan doesn't like the word config, so it got this name instead.

LocalTmpStorage just delivers a location, nothing more, nothing less. It provides something that can be injected to get at a system-configured tmp storage location, and it is not intended to do anything beyond that. It probably deserves javadoc stating that this class shouldn't do anything other than deliver configuration, since it's just there to be an injectable configuration object.

It could absolutely make sense to have TempDirProducer depend on a LocalTmpStorage in order to get the tmp dir location that it's supposed to use.
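
A minimal sketch of that layering, under stated assumptions: `ScratchDir` below is a hypothetical stand-in and is not Druid's TempDirProducer, whose actual API is not shown in this thread. The point is only that the closeable class owns creation and cleanup, while LocalTmpStorage stays a pure location provider.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.stream.Stream;

// Assumed shape of the interface, based on the signature quoted in the review.
interface LocalTmpStorage
{
  File getTmpDir();
}

// Hypothetical class: owns one sub-directory under the configured root and deletes it on close().
class ScratchDir implements AutoCloseable
{
  private final Path dir;

  ScratchDir(LocalTmpStorage storage, String prefix)
  {
    try {
      Path root = storage.getTmpDir().toPath();
      Files.createDirectories(root);
      this.dir = Files.createTempDirectory(root, prefix);
    }
    catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }

  File getDir()
  {
    return dir.toFile();
  }

  @Override
  public void close()
  {
    // Reverse path order visits children before their parents, so each delete sees an empty directory.
    try (Stream<Path> paths = Files.walk(dir)) {
      paths.sorted(Comparator.reverseOrder()).forEach(p -> p.toFile().delete());
    }
    catch (IOException e) {
      throw new UncheckedIOException(e);
    }
  }
}
```

Only the sub-directory is removed on close; the root returned by `getTmpDir()` is left alone, which keeps lifecycle concerns out of the configuration object.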

*/
File getTmpDir();

class DefaultLocalTmpStorageProvider implements Provider<LocalTmpStorage>
Member

It doesn't seem like the system will clean up these files. Wouldn't that fill up the disk and create garbage?

Contributor

The tmpdir provided from this is just a location. Whether things are cleaned up or not is a question of the implementation. In general, most of the code that deals with tmp files cleans them up after itself and if it's not cleaning up after itself, that should either be intentional and have a design reason or it is a bug in the code that's dealing with the file, not a bug in the code that's delivering the location of the directory.
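
For illustration, a consumer that cleans up after itself might look like the sketch below. `LineCounter` and `countLines` are hypothetical names made up for this example; the only thing taken from the PR is the `File getTmpDir()` method.

```java
import java.io.File;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;

// Assumed shape of the interface, based on the signature quoted in the review.
interface LocalTmpStorage
{
  File getTmpDir();
}

// Hypothetical consumer that treats the tmp dir purely as a location.
class LineCounter
{
  // Stages the text in a scratch file, counts its lines, and always deletes the file afterwards.
  static long countLines(LocalTmpStorage storage, String text)
  {
    File scratch = null;
    try {
      File dir = storage.getTmpDir();
      Files.createDirectories(dir.toPath());
      scratch = File.createTempFile("count-", ".txt", dir);
      Files.write(scratch.toPath(), text.getBytes(StandardCharsets.UTF_8));
      return Files.readAllLines(scratch.toPath(), StandardCharsets.UTF_8).size();
    }
    catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    finally {
      // Cleanup is the caller's job, not the location provider's.
      if (scratch != null) {
        scratch.delete();
      }
    }
  }
}
```

If a consumer skips the `finally` block, the leak is a bug in that consumer, which matches the division of responsibility described above.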

@srv-rr-github-token

Suggested pull request title: feat: introduce LocalTmpStorage interface for unified temporary storage management

I appreciate the initiative to consolidate and standardize the temporary storage management across all services in Apache Druid. The code is well-structured, and the interface design is clean and straightforward. The implementation across different services is consistent, which is excellent.

However, there are a few areas that could be improved to make this change even better:

  1. In LocalTmpStorage.java:

    • Consider adding documentation about the lifecycle management of the temporary directory
    • Add a method for cleanup of temporary files (like cleanup() or delete())
    • Consider adding javadoc explaining if the directory is unique per service instance
  2. In DefaultLocalTmpStorageProvider:

    • Add error handling for cases where temp directory creation fails
    • Consider adding a configuration option for the base temporary directory path
    • Add logging when temp directories are created
  3. In CliPeon.java:

    • The implementation differs from other services (direct implementation vs. using DefaultLocalTmpStorageProvider)
    • Consider using the same pattern as other services for consistency
  4. General suggestions:

    • Add unit tests for the LocalTmpStorage implementation
    • Consider adding documentation about the thread safety of the interface
    • Add monitoring/metrics for temporary storage usage

The change appears to be a good foundation, but I recommend addressing at least the lifecycle management and error handling aspects before merging. The consistent implementation across services and the clear interface are definitely steps in the right direction.

This comment was added by our PR Review Assistant Bot. Please kindly acknowledge that while we're doing our best to keep these comments up to very high standards, they may occasionally be incorrect. Suggestions offered by the Bot are only intended as points for consideration and no statements by this bot alone can be considered grounds for merging of any pull request. Remember to seek a review from a human co-worker.

@srv-rr-github-token

Suggested pull request title: feat: add LocalTmpStorage interface for unified temporary storage access

I appreciate the effort to create a unified interface for managing temporary storage across all Druid services. The code is well-structured and follows good Java practices with clear interface definition and implementation.

However, there are a few areas that could be improved to make this change more robust and maintainable:

  1. In LocalTmpStorage.java:

    • Consider adding documentation about the lifecycle of the temporary directory (when it's created, when it should be cleaned up)
    • Add a method for cleanup/deletion of the temporary directory
    • Consider adding checks for directory permissions and existence in the DefaultLocalTmpStorageProvider
  2. In service implementations:

    • The implementation pattern is inconsistent between CliPeon (using a custom provider method) and other services (using DefaultLocalTmpStorageProvider)
    • Consider adding proper error handling when temp directory creation fails in CliPeon's implementation
  3. Testing:

    • No tests have been added to verify the functionality of the LocalTmpStorage implementation
    • Consider adding unit tests for the DefaultLocalTmpStorageProvider
    • Consider adding integration tests to verify temp directory behavior in different services
  4. Documentation:

    • Add javadoc comments explaining the purpose and usage of LocalTmpStorage
    • Consider adding examples of proper usage in the interface documentation

Please address these points and add appropriate tests before merging. The overall direction of the change is good, but it needs additional safeguards and documentation to ensure reliable operation.

@pawel-dabro

Please disregard the bot comments. I used this PR as an example for testing a bot reviewer, and these comments were never meant to be published.

@rohangarg rohangarg force-pushed the test_dir_config branch 2 times, most recently from 1a65314 to b27c538 Compare January 6, 2025 17:16
@rohangarg rohangarg merged commit 951792a into apache:master Jan 7, 2025
77 checks passed
@rohangarg rohangarg deleted the test_dir_config branch January 7, 2025 11:22
@adarshsanjeev adarshsanjeev added this to the 32.0.0 milestone Jan 16, 2025
6 participants