Bib 809 machine translation scoring #620
base: master
src/Aquifer.API/Endpoints/Resources/Content/Publish/Endpoint.cs
src/Aquifer.Jobs/Subscribers/ScoreResourceContentVersionSimilarityMessageSubscriber.cs
tests/Aquifer.Common.UnitTests/Utilities/StringSimilarityUtilitiesTests.cs
src/Aquifer.Common/Messages/Models/ResourceContentVersionSimilarityComparisonTypes.cs
Set cancellation token to none on Publisher endpoint
resourceContentVersionMachineTranslationId felt a bit redundant as all resources are 'resource content versions'. So, I added 'translation' to be consistent with DB naming
src/Aquifer.API/Endpoints/Resources/Content/Publish/Endpoint.cs
src/Aquifer.Jobs/Subscribers/ResourceContentVersionSimilarityMessageSubscriber.cs
Make methods private on StringSimilarityUtilities.cs
Update logger statement IDs
Move publisher services to that new class
Bump @jwinston-bn @NateMerritt
        await _dbContext.SaveChangesAsync(ct);
    }

    private async Task<ResourceContentVersionSimilarityScoreEntity> GenerateResourceContentVersionSimilarityScoreEntity(
I generally would like to have the Async suffix on any async methods for consistency (like ProcessAsync above).
As a personal preference, I wouldn't typically include types in method names where they can be implied by the return type. This is kind of like if the score were an int and the method was named GenerateSimilarityScoreInt. I would just call it GenerateSimilarityScoreAsync.
return await dbContext
    .ResourceContentVersionMachineTranslations
    .AsTracking()
    .Join(
I would like to see if we can get this through the object references rather than the Join operator, for example:
x => x.ResourceContentVersion.ResourceContentVersionSnapshots
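To illustrate the difference the comment points at, here is a minimal LINQ-to-objects sketch. The `*Sketch` classes and their fields are hypothetical stand-ins, not the real Aquifer entities, and the real query runs against EF Core rather than in-memory lists. It only shows that walking the reference produces the same pairs as an explicit key-based `Join`.

```csharp
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for the entities; the real shapes are not shown in the PR page.
public sealed class SnapshotSketch
{
    public int Id;
    public int ResourceContentVersionId;
}

public sealed class VersionSketch
{
    public int Id;
    public List<SnapshotSketch> ResourceContentVersionSnapshots = new();
}

public sealed class MachineTranslationSketch
{
    public int Id;
    public int ResourceContentVersionId;
    public VersionSketch ResourceContentVersion = new();
}

public static class NavigationSketch
{
    // Explicit join on the shared foreign key.
    public static List<(int MachineTranslationId, int SnapshotId)> WithJoin(
        IEnumerable<MachineTranslationSketch> machineTranslations,
        IEnumerable<SnapshotSketch> snapshots) =>
        machineTranslations
            .Join(
                snapshots,
                mt => mt.ResourceContentVersionId,
                s => s.ResourceContentVersionId,
                (mt, s) => (mt.Id, s.Id))
            .ToList();

    // Same pairs, reached by following the object references instead.
    public static List<(int MachineTranslationId, int SnapshotId)> WithNavigation(
        IEnumerable<MachineTranslationSketch> machineTranslations) =>
        machineTranslations
            .SelectMany(
                mt => mt.ResourceContentVersion.ResourceContentVersionSnapshots,
                (mt, s) => (mt.Id, s.Id))
            .ToList();
}
```

The navigation form keeps the relationship knowledge in the model rather than repeating the key columns at every call site, which is presumably the motivation behind the review comment.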
This PR addresses the need to evaluate the effectiveness of machine translations and human edits by determining the similarity between two versions of a resource. To achieve this, we will utilize the Levenshtein Distance algorithm, a measure of the minimum number of single-character edits (insertions, deletions, or substitutions) required to change one string into another.
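For readers unfamiliar with the algorithm, the following is a minimal sketch of Levenshtein distance using the standard two-row dynamic-programming approach. It is not the implementation in this PR. The class name `LevenshteinSketch` is hypothetical, and the `Similarity` normalization (one minus distance over the longer length) is an assumption for illustration, since the PR page does not state how the stored score is scaled.

```csharp
using System;

public static class LevenshteinSketch
{
    // Minimum number of single-character insertions, deletions,
    // or substitutions needed to turn source into target.
    public static int Distance(string source, string target)
    {
        if (source.Length == 0) return target.Length;
        if (target.Length == 0) return source.Length;

        // Only two rows of the DP table are needed at any time.
        var previous = new int[target.Length + 1];
        var current = new int[target.Length + 1];

        // Turning an empty prefix into target[0..j] takes j insertions.
        for (var j = 0; j <= target.Length; j++) previous[j] = j;

        for (var i = 1; i <= source.Length; i++)
        {
            // Turning source[0..i] into an empty string takes i deletions.
            current[0] = i;

            for (var j = 1; j <= target.Length; j++)
            {
                var substitutionCost = source[i - 1] == target[j - 1] ? 0 : 1;
                current[j] = Math.Min(
                    Math.Min(previous[j] + 1, current[j - 1] + 1), // deletion, insertion
                    previous[j - 1] + substitutionCost);           // substitution or match
            }

            // The row just filled becomes the previous row for the next character.
            (previous, current) = (current, previous);
        }

        return previous[target.Length];
    }

    // Assumed normalization to the range [0, 1]; 1.0 means identical strings.
    public static double Similarity(string source, string target)
    {
        var maxLength = Math.Max(source.Length, target.Length);
        return maxLength == 0 ? 1.0 : 1.0 - (double)Distance(source, target) / maxLength;
    }
}
```

For example, `LevenshteinSketch.Distance("kitten", "sitting")` is 3: two substitutions and one insertion. The two-row form keeps memory proportional to the target length, which matters for long resource texts.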
The PR includes a Levenshtein Distance implementation and the necessary utility methods to effectively score long resources. A new queue, generate-resource-content-similarity-score, has been created, along with a corresponding message, publisher, and subscriber. The message will contain information about the resource content versions being compared, including the type of comparison being performed, as designated by one of the values in the ResourceContentVersionSimilarityComparisonTypes enum. This allows the subscriber to run the appropriate logic for the resource content types being compared.
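As a rough illustration of how a comparison-type enum can drive the subscriber's logic, consider the sketch below. Every name in it is hypothetical: the enum members, the message fields, and the handler descriptions are assumptions, since the actual members of `ResourceContentVersionSimilarityComparisonTypes` and the real message shape are not shown here.

```csharp
using System;

// Hypothetical members; the real enum values are not listed in the PR description.
public enum ComparisonTypeSketch
{
    MachineTranslationToSnapshot,
    SnapshotToSnapshot,
}

// Hypothetical message shape: the two items being compared plus the comparison type.
public sealed record SimilarityMessageSketch(
    int BaseId,
    int ComparedId,
    ComparisonTypeSketch ComparisonType);

public static class SubscriberSketch
{
    // Picks the branch of logic to run based on the comparison type in the message.
    // A real subscriber would load the two texts here; this sketch only names the branch.
    public static string SelectHandler(SimilarityMessageSketch message) =>
        message.ComparisonType switch
        {
            ComparisonTypeSketch.MachineTranslationToSnapshot =>
                "compare machine translation text to snapshot text",
            ComparisonTypeSketch.SnapshotToSnapshot =>
                "compare two snapshot texts",
            _ => throw new ArgumentOutOfRangeException(
                nameof(message), message.ComparisonType, "Unsupported comparison type."),
        };
}
```

Carrying the comparison type in the message lets a single queue and subscriber serve several comparison scenarios, and the default arm fails loudly if a new enum value is published before the subscriber supports it.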