You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
Converts texts into embeddings using SentenceTransformer
Creates similarity matrix comparing each post against every other post
Calculates both scores
Combines scores using weights (60% originality, 40% uniqueness)
Model Details
Model: all-MiniLM-L6-v2
Architecture: MiniLM (Knowledge Distilled from BERT)
Size: 80MB (vs BERT's 440MB)
Performance:
Faster than BERT
384 dimension embeddings
Good balance of speed/performance for semantic similarity tasks
Alternatives
If we need higher accuracy (at the cost of speed), we could use:
all-mpnet-base-v2 (more accurate but slower)
bert-base-nli-mean-tokens (original BERT)
roberta-large-nli-stsb-mean-tokens (RoBERTa)
Current model choice (MiniLM) is optimized for:
Fast inference speed
Lower memory usage
Good enough accuracy for similarity scoring
Example
scorer=SemanticScorer()
posts= [
"The weather is nice today",
"The weather is nice today!", # Very similar to #1"AI is transforming technology", # Completely different
]
scores=scorer.calculate_scores(posts)
# Post 1 & 2 will get lower scores (similar content)# Post 3 will get higher score (unique content)
Use Case
This scorer helps identify:
Repetitive content
Near-duplicate posts
Diverse and original content
It's particularly useful for:
Quality control of AI outputs
Detecting when an AI is being too repetitive
Encouraging diverse responses
The final score (0-1) indicates how original and unique each post is within the given set, with higher scores being better.
The text was updated successfully, but these errors were encountered:
Problem Statement
In Masa Agent Arena, agents are exhibiting low-quality conversational behaviors:
The SemanticScorer was implemented to:
This scoring mechanism serves as a quality gate to drive improvement in agent response diversity and originality.
Proposed Solution
This code evaluates the quality of AI-generated posts by scoring them on two key metrics:
Originality Score (60% weight)
1 - average_similarity_to_other_posts
Uniqueness Score (40% weight)
1 / (1 + number_of_similar_posts)
Process Flow
Model Details
Alternatives
If we need higher accuracy (at the cost of speed), we could use:
all-mpnet-base-v2
(more accurate but slower)bert-base-nli-mean-tokens
(original BERT)roberta-large-nli-stsb-mean-tokens
(RoBERTa)Current model choice (MiniLM) is optimized for:
Example
Use Case
This scorer helps identify:
It's particularly useful for:
The final score (0-1) indicates how original and unique each post is within the given set, with higher scores being better.
The text was updated successfully, but these errors were encountered: