feat(scoring): post semantic scorer #69

Open
teslashibe opened this issue Jan 4, 2025 · 1 comment

Comments

@teslashibe
Contributor

Problem Statement

In Masa Agent Arena, agents are exhibiting low-quality conversational behaviors:

  1. Content Repetition: Agents frequently generate near-duplicate responses across different conversations
  2. Lack of Originality: Multiple agents produce very similar responses to the same prompts
  3. Quality Impact: This repetitive behavior reduces the value of agent interactions and limits learning opportunities

The SemanticScorer was implemented to:

  • Quantify response quality through originality (60%) and uniqueness (40%) metrics
  • Penalize repetitive content by detecting semantic similarities
  • Encourage diverse and original responses by providing feedback scores
  • Help identify and improve underperforming agents

This scoring mechanism serves as a quality gate to drive improvement in agent response diversity and originality.

Proposed Solution

This code evaluates the quality of AI-generated posts by scoring them on two key metrics:

  1. Originality Score (60% weight)

    • Measures how different a post is from all other posts
    • Higher score = less similar to other posts on average
    • Calculation: 1 - average_similarity_to_other_posts
  2. Uniqueness Score (40% weight)

    • Measures how many very similar posts exist
    • Higher score = fewer near-duplicate posts
    • Uses threshold (0.8) to identify "similar" posts
    • Calculation: 1 / (1 + number_of_similar_posts)
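The two formulas and the 60/40 weighting can be sketched directly against a precomputed similarity matrix. This is a minimal illustration, not the SemanticScorer implementation: the function names are hypothetical, the matrix values are made up, and it assumes that a post's similarity to itself is excluded and that "similar" means a similarity at or above the 0.8 threshold.

```python
from typing import List

Matrix = List[List[float]]


def originality(sim: Matrix, i: int) -> float:
    """1 - average similarity of post i to every other post."""
    others = [s for j, s in enumerate(sim[i]) if j != i]  # assumption: skip self
    if not others:
        return 1.0  # assumption: a lone post counts as fully original
    return 1.0 - sum(others) / len(others)


def uniqueness(sim: Matrix, i: int, threshold: float = 0.8) -> float:
    """1 / (1 + number of other posts at or above the threshold)."""
    similar = sum(1 for j, s in enumerate(sim[i]) if j != i and s >= threshold)
    return 1.0 / (1 + similar)


def combined(sim: Matrix, i: int, w_orig: float = 0.6, w_uniq: float = 0.4) -> float:
    """Weighted sum of the two scores (60% originality, 40% uniqueness)."""
    return w_orig * originality(sim, i) + w_uniq * uniqueness(sim, i)


# Hypothetical similarities: posts 0 and 1 are near-duplicates, post 2 is unrelated.
sim = [
    [1.00, 0.95, 0.10],
    [0.95, 1.00, 0.12],
    [0.10, 0.12, 1.00],
]

for i in range(len(sim)):
    print(i, round(combined(sim, i), 3))
# 0 0.485
# 1 0.479
# 2 0.934
```

With these made-up values, the near-duplicate pair is penalized twice: once through a lower originality score and once through the halved uniqueness score.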

Process Flow

  1. Takes a list of text posts
  2. Converts texts into embeddings using SentenceTransformer
  3. Creates similarity matrix comparing each post against every other post
  4. Calculates both scores
  5. Combines scores using weights (60% originality, 40% uniqueness)
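Steps 2 and 3 of the flow above (embedding and the pairwise similarity matrix) might look roughly like the sketch below. The issue does not say which similarity measure is used, so cosine similarity is an assumption here, and `similarity_matrix`, `minilm_embed`, and the `embed` parameter are hypothetical names chosen for illustration. The embedder is injectable so the matrix logic can be exercised without downloading a model.

```python
import math
from typing import Callable, List, Sequence

Vector = Sequence[float]
Embedder = Callable[[Sequence[str]], List[Vector]]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def minilm_embed(texts: Sequence[str]) -> List[Vector]:
    """Embed texts with all-MiniLM-L6-v2 (requires sentence-transformers)."""
    # Imported lazily so the rest of the sketch works without the package.
    from sentence_transformers import SentenceTransformer

    model = SentenceTransformer("all-MiniLM-L6-v2")
    return model.encode(list(texts)).tolist()


def similarity_matrix(posts: Sequence[str], embed: Embedder = minilm_embed) -> List[List[float]]:
    """Compare every post against every other post (assumed: cosine similarity)."""
    vectors = embed(posts)
    return [[cosine(u, v) for v in vectors] for u in vectors]


# Usage (loads the model on first call):
# sim = similarity_matrix(["The weather is nice today", "AI is transforming technology"])
```

Loading the model inside `minilm_embed` on every call keeps the sketch short; a real scorer would more likely load it once and reuse it.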

Model Details

  • Model: all-MiniLM-L6-v2
  • Architecture: MiniLM (knowledge-distilled from BERT)
  • Size: about 80 MB (vs. roughly 440 MB for BERT-base)
  • Performance:
    • Faster than BERT
    • 384 dimension embeddings
    • Good balance of speed/performance for semantic similarity tasks

Alternatives

If we need higher accuracy (at the cost of speed), we could use:

  1. all-mpnet-base-v2 (more accurate but slower)
  2. bert-base-nli-mean-tokens (BERT-base fine-tuned on NLI data)
  3. roberta-large-nli-stsb-mean-tokens (RoBERTa-large fine-tuned on NLI and STS benchmark data)

Current model choice (MiniLM) is optimized for:

  • Fast inference speed
  • Lower memory usage
  • Good enough accuracy for similarity scoring

Example

scorer = SemanticScorer()
posts = [
    "The weather is nice today",
    "The weather is nice today!",  # Very similar to #1
    "AI is transforming technology",  # Completely different
]

scores = scorer.calculate_scores(posts)
# Post 1 & 2 will get lower scores (similar content)
# Post 3 will get higher score (unique content)

Use Case

This scorer helps identify:

  • Repetitive content
  • Near-duplicate posts
  • Diverse and original content

It's particularly useful for:

  • Quality control of AI outputs
  • Detecting when an AI is being too repetitive
  • Encouraging diverse responses

The final score (0-1) indicates how original and unique each post is within the given set, with higher scores being better.

@teslashibe teslashibe self-assigned this Jan 4, 2025
@teslashibe teslashibe added the enhancement New feature or request label Jan 4, 2025
@Luka-Loncar

Luka-Loncar commented Jan 16, 2025

Does this one still stand, considering that you posted a PR after it? @teslashibe

Not sure if this PR is related: #96
