[bounty] create a deduplication endpoint #1106

Open
louis030195 opened this issue Jan 8, 2025 · 3 comments
Labels
💎 Bounty enhancement New feature or request

Comments

@louis030195
Collaborator

/bounty 150

definition of done:

  • we have a deduplication endpoint that improves developer experience: it helps build better agents / AI apps by removing noise (see the search pipe, which already does this client side using string similarity heuristics; the UX is quite bad because it blocks the UI thread, and with tons of results it freezes for 30+ seconds)
  • should have a way to call it from the SDK, like pipe.dedup(mydata), maybe taking the queryScreenpipe output as input? not sure, suggest good DX
  • should use an embedding model, via either candle or onnx (candle is usually less pain but supports fewer models)
  • most of the code lives in its own file, which is then imported in server.rs
  • should be as fast as possible without destroying the user's computer (use mac metal, or mkl/cuda if possible)
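
To make the embedding-based idea in the list above concrete, here is a minimal sketch of greedy deduplication by cosine similarity. It assumes the embeddings have already been computed by some model; the function names `cosine_similarity` and `dedup_indices` and the threshold parameter are hypothetical and not part of Screenpipe.

```rust
/// Cosine similarity between two vectors of equal length.
/// Returns 0.0 if either vector has zero norm.
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0;
    }
    dot / (norm_a * norm_b)
}

/// Greedy dedup (hypothetical helper): walks the items in order and keeps
/// an item only if it is not too similar to any item kept so far.
/// Returns the indices of the kept items.
fn dedup_indices(embeddings: &[Vec<f32>], threshold: f32) -> Vec<usize> {
    let mut kept: Vec<usize> = Vec::new();
    for (i, candidate) in embeddings.iter().enumerate() {
        let is_duplicate = kept
            .iter()
            .any(|&k| cosine_similarity(&embeddings[k], candidate) >= threshold);
        if !is_duplicate {
            kept.push(i);
        }
    }
    kept
}
```

This greedy pass is quadratic in the worst case, so for large result sets it would likely need batching or an approximate nearest-neighbor index; the right threshold depends on the embedding model and would have to be tuned.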

any thoughts before starting this?

louis030195 added the enhancement (New feature or request) label on Jan 8, 2025

algora-pbc bot commented Jan 8, 2025

💎 $150 bounty • Screenpi.pe

Steps to solve:

  1. Start working: Comment /attempt #1106 with your implementation plan
  2. Submit work: Create a pull request including /claim #1106 in the PR body to claim the bounty
  3. Receive payment: 100% of the bounty is received 2-5 days post-reward. Make sure you are eligible for payouts

Thank you for contributing to mediar-ai/screenpipe!


Attempt: 🟢 @b4s36t4 · Started (GMT+0): Jan 9, 2025, 6:01:05 AM · Solution: WIP

@b4s36t4
Contributor

b4s36t4 commented Jan 9, 2025

/attempt #1106

Will put out the details on implementation here soon.

Algora profile: @b4s36t4 · Completed bounties: 1 mediar-ai bounty + 3 bounties from 2 projects · Tech: TypeScript, Rust, JavaScript & more

@b4s36t4
Contributor

b4s36t4 commented Jan 9, 2025

Embedding Model: jina-embeddings-v3. It supports multiple languages and can be used with onnx as well.
SDK Action: I'm thinking of taking queryScreenpipe's input fields and letting the backend run the query and return de-duplicated results.

Since the SDK is in JS, passing data to Rust might be costly in some cases, so it's better to do the whole thing in Rust itself.
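
A rough sketch of that flow, where the backend runs the query and de-duplicates before anything crosses back to JS, might look like the following. Everything here is an assumption for illustration: `DedupQuery`, `Embedder`, `search_and_dedup`, and the toy `LetterCountEmbedder` are hypothetical names, the search function is passed in as a closure rather than being Screenpipe's real search, and a real implementation would embed text with an actual model instead of letter counts.

```rust
/// Hypothetical trait standing in for an embedding model
/// (for example one loaded through candle or ort).
trait Embedder {
    fn embed(&self, text: &str) -> Vec<f32>;
}

/// Toy embedder for illustration only: counts of each ASCII letter.
struct LetterCountEmbedder;

impl Embedder for LetterCountEmbedder {
    fn embed(&self, text: &str) -> Vec<f32> {
        let mut counts = vec![0.0f32; 26];
        for c in text.chars().filter(|c| c.is_ascii_alphabetic()) {
            let idx = (c.to_ascii_lowercase() as u8 - b'a') as usize;
            counts[idx] += 1.0;
        }
        counts
    }
}

fn cosine(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let na = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let nb = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if na == 0.0 || nb == 0.0 { 0.0 } else { dot / (na * nb) }
}

/// Hypothetical request shape: the usual query fields plus a threshold.
struct DedupQuery {
    q: String,
    limit: usize,
    threshold: f32,
}

/// Runs the search on the backend, then keeps only results that are not
/// near-duplicates of an earlier kept result.
fn search_and_dedup<E: Embedder>(
    query: &DedupQuery,
    search: impl Fn(&str, usize) -> Vec<String>,
    embedder: &E,
) -> Vec<String> {
    let results = search(&query.q, query.limit);
    let mut kept: Vec<(String, Vec<f32>)> = Vec::new();
    for text in results {
        let emb = embedder.embed(&text);
        if !kept.iter().any(|(_, k)| cosine(k, &emb) >= query.threshold) {
            kept.push((text, emb));
        }
    }
    kept.into_iter().map(|(text, _)| text).collect()
}
```

With this shape the SDK call would only send the query fields and receive the already de-duplicated results, so the full result set never has to be serialized to JS and back.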

New crates required:

  • Candle (or) ORT - to run the ONNX model.
  • tokenizers - to tokenize text before embedding it.

Please take a look and let me know if I need to make any changes.

cc: @louis030195
