Kosh Sprint Dec 21 to Jan 22
We wish to host the following datasets on Kosh and open them up to the public:
Given the large size of some of the datasets, we would also like this data to be searchable via text, image and video queries.
In its current state, Kosh has a basic signup and authentication flow using JWT. With it, anyone can log in to Kosh and see the hosted datasets. The datasets were added via API calls to Kosh, but this API currently has no access control, which makes it unsafe for public access. Adding datasets also relies heavily on a Tattle admin, which is a bottleneck when Tattle team members or trusted partners want to add more datasets.
On the search side, we have made progress on optimising the memory requirements of our search engine, Feluda. It supports indexing text, images and videos (of size < 20 MB), and the server has tested API endpoints to index and search all three. Work remains to integrate it with a queue (RabbitMQ) and to profile its memory use and concurrency.
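To make the queue integration concrete, here is a minimal sketch of a bounded job queue feeding a fixed number of index workers. It uses Python's standard-library `queue.Queue` as a stand-in for RabbitMQ, and `index_fn`, `num_workers` and `max_pending` are hypothetical names, not part of Feluda. The bounded queue applies backpressure, which is one way to keep memory use predictable while profiling concurrency.

```python
import queue
import threading
from typing import Callable, Dict, Iterable


def run_index_workers(
    media_ids: Iterable[str],
    index_fn: Callable[[str], None],
    num_workers: int = 2,
    max_pending: int = 10,
) -> Dict[str, str]:
    """Index media with a fixed worker pool; returns media_id -> outcome."""
    # Bounded queue: put() blocks when max_pending jobs are already waiting.
    jobs: queue.Queue = queue.Queue(maxsize=max_pending)
    results: Dict[str, str] = {}
    lock = threading.Lock()

    def worker() -> None:
        while True:
            media_id = jobs.get()
            if media_id is None:  # sentinel: no more work for this worker
                return
            try:
                index_fn(media_id)
                outcome = "indexed"
            except Exception:
                # Record the failure so the job can be retried later.
                outcome = "failed"
            with lock:
                results[media_id] = outcome

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for media_id in media_ids:
        jobs.put(media_id)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for thread in threads:
        thread.join()
    return results
```

With a real broker, the in-process queue would be replaced by a RabbitMQ consumer, and the worker count and queue bound become the knobs to vary when profiling.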
We have scoped the missing tasks into Features that can be worked on independently, so that we make incremental progress towards the Goal:
- Secure Public Access
- Upload Media
- Index Media
- Explore Datasets
Any discussions on tweaking and adding to the scope can be found here
domains : backend engineering, API design, frontend engineering, security, database management(sql)
- As an admin I want to create users with the role author and viewer
- As an admin I want to be able to delete or block users
- As tattle I want to be sure that unauthorized access to the data is not possible
- As tattle I want to be sure that a user can't add, edit or delete Media into a dataset that is not associated with them
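The stories above amount to a role check plus a dataset membership check. A minimal sketch follows, assuming an in-memory model: the `Role`, `User` and `Dataset` types, the `member_ids` field, and the rule that admins may write to any dataset are all illustrative assumptions, not Kosh's actual schema or policy.

```python
from dataclasses import dataclass, field
from enum import Enum


class Role(Enum):
    ADMIN = "admin"
    AUTHOR = "author"
    VIEWER = "viewer"


@dataclass(frozen=True)
class User:
    user_id: str
    role: Role
    blocked: bool = False


@dataclass
class Dataset:
    dataset_id: str
    # IDs of the users associated with (allowed to modify) this dataset.
    member_ids: set = field(default_factory=set)


def can_read(user: User, dataset: Dataset) -> bool:
    """Any signed-in user who has not been blocked may view a dataset."""
    return not user.blocked


def can_write(user: User, dataset: Dataset) -> bool:
    """Add, edit and delete are limited to admins and associated authors."""
    if user.blocked:
        return False
    if user.role is Role.ADMIN:  # assumption: admins can modify any dataset
        return True
    return user.role is Role.AUTHOR and user.user_id in dataset.member_ids
```

In an API server, a check like `can_write` would run in middleware after the JWT has been verified and before the request reaches the handler that modifies Media.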
domains : backend engineering, API design, frontend engineering, security, database management(sql)
- As an admin I want to create users with the role editor and viewer
- As an admin I want to be able to delete or block users
- As tattle I want to be sure that unauthorized access to the data is not possible
- As tattle I want to be sure that a bot can't add, edit or delete Media into a dataset that is not associated with it.
- As an author, I want to manage my bot's access to Kosh
Relevant Links
- Bots and authorization tokens (https://github.com/tattle-made/kosh-v2/discussions/9)
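One possible shape for bot access, sketched under assumptions: an author issues an opaque token for a bot, the server stores only a hash of it, and the author can revoke it later. `BotTokenStore` and its methods are hypothetical names for illustration, and the in-memory dict stands in for a database table. The approach in the linked discussion may differ.

```python
import hashlib
import hmac
import secrets


class BotTokenStore:
    """In-memory stand-in for a bot tokens table (not Kosh's schema)."""

    def __init__(self) -> None:
        # bot_id -> (owner_id, SHA-256 hex digest of the token)
        self._tokens = {}

    def issue(self, owner_id: str, bot_id: str) -> str:
        """Create a token for a bot; only its hash is kept server-side."""
        token = secrets.token_urlsafe(32)
        digest = hashlib.sha256(token.encode()).hexdigest()
        self._tokens[bot_id] = (owner_id, digest)
        return token  # shown to the author once

    def revoke(self, owner_id: str, bot_id: str) -> bool:
        """Revoke a bot's token, but only on behalf of the bot's owner."""
        entry = self._tokens.get(bot_id)
        if entry is None or entry[0] != owner_id:
            return False
        del self._tokens[bot_id]
        return True

    def verify(self, bot_id: str, token: str) -> bool:
        """Check a presented token using a constant-time comparison."""
        entry = self._tokens.get(bot_id)
        if entry is None:
            return False
        digest = hashlib.sha256(token.encode()).hexdigest()
        return hmac.compare_digest(digest, entry[1])
```

Storing only the hash means a leaked table does not expose usable tokens, and tying each token to an owner lets authors manage only their own bots.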
domains : scripting(python, javascript), backend engineering(JS), frontend engineering(ReactJS), database management(mongo, sql), devops(Kubernetes, Github Actions), API integration
- As an author, I want to write a script that I can run periodically to upload the data I have scraped to Kosh
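A periodic upload script could look roughly like the sketch below, which batches scraped items and builds an authenticated request per batch. The endpoint path `/datasets/<id>/media`, the `{"media": [...]}` payload shape and the bearer-token header are assumptions made for illustration, not the documented Kosh API.

```python
import json
import urllib.request
from typing import Iterable, Iterator, List


def batched(items: Iterable[dict], size: int) -> Iterator[List[dict]]:
    """Yield lists of at most `size` items so each request stays small."""
    batch: List[dict] = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def build_upload_request(
    base_url: str, dataset_id: str, token: str, batch: List[dict]
) -> urllib.request.Request:
    """Build (but do not send) a POST request for one batch of Media."""
    # Hypothetical endpoint and payload shape.
    url = f"{base_url}/datasets/{dataset_id}/media"
    body = json.dumps({"media": batch}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {token}",
        },
    )
```

A scheduled job (cron or a Github Actions workflow) would loop over `batched(scraped_items, 50)` and pass each request to `urllib.request.urlopen`, logging any failures for a later retry.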
- As an admin, I want to check if all the data added to Kosh has been indexed into Feluda
- As an admin, I want to retry failed index jobs
- As an admin, I want to prevent certain posts from being indexed in the future
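The three admin stories above can be sketched as a reconciliation step: compare the Media IDs in Kosh against the index status recorded for each, skip anything on a blocklist, and collect what needs a retry. The `IndexStatus` values, the `index_report` function and the idea of a per-media status map are assumptions for illustration, not how Kosh or Feluda currently track jobs.

```python
from enum import Enum
from typing import Dict, Iterable, List, Set


class IndexStatus(Enum):
    PENDING = "pending"
    INDEXED = "indexed"
    FAILED = "failed"


def index_report(
    media_ids: Iterable[str],
    statuses: Dict[str, IndexStatus],
    blocklist: Set[str],
) -> Dict[str, List[str]]:
    """Group Media IDs by what an admin needs to know about their indexing."""
    report: Dict[str, List[str]] = {
        "indexed": [], "failed": [], "missing": [], "blocked": [],
    }
    for media_id in media_ids:
        if media_id in blocklist:
            # Blocklisted posts are never sent for indexing.
            report["blocked"].append(media_id)
            continue
        status = statuses.get(media_id)
        if status is IndexStatus.INDEXED:
            report["indexed"].append(media_id)
        elif status is IndexStatus.FAILED:
            report["failed"].append(media_id)
        else:
            # No record, or still pending: not yet in the index.
            report["missing"].append(media_id)
    return report


def retry_candidates(report: Dict[str, List[str]]) -> List[str]:
    """Failed and missing Media are the jobs worth re-queueing."""
    return report["failed"] + report["missing"]
```

An admin view could render the report directly and offer a button that re-queues `retry_candidates(report)`.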
Domains : ml engineering, devops(kubernetes), frontend engineering, api integration
- As an author or viewer, I want to see the datasets hosted on Kosh
- As an author or viewer, I want to be able to search for a text Media by the text snippets I remember it containing
- As an author or viewer, I want to upload an image and see if it is present in Kosh
- As an author or viewer, I want to upload a video and see duplicate or similar videos on Kosh
- As an author or viewer, I want to read what metadata a dataset has and query for it, e.g. author.name="akaash"
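To illustrate the metadata query story, here is a minimal sketch that parses a query of the form `author.name="akaash"` into a dotted path and a value, then filters records whose nested metadata matches. The query grammar (a single `path="value"` equality) and the function names are assumptions. A real implementation would more likely translate the query into an ElasticSearch or SQL filter.

```python
import re
from typing import Any, Iterable, List, Tuple

# Matches: dotted.path="value" (a single equality test, nothing more).
_QUERY = re.compile(r'^\s*([A-Za-z_][\w.]*)\s*=\s*"([^"]*)"\s*$')


def parse_query(query: str) -> Tuple[List[str], str]:
    """Split a query into its key path and the expected string value."""
    match = _QUERY.match(query)
    if match is None:
        raise ValueError(f"unsupported query: {query!r}")
    path, value = match.groups()
    return path.split("."), value


def lookup(record: dict, path: List[str]) -> Any:
    """Follow a key path through nested dicts; None if any key is absent."""
    current: Any = record
    for key in path:
        if not isinstance(current, dict) or key not in current:
            return None
        current = current[key]
    return current


def filter_media(records: Iterable[dict], query: str) -> List[dict]:
    """Return the records whose metadata at the queried path equals the value."""
    path, value = parse_query(query)
    return [record for record in records if lookup(record, path) == value]
```

For example, `filter_media(records, 'author.name="akaash"')` keeps only the records whose `author` object has `name` equal to `akaash`.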
Relevant Links
- Evaluating Best practices for deploying kosh on kubernetes (discussion)
domains : frontend engineering, api integration
There are certain opinionated libraries and frameworks that we use heavily across our software stack that I think you’d benefit from reading up on. Some familiarity with these will help you ramp up on the code and also write your own code.
Web App | Rest API | Search Server | DevOps |
---|---|---|---|
Gatsby | Express | Flask | Kubernetes |
Grommet | Sequelize | ElasticSearch | Github Actions |
 | | RabbitMQ | |
Please reach out to [email protected] or post to #tattle_tech on our Slack.