diff --git a/www/src/assets/blog/2024-11-13-retrieval-augmented-goose-timeline.png b/www/src/assets/blog/2024-11-13-retrieval-augmented-goose-timeline.png new file mode 100644 index 000000000..67bf561ce Binary files /dev/null and b/www/src/assets/blog/2024-11-13-retrieval-augmented-goose-timeline.png differ diff --git a/www/src/content/blog/2024-11-02-asynchronous-tasks-in-cloudflare-part1 copy.mdx b/www/src/content/blog/2024-11-02-asynchronous-tasks-in-cloudflare-part1 copy.mdx index 9974f2f18..fd149f0fa 100644 --- a/www/src/content/blog/2024-11-02-asynchronous-tasks-in-cloudflare-part1 copy.mdx +++ b/www/src/content/blog/2024-11-02-asynchronous-tasks-in-cloudflare-part1 copy.mdx @@ -124,9 +124,9 @@ const tasks = [ c.executionCtx.waitUntil(Promise.all(tasks)); ``` -If we now look at the trace, we see the response returns immediately, +If we now look at the trace, we see the response returns immediately, and after a short while, the client instrumentation also shows the other two tasks related to the trace. -This method leads to the parallel execution of both tasks, +This method leads to the parallel execution of both tasks, and the trace clearly shows how this improves our response time compared to the sequential approach: ![Parallel execution in Fiberplane](@/assets/blog/2024-11-02-parallel.png) diff --git a/www/src/content/blog/2024-11-06-honcathon-announcement.mdx b/www/src/content/blog/2024-11-06-honcathon-announcement.mdx index 554bf90b0..ba5728847 100644 --- a/www/src/content/blog/2024-11-06-honcathon-announcement.mdx +++ b/www/src/content/blog/2024-11-06-honcathon-announcement.mdx @@ -17,19 +17,19 @@ This November, we’re celebrating the Goose in spirit with our first [virtual H ![Honcathon picture](@/assets/blog/2024-11-06-Honcathon.png) - ## What is HONC? 
+ [HONC](https://honc.dev/) is a stack that accelerates web development, featuring [Hono](https://hono.dev/) as a web framework, [Drizzle](https://orm.drizzle.team/) as an ORM, [Neon](https://neon.tech/) as a database, and [Cloudflare Workers](https://workers.cloudflare.com/) as the deployment platform. With these tools, getting started on web projects has never been faster. Additionally, [Fiberplane](https://fiberplane.com/) helps you debug and test your APIs, making development even smoother. This month, Fiberplane is partnering with these tools to bring a touch of feathered fun to your IDE. ## How It Works + We have four different categories for building applications, and participants can choose any of them. The winner of each category will receive a €500 Amazon voucher. Be sure [to register](https://honc.dev/honcathon), as we have some exciting activities planned throughout the Honcathon! ## Ready to Get Started? -You’ll have until December 15th to submit your project. Don’t worry—building your project is designed to take just a half-day to a day of work, making it perfect to fit into your schedule. +You’ll have until December 15th to submit your project. Don’t worry—building your project is designed to take just a half-day to a day of work, making it perfect to fit into your schedule. This November, get your goose on with HONC! 
- diff --git a/www/src/content/blog/2024-11-13-retrieval-augmented-geese.mdx b/www/src/content/blog/2024-11-13-retrieval-augmented-geese.mdx new file mode 100644 index 000000000..640d6d77f --- /dev/null +++ b/www/src/content/blog/2024-11-13-retrieval-augmented-geese.mdx @@ -0,0 +1,326 @@ +--- +title: "Retrieval Augmented Geese - Semantic search with the HONC stack" +description: An example of building semantic search with the HONC stack +slug: retrieval-augmented-geese +date: 2024-11-13 +author: Brett Beutell +tags: + - HONC + - RAG + - Semantic search +--- + +import { Aside, Card, LinkCard } from "@astrojs/starlight/components"; + +If you’ve heard the term RAG or “Retrieval Augmented Generation” lately, you may have asked yourself something like “What is that?” or “How did such a ridiculous acronym get so popular?” + +I can’t speak to how it got its name, but I can tell you this: One of the core pieces of RAG systems is semantic search, which helps map the underlying meaning (_semantics_) of a search query to the closest-related meaning of a set of documents in a database. + +In this post, we’ll go over how to implement basic semantic search with [the HONC stack](https://honc.dev/), using [Hono](https://hono.dev/) as an API framework, [Neon](https://neon.tech/) Postgres as a database, [Drizzle](https://orm.drizzle.team/) as an ORM, and [Cloudflare Workers](https://workers.cloudflare.com/) as the serverless deployment platform. + +We’ll build a small semantic search engine for Cloudflare's documentation, giving us a system that understands the meaning behind search queries, not just keyword matches. + + + +## A Conceptual Primer + +Before we get started, let's go over some basic concepts. This will be especially helpful if you're not already familiar with vectors and embeddings. + + + +### Vectors + +For our purposes, vectors are lists of numbers, like `[1, 2, 3, 4, 5]`. + +Vectors are described in terms of their length. 
An example of a vector of length two would be `[.68, .79]` and a vector of length three would look like `[.883, .24, .6001]`. Simple! + +When vectors are all the same length, you can compare and manipulate them in interesting ways, by adding them together, subtracting them, or finding the distance between them. + +All of this will be relevant in a bit, I promise 🙂 + +### Embeddings + +In short, embeddings are vector representations of the meanings of words and phrases… But. Well. That’s a little abstract. + +My favorite analogy for why these are useful and how they work comes from AI researcher [Linus Lee](https://thesephist.com/). He draws a comparison to colors. There’s a difference between describing a color with a name, like `“blue”`, versus with an RGB value `rgb(0, 0, 255)`. In this case, the RGB value is a vector of length three, `(0, 0, 255)`. + +If we wanted to mix some red into the named color `"blue"` and make a new color that’s just a _little_ more purple, how would we do that? + +Well, if all we have is the name of the color, there’s not much we can do. We’d just invent a new color and give it a new name, like `"purpleish blue"`. With an RGB value, though, we can simply “add” some red: + +`rgb(20, 0, 0) + rgb(0, 0, 255) = rgb(20, 0, 255)` + +Because we chose to represent our color with _vectors of numbers_, we can do math on it. We can change it around, mix it with other colors, and have an all-around good time with it. + +Embeddings are a way for us to do this kind of math on human language. Embeddings are vectors, like RGB values, except they’re much larger. + +How would you “do math on human language” though? Borrowing an example from the trusty Internet, let’s say we have a vector for the word `"king"`, and a vector for the words `"man"` and `"woman"`. + +If we subtract the vector for `"man"` from the vector for `"king"`, then add the vector for the word `"woman"`, what would you expect to get? 
+Wildly enough, we would get a vector _very, very close_ to the one for the word `"queen"`. + +Pretty neat, huh? + +### Vector Search + +Searching across vectors usually refers to looking for vectors that are similar to one another. + +In this case, we think of similarity in terms of distance. Two vectors that are close to one another are similar. Two vectors that are far apart are different. + +So, in a database that stores vectors, we can calculate the distance between an input vector and all vectors in the database, and return only the ones that are most similar. +In this post, you will see a reference to "cosine similarity", which is a way to calculate the distance between two vectors. So, don't get freaked out if we start talking about cosines. + +Basically, instead of looking for exact matches or keyword matches for a user's query, we look for “semantically similar matches” based on cosine distance. + +## How Do We Start? + +To perform semantic search, we need: + +- a database that supports vector embeddings +- a way to vectorize text +- a way to search for similar vectors in the database + +To be entirely frank, the hardest part of building semantic search is knowing how to parse and split up your target documents' text into meaningful chunks. + +I spent most of my time on this project just pulling down, compiling, and chunking the Cloudflare documentation. After that, the search part was easy-peasy. +I will gloss over the tedious parts of this below, but I've provided a link to the script that processes the documentation, for anyone who is interested. + +That said, let's go over the stack and database models we'll be using for the actual searching part. + +### The Stack + +We want to expose a simple API for users to search the Cloudflare documentation. Then we want some tools for storing the documents and their embeddings, as well as querying them. 
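Before we get into the stack: the cosine similarity measure mentioned above is just a little arithmetic. Here's a small, standalone TypeScript sketch (illustrative only, not part of the project code) of how it can be computed for two plain number arrays:

```typescript
// Cosine similarity: cos(a, b) = (a · b) / (|a| * |b|)
// Close to 1 means "pointing the same way" (similar meaning);
// 0 means orthogonal (unrelated).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must be the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Vectors pointing in the same direction score ~1; orthogonal vectors score 0.
console.log(cosineSimilarity([1, 2, 3], [2, 4, 6])); // ≈ 1
console.log(cosineSimilarity([1, 0], [0, 1])); // 0
```

In practice the database (or a library) computes this for you over the much longer embedding vectors; the arithmetic is exactly the same, just with more dimensions.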
+ +Here's the stack we'll be using for all of this: + +- **Hono**: A lightweight web framework for building type-safe APIs +- **Neon Postgres**: A serverless Postgres database for storing documents and vector embeddings +- **OpenAI**: To vectorize documentation content and user queries +- **Drizzle ORM**: For constructing type-safe database operations +- **Cloudflare Workers**: To host the API on a serverless compute platform + +### Setting up the Database + +First, we define a schema using Drizzle ORM. + +When we craft our database models, we have to think about what kind of search results we want to return. +This leads us to the idea of "chunking", which is the process of splitting up the text into smaller chunks. + +The logic is: We don't want to match a user's query to entire documents, because that would return a lot of irrelevant results. +Instead, we should split up each documentation page into smaller chunks, and match the user's query to the most semantically similar chunks. + +Since we're working with a relational database, we can define a schema for the documents and chunks, where each document can have many chunks. 
+ +So, our Drizzle schema defines two main tables: + +- `documents`: Stores the original documentation pages +- `chunks`: Stores content chunks with their vector embeddings + +```tsx title="src/db/schema.ts" +import { integer, jsonb, pgTable, text, uuid, vector } from "drizzle-orm/pg-core"; + +export const documents = pgTable("documents", { + id: uuid("id").defaultRandom().primaryKey(), + title: text("title").notNull(), + url: text("url"), + content: text("content"), + hash: text("hash").notNull() +}); + +export const chunks = pgTable("chunks", { + id: uuid("id").defaultRandom().primaryKey(), + documentId: uuid("document_id") + .references(() => documents.id) + .notNull(), + chunkNumber: integer("chunk_number").notNull(), + text: text("text").notNull(), + embedding: vector("embedding", { dimensions: 1536 }), + metadata: jsonb("metadata").$type<Record<string, unknown>>(), + hash: text("hash").notNull() +}); +``` + +Once we define a schema, we can create the tables in our database. +If you're following along on GitHub, you can run the commands below to create the tables. +Under the hood, we use Drizzle to generate migration files and apply them to the database. + +```bash +pnpm db:generate +pnpm db:migrate +``` + +Now, with our database set up, we can move on to processing the documentation itself into vector embeddings. + +### Processing Documentation + +The heart of our system is the document processing pipeline. It's a bit of a beast. +I'm going to move through this quickly, but you can see the full implementation in +[`./src/scripts/create-vectors.ts`](https://github.com/fiberplane/create-honc-app/blob/main/examples/cf-retrieval-augmented-goose/scripts/create-vectors.ts). + +This script: + +1. Takes HTML documentation files as input +2. Uses GPT-4o to clean and chunk the content +3. Generates embeddings for each chunk +4. Stores everything in our Neon Postgres database + +Once this runs, we have a database full of document chunks and their embeddings, and we can move on to building the search API. 
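Step 2 in the list above leans on GPT-4o to decide where chunks begin and end. For a sense of the simplest possible alternative, here's a heuristic chunker (fixed-size windows with overlap) sketched in TypeScript — purely illustrative, and not what the script actually does:

```typescript
// Naive heuristic chunker: fixed-size windows with some overlap, so text
// cut at a window boundary still appears intact in the neighboring chunk.
function chunkText(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (overlap >= chunkSize) {
    throw new Error("overlap must be smaller than chunkSize");
  }
  const pieces: string[] = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    pieces.push(text.slice(start, start + chunkSize));
  }
  return pieces;
}

// A 2,500-character document with 1,000-char windows and 200-char overlap
// yields chunks starting at offsets 0, 800, 1600, and 2400.
const pieces = chunkText("x".repeat(2500));
console.log(pieces.length); // 4
```

An approach like this is cheap and predictable, but it happily slices sentences and code blocks in half — which is exactly why letting a model pick semantically coherent boundaries produced better chunks here.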
+Probably the most interesting part of this script is how we use GPT-4o to clean and chunk the content. In some cases, this might be +cost-prohibitive, but for our use case, it was a no-brainer. We only need to run this once, and it was a lot smarter than any heuristics I would've defined myself. + +If you're following along on GitHub, I commend you. You can run these commands to process the documentation. (Please file an issue if you run into any problems.) + +```bash +# Process the Cloudflare documentation +cd data +bash copy-cf-docs.sh +cd ../ +# Create the vector embeddings +pnpm run vectors:create +``` + +## The Search API + +When a user makes a search request, we do the following: + +1. Convert their query into a vector embedding +2. Use cosine similarity to find the most relevant chunks for their query +3. Return the top matches back to the user + +Why convert their query into a vector embedding? We want to find the most semantically similar chunks, +so we need to represent their query in the same format as our database chunks. + +The search endpoint is surprisingly simple thanks to Hono. +We define a `GET` route that takes `query` and `similarity` parameters. + +```tsx {6-7} title="src/index.tsx" +app.get("/search", async (c) => { + // ... + + // Parse query parameters, falling back to a 0.5 cutoff for a missing or invalid value + const query = c.req.query("query"); + const parsedCutoff = Number.parseFloat(c.req.query("similarity") || "0.5"); + const similarityCutoff = Number.isNaN(parsedCutoff) ? 0.5 : parsedCutoff; + + // ... +}); +``` + +Then, we make a request to OpenAI to create an embedding for the user's search query. + +```tsx {8-12} title="src/index.tsx" +app.get("/search", async (c) => { + // ... + + // Initialize the OpenAI client + const openai = new OpenAI({ apiKey: c.env.OPENAI_API_KEY }); + + // Create embedding for the search query + const embeddingResult = await openai.embeddings.create({ + model: "text-embedding-3-small", + input: query + }); + const userQueryEmbedding = embeddingResult.data[0].embedding; + + // ... 
+}); +``` + +Finally, we craft a similarity search based on the cosine distance between the query embedding and each chunk's embedding. +Here, the `drizzleSql` helper is the `sql` helper exported by Drizzle, which has been renamed for clarity. +It allows us to construct type-safe SQL expressions. + +```tsx title="src/index.tsx" +app.get("/search", async (c) => { + // ... + + // Craft a similarity search query based on the cosine distance between + // the embedding of the user's query and the embedding of each chunk from the docs. + const similarityQuery = drizzleSql`1 - (${cosineDistance(chunks.embedding, userQueryEmbedding)})`; + + // Search for chunks with similarity above the cutoff score + const results = await db + .select({ + id: chunks.id, + text: chunks.text, + similarity: similarityQuery + }) + .from(chunks) + .where(drizzleSql`${similarityQuery} > ${similarityCutoff}`) + .orderBy(drizzleSql`${similarityQuery} desc`) + .limit(10); + + // ... +}); +``` + +That leaves us with a list of chunks that are semantically similar to the user's query! +We can choose to render these results however we see fit. + +When we use Fiberplane to test the search API, we can see the timeline of the request, including the embedding generation, similarity search, and result rendering. +We can also see the raw SQL that was executed, which is a little unwieldy since we're dealing with vectorized queries: + +![timeline of search request](@/assets/blog/2024-11-13-retrieval-augmented-goose-timeline.png) + +And that's it! We've built a semantic search engine with the HONC stack. + +## The Magic of Vector Search + +What makes this more powerful than regular text search? Again, vector embeddings capture semantic meaning. For example, a search for "how to handle errors in Workers" will find relevant results even if they don't contain those exact words. + +Neon makes this simple, easy, and scalable by allowing efficient similarity searches over high-dimensional vectors _out of the box_. 
+ +However, that doesn't mean that vector search is _the only_ tool for retrieval. Any robust system should consider the trade-offs of vector search vs. keyword search, and likely combine the two. + +## Deployment + + + +Deploying to Cloudflare Workers is straightforward. We just set the secrets and deploy the code. + +```bash +# Set your secrets +wrangler secret put DATABASE_URL +wrangler secret put OPENAI_API_KEY + +# Deploy +pnpm run deploy +``` + +## Conclusion + +If you're looking for a quick way to get up and running with semantic search, this should give you a solid starting point. + +Worth noting: there are a lot of libraries out there that can take care of the heavy lifting of building RAG systems. +At Fiberplane, we've built smaller systems with [LlamaIndex](https://www.llamaindex.ai/) and [LangChain](https://www.langchain.com/), +but for simpler use cases, I found myself wading through documentation and GitHub issues more than I'd like. +A higher-level library can be very helpful for bigger applications, but for something this simple, I think the core concepts are valuable to implement and understand. + +To that end, the full code for everything in this post is available on GitHub, and you can adapt it for your own needs. + + + +Otherwise, don't forget to check out the [HONC stack](https://honc.dev/) for more examples of building with Hono, Drizzle, Neon, and Cloudflare.