Add retrieval augmented goose blog post (#358)

* Add retrieval augmented goose blog post * Format * Tidy up blog post * Throw in a screenshot of Fiberplane Studio * Remove bit on pgvector
fiberplane · Nov 12, 2024 · f107e13 · f107e13
1 parent f7b3fae
commit f107e13
Show file tree

Hide file tree

Showing 4 changed files with 331 additions and 5 deletions.
diff --git a/www/src/assets/blog/2024-11-13-retrieval-augmented-goose-timeline.png b/www/src/assets/blog/2024-11-13-retrieval-augmented-goose-timeline.png
diff --git a/www/src/content/blog/2024-11-02-asynchronous-tasks-in-cloudflare-part1 copy.mdx b/www/src/content/blog/2024-11-02-asynchronous-tasks-in-cloudflare-part1 copy.mdx
@@ -124,9 +124,9 @@ const tasks = [
 c.executionCtx.waitUntil(Promise.all(tasks));
 ```
 
-If we now look at the trace, we see the response returns immediately, 
+If we now look at the trace, we see the response returns immediately,
 and after a short while, the client instrumentation also shows the other two tasks related to the trace.
-This method leads to the parallel execution of both tasks, 
+This method leads to the parallel execution of both tasks,
 and the trace clearly shows how this improves our response time compared to the sequential approach:
 
 ![Parallel execution in Fiberplane](@/assets/blog/2024-11-02-parallel.png)

diff --git a/www/src/content/blog/2024-11-06-honcathon-announcement.mdx b/www/src/content/blog/2024-11-06-honcathon-announcement.mdx
@@ -17,19 +17,19 @@ This November, we’re celebrating the Goose in spirit with our first [virtual H
 
 ![Honcathon picture](@/assets/blog/2024-11-06-Honcathon.png)
 
-
 ## What is HONC?
+
 [HONC](https://honc.dev/) is a stack that accelerates web development, featuring [Hono](https://hono.dev/) as a web framework, [Drizzle](https://orm.drizzle.team/) as an ORM, [Neon](https://neon.tech/) as a database, and [Cloudflare Workers](https://workers.cloudflare.com/). With these tools, getting started on web projects has never been faster.
 Additionally, [Fiberplane](https://fiberplane.com/) helps to debug and test your APIs, making development even smoother.
 
 This month, Fiberplane is partnering with these tools to bring a touch of feathered fun to your IDE.
 
 ## How It Works
+
 We have four different categories for building applications, and participants can choose any of them. The winner of each category will receive a €500 Amazon voucher. Be sure [to register](https://honc.dev/honcathon), as we have some exciting activities planned throughout the Honcathon!
 
 ## Ready to Get Started?
-You’ll have until December 15th to submit your project. Don’t worry—building your project is designed to take just a half-day to a day of work, making it perfect to fit into your schedule.
 
+You’ll have until December 15th to submit your project. Don’t worry—building your project is designed to take just a half-day to a day of work, making it perfect to fit into your schedule.
 
 This November, get your goose on with HONC!
-
diff --git a/www/src/content/blog/2024-11-13-retrieval-augmented-geese.mdx b/www/src/content/blog/2024-11-13-retrieval-augmented-geese.mdx
@@ -0,0 +1,326 @@
+---
+title: "Retrieval Augmented Geese - Semantic search with the HONC stack"
+description: An example of building semantic search with the HONC stack
+slug: retrieval-augmented-geese
+date: 2024-11-13
+author: Brett Beutell
+tags:
+  - HONC
+  - RAG
+  - Semantic search
+---
+
+import { Aside, Card, LinkCard } from "@astrojs/starlight/components";
+
+If you’ve heard the term RAG or “Retrieval Augmented Generation” lately, you may have asked yourself something like “What is that?” or “How did such a ridiculous acronym get so popular?”
+
+I can’t answer to the way it was named, but I can tell you this: One of the core pieces of RAG systems is semantic search, which helps map the underlying meaning (_semantics_) of a search query to the closest-related meaning of a set of documents in a database.
+
+In this post, we’ll go over how to implement basic semantic search with [the HONC stack](https://honc.dev/), using [Hono](https://hono.dev/) as an api framework, [Neon](https://neon.tech/) postgres as a database, [Drizzle](https://orm.drizzle.team/) as an ORM, and [Cloudflare Workers](https://workers.cloudflare.com/) as the serverless deployment platform.
+
+We’ll build a small semantic search engine for Cloudflare's documentation, giving us a system that understands the meaning behind search queries, not just keyword matches.
+
+<LinkCard
+  title="Follow along on GitHub"
+  description="Here is a repo with the code examples for this post"
+  href="https://github.com/fiberplane/create-honc-app/tree/main/examples/cf-retrieval-augmented-goose"
+  icon="external"
+  target="_blank"
+  rel="noopener noreferrer"
+/>
+
+## A Conceptual Primer
+
+Before we get started, let's go over some basic concepts. This will be especially helpful if you're not already familiar with vectors and embeddings.
+
+<Aside type="tip" title="Already familiar with embeddings?">
+  [Skip past](#how-do-we-start) this section if you already know about vectors
+  and embeddings and all that jazz.
+</Aside>
+
+### Vectors
+
+For our purposes, vectors are lists of numbers, like `[1, 2, 3, 4, 5]`.
+
+Vectors are described in terms of their length. An example of a vector of length two would be `[.68, .79]` and a vector of length three would look like `[.883, .24, .6001]`. Simple!
+
+When vectors are all the same length, you can compare and manipulate them in interesting ways, by adding them together, subtracting them, or finding the distance between them.
+
+All of this will be relevant in a bit, I promise 🙂
+
+### Embeddings
+
+Put shortly, embeddings are vector representations of the meanings of words and phrases… But. Well. That’s a little abstract.
+
+My favorite analogy for why these are useful and how they work comes from from AI researcher [Linus Lee.](https://thesephist.com/) He draws a comparison to colors. There’s a difference between describing a color with a name, like `“blue”`, versus with an RGB value `rgb(0, 0, 255)`. In this case, the RGB value is a vector of length three, `(0, 0, 255)`.
+
+If we wanted to mix some red into the named color `"blue"` and make a new color that’s just a _little_ more purple, how would we do that?
+
+Well, if all we have is the name of the color, there’s not much we can do. We’d just invent a new color and give it a new name, like `"purpleish blue"`. With an RGB value, though, we can simply “add” some red:
+
+`rgb(20, 0 , 0) + rgb(0, 0, 255) = rgb(20, 0, 255)`
+
+Because we chose to represent our color with _vectors of numbers_, we can do math on it. We can change it around, mix it with other colors, and have an all around good time with it.
+
+Embeddings are a way for us to do this kind of math on human language. Embeddings are vectors, like RGB values, except they’re much larger.
+
+How would you “do math on human language” though? Borrowing an example from the trusty Internet, let’s say we have a vector for the word `"king"`, and a vector for the words `"man"` and `"woman"`.
+
+If we subtract the vector for `"man"` from the vector for `"king"` , then add the vector for the word `"woman"`. What would you expect we get?
+Wild enough, we would get a vector _very very close_ to the one for the word `"queen"`.
+
+Pretty neat, huh?
+
+### Vector Search
+
+Searching across vectors usually refers to looking for vectors that are similar to one another.
+
+In this case, we think of similarity in terms of distance. Two vectors that are close to one another are similar. Two vectors that are far apart are different.
+
+So, in a database that stores vectors, we can calculate the distance between an input vector, and all vectors in the database, and return only the ones that are most similar.
+In this post, you will see a reference to "cosine similarity", which is a way to calculate the distance between two vectors. So, don't get freaked if we start talking about cosines.
+
+Basically, instead of looking for exact matches or keyword matches for a user's query, we look for “semantically similar matches” based off of cosine distance.
+
+## How Do We Start?
+
+To perform semantic search, we need:
+
+- a database that supports vector embeddings
+- a way to vectorize text
+- a way to search for similar vectors in the database
+
+To be entirely frank, the hardest part of building semantic search is knowing how to parse and split up your target documents text into meaningful chunks.
+
+I spent most of my time on this project just pulling down, compiling, and chunking the Cloudflare documentation. After that, the search part was easy-peasy.
+I will gloss over the tedious parts of this below, but I've provided a link to the script that processes the documentation, for anyone who is interested.
+
+That said, let's go over the stack and database models we'll be using for the actual searching part.
+
+### The Stack
+
+We want to expose a simple API for users to search the Cloudflare documentation. Then we want some tools for storing the documents and their embeddings, as well as querying them.
+
+Here's the stack we'll be using for all of this:
+
+- **Hono**: A lightweight web framework for building typesafe APIs
+- **Neon Postgres**: Serverless postgres database for storing documents and vector embeddings
+- **OpenAI**: To vectorize documentation content and user queries
+- **Drizzle ORM**: For constructing type-safe database operations
+- **Cloudflare Workers**: To host the API on a serverless compute platform
+
+### Setting up the Database
+
+First, we define a schema using Drizzle ORM.
+
+When we craft our database models, we have to think of what kind of search results we want to return.
+This leads us to the idea of "chunking", which is the process of splitting up the text into smaller chunks.
+
+The logic is: We don't want to match a user's query to entire documents, because that would return a lot of irrelevant results.
+Instead, we should split up each documentation page into smaller chunks, and match the user's query to the most semantically similar chunks.
+
+Since we're working witha relational database, we can define a schema for the documents and chunks, where each document can have many chunks.
+
+So, our Drizzle schema defines two main tables:
+
+- `documents`: Stores the original documentation pages
+- `chunks`: Stores content chunks with their vector embeddings
+
+```tsx title="src/db/schema.ts"
+export const documents = pgTable("documents", {
+  id: uuid("id").defaultRandom().primaryKey(),
+  title: text("title").notNull(),
+  url: text("url"),
+  content: text("content"),
+  hash: text("hash").notNull()
+});
+
+export const chunks = pgTable("chunks", {
+  id: uuid("id").defaultRandom().primaryKey(),
+  documentId: uuid("document_id")
+    .references(() => documents.id)
+    .notNull(),
+  chunkNumber: integer("chunk_number").notNull(),
+  text: text("text").notNull(),
+  embedding: vector("embedding", { dimensions: 1536 }),
+  metadata: jsonb("metadata").$type<Array<string>>(),
+  hash: text("hash").notNull()
+});
+```
+
+Once we define a schema, we can create the tables in our database.
+If you're following along on GitHub, you can run the commands below to create the tables.
+Under the hood, we use Drizzle to generate migration files and apply them to the database.
+
+```bash
+pnpm db:generate
+pnpm db:migrate
+```
+
+Now, with our database set up, we can move on to processing the documentation itself into vector embeddings.
+
+### Processing Documentation
+
+The heart of our system is the document processing pipeline. It's a bit of a beast.
+I'm going to move through this quickly, but you can see the full implementation in
+[`./src/scripts/create-vectors.ts`](https://github.com/fiberplane/create-honc-app/blob/main/examples/cf-retrieval-augmented-goose/scripts/create-vectors.ts).
+
+This script:
+
+1. Takes HTML documentation files as input
+2. Uses GPT-4o to clean and chunk the content
+3. Generates embeddings for each chunk
+4. Stores everything in our Neon Postgres database
+
+Once this runs, we have a database full of document chunks and their embeddings, and we can move on to building the search API.
+Probably the most interesting part of this script is how we use GPT-4o to clean and chunk the content. In some cases, this might be
+cost prohibitive, but for our use case, it was a no-brainer. We only need to run this once, and it was a lot smarter than any heuristics I would've defined myself.
+
+If you're following along on GitHub, I commend you. You can run these commands to process the documentation. (Please file an issue if you run into any problems.)
+
+```bash
+# Process the Cloudflare documentation
+cd data
+bash copy-cf-docs.sh
+cd ../
+# Create the vector embeddings
+pnpm run vectors:create
+```
+
+## The Search API
+
+When a user makes a search request, we do the following:
+
+1. Convert their query into a vector embedding
+2. Use cosine similarity to find the most relevant chunks for their query
+3. Return the top matches back to the user
+
+Why convert their query into a vector embedding? We want to find the most semantically similar chunks,
+so we need to represent their query in the same format as our database chunks.
+
+The search endpoint is surprisingly simple thanks to Hono.
+We define a `GET` route that takes `query` and `similarity` parameters.
+
+```tsx {6-9} title="src/index.tsx"
+app.get("/search", async (c) => {
+  // ...
+
+  // Parse query parameters
+  const query = c.req.query("query");
+  const similarityCutoff =
+    Number.parseFloat(c.req.query("similarity") || "0.5") ?? 0.5;
+
+  // ...
+});
+```
+
+Then, we make a request to OpenAI to create an embedding for the user's search query.
+
+```tsx {8-12} title="src/index.tsx"
+app.get("/search", async (c) => {
+  // ...
+
+  // Initialize the OpenAI client
+  const openai = new OpenAI({ apiKey: c.env.OPENAI_API_KEY });
+
+  // Create embedding for the search query
+  const embeddingResult = await openai.embeddings.create({
+    model: "text-embedding-3-small",
+    input: query
+  });
+  const userQueryEmbedding = embeddingResult.data[0].embedding;
+
+  // ...
+});
+```
+
+Finally, we craft a similarity search based on the cosine distance between the query embedding and each chunk's embedding.
+Here, the `drizzleSql` helper is the `sql` helper exported by Drizzle, which has been renamed for clarity.
+It allows us to construct type-safe sql expressions.
+
+```tsx title="src/index.tsx"
+app.get("/search", async (c) => {
+  // ...
+
+  // Craft a similarity search query based on the cosine distance between
+  // the embedding of the user's query, and the embedding of each chunk from the docs.
+  const similarityQuery = drizzleSql<number>`1 - (${cosineDistance(chunks.embedding, queryEmbedding)})`;
+
+  // Search for chunks with similarity above the cutoff score
+  const results = await db
+    .select({
+      id: chunks.id,
+      text: chunks.text,
+      similarity: similarityQuery
+    })
+    .from(chunks)
+    .where(drizzleSql`${similarityQuery} > ${similarityCutoff}`)
+    .orderBy(drizzleSql`${similarityQuery} desc`)
+    .limit(10);
+
+  // ...
+});
+```
+
+That leaves us with a list of chunks that are semantically similar to the user's query!
+We can choose to render these results however we see fit.
+
+When we use Fiberplane to test the search API, we can see the timeline of the request, including the embedding generation, similarity search, and result rendering.
+We can also see the raw SQL that was executed, which is a little unwieldy since we're dealing with vectorized queries:
+
+![timeline of search request](@/assets/blog/2024-11-13-retrieval-augmented-goose-timeline.png)
+
+And that's it! We've built a semantic search engine with the HONC stack.
+
+## The Magic of Vector Search
+
+What makes this more powerful than regular text search? Again, vector embeddings capture semantic meaning. For example, a search for "how to handle errors in Workers" will find relevant results even if they don't contain those exact words.
+
+Neon makes this simple, easy, and scalable by allowing efficient similarity searches over high-dimensional vectors _out of the box_.
+
+However, that doesn't mean that vector search is _the only_ tool for retrieval. Any robust system should consider the trade-offs of vector search vs. keyword search, and likely combine the two.
+
+## Deployment
+
+<LinkCard
+  title="See it in action"
+  description="Here is a live demo of the search API"
+  href="https://cf-retrieval-augmented-goose.mies.workers.dev/"
+  icon="external"
+  target="_blank"
+  rel="noopener noreferrer"
+/>
+
+Deploying to Cloudflare Workers is straightforward. We just push the code and set the secrets.
+
+```bash
+# Set your secrets
+wrangler secret put DATABASE_URL
+wrangler secret put OPENAI_API_KEY
+
+# Deploy
+pnpm run deploy
+
+```
+
+## Conclusion
+
+If you're looking for a quick way to get up and running with semantic search, this should give you a solid starting point.
+
+Worth noting: there are a lot of libraries out there that can take care of the heavy lifting of building RAG systems.
+At Fiberplane, we've built smaller systems with [Llamaindex](https://www.llamaindex.ai/) and [Langchain](https://www.langchain.com/),
+but for simpler use cases, I found myself wading through documentation and GitHub issues more than I'd like.
+A higher-level library can be very helpful for bigger applications, but for something this simple, I think the core concepts are valuable to implement and understand.
+
+To that end, the full code for everything in this post is available on GitHub, and you can adapt it for your own needs.
+
+<LinkCard
+  title="Get the code on GitHub"
+  description="Here is the repo with the full code for this post"
+  href="https://github.com/fiberplane/create-honc-app/tree/main/examples/cf-retrieval-augmented-goose"
+  icon="external"
+  target="_blank"
+  rel="noopener noreferrer"
+/>
+
+Otherwise, don't forget to check out the [HONC stack](https://honc.dev/) for more examples of building with Hono, Drizzle, Neon, and Cloudflare.