Lakeview + Vector Search: A Simple Monitoring Strategy for RAG Applications
In an era dominated by AI-driven advancements, the way we access and retrieve information has undergone a profound transformation. With platforms like Databricks and Azure offering powerful capabilities with vector search, the possibilities for harnessing AI to streamline organizational processes are endless. Beyond traditional keyword searches, AI is now revolutionizing how we navigate structured and unstructured data, leading to more efficient and personalized experiences.
Consider a scenario where an HR department distributes comprehensive PDF documents containing vital information ranging from company policies to acronyms. An AI-powered information retrieval application, e.g. a chatbot, deployed internally can empower employees to access this wealth of information seamlessly. Questions like “How many federal holidays do I get off this year?” or “What are the PTO policies?” can now be answered with a simple query, saving time and effort for both employees and HR personnel.
Yet, the true value lies beyond convenience. By analyzing user queries, organizations can gain invaluable insights not only into the information needs but also the preferences of their workforce. This data-driven approach can enable HR departments to tailor their onboarding processes to address common queries upfront, leading to a more informed and engaging experience for new hires.
Taking this thought experiment a bit further, what happens when users ask questions that the source document doesn’t address? Here lies the crux of the matter: how can we confidently tell the difference between what users are asking for and what information is actually available? With that insight, making informed decisions to enhance our source documents becomes feasible.
This brings me to the purpose of this post:
- How can I gain insight into query patterns?
- How can I monitor the available information retrieved from my RAG application?
In this post, I propose a possible solution using Lakeview combined with Vector Search. Visually monitoring which chunks of text are returned for a batch of user queries, along with their cosine similarity scores, helps determine what employees care about.
- If the query count for a particular document chunk (PTO information, for example) is high but the average similarity score is low (around 0.3), then we know users are looking for something that doesn’t exist in the document (or they’re not formulating the question well when prompting the model).
- If the query count is high and the score is also high (around 0.6), then we know the document does a good job of retrieving relevant information for the query. There are certainly edge cases, such as a high score where the answer isn’t quite what the user was looking for; further refinement, such as collecting user feedback, would be necessary. Cosine similarity isn’t foolproof, but it is a step in the right direction, and additional metrics, such as a hybrid search framework, could be incorporated to better judge whether the document meets users’ needs. A sketch of how both rules could be expressed in a single query follows below.
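To make these rules concrete, here is a minimal sketch of how both could be expressed against the monitoring table built later in this post (main.default.vector_search_monitoring). The thresholds (at least 10 queries, scores of 0.4 and 0.6) are illustrative assumptions rather than tuned values.
q = """
SELECT
  index_name,
  COUNT(*) AS query_count,
  ROUND(AVG(vector_search_score), 3) AS avg_score,
  CASE
    WHEN COUNT(*) >= 10 AND AVG(vector_search_score) < 0.4 THEN 'likely content gap'
    WHEN COUNT(*) >= 10 AND AVG(vector_search_score) >= 0.6 THEN 'well covered'
    ELSE 'needs review'
  END AS coverage_flag
FROM main.default.vector_search_monitoring
GROUP BY index_name
ORDER BY query_count DESC
"""
display(spark.sql(q))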
Below, I will run through a very simple setup showing how one can get started tackling this kind of problem.
Lakeview
Lakeview dashboards are the next generation of SQL dashboards from Databricks, offering improved visualizations, simplified design, optimized sharing and distribution, and unification with the Lakehouse. Overall, they represent an advancement in data visualization and analytics within the Databricks ecosystem, giving users a host of practical benefits, including faster rendering, simplified design features, and enhanced data lineage transparency through integration with Unity Catalog. This makes them a good fit for monitoring these kinds of inquiries.
Overview
As an example, we’ll build a simple vector search application over a single PDF.
- Store the raw PDF file in a Volume in a Unity Catalog enabled workspace
- Implement a chunking strategy on the document (be sure to enable Change Data Feed as a part of the processing)
- Create a Vector Search (VS) Endpoint
- Create a VS Index connected to the VS Endpoint
- Send queries to the VS endpoint for information retrieval
- Write queries, index values, and cosine similarity score to UC as a part of the monitoring strategy
- Create a Lakeview Dashboard for monitoring results
Code
Step 1: Chunk the PDF
I’ll be using a document regarding policies of a fake company (ChatGPT did a great job at creating a sample).
!pip install PyPDF2
dbutils.library.restartPython()

from PyPDF2 import PdfReader
from langchain.text_splitter import CharacterTextSplitter
from pyspark.sql.types import StructType, StructField, StringType

pdf_name = 'Welcome_to_NexaTech_Solutions'
path = f'/Volumes/main/default/sample_pdf/{pdf_name}.pdf'

with open(path, 'rb') as f:
    pdf_reader = PdfReader(f)
    full_text = ''
    num_pages = len(pdf_reader.pages)
    for page_num in range(num_pages):
        page = pdf_reader.pages[page_num]
        text = page.extract_text()
        full_text += text

text_splitter = CharacterTextSplitter(
    separator='•',
    chunk_size=100,
    chunk_overlap=50,
    length_function=len,
    is_separator_regex=False
)
docs = text_splitter.create_documents([full_text])
Step 2: Summarize the chunks and add a primary key
For this next piece of code, you’ll want to have a foundation model in mind for summarizing the chunked text, since we’ll be using Model Serving for low-latency processing. Here, I’m using an external model served at an endpoint called openai-completions-endpoint. We’ll register a UDF that takes in these chunks and summarizes them as best as possible. The summarizer will be useful for displaying the information in our Lakeview dashboard, since my chunking strategy is not very refined and may produce broken blocks of text that would appear as incomplete information to the end user.
def chat_prompt(block_of_text):
    import mlflow.deployments
    client = mlflow.deployments.get_deploy_client("databricks")
    PROMPT = f"""
    You are a summarizer for blocks of text. This text may be incomplete - do your best. Do not make up any information and keep your summary to a minimum.

    BLOCK OF TEXT
    {block_of_text}
    """
    completions_response = client.predict(
        endpoint="openai-completions-endpoint",
        inputs={
            "prompt": PROMPT,
            "temperature": 0.1,
            "max_tokens": 75,
            "n": 1
        }
    )
    AI_response = completions_response["choices"][0]["text"]
    return AI_response

from pyspark.sql.functions import udf, col
from pyspark.sql.types import StringType

chat_prompt_udf = udf(chat_prompt, StringType())

schema = StructType([
    StructField('page_content', StringType(), True)
])

doc_data = [(doc.page_content,) for doc in docs]
df = spark.createDataFrame(doc_data, schema)
df = df.withColumn("chunk_summary", chat_prompt_udf(col('page_content')))
df.write.format('delta').mode('overwrite').saveAsTable(f'{pdf_name}_chunked')
Add a monotonically increasing index (used as the primary key when creating the index table later) and enable CDF.
q = f'SELECT monotonically_increasing_id() AS index_column_name, * FROM {pdf_name}_chunked;'
df2 = spark.sql(q)
df2.write.format('delta').mode('overwrite').option("mergeSchema", "true").saveAsTable(f'{pdf_name}_chunked')
spark.sql(f"ALTER TABLE main.default.{pdf_name}_chunked SET TBLPROPERTIES (delta.enableChangeDataFeed = true)")
Step 3: Create VS Endpoint and VS Index
Now that we have our table chunked and ready to be indexed, let’s create a VS Endpoint. You can use the UI to do this; go to Compute > Vector Search > Create. This will take roughly 5–10 minutes to provision the resource. After that’s completed, go to Catalog > chunked table > Create and select Vector Search Index. This whole process will take roughly 10 minutes, depending on how fast the VS Endpoint can be spun up.
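If you prefer to do this programmatically instead of through the UI, here is a rough sketch using the databricks-vectorsearch Python client. The endpoint name (vs_endpoint), index name, and embedding model endpoint (databricks-bge-large-en) are assumptions for illustration; the index handle obtained at the end is what the similarity_search calls in Step 4 assume.
!pip install databricks-vectorsearch
dbutils.library.restartPython()

from databricks.vector_search.client import VectorSearchClient

vsc = VectorSearchClient()

# Provision the Vector Search endpoint (takes several minutes)
vsc.create_endpoint(name="vs_endpoint", endpoint_type="STANDARD")

# Create a Delta Sync index on the chunked table; the embedding model
# endpoint name below is an assumption - swap in the one you use.
vsc.create_delta_sync_index(
    endpoint_name="vs_endpoint",
    index_name="main.default.welcome_to_nexatech_solutions_chunked_index",
    source_table_name="main.default.welcome_to_nexatech_solutions_chunked",
    pipeline_type="TRIGGERED",
    primary_key="index_column_name",
    embedding_source_column="page_content",
    embedding_model_endpoint_name="databricks-bge-large-en"
)

# Handle used by similarity_search() in Step 4
index = vsc.get_index(
    endpoint_name="vs_endpoint",
    index_name="main.default.welcome_to_nexatech_solutions_chunked_index"
)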
Step 4: Create the monitoring table
def create_monitor_table(questions):
    # Initialize an empty dictionary to store results
    saved_results = {}

    # Loop through each question
    for question in questions:
        # Perform similarity search using your specific index object
        results = index.similarity_search(
            num_results=2,  # Retrieve the top 2 results
            columns=["page_content", "index_column_name", "chunk_summary"],
            query_text=question
        )

        # Extract the docs from the search results
        docs = results.get('result', {}).get('data_array', [])

        # Retrieve the desired fields from the top result
        doc = {
            "page_content": docs[0][0],
            "index_column_name": docs[0][1],
            "chunk_summary": docs[0][2],
            "vector_search_score": docs[0][3]
        }

        # Save the results in the dictionary with the question as the key
        saved_results[question] = [doc]

    # Flatten the dictionary into a list of rows
    rows = []
    for question, docs in saved_results.items():
        for doc in docs:
            rows.append((doc["page_content"], doc["chunk_summary"], doc["index_column_name"], question, doc["vector_search_score"]))

    # Define column names
    columns = ["page_content", "chunk_summary", "index_name", "question", "vector_search_score"]

    # Create DataFrame from the list of rows
    df = spark.createDataFrame(rows, columns).select('index_name', 'vector_search_score', 'question', 'page_content', 'chunk_summary')
    display(df)
    return df
questions = <list_of_sample_questions>
df1 = create_monitor_table(questions)
df1.write.format("delta").mode("append").saveAsTable('main.default.vector_search_monitoring')
Step 5: Monitor the results in Lakeview
Here is an example layout of how we can monitor these results.
In the Data tab we can pull in the relevant tables. Here I pulled in my monitoring table and also created a new dataset that summarizes various statistics of the cosine similarity score. Moving your cursor over the blocks gives a description of what that block of text is concerned with (see picture above). We can also add a slider to discover which questions fall within a particular range of scores.
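As one example of a dataset behind such a dashboard, the query below aggregates query counts and cosine similarity statistics per chunk; the table and column names come from the monitoring table written in Step 4. It is previewed here with spark.sql, but the same SELECT can be pasted directly into a Lakeview dataset.
q = """
SELECT
  index_name,
  chunk_summary,
  COUNT(*)                           AS query_count,
  ROUND(AVG(vector_search_score), 3) AS avg_score,
  ROUND(MIN(vector_search_score), 3) AS min_score,
  ROUND(MAX(vector_search_score), 3) AS max_score
FROM main.default.vector_search_monitoring
GROUP BY index_name, chunk_summary
ORDER BY query_count DESC
"""
display(spark.sql(q))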
Conclusion
This post describes how we can utilize the Vector Search capabilities of Databricks and tie that into a Lakeview dashboard for monitoring queries users perform on their documents. Though the cosine similarity score is not a perfect indicator of relevant material, it is certainly a step in the right direction.