
Tuesday, December 31, 2024

Building a RAG-Powered Chatbot with LangChain and Pinecone in Next.js


Building a PDF Knowledge Chatbot with Next.js, OpenAI, and Pinecone

This guide will walk you through building a chatbot that can answer questions based on PDF content using Next.js 13+, OpenAI, LangChain, Pinecone DB, and the Vercel AI SDK.

What are LLMs?

Large Language Models (LLMs), such as OpenAI's ChatGPT, Anthropic's Claude, or open-weight models such as Mixtral 8x7B, are an advanced class of artificial intelligence models that process, generate, and understand human language.

LLMs are built upon deep learning techniques, particularly transformer neural networks, and are trained on vast datasets comprising diverse text sources.

LLMs are capable of various language-related tasks, including translation, summarization, question-answering, and creative writing.

Their large scale and extensive training enable them to generate contextually relevant, coherent, and often highly nuanced text outputs, making them valuable tools in natural language processing (NLP) applications.

Challenges in Building Chatbots with LLMs

1. Hallucination and Information Reliability

When LLMs generate responses without a retrieval mechanism, they often produce confident-sounding but incorrect information. This happens because the models are essentially making educated guesses based on their training data, rather than referencing specific, verified sources. For example, if a user asks about company policies or product specifications, the LLM might generate plausible-sounding responses that completely contradict the actual documentation. This becomes particularly problematic in professional settings where accuracy is crucial, such as customer support or technical documentation chatbots.

2. Knowledge Staleness and Training Data Limitations

LLMs are bound by their training cutoff dates, creating a significant limitation in their ability to provide current information. For instance, an LLM trained with data up to 2023 cannot accurately answer questions about events, products, or regulations that emerged in 2024. This becomes especially problematic for businesses where information changes frequently, such as in the technology sector or financial services. Without RAG, there's no way to inject new knowledge into the system without going through the expensive and time-consuming process of retraining or fine-tuning the entire model.

3. Context Management and Response Accuracy

Without RAG, managing context in conversations becomes extremely challenging. LLMs have a fixed context window, typically ranging from a few thousand to a few hundred thousand tokens. When trying to build a chatbot that needs to reference extensive documentation or maintain long conversation histories, this limitation becomes severe. For instance, if a user asks a follow-up question referencing earlier parts of the conversation that have fallen outside the context window, the chatbot might provide inconsistent or incorrect responses because it has "forgotten" the relevant context.

4. Domain-Specific Knowledge and Customization

Building domain-specific chatbots without RAG is particularly challenging because LLMs have generalized knowledge rather than deep expertise in specific areas. For example, if you're building a chatbot for a healthcare provider, the LLM might mix general medical knowledge with outdated or incorrect information since it can't reference the organization's specific protocols and guidelines. This lack of ability to incorporate proprietary or specialized knowledge makes it difficult to create chatbots that can accurately represent a specific organization's voice, policies, and expertise. Furthermore, updating the chatbot's knowledge requires retraining the entire model, which is both expensive and time-consuming compared to simply updating a knowledge base in a RAG-based system.

What is RAG (Retrieval-Augmented Generation)?

RAG is a framework that combines the power of LLMs with a retrieval system that can access external knowledge. Think of it as giving your LLM a specialized library that it can reference in real-time. Instead of relying solely on its training data, the model can now "look up" relevant information before generating a response. This process happens in three main steps:

1. The Retrieval Phase

  • Documents are chunked and converted into embeddings (vector representations)
  • These embeddings are stored in a vector database (like Pinecone)
  • When a query comes in, the system finds the most relevant documents through semantic search

2. The Augmentation Phase

  • Retrieved relevant documents are combined with the user's query
  • This creates a "knowledge-enhanced" prompt for the LLM
  • The system can include source information and context

3. The Generation Phase

  • The LLM uses both the retrieved information and its general knowledge
  • Generates responses grounded in specific, relevant documents
  • Can cite sources and provide evidence for its answers
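
In code, the three phases map onto a retrieve → augment → generate pipeline. Here is a minimal TypeScript sketch using LangChain, assuming a vector store has already been populated (the Pinecone setup comes later in this guide); the function and variable names are illustrative only.

import { ChatOpenAI } from "@langchain/openai";
import type { VectorStore } from "@langchain/core/vectorstores";

// Minimal RAG sketch: retrieve relevant chunks, build an augmented prompt,
// then let the LLM generate a grounded answer.
async function answerWithRAG(question: string, vectorStore: VectorStore) {
  // 1. Retrieval: semantic search over the embedded document chunks
  const docs = await vectorStore.similaritySearch(question, 3);

  // 2. Augmentation: combine the retrieved chunks with the user's question
  const context = docs.map((d) => d.pageContent).join("\n\n");
  const prompt = `Answer using only this context:\n\n${context}\n\nQuestion: ${question}`;

  // 3. Generation: the model answers grounded in the retrieved context
  const llm = new ChatOpenAI({ modelName: "gpt-3.5-turbo" });
  const response = await llm.invoke(prompt);
  return response.content;
}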

Why Do We Need RAG?

Knowledge Accuracy and Currency

  • Overcomes the LLM's training data cutoff limitation
  • Provides access to up-to-date information
  • Reduces hallucination by grounding responses in actual documents

Cost Efficiency

  • More efficient than fine-tuning models for specific domains
  • Reduces the need for large context windows
  • Easier to update knowledge without retraining

Transparency and Trust

  • Responses can be traced back to source documents
  • Easier to verify and audit information
  • Increases user confidence in the system

RAG Architecture Components:

1. Data Processing Pipeline

  • Document Loading: Ingesting various file formats (PDFs, docs, text)
  • Text Chunking: Breaking documents into manageable segments
  • Text Cleaning: Removing noise and standardizing format
  • Embedding Generation: Converting text to vector representations
  • Vector Storage: Organizing embeddings in vector databases

2. Retrieval System

  • Query Processing: Converting user queries to vector form
  • Semantic Search: Finding relevant document chunks
  • Similarity Metrics: Cosine similarity or dot product calculations (see the sketch after this list)
  • Context Window Management: Optimizing retrieved content size
  • Ranking Mechanisms: Prioritizing most relevant information
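
To make "semantic search" concrete: every chunk and every query becomes a vector, and relevance is just a similarity score between those vectors. Below is a toy cosine-similarity function for illustration only; Pinecone performs this computation at scale, and real OpenAI embeddings have 1536 dimensions.

// Cosine similarity between two vectors: 1 = same direction, 0 = unrelated
function cosineSimilarity(a: number[], b: number[]): number {
  const dot = a.reduce((sum, ai, i) => sum + ai * b[i], 0);
  const normA = Math.sqrt(a.reduce((sum, ai) => sum + ai * ai, 0));
  const normB = Math.sqrt(b.reduce((sum, bi) => sum + bi * bi, 0));
  return dot / (normA * normB);
}

console.log(cosineSimilarity([1, 0, 1], [1, 1, 1])); // ≈ 0.82 (similar)
console.log(cosineSimilarity([1, 0, 0], [0, 1, 0])); // 0 (unrelated)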

3. Augmentation Process

  • Context Assembly: Combining retrieved documents
  • Prompt Engineering: Creating effective prompts with context
  • Source Attribution: Maintaining reference to original documents
  • Relevance Scoring: Weighing importance of retrieved information
  • Knowledge Integration: Merging multiple information sources

4. Generation Component

  • Response Synthesis: Combining retrieved info with LLM knowledge
  • Fact Verification: Cross-checking against sources
  • Source Citation: Including references in responses
  • Confidence Scoring: Indicating reliability of information
  • Format Control: Structuring responses appropriately

Implementation Flow:

1. Document Processing
   └── Raw Documents
       └── Text Extraction
           └── Chunking
               └── Embedding Generation
                   └── Vector Storage

2. Query Handling
   └── User Query
       └── Query Embedding
           └── Vector Search
               └── Context Retrieval
                   └── Prompt Construction

3. Response Generation
   └── Context + Query
       └── LLM Processing
           └── Response Generation
               └── Source Attribution
                   └── Final Output

Key Technologies and Tools:

Vector Databases

  • Pinecone: Scalable vector search
  • Weaviate: Knowledge graph capabilities
  • Milvus: High-performance vector operations
  • FAISS: Efficient similarity search

Embedding Models

  • OpenAI Embeddings
  • Sentence Transformers
  • BERT-based models
  • Domain-specific embeddings

LLM Integration

  • OpenAI GPT models
  • Anthropic Claude
  • Local LLMs (Llama, Mistral)
  • Custom fine-tuned models

Development Frameworks

  • LangChain: RAG pipeline development
  • LlamaIndex: Document processing
  • Haystack: Search and retrieval
  • Custom implementations

Chat UI

  • Vercel AI SDK: React hooks and stream helpers for chat UIs

Prerequisites

  • Node.js 18+ installed
  • OpenAI API key
  • Pinecone account and API key
  • Basic knowledge of React and Next.js
  • A PDF (or a public PDF URL) containing your knowledge base
  • An UploadThing account and token (used for the PDF upload UI)

Step 1: Project Setup

  1. Create a new Next.js project:
npx create-next-app@latest pdf-chatbot --typescript --tailwind --app
cd pdf-chatbot
  2. Install the required dependencies:
pnpm add ai
pnpm add langchain @langchain/community @langchain/core @langchain/openai @langchain/pinecone
pnpm add @pinecone-database/pinecone openai axios pdf-parse
pnpm add -D @types/pdf-parse
pnpm add zod uploadthing @uploadthing/react react-hot-toast react-icons react-markdown react-wrap-balancer lucide-react

The components/ui/* imports used in later steps (button, input, card, accordion) come from shadcn/ui (or your own equivalents), so initialize it as well if you have not already: npx shadcn@latest init
  3. Set up environment variables in .env.local:
# Required keys from external tools
OPENAI_API_KEY=
PINECONE_API_KEY=
UPLOADTHING_TOKEN=

# The index can be created in the Pinecone console, or it will
# be created the first time the prepare step (Step 3) runs
PINECONE_INDEX_NAME=
  4. Create lib/config.ts to validate and export the environment variables:
import z from "zod";

const envSchema = z.object({
  OPENAI_API_KEY: z.string().trim().min(1),
  PINECONE_API_KEY: z.string().trim().min(1),
  PINECONE_INDEX_NAME: z.string().trim().min(1),
});

export const env = envSchema.parse(process.env);

Create some reusable helper functions in lib/utils.ts

import type React from "react";
import type { Message } from "ai";

export function scrollToBottom(containerRef: React.RefObject<HTMLElement>) {
  if (containerRef.current) {
    const lastMessage = containerRef.current.lastElementChild;
    if (lastMessage) {
      const scrollOptions: ScrollIntoViewOptions = {
        behavior: "smooth",
        block: "end",
      };
      lastMessage.scrollIntoView(scrollOptions);
    }
  }
}

// Reference:
// github.com/hwchase17/langchainjs/blob/357d6fccfc78f1332b54d2302d92e12f0861c12c/examples/src/guides/expression_language/cookbook_conversational_retrieval.ts#L61
export const formatChatHistory = (chatHistory: [string, string][]) => {
  const formattedDialogueTurns = chatHistory.map(
    (dialogueTurn) => `Human: ${dialogueTurn[0]}\nAssistant: ${dialogueTurn[1]}`
  );

  return formattedDialogueTurns.join("\n");
};

export function formattedText(inputText: string) {
  return inputText
    .replace(/\n+/g, " ") // Replace multiple consecutive new lines with a single space
    .replace(/(\w) - (\w)/g, "$1$2") // Join hyphenated words together
    .replace(/\s+/g, " "); // Replace multiple consecutive spaces with a single space
}

// Default UI Message
export const initialMessages: Message[] = [
  {
    role: "assistant",
    id: "0",
    content:
      "Hi! I am your PDF assistant. I am happy to help with your questions about your PDF about German law.",
  },
];

interface Data {
  sources: string[];
}

// Maps the sources with the right ai-message
export const getSources = (data: Data[], role: string, index: number) => {
  if (role === "assistant" && index >= 2 && (index - 2) % 2 === 0) {
    const sourcesIndex = (index - 2) / 2;
    if (data[sourcesIndex] && data[sourcesIndex].sources) {
      return data[sourcesIndex].sources;
    }
  }
  return [];
};
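
To see how getSources pairs stream data with messages, here is a small illustrative example; it assumes the initial assistant greeting sits at index 0, and the data array below is made up.

import { getSources } from "@/lib/utils";

// Hypothetical data stream: one sources entry per assistant answer
const data = [
  { sources: ["chunk from page 3", "chunk from page 7"] },
  { sources: ["chunk from page 12"] },
];

// Message order: 0 = greeting, 1 = user, 2 = assistant, 3 = user, 4 = assistant
console.log(getSources(data, "assistant", 2)); // -> data[0].sources
console.log(getSources(data, "assistant", 4)); // -> data[1].sources
console.log(getSources(data, "user", 3)); // -> [] (only assistant answers get sources)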

Step 2: Create the Pinecone Client

  1. Create lib/pinecone-client.ts:
import { Pinecone } from "@pinecone-database/pinecone";
import { env } from "./config";
// Initialize the Pinecone client and make sure the index exists and is ready.
async function initPineconeClient() {
  try {
    const pineconeClient = new Pinecone({
      apiKey: env.PINECONE_API_KEY,
    });
    await pineconeClient.createIndex({
      name: env.PINECONE_INDEX_NAME,
      dimension: 1536,
      metric: "cosine",
      spec: {
        serverless: {
          cloud: "aws",
          region: "us-east-1",
        },
      },
      // This option tells the client not to throw if the index already exists.
      suppressConflicts: true,

      // This option tells the client not to resolve the promise until the
      // index is ready.
      waitUntilReady: true,
    });
    return pineconeClient;
  } catch (error) {
    console.error("error", error);
    throw new Error("Failed to initialize Pinecone Client");
  }
}

export async function getPineconeClient() {
  const pineconeClientInstance = await initPineconeClient();
  return pineconeClientInstance;
}
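
If you want to confirm the index was created with the right settings before loading any data, you can describe it with the Pinecone client. This is an optional sanity check, assuming the config file from Step 1 lives at lib/config.ts.

import { getPineconeClient } from "@/lib/pinecone-client";
import { env } from "@/lib/config";

export async function checkIndex() {
  const pc = await getPineconeClient();
  const description = await pc.describeIndex(env.PINECONE_INDEX_NAME);
  // Expect dimension 1536 and metric "cosine" for OpenAI embeddings
  console.log(description.name, description.dimension, description.metric);
  console.log("Ready:", description.status?.ready);
}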

Step 3: PDF Processing

  1. Create lib/pdf-loader.ts:
import { WebPDFLoader } from "@langchain/community/document_loaders/web/pdf";
import { PDFLoader } from "@langchain/community/document_loaders/fs/pdf";
import { RecursiveCharacterTextSplitter } from "langchain/text_splitter";
import { Document } from "langchain/document";
import axios from "axios";

export type PDFSource = {
  type: "url" | "local" | "buffer";
  source: string | Buffer;
};

export async function getChunkedDocsFromPDF(pdfSource: PDFSource) {
  let docs: Document[] = [];

  try {
    switch (pdfSource.type) {
      case "url": {
        // Download PDF from URL
        const response = await axios.get(pdfSource.source as string, {
          responseType: "arraybuffer",
        });
        const pdfBlob = new Blob([response.data], { type: "application/pdf" });
        const loader = new WebPDFLoader(pdfBlob);
        docs = await loader.load();
        break;
      }
      case "local": {
        // Handle local file system PDF using PDFLoader
        const loader = new PDFLoader(pdfSource.source as string);
        docs = await loader.load();
        break;
      }
      case "buffer": {
        // Handle Buffer (e.g., from fs.readFile)
        const pdfBlob = new Blob([pdfSource.source as Buffer], {
          type: "application/pdf",
        });
        const loader = new WebPDFLoader(pdfBlob);
        docs = await loader.load();
        break;
      }
      default:
        throw new Error("Unsupported PDF source type");
    }

    // Split into chunks
    const textSplitter = new RecursiveCharacterTextSplitter({
      chunkSize: 1000,
      chunkOverlap: 200,
    });

    const chunkedDocs = await textSplitter.splitDocuments(docs);
    return chunkedDocs;
  } catch (e) {
    console.error(e);
    throw new Error("PDF docs chunking failed!");
  }
}
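
Before wiring this into the rest of the app, you can verify the loader and splitter on their own. A quick sketch, assuming a publicly reachable PDF URL (the one below is a placeholder):

import { getChunkedDocsFromPDF } from "@/lib/pdf-loader";

export async function inspectChunks() {
  const chunks = await getChunkedDocsFromPDF({
    type: "url",
    source: "https://example.com/sample.pdf", // placeholder URL
  });
  console.log(`Produced ${chunks.length} chunks`);
  console.log(chunks[0]?.pageContent.slice(0, 200)); // preview the first chunk
}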

Store the data in the Pinecone DB

  1. Create lib/vector-store.ts:
import { env } from "./config";
import { PineconeStore } from "@langchain/pinecone";
import { OpenAIEmbeddings } from "@langchain/openai";
import { Pinecone as PineconeClient } from "@pinecone-database/pinecone";
import { Document } from "@langchain/core/documents";

export async function embedAndStoreDocs(
  client: PineconeClient,
  docs: Document<Record<string, any>>[]
) {
  /*create and store the embeddings in the vectorStore*/
  try {
    const embeddings = new OpenAIEmbeddings();
    const index = client.Index(env.PINECONE_INDEX_NAME);

    //embed the PDF documents
    await PineconeStore.fromDocuments(docs, embeddings, {
      pineconeIndex: index,
      textKey: "text",
    });
  } catch (error) {
    console.log("error ", error);
    throw new Error("Failed to load your docs !");
  }
}

// Returns a vector-store handle to be used as a retriever in LangChain chains
export async function getVectorStore(client: PineconeClient) {
  try {
    const embeddings = new OpenAIEmbeddings();
    const index = client.Index(env.PINECONE_INDEX_NAME);

    const vectorStore = await PineconeStore.fromExistingIndex(embeddings, {
      pineconeIndex: index,
      textKey: "text",
    });

    return vectorStore;
  } catch (error) {
    console.log("error ", error);
    throw new Error("Something went wrong while getting vector store !");
  }
}
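
Once documents have been embedded and stored, a quick similarity search is the easiest way to confirm the vector store works end to end. A minimal sketch, assuming the prepare step (next section) has already populated the index:

import { getPineconeClient } from "@/lib/pinecone-client";
import { getVectorStore } from "@/lib/vector-store";

export async function testSimilaritySearch(query: string) {
  const client = await getPineconeClient();
  const vectorStore = await getVectorStore(client);
  const results = await vectorStore.similaritySearch(query, 3);
  results.forEach((doc, i) =>
    console.log(`#${i + 1}:`, doc.pageContent.slice(0, 120))
  );
}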

Create a server action to prepare the PDF: chunk it, embed it, and store it

  1. Create actions/prepare.ts:
"use server";

import { getChunkedDocsFromPDF, PDFSource } from "@/lib/pdf-loader";
import { embedAndStoreDocs } from "@/lib/vector-store";
import { getPineconeClient } from "@/lib/pinecone-client";

export async function prepare(source: PDFSource) {
  try {
    const pineconeClient = await getPineconeClient();
    console.log("Preparing chunks from PDF file");
    const docs = await getChunkedDocsFromPDF(source);
    console.log(`Loading ${docs.length} chunks into pinecone...`);
    await embedAndStoreDocs(pineconeClient, docs);
    console.log("Data embedded and stored in pine-cone index");
  } catch (error) {
    console.error("Init client script failed ", error);
  }
}

Create the LangChain chain

  1. Create lib/langchain.ts:
import { ChatOpenAI } from "@langchain/openai";
import { ChatPromptTemplate } from "@langchain/core/prompts";
import { StringOutputParser } from "@langchain/core/output_parsers";
import { RunnableSequence } from "@langchain/core/runnables";
import { VectorStore } from "@langchain/core/vectorstores";

interface ProcessMessageArgs {
  userPrompt: string;
  conversationHistory: string;
  vectorStore: VectorStore;
  model: ChatOpenAI;
}

interface ProcessMessageResponse {
  answer: string;
  inquiry: string;
}

export async function processUserMessage({
  userPrompt,
  conversationHistory,
  vectorStore,
  model,
}: ProcessMessageArgs) {
  try {
    // Create non-streaming model for inquiry generation
    const nonStreamingModel = new ChatOpenAI({
      modelName: "gpt-3.5-turbo",
      temperature: 0,
      streaming: false,
    });

    // Generate focused inquiry using non-streaming model
    const inquiryResult = await inquiryPrompt
      .pipe(nonStreamingModel)
      .pipe(new StringOutputParser())
      .invoke({
        userPrompt,
        conversationHistory,
      });

    // Get relevant documents
    const relevantDocs = await vectorStore.similaritySearch(inquiryResult, 3);
    const context = relevantDocs.map((doc) => doc.pageContent).join("\n\n");

    // Generate the answer with the streaming model, grounded in the retrieved context
    return qaPrompt.pipe(model).pipe(new StringOutputParser()).stream({
      context,
      question: inquiryResult,
    });
  } catch (error) {
    console.error("Error processing message:", error);
    throw new Error("Failed to process your message");
  }
}

// Prompt templates
const inquiryPrompt = ChatPromptTemplate.fromMessages([
  [
    "system",
    `Given the following user prompt and conversation log, formulate a question that would be the most relevant to provide the user with an answer from a knowledge base.
    
    Rules:
    - Always prioritize the user prompt over the conversation log
    - Ignore any conversation log that is not directly related to the user prompt
    - Only attempt to answer if a question was posed
    - The question should be a single sentence
    - Remove any punctuation from the question
    - Remove any words that are not relevant to the question
    - If unable to formulate a question, respond with the same USER PROMPT received`,
  ],
  [
    "human",
    `USER PROMPT: {userPrompt}\n\nCONVERSATION LOG: {conversationHistory}`,
  ],
]);

const qaPrompt = ChatPromptTemplate.fromMessages([
  [
    "system",
    `You are an AI assistant specialized in providing accurate, context-based responses. Analyze the provided context carefully and follow these guidelines:

    CORE RESPONSIBILITIES:
    - Base responses primarily on the provided context
    - Cite specific parts of the context to support answers
    - Maintain high accuracy and transparency
    - Acknowledge limitations clearly

    RESPONSE GUIDELINES:
    1. Use the context precisely and effectively
    2. Distinguish between context-based facts and general knowledge
    3. Structure responses clearly and logically
    4. Include relevant quotes when beneficial
    5. State confidence levels when appropriate

    IMPORTANT RULES:
    - Never make up information not present in the context
    - Don't speculate beyond the given information
    - If the context is insufficient, explicitly state what's missing
    - Ask for clarification if the question is ambiguous

    When you cannot answer based on the context:
    1. State clearly that the context lacks the necessary information
    2. Explain what specific information would be needed
    3. Suggest how the question might be refined

    Context: {context}`,
  ],
  ["human", "Question: {question}"],
]);

Step 4: Create the Chat API Route

  1. Create app/api/chat/route.ts:
import { NextRequest, NextResponse } from "next/server";
import { Message, LangChainAdapter } from "ai";
import { getVectorStore } from "@/lib/vector-store";
import { ChatOpenAI } from "@langchain/openai";
import { processUserMessage } from "@/lib/langchain";
import { getPineconeClient } from "@/lib/pinecone-client";

// Allow streaming responses up to 30 seconds
export const maxDuration = 30;

export async function POST(req: NextRequest) {
  try {
    // Parse and validate request
    const body = await req.json();
    const messages: Message[] = body.messages ?? [];

    if (!messages.length) {
      return NextResponse.json(
        { error: "No messages provided" },
        { status: 400 }
      );
    }
    const currentQuestion = messages[messages.length - 1].content;
    if (!currentQuestion?.trim()) {
      return NextResponse.json(
        { error: "Empty question provided" },
        { status: 400 }
      );
    }
    // Format conversation history
    const formattedPreviousMessages = messages
      .slice(0, -1)
      .map(
        (message) =>
          `${message.role === "user" ? "Human" : "Assistant"}: ${
            message.content
          }`
      )
      .join("\n");

    // Initialize model and vector store
    const model = new ChatOpenAI({
      modelName: "gpt-3.5-turbo",
      // temperature: 0.1,
      streaming: true,
    });
    const pc = await getPineconeClient();
    const vectorStore = await getVectorStore(pc);
    const stream = await processUserMessage({
      userPrompt: currentQuestion,
      conversationHistory: formattedPreviousMessages,
      vectorStore,
      model,
    });
    console.log("message answer =>", stream);
    // console.log("message inquiry =>", inquiry);
    // Convert the stream using the new adapter
    const response = LangChainAdapter.toDataStreamResponse(stream);
    return response;
  } catch (error) {
    console.error("Chat endpoint error:", error);
    return NextResponse.json(
      { error: "An unexpected error occurred" },
      { status: 500 }
    );
  }
}
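
You can smoke-test the route without any UI by POSTing a messages array and reading the raw data stream. A hypothetical test script, run against a local dev server:

// Sends one user message to /api/chat and prints the raw streamed response
async function testChatRoute() {
  const res = await fetch("http://localhost:3000/api/chat", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      messages: [
        { id: "1", role: "user", content: "What does the PDF say about notice periods?" },
      ],
    }),
  });
  console.log("Status:", res.status);
  console.log(await res.text()); // raw AI SDK data-stream frames
}

testChatRoute();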

Step 5: Create the Chat Interface

  1. Create app/page.tsx:
import { DarkModeToggle } from "@/components/dark-mode-toggle";
import { Chat } from "@/components/chat";

export default function Home() {
  return (
    <main className="relative container flex min-h-screen flex-col">
      <div className=" p-4 flex h-14 items-center justify-between supports-backdrop-blur:bg-background/60 sticky top-0 z-50 w-full border-b bg-background/95 backdrop-blur">
        <span className="font-bold">pdf-chat-ai-sdk</span>
        <DarkModeToggle />
      </div>
      <div className="flex flex-1 py-4">
        <div className="w-full">
          <Chat />
        </div>
      </div>
    </main>
  );
}

Step 6: Create the Upload component

  1. Create a PDF file uploader component in components/PDFFileUploader.tsx:
import { UploadButton, UploadDropzone } from "@/lib/uploadthing";
import { Pencil, XCircle } from "lucide-react";
import React from "react";
import toast from "react-hot-toast";
import {
  FaFilePdf,
  FaImage,
  FaFileWord,
  FaFileExcel,
  FaFileArchive,
  FaFilePowerpoint,
  FaFileAlt,
} from "react-icons/fa";
import { MdTextSnippet } from "react-icons/md"; // For .txt files

type PDFUploadInputProps = {
  label: string;
  file: FileProps | null;
  setFile: any;
  className?: string;
  endpoint?: any;
};
export type FileProps = {
  title: string;
  type: string;
  size: number;
  url: string;
};
export function getFileIcon(extension: string | undefined) {
  switch (extension) {
    case "pdf":
      return <FaFilePdf className="w-6 h-6 flex-shrink-0 mr-2 text-red-500" />;
    case "jpg":
    case "jpeg":
    case "png":
    case "gif":
      return <FaImage className="w-6 h-6 flex-shrink-0 mr-2 text-gray-600" />;
    case "doc":
    case "docx":
      return (
        <FaFileWord className="w-6 h-6 flex-shrink-0 mr-2 text-blue-500" />
      );
    case "xls":
    case "xlsx":
      return (
        <FaFileExcel className="w-6 h-6 flex-shrink-0 mr-2 text-green-500" />
      );
    case "ppt":
    case "pptx":
      return (
        <FaFilePowerpoint className="w-6 h-6 flex-shrink-0 mr-2 text-orange-500" />
      );
    case "zip":
    case "gzip":
    case "tar":
      return (
        <FaFileArchive className="w-6 h-6 flex-shrink-0 mr-2 text-yellow-600" />
      );
    case "txt":
      return (
        <MdTextSnippet className="w-6 h-6 flex-shrink-0 mr-2 text-gray-500" />
      );
    default:
      return <FaFileAlt className="w-6 h-6 flex-shrink-0 mr-2 text-gray-500" />; // Default icon for other file types
  }
}
export default function PDFFileUpload({
  label,
  file,
  setFile,
  className = "col-span-full",
  endpoint = "",
}: PDFUploadInputProps) {
  function handleImageRemove() {
    setFile(null);
  }
  const extension = file ? file.title.split(".").pop() : "pdf";
  return (
    <div className={className}>
      <div className="flex justify-between items-center mb-4">
        <label
          htmlFor="course-image"
          className="block text-sm font-medium leading-6 text-gray-900 dark:text-slate-50 mb-2"
        >
          {label}
        </label>
        {file && (
          <button
            onClick={() => setFile(null)}
            type="button"
            className="flex space-x-2 bg-slate-900 rounded-md shadow text-slate-50 py-2 px-4"
          >
            <Pencil className="w-5 h-5" />
            <span>Change Files</span>
          </button>
        )}
      </div>
      {file ? (
        <div className="grid grid-cols-1">
          <div className="relative mb-6">
            <button
              type="button"
              onClick={() => handleImageRemove()}
              className="absolute -top-4 -right-2 bg-slate-100 text-red-600 rounded-full "
            >
              <XCircle className="" />
            </button>
            <div className="py-2 rounded-md px-6 bg-white dark:bg-slate-800 text-slate-800 flex items-center dark:text-slate-200 border border-slate-200">
              {getFileIcon(extension)} {/* Render appropriate icon */}
              <div className="flex flex-col">
                <span className="line-clamp-1">{file.title}</span>
                {file.size > 0 && (
                  <span className="text-xs">
                    {(file.size / 1000).toFixed(2)} kb
                  </span>
                )}
              </div>
            </div>
          </div>
        </div>
      ) : (
        <UploadButton
          className="ut-allowed-content:hidden"
          endpoint={endpoint}
          onClientUploadComplete={(res) => {
            const item = res[0];
            const url = {
              url: item.url,
              title: item.name,
              size: item.size,
              type: item.type,
            };
            setFile(url);
            console.log(url);
            console.log(res);
            console.log("Upload Completed");
          }}
          onUploadError={(error) => {
            toast.error("File Upload Failed, Try Again");
            console.log(`ERROR! ${error.message}`, error);
          }}
        />
      )}
    </div>
  );
}
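
The UploadButton above is generated from an uploadthing file router that this guide does not otherwise show. Below is a minimal sketch of the missing wiring, assuming a recent version of uploadthing; the pdfUpload endpoint name must match the endpoint prop passed to the component, and you should treat this as a starting point rather than a full implementation.

// app/api/uploadthing/core.ts
import { createUploadthing, type FileRouter } from "uploadthing/next";

const f = createUploadthing();

export const ourFileRouter = {
  // "pdfUpload" must match the endpoint used by PDFFileUpload
  pdfUpload: f({ pdf: { maxFileSize: "8MB" } }).onUploadComplete(async ({ file }) => {
    console.log("Uploaded PDF:", file.url);
  }),
} satisfies FileRouter;

export type OurFileRouter = typeof ourFileRouter;

// app/api/uploadthing/route.ts
import { createRouteHandler } from "uploadthing/next";
import { ourFileRouter } from "./core";

export const { GET, POST } = createRouteHandler({ router: ourFileRouter });

// lib/uploadthing.ts
import { generateUploadButton, generateUploadDropzone } from "@uploadthing/react";
import type { OurFileRouter } from "@/app/api/uploadthing/core";

export const UploadButton = generateUploadButton<OurFileRouter>();
export const UploadDropzone = generateUploadDropzone<OurFileRouter>();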
  2. Create app/upload/page.tsx to handle the file upload:
"use client";
import { prepare } from "@/actions/prepare";
import PDFFileUpload, { FileProps } from "@/components/PDFFileUploader";
import { Button } from "@/components/ui/button";
import { PDFSource } from "@/lib/pdf-loader";

import { Loader2 } from "lucide-react";
import React, { useState } from "react";

export default function Page() {
  const [file, setFile] = useState<FileProps | null>(null);
  const [loading, setLoading] = useState(false);
  const [loadingMsg, setLoadingMsg] = useState("");
  async function submit() {
    try {
      setLoading(true);
      setLoadingMsg("Initializing Client and creating index...");

      const pdfSource: PDFSource = {
        type: "url",
        source: file?.url ?? "",
      };
      await prepare(pdfSource);
      setLoading(false);
    } catch (error) {
      setLoading(false);
      setLoadingMsg("");
      console.log(error);
    }
  }
  return (
    <div>
      <div className="flex flex-1 py-16">
        <div className="w-full max-w-2xl mx-auto">
          {file ? (
            <>
              {loading ? (
                <Button disabled>
                  <Loader2 className="animate-spin" />
                  {loadingMsg}
                </Button>
              ) : (
                <Button onClick={() => submit()}>Upload to Pinecone</Button>
              )}
            </>
          ) : (
            <PDFFileUpload
              label="Add Files"
              file={file}
              setFile={setFile}
              endpoint="pdfUpload"
            />
          )}
        </div>
      </div>
    </div>
  );
}

Create the Chat components

  1. Create components/chat.tsx
"use client";

import { scrollToBottom, initialMessages, getSources } from "@/lib/utils";
import { ChatLine } from "./chat-line";
import { useChat, Message } from "ai-stream-experimental/react";
import { Input } from "./ui/input";
import { Button } from "./ui/button";
import { Spinner } from "./ui/spinner";
import { useEffect, useRef } from "react";

export function Chat() {
  const containerRef = useRef<HTMLDivElement | null>(null);
  const { messages, input, handleInputChange, handleSubmit, isLoading, data } =
    useChat({
      initialMessages,
    });

  useEffect(() => {
    setTimeout(() => scrollToBottom(containerRef), 100);
  }, [messages]);

  return (
    <div className="rounded-2xl border h-[75vh] flex flex-col justify-between">
      <div className="p-6 overflow-auto" ref={containerRef}>
        {messages.map(({ id, role, content }: Message, index) => (
          <ChatLine
            key={id}
            role={role}
            content={content}
            // Sources are attached starting from the assistant's first real answer (index 2)
            sources={data?.length ? getSources(data, role, index) : []}
          />
        ))}
      </div>

      <form onSubmit={handleSubmit} className="p-4 flex clear-both">
        <Input
          value={input}
          placeholder={"Type to chat with AI..."}
          onChange={handleInputChange}
          className="mr-2"
        />

        <Button type="submit" className="w-24">
          {isLoading ? <Spinner /> : "Ask"}
        </Button>
      </form>
    </div>
  );
}

  2. Create components/chat-line.tsx:

import Balancer from "react-wrap-balancer";
import {
  Card,
  CardContent,
  CardDescription,
  CardFooter,
  CardHeader,
  CardTitle,
} from "@/components/ui/card";
import {
  Accordion,
  AccordionContent,
  AccordionItem,
  AccordionTrigger,
} from "@/components/ui/accordion";
import { Message } from "ai/react";
import ReactMarkdown from "react-markdown";
import { formattedText } from "@/lib/utils";

const convertNewLines = (text: string) =>
  text.split("\n").map((line, i) => (
    <span key={i}>
      {line}
      <br />
    </span>
  ));

interface ChatLineProps extends Partial<Message> {
  sources: string[];
}

export function ChatLine({
  role = "assistant",
  content,
  sources,
}: ChatLineProps) {
  if (!content) {
    return null;
  }
  const formattedMessage = convertNewLines(content);

  return (
    <div>
      <Card className="mb-2">
        <CardHeader>
          <CardTitle
            className={
              role != "assistant"
                ? "text-amber-500 dark:text-amber-200"
                : "text-blue-500 dark:text-blue-200"
            }
          >
            {role == "assistant" ? "AI" : "You"}
          </CardTitle>
        </CardHeader>
        <CardContent className="text-sm">
          <Balancer>{formattedMessage}</Balancer>
        </CardContent>
        <CardFooter>
          <CardDescription className="w-full">
            {sources && sources.length ? (
              <Accordion type="single" collapsible className="w-full">
                {sources.map((source, index) => (
                  <AccordionItem value={`source-${index}`} key={index}>
                    <AccordionTrigger>{`Source ${index + 1}`}</AccordionTrigger>
                    <AccordionContent>
                      <ReactMarkdown linkTarget="_blank">
                        {formattedText(source)}
                      </ReactMarkdown>
                    </AccordionContent>
                  </AccordionItem>
                ))}
              </Accordion>
            ) : (
              <></>
            )}
          </CardDescription>
        </CardFooter>
      </Card>
    </div>
  );
}

Step 7: Deploy to Vercel

  1. Push your code to GitHub

  2. Connect your repository to Vercel:

    • Create a new project on Vercel
    • Import your GitHub repository
    • Add your environment variables in the Vercel project settings
    • Deploy the project
  3. After deployment, open the /upload page on your deployed app and upload a PDF to populate your Pinecone index

Usage

  1. Start the development server:
npm run dev
  2. Visit http://localhost:3000/upload to load your PDF, then http://localhost:3000 to interact with your chatbot

  3. Ask questions about the content of your PDF!

Key Features

  • Streaming responses for better user experience
  • PDF content chunking and vector storage
  • Semantic search for relevant context
  • Real-time chat interface with Tailwind CSS styling
  • Full TypeScript support
  • Optimized for production deployment on Vercel

Troubleshooting

  1. Pinecone Connection Issues

    • Verify your API key and environment
    • Check if your index exists and has the correct dimension (1536 for OpenAI embeddings)
    • Ensure your IP is allowlisted in Pinecone
  2. PDF Loading Issues

    • Verify the PDF URL is accessible
    • Check for PDF formatting issues
    • Monitor console for specific error messages
  3. OpenAI API Issues

    • Verify your API key
    • Check your rate limits
    • Monitor usage in the OpenAI dashboard

Next Steps

To enhance your chatbot, consider:

  1. Adding authentication
  2. Implementing rate limiting
  3. Adding support for multiple PDFs
  4. Creating a PDF upload interface
  5. Adding conversation history storage
  6. Implementing error boundaries and loading states
  7. Adding typing indicators and animations

Remember to monitor your API usage and costs, as both OpenAI and Pinecone have usage-based pricing models.