LangChain is essentially an interface and wrapper around LLMs. Its goal is to simplify pre- and post-processing of LLM inputs and outputs so we can build LLM applications more easily. Here is a good starting point for LangChain:
https://langchainers.hashnode.dev/getting-started-with-langchainjs
In my experience, LangChain works out of the box, and it cuts out many manual steps. As with any new tool or library, there is a learning curve, mainly because of the lack of documentation and tutorials (especially for the Node.js client library!).
For this project, we will re-create the previous project, but using LangChain.
Previous Project's Article:
https://dev.fandyaditya.com/semantic-search-using-open-ai-embedding-pinecone-vector-db-and-node-js
Prerequisite
Database
Same as in the previous trial, we will use Pinecone. Create an account at pinecone.io first.
Then, in the console, create an index with the following settings:
Index Name: whatever you want (I named it "article")
Dimensions: 1536
Metric: cosine
Pod Type: S1 or P1
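The cosine metric scores how closely two embedding vectors point in the same direction. Pinecone computes this server-side, but to build intuition, here is a minimal sketch of the formula, dot(a, b) / (|a| · |b|):

```javascript
// Cosine similarity between two equal-length vectors.
// 1 means same direction, 0 means orthogonal (unrelated).
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

console.log(cosineSimilarity([1, 0], [1, 0])); // identical direction → 1
console.log(cosineSimilarity([1, 0], [0, 1])); // orthogonal → 0
```

The dimension 1536 above matches the output size of OpenAI's embedding vectors, which is why the index must use it.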
Library
"@pinecone-database/pinecone"
"dotenv"
"express"
"langchain"
"openai"
npm install @pinecone-database/pinecone dotenv express langchain openai
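One note: the code in this project uses ES module imports and top-level await, so package.json must declare the module type. A minimal sketch (your name and version will differ):

```json
{
  "name": "semantic-search-langchain",
  "version": "1.0.0",
  "type": "module"
}
```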
ENV
As usual, we need an OpenAI API key, the Pinecone environment, and a Pinecone API key:
OPENAI_API_KEY
PINECONE_ENVIRONMENT
PINECONE_API_KEY
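The .env file looks like this (the values below are placeholders; the environment value is shown in your Pinecone console next to your index):

```
OPENAI_API_KEY=your-openai-api-key
PINECONE_ENVIRONMENT=your-pinecone-environment
PINECONE_API_KEY=your-pinecone-api-key
```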
Article
Select an article from the internet and put it in article.txt. I chose this article: https://jamesclear.com/saying-no
Code
Our project structure will look like this:
node_modules
.env
article.txt
embed.js
package.json
server.js
pinecone.js
We will go to pinecone.js first:
//pinecone.js
/**
Read .env
**/
import * as dotenv from 'dotenv';
dotenv.config();
/**
Init Pinecone
**/
import { PineconeClient } from '@pinecone-database/pinecone';
const pinecone = new PineconeClient();
await pinecone.init({
environment: process.env.PINECONE_ENVIRONMENT,
apiKey: process.env.PINECONE_API_KEY
});
/**
Export the Pinecone index; name it after the index you created in Pinecone. We will need this for the LangChain Pinecone wrapper.
**/
export const index = pinecone.Index('article')
Now in embed.js
//embed.js
/**
Read .env file
**/
import * as dotenv from 'dotenv';
dotenv.config();
/**
Open AI Embedding wrapper from langchain
**/
import { OpenAIEmbeddings } from 'langchain/embeddings';
/**
Chunk text/text splitter function from langchain
**/
import { RecursiveCharacterTextSplitter } from 'langchain/text_splitter';
/**
Pinecone wrapper from langchain
**/
import { PineconeStore } from 'langchain/vectorstores';
/**
Init fs
**/
import * as fs from 'fs';
/**
Create a text splitter with a chunk size of 1000 characters
**/
const textSplitter = new RecursiveCharacterTextSplitter({ chunkSize: 1000, chunkOverlap: 0});
/**
Init OpenAI Embeddings
**/
const embedder = new OpenAIEmbeddings();
/**
Get index from our pinecone.js
**/
import { index } from './pinecone.js';
(async () => {
//read article
const article = fs.readFileSync('article.txt', { encoding: 'utf-8' });
//split the text
const splittedText = await textSplitter.createDocuments([article]);
//store the split text in Pinecone, in index "article" and namespace "langchain" (the namespace is for filtering later; it can be whatever you want)
await PineconeStore.fromDocuments(splittedText, embedder, { pineconeIndex: index, namespace: 'langchain' });
})()
Now run:
node embed.js
Check the Pinecone console. If this succeeds, you will see vector data in the Pinecone DB.
Now in server.js, create an Express app that passes the query from the URL to search the DB using LangChain.
//server.js
/**
Import the required LangChain and Pinecone libraries, as in ./embed.js
**/
import { PineconeStore } from 'langchain/vectorstores';
import { index } from './pinecone.js';
import { OpenAIEmbeddings } from 'langchain/embeddings';
const embedder = new OpenAIEmbeddings();
const pineconeStore = new PineconeStore(embedder, { pineconeIndex: index, namespace: 'langchain' });
/**
Init express app
**/
import express from 'express';
const app = express();
const port = 9000;
app.get('/', async (req, res) => {
const { q } = req.query;
try {
const data = await pineconeStore.similaritySearch(q, 5);
res.status(200).send([...data])
}catch(err) {
res.status(404).send({ message: `${q} doesn't match any search` });
}
})
app.listen(port, () => {
console.log(`Example app listening on port ${port}`)
})
We searched by vector similarity in the Pinecone DB, in index article and namespace langchain, returning the top 5 results most similar to the passed query.
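Under the hood, similaritySearch embeds the query and asks Pinecone for the nearest stored vectors. A toy in-memory version of the same idea, using hypothetical 2-d "embeddings" for illustration (real ones are 1536-d), might look like this:

```javascript
// Toy in-memory top-k search: score every stored record against the query
// vector and return the k best. Pinecone does this at scale with an index.
function dot(a, b) { return a.reduce((s, x, i) => s + x * b[i], 0); }
function norm(a) { return Math.sqrt(dot(a, a)); }
function cosine(a, b) { return dot(a, b) / (norm(a) * norm(b)); }

function topK(queryVec, records, k) {
  return records
    .map(r => ({ ...r, score: cosine(queryVec, r.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Hypothetical records standing in for embedded article chunks.
const records = [
  { text: 'saying no', vector: [0.9, 0.1] },
  { text: 'saying yes', vector: [0.1, 0.9] },
  { text: 'declining politely', vector: [0.8, 0.2] },
];
console.log(topK([1, 0], records, 2).map(r => r.text));
// → [ 'saying no', 'declining politely' ]
```

The key point is that "similar" here means semantically close in embedding space, not string matching, which is what makes the search semantic.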
Now run:
node server.js
Now try hitting http://localhost:9000/?q=<your query>
Good job! You have successfully done a semantic search using LangChain in Node.js!
Resources
Github for this project: https://github.com/fandyaditya/semantic-search-langchain