Chat With a CSV Using LangChain

With just a few lines of code, you can use natural language to chat directly with a CSV file.

In this tutorial, I'll walk you through the code line by line so you can get results in under 10 minutes.

The agent is still a bit buggy, but it's a fun feature to try out in a test tool and a great way to get your feet wet with Python automation and AI.

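To run the code in this tutorial, install the following packages with pip. The agent also relies on pandas, and on tabulate to format the DataFrame preview it sends to the LLM:

!pip install langchain
!pip install langchain_openai
!pip install langchain_experimental
!pip install pandas tabulate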

The data I’m using in the tutorial comes directly from one of my favorite websites, Baseball Reference.

The stats belong to Walter Johnson, who is widely considered one of the greatest baseball players of all time.

You can visit his Baseball Reference player page and download the data as a CSV file.

In this article, I'll compare the CSV agent's answers with the results of the same calculations in pandas.

To do that, we first use pandas to load the CSV file into a DataFrame.

import pandas as pd
df = pd.read_csv('walter_johnson.csv')  # CSV downloaded from Baseball Reference
df.head()  # show the first five rows

Before moving on to the CSV agent, df.head() lets us preview the first five rows of the DataFrame.

CSV Agent

The first thing we need is an OpenAI API key. The following code stores it as an environment variable in the notebook, where ChatOpenAI will pick it up.

Replace YOUR_API_KEY_GOES_HERE with your own key:

import os
os.environ["OPENAI_API_KEY"] = "YOUR_API_KEY_GOES_HERE"
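Hardcoding a key in a notebook makes it easy to leak when you share the file. As an alternative, here's a minimal sketch that only asks for the key when OPENAI_API_KEY isn't already set. ensure_api_key is a hypothetical helper of my own, not part of LangChain; it uses Python's built-in getpass so the key isn't echoed into the notebook output.

```python
import getpass
import os


def ensure_api_key(var_name="OPENAI_API_KEY", prompt=getpass.getpass):
    """Return the API key, prompting for it only if the environment variable is unset.

    Hypothetical helper for this tutorial, not a LangChain API.
    """
    key = os.environ.get(var_name)
    if not key:
        # getpass hides what you type instead of echoing it to the screen
        key = prompt(f"Enter your {var_name}: ")
        os.environ[var_name] = key
    return key
```

Call ensure_api_key() once near the top of the notebook, before creating the LLM.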

Next, let's import create_csv_agent and ChatOpenAI:

from langchain_experimental.agents.agent_toolkits import create_csv_agent
from langchain_openai import ChatOpenAI

Now let's create the LLM and the agent executor. We'll set the LLM's temperature to 0.5. To build the executor, we pass in the LLM, the path to the CSV file, and whether we want verbose output.

I'm setting verbose=True so we can see each step the agent takes as it works with the CSV file.

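llm = ChatOpenAI(temperature=0.5)
# Recent langchain_experimental releases raise an error unless you opt in to executing LLM-generated Python
agent_executer = create_csv_agent(llm, 'walter_johnson.csv', verbose=True, allow_dangerous_code=True)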
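It helps to know roughly what the agent is doing. As I understand LangChain's implementation, create_csv_agent loads the file with pandas and gives the LLM a Python tool that runs code against a DataFrame named df; the LLM writes code, the tool executes it, and the result goes back to the LLM as an observation. The sketch below imitates only the "execute the generated code" step. run_generated_expression is a hypothetical helper, not LangChain's tool, the expression string stands in for what the LLM might write, and the three-row DataFrame uses made-up numbers, not Walter Johnson's stats.

```python
import pandas as pd

# Made-up toy data (NOT real statistics); "Year" and "W" mimic the export's columns
df = pd.DataFrame({"Year": [2001, 2002, 2003], "W": [12, 20, 16]})


def run_generated_expression(expression, frame):
    """Evaluate a pandas expression string with `df` bound to `frame`.

    Simplified, hypothetical stand-in for the agent's Python tool. eval() runs
    arbitrary code, so never use it on untrusted input outside a sandbox.
    """
    return eval(expression, {"pd": pd}, {"df": frame})


# An expression the LLM *might* generate for "How many total wins are there?"
print(run_generated_expression("df['W'].sum()", df))  # 48
```

Running model-written code like this is exactly why the agent needs an explicit opt-in and is best kept in a sandboxed environment.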

Find the total wins

Total wins is the sum of the wins (W) column. In pandas, we would use df['W'].sum().

agent_executer.invoke("How many total wins are there?")
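Because the agent answers in prose, comparing it with pandas means pulling the number out of its reply. AgentExecutor.invoke returns a dictionary, and the final answer sits under the "output" key. Here's a minimal sketch, assuming the relevant number is the first numeric token in the answer. extract_first_number and matches_pandas are hypothetical helpers, and both the reply text and the DataFrame are made up for illustration.

```python
import math
import re

import pandas as pd


def extract_first_number(text):
    """Return the first number in `text` as a float (commas allowed), or None."""
    match = re.search(r"-?\d+(?:,\d{3})*(?:\.\d+)?", text)
    if match is None:
        return None
    return float(match.group().replace(",", ""))


def matches_pandas(agent_result, expected, rel_tol=1e-3):
    """Check the agent's final answer against a value computed with pandas."""
    value = extract_first_number(agent_result["output"])
    return value is not None and math.isclose(value, float(expected), rel_tol=rel_tol)


# Made-up toy data and a made-up reply shaped like invoke()'s return value
df = pd.DataFrame({"Year": [2001, 2002, 2003], "W": [12, 20, 16]})
result = {"input": "How many total wins are there?", "output": "There are 48 total wins."}
print(matches_pandas(result, df["W"].sum()))  # True
```

With a live agent, you'd call matches_pandas(agent_executer.invoke("How many total wins are there?"), df['W'].sum()). The relative tolerance keeps rounded answers, like an average reported to one decimal place, from counting as misses.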

Find the season with the most wins

The most wins in a single season is the maximum value in the wins column. In pandas, we would use df['W'].max().

agent_executer.invoke("What's the most wins in a single season?")
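df['W'].max() returns the highest win total but not the season it came from. To get the season itself, pandas' idxmax returns the index label of the maximum, which .loc can use to look up the whole row. This sketch assumes the season is stored in a column named Year (check the header of your export) and uses made-up numbers.

```python
import pandas as pd

# Made-up toy data; "Year" is an assumed column name for the season
df = pd.DataFrame({"Year": [2001, 2002, 2003], "W": [12, 20, 16]})

best_row = df.loc[df["W"].idxmax()]  # first row holding the maximum W
print(best_row["Year"], best_row["W"])  # 2002 20
```

If two seasons tie for the most wins, idxmax returns only the first one; df[df['W'] == df['W'].max()] lists all of them.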

Find the average wins per season

The average wins per season is the mean of the wins column. In pandas, we would use df['W'].mean().

agent_executer.invoke("What's the average number of wins per season?")
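To keep a pandas baseline for all three questions in one place, Series.agg can compute the sum, maximum, and mean in a single call. A minimal sketch on made-up numbers:

```python
import pandas as pd

# Made-up toy data standing in for the Baseball Reference export
df = pd.DataFrame({"Year": [2001, 2002, 2003], "W": [12, 20, 16]})

# Pandas baselines for the three questions asked of the agent
baseline = df["W"].agg(["sum", "max", "mean"])
print(baseline)
```

Each entry in baseline can then be checked against the corresponding agent answer.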
