Take Another Look at ChatGPT!

An exciting ChatGPT release, some concerning research, and more!

Notable finds

OpenAI recently released its latest model for ChatGPT, called GPT-4o. The model incorporates various updates over the prior GPT-4 model, and its performance appears to be stronger in terms of speed, accuracy, and other measures. But that’s not what I want to talk to you about.

This release is especially important for any of you who don’t pay for access to ChatGPT (I assume that’s most of you). Up until the recent update, OpenAI allowed free access to an earlier model, 3.5, and reserved the best and shiniest model for paying customers like myself.

No longer! Now free users have access to GPT-4o, as well as some other features that were previously restricted. Here is OpenAI’s summary of some of the new bells & whistles:

• Experience GPT-4 level intelligence
• Get responses from both the model and the web
• Analyze data and create charts
• Chat about photos you take
• Upload files for assistance summarizing, writing or analyzing
• Discover and use GPTs and the GPT Store
• Build a more helpful experience with Memory

OpenAI

I feel compelled to note here, as many others have, that OpenAI’s product names are simply baffling. If you are confused by the difference between ChatGPT, the underlying GPT model, and a GPT that you might access at the GPT Store, you are certainly not alone! All the more reason to just play with this stuff rather than try to understand it from my description.

Anyway, here is the upshot:

Did you at some point mess around a bit with the free version of ChatGPT, and then you kind of stopped using it over time for whatever reason? Now is a great time to jump back in and reengage with this tool. Just sign back in to ChatGPT and play with it for a while. You might be surprised at how different it feels compared to the 3.5 version.

I’d also encourage you, while exploring, to take a look at the “GPT Store,” try out some of the custom GPT tools folks have made, and even try making your own (it is not hard to get started, I promise).

This is already getting a bit long, so I’m not going to attempt any sort of walkthrough or tutorial here. But the ChatGPT-4o announcement page has some great short videos embedded in it, with basic information on how to use various aspects of these new freely accessible capabilities.

Next up is an important paper from the Stanford University Human-Centered Artificial Intelligence Center (“HAI”), called "Hallucination-Free? Assessing the Reliability of Leading AI Legal Research Tools."

Linked above is HAI’s very informative summary and discussion of its findings. It’s certainly worth reviewing.

The “hallucination” problem of genAI tools generating incorrect information is well known. Everyone at this point has heard of cases of attorneys citing nonexistent cases to courts and getting in trouble for it. But you might have also heard various legal technology vendors claiming that their genAI products essentially do not hallucinate. Two major players that have been making similar claims concerning their genAI products are Lexis and Westlaw.

Without getting too technical, their tools use a process called retrieval augmented generation (“RAG”) to point a large language model at a knowledge base—which in this case consists of a large portion of the materials produced or licensed by those vendors—to generate answers. The idea is that by directing the model to take specific authoritative materials into account when composing responses, and even directing it to cite its sources, the hallucination issue will be solved. Unfortunately, reality is not quite so simple.
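To make the RAG idea concrete, here is a minimal sketch in Python. The case names, a toy keyword-overlap retriever (standing in for a real vector database), and the prompt wording are all my own illustrative assumptions, not how Lexis or Westlaw actually implement their products; the point is only the shape of the pipeline: retrieve relevant sources, then instruct the model to answer from them and cite them.

```python
# Minimal RAG sketch. Everything here is illustrative: the "knowledge base"
# is a toy dict with made-up entries, and retrieval is simple word overlap
# rather than a real embedding/vector search.

KNOWLEDGE_BASE = {
    "Smith v. Jones (2001)": "Holds that a contract requires mutual assent.",
    "Doe v. Roe (2015)": "Discusses the statute of limitations for fraud claims.",
    "In re Acme (2020)": "Addresses successor liability in asset sales.",
}

def score(question: str, text: str) -> int:
    """Count words shared between the question and a document (toy retriever)."""
    return len(set(question.lower().split()) & set(text.lower().split()))

def retrieve(question: str, k: int = 2) -> list[tuple[str, str]]:
    """Return the k documents most relevant to the question."""
    ranked = sorted(KNOWLEDGE_BASE.items(),
                    key=lambda item: score(question, item[1]),
                    reverse=True)
    return ranked[:k]

def build_prompt(question: str) -> str:
    """Assemble a prompt that tells the model to answer only from the
    retrieved sources and to cite them by name -- the core RAG move."""
    sources = retrieve(question)
    context = "\n".join(f"[{name}] {text}" for name, text in sources)
    return (
        "Answer using ONLY the sources below, citing each by name.\n\n"
        f"{context}\n\nQuestion: {question}"
    )

prompt = build_prompt("What is the statute of limitations for fraud?")
print(prompt)
```

Even in this toy version you can see where things go wrong: the retriever can surface irrelevant sources, and nothing forces the model to describe the retrieved sources accurately, which maps onto the failure modes discussed below.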

I'll say that, having spent hours using the Lexis AI tool, I have not found it to be free of accuracy errors. In fact, I find it to be in some ways less reliable than even a general tool like ChatGPT, albeit unreliable in different ways.

What this recent paper finds is that significant accuracy problems remain when these tools are directed to answer legal questions and perform research. According to these findings, RAG is not a panacea that will fix large language models’ accuracy problems wholesale. This not only aligns with my personal experience; the limitations of RAG have also been observed in many other domains.

So does that mean that the vendors have been untruthful in claiming little to no hallucination? Not exactly. These vendors’ hallucination claims may be accurate as long as they are defining the phenomenon quite narrowly. It is probably accurate to say that products like Lexis AI and Westlaw Precision do not have a tendency to serve up citations for cases that literally do not exist.

However, this is far from the only way that a genAI tool can give inaccurate information. I've experienced other accuracy issues firsthand. Often, where I have sought jurisdiction-specific information on a relatively complex legal question, Lexis AI has fallen short (I haven’t used Westlaw Precision outside of a couple demos, but the HAI paper suggests similar performance). One issue I have run into is that the tool generated information and links to authorities that simply have nothing to do with the specific question I asked. This is annoying, but at least unlikely to lead to errors in final work product. More distressingly, sometimes the tool generates an incomplete or even simply inaccurate description of an authority it links to. This research paper, like my own experience, suggests that the answers generated by these tools simply are not trustworthy.

It should be noted that this paper is somewhat controversial. Naturally, it was subjected to scrutiny by the vendors themselves, and other legal technology commentators have offered various perspectives. At least one (since-corrected) methodological error was identified, and there have been other criticisms. I won’t rehash this discussion here; I simply want to note that LLM accuracy, the effectiveness of RAG, and accuracy measurement and other benchmarking are all areas of ongoing discussion and debate.

But this paper accords with my own experience that, at least at the present time, generative AI tools cannot reliably perform complete legal research tasks. I believe there are still ways for these tools to be helpful in supporting research, but those methods require thoughtfulness and care: identifying specific places within research workflows where AI support is appropriate, and relying on such support only to the extent that it is relatively easy to check accuracy.

Here is a simple flowchart from Peter Gostev at Moonpig, whose commentary I always enjoy, on how to navigate genAI hype. I thought it was funny. And accurate!

Since the explosion of genAI, with the attendant piles of money and attention flowing into the area, the amount of hype has been pretty unbelievable. I have not historically found myself with deep interest in a technology while it is in such a frenzied phase. So it’s a novel experience for me to be bombarded with really seductive claims of various AI tools, use cases, and resources, and then, in many cases, disappointment—as reality falls short of the marketing & demos.

Now I try to keep this flowchart in mind whenever I get excited about some claim or another about what the newest genAI tool can do to support my legal practice.

Tips, tools, and tutorials

You have probably run across a fair number of workshops, trainings, guides, etc. being offered on genAI. That might be an understatement. There’s a real content glut at the moment, and I have certainly had a tough time figuring out what is a worthwhile use of my time. At this point, I simply ignore most of these offerings. That’s my coping mechanism. But I’m quite excited about a brief workshop coming up on Saturday.

My friend Jonathan Earley has been a reliable and fun resource as I’ve been learning about genAI. I am excited that he is now applying his enthusiasm and expertise to workshops. Not only is he deeply knowledgeable in the area and wildly creative in his use of these tools, but he is also a seasoned educator who knows how to teach this material effectively. The Boundless workshop is not aimed at lawyers specifically, or even solely at business enterprise use cases. But I’m confident that if you tune in you will learn some surprising & useful stuff!

All right y’all, that’s a wrap. See you next month!