New ChatGPT Features For Lawyers
This month, I’d like to focus on a couple of new ChatGPT offerings that might be of particular interest to lawyers, and highlight some new information released by the legal-specific tool Harvey.ai.
Notable finds
It’s been a long time coming, but OpenAI finally rolled out its ChatGPT Search feature a couple of weeks ago.
Like other ChatGPT conversations, the tool will generate a narrative text response to your search query. But its response will include source citations, finally allowing us attorneys to breathe a sigh of relief.
In addition to letting you select search as an option in any chat, ChatGPT will sometimes sense that a request is best handled as a search query and automatically engage its search function. You can also re-run any prompt in a conversation as a search request.
I've found this ability to re-run questions quite useful. Often, when I’m in the midst of a back-and-forth with ChatGPT (not using the search function), I get a response that I'm concerned might not be totally accurate, or I find myself wanting particular sources to back up its assertions. In those instances, I re-run the prompt with the ChatGPT search feature, and it often delivers results that are more useful to me.
ChatGPT’s search function is of particular interest to lawyers like us. While I’m still not using it to perform any sort of heavy-duty legal research, I'm finding the search feature has made ChatGPT a generally more effective first-pass research tool. The usual caveats of legal research with generative AI still apply: it works better for federal law than for state-specific law, and it tends to handle statutes and Supreme Court precedent better than in-the-weeds regulatory matters or obscure points of case law. And for the love of all that is holy, please double-check the accuracy of anything it comes up with (which is much easier now that it cites and hyperlinks its sources). But overall, it represents a shift toward the sort of accuracy and accountability needed to use generative AI tools to support legal work.
Paying subscribers like me already have access, and it should be available to free users as well sometime soon.
Another ChatGPT feature that is very relevant to legal users is its new “reasoning” models, o1-preview and o1-mini, released in September. For paying subscribers, these show up as additional options for any conversation, alongside the current default model, GPT-4o (sheesh, these product names… OpenAI just isn’t getting any better at that aspect).
OpenAI’s announcement provides a pithy explanation of the difference between these models and its default GPT-4o model: “[w]e trained these models to spend more time thinking through problems before they respond, much like a person would.” In practice, that means the model works through multiple analytical steps in sequence before delivering its response. OpenAI goes into more detail in its article Learning to Reason with LLMs, which not only describes the process but also makes some eye-popping claims about analytical accuracy, including that “[i]n many reasoning-heavy benchmarks, o1 rivals the performance of human experts.” For example, on one benchmark, the o1 model for the first time exceeds human expert performance in solving PhD-level science questions. OpenAI reports similarly strong results across several other domains.
Most of the advances that have been touted are in math and science, where o1 is clearly a big step forward. But as OpenAI’s benchmark charts show, this new model also performs significantly better on certain benchmarks of legal accuracy and LSAT performance.
Commentators have similarly noted that the o1 models offer significantly better performance on some tasks, though most offer caveats. These models are notably slow and expensive compared to other flagship genAI models from OpenAI and its competitors, and their performance doesn’t exceed that of other models across the board (in fact, in some areas, like creative writing, o1 performs notably worse).
It remains to be seen how impactful these new “reasoning” models will be in improving analytical quality and reducing hallucinations when used to support legal research and analysis. In my own limited experience, o1-preview does a better job at issue spotting than any other model I’ve used, and it seems to hallucinate a bit less (inaccuracy is still a big issue, though!). Legal technology vendors also seem bullish: Thomson Reuters and Harvey have both reported that they are incorporating the o1 models into their legal AI products.
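For the tinkerers among you who want to poke at these models outside the ChatGPT interface, here’s a minimal sketch of what a call to o1-preview looks like through OpenAI’s Python SDK. The prompt and setup below are my own illustrative assumptions, not anything from OpenAI’s or any vendor’s materials, so treat it as a starting point rather than a recipe.

```python
# Minimal sketch: querying OpenAI's o1-preview reasoning model via the Python SDK.
# Assumes the `openai` package (v1.x) is installed and OPENAI_API_KEY is set in
# the environment. The legal-research prompt below is purely hypothetical.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-preview",  # the reasoning model discussed above; "o1-mini" is the cheaper sibling
    messages=[
        {
            "role": "user",
            "content": (
                "Identify the main legal issues raised by a landlord entering "
                "a tenant's apartment without notice, and name the general "
                "doctrines involved."
            ),
        }
    ],
)

# The model works through its intermediate reasoning internally and returns only
# the final answer, which is part of why responses can be noticeably slower.
print(response.choices[0].message.content)
```

As the caveats above suggest, expect these calls to be slower and pricier than a comparable GPT-4o request.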
Speaking of Harvey….
Harvey.ai is a generative AI tool built to serve law firms and other legal organizations. It is a significant player in the legal AI space, having just raised a $100 million Series C at a $1.5 billion valuation, and boasting several Big Law firms as clients.
The software is built on OpenAI's GPT models, fine-tuned with legal data, and then further fine-tuned with each client’s internal data. Initially, Harvey had a reputation for being a bit secretive, not revealing as much about its product as competitors like Westlaw, Lexis, and a whole raft of legal genAI startups. Recently, however, it has been making more information available.
I found a recent post especially interesting. Harvey summarized its clients’ top use cases in a short post discussing both litigation and transactional practices, including a couple of interesting infographics highlighting how uses vary across different practice groups. Here is the litigation graphic, for example:
Source: Harvey.ai
Harvey also recently debuted its system for evaluating the performance of genAI models on legal tasks, BigLaw Bench. Surprise, surprise, Harvey comes out on top! I of course take any vendor’s comparative assessment with a big ol’ grain of salt. But I’m so starved for analysis of genAI performance in supporting lawyers that I’m still intrigued. And Harvey has made certain underlying information about its datasets and analysis public, which is encouraging. I also like the mix of tasks that BigLaw Bench chose to analyze:
| Transactional Tasks | Litigation Tasks |
| --- | --- |
| Corporate Strategy & Advising | Analysis of Litigation Filings |
| Drafting | Case Management |
| Legal Research | Drafting |
| Due Diligence | Case Law Research |
| Risk Assessment & Compliance | Transcript Analysis |
| Negotiation Strategy | Document Review & Analysis |
| Deal Management | Regulatory & Advising |
| Transaction Structuring | Trial Preparations & Oral Argument |
I haven’t demoed Harvey yet. If I’m being honest, that’s partly due to its mixed reputation in the legal community, but also because I’m pretty turned off by its sales outreach strategy (uniquely entitled and aggressive, in my humble opinion). However, as the company raises more money, lands additional notable clients, and releases more information about its offerings, I’m feeling the pull to check it out. When I do, I’ll be sure to let y’all know my opinions!
All right y’all, that’s a wrap. See you next month!