⚪️ ChatGPT Relies on Memory, Not Math for Financial Predictions: Study
New podcast with Yale's Francesco Fabozzi, LLMs can't do math and the latest news.
Hey, it's Matt. Welcome back to AI Street—I break down what's happening in AI + Wall Street, with expert interviews, curated news and actionable analysis.
*Announcement*
I’m partnering with Francesco Fabozzi, Research Director at Yale’s International Center for Finance, to co-host a new podcast called The Alpha Intelligence Show.
More details below
AI NEWS
ChatGPT Adds Task Scheduling in Step Toward Agents
OpenAI has launched Tasks, a new ChatGPT feature that allows users to schedule reminders and recurring actions, marking the company’s first step toward what the industry calls agents, though I view them more as ‘Taskbots,’ akin to a Roomba for white-collar work. (The Verge)
USE CASE
I’m using OpenAI’s new feature to track news and stories on AI and finance, something I was already doing but will now have done automatically. I still use Google Alerts, but ChatGPT often uncovers unique content.
AI Giants Pay YouTubers for Unused Video
AI companies are paying content creators $1-4 per minute for unused video footage to train AI models, with OpenAI and Google among the buyers. Higher quality and unique formats like drone footage command premium prices. (Bloomberg) $
NEW DATA MARKETPLACE?
As AI companies exhaust publicly available internet data for training their models, they're turning to exclusive, unpublished content. For regulated industries like financial services, it’s harder to sell this type of data to an AI company. But the megabanks should benefit given their proprietary as well as legacy datasets.
BANKS
Goldman Drafts IPO Docs in Minutes vs. Weeks with AI
The bank now has 11,000 engineers among its 46,000 employees, according to GS CEO David Solomon, and is using AI to help draft public filing documents.
The work of drafting an S1 — the initial registration prospectus for an IPO — might have taken a six-person team two weeks to complete, but it can now be 95 percent done by AI in minutes. “The last 5 percent now matters because the rest is now a commodity,” he said. (Financial Times) $
RBC partners with Cohere on AI financial platform
RBC says it has partnered with Cohere Inc. to develop generative artificial intelligence products for the financial industry. It says the partnership will help create more accurate and verifiable models with a focus on risk and security. (Wealth Professional)
PRODUCTIVITY
AI Could Double Productivity by 2034: Stanford, Carnegie Mellon Researchers
AI could double economic productivity within the next decade through advances in both knowledge work and physical automation, according to leading researchers from Stanford and Carnegie Mellon. The gains could compound as AI accelerates scientific discovery and innovation, though institutional changes will be needed to ensure benefits are widely shared. (Bloomberg Law) $
NOTE: In a speech last week, Fed Governor Waller said, “While I am skeptical that AI is already making significant contributions to productivity growth, I have little doubt that it will do so.”
INSURANCE
Moody's Buys AI Firm for Address-Level Insurance Risk
With Cape Analytics’ tech, Moody’s plans to create a property database capable of delivering “address-specific” risk insights for its insurance clients, said Moody’s CEO Rob Fauber.
Cape’s exit comes as the insurance industry ramps up its adoption of AI and predictive analytics technologies. A 2024 survey by Conning, an insurance asset manager, found that 77% of insurers are in some stage of deploying AI, a 16-percentage-point increase from the previous year. By one estimate, the global AI in insurance market will be worth $80 billion by 2032. (TechCrunch)
ADOPTION
Generative AI Adoption Jumps Among Fortune 1000 Enterprises
Generative AI adoption has surged across Fortune 1000 companies, with 24% now using the technology at scale compared to just 5% last year, according to a Babson University study of 125 Fortune 1000 companies. Despite challenges like disinformation and ethical concerns, 96.6% of data and AI leaders see AI as transformative, with significant productivity gains. (CIO)
PODCAST
I’m excited to be partnering with Francesco Fabozzi, an expert in AI + finance, in a new podcast we’re calling The Alpha Intelligence Show. We’ll be bringing you in-depth conversations with hedge fund managers, quants, and academics on the future of AI on Wall Street. More details to come.
With the new podcast, I’m retiring the Five Minutes with interview series. Given this transition and the fact I’ve added quite a few new readers over the last few weeks, I’m highlighting all of my previous interviews since starting AI Street last summer.
Five Minutes with Interviews
Fitch’s Jayeeta Putatunda on causal AI reducing hallucinations
Former JPM Exec Tucker Balch on scaling investment analysis with AI
USC's Matthew Shaffer on using ChatGPT to estimate “core earnings”
Moody’s Sergio Gago on scaling AI at the enterprise level
RavenPack | Bigdata.com’s Aakarsh Ramchandani on AI and NLP
PhD candidate Alex Kim on signals with executive tone in earnings calls
MDOTM’s Peter Zangari on AI for portfolio management
Arta’s Chirag Yagnik on AI-powered wealth management
Finster’s Sid Jayakumar on AI agents for Wall Street
Sov.ai's Derek Snow on AI for fundamental investors
Bain’s Richard Lichtenstein on AI adoption in private equity
Snowflake’s Jonathan Regenstein on AI building novel datasets
Skadden’s Dan Michael on the SEC’s AI stance
Stardog’s Matt Lucas on hallucination-free AI
Celent’s Monica Summerville on AI adoption in capital markets
Aveni's Joseph Twigg on building a finance LLM
Persado’s Assaf Baciu on tailored AI marketing at banks
Professor Alejandro Lopez-Lira on AI-driven stock predictions
RESEARCH
ChatGPT Relies on Memory, Not Math for Financial Predictions: Study
New research shows that large language models struggle with basic accounting calculations and rely on memorization rather than true numerical reasoning, according to a study from the University of Chicago’s Bradford Levy.
For example, if you ask an LLM to add any two numbers between 0 and 100, it’s generally correct. But if you ask it to add any two numbers between 0 and 10,000, accuracy plummets.
Levy created a novel test, described in his paper “Caution Ahead: Numerical Reasoning and Look-ahead Bias in AI Models.” He changed the least significant digit in a company's accounting results (for instance, changing $7.334 billion to $7.335 billion). GPT-4's accuracy in predicting earnings changes then dropped dramatically, from 60% to no better than random chance. This suggests the model isn't actually analyzing financial data but simply matching patterns it has memorized.
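The perturbation idea can be sketched in a few lines of Python. This is a minimal illustration, not the code from Levy's paper: the function names, the toy lookup-table "model," and the stability measure are all assumptions made up for this example. It nudges the last reported digit of a figure and checks whether a predictor's answer survives the change.

```python
from decimal import Decimal
from typing import Callable, Sequence


def perturb_last_digit(value: str, step: int = 1) -> str:
    """Shift the least significant reported digit of a decimal string by `step`."""
    number = Decimal(value)
    # The exponent marks the position of the last reported digit,
    # e.g. "7.334" has exponent -3, so one unit is 0.001.
    unit = Decimal(1).scaleb(number.as_tuple().exponent)
    return str(number + step * unit)


def prediction_stability(predict: Callable[[str], str], figures: Sequence[str]) -> float:
    """Share of figures whose prediction is unchanged after the perturbation."""
    unchanged = sum(predict(f) == predict(perturb_last_digit(f)) for f in figures)
    return unchanged / len(figures)


# Hypothetical stand-in for a model that has only memorized exact figures.
memorized = {"7.334": "increase", "2.150": "decrease"}


def toy_memorizer(figure: str) -> str:
    return memorized.get(figure, "unknown")


print(perturb_last_digit("7.334"))
print(prediction_stability(toy_memorizer, list(memorized)))
```

A predictor that actually reasons about the numbers should give nearly the same answer for $7.334 billion and $7.335 billion. A pure lookup, like the toy one here, falls apart as soon as a digit moves.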
"If you look at what's happening under the hood, these models aren't thinking through financial statements - they're remembering patterns from their training data," Levy told me in a recent interview.
Traditional machine learning methods performed better and, importantly, weren't thrown off by small changes to the numbers like ChatGPT was.
While LLMs struggle with numbers, they can be instructed to write and execute code to perform calculations, significantly improving accuracy, according to Levy. In other words, instead of relying on pattern matching, the LLM can use a better tool for more accurate results.
The study also found that ChatGPT gets easily confused by too much information. When presented with complete financial statements (the way they appear in real company filings) rather than just the key numbers needed, its accuracy dropped significantly.
The 'ChatGPT-for-everything' approach is not precise enough for Wall Street. The next phase likely involves LLMs acting as agents that know when to delegate tasks to specialized AI tools, like using calculators for math.
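To make the delegation idea concrete, here is a rough sketch of the kind of calculator tool an LLM could hand arithmetic to instead of guessing at the digits. It assumes the model emits a plain arithmetic expression as text. The `calculate` function is a hypothetical helper written for this example, not part of any vendor's API or of Levy's study.

```python
import ast
import operator
from typing import Union

Number = Union[int, float]

# Only basic arithmetic is allowed; anything else is rejected.
_OPERATORS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculate(expression: str) -> Number:
    """Evaluate a plain arithmetic expression without calling eval()."""

    def _eval(node: ast.AST) -> Number:
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPERATORS:
            return _OPERATORS[type(node.op)](_eval(node.left), _eval(node.right))
        if isinstance(node, ast.UnaryOp) and isinstance(node.op, ast.USub):
            return -_eval(node.operand)
        raise ValueError("unsupported expression")

    return _eval(ast.parse(expression, mode="eval").body)


# Four-digit addition, the range where the newsletter says LLM accuracy falls off.
print(calculate("7334 + 2915"))
```

The tool returns the same exact answer whether the operands have two digits or ten, which is the property pattern matching lacks.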
MARKETS EDITION
Last Sunday, I published the first edition of AI Street markets, a new weekly publication where I use proprietary AI tools for investment analysis.
I’ve republished part of that post below. As a reminder, I have access to the platforms below and I’m getting up to speed on how best to use them. (I expect access to more platforms in the coming weeks.)
If you’d like to sign up for this edition, please click here.
Effective Prompts For Investment Research
Understanding AI's Limitations
While AI tools might seem to 'think' like analysts, they're really just pattern-matching machines working with probabilities. Here’s a breakdown of why they’re not “reasoning.”
This creates a challenge: When AI analyzes recent information not in its original training, it's more likely to make mistakes or "hallucinate" false connections.
This can't be completely eliminated, as financial markets produce new information every day. One way to mitigate this is with specific prompts.
Effective Prompts
LLM use has exploded in the two years since OpenAI released ChatGPT, in part because the tools are so easy to use. You don’t need any coding experience.
But you can improve outputs from LLMs with effective guidance or prompts.
For example, this is pretty vague:
Review Jefferies’ most recent financial performance.
This type of prompt is more likely to give you a broad overview of fourth-quarter earnings at the bank, which reported last week.
You’re more likely to get a precise response by providing clear instructions and defining the following:
Effective Prompts Structure
Role
Be specific about the expertise you want the AI to adopt. Instead of just saying "analyze this," tell it to act as an equity analyst, credit analyst, or another relevant expert. This helps frame the analysis appropriately.
Task
Break down exactly what you want analyzed, including:
Time period (specific quarter/year)
Metrics to focus on
Types of comparisons (YoY, sequential, vs peers)
Specific aspects of the business to examine
Output
Specify how you want the information presented:
Format (bullet points vs paragraphs)
Order of information
Level of detail
Whether to include specific quotes
Types of metrics to highlight
How to handle forward-looking statements
This structured approach typically produces more reliable and useful analysis. Going back to Jefferies:
Role: You are an equity analyst specializing in investment banking, covering Jefferies.
Task: Review Jefferies' most recent quarter (Q4 2024) investment banking performance:
Highlight key Q4 2024 metrics:
Investment banking revenue and QoQ change
Revenue mix (M&A vs Capital Markets)
Specific deal statistics mentioned by management
Any notable changes in client activity or deal types
Provide relevant context:
YoY comparison with Q4 2023
Key trends over past 2-3 quarters
Market share gains/losses explicitly mentioned
Any changes in competitive positioning vs bulge bracket banks
Extract management's forward-looking commentary ONLY from Q4 call:
Pipeline comments with specific metrics if provided
Any guidance on deal timing or conversion
Client activity trends
Areas of strategic focus
Format: Start with Q4 performance, followed by YoY/sequential trends, then management outlook. Only include metrics and commentary explicitly stated in earnings calls. Quote management directly when discussing outlook.
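If you reuse this structure often, the Role / Task / Format layout can be templated. The snippet below is a small sketch under that assumption. `build_prompt` and its parameter names are hypothetical, and it produces only the prompt text. It does not call Bigdata.com or any other platform.

```python
from typing import Sequence


def build_prompt(role: str, task: str, focus_items: Sequence[str], output_format: str) -> str:
    """Assemble a Role / Task / Format prompt as plain text."""
    lines = [f"Role: {role}", f"Task: {task}"]
    # Each focus item becomes its own line so the model sees a checklist.
    lines.extend(f"- {item}" for item in focus_items)
    lines.append(f"Format: {output_format}")
    return "\n".join(lines)


prompt = build_prompt(
    role="You are an equity analyst specializing in investment banking, covering Jefferies.",
    task="Review Jefferies' most recent quarter (Q4 2024) investment banking performance:",
    focus_items=[
        "Investment banking revenue and QoQ change",
        "Revenue mix (M&A vs Capital Markets)",
        "YoY comparison with Q4 2023",
    ],
    output_format="Start with Q4 performance, then YoY/sequential trends, then management outlook.",
)
print(prompt)
```

Swapping the company, quarter, or focus items then takes a one-line change, and the instructions the model sees stay consistent from run to run.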
For this prompt, I used Bigdata.com, in part because I was able to get some hands-on training a couple weeks ago. (Thanks Dan!)
I clicked on “@earningscalls” to reference the most recent quarter.
I got the following result:
I’ve added the text below for readability.
In the screenshot, you’ll see blue footnotes that you can click to view the source of the information.
It’s already underlined — no need to hunt for it. AI-generated content needs to be auditable given the risk of hallucinations.
If you want to learn more about Bigdata.com, check out my interview with Aakarsh Ramchandani, chief strategy officer at RavenPack, which launched the new platform last fall. RavenPack has been developing natural language processing (NLP) products for traders since the early 2000s.