Study: Wall Street Tasks (Still) Stump Top AI Models

New research shows even advanced AI struggles with financial calculations

Hey, it's Matt. Here’s what’s up in AI + Wall Street.

RESEARCH

Wall Street Tasks (Still) Stump Top AI Models

AI models built for complex reasoning can perform worse than general-purpose models at financial tasks, according to new research.

The study, Fino1: On the Transferability of Reasoning-Enhanced LLMs to Finance, tested 16 models on three specialized financial benchmarks—FinQA (financial math), DM-Simplong (lengthy financial reports), and XBRL-Math (standardized financial equations)—evaluating their ability to handle both numeric calculations and text-heavy questions.

While reasoning-optimized models handle equations well, they struggle with financial text—in one case mistaking a share purchase price for a compensation expense.

“Financial reasoning is more complex and specific than the general domain, such as trending and financial ratio calculation,” lead researcher Qianqian Xie tells me. “General domain models can fail to understand these nuances, sometimes failing at the very first step, which then leads to the entire calculation being off.”

Surprisingly, general-purpose models like GPT-4o outperformed reasoning-enhanced models such as GPT-o1 in financial contexts. On FinQA, for example, GPT-4o achieved 72.49% accuracy, while GPT-o1 lagged behind at 49.07%.

To address these gaps, the researchers introduced Fino1, an 8B-parameter model trained with reinforcement learning to solve financial problems step-by-step, rather than using general-purpose reasoning. Despite its small size, Fino1 outperformed its base model by 10% across all benchmarks, suggesting that domain-specific reasoning can matter more than raw model size. Larger models, including top performer DeepSeek-R1, showed diminishing returns after 70B parameters. Taken together, the results indicate that reasoning enhancement must be adapted for finance to be effective.

By releasing their datasets, code, and results, the researchers at The Fin AI, which promotes open-source AI tools for finance, hope to bring more transparency to financial AI research, an area where proprietary models often remain black boxes. Moving forward, they call for deeper financial adaptation, better multi-table reasoning, and more specialized training to ensure AI can reliably handle real-world financial tasks.

HEDGE FUNDS

Systematic Bond Investing Expands With Portfolio Trading

Portfolio trading is helping quants scale bond strategies as electronic execution cuts costs and boosts liquidity. With growing fixed-income ETFs and better trading platforms, factor investing in credit is gaining traction. (Bloomberg)

Unlike stocks, bond investing relies on vast amounts of unstructured data—from bond prospectuses to credit agreements and ratings reports. As electronic trading expands, AI is likely to help organize this data, making systematic credit strategies and portfolio trading more scalable.

Small AI-Powered Hedge Fund Shows Early Gains

Minotaur Capital, a small Sydney-based hedge fund, has relied entirely on AI-driven research to achieve a 13.7% return over six months, outperforming the MSCI All-Country World Index’s 6.7% gain. (Yahoo/Bloomberg)

This is a small fund and a short timeframe, but I do think relying on AI to make investment decisions is going to be more and more common. Portfolio GPT is gaining assets.

Made with Ideogram with the prompt: “Minotaur hedge fund”

LAW

Steve Cohen’s Point72 Invests $75M in Legal AI

Billionaire Steve Cohen’s Point72 Private Investments led a $75 million financing round in Luminance Technologies Ltd., a startup that uses AI to draft and review legal documents.

“I think in the future we’re going to have AI negotiating contracts against AI,” chief executive Eleanor Lightbody said in an interview. “And that’s what we’re building.”

From Bloomberg

AI legal software companies continue to attract major funding. Last week, a rival startup called Eudia closed a $105 million deal and Harvey, the sector’s largest startup, raised $300 million in a round that doubled its valuation to $3 billion. (Bloomberg)

CODING

AI Tools Let Non-Coders Build Software

Replit, Anthropic, and Google Cloud are making software development more accessible with AI-powered tools. Their collaboration lets Zillow employees without coding experience build custom applications, and AI-generated code now enables teams in marketing, sales, and operations to create software without engineering expertise. (VentureBeat)

This technology could allow financial professionals to develop quantitative models and automation tools that traditionally required coding skills.

DATA

Crunchbase Bets on AI to Predict Startup Exits

Crunchbase is pivoting from pure data provider to AI-powered prediction engine, using its 17 years of proprietary startup data to forecast funding rounds, acquisitions and IPOs.

Traditional database companies face new challenges in the age of ChatGPT. "Historical data companies are already dead," Crunchbase CEO Jager McConnell told the WSJ.

Crunchbase's AI strategy:

  • Uses behavioral signals from 80 million users that aren't publicly visible, such as profile edits and investor search patterns

  • Claims 95% accuracy on funding predictions but less than 50% on startup failures

  • Successfully predicted Anthropic's $2B raise and Coda's acquisition by Grammarly

  • Laid off one-third of staff to reinvest in data science and engineering

Database companies are looking to reinvent themselves in the AI era. Last week, I highlighted how Reddit is monetizing chat forums for trading sentiment. I’m not sure historical data companies are dead, but AI spotting trends before humans seems likely.

FUNDRAISING

Goldman Leads $55M in 73 Strings for AI Valuations

73 Strings, an AI-powered financial intelligence platform, raised $55 million in a Series B round led by Growth Equity at Goldman Sachs Alternatives, with participation from Blackstone Innovations Investments, Golub Capital, Hamilton Lane, and Broadhaven Ventures. (Press Release)

As private markets expand, investors are backing technology to streamline data extraction, valuation, and decision-making.

DiligentIQ Raises $12M to Automate PE Due Diligence

DiligentIQ, an AI-powered platform for private equity due diligence, raised $12 million in a Series A round led by FINTOP Capital for expansion. The platform automates the analysis of deal documents in virtual data rooms (VDRs), reducing manual workloads and structuring unstructured private market data. (Press Release)

The line between private and public markets is blurring because AI lets investors standardize information buried in PDFs. Public companies face regulatory requirements for how they report their financials. In the private world, there’s often less information, and it’s not necessarily presented in an easily digestible way.

BANKS

BofA CEO Asked AI for Data and Got Picture of a House

For all of AI’s advantages, it still has a long way to go, according to Bank of America Corp. Chief Executive Officer Brian Moynihan. (Bloomberg)

“I was trying to get a mortgage-amortization schedule. I pounded in it three times, got stuff you couldn’t read. Finally I just got a picture of a house,” Moynihan said last week.

It’s weird to me that folks expect AI to understand what you mean in every context. Regular readers know that the technology underpinning ChatGPT is not very good at math or tabular data. This is changing. Right now, large language models are a blunt instrument. We’re still in the phase of integrating other pieces of technology that we know already work well. So, pretty soon, Moynihan should get what he’s looking for.
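
For the curious, a mortgage-amortization schedule is the kind of deterministic arithmetic that ordinary code handles well, which is why pairing a language model with a calculation tool is a natural fix. Below is a minimal sketch in Python using the standard fixed-rate annuity formula. The function name and the loan figures ($300,000 at 6% over 30 years) are illustrative assumptions of mine, not anything from Moynihan's remarks or from Bank of America.

```python
def amortization_schedule(principal, annual_rate, years, payments_per_year=12):
    """Return (payment, rows) for a fixed-rate, fully amortizing loan.

    Each row is (period, interest_paid, principal_paid, remaining_balance).
    All inputs are illustrative; real mortgages add fees, escrow, and rounding rules.
    """
    n = years * payments_per_year          # total number of payments
    r = annual_rate / payments_per_year    # periodic interest rate

    if r == 0:
        payment = principal / n            # no interest: split principal evenly
    else:
        # Standard annuity formula: P * r / (1 - (1 + r)^-n)
        payment = principal * r / (1 - (1 + r) ** -n)

    rows = []
    balance = principal
    for period in range(1, n + 1):
        interest = balance * r                 # interest accrued this period
        principal_paid = payment - interest    # remainder reduces the balance
        balance -= principal_paid
        rows.append((period, interest, principal_paid, max(balance, 0.0)))
    return payment, rows


if __name__ == "__main__":
    # Hypothetical loan used only for demonstration.
    payment, rows = amortization_schedule(300_000, 0.06, 30)
    print(f"Monthly payment: {payment:,.2f}")
    for period, interest, principal_paid, balance in rows[:3]:
        print(f"{period:>3}  interest={interest:,.2f}  "
              f"principal={principal_paid:,.2f}  balance={balance:,.2f}")
```

Early payments go mostly to interest and later ones mostly to principal, which is the pattern a schedule like this makes visible.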

MONEY MANAGEMENT

AI Lowers Barriers in Wealth Management: Microsoft

AI is set to transform wealth management much like the internet did decades ago, lowering costs and allowing new entrants to compete with established banks, according to Martin Moeller, Microsoft’s head of AI & GenAI for financial services in EMEA.

  • “Banks that have so far been barely active in wealth management could enter the business with the help of AI without having to invest much in customer advisors,” Moeller said. (Reuters)

REGULATION

AI’s Speed Presents Risks, Fed’s Michael Barr Says

Federal Reserve Vice Chair for Supervision Michael Barr highlighted AI as a transformative force in financial markets and the broader economy, warning that while it could drive productivity and innovation, it also poses systemic risks such as financial instability and concentrated economic power. (Bloomberg)

“GenAI will likely become a ‘general purpose technology,’ with widespread adoption, continuous improvement, and productivity enhancements to a wide range of sectors across the economy.”

Vice Chair for Supervision Michael Barr, in a Feb. 18 speech

The Fed is now treating AI as more than a niche innovation: it sees a potential driver of economic transformation and a source of systemic risk that could reshape a large part of the financial system.

BENCHMARKS

The Challenges of Pick-Your-Own Benchmarks

Elon Musk’s AI company, xAI, released Grok 3, its latest AI model, claiming enhanced reasoning capabilities that outperform other state-of-the-art models. But with no standardized AI benchmarks, companies like xAI can cherry-pick results or train models to excel in specific tests, making direct comparisons difficult.

Ethan Mollick, a professor at Wharton, posts frequently on social media with very sensible takes on AI. I’ve said this before: AI needs standardized benchmarks for wider adoption.

WHAT ELSE I’M READING

The EU AI Act is Coming to America

U.S. states are adopting AI regulations inspired by the EU's AI Act, potentially creating a patchwork of AI rules. (Hyperdimensional)
