
How Causal AI Can Help Reduce Hallucinations

Five Minutes with Jayeeta Putatunda, Lead Data Scientist and Director at Fitch Group Inc

FIVE MINUTES WITH…

Jayeeta Putatunda, Lead Data Scientist and Director at Fitch Group Inc, has been at the forefront of NLP since 2015. 

Spurred by a chance NLP course during her master’s in quantitative methods, she pivoted from Deloitte consulting to data science and has since built significant expertise in combining traditional statistical approaches with modern AI techniques.

In this Five Minutes Q&A, Jayeeta shares her insights on AI evaluation frameworks, the emerging potential of causal AI, and why bridging research and industry remains a key challenge in financial services.

Key Takeaways:

  • Causal AI could help address hallucination problems by grounding outputs in validated relationships

  • Financial services need specialized evaluation frameworks beyond generic AI metrics

  • Regulators require explainable AI models with clear evidence chains

  • Integration of traditional statistical models with new AI techniques is crucial for adoption

This interview has been edited for clarity and length. 

BACKGROUND 

Matt: How’d you get into AI? 

Jayeeta: It’s a funny story. I was with Deloitte doing consulting for a few years, and then I figured it was not my jam. I wanted to be more technical, so I did my master’s in quantitative methods and modeling, which focused a lot on operations research and statistics. My bachelor’s was in econometrics and statistics, so it aligned with my background.

In my last semester, I still had half a credit left, and there was a half-credit course on NLP. I thought, “What is this new thing?” I loved it right away—working with unstructured text data was fascinating. Then I found an internship at Omnicom Media Group. They were just starting to build their data science team and had massive volumes of articles and unstructured text data. The task was to do basic text classification, metadata tagging, and named entity recognition mapping. That gave me practical exposure to NLP. After that, I kept doing more and more projects in the space. I remember trying to train a Word2Vec model on my poor CPU laptop and nearly breaking it. At that time, the tech wasn’t there, and we had to do a lot of painful manual work.

Matt: How did the transition from those resource-heavy early NLP models to the current era of GPT feel?

Jayeeta: Back in 2015, it was hard to get anything done because of hardware limitations. If you tried to train a small Word2Vec model on a baseline GPU, it could still take hours, and new words that weren’t in the vocabulary caused big problems. Today, the hardware has improved dramatically. GPUs are much more powerful and cloud-based solutions are everywhere, so we can spin up a reasonable environment in minutes.
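
To make the out-of-vocabulary problem concrete, here is a minimal sketch of that era’s workflow using gensim’s Word2Vec on a toy corpus; the corpus and query words are invented for illustration:

```python
# Minimal Word2Vec workflow with gensim; a toy corpus stands in
# for the article data described above.
from gensim.models import Word2Vec

corpus = [
    ["rates", "rose", "after", "the", "announcement"],
    ["earnings", "beat", "analyst", "expectations"],
]

# Small model; on 2015-era hardware even modest settings took hours.
model = Word2Vec(corpus, vector_size=50, window=3, min_count=1, workers=2)

print(model.wv["earnings"][:5])  # in-vocabulary word: returns a dense vector

# Words absent from the training vocabulary raise a KeyError,
# the "big problems" with new words mentioned above.
try:
    model.wv["cryptocurrency"]
except KeyError:
    print("OOV word: no vector available")
```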

We also have better algorithms, bigger datasets, and more robust frameworks. All these changes make it possible to do more complex NLP tasks. Honestly, it’s a relief—we went from feeling limited by technology to being limited mostly by our creativity. Now you can do advanced tasks like building large language models and hooking them up to external data sources. It’s a completely different landscape.

Matt: When we met at that AI conference (ICAIF) in New York, we talked about bridging research and industry. Do you find there’s still a gap?

Jayeeta: Yes, absolutely. Sometimes I go to industry conferences, and everything is so high-level that there’s no detail at all. Other times, I go to research conferences and it’s so specific and niche I can’t see how to bring it back into a real business setting. I feel there’s a disconnect. We need conferences or forums where we can share more practical, technical details without getting lost in either marketing talk or super-narrow research.

I spoke at an AI conference that did a decent job by having a finance stage, an investor stage, a startup stage, and so on. It helped people zoom in on their areas of interest. But we still need more synergy between the theoretical research folks and people building out actual business solutions.

Matt: How did you go from basic text classification to the more advanced NLP side, especially with large language models?

Jayeeta: It was a gradual shift. Early on, I was just happy building Word2Vec or Doc2Vec models, trying to classify articles or extract location tags. It wasn’t glamorous. Then as technology advanced, I started seeing bigger possibilities with deep learning architectures. Fast forward to GPT and now generative AI. It feels like we jumped from a broken Word2Vec pipeline that took hours to something that can produce coherent paragraphs in seconds.

Of course, there’s more to it than GPT. One big theme for me is retrieval-augmented generation (RAG). You see a lot of hype around chatbots, but there’s deeper potential: knowledge-base optimization, metadata generation, document retrieval pipelines, agentic workflows, and so forth. Causal AI is another big one. If you ground your generative model in causal relationships you’ve already established statistically, you get fewer hallucinations.
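
As an illustration of the retrieval side of RAG, here is a hedged sketch that uses TF-IDF similarity in place of a learned embedding model; the documents, the query, and the generate() call are hypothetical stand-ins, not any particular production pipeline:

```python
# Sketch of the retrieval step in a RAG pipeline: embed documents,
# rank by similarity to the query, and stuff the top hits into a prompt.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

docs = [
    "Apple reported a 5% reduction in carbon emissions in 2023.",
    "Company X issued new green bonds last quarter.",
    "Fed minutes signal a pause in rate hikes.",
]

vectorizer = TfidfVectorizer()
doc_matrix = vectorizer.fit_transform(docs)

def retrieve(query: str, k: int = 2) -> list[str]:
    """Return the k documents most similar to the query."""
    q = vectorizer.transform([query])
    scores = cosine_similarity(q, doc_matrix)[0]
    top = scores.argsort()[::-1][:k]
    return [docs[i] for i in top]

context = retrieve("How did Apple's green initiatives affect emissions?")
prompt = "Answer using only this context:\n" + "\n".join(context)
# response = generate(prompt)  # placeholder for the actual LLM call
```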

TECHNICAL INSIGHTS

Matt: Could you explain causal AI in simpler terms and why it’s important in finance?

Jayeeta: Sure. Think of finance or any domain where you care about cause and effect rather than just correlation. Econometrics has always done that, trying to see if a 5% change in parameter A leads to some shift in parameters B or C. You can run simulations to see if that link holds, or if there’s just a loose correlation.

Now combine that with generative AI. If you feed the generative model validated causal maps or relationships, then when it produces a report, it’s less likely to invent false connections. Say you’re summarizing how Apple’s green initiatives led to a reduction in carbon emissions: you can validate the reported 5% or 10% figure through your causal model. The generative model can then generate text that reflects those relationships, rather than guessing.
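
A minimal sketch of that grounding idea, with synthetic data and a simple OLS fit standing in for the causal model; the variables and the claimed figure are hypothetical:

```python
# Estimate the effect of variable A on variable B from data, then
# check a model-generated claim against the validated estimate.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
a = rng.normal(size=500)             # e.g., % change in green initiatives
b = 2.0 * a + rng.normal(size=500)   # true effect of A on B is 2.0

fit = sm.OLS(b, sm.add_constant(a)).fit()
lo, hi = fit.conf_int()[1]           # 95% interval for the effect of A on B

claimed = 4.0  # figure asserted in a generated summary (hypothetical)
if not lo <= claimed <= hi:
    print(f"Claim {claimed} falls outside validated range [{lo:.2f}, {hi:.2f}]")
```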

Matt: That addresses one of the big concerns: hallucinations. Is that part of why we need more than just raw LLMs?

Jayeeta: Exactly. Especially in regulated industries like finance or healthcare, 99% confidence isn’t good enough. People want to know how you arrived at your conclusion. If a generative model says, “Company X decreased its carbon footprint by 20%,” but in reality it was only 5%, you have a serious problem.

So the question becomes, “How do I prove this output?” You can add a retrieval mechanism that references a trusted knowledge base. But you can also anchor it in a causal model that runs these simulations and says, “Yes, a 5% change here caused 10% change there.” If your final output strays from that validated relationship, the pipeline can flag it or correct it.
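
Here is a sketch of what that flag-or-correct step could look like, assuming the causal model has already produced a validated figure; the value, the regex check, and the tolerance are all illustrative:

```python
# Pull numeric claims out of generated text and compare them with the
# value the causal model validated, flagging anything that strays.
import re

VALIDATED_REDUCTION_PCT = 5.0  # hypothetical figure from the causal model

def check_claim(text: str, tolerance: float = 1.0) -> list[str]:
    """Flag percentage figures that stray from the validated value."""
    flags = []
    for match in re.finditer(r"(\d+(?:\.\d+)?)\s*%", text):
        value = float(match.group(1))
        if abs(value - VALIDATED_REDUCTION_PCT) > tolerance:
            flags.append(f"Claimed {value}% but causal model validated "
                         f"{VALIDATED_REDUCTION_PCT}%")
    return flags

print(check_claim("Company X decreased its carbon footprint by 20%."))
# -> ['Claimed 20.0% but causal model validated 5.0%']
```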

INDUSTRY CHALLENGES

Matt: How do you see regulators responding to these new AI tools?

Jayeeta: They’re not going to blindly trust a black box. If you can’t explain why your model gave a certain rating or prediction, you won’t get approval to deploy it. The compliance frameworks require traceability. That’s why layering more transparent models—like XGBoost or older econometric models—under the hood can help. They’re proven and easier to explain. You can incorporate the generative element for reporting, but the generative part has to be grounded in something regulators can see.

In finance, you have to prove each step. If you’re using a large language model to generate a final report, you want to show how the data moved from a baseline model, through a causal map, to the final narrative. That chain of evidence is crucial, or people won’t adopt it.
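
As a sketch of that evidence chain, the snippet below fits a gradient-boosted model with xgboost and surfaces its feature importances as the drivers a downstream narrative step would have to cite; the data and feature names are invented:

```python
# A transparent model produces the prediction, and its feature
# importances become part of the evidence chain handed to the
# report-generating step.
import numpy as np
import xgboost as xgb

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 3))
y = 0.7 * X[:, 0] - 0.2 * X[:, 2] + rng.normal(scale=0.1, size=200)
features = ["leverage", "liquidity", "earnings_volatility"]

model = xgb.XGBRegressor(n_estimators=100, max_depth=3)
model.fit(X, y)

# Evidence chain: which inputs drove the prediction, in plain terms.
evidence = sorted(zip(features, model.feature_importances_),
                  key=lambda kv: -kv[1])
for name, importance in evidence:
    print(f"{name}: {importance:.2f}")
# A generative step could then be constrained to cite these drivers.
```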

Matt: Given how fast everything’s changing, do you think it’s hard to start a company in this space?

Jayeeta: Definitely. People pivot so quickly. Six months ago, everyone was into RAG. Now they’re exploring agentic RAG. Next year, there might be a new architecture that changes everything. If you start a product today for summarizing earnings calls, you may find that half the market already does it. Or if you focus on a certain technique, you risk it becoming obsolete because the big tech players release something more efficient.

But there’s also opportunity. If you focus on solving a real problem—like robust evaluation frameworks or specialized causal solutions—you can differentiate yourself. If you just do what everyone else is doing, you’re going to be left behind.

Matt: Let’s talk about evaluation. You mentioned it’s a blind spot. Why do you think that is?

Jayeeta: Right now, there aren’t many holistic frameworks for end-to-end evaluation. People talk about faithfulness or toxicity, but those are very generic. Finance needs specialized checks for factual accuracy, updated exchange rates, or proper reference data from legal documents. Healthcare needs a different set of checks.

There’s a lot of customization, so you can’t just adopt the same evaluation approach used for a casual chatbot. That slows adoption because people say, “I can’t trust your output if there’s no recognized standard.” If we had a well-defined framework that organizations could easily tailor, it would remove a big barrier.
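
One way such a tailorable framework might be structured is sketched below: a generic harness with pluggable, domain-specific checks. The two finance checks here are placeholders for illustration, not any recognized standard:

```python
# Generic evaluation harness; each domain registers its own checks.
from dataclasses import dataclass
from typing import Callable

@dataclass
class CheckResult:
    name: str
    passed: bool

Check = Callable[[str], CheckResult]

def no_stale_rates(output: str) -> CheckResult:
    # Stand-in for a real reference-data lookup.
    return CheckResult("no_stale_rates", "2019 exchange rate" not in output)

def cites_source(output: str) -> CheckResult:
    return CheckResult("cites_source", "[source:" in output)

FINANCE_CHECKS: list[Check] = [no_stale_rates, cites_source]

def evaluate(output: str, checks: list[Check]) -> list[CheckResult]:
    return [check(output) for check in checks]

for result in evaluate("EUR/USD at the 2019 exchange rate...", FINANCE_CHECKS):
    print(result)
```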

Matt: So, no universal baseline, right?

Jayeeta: Exactly. I think an industry consortium could set up a handful of widely accepted benchmarks. Everyone would still customize them, but at least we’d share a reference point. At the moment, it’s like the Wild West. Each vendor claims, “Our solution is 95% accurate,” but how? On what data? According to which criteria? It’s impossible to compare apples to apples unless you know the underlying metrics.

Matt: Any final thoughts on bridging the gap between research and industry?

Jayeeta: We need more back-and-forth between researchers building new architectures and industry practitioners hitting real bottlenecks, like compliance or data governance. Researchers often tackle niche problems, which is great. But we also need them to address the everyday frustrations of building and deploying solutions in highly regulated domains.

We can accelerate adoption by making the solutions more trustworthy and transparent. Once we have robust evaluation, causal grounding, and easy ways to integrate older predictive models with new generative techniques, you’ll see more companies jump on board. I’m optimistic that by 2025 we’ll have frameworks to handle a lot of the current unknowns, like hallucination or incomplete references.

IN CASE YOU MISSED IT  

Recent Five Minutes with Interviews

  • Ex-JPM’s Tucker Balch on scaling investment analysis with AI.

  • USC's Matthew Shaffer on using ChatGPT to estimate “core earnings.”

  • Moody’s Sergio Gago on scaling AI at the enterprise level.

  • Ravenpack | Bigdata.com’s Aakarsh Ramchandani on AI and NLP.
