
How AI Analyzes Executive Tone for Investment Insights

5 Minutes with PhD candidate Alex Kim

INTERVIEW

Jerry agreed to wear the puffy shirt on the Today show because he couldn't understand Kramer's girlfriend's mumbling. And yes, this meme might be a stretch 🙂

Wall Street has been parsing text for years, looking to spot changes in language. Bloomberg, for example, publishes a side-by-side comparison of Fed announcements to show whether the central bank’s language is becoming more hawkish or dovish.

AI can now listen in on how executives deliver company updates.

Alex Kim, a Ph.D. candidate in accounting at the University of Chicago, has pioneered ways to quantify subtle vocal cues using AI. Communications research has long shown that up to 70% of the information in oral communication comes through voice rather than text.

Kim's work on AI and LLMs has caught the attention of dozens of hedge funds worldwide, including some of Wall Street's biggest names. 

The paper, "Vocal Delivery Quality in Earnings Conference Call" has been conditionally accepted for publication in the Journal of Accounting and Economics. Companies like Markets EQ, and Speech Craft Analytics are marketing this new way of understanding tone.

Just a few years ago, real-time transcription and vocal analysis tools weren't readily available. But now, with advances in large language models and speech recognition, Kim says the hurdles to accessing this type of data have dropped dramatically.

"It's admittedly more difficult than analyzing text at this moment, but this hurdle is way lower compared to 3 years before, and it's going to get lower and lower over time," Kim says. "So within a couple of years, I'm pretty sure that everybody will have access to these vocal analysis features."

As these new tools become more accessible, Kim expects to see a shift, with investors increasingly using vocal delivery scores to make trading decisions in the days following earnings calls. And he anticipates that companies might also start coaching executives to improve their delivery, similar to how textual disclosure practices have evolved.

This interview has been edited for clarity and length.

MR: So you found that when executives deliver uncertain or negative news, their vocal delivery quality diminishes. Is this typically an unconscious reaction due to nervousness?

AK: Yes, that’s actually one explanation we offer in the paper. We studied the psychology literature, which says that when you’re delivering uncertain or negative news, your vocal delivery quality drops even if you’re not trying to change it. That could be one of the causes of the results. As you said, there might also be some managers who try to obfuscate negative news by mumbling or not enunciating clearly. If you look at the examples we include in the paper, there are audio clips of managers delivering bad news; we randomly picked five from the bottom decile, and they seem to mumble a lot more than when they’re delivering positive news.

MR: Interesting. Can you also talk about how you guys leveraged large language models (LLMs) and the precedent here? This is new to me; can you talk about the academic history on this? It seems like new ground to try to get a sense of how executives are speaking; before it was more about text, and now you’re focusing on delivery.

AK: The part you talked about, strategic motivations, is one aspect of the paper. But the bigger reason I think this paper is important is that, in any earnings conference call, information is delivered in audio, and official transcripts aren’t immediately available during the call. These days you can use language models to transcribe in real time, but a few years ago, people were just listening to the calls to make investment decisions.

Corporations disclose information orally, so a lot of information is in their vocal cues, not just text. Psychology and communication literature show that in vocal communication, around 60-70% of the information comes through voice, not text. Even if you analyze text to the fullest, you’re only capturing about 40% of the information available in oral communications. Earnings calls are significant because people make real-time investment decisions based on them, and stock prices can shift during the calls.

My idea started about four years ago when I was a master’s student, realizing we might be losing a huge chunk of information available to investors. Large language models and transcribing tools weren’t available back then, and I had to build everything from scratch. Now that we have tools like Whisper and GPT models, I think this kind of study will become more common since it’s easier to build timestamped transcripts and measure vocal features even at the sentence level. Back then, people knew it was important but didn’t think it was feasible.
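For readers who want to see what this looks like in practice, here is a minimal sketch of a timestamped-transcript pipeline in the spirit Kim describes: Whisper produces segments with start and end times, and a couple of rough acoustic features are computed for each one. The feature choices (pitch variability and loudness via librosa) and the file name are illustrative assumptions, not the paper's actual vocal delivery measure.

```python
# A minimal sketch, assuming open-source Whisper and librosa are installed
# (pip install openai-whisper librosa). The features below are illustrative
# stand-ins, not the paper's vocal delivery score; the audio file is hypothetical.
import whisper
import librosa
import numpy as np

AUDIO_PATH = "earnings_call.wav"  # hypothetical recording of a call

# 1. Timestamped transcript: Whisper returns segments with start/end times.
model = whisper.load_model("base")
result = model.transcribe(AUDIO_PATH)

# 2. Load the raw audio once so acoustic features can be cut per segment.
y, sr = librosa.load(AUDIO_PATH, sr=16000)

for seg in result["segments"]:
    start, end = seg["start"], seg["end"]
    clip = y[int(start * sr):int(end * sr)]
    if clip.size == 0:
        continue
    # Rough per-segment vocal cues: pitch variability and loudness (RMS energy).
    f0 = librosa.yin(clip, fmin=65, fmax=300, sr=sr)
    pitch_sd = float(np.nanstd(f0))            # flat, monotone delivery -> low value
    loudness = float(np.sqrt(np.mean(clip ** 2)))
    print(f"[{start:7.2f}s-{end:7.2f}s] pitch_sd={pitch_sd:6.1f} rms={loudness:.4f} "
          f"{seg['text'].strip()}")
```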

Listen here for an example of a “good” speaker:

And here for a “bad” one:

MR: Gotcha. Are you considering commercializing this at all?

AK: That’s a bit tricky. We thought about it maybe a year or two ago, but in that time, many people have jumped into this area. There are quite a few research papers now that follow our methodology—sometimes even improving on it—and use vocal features to analyze data. There are several firms now selling similar products. While they don’t offer exactly the same vocal delivery scores, we collaborated with Markets EQ, which provided some vocal features reviewers wanted us to examine. Though not identical to our vocal delivery scores, these features are components of vocal delivery and they are amazingly accurate. At this point, we’re not considering making this commercially available.

MR: You’ve gotten great reception. 

AK: I was very lucky, because in the beginning many people didn't like the idea; they didn't understand how we were measuring this. Now it has become way easier to explain why this is important and the technology I'm using to measure it. But four years ago, when I said "timestamping," they were like, "Oh, what are you talking about, man," and they simply weren't favorable to the idea.

MR: For investors who aren’t experts, what can they do with this information to make better decisions?

AK: If you’re looking to use our signals in real time to make investment decisions during conference calls within minutes, that’s not realistic for most retail investors. Even we can’t generate delivery scores during the calls. Tech giants or providers might have the means to provide real-time scores, but it’s not feasible for most. 

However, the paper shows that there are longer-term effects. For instance, there are one- to two-day market reactions in response to vocal delivery scores. The market reacts positively when a call delivers positive news, and this reaction is amplified if the manager speaks clearly. There’s an amplification effect over the one or two days following the calls, and we also show that journalists’ and analysts’ reactions are impacted by vocal delivery scores. So, it’s not just about real-time market reactions. If a data provider offered vocal delivery scores during conference calls, retail investors could potentially use those scores to make trading decisions in the following days.

It's admittedly more difficult than analyzing text at this moment, but this hurdle is way lower compared to 3 years before, and it's going to get lower and lower over time. So within a couple of years, I'm pretty sure that everybody will have access to these vocal analysis features.
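As a rough illustration of the amplification result Kim describes, here is a minimal sketch of that kind of test: regress post-call returns on the news surprise, the delivery score, and their interaction. The CSV file and column names are hypothetical, and the paper's actual specification includes controls and fixed effects not shown here.

```python
# A minimal sketch of an "amplification" test, using an assumed CSV
# (earnings_calls.csv) with hypothetical columns: car_2d (two-day cumulative
# abnormal return), surprise (earnings news), delivery (vocal delivery score),
# and firm_id. Not the paper's actual specification.
import pandas as pd
import statsmodels.formula.api as smf

calls = pd.read_csv("earnings_calls.csv")

# surprise * delivery expands to both main effects plus their interaction;
# a positive surprise:delivery coefficient is the amplification pattern:
# the reaction to news is stronger when the manager speaks clearly.
model = smf.ols("car_2d ~ surprise * delivery", data=calls).fit(
    cov_type="cluster", cov_kwds={"groups": calls["firm_id"]}
)
print(model.summary())
```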

MR: That's interesting. Then, I guess on the flip side, are you going to have companies trying to coach executives to improve their vocal scores?

AK: Yes, that's actually a long-term economic equilibrium we can expect from these kinds of studies. Ever since people started doing textual analysis on conference calls and the MD&A sections of 10-Ks, disclosures have changed to fit the textual analysis tools. Basically, firms use more of the words those tools count as positive and fewer of the words they count as negative. I'm not saying they're changing the content of the disclosures, but they are changing how they write and how they deliver information to get better scores from these tools. It's going to be exactly the same for conference calls: as people continue to use these signals to make trading decisions, it's pretty natural that managers will react to those tools and change the way they deliver things.
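To make the word-count mechanism concrete, here is a toy dictionary-based tone scorer in the spirit of the textual analysis tools Kim mentions. The word lists are tiny illustrative stand-ins, not the actual finance dictionaries used in research or in practice.

```python
# A toy dictionary-based tone score in the spirit of the word-count tools
# described above. The word lists are illustrative stand-ins only.
POSITIVE = {"strong", "growth", "improved", "record", "exceeded"}
NEGATIVE = {"decline", "weak", "loss", "impairment", "uncertainty"}

def tone_score(text: str) -> float:
    """(positive hits - negative hits) / total words: the kind of score
    disclosures have drifted toward optimizing."""
    words = [w.strip(".,;:!?").lower() for w in text.split()]
    pos = sum(w in POSITIVE for w in words)
    neg = sum(w in NEGATIVE for w in words)
    return (pos - neg) / max(len(words), 1)

print(tone_score("Record growth this quarter, despite some uncertainty."))
# -> (2 - 1) / 7 ≈ 0.14
```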

MR: Did you hear from investors or hedge funds after your paper?

AK: Of course, many of them, about my work in general (not specifically the vocal delivery paper). They included really big funds, names you would recognize, that were seriously interested in learning the technology. They weren't immediately trying to adopt our technology, but they wanted to get familiar with it, because there are now many papers showing its effectiveness. They just wanted to educate themselves.

And then there are smaller firms actively developing products to sell to banks and other institutions; they are seriously interested in learning the technique itself and how to implement it.

Also, there's a paper I co-authored with Chicago colleagues, "Financial Statement Analysis with Large Language Models." It's actually one of the most downloaded papers in the entire history of SSRN, somewhere in the top 20, with around 60,000 downloads now. After that, we've probably talked to a few dozen hedge funds worldwide, and they are implementing this technology in their products. Not that they're just taking our technology and applying it directly; they're making some modifications and testing it on their own. But many of them are already implementing the technology.

MR: What was it about that paper? 

AK: We took financial statement data, the balance sheet and income statement, without any contextual clues, and designed a chain-of-thought prompt for a large language model that emulates the thought process of a human analyst. Then we asked the model to perform financial statement analysis on the standardized, anonymized statements. The model gives you an analysis of the financial statements, and at the end it predicts whether earnings will increase or decrease the next year.

We found that it is already doing better than human analysts. It is on par with the state-of-the-art machine learning models that are specifically trained for predicting earnings. It's surprising that general-purpose language models that are not trained on earnings prediction tasks are doing on par with the models that are currently being used in practice. It has almost 60% accuracy in predicting the direction of earnings changes for the next year.

We're not saying that analysts are going to get replaced by these machines. We show in our analysis that it's just corroborating analysts' insights. 
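Here is a hedged sketch of the setup Kim describes: standardized, anonymized statements go to a general-purpose model with a step-by-step analyst-style prompt, and a directional earnings call comes back. The prompt wording, model name, and file names are illustrative assumptions, not the paper's exact design.

```python
# A hedged sketch: anonymized, standardized financial statements plus a
# chain-of-thought style prompt, ending with a directional earnings prediction.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

PROMPT = """You are a financial analyst. Below are a standardized, anonymized
balance sheet and income statement (company name and fiscal years removed).

Step 1: Identify notable trends in the line items.
Step 2: Compute key ratios (margins, turnover, leverage) and how they changed.
Step 3: Interpret what these changes imply about future operating performance.
Step 4: Conclude with one word, INCREASE or DECREASE, for next year's earnings,
followed by a brief rationale.

{statements}
"""

def predict_earnings_direction(statements_text: str) -> str:
    """Return the model's analysis and its directional earnings prediction."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder model choice
        messages=[{"role": "user",
                   "content": PROMPT.format(statements=statements_text)}],
        temperature=0,
    )
    return response.choices[0].message.content

# Usage (hypothetical file of anonymized statements):
# print(predict_earnings_direction(open("anonymized_statements.txt").read()))
```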


Thanks for reading!

Drop me a line if you have story ideas, research, or upcoming conferences to share. [email protected]
