Identifying monetary policy shocks in newspapers using GPT

“We will follow a data-dependent and meeting-by-meeting approach to determining the appropriate monetary policy stance.” (Christine Lagarde, 19 March 2026)

Usually, the policy announcements by the ECB begin with the above disclaimer – which some might describe as meaningless. What else, if not data, is the ECB supposed to base its decisions on? So, what is President Lagarde trying to tell us with this statement? In essence, she seems to say: “There is no such thing as a monetary policy shock!”

So, what then is a monetary policy shock?

In textbook models, monetary policy is typically assumed to follow a Taylor rule, in which the central bank sets its interest rate according to a reaction function. Deviations from this reaction function can then be interpreted as monetary policy shocks. More sophisticated variations of this approach use structural vector autoregressive (SVAR) models, which embed an implied central bank reaction function in a richer structural model that restricts, for instance, the time dependencies (Blanchard and Quah 1989, Sims 1980) or the direction of the causal relationship (Uhlig 2005). However, while controlling for relevant indicators of economic activity might explain the bulk of monetary policy actions, the resulting residuals can never be interpreted as pure monetary policy shocks, since no structural model can claim to fully capture all endogenous components (Nakamura and Steinsson 2018). Such omitted variable concerns can be circumvented by relying on external identification.
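To make the idea of a "deviation from the reaction function" concrete, the following minimal sketch computes residuals from a Taylor-type rule. The function names and the coefficient values (the textbook Taylor 1993 parameters) are assumptions chosen for illustration and are not taken from Betz et al. (2026); the input numbers are made up.

```python
def taylor_rule_rate(inflation, output_gap, r_star=2.0, pi_star=2.0,
                     phi_pi=0.5, phi_y=0.5):
    """Policy rate (in %) implied by a Taylor-type rule.

    The default coefficients are the textbook Taylor (1993) values and
    are assumptions for illustration only.
    """
    return r_star + inflation + phi_pi * (inflation - pi_star) + phi_y * output_gap


def rule_deviations(policy_rates, inflation, output_gaps, **rule_params):
    """Observed rate minus rule-implied rate, period by period."""
    return [
        rate - taylor_rule_rate(pi, gap, **rule_params)
        for rate, pi, gap in zip(policy_rates, inflation, output_gaps)
    ]


# Made-up numbers: with 2% inflation and a closed output gap the rule
# implies 4%, so an observed rate of 4.25% leaves a residual of 0.25.
print(rule_deviations([4.25], [2.0], [0.0]))  # [0.25]
```

As the text notes, such a residual mixes a true policy shock with everything the rule omits, which is why it cannot be read as a pure shock.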

The most popular of these approaches – the so-called high-frequency identification (HFI) (Gürkaynak 2005) – uses highly liquid financial assets to gain insights into the surprise component of a monetary policy decision. Altavilla et al. (2019) provide the Euro Area Monetary Policy Event-Study Database, which is widely used for shock identification in research on the euro area. However, while these narrow-window market movements appear to address endogeneity neatly at first glance, concerns remain regarding the ‘information effect’ (Karadi and Jarociński 2018), the response-to-news channel (Bauer and Swanson 2023), and sensitivity to the choice of financial market instrument (Brennan et al. 2024).

The other popular external identification method utilises qualitative data, i.e. mostly text, and is therefore called narrative identification. This approach was fundamentally shaped by the work of Romer and Romer (1989). The underlying assumption of this identification procedure is that monetary policy shocks leave traces in the communications of the central bank or downstream sources, which can be brought to light through a careful analysis of those very sources.

A newspaper-based large language model identification of monetary policy shocks in the euro area

Our procedure (Betz et al. 2026) borrows its assumptions from both major strands of external identification above. In the narrative tradition, we argue that high-quality newspapers will report a surprising policy action by the central bank as such in their next edition. In the spirit of high-frequency identification, we further assume that newspaper coverage on the first publication day after a monetary policy event mainly reflects this very policy decision and not any other sources of information that might distort the assessment of the policy. As shown in Figure 1, we extract articles from 11 major European newspapers following every ECB Governing Council meeting until April 2025, using a simple keyword search in the respective language. After an initial deterministic pre-filtering, the actual narrative assessment is delegated to a large language model (LLM).
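The extraction step described above can be sketched roughly as follows. This is a simplified illustration, not the authors' pipeline: the `Article` record, the keyword lists, and the function names are hypothetical, and the real keyword search and pre-filtering rules are not spelled out in this column.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical keyword lists per language; the actual search terms are
# not given in the column.
KEYWORDS = {
    "en": ["ECB", "European Central Bank"],
    "de": ["EZB", "Europäische Zentralbank"],
    "fr": ["BCE", "Banque centrale européenne"],
}


@dataclass(frozen=True)
class Article:
    outlet: str
    language: str
    published: date
    text: str


def mentions_ecb(article):
    """Naive case-insensitive substring search in the article's language."""
    text = article.text.lower()
    return any(k.lower() in text for k in KEYWORDS.get(article.language, ()))


def first_day_coverage(articles, meeting_day):
    """Keyword hits from each outlet's first publication day after the meeting."""
    after = [a for a in articles if a.published > meeting_day]
    first_day = {}
    for a in after:
        first_day[a.outlet] = min(first_day.get(a.outlet, a.published), a.published)
    return [a for a in after
            if a.published == first_day[a.outlet] and mentions_ecb(a)]
```

Restricting the sample to the first publication day mirrors the narrow event window of high-frequency identification: later articles are more likely to reflect news other than the policy decision itself.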

Figure 1 Overview of the identification procedure

Can narrative analyses be delegated to large language models?

Christina and David Romer themselves predicted, with some reluctance, that they would be “made largely redundant by computers eventually, but perhaps not for a few years to come” (Romer and Romer 2023: 1419). For our application, however, that day appears to have arrived. Given the rapid advancement of machine learning technologies, which have already transformed human–computer interaction, it is essential to critically assess their implications for scientific research. Recent studies have already shown that LLMs are capable of replicating human narrative assessment (Drechsel and Aruoba 2022, Bermejo et al. 2024, Hansen and Kazinnik 2024). We randomly draw a 5% sub-sample of the pre-filtered articles and analyse it manually as a benchmark to test the LLMs’ capabilities.

Our narrative procedure encompasses two sub-steps. First, an article must be labelled as relevant or irrelevant, where relevant means that it contains information about the assessment of the respective monetary policy decision. We find that the tested LLMs (we use several OpenAI models via their Application Programming Interface) perform very well in this exercise, with accuracy rates between 91% and 95%. In the second sub-step, the LLM is asked, in a structured prompt, to classify each article on a five-point scale: very surprisingly expansionary (-2), surprisingly expansionary (-1), expected/neutral (0), surprisingly restrictive (+1) or very surprisingly restrictive (+2). Crucially, the model is instructed to compare the decision only to expectations explicitly stated in the article and to return a one-sentence justification. We then average classifications across all relevant articles for a given meeting to obtain the newspaper-based surprise series. As can be seen in Figure 1, both sub-steps are carried out by the LLMs and subsequently assessed against a human classification. In this more demanding assessment (the classification is no longer binary), all tested LLMs again perform well, with accuracy between 78% and 91%. Our benchmark model (gpt-5.1 with high reasoning) achieves an accuracy of 89.8% and shows large disagreement with the human assessment in only a negligible share of cases (~0.5%). The resulting newspaper-based surprise series can be seen in Figure 2.
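The aggregation and validation steps described above can be illustrated with a short sketch. It assumes the LLM output has already been parsed into tuples of (meeting, relevance flag, score); the function names, the input layout, and the two-point threshold for a "large" disagreement are assumptions for illustration, as the column does not define them.

```python
from collections import defaultdict
from statistics import mean

VALID_SCORES = {-2, -1, 0, 1, 2}  # the five-point scale from the text


def surprise_series(classified):
    """Average score over relevant articles, per meeting.

    `classified` is an iterable of (meeting_id, is_relevant, score) tuples.
    """
    by_meeting = defaultdict(list)
    for meeting, is_relevant, score in classified:
        if not is_relevant:
            continue  # irrelevant articles do not enter the average
        if score not in VALID_SCORES:
            raise ValueError(f"score outside the five-point scale: {score}")
        by_meeting[meeting].append(score)
    return {m: mean(scores) for m, scores in sorted(by_meeting.items())}


def accuracy(model_scores, human_scores):
    """Share of articles where model and human labels coincide."""
    hits = sum(m == h for m, h in zip(model_scores, human_scores))
    return hits / len(human_scores)


def large_disagreement_share(model_scores, human_scores, threshold=2):
    """Share of articles where labels differ by at least `threshold` points.

    The threshold of two scale points is an assumption.
    """
    far = sum(abs(m - h) >= threshold for m, h in zip(model_scores, human_scores))
    return far / len(human_scores)
```

For example, a meeting with two relevant articles scored +2 and +1 and one irrelevant article would receive a surprise value of 1.5.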

Figure 2 Surprise series with annotations of prominent monetary policy events

Application of the newspaper-based shock series

Having constructed a measure of underlying monetary policy surprises, we test our series against established shock series from the literature. Our main benchmark is the monetary policy shock series of Jarociński and Karadi (2020), which is explicitly purged of the information effect. We find that our newspaper-based series aligns quite well with this high-frequency identification shock series (the rolling correlation ranges between 0.6 and 0.8), although the alignment drops substantially during the period of the Global Crisis.
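A rolling correlation of this kind can be computed as in the sketch below. The window length is left as a parameter because it is not stated in this column, and the function names are illustrative rather than taken from the paper.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # correlation is undefined for a constant window
    return cov / sqrt(var_x * var_y)


def rolling_correlation(x, y, window):
    """Correlation over each run of `window` consecutive meetings."""
    if len(x) != len(y):
        raise ValueError("series must have equal length")
    return [
        pearson(x[end - window:end], y[end - window:end])
        for end in range(window, len(x) + 1)
    ]
```

A sustained fall in these window-level values, as during the Global Crisis, flags the meetings where the two series disagree and where a closer narrative look is warranted.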

Figure 3 Shock series from Jarociński and Karadi (2020) and our newspaper-based surprise series, decomposed into agreement and disagreement contributions

We use this period of poor alignment to investigate narratively the events with the highest disagreement (see Figure 2) and find that, in all cases, the narrative essence of the respective events is better captured by our newspaper-based series. Interestingly, we find that indicators of financial stress explain a significant part of the disagreement between the newspaper-based series and its high-frequency identification counterpart. Since we would argue that our measure should not be driven by different regimes of financial stress, we cautiously interpret this as evidence that it is more robust in times of financial turmoil.

We proceed by running local projections to test how the isolated monetary policy surprise affects macroeconomic aggregates, and find a contemporaneous effect on the policy rate and lagged negative effects on the price level and economic activity. The macroeconomic effects align well with established impulse responses in the literature. In a further application, we again delegate a narrative analysis to an LLM and check to what extent the surprise series is itself subject to an information effect, as often discussed in the high-frequency identification literature. The result of this analysis indicates that only a small fraction (about 14.1%) of our newspaper-based surprises is driven by such information effects, while, for instance, Jarociński and Karadi (2020) find that almost half of their surprises are driven by information effects. Consequently, our purified shock series and the estimated macroeconomic responses remain virtually unchanged.
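The logic of a local projection can be shown with a deliberately stripped-down sketch: for each horizon h, the outcome h periods ahead is regressed on the current surprise, and the slope is read as the impulse response at that horizon. The version below uses a single regressor and no controls, lags, or standard errors, all of which an actual application would include; the specification used in the paper is not given in this column, and the function names are illustrative.

```python
def ols_slope(x, y):
    """Slope of a bivariate OLS regression of y on x (with intercept)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov / var_x


def local_projection(shock, outcome, horizons):
    """Response of outcome[t + h] to shock[t], one regression per horizon h."""
    if len(shock) != len(outcome):
        raise ValueError("series must have equal length")
    responses = {}
    for h in horizons:
        x = shock[:len(shock) - h]   # shocks that still have an outcome h periods later
        y = outcome[h:]              # outcomes aligned h periods ahead
        responses[h] = ols_slope(x, y)
    return responses
```

Because each horizon is estimated separately, the method imposes no dynamics across horizons, which is why lagged effects on prices and activity can emerge even when the contemporaneous response is small.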

Figure 4 Local projection with newspaper-based monetary policy surprises

Our contribution is twofold. First, our application demonstrates that complex analyses can indeed be delegated to LLMs. Second, we demonstrate that high-frequency identification approaches can be sensitive to their design, particularly in times of financial turmoil, and thus show that narrative approaches – human- or computer-generated – represent a useful alternative.

References

Altavilla, C, L Brugnolini, R S Gürkaynak, R Motto and G Ragusa (2019), “Measuring euro area monetary policy”, Journal of Monetary Economics 108: 162–179.

Bauer, M D and E T Swanson (2023), “An Alternative Explanation for the ‘Fed Information Effect’”, American Economic Review 113: 664–700.

Bermejo, V, A Gago, R Galvez and N Harari (2024), “Generative AI as a replacement for human coders in large-scale complex text analysis: New evidence from large language models”, VoxEU.org, 24 November.

Betz, F, P Bofinger, J Dix and L Streit (2026), “Identifying Monetary Policy Shocks in Newspapers using GPT”, CEPR Discussion Paper No. 21390.

Blanchard, O J and D Quah (1989), “The Dynamic Effects of Aggregate Demand and Supply Disturbances”, American Economic Review 79: 655–673.

Brennan, C M, M M Jacobson, C Matthes and T B Walker (2024), “Monetary Policy Shocks: Data or Methods?”, Finance and Economics Discussion Series 2024-011, Federal Reserve Board, Washington, DC.

Drechsel, T and B Aruoba (2022), “Identifying monetary policy shocks: A natural language approach”, VoxEU.org, 17 May.

Gürkaynak, R S (2005), “Using Federal Funds Futures Contracts for Monetary Policy Analysis”, Finance and Economics Discussion Series, Federal Reserve Board, Washington, DC.

Hansen, A L and S Kazinnik (2024), “Can ChatGPT Decipher Fedspeak?”.

Jarociński, M and P Karadi (2020), “Deconstructing Monetary Policy Surprises—The Role of Information Shocks”, American Economic Journal: Macroeconomics 12: 1–43.

Karadi, P and M Jarociński (2018), “The transmission of policy and economic news in the announcements of the US Federal Reserve”, VoxEU.org, 3 October.

Nakamura, E and J Steinsson (2018), “High-Frequency Identification of Monetary Non-Neutrality: The Information Effect”, The Quarterly Journal of Economics 133: 1283–1330.

Romer, C D and D H Romer (1989), “Does Monetary Policy Matter? A New Test in the Spirit of Friedman and Schwartz”, NBER Macroeconomics Annual 4: 121–170.

Romer, C D and D H Romer (2023), “Presidential Address: Does Monetary Policy Matter? The Narrative Approach after 35 Years”, American Economic Review 113: 1395–1423.

Sims, C A (1980), “Macroeconomics and Reality”, Econometrica 48: 1–48.

Uhlig, H (2005), “What are the effects of monetary policy on output? Results from an agnostic identification procedure”, Journal of Monetary Economics 52: 381–419.
