Forecasting Macroeconomic Time Series Using Large Language Models

Korean Abstract (Abstract)

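This study applies time-series forecasting methods based on large language models (LLMs) to the problem of forecasting macroeconomic variables, and systematically analyzes their empirical validity and limitations through comparison with traditional time-series, regression, and machine-learning models. Whereas prior work on TimeLLM and TimeGPT has focused mainly on verifying how well zero-shot or few-shot forecasting performance generalizes, this study applies a "restricted model adaptation" (parameter-frozen task-adaptive learning) strategy, in which the parameters of the pretrained LLM backbone are frozen and only the connection layers for time-series forecasting are trained under supervision. With this strategy, the study empirically tests the stability and practicality of LLM-based forecasting for macroeconomic time series.
The target variables are the Industrial Production index (IP), which directly reflects fluctuations in real economic activity, and Personal Consumption Expenditures (PCE), which represents consumption flows. Using long-run monthly data from 1960 onward, forecast horizons of 1, 3, 6, and 12 periods are set within a rolling-origin evaluation framework. The comparison set covers LLM-based models including TimeLLM (GPT2, BERT, LLaMA) and TimeGPT, AR-family and regularized regression models, deep-learning models including AutoGRU and AutoLSTM, and tree-based models built on Random Forest and Boruta Random Forest. Forecast performance is evaluated with absolute error measures (MAE, RMSE) together with MAPE and RMSPE, which reflect the stability of relative errors.
In the empirical results for IP, where structural shocks and long-run trends coexist, TimeLLM–LLaMA tends to show stable and consistently superior performance on relative-error criteria across all forecast horizons. In particular, under the RMSPE criterion, the LLaMA-based model shows error variability that is suppressed as the forecast horizon lengthens, in contrast to the long-horizon instability observed in traditional statistical models and many machine-learning models. GPT2- and BERT-based TimeLLM, by contrast, achieve competitive absolute-error performance at short horizons, but their relative-error variability tends to widen as the horizon extends.
In PCE forecasting, where the series has high short-term volatility and recurring fine-grained noise, traditional and deep-learning models such as AutoGRU, AutoLSTM, and Elastic Net perform stably on absolute-error criteria. As the horizon extends to the medium and long term, however, some models show cumulative instability in relative errors, while TimeLLM–LLaMA maintains comparatively stable forecasting behavior in terms of the balance between absolute and relative errors. This suggests that LLaMA-based representations are relatively well suited to capturing both gradual trend changes and structural variation in consumption expenditure series.
Especially noteworthy is that TimeLLM–LLaMA, using only the top 11 core variables selected by Boruta Random Forest variable importance instead of all 113 multivariate inputs, delivers forecasts that are better than or more stable than those of the multivariate model using every variable. This suggests that LLaMA-based representations are sensitive to noise accumulation and overreaction in high-dimensional inputs, yet can internalize structural dependencies and long-run patterns more efficiently when given a compact, information-dense set of core variables.
In sum, this study shows empirically that LLM-based time-series forecasting is not merely a zero-shot alternative for macroeconomic prediction, but can serve as a practical forecasting tool when restricted model adaptation is combined with a variable-composition strategy. At the same time, it shows clearly that not all LLMs are effective in the same way, and that performance differs with model backbone characteristics, input dimensionality, forecast horizon, and error metric. These results point to research directions for extending the empirical applicability of LLM-based forecasting with greater precision.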

Multilingual Abstract

This study applies large language model (LLM)–based time-series forecasting methods to macroeconomic prediction problems and systematically evaluates their empirical effectiveness and limitations through comparisons with conventional time-series, regression, and machine-learning models. Unlike prior studies on TimeLLM and TimeGPT that primarily focus on the generalization performance of zero-shot or few-shot forecasting, this study adopts a parameter-frozen task-adaptive learning strategy, in which the pretrained LLM backbone parameters are fixed while only the time-series forecasting connection layers are selectively trained under supervision. Through this approach, the study empirically examines the stability and practical applicability of LLM-based forecasting in macroeconomic time-series settings.
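The parameter-frozen idea described above can be illustrated with a deliberately small sketch. It is not the thesis's TimeLLM setup: the "backbone" here is a hypothetical stand-in (a fixed random projection with a tanh nonlinearity), the toy sine series, the window length, and all names (`backbone_w`, `head_w`, `LR`) are assumptions made for illustration. The only point it shows is that gradient updates touch the forecasting head while the backbone weights stay untouched.

```python
import math
import random

random.seed(0)

WINDOW, HIDDEN = 4, 8

# Frozen "backbone": a fixed random projection + tanh.
# Hypothetical stand-in for a pretrained LLM; it is never updated below.
backbone_w = [[random.uniform(-1, 1) for _ in range(WINDOW)] for _ in range(HIDDEN)]

def backbone(window):
    """Map an input window to a feature vector using the frozen weights."""
    return [math.tanh(sum(w * x for w, x in zip(row, window))) for row in backbone_w]

# Trainable forecasting head: a linear layer on top of the frozen features.
head_w = [0.0] * HIDDEN
head_b = 0.0

def predict(window):
    feats = backbone(window)
    return sum(w * f for w, f in zip(head_w, feats)) + head_b

# Toy series (assumption): a sine wave, turned into (window, next value) pairs.
series = [math.sin(0.3 * t) for t in range(60)]
samples = [(series[i:i + WINDOW], series[i + WINDOW]) for i in range(len(series) - WINDOW)]

def mse():
    return sum((predict(x) - y) ** 2 for x, y in samples) / len(samples)

frozen_snapshot = [row[:] for row in backbone_w]
loss_before = mse()

LR = 0.05
for _ in range(200):
    for x, y in samples:
        feats = backbone(x)
        err = sum(w * f for w, f in zip(head_w, feats)) + head_b - y
        # Supervised update of the head only; backbone_w is not modified.
        head_w = [w - LR * err * f for w, f in zip(head_w, feats)]
        head_b -= LR * err

loss_after = mse()
print("backbone unchanged:", backbone_w == frozen_snapshot)
print("head loss decreased:", loss_after < loss_before)
```

In a deep-learning framework the same effect is usually obtained by disabling gradients on the backbone parameters (for example, setting `requires_grad = False` on them in PyTorch) and passing only the head's parameters to the optimizer.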
The empirical analysis focuses on the Industrial Production index (IP), which directly reflects real economic activity, and Personal Consumption Expenditures (PCE), which represent consumption dynamics. Using long-term monthly data since 1960, a rolling-origin evaluation framework is employed with multiple forecast horizons of 1, 3, 6, and 12 periods ahead. The comparison models include LLM-based approaches such as TimeLLM (GPT2, BERT, and LLaMA) and TimeGPT, AR-family and regularized regression models, deep-learning models including AutoGRU and AutoLSTM, as well as tree-based models such as Random Forest and Boruta Random Forest. Forecast performance is evaluated using absolute error measures (MAE and RMSE) alongside MAPE and RMSPE, which capture relative error stability.
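As a rough illustration of the evaluation design described above, the following sketch runs a rolling-origin loop at a given horizon and computes the four error measures. The metric formulas are the commonly used definitions and may differ in detail from those in the thesis. The toy series, the `min_train` parameter, and the `naive_forecast` placeholder model are assumptions introduced here, not part of the study.

```python
import math

def mae(actual, pred):
    return sum(abs(a - p) for a, p in zip(actual, pred)) / len(actual)

def rmse(actual, pred):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, pred)) / len(actual))

def mape(actual, pred):
    # Percentage error; assumes no actual value is zero.
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, pred)) / len(actual)

def rmspe(actual, pred):
    # Root mean squared percentage error; assumes no actual value is zero.
    return 100.0 * math.sqrt(sum(((a - p) / a) ** 2 for a, p in zip(actual, pred)) / len(actual))

def naive_forecast(history, horizon):
    """Hypothetical placeholder model: repeat the last observed value."""
    return history[-1]

def rolling_origin(series, horizon, min_train, forecaster=naive_forecast):
    """Advance the forecast origin one step at a time.

    At each origin the model sees only series[:origin] and predicts the
    value `horizon` steps ahead, which is then compared with the actual.
    """
    actual, pred = [], []
    for origin in range(min_train, len(series) - horizon + 1):
        history = series[:origin]
        pred.append(forecaster(history, horizon))
        actual.append(series[origin + horizon - 1])
    return actual, pred

# Toy data (assumption), standing in for a monthly index such as IP.
series = [100.0, 102.0, 101.0, 105.0, 107.0, 110.0, 108.0, 112.0]

for h in (1, 3):
    actual, pred = rolling_origin(series, h, min_train=4)
    print(f"h={h}: MAE={mae(actual, pred):.3f} RMSE={rmse(actual, pred):.3f} "
          f"MAPE={mape(actual, pred):.3f}% RMSPE={rmspe(actual, pred):.3f}%")
```

Because MAPE and RMSPE scale each error by the actual value, they can rank models differently from MAE and RMSE on a trending series, which is one reason to report both families of measures.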
The empirical results show that, in IP forecasting environments characterized by the coexistence of structural shocks and long-term trends, TimeLLM–LLaMA consistently demonstrates superior and stable performance across all forecast horizons when evaluated using relative error metrics. In particular, under the RMSPE criterion, LLaMA-based models exhibit suppressed error variability as the forecast horizon lengthens, contrasting with the long-horizon instability commonly observed in traditional statistical models and many machine-learning approaches. By contrast, GPT2- and BERT-based TimeLLM models achieve competitive absolute error performance in short-term forecasts but display relatively amplified relative error variability as the forecast horizon expands.
In PCE forecasting, where short-term volatility and recurrent micro-level noise dominate the time-series characteristics, conventional deep-learning and regularized models such as AutoGRU, AutoLSTM, and Elastic Net exhibit stable performance in terms of absolute errors. However, as the forecast horizon extends to medium and long terms, cumulative instability in relative errors emerges for some models. In this context, TimeLLM–LLaMA maintains comparatively stable forecasting performance by balancing absolute and relative error measures, suggesting that LLaMA-based representations are particularly effective at jointly capturing gradual trend changes and structural dynamics in consumption-related time series.
A particularly noteworthy finding is that TimeLLM–LLaMA achieves equal or superior forecasting performance using only 11 core variables selected via Boruta Random Forest feature importance, compared to multivariate models utilizing all 113 macroeconomic indicators. This result indicates that LLaMA-based representations are sensitive to noise accumulation and overreaction in high-dimensional input spaces, while effectively internalizing structural dependencies and long-term patterns when applied to compact, information-dense variable sets. In contrast, applying the same variable reduction strategy to GPT2- and BERT-based TimeLLM models does not yield consistent performance improvements, suggesting that models with relatively limited contextual aggregation capacity may suffer information loss under aggressive input compression.
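The shadow-feature idea behind Boruta-style variable screening can be sketched in a few lines. This is a simplified illustration, not the thesis's procedure: absolute correlation stands in for random-forest importance, a simple majority vote replaces Boruta's statistical test, and the synthetic data and all names (`shadow_select`, `abs_corr`, the feature labels) are assumptions.

```python
import math
import random

random.seed(42)

def abs_corr(xs, ys):
    """Absolute Pearson correlation, used here as a stand-in importance score."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return abs(cov) / math.sqrt(vx * vy)

# Synthetic data (assumption): one informative feature and two pure-noise features.
N = 200
signal = [random.gauss(0, 1) for _ in range(N)]
target = [s + random.gauss(0, 0.3) for s in signal]
features = {
    "informative": signal,
    "noise_a": [random.gauss(0, 1) for _ in range(N)],
    "noise_b": [random.gauss(0, 1) for _ in range(N)],
}

def shadow_select(features, target, n_rounds=20):
    """Keep features that beat the best shuffled ("shadow") copy in most rounds."""
    hits = {name: 0 for name in features}
    for _ in range(n_rounds):
        shadow_scores = []
        for col in features.values():
            shuffled = col[:]
            random.shuffle(shuffled)  # shuffling breaks any link to the target
            shadow_scores.append(abs_corr(shuffled, target))
        threshold = max(shadow_scores)
        for name, col in features.items():
            if abs_corr(col, target) > threshold:
                hits[name] += 1
    return [name for name, h in hits.items() if h > n_rounds / 2]

selected = shadow_select(features, target)
print("selected:", selected)
```

The design choice worth noting is the threshold: a feature is retained only if it scores above the strongest shuffled copy, so the cut-off adapts to how much apparent importance pure noise can reach in the given sample.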
In summary, this study demonstrates that LLM-based time-series forecasting should not be regarded merely as a zero-shot alternative in macroeconomic prediction. Instead, when combined with restricted model adaptation and carefully designed variable selection strategies, LLM-based models—particularly those built on LLaMA backbones—can function as practical and stable forecasting tools. At the same time, the results clarify that LLMs do not uniformly benefit from identical modeling strategies, as forecasting performance varies systematically with backbone characteristics, input dimensionality, forecast horizon, and evaluation metrics. These findings provide a refined perspective on the empirical applicability of LLM-based forecasting and outline promising directions for future research aimed at integrating LLMs more effectively into macroeconomic prediction frameworks.

Table of Contents

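Ⅰ. Introduction 1
Section 1. Research Background 1
Section 2. Research Objectives and Main Research Questions 4
Section 3. Scope and Organization of the Study 6
Ⅱ. Theoretical Background and Prior Research 11
Section 1. Theory of Macroeconomic Indicator Forecasting 11
1. Definition and Interrelationships of Macroeconomic Indicators 11
2. Overview of Existing Econometric Models (ARIMA, SARIMA, VAR) 13
3. The Emergence of AI- and Machine-Learning-Based Economic Forecasting 17
Section 2. Machine-Learning and Deep-Learning Forecasting Models 51
1. Architectures and Limitations of LSTM, GRU, and Transformer 51
2. Nonlinearity and Long-Term Dependency Problems in Time-Series Data 60
Section 3. Combining Large Language Models (LLMs) with Time Series 66
1. Trends in Applying LLMs to Finance and Economics 67
2. Architecture of TimeLLM 71
3. Advantages and Challenges of Combining Time Series and Language Models 73
Section 4. Review of Prior Research and Distinctive Contributions of This Study 78
1. Studies Comparing the Performance of Traditional Models vs. AI and Machine-Learning Models 78
2. Early Applications of LLM-Based Time-Series Forecasting 84
3. Distinctive Contributions of This Study 87
Ⅲ. Research Model and Methodology 93
Section 1. Overview of the Research Procedure 93
Section 2. Data Construction and Evaluation Methods 95
Section 3. Main Models 110
Ⅳ. Empirical Analysis and Results 118
Section 1. Descriptive Statistics and Correlation Analysis of the Data 118
Section 2. Comparative Analysis of Forecast Results 140
1. Outlier Analysis of IP / PCE 140
2. Performance Comparison by Model 145
3. Forecast Accuracy and Stability of TimeGPT/TimeLLM 147
4. Detailed Comparison of Forecast Performance 151
Ⅴ. Conclusions and Implications 159
Section 1. Summary of the Study and Key Findings 159
Section 2. Limitations and Directions for Future Research 161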
