Large language models (LLMs) have achieved impressive performance across a wide range of NLP tasks, yet they often fail to fully incorporate newly provided context and instead rely excessively on prior knowledge acquired during pre-training. This imbalance frequently leads to hallucinated or non-factual outputs, which is especially problematic in tasks such as question answering and summarization and in domains such as law and medicine, where factual correctness and evidence-based reasoning are essential.
In this thesis, we propose Dynamic Layer-Contrast Decoding (DLCD), a decoding strategy designed to improve contextual integration while preserving useful prior knowledge. DLCD compares context and no-context predictions at multiple transformer layers and dynamically rebalances the contributions of the context and of pre-trained knowledge. Concretely, DLCD constructs a contrastive distribution at each layer by reweighting tokens whose probabilities increase under contextual input, and then uses the Jensen–Shannon divergence to automatically select the layer at which contextual signals are most salient. The selected intermediate layer is then contrastively combined with the final layer through log-domain reweighting controlled by two hyperparameters, λ_layer and λ_context, so that context-supported tokens are amplified and context-agnostic tokens are suppressed.
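Since the abstract does not state DLCD's exact formulas, the following is only a minimal NumPy sketch of the three steps described above, built on our own assumptions. It assumes that early-exit logits are available for every layer with and without the context (for example, by applying the LM head to each hidden state, as in DoLa). It also assumes three specific forms, all our own choices: a CAD-style per-layer reweighting, the JSD between the context and no-context distributions as the saliency criterion, and an additive log-domain "context gain" term for the final combination. The function `dlcd_next_token_logprobs` and its parameters `start_layer`, `lam_layer` and `lam_context` are hypothetical names standing in for the start layer and the two contrast coefficients.

```python
import numpy as np


def log_softmax(logits: np.ndarray) -> np.ndarray:
    """Numerically stable log-softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=-1, keepdims=True))


def jsd(log_p: np.ndarray, log_q: np.ndarray) -> float:
    """Jensen-Shannon divergence (in nats) between two distributions given as log-probs."""
    p, q = np.exp(log_p), np.exp(log_q)
    log_m = np.log(0.5 * (p + q))
    return float(0.5 * np.sum(p * (log_p - log_m)) + 0.5 * np.sum(q * (log_q - log_m)))


def dlcd_next_token_logprobs(ctx_logits, noctx_logits, start_layer, lam_layer, lam_context):
    """Hypothetical DLCD-style step for one next-token prediction (assumed formulas).

    ctx_logits / noctx_logits: shape (num_layers, vocab_size), early-exit logits
    with and without the context in the prompt; the last row is the final layer.
    Returns (selected_layer, log-probabilities over the vocabulary).
    """
    logp_c = log_softmax(ctx_logits)
    logp_n = log_softmax(noctx_logits)
    final = ctx_logits.shape[0] - 1

    # 1) Per-layer contrastive distribution: tokens whose log-prob rises with
    #    context are boosted, tokens whose log-prob falls are damped (CAD-style form).
    log_q = log_softmax(logp_c + lam_context * (logp_c - logp_n))

    # 2) Layer selection: among candidate intermediate layers, pick the one where
    #    context and no-context predictions diverge most (largest JSD).
    scores = {j: jsd(logp_c[j], logp_n[j]) for j in range(start_layer, final)}
    j_star = max(scores, key=scores.get)

    # 3) Log-domain combination with the final layer: add lam_layer times the
    #    selected layer's context gain, then renormalize.
    gain = log_q[j_star] - logp_n[j_star]
    return j_star, log_softmax(logp_c[final] + lam_layer * gain)


# Toy example: 4 layers (index 3 is final), vocabulary of 3 tokens.
ctx = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [3.0, 0.0, 0.0], [2.0, 1.0, 0.0]])
noctx = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 0.0], [1.0, 2.0, 0.0]])
j_star, logp = dlcd_next_token_logprobs(ctx, noctx, start_layer=1, lam_layer=0.5, lam_context=1.0)
print(j_star)  # 2: layer 1 is identical with and without context, so its JSD is ~0
```

With `lam_layer=0` this sketch reduces to ordinary decoding from the final context-conditioned layer, which isolates what the layer-contrast term adds; larger values push more probability toward the token that gained most from the context at the selected layer.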
We evaluate DLCD on two open-domain QA benchmarks, HotpotQA and SQuAD v1.1, using LLaMA-based models without any additional fine-tuning. Experimental results show that DLCD improves exact match (EM) by up to 2.2 percentage points and F1 by up to 3.9 points over simple context injection, and performs comparably to or better than existing methods such as Context-Aware Decoding (CAD) and DoLa. Sensitivity analyses of the start layer and the contrast coefficients indicate that contextual and prior knowledge are most stably balanced at intermediate layers (around the 16th layer), and that DLCD remains robust over a broad range of λ_layer and λ_context values.
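EM and F1 are reported above without being restated. As a reference point, the sketch below computes both per example, assuming the standard SQuAD v1.1 answer normalization (lowercasing, removing punctuation and the articles a/an/the, collapsing whitespace). This is an assumption, since the exact evaluation script used in the thesis is not described here, and the function names are our own. Benchmark-level EM and F1 are averages of these per-example scores (taking the best match when a question has several gold answers), conventionally reported on a 0 to 100 scale.

```python
import re
import string
from collections import Counter

PUNCTUATION = set(string.punctuation)


def normalize_answer(text: str) -> str:
    """SQuAD-style normalization: lowercase, drop punctuation and articles, collapse spaces."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in PUNCTUATION)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction: str, gold: str) -> float:
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(gold))


def f1_score(prediction: str, gold: str) -> float:
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    gold_tokens = normalize_answer(gold).split()
    overlap = sum((Counter(pred_tokens) & Counter(gold_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(gold_tokens)
    return 2 * precision * recall / (precision + recall)


def best_score(metric, prediction: str, golds: list[str]) -> float:
    """Score against every reference answer and keep the best match."""
    return max(metric(prediction, g) for g in golds)


print(exact_match("The Eiffel Tower", "Eiffel Tower"))    # 1.0
print(f1_score("Eiffel Tower in Paris", "Eiffel Tower"))  # 0.6666666666666666
```

The second call shows why F1 can rise more than EM: a verbose but correct answer earns no exact-match credit yet still receives partial token-overlap credit.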
Overall, DLCD provides a lightweight decoding-time approach that mitigates hallucinations and enhances factual consistency without modifying model parameters. Future work will extend DLCD to other tasks such as summarization, dialogue, and multi-hop reasoning, and explore adaptive schemes that automatically tune the contrast strength per query, thereby further improving the reliability of LLM-based systems.