arXiv Cluster Highlights

• Position-Aware Drafting for Inference Acceleration in LLM-Based Generative List-Wise Recommendation
  Large language model (LLM)-based generative list-wise recommendation has advanced rapidly, but decoding remains sequential and thus latency-prone. To accelerate inference without changing the target distribution, speculative decoding (SD) uses a small draft model to propose several next tokens at once and a target LLM to verify and accept the longest prefix, skipping multiple steps per round.

• Optimization before Evaluation: Evaluation with Unoptimised Prompts Can be Misleading
  Current Large Language Model (LLM) evaluation frameworks utilize the same static prompt template across all models under evaluation. This differs from the common industry practice of using prompt optimization (PO) techniques to optimize the prompt for each model to maximize application performance.

• One Pass, Any Order: Position-Invariant Listwise Reranking for LLM-Based Recommendation
  Large language models (LLMs) are increasingly used for recommendation reranking, but their listwise predictions can depend on the order in which candidates are presented. This creates a mismatch between the set-based nature of recommendation and the sequence-based computation of decoder-only LLMs, where permuting an otherwise identical candidate set can change item scores and final rankings.

• A Reproducibility Study of LLM-Based Query Reformulation
  Large Language Models (LLMs) are now widely used for query reformulation and expansion in Information Retrieval, with many studies reporting substantial effectiveness gains. However, these results are typically obtained under heterogeneous experimental conditions, making it difficult to assess which findings are reproducible and which depend on specific implementation choices.
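
For readers unfamiliar with speculative decoding as described in the first highlight, the draft-then-verify loop can be sketched roughly as follows. This is a simplified greedy variant, not the position-aware method of the paper: the names `speculative_decode`, `toy_target`, and `toy_draft` are hypothetical, each model is assumed to be a plain function returning its greedy next token, and a real system would verify all drafted positions in one batched forward pass of the target rather than one call per position.

```python
from typing import Callable, List

Token = int
# Assumption: a "model" is a function mapping a token prefix to its greedy next token.
NextToken = Callable[[List[Token]], Token]


def speculative_decode(target: NextToken, draft: NextToken,
                       prompt: List[Token], max_new: int, k: int = 4) -> List[Token]:
    """Greedy speculative decoding sketch; output equals target-only greedy decoding."""
    out = list(prompt)
    goal = len(prompt) + max_new
    while len(out) < goal:
        # 1) The draft model proposes up to k tokens autoregressively.
        proposal: List[Token] = []
        for _ in range(min(k, goal - len(out))):
            proposal.append(draft(out + proposal))
        # 2) The target checks each drafted position and keeps the longest agreeing prefix.
        accepted = 0
        for i, tok in enumerate(proposal):
            if target(out + proposal[:i]) != tok:
                break
            accepted += 1
        out.extend(proposal[:accepted])
        # 3) The target then supplies one token itself (the correction after a
        #    mismatch, or a bonus token after full acceptance), so every round progresses.
        if len(out) < goal:
            out.append(target(out))
    return out[len(prompt):]


# Toy models (hypothetical): the target counts upward mod 10;
# the draft agrees except that it guesses wrongly after token 2.
def toy_target(prefix: List[Token]) -> Token:
    return (prefix[-1] + 1) % 10


def toy_draft(prefix: List[Token]) -> Token:
    return 9 if prefix[-1] == 2 else (prefix[-1] + 1) % 10


generated = speculative_decode(toy_target, toy_draft, prompt=[0], max_new=8)
```

Because every emitted token is either a drafted token the target agrees with or a token the target produced itself, the result matches what the target would have decoded greedily on its own; the saving comes from accepting several drafted tokens per verification round. Sampling-based variants replace the equality check with a probabilistic accept/reject step to preserve the target distribution.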
