
CItruS: Chunked Instruction-aware State Eviction for Long Sequence Modeling

Long sequence modeling has gained broad interest as large language models (LLMs) continue to advance. Recent research has identified that a large portion of hidden states within the key-value caches of Transformer models can be discarded (also termed …
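As a rough point of reference for what key-value cache eviction involves, the minimal PyTorch sketch below drops cached positions with the lowest accumulated attention weight until a fixed budget is met. The attention-sum scoring rule and the evict_kv helper are illustrative assumptions, not the instruction-aware criterion that CItruS proposes.

import torch

def evict_kv(keys, values, attn_weights, budget):
    """
    keys, values : [seq_len, head_dim] cached states for one attention head
    attn_weights : [num_queries, seq_len] attention each cached position received
    budget       : number of cached positions to keep
    """
    scores = attn_weights.sum(dim=0)                              # importance per cached position
    keep = torch.topk(scores, k=min(budget, scores.numel())).indices.sort().values
    return keys[keep], values[keep]

# toy usage
seq_len, head_dim = 16, 8
k, v = torch.randn(seq_len, head_dim), torch.randn(seq_len, head_dim)
attn = torch.rand(4, seq_len).softmax(dim=-1)
k_small, v_small = evict_kv(k, v, attn, budget=8)
print(k_small.shape)  # torch.Size([8, 8])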

Identifying and Analyzing Task-Encoding Tokens in Large Language Models

In-context learning (ICL) has become an effective solution for few-shot learning in natural language processing. Past work has found that, during this process, representations of the last prompt token are utilized to store task reasoning procedures, …

DePA: Improving Non-autoregressive Translation with Dependency-Aware Decoder

Non-autoregressive machine translation (NAT) models have lower translation quality than autoregressive translation (AT) models because NAT decoders do not depend on previous target tokens in the decoder input. We propose a novel and general …
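The dependency gap the abstract refers to can be seen in a minimal contrast between autoregressive and non-autoregressive decoding, assuming a toy embedding table and output projection. This is not the DePA model, only an illustration of the baseline behaviour it sets out to improve: the NAT decoder input carries no previously generated target tokens, so all positions are predicted in one parallel pass.

import torch

vocab_size, tgt_len, hidden = 100, 6, 32
torch.manual_seed(0)
embed = torch.nn.Embedding(vocab_size, hidden)   # stand-in decoder input embedding
proj = torch.nn.Linear(hidden, vocab_size)       # stand-in output layer

def at_decode(bos_id=1):
    """Autoregressive: each step conditions on previously generated tokens."""
    tokens = [bos_id]
    for _ in range(tgt_len):
        dec_input = embed(torch.tensor(tokens))          # depends on the history
        next_id = proj(dec_input[-1]).argmax().item()    # predict the next token
        tokens.append(next_id)
    return tokens[1:]

def nat_decode():
    """Non-autoregressive: decoder input is target-independent placeholders,
    so every position is predicted in a single parallel pass."""
    placeholder = embed(torch.zeros(tgt_len, dtype=torch.long))   # no target info
    return proj(placeholder).argmax(dim=-1).tolist()

print("AT :", at_decode())
print("NAT:", nat_decode())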

Unifying Cross-lingual Summarization and Machine Translation with Compression Rate

In this paper, we propose a novel task, Cross-lingual Summarization with Compression rate (CSC), to benefit Cross-Lingual Summarization with large-scale Machine Translation corpora. By introducing a compression rate, the information ratio between the …
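As a toy illustration, a compression rate can be read as a length ratio between the summary and the source; whether CSC measures it over tokens, characters, or another unit is an assumption here.

def compression_rate(source_tokens, summary_tokens):
    """Fraction of the source length retained by the summary."""
    return len(summary_tokens) / max(len(source_tokens), 1)

src = "the quick brown fox jumps over the lazy dog near the river bank".split()
summ = "fox jumps over dog".split()
print(f"compression rate: {compression_rate(src, summ):.2f}")  # 0.31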

PSP: Pre-trained Soft Prompts for Few-Shot Abstractive Summarization

Few-shot abstractive summarization has become a challenging task in natural language generation. To support it, we develop a novel soft-prompt architecture coupled with a prompt pre-training plus prompt fine-tuning paradigm, which is effective and …
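For context, the common soft-prompt recipe prepends learnable "virtual token" embeddings to the input while the backbone stays frozen. The sketch below follows that generic recipe; the class name, shapes, and prompt length are illustrative assumptions, not PSP's actual architecture.

import torch
import torch.nn as nn

class SoftPromptedEncoder(nn.Module):
    def __init__(self, backbone_embed, hidden, n_prompt_tokens=20):
        super().__init__()
        self.backbone_embed = backbone_embed                        # frozen token embedding
        self.soft_prompt = nn.Parameter(torch.randn(n_prompt_tokens, hidden) * 0.02)

    def forward(self, input_ids):
        tok = self.backbone_embed(input_ids)                        # [B, L, H]
        prompt = self.soft_prompt.unsqueeze(0).expand(tok.size(0), -1, -1)
        return torch.cat([prompt, tok], dim=1)                      # [B, P+L, H]

# toy usage with a stand-in embedding table kept frozen
embed = nn.Embedding(1000, 64)
embed.weight.requires_grad_(False)
model = SoftPromptedEncoder(embed, hidden=64)
out = model(torch.randint(0, 1000, (2, 10)))
print(out.shape)  # torch.Size([2, 30, 64])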

Stage-wise Stylistic Headline Generation: Style Generation and Summarized Content Insertion

A quality headline with a high click-rate should not only summarize the content of an article, but also reflect a style that attracts users. Such demand has drawn rising attention to the task of stylistic headline generation (SHG). An intuitive …

Cross-Lingual Abstractive Summarization with Limited Parallel Resources

Parallel cross-lingual summarization data is scarce, requiring models to better use the limited available cross-lingual resources. Existing methods to do so often adopt sequence-to-sequence networks with multi-task frameworks. Such approaches apply …

Exploring Explainable Selection to Control Abstractive Summarization

Like humans, document summarization models can interpret a document's contents in a number of ways. Unfortunately, the neural models of today are largely black boxes that provide little explanation of how or why they generated a summary in the way …

Multiple perspective answer reranking for multi-passage reading comprehension

This study focuses on the multi-passage Machine Reading Comprehension (MRC) task. Prior work has shown that a retriever-reader pipeline model can improve overall performance. However, the pipeline model relies heavily on the retriever component, since …
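A minimal sketch of the retriever-reader pipeline mentioned above, with a simple reranking step that (hypothetically) combines the retriever's passage score and the reader's span score; the paper's multiple-perspective reranking method is not reproduced here, and the helper names are illustrative.

def rerank_answers(passages, retriever_scores, reader, alpha=0.5):
    """
    passages         : list of passage strings
    retriever_scores : relevance score per passage
    reader           : callable(passage) -> (answer_str, span_score)
    """
    candidates = []
    for passage, r_score in zip(passages, retriever_scores):
        answer, span_score = reader(passage)
        final = alpha * r_score + (1 - alpha) * span_score   # combine both perspectives
        candidates.append((answer, final))
    return max(candidates, key=lambda x: x[1])

# toy usage with a dummy reader
dummy_reader = lambda p: (p.split()[0], len(p) % 7 / 7.0)
best = rerank_answers(["Paris is the capital", "Lyon is a city"], [0.9, 0.4], dummy_reader)
print(best)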