
CItruS: Chunked Instruction-aware State Eviction for Long Sequence Modeling

Long sequence modeling has gained broad interest as large language models (LLMs) continue to advance. Recent research has identified that a large portion of hidden states within the key-value caches of Transformer models can be discarded (also termed …
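As a rough point of reference for what key-value cache eviction involves, the minimal PyTorch sketch below drops cached positions with the lowest accumulated attention weight until a fixed budget is met. The attention-sum scoring rule and the evict_kv helper are illustrative assumptions, not the instruction-aware criterion that CItruS proposes.

import torch

def evict_kv(keys, values, attn_weights, budget):
    """
    keys, values : [seq_len, head_dim] cached states for one attention head
    attn_weights : [num_queries, seq_len] attention each cached position received
    budget       : number of cached positions to keep
    """
    scores = attn_weights.sum(dim=0)                              # importance per cached position
    keep = torch.topk(scores, k=min(budget, scores.numel())).indices.sort().values
    return keys[keep], values[keep]

# toy usage
seq_len, head_dim = 16, 8
k, v = torch.randn(seq_len, head_dim), torch.randn(seq_len, head_dim)
attn = torch.rand(4, seq_len).softmax(dim=-1)
k_small, v_small = evict_kv(k, v, attn, budget=8)
print(k_small.shape)  # torch.Size([8, 8])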

Identifying and Analyzing Task-Encoding Tokens in Large Language Models

In-context learning (ICL) has become an effective solution for few-shot learning in natural language processing. Past work has found that, during this process, representations of the last prompt token are utilized to store task reasoning procedures, …

DePA: Improving Non-autoregressive Translation with Dependency-Aware Decoder

Non-autoregressive machine translation (NAT) models have lower translation quality than autoregressive translation (AT) models because NAT decoders do not depend on previous target tokens in the decoder input. We propose a novel and general …
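The dependency gap the abstract refers to can be seen in a minimal contrast between autoregressive and non-autoregressive decoding, assuming a toy embedding table and output projection. This is not the DePA model, only an illustration of the baseline behaviour it sets out to improve: the NAT decoder input carries no previously generated target tokens, so all positions are predicted in one parallel pass.

import torch

vocab_size, tgt_len, hidden = 100, 6, 32
torch.manual_seed(0)
embed = torch.nn.Embedding(vocab_size, hidden)   # stand-in decoder input embedding
proj = torch.nn.Linear(hidden, vocab_size)       # stand-in output layer

def at_decode(bos_id=1):
    """Autoregressive: each step conditions on previously generated tokens."""
    tokens = [bos_id]
    for _ in range(tgt_len):
        dec_input = embed(torch.tensor(tokens))          # depends on the history
        next_id = proj(dec_input[-1]).argmax().item()    # predict the next token
        tokens.append(next_id)
    return tokens[1:]

def nat_decode():
    """Non-autoregressive: decoder input is target-independent placeholders,
    so every position is predicted in a single parallel pass."""
    placeholder = embed(torch.zeros(tgt_len, dtype=torch.long))   # no target info
    return proj(placeholder).argmax(dim=-1).tolist()

print("AT :", at_decode())
print("NAT:", nat_decode())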

Unifying Cross-lingual Summarization and Machine Translation with Compression Rate

In this paper, we propose a novel task, Cross-lingual Summarization with Compression rate (CSC), to benefit Cross-Lingual Summarization with large-scale Machine Translation corpora. By introducing a compression rate, the information ratio between the …
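As a toy illustration, a compression rate can be read as a length ratio between the summary and the source; whether CSC measures it over tokens, characters, or another unit is an assumption here.

def compression_rate(source_tokens, summary_tokens):
    """Fraction of the source length retained by the summary."""
    return len(summary_tokens) / max(len(source_tokens), 1)

src = "the quick brown fox jumps over the lazy dog near the river bank".split()
summ = "fox jumps over dog".split()
print(f"compression rate: {compression_rate(src, summ):.2f}")  # 0.31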

PSP: Pre-trained Soft Prompts for Few-Shot Abstractive Summarization

Few-shot abstractive summarization has become a challenging task in natural language generation. To support it, we develop a novel soft-prompt architecture coupled with a prompt pre-training plus prompt fine-tuning paradigm, which is effective and …
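For context, the common soft-prompt recipe prepends learnable "virtual token" embeddings to the input while the backbone stays frozen. The sketch below follows that generic recipe; the class name, shapes, and prompt length are illustrative assumptions, not PSP's actual architecture.

import torch
import torch.nn as nn

class SoftPromptedEncoder(nn.Module):
    def __init__(self, backbone_embed, hidden, n_prompt_tokens=20):
        super().__init__()
        self.backbone_embed = backbone_embed                        # frozen token embedding
        self.soft_prompt = nn.Parameter(torch.randn(n_prompt_tokens, hidden) * 0.02)

    def forward(self, input_ids):
        tok = self.backbone_embed(input_ids)                        # [B, L, H]
        prompt = self.soft_prompt.unsqueeze(0).expand(tok.size(0), -1, -1)
        return torch.cat([prompt, tok], dim=1)                      # [B, P+L, H]

# toy usage with a stand-in embedding table kept frozen
embed = nn.Embedding(1000, 64)
embed.weight.requires_grad_(False)
model = SoftPromptedEncoder(embed, hidden=64)
out = model(torch.randint(0, 1000, (2, 10)))
print(out.shape)  # torch.Size([2, 30, 64])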

Stage-wise Stylistic Headline Generation: Style Generation and Summarized Content Insertion

A quality headline with a high click-rate should not only summarize the content of an article, but also reflect a style that attracts users. Such demand has drawn rising attention to the task of stylistic headline generation (SHG). An intuitive …

Cross-Lingual Abstractive Summarization with Limited Parallel Resources

Parallel cross-lingual summarization data is scarce, requiring models to better use the limited available cross-lingual resources. Existing methods to do so often adopt sequence-to-sequence networks with multi-task frameworks. Such approaches apply …

Exploring Explainable Selection to Control Abstractive Summarization

Like humans, document summarization models can interpret a document's contents in a number of ways. Unfortunately, the neural models of today are largely black boxes that provide little explanation of how or why they generated a summary in the way …

Multiple perspective answer reranking for multi-passage reading comprehension

This study focuses on the multi-passage Machine Reading Comprehension (MRC) task. Prior work has shown that a retriever-reader pipeline model can improve overall performance. However, the pipeline model relies heavily on the retriever component, since …
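A minimal sketch of the retriever-reader pipeline mentioned above, with a simple reranking step that (hypothetically) combines the retriever's passage score and the reader's span score; the paper's multiple-perspective reranking method is not reproduced here, and the helper names are illustrative.

def rerank_answers(passages, retriever_scores, reader, alpha=0.5):
    """
    passages         : list of passage strings
    retriever_scores : relevance score per passage
    reader           : callable(passage) -> (answer_str, span_score)
    """
    candidates = []
    for passage, r_score in zip(passages, retriever_scores):
        answer, span_score = reader(passage)
        final = alpha * r_score + (1 - alpha) * span_score   # combine both perspectives
        candidates.append((answer, final))
    return max(candidates, key=lambda x: x[1])

# toy usage with a dummy reader
dummy_reader = lambda p: (p.split()[0], len(p) % 7 / 7.0)
best = rerank_answers(["Paris is the capital", "Lyon is a city"], [0.9, 0.4], dummy_reader)
print(best)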