
Debiasing Watermarks via Maximal Coupling

Tags: Large Language Model, Gumbel-max Trick, Watermarking, Maximal Coupling

This note is for Xie, Y., Li, X., Mallick, T., Su, W., & Zhang, R. (2025). Debiasing Watermarks for Large Language Models via Maximal Coupling. Journal of the American Statistical Association, 0(0), 1–11.

it presents a green/red list watermarking approach that partitions the token set into green and red lists, subtly increasing the generation probability for green tokens

  • to correct token distribution bias, the method employs maximal coupling, using a uniform coin flip to decide whether to apply bias correction, with the result embedded as a pseudorandom watermark signal

Introduction

the earliest watermarking scheme is the Gumbel-max watermark, which selects the next token $w_t$ using the decoder

\[w_t = S(P_t, \xi_t) = \argmax_{w\in \cW} \frac{\log U_w}{P_{t,w}}\]

assuming $\xi_t = (U_1,\ldots, U_{\vert \cW\vert})$ is a random vector consisting of $\vert \cW\vert$ iid copies of $U[0, 1]$, serving as part of the watermark key.
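
a minimal numpy sketch of this decoder; seeding $\xi_t$ from the last $k$ tokens is my illustrative choice of pseudorandom key, not necessarily the exact implementation:

```python
import numpy as np

def gumbel_max_decode(P_t, prev_tokens, k=3):
    """Gumbel-max watermarked decoder: w_t = argmax_w log(U_w) / P_{t,w}.

    P_t         : next-token probabilities over the vocabulary (length |W|)
    prev_tokens : previously generated token ids
    k           : number of prior tokens used to seed the pseudorandom key
    """
    # pseudorandom key xi_t: |W| iid U[0,1] variables derived from the last k tokens
    seed = hash(tuple(prev_tokens[-k:])) % (2**32)
    U = np.random.default_rng(seed).uniform(size=len(P_t))
    # equivalent to argmax_w U_w^{1/P_{t,w}}; unbiased when U is truly random,
    # but deterministic once the prefix is fixed, since xi_t is pseudorandom
    return int(np.argmax(np.log(U) / np.clip(P_t, 1e-12, None)))
```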

while this watermarked decoder theoretically preserves the original next-token distribution when $\xi_t$ is truly random and achieves high detection efficiency, in practice it can generate repetitive text and significantly degrade text quality.

the main reason is that, for existing implementations, the pseudorandom variable $\xi_t$ depends deterministically on the prior $k$ tokens. As a result, when conditioned on previous tokens, the decoder deterministically outputs the next token, which can lead to sub-optimal text generation

another scheme, the green/red list watermark, employs a stochastic decoder.

it first partitions the token vocabulary $\cW$ into the green list $\cG$ and the red list $\cR$ (pseudo)randomly, and shares the green list $\cG$ with the detector as part of the watermark key.

the watermark is embedded by promoting the use of green tokens during decoding: given the NTP distribution $P_t$ and the green list $\cG$, the watermarked decoder constructs an alternative distribution $Q_t$ and samples the next token from $Q_t$.

the simplest version is the hard watermark, where each sampled token is restricted to the green list $\cG$. A green token is then more likely under $Q_t$ than under $P_t$; thus, one can detect the watermark by counting the number of green tokens in a given text (a sketch follows the bullets below).

  • however, limiting sampling to a vocabulary subset distorts the NTP distribution $P_t$ and can degrade LLM generation significantly.
  • despite these potential drawbacks, this watermarked decoder remains stochastic, as additional randomness is introduced when sampling from $Q_t$, while pseudorandomness is used only to set the green list. This extra randomness in the sampling process helps avoid repetitive generation
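
a minimal sketch of this hard variant; the green-list fraction `gamma` and the seeding scheme are illustrative assumptions:

```python
import numpy as np

def hard_greenlist_decode(P_t, seed, gamma=0.5):
    """Hard green/red list watermark: restrict sampling to the green list.

    P_t   : next-token probabilities over the vocabulary
    seed  : pseudorandom seed shared with the detector; defines the green list
    gamma : fraction of the vocabulary placed on the green list
    """
    vocab_size = len(P_t)
    # pseudorandomness is used only to set the green list ...
    green = np.random.default_rng(seed).permutation(vocab_size)[: int(gamma * vocab_size)]
    # Q_t: P_t restricted to the green list and renormalized (this distorts P_t)
    Q_t = np.zeros(vocab_size)
    Q_t[green] = P_t[green]
    Q_t /= Q_t.sum()
    # ... while fresh randomness is used for the actual sampling step
    w_t = np.random.default_rng().choice(vocab_size, p=Q_t)
    return int(w_t), set(green.tolist())
```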

each of the two watermarking schemes has its own strengths

  • on one hand, like the Gumbel-max watermark, we want the decoder $S$ to faithfully follow the original NTP distributions, a property known as unbiasedness
  • on the other hand, similar to the green/red list watermark, we seek a stochastic decoder $S$ to alleviate the negative impact of the pseudorandom variables on text generation

can we achieve the best of both worlds?

the paper introduces a novel watermarking scheme that leverages maximal coupling to balance these goals

it reminds me of cross-validation on graphons

the process begins by drawing a standard uniform random variable $\zeta$ and computing $p = \sum_{w\in\cW} \min(P_{t,w}, Q_{t, w})$ (a sketch follows the two bullets below).

  • if $\zeta < p$, sample from the overlapping region
  • otherwise, sample from the excess distribution
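
a sketch of this decoding step as I read it: the excess distribution is taken to be the part of $P_t$ not covered by the overlap, so the output marginal stays exactly $P_t$; the toy unbiasedness check at the end is mine:

```python
import numpy as np

def maximal_coupling_decode(P_t, Q_t, zeta, rng):
    """One decoding step via maximal coupling of P_t and Q_t.

    P_t  : original next-token distribution
    Q_t  : green-boosted alternative distribution
    zeta : pseudorandom U[0,1] variable shared with the detector
    rng  : fresh randomness for the actual sampling step
    """
    overlap = np.minimum(P_t, Q_t)
    p = overlap.sum()                  # mass of the overlapping region
    if zeta < p:
        # sample from the overlap, the part common to P_t and Q_t
        probs = overlap / p
    else:
        # bias correction: sample from the excess of P_t over the overlap,
        # so that the overall output marginal is exactly P_t (unbiasedness)
        excess = P_t - overlap
        probs = excess / excess.sum()
    return int(rng.choice(len(P_t), p=probs))

# toy unbiasedness check: averaging over zeta ~ U[0,1] should recover P_t
P = np.array([0.5, 0.3, 0.2])
Q = np.array([0.625, 0.375, 0.0])      # e.g. P restricted to the green list {0, 1}
rng = np.random.default_rng(0)
draws = [maximal_coupling_decode(P, Q, rng.uniform(), rng) for _ in range(100_000)]
print(np.bincount(draws, minlength=3) / len(draws))   # approx [0.5, 0.3, 0.2]
```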

to aid detection, we share the information $\xi_t = (\cG, \zeta_t)$, the green list $\cG$ and the random variable $\zeta_t$, with the detector.

  • $H_0$: the text $w_{1:n}$ is written by a human; that is, $\zeta_t\mid (P_t, \cG)\sim_{iid} U[0, 1]$ for $t=1,\ldots, n$
  • $H_1$: the text $w_{1:n}$ is generated by a language model; that is, $\zeta_t\mid (P_t, \cG)\sim U[0, P_{t,\cG}]$ for $t=1,\ldots, n$
  • $H_1^{(mix)}$: the text $w_{1:n}$ is first generated by a language model and then modified; that is, $\zeta_t\mid (P_t,\cG)\sim (1-\varepsilon_n)U[0, 1] + \varepsilon_n U[0, P_{t, \cG}]$ for $t = 1,\ldots, n$

they focus on the sparse watermark signal regime to facilitate theoretical analysis, where $\varepsilon_n\asymp n^{-p}$ and $1 - P_{t,\cG} \asymp n^{-q}$ decay with $n$ at possibly different rates

  1. Without sufficient watermark signals, that is, $p + 2q > 1$, $H_0$ and $H_1^{(mix)}$ merge asymptotically. Hence, no test can reliably separate $H_0$ and $H_1^{(mix)}$
  2. Given sufficient watermark signals, that is, $p+2q < 1$, $H_0$ and $H_1^{(mix)}$ separate asymptotically. In this case, the alternative hypothesis can be reliably detected using the likelihood ratio test and higher criticism.
  3. With slightly stronger watermark signals, that is, $p + q < 0.5$, the sum test, which rejects $H_0$ if the sum of the $\zeta_t$’s is too large, can reliably separate $H_0$ and $H_1^{(mix)}$ asymptotically (see the sketch after this list).
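
rough sketches of the two detection statistics, treating the shared per-token quantities as iid $U[0, 1]$ under $H_0$; the rejection direction follows the description above, and the normal-approximation calibration is only a placeholder:

```python
import numpy as np
from scipy.stats import norm

def sum_test(zetas, alpha=0.05):
    """Sum test: reject H0 when the sum of the zeta_t's is too large.

    Under H0 each zeta_t ~ U[0,1], so the sum has mean n/2 and variance n/12;
    a normal approximation calibrates the rejection threshold.
    """
    n = len(zetas)
    z = (np.sum(zetas) - n / 2) / np.sqrt(n / 12)
    return z > norm.ppf(1 - alpha)

def higher_criticism(pvals):
    """Higher criticism statistic for per-token p-values (iid U[0,1] under H0)."""
    n = len(pvals)
    p = np.sort(np.asarray(pvals))
    i = np.arange(1, n + 1)
    hc = np.sqrt(n) * (i / n - p) / np.sqrt(p * (1 - p))
    # restrict to 1/n <= p_(i) <= 1/2, a usual convention, to avoid instability
    keep = (p >= 1 / n) & (p <= 0.5)
    return np.max(hc[keep]) if keep.any() else 0.0
```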

another application of maximal coupling is speculative decoding, where a smaller draft model generates speculative token predictions that are then accepted or corrected by a more powerful target model. the paper applies this setup to the watermarking scheme and demonstrates that the method effectively retains the watermark signal despite such targeted attempts to modify the text.
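
for reference, the standard accept/reject step of speculative decoding, itself a maximal coupling between the draft and target distributions; this is the generic recipe, not the paper's specific post-processing:

```python
import numpy as np

def speculative_step(p_target, q_draft, x_draft, rng):
    """Standard accept/reject step of speculative decoding.

    p_target : target-model next-token distribution
    q_draft  : draft-model next-token distribution
    x_draft  : token proposed by the draft model (sampled from q_draft)
    """
    # accept the draft token with probability min(1, p(x)/q(x));
    # x_draft was sampled from q_draft, so q_draft[x_draft] > 0
    if rng.uniform() < min(1.0, p_target[x_draft] / q_draft[x_draft]):
        return int(x_draft)
    # otherwise resample from the normalized residual (p - q)^+,
    # so the accepted-or-corrected token is marginally distributed as p_target
    residual = np.maximum(p_target - q_draft, 0.0)
    return int(rng.choice(len(p_target), p=residual / residual.sum()))
```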

2. Preliminaries

maximal coupling is a powerful tool in probability theory that constructs a coupling between two random variables to maximize the probability that they are equal
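
to be concrete (a standard fact, recorded here for reference): for two distributions $P$ and $Q$ on $\cW$, any coupling $(X, Y)$ with $X\sim P$ and $Y\sim Q$ satisfies

\[\Pr(X \neq Y) \ge \mathrm{TV}(P, Q) = 1 - \sum_{w\in\cW}\min(P_w, Q_w)\,,\]

and the maximal coupling attains this bound: with probability $\sum_{w}\min(P_w, Q_w)$ draw $X = Y$ from the normalized overlap $\min(P_w, Q_w)$; otherwise draw $X$ and $Y$ independently from the normalized excesses of $P$ and $Q$ over the overlap.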

3. Watermarking Scheme via Maximal Coupling

a watermarking scheme has two parts: decoding and detection

3.1 Decoding

3.2 Detection

  • Detection as Hypothesis Testing
  • Sum Test
  • Higher Criticism
  • Practical Considerations: the probability of sampling a token from a green list can be arbitrarily close to 1, or even equal to 1

3.3 Watermark Key

in cases where the watermarked text has a predetermined maximum length $n$, the language model provider can pre-generate both the green lists $\cG_{1:n}$ and uniform random variables $\zeta_{1:n}$, sharing them with the detector as the watermark key

for varying lengths, the language model provider supplies two distinct pseudorandom functions as the watermark key (a toy sketch follows the two bullets below)

  • the first generates the green lists
  • the second generates the uniform random variables
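
a toy sketch of such a pair of pseudorandom functions, keyed here by the position $t$ and a secret key for simplicity (hashing the preceding $k$ tokens is the more common choice in practice); all function names and the fraction `gamma` are my assumptions:

```python
import hashlib
import numpy as np

def _seeded_rng(secret_key: bytes, label: bytes, t: int) -> np.random.Generator:
    """Derive a deterministic RNG from the secret key, a label, and the position t."""
    digest = hashlib.sha256(secret_key + label + t.to_bytes(8, "big")).digest()
    return np.random.default_rng(int.from_bytes(digest[:8], "big"))

def green_list(secret_key: bytes, t: int, vocab_size: int, gamma: float = 0.5):
    """First pseudorandom function: the green list G_t for position t."""
    rng = _seeded_rng(secret_key, b"green", t)
    return set(rng.permutation(vocab_size)[: int(gamma * vocab_size)].tolist())

def zeta(secret_key: bytes, t: int) -> float:
    """Second pseudorandom function: the uniform random variable zeta_t for position t."""
    return float(_seeded_rng(secret_key, b"zeta", t).uniform())
```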

3.4 Interpret Speculative Decoding as Post-processing

