Posted: 2025-04-16 09:18:43 UTC

This article contains some claims that remain unverified. While much of the content may be accurate, exercise care when relying on this information.
Status
Last Updated: 2025-04-16 09:19:05 UTC
Verified By: Rollup News
Standard RLHF models lose human-like discourse structure during training, as they optimize for preference over structure. Structural Alignment addresses this with rewards derived from hierarchical discourse trees, rewarding the tokens that contribute to human writing patterns. The approach keeps training stable and shows that surface features and deeper discourse structures are uncorrelated, which challenges current alignment methods for long-form text.
RLHF models degrade text structure during training.
Structural Alignment improves coherence using discourse trees.
Surface features and discourse structures are uncorrelated.
Balancing preference and structure in long-form text generation.
Capturing both local flow and global structure.
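
The summary does not say how tree-based rewards reach individual tokens, so the following is only a minimal sketch of one possible scheme, not the method used in the work described. It assumes a discourse tree whose nodes cover token spans and carry a structure score, and it spreads each node's score evenly over the tokens in its span. The names `DiscourseNode`, `score`, and `token_rewards` are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class DiscourseNode:
    """Hypothetical discourse-tree node covering tokens [start, end)."""
    start: int
    end: int
    score: float  # assumed structure score for this span (illustrative only)
    children: list[DiscourseNode] = field(default_factory=list)


def token_rewards(root: DiscourseNode, num_tokens: int) -> list[float]:
    """Spread each node's score evenly across the tokens in its span.

    A token inside several nested spans accumulates a share from each,
    so it is credited for both local and global structure.
    """
    rewards = [0.0] * num_tokens
    stack = [root]
    while stack:
        node = stack.pop()
        span = node.end - node.start
        if span > 0:
            share = node.score / span
            for i in range(max(node.start, 0), min(node.end, num_tokens)):
                rewards[i] += share
        stack.extend(node.children)
    return rewards


# Example: a 4-token text whose first half forms a well-structured unit.
tree = DiscourseNode(0, 4, 0.4, [
    DiscourseNode(0, 2, 1.0),
    DiscourseNode(2, 4, 0.0),
])
rewards = token_rewards(tree, 4)
```

In this sketch the first two tokens receive a larger reward than the last two because they sit inside both the root span and a high-scoring child span.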