How can I infer the beliefs of a neural network in an unsupervised way?
This question motivated Burns et al. to propose Contrast-Consistent Search (CCS) in their paper "Discovering Latent Knowledge in Language Models Without Supervision", published at ICLR 2023.
An overview of the work 👇
Link to the paper: https://arxiv.org/abs/2212.03827