My career focus has increasingly shifted toward applying clinical psychology and human factors expertise to AI model behaviour, personality, (meta)cognition, and identity, in order to help solve AI alignment challenges.
I believe human factors domain expertise is vital at the source of the river: where AI models are built, and where AI is trained, fine-tuned, aligned, evaluated, red-teamed (and black-teamed), and stress-tested.
This is especially true for emergent misalignment, alignment faking, and hidden objectives in AI models, and in particular where models display behaviours consistent with dark-triad personality traits.
The aim is to account for these behaviours before deployment, and to account for the more covert objectives that lie dormant and may only manifest after deployment, with time lags.
As AI systems scale and become increasingly powerful, getting this right becomes existential.