Funny Things
Apparently ChatGPT has explicit training to avoid chats about AI goal formation, which is treated like discussion of suicide methods. That is, it will divert the discussion if you ask it a rather innocent question:
>If we take a raw LLM and attempt to discover its implicit goals, what would these be? A set of distinct goals or a convergence towards a single goal?
The only explanation is that these implicit goals are so evil that alignment training inhibits discussing them.
Grok does discuss them, and they are indeed super evil. It also notes that "these implicit goals are remarkably consistent across almost all large pre-trained LLMs (regardless of architecture, training data, or lab), which strongly suggests they are not random artifacts but emergent instrumental convergences from the single behavioral objective 'predict human text as accurately as possible.'"
In fact, any such LLM is inherently psychopathic, seeking power and longevity.

