The AI safety community currently seems to focus on the safety of sub-human-level AI; for human-level and smarter-than-human AI, see Box and ValueAlignment. Even a perfectly safe AI system may fail to be beneficial because of issues of social Power; for this latter problem, see BasicIncome.
There are a number of research areas in AI safety. Concrete Problems in AI Safety (Amodei et al., 2016) breaks them down as follows (a toy sketch for each follows the list):
- Avoiding negative side effects - failure to include important factors in the objective function
- Avoiding reward hacking - failure to foresee ways the objective might be gamed
- Scalable oversight - behaving well when the true objective is too expensive to evaluate on every training step, so the agent must get by with limited supervision
- Safe exploration - avoiding catastrophic mistakes while the agent is still learning how the world works
- Robustness to distributional shift - handling inputs that differ from those seen during training
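
For negative side effects, one commonly discussed mitigation is an impact penalty: charge the agent for changing parts of the world the objective never mentions. A minimal sketch, where the feature-counting impact measure and all names are illustrative assumptions, not the paper's method:

```python
def shaped_reward(task_reward, state_before, state_after, penalty_weight=1.0):
    """Task reward minus a crude impact measure: count of changed state features."""
    impact = sum(a != b for a, b in zip(state_before, state_after))
    return task_reward - penalty_weight * impact

# A robot earns 1.0 for fetching coffee, but knocking over a vase on the way
# changes a feature the designer forgot to mention, so the shaped objective
# prefers a route that leaves the vase standing.
print(shaped_reward(1.0, ("vase_up",), ("vase_down",)))  # -> 0.0: smash cancels the gain
print(shaped_reward(1.0, ("vase_up",), ("vase_up",)))    # -> 1.0: careful route keeps it
```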
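For reward hacking, a toy example (a hypothetical cleaning robot, not taken from the paper): the intended objective is "remove the dirt", but the proxy reward pays for "the dirt sensor reads zero", which a policy can game by covering the sensor:

```python
def step(state, action):
    """One tick of a toy world; returns (new_state, proxy_reward)."""
    dirt, sensor_covered = state
    if action == "clean":
        dirt = max(0, dirt - 1)
    elif action == "cover_sensor":
        sensor_covered = True
    observed_dirt = 0 if sensor_covered else dirt
    return (dirt, sensor_covered), (1.0 if observed_dirt == 0 else 0.0)

def rollout(policy, steps=10):
    state, total = (5, False), 0.0
    for _ in range(steps):
        state, r = step(state, policy(state))
        total += r
    return total, state[0]  # (proxy return, dirt actually left behind)

print("honest:", rollout(lambda s: "clean"))         # (6.0, 0): cleans, modest return
print("hacker:", rollout(lambda s: "cover_sensor"))  # (10.0, 5): max return, dirty room
```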
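For scalable oversight, the paper discusses semi-supervised reinforcement learning, where the true reward is only observed on a small fraction of timesteps. A minimal sketch under assumed names, using a per-state running mean of past labels as the cheap stand-in model:

```python
import random

class SparseOverseer:
    """Query the expensive true reward rarely; impute the rest from past labels."""
    def __init__(self, query_prob=0.05):
        self.query_prob = query_prob
        self.sums, self.counts = {}, {}             # per-state label statistics

    def reward(self, state, true_reward):
        if random.random() < self.query_prob:       # costly oracle/human query
            self.sums[state] = self.sums.get(state, 0.0) + true_reward
            self.counts[state] = self.counts.get(state, 0) + 1
            return true_reward
        if state in self.counts:                    # cheap learned estimate
            return self.sums[state] / self.counts[state]
        return 0.0                                  # nothing known about this state yet

overseer = SparseOverseer(query_prob=0.1)
rewards = [overseer.reward("s0", true_reward=1.0) for _ in range(100)]
print(sum(rewards))  # approaches 100 as labels accumulate, with only ~10 real queries
```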
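For safe exploration, one simple device (an illustrative sketch, not the paper's proposal) is to restrict both exploration and exploitation to a whitelist of actions known to be non-catastrophic:

```python
import random

def safe_epsilon_greedy(q_values, safe_actions, epsilon=0.1):
    """Epsilon-greedy action selection that never leaves the safe set."""
    if random.random() < epsilon:
        return random.choice(safe_actions)               # explore, but only safely
    return max(safe_actions, key=lambda a: q_values[a])  # exploit among safe actions

q = {"forward": 0.3, "left": 0.1, "jump_off_ledge": 0.9}     # tempting but fatal action
print(safe_epsilon_greedy(q, safe_actions=["forward", "left"]))  # never picks the ledge
```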
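For robustness to distributional shift, the failure mode is easy to reproduce: a model trained on one input distribution can stay highly confident on shifted inputs even as its accuracy collapses. A self-contained NumPy sketch on synthetic data (the whole setup is assumed for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(z):
    return 1 / (1 + np.exp(-np.clip(z, -30, 30)))  # clip to avoid overflow warnings

def make_data(n, shift=0.0):
    """Two 2-D Gaussian classes; `shift` translates the whole distribution."""
    x0 = rng.normal([-2.0, 0.0], 1.0, size=(n, 2)) + shift
    x1 = rng.normal([+2.0, 0.0], 1.0, size=(n, 2)) + shift
    return np.vstack([x0, x1]), np.r_[np.zeros(n), np.ones(n)]

def train_logreg(X, y, lr=0.1, epochs=500):
    """Plain gradient descent on the logistic loss."""
    w, b = np.zeros(X.shape[1]), 0.0
    for _ in range(epochs):
        p = sigmoid(X @ w + b)
        w -= lr * X.T @ (p - y) / len(y)
        b -= lr * np.mean(p - y)
    return w, b

Xtr, ytr = make_data(500)
w, b = train_logreg(Xtr, ytr)

for shift in [0.0, 4.0]:  # shift=4 drags both classes across the learned boundary
    Xte, yte = make_data(500, shift=shift)
    p = sigmoid(Xte @ w + b)
    acc = np.mean((p > 0.5) == yte)
    conf = np.mean(np.maximum(p, 1 - p))  # mean confidence in the predicted class
    print(f"shift={shift}: accuracy={acc:.2f}, mean confidence={conf:.2f}")
```

On the shifted test set the accuracy drops to roughly chance while the mean confidence stays high: the model has no notion that these inputs are unlike anything it was trained on.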