EA - Prize and fast track to alignment research at ALTER by Vanessa
The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund
Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Prize and fast track to alignment research at ALTER, published by Vanessa on September 18, 2022 on The Effective Altruism Forum.

Cross-posted from the AI Alignment Forum.

On behalf of ALTER and Superlinear, I am pleased to announce a prize of at least 50,000 USD, to be awarded for the best substantial contribution to the learning-theoretic AI alignment research agenda among those submitted before October 1, 2023. Depending on the quality of submissions, the winner(s) may be offered a position as a researcher in ALTER (similar to this one), to continue work on the agenda, if they so desire.

Submit here.

Topics

The research topics eligible for the prize are:

- Studying the mathematical properties of the algorithmic information-theoretic definition of intelligence.
- Building and analyzing formal models of value learning based on the above.
- Pursuing any of the future research directions listed in the article on infra-Bayesian physicalism.
- Studying infra-Bayesian logic in general, and its applications to infra-Bayesian reinforcement learning in particular.
- Theoretical study of the behavior of RL agents in population games. In particular, understand to what extent infra-Bayesianism helps to avoid the grain-of-truth problem.
- Studying the conjectures relating superrationality to thermodynamic Nash equilibria.
- Studying the theoretical properties of the infra-Bayesian Turing reinforcement learning setting.
- Developing a theory of reinforcement learning with traps, i.e. irreversible state transitions. Possible research directions include studying the computational complexity of Bayes-optimality for finite state policies (in order to avoid the NP-hardness for arbitrary policies) and bootstrapping from a safe baseline policy.

New topics might be added to this list over the year.
Requirements

The format of the submission can be either a LessWrong post/sequence or an arXiv paper.

A submission may have one or more authors. If there are several authors, they will be considered for the prize as a team, and if they win, the prize money will be split between them either equally or according to their own internal agreement.

For the submission to be eligible, its authors must not include:

- Anyone employed or supported by ALTER.
- Members of the board of directors of ALTER.
- Members of the panel of judges.
- First-degree relatives or romantic partners of judges.

In order to win, the submission must be a substantial contribution to the mathematical theory of one of the topics above. For this, it must include at least one of:

- A novel theorem, relevant to the topic, which is difficult to prove.
- A novel unexpected mathematical definition, relevant to the topic, with an array of natural properties.

Some examples of known results which would be considered substantial at the time:

- Theorems 1 and 2 in "RL with imperceptible rewards".
- Definition 1.1 in "infra-Bayesian physicalism", with the various theorems proved about it.
- Theorem 1 in "Forecasting using incomplete models".
- Definition 7 in "Basic Inframeasure Theory", with the various theorems proved about it.

Evaluation

The evaluation will consist of two phases. In the first phase, I will select 3 finalists. In the second phase, each of the finalists will be evaluated by a panel of judges consisting of:

- Adam Shimi
- Alexander Appel
- Daniel Filan
- Vanessa Kosoy (me)

Each judge will score the submission on a scale of 0 to 4. These scores will be added to produce a total score between 0 and 16. If no submission achieves a score of 12 or more, the main prize will not be awarded. If at least one submission achieves a score of 12 or more, the submission with the highest score will be the winner. In case of a tie, the money will be split between the front runners.
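For readers who want the scoring rule spelled out, here is a minimal Python sketch of the second evaluation phase as described above. It is an illustration, not an official tool: the names `Finalist` and `allocate_prize` are hypothetical, the 50,000 USD figure is only the announced minimum, and the equal split in a tie is an assumption, since the announcement says only that the money is split between the front runners.

```python
from dataclasses import dataclass

NUM_JUDGES = 4        # four judges are listed in the announcement
MAX_JUDGE_SCORE = 4   # each judge scores on a scale of 0 to 4
WIN_THRESHOLD = 12    # a total below 12 means no main prize


@dataclass(frozen=True)
class Finalist:
    """Hypothetical record for one finalist and the scores it received."""
    name: str
    judge_scores: tuple[int, ...]

    def total(self) -> int:
        """Sum the judges' scores, giving a total between 0 and 16."""
        if len(self.judge_scores) != NUM_JUDGES:
            raise ValueError(f"expected {NUM_JUDGES} scores")
        if any(not 0 <= s <= MAX_JUDGE_SCORE for s in self.judge_scores):
            raise ValueError(f"scores must lie in 0..{MAX_JUDGE_SCORE}")
        return sum(self.judge_scores)


def allocate_prize(finalists: list[Finalist], prize_usd: float) -> dict[str, float]:
    """Return a mapping from winner name to prize share.

    An empty mapping means the main prize is not awarded.
    Assumption: tied front runners split the money equally.
    """
    totals = [(f.name, f.total()) for f in finalists]
    if not totals:
        return {}
    best = max(score for _, score in totals)
    if best < WIN_THRESHOLD:
        return {}
    winners = [name for name, score in totals if score == best]
    share = prize_usd / len(winners)
    return {name: share for name in winners}
```

For example, with finalists scoring (4, 3, 3, 3) and (3, 3, 3, 2), the totals are 13 and 11, so only the first clears the threshold of 12 and `allocate_prize` assigns the whole amount to it. If every total is below 12, the function returns an empty mapping.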
The final winner will be announced publicly, but the scores received by various submissions...
