EA - Straightforwardly eliciting probabilities from GPT-3 by NunoSempere
The Nonlinear Library: EA Forum - A podcast by The Nonlinear Fund
Link to original article

Welcome to The Nonlinear Library, where we use Text-to-Speech software to convert the best writing from the Rationalist and EA communities into audio. This is: Straightforwardly eliciting probabilities from GPT-3, published by NunoSempere on February 9, 2023 on The Effective Altruism Forum.

I explain two straightforward strategies for eliciting probabilities from language models, and in particular for GPT-3, provide code, and give my thoughts on what I would do if I were being more hardcore about this.

Straightforward strategies

Look at the probability of yes/no completion

Given a binary question, like “At the end of 2023, will Vladimir Putin be President of Russia?” you can create something like the following text for the model to complete:

Then we can compare the relative probabilities of completion to the “Yes,” “yes,” “No” and “no” tokens. This requires a bit of care. Note that we are not making the same query 100 times and looking at the frequencies, but rather asking for the probabilities directly:

You can see a version of this strategy implemented here.

A related strategy might be to look at what probabilities the model assigns to a pair of sentences with opposite meanings:

- “Putin will be the president of Russia in 2023”
- “Putin will not be the president of Russia in 2023.”

For example, GPT-3 could assign a probability of 9 × 10^-N to the first sentence and 10^-N to the second sentence. We could then interpret that as a 90% probability that Putin will be president of Russia by the end of 2023.

But that method has two problems:

- The negatively worded sentence has one word more, and so it might systematically have a lower probability
- GPT-3’s API doesn’t appear to provide a way of calculating the likelihood of a whole sentence.

Have the model output the probability verbally

You can directly ask the model for a probability, as follows:

Now, the problem with this approach is that, untweaked, it does poorly.

Instead, I’ve tried to use templates.
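To make the yes/no completion strategy above more concrete, here is a minimal sketch of the arithmetic only, not the author's code. It assumes you have already obtained a mapping from candidate next tokens to log probabilities (the kind of "top log probabilities" some completion APIs return); the dictionary shape and the function names `yes_probability` and `relative_probability` are hypothetical. The second function illustrates the sentence-pair idea, where 9 × 10^-N against 10^-N is read as 90%.

```python
import math


def yes_probability(token_logprobs):
    """Renormalise the probability mass on yes-like vs no-like tokens.

    token_logprobs: dict mapping a token string (possibly with leading
    whitespace or different casing) to its log probability. This input
    shape is an assumption, not a specific API's response format.
    """
    yes_mass = 0.0
    no_mass = 0.0
    for token, logprob in token_logprobs.items():
        normalised = token.strip().lower()
        if normalised == "yes":
            yes_mass += math.exp(logprob)
        elif normalised == "no":
            no_mass += math.exp(logprob)
    total = yes_mass + no_mass
    if total == 0.0:
        raise ValueError("no yes/no tokens among the candidates")
    # Ignore mass on every other token and compare yes against no only.
    return yes_mass / total


def relative_probability(logprob_a, logprob_b):
    """P(a) / (P(a) + P(b)), computed from log probabilities.

    Working with the difference of logs avoids underflow when both
    sentence probabilities are tiny (e.g. 9e-300 vs 1e-300).
    """
    diff = logprob_b - logprob_a
    if diff > 0:
        ratio = math.exp(-diff)
        return ratio / (1.0 + ratio)
    return 1.0 / (1.0 + math.exp(diff))


if __name__ == "__main__":
    # Made-up numbers, purely for illustration.
    example = {
        " Yes": math.log(0.45),
        " yes": math.log(0.05),
        " No": math.log(0.20),
        " no": math.log(0.05),
        " Maybe": math.log(0.25),
    }
    print(yes_probability(example))
    print(relative_probability(math.log(9e-30), math.log(1e-30)))
```

Renormalising over only the yes-like and no-like tokens is one possible design choice; it discards whatever probability the model puts on other completions, which may itself be informative.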
For example, here is a template for producing reasoning in terms of base rates:

Many good forecasts are made in two steps.

1. Look at the base rate or historical frequency to arrive at a baseline probability.
2. Take into account other considerations and update the baseline slightly.

For example, we can answer the question “will there be a schism in the Catholic Church in 2023?” as follows:

1. There have been around 40 schisms in the 2000 years since the Catholic Church was founded. This is a base rate of 40 schisms / 2000 years = 2% chance of a schism / year. If we only look at the last 100 years, there have been 4 schisms, which is a base rate of 4 schisms / 100 years = 4% chance of a schism / year. In between is 3%, so we will take that as our baseline.
2. The Catholic Church in Germany is currently in tension and arguing with Rome. This increases the probability a bit, to 5%.

Therefore, our final probability for “will there be a schism in the Catholic Church in 2023?” is: 5%

For another example, we can answer the question “${question}” as follows:

That approach does somewhat better. The problem is that sometimes the base rate approach isn’t quite relevant, because sometimes we have no historical record to draw on (e.g., global nuclear war). And sometimes we can’t straightforwardly rely on the lack of a historical track record: VR headsets haven’t really been adopted in the mainstream, but their price has been falling and their quality rising, so making a forecast solely looking at the historical lack of adoption might lead one astray.

You can see some code which implements this strategy here.

More elaborate strategies

Various templates, and choosing the template depending on the type of question

The base rate template is only one of many possible options.
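As a rough illustration of how the base rate template above might be used, the sketch below fills in the `${question}` placeholder and then pulls a final probability out of whatever text the model returns. This is not the author's linked implementation: the names `build_prompt` and `extract_final_probability` are hypothetical, the step that actually sends the prompt to a model is left out, and taking the last percentage in the completion is a heuristic I am assuming, based on the template ending its worked example with "is: 5%".

```python
import re
from string import Template
from typing import Optional

# The template text is copied from the post; Template handles ${question}.
BASE_RATE_TEMPLATE = Template("""Many good forecasts are made in two steps.

1. Look at the base rate or historical frequency to arrive at a baseline probability.
2. Take into account other considerations and update the baseline slightly.

For example, we can answer the question "will there be a schism in the Catholic Church in 2023?" as follows:

1. There have been around 40 schisms in the 2000 years since the Catholic Church was founded. This is a base rate of 40 schisms / 2000 years = 2% chance of a schism / year. If we only look at the last 100 years, there have been 4 schisms, which is a base rate of 4 schisms / 100 years = 4% chance of a schism / year. In between is 3%, so we will take that as our baseline.
2. The Catholic Church in Germany is currently in tension and arguing with Rome. This increases the probability a bit, to 5%.

Therefore, our final probability for "will there be a schism in the Catholic Church in 2023?" is: 5%

For another example, we can answer the question "${question}" as follows:
""")

PERCENT_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*%")


def build_prompt(question: str) -> str:
    """Substitute the forecasting question into the base rate template."""
    return BASE_RATE_TEMPLATE.substitute(question=question)


def extract_final_probability(completion: str) -> Optional[float]:
    """Return the last percentage in the completion as a number in [0, 1].

    Assumed heuristic: the model mirrors the worked example and states
    its final answer last. Returns None if no usable percentage is found.
    """
    matches = PERCENT_PATTERN.findall(completion)
    if not matches:
        return None
    value = float(matches[-1]) / 100.0
    if not 0.0 <= value <= 1.0:
        return None
    return value


if __name__ == "__main__":
    prompt = build_prompt("Will Vladimir Putin be President of Russia at the end of 2023?")
    print(prompt)
    # A made-up completion, standing in for real model output.
    fake_completion = "The base rate is roughly 90%. Therefore, our final probability is: 85%"
    print(extract_final_probability(fake_completion))
```

Returning `None` instead of guessing makes it easier to notice completions that drift away from the template's format and need to be retried or inspected by hand.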
We could also look at:

- Laplace rule of succession template: Since X was first possible, how often has it happened?
- “Mainstream plausibility” template: We could prompt a model to simulate how plausible a well-informed member of the public thinks that an eve...
