Philosophy 152
Science & Reason
Spring 2006
Lecture Notes

Inference to the Best Explanation

 

I. Background

 

A) We’ve learned from our discussion of knowledge and justification that knowledge is (approximately) justified true belief; that justification is a matter of having very strong evidence of truth (but not necessarily conclusive evidence); that evidence of truth differs importantly from motivations for believing; and that there are a variety of circumstances in which people can be misled or confused about what their evidence supports, and motivational factors can lead them to unsupported conclusions. Does, or can, science yield knowledge? If so, exactly how does justification arise through science?

 

B) One idea that we’ve examined is that science yields knowledge through inductive reasoning. We saw that there are two puzzles about induction - Hume’s problem and Goodman’s problem. I offered a response to Hume’s problem, though not to Goodman’s. But it is very important to understand the point captured in the slogan “theories go beyond the evidence”. Also, in discussing theories and observations, we saw that theories go beyond observations. So, even if induction is perfectly good, it is not enough to provide justification for theoretical beliefs. (We discussed this several weeks ago, in connection with “theories”.)

 

C) We talked a little about Popper’s remarkable idea (on at least one interpretation): scientific claims are not justified! It’s important to separate two aspects of Popper’s thought: (i) His proposed solution to the demarcation problem. He wanted to distinguish scientific from non-scientific statements, and we saw that this did not work out so well. He failed at that. What we got from his thinking instead is a distinction between good scientific behavior (accepting that your theory is false in the light of counter-evidence) and bad scientific behavior (always arguing that there was something wrong with the experiment that seemed to falsify your theory). But we found, at least in the excerpt we read, no clear way to explain that distinction: when is that reaction acceptable and when not? We will return to the demarcation problem next week.

(ii) His idea that scientific claims are never justified. At best, they are just not falsified. This is hard to accept. Note: if you think that they are justified because they’ve been put to hard tests and come through ok, then you are not accepting the extreme version of Popper’s view.

 

D) There is a related problem that we also mentioned earlier, and it deserves mention here: given any set of data, there is more than one theory that can account for it. In effect, this is a consequence of the points discussed in (C(i)) above. More radical, and sometimes silly, examples illustrate the idea: a) my tire tracks example; b) dreams, demons, and deceivers; c) Goodman-like switches (grue/bleen). As noted, there is (always) more than one hypothesis to account for any set of facts.

 

More examples: from the reading, the gremlin theory and the standard theory (involving mercury, and electrons, and photons) explain the simple observations about fluorescent lights. More serious examples are possible: a) slightly more serious - Columbus, flat earthers, and light (from Weird, p. 183); b) solar system, at least years ago; c) creation and evolution; d) If you take ordinary thought about things to be science-like, or science-light, you see the same thing. Suppose someone treats you in various nice ways. There will be two hypotheses, at least: i) the person is genuinely friendly and considerate, and likes you; ii) the person has some ulterior motive, and behaves this way to get you off-guard. No matter what happens, these theories, or some more complex variant of them, will explain the data. It’s crucial that you see that this is true. And this is the real significance of the claim that the theories go beyond the evidence.

 

This is what R. calls the “underdetermination problem” on p. 139.

 

E) R. says at the bottom of p. 139 that there are other factors (simplicity, etc.) that are used to “judge” theories. He displays on p. 93 a pattern of argument connected with this: inference to the best explanation (IBE). The idea is that theories that do better on these other factors are better justified.

 

F) So here’s an idea: IBE solves

i) Hume’s problem:

1. All observed As have been Bs

2. The best explanation of (1) is that all As are Bs

3. So, all As are Bs

ii) the underdetermination problem (including Goodman’s problem); and

iii) the demarcation problem (the bad guys are violating IBE standards).

 

G) But there are three hard questions about IBE: 1) How, exactly, should we formulate such arguments? Note for now that this is not a deductively valid argument pattern. Still, it seems like a good way to reason. 2) What, exactly, makes a theory count as “best” or “better” than some other theory? What are the criteria for this? 3) Why, exactly, is it reasonable for us to believe the theories that fit these criteria? Is this just some sort of bias, or is there a rational basis for this?

 

II. Criteria for judging theories

 

Preliminary point: I think that the criteria about to be discussed are sensible and interesting. But we can raise puzzles about them which show that there are still unanswered questions about the topic.

 

Also, “best” does not mean “true”.

 

See Rosenberg, pp. 139-40 for his list, and see the list in the Schick paper. These look similar, though R. doesn’t explain his much, so it’s hard to tell. We’ll go carefully through Schick’s.

 

A. Testability

First the authors say:

A hypothesis, H, is testable iff in conjunction with a background theory, B, it predicts something that B alone does not predict. [That is, H is testable iff there is a prediction, P, such that (B&H) implies P, but it is not the case that B implies P.]

 

Apply to gremlin theory of fluorescent lights. Assume:

 

Gremlin hypothesis (GH): a) In every fluorescent light there are gremlins with pick axes. b) When the light is switched on, the gremlins strike their axes against the side of the tube. c) This emits light.

 

Background theory: we will see the light if it comes on, the switch works. [This is what Schick says.]

 

If you think that the prediction of GH is that the lights will come on when the switch is flipped, then this is not a new prediction. B implies that. But whether the theory is testable depends upon what else you say about gremlins. If they are visible, or make sounds, etc., then it is. It’s not clear whether this is part of the theory or the background; either way, GH adds that the gremlins are in the lights, and so these effects will be found in the lights.

 

But GH+B implies that there are gremlins in the lights, whereas B does not. So that’s a new prediction.

 

Reply: Right. Revise the account of testability by adding that P must be “observable”.

 

But there are a lot of other potential observable consequences of GH+B: little pick axes in the bulbs, axe marks on the insides of the bulbs.

 

The moral of this: it isn’t so easy to construct a theory that has no testable implications. I guess you can say that the pick axes and the gremlins are all not detectable.

 

A complication: Suppose I say I’ve got a gremlinometer. Break open a fluorescent light and you’ll see that the meter registers lots of gremlins. Now it’s testable. More precisely: if B includes or implies that gremlinometers detect the presence of gremlins, then GH+B implies that gremlins will register on the meter when you break open a bulb. B does not imply that. It looks like GH is testable after all, even given the revised account.

 

B. Fruitfulness: H1 is more fruitful than H2 provided H1 makes more novel successful predictions than H2.

 

C. Scope: H1 has greater scope than H2 provided H1 predicts a greater variety of diverse phenomena than H2.

 

D. Simplicity: H1 is simpler than H2 provided H1 makes fewer assumptions than H2. 

 

i) The authors say that with fewer assumptions, a theory has “fewer ways it can go wrong” (195) and is therefore less likely to be wrong. This seems mistaken: a single assumption can be more likely to be wrong than a combination of two or more (unless it is assumed that each assumption is equally probable).
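A small numerical illustration of this objection, using made-up probabilities. It assumes, purely for the sake of the example, that the assumptions are independent, so that the probability that all of them hold is the product of the individual probabilities.

```python
from math import prod

def prob_all_true(assumption_probs):
    """Probability that every assumption holds, assuming independence."""
    return prod(assumption_probs)

# Hypothetical numbers chosen only to illustrate the point.
h1 = [0.5]        # one shaky assumption
h2 = [0.9, 0.9]   # two fairly secure assumptions

p1 = prob_all_true(h1)
p2 = prob_all_true(h2)

# Here the theory with MORE assumptions is the one less likely to be wrong.
print(p2 > p1)

# If every assumption is equally probable, fewer assumptions does win.
print(prob_all_true([0.9]) > prob_all_true([0.9, 0.9]))
```

Both comparisons come out true: counting assumptions settles the matter only when the assumptions are (roughly) equally probable.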

 

ii) Also, it is hard to count up assumptions. Example: suppose we are trying to explain the surprising outcome of a basketball tournament.

H1: It was rigged to make it exciting.

H2: player X on team Y was injured, coach C did a better job, etc.

 

In effect, H2 makes lots of separate claims about the different teams, though perhaps they all fall under some more general theories about what accounts for basketball success. Another example illustrating the same idea: apparent simplicity of conspiracy theories. (Note: they also unify.) This suggests that a very bad theory can be simple. Perhaps such a theory fails on the other criteria, so this is not a devastating objection by itself. But things are murkier: H1, and conspiracy theories, seem to need explanations for the specific outcomes. H1 doesn’t really explain anything until you add claims about what the riggers wanted to accomplish. And that will add complexity. The idea involved in the “number of assumptions” is unclear.

 

iii) There is a somewhat different way to think about simplicity. Applies best in cases in which there is a mathematical relationship under investigation. [Show graphs.] It gets more complicated when you add in issues about errors in measurement. [Show graphs.]
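The graphs are not reproduced in these notes. As a rough stand-in, here is a sketch with made-up measurements that lie roughly on a straight line. It compares a least-squares line (the "simple" curve, which tolerates small measurement errors) with a polynomial forced to pass exactly through every point (the "complex" curve). The data and the choice of these two curve families are assumptions for illustration.

```python
def fit_line(points):
    """Ordinary least-squares line y = a + b*x through the points."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    b = sxy / sxx
    a = mean_y - b * mean_x
    return lambda x: a + b * x

def interpolate(points):
    """Lagrange polynomial of degree n-1 passing exactly through all n points."""
    def poly(x):
        total = 0.0
        for i, (xi, yi) in enumerate(points):
            term = yi
            for j, (xj, _) in enumerate(points):
                if i != j:
                    term *= (x - xj) / (xi - xj)
            total += term
        return total
    return poly

# Made-up measurements: roughly y = 2x + 1, with small errors added by hand.
data = [(0, 1.1), (1, 2.8), (2, 5.3), (3, 6.9), (4, 9.2)]

line = fit_line(data)
curve = interpolate(data)

def max_miss(f):
    """Largest gap between what f says and what was measured."""
    return max(abs(f(x) - y) for x, y in data)

print("largest miss, line: ", round(max_miss(line), 3))
print("largest miss, curve:", round(max_miss(curve), 3))
print("prediction at x=6, line: ", round(line(6), 2))
print("prediction at x=6, curve:", round(curve(6), 2))
```

The complex curve fits the observations at least as well as the line (it misses none of them), yet the two disagree sharply about the unobserved case x = 6. Fit to the data so far does not by itself choose between them.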

 

iv) One additional point about simplicity: see bottom of p. 196 in Schick. Example: twins v. changing moods. Another example: gremlins. (Note: I think that this is where that hypothesis goes wrong. Not the testability criterion.)

 

E. Conservatism: H1 is more conservative than H2 provided H1 fits better with existing beliefs than H2 does.

 

III. Applying the criteria

 

A. The general claim: When a person knows of two theories, H1 and H2, and both theories predict the observations so far made, then it is more reasonable for the person to believe H1 than H2 if H1 is superior to H2 with respect to the combination of the five criteria.

 

B. One can wonder whether factors (D) and (E) are really about epistemic justification. Or do they really amount to a kind of practical consideration? Maybe it’s just easier to stick with the theory that is simpler or that fits what we already believe? Doesn’t (E) introduce a kind of bias in favor of the status quo? Or is it a kind of faith in past success?

 

C. The authors say that there is no “fixed formula” for applying these criteria. But they also say that they are not “subjective”. And they make an analogy to the distinction between day and night, or between being bald and not bald. The homework was about this. The idea seems to be that the criteria are “perfectly objective” yet we cannot “quantify” them. They also say that we cannot say exactly when day turns into night, or when a person with a full head of hair turns bald. They also say that there are borderline cases about which reasonable people can disagree, as well as clear cases.

 

Key properties of the bald/hirsute distinction: borderline cases, not quantifiable, possibility of reasonable disagreement, resists formalization, relies on factors of human judgment, objective (not subjective).

 

Perhaps the same is supposed to be true of the day/night distinction. You might think that sunset marks this one pretty precisely. But light/dark might better illustrate their point.

 

The key idea is that, as shown by the baldness case, a thing can have all the properties up to the last, and still be objective. Alternatively, not being quantifiable, etc. does not imply being subjective.

 

We can think about whether the theory choice case has these properties also. [Note: some people seriously misunderstood this passage on the HWs. The authors were not applying the criteria for theories to claims about whether someone is bald.]

 

Among the things to think about: Q1) do the analogues actually have the various properties they say they have? Q2) Does the theory choice case have those properties also? Q3) Is there an important difference between the cases that makes a difference for claims about objectivity and subjectivity?

 

Comments:

i) We should acknowledge at the outset that there is some murkiness about the topic, since the words “subjective” and “objective” are obscure.

 

ii) The analogies concern the fact that we try to draw a sharp boundary between things that mark opposite ends of a spectrum. These cases illustrate a point about vagueness and the existence of borderline cases. Dark/light example best illustrates this. Seems as if there are clear cases at each end of the spectrum, and indefiniteness in between. Perhaps the same is true of baldness.

 

iii) Re Q1): We can quantify light and dark. We can quantify number of hairs.

 

iv) Re Q3): non-comparative v. comparative judgments (dark/not dark, darker; bald/not bald, balder). There is no puzzle at all about comparative darkness judgments. The only puzzle is about where to draw the line.

 

v) Re Q1): the previous point suggests something relevant to reasonable disagreement. Maybe the reasonable thing is to agree about all the comparative judgments, agree about all the cases near the ends of the spectrum, and agree that the borderline cases are borderline cases. Don’t pretend that words are more precise than they really are. Don’t draw the line.

 

vi) Re Q3): single v. multiple criteria. Dark/light is best example of single criterion. Baldness may involve multiple criteria, since there is both number of hairs and distribution. Of course, the latter is quantifiable also. The theory choice case obviously involves multiple criteria.

 

vii) Multiple criteria in themselves do not introduce subjectivity. Paper grading analogy. This can all be “formalized” and made precise. The factors can be weighted. (Go through details.) Note that there may be some fairly “soft” factors going into the evaluations. Still, this all seems fairly clear. And, unless the scores in the individual categories are said to be “subjective,” there is no problem with the overall evaluation arrived at.
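The details are not spelled out in these notes. A minimal sketch of what a formalized, fixed-weight grading scheme might look like, with category names, weights, and scores all invented for illustration:

```python
# Hypothetical rubric: the categories and weights are assumptions, fixed in advance
# and applied the same way to every paper.
WEIGHTS = {"argument": 0.5, "clarity": 0.3, "use_of_sources": 0.2}

def overall(scores, weights=WEIGHTS):
    """Weighted average of category scores (each on a 0-100 scale)."""
    return sum(weights[c] * scores[c] for c in weights)

# Invented category scores for two papers.
paper_a = {"argument": 90, "clarity": 70, "use_of_sources": 80}
paper_b = {"argument": 75, "clarity": 95, "use_of_sources": 85}

for name, scores in [("A", paper_a), ("B", paper_b)]:
    print(name, round(overall(scores), 1))
```

Once the category scores are granted, the overall result follows mechanically. Any "softness" lives in the individual scores, not in the step that combines them.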

 

viii) But if the criteria are weighted differently in different cases, then charges of “subjectivity” seem more fitting. And that seems to be exactly what Schick is saying in the paragraph describing how the various criteria are applied. Think about this in the paper grading case. Can something that has this feature count as “objective”? Given the obscurity of the idea, it is hard to be sure, but this seems like a good situation in which to apply the word “subjective.”
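To see what is at stake, here is a sketch in the paper grading setting, with invented category names, scores, and weights. The same two papers are combined under two different weightings. Nothing about the individual scores changes, yet the ranking flips.

```python
def overall(scores, weights):
    """Weighted average of category scores under the given weights."""
    return sum(weights[c] * scores[c] for c in weights)

def preferred(first, second, weights):
    """Label of the higher-scoring paper under these weights ('A', 'B', or 'tie')."""
    a, b = overall(first, weights), overall(second, weights)
    if a == b:
        return "tie"
    return "A" if a > b else "B"

# Invented category scores; these stay fixed throughout.
paper_a = {"argument": 90, "clarity": 70}
paper_b = {"argument": 75, "clarity": 95}

# Two hypothetical weightings a grader might adopt in different cases.
stress_argument = {"argument": 0.7, "clarity": 0.3}
stress_clarity = {"argument": 0.3, "clarity": 0.7}

print(preferred(paper_a, paper_b, stress_argument))  # A
print(preferred(paper_a, paper_b, stress_clarity))   # B
```

If the grader is free to pick the weighting case by case, then the verdict depends on that choice and not only on the papers, which is why the word “subjective” starts to look apt.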

 

ix) It remains possible that there is some way to understand how to combine the criteria that avoids this result. Maybe it can be made more like the relatively objective version of the paper grading scheme.