Philosophy 105
Fall 2005
Lecture Notes - Correlations
I. Understanding Correlations
Standard form for correlation statements:
A is positively (negatively) correlated with B in P.
People often say that two properties are correlated when they “go together.” The idea behind this is reasonably clear, though there are many details that we won’t get into. The fundamental idea can best be brought out by an example:
Smoking is positively correlated with having lung cancer among people.
This means that the percentage of smokers who get lung cancer is greater than the percentage of non-smokers who get lung cancer. There are more complex things to consider, for example that the rate at which people get lung cancer varies with the amount they smoke. We won’t worry about that. So understood, correlation statements just compare two groups, in this case the smokers and the non-smokers, and see how often the other factor shows up in each group. But note that we could consider other simple correlations, e.g., one comparing “heavy smokers” (once this is given a definition) with non-heavy smokers. We could pick any threshold and compare cancer rates above and below that threshold.
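The two-group comparison can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `cancer_rates` and `positively_correlated`, the record format, and all of the figures are invented for the example and are not real data.

```python
def cancer_rates(people, threshold):
    """Return (cancer rate at or above the threshold, cancer rate below it).

    `people` is a list of (cigarettes_per_day, has_cancer) pairs.
    """
    above = [has_cancer for cigs, has_cancer in people if cigs >= threshold]
    below = [has_cancer for cigs, has_cancer in people if cigs < threshold]
    if not above or not below:
        raise ValueError("both groups must be non-empty")
    # True counts as 1 and False as 0, so sum() counts the cancer cases.
    return sum(above) / len(above), sum(below) / len(below)


def positively_correlated(people, threshold):
    """True if the group at or above the threshold has the higher cancer rate."""
    rate_above, rate_below = cancer_rates(people, threshold)
    return rate_above > rate_below


# Hypothetical population (made-up numbers, for illustration only).
people = (
    [(0, True)] * 2 + [(0, False)] * 98      # non-smokers
    + [(10, True)] * 10 + [(10, False)] * 90  # light smokers
    + [(30, True)] * 20 + [(30, False)] * 80  # heavy smokers
)

smokers_vs_non = positively_correlated(people, threshold=1)
heavy_vs_rest = positively_correlated(people, threshold=20)
```

Changing `threshold` changes which two groups are compared, which is the point made above about picking any threshold: each choice yields a different, equally simple correlation statement.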
We will later look at the connection between correlations and causal claims, which are often inferred from correlations.
II. Correlation Arguments
Correlation arguments are, in effect, a couple of simple statistical arguments combined together. The points about the sample and target populations and measured and target properties will apply.
Example: suppose I want to know if the students in this course who study a lot learn the material better. So I divide students up into two groups - those who study a lot and those who don’t. Then, I check their grades. But how do I make the division into the two groups? Perhaps I ask them how much they study. So, then the real basis for the division into groups is “reported hours studying”. But the intended groups are those who study a lot and those who don’t. And grades are being used as a measure for learning the material. So the actual results might be something like this:
75% of the students who reported that they studied 3 or more hours per week got a grade of
B or better.
30% of the students who reported that they studied fewer than 3 hours per week got a grade
of B or better.
The measured correlation would then be:
Reporting that one studies 3 or more hours per week is positively correlated with getting a grade of B or better among the students surveyed.
The accuracy premise would then link this result to a correlation among the target properties: studying hard and learning the material.
Studying 3 hours or more per week is positively correlated with effectively learning the material among the students surveyed.
This is the target correlation in the sample. Notice that, in effect, there are two inferential steps here, one going from reports about studying to how much the students actually study and another going from grades to learning the material.
If I only asked some of the students - say the ones who were present on a particular day - then the rest of the argument would involve a representativeness premise, linking the students surveyed to all students.
The whole argument could be put into the argument pattern displayed on p. 265 of the text.
Questions about the study could then involve the accuracy of each measured property for its related target property. If there is reason to think that students misreport how much they study, this affects the accuracy premise. If there is reason to think that the grades do not reflect how well they learned the material, this, too, affects the accuracy premise. And if there is reason to think that the students who are in the sample are not like the other students, then there is reason to question the representativeness premise.
III. Examples
Example 1 [Not done in class. Similar to question on quiz.]
Researchers were interested in finding out whether wealthy people or less affluent people tended to carry more cash with them when they went out shopping. To find out, they went to a mall in a middle class suburb of Cleveland and questioned 1000 adults over the course of several days. Among other things, they asked people about their income and about the amount of cash they had with them. They found that of the people with family incomes below $50,000 per year, about 35% had more than $100 in cash with them. Only about 20% of the people with incomes above that level had more than $100 in cash. They hypothesized that the greater availability of credit to those with higher incomes explained the surprising result.
Sample Population: 1000 adults who were questioned at a mall in a middle class suburb of Cleveland.
Target Population: All (American?) adult shoppers.
Measured Properties: Responses to questions about income and amount of cash being carried.
(Specifically, saying that one’s family income is above/below $50,000, saying one has more/less
than $100 cash.)
Target Properties: Actual family income, amount of cash. (Specifically, having a family income
above/below $50,000; having more/less than $100 in cash.)
1. Results: 35% of the people who reported an income below $50,000 reported having more than
$100 in cash.
20% of the people who reported an income above $50,000 reported having more than $100 in
cash.
Reporting having an income below $50,000 is positively correlated with reporting having more
than $100 in cash among the people surveyed.
2. Accuracy Premise: [Reports about income and cash were accurate.] (Note: you could write
this out in more detail.)
3. Result in Sample: Having an income below $50,000 is positively correlated with having more
than $100 in cash among the people surveyed. (1), (2)
4. Representativeness Premise: [The sample population is like the target population. The
percentages in the two groups are the same.]
5. Conclusion: Having an income below $50,000 is positively correlated with having more than
$100 in cash among American adult shoppers. (3), (4)
Here are some proposed criticisms.
1) Since there is some reason to doubt that people will tell the truth about the amount of money they have with them, there is reason to doubt the accuracy premise of this argument.
2) Since the people in the survey were shopping at a mall in a middle class suburb you have reason to doubt the representativeness premise of this argument.
3) Since people are more likely to carry large amounts of cash with them when they are shopping than at other times, you have reason to doubt the representativeness premise of this argument.
4) Since only 1,000 people participated in the survey and many thousands of people shop at the mall every day, you have reason to doubt the representativeness premise of this argument.
(3) and (4) are bad criticisms. (1) and (2) are at best modestly good criticisms. More effective criticisms would be ones that brought out reasons to think that the dishonesty mentioned in (1) would skew the results. Suppose people are afraid to say that they have a lot of cash. Since that fear may well be present no matter what their income, it may be that the actual percentage of people with more than $100 in cash is higher in both groups, in which case the measured correlation could still reflect a real one.
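The difference between dishonesty that merely exists and dishonesty that skews the result can be sketched with some arithmetic. In the Python below, only the 35% and 20% figures come from the survey. The function `implied_true_rate`, the denial fractions, and the assumption that no one falsely claims to have more than $100 are all hypothetical, chosen just to illustrate the point.

```python
def implied_true_rate(reported_rate, denial_fraction):
    """Rate of people who really have more than $100 in cash.

    Assumes `denial_fraction` of the people who have that much cash deny it,
    and that no one falsely claims to have it (both are assumptions).
    """
    if not 0 <= denial_fraction < 1:
        raise ValueError("denial_fraction must be in [0, 1)")
    return reported_rate / (1 - denial_fraction)


REPORTED_LOW_INCOME = 0.35   # from the survey
REPORTED_HIGH_INCOME = 0.20  # from the survey

# Case 1: both income groups deny at the same (hypothetical) rate.
low_same = implied_true_rate(REPORTED_LOW_INCOME, 0.2)
high_same = implied_true_rate(REPORTED_HIGH_INCOME, 0.2)
correlation_survives = low_same > high_same

# Case 2: only the higher-income group denies (hypothetical).
low_diff = implied_true_rate(REPORTED_LOW_INCOME, 0.0)
high_diff = implied_true_rate(REPORTED_HIGH_INCOME, 0.5)
correlation_reverses = high_diff > low_diff
```

In the first case both true rates are higher than the reported ones, but the lower-income group still comes out ahead, so the accuracy premise is not seriously threatened. Only in the second case, where the misreporting differs between the groups, does the correlation change direction. A strong criticism would need to give a reason to think something like the second case is actually true.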
Example 2: Kids who beat up other kids get their way, #6, p. 268:
Start by identifying the simple statistical facts that were discovered. In this case, we don’t get a lot of detail. But we do get enough to proceed. A good way to think about these kinds of cases: there’s an overall population that is divided into two sub-groups. The division is made on the basis of one property - one sub-group has it and the other doesn’t.
Combined sample population: 875 3rd graders in Columbia County, NY in 1960.
Sub-groups: those who were said by their peers to be aggressive; those who were said not to be aggressive by their peers.
The other measured properties involve peer assessment at age 19 and criminal record at age 30. We’ll ignore the first of these.
So, the underlying simple statistical statements will be like this:
Result 1. x% of the people who as 3rd graders were said by their peers to be aggressive had a criminal record at age 30.
R2. Less than x% of the people who as 3rd graders were said by their peers not to be aggressive had a criminal record at age 30.
We will use this to get a measured correlation, and from that make an inference about the sample, and from that draw a conclusion about the broader population.
Being aggressive as a child is positively correlated with committing crimes as an adult among people.
A: Being aggressive as a child.
B: Committing crimes as an adult.
P: All people.
So, the correlation claim implies:
Among all people, the percentage of As who are Bs is greater than the percentage of ~As who are Bs.
So, they want to establish
% of As who are Bs > % of ~As who are Bs.
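For readers who prefer probability notation, the same claim can be written as a comparison of conditional probabilities. The notation is an assumption of this restatement, since the notes themselves work with percentages. Here Pr(B | A) is read as "the proportion of As in P who are Bs."

```latex
\[
  \Pr(B \mid A) \;>\; \Pr(B \mid \lnot A) \qquad \text{(within population } P\text{)}
\]
```

A negative correlation would be the same statement with the inequality reversed.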
So, we can formulate the correlation argument:
Sample population: 875 people who were third graders in 1960 in a semi-rural area (S). Divided
into two groups - those who were said by their peers to be aggressive and those who were not.
Target population: all people?
Measured properties: Being said to be aggressive, Having a criminal record at age 30.
Target properties: Being aggressive, committing crimes as an adult.
The argument:
1. Results: [Not provided.]
2. Measured correlation in sample: Being said by one’s peers to be aggressive as a child is
positively correlated with having a criminal record at age 30 in the people studied.
3. Accuracy premise: Being said to be aggressive is a good measure of being aggressive and
having a criminal record is a good measure of having committed a crime. (That is, if (2), then being aggressive as a child is positively correlated with committing crimes as an adult in the sample.)
4. Target correlation in sample: Being aggressive as a child is positively correlated with
committing crimes as an adult in the people studied. (2), (3)
5. Representativeness premise: If (4), then being aggressive as a child is positively correlated
with committing crimes as an adult in all people.
6. Conclusion: Being aggressive as a child is positively correlated with committing crimes as an
adult in all people. (4), (5)
We just have to take their word for it that (2) is true.
I have no compelling reason to doubt (5). To doubt it requires some reason to think that the correlation holds in this population, but not more generally. That could be, and the study is far from conclusive. But I know of no strong reason to be very suspicious of this step. You’d need a reason to think that this population is not typical.
Is (3) true? This amounts to asking (a) whether peer reports accurately measure who was aggressive and (b) whether having a criminal record accurately measures who has committed crimes. There may be some grounds for doubting (b): those seen to be more aggressive may get arrested more because they are suspected more. So there is at least some reason to think that this way of measuring the two properties may be inaccurate.
So, this looks to be a fairly convincing, though underdescribed, study. The conclusion is not terribly surprising.
IV. Inference to the Best Explanation, Strength of Evidence, and Criticisms of Statistical Arguments
A distinction emphasized here is between two kinds of criticism: a) weak ones that mention some alternative possibility to the conclusion drawn, and b) stronger ones that provide a good reason to think that some alternative is actually true. Possibly you can better appreciate the distinction by thinking of these arguments in terms of inference to the best explanation. We observe some pattern in the sample (as in the example of the cash carried by shoppers). One explanation for this pattern is the one drawn in the argument. This explains the observation by identifying it as an instance of a general pattern. There are alternative explanations, involving people lying, etc. It’s a good thing to be able to think up alternatives - that’s what enables you to see flaws in arguments. But the mere existence of an alternative does not show that the stated explanation is not the best (i.e., most reasonable) one.
It’s possible that, in the context of doing research, a hypothesis must meet a higher standard of evidence before it is accepted. Thus, we might conclude that some results provide a pretty good reason to believe some conclusion, since that conclusion is the best explanation of those results. But one might also want something stronger, something that rules out the less plausible alternatives more conclusively. So we might say that the results make some theory or hypothesis more reasonable than not, yet they do not establish the hypothesis with certainty.