### So, what IS the point of those introductory college statistics classes, anyway?

For the past few months, I’ve been tutoring a college student in statistics. This tutee, unlike the other one I took on this summer, is good-natured, engaged, reasonably comfortable with basic mathematics, and in general an absolute pleasure to work with. But I figure there’s a reason that TD&M gets more hits in an hour than there are holds on every copy of Pollyanna in BC’s public libraries put together, and why mess with a winning formula? So enough about that student.

Let’s talk instead about the purpose of these statistics requirements for business and social science majors.

I taught such a course last year. My objective going into the course - an objective that made its way onto the syllabus - was for my students to emerge with a decent ability to assess and interpret quantitative data. Every lesson plan was subordinated to this purpose. I taught the standard intro-stats notation and terminology, but only as a means to an end. Each and every one of the ten quizzes and three tests that my students wrote contained at least one question that required them to answer in plain English. It was not enough that they be able to give me the bounds of a confidence interval; to receive full credit, they needed to tell me what it meant. It was not enough for a student to tell me that a sample proportion fell inside the critical region and that we should therefore reject the null hypothesis; they needed to tell me what that meant in terms of a manufacturer’s or politician’s claim.

The class, for the most part, went rather well. Marks weren’t great; but I was confident that a good mark in my statistics class indicated a genuine understanding of *statistics*, and not just an ability to pluck out numbers that, when plugged into a meaningless equation, will yield the numerical answer that will be marked correct.

The statistics class that my tutee just completed was quite different. It covered the exact same topics as my class - sampling, measures of central tendency, distribution, probability, estimations of means and proportions, hypothesis testing - but this professor’s teaching and testing style was very different from mine. He gave a lot of assignments, and was fond of breaking questions down into multiple parts that lead a student to the answer. I’m not entirely opposed to some amount of that - hell, I think that such questions are probably the best way, at least initially, to deal with students who freeze when confronted with a problem that they can’t solve immediately - but this prof’s multi-part questions were…ill-conceived, to say the least. In particular:

- Every question on a certain topic followed *exactly the same template*, and no two topics shared the same template. My tutee quickly figured out that an eight-part question in which the first part asked for a sample proportion and the second asked for the claimed population proportion was a hypothesis test that called for formulas 8.3 and 8.4. She also figured out that you could plug the first, second, third, and fourth numbers given in the question, respectively, into 8.3; 8.4 used the result of 8.3, along with the fifth number given in the question.
- Even if I had devoted my best efforts to the task, I could not have written questions more leading than this guy’s. For instance:

  …The distribution of student weights is unknown. 42 students are weighed…

  a) …

  b) …

  c) Which of the following applies:

  - We can use Formula 7.3 because the distribution of student weights is normal.
  - We can use Formula 7.3 because the sample size is at least 30.
  - We cannot use Formula 7.3 because the distribution is not normal and the sample size is less than 30.

- The man was a certifiable jargon/notation *fetishist*. Tell me, how the hell *else* do you explain a question of the form “What is the sign (less than, greater than, or not equal to) that appears in the statement of the alternative hypothesis H_A?” Or - and this is my bias talking, because I can never for the life of me remember which label goes with which - the query “would this be a Type I or a Type II error?” with *no follow-up*.
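For readers who haven’t seen a one-proportion hypothesis test in a while, here’s a minimal Python sketch of the kind of calculation those templated questions walk through, plus the plain-English conclusion the professor’s questions never asked for. I don’t know what the course’s formulas 8.3 and 8.4 actually were; this assumes the usual textbook pair, the sample proportion p̂ = x/n and the test statistic z = (p̂ − p₀)/√(p₀(1 − p₀)/n). The poll numbers and the 60% claim are made up for illustration.

```python
import math

def one_proportion_z_test(successes, n, p0, alpha=0.05):
    """Two-sided z-test of H0: p = p0 against HA: p != p0.

    Relies on the normal approximation, so n*p0 and n*(1 - p0) should both
    be reasonably large (a common rule of thumb is at least 10).
    """
    p_hat = successes / n                        # sample proportion
    se = math.sqrt(p0 * (1 - p0) / n)            # standard error if H0 is true
    z = (p_hat - p0) / se                        # test statistic
    p_value = math.erfc(abs(z) / math.sqrt(2))   # two-sided p-value, 2 * P(Z > |z|)
    return p_hat, z, p_value, p_value < alpha

# Hypothetical: a politician claims 60% support; a poll finds 210 of 400.
claim = 0.60
p_hat, z, p, reject = one_proportion_z_test(successes=210, n=400, p0=claim)
print(f"p-hat = {p_hat:.3f}, z = {z:.2f}, p-value = {p:.4f}")
if reject:
    direction = "lower" if p_hat < claim else "higher"
    print(f"Reject H0: support looks {direction} than the claimed {claim:.0%}.")
else:
    print(f"Fail to reject H0: the poll is consistent with the claimed {claim:.0%}.")
```

The last two lines are the point: the number crunching is the easy part, and the sentence about the politician’s claim is what a student should be able to write.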

None of the questions had an “explain in plain English” portion. My tutee, whose term mark was among the highest in her class, could tell me that in Question #4 of Section 9, we reject the null hypothesis because x-bar fell in the critical region, but she could not tell me that what this meant was that the lightbulb manufacturer’s claim was bullshit. She solved the problems on her tests and assignments by pattern-matching on the rigid templates and by following the leading questions. When I worked with her two nights before her exam, she stated matter-of-factly that she expected to forget everything from the course the next day.
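Here’s roughly what that lightbulb question looks like, sketched in Python under some assumptions: a large-sample z-test for a mean (the n ≥ 30 rule the course used) and a one-sided alternative. The claimed 1,000-hour lifetime, the sample of 40 bulbs, and the other numbers are invented; what matters is the final sentence, which is the part my tutee couldn’t supply.

```python
import math
from statistics import NormalDist

def lower_tail_mean_test(claimed_mean, sample_mean, sample_sd, n, alpha=0.05):
    """Large-sample z-test of H0: mu = claimed_mean against HA: mu < claimed_mean."""
    se = sample_sd / math.sqrt(n)               # standard error of x-bar
    z_crit = NormalDist().inv_cdf(alpha)        # about -1.645 when alpha = 0.05
    xbar_crit = claimed_mean + z_crit * se      # x-bar values below this are in the critical region
    z = (sample_mean - claimed_mean) / se
    return z, xbar_crit, sample_mean < xbar_crit

# Hypothetical: claimed mean life 1000 h; 40 bulbs average 968 h with s = 80 h.
z, xbar_crit, reject = lower_tail_mean_test(1000, 968, 80, 40)
print(f"z = {z:.2f}; critical region is x-bar < {xbar_crit:.1f} hours")
if reject:
    print("In plain English: the evidence suggests the average bulb lasts less than the claimed 1000 hours.")
else:
    print("In plain English: these data give no good reason to doubt the manufacturer's claim.")
```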

If this is what one of the top students in this statistics class has taken from the course, then I think it’s a pretty safe bet that this statistics class is not preparing students to assess and interpret quantitative data.

But I can’t hold this professor responsible, because he seems to be doing a good job under the circumstances: he’s got a jam-packed curriculum to follow, and is responsible for delivering a bevy of content at the expense of skills. Even though this prof gives plenty of practice problems that prepare students for the very predictable tests, and even though he gives excellent notes, and even though he is available for plenty of extra help…despite all that, my student - who has been sick half the term and who, by her own admission, has been slacking off lately - is one of the top students. I’m certainly not going to second-guess what I presume was his conclusion that his students could not handle a more rigorous course, one that aims to train students to assess and interpret non-canned quantitative data.

I can’t blame him for concluding that there’s no way he can deliver such a course successfully, so he might as well not have his students hate him by the end of the term. And if that means that there’s no guarantee that an A student will understand what it means for a poll to be accurate within three percentage points nineteen times out of twenty, then so be it.
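For the record, “accurate within three percentage points, nineteen times out of twenty” describes a 95% confidence interval for a proportion. A minimal sketch, assuming a simple random sample and the worst-case proportion p = 0.5 that pollsters typically quote, shows where the familiar sample size of roughly a thousand comes from:

```python
import math
from statistics import NormalDist

def z_for(confidence):
    """Two-sided critical value, e.g. about 1.96 for 95% confidence."""
    return NormalDist().inv_cdf(1 - (1 - confidence) / 2)

def margin_of_error(n, p=0.5, confidence=0.95):
    """Half-width of the confidence interval for a proportion."""
    return z_for(confidence) * math.sqrt(p * (1 - p) / n)

def sample_size_for(margin, p=0.5, confidence=0.95):
    """Smallest n whose margin of error is at most `margin`."""
    return math.ceil((z_for(confidence) / margin) ** 2 * p * (1 - p))

n = sample_size_for(0.03)
print(n, round(margin_of_error(n), 4))
```

In plain English: if the poll were repeated many times, about 19 of every 20 intervals built this way (result ± 3 points) would contain the true level of support.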

And there’s a big problem with that. If there’s one math class in which the question “what’s the point of this?” should never *ever* come up, surely it is introductory statistics. But I can’t for the life of me see how anyone could justify teaching a statistics course like the one I tutored.

I wish I could design such a course, because having taught it once, I know exactly what I’d do differently if I were granted full control over the format. In two words: less content. Oddly, calling for less content in a math class tends to invite charges of “dumbing down”, and we can’t have that! Never mind that the textbooks of yore contained vastly less content than the ones of today, but emphasized mastery and application.

Here’s what I’d trim out of a single-semester intro stats class:

- Most of the probability section. I love probability - so much that I spent far too much time on it last term - but it’s easy to underestimate just how much difficulty students have with it. I’d get rid of everything that isn’t necessary for binomial probability applications, and leave those in only because of the normal approximation to them. (Height of stupidity: spending three weeks on permutations and combinations, and then glossing over the connection between probability and statistics. Yes, I did that last year.)
- The Student’s-t distribution. There’s more than enough you can do with large samples and the normal distribution, and if we’re going to wave our hands over the Central Limit Theorem anyway, why confuse matters with the rule that samples of size thirty use Table A5 while samples of size 29 use Table A7? This time would be better spent elsewhere.
- The “estimating the standard deviation” section, which is not where that time should go either. Estimating means is simpler and more relevant, and students still struggle with it.
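Since the binomial material would stay in mainly to set up the normal approximation, here’s a small sketch comparing the two, assuming the usual continuity correction (add 0.5 to the cutoff). The choice of 100 trials with p = 0.5 and a cutoff of 45 is arbitrary.

```python
import math
from statistics import NormalDist

def binomial_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p)."""
    return sum(math.comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k + 1))

def normal_approx_cdf(k, n, p):
    """Normal approximation to P(X <= k), with the continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1 - p))
    return NormalDist(mu, sigma).cdf(k + 0.5)

n, p, k = 100, 0.5, 45
exact = binomial_cdf(k, n, p)
approx = normal_approx_cdf(k, n, p)
print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```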

The leftover time - and really, there isn’t much when you cover the rest of the course at a reasonable pace - can be used for hands-on activities, which are so natural for a statistics course. It can be used to have students *design* the sorts of questions that usually appear on tests: the data they encounter when they see the latest polls, or when they precisely weigh a bag of apples, provides suitable fodder for a variety of such problems. It can be used to discuss why one researcher would rather risk Type I errors, and another Type II errors. It can be used to emphasize, over and over and over again, the implications of the material everywhere.
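A quick simulation makes the Type I / Type II tradeoff concrete. This sketch assumes a two-sided z-test on the mean of normally distributed data with known σ; the claimed mean of 100, σ = 15, n = 36, and the “true” mean of 105 used for the Type II case are all made-up parameters. When the null hypothesis is true, the rejection rate should hover near α; tightening α to 0.01 lowers that risk but raises the chance of missing a real difference.

```python
import math
import random
from statistics import NormalDist, fmean

def rejection_rate(true_mean, claimed_mean=100.0, sigma=15.0, n=36,
                   alpha=0.05, trials=10_000, seed=1):
    """Fraction of simulated studies in which a two-sided z-test rejects H0: mu = claimed_mean."""
    rng = random.Random(seed)                    # fixed seed so runs are reproducible
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    se = sigma / math.sqrt(n)
    rejections = 0
    for _ in range(trials):
        xbar = fmean(rng.gauss(true_mean, sigma) for _ in range(n))
        if abs(xbar - claimed_mean) / se > z_crit:
            rejections += 1
    return rejections / trials

type_1 = rejection_rate(true_mean=100.0)                      # H0 true: rejecting is a Type I error
type_2_05 = 1 - rejection_rate(true_mean=105.0)               # H0 false: not rejecting is a Type II error
type_2_01 = 1 - rejection_rate(true_mean=105.0, alpha=0.01)
print(f"Type I rate at alpha=0.05: {type_1:.3f}")
print(f"Type II rate at alpha=0.05: {type_2_05:.3f}; at alpha=0.01: {type_2_01:.3f}")
```

A researcher screening drugs for harmful side effects and one checking a quality-control claim might reasonably pick different points on that tradeoff, which is exactly the discussion worth having in class.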

I don’t think that such a course would be at all dumbed-down from the one that my tutee took this year; to the contrary, it would require students to think far more deeply about the material. But such a course would be faithful to what I assume are the reasons for teaching introductory statistics. And if I were to teach it, I’d feel a lot better about the answers I give to *what’s the point of this stuff* than I would if I were instead responsible for delivering the more content-heavy statistics class that nearly every business and social science department requires its students to take.

I disagree on the Student’s t. We had to learn it in Sophomore Analytical Chemistry because of the low sample sizes you get from analyzing small physical samples. The same goes in business. It often costs too much to stop production lines more than 30 times to get samples, or to conduct more than 30 focus groups. I think a little time ought to be spent there to hammer in the GIGO concept and why confidence intervals get better with sample size, as well as Student’s t, but for business people, those concepts ought to all be floating around in the same space in their heads, ready to use and apply.
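To see the small-sample point in numbers, here’s a sketch comparing 95% confidence-interval half-widths using Student’s t versus the normal z value, assuming a sample standard deviation of 1. The t critical values are hard-coded from a standard two-sided 95% t table (df = 4, 9, 29) rather than computed, since Python’s standard library has no t distribution.

```python
import math
from statistics import NormalDist

# Two-sided 95% critical values of Student's t, keyed by degrees of freedom,
# copied from a standard t table (hard-coded; not computed here).
T_975 = {4: 2.776, 9: 2.262, 29: 2.045}
Z_975 = NormalDist().inv_cdf(0.975)   # about 1.960

def half_width(n, s, use_t=True):
    """Half-width of a 95% CI for a mean from n observations with sample SD s."""
    crit = T_975[n - 1] if use_t else Z_975
    return crit * s / math.sqrt(n)

for n in (5, 10, 30):
    print(f"n = {n:2d}: t gives +/- {half_width(n, 1.0):.3f}, "
          f"z gives +/- {half_width(n, 1.0, use_t=False):.3f}")
```

With five observations, using z would understate the uncertainty by roughly 30%; by n = 30 the two differ by only a few percent, which is roughly the logic behind the size-30 rule of thumb.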

But I perfectly agree about less content. My Tae Kwon Do instructor used to say that brown belts hit a slump because they forgot the basics and started fooling around with jump kicks that were designed to knock people off of horses, not for hand-to-hand combat. It was when students remembered to start practicing the basics again that they got good enough to test for black.

You know, perhaps for a 2nd semester course in stats you could do ANOVA and Student’s t and goodness-of-fit tests, etc.

For intro stuff, calculating a few descriptive stats and doing some hypothesis testing on proportions and means, and talking about some common stats pitfalls would be really, really important. Most of these biz people will not be doing actual quality control stuff themselves, and the people who do will need to know a *lot* of stats (my stepfather does quality control checks of manufacturers certified by Underwriters Labs… he definitely needed a lot more than an intro stats course.)

Meep, I was thinking of the intro stats course in the MBA curriculum. If you and MS are talking about undergrad business majors, you and she are probably right.

Intro stats for undergrad science majors must cover Student’s t.

Especially stats for chemists. Student worked for Guinness Brewery for heaven’s sake. Every chemist in the world knows and loves him.

Most analytical chemistry classes will teach Student’s t, no?

I feel underknowledged in stats. Anyone have a (text)book recommendation? Assume I don’t remember most probability beyond the basics, but I did know it once, and I’m not looking for something “media literacy for innumerates” level, but more an actual grounding in stats like I would have gotten from actually taking (good) stats classes.

During my graduate program at UC Davis I took an upper-division stats course taught in the School of Education. Most of the students were in the School’s Ph.D. program. The professor was not a mathematician or statistician. He was an education professor with a long history of publication in ed research and his focus was almost entirely on why certain measures and tests are better choices for certain types of data. (He was actually pretty weak on the mathematical underpinnings of stats and occasionally appealed to the math geeks in the class for assistance.) We were actually using the same textbook as one of my colleagues at my junior college was using for her introductory stats course (a lower-division course that many universities would not accept for transfer credit).

While I thought the UCD course could have stood some improvement (and the prof really should have done more preparation for some of his examples), the main point is that no one in the class could have any doubt about its purpose. We were expected to know *why* we were choosing certain statistical measures and tests in the analysis of our research data and *what it meant*. That dimension, all by itself, saved the class.

MS, I sure wish I had had you as a teacher for sadistics! I took a stats for business course which was combined with learning Fortran (long before the days of PCs), and then some years later I took a math course in Probability which required two years of college calculus as a prereq. I did okay in the math, even though the application phase of probability math seemed to be witchcraft, and I ended up with an A in the course, much to my surprise. But I never really understood what the various formulas actually meant in the real world. Sure, I could take a problem, figure out how to solve it, and come up with the answer, but I never had a FEEL for it. My bag is engineering, not all that theoretical math stuff.

John - yes, I’m talking about business/social science stats classes - should have made that more clear. Nevertheless, even in the case of stats for science students, I don’t see why we can’t include the Student-t stuff in the *second* semester. Why the need to fit everything into a three-month period? Also, since you agree with the less-content suggestion - what content would you cut?

Wolfangel - alas, I’m really not the one to ask; I actually hated statistics and didn’t learn it properly (I’m not sure which caused which) until shortly before I taught it.

Rex - thanks. However, I should point out that [above] is NOT the statistics course I taught, and is probably not the statistics course I’d teach if I ever had to teach statistics again. Like nearly every other sessional college statistics instructor, I had a massive curriculum to follow, and I had no choice but to adopt the “talk at the students for three hours per week” format if I wanted to get through it. And re “much to my surprise I ended up with an A” - sadly, that describes a lot of students’ experiences in high school and college math classes. (Which is another reason to “dumb down” classes: I’d rather have a student get an A because they understand an “easy” course, than because everyone bombed the more difficult course but then marks were scaled.)

Two years ago, I took a graduate level course in multivariate statistics. It was taught by a marketing prof in the business school - and he did a really good job of going into some of the math, and requiring us to understand what we were talking about. There were usually two or three problems assigned each week, and we had to show our results (tables, tests, etc) and discuss what they meant. I probably wrote 15 pages or more (including the charts and tables, so not quite as much text as it sounds like) for that class. And I think everybody got a lot out of it, even the folks who were somewhat underprepared for the topic.

I definitely agree, based on that experience, that it’s really critical for students in a stats class to be able to explain in clear English what the results mean, even in upper level courses.

With luck, one of your many mathy commenters will have a suggestion.

wolfangel: I liked Larsen & Marx, but I haven’t used it much outside of the class (as a student.)

wolfangel — what exactly do you want to know? If you don’t plan on actually using stats, then I suggest “How to Lie with Statistics” by Huff and “The Cartoon Guide to Statistics” by Gonick. They give you the meat of important stuff.

My favorite stats book is at work, and I don’t remember the author’s name. It’s got excellent case studies, including a real-life Simpson’s paradox involving sex discrimination in grad school admissions at Berkeley. Cool stuff.
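For anyone who hasn’t run into Simpson’s paradox, here is a tiny illustration in Python. The counts are invented, not the Berkeley figures: within each department women are admitted at a higher rate than men, yet pooled across departments they’re admitted at a lower rate, because most women applied to the more selective department.

```python
# Invented counts for illustration: (admitted, applied) by department and group.
applications = {
    "Dept A": {"men": (90, 120), "women": (16, 20)},   # less selective
    "Dept B": {"men": (5, 20), "women": (36, 120)},    # more selective
}

def overall_rate(group):
    """Admission rate for a group, pooled across departments."""
    admitted = sum(dept[group][0] for dept in applications.values())
    applied = sum(dept[group][1] for dept in applications.values())
    return admitted / applied

for name, dept in applications.items():
    rates = {group: admitted / applied for group, (admitted, applied) in dept.items()}
    print(name, {group: f"{r:.0%}" for group, r in rates.items()})
print("Overall", {group: f"{overall_rate(group):.0%}" for group in ("men", "women")})
```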

Oops, just reread your comment. I’ll look up my Stats book next week. It’s probably out-of-print, but in this Amazon age you should probably be able to find it.

I just finished teaching an intro statistics course for the first time. I found that this was the first course I have taught where the use of TI calculators (which otherwise I agree are harmful to learning) improved what we could do in the class. We were able to focus on interpretation of the meaning of results and understanding how to analyze different scenarios. I found that we could cover a pretty full syllabus (z, t, chi^2, ANOVA, and enough probability to do binomial distributions and connect them to normal distributions, etc.) Did you use calculators in the statistics course you taught?

So the departments wanted the students to gain a basic grasp of statistics, but discussions like the above (“You’re not going to leave out *X*, certainly?”) failed to pare the syllabus enough to fit the time they wanted to make available. (It’s only an introduction, see, so you can go *really* fast!)

Some of the students *might* have an interest in *what’s the point of this stuff*, but for many the concepts come so fast and furiously that they resort to pattern-matching, just to keep up. (The ones who are merely after the degree, and see the stats course as just one more ticket to be punched, are pattern-matching anyway.)

Meanwhile, the instructor is faced with the problem of (1) getting through all the material in the time allowed and (2) achieving a reasonable pass rate. (“It’s only an introductory course, fer cryin’ out loud. You expose people to the techniques. How can you have 75% failure?”) So he structures the questions and examples so that the pattern-recognizers can achieve a passing grade. Advanced or bright students may get more out of the course, but that’s no longer the point.

And everybody gets what they wanted: The departments can say the students have been “introduced” to statistics, and the students continue the progress toward their credentials unhindered. Mastery has gone out the window, and the only ones to lament its passing are the few students old-fashioned enough to have believed that part of the idea of a university education was to actually learn something, and those in the faculty who are old-fashioned enough to agree with them.

(Yes, I’m feeling gloomy this evening. The situation unpleasantly recalls my years-ago Fields & Waves course, which had originally taken two semesters. When the uni converted to quarters, they made it a two-quarter course; then, when the uni converted *back* to semesters, they decided they could do it all in one semester. Without adjusting the amount of material. Which was when I got to take it. Needless to say, it was a trainwreck. I got through, but I certainly didn’t master the material. There was no time.)

I think the purpose of an intro stats course in college is to introduce the ideas and concepts of descriptive and inferential data analysis (students should know how to read and interpret graphs, p-values, confidence intervals, etc.) and to give them the vocabulary and understanding to be able to work with a statistician. In practice I think most data-intensive research should have someone who acts as a statistician (it may be an applied scientist with a strong stats background).

Many of my students will get work as field technicians somewhere (they are primarily environmental science students). When they work, they will probably be expected to implement data collection protocols designed by a biometrician. In order to effectively work together, they need to know some of the basics of inferential and descriptive statistics. They do not need to know how to do multiple regression, or ANCOVAs or anything very complicated. In fact, in most cases it would be better if they did not know, since it’s pretty easy to ‘know how to do’ such analyses without really understanding what’s going on. Just because R/SAS/SPSS/etc. can spit out an answer using the method you choose, doesn’t mean it was the correct method.

The stats book meep mentioned reminds me of the one I use to teach a class which fulfills the general education requirements. It’s called “Seeing Through Statistics” by Jessica Utts and it is now in the third edition. It is not a book that will teach someone how to do data analysis, but it covers things that the ‘educated citizen’ could stand to know in order to evaluate studies and the news reports based upon them.

Seeing Through Statistics is probably too much of a “media literacy for innumerates” book for wolfangel’s needs, but I actually found it to be very useful. I have taken many graduate level courses in math and statistics (both math/stats and applied stats), but it was only in going through this book that I really thought about how all this theory and somewhat esoteric application actually got communicated (often poorly) to the public at large. It also spends a number of pages discussing the non-mathematical aspects of data collection, analysis, and interpretation (e.g., correlation does not imply causation). After all, the most amazing data analysis ever is not worth much if it was a lousy sample to begin with.

My educational background is primarily in math and statistics, and I think the material covered in a book like “Seeing Through Statistics” is more important to most people than the material I cover in an Intro to Stats course more like the one MS has described (where we actually focus on data analysis). It’s helpful to do some basic experiments and data analysis (with coin flips or whatever) to understand the relationship between randomness/probability distributions and statistical inference, but in practical examples things can get complicated very quickly. (I tell my students that it’s very easy to collect data that’s hard to work with, and despite my attempts to help them out on their projects they often collect difficult data anyway.)

MS, this has nothing to do with your post, but when I read this news clip I immediately thought of you: http://www.cbc.ca/story/canada/national/2005/12/16/smarties-claim20051216.html

Now there’s an example of fine math teaching in action.

Matt - oh, that’s great! Thank you!

mgoff:

*Just because R/SAS/SPSS/etc. can spit out an answer using the method you choose, doesn’t mean it was the correct method.*

Oh, hell YES. You would not *believe* how many students I had who plugged numbers into completely incorrect formulas (and I mean *completely* - as in, the question required them to test a hypothesis about a mean, and they’d stuff numbers into the formula for a binomial distribution - huh?) and then argued that they deserved nearly full marks, like four out of five, because even though they used the wrong formula, they followed through with it correctly! And I’d try desperately to explain that if they didn’t understand *what the formulas were for* and when to use them, then they just plain didn’t get the material.

azindik - Yeah, I did use calculators in my stats class. Like you, I find that statistics is possibly the only intro-level college math class in which fancy calculators do more good than harm. I wanted to make sure that my students understood, for instance, how to compute standard deviations, and I’d give them sets of three values to crunch by hand, but I also wanted them to be able to handle larger data sets, so the built-in statistical functions were pretty useful for that.
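A three-value hand calculation is easy to check against a library. This sketch assumes the sample standard deviation (dividing by n − 1), which is what Python’s `statistics.stdev` computes; the values 2, 4, and 9 are arbitrary.

```python
import math
import statistics

def sample_sd_by_hand(values):
    """Sample standard deviation, step by step, dividing by n - 1."""
    n = len(values)
    mean = sum(values) / n
    squared_deviations = [(x - mean) ** 2 for x in values]
    return math.sqrt(sum(squared_deviations) / (n - 1))

data = [2, 4, 9]   # mean 5; deviations -3, -1, 4; squares sum to 26; 26 / 2 = 13
print(sample_sd_by_hand(data), statistics.stdev(data))   # both are sqrt(13), about 3.606
```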

I’ve actually taken a statistics course pretty close to the ideal one you describe - it was biostatistics/epidemiology at my medical school (taught by a practicing physician/researcher). They’ve been working on it forever, and have now decided that it’s more important for us to be able to understand and evaluate statistical claims, than to be able to necessarily do all the math. And I think they largely succeeded. Granted, it’s not perfect (lots of time on understanding what confidence intervals mean, quite a bit of time doing simple statistical tests, very little of what the statistical formulas mean - we learned when to use which test, but usually not why the formulas are X and not Y). But overall it was pretty effective. I’m not sure why you evidently have to get into medical school to be able to take a course like this, though.

I’m involved in further streamlining the course, and I’ll probably be coming back to this post for ideas.

*It’s called “Seeing Through Statistics” by Jessica Utts and it is now in the third edition.*

I haven’t seen the third edition. Does she still try to make a case in it that psychic phenomena are real? She did in the first edition. Jessie Utts is a splendid person and an engaging teacher with whom I’ve had pleasant conversations, but she’s always tried very, very hard (too hard, in my opinion, obviously) to see justification for positive conclusions about ESP when the null hypothesis would serve better.

Zeno: I have never seen the first edition, but I have used the second and third. Although the examples involving tests for ESP are still included in the current text (which was a little different than I would have expected when I first saw the book), I do not get the feeling that she was trying very hard to justify positive conclusions about ESP. However, having not seen the first edition, I am uncertain whether this reflects a difference in my response to the material or a difference in the material itself.

Thanks, mgoff. I appreciate the information. It could also be a personal response on my part, too, since ESP research has a track record of flakiness at UC Davis, where Charles Tart was the resident parapsychologist and Utts is a professor of statistics. Tart was notoriously casual about experimental controls and it was a pity to see expert statisticians trying to find significance in his results. When I saw the first edition of the Utts stats book, my reaction was that she had fallen in with the wrong crowd.

Someone asked for a good stats book — I highly recommend Larry Gonick’s “Cartoon Guide to Statistics.” It’s quite serious, without a bit of solemnity, and an excellent overview for the person who wants to teach themselves some useful ideas.

MS, I’ll have to think about specific content. The problem is that in MBA school I got one stats class. There is no time for more, so you have to cram. As an undergrad, I was required to take no stats class at all - everything I knew I got from Analytical Chem. and Physics classes (we had quite a unit on stats in the experimental physics class). I took linear algebra and advanced DE as my math electives. I suspect that’s true for a lot of scientists / engineers - stats were either acquired by osmosis or retroactively in grad school.

Got my favorite Stats text in my hot little hands right now. It’s simply called “Statistics” and it’s by Freedman, Pisani, and Purves. Copyright 1980.

Here’s a link:

http://www.amazon.com/gp/product/0393970833/qid=1135015650/sr=8-1/ref=sr_8_xs_ap_i1_xgl14/103-0637103-6041424?n=507846&s=books&v=glance

Great text. Lots of real-life examples, too. I highly recommend the bit on “Sex Bias in Graduate Admissions”.

Just remembered something else that should be taught in these intro-level stats classes: some measure of justification for the formulas. It’s generally taken for granted that we’re not going to actually prove any of these formulas - the important thing is that students be able to use them - but I (unlike the texts I have used) like to spend some time going over why the formulas make some sense. For instance, why we divide by the square root of the sample size (because the standard deviation of the sample mean is smaller when you’re looking at larger samples), and so on.
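One way to show students why that formula makes sense without a proof is to simulate it. This sketch, assuming normally distributed data with σ = 10 (an arbitrary choice), draws many samples of each size and measures how much the sample means vary; the spread should track σ/√n, so quadrupling the sample size roughly halves it.

```python
import math
import random
import statistics

SIGMA = 10.0  # assumed population standard deviation (arbitrary)

def sd_of_sample_means(n, reps=5000, seed=42):
    """Draw `reps` samples of size n and return the SD of their means."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.gauss(0.0, SIGMA) for _ in range(n))
        for _ in range(reps)
    ]
    return statistics.stdev(means)

for n in (4, 16, 64):
    simulated = sd_of_sample_means(n)
    predicted = SIGMA / math.sqrt(n)
    print(f"n = {n:2d}: simulated {simulated:.2f}, sigma/sqrt(n) = {predicted:.2f}")
```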

Speaking of which, do you have a reference for Gauss’s original work?

I enjoyed this post and I agree with you about teaching the ‘why’ and ‘usefulness’ rather than just the ‘how’. Unfortunately, many of my students come in with the attitude that statistics is all about memorizing formulas and plugging in numbers. It usually takes me a good three weeks to get them on board with my ‘here is why it works this way’ and ‘how we can use this to answer questions’ method. As an instructor of intro as well as more advanced methods, I certainly see the benefit of students who have come from this type of course and truly understand the basic principles; this makes courses like multivariate statistics much easier to teach. Students who come into my advanced classes with a ‘plug and chug’ mentality rarely make it through the course.

Oh, that item about the Smarties is great! Would that everyone were sufficiently skeptical about numbers to check anything that looks “off”.

That’s the precise reason I post sources and most or all of my calculations for my analysis pieces on my blog.

Nobody *should* have to take my unsupported word for *anything*…and the sooner they get in the habit of demanding supporting data everywhere else, the better off the world will be.