Sunday, March 3, 2013

DIBELS and False Alarm Analysis

DIBELS is a language test for small children who are learning to read. It was designed at the University of Oregon's Center on Teaching and Learning. It is a correlative test, designed to estimate the probability that a student will pass a third grade reading test. In this sense it is simply a detector like any other and can be analyzed as such. As with any other detector, we need to characterize its false alarm behavior to understand how it operates.

The test operates like most any other detector. The child takes a test, and if the score exceeds a high threshold called the benchmark, the child is classified as low risk. If the score falls short of the benchmark but exceeds a second, lower threshold, the child is classified as medium risk. A score below that lower threshold is classified as high risk.
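The two-threshold rule can be sketched in a few lines of Python. This is only an illustration: the function name, the threshold values, and the choice to treat a score exactly at a threshold as meeting it are all my assumptions, not values taken from the DIBELS reports.

```python
def classify_risk(score, benchmark, cut_point):
    """Three-way classification from two thresholds.

    benchmark and cut_point are hypothetical parameters; the real
    DIBELS thresholds differ by grade and time of year.
    """
    if cut_point > benchmark:
        raise ValueError("cut_point must not exceed benchmark")
    if score >= benchmark:      # assumption: a score at the threshold counts as meeting it
        return "low risk"
    if score >= cut_point:
        return "medium risk"
    return "high risk"


# Made-up thresholds, for illustration only.
for score in (120, 75, 30):
    print(score, classify_risk(score, benchmark=100, cut_point=50))
```

Raising the benchmark moves more children out of the low risk bin, which is exactly the trade between detections and false alarms that an ROC curve describes.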

Technical reports for the DIBELS test are available here. An analysis and discussion of how the thresholds are chosen is found in one report in particular, 2012 - 2013 DIBELS Next Benchmark Goals: Technical Supplement [pdf]. This report publishes the ROC curves for the DIBELS tests at various times from early kindergarten up to third grade. It also discusses how the authors anticipate changing the operating point to increase the probability of detection. The ROC curves are all fairly similar, so I'll use figure one as illustrative. The ROC curves in this report use "Sensitivity" along the y-axis, which is equivalent to the probability of detection Pd. The curves use "1-Specificity" along the x-axis, which is equivalent to the probability of false alarm Pfa. In figure one on the left, the benchmark case, they show two operating points. The new "Recommended Goal" lists Pd=0.9, Pfa=0.57, so 90% of students who truly need help will be identified, while there is a 57% chance that a student who does not need help will still fall below the benchmark. I.e. 57% of students who are in no need of help will be flagged for extra help and classified as at risk by their teachers and schools.
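Written out as conditional probabilities, the two axes of the ROC curve look like this. The event names "flagged" and "at risk" are my own shorthand rather than notation from the report.

```latex
\begin{align*}
P_d    &= P(\text{flagged} \mid \text{at risk})     = \text{Sensitivity}     = 0.90 \\
P_{fa} &= P(\text{flagged} \mid \text{not at risk}) = 1 - \text{Specificity} = 0.57
\end{align*}
```

Both quantities condition on the student's true status. Neither one says how likely a flagged student is to be truly at risk.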

It's important to remember that Pd and Pfa are conditional probabilities, so to understand their practical impact we also need the base rate, i.e. the fraction of students in a given population who are truly at risk. The base rate can be difficult to estimate, so as a proxy we can use the reported failure rates on Ohio's third grade reading test. These were reported in the Dayton Daily News in an article titled "New reading requirements could cost schools millions". The article appeared in the 11 Feb 2013 print edition along with a table detailing how many students from various districts pass the third grade reading exam. (This table only appears in the print edition.) In the Oakwood school district only 0.6% of students failed the exam, while in Dayton City 36.2% failed.

Using these pass/fail values as a proxy for the base rate, we can use Bayes' theorem to estimate the probability that an Oakwood student who is flagged truly needs help, and find it to be 0.9%. The other 99.1% of the time the student flagged at risk is not truly at risk, i.e. they are a false alarm. A similar calculation for Dayton City schools puts the probability that a flagged student is truly at risk at 47.3%. Even for an urban school district like Dayton City, the probability that any given student who is classified as at risk by DIBELS is actually at risk is still less than 50%.
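For anyone who wants to check the arithmetic, here is a minimal Python sketch of the Bayes calculation. It takes Pd=0.9 and Pfa=0.57 from the "Recommended Goal" operating point and uses the district failure rates as the base rate. The function and variable names are mine.

```python
def prob_truly_at_risk(base_rate, p_detect=0.90, p_false_alarm=0.57):
    """P(at risk | flagged) via Bayes' theorem.

    base_rate is P(at risk); here the district's third grade reading
    failure rate stands in for it, which is only a proxy.
    """
    true_alarms = p_detect * base_rate                 # flagged and at risk
    false_alarms = p_false_alarm * (1.0 - base_rate)   # flagged but not at risk
    return true_alarms / (true_alarms + false_alarms)


# Failure rates quoted from the Dayton Daily News table.
districts = {"Oakwood": 0.006, "Dayton City": 0.362}

for name, base_rate in districts.items():
    posterior = prob_truly_at_risk(base_rate)
    print(f"{name}: {posterior:.1%} of flagged students are truly at risk")
```

With these inputs the sketch gives roughly 0.9% for Oakwood and 47.3% for Dayton City. Plugging in a different base rate shows how strongly the result depends on the population being tested.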

Large false alarm rates lead to large problems. First, the people who use the output of the detector will eventually become desensitized to it and ignore it. This is what drives the false alarm requirements of many detection algorithms. Teachers and educators are busy with many other important tasks and will not have the time to sort out true from false detections. Second, even if we take the large number of detections seriously, we will be putting enormous resources into what's called intervention even though most of it is not necessary. Even in a school district with infinite funds, there is still an opportunity cost in terms of educational time. I.e. children could be taught math, science, music, art or anything else that would benefit them, whereas the extra reading instruction is a waste of resources for most of these children. Finally, at these levels of "detection" we are no longer talking about intervention; we are simply talking about teaching. It would be worthwhile to consider whether hiring more regular teachers and lowering class size would be more beneficial and more efficient than hiring (presumably higher-salary) specialized intervention teachers.
