Friday, November 16, 2007

The Trouble with NYC Report Card Peer Groups

Central to the NYC Report Cards is the idea of a "peer group." The concept is a good one - it's not fair to compare schools serving vastly different populations of kids, as NCLB does, because kids may be doing worse in school for out-of-school reasons the school can't control.

The NYC DOE deserves credit for moving away from these apples-to-oranges comparisons. But NYC constructed elementary school and K-8 peer groups using a weighted formula of only four school characteristics - percent black and Hispanic combined, percent free lunch, percent special ed, and percent ELL. The junior high and high school peer groups were constructed using 4th and 8th grade scores, respectively. Each school is then assigned a "peer index" number, and the 20 schools above and below a school in the peer index serve as its "peer group."
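
To make the construction concrete, here is a rough sketch of how a peer index and peer group of this kind could be computed. It is not the DOE's actual formula: the weights are not given above, so the equal weights below are an assumption, and the schools and percentages are invented for illustration.

```python
from dataclasses import dataclass

# ASSUMPTION: equal weights, chosen only for illustration.
# The real weights used in the report cards are not stated in this post.
WEIGHTS = {
    "black_hispanic": 0.25,
    "free_lunch": 0.25,
    "special_ed": 0.25,
    "ell": 0.25,
}


@dataclass
class School:
    name: str
    black_hispanic: float  # percent black and Hispanic combined
    free_lunch: float      # percent free lunch
    special_ed: float      # percent special ed
    ell: float             # percent ELL


def peer_index(school):
    """Weighted sum of the four school characteristics."""
    return sum(w * getattr(school, field) for field, w in WEIGHTS.items())


def peer_group(schools, target_name, k=20):
    """Return up to k schools just below and k just above the target in peer-index order."""
    ranked = sorted(schools, key=peer_index)
    pos = next(i for i, s in enumerate(ranked) if s.name == target_name)
    below = ranked[max(0, pos - k):pos]
    above = ranked[pos + 1:pos + 1 + k]
    return below + above


# Invented schools, not real data.
schools = [
    School("A", 10, 10, 10, 10),
    School("B", 20, 20, 20, 20),
    School("C", 30, 30, 30, 30),
    School("D", 40, 40, 40, 40),
    School("E", 50, 50, 50, 50),
]
peers_of_c = [s.name for s in peer_group(schools, "C", k=1)]
print(peers_of_c)
```

Notice that two schools can land next to each other on a single index number while looking very different on any one of the underlying characteristics, which is exactly the issue examined below.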

How did this work out? Consider PS 196, an elementary school in Queens that received a B. Compare its student body with those of its peer schools:
  • PS 196 is 40% Asian, but its peer schools are between 1% and 69% Asian (average=17.8%).
  • PS 196 is 44.8% white, but its peer schools are between 22.1% and 90% white (average=66.6%).
  • PS 196 is 2.1% African-American, but its peer schools are between 0% and 16.7% African-American (average=3.96%).
  • PS 196 is 13.1% Hispanic, but its peer schools are between 2.9% and 17.1% Hispanic (average=10.4%).
  • PS 196 has 14.6% free lunch students, but its peer schools have between 2.4% and 26.8% free lunch students (average=15.5%).

Key inputs that are not under the school's control vary widely as well:

  • PS 196 is at 106.4% of building capacity, but its peer schools are between 51.6% and 136.5% of capacity (average=92.1%).
  • 54.3% of teachers at PS 196 have more than 5 years of experience, but at its peer schools between 35.7% and 81.3% of teachers do (average=59.5%).
  • PS 196 has 658 students, but its peer schools have between 180 and 1263 students (average=601).
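
Each comparison in the lists above comes down to the same arithmetic: take one measure, then look at the minimum, maximum, and average across the peer schools next to the school's own value. A minimal sketch, using invented percentages rather than the actual peer group data (the function name `peer_spread` is also just a placeholder):

```python
from statistics import mean


def peer_spread(school_value, peer_values):
    """Summarize how one school compares with its peer schools on one measure."""
    lo, hi = min(peer_values), max(peer_values)
    avg = mean(peer_values)
    return {
        "school": school_value,
        "min": lo,
        "max": hi,
        "average": avg,
        "gap_from_average": school_value - avg,
    }


# Invented values for illustration only, not real report card data.
summary = peer_spread(40.0, [5.0, 10.0, 15.0, 70.0])
print(summary)
```

A wide gap between the minimum and maximum, or a large gap from the average, is a sign that the "peers" differ from the school on that measure.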

The comparison groups were constructed differently at the high school level, where middle school test scores, not demographics, were used to create comparison groups. But do middle school test scores provide enough information to net out all factors that influence students' graduation?

Consider the Manhattan Center for Science and Mathematics, a selective school located in East Harlem that screens students not only on their test scores, but also on their middle school grades and attendance. However, the Manhattan Center is not compared only with other schools that select students on their test scores, grades, and attendance; in fact, a handful of high schools with zoned programs are included in its peer group.

These schools have students with widely varying characteristics as incoming 9th graders. Compare the Manhattan Center's entering 9th graders with those of its peer schools:

  • 61.2% of Manhattan Center students are proficient in reading in 9th grade, but its peer schools have between 37.5% and 100% proficient as entering 9th graders (average=70%).
  • 79.5% are proficient in math in 9th grade, but its peer schools have between 45.2% and 100% proficient in 9th grade (average=77%).
  • 65% are free lunch students, but its peer schools have between 12.2% and 93.9% free lunch students (average=39.8%).
  • 16.8% are overage for grade (i.e. they have been held back), but its peer schools have between 3.3% and 61% overage for grade (average=12.5%).
  • 12.4% are ELL, but its peer schools have between 0% and 84.1% ELL (average=5.01%).
  • 7.1% are in full-time special education, but its peer schools have between 0% and 4.7% in full-time special education (average=0.8%).

There are two problems with this approach - one statistical and one practical:

1) As the comparisons above demonstrate, the peer groups provide only the illusion of fair comparison.

2) Perhaps more important for responses to these report cards - any educator who sits down with the numbers will likely conclude that these comparisons lack face validity. And if these peer groups are not believable to educators - that is, if they don't feel that they have an equal chance of winning or losing this game - we are going to see enormous amounts of gaming the system as schools attempt to succeed in a system that is perceived as fundamentally unfair.

3 comments:

fjstats@aol.com said...

You Can't Always Get What You Wonk.

Thank you for articulating the bases upon which the groupings were set up.

I agree with your conclusion that the mathematics and methodology used to establish peer groups do not control for certain relevant variables. Therefore, within-group comparisons are confounded.

My question is this: From an experimental design viewpoint, is the notion of finding appropriate groups (i.e., schools that match each other in terms of significant educational factors) a search for the holy grail? On what basis, if any, would you suggest that schools could be stratified to yield meaningful data? Thanks.

Anonymous said...

I don't have a comment. I do have an addition for your list of education blogs, "Running on Empty" at philipwaring.us
It's not only education, but mostly that.
Take a look, you may want to add it to your list.
By the way, what's your email address?
Thanks,
Philip Waring
pbwaring@comcast.net

Anonymous said...

Here's Part 4!!!
Fade into the pub. Those huggable ICEsickles are scarfing down chickenwings, it truly is a sight to behold...

James "E. Turtle" - "Guys, I'm still wondering, what's a pension?"

"Kip Winger" - "Anybody have a wetnap? I got sauce on my t-shirt."

"Salad" the Barber - "If any of you guys eat that last chicken wing there'll be hell to pay. I'll call the boys."

"Un-Norm-al" Scott - "A penison is a bad thing created by Randi Weingarten...it gives its user the chicken pox, causes anorexia, and it promotes irritable bowel syndrome. Trust me, I know."

Petey "Bowtie" Lamphere - "You know what guys, I was just thinking, if I did get some of my building's merit pay, I could buy a lot of new bowties."

Jeff "Andy" Kaufman - "Petey...that's what makes you a TJCtard and not an ICEsickle, we buy t-shirts not ties. You should see my new Megadeath and Iron Maiden shirts...they are cool!"

"Woodhag" - "Where are those tofu chicken wings that I ordered?"

"Un-Norm-al" Scott - "I bet Randi snuck in the back and did something to them."

James "E. Turtle" - "She probably did. You know guys, I was wondering about something important...about these hot wings, you know what they say right? Hot on the way in, hot on the way out."

Jeff "Andy" Kaufman - "Let's be serious for a moment, we have a major problem coming up. What are we going to do about this candle light vigil at Tweed? That's bowling night!"

"Woodhag" - "Not for me, that's the night I work on my compost pile."

"Un-Norm-al" Scott - "Randi probably found out that that was ICE bowling night and she did it on purpose!"

"Salad" the Barber - "She definitely did, there are probably Unity hack spies in the pub right now, we should beat them to a pulp. Violence makes things right! Damn, I'm so ferocious, I'm so cool!"

Petey "Bowtie" Lamphere - "Guys it's almost Thanksgiving, let's be thankful. What are you guys thankful for? I'll start...I'm thankful that I just won that limited edition Pee Wee Herman bowtie on Ebay."

"Salad" the Barber - "I'm thankful for nuclear missiles, ninja throwing stars, tasers, nunchucks, beartraps, and samurai swords."

"Un-Norm-al" Scott - "I'm thankful for turkey and stuffing. Randi better not sneak into the kitchen and put raisins in the stuffing this year."

"Woodhag" - "I'm thankful for tofu turkey, electric cars, and just being one of the ICE guys. It's fun to complain."

James "E. Turtle" - "I'm thankful for pensions. At least I think I am."

"Kip Winger" - "I'm thankful for Britney Spears' new album."