alfadriver (Forum Supporter) said:
Are there companies and cars out there that have to specifically do things to pass the cycle and get good FE? Sure. Is it the whole fleet? No. Some of us are specifically told to avoid that, and to turn in anyone who demands that we do cycle specific things like that. (which is one more reason that the whole VW thing was so mystifying to me)
And I can very much see that one can come to the conclusion that it's all about gaming the system. But it's not everyone, and there is real data to show that fleet fuel economy is getting better. Of course, that is offset by increased fleet miles driven.....
I understand, and I apologize if I've come across as too negative. Emissions testing is certainly working at the fleet level; FE testing is too, though not as well. And as much as I hate rev hang and Hyundai/Kia being lazy, it's a small price to pay for living in a city of 1.3 million people and not having to chew my air. My intention with this thread is not to say the entire regime has failed, but rather the following two things:
1. Seeing if the FE and the WOT emissions were being partially gamed. I think we've gotten some pretty good answers to these questions. Namely, there's a combination of my own selection bias on FE (mostly that I ignored trucks and SUVs, which have improved the most), gaming by some manufacturers (especially in the EU pre-VW), and an emphasis on full-throttle particulates/enrichment schemes over comprehensive emissions.
2. I wanted to see how the EPA fights Goodhart behavior and whether its strategies align with the things I've seen attempted in academic testing.
If you don't mind, here are the general strategies I've seen in academic testing. I'd like your opinion on whether they're roughly analogous to what the EPA does, and whether the EPA is doing anything that academic evaluators are missing. So ...
a) Surprise testing. There are a million ways to do this, but the general theme is that Dieselgate (or test cramming for the SAT) is only possible at a reasonable cost when you know when and how the test will be run. "There will be one or more tests this year, but I'm not telling you when." You can also achieve this by arranging things so that the people being evaluated never figure out that an evaluation has happened; the school district sometimes does this by sending evaluators who are also parents, and who present themselves as ordinary parents wandering into PTA meetings and the like. Either approach makes it more likely that you observe the manufacturer/student/administrator in their "default" state and not at the peak of a Goodhart distortion.
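To make the logic concrete, here's a toy back-of-the-envelope sketch (all numbers made up, not real EPA or school-district figures): a subject who "crams", i.e. is only compliant near an announced test date, passes the announced test with certainty, but a uniformly random unannounced audit almost always catches the default state.

```python
# Toy model of surprise testing vs. a "crammer". Assumed, illustrative
# numbers: a 365-day year, an announced test on day 200, and a subject
# who is compliant only within +/- 7 days of a date they know about.

DAYS = 365            # an audit can land on any day of the year
PREP_WINDOW = 7       # subject is compliant within +/- 7 days of a known test
ANNOUNCED_DAY = 200   # hypothetical announced test date

def compliant_on(day: int, announced_day: int) -> bool:
    """The subject games the metric: compliant only near the announced date."""
    return abs(day - announced_day) <= PREP_WINDOW

# Announced test: the subject always passes.
passes_announced = compliant_on(ANNOUNCED_DAY, ANNOUNCED_DAY)

# Surprise audit on a uniformly random day: the chance of (falsely)
# passing is just the fraction of days inside the prep window.
days_in_window = 2 * PREP_WINDOW + 1
p_pass_surprise = days_in_window / DAYS

print(passes_announced)           # True
print(round(p_pass_surprise, 3))  # 0.041
```

The point of the arithmetic: with a known date the gamed pass rate is 100%, while a single random audit sees the default state about 96% of the time, and a handful of audits per year makes sustained gaming essentially impossible without just being compliant year-round.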
b) Hidden criteria. This is something the big testers do, but it will be easier to understand if I describe a strategy I started using during Corona/online classes to calculate participation/attendance scores. A problem I ran into early on was students logging into the class and then heading off to do karaoke or whatever while still logged in. This was possible because they thought they knew the criterion for participation/attendance (namely, login records) and therefore quickly figured out how to Goodhart the system. I solved the problem with a hidden criterion. I hit each student with a minimum of three individual questions per class and type a backup record of everything into a chat room; at the end of the semester I look over those records and count the number and quality of each student's answers to derive a participation score. The students have no idea this is what I'm doing and thus don't know how to Goodhart the system. To be fair, I tell them "you will be graded on the frequency and quality of your responses" and simply don't tell them how I'm determining frequency and quality, so they can't cheat.
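The end-of-semester tallying step above can be sketched in a few lines. This is a minimal illustration, not the actual grading code: the record format (student name plus a 0-3 quality rating per answer) and the scoring formula (answer count plus average quality) are assumptions of mine, standing in for whatever weighting the grader actually uses.

```python
# Hidden-criterion participation scoring from a chat-room transcript.
# Hypothetical data: each record is (student, quality), where quality is
# a 0-3 rating assigned when the answer was transcribed into the chat.
from collections import defaultdict

chat_log = [
    ("kim", 3), ("kim", 2), ("lee", 1),
    ("kim", 3), ("lee", 0), ("park", 2),
]

def participation_scores(records):
    totals = defaultdict(lambda: {"answers": 0, "quality": 0})
    for student, quality in records:
        totals[student]["answers"] += 1
        totals[student]["quality"] += quality
    # Score = answer count + average quality. The weighting is arbitrary;
    # the point is that it's computed from records students never see.
    return {
        s: t["answers"] + t["quality"] / t["answers"]
        for s, t in totals.items()
    }

scores = participation_scores(chat_log)
print(scores)  # kim: 3 answers, avg quality 8/3 -> score of about 5.67
```

Because the students only hear "frequency and quality of your responses," not that the chat transcript is the data source, there's no login timestamp or other proxy for them to optimize against.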
c) Qualitative rather than quantitative evaluation. Take the example of a history test on the SAT. The purpose of a history education is to make us better citizens/voters, help us learn from the mistakes and successes of people in the past, and help us discover the reasons for and sources of our traditions and those of other cultures. But look at a bad history test question, "What year was Henry VIII born?", and you can see how easy this is to Goodhart: the student can simply memorize a list of birth dates, get a perfect score on the test, and gain nothing of value toward becoming a better citizen, learning from the past, or understanding traditions. Qualitative questions partially solve this problem. If instead of "What year was Henry VIII born?" we ask "What is the significance of Henry VIII's reign to modern Europe?", the student can't really answer without genuinely understanding the subject.
These are the general strategies I've seen reduce Goodhart behavior. They are also, I'm sure you'll notice, much more vulnerable to litigation than crappy-but-objective tests.