ICT in Education Assessments are Biased and Inaccurate
Would accurate ICT4E assessment be great? Definitely. The more we know about education and teaching, the better we can educate.
However, the most remarkable thing about an ICT4E assessment carried out to decide on the introduction of ICT in education would be its uniqueness in history. Such assessments are scarce because there are few (if any) historical examples of assessments of any kind being done before an educational reform was introduced, and even fewer where the outcome of the assessment really mattered in the decision making.
In my own country, the Netherlands, we have just evaluated decades of sweeping educational reforms. See the Dutch results (sadly, only in Dutch).
One of the conclusions was that, indeed, large reforms (e.g., "Het nieuwe leren", or "the new learning") were imposed without scientific support. Another was that political prejudices, not any kind of data, were the main motivating factor behind the reforms.
Just last week, the results came out of an assessment of yet another reform, this time in the teaching of arithmetic in primary education, again implemented without a "proper" assessment before its introduction. Performance in arithmetic had declined, and the fight was on over which method was better: the "realistic" instruction currently in use or the classical, practice-based method. The conclusion was that the declining standards were caused by the teachers themselves having sub-standard arithmetic skills. So now the teachers will get remedial courses.
(see Evaluation of arithmetic teaching, again only in Dutch).
I am sure every reader can add examples from their own country where sweeping reforms were only assessed long after they had been implemented. The alternative, assessing educational reforms well before their introduction, is a form of social engineering. Social engineering always seems to be more difficult than you think, and history has shown that education is no exception in this respect.
ICT4E Assessments are always biased
Historically, educational policies are completely determined by the political and religious beliefs of the parents and, by extension, of the politicians and teachers. Scientific "facts" are never appreciated unless they completely align with the preconceptions of the "stakeholders" (minus the children). We might lament it, but such is the world. This is made worse by the fact that few parents actually understand what their children are learning. So parents are likely to try to improve upon the schooling model of thirty years ago, preparing children for a world which does not exist anymore.
On the ICT side, those who are old enough to have experienced the introduction of personal computers in the workplace will remember that the introduction was definitely not the result of a productivity assessment. Accountants and secretaries were trained on computers because everybody understood the usefulness of WordPerfect and Lotus 1-2-3. No questions asked. The same holds for the introduction of faxes and email. This led to some weird discussions in economic circles: "You can see the computer age everywhere but in the productivity statistics" (Robert Solow's famous quip).
So why would ICT4E assessments be different? They are not. But they are beautiful handles for political fights.
ICT4E Assessments are inaccurate
But let's suppose we do such an assessment for ICT4E. What will be tested is very simple: does this ICT4E solution improve scores on existing tests? The outcome can be predicted quite accurately:
- In resource-poor schools, current practices are not optimized for the tests. Any improvement in resources, whatever those resources might be, will improve test scores indiscriminately.
- In resource-rich schools, current practices are already optimized for maximal test scores, i.e., teaching to the test. No increase in resources will improve test scores.
The problem here is that the tests have been adapted to the curriculum, and vice versa. An illuminating example is mathematics education in the USA. Read A Mathematician’s Lament (PDF) by Paul Lockhart. From page 15:
In place of a natural problem context in which students can make decisions about what they want their words to mean, and what notions they wish to codify, they are instead subjected to an endless sequence of unmotivated and a priori “definitions.” The curriculum is obsessed with jargon and nomenclature, seemingly for no other purpose than to provide teachers with something to test the students on.
No mathematician in the world would bother making these senseless distinctions: 2 1/2 is a “mixed number,” while 5/2 is an “improper fraction.” They’re equal for crying out loud. They are the same exact numbers, and have the same exact properties. Who uses such words outside of fourth grade?
Students must learn completely useless factoids for the sake of having something to test, and they learn them only because they are tested on them. Examples are the distinction between a mixed number and an improper fraction above, or (lower on the same page) the equally useless definition of sec x as 1/cos x.
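Just to spell out the arithmetic behind Lockhart's point (nothing here beyond the conversion itself):

\[
2\tfrac{1}{2} \;=\; 2 + \tfrac{1}{2} \;=\; \tfrac{4}{2} + \tfrac{1}{2} \;=\; \tfrac{5}{2},
\]

so the "mixed number" and the "improper fraction" are simply two spellings of one and the same number.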
The ultimate bad example of the parasitic relation between teaching and testing was English language teaching in Japanese schools. Testing was done exclusively with multiple-choice tests. Formally, Japanese students earned high grades on their English tests, but in real proficiency they were at the rock bottom of the world (see The Enigma of Japanese Power by Karel van Wolferen).
Assessments and the Chinese exams syndrome
The classical Chinese imperial examinations were the ultimate in pointless testing and learning to the test. Since the Han dynasty (c. 200 BC), applicants for government positions were required to take an imperial exam.
What is relevant to this discussion is that the contents of the exams were utterly disconnected from the work the civil servants were supposed to do, and equally disconnected from any other reality of the time. Yet students would commit most of their waking time to studying for these exams.
Obviously, these are extremes, but all curricula contain items that are there not because students might benefit from knowing them, but to allow testing. On the other hand, all curricula skip items that children should know but which cannot (easily) be tested. Testing students is important. But we all know that testing students to assess comprehension and mastery of the subject matter is only half the truth.
The other half is that we test to force students to study, which seems to benefit from sprinkling factoids to memorize throughout the lessons. Like all humans, students won't work if they do not see the benefit, and passing the grade is the easiest incentive. But we also know how this can backfire when students limit themselves to rote learning the useless factoids just to pass the grade, without any comprehension. The Chinese exams syndrome still affects many school systems.
But if we realize this, what would be the use of doing an ICT4E assessment? We would simply hide the failings of the existing system without uncovering the benefits and shortcomings of the new system.
Test Aptitude, not ICT4E
If we want to test whether changes in education really improve learning, we do have other tools. They are called aptitude tests.
What we do know is that learning is a function of practice. To learn, you have to practice. To learn to read, you have to spend time reading: the more time you spend reading, the better you will read. To understand geography, you have to study maps: the more maps you study, the better you will understand geography. To learn a new language, you have to read and write in that language, or better still, listen to it and speak it.
By the way, the easiest way to increase the English proficiency of children is to broadcast all TV shows in the original language with subtitles. See "What We Can Learn From Foreign Language Teaching in Other Countries" (PDF).
Note that conventional (high-)school tests tend to struggle with language proficiency. In the Japanese example above, any improvement in teaching real English proficiency would have seriously decreased test performance, as it would have taken time away from rote learning the answers to the multiple-choice questions.
So an assessment which investigates whether a change in teaching practices increases the time spent on really practicing skills and learning the subject will automatically tell you whether children learn "more".
In the end, any real assessment of any educational reform requires a fresh reflection on what skills and knowledge children are supposed to acquire at school. Only then is it possible to compare the old and the new. But this is both independent of ICT4E and a political hornet's nest.
I am an elementary teacher who retired somewhat earlier than I had planned, in part because the instructional program of my school had become obsessively devoted to test prep. So I am instinctively sympathetic to Mr. van Son's point of view. And over the years I have had the opportunity to read various 'evaluations' of proprietary technology (usually computer programs). When one reads them carefully, they almost always seem to reduce to demonstrating that the more time the student spends using the program, the better the results. This, of course, is true of _any_ form of study, and is thus irrelevant. The assessments need to compare the technology in question directly with alternative activities, showing results for equivalent allocated (not engaged) times. And they should also compare for cost-effectiveness. I don't believe I have ever come across a study that does these things.
I agree wholeheartedly with the view that current curriculums greatly elaborate useless terminology. The example given, however, may not be the best. Yes, 2 1/2 is the same quantity as 5/2, but getting the student to understand this will at some point involve teaching him to convert one expression to the other. In this case the terminology is needed to carry on the discussion. There are plenty of other examples of useless verbosity. My nominations: "commutative property," "associative property."
When I received this post from Rob, I expected an uproar in the comments: educators defending or disparaging evaluations non-stop. Instead, I am very surprised that only the two of you have commented, and both in agreement. I greatly underestimated the frustration with ICT assessments, which is best summarized by Debobroto's response to this post:
Truth be told. I couldn't agree more…
If you read the comments on the "pro" side of this debate, they agree too. There are good evaluations, but they are not (ever?) used in decision making.
I really respect and value educational research. I marvel at what has been discovered about, e.g., learning to read in (pre-)school children. But the chasm between the results of academic research on learning and what is actually evaluated in school programs is currently unbridged.
I have three bonus links:
- Teaching to the test in Japan (http://www.crf-usa.org/bill-of-rights-in-action/b…)
- School Testing: Good for Textbook Publishers, Bad for Students (http://www.alternet.org/story/39/school_testing:_…)
- Secret of engineering textbook (http://www.articlealley.com/article_129305_22.htm…)
Rob van Son
Coming from the business world, I view this simply as "success metrics." Every single product, program, or initiative, assuming it is managed by competent marketeers or project managers, should have defined success metrics before the project is launched. The project is then assessed by comparing real-world data against the original success metrics.
But even in the business world, these assessments can become over-complicated. Often a few simple metrics that are statistically valid are sufficient.
I don't think anybody is arguing that there should be no assessments, just about the what and how of their implementation. And that can be debated until the cows come home.
My principle is always to keep these assessments simple but useful.
Indeed, there DOES seem to be an argument against assessment of any kind, at any time — not because the notion of assessment is bad but because effective assessments have not yet been designed, and even if they had been, they would very likely be misapplied and misused.
The US Department of Education recently (2009) commissioned an interesting study on the evidence-based effects of online learning. I was shocked to learn that very little empirical research on this topic exists for elementary and secondary schooling. In fact, until 2006 there were NO such studies targeted at K-12 learners. None. Yet, by that time, billions of dollars had been invested in computer-networked "distance education" for schools.
We can debate whether or not such research would inform better policy and program development (maybe it wouldn't), but the fact that such research barely exists, and that policy-makers don't seem to care much, is unsettling.
The USDoE Report is available in PDF at
http://www.ed.gov/rschstat/eval/tech/evidence-bas…
Please forgive my apparent "barge-in." I was introduced to this forum by another member.
In Oceania, we have been studying the issues of M&E (monitoring and evaluation) quite a bit, as our major funders insist that we "prove" educational results before scaling up.
Our conclusion is that any evaluation of any educational ICT intervention can only be approximate, mostly because you cannot evaluate an ICT intervention in isolation. Let's say you are looking to improve basic literacy with OLPC. So you measure basic literacy (and perhaps you even have baseline data and a control group), but what do the results tell you? Most likely that something happened, but can you isolate it to ICT? Schools are always trying to improve basic literacy and may have several initiatives in place, so how do you attribute the outcome to OLPC? It could just boil down to a great (or poor) principal driving change.
I guess this is why there are so few published studies.
We understand funders wanting to see good evaluation as they want to make sure their limited resources go to the best possible outcomes.
IMHO, the best thing to evaluate is which tools are the best "agents of change". All agree that we need to change education systems (in both the developed and developing worlds) to meet the challenges of the new Information Society.
In Oceania, we have partnered with the Australian Council for Educational Research to provide assistance to Ministries of Education wishing to start an OLPC project, and hopefully we can gather (imperfect) data that will influence policy on scaling up OLPC.
Ian, you make a very good point. Because policy-makers are held accountable for the results of public investments made under their watch, they default to what they can understand and communicate crisply and easily. The notion that a true evaluation of any strategic or tactical effect requires in-depth observation and analysis of uncountable (in the strict sense) things is difficult to grasp when one's political credibility is on the line. This can lull decision-makers into the fiction that the only things that "count" are the things that can be counted.
It's all very well finding all the things that are wrong with current assessment, but what is the research evidence that removing it will improve learning? What is a viable alternative that has the substance of research behind it? Assessing ICT on the basis that it does or does not improve test results in other subjects, which are then tested by paper methods devoid of technology, seems perverse, especially when the only time any school-aged learner is likely to write for two hours with pen and paper is in a school-based exam. I run a company specialising in assessment for learning, and the biggest weakness is that teachers are not equipped with the technological skills to move to systems that we can show save significant amounts of money, money that can then be re-invested in learning. The inappropriate use of ICT and the entrenchment in out-of-date practices are symptoms of a need for better ICT teaching and better forms of assessment, but I don't easily buy the notion that we can dispense with ICT teaching or systems of quality assurance for learning simply because current systems have been dumbed down by politicians.
"It's all very well finding all the things that are wrong with current assessment but what is the research evidence that removing it will improve learning? What is a viable alternative that has the substance of research behind it?"
I love assessments. The problems I point out are two-fold:
1) Which assessments really assess the future prospects of students?
2) Who actually uses assessment results to make informed decisions?
A standardized test is aimed at sampling the skills and knowledge of the students. Teaching to the test makes this sampling so biased that the results are irrelevant. The answer is to improve the "sampling". But if we introduce changes in education, we can be pretty sure that the old assessments become worthless. So how can we assess the new practices using the old yardsticks?
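To make the sampling point concrete, here is a toy simulation (my own illustrative numbers and code, not taken from any real study): a curriculum of 1000 skills, a standardized test that samples 50 of them, one class that practices broadly, and one that drills exactly the tested items.

```python
import random

random.seed(1)

# Toy model: 1000 skills in the curriculum, the test samples a fixed 50 of them.
curriculum = list(range(1000))
test_items = set(random.sample(curriculum, 50))

def report(label, skills_learned):
    skills = set(skills_learned)
    test_score = len(skills & test_items) / len(test_items)  # what the assessment sees
    mastery = len(skills) / len(curriculum)                   # what we actually care about
    print(f"{label}: test score {test_score:.0%}, real mastery {mastery:.0%}")

# Broad practice: 200 skills learned without regard to the test -> the test is a fair sample.
report("broad practice", random.sample(curriculum, 200))

# Teaching to the test: drill exactly the 50 tested items -> the sample is hopelessly biased.
report("taught to test", sorted(test_items))
```

In this toy setup the drilled class looks perfect on the test while mastering a twentieth of the curriculum; the test score has stopped being an estimate of anything.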
In the end, I have not seen any changes in education that were really based on a well-thought-out assessment, or on any real assessment at all.