Back in the days of yore, when such things were still permissible, I might have wanted to investigate the intervention effect of smacking children. (This is not about the rights and wrongs of corporal punishment; I am using this rather politically-incorrect example simply because the relationship between cause and effect might be reasonably – though not perfectly – observable).
It would not be sufficient for me to ‘know’ (read ‘believe’) from experience that smacking children did indeed largely cause them to cry: in order to justify the intervention, I would need research-based evidence.
So I could set up a controlled study to investigate. I would need to establish that the smacks being administered were totally uniform in nature and circumstance (smacking machine, anyone?) so that variations in the strength of the smack could be factored out of the results. If I could quantify the strength of the smack (also difficult) I could then observe how many children did indeed cry as a result of a certain smack. But could I be absolutely sure that the crying was the result of the smack and not something coincidental and entirely unrelated?
I will ignore this possibility for the sake of pursuing a simple argument. I could conduct this research for as long as I felt necessary to collect a representative data set, after which I could perhaps arrive at something approaching an effect size. We might observe, for instance, that 95% of children did indeed cry when administered the standardised smack. I might decide that this was a sufficiently strong effect to justify the intervention.
But even a high figure like 95% raises difficulties; the first concerns certainty. As previously mentioned, there is a small but real chance that some of the crying was not in fact caused directly by the smack, but by any one of a multitude of other, unknown factors. The timing of the crying might be entirely coincidental, or it might be indirectly causal – for example, if the trauma or anxiety of the moment caused other emotional upwelling to result in crying where none would have resulted from the pain of the smack alone. We might also wonder whether the 5% of children who did not appear to cry did in fact go to their rooms late that night and sob their hearts out, unseen by anyone – it was just that our effect-measuring time frame was wrong.
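For what it's worth, the fuzziness around that 95% can be given a rough number. Here is a toy sketch (the sample size of 100 is entirely hypothetical, and the normal-approximation interval is only a back-of-envelope estimate) showing that observing 95 criers out of 100 children is consistent with a surprisingly wide range of 'true' crying rates:

```python
import math

# Hypothetical observation: 95 of 100 children cried after the standardised smack.
criers, n = 95, 100
p_hat = criers / n  # observed proportion of criers

# Rough 95% confidence interval for the underlying rate (normal approximation).
se = math.sqrt(p_hat * (1 - p_hat) / n)
low, high = p_hat - 1.96 * se, p_hat + 1.96 * se

print(f"observed rate: {p_hat:.2f}")
print(f"plausible true rate: roughly {low:.2f} to {high:.2f}")
```

On these made-up numbers the 'true' rate could plausibly be anywhere from about 91% to about 99% – which is to say, the figure we act on is itself only a best guess.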
In fact, that 5% creates all sorts of problems: in a class of twenty (if only…) that would mean that one child did not cry when smacked. This is enough to disprove the claim that ‘smacking (always) causes crying’. We then need to decide why this child did not cry and whether this was sufficient to render the entire intervention/theory incorrect. The reasons why the one child did not cry could again be numerous, ranging from a higher pain threshold, to defiance, to the fact that (s)he was accustomed to being smacked at home, an acceptance that the smack was warranted, a stiff upper lip – or the exercise book inside his/her trousers. I might be able to discover these things, but there again, I might not – and quantifying them so as to factor them into my research could be exceptionally difficult.
Even if I decided that I couldn’t do this, but that 95% was good enough, I would then be faced with anticipating the future effect of the intervention; even if the 95% figure proved accurate, there is no way of knowing whether, on the next smacking event, it would be the same individual who did not cry, or a different one: there are too many unknown factors ever to be certain. Indeed, the 95% might also change, perhaps because that one individual experienced peer pressure to conform, or because his/her defiance had galvanised a few others – or perhaps simply because they were becoming habituated to the treatment. In fact, we have already manufactured a false dichotomy by framing the responses as ‘cried/didn’t cry’ – when there are numerous other possible consequences and effects of that smack.
We are also left with the dilemma of what to do with the 5% who don’t cry. Should we administer a sharper version of the same treatment – or will this be counter-productive? Or should we take a different approach entirely; if so, then what – and is it practicable to do so? How do we know? What if that 5% does turn out to be a different individual at each iteration? What should we do then? Those cumulative five percents start eating into our effect size, and might suggest that our headline figure of 95% overstates the intervention’s reliability. Or are the reasons why our intervention is not working so completely out of our control that nothing we can do will work (whether we know it or not)?
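To see how quickly those cumulative five percents mount up, here is a toy calculation. It assumes (heroically, given everything said above about habituation and peer pressure) that every child independently has a 95% chance of crying at each smacking event:

```python
# Back-of-envelope: if each child cries with probability 0.95 at each event,
# the chance that a given child has cried at EVERY event shrinks with repetition.
p_cry = 0.95

for events in (1, 5, 10, 20):
    always_cried = p_cry ** events
    print(f"after {events:2d} events, {always_cried:.0%} of children "
          f"will have cried every single time")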
Then there is the whole problem of whether we have accurately identified a suitable and desirable outcome for the proposed intervention – which in my example above, we very probably have not. What’s more, we need to know that we can implement the same intervention again perfectly – or at least with only controllable variations – each time we use it. So is it possible to maintain absolutely the right intensity of smack, regardless of any other factors that might affect that? Can we be sure that circumstantial factors will not influence the effect in ways that render it ineffective? (Maybe children cry more willingly at Christmas, or on their birthdays, or if they are second children, or if they have emotional parents? Maybe the smacks vary according to the smacker’s mood, tiredness or liking of the child). And can we be sure that the same reactions will be observed in an entirely different group of children, perhaps with different backgrounds or of different ages?
In the past few weeks, several people have said to me that they would rather rely on the results of theory and research than on ‘hunches’ – or as I prefer to think of it, Masterful Experience. In one case, very few reasons were given for this, and my reaction was that this sounded pretty much like a hunch in its own right, a case of someone being blinded by apparent science without really feeling the need to ask too many questions.
A second suggested that individual experience is often misleading and subject to confirmation bias. This may not be wrong, though one might have hoped that a genuinely reflective, experienced professional would at least attempt to identify and allow for such things – so it is something of a hunch in its own right to assume that they rarely or never do.
I am not suggesting that careful research cannot shed light on some things (though just how careful careful needs to be was shown by the potential flaw in the research quoted even by Dylan Wiliam in this exchange here (follow the comments) – and this in a case where research might just have shown up something that was counter-intuitive). But if we take my rather absurd example above, the implications of properly researching even something as simple as a fairly mechanical connection between smacking and crying are so huge that it is all but impossible to factor in everything that needs to be known, let alone quantify it accurately. The above example was chosen because of the relatively direct, immediate effects of the ‘intervention’ – so how much harder it must be when the whole process is cognitive-intellectual in nature, and when it is not even easy to agree on the existence or desirability of a given effect.
Research could no doubt tell us that the majority of children do indeed cry when smacked – but it cannot offer us complete certainty without over-stepping its confidence levels, or tell us why, or give us much guidance on what to do with the exceptions. It is, in fact, reduced to the role of generalisation – which is little that wide anecdotal experience of human behaviour could not largely tell us anyway. And even a high correlation neither guarantees the rule, nor tells us anything specific enough to know what to do with a specific individual at a specific moment. What if the one child standing in front of us right now is indeed the 5%? We will simply never know until after the event – and that is enough to make it impossible for us to act with any certainty; we are left with nothing more than our best guess.
Maybe one day I will be proven wrong – or maybe I already have been. By training and inclination I am neither a research scientist nor advanced statistician, and perhaps the foregoing does nothing so much as reveal my woeful ignorance. I apologise to any of that inclination who are currently tearing at their hair on my account. But even if I am wrong, what are we to make of the ethical implications of such powerful and complete mind-control? Do we really want to arrive at such a situation?
All I am offering is the honest attempt of a relative lay-person to scrutinise whether all that is presently being claimed for ‘research’ warrants further trust being put in it. We are so often required to take much of what we are told on trust – and too often this has been shown to be either unworkable or scientifically flawed. The foregoing represents, in my opinion, a conscientious attempt to reason through the claims being made by some for research. I cannot yet see that the objections have been addressed – and so, would it not be professionally irresponsible of me to act on such a hunch?
And I won’t even start to speculate what would happen if smacking were shown to have a large effect size on children’s learning…
No children were harmed in the making of this blog post.