Validity and reliability

The concepts of validity and reliability are the cornerstones of the diagnostic process. For a classification system to be valid, it should be able to classify a pattern of symptoms in a way that leads to an effective treatment. However, the validity of a diagnosis is difficult to establish. The psychiatrist is often highly dependent on self-reported data from the patient, which cannot always be independently verified. In addition, judging the validity of a diagnosis by the effectiveness of a treatment is problematic. Treatments make assumptions about the origin of the disorder, so the diagnosis could be correct while the treatment is ineffective for a number of reasons. On the other hand, an individual may show significant improvement over time that is not due to the treatment. In that case, the improvement would not be evidence that the diagnosis was valid.

The problems in diagnosis are illustrated in Rosenhan’s classic study.

The validity of a diagnosis refers to whether the diagnosis is correct and leads to a successful treatment.

The reliability of a diagnosis refers to whether two or more psychiatrists using the same classification system arrive at the same diagnosis for the same patient.

Research in psychology: Rosenhan (1973)

Rosenhan wanted to test the validity of psychiatric diagnoses as well as determine the negative consequences of institutionalization.

He conducted a field study in which eight healthy people tried to gain admission to 12 different psychiatric hospitals. They complained that they had been hearing voices. The voices were unfamiliar, of the same sex as the pseudo-patient, and said single words such as “empty” or “thud”. These were the only symptoms they reported. Once they were admitted to hospital, they immediately stopped reporting symptoms and acted “normally.”

Seven of the pseudo-patients were diagnosed as suffering from schizophrenia.

After being admitted to the psychiatric wards, they all said that they felt fine and were no longer experiencing any symptoms. It took an average of 19 days before they were discharged. Six of the pseudo-patients were discharged with a diagnosis of “schizophrenia in remission”, implying that the schizophrenia might return.

During their hospitalization, the pseudo-patients had very little contact with doctors and experienced what they considered to be “a lack of normal interaction” with the staff: replies from the nursing staff lacked eye contact and any personal connection. In addition, the staff interpreted the pseudo-patients' normal behaviour, such as note-taking, as abnormal.

Rosenhan concluded: “It is clear that we cannot distinguish the sane from the insane in psychiatric hospitals. The hospital itself imposes a special environment in which the meaning of behavior can easily be misunderstood. The consequences to patients hospitalized in such an environment – the powerlessness, depersonalization, segregation, mortification, and self-labeling – seem undoubtedly counter-therapeutic.”

The study played a key role in raising awareness about the way that diagnosis is carried out and the treatment that patients receive in mental hospitals. Rosenhan showed that when people come into a hospital, staff assume that there is a problem that needs treatment. Then, once a diagnosis is made, health professionals may notice and interpret behaviours in ways that fit the diagnosis, a tendency known as confirmation bias.

The study was highly influential in promoting changes to hospital practice and in protecting the rights of patients. In addition, today's diagnostic manuals are much more complex, and psychiatrists are encouraged to use data triangulation, combining several sources of information when making a diagnosis. But what does Rosenhan’s study really teach us about the validity of diagnosis?

When discussing the validity of diagnosis, this study has several limitations:

  • The study was unethical because the hospital staff did not give consent. In addition, the confederates (the pseudo-patients) deceived the staff. Rosenhan did not debrief the hospitals on his findings or allow them to withdraw from the study. Some consider the outcome of the study important enough to justify the lack of consent.
  • There is no way to verify the claims made by the “patients.” Rosenhan wrote that the nurses saw note-taking as an “aspect of their pathological behaviour.” However, the nurses’ notes simply said “engages in writing behaviour.” Interpreting this neutral note as evidence of pathologizing is an example of researcher bias.
  • Only a single disorder, schizophrenia, was studied. It is therefore not possible to conclude from this one study that diagnostic systems in general are invalid.

ATL:  Communication & CAS

One of the great contributions of the Rosenhan study was the way it challenged psychiatric hospitals to reconsider how they interacted with their patients. The study contributed to reforms in mental health care and to a movement to deinstitutionalize as many patients as possible. Today, many patients are treated on an out-patient basis, and only those with the most serious conditions are institutionalized.

What is the current situation for mental health care in your country? In your city? Are there concerns about the humane treatment of people with mental illness, or is your system a model for others?

Create a short video or presentation to inform your community about the status of mental health care and make proposals for how it could be improved or supported.

Reliability of diagnosis

Another limitation of classification systems is their level of reliability. For a classification system to be reliable, it should be possible for different clinicians, using the same system, to arrive at the same diagnosis for the same individual. Although diagnostic systems now use more standardized assessment techniques and more specific diagnostic criteria, the classification systems are far from perfect.

Why is it so difficult to come up with a reliable diagnosis?

  • Blood and urine tests cannot currently be used to diagnose psychological disorders, although much research is aimed at developing such tests.
  • Disorders are “clusters of symptoms.” These symptoms are assumed to be related to one another, even though this may not be the case.
  • Many symptoms, such as a decrease in concentration, feelings of helplessness or hearing voices, are difficult to measure. Psychiatrists are heavily dependent on self-reported data, which is known to introduce some bias.
  • Individuals may suffer from two or more psychological disorders simultaneously. This is known as comorbidity. For example, clinical depression and alcohol use disorder are often comorbid: many people with alcohol use disorder also have clinical depression. Overlapping symptoms can make it harder for clinicians to agree on which disorder to diagnose.

The difficulty of establishing a reliable diagnosis was demonstrated by Lipton & Simon (1985). The researchers randomly selected 131 patients in a hospital in New York, all of whom had been diagnosed with a psychological disorder. Seven clinical experts at the Manhattan Psychiatric Center re-evaluated the selected patients, and their diagnoses were then compared with the original diagnoses. Of the original 89 diagnoses of schizophrenia, only 16 received the same diagnosis on re-evaluation; 50 were diagnosed with a mood disorder, even though only 15 had been diagnosed with such a disorder initially. These results indicate that the same symptoms do not necessarily lead to the same diagnosis by a different psychiatrist. The study also demonstrates the importance of having more than one professional involved in a diagnosis.

One of the limitations of the Lipton & Simon study was that the patients were already undergoing treatment. This may have changed their symptoms and could be one reason for the different diagnoses. Naturalistic studies of diagnostic reliability often face this kind of problem, which may be one reason why researchers frequently use more controlled, though less ecologically valid, approaches.

Research in psychology: Lobbestael, Leurgans & Arntz (2011)

Lobbestael, Leurgans & Arntz (2011) investigated the reliability of diagnosis using the DSM-IV with a sample of 151 participants, including both patients and non-patients. The original clinical interviews, often lasting up to two hours, were audio-taped. The recordings were then assessed by a second psychiatrist who did not know the diagnosis made by the first psychiatrist.

The results showed that reliability was generally higher for personality disorders than for other disorders. There was a 71 percent rate of reliability in the diagnosis of major depression, compared with an 84 percent rate of reliability for personality disorders. This high level of consistency suggests that a standardized diagnostic manual, in this case the DSM-IV, helps clinicians agree on a diagnosis. But even if a diagnosis is reliable, that does not necessarily mean that it is valid.

A strength of this study is that the researchers used a single-blind procedure: the second psychiatrist did not know the diagnosis made by the first psychiatrist. Also, because the second psychiatrist heard only the audio tapes, the patient's non-verbal behaviour and appearance could not affect the second diagnosis. However, this can also be seen as a limitation. It is difficult to know the extent to which non-verbal behaviour played a role in the first diagnosis. Therefore, the second diagnosis may have been too controlled, missing important non-verbal information that could have changed the diagnosis.

Even though psychiatrists use the same diagnostic tools, they may arrive at different diagnoses. This raises an important question: is the diagnostic tool itself the problem, or are there also other factors that affect how a psychiatrist makes a diagnosis?

Checking for understanding

What does it mean to say that a diagnosis is “valid”?



Which of the following is not true about Rosenhan’s classic study?

The study aimed to test the validity of diagnosis, not whether more than one doctor would make the same diagnosis (which would be a test of reliability). There is evidence of researcher bias: for example, the "patients" reported what the nurses were writing in their reports without actually having seen what they wrote.


Which of the following was a confounding variable in the Lipton & Simon study?

Because the study aimed to test the reliability of diagnosis, the fact that some of the patients had been undergoing treatment for schizophrenia for quite a while is a confounding variable: the treatment may have changed their symptoms and therefore the diagnosis.


Why did Lobbestael, Leurgans & Arntz use audio recordings of the diagnostic interview, rather than having the second psychiatrist conduct a face-to-face interview?





All materials on this website are for the exclusive use of teachers and students at subscribing schools for the period of their subscription. Any unauthorised copying or posting of materials on other websites is an infringement of our copyright and could result in your account being blocked and legal action being taken against you.