Evaluating research

One of the key ways that you can demonstrate critical thinking on the IB exam is to evaluate research. This means that you need to discuss both the strengths and the limitations of a study. Often students tend to focus only on the limitations.  This often leaves examiners wondering if any psychological research is any good!

There are several ways to evaluate studies, and there is no such thing as a definitive list. However, using different strategies throughout a response will demonstrate to an examiner a richer understanding of the strengths and limitations of psychological research. You want to avoid stating the obvious, such as "the sample could be larger" or "the experiment should be replicated."

When evaluating a study, you should follow a three-step approach:

Step 1:  Identify the strength or limitation.

Step 2:  Link the evaluation directly to the study.

Step 3:  Explain why this is important in the study of the behaviour being discussed.

For example, you should not simply write: One limitation is that the study has bidirectional ambiguity.

Instead, you should write: One limitation is that the study has bidirectional ambiguity; that is, it is unclear whether watching more violent television causes children to be violent, or whether more violent children watch more violent television. Though a correlation is established, it does not show a cause-and-effect relationship.

Strategies for evaluating research

The IB psychology curriculum lists the following five general strategies for evaluating research:

  • Research design and methodologies
  • Triangulation
  • Assumptions and biases
  • Contradictory evidence or alternative theories or explanations
  • Areas of uncertainty

Methodological considerations

One of the most basic ways to evaluate a study is to consider how the research was actually done. It is important to consider the strengths and limitations of the research method as well as the procedure followed by the researchers.

When discussing the methodological considerations of a study, think about the following.

Is the sample representative?

There are several studies in which all the participants are students, or are all of one gender or culture. This is an example of sampling bias and limits the generalizability of the findings.

Does the study have internal validity?

Remember that a study has internal validity when it is well controlled and we can safely say that confounding variables did not influence the results of the study.

Are the variables well defined to allow them to be measured?

For example, when studying "aggression," the term must be very carefully defined. If it is not well operationalized, the study may lack construct validity. For example, if the researcher is measuring aggression by the number of times someone pops his gum, that might not be considered a valid measurement.

Poorly defined variables often also lower inter-rater reliability. In other words, when several investigators carry out an observation, it is essential that the variables are well defined so that all raters/observers will have similar findings.
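To make the idea of inter-rater reliability more concrete, here is a minimal sketch of how agreement between two observers could be quantified. It uses simple percent agreement and Cohen's kappa, which corrects for the agreement expected by chance. The observation data and the names `observer_1` and `observer_2` are invented for illustration only; they do not come from any real study, and researchers may use other statistics.

```python
from collections import Counter

def percent_agreement(rater_a, rater_b):
    """Proportion of observations that both raters coded identically."""
    matches = sum(a == b for a, b in zip(rater_a, rater_b))
    return matches / len(rater_a)

def cohens_kappa(rater_a, rater_b):
    """Agreement between two raters, corrected for chance agreement."""
    n = len(rater_a)
    observed = percent_agreement(rater_a, rater_b)
    counts_a = Counter(rater_a)
    counts_b = Counter(rater_b)
    categories = set(rater_a) | set(rater_b)
    # Chance agreement: probability both raters pick the same category at random
    expected = sum((counts_a[c] / n) * (counts_b[c] / n) for c in categories)
    if expected == 1:
        return 1.0  # both raters used a single identical category throughout
    return (observed - expected) / (1 - expected)

# Hypothetical codes from two observers watching the same ten children
observer_1 = ["aggressive", "aggressive", "neutral", "neutral", "aggressive",
              "neutral", "neutral", "aggressive", "neutral", "neutral"]
observer_2 = ["aggressive", "neutral", "neutral", "neutral", "aggressive",
              "neutral", "aggressive", "aggressive", "neutral", "neutral"]

agreement = percent_agreement(observer_1, observer_2)
kappa = cohens_kappa(observer_1, observer_2)
print(f"Percent agreement: {agreement:.2f}")
print(f"Cohen's kappa: {kappa:.2f}")
```

A kappa well below the raw percent agreement suggests that some of the apparent agreement could be due to chance, which is more likely when "aggression" has not been clearly operationalized.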

Have demand characteristics potentially influenced the results?

There are several ways in which demand characteristics can affect a study. The participants may guess the aim of the study and act in a way either to help the researcher (expectancy effect) or hurt the research (screw you effect).  In self-reported data, participants may also try to "look good" in front of the researcher - social desirability effect.

Can the study be replicated? 

If the study can be replicated with the same results, it is considered reliable. If the study was poorly controlled or not standardized, then reliability is a problem. However, in some cases, although the study cannot be replicated exactly, similar studies with similar results can be compared - for example, several case studies of the same brain injury.

Is the study ecologically valid?

Ecological validity has to do with the artificiality of the study. If the study is too artificial, then it may not predict what will happen in real life. Artificiality may occur when the experiment is well controlled/standardized.

Does the study demonstrate causality?

If a study is correlational in nature, it may imply causation, but causation cannot be established.  In addition, a correlation may suffer from bidirectional ambiguity - where it is unclear if x causes y or y causes x. If a study demonstrates causality, it is important to note the potential type of causality that is shown.
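One way to see why a correlation cannot settle the direction of a relationship is that the correlation coefficient is symmetric: it gives the same value whichever variable is treated as x. The sketch below computes Pearson's r in both directions. The variable names and numbers (`tv_hours`, `aggression_scores`) are made up purely for illustration and are not taken from any study.

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    covariation = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = sum((a - mean_x) ** 2 for a in x)
    spread_y = sum((b - mean_y) ** 2 for b in y)
    return covariation / sqrt(spread_x * spread_y)

# Hypothetical data: hours of violent TV per day and an aggression rating
tv_hours = [1, 2, 3, 4, 5, 6]
aggression_scores = [2, 3, 3, 5, 6, 7]

r_xy = pearson_r(tv_hours, aggression_scores)   # TV treated as x
r_yx = pearson_r(aggression_scores, tv_hours)   # aggression treated as x

print(f"r (TV -> aggression): {r_xy:.3f}")
print(f"r (aggression -> TV): {r_yx:.3f}")
```

Because both calls return the same value, the statistic by itself says nothing about whether x influences y or y influences x. Only the design of the study, such as an experiment with a manipulated independent variable, can address direction.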


Triangulation

There are several types of triangulation.

Data triangulation is when data is gathered from several different sources - for example, from several different schools, cities or cultures. This increases the reliability of the data.

Method triangulation is when more than one research method is used to test the same effect.  When the same result is obtained regardless of whether you are using an interview, a survey or an observation, we say that the results are credible.

Researcher triangulation is when several researchers carry out the same study and interpret the data separately.

When there is quantitative data, researcher triangulation allows for inter-rater reliability. When the data is qualitative, it helps to establish credibility - for example, when three different researchers all read a set of interviews and then each write an interpretation. They compare their interpretations and, if they are similar, the results are more credible.

Finally, theory triangulation is often used to explain a behaviour.  This is when different approaches - that is, biological, cognitive and sociocultural - work together to explain a behaviour to make for a more holistic approach to the problem.

Assumptions and biases

Each of the approaches has its own assumptions.  For example, the biological approach argues that we can study animals to understand human behaviour.  Challenging assumptions in your essay is a good way of demonstrating critical thinking.

In addition, a study may show a bias.  The bias can be the result of the sample - for example, all males, all students, or all Americans - or it may be the result of the theory - that is, a strictly biological approach which discounts other possible factors.

When discussing cultural biases, you can address whether the study used an etic approach or an emic approach.  Both approaches have their own limitations.

Alternative explanations

Especially when answering a "to what extent" question, it is important to show understanding of alternative explanations of behaviour.

This can be done either by discussing research that has a different result from the original study, or by discussing research from another one of the approaches.

Sometimes a study looks at a single variable in isolation, not accounting for how the variable may interact with other variables. For example, in a health study on the role of time-consciousness on the rate of heart attacks, one has to question whether it is really possible to isolate this one variable in a person's life, or if other possible variables (smoking, lack of exercise, genetic predisposition) may have an equal chance of affecting health.  A good study will use a matched design in order to make sure that these variables are accounted for.

Areas of uncertainty

Finally, one of the ways in which you can demonstrate critical thinking is to discuss areas of uncertainty - in other words, if you were going to continue to investigate this question, what else still needs to be studied? What are the questions that remain?

For example, in the classic Bashing Bobo study the question that remains may be - How long term are the effects of observing violence?  Are children only aggressive immediately after observing an adult act aggressively, or will that behaviour continue long after they have observed the model?

Being able to "ask the next question" is a very important part of the inquiry process and it is what drives psychological research.

ATL:  Critical thinking

For each of the following research scenarios, identify a limitation and why it is important in the context of the study.

1.  The school concludes that the IB program makes students more intelligent because the students who have taken the program get the highest marks in the school.

This is a question of bidirectional ambiguity.  It is difficult to know if the IB program is helping students to develop skills to be more successful, or if students who are already motivated and have higher levels of skill are the ones that are signing up for the program.

2. A researcher has found that Americans learn best while listening to music.  She has Americans memorize a list of 40 common objects while listening to a popular piece of music.  She is interested to see if Nigerian children would also show this pattern of behaviour, so she administers the same test to a group of Nigerian children in villages outside of Lagos.

This study uses an etic approach.  It makes a lot of assumptions.  First, using the same list of objects may be a confounding variable as they may be unfamiliar to the Nigerian children.  In addition, the use of the same music may be culturally unfamiliar.  Finally, it assumes that memorizing lists of words is culturally relevant. It could be argued that the study is culturally biased.

3. A researcher wants to test the level of prejudice in a community. A group of Czechs are given a test of prejudice against foreigners. They are told that the surveys are anonymous, but overall they score high for prejudice.  However, when they are interviewed, they almost all talk about positive experiences with foreigners and do not demonstrate any clear prejudices.

The study uses method triangulation and, because the results are not consistent, we can say that the findings are not credible.  It may well be that in the interviews the Czechs showed a social desirability effect and did not want to look bad.  It could also be that on the survey they either misinterpreted the questions or found it difficult to rank the responses effectively.

4. Anna and Maria carried out an experiment to test whether people would have more difficulty solving a puzzle if they were told before they began that it was difficult than if they were not told anything.  After they had carried out the study, Anna realized that she had not read the directions that she and Maria had written together.  In addition, Maria had used a stopwatch while Anna had used the clock on the wall to time people.

The problem is that it appears that Anna and Maria did not have a standardized procedure. This means that the results are not highly reliable. It would not be possible to replicate their study to verify their results.

5. A researcher carries out a study to find out why relationships fall apart.  The researcher found that those couples that were in danger of divorce were the ones who did not communicate effectively with each other.

This study has the problem of construct validity.  What does it mean to "communicate effectively" in a relationship?  The researcher may have had a test for this but the question is whether the test produces valid and reliable results.  This would be possible if the test had already been given to a large sample and had been shown to consistently predict the break-up of relationships.
