Chi Squared, regression and causation

Tuesday 14 April 2015

Both tests on the same data?

This is a quick blog post about a couple of discussions I have had recently about students using both linear regression and chi² independence tests on the same data! I wrote this in a correspondence recently and thought it worth sharing. Comments welcomed!

'There is no official line that you cannot use both tests on the same data. My hypothesis is that it is irrelevant to use both tests on the same data. As a rule I think it is good to encourage students to use scatter graphs with numerical data and chi² if one or both of the variables are categorical. Clearly correlation does not imply causation, but a chi² test does not imply causation either. Both tests look for a relationship between two variables. If both variables are numerical then a scatter graph is appropriate; if either or both are categorical then chi² is appropriate. Clearly a chi² test can be done with two numerical variables by categorising them, and it would not be wrong, but I can't see a need for it. The outcome of the chi² test would depend entirely on the class intervals chosen to create the categories, which could be adjusted to suit an outcome.
I am happy to be corrected on this and usually raise it at workshops because I find it interesting. My challenge to teachers is to produce example data where one test genuinely offers something the other doesn't (i.e. no correlation, but dependence). I am not saying it doesn't exist, but given the somewhat arbitrary nature of choosing class intervals I suspect that it doesn't, though I haven't ruled it out. Pragmatically, for mathematical studies I think it highly unlikely that students will successfully distinguish what each test contributes when both are done on the same data.

HOWEVER, this does not mean students can't use both. For example, a project might investigate literacy rates and involve a scatter graph of literacy against GDP, and then a chi² test to see whether literacy rate is dependent on continent. This is great. In this case the student has used both tests to investigate a theme involving literacy rates, but not on exactly the same data.
IF students have done both tests on the same data then I think it is difficult to mark, because the student would need to justify what one offers that the other doesn't in order for it to be considered relevant.

In summary, my advice is that students avoid doing both tests on the same data, but there is no official line that it can't be done. If you want to give students full marks then I would advise noting for the moderator where you see the difference. Otherwise I think the relevance can be questioned.'
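
To make the point about class intervals concrete, here is a minimal sketch in Python using made-up data (not from any student project): 61 points on the curve y = x², with x running from -30 to 30. The function names, the class boundaries and the data itself are all assumptions chosen purely for illustration. The sketch computes Pearson's r, then the chi² statistic (without any continuity correction) for two different ways of categorising x, and compares each with the usual 5% critical value.

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson's product-moment correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)

def contingency(xs, ys, x_class, y_class, n_x, n_y):
    """Count the points falling in each (x class, y class) cell."""
    table = [[0] * n_y for _ in range(n_x)]
    for x, y in zip(xs, ys):
        table[x_class(x)][y_class(y)] += 1
    return table

def chi_squared(table):
    """Chi-squared statistic for a table of observed frequencies."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical data: a symmetric curve, so there is no linear trend.
xs = list(range(-30, 31))
ys = [x * x for x in xs]

def y_class(y):
    # Assumed split for y: "low" below 100, "high" otherwise.
    return 0 if y < 100 else 1

def two_classes(x):
    # Choice A: negative x versus non-negative x.
    return 0 if x < 0 else 1

def three_classes(x):
    # Choice B: x < -10, -10 <= x <= 10, x > 10.
    if x < -10:
        return 0
    return 1 if x <= 10 else 2

r = pearson_r(xs, ys)
chi2_two = chi_squared(contingency(xs, ys, two_classes, y_class, 2, 2))
chi2_three = chi_squared(contingency(xs, ys, three_classes, y_class, 3, 2))

# 5% critical values: 3.841 for 1 degree of freedom, 5.991 for 2.
print(f"r = {r:.3f}")
print(f"two x classes:   chi2 = {chi2_two:.3f}, reject H0: {chi2_two > 3.841}")
print(f"three x classes: chi2 = {chi2_three:.3f}, reject H0: {chi2_three > 5.991}")
```

With this artificial data r is zero because the curve is symmetric. The two-class table stays below its critical value while the three-class table goes well above it, so the same data gives opposite chi² conclusions depending on the class intervals. A curved relationship like this is one candidate for "no correlation, but dependence", though a scatter graph would show the curve anyway.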

......... to be continued

