Saturday, May 22, 2010

Validity and Reliability

Reliability is the consistency of a test in measuring something.

If you give the same test to the same student three times and the results are 80 - 65 - 90, the test is not consistent in measuring the student's ability. The test has low reliability.
If you give the same test to the same student three times and the results are 85 - 85 - 85, the test is consistent in measuring the student's ability. The test has perfect reliability.

Reliability is not black and white: you cannot simply say a test is reliable or not reliable. Instead, you express reliability in degrees: very low, low, moderate, high, very high, perfect. (Remember this from when we talked about correlation in the statistics classes?)

Methods of estimating reliability are:
  1. Test-retest: one test, administered twice.
  2. Equivalent form: two similar tests, administered once.
  3. Split-half: one test, administered once, split into two halves (e.g. odd-numbered vs. even-numbered items), and the two halves correlated.
  4. Internal consistency: one test, administered once, apply the KR-21 formula.
  5. Interrater reliability: one test, administered once, scored by two raters.
Which method should you use?
If you use an objective test in your skripsi, I suggest internal consistency (KR21).
If you use an essay test, I suggest internal consistency (Cronbach Alpha).
If you use a writing test, I suggest interrater reliability (Pearson r).
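If you want to see how these coefficients are actually computed, here is a minimal Python sketch of the KR-21 and Cronbach's alpha formulas. The score data below are made up purely for illustration:

```python
from statistics import mean, pvariance

def kr21(total_scores, n_items):
    """Kuder-Richardson 21: reliability estimated from total scores alone.

    Assumes dichotomous (right/wrong) items of roughly equal difficulty.
    total_scores = each student's number of correct answers.
    """
    k = n_items
    m = mean(total_scores)
    var = pvariance(total_scores)
    return (k / (k - 1)) * (1 - (m * (k - m)) / (k * var))

def cronbach_alpha(item_scores):
    """Cronbach's alpha from a students-by-items score matrix.

    item_scores[i][j] = score of student i on item j; essay items
    may carry any point range.
    """
    k = len(item_scores[0])
    item_vars = [pvariance([row[j] for row in item_scores]) for j in range(k)]
    total_var = pvariance([sum(row) for row in item_scores])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)

# Hypothetical data: 5 students' totals on a 20-item objective test
print(round(kr21([15, 12, 18, 10, 14], 20), 2))   # ≈ 0.44 (low)

# Hypothetical data: 5 students x 4 essay items
essays = [[4, 3, 5, 4], [2, 2, 3, 2], [5, 4, 5, 5], [3, 3, 4, 3], [4, 4, 4, 4]]
print(round(cronbach_alpha(essays), 2))           # ≈ 0.96 (very high)
```

Note that KR-21 needs only the total scores, which is why it is convenient for objective tests, while Cronbach's alpha needs the score on every item.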

Validity refers to the relevance of a test's content to what the test is supposed to measure.

If you use a math test to measure the students' English proficiency, the test is not valid.
If you use a reading test to measure the students' writing ability, the test has very low validity.
If you use a reading test to measure the students' reading ability, the test has very high validity.

Methods of estimating validity:
  1. Content validity: the test should representatively contain the items that are supposed to be measured. Map the content of your test against the content of the curriculum/syllabus/lesson plans. If they match, your test has high content validity.
  2. Criterion-related validity: the results of a test are related to the results of another test (a criterion). Correlate the results of your test with the results of a standardized test (the criterion). If the correlation is high, your test has high validity, too.
  3. Construct validity: the test should measure the construct well, e.g. a reading test should measure the construct 'reading comprehension', a grammar test should measure the construct 'grammar ability', etc.
Criterion-related validity can be measured in two ways:
  1. Concurrent validity: Administer your test and the standardized test at the same time, e.g. administer your test and the standardized test today.
  2. Predictive validity: Administer your test, then administer the standardized test after several months have passed, e.g. administer your test today, administer the standardized test next semester.
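Either way, the validity coefficient is simply the Pearson r between the two sets of scores. A minimal Python sketch, with made-up scores for illustration (this same function also works for interrater reliability, correlating two raters instead of two tests):

```python
from math import sqrt
from statistics import mean

def pearson_r(x, y):
    """Pearson product-moment correlation between two score lists."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

# Hypothetical data: 5 students' scores on your test and on a
# standardized test (the criterion)
my_test = [60, 75, 80, 55, 90]
criterion = [470, 480, 520, 430, 540]
print(round(pearson_r(my_test, criterion), 2))  # ≈ 0.95 (very high)
```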

Construct validity can also be measured in two ways:
  1. Pilot study: administer the test to two groups of subjects, one that has the construct and one that does not. If the former scores higher than the latter, the test has construct validity.
  2. Intervention study: administer the test to a group who does not have the construct as a pretest. Give treatment to the group by teaching the construct, then give them a posttest. If there's a significant difference between the results of pretest & posttest, the test has construct validity.
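For the intervention study, the usual way to check whether the pretest-posttest difference is significant is a paired-samples t-test. Here is a minimal Python sketch with made-up scores; you would compare the resulting t against the critical value for df = n - 1 from a t-table:

```python
from math import sqrt
from statistics import mean, stdev

def paired_t(pre, post):
    """Paired-samples t statistic for pretest vs. posttest scores."""
    diffs = [b - a for a, b in zip(pre, post)]
    n = len(diffs)
    return mean(diffs) / (stdev(diffs) / sqrt(n))

# Hypothetical data: 6 students before and after the construct is taught
pretest = [40, 35, 50, 45, 38, 42]
posttest = [65, 60, 72, 70, 58, 66]
print(round(paired_t(pretest, posttest), 2))  # a large t => significant gain
```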
Which method should you use?
If you use an objective test in your skripsi, I suggest content or concurrent validity.
If you use an essay or writing test, I suggest content validity.
