VALIDITY
· refers to the accuracy of an assessment -- whether or not it measures what it is supposed to measure. Even if a test is reliable, it may not provide a valid measure.
| Type of Validity | Definition | Example |
| --- | --- | --- |
| Content | The extent to which the content of the test matches the instructional objectives. | A semester or quarter exam that only includes content covered during the last six weeks is not a valid measure of the course's overall objectives -- it has very low content validity. |
| Criterion | The extent to which scores on the test agree with (concurrent validity) or predict (predictive validity) an external criterion. | If the end-of-year math tests in 4th grade correlate highly with the statewide math tests, they have high concurrent validity (see the sketch below). |
| Construct | A construct is a property offered to explain some aspect of human behavior, such as mechanical ability, intelligence, or introversion; construct validity is the extent to which a test actually measures that construct. | Early self-esteem studies: self-esteem refers to a person's sense of self-worth or self-respect. Clinical observations in psychology had shown that people with low self-esteem often had depression. To establish the construct validity of a self-esteem measure, researchers showed that those with higher self-esteem scores had lower depression scores, while those with lower self-esteem scores had higher rates of depression. |
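To make the criterion row concrete: a concurrent-validity coefficient is simply the Pearson correlation between test scores and the external criterion. A minimal sketch in Python (the student scores are made up for illustration):

```python
import numpy as np
from scipy.stats import pearsonr

# Hypothetical scores for ten 4th-grade students: an end-of-year
# classroom math test and the statewide math test (made-up data).
classroom = np.array([62, 71, 55, 90, 78, 66, 84, 59, 73, 88])
statewide = np.array([58, 69, 50, 94, 75, 70, 80, 55, 71, 85])

# The concurrent-validity coefficient is the Pearson correlation
# between the test scores and the external criterion.
r, p = pearsonr(classroom, statewide)
print(f"Concurrent validity coefficient: r = {r:.2f} (p = {p:.3f})")
```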
| Factor Affecting Validity | Explanation |
| --- | --- |
| Nature of the group | The validity coefficient should be consistent across subgroups that differ in characteristics such as age, gender, or educational level. |
| Sample heterogeneity | A wider range of scores yields a higher validity coefficient; restricting the range deflates it (the range-restriction phenomenon). |
| Criterion-predictor relationship | There must be a linear relationship between predictor and criterion. Otherwise, the Pearson correlation coefficient would be of no use! |
| Validity-reliability proportionality | Reliability has a limiting influence on validity -- we simply cannot validate an unreliable measure! |
| Moderator variables | Variables like age, gender, and personality characteristics may help to predict performance for particular subgroups only -- keep them in mind! |
| Criterion contamination | Occurs when the criterion measure is itself influenced by the predictor scores (e.g., a rater who has seen them). Measure the contaminating influence, then correct for it statistically with partial correlation (see the sketch after this table). |
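A minimal sketch of the partial-correlation correction mentioned in the last row. The contamination model here is made up purely for illustration (a "halo" variable standing in for raters' knowledge of the predictor scores), not a real dataset:

```python
import numpy as np

def partial_corr(x, y, z):
    """First-order partial correlation between x and y, controlling for z."""
    r_xy = np.corrcoef(x, y)[0, 1]
    r_xz = np.corrcoef(x, z)[0, 1]
    r_yz = np.corrcoef(y, z)[0, 1]
    return (r_xy - r_xz * r_yz) / np.sqrt((1 - r_xz**2) * (1 - r_yz**2))

rng = np.random.default_rng(0)
n = 300
predictor = rng.normal(size=n)  # e.g., aptitude test scores
# Hypothetical contamination: raters scoring the criterion have seen
# the predictor scores, so their judgment partly reflects them.
halo = 0.8 * predictor + rng.normal(scale=0.6, size=n)
criterion = 0.3 * predictor + 0.7 * halo + rng.normal(scale=0.5, size=n)

raw = np.corrcoef(predictor, criterion)[0, 1]
corrected = partial_corr(predictor, criterion, halo)
print(f"raw validity r = {raw:.2f}; "
      f"after partialling out the halo r = {corrected:.2f}")
```

In this toy model the raw coefficient is inflated by the halo path; partialling the measured contaminant out leaves a much smaller, less biased validity estimate.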
RELIABILITY
· The degree of consistency between two measures of the same thing. (Mehrens and Lehman, 1987).
· The measure of how stable, dependable, trustworthy, and consistent a test is in measuring the same thing each time (Worthen et al., 1993).
| TYPES OF RELIABILITY | DEFINITION | EXAMPLE |
| --- | --- | --- |
| TEST-RETEST | Administering the same form of a test on two or more separate occasions to the same group of examinees. | Examinees may adapt to the test format and thus tend to score higher on later administrations (a practice effect), so careful implementation of the test-retest approach is strongly recommended. |
| EQUIVALENT FORM | Administering two different forms of a test, based on the same content, on one occasion to the same examinees. | An examinee who took Form A earlier could not share the test items with another student who might take Form B later, because the two forms have different items. |
| INTERNAL CONSISTENCY | The extent to which responses to the items of a single test or survey are consistent with one another, estimated from a single administration. | When no pattern is found in the students' responses, the test is probably too difficult and the students are just guessing randomly. |
| SPLIT HALF | A measure of consistency in which a test is split in two and the scores on the two halves are compared with one another. | Take a math test and divide its items into two halves. If you correlate scores on the first half with scores on the second half, the halves should be highly correlated if the test is reliable (see the sketch after this table). |
| INTER-RATER | When multiple raters assess the same subjects, the different raters should produce similar scores. | Two people may be asked to categorize pictures of animals as dogs or cats. A perfectly reliable result would be that they both classify the same pictures in the same way. |
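A minimal sketch of the split-half procedure described above, using the standard Spearman-Brown correction to step the half-test correlation up to full test length (the item scores are made up):

```python
import numpy as np

def split_half_reliability(items):
    """Split-half reliability with the Spearman-Brown correction.

    `items` is an (examinees x items) matrix of item scores.
    An odd-even split is used so both halves sample the whole test.
    """
    odd = items[:, 0::2].sum(axis=1)
    even = items[:, 1::2].sum(axis=1)
    r_halves = np.corrcoef(odd, even)[0, 1]
    # Spearman-Brown: projected reliability of the full-length test.
    return 2 * r_halves / (1 + r_halves)

# Made-up 0/1 item scores for six students on an 8-item math test.
scores = np.array([
    [1, 1, 1, 0, 1, 1, 0, 1],
    [0, 1, 0, 0, 1, 0, 0, 0],
    [1, 1, 1, 1, 1, 1, 1, 1],
    [0, 0, 1, 0, 0, 0, 1, 0],
    [1, 0, 1, 1, 1, 1, 0, 1],
    [0, 1, 0, 0, 0, 1, 0, 0],
])
print(f"Split-half reliability (Spearman-Brown): "
      f"{split_half_reliability(scores):.2f}")
```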
The relationship between validity and reliability
At best, we have a measure that has both high validity and high reliability. It yields consistent results in repeated application and it accurately reflects what we hope to represent.
It is possible to have a measure that has high reliability but low validity -- one that is consistent in getting bad information or consistent in missing the mark. It is also possible to have one that has low reliability and low validity -- inconsistent and not on target.
Finally, it is not possible to have a measure that has low reliability and high validity - you can't really get at what you want or what you're interested in if your measure fluctuates wildly.
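A small simulation makes this last point concrete. Under a simple classical-test-theory model (observed score = true ability + random error; all numbers here are illustrative assumptions), the validity coefficient a test can achieve is capped at roughly the square root of its reliability, so as reliability falls, the best attainable validity falls with it:

```python
import numpy as np

rng = np.random.default_rng(42)
n = 10_000
ability = rng.normal(size=n)                          # the trait we want to measure
criterion = ability + rng.normal(scale=0.5, size=n)   # external criterion

# As measurement error grows, test reliability falls, and the observed
# validity coefficient falls with it: an unreliable measure cannot
# correlate highly with anything, not even its intended target.
for error_sd in (0.1, 0.5, 1.0, 2.0):
    test = ability + rng.normal(scale=error_sd, size=n)
    reliability = 1 / (1 + error_sd**2)   # true-score variance / total variance
    validity = np.corrcoef(test, criterion)[0, 1]
    print(f"error sd {error_sd:>3}: reliability = {reliability:.2f}, "
          f"observed validity r = {validity:.2f}")
```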