Discussion Methods for Fairness in Testing

In the unit readings from your Psychological Testing and Assessment text, you read about misconceptions regarding test bias and test fairness, two terms that are often incorrectly treated as synonymous. While questions regarding test bias have been addressed through technical means, issues of test fairness are tied to values. The text attempts to define test fairness in a psychometric context and provides eight techniques for preventing or remedying adverse impact on one or another group (see page 209). One of these techniques is the use of differential cutoffs. Furthermore, you were introduced to a variety of methods for setting cut scores. These methods are based on either classical test theory (CTT) or item response theory (IRT).

For this discussion, synthesize the information you learned about these two theories and respective methods. In your post:

Determine which one is preferable for responding to questions about a test’s fairness.
Identify at least two advantages and two disadvantages in using each theory, citing appropriate American Educational Research Association (AERA) standards from your readings.
Defend your preference in terms of the methods used within each theory and how they apply to concepts of fairness across groups. Essentially, how does it best address test fairness?
Describe how advances in technology are improving the process of test development and inclusion of appropriate items.
RESPONSE GUIDELINES
Respond to the posts of at least two other learners.

LEARNING COMPONENTS
This activity will help you achieve the following learning components:

Describe the characteristics of fair test items and procedures.
Define roles of technology in testing.
Apply writing and citation skills appropriate for doctoral-level learners.

Test fairness in language assessment has now been discussed by researchers for over a decade. Various definitions and formulations have been offered: the Standards from APA, AERA, and NCME (1999) for general educational measurement and assessment, and Kunnan (1997, 2000, 2004) for language assessment. However, the analytical methods that can help bring about fair tests and testing practice have not been clearly articulated. In this article, I present a brief overview of the conceptual framework of Kunnan’s Test Fairness Framework and of statistical analyses of test results that could be used to examine a few of the test fairness qualities. Due to space limitations, qualitative methods (such as content analyses, conversational analysis, and think-aloud reports) that are also useful for this purpose will not be discussed.

2 – Conceptual overview of the Test Fairness Framework
In earlier writings (Kunnan 2000, 2004), I presented an ethics-inspired rationale for my Test Fairness Framework (TFF) with a set of principles and sub-principles. The principles follow a mixed deontological system, which combines the utilitarian and deontological systems. Frankena suggests reconciling the two types of theories by accepting the notion of rules and principles from the deontological system, but without its rigidity, and by using the consequential or teleological aspect of utilitarianism, but without the idea of measuring goodness, alleviating pain, or bringing about the greatest balance of good over evil. Thus, two general principles, justice and beneficence, and their sub-principles are articulated as follows:

Principle 1: The Principle of Justice: A test ought to be fair to all test takers, that is, there is a presumption of treating every person with equal respect.

Sub-principle 1: A test ought to have comparable construct validity in terms of its test-score interpretation for all test takers.

Sub-principle 2: A test ought not to be biased against any test taker groups, in particular by assessing construct-irrelevant matters.

Principle 2: The Principle of Beneficence: A test ought to bring about good in society, that is, it should not be harmful or detrimental to society.

Sub-principle 1: A test ought to promote good in society by providing test-score information and social impacts that are beneficial to society.

Sub-principle 2: A test ought not to inflict harm by providing test-score information or social impacts that are inaccurate or misleading.

The TFF views fairness in terms of the whole system of a testing practice, not just the test itself. Therefore, multiple facets of fairness are implicated: multiple test uses (for intended and unintended purposes), multiple stakeholders in the testing process (test takers, test users, teachers, and employers), and multiple steps in the test development process (test design, development, administration, and use). Thus the TFF has five main qualities: validity, absence of bias, access, administration, and social consequences. Figure 1 presents the TFF within the circle of tests and testing practice, where validity is at the center of the framework and the other qualities, although they have distinct roles, overlap with validity. This is translated into Table 1, which presents the TFF as a linear list with the main quality and the main focus of each of the qualities.

Figure 1
Table 1. Test Fairness Framework
Here is a series of short descriptions for each of the test qualities presented in Table 1.

1. Validity: Validity of a test score interpretation can be used as part of the TFF when the following evidence is collected.

a. Content representativeness or coverage evidence: This type of evidence (sometimes simply described as content validity) refers to the adequacy with which the test items, tasks, topics, and language dialect represent the test domain.

b. Construct or theory-based validity evidence: This type of evidence (sometimes described as construct validity) refers to the adequacy with which the test items, tasks, topics, and language dialect represent the construct, theory, or underlying trait that is measured in a test.

c. Criterion-related validity evidence: This type of evidence (sometimes described as criterion validity) refers to whether the test scores under consideration correspond to criterion variables such as school or college grades, on-the-job ratings, or some other relevant variable.

d. Reliability: This type of evidence refers to the consistency of test scores across different testing occasions (described as stability evidence), across two or more different forms of a test (alternate form evidence), across two or more raters (inter-rater evidence), and in the way test items measuring a construct function (internal consistency evidence).
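Of the reliability evidence listed above, internal consistency is the most straightforward to compute from a single test administration. The following is a minimal sketch of one common index, Cronbach's alpha, chosen here only for illustration; the article itself does not prescribe a particular statistic. The function name `cronbach_alpha` and the response data are hypothetical.

```python
from statistics import variance


def cronbach_alpha(scores):
    """Cronbach's alpha for a score matrix (rows = test takers, columns = items)."""
    n_items = len(scores[0])
    if n_items < 2 or len(scores) < 2:
        raise ValueError("need at least two items and two test takers")
    # Sample variance of each item (column) across test takers.
    item_variances = [variance(column) for column in zip(*scores)]
    # Sample variance of each test taker's total score.
    total_variance = variance([sum(row) for row in scores])
    if total_variance == 0:
        raise ValueError("total scores do not vary; alpha is undefined")
    return (n_items / (n_items - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical data: four test takers, three dichotomously scored items.
responses = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
alpha = cronbach_alpha(responses)
print(f"Cronbach's alpha: {alpha:.2f}")
```

For fairness purposes, such an index could be computed separately for each test taker group and the values compared, since markedly lower consistency for one group may suggest that the items do not function in the same way for that group.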

2. Absence of Bias: Absence of bias in a test can be used as part of the TFF when the following evidence is collected.

a. Content or language: This type of bias refers to content, language, or dialect that is offensive to or biased against test takers from different backgrounds. Examples include content or language stereotypes of group members and overt or implied slurs or insults (based on gender, race and ethnicity, religion, age, native language, national origin, and sexual orientation), or a choice of dialect that is biased against some test takers.

b. Disparate impact: This type of bias refers to different performances and resulting outcomes by test takers from different group memberships. Such group differences (as defined by salient test taker characteristics such as gender, race and ethnicity, religion, age, native language, national origin, and sexual orientation) on test tasks and sub-tests should be examined for Differential Item/Test Functioning (DIF/DTF). In addition, a differential validity analysis should be conducted in order to examine whether a test predicts success better for one group than for another.

c. Standard setting: In terms of standard setting, test scores should be examined in terms of the criterion measure and selection decisions. Test developers and score users need to be confident that the appropriate measure and statistically sound and unbiased selection models are in use. These analyses should indicate to test developers and score users that group differences are related to the abilities that are being assessed and not to construct-irrelevant factors.
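The DIF examination mentioned under disparate impact can be carried out with several procedures; the article does not name one. As a hedged illustration, the sketch below uses the Mantel-Haenszel common odds ratio, a widely used DIF statistic, which compares the odds of answering an item correctly for a reference and a focal group after matching test takers on total score. The function names, the group labels `"ref"` and `"focal"`, and the records are all hypothetical.

```python
import math
from collections import defaultdict


def mantel_haenszel_odds_ratio(records):
    """Common odds ratio for one item.

    records: iterable of (group, matching_score, correct), where group is
    "ref" or "focal" and correct is True/False for the studied item.
    """
    # One 2x2 table per matching-score stratum: [n_correct, n_incorrect] per group.
    tables = defaultdict(lambda: {"ref": [0, 0], "focal": [0, 0]})
    for group, score, correct in records:
        tables[score][group][0 if correct else 1] += 1

    numerator = denominator = 0.0
    for table in tables.values():
        a, b = table["ref"]    # reference group: correct, incorrect
        c, d = table["focal"]  # focal group: correct, incorrect
        n = a + b + c + d
        numerator += a * d / n
        denominator += b * c / n
    if denominator == 0:
        raise ValueError("odds ratio is undefined for these data")
    return numerator / denominator


def mh_d_dif(odds_ratio):
    """Rescale the odds ratio to the delta metric (-2.35 * ln(alpha))."""
    return -2.35 * math.log(odds_ratio)


# Hypothetical records: (group, total score used for matching, item correct?).
records = (
    [("ref", 10, True)] * 6 + [("ref", 10, False)] * 2
    + [("focal", 10, True)] * 3 + [("focal", 10, False)] * 3
    + [("ref", 20, True)] * 8 + [("ref", 20, False)] * 1
    + [("focal", 20, True)] * 5 + [("focal", 20, False)] * 2
)
ratio = mantel_haenszel_odds_ratio(records)
print(f"MH odds ratio: {ratio:.2f}, MH D-DIF: {mh_d_dif(ratio):.2f}")
```

An odds ratio near 1 (a delta value near 0) suggests that matched test takers from the two groups perform similarly on the item, while values far from 1 flag the item for content review. A flagged item is not necessarily biased; whether the difference reflects construct-irrelevant factors remains a matter for judgment.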

3. Access: Access to a test can be used as part of the TFF when the following evidence is collected.

a. Educational access: This refers to whether a test is accessible to test takers in terms of opportunity to learn the content and to become familiar with the types of tasks and cognitive demands.

b. Financial access: This refers to whether a test is financially affordable to test takers.

c. Geographical access: This refers to whether a test site is accessible in terms of distance to test takers.

d. Personal access: This refers to whether a test offers certified test takers with physical and learning disabilities appropriate test accommodations. The 1999 Standards and the Code (1988) call for accommodation so that test takers who are disabled are not denied access to tests that can be offered without compromising the construct being measured.

e. Conditions or equipment access: This refers to whether test takers are familiar with test-taking equipment (such as computers), procedures (such as reading a map), and conditions (such as using planning time).

4. Administration: Administration of a test can be used as part of the TFF when the following evidence is collected:

a. Physical conditions: This refers to appropriate conditions for test administration, such as optimum light, temperature, and facilities, as relevant for administering tests.

b. Uniformity: This refers to administering the test exactly as required, so that there is consistency across test sites and equivalent forms, and to test manuals or instructions that specify such requirements. Examples include uniformity in test length, materials, and any other conditions (for example, planning or no-planning time for oral and written responses) so that test takers (except those receiving accommodations due to disability) receive the test under the same conditions.

c. Test security: This refers to issues of breach of security of test materials or test administration. Examples include fraud, misrepresentation, cheating, and plagiarism.

5. Social consequences: The social consequences of a test can be used as part of the TFF when the following evidence is collected:

a. Washback: This refers to the effect of a test on instructional practices, such as teaching, materials, learning, and test-taking strategies.
