Vol. 48 No. 1, (4) Development of item banks for middle school and high school science aptitude tests, Ya-Ling Hou (pp. 97-128)


Rationale & Purpose: Because gifted students in Taiwan must be evaluated for scientific aptitude every year, several measures are needed to reduce the risk of leaked questions. The Programme for International Student Assessment, commissioned by the Organisation for Economic Co-operation and Development, emphasizes literacy in test design. One of the main goals of aptitude tests is to assess students’ ability to navigate a rapidly changing society. Measuring scientific aptitude includes evaluating the ability to explain phenomena scientifically, evaluate and design scientific enquiries, solve scientific problems, and interpret scientific data and evidence. Scientific aptitude has considerable explanatory power for understanding the academic potential and learning attitudes of students. This study constructed item banks (question banks) that can be developed sustainably for the Science Aptitude Test taken by junior and senior high school students. Additional content can be added to the item banks, and the items we developed were placed on the same scale using item response theory (IRT) so that they can be compared. Consequently, when gifted students are to be identified, appropriate items can be selected from the item bank for test design, and the item bank can be expanded through appropriate procedures in the future. Developing an item bank for gifted identification is good practice: it meets the need for such tools, supports appropriateness, fairness, and rigor throughout the identification process, and saves the cost of developing duplicate test items.

Methods: To determine the items for the Science Aptitude Test, we adopted three constructs: content knowledge, cognitive process, and scientific competency. All the items underwent strict content review, and the item banks were developed using horizontal equating based on IRT.
Furthermore, every item had three parameters: the a parameter (discrimination), the b parameter (difficulty), and the c parameter (guessing). To expand the Science Aptitude Test item bank for junior high school students, response data from 4,663 students across Taiwan were analyzed to establish a 140-item calibration test. Thereafter, 7 of the 140 items were selected for the anchor test. The anchor test (7 items) and several sets of new items (27 different items each) were combined to form new test forms. In total, 309 items were added. The three parameters for each item demonstrated good fit. For the high school Science Aptitude Test item bank, 3,702 high school students across the country participated to establish a 274-item calibration test. Only 10 of the 274 items were selected for the anchor test. The anchor test (10 items) and several sets of new items (40 different items each) were then combined to form new test forms. In total, 412 items were added. Similarly, the three parameters demonstrated good fit.

Results/Findings: The item-writing committee members were selected from the talent database on the basis of their subject expertise to ensure that the test items had high validity. A total of 72 committee members participated in writing items for the scientific aptitude tests. The correctness and appropriateness of the items were reviewed by four separate review teams with backgrounds in physics, chemistry, biology, and geology. Statistical tests revealed that the parameters for the anchor tests and aptitude tests (discrimination, difficulty, and guessing) all had good fit and indicated that the items were unidimensional. Thereafter, high school students were tested, and the correlation between their scores on aptitude tests drawn from the item bank and their semester scores in natural sciences indicated high criterion-related validity.
The test information curve indicated that the tests provide the maximum amount of information, and therefore the smallest measurement error, when the student’s ability is 1.1 standard deviations above the mean. Moreover, significant differences were observed between gifted students and other students on each item of the aptitude test, which further indicated good discriminant validity. Finally, 754 ninth-grade students served as the norming sample for establishing percentile ranks and normalized T-score norms on a 50-item test selected from the item bank.

Conclusions/Implications: After the literature review and several expert panel discussions, we adopted the three aforementioned constructs for scientific aptitude (content knowledge, cognitive process, and scientific competency). To reflect the 12-year national education system, interdisciplinary test questions were added in the final round alongside questions on the four subjects. The test questions in the question bank were developed by many test committee members at different times. To ensure the content validity and quality of the test questions, all the questions were first given to the review team, which checked whether they were consistent with the test structure and revised them accordingly. After the content knowledge revisions, test experts reviewed the questions against the principles of item writing. The response data of students from the fieldwork samples were consulted during revisions. This study used rigorous procedures to ensure that the questions in the question bank were of high quality. Because the Ministry of Education requires that the identification of gifted students be based on national norms, this study first established nationally representative samples for the middle school and high school Science Aptitude Test.
Future test committees can save labor and other related costs by conducting regional tests, which is conducive to obtaining the support of schools and expanding the question bank. Those managing the question bank must continually increase the number of test questions. So that the scores of each test are on the same scale and remain unaffected by the particular test takers and tests involved, the tests were equated using a common test. To reduce the exposure of the common test questions and improve the confidentiality of the questions in the item bank, the researchers selected a set of anchor tests covering each subject from the middle school and high school Science Aptitude Test for use in additional tests and calibrations in the future. With the addition of the common test, the two-form anchor test not only considers the validity of the aptitude test constructs but also ensures that the values of the three parameters (discrimination, difficulty, and guessing) are ideal.
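
The abstract does not state which linking method was used to place new test forms on the item bank scale through the anchor items. As one common possibility, the sketch below illustrates mean-sigma linking, where the difficulty estimates of the anchor items on both scales determine a linear transformation. All function names and numbers are hypothetical.

```python
from statistics import mean, pstdev

def mean_sigma_constants(b_anchor_bank, b_anchor_new):
    """Slope A and intercept B that map the new form's ability scale onto
    the bank scale, from anchor-item difficulties estimated on each scale."""
    A = pstdev(b_anchor_bank) / pstdev(b_anchor_new)
    B = mean(b_anchor_bank) - A * mean(b_anchor_new)
    return A, B

def rescale_item(a, b, c, A, B):
    """Express one 3PL item calibrated on the new form in bank-scale units.
    Difficulty is transformed linearly, discrimination is divided by the
    slope, and the guessing parameter is unchanged."""
    return a / A, A * b + B, c

# Hypothetical difficulty estimates of the same anchor items on both scales.
b_bank = [-1.0, 0.0, 1.0]
b_new = [-0.25, 0.25, 0.75]

A, B = mean_sigma_constants(b_bank, b_new)

# A hypothetical new item (a, b, c) from the new form, moved to the bank scale.
a_star, b_star, c_star = rescale_item(1.0, 0.5, 0.2, A, B)
print(f"A = {A:.2f}, B = {B:.2f}, rescaled item = "
      f"({a_star:.2f}, {b_star:.2f}, {c_star:.2f})")
```

Because only the anchor items appear on more than one form, the remaining items on each form are exposed less often, which is consistent with the confidentiality goal described above.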
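
The percentile and normalized T-score norms mentioned in the findings can be illustrated with a short sketch. It assumes the usual midpoint convention for percentile ranks and the conventional T scale (mean 50, standard deviation 10). The abstract does not describe the exact procedure, and the norming scores below are hypothetical.

```python
from statistics import NormalDist

def percentile_rank(score, norm_scores):
    """Percentile rank with the midpoint convention: the share of the norm
    group scoring below, plus half of those with exactly the same score."""
    below = sum(s < score for s in norm_scores)
    equal = sum(s == score for s in norm_scores)
    return 100.0 * (below + 0.5 * equal) / len(norm_scores)

def normalized_t(score, norm_scores):
    """Normalized T score: convert the percentile rank to a normal deviate,
    then rescale to mean 50 and standard deviation 10."""
    n = len(norm_scores)
    p = percentile_rank(score, norm_scores) / 100.0
    # Keep p strictly inside (0, 1) so the inverse normal CDF is defined.
    p = min(max(p, 0.5 / n), 1.0 - 0.5 / n)
    return 50.0 + 10.0 * NormalDist().inv_cdf(p)

# Hypothetical raw scores from a small norming sample.
norm_scores = [10, 20, 30, 40, 50]

for raw in (20, 30, 40):
    print(raw, percentile_rank(raw, norm_scores),
          round(normalized_t(raw, norm_scores), 1))
```

Unlike a linear T score, the normalized version depends only on rank within the norm group, so it forces the reported scores toward a normal shape even when the raw scores are skewed.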

