

EQAP Director Dr Michelle Belisle's Opening Remarks at the Field Trial Coding Session
Good morning and welcome to our PILNA 2021 Field Trial coding session. Today, and for the next couple of weeks, we will be working together to accomplish a number of key tasks that will help to ensure the quality and success of the fourth administration of the Pacific Islands Literacy and Numeracy Assessment, or PILNA as we generally call it, in 2021.
While many across the Pacific are familiar with PILNA, which was administered for the first time in 2012, very few people, comparatively speaking, have a sense of the magnitude of the work that goes into developing and delivering the assessment, which we often call the main study, every three years. Work on PILNA 2021 began in late 2019 with the item development workshop and continued through this year to get to the field trial. The main study will take place in 2021, and the results will then be analysed and reported to each individual country as well as regionally. One of the biggest challenges in a large-scale assessment such as PILNA is to generate a set of assessment items that are accessible to the lowest performing students and challenging to the highest performing students, while at the same time providing information about student proficiency across the strands and sub-strands of each domain. Our team have used item analysis on the data coming out of the three previous PILNA main studies to map the difficulty levels of the existing PILNA items, and the workshop in December was focused on filling gaps and enhancing the pool of items for 2021.
One key element of a large-scale assessment is to ensure that the items perform the way we expect them to. That means, if we have constructed an item to measure a particular skill, we want to make sure it measures that skill and not something else. For example, if we construct a word problem that is meant to measure a student’s ability to work with money and make change, we don’t want the complexity of the problem to take over, so that the item ends up testing something we will measure elsewhere in the assessment instead of the money concepts. We have a very good process to ensure that items are fit for purpose and behave as they are intended to behave, but the real test of those items can only come from the children who are meant to use them. That’s why we do a field trial in multiple classrooms across multiple countries – to know how real students respond to the items. Based on the outcomes of the field trial, we select our main study items from the pool of items that have performed well in the field trial. This ensures that the main study will be of the highest quality possible in terms of being fit for purpose and measuring the skills and concepts identified in the assessment framework.
You may wonder why I keep referring to our process as coding as opposed to scoring or marking. Scoring is what we are most familiar with as teachers – we look at students’ work and use crosses and ticks to show our judgement as to whether the responses are correct or incorrect. That provides valuable information in terms of how many questions students were able to answer correctly, and if we record the scores question by question, we can know which questions were most challenging to students and which questions they were able to handle the best. Coding student responses gives us all of that information and more. Coding allows us to capture more information about what students were able to do even if they weren’t able to come to a final answer in a question. It also allows us to capture the misconceptions and gaps in understanding that students are displaying through their responses. As teachers, we do this in our heads – most of us can likely think of a test or assignment we have marked where several students have made the same error – either a misunderstanding of the question or maybe demonstrating that they haven’t quite mastered a skill. In our classrooms, we would generally address that in our teaching – maybe pointing out the issue and re-teaching the concept when we take up the test, or reviewing it before the end-of-year testing. We don’t have to code and record the issues because we act on them almost immediately. In a large-scale assessment such as PILNA, we feed the information back to teachers and education ministries at an aggregated level many months after the assessment was administered. Coding allows us to provide information about what students struggled with, not only within a single classroom but across national and regional levels. That in turn can help to inform professional development efforts for pre-service and in-service teachers, curriculum reviews and development of interventions and supports for students.
In PILNA and other large-scale assessments, scoring is applied to the data after coding is complete. At that point, certain codes will be given a full score, others partial scores and many will get a score of zero. Coding doesn’t mean that students are getting marks or credit for incorrect responses. Many of the codes you apply will be given a score of zero, but the different codes help us sort out what the student did to get that wrong answer. For example, if we have a question about adding fractions like one half plus one quarter, there will be a code for the expected answer, three quarters, and then a separate code for the case where the student adds the numerators (one plus one is two) and also adds the denominators (two plus four is six). That student will write two sixths, which will get a score of zero, but the code will help us report back what proportion of students make this error, which shows they haven’t quite grasped how adding fractions works. That probably sounds quite confusing at the moment, but as we progress over the first few sessions of this week, we hope it becomes clearer.
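To make the distinction between coding and scoring concrete, the fractions example above can be sketched in a few lines of Python. This is only an illustration: the code labels, the score values, the function name and the sample responses are all invented for the sketch and are not the actual PILNA code book or data.

```python
from collections import Counter
from fractions import Fraction

# Hypothetical code book for the item "one half plus one quarter".
# Labels and score values are assumptions made for this illustration.
CODE_SCORES = {
    "correct": 1,            # expected answer, three quarters
    "added_num_and_den": 0,  # numerators added and denominators added
    "other_incorrect": 0,    # any other response
}

EXPECTED = Fraction(1, 2) + Fraction(1, 4)  # three quarters


def code_response(numerator, denominator):
    """Assign a code to a response written as numerator/denominator."""
    if (numerator, denominator) == (EXPECTED.numerator, EXPECTED.denominator):
        return "correct"
    if (numerator, denominator) == (1 + 1, 2 + 4):  # two sixths
        return "added_num_and_den"
    return "other_incorrect"


# Invented responses from five students, written as (numerator, denominator).
responses = [(3, 4), (2, 6), (2, 6), (1, 3), (3, 4)]

# Step 1: coding. Every response gets a code, whether right or wrong.
codes = [code_response(n, d) for n, d in responses]

# Step 2: scoring is applied afterwards, by looking up each code's score.
scores = [CODE_SCORES[c] for c in codes]

# The codes let us report how common each kind of response is.
counts = Counter(codes)
proportions = {code: counts[code] / len(codes) for code in CODE_SCORES}
```

In this sketch the two students who wrote two sixths and the student who wrote one third all score zero, but only the codes show that two of the five made the specific error of adding numerators and denominators.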
Your role in the field trial as coders is doubly important as we prepare for the main study in 2021. In the field trial we have just under 4000 student scripts from 13 countries, all in English. In the main study we will have almost 40 000 student scripts from 15 countries in 10 languages. Over the course of the next two weeks, you will participate in coder training and then apply that training as you code the field trial scripts from those many students. The accuracy with which you apply the codes will directly impact the quality of the information we get when we analyse the field trial data so we will do quite a bit of cross checking to ensure the codes are being applied consistently across the whole group of coders. What we are trying to do is, as much as possible, ensure that a particular student response gets the same code, no matter which coder has that paper. We believe that consistency can be reached if the training for coders is well-constructed, so that is the second part of your role here as field trial coders. We are trialling the training materials and processes here with you, to ensure that in 2021, when these sessions take place in all PILNA participating countries, the coders are supported to provide the same levels of consistent high-quality coding for those 40 000 papers as you will provide for the 4000 field trial papers. In essence, the same response for a question will receive the same code, regardless of the country, language or coder involved in reviewing the students’ work. It is that level of consistency and attention to quality that makes the PILNA results valid and reliable for the use of teachers and education ministries.
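One simple way to picture the cross checking described above is to have two coders code the same set of responses and count how often their codes match. The Python sketch below shows that idea only; the function name, the code labels and the sample data are invented for illustration, and it is not a description of the actual PILNA quality assurance procedure.

```python
def agreement_rate(codes_a, codes_b):
    """Return the share of responses that two coders gave the same code."""
    if len(codes_a) != len(codes_b):
        raise ValueError("Both coders must code the same set of responses")
    if not codes_a:
        raise ValueError("At least one coded response is required")
    matches = sum(a == b for a, b in zip(codes_a, codes_b))
    return matches / len(codes_a)


# Invented codes from two coders for the same four responses.
coder_1 = ["correct", "added_num_and_den", "other_incorrect", "correct"]
coder_2 = ["correct", "added_num_and_den", "added_num_and_den", "correct"]

# The coders disagree on the third response only.
rate = agreement_rate(coder_1, coder_2)
```

A rate close to 1 would suggest the codes are being applied consistently, while a lower rate would point to responses or code descriptions worth revisiting in training.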
As we move into the coder training and the coding, I ask that you actively engage with us in our efforts to make our training and coding processes as good as they can possibly be. In the PILNA 2021 main study, we will have approximately 400 coders across the region. The 19 of you here for the field trial are paving the way for those 400. If you have a question, feel confused or have a concern, please raise it. In our classrooms, we often tell students to raise their hand and ask because if they have a question, it’s likely that others have the same question but are too timid to ask. The same applies here but even more so – if even one of you has a question or concern, it is likely that at least one, if not more, of the 400 coders will have that same concern. The more we can do to surface those questions now, the better our training and processes will be in 2021.
In the same vein, we continue to enhance and improve PILNA to provide more information and data each cycle. To do that, we revise our processes to become more efficient and to build in quality assurance. Field trials allow us to try out those processes and improve them or, if necessary, fix them before the main study. At various points we will be asking you for your feedback on some of the processes and we hope that you will respond honestly – we really do appreciate constructive criticism as well as positive feedback.
We are all really excited about the coming days as we work together to code the PILNA 2021 field trial scripts. We look forward to working with you and learning from your experiences as coders as well as from the students’ work and hope that you will find the time both challenging and enjoyable.
Thank you.