Jordan Dreyer, Cleveland State University, USA
Dreyer, J. (2014). The effect of computer-based self-access learning on weekly vocabulary test scores. Studies in Self-Access Learning Journal, 5(3), 217-234.
This study sets out to clarify the effectiveness of using an online vocabulary study tool, Quizlet, in an urban high school language arts class. Previous similar studies have mostly dealt with English Language Learners in college settings (Chiu, 2013), and were therefore not directed at the issue of self-efficacy that is at the heart of the problem of urban high school students in America entering remedial writing programs (Rose, 1989). The study involved 95 students over the course of 14 weeks. Students were tested weekly and were asked to use the Quizlet program in their own free time. Because participation was optional, many students did not use the treatment and therefore acted as a self-selected control group. The data collected show a strong correlation between the use of an online vocabulary review program and short-term vocabulary retention. The study also showed that students who paced themselves and spread out their study sessions outperformed those who used the program only for last-minute “cram sessions.” The implication is that students who take advantage of tools outside the classroom are able to outperform their peers. The results are also in line with the call to include technology in the Basic Writing classroom not simply as a tool, but as a “form of discourse” (Jonaitis, 2012). Weekly vocabulary tests, combined with the daily online activity reported by Quizlet, show that: 1) using the review software improved the scores of most students, 2) students who used Quizlet to review more than once (i.e., on several days before the test) outperformed those who used it only once, and 3) students who professed proficiency with the “notebook” system of vocabulary learning appeared not to need the treatment.
Keywords: vocabulary, online study, self-access, test
Much of the research on technology and vocabulary learning has been conducted within the field of English as a Foreign Language (EFL) teaching. In reaction to a perceived lack of innovation within EFL during the 1990s, research into new vocabulary learning strategies has been pursued in earnest, especially in Taiwan and China. Other new research-based learning strategies that also employ technology to tailor the learning process to individual students include ‘bottom-up inductive learning’ and ‘self-regulated learning’ (Guan, 2013; Mizumoto, 2012). In the Guan study, which focuses on English vocabulary learning at Chinese universities, researchers used an online ‘corpus’ of authentic English texts and then invited students to independently download and analyze content in chunks in order to define new terms on their own. This procedure, which the author calls Data-Driven Learning, is more popular in college and research university EFL programs than in Chinese high schools, but teachers are encouraged to use the technique at all levels to promote computer-based self-access learning (SAL) and increase student self-efficacy. In the Mizumoto study, a group of 281 Japanese university EFL learners was asked to rate themselves on a three-level self-efficacy scale before taking a vocabulary test. Based on a strong correlation between high self-efficacy and the presence of valuable metacognitive learning strategies, Mizumoto concludes that self-efficacy enhancement is an important component of vocabulary learning and teaching (Mizumoto, 2012). Both the Guan and Mizumoto studies recommend tools that put the task of vocabulary acquisition in the learner’s hands, called Data-Driven Learning in the former and “vocabulary learning strategies” in the latter. These methods were shown to be effective in increasing student self-efficacy and long-term vocabulary retention.
In a 2013 study by Hirschel and Fritz, “Learning vocabulary: CALL program versus vocabulary notebook,” the authors found that traditional notebook methods of learning vocabulary do not take advantage of such technology-based advances in memorization. The study was performed with 140 first-year Japanese university students divided into three groups: a control group that received no intervention, a group that used the vocabulary notebook method, and a group that used Computer-Assisted Language Learning (CALL). Based on the results, which favored CALL for long-term retention, the authors caution educators against the continued use of notebooks to learn vocabulary. Instead, they advise implementing a variety of CALL programs, with special focus on learner motivation.
Games have been of growing interest to educators for years because they offer a learner-centered approach and increase student buy-in (Garris, Ahlers, & Driskell, 2002). More specifically, games lower the learner’s affective filter, as shown by a recent seven-week study involving Malaysian secondary school students from a “semi-urban” setting (Letchumanan & Hoon, 2012). The affective filter, according to the authors of the study, is usually a factor in blocking any long-term, post-assessment retention of knowledge in a “non-coercive” environment (Letchumanan & Hoon, 2012). It has also been proposed (Chiu, 2013) that this same affective filter has been created, reinforced, and manipulated by repetitive testing. One effect of this testing pedagogy is that attempts to prove the efficacy of games in education have been placed at a disadvantage. That is, vocabulary learning has been “exam-oriented” and “drill-based” for so long that the relative ease of playing games does not seem high-stakes enough for the average student (Chiu, 2013, p. 54). It should be noted, though, that the students in these studies are predominantly Asian post-secondary students, who might feel a great deal more pressure, or test-related stress, than the typical American high school student.
Another study, on the use of technology to help elementary school-age children acquire the proper ‘base-level’ vocabulary, helped to popularize computer-response activities (Labbo, Love, & Ryan, 2007). The study involved 85 kindergarteners from the lowest-SES school in a district in the southeastern United States, who took part in what Labbo et al. called a “vocabulary flood” instructional cycle that included constant use of a computer to record and re-present student-created content (p. 582). The study showed that students who enter school with a smaller vocabulary need a great deal more exposure to new terms before those terms are acquired. Technology can play an important role at the earliest stages, but the greatest gains can be seen in older, high school-age students, who seem to have less difficulty navigating the technology (Chiu, 2013). The Chiu study was a meta-analysis of studies drawn from five sources: Chinese Periodical Index, Dissertation and Thesis Abstract System of Taiwan, IEEE Xplore, ERIC, and Google Scholar. These studies, which collectively represent 1,684 students from all levels, were conducted in Taiwan, Turkey, Spain, Arabia, France, Japan, Hong Kong, Korea, and China. The results show that high school and college students respond to computer-based learning more efficiently than elementary-age students, and that digital game-based learning (DGBL) seems to have a smaller effect size than digital learning without games. This, Chiu points out, may be due to the fact that students are still consistently taught vocabulary (not to mention writing and reading) using a highly coercive, exam- and drill-focused pedagogy, which Prensky (2001, p. 72) calls “tell-test.” The need to introduce technology into the classroom is therefore most crucial where it has received the least attention, and sometimes even negative attention (Obringer & Coffey, 2007).
The main push for this research into computer-based self-access vocabulary learning has come from Asia, where most of the world’s English language learning takes place. The solutions that have come out of these studies, namely that students need more opportunities to learn independently and that the technology created to facilitate this learning needs to find its way into students’ hands (Chiu, 2013; Letchumanan & Hoon, 2012; Mizumoto, 2012; Guan, 2013), have not yet been applied across the board within the American urban high school. One case in which computer-based learning proved more effective than traditional methodology was a reading comprehension study involving 145 students from nine 10th grade literature classrooms in a large urban public high school of approximately 2,200 students near Atlanta, Georgia (Cuevas, Russell, & Irving, 2012). In the study, Independent Silent Reading (ISR) done with a computer program was shown to be more effective than reading from a traditional textbook. The study’s authors point to the particular difficulties urban students face in accessing “conducive environments,” difficulties that technology can help to circumvent (p. 446). This outcome, according to the authors, emerged from the “pronounced increase in … motivation” shown by the students who used computer modules (p. 460). This aligns with the idea that computer-based SAL can help to motivate modern students (Howard, Ellis, & Rasmussen, 2004).
The present study connects much of the research that has been done in Asia with computer-based learning that has been done in America. It also features a large enough sample size and a long enough period to produce valid data on the use of computer-based SAL in an urban high school. My research question is whether or not the use of computer-based SAL can work as an effective review for weekly vocabulary tests.
The study was performed at a selective-admission high school in a low-performing urban school district from February to May 2014. The students were of low socioeconomic status, with 100% of enrolled students qualifying for free lunch; 90% of the students are African American, 5% Caucasian, 3% Hispanic, and 2% Asian/Pacific Islander. The 96 students taking part in the study came from three different classes: a 10th grade English Language Arts course, English 2 (E2); a 12th grade regular-level English Literature course, English 4 Block 1 (E4.1); and a 12th grade Advanced Placement English Literature course, English 4 Block 2 (E4.2). Whereas the E2 class was representative of the school as a whole, the two 12th grade classes each contained one Caucasian student, with the rest of the students being African American. The female-to-male ratio was close to 6:4. Two students were on Individual Education Plans for reading disabilities and two were English Language Learners.
The study involved the use of three instruments: a weekly vocabulary test, a post-treatment survey, and the Quizlet website. The weekly tests each included ten new terms. Students were given 30 minutes to complete the tests and were allowed to re-take tests at a 10% penalty. There were a total of 12 weekly tests for the study period (an example test is provided in Appendix A).
The survey consisted of a questionnaire that was filled out halfway through the study period, after the 9th weekly test. This questionnaire contained four short answer questions and three Likert scale questions which were developed for the study (see Appendix B). These survey questions were aimed at gaining constructive feedback from the students and took the form of a Quizlet product evaluation.
Quizlet is a website used internationally for vocabulary review in all subjects and at all levels of education. Created in 2005 by Andrew Sutherland, then a high school student, to help him study French vocabulary, the website hosts and shares user-created virtual flashcard lists. A Teacher’s Membership portal allows teachers to create and track Classes, in which students can easily find all vocabulary lists for a particular subject.
Student activity on Quizlet was recorded using Quizlet’s Teacher Information toolset. Vocabulary sets were added every Sunday, giving students a 5-day window in which to study for the Friday test. The details that the Quizlet instrument recorded include the number of times each of the 5 ‘games’ was played, when the games were played during the week, and whether a student had ‘mastered’ a game, either by answering every question flawlessly (Flashcards, Speller, and Learn) or by reaching a target speed (Space Race, Matching). The instrument also reported whether a student had used a mobile device or a PC to access the program, and which words students were struggling with each week. This information was used to categorize Quizlet review activity into four levels: 0 (no review), 1 (minimal review), 2 (moderate review), and 3 (complete review). In addition, Quizlet review activity was divided into three periods: “E” for early (review on Monday or Tuesday), “M” for midweek (review on Wednesday or Thursday), and “L” for late (review on Friday morning, just before the test). For example, if a student reviewed for five minutes just after receiving the vocabulary list on Monday, she would have a “1E.” If she reviewed again on Thursday and mastered all of the Quizlet activities, she would have a “3M,” and if she took a quick look at her phone just before the test, she would be given a “1L.” Her total score for the week would be 5.
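The weekly coding scheme above can be expressed as a short sketch. It assumes that each review session has already been assigned a level from 0 to 3 by the researcher (the paper gives no numeric cut-offs for “minimal,” “moderate,” or “complete” review, so none are encoded here), and the `ReviewSession` and `code_week` names are hypothetical. The day-to-period mapping follows the study: Monday and Tuesday are early, Wednesday and Thursday midweek, Friday late.

```python
from dataclasses import dataclass

# Time-of-week periods as described in the study:
# E = Monday/Tuesday, M = Wednesday/Thursday, L = Friday morning before the test.
PERIOD_BY_DAY = {
    "Mon": "E", "Tue": "E",
    "Wed": "M", "Thu": "M",
    "Fri": "L",
}


@dataclass
class ReviewSession:
    day: str    # hypothetical field: abbreviated weekday, e.g. "Mon"
    level: int  # 0-3, assigned by the researcher (0 = no review, 3 = complete)


def code_week(sessions):
    """Return (codes such as '1E', weekly total) for one student-week."""
    codes = []
    total = 0
    for s in sessions:
        if s.day not in PERIOD_BY_DAY:
            raise ValueError(f"day outside the Monday-Friday window: {s.day}")
        if not 0 <= s.level <= 3:
            raise ValueError(f"review level must be 0-3, got {s.level}")
        if s.level == 0:
            continue  # level 0 means no review, so nothing is recorded
        codes.append(f"{s.level}{PERIOD_BY_DAY[s.day]}")
        total += s.level
    return codes, total


week = [ReviewSession("Mon", 1), ReviewSession("Thu", 3), ReviewSession("Fri", 1)]
print(code_week(week))  # (['1E', '3M', '1L'], 5)
```

Applied to the example in the text, a level-1 Monday review, a level-3 Thursday review, and a level-1 Friday review yield the codes 1E, 3M, and 1L and a weekly total of 5. Days outside the Monday-to-Friday window raise an error by design; weeks shifted by snow days would need a different mapping.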
The procedure of the study was divided into two parts. First, students were taught how to access Quizlet on their mobile devices and on a PC. Students were brought into the computer lab twice to make sure they had all signed up for Quizlet accounts. Students were then mildly incentivized with the offer of extra credit for using the treatment to study. The students were never forced to use Quizlet, but without an incentive the proportions of users and non-users would likely have been too unbalanced. In addition, the goal of this study is to measure the effect of SAL, which does not involve compulsion.
The second part of the procedure comprised a series of 12 weekly vocabulary tests with terms taken from various SAT word lists and root words from Membean.com. Students were given these words every Monday and tested on them every Friday with no class time devoted to review. Instead, students were encouraged to study the words on Quizlet, where interactive flashcards containing definitions, variations, pictures, and example sentences had been added.
The data collected were analyzed in three ways. First, vocabulary test scores were compared with students’ use of Quizlet to examine the correlation between computer-assisted vocabulary review and performance on weekly tests. Second, student responses to the questionnaire were compared with evidence from Quizlet to show the relationship between treatment and the likelihood of future use. Finally, the study investigated the timing of students’ Quizlet review, i.e., whether a student reviewed once or on multiple occasions and when during the week the review was done (just after receiving the words, midweek, and/or just before the test), by comparing the aggregate test scores of these categories over time. This attention to study habits, made possible by Quizlet’s reporting of student activity, enabled a more nuanced understanding of high school students’ use of computer-based SAL.
First, the primary research question, whether or not Quizlet works as an effective review for weekly vocabulary tests, can be addressed by comparing the number of times each student reviewed with his or her average test score (see Figure 1). The sample was divided into the three classes that took part in the study. While the E4.2 group had a much higher average number of times reviewed, the relationship between review and test score was about the same as for the E2 group: on average, each additional visit to the Quizlet site was associated with an increase of about 3 percentage points on the weekly tests (3.1 points for E4.2 and 2.6 points for E2). The E4.1 group showed only a slightly positive correlation, with each site visit translating into only 0.8 additional percentage points.
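The per-visit figures above read like the slope of a least-squares trend line fitted to each class’s data in Figure 1. The paper does not state how they were computed, so the sketch below assumes ordinary least squares, with number of visits as the predictor and average test score (in percent) as the outcome. The `ols_slope` name and the sample numbers are illustrative, not the study’s data.

```python
def ols_slope(x, y):
    """Ordinary least-squares slope of y on x (change in y per unit of x)."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x has no variance; slope is undefined")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / sxx


# Hypothetical class: visits to the site vs. average weekly test score (%).
visits = [0, 2, 4, 6]
scores = [70, 76, 82, 88]
print(ols_slope(visits, scores))  # 3.0 percentage points per visit
```

A slope of 3.0 here would mean each additional visit is associated with about 3 more percentage points on the weekly test; like the figures in the study, it describes an association rather than a cause.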
Figure 1. Computer-Based Self-Access Review Compared to Test Score
Test scores over time were examined in two ways in order to show the effect of Quizlet review. First, students were divided into two groups: those who accessed Quizlet at least 11 times during the study period (40 “Quizlet students”) and those who accessed Quizlet fewer than 11 times (51 “Non-Quizlet students”). The 12 weekly score averages for the two groups were then plotted together over time (see Figure 2).
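The grouping and averaging described above can be sketched as follows. The threshold of 11 accesses comes from the study; the record layout (a total access count plus a list of weekly scores, with `None` marking a missed test) and the function names are hypothetical, and the paper does not say how missing scores were handled.

```python
from statistics import mean

THRESHOLD = 11  # accesses over the study period that separate the two groups


def split_groups(students, threshold=THRESHOLD):
    """Split {student_id: record} into (Quizlet, Non-Quizlet) groups."""
    quizlet = {sid: r for sid, r in students.items() if r["accesses"] >= threshold}
    non_quizlet = {sid: r for sid, r in students.items() if r["accesses"] < threshold}
    return quizlet, non_quizlet


def weekly_averages(group, n_tests=12):
    """Average score per test across a group, skipping missing (None) scores."""
    averages = []
    for t in range(n_tests):
        scores = [r["scores"][t] for r in group.values() if r["scores"][t] is not None]
        averages.append(mean(scores) if scores else None)
    return averages


# Illustrative records only (two tests instead of twelve), not study data.
students = {
    "A": {"accesses": 15, "scores": [80, 90]},
    "B": {"accesses": 3, "scores": [70, None]},
    "C": {"accesses": 11, "scores": [90, 80]},
}
quizlet, non_quizlet = split_groups(students)
print(weekly_averages(quizlet, n_tests=2))
print(weekly_averages(non_quizlet, n_tests=2))
```

With these records, students A and C (15 and 11 accesses) fall into the Quizlet group and average 85 on each test, while student B forms the Non-Quizlet group, whose second-test average is `None` because the only score is missing.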
Figure 2. Average Test Score Over Time by Group
The plot lines show that, with two exceptions, the Quizlet students consistently outperformed the Non-Quizlet students. The combined average test score for Quizlet students over the entire study was 82%, while the Non-Quizlet students had a combined average of 79%. Students in the Quizlet group scored higher and had less score variance than students in the Non-Quizlet group. The next figure shows the number of times each group accessed Quizlet during each week (see Figure 3). Quizlet students are represented by the lighter bars and Non-Quizlet students by the darker bars.
Figure 3. Amount of Review Over Time by Group
Whereas the Quizlet students continued to review on Quizlet a combined total of at least 40 times per week, the Non-Quizlet students used the site in large numbers only during the first few weeks. Accordingly, there was very little correlation between the amount of review and the test scores of the Non-Quizlet group. This is best explained by the first two data points, tests 1 and 2, during which many students first tried out Quizlet (1t) and then decided not to use it (2t). This may also reflect an adjustment to the test format, although the test used was very similar to tests students had taken previously with traditional, teacher-led midweek vocabulary review. Judging by the consistently low amount of Quizlet use by the Non-Quizlet group, the dramatic rise and fall of their scores may instead be attributed to a weekly reaction to the previous week’s results: if the group scored poorly on one test, it tended to rally and achieve a higher score on the next. The same pattern appears, less pronounced, in the Quizlet group: the large score swings of the Non-Quizlet group between weeks 6 and 7 and between weeks 10 and 11 are echoed to a lesser degree in the Quizlet group. Even though the Quizlet group scored higher, this graph reveals several instances where the number of times reviewed does not correlate with the average test score. Weeks 9 through 12, for example, show a steady increase in test scores for the Quizlet group, while the number of times reviewed rose and fell irregularly. Still, since averaged test scores do show a positive correlation with the number of times reviewed (see Figure 1), other factors likely explain why some review was less effective.
The times of each student’s Quizlet review, as mentioned above, were collected into data arrays, which were then divided into groups. First, the overall effect of time of review can be shown by comparing all students’ test scores with the time of the week they reviewed (see Figure 4). In general, both early and late review were positively correlated with higher test scores, and midweek review showed the strongest positive correlation. The Non-Quizlet group, though, had a somewhat different outcome. For this group, both midweek and late review were associated with higher test scores, while early review was negatively correlated with test scores. This can again be attributed to the first two data points from the earlier graphs, because most students tried Quizlet out on the first day of the study. In addition, of the 161 times the Non-Quizlet group accessed the website, only 14 were early in the week (see Figure 2). In total, students accessed the website 887 times: 134 early, 376 midweek, and 377 late.
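Per-period counts like the early, midweek, and late totals above can be tallied from an access log. The sketch below assumes a hypothetical log of `(student_id, day)` pairs, one per site visit, and uses the study’s day-to-period mapping; the log format and function name are illustrative and are not Quizlet’s actual export format.

```python
from collections import Counter, defaultdict

# Study's periods: E = Monday/Tuesday, M = Wednesday/Thursday, L = Friday.
PERIOD_BY_DAY = {"Mon": "E", "Tue": "E", "Wed": "M", "Thu": "M", "Fri": "L"}


def tally_by_period(access_log):
    """Count accesses per period overall and per student.

    access_log: iterable of (student_id, day) pairs, one per site visit.
    An unknown day raises KeyError rather than being silently dropped.
    """
    overall = Counter()
    per_student = defaultdict(Counter)
    for student, day in access_log:
        period = PERIOD_BY_DAY[day]
        overall[period] += 1
        per_student[student][period] += 1
    return overall, per_student


# Hypothetical log, not study data.
log = [("s1", "Mon"), ("s1", "Wed"), ("s2", "Thu"), ("s2", "Fri"), ("s2", "Fri")]
overall, per_student = tally_by_period(log)
print(dict(overall))  # {'E': 1, 'M': 2, 'L': 2}
```

Per-student tallies of this kind are what would be set against average test scores in a comparison like Figure 4. As a consistency check on the reported totals, 134 + 376 + 377 = 887.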
Figure 4. Effect of Review Time on Average Test Scores by Group
Some interesting conclusions can also be drawn regarding the study habits of the different classes (see Figure 5). Each graph compares students’ average test scores with the number of times they reviewed during each time of the week. While the sophomore class exhibited a positive correlation for all review times, both senior classes show a negative correlation between test scores and late review.
Figure 5. Effect of Review Time on Average Test Scores by Course
The final data source, the surveys, revealed two further correlations. First, one graph in Figure 6 shows the relationship between students’ use of Quizlet and their answers to the third survey question, which rated ease of use on a scale from 0 (hard) to 5 (easy). There is a slightly negative correlation between the number of times a student reviewed with Quizlet and that student’s reported ease of use. Although heavier use was associated with slightly more frustration, the other graph in Figure 6 shows a positive correlation between the number of times a student used Quizlet and the likelihood that the student would use Quizlet in the future.
Figure 6. Survey Results compared with Times Reviewed
Written survey results revealed mixed reactions to the transition from teacher-led classroom review to self-access review. Of the 64 students who responded to the survey, 18 said they preferred to study vocabulary in class. These students favored “Wednesday reviews,” “in-class reviews,” and “Powerpoints!” The second group consisted of 13 students who preferred the classic notebook study method. The notebook group favored “writ[ing] the words over and over,” “making my own flashcards,” and “writing the words and definitions over again until I get it right.” The third and largest group, 23 of the 64 students, preferred using Quizlet over any other vocabulary review method. Students in the Quizlet group mentioned the use of pictures, the convenience, and the variety of games. The app’s Scatter activity, which times how quickly the user can match words to their definitions, received the most praise, followed by its Learn activity, which has learners type in words after being shown the definition. Some students admitted using the Quizlet app only for “last minute” study. A final group of only seven students indicated that they preferred to use Quizlet, the notebook study method, and in-class reviews all together. Generally, students had a positive reaction to Quizlet. Even those who did not use the app or website admitted that it was because “I forgot,” or “I don’t study.” Only two students criticized the program directly: one student deemed it “too technical” and another wrote the word “glitches.” Six students asked for “more games” and one student suggested that Quizlet send out “study reminders.” The majority of the students, when asked how Quizlet might be improved, wrote “I don’t know,” “it’s already a great app,” or “if it ain’t broke don’t fix it.”
The uncontrollable limitations of this study include the fact that several of the weekly tests were off schedule due to snow days. While these disruptions did not seem to affect the difference in test scores between students who did and did not use Quizlet, they did weaken the time-of-review analysis. Whenever a regular week of vocabulary was broken up by weather, I treated the first two days as “early” and the last two days as “late,” so the “midweek” period was stretched out and slightly exaggerated; but because students usually did not study any more than normal, the effect of the disruptions was minimal. In addition, the Ohio Graduation Test occurred during the fourth week of the study, interrupting the cycle of weekly testing. This had no noticeable effect on the test scores, but the interruption may have slowed the pace of the study.
The nature of this study lends itself well to larger samples, so the small sample of 96 students is the first limitation that future work should address. Future versions of this computer-based self-access vocabulary review study should use larger and more diverse samples. Because the Quizlet website reports such a large amount of data, collecting and comparing this data should be relatively simple regardless of sample size. Also, Quizlet is simple to use and requires very little intervention on the part of the teacher, which should help reduce variation based on location, socioeconomic status, and quality of teaching. The initial presentation of the words can be done by the teacher using Quizlet’s flashcard feature, or it can be completely self-access, with students opening the flashcards on their own at the beginning of the week or as they see fit.
Another limitation of the study was the low level of voluntary participation. The study was designed so that students would be free to choose between using Quizlet and using their own review methods. Many students already comfortable with the notebook method tried Quizlet once and then went back to their old review method. Many other students elected not to review their vocabulary. This was due in part to the nature of the school these students were attending: homework was not a part of the English curriculum and weekly vocabulary reviews had been directed by the teacher. The students in the sample were therefore unused to self-access learning in any form and were at a loss when given a list of words and told to “learn them.” Of course, this is the very academic helplessness in urban schools that the study is targeting, so it was an expected limitation. Future studies in similar urban settings should therefore take into account students’ general lack of study skills and provide pre-study scaffolding to familiarize students with computer-based self-access study.
The final limitation of the study was that it was not fun. While this may seem like a minor quibble, it is in fact central to the success of self-access learning. The largest number of complaints recorded in the survey had to do with Quizlet’s lack of “good games” and students’ desire for the above-mentioned teacher-directed Wednesday reviews, which often involved games as well. Quizlet offers three activities that could be considered games, but these cannot replace in-class review games in terms of excitement. While the ultimate goal of the study is to see whether vocabulary might be taken out of the classroom completely, part of this process should include making vocabulary fun or, as Freeman and Freeman (2004) put it, turning students into lexiphiles. One way to do this might be to host weekly Space Race tournaments, during which students would be given a set time period to try to reach as high a score as possible in Quizlet’s Space Race activity. Students might also be encouraged to research and create their own vocabulary lists, comment on and add to each other’s lists, or find interesting Quizlet lists by searching the site. These sorts of acclimation activities should help to smooth the transition from teacher-led vocabulary review to self-access vocabulary review.
Discussion and Conclusions
Based on the data collected, I have drawn three conclusions in response to my research question. First, computer-based self-access vocabulary review is an effective strategy for learning vocabulary. This aligns with the findings of Guan (2013) and Mizumoto (2012). Although the sample size was too small to show statistical significance, I am confident that, given a longer period of time, the use of Quizlet would continue to result in higher test scores. Teachers should therefore integrate either Quizlet or another similar vocabulary learning website into their curriculum. Making these kinds of tools available gives students a sense of control over their vocabulary studying. The games and challenges make learning and memorizing vocabulary enjoyable, and the software monitors students’ answers, so the website becomes a customizable instructional tool.
The second conclusion I was able to draw from this study concerns the time of the week that students chose to study their words. For a majority of students, midweek vocabulary review had the greatest impact on their weekly test scores. Only the sophomore English 2 students showed little improvement based on midweek study. Also, for the E4.2 group, who had spent by far the most time on the site, late review had a negative correlation with test scores; the more time students spent reviewing on the day of the test the worse their score. For all classes, review within the first 24 hours after having received the vocabulary list, while showing a positive correlation with test scores, was not frequent enough to draw any conclusions from. It should be noted that, before the study period, these three classes had been used to Wednesday vocabulary reviews, and students’ predilection for midweek review may account for some of the correlation.
The final conclusion is that students who are introduced to Quizlet in high school say they are very likely to continue using it in college. According to the survey, most students expect to use Quizlet in the future. More importantly, the students who used Quizlet the most reported being the most likely to use it again. Teachers interested in preparing their students for college and university should include Quizlet in their curriculum for this reason.
Notes on the contributor
Jordan Dreyer teaches high school in the Cleveland Metropolitan School District. He completed his Master’s in Urban Education at Cleveland State University in 2014.
Chiu, T. (2013). Computer-assisted second language vocabulary instruction: A meta-analysis. British Journal of Educational Technology, 44(2), E52–E56. doi:10.1111/j.1467-8535.2012.01342.x
Cuevas, J., Russell, R., & Irving, M. (2012). An examination of the effect of customized reading modules on diverse secondary students’ reading comprehension and motivation. Educational Technology Research and Development, 60, 445–467. doi:10.1007/s11423-012-9244-7
Freeman, D., & Freeman, Y. (2004). Essential linguistics: What you need to know to teach. Portsmouth, NH: Heinemann.
Garris, R., Ahlers, R., & Driskell, J. (2002). Games, motivation and learning: A research and practice model. Simulation & Gaming, 33(4), 441–467. doi:10.1177/1046878102238607
Guan, X. (2013). A study on the application of data-driven learning in vocabulary teaching and learning in China’s EFL class. Journal of Language Teaching and Research, 4(1), 105–112. doi:10.4304/jltr.4.1.105-112
Hirschel, R., & Fritz, E. (2013). Learning vocabulary: CALL program versus vocabulary notebook. System, 41, 639–653. doi:10.1016/j.system.2013.07.016
Howard, W., Ellis, H., & Rasmussen, K. (2004). From the arcade to the classroom: Capitalizing on students’ sensory rich media preferences in disciplined-based learning. College Student Journal, 38(3), 431–440.
Jonaitis, L. (2012). Troubling discourse: Basic writing and computer-mediated technologies. Journal of Basic Writing, 31(1), 36–58.
Labbo, L., Love, M., & Ryan, T. (2007). A vocabulary flood: Making words “sticky” with computer-response activities. The Reading Teacher, 60(6), 582–588. doi:10.1598/RT.60.6.10
Letchumanan, K., & Hoon, T. B. (2011). Using computer games to improve secondary school students’ vocabulary acquisition in English. Pertanika Journal of Social Sciences & Humanities, 20(4), 1005–1018.
Mizumoto, A. (2012). Exploring the effects of self-efficacy on vocabulary learning strategies. Studies in Self-Access Learning Journal, 3(4), 423–437. Retrieved from https://sisaljournal.org/archives/dec12/mizumoto/
Obringer, J., & Coffey, K. (2007). Cell phones in American High Schools: A national survey. Journal of Technology Studies, 33(1), 41–47.
Prensky, M. (2001). Digital game-based learning. New York, NY: McGraw Hill.
Rose, M. (1989). Lives on the boundary: A moving account of the struggles and achievements of America’s educationally underprepared. New York, NY: Penguin Books.
Appendices – see PDF version