Kaitlyn Ouverson

Army Research Lab (ARL) Player Testing

Role: Lead Researcher.
Skills: Project Management, Structured Interviews, Data Synthesis, Interdisciplinary Collaboration, Public Speaking.

Summary

Leveraging my psychological research training, I worked alongside engineering-minded colleagues to evaluate a serious game developed at Iowa State University in coordination with the Army Research Lab. Feedback given during the game was expected to improve player and team performance. I improved the data collection protocol to better capture the game's impact on team dynamics and to carefully coordinate the application of the complex experimental design.
While participants generally reported enjoying the experience and appeared to improve at the game over time, the data showed no difference in team dynamics between conditions. Additionally, experience in the game was strongly related to player skill in new roles. However, collective efficacy, or the belief in one's team's ability to perform well, largely depended on each player's self-reported level of game-playing experience.

Problem

Process

We collected data using a series of Qualtrics online surveys, structured group interviews, and a software framework called GIFT, which logged key presses and their timing. I led eight research assistants in collecting data from 39 three-person teams over the course of 20 weeks. Because the research assistants varied widely in research experience, the interview data remained shallow. I scheduled both the research team and the participants, and throughout the semester I relayed experiment updates to the development team.
Data analysis occurred after the data were thoroughly cleaned and organized into a form that worked well with the analysis software. Using Python, my team and I created visualizations of the coded keystroke data to understand user performance. I performed statistical tests in RStudio to solidify our understanding of the data and test our hypotheses.
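As an illustration, the sketch below shows the kind of Python used to turn key-press logs into per-trial performance visualizations. The file name and column names (team_id, participant_id, trial) are hypothetical stand-ins, not the actual GIFT log schema.

    # Sketch: visualize keystroke activity per team across trials.
    # Assumes a CSV export of key-press logs with hypothetical columns:
    # team_id, participant_id, trial, key, timestamp.
    import pandas as pd
    import matplotlib.pyplot as plt

    logs = pd.read_csv("gift_keypress_log.csv")  # hypothetical file name

    # Count key presses per participant per trial as a rough activity measure.
    activity = (
        logs.groupby(["team_id", "participant_id", "trial"])
            .size()
            .reset_index(name="key_presses")
    )

    # Plot mean activity per trial for each team to eyeball improvement over time.
    team_means = (
        activity.groupby(["team_id", "trial"])["key_presses"]
                .mean()
                .unstack("team_id")
    )
    team_means.plot(marker="o", legend=False, alpha=0.4)
    plt.xlabel("Trial")
    plt.ylabel("Mean key presses per participant")
    plt.title("Keystroke activity by trial (one line per team)")
    plt.tight_layout()
    plt.savefig("keystroke_activity_by_trial.png")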

Conclusions

On the whole, participants improved over the course of the study. However, performance did not differ based on which feedback was received, or on participants' prior experience with cooperative-play video games or with teamwork more generally. Because our previous work showed that any feedback was better than no feedback in terms of performance improvements, the lack of guidance on the first research question was not discouraging to our overall objective.
Much of the challenge came from a lack of specific data about user interaction, as well as slight balance issues related to the role each participant embodied during the task. In future studies, more work will be done to build a task that can evaluate teamwork without producing the floor effects observed in this evaluation. The findings of this study will inform future iterations of Intelligent Team Tutoring Systems.

Challenges

  1. Balancing experimental power with the budget
    As a team, we made the decision not to include a control condition, which greatly impeded our ability to delineate the source of our participants' improvements. Instead of giving up on the data, we found ways to benchmark players' performance in later trials against their first trials. I learned, first hand, the importance of a control condition when the research involves player performance.
    The budget may tell you to recruit fewer participants, but that doesn't mean you should cut the control. Either accept lower power and look for trends, or use a planned missing data design to stretch the budget without losing granularity; a sketch of one such design follows this list.

  2. Advanced data analysis
    An ANOVA cannot tell you everything you need to know, especially in complex experimental designs. Because our participants were not independent (we wanted information on the individuals within each team), I learned and employed mixed-effects modeling to make sense of the nested data. The data were also complex, requiring a healthy dose of RStudio scripting to format the dataset in a usable way; a modeling sketch follows this list as well.
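For the budget point above, here is a minimal sketch of a planned missing data (three-form) assignment, in which each participant completes only a subset of survey blocks so more participants fit within the same budget. The block names, form layout, and counts are illustrative, not taken from the original study.

    # Sketch: three-form planned missing data design.
    # Every form includes a common block X; each form omits one of blocks A, B, C,
    # so each participant answers only part of the item pool by design.
    import random

    FORMS = {
        "form_1": ["X", "A", "B"],  # omits C
        "form_2": ["X", "A", "C"],  # omits B
        "form_3": ["X", "B", "C"],  # omits A
    }

    def assign_forms(participant_ids, seed=42):
        """Randomly rotate participants through the three forms."""
        rng = random.Random(seed)
        ids = list(participant_ids)
        rng.shuffle(ids)
        forms = list(FORMS)
        return {pid: forms[i % len(forms)] for i, pid in enumerate(ids)}

    assignments = assign_forms(range(1, 118))  # e.g., 39 teams x 3 players
    # Because the omissions are assigned at random, the missingness is
    # missing-completely-at-random by design and can be handled with
    # full-information maximum likelihood or multiple imputation.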
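The mixed-effects analysis itself was run in RStudio; the sketch below re-expresses the same idea in Python with statsmodels, treating team as a random grouping factor so that players nested within teams are not treated as independent. The file name and variable names are hypothetical.

    # Sketch: mixed-effects model for players nested within teams.
    # Hypothetical columns: score (per-trial performance), feedback (condition),
    # experience (self-reported gaming experience), trial, team_id.
    import pandas as pd
    import statsmodels.formula.api as smf

    df = pd.read_csv("player_trials.csv")  # hypothetical file name

    # A random intercept for team accounts for the non-independence of teammates;
    # a random slope on trial lets teams improve at different rates.
    model = smf.mixedlm(
        "score ~ feedback + experience + trial",
        data=df,
        groups=df["team_id"],
        re_formula="~trial",
    )
    result = model.fit()
    print(result.summary())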