How Automated Item Generation Helps You Create Better Tests

Former Chief Scientific Officer Dr. Greg Sadesky dives into a favorite topic: automated item generation.

Rapid Content Production for Professional Testing

When it comes to high-stakes testing and credentialing, the validity of the results depends heavily on the quality of the questions being asked. Those questions need to make sense, cover the subject matter comprehensively, and be developed efficiently. Automated item generation (AIG) is designed to help on all three counts.

AIG is a topic near and dear to my heart, since I led the project that resulted in the creation of Itematic, which is, to my knowledge, the first commercial product to provide AIG to the testing industry.

The most obvious advantage of AIG is the rapid production of more content for multiple-choice tests – and hey, who couldn’t use more content? In a future post, I’m going to explore the truly disruptive potential of AIG and what having more content could unleash for a testing program.

What is automated item generation, and why is it important?

But first things first: What is AIG and why is it so important? AIG is a computerized question creation methodology that allows for a greater number and range of questions to be produced at once, as compared with typical item-at-a-time methods.

Here is a generic example from health care:

Example 1. A generic question.
A patient presents with symptoms A, B, and C. Which of the following diagnoses is most likely?
a) D1     b) D2     c) D3     d) D4

Example 2. A generic template.
A patient presents with symptoms [A, B, C], [D, E, F], and [G, H, I]. Which of the following diagnoses is most likely?
a) D1     b) D2     c) D3     d) D4

Example 3. A generic template with more possible answers.
A patient presents with symptoms [A, B, C], [D, E, F], and [G, H, I]. Which of the following diagnoses is most likely?
[Possible answers include D1, D2, D3, D4, D5, D6, D7…]

Example 4. A generic template with more possible variation.
In [Scenario I, Scenario II, Scenario III], a [TYPE X, TYPE Y, TYPE Z] patient [presents with, is complaining of, is discovered to have] symptoms [A, B, C], [D, E, F], and [G, H, I]. Which of the following diagnoses is most [probable, likely]?
[Possible answers include D1, D2, D3, D4, D5, D6, D7…]

As you move from Example 1 to Example 4, the potential number of generated questions grows quickly, because each added slot multiplies the number of possible combinations. If every value were combined with every other, this number could even become inconveniently large. Of course, not all combinations of values make sense, so the number of usable questions is constrained to those combinations that produce a sensible and valid question. Nevertheless, it’s pretty clear in principle that Example 4 could produce many questions – and in the testing business, that’s a very good thing.
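To make the arithmetic concrete, here is a minimal Python sketch of how a template like Example 4 might be expanded. It is an illustration only, not a description of how Itematic or any other AIG product works. The slot names, the `generate_stems` function, and the `is_sensible` rule are all hypothetical, and the rule itself is invented purely to show how a constraint shrinks the pool.

```python
from itertools import product

# Slot values copied from Example 4; the slot names are hypothetical labels.
SLOTS = {
    "scenario": ["Scenario I", "Scenario II", "Scenario III"],
    "patient": ["TYPE X", "TYPE Y", "TYPE Z"],
    "verb": ["presents with", "is complaining of", "is discovered to have"],
    "s1": ["A", "B", "C"],
    "s2": ["D", "E", "F"],
    "s3": ["G", "H", "I"],
    "likelihood": ["probable", "likely"],
}

STEM = (
    "In {scenario}, a {patient} patient {verb} symptoms {s1}, {s2}, and {s3}. "
    "Which of the following diagnoses is most {likelihood}?"
)


def is_sensible(values):
    # Invented constraint for illustration: pretend symptoms A and F
    # never occur together, so any stem combining them is rejected.
    return not (values["s1"] == "A" and values["s2"] == "F")


def generate_stems(slots=SLOTS, stem=STEM, rule=is_sensible):
    """Yield every question stem whose slot values pass the rule."""
    names = list(slots)
    for combo in product(*(slots[name] for name in names)):
        values = dict(zip(names, combo))
        if rule(values):
            yield stem.format(**values)


all_stems = list(generate_stems(rule=lambda values: True))  # no constraint
valid_stems = list(generate_stems())                         # with constraint

print(len(all_stems))    # 1458 = 3 * 3 * 3 * 3 * 3 * 3 * 2
print(len(valid_stems))  # 1296 once the A-with-F combinations are removed
```

Even this small template yields well over a thousand stems before any answer options are attached, and a single constraint removes 162 of them. In practice the rules that decide which combinations are clinically sensible would come from subject matter experts.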

Advantages of Automated Item Generation

There are a couple of advantages of AIG that lead not only to more questions but also, arguably, to better questions than item-at-a-time methods.

1. Survival of the fittest … item

While AIG allows for the production of lots of items, simply having, say, 1,000 items targeted to the same competency is not ideal from a testing perspective. But if the template has been constructed well, those 1,000 items should include a sufficient number of excellent items that meet quality and content standards. The key is getting quickly and efficiently to the smaller set of items that will fill the precise holes in your item bank. As an example, in Itematic we designed functionality to enable the rapid selection of item subsets that match initial specifications, and then to allow those subsets to be fine-tuned individually. It’s that final set of great items that is the goal, and most often the result, of item development using AIG.

2. Comprehensive content domain coverage

Since the intent of AIG is to create an item set rather than a single item, it can cover a content domain in a way that is difficult to achieve with item-at-a-time methods. Let’s say that my client representing health professionals needs 1,000 items in their item bank and needs 5 more items on the causes of shortness of breath to reach this goal. In conventional item development mode, I would assign individual authors to write questions on the topic and would likely get those 5 questions over time. Besides having to review each of those 5 questions individually for quality, I would also have to make sure that the questions weren’t all about asthma, for example, and were instead appropriately spread across emphysema, bronchitis, and the other causes that I would want included. This is hard to manage, particularly when different people are writing the items on different occasions.

With AIG, the range of content coverage becomes an integral part of the process of item creation. In our shortness of breath example, the signs and symptoms, the patient characteristics, and, of course, the various causes that need to be addressed by the template are all built in at once. Then, when it comes time for item selection, questions tied to each cause are deliberately included. This helps to ensure that the item bank contains questions across the full range of skills and knowledge required. In short, AIG efficiently engineers comprehensive content coverage, making for well-formed item banks and setting the stage for better tests.
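The selection step in the shortness of breath example can be sketched in a few lines of Python. This is a hypothetical illustration, not Itematic's actual selection logic. The `select_balanced` function, the `id` and `cause` fields, and the pool sizes are all assumptions made up for the example.

```python
from collections import Counter, defaultdict


def select_balanced(items, n):
    """Pick up to n items, cycling through causes so that no single cause dominates."""
    by_cause = defaultdict(list)
    for item in items:
        by_cause[item["cause"]].append(item)

    queues = list(by_cause.values())
    selected = []
    # Take one item per cause per pass until n items are chosen
    # or every queue is empty.
    while len(selected) < n and any(queues):
        for queue in queues:
            if queue and len(selected) < n:
                selected.append(queue.pop(0))
    return selected


# Hypothetical pool of generated items, deliberately skewed toward asthma.
pool = [
    {"id": f"{cause}-{i}", "cause": cause}
    for cause, count in [("asthma", 6), ("emphysema", 2), ("bronchitis", 2)]
    for i in range(count)
]

chosen = select_balanced(pool, 5)
print(Counter(item["cause"] for item in chosen))
# Counter({'asthma': 2, 'emphysema': 2, 'bronchitis': 1})
```

Although asthma items make up most of the pool, the five selected items are spread across all three causes. A real selection step would also weigh item quality and the existing contents of the bank, which this sketch ignores.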


All this makes AIG pretty powerful, and potentially disruptive for the testing industry. One of the best things about creating a product like Itematic is seeing how the intelligent and creative folks in our industry are using it, often in ways that transcend its original intent! In a future article, I’ll talk about how AIG appears poised to significantly change content development in the testing industry.