Tests constructed using item response theory (IRT) produce invariant item and test parameters, making it possible to construct tests and test items that are useful across many populations. This paper heuristically and empirically compares the utility of classical test theory (CTT) and IRT using psychomotor skill data. Data from the Test of Gross Motor Development (TGMD) (Ulrich, 1985) were used to assess the feasibility of fitting existing IRT models to dichotomously scored psychomotor skill data. As expected, CTT and IRT analyses yielded parallel interpretations of item and subtest difficulty and discrimination. However, IRT provided significant additional analysis of the error associated with estimating examinee ability. The IRT two-parameter logistic model fit the data better than the one-parameter logistic model. Although both TGMD subtests estimated ability for examinees of low to average ability, the object control subtest estimated examinee ability more precisely at higher difficulty levels than did the locomotor subtest. The results suggest that IRT is particularly well suited to constructing tests that can meet the challenging measurement demands of adapted physical education.
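
For readers unfamiliar with the two logistic models compared above, the following is a sketch of their standard textbook forms. The abstract does not state its parameterization, so the symbols are assumptions made for illustration: $\theta$ is examinee ability, $b_i$ is the difficulty of item $i$, and $a_i$ is its discrimination. Some authors also include a scaling constant $D = 1.7$ in the exponent; it is omitted here.

```latex
% Two-parameter logistic (2PL) model: probability that an examinee
% with ability \theta passes dichotomously scored item i.
P_i(\theta) = \frac{1}{1 + \exp\!\left[-a_i(\theta - b_i)\right]}

% One-parameter logistic (1PL) model: the special case in which all
% items share a common discrimination a.
P_i(\theta) = \frac{1}{1 + \exp\!\left[-a(\theta - b_i)\right]}

% Item information under the 2PL model, test information for a
% subtest of n items, and the standard error of the ability estimate.
I_i(\theta) = a_i^{2}\, P_i(\theta)\,\bigl[1 - P_i(\theta)\bigr], \qquad
I(\theta) = \sum_{i=1}^{n} I_i(\theta), \qquad
\mathrm{SE}(\hat{\theta}) = \frac{1}{\sqrt{I(\theta)}}
```

Under these forms, each item is most informative near $\theta = b_i$, so a subtest with more difficult items yields smaller standard errors at higher ability levels. This is one way to read the difference in precision reported for the two subtests.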