Because wearable technology is ubiquitous, it is important to determine validity and reliability not only in a laboratory setting but also in applied environments where the general population uses the devices. The purpose of this study was to 1) determine intra-rater reliability of visual step count outdoors, 2) determine the validity of commercially available wearable technology devices in this setting, and 3) report test-retest reliability of commercial devices during hiking and trail running. Individuals (N = 20) completed 5-min hikes and trail runs on a 200-m section of trail while wearing the following devices: Fitbit Surge 2, Garmin Vivosmart HR+, Leaf Health Tracker, Polar A360, Samsung Gear 2, Spire Activity Tracker, and Stryd Power Meter. Intra-rater reliability and test-retest reliability were determined through the intraclass correlation coefficient (ICC), while validity was determined via Bland-Altman analysis (limits of agreement; LoA), mean absolute percentage error (MAPE), and ICC. Significance was accepted at the p < 0.05 level. Steps determined by two independent counters were significantly reliable for the hike (ICC = 0.993, p < 0.001) and trail run (ICC = 0.991, p < 0.001). Three devices were valid across both exercise types and all methods of validity: Garmin Vivosmart HR+ (MAPE = 5.4%, ICC = 0.815, LoA = −58.1 to 50.4), Leaf Health Tracker (MAPE = 8.4%, ICC = 0.816, LoA = −78.8 to 39.4), and Stryd Power Meter (MAPE = 4.7%, ICC = 0.799, LoA = −34.3 to 78.9). As only certain devices returned valid step measurements, continued testing in applied environments is needed to have confidence in using technology to track health and activity goals.
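
For readers unfamiliar with the agreement statistics named above, the following is a minimal sketch of how MAPE and Bland-Altman limits of agreement are commonly computed. It is not the authors' analysis code. The function names, the device-minus-criterion sign convention, the 1.96 multiplier for 95% limits, and the step counts are all assumptions made for illustration. ICC is omitted because the abstract does not state which ICC form was used.

```python
from statistics import mean, stdev


def mape(device, criterion):
    """Mean absolute percentage error of device counts against criterion counts."""
    return 100 * mean(abs(d - c) / c for d, c in zip(device, criterion))


def limits_of_agreement(device, criterion):
    """Bland-Altman bias and 95% limits of agreement (bias ± 1.96 SD of differences).

    Differences are taken as device minus criterion (an assumed convention).
    """
    diffs = [d - c for d, c in zip(device, criterion)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample standard deviation of the paired differences
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Made-up step counts for illustration only; these are not data from the study.
criterion_steps = [500, 520, 480, 510]  # hypothetical visually counted steps
device_steps = [490, 530, 470, 515]     # hypothetical device-reported steps

print(f"MAPE = {mape(device_steps, criterion_steps):.1f}%")
bias, lower, upper = limits_of_agreement(device_steps, criterion_steps)
print(f"bias = {bias:.1f}, LoA = {lower:.1f} to {upper:.1f}")
```

Under this convention, a negative lower limit and positive upper limit, as reported for the three valid devices, indicate that a device may either undercount or overcount steps relative to the visual count.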