I recently had the opportunity to review four fitness tests as part of my university studies, and thought you might be interested in some of the assumptions behind these tests.
I’m sorry for the dull writing style.
The lack of scientific credibility behind certain tests is worrying, but you will be able to see why we don’t get you to sit on a bike and cycle slowly, or prod you with fat calipers, and why we quite like the RAST and MSST tests.
The Multistage Fitness Test
The multistage fitness test, also known as the bleep test or MSST, was designed by Leger and Lambert (1982) to assess the VO2 max of participants from their performance in a continuous, pace-regulated shuttle run between two points 20 m apart. The subjects are health checked and give informed consent (ACSM, 2005). The participants line up on one of two lines 20 m apart and are asked to reach the opposite line before a bleep sounds. The bleeps become more frequent at preset stages and so control the pace of the work (Brewer, 2002). The test is maximal in that it requires the participants to work to the point of exhaustion. Participants are withdrawn from the test when they fail, on two consecutive attempts, to complete the shuttle within the time allowed.
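To make the withdrawal rule concrete, here is a minimal Python sketch. The function name and the list-of-booleans input are my own illustration rather than part of the published protocol, and I have assumed that a missed shuttle does not count towards the score.

```python
def shuttles_before_withdrawal(made_in_time):
    """Count completed shuttles until two consecutive shuttles are missed.

    made_in_time: iterable of booleans, one per shuttle, True if the
    participant reached the line before the bleep (hypothetical input format).
    """
    completed = 0
    consecutive_misses = 0
    for made in made_in_time:
        if made:
            completed += 1
            consecutive_misses = 0  # a successful shuttle resets the count
        else:
            consecutive_misses += 1
            if consecutive_misses == 2:  # withdrawn after two misses in a row
                break
    return completed
```

A single missed shuttle followed by a successful one does not end the test; only two misses in a row do.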
The test is generally accepted as valid and reliable, to the extent that it is being considered by government as a standard test for assessing the fitness of school children (CMO, 2009).
In pilot trials 92 participants were tested against VO2 max assessed via retro-extrapolation (Leger and Lambert, 1982), although only 25 performed the run twice. However, Ramsbottom et al. (1988) re-validated the test by direct measurement of the VO2 max of 74 volunteers, with a correlation of 0.92. This can be seen as reliable, as a study needs more than 50 participants, although 3 trials are preferred (Hopkins, 2000). Moreover, Ramsbottom et al. (1988) achieved this correlation in a homogeneous population. This further validates the test, as it is easier to improve correlations by using a heterogeneous population (Atkinson and Nevill, 2001; Bonen et al., 1979).
Within the fitness and coaching community there are those who distrust VO2 max measurement for sporting use. Noakes et al. (1990) found VO2 max to be a poor predictor of race times; others can be confused by the abstract nature of the VO2 max figure. In the MSST, the levels achieved in the test can form a fitness currency, and indeed the British marines and police force set bleep test standards (Brewer, 2002).
As the levels achieved can also be related to a running velocity, the test is further supported by McLaughlin et al. (2010), who established that the velocity at VO2 max correlated well with distance running and with a “classic” endurance model that takes into account VO2 max, %VO2 max, lactate threshold and running economy. The MSST correlates well with 5k race times (Ramsbottom et al., 1988) and 10k times (Paliczka et al., 1987).
The test is ideal for team training where the sport is running based, as tests need to be sport specific (Lemmink, 2004). Many people can be tested at the same time with minimal equipment, and the subjects require motivation and encouragement, which team members can supply. The test has been validated for both sexes, as individuals or in groups (Leger and Lambert, 1982).
For non-athletes or special populations this test does carry very public connotations of success and failure. It is a maximal test, so it is unsuitable for many special populations (ACSM, 2005), and its results can depend on how audible the bleeps are.
Safety in training is a consistent concern. Gardner (2002) identifies the major cause of exercise-related deaths in the US military as atherosclerotic coronary artery disease (ACAD), together with the failure of screening procedures to exclude those suffering from it. The increasing age of the participants was also flagged. Gardner (2002) suggests that vigorous exercise tests need to be conducted where immediate advanced life support measures are available. However, Babraj et al. (2009) show that high-intensity exercise can be used even with medical populations.
As this is a maximal test, it can have a developmental training effect, which is a bonus when training time is scarce. The test also supplies an easily recordable benchmark.
Apart from direct measurement, the Cooper run test, which measures the distance covered in a 12-minute run, seems the most viable alternative. However, the Cooper test is maximal from the start and has been criticised by Williamson and Hamley (1984) because it relies on motivation and self-pacing skills, and because the results could be partly attributed to anaerobic systems. It also calls for a bigger running area, which could mean it needs to be staged outdoors, where it is subject to the weather. Nevertheless, it has an athletic component, resulting in a real-world effort. The suggestion is that both these tests are more suitable for athletic populations.
Astrand-Ryhming Cycle Test
The Astrand-Ryhming cycle test is a sub-maximal VO2 max test that relates mechanical work to a steady-state heart rate.
The test requires a cycle ergometer able to apply loads of 150, 100 and 75 W at 50 turns of the flywheel per minute; the aim of the test is to reach a steady-state heart rate.
In the 5th and 6th minutes of the test, heart rates are taken (either by palpation or monitor) and compared. If they do not differ by more than 5 bpm and lie within the range of 130 to 170 bpm, the test is successfully terminated and the rate recorded. If the heart rate is less than 130 bpm, the load is increased and the test continued until the heart rate reaches this level. If the heart rates differ by more than 5 bpm, the test is continued until this criterion is met (Clink and Thomas, 1981); however, the ACSM (2005) suggests averaging the 5th and 6th minute scores. Numerous other protocols exist. The results are compared against a standard nomogram to ascertain a theoretical VO2 max figure.
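The stopping rule described above can be sketched in Python as follows. This is only an illustration: the function name and the returned labels are hypothetical, the thresholds are simply those quoted in the text, and the text does not say what to do if a steady heart rate sits outside the 130 to 170 bpm window from above, so that case is returned as "unspecified".

```python
def astrand_decision(hr_min5, hr_min6, max_diff=5, low=130, high=170):
    """Decide the next step from the 5th and 6th minute heart rates (bpm).

    Returns a (label, recorded_rate) tuple; recorded_rate is None unless
    the test can be stopped. Labels are illustrative, not official terms.
    """
    if hr_min6 < low:
        # Heart rate too low: raise the load and keep going.
        return ("increase_load", None)
    if abs(hr_min5 - hr_min6) > max_diff:
        # No steady state yet: keep cycling at the same load.
        return ("continue", None)
    if low <= hr_min5 <= high and low <= hr_min6 <= high:
        # Steady state inside the window: record the mean of the two
        # readings, following the ACSM (2005) averaging suggestion.
        return ("stop", (hr_min5 + hr_min6) / 2)
    # Steady but outside the window: not covered by the description.
    return ("unspecified", None)
```

The recorded rate would then be read against the nomogram, which is not reproduced here.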
The use of this test by the RAF indicates its bias towards testing sedentary populations. The test removes pacing and motivation issues and anaerobic contributions, and as it can give privacy and removes the connotation of athleticism, it can be seen as a valid health appraisal that could result in a desire for health improvements from the subject (Williamson and Hamley, 1984). It is ideal for special, sedentary and older populations (Astrand and Ryhming, 1954). The last of these is crucial, in that older adults demonstrate the highest prevalence of cardiovascular and other chronic diseases, and it is not practical or safe to maximally test certain populations (Tanaka et al., 2001).
It is worth considering that VO2 max was originally adopted as a reference standard for cardio-respiratory fitness by those struggling to research cardiovascular illnesses (Shephard et al., 1968) rather than as an athletic marker. However, the original experiment was intended to aid the selection of military recruits and athlete development, hence the test was originally validated on 18 to 30 year olds (Astrand and Ryhming, 1954).
According to Heyward (2006), sub-maximal exercise protocols assume a steady-state heart rate at every exercise intensity and a linear relationship between heart rate, oxygen use and workload. This may be so at lower levels, but the relationship becomes curvilinear at higher levels of work. This test in particular assumes equal mechanical cycling efficiency, so it overestimates VO2 max for the highly trained and underestimates it for the untrained. It also assumes that maximum heart rates are equal when they can easily vary by 11 bpm. Tanaka et al. (2001) found that most maximal heart rate assumptions are incorrect, as did Clink and Thomas (1981), who suggested that validation by maximal methods was inappropriate as heart rate responses vary in maximal work. Underestimation of VO2 max (compared with treadmill testing) has ranged from 5 to 25% and has been variously attributed to habitual activity, physical conditioning and leg strength (ACSM, 2005).
Whilst it is widely conceded that sub-maximal tests may carry error, sometimes substantial, the method is seen as cheap, easy, quick, safe and ideal for special and sedentary populations (Astrand and Ryhming, 1954). However, once it is established that the test is to be targeted at sedentary or at-risk populations, more variables present themselves that affect heart rate, such as smoking, caffeine, time since last meal, heat and hydration (ACSM, 2005). Diabetes and various medications can also alter heart rate responses (ACSM, 2005).
A frequently suggested alternative is the step test, as the cycle test was originally compared with such a test (Astrand and Ryhming, 1954). Such a substitution would have a mathematical similarity with the cycle test, in that it relates mechanical work to heart rate. However, various walk tests also exist, which can be tailored to the individual at hand. The Rockport walk test is popular and has a real-world application. The test can be stepped down to a 6-minute walk test that is capable of being used with patients who have limited short-term survival (ACSM, 2005).
RAST test
The running-based anaerobic sprint test (RAST) (Draper and Whyte, 1996) is used as a test of anaerobic running power. A 35 metre running area is marked off with an adequate overrun at each end. The subject is health checked, gives informed consent (ACSM, 2005), is briefed on the test and is then asked to sprint the distance as fast as possible. The time for each sprint is recorded. There is a 10 second turnaround time between each sprint. The tester then applies various simple calculations to the collected information to produce power in watts per sprint (Mckenzie, 2005); from here, average power and a fatigue index can be derived. According to Mckenzie (2005), a low fatigue index indicates the athlete's ability to sustain anaerobic performance, but a large decline in sprint performance indicates the athlete needs to focus on lactate tolerance training.
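For the curious, the "simple calculations" usually take the following form. I am assuming the commonly published RAST formulae here (power = body mass x distance squared / time cubed, and fatigue index = (peak power - minimum power) / total sprint time); the function and variable names are illustrative only, and the sprint times in the usage note are made up.

```python
def rast_summary(body_mass_kg, sprint_times_s, distance_m=35.0):
    """Summarise a RAST session from body mass and per-sprint times.

    Assumes power = mass * distance**2 / time**3 for each sprint, which
    gives watts when mass is in kg, distance in m and time in s.
    """
    if not sprint_times_s:
        raise ValueError("at least one sprint time is required")
    powers = [body_mass_kg * distance_m ** 2 / t ** 3 for t in sprint_times_s]
    peak = max(powers)
    minimum = min(powers)
    return {
        "powers_w": powers,
        "peak_power_w": peak,
        "min_power_w": minimum,
        "mean_power_w": sum(powers) / len(powers),
        # Watts lost per second of sprinting; lower means less fatigue.
        "fatigue_index_w_per_s": (peak - minimum) / sum(sprint_times_s),
    }
```

For example, `rast_summary(76, [4.9, 5.0, 5.2, 5.4, 5.6, 5.8])` would return the per-sprint powers alongside peak, minimum and mean power and the fatigue index. Because time is cubed, small timing errors are magnified in the power figure.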
As a genuine wattage output the figure is obviously inaccurate, as the work is not directly against gravity, but the standard calculation produces a recordable figure. For that matter, the actual time itself is valid.
The test needs to closely resemble the activity that requires the anaerobic output (McArdle et al., 2007); Meckel et al. (2009) likewise state that anaerobic testing procedures should mimic sport-specific activity patterns.
With a degree of organisation, it is possible to test several candidates one after the other, as long as several testers are deployed. No computer equipment or specialist kit is required, although light gates can be used.
As a physiological test purporting to assess anaerobic capacity, the test is poor. According to Arampatzis et al. (1999), running performance may relate not to metabolic processes but to the efficiency of movement. Measurement error increases with increasing velocity, and the power leakage due to absorption mechanisms within the tendon units cannot be ascertained. Vandewalle et al. (1987) cast doubt on any test purporting to assess anaerobic capacity, as maximal performance also depends on glycolytic and aerobic power as well as anaerobic capacity. The fatigue index (power decrease) of all-out tests is not reliable and probably depends on aerobic power as well as the fast-twitch muscle fibre percentage. According to McArdle et al. (2007), tests of anaerobic power are also problematic because age, gender, skill, motivation and body size greatly influence the production of norming tables. At this distance the test could be merely testing starting technique.
According to Zacharogiannis et al. (2004), the test does seem to be a valid and reliable power output test when validated against the Wingate test; their study confirmed a significant correlation between RAST and Wingate in peak and mean power. Zagatto et al. (2008) conclude that peak power, mean power and fatigue index correlate with the Wingate anaerobic test (WAnT), although r values of 0.46 to 0.63 are hardly conclusive, and that the RAST is good at predicting short-distance running performance over 50, 100, 200 and 400 m. Loures et al. (2008) also examined the correlation between the RAST and anaerobic work capacity in soccer players.
The test is currently being used with a National League basketball team, footballers, sprinters, judo players and rugby players. Research into the RAST is continuing at the University of Wolverhampton.
It is interesting that the RAST has already been used to validate various forms of supplementation (Jourkesh et al., 2007).
The Wingate 30-second cycle test presents itself as the obvious alternative: it is a good predictor of anaerobic capacity, reproducible, and a good performance predictor (Zagatto et al., 2009), but has a higher equipment requirement. However, the reservation that anaerobic protocols should resemble the sports being tested seems valid, and cycling is not running.
Skinfold test
Skinfold testing is an anthropometric method for estimating body fat from skinfold measurements. Skinfolds are lifted from the skeletal/muscular frame at specified sites and measured using one of the many makes of calipers available. These measurements create a model of density. Formulae (Siri or Brozek), based on various assumptions about human body composition, are applied to this model. The method is partly based on the observation that a large proportion of total body fat is in the subcutaneous tissue (Keys and Brozek, 1953).
This is in effect a two-stage process, with two sets of validation. The issues are: can a skinfold test create a valid model of density and, once confronted with a density figure, is it possible to determine what percentage of that density is fat?
The test is important due to the growing obesity crisis. It is useful to have an international reference guide, and according to Durnin and Womersley (1974) fat levels influence death rate, affect drug effectiveness, and indicate whether the body can withstand cold and starvation.
The current use of skinfold is based on the Durnin and Womersley (1974) skinfold method, which estimates body density using the above-mentioned method. In the original paper, Durnin and Womersley (1974) list several reservations about the technique, including the lack of a linear relationship and variation in skinfold compressibility.
Moreover, theirs was not a random sample: volunteers were selected to represent a spread of obesity levels (thereby making correlation easier) and a spread of ages. Mickelsen (1958) felt that the densitometric technique (underwater weighing) was fraught with difficulty, including gastrointestinal tract air and fear of underwater weighing. Nevertheless, the ACSM guidelines (2005) suggest skinfold measurements to be highly correlated with body composition as determined by hydrodensitometry. Durnin and Satwanti (1982) concluded that variations observed in the estimation of body fat by densitometry are well within the basic errors of the method. Kispert and Merrifield (1987) observed that, in a clinical situation, small changes in body shape were difficult to spot if skinfold measurements were taken by different testers; however, ACSM (2005) suggests that training and multiple practice sessions can overcome user error.
To this density model a second formula is applied, either Siri or Brozek, which, based on various assumptions as to the nature of body composition, delivers a % body fat figure that can be looked up against norming tables (ACSM, 2005).
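The two-stage calculation can be sketched in Python. The Siri and Brozek equations below are the standard published forms; the density function assumes the Durnin and Womersley form (density = c - m x log10 of the skinfold sum), with c and m left as parameters because they are age and sex specific and must be looked up in the original tables. All function names are my own.

```python
import math


def density_from_skinfolds(skinfold_sum_mm, c, m):
    """Stage one: body density (g/cm3) from the sum of skinfolds (mm).

    c and m are the age- and sex-specific coefficients from the published
    tables; they are deliberately not hard-coded in this sketch.
    """
    return c - m * math.log10(skinfold_sum_mm)


def percent_fat_siri(density):
    """Stage two (Siri): percentage body fat from body density."""
    return (4.95 / density - 4.50) * 100


def percent_fat_brozek(density):
    """Stage two (Brozek): percentage body fat from body density."""
    return (4.57 / density - 4.142) * 100
```

For example, a density of 1.05 g/cm3 gives roughly 21.4% body fat with Siri and 21.0% with Brozek, so the choice of second-stage formula alone shifts the answer before any measurement error is considered.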
In the original paper, Durnin and Womersley (1974) list several reservations about the body composition elements of the technique, including the density of the skeleton, changes in body composition with ageing and obesity, and the proportion of fat situated subcutaneously. However, where body composition was theorised to vary on the basis of race, gender or age, numerous corrective calculations now exist (ACSM, 2005). The commonly used two-component model of how human density is distributed divides the body into fat mass (FM) and fat-free mass (FFM) (Siri, 1956). Further research is producing three- and four-component models.
However, Durnin et al. (1997) argued that the possible technical errors of skinfold measurement assume little importance against the background of the basic assumptions underlying densitometry, total body water, total body potassium (K) and other methods.
It was only in 1984 that the number of actual cadaver studies was increased from 9 to 34 (Clarys et al., 1984), with a range of techniques applied and cross-checked, including skinfold and underwater weighing (using the concept of adipose tissue free weight). The composition varied substantially (bone from 16.3% to 25.75% and muscle from 41.9% to 59.4%), which, it can be argued, totally undermines densitometric assumptions about body composition. Clasey et al. (1999), reviewing various body composition formulae, concluded that the use of many body composition techniques should be viewed with concern. Davies et al. (1986), using A-mode ultrasound, observed that the proportion of fat situated subcutaneously (PFSS) varied considerably between individuals (range 0·50–0·97 in the women, 0·40–0·97 in the men). Moreover, there was no relationship between subcutaneous and internal fat masses.
Mickelsen (1958) found considerable individual variation in the distribution of subcutaneous fat throughout the body.
In practice the technique is intrusive.
The choice of an alternative depends on the purpose for which the obesity measure is to be used. For standard weight loss, a combination of dress size, BMI and self-measurement of problem areas is often applied by people concerned with weight management. If the target is health risk appraisal, waist circumference, the method validated by Björntorp (1992) and Pouliot et al. (1994), seems to be well validated and well correlated with the health risks of concern to government, and is easily applied both by the individual and in a clinical setting. The technique is recommended by the national forum for obesity.
References
ACSM (2005) ACSM Guidelines for Exercise Testing and Prescription, 7th ed. Lippincott, Williams and Wilkins.
Atkinson G, Nevill AM (2001) Selected issues in the design and analysis of sport Journal of Sports Sciences, 19, 811-82
Arampatzis A, Knicker A, Metzler V, Bruggemann G-P (1999) Mechanical power in running: a comparison of different approaches. German Sport University of Cologne, Institute for Athletics and Gymnastics.
Astrand and Ryhming (1954) A nomogram for calculation of aerobic capacity (physical fitness) from pulse rate during submaximal work. Journal of Applied Physiology, 7: 218-221.
Babraj J, Vollaard N, Keast C, Guppy F, Cottrel, Timmons J (2009) Extremely short duration high intensity training substantially improves insulin action in sedentary males. BMC Endocrine Disorders. http://www.biomedcentral.com/1472-6823/9/3
Bonen A, Heyward VH, Cureton KT, Boileau RA, Massey BH (1979) Prediction of maximal oxygen uptake in boys, ages 7-15 years. Medicine and Science in Sports, 11: 24-29.
Brewer J (2002) Multistage fitness test for the prediction of maximum oxygen uptake (instruction manual). sports coach UK.
Eurofit Provisional Handbook (1983) Testing Physical Fitness. Strasbourg. Pub. HMSO, London.
Björntorp P (1992) Abdominal fat distribution and the metabolic syndrome. J Cardiovasc Pharmacol, 20 Suppl 8: S26-S28.
Clink and Thomas TR (1981) Validity of the Astrand-Ryhming nomogram for predicting maximal oxygen intake. British Journal of Sports Medicine, 15(3): 182-185.
Clarys JP, Martin AD, Drinkwater DT (1984) Gross tissue weights in the human body by cadaver dissection. Hum Biol, Sep; 56(3): 459-73.
Clasey JL, Kanaley JA, Wideman L, Heymsfield SB, Teates CD, Gutgesell ME, Thorner MO, Hartman ML, Weltman A (1999) Validity of methods of body composition assessment in young and older men and women. J Appl Physiol, 86: 1728-1738.
CMO (2009) Annual report. HMSO.
Davies PSW, Jones PRM, Norgan NG (1986) The distribution of subcutaneous and internal fat in man. Annals of Human Biology, 13(2): 189-192.
Draper N, Whyte G (1996) The running-based anaerobic sprint test. Peak Performance, 96, pp 4-5.
Durnin JV, de Bruin H, Feunekes GI (1997) Skinfold thicknesses: is there a need to be very precise in their location? Br J Nutr, Jan; 77(1): 3-7.
Durnin JVGA, Satwanti (1982) Variations in the assessment of the fat content of the human body due to experimental technique in measuring body density. Annals of Human Biology, 9(3): 221-225.
Durnin JVGA, Womersley J (1974) Body fat assessed from total body density and its estimation from skinfold thickness: measurements on 481 men and women aged from 16 to 72 years. British Journal of Nutrition, 32, pp 77-97. doi:10.1079/BJN19740060
Gardner J (2002) Non traumatic exercise-related deaths in the US military 1996-1999. Military Medicine, Dec. http://findarticles.com/p/articles/mi_qa3912/ (accessed online 06/05/2009)
Heyward V (2006) Advanced fitness assessment exercise prescription, 5th edition. Human Kinetics.
Hopkins (2000) Measures of reliability in sports medicine and science. Sports Medicine, Jul; (1): 1-15.
Keys A, Brozek J (1953) Body fat in adult man. Physiol. Rev., 33: 245-325.
Kispert and Merrifield (1987) Interrater reliability of skinfold fat measurements. Physical Therapy, 67(6), June 1987.
Leger and Lambert (1982) The maximal multistage 20m shuttle run test to predict VO2max. European Journal of Applied Physiology, 49: 1-12.
Lemmink KAPM, Verheijen R, Visscher C (2004) The discriminative power of the Interval Shuttle Run Test and the Maximal Mul… Journal of Sports Medicine and Physical Fitness, 44(3), pg. 233.
Loures, Filho, Franco, Bittencourt, Kaminagakura, Papoti (2008) Correlation between running anaerobic sprint test and anaerobic work capacity in soccer players. International Journal of Exercise Science, vol 1, iss 5, 42. http://digitalcommons.wku.edu/ijes/vol1/iss5/42/ (accessed online 14 April)
McArdle, Katch, Katch (2007) Exercise Physiology, 6th edition. Lippincott Williams and Wilkins.
Mckenzie B (2005) 101 Performance evaluation tests. Electric Word plc.
Jourkesh M, Ostojic SM, Azarbayjani MA (2007) The effects of vitamin E and vitamin C supplementation on bioenergetics index. Research in Sports Medicine, 15(4), pages 249-256.
Meckel Y, Machnai O, Eliakim A (2009) Relationship among repeated sprint tests, aerobic fitness, and anaerobic fitness in elite adolescent soccer players. J Strength Cond Res, Jan; 23(1): 163-9.
Mickelsen O (1958) Age changes in body composition. Public Health Report, vol 73, no 4, April.
McLaughlin JE, Howley ET, Bassett DR Jr, Thompson DL, Fitzhugh EC (2010) Test of the classic model for predicting endurance running performance. http://journals.lww.com/acsm-msse/Fulltext/2010/05000/Test_of_the_Classic_Model_for_Predicting_Endurance.20.aspx
Noakes TD, Myburgh KH, Schall R (1990) Peak treadmill running velocity during VO2max test predicts running performance. Journal of Sports Sciences, 8, 35-45.
Paliczka VJ, Nichols AK, Boreham CAG (1987) A multi-stage shuttle run as a predictor of running performance and maximal oxygen uptake in adults. Brit. J. Sports Med., 21(4), December, pp. 163-165.
Pouliot M-C, Després JP, Lemieux S, Moorjani S, Bouchard C, Tremblay A, Nadeau A, Lupien PJ (1994) Waist circumference and abdominal sagittal diameter: best simple anthropometric indexes of abdominal visceral adipose tissue accumulation and related cardiovascular risk in men and women. Am J Cardio, 73: 460-468.
Ramsbottom R, Brewer J, Williams C (1988) A progressive shuttle run test to estimate maximal oxygen uptake. Br. J. Sports Med., 22: 141-144. doi:10.1136/bjsm.22.4.141
Shephard RJ, Allen C, Benade AJS, Davies CTM, di Prampero PE, Hedman R, Merriman JE, Myhre K, Simmons R (1968) The maximum oxygen intake: an international reference standard of cardiorespiratory fitness. Bull. Org. mond. Santé / Bull. Wld Hlth Org., 38, 757-764.
Shigematsu R (2004) Cutoff values of intra-abdominal fat area for the prevention of metabolic disorders in women. 36(5)
Siri WE (1956) Body composition from fluid spaces and density: analysis of methods. In: Techniques for Measuring Body Composition, edited by Brozek J and Henschel A. Washington, DC: National Academy of Sciences, National Research Council, 1961.
Tanaka H, Monahan KD, Seals DR (2001) Age-predicted maximal heart rate revisited. Journal of the American College of Cardiology, 37(1). Elsevier Science Inc.
Vandewalle H, Pérès G, Monod H (1987) Standard anaerobic exercise tests. Sports Med, Jul-Aug; 4(4): 268-89.
Williamson WM, Hamley EJ (1984) Fitness and health measurement in aircrew. Brit. J. Sports Med., 18(2), June 1984, pp. 110-115.
Zagatto AM, Beck WR, Gobatto CA (2008) Validity of the running anaerobic sprint test (RAST) for assessing anaerobic power and predicting performances. Medicine & Science in Sports & Exercise, 40(5), p S387. Also at Journal of Strength and Conditioning Research, Sep 2009; 23, 6; pg. 1820.
Zacharogiannis E, Paradisis G, Tziortzis S (2004) An evaluation of tests of anaerobic power and capacity. Medicine & Science in Sports & Exercise, 36(5) Supplement, May, p S116.
Comments
Thanks for the article, I check the site daily and one of these days will get to you to train…
“a shuttle run between two points 20 minutes apart”
…now that is one heck of a Bleep test!
opps, ill change that now!
skinfolds are pretty dubious, said my lecturer, because the majority of the norms and data used for the various areas were taken from cadavers, most of which in medical research come from donated bodies which are almost always old people. whose patterns of fat deposits etc are vastly different to young people.
dunno if they have since revised with more modern tech and live subjects.
also the water weighing technique is affected by how much gas there is in the intestine which can stuff up readings?