Testing of Psychological Instruments

By Dr. Monique Marie Chouraeshkenazi  Photo Source: Agor Behavioral Sciences, Inc.


Diagnostic and Statistical Manual of Mental Disorders (DSM-IV) by the American Psychiatric Association, 1994

Theodore Millon, Roger Davis, Carrie Millon, and Seth Grossman created the Millon Clinical Multiaxial Inventory-III (MCMI-III), a revision of Millon’s earlier MCMI editions. It is a personality test that examines the personality traits and psychiatric disorders outlined in the DSM-IV. The third edition has helped clinicians investigate personality and clinical disorders among people who suffer from psychiatric illnesses (Strack & Millon, 2007). This version also modified certain personality scales and added scales for depressive personality and post-traumatic stress disorder.

Scores are reported as Base Rate (BR) scores, a standardized scoring system used primarily for clinical and personality assessments (Pearson, n.d.). This scoring mechanism takes into account how common psychiatric disorders are in a specific population (Millon, 1997). MCMI personality assessments reflect Millon’s evolutionary theory, which holds that personality is organized around fundamental, evolutionarily rooted polarities. There are three essential polarities:

-        Existence (Pleasure – Pain)

-        Adaptation (Passive – Active)

-        Reproduction (Self – Other)

Each polarity is expressed through the following subsidiary domains:

-        Behavioral

-        Phenomenological

-        Intrapsychic

-        Biophysical (Millon & Davis, 1996).
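To make the base rate idea concrete, the following Python sketch converts a raw scale score to a BR score by linear interpolation between anchor points. It assumes, as MCMI base rate scoring is commonly described, that BR 0 and BR 115 mark the lowest and highest raw scores, BR 60 the median of a clinical reference sample, and BR 75 and BR 85 the raw cutoffs matched to the prevalence of a trait and of a disorder. The anchor values in `HYPOTHETICAL_ANCHORS` and the function `raw_to_base_rate` are invented for illustration; the actual MCMI-III conversion tables come from the publisher and differ from scale to scale.

```python
from bisect import bisect_right

# HYPOTHETICAL anchor points for ONE scale: (raw score, base rate score).
# Real MCMI-III conversion tables come from the publisher and differ by scale.
HYPOTHETICAL_ANCHORS = [
    (0, 0),     # lowest possible raw score
    (8, 60),    # assumed median raw score of a clinical reference sample
    (14, 75),   # assumed raw cutoff matched to the prevalence of the trait
    (18, 85),   # assumed raw cutoff matched to the prevalence of the disorder
    (24, 115),  # highest possible raw score
]


def raw_to_base_rate(raw, anchors=HYPOTHETICAL_ANCHORS):
    """Convert a raw scale score to a BR score by linear interpolation between anchors."""
    raws = [r for r, _ in anchors]
    # Clamp scores outside the anchored range to the end points.
    if raw <= raws[0]:
        return anchors[0][1]
    if raw >= raws[-1]:
        return anchors[-1][1]
    # Find the anchor segment [r0, r1) that contains the raw score.
    i = bisect_right(raws, raw) - 1
    (r0, b0), (r1, b1) = anchors[i], anchors[i + 1]
    return round(b0 + (raw - r0) * (b1 - b0) / (r1 - r0))


if __name__ == "__main__":
    for raw in (4, 14, 16, 21):
        print(raw, raw_to_base_rate(raw))
```

For example, a raw score of 16 falls halfway between the assumed BR 75 and BR 85 anchors, so `raw_to_base_rate(16)` returns 80. Piecewise-linear interpolation is used only because it is the simplest way to show that prevalence-anchored cutoffs, rather than a normal distribution, shape the scores.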

Overall, Millon’s theory focused heavily on imbalances in human functioning from a pathological perspective. Because of this, a personality assessment that examined only specific dysfunctions could not be considered fully comprehensive. Millon built his theory on behavioral dispositions and considered evolution an important perspective for understanding psychological phenomena.

The purpose of the MCMI-III is to understand an individual’s psychiatric phenomena and personality traits. The assessment is also used as a clinical instrument to diagnose the psychological and psychiatric disorders described in the DSM-IV. It is intended for people who have existing mental disorders or illnesses or who are receiving psychological services. The test is used for individuals 18 and older who have at least an eighth-grade reading level. It should not be used with individuals under 18 or with people who are not currently receiving psychological services (i.e., a non-clinical population). The following information concerns the test setup.

It is based on 28 scales: 14 personality disorder, 10 clinical syndrome, and four correctional.

Personality disorder: 14 subscales

Clinical syndrome: 10 subscales

Correctional: four subscales (14 + 10 + 4 = 28 subscales)

The personality disorder scales were designed to correspond to the DSM-IV Axis II disorders, and the clinical syndrome scales parallel the DSM-IV Axis I disorders (Halfaker et al., 2011).

The MCMI-III consists of 175 true-or-false questions and takes up to 30 minutes to complete. The test can be taken on paper or online. After an individual completes the test, the responses are scored on 28 scales (personality, clinical, and correctional); the correctional scales indicate how the individual approached the test-taking process (Ball et al., 2013).

The MCMI assessment is administered by licensed, trained psychologists who hold at least a master’s degree; a professional (doctoral) degree is preferred.

Format: either manual or computer

Scoring: 28 subscales

The Personality Disorder Scales include: Schizoid, Avoidant, Depressive, Dependent, Histrionic, Narcissistic, Antisocial, Sadistic (Aggressive), Compulsive, Negativistic (Passive-Aggressive), Masochistic (Self-Defeating), Schizotypal, Borderline, and Paranoid. The Clinical Syndrome Scales include: Anxiety, Somatoform, Bipolar: Manic, Dysthymia, Alcohol Dependence, Drug Dependence, Post-Traumatic Stress Disorder, Thought Disorder, Major Depression, and Delusional Disorder. The Correctional Scales include: Disclosure, Desirability, Debasement, and Validity (Halfaker et al., 2011, p. 1). Scoring is oriented toward identifying elevated personality disorder traits: a score of 75 or higher indicates a significant psychological concern, and a score of 85 or higher indicates a major concern that the individual may have a personality disorder.
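The scale layout and the cutoffs above can be expressed as a small data structure plus a lookup. This Python sketch groups the 28 scales into 14 personality, 10 clinical syndrome, and four correctional scales (treating "Bipolar: Manic" as a single scale) and labels BR scores with the 75 and 85 thresholds from the text. Treating a score of exactly 85 as part of the higher band is an assumption. The helpers `interpret_base_rate` and `flag_scales` are hypothetical. The sketch skips the correctional scales when flagging because they describe how the test was taken rather than a disorder.

```python
# Scale names as listed in the text, grouped 14 + 10 + 4 = 28.
PERSONALITY_SCALES = [
    "Schizoid", "Avoidant", "Depressive", "Dependent", "Histrionic",
    "Narcissistic", "Antisocial", "Sadistic (Aggressive)", "Compulsive",
    "Negativistic (Passive-Aggressive)", "Masochistic (Self-Defeating)",
    "Schizotypal", "Borderline", "Paranoid",
]
CLINICAL_SCALES = [
    "Anxiety", "Somatoform", "Bipolar: Manic", "Dysthymia",
    "Alcohol Dependence", "Drug Dependence", "Post-Traumatic Stress Disorder",
    "Thought Disorder", "Major Depression", "Delusional Disorder",
]
CORRECTIONAL_SCALES = ["Disclosure", "Desirability", "Debasement", "Validity"]


def interpret_base_rate(br):
    """Label one BR score with the 75/85 thresholds (85 itself counts as the higher band)."""
    if br >= 85:
        return "major concern"
    if br >= 75:
        return "significant concern"
    return "below threshold"


def flag_scales(profile):
    """Return (scale, label) pairs for personality and clinical scales at BR 75 or above.

    profile: dict mapping scale name to BR score; missing scales are ignored.
    Correctional scales are deliberately skipped.
    """
    flagged = []
    for scale in PERSONALITY_SCALES + CLINICAL_SCALES:
        br = profile.get(scale)
        if br is not None and br >= 75:
            flagged.append((scale, interpret_base_rate(br)))
    return flagged
```

For a hypothetical profile such as `{"Avoidant": 78, "Borderline": 90, "Anxiety": 60}`, `flag_scales` returns `[("Avoidant", "significant concern"), ("Borderline", "major concern")]`.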

The psychometric properties of this test are considered moderate, and it is deemed a reliable psychological test. Its reliability has been consistently analyzed through internal consistency and test-retest reliability (Millon et al., 2015). Internal consistency shows how closely the items within a scale and its subscales measure the same construct. Test-retest reliability compares a test-taker’s responses across two administrations separated by a specific period. For both methods, the stronger the association between the sets of scores, the higher the reliability.

There is much debate about the validity of the test. Choca (2004) described a failure in the MCMI-III scoring system that affected the personality disorder and clinical syndrome scales and subscales. There was concern that these scoring errors could have compromised the evidence of reliability (i.e., internal consistency and test-retest reliability). In addition, specific elements determine whether a test is valid enough to support a credible diagnosis, and validity should be evaluated separately from reliability.

The primary strength of the assessment is its brevity: the MCMI-III consists of 175 true/false questions and can be completed in up to 30 minutes. Additionally, the test parallels the DSM-IV, the diagnostic manual published by the American Psychiatric Association that sets out the criteria used across the psychological field, which adds credibility to its results (Lightfoot, 2017). The test is grounded in logical and theoretical frameworks, which makes it a reasonable tool for diagnosing personality disorders. The scales and subscales used in the MCMI-III differ from those of other psychological and clinical assessments, which provides distinctive data and statistical information.

As previously mentioned, a primary weakness of the test is the controversy over its reliability and whether scoring problems affected the evidence of reliability. Another weakness is that the test rests on a single theoretical framework, which limits it as a diversified tool. At one time, a single theoretical outline was considered sufficient to support a testing instrument; now, experts believe at least three are necessary (Lightfoot, 2017). The model also raises the question of whether it includes the dimensions needed to distinguish normal from abnormal behavior. Because the test is based on Millon’s evolutionary theory, it is perceived as reflecting a single expert’s view of what constitutes a clear and concise psychological assessment. His theory has also prompted debate about the many variations in personality traits listed within the scale and subscale categories, which could make it difficult to determine which traits constitute a concern warranting mental health services. Carrie Millon (Theodore’s granddaughter) defended the multiple variables and pointed to the additional factors in the third edition that are likely to identify disorders such as attention-deficit/hyperactivity disorder and PTSD.

There appears to be limited multicultural information about these tests. The research and resources used for this critique contained no information on the socioeconomic or demographic characteristics of patients or prospective clients tested with this tool. However, a book titled Handbook of Cross-Cultural and Multicultural Personality Assessment discusses important elements of how such an assessment could become more inclusive. The book describes considerable controversy over how the administration of this test should be standardized for ethnic minorities, and it reviews research on standardizing personality assessments such as the MCMI-III for minority populations. The authors also examined test validation in the theory of cross-cultural and multicultural assessment and found that approaches were limited, largely because of the limitations of psychometric-statistical approaches in such research paradigms (Dana, 2000). The tests the authors discussed were based on cultural groups and drew primarily on multicultural applications that identified commonalities between cultures and minorities.

Stanford Binet Intelligence Scale

First published in 1905 by Alfred Binet and Théodore Simon, the test was later revised by Lewis M. Terman, a Stanford University psychologist.

The test was created to quantify children’s intelligence relative to their age. The original age range was primarily 3 to 12 because the focus was on children’s intellectual capacity, which differs from that of adults. The test measures children’s cognitive and intellectual levels. Results are reported as a composite score drawn from multiple subtests, because Binet believed that a single task could not conclusively assess a child’s intelligence. The test measures fluid reasoning, knowledge, quantitative reasoning, visual-spatial processing, and working memory (Terman et al., 1915).
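The following sketch shows the idea of combining several subtests into one composite. The real Stanford-Binet converts scores through age-based norm tables. This Python sketch instead sums five factor scores and rescales the sum to a mean of 100 and a standard deviation of 15. It uses invented norm statistics (`HYPOTHETICAL_SUM_MEAN`, `HYPOTHETICAL_SUM_SD`). The factor keys and the function `composite_score` are hypothetical. They only show why a composite reflects several tasks rather than one.

```python
# HYPOTHETICAL norm statistics for the SUM of five factor scores.
# The real Stanford-Binet uses published, age-based norm tables instead.
HYPOTHETICAL_SUM_MEAN = 50.0
HYPOTHETICAL_SUM_SD = 12.0

FACTORS = (
    "fluid_reasoning",
    "knowledge",
    "quantitative_reasoning",
    "visual_spatial_processing",
    "working_memory",
)


def composite_score(factor_scores, mean=HYPOTHETICAL_SUM_MEAN, sd=HYPOTHETICAL_SUM_SD):
    """Combine five factor scores into one standard score (mean 100, SD 15)."""
    missing = [f for f in FACTORS if f not in factor_scores]
    if missing:
        raise ValueError(f"missing factor scores: {missing}")
    total = sum(factor_scores[f] for f in FACTORS)
    z = (total - mean) / sd  # distance from the assumed norm, in SD units
    return round(100 + 15 * z)
```

If every factor sits at an assumed average of 10, `composite_score` returns 100. If every factor is 13, the sum lies 1.25 assumed standard deviations above the mean, and the function returns 119.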

There does not appear to be a theoretical or conceptual framework for the test; rather, the assessment is based on human intelligence competencies. Human intelligence is the concept of human cognitive and intellectual abilities acquired through learned experience with environmental and abstract concepts (Colom et al., 2010). The French government asked Binet to establish an assessment to examine the intellectual abilities of children who were educationally below average or had intellectual impairments. Below is the test setup:

There are 100 questions on the test.

The format is multiple choice.

The test measures fluid reasoning, knowledge, quantitative reasoning, visual-spatial processing, and working memory (Terman et al., 1915).

A psychologist or trained professional administers the test to a child within a specific age group. Free practice tests are available to gauge a child’s attention span or capacity to take a test. One challenge is that children, especially young children, must take the test with someone they do not know, which may affect the validity and reliability of the assessment because children tend to be wary of strangers.

The test is given by a psychologist or a highly skilled, trained professional. Anyone who administers the test must complete specialized training so that they can provide recommendations and help preserve the reliability and validity of the assessment. A master’s degree or a professional degree (doctorate) is required.

The test can be completed manually or online.

The test measures fluid reasoning, knowledge, quantitative reasoning, visual-spatial processing, and working memory (Terman et al., 1915). Fluid reasoning is the capacity to solve new problems without prior preparation. Knowledge is the information an individual has acquired about specific subjects. Quantitative reasoning tests how an individual uses numerical, mathematical, and statistical information to solve problems. Visual-spatial processing tests an individual’s ability to manipulate figures from an abstract or multidimensional perspective. Finally, working memory examines how well an individual can hold and use information in memory.

Multiple analyses have examined the assessment’s reliability, including split-half reliability, the standard error of measurement, test information curves, test-retest stability, and inter-scorer agreement, and they indicate that the assessment has been consistent (Janzen, Obrzut, & Marusiak, 2003). Like the MCMI-III, the Stanford Binet Intelligence Scale was evaluated for internal consistency, and its scores were found to be stable (Bain & Allin, 2005). Although there have been some challenges with reliability (specifically test-retest stability), those issues have not been deemed significant enough to call the assessment’s reliability into question.

Like the MCMI-III, the Stanford Binet Intelligence Scale has faced challenges with validity. Critics argue that some items are unfair relative to other content and do not reflect the intellectual capacity of the individuals tested (Bain & Allin, 2005). There are also concerns about the age range, cognitive abilities, and construct validity, which empirical research suggests are more significant than in previous editions of the assessment.

The Stanford Binet Intelligence Scale is one of the primary tests used for children who have intellectual impairments. The assessment has also been modernized from previous editions, making it one of the most widely used IQ tools for measuring intelligence and cognitive capabilities. The test is considered useful for school placement, identifying learning difficulties, and tracking intellectual progress. Because of its sophistication, it can also be used in neurological testing. The initial test was used only for young children, but with subsequent revisions it can now be used for individuals from age 2 to 85 and older. Furthermore, trained psychologists can offer recommendations to the parents of children who complete the test, which is more valuable than receiving results alone. Finally, the assessment has an advantage over other IQ tests because of its rigorous design and evaluation standards.

As previously mentioned, the Stanford Binet Intelligence Scale faces validity concerns, including item fairness, the age range, and construct validity (Bain & Allin, 2005). Additionally, some errors of measurement were found in the nonverbal and verbal IQ scores.

The Stanford Binet Intelligence Scale has faced controversy over cultural fairness. The most controversial elements have been the nonverbal portions of the test; some argue that the language of these portions should not be modified where it affects final scoring (Harlow, 2010). Harlow (2010) examined the nonverbal intellectual functioning of Latino and Caucasian Americans living in the United States to determine whether differential item functioning affected the nonverbal portion of the assessment. The researcher found no substantial differences between the two cultural groups, which did not support the hypothesis that the nonverbal items are culturally unfair. Another study raised concerns about the low performance of African American, Hispanic American, and Native American students on standardized tests, including the Stanford Binet Intelligence Scale. It noted that scores may be biased when students from lower-socioeconomic schools and other cultural backgrounds are compared with their peers (Ford, 2004).
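Differential item functioning asks whether test-takers of equal overall ability but from different groups have different odds of answering an item correctly. The Python sketch below uses the Mantel-Haenszel procedure, one widely used DIF method. The text does not state which method Harlow used, so this is an illustrative substitute. Test-takers are stratified by total score. The function `mantel_haenszel_dif` and its record format are hypothetical. The second return value rescales the common odds ratio to the ETS delta metric (−2.35 × ln α).

```python
from collections import defaultdict
from math import log


def mantel_haenszel_dif(records):
    """Mantel-Haenszel common odds ratio and ETS delta for one item.

    records: iterable of (group, total_score, item_correct), where group is
    "reference" or "focal" and item_correct is True or False. Test-takers are
    matched on total_score, which serves as the ability stratum.
    """
    # Per stratum: A/B = reference correct/incorrect, C/D = focal correct/incorrect.
    strata = defaultdict(lambda: {"A": 0, "B": 0, "C": 0, "D": 0})
    for group, total, correct in records:
        cell = strata[total]
        if group == "reference":
            cell["A" if correct else "B"] += 1
        else:
            cell["C" if correct else "D"] += 1

    num = den = 0.0
    for cell in strata.values():
        n = cell["A"] + cell["B"] + cell["C"] + cell["D"]
        num += cell["A"] * cell["D"] / n
        den += cell["B"] * cell["C"] / n
    alpha_mh = num / den
    return alpha_mh, -2.35 * log(alpha_mh)
```

An odds ratio near 1 (and a delta near 0) suggests the item behaves similarly for both groups at each ability level. That pattern would correspond to the no-substantial-difference result described above. Values far from 1 flag an item for review. The function raises `ZeroDivisionError` if no stratum contains both incorrect reference-group and correct focal-group responses.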

References

Bain, S. K., & Allin, J. D. (2005). Book review: Stanford-Binet intelligence scales, fifth edition. Journal of Psychoeducational Assessment, 23, 87-95.

Ball, S. A., Nich, C., Rounsaville, B. J., Eagan, D., & Carroll, K. M. (2013). Millon clinical multiaxial inventory-III subtypes of opioid dependence: Validity and matching to behavioral therapies. National Institutes of Health: U.S. National Library of Medicine, 72(4), 698-711.

Choca, J. P. (2004). Interpretive guide to the Millon Clinical Multiaxial Inventory (3rd ed.). Washington, DC: American Psychological Association.

Colom, R., Karama, S., Jung, R. E., & Haier, R. J. (2010). Human intelligence and brain networks. National Institutes of Health: U.S. National Library of Medicine, 12(4).

Dana, R. H. (2000). Handbook of cross-cultural and multicultural personality assessment. New York, NY: Routledge.

Ford, D. (2004). Intelligence testing and cultural diversity: Concerns, cautions, and considerations. University of Connecticut. Retrieved from https://nrcgt.uconn.edu/wp-content/uploads/sites/953/2015/04/rm04204.pdf (accessed on 18 August 2018).

Halfaker, D. A., Akeson, S. T., Hathcock, D. R., Mattson, C., & Wunderlich, T. L. (2011). Psychological aspects of pain. In Pain procedures in clinical practice (3rd ed., pp. 13-22). Elsevier: Science Direct.

Harlow, S. C. (2010). Item fairness of the nonverbal subtests of the Stanford-Binet intelligence test, fifth edition, in a Latina/o sample. George Fox University. Retrieved from http://digitalcommons.georgefox.edu/cgi/viewcontent.cgi?article=1102&context=psyd (accessed on 18 August 2018).

Janzen, H. L., Obrzut, J. E., & Marusiak, C. W. (2003). Test review: Stanford-Binet intelligence scales, fifth edition (SB:V). Itasca, IL: Riverside Publishing.

Lightfoot, J. M., Jr. (2017). Critical analysis of the Millon Clinical Multiaxial Inventory. International Journal of Scientific & Engineering Research, 8(1), 1397-1399.

Millon, T., & Davis, R. D. (1996). An evolutionary theory of personality disorder. In J. F. Clarkin & M. F. Lenzenweger (Eds.), Major theories of personality disorder (pp. 221-346). New York, NY: Guilford Press.

Millon, T. (1997). Millon Clinical Multiaxial Inventory-III. Retrieved from http://www.drpaulsimpson.com/wp-content/uploads/2014/06/MCMI-Review-MMY.pdf (accessed on 18 August 2018).

Millon, T., Grossman, S., & Millon, C. (2015). MCMI-IV: Millon Clinical Multiaxial Inventory manual (1st ed.). Bloomington, MN: Pearson Education, Inc.

Pearson. (n.d.). Frequently asked questions: Millon Clinical Multiaxial Inventory-III (MCMI-III). Pearson Education: United Kingdom. Retrieved from https://www.pearsonclinical.co.uk/Psychology/AdultMentalHealth/AdultForensic/MillonClinicalMultiaxialInventory-III(MCMI-III)/ForThisProduct/FrequentlyAskedQuestions.aspx (accessed on 18 August 2018).

Strack, S., & Millon, T. (2007). Contributions to the dimensional assessment of personality disorders using Millon’s model and the Millon Clinical Multiaxial Inventory (MCMI-III). Journal of Personality Assessment, 89(1).

Terman, L. M., Ordahl, G., Ordahl, L., Galbreath, N., & Talbert, W. (1915). The Stanford revision of the Binet-Simon scale and some results from its application to 1,000 non-selected children. Journal of Educational Psychology, 6(9), 551-562.

Dr. Monique Chouraeshkenazi