Confusion matrix
Table layout for visualizing performance; also called an error matrix

In machine learning, a confusion matrix (or error matrix) is a table that visualizes the performance of a supervised learning algorithm by displaying actual versus predicted classes. Each row represents actual instances, while each column shows predicted instances, with the diagonal indicating correctly classified cases. This matrix helps identify common misclassifications between classes, hence its name. Also known as a matching matrix in unsupervised learning, it is a specific type of contingency table with identical class sets for both dimensions, facilitating error analysis in classification tasks.


Example

Given a sample of 12 individuals, 8 that have been diagnosed with cancer and 4 that are cancer-free, where individuals with cancer belong to class 1 (positive) and non-cancer individuals belong to class 0 (negative), we can display that data as follows:

Individual Number            1   2   3   4   5   6   7   8   9  10  11  12
Actual Classification        1   1   1   1   1   1   1   1   0   0   0   0

Assume that we have a classifier that distinguishes between individuals with and without cancer in some way. We can take the 12 individuals and run them through the classifier. The classifier then makes 9 accurate predictions and misses 3: 2 individuals with cancer wrongly predicted as being cancer-free (samples 1 and 2), and 1 person without cancer wrongly predicted as having cancer (sample 9).

Individual Number            1   2   3   4   5   6   7   8   9  10  11  12
Actual Classification        1   1   1   1   1   1   1   1   0   0   0   0
Predicted Classification     0   0   1   1   1   1   1   1   1   0   0   0

Notice that if we compare the actual classification set to the predicted classification set, there are 4 different outcomes that could result in any particular column. First, if the actual classification is positive and the predicted classification is positive (1,1), this is called a true positive result because the positive sample was correctly identified by the classifier. Second, if the actual classification is positive and the predicted classification is negative (1,0), this is called a false negative result because the positive sample is incorrectly identified by the classifier as being negative. Third, if the actual classification is negative and the predicted classification is positive (0,1), this is called a false positive result because the negative sample is incorrectly identified by the classifier as being positive. Fourth, if the actual classification is negative and the predicted classification is negative (0,0), this is called a true negative result because the negative sample is correctly identified by the classifier.
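To make this case analysis concrete, here is a minimal Python sketch (not part of the original example; the function name outcome and the hard-coded lists are illustrative assumptions) that labels each actual/predicted pair from the sample above:

```python
# Illustrative sketch: classify each (actual, predicted) pair from the
# example above as TP, FN, FP, or TN.
actual    = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
predicted = [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

def outcome(a: int, p: int) -> str:
    """Return the outcome type for one actual/predicted pair."""
    if a == 1 and p == 1:
        return "TP"  # true positive: positive sample correctly identified
    if a == 1 and p == 0:
        return "FN"  # false negative: positive sample missed
    if a == 0 and p == 1:
        return "FP"  # false positive: negative sample flagged as positive
    return "TN"      # true negative: negative sample correctly rejected

results = [outcome(a, p) for a, p in zip(actual, predicted)]
print(results)
# ['FN', 'FN', 'TP', 'TP', 'TP', 'TP', 'TP', 'TP', 'FP', 'TN', 'TN', 'TN']
```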

We can then perform the comparison between actual and predicted classifications and add this information to the table as a Result row, so that correct and incorrect predictions are easily identifiable.

Individual Number            1   2   3   4   5   6   7   8   9  10  11  12
Actual Classification        1   1   1   1   1   1   1   1   0   0   0   0
Predicted Classification     0   0   1   1   1   1   1   1   1   0   0   0
Result                      FN  FN  TP  TP  TP  TP  TP  TP  FP  TN  TN  TN

The template for any binary confusion matrix uses the four kinds of results discussed above (true positives, false negatives, false positives, and true negatives) along with the positive and negative classifications. The four outcomes can be formulated in a 2×2 confusion matrix, as follows:

                                        Predicted condition
Total population = P + N        Positive (PP)            Negative (PN)
Actual      Positive (P)        True positive (TP)       False negative (FN)
condition   Negative (N)        False positive (FP)      True negative (TN)

Sources: [4][5][6][7][8][9][10]

The TP, FN, FP, and TN labels in the Result row of the data table above correspond directly to the four cells of this confusion matrix, making the two easy to relate.

Now, we can simply total up each type of result, substitute into the template, and create a confusion matrix that will concisely summarize the results of testing the classifier:

                                        Predicted condition
Total: 8 + 4 = 12               Cancer (7)          Non-cancer (5)
Actual      Cancer (8)          6 (TP)              2 (FN)
condition   Non-cancer (4)      1 (FP)              3 (TN)

In this confusion matrix, of the 8 samples with cancer, the system judged that 2 were cancer-free, and of the 4 samples without cancer, it predicted that 1 did have cancer. All correct predictions are located on the diagonal of the table, so it is easy to visually inspect the table for prediction errors: they appear as values off the diagonal. By summing the two rows of the confusion matrix, one can also recover the total number of positive (P) and negative (N) samples in the original dataset, i.e. P = TP + FN and N = FP + TN.
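As a cross-check, a short Python sketch (illustrative only; the variable names are assumptions) can tally the four counts from the example and reproduce the matrix above. Libraries such as scikit-learn also provide a ready-made confusion_matrix function for this purpose.

```python
from collections import Counter

actual    = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
predicted = [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

# Count each (actual, predicted) pair: (1,1)=TP, (1,0)=FN, (0,1)=FP, (0,0)=TN.
counts = Counter(zip(actual, predicted))
tp, fn = counts[(1, 1)], counts[(1, 0)]
fp, tn = counts[(0, 1)], counts[(0, 0)]

print(tp, fn, fp, tn)    # 6 2 1 3
print(tp + fn, fp + tn)  # P = 8, N = 4
```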

Table of confusion

In predictive analytics, a table of confusion (sometimes also called a confusion matrix) is a table with two rows and two columns that reports the number of true positives, false negatives, false positives, and true negatives. This allows more detailed analysis than simply observing the proportion of correct classifications (accuracy). Accuracy will yield misleading results if the data set is unbalanced; that is, when the numbers of observations in different classes vary greatly.

For example, if there were 95 cancer samples and only 5 non-cancer samples in the data, a particular classifier might classify all the observations as having cancer. The overall accuracy would be 95%, but in more detail the classifier would have a 100% recognition rate (sensitivity) for the cancer class but a 0% recognition rate for the non-cancer class. F1 score is even more unreliable in such cases, and here would yield over 97.4%, whereas informedness removes such bias and yields 0 as the probability of an informed decision for any form of guessing (here always guessing cancer).
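To see where these figures come from, the following Python sketch (illustrative; the counts are taken from the 95/5 example, and the classifier is assumed to always predict cancer) works through accuracy, F1 score, and informedness:

```python
# Always predicting "cancer" on 95 cancer and 5 non-cancer samples gives:
tp, fn, fp, tn = 95, 0, 5, 0
p, n = tp + fn, fp + tn

accuracy = (tp + tn) / (p + n)        # 0.95 -> looks good, but is misleading
precision = tp / (tp + fp)            # 0.95
recall = tp / p                       # 1.0 (100% sensitivity for the cancer class)
specificity = tn / n                  # 0.0 (0% recognition rate for non-cancer)

f1 = 2 * precision * recall / (precision + recall)  # ~0.974
informedness = recall + specificity - 1             # 0.0: no better than guessing

print(accuracy, f1, informedness)
```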

According to Davide Chicco and Giuseppe Jurman, the most informative metric to evaluate a confusion matrix is the Matthews correlation coefficient (MCC).[11]

Other metrics can be included in a confusion matrix, each of which has its own significance and use.

Sources: [12][13][14][15][16][17][18][19]

Total population = P + N, where the actual condition of each case is either positive (P)[20] or negative (N)[23].

Basic outcomes (actual condition vs. predicted condition):
  • True positive (TP), hit[21]: actual positive predicted as positive
  • False negative (FN), miss, underestimation: actual positive predicted as negative
  • False positive (FP), false alarm, overestimation: actual negative predicted as positive
  • True negative (TN), correct rejection[24]: actual negative predicted as negative

Derived metrics:
  • True positive rate (TPR), recall, sensitivity (SEN), probability of detection, hit rate, power = TP/P = 1 − FNR
  • False negative rate (FNR), miss rate, type II error[22] = FN/P = 1 − TPR
  • False positive rate (FPR), probability of false alarm, fall-out, type I error[25] = FP/N = 1 − TNR
  • True negative rate (TNR), specificity (SPC), selectivity = TN/N = 1 − FPR
  • Prevalence = P/(P + N)
  • Positive predictive value (PPV), precision = TP/(TP + FP) = 1 − FDR
  • Negative predictive value (NPV) = TN/(TN + FN) = 1 − FOR
  • False discovery rate (FDR) = FP/(TP + FP) = 1 − PPV
  • False omission rate (FOR) = FN/(TN + FN) = 1 − NPV
  • Positive likelihood ratio (LR+) = TPR/FPR
  • Negative likelihood ratio (LR−) = FNR/TNR
  • Accuracy (ACC) = (TP + TN)/(P + N)
  • Balanced accuracy (BA) = (TPR + TNR)/2
  • F1 score = 2 PPV × TPR/(PPV + TPR) = 2 TP/(2 TP + FP + FN)
  • Informedness, bookmaker informedness (BM) = TPR + TNR − 1
  • Markedness (MK), deltaP (Δp) = PPV + NPV − 1
  • Prevalence threshold (PT) = (√(TPR × FPR) − FPR)/(TPR − FPR)
  • Diagnostic odds ratio (DOR) = LR+/LR−
  • Matthews correlation coefficient (MCC), phi = √(TPR × TNR × PPV × NPV) − √(FNR × FPR × FOR × FDR)
  • Fowlkes–Mallows index (FM) = √(PPV × TPR)
  • Threat score (TS), critical success index (CSI), Jaccard index = TP/(TP + FN + FP)
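As a worked illustration, the Python sketch below (illustrative, not part of the source table) evaluates several of these metrics on the cancer example from earlier (TP = 6, FN = 2, FP = 1, TN = 3):

```python
from math import sqrt

tp, fn, fp, tn = 6, 2, 1, 3   # counts from the cancer example above
p, n = tp + fn, fp + tn

tpr = tp / p                  # recall / sensitivity  = 0.75
tnr = tn / n                  # specificity           = 0.75
ppv = tp / (tp + fp)          # precision             ~ 0.857
npv = tn / (tn + fn)          #                       = 0.6
fnr, fpr = 1 - tpr, 1 - tnr
fdr, for_ = 1 - ppv, 1 - npv  # "for" is a Python keyword, hence for_

accuracy = (tp + tn) / (p + n)            # 0.75
balanced_accuracy = (tpr + tnr) / 2       # 0.75
f1 = 2 * tp / (2 * tp + fp + fn)          # 0.8
informedness = tpr + tnr - 1              # 0.5
markedness = ppv + npv - 1                # ~0.457
mcc = sqrt(tpr * tnr * ppv * npv) - sqrt(fnr * fpr * for_ * fdr)  # ~0.478

print(round(accuracy, 3), round(f1, 3), round(mcc, 3))
```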

Confusion matrices with more than two categories

The confusion matrices discussed above have only two conditions, positive and negative, but the confusion matrix is not limited to binary classification and can be used with multi-class classifiers as well. For example, the table below summarizes the communication of a whistled language between two speakers, with zero values omitted for clarity;[26] a short code sketch for building such a multi-class matrix follows the table.

                       Produced vowel
Vowel perceived      i     e     a     o     u
i                   15     1
e                    1     1
a                               79     5
o                                4    15     3
u                                      2     2
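For readers who want to build such a matrix programmatically, here is a minimal Python sketch (the label pairs below are made up for illustration and are not the study's data) that tallies an N×N confusion matrix from (actual, predicted) pairs, with rows as the actual class and columns as the predicted class, following the convention used earlier in the article:

```python
from collections import defaultdict

# Hypothetical (actual, predicted) label pairs; any hashable labels work.
pairs = [("i", "i"), ("i", "i"), ("i", "e"),
         ("e", "e"), ("a", "a"), ("a", "o"),
         ("o", "o"), ("u", "u")]

labels = sorted({label for pair in pairs for label in pair})
matrix = {a: defaultdict(int) for a in labels}
for actual, predicted in pairs:
    matrix[actual][predicted] += 1

# Print rows = actual class, columns = predicted class (zeros shown as 0).
print("    " + " ".join(f"{lab:>3}" for lab in labels))
for a in labels:
    row = " ".join(f"{matrix[a][p]:>3}" for p in labels)
    print(f"{a:>3} " + row)
```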

References

  1. Stehman, Stephen V. (1997). "Selecting and interpreting measures of thematic classification accuracy". Remote Sensing of Environment. 62 (1): 77–89. Bibcode:1997RSEnv..62...77S. doi:10.1016/S0034-4257(97)00083-7.

  2. Powers, David M. W. (2011). "Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation". Journal of Machine Learning Technologies. 2 (1): 37–63. S2CID 55767944. https://www.researchgate.net/publication/228529307

  3. Opitz, Juri (2024). "A Closer Look at Classification Evaluation Metrics and a Critical Reflection of Common Evaluation Practice". Transactions of the Association for Computational Linguistics. 12: 820–836. arXiv:2404.16958. doi:10.1162/tacl_a_00675. https://doi.org/10.1162/tacl_a_00675

  4. Provost, Foster; Fawcett, Tom (2013). Data science for business: what you need to know about data mining and data-analytic thinking (1. ed., 2. release ed.). Beijing Köln: O'Reilly. ISBN 978-1-4493-6132-7.

  5. Fawcett, Tom (2006). "An Introduction to ROC Analysis" (PDF). Pattern Recognition Letters. 27 (8): 861–874. Bibcode:2006PaReL..27..861F. doi:10.1016/j.patrec.2005.10.010. S2CID 2027090. http://people.inf.elte.hu/kiss/11dwhdm/roc.pdf

  6. Powers, David M. W. (2011). "Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation". Journal of Machine Learning Technologies. 2 (1): 37–63. https://www.researchgate.net/publication/228529307

  7. Ting, Kai Ming (2011). Sammut, Claude; Webb, Geoffrey I. (eds.). Encyclopedia of machine learning. Springer. doi:10.1007/978-0-387-30164-8. ISBN 978-0-387-30164-8.

  8. Brooks, Harold; Brown, Barb; Ebert, Beth; Ferro, Chris; Jolliffe, Ian; Koh, Tieh-Yong; Roebber, Paul; Stephenson, David (2015-01-26). "WWRP/WGNE Joint Working Group on Forecast Verification Research". Collaboration for Australian Weather and Climate Research. World Meteorological Organisation. Retrieved 2019-07-17. https://www.cawcr.gov.au/projects/verification/

  9. Chicco D, Jurman G (January 2020). "The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation". BMC Genomics. 21 (1): 6-1–6-13. doi:10.1186/s12864-019-6413-7. PMC 6941312. PMID 31898477. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6941312

  10. Tharwat A. (August 2018). "Classification assessment methods". Applied Computing and Informatics. 17: 168–192. doi:10.1016/j.aci.2018.08.003. https://doi.org/10.1016%2Fj.aci.2018.08.003

  11. Chicco D, Jurman G (January 2020). "The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation". BMC Genomics. 21 (1): 6-1–6-13. doi:10.1186/s12864-019-6413-7. PMC 6941312. PMID 31898477. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6941312

  12. Fawcett, Tom (2006). "An Introduction to ROC Analysis" (PDF). Pattern Recognition Letters. 27 (8): 861–874. doi:10.1016/j.patrec.2005.10.010. S2CID 2027090. http://people.inf.elte.hu/kiss/11dwhdm/roc.pdf

  13. Provost, Foster; Tom Fawcett (2013-08-01). "Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking". O'Reilly Media, Inc. https://www.researchgate.net/publication/256438799

  14. Powers, David M. W. (2011). "Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation". Journal of Machine Learning Technologies. 2 (1): 37–63. https://www.researchgate.net/publication/228529307

  15. Ting, Kai Ming (2011). Sammut, Claude; Webb, Geoffrey I. (eds.). Encyclopedia of machine learning. Springer. doi:10.1007/978-0-387-30164-8. ISBN 978-0-387-30164-8.

  16. Brooks, Harold; Brown, Barb; Ebert, Beth; Ferro, Chris; Jolliffe, Ian; Koh, Tieh-Yong; Roebber, Paul; Stephenson, David (2015-01-26). "WWRP/WGNE Joint Working Group on Forecast Verification Research". Collaboration for Australian Weather and Climate Research. World Meteorological Organisation. Retrieved 2019-07-17. https://www.cawcr.gov.au/projects/verification/

  17. Chicco D, Jurman G (January 2020). "The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation". BMC Genomics. 21 (1): 6-1–6-13. doi:10.1186/s12864-019-6413-7. PMC 6941312. PMID 31898477. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6941312

  18. Chicco D, Toetsch N, Jurman G (February 2021). "The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation". BioData Mining. 14 (13): 13. doi:10.1186/s13040-021-00244-z. PMC 7863449. PMID 33541410. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7863449

  19. Tharwat A. (August 2018). "Classification assessment methods". Applied Computing and Informatics. 17: 168–192. doi:10.1016/j.aci.2018.08.003. https://doi.org/10.1016%2Fj.aci.2018.08.003

  20. The number of real positive cases in the data

  21. A test result that correctly indicates the presence of a condition or characteristic

  22. Type II error: A test result which wrongly indicates that a particular condition or attribute is absent

  23. The number of real negative cases in the data

  24. A test result that correctly indicates the absence of a condition or characteristic

  25. Type I error: A test result which wrongly indicates that a particular condition or attribute is present

  26. Rialland, Annie (August 2005). "Phonological and phonetic aspects of whistled languages". Phonology. 22 (2): 237–271. CiteSeerX 10.1.1.484.4384. doi:10.1017/S0952675705000552. S2CID 18615779.