Confusion matrix
Table layout for visualizing performance; also called an error matrix

In machine learning, a confusion matrix (or error matrix) is a table that visualizes the performance of a supervised learning algorithm by displaying actual versus predicted classes. Each row represents actual instances, while each column shows predicted instances, with the diagonal indicating correctly classified cases. This matrix helps identify common misclassifications between classes, hence its name. Also known as a matching matrix in unsupervised learning, it is a specific type of contingency table with identical class sets for both dimensions, facilitating error analysis in classification tasks.


Example

Given a sample of 12 individuals, 8 that have been diagnosed with cancer and 4 that are cancer-free, where individuals with cancer belong to class 1 (positive) and non-cancer individuals belong to class 0 (negative), we can display that data as follows:

Individual Number            1   2   3   4   5   6   7   8   9  10  11  12
Actual Classification        1   1   1   1   1   1   1   1   0   0   0   0

Assume that we have a classifier that distinguishes between individuals with and without cancer in some way. We can take the 12 individuals and run them through the classifier. The classifier then makes 9 accurate predictions and misses 3: 2 individuals with cancer wrongly predicted as being cancer-free (samples 1 and 2), and 1 person without cancer wrongly predicted as having cancer (sample 9).

Individual Number            1   2   3   4   5   6   7   8   9  10  11  12
Actual Classification        1   1   1   1   1   1   1   1   0   0   0   0
Predicted Classification     0   0   1   1   1   1   1   1   1   0   0   0

Notice that if we compare the actual classification set to the predicted classification set, there are 4 different outcomes that could result in any particular column. First, if the actual classification is positive and the predicted classification is positive (1,1), this is called a true positive result because the positive sample was correctly identified by the classifier. Second, if the actual classification is positive and the predicted classification is negative (1,0), this is called a false negative result because the positive sample is incorrectly identified by the classifier as being negative. Third, if the actual classification is negative and the predicted classification is positive (0,1), this is called a false positive result because the negative sample is incorrectly identified by the classifier as being positive. Fourth, if the actual classification is negative and the predicted classification is negative (0,0), this is called a true negative result because the negative sample is correctly identified by the classifier.
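To make this case analysis concrete, here is a minimal Python sketch (not part of the original example; the function name outcome and the hard-coded lists are illustrative assumptions) that labels each actual/predicted pair from the sample above:

```python
# Illustrative sketch: classify each (actual, predicted) pair from the
# example above as TP, FN, FP, or TN.
actual    = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
predicted = [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

def outcome(a: int, p: int) -> str:
    """Return the outcome type for one actual/predicted pair."""
    if a == 1 and p == 1:
        return "TP"  # true positive: positive sample correctly identified
    if a == 1 and p == 0:
        return "FN"  # false negative: positive sample missed
    if a == 0 and p == 1:
        return "FP"  # false positive: negative sample flagged as positive
    return "TN"      # true negative: negative sample correctly rejected

results = [outcome(a, p) for a, p in zip(actual, predicted)]
print(results)
# ['FN', 'FN', 'TP', 'TP', 'TP', 'TP', 'TP', 'TP', 'FP', 'TN', 'TN', 'TN']
```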

We can then perform the comparison between actual and predicted classifications and add this information to the table as a Result row, so that correct and incorrect predictions are easily identifiable.

Individual Number            1   2   3   4   5   6   7   8   9  10  11  12
Actual Classification        1   1   1   1   1   1   1   1   0   0   0   0
Predicted Classification     0   0   1   1   1   1   1   1   1   0   0   0
Result                      FN  FN  TP  TP  TP  TP  TP  TP  FP  TN  TN  TN

The template for any binary confusion matrix uses the four kinds of results discussed above (true positives, false negatives, false positives, and true negatives) along with the positive and negative classifications. The four outcomes can be formulated in a 2×2 confusion matrix, as follows:

                                        Predicted condition
Total population = P + N        Positive (PP)            Negative (PN)
Actual      Positive (P)        True positive (TP)       False negative (FN)
condition   Negative (N)        False positive (FP)      True negative (TN)

Sources: [4][5][6][7][8][9][10]

The TP, FN, FP, and TN labels in the Result row of the data table above correspond directly to the four cells of this confusion matrix, making the two easy to relate.

Now, we can simply total up each type of result, substitute into the template, and create a confusion matrix that will concisely summarize the results of testing the classifier:

                                        Predicted condition
Total: 8 + 4 = 12               Cancer (7)          Non-cancer (5)
Actual      Cancer (8)          6 (TP)              2 (FN)
condition   Non-cancer (4)      1 (FP)              3 (TN)

In this confusion matrix, of the 8 samples with cancer, the system judged that 2 were cancer-free, and of the 4 samples without cancer, it predicted that 1 did have cancer. All correct predictions are located on the diagonal of the table, so it is easy to visually inspect the table for prediction errors: they appear as values off the diagonal. By summing the two rows of the confusion matrix, one can also recover the total number of positive (P) and negative (N) samples in the original dataset, i.e. P = TP + FN and N = FP + TN.
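As a cross-check, a short Python sketch (illustrative only; the variable names are assumptions) can tally the four counts from the example and reproduce the matrix above. Libraries such as scikit-learn also provide a ready-made confusion_matrix function for this purpose.

```python
from collections import Counter

actual    = [1, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0]
predicted = [0, 0, 1, 1, 1, 1, 1, 1, 1, 0, 0, 0]

# Count each (actual, predicted) pair: (1,1)=TP, (1,0)=FN, (0,1)=FP, (0,0)=TN.
counts = Counter(zip(actual, predicted))
tp, fn = counts[(1, 1)], counts[(1, 0)]
fp, tn = counts[(0, 1)], counts[(0, 0)]

print(tp, fn, fp, tn)    # 6 2 1 3
print(tp + fn, fp + tn)  # P = 8, N = 4
```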

Table of confusion

In predictive analytics, a table of confusion (sometimes also called a confusion matrix) is a table with two rows and two columns that reports the number of true positives, false negatives, false positives, and true negatives. This allows more detailed analysis than simply observing the proportion of correct classifications (accuracy). Accuracy will yield misleading results if the data set is unbalanced; that is, when the numbers of observations in different classes vary greatly.

For example, if there were 95 cancer samples and only 5 non-cancer samples in the data, a particular classifier might classify all the observations as having cancer. The overall accuracy would be 95%, but in more detail the classifier would have a 100% recognition rate (sensitivity) for the cancer class but a 0% recognition rate for the non-cancer class. F1 score is even more unreliable in such cases, and here would yield over 97.4%, whereas informedness removes such bias and yields 0 as the probability of an informed decision for any form of guessing (here always guessing cancer).
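To see where these figures come from, the following Python sketch (illustrative; the counts are taken from the 95/5 example, and the classifier is assumed to always predict cancer) works through accuracy, F1 score, and informedness:

```python
# Always predicting "cancer" on 95 cancer and 5 non-cancer samples gives:
tp, fn, fp, tn = 95, 0, 5, 0
p, n = tp + fn, fp + tn

accuracy = (tp + tn) / (p + n)        # 0.95 -> looks good, but is misleading
precision = tp / (tp + fp)            # 0.95
recall = tp / p                       # 1.0 (100% sensitivity for the cancer class)
specificity = tn / n                  # 0.0 (0% recognition rate for non-cancer)

f1 = 2 * precision * recall / (precision + recall)  # ~0.974
informedness = recall + specificity - 1             # 0.0: no better than guessing

print(accuracy, f1, informedness)
```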

According to Davide Chicco and Giuseppe Jurman, the most informative metric to evaluate a confusion matrix is the Matthews correlation coefficient (MCC).[11]

Other metrics can be included in a confusion matrix, each of which has its own significance and use.

Sources: [12][13][14][15][16][17][18][19]

Total population = P + N, where the actual condition of each case is either positive (P)[20] or negative (N)[23].

Basic outcomes (actual condition vs. predicted condition):
  • True positive (TP), hit[21]: actual positive predicted as positive
  • False negative (FN), miss, underestimation: actual positive predicted as negative
  • False positive (FP), false alarm, overestimation: actual negative predicted as positive
  • True negative (TN), correct rejection[24]: actual negative predicted as negative

Derived metrics:
  • True positive rate (TPR), recall, sensitivity (SEN), probability of detection, hit rate, power = TP/P = 1 − FNR
  • False negative rate (FNR), miss rate, type II error[22] = FN/P = 1 − TPR
  • False positive rate (FPR), probability of false alarm, fall-out, type I error[25] = FP/N = 1 − TNR
  • True negative rate (TNR), specificity (SPC), selectivity = TN/N = 1 − FPR
  • Prevalence = P/(P + N)
  • Positive predictive value (PPV), precision = TP/(TP + FP) = 1 − FDR
  • Negative predictive value (NPV) = TN/(TN + FN) = 1 − FOR
  • False discovery rate (FDR) = FP/(TP + FP) = 1 − PPV
  • False omission rate (FOR) = FN/(TN + FN) = 1 − NPV
  • Positive likelihood ratio (LR+) = TPR/FPR
  • Negative likelihood ratio (LR−) = FNR/TNR
  • Accuracy (ACC) = (TP + TN)/(P + N)
  • Balanced accuracy (BA) = (TPR + TNR)/2
  • F1 score = 2 PPV × TPR/(PPV + TPR) = 2 TP/(2 TP + FP + FN)
  • Informedness, bookmaker informedness (BM) = TPR + TNR − 1
  • Markedness (MK), deltaP (Δp) = PPV + NPV − 1
  • Prevalence threshold (PT) = (√(TPR × FPR) − FPR)/(TPR − FPR)
  • Diagnostic odds ratio (DOR) = LR+/LR−
  • Matthews correlation coefficient (MCC), phi = √(TPR × TNR × PPV × NPV) − √(FNR × FPR × FOR × FDR)
  • Fowlkes–Mallows index (FM) = √(PPV × TPR)
  • Threat score (TS), critical success index (CSI), Jaccard index = TP/(TP + FN + FP)
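As a worked illustration, the Python sketch below (illustrative, not part of the source table) evaluates several of these metrics on the cancer example from earlier (TP = 6, FN = 2, FP = 1, TN = 3):

```python
from math import sqrt

tp, fn, fp, tn = 6, 2, 1, 3   # counts from the cancer example above
p, n = tp + fn, fp + tn

tpr = tp / p                  # recall / sensitivity  = 0.75
tnr = tn / n                  # specificity           = 0.75
ppv = tp / (tp + fp)          # precision             ~ 0.857
npv = tn / (tn + fn)          #                       = 0.6
fnr, fpr = 1 - tpr, 1 - tnr
fdr, for_ = 1 - ppv, 1 - npv  # "for" is a Python keyword, hence for_

accuracy = (tp + tn) / (p + n)            # 0.75
balanced_accuracy = (tpr + tnr) / 2       # 0.75
f1 = 2 * tp / (2 * tp + fp + fn)          # 0.8
informedness = tpr + tnr - 1              # 0.5
markedness = ppv + npv - 1                # ~0.457
mcc = sqrt(tpr * tnr * ppv * npv) - sqrt(fnr * fpr * for_ * fdr)  # ~0.478

print(round(accuracy, 3), round(f1, 3), round(mcc, 3))
```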

Confusion matrices with more than two categories

The confusion matrices discussed above have only two conditions, positive and negative, but the confusion matrix is not limited to binary classification and can be used with multi-class classifiers as well. For example, the table below summarizes the communication of a whistled language between two speakers, with zero values omitted for clarity;[26] a short code sketch for building such a multi-class matrix follows the table.

                       Produced vowel
Vowel perceived      i     e     a     o     u
i                   15     1
e                    1     1
a                               79     5
o                                4    15     3
u                                      2     2
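For readers who want to build such a matrix programmatically, here is a minimal Python sketch (the label pairs below are made up for illustration and are not the study's data) that tallies an N×N confusion matrix from (actual, predicted) pairs, with rows as the actual class and columns as the predicted class, following the convention used earlier in the article:

```python
from collections import defaultdict

# Hypothetical (actual, predicted) label pairs; any hashable labels work.
pairs = [("i", "i"), ("i", "i"), ("i", "e"),
         ("e", "e"), ("a", "a"), ("a", "o"),
         ("o", "o"), ("u", "u")]

labels = sorted({label for pair in pairs for label in pair})
matrix = {a: defaultdict(int) for a in labels}
for actual, predicted in pairs:
    matrix[actual][predicted] += 1

# Print rows = actual class, columns = predicted class (zeros shown as 0).
print("    " + " ".join(f"{lab:>3}" for lab in labels))
for a in labels:
    row = " ".join(f"{matrix[a][p]:>3}" for p in labels)
    print(f"{a:>3} " + row)
```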

References

  1. Stehman, Stephen V. (1997). "Selecting and interpreting measures of thematic classification accuracy". Remote Sensing of Environment. 62 (1): 77–89. Bibcode:1997RSEnv..62...77S. doi:10.1016/S0034-4257(97)00083-7.

  2. Powers, David M. W. (2011). "Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation". Journal of Machine Learning Technologies. 2 (1): 37–63. S2CID 55767944. https://www.researchgate.net/publication/228529307

  3. Opitz, Juri (2024). "A Closer Look at Classification Evaluation Metrics and a Critical Reflection of Common Evaluation Practice". Transactions of the Association for Computational Linguistics. 12: 820–836. arXiv:2404.16958. doi:10.1162/tacl_a_00675. https://doi.org/10.1162/tacl_a_00675

  4. Provost, Foster; Fawcett, Tom (2013). Data science for business: what you need to know about data mining and data-analytic thinking (1. ed., 2. release ed.). Beijing Köln: O'Reilly. ISBN 978-1-4493-6132-7.

  5. Fawcett, Tom (2006). "An Introduction to ROC Analysis" (PDF). Pattern Recognition Letters. 27 (8): 861–874. Bibcode:2006PaReL..27..861F. doi:10.1016/j.patrec.2005.10.010. S2CID 2027090. http://people.inf.elte.hu/kiss/11dwhdm/roc.pdf

  6. Powers, David M. W. (2011). "Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation". Journal of Machine Learning Technologies. 2 (1): 37–63. https://www.researchgate.net/publication/228529307

  7. Ting, Kai Ming (2011). Sammut, Claude; Webb, Geoffrey I. (eds.). Encyclopedia of machine learning. Springer. doi:10.1007/978-0-387-30164-8. ISBN 978-0-387-30164-8.

  8. Brooks, Harold; Brown, Barb; Ebert, Beth; Ferro, Chris; Jolliffe, Ian; Koh, Tieh-Yong; Roebber, Paul; Stephenson, David (2015-01-26). "WWRP/WGNE Joint Working Group on Forecast Verification Research". Collaboration for Australian Weather and Climate Research. World Meteorological Organisation. Retrieved 2019-07-17. https://www.cawcr.gov.au/projects/verification/

  9. Chicco D, Jurman G (January 2020). "The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation". BMC Genomics. 21 (1): 6-1–6-13. doi:10.1186/s12864-019-6413-7. PMC 6941312. PMID 31898477. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6941312

  10. Tharwat A. (August 2018). "Classification assessment methods". Applied Computing and Informatics. 17: 168–192. doi:10.1016/j.aci.2018.08.003. https://doi.org/10.1016%2Fj.aci.2018.08.003

  11. Chicco D, Jurman G (January 2020). "The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation". BMC Genomics. 21 (1): 6-1–6-13. doi:10.1186/s12864-019-6413-7. PMC 6941312. PMID 31898477. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6941312

  12. Fawcett, Tom (2006). "An Introduction to ROC Analysis" (PDF). Pattern Recognition Letters. 27 (8): 861–874. doi:10.1016/j.patrec.2005.10.010. S2CID 2027090. http://people.inf.elte.hu/kiss/11dwhdm/roc.pdf

  13. Provost, Foster; Tom Fawcett (2013-08-01). "Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking". O'Reilly Media, Inc. https://www.researchgate.net/publication/256438799

  14. Powers, David M. W. (2011). "Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation". Journal of Machine Learning Technologies. 2 (1): 37–63. https://www.researchgate.net/publication/228529307

  15. Ting, Kai Ming (2011). Sammut, Claude; Webb, Geoffrey I. (eds.). Encyclopedia of machine learning. Springer. doi:10.1007/978-0-387-30164-8. ISBN 978-0-387-30164-8.

  16. Brooks, Harold; Brown, Barb; Ebert, Beth; Ferro, Chris; Jolliffe, Ian; Koh, Tieh-Yong; Roebber, Paul; Stephenson, David (2015-01-26). "WWRP/WGNE Joint Working Group on Forecast Verification Research". Collaboration for Australian Weather and Climate Research. World Meteorological Organisation. Retrieved 2019-07-17. https://www.cawcr.gov.au/projects/verification/

  17. Chicco D, Jurman G (January 2020). "The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation". BMC Genomics. 21 (1): 6-1–6-13. doi:10.1186/s12864-019-6413-7. PMC 6941312. PMID 31898477. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6941312

  18. Chicco D, Toetsch N, Jurman G (February 2021). "The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation". BioData Mining. 14 (13): 13. doi:10.1186/s13040-021-00244-z. PMC 7863449. PMID 33541410. https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7863449

  19. Tharwat A. (August 2018). "Classification assessment methods". Applied Computing and Informatics. 17: 168–192. doi:10.1016/j.aci.2018.08.003. https://doi.org/10.1016%2Fj.aci.2018.08.003

  20. The number of real positive cases in the data

  21. A test result that correctly indicates the presence of a condition or characteristic

  22. Type II error: A test result which wrongly indicates that a particular condition or attribute is absent

  23. The number of real negative cases in the data

  24. A test result that correctly indicates the absence of a condition or characteristic

  25. Type I error: A test result which wrongly indicates that a particular condition or attribute is present

  26. Rialland, Annie (August 2005). "Phonological and phonetic aspects of whistled languages". Phonology. 22 (2): 237–271. CiteSeerX 10.1.1.484.4384. doi:10.1017/S0952675705000552. S2CID 18615779.