P4-metric

The P4 metric [1][2] (also known as FS or Symmetric F [3]) is a measure for evaluating the performance of a binary classifier. It is calculated from precision, recall, specificity and NPV (negative predictive value). P4 is designed in a similar way to the F1 metric but addresses the criticisms leveled against F1, and it may be regarded as its extension.

Like other common metrics, P4 is a function of TP (true positives), TN (true negatives), FP (false positives) and FN (false negatives).

Justification

The key concept of P4 is to leverage the four key conditional probabilities:

- the probability that the sample is positive, given that the classifier result was positive (precision, PPV).
- the probability that the classifier result will be positive, given that the sample is positive (recall, TPR).
- the probability that the classifier result will be negative, given that the sample is negative (specificity, TNR).
- the probability that the sample is negative, given that the classifier result was negative (NPV).

The main assumption behind this metric is that a properly designed binary classifier should give results for which all of the probabilities listed above are close to 1. P4 is designed so that P4 = 1 requires all four probabilities to be equal to 1. It also goes to zero when any of these probabilities goes to zero.

Definition

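P4 is defined as the harmonic mean of the four key conditional probabilities:

$$\mathrm{P}_4 = \frac{4}{\frac{1}{\mathrm{precision}} + \frac{1}{\mathrm{recall}} + \frac{1}{\mathrm{specificity}} + \frac{1}{\mathrm{NPV}}}$$

In terms of TP, TN, FP and FN it can be calculated as follows:

$$\mathrm{P}_4 = \frac{4 \cdot \mathrm{TP} \cdot \mathrm{TN}}{4 \cdot \mathrm{TP} \cdot \mathrm{TN} + (\mathrm{TP} + \mathrm{TN}) \cdot (\mathrm{FP} + \mathrm{FN})}$$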
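The definition can be sketched in Python as follows. The function names and the sample counts are illustrative assumptions, not part of the cited sources. The count-based closed form is obtained by expanding the harmonic mean of precision, recall, specificity and NPV, and returning 0 when a probability is zero or undefined is a convention chosen here.

```python
def p4_from_rates(precision, recall, specificity, npv):
    """Harmonic mean of the four key conditional probabilities."""
    rates = (precision, recall, specificity, npv)
    if any(r == 0 for r in rates):
        # Convention assumed here: the harmonic mean is 0 if any term is 0.
        return 0.0
    return 4.0 / sum(1.0 / r for r in rates)


def p4_from_counts(tp, tn, fp, fn):
    """Equivalent closed form in terms of confusion-matrix counts."""
    denom = 4 * tp * tn + (tp + tn) * (fp + fn)
    # Convention assumed here: an empty confusion matrix scores 0.
    return 4 * tp * tn / denom if denom else 0.0


# Hypothetical confusion matrix, used only for illustration.
tp, tn, fp, fn = 40, 45, 5, 10
via_rates = p4_from_rates(
    precision=tp / (tp + fp),
    recall=tp / (tp + fn),
    specificity=tn / (tn + fp),
    npv=tn / (tn + fn),
)
via_counts = p4_from_counts(tp, tn, fp, fn)
print(via_rates, via_counts)  # the two forms agree up to rounding error
```

The count-based form avoids computing the four intermediate ratios, so it stays defined even when one of them would require division by zero.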

Evaluation of the binary classifier performance

Evaluating the performance of a binary classifier is a multidisciplinary concept. It spans the evaluation of medical tests and psychiatric tests as well as machine learning classifiers from a variety of fields. As a result, many of the metrics in use exist under several names, some of them having been defined independently.

Terminology and derived metrics. Sources: [4][5][6][7][8][9][10][11]

Total population = P + N

Confusion matrix (actual condition against predicted condition):

  • Actual positive (P) [a], predicted positive (PP): true positive (TP), hit [b]
  • Actual positive (P) [a], predicted negative (PN): false negative (FN), miss, underestimation
  • Actual negative (N) [d], predicted positive (PP): false positive (FP), false alarm, overestimation
  • Actual negative (N) [d], predicted negative (PN): true negative (TN), correct rejection [e]

Rates relative to the actual condition:

  • True positive rate (TPR), recall, sensitivity (SEN), probability of detection, hit rate, power = TP/P = 1 − FNR
  • False negative rate (FNR), miss rate, type II error [c] = FN/P = 1 − TPR
  • False positive rate (FPR), probability of false alarm, fall-out, type I error [f] = FP/N = 1 − TNR
  • True negative rate (TNR), specificity (SPC), selectivity = TN/N = 1 − FPR

Rates relative to the predicted condition:

  • Positive predictive value (PPV), precision = TP/PP = 1 − FDR
  • False discovery rate (FDR) = FP/PP = 1 − PPV
  • False omission rate (FOR) = FN/PN = 1 − NPV
  • Negative predictive value (NPV) = TN/PN = 1 − FOR

Combined metrics:

  • Prevalence = P/(P + N)
  • Accuracy (ACC) = (TP + TN)/(P + N)
  • Balanced accuracy (BA) = (TPR + TNR)/2
  • Informedness, bookmaker informedness (BM) = TPR + TNR − 1
  • Markedness (MK), deltaP (Δp) = PPV + NPV − 1
  • Prevalence threshold (PT) = (√(TPR × FPR) − FPR)/(TPR − FPR)
  • Positive likelihood ratio (LR+) = TPR/FPR
  • Negative likelihood ratio (LR−) = FNR/TNR
  • Diagnostic odds ratio (DOR) = LR+/LR−
  • F1 score = 2 × PPV × TPR/(PPV + TPR) = 2 TP/(2 TP + FP + FN)
  • Fowlkes–Mallows index (FM) = √(PPV × TPR)
  • Matthews correlation coefficient (MCC) = √(TPR × TNR × PPV × NPV) − √(FNR × FPR × FOR × FDR)
  • Threat score (TS), critical success index (CSI), Jaccard index = TP/(TP + FN + FP)
  a. The number of real positive cases in the data.
  b. A test result that correctly indicates the presence of a condition or characteristic.
  c. Type II error: a test result which wrongly indicates that a particular condition or attribute is absent.
  d. The number of real negative cases in the data.
  e. A test result that correctly indicates the absence of a condition or characteristic.
  f. Type I error: a test result which wrongly indicates that a particular condition or attribute is present.
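As a rough illustration of how the quantities in this terminology overview relate to one another, the sketch below derives them from the four confusion-matrix cells. The function name `derived_metrics` and the sample counts are assumptions made for this example. The sketch also assumes that every cell is non-zero and that TPR differs from FPR, so that no division by zero occurs.

```python
from math import sqrt


def derived_metrics(tp, fn, fp, tn):
    """Derive common rates from the four confusion-matrix cells.

    Assumes all cells are non-zero and TPR != FPR (no zero divisions).
    """
    p, n = tp + fn, fp + tn        # actual positives / negatives
    pp, pn = tp + fp, fn + tn      # predicted positives / negatives
    tpr, fnr = tp / p, fn / p      # rates relative to actual positives
    tnr, fpr = tn / n, fp / n      # rates relative to actual negatives
    ppv, fdr = tp / pp, fp / pp    # rates relative to predicted positives
    npv, for_ = tn / pn, fn / pn   # rates relative to predicted negatives
    lr_plus, lr_minus = tpr / fpr, fnr / tnr
    return {
        "prevalence": p / (p + n),
        "accuracy": (tp + tn) / (p + n),
        "balanced_accuracy": (tpr + tnr) / 2,
        "informedness": tpr + tnr - 1,
        "markedness": ppv + npv - 1,
        "prevalence_threshold": (sqrt(tpr * fpr) - fpr) / (tpr - fpr),
        "lr_plus": lr_plus,
        "lr_minus": lr_minus,
        "diagnostic_odds_ratio": lr_plus / lr_minus,
        "f1": 2 * tp / (2 * tp + fp + fn),
        "fowlkes_mallows": sqrt(ppv * tpr),
        "mcc": sqrt(tpr * tnr * ppv * npv) - sqrt(fnr * fpr * for_ * fdr),
        "threat_score": tp / (tp + fn + fp),
    }


# Hypothetical counts, chosen only for illustration.
metrics = derived_metrics(tp=40, fn=10, fp=5, tn=45)
for name, value in metrics.items():
    print(f"{name:22s} {value:.4f}")
```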


Properties of P4 metric

  • Symmetry: in contrast to the F1 metric, P4 is symmetrical. Its value does not change when the dataset labeling is swapped, that is, when positives are named negatives and negatives are named positives.
  • Range: P4 ∈ [0, 1].
  • Achieving P4 ≈ 1 requires all four key conditional probabilities to be close to 1.
  • For P4 ≈ 0 it is sufficient that one of the four key conditional probabilities is close to 0.
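The symmetry property can be checked numerically. In the sketch below, swapping the labels exchanges TP with TN and FP with FN. The counts are hypothetical and the helper names `p4`, `f1` and `swap_labels` are assumptions made for this example.

```python
def p4(tp, tn, fp, fn):
    """P4 from confusion-matrix counts (harmonic mean in closed form)."""
    return 4 * tp * tn / (4 * tp * tn + (tp + tn) * (fp + fn))


def f1(tp, tn, fp, fn):
    """F1 score; it ignores TN, which is why it is not symmetric."""
    return 2 * tp / (2 * tp + fp + fn)


def swap_labels(tp, tn, fp, fn):
    """Rename positives as negatives and vice versa."""
    return tn, tp, fn, fp  # new TP, new TN, new FP, new FN


# Hypothetical counts, chosen only for illustration.
counts = (90, 5, 3, 2)
swapped = swap_labels(*counts)

print("P4:", p4(*counts), p4(*swapped))  # identical under label swap
print("F1:", f1(*counts), f1(*swapped))  # changes under label swap
```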

Examples, comparing with the other metrics

Dependency table for selected metrics ("true" means the metric depends on the given probability, "false" means it does not):

Metric        Precision  Recall  Specificity  NPV
P4            true       true    true         true
F1            true       true    false        false
Informedness  false      true    true         false
Markedness    true       false   false        true

Metrics that do not depend on a given probability are prone to misrepresentation when it approaches 0.

Example 1: Rare disease detection test

Consider a medical test designed to detect a rare disease. The population size is 100 000, of which 0.05% is infected, so there are 50 positive and 99 950 negative individuals. Test performance: 95% of all positive individuals are classified correctly (TPR = 0.95) and 95% of all negative individuals are classified correctly (TNR = 0.95). This gives TP = 47.5, FN = 2.5, TN = 94 952.5 and FP = 4 997.5. In such a case, because of the high population imbalance and despite the high test accuracy (0.95), the probability that an individual who has been classified as positive is in fact positive is very low:

$$\mathrm{precision} = \frac{\mathrm{TP}}{\mathrm{TP} + \mathrm{FP}} = \frac{47.5}{47.5 + 4997.5} \approx 0.0094$$

This low probability is reflected in some of the metrics but not in others (values computed from the figures above):

  • P4 ≈ 0.0366
  • F1 ≈ 0.0186
  • J = 0.90 (Informedness / Youden index)
  • MK ≈ 0.0094 (Markedness)
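The figures for this example can be recomputed with a short script. This is a minimal sketch under the stated assumptions (population 100 000, prevalence 0.05%, TPR = TNR = 0.95). The helper name `confusion_from_rates` is hypothetical, and fractional counts are kept as expected values rather than rounded.

```python
def confusion_from_rates(population, prevalence, tpr, tnr):
    """Expected confusion-matrix cells for a test with given TPR and TNR."""
    p = population * prevalence   # actual positives
    n = population - p            # actual negatives
    tp = tpr * p
    fn = p - tp
    tn = tnr * n
    fp = n - tn
    return tp, tn, fp, fn


tp, tn, fp, fn = confusion_from_rates(100_000, 0.0005, 0.95, 0.95)

accuracy = (tp + tn) / (tp + tn + fp + fn)
precision = tp / (tp + fp)
npv = tn / (tn + fn)
p4 = 4 * tp * tn / (4 * tp * tn + (tp + tn) * (fp + fn))
f1 = 2 * tp / (2 * tp + fp + fn)
informedness = tp / (tp + fn) + tn / (tn + fp) - 1
markedness = precision + npv - 1

for name, value in [("accuracy", accuracy), ("precision", precision),
                    ("P4", p4), ("F1", f1),
                    ("informedness", informedness),
                    ("markedness", markedness)]:
    print(f"{name:13s} {value:.4f}")
```

Of the four metrics, only informedness stays high here, because it does not depend on precision.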

Example 2: Image recognition - cats vs dogs

Suppose we are training a neural-network-based image classifier. We consider only two types of images: those containing dogs (labeled as 0) and those containing cats (labeled as 1). Thus, the goal is to distinguish between cats and dogs. The classifier overpredicts in favor of cats ("positive" samples): 99.99% of cats are classified correctly but only 1% of dogs are classified correctly. The image dataset consists of 100 000 images, 90% of which are pictures of cats and 10% are pictures of dogs, which gives TP = 89 991, FN = 9, TN = 100 and FP = 9 900. In such a situation, the probability that a picture containing a dog will be classified correctly is very low:

$$\mathrm{specificity} = \frac{\mathrm{TN}}{\mathrm{TN} + \mathrm{FP}} = \frac{100}{100 + 9900} = 0.01$$

Not all of the metrics reflect this low probability:

  • P4 ≈ 0.0388
  • F1 ≈ 0.9478
  • J = 0.0099 (Informedness / Youden index)
  • MK ≈ 0.8183 (Markedness)
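The same comparison can be reproduced for this example from the counts implied by the description (90 000 cat images, 10 000 dog images, 99.99% of cats and 1% of dogs classified correctly). The sketch below is illustrative only, and its variable names are assumptions.

```python
# Counts implied by the example; cats are the positive class.
cats, dogs = 90_000, 10_000
tp = round(0.9999 * cats)   # cats classified as cats
fn = cats - tp              # cats classified as dogs
tn = round(0.01 * dogs)     # dogs classified as dogs
fp = dogs - tn              # dogs classified as cats

# The four key conditional probabilities.
precision = tp / (tp + fp)
recall = tp / (tp + fn)
specificity = tn / (tn + fp)
npv = tn / (tn + fn)

# Metrics compared in the example.
p4 = 4 / (1 / precision + 1 / recall + 1 / specificity + 1 / npv)
f1 = 2 * precision * recall / (precision + recall)
informedness = recall + specificity - 1
markedness = precision + npv - 1

print(f"specificity  {specificity:.4f}")
print(f"P4           {p4:.4f}")
print(f"F1           {f1:.4f}")
print(f"informedness {informedness:.4f}")
print(f"markedness   {markedness:.4f}")
```

F1 and markedness do not depend on specificity, so they remain high even though almost every dog image is misclassified.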

References

  1. Sitarz, Mikolaj (2023). "Extending F1 Metric, Probabilistic Approach". Advances in Artificial Intelligence and Machine Learning. 03 (2): 1025–1038. arXiv:2210.11997. doi:10.54364/AAIML.2023.1161.
  2. "P4 metric, a new way to evaluate binary classifiers".
  3. Hand, David J.; Christen, Peter; Ziyad, Sumayya (2024). "Selecting a classification performance measure: Matching the measure to the problem". arXiv:2409.12391 [cs.LG].
  4. Fawcett, Tom (2006). "An Introduction to ROC Analysis" (PDF). Pattern Recognition Letters. 27 (8): 861–874. doi:10.1016/j.patrec.2005.10.010. S2CID 2027090.
  5. Provost, Foster; Fawcett, Tom (2013-08-01). "Data Science for Business: What You Need to Know about Data Mining and Data-Analytic Thinking". O'Reilly Media, Inc.
  6. Powers, David M. W. (2011). "Evaluation: From Precision, Recall and F-Measure to ROC, Informedness, Markedness & Correlation". Journal of Machine Learning Technologies. 2 (1): 37–63.
  7. Ting, Kai Ming (2011). Sammut, Claude; Webb, Geoffrey I. (eds.). Encyclopedia of machine learning. Springer. doi:10.1007/978-0-387-30164-8. ISBN 978-0-387-30164-8.
  8. Brooks, Harold; Brown, Barb; Ebert, Beth; Ferro, Chris; Jolliffe, Ian; Koh, Tieh-Yong; Roebber, Paul; Stephenson, David (2015-01-26). "WWRP/WGNE Joint Working Group on Forecast Verification Research". Collaboration for Australian Weather and Climate Research. World Meteorological Organisation. Retrieved 2019-07-17.
  9. Chicco D, Jurman G (January 2020). "The advantages of the Matthews correlation coefficient (MCC) over F1 score and accuracy in binary classification evaluation". BMC Genomics. 21 (1): 6-1–6-13. doi:10.1186/s12864-019-6413-7. PMC 6941312. PMID 31898477.
  10. Chicco D, Toetsch N, Jurman G (February 2021). "The Matthews correlation coefficient (MCC) is more reliable than balanced accuracy, bookmaker informedness, and markedness in two-class confusion matrix evaluation". BioData Mining. 14 (13): 13. doi:10.1186/s13040-021-00244-z. PMC 7863449. PMID 33541410.
  11. Tharwat A. (August 2018). "Classification assessment methods". Applied Computing and Informatics. 17: 168–192. doi:10.1016/j.aci.2018.08.003.
