Stylometry grew out of earlier techniques of analyzing texts for evidence of authenticity, author identity, and other questions.
The modern practice of the discipline received publicity from the study of authorship problems in English Renaissance drama. Researchers and readers observed that some playwrights of the era had distinctive patterns of language preferences, and attempted to use those patterns to identify authors of uncertain or collaborative works. Early efforts were not always successful: in 1901, one researcher attempted to use John Fletcher's preference for " 'em", the contracted form of "them", as a marker to distinguish between Fletcher and Philip Massinger in their collaborations, but he mistakenly employed an edition of Massinger's works in which the editor had expanded all instances of " 'em" to "them".
The development of computers and their capacities for analyzing large quantities of data enhanced this type of effort by orders of magnitude. The great capacity of computers for data analysis, however, did not guarantee good quality output. During the early 1960s, Rev. A. Q. Morton produced a computer analysis of the fourteen Epistles of the New Testament attributed to St. Paul, which indicated that six different authors had written that body of work. A check of his method, applied to the works of James Joyce, gave the result that Ulysses, Joyce's multi-perspective, multi-style novel, was composed by five separate individuals, none of whom apparently had any part in the crafting of Joyce's first novel, A Portrait of the Artist as a Young Man.
With time and practice, however, researchers and scholars have refined their methods and obtained better results. One notable early success was the resolution of disputed authorship of twelve of The Federalist Papers by Frederick Mosteller and David Wallace.
While there are still questions concerning initial assumptions and methods (and, perhaps, always will be), few now dispute the basic premise that linguistic analysis of written texts can produce valuable information and insight. (Indeed, this was apparent even before the advent of computers: the successful application of a textual/linguistic analysis to the Fletcher canon by Cyrus Hoy and others yielded clear results during the late 1950s and early 1960s.)
Applications of stylometry include literary studies, historical studies, social studies, information retrieval, and many forensic cases and studies. Recently, stylometric methods have advanced long-standing debates about anonymous medieval Icelandic sagas. Stylometry can also be applied to computer code and to intrinsic plagiarism detection, which detects plagiarism from changes in writing style within a single document. It can also be used to predict, from typing speed, whether someone is a native or non-native speaker of English.
Stylometry as a method is vulnerable to the distortion of text during revision. An author may also adopt different styles over the course of a career, as demonstrated by Plato, who chose different stylistic policies, such as those adopted for the early and middle dialogues addressing the Socratic problem.
Textual features of interest for authorship attribution fall into two groups. The first counts occurrences of idiosyncratic expressions or constructions (e.g. how the author uses punctuation, or how often the author uses agentless passive constructions). The second resembles the features used for readability analysis, such as measures of lexical variation and syntactic variation.
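As a minimal sketch of one feature from each group, the following Python computes punctuation rates and a type-token ratio. The function names, the choice of punctuation marks, and the regular-expression tokenizer are assumptions made for illustration, not features prescribed by any particular study.

```python
import re
from collections import Counter


def type_token_ratio(text):
    """Distinct words divided by total words: a simple measure of lexical variation."""
    words = re.findall(r"[a-z']+", text.lower())
    return len(set(words)) / len(words) if words else 0.0


def punctuation_rates(text, marks=",;:!?-"):
    """Occurrences of each punctuation mark per 1,000 characters of text."""
    counts = Counter(ch for ch in text if ch in marks)
    scale = 1000 / len(text) if text else 0.0
    return {mark: counts[mark] * scale for mark in marks}
```

The type-token ratio falls as texts get longer, so it is only comparable between samples of similar length.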
Since authors often have preferences for certain topics, research experiments in authorship attribution mostly remove content words such as nouns, adjectives, and verbs from the feature set, only retaining structural elements of the text to avoid overfitting their models to topic rather than author characteristics.
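A rough sketch of this kind of filtering is shown below. The word list is a hypothetical, deliberately short sample of English function words; actual experiments use much longer lists or part-of-speech tagging.

```python
import re
from collections import Counter

# Hypothetical short list of English function words, for illustration only.
FUNCTION_WORDS = frozenset(
    "a an the and or but if of in on at to for with by from as "
    "is are was were be been it this that he she they we you i not".split()
)


def function_word_profile(text):
    """Relative frequencies of function words only; content words are dropped."""
    tokens = re.findall(r"[a-z']+", text.lower())
    kept = [t for t in tokens if t in FUNCTION_WORDS]
    counts = Counter(kept)
    total = len(kept)
    return {word: n / total for word, n in counts.items()} if total else {}
```

Because topical words such as "whale" or "harpoon" never enter the profile, two texts by the same author on different subjects can still yield similar feature values.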
Stylistic features are often computed as averages over a text or over the entire collected works of an author, yielding measures such as average word length or average sentence length. This enables a model to identify authors who have a clear preference for wordy or terse sentences but hides variation: an author with a mix of long and short sentences will have the same average as an author with consistent mid-length sentences. To capture such variation, some experiments use sequences or patterns over observations rather than average observed frequencies, noting e.g. that an author shows a preference for a certain stress or emphasis pattern, or that an author tends to follow a sequence of long sentences with a short one.
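The difference between an average and a sequence pattern can be sketched as follows. The sentence splitter, the long/short threshold, and the function names are simplifying assumptions rather than a method taken from the experiments mentioned.

```python
import re
from collections import Counter


def sentence_lengths(text):
    """Number of words in each sentence, in order of appearance."""
    lengths = [len(s.split()) for s in re.split(r"[.!?]+", text)]
    return [n for n in lengths if n > 0]


def length_transitions(lengths, threshold):
    """Count consecutive pairs of sentences labelled long or short."""
    labels = ["long" if n > threshold else "short" for n in lengths]
    return Counter(zip(labels, labels[1:]))
```

Sentence lengths of 2, 10, 2, 10 and of 6, 6, 6, 6 both average six words, but `length_transitions` with a threshold of 6 reports only long/short alternations for the first sequence and only short-to-short transitions for the second.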
One of the first approaches to authorship identification, by Mendenhall, can be said to aggregate its observations without averaging them.
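Mendenhall's characteristic curves plot how often words of each length occur. A minimal sketch of such a distribution, with a simple letters-only tokenizer as an assumption, might look like this:

```python
import re
from collections import Counter


def word_length_distribution(text):
    """Share of words having each length in letters, keyed by length."""
    words = re.findall(r"[A-Za-z]+", text)
    counts = Counter(len(w) for w in words)
    total = sum(counts.values())
    if not total:
        return {}
    return {length: counts[length] / total for length in sorted(counts)}
```

The result keeps the whole distribution, so two texts with the same mean word length can still be told apart by the shape of their curves.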
Adversarial stylometry is the practice of altering writing style to reduce the potential for stylometry to discover the author's identity or their characteristics. This task is also known as authorship obfuscation or authorship anonymisation. Stylometry poses a significant privacy challenge in its ability to unmask anonymous authors or to link pseudonyms to an author's other identities, which, for example, creates difficulties for whistleblowers, activists, and hoaxers and fraudsters. The privacy risk is expected to grow as machine learning techniques and text corpora develop.
Manually obscuring style is possible, but laborious; in some circumstances, it is preferable or necessary. Automated tooling, either semi- or fully-automatic, could assist an author. How best to perform the task and how to design such tools are open research questions. While some approaches have been shown to be able to defeat particular stylometric analyses, particularly those that do not account for the potential of adversariality, establishing safety in the face of unknown analyses is an issue. Ensuring the faithfulness of the paraphrase is a critical challenge for automated tools.
It is uncertain if the practice of adversarial stylometry is detectable in itself. Some studies have found that particular methods produced signals in the output text, but a stylometrist who is uncertain of what methods may have been used may not be able to reliably detect them.
Stylometric methods are used for several academic topics, as an application of linguistics, lexicography, or literary study, in conjunction with natural language processing and machine learning, and applied to plagiarism detection, authorship analysis, or information retrieval.
PAN workshops (originally plagiarism analysis, authorship identification, and near-duplicate detection; later, more generally, a workshop on uncovering plagiarism, authorship, and social software misuse) have been organised since 2007, mainly in conjunction with information access conferences such as ACM SIGIR, FIRE, and CLEF. PAN formulates shared challenge tasks for plagiarism detection, authorship identification, author gender identification, author profiling, vandalism detection, and other related text analysis tasks, many of which hinge on stylometry.
Stylometry has both descriptive use cases, which characterise the content of a collection, and identificatory use cases, such as identifying authors or categories of texts. The methods used to analyse the data and features above therefore range from those built to classify items into sets to those built to distribute items in a space of feature variation. Most methods are statistical in nature, such as cluster analysis and discriminant analysis, and are typically based on philological data and features; the field is also a fruitful application domain for modern machine learning methods.
Whereas in the past, stylometry emphasized the rarest or most striking elements of a text, contemporary techniques can isolate identifying patterns even in common parts of speech. Most systems are based on lexical statistics, i.e. using the frequencies of words and terms in the text to characterise the text (or its author). In this context, unlike for information retrieval, the observed occurrence patterns of the most common words are more interesting than the topical terms which are less frequent.
In one such method, the text is analyzed to find the 50 most common words. The text is then divided into 5,000-word chunks, and each chunk is analyzed to find the frequency of those 50 words in that chunk. This generates a 50-number identifier for each chunk. These numbers place each chunk of text at a point in a 50-dimensional space. This 50-dimensional space is flattened into a plane using principal components analysis (PCA). This results in a display of points that correspond to an author's style. If two literary works are placed on the same plane, the resulting pattern may show whether both works were by the same author or by different authors.
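The steps above can be sketched in plain Python. Several details are assumptions not stated in the description: the regular-expression tokenizer, the use of relative rather than raw frequencies, dropping a trailing partial chunk, and computing the principal axes by power iteration instead of a library PCA routine. All function names are hypothetical.

```python
from collections import Counter
import math
import re


def tokenize(text):
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def chunk_profiles(tokens, n_words=50, chunk_size=5000):
    """Relative frequency of the n_words most common words in each full chunk."""
    vocab = [word for word, _ in Counter(tokens).most_common(n_words)]
    profiles = []
    for start in range(0, len(tokens) - chunk_size + 1, chunk_size):
        counts = Counter(tokens[start:start + chunk_size])
        profiles.append([counts[word] / chunk_size for word in vocab])
    return vocab, profiles


def principal_axes(rows, k=2, iterations=200):
    """Top-k principal axes of already-centred rows, found by power iteration."""
    dim = len(rows[0])
    cov = [[sum(r[i] * r[j] for r in rows) / len(rows) for j in range(dim)]
           for i in range(dim)]
    axes = []
    for _ in range(k):
        v = [float(i + 1) for i in range(dim)]  # arbitrary non-zero start
        for _ in range(iterations):
            # Keep v orthogonal to the axes already found.
            for axis in axes:
                dot = sum(x * y for x, y in zip(v, axis))
                v = [x - dot * y for x, y in zip(v, axis)]
            v = [sum(c * x for c, x in zip(row, v)) for row in cov]
            norm = math.sqrt(sum(x * x for x in v))
            if norm < 1e-12:  # no variance left in this direction
                v = [0.0] * dim
                break
            v = [x / norm for x in v]
        axes.append(v)
    return axes


def project_chunks(text, n_words=50, chunk_size=5000):
    """Map each chunk to a point on the plane of the first two principal axes."""
    _, profiles = chunk_profiles(tokenize(text), n_words, chunk_size)
    if not profiles:
        return []
    means = [sum(col) / len(profiles) for col in zip(*profiles)]
    centred = [[x - m for x, m in zip(row, means)] for row in profiles]
    axes = principal_axes(centred)
    return [tuple(sum(x * a for x, a in zip(row, axis)) for axis in axes)
            for row in centred]
```

To compare two works, their texts can be concatenated before calling `project_chunks`, so that both share one vocabulary and one plane; chunks from different authors would then be expected, though not guaranteed, to form separate groups of points.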
A 1999 study showed that a neural network program reached 70% accuracy in determining the authorship of poems it had not yet analyzed. This study from Vrije Universiteit examined identification of poems by three Dutch authors using only letter sequences such as "den".
One problem with this method of analysis is that the network can become biased based on its training set, possibly selecting authors the network has analyzed more often.
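The letter-sequence features mentioned above can be illustrated with a character n-gram profile. This sketch covers only feature extraction, not the neural network itself, and the choice of three-letter sequences and whitespace normalisation is an assumption rather than a detail of the 1999 study.

```python
from collections import Counter


def char_ngram_profile(text, n=3):
    """Relative frequencies of overlapping character n-grams in the text."""
    text = " ".join(text.lower().split())  # normalise case and whitespace
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    return {gram: count / total for gram, count in grams.items()} if total else {}
```

A classifier would receive such profiles, one per poem, as its input vectors. Balancing the number of training texts per author is one way to limit the bias described above.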
One method for identifying style is termed "rare pairs" and relies upon individual habits of collocation. The use of certain words may, for a particular author, be associated idiosyncratically with the use of other, predictable words.
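The idea of idiosyncratic collocation can be sketched loosely as follows. This is not the published "rare pairs" procedure: the window size, the minimum count, and the comparison against a single reference text are all simplifying assumptions, and the function names are hypothetical.

```python
import re
from collections import Counter


def cooccurring_pairs(text, window=5):
    """Count unordered pairs of distinct words at most window - 1 tokens apart."""
    tokens = re.findall(r"[a-z']+", text.lower())
    pairs = Counter()
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                pairs[tuple(sorted((word, other)))] += 1
    return pairs


def distinctive_pairs(author_text, reference_text, min_count=2, window=5):
    """Pairs the author uses at least min_count times and the reference never uses."""
    author = cooccurring_pairs(author_text, window)
    reference = cooccurring_pairs(reference_text, window)
    return {pair for pair, n in author.items()
            if n >= min_count and pair not in reference}
```

Pairs returned by `distinctive_pairs` are only candidates for an author's habits; a larger reference corpus would be needed before treating any of them as evidence.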
The diffusion of the internet has shifted the attention of authorship attribution towards online texts (web pages, blogs, etc.), electronic messages (e-mails, tweets, posts, etc.), and other types of written information that are far shorter than an average book, much less formal, and more diverse in terms of expressive elements such as colors, layout, fonts, graphics, and emoticons. Efforts to take such aspects into account at the level of both structure and syntax have been reported. In addition, content-specific and idiosyncratic cues (e.g., topic models and grammar checking tools) were introduced to unveil deliberate stylistic choices.
Standard stylometric features have been employed to categorize the content of instant-messaging chats, or the behavior of the participants, but attempts to identify chat participants are still few and at an early stage. Furthermore, the similarity between spoken conversations and chat interactions has been neglected, even though it is a major difference between chat data and any other type of written information.
Argamon, Shlomo, Kevin Burns, and Shlomo Dubnov, eds. The structure of style: algorithmic approaches to understanding manner and meaning. Springer Science & Business Media, 2010. /wiki/Shlomo_Argamon
Westcott, Richard (15 June 2006). "Making hit music into a science". BBC News. http://news.bbc.co.uk/1/hi/5083986.stm?ls
Sethi, Ricky (2016-06-07). "Using computers to better understand art". The Conversation. Retrieved 2021-12-01. https://theconversation.com/using-computers-to-better-understand-art-56887
McIlroy-Young, Reid; Wang, Yu; Sen, Siddhartha; Kleinberg, Jon; Anderson, Ashton (2021). Detecting Individual Decision-Making Style: Exploring Behavioral Stylometry in Chess. 35th Conference on Neural Information Processing Systems. https://openreview.net/forum?id=9RFFgpQAOzk
Chen, Hsinchun; Yang, Christopher C.; Chau, Michael; Li, Shu-Hsing (2009). Intelligence and Security Informatics: Pacific Asia Workshop, PAISI 2009, Bangkok, Thailand, April 27, 2009. Proceedings. Berlin: Springer Science & Business Media. p. 15. ISBN 9783642013928. 9783642013928
Samuel Schoenbaum, Internal evidence and Elizabethan dramatic authorship; an essay in literary history and method, p. 171. /wiki/Samuel_Schoenbaum
Lutoslawski, W. (1898). "Principes de stylométrie appliqués à la chronologie des œuvres de Platon". Revue des Études Grecques. 11 (41): 61–81. doi:10.3406/reg.1898.5847. ISSN 0035-2039. /wiki/Doi_(identifier)
Samuel Schoenbaum, Internal evidence and Elizabethan dramatic authorship; an essay in literary history and method, p. 196. /wiki/Samuel_Schoenbaum
F. Mosteller & D. Wallace (1964). Inference and Disputed Authorship: The Federalist. Reading, MA: Addison-Wesley. /wiki/Reading,_MA
Chaski, Carole (2012). Solan, Lawrence M; Tiersma, Peter M (eds.). Author Identification in the Forensic Setting. Oxford University Press. doi:10.1093/oxfordhb/9780199572120.001.0001. ISBN 9780199572120. {{cite book}}: |journal= ignored (help) 9780199572120
Chaski, Carole (22 December 2005). Wecht, Cyril H.; Rago, John T. (eds.). Forensic Science and Law: Investigative Applications in Criminal, Civil and Family Justice. CRC Press. ISBN 978-1-4200-5811-6. 978-1-4200-5811-6
Michael MacPherson and Yoav Tirosh (2020). "A Stylometric Analysis of Ljósvetninga saga". Gripla. 31: 7–41. https://www.academia.edu/44830058
Haukur Thorgeirsson (2018). "How similar are Heimskringla and Egils saga? An application of Burrows' delta to Icelandic texts". European Journal of Scandinavian Studies. 48 (1): 1–18. doi:10.1515/ejss-2018-0001. https://www.researchgate.net/publication/323901992
Sigurður Ingibergur Björnsson, Steingrímur Páll Kárason, and Jón Karl Helgason (2021). ""Stylometry and the Faded Fingerprints of Saga Authors"". In Search of the Culprit: Aspects of Medieval Authorship, edited by Lukas Rösli and Stefanie Gropper: 97–122. doi:10.1515/9783110725339-005. ISBN 9783110725339.{{cite journal}}: CS1 maint: multiple names: authors list (link) 9783110725339
Claburn, Thomas (March 16, 2018). "FYI: AI tools can unmask anonymous coders from their binary executables". The Register. Retrieved August 2, 2018. https://www.theregister.co.uk/2018/03/16/identifying_anonymous_programmers/
Bensalem, Imene; Rosso, Paolo; Chikhi, Salim (2019). "On the use of character n-grams as the only intrinsic evidence of plagiarism". Language Resources and Evaluation. 53 (3): 363–396. doi:10.1007/s10579-019-09444-w. hdl:10251/159151. S2CID 86630897. /wiki/Doi_(identifier)
Brizan, David (October 2015). "Utilizing linguistically enhanced keystroke dynamics to predict typist cognition and demographics". International Journal of Human-Computer Studies. 82: 57–68. doi:10.1016/j.ijhcs.2015.04.005. /wiki/Doi_(identifier)
Alican, Necip Fikri (2012). Rethinking Plato: A Cartesian Quest for the Real Plato. Amsterdam: Rodopi. p. 183. ISBN 9789042035379. 9789042035379
Rowe, Christopher (2000). The Cambridge History of Greek and Roman Political Thought. Cambridge, UK: Cambridge University Press. p. 160. ISBN 0521481368. 0521481368
Stamatatos, Efstathios (2009). "A survey of modern authorship attribution methods". JASIST. 60 (3): 538–556. doi:10.1002/asi.21001. S2CID 6231242. /wiki/Doi_(identifier)
Stamatatos, Efstathios (2018). "Masking topic-related information to enhance authorship attribution". JASIS. 69 (3).
Karlgren, Jussi; Esposito, Lewis; Gratton, Chantal; Kanerva, Pentti (2018). "Authorship Profiling Without Using Topical Information". CLEF Working Notes. CEUR-WS. /wiki/Pentti_Kanerva
Corbara, Silvia; Moreo, Alejandro; Sebastiani, Fabrizio (2022). "Syllabic quantity patterns as rhythmic features for Latin authorship attribution". JASIST. 74: 128–141. arXiv:2110.14203. doi:10.1002/asi.24660. S2CID 239998537. https://asistdl.onlinelibrary.wiley.com/doi/full/10.1002/asi.24660
Corbara, Silvia; Chulvi, Berta; Rosso, Paolo; Moreo, Alejandro (2022). "Rhythmic and Psycholinguistic Features for Authorship Tasks in the Spanish Parliament: Evaluation and Analysis". Experimental IR Meets Multilinguality, Multimodality, and Interaction. CLEF. Springer. pp. 79–92. doi:10.1007/978-3-031-13643-6_6. https://doi.org/10.1007/978-3-031-13643-6_6
Karlgren, Jussi; Eriksson, Gunnar (2007). "Authors, Genre, and Linguistic Convention". SIGIR Workshop on Plagiarism Analysis, Authorship Identification, and Near-Duplicate Detection. SIGIR. PAN.
Eriksson, Linda (2014). Sequential Aggregation of Textual Features for Domain Independent Author Identification (MSc). KTH Royal Institute of Technology.
Mendenhall, T C (1887). "The characteristic curves of composition". Science. 9 (214S): 237–246. doi:10.1126/science.ns-9.214S.237. PMID 17736020. https://zenodo.org/record/1448355
Chen, Beichen (2021). Embeddings for Book Similarities (PDF) (MSc). KTH Royal Institute of Technology. https://www.diva-portal.org/smash/get/diva2:1601084/FULLTEXT01.pdf
Stamatatos, Efstathios; Kestemont, Mike; Kredens, Krzysztof; Pezik, Piotr; Heini, Annina (2022). "Overview of the Authorship Verification Task at PAN 2022". In Faggioli; Ferro; Hanbury; Potthast (eds.). CLEF 2022 Labs and Workshops, Notebook Papers. CEUR-WS. Retrieved September 6, 2022. https://pan.webis.de/publications.html#stamatatos_2022
Neal et al. 2018, p. 5. - Neal, Tempestt; Sundararajan, Kalaivani; Fatima, Aneez; Yan, Yiming; Xiang, Yingfei; Woodard, Damon (2018). "Surveying Stylometry Techniques and Applications". ACM Computing Surveys. 50 (6): 1–36. doi:10.1145/3132039. S2CID 21360798. https://doi.org/10.1145%2F3132039
Gröndahl & Asokan 2020a, p. 3. - Gröndahl, Tommi; Asokan, N. (2020a). "Text Analysis in Adversarial Settings: Does Deception Leave a Stylistic Trace?". ACM Computing Surveys. 52 (3): 1–36. arXiv:1902.08939. doi:10.1145/3310331. S2CID 67856540. https://arxiv.org/abs/1902.08939
Kacmarcik & Gamon 2006, p. 444. - Kacmarcik, Gary; Gamon, Michael (17 July 2006). "Obfuscating document stylometry to preserve author anonymity". Proceedings of the COLING/ACL 2006 Main Conference Poster Sessions. pp. 444–451. https://aclanthology.org/P06-2058/
Mahmood et al. 2019, p. 54. - Mahmood, Asad; Ahmad, Faizan; Shafiq, Zubair; Srinivasan, Padmini; Zaffar, Fareed (2019). "A Girl Has No Name: Automated Authorship Obfuscation using Mutant-X". Proceedings on Privacy Enhancing Technologies. 2019 (4): 54–71. doi:10.2478/popets-2019-0058. S2CID 197621394. https://doi.org/10.2478%2Fpopets-2019-0058
Afroz, Brennan & Greenstadt 2012, p. 461. - Afroz, Sadia; Brennan, Michael; Greenstadt, Rachel (2012). "Detecting Hoaxes, Frauds, and Deception in Writing Style Online". 2012 IEEE Symposium on Security and Privacy. pp. 461–475. doi:10.1109/SP.2012.34. ISBN 978-1-4673-1244-8. https://doi.org/10.1109%2FSP.2012.34
Gröndahl & Asokan 2020a, p. 28. - Gröndahl, Tommi; Asokan, N. (2020a). "Text Analysis in Adversarial Settings: Does Deception Leave a Stylistic Trace?". ACM Computing Surveys. 52 (3): 1–36. arXiv:1902.08939. doi:10.1145/3310331. S2CID 67856540. https://arxiv.org/abs/1902.08939
Neal et al. 2018, p. 6. - Neal, Tempestt; Sundararajan, Kalaivani; Fatima, Aneez; Yan, Yiming; Xiang, Yingfei; Woodard, Damon (2018). "Surveying Stylometry Techniques and Applications". ACM Computing Surveys. 50 (6): 1–36. doi:10.1145/3132039. S2CID 21360798. https://doi.org/10.1145%2F3132039
Potthast, Hagen & Stein 2016, p. 10. - Potthast, Martin; Hagen, Matthias; Stein, Benno (2016). Author Obfuscation: Attacking the State of the Art in Authorship Verification (PDF). Conference and Labs of the Evaluation Forum. https://ceur-ws.org/Vol-1609/16090716.pdf
Saedi & Dras 2020, p. 181. - Saedi, Chakaveh; Dras, Mark (December 2020). "Large Scale Author Obfuscation Using Siamese Variational Auto-Encoder: The SiamAO System". Proceedings of the Ninth Joint Conference on Lexical and Computational Semantics. pp. 179–189. https://aclanthology.org/2020.starsem-1.19
Neal et al. 2018, p. 6. - Neal, Tempestt; Sundararajan, Kalaivani; Fatima, Aneez; Yan, Yiming; Xiang, Yingfei; Woodard, Damon (2018). "Surveying Stylometry Techniques and Applications". ACM Computing Surveys. 50 (6): 1–36. doi:10.1145/3132039. S2CID 21360798. https://doi.org/10.1145%2F3132039
Gröndahl & Asokan 2020a, p. 21-22. - Gröndahl, Tommi; Asokan, N. (2020a). "Text Analysis in Adversarial Settings: Does Deception Leave a Stylistic Trace?". ACM Computing Surveys. 52 (3): 1–36. arXiv:1902.08939. doi:10.1145/3310331. S2CID 67856540. https://arxiv.org/abs/1902.08939
Wang, Juola & Riddell 2022, p. 2. - Wang, Haining; Juola, Patrick; Riddell, Allen (2022). "Reproduction and Replication of an Adversarial Stylometry Experiment". arXiv:2208.07395. https://arxiv.org/abs/2208.07395
Gröndahl & Asokan 2020a, p. 21-22. - Gröndahl, Tommi; Asokan, N. (2020a). "Text Analysis in Adversarial Settings: Does Deception Leave a Stylistic Trace?". ACM Computing Surveys. 52 (3): 1–36. arXiv:1902.08939. doi:10.1145/3310331. S2CID 67856540. https://arxiv.org/abs/1902.08939
Neal et al. 2018, p. 27. - Neal, Tempestt; Sundararajan, Kalaivani; Fatima, Aneez; Yan, Yiming; Xiang, Yingfei; Woodard, Damon (2018). "Surveying Stylometry Techniques and Applications". ACM Computing Surveys. 50 (6): 1–36. doi:10.1145/3132039. S2CID 21360798. https://doi.org/10.1145%2F3132039
Gröndahl & Asokan 2020a, p. 28. - Gröndahl, Tommi; Asokan, N. (2020a). "Text Analysis in Adversarial Settings: Does Deception Leave a Stylistic Trace?". ACM Computing Surveys. 52 (3): 1–36. arXiv:1902.08939. doi:10.1145/3310331. S2CID 67856540. https://arxiv.org/abs/1902.08939
Brennan, Afroz & Greenstadt 2012, p. 2. - Brennan, Michael; Afroz, Sadia; Greenstadt, Rachel (2012). "Adversarial stylometry: Circumventing Authorship Recognition to Preserve Privacy and Anonymity" (PDF). ACM Transactions on Information and System Security. 15 (3): 1–22. doi:10.1145/2382448.2382450. S2CID 16176436. https://www1.icsi.berkeley.edu/~sadia/papers/adversarial_stylometry.pdf
Zhai et al. 2022, p. 7373. - Zhai, Wanyue; Rusert, Jonathan; Shafiq, Zubair; Srinivasan, Padmini (2022). "A Girl Has A Name, And It's ... Adversarial Authorship Attribution for Deobfuscation". Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). pp. 7372–7384. arXiv:2203.11849. doi:10.18653/v1/2022.acl-long.509. S2CID 248780012. https://arxiv.org/abs/2203.11849
Emmery, Kádár & Chrupała 2021, p. 2388-2389. - Emmery, Chris; Kádár, Ákos; Chrupała, Grzegorz (2021). "Adversarial Stylometry in the Wild: Transferable Lexical Substitution Attacks on Author Profiling". Proceedings of the 16th Conference of the European Chapter of the Association for Computational Linguistics: Main Volume. pp. 2388–2402. arXiv:2101.11310. doi:10.18653/v1/2021.eacl-main.203. S2CID 231719026. https://arxiv.org/abs/2101.11310
Gröndahl & Asokan 2020a, p. 28. - Gröndahl, Tommi; Asokan, N. (2020a). "Text Analysis in Adversarial Settings: Does Deception Leave a Stylistic Trace?". ACM Computing Surveys. 52 (3): 1–36. arXiv:1902.08939. doi:10.1145/3310331. S2CID 67856540. https://arxiv.org/abs/1902.08939
Gröndahl & Asokan 2020a, p. 28. - Gröndahl, Tommi; Asokan, N. (2020a). "Text Analysis in Adversarial Settings: Does Deception Leave a Stylistic Trace?". ACM Computing Surveys. 52 (3): 1–36. arXiv:1902.08939. doi:10.1145/3310331. S2CID 67856540. https://arxiv.org/abs/1902.08939
Argamon, Shlomo, Jussi Karlgren, and James G. Shanahan. Stylistic analysis of text for information access. Papers from the workshop held in conjunction with the
28th Annual International ACM Conference on Research and
Development in Information Retrieval, August 13–19, 2005,
Salvador, Bahia, Brazil. Swedish institute of computer science, 2005. /wiki/Shlomo_Argamon
"The Signature Stylometric System". PhiloComp. Retrieved 2014-01-03. http://www.philocomp.net/texts/signature.htm
"JGAAP". JGAAP. 2012-09-04. Retrieved 2012-10-15. http://www.jgaap.com
"The stylo for R package". Computational Stylistics Group. 2014-10-24. Archived from the original on 2014-12-21. Retrieved 2014-10-24. https://web.archive.org/web/20141221100741/https://sites.google.com/site/computationalstylistics/stylo
Eder, Maciej; Rybicki, Jan; Kestemont, Mike (2016). "Stylometry with R: a package for computational text analysis" (PDF). R Journal. 8 (1): 107–121. doi:10.32614/RJ-2016-007. https://journal.r-project.org/archive/2016-1/eder-rybicki-kestemont.pdf
Daelemans, Walter & Hoste, Véronique (2013). STYLENE: an Environment for Stylometry and Readability Research for Dutch (Technical report). CLiPS Technical Report Series. ISSN 2033-3544. http://stylene.be
Argamon, Shlomo, Kevin Burns, and Shlomo Dubnov, eds. The structure of style: algorithmic approaches to understanding manner and meaning. Springer Science & Business Media, 2010. /wiki/Shlomo_Argamon
Argamon, Shlomo, Jussi Karlgren, and James G. Shanahan. Stylistic analysis of text for information access. Papers from the workshop held in conjunction with the
28th Annual International ACM Conference on Research and
Development in Information Retrieval, August 13–19, 2005,
Salvador, Bahia, Brazil. Swedish institute of computer science, 2005. /wiki/Shlomo_Argamon
Yan Qu, James G. Shanahan, and Janyce Wiebe. "Exploring attitude and affect in text: Theories and applications." AAAI Spring Symposium Technical report SS-04-07. AAAI Press, Menlo Park, CA. 2004. /w/index.php?title=Yan_Qu&action=edit&redlink=1
Jussi Karlgren, Björn Gambäck, and Pentti Kanerva. "Acquiring (and Using) Linguistic (and World) Knowledge for Information Access." (2002). AAAI Spring Symposium. Technical report SS-02-09. AAAI Press, Menlo Park, CA. 2002. /wiki/Jussi_Karlgren
Shlomo Argamon, Shlomo Dubnov, and Julie Jupp. "Style and Meaning in Language, Art, Music, and Design" (2004). AAAI Fall Symposium. Technical report FS-04-07. /wiki/Shlomo_Argamon
Potthast, Martin, Benno Stein, Alberto Barrón-Cedeño, and Paolo Rosso. "An evaluation framework for plagiarism detection." In Proceedings of the 23rd international conference on computational linguistics: Posters, pp. 997–1005. Association for Computational Linguistics, 2010.
Stamatatos, Efstathios, Walter Daelemans, Ben Verhoeven, Patrick Juola, Aurelio López-López, Martin Potthast, and Benno Stein. "Overview of the Author Identification Task at PAN 2014." In CLEF (Working Notes), pp. 877–897. 2014.
Rangel, Francisco, Paolo Rosso, Martin Potthast, and Benno Stein. "Overview of the 5th author profiling task at pan 2017: Gender and language variety identification in twitter." Working Notes Papers of the CLEF (2017).
Rangel Pardo, Francisco Manuel, Fabio Celli, Paolo Rosso, Martin Potthast, Benno Stein, and Walter Daelemans. "Overview of the 3rd Author Profiling Task at PAN 2015." In CLEF 2015 Evaluation Labs and Workshop Working Notes Papers, pp. 1–8. 2015.
Potthast, Martin, Benno Stein, and Teresa Holfeld. "Overview of the 1st International Competition on Wikipedia Vandalism Detection." In CLEF (Notebook Papers/LABs/Workshops). 2010.
Text processing text analysis and generation – text typology and attribution. Proceedings of Nobel symposium 51. Edited by Sture Allén. Stockholm: Almqvist & Wiksell international 1982. Data linguistica, 16. Nobel symposium, 51. ISBN 91-22-00594-3 /wiki/Sture_All%C3%A9n
Karlgren, Jussi (2003). "Helander: An Authorship Attribution Case". Retrieved 4 October 2017. /wiki/Jussi_Karlgren
Airoldi, Edoardo M.; Fienberg, Stephen E.; Skinner, Kiron K. (July 2007). "Whose Ideas? Whose Words? Authorship of Ronald Reagan's Radio Addresses" (PDF). PS: Political Science & Politics. 40 (3): 501–506. CiteSeerX 10.1.1.190.5798. doi:10.1017/S1049096507070874. S2CID 18730541. /wiki/Edoardo_Airoldi
Author Unknown by Gavin McNett Salon November 2, 2000 http://www.salon.com/2000/11/02/foster_5/
Belluck, Pam (April 10, 1996). "In Unabom Case, Pain for Suspect's Family". The New York Times. Archived from the original on August 10, 2017. Retrieved July 5, 2008. https://www.nytimes.com/1996/04/10/us/in-unabom-case-pain-for-suspect-s-family.html
"Study finds a disputed Shakespeare play bears the master's mark". Los Angeles Times. 2015-04-10. Retrieved 2015-04-13. https://www.latimes.com/science/sciencenow/la-sci-sn-shakespeare-play-linguistic-analysis-20150410-story.html
Boyd, Ryan L.; Pennebaker, James W. (2015). "Did Shakespeare Write Double Falsehood? Identifying Individuals by Creating Psychological Signatures With Text Analysis". Psychological Science. 26 (5): 570–582. doi:10.1177/0956797614566658. PMID 25854277. S2CID 13022405. https://journals.sagepub.com/doi/full/10.1177/0956797614566658
Jackson, MacDonald P (April 27, 2016). Who Wrote "The Night Before Christmas"? Analyzing the Clement Clarke Moore Vs. Henry Livingston Question. McFarland & Co. ISBN 978-1476664439. 978-1476664439
Fuller, Simon; O'Sullivan, James (2017). "Structure over Style: Collaborative Authorship and the Revival of Literary Capitalism". Digital Humanities Quarterly. 11 (1). Retrieved April 20, 2017. http://www.digitalhumanities.org/dhq/vol/11/1/000286/000286.html
Lane, Anthony (June 18, 2018). "Bill Clinton and James Patterson's Concussive Collaboration". The New Yorker. Retrieved 2018-06-07. https://www.newyorker.com/magazine/2018/06/18/bill-clinton-and-james-pattersons-concussive-collaboration
"Why you don't need to write much to be the world's bestselling author". The Conversation. April 3, 2017. Retrieved April 20, 2017. https://theconversation.com/why-you-dont-need-to-write-much-to-be-the-worlds-bestselling-author-75261
O'Sullivan, James (2018-06-07). "Bill Clinton and James Patterson are co-authors – but who did the writing?". The Guardian. Retrieved 2018-06-07. https://www.theguardian.com/books/booksblog/2018/jun/07/bill-clinton-james-patterson-the-president-is-missing-co-authors
"The stylo for R package". Computational Stylistics Group. 2014-10-24. Archived from the original on 2014-12-21. Retrieved 2014-10-24. https://web.archive.org/web/20141221100741/https://sites.google.com/site/computationalstylistics/stylo
Savoy, Jacques (2018). "Is Starnone really the author behind Ferrante?". Digital Scholarship in the Humanities. 33 (4): 902–918. doi:10.1093/llc/fqy016. https://academic.oup.com/dsh/article/33/4/902/5001585
Reuell, Peter: "You say John, I say Paul. But what does stylometry say?"
https://news.harvard.edu/gazette/story/2018/09/harvard-statistician-examines-beatles-mystery/
Glickman, Mark; Brown, Jason; Song, Ryan (2019). "(A) Data in the Life: Authorship Attribution in Lennon-McCartney Songs". Harvard Data Science Review. 1 (1). arXiv:1906.05427. doi:10.1162/99608f92.130f856e. S2CID 189762434. https://doi.org/10.1162%2F99608f92.130f856e
The ETSO project. http://etso.es/
"Un monstruo de la naturaleza llamado Lope" [A monster of nature called Lope]. abc (in Spanish). 2018-11-28. Retrieved 2019-08-11. https://www.abc.es/cultura/abci-monstruo-naturaleza-llamado-lope-201811280249_noticia.html
"Rastreadores digitales en el Siglo de Oro" [Digital trackers in the Golden Age]. El Norte de Castilla (in Spanish). 2018-12-23. Retrieved 2019-08-11. https://www.elnortedecastilla.es/culturas/teatro/rastreadores-digitales-siglo-20181223133815-nt.html
Real, La Tribuna de Ciudad (2019-07-09). "Juan Ruiz de Alarcón aumenta su obra cinco siglos después" [Juan Ruiz de Alarcón increases his work five centuries after]. La Tribuna de Ciudad Real (in Spanish). Retrieved 2019-08-11. https://www.latribunadeciudadreal.es/noticia/Z6F846F17-0F70-FEF3-449E11A2A1C6BD36/201907/Juan-Ruiz-de-Alarcon-aumenta-su-obra-cinco-siglos-despues
Migueláñez, Daniel (28 July 2019). "El Holmes de la filología". PSOE Chamberí. No. 6. p. 8. Archived from the original on 2020-07-18. Retrieved 2019-08-11. https://web.archive.org/web/20200718220716/http://www.psoechamberi.com/esp/tags/suplemento/2019/n6_julio28/08_right_daniel_Miguela%C3%B1ez_01.html
"Sor Juana Inés centró las 42 Jornadas de Teatro Clásico". Lanza Digital (in European Spanish). 2019-07-14. Retrieved 2019-08-11. https://www.lanzadigital.com/cultura/sor-juana-ines-centro-las-42-jornadas-de-teatro-clasico/
"'La monja alférez' ya no es de Pérez de Montalbán, sino de Ruiz de Alarcón" ['La monja alférez' is no longer by Pérez de Montalbán, but by Ruiz de Alarcón]. El Norte de Castilla (in Spanish). 2019-07-10. Retrieved 2019-08-11. https://www.elnortedecastilla.es/culturas/teatro/monja-alferez-perez-20190710071933-nt.html
"Artificial intelligence helps find prominent Spanish playwright Lope de Vega as the author of a play from a manuscript written years after his death". newsendip.com. 31 January 2023. Retrieved 8 February 2023. https://www.newsendip.com/artificial-intelligence-helps-find-prominent-spanish-playwright-lope-de-vega-as-the-author-of-a-play-from-a-manuscript-written-years-after-his-death/
Jones, Sam (5 February 2023). "Artificial intelligence uncovers lost work by titan of Spain's 'Golden Age'". The Guardian. Retrieved 8 February 2023. https://www.theguardian.com/world/2023/feb/05/artificial-intelligence-uncovers-lost-work-by-titan-of-spains-golden-age
Morales, Manuel (2023-01-31). "La inteligencia artificial atribuye a Lope de Vega una obra anónima del fondo de manuscritos de la Biblioteca Nacional" [Artificial intelligence attributes an anonymous work from the National Library's manuscript collection to Lope de Vega]. El País (in Spanish). Retrieved 2023-02-08. https://elpais.com/cultura/2023-01-31/la-inteligencia-artificial-atribuye-a-lope-de-vega-una-obra-anonima-del-fondo-de-manuscritos-de-la-biblioteca-nacional.html
McCarthy, Rachel; O'Sullivan, James (2020). "Who wrote Wuthering Heights?". Digital Scholarship in the Humanities. 36 (2): 383–391. doi:10.1093/llc/fqaa031. hdl:10468/10194. https://academic.oup.com/dsh/article/doi/10.1093/llc/fqaa031/5862913
Ilsemann, Hartmut (2020). "Phantom Marlowe: Paradigmenwechsel in Autorschaftsbestimmungen des englischen Renaissancedramas" [Phantom Marlowe: Paradigm shift in authorship determinations of English Renaissance drama] (in German). Düren: Shaker. ISBN 978-3-8440-7412-3.
Ilsemann, Hartmut (2020). "The Marlowe corpus revisited". Digital Scholarship in the Humanities. 36 (2): 333–360. doi:10.1093/llc/fqaa010. https://academic.oup.com/dsh/article-abstract/36/2/333/5825419
Ilsemann, Hartmut (2021). "A brief supplement to 'The Marlowe Corpus Revisited' and Phantom Marlowe". Digital Scholarship in the Humanities. 37 (2): 462–468. doi:10.1093/llc/fqab078. https://academic.oup.com/dsh/advance-article-abstract/doi/10.1093/llc/fqab078/6397020
Rebora, Simone; Salgaro, Massimo (2022). "Is Felix Salten the Author of the Mutzenbacher Novel (1906)? Yes and no". Language and Literature: International Journal of Stylistics. 31 (2): 243–264. doi:10.1177/09639470221090384. S2CID 248135373. https://journals.sagepub.com/doi/abs/10.1177/09639470221090384
"AI avslöjar: Läckberg har antagligen spökskrivare – skjuter ned anklagelserna" [AI reveals: Läckberg probably has a ghostwriter; she dismisses the accusations]. Hufvudstadsbladet (in Swedish). 27 September 2023. https://www.hbl.fi/artikel/fa306045-6387-5b08-8ef8-10c7c923acb0
"Läckberg om rykterna: 'Han petade i meningarna'" [Läckberg on the rumours: 'He tinkered with the sentences']. Hufvudstadsbladet (in Swedish). Helsingfors. 21 December 2023. p. 23. http://www.hbl.fi/artikel/84301c66-05a2-51ba-8edd-1078456be4be
Biber, Douglas (1991). Variation across speech and writing. Cambridge University Press.
Karlgren, Jussi; Cutting, Douglass (1994). "Recognizing text genres with simple metrics using discriminant analysis". Proceedings of the 15th conference on Computational linguistics. Vol. 2. p. 1071. arXiv:cmp-lg/9410008. Bibcode:1994cmp.lg...10008K. doi:10.3115/991250.991324. S2CID 1297432.
Van Droogenbroeck F. J., "An essential rephrasing of the Zipf-Mandelbrot law to solve authorship attribution applications by Gaussian statistics" (2019). https://www.academia.edu/40029629
Matthews, Robert A. J.; Merriam, Thomas V. N. (1993). "Neural Computation in Stylometry I: An Application to the Works of Shakespeare and Fletcher". Literary and Linguistic Computing. 8 (4): 203–209. doi:10.1093/llc/8.4.203.
Merriam, Thomas V. N.; Matthews, Robert A. J. (1994). "Neural Computation in Stylometry II: An Application to the Works of Shakespeare and Marlowe". Literary and Linguistic Computing. 9 (1): 1–6. doi:10.1093/llc/9.1.1.
Hoorn, J. F.; Frank, S. L.; Kowalczyk, W.; van der Ham, F. (2012-09-03). "Neural network identification of poets using letter sequences". Literary and Linguistic Computing. 14 (3): 311–338. doi:10.1093/llc/14.3.311.
Brocardo, M. L.; Traore, I.; Woungang, I.; Obaidat, M. S. (2017). "Authorship verification using deep belief network systems". Int J Commun Syst. 30 (12): e3259. doi:10.1002/dac.3259. S2CID 40745740.
de Vel, O.; Anderson, A.; Corney, M.; Mohay, G. (2001-12-01). "Mining e-Mail Content for Author Identification Forensics". SIGMOD Rec. 30 (4): 55–64. CiteSeerX 10.1.1.408.4231. doi:10.1145/604264.604272. ISSN 0163-5808. S2CID 1623521.
Argamon, Shlomo; Koppel, Moshe; Pennebaker, James W.; Schler, Jonathan (2009-02-01). "Automatically Profiling the Author of an Anonymous Text". Commun. ACM. 52 (2): 119–123. CiteSeerX 10.1.1.136.9952. doi:10.1145/1461928.1461959. ISSN 0001-0782. S2CID 5413411.
"Classification of Instant Messaging Communications for Forensics Analysis – TechRepublic". TechRepublic. Retrieved 2016-01-26. https://www.techrepublic.com/resource-library/whitepapers/classification-of-instant-messaging-communications-for-forensics-analysis/
Zhou, L.; Zhang, Dongsong (2004-01-01). "Can online behavior unveil deceivers? - an exploratory investigation of deception in instant messaging". Proceedings of the 37th Annual Hawaii International Conference on System Sciences, 2004. p. 9 pp. doi:10.1109/HICSS.2004.1265079. ISBN 978-0-7695-2056-8. S2CID 7154702.