
Linguistic Signatures and their Applications
03/29/2010 10:58 pm By Teodor Filimon
Although the subject has been touched on in the scientific literature, if only in passing or for specific applications, there is not yet a detailed study of linguistic signatures or a holistic view of their practicality. In short, certain characteristics can be extracted from a given text; together they form a 'signature'. These characteristics can be computed from the frequency of all or selected words, their position in the sentence or phrase, their position relative to predefined or automatically detected keywords, and other methods. Visualization techniques also pose an interesting research challenge: for example, differentiating the representation by language or by its purpose.
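As a rough illustration of the characteristics listed above, here is a minimal Python sketch that computes, for each word, its relative frequency, its average position in the sentence, and its average distance to the nearest keyword. The function names, the naive sentence splitting, and the shape of the returned dictionary are assumptions made for this example, not an established method.

```python
import re
from collections import Counter, defaultdict


def split_sentences(text):
    # Naive split on '.', '!' and '?'; a real system would use a proper
    # sentence tokenizer.
    return [s for s in re.split(r"[.!?]+", text) if s.strip()]


def tokenize(sentence):
    # Lowercase and keep only runs of letters and apostrophes.
    return re.findall(r"[a-z']+", sentence.lower())


def extract_signature(text, keywords=()):
    """Hypothetical signature: per-word frequency and position statistics."""
    keywords = {k.lower() for k in keywords}
    counts = Counter()
    position_sums = defaultdict(float)
    distance_sums = defaultdict(float)
    distance_counts = Counter()

    for sentence in split_sentences(text):
        words = tokenize(sentence)
        keyword_slots = [i for i, w in enumerate(words) if w in keywords]
        for i, word in enumerate(words):
            counts[word] += 1
            # 0.0 = first word of the sentence, 1.0 = last word.
            position_sums[word] += i / (len(words) - 1) if len(words) > 1 else 0.0
            # Distance (in words) to the nearest keyword in the same sentence.
            if keyword_slots and word not in keywords:
                distance_sums[word] += min(abs(i - k) for k in keyword_slots)
                distance_counts[word] += 1

    total = sum(counts.values())
    return {
        word: {
            "frequency": counts[word] / total,
            "mean_position": position_sums[word] / counts[word],
            "mean_keyword_distance": (
                distance_sums[word] / distance_counts[word]
                if distance_counts[word]
                else None
            ),
        }
        for word in counts
    }


signature = extract_signature("I never lie. I never cheat.", keywords=["never"])
print(signature["i"])
```

Which words and keywords actually carry useful information is exactly the kind of open question the article raises; the sketch only shows that the raw measurements are cheap to compute.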
Such signatures could have many uses, commercial or otherwise. Here are a few:
- profiling potential or current employees - e.g. analyzing a cover letter during hiring (in this case the employer might be interested in the presence or absence of deception, or in certain personality traits)
- forensics or other types of investigations [1] - e.g. comparing multiple signatures to see whether the texts they originated from have the same author, detecting hidden intent in messages, etc.
- automatic classification of articles / documentation based on the signatures of technical or scientific domains
- spam filters - by analyzing the perceived relevance of the email content
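To make the comparison of signatures more concrete, here is a minimal Python sketch that reduces each text to a vector of function-word frequencies and compares two such vectors with cosine similarity. The word list, the function names, and the choice of cosine similarity are assumptions for illustration; actual authorship work relies on much richer features and careful validation.

```python
import math
import re
from collections import Counter

# Hypothetical, deliberately tiny vocabulary of function words.
FUNCTION_WORDS = ("the", "a", "of", "and", "to", "in", "i", "that", "it", "is")


def frequency_signature(text, vocabulary=FUNCTION_WORDS):
    # Relative frequency of each vocabulary word in the text.
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid division by zero on empty text
    return [counts[w] / total for w in vocabulary]


def cosine_similarity(a, b):
    # 1.0 means the same direction, 0.0 means nothing in common.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


known = frequency_signature("It is the view of the author that the test is fair.")
disputed = frequency_signature("I think that it is a fair test and I like it.")
print(round(cosine_similarity(known, disputed), 3))
```

A high score only says that two texts use these particular words in similar proportions; on its own it is not evidence of common authorship.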
Future research in this area looks promising. For example, it appears that humans cannot normally detect deception at a rate higher than approximately 50% (in the case of trained persons) [2], which is essentially chance level. Many methods with a strong psychological basis have surfaced lately, but the effectiveness of most is still being assessed in current studies. One of the many challenges is establishing how to evaluate the influence of factors such as the culture or social category of authors, or even which of these intrinsic or extrinsic traits are relevant in a given context.
Visual representations of such signatures, which I mentioned earlier, can be very intuitive. Here is an example:

Fig. 1 - Data visualization generated by the Textour app [3]
Apps designed for text analysis could also gain data mining and neural network capabilities in the future: once they have a training set of extracted signatures, they could be used to draw inferences about the nature of the author or the message. Current applications built for advanced author profiling rely, to a greater or lesser extent, on additional manual preprocessing of the input text (e.g. syntactic or morphological analysis) and on dictionaries that store preset meanings and training results. Fuzzy logic is another instrument that is beginning to take its rightful place in this area.
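The idea of learning from a training set of signatures can be sketched in a few lines of Python. The example below uses a nearest-centroid classifier, chosen here only because it is short; it is a stand-in for the data mining or neural approaches mentioned above, not one of them. The labels, the numbers, and the function names are made up for illustration.

```python
import math


def centroid(vectors):
    # Component-wise mean of equal-length numeric vectors.
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def train(labelled_signatures):
    # labelled_signatures maps a label to a list of signature vectors.
    return {label: centroid(vecs) for label, vecs in labelled_signatures.items()}


def classify(model, signature):
    # Pick the label whose centroid is closest in Euclidean distance.
    return min(model, key=lambda label: math.dist(model[label], signature))


# Made-up two-feature signatures for two hypothetical classes of text.
training_set = {
    "technical": [[0.0, 0.0], [0.2, 0.0]],
    "informal": [[1.0, 1.0], [0.8, 1.0]],
}
model = train(training_set)
print(classify(model, [0.1, 0.1]))
```

In practice the signature vectors would come from an extraction step, and the quality of the result would depend far more on the features and the training data than on the classifier.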
In a way, since texts are now more often typed on a keyboard than handwritten, expert analysis such as that described in this article may play a very important role. Similar analysis can also be used for therapeutic purposes where communication disabilities are concerned. Even if we all use the same grammar and our native language seems to be the same for everyone, the way we communicate remains particular to each individual. Carole Chaski [1] compared this to DNA, 98% of which is shared among all of us; the remaining 2% is still enough to provide the diversity we see.
Resources
[2] Modelling Deception Detection in Text, Gupta Smita
[3] Textour - an app which performs text analysis


