As the name suggests, Natural Language Processing (NLP) is the task of analysing or processing human languages by a machine (a computer). Although a computer can do this far faster than a person, it still cannot achieve the level of efficiency required to replace a human, and that is essentially what separates humans from machines: natural understanding.
Natural understanding is easy to relate to from our own childhood. As children, we did not know the grammar or semantics of the everyday language our elders used to talk to us. We learned from whatever we heard and tried to apply it. The same holds for machines, except that they learn through many complex, modified rules.
In our day-to-day life we use many sentences that are not perfect in terms of grammar and vocabulary, yet we still interpret them correctly. This is because our understanding has developed to the point where we can interpret (decode) conversations based on our past experience. It is not that easy for a machine, however, to understand the various natural forms of expression in a language, such as sarcasm, slang, emotions, puns, anger, politeness, love and care, unless they are spelled out explicitly (in terms of rules).
The gap between humans and machines in processing natural language is therefore not just about speed but mostly about efficiency. One of the major causes of this gap is the ambiguity inherent in natural languages.
Ambiguity is an intrinsic characteristic of human conversation, and one that is particularly challenging in Natural Language Understanding (NLU) scenarios. It is one of those areas of cognitive science that do not have a well-defined solution.
Stages of NLP and associated ambiguities:
Below we discuss the various stages of NLP and the ambiguities associated with each of them.
Traditionally, NLP for both spoken and written language has been regarded as consisting of the following stages:
- Phonology and Phonetics: (Processing of sound)
- Morphology: (Processing of word forms)
- Lexicon: (Storage of words and associated knowledge)
- Parsing: (Processing of structure)
- Semantics: (Processing of meaning)
- Pragmatics: (Processing of user intention, modeling, etc.)
- Discourse: (Processing of connected text)
Let's go through each stage one by one, along with its associated ambiguity, with the help of examples. (We have tried to come up with the simplest possible examples to help you understand the concepts.)
1. Phonology and Phonetics: (Processing of sound)
Two words are homophones of each other when they sound the same but have different meanings; their spellings may or may not be the same. Such words can only be interpreted from the context in which they are used in a sentence.
For example:
- Homophones with the same spelling:
  - Mean: (mathematical average) and Mean: (not nice)
  - Book: (something to read from) and Book: (to make a reservation)
- Homophones with different spellings:
  - Week: (a period of seven days) and Weak: (having little physical strength or energy)
  - Cell: (a prison cell) and Sell: (to give or hand over something in exchange for money)
- Word boundary detection is a challenge in the case of rapid speech:
  - Please get me a part from it. In rapid speech, "a part" sounds exactly like "apart", so the phrase is ambiguous: it can mean a piece of something, or apart (separated) from it.
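Word boundary ambiguity can be made concrete with a small segmenter. The sketch below assumes the speech has already been transcribed into a letter string with the spaces removed (a stand-in for a real phoneme stream) and uses a hypothetical eight-word lexicon; it enumerates every way to split the stream into lexicon words using memoised recursion.

```python
from functools import lru_cache

# Hypothetical toy lexicon; a real recogniser would use a pronunciation dictionary.
LEXICON = {"please", "get", "me", "a", "part", "apart", "from", "it"}


def segmentations(stream):
    """Return every way to split an unsegmented string into lexicon words."""

    @lru_cache(maxsize=None)
    def split_from(start):
        # Each result is a tuple of words covering stream[start:].
        if start == len(stream):
            return ((),)  # exactly one way to segment the empty remainder
        results = []
        for end in range(start + 1, len(stream) + 1):
            word = stream[start:end]
            if word in LEXICON:
                for rest in split_from(end):
                    results.append((word,) + rest)
        return tuple(results)

    return [list(words) for words in split_from(0)]


for words in segmentations("pleasegetmeapartfromit"):
    print(" ".join(words))
```

This prints two lines, "please get me a part from it" and "please get me apart from it". Both readings survive, and the segmenter has no way to choose between them without further context.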
2. Morphology: (Processing of word forms)
This form of ambiguity arises from the further processing applied to lexemes (root words) to fit them into a particular sentence.
For example:
I like people who make me nuts.
- Here, the phrase "make me nuts" is ambiguous because it can describe two different situations: 1) people who make nuts for me (some dish made from peanuts that I like to eat), or 2) people who drive me nuts (make me very annoyed).
- This ambiguity arises from the word nuts, which is the plural form of the noun nut (peanut) but also has a standalone meaning of its own (as in drive someone nuts). The morphological process that forms nuts from nut is called inflection.
- Inflection is the process of changing the form of a word so that it expresses information such as number, person, case, gender, tense, mood and aspect, while the syntactic category of the word remains unchanged.
- Various other morphological processes are:
  - Derivation: adding a derivational morpheme creates a new word and often changes the grammatical category (part of speech) of the root to which it is added. For example, adding -ful to the noun meaning turns it into the adjective meaningful.
  - Cliticization: a clitic is a word or part of a word that cannot exist on its own and needs a neighbouring word to attach to. For example, the 'm in I'm or the 'll in we'll.
  - Semi-affixes: semi-affixes are morphemes that are bound but retain a word-like quality. For example: Indo-Pak, Anti-national, etc.
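A toy morphological analyser shows how one surface form can receive several analyses. The sketch below is a minimal illustration that assumes a hypothetical three-entry lexicon and two hypothetical suffix rules (plural -s as inflection, -ful as derivation); real analysers typically use finite-state methods with far larger rule sets.

```python
# Hypothetical toy lexicon: word -> part of speech.
LEXICON = {
    "nut": "N",
    "meaning": "N",
    "nuts": "ADJ",  # standalone informal sense: "crazy", as in "drive someone nuts"
}

# Hypothetical suffix rules: (suffix, required stem category, result category, process).
SUFFIX_RULES = [
    ("s", "N", "N", "inflection (plural)"),  # category stays the same
    ("ful", "N", "ADJ", "derivation"),       # category changes from N to ADJ
]


def analyses(word):
    """Return every (segmentation, category, process) reading of a word."""
    readings = []
    if word in LEXICON:
        readings.append((word, LEXICON[word], "lexical entry"))
    for suffix, stem_cat, result_cat, process in SUFFIX_RULES:
        if word.endswith(suffix):
            stem = word[: -len(suffix)]
            if LEXICON.get(stem) == stem_cat:
                readings.append((stem + "+" + suffix, result_cat, process))
    return readings


for w in ("nuts", "meaningful"):
    print(w, analyses(w))
```

For nuts the analyser returns two readings (the whole-word adjective and nut+s as a plural noun), which is exactly the morphological ambiguity in the example; meaningful gets a single derivational reading whose category has changed from noun to adjective.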
3. Lexicon: (Storage of words and associated knowledge)
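A lexicon is a collection of information about the words of a language, including the lexical categories to which they belong. Lexical ambiguity arises when a word has more than one sense stored in the lexicon.
For example:
- I have a burning desire within me.
- Sachin Tendulkar is on a break after going through such a hectic schedule.
Here, burning can mean literally on fire or figuratively intense, and break can mean a pause from work or a fracture; the right sense has to be chosen from context.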
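One classic way to choose among the senses a lexicon stores is the simplified Lesk algorithm: pick the sense whose dictionary gloss shares the most words with the sentence. The sketch below assumes hand-written, hypothetical glosses and a small illustrative stopword list; it is a teaching example, not a production word sense disambiguation system.

```python
# Hypothetical mini-lexicon: word -> list of (part of speech, sense gloss).
LEXICON = {
    "burning": [
        ("ADJ", "on fire and giving off flames and heat"),
        ("ADJ", "very intense or strong, like a strong desire or feeling"),
    ],
    "break": [
        ("NOUN", "a pause or rest from work or a busy schedule"),
        ("NOUN", "a fracture, the act of breaking something"),
    ],
}

# Small illustrative stopword list so that function words do not count as overlap.
STOPWORDS = {"a", "an", "the", "of", "or", "and", "on", "is", "to", "i", "me", "have", "like", "very"}


def content_words(text):
    """Lower-case, strip simple punctuation and drop stopwords."""
    return {w.strip(".,").lower() for w in text.split()} - STOPWORDS


def simplified_lesk(word, sentence):
    """Pick the sense whose gloss shares the most content words with the sentence."""
    context = content_words(sentence)
    return max(LEXICON[word], key=lambda sense: len(content_words(sense[1]) & context))


print(simplified_lesk("burning", "I have a burning desire within me."))
print(simplified_lesk("break", "Sachin Tendulkar is on a break after going through such a hectic schedule."))
```

In the first sentence the word desire tips the choice towards the "intense" sense; in the second, schedule selects the "pause from work" sense. With realistic glosses the overlaps are often zero or tied, which is why practical systems add larger context windows or statistical models.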
4. Parsing: (Processing of structure)
Parsing is the process of determining the hierarchical structure behind a linear sequence of words.
The type of ambiguity involved in parsing is called structural ambiguity. Structural ambiguity is of two types:
- Scope ambiguity: ambiguity about how far the influence (scope) of a word extends in a sentence. For example:
  - Don't forget to bring green candles and balloons for Aditya's birthday party. The scope of the colour "green" is ambiguous in this sentence. The hierarchical structure behind it can be either: Don't forget to bring green (candles and balloons) for Aditya's birthday party, or: Don't forget to bring (green candles) and balloons for Aditya's birthday party.
- Attachment ambiguity: it is unclear which part of the sentence a word or phrase attaches to. Let's take a few examples:
  - Aditya was travelling with Abhishek in his car. It is not clear whose car it is, Aditya's or Abhishek's, because his can attach to either of them.
  - I saw Abhishek riding on a bike. Who was riding the bike: me or Abhishek? The attachment of the phrase riding on a bike is ambiguous.
  - Abhishek commented on a YouTube video that₁ he enjoyed that₂ they were providing good quality content to the viewers. The sentence has two meanings:
    - Abhishek commented on a YouTube video the FACT that he enjoyed the good quality content they were providing to the viewers, and
    - Abhishek commented on a YouTube video WHICH he enjoyed, saying that they were providing good quality content to the viewers.
The ambiguity arises from the dual role of that₁, viz., relative pronoun or complementizer. As a relative pronoun (the WHICH reading), that₁ attaches to the YouTube video; as a complementizer (the FACT reading), that₁ attaches to commented.
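Structural ambiguity shows up in a parser as more than one tree for the same word sequence. The sketch below counts parse trees with the CKY algorithm over a hypothetical grammar in Chomsky normal form, written only to cover the scope-ambiguity example about green candles and balloons; a real grammar would be much larger.

```python
from collections import defaultdict

# Hypothetical toy grammar in Chomsky normal form.
LEXICAL = {"green": ["Adj"], "candles": ["NP"], "balloons": ["NP"], "and": ["Conj"]}
BINARY = [  # (parent, left child, right child)
    ("NP", "Adj", "NP"),   # green + noun phrase
    ("NP", "NP", "CP"),    # noun phrase + conjunct
    ("CP", "Conj", "NP"),  # "and" + noun phrase
]


def count_parses(words, start="NP"):
    """Count distinct parse trees rooted in `start` that cover all of `words`."""
    n = len(words)
    # chart[(i, j)][A] = number of trees with root A spanning words[i:j]
    chart = defaultdict(lambda: defaultdict(int))
    for i, w in enumerate(words):
        for cat in LEXICAL.get(w, []):
            chart[(i, i + 1)][cat] += 1
    for length in range(2, n + 1):
        for i in range(0, n - length + 1):
            j = i + length
            for k in range(i + 1, j):  # split point between the two children
                for parent, left, right in BINARY:
                    ways = chart[(i, k)][left] * chart[(k, j)][right]
                    if ways:
                        chart[(i, j)][parent] += ways
    return chart[(0, n)][start]


print(count_parses("green candles and balloons".split()))  # prints 2
```

The count of 2 corresponds to the two bracketings green (candles and balloons) and (green candles) and balloons. Counting rather than building trees keeps the sketch short; storing back-pointers in the chart would recover the trees themselves.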
5. Semantics: (Processing of meaning)
Once the words have been formed and the sentence structure has been detected, sentence processing turns to extracting meaning.
Even after the syntax and the meanings of the individual words have been resolved, a sentence can still be read in two ways, which results in semantic ambiguity, as the example below shows:
Example of Semantic ambiguity:
Aditya and Shraddha got married last month.
The ambiguity here is: are Aditya and Shraddha married to each other, or to two different people?
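The two readings can be made explicit as first-order logical forms. The notation below is a sketch using a hypothetical predicate married(x, y); the first formula says they married each other, and the second says each of them married someone other than the other.

$$
\begin{aligned}
&\text{Reading 1: } \mathit{married}(\mathit{Aditya}, \mathit{Shraddha}) \\
&\text{Reading 2: } \exists x\, \exists y\, \big[\mathit{married}(\mathit{Aditya}, x) \land \mathit{married}(\mathit{Shraddha}, y) \land x \neq \mathit{Shraddha} \land y \neq \mathit{Aditya}\big]
\end{aligned}
$$

The words and the syntax are the same in both cases; only the logical form differs.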
6. Pragmatics: (Processing of user intention, modeling)
Pragmatic analysis is part of the process of extracting information from text. Specifically, it is the portion that takes a structured set of text and figures out what the speaker actually meant. The humorous exchanges below illustrate the nature of the problem.
Example 1:
Abhishek: Aditya, my cell phone was fully discharged, so I put it on charge, but I am not sure whether I switched on the button; please go upstairs and check it for me.
Aditya (running upstairs and coming back panting): Abhishek, you forgot to switch on the button.
Example 2:
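Abhishek: Aditya, we have both travelled around 10 km from our house, leaving it empty, and I am not sure whether I locked the front door. Please go back and check whether it is locked, to keep it safe from thieves.
Aditya (on returning from the house): Abhishek, we left the door open.
In both cases Aditya answers the literal question but misses the intended request: Abhishek wanted the button switched on and the door locked, not merely checked.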
7. Discourse: (Processing of connected text)
A discourse is a coherent, structured group of sentences. In discourse processing, a hypothesis is built from the previous sentences and applied to the upcoming ones, and it is revised when it no longer fits. For example:
Sentence-1: Abhishek was coming dejected from the school
(who is Abhishek: most likely a student?)
Sentence-2: He could not control the class
(who is Abhishek now? Most likely the teacher?)
Sentence-3: Teacher should not have made him responsible
(who is Abhishek now? Most likely a student again, albeit a special student- the monitor?)
Sentence-4: After all, he is just an assistant teacher
(all previous hypotheses are thrown away!).
This is the nature of discourse processing. In addition to ellipsis, coreference, and sense and structure disambiguation, an incremental build-up of the shared world has to be carried out.
8. Textual Humour and Ambiguity:
In humour, ambiguity arises from differences in mindset and from sarcastic behaviour. Some people understand the pun in a sentence, while others take it seriously and try to make logical sense of it.
Example of Humour and lexical ambiguity:
Salena: What do you do?
Aditya: I race cars.
Salena: Do you win many races?
Aditya: No, the cars are much faster.
The ambiguity of the words "race cars" (driving in car races vs. running against cars) and the two different meanings picked by Salena and Aditya give rise to the humour.
Conclusion:
In this article we have discussed Natural Language Processing from the perspective of ambiguity. We described the different kinds of ambiguity that arise in NLP, starting from the lowest levels of processing, viz., phonology and morphology, and moving up to the highest levels, viz., pragmatics and discourse, and finally looked at how ambiguity gives rise to textual humour.
We hope you liked it!
Happy learning :)