While working on applications, you may need a Python list to contain only unique values. But what if the input data, or the data your program generates, contains duplicates? In that case, we have to remove the duplicate values. There are multiple ways to do so; some of them are listed below:
1. Converting a list to a set
A list can be converted to a set, which keeps only the unique elements of the list and automatically discards duplicate entries. This is how a set is designed: it stores unique values in no particular order. Note that this works only when the elements are hashable (for example numbers, strings, or tuples); a list of lists cannot be converted this way.
Time for an example:
my_list = [0,1,2,3,2,8,4,5,7,0,7,1,5]
print(list(set(my_list)))
Output:
[0, 1, 2, 3, 4, 5, 7, 8]
Note: Once the list is converted to a set, the original order is not maintained, because a set does not retain the order of its data. The output above happens to look sorted, but that is not guaranteed and should not be relied on.
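If a predictable order is needed and sorted order is acceptable, one option is to sort the set explicitly instead of relying on whatever order the set happens to produce. A minimal sketch, reusing the list from the example above:

```python
my_list = [0, 1, 2, 3, 2, 8, 4, 5, 7, 0, 7, 1, 5]

# sorted() accepts any iterable and always returns a new, ordered list,
# so the result no longer depends on the internal order of the set.
unique_sorted = sorted(set(my_list))
print(unique_sorted)  # [0, 1, 2, 3, 4, 5, 7, 8]
```

This gives the elements in ascending order, not in their original order, so it only helps when sorted output is what you want.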
2. What if I wish to maintain the order of the list even after deleting the duplicates?
This can be done too, but a different approach has to be taken. OrderedDict comes to the rescue for this.
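Okay, so what is an OrderedDict? It is a dictionary subclass from the collections module that remembers the order in which keys were inserted. OrderedDict.fromkeys(my_list) creates one key per element; since dictionary keys are unique, duplicates are dropped and each element keeps the position of its first occurrence. We will see more about OrderedDict in another post.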
Time for an example:
from collections import OrderedDict
my_list = [0,1,2,3,8,2,4,5,7,0,7,1,5]
print(list(OrderedDict.fromkeys(my_list)))
Output:
[0, 1, 2, 3, 8, 4, 5, 7]
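Note: OrderedDict has been part of the collections module since Python 2.7 and 3.1. What changed in Python 3.7 is that the built-in dict also guarantees insertion order, so on 3.7 and later a plain dictionary works just as well (see approach 4).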
3. Creating a new list and checking each element against it
In this technique, we do everything ourselves: we loop over the original list and append an element to a new list only if it is not already there.
Time for an example:
my_list = [0,1,2,3,2,4,5,7,0,7,1,5]
my_new_list = []
for i in my_list:
    if i not in my_new_list:
        my_new_list.append(i)
print(my_new_list)
Output:
[0, 1, 2, 3, 4, 5, 7]
Note: This method works fine, but it has O(n^2) time complexity, because every "not in" check scans the new list from the start. It becomes slow as the list grows, so it isn't an ideal method for large lists.
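One common way to avoid the quadratic cost while keeping the original order is to track the elements already seen in a set, where a membership check takes constant time on average. The sketch below is one possible variant, not part of the original approach; the function name dedupe_preserving_order is just an illustrative choice, and it assumes the elements are hashable.

```python
def dedupe_preserving_order(items):
    """Return a new list without duplicates, keeping first occurrences."""
    seen = set()      # elements encountered so far (assumed hashable)
    result = []
    for item in items:
        if item not in seen:   # average O(1) lookup in a set
            seen.add(item)
            result.append(item)
    return result


my_list = [0, 1, 2, 3, 2, 4, 5, 7, 0, 7, 1, 5]
print(dedupe_preserving_order(my_list))  # [0, 1, 2, 3, 4, 5, 7]
```

The trade-off is some extra memory for the set in exchange for roughly linear running time.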
4. Using a dictionary
This approach is similar to OrderedDict, but uses the built-in dict. In CPython 3.6, dictionaries were reimplemented in a more compact form (a design first used by PyPy), and preserving insertion order was a side effect that was still considered an implementation detail. From Python 3.7 onward, insertion order is guaranteed by the language, so it holds in all implementations.
Time for an example:
my_list = [0,1,2,3,2,4,5,7,0,7,1,5]
print(list(dict.fromkeys(my_list)))
Output:
[0, 1, 2, 3, 4, 5, 7]
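Like a set, a dictionary needs hashable keys, so dict.fromkeys raises a TypeError if the list contains lists. A possible workaround, sketched below with a made-up rows list and assuming Python 3.7 or later for the ordering guarantee, is to convert each inner list to a tuple first and convert back afterwards:

```python
# Hypothetical input: a list whose elements are themselves lists.
rows = [[1, 2], [3, 4], [1, 2], [5, 6], [3, 4]]

# dict.fromkeys(rows) would raise TypeError because lists are unhashable,
# so each inner list is turned into a tuple before being used as a key.
unique_rows = [list(key) for key in dict.fromkeys(tuple(row) for row in rows)]
print(unique_rows)  # [[1, 2], [3, 4], [5, 6]]
```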
5. Using enumerate
Python's built-in enumerate function yields each element together with its index. We can use that index to check whether the element already appears earlier in the list, in the slice my_list[:n], and keep it only if it does not. Because every check scans a slice of the list, the time complexity is O(n^2). This needs to be kept in mind when implementing this approach in an application.
Time for an example:
my_list = [0,1,2,3,2,4,5,7,0,7,1,5]
my_new_list=[element for n,element in enumerate(my_list) if element not in my_list[:n]]
print(my_new_list)
Output:
[0, 1, 2, 3, 4, 5, 7]
Note: The enumerate function takes two parameters, the second of which is optional. The first parameter is an iterable, and the second parameter, start, is the number from which the counter begins. If start is not provided, it defaults to 0.
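To make the role of the second parameter concrete, here is a small sketch using a made-up fruits list:

```python
fruits = ["apple", "banana", "cherry"]

# Without start, the counter begins at 0.
default_start = list(enumerate(fruits))
print(default_start)  # [(0, 'apple'), (1, 'banana'), (2, 'cherry')]

# With start=1, the counter begins at 1; the elements are unchanged.
custom_start = list(enumerate(fruits, start=1))
print(custom_start)   # [(1, 'apple'), (2, 'banana'), (3, 'cherry')]
```

For the duplicate-removal example above, the default start of 0 matters: the counter is used as a slice boundary, so a different start value would no longer line up with the list positions.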
6. Using list comprehension
This isn't a great approach: the list comprehension is used only for its side effect of calling append, so besides my_new_list it also builds and discards a throwaway list of None values, which wastes space. As in approach 3, every membership check scans my_new_list, so the time complexity is O(n^2), which might not be desirable as the number of elements grows.
Time for an example:
my_list = [0,1,2,3,2,4,5,7,0,7,1,5]
my_new_list = []
[my_new_list.append(x) for x in my_list if x not in my_new_list]
print(my_new_list)
Output:
[0, 1, 2, 3, 4, 5, 7]
Conclusion
In this post, we looked at several ways to remove duplicates from a Python list so that it contains only unique elements. Do let us know if you have an approach different from the ones mentioned above. Happy Coding!