If you are already using Python 3.7 or later, you may know about its new features, one of which is the dataclass. If you haven't upgraded yet, here is some news for you.
Python 3.7 introduced a new feature, the dataclass.
But wait: even on Python 3.6, dataclasses are available as a backport package, which can be installed with the following command:
pip install dataclasses
So what is a dataclass?
Dataclasses are Python classes whose main purpose is to store data. Their fields can hold values of any type, such as a number or an instance of another class.
They come with basic functionality already implemented, such as instantiation, a printable representation, and comparison of instances. A dataclass is created by applying the @dataclass decorator to a normal class.
Why a dataclass now?
Readability is one of Python's core design values: as the Zen of Python puts it, "Readability counts." Dataclasses were created for the same reason. You will see what we mean by readability in a few minutes.
How can I differentiate between a regular class and a dataclass?
This is quite simple. Python comes with a dataclass decorator (@dataclass) that indicates that the class is a dataclass. This is usually done in the following way:
from dataclasses import dataclass
@dataclass
class class_name:
    ...  # class definition goes here
Now let us compare a normal class and a dataclass to see what a dataclass has to offer to us.
Normal Python class
A normal class is implemented by using the class keyword followed by the name of the class.
class Website:
    def __init__(self, val):
        self.val = val
# creating class object
class_instance = Website(12)
print(class_instance.val)
Output:
12
Python Dataclass
The dataclass is indicated with the help of the @dataclass decorator.
from dataclasses import dataclass
@dataclass
class Website:
    val: float
class_instance = Website(12.21)
print(class_instance)
Output:
Website(val=12.21)
Comparing the above two classes, the following can be inferred:
- The dataclass does not need an explicit __init__ method; the decorator generates one.
- In the dataclass, the variables are declared at class level together with their type, whereas the normal class assigns them through self (the object of the class) inside __init__. Indicating the type of a value in this way is known as type hinting.
- The printed output of the dataclass shows the class name Website along with the field and its value.
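One point worth knowing about the type hinting mentioned in the list above: Python does not enforce the hints at runtime, and neither does the dataclass decorator. The following is a minimal sketch that reuses the Website class; the string argument is only an illustration.

```python
from dataclasses import dataclass


@dataclass
class Website:
    val: float  # a type hint, not a runtime check


# The annotation says float, but no conversion or validation takes place.
hinted = Website(12.21)
unchecked = Website("not a number")

print(hinted)     # Website(val=12.21)
print(unchecked)  # Website(val='not a number')

# The annotations are stored on the class and can be inspected.
print(Website.__annotations__)  # {'val': <class 'float'>}
```

A static type checker such as mypy would flag the second call, but the interpreter itself accepts it.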
In addition, default values can be specified for the fields of a dataclass.
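As a small sketch of default values, the class below gives one field a default and leaves the other without one. The name field is a hypothetical addition for this example and is not part of the original Website class.

```python
from dataclasses import dataclass


@dataclass
class Website:
    name: str         # no default: must be passed when instantiating
    val: float = 0.0  # default used when the argument is omitted


default_site = Website("example")
custom_site = Website("example", 12.21)

print(default_site)  # Website(name='example', val=0.0)
print(custom_site)   # Website(name='example', val=12.21)
```

Fields without a default have to be declared before fields with a default; otherwise the decorator raises a TypeError when the class is created.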
Under the hood, the dataclass implements a __repr__() method that presents the object of the class in a readable string format. It also implements an __eq__() method, which comes into play when we compare two objects of the dataclass. We will cover this in detail below.
Well, this is simple. Is this the only reason I should use a dataclass?
No, this isn't the only reason. In addition to readability, dataclasses (as mentioned previously) come with pre-implemented methods, so these methods don't need to be explicitly defined in a dataclass.
The decorator itself can be written in different ways. Below is a demonstration:
import dataclasses
@dataclasses.dataclass  # or @dataclasses.dataclass()
class Website:
    val: int = 0
The init, repr and eq parameters are set to True by default when a dataclass is created. In other words, the decorator above is interpreted as follows:
@dataclasses.dataclass(init=True, repr=True, eq=True)
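To see what these flags do, the sketch below lists which of the three special methods end up defined directly on a class. The class names and the generated_methods helper are hypothetical and exist only for this illustration.

```python
from dataclasses import dataclass


@dataclass  # same as @dataclass(init=True, repr=True, eq=True)
class WithDefaults:
    val: int = 0


@dataclass(repr=False, eq=False)
class WithoutReprEq:
    val: int = 0


def generated_methods(cls):
    """Return the special methods that are defined directly on cls."""
    names = ("__init__", "__repr__", "__eq__")
    return [name for name in names if name in vars(cls)]


print(generated_methods(WithDefaults))   # ['__init__', '__repr__', '__eq__']
print(generated_methods(WithoutReprEq))  # ['__init__']
```

When a flag is switched off, the class simply falls back to the method it inherits from object.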
Let's cover these special methods one by one.
Representation:
When we create a normal class, we generally add only the __init__ method to it for initializing the object of the class:
class Website:
    def __init__(self, val):
        self.val = val
class_instance = Website(12)
print(class_instance)
print(class_instance.val)
Output:
<__main__.Website object at 0x000002C0B4FE4E80>
12
What do you understand from the above code?
The second line of the output shows that the value stored in the Website instance is 12. But what about the first line, i.e. <__main__.Website object at 0x000002C0B4FE4E80>?
That is how Python displays an object of a class by default: the class name followed by the object's memory address.
This makes debugging tough, because a normal class does not specify how its objects should be represented. A neat representation of the data in a normal class has to be implemented with the __repr__ method. The code below shows an implementation of the __repr__ method in a normal class.
class Website:
    def __init__(self, val):
        self.val = val
    # special method __repr__; it must return a string
    def __repr__(self):
        return str(self.val)
class_instance = Website(12)
print(class_instance)
Output:
12
This means the __repr__ method must be explicitly defined in normal classes. In a dataclass, on the other hand, it comes already implemented.
Consider the following dataclass, in which the __repr__ method is not explicitly defined:
from dataclasses import dataclass
@dataclass
class data_class():
    value: int
class_instance = data_class(12)
print(class_instance)
print(type(class_instance))
Output:
data_class(value=12)
<class '__main__.data_class'>
The representation, like the other generated methods covered here, is included by default: each one is controlled by a keyword of the decorator (repr, init, eq) that defaults to True.
If we want to exclude the default __repr__ method from our dataclass, we can set repr=False, as in the following code:
from dataclasses import dataclass
@dataclass(repr=False)
class data_class():
    value: int
class_instance = data_class(12)
print(class_instance)
Output (the memory address will differ from run to run):
<__main__.data_class object at 0x000002C0B4FE4E80>
Comparing Objects:
In a dataclass, the __eq__ method is implemented, which is used for checking whether two objects of the class are equal.
We will compare how == (checking for the equality of two objects) behaves in a normal class and in a dataclass.
from dataclasses import dataclass
@dataclass
class data_class():
    value: int
class_instance = data_class(12)
print(class_instance)
print(type(class_instance))
class normal_class():
    def __init__(self, val):
        self.val = val
# Two objects instantiated for the dataclass
instance_one = data_class(12)
instance_two = data_class(12)
# Two objects instantiated for the normal class
instance_three = normal_class(12)
instance_four = normal_class(12)
print("DataClass Equal:", instance_one == instance_two)
print("Normal Class Equal:", instance_three == instance_four)
Output:
data_class(value=12)
<class '__main__.data_class'>
DataClass Equal: True
Normal Class Equal: False
The last two lines of this output might seem confusing. Here is the explanation for the different behaviour of the dataclass and the normal class.
For a normal class that does not define __eq__, the == operator falls back to the default comparison inherited from object, which checks whether both operands are the same object, i.e. refer to the same memory location. Two different instances of the same class are distinct objects, so the result is False for the normal class.
On the other hand, when == is used to compare objects of a dataclass, it checks whether the contents of both instances of the same class are the same. Since both instances contain the same data, it returns True.
When a dataclass generates an __eq__ method, the method compares two instances of the same class. It does this by comparing the fields of one instance, taken in order as a tuple, with the fields of the other instance.
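The tuple comparison described above can be sketched by hand. In the example below, Point is a dataclass and ManualPoint is a plain class whose hand-written __eq__ approximates the generated method; both names are hypothetical, and the hand-written version is an approximation rather than the exact generated code.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int


class ManualPoint:
    """Plain class with a hand-written __eq__ that mimics the generated one."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Only instances of exactly the same class are compared field by field.
        if other.__class__ is self.__class__:
            return (self.x, self.y) == (other.x, other.y)
        return NotImplemented


print(Point(1, 2) == Point(1, 2))              # True
print(Point(1, 2) == Point(1, 3))              # False
print(ManualPoint(1, 2) == ManualPoint(1, 2))  # True
print(Point(1, 2) == ManualPoint(1, 2))        # False: different classes
```

Returning NotImplemented for a different class lets Python fall back to its default comparison, which is why the last line prints False even though the field values match.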
If our class has complex logic and we want to implement our own rules for comparing objects of the dataclass, we can specify eq=False in the decorator so that the __eq__ method is not generated, and then define our own.
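Here is a minimal sketch of a dataclass with eq=False and a custom __eq__. The comparison rule (names match regardless of letter case, val is ignored) and the name field are assumptions made up for this example.

```python
from dataclasses import dataclass


@dataclass(eq=False)
class Website:
    name: str
    val: int

    def __eq__(self, other):
        # Custom rule (an assumption for this example): two websites are
        # equal when their names match ignoring case; val is not compared.
        if not isinstance(other, Website):
            return NotImplemented
        return self.name.lower() == other.name.lower()


site_a = Website("Example", 1)
site_b = Website("example", 2)
site_c = Website("other", 1)

print(site_a == site_b)  # True
print(site_a == site_c)  # False
print(site_a)            # Website(name='Example', val=1)
```

The __init__ and __repr__ methods are still generated, since only eq was switched off.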
Note: The ordering methods (<, >, <=, and >=) can be generated by setting the keyword order to True, i.e. passing order=True to the dataclass decorator.
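As a brief sketch of order=True, the hypothetical Version class below gets its ordering methods generated, so instances can be compared and sorted. The fields are compared in the order they are declared, like a tuple.

```python
from dataclasses import dataclass


@dataclass(order=True)
class Version:
    major: int
    minor: int


v1 = Version(1, 4)
v2 = Version(2, 0)

print(v1 < v2)   # True, because (1, 4) < (2, 0)
print(v1 >= v2)  # False

ordered = sorted([v2, v1, Version(1, 2)])
print(ordered)
# [Version(major=1, minor=2), Version(major=1, minor=4), Version(major=2, minor=0)]
```

Note that order=True requires eq to stay True; combining order=True with eq=False raises a ValueError.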
Conclusion
In today's post, we covered what a dataclass is, why it matters, how to use it, and its advantages over normal classes. In the upcoming posts, we will dive deeper into dataclasses and understand more about them.