Binary Types
This is legacy Apache Ignite documentation.
The new documentation is hosted here: https://ignite.apache.org/docs/latest/
The Complex object (often called ‘Binary object’) is an Ignite data type designed to represent a Java class. It has the following features:
- A unique ID (type ID), which is derived from a class name (type name).
- One or more associated schemas that describe its inner structure (the order, names, and types of its fields). Each schema has its own ID.
- An optional version number, intended to help end users distinguish between objects of the same type serialized with different schemas.
Unfortunately, these distinctive features of the Complex object have little to no meaning outside of the Java language. A Python class cannot be identified by its name (it is not unique), by its ID (an object's ID in Python is volatile; in CPython it is just a pointer into the interpreter’s memory heap), or by the set of its fields (they do not have associated data types, and they can be added or deleted at run time). For the pyignite user this means that for all purposes of storing native Python data it is better to use the Ignite CollectionObject or MapObject data types.
However, for interoperability purposes, pyignite provides a mechanism for creating special Python classes to read or write Complex objects. These classes have an interface that simulates all the features of the Complex object: type name, type ID, schema, schema ID, and version number.
Since binding one concrete class to one Complex object would severely limit the user’s data manipulation capabilities, all the functionality described above is implemented through a metaclass: GenericObjectMeta. This metaclass is used automatically when reading Complex objects.
from pyignite import Client, GenericObjectMeta
from pyignite.datatypes import *

client = Client()
client.connect('localhost', 10800)

person_cache = client.get_or_create_cache('person')
person = person_cache.get(1)
print(person.__class__.__name__)
# Person
print(person)
# Person(first_name='Ivan', last_name='Ivanov', age=33, version=1)
Here you can see how GenericObjectMeta uses the attrs package internally for creating nice __init__() and __repr__() methods.
You can reuse the autogenerated class for subsequent writes:
Person = person.__class__
person_cache.put(
    1, Person(first_name='Ivan', last_name='Ivanov', age=33)
)
GenericObjectMeta can also be used directly for creating custom classes:
class Person(
    metaclass=GenericObjectMeta,
    schema=OrderedDict([
        ('first_name', String),
        ('last_name', String),
        ('age', IntObject),
    ]),
):
    pass
Note how the Person class is defined. schema is a GenericObjectMeta metaclass parameter. Another important GenericObjectMeta parameter is type_name; it is optional and defaults to the class name (‘Person’ in our example).
Note also that Person does not have to define its own attributes, methods, or properties (hence the pass statement), although it is completely possible.
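To illustrate the mechanism (this is a simplified sketch, not pyignite’s actual implementation; SimpleObjectMeta and everything inside it are hypothetical), a metaclass can generate __init__() and __repr__() from an OrderedDict schema roughly like this:

```python
from collections import OrderedDict


class SimpleObjectMeta(type):
    """Toy stand-in for GenericObjectMeta: generates __init__()
    and __repr__() from the declared schema (field name -> type)."""

    def __new__(mcs, name, bases, namespace, schema=None):
        fields = list(schema or {})

        def __init__(self, **field_values):
            # accept each schema field as a keyword argument
            for field in fields:
                setattr(self, field, field_values.get(field))

        def __repr__(self):
            pairs = ', '.join(
                '{}={!r}'.format(f, getattr(self, f)) for f in fields
            )
            return '{}({})'.format(type(self).__name__, pairs)

        namespace.update(__init__=__init__, __repr__=__repr__,
                         schema=schema or OrderedDict())
        return super().__new__(mcs, name, bases, namespace)

    def __init__(cls, name, bases, namespace, schema=None):
        super().__init__(name, bases, namespace)


class Person(metaclass=SimpleObjectMeta, schema=OrderedDict([
    ('first_name', str),
    ('last_name', str),
    ('age', int),
])):
    pass


print(Person(first_name='Ivan', last_name='Ivanov', age=33))
# Person(first_name='Ivan', last_name='Ivanov', age=33)
```

The real GenericObjectMeta additionally tracks type ID, schema ID, and version, and delegates method generation to the attrs package, but the metaclass idea is the same.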
Now, when your custom Person class is created, you are ready to send data to the Ignite server using its objects. The client will implicitly register your class as soon as the first Complex object is sent. However, if you intend to use your custom class to read existing Complex objects’ values before writing anything, you must register the class explicitly with your client:
client.register_binary_type(Person)
Now that we have dealt with the basics of pyignite’s implementation of Complex objects, let us move on to more elaborate examples.
Read
Ignite SQL uses Complex objects internally to represent keys and rows in SQL tables. Normally SQL data is accessed via queries (see SQL), so we will consider the following example solely as a demonstration of how Binary objects (not Ignite SQL) work.
In the previous examples we created some SQL tables. Let us do it again and examine the Ignite storage afterwards.
result = client.get_cache_names()
print(result)
# [
#     'SQL_PUBLIC_CITY',
#     'SQL_PUBLIC_COUNTRY',
#     'PUBLIC',
#     'SQL_PUBLIC_COUNTRYLANGUAGE'
# ]
We can see that Ignite created a cache for each of our tables. The caches are conveniently named using the SQL_<schema name>_<table name> pattern.
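The naming pattern can be captured in a one-line helper (sql_table_cache_name is a hypothetical function for illustration, not part of the pyignite API; Ignite upper-cases unquoted SQL identifiers, hence the upper() calls):

```python
def sql_table_cache_name(schema, table):
    """Build the cache name Ignite uses for an SQL table:
    SQL_<schema name>_<table name>, upper-cased."""
    return 'SQL_{}_{}'.format(schema.upper(), table.upper())


print(sql_table_cache_name('public', 'city'))
# SQL_PUBLIC_CITY
```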
Now let us examine the configuration of a cache that contains SQL data, using the settings property.
city_cache = client.get_or_create_cache('SQL_PUBLIC_CITY')
print(city_cache.settings[PROP_NAME])
# 'SQL_PUBLIC_CITY'

print(city_cache.settings[PROP_QUERY_ENTITIES])
# {
#     'key_type_name': (
#         'SQL_PUBLIC_CITY_9ac8e17a_2f99_45b7_958e_06da32882e9d_KEY'
#     ),
#     'value_type_name': (
#         'SQL_PUBLIC_CITY_9ac8e17a_2f99_45b7_958e_06da32882e9d'
#     ),
#     'table_name': 'CITY',
#     'query_fields': [
#         ...
#     ],
#     'field_name_aliases': [
#         ...
#     ],
#     'query_indexes': []
# }
The values of value_type_name and key_type_name are the names of the binary types. The City table’s key fields are stored using the key_type_name type, and the other fields using the value_type_name type.
Now that we have the cache in which the SQL data resides, and the names of the key and value data types, we can read the data without using SQL functions and verify the correctness of the result.

result = city_cache.scan()
print(next(result))
# (
#     SQL_PUBLIC_CITY_6fe650e1_700f_4e74_867d_58f52f433c43_KEY(
#         ID=1890,
#         COUNTRYCODE='CHN',
#         version=1
#     ),
#     SQL_PUBLIC_CITY_6fe650e1_700f_4e74_867d_58f52f433c43(
#         NAME='Shanghai',
#         DISTRICT='Shanghai',
#         POPULATION=9696300,
#         version=1
#     )
# )

What we see is a tuple of key and value extracted from the cache. Both key and value are Complex objects; their dataclass names match the key_type_name and value_type_name cache settings, and the objects’ fields correspond to the SQL table’s columns.
Create
Now that we are aware of the internal structure of the Ignite SQL storage, we can create a table and put data in it using only key-value functions.
For example, let us create a table to register High School students: a rough equivalent of the following SQL DDL statement:
CREATE TABLE Student (
    sid CHAR(9),
    name VARCHAR(20),
    login CHAR(8),
    age INTEGER(11),
    gpa REAL
)
These are the necessary steps to perform the task.
- Create table cache.
client = Client()
client.connect('127.0.0.1', 10800)

student_cache = client.create_cache({
    PROP_NAME: 'SQL_PUBLIC_STUDENT',
    PROP_SQL_SCHEMA: 'PUBLIC',
    PROP_QUERY_ENTITIES: [
        {
            'table_name': 'Student'.upper(),
            'key_field_name': 'SID',
            'key_type_name': 'java.lang.Integer',
            'field_name_aliases': [],
            'query_fields': [
                {
                    'name': 'SID',
                    'type_name': 'java.lang.Integer',
                    'is_key_field': True,
                    'is_notnull_constraint_field': True,
                },
                {
                    'name': 'NAME',
                    'type_name': 'java.lang.String',
                },
                {
                    'name': 'LOGIN',
                    'type_name': 'java.lang.String',
                },
                {
                    'name': 'AGE',
                    'type_name': 'java.lang.Integer',
                },
                {
                    'name': 'GPA',
                    'type_name': 'java.math.Double',
                },
            ],
            'query_indexes': [],
            'value_type_name': 'SQL_PUBLIC_STUDENT_TYPE',
            'value_field_name': None,
        },
    ],
})
- Define Complex object data class.
class Student(
    metaclass=GenericObjectMeta,
    type_name='SQL_PUBLIC_STUDENT_TYPE',
    schema=OrderedDict([
        ('NAME', String),
        ('LOGIN', String),
        ('AGE', IntObject),
        ('GPA', DoubleObject),
    ])
):
    pass
- Insert row.
student_cache.put(
    1,
    Student(LOGIN='jdoe', NAME='John Doe', AGE=17, GPA=4.25),
    key_hint=IntObject
)
Now let us make sure that our cache really can be used with SQL functions.
result = client.sql(
    r'SELECT * FROM Student',
    include_field_names=True
)
print(next(result))
# ['SID', 'NAME', 'LOGIN', 'AGE', 'GPA']

print(*result)
# [1, 'John Doe', 'jdoe', 17, 4.25]
Note, however, that the cache we created cannot be dropped with a DDL command:
# DROP_QUERY = 'DROP TABLE Student'
# client.sql(DROP_QUERY)
#
# pyignite.exceptions.SQLError: class org.apache.ignite.IgniteCheckedException:
# Only cache created with CREATE TABLE may be removed with DROP TABLE
# [cacheName=SQL_PUBLIC_STUDENT]
It should be destroyed like any other key-value cache:
student_cache.destroy()
Migrate
Suppose we have an accounting app that stores its data in key-value format. Our task is to introduce the following changes to the original expense voucher’s format and data:
- rename date to expense_date
- add report_date
- set report_date to the current date if reported is True, None if False
- delete reported
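Before involving Ignite, the four rules above can be sketched as a plain-Python transformation over dict-based records (migrate_voucher and the sample record are hypothetical illustrations; the real Complex object migration follows below):

```python
from datetime import date, datetime


def migrate_voucher(old):
    """Apply the four schema changes to one voucher record."""
    new = dict(old)
    new['expense_date'] = new.pop('date')       # rename date to expense_date
    new['report_date'] = (                      # add report_date
        date.today() if new['reported'] else None
    )
    del new['reported']                         # delete reported
    return new


voucher = {
    'date': datetime(2017, 9, 21),
    'reported': True,
    'purpose': 'Praesent eget fermentum massa',
}
print(sorted(migrate_voucher(voucher)))
# ['expense_date', 'purpose', 'report_date']
```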
First, get the vouchers’ cache:
client = Client()
client.connect('127.0.0.1', 10800)

accounting = client.get_or_create_cache('accounting')
If you do not store the schema of the Complex object in code, you can obtain it as a dataclass property with the query_binary_type() method.
data_classes = client.query_binary_type('ExpenseVoucher')
print(data_classes)
# {
#     -231598180: <class '__main__.ExpenseVoucher'>
# }

s_id, data_class = data_classes.popitem()
schema = data_class.schema
Let us modify the schema and create a new Complex object class with an updated schema.
schema['expense_date'] = schema['date']
del schema['date']
schema['report_date'] = DateObject
del schema['reported']
schema['sum'] = DecimalObject


# define new data class
class ExpenseVoucherV2(
    metaclass=GenericObjectMeta,
    type_name='ExpenseVoucher',
    schema=schema,
):
    pass
Now migrate the data from the old schema to the new one.
def migrate(cache, data, new_class):
    """ Migrate given data pages. """
    for key, old_value in data:
        # read data
        print(old_value)
        # ExpenseVoucher(
        #     date=datetime(2017, 9, 21, 0, 0),
        #     reported=True,
        #     purpose='Praesent eget fermentum massa',
        #     sum=Decimal('666.67'),
        #     recipient='John Doe',
        #     cashier_id=8,
        #     version=1
        # )

        # create new binary object
        new_value = new_class()

        # process data
        new_value.sum = old_value.sum
        new_value.purpose = old_value.purpose
        new_value.recipient = old_value.recipient
        new_value.cashier_id = old_value.cashier_id
        new_value.expense_date = old_value.date
        new_value.report_date = date.today() if old_value.reported else None

        # replace data
        cache.put(key, new_value)

        # verify data
        verify = cache.get(key)
        print(verify)
        # ExpenseVoucherV2(
        #     purpose='Praesent eget fermentum massa',
        #     sum=Decimal('666.67'),
        #     recipient='John Doe',
        #     cashier_id=8,
        #     expense_date=datetime(2017, 9, 21, 0, 0),
        #     report_date=datetime(2018, 8, 29, 0, 0),
        #     version=1,
        # )


# migrate data
result = accounting.scan()
migrate(accounting, result, ExpenseVoucherV2)

# cleanup
accounting.destroy()
client.close()
At this point, any of the fields defined in either of our schemas may be present in the resulting binary object, depending on which schema was used when the object was written with put() or similar methods. The Ignite Binary API has no method for deleting a Complex object schema; all the schemas ever defined will stay in the cluster until its shutdown.
This versioning mechanism is quite simple and robust, but it has its limitations. The main one is that you cannot change the type of an existing field. If you try, you will be greeted with the following message:
org.apache.ignite.binary.BinaryObjectException: Wrong value has been set [typeName=SomeType, fieldName=f1, fieldType=String, assignedValueType=int]
As an alternative, you can rename the field or create a new Complex object.
Python example files
The Python thin client contains fully workable examples that demonstrate the behavior of the client.