Avro in Java
Generic Record in Avro
A GenericRecord is used to create an avro object from a schema, the schema being referenced as
- A file
- A string
It's not the most recommended way of creating Avro objects because things can fail at runtime.
Schema.Parser parser = new Schema.Parser();
Schema schema = parser.parse("") //Fill with string of AVRO schema
GenericRecordBuilder customerBuilder = new GenericRecordBuilder(schema);
customerBuilder.set("first_name", "John");
// etc.
GenericData.Record customer = customerBuilder.build();
Specific Record in Avro
A SpecificRecord is also an Avro object but it is obtained using code generation
Avro Tools
Avro Tools is a CLI that can be used to read avro files
Example:
java -jar avro-tools-1.8.2.jar tojson --pretty customer-generic.avro
Result will print out the byte content of the avro file to json
You can also use getschema which will return the schema in the avro file instead
Reflection in Avro
You can use a java class and Reflect.data.get().getSchema(ReflectedCustomer.class); To generate a schema from a class in java
Schema Evolution - Theory
4 kinds of Schema Evolution
- Backward: a backward compatible change is when a new schema can be used to read old data
- Forward: a forward compatible change is when an old schema can be used to read new data
- Full: which is both backward and forward
- Breaking: which is none of the above
Schema Evolution Compatibility Matrix - 202404101240
Backward:
Before
{
"type" : "record",
"namespace" : "com.example",
"name" : "CustomerAddress",
"fields" : [
{ "name": "address", "type": "string" },
{ "name": "city", "type": "string" },
{ "name": "postcode", "type": ["int", "string"] },
{ "name": "type", "type": "enum", "symbols": ["PO BOX", "RESIDENTIAL", "ENTERPRISE"] },
{ "name": "createStamp", "type" : "long", "logicalType": "timestamp-millis" }
]
}
After
{
"type" : "record",
"namespace" : "com.example",
"name" : "CustomerAddress",
"fields" : [
{ "name": "address", "type": "string" },
{ "name": "city", "type": "string" },
{ "name": "postcode", "type": ["int", "string"] },
{ "name": "type", "type": "enum", "symbols": ["PO BOX", "RESIDENTIAL", "ENTERPRISE"] },
{ "name": "createStamp", "type" : "long", "logicalType": "timestamp-millis" },
{ "name": "isCool", "type" : "boolean", "default" : true } // new! with default
]
}
Forward
Before
{
"type" : "record",
"namespace" : "com.example",
"name" : "CustomerAddress",
"fields" : [
{ "name": "address", "type": "string" },
{ "name": "city", "type": "string" },
{ "name": "postcode", "type": ["int", "string"] },
{ "name": "type", "type": "enum", "symbols": ["PO BOX", "RESIDENTIAL", "ENTERPRISE"] },
{ "name": "createStamp", "type" : "long", "logicalType": "timestamp-millis" }
]
}
After
{
"type" : "record",
"namespace" : "com.example",
"name" : "CustomerAddress",
"fields" : [
{ "name": "address", "type": "string" },
{ "name": "city", "type": "string" },
{ "name": "postcode", "type": ["int", "string"] },
{ "name": "type", "type": "enum", "symbols": ["PO BOX", "RESIDENTIAL", "ENTERPRISE"] },
{ "name": "createStamp", "type" : "long", "logicalType": "timestamp-millis" },
{ "name": "isCool", "type" : "boolean" } // new! without default
]
}
Full
Before
{
"type" : "record",
"namespace" : "com.example",
"name" : "CustomerAddress",
"fields" : [
{ "name": "address", "type": "string" },
{ "name": "city", "type": "string" },
{ "name": "postcode", "type": ["int", "string"] },
{ "name": "type", "type": "enum", "symbols": ["PO BOX", "RESIDENTIAL", "ENTERPRISE"] },
{ "name": "createStamp", "type" : "long", "logicalType": "timestamp-millis" }
]
}
After
{
"type" : "record",
"namespace" : "com.example",
"name" : "CustomerAddress",
"fields" : [
{ "name": "address", "type": "string" },
{ "name": "city", "type": "string" },
{ "name": "postcode", "type": ["int", "string"] },
{ "name": "type", "type": "enum", "symbols": ["PO BOX", "RESIDENTIAL", "ENTERPRISE"] },
{ "name": "createStamp", "type" : "long", "logicalType": "timestamp-millis" },
{ "name": "isCool", "type" : "boolean" } // new! with default
]
}
- Only add fields with defaults
- Only remove fields that have defaults
- Target full compatibility if possible
Breaking
- Adding/Removing elements from an Enum
- Changing the type of a field(string => int)
- Renaming a required field
Advice
- Make your primary key required
- Give default values to all the fields that could be removed in the future
- Be very careful when using Enums as they can't evolve over time
- Don't rename fields, add aliases instead
- When evolving a schema ALWAYS give default values
- When evolving a schema NEVER delete a required field