Avro Schemas

What is Avro?

Avro data is described by a schema written in JSON.

Advantages

Disadvantages

Avro was the first data format supported by Confluent Schema Registry. Newer versions (Confluent Platform 5.5 and later) also support Protobuf and JSON Schema.

Avro Primitive Types

.avsc files are Avro schema files

Example:

{"type" : "string"}
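
As a rough illustration, Avro's eight primitive type names (null, boolean, int, long, float, double, bytes, string) can be mapped to Python types. The mapping and the `matches_primitive` helper below are hypothetical, written only for these notes, and are not part of any Avro library.

```python
# Avro primitive type names mapped to the Python type a value would usually have.
# This mapping is an assumption made for illustration only.
AVRO_PRIMITIVES = {
    "null": type(None),
    "boolean": bool,
    "int": int,      # 32-bit signed integer in Avro
    "long": int,     # 64-bit signed integer in Avro
    "float": float,
    "double": float,
    "bytes": bytes,
    "string": str,
}

# Value ranges for the two integer types.
INT_RANGES = {
    "int": (-2**31, 2**31 - 1),
    "long": (-2**63, 2**63 - 1),
}


def matches_primitive(schema: dict, value) -> bool:
    """Return True if value fits a primitive schema such as {"type": "string"}."""
    type_name = schema["type"]
    expected = AVRO_PRIMITIVES[type_name]
    # bool is a subclass of int in Python, so reject it for int and long.
    if expected is int and isinstance(value, bool):
        return False
    if not isinstance(value, expected):
        return False
    if type_name in INT_RANGES:
        low, high = INT_RANGES[type_name]
        return low <= value <= high
    return True
```

For example, `matches_primitive({"type": "string"}, "hello")` is true, while a value of `2**31` fits `long` but not `int`.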

Avro Record Schema Definition

Defined using JSON

Fields include:

Example:

{
  "type" : "record",
  "name" : "Customer",
  "namespace" : "com.example",
  "doc" : "Avro schema for our Customer",
  "fields" : [
    {"name": "first_name", "type" : "string", "doc" : "First name of the customer" },
    {"name": "last_name", "type" : "string", "doc" : "Last name of the customer" },
    {"name": "age", "type" : "int", "doc" : "Age of the customer" },
    {"name": "height", "type" : "float", "doc" : "Height in cms" },
    {"name": "weight", "type" : "float", "doc" : "Weight in kgs" },
    {"name": "automated_email", "type" : "boolean", "default" : true, "doc" : "true if the user wants marketing emails" }
  ]
}
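
Because a record schema is plain JSON, it can be inspected with any JSON parser. The sketch below uses only Python's standard `json` module on a shortened copy of the Customer schema; the `summarize_record` helper is hypothetical and does no real Avro validation.

```python
import json

# Shortened copy of the Customer schema above (fewer fields, no "doc" attributes).
CUSTOMER_SCHEMA = """
{
  "type": "record",
  "name": "Customer",
  "namespace": "com.example",
  "fields": [
    {"name": "first_name", "type": "string"},
    {"name": "age", "type": "int"},
    {"name": "automated_email", "type": "boolean", "default": true}
  ]
}
"""


def summarize_record(schema_json: str) -> dict:
    """Split a record's fields into those with and without a default value."""
    schema = json.loads(schema_json)
    fields = schema["fields"]
    return {
        # The full name of a record is namespace + "." + name.
        "fullname": f'{schema["namespace"]}.{schema["name"]}',
        "without_default": [f["name"] for f in fields if "default" not in f],
        "with_default": {f["name"]: f["default"] for f in fields if "default" in f},
    }


summary = summarize_record(CUSTOMER_SCHEMA)
```

Here `summary["without_default"]` lists the fields every writer must supply, and `summary["with_default"]` shows the values a reader falls back to when a field is missing.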

Avro Complex Types

Example:

[{
  "type" :  "record",
  "namespace" : "com.example",
  "name" : "CustomerAddress",
  "fields" : [
    { "name": "address", "type": "string" },
    { "name": "city", "type": "string" },
    { "name": "postcode", "type": ["int", "string"] },
    { "name": "type", "type": { "type": "enum", "name": "AddressType", "symbols": ["PO_BOX", "RESIDENTIAL", "ENTERPRISE"] } }
  ]
},
{
  "type" : "record",
  "name" : "Customer",
  "namespace" : "com.example",
  "doc" : "Avro schema for our Customer",
  "fields" : [
    {"name": "first_name", "type" : "string", "doc" : "First name of the customer" },
    {"name": "middle_name", "type" : "string", "doc" : "Middle name of the customer" },
    {"name": "last_name", "type" : "string", "doc" : "Last name of the customer" },
    {"name": "age", "type" : "int", "doc" : "Age of the customer" },
    {"name": "height", "type" : "float", "doc" : "Height in cms" },
    {"name": "weight", "type" : "float", "doc" : "Weight in kgs" },
    {"name": "automated_email", "type" : "boolean", "default" : true, "doc" : "true if the user wants marketing emails" },
    {"name": "customer_emails", "type" : {"type" : "array", "items" : "string"}, "default" : [], "doc" : "user emails" },
    {"name": "customer_address", "type" : "com.example.CustomerAddress",  "doc" : "user address" }
  ]
}]
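
The `postcode` field is a union: its value may be either an int or a string. A simplified sketch of how a writer might pick the matching branch is shown below. The `resolve_union_branch` helper is hypothetical, covers only the null, int and string branches, and is not how any particular Avro library implements unions.

```python
# Per-branch checks for the few primitive types this sketch supports.
BRANCH_CHECKS = {
    "null": lambda v: v is None,
    # bool is a subclass of int in Python, so exclude it explicitly.
    "int": lambda v: isinstance(v, int) and not isinstance(v, bool),
    "string": lambda v: isinstance(v, str),
}


def resolve_union_branch(union: list, value) -> str:
    """Return the name of the first union branch that the value matches."""
    for branch in union:
        check = BRANCH_CHECKS.get(branch)
        if check is None:
            raise ValueError(f"branch {branch!r} is not supported by this sketch")
        if check(value):
            return branch
    raise ValueError(f"{value!r} matches no branch of {union}")


postcode_union = ["int", "string"]
numeric_branch = resolve_union_branch(postcode_union, 2000)
text_branch = resolve_union_branch(postcode_union, "SW1A 1AA")
```

A numeric postcode resolves to the `int` branch and an alphanumeric one to the `string` branch; any other value raises `ValueError`.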

Avro Logical Types

Note

Logical types don't play well with unions currently

Example

{
  "type" :  "record",
  "namespace" : "com.example",
  "name" : "CustomerAddress",
  "fields" : [
    { "name": "address", "type": "string" },
    { "name": "city", "type": "string" },
    { "name": "postcode", "type": ["int", "string"] },
    { "name": "type", "type": { "type": "enum", "name": "AddressType", "symbols": ["PO_BOX", "RESIDENTIAL", "ENTERPRISE"] } },
    { "name": "createStamp", "type": { "type": "long", "logicalType": "timestamp-millis" } }
  ]
}
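
A logical type annotates an underlying primitive type. `timestamp-millis` is stored as a long counting milliseconds since the Unix epoch (1 January 1970 00:00:00 UTC). The conversion can be sketched in Python with the standard library; the helper names are hypothetical, and the datetime is assumed to be timezone-aware.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def to_timestamp_millis(dt: datetime) -> int:
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch."""
    # Dividing timedeltas keeps the arithmetic in integers (no float rounding).
    return (dt - EPOCH) // timedelta(milliseconds=1)


def from_timestamp_millis(millis: int) -> datetime:
    """Convert milliseconds since the Unix epoch back to a UTC datetime."""
    return EPOCH + timedelta(milliseconds=millis)


create_stamp = to_timestamp_millis(datetime(2020, 1, 1, tzinfo=timezone.utc))
```

The long value in `create_stamp` is what would be written for the `createStamp` field.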

The Complex Case of Decimals

Floats and doubles are floating binary point types. They represent a number in base 2, like this: 10001.100010110011

Decimal is a floating decimal point type. It represents a number in base 10, like this:
12345.656788

Some decimal values cannot be represented exactly as floats or doubles; 0.1 is a common example

People use floats and doubles for scientific computation, where small rounding errors are acceptable, because they are fast

People use decimals for money, which is why the type was created. Use decimal when you need exact results
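
The difference is easy to see in Python, where `float` is a binary floating point type and `decimal.Decimal` is a decimal one. This is a minimal illustration of the rounding issue, not Avro-specific code.

```python
from decimal import Decimal

# Binary floating point: 0.1 and 0.2 have no exact base-2 representation,
# so their sum is slightly off from 0.3.
float_sum = 0.1 + 0.2
float_is_exact = float_sum == 0.3

# Decimal floating point: the same values are stored exactly in base 10.
decimal_sum = Decimal("0.1") + Decimal("0.2")
decimal_is_exact = decimal_sum == Decimal("0.3")
```

Here `float_is_exact` is false and `decimal_is_exact` is true, which is why money amounts are usually kept in a decimal type.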

Note

Avoid using decimals as a logical type for now. Use a string instead.
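
One way to follow this advice is to declare the field as a string in the schema and convert at the application boundary. The sketch below assumes a hypothetical `amount` field and uses JSON as a stand-in for the serialized record, since it relies only on the standard library.

```python
import json
from decimal import Decimal


def encode_amount(amount: Decimal) -> str:
    """Turn a Decimal into the string stored in the record."""
    return str(amount)


def decode_amount(text: str) -> Decimal:
    """Parse the stored string back into an exact Decimal."""
    return Decimal(text)


# "amount" is a hypothetical field declared as {"name": "amount", "type": "string"}.
record = {"amount": encode_amount(Decimal("19.90"))}
wire = json.dumps(record)  # stand-in for the serialized record
restored = decode_amount(json.loads(wire)["amount"])
```

The value survives the round trip exactly, including the trailing zero, because it never passes through a float.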

Flashcards

In Avro, adding a field to a record without a default value is a breaking schema evolution

Which is an optional field in an Avro record?:: doc