Domain Driven, Data Oriented Design
I spend most days thinking about architecture and software design. I have two projects that I use for both client work and for teaching teams of developers how to think about and implement software engineering projects.
Many of the design philosophies I teach focus on how to reduce or tame complexity in systems. This quote from John Ousterhout is always front and center in my mind.
"As a program evolves and acquires more features, it becomes complicated, with subtle dependencies between components. Over time, complexity accumulates, and it becomes harder and harder for programmers to keep all the relevant factors in their minds as they modify the system. This slows down development and leads to bugs, which slow development even more and add to its cost. Complexity increases inevitably over the life of any program. The larger the program, and the more people that work on it, the more difficult it is to manage complexity." - John Ousterhout
Here are three base level design philosophies that drive many of the decisions I make in software design.
- “Don’t make things easy to do, make things easy to understand.”
- “Every encapsulation should define a new semantic where one is absolutely precise.”
- “Keep things simple until you can’t.”
I have a lot of quotes that have helped shape the design philosophies, guidelines, and rules that I teach. They can be found in this document.
https://github.com/ardanlabs/gotraining/blob/master/topics/go/README.md
One of the very first lessons I teach in my software design class is how to architect a project to allow for clean mental models and long term project maintainability. I teach this through the idea of defining clear boundaries or firewalls within the scope of a project. It’s critical to define the rules and guidelines for each of these firewalled areas of the project. If you don’t, complexity takes over and the project becomes hard to maintain, manage, and debug. I’m sure many of you have experienced this.
My projects define three types of firewalls: packaging, layering, and domains. Each of these firewalls handles a different level of data integrity, isolation, transformation, and flow. It’s important to keep in mind that every piece of software you write is about the understanding and management of data. If you don’t understand the data, you don’t understand the problem.
Here is a TL;DR of those three types of firewalls:
- Packaging: Packaging defines the smallest unit of data integrity, isolation, transformation, and flow. Packages create firewalls between the distinct individual problems that need to be solved and separated. It requires an understanding of API purpose, type systems, and design.
- Layering: Layering defines the next unit of data integrity, isolation, transformation, and flow. Layers create firewalls between the distinct roles of responsibility that need to be managed and separated. It requires an understanding of data responsibility, shaping, and validation.
- Domains: Domains define the largest unit of data integrity, isolation, transformation, and flow. Domains create firewalls between the distinct groups of data that need to be managed and separated. It requires an understanding of data relationships and scalability.
In an interview given to Brian Kernighan by Mihai Budiu in the year 2000, Brian was asked the following question:
- “Can you tell us about the worst features of C, from your point of view?”
This was Brian’s response:
- “I think that the real problem with C is that it doesn’t give you enough mechanisms for structuring really big programs, for creating "firewalls" within programs so you can keep the various pieces apart. It’s not that you can’t do all of these things, that you can’t simulate object-oriented programming or other methodology you want in C. You can simulate it, but the compiler, the language itself isn’t giving you any help.”
You have to remember, the Go language designers know Brian and have worked with him. I think it’s safe to say that if Brian believes this, the Go language designers believe this as well. I think the idea of packaging in Go comes from this belief about C and wanting to provide a solution.
To start, you need to understand that your Go project isn’t a monolith of code, but a collection of static libraries that are called packages. These packages are automatically built by the compiler when it finds a folder that contains Go code. These packages solve the problem described by Brian. They create firewalls within the program and the compiler is doing this for you.
The idea of building static libraries isn’t new to Go. You could have been writing software as a collection of static libraries for your whole career. However, since this has not been done for you automatically, the majority of the code you’ve written for your projects has represented a monolith of code.
I think it's important to appreciate the following items:
- Packaging directly conflicts with how we have been taught to organize source code in other languages.
- In other languages, packaging is a feature that you can choose to use or ignore.
- You can think of packaging as applying the idea of microservices to a source tree.
- All packages are "first class," and the only hierarchy is what you define in the source tree for your project.
- There needs to be a way to “open” parts of the package to the outside world.
- Two packages can’t cross-import each other. Imports are a one way street.
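In Go, the way a package is “opened” to the outside world is capitalization: identifiers that start with an uppercase letter are exported, and everything else stays behind the package firewall. A single runnable file can only hold one package, so this minimal sketch uses comments to mark what an importing package could and could not touch. The `Counter` type and its methods are hypothetical. The one-way import rule is enforced by the go tool as well: if two packages import each other, the build fails with an import cycle error.

```go
package main

import "fmt"

// Imagine this code lives in its own package (for example, a hypothetical
// package "counter"). Only identifiers starting with an uppercase letter
// would be visible to packages that import it.

// Counter is exported, so other packages can declare values of this type.
type Counter struct {
	count int // unexported: only code inside this package can touch it
}

// Increment is exported and is the only way outside code can change count.
func (c *Counter) Increment() {
	c.count++
}

// Value is exported and gives read-only access to the unexported field.
func (c *Counter) Value() int {
	return c.count
}

// reset is unexported, so it stays behind the package firewall.
func (c *Counter) reset() {
	c.count = 0
}

func main() {
	var c Counter
	c.Increment()
	c.Increment()
	fmt.Println(c.Value()) // prints 2

	// main can call reset only because it is in the same package here.
	c.reset()
	fmt.Println(c.Value()) // prints 0
}
```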
The core design philosophy behind Package Oriented Design is that a package should have a clear and obvious purpose with a focus on usability. Then based on its purpose, a package should be implemented with the appropriate amount of portability. Here is a larger breakdown of what I think it means for a package to be purposeful, usable, and portable.
- To be purposeful, a package should provide, not contain.
  - Packages should be named with the intent to describe what they provide.
  - Packages should not become a dumping ground of disparate concerns.
- To be usable, a package should be designed with the user as the primary focus.
  - Packages should be intuitive and simple to use.
  - Packages should respect their impact on resources and performance.
  - Packages should protect the user’s application from cascading changes.
  - Packages should reduce, minimize, and simplify the user’s code base.
- To be portable, a package should be designed with reusability in mind.
  - Packages should aspire to the highest level of portability for their purpose.
  - Packages should reduce setting policy when it’s reasonable and practical.
  - Packages should not become a single point of dependency.
The layer a package lives in plays a big role in the design of that package’s API. Each layer comes with constraints and policies that the package must adhere to. When a package stops following the rules, it will result in the project losing its ability to be properly maintained, managed, and debugged.
Let me break down the four layers I use in my projects (Api, App, Business, and Storage) so I can better explain the influence a layer has on package API design.
Figure 1 shows a global representation of the different layers and domains the Service project has defined.
- Api: This layer mirrors the App layer APIs, allowing data to be received and sent over protocol-based transports like HTTP, gRPC, or even StdIn/Out. This layer doesn’t have its own data models; it uses the App layer data models, which know how to decode and encode themselves (like JSON). This layer performs the first series of data validation by transforming encoded data from the protocol stream into the appropriate App layer data models.
- App: This layer implements the external user APIs needed to support the application. The data models in this layer define the expected input, output, and the encoding scheme (like JSON) for each API. The data models also define any constraints the data must be validated against. The App layer calls into the Business layer to perform any business logic needed to satisfy the application requirements.
- Business: This layer implements the business APIs needed to support the App layer. The data models in this layer represent the schema for the application, and they don’t conform to any App or Storage encoding scheme. The App and Storage layers are expected to represent these models within their own layer. Through the transformation of the data models between these layers, the application will have high levels of data integrity. This layer defines the storage APIs that need to be implemented by the Storage layer.
- Storage: This layer implements the business storage APIs to support the storage needs of the Business layer. The data models in this layer define how data will be stored and retrieved with specific storage encoding information. The data accepted from and returned to the Business layer will always be in the form of the Business data models. This layer can trust the Business layer with any data it’s asked to store.
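To make the layer rules more concrete, here is a minimal sketch of how a single entity might be represented in each layer, assuming a hypothetical “user” domain. Every name here (`AppNewUser`, `NewUser`, `User`, `Storer`, `Core`, `dbUser`, `memStore`) is illustrative and is not the Service project’s actual API. The App model carries JSON tags and validation, the Business models carry no encoding details and define the `Storer` API, and the Storage model carries `db` tags (plain struct tag strings, since no database driver is involved) while only accepting Business models at its boundary.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"strings"
)

// ---- App layer (hypothetical) ----

// AppNewUser defines the expected input and its JSON encoding.
type AppNewUser struct {
	Name  string `json:"name"`
	Email string `json:"email"`
}

// Validate checks the (deliberately simplistic) App layer constraints.
func (a AppNewUser) Validate() error {
	if a.Name == "" {
		return errors.New("name is required")
	}
	if !strings.Contains(a.Email, "@") {
		return fmt.Errorf("email %q is invalid", a.Email)
	}
	return nil
}

// toBusNewUser transforms the App model into the Business model.
func toBusNewUser(a AppNewUser) NewUser {
	return NewUser{Name: a.Name, Email: a.Email}
}

// ---- Business layer (hypothetical) ----

// NewUser and User represent the schema with no encoding details.
type NewUser struct {
	Name  string
	Email string
}

type User struct {
	ID    int
	Name  string
	Email string
}

// Storer is the storage API the Business layer requires.
type Storer interface {
	Create(u User) error
}

// Core holds business logic and depends only on the Storer interface.
type Core struct {
	storer Storer
	nextID int
}

// Create assigns an ID and asks storage to persist the Business model.
func (c *Core) Create(nu NewUser) (User, error) {
	c.nextID++
	u := User{ID: c.nextID, Name: nu.Name, Email: nu.Email}
	if err := c.storer.Create(u); err != nil {
		return User{}, err
	}
	return u, nil
}

// ---- Storage layer (hypothetical) ----

// dbUser describes how a user is stored; the db tags are illustrative.
type dbUser struct {
	ID    int    `db:"user_id"`
	Name  string `db:"name"`
	Email string `db:"email"`
}

// memStore implements Storer and converts Business models on the way in.
type memStore struct {
	rows map[int]dbUser
}

func (m *memStore) Create(u User) error {
	m.rows[u.ID] = dbUser{ID: u.ID, Name: u.Name, Email: u.Email}
	return nil
}

func main() {
	// Api layer role: decode the protocol payload into the App model.
	var app AppNewUser
	if err := json.Unmarshal([]byte(`{"name":"Bill","email":"bill@example.com"}`), &app); err != nil {
		fmt.Println("decode:", err)
		return
	}
	if err := app.Validate(); err != nil {
		fmt.Println("validate:", err)
		return
	}

	core := &Core{storer: &memStore{rows: map[int]dbUser{}}}
	u, err := core.Create(toBusNewUser(app))
	if err != nil {
		fmt.Println("create:", err)
		return
	}
	fmt.Printf("%+v\n", u) // prints {ID:1 Name:Bill Email:bill@example.com}
}
```

Notice that the storage type never leaks out of its layer and the Business types never carry tags; each boundary is crossed through an explicit transformation.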
- "Data dominates. If you've chosen the right data structures and organized things well, the algorithms will almost always be self-evident. Data structures, not algorithms, are central to programming." - Rob Pike
Design Philosophy
- If you don't understand the data, you don't understand the problem.
- All problems are unique and specific to the data you are working with.
- Data transformations are at the heart of solving problems. Each function, method, and workflow must focus on implementing the specific data transformation required to solve the problems.
- If your data is changing, your problems are changing. When your problems are changing, the data transformations need to change with them.
- Uncertainty about the data is not a license to guess but a directive to STOP and learn more.
- Solving problems you don't have creates more problems you now do.
- If performance matters, you must have mechanical sympathy for how the hardware and operating system work.
- Minimize, simplify and REDUCE the amount of code required to solve each problem. Do less work by not wasting effort.
- Code that can be reasoned about and does not hide execution costs can be better understood, debugged and performance tuned.
- Coupling data together and writing code that produces predictable access patterns to the data will be the most performant.
- Changing data layouts can yield more significant performance improvements than changing just the algorithms.
- Efficiency is obtained through algorithms but performance is obtained through data structures and layouts.
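As a small illustration of the points about predictable access patterns and data layout, here is a sketch that holds the same values in a contiguous slice and in a linked list. Both functions compute the same result; the difference is that the slice is walked front to back through one block of memory, which is generally friendlier to CPU caches and hardware prefetchers, while the list chases pointers to nodes that were each allocated separately. No timings are claimed here; the type and function names are hypothetical, and the way to see the effect on your own hardware is a Go benchmark using `testing.B`.

```go
package main

import "fmt"

// node is a linked-list element; each node is a separate allocation, so
// consecutive values are not guaranteed to be adjacent in memory.
type node struct {
	value int
	next  *node
}

// sumSlice walks a contiguous block of memory front to back, which is a
// predictable access pattern.
func sumSlice(values []int) int {
	total := 0
	for _, v := range values {
		total += v
	}
	return total
}

// sumList chases pointers; where the next value lives depends on where
// each node happened to be allocated.
func sumList(head *node) int {
	total := 0
	for n := head; n != nil; n = n.next {
		total += n.value
	}
	return total
}

// buildList creates a linked list holding the same values, in order.
func buildList(values []int) *node {
	var head *node
	for i := len(values) - 1; i >= 0; i-- {
		head = &node{value: values[i], next: head}
	}
	return head
}

func main() {
	values := make([]int, 1000)
	for i := range values {
		values[i] = i
	}
	fmt.Println(sumSlice(values), sumList(buildList(values))) // prints 499500 499500
}
```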
Think about any line of code you have ever written. That line of code can only be doing one of three things.
- Allocating memory
- Reading memory
- Writing memory
Take this to the next level: what’s being allocated, read, and written is just numbers. All of these reads and writes of numbers are what cause everything to happen all around the planet.
Think about any function you have ever written. That function is only doing one thing, a data transformation. Functions take input, they transform that input, and produce output. The input can come from parameters that are passed in or pulled from inside the function. The output can be returned from the function or passed from inside the function.
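Here is a minimal sketch of the two shapes a data transformation can take in Go. The function names are hypothetical, and reading the “pulled from inside” and “passed from inside” cases as channel receives and sends is my assumption for illustration; files, network connections, or shared state would fit the same description.

```go
package main

import "fmt"

// double receives its input as a parameter and returns its output.
func double(n int) int {
	return n * 2
}

// doubleAll pulls its input from inside the function (receiving on in)
// and passes its output from inside the function (sending on out).
func doubleAll(in <-chan int, out chan<- int) {
	for n := range in {
		out <- double(n)
	}
	close(out)
}

func main() {
	fmt.Println(double(21)) // prints 42

	in := make(chan int)
	out := make(chan int)
	go doubleAll(in, out)

	// Feed the input and close the channel so doubleAll can finish.
	go func() {
		for _, n := range []int{1, 2, 3} {
			in <- n
		}
		close(in)
	}()

	for v := range out {
		fmt.Println(v) // prints 2, 4, 6 on separate lines
	}
}
```

Either way, the function is still one transformation: data in, data changed, data out.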
As discussed earlier, each package represents an API and creates firewalls between the different parts of the project. The API is represented by functions that accept data as input and return data as output. The data is defined through a type system. The key is that each package needs to define its own type system for data input and output. One exception to this rule is when you need to write a polymorphic function. In this case, the function will use the type system from the package that has defined the interface.
Polymorphism plays a role in API design and it needs to be understood. This is the definition of polymorphism I use.
- “Polymorphism means a piece of code changes its behavior based on the concrete data it’s operating on.”
When designing an API there are two choices. The API can ask for data based on what the data is (concrete type) or based on what the data can do (interface type). When using an interface type, the API can be considered a polymorphic API.
When a method is executed from an interface value, it’s polymorphism because that method call changes its behavior depending on the concrete data stored inside the interface. It can also be called runtime polymorphism because it’s not until runtime that it is known what the behavior will be. Go also supports compile time polymorphism with the introduction of generics.
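Here is a minimal sketch of both API choices and of runtime polymorphism, using hypothetical `email` and `sms` types. `sendEmail` asks for data based on what it is, while `send` asks for data based on what it can do; which `notify` method runs inside `send` is decided at runtime by the concrete value stored in the interface.

```go
package main

import "fmt"

// notifier describes what the data can do: send a notification.
type notifier interface {
	notify(msg string) string
}

// email and sms are concrete types with different behavior.
type email struct{ address string }

func (e email) notify(msg string) string {
	return "email to " + e.address + ": " + msg
}

type sms struct{ number string }

func (s sms) notify(msg string) string {
	return "sms to " + s.number + ": " + msg
}

// sendEmail asks for data based on what it is (a concrete type).
func sendEmail(e email, msg string) string {
	return e.notify(msg)
}

// send asks for data based on what it can do (an interface type), so it is
// a polymorphic API. The behavior changes with the concrete data inside n.
func send(n notifier, msg string) string {
	return n.notify(msg)
}

func main() {
	fmt.Println(sendEmail(email{address: "bill@example.com"}, "hi")) // prints email to bill@example.com: hi
	fmt.Println(send(sms{number: "555-0100"}, "hi"))                   // prints sms to 555-0100: hi
}
```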
A generic function is also a polymorphic function. A generic type represents a concrete type that will be known at compile time. The nice thing about a generic type is code gets to be written as if the generic type is a concrete type. This makes code easier to read and avoids the need for the reflect package and API.
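A minimal sketch of compile-time polymorphism with generics (Go 1.18 or later). The `number` constraint and the `Sum` function are hypothetical names. Inside `Sum`, the code uses `+=` and a zero value of `T` exactly as it would with a concrete type, and the compiler knows the actual type at each call site, so no `reflect` calls are needed.

```go
package main

import "fmt"

// number is a constraint listing the types Sum may be instantiated with.
type number interface {
	~int | ~int64 | ~float64
}

// Sum is written as if T were a concrete type.
func Sum[T number](values []T) T {
	var total T
	for _, v := range values {
		total += v
	}
	return total
}

func main() {
	fmt.Println(Sum([]int{1, 2, 3}))      // prints 6
	fmt.Println(Sum([]float64{1.5, 2.5})) // prints 4
}
```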
As a general rule, don’t design with interfaces, discover them. Start with concrete types and then identify if the function can be polymorphic. This will allow for identifying if the polymorphism should be runtime or compile-time and if runtime, defining the most precise interface (method-set of behavior) as required.
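Here is a minimal sketch of discovering an interface instead of designing one up front. Everything here is hypothetical: the code starts with a concrete `fileSource` and a concrete `collectFromFile`, and only when a second source (`counterSource`) appears does it become clear that the function needs nothing but `Pull`. That single method becomes the whole method set of `puller`.

```go
package main

import "fmt"

// Step 1: start concrete. fileSource hands out stored lines one at a time.
type fileSource struct {
	lines []string
	pos   int
}

// Pull returns the next line, and false once the data is exhausted.
func (f *fileSource) Pull() (string, bool) {
	if f.pos >= len(f.lines) {
		return "", false
	}
	line := f.lines[f.pos]
	f.pos++
	return line, true
}

// collectFromFile is the first, concrete version of the API.
func collectFromFile(f *fileSource) []string {
	var out []string
	for line, ok := f.Pull(); ok; line, ok = f.Pull() {
		out = append(out, line)
	}
	return out
}

// Step 2: a second data source shows up with the same behavior.
type counterSource struct {
	n, limit int
}

func (c *counterSource) Pull() (string, bool) {
	if c.n >= c.limit {
		return "", false
	}
	c.n++
	return fmt.Sprint(c.n), true
}

// puller is the interface discovered from the concrete code: collecting
// only ever calls Pull, so that one method is the precise method set.
type puller interface {
	Pull() (string, bool)
}

// collect replaces collectFromFile and accepts either source.
func collect(p puller) []string {
	var out []string
	for line, ok := p.Pull(); ok; line, ok = p.Pull() {
		out = append(out, line)
	}
	return out
}

func main() {
	fmt.Println(collectFromFile(&fileSource{lines: []string{"a", "b"}})) // prints [a b]
	fmt.Println(collect(&counterSource{limit: 3}))                        // prints [1 2 3]
}
```

Whether `collect` should instead be a generic function constrained by `puller` is the runtime versus compile-time decision described above; this sketch takes the runtime route.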
Whatever the choice, the types the API needs to allow the data flow must be defined by the package itself and not outside the package.
Contact Bill Kennedy at [email protected] if you are having issues getting the project running.