Data Oriented Programming Practice
How do we program? Programming is an activity we spend a large part of our lifetime on, so let’s make it an enjoyable one!
When I am a happy programmer, I am a good programmer. I want to have an easy time programming. I want to produce source code that I like. I don’t want to write code that will keep me from making progress at some point in the future. Or, to put it another way, I don’t want to run into roadblocks that I created myself earlier.
You could call it a hangover: the situation where reworking the whole thing feels like an inhumane task, a total defeat, while carrying on in the same manner does not seem viable either, because it causes pain. Usually the way out is some workaround or ugly hack that calls the whole architecture so far into question.
I want to refine my practice to avoid such situations in the future. I think a key aspect is to make iteration and refactoring a natural part of the process.
Also, using the code or working with it should not impose any particular behaviour on the caller, i.e., the source code should be compatible with other programming styles and practices.
Imagine working in an existing codebase and adding some new functionality. At first, let’s forget about all entry points, the places the code should later be plugged into, etc., and just open a scope:
{
//...
}
and ask:
What data do we operate on?
What are the core lines of code that solve the problem?
Then we write down these lines of code at an adequate level of quality, including the necessary example input and output data. Then, if that is not yet possible, we arrange things so that this code snippet can be compiled and run in isolation!
Example: we want to print an intensity image to the console using ASCII characters. So let the input data be a two-dimensional array of intensities (float) and the output data a two-dimensional array of ASCII characters, both stored row by row in flat buffers, with a newline character at the end of each output row:
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>
int main() {
    {
        //input data
        size_t width = 50;
        size_t height = 20;
        std::vector<float> inputMem(width * height);
        //fill with some values
        for (size_t hI = 0; hI < height; hI++) {
            for (size_t wI = 0; wI < width; wI++) {
                inputMem[hI * width + wI] =
                    std::sin(10.0f * 3.1415f * (float)wI / (float)width) *
                    std::sin(12.0f * 3.1415f * (float)hI / (float)height);
            }
        }
        float * input = inputMem.data();
        //output data: width characters plus one '\n' per row
        size_t outputSize = width * height + height;
        std::vector<char> outputMem(outputSize);
        char * output = outputMem.data();
        //map to ascii art
        const size_t mpN = 6;
        std::array<char, mpN> mapping = {' ', '.', ':', 'o', '=', '@'};
        for (size_t hI = 0; hI < height; hI++) {
            for (size_t wI = 0; wI < width; wI++) {
                auto inputIdx = hI * width + wI;
                auto outputIdx = hI * (width + 1) + wI;
                //clamp to [0, 1], then scale and truncate to an index in [0, mpN - 1]
                output[outputIdx] =
                    mapping[static_cast<size_t>(
                        (std::max(0.0f, std::min(1.0f, input[inputIdx])) + 0.05f) *
                        (mpN - 1))];
            }
            output[hI * (width + 1) + width] = '\n';
        }
        //print
        std::fwrite(output, sizeof(char), outputSize, stdout);
    }
    return 0;
    //...
}
When run, this produces:
o==o o==o o==o o==o o==o
.oo. .oo. .oo. .oo. .oo.
.oo. .oo. .oo. .oo. .oo.
o==o o==o o==o o==o o==o
o==o o==o o==o o==o o==o
.oo. .oo. .oo. .oo. .oo.
.oo. .oo. .oo. .oo. .oo.
o==o o==o o==o o==o o==o
o==o o==o o==o o==o o==o
.oo. .oo. .oo. .oo. .oo.
.oo. .oo. .oo. .oo. .oo.
o==o o==o o==o o==o o==o
o==o o==o o==o o==o o==o
.oo. .oo. .oo. .oo. .oo.
.oo. .oo. .oo. .oo. .oo.
o==o o==o o==o o==o o==o
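The characters in this output follow from the index computation inside the mapping loop: a clamped intensity v selects mapping[(v + 0.05) * 5], truncated to an integer. As a minimal sketch for checking this by hand, the snippet below pulls that computation into a standalone helper; the name charFor is hypothetical and not part of the original code.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>

// Same character table as in the example above.
constexpr std::size_t mpN = 6;
const std::array<char, mpN> mapping = {' ', '.', ':', 'o', '=', '@'};

// Hypothetical helper (not in the original code): the intensity-to-character
// computation from the mapping loop, isolated so it can be checked directly.
char charFor(float intensity) {
    // Clamp to [0, 1]; negative intensities therefore end up as ' '.
    float clamped = std::max(0.0f, std::min(1.0f, intensity));
    // Shift by 0.05, scale to [0.25, 5.25] and truncate to an index in [0, 5].
    auto idx = static_cast<std::size_t>((clamped + 0.05f) * (mpN - 1));
    assert(idx < mpN);
    return mapping[idx];
}
```

For example, an intensity of about 0.9 lands on '=' and one of about 0.56 on 'o', while '@' needs a clamped intensity of roughly 0.95 or more. The sampled sine products in this example stay below that value, so '@' does not show up in the output above.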
Let’s now refactor it into the style that I have found to suit me well. In fact, since I stopped thinking in an object oriented way and started to explore this approach (we could call it data oriented programming), I think I have become a more effective and also a happier programmer.
After the refactoring, using (i.e., calling) the code looks like this:
...
int main() {
{
using namespace ASCIIMapping;
Data data{};
data.params.width = 50;
data.params.height = 20;
run(data);
}
}
...
The refactoring itself is very simple. We put everything into a namespace that describes what the code does, in our case ASCIIMapping. Inside it we create a struct named Parameters, then a struct named Data, which has one member of type Parameters plus all the other data the code needs to run, and finally a function called run() that takes a reference to a Data instance:
#include <algorithm>
#include <array>
#include <cmath>
#include <cstddef>
#include <cstdio>
#include <vector>
namespace ASCIIMapping {
const size_t mpN = 6;
struct Parameters {
    size_t width{};
    size_t height{};
    std::array<char, mpN> mapping = {' ', '.', ':', 'o', '=', '@'};
};
struct Data {
    Parameters params{};
    std::vector<float> inputMem{};
    std::vector<char> outputMem{};
};
void run(Data & on) {
    auto width = on.params.width;
    auto height = on.params.height;
    //input data
    on.inputMem.resize(width * height);
    //fill with some values
    for (size_t hI = 0; hI < height; hI++) {
        for (size_t wI = 0; wI < width; wI++) {
            on.inputMem[hI * width + wI] =
                std::sin(10.0f * 3.1415f * (float)wI / (float)width) *
                std::sin(12.0f * 3.1415f * (float)hI / (float)height);
        }
    }
    float * input = on.inputMem.data();
    //output data: width characters plus one '\n' per row
    size_t outputSize = width * height + height;
    on.outputMem.resize(outputSize);
    char * output = on.outputMem.data();
    //map to ascii art
    for (size_t hI = 0; hI < height; hI++) {
        for (size_t wI = 0; wI < width; wI++) {
            auto inputIdx = hI * width + wI;
            auto outputIdx = hI * (width + 1) + wI;
            //clamp to [0, 1], then scale and truncate to an index in [0, mpN - 1]
            output[outputIdx] =
                on.params.mapping[static_cast<size_t>(
                    (std::max(0.0f, std::min(1.0f, input[inputIdx])) + 0.05f) *
                    (mpN - 1))];
        }
        output[hI * (width + 1) + width] = '\n';
    }
    //print
    std::fwrite(output, sizeof(char), outputSize, stdout);
}
} // namespace ASCIIMapping
This approach might appear old-fashioned to some. I agree, but maybe sometimes the old way is the better way?
Why call it data oriented?
Because it emphasizes the separation between functions and the data they operate on. In object oriented programming, an object encapsulates its data and offers methods to manipulate or handle that data. I tried to use that approach for quite some time, and apparently I managed to mess up almost every time. Separating data from functions clears my mind and helps me focus on actually solving the problem, instead of debating whether a class name is appropriate or which design pattern to use in a given case.
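To make the separation concrete, here is a minimal sketch, assuming one wanted to split the single run() into stages. The namespace ASCIIMappingStages and the functions mapToAscii() and print() are hypothetical names, not part of the original code. Every stage is a free function that takes the same plain Data struct, so a caller can fill inputMem from any source before mapping it.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <vector>

// Hypothetical variant of ASCIIMapping: the same data, split into stages.
namespace ASCIIMappingStages {
constexpr std::size_t mpN = 6;

struct Parameters {
    std::size_t width{};
    std::size_t height{};
    std::array<char, mpN> mapping = {' ', '.', ':', 'o', '=', '@'};
};

struct Data {
    Parameters params{};
    std::vector<float> inputMem{};   // width * height intensities, row by row
    std::vector<char> outputMem{};   // (width + 1) * height chars, '\n' ends each row
};

// Stage 1: map intensities to characters. The caller owns how inputMem is filled.
void mapToAscii(Data & on) {
    const auto width = on.params.width;
    const auto height = on.params.height;
    assert(on.inputMem.size() == width * height);
    on.outputMem.resize((width + 1) * height);
    for (std::size_t hI = 0; hI < height; hI++) {
        for (std::size_t wI = 0; wI < width; wI++) {
            // Clamp to [0, 1], then scale and truncate to an index in [0, mpN - 1].
            float v = std::max(0.0f, std::min(1.0f, on.inputMem[hI * width + wI]));
            auto idx = static_cast<std::size_t>((v + 0.05f) * (mpN - 1));
            on.outputMem[hI * (width + 1) + wI] = on.params.mapping[idx];
        }
        on.outputMem[hI * (width + 1) + width] = '\n';
    }
}

// Stage 2: print the result; this stage only reads the data.
void print(const Data & on) {
    std::fwrite(on.outputMem.data(), sizeof(char), on.outputMem.size(), stdout);
}
} // namespace ASCIIMappingStages
```

A caller would set params.width and params.height, fill inputMem (for instance with the sine pattern from the example), then call mapToAscii(data) and print(data). Since the data stays public, outputMem can also be inspected or handed to other code without going through any accessor.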
Note on namespaces
I have found namespaces to be very useful in this context: I put all the meaning of what the code does into the name of the namespace. This allows for structs simply called Data and Parameters, because the namespace sets up the context. We can then conveniently pull a namespace in with a using namespace directive if we want to, or we can be very specific and spell out the whole chain of namespaces so that the reader knows exactly what we are talking about.
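As a minimal sketch of this, assuming a hypothetical enclosing namespace Imaging and a second, hypothetical Thresholding component (neither appears in the original), both components can define their own Data and Parameters without a clash:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical namespaces for illustration only.
namespace Imaging {
namespace ASCIIMapping {
struct Parameters { std::size_t width{}; std::size_t height{}; };
struct Data { Parameters params{}; std::vector<char> outputMem{}; };

// Size of the output buffer: width characters plus one '\n' per row.
inline std::size_t outputSize(const Data & on) {
    return (on.params.width + 1) * on.params.height;
}
} // namespace ASCIIMapping

namespace Thresholding {
struct Parameters { float threshold{0.5f}; };
struct Data { Parameters params{}; std::vector<float> values{}; };

// Counts the values strictly above the threshold.
inline std::size_t countAbove(const Data & on) {
    std::size_t n = 0;
    for (float v : on.values) {
        if (v > on.params.threshold) n++;
    }
    return n;
}
} // namespace Thresholding
} // namespace Imaging
```

A caller can write Imaging::ASCIIMapping::Data to be fully explicit, or put using namespace Imaging::Thresholding; into a local scope and then simply write Data and countAbove().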