Cory's Wiki

Because data analysis is so heavily mathematical, most of its problems are logical errors. We therefore have to ask a lot of logical questions to check that everything is doing what we want. If you have worked in SQL or scripting languages, you probably have a good idea of how easily logical problems crop up.
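As a small illustration of the kind of logical problem meant here, consider a join that runs without any error yet quietly changes the answer. The sketch below is only an assumption-laden example: the `orders` and `tickets` data and the `inner_join` helper are made up for illustration and do not come from any particular tool.

```python
# Hypothetical data: each order belongs to one customer, and each
# customer may have several support tickets.
orders = [
    {"order_id": 1, "customer": "a", "amount": 100},
    {"order_id": 2, "customer": "b", "amount": 50},
]
tickets = [
    {"ticket_id": 10, "customer": "a"},
    {"ticket_id": 11, "customer": "a"},
    {"ticket_id": 12, "customer": "b"},
]


def inner_join(left, right, key):
    """Naive inner join: one output row per matching (left, right) pair."""
    return [{**l, **r} for l in left for r in right if l[key] == r[key]]


joined = inner_join(orders, tickets, "customer")

# Revenue computed from the original orders versus from the joined rows.
true_revenue = sum(o["amount"] for o in orders)
joined_revenue = sum(row["amount"] for row in joined)

# The logical question to ask: did the join keep exactly one row per order?
rows_preserved = len(joined) == len(orders)

print(true_revenue, joined_revenue, rows_preserved)  # 150 250 False
```

Nothing here raises an exception. Customer "a" has two tickets, so that order appears twice after the join and the revenue total is inflated. Only the explicit row-count question exposes the problem.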

Many data analysts who come from a software engineering background struggle here. They are used to compile-time and run-time error checking, or to test frameworks that emulate those checks. We want the comfort of automatic, machine-run tests. That comfort can be misleading in the realm of data, where validity tends to come from abstract math and logic.

In a way, presenting a data analysis to others is the human test. The audience carries a heavy responsibility to verify that everything presented holds up. This is the mathematical and logical equivalent of passing the “but it ran on my machine!” test.