Communication is an integral part of all data analysis. Whether someone is presenting results or general updates to an audience of any size, or you are simply discussing work someone has done, it's important to participate in these discussions. This involves both asking useful questions when you are in the audience and presenting your own work in a useful way.
A quality data processing system, model, or design will serve some data mining goal and provide either understanding or prediction in some way. It can be tricky to figure out how that is happening.
A data project will touch the full range shown below, though it may be directly responsible for only a slice of that range. The connection from top to bottom (business to math) should be coherent and complete. The technology stack in the middle tends to run in a well-engineered manner if it is configured correctly at all; it is usually at the ends that problems occur. Reviewing this chain is a critical component of achieving the human validation goal of communicating in math.
A common frustration for an audience is not understanding where a project's focus lies in the above stack. Conversely, failing to consider the remaining parts of the stack can lead to incorrect results or problems executing the project.
Some presentations rely on particular communication styles to compensate for less rigorous work. These methods are effective at impressing certain audiences regardless of the quality of data analysis work behind them.
This person will mention techniques such as Random Forests and how many features they are using, typically while showing poor results that indicate a lack of familiarity with model selection or basic performance metrics. They know their audience has taken the same Intro to ML MOOC they have. They try to build rapport and confidence by mentioning key shared concepts, then establish authority and experience by citing the large numbers involved in their work to distinguish it from a typical homework assignment.
This person will mention bleeding-edge machine learning techniques but display a poor grasp of when they are appropriate to use. They are riding the AI Hype Wave and making a lot of promises based on technology with a short track record.
This person will talk a lot about the technology stack they are using. Look at how many machines are in their cluster! They will use the latest cloud-number-crunching hardware and probably a NoSQL database just to show you how much muscle is underneath the hood. The problems come when you ask what they plan to use that monster truck for.
Try at home, kids!
– approximately as impolite as each other, depending on whom you are asking.
Answer the above questions before they are asked!