Cory's Wiki

Intro to Data Analysis

Cory Root 2017/05/10

This is a high-level introduction to different types of data analysis. It is aimed at readers familiar with computer science who might be interested in working on one or more data projects.

It should help you decide how to approach and think about data projects, whether you are joining an existing one or starting a new one, and how to have productive conversations with other people about data.

Data Landscapes

Data analysis is used for different tasks in different contexts, but a good first way to distinguish projects is by how closely they relate to existing business operations.

Three common domains of data work are:

Underlying all of these efforts is Data Administration, where all the data wrangling is done. When starting a project with new data, it isn't unusual to spend 95% or more of the time on data wrangling.
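To give a flavor of what data wrangling involves, here is a minimal sketch that cleans a tiny, hypothetical CSV export using only the Python standard library: it trims whitespace, normalizes case, parses numbers written with thousands separators, treats empty fields as missing, and drops duplicate rows. The column names and values are invented for illustration; real data usually needs many more rules than this.

```python
import csv
import io

# Hypothetical raw export: inconsistent casing, stray whitespace,
# a thousands separator, a duplicate row, and a missing value.
RAW = """customer,region,revenue
 Alice ,north,"1,200"
bob,North,300
bob,North,300
Carol,south,
"""

def clean_rows(text):
    """Parse CSV text and return a list of cleaned, de-duplicated records."""
    seen = set()
    cleaned = []
    for row in csv.DictReader(io.StringIO(text)):
        name = row["customer"].strip().title()
        region = row["region"].strip().lower()
        raw_revenue = row["revenue"].strip().replace(",", "")
        # Treat an empty field as missing (None) rather than as zero.
        revenue = float(raw_revenue) if raw_revenue else None
        key = (name, region, revenue)
        if key in seen:  # drop rows that are identical after normalization
            continue
        seen.add(key)
        cleaned.append({"customer": name, "region": region, "revenue": revenue})
    return cleaned

rows = clean_rows(RAW)
for row in rows:
    print(row)
```

Note that the duplicate "bob" rows are only detected because names and regions are normalized first; deciding the order of cleaning steps like this is a large part of why wrangling takes so much time.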

Talking about Data


First, figure out exactly what you want to do. If you are interested in learning general machine learning or database theory, then you want to study the academic side of things. This is fun and has that theoretical cool factor, but on its own it doesn't pay the bills.

It is quickest and easiest to start from the business intelligence side. Choose an application: something that is relevant to your work or business now. Stay grounded in existing applications, and look for known problems that are similar to yours.
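A typical first business intelligence question is a simple summary of data the business already collects, such as total revenue per region. The sketch below assumes a hypothetical list of sales records (the field names `region`, `month`, and `revenue` are illustrative, not taken from any real system) and uses only the Python standard library.

```python
from collections import defaultdict

# Hypothetical records standing in for data a business already has.
SALES = [
    {"region": "north", "month": "2017-04", "revenue": 1200.0},
    {"region": "north", "month": "2017-05", "revenue": 300.0},
    {"region": "south", "month": "2017-04", "revenue": 450.0},
    {"region": "south", "month": "2017-05", "revenue": None},  # missing value
]

def revenue_by_region(records):
    """Sum revenue per region, skipping records with no revenue recorded."""
    totals = defaultdict(float)
    for record in records:
        if record["revenue"] is None:
            continue
        totals[record["region"]] += record["revenue"]
    return dict(totals)

summary = revenue_by_region(SALES)
for region, total in sorted(summary.items()):
    print(f"{region}: {total:.2f}")
```

The same aggregation is commonly written with pandas as `df.groupby("region")["revenue"].sum()`, which also skips missing values by default. Starting from a small, concrete question like this keeps the project tied to a problem someone already cares about.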

Machine Learning Resources

Math and Science Approaches

Python is by far the most widely used language for data work, and it also happens to be one of the easiest programming languages to learn. Because there is an overwhelming amount of discussion and documentation out there, I recommend avoiding blogs and similar sources at first and sticking to some of these core sources to begin with.