This is going to be a legacy project for OII SDP 2016. It will contain a step-by-step guide with resources relevant to scientific programming and data analysis.
This guide is meant to be dead simple to follow and to use the best resources we know of. Just work through it line by line, and skip any steps you are already familiar with.
Raise any issue you have with this guide, whether it is a problem of understanding or a technical problem, here. Please search for your issue before you create a new one. You will need a GitHub account for this, which is free. You will want to have one later on anyway ;)
Great! If you want to help us write this guide, contact one of the contributors of this repository to become one yourself. Don’t be afraid if you’re not technically inclined. Help with rephrasing and correcting text is very welcome and easy to do with the approach we’re using. It’s almost as easy as writing in Word.
Python is an easy-to-learn programming language and quite effective for data analysis, which is why we are starting with it.
Complete the course at Codecademy. If you’re lucky, this will take you one rainy weekend. If you’re unlucky, the sun will come out. Nobody said it’s going to be easy ;)
Codecademy follows a freemium model, so you can pay for additional quizzes and projects during the course. You can do those if you want, but we don’t think you really need them. The free material is enough to give you a good foundation.
Type which python and hit Enter. You should get output along these lines:
$ which python
/Users/YOURUSERNAME/anaconda/bin/python
Now type python (hitting Enter will be omitted from now on ;) ) and you should see a Python shell like the one you had in the Codecademy course.
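Once you are inside the Python shell, you can double-check from within Python which interpreter is actually running. The following is only a small sketch using the standard library; the variable name interpreter_path is ours, and the exact path printed will depend on your own installation.

```python
import sys

# Path of the interpreter that is running this code. With Anaconda
# installed, it would typically point somewhere inside the anaconda folder.
interpreter_path = sys.executable
print(interpreter_path)

# Major, minor and patch version of the running interpreter
version = tuple(sys.version_info[:3])
print(version)
```

If the printed path does not mention anaconda, your shell is probably picking up a different Python installation.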
A good text editor will make your life much more convenient.
There are a lot of pretty ideological discussions about which text editor to use. You don’t have to think about this right now. Just use Atom for the moment: Get it here.
We consider it a good choice for beginners because its functionality can be extended with a great variety of packages, which makes it popular with experienced programmers, while it is still easy to configure and intuitive to use. Furthermore, it is made by GitHub, the largest collaboration and version control network out there (more about this later), and therefore integrates very well with it.
Pandas is a Python package that makes your life more pleasant when working with data in tables (and also with more complicated stuff). Just think about everything you could do in Excel, then add speed, flexibility and control. And you already have Pandas installed together with Anaconda. That’s great, isn’t it?
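To give a first taste of what "Excel, but with more control" can look like, here is a minimal sketch. The table and its column names (city, year, visitors) are made up purely for illustration; filtering and grouping are just two of the many things Pandas can do.

```python
import pandas as pd

# A small, made-up table (think of it as a tiny spreadsheet)
df = pd.DataFrame({
    "city": ["Oxford", "Oxford", "London", "London"],
    "year": [2015, 2016, 2015, 2016],
    "visitors": [120, 150, 900, 950],
})

# Keep only the rows for 2016, similar to a filter in Excel
recent = df[df["year"] == 2016]
print(recent)

# Average number of visitors per city, similar to a pivot table
mean_by_city = df.groupby("city")["visitors"].mean()
print(mean_by_city)
```

The same few lines would work unchanged on a table with millions of rows, which is where Pandas starts to pay off compared to a spreadsheet.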
Even though we don’t like to offer too many options in this guide, we have to provide two here, but only because one of them is a paid resource:
Here are a few things you could find useful on your further journey:
Test Driven Development, or TDD, is a great approach to development that will save you hours of debugging.
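The basic idea of TDD is to write a small test first and only then the code that makes it pass. The sketch below illustrates this with plain assert statements; the function mean and the two test functions are hypothetical examples of ours, and in a real project you would probably use a test runner instead of calling the tests by hand.

```python
# Step 1: write the tests first. They describe what we expect from mean().
def test_mean_of_numbers():
    assert mean([1, 2, 3]) == 2


def test_mean_rejects_empty_input():
    try:
        mean([])
    except ValueError:
        return  # this is the behaviour we want
    raise AssertionError("expected a ValueError for empty input")


# Step 2: write just enough code to make the tests pass.
def mean(values):
    """Return the arithmetic mean of a non-empty list of numbers."""
    if not values:
        raise ValueError("mean() needs at least one value")
    return sum(values) / len(values)


# Step 3: run the tests. An AssertionError here means something is broken.
test_mean_of_numbers()
test_mean_rejects_empty_input()
print("all tests passed")
```

Whenever you change mean later on, rerunning the tests tells you right away whether you broke something.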