General Python Resources
- Python Programming Language, the official Python website
- The Python Tutorial
- The Python Standard Library
- How to Think Like a Computer Scientist – The Python Version
- Think Python, an online book by Allen Downey
- Learn Python the Hard Way, another online book on Python
- Learning with Python – Interactive Edition, an interactive textbook on Python programming
Important Tools and Libraries
- IPython: An enhanced REPL for interactive Python development. Extremely useful for testing ideas one line of code at a time.
- matplotlib: A plotting library capable of generating publication-quality visualizations programmatically. Its MATLAB-like pyplot interface makes basic plotting easy.
- NumPy: The fundamental package for scientific computing with Python.
- SciPy: An open-source library for mathematics, science, and engineering.
- scikit-learn: A robust machine learning library built on top of NumPy, SciPy, and matplotlib. Includes a wide variety of modeling techniques.
- pandas (Python Data Analysis Library): Data structures and tools for common data analysis tasks, including an efficient DataFrame implementation (similar to data frames in R).
- BeautifulSoup: A general parsing library particularly useful for parsing HTML and XML.
- NLTK: Natural Language Toolkit for Python, including tools for text preprocessing, tokenization, and vectorization (you may also be interested in an online book that shows how NLTK is used).
- NetworkX: A Python library for the creation, manipulation, and analysis of graphs and networks.
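
To give a feel for the array-based style that NumPy enables (and that most of the libraries listed above build on), here is a minimal sketch. It assumes NumPy is installed; the sample values and variable names are invented purely for illustration.

```python
import numpy as np

# Hypothetical sample data: five measurements (values invented for illustration).
values = np.array([2.0, 4.0, 6.0, 8.0, 10.0])

# Basic summary statistics computed over the whole array.
mean = values.mean()
std = values.std()  # population standard deviation (ddof=0 by default)

# Vectorized arithmetic: subtract the mean from every element without a loop.
centered = values - mean

print("mean:", mean)
print("std:", std)
print("centered:", centered)
```

Typing these lines one at a time in IPython is a convenient way to inspect each intermediate result.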
References for Data Analysis in Python
- Python Scientific Lecture Notes: including detailed notes on NumPy, Matplotlib, and SciPy
- NumPy and SciPy Documentation: including NumPy User Guide and Cookbook
- Tentative NumPy Tutorial: from SciPy.org
- Getting Started with Python for Data Science: from Kaggle.com – includes information about installing Python and relevant libraries.
- Introduction to NumPy and Matplotlib – YouTube video
- Matplotlib User’s Guide
- Matplotlib Pyplot Tutorial
- IPython Documentation
- Natural Language Processing with Python: Online book on text processing and analysis using NLTK.
Installation of Python and Scientific Libraries
- WinPython – (Windows) Preferred Python distribution for this class (includes scientific and data analysis libraries such as NumPy, pandas, and scikit-learn, as well as IPython).
- Anaconda – (Mac, Windows, Linux) Python distribution for large-scale data processing and scientific computing (includes scientific and data analysis libraries such as NumPy, pandas, and scikit-learn, as well as IPython).
- Notepad++: Excellent Python-friendly text editor
- Installing NumPy and SciPy
- Installing scikit-learn
- Installing Pandas
- Standalone Python Distributions