Software Environment for Economics

When you've been killfiled by Simon Cozens, you've been killfiled by someone who knows how, when, and why.

As a researcher using computers, you need to be someone who knows "how, when, and why" to use computers. You don't need to know how your computer works in detail, but you do need to know a fair amount about the content (vs. the algorithms) of the software you use. Computers are good at

Statistics

For most of us, the calculations we're most interested in are statistical. For statistical software, there are several proprietary packages licensed by the University or Shako, such as SPSS and EViews.

The advantage of these packages is that a lot of attention was paid to making it easy to input data and perform common (sometimes very advanced) analyses.

There are many open source (free) software packages available as well, the most popular being based on the R and Python languages. As someone recently wrote on Twitter,

R is a shockingly dreadful language for an exceptionally useful data analysis environment. The more you learn about the R language, the worse it will feel. The development environment suffers from literally decades of accretion of stupid hacks from a community containing, to a first-order approximation, zero software engineers. R makes me want to kick things almost every time I use it.

But note that it is an "exceptionally useful data analysis environment." For probability-based statistics, including Bayesian analysis, there's a package for everything. If you can do it in SPSS or EViews, you can do it in R. If you need to do it 100 times, with slightly different parameters each time, it's probably more convenient to do it in R (and there may already be a package that does that for you). Most advanced techniques (e.g., relogit that Mr. Ma used in his thesis) are available in R long before they become available in the proprietary packages.
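The "do it 100 times, with slightly different parameters" idea is not specific to R. Here is a minimal sketch of it in Python, the other language discussed in these notes. Everything in it is an assumption made for illustration: the data are simulated, and the analysis (a trimmed mean, written here as the hypothetical helper trimmed_mean) merely stands in for whatever estimation you actually need to repeat.

```python
import random
import statistics


def trimmed_mean(values, trim_fraction):
    """Mean after dropping the given fraction of observations from each tail."""
    ordered = sorted(values)
    k = int(len(ordered) * trim_fraction)
    kept = ordered[k:len(ordered) - k] if k > 0 else ordered
    return statistics.mean(kept)


# Hypothetical data: 200 simulated draws, seeded so the run is reproducible.
rng = random.Random(42)
sample = [rng.gauss(0.0, 1.0) for _ in range(200)]

# 100 slightly different parameter values: 0.000, 0.004, ..., 0.396.
trim_fractions = [i / 250 for i in range(100)]

# One pass over the parameter grid; each result is stored under its parameter.
results = {t: trimmed_mean(sample, t) for t in trim_fractions}

for t in trim_fractions[:3]:
    print(f"trim={t:.3f}  mean={results[t]:.4f}")
```

The point is the shape of the program: the analysis is a function, the parameters are a list, and the loop does the tedious part. In a point-and-click package the same job means repeating the dialog 100 times.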

The environments described so far (SPSS, EViews, R) are very package-oriented for our purpose. Learn to use the available packages (included in SPSS and EViews, included and from CRAN on the Internet for R) to make the most of these. R has its own included package manager, and it's tedious but not hard to use it to explore CRAN, find packages (you can also Google), and download and install them. R is available from the R Project, and is easy to install.

Python as a language is the opposite: a language designed by and for software engineers, with a dash of audacious genius mixed in. As a data analysis environment, in some ways R is superior (in availability of advanced packages for "traditional" statistical analysis), and in others Python has advantages (it is the premier environment for developing machine learning packages).

Installing software

Python.org provides installers for Mac and Windows. However, it is probably better to use something that provides a complete, consistent environment, including other tools besides the language alone. The most popular environment for non-software-developers seems to be Anaconda. Here's how to install it. (Don't worry if the discussion in the first few pages makes no sense yet.) Anaconda is a very general environment that can be used with several languages, of which Python is only one. R is another. Anaconda itself is built on Python.

You should choose Anaconda3, using Python 3. (If for some reason you think you need to use Python 2, talk to me. Sometimes it is necessary, but it should be avoided where possible.)
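After installing, it is worth confirming that the interpreter you are actually running is Python 3. The following is a small sketch of such a check; the helper name is_python3 is hypothetical, not part of Python or Anaconda.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the given version tuple is Python 3 or later."""
    return version_info[0] >= 3


# Show the version of the interpreter that is running this script.
print("Python", sys.version.split()[0])

if not is_python3():
    raise SystemExit("These notes assume Python 3.")
```

If this exits with the error message, your system is probably picking up an older Python ahead of the one Anaconda installed.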

Serious programmers may prefer other package management systems, such as Cygwin or MinGW on Windows, MacPorts or Homebrew on Mac, or various Linux and BSD distributions for "traditional" Unix-like platforms.

Programming

For Python and R programming, there is a "literate programming" environment called Jupyter. In fact, Jupyter can be used with "dozens of programming languages" according to its website.

What is it? Literate programming is a concept advocated by Donald Knuth (a CS professor at Stanford who also invented TeX), where a program and its explanation (e.g., a user guide to the program, or a scientific paper using the program) are developed at the same time in the same source document. The Mathematica symbolic mathematics package introduced notebooks, a WYSIWYG [1] implementation of literate programming. Jupyter is a general implementation that can be used with many programming languages. Jupyter is available via Anaconda (I think it's included by default). Other (simpler) editors for Python only include ipython (the predecessor to Jupyter) and IDLE (which is a package in the standard library of Python, and is available in every Python installation unless somebody deleted it on purpose).

Serious programmers may prefer a traditional programmer's editor such as vi (the VIM implementation is very popular) or emacs (for personal reasons I use XEmacs but you should probably use GNU Emacs). Emacsen provide the AUCTeX environment for writing TeX documents, which is the best environment. You may be able to get these editors through your package manager, such as conda.

The biggest advantages of Jupyter over traditional programmer's editors are

Version control

For any digital information you create for your research, it is important to be able to reproduce different versions of it. For large-scale datasets, you should make backups to read-only media such as DVD or BD. These backups should be in raw form, exactly as you collected them. If you "clean" your data, removing partial observations, impossible observations, or "outliers", you should do this with a program, and record each version of that program.
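To make "clean your data with a program" concrete, here is a minimal sketch. The column names, the sample rows, the 0 to 120 age rule, and the CLEANING_VERSION label are all assumptions invented for illustration; your own rules will differ. What matters is that the raw data are never edited and that every dropped observation is accounted for.

```python
import csv
import io

CLEANING_VERSION = "1"  # hypothetical label; bump it whenever the rules change

# Hypothetical raw data; in practice this would be read from your raw backup.
RAW_CSV = """id,age,income
1,34,52000
2,,48000
3,-5,39000
4,51,61000
"""


def clean(rows):
    """Return (kept, dropped) without modifying the input rows."""
    kept, dropped = [], []
    for row in rows:
        if any(value == "" for value in row.values()):
            # A field is missing: partial observation.
            dropped.append((row["id"], "partial observation"))
        elif not 0 <= int(row["age"]) <= 120:
            # The value cannot be true: impossible observation.
            dropped.append((row["id"], "impossible age"))
        else:
            kept.append(row)
    return kept, dropped


raw_rows = list(csv.DictReader(io.StringIO(RAW_CSV)))
kept, dropped = clean(raw_rows)

print(f"cleaning rules v{CLEANING_VERSION}: kept {len(kept)}, dropped {len(dropped)}")
for row_id, reason in dropped:
    print(f"  dropped id={row_id}: {reason}")
```

Because the rules live in a program rather than in hand edits, anyone with the raw backup and this version of the script can regenerate exactly the same cleaned dataset.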

For programs and textual documents, you should use a version control system (VCS) such as git or Mercurial. [2] git is far more popular, but many people consider Mercurial easier to use. Try git first.

Both of these systems have graphical interfaces, and can probably be installed via your package manager.

LaTeX

I prefer "plain text" documents because you can use diff on them more easily. For many documents, especially quick web pages and even lecture notes, reStructuredText (implemented in Python) and markdown (many implementations) are very useful.
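To see why plain text is so convenient to diff, here is a small sketch using Python's standard difflib module, which produces the same unified format as the diff command. The two drafts and the file names draft_v1.txt and draft_v2.txt are made up for illustration.

```python
import difflib

# Two hypothetical versions of the same plain-text document, one line per item.
old = ["Introduction\n", "We estimate a logit model.\n", "Conclusion\n"]
new = ["Introduction\n", "We estimate a rare-events logit model.\n", "Conclusion\n"]

# unified_diff yields header lines, then changed lines marked with - and +.
diff_lines = list(
    difflib.unified_diff(old, new, fromfile="draft_v1.txt", tofile="draft_v2.txt")
)
print("".join(diff_lines), end="")
```

Only the changed line shows up, marked with - for the old text and + for the new. A word-processor file is a binary blob to these tools, so the same comparison tells you little more than "the files differ".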

Python programming

Programming in Python

Footnotes

[1] What You See Is What You Get
[2] The "SCM" in the URLs stands for either "source code management" or "software configuration management", but "VCS" is a lot more accurate.