Setting Up Tools

Let’s get everything (well, a starting version of “everything”) set up for Data Cleaning, EDA, and modeling.  We are going to use the Anaconda distribution of Python.  Anaconda is, well, awesome.  You’ll learn why over the course of the semester.  Do not, under any circumstances, associate yourself with Python 2. 

We’ll be using CLIs in this class. 

  • If you’re on Mac, learn how to start Terminal.  If you want to up the power, check out iTerm2. 
    • I encourage you to make your terminal your own.  I use oh-my-zsh, a framework for managing your Zsh configuration, which has all sorts of amazing plugins like autocomplete.
  • If you’re on Windows, probably install PowerShell at minimum.  Upgrading to WSL2 (Ubuntu is great) might be a good option, but YMMV.
    • If you choose to upgrade to WSL2, you can also use Zsh and oh-my-zsh.  That said, bash is tried and true, and there is a bash configuration manager that I used for years called bash-it.  Check it out!

Alright, time for the work to begin… (If these directions seem unnecessarily non-specific, you’ll thank me later…) 

  1. Download and install Anaconda Individual Edition for your platform. Then do the following:
    1. Familiarize yourself with Anaconda Navigator.
    2. Now, let’s do it from the command line.  conda is a command line tool that lets you create environments, install packages, and handle related tasks.  You can find out lots more about conda from the conda website.
    3. Download/Bookmark this conda cheatsheet
  2. With conda, create a new Python environment named mlenv, based on Python 3.9.
  3. Activate the mlenv environment you created in step 2.
  4. Use conda to install the following additional packages into your shiny new mlenv environment:
    1. numpy
    2. pandas
    3. jupyter and jupyterlab
    4. matplotlib
    5. seaborn
    6. scipy
    7. scikit-learn
    8. statsmodels
  5. If you don’t know Python, Fasten Your Seatbelts… (if you know some Python, you might peruse the following so we are all on the same page):
    1. Go through Chapter 2 and Chapter 3 of the McKinney book.  This should get you to the level of “dangerous” in Python.
    2. Here are some additional resources if you want to dig deeper (which is never a bad idea):
      • Matthes, Eric. Python Crash Course, 2nd edition. No Starch Press.  (You can find this book on the same website as the other books.)  You really only need to peruse Section 1.  It is only 209 pages…
      • YouTube: freeCodeCamp’s Learn Python – Full Course For Beginners [Tutorial].  Note: I haven’t watched it from beginning to end.  There might be things that conflict with the steps you’ve done above.  Don’t redo them the FCC tutorial way.  I expect you’ll see this when he talks about installing PyCharm and such.
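
Once you think you’re done with step 4, a quick sanity check never hurts.  Here is a minimal sketch (not an official Anaconda tool) that reports which of the packages above Python can actually find in the active environment.  The function names python_ok and missing_packages are made up for this example, and the import names are my assumptions: scikit-learn imports as sklearn, and I’m assuming jupyter shows up as jupyter_core and JupyterLab as jupyterlab.

```python
import importlib.util
import sys

# Import names for the packages in step 4. These are assumptions: the import
# name can differ from the conda package name (scikit-learn -> sklearn), and
# jupyter is checked through jupyter_core.
IMPORT_NAMES = [
    "numpy", "pandas", "jupyter_core", "jupyterlab",
    "matplotlib", "seaborn", "scipy", "sklearn", "statsmodels",
]


def python_ok(version_info=sys.version_info, minimum=(3, 9)):
    """Return True if the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


def missing_packages(import_names):
    """Return the import names that cannot be found in this environment."""
    return [name for name in import_names
            if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    print("Python", sys.version.split()[0],
          "(ok)" if python_ok() else "(older than 3.9)")
    missing = missing_packages(IMPORT_NAMES)
    if missing:
        print("Not found:", ", ".join(missing))
    else:
        print("All packages found.")
```

Save it to a file, activate your environment, and run it with python.  If something shows up as not found, check that the right environment is active before you reinstall anything.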

More documentation for Anaconda and conda can be found on the Anaconda Documentation Website.