Python Tools for Scientists: An Introduction to Using Anaconda, JupyterLab, and Python's Scientific Libraries
By Lee Vaughan
About this ebook
Python Tools for Scientists will introduce you to Python tools you can use in your scientific research, including Anaconda, Spyder, Jupyter Notebooks, JupyterLab, and numerous Python libraries. You’ll learn to use Python for tasks such as creating visualizations, representing geospatial information, simulating natural events, and manipulating numerical data.
Once you’ve built an optimal programming environment with Anaconda, you’ll learn how to organize your projects and use interpreters, text editors, notebooks, and development environments to work with your code. Following the book’s fast-paced Python primer, you’ll tour a range of scientific tools and libraries like scikit-learn and seaborn that you can use to manipulate and visualize your data, or analyze it with machine learning algorithms.
You’ll also learn how to:
- Create isolated projects in virtual environments, build interactive notebooks, test code in the Qt console, and use Spyder’s interactive development features
- Use Python’s built-in data types, write custom functions and classes, and document your code
- Represent data with the essential NumPy, Matplotlib, and pandas libraries
- Use Python plotting libraries like Plotly, HoloViews, and Datashader to handle large datasets and create 3D visualizations
Regardless of your scientific field, Python Tools for Scientists will show you how to choose the best tools to meet your research and computational analysis needs.
Python Tools for Scientists - Lee Vaughan
INTRODUCTION
This book is for scientists and budding scientists who want to use the Python programming language in their work. It teaches the basics of Python and shows the easiest and most popular way to gain access to Python’s universe of scientific libraries, the preferred method for documenting work, and how to keep various projects separate and secure.
As a mature, open source, and easy-to-learn language, Python has an enormous user base and a welcoming community eager to help you develop your skills. This user base has contributed to a rich set of tools and supporting libraries (collections of precompiled routines) for scientific endeavors such as data science, machine learning, language processing, robotics, computer vision, and more. As a result, Python has become one of the most important scientific computing languages in academia and industry.
Popularity, however, comes with a price. The Python ecosystem is growing into an impenetrable jungle. In fact, this book sprang from conversations with scientific colleagues in the corporate world. New to Python, they were frustrated, stressed, and suffering from paralysis by analysis. At every turn, they felt they had to make critical and difficult decisions such as which library to use to draw a chart and which text editor to use to write their programs. They didn’t have the time or inclination to learn multiple tools, so they wanted to choose the option with the fewest repercussions down the road.
This book is designed to address those concerns. Its goal is to help you get started with scientific computing as quickly and painlessly as possible. Think of it as a machete for hacking through the dense jungle of Python distributions, tools, and libraries (Figure 1).
Figure 1: Hacking your way through the Python jungle
To reach this goal, I’ll help you make some decisions. As everyone’s needs are unique, these won’t always be perfect, but they should represent sensible, “no regret” choices that will position you to customize your setup later, after you have more experience.
To begin, you’ll use the free Anaconda Distribution of Python. As the most popular Python distribution platform, it has more than 30 million users worldwide. Provided by Anaconda, Inc. (https://www.anaconda.com/), it’s the platform of choice for Python data science. Anaconda will make it easy to install Python, set up your computing environment, and keep it organized and up to date over time.
Please note that this book is intended for scientists who write scripts for their own personal use or for that of their team. It’s not intended for professional software developers or engineers working on enterprise software. It also addresses only free, open source software. Your place of work may use proprietary or commercial libraries that supersede those listed here.
Finally, this book won’t show you how to do science, or data analysis, or whatever your job entails. It won’t teach you how to use your operating system, and it won’t provide detailed instructions on how to use every important scientific library. Each of these requires large, dedicated volumes, which you can readily find in bookstores or online. Rather, this book will introduce you to basic tools and libraries useful across a wide range of scientific disciplines, help you to install them, and help you to get started using them. And, hopefully, it will take a lot of the stress out of setting up and using Python for science.
Why Python?
Because you’re reading this book, you’ve probably already made up your mind about using Python. If you’re still mulling it over, however, let’s look at some reasons why you might want to choose Python for scientific programming. Otherwise, feel free to skip to the next section, “Navigating This Book.”
Python’s design philosophy stresses simplicity, readability, and flexibility. These priorities make it a useful language for all stages of research and scientific endeavors, including general computing, design of experiments, building device interfaces, connecting and controlling multiple hardware/software tools, heavy-duty number crunching, and data analysis and visualization. Let’s take a look at some of the key features of Python and why they are great selling points for science:
Free and open source: Python is open source, which means that the original source code is freely available and may be redistributed and modified by anyone. It is continuously developed by a team of volunteers and managed by the nonprofit Python Software Foundation (www.python.org/). A strong point of open source software is that it’s hardened; that is, scrubbed of bugs and other problems by a large, involved user base. In addition, these users often publish and share their code so that the entire community has access to the latest techniques. On the downside, open source software can be more vulnerable to malicious users, less user friendly, and more poorly documented and supported than commercial alternatives.
High level: Python is a high-level programming language. This means that significant areas of the computing system, such as memory management, are automated and hidden from view. As a result, Python’s syntax is very readable by humans, making it easy to learn and use.
Interpreted: Python is an interpreted language, which means it executes instructions immediately (similar to applying a calculation in a spreadsheet) without a separate compile step. This gives you instant feedback, makes Python highly interactive, and helps you to catch errors as soon as they occur. It does slow the language down, however, compared to compiled languages such as Java and C++.
Platform neutral: Python runs on Windows, macOS, and Linux/Unix, and apps are available for Android and iOS.
Widespread support and shared learning: Millions of developers provide a strong support system for Python. Thanks to this large community, all the major Python products include online documentation, and you can easily find help and guidance through both free and fee-based online support sites and tutorials. Likewise, the number of Python-related print books and ebooks has exploded in recent years, and these titles cover a wide range of subjects for beginners through advanced users.
Python’s helpful user base is important, as the key to programming lies not in memorizing all the commands, but in understanding what you want to do. You will spend as much time in online search engines as you will in Python, and knowing how to construct a task-specific question (such as “How do I post text on an image in OpenCV?”) will become an essential skill (Figure 2).
Figure 2: The secret life of programmers
Among the more popular support sites is Stack Overflow (https://stackoverflow.com/). In many cases, you’ll find that your query has already been answered. If not, be sure to take the tour (https://stackoverflow.com/tour/) and visit the Asking section (https://stackoverflow.com/help/asking/) to review the proper way to post questions.
You can also find sites dedicated to the use of Python in specific sciences. For example, Practical Python for Astronomers (https://python4astronomers.github.io/) is a useful site for astronomers, and Analytics Vidhya (https://www.analyticsvidhya.com/) is designed for data scientists.
Batteries included: A motto of Python is “batteries included,” which means that it comes with all the possible parts required for full usability. In addition to a large standard library of useful tools, Python can be easily upgraded from a wide selection of third-party libraries. These are Python programs written and tested by experts in a field that you can apply in your own work. Some examples include OpenCV, used to work with image and video data; TensorFlow, used for machine learning projects; and Matplotlib, used for generating charts and diagrams. These libraries will greatly reduce the amount of code that you need to write to conduct experiments, analyze and visualize data, design simulations, and complete your projects.
Scalable: Python can easily handle the large datasets commonly used in science and engineering. Your main limitations will be the processing speed and memory of your computer. For comparison, Microsoft Excel spreadsheets have speed and stability issues with as few as tens of thousands of datapoints. Complex Excel projects become fragile as the number of spreadsheets grows, resulting in errors that are difficult to recognize, find, and fix.
Python supports both procedural and object-oriented programming that will help you write clear, logical code for both small- and large-scale projects. Python will also catch errors for you as soon as they occur.
Flexible: Python can handle multiple data formats and can run instrumentation and sensors for scientific experiments and data gathering. As a “glue” language, it’s easy to integrate with lower-level languages such as C, C++, and FORTRAN, and it’s useful for connecting multiple scripts or systems, including databases and web services. The large number of third-party libraries available makes Python extendable to many tasks.
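To make the “batteries included” and multiple-data-formats points a little more concrete, here is a minimal sketch that uses only the standard library to read the same kind of measurement from two formats and summarize it. The readings, the column name temperature_c, and the helper function names are all made up for illustration; they don’t come from the book.

```python
import csv
import io
import json
import statistics

# Made-up readings, invented purely for illustration.
CSV_TEXT = "sample,temperature_c\nA,20.5\nB,21.0\nC,19.5\n"
JSON_TEXT = (
    '[{"sample": "D", "temperature_c": 22.0},'
    ' {"sample": "E", "temperature_c": 18.0}]'
)


def read_csv_temperatures(text):
    """Return the temperature_c column of CSV text as a list of floats."""
    reader = csv.DictReader(io.StringIO(text))
    return [float(row["temperature_c"]) for row in reader]


def read_json_temperatures(text):
    """Return the temperature_c field of a JSON array as a list of floats."""
    return [float(record["temperature_c"]) for record in json.loads(text)]


# Combine both sources and summarize them with the statistics module.
temperatures = read_csv_temperatures(CSV_TEXT) + read_json_temperatures(JSON_TEXT)
mean_temperature = statistics.mean(temperatures)
print(f"n = {len(temperatures)}, mean = {mean_temperature:.2f}")
```

Nothing here needs to be installed separately: csv, io, json, and statistics all ship with Python. For larger datasets you would more likely reach for third-party libraries such as pandas.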
Navigating This Book
This book is designed for both true beginners and those familiar with Python but not Anaconda or some of the various scientific libraries. It’s designed to be “one-stop shopping” that will get you up and running with enough knowledge to begin working with data and writing your own programs.
True beginners who want a quick start learning Python should first read the chapters shown boxed in Figure 3, and then return to Part I to finish Chapters 5 and 6.
Figure 3: The fast track to learning Python
More experienced users might want to skip around (for example, omitting the Python primer). With that in mind, here’s a short synopsis of the book’s contents.
Part I: Setting Up Your Scientific Coding Environment
Part I provides instructions on how to install, launch, and navigate Anaconda, and how to use the conda package manager, an open source package and environment management system that runs on Windows, macOS, and Linux. In addition, you’ll be introduced to the world of shells, interpreters, text editors, notebooks, and integrated development environments (IDEs), including when and why you need them. Part I includes the following chapters:
Chapter 1, Installing and Launching Anaconda: How to install Anaconda on Windows, macOS, and Linux, followed by a tour of the Anaconda Navigator graphical user interface (GUI) and the alternative terminal-based command prompt.
Chapter 2, Keeping Organized with Conda Environments: Introduces the concept of virtual environments that let you isolate your projects and use different versions of Python and its scientific libraries. You’ll set up your first conda environment, a directory that contains a specific version of Python, into which you’ll install a specific collection of conda packages. This will allow you to keep your projects organized and prevent any conflicts among different versions of Python and/or the various libraries.
Chapter 3, Simple Scripting in the Jupyter Qt Console: Introduces the Jupyter (IPython) Qt console, a lightweight interface useful for interactive coding, quick concept testing, and data exploration.
Chapter 4, Serious Scripting with Spyder: Introduces Spyder, the Scientific Python Development Environment included with Anaconda. Spyder was designed by scientists, engineers, and data analysts, and provides the advanced editing, analysis, debugging, and profiling functionality of a comprehensive development tool with the data exploration, interactive execution, deep inspection, and visualization capabilities of a scientific application. If you’re completely new to Python, skip down to Part II, where you’ll use this tool and the Qt Console to learn the basics of Python.
Chapter 5, Jupyter Notebook: An Interactive Journal for Computational Research: Introduces the Jupyter (IPython) Notebook, a web-based interactive computing platform that combines live code, equations, descriptive text, interactive visualizations, and other types of media. Programs written in Jupyter can be extensively documented in-place and turned into publishable articles, interactive dashboards, and presentation-quality slideshows.
Chapter 6, JupyterLab: Your Center for Science: Introduces JupyterLab, a web-based interactive development environment for Jupyter notebooks, code, and data. JupyterLab’s flexible interface can be configured to support a wide range of workflows in data science, scientific computing, and machine learning. In fact, you may spend most of your “scientific computing life” here, especially if you’re a data scientist.
Part II: A Python Primer
Part II is a quick introduction to the Python programming language. If you’re already familiar with the basics, you can skip this part and just use it as a reference when needed. Part II includes the following chapters:
Chapter 7, Integers, Floats, and Strings: Introduces some of Python’s basic data types, operators, and error messages.
Chapter 8, Variables: Introduces variables and variable naming conventions.
Chapter 9, The Container Data Types: Introduces Python’s tuple, list, set, and dictionary data types.
Chapter 10, Flow Control: Introduces flow-control statements, line structure, and methods for handling exceptions (errors).
Chapter 11, Functions and Modules: Introduces important concepts like abstraction and encapsulation, used to make programs easier to read and maintain.
Chapter 12, Files and Folders: Introduces modules and functions for working with files, folders, and directory paths.
Chapter 13, Object-Oriented Programming: Introduces the basics of object-oriented programming (OOP), used to make programs easier to maintain and update.
Chapter 14, Documenting Your Work: Presents best practices for in-code documentation.
Part III: The Anaconda Ecosystem
Part III introduces the Anaconda Python ecosystem and includes high-level summaries of many important scientific and visualization libraries, such as NumPy, pandas, and Matplotlib, and how to choose among the many options available. Part III includes the following chapters:
Chapter 15, The Scientific Libraries: Overviews of the core scientific libraries grouped by function, such as data analysis, machine learning, language processing, computer vision, deep learning, and so on. Guidelines are provided for choosing among competing libraries, along with a discussion of methods and libraries for dealing with very large datasets.
Chapter 16, The InfoVis, SciVis, and Dashboarding Libraries: Overviews of the most important libraries used to plot statistical and 3-D data and generate dashboards. Guidelines are provided for choosing among competing libraries.
Chapter 17, The GeoVis Libraries: Overviews of the most important libraries used to plot geospatial data. Guidelines are provided for choosing among competing libraries.
Part IV: The Essential Libraries
Part IV introduces you to the basics of working with NumPy, Matplotlib, and pandas, the “Big Three” of Python scientific libraries. These important and wildly popular libraries form the foundation on which many others are built. Part IV includes the following chapters:
Chapter 18, NumPy: Numerical Python: Introduces NumPy, the module used for mathematical calculations in Python. Many useful scientific libraries such as pandas and Matplotlib are built on NumPy. This section covers some of its key concepts and base functionality.
Chapter 19, Demystifying Matplotlib: Covers the basics of Matplotlib, the granddaddy of plotting in Python, including some of its more confusing aspects.
Chapter 20, pandas, seaborn, and scikit-learn: Introduces pandas, the Python library designed for data loading, manipulation, and analysis. It offers data structures and operations for manipulating numerical tables and time series and includes data visualization functionality. This chapter is built around a machine learning classification problem that also involves seaborn, used for easier Matplotlib plotting, and scikit-learn, used for building predictive models.
Chapter 21, Managing Dates and Times with Python and Pandas: Addresses methods for working with dates and times in both native Python and pandas.
Appendix
The appendix presents answers to the “Test Your Knowledge” challenges throughout the book.
Updates and Errata
This book will likely have multiple printings, and you can check for any updates or corrections at https://www.nostarch.com/python-tools-scientists. In the event you find any typos or errors, please report them to [email protected]. Be sure to include the book’s title and the page numbers affected (ebook readers should mention the chapter and the subsection).
As Python, Anaconda, and the scientific libraries are constantly evolving, I provide links to their official sites where appropriate so that you can always find the most up-to-date information regarding these products.
Leaving Reviews
If you find this book helpful, please take the time to leave an online review, even if it’s just a ranking with stars. Your unbiased opinion will help other users navigate the increasingly crowded market of Python programming books.
PART I
SETTING UP YOUR SCIENTIFIC CODING ENVIRONMENT
In Part I, you’ll create a scientific coding environment to build upon for years to come. You’ll start by installing Anaconda, a distribution of Python that works on Windows, macOS, and Linux and provides access to the science libraries we’ll use in this book. You’ll then learn to use the conda package and environment manager to keep your projects organized and up to date. After that, you’ll familiarize yourself with the popular coding tools Jupyter Qt console, Spyder, Jupyter Notebook, and JupyterLab.
These coding tools help you write code, run code, and review the output, and are summarized in Table I-1. If you’re unsure of the meaning of any of the terminology in the table, see the “Terminology” sidebar.
Table I-1: Coding Tool Summaries
The Jupyter Qt console lets you execute commands inside windows called IPython interpreters and immediately displays the results. You can use this console to interact with and visualize data. It’s also great for learning Python.
The famous Jupyter Notebook is a web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text. It’s a wildly popular tool for data science that lets you do everything from exploring and cleaning data to producing polished and interactive reports, presentations, and dashboards. Using the cloud-based JupyterHub, you can serve Jupyter notebooks to multiple users such as a class of students or a scientific research group.
Spyder and JupyterLab are integrated development environments (IDEs). An IDE is an application that provides programmers with a set of tools for software development. For example, an IDE might include tools for debugging software and timing how long the code, or parts of the code, take to run. IDEs are built to work with specific application platforms and remove barriers involved in the development life cycle. They are generally used for more heavy-duty programming than is normally done in consoles or notebooks. JupyterLab, the next-generation user interface for Project Jupyter, combines the classic Jupyter Notebook with a user interface that offers an IDE-like experience. It will someday replace Jupyter Notebook.
These coding tools are products of Interactive Python (IPython), a command shell used for interactive computing. (A command shell exposes the operating system’s services to a program or human user.) IPython is still evolving, and in 2015 the project split so that the language-agnostic parts, such as the notebook format, Qt console, web applications, message protocol, and so on, were put in the Jupyter project.
The name Jupyter references the Julia, Python, and R languages, though the project supports more than 40 languages. After the split, some terms changed. Most notably, IPython Notebook became Jupyter Notebook. There is also some overlap in the functionality of IPython products. This can cause confusion, especially given the volume of online articles and tutorials that reference the old terminology. If you’re interested in the history of IPython and Jupyter Notebook, check out the datacamp blog post “IPython or Jupyter?” at https://www.datacamp.com/community/blog/ipython-jupyter/.
TERMINOLOGY
The following are some important terms that we’ll be using in Part I.
Debugging
A multistep process for finding, isolating, and resolving problems that prevent proper program operation, known as bugs. Debugging is usually performed with a program called, appropriately, a debugger. Debuggers run the problem program under controlled conditions in a step-by-step mode to track its operations. This typically involves running or halting the program at specific points, skipping over certain parts, displaying memory content, showing the position of errors that cause the program to crash, and so on.
Extensible
Extensibility is a principle used in software engineering and systems design that indicates whether a tool provides for future growth. JupyterLab, for example, is designed as an extensible environment. JupyterLab extensions are add-ons that provide new interactive features to the JupyterLab interface. For instance, JupyterLab LaTeX is an extension that lets you live-edit LaTeX documents, JupyterLab Plotly is an extension for rendering Plotly charts, and JupyterLab System Monitor lets you monitor your own resource usage, such as memory and CPU time. You can even write custom plug-ins for your own projects.
IDE
An IDE is a coding tool that integrates other specialized utilities into a single programming environment. Among these specialized tools are a text editor, a debugger, functions for autocompleting code, functions for highlighting mistakes, file managers, project managers, a performance profiler, a deployment tool, a compiler, and so on. By combining common software-writing tools into a single application, IDEs increase programmer productivity and make it easier to manage big projects with lots of interrelated scripts. The downside is that IDEs can be heavy, meaning they can take up a lot of system resources. They can also be a bit intense for beginners and those who need to write only relatively simple scripts.
Introspection
The ability to determine the type of an object and check its properties at runtime. In Python, an object is a code feature that has attributes and methods; you’ll learn more about these in Chapter 13. Code introspection dynamically examines these objects and provides information about them. When introspection is available, hovering the cursor over an object in your code will launch a pop-up window listing the type of object as well as useful tips about using it.
Kernel
The computational engine at the core of an operating system. It is always resident in memory, which means that the operating system is not permitted to swap it out to a storage device. The kernel manages disks, tasks, and memory and acts as a bridge between applications and the data processing performed at the hardware level.
Profiling
An analysis that measures the amount of time or memory required for a program, or a program’s components, to run. You can use profiling information to find bottlenecks, optimize code, and improve its performance. IDEs, such as Spyder, come with profiling tools built in.
Qt
Pronounced “cute,” this is a widget (“Windows gadget”) toolkit for creating graphical user interfaces and cross-platform applications that run on Windows, macOS, Linux, and Android.
Terminal
In modern usage, terminal refers to a terminal emulator rather than actual hardware such as a monitor and keyboard. Emulators provide a text-based interface at which to enter commands and may also be referred to as a command line interface (CLI), command prompt, console, or shell. The major operating systems all come with some type of terminal. Windows includes the Command Prompt executable, cmd.exe, for running Disk Operating System (DOS) commands and connecting to other servers. macOS ships with the aptly named Terminal, which you can use to run Unix commands within the operating system or to access other machines using Zsh (the Z shell). Unix normally includes a program called xterm, which can run Bash or other Unix shells.
Terminals are not very user friendly, but they allow access to information and software that sometimes is available only on a central computer, such as a File Transfer Protocol (FTP) server. Manipulating thousands of files and folders in the operating system is also easier in a terminal than in a graphics window. You can automate and expedite workflows on your computer, saving you time and aggravation. Additionally, you can run Python programs from a terminal as well as a lot of Anaconda operations (as an alternative to performing them with the Anaconda Navigator GUI). Best of all, knowing how to use a terminal will greatly impress your colleagues.
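Two of the terms in this sidebar, introspection and profiling, are easy to try for yourself in any Python interpreter. The following is a minimal sketch, not a description of how Spyder or JupyterLab implement these features: it uses the standard library’s inspect module for introspection and timeit as a simple stand-in for a full profiler. The mean() function and the sample data are hypothetical.

```python
import inspect
import timeit


def mean(values):
    """Return the arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


data = [1.0, 2.0, 3.0]

# Introspection: ask objects about themselves at runtime.
type_name = type(data).__name__           # name of the object's type
has_append = hasattr(data, "append")      # does the object have this method?
signature = str(inspect.signature(mean))  # parameters the function accepts
docstring = inspect.getdoc(mean)          # the function's documentation

print(type_name, has_append, signature)
print(docstring)

# Profiling (simplified): measure the total time for 1,000 calls to mean().
elapsed = timeit.timeit(lambda: mean(data), number=1000)
print(f"1,000 calls took {elapsed:.6f} seconds")
```

The pop-up help described under Introspection relies on the same kind of runtime information that type(), hasattr(), and inspect expose here. The timing you see will vary from machine to machine and from run to run.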
After you finish Chapter 4 in Part I, you can proceed to Part II, “A Python Primer,” for an introduction to Python programming. If you’re comfortable with Python, complete Part I and go straight to Part III, “The Anaconda Ecosystem,” to learn more about the essential packages for scientific computing.
1
INSTALLING AND LAUNCHING ANACONDA
Anaconda, the world’s most popular data science platform, provides access to a large collection of commonly used science libraries. This chapter walks you through the Anaconda installation process for Windows, macOS, and Linux. To verify the installation, you’ll launch Navigator, the graphical user interface (GUI) for Anaconda, and take a quick tour of its features.
About Anaconda
Among other features, Anaconda includes tools to help you write code and work with datasets; the Python language itself; collections of prewritten programs called packages; the Navigator GUI; and Nucleus, a community learning and sharing resource. Much of this content, summarized in Figure 1-1, is created and maintained by other organizations and distributed through Anaconda.
Figure 1-1: The key components of Anaconda
If you’re new to programming, you might be unfamiliar with the concept of packages. Packages are collections of modules, which are single programs that perform tasks that other programs can use. For example, a module might load an image and convert it from color to grayscale. Another module might resize or crop the image. Several of these image-manipulation modules might be grouped together into a package, and groups of packages form a library (Figure 1-2). The OpenCV computer vision library, for example, includes packages that do simple image manipulations, others that work with streaming video, and others still that perform machine learning tasks like detecting human faces.
Figure 1-2: The definitions of modules, packages, and libraries
Unfortunately, the terms module, package, and library are used interchangeably so often that they might as well refer to the same thing. To make matters worse, package may also refer to a unit of distribution, sharable with a community, that can contain a library, an executable, or both. So, you shouldn’t get too hung up on the definitions.
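To make the module-versus-package distinction concrete, here is a minimal sketch using Python's standard json library, which happens to be organized as a package containing several modules. The helper name is_package is made up for this illustration; it relies on the fact that imported packages carry a __path__ attribute, whereas single-file modules do not.

```python
import json            # a package: a folder that groups several modules
import json.decoder    # one module inside that package


def is_package(module) -> bool:
    """Return True if the imported object is a package (it has a __path__)."""
    return hasattr(module, "__path__")


print("json is a package:", is_package(json))
print("json.decoder is a package:", is_package(json.decoder))
```

The same check works on any library you import, which can help when the documentation uses the three terms loosely.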
Many of the scientific packages that ship with Anaconda require numerous dependencies (specific versions of other supporting packages) to run. They might require a specific version of Python, as well. To keep the various Python installations and other packages from interacting and breaking, and to keep them up to date, Anaconda uses a binary package and environment manager called conda. You can use conda to install thousands of packages from the Anaconda public repository. There are also tens of thousands of packages from community channels such as conda-forge. These are in addition to several hundred packages that are automatically installed with Anaconda.
Conda will make sure that all necessary dependencies are installed with each library, saving you considerable trouble. It will also alert you if you are missing a dependency. Lastly, to prevent various packages from conflicting, conda lets you create conda environments, which are secure, isolated laboratories for your science projects. Packages in a conda environment will not interfere with packages in other locations, and when you share an environment, you can be sure that all the necessary packages are included. You’ll learn how to create conda environments in Chapter 2.
When you download Anaconda, you get access to Anaconda.org, a package management system that makes it easy to find, access, store, and share public notebooks, environments, databases, and packages in both conda and the Python Package Index (PyPI). You can use it to share your work collaboratively on the cloud or search and download popular Python packages and notebooks. You can also build new conda packages using conda-build and then upload them to the cloud to share with others (or to access them from anywhere).
Anaconda is developed and maintained by Anaconda, Inc. In addition to the free Anaconda Distribution (previously called Anaconda Individual Edition) that we’ll be using, the company also provides commercial versions. You can find the official documentation for all editions at https://docs.anaconda.com/anacondaorg/. Anaconda is also a distribution of the R programming language, and conda provides package, dependency, and environment management for languages such as Ruby, Lua, Scala, Java, JavaScript, C/C++, FORTRAN, and more. In this book, however, we’ll focus solely on its use with Python.
You’ll need about 5GB of free hard drive space to install Anaconda. If you don’t have that much space, you can install Miniconda instead, a minimal installation that requires around 400MB and comes with Python but not the other preinstalled libraries. You also don’t need to uninstall any existing Python installations or packages before installing Anaconda.
In the event that you encounter problems, see the troubleshooting guide at https://docs.anaconda.com/anaconda/user-guide/troubleshooting/ and the FAQ at https://docs.anaconda.com/anaconda/user-guide/faq/. If you encounter any divergence in the instructions, defer to those in the installation wizard.
Installing Anaconda on Windows
You can find the official Windows-specific installation instructions at https://docs.anaconda.com/anaconda/install/windows/. Step 1 is to download the Anaconda Installer. You might need to choose between the 32- or 64-bit installer. Unless you have a very dated computer, you’ll want to click the 64-bit option. If you’re unsure, you should be able to verify your system type by navigating to Settings ▸ System ▸ About.
Clicking an installer downloads an .exe file into your Downloads folder (this can take a few minutes to complete). At this point, you have the option of checking the integrity of the installer using the SHA-256 checksum, which is a mathematical algorithm that checks files for corruption. Comparing a newly generated checksum against one generated ahead of time lets you detect errors introduced during data transmission. If you choose to run the checksum, see the instructions at https://docs.anaconda.com/anaconda/install/hashes/.
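If you already have a working Python, the checksum comparison described above can be sketched with the standard hashlib module. This is an illustration rather than Anaconda's official procedure: the function names are invented here, and the installer path and published hash are values you would supply yourself.

```python
import hashlib


def sha256_of_file(path, chunk_size=1024 * 1024):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so a large installer never sits fully in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, published_hash):
    """Compare the file's digest with the hash listed on the download site."""
    return sha256_of_file(path) == published_hash.strip().lower()
```

You would call matches_published_hash() with the path to the downloaded installer and the hash string copied from the Anaconda website; a result of False suggests the download was corrupted and should be repeated.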
To start the installation, right-click the downloaded .exe file and choose the Run as Administrator option from the pop-up window. As Administrator, you’ll have permission to install Anaconda anywhere you want on your system. The installer will ask you for permission to make changes to your computer. Click Yes. The setup wizard should now appear. Click Next and then agree to the license.
The next window asks you to choose the installation type. Select the recommended Just Me option and then click Next. Next, you’re asked to choose an installation location. The installer will suggest a folder on the C:\ drive under your username. Note that this path should contain only 7-bit ASCII characters (numbers, letters, and certain symbols) and no spaces. Make a note of this default location and then click Next.
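The path restriction is easy to check by hand, but as a rough sketch, the rule (7-bit ASCII only, no spaces) can be expressed in a few lines of Python. The function name and the example usernames are hypothetical, and the check deliberately ignores other characters that Windows itself forbids in paths.

```python
def is_safe_install_path(path: str) -> bool:
    """Return True if the path is pure 7-bit ASCII and contains no spaces."""
    return path.isascii() and " " not in path


# Hypothetical example paths:
for candidate in (r"C:\Users\alice\anaconda3",
                  r"C:\Users\Alice Smith\anaconda3",
                  r"C:\Users\José\anaconda3"):
    print(candidate, "->", is_safe_install_path(candidate))
```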
In the Advanced Installation Options window, register Anaconda as default Python and don’t add it to PATH. This is the recommended approach. It just means that you’ll need to open Anaconda Navigator or the Anaconda Command Prompt using the Start menu. By selecting the environment variable checkbox Add Anaconda3 to my PATH, you’ll be able to use Anaconda in the command prompt; however, this can cause problems down the road. Also, you can always add Anaconda to your PATH later. Click Install to continue. When the installation is complete, click Next.
After the installation window closes, you might be presented with the option to install the PyCharm or DataSpell IDE. If so, ignore it and click Next. We’ll be using the Spyder IDE, which comes preinstalled on Anaconda.
The installation should now be complete. In the final window, check the tutorial boxes if you want to view these later, and then click Finish. At this point, a window might open, welcoming you to Anaconda and inviting you to register for Anaconda Nucleus. You should also see an Anaconda3 folder in your Start Menu (Figure 1-3). This folder should contain a number of items, such as Navigator and prompts, which are terminals for entering text commands. You might also see icons for launching Jupyter and Spyder.
To verify that Anaconda loaded correctly, click the Windows Start button, navigate to the Anaconda3 app, and then launch Anaconda Navigator from the drop-down menu. You can also enter anaconda-navigator in the Anaconda Prompt terminal. The Navigator window doesn’t always pop up automatically, so be sure to check the taskbar at the bottom of your screen.
To see detailed information about your Anaconda distribution and Python version, type conda info in Anaconda Prompt.
Figure 1-3: The Anaconda3 program folder on the Windows Start menu
Installing Anaconda on macOS
You can install Anaconda Individual Edition on macOS using either a graphical setup wizard or through the command line. You can find instructions for both at https://docs.anaconda.com/anaconda/install/mac-os/. Choose the installer for your version of the operating system by scrolling to the bottom of the download page. When the download completes, you have the option of verifying the data’s integrity using the SHA-256 checksum algorithm (see the section “Installing Anaconda on Windows” on page 9). Then, double-click the downloaded file and click Continue to launch the installation process.
You’ll be taken through the obligatory Introduction, Read Me, and License screens. The Important Information box in the Read Me screen will include specific instructions in the event that you want to deviate from any of the recommended default choices. When you finish with these screens, click the Install button to install Anaconda in your ~/opt directory. This is the recommended location, though you have the option of changing it using the Change Install Location button.
On the next screen, choose Install for me only and then click Continue. You might now have the option to install the PyCharm or DataSpell IDE. We will be using the Spyder IDE that comes preinstalled with Anaconda, so skip this step by clicking Continue. At this point you should see a screen indicating a successful installation. I highly recommend taking the time to look at the quick start guide and tutorial.
To end the installation process, click Close.
To verify installation, click Launchpad and then select Anaconda Navigator. Alternatively, use CMD-SPACE to open Spotlight Search and then enter Navigator to open the program. You can also see detailed information on the installed Anaconda distribution and Python version by visiting the Mac terminal and entering conda info.
Installing Anaconda on Linux
Because there are so many flavors of Linux, I strongly recommend you visit the official Anaconda installation instructions at https://docs.anaconda.com/anaconda/install/linux/. If you are running Linux on an IBM PowerPC or Power ISA computer, see https://docs.anaconda.com/anaconda/install/linux-power8/. These sites will help you install the dependencies that you’ll need to use GUI packages with your particular Linux distribution. The instructions presented in this section are for the x86 architecture.
Linux has no graphical installation option for installing Anaconda, so you’ll need to use the command line for most of the process. To begin, scroll to the bottom of the download page and click the installer for your system. When the download completes, you have the option of verifying the data integrity using the SHA-256 checksum algorithm (see the section “Installing Anaconda on Windows” on page 9). Open a terminal and enter the following:
sha256sum /path/filename
Then, enter the following to begin installation:
bash ~/Downloads/Anaconda3-202x.xx-Linux-x86_64.sh
Note the date in the .sh filename above. This should be set to the name of the file you downloaded. If you did not download the installer to your Downloads directory, replace ~/Downloads/ with the correct path.
At the installer prompt, press ENTER to view the license terms and then enter yes to agree. Next, the installer will prompt you to press ENTER to accept the default install location, which is recommended, or specify an alternate installation directory. If you accept the default, the installer displays the following:
PREFIX=/home/
It then continues the installation, which may take a few minutes to complete. When the installer asks, “Do you wish the installer to initialize Anaconda3 by running conda init?” the recommended answer is “yes.” If for some reason you decide to say “no,” see the instructions and FAQ on the installation website.
When the installer finishes, you’ll see a message, thanking you for installing Anaconda. Ignore the link for installing the PyCharm or DataSpell IDE, as we’ll be using the Spyder IDE that comes preinstalled.
For the installation to take effect, you’ll need to either close and reopen your terminal window or enter the command source ~/.bashrc. To control whether each shell session has the base environment activated by default, run conda config --set auto_activate_base True. If base activation is not desired, set this to False. In general, you will want to use the base environment as the default.
To verify installation, open a terminal and type conda list. If Anaconda is working correctly, this will display a list of all installed packages and their version numbers. You can also enter anaconda-navigator to open Navigator.
Getting to Know Anaconda Navigator
Anaconda Navigator is a desktop GUI. It provides a friendly point-and-click alternative to opening a command prompt or terminal and using typed commands to manipulate Anaconda. You can use Navigator to launch applications, search for packages on Anaconda.org or in a local Anaconda repository, manage conda environments, channels, and packages, and access a huge volume of training material. It works on Windows, macOS, and Linux.
Launching Navigator
On Windows, the installer will create a Start menu shortcut for Navigator. For Linux or macOS with Anaconda installed via the *.sh installer (as we did previously), open a terminal and enter anaconda-navigator. If you used the GUI (.pkg) installer on macOS, click the Navigator icon in Launchpad.
The Home Tab
Navigator opens with a window similar to the one shown in Figure 1-4. The app tiles, such as for Jupyter Notebook and Spyder, may be arranged differently in your view.
Figure 1-4: The Anaconda Navigator Home tab
The initial window you see is the Home tab ➊. Additional pages are listed below Home and include Environments, Learning, and Community. When you launch Navigator, you’ll start in the base (root) environment ➋. Environments are just folders or directories used to isolate and manage packages. The base environment is the folder in which Anaconda is installed, such as C:\Users\
The scrollable main screen is filled with square tiles for applications such as Datalore, Spyder, Command Prompts, and more ➌. Each tile contains a logo icon, the name of the app, a description of the app, and either a Launch ➍ or Install ➎ button depending on its current state. The gear icon in the upper-right corner of each tile also lets you install the app as well as update, remove, or install a specific version. The nice thing about Anaconda is that when it installs an app, it automatically finds and installs all the dependencies (other packages) that the app needs to run, and it shows you a list of these in a pop-up window.
If you install a package or tool from the Anaconda Prompt command line interface, the Navigator Home tab might not reflect the change. To ensure this tab is always up to date, you can click the Refresh button in the upper right ➏.
At the lower left of the Home tab, you might see a link for Anaconda Nucleus ➐. You can join here, or sign into an existing account using the button in the upper-right corner ➑. Note that this button may be named either “Sign in” or “Connect.” You need to sign in only if you’re going to be accessing Anaconda Nucleus for sharing projects over the cloud or if you’re accessing repositories such as Anaconda.org.
The Environments Tab
Now let’s take a look at the Environments tab (Figure 1-5). To open it, click the Environments link ➊ beneath Home. Here, you’ll be able to manage conda environments and install and uninstall libraries from Anaconda, conda-forge, and other sites. We’ll go into the details of this in Chapter 2.
Figure 1-5: Anaconda Navigator Environments tab
At this point, you should only see the base (root) environment ➋. The other environments shown, such as Levy, golden_spiral, and penguins, are ones I’ve created previously using the Create button ➌ at the bottom of the screen. Note that there are additional buttons for cloning, importing, and removing environments. Newer versions may show an additional button for backing up environments to the cloud.
Only one environment can be active at a time. Clicking an environment link deactivates the current environment (such as base) and activates the one you’ve clicked (such as penguins). It can take a few seconds for the screen to update. The right half of the screen will now show you a list of the packages installed in that environment, along with a description and version number. Also note that you can change environments using the “Applications on” drop-down menu on the Home tab.
If you click the Installed drop-down menu, you’ll see choices for Not installed, Updatable, Selected, and All ➍. At the bottom of the screen, you’ll see how many packages are currently installed and available ➎. For the base environment, the packages preinstalled by Anaconda may change slightly over time, so the number you see might be different.
NOTE
You can also see which packages come preinstalled with Anaconda by going to https://docs.anaconda.com/anaconda/packages/pkg-docs/. You’ll need to know your operating system and Python version.
When you select Not installed, you’ll see a list of packages available from Anaconda but not currently installed in the selected environment. To see packages available from other sources, such as conda-forge, simply click the Channels button ➏ and select or add a new channel (Figure 1-6). A channel is just the path that conda takes to look for packages. Other options for working with packages include updating the packages list for the enabled channels (Update index) and searching for a package.
Figure 1-6: The Channels drop-down menu lets you add, update, and delete channels.
To remove a package from the active environment, click the checkbox next to the package (Figure 1-7). This opens a menu that offers choices such as marking a package for removal or installing a specific version number, which opens another menu.
Figure 1-7: Marking a package for an action
We talk more about managing packages in the next chapter. You can also visit the Anaconda documentation for more on this subject (https://docs.anaconda.com/anaconda/navigator/tutorials/manage-packages/).
The Learning Tab
On the Learning tab (Figure 1-8), you can discover more about Navigator, the Anaconda platform, and open data science. To open it, click the Learning link beneath Home ➊.
Figure 1-8: The Anaconda Navigator Learning tab
Click the Documentation, Training, Webinars, or Video buttons ➋ to see related tile items ➌. You can turn on all the categories at once. To turn off a highlighted category, just click it again. Clicking a tile item button will open it in a browser window ➍. The button choices are Read, View, and Explore.
The Community Tab
On the Community tab (Figure 1-9) you can learn more about events, free support forums, and social networking relating to Navigator. To open it, click the Community link beneath Home ➊.
Figure 1-9: The Anaconda Navigator Community tab
Clicking the Events, Forum, or Social buttons ➋ changes the displayed tiles. Depending on the type of tile, you can Learn More ➌, Explore ➍, or Engage ➎. Clicking a tile button opens it in a browser window.
File Menu
The File menu in the upper-left corner of the Navigator screen includes options to let you set preferences (Figure 1-10) and quit the program. Users of macOS will see additional options in the Preferences menu, including Services, for linking to your computer’s system preferences menu; Hide Anaconda-Navigator, for hiding the Navigator window; Hide Others, to hide all windows except Navigator; and Show All, for showing all windows. For a detailed explanation of the Preferences menu options, see https://docs.anaconda.com/anaconda/navigator/overview/.
The Quit option shuts down Navigator and releases the memory resources used by Anaconda.
This completes the overview of Anaconda Navigator. You can find more information in the official documentation at https://docs.anaconda.com/anaconda/navigator/. In the next chapter, we’ll use Navigator, along with the command line interface, to set up conda environments that keep your projects separate, safe, and organized.
Figure 1-10: The Anaconda Navigator File ▸ Preferences menu on Windows
Summary
With Anaconda installed on your computer, you now have easy access to Python and its ecosystem of thousands of useful packages. You’re also part of the Anaconda community, with storage options, lots of learning opportunities, and the ability to upload and share packages you’ve built yourself. Lastly, you’ve become familiar with the Navigator interface, letting you run Anaconda with point-and-click convenience.
2
KEEPING ORGANIZED WITH CONDA ENVIRONMENTS
Each of your Python projects should have its own conda environment. Conda environments let you use any version of any package you want, including Python, without the risk of compatibility conflicts. You can organize your packages based on project needs rather than cluttering your base directory with unnecessary packages. And you can share your environments with others, making it possible for them to perfectly reproduce your projects.
Anaconda Navigator, introduced in the previous chapter, provides an easy point-and-click interface for managing environments and packages. For even more control, conda lets you perform similar tasks using text commands in Anaconda Prompt (for Windows) or in a terminal (for macOS or Linux).
In this chapter, we’ll use both Navigator and conda to create conda environments, install packages, manage the packages, remove the environment, and more. Before we begin, let’s take a closer look at why a conda environment is useful.
Understanding Conda Environments
You can think of conda environments as separate Python installations. The conda environment manager, represented by the cargo ship in Figure 2-1, treats each environment much like a secure shipping container. Each “container” can have its own version of both Python and any other packages you need to run for a specific project. These containers are nothing more than dedicated directories in your computer’s directory tree.
Figure 2-1: A conceptual diagram for the conda environment and package managers
As shown in Figure 2-1, you can have different versions of Python and different versions of the same libraries loaded on your computer. If they’re in separate environments, they’ll be isolated and won’t conflict with one another. This is important because you might inherit legacy projects that run only with older versions of some packages.
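Because an environment is just a directory, you can ask a running Python which one it belongs to. The sketch below uses sys.prefix and sys.executable from the standard library, plus the CONDA_DEFAULT_ENV and CONDA_PREFIX environment variables that conda sets when an environment is activated. The function name and dictionary keys are invented for this illustration, and the two conda values will be None if Python was started outside an activated conda environment.

```python
import os
import sys


def describe_environment():
    """Report where the running Python lives and which conda env is active."""
    return {
        "prefix": sys.prefix,                                    # environment folder
        "executable": sys.executable,                            # interpreter in use
        "conda_env_name": os.environ.get("CONDA_DEFAULT_ENV"),   # None outside conda
        "conda_env_path": os.environ.get("CONDA_PREFIX"),        # None outside conda
    }


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Running this from two different environments should report two different folders, which is the isolation described above.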
The conda package manager, represented by the crane in Figure 2-1, finds and installs packages into your environments. Think of each package as a separate item packed in a shipping container like that heavy box of National Geographic magazines you should’ve recycled years ago.
The package manager ensures that you have the latest stable version of a package or the specific version that you request. It also finds and loads all the dependencies the main package needs to run, at correctly matched versions. A dependency is just another Python package that provides supporting functionality. For example, Matplotlib (for plotting) and pandas (for data analysis) are both built on NumPy (Numerical Python) and won’t run without it. For this reason, it’s best to install all the packages that you’re going to need for a project at the same time, if possible, to avoid dependency conflicts.
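You can inspect a package's declared dependencies from inside Python with the standard importlib.metadata module. The following is a sketch, not conda's own resolver: it reads the Python packaging metadata that an installed package carries, which may differ in detail from the dependency records conda uses. The helper names are made up, and the result is simply empty if the package isn't installed in the current environment.

```python
import re
from importlib import metadata


def requirement_name(requirement):
    """Strip version pins, extras, and markers from a requirement string."""
    return re.split(r"[\s<>=!~;\[(]", requirement, maxsplit=1)[0]


def declared_requirements(package_name):
    """Return the requirement strings a package declares ([] if none or not installed)."""
    try:
        return metadata.requires(package_name) or []
    except metadata.PackageNotFoundError:
        return []


def depends_on(package_name, dependency):
    """Check whether a package lists the given dependency by name."""
    names = {requirement_name(req).lower() for req in declared_requirements(package_name)}
    return dependency.lower() in names


print("pandas depends on numpy:", depends_on("pandas", "numpy"))
```

In an environment where pandas is installed, the last line is a quick way to confirm the NumPy relationship mentioned above; where pandas is missing, it reports False because there is nothing to inspect.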
If you’re worried that installing packages in each conda environment is poor space management, set your mind to rest. No copies are created. Conda downloads packages into a package cache, and each environment links to the appropriate packages in this cache.
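To see why linking saves space, consider the following sketch. It does not use conda at all; it imitates the idea with hard links, which conda generally uses when the filesystem allows (falling back to other strategies when it doesn't). The folder and file names are hypothetical stand-ins for a package cache and two environments.

```python
import os
import tempfile

with tempfile.TemporaryDirectory() as root:
    # Hypothetical stand-ins for the package cache and two environments.
    cache = os.path.join(root, "pkgs")
    env_a = os.path.join(root, "envs", "env_a")
    env_b = os.path.join(root, "envs", "env_b")
    for folder in (cache, env_a, env_b):
        os.makedirs(folder)

    cached_file = os.path.join(cache, "library.py")
    with open(cached_file, "w") as handle:
        handle.write("print('hello')\n")

    # Link the cached file into each environment instead of copying it.
    for env in (env_a, env_b):
        os.link(cached_file, os.path.join(env, "library.py"))

    # One block of data on disk, reachable under three names.
    link_count = os.stat(cached_file).st_nlink
    same_data = os.path.samefile(cached_file, os.path.join(env_a, "library.py"))
    print("names pointing at the data:", link_count)
    print("environment file is the cached file:", same_data)
```

Each environment appears to have its own copy of the file, yet the data is stored only once, which is the behavior the package cache relies on.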
By default, this package cache is in the pkgs directory of your Anaconda distribution. To find it, open Anaconda Prompt or a terminal (see the instructions in Chapter 1) and enter conda info. Depending on your operating system, you should find the package cache at C:\Users\
Of course,
NOTE
By default, each user has their own package cache that’s not shared with anyone else. It’s possible to set up a shared package cache to save disk space and reduce installation times. If you want to share packages among multiple users, see the instructions at https://docs.anaconda.com/anaconda/user-guide/tasks/shared-pkg-cache/.
You can also use the conda info command (or conda info --envs) to show where your conda environments are stored. In Windows, for example, the default location is C:\Users\
The base environment is created by default when you install Anaconda, and it includes a Python installation and core system libraries and dependencies of conda. As a general guideline, avoid installing additional packages into your base environment. If you need to install additional packages for a new project, first create a new conda environment.
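The locations reported by conda info can also be read from a script. The sketch below assumes that conda is on your PATH (for example, inside Anaconda Prompt or an initialized terminal) and uses the command's --json option, picking out the pkgs_dirs, envs_dirs, and envs entries from the result. The function names and the keys of the returned summary are invented for this illustration.

```python
import json
import shutil
import subprocess


def read_conda_info():
    """Run `conda info --json` and return the parsed result (None if conda isn't on PATH)."""
    conda = shutil.which("conda")
    if conda is None:
        return None
    result = subprocess.run([conda, "info", "--json"],
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


def summarize_locations(info):
    """Pick out the cache and environment directories from the parsed output."""
    return {
        "package_caches": info.get("pkgs_dirs", []),
        "environment_folders": info.get("envs_dirs", []),
        "environments": info.get("envs", []),
    }
```

Calling summarize_locations(read_conda_info()) after checking that the first call didn't return None gives you the same paths you would otherwise pick out of the conda info text by eye.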
CONDA AND PIP
You’ll occasionally encounter a package that can’t be installed with conda. In this case, you’ll need to do so using the Python package management system (pip). Conda and pip work similarly with two exceptions. First, pip works only with Python, whereas conda works with multiple languages. Second, pip installs packages from the Python Package Index (https://pypi.org/), otherwise known as PyPI, whereas conda installs packages from the Anaconda repository (https://repo.anaconda.com/) and Anaconda.org (https://anaconda.org/). You can also install packages from PyPI in an active conda environment using pip. For your convenience, conda will automatically install a copy of pip in each new environment you create.
Unfortunately, issues can arise when conda and pip are used together to create an environment, especially when the tools are used back to back multiple times, establishing a state that can be difficult to reproduce. Most of these issues stem from the fact that conda, like other package managers, has limited abilities to control packages that it did not install. When using conda and pip together, here are the general guidelines:
- Install packages needing pip only after installing packages available through conda.
- Don’t run pip in the root environment.
- Re-create the conda environment from scratch if changes are needed.
- Store conda and pip requirements in an environment (text) file.
For more details on this issue, see https://www.anaconda.com/blog/using-pip-in-a-conda-environment/. For more on pip, see https://packaging.python.org/guides/installing-using-pip-and-virtual-environments/#creating-a-virtual-environment/. We’ll look at creating a requirements text file later in this chapter.
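As a preview of what such an environment file can look like, the sketch below builds the text of one in Python. It assumes the YAML layout that conda reads for environment files, with conda packages listed first and pip-only packages nested under a pip entry, mirroring the guideline of using conda before pip. The function name is invented, and the pip-only package name is a placeholder rather than a real recommendation.

```python
def render_environment_file(name, channels, conda_packages, pip_packages=()):
    """Build the text of a conda environment file listing conda and pip requirements."""
    lines = [f"name: {name}", "channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {package}" for package in conda_packages]
    if pip_packages:
        # pip itself comes from conda; the pip-only packages are nested below it.
        lines.append("  - pip")
        lines.append("  - pip:")
        lines += [f"      - {package}" for package in pip_packages]
    return "\n".join(lines) + "\n"


text = render_environment_file(
    name="my_first_env",
    channels=["defaults", "conda-forge"],
    conda_packages=["python", "pandas"],
    pip_packages=["some-pip-only-package"],   # hypothetical package name
)
print(text)
```

Saving the printed text to a file keeps both sets of requirements in one place, so the environment can be rebuilt from scratch instead of being patched repeatedly.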
Working with Conda Environments Using Navigator
Setting up your first conda environment is easy. In the sections that follow, we’ll use the Anaconda Navigator GUI to work with conda environments. Later in this chapter, we’ll use conda in Anaconda Prompt (or a terminal) to do the same things. Anaconda Prompt and Navigator were introduced in Chapter 1.
Launching Navigator
In Windows, go to the Start menu and click the Anaconda Navigator desktop app. In macOS, open Launchpad and then click the Anaconda-Navigator icon. In Linux, open a terminal window and enter anaconda-navigator.
When Navigator starts, it automatically checks for a new version. If you see an Update Application message box asking you if you would like to update Navigator, click Yes. For a review of the Navigator interface, see Chapter 1.
Creating a New Environment
In Navigator, select the Environments tab and then click the Create button. This opens the Create New Environment dialog (Figure 2-2). Because this is your first environment, name it my_first_env.
Figure 2-2: The Navigator Create New Environment dialog
Note the Location information in Figure 2-2. By default, conda environments are stored in the envs folder within your Anaconda installation. For this reason, you must give each environment a unique name when using Navigator. It’s also possible to create environments in other locations using the command line interface. We’ll look at this option later in the section “Specifying an Environment’s Location” on page 37.
The first package installed is Python. By default, this is the same version of Python you used when you downloaded and installed Anaconda. If you want to install a different version, you can use the pull-down menu to select it.
Click Create. In a minute or so, you should see the new environment on the Environments tab. You should now have two environments, base (root) and my_first_env. The arrow to the right of the name indicates that my_first_env is now the active environment (Figure 2-3). Active means that this is the environment in which you are now working, and any packages you load will be put in this folder. Clicking a name in the list activates that environment and deactivates any other environments.
Figure 2-3: The newly created active environment (my_first_env) on the Navigator Environments tab
Also on the Environments tab is a listing of packages installed in my_first_env and their version numbers (Figure 2-4). At the bottom of the window, you can see that 12 packages were installed. These are all packages associated with Python. Over time, the number of packages may change, so you may see a different number.
Figure 2-4: The list of initially installed packages on the Navigator Environments tab
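A similar inventory is available from inside Python. The sketch below uses the standard importlib.metadata module to list the Python packages visible to the running interpreter, along with their versions. The function name is made up, and the count will usually differ from the one Navigator shows, because Navigator also lists non-Python packages that conda manages.

```python
from importlib import metadata


def installed_packages():
    """Return {name: version} for Python packages visible to the running interpreter."""
    found = {}
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip entries whose metadata is incomplete
            found[name] = dist.version
    # Sort case-insensitively so the listing resembles an alphabetical package list.
    return dict(sorted(found.items(), key=lambda item: item[0].lower()))


packages = installed_packages()
print(f"{len(packages)} Python packages found")
for name, version in packages.items():
    print(f"{name} {version}")
```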
Congratulations, you just created your first conda environment! You can start using Python right away. But if you need additional packages, such as pandas and NumPy, you must install them in this environment. So let’s get to it.
Managing Packages
After you create an environment, you can use the Environments tab to see which packages are installed, check for available packages, find a specific package and install it, and update and remove packages.
Finding and Installing Packages
To find an installed package, activate the environment you want to search by clicking its name (see Figure 2-3). If the list of installed packages in the pane on the right is long and you don’t want to scroll, start typing the name of the package in the Search Packages box. This will reduce the number of packages displayed until only the package you want remains.
To find a package that is not installed, change the selection of packages displayed in the right pane by clicking the drop-down menu above it and selecting Not installed (see Figure 2-5).
Figure 2-5: The list of available but uninstalled packages on the Navigator Environments tab
As shown in the lower left of Figure 2-5, there are currently 8,601 packages automatically available after you create the new environment (this number may change over time, so the one you see might be different). To see more packages, you can add a channel using the Channels button on the Environments tab.
Click Channels to open a dialog (Figure 2-6). Then, enter conda-forge for access to the conda-forge community channel. This channel is made up of thousands of contributors who provide packages for a wide range of software (for more information, see https://conda-forge.org/docs/user/introduction.html).
Figure 2-6: Adding conda-forge using the Channels dialog
Press ENTER and then click the Update channels button to add conda-forge (Figure 2-7).
Figure 2-7: Updating channels with the Channels dialog
The pane on the right side of the Environments tab should now refresh to show that you have tens of thousands of packages available. You can remove channels by clicking the corresponding trash cans in the dialog (see Figure 2-7).
NOTE
If a package you want isn’t available from Anaconda, you can try installing it from the Python Package Index (https://pypi.org/) using pip, which conda installs by default in conda environments (see the “Conda and PIP” sidebar on page 24).
Remember that we wanted to add NumPy and pandas. Because NumPy is a requirement for pandas, it’s included in the pandas dependencies list. Consequently, you need to install only pandas. Enter pandas in the search box at the top of the right pane (Figure 2-8). Then, click the checkbox next to the pandas package and click Apply at the lower right. To install multiple packages at the same time, click each of the corresponding checkboxes prior to clicking Apply.
Figure 2-8: Finding and installing the pandas package on the Environments tab
A new dialog opens and, after a few moments, displays a list of packages on which pandas is dependent (Figure 2-9). As you can see, NumPy is among them. Click the Apply button to complete the installation of pandas.
If you switch to the Installed list, the number of installed packages will have increased, and the list will include both pandas and NumPy. Be aware that you might need to clear the Search Packages box to see the full list.
Figure 2-9: The list of packages to be installed including dependencies
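After the installation finishes, one way to confirm that both pandas and NumPy are present is to ask Python itself. The following is a minimal sketch using the standard library's `importlib.metadata` module; it assumes you run it with the Python interpreter from the environment you just modified, and the helper name `installed_version` is our own.

```python
from importlib import metadata


def installed_version(package):
    """Return the installed version of a package, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Report on the package we asked for and the dependency that came with it.
for name in ("pandas", "numpy"):
    version = installed_version(name)
    status = version if version is not None else "not installed"
    print(f"{name}: {status}")
```

If either line reports "not installed", check that the interpreter you are running belongs to the environment shown as active in Navigator.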
You might notice that some of the major libraries appear to be duplicated in the Not installed list. For example, you can choose between matplotlib and matplotlib-base (Figure 2-10). The -base options tend to be lighter versions intended for when a package, like Matplotlib, is used by other packages as a dependency. As a result, a -base version might not be fully functional, so you should install the full package rather than the -base version when installing packages like Matplotlib or NumPy. This way, you can be sure that everything will work with no surprises.
Figure 2-10: There are two choices for the matplotlib library in the list of uninstalled packages.
Updating and Removing Packages
Over time, newer versions of installed packages may become available. To check for these, select the Updatable filter at the top of the right pane of the Environments tab (Figure 2-11). The list you see might not exactly match the one shown.
Figure 2-11: The right pane of the Environments tab, showing