Web Scraping with Python

The web is a treasure trove of information.
Contents:
  • Intro | topics: what web scraping is and why you would do it
  • Web scraping ethics | topics: ethical considerations; permissions; scraping potentially sensitive data
  • APIs | topics: what APIs are; why use APIs; how to access APIs
  • Webpage structure | topics: webpage structure; static vs dynamic content; tags
  • Python | topics: Python; Beautiful Soup; Selenium

Python code saved as an IPython notebook

An IPython notebook containing the Python script used in this tutorial for web scraping can be downloaded here.

Install Anaconda with a Python environment

Python

Python is a popular language for research computing, and great for general-purpose programming as well. Installing all of its research packages individually can be a bit difficult, so we recommend Anaconda, an all-in-one installer.

Regardless of how you choose to install it, please make sure you install Python version 3.x (e.g., 3.6 is fine).
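
To confirm which version you ended up with, a short check like the one below can be run in a notebook cell or at a Python prompt. This is only a minimal sketch: the helper name `is_python3` is our own and is not part of Python or Anaconda.

```python
import sys


def is_python3(version_info=sys.version_info):
    """Return True when the major version number is 3 (hypothetical helper)."""
    return version_info[0] == 3


# Show the version of the interpreter that is running this code.
print("Python version:", sys.version.split()[0])
print("Python 3.x detected:", is_python3())
```

If the second line prints `False`, the interpreter you are running is not the Python 3 that this tutorial assumes.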

We will teach Python using the Jupyter Notebook, a programming environment that runs in a web browser (Jupyter Notebook will be installed by Anaconda). For this to work you will need a reasonably up-to-date browser. The current versions of the Chrome, Safari and Firefox browsers are all supported (some older browsers, including Internet Explorer version 9 and below, are not).

Windows

Video Tutorial
  1. Open https://www.anaconda.com/distribution/#download-section with your web browser.
  2. Download the Anaconda for Windows installer with Python 3. (If you are not sure which version to choose, you probably want the 64-bit Graphical Installer Anaconda3-...-Windows-x86_64.exe)
  3. Install Python 3 by running the Anaconda installer. Accept all of the installation defaults, but make sure to check Add Anaconda to my PATH environment variable.
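
If the PATH option was checked, the `python` command should be found from a newly opened command prompt. The sketch below uses the standard library's `shutil.which` to report where a command resolves. The helper name `find_on_path` is hypothetical, and the paths printed will depend on where Anaconda was installed on your machine.

```python
import shutil
import sys


def find_on_path(command):
    """Return the full path of `command` if it is on PATH, otherwise None."""
    return shutil.which(command)


# The interpreter running this code, and what the shell would find by name.
print("Interpreter running this code:", sys.executable)
print("python on PATH:", find_on_path("python"))
print("conda on PATH:", find_on_path("conda"))
```

A result of `None` suggests the command is not on your PATH, in which case it may help to open a new prompt or ask for help at the workshop.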

Linux

  1. Open https://www.anaconda.com/distribution/#download-section with your web browser.
  2. Download the Anaconda Installer with Python 3 for Linux.
    (The installation requires using the shell. If you aren't comfortable doing the installation yourself, stop here and request help at the workshop.)
  3. Open a terminal window and navigate to the directory where the executable is downloaded (e.g., `cd ~/Downloads`).
  4. Type
    bash Anaconda3-
    and then press Tab to autocomplete the full file name. The name of the file you just downloaded should appear.
  5. Press Enter. You will follow the text-only prompts. To move through the text, press Spacebar. Type yes and press Enter to approve the license. Press Enter to approve the default location for the files. Type yes and press Enter to prepend Anaconda to your PATH (this makes the Anaconda distribution the default Python).
  6. Close the terminal window.
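
Once installation has finished on either platform, it can be useful to check that the packages this tutorial relies on are importable. The following is a sketch under a few assumptions: the import names `bs4` (Beautiful Soup), `selenium` and `notebook` (Jupyter Notebook) are the usual ones for these packages, the helper name `missing_packages` is our own, and Selenium in particular may need to be installed separately from Anaconda.

```python
import importlib.util


def missing_packages(names):
    """Return the names from `names` that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


# Assumed import names for the tools covered in this tutorial.
wanted = ["bs4", "selenium", "notebook"]
missing = missing_packages(wanted)

if missing:
    print("Not found:", ", ".join(missing))
else:
    print("All tutorial packages were found.")
```
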
Hosted by [eResearch, Griffith University](https://www.griffith.edu.au/eresearch-services).


Content: CC BY-SA Brett Parker & Ben McRae 2021.