What is web scraping and how can we scrape data from websites?

Topics: What is web scraping and why web scrape

Web scraping is the process of extracting data points from web pages in order to build datasets that can be used for analysis. While web scraping can be done manually by a person using a browser, the term typically refers to automated processes implemented using a bot or web crawler.
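To make "automated extraction" concrete, here is a minimal sketch that fetches a page and pulls out a single data point (the page title) using only the Python standard library. The URL example.com is a placeholder chosen for illustration, not a site used in this workshop.

```python
# Minimal sketch: fetch a page and extract one data point (the <title> text).
from html.parser import HTMLParser
from urllib.request import urlopen


class TitleParser(HTMLParser):
    """Collect the text inside the <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.title = ""

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.title += data


html = urlopen("https://example.com").read().decode("utf-8")
parser = TitleParser()
parser.feed(html)
print(parser.title)  # e.g. "Example Domain"
```

In practice the dedicated packages introduced later in this workshop make this kind of parsing much shorter, but the underlying idea is the same: download the HTML, then pick out the pieces you need.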

Data displayed by websites can generally only be viewed in a web browser. Most websites do not let you save a copy of this data to a local file or database, so without a scraper the only option is to manually copy and paste the data.

Web scraping is different from web crawling: web scraping extracts specific data points, while web crawling indexes whole pages.

Web scraping software will automatically load, crawl and extract data from multiple pages or websites based on your requirements. It is either custom built for a specific website or configurable to work with any website.
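The sketch below shows this multi-page pattern with the requests and Beautiful Soup packages: loop over a sequence of page URLs, download each one, and collect a field from every page. The site books.toscrape.com is a public sandbox commonly used for scraping practice; it and the h3 selector are assumptions for illustration, not part of this workshop's exercises.

```python
# Hedged sketch: load several pages and extract one field from each.
import requests
from bs4 import BeautifulSoup

BASE_URL = "https://books.toscrape.com/catalogue/page-{}.html"  # practice site

records = []
for page in range(1, 4):  # pages 1..3
    response = requests.get(BASE_URL.format(page), timeout=10)
    response.raise_for_status()
    soup = BeautifulSoup(response.text, "html.parser")
    # Collect the text of every <h3> heading (book titles on this site).
    for heading in soup.select("h3"):
        records.append(heading.get_text(strip=True))

print(f"Extracted {len(records)} items across 3 pages")
```

The same loop-and-extract structure applies whatever tool you use; only the way you describe "which pages" and "which elements" changes.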

Web scraping software can be categorised into two main types:

  1. Cloud-based applications, or browser applications installed locally on your computer, that can scrape data from individual web pages.

  2. Coded solutions that use Python and R packages to automate scraping across multiple web pages. In this workshop we will look at packages such as Beautiful Soup, Selenium and Scrapy (a minimal Scrapy sketch follows this list).
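As a taste of the coded approach, here is a minimal Scrapy spider adapted from the official Scrapy tutorial. It scrapes quote text and authors from quotes.toscrape.com (a public practice site, used here as an illustration rather than a workshop exercise) and follows the "Next" link, combining crawling and scraping in one script.

```python
# Minimal Scrapy spider sketch (run with: scrapy runspider quotes_spider.py -o quotes.json).
import scrapy


class QuotesSpider(scrapy.Spider):
    name = "quotes"
    start_urls = ["https://quotes.toscrape.com/"]

    def parse(self, response):
        # Extract each quote's text and author from the current page.
        for quote in response.css("div.quote"):
            yield {
                "text": quote.css("span.text::text").get(),
                "author": quote.css("small.author::text").get(),
            }
        # Follow the "Next" link to crawl subsequent pages.
        next_page = response.css("li.next a::attr(href)").get()
        if next_page is not None:
            yield response.follow(next_page, callback=self.parse)
```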

References

www.parsehub.com/blog/what-is-web-scraping

www.webharvy.com/articles/what-is-web-scraping

librarycarpentry.org/lc-webscraping/