23 Web Scraping Using BeautifulSoup in Python

Long Lin & Shiang XuanYuan

23.1 Description:

There is so much information on the Internet that no one could master it all in a lifetime. What we need is not just access to this information but a scalable way to collect, organize, and analyze it. Imagine you have to pull a large amount of data from websites and want to do it as quickly as possible. How would you do it without manually visiting each website and copying the data? Web scraping is the answer: it automates the job, making it easier and faster.

Although web scraping is possible in R, Python is among the most common and convenient ways to retrieve data from the Internet. We give a brief introduction to web scraping in Python and to storing the scraped data in a CSV file, which can then be loaded into any other language for data analysis, data visualization, and so on.
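
As a preview of that workflow, here is a minimal sketch of parsing HTML with BeautifulSoup and writing the result as CSV. The HTML snippet, the table id `prices`, and the helper names `scrape_table` and `write_csv` are all made up for illustration. A real scraper would first download the page, and the CSV would normally go to a file rather than an in-memory buffer.

```python
import csv
import io

from bs4 import BeautifulSoup

# Hypothetical page content; in practice this would come from an HTTP request.
SAMPLE_HTML = """
<html><body>
  <table id="prices">
    <tr><th>item</th><th>price</th></tr>
    <tr><td>apple</td><td>1.20</td></tr>
    <tr><td>banana</td><td>0.50</td></tr>
  </table>
</body></html>
"""


def scrape_table(html):
    """Return the table rows as lists of cell strings (header row first)."""
    # "html.parser" is Python's built-in parser, so no extra parser is needed.
    soup = BeautifulSoup(html, "html.parser")
    table = soup.find("table", id="prices")
    rows = []
    for tr in table.find_all("tr"):
        cells = [cell.get_text(strip=True) for cell in tr.find_all(["th", "td"])]
        rows.append(cells)
    return rows


def write_csv(rows, handle):
    """Write the rows to any open text file handle in CSV format."""
    csv.writer(handle, lineterminator="\n").writerows(rows)


rows = scrape_table(SAMPLE_HTML)
buffer = io.StringIO()  # stand-in for open("prices.csv", "w", newline="")
write_csv(rows, buffer)
csv_text = buffer.getvalue()
print(csv_text)
```

Replacing the buffer with a real file handle produces a CSV file that R, or any other tool, can read for the analysis step.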