Guide to Scrape Data from an Investment Stock Website
To understand the stock market, my elder brother Anthony gives me monthly assignments to research the stock exchange in Malaysia. I have to report on 2 to 3 stocks and give legitimate reasons why I chose them before purchasing one. But my monthly research involves repeating the same steps:
- Go to the stock investment portal
- Open multiple links for stocks that analysts recommend buying
- Compile a list of quality stocks with good reasons
To save myself the hassle of repeating the same steps, I created a Python script so I can be done with the assignment as soon as possible (because I am lazy). For now, this script only handles step 1 and step 2. Keep in mind this post is catered towards Mac / Ubuntu users.
Unfortunately, i3investor does not have an API, but all is well. My approach is to mine the data using web scraping.
First, let’s look at the repeated steps that I have to go through each time. If you go to the portal, you can see a table that looks like this:
Screenshot 6/30/2016 from Joshua’s MacBook
As you can see, there are many stocks to choose from, and the ‘Price Call’ column shows what analysts from different professional firms have to say about each particular stock. What I usually do is open, in multiple tabs, the stocks whose ‘Price Call’ is ‘BUY’.
What my script does is connect to the stock portal and open every stock that analysts rate ‘BUY’ in a separate tab. The result looks exactly like what you would expect.
Sounds easy enough? Good, you’re getting there.
Step 1:
Before we go into the details, you need to install lxml, requests, and BeautifulSoup from your Terminal:
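The exact commands aren’t shown in the post, but a typical install on Mac / Ubuntu looks like this (depending on your setup, `pip` may work in place of `pip3`):

```shell
# Install the three third-party libraries the script depends on
pip3 install lxml requests beautifulsoup4
```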
Step 2:
Alright, let’s start coding. We need to import these libraries for the Python module to work.
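The post doesn’t reproduce the import block, but given the libraries installed in Step 1, it would look something like this:

```python
import webbrowser              # standard library: opens URLs in browser tabs (Step 5)

import requests                # fetches the portal page over HTTP
from bs4 import BeautifulSoup  # parses the raw HTML so we can pick out tags
```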
Step 3:
We then need the script to connect to the portal. I took the actual URL and hard-coded it in. The first function looks like this:
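The original function isn’t shown, so here is a minimal sketch of that first step. The URL constant is a placeholder (the post hard-codes the real i3investor page), and the function name is my own:

```python
import requests

# Placeholder: substitute the actual i3investor page you want to scrape
PORTAL_URL = "https://example.com/price-targets"

def connect_to_portal(url=PORTAL_URL):
    """Fetch the portal page and return its raw HTML, or None on failure."""
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()  # raise if the server returned an error code
    except requests.RequestException as err:
        print(f"Could not reach the portal: {err}")
        return None
    return response.text
```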
Step 4:
Once the script is connected to the website, the web scraping begins here. If you look at the actual page source of this site, it looks ugly. BeautifulSoup does a good job of ‘beautifying’ the HTML page for your convenience.
To understand the different tags and to work out which ones to scrape, I wrote the parsed HTML out to a file:
Here’s a snapshot of the before / after look of the HTML file:
Before:
After:
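A sketch of that “beautify and dump to a file” step: BeautifulSoup’s `prettify()` re-indents the raw markup so it’s readable (the function and file names here are my own, not from the post):

```python
from bs4 import BeautifulSoup

def save_pretty_html(raw_html, path="portal.html"):
    """Parse the raw page and write a nicely indented copy for inspection."""
    # "html.parser" ships with Python; you can swap in "lxml" (installed in
    # Step 1) for a faster parse of the same markup.
    soup = BeautifulSoup(raw_html, "html.parser")
    with open(path, "w", encoding="utf-8") as f:
        f.write(soup.prettify())  # re-indented, roughly one tag per line
    return soup
```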
Once I knew which specific tags to scrape, I started compiling the links to these stocks into a list.
I now have a list of stock names and their corresponding ‘Price Target’ links.
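The scraping itself might look like the sketch below. The table layout (one row per stock, with the name inside a link and the price call in one of the cells) is my assumption from the screenshot, not code from the post, and the base URL is a placeholder:

```python
from bs4 import BeautifulSoup

BASE_URL = "https://example.com"  # placeholder for the portal's domain

def collect_buy_calls(soup, base_url=BASE_URL):
    """Return (stock_name, link) pairs for table rows whose Price Call is BUY."""
    picks = []
    for row in soup.find_all("tr"):
        cells = [td.get_text(strip=True).upper() for td in row.find_all("td")]
        if "BUY" not in cells:
            continue  # skip HOLD / SELL rows
        anchor = row.find("a", href=True)  # the stock-name link in the row
        if anchor:
            picks.append((anchor.get_text(strip=True), base_url + anchor["href"]))
    return picks
```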
Step 5:
The last function simply opens each link in a separate tab using Python’s built-in webbrowser library.
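Opening the compiled links is essentially a one-liner per stock with the standard-library webbrowser module (function name assumed):

```python
import webbrowser

def open_price_targets(picks):
    """Open each (stock_name, url) pair in a new browser tab."""
    for name, url in picks:
        print(f"Opening {name} -> {url}")
        webbrowser.open_new_tab(url)
```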
Step 6:
Okay, all I need to do now is call these 3 functions and boom! The script works, and my stock-research time is drastically reduced.
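Putting it all together, the driver is just three calls in sequence. Here is a condensed, self-contained sketch of the whole pipeline under the same assumptions as above (placeholder URLs, assumed table layout and function names):

```python
import webbrowser

import requests
from bs4 import BeautifulSoup

PORTAL_URL = "https://example.com/price-targets"  # placeholder portal page
BASE_URL = "https://example.com"                  # placeholder domain

def connect_to_portal(url=PORTAL_URL):
    """Step 3: fetch the portal page and return its raw HTML."""
    response = requests.get(url, timeout=10)
    response.raise_for_status()
    return response.text

def collect_buy_calls(raw_html, base_url=BASE_URL):
    """Step 4: scrape (stock_name, link) pairs from rows rated BUY."""
    soup = BeautifulSoup(raw_html, "html.parser")
    picks = []
    for row in soup.find_all("tr"):
        cells = [td.get_text(strip=True).upper() for td in row.find_all("td")]
        anchor = row.find("a", href=True)
        if "BUY" in cells and anchor:
            picks.append((anchor.get_text(strip=True), base_url + anchor["href"]))
    return picks

def open_price_targets(picks):
    """Step 5: open each scraped link in a new browser tab."""
    for name, url in picks:
        webbrowser.open_new_tab(url)

def main():
    html = connect_to_portal()
    picks = collect_buy_calls(html)
    open_price_targets(picks)

if __name__ == "__main__":
    main()
```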