How to scrape a website that requires logging in first with Python's Scrapy framework (filling login forms automatically)
We often have to write spiders that need to log in to sites in order to scrape data from them. Our customers provide us with the site, username and password, and we do the rest. The classic way to approach this problem is:

1. Launch a browser, go to the site and look for the login page.
2. Inspect the source code of the page to find out:
   I. which one is the login form (a page can have many forms, but usually one of them is the login form)
   II. which field names are used for the username and password (these can vary a lot)
...
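The two inspection steps above (picking the login form out of several forms, then finding the username and password field names) can be roughly sketched in Python with the standard library's `html.parser`. This is a minimal heuristic under stated assumptions, not necessarily the approach this post goes on to describe: it assumes the login form is the only form with exactly one password input, and that the username field is the nearest text or email input before that password input. The names `FormCollector` and `find_login_form` are hypothetical.

```python
from html.parser import HTMLParser


class FormCollector(HTMLParser):
    """Collects every <form> on a page together with its <input> fields."""

    def __init__(self):
        super().__init__()
        self.forms = []  # each entry: {"attrs": {...}, "inputs": [{...}, ...]}
        self._current = None

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "form":
            self._current = {"attrs": attrs, "inputs": []}
            self.forms.append(self._current)
        elif tag == "input" and self._current is not None:
            self._current["inputs"].append(attrs)

    def handle_endtag(self, tag):
        if tag == "form":
            self._current = None


def find_login_form(html):
    """Return (form_index, username_field, password_field), or None.

    Assumed heuristic (not a standard): the login form contains exactly
    one password input, and the username field is the last named
    text/email input that appears before it.
    """
    parser = FormCollector()
    parser.feed(html)
    parser.close()
    for index, form in enumerate(parser.forms):
        inputs = form["inputs"]
        password_positions = [
            i for i, field in enumerate(inputs)
            if (field.get("type") or "").lower() == "password"
        ]
        # Skip forms with no password field, and sign-up forms that
        # ask for the password twice.
        if len(password_positions) != 1:
            continue
        pw_pos = password_positions[0]
        candidates = [
            field for field in inputs[:pw_pos]
            if (field.get("type") or "text").lower() in ("text", "email")
            and field.get("name")
        ]
        if candidates:
            return index, candidates[-1]["name"], inputs[pw_pos].get("name")
    return None


page = """
<form action="/search"><input type="text" name="q"></form>
<form action="/login" method="post">
  <input type="hidden" name="csrf" value="abc">
  <input type="text" name="user_login">
  <input type="password" name="user_pass">
  <input type="submit" value="Log in">
</form>
"""
print(find_login_form(page))  # (1, 'user_login', 'user_pass')
```

In a spider, the detected form index and field names could then be handed to Scrapy's `FormRequest.from_response(response, formnumber=index, formdata={user_field: username, pass_field: password}, callback=...)`, which keeps hidden fields such as CSRF tokens from the page. Real login pages will defeat this simple heuristic in many ways (JavaScript-rendered forms, unnamed inputs, multi-step logins), so treat it as a starting point.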