Posts

User Model Test Case

Hi, if you are writing model-level test cases, you can try something like this:

from django.test import TestCase
from django.contrib.auth import get_user_model
from s3boto.models import Resource
import tempfile


class UserTestCase(TestCase):

    def _create_image(self):
        from PIL import Image
        # path = '/Users/Mac/Downloads/'
        with tempfile.NamedTemporaryFile(suffix='.png', delete=False) as f:
            image = Image.new('RGB', (200, 200), 'white')
            image.save(f, 'PNG')
        return open(f.name, mode='rb')

    def setUp(self):
        self.image = self._create_image()
        self.name = self.image.name.split('.')[0]
        self.ext = self.image.name.split('.')[1]
        self.resource = Resource(
            name=self.name,
            ext=self.ext,
            bucket='test',
            data_source_type=0,
            type=1,
            descr
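The excerpt above is cut off mid-constructor. If you only need the temporary-image pattern it demonstrates, a minimal self-contained sketch using Django's built-in user model might look like this; the test method, assertions and credentials below are mine, not the post's code:

import tempfile
from django.test import TestCase
from django.contrib.auth import get_user_model
from PIL import Image


class TempImageTestCase(TestCase):

    def _create_image(self):
        # Write a small white PNG to a temporary file and reopen it for reading.
        with tempfile.NamedTemporaryFile(suffix='.png', delete=False) as f:
            Image.new('RGB', (200, 200), 'white').save(f, 'PNG')
        return open(f.name, mode='rb')

    def setUp(self):
        self.image = self._create_image()
        self.user = get_user_model().objects.create_user(
            username='tester', password='secret')

    def tearDown(self):
        # Close the file handle opened in setUp().
        self.image.close()

    def test_user_created(self):
        self.assertEqual(get_user_model().objects.count(), 1)

Passing delete=False keeps the file on disk so it can be reopened for reading after the with block closes it.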

Import CSV into Postgres using Python Pandas, compare two CSV files, and compare two DB tables using Python

import pandas as pd
from sqlalchemy import create_engine
import psycopg2
import csv


def importCSV():
    engine = create_engine('postgresql://credr:credr@localhost/credr_db1')
    df = pd.read_csv('/Users/ranvijay/Desktop/data_sanity_21082015/Katta_data.csv')
    df.to_sql('data_anlytics', engine, if_exists='replace')
    df = pd.read_csv('/Users/ranvijay/Desktop/data_sanity_21082015/Data_sanity_21082015.csv')
    df.to_sql('data_bd', engine, if_exists='replace')
    # df = pd.read_csv('/Users/ranvijay/Downloads/BikeSheet.csv')
    # df.to_sql('bikesheet_dump', engine, if_exists='replace')


def comapre_two_table():
    # execute our Query
    conn_string = "host='localhost' dbname='credr_db1' user='credr' password='credr'"
    # print the connection string we will use to connect
    print "Connecting to database\n ->%s" % (conn_string)
    # get
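The excerpt stops before the comparison query runs. One way to sketch the table comparison with pandas instead of raw psycopg2 is shown below; the table names come from the excerpt, while the reused connection string and the 'registration_no' key column are assumptions:

import pandas as pd
from sqlalchemy import create_engine


def compare_two_tables_sketch():
    # Assumed connection string, mirroring the excerpt above.
    engine = create_engine('postgresql://credr:credr@localhost/credr_db1')

    # Load both imported tables back into DataFrames.
    analytics = pd.read_sql_table('data_anlytics', engine)
    bd = pd.read_sql_table('data_bd', engine)

    # Rows present in one table but not the other, joined on a shared
    # 'registration_no' key column (hypothetical name).
    merged = analytics.merge(bd, on='registration_no', how='outer',
                             indicator=True, suffixes=('_analytics', '_bd'))
    only_analytics = merged[merged['_merge'] == 'left_only']
    only_bd = merged[merged['_merge'] == 'right_only']
    return only_analytics, only_bd

The indicator=True flag adds a '_merge' column that marks each row as left_only, right_only or both, which is enough to spot mismatches between the two tables.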

Scrapy - Crawls for Scraping AJAX Pages

Here is the code of a simple spider that crawls and scrapes AJAX pages.

#-------------------------------------------------------------------------------
# Name:        module1
# Purpose:
#
# Author:      Ranvijay.Sachan
#
# Created:     31/10/2014
# Copyright:   (c) Ranvijay.Sachan 2014
# Licence:     <your licence>
#-------------------------------------------------------------------------------
from scrapy.http import Request
from scrapy.spider import BaseSpider
import urllib
import json
from scraping.DoveItem import DoveItem


class DoveAjaxspider(BaseSpider):
    name = "dove"
    allowed_domains = ["dove.in"]
    start_urls = ["http://www.mydove.com.au/en/"]

    def parse(self, response):
        # This receives the response from the start url, but we don't do anything with it.
        allProductType = ['Bar/Body Wash', 'Lotion', 'Deodorant', 'Face', 'Hair', 'Men+Care']
        url
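The excerpt is truncated, but the general pattern for AJAX pages is to request the JSON endpoint that the page calls in the background and parse the response body directly. A minimal sketch against the current scrapy.Spider API follows; the endpoint URL and JSON field names are placeholders, not Dove's real API:

import json
import scrapy


class AjaxJsonSpider(scrapy.Spider):
    name = "ajax_json_sketch"
    # Hypothetical JSON endpoint that the page would normally call via AJAX.
    start_urls = ["https://example.com/api/products?type=Lotion"]

    def parse(self, response):
        # The AJAX endpoint returns JSON rather than HTML, so decode the body.
        data = json.loads(response.text)
        for product in data.get("products", []):
            yield {
                "name": product.get("name"),
                "category": product.get("category"),
            }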

Scrapy - How to extract items that are paginated

[Python] Get product links from every page of a retailer. Here is the code of a simple spider that crawls the scraped links and follows the pagination to the next page. You have two options to solve your problem. The general one is to use yield to generate new requests instead of return; that way you can issue more than one new request from a single callback. Check the second example at http://doc.scrapy.org/en/latest/topics/spiders.html#basespider-example.

#-------------------------------------------------------------------------------
# Name:        module1
# Purpose:
#
# Author:      Ranvijay.Sachan
#
# Created:     31/10/2014
# Copyright:   (c) Ranvijay.Sachan 2014
# Licence:     <your licence>
#-------------------------------------------------------------------------------
from scrapy.spider import BaseSpider
from scrapy.selector import HtmlXPathSelector
from scrapy.http.request import Request
from scraping.articles import ArticleItem
from scrapy.contrib.linkextractors.sgml import
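A minimal sketch of the yield-based approach described above, written against the current scrapy.Spider API rather than the old BaseSpider; the URL and the CSS selectors are placeholders:

import scrapy


class PaginatedSpider(scrapy.Spider):
    name = "paginated_sketch"
    # Placeholder listing URL; the selectors below are assumptions too.
    start_urls = ["https://example.com/products?page=1"]

    def parse(self, response):
        # Yield one request per product link found on the listing page.
        for href in response.css("a.product-link::attr(href)").getall():
            yield response.follow(href, callback=self.parse_product)

        # Then yield a request for the next page, if a 'next' link exists.
        next_page = response.css("a.next::attr(href)").get()
        if next_page:
            yield response.follow(next_page, callback=self.parse)

    def parse_product(self, response):
        yield {
            "title": response.css("h1::text").get(),
            "url": response.url,
        }

Because parse yields both the product requests and the next-page request, a single callback drives the whole pagination without returning early.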

How to scrape a website that requires login first with the Python Scrapy framework (filling login forms automatically)

We often have to write spiders that need to log in to sites in order to scrape data from them. Our customers provide us with the site, username and password, and we do the rest. The classic way to approach this problem is:

1. Launch a browser, go to the site and search for the login page.
2. Inspect the source code of the page to find out:
   I. which form is the login form (a page can have many forms, but usually one of them is the login form);
   II. which field names are used for the username and password (these can vary a lot);
   III. whether there are other fields that must be submitted (like an authentication token).
3. Write the Scrapy spider to replicate the form submission using FormRequest (a sketch follows below).

Being fans of automation, we figured we could write some code to automate point 2 (which is actually the most time-consuming part), and the result is loginform, a library to automatically fi
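A minimal sketch of step 3, the FormRequest submission, using FormRequest.from_response so hidden fields such as CSRF tokens are carried over automatically; the site URLs, the 'username'/'password' field names and the failure check are assumptions:

import scrapy
from scrapy.http import FormRequest


class LoginSpider(scrapy.Spider):
    name = "login_sketch"
    # Placeholder login page; field names vary per site, as noted above.
    start_urls = ["https://example.com/login"]

    def parse(self, response):
        # from_response pre-fills the form found on the login page (hidden
        # fields included) and overrides only the credentials we supply.
        return FormRequest.from_response(
            response,
            formdata={"username": "user", "password": "secret"},
            callback=self.after_login,
        )

    def after_login(self, response):
        # Naive check: if the word 'incorrect' appears, assume login failed.
        if b"incorrect" in response.body.lower():
            self.logger.error("Login failed")
            return
        # From here on, requests carry the authenticated session cookies.
        yield scrapy.Request("https://example.com/private",
                             callback=self.parse_private)

    def parse_private(self, response):
        yield {"title": response.css("title::text").get()}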