Why Web Scraping is the Coolest-Beginner-Friendly-Utility ever

I'm not going to talk about how to scrape. Google web-scraping blogs. You'll find a million amazing blogs and articles. Honestly, a million, and they're pretty good. I'm going to talk about how amazing web scraping is, and how I've used it so far.

But before that, if you've scraped in the past, do check out... *drum roll*

Jaunt-API

I've used Selenium, Beautiful Soup, Mechanize, and a couple of others, but none are as easy as Jaunt. It truly is brilliant. Plus, the creator, Tom Cervenka, is super quick in replying to emails.

Alright. Coming back to the super awesomeness of web scraping. It's just an alternative to an API. A hack of sorts.
Want to apply machine learning, and need a data set? Scrape the web.
See hundreds of cute cat images you'd like to download off a webpage? Scrape the page.
Want to automate a login system? Scrape.

Here's a list of projects where I've used web scraping, most of them in fewer than 100 lines of code.

I automated the process of logging into my Zimbra Webmail account. My university uses it as our email service. (It's probably the worst mailing software ever created. But yeah, DA-IICT uses it.) The script logs you into your account in the background and scrapes the page for new mail. Easy peasy.

Noun Analysis
Our DBMS prof (who is a whack) wanted us to perform noun analysis on an SRS report we made. Basically 20 pages of extreme crap, and we had to go through each page and list every noun, adjective and verb. Almost everyone spent 4 hours during the lab that day, but I got it over with in about 2 hours, after figuring out a way. Web scraping!
I couldn't find any dictionary API, so instead I whitespace-tokenized the entire document and Google-searched "'word' meaning" for each word. Google's result says whether the word is a noun, an adjective, an adverb, and so on. Scraping to the rescue!
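The tokenizing half of that trick is easy to sketch in Python. This is only a rough illustration, not the script I wrote back then: unique_words and tag_words are made-up names, and lookup stands in for whatever fetches the part of speech (the Google scraping step is deliberately left out).

```python
import string


def unique_words(text):
    """Split on whitespace, strip surrounding punctuation, lowercase,
    and keep each word once, in order of first appearance."""
    words = []
    seen = set()
    for token in text.split():
        word = token.strip(string.punctuation).lower()
        if word and word not in seen:
            seen.add(word)
            words.append(word)
    return words


def tag_words(text, lookup):
    """Map every unique word to whatever `lookup` returns for it.

    `lookup` is a hypothetical callable (word -> part of speech or None).
    In the real project this would be the step that scrapes a search result.
    """
    return {word: lookup(word) for word in unique_words(text)}
```

De-duplicating first matters more than it looks: a 20-page report repeats the same few hundred words, so you only make one request per distinct word.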

Flight Scraper
Samarth Goyal (a close friend from school) and I spent two days making a tool that finds the cheapest available flight tickets on a given date for given destinations. Initially we scraped Yatra, Makemytrip and GoIbibo, but that was a pain, because the data loads dynamically. So we looked at the network requests made every time we fired a query, and boom! We found their REST API URL. :) Other than GoIbibo, all the others were simple cracks. (Though the code for this is on my GH, I do not want to give it away as of now.)
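Once you have a JSON response instead of a dynamically loaded page, picking the cheapest fare is a few lines. Here is a minimal Python sketch, assuming a completely made-up response shape: the "flights", "airline" and "price" keys are placeholders, not what any of those sites actually return.

```python
import json


def cheapest_fare(payload):
    """Return the cheapest flight from a JSON string, or None if there are none.

    Assumes a hypothetical payload like:
    {"flights": [{"airline": "...", "price": 1234}, ...]}
    """
    flights = json.loads(payload).get("flights", [])
    return min(flights, key=lambda flight: flight["price"], default=None)


sample = json.dumps({
    "flights": [
        {"airline": "Air A", "price": 5400},
        {"airline": "Air B", "price": 4200},
        {"airline": "Air C", "price": 6100},
    ]
})
print(cheapest_fare(sample))
```

The real work is finding the request in the browser's network tab; after that it's plain data handling.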

Image Downloader | Life Hacks | Minions
This is by far the most fun one of all. Found a website with super amazing content? Images that you'd love to save? Other than Pinterest and Facebook, pretty much every website is an easy target.

This is honestly how big the code is to scrape images off of almost any website.
Oh, and do check out the website mentioned in the code, and try to scrape it, if you're new to scraping. You'll love how easy it is. I've used Jaunt-API. Use that too. It's brilliant.
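If you'd rather not pull in a library, here's a minimal sketch of the same idea using only Python's standard library. It is not the Jaunt code, and ImageCollector and find_images are names I made up for illustration. It only collects image URLs from HTML you've already fetched; the fetching and saving parts are left out.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImageCollector(HTMLParser):
    """Collects the absolute URL of every <img> tag that has a src."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.image_urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                # Resolve relative paths against the page's own URL.
                self.image_urls.append(urljoin(self.base_url, src))


def find_images(html, base_url):
    """Return absolute image URLs found in `html`, in document order."""
    collector = ImageCollector(base_url)
    collector.feed(html)
    return collector.image_urls
```

This only sees images that are in the page source. Sites that load their images with JavaScript need a different approach.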

Youtube-MP3
Though I haven't made this yet, I have been planning to do so for a very long time. I really want to make an automated script that takes YouTube URLs from a text file and downloads the respective MP3s. It would make life super, super easy. If you do end up writing such a script, please do write to me!
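The first half of such a script, reading the URLs and pulling out the video IDs, could look something like the Python sketch below. The function names are made up, it only handles the common watch?v= and youtu.be link forms, and the actual downloading and MP3 conversion are left out entirely.

```python
from urllib.parse import urlparse, parse_qs


def video_id(url):
    """Return the video ID from a YouTube URL, or None if it isn't one."""
    parsed = urlparse(url.strip())
    host = (parsed.hostname or "").lower()
    if host == "youtu.be":
        # Short links carry the ID in the path: https://youtu.be/<id>
        return parsed.path.lstrip("/") or None
    if host == "youtube.com" or host.endswith(".youtube.com"):
        # Regular links carry it in the query string: ...watch?v=<id>
        return parse_qs(parsed.query).get("v", [None])[0]
    return None


def read_video_ids(lines):
    """Collect video IDs from an iterable of lines, skipping anything else."""
    ids = []
    for line in lines:
        found = video_id(line)
        if found:
            ids.append(found)
    return ids
```

Since an open file is itself an iterable of lines, it can be passed straight to read_video_ids.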

Alright. This is how far I've reached with web-scraping. It's easy. And I absolutely love it.

I usually don't ask people to, but if you have written a super interesting script to scrape the web, please do leave a comment, and share creative ideas!
