Basic Python 3 programming
Basic HTML knowledge
Why this course?
Plenty of unstructured data is freely available on the internet, and some of it becomes more useful when combined with the structured or unstructured data an organization already holds. What if you could fetch the unstructured data you need from the web, transform it into a structured format, combine it with your other data, and preprocess the result to extract insights that support faster, better data-driven decisions?
The good news is that techniques such as web scraping can help solve the problem of gathering data at scale and building curated datasets. This course will help you achieve that goal. The learning objectives for this course are:
- Automate the gathering of unstructured data in the form of raw HTML.
- Learn to scrape financial news about specific companies listed on the stock market.
- Use the BeautifulSoup4 Python library for web scraping: installation, exception handling, and advanced HTML parsing.
- Traverse a single domain to fetch data from many HTML pages.
- Process the gathered (scraped) data, transform it into a structured format (JSON), and save it as CSV.
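To give a feel for the parsing objective, here is a minimal sketch of extracting headlines with BeautifulSoup4. The HTML snippet, the `story` and `date` class names, and the `extract_stories` helper are all hypothetical stand-ins for a real news page, not taken from the course material. It assumes the library is installed (`pip install beautifulsoup4`).

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for a downloaded financial news page.
SAMPLE_HTML = """
<html><body>
  <div class="story">
    <h2><a href="/news/1">Acme Corp posts record profit</a></h2>
    <span class="date">2021-03-01</span>
  </div>
  <div class="story">
    <h2><a href="/news/2">Acme Corp announces dividend</a></h2>
  </div>
  <div class="story"><p>Advertisement</p></div>
</body></html>
"""


def extract_stories(html):
    """Return one dict per story block that contains a headline link."""
    soup = BeautifulSoup(html, "html.parser")
    stories = []
    for block in soup.find_all("div", class_="story"):
        link = block.find("a")
        # find() returns None when a tag is missing, so check before using it
        # instead of letting an AttributeError stop the whole scrape.
        if link is None:
            continue
        date_tag = block.find("span", class_="date")
        stories.append({
            "title": link.get_text(strip=True),
            "url": link.get("href"),
            "date": date_tag.get_text(strip=True) if date_tag else None,
        })
    return stories


stories = extract_stories(SAMPLE_HTML)
```

The block without a link is skipped and the story without a date keeps `None` in that field, which is one simple way to handle pages whose layout is not fully consistent.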
This course gives you hands-on experience building and automating the process of generating a curated dataset from raw HTML text scraped from the web.
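The last step of such a pipeline, turning scraped records into JSON and CSV, might look roughly like the sketch below. It uses only the Python standard library; the sample records, the field names, and the `to_json` and `to_csv` helpers are assumptions made for illustration.

```python
import csv
import io
import json

# Hypothetical records, as they might look after parsing the raw HTML.
records = [
    {"company": "ACME", "title": "Acme Corp posts record profit", "date": "2021-03-01"},
    {"company": "ACME", "title": "Acme Corp announces dividend", "date": None},
]


def to_json(rows):
    """Serialize the records as a JSON array."""
    return json.dumps(rows, indent=2)


def to_csv(rows, fieldnames):
    """Serialize the records as CSV text with a header row."""
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)  # None values are written as empty cells
    return buffer.getvalue()


json_text = to_json(records)
csv_text = to_csv(records, ["company", "title", "date"])
```

To save to disk instead of building a string, the same `csv.DictWriter` can be given a file opened with `open(path, "w", newline="", encoding="utf-8")`.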
Who this course is for:
- Beginner Python developers who would like to learn web scraping techniques
- Anybody who wants to learn how to transform unstructured data into a structured format
- Anybody who wants to learn how to scrape news (e.g. financial news) from web portals
- Anybody who wants to gather and transform unstructured data from the web for their machine learning (NLP, text analytics) projects