Search
  • Surya Prakash Garg

ETL through multiple pages of a website in Power BI

Updated: Jul 13, 2020

In this article, we are going to go through an example of performing ETL (Extraction, Transformation, Load) function by looping through multiple pages on a website.






Pre-extraction


For this we are going to use a website called planecrashinfo.com, which has historical data of all the plane crashes from 1920 onwards.



To start, we are going to open one of the pages randomly and extract the data of that web page in Power BI, in this example, I select 1922




Next step would be to open Power BI Desktop


Extract


To extract data simply click on Get Data-> Web, see below GIF

Get Data in Power BI Desktop

Transform


In this step, we are going to transform this data a bit,

  • Split the Location/Operator by line breaks [#(cr)] to make two columns, Location, and Operator.

  • Split the Aircraft Type/Registration by line breaks [#(cr)] to make two columns, Aircraft Type, and Registration.

  • Split the fatalities column by / and ( into three columns i.e. Air Fatalities, Aborad, Ground Fatalities.

Using Parameters


The next step is to define a Parameter, we can define a parameter using the Manage Parameters option in the Query editor, however, here we would use Advanced Editor and use M Code to define our parameter.



Using Advanced Editor to pass on a Parameter

To test this we can trigger the AircraftEvents function and enter some years manually, and this will create a table in the Query Editor with the data from that specific year


Now since, we want data for all the Years from this website, we would import the list of years from this website, remove the unwanted column and unpivot multiple columns into one column, rename it as “year



Extracting year list from the website

Now since we have two queries, first one AircraftEvents, which is a functional query and second one list of all the years from the website. We need to find a way to pass this list of years to the AircraftEvents function.


To do this we will add a custom column in the Year_List query and invoke the function AircraftEvents with the argument [Year], where [Year] is the name of the column in the Year_List table.


= AircraftEvents([Year])


This will give us a custom column with many tables, one table for each year, once we expand the table we get a consolidated data table with Plane crash data of all the years.



Calling AircraftEvents table in Year_List table

Your dataset is ready for you to do cool visualizations. In the next article, I will share my PowerBI dashboard visuals.


Photo by PhotoMIX Ltd. from Pexels

Article reproduced from http://blog.pragmaticworks.com/topic/


#powerbi, #dataextraction

23 views0 comments