ETL through multiple pages of a website in Power BI
Updated: Jul 13, 2020
In this article, we are going to go through an example of performing ETL (Extraction, Transformation, Load) function by looping through multiple pages on a website.
For this we are going to use a website called planecrashinfo.com, which has historical data of all the plane crashes from 1920 onwards.
To start, we are going to open one of the pages randomly and extract the data of that web page in Power BI, in this example, I select 1922
Next step would be to open Power BI Desktop
To extract data simply click on Get Data-> Web, see below GIF
In this step, we are going to transform this data a bit,
Split the Location/Operator by line breaks [#(cr)] to make two columns, Location, and Operator.
Split the Aircraft Type/Registration by line breaks [#(cr)] to make two columns, Aircraft Type, and Registration.
Split the fatalities column by / and ( into three columns i.e. Air Fatalities, Aborad, Ground Fatalities.
The next step is to define a Parameter, we can define a parameter using the Manage Parameters option in the Query editor, however, here we would use Advanced Editor and use M Code to define our parameter.
To test this we can trigger the AircraftEvents function and enter some years manually, and this will create a table in the Query Editor with the data from that specific year
Now since, we want data for all the Years from this website, we would import the list of years from this website, remove the unwanted column and unpivot multiple columns into one column, rename it as “year
Now since we have two queries, first one AircraftEvents, which is a functional query and second one list of all the years from the website. We need to find a way to pass this list of years to the AircraftEvents function.
To do this we will add a custom column in the Year_List query and invoke the function AircraftEvents with the argument [Year], where [Year] is the name of the column in the Year_List table.
This will give us a custom column with many tables, one table for each year, once we expand the table we get a consolidated data table with Plane crash data of all the years.
Your dataset is ready for you to do cool visualizations. In the next article, I will share my PowerBI dashboard visuals.
Article reproduced from http://blog.pragmaticworks.com/topic/