Data can be visualized in Excel, however, there are some limitations in Excel that Power BI can fix. The below is the example when you have a flat file and make it to dimension model to perform better in Power BI.


Background of this project:

The usage of Box is needed to be visualized. Here are some issues that Excel cannot handle easily.

  • Data is too big (7 GB, 35 millions rows)
  • Level of the tree (hierarchy) is complicated
  • need to add classification (faculty/staff/ student)

Goal:

Create star schema data model based on the current flat file.

Steps:

Data set up:

Make sure data is accessible and check the data.

  1. The data is in SharePoint as csv file
  2. Connect data in PowerBI Desktop (since this data will NOT be refreshed, used get data> csv option)
  3. click Transform data
  4. check column quality box
  5. Change date column 
    1. Since there are "N/A" values, replace them to null. 
    2. change type from abc → date

Creating dimensions:

Once you figure out which fields can go to dimension table, use reference function to create dimensions.

  1. Create path Dimension
    1. right-click on the base table and click "reference"
    2. rename the new table (folder)
    3. remove columns
    4. remove duplicate
    5. split column by delimiter
  2. Create File Dimension
    1. right-click on the base table and click "reference"
    2. rename new table to file name
    3. remove columns (file_name & file_id)
    4. Remove duplicate
  3. Go back to the base table and remove the columns (path_id, file_name & file_id)
  4. Create Box Data table
    1. right-click on the base table and click "reference"
    2. rename new table to "Box Data"
  5. Replace slash ( / ) with pipe( | )
    1. right-click path_ids column and put values on the boxes
    2. Remove the first pipe by using "Extract" function
    3. replace with "10" for the last characters
    4. delete 10 and type Text.Length( [path_ids ] ) 
    5. = Table.AddColumn(#"Filtered Rows", "First Characters", each Text.Start([path_ids], Text.Length([path_ids]) - 1), type text)
    6. extract > text after delimiter with 5
    7. change the column name to "path"
  6. Go to Box data table and remove file name & folder name columns
    1. remove null date
  7. Go back to "Home" and "Close and Apply"
  8. Create Calender table (moved to Create Date/ Calendar table)
    1. File > Options > Data Load > uncheck the  box for auto date/time
    2. Check the earliest date and latest date by dragging the field on the visualization pane
    3. Go to the Modeling tab on the ribbon 
    4. Go to table pane and right-click the Calendar table, then choose Mark as date table
    5. add new column and create "year"
    6. Go to modeling pane and create relationships
  9.  
    1. Go back to the visualization pane and create a new measure
    2. create some table
      1. Choose Don't summarize
    3. this guy shows count=2
    4. Copy value > edit query > filter row
    5. change "=" to "<>"
    6. filter null
    7. GO back to relationships
  10. New Measure: file size
    1. hide the original column (size) by right-clicking the field
  11. New Measure: number of files (distinct)
  12. New Measure:  size (MB) 
  13. Go back to visualization, and drag fields to create some visuals
  14. Go to Data view and click folder and create new column: Hierarchy depth 
    1. data looks like this
    2. the earlier steps (changing slash to pipe) were needed because of this
    3. New Column: level 1
      1. error; insert text before the delimiter then change path_ids to path
      2. null is the problem
      3. right-click the cell > replae value
  15. Open Chrome and search for DAX formatter(daxformatter.com)
    1. copy the DAX from PBI Desktop
    2. Paste the code on the website
    3. copy the DAX from website and paste it on PBI Desktop
  16. create new columns: Level 2, Level 3, ....Level 23.
    1. create a hierarchy by dragging level 2 over level1 and so on...
  17. Change the source (need full data now)
    1. Edit query
    2. click source
    3. add new path
    4. error: comma on data breaks the report without double quotes (")
      1. right-click on file_owner data cell > text filter
      2. enable data for box_data partial (2) and folder, and box data
      3. still error
      4. Tried to open the data file with Notepad ++ but the file is too big to open
      5. These five tows are the errors
      6. modify deleted column
      7. still error








  1. Notes: 
    1. Integer is better than text 
    2. PBI changes data type based on the top 1000 rows.  Change to "based on entire data set" on the bottom left- corner of the Query Editor. Look at the data by sorting the values.
  2. Error: 
    1. click the error cell
    2. data in this row got shifted due to the double quote (") in data 
    3. Solutions: Type CsvStyle=CsvStyle.QuoteAlways


Attachments:

image2020-9-8_9-46-23.png (image/png)
image2020-9-8_9-48-11.png (image/png)
image2020-9-8_9-54-32.png (image/png)
image2020-9-8_9-56-3.png (image/png)
image2020-9-8_9-56-55.png (image/png)
image2020-9-8_13-7-23.png (image/png)
image2020-9-8_13-8-34.png (image/png)
image2020-9-8_13-15-42.png (image/png)
image2020-9-8_13-17-39.png (image/png)
image2020-9-8_13-17-52.png (image/png)
image2020-9-8_13-18-3.png (image/png)
image2020-9-8_13-19-48.png (image/png)
image2020-9-8_13-20-58.png (image/png)
image2020-9-8_13-22-37.png (image/png)
image2020-9-8_13-27-42.png (image/png)
image2020-9-8_13-30-44.png (image/png)
image2020-9-8_13-32-24.png (image/png)
image2020-9-8_13-34-20.png (image/png)
image2020-9-8_13-38-42.png (image/png)
image2020-9-8_13-42-14.png (image/png)
image2020-9-8_13-47-3.png (image/png)
image2020-9-8_13-48-38.png (image/png)
image2020-9-8_13-53-44.png (image/png)
image2020-9-8_13-57-4.png (image/png)
image2020-9-8_13-57-59.png (image/png)
image2020-9-8_13-59-5.png (image/png)
image2020-9-8_14-0-15.png (image/png)
image2020-9-8_14-2-17.png (image/png)
image2020-9-8_14-19-11.png (image/png)
image2020-9-8_14-20-2.png (image/png)
image2020-9-8_14-21-4.png (image/png)
image2020-9-8_14-26-49.png (image/png)
image2020-9-8_14-30-8.png (image/png)
image2020-9-8_14-32-7.png (image/png)
image2020-9-8_14-36-5.png (image/png)
image2020-9-8_14-36-50.png (image/png)
image2020-9-8_14-39-35.png (image/png)
image2020-9-8_14-49-28.png (image/png)
image2020-9-8_14-52-18.png (image/png)
image2020-9-8_14-53-4.png (image/png)
image2020-9-8_14-56-49.png (image/png)
image2020-9-8_14-57-16.png (image/png)
image2020-9-8_14-59-26.png (image/png)
image2020-9-8_15-0-25.png (image/png)
image2020-9-8_15-3-45.png (image/png)
image2020-9-8_15-5-54.png (image/png)
image2020-9-8_15-7-51.png (image/png)
image2020-9-8_15-19-46.png (image/png)
image2020-9-8_15-22-13.png (image/png)
image2020-9-8_15-25-37.png (image/png)
image2020-9-8_15-48-36.png (image/png)
image2020-9-8_15-48-55.png (image/png)
image2020-9-8_15-49-13.png (image/png)
image2020-9-8_15-50-3.png (image/png)
image2020-9-8_16-19-49.png (image/png)
image2020-9-8_16-25-23.png (image/png)
image2020-9-8_16-36-2.png (image/png)
image2020-9-10_7-56-9.png (image/png)