The example uses a “source” table that holds employees and their departments. Notice that the Department field has data quality issues: we have department values such as “Sales” and “Sale”, or “Managmnt” and “Management”.

If the requirement is to group items based on the Department field, normal grouping does not help. Because every value in the Department column is different, we get seven groups as the result.

[Figure: data grouped based on the exact matching]

Our requirement is to have similar items grouped together. We want an output like this:

[Figure: desired outcome]

We do not have a mapping table that tells us “Managmnt” means “Management”. This is something that can be done using Fuzzy Grouping.

Power Query Online

At the time of writing this article, this option is only available in Power Query online. It will surely become available in the desktop Power Query experiences, such as Power BI Desktop and Excel, in the future. But if you want to try it now, online is the only option, which means you need to create a new dataflow inside an organizational workspace.

[Figure: creating dataflow in Power BI Service]

Once you have created a dataflow, you can use the option to add new tables.

[Figure: defining new tables inside Power BI dataflow]

This takes you to the Get Data experience, where you can connect to any data source you want.

Performing fuzzy grouping is very similar to normal grouping in Power Query. Choose Group By on the column(s) you want. In the Group By window, you will see the “Use Fuzzy Grouping” option.

[Figure: Fuzzy Grouping in Power Query and Power BI]

Enabling fuzzy grouping results in the grouping below.

[Figure: Fuzzy grouping’s result in Power Query]

As you can see in the above screenshot, the three values “Management”, “Mangmt”, and “Managmnt” are all grouped into one. This is possible because fuzzy grouping uses an algorithm to measure the similarity of text values. The algorithm is based on the Jaccard index.

Performance Aspect

Be careful whenever you use Fuzzy Merge or Fuzzy Grouping. Any fuzzy operation on the dataset has a significant performance impact on data processing. The reason is that every text value has to be compared with every other text value in the table, the similarity of the two has to be calculated (based on the algorithm above), and then, if the pair passes the similarity threshold, the values are merged into a group. This is very time-consuming, and the more rows the table has, the more you will feel it. My suggestion is to first perform normal grouping on the items that match exactly, and then perform the fuzzy operation only on the non-matching items.

The default threshold for Fuzzy Grouping is 0.8, which means 80% similarity.

[Figure: Similarity threshold for Fuzzy grouping in Power Query]

You can change options such as Ignore case or Similarity threshold. For example, if I change the similarity threshold to 1, which means 100% matching, the result is seven groups. The lower the similarity threshold, the more non-similar values get matched. For example, a similarity threshold of 0.92 gives me the result below.

[Figure: setting the similarity threshold for fuzzy grouping]

If you want to set the other Fuzzy Grouping options, here is what they mean:

Similarity threshold: if the similarity of two text values reaches the threshold, they are considered a successful match.
Ignore case: select this option if you want the similarity algorithm to work regardless of upper or lower case letters.
Ignore spaces: select this option if you want the similarity algorithm to work regardless of the number of spaces in the text.
Transformation table: this works like a mapping table and gives you the option to use your own mapping. The table should have at least two columns, “From” and “To”. It is covered later in this post.

Table.FuzzyGroup Function in M

In addition to the option added in the graphical interface of Power Query, there is also a Power Query function that does the fuzzy grouping: Table.FuzzyGroup. If you call the fuzzy functions directly in M script, you will have some parameters to set.

Transformation Table

Sometimes in the merge operation, you need a mapping table. This table is called a Transformation Table here. Note that this table should have at least two columns, “From” and “To”. And don’t forget that Power Query is case-sensitive, so spell the column names exactly as shown!
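
To make the Jaccard index and the similarity threshold discussed above more concrete, here is a minimal Python sketch. It is not Power Query's implementation: splitting text into two-character tokens, the greedy grouping strategy, and the names bigrams, jaccard, and fuzzy_group are all assumptions made for illustration, so the scores it produces will not match what Power Query reports.

```python
def bigrams(text, ignore_case=True):
    """Split text into a set of overlapping two-character tokens (an assumed tokenization)."""
    if ignore_case:
        text = text.lower()
    if len(text) < 2:
        return {text} if text else set()
    return {text[i:i + 2] for i in range(len(text) - 1)}


def jaccard(a, b, ignore_case=True):
    """Jaccard index: size of the token intersection divided by size of the token union."""
    tokens_a = bigrams(a, ignore_case)
    tokens_b = bigrams(b, ignore_case)
    if not tokens_a and not tokens_b:
        return 1.0  # two empty strings are treated as identical
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def fuzzy_group(values, threshold=0.8, ignore_case=True):
    """Greedy grouping (hypothetical strategy): each value joins the first group
    whose first member is at least `threshold` similar, else it starts a new group."""
    groups = []
    for value in values:
        for group in groups:
            if jaccard(value, group[0], ignore_case) >= threshold:
                group.append(value)
                break
        else:
            groups.append([value])
    return groups


if __name__ == "__main__":
    departments = ["Sales", "Sale", "Management"]
    print(jaccard("Sales", "Sale"))                  # 3 shared tokens out of 4 in total
    print(fuzzy_group(departments, threshold=0.7))   # "Sales" and "Sale" share a group
    print(fuzzy_group(departments, threshold=1.0))   # only exact matches share a group
```

With a threshold of 1 only identical values (after case folding) end up together, which mirrors the seven-group result described earlier. The nested loop also shows where the cost comes from: in the worst case every value is compared with every existing group, so the number of comparisons grows roughly with the square of the number of distinct values.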