Polluted Label

Description

  • This pattern refers to a group of event attribute values that share the same structure but differ from one another because each embeds specific values that further qualify its meaning

Effect

  • Where the pattern affects the attribute that serves as the activity name, the discovered process models will over-fit the event log, because the log contains many overly specific activities that should first have been abstracted to a common activity
  • In general, if this pattern affects attributes such as case identifiers, activity names, and resource identifiers, the failure to recognise the many-to-one mapping between entities in the log and entities in the real world will degrade the quality of the analysis results

Data Quality Issues

I15 - Incorrect data: activity name, I17 - Incorrect data: resource
  • The existence of this pattern in the log, particularly where it affects the activity name, effectively masks the underlying process step through the incorrect logging of the activity name

Manifestation and Detection

  • Pattern signature: an attribute value composed of immutable boilerplate text mixed with mutable text that occurs at predictable points within the immutable text
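
One way to look for this signature is sketched below. It is only an illustration under a strong assumption: that the mutable text consists of runs of digits, so masking them approximates the immutable boilerplate. The function names, the `<NUM>` placeholder, and the sample labels are hypothetical and do not come from any particular log or tool.

```python
import re
from collections import defaultdict


def label_template(label):
    """Mask digit runs so that only the (assumed) immutable text remains."""
    return re.sub(r"\d+", "<NUM>", label)


def find_polluted_candidates(labels, min_variants=2):
    """Group labels by masked template and keep templates with several variants."""
    groups = defaultdict(set)
    for label in labels:
        groups[label_template(label)].add(label)
    return {
        template: sorted(variants)
        for template, variants in groups.items()
        if len(variants) >= min_variants
    }


# Hypothetical activity names, for illustration only.
sample_labels = [
    "Notify customer 1001 of outcome",
    "Notify customer 1002 of outcome",
    "Close claim",
]
candidates = find_polluted_candidates(sample_labels)
```

Templates that collect many distinct labels are candidates for the pattern. Mutable text that is not numeric (names, dates, free text) would need a different masking rule, and each candidate should still be confirmed by someone who knows the process.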

Remedy

  • The immutable parts of the pattern need to be known or determinable
  • The mutable parts can be removed or transferred into separate attribute values, and the immutable parts can be rearranged to standardise the name
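
The remedy can be sketched as follows, assuming the immutable text is already known and can be written as a regular expression with one named group per mutable part. The template, the standardised name, the `customer_id` attribute, and the dictionary representation of an event are all hypothetical choices made for illustration.

```python
import re

# Hypothetical known template: immutable text with a named group
# for each mutable part.
TEMPLATE = re.compile(r"^Notify customer (?P<customer_id>\d+) of outcome$")
STANDARD_NAME = "Notify customer of outcome"


def clean_event(event, attribute="activity"):
    """Return a copy of the event with the polluted label standardised.

    The mutable parts captured by the template are moved into their own
    attributes. Events that do not match the template are copied unchanged.
    """
    cleaned = dict(event)
    match = TEMPLATE.match(event[attribute])
    if match is not None:
        cleaned[attribute] = STANDARD_NAME
        cleaned.update(match.groupdict())
    return cleaned


# Hypothetical events, for illustration only.
polluted = {"case": "c1", "activity": "Notify customer 1001 of outcome"}
untouched = {"case": "c1", "activity": "Close claim"}
cleaned_polluted = clean_event(polluted)
cleaned_untouched = clean_event(untouched)
```

Keeping the extracted value as a separate attribute, rather than discarding it, preserves the information for later analysis while the activity name collapses to a single standard label.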