We are slowly creating more streams in Event Platform, but we don't currently sanitize and save them in the event_sanitized database like we do for legacy EventLogging data.
For example: mediawiki_mediasearch_interaction
Acceptance criteria:
- Allowlist system – file(s) where users can specify retention policies for streams
- Sanitization job that sanitizes event data from Event Platform streams according to retention policies specified with the allowlist system
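The two criteria above can be sketched roughly as follows. This is a minimal illustration, not the actual job: the allowlist layout, the `keep`/`hash` rule names, the field names, and the `sanitize_event` helper are all assumptions made for the example. The idea shown is that only allowlisted fields survive, and fields marked for hashing are replaced by a salted HMAC.

```python
import hashlib
import hmac

# Hypothetical allowlist: table name -> field name -> rule.
# Rule names ("keep", "hash") and field names are illustrative assumptions.
ALLOWLIST = {
    "mediawiki_mediasearch_interaction": {
        "action": "keep",
        "session_id": "hash",
        "meta": {"dt": "keep"},  # nested dict = rules for a nested field
    },
}


def sanitize_event(event, rules, salt):
    """Return a copy of `event` containing only allowlisted fields."""
    sanitized = {}
    for field, rule in rules.items():
        if field not in event:
            continue
        value = event[field]
        if rule == "keep":
            sanitized[field] = value
        elif rule == "hash":
            # Salted HMAC so raw identifiers are not stored long term.
            digest = hmac.new(salt, str(value).encode("utf-8"), hashlib.sha256)
            sanitized[field] = digest.hexdigest()
        elif isinstance(rule, dict) and isinstance(value, dict):
            # Recurse into nested fields with their own rules.
            sanitized[field] = sanitize_event(value, rule, salt)
    # Fields that are not in the allowlist are dropped implicitly.
    return sanitized
```

Defaulting to "drop unless listed" means a new field added to a stream is not retained until someone explicitly allowlists it.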
TODO:
- Refactor EventLoggingSanitization job to something more generic: RefineSanitize
- Move all event_sanitized partitions to lowercased directory names to avoid re-refining data
- Apply backwards compatible usage of RefineSanitize for eventlogging
- Create new generic RefineSanitize job to be able to sanitize any data
- Copy all existing data that the generic RefineSanitize job targets into event_sanitized
- Create a data purge job that removes data older than 90 days from all tables in the event database
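As a rough sketch of the selection logic a 90-day purge job might use: the function name and the assumption that each partition can be mapped to a timestamp are hypothetical, and actually dropping partitions and deleting files is out of scope here.

```python
from datetime import datetime, timedelta, timezone

# 90-day retention, as stated in the task.
RETENTION = timedelta(days=90)


def partitions_to_purge(partition_times, now, retention=RETENTION):
    """Return the partition timestamps that are strictly older than the cutoff."""
    cutoff = now - retention
    return [t for t in partition_times if t < cutoff]
```

For example, calling it with `now` set to 2021-04-01 would select a partition from 2020-12-01 and leave one from 2021-03-01 in place.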