Mycroft-core: Mycroft's blocking of curse words `****` interferes with searches and other functionality

Created on 11 Nov 2017  ·  8 Comments  ·  Source: MycroftAI/mycroft-core

(For this bug report #### will indicate my own self-selected censorship, as I don't know the policies of this project regarding cursing. **** will indicate Mycroft censoring words.)

While testing a skill, I realized that somewhere in the parsing of my input, detected curse words are being turned into asterisks; for example, the query "#### you" is interpreted as "**** you". This may be a reasonable default, but I want to play albums with explicit titles, and this feature breaks that functionality.

This seems to affect the core, not just a third-party skill:

Steps to reproduce:

  1. Say a curse word after the wake word. E.g., "hey Mycroft, #### you." (If you were searching for a song title you could be saying something like, "hey Mycroft, play #### the police by NWA")

Observed behavior:
Mycroft reports and interprets "#### you" as "**** you."

Expected behavior:
Mycroft doesn't censor curse words, as they're necessary for playing songs with explicit titles. Ideally, this behavior would also be configurable and documented.

16:50:56.682 - mycroft.client.speech.listener:transcribe:144 - DEBUG - STT: f*** you                               
16:50:56.682 - __main__:handle_utterance:55 - INFO - Utterance: [u'f*** you'] 
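The log shows that the transcription returned by STT already contains the masked word ("f***"), so skills only ever receive the censored text. A skill that wants to detect this (for example, to warn the user instead of silently searching for a masked title) could scan the utterance for masked tokens. The helper below is a hypothetical sketch, not part of mycroft-core; it simply treats any whitespace-separated token containing two or more consecutive asterisks as masked.

```python
def find_masked_words(utterance):
    """Return tokens that look profanity-masked, e.g. "f***" or "****".

    Hypothetical helper: a token counts as masked if it contains "**".
    """
    return [token for token in utterance.split() if "**" in token]
```

For the utterance in the log, `find_masked_words("f*** you")` returns `["f***"]`.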
Bug - quick hacktoberfest help wanted

All 8 comments

(This is actually a real bug despite my newness to the project and its unusual nature.)

I ran into this too. It would be much better to have this as an option rather than the default.

This was up for discussion last week; I think the conclusion was to make changes that allow this to be turned off. @matheuslima can you comment on this?

I am from Jersey and swear a lot, this is a problem for me too.

Any progress on this issue? The censoring is really annoying.

Hey, I wasn't around when this issue was first raised, so I wasn't part of those discussions, but it's actually the Google STT service we use that does the censoring. We'd need to see if there's a flag we can set on the requests to turn it off. If anyone already knows, please chime in.

From a very brief skim of this issue, I've been able to determine the following:

  • Most STT services that Mycroft supports (with Google STT currently being the default) have a profanity_filter flag which is passed to the API.

  • In Mycroft's STT classes, this is set to false for the IBMWatson STT class, as per this line of code.

  • In the GoogleSTT class, this parameter does not appear to be set at all, and I think this is the root cause of this issue. These are the docs for Google's STT; the parameter is called ProfanityFilter.

  • However, I don't think the answer is just to set profanity_filter to false in the GoogleSTT class. I think we should give users the ability to set this on a per-device basis, just as wake words, STT engines, and TTS voices can be set on a per-device basis at: https://account.mycroft.ai/devices/

  • Therefore I think this requires changes to the Mycroft Home backend to have an ideal implementation.
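For reference, in Google's Cloud Speech-to-Text v1 REST API the flag is the `profanityFilter` field of `RecognitionConfig`, sent in the JSON body of a POST to `https://speech.googleapis.com/v1/speech:recognize`. The sketch below only builds that body from a per-device or per-config setting; it does not send the request or handle authentication. The function name and the LINEAR16 / 16 kHz audio assumptions are illustrative, and this is not necessarily the API behind Mycroft's default GoogleSTT class.

```python
import base64
import json


def build_recognize_request(audio_bytes, language="en-US",
                            profanity_filter=False, sample_rate=16000):
    """Build a JSON body for Cloud Speech-to-Text v1 speech:recognize.

    Assumes 16-bit linear PCM audio. profanity_filter=False asks the
    service not to mask profanities in the transcript.
    """
    body = {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate,
            "languageCode": language,
            "profanityFilter": bool(profanity_filter),
        },
        "audio": {
            # Inline audio must be base64-encoded inside the JSON body.
            "content": base64.b64encode(audio_bytes).decode("ascii"),
        },
    }
    return json.dumps(body)
```

The design point is that the setting has to end up in the request itself; where it comes from (the Mycroft Home backend or a local config file) is a separate question.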

As a workaround, I tried implementing a new self.config variable in mycroft.conf:

  // Profanity filter
  "profanity_filter": false,
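Note that mycroft.conf allows `//` comments, so it is not plain JSON and `json.loads` rejects it as-is. A minimal sketch of reading such a setting, assuming only whole-line `//` comments and a top-level `profanity_filter` key (both helper names are hypothetical), with the filter off when the key is missing, as suggested later in the thread:

```python
import json


def parse_commented_json(text):
    """Parse JSON text that may contain whole-line // comments.

    Simplified sketch: comments after a value on the same line are
    not handled.
    """
    lines = [line for line in text.splitlines()
             if not line.lstrip().startswith("//")]
    return json.loads("\n".join(lines))


def profanity_filter_enabled(config_core):
    """Read the hypothetical top-level "profanity_filter" key (default: off)."""
    return bool(config_core.get("profanity_filter", False))
```

For example, `parse_commented_json('{\n  // Profanity filter\n  "profanity_filter": false\n}')` yields `{"profanity_filter": False}`.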

This then requires support in the STT classes. This is what I tried in the STT base class, but it didn't work:

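from abc import ABCMeta, abstractmethod

from speech_recognition import Recognizer

from mycroft.configuration import Configuration


class STT(metaclass=ABCMeta):
    """ STT Base class, all STT backends derive from this one. """
    def __init__(self):
        config_core = Configuration.get()
        self.lang = str(self.init_language(config_core))
        config_stt = config_core.get("stt", {})
        self.config = config_stt.get(config_stt.get("module"), {})
        self.credential = self.config.get("credential", {})
        self.recognizer = Recognizer()
        self.can_stream = False
        # Profanity filter: check the active STT module's section first,
        # then fall back to a top-level key in mycroft.conf.
        self.profanity_filter = self.config.get(
            "profanity_filter", config_core.get("profanity_filter"))

    @staticmethod
    def init_language(config_core):
        lang = config_core.get("lang", "en-US")
        langs = lang.split("-")
        if len(langs) == 2:
            return langs[0].lower() + "-" + langs[1].upper()
        return lang

    @abstractmethod
    def execute(self, audio, language=None):
        # `self` cannot be used in a default argument (defaults are
        # evaluated when the method is defined), so each subclass must
        # read self.profanity_filter here and pass it to its backend's API.
        pass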

(At this point my microphone stopped working with Mycroft for some strange reason, and nothing I did could get it to pick up the microphone again, so I couldn't continue testing.)

This didn't work: the ProfanityFilter is still set to True, and the asterisks remain. Still, this might be a clue for others who want to tackle this.
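Storing the value on the base class does not change anything by itself: the flag only takes effect if each backend's `execute()` puts it into the request it actually sends. The sketch below shows that pattern with stand-in classes; `STTBase`, `ExampleCloudSTT`, the injected `send` callable, and the request keys are all hypothetical and not Mycroft's real API.

```python
from abc import ABCMeta, abstractmethod


class STTBase(metaclass=ABCMeta):
    """Stripped-down stand-in for an STT base class."""

    def __init__(self, config):
        # Store the setting once; each subclass decides how to send it.
        self.profanity_filter = bool(config.get("profanity_filter", False))

    @abstractmethod
    def execute(self, audio, language=None):
        pass


class ExampleCloudSTT(STTBase):
    """Hypothetical backend that forwards the flag in every request."""

    def __init__(self, config, send):
        super().__init__(config)
        self.send = send  # callable that performs the HTTP request

    def execute(self, audio, language=None):
        request = {
            "languageCode": language or "en-US",
            "profanityFilter": self.profanity_filter,
            "audio": audio,
        }
        return self.send(request)
```

Injecting `send` keeps the sketch testable: a fake sender can capture the request and confirm that `profanityFilter` is really present, which is exactly what was missing in the attempt above.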

I tested the Google STT module, and the profanity filter seems to be off by default, but the module requires a Google Cloud account to use.

If I recall correctly, the API used by the Mycroft backend (which is not the Google Cloud Speech-to-Text service but another, older Google API) always has the filter enabled and doesn't allow turning it off.

A config setting is probably a good idea, though. In my opinion, filtering should be off by default.
