(For this bug report, #### will indicate my own self-selected censorship, as I don't know the policies of this project regarding cursing. **** will indicate Mycroft censoring words.)
While testing a skill I realized that somewhere in the parsing of my input it's turning detected curse words into asterisks, such as the query "#### you" being interpreted as "**** you". This may be a reasonable default, but I want to play albums that contain explicit titles, and this feature breaks that functionality.
This seems to affect the core, not just a third party skill:
Steps to reproduce:
Say "#### you." (If you were searching for a song title you could be saying something like, "hey Mycroft, play #### the police by NWA")
Observed behavior:
Mycroft reports and interprets "#### you" as "**** you."
Expected behavior:
Mycroft doesn't censor curse words, as they're necessary for playing songs with explicit titles. Optionally, this should be a configurable and documented behavior.
16:50:56.682 - mycroft.client.speech.listener:transcribe:144 - DEBUG - STT: f*** you
16:50:56.682 - __main__:handle_utterance:55 - INFO - Utterance: [u'f*** you']
(This is actually a real bug despite my newness to the project and its unusual nature.)
I ran into this too. Would be much better to have it just as an option instead of as a default.
This was up for discussion last week. I think the conclusion was to make changes to allow this to be turned off. @matheuslima can you comment on this?
I am from Jersey and swear a lot, this is a problem for me too.
Any progress on this issue? The censoring is really annoying.
Hey I wasn't around when this issue first got raised so wasn't part of those discussions, but this is actually the Google STT service that we use doing the censoring. Would need to see if there's a flag we can set on the requests to turn it off. If anyone knows already, please chime in.
From a very brief skim of this issue I've been able to determine the following:
Most STT services that Mycroft supports (with Google STT currently being the default) have a profanity_filter flag which is passed to the API.
In Mycroft's STT classes, this is set to false for the IBMWatson STT class (as per this line of code); however, this parameter does not appear to be set for the GoogleSTT class.
In the GoogleSTT class, this parameter does not appear to be set, and I think this is the root cause of this issue. These are the docs for Google's STT - the parameter is called ProfanityFilter.
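As a rough illustration of where that flag lives, here is a minimal sketch of the `config` object for a Google Cloud Speech-to-Text v1 REST request, where the field is spelled `profanityFilter`. The helper name `build_recognition_config` and the encoding and sample-rate values are assumptions made for the example; this only builds the payload and does not show how Mycroft's GoogleSTT class actually talks to Google.

```python
import json


def build_recognition_config(language_code="en-US", profanity_filter=False):
    """Build the `config` part of a Cloud Speech-to-Text v1 recognize request.

    Hypothetical helper for illustration; the encoding and sample rate
    below are placeholder values, not Mycroft's real settings.
    """
    return {
        "encoding": "FLAC",
        "sampleRateHertz": 16000,
        "languageCode": language_code,
        # False (or omitting the field) asks the service not to mask profanity.
        "profanityFilter": profanity_filter,
    }


config = build_recognition_config(profanity_filter=False)
print(json.dumps(config, indent=2))
```

Passing the flag explicitly, rather than relying on the service default, keeps the behavior visible in one place if it later becomes a user setting.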
However, I don't think the answer is just to set profanity_filter to be false in the GoogleSTT class. I think that we should give users the ability to set this on a per-device basis, just as Wake Words and STT engines and TTS voices can be set on a per-device basis at: https://account.mycroft.ai/devices/
Therefore I think this requires changes to the Mycroft Home backend to have an ideal implementation.
What I tried to do as a workaround was implement a new self.config variable in mycroft.conf:
// Profanity filter
"profanity_filter": false,
This then requires support in the STT classes, i.e. this is what I tried in the STT base class, but it didn't work:
class STT(metaclass=ABCMeta):
    """ STT Base class, all STT backends derives from this one. """
    def __init__(self):
        config_core = Configuration.get()
        self.lang = str(self.init_language(config_core))
        config_stt = config_core.get("stt", {})
        self.config = config_stt.get(config_stt.get("module"), {})
        self.credential = self.config.get("credential", {})
        self.recognizer = Recognizer()
        self.can_stream = False
        # set profanity filter
        self.profanity_filter = self.config.get('profanity_filter')

    @staticmethod
    def init_language(config_core):
        lang = config_core.get("lang", "en-US")
        langs = lang.split("-")
        if len(langs) == 2:
            return langs[0].lower() + "-" + langs[1].upper()
        return lang

    @abstractmethod
    def execute(self, audio, language=None, ProfanityFilter=self.profanity_filter):
        pass
(At this point my microphone stopped working with Mycroft for some strange reason, and nothing I did could get it to pick up the microphone again, so I couldn't continue testing)
This didn't work - the ProfanityFilter is still set to True, and the *** remain. But, this might be a clue for others who want to tackle this.
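One problem with the attempt above, independent of any backend behavior: Python evaluates default argument values when the `def` statement runs, which is while the class body is executing, so no `self` exists at that point. The sketch below first shows that failure and then a possible alternative that reads the flag from the instance at call time. `STT` here is a simplified stand-in with no Mycroft imports, and `FakeSTT` and the `profanity_filter` key are hypothetical names used only for illustration.

```python
from abc import ABCMeta, abstractmethod

# A default that refers to `self` fails as soon as the class body is executed.
definition_error = None
try:
    class Broken:
        def execute(self, audio, language=None,
                    ProfanityFilter=self.profanity_filter):
            pass
except NameError as err:
    definition_error = str(err)
print("class definition failed:", definition_error)


class STT(metaclass=ABCMeta):
    """Simplified stand-in for the STT base class."""

    def __init__(self, config=None):
        # `config` stands in for the per-module STT section of mycroft.conf.
        self.config = config or {}
        # Hypothetical setting; treated as unfiltered when the key is absent.
        self.profanity_filter = bool(self.config.get("profanity_filter", False))

    @abstractmethod
    def execute(self, audio, language=None):
        pass


class FakeSTT(STT):
    """Hypothetical backend that only reports the flag it would send."""

    def execute(self, audio, language=None):
        # The flag is looked up on the instance when the call happens.
        return {"language": language or "en-US",
                "profanity_filter": self.profanity_filter}


print(FakeSTT({"profanity_filter": False}).execute(b""))
```

Each real backend would still need to pass the value on to its own API call in whatever form that API expects.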
I tested the Google STT module and the profanity filter seems to be off by default, but using it requires you to have a Google Cloud account.
The API used by the Mycroft backend (which is not the google cloud Speech to Text service, but another of Google's older APIs) always has it enabled and doesn't allow turning it off if I recall correctly.
A config setting is probably a good idea though. Default should be off in my opinion.