Linux: I2C Broadcom bug workaround.

Created on 20 Mar 2013  ·  73Comments  ·  Source: raspberrypi/linux

The broadcom BCM2835 has a (hardware) bug in the I2C module.

The BCM implements "clock stretching" in a bad way:

Instead of ensuring a minimum "hightime" for the clock, the BCM lets the I2C clock internally run on, and when it's time to flip the clock again, it checks wether or not the clock has become high, and if it is low, it does a "clock stretch". However if just in the few nanoseconds before this check the slave releases the clock signal indicating it is ready for a new clock, the clock pulse may become as short as 40ns. (that's the lowest I'm able to measure, and that of course is quite rare). With a 100kHz (10 microseconds) I2C clock, the "bad" moments to release the clock are 5 microseconds apart.

My slaves require a clock pulse of at least 500ns. (which should be a factor of ten shorter than what the master should generate when it is configured for 100kHz operation).

That is a hardware bug that we can't do much about at the moment. ( Tip for the hardware guys: instead of implementing clock stretching in this way: inhibit the counting of the CLOCKDIV counter while the clock remains low. This will probably clock-stretch a few fast-clocks on every cycle, but standards conformance will improve substantially).

When this occurs, the BCM will send a byte, with a very short first clock pulse. The slave then doesn't see the first clock pulse, so when at the end of that byte it is time to ACK that byte, the slave is still waiting for the 8th bit. The BCM interprets this as a "NAK" and aborts the I2C transfer. However at that point in time the slave sees its 8th clock pulse and issues an "ACK" (Pulls the SDA line low). This in turn means that the BCM cannot issue a proper "START" condition for the next transfer.

Suggested software workaround to reduce the impact of this hardware bug: When a transfer is aborted due to a "NAK" for a byte, issue a single extra clock cycle before the STOP to synchronize the slave. This probably has to be done in software.

This only happens in the "error path" when the slave doesn't ACK a byte. Maybe it's best to only do this when it is NOT the first byte in a transfer, as the bug cannot happen on the first byte of a transfer.

Does this software fix belong in the driver? Yes. It's a driver for a piece of hardware that has a bug. It is the duty of the driver to make this buggy hardware work the best it can.

Bug

Most helpful comment

Hi hmf2015,
This is not a forum. This is a bug tracker. This list-of-comments discusses a bug which is unrelated to your problems. Please discuss your problem with an appropriate forum or helpdesk. Again, we appreciate your question: "Maybe it's the same? Maybe my description helps solve that bug?" but now we know for sure that what you are seeing is unrelated, please do not "pollute" this discussion further.

All 73 comments

If a workaround in the driver makes it work better for the majority of users, then sure it should be included.
Do you have a suitable patch?

No. Not yet.

1) I have not delved into the I2C driver yet. Around a year ago, I had plans to write it myself but before I got around to it, someone else had written it. So someone who already knows this driver might be better equipped to do this as a five minute project. It would take me at least several hours to get to know the driver.
2) It pays to first discuss it if you (and maybe others) agree that a fix/workaround is warranted.
3) I just wanted to document this as a "todo" we shouldn't forget it....
4) Maybe someone would suggest a different workaround.

Speaking about workarounds: The workaround could also be triggered by the SDA line still being low when we are trying to initiate a new transaction. That is clearly a "problem" situation.

The "it goes wrong" case might be rare for "normal" cases. Many people will have a hardware-i2c module on the I2C-bus. Back when I2C was invented "10us" might have been a "very short" timeframe for some chips to handle the requests, so clock stretching was necessary. Nowadays any hardware I2C implementation should be able to handle "fast I2C" up to even faster.

This combines with the way the atmel guys implemented their async modules. Instead of having the module run on the externally provided clock (i2c-slave or SPI-slave), the module still runs on the processor clock. And it synchronizes all incoming signals by passing them through a bunch of filpflops. Good hardware design. The alternative is IMHO better: run the module off the external clock, and synchronize when the data passes to the other (cpu-) clock domain. This would for example allow the SPI module to function at 20MHz, even though the CPU runs at only 8.

Anyway. Enough hardware rambling.

Suggested workaround: Issue an extra clock cycle when at the beginning of a transaction the SDA line is still low. (it should start out high to be able to issue a "start" condition).

Firstly, I know little about I2C. I have spoken to Gert and others about workarounds.

There are two problems with I2C known about:

  1. The clock stretching problem you have described
  2. Problem with I2C state machine when doing restart

For 1, Gert's view is there is no foolproof workaround.
If I2C slave doesn't support clock stretching you are fine.
If I2C clock stretches by a bounded amount, then lowering the I2C clock frequency can avoid the problem
Switching to a bit banging driver for slaves that don't fall in the previous cases is a safe workaround.

I don't know if your proposal will work. I'd be interested to hear if it reduces the problem, or fixes it completely.

Gert says he has a desciption of the bug that he'll try and get released.

For 2, in the GPU I2C driver we have some code working around a problem with state machine:

/***********************************************************
 * Name: i2c_actual_read
 *
 * Arguments:
 *       const I2C_PERIPH_SETUP_T *periph_setup,
         const uint32_t sub_address,
         const uint32_t data_to_read_in_bytes,
         void *data
 *
 * Description: Routine to actually transfer data to the I2C peripheral
 *
 * Returns: int == 0 is success, all other values are failures
 *
 ***********************************************************/
static int32_t i2c_actual_read(  const I2C_PERIPH_SETUP_T *periph_setup,
                                 const I2C_USER_T *user,
                                 const uint32_t sub_address,
                                 const uint32_t read_nbytes,
                                 void *data,
                                 const int using_interrupt)
{
   int32_t rc = -1; /* Fail by default */
   int32_t status = 0;
   int32_t nbytes;

   uint32_t sub;
   uint8_t* read_data = (uint8_t*)data;
   uint32_t data_read = 0;

   if( NULL != periph_setup && NULL != data ) {

      // confirm that the latch is held for this transaction
      assert( i2c_state.periph_latch[ periph_setup->port ] != rtos_latch_unlocked() );

      // -> START
      // -> Slave device address | WRITE
      // <- Ack
      // -> Sub address
      // <- Ack
      // -> Sub address
      // <- Ack
      // -> ReSTART
      // -> Slave device address | READ
      // <- Ack
      // <- Data[0]
      // -> Ack
      // <- Data[1]
      // -> nAck
      // -> STOP

      I2CC_x( periph_setup->port )     = I2CC_EN | I2CC_CLEAR; /* Enable I2C and clear FIFO */
      I2CS_x( periph_setup->port )     = I2CS_ERR | I2CS_DONE | I2CS_CLKT; /* clear ERROR and DONE bits */
      I2CA_x( periph_setup->port )     = periph_setup->device_address;

      sub = sub_address;
      nbytes = periph_setup->sub_address_size_in_bytes;

      if( 0 == nbytes ) {
         /* No subaddress to send - just issue the read */
      } else {
         /*
           See i2c.v: The I2C peripheral samples the values for rw_bit and xfer_count in the IDLE state if start is set.

           We want to generate a ReSTART not a STOP at the end of the TX phase. In order to do that we must
           ensure the state machine goes RACK1 -> RACK2 -> SRSTRT1 (not RACK1 -> RACK2 -> SSTOP1). 

           So, in the RACK2 state when (TX) xfer_count==0 we must therefore have already set, ready to be sampled:
           READ ; rw_bit     <= I2CC bit 0   -- must be "read"
           ST;    start      <= I2CC bit 7   -- must be "Go" in order to not issue STOP
           DLEN;  xfer_count <= I2CDLEN      -- must be equal to our read amount

           The plan to do this is:
           1. Start the sub-add]ress write, but don't let it finish (keep xfer_count > 0)
           2. Populate READ, DLEN and ST in preparation for ReSTART read sequence
           3. Let TX finish (write the rest of the data)
           4. Read back data as it arrives

         */
         assert( nbytes <= 16 );
         I2CDLEN_x( periph_setup->port )  = nbytes;
         I2CC_x( periph_setup->port )    |= I2CC_EN | I2CC_START; /* Begin, latch the WRITE and TX xfer_count values */

         /* Wait for us to leave the idle state */
         while( 0 == ( I2CS_x(periph_setup->port) & (I2CS_TA|I2CS_ERR) ) ) {
            _nop();
         }
      }

      /* Now we can set up the parameters for the read - they don't get considered until the TX xfer_count==0 */
      I2CDLEN_x( periph_setup->port )  = read_nbytes;
      I2CC_x( periph_setup->port )     |= I2CC_EN | I2CC_START | I2CC_READ;

      /* Let the TX complete by providing the sub-address */
      while( nbytes-- ) {
         I2CFIFO_x( periph_setup->port ) = sub & 0xFF; /* No need to check FIFO fullness as sub-address <= 16 bytes long */
         sub >>= 8;
      }

      /* We now care that the transmit portion has completed; the FIFO is shared and we mustn't read out
         any of the data we were planning on writing to the slave! */

      /* Wait for peripheral to get to IDLE or one of the two RX states - this way we *know* TX has completed or hit an error */
      {
         uint32_t state;
         bool_t state_transition_complete;
         bool_t error_detected;
         do { 
            state = (I2CS_x( periph_setup->port ) & 0xf0000000) >> 28;
            state_transition_complete = ((state == 0) || (state == 4) || (state == 5));
            error_detected = (I2CS_x(periph_setup->port) & (I2CS_ERR | I2CS_CLKT)) != 0;
         } while(!state_transition_complete && !error_detected);

         if (error_detected) {
            /* Clean up, and disable I2C */
            I2CC_x( periph_setup->port ) &= ~(I2CC_INTD | I2CC_INTR);
            I2CC_x( periph_setup->port ) &= ~(I2CC_START | I2CC_READ);
            I2CS_x( periph_setup->port ) = I2CS_CLKT | I2CS_ERR | I2CS_DONE;
            I2CC_x( periph_setup->port ) |= I2CC_CLEAR;
            I2CC_x( periph_setup->port ) = 0;
            return -1;
         }
      }

      if (using_interrupt)
      {
         /* Wait for interrupt to complete. */
         i2c_state.active_buffer[periph_setup->port] = data;
         i2c_state.active_buffer_length[periph_setup->port] = read_nbytes;
         i2c_state.active_buffer_offset[periph_setup->port] = 0;
         i2c_state.pending_transfer[periph_setup->port] = I2C_PENDING_TRANSFER_READ;
         RTOS_LATCH_T latch = rtos_latch_locked();
         i2c_state.pending_latch[periph_setup->port] = &latch;

         /* Enable interrupt. */
         I2CC_x( periph_setup->port ) |= I2CC_INTD | I2CC_INTR;

         rtos_latch_get (&latch);

         i2c_state.pending_latch[periph_setup->port] = NULL;
         data_read = i2c_state.active_buffer_offset[periph_setup->port];

         rc = (data_read == read_nbytes) ? 0 : -1;
      }
      else
      {
         uint32_t time_now = 0;

         /* Loop until we've read all our data or failed. */
         while( 0 == ( I2CS_x(periph_setup->port) & (I2CS_TA|I2CS_ERR|I2CS_DONE) ) ) {
            _nop();
         }

         /* Wait for some data to arrive - we should wait, at most, I2C_TIMEOUT_IN_USECS for data to arrive every time we start waiting */
         time_now = i2c_state.systimer_driver->get_time_in_usecs( i2c_state.systimer_handle );
         while( ((i2c_state.systimer_driver->get_time_in_usecs( i2c_state.systimer_handle ) - time_now) < I2C_TIMEOUT_IN_USECS)
                && ( data_read < read_nbytes )
                && !(I2CS_x( periph_setup->port ) & I2CS_ERR) ) 
         {
            if (I2CS_x( periph_setup->port ) & I2CS_RXD) 
            {
               read_data[ data_read ] = I2CFIFO_x( periph_setup->port );
               data_read++;
               time_now = i2c_state.systimer_driver->get_time_in_usecs( i2c_state.systimer_handle  );
            }
         }

         if( (data_read != read_nbytes) /* Did we read all the data we asked for? */
             || ( (read_nbytes - data_read) != I2CDLEN_x( periph_setup->port ) ) /* Has DLEN decremented? */
             || ( 0 != (I2CS_x( periph_setup->port ) & I2CS_ERR) ) ) { /* Are there any errors? */
            rc = -1;
         } else {
            while( I2CS_DONE != (I2CS_x(periph_setup->port) & I2CS_DONE) ); /* Wait for the peripheral */
            rc = 0;
         }
      }

      /* Clean up, and disable I2C */
      I2CC_x( periph_setup->port ) &= ~(I2CC_INTD | I2CC_INTR);
      if(I2CS_x( periph_setup->port ) & I2CS_ERR) {
         //Wait for it to be idle
         while(I2CS_x( periph_setup->port ) & I2CS_TA)
            _nop();
      }
      I2CS_x( periph_setup->port ) = I2CS_ERR | I2CS_DONE;
      //Finally disable the I2C
      I2CC_x( periph_setup->port ) = 0x0;
   }

   if( !user->skip_asserts ) {
      assert( rc >= 0 );
      _nop();
   }

   return rc;
}      

I'm not sure if people are hitting this issue, but the pasted could may be helpful if you are.

It is not possible to completely eradicate this hardware bug.

When it happens, the master and slave disagree on the number of clock cycles. Therefore, the data, as interpreted by the slave and the data as the master (= raspberry pi = driver/application) intended differ. You might be writing a completely bogus value in a register somewhere. Highly annoying.

My suggested workaround would at least cause the next transaction to work. The impact would be halved: only one transaction botched instead of two. It would not eliminate the problem.

If I2C slave doesn't support clock stretching you are fine.

If the I2C slave doesn't require clock stretching....

If I2C clock stretches by a bounded amount, then lowering the I2C clock frequency can avoid the problem

Yes. I have 10 devices "in the field" at one location. Apparently the clock frequency (RC clock) of three of the modules is different in such a way that they end up "done" with the clock stretching in exactly the wrong moment. Reducing the clock frequency fixed the problems for those modules.

If the clock stretching delay is NOT fixed, but, say, varies by more than 5 microseconds, there is a more or less statistical chance of hitting the problem. In my case the slave requires 0.25 microseconds of clock width. So clock stretching ends in the 5% of the time leading up to the clock changing moment for the BCM, then things go wrong. So, in this case, about 1 in 20 transfers would go wrong. (specs say min 250ns for a clock pulse. That in fact doesn't mean that it won't be seen if it is shorter. I think the chances of it being seen goes linearly from 100% at > 250ns to 0% at < 125ns. )

I've spent yesterday stretching the clock stretching period in such a way that it would fall around the 2.5 microsecond mark in the 5 microsecond window-of-opportunity. If I'd aim for the start of the window and my RC clock runs bit fast, I'd hit the "0" which triggers the bug. If I aim for 5, same thing. So now I'm aiming in the middle, away from the bad places. But change the I2C clock to 80kHz, and BAM, I'm aiming right for the sensitive spot... (and if it doesn't happen with 80kHz, it'll happen with some value between 80 and 100).

Here is Gert's description of the I2C bug:
https://dl.dropbox.com/u/3669512/2835_I2C%20interface.pdf

Haha! I have instrumented my test-board in exactly the same way. I've cut the SCL trace and mounted a 0.1" jumper connector on there. While not measuring there is a normal jumper there, or a female connector with a 100 ohm resistor when I am.

I don't think it only happens on the first clock. That would mean I could work around the problem by either NOT clock stretching at all (as most hardware I2C chips probably do), or by causing the clock stretch to last at least a full cycle.

I have instrumented my i2c slave code in such a way that I can cause the clock stretch to change by 0.5 microsecond increments. I can command it to change the clock stretch to XXX over I2C and then it works like that for 50 miliseconds before changing back to the default values. (in case it doesn't work we need it working again. This way I can quickly scan a whole range of values for these settings, while monitoring the SDA and SCL lines with a logic analyzer. Sending a "use delay XXX" and then sending a sting takes 25 miliseconds. THen incrementing a counter in the script and sending the "next delay" takes the shell script about 100ms. So I can scan 100 possible values in about 10 seconds.

I HAVE seen the bug happening on longer delays. So the theory: "it only happens on the first" is not correct.

As to the "super efficient" interrupt handler: Yes, I can do that too. The thing is: The clock stretching feature allows me to put the "handle this byte" code in the interrupt routine and not worry about it taking a few microseconds or a few miliseconds. In theory the clock stretching will make sure that the master waits for me to finish handling this byte before continuing with the next one. That plan is totally busted with the bug in the 2835. I now just cause clock stretching to stop, and then hope to finish handling the byte in the 70 microseconds that follow.

I have to "cycle count" to aim for the middle of the 5 microsecond half-clock for me to release the clock stretching. If I'd aim for the start (to cause a 4.9 microsecond clock-high-period) the clock on my slave might be running a few percent faster and cause the bug to trigger a half-cycle earlier. So both ends of the half-clock cycle are dangerous, I have to aim for a short, but valid enough clock cycle.

The change in the module is IMHO simple: When SCL is "driven high" (i.e. not driven at all), stop the I2C module master clock. This has the side effect of allowing much longer I2C buses because automatic clock stretching occurs when the bus capacitance is above spec.

Thanks for trying to help, iz8mbw, but those have nothing to do with this (hardware) bug.

I've measured my I2C clock and it comes out as 100kHz which the driver thinks it should be. The driver has a "clock source" object which it queries for the frequency and it uses that. I have not yet instrumented my driver to print out the frequency.

Just for information for people who want to SEE the bug in action I have some "proof" in the form of a logic analsyer dump.
Hmm I can attach images, but not the binary LA dump. That one is here: http://prive.bitwizard.nl/LA_with_bug_scan_50_to_100_microseconds_delay.bin.gz

In the image LA_with_bug_shot1.png you'll see the short burst of activity: "set delay to XXX" then a small burst that tries to write some 10 bytes, but it gets aborted due to a NACK. On the right you see a proper 10-byte-packet. The SDA line remains low until the next burst that is misaligned (all bits are shifted, and in my slave the "xth-byte-in-packet" counter has not been reset by the non-existant start condition at the start of that packet.

In the image LA_with_bug_shot2.png I've zoomed in to the section where the short pulse happens. This the longest pulse that I've seen go wrong: 290ns. (above datasheet-spec for the Atmel!)
LA_with_bug_shot1
LA_with_bug_shot2

Oh so THAT's where that came from!

I've got a project I'm trying to control with I2C from a raspberry pi and I'm honestly close to just giving up and either switching to a beagleboard or having the Pi talk to an FPGA via SPI which then does the I2C stuff.

I have a board in the mail that should be able to do that (expected delivery: tomorrow). SPI connector, I2C connector. Would be a nice project to try with that board. Send me an email if you want to be kept informed.

Current situation: I'm seeing behaviour compatible with Gert's: "only the first cycle can go wrong" when the slave is an ATMEGA, and I'm seeing the bug in full force when the slave is one of my ATTINYs.

Here are the delays I tested:
i2c_delays_atmega
and the shortest SCL pulse observed during this test-session was 4.5 microseconds.

Now testing with the attiny slave, the tested delays for the ACK are:
i2c_delays_attiny

and the minimum pulse width is 41ns (my measurement resolution):
i2c_delays_attiny2

Ah. Found it! It seems that clock stretching before the ACK of a byte is "dangerous" but clock stretching AFTER the ACK is ok.....

I and others are certainly hitting the repeated start issue. I have a userspace hack at http://cpansearch.perl.org/src/MDOOTSON/HiPi-0.26/BCM2835.xs in proc hipi_i2c_read_register_rs. This seems to work for the users who have reported trying it or incorporating the C code ( though that's only 5 people ), even though I had not figured out correctly how to determine that the TX stage had completed. I suppose I need to learn to compile kernel modules and run existing tests against code above.

Couldn't we just force the kernel to ignore the I2C hardware and use bitbanged GPIO-based I2C on the same pins? I would think the overhead for that would be more acceptable then the bugs in the hardware itself.

What overhead do you expect? The current driver defaults to 100kHz clock. So reading 10 bytes from an i2c intertial sensor takes about 1ms. If you bitbang, you're likely to be churning CPU cycles all that time....

My preference would be for the driver to optionally support bitbanging i2c through a module parameter (or maybe a separate module if that is more convenient).

Many devices don't use clock stretching (or clock stretch by a bounded amount so can work correctly at reduced speed), so the hardware I2C driver is still preferable in those cases.

The kernel itself has a driver for bitbanged I2C on GPIO pins, this driver has much lower overhead than bitbangging from userspace, and exposes the same API as the hardware I2C drivers. My suggestion would be to compile both that driver and the hardware I2C driver as modules, then load the hardware driver by default.

@Ferroin
Do you use the bitbanged I2C driver on Pi?
What .config options have you added to build it? Any source patches?
What do you do to choose the driver to use (modprobe? different /dev/ devices?)

Haven't used the driver on the Pi, but I have a friend who uses it successfully on a BeagleBoard.
The config option is I2C_GPIO.
If both that and the broadcom driver were configured as modules, choosing the driver would be as simple as loading the kernel module.
The GPIO module does need parameters to tell it which GPIOs to use as SDA and SCL, I don't know what the parameters are, but based on the not very well documented source in dirvers/i2c/busses/i2c-gpio.c, it looks like the apropriate params are 'sda=X scl=Y' where X and Y are the pins you ant to use.

The i2c-gpio module itself does not provide any way to pass parameters to it. So you need another module that registers platform data and configures at least bus number, SDA and SCL pins. I've written proof of concept of such a module and it even worked on pins 0 and 1 but without i2c-bcm2708 loaded (pcf8574 was detected by i2cdetect, I was able to drive some LED with gpio-pcf857x module). But then I burned my pcf8574 when changing the pins (and I don't have any other i2c chips right now) so I couldn't really test it. Here's the code

@kadamsi
Making sda_pin/scl_pin module parameters may be useful. As would a pull request.
Can anyone confirm this works for them?

That is less trivial than it looks: It expects to be passed a platformdata structure all the way.

On the other hand, the platform does not dictate which GPIOs you are going to use today, so putting in the work to change it does sound reasonable. (normally the platform data specifies: "How is /this/ computer wired". )

@popcornmix
I'm planning to make them configurable, I just had not enough time yesterday. I would also like it to be able to create more than one bus at a time. Maybe in dynamic way, like GPIOS export and unexport sysfs files (but I'm not sure if it's possible to cleanly unexport such bus)? Oh, and I found another pcf8574 and could confirm that it worked well also on different GPIO pins when external pullup resistors are used. Unfortunately I don't have more sophisticated I²C devices to test. So are you interested in providing such module in default kernel tree?

@kadamski
I don't have a setup to test modules like this. However any hardware features the chip provides (like I2C) should have kernel drivers to allow more people to use them. If the I2C hardware has bugs that will stop people using certain peripherals then a bitbanging alternative sounds worthwhile.

So, if you produce a new driver that looks harmless for people not using it (i.e. built as a module, and doesn't nothing until loaded), and there's some evidence (i.e. some confirming user reports) it is working correctly, then I'm happy to accept a PR.

Has the HW been fixed on BCM2837 (in RPi 3)?

My guess is: no. But I probably don't have time to test this today.

BCM2837 I2C is unchanged.

That's unfortunate, but thanks for the info.

thats very annoying.

I agree. Does anyone have a link to BCM2837 errata, or similar, stating the problem still exists? Would be handy when we have to pass the news on. Or @P33M, perhaps you can be quoted as authoritative if you work for Broadcom?

He is authoritative, although doesn't work for Broadcom.

That's good. But I've already Googled, and apart from P33M being M33P elsewhere, struggle to find an ID I can usefully pass on. Which is fine, but means he's not a great source to quote. :-) So an errata link would be handy.

@P33M works for Raspberry Pi. You can treat what he says as authoritative.

I don't think there is an errata list, probably should be. I've leave him
to confirm his identity, or not!

On 15 March 2016 at 14:30, RalphCorderoy [email protected] wrote:

That's good. But I've already Googled, and apart from P33M being M33P
elsewhere, struggle to find an ID I can usefully pass on. Which is fine,
but means he's not a great source to quote. :-) So an errata link would be
handy.


You are receiving this because you commented.
Reply to this email directly or view it on GitHub:
https://github.com/raspberrypi/linux/issues/254#issuecomment-196845789

@RalphCorderoy I am an employee of Raspberry Pi (trading) Ltd. My answer is authoritative.

@P33M, Ta, that'll do. Sorry for the hassle.

This is affecting a project I'm working on. Fortunately, I have complete control over the slave device firmware and I was able to add use the workaround of adding an extra half period delay (clock stretch) in my responses which skips the bad behavior.

Good analysis of the problem can be found on advamation.com.

Is this bad hardware in Broadcom's I2C implementation or are there kernel tweaks possible to get the hardware to behave correctly?

I have just subscribed to this issue as I have also encountered this with a product I am developing
which uses i2c between an AVR and Pi. Now I need to rethink it.
Has anyone used any reliable form of i2c bit banging ?

@mwilliams03, you might be interested in this project, in which I successfully established 100 kHz and 400 kHz I2C communication between a Raspberry Pi and an AVR. I am not bit-banging, but instead using the hardware I2C support and inserting delays at key points. It turned out that the AVR is slow enough that at 400 kHz no extra delays are needed.

In particular, here is information about the I2C issues. Note also that the AVR's TWI module has an issue preventing quick reads.

thanks @pdg137 , I tried it out and on the first go I got 1.2 million reads without errors and still going.

@rewolff is looks like this issue has been resolved. If that is the case, please close this issue.

And now.... add a level shifter and 40cm of extension cable.. Does it work then? Chances are that this causes enough delay on the "up" side of the waveforms, and all of a sudden, the bug strikes again.

As far as I can tell, broadcom and employees still think that this bug is restricted to the FIRST 5 microsecond (actually half-clock) after the final bit of a byte. then just adding 10 microseconds will carry things over to two half-clocks further and there should be no problems. In my testing, I find "bad" moments to stop the clock stretching every 5 microseconds.

Due to symmetry between SPI and I2C it is easier in my AVR code if the next databyte is "calculated" in the "next data" interrupt.... (I expose an interface where continuing to read is permissable, so a 16 byte buffer is not always enough.).

I'm not convinced.

I'm experiencing issues with SMBus between a Respberry Pi and an ATMEGA 324PA - could these be related to this issue? https://stackoverflow.com/questions/39274784/talking-smbus-between-raspberr-pi-and-atmega-324pa-avr-not-clock-stretching

As far as I can tell, broadcom and employees still think that this bug is restricted to the FIRST 5 microsecond (actually half-clock) after the final bit of a byte.

@rewolff: I would have to agree with Broadcom employees. In my tests, with a Raspberry Pi 3 and 100 kHz I2C (which slows down to 62 kHz when the CPU is throttled), I am getting results similar to the animation from advamation.com. Are you getting experimental results that differ from that animation, or is the animation not sufficient to convince you?

The animation seems sound. However it is not in line with my experience. Maybe I should setup a test. Might take a while. It's busy with other stuff.

But if this were the case, simply a "usleep (10)" in my interrupt routine (with hardware clock stretching) would normally do the trick and make things not lock up. I'm pretty sure that this was not the case. :-(

@rewolff: On the Raspberry Pi 3, I have seen the CPU get throttled to 600 MHz, resulting in a default I2C speed of 62 kHz. So you should sleep for 17 microseconds in your slave's ISR if the ISR starts on the falling edge of SCL. Or if you think the CPU might get throttled to below 600 MHz, sleep for longer.

Would it be possible to include @kadamski's i2c-gpio-param module in the Raspberry Pi kernel? I can confirm that this approach seems to work quite well for my application (AVR software I2C slave) and it is somewhat cumbersome to install the required headers for the raspberry kernel, so having it available would be great.

It would also be a good workaround addressing the HW bug underlying this whole issue.

@onnokort You mean like the i2c-gpio overlay present in current kernels since Nov 2015? https://github.com/raspberrypi/linux/blob/rpi-4.4.y/arch/arm/boot/dts/overlays/i2c-gpio-overlay.dts
From the README for overlays:

Name:   i2c-gpio
Info:   Adds support for software i2c controller on gpio pins
Load:   dtoverlay=i2c-gpio,<param>=<val>
Params: i2c_gpio_sda            GPIO used for I2C data (default "23")

        i2c_gpio_scl            GPIO used for I2C clock (default "24")

        i2c_gpio_delay_us       Clock delay in microseconds
                                (default "2" = ~100kHz)

@6by9: Awesome, thanks, I learned something new today. I didn't test this yet, but, yes, this looks exactly what I was looking for (and found in i2c-gpio-param).

@popcornmix @6by9 Can this be closed? Been around a looong time, and the dt param stuff from my reading seems to be a decent option.

The recommended workaround for devices that definitely do not play well with our I2C hardware is to use the i2c-gpio overlay, which bitbashes pins.

Otherwise, clock stretching in the address ACK phase is supported by the upstream driver.

Dear all, today i've tested the BNO055 in the package GY-955 from East Asia what behaves very ugly at 100_000 Hz with a Pi 2B (BMC2835). For tests I've changes the frequency of the I2C bus from 50_000 to 400_000 Hz and I found problems at 100_000 Hz, but for 50_000, 85_000, 115_000 and 125_000Hz the code from Adafruit (Adafruit_Python_BNO055) works pretty fine with the I2C connection, 150_000 Hz seems to be too fast for the BNO055, it works but the values are sometimes wrong. I confess, I didn't test the frequency by an oscilloscope, I've only read out the file /sys/class/i2c-adapter/i2c-1/of_node/clock-frequency, therefore some uncertainty is left open. Now I start to implement a full interface to the BNO055 in ADA.
Facit: The hardware bug in the BMC I2C handling appears only at 100_000 Hz, it must be confirmed.

Just FYI, The hardware bug happens when the slave device does clock stretching and releases the SCL line after 0.9 to 0.999 I2C clock periods. The line will start to rise, and the pi hardware will say: Oh, he didn't do clock stretching so I can provide the next clock pulse by pulling the clock line low again.

The bug happens when the time remaining in the clock period is sufficient for the clk line to pass the "HIGH" threshold for the raspberry but not for the slave devices OR when the slave has noise filtering and considers a high during < x% of the normal I2C clock cycle a glitch.

This does not "only happen at 100kHz", but also at many other frequencies, except that the response time of the slave device then has to be a different value.

Suppose the clock is 100kHz, so a normal cycle would be 5us high and 5us low. Now suppose at t=0 the pi starts the last clock cycle by pulling the clock line low. At t=5us the pi releases the clock signal, but the slave is still busyand keeps the clock low. At t=9us the slave is done and releases the clock signal. At t=10us the pi will look at the clock signal, and think: Ok, no clock stretching, so continue, and it pulls the clock signal low again. The clock signal has now only been high for 1us (a bit less due to capacitance and the pullup resistors' time constant) and might not be seen by the slave. Now a different device does not become ready after 9 microseconds, but after 10.5. Now at 100kHz, the pi will see "still low, clock stretching, wait another 5 microseconds. Now we'll get a 4.5 microsecond clock pulse and things will work. But run this device at 90kHz, and it gets into trouble. Now the pi will look at 11 microseconds from the start, and the clock signal will have been high for less than a microsecond -> Problems.

The people who complain about this are NOT the ones who have those other devices. Those work at the default 100kHz. But the bug is still very real and would occur on those devices simply at other bus speeds than the devices that you and I have.

Thank you for the response, it explains very good, why I got a reliable transfer at the frequences I chosen:
at 100_000 Hz the problems raise in the period of 9 to 9.99 µsec
at 85_000 Hz the problems start at 1_000_000/85_000*.9 = 10,59 µsec > 9.99 sec, therefore the clock stretching is ignored I assume, for all frequencies less then 85_000 Hz the start point of problems is higher than 10,59 µsec.
at 115_000 Hz the bus cycle is 1_000_000/125_000 = 8.7 µsec < 9 µsec, therefore the clock stretcing is detected correctly, for all frequencies higher than 115_000 Hz the bus cycle is less then 8.7 µsec.
So when I have problems I can calculate the frequence I need.
(I read the information on http://www.advamation.com/knowhow/raspberrypi/rpi-i2c-bug.html,
there was noted .5 instead of .9 of a period).

Well. It depends on what your slave detects as a valid clock signal. And how much delay the RC on your bus introduces. In my case, the AVR I use as a slave considers pulses less than two clock cycles to be a glitch. With an internal RC clock of 8MHz (+/-10%!!!), the pulse needs to be at least 250ns to be detected. I2C specifications require 5 microseconds, so there is a factor of 20 "margin" between what is officially required and what no longer works.

At first I thought I'd increase the clock stretch in my slave produces a bit to say 12us. That works until the bus capacitance adds enough delay to cause a problem again. So it is very difficult to work around. Annoying to have to tell your clients not to use the standard 100kHz i2c.

Sorry, but it I needed some time to write a full implementaion and test scenario.
I thank you for all the information you gave to me, it was very helpfull.
My test scenario is build up with 2 MEMS ( BNO055 packaged as GY-055, MPU9255 packaged as MPU-9250/6500, WHO_AM_I register says MPU9255) and 1 GPIO extender PCF 8574 (packaded PCF 8574).
The I2C clock at the Pi (2b) was set to 125 kHz, verified by reading /sys/class/i2c-adapter/i2c-1/of_node/clock-frequency.
My code runs 100 times reading data from the MEMS and writing data for PCF with a delay 0.1 sec.
Result:

  • Initialisation of the MEMS was successful.
  • Reading from the MEMS data over the I2C bus was successfull over all cycles.
  • Writing data to the PCF over the I2C bus was successfull over all cycles.

Therefore the MEMS and the PCF accept a clock higher then the standard 100 kHz.
In an elder environment the communcation to the AVR is done via serial line, in my second one the the communication will be done over USB, but it is not finished, actual I tripped over the I2C bus problems cause by the BNO055, but I think for my environment I found a workaround for a Pi 2b and I hope it will work with a Pi 3 too, that's the next test. And then I have to integrate 2 other MEMS, 1 VL53L0X and 1 BME280, I hope it will work with the new frequence and the bus capacitance accept all these elements.

I'm playing with this issue now on a RPi 3 Model B+, and I'm seeing different results than the Advamation article describes. The silicon wasn't revved recently, was it?

The short version is, things seem to work if I stretch at read-preACK, write-preACK, or write-postACK; the only time I need to be sure to stretch either long or not-at-all is read-postACK. I still need to clean up my experiment before I can say anything certain, though.

First, a note about terminology. The Advamation article says:

So, I2C with the Raspberry Pi only works if
the slave does not use clock-stretching at all, or
the slave stretches the clock only at the end/directly after an I2C-read-ACK-phase (after reading ACK/NACK), but then by more than 0.5 clock periods.

The term "I2C-read-ACK-phase" seems somewhat ambiguous. If it's an I2C read command (i.e., the post-address R/W̅ bit is set), the slave would be writing the A̅C̅K̅ bit, not reading it, contrary to the parenthetical. When I say "read-postACK" here, I mean that the R/W̅ bit was set right after the address (and presumably not changed by protocol like SMBus does), so the slave has just written its A̅C̅K̅ response.

As has been hinted at around here, it seems that if the RPi senses that SCL is being held low by a slave, it will delay its internal SCL clock by one half-cycle (5μS). The bug is triggered if the slave holds SCL low for _almost_ one half-cycle. If that happens, then the RPi will still let SCL go high, and then pull it down immediately afterwards when its SCL clock triggers. This results in a runt clock pulse, which some devices might see as a pulse while others don't. (For reference: the I2C spec implies a nominal SCL high time of 5μS, and establishes a minimum time of 4.0μS at 100kHz, which is the default speed of the RPi's I2C interface.)

In my experiments, in the read-preACK, write-preACK, and write-postACK phases, the RPi would ensure that the maximum SCL high time it would allow was 4μS or so. It seemed willing to push its SCL clock back in those phases. I suspect that the RPi sampled the SCL line shortly after the it released it, and if SCL was still low, it pushed its SCL clock back a half-phase.

However, in the read-postACK phase, I did see that the RPi would emit runt clocks (<4μS, often 2μS or less) if the SCL line was held low for about 6–9μS. I suspect that in this phase, the Broadcom engineers follow a different path that tries to make it less likely that they will have to delay a half-phase on a high-capacitance line. Sadly — as we've seen — it falls short.

Therefore, my current code needs to stretch SCL beyond ½ cycle in the read-postACK phase of reads that may take a while (such as ADC conversions). If I fail to

The current tests that I'm running use a PIC16F1619 as a slave device. The slave device uses a loop to stretch the clock in both read and write commands, both pre- and postACK. The length of this loop plus overhead varies from 3μS (less than the time the RPi holds SCL low) to 90μS in 0.35μS increments. I have a Python program on the RPi that will different values to two registers, then read them back, and assert that the read-back values are correct. The values being used

Piquan,
Broadcom has been downplaying this by proclaiming that this problem only happens on the very first clock stretch. I do not think that this is the case. You should be able to verify that. Advamation seems to be following broadcom statments.

@rewolff Well, there may be something to their interpretation. If, by “first clock stretch” they mean “the first time in a given stage that SCL is stretched by more than one clock translation” then their description is accurate: if you stretch by more than one clock cycle, then yeah: in all phases, if you stretch by >.5 cycle, you’re in good shape. At least, in the vague experiments I’ve done so far; more precise experiments are yet to come.

I’m happy to conduct some better experiments to demonstrate when the RPi’s I2C can and can’t handle clock stretches, and get repros that BCM can send to the engineers.

Sadly, there’s a very significant silicon rev they’d need to fix this issue; they need to not tick the SCL divider clock when SCL is held low (by an internal or external source). That’s probably going to require a silicon-level change, which is expensive; most ECOs are metal-level, which is… well, not cheap, but way cheaper than silicon.

My main goal here is to characterize the types of slaves that would and wouldn’t work with the RPi, with some experimental evidence.

It sounds like we need more solid data, such as repro scenarios, to convince Broadcom to re-engineer the I2C controller. Even with solid data, the engineers will have to convince management that it’s worth revising silicon for the next revision.

That said, I’ll put some time into designing reproducible scenarios so the BCM engineers can clearly see and test the problem. I’m going to depend on the community members here to communicate my findings to Broadcom, though; I don’t have a channel to them.

I should clarify something about my previous comment for the community, about silicon vs. metal changes.

You know how, if you get a board from SparkFun or Adafruit, it often has traces you can cut or solder together to change its behavior? Most I2C boards have traces you can change to change the address, enable or disable the pull-up resistors, etc.

Microchip designers do the same thing. There’s lots of changes they can make on the metal layers of a microchip, so that they don’t have to change the silicon layers. That’s because changing silicon layers is hard and expensive (like changing the chip on a board), while metal layers are a lot easier to change (like changing the traces on a board).

Many useful changes can be made in the metal layers, because designers think ahead and make metal connections (jumpers, essentially) that change the behavior of a chip.

But if there’s a change they didn’t plan for, and can’t figure out how to make by rerouting traces within the chip, then they have to change the silicon. Silicon needs a lot more planning, verification, and masking, so it’s hard to change.

Many companies, if they have a silicon layer that’s “good enough”, will use it as long as possible. They’ll reuse some parts of the silicon layer in multiple revisions of the chip.

I suspect that’s where BCM is at: they need good evidence that this bug can’t be fixed in drivers or metal before they can justify re-engineering this part of the silicon layer from scratch.

It is extremely unlikely that there will be a respin of the exisiting silicon just to fix this issue (got a spare $1M?). Future chips may well have a fix though, who knows.

On Wed, Sep 05, 2018 at 02:13:19AM -0700, Joel Ray Holveck wrote:

It sounds like we need more solid data, such as repro scenarios, to
convince Broadcom to re-engineer the I2C controller. Even with
solid data, the engineers will have to convince management that
it’s worth revising silicon for the next revision.

They already have had six years and 3 silicon revisions where they
have demonstrated their decision: "won't fix".

if you stretch by >.5 cycle, you~re in good shape.

I do not think so. Every time you release CLK in the "just before the
broadcom looks again" time period, you get short pulses. I can easily
delay the release of the clock signal by say 5 useless microseconds to
always be out of the 0.5 cycle. But that did not work.

On Wed, Sep 05, 2018 at 02:28:53AM -0700, James Hughes wrote:

It is extremely unlikely that there will be a respin of the
exisiting silicon just to fix this issue (got a spare $1M?). Future
chips may well have the fix though.

I doubt it. We went from BCM2835 with KNOWN bug to the BCM2836 with
the same bug to the BCM2837 with the exact same bug to the BCM2837B0.

Now why do you think the next revision will be different?

BCM has a target market: Phones and TV-sticks for these chips. In
their target market nobody is doing software-I2C on general purpose
microcontrollers like AVRs and PICs. they are NOT going to fix this.

It is a shame this investigative effort is required, great though it is. When I asked before for an erratum that covers this it was thought there wasn't one. Has that now changed and there's a URL? Ideally, Broadcom would help the community by precisely describing the fault and when it occurs so workarounds or evaluations of compatible components would be easier to work out with confidence. Even to the extent of the snippet of relevant HDL if they don't want to attempt to translate that accurately into a description.

The suggestion to halt the half-bit-clock-divisor when allowing clock stretching is a good one. It would work.

The problem with that is that part of the I2C module would then need to run at "full clock speed". Now the I2C module is a nice low-power one because it runs normally not faster than 200kHz.

The main reason for my post was what I said at the beginning of my original post:

I'm playing with this issue now on a RPi 3 Model B+, and I'm seeing different results than the Advamation article describes. The silicon wasn't revved recently, was it?

If we had significant changes to the I2C hardware or driver after the 2013 Advamation writeup, and the community knows how they may affect this bug, then I should make appropriate changes to my experiments.

I know the fundamental problem is in hardware, but for all I know, somebody may have found by experiment that changing a seemingly-unrelated register in the driver actually changes this bug's behavior. I also haven't confirmed that the BCM2837 has the same I2C module as the RPi 2 (the most recent time I could find definitive confirmation that the I2C hardware was unchanged since this problem was first reported). I also haven't found any information about the any of the BCM283x steppings; for all I know, there could have been a metal change in the BCM2836 era.

I'm not interested in bashing Broadcom, speculating how they may change future revisions, or anything like that.

My primary goal here is to document clearly the current behavior; it looks like it's changed over time. I have a secondary goal to allow future users to reproduce my experiments to see if the results are still valid, or to make improvements or test other conditions. I have another secondary goal to allow Broadcom engineers to reproduce the tests easily if they want to. Both of these mean I need to publish my experiments, which also helps: because of my own hubris, if I'm publishing my experiments, I'll be more driven to make them rigorous.

My current experimental plan is to use a PIC16F1619 as the slave implementation; it's fast and versatile enough to do the clock stretching experiments, easy enough to work with, and common enough that other experimenters can reproduce my results. If the results are ambiguous, I might also build an experiment with an FPGA. I might see if I can get reasonable granularity using another Raspberry Pi in a bit-bashing configuration as the slave, but I suspect I'll need something without a whole OS on it to do real-time testing.

To address another couple of things that came up:

Sorry that my original post was incomplete; I accidentally posted an early draft. I'll leave it as-is, since the changes I made are pretty minor.

I don't claim originality on the idea of halting the clock divisor while SCL was externally held low. I read that somewhere else, but don't remember where; I thought it was from Gert van Loo in the I2C clock stretching thread on raspberrypi.org, but I can't find it now, so I must have seen that elsewhere. I'm not convinced that it would change the power profile significantly,

The I2C HW block has not been fixed/changed, and is unlikely to be, in the bcm2835/6/7 as that would require respins of the chip which is simply much too expensive.

I do not know who Advamation are, but they are unrealted to the RPF(T) group, so I cannot guarantee the articles accuracy.

Dear all, I got a very strange phainomenon on my Pi 2B and 3, I try to explain:
when I started to implement the BNO055 end of June, I used an elder Raspbian (I don't now the version) and I got a lot of invalid I2C addresses using i2cdetect. For that I looked in the net and I found this forum with good informations. For that I changed the I2C frequence as described above and all the ghosts vanish and I can work. Mid of August I upgrated Raspbian to I think raspbi-dev 4.14.62-v7+ and the system became a little bit unstable, but I can work. From update to update the system became more unstable, noted leaks on the sd card. Therefore I re-installed Raspbian via NOOBS (Linux raspbi-dev 4.14.62-v7+ #1134 SMP Tue Aug 14 17:10:10 BST 2018 armv7l) and I forgot to set the I2C frequence but no ghosts appeared. Okay I don't use microkontroller on the bus, only MEMS and GPIO extender, but I feel there is an workaround in Raspbian's i2c code. Further I think the detected leaks came from an imperfect upgrade, I'm using the the same card and no leaks are detected.

@hmf2015 This issue is about a very specific I2C problem. Problems with the SD card are not caused by this I2C problem. If your SD card is not working right, you should try a new SD card. Some cards are not good, and stop working quickly. My SD card stopped working last month, and I had to replace it.

Usually, I2C ghosts are not because of this problem. They most often happen because:

  • You do not have a good power supply. I like this one.
  • Your cables are too long. I2C does not work well with cables longer than 1m.
  • Your cables have capacitive coupling. If you have the I2C pins (SCL and SDA) twisted together, that can make a problem. Sometimes, people use twisted-pair cables (like Ethernet cables) that twist the I2C pins, and that makes problems. If you have to use twisted-pair cables, then twist SCL with Vss (ground) and SDA with Vdd (3.3V). Look at the I2C spec in section 7.5.
  • Your I2C device has 5V. The Raspberry Pi does I/O at 3.3V. If the voltages are different, that can make problems. (The BNO055 needs 3.3V, so if you're using it with 5V, you have a problem.)

I said that we're talking about a particular problem. Some devices might make ghosts because of this problem, but not many devices. If devices make ghosts because of this problem, they don't make many ghosts. I think your problem is different.

I don't know what you mean when you say that you "noted leaks on the sd card". That's an important problem, but it is not a problem that we talk about in this web page's issue. This conversation is about a particular bug. That bug does not affect SD cards.

I do not think that recent Raspbian changed how I2C works. We think that Raspbian cannot fix this bug. We might be wrong. Thank you for telling us about this.

If you are only telling us this information to help us, then we thank you. If you are telling us this information to ask for help, you might want to ask at the Raspberry Pi forums. The forums talk about many things, like I2C, SD cards, GPIO extenders, and many more things. The things you are seeing may be the same things we are talking about, but they may be different things. The people at the forums can help you to know which problems you are seeing.

Thank you for telling us this information!

Thanks for the answer, but the sd card problem was produced by the O/S, I checked the card on my (Unix) host, and it was ok, and now the same cad works perfekt after new installation.
The ghosts on the I2C bus vanish by a higher frequence and now after the reinstallation they don't appear, R/C problems and reflections I can, I think, exclude. I know the 3.3V problems, but but the minimal 1 level is less.
The BNO055 package has a voltage reglulator.
I also thought that the I2C problems would be the same as before, but it doesn't, at the moment I have not the time (and the interest) to look into the source code of Raspbian to detect the differences, I'm interest on a working environment.
The list of elements should only show, that I'm working in an Andoid environment (with a real Unix) as BCM supports, not with microcontroller what I attach via serial line or USB since I'm lazy. In your environment the conditions are more restrict than mine, so I cannot say whether your problems become less, but I hope.

Hi hmf2015,
This is not a forum. This is a bug tracker. This list-of-comments discusses a bug which is unrelated to your problems. Please discuss your problem with an appropriate forum or helpdesk. Again, we appreciate your question: "Maybe it's the same? Maybe my description helps solve that bug?" but now we know for sure that what you are seeing is unrelated, please do not "pollute" this discussion further.

@popcornmix
hi, long time ago you have mentioned a problem with I2C state machine during restart and posted some code regarding this issue.
I am not sure what you meant but I have an issue with WM8731 codec connected to I2C.
After 2-3 restarts the I2C bus completely hangs up. i2cdetect doesn't show any info except 1st line and hangs up too. The only way to recover is power down/up cycle.
Does this look like a issues you have mentioned ?
I am using 4.19 mainine kernel.

@sergey-suloev When i2cdetect does not continue scanning after 8 or 16 addresses, I would say that this is unrelated to this issue. The issue pops up when devices do clock stretching to indicate that they are not yet ready for the next clock cycle. When I2C was designed almost 40 years ago, that might have been conceivable for hardware implementations too. But nowadays, hardware is always ready in time to handle the next clockcycle because 10 microseconds is ages of time for something implemented in hardware.

Was this page helpful?
0 / 5 - 0 ratings