Espeasy: Wifi issues -never ending story- go back to non event based wifi?

Created on 23 Apr 2018  ·  388 Comments  ·  Source: letscontrolit/ESPEasy

As a lot of you have noticed the last few weeks, there have been lots of issues with the wifi.
This all started when I changed the way wifi operates to be event based.

  • Static IP not working
  • Boot loops (ESP32)
  • Connected but data transfer not possible (NTP cannot connect to 0.0.0.0)
  • No AP found errors
  • Loading setup page from AP mode not working (changed to core 2.4.0 for this)
  • Beacon timeout errors => no proper reconnect
  • Various other wifi related issues.

Some of these errors are core version related, and update to core 2.4.0 does introduce lots of other issues.
And then there is the problem with corrupted settings, which also occurred in this period. That wasn't related to the event based wifi connect, but it made me chase a lot of other issues that were not really issues at all, just corrupted settings.

So at the moment the wifi state machine I wrote is overly complex due to the many fixes that were no fixes, because things were not broken.
And still there are other real issues, either caused by core 2.4.0 or still open wifi issues.

So now we have to choose:

  1. Go back to sloooowwww but stable wifi (still some issues then with MQTT when connection is lost)
  2. Invest some more time to get event based wifi just right + try to get core 2.4.1 working.
  3. Invest some more time to get event based wifi just right, but still go back to core 2.3.0
  4. Some intermediate solution to do async wifi with core 2.3.0

Core 2.3.0 does seem to give a lot less issues and leaves more free memory.
So I guess that's my preferred base.
This means that for event based wifi, there is still some issue with respect to loading the setup page when initial config is needed.

Anyway, this has to stop now and get stable again.
There are currently way too many issues at hand that are quite hard to see as separate issues.

Any other suggestion?

Core related Stability Wifi Fixed Discussion

Most helpful comment

I am not sure about the status LED. It is called from the MQTTconnect function and from a few other places.
But maybe you could add an issue to make it selectable what is being shown via that LED?

And good to hear MQTT issues seem to be fixed by the lowered timeout.
We may have to make it selectable.

All 388 comments

I really can't speak to this from a programming level, but from what I have been seeing, with the exception of the static IP address problem, the wifi seems to work fine when setting up a "brand new" unit. I have not seen any connection issues with "fresh" installs of the latest firmware. Web pages load fast and the entire thing seems fast and responsive. It's when you try to upgrade that most of these issues seem to happen. It seems like there is a corruption issue when upgrading to a newer firmware.

I also notice that a lot of the wifi issues seem to come from user compiled firmware. Just from reading through all these issue posts I get that impression. I could be completely wrong about that though. I am not trying to state that as a fact, just as a possibility.

I can't speak to MQTT because I don't use it.

Just my 2 cents worth.....

If you are leaning towards option 3 I support you fully. I'd hate to see us drop the improvements your event based WiFi has given us. Core 2_4_x might be easier to revert/go to up stream?

From the perspective of a user:
I would go ahead and use the new Core 2.4.1 as soon as possible.
The users can always use older versions.

Don't forget, core 2.4.x fixes some problems:
PWM flicker is history (#1156 is fixed with core 2.4.0)
Serial with large packets is also fixed...
At some point we have to make the transition to the new core. Returning to 2.3.0 only postpones the problem. In the end we have to do the work anyway. My ESPs are definitely better with 2.4.0

As I see it, core 2_4_x will happen, but not necessarily right now. We made a bad decision when we went ahead with the core update and the event based wifi approach at the same time. We should have done them one after another. When we then, at the same time, had an update in the global settings, the problem got exceptionally hard to pinpoint. I strongly support the idea of going back to 2_3_0 while we fix wifi stability and settings corruption.

After that we can hopefully release the v2.1.0 and then focus on getting core 2_4_x stable for v2.2.0

After clearing the settings and uploading the version from 22.04. So far everything is working. At least for now :) Only free memory is not enough, even in NORMAL. We'll see how it will go on.

I have to agree with @Budman1758 and @melwinek : I also found that starting from a clean unit there are no problems at all with Wifi, static IP and settings.
The main issue is the fact that to upgrade I now need to manually clean all the units, reflash them and rebuild their configuration.

I guess we should not forget that officially we're still in the process of going from stable R120 to stable 2.1.0, and settings will not be converted between these two releases, so you need to start from scratch anyway. What we did with the update to core 2_4_x was to make a "break point" yet again. If we can live with that then it's not a problem. I agree that a clean install is really stable (at least on NORMAL, which I test most frequently). And NORMAL is the only part which will actually be in the release; test and dev are only in the development nightly release anyway.

What I mean is: if the current developed firmware works and is stable on a clean setup, then it means that there is nothing wrong with it. I wouldn't go back to 2.3 or to old wifi.

Yes I hear you and I kinda agree. The only thing is that we create another break point which I guess is okay since it's still beta.

Although it is a step back that I do not like, I'm afraid that it would be really better to go back to core 2_3_0 for now as I think some strange issues may happen due to lack of free memory on 2_4_0.

@giig1967g I agree with you there. I do believe there are some corruption issues going on though. That might be what's screwing up the wifi, rather than the wifi having a lot of inherent problems.

There are still options to get memory usage to an acceptable point.
I think I can get about 3 - 4 kB more memory in like 1 evening of programming. (need to change all plugin files though)
And MQTT import is also something which is really a pain that should be resolved soon.
And the Switch plugin has too much functionality in itself, which should be split.

I will think about it today, what we should do, so please add more suggestions/arguments :)

@TD-er You're right with SWITCH.
Most people use only ON / OFF for switch / relay.
And in this plugin there is a servo, a dimmer and probably something else.
That could be separate.

It is also handling stuff very specific to MQTT and/or Domoticz. That should not be part of the plugin.

@TD-er In many cases, it would help me to compile myself, after removing unnecessary plugins, in many cases I need only SWITCH, FHEM Controller, DHT.
But after these adventures with settings I'm afraid to compile myself. Especially after your post: https://github.com/letscontrolit/ESPEasy/issues/1292

Did you take a look, how wifi is realized in other projects (eg tasmota)?

About memory: i told you :smile:
I think there are way too many rarely used features in the core - the decision, if a core feature request is implemented should be way more strict - now it's a little like Christmas for everyone...
Maybe a voting with a certain limit would help.

If possible, the core should first be cleaned of those rarely used features (or have them transformed into plugins) and then optimized. Also, one could think about additional interfaces for plugins, to allow moving more functionality out of the core.

@M0ebiu5 Agree.
What should happen is that new features will be developed on a separate branch, then collect a few of them and merge those to a release candidate branch and test those.
Then release and merge the used features to the master branch (or dev branch, or whatever you name it).

And one thing I learned is to ask twice about what is observed, what should be observed and what version is used. That will make things a lot clearer and lead to fewer mistakes.
Part of that must be done in the code itself to make some kind of footprint to be able to see (and log) what software is used.

Also plugins should be just plugins to interface a sensor to some output values.
Maybe plugins that generate output (like displays) should not be used the same as the input ones.
So we get something like:

  • Sensor to read a device and generate measurements
  • Output (display?) to present values. This can also be something other than a display, for example JSON or an image.
  • Controller to interface to the outside world (input and output)
  • Rules to process data and events.
  • Notifications to convert events to something else. Actually it sounds like a more elaborate controller.
  • Commands to do some basic setup or temporary (non-persistent) changes/updates and perform some actions (e.g. reboot)
  • Web page to configure them all.

But such a redesign will take quite some effort.
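
To make the proposed split a bit more concrete, here is a minimal sketch of three of the roles from the list above (sensor, output, controller). It is written in Python purely for illustration, not in the firmware's language, and every class name, method name and topic in it is hypothetical rather than taken from ESPEasy.

```python
from abc import ABC, abstractmethod


class Sensor(ABC):
    """Reads a device and produces named measurements."""

    @abstractmethod
    def read(self) -> dict: ...


class Output(ABC):
    """Presents values (display, JSON, image, ...)."""

    @abstractmethod
    def present(self, values: dict) -> str: ...


class Controller(ABC):
    """Interfaces to the outside world."""

    @abstractmethod
    def publish(self, topic: str, payload: str) -> None: ...


class FakeTemperature(Sensor):
    def read(self) -> dict:
        return {"Temperature": 19.94}  # fixed value, just for the example


class TextOutput(Output):
    def present(self, values: dict) -> str:
        return ", ".join(f"{k}={v:.2f}" for k, v in sorted(values.items()))


class RecordingController(Controller):
    """Stores what would be sent instead of talking to a broker."""

    def __init__(self) -> None:
        self.sent = []

    def publish(self, topic: str, payload: str) -> None:
        self.sent.append((topic, payload))


sensor, output, controller = FakeTemperature(), TextOutput(), RecordingController()
values = sensor.read()
line = output.present(values)
for name, value in values.items():
    controller.publish(f"/node/sensor/{name}", f"{value:.2f}")
```

With a split like this a sensor never needs to know about MQTT or Domoticz; it only returns values, and the controller decides what to do with them.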

@TD-er you are right, but I would make the changes in small steps, because most parts are working stably and big changes could put this stability at risk.

New interfaces to the core are one possible way. They will not influence the current behavior and only new or heavily changed plugins will use them. It will take more time to transform to a clean architecture, but with a lower risk and the effort will also be spread over time.

I agree that these changes should be done at ease.
It is more a view on the redesign for the future.

However, node from 22.04 has lost the connection.
Resetting the router does not help.
ESP reset will help, but I'm far away.
So, the best version on my nodes is mega-20180410.
Maybe because it's on core 2.3?
Maybe, however, a good solution would be to go back to 2.3 for some time?

Nope, last night I saw the problem (in the code and happening at my own units).
My nodes did not reconnect when they got a 'beacon timeout' error, which is quite a common reason to disconnect. It is a logic error in the code, but it was already past 1:30 am and I didn't want to fix it at that moment. It would certainly have been past nightly build time to fix it, so that didn't matter anymore ;)
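
As a rough illustration of the kind of logic error described here (this is not the actual ESPEasy code), the sketch below models a disconnect handler in Python. The `WifiTracker` class and the reason label are hypothetical. The only point is that a reconnect is scheduled for every disconnect reason, including a beacon timeout, rather than for a fixed set of reasons.

```python
from dataclasses import dataclass, field

# Hypothetical reason label for illustration; real SDK reason codes differ.
BEACON_TIMEOUT = "beacon_timeout"


@dataclass
class WifiTracker:
    """Minimal event-driven connection tracker (illustrative only)."""

    connected: bool = False
    reconnect_pending: bool = False
    history: list = field(default_factory=list)

    def on_connected(self) -> None:
        self.connected = True
        self.reconnect_pending = False
        self.history.append("connected")

    def on_disconnected(self, reason: str) -> None:
        # Schedule a reconnect for *any* reason. Filtering on a fixed set of
        # reasons is one way a case such as a beacon timeout can be missed.
        self.connected = False
        self.reconnect_pending = True
        self.history.append(f"disconnected:{reason}")

    def poll(self) -> bool:
        """Called from the main loop; returns True when a reconnect should start."""
        if self.reconnect_pending and not self.connected:
            self.reconnect_pending = False
            self.history.append("reconnect_started")
            return True
        return False


tracker = WifiTracker()
tracker.on_connected()
tracker.on_disconnected(BEACON_TIMEOUT)
started = tracker.poll()
```

The event callbacks only set flags, and the actual reconnect is started from the main loop, which keeps the handlers short.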

related: #1064

I just flashed 6 devices with the current version ESP_Easy_mega-20180425_test_ESP8266_4096.bin.

I think with this version we have reached an absolute low point.
All devices could not be reached in the network after a few hours.

That's why I will -one way or the other- return to a working wifi version.

Maybe we should also remove today's build, just to prevent others from loading it.

I would suggest to build now a new version 04.25 on core 2.3.0 and replace the current one :)

I do not have control over the build server.
and 2 versions with the same build number is never a good idea.

I know @Grovkillen can remove today's build.

Do you think I should remove it, @TD-er? A new one will be built tomorrow.

Apparently it is even worse compared to yesterday's build.
So yeah, remove it.

Done

And platformio.ini is also changed.
So whatever happens, tomorrow's build will not be as bad as today's.

I sense panic around here. Don't worry, there is still hope.
First, let's have a :beer: or :beers: Now,
thank you @TD-er for storming ahead so restlessly despite your worries, thank you @Grovkillen for supporting him. Thanks to all others as well. And thanks to everyone for discussing new ideas so openly.

That being said, let me say this:

  1. According to my experience there is nothing wrong with newer core versions except for 2.4.1 which has a wificlient memory leak (and a workaround).
  2. Older versions of the master branch up to the point when there was a decision to abandon 2.0 worked quite stable with those core versions. And fast.
  3. We really should (emphasis on really!) focus on stability. No new features for a while (unless they improve stability), less ESP32, less memory hunting, less speeding up, less of everything but stability. Let's pretend we plan to fly to the moon. For real. That thing needs to operate. It needs to tolerate single bit failures, restarts, power fluctuations and temperature stress.
    I mean fail tolerant programming. Done it. It can be fun if you wrap your mind around it.

What's next? If I were a core dev, I would opt for that JSON based config. ASAP. It seems to be the current root of evil, just like the memory intensive web server was a while ago.
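
One small example of what fail tolerant handling of settings could look like, sketched in Python rather than firmware code: store a checksum next to the serialized settings and refuse to load anything that does not match. The JSON layout, the field names and both function names are assumptions made for this illustration; they are not ESPEasy's settings format.

```python
import json
import zlib


def pack_settings(settings: dict) -> bytes:
    """Serialize settings as JSON and prepend a CRC32 of the payload."""
    payload = json.dumps(settings, sort_keys=True).encode("utf-8")
    crc = zlib.crc32(payload)
    return crc.to_bytes(4, "big") + payload


def unpack_settings(blob: bytes, defaults: dict) -> dict:
    """Return the stored settings, or a copy of defaults if the blob is damaged."""
    if len(blob) < 4:
        return dict(defaults)
    stored_crc = int.from_bytes(blob[:4], "big")
    payload = blob[4:]
    if zlib.crc32(payload) != stored_crc:
        return dict(defaults)  # corrupted: fall back instead of using garbage
    return json.loads(payload.decode("utf-8"))


defaults = {"ip": "0.0.0.0", "use_static_ip": False}
blob = pack_settings({"ip": "192.168.1.206", "use_static_ip": True})

# Flip a single bit in the payload to simulate corruption.
damaged = bytearray(blob)
damaged[10] ^= 0x01

restored_ok = unpack_settings(blob, defaults)
restored_bad = unpack_settings(bytes(damaged), defaults)
```

A node that falls back to defaults and says so in its log is much easier to diagnose than one that boots with half-valid settings.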

I sense panic around here. Don't worry, there is still hope.
First, let's have a 🍺 or 🍻 Now,
thank you @TD-er for storming ahead so restlessly despite your worries, thank you @Grovkillen for supporting him. Thanks to all others as well. And thanks to everyone for discussing new ideas so openly.

100 % agree !!!

I don't know if this error is already known:
After one or two days, it seems that the webserver doesn't work anymore. MQTT publishing is still working.
I am using the normal version from 04.22.

@TD-er personally I will vote for trying something new like 2.4.1.

1+

1+

I sense panic around here. Don't worry, there is still hope.

It is not panic, it is pure frustration ;)
Thing is that I really test stuff here and within minutes (at least it does feel like that) builds fail even worse than before.
I am used to programming against black boxes, and also reverse engineering those black boxes.
But this feels like the feedback I think I see in the logs is completely different from reality.
Now it is clear there were several issues the last few weeks, due to bugs in core libraries, corrupt settings and a few changes I made appeared to be related to some bugs in AP firmware.

And my personal opinion about software is that it should be rock solid stable and speed comes second.
Over the last weeks the speed increase was OK, but the stability was worsening by the day, no matter what I tried.

So now the time has come to make a firm halt and focus on stability first. You could call that 'panic', but actually it is a step back to really focus on what's going on.
I now know a lot more about wifi than a month ago, so I should be able to make a well designed package. But that takes time, and I really want to get to some point of stability and get some peace of mind to make it work like it should.
And then there is still a lot of room to make stuff even faster, because I've seen it connect even faster :)
But that's for the next version.

About the rest of the main issues:

  • memory usage
  • JSON import/export of settings
  • MQTT import redesign
  • some plugins like P001-switch should be changed.
  • What's left.

For me (and probably just for me) moving to 2.4.1 or even the GIT core did not improve things; the opposite was the case. I tried about 20 different combinations of core version, mega commits and lwIP versions. Going back to 2.3.0, especially with lwIP 1.4, was the only way to get it running stably. But again, this is just my view in my specific environment...

And yes, big thanks @TD-er and @Grovkillen for the great work they do and time they invest for the community!

Thank you all, @TD-er summarizes the road forward pretty well.

About the rest of the main issues:
• memory usage
• JSON import/export of settings
• MQTT import redesign
• some plugins like P001-switch should be changed.
• What's left.

And we will revert to 2.3.0 for tomorrow's release and test that out for a while.

Most of my self build images are based on core rev. 491c9b8b (2.4.1 + x).
The only thing I see is random reboots with my Sonoff 4ch device. Unfortunately it's part of my pond control, so there is no chance to connect a serial interface for better monitoring. Syslog is pretty unusable, because the relevant info is spit out before WIFI is up and running.

It's pretty usable, as long as you use the lwIP 'v2 Higher Bandwidth' library.
Otherwise you'll see problems with MTU fragmentation with packets > 512 bytes (out of order, with improper window information).

The working ESPEasy revisions (in my repo) are

commit 3576619181926b3adff5a1a133390eb71e808ae9
Merge: 9038bd2 d083a58
Author: Susis Strolch
Date: Fri Apr 13 17:07:30 2018 +0200

Merge remote-tracking branch 'upstream/mega' into mega

* upstream/mega:
  automaticly updated release notes for mega-20180413
  [wifi] Event based wifi, fix set AP and crash on start

and
commit daf39a064d3633fe1eccfa33576fafbccd7611a7
Merge: 2a96218 806a275
Author: Susis Strolch
Date: Mon Apr 9 09:15:52 2018 +0200

Merge remote-tracking branch 'upstream/mega' into mega

* upstream/mega:
  automaticly updated release notes for mega-20180409
  Both reset/factoryreset option
  Factory Reset (not enabled yet)

Any ESPEasy after Fri Apr 13 shows horrible (that is, non-working) results, even when erasing the whole flash before flashing the binary (via Arduino IDE).

So, I'd suggest to go with 2.4.1 (or later) and polish ESPEasy (WIFI and config).
Core itself seems to be ok so far.

lol @Friday the 13th, an unlucky day indeed..
So what is "polish ESPEasy (WIFI and config)." ?
A different branch..?
Poland aka Polish or polish as in buff & shine, ha

@susisstrolch How do I catch errors like this ?
"problems with MTU fragmentation with packages > 512Bytes (out of order, with improper window information)."

Just send a prepared request to the ESPEasy webserver, with an additional header of at least 512 characters.
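
A minimal sketch of such a prepared request, in Python. The header name `X-Padding`, the function names and the IP address are placeholders chosen for this example. Only `build_padded_request` runs here; `send_request` needs a reachable node and is included to show how the request could be sent while capturing traffic with tcpdump.

```python
import socket


def build_padded_request(host: str, path: str = "/", pad_len: int = 512) -> bytes:
    """Build a GET request carrying one extra header of pad_len characters."""
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        f"X-Padding: {'a' * pad_len}",  # hypothetical header name
        "Connection: close",
        "",
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def send_request(host: str, request: bytes, port: int = 80, timeout: float = 5.0) -> bytes:
    """Send the request and return the raw response (needs a reachable node)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        sock.sendall(request)
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:
                break
            chunks.append(data)
    return b"".join(chunks)


# Placeholder address; replace with the IP of the node under test.
request = build_padded_request("192.168.1.206")
```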

@Oxyandy: running tcpdump on my FHEM server and analysing with Wireshark, I found that the last 512 bytes of a ~700 byte JSON response were sent first, followed by the HTTP header.
And those two packets were simply missing the TCP window information.
I can send more details on request...
polish as in buff & shine

For me, the version of 22.04.2018 with Core 2.4.1 runs quite well.
[image: sysinfo]

Could you also check my work of yesterday, but then built on 2.4.1?
https://github.com/TD-er/ESPEasy/tree/bugfix/wifi_stability

In 2.3.0 I still had issues with static IP.
Did not yet test the AP mode with setup page.

I just flashed a Wemos with your version
([wifi] Attempt to make event based wifi simpler).

Version runs.....

What should I do now?

Shit, from my last test (22.04.2018) 4 of 8 devices hung after about 7 hours.

I guess no log? :(
Did the nodes crash (hang) or just not reconnect?
Do they reply to ping, so that only the webserver is disabled or too busy (MQTT reconnect takes a lot of resources)?

Meanwhile 5 devices have hung.
I do not have a log. The web server is not accessible. Ping does not work either.

they are just dead.

You should really try to log it. It might give us some useful hints

I log Gijs's version. It's been running for 55 minutes now. :)

Oh, I can beat that! I have 12 devices running between 45 and 263 minutes on Gijs's version (with ESP core from GIT) 😀 and all of them are still happy...

Yes, times have changed.
In the past, my devices ran for weeks.

Today I am happy when they work for a few hours. :)

One of my devices at home, is still running the build I made on 20171231 and is crossing to 60 days uptime today.

So I know what you mean :(

Then let's just take the 20171231 release, then we can focus on other things. :)

Local Time: | 2018-04-26 17:47:23
Uptime: | 0 days 2 hours 27 minutes
Load: | 10% (LC=9371)
Free Mem: | 10336 (9544 - sendContentBlocking)
IP: | 192.168.0.201
Wifi RSSI: | -67 dB

Hey, why not take that 60 day version, tag it with 2.0 and write some bullet-points about known issues (not missing features)

@s0170071 do we really need a 75% working product, and lots of issues filed after release?

Perhaps there's no need to go back so far... This is from latest manual reboot:

Uptime: 21 days 3 hours 32 minutes
Load: 32% (LC=6281)
Free Mem: 14328 (13392 - parseTemplate3)

Build | 20100 - Mega (core 2_3_0)
GIT version | mega-20180308
Plugins | 72 [Normal] [Testing] [Development]
Build Md5 | eb5a94ae675cb343cc387319fd8c4f9a
Md5 check | passed.
Build time | Mar 8 2018 03:05:36
Binary filename | firmware.bin

6 devices have been running for 5 hours - a new record.

That's over 30 hours already ;)

I now have nearly all (12+) devices running on @TD-er's changes from tonight. All are Wemos D1 Minis with a variety of sensors and relays attached (all different). Most of them now have 10h+ uptime. I'm going to flash the last ones now as well.
I did have two or three spontaneous reboots from one or two devices, but this could also be from plugins or faulty sensors (I also use some devel plugins and even changed the config to support 24 tasks and use the ESP GIT core, some even without clearing the config first). But they always came back up and connected to the network successfully!

So for me this is the most stable version I had until now. similar to what i had before 2.4.0 core.

Therefore I'd vote to merge @TD-er's changes from tonight, have them tested and move on from there... But that's just MHO...

And thanks to @TD-er for the quick bugfix (try)!! For me it worked!!

I also had a reboot on one device, but it reconnected immediately.
All Devices are Wemos D1 mini.

I have run the test version here with 71 plugins.
On the devices are almost just BME280, Pir, MH-Z19 and a dust sensor and some Leds.

The web server reacts very fast.
At the moment I am very satisfied with this version and Core 2.4.1.

Maybe it is already known (if so, please ignore it).
I have installed an ESP pro mini with ESP_Easy_mega-20180422_normal_ESP8266_4096.bin (core 2.4.0).
It has been running for 3 days!
The GUI is no longer reachable; it only comes back after a cold start (tested with another ESP).
Ping is OK, MQTT publishing also works, and GPIO switching over HTTP works as well.
The "only" problem is that the GUI is not reachable.
In other words, the ESP works blind: everything is OK, only the GUI does not respond.
After typing the IP in the browser I get:

ily: sans-serif; font-size: 12pt; margin: 0px; padding: 0px; box-sizing: border-box; }h1 {font-size: 16pt; color: #07D; margin: 8px 0; font-weig190 ; color: #07D; }.button {margin: 4px; padding: 4px 16px; background-color: #07D; color: #FFF; text-decoration: none; border-radius: 4px; border:190 190 ative; cursor: pointer; font-size: 12pt; -webkit-user-select: none; -moz-user-select: none; -ms-u

I have enabled logging on my second device after a cold reboot. Once the device becomes unresponsive I will report the log here, if wanted.

Could you also log the memory usage? Just to see if there is some memory leak in 2.4.1 as reported by some.
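
A possible way to check this from a captured log, sketched in Python. It assumes the watchdog lines have the format seen elsewhere in this thread (`WD : Uptime 485 ConnectFailures 6 FreeMem 16416`). The second and third lines of `sample_log` are made up for the example, and the `looks_like_leak` heuristic with its threshold is an assumption, not an established rule.

```python
import re

WD_LINE = re.compile(
    r"WD\s*:\s*Uptime\s+(\d+)\s+ConnectFailures\s+(\d+)\s+FreeMem\s+(\d+)"
)


def free_mem_samples(log_text: str) -> list:
    """Return (uptime, free_mem_bytes) for each watchdog line in the log."""
    return [(int(m.group(1)), int(m.group(3))) for m in WD_LINE.finditer(log_text)]


def looks_like_leak(samples: list, min_drop: int = 1024) -> bool:
    """Crude heuristic: free memory never rises and drops by min_drop or more in total."""
    if len(samples) < 3:
        return False
    mems = [mem for _, mem in samples]
    never_rises = all(later <= earlier for earlier, later in zip(mems, mems[1:]))
    return never_rises and (mems[0] - mems[-1]) >= min_drop


# First line copied from a log in this thread; the other two are invented.
sample_log = """
1568 : WD   : Uptime 0 ConnectFailures 0 FreeMem 18616
31568 : WD   : Uptime 1 ConnectFailures 0 FreeMem 18600
61568 : WD   : Uptime 2 ConnectFailures 0 FreeMem 18616
"""
samples = free_mem_samples(sample_log)
```

Free memory that fluctuates around a constant value, as in the sample, is normal; a value that only ever goes down over hours is what to look for.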

Yes, exactly the same I have with the version from 22.04.2018.
The device is running but the web server is not reachable.

I will try to log it.
Seems constant.

Free Mem: | 9792 (9008 - sendContentBlocking)

Do you mean sysheap?

[image: untitled]

@uzi18 you can't have both, cutting edge and stability. The 60 day version is stable, right?

@TD-er If you leave the Rules window open for a few minutes, all the rules will disappear.

That's with 2.3.0?

I have 2.4.1

Hi,
I am testing 20180426. Works but it's really slow compared to 20180424.
For me core 2.4.0 was working perfectly well and stable.
With the new version, MQTT took more than 1 minute to connect, while the previous version took half the time.
Am I just lucky with core 2.4.0? Or is it just a question of configuration?

I find the version of @TD-er from yesterday with Core 2.4.1 extremely fast.

After switching on the voltage, it only takes a few seconds for the web interface to be reached.
The MQTT messages come immediately.

just for the ones interested. I'm tracking CPU, Memory and RSSI. Attached a graph of all units. You can clearly see on the memory usage, when I did the upgrade to the 2.4.x core. However, memory seems to be stable (eg. no leak)...
Devices 1-11 & 16 are "in use" with sensors etc. the others are just plain D1's with nothing attached.

[image: graph of CPU, memory and RSSI for all units]

Hi @micropet : how can I use yesterday's version with core 2.4.1?

I am starting to think that maybe Wemos are more stable than Sonoff?
I use Wemos too

@micropet and @TD-er : I just compiled the wifi stability branch with core 2.4.1.
WOW. Amazingly fast.
Will leave it running for the next 3 days and report back. It contains quite complex rules...
For the time being: 7 seconds to connect to MQTT compared to 60 with 2.3.0 version 20180426.

I see some reconnections in MQTT Import.

104 : INIT : Free RAM:20040
104 : INIT : I2C
104 : INIT : SPI not enabled
1213 : INFO : Plugins: 72 [Normal] [Testing] [Development] (ESP82xx Core 2_4_1)
1214 : EVENT: System#Wake
1289 : WIFI : Set WiFi to STA
mode : sta(60:01:94:8e:ba:c9)
add if0
1292 : WIFI : Connecting KeepOut attempt #0
1293 : IP   : Static IP : 192.168.1.206 GW: 192.168.1.1 SN: 255.255.255.0 DNS: 8.8.8.8
1405 : EVENT: System#Boot
1412 : ACT  : gpio,14,1
1414 : SW   : GPIO 14 Set to 1
1416 : ACT  : gpio,12,1
1417 : SW   : GPIO 12 Set to 1
1420 : ACT  : gpio,13,1
1420 : SW   : GPIO 13 Set to 1
1422 : ACT  :
1431 : ACT  : taskvalueset 1,1,1
1441 : ACT  : taskvalueset 1,2,1
1453 : ACT  : taskvalueset 1,3,1
1465 : ACT  : taskvalueset 1,4,1
1474 : ACT  :
1482 : ACT  :
1489 : ACT  : timerset,4,60
1568 : WD   : Uptime 0 ConnectFailures 0 FreeMem 18616
1682 : Dummy: value 1: 1.00
1683 : Dummy: value 2: 1.00
1683 : Dummy: value 3: 1.00
1683 : Dummy: value 4: 1.00
1684 : EVENT: Relay1#r1=1.00
1753 : EVENT: Relay1#r2=1.00
1824 : EVENT: Relay1#r3=1.00
1890 : EVENT: Relay1#r4=1.00
2251 : SYS  : 0.00
2253 : EVENT: SysInfoUptime#UptimeDays=0.00
3188 : IMPT : MQTT 037 Intentional reconnect
3562 : IMPT : MQTT 037 Intentional reconnect
scandone
state: 0 -> 2 (b0)
5130 : Dummy: value 1: 25.80
5130 : Dummy: value 2: 27.20
5130 : Dummy: value 3: 27.40
5130 : Dummy: value 4: 0.00
5131 : EVENT: temp#t1=25.80
state: 2 -> 3 (0)
5158 : ACT  : timerset,1,2
state: 3 -> 5 (10)
add 0
aid 5
cnt
5174 : ACT  : lcd,1,20,*

connected with KeepOut, channel 9
ip:192.168.1.206,mask:255.255.255.0,gw:192.168.1.1
5239 : EVENT: temp#t2=27.20
5266 : ACT  : timerset,2,3
5276 : ACT  : lcd,1,20,*
5335 : EVENT: temp#t3=27.40
5365 : ACT  : timerset,3,4
5375 : ACT  : lcd,1,20,*
5428 : EVENT: temp#t4=0.00
5503 : Dummy: value 1: 18.00
5504 : Dummy: value 2: 11.00
5504 : Dummy: value 3: 12.00
5504 : Dummy: value 4: 0.00
5505 : EVENT: local#LSet1=18.00
5575 : EVENT: local#LSet2=11.00
5645 : EVENT: local#LSet3=12.00
5715 : EVENT: local#empty=0.00
6553 : Current Time Zone:  DST time start: 2018-03-25 02:00:00 offset: 120 minSTD time start: 2018-10-28 03:00:00 offset: 60 min
6554 : EVENT: Time#Initialized
6627 : EVENT: Clock#Time=Thu,22:59
6702 : IMPT : MQTT 037 Intentional reconnect
6964 : IMPT : Connected to MQTT broker with Client ID=ESPT6-Import
6965 : EVENT: MQTTimport#Connected
6981 : ACT  : publish /ESPT6/dummy/requestedTempUpdate,0
7059 : IMPT : [import1#Set1] subscribed to /OH2/status/nSetTemp1
7061 : IMPT : [import1#Set2] subscribed to /OH2/status/nSetTemp2
7062 : IMPT : [import1#Set3] subscribed to /OH2/status/nSetTemp3
7063 : IMPT : [import1#master] subscribed to /OH2/status/nMasterCaldaia
7065 : WIFI : Connected! AP: KeepOut (BC:EE:7B:EF:A3:38) Ch: 9 Duration: 3911 ms
7065 : EVENT: WiFi#ChangedAccesspoint
7144 : WIFI : Static IP: 192.168.1.206 (ESPT6-16) GW: 192.168.1.1 SN: 255.255.255.0   duration: 1940 ms
7173 : EVENT: Time#Set
7247 : EVENT: WiFi#Connected
7316 : Webserver: start
7332 : IMPT : MQTT 037 Intentional reconnect
7587 : IMPT : Error subscribing to /OH2/status/nSetTemp1
7588 : EVENT: Rules#Timer=1
ping 1, timeout 0, total payload 32 bytes, 1067 ms
7648 : [if 0=1]=false
7650 : else = true
7651 : ACT  : timerset,5,6
7688 : EVENT: Rules#Timer=1 Processing time:100 milliSeconds
7690 : MQTT : Intentional reconnect
7704 : MQTT : Connected to broker with client ID: ESPClient_60:01:94:8E:BA:C9
7707 : Subscribed to: /ESPT6/#
7708 : EVENT: MQTT#Connected
7722 : ACT  : publish /ESPT6/dummy/requestedTempUpdate,0
7813 : EVENT: MQTT#Connected Processing time:105 milliSeconds
7828 : IMPT : [import1#Set1] : 18.00
7828 : EVENT: import1#Set1=18.00
7882 : ACT  : taskvalueset,6,1,18
7893 : ACT  : timerset,1,2
7904 : ACT  : lcd,1,20,*
7955 : EVENT: import1#Set1=18.00 Processing time:127 milliSeconds
8065 : MQTT : Topic: /ESPT6/status/LWT
8065 : MQTT : Payload: Connected
8075 : IMPT : [import1#Set2] : 11.00
8075 : EVENT: import1#Set2=11.00
8131 : ACT  : taskvalueset,6,2,11
8143 : ACT  : timerset,2,3
8152 : ACT  : lcd,1,20,*
8199 : EVENT: import1#Set2=11.00 Processing time:124 milliSeconds
8206 : MQTT : Topic: /ESPT6/dummy/requestedTempUpdate
8207 : MQTT : Payload: 0
8218 : MQTT : Topic: /ESPT6/Relay1/r1
8218 : MQTT : Payload: 0
8219 : MQTT : Topic: /ESPT6/Relay1/r2
8219 : MQTT : Payload: 1
8220 : MQTT : Topic: /ESPT6/Relay1/r3
8220 : MQTT : Payload: 1
8220 : MQTT : Topic: /ESPT6/Relay1/r4
8220 : MQTT : Payload: 1
8221 : MQTT : Topic: /ESPT6/SysInfoUptime/UptimeDays
8221 : MQTT : Payload: 0.1
8222 : MQTT : Topic: /ESPT6/status/LWT
8222 : MQTT : Payload: Connected
8223 : MQTT : Topic: /ESPT6/dummy/requestedTempUpdate
8223 : MQTT : Payload: 0
ping 1, timeout 0, total payload 32 bytes, 1112 ms
8320 : IMPT : [import1#Set3] : 12.00
8320 : EVENT: import1#Set3=12.00
8376 : ACT  : taskvalueset,6,3,12
8387 : ACT  : timerset,3,4
8396 : ACT  : lcd,1,20,*
8441 : EVENT: import1#Set3=12.00 Processing time:121 milliSeconds
8565 : IMPT : [import1#master] : 0.00
8565 : EVENT: import1#master=0.00
8581 : ACT  : timerset,1,2
8591 : ACT  : timerset,2,3
8600 : ACT  : timerset,3,4
8608 : ACT  : lcd,1,20,*
8684 : EVENT: import1#master=0.00 Processing time:119 milliSeconds
8696 : EVENT: MQTTimport#Disconnected
8774 : EVENT: MQTTimport#Disconnected Processing time:78 milliSeconds
8775 : IMPT : MQTT 037 Connection lost
9712 : IMPT : Connected to MQTT broker with Client ID=ESPT6-Import
9713 : EVENT: MQTTimport#Connected
9725 : ACT  : publish /ESPT6/dummy/requestedTempUpdate,0
9809 : EVENT: MQTTimport#Connected Processing time:96 milliSeconds
9813 : IMPT : [import1#Set1] subscribed to /OH2/status/nSetTemp1
9813 : IMPT : [import1#Set2] subscribed to /OH2/status/nSetTemp2
9814 : IMPT : [import1#Set3] subscribed to /OH2/status/nSetTemp3
9815 : IMPT : [import1#master] subscribed to /OH2/status/nMasterCaldaia
9817 : MQTT : Topic: /ESPT6/dummy/requestedTempUpdate
9817 : MQTT : Payload: 0
9931 : IMPT : [import1#Set1] : 18.00
9931 : EVENT: import1#Set1=18.00
9985 : ACT  : taskvalueset,6,1,18
9996 : ACT  : timerset,1,2
10005 : ACT  : lcd,1,20,*
10053 : EVENT: import1#Set1=18.00 Processing time:122 milliSeconds
10173 : IMPT : [import1#Set2] : 11.00
10174 : EVENT: import1#Set2=11.00
10228 : ACT  : taskvalueset,6,2,11
10239 : ACT  : timerset,2,3
10248 : ACT  : lcd,1,20,*
10295 : EVENT: import1#Set2=11.00 Processing time:121 milliSeconds
10414 : IMPT : [import1#Set3] : 12.00
10414 : EVENT: import1#Set3=12.00
10470 : ACT  : taskvalueset,6,3,12
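
To put a number on "some reconnections", a log like the one above can be tallied with a few lines of Python. This is only a sketch: the regular expression assumes the `timestamp : TAG : message` layout seen in this log, the event labels are made up for the example, and the excerpt contains lines copied from the log plus one SDK debug line to show that untimestamped output is skipped.

```python
import re
from collections import Counter

LINE = re.compile(r"^\s*(\d+)\s*:\s*(\w+)\s*:\s*(.*)$")


def tally_mqtt_events(log_text: str) -> Counter:
    """Count reconnect-related messages per log tag (IMPT or MQTT)."""
    counts = Counter()
    for raw in log_text.splitlines():
        m = LINE.match(raw)
        if not m:
            continue  # SDK debug output such as "scandone" has no timestamp
        tag, message = m.group(2), m.group(3)
        if tag not in ("IMPT", "MQTT"):
            continue
        if "Intentional reconnect" in message:
            counts[(tag, "intentional_reconnect")] += 1
        elif "Connection lost" in message:
            counts[(tag, "connection_lost")] += 1
        elif message.startswith("Connected to"):
            counts[(tag, "connected")] += 1
    return counts


excerpt = """\
3188 : IMPT : MQTT 037 Intentional reconnect
3562 : IMPT : MQTT 037 Intentional reconnect
scandone
6964 : IMPT : Connected to MQTT broker with Client ID=ESPT6-Import
7690 : MQTT : Intentional reconnect
7704 : MQTT : Connected to broker with client ID: ESPClient_60:01:94:8E:BA:C9
8775 : IMPT : MQTT 037 Connection lost
9712 : IMPT : Connected to MQTT broker with Client ID=ESPT6-Import
"""
counts = tally_mqtt_events(excerpt)
```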

@giig1967g Any chance you could make a bin file for that that I could download? 4meg version? Still learning the compile process....

@giig1967g, as I said: extremely fast, and it also looks stable.

I have now had one reconnect in 7 hours.

@giig1967g Running static IP still has some issues with my new code.
Especially when running with MQTT.
DHCP seems to be working fine.

Tomorrow is going to be a very busy day for me, since it is "Kingsday" and the royal family will be in Groningen. Also I am invited to be there to -maybe- tell the king about our issues here.
So I will be off to bed now and my suggestion is to not merge code today to the main branch. Tomorrow we will continue development and make the best wifi connecting code possible :)

OK, I found an issue on both 20180426 (2.3.0) and WifistabilityBranch (2.4.1).
If I switch the router off and then back on, the unit does not reconnect to WiFi, even though it writes "Wifi#Connected" to the serial log. The unit itself is working (serial and rules OK), but there is no WiFi connection and therefore no web interface.

@TD-er, let's do that. Greetings to the King - maybe he has another idea. :)
And good night.

@TD-er : good luck with your real life issues...

Same here, if I disconnect WiFi for 0.2 seconds:

29113846 : ACT : timerSet,1,60
29115651 : MQTT : Connection lost
29115652 : EVENT: MQTT#Disconnected
29115689 : MQTT : Failed to connect to broker
29115690 : EVENT: WiFi#Disconnected
29115706 : WIFI : Disconnected! Reason: '(1) Unspecified' Connected for 8h04m <-------- !!
29116189 : MQTT : Failed to connect to broker
29116939 : MQTT : Failed to connect to broker
29117860 : WD : Uptime 485 ConnectFailures 6 FreeMem 16416
29117881 : MQTT : Failed to connect to broker
29117938 : MQTT : Failed to connect to broker
29119189 : MQTT : Failed to connect to broker
29120689 : MQTT : Failed to connect to broker
29120736 : DS : Temperature: 19.94 (28-ff-b8-ea-b4-16-3-ed)
29120738 : EVENT: DS18b20#Temperature=19.94
29122440 : MQTT : Failed to connect to broker
29124440 : MQTT : Failed to connect to broker
29126441 : MQTT : Failed to connect to broker
29128442 : MQTT : Failed to connect to broker

@giig1967g
You could look for the function to start/stop the webserver and add a return; on the first line of that function.
https://github.com/TD-er/ESPEasy/blob/f9be283cb70043733fdc45575457a85244660ea8/src/WebServer.ino#L570-L585

I guess there is some issue with calling the 'gotIP' function when using static IP.
That's also the reason of the instability when running MQTT + static IP.

But that is probably a trifle for @TD-er. :)

Hm, I can't make sense of it.
I think you had better do that tomorrow.
Besides, I work with DHCP here.

For me, stability with my hardware always seems 'better' (not perfect) with 2.3.0 core builds.

  • tried 2.4.0 & 2.4.1 -- they are worse...

0403 mega normal just seems to work, as do the versions before it.
Could anyone who still has problems like mine try 0403, please?
@susisstrolch & @uzi18 - thanks both for your responses.. I have a better idea of how I can see communications now - wireshark & usb wifi 2.4g adapter will work, I have plenty spare
Thanks

Mine is running almost 17hours now. Single connection

update from my units (name|uptime in minutes|last disc. reason|wifi connected in ms):
wemos_mini_01_sysinfo | 1220 | 200 | 6462869
wemos_mini_02_sysinfo | 1223 | 1 | 19359544
wemos_mini_03_sysinfo | 657 | 1 | 1018597
wemos_mini_04_sysinfo | 1078 | 201 | 439668
wemos_mini_05_sysinfo | 650 | 6 | 9194816
wemos_mini_06_sysinfo | 927 | 1 | 955432
wemos_mini_07_sysinfo | 1142 | 1 | 14078412
wemos_mini_08_sysinfo | 730 | 1 | 7848454
wemos_mini_09_sysinfo | 1005 | 1 | 5536489
wemos_mini_10_sysinfo | 550 | 201 | 465734
wemos_mini_11_sysinfo | 662 | 4 | 15658520
wemos_mini_12_sysinfo | 1211 | 1 | 17915701
wemos_mini_13_sysinfo | 1211 | 1 | 17896590
wemos_mini_14_sysinfo | 1210 | 1 | 17882406
wemos_mini_15_sysinfo | 753 | 1 | 58904600
wemos_mini_16_sysinfo | 1197 | 1 | 17210855

one reboot of unit 10 (could also be related to plugins). all units with DHCP and FHEM controller as well as regular JSON status updates (called via HTTPMOD from fhem).
webserver on all of them still running and very responsive. never used static-ip or setup-page though.

so for me this seems to be a rather stable version.

as I'm mostly offline the next 4 days, I'll see next week how many are still alive 😃

Oh, and greetings to the king! i hope he's into IoT too 😀

@clumsy-stefan this version? https://github.com/TD-er/ESPEasy/tree/bugfix/wifi_stability
and what core ?
Have you tried to restart the router?

@melwinek git commit: d582cab938f041f622f2d4d8016b3d4bada55580 from esp8266 master branch (so latest commit on master branch from core development).

@clumsy-stefan core 2.3.0, 2.4.0 or 2.4.1 ?

Latest Git commit! (so I assume it's core 2.4.1+)

I can not find this commit.
Latest is: https://github.com/letscontrolit/ESPEasy/commit/d780f1a07fdcd4eec394a0677866c2a9778eb696
Can you provide a link?

@clumsy-stefan I understand, you meant the git commit of the core.
Which ESPEasy commit do you compile?

ESPEasy commit f9be283 from https://github.com/TD-er/ESPEasy/tree/bugfix/wifi_stability branch
esp8266 commit d582cab from https://github.com/esp8266/Arduino

@TD-er:

> @giig1967g
> You could look for the function to start/stop the webserver and add a return; on the first line of that function.
> https://github.com/TD-er/ESPEasy/blob/f9be283cb70043733fdc45575457a85244660ea8/src/WebServer.ino#L570-L585
>
> I guess there is some issue with calling the 'gotIP' function when using static IP.
> That's also the reason of the instability when running MQTT + static IP.

Hi, the problem is not the webserver; the unit seems to be connected to WiFi but it is not. I cannot ping it, it cannot send MQTT, etc.

Latest GitHub runs perfectly here on core 2.4.1. No connection issues, no memory leaks.

@mvdbro Do you use an esp32 or an esp8266 device?

I use both ESP8266 and ESP32. Both work fine.

Side note:
Always using static IP
Using only the plugins that I need (keeps RAM at a safe minimum; I think the default set has too many plugins loaded)

Great! :smile:
I tried core 2.4.1 on my esp8266 too.
Static IP and DHCP are working.
DNS server and captive portal work.
NTP works!

@Feuerreiter : did you try to switch off and on the router to see if the unit reconnects?
In my case the log says that it reconnected but it didn't.

@giig1967g I will try it later at home. I did the test in my car with my mobile phone as a hotspot. ;-)

@mvdbro

> I think the default set has too many plugins loaded

I agree. There should be some preselection on plugins, or maybe a better check on all of them to not use any more memory than absolutely needed when not used.
And there is a lot of memory to gain when looking closely at the big structs containing information about the plugins available.

[PlatformIO] Updated core to 2.4.1
+1

I think that's a good decision.
I have no problems since yesterday afternoon.

As I posted in another thread:
For those who need a little help in building, I just built a version of the patch I wrote 2 days ago, but now with core 2.4.1:
TD-er_wifi_stability_core-2.4.1

Memory stays pretty constant.
It drops a little in the first 15 minutes (not shown here).
19:05:33 ESP-206/SYSHEAP 11536
19:08:34 ESP-206/SYSHEAP 11536
19:11:36 ESP-206/SYSHEAP 11536
19:14:37 ESP-206/SYSHEAP 11536
19:17:40 ESP-206/SYSHEAP 11536
19:20:42 ESP-206/SYSHEAP 11536
19:23:45 ESP-206/SYSHEAP 11536
19:26:47 ESP-206/SYSHEAP 11536
19:29:50 ESP-206/SYSHEAP 11536
19:32:52 ESP-206/SYSHEAP 11536
19:35:54 ESP-206/SYSHEAP 11536
19:38:57 ESP-206/SYSHEAP 11536
19:41:59 ESP-206/SYSHEAP 11536
19:45:01 ESP-206/SYSHEAP 11536
19:48:04 ESP-206/SYSHEAP 11536
19:51:06 ESP-206/SYSHEAP 11536
19:54:09 ESP-206/SYSHEAP 11536
19:57:11 ESP-206/SYSHEAP 11536
20:00:13 ESP-206/SYSHEAP 11536
20:03:16 ESP-206/SYSHEAP 11536
20:06:17 ESP-206/SYSHEAP 11536
20:09:29 ESP-206/SYSHEAP 11656
20:12:19 ESP-206/SYSHEAP 11592
20:15:21 ESP-206/SYSHEAP 11592
20:18:23 ESP-206/SYSHEAP 11592
20:21:24 ESP-206/SYSHEAP 11592
20:24:25 ESP-206/SYSHEAP 13192
20:27:27 ESP-206/SYSHEAP 11592
20:30:30 ESP-206/SYSHEAP 11592
20:33:31 ESP-206/SYSHEAP 11592
20:36:34 ESP-206/SYSHEAP 11592
20:39:36 ESP-206/SYSHEAP 11592
20:42:39 ESP-206/SYSHEAP 11592
20:45:40 ESP-206/SYSHEAP 11592
20:48:43 ESP-206/SYSHEAP 11592
20:51:45 ESP-206/SYSHEAP 11592
20:54:48 ESP-206/SYSHEAP 11592
20:57:50 ESP-206/SYSHEAP 11592
21:00:52 ESP-206/SYSHEAP 11592
21:03:54 ESP-206/SYSHEAP 11592
21:06:56 ESP-206/SYSHEAP 11424
21:09:58 ESP-206/SYSHEAP 13024
21:13:01 ESP-206/SYSHEAP 11424
21:16:03 ESP-206/SYSHEAP 13024
21:19:06 ESP-206/SYSHEAP 11424
21:22:08 ESP-206/SYSHEAP 11448
21:25:10 ESP-206/SYSHEAP 11424
21:28:13 ESP-206/SYSHEAP 11424
21:31:15 ESP-206/SYSHEAP 11424
21:34:18 ESP-206/SYSHEAP 11424
21:37:20 ESP-206/SYSHEAP 11424
21:40:22 ESP-206/SYSHEAP 11424
21:43:24 ESP-206/SYSHEAP 11424
21:46:27 ESP-206/SYSHEAP 11424
21:49:28 ESP-206/SYSHEAP 13024
21:52:31 ESP-206/SYSHEAP 11424
21:55:33 ESP-206/SYSHEAP 11424
21:58:36 ESP-206/SYSHEAP 11424
22:01:38 ESP-206/SYSHEAP 11424

Meanwhile, 8 devices have been running since noon yesterday.
None of them got stuck.

One was not accessible for about 15 minutes - suddenly it was back on the network.
Overall, a pleasing result.

Very good to hear.
Let's hope @Oxyandy also could share similar positive results with the latest build I shared.
His nodes were the most critical when using 2.4.x

@TD-er Morning, finally found this post after doing a catch up
0403 using 2.4.1 went great all night..
Now flashing from your latest rar normal 1024 8266
connects first try, updates time straight away, no wifi errors (so far) & stays connected (so far),
web server responds every time..
Need a bit more time testing, but looking good

That's good to hear :)

I will have a look at the NTP issue reported by @giig1967g and then push this code to the ESPeasy repository.
Let's hope for the best.

It would be really great if we could leave the wifi for what it is and continue with the rest.

Hoping you could fine tune a few things I supply feedback on, once things show as stable again.
I am going to try a few specific tests with your wifi_stability_core-2.4.1 source from Github
& maybe fixes against my long standing issues, if source has not changed that radically

If you have some fixes, please share them.

@giig1967g I have done the tests. Very short disconnects are no problem: AP off and immediately on again. My mcunode reconnects once the AP is back. My PC had already connected to another AP by then.

Also just started testing with D1-Mini and BME280

ESPEasy: commit 2abec2b0bb74018ea76203886f683761796091a2
Merge: 16d3a9f 29f89b6
Author: Susis Strolch juorschiedt@gmail.com
Date: Sat Apr 28 10:26:14 2018 +0200

Merge remote-tracking branch 'upstream/mega' into mega

* upstream/mega:
  automaticly updated release notes for mega-20180428

Core: commit 41a64707f149d01ace37c903f448d5e3f1cee5d8
Author: Marcel Stör marcelstoer@users.noreply.github.com
Date: Thu Apr 26 01:46:17 2018 +0200

Fix WiFi status formatting issue (bullet list) (#4671)

Custom.h:
```
#warning "** Using Settings from Custom.h File *"

#if defined(ESP8266)

//enable Arduino OTA updating.
//Note: This adds around 10kb to the firmware size, and 1kb extra ram.
// #define FEATURE_ARDUINO_OTA

//enable mDNS mode (adds about 6kb ram and some bytes IRAM)
// #define FEATURE_MDNS

#endif

#undef PLUGIN_BUILD_NORMAL
#undef PLUGIN_BUILD_TESTING
#undef PLUGIN_BUILD_DEV
#define PLUGIN_BUILD_CUSTOM

#undef BUILD_UPLOADER

#if defined(BUILD_UPLOADER)

#warning "**** Building ESP8285 Uploader image ***"

#else

// define our own plugins
#define USES_P001 // Switch
#define USES_P002 // ADC
#define USES_P004 // Dallas
#define USES_P005 // DHT
#define USES_P013 // HCSR04
#define USES_P026 // SysInfo
#define USES_P028 // BME280
#define USES_P033 // Dummy

#define USES_C008   // Generic HTTP
#define USES_C009   // FHEM HTTP
#define USES_C013   // ESPEasy P2P network

#endif

#undef BUILD_GIT
#define BUILD_GIT "2abec2b"
```

Seems there are some problems with NTP:
```
INIT : Booting version: 2abec2b (ESP82xx Core 41a64707)
80 : INIT : Warm boot #2
81 : FS : Mounting...
106 : FS : Mount successful, used 76053 bytes of 957314
115 : CRC : No program memory checksum found. Check output of crc2.py
144 : CRC : SecuritySettings CRC ...OK
227 : INIT : Free RAM:32208
227 : INIT : I2C
227 : INIT : SPI not enabled
232 : INFO : Plugins: 8 (ESP82xx Core 41a64707)
233 : EVENT: System#Wake
241 : WIFI : Set WiFi to STA
242 : WIFI : Connecting SusiconStrolch attempt #0
355 : EVENT: System#Boot
364 : WD : Uptime 0 ConnectFailures 0 FreeMem 31504
3987 : BMx280 : Detected BME280
5575 : BME280: dew point 8.03C
5576 : BME280 : Address: 0x76
5576 : BME280 : Temperature: 18.49
5576 : BME280 : Humidity: 50.75
5576 : BME280 : Barometric Pressure: 1010.58
5583 : EVENT: BMx280#Temperature=18.49
5592 : EVENT: BMx280#Humidity=50.75
5597 : EVENT: BMx280#Pressure=1010.58
5853 : Current Time Zone: DST time start: 2018-03-25 02:00:00 offset: 120 minSTD time start: 2018-10-28 03:00:00 offset: 60 min
5853 : EVENT: Time#Initialized
5862 : EVENT: Clock#Time=Sat,10:52
5866 : ACT : taskvalueset 12,1,0
5872 : ACT : taskvalueset 12,2,-58
5877 : ACT : taskvalueset 12,3,29912
5883 : ACT : taskvalueset 12,4,39164
5888 : WIFI : Connected! AP: SusiconStrolch (38:10:D5:B2:22:1E) Ch: 13 Duration: 3783 ms
5888 : EVENT: WiFi#ChangedAccesspoint
5894 : WIFI : DHCP IP: 192.168.254.71 (D1pro-01-11) GW: 192.168.254.1 SN: 255.255.255.0 duration: 17 ms
5913 : Current Time Zone: DST time start: 2036-03-30 02:00:00 offset: 120 minSTD time start: 2036-10-26 03:00:00 offset: 60 min
5914 : EVENT: Time#Set
5921 : EVENT: WiFi#Connected
5928 : Webserver: start
5935 : EVENT: Clock#Time=Thu,07:28
```
5853 : Current Time Zone: DST time start: 2018-03-25 02:00:00
5913 : Current Time Zone: DST time start: 2036-03-30 02:00:00 offset: 120 minSTD time start:

That's a known bug: a -1 error code is returned and still converted to a time.
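For illustration, here is a minimal sketch of how a -1 result could turn into the 2036 date seen in the log. It assumes the failure code ends up in an unsigned 32-bit "NTP seconds" value and is then converted like a real timestamp; whether that is exactly what happens in ESPEasy is not confirmed here. The names `naiveNtpToUnix`, `checkedNtpToUnix` and `kNtpFailed` are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
constexpr uint32_t kNtpToUnixOffset = 2208988800UL;

// Hypothetical sentinel: what a "-1" failure code becomes when stored unsigned.
constexpr uint32_t kNtpFailed = 0xFFFFFFFFUL;

// Naive conversion: treats any value, including the failure code, as a timestamp.
uint32_t naiveNtpToUnix(uint32_t ntpSeconds) {
  return ntpSeconds - kNtpToUnixOffset;
}

// Guarded conversion: returns false and leaves unixSeconds untouched when the
// input is the failure sentinel or lies before the Unix epoch.
bool checkedNtpToUnix(uint32_t ntpSeconds, uint32_t &unixSeconds) {
  if (ntpSeconds == kNtpFailed || ntpSeconds < kNtpToUnixOffset) {
    return false;
  }
  unixSeconds = ntpSeconds - kNtpToUnixOffset;
  return true;
}
```

With the naive conversion, 0xFFFFFFFF maps to Unix time 2085978495, which is 2036-02-07 06:28:15 UTC. That is consistent with the `Clock#Time=Thu,07:28` event at a 60 minute offset.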

Hmm, I still wonder how the call to NTP may result in a -1.

@susisstrolch
What version is that? ESPeasy uses the BUILD_GIT value to determine what should be patched in settings files and maybe other files too. It is some kind of internal version used to determine the needed patches (if any).
When you change that value, it may result in strange things.

I guess it's UDP related. I don't see any of my other devices, and vice versa.

@susisstrolch That is with static IP, or with DHCP?

@TD-er patching based on BUILD_GIT is a bad idea, because it changes on forks, branches and local commits.
There should be a different type/variable, maybe BUILD_FEATURE, which controls such behaviour.
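A rough sketch of that idea, not taken from the ESPEasy source: the settings carry their own integer layout version, and migration steps are keyed on that counter instead of on a git hash. `SettingsHeader`, `kCurrentLayoutVersion` and `migrateSettings` are hypothetical names, and using 65500 as a default UDP port is only an assumption for the example.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical settings header: the layout version is an explicit counter,
// not derived from BUILD_GIT, so forks and local commits do not change it.
struct SettingsHeader {
  uint16_t layoutVersion;
  uint16_t udpPort;
};

constexpr uint16_t kCurrentLayoutVersion = 2;

// Apply each migration step once, in order, until the layout is current.
// Returns the number of steps that were applied.
int migrateSettings(SettingsHeader &s) {
  int applied = 0;
  while (s.layoutVersion < kCurrentLayoutVersion) {
    switch (s.layoutVersion) {
      case 0:
        // Illustrative step: give an unset UDP port a default value.
        if (s.udpPort == 0) {
          s.udpPort = 65500;
        }
        break;
      case 1:
        // Illustrative step: nothing to convert, only the version changes.
        break;
    }
    ++s.layoutVersion;
    ++applied;
  }
  return applied;
}
```

Because the counter only changes when the stored layout changes, a custom `BUILD_GIT` string would no longer influence which patches run.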

As i almost never reboot my AP, it went unnoticed that a reconnect fails with latest github source (20180428). I tried to get some internal status to see what's going on:

After boot:
call to Wifi.status() : 3 means WL_CONNECTED
variable wifiStatus : 3 means ESPEASY_WIFI_SERVICES_INITIALIZED

After reboot AP
call to Wifi.status() : 3 means WL_CONNECTED
variable wifiStatus : 0 means ESPEASY_WIFI_DISCONNECTED

this does not change over time and the ESP never reconnects

Just wondering why the Wifi.status() call to core still reports a status 3 (WL_CONNECTED)
Is this because the event based wifi interferes with internal core status?

Behavior is the same on core 2_3_0 and 2_4_1
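The stuck situation described above can be expressed as a small consistency check. This is only a sketch with stand-in enums using the numeric values quoted in this thread; it assumes the core reports "connected" only once it has an IP. `CoreStatus`, `EspEasyWifi` and `statesDisagree` are names invented for the example.

```cpp
#include <cassert>

// Stand-in for the values returned by the core's status call.
enum class CoreStatus {
  Idle = 0,
  NoSsidAvail = 1,
  ScanCompleted = 2,
  Connected = 3,
  ConnectFailed = 4,
  ConnectionLost = 5,
  Disconnected = 6
};

// Stand-in for the ESPEasy wifiStatus variable.
enum class EspEasyWifi {
  Disconnected = 0,
  Connected = 1,
  GotIp = 2,
  ServicesInitialized = 3
};

// True when one side believes there is a usable IP connection and the other
// side does not. A device stuck in that condition never reconnects on its own.
bool statesDisagree(CoreStatus core, EspEasyWifi easy) {
  const bool coreHasIp = (core == CoreStatus::Connected);
  const bool easyHasIp = (easy == EspEasyWifi::GotIp ||
                          easy == EspEasyWifi::ServicesInitialized);
  return coreHasIp != easyHasIp;
}
```

For the two cases reported above, the check is false after a normal boot (3 and 3) and true after the AP reboot (3 and 0).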

That's with DHCP

And i would expect after normal reboot:

call to Wifi.status() : 3 means WL_CONNECTED
variable wifiStatus : 1 means ESPEASY_WIFI_CONNECTED

instead of:
call to Wifi.status() : 3 means WL_CONNECTED
variable wifiStatus : 3 means ESPEASY_WIFI_SERVICES_INITIALIZED

Hmm, that's a good question @mvdbro
Maybe the status WL_CONNECTED is never updated because it is not calling the function to update that status.
Just from memory, I would say that function is called in the core library when the event "got IP" is handled.
I will look into that area of the core library code.
Thanks for noticing.

@susisstrolch OK, I will also try with DHCP here.
My test units can see each other via the internal ESPeasy UDP communication, but they are currently running on static IP, since that gave most of the issues lately.

@TD-er hold on - will check if it's a core related problem.

@mvdbro
Here is the code:
https://github.com/esp8266/Arduino/blob/836c7da8cc1ad11a66e0be1f30d35a92b5317bcc/libraries/ESP8266WiFi/src/ESP8266WiFiSTA.cpp#L497-L513

Indeed, as long as the internal status is not set to 'got IP', it will not return WL_CONNECTED.
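As a standalone paraphrase of what the linked function does (please check the linked source for the exact code), the mapping from the SDK station state to the Arduino status looks roughly like this. The enums are local stand-ins, numbered as in the ESP8266 non-OS SDK and the Arduino status codes, so the snippet compiles without the core.

```cpp
#include <cassert>

// Local stand-in for the SDK station states.
enum class SdkStation {
  Idle = 0,
  Connecting = 1,
  WrongPassword = 2,
  NoApFound = 3,
  ConnectFail = 4,
  GotIp = 5
};

// Local stand-in for the Arduino WiFi status codes.
enum class ArduinoStatus {
  Idle = 0,
  NoSsidAvail = 1,
  ScanCompleted = 2,
  Connected = 3,
  ConnectFailed = 4,
  ConnectionLost = 5,
  Disconnected = 6
};

// Only the "got IP" station state is reported as connected; being associated
// with the AP without an IP still counts as disconnected.
ArduinoStatus toArduinoStatus(SdkStation s) {
  switch (s) {
    case SdkStation::GotIp:
      return ArduinoStatus::Connected;
    case SdkStation::NoApFound:
      return ArduinoStatus::NoSsidAvail;
    case SdkStation::ConnectFail:
    case SdkStation::WrongPassword:
      return ArduinoStatus::ConnectFailed;
    case SdkStation::Idle:
      return ArduinoStatus::Idle;
    default:
      return ArduinoStatus::Disconnected;
  }
}
```

So SDK state 5 corresponds to Arduino status 3, and SDK state 1 (connecting) corresponds to Arduino status 6.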

About the difference in variable names between the enum/defines in ESPeasy and core library.
The core library states do not really reflect the true state, since you can be connected and not have an IP.

It may be better to keep this thread for Wifi connect/reconnect issues only so this can be fixed in the first place.

I guess these are the effective codes used:

Wifi.status() codes:
WL_IDLE_STATUS 0
WL_NO_SSID_AVAIL 1
WL_SCAN_COMPLETED 2
WL_CONNECTED 3
WL_CONNECT_FAILED 4
WL_CONNECTION_LOST 5
WL_DISCONNECTED 6

wifiStatus codes:
ESPEASY_WIFI_DISCONNECTED 0
ESPEASY_WIFI_CONNECTED 1
ESPEASY_WIFI_GOT_IP 2
ESPEASY_WIFI_SERVICES_INITIALIZED 3

I guess it may very well be related to the wifi connect/reconnect issue.
With static IP, there simply is no event for "got IP", so its current handling may be incomplete, which is causing some of these issues.
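To make the missing step concrete, here is a minimal sketch of a state machine that advances to "got IP" by itself when a static IP is configured. It is not the ESPEasy implementation; `WifiStateMachine` and its methods are hypothetical, and only the four state values are taken from the list above.

```cpp
#include <cassert>

enum class WifiState {
  Disconnected = 0,
  Connected = 1,
  GotIp = 2,
  ServicesInitialized = 3
};

struct WifiStateMachine {
  bool useStaticIp;
  WifiState state;

  explicit WifiStateMachine(bool staticIp)
      : useStaticIp(staticIp), state(WifiState::Disconnected) {}

  // Called from the "connected to AP" event.
  void onConnected() {
    state = WifiState::Connected;
    if (useStaticIp) {
      // No "got IP" event will follow, so the static configuration would be
      // applied here and the state advanced directly.
      state = WifiState::GotIp;
    }
  }

  // Called from the "got IP" event (DHCP only).
  void onGotIp() {
    if (state == WifiState::Connected) {
      state = WifiState::GotIp;
    }
  }

  void onDisconnected() { state = WifiState::Disconnected; }

  // Called from the main loop once an IP is available.
  void startServices() {
    if (state == WifiState::GotIp) {
      state = WifiState::ServicesInitialized;
    }
  }
};
```

With DHCP the sequence is connected, got IP, services; with static IP the first two steps collapse into the connect event.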

So After reboot AP
call to Wifi.status() : 3 means WL_CONNECTED
This is not correct.

If the Wifi.status() does not work as expected, this would be a serious arduino core bug. Shouldn't this be reported on their github issue tracker?

@susisstrolch Just experienced the same. Updated a known good and ready configured device. Did not discover other units.

Solution: check the UDP port entry. (65500) Mine was just gone. Another reboot and it was working.
We should really go for the JSON based config !!!

@mvdbro I just found this, see top of page 39:
https://www.espressif.com/sites/default/files/documentation/2c-esp8266_non_os_sdk_api_reference_en.pdf
So I will now test to see if we have to set the IP config (again) after processing the connect event.

Can someone test the code in this PR: https://github.com/letscontrolit/ESPEasy/pull/1328

@s0170071 - confirmed - setting of UDP port changed.

@TD-er about #1328:

INIT : Booting version: 62e6317 (ESP82xx Core 41a64707)
75 : INIT : Warm boot #1
76 : FS   : Mounting...
101 : FS   : Mount successful, used 76053 bytes of 957314
111 : CRC  : No program memory checksum found. Check output of crc2.py
142 : CRC  : SecuritySettings CRC   ...OK 
248 : INIT : Free RAM:31624
248 : INIT : I2C
248 : INIT : SPI not enabled
253 : INFO : Plugins: 8 (ESP82xx Core 41a64707)
254 : EVENT: System#Wake
261 : WIFI : Set WiFi to STA
        mode : sta(5c:cf:7f:f1:bb:e1)
        add if0
264 : WIFI : Connecting SusiconStrolch attempt #0
267 : OTA  : Arduino OTA enabled on port 8266
379 : EVENT: System#Boot
390 : WD   : Uptime 0 ConnectFailures 0 FreeMem 30112
        scandone
        state: 0 -> 2 (b0)
4014 : BMx280 : Detected BME280
        state: 2 -> 3 (0)
        state: 3 -> 5 (10)
        add 0
        aid 3
        cnt 
        connected with SusiconStrolch, channel 13
        dhcp client start...
        ip:192.168.254.71,mask:255.255.255.0,gw:192.168.254.1
5602 : BME280: dew point 8.12C
5603 : BME280 : Address: 0x76
5603 : BME280 : Temperature: 20.25
5603 : BME280 : Humidity: 45.75
5603 : BME280 : Barometric Pressure: 1010.14
5611 : EVENT: BMx280#Temperature=20.25
5620 : EVENT: BMx280#Humidity=45.75
5626 : EVENT: BMx280#Pressure=1010.14
5884 : Current Time Zone:  DST time start: 2018-03-25 02:00:00 offset: 120 minSTD time start: 2018-10-28 03:00:00 offset: 60 min
5884 : EVENT: Time#Initialized
5893 : EVENT: Clock#Time=Sat,16:10
5898 : ACT  : taskvalueset 12,1,0
5903 : ACT  : taskvalueset 12,2,-60
5908 : ACT  : taskvalueset 12,3,28376
5914 : ACT  : taskvalueset 12,4,58217
5921 : WIFI : Connected! AP: SusiconStrolch (38:10:D5:B2:22:1E) Ch: 13 Duration: 3788 ms
5921 : EVENT: WiFi#ChangedAccesspoint
5927 : IP   : Static IP : 192.168.254.71 GW: 192.168.254.1 SN: 255.255.255.0 DNS: 192.168.254.1
        STUB: dhcp_stop
5932 : WIFI : Static IP: 192.168.254.71 (D1pro-01-11) GW: 192.168.254.1 SN: 255.255.255.0   duration: 1879 ms
5957 : Current Time Zone:  DST time start: 2036-03-30 02:00:00 offset: 120 minSTD time start: 2036-10-26 03:00:00 offset: 60 min
5957 : EVENT: Time#Set
5964 : EVENT: WiFi#Connected
5971 : Webserver: start
5979 : EVENT: Clock#Time=Thu,07:28
5989 : ACT  : taskvalueset 12,1,0
5999 : ACT  : taskvalueset 12,2,-59
6006 : ACT  : taskvalueset 12,3,26040
6014 : ACT  : taskvalueset 12,4,26896
6019 : EVENT: Clock#Time=Thu,07:28 Processing time:40 milliSeconds
        ping 1, timeout 0, total payload 32 bytes, 1064 ms
        ping 1, timeout 0, total payload 32 bytes, 1065 ms
7269 : UDP  : 5C:CF:7F:23:CB:63,192.168.254.97,7
8088 : UDP  : 5C:CF:7F:1C:0B:DD,192.168.254.94,4
8396 : UDP  : 5C:CF:7F:1B:E4:F7,192.168.254.92,2
11998 : Dummy: value 1: 0.00
12000 : Dummy: value 2: -59.00
12000 : Dummy: value 3: 26040.00
12001 : Dummy: value 4: 26896.00
12006 : EVENT: sysinfo#uptime=0.00
12015 : EVENT: sysinfo#uptime=0.00 Processing time:9 milliSeconds
12016 : EVENT: sysinfo#RSSI=-59.00
12025 : EVENT: sysinfo#RSSI=-59.00 Processing time:8 milliSeconds
12025 : EVENT: sysinfo#sysheap=26040.00
12034 : EVENT: sysinfo#sysheap=26040.00 Processing time:9 milliSeconds
12034 : EVENT: sysinfo#syssec_d=26896.00
12043 : EVENT: sysinfo#syssec_d=26896.00 Processing time:9 milliSeconds
12068 : HTTP : connecting to 192.168.254.27:8383
12275 : HTTP : closing connection
        pm open,type:2 0
14437 : UDP  : 5C:CF:7F:9E:CB:D4,192.168.254.99,9
18430 : UDP  : 5C:CF:7F:1B:E9:2F,192.168.254.91,1
18533 : UDP  : 5C:CF:7F:23:C5:5A,192.168.254.96,6
25189 : UDP  : 60:01:94:83:B1:70,192.168.254.80,10
30390 : WD   : Uptime 1 ConnectFailures 0 FreeMem 24816
30391 : UDP  : Send Sysinfo message
34917 : UDP  : 5C:CF:7F:9E:CC:3D,192.168.254.98,8
37273 : UDP  : 5C:CF:7F:23:CB:63,192.168.254.97,7
38092 : UDP  : 5C:CF:7F:1C:0B:DD,192.168.254.94,4
38501 : UDP  : 5C:CF:7F:1B:E4:F7,192.168.254.92,2
44441 : UDP  : 5C:CF:7F:9E:CB:D4,192.168.254.99,9
48436 : UDP  : 5C:CF:7F:1B:E9:2F,192.168.254.91,1
48641 : UDP  : 5C:CF:7F:23:C5:5A,192.168.254.96,6
49979 : EVENT: Clock#Time=Thu,07:29
49987 : ACT  : taskvalueset 12,1,1
49994 : ACT  : taskvalueset 12,2,-57
50001 : ACT  : taskvalueset 12,3,25544
50009 : ACT  : taskvalueset 12,4,26940
50014 : EVENT: Clock#Time=Thu,07:29 Processing time:35 milliSeconds
55193 : UDP  : 60:01:94:83:B1:70,192.168.254.80,10
60390 : WD   : Uptime 1 ConnectFailures 0 FreeMem 24816
60392 : UDP  : Send Sysinfo message
64922 : UDP  : 5C:CF:7F:9E:CC:3D,192.168.254.98,8
66569 : BME280: dew point 8.10C
66571 : BME280 : Address: 0x76
66572 : BME280 : Temperature: 20.28
66572 : BME280 : Humidity: 45.58
66573 : BME280 : Barometric Pressure: 1010.10
66576 : EVENT: BMx280#Temperature=20.28
66587 : EVENT: BMx280#Temperature=20.28 Processing time:11 milliSeconds
66588 : EVENT: BMx280#Humidity=45.58
66594 : EVENT: BMx280#Humidity=45.58 Processing time:6 milliSeconds
66595 : EVENT: BMx280#Pressure=1010.10
66602 : EVENT: BMx280#Pressure=1010.10 Processing time:7 milliSeconds
66627 : HTTP : connecting to 192.168.254.27:8383
66833 : HTTP : closing connection
67277 : UDP  : 5C:CF:7F:23:CB:63,192.168.254.97,7
68096 : UDP  : 5C:CF:7F:1C:0B:DD,192.168.254.94,4
68403 : UDP  : 5C:CF:7F:1B:E4:F7,192.168.254.92,2
72860 : Dummy: value 1: 1.00
72861 : Dummy: value 2: -57.00
72862 : Dummy: value 3: 25544.00
72863 : Dummy: value 4: 26940.00
72865 : EVENT: sysinfo#uptime=1.00
72872 : EVENT: sysinfo#uptime=1.00 Processing time:7 milliSeconds
72872 : EVENT: sysinfo#RSSI=-57.00
72878 : EVENT: sysinfo#RSSI=-57.00 Processing time:6 milliSeconds
72879 : EVENT: sysinfo#sysheap=25544.00
72887 : EVENT: sysinfo#sysheap=25544.00 Processing time:8 milliSeconds
72888 : EVENT: sysinfo#syssec_d=26940.00
72897 : EVENT: sysinfo#syssec_d=26940.00 Processing time:9 milliSeconds
72924 : HTTP : connecting to 192.168.254.27:8383
73129 : HTTP : closing connection
74446 : UDP  : 5C:CF:7F:9E:CB:D4,192.168.254.99,9
78747 : UDP  : 5C:CF:7F:23:C5:5A,192.168.254.96,6
78950 : UDP  : 5C:CF:7F:1B:E9:2F,192.168.254.91,1
85197 : UDP  : 60:01:94:83:B1:70,192.168.254.80,10
90390 : WD   : Uptime 2 ConnectFailures 0 FreeMem 24816
90391 : UDP  : Send Sysinfo message
94925 : UDP  : 5C:CF:7F:9E:CC:3D,192.168.254.98,8
97280 : UDP  : 5C:CF:7F:23:CB:63,192.168.254.97,7
98101 : UDP  : 5C:CF:7F:1C:0B:DD,192.168.254.94,4
98407 : UDP  : 5C:CF:7F:1B:E4:F7,192.168.254.92,2
104448 : UDP  : 5C:CF:7F:9E:CB:D4,192.168.254.99,9
108852 : UDP  : 5C:CF:7F:23:C5:5A,192.168.254.96,6
108954 : UDP  : 5C:CF:7F:1B:E9:2F,192.168.254.91,1
110842 : EVENT: Clock#Time=Thu,07:30
110849 : ACT  : taskvalueset 12,1,2
110856 : ACT  : taskvalueset 12,2,-54
110864 : ACT  : taskvalueset 12,3,24504
110871 : ACT  : taskvalueset 12,4,27000
110876 : EVENT: Clock#Time=Thu,07:30 Processing time:34 milliSeconds

As you can see DHCP is started even when set to static IP...

And here's my JSON

{"System":{
"Build":20102,
"Git Build":"62e6317",
"Local time":"2036-02-07 07:33:33",
"Unit":11,
"Name":"D1pro-01",
"Uptime":5,
"Load":1,
"Load LC":10747,
"Free RAM":25280
},
"WiFi":{
"Hostname":"D1pro-01-11",
"IP":"192.168.254.71",
"Subnet Mask":"255.255.255.0",
"Gateway IP":"192.168.254.1",
"MAC address":"5C:CF:7F:F1:BB:E1",
"DNS 1":"192.168.254.1",
"DNS 2":"0.0.0.0",
"SSID":"SusiconStrolch",
"BSSID":"38:10:D5:B2:22:1E",
"Channel":13,
"Connected msec":319382,
"Last Disconnect Reason":1,
"Last Disconnect Reason str":"(1) Unspecified",
"RSSI":-59
},
"Sensors":[
{
"TaskNumber":4,
"Type":"Environment - BMx280",
"TaskName":"BMx280",
"TaskValues": [
{"ValueNumber":1,
"Name":"Temperature",
"Value":20.31},
{"ValueNumber":2,
"Name":"Humidity",
"Value":44.70},
{"ValueNumber":3,
"Name":"Pressure",
"Value":1010.10}]
},
{
"TaskNumber":12,
"Type":"Generic - Dummy Device",
"TaskName":"sysinfo",
"TaskValues": [
{"ValueNumber":1,
"Name":"uptime",
"Value":5},
{"ValueNumber":2,
"Name":"RSSI",
"Value":-60},
{"ValueNumber":3,
"Name":"sysheap",
"Value":25464},
{"ValueNumber":4,
"Name":"syssec_d",
"Value":27180}]
}
]
}

99 : CRC : No program memory checksum found. Check output of crc2.py
130 : CRC : SecuritySettings CRC ...OK
211 : INIT : Free RAM:21016
211 : INIT : I2C
211 : INIT : SPI not enabled
1042 : INFO : Plugins: 71 [Normal] [Testing] (ESP82xx Core 2_4_1)
1042 : EVENT: System#Wake
1089 : WIFI : Set WiFi to STA
1091 : WIFI : Connecting SMC attempt #0
1103 : EVENT: System#Boot
1111 : ACT : Publish ESP-201/IP,0.0.0.0
1124 : ACT : timerSet,1,60
1152 : WD : Uptime 0 ConnectFailures 0 FreeMem 20160
1183 : DS : Temperature: 20.37 (28-ff-b8-ea-b4-16-3-ed)
1184 : EVENT: DS18b20#Temperature=20.37
4887 : WIFI : Connected! AP: SMC (78:8A:20:D1:9B:D9) Ch: 1 Duration: 3795 ms
4888 : EVENT: WiFi#ChangedAccesspoint
4910 : IP : Static IP : 192.168.0.201 GW: 192.168.0.3 SN: 255.255.255.0 DNS: 192.168.0.3
4911 : WIFI : Static IP: 192.168.0.201 (ESP-201-1) GW: 192.168.0.3 SN: 255.255.255.0 duration: 25 ms
5009 : Current Time Zone: DST time start: 2018-03-25 02:00:00 offset: 120 minSTD time start: 2018-10-28 03:00:00 offset: 60 min
5010 : EVENT: Time#Initialized
5027 : EVENT: WiFi#Connected
5044 : Webserver: start
5127 : MQTT : Intentional reconnect
5182 : MQTT : Connected to broker with client ID: ESPClient_5C:CF:7F:0B:68:52
5184 : Subscribed to: ESP-201/#
5185 : EVENT: MQTT#Connected
5846 : EVENT: Clock#Time=Sat,16:19

@susisstrolch That's strange about starting DHCP, since I wrote a patch for it recently.
Maybe something has changed in 2.4.1?

What did you do to get the extra debug info?

I've simply set the debug level to "Debug more"...

@susisstrolch You also seem to have other debug output generated by the core library.
I do not see that in my log on the serial port.

Edit:
Found it, Serial.setDebugOutput is called from the Setup(). So a simple reboot was enough :)

Here is an interaction of SYSHEAP and Uptime.

[chart: sysheap]

I am curious what that sysheap graph will look like after a day or two.

If it does not fail, we will see. :)

The charts all look different: this is the device with your last change and a static IP.

[chart: esp-201 sysheap]

What version was the previous chart?

The first chart is from your special version from last night, with DHCP.

Interesting.
I will add some sysheap logging to my nodes too.

I log with openHAB. It's also great with Grafana.

Should I flash your current commits? Then my chart is lost on the ESP-201.

I guess wifi stability tests are more important at the moment.

OK, I'll do that.

OK, it is online.

INIT : Booting version: (ESP82xx Core 2_4_1)
92 : INIT : Warm boot #2
94 : FS : Mounting...
118 : FS : Mount successful, used 76806 bytes of 957314
131 : CRC : No program memory checksum found. Check output of crc2.py
162 : CRC : SecuritySettings CRC ...OK
243 : INIT : Free RAM:20984
243 : INIT : I2C
243 : INIT : SPI not enabled
1073 : INFO : Plugins: 71 [Normal] [Testing] (ESP82xx Core 2_4_1)
1073 : EVENT: System#Wake
1120 : WIFI : Set WiFi to STA
1152 : WIFI : Connecting SMC attempt #0
1153 : IP : Static IP : 192.168.0.201 GW: 192.168.0.3 SN: 255.255.255.0 DNS: 192.168.0.3
1155 : WIFI : SDK station status differs from Arduino status. SDK-status: 1 Arduino status: 6
1172 : EVENT: System#Boot
1178 : ACT : NeoPixelAll,0,0,0,0
1189 : ACT : Publish ESP-201/IP,192.168.0.201
1201 : ACT : timerSet,1,60
1226 : WD : Uptime 0 ConnectFailures 0 FreeMem 20088
1257 : DS : Temperature: 20.25 (28-ff-b8-ea-b4-16-3-ed)
1259 : EVENT: DS18b20#Temperature=20.25
4952 : WIFI : Connected! AP: SMC (78:8A:20:D1:9B:D9) Ch: 1 Duration: 3798 ms
4953 : EVENT: WiFi#ChangedAccesspoint
4974 : IP : Static IP : 192.168.0.201 GW: 192.168.0.3 SN: 255.255.255.0 DNS: 192.168.0.3
4975 : WIFI : SDK station status differs from Arduino status. SDK-status: 5 Arduino status: 3
4980 : WIFI : Static IP: 192.168.0.201 (ESP-201-1) GW: 192.168.0.3 SN: 255.255.255.0 duration: 24 ms
5102 : Current Time Zone: DST time start: 2018-03-25 02:00:00 offset: 120 minSTD time start: 2018-10-28 03:00:00 offset: 60 min
5103 : EVENT: Time#Initialized
5123 : EVENT: WiFi#Connected
5140 : Webserver: start
5141 : WIFI : SDK station status differs from Arduino status. SDK-status: 5 Arduino status: 3
5223 : MQTT : Intentional reconnect
5261 : MQTT : Connected to broker with client ID: ESPClient_5C:CF:7F:0B:68:52
5264 : Subscribed to: ESP-201/#
5265 : EVENT: MQTT#Connected
5912 : EVENT: Clock#Time=Sun,00:14
31226 : WD : Uptime 1 ConnectFailures 0 FreeMem 15968

I also extended the information on the sysinfo page.
Added a reconnect counter and whether static IP or DHCP is used (and the SDK version).

Also included some forced reconnect check, which will reconnect when it is not connected for a while.
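The forced reconnect check could look roughly like the sketch below. It is an illustration under my own assumptions, not the code that was committed: `ReconnectWatchdog` and `poll` are invented names, and the caller is expected to pass in the current connection state and a millisecond counter such as the one `millis()` provides.

```cpp
#include <cassert>
#include <cstdint>

class ReconnectWatchdog {
public:
  explicit ReconnectWatchdog(uint32_t timeoutMs)
      : timeoutMs_(timeoutMs), disconnectedSinceMs_(0), tracking_(false) {}

  // Call periodically. Returns true when a reconnect should be forced.
  bool poll(bool connected, uint32_t nowMs) {
    if (connected) {
      tracking_ = false;
      return false;
    }
    if (!tracking_) {
      // First poll after losing the connection: start the timer.
      tracking_ = true;
      disconnectedSinceMs_ = nowMs;
      return false;
    }
    // Unsigned subtraction stays correct across a counter rollover.
    if (nowMs - disconnectedSinceMs_ >= timeoutMs_) {
      disconnectedSinceMs_ = nowMs;  // restart the timer for the next attempt
      return true;
    }
    return false;
  }

private:
  uint32_t timeoutMs_;
  uint32_t disconnectedSinceMs_;
  bool tracking_;
};
```

The main loop would call `poll()` on every pass and start a fresh connect attempt whenever it returns true. The timeout is a constructor parameter, so it can be tuned without touching the logic.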

Should I interrupt WIFI for 0.2 seconds?

Please do try to crash it :)

OK

244302 : ACT : Publish ESP-201/IP,192.168.0.201
244318 : ACT : Publish ESP-201/MAC,5C:CF:7F:0B:68:52
244331 : ACT : Publish ESP-201/Time,00:18:18
244343 : ACT : Publish ESP-201/Uptime,4
244355 : ACT : Publish ESP-201/RSSI,-62
244367 : ACT : Publish ESP-201/SSID,SMC
244379 : ACT : Publish ESP-201/BSSID,78:8A:20:D1:9B:D9
244391 : ACT : Publish ESP-201/CH,1
244406 : ACT : Publish ESP-201/SYSHEAP,12616
244422 : ACT : timerSet,1,60
255542 : EVENT: WiFi#Disconnected
255560 : WIFI : Disconnected! Reason: '(1) Unspecified' Connected for 4 m 10 s
255560 : WIFI : SDK station status differs from Arduino status. SDK-status: 5 Arduino status: 3
255571 : MQTT : Connection lost
255572 : EVENT: MQTT#Disconnected
255610 : MQTT : Failed to connect to broker
256110 : MQTT : Failed to connect to broker
256860 : MQTT : Failed to connect to broker
257860 : MQTT : Failed to connect to broker
259110 : MQTT : Failed to connect to broker
260610 : MQTT : Failed to connect to broker
262360 : MQTT : Failed to connect to broker
264360 : MQTT : Failed to connect to broker
266360 : MQTT : Failed to connect to broker
268360 : MQTT : Failed to connect to broker
270360 : MQTT : Failed to connect to broker
271226 : WD : Uptime 5 ConnectFailures 22 FreeMem 17224
271247 : MQTT : Failed to connect to broker
272360 : MQTT : Failed to connect to broker
274360 : MQTT : Failed to connect to broker
276360 : MQTT : Failed to connect to broker
278360 : MQTT : Failed to connect to broker
280360 : MQTT : Failed to connect to broker
282360 : MQTT : Failed to connect to broker
284360 : MQTT : Failed to connect to broker
286291 : EVENT: Clock#Time=Sun,00:19
286359 : MQTT : Failed to connect to broker
288360 : MQTT : Failed to connect to broker
290360 : MQTT : Failed to connect to broker

Just for this check, could you let it stay in this state for 4 minutes?
I will reduce the ticker interval for this check (is currently 240 sec)

OK, I do.

240 seconds is very long.

Yes, I know, I will change it.
I just took the idea from this issue: https://github.com/esp8266/Arduino/issues/4445
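
A minimal sketch of what such a forced reconnect check could look like. The struct and function names are hypothetical and not taken from the ESPEasy source; the interval is simply passed in, so the 240 sec mentioned above would be 240000 ms.

```cpp
#include <cstdint>

// Hypothetical state kept by the wifi code; names are illustrative only.
struct WifiWatch {
  bool     connected        = false;
  uint32_t disconnectedAtMs = 0;  // millis() value when the link was lost
};

// Returns true when the link has been down for at least intervalMs.
// Unsigned subtraction keeps the comparison valid across a millis() rollover.
bool shouldForceReconnect(const WifiWatch& w, uint32_t nowMs, uint32_t intervalMs) {
  if (w.connected) return false;
  return static_cast<uint32_t>(nowMs - w.disconnectedAtMs) >= intervalMs;
}
```

A periodic ticker would call shouldForceReconnect() and start a new connect attempt when it returns true. A shorter interval only changes the third argument.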

nothing....

432148 : MQTT : Failed to connect to broker
434148 : MQTT : Failed to connect to broker
436148 : MQTT : Failed to connect to broker
438147 : MQTT : Failed to connect to broker
440148 : MQTT : Failed to connect to broker
442147 : MQTT : Failed to connect to broker
444148 : MQTT : Failed to connect to broker
446148 : MQTT : Failed to connect to broker
448148 : MQTT : Failed to connect to broker
450147 : MQTT : Failed to connect to broker
451222 : WD : Uptime 8 ConnectFailures 446 FreeMem 17384
451243 : MQTT : Failed to connect to broker
452148 : MQTT : Failed to connect to broker
453907 : EVENT: Clock#Time=Sun,00:29
454148 : MQTT : Failed to connect to broker
456148 : MQTT : Failed to connect to broker
458148 : MQTT : Failed to connect to broker

I'm going to sleep, see you tomorrow.

Just found another issue, possibly UDP related.
On a freshly factory-reset unit, use the serial commands
wifikey
wifissid
save

Reboot, then go to the advanced settings and enable SSDP. Reboot again.
It then goes into a boot loop:

 ets Jan  8 2013,rst cause:2, boot mode:(3,7)

load 0x4010f000, len 1384, room 16 
tail 8
chksum 0x2d
csum 0x2d
v41a64707
~ld
⸮U87 : 


INIT : Booting version: (custom) (ESP82xx Core 41a64707)
88 : INIT : Warm boot #741
89 : FS   : Mounting...
114 : FS   : Mount successful, used 75802 bytes of 957314
120 : CRC  : No program memory checksum found. Check output of crc2.py
152 : CRC  : SecuritySettings CRC   ...OK 
258 : INIT : Free RAM:27288
258 : INIT : I2C
258 : INIT : SPI not enabled
272 : INFO : Plugins: 49 [Normal] (ESP82xx Core 41a64707)
273 : WIFI : Set WiFi to STA
304 : WIFI : Connecting MNET attempt #0
306 : WIFI  : SDK station status differs from Arduino status. SDK-status: 1 Arduino status: 6
311 : WD   : Uptime 0 ConnectFailures 0 FreeMem 26448

Exception (28):
epc1=0x40208931 epc2=0x00000000 epc3=0x00000000 excvaddr=0x00000004 depc=0x00000000

Decoding 14 results
0x40208931: UdpContext::next() at /home/john/Arduino/scetchbooks/ESPEasy/_P030_BMP280.ino line 390
0x40249cf8: HardwareSerial::write(unsigned char const*, unsigned int) at /home/john/ArduinoPortable/arduino-1.8.5_ESPgit/hardware/esp8266com/esp8266/cores/esp8266/HardwareSerial.cpp line 69
0x4024a055: Print::write(char const*) at /home/john/ArduinoPortable/arduino-1.8.5_ESPgit/hardware/esp8266com/esp8266/cores/esp8266/Print.cpp line 220
0x4024a2f1: Print::printNumber(unsigned long, unsigned char) at /home/john/ArduinoPortable/arduino-1.8.5_ESPgit/hardware/esp8266com/esp8266/cores/esp8266/Print.cpp line 220
0x4024ac4f: String::changeBuffer(unsigned int) at /home/john/ArduinoPortable/arduino-1.8.5_ESPgit/hardware/esp8266com/esp8266/cores/esp8266/WString.cpp line 714
0x40249cf8: HardwareSerial::write(unsigned char const*, unsigned int) at /home/john/ArduinoPortable/arduino-1.8.5_ESPgit/hardware/esp8266com/esp8266/cores/esp8266/HardwareSerial.cpp line 69
0x401071a2: millis at /home/john/ArduinoPortable/arduino-1.8.5_ESPgit/hardware/esp8266com/esp8266/cores/esp8266/core_esp8266_wiring.c line 183
0x4024a27c: Print::println(char const*) at /home/john/ArduinoPortable/arduino-1.8.5_ESPgit/hardware/esp8266com/esp8266/cores/esp8266/Print.cpp line 220
0x40213ece: LogStruct::add(char const*) at /home/john/Arduino/scetchbooks/ESPEasy/_P030_BMP280.ino line 390
:  (inlined by) addLog(unsigned char, char const*) at /home/john/Arduino/scetchbooks/ESPEasy/Misc.ino line 1395
0x4023545d: runEach30Seconds() at /home/john/Arduino/scetchbooks/ESPEasy/_P030_BMP280.ino line 390
0x4020c678: timeOutReached(unsigned long) at /home/john/Arduino/scetchbooks/ESPEasy/_P030_BMP280.ino line 390
0x4023eac5: loop at /home/john/Arduino/scetchbooks/ESPEasy/ESPEasy.ino line 436
0x4024bcc8: loop_wrapper at /home/john/ArduinoPortable/arduino-1.8.5_ESPgit/hardware/esp8266com/esp8266/cores/esp8266/core_esp8266_main.cpp line 125
0x40100739: cont_wrapper at /home/john/ArduinoPortable/arduino-1.8.5_ESPgit/hardware/esp8266com/esp8266/cores/esp8266/cont.S line 81

Good morning all.
For me, the reconnect does not work yet.

INIT : Booting version: (ESP82xx Core 2_4_1)
92 : INIT : Warm boot #8
94 : FS : Mounting...
118 : FS : Mount successful, used 76806 bytes of 957314
131 : CRC : No program memory checksum found. Check output of crc2.py
162 : CRC : SecuritySettings CRC ...OK
243 : INIT : Free RAM:20968
243 : INIT : I2C
243 : INIT : SPI not enabled
1073 : INFO : Plugins: 71 [Normal] [Testing] (ESP82xx Core 2_4_1)
1074 : EVENT: System#Wake
1121 : WIFI : Set WiFi to STA
1153 : WIFI : Connecting SMC attempt #0
1154 : IP : Static IP : 192.168.0.201 GW: 192.168.0.3 SN: 255.255.255.0 DNS: 192.168.0.3
1155 : WIFI : SDK station status differs from Arduino status. SDK-status: 1 Arduino status: 6
1173 : EVENT: System#Boot
1179 : ACT : NeoPixelAll,0,0,0,0
1190 : ACT : Publish ESP-201/IP,192.168.0.201
1201 : ACT : timerSet,1,60
1226 : WD : Uptime 0 ConnectFailures 0 FreeMem 20072
1258 : DS : Temperature: 19.75 (28-ff-b8-ea-b4-16-3-ed)
1259 : EVENT: DS18b20#Temperature=19.75
4925 : WIFI : Connected! AP: SMC (78:8A:20:D1:9B:D9) Ch: 1 Duration: 3770 ms
4926 : EVENT: WiFi#ChangedAccesspoint
4947 : IP : Static IP : 192.168.0.201 GW: 192.168.0.3 SN: 255.255.255.0 DNS: 192.168.0.3
4947 : WIFI : SDK station status differs from Arduino status. SDK-status: 5 Arduino status: 3
4953 : WIFI : Static IP: 192.168.0.201 (ESP-201-1) GW: 192.168.0.3 SN: 255.255.255.0 duration: 23 ms
5066 : Current Time Zone: DST time start: 2018-03-25 02:00:00 offset: 120 minSTD time start: 2018-10-28 03:00:00 offset: 60 min
5066 : EVENT: Time#Initialized
5087 : EVENT: WiFi#Connected
5104 : Webserver: start
5104 : WIFI : SDK station status differs from Arduino status. SDK-status: 5 Arduino status: 3
5186 : MQTT : Intentional reconnect
5222 : MQTT : Connected to broker with client ID: ESPClient_5C:CF:7F:0B:68:52
5225 : Subscribed to: ESP-201/#
5226 : EVENT: MQTT#Connected
5906 : EVENT: Clock#Time=Sun,10:01
31227 : WD : Uptime 1 ConnectFailures 0 FreeMem 16336
47905 : EVENT: Clock#Time=Sun,10:02
61229 : WD : Uptime 1 ConnectFailures 0 FreeMem 16336
61926 : DS : Temperature: 19.75 (28-ff-b8-ea-b4-16-3-ed)
61928 : EVENT: DS18b20#Temperature=19.75
61972 : EVENT: Rules#Timer=1
61983 : ACT : Publish ESP-201/IP,192.168.0.201
61999 : ACT : Publish ESP-201/MAC,5C:CF:7F:0B:68:52
62015 : ACT : Publish ESP-201/Time,10:02:14
62030 : ACT : Publish ESP-201/Uptime,1
62044 : ACT : Publish ESP-201/RSSI,-62
62060 : ACT : Publish ESP-201/SSID,SMC
62076 : ACT : Publish ESP-201/BSSID,78:8A:20:D1:9B:D9
62091 : ACT : Publish ESP-201/CH,1
62107 : ACT : Publish ESP-201/SYSHEAP,13536
62120 : ACT : timerSet,1,60
67292 : EVENT: WiFi#Disconnected
67310 : WIFI : Disconnected! Reason: '(1) Unspecified' Connected for 1 m 2 s
67310 : WIFI : SDK station status differs from Arduino status. SDK-status: 5 Arduino status: 3
67316 : MQTT : Connection lost
67317 : EVENT: MQTT#Disconnected
67357 : MQTT : Failed to connect to broker
67856 : MQTT : Failed to connect to broker
68606 : MQTT : Failed to connect to broker
69607 : MQTT : Failed to connect to broker
70857 : MQTT : Failed to connect to broker
72357 : MQTT : Failed to connect to broker
74107 : MQTT : Failed to connect to broker
76106 : MQTT : Failed to connect to broker
78107 : MQTT : Failed to connect to broker
80107 : MQTT : Failed to connect to broker
82106 : MQTT : Failed to connect to broker
84106 : MQTT : Failed to connect to broker
86107 : MQTT : Failed to connect to broker
88106 : MQTT : Failed to connect to broker
90107 : MQTT : Failed to connect to broker
91228 : WD : Uptime 2 ConnectFailures 30 FreeMem 17368
91250 : MQTT : Failed to connect to broker
92107 : MQTT : Failed to connect to broker
94107 : MQTT : Failed to connect to broker
96106 : MQTT : Failed to connect to broker
98107 : MQTT : Failed to connect to broker
100107 : MQTT : Failed to connect to broker
102107 : MQTT : Failed to connect to broker
104106 : MQTT : Failed to connect to broker
106107 : MQTT : Failed to connect to broker
107905 : EVENT: Clock#Time=Sun,10:03
108107 : MQTT : Failed to connect to broker
110107 : MQTT : Failed to connect to broker
112107 : MQTT : Failed to connect to broker
114107 : MQTT : Failed to connect to broker
116107 : MQTT : Failed to connect to broker
118107 : MQTT : Failed to connect to broker
120107 : MQTT : Failed to connect to broker
121228 : WD : Uptime 2 ConnectFailures 62 FreeMem 17368
121249 : MQTT : Failed to connect to broker
121926 : DS : Temperature: 19.75 (28-ff-b8-ea-b4-16-3-ed)
121927 : EVENT: DS18b20#Temperature=19.75
122107 : MQTT : Failed to connect to broker
122905 : EVENT: Rules#Timer=1
122915 : ACT : Publish ESP-201/IP,0.0.0.0
122927 : ACT : Publish ESP-201/MAC,5C:CF:7F:0B:68:52
122939 : ACT : Publish ESP-201/Time,10:03:15
122950 : ACT : Publish ESP-201/Uptime,2
122961 : ACT : Publish ESP-201/RSSI,0
122972 : ACT : Publish ESP-201/SSID,--
122983 : ACT : Publish ESP-201/BSSID,00:00:00:00:00:00
122994 : ACT : Publish ESP-201/CH,0
123005 : ACT : Publish ESP-201/SYSHEAP,16992
123015 : ACT : timerSet,1,60
124107 : MQTT : Failed to connect to broker
126106 : MQTT : Failed to connect to broker
128107 : MQTT : Failed to connect to broker
130107 : MQTT : Failed to connect to broker
132107 : MQTT : Failed to connect to broker
134107 : MQTT : Failed to connect to broker
136107 : MQTT : Failed to connect to broker
138107 : MQTT : Failed to connect to broker
140107 : MQTT : Failed to connect to broker
142107 : MQTT : Failed to connect to broker
144107 : MQTT : Failed to connect to broker
146107 : MQTT : Failed to connect to broker
148107 : MQTT : Failed to connect to broker
150107 : MQTT : Failed to connect to broker
151228 : WD : Uptime 3 ConnectFailures 94 FreeMem 17368
151249 : MQTT : Failed to connect to broker
152107 : MQTT : Failed to connect to broker
154107 : MQTT : Failed to connect to broker
156107 : MQTT : Failed to connect to broker
158107 : MQTT : Failed to connect to broker
160107 : MQTT : Failed to connect to broker
162107 : MQTT : Failed to connect to broker
164107 : MQTT : Failed to connect to broker
166107 : MQTT : Failed to connect to broker
167905 : EVENT: Clock#Time=Sun,10:04
168107 : MQTT : Failed to connect to broker
170107 : MQTT : Failed to connect to broker
172107 : MQTT : Failed to connect to broker
174106 : MQTT : Failed to connect to broker
176106 : MQTT : Failed to connect to broker
178107 : MQTT : Failed to connect to broker
180107 : MQTT : Failed to connect to broker
181228 : WD : Uptime 3 ConnectFailures 126 FreeMem 17368
181250 : MQTT : Failed to connect to broker
181926 : DS : Temperature: 19.75 (28-ff-b8-ea-b4-16-3-ed)
181927 : EVENT: DS18b20#Temperature=19.75
182107 : MQTT : Failed to connect to broker
183905 : EVENT: Rules#Timer=1
183915 : ACT : Publish ESP-201/IP,0.0.0.0
183927 : ACT : Publish ESP-201/MAC,5C:CF:7F:0B:68:52
183938 : ACT : Publish ESP-201/Time,10:04:16
183950 : ACT : Publish ESP-201/Uptime,3
183961 : ACT : Publish ESP-201/RSSI,0
183972 : ACT : Publish ESP-201/SSID,--
183983 : ACT : Publish ESP-201/BSSID,00:00:00:00:00:00
183994 : ACT : Publish ESP-201/CH,0
184005 : ACT : Publish ESP-201/SYSHEAP,16992
184015 : ACT : timerSet,1,60
184107 : MQTT : Failed to connect to broker
186106 : MQTT : Failed to connect to broker
188107 : MQTT : Failed to connect to broker
190106 : MQTT : Failed to connect to broker
192107 : MQTT : Failed to connect to broker
194106 : MQTT : Failed to connect to broker
196106 : MQTT : Failed to connect to broker
198106 : MQTT : Failed to connect to broker
200106 : MQTT : Failed to connect to broker
202106 : MQTT : Failed to connect to broker
204106 : MQTT : Failed to connect to broker
206106 : MQTT : Failed to connect to broker
208106 : MQTT : Failed to connect to broker
210106 : MQTT : Failed to connect to broker
211228 : WD : Uptime 4 ConnectFailures 158 FreeMem 17368
211249 : MQTT : Failed to connect to broker
212106 : MQTT : Failed to connect to broker
214106 : MQTT : Failed to connect to broker
216106 : MQTT : Failed to connect to broker
218106 : MQTT : Failed to connect to broker
220106 : MQTT : Failed to connect to broker
222106 : MQTT : Failed to connect to broker
224106 : MQTT : Failed to connect to broker
226106 : MQTT : Failed to connect to broker
227905 : EVENT: Clock#Time=Sun,10:05
228107 : MQTT : Failed to connect to broker
230107 : MQTT : Failed to connect to broker
232107 : MQTT : Failed to connect to broker
234107 : MQTT : Failed to connect to broker
236106 : MQTT : Failed to connect to broker
238106 : MQTT : Failed to connect to broker
240106 : MQTT : Failed to connect to broker
241228 : WD : Uptime 4 ConnectFailures 190 FreeMem 17368
241249 : MQTT : Failed to connect to broker
241925 : DS : Temperature: 19.75 (28-ff-b8-ea-b4-16-3-ed)
241927 : EVENT: DS18b20#Temperature=19.75
242107 : MQTT : Failed to connect to broker
244106 : MQTT : Failed to connect to broker
244908 : EVENT: Rules#Timer=1
244918 : ACT : Publish ESP-201/IP,0.0.0.0
244930 : ACT : Publish ESP-201/MAC,5C:CF:7F:0B:68:52
244942 : ACT : Publish ESP-201/Time,10:05:17
244953 : ACT : Publish ESP-201/Uptime,4
244964 : ACT : Publish ESP-201/RSSI,0
244975 : ACT : Publish ESP-201/SSID,--
244986 : ACT : Publish ESP-201/BSSID,00:00:00:00:00:00
244997 : ACT : Publish ESP-201/CH,0
245008 : ACT : Publish ESP-201/SYSHEAP,16992
245018 : ACT : timerSet,1,60
246107 : MQTT : Failed to connect to broker
248106 : MQTT : Failed to connect to broker
250106 : MQTT : Failed to connect to broker
252106 : MQTT : Failed to connect to broker
254106 : MQTT : Failed to connect to broker
256106 : MQTT : Failed to connect to broker
258106 : MQTT : Failed to connect to broker
260106 : MQTT : Failed to connect to broker
262106 : MQTT : Failed to connect to broker
264106 : MQTT : Failed to connect to broker
266107 : MQTT : Failed to connect to broker

ESP32: since the latest changes (28-04-2018), MQTT and NTP have stopped working
(static IP)

@flexiti
MQTT does lose the connection every second if you do not subscribe and just try to publish.

52898 : MQTT : Connection lost
52925 : MQTT : Connected to broker with client ID: ESPClient_2C:3A:E8:06:5E:B2
52926 : Subscribed to: 
53176 : MQTT : Connection lost
53203 : MQTT : Connected to broker with client ID: ESPClient_2C:3A:E8:06:5E:B2
53204 : Subscribed to: 
53498 : MQTT : Connection lost
53527 : MQTT : Connected to broker with client ID: ESPClient_2C:3A:E8:06:5E:B2
53528 : Subscribed to: 
53778 : MQTT : Connection lost
53806 : MQTT : Connected to broker with client ID: ESPClient_2C:3A:E8:06:5E:B2
53807 : Subscribed to: 
54058 : MQTT : Connection lost
54086 : MQTT : Connected to broker with client ID: ESPClient_2C:3A:E8:06:5E:B2
54087 : Subscribed to: 
54337 : MQTT : Connection lost
54363 : MQTT : Connected to broker with client ID: ESPClient_2C:3A:E8:06:5E:B2
54364 : Subscribed to: 
54615 : MQTT : Connection lost
54642 : MQTT : Connected to broker with client ID: ESPClient_2C:3A:E8:06:5E:B2
54643 : Subscribed to: 
54894 : MQTT : Connection lost
54921 : MQTT : Connected to broker with client ID: ESPClient_2C:3A:E8:06:5E:B2
54922 : Subscribed to: 
55172 : MQTT : Connection lost
55199 : MQTT : Connected to broker with client ID: ESPClient_2C:3A:E8:06:5E:B2
55200 : Subscribed to: 
55630 : FILE : Saved config.dat
55692 : FILE : Saved config.dat
55861 : MQTT : Connection lost
55889 : MQTT : Connected to broker with client ID: ESPClient_2C:3A:E8:06:5E:B2

Try to enter something in the subscribe line even if you don't use it. That worked for me.

55630 : FILE : Saved config.dat
55692 : FILE : Saved config.dat
55861 : MQTT : Connection lost
55889 : MQTT : Connected to broker with client ID: ESPClient_2C:3A:E8:06:5E:B2
55890 : Subscribed to: SMA
60404 : WD   : Uptime 1 ConnectFailures 354 FreeMem 19224
90404 : WD   : Uptime 2 ConnectFailures 353 FreeMem 19224
120404 : WD   : Uptime 2 ConnectFailures 352 FreeMem 19224
150404 : WD   : Uptime 3 ConnectFailures 351 FreeMem 19224
180404 : WD   : Uptime 3 ConnectFailures 350 FreeMem 19224
210404 : WD   : Uptime 4 ConnectFailures 349 FreeMem 19224

@TD-er
This thread is getting a bit overloaded and it seems there are multiple wifi / stability issues. I suggest moving each issue to a separate GitHub issue and tagging it with a keyword, e.g.
[WIFICORE] ssdp
[WIFICORE] MQTT subscription needed
[WIFICORE] Unit not found - port setting vanished
[WIFICORE] ticker interval

Just to report, just upgraded from mega-20180428 to mega-20180429.

In mega-20180428 the wifi was stable overall, but wouldn't reconnect after a disconnect.
In mega-20180429 the wifi is very unstable and I cannot ping it without getting a lot of read timeouts.

I'm using a sonoff basic, self-compiling the releases with #define PLUGIN_SET_SONOFF_BASIC to make the bin small enough for OTA.
Not sure if it helps, going back to mega-20180428 in the meantime.

@louis-lau Could you re-test with the latest commit+merge I just made?

This is on the latest commit
screenshot

@louis-lau And the log from the node itself?
If it tries to reconnect to a MQTT broker, the response is slower and there is currently a reconnect handler in the background to reconnect when there are lots of MQTT reconnect failures.
Oh and sometimes after a flash attempt it is best to do a reset of the node (not reset settings, just like pressing the reset button). Sometimes there is some left over config still present on the node after flashing, which may lead to strange results.

@louis-lau A Sonoff Basic?
Me too. I love 0429; 0428 was terrible.
Can you tell me more about your 'overall' settings, please?
And since it is self-compiled, is it compiled with core 2.4.1?
I just tested 0429 freshly flashed on a blank node and it runs perfectly.
My previous try with 0429 was an update of a node preconfigured with static settings. That was perfect too.
My boards are dated 5-5-2017.

@Oxyandy
2.4.2 core????

This is all I got that seems relevant:

04-29-2018  15:43:29    Kernel.Notice   192.168.2.22    sonoff_lavalamp EspEasy: EVENT: MQTT#Connected
04-29-2018  15:43:29    Kernel.Notice   192.168.2.22    sonoff_lavalamp EspEasy: Subscribed to: /sonoff_lavalamp/#
04-29-2018  15:43:29    Kernel.Notice   192.168.2.22    sonoff_lavalamp EspEasy: MQTT : Connected to broker with client ID: ESPClient_5C:CF:7F:71:68:FB
04-29-2018  15:43:29    Kernel.Debug    192.168.2.22    sonoff_lavalamp EspEasy: EVENT: Time#Set Processing time:46 milliSeconds
04-29-2018  15:43:29    Kernel.Notice   192.168.2.22    sonoff_lavalamp EspEasy: EVENT: Time#Set
04-29-2018  15:43:29    Kernel.Debug    192.168.2.22    sonoff_lavalamp EspEasy: NTP  : NTP replied: 20 mSec
04-29-2018  15:43:29    Kernel.Debug    192.168.2.22    sonoff_lavalamp EspEasy: NTP  : NTP host time.google.com (216.239.35.8) queried
04-29-2018  15:43:29    Kernel.Debug    192.168.2.22    sonoff_lavalamp EspEasy: WIFI  : Arduino wifi status: WL_CONNECTED ESPeasy internal wifi status: ESPEASY_WIFI_SERVICES_INITIALIZED
04-29-2018  15:43:06    Kernel.Notice   192.168.2.22    sonoff_lavalamp EspEasy: SW   : GPIO 12 Set to 0
04-29-2018  15:43:06    Kernel.Debug    192.168.2.22    sonoff_lavalamp EspEasy: EVENT: MQTT#Connected Processing time:1132 milliSeconds
04-29-2018  15:43:06    Kernel.Debug    192.168.2.22    sonoff_lavalamp EspEasy: else = false
04-29-2018  15:43:06    Kernel.Notice   192.168.2.22    sonoff_lavalamp EspEasy: SW   : GPIO 13 Set PWM to 1023
04-29-2018  15:43:05    Kernel.Debug    192.168.2.22    sonoff_lavalamp EspEasy: [if 0=0]=true
04-29-2018  15:43:05    Kernel.Notice   192.168.2.22    sonoff_lavalamp EspEasy: ACT  : timerSet,4,0
04-29-2018  15:43:05    Kernel.Notice   192.168.2.22    sonoff_lavalamp EspEasy: ACT  : timerSet,3,0
04-29-2018  15:43:05    Kernel.Notice   192.168.2.22    sonoff_lavalamp EspEasy: ACT  : timerSet,2,0
04-29-2018  15:43:05    Kernel.Notice   192.168.2.22    sonoff_lavalamp EspEasy: ACT  : timerSet,1,0
04-29-2018  15:43:05    Kernel.Notice   192.168.2.22    sonoff_lavalamp EspEasy: EVENT: MQTT#Connected
04-29-2018  15:43:05    Kernel.Notice   192.168.2.22    sonoff_lavalamp EspEasy: Subscribed to: /sonoff_lavalamp/#
04-29-2018  15:43:05    Kernel.Notice   192.168.2.22    sonoff_lavalamp EspEasy: MQTT : Connected to broker with client ID: ESPClient_5C:CF:7F:71:68:FB
04-29-2018  15:43:05    Kernel.Debug    192.168.2.22    sonoff_lavalamp EspEasy: EVENT: Time#Set Processing time:47 milliSeconds
04-29-2018  15:43:05    Kernel.Notice   192.168.2.22    sonoff_lavalamp EspEasy: EVENT: Time#Set
04-29-2018  15:43:05    Kernel.Debug    192.168.2.22    sonoff_lavalamp EspEasy: NTP  : NTP replied: 20 mSec
04-29-2018  15:43:05    Kernel.Debug    192.168.2.22    sonoff_lavalamp EspEasy: NTP  : NTP host time.google.com (216.239.35.8) queried
04-29-2018  15:43:05    Kernel.Debug    192.168.2.22    sonoff_lavalamp EspEasy: WIFI  : Arduino wifi status: WL_CONNECTED ESPeasy internal wifi status: ESPEASY_WIFI_SERVICES_INITIALIZED
04-29-2018  15:42:53    Kernel.Notice   192.168.2.22    sonoff_lavalamp EspEasy: WD   : Uptime 3 ConnectFailures 2 FreeMem 19456
04-29-2018  15:42:53    Kernel.Debug    192.168.2.22    sonoff_lavalamp EspEasy: 2: lowest: 12320  parseTemplate3-> 17112 ruleMatch-> 17088 ruleMatch2-> 17040 parseTemplate-> 17176 parseTemplate3-> 17112 ruleMatch-> 17072 ruleMatch2-> 17008 rulesProcessingFile2-> 17160 sendContentBlocking-> 17184 sendConten
04-29-2018  15:42:47    Kernel.Debug    192.168.2.22    sonoff_lavalamp EspEasy: EVENT: MQTT#Connected Processing time:1131 milliSeconds
04-29-2018  15:42:47    Kernel.Debug    192.168.2.22    sonoff_lavalamp EspEasy: else = false
04-29-2018  15:42:47    Kernel.Notice   192.168.2.22    sonoff_lavalamp EspEasy: SW   : GPIO 13 Set PWM to 1023
04-29-2018  15:42:46    Kernel.Debug    192.168.2.22    sonoff_lavalamp EspEasy: [if 0=0]=true
04-29-2018  15:42:46    Kernel.Notice   192.168.2.22    sonoff_lavalamp EspEasy: ACT  : timerSet,4,0

@Oxyandy I'll try a factory reset. I haven't made any changes before compiling, except the sonoff basic plugin flag and the network settings in Custom.h.

You're right, works fine with the factory settings 😄 .

I'll try restoring, see if it breaks again.

It just looks like a reboot?

In the web log you will get a bit more lines, since that one has a buffer of 15 lines.
Also on the sysinfo page you can get more info.

The recorded logs on the syslog server may not be complete, since that one does not receive data when the wifi is disconnected.

Alright, it started happening again when I enabled the mqtt controller.
And stopped when I disabled it.

I'll try to get a better log, it's a bit hard when you can't connect to it half the time.
What log level? debug?

MQTT connects are also shown at info level.
Info does show error + info.
That way, you're not filling up the log buffer of the weblog too fast to display.

Okay, when I stop the mqtt server it also works normally. The connection only times out when the server is running.

This is all I was able to get from the log:
screenshot

What kind of MQTT controller is this? Domoticz? OpenHAB?

Or are you trying to do importMQTT?

There is also a copy button on the web page showing the log.
That page refreshes every N seconds, which allows you to paste the text into a text editor in between refreshes.
It isn't perfect, I know, but it's better than nothing :)

Edit:
Please also check whether the rules are complete.
There is still an issue with saving the rules: what gets saved may not be the complete rule set that was entered.

Using the latest GitHub commit (07bfec42347d13ad49dda907654a36bf747df3bc), wifi connects without issues on all my nodes. It now also reconnects properly after an AP reboot. Using core 2.4.1.

Hehe, once it's actually loaded I have very little time, so I just took a screenshot. I'm using the openhab mqtt controller, the server is mosquitto.

EDIT: not using rules right now. Just to make sure that's not the problem.

Same here. Try to subscribe to any topic. That fixed my issue.

@s0170071 Where should this subscribe be done?
If that's in some kind of default setting, then maybe we should add that.

Is the time between MQTT reconnects about 15 - 16 seconds? Then it could be that Mosquitto is kicking you out, and then I know where to patch this.

In the openhab controller settings. There is the subscribe field.

Using the latest GitHub commit (07bfec4), MQTT also works without issues here using the openhab controller and connecting to Mosquitto. Using core 2.4.1.

@td-er see a few posts above.

I'm already subscribed to /sonoff_lavalamp/# as you can see in the logs.

I did notice that when I subscribe to # (so, all topics), the wifi is stable, but that makes mqtt unusable ;)

Mosquitto shouldn't be kicking the user out, the user I'm using has permissions to all topics.

This is the mosquitto log:

1525014154: New client connected from 192.168.2.22 as ESPClient_5C:CF:7F:71:68:FB (c1, k15, u'my_mqtt_username').
1525014168: New connection from 192.168.2.22 on port 1883.
1525014168: Client ESPClient_5C:CF:7F:71:68:FB already connected, closing old connection.
1525014168: Client ESPClient_5C:CF:7F:71:68:FB disconnected.
1525014168: New client connected from 192.168.2.22 as ESPClient_5C:CF:7F:71:68:FB (c1, k15, u'my_mqtt_username').
1525014196: New connection from 192.168.2.22 on port 1883.
1525014196: Client ESPClient_5C:CF:7F:71:68:FB already connected, closing old connection.
1525014196: Client ESPClient_5C:CF:7F:71:68:FB disconnected.
1525014196: New client connected from 192.168.2.22 as ESPClient_5C:CF:7F:71:68:FB (c1, k15, u'my_mqtt_username').
1525014214: New connection from 192.168.2.22 on port 1883.
1525014214: Client ESPClient_5C:CF:7F:71:68:FB already connected, closing old connection.
1525014214: Client ESPClient_5C:CF:7F:71:68:FB disconnected.
1525014214: New client connected from 192.168.2.22 as ESPClient_5C:CF:7F:71:68:FB (c1, k15, u'my_mqtt_username').
1525014226: New connection from 192.168.2.22 on port 1883.
1525014226: Client ESPClient_5C:CF:7F:71:68:FB already connected, closing old connection.
1525014226: Client ESPClient_5C:CF:7F:71:68:FB disconnected.
1525014226: New client connected from 192.168.2.22 as ESPClient_5C:CF:7F:71:68:FB (c1, k15, u'my_mqtt_username').
1525014255: New connection from 192.168.2.22 on port 1883.
1525014255: Client ESPClient_5C:CF:7F:71:68:FB already connected, closing old connection.
1525014255: Client ESPClient_5C:CF:7F:71:68:FB disconnected.
1525014255: New client connected from 192.168.2.22 as ESPClient_5C:CF:7F:71:68:FB (c1, k15, u'my_mqtt_username').
1525014270: New connection from 192.168.2.22 on port 1883.

It seems to be reconnecting, and mosquitto is closing the old connection.

I've been looking into the source of PubSubClient.
It appears there has to be incoming and outgoing activity within the timeout period.
If one of them fails, PubSubClient will disconnect and thus ESPeasy will reconnect.

I will try to add some kind of automatic ping. There is already some, but it may be the check is done before the ping has returned.
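
To make the behaviour described above easier to follow, here is a simplified model of such a keep-alive check. It is a sketch based on the description in this comment, not a copy of the PubSubClient source. All type and function names are hypothetical.

```cpp
#include <cstdint>

enum class KeepAliveAction { None, SendPing, Disconnect };

// Hypothetical bookkeeping for one MQTT connection.
struct KeepAliveState {
  uint32_t lastInMs        = 0;      // last time anything was received
  uint32_t lastOutMs       = 0;      // last time anything was sent
  bool     pingOutstanding = false;  // a ping was sent and not yet answered
};

// Called periodically from the main loop.
// If either direction has been idle longer than keepAliveMs:
//   - no ping pending  -> send one and restart both idle timers
//   - ping still pending -> treat the connection as dead
KeepAliveAction checkKeepAlive(KeepAliveState& s, uint32_t nowMs, uint32_t keepAliveMs) {
  const bool inIdle  = static_cast<uint32_t>(nowMs - s.lastInMs)  > keepAliveMs;
  const bool outIdle = static_cast<uint32_t>(nowMs - s.lastOutMs) > keepAliveMs;
  if (!inIdle && !outIdle) return KeepAliveAction::None;
  if (s.pingOutstanding)   return KeepAliveAction::Disconnect;
  s.pingOutstanding = true;
  s.lastInMs  = nowMs;
  s.lastOutMs = nowMs;
  return KeepAliveAction::SendPing;
}

// Called when the ping response arrives.
void onPingResponse(KeepAliveState& s, uint32_t nowMs) {
  s.pingOutstanding = false;
  s.lastInMs = nowMs;
}
```

In this model a node that only publishes still resets lastOutMs on every publish, but lastInMs only moves when something arrives, so the incoming side depends entirely on the ping response coming back in time.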

@louis-lau Can you somehow find the keep alive time setting used in your Mosquitto?

It looks like the MQTT timeout setting we use is 15 seconds.
It is defined as: #define MQTT_KEEPALIVE 15

See also https://github.com/knolleary/pubsubclient/issues/239

Wow, you did it @TD-er !!!!!

467279 : EVENT: Clock#Time=Sun,17:24
467588 : WD : Uptime 8 ConnectFailures 0 FreeMem 16304
481935 : MQTT : Connection lost
481935 : EVENT: MQTT#Disconnected
481953 : EVENT: WiFi#Disconnected
481969 : WIFI : Disconnected! Reason: '(1) Unspecified' Connected for 7 m 40 s
482278 : WIFI : Connecting SMC attempt #0
482278 : IP : Static IP : 192.168.0.201 GW: 192.168.0.3 SN: 255.255.255.0 DNS: 192.168.0.3
483304 : EVENT: WiFi#Disconnected
483322 : WIFI : Disconnected! Reason: '(202) Auth fail' Connected for 1018 ms
484291 : WIFI : Connecting SMC attempt #1
484292 : IP : Static IP : 192.168.0.201 GW: 192.168.0.3 SN: 255.255.255.0 DNS: 192.168.0.3
488073 : WIFI : Connected! AP: SMC (78:8A:20:D1:9B:D9) Ch: 1 Duration: 3780 ms
488074 : IP : Static IP : 192.168.0.201 GW: 192.168.0.3 SN: 255.255.255.0 DNS: 192.168.0.3
488078 : WIFI : Static IP: 192.168.0.201 (ESP-201-1) GW: 192.168.0.3 SN: 255.255.255.0 duration: 6 ms
488099 : EVENT: WiFi#Connected
488245 : MQTT : Connected to broker with client ID: ESPClient_5C:CF:7F:0B:68:52
488247 : Subscribed to: ESP-201/#
488248 : EVENT: MQTT#Connected
489111 : EVENT: Time#Set

@TD-er Are you sure about the pubsubclient having issues? My unit receives nothing, only sends analog value every 30 seconds. No MQTT connection issues. If there was an issue with the library, wouldn't we have a lot more users complaining?

Maybe it's a race condition...

The MQTT broker has to consider a client as disconnected at 1.5x the timeout.
Pubsubclient does send a ping when the last activity was more than 15 seconds ago.
So if the default timeout of Mosquitto (10s) is being used, then the timing gets quite critical.

In my setup, I see a lot of MQTT traffic from all the nodes here chatting on the same (domoticz) channel.
So the timeout is never an issue here.
But if there is only one node, there is a lot less traffic, and with the default ESPeasy timeout set at exactly 1.5x the default Mosquitto timeout, it may be a bit critical.
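
The 1.5x rule can be put into numbers with a small sketch. It assumes the keep-alive value is the one the client announces when connecting. The function names are made up for illustration.

```cpp
#include <cstdint>

// Point at which a broker following the 1.5x rule gives up on a silent client.
constexpr uint32_t brokerDeadlineMs(uint16_t keepAliveSec) {
  return keepAliveSec * 1500u;
}

// Time left between the moment the client sends its ping (after one full
// keep-alive period of silence) and the broker deadline.
constexpr uint32_t pingMarginMs(uint16_t keepAliveSec) {
  return brokerDeadlineMs(keepAliveSec) - keepAliveSec * 1000u;
}
```

With a keep-alive of 15 seconds this gives a broker deadline of 22.5 seconds and a margin of 7.5 seconds for the ping to arrive. A shorter keep-alive shrinks both numbers proportionally.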

Then he could try to send a dummy analog value each 5 seconds to check if the issue goes away...

Or use my last commit ;)

My reconnects were very fast, every second or so

The MQTT dev does not seem to like that:
"I am not inclined to change the default value. It had been 15 seconds for 7+ years. If anything, we will make it easier to customise and not rely on editing the header file."

The define is wrapped in an #ifndef,
so there is an option to define it in other parts of the code.
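
A minimal sketch of that override pattern, assuming the library header only supplies its default when nothing was defined earlier. The value 10 and the helper keepAliveMs() are hypothetical, chosen just for the example.

```cpp
// Hypothetical project-wide override; it must be seen before the library header.
#define MQTT_KEEPALIVE 10

// Pattern assumed to be used by the library header:
// only supply the default when nothing was defined earlier.
#ifndef MQTT_KEEPALIVE
#define MQTT_KEEPALIVE 15
#endif

// Illustrative helper, not part of any library.
constexpr unsigned long keepAliveMs() { return MQTT_KEEPALIVE * 1000UL; }
```

Note that the override has to reach every translation unit that includes the header, including the library's own source file, so a compiler flag such as -DMQTT_KEEPALIVE=10 is usually the safer place for it than a single sketch file.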

On the GitHub page of PubSubClient there is also an issue about making it configurable.

Another comment by its author:
In the MQTT protocol, the client determines the keepalive value used on a connection. The broker has no say in it.

The only keepalive config option on mosquitto is its bridge keepalive - where it is acting as a client connecting to another broker - https://mosquitto.org/man/mosquitto-conf-5.html
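
As an illustration of "the client determines the keepalive value": in MQTT 3.1.1 the client puts the keep-alive into its CONNECT packet as a 16-bit number of seconds, most significant byte first. The helpers below are a hypothetical sketch of just that two-byte field, not code from any MQTT library. The k15 in the mosquitto log earlier in this thread appears to be this value as the broker received it.

```cpp
#include <array>
#include <cstdint>

// Encode the keep-alive (seconds) as the two bytes used in the CONNECT packet.
std::array<uint8_t, 2> encodeKeepAlive(uint16_t seconds) {
  return {{ static_cast<uint8_t>(seconds >> 8),
            static_cast<uint8_t>(seconds & 0xFF) }};
}

// Decode the two bytes back into seconds, as a broker would.
uint16_t decodeKeepAlive(const std::array<uint8_t, 2>& bytes) {
  return static_cast<uint16_t>((bytes[0] << 8) | bytes[1]);
}
```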

I've checked my mosquitto config. Can't find any timeout option for connections

This document discusses the timeout behavior.

And the document states the same:

The MQTT client is responsible of setting the right keep alive value. For example, it can adapt the interval to its current signal strength.

So where does the 10 seconds Mosquitto time come from? Is it hardcoded?

https://mosquitto.org/man/mosquitto-conf-5.html

keepalive_interval seconds
Set the number of seconds after which the bridge should send a ping if no other traffic has occurred. Defaults to 60. A minimum value of 5 seconds is allowed.

That setting is only for bridging

Keep in mind that this wasn't an issue in 20180428 :)

EDIT: I'll try your latest commit when I get home.

You could try to disable the connectionCheckHandler.

To see what has changed in the last day: https://github.com/letscontrolit/ESPEasy/compare/mega@%7B1day%7D...mega

Just updated to https://github.com/letscontrolit/ESPEasy/commit/4e6e31fdae11476a2f3dfce00e01ed77d1858c00 and wifi is now stable. It also reconnects now when I restart my AP! Thank you 😄

(btw, is there any way to toggle the Wifi Status LED setting using rules? For example I would only like to enable it if mqtt is disconnected, and use it for the relay status when connected.)

I am not sure about the status LED. It is called from the MQTTconnect function and from a few other places.
But maybe you could add an issue to make it selectable what is being shown via that LED?

And good to hear MQTT issues seem to be fixed by the lowered timeout.
We may have to make it selectable.

Could you make the LOG window a bit longer?
I mean, increase the number of lines.

Otherwise it is going very well now.

I just reduced them.
It was 20 lines, but with 2.4.0 there had to be a bit more free mem, so I reduced it to 10 lines for dev/test and 15 for normal.

I will look into the memory consumption this week and @Grovkillen is looking into some way to get a proper log window.

OK, I'm going to sleep.
Many thanks for your hard work!

About that window. As I see it there are three options.

  1. Use as much RAM as available to keep the info until requested. Could be dynamic, dependent on how many plugins are active.
  2. Stream it to the web browser.
  3. Compress it temporarily and restore when requested. E.g. use more numeric codes. It's less readable of course.

I like 2. Gives a smooth web display too. It's not that simple though.

The idea is to get some kind of JavaScript to collect the log and store it in the browser.
There are several options to deliver the data to the browser, like keeping an open connection, or some other techniques.
That leaves only a few lines of log needed to be kept in memory.
And that buffer can also be used to transfer the log to syslog.
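
A minimal sketch of such a small buffer, assuming plain desktop C++ (std::string, std::vector) rather than the Arduino String class; the class and method names are hypothetical and not taken from ESPEasy:

```cpp
#include <array>
#include <cstddef>
#include <string>
#include <vector>

// Fixed-capacity log buffer: keeps only the newest N lines in RAM.
// Each line gets a running sequence number so a client (the browser
// script or a syslog sender) can ask for "everything after X".
template <std::size_t N>
class LogRing {
public:
  void add(const std::string& line) {
    lines_[next_ % N] = line;
    ++next_;
  }

  // Returns the lines with sequence number >= fromSeq that are still
  // buffered, oldest first. Lines that were already overwritten are lost.
  std::vector<std::string> since(std::size_t fromSeq) const {
    const std::size_t oldest = next_ > N ? next_ - N : 0;
    std::vector<std::string> out;
    for (std::size_t seq = (fromSeq > oldest ? fromSeq : oldest); seq < next_; ++seq) {
      out.push_back(lines_[seq % N]);
    }
    return out;
  }

  // Sequence number the next added line will get.
  std::size_t nextSeq() const { return next_; }

private:
  std::array<std::string, N> lines_{};
  std::size_t next_ = 0;
};
```

The browser would remember the last sequence number it received and poll with that, so the device only ever has to hold the lines produced between two polls.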

The log cache object currently isn't that efficient with memory. A lot of room for improvement.

I also wonder now whether different versions of mosquitto
running on different operating systems behave differently.
I know ESPeasy should be flexible, but would a mosquitto update resolve the issues for some?

Just so I understand...
Why do I get this directly after booting my ESP:

Last Disconnect Reason str |  (1) Unspecified
Number reconnects | 1

In syslog I see two IP STATIC messages and Uptime 0 ConnectFailures 0:

INIT : Booting version: mega-20180430 (ESP82xx Core 2_4_1)
104 : INIT : Warm boot #1
105 : FS   : Mounting...
130 : FS   : Mount successful, used 75802 bytes of 957314
419 : CRC  : program checksum       ...OK
426 : CRC  : SecuritySettings CRC   ...OK 
532 : INIT : Free RAM:23528
532 : INIT : I2C
532 : INIT : SPI not enabled
546 : INFO : Plugins: 47 [Normal] (ESP82xx Core 2_4_1)
546 : WIFI : Set WiFi to STA
578 : WIFI : Connecting im6shop attempt #0
579 : IP   : Static IP : 192.168.1.17 GW: 0.0.0.0 SN: 0.0.0.0 DNS: 0.0.0.0
585 : WD   : Uptime 0 ConnectFailures 0 FreeMem 22688
4342 : WIFI : Connected! AP: im6shop (30:B5:C2:EB:DB:7D) Ch: 9 Duration: 3763 ms
4343 : IP   : Static IP : 192.168.1.17 GW: 0.0.0.0 SN: 0.0.0.0 DNS: 0.0.0.0
4346 : WIFI : Static IP: 0.0.0.0 (ESP-Easy-7) GW: 0.0.0.0 SN: 0.0.0.0   duration: 3 ms
4356 : Webserver: start
4710 : WIFI : Static IP: 192.168.1.32 (ESP-Easy-7) GW: 192.168.1.1 SN: 255.255.255.0   duration: 356 ms

Maybe the term "reconnect" is not chosen well.
It is more like "(re)connect", or "connect attempts"
The counter starts at 0 and is incremented at each connect attempt.

I changed the counter by initializing it to '-1'. So the first connect attempt will set it to 0, thus the label "Number reconnects" now makes sense again :)
I could not think of a better name, so I changed the reported value ;)
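
As a tiny illustration of that change (the struct and member names here are hypothetical, not the ones used in ESPEasy):

```cpp
// Connect-attempt counter initialised to -1, so the first (initial)
// connect attempt brings it to 0 and only later attempts count as reconnects.
struct WifiStats {
  int reconnects = -1;

  void onConnectAttempt() { ++reconnects; }
};
```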

Proper naming is still the hardest part of programming.

It sounds good!

connection try x

Unfortunately the latest build 20180503 (4096 dev) does not work for me: the ESP web interface does not open, and the device stops responding on the serial console after the attempt to open it. I reset the settings to defaults and set only WifiSSID and key through the serial console. PING works, no reboots, no error message.

Have you performed a cold reboot after flash?
I had something similar right after flashing.
A power cycle did solve it then.

Yes, I tried a cold reboot several times, it's still the same. Tested from MSIE and Firefox on Win7. I'm going to test the device in another location a couple of minutes later (different PC / OS, different AP).
Right after the flash it rebooted and hung.

I have tried it right now, with the normal version. On my side, no problems. No power-on reset was necessary.

It's strange: in another location the same device's ESP web interface works (Win10 + Firefox / MS Edge, different AP), but the serial console looks "read only"... :-/
Update: I tried another terminal app with the same result, serial console read only. Then I ran PuTTY (my default terminal app) again and saw the device reboot as soon as I connected to the appropriate COM port. Now the serial console accepts commands and the web interface is also working... I don't understand anything...

Maybe try a complete wipe of the flash and program again?
There seems to be a relation between wifi connecting and whether the serial port is being read. I've seen someone mention that in a thread. Not sure if that was in a PlatformIO issue or an LWIP one.
I've been reading a lot lately ;)

Yeah, I'll try that with some next build, this one I wish to test in another location again to see if it's still the same issue (device configuration updated a bit in the meantime).
BTW, when I reset the settings for the first time (after flashing this build, by issuing the Reset command from the serial console), it in fact did NOT reset the settings even though it said "formatting flash" etc. The device rebooted and at least the WiFi settings were still there, as I saw connection attempts to the AP in the serial console... The second reset attempt from the serial console erased the settings...

Yep, the core library stores the wifi settings in an area outside the SPIFFS.
This may affect wifi connection attempts.

I already read in the (core) code that there is now support for up to 5 wifi settings, which you can select.
So maybe I will actively use that storage area also, to make sure it will not conflict with our own settings.
Only thing I am afraid of, is that those settings will be written too often. I have to check when that's being written.
But it may make wifi connection a bit more predictable when that area is also being taken into account with ESPeasy.

@Oxyandy I will now push the PlatformIO.ini change to use LWIP2_LOW_MEMORY.
Could you please test this?
And also the question whether you were using task/device 12?
I just tested it on my node and almost immediately it disconnected + refused to get online again.
That was with LWIP1.4

And also the question whether you were using task/device 12?

In tests, no, just the first: a single switch.
Yeah, there are 15 changed files in GitHub Desktop pulling now

Compiled fine, web server great, using LWIP2_LOW_MEMORY, the F5 test, just great..
no sign of LmacRxBlk:1 in logs
I had wiped out all settings on the Node, but will 'reset' it and go through the process, report any oddities..
Wifi connected instantly first go..

Just be careful with lwIP2 low memory. I had to use lwIP2 high bandwidth, otherwise large packets got truncated (e.g. sensors with multiple values), at least when using fhem as a server/controller...

I'm currently using the mega branch with the ESP core from Git and lwIP2 high bandwidth, which runs fine, except that after some time some of the units can't read the sensor values anymore and therefore won't send them to the controller. The unit itself as well as the web interface run smoothly though... I'm still investigating this; I never had this in commits before May.

We tried LwIP2 high bandwidth and that messed up POST messages.
For example, saving rules > 1520 bytes got truncated.

Is ESP_Easy_mega-20180504_normal_ESP8266_1024.bin meant as a DoS attack on time servers?
I was called away after flashing, and I now have pages and pages of this loop:

INIT : Booting version: mega-20180504 (ESP82xx Core 2_4_1, NONOS SDK 2.2.1(cfd48f3), LWIP: 2.0.3)
105 : INIT : Cold Boot
106 : FS   : Mounting...
113 : FS   : Mount successful, used 76053 bytes of 113201
15:40:37: 410 : CRC  : program checksum       ...OK
417 : CRC  : SecuritySettings CRC   ...OK 
418 : CRC  : binary has changed since last save of Settings
433 : INIT : Free RAM:23448
434 : INIT : I2C
434 : INIT : SPI not enabled
449 : INFO : Plugins: 47 [Normal] (ESP82xx Core 2_4_1, NONOS SDK 2.2.1(cfd48f3), LWIP: 2.0.3)
449 : EVENT: System#Wake
458 : WIFI : Set WiFi to STA
mode : sta(5c:cf:7f:72:96:ec)
add if0
491 : WIFI : Connecting MAD_IOT attempt #0
492 : IP   : Static IP : 192.168.0.225 GW: 192.168.0.254 SN: 255.255.255.0 DNS: 8.8.8.8
STUB: dhcp_stop
505 : EVENT: System#Boot
514 : SW   : Switch state 1 Output value 1
517 : EVENT: Float_SW#Switch=1.00
530 : ACT  : Publish domoticz/in,{"idx":66,"nvalue":0,"svalue":"FLOAT_SWITCH_1_00:00:00"}
541 : Command: publish
1004 : WD   : Uptime 0 ConnectFailures 0 FreeMem 22736
15:40:39: scandone
state: 0 -> 2 (b0)
state: 2 -> 3 (0)
state: 3 -> 5 (10)
add 0
aid 1
cnt 
15:40:41: 
connected with MAD_IOT, channel 13
ip:192.168.0.225,mask:255.255.255.0,gw:192.168.0.254
ip:192.168.0.225,mask:255.255.255.0,gw:192.168.0.254
4480 : WIFI : Connected! AP: MAD_IOT (F4:F2:6D:25:84:C6) Ch: 13 Duration: 3986 ms
4485 : EVENT: WiFi#ChangedAccesspoint
4497 : IP   : Static IP : 192.168.0.225 GW: 192.168.0.254 SN: 255.255.255.0 DNS: 8.8.8.8
4499 : WIFI : Static IP: 192.168.0.225 (ESP-Easy-0) GW: 192.168.0.254 SN: 255.255.255.0   duration: 19 ms
4517 : EVENT: WiFi#Connected
4525 : Webserver: start
4526 : WIFI  : Arduino wifi status: WL_CONNECTED ESPeasy internal wifi status: ESPEASY_WIFI_SERVICES_INITIALIZED
5015 : NTP  : NTP host au.pool.ntp.org (129.250.35.250) queried
5066 : NTP  : NTP replied: 50 mSec
5068 : Current Time Zone:  DST time start: 2018-10-07 01:00:00 offset: 660 minSTD time start: 2018-04-01 01:00:00 offset: 600 min
5071 : EVENT: Time#Initialized
5082 : EVENT: Time#Initialized Processing time:11 milliSeconds
5083 : EVENT: Clock#Time=Fri,15:40
5089 : EVENT: Clock#Time=Fri,15:40 Processing time:6 milliSeconds
5091 : MQTT : Intentional reconnect
5091 : LoadFromFile: config.dat index: 28672 datasize: 336
5221 : MQTT : Connected to broker with client ID: ESPClient_5C:CF:7F:72:96:EC
5222 : Subscribed to: domoticz/out
5224 : EVENT: MQTT#Connected
5234 : EVENT: MQTT#Connected Processing time:9 milliSeconds
15:40:43: ping 1, timeout 0, total payload 32 bytes, 1365 ms
15:40:48: bcn_timout,ap_probe_send_start
15:40:50: ap_probe_send over, rest wifi status to disassoc
state: 5 -> 0 (1)
rm 0
13912 : EVENT: WiFi#Disconnected
13919 : WIFI : Disconnected! Reason: '(200) Beacon timeout' Connected for 9432 ms
13930 : MQTT : Connection lost
13931 : EVENT: MQTT#Disconnected
14514 : WIFI : Connecting MAD_IOT attempt #0
14515 : IP   : Static IP : 192.168.0.225 GW: 192.168.0.254 SN: 255.255.255.0 DNS: 8.8.8.8
ip:192.168.0.225,mask:255.255.255.0,gw:192.168.0.254
15:40:51: scandone
state: 0 -> 2 (b0)
state: 2 -> 3 (0)
state: 3 -> 5 (10)
add 0
aid 1
cnt 
15:40:52: 
connected with MAD_IOT, channel 13
ip:192.168.0.225,mask:255.255.255.0,gw:192.168.0.254
16258 : WIFI : Connected! AP: MAD_IOT (F4:F2:6D:25:84:C6) Ch: 13 Duration: 1738 ms
16260 : IP   : Static IP : 192.168.0.225 GW: 192.168.0.254 SN: 255.255.255.0 DNS: 8.8.8.8
16269 : WIFI : Static IP: 192.168.0.225 (ESP-Easy-0) GW: 192.168.0.254 SN: 255.255.255.0   duration: 11 ms
16288 : EVENT: WiFi#Connected
16295 : WIFI  : Arduino wifi status: WL_CONNECTED ESPeasy internal wifi status: ESPEASY_WIFI_SERVICES_INITIALIZED
16389 : LoadFromFile: config.dat index: 28672 datasize: 336
16425 : MQTT : Connected to broker with client ID: ESPClient_5C:CF:7F:72:96:EC
16426 : Subscribed to: domoticz/out
16427 : EVENT: MQTT#Connected
16434 : EVENT: MQTT#Connected Processing time:6 milliSeconds
16557 : NTP  : NTP host au.pool.ntp.org (129.250.35.250) queried
16599 : NTP  : NTP replied: 40 mSec
16600 : EVENT: Time#Set
16606 : EVENT: Time#Set Processing time:5 milliSeconds
15:40:54: ping 1, timeout 0, total payload 32 bytes, 1022 ms
15:40:59: bcn_timout,ap_probe_send_start
15:41:01: ap_probe_send over, rest wifi status to disassoc
state: 5 -> 0 (1)
rm 0
25176 : EVENT: WiFi#Disconnected
25183 : WIFI : Disconnected! Reason: '(200) Beacon timeout' Connected for 8919 ms
25194 : MQTT : Connection lost
25195 : EVENT: MQTT#Disconnected
25514 : WIFI : Connecting MAD_IOT attempt #0
25515 : IP   : Static IP : 192.168.0.225 GW: 192.168.0.254 SN: 255.255.255.0 DNS: 8.8.8.8
ip:192.168.0.225,mask:255.255.255.0,gw:192.168.0.254
scandone
state: 0 -> 2 (b0)
state: 2 -> 3 (0)
state: 3 -> 5 (10)
add 0
aid 1
cnt 
15:41:02: 
connected with MAD_IOT, channel 13
ip:192.168.0.225,mask:255.255.255.0,gw:192.168.0.254
26186 : WIFI : Connected! AP: MAD_IOT (F4:F2:6D:25:84:C6) Ch: 13 Duration: 666 ms
26189 : IP   : Static IP : 192.168.0.225 GW: 192.168.0.254 SN: 255.255.255.0 DNS: 8.8.8.8
26197 : WIFI : Static IP: 192.168.0.225 (ESP-Easy-0) GW: 192.168.0.254 SN: 255.255.255.0   duration: 11 ms
26217 : EVENT: WiFi#Connected
26224 : WIFI  : Arduino wifi status: WL_CONNECTED ESPeasy internal wifi status: ESPEASY_WIFI_SERVICES_INITIALIZED
26318 : LoadFromFile: config.dat index: 28672 datasize: 336
26344 : MQTT : Connected to broker with client ID: ESPClient_5C:CF:7F:72:96:EC
26345 : Subscribed to: domoticz/out
26346 : EVENT: MQTT#Connected
26353 : EVENT: MQTT#Connected Processing time:7 milliSeconds
15:41:04: ping 1, timeout 1, total payload 0 bytes, 1022 ms
27623 : NTP  : NTP host au.pool.ntp.org (129.250.35.250) queried
27664 : NTP  : NTP replied: 41 mSec
27665 : EVENT: Time#Set
27671 : EVENT: Time#Set Processing time:6 milliSeconds
27672 : EVENT: Clock#Time=Fri,15:41
27677 : EVENT: Clock#Time=Fri,15:41 Processing time:5 milliSeconds
ping 1, timeout 0, total payload 32 bytes, 1024 ms
15:41:07: 31004 : WD   : Uptime 1 ConnectFailures 4 FreeMem 19304
15:41:10: bcn_timout,ap_probe_send_start
15:41:12: pm open,type:2 0
ap_probe_send over, rest wifi status to disassoc
state: 5 -> 0 (1)
rm 0
pm close 7
36143 : MQTT : Connection lost
36144 : EVENT: MQTT#Disconnected
36151 : EVENT: WiFi#Disconnected
36156 : WIFI : Disconnected! Reason: '(200) Beacon timeout' Connected for 9948 ms
36514 : WIFI : Connecting MAD_IOT attempt #0
36515 : IP   : Static IP : 192.168.0.225 GW: 192.168.0.254 SN: 255.255.255.0 DNS: 8.8.8.8
ip:192.168.0.225,mask:255.255.255.0,gw:192.168.0.254
scandone
state: 0 -> 2 (b0)
state: 2 -> 3 (0)
state: 3 -> 5 (10)
add 0
aid 1
cnt 
15:41:13: 
connected with MAD_IOT, channel 13
ip:192.168.0.225,mask:255.255.255.0,gw:192.168.0.254
37145 : WIFI : Connected! AP: MAD_IOT (F4:F2:6D:25:84:C6) Ch: 13 Duration: 625 ms
37147 : IP   : Static IP : 192.168.0.225 GW: 192.168.0.254 SN: 255.255.255.0 DNS: 8.8.8.8
37155 : WIFI : Static IP: 192.168.0.225 (ESP-Easy-0) GW: 192.168.0.254 SN: 255.255.255.0   duration: 10 ms
37176 : EVENT: WiFi#Connected
37182 : WIFI  : Arduino wifi status: WL_CONNECTED ESPeasy internal wifi status: ESPEASY_WIFI_SERVICES_INITIALIZED
37276 : LoadFromFile: config.dat index: 28672 datasize: 336
37296 : MQTT : Connected to broker with client ID: ESPClient_5C:CF:7F:72:96:EC
37297 : Subscribed to: domoticz/out
37298 : EVENT: MQTT#Connected
37306 : EVENT: MQTT#Connected Processing time:8 milliSeconds
37551 : NTP  : NTP host au.pool.ntp.org (129.250.35.250) queried
37592 : NTP  : NTP replied: 41 mSec
37593 : EVENT: Time#Set
37599 : EVENT: Time#Set Processing time:6 milliSeconds
15:41:15: ping 1, timeout 0, total payload 32 bytes, 1046 ms
15:41:20: bcn_timout,ap_probe_send_start
15:41:22: ap_probe_send over, rest wifi status to disassoc
state: 5 -> 0 (1)
rm 0
46076 : MQTT : Connection lost
46076 : EVENT: MQTT#Disconnected
46083 : EVENT: WiFi#Disconnected
46089 : WIFI : Disconnected! Reason: '(200) Beacon timeout' Connected for 8921 ms
46514 : WIFI : Connecting MAD_IOT attempt #0
46515 : IP   : Static IP : 192.168.0.225 GW: 192.168.0.254 SN: 255.255.255.0 DNS: 8.8.8.8
ip:192.168.0.225,mask:255.255.255.0,gw:192.168.0.254
scandone
state: 0 -> 2 (b0)
state: 2 -> 3 (0)
state: 3 -> 5 (10)
add 0
aid 1
cnt 

With the current core (Master, ESP82xx Core 41a64707, NONOS SDK 2.2.1(cfd48f3)) the MTU problems don't show up anymore.
Having 10 ESPs (Sonoff, NodeMcu, W1pro) running since May 1st w/o any problems.

FYI... I flashed the latest official build, reset the settings to defaults through serial, and entered the WiFiSSID and keys. It couldn't connect to the primary AP although it was visible (but with a weak signal, about -82). Then it crashed, as you can see below.
In another location the device connected to the secondary AP quickly without any issue and so far it's working (but no plugins configured, no rules active).
....
....
458745 : WIFI : Connecting IOTAP1 attempt #57
458749 : AP Mode: Client disconnected: xx:xx:xx:xx:xx:xx Connected devices: 0
458810 : WIFI : Disconnected! Reason: '(201) No AP found' Connected for 64 ms
459360 : AP Mode: Client connected: xx:xx:xx:xx:xx:xx Connected devices: 1
471739 : WIFI : AP Mode ssid will be ESP_Easy_0 with address 192.168.4.1
471739 : WIFI : Connecting IOTAP2 attempt #58

Exception (29):
epc1=0x4000e1c3 epc2=0x00000000 epc3=0x40000f68 excvaddr=0x00000018 depc=0x00000000

ctx: sys
sp: 3ffffc50 end: 3fffffb0 offset: 01a0

stack>>>
3ffffdf0: 3fff5108 1f385062 401021e6 3fffa2b0
3ffffe00: 402706a6 402705c4 3fff9454 40100eb6
3ffffe10: 3ffeb6d5 401042bb 3ffef160 4026a718
3ffffe20: 00000018 3ffefb30 3ffefaac 00000000
3ffffe30: 40270767 3ff20a00 3fff9454 3ffedf1a
3ffffe40: 3ffedf00 00000000 00000000 00000006
3ffffe50: 00000021 1f4328f4 401021e6 3ffedf00
3ffffe60: 3ffedf1a 0000002c 00000008 401004f4
3ffffe70: 3ffedf0a 3fff6454 4026d324 3ff20a00
3ffffe80: 3fff9454 3fff55c4 00000015 4026d1f7
3ffffe90: 3fffc278 40101f80 3fffc200 00000022
3ffffea0: 3ffebf74 00000000 00000000 3fff4fe4
3ffffeb0: 40000f68 00000030 00000010 ffffffff
3ffffec0: 40000f58 00000000 00000020 00000000
3ffffed0: 3ffef7d4 7fffffff 00000000 3ffeed30
3ffffee0: 3ffef138 00000006 00000000 3fffdab0
3ffffef0: 00000000 3fffdcc0 00000040 00000030
3fffff00: 40274fd1 ffffffe0 00000000 00000000
3fffff10: 4026ca2f 3ffedf00 3ffef160 3fff9454
3fffff20: 00000000 3ffedf0a 3ffedf20 4026a4d1
3fffff30: 4027fac5 00000001 00000000 3ffedf0a
3fffff40: 4027ec78 00000092 3ffedef4 3fff6454
3fffff50: 3ffedef4 00000000 00000040 4027e5a2
3fffff60: 3ffef160 3ffedef4 3fffdcc0 3ffeb710
3fffff70: 3ffedf10 3fff752c 00000040 3ffeb710
3fffff80: 00000040 3ffef160 00000002 00000000
3fffff90: 4027de7f 3fffdab0 00000000 40283f5f
3fffffa0: 3ffeb710 40000f49 3fffdab0 40000f49
<<

ets Jan 8 2013,rst cause:2, boot mode:(3,7)

load 0x4010f000, len 1384, room 16
tail 8
chksum 0x2d
csum 0x2d
v614f7c32
~ld
-U88 :

INIT : Booting version: mega-20180504 (ESP82xx Core 2_4_1, NONOS SDK 2.2.1(cfd48f3), LWIP: 2.0.3)
89 : INIT : Warm boot #6
90 : FS : Mounting...
116 : FS : Mount successful, used 75802 bytes of 957314
436 : CRC : program checksum ...OK
442 : CRC : SecuritySettings CRC ...OK
548 : INIT : Free RAM:21464
549 : INIT : I2C
549 : INIT : SPI not enabled
565 : INFO : Plugins: 72 [Normal] [Testing] [Development] (ESP82xx Core 2_4_1, NONOS SDK 2.2.1(cfd48f3), LWIP: 2.0.3)
566 : WIFI : Set WiFi to STA
599 : WIFI : Connecting IOTAP1 attempt #0
607 : WD : Uptime 0 ConnectFailures 0 FreeMem 20616
3461 : WIFI : Disconnected! Reason: '(201) No AP found' Connected for 2861 ms
3604 : WIFI : Connecting IOTAP1 attempt #1
6466 : WIFI : Disconnected! Reason: '(201) No AP found' Connected for 2862 ms
6604 : WIFI : Connecting IOTAP2 attempt #2
9467 : WIFI : Disconnected! Reason: '(201) No AP found' Connected for 2862 ms
9604 : WIFI : Connecting IOTAP2 attempt #3
12467 : WIFI : Disconnected! Reason: '(201) No AP found' Connected for 2862 ms
12604 : WIFI : Connecting IOTAP1 attempt #4
15466 : WIFI : Disconnected! Reason: '(201) No AP found' Connected for 2862 ms
15604 : WIFI : Connecting IOTAP1 attempt #5
18466 : WIFI : Disconnected! Reason: '(201) No AP found' Connected for 2862 ms
18604 : WIFI : Set WiFi to AP+STA
19524 : WIFI : AP Mode ssid will be ESP_Easy_0 with address 192.168.4.1
19524 : WIFI : Connecting IOTAP2 attempt #6
22388 : WIFI : Disconnected! Reason: '(201) No AP found' Connected for 2863 ms
22603 : WIFI : AP Mode ssid will be ESP_Easy_0 with address 192.168.4.1
22603 : WIFI : Connecting IOTAP2 attempt #7
25463 : WIFI : Disconnected! Reason: '(201) No AP found' Connected for 2860 ms
25602 : WIFI : AP Mode ssid will be ESP_Easy_0 with address 192.168.4.1
25603 : WIFI : Connecting IOTAP1 attempt #8
28464 : WIFI : Disconnected! Reason: '(201) No AP found' Connected for 2860 ms
28602 : WIFI : AP Mode ssid will be ESP_Easy_0 with address 192.168.4.1
28603 : WIFI : Connecting IOTAP1 attempt #9
30606 : WD : Uptime 1 ConnectFailures 0 FreeMem 18176
...
...

I saw that "No AP found" error also yesterday.
It seems to be related to some incorrect or corrupted settings. Not just the settings we store, but maybe also the settings stored in another part of the flash, by the core library.

@Oxyandy NTP updates should be done only after an hour, or when it was not set.
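
A minimal sketch of that rule, assuming a millis()-style 32-bit millisecond counter; the function and constant names are made up for illustration and are not the ones in ESPEasy:

```cpp
#include <cstdint>

// One hour in milliseconds.
constexpr std::uint32_t kNtpIntervalMs = 60UL * 60UL * 1000UL;

// True when NTP should be queried: the clock was never set, or at least
// an hour has passed since the last successful sync. Unsigned subtraction
// keeps the comparison valid across a millis() style counter wrap.
bool ntpSyncDue(bool timeIsSet, std::uint32_t nowMs, std::uint32_t lastSyncMs) {
  if (!timeIsSet) return true;
  return static_cast<std::uint32_t>(nowMs - lastSyncMs) >= kNtpIntervalMs;
}
```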

@susisstrolch Could you test to write a rules file containing > 1800 bytes?
The LWIP2 high bandwidth version was corrupting the HTTP POST requests when they exceed one MTU in size.

Hi Gijs,
ESP_Easy_mega-20180504_normal_ESP8266_1024.bin
If you saw the log, it's a pattern, always connects, MQTT connects, updates time

bcn_timout,ap_probe_send_start
ap_probe_send over, rest wifi status to disassoc
state: 5 -> 0 (1)
rm 0
pm close 7
2515788 : EVENT: WiFi#DisconnectedDisconnected! Reason: '(200) Beacon timeout' Connected for 8919 ms

every 10 to 12 seconds, looped like that for hours, oops
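
To put numbers on that pattern, the "Connected for ... ms" value can be pulled out of each disconnect line. A small helper along these lines would do it (a sketch for analysing a saved log on a PC; the function name is hypothetical):

```cpp
#include <cstddef>
#include <cstdlib>
#include <string>

// Extracts the "Connected for <N> ms" duration from a log line.
// Returns -1 when the line does not contain that phrase.
long connectedForMs(const std::string& line) {
  static const std::string marker = "Connected for ";
  const std::size_t pos = line.find(marker);
  if (pos == std::string::npos) return -1;
  return std::strtol(line.c_str() + pos + marker.size(), nullptr, 10);
}
```

In the log posted earlier the values are 9432, 8919, 9948 and 8921 ms, so the link drops roughly 9 to 10 seconds after each connect.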

Rule size of my pondctrl device: Current size: 1859 characters (Max 2048)
MTU: 534 (low mem lwip 2.x)
No Issues - changed multiple times this week.
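
As a rough check that a rules file of this size really spans several segments: the sketch below treats the reported figure of 534 as the payload per segment, which is an assumption, and it ignores the HTTP headers a real POST also carries.

```cpp
#include <cstddef>

// Number of segments needed for a payload, assuming every segment can
// carry at most segmentBytes of payload (headers are ignored here).
constexpr std::size_t segmentsNeeded(std::size_t payloadBytes, std::size_t segmentBytes) {
  return segmentBytes == 0 ? 0 : (payloadBytes + segmentBytes - 1) / segmentBytes;
}
```

For 1859 characters and 534 bytes per segment this gives 4 segments, so the save does exercise the multi-segment path.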

We are currently using LwIP 2.x low memory, because the high bandwidth version was giving these issues.
@Oxyandy I will look into the code regarding the time update and of course the reconnect loop.
This topic is becoming more and more on-topic :(

I don't get it, the one I compiled myself this morning, as per your request, worked great.
Then when the official release became available I thought I had better use that, and I was shocked by the difference.
Now, regarding your last reply: the time update is fine.
The official release from today connects to wifi without problems, connects to MQTT, gets the time, then drops the connection with "(200) Beacon timeout", and loops every 11 seconds:
Connect: wifi, MQTT, time, disconnect, repeat.
So don't look at the "time update"; if it stayed connected, it would be fine...

That's strange. There is something really funky about the build process of PlatformIO.
I've noticed it myself also a few times that building it for the second time actually makes a difference.
It is just like there is something wrong at linker level in PlatformIO.
It is really strange what's happening here.

It's probably in the compiler or linker flags. Optimization can be a bitch.
The Arduino IDE has this config file (platform.txt):

# ESP8266 platform
# ------------------------------

# For more info:
# https://github.com/arduino/Arduino/wiki/Arduino-IDE-1.5---3rd-party-Hardware-specification

name=ESP8266 Modules
version=2.5.0-dev

runtime.tools.xtensa-lx106-elf-gcc.path={runtime.platform.path}/tools/xtensa-lx106-elf
runtime.tools.esptool.path={runtime.platform.path}/tools/esptool

compiler.warning_flags=-w
compiler.warning_flags.none=-w
compiler.warning_flags.default=
compiler.warning_flags.more=-Wall
compiler.warning_flags.all=-Wall -Wextra

build.lwip_lib=-llwip_gcc
build.lwip_include=lwip/include
build.lwip_flags=-DLWIP_OPEN_SRC

build.vtable_flags=-DVTABLES_IN_FLASH

build.float=-u _printf_float -u _scanf_float
build.led=

compiler.path={runtime.tools.xtensa-lx106-elf-gcc.path}/bin/
compiler.sdk.path={runtime.platform.path}/tools/sdk
compiler.libc.path={runtime.platform.path}/tools/sdk/libc/xtensa-lx106-elf
compiler.cpreprocessor.flags=-D__ets__ -DICACHE_FLASH -U__STRICT_ANSI__ "-I{compiler.sdk.path}/include" "-I{compiler.sdk.path}/{build.lwip_include}" "-I{compiler.libc.path}/include" "-I{build.path}/core"

compiler.c.cmd=xtensa-lx106-elf-gcc
compiler.c.flags=-c {compiler.warning_flags} -Os -g -Wpointer-arith -Wno-implicit-function-declaration -Wl,-EL -fno-inline-functions -nostdlib -mlongcalls -mtext-section-literals -falign-functions=4 -MMD -std=gnu99 -ffunction-sections -fdata-sections

compiler.S.cmd=xtensa-lx106-elf-gcc
compiler.S.flags=-c -g -x assembler-with-cpp -MMD -mlongcalls

compiler.c.elf.flags=-g {compiler.warning_flags} -Os -nostdlib -Wl,--no-check-sections -u app_entry {build.float} -Wl,-static "-L{compiler.sdk.path}/lib" "-L{compiler.sdk.path}/ld" "-L{compiler.libc.path}/lib" "-T{build.flash_ld}" -Wl,--gc-sections -Wl,-wrap,system_restart_local -Wl,-wrap,spi_flash_read

compiler.c.elf.cmd=xtensa-lx106-elf-gcc
compiler.c.elf.libs=-lhal -lphy -lpp -lnet80211 {build.lwip_lib} -lwpa -lcrypto -lmain -lwps -laxtls -lespnow -lsmartconfig -lairkiss -lwpa2 -lstdc++ -lm -lc -lgcc

compiler.cpp.cmd=xtensa-lx106-elf-g++
compiler.cpp.flags=-c {compiler.warning_flags} -Os -g -mlongcalls -mtext-section-literals -fno-exceptions -fno-rtti -falign-functions=4 -std=c++11 -MMD -ffunction-sections -fdata-sections

compiler.as.cmd=xtensa-lx106-elf-as

compiler.ar.cmd=xtensa-lx106-elf-ar
compiler.ar.flags=cru

compiler.elf2hex.cmd=esptool
compiler.elf2hex.flags=

compiler.size.cmd=xtensa-lx106-elf-size

compiler.esptool.cmd=esptool
compiler.esptool.cmd.windows=esptool.exe

# This can be overriden in boards.txt
build.extra_flags=-DESP8266

# These can be overridden in platform.local.txt
compiler.c.extra_flags=
compiler.c.elf.extra_flags=
compiler.S.extra_flags=
compiler.cpp.extra_flags=
compiler.ar.extra_flags=
compiler.objcopy.eep.extra_flags=
compiler.elf2hex.extra_flags=

## generate file with git version number
## needs bash, git, and echo
recipe.hooks.core.prebuild.1.pattern=bash -c "mkdir -p {build.path}/core && echo \#define ARDUINO_ESP8266_GIT_VER 0x`git --git-dir {runtime.platform.path}/.git rev-parse --short=8 HEAD 2>/dev/null || echo ffffffff` >{build.path}/core/core_version.h"
recipe.hooks.core.prebuild.2.pattern=bash -c "mkdir -p {build.path}/core && echo \#define ARDUINO_ESP8266_GIT_DESC `cd {runtime.platform.path}; git describe --tags 2>/dev/null || echo unix-{version}` >>{build.path}/core/core_version.h"
## windows-compatible version without git
recipe.hooks.core.prebuild.1.pattern.windows=cmd.exe /c mkdir {build.path}\core & (echo #define ARDUINO_ESP8266_GIT_VER 0x00000000 & echo #define ARDUINO_ESP8266_GIT_DESC win-{version} ) > {build.path}\core\core_version.h
recipe.hooks.core.prebuild.2.pattern.windows=

## Build the app.ld linker file
recipe.hooks.linking.prelink.1.pattern="{compiler.path}{compiler.c.cmd}" -CC -E -P {build.vtable_flags} "{runtime.platform.path}/tools/sdk/ld/eagle.app.v6.common.ld.h" -o "{runtime.platform.path}/tools/sdk/ld/eagle.app.v6.common.ld"

## Compile c files
recipe.c.o.pattern="{compiler.path}{compiler.c.cmd}" {compiler.cpreprocessor.flags} {compiler.c.flags} -DF_CPU={build.f_cpu} {build.lwip_flags} {build.debug_port} {build.debug_level} -DARDUINO={runtime.ide.version} -DARDUINO_{build.board} -DARDUINO_ARCH_{build.arch} -DARDUINO_BOARD="{build.board}" {build.led} {compiler.c.extra_flags} {build.extra_flags} {includes} "{source_file}" -o "{object_file}"

## Compile c++ files
recipe.cpp.o.pattern="{compiler.path}{compiler.cpp.cmd}" {compiler.cpreprocessor.flags} {compiler.cpp.flags} -DF_CPU={build.f_cpu} {build.lwip_flags} {build.debug_port} {build.debug_level} -DARDUINO={runtime.ide.version} -DARDUINO_{build.board} -DARDUINO_ARCH_{build.arch} -DARDUINO_BOARD="{build.board}" {build.led} {compiler.cpp.extra_flags} {build.extra_flags} {includes} "{source_file}" -o "{object_file}"

## Compile S files
recipe.S.o.pattern="{compiler.path}{compiler.c.cmd}" {compiler.cpreprocessor.flags} {compiler.S.flags} -DF_CPU={build.f_cpu} {build.lwip_flags} {build.debug_port} {build.debug_level} -DARDUINO={runtime.ide.version} -DARDUINO_{build.board} -DARDUINO_ARCH_{build.arch} -DARDUINO_BOARD="{build.board}" {build.led} {compiler.c.extra_flags} {build.extra_flags} {includes} "{source_file}" -o "{object_file}"

## Create archives
recipe.ar.pattern="{compiler.path}{compiler.ar.cmd}" {compiler.ar.flags} {compiler.ar.extra_flags} "{build.path}/arduino.ar" "{object_file}"

## Combine gc-sections, archives, and objects
recipe.c.combine.pattern="{compiler.path}{compiler.c.elf.cmd}" -Wl,-Map "-Wl,{build.path}/{build.project_name}.map" {compiler.c.elf.flags} {compiler.c.elf.extra_flags} -o "{build.path}/{build.project_name}.elf" -Wl,--start-group {object_files} "{build.path}/arduino.ar" {compiler.c.elf.libs} -Wl,--end-group  "-L{build.path}"

## Create eeprom
recipe.objcopy.eep.pattern=

## Create hex
#recipe.objcopy.hex.pattern="{compiler.path}{compiler.elf2hex.cmd}" {compiler.elf2hex.flags} {compiler.elf2hex.extra_flags} "{build.path}/{build.project_name}.elf" "{build.path}/{build.project_name}.hex"

recipe.objcopy.hex.pattern="{runtime.tools.esptool.path}/{compiler.esptool.cmd}" -eo "{runtime.platform.path}/bootloaders/eboot/eboot.elf" -bo "{build.path}/{build.project_name}.bin" -bm {build.flash_mode} -bf {build.flash_freq} -bz {build.flash_size} -bs .text -bp 4096 -ec -eo "{build.path}/{build.project_name}.elf" -bs .irom0.text -bs .text -bs .data -bs .rodata -bc -ec

## Save hex
recipe.output.tmp_file={build.project_name}.bin
recipe.output.save_file={build.project_name}.{build.variant}.bin

## Compute size
recipe.size.pattern="{compiler.path}{compiler.size.cmd}" -A "{build.path}/{build.project_name}.elf"
recipe.size.regex=^(?:\.irom0\.text|\.text|\.data|\.rodata|)\s+([0-9]+).*
recipe.size.regex.data=^(?:\.data|\.rodata|\.bss)\s+([0-9]+).*
#recipe.size.regex.eeprom=^(?:\.eeprom)\s+([0-9]+).*

# ------------------------------

tools.esptool.cmd=esptool
tools.esptool.cmd.windows=esptool.exe
tools.esptool.path={runtime.platform.path}/tools/esptool
tools.esptool.network_cmd=python
tools.esptool.network_cmd.windows=python.exe

tools.esptool.upload.protocol=esp
tools.esptool.upload.params.verbose=-vv
tools.esptool.upload.params.quiet=
tools.esptool.upload.pattern="{path}/{cmd}" {upload.verbose} -cd {upload.resetmethod} -cb {upload.speed} -cp "{serial.port}" {upload.erase_cmd} -ca 0x00000 -cf "{build.path}/{build.project_name}.bin"
tools.esptool.upload.network_pattern="{network_cmd}" "{runtime.platform.path}/tools/espota.py" -i "{serial.port}" -p "{network.port}" "--auth={network.password}" -f "{build.path}/{build.project_name}.bin"

tools.mkspiffs.cmd=mkspiffs
tools.mkspiffs.cmd.windows=mkspiffs.exe
tools.mkspiffs.path={runtime.platform.path}/tools/mkspiffs

To avoid issues with PlatformIO I deleted the .pioenvs folder and ran 'Clean' anyway.
Since this morning's build - 2 files have changed
ESPEasy-Globals.h & Misc.ino (Fix corruption of task settings)

I'm starting to do this: my current folder structure is GitHub/ESpeasy.
I copy the ESpeasy folder and rename the copy to include the date before I do a fetch, which will help me compare changes locally.

FritzBox did an automatic firmware update. All ESPs failed over to the alternate Mesh AP w/o any problems.

@TD-er you said "Do you have this Beacon timeout with 0504 on any node, or just a few?"
OK, to be fair I will grab a fresh module and flash that, and in future flash 2 nodes with every new firmware.
I have time to spare, as it is only 4 PM in your part of the world.

Ok, a brand new Node, erased.
Flashed with ESP_Easy_mega-20180504_normal_ESP8266_1024.bin
The log below has identical behaviour to the other node..
Can you see the pattern in the log ?

INIT : Booting version: mega-20180504 (ESP82xx Core 2_4_1, NONOS SDK 2.2.1(cfd48f3), LWIP: 2.0.3)
104 : INIT : Cold Boot
105 : FS   : Mounting...
111 : FS   : Mount successful, used 76053 bytes of 113201
408 : CRC  : program checksum       ...OK
419 : CRC  : SecuritySettings CRC   ...OK 
420 : CRC  : binary has changed since last save of Settings
439 : INIT : Free RAM:23448
440 : INIT : I2C
440 : INIT : SPI not enabled
455 : INFO : Plugins: 47 [Normal] (ESP82xx Core 2_4_1, NONOS SDK 2.2.1(cfd48f3), LWIP: 2.0.3)
455 : EVENT: System#Wake
464 : WIFI : Set WiFi to STA
mode : sta(5c:cf:7f:72:97:2a)
add if0
497 : WIFI : Connecting MAD_MOB attempt #0
498 : IP   : Static IP : 192.168.0.225 GW: 192.168.0.254 SN: 255.255.255.0 DNS: 8.8.8.8
STUB: dhcp_stop
511 : EVENT: System#Boot
519 : SW   : Switch state 1 Output value 1
522 : EVENT: Float_SW#Switch=1.00
536 : ACT  : Publish domoticz/in,{"idx":26,"nvalue":0,"svalue":"FLOAT_SWITCH_1_00:00:00"}
547 : Command: publish
00:23:56: 1004 : WD   : Uptime 0 ConnectFailures 0 FreeMem 22752
00:23:59: scandone
state: 0 -> 2 (b0)
state: 2 -> 3 (0)
state: 3 -> 5 (10)
add 0
aid 1
cnt 
00:24:01: 
connected with MAD_MOB, channel 7
ip:192.168.0.225,mask:255.255.255.0,gw:192.168.0.254
ip:192.168.0.225,mask:255.255.255.0,gw:192.168.0.254
5287 : WIFI : Connected! AP: MAD_MOB (18:90:D8:AC:0F:D8) Ch: 7 Duration: 4787 ms
5293 : EVENT: WiFi#ChangedAccesspoint
5304 : IP   : Static IP : 192.168.0.225 GW: 192.168.0.254 SN: 255.255.255.0 DNS: 8.8.8.8
5306 : WIFI : Static IP: 192.168.0.225 (ESP-Easy-0) GW: 192.168.0.254 SN: 255.255.255.0   duration: 19 ms
5324 : EVENT: WiFi#Connected
5332 : Webserver: start
5332 : WIFI  : Arduino wifi status: WL_CONNECTED ESPeasy internal wifi status: ESPEASY_WIFI_SERVICES_INITIALIZED
5423 : MQTT : Intentional reconnect
5424 : LoadFromFile: config.dat index: 28672 datasize: 336
5500 : MQTT : Connected to broker with client ID: ESPClient_5C:CF:7F:72:97:2A
5501 : Subscribed to: domoticz/out
5502 : EVENT: MQTT#Connected
5512 : EVENT: MQTT#Connected Processing time:10 milliSeconds
5796 : NTP  : NTP host au.pool.ntp.org (27.124.125.251) queried
5867 : NTP  : NTP replied: 70 mSec
5869 : Current Time Zone:  DST time start: 2018-10-07 01:00:00 offset: 660 minSTD time start: 2018-08-05 01:00:00 offset: 600 min
5871 : EVENT: Time#Initialized
5879 : EVENT: Time#Initialized Processing time:8 milliSeconds
5881 : EVENT: Clock#Time=Sat,01:24
5887 : EVENT: Clock#Time=Sat,01:24 Processing time:6 milliSeconds
00:24:02: ping 1, timeout 0, total payload 32 bytes, 1023 ms
00:24:07: bcn_timout,ap_probe_send_start
00:24:09: pm open,type:2 0
ap_probe_send over, rest wifi status to disassoc
state: 5 -> 0 (1)
rm 0
pm close 7
14289 : EVENT: WiFi#Disconnected
14296 : WIFI : Disconnected! Reason: '(200) Beacon timeout' Connected for 9002 ms
14311 : MQTT : Connection lost
14311 : EVENT: MQTT#Disconnected
14519 : WIFI : Connecting MAD_MOB attempt #0
14520 : IP   : Static IP : 192.168.0.225 GW: 192.168.0.254 SN: 255.255.255.0 DNS: 8.8.8.8
ip:192.168.0.225,mask:255.255.255.0,gw:192.168.0.254
00:24:11: scandone
state: 0 -> 2 (b0)
state: 2 -> 3 (0)
state: 3 -> 5 (10)
add 0
aid 1
cnt 

connected with MAD_MOB, channel 7
ip:192.168.0.225,mask:255.255.255.0,gw:192.168.0.254
16497 : WIFI : Connected! AP: MAD_MOB (18:90:D8:AC:0F:D8) Ch: 7 Duration: 1972 ms
16499 : IP   : Static IP : 192.168.0.225 GW: 192.168.0.254 SN: 255.255.255.0 DNS: 8.8.8.8
16507 : WIFI : Static IP: 192.168.0.225 (ESP-Easy-0) GW: 192.168.0.254 SN: 255.255.255.0   duration: 10 ms
16527 : EVENT: WiFi#Connected
16533 : WIFI  : Arduino wifi status: WL_CONNECTED ESPeasy internal wifi status: ESPEASY_WIFI_SERVICES_INITIALIZED
16608 : NTP  : NTP host au.pool.ntp.org (27.124.125.251) queried
00:24:12: 16679 : NTP  : NTP replied: 70 mSec
16680 : EVENT: Time#Set
16686 : EVENT: Time#Set Processing time:6 milliSeconds
16688 : LoadFromFile: config.dat index: 28672 datasize: 336
16713 : MQTT : Connected to broker with client ID: ESPClient_5C:CF:7F:72:97:2A
16714 : Subscribed to: domoticz/out
16715 : EVENT: MQTT#Connected
16724 : EVENT: MQTT#Connected Processing time:9 milliSeconds
00:24:14: ping 1, timeout 0, total payload 32 bytes, 2070 ms
00:24:18: bcn_timout,ap_probe_send_start
00:24:21: ap_probe_send over, rest wifi status to disassoc
state: 5 -> 0 (1)
rm 0
25349 : EVENT: WiFi#Disconnected
25356 : WIFI : Disconnected! Reason: '(200) Beacon timeout' Connected for 8852 ms
25371 : MQTT : Connection lost
25371 : EVENT: MQTT#Disconnected
25519 : WIFI : Connecting MAD_MOB attempt #0
25520 : IP   : Static IP : 192.168.0.225 GW: 192.168.0.254 SN: 255.255.255.0 DNS: 8.8.8.8
ip:192.168.0.225,mask:255.255.255.0,gw:192.168.0.254
scandone
state: 0 -> 2 (b0)
state: 2 -> 3 (0)
state: 3 -> 5 (10)
add 0
aid 1
cnt 

connected with MAD_MOB, channel 7
ip:192.168.0.225,mask:255.255.255.0,gw:192.168.0.254
25683 : WIFI : Connected! AP: MAD_MOB (18:90:D8:AC:0F:D8) Ch: 7 Duration: 158 ms
25686 : IP   : Static IP : 192.168.0.225 GW: 192.168.0.254 SN: 255.255.255.0 DNS: 8.8.8.8
25694 : WIFI : Static IP: 192.168.0.225 (ESP-Easy-0) GW: 192.168.0.254 SN: 255.255.255.0   duration: 11 ms
25714 : EVENT: WiFi#Connected
25721 : WIFI  : Arduino wifi status: WL_CONNECTED ESPeasy internal wifi status: ESPEASY_WIFI_SERVICES_INITIALIZED
25815 : LoadFromFile: config.dat index: 28672 datasize: 336
25836 : MQTT : Connected to broker with client ID: ESPClient_5C:CF:7F:72:97:2A
25838 : Subscribed to: domoticz/out
25838 : EVENT: MQTT#Connected
25845 : EVENT: MQTT#Connected Processing time:7 milliSeconds
00:24:22: 26585 : NTP  : NTP host au.pool.ntp.org (27.124.125.251) queried
26656 : NTP  : NTP replied: 70 mSec
26657 : EVENT: Time#Set
26663 : EVENT: Time#Set Processing time:6 milliSeconds
ping 1, timeout 0, total payload 32 bytes, 1010 ms
00:24:26: 31005 : WD   : Uptime 1 ConnectFailures 4 FreeMem 19304
00:24:28: bcn_timout,ap_probe_send_start
00:24:30: ap_probe_send over, rest wifi status to disassoc
state: 5 -> 0 (1)
rm 0
35077 : EVENT: WiFi#Disconnected
35083 : WIFI : Disconnected! Reason: '(200) Beacon timeout' Connected for 9394 ms
35094 : MQTT : Connection lost
35095 : EVENT: MQTT#Disconnected
35519 : WIFI : Connecting MAD_MOB attempt #0
35520 : IP   : Static IP : 192.168.0.225 GW: 192.168.0.254 SN: 255.255.255.0 DNS: 8.8.8.8
ip:192.168.0.225,mask:255.255.255.0,gw:192.168.0.254
scandone
state: 0 -> 2 (b0)
state: 2 -> 3 (0)
state: 3 -> 5 (10)
add 0
aid 1
cnt 

connected with MAD_MOB, channel 7
ip:192.168.0.225,mask:255.255.255.0,gw:192.168.0.254
35685 : WIFI : Connected! AP: MAD_MOB (18:90:D8:AC:0F:D8) Ch: 7 Duration: 160 ms
35687 : IP   : Static IP : 192.168.0.225 GW: 192.168.0.254 SN: 255.255.255.0 DNS: 8.8.8.8
35696 : WIFI : Static IP: 192.168.0.225 (ESP-Easy-0) GW: 192.168.0.254 SN: 255.255.255.0   duration: 11 ms
35715 : EVENT: WiFi#Connected
35721 : WIFI  : Arduino wifi status: WL_CONNECTED ESPeasy internal wifi status: ESPEASY_WIFI_SERVICES_INITIALIZED
35816 : LoadFromFile: config.dat index: 28672 datasize: 336
35844 : MQTT : Connected to broker with client ID: ESPClient_5C:CF:7F:72:97:2A
35845 : Subscribed to: domoticz/out
35846 : EVENT: MQTT#Connected
35855 : EVENT: MQTT#Connected Processing time:9 milliSeconds
00:24:32: 36735 : NTP  : NTP host au.pool.ntp.org (144.48.166.166) queried
ping 1, timeout 0, total payload 32 bytes, 1016 ms
37739 : NTP  : No reply
00:24:39: bcn_timout,ap_probe_send_start
00:24:41: pm open,type:2 0
ap_probe_send over, rest wifi status to disassoc
state: 5 -> 0 (1)
rm 0
pm close 7
46238 : EVENT: WiFi#Disconnected
46244 : WIFI : Disconnected! Reason: '(200) Beacon timeout' Connected for 10 s
46260 : MQTT : Connection lost
46261 : EVENT: MQTT#Disconnected
46519 : WIFI : Connecting MAD_MOB attempt #0
46520 : IP   : Static IP : 192.168.0.225 GW: 192.168.0.254 SN: 255.255.255.0 DNS: 8.8.8.8
ip:192.168.0.225,mask:255.255.255.0,gw:192.168.0.254
scandone
state: 0 -> 2 (b0)
state: 2 -> 3 (0)
state: 3 -> 5 (10)
add 0
aid 1
cnt 
00:24:42: 
connected with MAD_MOB, channel 7
ip:192.168.0.225,mask:255.255.255.0,gw:192.168.0.254
46679 : WIFI : Connected! AP: MAD_MOB (18:90:D8:AC:0F:D8) Ch: 7 Duration: 154 ms
46681 : IP   : Static IP : 192.168.0.225 GW: 192.168.0.254 SN: 255.255.255.0 DNS: 8.8.8.8
46689 : WIFI : Static IP: 192.168.0.225 (ESP-Easy-0) GW: 192.168.0.254 SN: 255.255.255.0   duration: 10 ms
46709 : EVENT: WiFi#Connected
46715 : WIFI  : Arduino wifi status: WL_CONNECTED ESPeasy internal wifi status: ESPEASY_WIFI_SERVICES_INITIALIZED
46809 : LoadFromFile: config.dat index: 28672 datasize: 336
46843 : MQTT : Connected to broker with client ID: ESPClient_5C:CF:7F:72:97:2A
46844 : Subscribed to: domoticz/out
46845 : EVENT: MQTT#Connected
46851 : EVENT: MQTT#Connected Processing time:6 milliSeconds
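
The pattern in the log above can be checked mechanically. Below is a minimal Python sketch, assuming the serial log is available as text, that pulls out the `WIFI : Disconnected!` lines and measures the time between them. The regular expression and the helper names (`disconnect_events`, `cycle_lengths`) are illustrative assumptions, not part of ESPEasy; the sample lines are copied from the log.

```python
import re

# Matches ESPEasy log lines such as:
# 14296 : WIFI : Disconnected! Reason: '(200) Beacon timeout' Connected for 9002 ms
# An optional "hh:mm:ss: " prefix from the serial terminal is tolerated.
DISCONNECT_RE = re.compile(
    r"^(?:\d\d:\d\d:\d\d: )?(\d+) : WIFI : Disconnected! Reason: '\((\d+)\) ([^']*)'"
)


def disconnect_events(log_text):
    """Return a list of (uptime_ms, reason_code, reason_text) tuples."""
    events = []
    for line in log_text.splitlines():
        match = DISCONNECT_RE.match(line.strip())
        if match:
            events.append((int(match.group(1)), int(match.group(2)), match.group(3)))
    return events


def cycle_lengths(events):
    """Milliseconds between consecutive disconnects."""
    times = [uptime for uptime, _, _ in events]
    return [later - earlier for earlier, later in zip(times, times[1:])]


SAMPLE = """\
14296 : WIFI : Disconnected! Reason: '(200) Beacon timeout' Connected for 9002 ms
25356 : WIFI : Disconnected! Reason: '(200) Beacon timeout' Connected for 8852 ms
35083 : WIFI : Disconnected! Reason: '(200) Beacon timeout' Connected for 9394 ms
46244 : WIFI : Disconnected! Reason: '(200) Beacon timeout' Connected for 10 s
"""

events = disconnect_events(SAMPLE)
print(cycle_lengths(events))  # gaps between disconnects, in ms
```

On these four lines the gaps are 11060, 9727 and 11161 ms, i.e. the node drops off roughly every 10 to 11 seconds with the same reason code each time.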

Ah, I guess it's time I set up a special access point running Wireshark

Using GitHub Desktop, I fetched the latest 'live' source which includes only 2 changed files from https://github.com/letscontrolit/ESPEasy/commit/92680c5542b76a15db16af198a3a07ed17618c4e since my working compile from this morning, made an evening version & it works perfect..
Why are the nightly builds different if it is the same source? What is happening differently on GitHub?

Anyone using Arduino IDE ?
Are you able to build an "ESP8266 normal 1024" from current src and dump it here ?
Thanks

Not sure if this is related to some of the random problems we're seeing, but I noticed that after some time my units stop sending data to the controller. The web interface is still up and running, but it shows an IP address of 0.0.0.0 (see screenshot). Anyone else seeing this?

untitled

image

What build version is this?

@Oxyandy

Why are the nightly builds different if it is the same source? What is happening differently on GitHub?

That's what I am wondering also.
I guess it has something to do with the fact we also have to build it twice sometimes when building it ourselves.
There seems to be something wrong with the way we compile stuff, or with the compiler.
That may perhaps also explain why we've been seeing so many things over the last few weeks that are very hard, if not impossible, to reproduce.

I discussed this also with @arendst from Tasmota, and he confirmed he also has to rebuild every now and then to get a good working version.
I will ask @psy0rz if it is possible to build the nightly builds twice as a test.

self compiled from mega-20180503
Build | 20102 - Mega
Libraries | ESP82xx Core 76a14b1f, NONOS SDK 2.2.1(cfd48f3), LWIP: 2.0.3

Could you try to build it twice, before writing it to the node?
No clean between builds, just build it and build again.

sure, but I use Arduino SDK on a Mac, not PlatformIO... still worth a try?

Yes please do, since we don't have a clue yet what is causing it.
Just make sure you're using the same core libraries as we do, since Arduino IDE does not look at the fixed versions set in PlatformIO.
The current versions we use are:
ESP82xx Core 2_4_1, NONOS SDK 2.2.1(cfd48f3), LWIP: 2.0.3
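
A quick way to check that two builds report the same libraries is to compare the strings from the boot log or the info page. This is only a sketch: the regular expression and the names `parse_libraries` and `mismatches` are assumptions made for illustration, and a differing string only shows that the reported versions differ, not what differs in the code.

```python
import re


def parse_libraries(text):
    """Split an ESPEasy 'Libraries' string into core, SDK and lwIP parts."""
    match = re.search(r"Core\s+(\S+),\s*NONOS SDK\s+(\S+),\s*LWIP:\s*(\S+)", text)
    if match is None:
        raise ValueError("unrecognised libraries string: %r" % text)
    return {"core": match.group(1), "sdk": match.group(2), "lwip": match.group(3)}


def mismatches(reference, candidate):
    """Return {part: (reference_value, candidate_value)} for parts that differ."""
    ref = parse_libraries(reference)
    cand = parse_libraries(candidate)
    return {key: (ref[key], cand[key]) for key in ref if ref[key] != cand[key]}


# Both strings are taken from this thread.
REFERENCE = "ESP82xx Core 2_4_1, NONOS SDK 2.2.1(cfd48f3), LWIP: 2.0.3"
SELF_BUILT = "ESP82xx Core 76a14b1f, NONOS SDK 2.2.1(cfd48f3), LWIP: 2.0.3"

print(mismatches(REFERENCE, SELF_BUILT))
```

For the two strings quoted in this thread only the core entry differs (`2_4_1` versus `76a14b1f`); SDK and lwIP match.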

ok, I'll try that now and flash some units. I'll report as soon as something happens... or nothing at all ;)

BTW, the current build can be crashed at any time by keeping F5 pressed in a web browser for a few seconds, at least on the Device or Notifications pages (probably on any ESP web page). I can repeat it any time from Firefox on Win10:
...
...
39543696 : Ram usage: Webserver only: 0 including Core: 0
39551825 : LoadFromFile: config.dat index: 27648 datasize: 320
39557599 : Ram usage: Webserver only: 0 including Core: 0
39566681 : Ram usage: Webserver only: 0 including Core: 0
39566879 : Ram usage: Webserver only: 0 including Core: 0
39567105 : Ram usage: Webserver only: 0 including Core: 0
39567490 : Ram usage: Webserver only: 0 including Core: 0
39567690 : Ram usage: Webserver only: 0 including Core: 0
39567883 : Ram usage: Webserver only: 0 including Core: 0
39568086 : Ram usage: Webserver only: 0 including Core: 0
39568287 : Ram usage: Webserver only: 0 including Core: 0
39568495 : Ram usage: Webserver only: 0 including Core: 0
39568701 : Ram usage: Webserver only: 0 including Core: 0
39568910 : Ram usage: Webserver only: 0 including Core: 0
39569112 : Ram usage: Webserver only: 0 including Core: 0
39569324 : Ram usage: Webserver only: 0 including Core: 0
39569530 : Ram usage: Webserver only: 0 including Core: 0
39569739 : Ram usage: Webserver only: 0 including Core: 0
39569948 : Ram usage: Webserver only: 0 including Core: 0
39570158 : Ram usage: Webserver only: 0 including Core: 0
39570366 : Ram usage: Webserver only: 0 including Core: 0
39570582 : Ram usage: Webserver only: 0 including Core: 0
39570742 : Webpage skipped: low memory: 1784
39570832 : Webpage skipped: low memory: 1784
39570906 : Webpage skipped: low memory: 1784
39570992 : Webpage skipped: low memory: 1784
39571074 : Webpage skipped: low memory: 1784
39571161 : Webpage skipped: low memory: 1784
39571240 : Webpage skipped: low memory: 1784
39571322 : Webpage skipped: low memory: 1784
39571401 : Webpage skipped: low memory: 1784

Exception (28):
epc1=0x4025cb66 epc2=0x00000000 epc3=0x401003f2 excvaddr=0x00000000 depc=0x00000000

ctx: cont
sp: 3fff4690 end: 3fff4b20 offset: 01a0

stack>>>
3fff4830: 00000000 3ffe90c0 3fff48a4 40253308
3fff4840: 00000000 3ffe90c0 3fff48d4 40257c32
3fff4850: 00000000 3ffe90c0 00000043 4021ada8
3fff4860: 00000000 00000000 00000000 000006c0
3fff4870: 000006f8 00000000 00000000 00000000
3fff4880: 00000000 00000000 00000000 40107b18
3fff4890: 00000000 000003e8 3fff48f0 00000000
3fff48a0: 00000000 00000000 00000000 00000000
3fff48b0: 4029cdfc 00000007 3fff48f0 3fffbfdc
3fff48c0: 0000000f 0000000b 3fff815c 0000000f
3fff48d0: 00000000 3fffb8a4 0000025f 0000025c
3fff48e0: 00000001 3fff16d4 3fff6294 40227cb4
3fff48f0: 3fffb414 0000000f 00000007 4010053d
3fff4900: 3fff4d6c 00000855 00000855 3fff4d6c
3fff4910: 00000010 00000010 00000000 3fff36d4
3fff4920: 00000010 3fff5d14 3fff5d14 40257a6f
3fff4930: 3ffe8ea1 00000000 3fff5d14 40257abb
3fff4940: 00000000 00000010 3fff5d14 3fff4d6c
3fff4950: 40107b70 ffffffff 00000000 40253308
3fff4960: 3fff3a14 00000001 3fff6294 4022fc8d
3fff4970: 00000010 3fff49e0 3ffe8ea1 40207ae8
3fff4980: 00000000 3fff49e0 3fff1868 4028577b
3fff4990: 4025653c 00000001 3fff1868 40253308
3fff49a0: 3fff4d6c 00000c35 00000c35 4010020c
3fff49b0: 00000001 00000001 3fff49e0 40107b70
3fff49c0: ffffffff 40107b70 00000000 40257a14
3fff49d0: 4020a8ca 00000001 3fff6294 4022fd96
3fff49e0: 00000000 00000000 00000000 40253308
3fff49f0: 00000001 00000001 3fff6294 4022fe80
3fff4a00: 00000001 00000001 3fff4a30 40259cfa
3fff4a10: 3fff4d6c 00000112 3fff6294 402532fe
3fff4a20: 3fff6294 3fff366c 3fff6294 4025333a
3fff4a30: 00000000 00000000 00000000 40257c18
3fff4a40: 3fff6294 3fff366c 3fff3628 402533c1
3fff4a50: 3fff5afc 0000000f 00000008 00000000
3fff4a60: 00000000 3fff4ab0 3fff362c 4024ca28
3fff4a70: 3fff366c 00000001 00000000 4024d200
3fff4a80: 00000001 00000000 40251b18 0000000d
3fff4a90: 00000000 3fff7b4c 3fff3628 3fff3af4
3fff4aa0: 00000001 3fff3650 3fff3628 40253618
3fff4ab0: 40107910 00000000 00001388 3fff3b00
3fff4ac0: 00000000 3fff7b4c 00000000 40256abd
3fff4ad0: 3fffdad0 00000000 3fff1944 4023742a
3fff4ae0: 3fffdad0 00000000 3fff19c4 40240380
3fff4af0: 00000000 00000000 00000001 40258a31
3fff4b00: 3fffdad0 00000000 3fff3aee 40258a5c
3fff4b10: feefeffe feefeffe 3fff3b00 40100700
<<

ets Jan 8 2013,rst cause:2, boot mode:(3,7)

load 0x4010f000, len 1384, room 16
tail 8
chksum 0x2d
csum 0x2d
v614f7c32
~ld
▒U88 :

INIT : Booting version: mega-20180504 (ESP82xx Core 2_4_1, NONOS SDK 2.2.1(cfd48f3), LWIP: 2.0.3)
89 : INIT : Warm boot #3
90 : FS : Mounting...
115 : FS : Mount successful, used 75802 bytes of 957314
437 : CRC : program checksum ...OK
469 : CRC : SecuritySettings CRC ...OK
575 : INIT : Free RAM:21464
575 : INIT : I2C
575 : INIT : SPI not enabled
1677 : INFO : Plugins: 72 [Normal] [Testing] [Development] (ESP82xx Core 2_4_1, NONOS SDK 2.2.1(cfd48f3), LWIP: 2.0.3)
1678 : EVENT: System#Wake
1682 : WIFI : Set WiFi to STA
...
...

Device then connected to AP successfully. Another crash:

973 : Ram usage: Webserver only: 0 including Core: 0
279178 : Ram usage: Webserver only: 0 including Core: 0
279387 : Ram usage: Webserver only: 0 including Core: 0
279590 : Ram usage: Webserver only: 0 including Core: 0
279798 : Ram usage: Webserver only: 0 including Core: 0
280321 : Ram usage: Webserver only: 0 including Core: 0
280558 : Ram usage: Webserver only: 0 including Core: 0
280785 : Ram usage: Webserver only: 0 including Core: 0
Fatal exception 28(LoadProhibitedCause):
epc1=0x4025cb66, epc2=0x00000000, epc3=0x40100408, excvaddr=0x00000000, depc=0x00000000

Exception (28):
epc1=0x4025cb66 epc2=0x00000000 epc3=0x40100408 excvaddr=0x00000000 depc=0x00000000

ctx: cont
sp: 3fff4690 end: 3fff4b20 offset: 01a0

stack>>>
3fff4830: 00000000 3ffe90c0 3fff48a4 40253308
3fff4840: 00000000 3ffe90c0 3fff48d4 40257c32
3fff4850: 00000000 3ffe90c0 0000002f 4021ada8
3fff4860: 00000000 00000000 00000000 000007e8
3fff4870: 00000820 00000000 00000000 00000000
3fff4880: 00000000 00000000 00000000 40107b18
3fff4890: 00000000 000003e8 3fff48f0 00000000
3fff48a0: 00000000 00000000 00000000 00000000
3fff48b0: 4029cdfc 00000007 3fff48f0 3fff9af4
3fff48c0: 0000000f 0000000b 3fff9adc 0000000f
3fff48d0: 00000004 3fffb7bc 0000025f 00000130
3fff48e0: 00000001 3fff16d4 3fff773c 40227cb4
3fff48f0: 3fff83ac 0000000f 00000007 4010053d
3fff4900: 3fff4d6c 00000a67 00000a67 3fff4d6c
3fff4910: 00000010 00000010 00000000 3fff36d4
3fff4920: 00000010 3fff9014 3fff9014 40257a6f
3fff4930: 3ffe8ea1 00000000 3fff9014 40257abb
3fff4940: 00000000 00000010 3fff9014 3fff4d6c
3fff4950: 40107b70 ffffffff 00000000 40253308
3fff4960: 3fff3a14 00000001 3fff773c 4022fc8d
3fff4970: 00000010 3fff49e0 3ffe8ea1 40207ae8
3fff4980: 00000000 3fff49e0 3fff185c 4028577b
3fff4990: 4025653c 00000001 3fff185c 40253308
3fff49a0: 3fff4d6c 00000628 00000628 4010020c
3fff49b0: 00000001 00000001 3fff49e0 40107b70
3fff49c0: ffffffff 40107b70 00000000 40257a14
3fff49d0: 4020a8ca 00000001 3fff773c 4022fd96
3fff49e0: 00000000 00000000 00000000 40253308
3fff49f0: 00000001 00000001 3fff773c 4022fe80
3fff4a00: 00000001 00000001 3fff4a30 40259cfa
3fff4a10: 00000000 00000000 3fff773c 402532fe
3fff4a20: 3fff773c 3fff366c 3fff773c 4025333a
3fff4a30: 00000000 00000000 00000000 40257c18
3fff4a40: 3fff773c 3fff366c 3fff3628 402533c1
3fff4a50: 3fff9594 0000000f 00000008 00000000
3fff4a60: 00000000 00000000 3fff362c 4024ca28
3fff4a70: 3fff366c 00000001 00000000 4024d200
3fff4a80: 00000001 00000000 40251b18 0000000d
3fff4a90: 00000000 3fff9b0c 3fff3628 3fff3af4
3fff4aa0: 00000001 3fff3650 3fff3628 40253618
3fff4ab0: 40107910 00000000 00001388 3fff3b00
3fff4ac0: 00000000 3fff9b0c 00000000 40256abd
3fff4ad0: 3fffdad0 00000000 3fff1944 4023742a
3fff4ae0: 3fffdad0 00000000 3fff19c4 40240380
3fff4af0: 00000000 00000000 00000001 40258a31
3fff4b00: 3fffdad0 00000000 3fff3aee 40258a5c
3fff4b10: feefeffe feefeffe 3fff3b00 40100700
<<

ets Jan 8 2013,rst cause:2, boot mode:(3,7)

load 0x4010f000, len 1384, room 16
tail 8
chksum 0x2d
csum 0x2d
v614f7c32
~ld
▒U89 :

INIT : Booting version: mega-20180504 (ESP82xx Core 2_4_1, NONOS SDK 2.2.1(cfd48f3), LWIP: 2.0.3)
89 : INIT : Warm boot #8
91 : FS : Mounting...
116 : FS : Mount successful, used 75802 bytes of 957314
438 : CRC : program checksum ...OK
469 : CRC : SecuritySettings CRC ...OK
575 : INIT : Free RAM:21464
576 : INIT : I2C
576 : INIT : SPI not enabled
1678 : INFO : Plugins: 72 [Normal] [Testing] [Development] (ESP82xx Core 2_4_1, NONOS SDK 2.2.1(cfd48f3), LWIP: 2.0.3)
1678 : EVENT: System#Wake
1683 : WIFI : Set WiFi to STA
...

Well, that is with 72 plugins. How about the normal build, does that behave the same? ;)

That's normal if you flood the IP stack with requests. Only thing that helps is servicing the Webserver more often so that the inbound buffer gets freed again.

At least it's good that it's recoverable; a complete hang would be worse than a reboot... But it's strange that the boot cause was not the same every time. I also saw cause 4.
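
Both crash dumps above report exception 28 (LoadProhibited) with `excvaddr=0x00000000`, which points at a read through a null pointer. To narrow down where, the register line and the stack rows can be picked apart before feeding addresses to a symbol lookup tool. The sketch below is an illustration only: the address filter (words starting with `400`, `401` or `402`) is an assumption about which values look like code addresses on the ESP8266, and the function names are made up for this example.

```python
import re

# Hex words that look like ESP8266 code addresses (0x40000000-0x402fffff).
# Assumption: this range is a heuristic, not something stated in the thread.
CODE_ADDR_RE = re.compile(r"\b40[0-2][0-9a-f]{5}\b")
REGISTER_RE = re.compile(r"(epc1|epc2|epc3|excvaddr|depc)=(0x[0-9a-f]{8})")


def parse_registers(line):
    """Turn 'epc1=0x... epc2=0x...' into a dict of integers."""
    return {name: int(value, 16) for name, value in REGISTER_RE.findall(line)}


def candidate_code_addresses(stack_rows):
    """Collect code-like words from stack rows, ignoring each row's address label."""
    found = []
    for row in stack_rows:
        _label, _, words = row.partition(":")
        for addr in CODE_ADDR_RE.findall(words):
            if addr not in found:
                found.append(addr)
    return found


# Lines copied from the first crash dump in this thread.
EXC = "epc1=0x4025cb66 epc2=0x00000000 epc3=0x401003f2 excvaddr=0x00000000 depc=0x00000000"
STACK = [
    "3fff4830: 00000000 3ffe90c0 3fff48a4 40253308",
    "3fff4840: 00000000 3ffe90c0 3fff48d4 40257c32",
    "3fff4850: 00000000 3ffe90c0 00000043 4021ada8",
]

regs = parse_registers(EXC)
print(hex(regs["epc1"]), "null read" if regs["excvaddr"] == 0 else "non-null fault")
print(candidate_code_addresses(STACK))
```

The addresses only become meaningful when resolved against the .elf file of the exact binary that crashed, so this is of limited use for builds where the .elf is not kept.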

ESP_Easy_mega-20180504_test_ESP8266_1024.bin
I can get wifi to connect (first try) and stay connected
F5 (Devices page) causes no errors or crash
ESP_Easy_mega-20180504_normal_ESP8266_1024.bin
Connects and fails in an 11 second cycle over & over - (200) Beacon timeout
ESP_Easy_mega-20180504_normal_ESP8266_1024_(Self_Compiled).bin
Perfect

@Oxyandy Slow keyboard autorepeat? ;-)
Mine crashes on every web page I tried.

@ghtester I will build a set on my PC, with all the current code. Will take about 30 minutes to build.

@TD-er Thanks a lot for your effort, I'll test it when the new build is ready.

@TD-er flashed my units (16) now with ESP82xx Core 2_4_1, NONOS SDK 2.2.1(cfd48f3), LWIP: 2.0.3, built twice before flashing... I'll see tomorrow if they're still alive and sending data... I had to take lwIP2 high bandwidth, otherwise the data sent via the FHEM-Controller is truncated!
but need to sleep now.... n8

Be aware about the HTTP POST issues with the high bandwidth version.

My build is now running for the 3rd time (had some ESP32 issue which needed to be fixed and want to make sure only the linking is done in the last build)
So there will be a zip file in a few minutes.
Then I'm off to bed too.

TD-er, your build, my hardware, no issue with the (normal 1024 8266)
Waiting on the daily build to show will try that next ;)

@TD-er Thanks a lot, quickly tested the 4096 dev release (to be continued). The "F5" reboot issue is still there (the first attempt hung the device completely) and the firmware info says MD5 check fail (I suppose that's because it's a test build). Nevertheless everything else is fine so far and it's connecting perfectly and quickly to the AP.


Build 20102 - Mega
Libraries ESP82xx Core 2_4_1, NONOS SDK 2.2.1(cfd48f3), LWIP: 2.0.3
GIT version
Plugins 72 [Normal] [Testing] [Development]
Build Md5 4d44355f4d44355f4d44355f4d44355f
Md5 check fail !
Build time May 5 2018 00:33:21
Binary filename ThisIsTheDummyPlaceHolderForTheBinaryFilename...

As built and released on GitHub both these BINs, 1 day (1 Build) apart.
ESP_Easy_mega-20180504_normal_ESP8266_1024.bin
Confirmed 'faulty' many different ways, resetting, different nodes etc
the problem is consistent ('(200) Beacon timeout') in an 11 second loop
Homebrew working perfectly from same source.
to
ESP_Easy_mega-20180505_normal_ESP8266_1024.bin
Working perfectly, the changes in source being minimal
tells me, the GitHub releases have a failure 'somewhere' in compiling..
Connected to Wifi first try, instant time update, not dropped Wifi once,
MQTT maintaining connection
etc etc -
nothing to fault ;)

So this means a lot of time spent the past few weeks (???) may be due to compile issues?
That's unfortunate, but at least gives me confidence I am not losing my mind, in seeing all kinds of issues reported that I could not reproduce nor explain.

As for the F5 issue: try lwip high bandwidth as it has larger buffers. It may crash a little later.

Compile issues: you are running at 80 MHz, right?

80 MHz is the default I guess.

I know. But setting it to 160 may cause strange behaviour.

Yeah, the "F5" issue is the only serious trouble I see so far in this special build. In fact, if I just press it quickly several times to refresh an ESP web page, it causes trouble:

18084332 : EVENT: Clock#Time=Sat,08:47 Processing time:2 milliSeconds
18091174 : WD : Uptime 302 ConnectFailures 0 FreeMem 14504
bcn_timout,ap_probe_send_start
18112119 : Ram usage: Webserver only: 0 including Core: 0
18115167 : Ram usage: Webserver only: 0 including Core: 0
18116976 : Ram usage: Webserver only: 0 including Core: 0
18119084 : LoadFromFile: notification.dat index: 0 datasize: 996
18119089 : LoadFromFile: notification.dat index: 1024 datasize: 996
18119092 : LoadFromFile: notification.dat index: 2048 datasize: 996
18119129 : Ram usage: Webserver only: 0 including Core: 0
18121330 : Ram usage: Webserver only: 0 including Core: 0
18121337 : WD : Uptime 302 ConnectFailures 0 FreeMem 13584
18128862 : Ram usage: Webserver only: 0 including Core: 0
18130833 : Ram usage: Webserver only: 0 including Core: 0
18135120 : Ram usage: Webserver only: 0 including Core: 0
18136605 : Ram usage: Webserver only: 0 including Core: 0
18138356 : Ram usage: Webserver only: 0 including Core: 0
18140067 : LoadFromFile: notification.dat index: 0 datasize: 996
18140076 : LoadFromFile: notification.dat index: 1024 datasize: 996
18140078 : LoadFromFile: notification.dat index: 2048 datasize: 996
18140152 : Ram usage: Webserver only: 0 including Core: 0
18144694 : Ram usage: Webserver only: 0 including Core: 0
18144700 : EVENT: Clock#Time=Sat,08:48
18144702 : EVENT: Clock#Time=Sat,08:48 Processing time:2 milliSeconds
18148558 : Ram usage: Webserver only: 0 including Core: 0
18151336 : WD : Uptime 303 ConnectFailures 0 FreeMem 12568
18153230 : Ram usage: Webserver only: 0 including Core: 0
18153763 : Ram usage: Webserver only: 0 including Core: 0
18155000 : Ram usage: Webserver only: 0 including Core: 0
18155592 : Ram usage: Webserver only: 0 including Core: 0
18156416 : Ram usage: Webserver only: 0 including Core: 0
18156838 : Ram usage: Webserver only: 0 including Core: 0
18164949 : Ram usage: Webserver only: 0 including Core: 0
18165234 : Ram usage: Webserver only: 0 including Core: 0
18165587 : Ram usage: Webserver only: 0 including Core: 0
18170770 : Ram usage: Webserver only: 0 including Core: 0
18170947 : Ram usage: Webserver only: 0 including Core: 0
18171120 : Ram usage: Webserver only: 0 including Core: 0
18171300 : Ram usage: Webserver only: 0 including Core: 0
18171733 : Ram usage: Webserver only: 0 including Core: 0
18177686 : Ram usage: Webserver only: 0 including Core: 0
bcn_timout,ap_probe_send_start
18181336 : WD : Uptime 303 ConnectFailures 0 FreeMem 10832
18182865 : Ram usage: Webserver only: 0 including Core: 0
18183367 : Ram usage: Webserver only: 0 including Core: 0
18188878 : Ram usage: Webserver only: 0 including Core: 0
18190025 : Ram usage: Webserver only: 0 including Core: 0
18203237 : Ram usage: Webserver only: 0 including Core: 0
18203809 : Ram usage: Webserver only: 0 including Core: 0
18203817 : EVENT: Clock#Time=Sat,08:49
18203819 : EVENT: Clock#Time=Sat,08:49 Processing time:2 milliSeconds
bcn_timout,ap_probe_send_start
bcn_timout,ap_probe_send_start
bcn_timout,ap_probe_send_start
18496844 : Ram usage: Webserver only: 0 including Core: 0
18496850 : WD : Uptime 304 ConnectFailures 0 FreeMem 8736
18496856 : EVENT: Clock#Time=Sat,08:53
18496858 : EVENT: Clock#Time=Sat,08:53 Processing time:2 milliSeconds

After the last three bcn_timout,ap_probe_send_start messages the device was frozen for about 60 secs even in serial console, then it continued working without reboot.
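
The watchdog (`WD`) lines in the log above already show free memory shrinking while the page is being refreshed. A small sketch to extract that trend from a captured log follows; the regular expression and function names are assumptions for illustration, and the sample lines are copied from the log.

```python
import re

# Matches e.g. "18091174 : WD : Uptime 302 ConnectFailures 0 FreeMem 14504"
WD_RE = re.compile(r"(\d+) : WD\s*: Uptime (\d+) ConnectFailures (\d+) FreeMem (\d+)")


def free_mem_samples(log_text):
    """Return (uptime_ms, free_bytes) for every watchdog line found."""
    return [(int(m.group(1)), int(m.group(4))) for m in WD_RE.finditer(log_text)]


def total_drop(samples):
    """Bytes of free memory lost between the first and the last sample."""
    return samples[0][1] - samples[-1][1] if samples else 0


SAMPLE = """\
18091174 : WD : Uptime 302 ConnectFailures 0 FreeMem 14504
18121337 : WD : Uptime 302 ConnectFailures 0 FreeMem 13584
18151336 : WD : Uptime 303 ConnectFailures 0 FreeMem 12568
18181336 : WD : Uptime 303 ConnectFailures 0 FreeMem 10832
18496850 : WD : Uptime 304 ConnectFailures 0 FreeMem 8736
"""

samples = free_mem_samples(SAMPLE)
print([free for _, free in samples])
print(total_drop(samples))
```

Over these five samples free memory falls from 14504 to 8736 bytes, a drop of 5768 bytes, without a single ConnectFailure being counted.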

Yes it's running at 80 MHz - System Info says: ESP Chip Freq: 80 MHz

The ESP web responses are really fast until I try to refresh too quickly...

Morning all... my units have been running stable for about 10 hours now. However, issues sometimes arose only after a day or two, so I'll leave them running like this. One unit (out of 16 D1's) rebooted twice during the night, which could be plugin related or so... the Sonoff Basics also run smoothly.

I know you could not see or explain TD-er, but did you use the GitHub builds as a test ?
I might go back over the source for releases in past month and compare vs. homebrew
I've had plenty that I deemed useless, one of my tabs in Notepad++ has the list

About F5: it is not really an issue. I use it as a benchmark. I do not have auto-refresh for the F5 key,
so I am manually hitting it, watching RAM usage and serial log output at the same time.
From memory, "lwip high bandwidth" had almost instant
LmacRxBlk:1 errors and they were not recoverable.
Which reminds me, I am meant to write something up about that; it's on the to-do list.
And I will try different compile settings again as a measure

Hmm, I don't know if the reason was that I configured an LCD display at position 12, but soon after that I lost the connection to the AP and can't reconnect anymore (No AP found, despite it being visible by wifiscan). After reboots, the same issue. Then the device entered AP mode, but maybe due to underscores in the SSID name it said:

1127917 : WIFI : Error while starting AP Mode with SSID: ESP_01_1 IP: 192.168.4.1

But the device was visible as AI-THINKER_XXXXXXX. I was able to connect to it, but when I tried to change the device name etc. I experienced lost connections to the device, crashes & reboots etc... :-(

Update - after several attempts I renamed the device to remove the underscore, but there's still one before the device number:
599417 : WIFI : Error while starting AP Mode with SSID: ESP-001_1 IP: 192.168.4.1
And the device is still visible as AI-THINKER_XXXXXXX
Can't connect to my AP as client anymore... I'll try to reset to factory settings...

Update2 OK... Factory reset, device visible as ESP_Easy_0 AP... Set the WifiSSID and WifiKey through serial console, device immediately connected to AP...

most of my units still run fine... however I don't really think it's a matter of compiling twice... the only difference is, some libraries are compiled only the first time, then after that, when the SDK sees that they didn't change it won't recompile...

one other thing I'm going to try now is instead of specifying the "D1 Mini" board in arduino I set all parameters myself and use the "generic 8266" module... we'll see if this makes a difference...

currently the only thing I see are some spontaneous reboots from certain units... but this could be related to other things...

one question from me: sonoff basic only has 1M memory... any chance to get these things OTA updated? it obviously always claims "not enough memory"... any ideas?

PS: about compiling issues, I always used self compiled binaries (other plugins) from the beginning, and had the "same" stability issues... so some of them could really be platformIO related, other I think were "real" issues...

You can use fewer plugins when compiling. Look at https://github.com/letscontrolit/ESPEasy/blob/mega/src/define_plugin_sets.h

You'll never be able to OTA directly, but this way it's at least small enough to OTA in two stages. Look at the wiki for that one :)

that's exactly what I did... that's why I'm always self compiling... but even with only 2 plugins enabled the sketch uses 500k...
that's why I'm looking for a different solution (eg. no web-interface... etc.)..

500k is small enough. Like I said, two part OTA. You can't update directly. Look at the wiki :)
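
The reasoning behind the two-stage update can be written out as simple arithmetic: during OTA the new image has to fit in flash next to the one that is currently running. The numbers below are assumptions for illustration only (128 KiB SPIFFS as hinted by the "1m128" naming, 20 KiB reserved at the end of flash, and a hypothetical 270 KiB uploader image); sector alignment is ignored. Only the 1M flash size and the roughly 500k sketch come from this thread.

```python
def ota_fits(flash_kib, spiffs_kib, reserved_kib, running_kib, incoming_kib):
    """True if the incoming image fits in the sketch area next to the running one."""
    sketch_area_kib = flash_kib - spiffs_kib - reserved_kib
    return running_kib + incoming_kib <= sketch_area_kib


FLASH_KIB = 1024      # Sonoff Basic, 1M flash (from the thread)
SPIFFS_KIB = 128      # assumption
RESERVED_KIB = 20     # assumption: EEPROM emulation + SDK parameters
SKETCH_KIB = 500      # "even with only 2 plugins enabled the sketch uses 500k"
UPLOADER_KIB = 270    # hypothetical size of a minimal uploader image

# Direct OTA: full sketch running, full sketch incoming.
direct = ota_fits(FLASH_KIB, SPIFFS_KIB, RESERVED_KIB, SKETCH_KIB, SKETCH_KIB)

# Stage 1: full sketch running, small uploader incoming.
stage1 = ota_fits(FLASH_KIB, SPIFFS_KIB, RESERVED_KIB, SKETCH_KIB, UPLOADER_KIB)

# Stage 2: small uploader running, full sketch incoming.
stage2 = ota_fits(FLASH_KIB, SPIFFS_KIB, RESERVED_KIB, UPLOADER_KIB, SKETCH_KIB)

print(direct, stage1, stage2)
```

Under these assumed sizes the direct update does not fit (500 + 500 KiB against an 876 KiB sketch area), while each of the two stages does.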

thx.. searching.... ;)

add: somehow I missed the two-step-part....

With 1M Sonoffs you will need to use an Initial Uploader which has DOUT flag in header,
from memory the one on the Wiki is not, but may have been updated recently..

I remember something like that. Just searched, and I'm using this one for OTA: https://github.com/soif/EspBuddy/blob/master/firmwares/ESPEasyUploader.OTA.1m128.esp8266.bin

Yes, checked that is DOUT
Here's one I put together it's smaller
Initial_Firmware_Uploader_Sonoff_1M_DOUT.zip

I said I'd do it because I was very curious........
As a comparison to Self_Compiled mega-20180422 to GitHub released, so far - I tried just one
ESP_Easy_mega-20180422_normal_ESP8266_1024.bin
I left a comment and log on its behaviour here https://github.com/letscontrolit/ESPEasy/issues/1301#issuecomment-383433822 (GitHub released)
Same source self-compiled, no surprise, I see different behaviour to the Nightly.
It stayed connected, shock horror !
GitHub released mega-20180422, flashed over the top, the exact same behaviour as I reported when I first tried it.
Would not stay connected, WIFI : Disconnected! Reason: '(200) Beacon timeout' cycling over & over
The cause needs investigating... sigh
GitHub released = not to be trusted ?

I posted the Arduino compiler flags yesterday or so. What flags does Travis use?

I have 16 D1's and 3 Basics running on f69e476 self compiled, Core 2.4.1, lwIP2 High bandwidth.
Nearly all of them with an uptime of >900Min. now and still working fine. 2 of them switched to AP mode about an hour ago and didn't reconnect till now. I'll wait until tonight and see if they recover.

Unfortunately the behaviour is not really reproducible, and I can't really provide any evidence of where the issues arise, but I'll continue to look for any flaws and report it if I find anything...

@Oxyandy : I also recompiled an uploader for my Basics, works fine! Thanks again for pointing me in the right direction.. ;)

@s0170071
They are stored in .travis.yml & platformio.ini
I sped read through, tada, look what I found
skip_cleanup: true
Edit: Further reading I'm not sure if this refers to making a clean build ?
I admit is all new to me... ??? Disabling Caching ?

Travis does a clean build every time, since it even downloads the python environment, etc.

The nightly builds are done by @psy0rz , not by Travis

Okey-dokey, so are they clean builds?
I'm trying to understand why they behave differently.

What about comparing the MD5 of bin1 and bin2?
It will indicate whether it's a compile-time or run-time problem...

@susisstrolch you mean when you compile yourself? It's because the MD5 is calculated after compilation. There is a script that does that for you.
https://github.com/s0170071/CRC4ESP

Edit: @susisstrolch I may have gotten you wrong :-)
Yes, you can compare the MD5; they should be identical. If you do it offline, you could just as well compare the binaries bit-wise. This way you can even tell where the deviation is. If you're advanced, you may even back-track that to the code. The .elf file should have all the info.
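The bit-wise comparison suggested here could look roughly like the sketch below. It is only an illustration: `DiffRange` and `diffRanges` are hypothetical names, and reading the two .bin files into byte vectors is left out. It reports every contiguous run of differing bytes, so you can see whether two builds differ in one small spot or all over the image.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// One contiguous run of differing bytes: [offset, offset + length).
struct DiffRange {
  std::size_t offset;
  std::size_t length;
};

// Compare two firmware images byte by byte and return the runs that differ.
// If one image is longer, the extra tail is reported as differing as well.
std::vector<DiffRange> diffRanges(const std::vector<std::uint8_t>& a,
                                  const std::vector<std::uint8_t>& b) {
  std::vector<DiffRange> out;
  const std::size_t common = a.size() < b.size() ? a.size() : b.size();
  const std::size_t longest = a.size() > b.size() ? a.size() : b.size();

  std::size_t i = 0;
  while (i < common) {
    if (a[i] == b[i]) {
      ++i;
      continue;
    }
    const std::size_t start = i;
    while (i < common && a[i] != b[i]) ++i;
    out.push_back({start, i - start});
  }

  if (longest > common) {
    const std::size_t tail = longest - common;
    if (!out.empty() && out.back().offset + out.back().length == common) {
      out.back().length += tail;  // tail directly follows the last run
    } else {
      out.push_back({common, tail});
    }
  }
  return out;
}
```

An empty result means the images are identical; a single short run would point at something like embedded build metadata rather than different code.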

Anyway, I don't care so much about what exactly differs. I think it is more important to make sure we're all on the same binary, no matter who compiled it.

So again, what are the compile flags for Arduino, PlatformIO, nightly and Travis? Win / Linux?

No, I mean: instead of speculating about differences in compiler output in automated builds, simply compare the MD5 of both builds. Then you can tell exactly whether they differ.

Flags for Arduino IDE on linux are:

build.lwip_lib=-llwip_gcc
build.lwip_include=lwip/include
build.lwip_flags=-DLWIP_OPEN_SRC
build.vtable_flags=-DVTABLES_IN_FLASH
build.float=-u _printf_float -u _scanf_float


compiler.cpreprocessor.flags=-D__ets__ -DICACHE_FLASH -U__STRICT_ANSI__ "-I{compiler.sdk.path}/include" "-I{compiler.sdk.path}/{build.lwip_include}" "-I{compiler.libc.path}/include" "-I{build.path}/core"

compiler.c.cmd=xtensa-lx106-elf-gcc
compiler.c.flags=-c {compiler.warning_flags} -Os -g -Wpointer-arith -Wno-implicit-function-declaration -Wl,-EL -fno-inline-functions -nostdlib -mlongcalls -mtext-section-literals -falign-functions=4 -MMD -std=gnu99 -ffunction-sections -fdata-sections

compiler.S.cmd=xtensa-lx106-elf-gcc
compiler.S.flags=-c -g -x assembler-with-cpp -MMD -mlongcalls

compiler.c.elf.flags=-g {compiler.warning_flags} -Os -nostdlib -Wl,--no-check-sections -u app_entry {build.float} -Wl,-static "-L{compiler.sdk.path}/lib" "-L{compiler.sdk.path}/ld" "-L{compiler.libc.path}/lib" "-T{build.flash_ld}" -Wl,--gc-sections -Wl,-wrap,system_restart_local -Wl,-wrap,spi_flash_read

compiler.c.elf.cmd=xtensa-lx106-elf-gcc
compiler.c.elf.libs=-lhal -lphy -lpp -lnet80211 {build.lwip_lib} -lwpa -lcrypto -lmain -lwps -laxtls -lespnow -lsmartconfig -lairkiss -lwpa2 -lstdc++ -lm -lc -lgcc

compiler.cpp.cmd=xtensa-lx106-elf-g++
compiler.cpp.flags=-c {compiler.warning_flags} -Os -g -mlongcalls -mtext-section-literals -fno-exceptions -fno-rtti -falign-functions=4 -std=c++11 -MMD -ffunction-sections -fdata-sections

@s0170071 I tried to ask Edwin, but he hasn't replied yet.
So for now, we don't know exactly what's being done for the nightly builds.
And still the strange behavior remains (which I experienced also on my setup) that compiling it twice gives different results.
The binaries between builds cannot be compared via a checksum. There is a build timestamp included, which will be different for any build. So compiling the same source 100 times will give 100 different checksums.
But at least it should give the same functionality and that appears not to be happening (sometimes) between the first and second build. And _that_ is not how it should be.

So, for test purposes and to nail it down, I'd skip setting all the precompile changes between those consecutive compiles to get a comparable source.

@s0170071 so we get a lot of debug information because of the -g option when building with the Arduino IDE.
Maybe a point to reduce size...

Size reduction is something we must look at in the future, but you have to take care when switching optimization flags. Some may lead to even harder to reproduce issues.

The two units recovered some hours later during the day, but others are failing randomly... Two things I have found so far:
I get entries in the log saying:
96493069 : IP blocked: 0.0.0.0 Allowed: 10.0.0.0 - 10.0.255.255
96517068 : WD : Uptime 1608 ConnectFailures 0 FreeMem 11544
The first line seems strange to me: what tries to connect with IP 0.0.0.0? This could be related to the issue I saw some days ago, where the unit is reachable but tells me it has IP 0.0.0.0.

Second thing: even though nothing was done during the last 24 h, I see sudden jumps in CPU load, even in "small" units which are just using the switch functionality (e.g. Nr. 7 in the attached graph).

Is there any chance to find out why these sudden changes in CPU load occur? The log says nothing specific... (CPU load from today and yesterday)
[2 images: CPU load graphs from today and yesterday]

Is the CPU load jump at the moment of a reboot?
The way the CPU load is computed is just a bit strange.
It takes the number of times the main loop function is executed in 30 seconds.
This is compared with the maximum count observed.
This means that during operation the maximum count observed can increase, but also that it can suddenly decrease after a reboot.
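To make that behaviour concrete, here is a minimal sketch of such a loop-count based estimate. It is not the actual ESPEasy code: the name `LoopLoadEstimator` and the exact formula (load = 100 * (1 - count / maxCount)) are assumptions, chosen only to match the description above.

```cpp
#include <cstdint>

// Hypothetical estimator; the formula is an assumption, not ESPEasy source.
struct LoopLoadEstimator {
  std::uint32_t maxCount = 0;  // highest loop count seen in any window since boot

  // Feed the number of loop() runs counted in the last 30 second window.
  // Returns the estimated load in percent, in the range [0, 100].
  float update(std::uint32_t countInWindow) {
    if (countInWindow > maxCount) maxCount = countInWindow;
    if (maxCount == 0) return 0.0f;
    return 100.0f * (1.0f - static_cast<float>(countInWindow) /
                                static_cast<float>(maxCount));
  }
};
```

With this formula, a window of 500 loops reads as 50% while the observed maximum is 1000, as 75% once a window of 2000 loops has been seen, and as 0% right after a reboot when 500 is itself the maximum. The reported value depends on the history as much as on the actual work being done.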

No, not at a reboot... just "randomly" at some point... Interestingly, it happens to multiple units at once... I have no clue what happened there (I was not home then)... Units 12-14 especially are really plain boards with no tasks except RSSI, load, uptime and mem... The memory graphs also show no sign of significant changes...
Even now, the load on some units shows ~50% but the web interface is speedy.

On one unit I also have an nfx-driven watch that changes the hand positions every second. Interestingly, the clock stopped at one point in time, but the unit continued to work without issues... as if this task was not run anymore... This could be plugin related (as it's one from the playground), but it could also point to issues...

But like I said, I currently have absolutely no clue where all these issues come from... I'm really just poking around trying to find something, and giving feedback to everyone for brainstorming...

anyways, thanks a lot for your efforts and help... if I do find something reasonable I will report...

It could also mean that some service changed availability at some moment.
Either it became available which caused the nodes to actually do more work.
Or it was no longer available. ESPeasy tries to do some ping to a host to detect its availability. If that ping fails, it will halt the node during the time-out period.

How is that "ping" done? I ask because I see some "host unknown" from time to time... Could it be that this timeout is quite large? Or that the ping or DNS lookup have too short timeouts? Actually I don't use FQDNs anymore, just IPs, so DNS is less likely to be an issue...

I'm just updating with the latest commit... Some units are easy to update and quite fast, others very slow...

I have 4 APs running on the same SSID. Could it also be that a unit sometimes selects a worse one and then has speed issues?

I was planning to change the way a ping is done to async ping.
When trying to make a connection, there is a check to see if the host is available. That is done via ping.

and that's a blocking call?

It is indeed, that's why I want to change it to some async variant.

hmm.. this could explain, when network connectivity is weak, that things are not really "fluent"... especially when trying to upload a new sketch... is it possible to increase the ping-timeout if the net has a lot of latency?

It is only done when making a connection.
The experienced load of a MQTT connect retry which will fail is a lot more noticeable.
There are not really a lot of hosts we're connecting with, so maybe there should be some async availability check running in the background to help decide whether or not to reconnect.
N.B. a DNS lookup may also be quite blocking when it will eventually fail.
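A background availability check like this could feed a small cache that the connect code consults without blocking. The sketch below only shows the cache side; `HostAvailability`, `Availability` and the maximum age are hypothetical, and how the probe itself runs (async ping or otherwise) is left open.

```cpp
#include <cstdint>

enum class Availability { Unknown, Up, Down };

// Hypothetical cache for the result of a background reachability probe.
class HostAvailability {
 public:
  explicit HostAvailability(std::uint32_t maxAgeMs) : maxAgeMs_(maxAgeMs) {}

  // Called by the background probe whenever a result comes in.
  void report(bool reachable, std::uint32_t nowMs) {
    reachable_ = reachable;
    lastReportMs_ = nowMs;
    hasReport_ = true;
  }

  // Non-blocking lookup. A missing or stale result is reported as Unknown,
  // so the caller can decide whether a blocking attempt is worth it.
  Availability status(std::uint32_t nowMs) const {
    if (!hasReport_) return Availability::Unknown;
    // Unsigned subtraction stays correct when a millis()-style counter wraps.
    const std::uint32_t age = nowMs - lastReportMs_;
    if (age > maxAgeMs_) return Availability::Unknown;
    return reachable_ ? Availability::Up : Availability::Down;
  }

 private:
  std::uint32_t maxAgeMs_;
  std::uint32_t lastReportMs_ = 0;
  bool reachable_ = false;
  bool hasReport_ = false;
};
```

A reconnect routine could then skip the attempt while the status is `Down` and only pay the cost of a blocking connect when the host was recently seen.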

Could it be that 0.0.0.0 is the secondary DNS?

I realised this with DNS, that's why I only use IP addresses now... I don't use MQTT at all, only the FHEM controller plugin as well as regular JSON queries (from FHEM with the HTTPMOD plugin).

One thing that I'm probably going to try is disabling UDP inter-ESPEasy networking... I can't get rid of the feeling that this has some influence on the units, especially when running 15+ units that send out regular updates...

After updating all units to the latest mega commit, all CPU loads went back to "normal"... This morning around 8 I rebooted Nr. 4 and 9, and see what happened to the CPU load: nearly all units had a peak of about 30 min and went back to normal after that...

Just a quick idea: is it possible that received inter-ESPEasy UDP events are "redistributed" to other units and therefore could generate a loop and fill the network stack?
[image: CPU load graph]

I never looked into the code of this 'inter ESPeasy communications', therefore I can't say anything about that.
I would expect this protocol is only sending its own data and not echo'ing the rest.
This protocol consists of 2 parts:

  1. Stating its own presence.
  2. Sending parameter values of plugins via what used to be "global sync".

That last one is now replaced by "controller c_013".

But not sure if there isn't a loop possible. For example with dummy devices, MQTT import, etc.
It may also be different behavior between old versions and new versions that use "controller 13".

OK, thanks for the quick reply... I'm going to turn it off now, i.e. set the UDP port to 0, and see if anything changes...

Add: after changing it I see about 50% of the units rebooting (without being told to do so)... and some "HTTP : connection failed" in the log...

I have noticed a similar problem with the CPU load. The Wemos is running a self-compiled FW with 13 plugins. I am using the REST API with Pimatic. As you can see, for some reason the load goes up to 90% and above.
[image: CPU load graph]

For info: since I turned off inter-ESPEasy networking by setting the port to 0 in the advanced settings, it seems (most of) my issues are gone! All 20 units have uptimes >20 min and are still regularly reporting values. The web interface is up and running. The CPU graph also no longer shows these sudden jumps. Only one unit changed to AP mode; I'll see if it recovers.
This probably needs investigating (in the source)... With lots of units the UDP stuff is probably overloading the network stack...
Hope this helps others that face similar issues...

Here my experience:
6 units all with Static IP and Inter-ESPeasy Network active.
Latest firmware loaded was "Mega 20180505 Manual built twice". (but previous firmwares were working very well too).
They have been running for almost 3 days without any issue.

[image]

That's one of the reasons I want to wait a few days before I do some wifi/network fixes. Just let it run for a while to see what's actually wrong and try to read some issues on the Arduino issues list.
I already noticed that a number of issues suggest to disable power management for some wifi configurations. Apparently some combinations of ESP8266 + accesspoint do not work well with power management features enabled (which are enabled by default)
So that's an option to add to ESPeasy.

I'm looking for more ideas the next days and Friday/Saturday I have more time to code.

For reference my Access Point is ASUS RT-AC68U with Merlin firmware

I guess most problems are with factory default firmware for the more budget models and perhaps somewhat older accesspoints that were not aware of power saving options of modern wifi devices.
Question remains if it is possible to negotiate such features to auto detect these features.

What about leaving it disabled by default and enabling it only if requested in the settings page?
The power saving options make sense for battery operated devices only, I suppose.

They also make sense for other purposes.
More power means more heat and some are enclosed with sensors.... ;)

In addition to my report of the high processor load: I went back to a FW dated 16/03 with core 2.3.0 and all became normal again. Loads are now max. 25%. The response time of the Wemos is also much better again. With both the 08/05 and 16/03 builds I do not have WiFi disconnects at all. Still no clue what caused the high load. Also, in both cases I did not make use of UDP.

Since disabling UDP the units run without issues, except the ones that have an LCD attached. I think that if you have lots of units talking (>20), they either get too busy decoding all the UDP messages or have some memory leak or similar. That would also explain the spontaneous reboots of units after another one starts. Just MHO... I need to debug more to be sure...

It may also be related to doing longer tasks without time for some calls to yield(), which is also done when calling delay().
I can imagine the LCD plugin (and maybe some others also) do some tasks that take > 10msec.
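One common way to keep a long task from starving the WiFi stack is to cut it into chunks and hand control back between them. The sketch below is hardware-independent and purely illustrative: `runInChunks` is a hypothetical helper, and the yield function is injected so the code also compiles on a PC. On the device, the callback would be something that calls `yield()` or `delay()`.

```cpp
#include <cstddef>
#include <functional>

// Run `total` work items and call `yieldFn` after every `chunk` items
// (but not after the very last item). Returns how often it yielded.
std::size_t runInChunks(std::size_t total, std::size_t chunk,
                        const std::function<void(std::size_t)>& workItem,
                        const std::function<void()>& yieldFn) {
  if (chunk == 0) chunk = 1;  // avoid division by zero, yield after each item
  std::size_t yields = 0;
  for (std::size_t i = 0; i < total; ++i) {
    workItem(i);
    const std::size_t done = i + 1;
    if (done % chunk == 0 && done < total) {
      yieldFn();  // give background tasks (e.g. WiFi) a chance to run
      ++yields;
    }
  }
  return yields;
}
```

The chunk size would have to be picked so that one chunk stays well under the roughly 10 msec mentioned above; that value depends on the plugin and is not something this sketch can determine.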

the units with LCD seem to get busier over time... when trying to reflash I need to reboot them first, because after some hours the upload does not work anymore (or takes ages... )

all other units run fine now, but as said, inter ESP announcing has to be disabled...

at least WiFi is stable like this, no other issues with connectivity so far... (running 20+ units by now)

Do you have logging enabled on the serial port?
Could you maybe set that to "None"?

Hmm... interesting point. I have the default "info" on the serial port logging, but disabled the serial port completely further down... Could that lead to problems?

I'm going to change it to "none" on the units, I'll see if that makes a difference..

I have heard reports before that data which is not fetched from the serial buffers will pile up.
And I just noticed myself that on a running node, when I connected the serial monitor I received data back to the moment it booted, like a minute before.

Here is the difference between core 2.3.0 and 2.4.1. On May 10 I changed the FW from 2.4 back to 2.3. The settings and rules for both are exactly the same.
[image: CPU load graph]

@jopiekr : what is the software you are using to print the CPU load?

@gii1967g it is pimatic. Have it running on an OrangePi One for more than a year without any problems. As a back up also on a Raspberry Pi. pimatic.org

changed all serial-logs to none now... I'll see what happens...

What's also remarkable is the WiFi RSSI: if the units get a weak signal, they can be quite unresponsive... As I have 4 APs running, it seems a bit "random" which one they connect to, and they don't always take the strongest one...

I have noticed that with a signal weaker than about -77 dBm they can become unresponsive. Also, when they are not battery operated, check the lease time renewal of your router. I have made a rule to reboot them after 12000 minutes.
[image]

They should reconnect to the last one. The first reconnect attempt is to the BSSID of the last active connection.
But I guess we should add the BSSID to the settings.

The last few days I have been thinking about the wifi issues a lot, and I came up with a kind of work-around for the bugs probably present in either the core lib or its combination with the various AP firmware versions around.
Currently the core lib does a scan when trying to connect using only an SSID + pwd.
Connecting is much faster when you provide the BSSID + channel, because then it doesn't have to scan.
So why not do the scan ourselves, find the known SSID, store all BSSIDs + channels for the matching networks and then try to connect using only BSSID + channel?
Then you also have automatic failover and full control over when to consider a connection to be renewed (and thus restart services).

You also know the strongest RSSI, so connect to the strongest first and always try to re-use the last used one first.
And when the connection is unstable, try to disable the power-save feature which is enabled by default in the 2.4.x core.
This power save feature may cause wifi reconnects, stalls when browsing through the web pages, or even rejects when connecting.
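The candidate selection described above (scan, keep the matching SSID, strongest first, last used one at the front) can be sketched without any WiFi calls. `ScanResult` and `connectOrder` are hypothetical names; on the device the list would be filled from the scan results, and each candidate would then be tried with its BSSID + channel.

```cpp
#include <algorithm>
#include <array>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical record for one scanned access point.
struct ScanResult {
  std::string ssid;
  std::array<std::uint8_t, 6> bssid;
  std::uint8_t channel;
  int rssi;  // in dBm, closer to 0 is stronger
};

// Build the order in which to try access points for `wantedSsid`:
// the last used BSSID first (if it was seen), then the rest by RSSI.
std::vector<ScanResult> connectOrder(
    const std::vector<ScanResult>& scan, const std::string& wantedSsid,
    const std::array<std::uint8_t, 6>* lastBssid) {
  std::vector<ScanResult> out;
  for (const ScanResult& r : scan) {
    if (r.ssid == wantedSsid) out.push_back(r);
  }
  std::stable_sort(out.begin(), out.end(),
                   [](const ScanResult& a, const ScanResult& b) {
                     return a.rssi > b.rssi;  // strongest first
                   });
  if (lastBssid != nullptr) {
    auto it = std::find_if(out.begin(), out.end(),
                           [&](const ScanResult& r) {
                             return r.bssid == *lastBssid;
                           });
    // Move the last used AP to the front, keeping the others in RSSI order.
    if (it != out.end()) std::rotate(out.begin(), it, it + 1);
  }
  return out;
}
```

Walking this list from front to back gives the failover for free: if the first candidate does not connect, the next strongest one is simply the next entry.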

With the power save feature enabled, the wifi is switched off for a while and then enabled again just in time to receive the beacon signal of the accesspoint.
If those are out of sync, some packets may not arrive to the ESP and will only be retransmitted when the next beacon signal is received. (which may take a while)

I don't know much of the specifics of these power save issues, but I know enough of firmware-development to know standards may not be implemented the same among vendors. So it is very likely there are some combinations of hardware that will operate less optimal with power save features enabled. It may even be some other WiFi device which delays the wifi beacon interval of the accesspoint. There are just too many variables.

Such a temporary disconnect may also affect DNS lookups and other connection interruptions which may stall the ESP for a while. This could also affect CPU-load, since that (poorly defined) value is based on how many times the loop() function is being run in 30 seconds. If a call to some DNS resolve request stalls the ESP, the loop count will then be quite low and thus a high reported CPU load.

@jopiekr You could also issue a reboot I guess from the rules.
I will first look into disabling the power save options (and make it selectable, of course).

@TD-er about WiFi handling: I completely agree about the power-management function. One should be able to turn it off, as it can cause really strange issues and laggy devices...

For the reconnection to the AP I see it differently: I'd always try to connect to the strongest AP first to ensure a good connection, and only after that would I try the last one. Simply because if the "main" AP crashes/reboots/doesn't answer etc., the unit would connect to another one. From that point forward it will always take that (worse) one until that one also fails... It will never go back to the strong one, even when that one comes back up... Also, if you move the devices around, it could select the previous AP even though that one is much farther away or has a weaker signal...

I agree though in scanning within your code and then decide which one to take based on RSSI and known BSSID...

It is only the first attempt to reconnect to the last known BSSID.
This will result in a more stable connection when using repeaters.
Those repeaters can sometimes drop connections when they are too busy handling other tasks and they may appear less strong when scanning. The cheap ones only have one radio for receiving and transmitting, so when they receive data, it may be they are not broadcasting their beacon signal at the moment a scan is performed. So then the RSSI of another AP may appear stronger at that moment.

ok, how about a "regular" scan that keeps track of strength? or at least after a reboot it should reevaluate what accesspoint it chooses...

At reboot and if the first attempt to reconnect fails, it will perform a scan and pick the strongest match for the set SSID.

I will change that into storing the BSSID of the preferred AP in the settings, to make that BSSID always the preferred one.
The reconnects are really fast if the channel is also known. (sub 200 msec is possible, depending on the AP used)
Also setting static IP does take about 20 - 25 msec, compared to DHCP which takes 2.5 - 10 seconds.
That would be ideal for battery powered devices.

So there is room for improvement :)

Sounds very promising. Up till now i got 2 months of battery life with 2AA batteries step up to 3.3V and one DHT sensor sending every 900s. This is on a RobotDyn ESP Pro board with removed voltage regulator.

@TD-er Just tried the build from yesterday. I can't see any difference in connect times if I use a static IP or dhcp. Both connects take about 3 seconds after reset.

Then you have a quick DHCP server.
My Fritzbox takes about 2.5 seconds for the DHCP, after the 2.5 seconds connecting to wifi.
So the MQTT connect + NTP time setup takes about 6 - 6.5 seconds from boot.

Fritzbox 7490.

Quick update: after disabling inter-ESP UDP (and removing C13 from the 1M units) and setting the serial log to "none" (also disabling the serial port), nearly all my units have run stably for the last 30+ hours... A mix of D1 Mini, D1 Pro, Sonoff Basic, Dual and 4ch... 20+ units... running on a commit from mega-20180511, self-compiled with the Arduino IDE...

I have the 7581 as modem and 3* 1750E as accesspoints.

BTW: I'm running on Mikrotik AP's (and one really old USR). Going to update the units now with the latest commit from tonight...

The latest commits currently only deal with the JSON related code (log viewer + update sensor values).
Not yet with the wifi code.

Sure, but wifi seems to be "stable enough" with the latest commits, so I want to be up to date with my units ;)

Sorry to bring this up again, but I still experience the issue that the ESP thinks it has an IP of 0.0.0.0 and stops talking to the server, but on a network and web level everything is fine! See the attached screenshot. I tried it with different version combinations of ESPEasy and esp8266...
Does someone else experience this?
[image: screenshot]

Very strange indeed, since I wonder how you can see the web page at all with such IP config.
Or do you connect to the ESP via its accesspoint function?

No, connected via the network directly. Ping and HTTP work without issues, speedy (see the client IP is 10.0.0.10, which is my laptop; the internal net is 10.0.0.0/16)... Yes, quite strange though...
It could be DHCP related or so... After a reboot everything is fine again.

could it be related to the fact, that I only enabled one controller (FHEM) and no MQTT enabled controller? I saw a lot of mqtt code in the sources, outside of the controller plugin... just a guess... but I think it's not really related...

Maybe dhcp expired and not renewed?

Could very well be... Probably the network stack still has an active IP, but if the renew fails the corresponding config gets zeroed... I'm not sure how the DHCP code works, but it could explain the state I'm seeing.

I also found that when the server (in my case FHEM) does not respond fast enough a number of times, the units start to reboot after some time. It could be a problem in the plugin code or the underlying TCP stack... I did some performance tweaking on the server, and since then the units run much more stably (some have uptimes over 48 h now).

I've already seen 10+ days of connection-uptimes (uptime without any reconnect) with the latest versions.
It is possible to get the units to fill up their ram with lots of requests.
And I've seen the LWIP doing strange things when doing lots of requests. (reading from memory not containing data related to that request.)

Today one node stopped responding again. It disconnected from wifi, and I could not connect to the "esp" network. It stopped sending data to the controller, and I had to reboot it. Maybe a watchdog would be a good solution: if, for example, the node has been disconnected from wifi for an hour, it reboots. Or maybe it can be done with rules, but I do not know how :)
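The "reboot after an hour without wifi" idea could be tracked with a small software timer like the one sketched here. This is not an existing ESPEasy feature and not the hardware watchdog; `DisconnectWatchdog` is a hypothetical name, and the caller would have to feed it the connect/disconnect events and a millisecond clock.

```cpp
#include <cstdint>

// Hypothetical helper: decides when a node has been offline for too long.
class DisconnectWatchdog {
 public:
  explicit DisconnectWatchdog(std::uint32_t limitMs) : limitMs_(limitMs) {}

  void onConnected() { disconnected_ = false; }

  // Only the first disconnect of an outage starts the timer; repeated
  // disconnect events during the same outage do not reset it.
  void onDisconnected(std::uint32_t nowMs) {
    if (!disconnected_) {
      disconnected_ = true;
      sinceMs_ = nowMs;
    }
  }

  // Unsigned subtraction keeps this correct when the millisecond counter
  // wraps around (a 32 bit counter wraps after roughly 49.7 days).
  bool shouldReboot(std::uint32_t nowMs) const {
    return disconnected_ && (nowMs - sinceMs_) >= limitMs_;
  }

 private:
  std::uint32_t limitMs_;
  std::uint32_t sinceMs_ = 0;
  bool disconnected_ = false;
};
```

Created with a limit of 3600000 ms, `shouldReboot()` turns true one hour after the first disconnect unless `onConnected()` was called in between.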

Today I experienced a lot of Watchdog actions while debugging a plugin.
And I know now that sometimes when the watchdog intervenes, a node can remain halted.
So a watchdog is not the perfect solution.

Is it possible that hanging node of yours was never rebooted after flashing? (press reset or power cycle)

It is possible that there was no reboot after flashing. But it was a flashing via www, not a serial.

OK, then it shouldn't matter, if you flashed OTA.
As long as there has been a proper reset/reboot after the serial flash.

Well, after struggling with many stability issues and strange wifi troubles with the latest firmware releases, I had to go back to earlier versions in the end. For instance, until a power outage happened recently, one old ESP12E node with mega-20180311dev had been working for 70 days, sending temperature data to ThingSpeak.
On another node, after upgrading to mega-20180522dev, I was experiencing a reboot due to an exception about every 24 hours, despite a reset to defaults and running without any device configured, no NTP configured, no controller... It never survived 48 hours. After downgrading to mega-20180324 two and a half days ago, I kept the config and just enabled NTP again, and so far it's running. Although there are some bugs and missing features in these older versions, for me they are currently the best choice.

There is not much anyone can do if the issues are not reliably reproducible.
What helps a bit is scheduling a reboot every night. You can use the rules for that.

I know, but I prefer a stable node without scheduled reboots. I don't know whether the stability decreased significantly due to switching to core 2.4.1 (which is maybe not mature enough yet) or whether it's related to the ESP Easy redesign, but it happened despite the maximal effort of all ESP Easy contributors. I really appreciate the hard work of all of you, but currently I can't use the latest ESP Easy releases anymore.

I think it is also related to the used plugin or maybe combination of plugins.

Last week I worked on looking into the effects of timings and I am sure it will have significant effect on time-critical tasks.

I just looked at some of my nodes, all running official builds:

Binary filename ESP_Easy_mega-20180513_normal_ESP8266_4096.bin

Unit 3

  • Uptime 16 days 18 hours 26 minutes
  • Connected 5d23h07m
  • Last Disconnect Reason (201) No AP found
  • Number reconnects 2

Unit 5

  • Uptime 11 days 5 hours 22 minutes
  • Connected 5d22h57m
  • Last Disconnect Reason (201) No AP found
  • Number reconnects 4

Unit 6

  • Uptime 11 days 5 hours 23 minutes
  • Connected 45 m 1 s
  • Last Disconnect Reason (201) No AP found
  • Number reconnects 58

Binary filename ESP_Easy_mega-20180619_test_ESP8266_4096.bin

Unit 7

  • Uptime 13 days 20 hours 51 minutes
  • Connected 5d23h05m
  • Last Disconnect Reason (202) Auth fail
  • Number reconnects 2

About 6 days ago, I had some issues with one of my WiFi accesspoints, which I had to restart.

Unit 6 is connected to the same accesspoint as units 3 & 7, but it has a lot more reconnects.
Those 3 units are right next to each other, within a meter of each other, to compare different CO2 sensors (MH-Z19 A, B and SenseAir S8), and are all powered by the same power supply (IKEA 3-port USB charger).

The only difference between them is that the one with more reconnects has the Senseair sensor.
So it could be the implementation of that sensor does put more strain on the WiFi routine (less delay calls), which could lead to WiFi instability.

Could you give a list of plugins used?
Also, I made a pull request yesterday which logs a lot of timing statistics. Maybe you could make a build based on that one and run it for a few minutes to get some idea of which plugins use way too much time.

I always use the official builds, as I am not able to prepare and maintain the development environment for these devices.
The mentioned release mega-20180522dev, as I described above, had a completely empty configuration, so absolutely no plugins used, no rules; I even deleted the default Controller Nr1 in the end. Nothing could stop the node from rebooting due to an exception at intervals of about 24 - 40 hours.

I don't know if this is a wifi issue (it does not look like it): I have set static IP addresses for wifi, but ESPEasy still fetches an address via DHCP and sets a different one.

1104 : WD   : Uptime 0 ConnectFailures 0 FreeMem 21800
1105 : SW   : State 1.00
1106 : EVENT: x#w=1.00

scandone

state: 0 -> 2 (b0)
state: 2 -> 3 (0)
state: 3 -> 5 (10)
add 0
aid 2
cnt

connected with BJ3, channel 12
dhcp client start...
4350 : WIFI : Connected! AP: BJ3 (E8:DE:27:4F:66:86) Ch: 12 Duration: 3760 ms
4351 : EVENT: WiFi#ChangedAccesspoint
4355 : IP   : Static IP : 192.168.2.184 GW: 192.168.2.1 SN: 192.168.2.0 DNS: 8.8.8.8
4360 : WIFI : Static IP: 0.0.0.0 (ESP5-5) GW: 0.0.0.0 SN: 0.0.0.0   duration: 11 ms
4367 : EVENT: WiFi#Connected
4374 : Webserver: start
4374 : WIFI  : Arduino wifi status: WL_DISCONNECTED ESPeasy internal wifi status: ESPEASY_WIFI_SERVICES_INITIALIZED
ip:192.168.2.123,mask:255.255.255.0,gw:192.168.2.1
4400 : WIFI : Static IP: 192.168.2.123 (ESP5-5) GW: 192.168.2.1 SN: 255.255.255.0   duration: 50 ms
4401 : EVENT: WiFi#Connected
4406 : WIFI  : Arduino wifi status: WL_CONNECTED ESPeasy internal wifi status: ESPEASY_WIFI_SERVICES_INITIALIZED
4500 : MQTT : Intentional reconnect
4501 : LoadFromFile: config.dat index: 28672 datasize: 724


@uzi18 Have you set all fields for static IP config?

If so, then I am afraid it is a known issue (to me), where there is some previous session stored in a region where we don't (yet) erase data at a factory reset.
This means at this moment there is no other way than to wipe all of the flash and start again with a recent version of ESPeasy.
The later versions do set a value to not make the wifi settings persistent.
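As a side note to the "all fields set" question: the static IP line in the log above prints SN: 192.168.2.0, which has the shape of a network address rather than a subnet mask. Whether that is what was entered, and whether it is related to the DHCP fallback, is not established in this thread. A sanity check on the static IP fields could look like the sketch below; all function names are hypothetical and this is not ESPEasy code.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Parse a dotted-quad string into a host-order 32 bit value.
// Returns false on malformed input or octets above 255.
bool parseIPv4(const std::string& text, std::uint32_t& out) {
  unsigned a = 0, b = 0, c = 0, d = 0;
  char tail = 0;
  // The trailing %c only matches if extra characters follow the address.
  if (std::sscanf(text.c_str(), "%u.%u.%u.%u%c", &a, &b, &c, &d, &tail) != 4) {
    return false;
  }
  if (a > 255 || b > 255 || c > 255 || d > 255) return false;
  out = (a << 24) | (b << 16) | (c << 8) | d;
  return true;
}

// A usable subnet mask is a run of 1 bits followed by a run of 0 bits,
// so its bitwise inverse must have the form 2^k - 1.
bool isContiguousMask(std::uint32_t mask) {
  const std::uint32_t inv = ~mask;
  return mask != 0 && (inv & (inv + 1)) == 0;
}

// IP and gateway must fall into the same network under the given mask.
bool sameSubnet(std::uint32_t ip, std::uint32_t gw, std::uint32_t mask) {
  return (ip & mask) == (gw & mask);
}
```

With the values from the log, 255.255.255.0 passes `isContiguousMask` while 192.168.2.0 does not.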

@TD-er Yes, all fields are filled in, as you can see in the log.
I have flashed a NEW module, with
INFO : Plugins: 71 [Normal] [Testing] (ESP82xx Core 2_4_1, NONOS SDK 2.2.1(cfd48f3), LWIP: 2.0.3)
and it behaves like that.
The module was only taken from its original bag and flashed; ESPEasy was never on it before.

@TD-er: Two thoughts on that:

  1. My non-ESPEasy heating-to-MQTT broadcaster is working flawlessly using the latest pubsub client. It does re-connect and so on. Maybe the ESPEasy connectivity surveillance is interfering with what's already being done by the core library. Should we have a way to disable all that additional ping-reconnect-wifistate stuff? Just for testing?
  2. Are you aware of the wifi auto power down? https://blog.creations.de/?p=149

Would be nice if someone with rather unstable wifi could test this PR: https://github.com/letscontrolit/ESPEasy/pull/1562

@TD-er I just stumbled upon https://github.com/esp8266/Arduino/pull/4718
It deals with lwIP reconnect issues and has been fixed in the meantime. Maybe you want to skim through it...

I always use the latest GIT Version from the esp8266.. that's why I probably don't see the 0.0.0.0 issue anymore...

Yesterday one node again lost contact with the network after a restart of the router.
The rules still worked correctly.
This is my own wall switch, and it is difficult to disassemble it for a reset.

To make this easier in the future, I modified the rules:

on S1#Switch do
  timerSet,1,5
  if [R1#Relay]=1
    gpio,12,0
  else
    gpio,12,1
  endif
endon

on S2#Switch do
  if [R2#Relay]=1
    gpio,13,0
  else
    gpio,13,1
  endif
endon

on Rules#Timer=1 do
  if [S1#Switch]=1.00
    reboot
  endif
endon

Now life will be simpler :))

I do reboot every 24h. This revives one node with a firmware from 4 weeks ago twice a week.
Maybe this should be a permanent feature....

Is this still an issue? If so please reopen.

Our longest thread on the issue list....
