Mongoose: No primary server available

Created on 1 Dec 2015 · 76Comments · Source: Automattic/mongoose

I have an issue that is rather difficult to debug, and was wondering if anyone sees anything wrong with my configuration.

Error no primary server available

Nodejs version 4.2.1 and mongoDB version 3.0.7 with mongoose 4.2.8.

This seems to happen randomly and will open many connection until I finally restart the node process. The cluster is healthy at all times during this error. This error happens hundreds of times per hour. There does not seem to be any consistency as to when the error will begin. For example, it occurs when the cluster is operating normally and no changes to the primary have been made.

This is what the db stats look like. As you can see the number of connections will steadily increase. If I kill the node process and start a new one everything is fine.

screen shot 2015-11-30 at 5 21 01 pm

Config

  // Connect
  mongoose.connect(config.mongo.connectionString, {
    server: {
      socketOptions: {
        socketTimeoutMS: 5 * 60 * 1000,
        keepAlive: 1
      }
    },
    replset: {
      socketOptions: {
        socketTimeoutMS: 5 * 60 * 1000,
        keepAlive: 1
      }
    }
  });

Connection String

mongodb://username:[email protected]:27000,mongo-2.cz.0200.mongodbdns.com:27000,mongo-3.cz.0200.mongodbdns.com:27000/dbase

Stack trace

node_modules/mongoose/node_modules/mongodb/node_modules/mongodb-core/lib/topologies/replset.js:860pickServer    
node_modules/mongoose/node_modules/mongodb/node_modules/mongodb-core/lib/topologies/replset.js:437command   
node_modules/mongoose/node_modules/mongodb/lib/replset.js:392command    
node_modules/mongoose/node_modules/mongodb/lib/db.js:281executeCommand  
node_modules/mongoose/node_modules/mongodb/lib/db.js:305command 
node_modules/newrelic/lib/instrumentation/mongodb.js:177wrapped 
node_modules/mongoose/node_modules/mongodb/lib/collection.js:2327findAndModify  
node_modules/mongoose/node_modules/mongodb/lib/collection.js:2265findAndModify  
node_modules/newrelic/lib/instrumentation/mongodb.js:177wrapped [as findAndModify]  
node_modules/mongoose/lib/drivers/node-mongodb-native/collection.js:136(anonymous function) [as findAndModify]  
node_modules/mongoose/node_modules/mquery/lib/collection/node.js:79findAndModify    
node_modules/mongoose/lib/query.js:1833_findAndModify   
node_modules/mongoose/lib/query.js:1621_findOneAndUpdate    
node_modules/mongoose/node_modules/kareem/index.js:156none  
node_modules/mongoose/node_modules/kareem/index.js:18none

can't reproduce help wanted

Source

ChrisZieba

All 76 comments

Nothing leaps out at the moment. Are you sure none of the mongodb servers are crashing? Also, can you maintain a steady connection using the shell?

vkarpov15 on 1 Dec 2015

Running the command db.runCommand( { replSetGetStatus : 1 } ) while the error was occurring produces "health" : 1, on all 3 nodes. There is also a primary set "stateStr" : "PRIMARY", on one of the nodes.

ChrisZieba on 1 Dec 2015

Are you connecting using the same connection string, using the DNS? Also looks like your storage flat-lined after the issue, can you double check and see if you've run out of hard drive space on one of your machines?

vkarpov15 on 1 Dec 2015

Are you connecting using the same connection string, using the DNS?

I wasn't using the same connection string. Do you think using the private EC2 IP addresses would resolve this?

Not sure what's causing the storage to max out like that, but even after booting new instances the issue with no primary servers still occurs with plenty of space available.

ChrisZieba on 1 Dec 2015

The EC2 IP addresses may help, depending on how your replica set is configured. Can you show me the output of rs.status() from the shell?

vkarpov15 on 2 Dec 2015

This is the rs.status() while the connections are on the rise.

{
    "set" : "mongo2",
    "date" : ISODate("2015-12-04T23:39:32.520Z"),
    "myState" : 1,
    "members" : [
        {
            "_id" : 6,
            "name" : "mongo-8.loc.0600.mongodbdns.com:27000",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 444053,
            "optime" : Timestamp(1449272372, 32),
            "optimeDate" : ISODate("2015-12-04T23:39:32Z"),
            "lastHeartbeat" : ISODate("2015-12-04T23:39:32.507Z"),
            "lastHeartbeatRecv" : ISODate("2015-12-04T23:39:31.442Z"),
            "pingMs" : 0,
            "syncingTo" : "mongo-9.loc.0600.mongodbdns.com:27000",
            "configVersion" : 29
        },
        {
            "_id" : 7,
            "name" : "mongo-9.loc.0600.mongodbdns.com:27000",
            "health" : 1,
            "state" : 1,
            "stateStr" : "PRIMARY",
            "uptime" : 444056,
            "optime" : Timestamp(1449272372, 39),
            "optimeDate" : ISODate("2015-12-04T23:39:32Z"),
            "electionTime" : Timestamp(1449097485, 1),
            "electionDate" : ISODate("2015-12-02T23:04:45Z"),
            "configVersion" : 29,
            "self" : true
        },
        {
            "_id" : 8,
            "name" : "mongo-10.loc.0600.mongodbdns.com:27000",
            "health" : 1,
            "state" : 2,
            "stateStr" : "SECONDARY",
            "uptime" : 444053,
            "optime" : Timestamp(1449272371, 111),
            "optimeDate" : ISODate("2015-12-04T23:39:31Z"),
            "lastHeartbeat" : ISODate("2015-12-04T23:39:31.904Z"),
            "lastHeartbeatRecv" : ISODate("2015-12-04T23:39:30.903Z"),
            "pingMs" : 2,
            "syncingTo" : "mongo-8.loc.0600.mongodbdns.com:27000",
            "configVersion" : 29
        }
    ],
    "ok" : 1
}

ChrisZieba on 5 Dec 2015

Nothing out of the ordinary in the replica set. Do you have any other relevant code samples, for instance, do you have any code that's reacting to mongoose connection events?

Another potential issue worth considering, are you using an up-to-date new relic agent? I'd try running without new relic and see if this still happens, new relic monkey-patches the mongodb driver so that can sometimes lead to unexpected behavior.

vkarpov15 on 5 Dec 2015

We've been outputting the mongoose connection events:

['connecting', 'connected', 'open', 'disconnecting', 'disconnected', 'close', 'reconnected', 'error', 'fullsetup'].forEach(function(name) {
  mongoose.connection.on(name, function() {
    notifySlack('Mongoose event: ' + name);
  });
});

This is what some of the logs look like

[4:30] Mongoose event: fullsetup
[4:30] Mongoose event: reconnected
[4:30] Mongoose event: open
[4:30] Mongoose event: connected
[4:30] Mongoose event: reconnected
[4:30] Mongoose event: reconnected
[4:30] Mongoose event: reconnected
[4:30] Mongoose event: reconnected
[4:30] Mongoose event: reconnected
[4:30] Mongoose event: reconnected
[4:30] Mongoose event: fullsetup
[4:30] Mongoose event: connected
[4:30] Mongoose event: open
[4:30] 
{
 "err": {
   "name": "MongoError",
   "message": "no primary server available"
 }
}

I was at the mongodb days event this week, where I was able to schedule some time and show this issue to one of the senior engineers at MongoDB, and they were not sure what the issue was. They did mention to add the replication set and max pool size to the connection string, which has not resolved this issue, unfortunately.

We also tried disabling the keep alive, and setting it to a smaller value on the instances, but that also did not seem to resolve this.

We;re using newrelic version 1.24.0, and mongo-express-patch version 0.21.1. I will try running without newrelic to see if that resolves this.

ChrisZieba on 6 Dec 2015

Hmm yeah it does look like mongoose is reconnecting for some reason. Can you show me the output of npm list | grep "mongoose" and npm list | grep "mongo"?

vkarpov15 on 6 Dec 2015

$ npm list | grep "mongoose"
├─┬ [email protected]

$ npm list | grep "mongo"
├─┬ [email protected]
│ ├─┬ [email protected]
│ │ ├─┬ [email protected]
├─┬ [email protected]
├─┬ [email protected]
│ ├─┬ [email protected]
│ │ ├── [email protected]

ChrisZieba on 6 Dec 2015

What are you using mongodb-core for? Also, are you running with mongo-express enabled in prod?

vkarpov15 on 7 Dec 2015

Currently not using mongodb-core for anything. Do you think the version mismatch between the mongoose dependency may be causing issues?

We do have mongo-express enabled in production.

ChrisZieba on 7 Dec 2015

Not that I know of. I'm just trying to see if there are other connections to mongodb that might be contributing to this issue. I've done a little googling - are you using the same DNS names for your connection string as the ones that appear in rs.status()? According to this, you may see similar issues if you use different DNS for connection string than the your replica set thinks.

vkarpov15 on 7 Dec 2015

This error will occur when using the same DNS in the connection string as the "syncingTo" attribute in rs.status(). It also occurs when using the internal ec2 IP in the connection string.

The only thing I haven't tried yet is just setting connectWithNoPrimary to true.

ChrisZieba on 7 Dec 2015

I'd also try running with mongo-express off too. That might be causing issues...

vkarpov15 on 8 Dec 2015

We're running into the same issue. We have a site that is experiencing sustained load of of about 100 RPM with peaks in the 500-700 rpm+. It seems that we see this throughout the process even during relatively quite periods.

Environment:
Heroku - 75 2x dynos - Node.JS 5.1.1
Database - MongoLabs Dedicated Cluster M4 - Version 3.0.7

Connection String:
mongodb://_:*_@ds043294-a0.mongolab.com:43294,ds043294-a1.mongolab.com:43294/heroku_hf8q79dt?replicaSet=rs-ds043294

NPM:

npm list | grep "mongoose"
├─┬ [email protected]
├── [email protected]
├── [email protected]
├─┬ [email protected]

Connection.js

// Mongoose import
var mongoose = require('mongoose');
var options = {
    server: {
        socketOptions: {
            keepAlive: 1,
            poolSize: 10,
            connectTimeoutMS: 30000,
            socketTimeoutMS: 30000
        }
    },
    replset: {
        socketOptions: {
            keepAlive: 1,
            poolSize: 10,
            connectTimeoutMS: 30000,
            socketTimeoutMS: 30000
        }
    }
};

mongoose.connect((process.env.MONGOLAB_URI || "mongodb://localhost/test"), options, function(error) {
    if (error) {
        console.log(error);
    }
});

module.exports = {
    mongoose: mongoose
};

Logging:
We've enabled a fair amount of monitoring to try and debug this, so i've included our Raygun stack traces in the even that this would help debug. _Note:_ This is the exact same line number that @ChrisZieba showed in the trace above.

Message: no primary server available
Object.pickServer in /app/node_modules/mongodb-core/lib/topologies/replset.js:860
ReplSet.ReplSet.command in /app/node_modules/mongodb-core/lib/topologies/replset.js:437
ReplSet.ReplSet.command in /app/node_modules/mongodb/lib/replset.js:392
Object.executeCommand in /app/node_modules/mongodb/lib/db.js:281
Db.Db.command in /app/node_modules/mongodb/lib/db.js:305
Object.wrapped in /app/node_modules/newrelic/lib/instrumentation/mongodb.js:185
Object.findAndModify in /app/node_modules/mongodb/lib/collection.js:2327
Collection.Collection.findAndModify in /app/node_modules/mongodb/lib/collection.js:2265
Object.wrapped in /app/node_modules/newrelic/lib/transaction/tracer/index.js:155
Object.wrappedQuery in /app/node_modules/newrelic/lib/instrumentation/mongodb.js:218
Object.wrapped in [as findAndModify] (/app/node_modules/newrelic/lib/instrumentation/mongodb.js:188
NativeCollection.NativeCollection.(anonymous in function) [as findAndModify] (/app/node_modules/mongoose/lib/drivers/node-mongodb-native/collection.js:136
NodeCollection.NodeCollection.findAndModify in /app/node_modules/mquery/lib/collection/node.js:79
Query.Query._findAndModify in /app/node_modules/mongoose/lib/query.js:1833
Query.Query._findOneAndUpdate in /app/node_modules/mongoose/lib/query.js:1621
unknown.[anonymous] in /app/node_modules/kareem/index.js:156
unknown.[anonymous] in /app/node_modules/kareem/index.js:18
Object.wrapped in /app/node_modules/newrelic/lib/transaction/tracer/index.js:155
Object.doNTCallback0 in node.js:430
process.process._tickCallback in node.js:359

Monitoring:
2015-12-09_22-22-51

rmartin on 10 Dec 2015

That stack trace really only tells me that 1) you're using new relic (which is very questionable, since new relic does a lot of monkey-patching of the mongodb driver), and 2) the mongodb driver thinks that there is no primary available, but I'm not sure why.

Try enabling the mongodb driver's debug mode by adding replset: { loggerLevel: 'debug' } to your connection options, that is:

var options = {
    server: {
        socketOptions: {
            keepAlive: 1,
            poolSize: 10,
            connectTimeoutMS: 30000,
            socketTimeoutMS: 30000
        }
    },
    replset: {
        loggerLevel: 'debug',
        socketOptions: {
            keepAlive: 1,
            poolSize: 10,
            connectTimeoutMS: 30000,
            socketTimeoutMS: 30000
        }
    }
};

This will log a lot of driver debug data to stdout and help us figure out what's going wrong. Can you capture this data around when this "No primary server found" error occurs?

vkarpov15 on 10 Dec 2015

Thanks @vkarpov15 ,

We have added that and will report back as soon as we have another one triggered.

Cheers,
Roy

rmartin on 10 Dec 2015

I don't think newrelic is the problem here. We tried running without it and this issue persists. Will collect some log data from loggerLevel: 'debug' and post here.

ChrisZieba on 12 Dec 2015

Thanks, let me know if you manage to catch more details on the error.

vkarpov15 on 13 Dec 2015

Another data point: Mongoose triggers the "reconnected" event over and over as the connection count increases.

The "no primary server available" errors usually trigger _after_ the connection count has already begun to climb.

paton on 13 Dec 2015

We as well have experienced this issue. With have a Node app hosted on Heroku with MongoLab.
Last week we suddenly lost connection with the database and kept getting the Error no primary server available message. Restarting our app solved the issue.
Both Heroku and MonogLab saw nothing in their logs.
I hope someone finds a solution for this.

ashercoren on 15 Dec 2015

Bump - we're seeing this on node v4.2.3 mongoose v4.1.5 on a large production deployment. Hard to wrangle this issue as it:

does not error consistently which prevents us from taking action (restart process/taking out node)
happens randomly and seems uncorrelated to mongo replset status

sansmischevia on 15 Dec 2015

@sansmischevia are you using mongolab + heroku as well?

vkarpov15 on 15 Dec 2015

^ We're experiencing this problem in a large production deployment on AWS EC2 with self-hosted mongodb servers via Cloud Manager.

paton on 15 Dec 2015

Hello,

We would also like to chime in.
We are running node v0.12.8, mongo v2.6.11 with mongoose v4.1.11.

$ npm list | grep "mongo"
├─┬ [email protected]
│ └─┬ [email protected]
│   ├─┬ [email protected]
├─┬ [email protected] 
│ ├─┬ [email protected]
│ │ ├─┬ [email protected]
└─┬ [email protected]
  └─┬ [email protected]
    ├─┬ [email protected]

$ npm list | grep "mongoose"
├─┬ [email protected]

It is often reproducible during an operation that seeds the database, involving a lot of queries. Our application seems to be unaffected after this occurs. No errors in mongo log and our three node replica set is healthy during this time.

We will try loggerLevel: 'debug' and report back.

vincenthuynh on 15 Dec 2015

@vkarpov15 we're on mongolab replsets + ec2 directly

sansmischevia on 15 Dec 2015

I am experiencing this issue on mongolab as well.

jeffwhelpley on 16 Dec 2015

We're also experiencing this issue on MongoLab and Modulus.

jf-emberex on 16 Dec 2015

have a look at https://jira.mongodb.org/browse/NODE-622 and if anyone can provide a full set of logs that would be extremely helpful so we can reproduce it.

christkv on 16 Dec 2015

Gonna chime in here, we're not using mongoose, but the native MongoDB client. Getting the same no primary server available error here. We're running a replica set on EC2 instance inside a private VPC, our connection string is the private IP addresses of the instances. MongoDB v3.0.3. It seems to me this happens when there is a high throughput of queries, as in general use the error does not occur.

            serverOpts = {
                server: {
                    sslValidate: false,
                    sslCA: ca,
                    socketOptions: {
                        connectTimeoutMS: 30000,
                        socketTimeoutMS: 180000
                    }
                },
                replSet: {
                    connectWithNoPrimary: false,
                    sslValidate: false,
                    sslCA: ca,
                    socketOptions: {
                        connectTimeoutMS: 30000,
                        socketTimeoutMS: 180000
                    }
                }
            };

Crispy1975 on 18 Dec 2015

Looks like there's a fix for this in upcoming driver releases: NODE-622

It's never too early for presents! :)

vincenthuynh on 23 Dec 2015

The fixed version has already been published on NPM https://www.npmjs.com/package/mongodb.

iclanzan on 23 Dec 2015

I can confirm that we no longer receive the error. :tada:

Crispy1975 on 23 Dec 2015

PR for mongodb 2.1.2 here: https://github.com/Automattic/mongoose/pull/3712

sansmischevia on 23 Dec 2015

Still seeing this error after upgrading mongoose to 4.3.4, which uses mongo core 2.1.2. https://jira.mongodb.org/browse/NODE-622 has been reopened

ChrisZieba on 28 Dec 2015

+1 I just noticed this happening on our production server too. I don't see any pattern of why. Using node 4.2.4 with mongoose 4.3.4 and mongodb 3.0.8. I use mongodb's MMS services to monitor my cluster and I have no seen any alerts during the time when I get: MongoError: no primary server available

amit777 on 13 Jan 2016

@amit777 Can you post your connection string and options? Also, did this occur during an unusually heavy workload, for example, a lot of writes to the database?

ChrisZieba on 13 Jan 2016

Chris, it's definitely happen during write operations, though I wouldn't say our load is particularly heavy. We have a couple nodes in a cluster where each node will independently write to mongo.

Here is how we connect:


var mongoose = require('mongoose');
var mongodb = {};

var connect = function () {
mongodb.db = "mongodb://node1:27017,node2:27017,node3:27017/myapp";
mongodb.dbOptions = {
      "db": {"native_parser": true},
      "replSet": {
        "rs_name": "mongocluster",
        "socketOptions": { "keepAlive": 1, "connectTimeoutMS": 30000, "socketTimeoutMS": 60000 }
        }
    };
  mongoose.connect(config.get('mongodb.db'), config.get('mongodb.dbOptions'));
};
connect();

amit777 on 13 Jan 2016

👎1 👍1

I also just noticed my mongod logs are filling up really fast with connection and disconnections messages:

2016-01-13T13:32:15.418-0500 I NETWORK  [conn91700536] end connection 192.168.1.50:33189 (5558 connections now open)
2016-01-13T13:32:15.418-0500 I NETWORK  [conn91700534] end connection 192.168.1.50:33187 (5558 connections now open)
2016-01-13T13:32:15.418-0500 I NETWORK  [conn91700540] end connection 192.168.1.50:33193 (5557 connections now open)
2016-01-13T13:32:15.418-0500 I NETWORK  [conn91700538] end connection 192.168.1.50:33191 (5558 connections now open)
2016-01-13T13:32:15.418-0500 I NETWORK  [conn91700542] end connection 192.168.1.50:33195 (5557 connections now open)
2016-01-13T13:32:15.418-0500 I NETWORK  [conn91700532] end connection 192.168.1.50:33185 (5556 connections now open)
2016-01-13T13:32:15.419-0500 I NETWORK  [conn91700533] end connection 192.168.1.50:33186 (5552 connections now open)
2016-01-13T13:32:15.419-0500 I NETWORK  [conn91700535] end connection 192.168.1.50:33188 (5552 connections now open)
2016-01-13T13:32:15.419-0500 I NETWORK  [conn91700537] end connection 192.168.1.50:33190 (5552 connections now open)
2016-01-13T13:32:15.419-0500 I NETWORK  [conn91700541] end connection 192.168.1.50:33194 (5551 connections now open)
2016-01-13T13:32:15.419-0500 I NETWORK  [conn91700543] end connection 192.168.1.50:33196 (5551 connections now open)
2016-01-13T13:32:15.419-0500 I NETWORK  [conn91700539] end connection 192.168.1.50:33192 (5552 connections now open)
2016-01-13T13:32:15.548-0500 I NETWORK  [initandlisten] connection accepted from 192.168.1.50:36754 #91705950 (5548 connections now open)
2016-01-13T13:32:15.549-0500 I NETWORK  [initandlisten] connection accepted from 192.168.1.50:36755 #91705951 (5549 connections now open)
2016-01-13T13:32:15.550-0500 I NETWORK  [initandlisten] connection accepted from 192.168.1.50:36756 #91705952 (5550 connections now open)
2016-01-13T13:32:15.550-0500 I NETWORK  [initandlisten] connection accepted from 192.168.1.50:36757 #91705953 (5551 connections now open)
2016-01-13T13:32:15.550-0500 I NETWORK  [initandlisten] connection accepted from 192.168.1.50:36758 #91705954 (5552 connections now open)
2016-01-13T13:32:15.551-0500 I NETWORK  [initandlisten] connection accepted from 192.168.1.50:36760 #91705955 (5553 connections now open)
2016-01-13T13:32:15.551-0500 I NETWORK  [initandlisten] connection accepted from 192.168.1.50:36759 #91705956 (5554 connections now open)
2016-01-13T13:32:15.551-0500 I NETWORK  [initandlisten] connection accepted from 192.168.1.50:36762 #91705957 (5555 connections now open)
2016-01-13T13:32:15.551-0500 I NETWORK  [initandlisten] connection accepted from 192.168.1.50:36761 #91705958 (5556 connections now open)
2016-01-13T13:32:15.553-0500 I NETWORK  [initandlisten] connection accepted from 192.168.1.50:36763 #91705959 (5557 connections now open)
2016-01-13T13:32:15.553-0500 I NETWORK  [initandlisten] connection accepted from 192.168.1.50:36764 #91705960 (5558 connections now open)
2016-01-13T13:32:15.554-0500 I NETWORK  [initandlisten] connection accepted from 192.168.1.50:36765 #91705961 (5559 connections now open)

amit777 on 13 Jan 2016

Here is some additional info that may help to debug.. I'm beginning to think there may be some bug related to connection pooling. After I restart my node processes, I see a bunch of new connections show up in the mongod.log. Then after about a minute, I see a bunch of end-connection messages in mongod.log.

It seems like the connect/disconnect amplifies faster and faster over time, (though I'm still trying to confirm that).

amit777 on 13 Jan 2016

the typical situation causing this is something like.

Replicaset contains hosts that cannot be resolved by the driver. When the driver connects it uses the replicaset as the canonical source for all connections. Reconnections will use those addresses. They MUST be resolvable by the driver.

You should also avoid using IP addresses they are a source of lots of problems like this, use fully qualified host names (no short names)

christkv on 13 Jan 2016

@christkv if the OS is able to resolve the hosts (ie, by doing ping ), does this mean the driver should be able to resolve as well?

amit777 on 13 Jan 2016

it should yes but you can always use telnet hostname port to check.

christkv on 13 Jan 2016

yeah, I'm able to telnet to the host and port.. (all the database host's have /etc/hosts entries on the application servers).

After our app starts up and the connection pool is created, should there ever being disconnects and reconnects if there are no network issues? Or is there a normal connection timeout and reconnect that I will see in the mongodb logs?

amit777 on 13 Jan 2016

problem is that it's impossible to correlated these things to attempt to understand and reproduce the issue without a full set of logs (see my last comment on https://jira.mongodb.org/browse/NODE-622)

christkv on 13 Jan 2016

if you don't have enough ops in the socket timeout window to exercise all the connections the pool will close down and reconnect. so if you have a 30 second window and 10 connections but only 5 ops it will cause a reconnect event every 30 seconds.

christkv on 13 Jan 2016

Will it close down all connections to the pool? or only the connections that have not been exercised? If we do exercise all the connections within 30 seconds, will the same check be performed in the next 30 second window?

amit777 on 13 Jan 2016

I will try to get the logs you requests in the mongodb ticket.. thanks for assisting.

amit777 on 13 Jan 2016

All. If you manage to exercise all connections in the pool in the socketTimeout window node.js will not timeout the sockets and they will not close forcing a pool reconnect.

A tip lots of connections are only useful if you have a lot of slow running operations in parallel, otherwise you are better suited with a smaller pool as MongoDB uses a thread per socket meaning that thousands of connections require more allocated memory on the server and will cause more CPU context switches.

The next major revision of mongodb-core will change the pool to be growing as well as some other fundamental changes to minimize slow train problems. However thats several months out and will probably be tied together with the MongoDB 3.4 work.

christkv on 13 Jan 2016

Do you see it possible/likely that the massive amounts of disconnect/reconnect can intermittently cause the no primary server available issue?

amit777 on 13 Jan 2016

👎1 👍1

yes as there will be a brief period where there might not be any servers in the set

christkv on 13 Jan 2016

@christkv I've been waiting until this happens again to send you some logs in that other ticket. Our cluster has actually been stable for the last few weeks and we have not seen this error.

ChrisZieba on 13 Jan 2016

@ChrisZieba funny how that always seems to happen lol :+1: I'Il leave the ticket open in jira for now and see what we can figure out.

christkv on 13 Jan 2016

@christkv Hi Christian, i'm just curious if you have any pointers on workarounds in the case of lower traffic. I was thinking of just reducing the pool size as well as increasing the timeouts.

amit777 on 13 Jan 2016

if it helps anyone else, I removed the socket timeout as well as increased keepAlive to 200 and also reduced the poolsize to 3.. i seem to have a lot less disconnect/reconnects.. however it does still occasionally happen.

amit777 on 14 Jan 2016

If it helps anyone, we removed almost all mongoose settings, including socketTimeout and connectionTimeout and keepAlive and connections started to be stable. Our poolSize is 200.
I'm not sure it's the recommended approach but it works now. We're still monitoring it to make sure it holds.

mongoose v4.4.2
Node 4
Mongo 3.0

refaelos on 9 Feb 2016

Do you have a huge amount of slow operations ? if you don't I don't think you will notice any difference between a pool of 20 sockets vs 500.

christkv on 9 Feb 2016

Sorry... it's 200. Fixed the comment.

And yeah, you're right. We don't sense much difference but we rather have the pool size larger than smaller.

The real problem with when connections keep opening and not closed. This used to happen until we removed all mongoose timeout and keepAlive settings. I wonder why these are handled by mongoose/mongo-driver and not letting the OS do it?

refaelos on 9 Feb 2016

Well 2.1.7 and higher has a redesigned pool that avoid this. If you set socketTimeout 0 you delegate it to the os but that might be as much as 10 minutes of hanging connections.

christkv on 9 Feb 2016

Ok. interesting. So now that I removed the keepAlive and socketTimeout settings what are the default settings?

refaelos on 9 Feb 2016

it depends, not sure if mongoose set any specific settings as default. if you use the MongoClient.connect method in the driver it's 30 seconds for both connect and socket timeouts.

christkv on 9 Feb 2016

We do use connect but when we set 30 secs manually connections starting to pile up.

refaelos on 9 Feb 2016

Well with 500 connections you need at least 500 ops inside the socketTimeout period to keep the pool open, otherwise it will close down and force a reconnect. This changes in 2.1.7 however as the pool is a growing/shrinking model.

christkv on 9 Feb 2016

I am having same issue with mongodb 3.2.6 and mongoose 4.3.4. Any help on this?

15astro on 6 Jul 2016

@15astro try to remove the settings of socketTimeout and connectionTimeout and see if it helps.

refaelos on 6 Jul 2016

@refaelos Ok..willl try that..I tried with keepAlive=6000 but that didn't help. Just wanted to know how removingsocketTimeout and connectionTimeout will help?

15astro on 6 Jul 2016

Yeah we tried it with different values and only when we completely removed these settings things started to work well.

refaelos on 6 Jul 2016

@refaelos: I found no luck with removing these settings. Any other thing I am missing?

15astro on 13 Jul 2016

@15astro no man. Sorry. This is how our settings looks like today:

mongo   : {
    uri    : process.env.MNG_URL || 'mongodb://localhost/myDB',
    options: {
      user   : process.env.MNG_USER,
      pass   : process.env.MNG_PASS,
      replset: {
        poolSize: 200
      }
    }

  }

refaelos on 13 Jul 2016

In my case it was related to lack of IP to name binding in /etc/hosts.

If you have set up replica set with names instead of IPs and you have something like this in /etc/hosts of MongoDB nodes:

10.10.10.10 mongodb-2gb-fra1-02 10.10.10.11 mongodb-2gb-fra1-01 10.10.10.12 mongodb-2gb-fra1-03

Then you also need to put it in /etc/hosts of all your app servers.

I thought that node-mongo connects according to whatever I put in the URI, but it's not the case.

It seems that node-mongo connects by IP or name from Mongo URI, then gets hostnames of other replica members from the first MongoDB node that responded to request. It gets for example mongodb-2gb-fra1-03 and passes it to OS for resolving. If OS doesn't know anything about mongodb-2gb-fra1-03, it throws "Error no primary server available".

Hope that helps.

adriank on 29 Aug 2016

👍1

@adriank yes that's correct it bases it's connections of the ones it gets back from the replicaset config. The reason is that this is the canonical source of truth about a replicaset. This is also why all addresses in the replicaset configuration must be resolvable by the driver for the driver to failover correctly and for it to be able to detect servers being added and removed from the set. Previous drivers did not implement the SDAM spec and where more lax. This however would cause problems in production environments.

christkv on 29 Aug 2016

@christkv However it is a nightmare for tools like our MongoSpector. Because of it we have problems with connecting securely to more than one replica from one host. DigitalOcean auto-generates names to droplets that almost nobody changes and the effect is that many clients have mongodb-2gb-fra1-01 as their PRIMARY. :) I hope we can figure something out.

adriank on 29 Aug 2016

We are tracking a server ticket here https://jira.mongodb.org/browse/SERVER-1889. I would love for something like this to be possible.

We should also file a ticket with DigitalOcean pointing out the mistake they are making and how it's affecting their users.

christkv on 29 Aug 2016

by the way you can remove and re-add the replicaset members with their new names being ips

christkv on 29 Aug 2016

Having a similiar issue, after around 12-24hours of being connected our we get an error "No primary server available"

Restarting usually fixes the issue.

connection:
{ "url": "mongodb://user:password@cluser-shard-00-00, cluser-shard-00-01, cluster-shard-00-02/settings?ssl=true&replicaSet=primarycluster-shard-0&authSource=admin&retryWrites=true", "options": { "db": { "w": 1, "wtimeout": 3000, "fsync": true }, "authSource": "admin", "server": { "poolSize": 3, "socketOptions": { "autoReconnect": true, "keepAlive": 60000, "connectTimeoutMS": 7000, "socketTimeoutMS": 15000 } } }, "password": "password", "username": "username" }