
Concurrency and Database Connections in Ruby with ActiveRecord

Original article: https://devcenter.heroku.com/articles/concurrency-and-database-connections#threaded-servers

Table of Contents

  • Connection pool
  • Threaded servers
  • Multi-process servers
  • Maximum database connections
  • Calculating required connections
  • Number of active connections
  • Bad connections
  • Limit connections with PgBouncer

When increasing concurrency by using a multi-threaded web server like Puma, or a multi-process web server like Unicorn, you must be aware of the number of connections your app holds to the database and how many connections the database can accept. Each thread or process requires its own connection to the database. To accommodate this, Active Record provides a connection pool that can hold several connections at a time.

If you have questions about Ruby on Heroku, consider discussing it in the Ruby on Heroku forums.

Connection pool

By default, Rails (Active Record) creates a connection only when a new thread or process attempts to talk to the database through a SQL query. Active Record limits the total number of connections per application process through the pool database setting; this is the maximum number of connections your app can have to the database. The default maximum size of the database connection pool is 5. If you try to use more connections than are available, Active Record will block and wait for a connection from the pool. If it cannot get a connection in time, it raises a timeout error that looks something like this:

ActiveRecord::ConnectionTimeoutError - could not obtain a database connection within 5 seconds. The max pool size is currently 5; consider increasing it
      

To avoid this error, you can change the size of your connection pool manually by customizing your connection settings. While the means are similar, the location of your connection setup can vary for threaded vs. multi-process web servers.
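The blocking and timeout behavior described above can be sketched with a toy pool. This is only an illustration, not Active Record's implementation: the ToyPool class, its CheckoutTimeout error, and the "conn-N" strings standing in for real connections are all hypothetical names.

```ruby
# Toy fixed-size pool, for illustration only (not Active Record's code).
class ToyPool
  class CheckoutTimeout < StandardError; end

  def initialize(size:, checkout_timeout:)
    @checkout_timeout = checkout_timeout
    @available = Array.new(size) { |i| "conn-#{i}" } # stand-ins for connections
    @lock = Mutex.new
    @freed = ConditionVariable.new
  end

  # Waits until a connection is free; raises once checkout_timeout has elapsed.
  def checkout
    deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + @checkout_timeout
    @lock.synchronize do
      while @available.empty?
        remaining = deadline - Process.clock_gettime(Process::CLOCK_MONOTONIC)
        if remaining <= 0
          raise CheckoutTimeout,
                "could not obtain a connection within #{@checkout_timeout} seconds"
        end
        @freed.wait(@lock, remaining)
      end
      @available.pop
    end
  end

  # Returns a connection to the pool and wakes one waiting thread, if any.
  def checkin(conn)
    @lock.synchronize do
      @available.push(conn)
      @freed.signal
    end
  end
end

pool = ToyPool.new(size: 2, checkout_timeout: 0.1)
first = pool.checkout
second = pool.checkout

begin
  pool.checkout # the pool is exhausted, so this waits and then times out
  timed_out = false
rescue ToyPool::CheckoutTimeout
  timed_out = true
end

pool.checkin(first) # returning a connection lets the next checkout succeed
reused = pool.checkout
puts "timed out: #{timed_out}, reused: #{reused}"
```

A larger pool only delays the timeout if threads never return their connections, which is why the sections below size the pool against the number of threads per process.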

Threaded servers

For servers that achieve concurrency via threads we recommend using an initializer to configure your database pool. When your Rails application boots, it will execute the code in your initializer and establish the connection with your customizations.

For Rails 4.1+ you can set these values directly in your config/database.yml:

production:
  url:  <%= ENV["DATABASE_URL"] %>
  pool: <%= ENV["DB_POOL"] || ENV['MAX_THREADS'] || 5 %>
      

Otherwise, if you are using an older version of Rails, you will need to use an initializer.

# config/initializers/database_connection.rb

# Use config/database.yml method if you are using Rails 4.1+
Rails.application.config.after_initialize do
  ActiveRecord::Base.connection_pool.disconnect!

  ActiveSupport.on_load(:active_record) do
    config = ActiveRecord::Base.configurations[Rails.env] ||
             Rails.application.config.database_configuration[Rails.env]
    config['pool'] = ENV['DB_POOL'] || ENV['MAX_THREADS'] || 5
    ActiveRecord::Base.establish_connection(config)
  end
end
      

If you are already using an initializer, you should switch over to the database.yml method as soon as possible. Using an initializer requires duplicating code if you are using a forking web server such as Unicorn or Puma (in hybrid mode). The initializer method can cause confusion over what is happening and is the source of numerous support tickets.

If you are using the Puma web server, we recommend setting the pool value equal to ENV['MAX_THREADS']. When using multiple processes, each process has its own pool, so as long as no worker process runs more than ENV['MAX_THREADS'] threads, this setting should be adequate.
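The fallback order used in the settings above (DB_POOL, then MAX_THREADS, then 5) can be sketched as a small helper. The resolve_pool_size name is hypothetical; the Integer() conversion is an addition for this sketch, since environment variables are always strings.

```ruby
# Hypothetical helper mirroring: ENV['DB_POOL'] || ENV['MAX_THREADS'] || 5
# env is any Hash-like object with string keys; it defaults to the real ENV.
def resolve_pool_size(env = ENV, fallback = 5)
  raw = env['DB_POOL'] || env['MAX_THREADS'] || fallback
  Integer(raw) # raises ArgumentError on non-numeric values such as "ten"
end

puts resolve_pool_size({})                                           # => 5
puts resolve_pool_size({ 'MAX_THREADS' => '16' })                    # => 16
puts resolve_pool_size({ 'DB_POOL' => '10', 'MAX_THREADS' => '16' }) # => 10
```

Letting DB_POOL win over MAX_THREADS keeps an explicit override available while the pool otherwise follows the thread count.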

Multi-process servers

For a forking server such as Unicorn, the master process will boot your Rails application (and execute any initializers) and then fork workers. For this reason it’s necessary to disconnect in your master process in the before_fork block and then re-establish the connection in an after_fork block:

# config/unicorn.rb
before_fork do |server, worker|
  # other settings
  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.connection.disconnect!
  end
end

after_fork do |server, worker|
  # other settings
  if defined?(ActiveRecord::Base)
    ActiveRecord::Base.establish_connection
  end
end
      

For Unicorn, this connection setup should be in addition to the normal recommended configuration as described in the Deploying Rails Applications With Unicorn guide.

If you are using Rails 4.1+, then ActiveRecord::Base.establish_connection will use the connection information stored in config/database.yml. Otherwise you will need to duplicate the behavior of your initializer to ensure consistent connection information:

# config/unicorn.rb

# Use config/database.yml method if you are using Rails 4.1+
after_fork do |server, worker|
  # other settings
  if defined?(ActiveRecord::Base)
    config = ActiveRecord::Base.configurations[Rails.env] ||
                Rails.application.config.database_configuration[Rails.env]
    config['pool'] = ENV['DB_POOL'] || 5
    ActiveRecord::Base.establish_connection(config)
  end
end
      

Note that we are setting the pool to 5 connections or the value specified in the DB_POOL env var. Now you can set the connection pool size by setting a config var on Heroku. For instance, if you wanted to set it to 10 you could run:

$ heroku config:set DB_POOL=10
      

This doesn’t mean that each dyno will now have 10 open connections, but only that if a new connection is needed it will be created until a maximum of 10 have been used per Rails process.

Even if you have enough connections in your pool, your database may have a maximum number of connections that it will allow.

Maximum database connections

Heroku provides managed Postgres databases. Different tiered databases have different connection limits. The Starter Tier “Dev” and “Basic” databases are limited to 20 connections. Production Tier databases (plans Crane and up) have higher limits. Once your database has the maximum number of active connections, it will no longer accept new connections. This will result in connection timeouts from your application and will likely cause exceptions.

When scaling out, it is important to keep in mind how many active connections your application needs. If each dyno allows 5 database connections, you can only scale out to four dynos before you need to provision a more robust database.
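The arithmetic behind that limit can be sketched as follows. The max_dynos helper is a hypothetical name, and the sketch assumes the dynos are the only clients of the database (no consoles, background workers, or other apps holding connections).

```ruby
# Largest number of dynos whose pools fit under the database connection limit.
def max_dynos(connection_limit, connections_per_dyno)
  raise ArgumentError, 'connections_per_dyno must be positive' unless connections_per_dyno.positive?

  connection_limit / connections_per_dyno # integer division rounds down
end

puts max_dynos(20, 5) # => 4
puts max_dynos(20, 3) # => 6
```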

Now that you know how to configure your connection pool and how to find out how many connections your database can handle, you will need to calculate the right number of connections for each dyno.

Calculating required connections

Assuming that you are not manually creating threads in your application code, you can use your web server settings to guide the number of connections that you need. The Unicorn web server scales out using multiple processes. If you aren’t opening any new threads in your application, each process will take up 1 connection. So if your Unicorn config file has worker_processes set to 3 like this:

worker_processes 3

Then your app will use 3 connections for workers. This means each dyno will require 3 connections. If you’re on a “Dev” plan, you can scale out to 6 dynos, which will mean 18 active database connections out of a maximum of 20. However, it is possible for a connection to get into a bad or unknown state. Because of this, we recommend setting the pool of your application to either 1 or 2 to avoid zombie connections saturating your database. See the “Bad connections” section below.

Another web server, Puma, gets concurrency using threads (16 by default). This means it would require 16 connections in the pool to operate without exceptions. It’s likely that your dyno isn’t taking full advantage of all 16 of these threads, so with tuning you could find an optimal number and specify it in your Procfile. If you wanted Puma to use at most 5 threads, and therefore at most 5 connections, you can pass the thread range 0:5 like this:

web: bundle exec puma -t 0:5 -p $PORT -e ${RACK_ENV:-development}
      

Every application will have different performance characteristics and different requirements. To properly tune the number of threads for your app you will need to load test your app in a production-like or staging environment.
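As a rough starting point before load testing, the worst-case connection count per dyno can be estimated from the server settings. This is a minimal sketch that assumes your application code creates no extra threads; the connections_per_dyno helper is a hypothetical name.

```ruby
# Worst-case connections one dyno can open: every process has its own pool,
# and each thread in a process may hold one connection at a time.
def connections_per_dyno(processes:, threads_per_process:)
  processes * threads_per_process
end

unicorn = connections_per_dyno(processes: 3, threads_per_process: 1)
puma    = connections_per_dyno(processes: 1, threads_per_process: 5)

puts "Unicorn, 3 workers: #{unicorn} per dyno, #{unicorn * 6} across 6 dynos"
puts "Puma, -t 0:5: #{puma} per dyno, #{puma * 4} across 4 dynos"
```

The same formula covers Puma in hybrid mode: multiply the number of worker processes by the maximum threads per worker.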

Number of active connections

In development you can see the number of connections taken up by your application by checking the database.

$ bundle exec rails dbconsole
      

This will open a connection to your development database. You can then see the number of connections to your Postgres database, on version 9.1 or earlier, by running:

select count(*) from pg_stat_activity where procpid <> pg_backend_pid() and usename = current_user;
      

On Postgres 9.2 and later the command is:

select count(*) from pg_stat_activity where pid <> pg_backend_pid()  and usename = current_user;
      

This returns the number of connections on that database:

count
-------
   5
(1 row)
      

Since connections are opened lazily, you’ll need to hit your running application at localhost several times until the count stops going up. To get an accurate count you should run that database query against a production database, since your development setup may not allow you to generate the load required for your app to create new connections.
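If you run this check from a script against databases of different versions, the two queries above differ only in the name of the process ID column. The sketch below picks the right one; the connection_count_sql helper is a hypothetical name, and it assumes you pass in the integer reported by Postgres's server_version_num setting (for example 90105 for 9.1.5).

```ruby
# Builds the connection-count query for a given Postgres version.
# server_version_num: integer from `SHOW server_version_num`, e.g. 90105.
def connection_count_sql(server_version_num)
  # pg_stat_activity.procpid was renamed to pid in Postgres 9.2.
  pid_column = server_version_num >= 90_200 ? 'pid' : 'procpid'
  'select count(*) from pg_stat_activity ' \
    "where #{pid_column} <> pg_backend_pid() and usename = current_user;"
end

puts connection_count_sql(90_105) # Postgres 9.1.5 uses procpid
puts connection_count_sql(90_200) # Postgres 9.2.0 uses pid
```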

Bad connections

It is possible for connections to hang or be placed in a “bad” state. This means that the connection will be unusable but remain open. If you are running a multi-process web server such as Unicorn, this could mean that over time a 3-worker dyno, which normally consumes 3 database connections, could be holding as many as 15 connections (5 default connections per pool times 3 workers). To limit this threat, lower the connection pool to 1 or 2 and enable connection reaping, which is available in Rails 4, though it was turned off by default after a bug report.

The reaping_frequency setting tells Active Record to check every N seconds whether connections are hung or dead and to terminate them. While it is likely that your application will have a few connections that hang over time, if something in your code is causing hung connections, the reaper will not be a permanent fix to the problem.
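One way to apply both recommendations is to add them to the database configuration hash before it is passed to establish_connection. This is a minimal sketch: the with_reaping helper and the DB_REAP_FREQ env var are hypothetical names, and the defaults of 2 connections and 10 seconds are illustrative values, not recommendations from the article.

```ruby
# Returns a copy of a database config hash with a small pool and reaping on.
# DB_REAP_FREQ is a hypothetical env var name used only for this sketch.
def with_reaping(config, env = ENV)
  config.merge(
    'pool'              => Integer(env['DB_POOL'] || 2),
    'reaping_frequency' => Integer(env['DB_REAP_FREQ'] || 10) # seconds between checks
  )
end

base = { 'adapter' => 'postgresql', 'pool' => 5 }
p with_reaping(base, {})
p with_reaping(base, { 'DB_POOL' => '1', 'DB_REAP_FREQ' => '30' })
```

Because merge returns a new hash, the original configuration is left untouched, which makes the change easy to apply only in the environments that need it.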

Limit connections with PgBouncer

You can continue to scale out your applications with additional dynos until you have reached your database connection limits. Before you reach this point it is recommended to limit the number of connections required by each dyno by using the PgBouncer buildpack.

PgBouncer maintains a pool of connections that your database transactions share. This keeps connections to Postgres that are otherwise open and idle to a minimum. However, transaction pooling prevents you from using named prepared statements, session advisory locks, listen/notify, or other features that operate on a session level. See the PgBouncer buildpack FAQ for the full list of limitations.

For many frameworks, you must disable prepared statements in order to use PgBouncer. Then set your app to use a custom buildpack that will call other buildpacks.

Do not continue before disabling prepared statements, or verifying that your framework is not using them. Rails 3+ uses prepared statements.

$ heroku buildpack:set https://github.com/heroku/heroku-buildpack-multi
      

This buildpack will run other buildpacks by looking in the .buildpacks file and running each buildpack listed in order. So first we will add the PgBouncer buildpack:

$ echo "https://github.com/heroku/heroku-buildpack-pgbouncer" >> .buildpacks
      

Next, to ensure your application can run, you need to add your language-specific buildpack. Since you are using Ruby it would be:

$ echo "https://github.com/heroku/heroku-buildpack-ruby" >> .buildpacks
      

The final file should look like this:

$ cat .buildpacks
https://github.com/heroku/heroku-buildpack-pgbouncer
https://github.com/heroku/heroku-buildpack-ruby
      

Now you must modify your Procfile to start PgBouncer. In your Procfile, add the command bin/start-pgbouncer-stunnel to the beginning of your web entry. So if your Procfile was:

web: bundle exec puma -C config/puma.rb
      

it will now be:

web: bin/start-pgbouncer-stunnel bundle exec puma -C config/puma.rb
      

Commit the results to git, test on a staging app, and then deploy to production.

When deploying you should see this in the output:

=====> Detected Framework: pgbouncer-stunnel