Experiencing Lag? (Upcoming Server Maintenance)

Blog Discussion in 'BeerAdvocate Talk' started by Todd, Sep 19, 2014.

Thread Status:
Not open for further replies.
  1. Todd

    Todd Founder (13,518) Aug 23, 1996 Finland
    STAFF Mod Team Society Pooh-Bah

    Some of you may have experienced some sporadic lag while using the site recently. It's not you.

    The lag is being caused by massive DDoS attacks that are targeting certain services (not us) and networks throughout the Internet. While our host is doing their best to mitigate the attacks on their other clients/networks, it's impacting our services throughout the day.

    I also discovered that there were far too many connections and subsequent latency between our servers. Add some DDoS on this and web processes start waiting, hanging, stacking up, creating lag, and eventually dying. Regardless, it's not acceptable.

    Good News
    We're currently working with our host to put our servers on the same network switch. They'll literally be sitting next to each other resulting in practically zero latency.

    Bad News
    There will be a period of downtime as we migrate to a new server. The techs are saying that it could be anywhere from 5 minutes to 1 hour, but we'll be scheduling it during an off-peak usage window to reduce the impact.

    The migration could happen as soon as later tonight PDT.

    I'll post an update once I get confirmation.
     
  2. Todd

    Migration has been scheduled for 2014-09-19 23:04:43 PST. More to follow ...
     
    Greywulfken likes this.
  3. Todd

    Apparently the migration is complete. We'll continue to monitor.
     
  4. Todd

    We're aware that sporadic lag/connectivity issues still exist. Our host has been notified, and we're exploring solutions.
     
    jzeilinger, Greywulfken and jrnyc like this.
  5. Todd

    Update
    On top of all of the above, we've discovered that we're maxing out the 1 gigabit switch that our servers use to communicate with each other. We're spiking at several GBs, which appears to be the core issue right now.

    On a side note: Our bandwidth is now 30+ terabytes/month and climbing fast. (Only 3TBs are included in our services, so we're paying a ton of $$$ in overages now.)
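
    To put those bandwidth figures in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes decimal terabytes (10^12 bytes) and a 30-day month, neither of which is stated above, and the function names are made up for illustration. It only covers the monthly transfer numbers, not the traffic between our servers.

```python
# Back-of-the-envelope bandwidth math. Assumptions (not from the post):
# 1 TB = 10**12 bytes and a month is 30 days long.
SECONDS_PER_MONTH = 30 * 24 * 60 * 60


def average_mbps(terabytes_per_month: float) -> float:
    """Average sustained rate, in megabits per second, for a monthly transfer total."""
    bits = terabytes_per_month * 1e12 * 8
    return bits / SECONDS_PER_MONTH / 1e6


def overage_tb(used_tb: float, included_tb: float) -> float:
    """Terabytes billed beyond what the hosting plan includes."""
    return max(0.0, used_tb - included_tb)


if __name__ == "__main__":
    print(f"average rate: {average_mbps(30):.1f} Mbps")
    print(f"overage: {overage_tb(30, 3):.0f} TB")
```

    Under those assumptions, 30 TB/month averages out to roughly 93 Mbps of sustained transfer, and the overage is 27 TB. An average says nothing about peaks, which are what actually hurt.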

    Good issues to have, but we're actively seeking solutions for both.

    In the meantime, thanks for your patience during this period of lag and service disruption.
     
    jzeilinger, Greywulfken and Jaycase like this.
  6. Todd

    Update
    Our host will be performing maintenance tonight starting at 11pm PDT, which will last for approximately 2 hours. During this time our services will range from being interrupted to completely inaccessible.

    Our issues should be resolved upon completion; however, we plan on continuing to seek other solutions and hosting alternatives.

    Thanks again for your patience.
     
  7. Todd

    Update
    So that was hell. Our host failed at completing the scheduled maintenance. On top of this there were complications. As a result we were forced to roll back to the backup that was taken just before maintenance, around 6:12am UTC.

    We'll be attempting to reschedule maintenance soon.

    Sorry for the inconvenience. We're beyond frustrated.
     
  8. Todd

    Update
    Just got off the phone with a tech support lead. They'll be reattempting the migration at 11pm PDT today. They expect the process to last 3 hours, during which our services will range from being interrupted to completely inaccessible.

    More updates to follow ...
     
    jzeilinger and Greywulfken like this.
  9. Todd

    Update
    Migration complete as of 6am PDT. We've got some settings to adjust, and we'll continue to monitor performance, but the site is wicked fast now.

    Thanks again for your patience.
     
    jzeilinger and Greywulfken like this.
  10. Todd

    Just rebuilt our search index. It's an intensive process.
     
    Greywulfken likes this.
  11. Todd

    I applied some core database settings that our host missed. It required a restart. We should be good to go for now.
     
    Greywulfken likes this.
  12. Todd

    Update
    So despite the migration of our database, we're still experiencing sporadic server issues. Our host's tech team is currently investigating and troubleshooting the issue.

    We realize that it's frustrating for everyone, so we'll do our best to keep you updated and find a solution as soon as possible.

    In the meantime, your patience and support are appreciated.
     
  13. Todd

    Update
    According to our host, the issue seems to be occurring at the OS level of our server. They've got their tech support and infrastructure teams monitoring and gathering reports in the hope of uncovering the root cause. They've also applied a web services monitor that'll reboot within 2 mins of detecting any downtime; just a bandaid.

    There's not much we can do other than wait and let them do their job; however, we are actively researching hosting alternatives regardless of whether this issue is resolved or not.

    More updates as we get them ...
     
  14. Todd

    Update
    As explained by the lead tech assigned to this issue:

    We may have found and resolved the issue. We're keeping an eye out and hoping for the best, but it's the closest we've come so far. Let me explain the issue:

    So, the problem was that Apache kept crashing due to an error stating that it was hitting the limit of files one user could have open. We kept raising the limit, and it kept crashing, even though the Apache user didn't reach the set limits. Through all the logging and checking we've been doing, we finally figured out that Apache wasn't spawning these processes. They were being spawned by php-cgi when executing the PHP code. Thus, they are spawned by the 'beeradvocate' user, and then the user ID is switched to Apache to finish serving the data.

    The problem with this is that the limits set to the Apache user don't apply to the process when it was originated from another user. So, essentially, the process was respecting the limits applied to 'beeradvocate', rather than those applied to 'apache'. So the fix was to raise the limits applied to the 'beeradvocate' user as well, so Apache doesn't crash.

    Does that make sense? We're still keeping an eye out to make sure this fix resolves the issue, and we'll update the support request with more-detailed information. But let me know if you have any questions.

    This all makes sense to me, and it would explain why it took a bit longer to figure out on top of the other server/network issues.
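
    For anyone curious about the mechanics, here is a minimal Python sketch of the general idea: open-file limits belong to a process and are inherited by whatever it spawns, rather than being looked up again by user name later on. This is only an illustration on a Linux/Unix box, not the host's actual fix or our server config. The function name and the limit of 64 are made up for the example.

```python
import resource
import subprocess
import sys


def child_soft_nofile(soft_limit: int) -> int:
    """Spawn a child with a lowered soft open-file limit; return the limit the child sees."""

    def lower_limit() -> None:
        # Runs in the child just before exec. The parent's own limit is untouched.
        _, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
        soft = soft_limit if hard == resource.RLIM_INFINITY else min(soft_limit, hard)
        resource.setrlimit(resource.RLIMIT_NOFILE, (soft, hard))

    # The child simply reports its own soft limit for open files.
    report = "import resource; print(resource.getrlimit(resource.RLIMIT_NOFILE)[0])"
    result = subprocess.run(
        [sys.executable, "-c", report],
        preexec_fn=lower_limit,
        capture_output=True,
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    print("parent soft limit:", resource.getrlimit(resource.RLIMIT_NOFILE)[0])
    print("child soft limit: ", child_soft_nofile(64))
```

    The child keeps the limit it was started with no matter what the parent's limit is, which is the same reason raising the limit for one user doesn't help a process that was started under another.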

    I'm personally monitoring the site during nearly every waking second. I'll continue to post updates as we get them too.

    Thanks again for your patience and support.
     
    jzeilinger, Yohann, F2brewers and 4 others like this.
  15. Todd

    We were good for a while there, but there are unfortunately some unsolved issues still at play. I've updated our host's tech team. They're investigating now.
     
  16. Todd

    Update
    Despite settings applied by our host last night, our server issues still persist and they're at a loss. It's complicated as it's an OS level issue; however, we're not going to sit around until they eventually fix this.

    I'm still actively researching alternative hosting solutions. And I believe I've narrowed it down. I just need to get on a call to explain our needs, get a quote, and create a migration plan with @Jason and @Mike.

    More to follow ...
     
  17. Todd

    Roughly 2 hours ago another fix was applied to change the way our Apache server handles certain files + more updates to limits and permissions. Our host believes this was the cause of our post-migration issues, caused by an OS kernel/Apache bug. And we're not impressed as they're the ones who set up our LAMP stack.

    Anyway, I've yet to run into any access issues, but I'll continue to monitor the site's performance.

    And we're still planning on moving to a new host, but more on that later ...

    Thanks,
     
    Prospero likes this.
  18. Todd

    Update
    There's been some slight lag/issues over the last 2 days during peak usage/traffic spikes, but things appear to be much smoother than before.

    I also just applied a few updates that should stabilize the server and make page loading a bit snappier. And I'll continue to monitor and update as needed.

    In the meantime, @Mike and I are digging into a next generation hosting option that we're pretty damn stoked about. But more on that later ...
     
    Rimbimhoot and TongoRad like this.