What steps to take when trying to resolve unresponsive/hung/broken IIS web site?
What steps do you take when you find an IIS web site is not responding?
I might first try to telnet to the site's port, then check the web site bindings and authentication, and finally restart it.
I think knowing what an experienced admin would check when facing such problems is quite useful.
In fact, I myself spent over half an hour trying to figure out what
the problem was, and nothing seemed incorrect. I restarted the web
site and the problem was still there, but after restarting the IIS
service the problem was resolved.
If I had known of a better tracing or logging feature to help me
resolve it faster, that would have saved me over half an hour.
(FYI: I am using IIS 7.5.)
iis
iis-7
website
iis-8
iis-10
edited Jun 12, 2018 at 6:04

TristanK
asked Jul 11, 2011 at 10:36

Yasser Sobhdel
2 Answers
I've found the following guidance works pretty well as a general collection guide.
Determine Symptoms
Try to establish (as quickly as possible) the surface area of the problem:
Connectivity? (Telnet is good; if you get an error page returned in the browser, something's obviously working - eliminate connectivity first.)
General App Pool failure, or specific to a content type? (Do ASPX files fail while .HTM files work? Do you have canary files for each app and content type?)
Specific in-app failure, hang, or crash? (Most of this is for hangs and app failures; crashes dictate their own methodology: get a crash dump, debug it.)
As a rule, always write it down, as you might be dealing with
multiple symptoms, and being able to refer back to your notes on an
earlier incident can be invaluable.
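The connectivity and canary checks above can be scripted so they take seconds during an outage. What follows is a minimal sketch, not a definitive tool: the function names are made up for illustration, and it assumes plain HTTP and that you have already placed canary files (for example a static .htm and a trivial .aspx) in each app.

```python
import http.client
import socket
import urllib.error
import urllib.request


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds (the 'telnet' test)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def canary_status(url, timeout=5.0):
    """Return the HTTP status code for url, or None if no HTTP response arrived."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # An error page still proves that something answered the request.
        return err.code
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused, timed out (a likely hang), or a broken response.
        return None


def triage(host, port, canary_paths):
    """Check connectivity first, then each canary path; return a dict of results."""
    results = {"port_open": port_open(host, port)}
    if results["port_open"]:
        for path in canary_paths:
            results[path] = canary_status(f"http://{host}:{port}{path}")
    return results
```

Calling `triage("www.example.com", 80, ["/canary.htm", "/canary.aspx"])` (hypothetical host and paths) returns one entry per check. A 200 for the .htm canary alongside `None` or a 500 for the .aspx canary points at the app pool or the ASP.NET pipeline rather than at connectivity.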
Collect Data
aka "Collect Temporal Data" - You have a limited window to collect
certain data while there's an outage. Some data - like the process
memory - is ephemeral and will disappear if you take corrective action
first. Other data - like logs - might take time to copy, but you could
just as easily get them afterwards. So understand what data you need to
collect NOW vs post-restoration.
Grab whatever time-sensitive/timely data you
will need to resolve the issue later. Don't worry about persistent stuff
- Event Logs and IIS logs stick around, unless you're a compulsive
clearer, in which case: stop it. (Those that don't have an Event Log of last week are doomed to repeat it)
Determine the affected worker process (and dump it)
APPCMD LIST WP can help with this, or the Worker Processes GUI at the Server level.
If using the GUI, don't forget to look at the Current Requests by
right-clicking the worker process - if that view is available, it'll
show you which module (DLL) the requests are jammed in, which can help
you guess a cause early.
Determine the scope (i.e. just one App Pool, multiple App Pools, two with dependencies - this depends on your app and website layout)
Grab a memory dump of the worker process - once you've identified
which App Pool has the problem, identify the relevant Worker Process,
and use Task Manager to create a memory dump by right-clicking that
process. Note the filename for later.
Note on Task Manager bitness: you need to use a Task Manager of the same
bitness as the Worker Process you're attacking with it - if you dump a
32-bit WP (shown as w3wp.exe *32) with 64-bit Task Manager, the dump is
not going to be interpretable. If dumping a 32-bit process on 64-bit
Windows, you need to exit Task Manager, run
%WINDIR%\SYSWOW64\TaskMgr.exe to get the 32-bit version, then dump with
that. (A ten-second detour, but you must do it at the time.)
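To get from an App Pool name to the process ID you need to dump, the APPCMD LIST WP output mentioned above can be parsed. This is a rough sketch under a few assumptions: appcmd.exe lives in its default inetsrv folder, it prints one line per worker process in the form `WP "1234" (applicationPool:Name)`, and the script runs from an elevated prompt on the IIS server. The function names are illustrative only.

```python
import re
import subprocess

# Assumed default location of appcmd.exe; adjust if Windows is installed elsewhere.
APPCMD = r"C:\Windows\System32\inetsrv\appcmd.exe"

# Assumed output shape of `appcmd list wp`: WP "4012" (applicationPool:DefaultAppPool)
WP_LINE = re.compile(r'^WP "(?P<pid>\d+)" \(applicationPool:(?P<pool>.+)\)\s*$')


def parse_wp_list(output):
    """Map app pool name -> list of worker process PIDs from `appcmd list wp` text."""
    pools = {}
    for line in output.splitlines():
        match = WP_LINE.match(line.strip())
        if match:
            pools.setdefault(match["pool"], []).append(int(match["pid"]))
    return pools


def list_worker_processes():
    """Run appcmd on the IIS server itself and return the parsed pool -> PIDs map."""
    result = subprocess.run(
        [APPCMD, "list", "wp"], capture_output=True, text=True, check=True
    )
    return parse_wp_list(result.stdout)
```

With the PID in hand you can pick the right w3wp.exe in Task Manager. A pool that is missing from the map simply has no running worker process at the moment.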
Restore Service
You've now got all the point-in-time info you think you need for
diagnostics, so it's time to get the website customers back in business.
Recycle the minimum number of Worker Processes in order to restore service.
Don't bother stopping and starting Websites; you generally need
the App Pool to be refreshed in order to get the site working again, and
that's what a Recycle does.
Recycling the App Pool is enough 9 times out of 10.
Note that recycling appears to happen on the next request to come
in (even though the existing WP has been told to go away), so a worker
process may not immediately reappear. That doesn't mean it hasn't worked, just that no requests are waiting.
IISReset is usually a tool used by people that don't know better. Don't use it unless you need every website to terminate and restart all at once. (It's like trying to hammer a nail into a wall with a brick. It might work, but you kinda look like an idiot, and there's going to be collateral damage at some point.)
You may have other app dependencies - app pools depending on
other app pools, or databases, or external systems... What you have to
do to restore service tells you something about the scope of the
problem. Last on the list is a full reboot. Unless a kernel-level
driver really got messed up, that's typically not necessary; it's a
useful catch-all for when you can't determine which narrower action
is needed.
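Recycling only the affected pools, rather than reaching for IISReset, can be sketched as below. It assumes appcmd.exe in its default location and uses `appcmd recycle apppool /apppool.name:<name>`. The helper names and the `runner` parameter (there so the logic can be exercised without touching a server) are made up for this example.

```python
import subprocess

# Assumed default location of appcmd.exe; adjust if Windows is installed elsewhere.
APPCMD = r"C:\Windows\System32\inetsrv\appcmd.exe"


def recycle_command(pool_name):
    """Build the appcmd argument list that recycles a single app pool."""
    return [APPCMD, "recycle", "apppool", f"/apppool.name:{pool_name}"]


def recycle_pools(pool_names, runner=subprocess.run):
    """Recycle pools one at a time, stopping at the first failure.

    Returns the names that were recycled successfully, so you can note
    exactly how much had to be restarted to restore service.
    """
    recycled = []
    for name in pool_names:
        result = runner(recycle_command(name), capture_output=True, text=True)
        if result.returncode != 0:
            break
        recycled.append(name)
    return recycled
```

Pass the pools in order of suspicion, most likely culprit first, and test the site after each one. The list that comes back is worth writing down, since it tells you something about the scope of the problem.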
Determine Cause
i.e. look at and think about the data you've collected.
Take the logs and the memory dump, look for commonalities, engage the app developers, debug the dump with DebugDiag (or newer) or WinDBG, and so on.
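One way to look for commonalities in the IIS logs is to count slow requests per URL around the time of the outage. The sketch below assumes W3C-format logs with a `#Fields:` header and the `cs-uri-stem`, `sc-status` and `time-taken` (milliseconds) fields enabled. The function name and the 10-second threshold are arbitrary choices for illustration.

```python
from collections import Counter


def slow_requests(log_lines, threshold_ms=10000):
    """Count requests at or above threshold_ms, keyed by (URL, status code)."""
    fields = []
    counts = Counter()
    for line in log_lines:
        line = line.strip()
        if line.startswith("#Fields:"):
            # The header names the columns for the rows that follow.
            fields = line.split()[1:]
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split()))
        try:
            taken = int(row["time-taken"])
        except (KeyError, ValueError):
            continue  # field not logged, or a malformed row
        if taken >= threshold_ms:
            key = (row.get("cs-uri-stem", "-"), row.get("sc-status", "-"))
            counts[key] += 1
    return counts
```

Feeding it the lines of the relevant log file and calling `.most_common(10)` on the result gives a quick shortlist. If one URL dominates, that is the page to discuss with the app developers and to check against the stacks in the dump.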
Set up for next time
Do you know you've fixed it? If not, and especially if nothing else
seems to have changed, think about what you could capture next time
if you were better set up for it.
Don't assume it's the last occurrence - develop a plan for what you'll need to collect next time, based on this time.
For example, if the requests are all for the same URL, implement
some additional instrumentation or logging, or a Failed Request Tracing
rule that'll help identify the spot on the page that experiences a
problem.
Performance monitor logs are helpful (if in doubt, get a perfmon log too).
Look at other tools which might be useful - ProcDump, XPerf/WPT/WPR, and so on. If all you have is a hammer, every problem has to be a nail…
Think about whether "papering over" the issue is acceptable while
seeking actual root cause - if the outage is really bad, something like
adjusting the recycling settings for the App Pool might be acceptable
to minimize the likelihood, or the duration (except where that conflicts
with being able to troubleshoot it)...
edited Jun 12, 2018 at 6:13
answered Jul 12, 2011 at 0:49

TristanK
Why would the bindings or authentication methods (which should be
static) cause a site to be unresponsive? Those wouldn't be on my list of
checks, or at the very least they wouldn't be on the top of my list.
The first thing I would check would be whether or not the site loads
from the server itself. If it doesn't, you can rule out almost every
possible network or DNS problem as the cause.
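A check run on the server itself might look like the sketch below. It connects to the loopback address and sends the site's host name in the Host header, so DNS and the network path are out of the picture. It assumes the site has a binding that answers on 127.0.0.1 over plain HTTP; the function name and its parameters are illustrative.

```python
import http.client


def local_status(host_header, path="/", port=80, timeout=5.0):
    """Request path from 127.0.0.1 with the given Host header.

    Returns the HTTP status code, or None if no valid response came back.
    """
    conn = http.client.HTTPConnection("127.0.0.1", port, timeout=timeout)
    try:
        # An explicit Host header replaces the one http.client would generate.
        conn.request("GET", path, headers={"Host": host_header})
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return None
    finally:
        conn.close()
```

If this returns `None` or a 5xx status while running on the IIS server, the fault is on the box. If it returns 200 while remote clients still fail, look at the network, DNS, or the site's bindings next.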
https://serverfault.com/questions/288959/what-steps-to-take-when-trying-to-resolve-unresponsive-hung-broken-iis-web-site