Friday, July 26, 2013

TCP.VALIDNODE_CHECKING

Well, this is embarassing :)

I unwittingly learnt how these parameters work:

In my SQLNET.ORA, I had

TCP.VALIDNODE_CHECKING=YES
TCP.EXCLUDED_NODES=SQLSERVER1

Looks like this file gets read by the LISTENER and NOT the database! Argh!

When I restarted the listener, the customer's SQL Server host was blocked from accessing any Oracle databases on my db host. Had to troubleshoot using listener tracing, and figured out that the listener was the culprit.

So, I simply removed those lines and restarted the listener - problem gone away.

Note that the error message on the client-side is misleading, since it says that the service is not found:

ORA-12514: TNS:listener does not currently know of service requested in connect descriptor
Off course, this is totally misleading.

Wednesday, July 24, 2013

SQL *Net more data to client

I was performance tuning a customer's production database, so naturally I ran AWR Reports.

Whenever I ran AWR reports I kept getting the event "SQL *Net more data to client" as the second highest wait event (after CPU time). When I googled it, I came across this post on AskTom:

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:951335700013

In it, SDU (Session Data Unit) and MTU (Maximum Transmission Unit) are mentioned. What's the relationship between SDU and MTU ? 

If you look through the articles on Google, it would seem that they've all just plagiarised each other, with the wrong information. They all incorrectly say SDU should be a multiple of MTU. This lone post on the same AskTom article has the correct info:

http://asktom.oracle.com/pls/asktom/f?p=100:11:0::::P11_QUESTION_ID:951335700013#67529392482665

The author says that SDU should be a multiple of MSS (Maximum Segment Size of the network protocol in use), and NOT the MTU.

I had to search for a more authoritative source.

I looked towards Oracle Support as the holy grail of Oracle knowledge. Here I found Oracle Support document ID 274483.1 The relationship between MTU (Maximum Transmission Unit) and SDU (Session Data Unit) . It says 

"The principle is that the SDU value be a multiple of the MTU." 

Now, this may the source of all the wrong information on the internet.

This document attempts to summarize another doc, SQL Net Packet Sizes (SDU & TDU Parameters) Doc ID 44694.1. This second document says:

"...set the SDU size as a multiple of the MSS."
The reason why MSS is used and not MTU, is that the MTU includes two headers, which reduces the amount of data which Oracle NS (Network Substrate) can transmit per TCP packet. Only MSS determines how much data Oracle can transmit via the lower network protocols.

To calculate the MSS:

MSS = MTU - TCP header size - IP header size

For bog-standard TCP over Ethernet:

MTU = 1500 bytes 
TCP = 20 bytes
IP = 20 bytes

Thus, the MSS for TCP/IP over Ethernet is 1460.

I confirmed that the customer's network has an MTU of 1500.

Given that for Oracle 10g the maximum for SDU size is 32767, what's the optimal SDU with an MSS of 1460? I've done the simple math, and it's 32120. 

That's what we use for the SDU size for a simple Ethernet network.

The results are quite impressive:

Before:

Top 5 Timed Events                                         Avg % Total
~~~~~~~~~~~~~~~~~~                               wait     Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
CPU time                                          1,203               70.5
db file sequential read             140,179            266      2   15.6   User I/O
SQL*Net more data to client   2,669,360       153      0    8.9    Network
control file parallel write           3,755               115     31   6.7   System I/O
db file scattered read               44,471             101      2    5.9   User I/O


After:

Top 5 Timed Events                                         Avg %Total
~~~~~~~~~~~~~~~~~~                                        wait   Call
Event                                 Waits    Time (s)   (ms)   Time Wait Class
------------------------------ ------------ ----------- ------ ------ ----------
CPU time                                            636          72.4
db file sequential read              81,221         160      2   18.3   User I/O
control file parallel write           3,794         117     31   13.3 System I/O
db file scattered read               22,473          55      2    6.2   User I/O
log file parallel write               3,648          51     14    5.8 System I/O

(Sorry about the formatting, it's like herding cats.)

The "sql net more data to client" wait is eliminated! :) 

At first I thought there was something wrong with the AWR Report, maybe the wrong snapshot was chosen, so I double-checked, and ran it for different days. But, it's really gone! :)

Tuesday, July 23, 2013

Database Network Performance Tuning


I recently looked at the database performance of a production database. This post will explain the network performance parameters which I implemented as a result of this work.

Here's the best SQLNET.ORA (server) I could come up for our OLTP application:

SQLNET.AUTHENTICATION_SERVICES=(NTS)
NAMES.DIRECTORY_PATH=(TNSNAMES)
DISABLE_OOB=ON
TCP.NO_DELAY=YES
DEFAULT_SDU_SIZE=32120
USE_DEDICATED_SERVER=ON
SQLNET.EXPIRE_TIME=10

And here's the relevant part of the LISTENER.ORA:

LISTENER =
  (DESCRIPTION =
    (SDU = 32120) (ADDRESS = (PROTOCOL = TCP)(HOST = SERVER)(PORT = 1521)(SEND_BUF_SIZE = 65535)(RECV_BUF_SIZE = 65535))
  )



Now, these settings are specifically chosen for an OLTP application, where response time is more important than throughput. I'll explain the settings and values.

SQLNET.AUTHENTICATION_SERVICES=(NTS)


Not much to say here except on the server I always use NTS because it's required by ASM, and on the Citrix server I set this variable to NONE.

NAMES.DIRECTORY_PATH=(TNSNAMES)


We use TNSNames for service resolution, so that's what we put.

DISABLE_OOB=ON


Disable Out-of-Band protocol (disables Ctrl-C). The application doesn't require OOB, so why bother with this overhead? Disable it.

Note: do not include the parameter BREAK_POLL_SKIP. With BREAK_POLL_SKIP, the Oracle client will check for a Ctrl-C, while DISABLE_OOB completely disables it. If both are used, I am not sure which takes precendence.

USE_DEDICATED_SERVER=ON


I use Dedicated Server mode for all production databases to ensure maximum performance. We have enough RAM, so why not?

SQLNET.EXPIRE_TIME=10



Closes a dead connection after 10 minutes e.g. application crashed and we need to clean up the dead connection.

TCP.NO_DELAY=YES


This is the most important parameter for OLTP apps. It tells the Oracle network software "stop messing about with buffers, just send the data back to the client ASAP!". This had a substantial performance improvement. 

Sybase, in its ASE documentation, recommends this to be set on for its ASE. The relevant document can be found here:


Sybase puts it eloquently:

The tcp no delay parameter controls TCP (Transmission Control Protocol) packet batching. The default value is 1, which means that TCP packets are not batched. 
TCP normally batches small logical packets into single larger physical packets (by briefly delaying packets) fill physical network frames with as much data as possible. This is intended to improve network throughput in terminal emulation environments where there are mostly keystrokes being sent across the network. 
However, applications that use small TDS (Tabular Data Stream) packets may benefit from disabling TCP packet batching.

Yes, I realise it's Sybase doc and not Oracle, but the concept is the same, and their explanation is the best I could find.

The anti-thesis of this parameter is these two:

RECV_BUF_SIZE
SEND_BUF_SIZE

These two parameters set up the data buffers for the packets going back & forth. It's perfect for DSS (reporting) type applications. Put these in your SQLNET.ORA if you want your OLTP app to run slowly!

If both sets of parameters are used, I'm not sure which takes precedence.

SQLNET.ORA: DEFAULT_SDU_SIZE=32120 & SDU=3120 in Listener.ora


This is extremely important. I'll explain how I arrived at this in a separate post. For the new SDU size to take effect, you must update the SQLNET.ORA (client and server) and the LISTENER.ORA (as shown above), and restart the listener.