NSLU2-Linux

NSLU2 performance

A professional report on web server performance from a researcher at the Free University in Amsterdam can be found here.

Overview: What to expect

Performance will of course depend on your LAN, disk and USB enclosure. On a 10Mb/s LAN, large sequential reads or writes from the NSLU2 will saturate the link, so the numbers below are only relevant for 100Mb/s and faster networks. Note that the Unslung firmware does not seem to alter network performance significantly.

The results below were obtained with Unslung 3.17 using a USB 2.0 enclosure (Sweex - http://www.sweex.nl - using the ALI chip) holding a Samsung 160GB 7200rpm disk with 8 MB cache. Please update this page if your performance numbers differ significantly from mine.

I have compared the NSLU2 to similar products. One of the closest, the Synology DS-101, has slightly better performance, probably due to more RAM and the use of IDE disks rather than USB. Until it has been "Unsyned", the actual cause of the higher speed is uncertain. A performance comparison of the NSLU2 and DS-101 (and the Kuro-box) using iozone can be found at http://www.tomsnetworking.com/Reviews-190-ProdID-DS101-5.php .

Samba and FTP tests were done with a single client only. Superficial testing with two clients indicates slightly higher combined throughput. Pure disk and network tests should not differ significantly when several instances of the test program are running.

We should probably also test using standard storage benchmark programs. A couple that spring to mind are IOMeter and Bonnie (SPEC SFS 3 would be preferable, but it is complex and not free to set up).

Question: Does the overclock mod alter the speeds below for Samba?

Samba

  • Maximum read speed: ~3.5 MB/s (CPU load ~50%)
  • Maximum write speed: ~3 MB/s (CPU load ~60%)

Samba on Debian

Tests on a de-underclocked NSLU2 running Debian Etch, kernel 2.6.18.dfsg.1-17. Samba ver: 3.0.24-6etch9

  • Maximum read speed: ~3.5MB/s (CPU load ~40%)
  • Maximum write speed: ~4 MB/s (CPU load ~60%)

Additional SAMBA measures on Debian

I tweaked the following on my NSLU2:

 - Use of writeback journal (available in ext3fs) and noatime mount option
 - In smb.conf, use of : socket options = TCP_NODELAY SO_KEEPALIVE IPTOS_LOWDELAY SO_RCVBUF=16384 SO_SNDBUF=16384
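
For readers unsure what the first tweak looks like in practice, here is a minimal sketch. It assumes the data volume is an ext3 partition mounted from /etc/fstab; the device /dev/sda1, the mount point /mnt/data and the helper name fstab_line are made up for illustration. On ext3 the writeback journal is selected with the data=writeback mount option.

```shell
#!/bin/sh
# Print an /etc/fstab line for an ext3 data volume that uses the
# writeback journal mode and does not update access times.
# $1 = block device, $2 = mount point (both are your own values).
fstab_line() {
    printf '%s %s ext3 noatime,data=writeback 0 2\n' "$1" "$2"
}

# Example with assumed names:
#   fstab_line /dev/sda1 /mnt/data >> /etc/fstab
# After remounting, check the active options with:
#   grep /mnt/data /proc/mounts
```

Writeback mode trades some crash safety for speed, since file data is no longer ordered against metadata in the journal.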

I then got the following results, with the Slug 100% dedicated to the test (i.e. no other resource-consuming processes running):

 - From a Linux server, with a 100 MB file, flushing all cache between tests:
            - WRITE : 5.2 MB/s
            - READ  : 4.3 MB/s
 - From a Vista laptop, with a 100 MB file, flushing all cache between tests:
            - WRITE : 5.0 MB/s
            - READ  : 4.7 MB/s

Note that for the laptop, the results are horrible over a wireless connection (about 2.0 MB/s), but this might be due to the poor routing performance of my xDSL/WLAN gateway.

NFS

I don't use NFS, but posts from others seem to indicate about the same, or slightly higher, speeds as with Samba, but with more CPU load.

For speed measurement for one of the NFS packages, see Unslung.Nfs-utils.

The rsize and wsize options for NFS have a huge impact; try setting them to 32768 or 65536. I'm getting 4.75/3.5 MB/s (read/write), see my profile - Profiles.Zhyla
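
As a rough illustration of the rsize/wsize advice, the sketch below builds the option string for an NFS client mount. The helper name nfs_opts, the host name slug and the mount point /mnt/slug are assumptions for the example, not taken from this page.

```shell
#!/bin/sh
# Build the NFS mount option string for a given transfer size in bytes.
# The same value is used for both reads (rsize) and writes (wsize).
nfs_opts() {
    echo "rsize=$1,wsize=$1"
}

# Example (run as root on the client, with assumed names):
#   mount -t nfs -o "$(nfs_opts 32768)" slug:/share/hdd/data/public /mnt/slug
```

The client and server may negotiate a smaller value than requested, so it is worth checking /proc/mounts on the client after mounting.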

FTP

Measured with vsftpd.

  • Maximum read speed: ~5.8 MB/s (CPU load ~65%)
  • Maximum write speed: ~4.7 MB/s (CPU load ~70%)

Disk speed

Write speed was measured using time dd bs=1M count=100 if=/dev/zero of=hugefile.dat on /share/hdd/data/public, and read speed with time dd bs=1M count=100 of=/dev/null if=hugefile.dat (remember to empty any cache between the write and read tests). Note that time is part of the Busybox 1.0 package; you can use a stopwatch if you are reluctant to install it.

  • Maximum read speed: ~8.7 MB/s (CPU load ~60%)
  • Maximum write speed: ~7.3 MB/s (CPU load ~75%)
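
The figures for the dd test are simply megabytes moved divided by elapsed seconds. A minimal sketch of that arithmetic, assuming awk is available on the slug (the helper name throughput is made up for illustration):

```shell
#!/bin/sh
# throughput MEGABYTES SECONDS
# Prints the transfer rate in MB/s with one decimal place.
throughput() {
    awk -v mb="$1" -v s="$2" 'BEGIN { printf "%.1f\n", mb / s }'
}

# Example: dd moved 100 MB and "time" reported 13.7 s of real time.
#   throughput 100 13.7    # prints 7.3
```

Use the "real" value reported by time, not user or sys, since the disk wait is what is being measured.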

Network speed

Measured using netio (slug binary version downloadable from http://folk.uio.no/ingeba/netio.arm and x86 linux version http://folk.uio.no/ingeba/netio.x86 . Source can be fetched from http://www.netfuse.de/techarea/netio/netio114.zip ).

  • Maximum read speed: ~11.5 MB/s (CPU load ~80%)
  • Maximum write speed: ~11 MB/s (CPU load ~85%)
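
To put these numbers in context, a 100Mb/s link can carry at most 12.5 MB/s before any protocol overhead. The sketch below computes the percentage of that raw rate a measurement reaches. It assumes decimal megabytes and awk on the slug; the helper name utilization is hypothetical.

```shell
#!/bin/sh
# utilization MB_PER_S LINK_MBIT
# Prints the measured rate as a whole-number percentage of the raw link
# rate (link megabits per second divided by 8 gives megabytes per second).
utilization() {
    awk -v mbps="$1" -v link="$2" \
        'BEGIN { printf "%.0f\n", 100 * mbps / (link / 8) }'
}

# Example: the netio read figure above on Fast Ethernet.
#   utilization 11.5 100    # prints 92
```

By the same calculation, the Samba read figure of about 3.5 MB/s uses roughly 28% of the link, which suggests the network itself is not the bottleneck there.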

Troubleshooting

There are a number of possible causes for bad performance. Here are some things to look at:

  • Run disk benchmarks directly on the slug (dd works well for this).
  • Check the switch/hub LEDs to confirm that your slug is detected as a 100Mb/s device.
  • Run ifconfig and look for collisions, dropped packets, overruns and similar errors.
  • Run netio to see whether anything in the network path is obstructing traffic. A common problem is bad cables.
  • Check for the duplex issue if possible (see below).
 > I got into this discussion over on DSL Reports. The leading theory
 > is a problem in the switch/hub detecting full or half duplex. One of
 > the posters had a nice switch and was able to catch the errors from
 > the NSLU. I had the problem with my CNet switch, it liked my Linksys
 > switch and it likes my Nortel switch. I was going to play with my
 > Nortel switch to verify his info, but I have not had the time. I
 > would blame your switch for now.

 I guess that IS possible (I thought about it), but the netgear hub I have
 here does have a 10mb led, and a 100mb led, and when the device is plugged
 in, the 100mb led is lit (this is the Netgear 16-port hub).
 [DAVID- 100/half duplex is still a problem.  LED doesn't indicate half/full.]

 I also have that 16 port hub connected to a netgear FM114P router, and when
 I connect it to that...it works at full-speed again! Dang, definitely
 something with the drivers/chipset negotiating line speed.

 Sounds like something needs to be tweaked in the NSLU2's lan negotiation.

 David Troesch | Atlanta, GA | ICQ# 2333123
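
The ifconfig check from the list above can be partly automated. This is a sketch only: it assumes the classic net-tools output layout (lines such as "RX packets:100 errors:0 dropped:0 ...") and an awk on the slug; the helper name nonzero_counters is made up. A steadily rising collision or error count on a switched network is a typical sign of the duplex mismatch discussed above.

```shell
#!/bin/sh
# Reads "ifconfig <iface>" output on stdin and prints every error-type
# counter whose value is not zero, prefixed with RX or TX.
nonzero_counters() {
    # Put each whitespace-separated token on its own line, then split
    # "name:value" tokens on the colon.
    tr ' ' '\n' | awk -F: '
        $0 == "RX" || $0 == "TX" { dir = $0; next }
        $1 ~ /^(errors|dropped|overruns|frame|carrier|collisions)$/ && $2 + 0 > 0 {
            print dir, $1, $2
        }'
}

# Example (interface name is an assumption):
#   ifconfig eth0 | nonzero_counters
```

No output means all of the counters it looks at are zero. The collisions counter is reported under TX because it follows the TX line in this layout.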

Details - hard numbers

Lmbench results for Stock Slug with 2.4.22-Linksys kernel.

Hardware:

  • Stock NSLU2
  • Toshiba MK5026GAX 40GB 5400rpm 2.5” Drive
  • Vantec Nexstar USB2 2.5” case

with CSR loaded:


 Results going to ../results/armv5b-linux-gnu/LKG0FB07F?.
 Using config in CONFIG.LKG0FB07F
 Tue Sep 21 00:34:20 MDT 2004
 Latency measurements
 Tue Sep 21 00:39:56 MDT 2004
 Calculating file system latency
 Tue Sep 21 00:40:18 MDT 2004
 Local networking
 Tue Sep 21 00:40:54 MDT 2004
 Bandwidth measurements
 Tue Sep 21 01:53:08 MDT 2004
 Calculating context switch overhead
 Tue Sep 21 02:07:41 MDT 2004
 Calculating memory load latency
 Tue Sep 21 02:13:30 MDT 2004
 make[1]: Leaving directory `/home/packages/lmbench/lmbench-2.0.4/src'

 real    102m23.363s
 user    80m48.670s
 sys     20m6.560s

without CSR loaded:


 Results going to ../results/armv5b-linux-gnu/LKG000000?.0
 Using config in CONFIG.LKG000000?
 Tue Sep 21 02:32:23 MDT 2004
 Latency measurements
 Tue Sep 21 02:33:04 MDT 2004
 Calculating file system latency
 Tue Sep 21 02:33:27 MDT 2004
 Local networking
 Tue Sep 21 02:33:54 MDT 2004
 Bandwidth measurements
 Tue Sep 21 02:41:49 MDT 2004
 Calculating context switch overhead
 Tue Sep 21 02:43:41 MDT 2004
 Calculating memory load latency
 Tue Sep 21 02:49:05 MDT 2004
 make[1]: Leaving directory `/home/packages/lmbench/lmbench-2.0.4/src'

 real    17m46.969s 
 user    14m12.840s
 sys     2m48.130s

with CSR loaded:


 sh-2.05b# ./hdparm -Tt /dev/sda

 /dev/sda: 
  Timing cached reads:   148 MB in  2.00 seconds =  74.00 MB/sec
  Timing buffered disk reads:   20 MB in  3.03 seconds =   6.60 MB/sec

 /dev/sda:
  Timing cached reads:   148 MB in  2.02 seconds =  73.27 MB/sec
  Timing buffered disk reads:   24 MB in  3.21 seconds =   7.48 MB/sec

 /dev/sda:
  Timing cached reads:   148 MB in  2.00 seconds =  74.00 MB/sec
  Timing buffered disk reads:   24 MB in  3.06 seconds =   7.84 MB/sec

 /dev/sda:
  Timing cached reads:   148 MB in  2.02 seconds =  73.27 MB/sec
  Timing buffered disk reads:   24 MB in  3.09 seconds =   7.77 MB/sec

 /dev/sda:
  Timing cached reads:   148 MB in  2.01 seconds =  73.63 MB/sec
  Timing buffered disk reads:   24 MB in  3.07 seconds =   7.82 MB/sec

without CSR loaded:


 sh-2.05b# ./hdparm -Tt /dev/sda

 /dev/sda:
  Timing cached reads:   164 MB in  2.01 seconds =  81.59 MB/sec
  Timing buffered disk reads:   22 MB in  3.02 seconds =   7.28 MB/sec

 /dev/sda:
  Timing cached reads:   164 MB in  2.02 seconds =  81.19 MB/sec
  Timing buffered disk reads:   24 MB in  3.17 seconds =   7.57 MB/sec

 /dev/sda:
  Timing cached reads:   164 MB in  2.02 seconds =  81.19 MB/sec
  Timing buffered disk reads:   24 MB in  3.07 seconds =   7.82 MB/sec

 /dev/sda:
  Timing cached reads:   164 MB in  2.02 seconds =  81.19 MB/sec
  Timing buffered disk reads:   26 MB in  3.17 seconds =   8.20 MB/sec

 /dev/sda:
  Timing cached reads:   164 MB in  2.02 seconds =  81.19 MB/sec
  Timing buffered disk reads:   26 MB in  3.16 seconds =   8.23 MB/sec
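
To compare the two sets of hdparm runs, it helps to average the buffered read figures. A small sketch, assuming the hdparm output format shown above and awk on the slug (the helper name avg_buffered is hypothetical):

```shell
#!/bin/sh
# Reads hdparm -Tt output on stdin and prints the mean of the
# "buffered disk reads" rates in MB/sec, to two decimals.
avg_buffered() {
    awk '/buffered disk reads/ { sum += $(NF - 1); n++ }
         END { if (n) printf "%.2f\n", sum / n }'
}

# Example: save several runs to a file, then average them.
#   ./hdparm -Tt /dev/sda >> runs.txt    # repeat a few times
#   avg_buffered < runs.txt
```

Applied to the five runs in each listing above, this gives 7.50 MB/sec with CSR loaded and 7.82 MB/sec without.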

without CSR loaded:


 make on perl-5.8.3:
 real    80m43.326s
 user    76m39.070s
 sys     3m12.650s

with CSR loaded:


 make on perl-5.8.3:
 real    90m48.799s
 user    81m40.610s
 sys     8m1.640s 


                 L M B E N C H  2 . 0   S U M M A R Y
                 ------------------------------------


 Basic system parameters
 ----------------------------------------------------
 Host                 OS Description              Mhz

 --------- ------------- ----------------------- ----
 LKG000000 Linux 2.4.22-        armv5b-linux-gnu  266
 LKG0FB07F Linux 2.4.22-        armv5b-linux-gnu  266
 familiar  Linux 2.4.19-      armv5tel-linux-gnu  400

 Processor, Processes - times in microseconds - smaller is better
 ----------------------------------------------------------------
 Host                 OS  Mhz null null      open selct sig  sig  fork exec sh  
                              call  I/O stat clos TCP   inst hndl proc proc proc
 --------- ------------- ---- ---- ---- ---- ---- ----- ---- ---- ---- ---- ----
 LKG000000 Linux 2.4.22-  266 1.23 3.10 17.5 23.6 211.6 9.19 14.0 3200 11.K 41.K
 LKG0FB07F Linux 2.4.22-  266 1.28 3.23 18.3 24.7 221.5 9.58 14.6 3500 12.K 44.K
 familiar  Linux 2.4.19-  400 0.37 1.03 59.9 61.6  70.4 2.85 4.62 1864 5434 15.K

 Context switching - times in microseconds - smaller is better
 -------------------------------------------------------------
 Host                 OS 2p/0K 2p/16K 2p/64K 8p/16K 8p/64K 16p/16K 16p/64K
                         ctxsw  ctxsw  ctxsw ctxsw  ctxsw   ctxsw   ctxsw
 --------- ------------- ----- ------ ------ ------ ------ ------- -------
 LKG000000 Linux 2.4.22- 151.2  300.4  695.6  338.8  708.8   339.0   733.6
 LKG0FB07F Linux 2.4.22- 178.7  343.5  770.3  378.6  783.2   385.0   784.2
 familiar  Linux 2.4.19- 109.0  293.3  800.5  294.5  824.6   308.0   823.9

 *Local* Communication latencies in microseconds - smaller is better
 -------------------------------------------------------------------
 Host                 OS 2p/0K  Pipe AF     UDP  RPC/   TCP  RPC/ TCP
                         ctxsw       UNIX         UDP         TCP conn
 --------- ------------- ----- ----- ---- ----- ----- ----- ----- ----
 LKG000000 Linux 2.4.22- 151.2 322.4 482.             755.4       1415
 LKG0FB07F Linux 2.4.22- 178.7 375.1 967.             855.0           
 familiar  Linux 2.4.19- 109.0 217.3 342. 544.7 729.0 684.4 1011. 1567

 File & VM system latencies in microseconds - smaller is better
 --------------------------------------------------------------
 Host                 OS   0K File      10K File      Mmap    Prot    Page	
                         Create Delete Create Delete  Latency Fault   Fault 
 --------- ------------- ------ ------ ------ ------  ------- -----   ----- 
 LKG000000 Linux 2.4.22-  747.4  188.6 1851.9  444.6   1734.0 5.026    30.0
 LKG0FB07F Linux 2.4.22-  789.9  205.6 1972.4  793.7   1866.0 5.211    31.0
 familiar  Linux 2.4.19-   14.4   11.5   97.0   21.9   2141.0 2.060    13.0

 *Local* Communication bandwidths in MB/s - bigger is better
 -----------------------------------------------------------
 Host                OS  Pipe AF    TCP  File   Mmap  Bcopy  Bcopy  Mem   Mem
                              UNIX      reread reread (libc) (hand) read write
 --------- ------------- ---- ---- ---- ------ ------ ------ ------ ---- -----
 LKG000000 Linux 2.4.22- 9.02 18.8 16.1   25.6   64.3   43.6   43.3 64.3  84.6
 LKG0FB07F Linux 2.4.22- 7.71 17.1 15.0   24.3   60.4   39.2   38.9 60.5  78.2
 familiar  Linux 2.4.19- 18.0 38.4 21.8   43.5   79.6  118.7   49.8 78.8 334.6

 Memory latencies in nanoseconds - smaller is better
     (WARNING - may not be correct, check graphs)
 ---------------------------------------------------
 Host                 OS   Mhz  L1 $   L2 $    Main mem    Guesses
 --------- -------------  ---- ----- ------    --------    -------
 LKG000000 Linux 2.4.22-   266  16.1  236.3  246.2    No L2 cache?
 LKG0FB07F Linux 2.4.22-   266  17.0  266.2  266.2    No L2 cache?
 familiar  Linux 2.4.19-   400 7.540  302.7  322.5    No L2 cache?

--jacques


Some tests done by copying a 200MB file between an NSLU2 with 2.12-beta firmware and a Linux 2.8 (Ubuntu) machine. The filesystem was mounted on the client, and a Python script then performed a simple read/write of the 200MB file.

          read        write
 nfs:     5.7MB/s     2.6MB/s 
 cifs:    3.5MB/s     1.9MB/s 
 samba:   2.2MB/s     1.85MB/s 

 nfs-server Version: 2.2beta47-2

--titoo


Dhrystone

The Dhrystone benchmark mostly runs from cache and says something about CPU performance but relatively little about overall system performance. The system is an unmodified NSLU2 (except for a serial port addition) running Unslung 3.18 beta, gcc 3.3.5 with stock libs and compiler flags "-O3 -mcpu=xscale".

Dhrystone 2.1:

 Microseconds for one run through Dhrystone: 6.4
 Dhrystones per Second: 155440.4
 VAX MIPS rating = 88.469

Dhrystone 1.1:

 Dhrystone( 1.1) time for 3000000 passes = 16.4
 Register option selected? NO
 This machine benchmarks at 182815.4 dhrystones/second
 VAX MIPS rating = 104.050

These results look low to me for a 266 MHz XScale, so I checked the rate at which the performance monitor register CCNT counts and saw 133 MHz. Maybe the core on the NSLU2 is running at only 133 MHz. Some other XScales have a frequency change procedure that kernels or bootloaders can get wrong, but no such procedure appears to be documented for the IXP420. For comparison, my 233 MHz Pentium2 MMX running NetBSD 1.6.2 yields 189 and 213 VAX MIPS from Dhrystone 2.1 and 1.1 respectively.

-- yahpn
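
For reference, the VAX MIPS rating is the Dhrystones-per-second score divided by 1757, the score of the VAX 11/780. The sketch below reproduces that conversion and also normalises it per MHz, which makes the 266 MHz versus 133 MHz question easier to judge. It assumes awk is available; the helper names are made up for illustration.

```shell
#!/bin/sh
# vax_mips DHRYSTONES_PER_SECOND
# Converts a Dhrystone score to a VAX MIPS rating (VAX 11/780 = 1757).
vax_mips() {
    awk -v d="$1" 'BEGIN { printf "%.3f\n", d / 1757 }'
}

# mips_per_mhz DHRYSTONES_PER_SECOND CLOCK_MHZ
# Same rating, divided by the assumed core clock.
mips_per_mhz() {
    awk -v d="$1" -v mhz="$2" 'BEGIN { printf "%.2f\n", d / 1757 / mhz }'
}

# Examples with the Dhrystone 2.1 score above:
#   vax_mips 155440.4            # prints 88.469
#   mips_per_mhz 155440.4 266    # prints 0.33
#   mips_per_mhz 155440.4 133    # prints 0.67
```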

Additional tests on SlugOS/BE

Environment:

  • NSLU2 266Mhz, 2.6.16, armv5teb - SlugOS/BE
  • HD ST3320620A(Seagate, SATAII, 7200rpm, 16MB) in a Thermaltake MAX4 enclosure,
    • filesystem type : ext3 (mounted with -o async and echo 1024 > /sys/block/${disk_name}/device/max_sectors)
    • samba (3.0.27a) : socket options = TCP_NODELAY IPTOS_LOWDELAY SO_KEEPALIVE SO_SNDBUF=65536 SO_RCVBUF=65536
  • WRT54GL, 250Mhz, dd-wrt
  • WinXP Sp3, AthlonXP 2500+,1GB, HD SATAI - ST3120026AS
          read        write
 ftp:     5.6MB/s     6.0MB/s
 dd:      16.9MB/s    19.2MB/s
 samba:   5.0MB/s     4-5MB/s

--adriansi

Page last modified on November 10, 2008, at 09:58 PM