Linux Kernel Network Subsystem Patching

Lennart Sorensen lsorense-1wCw9BSqJbv44Nm34jS7GywD8/FfD2ys at public.gmane.org
Wed Jan 22 21:41:46 UTC 2014


On Wed, Jan 22, 2014 at 02:37:55PM -0500, D. Hugh Redelmeier wrote:
> I found that very confusing, and I understood the subject before
> reading it.
> 
> The term "thread" in hardware is not the same as "thread" in
> software.  A thread in hardware is just the execution of a process
> (which might involve execution of a software thread).
> 
> A "thread" in software is just a process that shares a lot of
> resources with other threads.

A thread is a CPU context and stack running within a process's memory
space.
A software process is one or more threads sharing a memory space.
Unless you use things like thread-local storage (TLS), memory is
shared between all threads in a process.

> Software view:
> 
> Unix Process: essentially a running program.  Remember, you can have
> multiple instances of the running program, assuming it is written to
> not have multiple instances trip over each other.

I would almost have thought that you had to write your program to trip
over itself.  Most things should automatically avoid doing so, but then
again I am probably thinking of the standard utilities like cat and such,
which tend to work with files specified on the command line and with pipes,
and hence really can't interfere with other copies unless the user asks
for it.

> This "trip over each other" criterion is actually very important.  Old
> style Unix programs did that pretty well.  The only issue was global
> resources, for example a wired-in file that might be written to.
> 
> Unfortunately, most GUI programs do have global resources that they
> don't share well: config files, caches, and who knows what else.  When
> you try to run multiple copies of Firefox, it discovers this and
> actually only runs the one copy, with multiple windows.
> 
> Windows programs (of which I know little) seem to be like GUI
> programs: they are not very good at having multiple instances NOT
> stepping on each other.
> 
> Unix processes can only share a few things accidentally (files), and
> they are fairly easy to get straight.
> 
> Threads are different.  Traditionally, multiple threads within the
> same program share almost everything.  For example, memory and hence
> most variables.  So instances very easily step on each other.  It takes a
> lot of care to design a program with threads that is both efficient and
> not buggy.  It's not a great paradigm and I avoid it like the plague.
> 
> Multiprogramming / multi-tasking: a property of an operating system
> (OS) that allows multiple processes to run at (roughly) the same time.
> All modern OSes do that, but it wasn't always so.
> 
> Hardware view:
> 
> Multiprocessor: hardware with multiple CPUs.  It takes work to make an
> OS support it, but Linux got that support about 20 years ago.  Windows
> has it too.  Unix and OSX have supported it for quite a while.  To
> properly written user software, it is a non-issue because all that
> happens is that the OS can let multiple processes run at exactly the
> same time, which is more or less indistinguishable from (roughly) the
> same time.
> 
> An OS might do a better job if it knows that certain paths between
> pairs of processors have different costs (eg. NUMA, shared caches,
> ...).  For example, an OS scheduler probably tries to restart a
> process on the same processor on which it ran the last time in the
> hope that the cache retains relevant data.

That probably explains the statement I saw that Windows 2000 ran better
with hyperthreading off, since it did not have a clue about the SMT stuff
and would schedule two tasks on one physical core while leaving other
cores idle, getting subpar performance as a result.

> Multicore: a multiprocessor that has several CPUs on the same chip.
> From a software view, indistinguishable from Multiprocessor.
> 
> HyperThreaded (HT): Intel's trademark(?) for Simultaneous
> Multi-Threaded (SMT). It has probably displaced SMT as a term.

IBM very much uses the term SMT for the POWER line of CPUs.

A nice system like the IBM p795 has 32 CPU sockets, each with 8 cores,
each with 4 threads.  So it qualifies as multi-processor, multi-core
(both SMP really), and multi-threaded (SMT).  So 256 cores, with a total
of 1024 hardware threads.  I seem to recall hearing a presentation once
where someone mentioned having tried doing a Linux kernel 'make
allyesconfig -j' on such a machine, and having the complete kernel with
all drivers built in about 5 seconds.

> Simultaneous Multi-Threaded: implement multi-core, but with a lot of
> shared hardware resources.  These hardware resources are expensive.
> By design, this sharing is invisible to software, except for
> performance.  SMT can opportunistically keep the hardware usefully
> employed when work on one process is "stalled" (eg. due to a memory
> fetch).  How can this go wrong (apart from bogus implementation)?
> One way is that things like the L1 cache are likely shared and
> therefore less effective for each "processor".

Well, the P4 did SMT with just one core, as do the Atom chips.

> Since the OS can switch a processor between processes when progress on
> the first is delayed (multitasking), why is SMT useful?  Because
> multitasking and SMT operate on quite different time scales.  Process
> switching takes many microseconds and involves storing and restoring a
> lot of state (eg. registers), taking many OS instructions.  SMT
> switching is all done by hardware and takes just a few machine cycles,
> less than the time to access a single memory location.  But SMT
> requires that the hardware have duplicate resources for holding
> exactly that state.

SMT takes essentially no time to switch because each thread has its own
set of registers.  The CPU's architectural state is duplicated, while the
other resources are generally shared (although a design can choose to
duplicate as much as it wants beyond that state).

> Multitasking is good for exploiting hardware that would otherwise be
> idle while waiting for I/O.  SMT is good for exploiting hardware that
> would be idle while awaiting a memory fetch.
> 
> Summary: as a programmer, you don't need to care about Multicore, HT,
> SMT.  Except that you need to remember that processors are not getting
> faster, they are getting wider.  If you don't parallelize your
> software stack, you are not getting all the CPU crunch out of your
> hardware.  If CPU isn't a bottleneck, then this matters not at all.

Certainly don't expect much improvement in clock speeds or performance
for a single thread going forward.  If your code can't be parallelized,
then your performance limit has pretty much been hit at this point.

> Don't write multi-threaded programs without a great deal of
> forethought.

That is certainly true.

> There are ways of exploiting multiprocessing that are easy:
> 
> - lots of independent programs can be run at the same time
> 
> - lots of copies of the same program can be run (as long as it is
>   polite about sharing global resources)
> 
> - you can write programs in a language which doesn't force the
>   programmer to explicitly deal with threads (eg. Erlang).

There has certainly been some interesting research into having compilers
generate parallel code automatically from loops and other constructs in
the code.

-- 
Len Sorensen
--
The Toronto Linux Users Group.      Meetings: http://gtalug.org/
TLUG requests: Linux topics, No HTML, wrap text below 80 columns
How to UNSUBSCRIBE: http://gtalug.org/wiki/Mailing_lists