[uoms] Diff of /trunk/uoms-doc/doc.tex

revision 13, Wed May 12 11:28:19 2010 UTC revision 14, Mon Nov 29 18:07:07 2010 UTC
# Line 32  Line 32 
32   \item \texttt{doc/manual.pdf}: This file. User's manual.   \item \texttt{doc/manual.pdf}: This file. User's manual.
33   \item \texttt{COPYING and COPYING.LESSER}: Files containing the use and redistribution terms (license).   \item \texttt{COPYING and COPYING.LESSER}: Files containing the use and redistribution terms (license).
34   \item \texttt{changelog.txt}: File with changes in each release.   \item \texttt{changelog.txt}: File with changes in each release.
35     \item \texttt{Makefile}: Makefile to build the benchmarking suite. It relies on the src/Makefile file.
36   \item \texttt{src/affinity.upc}: UPC code with affinity-related tests.   \item \texttt{src/affinity.upc}: UPC code with affinity-related tests.
37   \item \texttt{src/config/make.def.template.*}: Makefile templates for HP UPC and Berkeley UPC.   \item \texttt{src/config/make.def.template.*}: Makefile templates for HP UPC and Berkeley UPC.
38   \item \texttt{src/config/parameters.h}: Header with some customizable parameters.   \item \texttt{src/config/parameters.h}: Header with some customizable parameters.
# Line 54  Line 55 
55  \section{Operations tested}  \section{Operations tested}
56    
57  \begin{itemize}  \begin{itemize}
58    \item \texttt{upc\_forall} (read elements of a shared array)
59    \item \texttt{upc\_forall} (write elements of a shared array)
60    \item \texttt{upc\_forall} (read+write elements of a shared array)
61    \item \texttt{for} (read elements of a shared array)
62    \item \texttt{for} (write elements of a shared array)
63    \item \texttt{for} (read+write elements of a shared array)
64  \item \texttt{upc\_barrier}  \item \texttt{upc\_barrier}
65  \item \texttt{upc\_all\_broadcast}  \item \texttt{upc\_all\_broadcast}
66  \item \texttt{upc\_all\_scatter}  \item \texttt{upc\_all\_scatter}
# Line 107  Line 114 
114  \item \texttt{upc\_free}  \item \texttt{upc\_free}
115  \end{itemize}  \end{itemize}
116    
117    The \texttt{upc\_forall} and \texttt{for} benchmarks test the performance of accesses to a shared \texttt{int} array in read, write and read+write operations. The \texttt{upc\_forall} benchmark distributes the whole workload across threads, whereas in the \texttt{for} benchmark all the work is performed by thread 0. This is useful for testing the speed of remote accesses and optimization techniques such as coalescing. The operation performed in read is the sum of a variable on the stack and the current element of the array, to prevent the compiler from dropping the first $N-1$ iterations. The operation performed in write is a simple update of each element with its position in the array. The operation performed in read+write is the sum of the current element and its position in the array.
118    
119  In bulk memory transfer operations there are two modes: remote and local. Remote mode will copy data from one thread to another, whereas local mode, will copy data from one thread to another memory region with affinity to the same thread.  In bulk memory transfer operations there are two modes: remote and local. Remote mode will copy data from one thread to another, whereas local mode, will copy data from one thread to another memory region with affinity to the same thread.
120    
121    
# Line 166  Line 175 
175    
176  \item \texttt{-input FILE}: Read user defined list of benchmarks to run from \texttt{FILE}. Valid benchmark names are:  \item \texttt{-input FILE}: Read user defined list of benchmarks to run from \texttt{FILE}. Valid benchmark names are:
177  \begin{itemize}  \begin{itemize}
178    \item \texttt{upc\_forall\_read}
179    \item \texttt{upc\_forall\_write}
180    \item \texttt{upc\_forall\_readwrite}
181    \item \texttt{for\_read}
182    \item \texttt{for\_write}
183    \item \texttt{for\_readwrite}
184  \item \texttt{upc\_barrier}  \item \texttt{upc\_barrier}
185  \item \texttt{upc\_all\_broadcast}  \item \texttt{upc\_all\_broadcast}
186  \item \texttt{upc\_all\_scatter}  \item \texttt{upc\_all\_scatter}
# Line 231  Line 246 
246  To compile the suite you have to setup a correct \texttt{src/config/make.def} file. Templates are provided to this purpose. The needed parameters are:  To compile the suite you have to setup a correct \texttt{src/config/make.def} file. Templates are provided to this purpose. The needed parameters are:
247    
248  \begin{itemize}  \begin{itemize}
249  \item \texttt{CC}: Defines the C compiler used to compile the C code. Please note this does not involve the resulting C code generated from the UPC code if your UPC compiler is a source to source compiler.  \item \texttt{CC}: Defines the C compiler used to compile the C code. Please note this has nothing to do with the resulting C code generated from the UPC code if your UPC compiler is a source to source compiler.
250  \item \texttt{CFLAGS}: Defines the C flags used to compile the C code. Please note this does not involve the resulting C code generated from the UPC code if your UPC compiler is a source to source compiler  \item \texttt{CFLAGS}: Defines the C flags used to compile the C code. Please note this has nothing to do with the resulting C code generated from the UPC code if your UPC compiler is a source to source compiler
251  \item \texttt{UPCC}: Defines the UPC compiler used to compile the suite  \item \texttt{UPCC}: Defines the UPC compiler used to compile the suite
252  \item \texttt{UPCFLAGS}: Defines the UPC compiler flags used to compile the suite. Please note you should not specify any number of threads flag at this point  \item \texttt{UPCFLAGS}: Defines the UPC compiler flags used to compile the suite. Please note you should not specify the number of threads flag at this point
253  \item \texttt{UPCLINK}: Defines the UPC linker used to link the suite  \item \texttt{UPCLINK}: Defines the UPC linker used to link the suite
254  \item \texttt{UPCLINKFLAGS}: Defines the UPC linker flags used to link the suite  \item \texttt{UPCLINKFLAGS}: Defines the UPC linker flags used to link the suite
255  \item \texttt{THREADS\_SWITCH}: Defines the correct switch to set the desired number of threads. It is compiler dependent, and also includes any blank space after the switch  \item \texttt{THREADS\_SWITCH}: Defines the correct switch to set the desired number of threads. It is compiler dependant, and also includes any blank space after the switch
256  \end{itemize}  \end{itemize}
257    
258  Once you have set up your \texttt{make.def} file you can compile the suite as  Once you have set up your \texttt{make.def} file you can compile the suite.
259  following:  \\
260    
261    For a static thread setup type:
262    
263  \texttt{make NTHREADS=NUMBER\_OF\_UPC\_THREADS}  \texttt{make NTHREADS=NUMBER\_OF\_UPC\_THREADS}
264    
265  E.g., for 128 threads:  E.g., for 128 threads:
266    
267  \texttt{make NTHREADS=128}  \texttt{make NTHREADS=128}
268    \\
269    
270    For a dynamic thread setup just type:
271    
272    \texttt{make}
273    
274    
275    
# Line 304  Line 324 
324    
325  \normalsize  \normalsize
326    
327  The header indicates the benchmarked function and the number of processes involved. The first column shows the size used for each particular row. It is the size of the data at the root thread, or in any thread in a non-rooted operation. The second column is the number of repetitions performed for that particular message size. The following three columns are, respectively, the minimum, maximum and average latencies. The last column shows the aggregated bandwidth calculated using the maximum latencies. Therefore, the bandwidth reported is the minimum bandwidth achieved in all the repetitions.  The header indicates the benchmarked function and the number of processes involved. The first column shows the block size used for each particular row. The second column is the number of repetitions performed for that particular message size. The following three columns are, respectively, the minimum, maximum and average latencies. The last column shows the aggregated bandwidth calculated using the maximum latencies. Therefore, the bandwidth reported is the minimum bandwidth achieved in all the repetitions.
328    
329  Moreover, when 2 threads are used, affinity tests are performed. This way you can measure the effects of data locality in NUMA systems, if the 2 threads run in the same machine. This feature may be useful even when the 2 threads run in different machines. E.g.: Machines with non-uniform access to the network interface, like quad-socket Opteron/Nehalem-based machines, or cell-based machines like HP Integrity servers. The output of this tests is preceded with something like:  Moreover, when 2 threads are used, affinity tests are performed. This way you can measure the effects of data locality in NUMA systems, if the 2 threads run in the same machine. This feature may be useful even when the 2 threads run in different machines. E.g.: Machines with non-uniform access to the network interface, like quad-socket Opteron/Nehalem-based machines, or cell-based machines like HP Integrity servers. The output of this tests is preceded with something like:
330    

