Social Media

Multi-homed distributed JVM

Categories
Published
of 50
All materials on our website are shared by users. If you have any questions about copyright issues, please report us to resolve them. We are always happy to assist you.
Related Documents
Share
Description
Master Thesis Multi-homed distributed JVM Noah Heusser Mathias Payer Responsible assistant Prof. Thomas R. Gross Laboratory for Software Technology ETH Zurich September Abstract In the past year
Transcript
Master Thesis Multi-homed distributed JVM Noah Heusser Mathias Payer Responsible assistant Prof. Thomas R. Gross Laboratory for Software Technology ETH Zurich September 2011 Abstract In the past year a move from systems based on a single main-frame to systems based on a cluster of many servers could be observed. Whereas main-frame systems provide a shared memory, server cluster can only communicate using message passing. It is widely accepted that developing software for shared memory systems is easier then it is for message passing systems. Thats why it would simplify programming if we provided an abstraction layer on top of a message passing system, which allows to develop within the shared memory paradigm. The Java platform defines the Java Virtual Machine (JVM), which is a virtual shared memory system. In the past years Java has become one of the most popular programming languages. Therefore a wide collection of software for the JVM is available today. We want to provide the abstraction layer which allows to execute this software in a message passing system. To achieve that we extend the Gnu Java Compiler (GCJ). The modified GCJ creates binaries which can distribute the program execution in a message passing cluster system. The programmer only needs to know about parallel programming using shared memory. In this thesis we implemented a prototype, which allows to execute programs, which use a subset of the Java runtime. The prototype proves that the basic design we have specified can be implemented. Our benchmarks show the possibilities and limitations of our solution. iii Zusammenfassung In den vergangenen Jahren fand eine Bewegung weg von Gross-Systemen hin zu Cluster-Systemen statt. Gross-Systeme verfügen über einen Speicher, der von allen Recheneinheiten gemeinsam genutzt wird. Cluster-Systeme hingegen können nur über das Versenden von Nachrichten miteinander kommunizieren. Man geht allgemein davon aus, dass die Entwicklung von Programmen für Systeme mit gemeinsam genutztem Speicher einfacher ist. Darum würde es die Programmierung von Nachrichten basierten Systemen vereinfachen, wenn diese auf einen gemeinsamen Speicher zugreifen könnten. Ein solcher Speicher könnte von einer Zwischenschicht, die auf dem Nachrichten basierten System aufbaut, zur Verfügung gestellt werden. Die von der Java Plattform definierte Java Virtual Machine (JVM) ist eine virtuelle Maschine mit gemeinsam genutztem Speicher. In den letzten Jahren wurde Java zu einer der populärsten Programmiersprachen. Dementsprechend ist heute eine grosse Bandbreite an Java Programmen verfügbar. Wir möchten eine Zwischenschicht bauen, die das Ausführen dieser Programme auf einem Nachrichten basierten Cluster erlaubt. Dabei muss der Programmierer nur Kenntnisse der parallelen Programmierung mit gemeinsam genutztem Speicher haben. In dieser Arbeit implementieren wir ein erstes Modell, welches die Ausführung von Programmen erlaubt, welche eine Untermenge der von der Java Laufzeitumgebung zur Verfügung gestellten Funktionalität nutzen. Dieses erste Modell zeigt, dass die entwickelten Ideen umsetzbar sind. Zudem haben wir Messungen gemacht welche die Möglichkeiten und Limitierungenen unseres Systems aufzeigen. v Contents 1 Introduction Conventions in this Document Background Info Java Memory Model (JMM) The volatile Keyword GCJ Overview, Hooks/Library GCC Overview GCC Front-End GIMPLE / GENERIC The GCJ Front-End libgcj Standard Java Classes Design A JMM Compatible Object Access A JMM Compatible Memory Access Sharing Objects Distinction between Local and Shared Objects volatile Variables Load Distribution Thread Synchronization I/O and Syscalls vii viii Contents 4 Implementation Basic Technology Project Overview Messageing Provided Functionality Code Design Network and Transport Layer Meshing Protocol Message Transfer Protocol Sharing Objects Object Creation and Memory Layout Local, Globally Known and Shared Objects Object Forward Protocol Object Access Strategy Implementation Changed JVM Byte-Code Instructions Remote Thread Creation Load Distribution Start Remote Thread Thread Synchronization I/O and Syscalls Evaluation Benchmarks Test Environment Available-Object Access Network Setup Messaging Module Object Transfer Object Round Trip Thread Scheduling Contents ix 6 Future Work Missing Functionality Monitor Enter and Exit Wait and Notify Volatile Variable Class Migration I/O and Syscalls Thread Migration Test: Object Migration on Read Preload Objects which Become Needed in the Future A Distributed GC Faster Object Access Implementation Less Messages for Load Distribution Fast Available-Object Access Related Work JavaSplit Hyperion Jakal Java/DSM Jessica Jessica cjvm Conclusion 47 A GCC Coding 49 A.1 Creating a GENERIC Expression A.2 Adding a new Function to libgcj A.3 Adding a new Field to Object A.4 Extending the GCJ 1 Introduction Writing distributed application requires a deep understanding of distribution itself. By distribution the program flow gets out-of-order which results in nondeterministic behavior. This results in errors which only occur in certain execution orders. Such errors are extremely hard to reproduce and fix. Nevertheless distribution is often required. That is why programmers have tried to move the complexity introduced by the distribution into frameworks. The optimal solution would be a software that converts a standalone application into a distributed one without human help required. To write such a software we have to answer the following questions: Which are the parallelizable subtasks we have in our application? Which subtasks have to be finished before others can start? What data needs to be shared between different subtasks? What consistency model do we have to provide when sharing data between subtasks? By looking at Java we realize that all these questions are answered at execution time. The different threads represent different subtasks that can be executed in parallel. By wait() and notify() calls we get informed when a subtask has to wait for an other s results. Data is represented as objects in Java. We have to share exactly those objects between two threads which are referenced by both thread objects. Mathematically spoken: We share objects that are in the intersection of the transitive closure of the objects that are linked by the thread objects. The last question about the consistency model is answered by the Java Memory Model as discribed in JSR-133 [6] (JMM). By writing a software which takes class files and makes them execute in a distributed environment we step into the footsteps of many predecessors. These predecessors project s can be grouped in two dimensions. One is the way they inserted the code which is necessary to handle the complexity which was introduced by the distribution. There three main concepts exist. 1. Use byte-code rewriting to inject the code. [5] 2. Build a JVM which is aware of the distribution. The code is inserted when the byte-code gets interpreted. [3, 9, 13, 14] 3. Write a special compiler that inserts the code. [1, 12] 1 2 1. Introduction The other dimension is the way the memory is synchronized between the nodes. Some use a preexisting page based Distributed Shared Memory (DSM) system [1, 9, 13]. Others exploit the informations about the object structure and synchronize objects instead of pages [5, 14]. In this project we extend the Gnu Java Compiler (GCJ) and create a compiler based solution that synchronizes objects. Most of these predecessor projects provided consistency guaranties which go beyond the ones that are specified in the JMM. In our Project we want to reduce the synchronization overhead by only guaranteeing what the JMM requires. In addition we assume that a write access to an object is often followed by an other write access to the same object of the same thread. Thats why we implement a protocol that makes repetitive access to an object fast. To achieve this speed up we accept that a single access to an object may be slower than in other implementations. 1.1 Conventions in this Document Filenames, directories and paths will be shown in italic letters (gcc-src/libjava/cluster/net.cc). The prefix gcc-src/ specifies the root of the GCC source directory. Variables, constants, functions, classes or command names are written in typewriter-font (FUNCTION DECL). 2 Background Info 2.1 Java Memory Model (JMM) The JMM defines the consistency model every JVM has to provide. The JMM is based on a main memory which is shared among all the threads in the system and a thread local cache memory. Figure 2.1 shows that model. A read access goes to the local cache. If the value is not available there, it is loaded from the shared main memory into the local cache and can be used by the thread. Write operations are only done in the cache. The synchronization with the shared main memory is done by the monitor enter and monitor exit commands of the JVM. When a thread executes a monitor enter, its local read cache gets flushed. So after a monitor enter a thread can assume to see all changes that were done in the shared main memory before the monitor enter was executed. Monitor exit on the other side writes all the changes in the write cache into the main memory. Figure 2.1: Java Memory Model To make this consistency model more clear we give an example. Assume a counter which is shared between the two threads A and B. The counter has the value 0 and is not protected by a monitor. Both threads increase the counter. Now the correct value would be 2, but because non of the threads executed a monitor enter or monitor exit command, both see the value 1. Then thread A executes a monitor enter and flushes its read cache. This does not change anything, because thread A still sees the value it has written. The write cache is only flushed at monitor exit. But after the monitor enter, thread A exits the monitor. Now the value 1 is written to the shared main memory. After this has happened, thread B enters a monitor. It flushes its read cache, but the value 1 which is in its write cache remains untouched. When thread B leaves the monitor again it will write 1 to the main memory. So we have lost one of our counter increment. Assume we had protected our counter by a monitor. Both threads want to increase the counter. Thread A enters the monitor and flushes its read cache. Then the counter s value 0 gets loaded from the shared main memory. Thread A then increases the counter by 1 and leaves the monitor. 3 4 2. Background Info So the new value 1, which is in the write cache of thread A, is written to the shared main memory. At the same time, thread B wanted to enter the monitor too. But it was blocked until thread A exited the monitor. Now that thread A did so B is allowed to do the monitor enter. It flushes its read cache. Then it loads the counters value from the shared main memory. This value is now 1 because thread A wrote the 1 when it exited the monitor. Thread B increases the counters value and stores the value 2 to the local write change. Then it exits the monitor and stores the value 2 in the write cache to the shared main memory. No updates are lost The volatile Keyword In real live we would not implement the counter in the former example ourself. The Java library provides the class AtomicInteger which provides a method called incrementandget. If we look at the implementation provided by the GNU Classpath project of incrementandget method, we see that it does not use a monitor. But the value variable in the AtomicInteger which holds the current value of the AtomicInteger is defined using the modifier volatile. This modifier defines that the field, which holds the volatile modifier, always bypasses the local read and write cache. This bypass functionality in combination with the Java provided compareandset method allows the implementation of the counter without a monitor. Now we see that using the compareandset on a non-volatile value does not make any sense. 2.2 GCJ Overview, Hooks/Library The GCJ is the Java compiler of the Gnu Compiler Collection (GCC). I will first explain how a general compiler in the GCC works and later go into details about the GCJ GCC Overview Every Compiler in the GCC translates a source file into an assembler file. A specific compiler is defined by the language of the source file and the architecture the assembler file is written for. The compiler has nothing to do with assembling and linking. This is done by the extra tools as (for assembling) and ld (for linking). To make compiling more convenient to the programmers, the compilers usually call the assembler and linker for us by default. The data flow is shown in Figure 2.2. The Compiler itself is divided into three parts. Front-, Middle- and Back-End. The Front-End defines what language the compiler compiles and transforms it into the GCC internal language GIMPLE. The Middle-End is the same for all compilers. It does neither depend on the language nor on the target platform. Its job is to perform non-platform specific optimizations. The Back-End does the translation from GIMPLE into the Register Transfer Language (RTL) and later to assembler code for the target platform. 2. Background Info 5 The only important thing to know is that they use GIMPLE. This means changes in the Front-End will automatically be optimized and will be platform independent GCC Front-End A GCC Front-End is a subdirectory in the gcc-src/gcc/ in which at least the following files exist: Make-lang.in The idea of this file is to know how to compile the new Front-End. This file will be included in the main makefile. So we can not expect it to execute in the local directory. It is mandatory to use the full path. config-lang.in It is included by the gcc-src/gcc/configure shell script. It defines for example the name of the language and of the compilers that get created. lang-specs.h Specifies informations about the language we are writing a Front-End for. For example the file name extensions that are expected for input files. The best way to understand the content of these files is to read their equivalent in the existing language directories. The Front-End is where we do all our changes in this project. Since the GCC 4 the design of the Front-End is standardized. First the Front-End converts the source file into a language specific GENERIC-tree. Then the Gimplifier converts this GENERIC-tree into a GIMPLE-Tree. The Gimplifier will work out of the box if all language specific GENERIC expressions have a defined way to convert them into GIMPLE expressions. We will talk more about GIMPLE and GENERIC in Section Figure 2.2: GCC Architecture GIMPLE / GENERIC GIMPLE and GENERIC are GCC internal, language independent tree representations of a source code. Like a language independent Abstract Syntax Tree (AST). GENERIC is a superset of GIMPLE. GIMP LE GENERIC This allows us to use the rich language GENERIC (similar to Java) to represent the functionality we want to compile. Later, when we do optimization, we can assume the simpler language GIMPLE 6 2. Background Info (similar to C) which is easier to optimize and to convert to a Static Single Assignment (SSA) compliant form. The process of converting GENERIC to GIMPLE is called Gimplification. Gimplification is done by the function gimplify function tree which takes a FUNCTION DECL as argument. That parameter shows that GIMPLE and GENERIC work with functions as code unit. To make it easier for the programmer, GENERIC provides some extensions to represent the object oriented world. gcc-src/gcc/tree.def contains all expressions and a lot of comments to explain each expression The GCJ Front-End GCJ is the Front-End for Java. GCJ does not compile the Java language. It uses the Eclipse Complier to to that. GCJ takes.class or.jar files and compiles them to executable programs. An other specialty is that Java does not only need a compiler, but also a runtime environment. The GCJ does not compile this runtime environment into the created binary, but creates a library called libgcj which is the same for all Java executables libgcj The libgcj library is written in C++ and can be found at gcc-src/libjava. The Library exports the functions with names starting with Jv or Jv as described in gcc-src/libjava/libgcj.ver. A good overview over the provided interface can be achieved by reading the file gcc-src/libjava/libgcj bc.c. It is used as a dummy library for linking the Java library itself. The library provides three main tasks: Standard Classes Java has a lot of standard classes like java.lang.object or java.lang.thread. These classes are available in libgcj. The makefile which produces this library will notice when a changes is made in the code of the library and recompile it, but it will not realize when the compiler has changed. Therefor the whole library has to be recompile after something has changed in the GENERIC code the compiler creates. Garbage Collector The Boehm Garbage Collector (GC) [4] is used. Its code can be found in gcc-src/boehm-gc. The boehm-gc folder contains a specially tuned interface for the GCJ. It is important to know that there is no need to maintain link-tree informations to the Boehm GC. The Boehm GC is a conservative GC. It assumes all the allocated data as possible pointers. If it can not find the address of an allocated memory area in any of the allocated memory areas, it frees the memory and starts again. Complex JVM commands The JVM supports complex commands like monitorenter or new. It would be complex to implement these commands in GENERIC. Thats why there are functions, written in C, in libgcj which do their tasks. In the GENERIC code these functions need to be called. I will explain that in detail in Section A.2. 2. Background Info Standard Java Classes There are multiple subfolders of gcc-src/libjava where we can find standard Java classes: classpath This is the unmodified Classpath project. The majority of files in this directory are.java files. If GCJ does not change the implementation of these classes, they become part of libgcj. java,javax,sun,org The majority of files in these folders are.h files. By creating C++ header files, it is possible to call any Java method from C++ code. But there are some.java and nat*.cc files. The.java files overwrite the.java file in the classpath directory. The nat*.cc files implement the native methods of a class in C++. Because Java has no special commands to create new threads or do the I/O, the native methods are an important tool in Java. When we edit native classes, we can access all Java classes. But although we are writing C++ code, we are not able to access the C++ stdlib. 3 Design The objective of this project is to reduce the complexity of distributed application programming to the complexity of parallel multi threading in a shared memory system. The JVM [8] is a potentially parallel multi threading system which is widely used and has a defined consistency model 1 for its shared memory. The JVM in combination with Javas standard library provides all abstractions we need for the Distributed Virtual Machine (DVM) we want to design like networking, hardware, file system and more. To provide the JVM abstraction on top of a distributed system we have to meet the following challenges: Shared Memory A distributed system has no global shared memory, but the JVM requires a memory that is shared among all threads. We provide an object based global distributed shared memory based on eager object migration. Eager object migration means that a node receives exclusive write permission to an object as soon as the node wants to write to that object and it keeps that permission until an other node wants to write to the object. Eager object migration has the advantage over remote updates that multiple updates on an object trigger network traffic only once. We exploit that many objects are only accessed by one thread which makes it unnecessa
Search
Similar documents
View more...
Related Search
We Need Your Support
Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

Thanks to everyone for your continued support.

No, Thanks