
The Problem with Data -- about data gravity in computing clouds

Description
Utility Computing requires levels of efficacy and abstraction that are not matched by modern computing clouds. We argue that the lack of harmonization of data and computation is holding back computing clouds from evolving further into utility clouds.
Transcript
  • 1. The Problem with Data -- about data gravity in computing clouds. 2nd International Workshop on Big Data, London, 9 December 2014. Coral Walker, Joerg Fritsch
  • 2. Utility Computing requires levels of efficacy and abstraction that are not matched by modern computing clouds.
  • 3. Agenda 1. Let’s have a look at Data! 2. Functional Programming Languages, Data Flow Languages or Stream processing to the rescue? 3. Fixing Data?
  • 4. Let’s have a look at Data!
  • 5. The three dimensions of Data • Variety • Range of data types and sources • ALL data has structure, but structure may not be discovered yet at time of ingestion • Velocity • For example: social networking feeds, multi-media streams (audio, video) • 2013: Internet consisted of 640TB of data in motion per minute • Volume • Big Data because of impressive volume • Map Reduce Framework parallelized analytics → Hadoop • Distributed queries
  • 6. The three dimensions of Data Data can be challenging because of any of these dimensions or a combination of several dimensions.
  • 7. Data Gravity • Data has gravitational pull. It pulls computation to it. For example: Map Reduce on Hadoop. • However, computing clouds centralize and rationalize computation! • The focus on computation goes “all the way down” to the CPU → no data-centric improvements in recent years (except AES-NI, maybe). • The implications of data for the silicon (CPUs, ASICs, …) have been little investigated. • But this holds us up!
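The Map Reduce example on this slide can be sketched in a few lines. This is a minimal, single-process illustration and not Hadoop's API: `PARTITIONS`, `map_on_node`, `reduce_counts` and `word_count` are hypothetical names, and each inner list merely stands for data stored on one node. The point is that the function travels to each partition and only a small summary travels back.

```python
from collections import Counter
from functools import reduce

# Hypothetical partitions: each inner list stands for data stored on one node.
PARTITIONS = [
    ["cloud data cloud", "data gravity"],
    ["utility cloud", "data data"],
]

def map_on_node(lines):
    """Runs where the data lives; returns only a small summary (word counts)."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def reduce_counts(partials):
    """Merges the small per-node summaries in one central place."""
    return reduce(lambda a, b: a + b, partials, Counter())

def word_count(partitions):
    # Ship the function to every partition instead of shipping the raw lines.
    return reduce_counts(map_on_node(p) for p in partitions)

totals = word_count(PARTITIONS)
print(totals["data"], totals["cloud"])
```

Only the `Counter` objects cross the (imagined) node boundary; the raw lines never move.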
  • 8. Why is better integration with Data essential? The lack of harmonization of data and computation is holding back computing clouds from evolving further into utility clouds.
  • 9. Functional Programming Languages, Data Flow Languages or Stream processing to the rescue?
  • 10. Stonebraker’s eight: eight criteria to excel in processing data in motion 1. Keep data moving 2. SQL on streams 3. Handle stream imperfections 4. Predictable outcome 5. High availability 6. Stored and streamed data 7. Distribution and scalability 8. Instantaneous response
  • 11. In-depth assessment: Functional Programming (columns: Stonebraker’s requirement; FPL, for example Haskell; required add-ons, examples) • Keep data moving: Messaging, in-memory computation • SQL on streams: FPLs and SQL are declarative; Parser, Tokenizer and Interpreter • Handle stream imperfections: Currying potentially decouples space and time; Decoupling across all layers, Tuple Space (?) • Predictable outcome: Evaluation eventually ends • High availability: Application Containers (?) • Stored and streamed data: Functional Reactive Programming, Map Reduce; Lambda Architecture (?) • Distribution & scalability: For example: Currying, Code maintainability; Means of coordination, Tuple Space, LINDA • Instantaneous response
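One way to read the claim that currying decouples space and time is that a curried function accepts its arguments one at a time, so each argument can be supplied by a different component at a different moment. The sketch below is in Python rather than Haskell (where functions are curried by default); `curry3` and `route` are hypothetical helpers written for this illustration.

```python
def curry3(f):
    """Turn a three-argument function into a chain of one-argument functions."""
    return lambda a: lambda b: lambda c: f(a, b, c)

def route(region, service, payload):
    """Hypothetical handler that needs three pieces of information."""
    return f"{region}/{service}: {payload}"

curried_route = curry3(route)

# Step 1: the region is known at deployment time.
in_eu = curried_route("eu")
# Step 2: the service is bound later, possibly by a different component.
eu_billing = in_eu("billing")
# Step 3: the payload arrives last, when the event actually occurs.
result = eu_billing("invoice-42")
print(result)
```

The intermediate values `in_eu` and `eu_billing` are ordinary functions, so they can be stored or passed around until the remaining arguments become available.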
  • 12. Dataflow Programming • Started in the 1970s • Academic research focuses on dataflow programming as an abstraction to model parallel programs as a dataflow graph (DFG). • Dataflow programming languages are very close to FPLs! • Commercial research focuses on stream processing models.
  • 13. Stream Processing • Operates in real time, for example on online advertising, sensor data, multimedia streams. • Time complexity O(N log N). • Reduces the required hardware base and energy consumption → computation happens in transit, not where the data is terminated. • Supports recursion and machine learning (ML) → Map Reduce needs workarounds to support recursion and ML. • Not invasive, no change in programming model → Map Reduce required a change to batch mode.
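The idea that computation happens in transit can be illustrated with a generator pipeline: every item is processed as it passes through, and nothing is stored beyond a small window. This is a minimal sketch, not a stream processing engine; `sensor_readings` is a hypothetical stand-in for an unbounded feed, and the window size is an arbitrary choice.

```python
from collections import deque

def sensor_readings():
    """Hypothetical source; stands in for an unbounded feed."""
    yield from [3, 5, 4, 8, 6]

def moving_average(stream, window=3):
    """Emit the mean of the last `window` items as each item passes through."""
    recent = deque(maxlen=window)  # old items fall out automatically
    for value in stream:
        recent.append(value)
        yield sum(recent) / len(recent)

averages = list(moving_average(sensor_readings(), window=3))
print(averages)
```

Memory use is bounded by the window, regardless of how long the stream runs.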
  • 14. • Observation: All programming languages and paradigms have missing pieces and cannot match Stonebraker’s eight. • Assumption: The eight requirements should be matched by an architecture rather than by a programming language.
  • 15. Fixing Data?
  • 16. Fixing Data by eight Principles? (1) • Associative lookup should be preferred. • For example, associative lookup is used in: Data Flow Programming; Content Addressable Memory (CAM) of network switches and routers; Tuple Spaces • Our architecture is based on a Tuple Space, thus we broaden the applicability of associative lookup.
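Associative lookup means that data is retrieved by its content rather than by an address or key. The toy tuple space below sketches this in the style of Linda; it is a single-process illustration and not the authors' architecture. The class and method names are assumptions (`take` stands for Linda's `in`, which is a reserved word in Python), and unlike real Linda the read operations return `None` instead of blocking.

```python
class TupleSpace:
    """Toy, single-process Linda-style space; None in a pattern is a wildcard."""

    def __init__(self):
        self._tuples = []

    def out(self, *fields):
        """Write a tuple into the space."""
        self._tuples.append(tuple(fields))

    @staticmethod
    def _matches(pattern, candidate):
        return len(pattern) == len(candidate) and all(
            p is None or p == c for p, c in zip(pattern, candidate)
        )

    def rd(self, *pattern):
        """Return the first matching tuple without removing it, else None."""
        for candidate in self._tuples:
            if self._matches(pattern, candidate):
                return candidate
        return None

    def take(self, *pattern):
        """Return and remove the first matching tuple, else None."""
        found = self.rd(*pattern)
        if found is not None:
            self._tuples.remove(found)
        return found

space = TupleSpace()
space.out("event", "login", "alice")
space.out("event", "logout", "bob")
first_login = space.rd("event", "login", None)
taken = space.take("event", None, "bob")
print(first_login, taken)
```

Neither the writer nor the reader needs to know where a tuple is stored; the pattern alone identifies it.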
  • 17. Fixing Data by eight Principles? (2) • The fabric must be a dynamic, scalable distributed system. • No need to explain :D • Key requirements: the framework/architecture should be asynchronous (later we will use the UDP protocol); shared nothing; elastic; “green”! (not addressed in our research)
  • 18. Fixing Data by eight Principles? (3) • Next-gen platforms must handle stored data and streamed data. • Streamed data, for example events, needs to find an encapsulated app to get processed. In this case the encapsulated app, which is data as well, has the higher gravitational pull and attracts the event data. • Stored data, for example larger (file) objects, has a high gravitational pull and brings services temporarily close to it for the time needed to process it.
  • 19. Fixing Data by eight Principles? (4) • A global name space to virtualize data objects and apps must be provided. • Dedicated name spaces isolate resources (for example: jailing apps, Linux containers, network name spaces). • Too many dedicated name spaces may need links that are too expensive. For example: JSON/serialization, (transmission) protocols, etc. → there may not even be asynchronous interaction between name spaces!
  • 20. Fixing Data by eight Principles? (5.1) • Transmission protocols are the main cost center. • Data transfer and message passing should be optimistic and based on the UDP protocol, preserving the asynchronous character of all components and communications. • UDP is often disputed, but look for yourself (next slide!) • Shared nothing, asynchronous, … remember what we said two slides ago?
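Optimistic, UDP-based message passing can be sketched with the standard `socket` module. This is a loopback illustration under stated assumptions, not the authors' protocol: `send_and_receive`, the message format and the one-second timeout are all hypothetical. The sender performs no handshake and waits for no acknowledgement, and the receiver treats a missing datagram as a normal outcome.

```python
import socket

def send_and_receive(message: bytes, timeout: float = 1.0):
    """Fire one datagram at a local receiver; return it, or None if it was lost."""
    receiver = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sender = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        receiver.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        receiver.settimeout(timeout)
        # No handshake and no acknowledgement: the sender never waits for the peer.
        sender.sendto(message, receiver.getsockname())
        try:
            data, _addr = receiver.recvfrom(4096)
            return data
        except socket.timeout:
            return None  # optimistic: loss is tolerated, not prevented
    finally:
        sender.close()
        receiver.close()

reply = send_and_receive(b"tuple:event,login,alice")
print(reply)
```

Any retry or ordering logic would have to live in the application, which is the trade-off the slide accepts in exchange for asynchrony.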
  • 21. Fixing Data by eight Principles? (5.2)
  • 22. Fixing Data by eight Principles? (6) • Declarative programming, such as in SQL and FPLs, is preferred. • FPLs bring a lot to the table, too much to ignore (see previous section “In Depth assessment: Functional Programming”).
  • 23. Fixing Data by eight Principles? (7.1) • The emulation of traditional tier-based computing needs to be removed from computing clouds and replaced with a unified fabric. • Modern computing clouds have no affinity to traditional tier-based computing, but they emulate it. • Concept of tiers has been around since 1998 • Costly serialization (of data) required at every system boundary → latency! • Often depicted with three simple tiers: web server, application server and data(base) • Many more devices & protocols involved: redundant load balancers, spanning tree, etc.
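The serialization cost at every system boundary can be made concrete by counting how often one request is encoded and decoded on a round trip through three tiers. This is a simplified sketch: JSON is assumed as the wire format, and `cross_boundary` and `three_tier_request` are hypothetical names; real deployments add load balancers and further hops.

```python
import json

def cross_boundary(obj, log):
    """Simulate one system boundary: serialize, 'transmit', deserialize."""
    wire = json.dumps(obj)   # encode on the sending tier
    log.append(len(wire))    # characters that would travel over the link
    return json.loads(wire)  # decode on the receiving tier

def three_tier_request(record):
    """Round trip: web server -> application server -> database and back."""
    log = []
    at_app = cross_boundary(record, log)            # web -> app
    at_db = cross_boundary(at_app, log)             # app -> database
    back_at_app = cross_boundary(at_db, log)        # database -> app
    back_at_web = cross_boundary(back_at_app, log)  # app -> web
    return back_at_web, log

result, wire_sizes = three_tier_request({"user": "alice", "items": [1, 2, 3]})
print(len(wire_sizes), "serialization round trips for one request")
```

The payload comes back unchanged, yet it was encoded and decoded four times; a unified fabric aims to remove exactly these steps.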
  • 24. Fixing Data by eight Principles? (7.2) • To date: not many alternatives • Space-based architectures • GigaSpaces • TIBCO ActiveSpaces • Notion of a one-stop shop • Networks → L2 Ethernet fabrics • Networks → Integrated packet processing • Space-based architectures and L2 Ethernet fabrics use associative lookup → see Principle 1
  • 25. Fixing Data by eight Principles? (8) • Next-generation cloud computing platforms need to deliver abstract services, not limited to web services. • Limiting the platform to web services would mean that the future platform is merely Software as a Service (SaaS). • Consumers need so much more!
  • 26. Pulling it all together: architecture.
  • 27. Thank You
  • 28. Spare slides
  • 29. Functional Programming (for example: Haskell) • Asynchronous operations; Parallel, multi- & many-core support; Elasticity & large-scale operations; Secure, multi-tenancy, confidentiality • Immutable data. Shared nothing. • Message passing (e.g. actors) available to re-synchronize processes • STM better manageable than locks. • FPLs are inherently parallel. Functions, Closures, Currying • Declarative → Compiler has freedom to re-arrange “everything” • Elasticity is left to the developer or to the “app engine” • Code easily testable & maintainable • No → “Safe Haskell” may be a good start.