Documents

A Timed Petri-Net Model for Loop Scheduling

Categories
Published
of 20
All materials on our website are shared by users. If you have any questions about copyright issues, please report us to resolve them. We are always happy to assist you.
Related Documents
Share
Description
For
Transcript
  IntheProceedingsofthe12t h InternationalConferenceontheApplicationandTheoryofPetriNets,Gjern,Denmark,June26{28,1991,pp.22{41. ATimedPetri-NetModelforLoopScheduling  GuangR.GaoYue-BongWongQiNing  SchoolofComputerScience McGillUniversity 3480University Montreal,QuebecH3A2A7  Abstract Mostscienticcomputationsspendasignicantpartoftheirtimeintheexecutionofloops.Asaresult,compilerdesignersforhigh-performancecomputerarchitectureshavebeenfocusingtheirattentionontheeectiveandecientexploitationofparallelismwithin loops.Inthispaper,weusedataowgraphsasourprogramrepresentationforaclassofloops,andwedevelopatimedPetri-netmodeltomodeltheseloops.The behaviorgraph  ofthePetrinetisusedtodetermine,atcompiletime,arepetitiveexecutionpattern.Suchapattern,oncedetected,canbeusedtogenerateparallelinstructionsscheduleduringcodegeneration.Weshowthat,foranidealmodel,afterthe earliestringschedule  isrelaxedtosatisfy aninitialtoken-distributionconstraint,therepetitivepatternofaloopcanbedetectedin  O  ( n  2 )iterations.Thisimprovesconsiderablythepolynomialboundreportedearlier17].Furthermore,weshowthatourresultsapplytoallnodesonoroanycriticalcycles.Wehaveexaminedtheapplicationofthebehaviorgraphapproachtoatargetpipelined architecturewithmultiplecleanexecutionpipelines.Anumberoftypicalbenchmarks(loops)inscienticprogramshavebeensimulated,andwendthattherepetitivepatternscan bereachedveryfast.Thisveriesthefeasibilityofemployingtheproposedmethodina compiler. 1Introduction  Mostscienticcomputationsspendasignicantpartoftheirtimeintheexecutionofloops.Asa result,compilerdesignersforhigh-performancecomputerarchitectureshavebeenfocusingtheirattentionontheeectiveandecientexploitationofparallelismwithinloops.Unfortunately,manyoptimizingcompilersrelyonad-hocschemestohandleloopparallelismandsofarhave achievedonlylimitedsuccess.Incontrast,thetechnologyandarchitecturesforhigh-performance machineshavebeenadvancingrapidly,andaggressivene-grainloopschedulingtechniquesare neededtotakeadvantageoftheextensiveparallelismavailableinthesemachines.Thispaperwaspartiallyinspiredbyrecentworkin  softwarepipelining  proposedforne-grain loopscheduling.Softwarepipeliningperformsloopschedulingbycomputingastaticparallelscheduletooverlapinstructionsofaloopbodyfromdierentiterations.Anadvantageofsoftware 022   pipeliningisthatitprovidesadirectwayofexploitingparallelismacrossalliterationsofthe loop.Thisisachievedwithouttheexplicituseofloopunrolling,resultinginhighlycompactobjectcodes.Softwarepipelininghasbeenproposedforsynchronousparallelmachinesaswellaspipelinedmachines1,2,3,10,11,22].Softwarepipeliningrequirestheexploitationofparallelismfromapartiallyorderedsetofoperationsoftheloopbodywhicharetobeperformedrepetitivelyoverasequenceofiterations.Whenaloopcontainsloop-carrieddependences,thedatadependencegraphoftheloopisnolongeracyclic.Therefore,itisimportantthattheexecutionoftheloopcanbemodeledatcompiletime,andasteadystatepatterncanbederivedforcodescheduling.Toeectivelymodelthecode schedulingprocessforsuchloopsistheprimaryobjectiveofthispaper.Thisworkisrelatedtoourworkin  dataowsoftwarepipelining  14,15,16].Dataowsoftware pipeliningisacompilermethodforstructuringthene-grainparallelisminloopsthataretobe executedbyadataowcomputer.Thegeneralizationofourearlierwork,tocoverloopswith loop-carrieddependenciesaswellastopipelinedmachineswithresourceconstraints,motivatesustondamodelforne-grainloopscheduling.Weusedataowgraphsasourprogramrepresentationforaclassofloops,andwedevelopa timedPetri-netmodeltomodeltheseloops.The  behaviorgraph  ofthePetri-netmodelisused todetermine,atcompiletime,arepetitiveexecutionpattern.Aftertheexecutionpatternisdetermined,thecompilercanthengenerateatime-optimalscheduletoguidecodegeneration.Therefore,ourworkhasprovidedamodelforsoftwarepipeliningmentionedearlier.Inacompanionpaper,wehadestablishedan  O  ( n  3 )iterationsasthepolynomialboundforthe schedulingpatterntooccurinaloopwithsinglecriticalcycleundertheearliestringschedule fortheidealmachinemodel17].Inthecaseofloopswithmultiplecriticalcycles,wewereonly abletoshowthatapolynomialtimeboundexistsfornodesonthecriticalcycles.Inthispaperwe showthat,aftertheearliestringscheduleisrelaxedtosatisfycertaininitialtoken-distribution constraint,therepetitiveexecutionpatternofaloopcanbedetectedin  O  ( n  2 )iterations.Thisimprovesconsiderablythepolynomialboundreportedpreviously.Furthermore,weshowthatourresultsapplytoallloopswithsingleormultiplecriticalcycles,includingnodesonoro criticalcycles.Wehavealsoexaminedtheapplicationofthebehaviorgraphapproachtoatargetpipelinedarchitecturewithmultiplecleanexecutionpipelines. 1 Wesimulatetheexecutionofa numberoftypicalbenchmarks(loops)fromscienticprogramsandfoundthatrepetitivepatternscanbereachedveryfast.Thisveriesthefeasibilityofemployingsuchmethodsinacompiler.Section2denesthetimedPetri-netsandreviewsthebasictheoriesontimedmarkedgraph(foranintroductiontobasicPetri-nettheory,see26,28,27]).Section3denesaclassofloopsknown asastaticdataowsoftwarepipeline(SDSP).Thisclassincludesloopsbothwithandwithoutloop-carrieddependences.Inthethirdsectionwealsodescribehowtoobtainacorresponding Petri-netlooprepresentation,SDSP-PN.InSection4,thetechniqueforconstructingthebehaviorgraphtoobtainsteady-statebehaviorforanSDSP-PNoperatedundertheearliestringruleisdiscussed.InSection5weshowthatsteady-statebehaviorfortheidealmachinemodel,SDSP-PN,canalwaysbereachedinapolynomialnumberofsteps.InSection6wecomposeanewmodel,calledSDSP-MCP-PN,integratingtheconceptofaclean-pipelinedprocessorarchitecture(with multiplecleanpipelines,MCP)intothebasicSDSP-PNmodel.WethenprovideexperimentalevidencetoshowthatthecyclicfrustumforbothanSDSP-PNanditsextensionSDSP-MCP-PN canbequicklyreachedforasetofLivermoreloops.Finally,ourconclusionsarepresentedin  1 Apipelineis clean  ifitisfreeof structuralhazards |resourceconictsarisewhenthehardwarecannotsupportsimultaneousoperationsbytwopossibly-independentinstructions20]. 023   Section7. 2ThePetriNetModel  2.1TimedPetriNets  AddingthenotionoftimetothebasicPetri-netmodelenablesthecharacterizationofsystem performance.Inthispaperweassignadeterministictime,expressedbyanon-negativeinteger,toeachtransitioninthebasicPetri-net.ThemodeldescribedbelowismadeupoftheoriginaltimedmodelintroducedbyRamchandani28]andtheconceptofinstantaneousstatesubsequently developedbyChretienne7].Formally,a  timedPetrinet isdenedbyapair( PN  ,),where  PN  isthebasicPetri-nettuple ( P;T;A  ). P  isanon-emptysetofplacesdenotedby  f  p  1 ;p  2 ;:::;p  n  g  , T  isanon-emptysetoftransitionsdenotedby  f  t 1 ;t 2 ;:::;t m  g  ,and  A  isanon-emptysetofdirectedarcssuchthat P  6 =  ;  , T  6 =  ;  , P  \  T  =  ;  , A    P    T    T    P  .Pictorially, P  , T  ,and  A  arerepresentedbycircles,bars,anddirectedarcs,respectively.Thesymbolisafunctionthatassignsannon-negativeinteger   i toeachtransition  t i inthenet.Thevalue    i denotesthe  executiontime  (ortheringtime)takenbytransition  t i .ThestateofthetimedPetrinetattime  u  isnolongerdescribedonlybythecurrentmarking attime  u  ( M  u )becausesometransitionsmaystillbeprocessingattime  u  .Anewconceptof residualringtimevector  , R  ,isintroducedtokeeptrackofon-goingexecutionsateachtimestep. R  u ( t i )storestheremainingexecutiontimeoftransition  t i attime  u  .Accordingly, M  u and  R  u togetherdenethe  instantaneousstate  ofatimedPetrinet.Wehavealsomadethefollowingtwo assumptionsregardingtheringruleofenabledtransitions: Assumption2.1.1  Twodistinctringsofthesametransitionscannotoverlap.Toformally enforcethisrule,eachtransitioninthenetisassignedadistinctself-loopofitsownwithonlyone tokenonit.Thoughwedonotdrawthemexplicitly,theyareimplicitlyassumed.Thisassumption isalsoknownas  simple-serversemantics . Assumption2.1.2  Transitionsareredassoonastheyareenabled.Thishasbeencalledthe  earliestringrule  . 2.2MarkedGraphs  ThissectiondenesandreviewspreviouslyknownresultsforaclassofPetrinetsknownas marked graphs  8].Thesegraphsareimportanttothedevelopmentofourworks. Denition2.2.1  APetrinet PN=( P;T;A  ) iscalledamarkedgraphifandonlyifthereisone inputtransitionandoneoutputtransitionforeachplacein  P  . Theorem2.2.1  Amarkingisliveifandonlyifthetokencountofeverysimplecycleispositive. 2 Theorem2.2.2  Alivemarkingissafeifandonlyifeveryedgeinthegraphisinasimplecycle withtokencount1. 2 Asimplecycleisadirectedpath  p  i t j p  k :::t l p  m  suchthatallplacesandtransitionsaredierentexcept p  i and  p  m  . 024   Theorem2.2.3  If    isacyclicringsequencesuchthat M    !  M  ,alltransitionshavebeenred anequalnumberoftimes. 2.3OptimalComputationRate  TimedPetrinetshavebeenappliedinthestudyofconcurrentsystemstodeterminethe  compu-tationrate  (orequivalentlythe  cycletime  )whichdescribesthenumberofringsofatransition perunittimeasthemodelingsystemisoperatingatitsmaximumrate.Listedbelowisareview ofresultsregardingthecycletimeofatimedmarkedgraph27]: Denition2.3.1  Thecycletimeoftransition  t i isdenedas  lim  n  !1  X  n i n  where  X  n i isthetimeatwhichtransition  t i initiatesits  n  -thexecution.   Thenumberoftokensinasimplecycleremainsthesameafteranyringsequence.   Alltransitionsinamarkedgraphhavethesamecycletime.   Thecycletimeiscomputedby    =max  (  ( C  k ) M  ( C  k ) ; ( t i ) )  where  k  =1  ; 2  ;:::;q  and  t i 2  T  ;( C  k )=  P  t i 2  C  k ( t i )=sumoftheexecutiontimesofthetransitioninsimplecycle  C  k ; M  ( C  k )=  P  p i 2  C  k M  ( p  i )=totalnumberoftokensinsimplecycle  C  k ; q  =numberofsimplecyclesinthenetexcludingtheself-loopimplicitlyassumedforeach transition;thecycletimeofeachself-loopisreectedby( t i ) ; 8  t i 2  T  .   Thecomputationrate    ofatransitionistheaveragenumberofringsofthattransitionin unittimeandiscomputedbythereciprocalofthecycletime.   =min  (  M  ( k )( C  k ) ; 1 ( t i ) )  where  k  =1  ; 2  ;:::;q  and  t i 2  T    Thesimplecycle  C  k whichgivesthemaximumcycletimeorequivalentlytheminimum computationrateisknownasthe  criticalcycle  .Thecycletimeofatimedmarkedgraphcanbeobtainedbyenumeratingeverysimplecycle inthegraph;however,thetimecomplexityoftheenumeratingprocesscanturnouttobeex-ponentialbecausethereexistsamarkedgraphwithanexponentialnumberofsimplecycles23].Amoreecientapproachcanbeobtainedbyformulatingthecycletimeproblemintoalinearprogrammingproblem,whichhasatheoreticalpolynominalbound23].025 
Search
Tags
Related Search
We Need Your Support
Thank you for visiting our website and your interest in our free products and services. We are nonprofit website to share and download documents. To the running of this website, we need your help to support us.

Thanks to everyone for your continued support.

No, Thanks