-
Notifications
You must be signed in to change notification settings - Fork 0
Spec standard Issues List
This is a place to accumulate a list of design issues and our choices. When you fill in your proposals under each entry, please label it with your name and initials.
(KK) Proposal:
-
Don’t try to figure out what is the standard, what are restrictions and what are extensions. Rather, first focus on the set of possible features (semantics and syntax). Only after that should we sort them out.
-
Determine for each, the reasons we are even talking about it:
- Support for optimizations
- Support for error checking
- Functionality
- Ease of use
- To help guide the process maybe we should distinguish amomg a few variant categories. Here is a starter list:
- Close to current language
- New user
- Wild west
- Maximal error checking
- Maximal optimization potential
- Best software engineering
- Most functionality
(VS) We should have a standardized textual representation of a CnC graph language from which "glue code" can be generated for different runtime systems
- Ease of use
- Support for optimizations
- Error checking
(VS) The graph language should include item collections and step collections, but not control tag collections. (There is no* loss of generality in excluding tag collections.) The prescribe relationship should be specified directly between pairs of step collections as in the Reduced CnC proposal. The possible edges in the graph are as follows:
(Step1)-->[Item2] // put operation
[Item2]-->(Step3) // get operation
(Step1)::(Step2) // prescribe operation
(FS) prescribing steps by steps could be optional. Not having tag-collections removes the valuable mediator between controller and controllee and only leaves a mediator for data. Tag-collections allows cleaner and more flexible SW engineering.
(VS) Also consider the possibility of including a step dependence (before) operation, e.g., (Step1:i-1,j)-->(Step3:i,j), which serves as an explicit synchronization between steps without an intervening item collection. It should not be needed if users follow a pure CnC model of only accessing sharing data in item collections, but is useful when users would like steps to share global data structures that are not in item collections.
(RN) I strongly feel that a 'before' relation should not be specified with an
arrow. I think there's no reason that our spec language needs to be
punctuation-only with no keywords. I find (Step1:i) before (Step2:i)
much
more readable. Also, could someone write down a psuedocode algorithm
demonstrating how to desugar CnC programs wih before constraints into those with
dummy items. I'm worried about difficult corner cases for this algorithm. For
example:
(Step1: i) before (Step2: 3)
This would seem to say that EVERY Step1
must execute before a specific Step2
can execute. How do you know how many instances of Step1
[will] exist?
Probably, the restriction we want is that the step on the right-hand-side of
before is introducing the tag component variables, and any variable not
introduced there is unbound. For example the i
above would be unbound.
(FS) Using shared but not item-collected data could be optional, but should also require being marked explicitly because it limits (or removes) CnC's nice property of being independent of the memory model.
(RN) Regarding tag collections -- I think they should be removed. I think instead we should introduce an optional node type called something a "connector" (or "tee" or anything you like). This node is used to add indirection to ANY edge in the graph. These are basically tag collections, but if we later add reduction collections, they are also useful on item-put edges. (We effectively have two different kinds of put-edges, edges carrying (tag,value) and edges carrying tag only. It's a separate question whether you would want to allow both an item collection and a step collection to be "subscribed" to the same connection point. Presumably the step would only get the tag and ignore the value.)
(RN) Even though syntactic issues are mostly incidental, I think it is an important part of this effort to standardize the details so that our parsers are compatible. For example, where/when are semicolon required at the end of lines? I'd like to raise two small issues here: (1) I think reduced CnC should possible simplify terminology as well as removing tag collections. For example I think that we should consider using only arrows for graph edges, rather that colon-colon. (2) I want to at least have on the table the option of C/Java style syntax for steps/items rather than the Lisp style syntax. I had implemented this and there was never much feedback one way or the other, so I'm not sure how people really feel. For example:
Step1()-->Item2[] // put operation
Item2[]-->Step3() // get operation
Step1()-->Step2() // prescribe operation
TECHNICAL QUESTIONS
How will the step-dependence be reflected in the step code?
SUMMARY
- Ease of use (may improve ease of use for new users)
- Software engineering (may lose some software engineering benefits if a step collection is used as a mediator instead of a tag collection)
- Optimizaitons (harder?)
- Error checking (harder?)
(KK) Is a disconnected graph legal? Is a graph with a step that doesn't produce either tags or items legal?
- Error checking
(FS) It should support tags and items of any type. Supporting a simple expression syntax might require something like a type declaration capability.
(RN) My contention here was always that symbolic tag-functions and tag-function analysis would require standardizing a little bit more (numbers, arithmetic). My other contention was that enforcing that tags are an (ordered) collection of components essentially introduces tuples to the type-system, but ONLY top-level "flat" tuples. A more general type system would allow a tag to be any type, including tuples (which may be nested).
- Ease of use
(VS) All step inputs should be independent i.e., the tag/key for a get operation should only depend on the step's control tag, and not on the value of another get operation. There is no loss of generality with this restriction.
(RN) Is the proposal for this restriction to apply to the core language? Or for this to be a well-defined restriction/variant? I believe this is the same as requiring tag functions for the inbound edges into a step. (At least, as long as tag functions are not allowed to reference item collections contents!) Personally, I don't think that it would be good to enforce this restriction for all (or "default") CnC programs. It sounds awful painful to split steps up at all points they do data-dependent gets.
- Ease of use (harder)
- Optimization
(FS) Do we want the same for put operations? Would be good for tag-function analysis. Would it restrict generality?
- Optimization
- Restrict functionality
(VS) The graph language should require the specification of "tag functions" for get operations i.e., the name of a user-defined function that can be used to compute the tag/key for a get operation as a function of the step's control tag. As an option, we can also support a simple expression syntax that the user can use to specify tag functions directly in the graph language. But they should always have the ability to define a tag function in the host language.
(FS) If we allow variants, it seems natural to start with a generic base-language and let variants be more restricted. Requiring tag-functions in the base-language does the opposite (accounting for the fact that we already have variants which do not require tag-functions).
(FS) Using user-provided functions raises portability issues. A definition of some kind of API/ABI for every "supported" target-language would be nice.
(RN) Just as a syntactic note, Vivek's last statement "they should always have the ability to define a tag function in the host language", implies that we have "foreign" tag functions from the perspective of the spec, example:
foreign int foo(int);
[Item: foo(i)] -> (Step: i);
- Optimization
- Ease of use (harder)
(VS) This should not be required, but a user should optionally be able to specify tag functions for put operations i.e., the name of a user-defined function that can be used to compute the tag/key for a put operation as a function of the step's control tag. The availability (and analyzability) of these tag functions can enable important memory optimizations. As an example, see the following reference:
Declarative Aspects of Memory Management in the Concurrent Collections Parallel Programming Model. Zoran Budimlic, Aparna Chandramowlishwaran, Kathleen Knobe, Goeff Lowney, Vivek Sarkar, Leo Treggiari. Proceedings of DAMP 2009 Workshop (Declarative Aspects of Multicore Programming), co-located with POPL, January 2009.
- Optimization
(VS) It is desirable for a "caller step" to pass additional parameters by value to a "callee step" that it prescribes. The control tag is the only required parameter (like a primary key), whereas additional parameters are optional.
(FS) What's the difference between such a parameter and a tag-component? Is it only that it might not be used in tag-functions?
(RN) I think this is basically an optimization. It allows passing arguments in a register rather than packing them into a tag struct (which gets hashed and all that). BUT, if tag-memoization is turned on then this issue has more significance. Presumably these extra arguments aren't part of the tag -- so what happens wrt memoization? Also
(RN) An alternate formulation would be to say that this is a "tag unpacking" optimization, where rather than a N-component tag (tuple) you pass N arguments. They would still be logically part of the tag. You would probably need some kind of UNPACK pragma in the spec to let the code generator know the calling convention for a particular step implementation.
(RN) However, isn't there a collision here (between tag unpacking and data-gets-as-arguments) where we want to use the extra step-function arguments for two different purposes? In HC-CnC aren't the extra function arguments used to communicate the values of the step's data-gets? To get these two features to compose, what would the rule be? If I mark a step in the spec as both tag-unpacked and receiving-data-gets-as-arguments, does it then receive tag arguments 1..N first followed by data arguments N+1...M??
- Ease of use
(VS) This should not be required, but a user should optionally be able to specify specify "get-count" functions for put operations. As an example, see the following reference:
Folding of Tagged Single Assignment Values for Memory-Efficient Parallelism. Dragos Sbirlea, Kathleen Knobe, Vivek Sarkar. International European Conference on Parallel and Distributed Computing (Euro-Par), August 2012.
- Optimization
(VS) Controlled nondeterminism can be added to the CnC model with the putIfAbsent operation. Should we consider this as an optional feature of the base language?
(KK) I do like the idea of allowing some nondeterminism as long as it's explicit in the graph, not just the step code. The putIfAbsent worked will for the problem of embedding a tree in a graph. But we should consider the problem more broadly. The putIfAbsent looks at the current runtime state. There are other aspects of the state we might consider. One example is getIfPresent. This would allow for a step to proceed with or without a given input. More globally, maybe the graph specifies the general computation but for this run we want the first 10 items produced. I'm just proposing non-determinism that depends on the current execution state, not arbitrary non-determinism. That would be a different topic.
- Functionality
(VS) This should not be required, but a user should optionally be able to specify specify folding functions for data items. As an example, see the Euro-Par 2012 paper referenced above.
(RN) I still need to read the Euro-Par paper, but this does add nondeterminism as well, correct?
- Optimization
(VS) We should discuss adding support for user-defined reductions to the CnC base language. Two proposals from past work can be found in the following papers:
Deterministic Reductions in an Asynchronous Parallel Language. Zoran Budimlic, Michael Burke, Kathleen Knobe, Ryan Newton, David Peixotto, Vivek Sarkar, Edwin Westbrook. The 2nd Workshop on Determinism and Correctness in Parallel Programming (WoDet), March 2011.
CnC-Hadoop: a Graphical Coordination Language for Distributed Multiscale Parallelism. Riyaz Haque, David M Peixotto, Vivek Sarkar. ACM Computing Frontiers 2011.
- Functionality
- Ease of use
- Optimization
(KK)
- Ease of use
- Optimization
(KK)
(FS) Scoping and naming/mapping issues
(FS) tag-space slicing and/or sub-/super-typing
- Ease of use
(VS) We should define a tuning language separately from the base graph language. This can come later, but it's important to consider what requirements the tuning language will impose on the base graph language.
- Optimization
(KK) support multiple implementation of a given step together with some decision procedure.
- Optimization
(KK) This is a non-deterministic feature. A step can perform a computation that and determine that some other step should not run even if its tag is produced. Obviously this will have no result if the step is already complete and it would apply if the step tag has been produced but the step hasn't started. What if the step has started?
- Functionality
(FS) It should be possible to provide implementation specific hints, options, etc.
(RN) This is a highly-relevant implementation specific feature because it affects asymptotic time and space complexity. It is NOT simulatable by using an dummy item collection, however. So I think in that sense it is fundamental.
- not sure what category this is