Celery: Should map and starmap be renamed?

Created on 24 Feb 2014 · 3 Comments · Source: celery/celery

I don't think these two primitives live up to what their names imply in the context of a "distributed task queue". I understand why someone might want the map and starmap functions, since they create a single task, but I don't see the advantage: it would be trivial for the user to write a task that simply accepts a list of inputs. To me the term "map" here implies something like a map-reduce algorithm, which this is definitely not. A chord is effectively a form of map-reduce, which makes the naming all the more baffling.

I think the name chord should stay, as it makes enough sense, but having map and starmap suggests that there is a map-reduce function, IMO.

http://docs.python.org/2/library/multiprocessing.html#using-a-pool-of-workers


budlight

Most helpful comment

group is the distributed map function.

The term map was in use long before Google wrote the MapReduce paper, and I don't think anyone is confused by it when it is used in Haskell or Clojure. The functionality is perfectly expressed by the term 'map', and I don't think there are any natural alternatives.

The canvas primitives are also all nouns (signature, group, chord, chain), but map is used as a verb (task.map), not the thing tourists are sometimes seen with.

MapReduce frameworks will also not normally have disconnected map and reduce stages; instead you have a mapreduce operation that takes a Mapper and a Reducer, where the processed data is streamed into the reducer. In fact, simply having map() and reduce() is not considered to be sufficient for MapReduce.

So a chord is not really a form of map-reduce; it's a distributed version of a barrier, and the name is taken directly from such a barrier in Cω.

group is not taken from anywhere, but the operation is the same as what is often called 'parallel map' in the concurrency literature, just in distributed form. Plain map, therefore, is usually considered to be sequential, not parallel.

ask on 6 Nov 2014

👍2

All 3 comments


And they are useful because they let you decrease the granularity of an operation simply by using task.map(list) instead of group(task.s(i) for i in list).

ask on 6 Nov 2014

Except that task.map(list) doesn't allow the tasks to run concurrently. All of the tasks run on the same worker one after another. If this shouldn't be the case I can open up a new issue.
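
To show the difference I mean, here is a plain-Python analogy rather than Celery itself. slow_square and the thread pool are stand-ins I made up for a task and a set of workers, so treat this as a sketch of the behaviour, not of Celery's implementation.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def slow_square(x):
    # Stand-in for a task that takes a noticeable amount of time.
    time.sleep(0.1)
    return x * x


items = [1, 2, 3, 4]

# Analogue of task.map(items): one unit of work loops over the items in order.
start = time.perf_counter()
sequential_results = [slow_square(i) for i in items]
sequential_elapsed = time.perf_counter() - start

# Analogue of group(task.s(i) for i in items): one unit of work per item,
# each free to run on a different worker at the same time.
with ThreadPoolExecutor(max_workers=len(items)) as pool:
    start = time.perf_counter()
    parallel_results = list(pool.map(slow_square, items))
    parallel_elapsed = time.perf_counter() - start

print(f"sequential: {sequential_elapsed:.2f}s, parallel: {parallel_elapsed:.2f}s")
```

Both forms produce the same results; only the elapsed time differs, since the first pays for every item back to back.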