ShapeWorks: Packaging shapeworkspy and use case restructuring

Created on 14 Dec 2020  ·  13 Comments  ·  Source: SCIInstitute/ShapeWorks

All 13 comments

From #818

Redesign groom utils so it can be run interactively rather than batch-wise.
This will make it so intermediate grooming files don't have to be saved (issue #598) and steps can be skipped (issue #507).

Let's use this issue as the parent/driving issue for shapeworks python packaging and associated use case design. We can add more focused issues later and relate them to this one. I have closed related issues accordingly.

@jadie1 @iyerkrithika21 please join G-C slot to discuss this as part of python APIs. Moved this up on the agenda to allow you to leave earlier if needed.

I'm looking into Python module packaging now. Please ping me if you have suggestions or thoughts.

Some instructions I've found:

So far I'm most interested in conda for its reputedly better dependency specification, but I'd be happy to have anything.
The reason for conda is that we should be able to install everything with this one package: the command line tools, the python module, and Studio. But we'll start with our python module.

My biggest fear is multi-platform issues sucking away our lives, so I'll try to get OSX working first and go from there.

The use cases didn't work for me on Ubuntu 18.04. To get them running, I had to:

  1. Add sys.path.append('../../build/cmake-build-release/bin/') at the top of RunUseCase.py.
  2. Set the environment variable LD_LIBRARY_PATH=../../dependencies/install/lib/ (otherwise it complains about a missing "libvcl.so").
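
The two workarounds above can be sketched in Python as follows. This is only a rough illustration: the helper names `add_build_dir_to_path` and `env_with_library_path` are hypothetical, and the directories are whatever your local build produced (the ones quoted in the comment are used in the usage note). On Linux the dynamic loader reads LD_LIBRARY_PATH when a process starts, so the second helper builds an environment for a child process rather than modifying the running interpreter.

```python
import os
import sys
from pathlib import Path


def add_build_dir_to_path(build_bin):
    """Append the directory holding the compiled Python module to sys.path.

    Hypothetical helper; it does the same thing as the sys.path.append
    line added to RunUseCase.py, but avoids duplicate entries.
    """
    resolved = str(Path(build_bin).resolve())
    if resolved not in sys.path:
        sys.path.append(resolved)
    return resolved


def env_with_library_path(lib_dir, base_env=None):
    """Return a copy of the environment with lib_dir first in LD_LIBRARY_PATH.

    The returned dict is meant to be passed as env= to subprocess.run,
    because the loader only consults LD_LIBRARY_PATH at process start.
    """
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    parts = [str(Path(lib_dir).resolve())]
    if existing:
        parts.append(existing)
    env["LD_LIBRARY_PATH"] = os.pathsep.join(parts)
    return env
```

Assuming the directory layout from the comment, a launcher script could call `add_build_dir_to_path('../../build/cmake-build-release/bin/')` and pass `env_with_library_path('../../dependencies/install/lib/')` as the `env` argument of `subprocess.run` when starting RunUseCase.py.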

To have the option of saving intermediate outputs, can we include a write option within each operation rather than using a separate write/save function?
What I am imagining is:

img.binarize(write=False)
img.resample(write=True).binarize(write=True)

Instead of

img.binarize()
img.write()
img.resample()
img.write()

This will probably need a filename as input argument
e.g., img.binarize(write=True, filename='blabla')

@archanasri @cchriste thoughts?

Image.write is chainable like everything else. Just put it in the chain if
you want it.

img.binarize().write(<path>)
img.resample().write(<path>).binarize()
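
For readers unfamiliar with the pattern, here is a minimal sketch of why write can sit anywhere in a chain: every method returns the object it was called on. `ToyImage` is a hypothetical stand-in written for this illustration, not the ShapeWorks Image class, and it records calls instead of touching image data or the filesystem.

```python
class ToyImage:
    """Hypothetical stand-in used only to illustrate chainable methods."""

    def __init__(self, name):
        self.name = name
        self.history = []  # operations applied so far, in order
        self.written = []  # (path, operations applied at write time)

    def binarize(self):
        self.history.append("binarize")
        return self  # returning self is what makes the call chainable

    def resample(self):
        self.history.append("resample")
        return self

    def write(self, path):
        # Record the request instead of writing a real file.
        self.written.append((path, tuple(self.history)))
        return self


img = ToyImage("ellipsoid_00")
# Write an intermediate result mid-chain, then keep processing.
img.resample().write("resampled.nrrd").binarize().write("binarized.nrrd")
```

Dropping either `.write(...)` call from the chain leaves the remaining operations unchanged, which is the point being made in the comment above.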

I understand that the write function is also chainable. My point in suggesting a write option within each operation was to have just one function and pass a flag saying whether or not we want to save the intermediate images, which would simplify the use cases.
Example pseudocode:

def groom(img, write_flag):
    img.binarize(write=write_flag).resize(write=write_flag).crop(write=write_flag)

groom(img, write_flag=True)
groom(img, write_flag=False)

This way, we can avoid repeating the same piece of code. Just want to know the feasibility of this idea.
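
To make the proposal concrete, here is a rough, self-contained sketch of what a per-operation write flag could look like. Everything in it is an assumption made for illustration: `FlaggedImage`, the `write` and `filename` keyword arguments, and the default file names are hypothetical and are not part of the ShapeWorks API. Writes are recorded in a list rather than performed.

```python
class FlaggedImage:
    """Hypothetical image type whose operations accept a write flag."""

    def __init__(self):
        self.history = []  # operations applied so far
        self.written = []  # file names that would have been written

    def _apply(self, op, write, filename):
        self.history.append(op)
        if write:
            # Fall back to a name derived from the operation when none is given.
            self.written.append(filename or f"{op}.nrrd")
        return self

    def binarize(self, write=False, filename=None):
        return self._apply("binarize", write, filename)

    def resize(self, write=False, filename=None):
        return self._apply("resize", write, filename)

    def crop(self, write=False, filename=None):
        return self._apply("crop", write, filename)


def groom(img, write_flag):
    """Run the same three steps, saving intermediates only when asked."""
    return (
        img.binarize(write=write_flag)
        .resize(write=write_flag)
        .crop(write=write_flag)
    )


with_files = groom(FlaggedImage(), write_flag=True)
without_files = groom(FlaggedImage(), write_flag=False)
```

The sketch shows the idea is mechanically feasible; whether it is desirable is a separate design question, since every operation gains extra parameters.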

One of the reasons we're trying to dismantle the GroomUtils.py set of
"helper" functions is to make our grooming operations more transparent.
Without packaging these operations into monolithic functions, made flexible
only by parameter passing, it's much easier to create straightforward,
understandable use case demonstrations. If we don't use (much) chaining in
our examples, it will be very straightforward to convey the ability to write
intermediate results when deemed necessary. Right now they all seem
necessary because we have use cases that require these results since
one GroomUtils function performs some (perhaps arbitrary) set of
operations, saves its result, then the next function reads those results
and continues processing.

I suggest flattening everything out to start, and for all the use cases, not
just the ellipsoids. I believe what we'll see is a relatively
straightforward set of operations that notably differs for certain cases
(ex: when original images are "along for the ride"). What the users will
get from our examples is a much clearer understanding of what can and/or
should be done for their own datasets.

Here's an example of what I would want to emulate if I were a user:

for img in images:

    # since we're starting with fuzzy data, we first need to ensure it's a
    # binary (black and white) image in order to <explain>
    img.binarize()

    # next, we must ensure images all have the same logical dimensions since
    # <explain>
    img.resize()

    # now we'll crop these images using the bounds we computed earlier so they
    # all encompass the data without leftover space (since it can be costly and
    # pointless to compute)
    img.crop(bounds)

We can provide examples of chaining write to any of these operations, such
as by adding .write(<path>) after one of them. What we don't want is some
function that "just does it" since "it" isn't the same for every dataset.
Instead, we'll empower the users by showing them that what is being done
isn't all
that complicated and is very easy to change. Rather than giving them a
black box interface with a zillion parameters, give them the keys and let
them drive. I hope this helps make clear the whole idea behind getting rid
of GroomUtils.

@iyerkrithika21 @jadie1

I agree with @cchriste. Let's not use chaining unless it is semantically reasonable (for instance, resampling binary images). Even in those cases, we don't have to write every intermediate output of the resampling (combo) step. Let's make the use cases easy to follow, self-documenting, and easy for users to adapt and customize.

Writing (especially temporarily for debugging) images is a great example of
when chaining is reasonable.

# let's see what happened
img.operation(...) -> img.operation(...).write(<path>)

Whereas, when it's an important step, it might be better to put it on its
own line with a comment.

...
# now let's write the results
img.write(<path>)

Steps for restructuring:

  1. Include all the helper functions used in the Jupyter notebooks to the ShapeWorks python module.
  2. Update the notebooks to use the helper functions from the python module.
  3. Rewrite python use cases using the python module and python API commands without using GroomUtils.

Remember, we have the python_module branch in which this is already started. It hasn't been merged for a minute, but keep us posted if someone tackles this. It's high on my priority list.
