Handle PPTX shapes where position is None #1161

Open
wants to merge 3 commits into base: main

Conversation

richardye101
Contributor

@richardye101 richardye101 commented Mar 28, 2025

My previous PR for sorting PowerPoint shapes relied on the top and left attributes of the pptx shapes, but I didn't account for shapes that have None as their top, left, height, or width attributes. We'll need to decide what to do with these shapes: they can sometimes still contain text, even though they don't appear on the slide.

Currently, I've filtered out shapes with None in their positional attributes, but I'm open to changing how they're handled.

I think the options are:

  1. Parse them into markdown in a "Hidden Text" section under the Notes
  2. Ignore these shapes, as I have done here, since they don't appear in the PowerPoint anyway and anyone reading it wouldn't know that text exists
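
To make the two options concrete, here is a minimal sketch. It is not the code in this PR: shapes are modelled as plain (text, top, left) tuples rather than python-pptx objects, and `split_by_position`, `render`, and the "Hidden Text" heading are hypothetical names chosen for illustration.

```python
from typing import List, Optional, Tuple

# Hypothetical model: each shape is (text, top, left).
# Real python-pptx shapes carry far more than this.
Shape = Tuple[str, Optional[int], Optional[int]]


def split_by_position(shapes: List[Shape]) -> Tuple[List[Shape], List[Shape]]:
    """Separate shapes that have both coordinates from those missing one."""
    positioned = [s for s in shapes if s[1] is not None and s[2] is not None]
    unpositioned = [s for s in shapes if s[1] is None or s[2] is None]
    return positioned, unpositioned


def render(shapes: List[Shape], keep_hidden: bool) -> str:
    """Emit text top-to-bottom, left-to-right.

    keep_hidden=False corresponds to option 2 (drop unpositioned shapes);
    keep_hidden=True corresponds to option 1 (append them under a heading).
    """
    positioned, unpositioned = split_by_position(shapes)
    lines = [text for text, _, _ in sorted(positioned, key=lambda s: (s[1], s[2]))]
    if keep_hidden and unpositioned:
        lines.append("### Hidden Text")  # assumed heading, not from the PR
        lines.extend(text for text, _, _ in unpositioned)
    return "\n".join(lines)


demo = [("Body", 100, 0), ("Ghost", None, None), ("Header", 0, 0)]
dropped = render(demo, keep_hidden=False)
kept = render(demo, keep_hidden=True)
```

With option 2 the "Ghost" text is silently lost, which is the trade-off under discussion.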

@richardye101 richardye101 marked this pull request as ready for review March 28, 2025 14:45
@afourney
Member

Thanks for looking into this!

@afourney
Member

afourney commented Mar 28, 2025

It looks like there is a stack overflow/infinite recursion error now on the test file.

Moreover, I think that skipping shapes is problematic -- is it possible those shapes are tables, pictures, etc?

Maybe have the sort key be the tuple: (float('inf') if shape.top is None else shape.top, float('inf') if shape.left is None else shape.left),

so that shapes with missing coordinates always get listed last (but because it's a stable sort, they appear in the order they were indexed in the file)
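
A small sketch of that suggestion, assuming a stand-in `FakeShape` class in place of real python-pptx shapes (the class, `position_key_last`, and the sample data are hypothetical):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeShape:
    """Hypothetical stand-in for a python-pptx shape; only the fields used here."""
    name: str
    top: Optional[int]
    left: Optional[int]


def position_key_last(shape: FakeShape):
    # Missing coordinates become +inf, so they compare greater than any real offset.
    return (
        float("inf") if shape.top is None else shape.top,
        float("inf") if shape.left is None else shape.left,
    )


shapes = [
    FakeShape("title placeholder", None, None),
    FakeShape("body", 200, 50),
    FakeShape("content placeholder", None, None),
    FakeShape("logo", 10, 300),
]

# sorted() is stable, so the two placeholders keep their original relative order.
names_last = [s.name for s in sorted(shapes, key=position_key_last)]
```

Because both placeholders produce the same key, only the stability of the sort decides their order relative to each other.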

@richardye101
Contributor Author

From my research, the only shapes with None in their top, left, height, and width attributes are of type pptx.shapes.placeholder.SlidePlaceholder. I'm not sure why the sort key would result in a stack overflow error though...

To your point about listing the shapes last: it seems almost all the shapes with missing coordinates are slide titles that use placeholder shapes inserted from a slide template. I've seen some instances where a placeholder for slide content is also missing coordinates. Based on that, I would actually be inclined to list them first.

@afourney
Member

> From my research, the only shapes with None in their top, left, height, and width attributes are of type pptx.shapes.placeholder.SlidePlaceholder. I'm not sure why the sort key would result in a stack overflow error though...
>
> To your point about listing the shapes last: it seems almost all the shapes with missing coordinates are slide titles that use placeholder shapes inserted from a slide template. I've seen some instances where a placeholder for slide content is also missing coordinates. Based on that, I would actually be inclined to list them first.

First also makes sense to me.
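
For reference, listing them first only needs the sentinel flipped to negative infinity. A sketch under the same kind of assumptions (`FakeShape`, `position_key_first`, and the sample data are hypothetical, not the PR's code):

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class FakeShape:
    """Hypothetical stand-in for a python-pptx shape; only the fields used here."""
    name: str
    top: Optional[int]
    left: Optional[int]


def position_key_first(shape: FakeShape):
    # Missing coordinates become -inf, so they compare less than any real offset.
    return (
        float("-inf") if shape.top is None else shape.top,
        float("-inf") if shape.left is None else shape.left,
    )


shapes = [
    FakeShape("body", 200, 50),
    FakeShape("title placeholder", None, None),
    FakeShape("logo", 10, 300),
    FakeShape("content placeholder", None, None),
]

# Placeholders float to the front; the stable sort keeps them in file order.
names_first = [s.name for s in sorted(shapes, key=position_key_first)]
```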

You can click on the test logs to see the detailed error, but here's the trace:
[image: stack trace]

@richardye101
Contributor Author

I fixed the recursion error; it was a mistake in copying the sorting code between grouped shapes and normal shapes.
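
The thread doesn't show the actual bug, so the following is only a sketch of one way a position sort can descend into grouped shapes and still terminate. `Node`, `sort_key`, and `walk_sorted` are hypothetical names, and a non-empty `children` list stands in for a group shape.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a shape; a non-empty `children` marks a group."""
    name: str
    top: Optional[int] = None
    left: Optional[int] = None
    children: List["Node"] = field(default_factory=list)


def sort_key(node: Node):
    # Shapes without coordinates sort first, as discussed above.
    return (
        float("-inf") if node.top is None else node.top,
        float("-inf") if node.left is None else node.left,
    )


def walk_sorted(nodes: List[Node]) -> List[str]:
    """Return shape names in reading order, descending into groups."""
    out: List[str] = []
    for node in sorted(nodes, key=sort_key):
        if node.children:
            # Recurse on the group's own children, never on the list being
            # iterated; passing `nodes` here again would recurse forever.
            out.extend(walk_sorted(node.children))
        else:
            out.append(node.name)
    return out


slide = [
    Node("footer", 500, 0),
    Node("group", 100, 0, [Node("b", 20, 0), Node("a", 10, 0)]),
    Node("title"),
]
order = walk_sorted(slide)
```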


2 participants