WIP / RFC: Define IO reading interface #57982

Draft
wants to merge 10 commits into
base: master

@jakobnissen jakobnissen commented Apr 2, 2025

This is a work-in-progress proposal to define the interface of the abstract type IO and to provide a robust, low-level API that makes it possible to implement efficient, generic IO operations.

Discussion is very welcome, see https://hackmd.io/UgeBwtIkTTipkgZSQnXc8w#Discussion for more details and discussion of design decisions.

Having used this API for some of my own work, I must say I really like it (well, most of it).

This PR will, in time, have several parts:

  1. Define and document the core IO interface, and also fallback definitions of derived functions.
  2. Use Base.IOBuffer as a test case for the new interface, to see if a) the API is pleasant, b) the performance is as expected, and most importantly, c) this is non-breaking.
  3. Add a set of generic IO objects in Test and test the generic definitions using those, to make sure the generic fallbacks work for buffered IOs, unbuffered IOs, and IOs that do not implement this new interface at all (i.e. previously existing IOs).

TODO

  • Implement fallback definitions for all functions that do not currently have a fallback definition
  • Implement IOBuffer using these new abstractions as a test case for this PR, and fix what breaks
  • Add generic IO types in Test and test all main IO methods
  • Figure out what to do about the current system's dependence on pointer APIs
  • Add documentation, including a manual section
  • Add NEWS

Decisions

See https://hackmd.io/UgeBwtIkTTipkgZSQnXc8w for points of discussion

Timeline

I hope to finish this before the feature freeze of 1.13, but I'd rather have this be done well than done soon.

Things left out of this PR

  • Cancellation: The original IO discussion included a discussion of a cancellation API. This is somewhat orthogonal to the IO interface, and IOBuffer, being purely in-memory, does not need to work with cancellation. Therefore, this can be left to a future PR.
  • Writing interface: To start with, I'll only implement the reading part of the interface. If people are happy with this, I'll move on to the writing part in a later PR.
  • Deserialization / serialization: Methods like read(::IO, ::Type{Int}) are among the main uses of IO. However, this is a different problem with a different design space.

Closes #55835
Closes #47771

@nsajko added the io (Involving the I/O subsystem: libuv, read, write, etc.) and design (Design of APIs or of the language itself) labels on Apr 2, 2025
base/io.jl Outdated
GC.@preserve ref unsafe_read(io, Ptr{UInt8}(pointer(ref))::Ptr{UInt8}, nbytes)
end

function unsafe_read(io::IO, dst::Ptr{UInt8}, nbytes::UInt)
Contributor

This is tricky because there is an existing fallback method that uses only read(io, UInt8).

If I understand correctly, fillbuffer can fall back to returning 0, and getbuffer can fall back to returning an empty vector for an IO without an underlying buffer. This would hit the isempty(buf) && iszero(nfilled) case.

Currently, you have this throw an EOFError, but you could instead do:

unsafe_store!(dst, read(io, UInt8)::UInt8)
dst += 1
nbytes -= 1
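
For illustration, the three lines above can be placed in a loop to give a complete byte-at-a-time fallback. This is only a sketch: the function name bytewise_unsafe_read is hypothetical and not part of the PR, and it relies solely on read(io, UInt8), which throws an EOFError if the stream ends early.

```julia
# Hypothetical helper name, not from the PR. It uses only read(io, UInt8),
# so it works for any IO that implements that single method.
function bytewise_unsafe_read(io::IO, dst::Ptr{UInt8}, nbytes::UInt)
    while nbytes > 0
        # read(io, UInt8) throws EOFError if the stream ends early.
        unsafe_store!(dst, read(io, UInt8)::UInt8)
        dst += 1
        nbytes -= 1
    end
    return nothing
end

# Usage: copy the first three bytes of an in-memory stream into `out`.
out = Vector{UInt8}(undef, 3)
GC.@preserve out bytewise_unsafe_read(IOBuffer("abcdef"), pointer(out), UInt(3))
```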

Member Author

fillbuffer can fall back to 0, but getbuffer should only be implemented if the IO is buffered. And having it buffered implies that when the buffer is empty, and fillbuffer returns 0, the IO is EOF. I'll clarify the documentation.
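
A minimal sketch of that contract, under stated assumptions: the type ChunkedReader, the helpers is_eof and read_all, and the function name consume are all hypothetical, and getbuffer and fillbuffer here are local stand-ins that only mimic the behaviour described in this thread, not the PR's actual definitions.

```julia
# Hypothetical buffered reader over an in-memory byte source.
mutable struct ChunkedReader <: IO
    data::Vector{UInt8}   # the underlying source we pretend to read from
    srcpos::Int           # next unread index in `data`
    buf::Vector{UInt8}    # internal buffer
    start::Int            # first unconsumed index in `buf`
    stop::Int             # last filled index in `buf`
end

ChunkedReader(data::Vector{UInt8}, bufsize::Int=4) =
    ChunkedReader(data, 1, Vector{UInt8}(undef, bufsize), 1, 0)

# Local stand-in: view of the buffered bytes that are not yet consumed.
getbuffer(io::ChunkedReader) = view(io.buf, io.start:io.stop)

# Local stand-in: try to add bytes to the buffer, return how many were added.
function fillbuffer(io::ChunkedReader)
    # To keep the sketch simple, only refill when the buffer is empty.
    io.start > io.stop || return 0
    n = min(length(io.buf), length(io.data) - io.srcpos + 1)
    copyto!(io.buf, 1, io.data, io.srcpos, n)
    io.srcpos += n
    io.start, io.stop = 1, n
    return n
end

# Assumed name: mark `n` buffered bytes as read.
function consume(io::ChunkedReader, n::Int)
    n <= io.stop - io.start + 1 || throw(ArgumentError("consumed too much"))
    io.start += n
    return nothing
end

# Derived rule from the comment: empty buffer plus a zero-byte fill means EOF.
is_eof(io::ChunkedReader) = isempty(getbuffer(io)) && iszero(fillbuffer(io))

# Drain the reader using only the three functions above.
function read_all(io::ChunkedReader)
    out = UInt8[]
    while !is_eof(io)
        buf = getbuffer(io)
        append!(out, buf)
        consume(io, length(buf))
    end
    return out
end
```

The point of the sketch is that EOF detection and bulk reading can both be derived from the buffer functions alone, which is why an empty vector from getbuffer cannot double as a signal for "this IO has no buffer".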

W.r.t. the fallback calling read(io, UInt8), you're right. I just drew out the call graph for the current generic IO functions, and it's not too complex, actually. All reading functions fall back to read(io, UInt8).
I'm thinking of circumventing this by adding a check similar to this:

if readbuffering(typeof(io)) == NotBuffered() || hasmethod(getbuffer, Tuple{typeof(io)})
    # use methods relying on new interface
else
    # use method relying only on read(io, UInt8)
end

This will work because IOs are buffered by default, so any IO that has either opted out of buffering or implemented getbuffer must be aware of the new interface. It's an ugly solution, but it'll work.
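
A self-contained sketch of how such a check might behave, assuming local stand-in definitions: readbuffering, NotBuffered, and getbuffer are names from the comment above, while IsBuffered, uses_new_interface, and the three example IO types are hypothetical and exist only for this illustration.

```julia
# Local stand-ins, not the PR's definitions.
struct IsBuffered end      # assumed name for the default trait value
struct NotBuffered end

readbuffering(::Type{<:IO}) = IsBuffered()   # IOs are buffered by default
function getbuffer end                        # no methods until an IO opts in

# The check from the comment, wrapped in a hypothetical predicate.
uses_new_interface(io::IO) =
    readbuffering(typeof(io)) == NotBuffered() ||
    hasmethod(getbuffer, Tuple{typeof(io)})

struct LegacyIO <: IO end          # predates the interface: implements neither
struct RawIO <: IO end             # explicitly opted out of buffering
struct BufferedIO <: IO            # implements getbuffer
    buf::Vector{UInt8}
end

readbuffering(::Type{RawIO}) = NotBuffered()
getbuffer(io::BufferedIO) = io.buf

# Which path each kind of IO would take under this check.
paths = map((LegacyIO(), RawIO(), BufferedIO(UInt8[]))) do io
    uses_new_interface(io) ? :new_interface : :read_uint8_fallback
end
```

Only LegacyIO falls through to the read(io, UInt8) path here, because it neither overrides the trait nor defines getbuffer.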

Another consideration for this function specifically is whether we can write a fast, generic fallback for unsafe_read(::IO, dst::Any, ::UInt). To do this, we need to be able to dispatch on whether we can write to dst using a pointer, which we don't currently have any abstractions for. This is a hobby horse of mine, but for a reason; you really run into it again and again. I hope to be able to address this in this PR.
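
To make the missing abstraction concrete, here is one possible shape it could take, as a sketch only. The trait pointer_writable and the function read_into! are hypothetical names invented for this illustration; Julia does not currently provide such a trait, and the PR may solve this differently.

```julia
# Hypothetical opt-in trait: can values of this type be written to through a
# raw pointer? The default is a conservative "no".
pointer_writable(::Type) = false
pointer_writable(::Type{Vector{UInt8}}) = true

# Hypothetical generic reader that picks a path based on the trait.
function read_into!(dst, io::IO, nbytes::Integer)
    0 <= nbytes <= length(dst) || throw(ArgumentError("destination too small"))
    if pointer_writable(typeof(dst))
        # Fast path: hand the destination's memory directly to unsafe_read.
        GC.@preserve dst unsafe_read(io, pointer(dst), UInt(nbytes))
    else
        # Generic path: one byte at a time through setindex!.
        for i in Iterators.take(eachindex(dst), nbytes)
            dst[i] = read(io, UInt8)
        end
    end
    return dst
end

# Usage: a Vector{UInt8} takes the pointer path, a view takes the generic path.
a = read_into!(zeros(UInt8, 3), IOBuffer("xyz"), 3)
v = read_into!(view(zeros(UInt8, 4), 2:4), IOBuffer("xyz"), 3)
```

An explicit opt-in is used rather than dispatching on DenseVector{UInt8}, since some dense byte vectors (for example the result of codeunits on a String) are read-only.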

base/io.jl Outdated
Comment on lines 192 to 196
isempty(v) && return 0
buffer = @something get_nonempty_reading_buffer(io) return 0
mn = min(length(v), length(buffer))
copyto!(v, firstindex(v), buffer, 1, mn)
mn
Contributor

Ideally, this would somehow fall back to readbytes! to work well with existing IO types. For example, you could check isnothing(buffer) && !eof(io) (which should only happen for legacy IO types) and then fall back to readbytes! in that case.

Member Author

@jakobnissen jakobnissen Apr 3, 2025

Edit: This is a new function, so I think it would actually be a boon if it failed for old IOs. That will push people to implement this new API for old IO types, without breaking any existing code.
It will also simplify the implementation because we don't have to do hacky workarounds to support IOs which don't adhere to the (new) interface.

@jakobnissen changed the title from "WIP / RFC: Define IO interface" to "WIP / RFC: Define IO reading interface" on Apr 3, 2025
Successfully merging this pull request may close these issues.

  • print(iobuffer, number) without calling string(number)?
  • Unified I/O error type
3 participants