
RFC: Adding Pluggable Device For TensorFlow #262

Merged · 29 commits · Sep 29, 2020

Conversation

@jzhoulon (Contributor) commented Jun 24, 2020

This RFC will be open for comment until Monday, July 20th, 2020.

Pluggable device for TensorFlow

Status: Accepted
RFC #: 262
Author(s): Zhoulong Jiang ([email protected]), Yiqiang Li ([email protected]), Eric Lin ([email protected]), Jianhui Li ([email protected])
Sponsor: Anna Revinskaya ([email protected])
Updated: 2020-08-13

Objective

Implement a pluggable device mechanism that allows existing TensorFlow programs to run on a new device without requiring users to change their code. Users only need to install a plugin in a specified directory, and the mechanism is able to discover and plug in the capabilities offered by the plugin.

This RFC is based on the Modular TensorFlow RFC, which aims to extend the TensorFlow design to plug in capabilities such as new device support. The modular device interface is based on the StreamExecutor C API RFC.
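
For context, under the companion StreamExecutor C API proposal, discovery works by loading shared libraries from the plugin directory and calling a single exported entry point in each. Below is a minimal sketch of that entry point; field and parameter names track the proposal and may differ from the final header, and "MY_DEVICE" is a placeholder backend:

    #include "tensorflow/c/experimental/stream_executor/stream_executor.h"

    // Exported by the plugin; core TensorFlow looks this symbol up after
    // loading the shared library it found in the plugin directory.
    extern "C" void SE_InitPlugin(SE_PlatformRegistrationParams* params,
                                  TF_Status* status) {
      params->platform->name = "MY_DEVICE";       // Front-end device name.
      params->platform->type = "MY_DEVICE_TYPE";  // Back-end device type.
      // Fill in params->platform_fns (create_device, create_stream_executor,
      // etc.) so TensorFlow can instantiate a PluggableDevice for this
      // backend.
    }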

@penpornk (Member) commented Jun 24, 2020

Thank you for the RFC! I think the links to the images are broken?

@jzhoulon (Contributor, Author) commented Jun 25, 2020

Thank you for the RFC! I think the links to the images are broken?

@penpornk Thanks for the review! I put relative image paths in the md file; they display normally in "view file" but appear broken in the md diff. It seems the images are linked to the master branch, where they don't exist yet. Do you have any suggestions? Thanks.
view file:
https://github.com/tensorflow/community/blob/97d805fb6c122ba69b239d468b80247a33350fd1/rfcs/20200624-pluggable-device-for-tensorflow.md
images:
https://github.com/tensorflow/community/tree/97d805fb6c122ba69b239d468b80247a33350fd1/rfcs/20200624-pluggable-device-for-tensorflow

@annarev (Contributor) left a comment


Thank you for writing the RFC!

@penpornk (Member)

@jzhoulon "View file" works for me. Sorry I didn't try that before asking!

Now that I think about it, I don't think there is a good way to make the images display in the diff mode here, since the images are not in this repo yet. If you use absolute paths (URLs), you'll need to change them back to relative paths before the PR is merged. -- Not worth the hassle.

@wchao1115
Copy link
Contributor

Thanks for the RFC. We're working on adding a new device for DirectML (#243), so this RFC, along with #257, is of great interest to us. I have a few high-level questions as follows:

  1. Is there a plan to encourage a refactor of the existing GPU-CUDA device as a pluggable device in the future? The DirectML device is designed to be a replacement for the existing GPU device. If the GPU device continues to be part of TensorFlow proper, the total payload size of the package with the DirectML device, which includes the rest of TensorFlow as its dependency, will never be smaller than the package it depends on. The current TensorFlow-GPU wheel size is over 450 MB, while the current TensorFlow-DirectML wheel size (pre-release) is about 65 MB.

  2. Is there a particular reason to require all pluggable devices to implement the stream and the stream executor interface (RFC: StreamExecutor C API #257)? Why not give the device implementer the freedom to implement the device as they see fit by defining a C API for the pluggable device interface instead? As an example, the DirectML device doesn't use streams or the stream executor. Since one of the main purposes of any device is to manage the state for all the kernels it supports, it seems there is little benefit to regulating the implementation's choices of how a device and its corresponding kernels may be implemented.

  3. What is the runtime mechanism that ensures that the version of TensorFlow supporting the plug-in is compatible with the pluggable device implemented in the plug-in? One of the design benefits of a plug-in architecture is that it allows the host and the plug-in to evolve independently as long as there is a versioning policy in place to ensure their compatibility with one another. It's not clear to me from the device discovery code sample how that mechanism is achieved.

@penpornk (Member) commented Jul 7, 2020

@wchao1115 Great questions!

  1. Is there a plan to encourage a refactor of the existing GPU-CUDA device as a pluggable device in the future? The DirectML device is designed to be a replacement for the existing GPU device. If the GPU device continues to be part of TensorFlow proper, the total payload size of the package with the DirectML device, which includes the rest of TensorFlow as its dependency, will never be smaller than the package it depends on. The current TensorFlow-GPU wheel size is over 450 MB, while the current TensorFlow-DirectML wheel size (pre-release) is about 65 MB.

For the current TensorFlow stack, no. Would linking to the CPU-only TensorFlow package work for you (~110 MB for 1.15, ~144 MB for 2.2.0)? Would love it if you could share your techniques to get the wheel size down to 65 MB as well. Looping in @gunan and @annarev for more details.

  2. Is there a particular reason to require all pluggable devices to implement the stream and the stream executor interface (RFC: StreamExecutor C API #257)? Why not give the device implementer the freedom to implement the device as they see fit by defining a C API for the pluggable device interface instead? As an example, the DirectML device doesn't use streams or the stream executor. Since one of the main purposes of any device is to manage the state for all the kernels it supports, it seems there is little benefit to regulating the implementation's choices of how a device and its corresponding kernels may be implemented.

Yes. It’s because we happen to have the StreamExecutor C API already in the works.

The new TFRT/MLIR TensorFlow stack will have much more flexible support for third-party devices. Since the new stack is fundamentally different from the current stack, we can’t guarantee that any code added to the current stack will work with the new stack right away. So the focus for the current stack is to enable basic device integration, for people that need it now, while minimizing throw-away work. We think StreamExecutor C API (covering a small subset of StreamExecutor routines) and PluggableDevice could be sufficient. (Or we’d like to hear why they might not be enough sooner rather than later. -- Hence the RFCs.)

“Stream” is just a keyword meant to represent an ordered sequence of kernels to run. Does DirectML have a kernel queue which would fit this definition? Would DirectML be able to use StreamExecutor C API and PluggableDevice? Is there anything you need that is missing? If so, we can look into adding the missing parts (and making the things you don’t need optional).

Why not give the device implementer the freedom to implement the device as they see fit by defining a C API for the pluggable device interface instead?

Do you mean C APIs for Device and DeviceFactory? Or do you mean different APIs for different devices?

  3. What is the runtime mechanism that ensures that the version of TensorFlow supporting the plug-in is compatible with the pluggable device implemented in the plug-in? One of the design benefits of a plug-in architecture is that it allows the host and the plug-in to evolve independently as long as there is a versioning policy in place to ensure their compatibility with one another. It's not clear to me from the device discovery code sample how that mechanism is achieved.

Functionality can be extended by appending members to the structures in StreamExecutorInterface. TensorFlow will check for the existence of members before accessing them. Plugins just need to initialize struct_size using the provided macros so that TensorFlow knows which members are accessible. @yisitu will give some examples later in a separate reply.
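
For illustration, here is a minimal sketch of this struct_size convention; SP_Example and its members are hypothetical, and SP_OFFSET_OF_END mirrors the TF_OFFSET_OF_END pattern in the StreamExecutor C API proposal:

    #include <cstddef>  // offsetof, size_t

    // Hypothetical extensible struct following the proposed convention.
    typedef struct SP_Example {
      size_t struct_size;  // Set by the plugin; tells TF which members exist.
      void* ext;           // Reserved for future use.
      int capability_a;
      int capability_b;    // Added in a later minor version.
    } SP_Example;

    // Size of TYPE up to and including MEMBER.
    #define SP_OFFSET_OF_END(TYPE, MEMBER) \
      (offsetof(TYPE, MEMBER) + sizeof(((TYPE*)0)->MEMBER))

    // A plugin built against the older header sets
    //   example.struct_size = SP_OFFSET_OF_END(SP_Example, capability_a);
    // and core TensorFlow guards access to newer members:
    bool HasCapabilityB(const SP_Example* e) {
      return e->struct_size >= SP_OFFSET_OF_END(SP_Example, capability_b);
    }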

@pawelpiskorski
Thanks for the RFC. I have two questions regarding support for dataset prefetch_to_device in the case of PluggableDevice.

When using prefetch_to_device, the dataset iterator is on the "accelerator" device and resorts to a remote call to the CPU that does the data preparation. One of the obstacles is that the remote-call mechanism is guarded by ProcessFunctionLibraryRuntime::GetDeviceContext, which, as of now, is a bunch of hardcoded device type strings. Do you envision a way to allow PluggableDevice to support remote calls?
Also, a few operators (IteratorGetNext, to mention one) must be supported on the accelerator and seemingly could reuse TF implementations. This would call for access to the operator registry that would allow duplicating registrations of some operators from one device (say "GPU") to another.

Such features would be very welcome. Are these topics in the scope of the PluggableDevice RFC?

@ematejska ematejska added the RFC: Proposed RFC Design Document label Jul 7, 2020
@ematejska ematejska changed the title Adding Pluggable Device For TensorFlow RFC RFC: Adding Pluggable Device For TensorFlow Jul 8, 2020
@jzhoulon (Contributor, Author) commented Jul 8, 2020

Thanks for the comment! These are good questions.

as of now, is a bunch of hardcoded device type strings. Do you envision a way to allow PluggableDevice to support remote calls?

TensorFlow seems to use lots of hardcoded device type strings for branching. Some are device names (front-end names), as in ProcessFunctionLibraryRuntime::GetDeviceContext; some are device types (back-end names), as in ProcessMemoryTypes. I think we can extend a new branch path, such as providing a utility function, IsPluggableDevice(device_type), which maintains a global object containing every device type/name registered from plugins. @annarev do you have comments here? Thanks

    if (device_type != DEVICE_GPU && device_type != DEVICE_SYCL &&
        !IsPluggableDevice(device_type)) {
      // On non-GPU and non-SYCL devices, HOST_MEMORY and DEVICE_MEMORY are
      // always compatible.
      return Status::OK();
    }
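
A minimal sketch of what such a utility could look like, assuming a simple global registry (the names below are illustrative, not part of the RFC):

    #include <mutex>
    #include <string>
    #include <unordered_set>

    // Hypothetical global registry of device types registered by plugins.
    static std::mutex pluggable_types_mu;
    static std::unordered_set<std::string>* PluggableDeviceTypes() {
      static auto* types = new std::unordered_set<std::string>;
      return types;
    }

    // Called once per device type while loading a plugin.
    void RegisterPluggableDeviceType(const std::string& device_type) {
      std::lock_guard<std::mutex> lock(pluggable_types_mu);
      PluggableDeviceTypes()->insert(device_type);
    }

    // Lets branches like the one above treat plugged devices like GPU/SYCL.
    bool IsPluggableDevice(const std::string& device_type) {
      std::lock_guard<std::mutex> lock(pluggable_types_mu);
      return PluggableDeviceTypes()->count(device_type) > 0;
    }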

Also a few operators (IteratorGetNext to mention one) must be supported on the accelerator and seemingly could reuse TF implementations.

I'm not sure whether the Kernel and Op Implementation C API will support this; maybe Google can expose IteratorGetNext as a utility function (like TF_NewKernelBuilder) or provide a special API to register an existing implementation with a new device type. @annarev
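
For context, this is roughly how a plugin registers its own kernel for a new device type through the kernel C API; "MY_DEVICE" and MyConvCompute are placeholders, and whether an existing TF kernel such as IteratorGetNext could be re-registered this way is exactly the open question above:

    #include "tensorflow/c/kernels.h"
    #include "tensorflow/c/tf_status.h"

    // Placeholder compute function implemented by the plugin.
    static void MyConvCompute(void* kernel, TF_OpKernelContext* ctx) {
      // ... launch the plugin's Conv2D implementation here ...
    }

    void RegisterMyConvKernel() {
      TF_KernelBuilder* builder = TF_NewKernelBuilder(
          /*op_name=*/"Conv2D", /*device_name=*/"MY_DEVICE",
          /*create_func=*/nullptr, /*compute_func=*/MyConvCompute,
          /*delete_func=*/nullptr);
      TF_Status* status = TF_NewStatus();
      TF_RegisterKernelBuilder("Conv2D", builder, status);
      if (TF_GetCode(status) != TF_OK) { /* handle registration failure */ }
      TF_DeleteStatus(status);
    }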


@annarev annarev mentioned this pull request Jul 9, 2020
@yisitu (Contributor) commented Jul 9, 2020

  3. What is the runtime mechanism that ensures that the version of TensorFlow supporting the plug-in is compatible with the pluggable device implemented in the plug-in? One of the design benefits of a plug-in architecture is that it allows the host and the plug-in to evolve independently as long as there is a versioning policy in place to ensure their compatibility with one another. It's not clear to me from the device discovery code sample how that mechanism is achieved.

Functionality can be extended by appending members to the structures in StreamExecutorInterface. TensorFlow will check for the existence of members before accessing them. Plugins just need to initialize struct_size using the provided macros so that TensorFlow knows which members are accessible. @yisitu will give some examples later in a separate reply.

Hi @wchao1115, @penpornk! Please find more detailed examples here: https://github.com/tensorflow/community/pull/257/files#diff-8312b2e074fd41855aa8a03d13e92b91

@penpornk (Member) commented Jul 9, 2020

There will be a public design review meeting for this RFC.
Time: Thursday, July 23rd, 2020 from 3:00-3:45pm PT
Meeting link: Google Meet

Meeting participants are expected to read the RFC ahead of time as the meeting will focus on the remaining issues/questions.

@penpornk (Member) commented Jul 9, 2020

@yisitu Thank you!

@ppiskorski

Thanks for the RFC. I have two questions regarding support for dataset prefetch_to_device in the case of PluggableDevice.
When using prefetch_to_device, the dataset iterator is on the "accelerator" device and resorts to a remote call to the CPU that does the data preparation. One of the obstacles is that the remote-call mechanism is guarded by ProcessFunctionLibraryRuntime::GetDeviceContext, which, as of now, is a bunch of hardcoded device type strings. Do you envision a way to allow PluggableDevice to support remote calls?
Also, a few operators (IteratorGetNext, to mention one) must be supported on the accelerator and seemingly could reuse TF implementations. This would call for access to the operator registry that would allow duplicating registrations of some operators from one device (say "GPU") to another.
Such features would be very welcome. Are these topics in the scope of the PluggableDevice RFC?

@pawelpiskorski @gunan mentioned that the above issue should be solved by registering these kernels as "DEVICE_DEFAULT", following the example in queue_ops.cc (a link). For your case, the IteratorGetNext registration should be modified to "DEVICE_DEFAULT" (a link). Kernels registered as "DEVICE_DEFAULT" can be used on any device. So when a device is plugged in, these operations should be assigned to the pluggable device, since it has the highest priority and is backed by the "DEVICE_DEFAULT" kernels.

Thanks for the suggestion. I registered a group consisting of IteratorV2, MakeIterator, DeleteIterator, AnonymousIterator, AnonymousIteratorV2, IteratorGetNext, IteratorGetNextSync, IteratorGetNextAsOptional, IteratorToStringHandle, and IteratorFromStringHandleV2 on DEVICE_DEFAULT. This somehow places the part of the dataset that's designated for the CPU on the device, which is a bit surprising because I haven't changed the Dataset* registrations. In short, it seems not to work out of the box for datasets, but I'll try to dig into that.
That said, I'd love to advertise my question on tf.developers regarding device type hardcodes. It has so far gone unnoticed, but please have a look. Thanks!
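
For reference, the DEVICE_DEFAULT registration suggested above would look roughly like this inside core TensorFlow (a sketch following the queue_ops.cc pattern; IteratorGetNextOp is assumed to be the existing kernel class):

    REGISTER_KERNEL_BUILDER(
        Name("IteratorGetNext").Device(DEVICE_DEFAULT), IteratorGetNextOp);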

annarev added a commit to annarev/community that referenced this pull request Sep 10, 2020
Updates based on recent changes and discussion on RFC PR tensorflow#262
@penpornk (Member)

@wchao1115 We have updated the versioning strategy. Changes include:

  • Restructuring the contents so the conventions/assumptions/procedures are clearer.
  • An example showing that the struct size check works with struct members in any position, not just the last member of the struct.
  • More details on major version bump conventions. Specifying that breaking changes require an RFC.
  • An explicit mention that nontrivial deprecations (those that cannot use zero initialization) need a major version bump.
  • Adding a requirement that fields that treat 0 and NULL as invalid values must have explicit comments saying so, to ensure all plug-ins have consistent behavior, e.g., none of them is using 0 or NULL for special cases.
  • Updating the deprecation examples to use 'producer' and 'consumer' keywords (instead of plug-in and core TensorFlow) to cover more cases.

Could you please help take another look and raise any remaining concerns? Thank you!

Please feel free to comment on the file directly on the StreamExecutor C API RFC PR's file changes review page. (Or we can discuss here as usual.)

@annarev (Contributor) commented Sep 17, 2020

Thanks @annarev. I think unified_memory_allocate/unified_memory_deallocate are still needed. AllocatorHandle is a higher-level API that allows plugin authors to implement a specific cached allocator, the counterpart of BFCAllocator. But for plugin authors who want to reuse the existing TensorFlow BFC allocator, the low-level allocation API in StreamExecutor (AllocateArray/UnifiedMemoryAllocate) is still needed. We can make it so that if a plugin registers an allocator handle, TensorFlow proper will use that handle as its allocator; if not, the BFC allocator will be the default allocator and will forward to the StreamExecutor allocation API through a sub-allocator.

I wonder if it will get confusing with two separate ways to set allocator functionality. Need to think of a clear way to annotate which one to use when in the API. But I will try it out and see how it looks.

I added a proposed implementation here: 21be6e9. It adds two sets of functions to SP_PlatformFns: create_allocator/destroy_allocator and create_custom_allocator/destroy_custom_allocator. I thought keeping the two options near each other would make it less confusing.
@kulinseth, @jzhoulon PTAL.
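
For illustration only, the proposed layout might look like the sketch below; field types are collapsed to a placeholder function-pointer type, whereas the draft in the commit defines concrete parameter structs and signatures:

    #include <cstddef>

    typedef void (*SP_AllocatorFn)();  // Placeholder signature.

    // Sketch of the proposed SP_PlatformFns with both options side by side.
    typedef struct SP_PlatformFns {
      size_t struct_size;
      // ... existing fields such as create_device/destroy_device ...
      SP_AllocatorFn create_allocator;         // TF-managed (BFC) path.
      SP_AllocatorFn destroy_allocator;
      SP_AllocatorFn create_custom_allocator;  // Plugin-managed path.
      SP_AllocatorFn destroy_custom_allocator;
    } SP_PlatformFns;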

@kulinseth (Contributor) commented Sep 17, 2020

I wonder if it will get confusing with two separate ways to set allocator functionality. Need to think of a clear way to annotate which one to use when in the API. But I will try it out and see how it looks.

I added a proposed implementation here: 21be6e9. It adds two sets of functions to SP_PlatformFns: create_allocator/destroy_allocator and create_custom_allocator/destroy_custom_allocator. I thought keeping the two options near each other would make it less confusing.
@kulinseth, @jzhoulon PTAL.

Thanks Anna, this looks good!

@penpornk (Member)

Questions we got offline (cc: @kulinseth, @wchao1115, @jzhoulon, @Jianhui-Li).

Q1: What are the next steps?

  • We are trying to wrap up both RFCs (StreamExecutor C API and this).
    • Wrapping up doesn’t mean we can’t make changes anymore. It’s more of a confirmation that the proposals are really happening. We can continue to discuss here and make adjustments.
  • PluggableDevice implementation PRs will be submitted by @jzhoulon and his team. (The PRs can be submitted for review anytime, but they likely will be merged after the RFC itself is accepted.)
  • Interested parties can try the experimental API and give feedback.
  • Iterate until the API is somewhat stable, then consider moving it out of experimental.

Q2: How will the API be released? Will we have a branch where the API will be released in the experimental folder?

  • StreamExecutor C API is already in the experimental folder in the master branch. It will be released with the next TensorFlow release.
  • PluggableDevice will be too, once it is checked in. It will be released in the first TensorFlow release after that.

Q3: What is the high-level timeline?

Q4: Will this be in TF 2.4?

  • StreamExecutor C API: Yes.
  • PluggableDevice: Depends on when the implementation gets in. If it’s before r2.4 branch cut, then yes. I’ll post the branch cut date here once it is announced on [email protected].

@wchao1115: Friendly ping. Have you had a chance to look at the updated versioning strategy yet? :)

@penpornk
Copy link
Member

cc: @jzhoulon @Jianhui-Li @wchao1115 @kulinseth

I'd like to wrap up this RFC (i.e., merge this PR) to pave the way for the upcoming implementation PRs.

Here are the changes since the last open design review meeting:

  • PluggableDevice:
    • Renamed PluggableDeviceAllocator to PluggableDeviceMemAllocator and added PluggableDeviceProcessState. 492ca0b
  • StreamExecutor C API
    • Added more explanation of the overall functionality.
    • Added more clarifications on versioning strategy.
    • Corrected typos. Minor functionality additions such as block_host_until_done, unified_memory_allocate/deallocate, etc.

If anyone has anything else that needs to be discussed/resolved before we can move forward (i.e., fundamental issues), please explicitly say so here by this Thursday, 9/24 at 11:59pm PT.

@wchao1115 I believe the versioning strategy is not a blocking issue. We can continue discussing and refining the strategy here throughout the experimental phase.

Q4: Will this be in TF 2.4?

TensorFlow 2.4 branch cut has been announced. If PluggableDevice implementation makes it into TF by 10/21/20 at 5pm PT, it will be in TF 2.4. Otherwise, it won't.

@wchao1115 (Contributor)

@penpornk Sorry for the delay. The revised versioning doc looks fine for now. We can iterate more when we start implementing the V1 plug-in.

A note on deprecation by the producer leaving an attribute zero-initialized: I'm not sure this is safe or can be done in a backward-compatible way, even when zero is defined as an invalid value for the attribute, because the consumer may reject the zero value and fail the call instead.

If, in a later minor version, the producer sets the value to zero and assumes the attribute will be safely ignored by older plug-ins, the plug-in may fail.

I believe it is only safe to do so when the attribute is also considered optional by the producer. A required attribute must not be left zero-initialized in a later version without a major version bump.
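
A tiny sketch of the hazard being described, with hypothetical names (SP_SomeParams and required_attr are illustrative):

    #include <cstddef>
    #include <stdexcept>

    // Hypothetical versioned parameter struct.
    struct SP_SomeParams {
      size_t struct_size;
      int required_attr;  // Header comment declares 0 an invalid value.
    };

    // Consumer (e.g., an older plug-in) that rejects the invalid value:
    void ValidateParams(const SP_SomeParams& p) {
      if (p.required_attr == 0) {
        // A newer producer that "deprecates" required_attr by leaving it
        // zero-initialized fails here, even though it expected the value
        // to be ignored. Deprecation-by-zero is only safe for attributes
        // the consumer already treats as optional.
        throw std::invalid_argument("required_attr must be non-zero");
      }
    }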

@kulinseth (Contributor)

Questions we got offline (cc: @kulinseth, @wchao1115, @jzhoulon, @Jianhui-Li).

Q1: What are the next steps?

  • We are trying to wrap up both RFCs (StreamExecutor C API and this).

    • Wrapping up doesn’t mean we can’t make changes anymore. It’s more of a confirmation that the proposals are really happening. We can continue to discuss here and make adjustments.
  • PluggableDevice implementation PRs will be submitted by @jzhoulon and his team. (The PRs can be submitted for review anytime, but they likely will be merged after the RFC itself is accepted.)

  • Interested parties can try the experimental API and give feedback.

  • Iterate until the API is somewhat stable, then consider moving it out of experimental.

Q2: How will the API be released? Will we have a branch where the API will be released in the experimental folder?

  • StreamExecutor C API is already in the experimental folder in the master branch. It will be released with the next TensorFlow release.
  • PluggableDevice will be too, once it is checked in. It will be released in the first TensorFlow release after that.

Thanks for clarifying these questions! There are no outstanding design items from our side. There will be a few iterations and adjustments as we get to the implementation PRs and start using the API.

  • @jzhoulon @Jianhui-Li Would you mind giving a rough timeline of when you plan to submit PluggableDevice implementation PRs?
    If PluggableDevice implementation makes it into TF by 10/21/20 at 5pm PT, it will be in TF 2.4. Otherwise, it won't.

@jzhoulon, @Jianhui-Li it would be great if you could elaborate on this. We would like to know if you are also targeting TF 2.4. It would also be great if there are any implementation PRs we can try out, to adapt our implementation and provide feedback.

@Jianhui-Li

@jzhoulon, @Jianhui-Li it would be great if you could elaborate on this. We would like to know if you are also targeting TF 2.4. It would also be great if there are any implementation PRs we can try out, to adapt our implementation and provide feedback.

@kulinseth We are targeting initial PRs for TF 2.4 as a stretch goal, with incremental PRs in TF 2.5. At the experimental stage, the initial version won't be able to run full models, but it will be useful for us to test various plugins and further improve the interface.

@kulinseth (Contributor) commented Sep 25, 2020

@kulinseth We are targeting initial PRs for TF 2.4 as a stretch goal, with incremental PRs in TF 2.5. At the experimental stage, the initial version won't be able to run full models, but it will be useful for us to test various plugins and further improve the interface.

Thanks for the update. I was also hoping to iron out and test the interface during the experimental stage. Things will become clearer after looking at the PRs and testing with more plugins.

@penpornk (Member)

@wchao1115 @kulinseth Thank you for the quick confirmation! :)
@Jianhui-Li Thank you for the timeline!

@ematejska @alextp I believe there is no outstanding issue now. Could we move forward?

@alextp (Contributor) commented Sep 25, 2020 via email

@theadactyl (Contributor) left a comment


LGTM

@theadactyl theadactyl merged commit ddbdeac into tensorflow:master Sep 29, 2020
@theadactyl theadactyl added RFC: Accepted RFC Design Document: Accepted by Review and removed RFC: Proposed RFC Design Document labels Sep 29, 2020
@kulinseth (Contributor)

Hi @penpornk and others,

I had a general question about how the plugin modules will be packaged for the user. Will it be a regular package or a namespace package? Also, I would like to understand the workflow for installing the pluggable module:

  1. Install the core tensorflow package.
  2. Install the pluggable device package (which will install it at ../site-packages/tensorflow-plugins/M).

Please let me know if there are other ways and if I am missing something.

@yiqianglee

Hi @kulinseth,
Pluggable device plugin installation follows the modular TF Python design (https://github.com/tensorflow/community/blob/master/rfcs/20190305-modular-tensorflow.md#python). It's the backend's choice to distribute as a regular package or a namespace package; the only requirement is to follow the convention of installing the library under "../site-packages/tensorflow-plugins/M" so that plugin discovery can load it successfully.

install the core tensorflow package
install the pluggable device package (which will install it at ../site-packages/tensorflow-plugins/M)

Yes, that's the expected installation process.
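
For concreteness, the resulting layout might look like this (illustrative only; the plugin library name is up to the backend):

    site-packages/
      tensorflow/            <- core TensorFlow package
      tensorflow-plugins/
        libmy_plugin.so      <- plugin shared library, discovered and loaded
                                at TensorFlow startup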

@penpornk (Member) commented Nov 5, 2020

Hi all, quick updates:

  • The initial PluggableDevice implementation couldn't make it into TF 2.4, but it seems on track to be in TF 2.5.
  • Current PluggableDevice PRs under review, in case anyone would like a quick look: 1, 2, 3
  • The RFC for custom graph optimizer C API is now up: RFC: Modular TensorFlow Graph C API #318

@kevint324

Hi guys,

Is there any demo, like a pseudo pluggable device, to show what needs to be done to add a device to TensorFlow?

Thanks

@yiqianglee

Hi @kevint324,
We are working on a tutorial on how to implement a TensorFlow plugin and will release it in the near future.

@kevint324

Sounds great. Looking forward to it.

@mmaor123 commented Aug 8, 2021

Hi @kevint324,
We are working on a tutorial on how to implement a TensorFlow plugin and will release it in the near future.
@yiqianglee

Reiterating the question: is there a detailed mock pluggable device example or other documentation on this? Thanks.

@yiqianglee

Hi @kevint324,
We are working on a tutorial on how to implement a TensorFlow plugin and will release it in the near future.
@yiqianglee

Reiterating the question: is there a detailed mock pluggable device example or other documentation on this? Thanks.

Please see #352 for a detailed tutorial and an example of pluggable device development.
