Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[GPU] Support for converting zero-filled memory for i4 compressed weights #29937

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

ahnyoung-paul
Copy link
Contributor

Details:

  • Support for converting zero-filled memory for i4 compressed weights
  • Currently, the innermost dimension with an odd size, such as i4:bfyx:5120x13653:nopad, crashes due to a bad weight dimension issue in the oneDNN matmul.
  • The odd dimension is acceptable if it is an outer dimension of matmul, but it is not allowed as an innermost dimension in oneDNN matmul.
  • To fix this issue, add padding after the innermost odd-sized dimension i4[0, 13653] with 0 and set the y stride to 13654. Then, add stride information to the memory descriptor of oneDNN

Tickets:

  • 165374

@ahnyoung-paul ahnyoung-paul added category: GPU OpenVINO GPU plugin pr: needs tests PR needs tests updating do not merge labels Apr 4, 2025
@ahnyoung-paul ahnyoung-paul requested review from a team as code owners April 4, 2025 07:48
strides.push_back(1);
strides.push_back(padded_dims[1]);
dnnl::memory::data_type dt = convert_data_type(weights_layout.data_type);
weights_md = dnnl::memory::desc(dims, dt, strides);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • I think it would be better to be placed within onednn::layout_to_memory_desc.
  • This function supports padded_dims[1] only. Then could you add assert to check padded_dims[0] == 0?
  • Could you add test case for this fix?

@@ -115,7 +189,17 @@ static void create_data(ProgramBuilder& p, const ov::Shape& const_shape, const s
auto buf = lock.data();
auto bufSize = constLayout.bytes_count();

std::memcpy(&buf[0], &data[0], bufSize);
if (constLayout.data_padding) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

isn't it necessary to check also data_type == u4 || data_type == i4?

std::memcpy(&padded_vec[padded_begin], &non_padded_vec[non_padded_begin], non_padded_row_size);
if (non_padded_end < non_padded_dims_size) {
uint8_t s0, s1;
unpack8to4(non_padded_vec[non_padded_end], s0, s1);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm afraid it will create wrong data if you convert char data into uint8_t. For example, when char contains negative value.


// Function to convert a non-padded vector to a padded vector
static void copy_to_padded_vector(const std::vector<int> &non_padded_dims, const char* non_padded_vec,
const std::vector<int> &padded_dims, char* padded_vec) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this function works only for 4bit data element. I think it would be better to clarify from the name.

temp_padding_buffer[index] = s0;
++index;
temp_padding_buffer[index] = s1;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How long does it take? I'm afraid this will be very slow. I think we need to find a better way.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
category: GPU OpenVINO GPU plugin do not merge pr: needs tests PR needs tests updating
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants