PharData incorrectly extracts zip file #13037
Comments
The corruption only seems to happen when the compression flags of the file entry are zero, i.e. no compression is used. EDIT: looks like the zip file entry offset is maybe not set up properly, the
The difference of 4 bytes comes from an inconsistency in the extra field length in the sample zip files. According to https://pkware.cachefly.net/webdocs/casestudies/APPNOTE.TXT
Furthermore, upon finding this I googled a bit further and found this: https://stackoverflow.com/questions/58702783/finding-start-of-compressed-data-for-items-in-a-zip-with-zip4j
And indeed that's the case here too! What an amazing file format design! Here's a very quick and dirty proof of concept patch:
diff --git a/ext/phar/zip.c b/ext/phar/zip.c
index 1804d926b4..c07ddf9b10 100644
--- a/ext/phar/zip.c
+++ b/ext/phar/zip.c
@@ -386,8 +386,16 @@ int phar_parse_zipfile(php_stream *fp, char *fname, size_t fname_len, char *alia
entry.timestamp = phar_zip_d2u_time(zipentry.timestamp, zipentry.datestamp);
entry.flags = PHAR_ENT_PERM_DEF_FILE;
entry.header_offset = PHAR_GET_32(zipentry.offset);
+ // entry.offset = entry.offset_abs = PHAR_GET_32(zipentry.offset) + sizeof(phar_zip_file_header) + PHAR_GET_16(zipentry.filename_len) +
+ // PHAR_GET_16(zipentry.extra_len);
+
+ zend_off_t loc = php_stream_tell(fp);
+ phar_zip_file_header local_file_header;
+ php_stream_seek(fp, entry.header_offset, SEEK_SET);
+ php_stream_read(fp, (char *) &local_file_header, sizeof(local_file_header));
+ php_stream_seek(fp, loc, SEEK_SET);
entry.offset = entry.offset_abs = PHAR_GET_32(zipentry.offset) + sizeof(phar_zip_file_header) + PHAR_GET_16(zipentry.filename_len) +
- PHAR_GET_16(zipentry.extra_len);
+ PHAR_GET_16(local_file_header.extra_len);
if (PHAR_GET_16(zipentry.flags) & PHAR_ZIP_FLAG_ENCRYPTED) {
PHAR_ZIP_FAIL("Cannot process encrypted zip files");
I'll write up a proper patch and testcase tomorrow, it's late.
The code currently assumes that the extra field length of the central directory entry and the local entry are the same, but that's not the case. For example, the "Extended Timestamp extra field" differs in size for local vs central directory entries. This causes the file contents offset to be incorrect because it is based on the central directory length instead of the local entry length. Fix it by reading the local entry and getting the size from there as well as checking consistency for the file name length.
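To make the offset calculation described above concrete, here is a minimal standalone sketch in C. It is not the php-src patch: it works on an in-memory buffer instead of a `php_stream`, and the names `zip_local_data_offset`, `rd16` and `rd32` are hypothetical helpers invented for illustration. The field positions (30-byte fixed local header, file name length at byte 26, extra field length at byte 28) follow the local file header layout in APPNOTE.TXT.

```c
#include <stddef.h>
#include <stdint.h>

/* Fixed-size part of a ZIP local file header, per APPNOTE.TXT. */
enum { ZIP_LOCAL_HEADER_SIZE = 30 };
/* Local file header signature "PK\3\4", stored little-endian. */
#define ZIP_LOCAL_SIG 0x04034b50u

/* Hypothetical helpers: read little-endian integers from a byte buffer. */
static uint16_t rd16(const unsigned char *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t rd32(const unsigned char *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/*
 * Compute where an entry's file data starts, using the lengths stored in
 * the LOCAL header rather than those of the central directory entry.
 * header_offset and central_name_len come from the central directory.
 * Returns 0 on success and stores the offset in *out, or -1 if the local
 * header is truncated, has a bad signature, or disagrees with the central
 * directory about the file name length.
 */
int zip_local_data_offset(const unsigned char *buf, size_t len,
                          uint32_t header_offset, uint16_t central_name_len,
                          size_t *out)
{
    if (header_offset > len || len - header_offset < ZIP_LOCAL_HEADER_SIZE) {
        return -1;
    }
    const unsigned char *h = buf + header_offset;
    if (rd32(h) != ZIP_LOCAL_SIG) {
        return -1;
    }
    uint16_t name_len = rd16(h + 26);
    uint16_t extra_len = rd16(h + 28); /* may differ from the central entry */
    if (name_len != central_name_len) {
        return -1; /* consistency check on the file name length */
    }
    size_t data = (size_t)header_offset + ZIP_LOCAL_HEADER_SIZE
                + name_len + extra_len;
    if (data > len) {
        return -1;
    }
    *out = data;
    return 0;
}
```

With a 1-byte file name and a 13-byte local extra field, the data starts at 30 + 1 + 13 = 44. If the central directory entry reported a 9-byte extra field for the same file, the old calculation would land on 40, which is the kind of 4-byte shift discussed in this thread.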
Description
The following code:
extracts the zip file, but a few of the hundreds of extracted files contain stray invisible characters such as \x14 and other control characters.
Files that are corrupted when extracted:
diff -r -q /path/to/dir1 /path/to/dir2
output:

This code:
correctly extracts the zip file as expected, in the same way as the command line tool unzip does.

Working file from ZipArchive:

Corrupted file from PharData:

PHP Version
PHP 8.2.7
Operating System
Debian 12.4 raspberrypi