#!/usr/bin/python3
# facebook t38065718
'''
PyICE is a memory mapped implementation of cached byte code. Multiple Python
modules are compiled into a single memory mapped file. The actual code objects
are stored in the memory mapped file.
PyICE will produce a file which is called an ice pack. An ice pack consists of
multiple compiled Python byte codes. An ice pack will further reduce memory
usage by folding constants across all of the modules.
'''
import io
import math
import mmap
import os
import re
import struct
import sys
import time
try:
import _pyice
except ImportError:
_pyice = None
from collections import deque, OrderedDict
from importlib.util import spec_from_file_location, decode_source
from os import path
from types import CodeType
'''
IcePack file format:
ICEPACK[byte version]
4 bytes - TIMESTAMP
Section offsets, an array of uint32s that provide the offset to each section:
Modules: A sorted tree of the module names provided by this icepack.
Code Objects: The metadata for each code object.
Strings: A table of strings which are referred to in the icepack
Bytes: A table of bytes objects which are referred to in the icepack
Ints: A table of integer values which don't fit in 24-bit values
Big Ints: A table of integer values which don't fit in 32-bit values
Floats: A table of floating point numbers in the icepack
Complexes: A table of complex numbers
Tuples: A table of tuple objects, composed from this or other tables
FrozenSet: A table of frozen set objects
Sections
========
Modules
-------
A sorted list of the modules/packages. Top-level packages/modules are listed
first, and then an offset to any child packages is provided. If there are no
children then the offset will be 0.
Each level of the tree is organized as:
[uint32 module count], ([uint32 name reference], [uint32 code id],
[uint32 is_package], [uint32 str filename ref],
[uint32 child offset])*
The module count is the number of entries at this level. The name
reference is an index into the string table. The child offset is the
absolute offset from the start of the icepack to where any child modules
exist.
Code Objects
------------
Code objects are variable sized based upon the number of variables, etc.
which they reference. Therefore this is broken into two parts. The first
part is an array of uint32's which is the absolute offset to each code
object.
[uint32 absolute offset]*
Following are the code objects themselves, laid out as:
[uint32 co_bytes offset to bytes table]
[uint32 co_argcount]
[uint32 co_kwonlyargcount]
[uint32 co_nlocals]
[uint32 co_stacksize]
[uint32 co_flags]
[uint32 co_firstlineno]
[uint32 co_name reference to str table]
[uint32 co_filename reference to str table]
[uint32 co_lnotab reference to bytes table]
[uint32 length] [uint32 co_cellvars reference to str table]*
[uint32 length] [uint32 co_freevars reference to str table]*
[uint32 length] [uint32 co_names reference to str table]*
[uint32 length] [uint32 co_varnames reference to str table]*
[uint32 length] [uint32 co_consts object references]*
Strings
-------
Strings are variable sized based upon the length of the string.
Therefore this is broken into two parts. The first part is an array of
uint32's which is the absolute offset to the string contents. Then the
arrays of utf8 encoded bytes represent the strings.
[uint32 count] number of strings
[uint32 absolute offset]*
[uint32 length] [byte utf8 encoded]*
Bytes
-----
Bytes are variable sized based upon the length of the byte string.
Therefore this is broken into two parts. The first part is an array of
uint32's which is the absolute offset to the byte object. Then the arrays
of bytes follow.
[uint32 count] number of byte objects
[uint32 absolute offset]*
[uint32 length] [byte]*
Ints
----
Ints are fixed 32-bit values. They are encoded as an array of integers,
preceded by a uint32 count.
Big Ints
--------
Big ints are variable sized based upon the number of bytes they contain.
Therefore this is broken into two parts. The first part is an array of
uint32's which is the absolute offset to the big int representation. Then
the big ints follow.
[uint32 count] number of big ints
[uint32 absolute offset]*
Big ints are stored as:
[uint32 length] [byte array]*
The byte array is in the format returned by int.to_bytes(), serialized as
little endian, and signed.
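For example (illustrative round trip of the on-disk big int format):
    >>> (25185).to_bytes(2, 'little', signed=True)
    b'ab'
    >>> int.from_bytes(b'ab', 'little', signed=True)
    25185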
Floats
------
Floats are fixed sized double values. They are encoded as an array of
floating point values which can simply be indexed. They are preceded by a
uint32 count.
Complexes
---------
Complex numbers are fixed sized pairs of double values. They are encoded
as an array of the real value followed by the imaginary value, each stored
as a double. They are preceded by a uint32 count.
Tuples
------
Tuples are variable sized based upon the number of items they contain.
Therefore this is broken into two parts. The first part is an array of
uint32's which is the absolute offset to the tuple object. Then the tuple
objects follow.
objects follow.
[uint32 count] number of tuples
[uint32 absolute offset]*
Tuples themselves are encoded as:
[uint32 length] [uint32 object references]*
Frozen Sets
-----------
Frozen sets are variable sized based upon the number of items they contain.
Therefore this is broken into two parts. The first part is an array of
uint32's which is the absolute offset to the set object. Then the set
objects follow.
objects follow.
[uint32 count] number of sets
[uint32 absolute offset]*
Sets themselves are encoded as:
[uint32 length] [uint32 object references]*
Object References
=================
Occasionally we need to embed a reference to an untyped object. This
occurs in the co_consts array of code objects as well as for the elements
of a tuple.
References are encoded using a 32-bit value indicating the table to
get the value from and the index into that table. The low byte is used
to indicate the type of value and the value is encoded in the high 24
bits.
0: None
1: Named constant
Upper 24-bits:
0 = False
1 = True
2 = Ellipsis
2: small int, upper 24 bits is the int value
3: int32, upper 24-bits is a table reference
4: large int, upper 24-bits is a table reference
5: bytes, upper 24-bits is a table reference
6: str, upper 24-bits is a table reference
7: float, upper 24-bits is a table reference
8: complex, upper 24-bits is a table reference
9: tuple, upper 24-bits is a table reference
10: code, upper 24-bits is a table reference
11: frozenset, upper 24-bits is a table reference
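For example, a reference to entry 4 of the string table is encoded as
(4 << 8) | 0x06 == 0x00000406; readers recover the type with ref & 0xff
and the table index with ref >> 8.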
'''
CODE_FORMAT = struct.Struct(
'I' + # code index in the bytes table
'I' + # co_argcount
'I' + # co_kwonlyargcount
'I' + # co_nlocals
'I' + # co_stacksize
'I' + # co_flags
'I' + # co_firstlineno
'I' + # co_name index in str table
'I' + # co_filename index in str table
    'I' + # co_lnotab index in bytes table
'I' + # cellvars index in string array table
'I' + # freevars index in string array table
'I' + # names index in string array table
'I' + # varnames index in string array table
'I' + # consts index in tuple table
''
)
INT = struct.Struct('i')
UINT = struct.Struct('I')
FLOAT = struct.Struct('d')
COMPLEX = struct.Struct('dd')
TIMESTAMP_OFFSET = 8
SECTION_OFFSET = 12
SECTION_FORMAT = struct.Struct('IIIIIIIIII')
MODULE_ENTRY = struct.Struct('IIIII')
EXTENSION = '.icepack'
class ModuleInfo:
def __init__(self, code=-1, filename=None, is_package=False):
self.children = OrderedDict()
self.code = code
self.filename = filename
self.child_offset = 0
self.is_package = is_package
OBJECT_TYPE_NONE = 0x00
OBJECT_TYPE_NAMED_CONSTANT = 0x01
OBJECT_TYPE_INT32 = 0x03
OBJECT_TYPE_BIGINT = 0x04
OBJECT_TYPE_BYTES = 0x05
OBJECT_TYPE_STR = 0x06
OBJECT_TYPE_FLOAT = 0x07
OBJECT_TYPE_COMPLEX = 0x08
OBJECT_TYPE_TUPLE = 0x09
OBJECT_TYPE_CODE = 0x0A
OBJECT_TYPE_FROZENSET = 0x0B
# Named constants
OBJECT_FALSE = 0x0001
OBJECT_TRUE = 0x0101
OBJECT_ELLIPSIS = 0x0201
def _float_equals(a, b):
if math.isnan(a) and math.isnan(b):
return True
elif (a == 0 and b == 0 and
math.copysign(1, a) != math.copysign(1, b)):
return False
else:
return a == b
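# For example, the helper above differs from plain float equality:
#   _float_equals(float('nan'), float('nan'))  -> True   (NaNs compare equal)
#   _float_equals(0.0, -0.0)                   -> False  (signs differ)
#   0.0 == -0.0                                -> True   (plain Python ==)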
class PyObjectValue:
    '''Handles equality with slightly different semantics than normal
    Python ==. Disallows equality between values of different types (e.g.
    0 != 0.0 != False). Treats NaN == NaN and +0.0 != -0.0, both for floats
    and for the components of complex numbers.'''
def __init__(self, value):
assert type(value) is not ObjectValue
if type(value) == tuple:
self.value = tuple(ObjectValue(v) for v in value)
elif type(value) == frozenset:
self.value = frozenset(ObjectValue(v) for v in value)
else:
self.value = value
self.hash = hash(self.value) ^ hash(type(self.value))
def __repr__(self):
return 'ObjectValue(' + repr(self.value) + ')'
def __hash__(self):
return self.hash
def __eq__(self, other):
if type(other) is not ObjectValue:
return False
elif type(self.value) is not type(other.value):
return False
elif type(self.value) == float:
return _float_equals(self.value, other.value)
elif type(self.value) == complex:
return (_float_equals(self.value.real, other.value.real) and
_float_equals(self.value.imag, other.value.imag))
return self.value == other.value
if _pyice is None:
ObjectValue = PyObjectValue
class IcePackError(Exception):
pass
else:
# Use the C accelerator version of ObjectValue
ObjectValue = _pyice.CObjectValue
IcePackError = _pyice.IcePackError
def _align_file(file, align=8):
    length = file.tell()
    padding = (((length + align - 1) & (~(align - 1))) - length)
file.write(b'\x00' * padding)
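# For example, with align=8 a file position of 13 is rounded up to 16, so
# three NUL padding bytes are written; a position already on an 8-byte
# boundary gets no padding.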
class _TypeTable:
section_count = 1
def add(self, value) -> bool:
raise NotImplementedError(str(type(self)))
def write(self, value, outfile):
raise NotImplementedError(str(type(self)))
def write_table(self, value, outfile):
raise NotImplementedError(str(type(self)))
@staticmethod
def add_const_to_table(table, value) -> bool:
if value not in table:
table[value] = len(table)
return True
return False
@staticmethod
def write_simple_table(table, outfile, get_bytes, align=4):
values = [(y, x) for x, y in table.items()]
values.sort()
outfile.write(UINT.pack(len(values)))
_align_file(outfile, align)
for _index, value in values:
outfile.write(get_bytes(value))
@staticmethod
def write_variable_len_table(table, outfile, get_bytes_and_len):
        '''writes a variable-length table, in the format:
        count (offset)* ([length][bytes])*'''
values = [(y, x, get_bytes_and_len(x)) for x, y in table.items()]
values.sort()
outfile.write(UINT.pack(len(values)))
offset = outfile.tell() + len(table) * 4
for _index, _value, (encoded, _length) in values:
outfile.write(UINT.pack(offset))
offset += len(encoded) + 4
for _index, _value, (encoded, length) in values:
outfile.write(UINT.pack(length))
outfile.write(encoded)
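    # For example, two entries with 3- and 5-byte payloads are laid out as:
    #   count | off0 off1 | len0 payload0 | len1 payload1
    # where off0 = position after the offsets array, and off1 = off0 + 4 + 3
    # (each payload is preceded by its uint32 length).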
class _ConstantTable(_TypeTable):
section_count = 0
def __init__(self, id):
self.id = id
def add(self, value) -> bool:
return False
def write(self, value, outfile):
outfile.write(UINT.pack(self.id))
def write_table(self, value, outfile):
return ()
class _BoolTable(_TypeTable):
section_count = 0
def add(self, value) -> bool:
return False
def write(self, value, outfile):
outfile.write(UINT.pack(OBJECT_TRUE if value else OBJECT_FALSE))
def write_table(self, value, outfile):
return ()
class _SimpleTable(_TypeTable):
object_type: int
def __init__(self):
self.table = {}
def add(self, value) -> bool:
return self.add_const_to_table(self.table, value)
def write(self, value, outfile):
str_id = self.table[value]
outfile.write(UINT.pack(str_id << 8 | self.object_type))
def __getitem__(self, value):
return self.table[value]
class _IntTable(_TypeTable):
section_count = 2
def __init__(self):
self.int_table = {}
self.bigint_table = {}
def add(self, value):
if -1 << 31 <= value <= (1 << 31) - 1:
return self.add_const_to_table(self.int_table, value)
else:
return self.add_const_to_table(self.bigint_table, value)
def write(self, value, outfile):
if -1 << 31 <= value <= (1 << 31) - 1:
index = self.int_table[value]
outfile.write(UINT.pack(OBJECT_TYPE_INT32 | index << 8))
else:
index = self.bigint_table[value]
outfile.write(UINT.pack(OBJECT_TYPE_BIGINT | index << 8))
@staticmethod
def bigint_to_bytes_and_len(value):
length = (value.bit_length()+7)//8
try:
res = value.to_bytes(length, 'little', signed=True)
except OverflowError:
# Positive numbers like 0x80000000 will fail because bit_length
# doesn't account for the need for a sign bit
res = value.to_bytes(length + 1, 'little', signed=True)
return res, len(res)
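    # For example, (1 << 31) has bit_length 32 -> 4 bytes, but as a signed
    # value it needs an extra sign byte, so 5 bytes are written.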
def write_table(self, maker, outfile):
int_offset = outfile.tell()
self.write_simple_table(self.int_table, outfile, INT.pack)
bigint_offset = outfile.tell()
self.write_variable_len_table(self.bigint_table, outfile,
self.bigint_to_bytes_and_len)
return int_offset, bigint_offset
class _BytesTable(_SimpleTable):
object_type = OBJECT_TYPE_BYTES
def write_table(self, maker, outfile):
offset = outfile.tell()
self.write_variable_len_table(self.table, outfile,
lambda value: (value, len(value)))
return offset,
class _StrTable(_SimpleTable):
object_type = OBJECT_TYPE_STR
def write_table(self, maker, outfile):
offset = outfile.tell()
self.write_variable_len_table(self.table, outfile,
self.get_bytes_and_len)
return offset,
def get_bytes_and_len(self, value):
# Null terminate strings for making life easier in dealing
# w/ the module table lookups
res = value.encode('utf8', 'surrogatepass')
return res + b'\0', len(res)
class _ObjectValueTable(_SimpleTable):
def add(self, value):
return super().add(ObjectValue(value))
def write(self, value, outfile):
super().write(ObjectValue(value), outfile)
def __getitem__(self, value):
return super().__getitem__(ObjectValue(value))
class _FloatTable(_ObjectValueTable):
object_type = OBJECT_TYPE_FLOAT
def write_table(self, maker, outfile):
offset = outfile.tell()
self.write_simple_table(self.table, outfile,
lambda value: FLOAT.pack(value.value), 8)
return offset,
class _ComplexTable(_ObjectValueTable):
object_type = OBJECT_TYPE_COMPLEX
def write_table(self, maker, outfile):
offset = outfile.tell()
def writer(value):
return COMPLEX.pack(value.value.real, value.value.imag)
self.write_simple_table(self.table, outfile, writer, 8)
return offset,
class _SequenceTable(_ObjectValueTable):
def __init__(self, object_type, maker):
super().__init__()
self.maker = maker
self.object_type = object_type
def add(self, value) -> bool:
# TODO: Fixme
if super().add(value):
for x in value:
self.maker.add_const(x)
return True
return False
def write_table(self, maker, outfile):
table_offset = outfile.tell()
values = [(y, x) for x, y in self.table.items()]
values.sort()
outfile.write(UINT.pack(len(values)))
offset = outfile.tell() + len(self.table) * 4
for _index, value in values:
outfile.write(UINT.pack(offset))
offset += 4 + len(value.value) * 4
for _index, value in values:
maker.write_array(value.value)
return table_offset,
class _CodeTable(_TypeTable):
def __init__(self, maker):
self.table = {}
self.maker = maker
@property
def count(self):
return len(self.table)
def code_id(self, code):
        '''Python has unusual equality semantics for code objects, in that
        it doesn't compare the filename or the co_lnotab fields to determine
        whether they're equal. That results in oddities such as:
>>> def f():
... x = 1
... x = 2
...
>>> g = f
>>> def f():
... x = 1
...
... x = 2
...
>>> f.__code__ == g.__code__
True
        So we always key on object identity to ensure we don't merge code
        objects and end up with wrong filename or line number information.
'''
return id(code), code
def __getitem__(self, code):
return self.table[self.code_id(code)]
def write(self, value, outfile):
code_id = self.code_id(value)
outfile.write(UINT.pack(self.table[code_id] << 8 | OBJECT_TYPE_CODE))
def add(self, value):
code_id = self.code_id(value)
if code_id not in self.table:
self.table[code_id] = len(self.table)
self.maker.add_const(value.co_code)
self.maker.enqueue_code(value)
def write_table(self, maker, outfile):
table_offset = outfile.tell()
code_offsets = []
# write space for offsets to code objects
outfile.write(INT.pack(len(self.table)))
code_start = outfile.tell()
outfile.truncate(code_start + 4 * len(self.table))
outfile.seek(0, 2)
codes = [(y, x) for (_, x), y in self.table.items()]
codes.sort()
for _i, code in codes:
code_offsets.append(outfile.tell())
header = CODE_FORMAT.pack(
maker.bytes[code.co_code],
code.co_argcount,
code.co_kwonlyargcount,
code.co_nlocals,
code.co_stacksize,
code.co_flags,
code.co_firstlineno,
maker.strs[code.co_name],
maker.strs[code.co_filename],
maker.bytes[code.co_lnotab],
maker.get_tuple_id(code.co_cellvars),
maker.get_tuple_id(code.co_freevars),
maker.get_tuple_id(code.co_names),
maker.get_tuple_id(code.co_varnames),
maker.get_tuple_id(code.co_consts),
)
outfile.write(header)
outfile.seek(code_start)
for offset in code_offsets:
outfile.write(UINT.pack(offset))
outfile.seek(0, io.SEEK_END)
return table_offset,
class IceMaker:
'''Generates an ice pack from a set of modules and saves it to a file like
object.'''
def __init__(self, outfile):
'''Creates a new IceMaker which will save the contents of the provided
modules to outfile which should be a seekable file-like object'''
self.outfile = outfile
self.consts = set()
self.queue = deque()
self.modules = {} # all modules, e.g. a.b.c -> ModuleInfo
self.timestamp = 0
self.codes = _CodeTable(self)
self.strs = _StrTable()
self.tuples = _SequenceTable(OBJECT_TYPE_TUPLE, self)
self.bytes = _BytesTable()
self.ints = _IntTable()
self.floats = _FloatTable()
self.complexes = _ComplexTable()
self.frozensets = _SequenceTable(OBJECT_TYPE_FROZENSET, self)
self.type_handlers = {
type(None): _ConstantTable(0),
type(Ellipsis): _ConstantTable(OBJECT_ELLIPSIS),
bool: _BoolTable(),
int: self.ints,
bytes: self.bytes,
str: self.strs,
float: self.floats,
complex: self.complexes,
CodeType: self.codes,
ObjectValue: None,
tuple: self.tuples,
frozenset: self.frozensets,
}
# Ensure we have an empty code for namespace modules
self.empty_code = compile('', '', 'exec', dont_inherit=True)
self.enqueue_code(self.empty_code)
def add_module(self, code, name, filename, is_package=False, timestamp=0):
if timestamp > self.timestamp:
self.timestamp = timestamp
self.enqueue_code(code)
self.modules[name] = ModuleInfo(self.codes[code],
filename, is_package)
self.add_const(filename)
for name_part in name.split('.'):
self.add_const(name_part)
self.process()
def enqueue_code(self, code):
self.add_const(code)
# TODO: Consider the order of serialization. Currently we're doing
# breadth first, which means children won't be near their parents,
# probably resulting in more pages being read in when not all modules
# are used. Switching to an appendleft here would result in children
# being closer to their parents, but in opposite of the order they
# appear in co_consts. And the order in co_consts appears to be the
# order they're referred to in code, so we could end up with extra
# seeking (which might not really matter)
self.queue.append(code)
def write_str(self, value):
self.outfile.write(UINT.pack(self.strs[value]))
def add_const(self, const):
handler = self.type_handlers[type(const)]
return handler.add(const)
def process(self):
while self.queue:
code = self.queue.popleft()
self.add_const(code.co_filename)
self.add_const(code.co_name)
self.add_const(code.co_lnotab)
self.add_const(code.co_names)
self.add_const(code.co_varnames)
self.add_const(code.co_cellvars)
self.add_const(code.co_freevars)
self.add_const(code.co_consts)
def write_object_value(self, value):
if type(value.value) == tuple:
tuple_id = self.tuples.table[value]
self.outfile.write(UINT.pack(tuple_id << 8 | OBJECT_TYPE_TUPLE))
elif type(value.value) == frozenset:
set_id = self.frozensets.table[value]
self.outfile.write(UINT.pack(set_id << 8 | OBJECT_TYPE_FROZENSET))
else:
self.write_reference(value.value)
def write_reference(self, value):
if type(value) is ObjectValue:
self.write_object_value(value)
else:
handler = self.type_handlers[type(value)]
handler.write(value, self.outfile)
def write_array(self, arr):
self.outfile.write(UINT.pack(len(arr)))
for value in arr:
self.write_reference(value)
def write_str_array(self, arr):
self.outfile.write(UINT.pack(len(arr)))
for value in arr:
self.write_str(value)
def write(self):
# Write the tables
sections = [
self.codes,
self.strs,
self.bytes,
self.ints,
self.floats,
self.complexes,
self.tuples,
self.frozensets,
]
sec_count = sum(section.section_count for section in sections) + 1
self.outfile.write(b'ICEPACK\x00') # write header and version
self.outfile.write(UINT.pack(self.timestamp))
# Get space for section offsets
offset_start = self.outfile.tell()
self.outfile.write(b'\0\0\0\0' * (sec_count))
# Then write the sections
offsets = []
_align_file(self.outfile)
offsets.append(self.outfile.tell())
self.write_modules()
for section in sections:
_align_file(self.outfile)
offsets.extend(section.write_table(self, self.outfile))
# Then update the section offsets
self.outfile.seek(offset_start)
for offset in offsets:
self.outfile.write(UINT.pack(offset))
self.outfile.seek(0, io.SEEK_END)
def make_module_tree(self):
tree = ModuleInfo()
all_modules = list(self.modules.items())
all_modules.sort()
# First build a tree of all modules...
for mod_name, mod_info in all_modules:
cur = tree
names = mod_name.split('.')
for name in names[:-1]:
next = cur.children.get(name)
if next is None:
ns_module = ModuleInfo(self.codes[self.empty_code], '',
is_package=True)
next = cur.children[name] = ns_module
cur = next
cur.children[names[-1]] = mod_info
self.calculate_module_relative_offsets(tree)
return tree
def calculate_module_relative_offsets(self, tree, offset=0):
'''Recurses through the tree and calculates the relative offsets for
where their children will live'''
# space for the # of children, and their names/code/child offset
offset += 4 + len(tree.children) * MODULE_ENTRY.size
for _name, item in tree.children.items():
if item.children:
item.child_offset = offset
offset = self.calculate_module_relative_offsets(item,
offset)
return offset
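    # For example, a root level with two top-level entries reserves
    # 4 + 2 * MODULE_ENTRY.size == 44 bytes, so the first child level starts
    # at relative offset 44.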
def write_modules(self):
        '''We write out the module table as a sorted tree that we can binary
        search. The format at each level is:
        count, (name_ref, code_id, is_package, filename_ref, child_offset)*
        If an entry has no children, 0 is written for its child offset'''
self.write_module(self.make_module_tree(), self.outfile.tell())
def write_module(self, tree, base_offset):
# Write this entry
self.outfile.write(UINT.pack(len(tree.children)))
for name, item in tree.children.items():
self.write_str(name)
if item.code == -1:
self.outfile.write(UINT.pack(0xffffffff))
else:
self.outfile.write(UINT.pack(item.code))
self.outfile.write(UINT.pack(1 if item.is_package else 0))
self.write_str(item.filename)
if item.child_offset == 0:
self.outfile.write(b'\0\0\0\0')
else:
self.outfile.write(UINT.pack(base_offset + item.child_offset))
# Then write the children
for item in tree.children.values():
if item.children:
self.write_module(item, base_offset)
def get_tuple_id(self, value):
return self.tuples[value]
class PyIceBreaker:
def __init__(self, icepack, base_dir=''):
self.file = icepack
self.base_dir = base_dir
self.map = mmap.mmap(self.file.fileno(),
length=0, access=mmap.ACCESS_READ)
self.str_cache = {}
self.bytes_cache = {}
self.const_cache = {}
self.code_cache = {}
self.tuple_cache = {}
self.int_cache = {}
self.float_cache = {}
self.complex_cache = {}
self.bigint_cache = {}
self.frozenset_cache = {}
header = self.map[0:7]
if header != b'ICEPACK':
raise IcePackError('Invalid ice pack file: ' + repr(header))
version = self.map[7]
if version != 0:
raise IcePackError('Unsupported IcePack version')
ts_bytes = self.map[TIMESTAMP_OFFSET:SECTION_OFFSET]
self.timestamp, = UINT.unpack(ts_bytes)
sec_data = self.map[SECTION_OFFSET:
SECTION_OFFSET + SECTION_FORMAT.size]
sections = SECTION_FORMAT.unpack(sec_data)
(self.modules, self.code, self.strings, self.bytes, self.ints,
self.bigints, self.floats, self.complexes, self.tuples,
self.frozensets) = sections
self.bytes_count, = UINT.unpack(self.map[self.bytes:self.bytes+4])
self.str_count, = UINT.unpack(self.map[self.strings:self.strings+4])
self.code_count, = UINT.unpack(self.map[self.code:self.code+4])
self.int_count, = UINT.unpack(self.map[self.ints:self.ints+4])
self.bigint_count, = UINT.unpack(self.map[self.bigints:self.bigints+4])
self.float_count, = UINT.unpack(self.map[self.floats:self.floats+4])
self.complex_count, = UINT.unpack(self.map[self.complexes:
self.complexes+4])
self.tuple_count, = UINT.unpack(self.map[self.tuples:self.tuples+4])
def __enter__(self):
return self
def __exit__(self, type, value, traceback):
self.file.close()
def read_none(self, const):
if const != 0:
raise ValueError('Invalid const value')
return None
def read_named_constant(self, const):
if const <= 2:
return (False, True, ...)[const]
raise ValueError('Invalid const value')
def read_bytes(self, index):
        if index >= self.bytes_count:
raise ValueError('Invalid bytes index')
res = self.bytes_cache.get(index)
if res is None:
start = self.bytes + 4 + index * 4
location, = UINT.unpack(self.map[start:start + 4])
len, = UINT.unpack(self.map[location:location + 4])
res = self.map[location + 4:location + 4 + len]
self.bytes_cache[index] = res
return res
def read_str(self, index):
        if index >= self.str_count:
raise ValueError('Invalid str index ' + str(index))
res = self.str_cache.get(index)
if res is None:
start = self.strings + 4 + index * 4
location, = UINT.unpack(self.map[start:start + 4])
len, = UINT.unpack(self.map[location:location + 4])
utf8 = self.map[location + 4:location + 4 + len]
self.str_cache[index] = res = utf8.decode('utf8',
'surrogatepass')
return res
def read_const(self, const):
return _CONST_READERS[const & 0xff](self, const >> 8)
def read_const_array(self, offset):
count, = UINT.unpack(self.map[offset:offset + 4])
o = offset + 4 # starting offset to the actual elements
map = self.map
return (self.read_const(UINT.unpack(map[o + i * 4:o + 4 + i * 4])[0])
for i in range(count))
def read_tuple(self, index):
        if index >= self.tuple_count:
raise ValueError('Invalid tuple index')
res = self.tuple_cache.get(index)
if res is None:
start = self.tuples + 4 + index * 4
location, = UINT.unpack(self.map[start:start + 4])
res = tuple(self.read_const_array(location))
self.tuple_cache[index] = res
return res
    def read_frozenset(self, index):
        # The frozenset table carries its own count; read it rather than
        # reusing the (unrelated) tuple count.
        count, = UINT.unpack(self.map[self.frozensets:self.frozensets + 4])
        if index >= count:
            raise ValueError('Invalid frozenset index')
res = self.frozenset_cache.get(index)
if res is None:
start = self.frozensets + 4 + index * 4
location, = UINT.unpack(self.map[start:start + 4])
res = frozenset(self.read_const_array(location))
self.frozenset_cache[index] = res
return res
def read_int(self, index):
        if index >= self.int_count:
raise ValueError('Invalid int index')
res = self.int_cache.get(index)
if res is None:
start = self.ints + 4 + index * INT.size
res, = INT.unpack(self.map[start:start + 4])
self.int_cache[index] = res
return res
def read_float(self, index):
        if index >= self.float_count:
raise ValueError('Invalid float index')
res = self.float_cache.get(index)
if res is None:
start = self.floats + 4 + 4 + index * FLOAT.size
res, = FLOAT.unpack(self.map[start:start + FLOAT.size])
self.float_cache[index] = res
return res
def read_complex(self, index):
        if index >= self.complex_count:
raise ValueError('Invalid complex index')
res = self.complex_cache.get(index)
if res is None:
start = self.complexes + 4 + 4 + index * COMPLEX.size
real, imag = COMPLEX.unpack(self.map[start:start + COMPLEX.size])
res = self.complex_cache[index] = complex(real, imag)
return res
def read_bigint(self, index):
        if index >= self.bigint_count:
raise ValueError('Invalid bigint index')
res = self.bigint_cache.get(index)
if res is None:
start = self.bigints + 4 + index * 4
location, = UINT.unpack(self.map[start:start + 4])
len, = UINT.unpack(self.map[location:location + 4])
bigint_bytes = self.map[location + 4:location + 4 + len]
res = int.from_bytes(bigint_bytes, 'little', signed=True)
self.bigint_cache[index] = res
return res
def read_code(self, index):
        if index >= self.code_count:
raise ValueError('Invalid code index')
start = self.code + 4 + index * 4
location, = UINT.unpack(self.map[start:start + 4])
header = self.map[location:location + CODE_FORMAT.size]
(bytes, argcount, kwonlyargcount, nlocals, stacksize, flags,
firstlineno, name, filename, lnotab, cellvars, freevars, names,
varnames, consts) = CODE_FORMAT.unpack(header)
code = self.get_code_buffer(bytes)
cellvars = self.read_tuple(cellvars)
freevars = self.read_tuple(freevars)
names = self.read_tuple(names)
varnames = self.read_tuple(varnames)
consts = self.read_tuple(consts)
fixed_fn = path.join(self.base_dir, self.read_str(filename))
return CodeType(argcount, kwonlyargcount, nlocals, stacksize, flags,
code, consts, names, varnames,
fixed_fn, self.read_str(name),
firstlineno, self.read_bytes(lnotab), freevars,
cellvars)
def get_code_buffer(self, index):
start = self.bytes + 4 + index * 4
location, = UINT.unpack(self.map[start:start + 4])
len, = UINT.unpack(self.map[location:location + 4])
return memoryview(self.map)[location + 4:location + 4 + len]
def find_module(self, name):
        '''Finds a module in the module tree. Returns a tuple of the code
        object, a bool indicating if the module is a package, and the
        module's filename; or None if the module is not found'''
parts = name.split('.')
cur = self.modules
res = None
for part in parts:
if cur == 0:
# Previous loop had no children
return None
count, = UINT.unpack(self.map[cur:cur+4])
for i in range(count):
start = cur + 4 + i * MODULE_ENTRY.size
entry_bytes = self.map[start:start + MODULE_ENTRY.size]
(iname, code, is_package, filename,
children) = MODULE_ENTRY.unpack(entry_bytes)
if self.read_str(iname) == part:
cur = children
res = code
break
else:
res = None
if res is not None:
return (self.read_code(res), is_package, self.read_str(filename))
return None
def close(self):
self.file.close()
IceBreaker = PyIceBreaker
if _pyice is not None:
# Use the accelerator version if it's available
class IceBreaker(_pyice.CIceBreaker):
def __new__(cls, icepack, base_dir=''):
map = mmap.mmap(icepack.fileno(),
length=0, access=mmap.ACCESS_READ)
if (isinstance(base_dir, str) and base_dir and
not base_dir.endswith(path.sep)):
base_dir += path.sep
self = super().__new__(cls, map, base_dir)
self.map = map
self.file = icepack
return self
def __enter__(self):
return self
def __exit__(self, type, value, traceback):
# Can't close this until all of the code objects are freed
super().__exit__(type, value, traceback)
self.file.close()
_CONST_READERS = {
OBJECT_TYPE_NONE: PyIceBreaker.read_none,
OBJECT_TYPE_NAMED_CONSTANT: PyIceBreaker.read_named_constant,
OBJECT_TYPE_INT32: PyIceBreaker.read_int,
OBJECT_TYPE_BIGINT: PyIceBreaker.read_bigint,
OBJECT_TYPE_BYTES: PyIceBreaker.read_bytes,
OBJECT_TYPE_STR: PyIceBreaker.read_str,
OBJECT_TYPE_FLOAT: PyIceBreaker.read_float,
OBJECT_TYPE_COMPLEX: PyIceBreaker.read_complex,
OBJECT_TYPE_TUPLE: PyIceBreaker.read_tuple,
OBJECT_TYPE_CODE: PyIceBreaker.read_code,
OBJECT_TYPE_FROZENSET: PyIceBreaker.read_frozenset,
}
class Freezer:
def __init__(self, output, modules, optimize, exclude, verbose):
self.modules = modules
self.optimize = optimize
self.exclude = exclude
self.verbose = verbose
self.outfile = open(output, 'wb')
self.maker = IceMaker(self.outfile)
def __enter__(self):
return self
def __exit__(self, *args):
self.outfile.close()
def build_file(self, basedir, fullpath):
dir, file = path.split(fullpath)
if file == "__init__.py":
relname = path.relpath(dir, basedir)
module_name = relname.replace('/', '.').replace('\\', '.')
is_package = True
else:
relname = path.relpath(path.splitext(fullpath)[0], basedir)
module_name = relname.replace('/', '.').replace('\\', '.')
is_package = False
relfn = path.relpath(fullpath, basedir)
for exclusion in self.exclude:
if re.match(exclusion, relfn):
if self.verbose:
print('Skipping', relfn)
break
else:
if self.verbose:
print('Including', module_name, 'from', relfn)
with open(fullpath, 'rb') as inp:
try:
bytes = inp.read()
compiled = compile(bytes, relfn,
'exec', dont_inherit=True,
optimize=self.optimize)
timestamp = int(os.stat(fullpath).st_mtime)
self.maker.add_module(compiled, module_name, relfn,
is_package, timestamp)
except SyntaxError as se:
if self.verbose:
print('Ignoring module with error: ', relfn, se)
def build_dir(self, dir):
for dirpath, _dirnames, filenames in os.walk(dir):
for filename in filenames:
if not filename.endswith('.py'):
continue
fullpath = path.join(dirpath, filename)
self.build_file(dir, fullpath)
def freeze(self):
start = time.time()
for module in self.modules:
if path.isdir(module):
if self.verbose:
print('Including directory', module)
self.build_dir(module)
else:
self.build_file(path.dirname(module), module)
self.maker.write()
end = time.time()
print('IcePack built in', end - start, 'seconds')
def main():
args = parser.parse_args()
if not args.modules:
print('No modules specified!')
sys.exit(1)
with Freezer(args.output, args.modules, args.optimize,
args.exclude or (), args.verbose) as freezer:
freezer.freeze()
class PyIceImporter:
def __init__(self, import_path):
self.path = import_path
try:
if (EXTENSION + '/') in import_path:
# sys.path entry should be
# 'path/to/compiled.icepack//relative/loc'
components = import_path.split(EXTENSION + '/')
pack_name = components[0] + EXTENSION
if path.isfile(pack_name):
self.disk_loc = components[1]
self.breaker = IceBreaker(open(pack_name, 'rb'),
self.disk_loc)
return
except IcePackError as e:
print('failed to load ice pack (invalid)', e)
except OSError as e:
print('failed to load ice pack: ' + str(e), e)
raise ImportError()
def find_spec(self, fullname, target=None):
if '\x00' in fullname:
# Invalid module name, return None, and let the import machinery
# report the module as not found.
return None
mod_info = self.breaker.find_module(fullname)
if mod_info is None:
return None
mod, is_package, filename = mod_info
disk_loc = path.join(self.disk_loc, fullname.replace('.', '/'))
if filename:
file_path = path.join(self.disk_loc, filename)
try:
mtime = os.stat(file_path).st_mtime
if int(mtime) > self.breaker.timestamp:
# the file on disk has been updated since the icepack was
# generated, prefer the on-disk version.
return None
except OSError:
# no file on disk, use the icepack
pass
else:
# namespace package
file_path = None
if is_package:
search = [self.path, disk_loc]
else:
search = None
loader = PyIceLoader(mod, self, file_path, is_package)
spec = spec_from_file_location(fullname, file_path,
loader=loader,
submodule_search_locations=search)
if not file_path:
spec.has_location = False
return spec
class PyIceLoader:
def __init__(self, code, importer, filename, is_package):
self.code = code
self.importer = importer
self.path = filename
self._is_package = is_package
def create_module(self, spec):
return None
def exec_module(self, mod):
# self.path is the empty string for namespace packages, which don't
# get a __file__ attribute
if self.path:
mod.__file__ = self.path
exec(self.code, mod.__dict__)
def is_package(self, fullname):
return self._is_package
def get_code(self, x):
return self.code
def get_source(self, fullname):
if not self.path:
# matching behavior of _NamespaceLoader in _bootstrap_external
return ''
with open(self.path, 'rb') as f:
return decode_source(f.read())
def get_data(self, path):
"""Return the data from path as raw bytes."""
with open(path, 'rb') as file:
return file.read()
def get_filename(self, fullname):
return self.path
def install():
sys.path_hooks.append(PyIceImporter)
def uninstall():
sys.path_hooks.remove(PyIceImporter)
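# Typical usage (illustrative; the module name is hypothetical): install the
# path hook, then add a sys.path entry of the form described in
# PyIceImporter.__init__:
#   install()
#   sys.path.append('path/to/compiled.icepack//relative/loc')
#   import some_packed_module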
if __name__ == '__main__':
import argparse
parser = argparse.ArgumentParser(
description='''
PyICE - Produces an icepack, a memory-mappable file of pre-compiled modules.
''')
parser.add_argument('--exclude', type=str, nargs='+',
dest='exclude',
                    help='A regular expression; matching files are '
                         'excluded from the icepack.')
parser.add_argument('--optimize',
default=-1,
type=int,
action='store',
dest='optimize',
help='The optimization level (-1, 0, 1 or 2).')
parser.add_argument('--verbose',
default=False,
action='store_true',
dest='verbose',
help='Enable verbose output.')
parser.add_argument('modules',
nargs='*',
help='Directories or files to be included in the IcePack.')
parser.add_argument('--output', type=str,
default='out' + EXTENSION,
help='The destination filename')
main()