This project is on https://gitlab.com/bstarynk/refpersys/ and has its own web site on http://refpersys.org/ where more details are given.
The Reflective Persistent System language is a research project, taking many good ideas from Bismon, sharing a lot of goals (except static source code analysis) with it but avoiding bad ideas from it.
For Linux/x86-64 only. Don't even think of running it on non-Linux systems, unless you provide patches for that. A 64-bit processor is required.
We have multi-threading in mind, but in some limited way. We think of a pool of a few dozen Pthreads at most (but not of a thousand Pthreads).
We absolutely want to avoid any GIL (global interpreter lock).
Don't expect anything useful from RefPerSys before at least 2023. But you could have fun sharing our ideas and experimenting yours.
A rewrite of RefPerSys in C happens on refpersys-in-c.
We previously considered using the garbage collector from Ravenbrook MPS. Since that project is now obsolete, we gave up that idea.
Don't expect RefPerSys to be a realistic project. It is not (and certainly not before 2025).
Some draft design ideas are written in the RefPerSys design draft which is very incomplete work in progress.
If you happen to know about any research call for proposals or funding opportunities in Europe (Euro zone) about this (e.g. related to artificial general intelligence goals) please mention them to Basile Starynkevitch (France) by email to [email protected].
Like Bismon, RefPerSys manages an evolving, persistable heap of dynamically typed, garbage-collected values (see §2 Data and its persistence in Bismon of the Bismon draft report...). The semantics (but not the syntax) of values is on purpose close to those of Lisp, Python, Scheme, JavaScript, Go, or even Java. Most RefPerSys values are immutable: for example boxed strings, sets (with dichotomic search inside them) or tuples of references to objects, closures, etc. But some RefPerSys values are mutable objects, and by convention every mutable value is called an object. Each mutable object has its own lock, and any access or update of mutable data inside an object is generally made under its lock. As an exception, a very few, very often accessed, mutable fields inside objects (e.g. their class) are atomic pointers, for performance reasons.

Objects have (exactly like in Bismon) attributes, components, and some optional payload. An attribute is an association between an object (called the key of that attribute) and some arbitrary non-nil RefPerSys value (called the value of that attribute), and each object has its mutable associative table of attributes. A component is an arbitrary RefPerSys value, and each object has some mutable vector of them. The payload is any additional mutable data (e.g. a string buffer, a mutable vector or hashtable of values, some class metadata, etc.), owned by the object. So the data model of a RefPerSys object is as flexible as the data model of JavaScript. However, RefPerSys objects have a mutable class defining their behavior (not their fields, which are represented as attributes), which is used for dynamic message dispatching.
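The object data model described above could be pictured roughly as follows. This is only an illustrative sketch, not the actual RefPerSys runtime: every name (`DemoObject`, `DemoValue`, `put_attribute`, ...) is hypothetical, `std::shared_ptr` stands in for a garbage-collected reference, and the payload is reduced to an optional string buffer.

```cpp
#include <atomic>
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>
#include <variant>
#include <vector>

// Hypothetical names throughout; this is not the RefPerSys API.
struct DemoObject;
using DemoObjectRef = std::shared_ptr<DemoObject>; // stand-in for a GC-ed reference

// A value is nil, a boxed integer, a boxed string, or a reference to an object.
using DemoValue = std::variant<std::monostate, long, std::string, DemoObjectRef>;

struct DemoObject {
  std::mutex lock;                             // one lock per mutable object
  std::atomic<DemoObject*> klass{nullptr};     // often read, hence atomic and not locked
  std::map<DemoObject*, DemoValue> attributes; // key object -> non-nil value
  std::vector<DemoValue> components;           // mutable vector of values
  std::unique_ptr<std::string> payload;        // optional payload (here a string buffer)

  // Returns false when the value is nil, since attributes hold non-nil values.
  bool put_attribute(DemoObject* key, DemoValue val) {
    assert(key != nullptr);
    if (std::holds_alternative<std::monostate>(val))
      return false;
    std::lock_guard<std::mutex> guard(lock);
    attributes[key] = std::move(val);
    return true;
  }

  // Returns nil (std::monostate) when the attribute is absent.
  DemoValue get_attribute(DemoObject* key) {
    assert(key != nullptr);
    std::lock_guard<std::mutex> guard(lock);
    auto it = attributes.find(key);
    return it == attributes.end() ? DemoValue{} : it->second;
  }

  void append_component(DemoValue val) {
    std::lock_guard<std::mutex> guard(lock);
    components.push_back(std::move(val));
  }
};
```

In this sketch the class pointer is read with `klass.load()` without taking the lock, while attributes and components are only touched under the object's lock.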
RefPerSys will have a small fixed set of worker threads (perhaps a dozen of them), each running some agenda loop; we would have some central data structure, called the agenda like in Bismon (see §1.7 of the Bismon draft report...), organizing runnable tasklets (e.g. in a few FIFO queues of them). A tasklet should conceptually run quickly (in a few milliseconds) and is allowed to add or remove runnable tasklets (including itself) to the agenda. Each worker thread is looping: fetching a runnable tasklet from the agenda, then running that tasklet.
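A minimal sketch of such an agenda loop is given below, under simplifying assumptions: a single FIFO queue, tasklets that can only add (not remove) other tasklets, and workers that stop once no tasklet is queued or running. The names `DemoAgenda`, `countdown_tasklet` and `demo_count_runs` are hypothetical and do not come from the RefPerSys sources.

```cpp
#include <atomic>
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical agenda: one FIFO queue of tasklets shared by a few workers.
class DemoAgenda {
public:
  using Tasklet = std::function<void(DemoAgenda&)>;

  // Add a runnable tasklet; may be called from inside a running tasklet.
  void add(Tasklet t) {
    {
      std::lock_guard<std::mutex> guard(mtx_);
      queue_.push_back(std::move(t));
      ++pending_;
    }
    cv_.notify_one();
  }

  // Start nthreads workers and wait until the agenda is drained.
  void run(unsigned nthreads) {
    assert(nthreads > 0);
    std::vector<std::thread> workers;
    for (unsigned i = 0; i < nthreads; ++i)
      workers.emplace_back([this] { worker_loop(); });
    for (auto& w : workers)
      w.join();
  }

private:
  // Each worker loops: fetch a runnable tasklet, then run it.
  void worker_loop() {
    for (;;) {
      Tasklet t;
      {
        std::unique_lock<std::mutex> lk(mtx_);
        cv_.wait(lk, [this] { return !queue_.empty() || pending_ == 0; });
        if (queue_.empty())
          return; // nothing queued and nothing running: stop this worker
        t = std::move(queue_.front());
        queue_.pop_front();
      }
      t(*this); // run outside the lock; the tasklet may call add()
      {
        std::lock_guard<std::mutex> guard(mtx_);
        if (--pending_ == 0)
          cv_.notify_all(); // wake idle workers so they can stop
      }
    }
  }

  std::mutex mtx_;
  std::condition_variable cv_;
  std::deque<Tasklet> queue_;
  std::size_t pending_ = 0; // tasklets queued or currently running
};

// A tasklet that counts its run, then re-adds a successor until done.
void countdown_tasklet(DemoAgenda& agenda, std::atomic<int>& runs, int remaining) {
  ++runs;
  if (remaining > 1)
    agenda.add([&runs, remaining](DemoAgenda& a) {
      countdown_tasklet(a, runs, remaining - 1);
    });
}

// Runs a chain of `repeats` tasklets on `nthreads` workers; returns the run count.
int demo_count_runs(int repeats, unsigned nthreads) {
  std::atomic<int> runs{0};
  DemoAgenda agenda;
  if (repeats > 0)
    agenda.add([&runs, repeats](DemoAgenda& a) {
      countdown_tasklet(a, runs, repeats);
    });
  agenda.run(nthreads);
  return runs.load();
}
```

Calling `demo_count_runs(5, 3)` from a `main` function runs a chain of five short tasklets on three worker threads. A real agenda would probably keep several priority queues and keep its workers alive while waiting for new tasklets.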
This research project is GPLv3+ licensed and copyrighted by the RefPerSys team, currently made of:
- Basile Starynkevitch <[email protected]>, homepage http://starynkevitch.net/Basile/ near Paris, France. So usual timezone `TZ=MEST`
- Abhishek Chakravarti <[email protected]>
- Nimesh Neema <[email protected]>
Some files might be "borrowed" from other similar GPLv3+ licensed projects (notably from Bismon...) and could retain their original copyright owner.
Please ask the above RefPerSys team, by email, for C++ coding conventions before starting non-trivial contributions to the C++ runtime of RefPerSys. If you are contributing to its C++ runtime, please run `make clean` after any `git pull`.
The GPLv3+ license of RefPerSys is unlikely to change before 2025 (and probably even after).
The RefPerSys runtime is implemented in C++17, with hand-written C++ code in `*_rps.cc`, and has a single C++ header file `refpersys.hh`.
We don't claim to be C++ gurus. Most C++ experts could write more idiomatic C++ code than we do and will find our C++ code pitiful. We just want our runtime to work, not to serve as an example of well written C++17 code.
The preferred C++ compiler (in 2020Q1) for RefPerSys is GCC version 8 or 9.
It could be worthwhile to sometimes compile RefPerSys with clang++ (see http://clang.llvm.org/ for more). In practice, run `make clean` then `make RPS_BUILD_CXX=clang++`. The Clang static analyzer could be useful, but expect a lot of warnings, since C++ does not have flexible array members but we need something similar.
RefPerSys may later also use generated C++ code in some `_*.cc` file, some generated C code in some `_*.c` and generated C or C++ headers in some `_*.h` files. By convention, files starting with an underscore are generated (but they may, or may not, be git versioned). Some generated C++ files which are `git add`-ed are under the `generated/` subdirectory.
We could later need some C++ generating program (maybe similar in spirit to Bismon's BM_makeconst.cc). It would then be named `rps_*` for the executable, and fit in a single self-sufficient `rps_*.cc` C++ file. Perhaps we'll later have some `rps_makeconst` executable to generate some C++, with its source in some `rps_makeconst.cc`. So the convention is that any future C++ generating source code is in some `rps_*.cc` C++ file. In commit 65a8f84aeffc9ba4e468 or newer the dumping facility is scanning hand-written C++ source files to emit `generated/rps-constants.hh`.
The build automation tool used here is GNU make since commit 6d56f50660c7cc41b9 (it was omake before).
You should have compiled and installed Ian Taylor's libbacktrace, e.g. under `/usr/local/`. You may need to add `/usr/local/lib/` in your `/etc/ld.so.conf` and run `ldconfig -v -a` after installation of that libbacktrace.
The JsonCPP library is also needed, and also a `mail` command in your `$PATH`.
To install the dependencies on a recent Debian 10 buster or Ubuntu 20 or 21 system, you could run the following steps:
sudo add-apt-repository -y ppa:ubuntu-toolchain-r/test  # for Ubuntu 20.04
sudo apt install -y gcc-11 g++-11 clang-11 libc++-11-dev libc++abi-11-dev  # for Ubuntu 20.04
sudo apt install libunistring-dev
sudo apt install libjsoncpp-dev
sudo apt-get install libssl-dev
sudo apt install ccache g++ make build-essential remake gdb automake
sudo apt install ttf-unifont ttf-mscorefonts-installer unifont msttcorefonts fonts-ubuntu fonts-tuffy fonts-spleen fonts-roboto fonts-recommended fonts-yanone-kaffeesatz fonts-play fonts-eurofurence fonts-ecolier-court fonts-dejavu fonts-croscore fonts-cegui fonts-inter fonts-inconsolata
git clone https://github.com/ianlancetaylor/libbacktrace.git
cd libbacktrace
./configure
make
make install
RefPerSys is using (e.g. in its commit 843a6f0ddf1c22...) the FLTK graphical user interface toolkit (e.g. FLTK version 1.3.8 or newer...). That toolkit should be compiled with both debug information and optimization, by configuring it with `./configure --enable-debug --with-optim="-O2"`.
You need a recent C++17 compiler such as `g++` (we use GCC 10 or GCC 11) or `clang++` (Clang 11). Look into, and perhaps improve, our `Makefile`. Build using `make -j 3` or more.
You should also do a `make clean` after any `git pull`.
You may want to edit your `$HOME/.refpersys.mk` file to contain definitions of GNU make variables for your particular C and C++ compiler, e.g.
# file ~/.refpersys.mk
RPS_BUILD_CC= gcc-11
RPS_BUILD_CXX= g++-11
You then build with `make -j4 refpersys && make all`.
RefPerSys is a multi-threaded and garbage-collected system. We are fully aware that multi-thread friendly and efficient garbage collection is a very difficult topic.
The reader unaware of garbage collection terminology (precise vs. conservative GC, tracing garbage collection, copying GC, GC roots, GC locals, mark and sweep GC, incremental GC, write barrier) is advised to read the GC handbook and is expected to have read very carefully the Tracing Garbage Collection wikipage.
We considered using Ravenbrook MPS. Unfortunately for us, that very good GC implementation seems unmaintained and, with almost a hundred thousand lines of code, is very difficult to grasp, understand, and adopt. In the end, using MPS is not reasonable in our eyes.
We also considered using Boehm GC. That conservative GC is really simple to use (basically, use `GC_MALLOC` instead of `malloc`, etc.) and is C++ friendly. However, it is rather slow (even for allocations of GC-ed zones, and we would have many of them) and might be quite unsuitable for programs having lots of circular references, and reflexive programs have lots of them.
So we probably are heading towards developing our own precise and multi-thread friendly GC (hopefully "better" than Boehm, but worse than MPS), with the following ideas:
- Local roots in the local frame are explicit, like in Bismon (`LOCALFRAME_BM` macro of bismon/cmacros_BM.h) or OCaml (see its §20.5 Living in harmony with the garbage collector and `CAMLlocal*` and `CAMLparam*` and `CAMLreturn*` macros). The local call frame is conventionally reified as the `_` local variable, so an automatic variable GC-ed pointer `foo` is coded `_.foo` in our C++ runtime. A local frame in RefPerSys should be declared in C++ using `RPS_LOCALFRAME`.
- Our garbage collector manages memory zones inside a set of `mmap`-ed memory blocks: either small blocks of a megaword, that is 8 megabytes (i.e. `RPS_SMALL_BLOCK_SIZE`), or large blocks of 8 megawords (i.e. `RPS_LARGE_BLOCK_SIZE`). Values are inside such memory zones. Mutable objects may contain (perhaps indirectly) pointers to quasivalues (notably in their payload), that is to garbage collected zones which are not first-class values. A typical example of quasivalue could be some bucket in some (fully RefPerSys-implemented) array hash table (appearing as the payload of some object), in which buckets would be some small and mutable dynamic arrays of entries with colliding hashes. Such buckets are indeed garbage collected zones, but are not themselves values (since they are mutable, but not reified as objects).
- The GC allocation operations are explicitly given the pointer to the local frame (i.e. `&_`, named `RPS_CURFRAME`), which is linked to the previous call frame and so on. That pointer is passed to every routine needing the GC (i.e. allocating or mutating values); only functions which don't allocate or mutate (e.g. accessor or getter functions) can avoid getting that local frame pointer.
- The C++ runtime, and any code generated in RefPerSys, should explicitly be in A-normal form. So coding `z = f(g(x),y)` is forbidden in C++ (where `f` and `g` are C++ functions using the GC). Instead, reserve a local slot such as `_.tmp1` in the local frame, then code `_.tmp1 = g(RPS_CURFRAME, _.x); _.z = f(RPS_CURFRAME, _.tmp1, _.y);`. In less pedantic terms, we should do only one call (to GC-aware functions) or one allocation per statement; and every such call to some allocation primitive, or to a GC-aware function, should pass the `RPS_CURFRAME` and use `RPS_LOCALFRAME` in the calling function.
- A write barrier should be called after object or quasivalue updates, and before any other allocation or update of some other object, value, or quasivalue. In practice, code `_.foo.rps_write_barrier(RPS_CURFRAME)` or more simply `_.foo.RPS_WRITE_BARRIER()`.
- Every garbage-collection aware thread (a thread allocating GC-ed values, mutating GC-ed quasivalues or objects, running the GC forcibly) should call quite often, typically once per few milliseconds, the `Rps_GarbageCollector::maybe_garbcoll` routine. If this is not possible (e.g. before a potentially blocking `read` or `poll` system call), special precautions should be taken. Forgetting to call that `maybe_garbcoll` function often enough (typically every few milliseconds) could crash the system.
- Consequently, as a rule of thumb, any routine which can directly or indirectly allocate GC-ed values or quasivalues, or directly or indirectly mutate GC-ed values or quasivalues, should take a calling callframe argument. We might need to consider putting that specific `callframe` argument in some global register, using the GCC `register` ... `asm` extension to define global register variables and compiling with the `-ffixed-`reg code generation option. By coding convention, that calling callframe argument should preferably be named `callingfra`, and should be the first argument of every function or method (member function in C++ classes) requiring the GC.
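The conventions listed above (explicit local frames, a calling frame passed as first GC-related argument, and A-normal form) can be illustrated with a toy collector. This is a sketch under strong assumptions, not the RefPerSys implementation: it does not use the real `RPS_LOCALFRAME` or `RPS_CURFRAME` macros, all names (`DemoZone`, `DemoCallFrame`, `DemoHeap`, `demo_f`, `demo_g`, `demo_compute`) are hypothetical, zones contain no pointers to other zones, and a naive stop-the-world mark and sweep replaces the planned multi-thread friendly GC.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical GC-ed zone; it holds no pointers, so marking roots is enough.
struct DemoZone { long payload; bool marked; };

// Explicit call frame: addresses of the local GC-ed pointers, plus caller link.
struct DemoCallFrame {
  DemoCallFrame* previous;
  std::vector<DemoZone**> slots;
};

class DemoHeap {
public:
  ~DemoHeap() { for (DemoZone* z : zones_) delete z; }

  DemoZone* allocate(DemoCallFrame* callingfra, long payload) {
    (void)callingfra; // a real allocator could decide to collect from here
    zones_.push_back(new DemoZone{payload, false});
    return zones_.back();
  }

  // Naive mark and sweep from the chain of frames; returns the freed count.
  std::size_t collect(DemoCallFrame* callingfra) {
    for (DemoZone* z : zones_) z->marked = false;
    for (DemoCallFrame* f = callingfra; f != nullptr; f = f->previous)
      for (DemoZone** slot : f->slots)
        if (*slot) (*slot)->marked = true;
    std::vector<DemoZone*> kept;
    std::size_t freed = 0;
    for (DemoZone* z : zones_) {
      if (z->marked) kept.push_back(z);
      else { delete z; ++freed; }
    }
    zones_.swap(kept);
    return freed;
  }

  std::size_t live() const { return zones_.size(); }

private:
  std::vector<DemoZone*> zones_;
};

// GC-aware functions take the calling frame and keep their locals in `_`.
DemoZone* demo_g(DemoHeap& heap, DemoCallFrame* callingfra, DemoZone* x) {
  struct { DemoZone* x; DemoZone* res; } _ = {x, nullptr};
  DemoCallFrame frame{callingfra, {&_.x, &_.res}};
  _.res = heap.allocate(&frame, _.x->payload + 1);
  return _.res;
}

DemoZone* demo_f(DemoHeap& heap, DemoCallFrame* callingfra, DemoZone* a, DemoZone* b) {
  struct { DemoZone* a; DemoZone* b; DemoZone* res; } _ = {a, b, nullptr};
  DemoCallFrame frame{callingfra, {&_.a, &_.b, &_.res}};
  _.res = heap.allocate(&frame, _.a->payload + _.b->payload);
  return _.res;
}

// A-normal form: z = f(g(x), y) is split so each statement does one GC-aware call.
long demo_compute(DemoHeap& heap, DemoCallFrame* callingfra, long xval, long yval) {
  struct { DemoZone* x; DemoZone* y; DemoZone* tmp1; DemoZone* z; } _ = {};
  DemoCallFrame frame{callingfra, {&_.x, &_.y, &_.tmp1, &_.z}};
  _.x = heap.allocate(&frame, xval);
  _.y = heap.allocate(&frame, yval);
  _.tmp1 = demo_g(heap, &frame, _.x);
  std::size_t freed = heap.collect(&frame); // x, y and tmp1 are rooted in the frame
  assert(freed == 0);
  (void)freed;
  _.z = demo_f(heap, &frame, _.tmp1, _.y);
  return _.z->payload;
}
```

Because the intermediate result of `demo_g` is stored in the frame slot `_.tmp1` before the next GC-aware call, a collection happening in between cannot free it. Written as a nested call, the intermediate pointer would live only in a compiler temporary that the collector cannot see. Write barriers and the periodic `maybe_garbcoll` call are left out of this sketch.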
For Bismon, see http://github.com/bstarynk/bismon and read its draft Bismon report (updated quite often).
For the C++17 language, see this C++ reference.
For Linux programming, see Advanced Linux Programming and the syscalls(2) man page.
For GCC, see notably its Invoking GCC chapter.
For garbage collection, read Paul Wilson's old paper Uniprocessor Garbage Collection Techniques, then read the GC handbook.
We already need the following libraries:
- libunistring for UTF-8 support, since UTF-8 is everywhere
- libbacktrace for backtraces
We may want to use, either soon or within a few years, (usually after 2022) interesting C or C++ libraries such as:
- libonion or Wt should be very soon (even in 2019) useful for the web interface
- libevent or libev for some event loop (quite soon).
- TensorFlow for machine learning purposes
- Gudhi for topological data analysis
- libcurl for HTTP client
- GMPlib for Arbitrary Precision Arithmetic or Bignums.
- 0mq for distributed messaging, in relation with distributed computing and message passing approaches.
- JsonCPP could be useful for JSON.
- POCO is a useful C++ generic framework library, and Qt might also be useful, even without its GUI aspect.
We should list other libraries interesting for us here, just in case (to avoid forgetting them).
Thanks to Niklas Rosencrantz (Sweden) for past minor contributions.
We are adding HTTP service in RefPerSys. So libonion is required. For many months, we just hope to use http://localhost:9090/ in a recent (e.g. Firefox 80) web browser.
We really need to be able to show a demo of RefPerSys on a laptop without Internet connection. So all required resources should be copied here, under `webroot/`. Be careful about copyright and licensing issues.
The `webroot/` subdirectory holds resources useful for HTTP requests. In particular the following subdirectories:
- `webroot/css/` for hand-written style sheets.
- `webroot/img/` for additional images. Prefer SVG or PNG formats.
- `webroot/js/` for JavaScript code.
- apt install make
- apt install pkg-config
- apt install libcurl4-openssl-dev
- apt install zlib1g-dev
- apt install libreadline-dev
- apt install libjsoncpp-dev
- apt install qt5-default
- apt install cmake
- apt install build-essential