-
Notifications
You must be signed in to change notification settings - Fork 0
Home
This repository contains some codes (interfaces and implementations) I like in the year 2015. They are relatively mature that satisfying my existing needs. Therefore, they should be simple, practical and efficient in some senses.
Prevailing implementations of atomic_shared_ptr<T>
use spin-lock and mutex for exclusive access. This implementation (like::atomic_shared_ptr<T>
) uses shared-lock for reader access (At least no LOCK-prefix instruction is used to acquire a lock and 1 LOCK-prefix instruction is used to increment reference count) and lock-synchronization for writer access (1 LOCK-prefix instruction is used to decrement reference count). With less locked time and fewer LOCK-prefix instructions (please check what is less and what are fewer). It could be better than most implementations today. (BTW, IA32 CPU and WB memory)
Existing implementations (BOOST and Microsoft) of shared_ptr<T>
use 2 volatile variables (1 for shared_ptr and 1 for weak_ptr) for reference counting. This implementation (like::shared_ptr<T>
) combines the 2 variables into 1, resulting in less LOCK-prefix instructions executed (theoretically and hopefully; please check what are less). The trade-off is to use 16-bit (for example) instead of 32-bit for each reference counting.
It is a Sudoku solver using pre-emptive set technique in single function call only. It solves most human playable Sudoku puzzles in about 0.85 million CPU cycles with single thread and Intel(R) Core(TM) i5-3330 CPU @ 3.00GHz, though initial implementation spends about 10 million CPU cycles. Three optimizations were done with ~12x performance gain. The first optimization used SSE instructions to process 8 combinations in one iteration instead of 8 iterations. The second optimization eliminates some batches of 8-iteration and 64-iteration. The third optimization replaced some cache accesses with register accesses.
By the way, it can be customized for constraints in irregular shape (non-3x3 block).
like::xmm_ptr
realizes the concept of utilizing XMM registers as an alternate of general registers, though only pointer is envisioned. Its interface is similar to that of std::tuple. For example,
typedef like::xmm_ptr<unsigned char const, char> ptr_type;
ptr_type xmmPtr(pInput, pOutput);
unsigned char const *pInputX = get_ptr<0>(xmmPtr); // get first pointer
char *pOutputX = get_ptr<1>(xmmPtr); // get second pointer
xmmPtr += ptr_type(16, 16) * 2; // advance pointers
like::member
provides a simple interface to access descendant member (field and result of method invocation) of a C++ object (or C++ structure), though array member is not envisioned.
struct coord_t
{
int x;
int y;
};
struct line_t
{
coord_t start;
coord_t end;
};
line_t astLine[10];
// ...
coord_t astPos[10];
int anStartX[10];
std::transform(std::cbegin(astLine), std::cend(astLine), astPos, like::member(&line_t::end));
std::transform(std::cbegin(astLine), std::cend(astLine), anStartX, like::member(&line_t::start, &coord_t::x));
like::order_by
provides a simple interface to specifying ordering of elements.
std::sort(std::begin(astPos), std::end(astPos), like::order_by(&coord_t::y));
std::sort(std::begin(astPos), std::end(astPos), like::order_by(&coord_t::y, std::greater<int>()));
std::sort(std::begin(astLine), std::end(astLine), like::order_by(like::member(&line_t::start, &coord_t::x)));