This repository was archived by the owner on Nov 3, 2020. It is now read-only.
Tags: sekwiatkowski/komputation
v0.12.5:
- Switched CUDA C development to CLion
- Used the __JETBRAINS_IDE__ macro to declare CUDA's language extensions (see the sketch after this list)
- Header include paths are now relative to the given source file
- For real-time compilation with nvrtc, all include directives in the source code are replaced with a sequence of directives that use paths relative to the CUDA resource base directory
- Header files are now inferred from the source code and no longer have to be specified in kernel instructions
- Fixed comparisons in the binary testing kernel
- Replaced double constants with floats
- Removed the unused numberEntries parameter from the kernel that replaces NaNs
- Removed an unused parameter from functions used for backpropagation kernels of recurrent layers
- Resolved a name conflict in the max-pooling kernel
- Simplified the definition of the stack of convolutional layers in the embedding toy demo with two filter widths
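The __JETBRAINS_IDE__ macro is defined by JetBrains IDEs but not by nvcc or nvrtc, so CUDA-specific keywords and built-ins can be declared behind it purely for the benefit of CLion's code analysis. A minimal sketch of this kind of guard, with illustrative stand-in declarations that may differ from the ones used in the repository:

```cuda
// Only CLion sees this block; nvcc/nvrtc do not define __JETBRAINS_IDE__ and keep
// their native definitions of the CUDA language extensions.
#ifdef __JETBRAINS_IDE__
    // Let the CUDA keywords parse as ordinary C++.
    #define __global__
    #define __device__
    #define __host__
    #define __shared__
    #define __constant__
    // Minimal stand-ins for the built-in thread coordinates (illustrative only).
    struct uint3 { unsigned int x, y, z; };
    extern uint3 threadIdx, blockIdx, blockDim, gridDim;
    extern int warpSize;
    inline void __syncthreads() {}
#endif
```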
v0.12.3:
- Finished implementing experimental support for (fixed-length, left-to-right, vanilla) GPU-accelerated recurrent neural networks
- Fixed the allocation of memory for the propagation result in CudaSquaredLoss
- Added a helper function to access and print arrays on the device
- Implemented a SumKernel to add up accumulated gradients for parameters that are used in each instance
- Added CUDA helper functions to cooperatively copy an array and to add up two arrays (see the sketch after this list)
- Moved the entrywise CUDA activation functions to header files
- Removed unused array fill kernels
- Added a pointer to the maximum number of input columns in BaseCudaContinuation
- The shared parameter is passed directly to the CPU-specific ParameterizedSeries instruction, which makes it possible to use the same entries for the CPU and CUDA.
- Removed the CUDA IDs from the ResultExtraction enumeration
- Set the device activation function IDs to be constant
- Added a CUDA version of the increment demo
- Mentioned the demo in the README
- Replaced kotlin-stdlib-jre8 with kotlin-stdlib-jdk8
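Block-cooperative copy and addition helpers typically let every thread in a block handle a strided slice of the entries. A hypothetical sketch of such device helpers; the names and signatures are assumptions, not necessarily the repository's:

```cuda
// Illustrative block-cooperative helpers: each thread processes a strided subset of the entries.
__device__ void cooperativelyCopy(float* destination, const float* source, int numberEntries) {
    for (int index = threadIdx.x; index < numberEntries; index += blockDim.x) {
        destination[index] = source[index];
    }
}

__device__ void cooperativelyAdd(float* accumulator, const float* addend, int numberEntries) {
    for (int index = threadIdx.x; index < numberEntries; index += blockDim.x) {
        accumulator[index] += addend[index];
    }
}
```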
v0.12.1:
- The summation of gradients based on the parameter index in CudaLookup is now deterministic.
- Removed the hash table kernel
- Replaced the use of the hash table with a pointer to the parameter indices
- Rewrote the group sum kernel based on information about the indices of the first occurrence of a parameter and its remaining occurrences (see the sketch after this list)
- Added a kernel to add up two arrays
- Fixed backward propagation in CudaStack by replacing the cuBLAS axpy operation with the use of the addition kernel
- The input memory can now store information about duplicate occurrences.
- Improved the names of the setters in InputMemory
- The optimizer kernels now check if the count is strictly positive.
- Moved reusable batch size and output entries members to BaseCudaEntryPoint
- Increased the batch size to 16 and changed hyperparameters in the TREC demos with two filter widths
- Mentioned the CUDA TREC demo with two filters in the README
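Summing each parameter's gradient from its first occurrence and then its remaining occurrences, in a fixed order, avoids atomics and therefore gives a deterministic result. A hypothetical sketch under assumed data layouts (one gradient column per occurrence, occurrence indices padded with -1); the kernel name and parameters are illustrative, not komputation's:

```cuda
// One block per distinct parameter; threads stride over the entries of its gradient column.
// firstOccurrences[p] : instance index of the first occurrence of parameter p
// otherOccurrences    : per parameter, up to maximumOccurrences - 1 further instance indices,
//                       padded with -1
__global__ void groupSumKernel(
    const int* firstOccurrences,
    const int* otherOccurrences,
    int maximumOccurrences,
    const float* gradient,   // one column of size "dimension" per instance
    float* groupSum,         // one column of size "dimension" per parameter
    int dimension) {
    int parameter = blockIdx.x;
    int firstInstance = firstOccurrences[parameter];

    for (int entry = threadIdx.x; entry < dimension; entry += blockDim.x) {
        // Start with the gradient at the first occurrence, then add the remaining
        // occurrences in a fixed order, which keeps the floating-point sum deterministic.
        float sum = gradient[firstInstance * dimension + entry];

        for (int occurrence = 0; occurrence < maximumOccurrences - 1; occurrence++) {
            int instance = otherOccurrences[parameter * (maximumOccurrences - 1) + occurrence];
            if (instance < 0) break;
            sum += gradient[instance * dimension + entry];
        }

        groupSum[parameter * dimension + entry] = sum;
    }
}
```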
v0.12.0:
- Simplified the specification of networks
- The input dimensions along the continuations of the network are computed automatically.
- Removed the Layer suffix from instruction factory functions
- Overloaded the instruction factory function to simplify the specification of initialization strategies
- Renamed Direction.Forward/Backward to Direction.LeftToRight/RightToLeft
- Shortened "ActivationFunction" to "Activation" and "ActivationLayer" to "Activation"
- Generalized BaseCudaEntrywiseActivationLayer to BaseCudaEntrywiseLayer
- The specification of the minimum length is required in the lookup instruction and optional in the input instruction.
- TREC categories are indexed based on all available training data.
- Renamed the "forward" layer to "continuation" and shortened "combination layer" to "combination"
- Moved the architecture-specific interfaces from the general package to the respective architecture-specific packages
- Improved the names used in SparseAccumulator and SparseUpdate
- The series is passed on to the method of the ResultExtractionStrategy interface.
- Introduced CpuCombinationSeries to implement the addition of the weighted previous state and the weighted current input
- Added the Cpu prefix to Series and ParameterizedSeries in preparation for the CUDA implementation of recurrent neural networks
- Optimized the performance of the RNN implementation by adding the bias to the input rather than adding it at each step
- Fixed the specification of the number of rows in CpuLogisticLoss
- Renamed the "Negation" demo to "Not"
- Stopped experimenting with dynamic parallelism
- CudaIdentity now implements CudaActivation.
- Introduced a base class for higher-order layers
- Differentiated the CUDA continuation base class into one class for layers that change the number of columns and one class for layers that don't
- Reused the code for the computation of launch configurations in CudaHashing and CudaGroupSum
- Fixed the sparse update in CudaLookup
- Added a "copy" helper function that encapsulates System.arraycopy
- Added a setter to InputMemory that caches all possible data
- Clarified references to the hash table in CUDA optimizers
- CUDA layers pass a pointer to the length of the input data and the maximum length within the batch.
- Unified the activation instruction factory functions across the two architectures
- Moved the concatenation layer to a separate package
- Added an instruction for weightings with shared parameters that is separate from the instruction for the weighting layer that uses a dedicated parameter
- The two weighting instructions inherit from the new BaseWeighting class.
- Added instructions for the three series types: Series, ParameterizedSeries and CombinationSeries
- Refactored the CPU RNN factory function based on the instructions
- Continuation instructions implement HasOutputDimensions and CanSetInputDimensions, while entry point instructions only implement HasOutputDimensions.
- Inlined some CUDA C helper functions
- Moved the division by 2 in the squared loss function from the host to the device (see the sketch after this list)
- Added the missing scaling of gradients in some of the optimization kernels
- Refactored the for loops used to update entries in optimization kernels
- Temporarily removed the CUDA forward layer tests
- Updated the links in the README
- Upgraded to Kotlin 1.2.10
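The squared loss is half the sum of squared differences, loss = 0.5 · Σ (prediction − target)²; performing the division by 2 on the device means each entrywise term can be halved as it is accumulated instead of halving the final sum on the host. A minimal sketch assuming a single atomically accumulated loss value; the kernel name and reduction scheme are illustrative, not necessarily the repository's:

```cuda
// One thread per entry; "result" points to a single float initialized to 0 before launch.
__global__ void squaredLossKernel(
    const float* predictions,
    const float* targets,
    float* result,
    int numberEntries) {
    int index = blockIdx.x * blockDim.x + threadIdx.x;

    if (index < numberEntries) {
        float difference = predictions[index] - targets[index];
        // The division by 2 happens here on the device, per entrywise term.
        atomicAdd(result, 0.5f * difference * difference);
    }
}
```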
v0.11.3:
- Added an instruction for bidirectional recurrent layers
- Rearranged the parameters in the factory functions of the recurrent layer and the dropout layer instruction
- Overloaded the dropout layer instruction factory function for the case of vectorial input
- Mentioned the bidirectional recurrent layer and the new running total demos in the README
- Updated the TREC sample code in the README
v0.11.2:
- The recurrent layer can now emit either all steps or the last step.
- Added demos that compute the total of fixed-length and variable-length input
- Mentioned the new recurrent layer implementation in the README
- Included links to the demos in the README
v0.11.1:
- Implemented testing support for multi-class and binary classification problems
- Constructors of optimization instructions are now internal.
- Removed AttentiveDecoder and the reverse demo based on that decoder
- Removed its specific dependencies: column repetition, row summation and transposition