[RFC] src/aiori-CEPHFS: New libcephfs backend by markhpc · Pull Request #217 · hpc/ior

[RFC] src/aiori-CEPHFS: New libcephfs backend #217


Merged
merged 1 commit into hpc:master from markhpc:wip-aiori-cephfs on Mar 10, 2020

Conversation

@markhpc (Contributor) commented Mar 10, 2020

This is a new aiori backend using libcephfs, loosely based on the existing POSIX and RADOS backends. It also borrows the "prefix" concept from the DFS backend, pointing at an existing POSIX mount point (necessary for ior/mdtest to function properly even when using a library for direct filesystem access). A slight change to libcephfs.h is needed for IOR to compile properly (this does not appear to be necessary for C++ clients using libcephfs, however):

#include <sys/time.h>
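As for the borrowed prefix concept: ior and mdtest still generate paths under an existing POSIX mount point, and the backend maps each of those paths onto the CephFS namespace before calling into libcephfs. A minimal sketch of that mapping, where the helper name and exact behavior are assumptions rather than code from this patch:

#include <string.h>

/* Hypothetical helper: turn an ior/mdtest path such as
 * "/mnt/cephfs/ior-test/file.0" (prefix "/mnt/cephfs") into the
 * path "ior-test/file.0" relative to the libcephfs mount root. */
static const char *strip_mount_prefix(const char *path, const char *prefix)
{
        size_t len = strlen(prefix);
        if (strncmp(path, prefix, len) == 0) {
                path += len;
                while (*path == '/')    /* skip the separator after the prefix */
                        path++;
        }
        return path;
}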

IO500 tests on a 10-node in-house test cluster with 2X replication and co-located clients appeared to function properly, with scores similar to (though much better for sequential reads than) the POSIX backend on kernel-based CephFS mount points. In the following results, the mdtest easy directories are round-robin pinned across MDS ranks prior to the test, though in the near future Ceph will do ephemeral pinning across MDSes automatically via a single top-level xattr.

[RESULT] BW   phase 1            ior_easy_write               30.328 GB/s : time 630.68 seconds
[RESULT] IOPS phase 1         mdtest_easy_write              240.573 kiops : time 374.11 seconds
[RESULT] BW   phase 2            ior_hard_write                7.225 GB/s : time 525.99 seconds
[RESULT] IOPS phase 2         mdtest_hard_write               23.795 kiops : time 516.84 seconds
[RESULT] IOPS phase 3                      find              574.220 kiops : time 178.15 seconds
[RESULT] BW   phase 3             ior_easy_read               79.416 GB/s : time 240.79 seconds
[RESULT] IOPS phase 4          mdtest_easy_stat             1057.850 kiops : time  85.08 seconds
[RESULT] BW   phase 4             ior_hard_read               24.591 GB/s : time 154.39 seconds
[RESULT] IOPS phase 5          mdtest_hard_stat              100.794 kiops : time 122.02 seconds
[RESULT] IOPS phase 6        mdtest_easy_delete              191.729 kiops : time 469.41 seconds
[RESULT] IOPS phase 7          mdtest_hard_read               56.874 kiops : time 216.24 seconds
[RESULT] IOPS phase 8        mdtest_hard_delete               14.824 kiops : time 831.99 seconds
[SCORE] Bandwidth 25.5761 GB/s : IOPS 124.21 kiops : TOTAL 56.3632

2020-03-06-RedHatLibCephFS-10-30.zip
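For reference, round-robin pinning of the mdtest-easy directories means assigning each per-rank directory to a fixed MDS rank via the ceph.dir.pin virtual xattr (the runs here may simply have used setfattr on a kernel mount; the helper below is a hypothetical libcephfs-based sketch):

#include <cephfs/libcephfs.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: pin one directory to a specific MDS rank by
 * setting the ceph.dir.pin vxattr (the rank is passed as a string).
 * The upcoming ephemeral pinning mentioned above would instead set
 * ceph.dir.pin.distributed once on the single top-level directory. */
static int pin_dir_to_mds(struct ceph_mount_info *cmount,
                          const char *dir, int mds_rank)
{
        char rank[16];
        snprintf(rank, sizeof(rank), "%d", mds_rank);
        return ceph_setxattr(cmount, dir, "ceph.dir.pin",
                             rank, strlen(rank), 0);
}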

Generally, lower scores in unaligned reads/writes and the build-up time for dynamic subtree partitioning in the ior and mdtest hard test cases held us back (we actually see higher scores with longer run times!). Given how the scores are calculated, these will be prime targets for future optimization.
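For context on why those cases weigh so heavily: the IO500 bandwidth score is the geometric mean of the four ior phases, the IOPS score is the geometric mean of the eight metadata phases (including find), and the total is the square root of their product, so a single slow phase such as mdtest_hard_delete (~14.8 kiops) drags the whole score down. With the numbers above:

\[
\mathrm{BW} = \Big(\prod_{i=1}^{4} \mathrm{bw}_i\Big)^{1/4} \approx 25.58\ \mathrm{GB/s},\qquad
\mathrm{IOPS} = \Big(\prod_{j=1}^{8} \mathrm{iops}_j\Big)^{1/8} \approx 124.2\ \mathrm{kiops},
\]
\[
\mathrm{TOTAL} = \sqrt{\mathrm{BW}\times\mathrm{IOPS}} \approx \sqrt{25.58\times 124.2} \approx 56.36 .
\]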

Signed-off-by: Mark Nelson <mnelson@redhat.com>

@JulianKunkel (Collaborator) left a comment

Great patch.
Delightful to see that you tested it with the IO500 benchmark.

.prefix = NULL,
};

static option_help options [] = {

@JulianKunkel (Collaborator):
great that you use the new options ^-^
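For readers unfamiliar with the options plugin: each backend can expose its own flags through a static option_help table that IOR's generic option parser walks. A rough sketch of what the CephFS table might look like, in the style of the RADOS/DFS backends (the specific option names and help strings here are assumptions, not copied from the patch):

#include "option.h"   /* option_help, OPTION_OPTIONAL_ARGUMENT, LAST_OPTION */

/* Hypothetical per-backend option state, initialised with defaults. */
static struct {
        char *user;    /* cephx user id */
        char *conf;    /* path to ceph.conf */
        char *prefix;  /* POSIX mount prefix to strip from paths */
} o = {
        .user   = NULL,
        .conf   = NULL,
        .prefix = NULL,
};

static option_help options [] = {
        {0, "cephfs.user",   "Username for the Ceph cluster",    OPTION_OPTIONAL_ARGUMENT, 's', &o.user},
        {0, "cephfs.conf",   "Config file for the Ceph cluster", OPTION_OPTIONAL_ARGUMENT, 's', &o.conf},
        {0, "cephfs.prefix", "Mount prefix stripped from paths", OPTION_OPTIONAL_ARGUMENT, 's', &o.prefix},
        LAST_OPTION
};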

"cannot total data moved");
if (tmpMin != tmpMax) {
if (rank == 0) {
WARN("inconsistent file size by different tasks");

@JulianKunkel (Collaborator):
Nice check, albeit it costs a little performance.
Since it is assumed to be a collective operation, it may lead to unexpected behavior (in terms of AIORI semantics) if not all processes invoke the same function. I'm not 100% sure we should generally allow that, but I'm also not too worried, given how IO500 runs...
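For context, the check under discussion follows the pattern used by IOR's other GetFileSize implementations: every rank stats the file, then two reductions compare the minimum and maximum sizes, which is why all processes must reach the call. A condensed sketch using the usual IOR variable names (aggFileSizeFromStat, testComm, rank), not the exact patch contents:

#include <mpi.h>
#include "ior.h"        /* IOR_offset_t, MPI_CHECK, WARN, testComm, rank
                           (IOR internals; exact headers may differ) */

/* Sketch: verify that every rank observed the same file size (collective). */
static IOR_offset_t check_file_size(IOR_offset_t aggFileSizeFromStat)
{
        IOR_offset_t tmpMin, tmpMax;

        /* Both reductions are collective over testComm: every rank must call this. */
        MPI_CHECK(MPI_Allreduce(&aggFileSizeFromStat, &tmpMin, 1,
                                MPI_LONG_LONG_INT, MPI_MIN, testComm),
                  "cannot total data moved");
        MPI_CHECK(MPI_Allreduce(&aggFileSizeFromStat, &tmpMax, 1,
                                MPI_LONG_LONG_INT, MPI_MAX, testComm),
                  "cannot total data moved");

        if (tmpMin != tmpMax) {
                if (rank == 0)
                        WARN("inconsistent file size by different tasks");
                /* conservatively report the smallest size any rank saw */
                aggFileSizeFromStat = tmpMin;
        }
        return aggFileSizeFromStat;
}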

@@ -197,6 +197,20 @@ AM_COND_IF([USE_RADOS_AIORI],[
AC_DEFINE([USE_RADOS_AIORI], [], [Build RADOS backend AIORI])
])

# CEPHFS support
AC_ARG_WITH([cephfs],

@JulianKunkel (Collaborator):
In the future, the code might benefit from automatic detection of the include/library files. It is reasonable to keep it as it is for now, though.
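A hedged sketch of what that detection could look like in configure.ac, using the standard Autoconf probes and the public libcephfs entry point ceph_mount (the USE_CEPHFS_AIORI symbol is assumed, following the RADOS example above; this is a suggestion, not what the patch does):

# CEPHFS support (sketch): probe for the header and library instead of
# relying only on paths supplied via --with-cephfs.
AC_ARG_WITH([cephfs],
        [AS_HELP_STRING([--with-cephfs], [support IO with libcephfs backend @<:@default=no@:>@])],
        [], [with_cephfs=no])
AS_IF([test "x$with_cephfs" != xno], [
        AC_CHECK_HEADERS([cephfs/libcephfs.h], [],
                [AC_MSG_ERROR([cephfs/libcephfs.h not found; install the libcephfs development package])])
        AC_CHECK_LIB([cephfs], [ceph_mount], [],
                [AC_MSG_ERROR([libcephfs not found or not usable])])
        AC_DEFINE([USE_CEPHFS_AIORI], [], [Build CEPHFS backend AIORI])
])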

@glennklockwood merged commit 657ff8a into hpc:master on Mar 10, 2020
@markhpc deleted the wip-aiori-cephfs branch on March 10, 2020 20:22