Most infrastructure operations folks have faced the situation where monitoring and analysis results from different tools are at odds with each other. Or perhaps your data analysis reveals completely unexpected patterns or conclusions.
When business executives, regulators, or customers are skeptical, not just about your data but about the methodology you used to collect and analyze it, you need to be very clear on how you arrived at the results you are standing over. This is often very hard to do when the monitoring and analysis tools are built on a set of do-it-yourself projects.
Examples of this abound in troubleshooting situations. Application logs report that feed handler performance is within its latency budget, while the networking team measures something different. High-resolution network utilization metrics reveal unexpected microbursts, or a breakdown of bandwidth consumption shows unused data feeds hogging capacity. A venue is adamant that their market data “couldn’t possibly take that long” to deliver, while your team suspects the venue is sending multiple quotes together, queued up one behind the other, rather than as they are generated.
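Microbursts like those mentioned above only show up when utilization is computed over very small windows; averaged over a second they vanish. The sketch below (an illustration, not a Corvil implementation; link speed and window size are assumptions) buckets packet sizes into 1 ms windows and flags any window that would saturate the link.

```python
LINK_BPS = 10_000_000_000                        # assumed 10 Gb/s link
WINDOW_NS = 1_000_000                            # 1 ms buckets
WINDOW_CAPACITY_BYTES = LINK_BPS // 8 // 1_000   # bytes deliverable per 1 ms


def microburst_windows(packets):
    """packets: iterable of (timestamp_ns, size_bytes) capture records.
    Return the start times (ns) of windows where traffic exceeded what
    the link can carry in that window, i.e. where queuing must occur."""
    buckets = {}
    for ts, size in packets:
        win = ts // WINDOW_NS
        buckets[win] = buckets.get(win, 0) + size
    return [w * WINDOW_NS for w, total in sorted(buckets.items())
            if total > WINDOW_CAPACITY_BYTES]


# 1.4 MB arriving inside one millisecond on a 10 Gb/s link is a microburst,
# even though the per-second average is far below line rate.
pkts = [(100, 700_000), (200, 700_000), (2_000_000, 1_000)]
print(microburst_windows(pkts))  # [0]
```

The same traffic averaged over one second would report roughly 1% utilization, which is why coarse-grained charts miss these events entirely.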
Such “discussions” will only become more frequent as more asset classes become algorithmically driven and automated.
The good news is that you can reduce risk and exposure by using independent vendors. When accountability is needed, as in the uncomfortable inquisitions outlined above, Corvil is a partner you want on your side. Our tried-and-tested solutions are something you can stand over, even when the going gets tough.
"If we provide a client with latency metrics from Corvil, showing that it has taken X milliseconds or microseconds for us to execute their order or reach the exchange, the client recognizes and trusts them, since it is an industry-standard product. Therefore, it's easier for us to demonstrate the performance benefits of our applications." EMEA Head of Electronic Trading App Management
That’s not to say there won’t be queries around our analysis that still need to be unpicked, but the point is that they can be unpicked. For example, a global broker was surprised by unexpected order-routing behavior revealed by a multi-hop sequence diagram. It showed a particular algorithm going to multiple markets sequentially with each child order and coming back with no executions. The surprise was seeing this pattern play out over and over again, so of course the analysis results were called into question.
Is the pattern just an artifact of how the diagram was laid out? The use of high-resolution timestamps, in this case from the aggregation layer, is key to addressing these kinds of questions. The number of decimal places displayed certainly caps the precision of a timestamp, but the converse does not hold: a millisecond-precision timestamp can be presented in nanosecond format simply by padding with six trailing zeros, so displayed resolution must be backed by genuine capture precision. In this case, a quick check of the timestamps answered that question.
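The padding point above can be made concrete. In this sketch (function names and sample values are illustrative), a millisecond-resolution timestamp rendered in nanosecond format betrays itself with six trailing zeros, and the greatest common divisor across a set of timestamps gives a quick estimate of the clock's real quantization:

```python
from functools import reduce
from math import gcd


def ns_format(epoch_ns: int) -> str:
    """Render an integer epoch-nanoseconds timestamp as seconds.nanoseconds."""
    return f"{epoch_ns // 1_000_000_000}.{epoch_ns % 1_000_000_000:09d}"


def apparent_resolution_ns(samples: list[int]) -> int:
    """Heuristic: millisecond-quantized timestamps all share a factor of
    1_000_000 ns, so their GCD exposes the underlying clock resolution."""
    return reduce(gcd, samples)


true_ns = 1_700_000_000_123_456_789          # hypothetical nanosecond capture
ms_only = (true_ns // 1_000_000) * 1_000_000  # same instant, ms-resolution clock

print(ns_format(true_ns))  # 1700000000.123456789
print(ns_format(ms_only))  # 1700000000.123000000  <- padding gives it away
```

A single padded value proves nothing, but a whole column of timestamps ending in the same run of zeros is strong evidence the source clock is coarser than its formatting suggests.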
Is there any ambiguity in where the measurements were taken? Identifying the exact point at which a timestamp is applied within a software stack is notoriously difficult. Simple application logging hides that variability, which in turn distorts the latency measurements. With network-based timestamps, however, there is no variability or ambiguity about where in the host software stack the measurement is made. Additionally, Corvil App Agent takes care of aligning application events with real (UTC) time.
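The essence of measuring latency from two well-defined capture points can be sketched as follows, assuming synchronized clocks at both points. The function name and message IDs are illustrative, not any Corvil API:

```python
def one_way_latencies(sent: dict, received: dict) -> dict:
    """Match messages by ID across two capture points and return the
    per-message one-way latency (receive timestamp minus send timestamp,
    in nanoseconds). Unmatched messages are simply skipped here; in
    practice they would be reported as losses or timeouts."""
    return {mid: received[mid] - sent[mid] for mid in sent if mid in received}


# Hypothetical capture timestamps (ns) at two taps on the wire.
sent     = {"ord-1": 1_000, "ord-2": 2_000}
received = {"ord-1": 1_450, "ord-2": 2_600}
print(one_way_latencies(sent, received))  # {'ord-1': 450, 'ord-2': 600}
```

The key property is that both timestamps are applied at unambiguous physical points, so the difference is meaningful; subtracting an application log time from a wire time would mix two measurement domains.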
Are all of the system clocks synced properly? While originally built for MiFID II and CAT compliance purposes, the information in clock sync reports lets internal stakeholders verify traceability and clock synchronization integrity, thereby showing that, yes, the graphic is an accurate sequence of events, and yes, the algo was exhibiting unexpected behavior.
What happens next with that algorithm is a business decision. However, the main point here is that when your monitoring infrastructure is designed both to understand what is happening in your environments and to demonstrate the precision, accuracy, and integrity of your analysis, it simplifies these meetings with various internal and external stakeholders.