How much performance was being left on the table? How terribly complex was the code? And how much worse would it become if fully tuned? Where did scientific researchers write optimized code, and where did they get into trouble? I was hooked.
I wanted to know how far scientific researchers got before throwing in the towel. The software was open source, and I wanted to know what it could really do.
It wasn't satisfying. I suddenly wanted to know the performance story for this specific piece of software. There was only one thing to do.
Do a performance analysis and full tuning of the software. So I did. Here I will (1) describe the performance adjustments I made and their impact, (2) cover performance pitfalls I encountered and tips for avoiding them, and (3) discuss some unturned stones, like thread downtime.

PCat takes an astronomical survey image and catalogs it: it finds the locations of the stars in the image. PCat doesn't find a single set of locations, the highest-likelihood catalog. Instead, PCat samples the distribution of possible star catalogs for the image and returns a family of possible catalogs.
PCat is built on the statistical sampling framework DNest3. Knowledge about stars, their density, and how they may be modeled using a point spread function is used to construct a Bayesian prior. The image data is then used to update that prior, and PCat jumps around sampling from the universe of all possible star locations to generate a set of samples.
Via the Metropolis algorithm, PCat takes its state (a catalog of the stars), mutates it slightly, decides whether this is a more likely outcome than before, and then either accepts or rejects this "mutant." The process is similar to an optimizer, except that PCat is not targeting a global maximum; indeed, PCat accepts inferior mutants and walks somewhat randomly. PCat didn't build and didn't have tests (sigh).
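As a minimal sketch of that loop, with toy stand-ins for PCat's real types (`Catalog`, `perturb`, and the likelihood below are placeholders, not PCat's code), the key detail is that worse mutants are still accepted with some probability, which is what makes the walk partly random:

```cpp
#include <cmath>
#include <random>
#include <vector>

// Toy stand-ins for PCat's real types: a "catalog" is just star positions,
// and the likelihood is a placeholder.
struct Catalog {
    std::vector<double> positions;
    double logLikelihood() const {
        double s = 0.0;
        for (double x : positions) s -= x * x;  // placeholder likelihood
        return s;
    }
    void perturb(std::mt19937& rng) {
        std::normal_distribution<double> step(0.0, 0.1);
        for (double& x : positions) x += step(rng);
    }
};

// One Metropolis step: copy the state, mutate the copy, then accept or
// reject the "mutant".
bool metropolisStep(Catalog& current, std::mt19937& rng) {
    std::uniform_real_distribution<double> unif(0.0, 1.0);

    Catalog mutant = current;  // copy the current state
    mutant.perturb(rng);       // mutate it slightly

    double logRatio = mutant.logLikelihood() - current.logLikelihood();

    // Better mutants are always accepted; worse ones are still accepted
    // with probability exp(logRatio), so the walk is partly random.
    if (std::log(unif(rng)) < logRatio) {
        current = std::move(mutant);
        return true;
    }
    return false;  // rejected; keep the old state
}
```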
I hacked away until PCat built, making some minor modifications to the DNest3 library that the author seems to have made locally but never documented.
Since PCat doesn't have any tests, I made a simple end-to-end test that samples an astro image and checks the answer against an expected result. PCat is probabilistic, which is an issue for reproducibility, so I pre-selected the RNG seed to make my end-to-end test deterministic. This had limitations -- I had to make certain not to tweak code that affected the order in which random numbers were drawn, for example -- but proved effective. To build my benchmark, I found an image and output catalog settings that took roughly one minute to run.
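The trick, sketched below (this is not PCat's actual test harness): with a fixed seed the whole run is deterministic, so an end-to-end test can compare against a previously recorded "golden" result.

```cpp
#include <cassert>
#include <random>

// Sketch of the reproducibility trick: fixing the RNG seed makes every
// draw identical run to run, so end-to-end output can be compared against
// a recorded golden result.
int main() {
    std::mt19937 rngA(42);  // pre-selected seed
    std::mt19937 rngB(42);  // same seed => identical stream of draws

    for (int i = 0; i < 1000; ++i)
        assert(rngA() == rngB());

    // Caveat from above: any code change that alters the *order* in which
    // numbers are drawn from the stream invalidates the golden result.
    return 0;
}
```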
Long enough to average out run-to-run noise, but not so long that running several iterations would leave me needing reading material. Some variance remained despite my mitigations; why is a mystery for another day. I used Linux perf stat to benchmark PCat, as it gives runtimes plus more detailed metrics like L1 cache loads.
I can build everything in one go, tell what version and commit the binary came from, and extract stats and plots from the benchmark tooling; I describe those tooling improvements in the appendix, if you're interested. It must be time for code tuning.
Let's remove some unnecessary operations and clear up the CPU caches! Okay, first we need to find a target. I used flamegraphs to find and target hotspots; I wrote a post about them if you're interested. In addition to flamegraphs, I also looked at Intel VTune, and while it was helpful, especially for looking at thread coordination issues, I didn't find it as handy and direct as flamegraphs.
Memory operations are slow and show up all red; there is no way to avoid that. But functions like pixelLogLikelihood and logLikelihood are also red, and similarly, they will always be red because they're floating-point heavy and thus require multiple cycles per operation. If you're doing FP-math-heavy scientific computing, you're likely best off ignoring this truism rather than chasing those frames. Quick note: click on the flamegraph to open it.
Seems excessive. So that's target number one: reduce the number of memory operations PCat executes. Since PCat "walks" through the possibility space, proposing and evaluating mutations, it inherently needs to copy some information about the old position in order to create a new proposal. There is no way to get away from a memory operation here, but we can shrink it, starting with reducing the amount of data copied. Here in updateParticle we can see the current position is copied in order to propose a new position.
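I won't reproduce the real updateParticle here; as a hedged sketch (updateParticle and perturb are the names used above, but the signature and body are my assumptions), the pattern looks roughly like this:

```cpp
#include <vector>

// Hedged sketch of the copy inside updateParticle (not the verbatim DNest3
// code): the current particle is copied wholesale to serve as the proposal.
template <typename ModelType>
void updateParticle(std::vector<ModelType>& particles, int which)
{
    ModelType proposal = particles[which];  // full copy of the current position
    double logH = proposal.perturb();       // mutate the copy in place

    // ... evaluate the mutated copy, then accept or reject it ...
    (void)logH;
}
```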
We have to copy the current position to make a proposal, but ideally we'd copy nothing more than the data needed to uniquely define our position. Poking at the proposal object, MyModel, I found a lot of unneeded fat. Actually, I found a ton of fat: each sample contained an entire copy of the telescope image! The image is shared data, and is certainly not needed to describe a specific set of star positions. To boost performance, I trimmed MyModel down to a minimum.
I stripped out the telescope data and other common data and placed them in a new object, MyModelGlobals, available to all threads via shared pointer. No longer copying full images each cycle, PCat's runtime dropped. Worth it. This alone would have been a nice outcome, but I was suspicious there were other unneeded operations to eliminate. I poked around further and found that, indeed, there were other unnecessary copy operations.
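Here's a sketch of the split; MyModel and MyModelGlobals are the names from above, but the fields shown are my assumptions. The shared, read-only image lives once behind a shared_ptr, while each sample keeps only its own star parameters:

```cpp
#include <memory>
#include <vector>

// Shared, read-only data: loaded once, referenced by every thread.
struct MyModelGlobals {
    std::vector<double> image;  // the telescope image (assumed field)
    int width = 0, height = 0;
};

struct MyModel {
    // Per-sample state: only what uniquely defines a set of star positions.
    std::vector<double> starX, starY, starFlux;

    // Copying a MyModel now copies three small vectors and bumps a
    // refcount; the image itself is never duplicated.
    std::shared_ptr<const MyModelGlobals> globals;
};
```

Making the pointee const means every thread can read the globals concurrently without locking, since nothing ever mutates them after load.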
See if you can spot the issue in updateParticle. Now that we accept this proposal, we shouldn't need any more copies. We already have an object for the current position and an object for the new position; why would we need a third?
The error is that the code assigns the proposal to a std::vector, and std::vector initiates a copy in order to store the object. Why does this occur? A vector owns its elements, and it can't rely on the caller's object outliving it. So when std::vector receives an object, it stores its own instance of the data, a copy, guaranteeing the data will persist as long as it remains in the vector.
Let's axe this unneeded copy and reclaim some performance!
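One way to do it, as a minimal sketch under the assumption that an accepted proposal is never read again under its old name: move the object into the vector instead of copying it.

```cpp
#include <utility>
#include <vector>

// MyModel reduced to a stub for the sketch.
struct MyModel { std::vector<double> stars; };

int main() {
    std::vector<MyModel> particles(8);
    MyModel proposal;
    proposal.stars = {1.0, 2.0, 3.0};

    // Before: copy assignment -- std::vector stores its own duplicate.
    particles[3] = proposal;

    // After: move assignment -- the vector takes ownership of the existing
    // buffers, and no third copy of the position is made.
    particles[3] = std::move(proposal);
    return 0;
}
```

After the move, proposal is left in a valid but unspecified state, which is fine here: an accepted proposal's old name is never used again.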