Everything posted by Jouni
-
Software engineers and the rest of the world.
Jouni replied to PB666's topic in Science & Spaceflight
Of course your datasets can be public or private, but it's irrelevant to this discussion. The amount of data is rather small. You're working with gigabytes or terabytes, not with petabytes or exabytes. Analyzing such amounts of data isn't too challenging computationally. There may be more combinations of variables than particles in the visible universe, but it doesn't matter, because data miners learned to deal with such complexity a long time ago. -
Software engineers and the rest of the world.
Jouni replied to PB666's topic in Science & Spaceflight
What was the point of all that? If you strip away all the unnecessary jargon, you were just admitting that the actual datasets aren't too large or too complex by today's standards.

When I was a student, we had one of the best data mining / machine learning / data analysis groups in the world at the university. I took a few classes from them, and while I ultimately chose to specialize in another area, I'm quite familiar with the basic techniques for handling the kind of complexity you describe. These days I'm working at a major bioinformatics / genetics research institute. While my own research is mostly computer science, I get a fair bit of exposure to the work done in genetics.

Based on what I see, computation is rarely the real bottleneck. If something is computationally infeasible, it's probably so infeasible that a 10x or 100x performance improvement won't make it feasible. The real bottleneck is almost always the lack of relevant expertise. People with a background in methods (computer science, statistics, mathematics, physics, or something similar) are generally familiar with one or more relevant methodological areas, such as algorithms, statistics, or high-performance computing. People with a background in biology understand the data better and ask better questions about it. People with a mixed/bioinformatics background are familiar with the best practices and tools in bioinformatics. It's rare to see a person who's familiar with most of the relevant aspects of a research project, or, in other words, who understands what the group is really doing. Most of us don't have what it takes to become world-class experts in multiple unrelated fields and to stay up to date in all of them.

I'm less familiar with the data analysis large companies like Google, Amazon, and Walmart are doing. Based on what's publicly available, they work with orders of magnitude more data than we do, and they face questions that are at least as complex. While biology is a complex subject, language, economics, and human behavior are also quite complex. -
1.0 - Constant crashing on OS X
Jouni replied to shaun3000's topic in KSP1 Technical Support (PC, unmodded installs)
Today I solved a rather annoying memory leak in software I built. It reminded me of some of the problems we have with KSP on a Mac.

KSP is based on Unity. Unity runs on Mono. Mono may have been compiled with GCC, and software compiled with GCC typically uses glibc. The multithreaded implementation of malloc() and free() in glibc wastes memory under some circumstances.

In short, small memory allocations (by default smaller than 128 kB) done by threads other than the main thread are typically served from so-called arenas. An arena is a smallish memory region (1 MB on 32-bit systems), so there are usually many arenas at the same time. When a thread tries to allocate memory, it cycles through all arenas, starting from the one where it last allocated memory successfully, until it finds one where the request can be served. If all arenas (whose mutex the thread could acquire) are full, a new arena is created. When a thread calls free(), the deallocated region isn't really freed unless it was the highest allocated region in an arena.

If there are many small allocations and deallocations, the end result is a lot of fragmented arenas with unallocated regions inside them. Because many free() calls don't actually free the memory, while many malloc() calls don't fit in the holes in the existing arenas, memory usage tends to grow slowly over time. This isn't really a memory leak, because everything is under control, but it certainly looks like one.

I don't know whether this is the real cause of the memory leak in KSP, but it could explain the symptoms. -
Software engineers and the rest of the world.
Jouni replied to PB666's topic in Science & Spaceflight
If you doubt whether I know what I'm talking about, you can trivially check what kind of expertise and qualifications I have. I haven't made any effort to hide my real identity on the forums.

What you describe is not a very large dataset by today's standards. You have maybe a few hundred gigabytes, while businesses routinely mine terabytes or even petabytes of similar data. The methods are well known, and the entire stack, from programming languages to software and from hardware to infrastructure, has been optimized for processing that kind of data.

Biologists tend to run into trouble with smaller datasets than people in many other fields do. In part this happens because biologists (especially those who chose biology already in college) often lack the culture, the infrastructure, and the expertise for processing large amounts of data. Another reason is that biological data is often sequence data, which requires quite different methods, hardware, and infrastructure than mainstream numerical/categorical data. -
Software engineers and the rest of the world.
Jouni replied to PB666's topic in Science & Spaceflight
What's that supposed to mean? Using 10 dimensions just means that you're handling objects that consist of 10 numerical values. In the things I'm working on, objects typically have from billions to hundreds of billions of "dimensions". Processing such data isn't that hard, because running time depends on the algorithms, not directly on the number of dimensions. What you described in the next paragraph sounds like a fairly standard data mining problem with datasets that aren't too large by today's standards. I know a lot of people working on similar problems, and exactly none of them writes any code in assembly. -
The Wheesley is capable of breaking Mach 2 since one of the 1.0.x patches. You just need to minimize drag and fly level at a low altitude until you break the sound barrier.
-
Software engineers and the rest of the world.
Jouni replied to PB666's topic in Science & Spaceflight
Since then, Intel has developed better verification methods that don't depend as much on people not making mistakes. The Pentium FDIV bug was the main catalyst for Intel investing in formal verification, because it cost them real money. (Or at least that's what the people hired in the late 90s always say.) Obviously they would have done that eventually without the bug, but the way they harvested formal verification talent from the entire world sped up the development by at least 5 years. -
Software engineers and the rest of the world.
Jouni replied to PB666's topic in Science & Spaceflight
A few points:

1) Large technology companies have always invested in scientific research. The most famous example was Bell Labs, whose employees received several Nobel prizes for the work done there.

2) "In silico" is biology/bioinformatics-specific jargon. It doesn't mean anything in this context.

3) Testing can only reveal that the processor works correctly with a negligibly small fraction of possible inputs. Testing for manufacturing defects is no different from other testing. I was talking about formal verification, where integrated circuits (or computer programs) are treated as mathematical proofs, and the proofs are verified to be correct. If you've had a computer science education, you should be familiar with the basic techniques, such as preconditions, postconditions, and invariants. -
Software engineers and the rest of the world.
Jouni replied to PB666's topic in Science & Spaceflight
This is mostly about a different issue. I'm arguing that if you're interested in efficiency, you should make sure that you're measuring the real costs against the real benefits.

Assume that you want to analyze a dataset. You know which methods you want to use, but you have to implement them yourself. There are two scenarios: either you spend one week writing the code and four weeks running it, or you spend four weeks writing and optimizing the code in order to make it run in a week. Which one is more efficient? In both cases, you get the results in five weeks. The difference is whether you lose three weeks of working time or three weeks of computing time. If your employer is running a large-scale datacenter, your time is probably worth around 1000 CPU cores running 24/7. If you buy the CPU time from Amazon or your employer only has a few computing servers, your time can be worth as little as 100-200 CPU cores. Unless the analysis requires a lot of hardware, it's probably more efficient to spend just one week writing the code and use those three weeks on another project.

I wasn't talking about engineers. I was talking about world-class researchers making major scientific breakthroughs and putting the results to good use. Even though modern CPUs are orders of magnitude more complex than those we had in the 90s, we haven't seen a similar increase in hardware bugs. That's because Intel not only tests its products but also formally proves that many of their subsystems work correctly. As any scientist knows, experiments can only prove that you're wrong; no amount of experiments will prove you right. -
Software engineers and the rest of the world.
Jouni replied to PB666's topic in Science & Spaceflight
It's actually the other way around. If you're writing a game that thousands of people will play for a long time, it pays off to optimize the game as far as possible. If you're just processing and analyzing data, writing the code can easily take longer and be more expensive than running it. In the latter case, you have to know when the code is good enough, so you can stop optimizing it and start using your time for more productive purposes. If you're referring to that old division bug, Intel did way more than that. They started hiring the world's top experts in automatic software verification, made huge leaps in the theory of deriving and verifying provably correct hardware, and put that theory to good use. -
Software engineers and the rest of the world.
Jouni replied to PB666's topic in Science & Spaceflight
Well, it's no longer the 8088 era. Most programmers don't write assembly by hand, because it's almost always a waste of time. Unless you're really good at writing assembly and devote a significant fraction of your time to keeping up with hardware developments, your compiler probably writes better assembly code than you do. Those of us who have specialized in something else just write our performance-critical code in our favorite language and let the compiler sort out the low-level issues. -
Software engineers and the rest of the world.
Jouni replied to PB666's topic in Science & Spaceflight
You're not reading a "Teach yourself Intel x64 assembly in 21 days/weeks/months/years" guide. You're reading a reference manual intended for the small minority of experts who are already skilled in assembly programming but need a reference for technical details. If you're not already familiar with a modern assembly language, you should spend a couple of years learning and practicing one first. The websites of most decent universities should point to suitable learning resources. -
There are three fundamental reasons for doing anything. The activity can be necessary to keep us from dying right now, it can be interesting/fun/cool/something, or it can indirectly help us do something we already consider worth doing. Most science, and almost all good science, belongs to the second category.
-
How to handle fairing lift?
Jouni replied to Dorlan's topic in KSP1 Gameplay Questions and Tutorials
There are two problems I can see. First, the shape of the fairing is bad. Guess which of these two fairings makes the rocket behave better: Second, the TWR may be too high. The faster you fly, the worse all aerodynamic problems become. -
C, C++, C# Programming - what is the sense in this
Jouni replied to PB666's topic in Science & Spaceflight
As I said, there are simple combinations of complex things inside those complex things. At any level above the innermost loop bodies, there are typically 1-3 sequentially executed tasks that take a nontrivial amount of time, and those tasks are almost always obvious to anyone. At any level, you can choose any of the nontrivial tasks to optimize, and you see a noticeable improvement in performance. Optimize all of them, and you see a significant improvement. That's how things work in scientific computing, and in data processing in general. The structure of the code tends to be simple, and the bottlenecks are usually obvious. Many compilers treat inline as a hint, which they can choose to ignore. Because scientific software is almost always distributed as source code, you can't rely on compiler-specific behavior. -
C, C++, C# Programming - what is the sense in this
Jouni replied to PB666's topic in Science & Spaceflight
Remember that I was talking about simple combinations of complex things, not about complex combinations of simple things. Threads may run for minutes or hours independently. At any level above the innermost loop bodies, there are probably 1-3 sequentially executed tasks that take a nontrivial amount of time, and a number of smaller tasks that are orders of magnitude faster. You typically have no control over what other CPU/memory intensive processes are running at the same time. My point was that it's always a minor change that pushes the optimizer past a critical threshold. You add a single instruction to a function or call it in one more place, and the optimizer decides that it's no longer beneficial to inline the function. That's the fundamental nature of binary choices. -
C, C++, C# Programming - what is the sense in this
Jouni replied to PB666's topic in Science & Spaceflight
In the kind of code I'm talking about, the performance impact of anything taking more than 100 cycles (but less than a second) is usually obvious. We know quite precisely what the code is doing, and there are no black boxes or complex data-dependent call graphs around. The impact of high-level algorithmic choices can be less obvious, and it's hard to determine what takes the most time inside the innermost loops. My favorite is a loop that my laptop executes at either 0.5 billion or 2 billion iterations/second, depending on whether a certain unrelated piece of code is present in the same compilation unit. The optimizers of modern compilers are incredibly smart, except when they happen to be incredibly stupid or incredibly unpredictable. -
C, C++, C# Programming - what is the sense in this
Jouni replied to PB666's topic in Science & Spaceflight
If the performance tools require special compiler options instead of working directly with the production executable, they're obviously producing wrong results. Similarly, if they execute any CPU instructions or use registers, cache and/or memory, they're by definition altering the behavior of the program. A profiler is a rough tool. If there is an obvious bottleneck in the code, the profiler can probably find it. If there are multiple and/or less obvious bottlenecks, the profiler may not be able to identify them correctly. C++ code is particularly hard to profile, because a change in one place may lead the optimizer to make different decisions in a seemingly unrelated place. -
C, C++, C# Programming - what is the sense in this
Jouni replied to PB666's topic in Science & Spaceflight
This is actually wrong. Roughly speaking, profilers are useful when you're writing software, but less so if you're writing algorithmic code. If your code is a complex combination of simple tasks, you usually need a profiler to see where the bottlenecks are. On the other hand, if your code is a simple combination of complex tasks, you're just the wrong person for the job if you can't see where the bottlenecks are, without any help, 95% of the time.

There are also some common pitfalls when using profilers with algorithmic code (as in this case). One is that you may be measuring the wrong thing. It's quite common with algorithmic code that the bottleneck depends on the input size, the input type, and/or the environment. By profiling with a handful of test cases and optimizing the code for them, you may actually be hurting its performance on real data. Thorough performance measurements may take weeks, months, or even years. Another issue is that the measurements themselves are wrong: measuring the performance changes the behavior of the program and often moves the low-level bottlenecks to different places. -
Heavy Lift Launch Vehicles VS Orbital Assembly
Jouni replied to Nicholander's topic in Science & Spaceflight
In short, payloads are complex and expensive, while rockets are simple and cheap. The best option is usually to launch the payload in as few launches as possible, using as large rockets as you can justify developing. It really boils down to R&D costs vs. production costs. Most systems in the payload are (almost) unique. Every system you can eliminate will save you a lot in R&D costs. Rockets, on the other hand, need only minor improvements after the first few launches, and the marginal costs of launching yet another one can be quite low. If you design a rocket with a significantly higher payload capacity than anything you already have, and get enough launches with large payloads for it, the R&D savings from simpler payloads will outweigh the R&D costs from developing the rocket. I believe the break-even point is at around 10-20 launches. -
C, C++, C# Programming - what is the sense in this
Jouni replied to PB666's topic in Science & Spaceflight
What about:

if (a) { A; C; }
while (b || !c) {
    if(! { D; }
    B; C;
}
E;

If you want to replace gotos with structured code, you need to reason about the structure of the control flow instead of using state flags. -
C, C++, C# Programming - what is the sense in this
Jouni replied to PB666's topic in Science & Spaceflight
Goto is like const_cast in C++. It does have its uses, but the situation you're planning to use it in is not one of them. -
These figures tell us one thing: space exploration is cheap. It's also so utterly boring that nobody wants to spend more money on it. The Norwegian Oil Fund could probably build a permanent colony orbiting Saturn, if they wanted to.
-
Voice is the most important quality. Most gamers seem to have such annoying voices that watching their videos is pure torture.
-
Legalities of space mining - SPACE act of 2015
Jouni replied to RainDreamer's topic in Science & Spaceflight
Nothing prevents an individual from claiming territory in outer space. Nothing prevents another individual from claiming the same territory, because those claims mean exactly nothing. You can claim whatever you want, but nobody cares.

Because the Outer Space Treaty prohibits states from claiming territory in outer space, it also prohibits their courts of law from handling cases about the ownership of those territories; otherwise the state would be claiming sovereignty over that territory. There are no property rights in outer space, just factual ownership based on your ability to defend your property. The Outer Space Treaty gives limited protection to property launched from Earth, as well as to its immediate surroundings, but otherwise you would be on your own.