aussiedwarf

Members
  • Posts

    3
Reputation

0 Neutral

Profile Information

  • About me
    Curious George
  1. Living in the good parts of Wollongong with fast FTTP NBN. I get a nice 92 Mbit/s down. 1: Good fast internet, although ping to the USA is limited by the speed of light. 2: Generally very mild here, though it is very windy today. 3: No snow here, but there is in the Snowy Mountains.
  2. By ATGC I mean the nucleobases adenine, thymine, guanine, and cytosine. Since there are only 4 of them, only 2 bits are needed to store each one, meaning a single byte can store 4 consecutive bases.
  3. Reading and writing 2 bits of information can be done with shifting and masking. Off the top of my head something like this should work, but then I don't fully know what your assembly code does. Since it is short it could also be inlined to potentially save a function call.

     #include <stdint.h>

     //reads bits
     int GetBits(uint64_t bytes, int index) { return (int)((bytes >> (index * 2)) & 3); }

     //sets bits but will not overwrite set bits
     //(the cast to uint64_t keeps the shift from overflowing an int once index reaches 16)
     uint64_t SetBits(uint64_t bytes, int bits, int index) { return bytes | ((uint64_t)bits << (index * 2)); }

     //overwrites bits to be set
     uint64_t OverwriteBits(uint64_t bytes, int bits, int index) { return (bytes & ~((uint64_t)3 << (index * 2))) | ((uint64_t)bits << (index * 2)); }

     Is this just a shorthand way of storing DNA, so A, T, G or C? Looking back at Mr Shifty's posts, he has linked to what I wrote above and is right about linking to assembly.
  4. Visual Basic was how I learnt to program. I then learnt C and C++. I have to admit that I was put off VB after Microsoft mashed .NET on top of it. I have not used VB in 8 years, so I have forgotten much of it. From my memory, it is an OK language that can be used to do many things, but there are now better tools out there: C/C++ for high performance code, C# or Java for applications, Python for easy to use smaller programs.

I doubt you will get any real performance gain from using custom math functions. Something like sqrt is actually reasonably fast and runs as a dedicated hardware instruction on the CPU (for desktops at least). Trigonometric functions can be slow, and some performance can be gained by using tables or a much simpler algorithm.

Now, I don't know how complex the processing is, so I am going to make some probably wrong assumptions. When it comes down to it, reading files from a hard drive, even an SSD, is many times slower than reading from RAM. There is the possibility that the code spends a lot of wasted time waiting for the data to be read. Another thing that can greatly improve performance is making the data cache friendly. An L1 cache read is much faster than reading from RAM. If you are working with genome data, then the data may already be fairly packed and relatively cache friendly as is, depending on how you are reading it.

Other possible improvements include using multiple threads. For something more advanced, you could also use SIMD (single instruction, multiple data) to process more data at once. With SSE registers one can process 4 floats or 32-bit ints (or 2 doubles) at once, and with AVX 8 floats or 4 doubles. You will certainly want to use C/C++ for SIMD. If the algorithm can be split up hundreds of times, then running the data through the graphics card could also speed things up.

As you're still learning, perhaps it would be better to start off using a C++ compiler to write C code and pick up C++ as you progress.
If you used something like GitHub or Bitbucket, it would allow us to look through the code and give advice as well. PB666, you also have me intrigued about what sort of processing you are doing. I don't really know much about genome sequencing/processing and I would love to understand what you are doing.
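To make the 2-bits-per-base idea from the earlier posts a bit more concrete, here is a rough sketch of packing a whole sequence into 64-bit words, 32 bases per word, using the same shift-and-mask approach. The function names and the mapping A=0, C=1, G=2, T=3 are my own assumptions for illustration; nothing in the thread fixes a particular encoding, and I don't know how the real data is laid out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed mapping (hypothetical, not from the thread): A=0, C=1, G=2, T=3. */
int base_to_code(char base) {
    switch (base) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return -1; /* not one of the four plain base letters */
    }
}

/* Inverse of base_to_code for codes 0..3. */
char code_to_base(int code) {
    static const char letters[4] = {'A', 'C', 'G', 'T'};
    return letters[code & 3];
}

/* Packs n bases from seq into words, 32 bases per 64-bit word.
   words must hold at least (n + 31) / 32 entries.
   Returns 0 on success, -1 if an unknown letter is found. */
int pack_bases(const char *seq, size_t n, uint64_t *words) {
    assert(seq != NULL && words != NULL);
    for (size_t i = 0; i < n; i++) {
        int code = base_to_code(seq[i]);
        if (code < 0) {
            return -1;
        }
        unsigned shift = (unsigned)(i % 32) * 2;
        /* Clear the 2-bit slot, then write the new code into it. */
        words[i / 32] = (words[i / 32] & ~((uint64_t)3 << shift))
                      | ((uint64_t)code << shift);
    }
    return 0;
}

/* Reads base number i back out of the packed words. */
char unpack_base(const uint64_t *words, size_t i) {
    unsigned shift = (unsigned)(i % 32) * 2;
    return code_to_base((int)((words[i / 32] >> shift) & 3));
}
```

A caller would allocate `(n + 31) / 32` words, call `pack_bases` once, and then read individual bases with `unpack_base`. Real sequence files often contain other letters such as N, which this sketch simply rejects, so treat it as a starting point only.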