
C, C++, C# Programming - what is the sense in this


PB666

Recommended Posts

Lua was written with the goal of maximal simplicity and being easy to embed as a scripting language, IIRC.

If you use LuaJIT, it's very fast too.

Though it's mainly a scripting language, not used as often for standalone executables. And I would say a stricter language with strict typing leads to better code and more things learned.

Link to comment
Share on other sites

This is a joke, right? My structures and consts all by themselves will be 100 lines. I have no choice but to move to C; VB is too slow, and Win10 makes it even slower. The assumption here is that you know what the density of sqrt operations is in the code. As already described above, I might get a substantial performance gain just by tweaking the compiler, and more if I use the inverse float square root function. Read the post: if you use an older C++ math sqrt it can take 400 cycles; they managed to get that down to 3 by specifying the instruction set and using x * rsqrtss(x). lol, a 100-fold difference...
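For reference, here is roughly what that trick looks like with SSE intrinsics in C. This is only a sketch added for illustration (the function name fast_sqrt and the Newton-Raphson refinement step are mine, not from the post being referenced), and it assumes x > 0:

#include <xmmintrin.h>   /* SSE intrinsics */

/* Approximate sqrt(x) as x * rsqrtss(x). rsqrtss gives a ~12-bit
   estimate of 1/sqrt(x); one Newton-Raphson step recovers most of the
   precision. Sketch only; assumes x > 0. */
static inline float fast_sqrt(float x)
{
    __m128 vx = _mm_set_ss(x);
    __m128 r  = _mm_rsqrt_ss(vx);                  /* r ~= 1/sqrt(x) */
    /* refinement: r = 0.5 * r * (3 - x * r * r) */
    r = _mm_mul_ss(_mm_mul_ss(_mm_set_ss(0.5f), r),
                   _mm_sub_ss(_mm_set_ss(3.0f),
                              _mm_mul_ss(vx, _mm_mul_ss(r, r))));
    return _mm_cvtss_f32(_mm_mul_ss(vx, r));       /* x * (1/sqrt(x)) */
}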

To get a feel for it, take four character types, let's make them bytes. Now make random-sized byte strings that accumulate to 150 GB, and you don't know a priori how big the set is in terms of the number of byte strings. Now I am going to give you 6 GB divided into 22 units.

These units are not identical to the first, not even in pieces; they vary in string sequence, multiple types can exist at a position in a string, and their identities are known to shrink and swell. Your job is to find which unit in the second set every piece in the first set best fits into, and then fit those byte strings into a long chain. In addition there is degeneracy: some pieces best fit in several places, and other pieces may not fit at all, so these have to be set aside. The inclusions and set-asides are determined mathematically using log functions and square roots.

It still sounds to me like you're trying to speed up your processing by making your code run faster, instead of by running less code. Can't you run some cheap checks on your data? Without knowing values, you could, for instance, do an initial square root estimate based on the integer value and use a lookup instead of a "real" sqrt calculation. That can give you a pass/fail in far fewer cycles than actually calculating the square root to the level of accuracy that is needed. Or use hashing. There are many options that save you from searching every single data set every single time, and it sounds like that's where the real profit is.
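To make that concrete, here is one possible shape of such a cheap check in C: a small precomputed integer-sqrt table used as a conservative pass/fail filter before any accurate sqrt is done. The table size, threshold logic and names are made up purely for illustration:

#include <stdint.h>
#include <math.h>

/* Precomputed floor(sqrt(i)) for 0 <= i < 1024; built once at startup. */
static uint16_t isqrt_lut[1024];

static void build_lut(void)
{
    for (int i = 0; i < 1024; i++)
        isqrt_lut[i] = (uint16_t)sqrt((double)i);
}

/* Cheap pass/fail: could sqrt(value) exceed the threshold?
   Only candidates that pass get the expensive, accurate sqrt later. */
static int might_exceed(uint32_t value, uint32_t threshold)
{
    if (value >= 1024)                        /* out of table range: be conservative */
        return 1;
    return isqrt_lut[value] + 1 > threshold;  /* +1 covers the floor rounding */
}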

Optimizing your code will rarely make it run substantially faster unless you have profiled your code and know where it spends 90% of its time crunching. Optimizing your algorithm usually gets you a performance increase on the order of magnitudes. I would not be at all shocked if it turns out your project can run faster in a scripting language like Python, because it allows you to discard large chunks of data based on complex algorithms, than in C or even assembler with a brute-force approach.

Link to comment
Share on other sites

It still sounds to me like you're trying to speed up your processing by making your code run faster, instead of by running less code. Can't you run some cheap checks on your data? Without knowing values, you could, for instance, do an initial square root estimate based on the integer value and use a lookup instead of a "real" sqrt calculation. That can give you a pass/fail in far fewer cycles than actually calculating the square root to the level of accuracy that is needed. Or use hashing. There are many options that save you from searching every single data set every single time, and it sounds like that's where the real profit is.

Optimizing your code will rarely make it run substantially faster unless you have profiled your code and know where it spends 90% of its time crunching. Optimizing your algorithm usually gets you a performance increase on the order of magnitudes. I would not be at all shocked if it turns out your project can run faster in a scripting language like Python, because it allows you to discard large chunks of data based on complex algorithms, than in C or even assembler with a brute-force approach.

I am in the process of running cheap checks right now. But last night I found someone has already done part of the alignment online; the problem is they did not parse the contigs into two branches. Let's say that I gave everything a unique but near-infinite meaning using long long integers (comparable to firing a bullet from an orbit between Mars and Earth and hitting the hole of a donut floating around in Earth's orbit). This means that 6 billion sites get long long ID strings (because unsigned 32-bit integers only reach about 4.3 billion; a long long represents 32 pieces of information about a site that can identify it, and at 4^32 there is roughly a one-in-a-million chance of hitting the wrong target). So each site now has one 8-byte cell, which is 48 GB of data, and my memory is 16 GB. I can deal with this anyway, because I can simply select certain sites. At the institutes they use massively parallel processing and have a huge amount of memory, but I don't. So I have to find clever workarounds: one is to simply stop at the 4.3 billion limit; I can also codify the data and cut it to 3 GB, but there will be a lot of degeneracy.

Link to comment
Share on other sites

Just wanted to ask, is Lua a good language to learn how to program? What do people think of it, anyway?

I already went through it, so this question is out of pure curiosity.

It's not a bad language, but it isn't great for learning. If you want to learn a scripting language, I'd start with Python.

Depending on what your goals are, however, it might be a better idea to just start with C. It's a simple enough language, and it will be far more useful in the long run.

Link to comment
Share on other sites

I have come to this thread late, but here are a few thoughts on learning to write software and use various languages:

* learn humility - there is always a better way to do it, there is always something you have not thought of, a bug you have not noticed or a test case you have not considered. Other folks might have written it differently, and there are always trade-offs between efficiency, maintainability, portability, extensibility, and time and money available.

* when choosing a language or development platform there are several considerations: target platform, speed of execution, speed of development, development tools (debuggers, etc.), ability to link to other modules, language features, library features, interpreted or compiled. Again there are always trade-offs. Different options have different benefits and drawbacks, and different folks would make different choices. No one is right or wrong, but some will have more success than others.

* Comment your code heavily right from the start. However, you should also be writing your code so that it is easy to understand without the comments. Make your flow control clear, name your variables thoughtfully, refactor your code so your methods/functions do easy-to-understand things, use the language features for the purpose they were intended, use standard design patterns, and design your code for reuse, even if you are not sure how you will reuse it.

* Test as you go, at the finest granularity you can, and if possible automate testing too.

* Understand your problem space. If you do not fully understand the concepts that you are working with, how can you expect to describe to the software how to use them?

* If you really want to learn how to get the best out of a language (any language) then learn how to write a compiler and an interpreter.

I am sure there are plenty more that others far greater than I could offer. I spent many years in the trade, but that was quite a while ago now. I am sure that there are better folks to learn how to code from than some random priest on the internet!

Edited by codepoet
typo
Link to comment
Share on other sites

Why I'm moving on from VB

One has to wonder how computer programmers think outsiders might approach a computer language. Certainly, on the night the folks at M$ wrote this little jewel, someone must have been passing around a little microdot.

See if you can figure out where the binary input occurred (hint: think bona fide function statements). The one major difference between VB and C used to be that in VB you never had to worry about adding additional classes or DLLs to your program; not so anymore.

Add the System.IO namespace (Imports System.IO) before you begin.

Sub OpenSource()
    Dim pathSource As String = "c:\x.bin"
    Try
        Using fsSource As FileStream = New FileStream(pathSource,
                FileMode.Open, FileAccess.Read)
            ' Read the source file into a byte array.
            Dim bytes() As Byte = New Byte((fsSource.Length) - 1) {}
            Dim numBytesToRead As Integer = CType(fsSource.Length, Integer)
            Dim numBytesRead As Integer = 0
            While (numBytesToRead > 0) ' Read may return anything from 0 to numBytesToRead.
                Dim n As Integer = fsSource.Read(bytes, numBytesRead, numBytesToRead)
                If (n = 0) Then
                    Exit While ' Break when the end of the file is reached.
                End If
                numBytesRead = (numBytesRead + n)
                numBytesToRead = (numBytesToRead - n)
            End While
            numBytesToRead = bytes.Length
            '[Write section eliminated; the damage would have already been done at this point]
        End Using
    Catch ioEx As FileNotFoundException
        Console.WriteLine(ioEx.Message)
    End Try
End Sub

From the page where this was cut, this was presented as a way to copy a file using their new code. There are so many things wrong with this code that it actually boggles the mind.

So let me go through these. Why use Visual Basic to copy a file?

There is only one reason I have repeatedly used binary read/write to copy a file, going back as far as 1990, and only one reason: the file is too big to transport and it needs to be split.

So you might argue they have some other need to copy a file, even though their own DOS copy does it faster. OK, granted some cryptic need, but they pitched this to the reader as advertising the sequential read/write capability.

But wait a second, why might someone actually need to sequentially read a binary . . . the binary, or the memory required to process the data in the binary, might be larger than the computer's memory!

Dim bytes() As Byte = New Byte((fsSource.Length) - 1) {} <---- creates a Byte() array the size of the file. It just so happens the file I read was 256 MB, but I had 22 files to read, and the last file to be read was 64 GB.

Before this array was created, the code should have queried the user if its size was more than a certain fraction of the available program memory.

Dim numBytesToRead As Integer = CType(fsSource.Length, Integer) <------- this overflows if the file length does not fit in an Integer. Luckily, in VB.NET the Integer type has been promoted to the old 32-bit Long, but the better choice would have been the UInteger type.

Let's just suppose that at the Integer maximum (2,147,483,647) the program sends only the maximum back as the limit length; that is basically 2 GB out of the average 4 GB of computer memory, and that is an awful heap of memory for VB to handle.

The second thing is what the user might do with those bytes. For example, I am going to expand each byte to 6 bytes; that would mean 2 GB + 6 GB = 8 GB, which is the memory of the overwhelming majority of computers, and of course since these are going to be in arrays it is actually over the limit of almost all PCs. There really needed to be a confirmation call on the back side of this function: MsgBox "Do you really want to load this much data [datablocksize]?"

Dim n As Integer = fsSource.Read(bytes, numBytesRead, numBytesToRead) <------ This is the transfer function, hard to deduce because the output is an integer; note the byte array bytes tucked into the argument list.

This call would not have crashed the program; the dynamic declaration of bytes() would have.

If you had a file you wanted VB to copy, you would want to do this (see the sketch below):

Open a binary file for sequential read.

Read up to a certain part of the file, storing the binary in an array.

Open an output file.

Store the data in the output file.

Close the output file.

Repeat the read unless the 'certain' size is larger than the binary information left in the file (-1).

Close the binary file.

Close the last output file after storing the data in it.
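Here is a minimal sketch of that loop in C (the direction this thread is heading anyway); the 64 MB chunk size and the function name are arbitrary, and splitting into multiple output files plus full error cleanup are left out:

#include <stdio.h>
#include <stdlib.h>

#define CHUNK (64 * 1024 * 1024)   /* 64 MB buffer, not the whole file */

int copy_file(const char *src, const char *dst)
{
    FILE *in  = fopen(src, "rb");
    FILE *out = fopen(dst, "wb");
    if (!in || !out) { perror("fopen"); return 1; }   /* cleanup omitted for brevity */

    unsigned char *buf = malloc(CHUNK);
    if (!buf) { fputs("out of memory\n", stderr); return 1; }

    size_t n;
    while ((n = fread(buf, 1, CHUNK, in)) > 0)   /* read a chunk...   */
        fwrite(buf, 1, n, out);                  /* ...write it out   */

    free(buf);
    fclose(in);
    fclose(out);
    return 0;
}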

They made something excessively complicated, providing a solution for which there is no need. Are they trying to drive users to another platform?

Link to comment
Share on other sites

Visual Basic was how I learned to program. I then learned C and C++. I have to admit that I was put off VB after Microsoft mashed .NET on top of it. I have not used VB in 8 years now, so I have forgotten much of it. From my memory, it is an OK language that can be used to do many things, but there are now better tools out there: C/C++ for high-performance code, C# or Java for applications, Python for easy-to-use smaller programs.

I doubt you will get any real performance gain from using custom math functions. Things like sqrt are actually reasonably fast and run as a dedicated sqrt instruction on the CPU (for desktops at least). Trigonometric functions can be slow, and some performance can be gained by using tables or a much simpler algorithm.

Now, I don't know how complex the processing is, so I am going to make some probably wrong assumptions. When it comes down to it, reading files from a hard drive, or even an SSD, is many factors slower than reading from RAM. There is the possibility that the code spends a lot of wasted time waiting for the data to be read. Another thing that can greatly improve performance is trying to make the data cache friendly: an L1 cache read is much, much faster than reading from RAM. If you are working with genome data, then the data may already be fairly packed and relatively cache friendly as is, depending on how you are reading it. Other possible improvements include using multiple threads. For something more advanced, you could also use SIMD (single instruction, multiple data) to process more data at once: using SSE or AVX registers, one can process several floats, ints or doubles at a time (4 floats per SSE register, 8 per AVX register). You will certainly want to use C/C++ for SIMD. If the algorithm can be split up hundreds of times, then running the data through the graphics card could also speed things up.
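As a tiny illustration of the SIMD point (nothing to do with the genome data specifically), here is four floats being added per SSE instruction; it assumes the array length is a multiple of 4, a remainder loop would handle the rest:

#include <xmmintrin.h>

/* c[i] = a[i] + b[i], four floats per SSE instruction. */
void add_floats(const float *a, const float *b, float *c, int n)
{
    for (int i = 0; i < n; i += 4) {
        __m128 va = _mm_loadu_ps(a + i);
        __m128 vb = _mm_loadu_ps(b + i);
        _mm_storeu_ps(c + i, _mm_add_ps(va, vb));
    }
}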

As you're still learning, perhaps it would be better to start off using a C++ compiler to write C code and pick up C++ as you progress. If you used something like GitHub or Bitbucket, it would allow us to look through and give advice as well.

PB666, you also have me intrigued about what sort of processing you are doing. I don't really know much about genome sequencing/processing, and I would love to understand what you are doing.

Link to comment
Share on other sites


So here is the issue. I want to pack 4 pieces of information into a byte (a byte = 2^8 possibilities, or 4^4). A 64-bit register is 8 bytes, which means that I can pack 32 pieces of information into a register.

So I have an assembly language routine that does this and a few other things; is it possible to integrate an assembly language routine into C++?

Link to comment
Share on other sites

Late to the party as well.

I've been sampling various programming languages of late (JavaScript, C++, C#, and VB), and I cannot for the life of me understand why they all exist!

I understand that there may be some differences in advanced usage, but what possible need could there be for all of the different syntax? console.log vs console.WriteLine for instance. How could someone NOT have standardized this by now?

If there was ONE syntax for mid-level programming languages, I should think we would leap forward in all sorts of computing-related fields. Am I missing a fundamental aspect of this?

[Image: standards.png]

Link to comment
Share on other sites

So here is the issue. I want to pack 4 pieces of information into a byte (a byte = 2^8 possibilities, or 4^4). A 64-bit register is 8 bytes, which means that I can pack 32 pieces of information into a register.

So I have an assembly language routine that does this and a few other things; is it possible to integrate an assembly language routine into C++?

Usually, yes, but the particular implementation depends on your compiler and machine. Assembly is always machine specific, so there's no generic case for it. You'll have to dig into your documentation to do it.

It's also probably possible to simply re-write the routine in C. C is somewhat terrible at bit-wise manipulation, but it can be done through masks and the XOR operator. You can also do it by specifying field widths in a struct (or, more likely, a union), but this is typically not a recommended practice because it creates non-portable code. (It will break on a machine with different register sizes and/or byte ordering.) Both methods will compile down to the bit-manipulation instructions that are found in most assembly instruction sets (a rare case where C is more verbose than assembly).
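For what it's worth, the struct-field-width approach mentioned above looks roughly like this; as said, the layout is implementation-defined, so treat it as a non-portable sketch (and uint8_t bit-fields are themselves a common compiler extension rather than guaranteed by the standard):

#include <stdint.h>

/* Four 2-bit fields packed into one byte; the union lets you see the
   packed value directly. Field order and padding are implementation-defined. */
union packed4 {
    struct {
        uint8_t a : 2;
        uint8_t b : 2;
        uint8_t c : 2;
        uint8_t d : 2;
    } bits;
    uint8_t raw;
};

/* usage:
   union packed4 p = {0};
   p.bits.a = 3; p.bits.d = 1;    each field holds a value 0-3
   p.raw now holds all four codes in one byte
*/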

Link to comment
Share on other sites

Usually, yes, but the particular implementation depends on your compiler and machine. Assembly is always machine specific, so there's no generic case for it. You'll have to dig into your documentation to do it.

It's also probably possible to simply re-write the routine in C. C is somewhat terrible at bit-wise manipulation, but it can be done through masks and the XOR operator. You can also do it by specifying field widths in a struct (or, more likely, a union), but this is typically not a recommended practice because it creates non-portable code. (It will break on a machine with different register sizes and/or byte ordering.) Both methods will compile down to the bit-manipulation instructions that are found in most assembly instruction sets (a rare case where C is more verbose than assembly).

Thanks. So as long as the assembly is specific to the processor, the sky is the limit?

Link to comment
Share on other sites

I've been sampling various programming languages of late (JavaScript, C++, C#, and VB), and I cannot for the life of me understand why they all exist!

If there was ONE syntax for mid-level programming languages, I should think we would leap forward in all sorts of computing-related fields. Am I missing a fundamental aspect of this?

There's a sense in which all the languages you mentioned (with the exception of VB, but who codes in VB?) do share a syntax. They're all C-like -- that is semi-colon statement terminations, grouping with braces, case sensitivity, whitespace ignored, unary increment and decrement operators, type names and strictness. Java is included in that class as well, and even perl has some of those features. Certainly there are other coding models (e.g. Python), but C-like ones are easily the most common.

Thanks. So as long as the assembly is specific to the processor, the sky is the limit?

You can code in assembly, compile the assembly to an object file, then link in the object file as a function call from your C or C++ code. There are ways to give the object file a handle that the linker will recognize, but you'll have to look those up as there's no standard way to specify a linker's activity, so it's implementation specific. There should be a section in either your compiler or linker documentation that describes how to link in an assembly object file.
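As a rough example of what the C side of that looks like, assuming NASM and a C compiler on x86-64 Linux (other toolchains differ): a hypothetical pack4 routine lives in pack4.asm, gets assembled to an object file, and is declared extern in C with a matching signature:

#include <stdint.h>
#include <stdio.h>

/* Implemented in pack4.asm and assembled separately, e.g.:
 *   nasm -f elf64 pack4.asm -o pack4.o
 *   cc main.c pack4.o -o main
 * The assembly routine must follow the platform's calling convention
 * (System V AMD64 here: integer args in rdi/rsi/rdx/rcx, result in rax). */
extern uint8_t pack4(uint8_t a, uint8_t b, uint8_t c, uint8_t d);

int main(void)
{
    uint8_t packed = pack4(0, 1, 2, 3);   /* four 2-bit codes -> one byte */
    printf("%u\n", packed);               /* with (a<<6)|(b<<4)|(c<<2)|d packing this prints 27 */
    return 0;
}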

Edited by Mr Shifty
Link to comment
Share on other sites

There's a sense in which all the languages you mentioned (with the exception of VB, but who codes in VB?) do share a syntax. They're all C-like -- that is semi-colon statement terminations, grouping with braces, case sensitivity, whitespace ignored, unary increment and decrement operators, type names and strictness. Java is included in that class as well, and even perl has some of those features. Certainly there are other coding models (e.g. Python), but C-like ones are easily the most common.

You can add PHP to the C-like list too.

If you want to know how different syntax could be, look up LISP.

Link to comment
Share on other sites

You can code in assembly, compile the assembly to an object file, then link in the object file as a function call from your C or C++ code. There are ways to give the object file a handle that the linker will recognize, but you'll have to look those up as there's no standard way to specify a linker's activity, so it's implementation specific. There should be a section in either your compiler or linker documentation that describes how to link in an assembly object file.

Yeah, linking: the Linux stuff is pretty well packaged to go along with C. Can't I do that with VB also? Anyway, that's aside from the point.

So if I don't want to pay the cost of transferring an array variable to another variable before the call, I need to have 4 different variables; IOW, don't declare an array variable until all the assembly math is complete and ready to package. Yeah, I will have to read up on the linking ins and outs later this PM.

Link to comment
Share on other sites

Reading and writing 2 bits of information can be done with shifting and masking. Off the top of my head, something like this should work, but I don't know fully what your assembly code does. Since it is short, it could also be inlined to potentially save a function call.


#include <stdint.h>

// reads the 2-bit field at position 'index' (0-31)
int GetBits(uint64_t bytes, int index)
{
    return (int)((bytes >> (index * 2)) & 3);
}

// sets bits but will not overwrite bits that are already set
uint64_t SetBits(uint64_t bytes, int bits, int index)
{
    return bytes | ((uint64_t)bits << (index * 2));   // cast so the shift happens in 64 bits
}

// overwrites the 2-bit field regardless of its current value
uint64_t OverwriteBits(uint64_t bytes, int bits, int index)
{
    return (bytes & ~(3ULL << (index * 2))) | ((uint64_t)bits << (index * 2));
}

Is this just a shorthand way of storing DNA, so A, T, G or C?

Looking back at Mr Shifty's posts, he has alluded to what I wrote above and is right about linking to assembly.

Link to comment
Share on other sites

Reading and writing 2 bits of information can be done with shifting and masking. Off the top of my head, something like this should work, but I don't know fully what your assembly code does. Since it is short, it could also be inlined to potentially save a function call. Is this just a shorthand way of storing DNA, so A, T, G or C?

No, it's a way of encoding a search-and-find string with as few bytes as possible; it can also be used for direct addressing in shorter searches. A, T, G and C are letters, so they would each require a full byte; they would need to be encoded and then packed. A, C, G and T are ASCII decimal codes 65, 67, 71 and 84, and a byte is composed of 2^8 possibilities, which leaves a maximum of 2^4, or 16, for a split, and each of those codes exceeds 16.

It can be accomplished by:


MOV AL, m1         ; load the first 2-bit code
MOV CL, 2          ; shift count
SHL AL, CL         ; make room for the next code
ADD AL, m2         ; append the second code
SHL AL, CL
ADD AL, m3         ; append the third code
SHL AL, CL
ADD AL, m4         ; append the fourth code
MOV [Output], AL   ; store the packed byte (four 2-bit codes in 8 bits)

Link to comment
Share on other sites

Yeah, linking: the Linux stuff is pretty well packaged to go along with C. Can't I do that with VB also? Anyway, that's aside from the point.

You can. The CLR supports execution of native code as a function call. I don't know the specifics of how you'd go about linking Assembly/C code from VB, or even if you can do that directly, but what you certainly can do is compile and link your Assembly/C code into a dynamic library (DLL) and load it from your VB program. Then you can call your Assembly/C functions just like you would call any other dynamic library function from VB. I have done this with C# code.

What you will need to be a little careful about is how variables get passed to your library. I seem to recall needing to set up arrays of data in a special way. But it's straightforward enough.

Link to comment
Share on other sites

By ATGC I mean the nucleobases adenine, thymine, guanine, and cytosine. And since there are only 4 of them, only 2 bits are needed to store each one, meaning a single byte can store 4 consecutive bases.

They can be encoded 0-3. N, which appears in poorly defined parts, either has to be encoded as all variants or any sequence that contains it must be ignored; otherwise the packing drops to 2 per byte.
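A small sketch of that encoding step (my own illustration, in C): map each base to a 2-bit code and flag N, or anything else, so the caller can either skip the read or expand it into all variants:

#include <stdint.h>

/* Returns 0-3 for A, C, G, T; -1 for N or any other character,
   so the caller can ignore the sequence or expand it into all variants. */
static int base_code(char base)
{
    switch (base) {
        case 'A': return 0;
        case 'C': return 1;
        case 'G': return 2;
        case 'T': return 3;
        default:  return -1;   /* 'N' and anything unexpected */
    }
}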

Link to comment
Share on other sites

You can. The CLR supports execution of native code as a function call. I don't know the specifics of how you'd go about linking Assembly/C code from VB, or even if you can do that directly, but what you certainly can do is compile and link your Assembly/C code into a dynamic library (DLL) and load it from your VB program. Then you can call your Assembly/C functions just like you would call any other dynamic library function from VB. I have done this with C# code.

What you will need to be a little careful about is how variables get passed to your library. I seem to recall needing to set up arrays of data in a special way. But it's straightforward enough.

Wow, these procedure calls are archaic; or maybe that's how C makes the code portable...

To pass variables you have to load them onto the stack in reverse order, but accessing the variables does not remove them from the stack; the stack pointer has to be adjusted, or they have to be manually removed, after the call returns. If the procedure is a proper function then EAX holds the 32-bit return value, but if it is 64 bits wide it has to be placed in EDX and EAX. It seems not the most clever way; in particular, if you want to send an array of data to a routine you would send a pointer to where the data segment begins.

I'll have to keep this in mind, because this appears not to be specific to assembly calls but to all function calls in C.

Link to comment
Share on other sites

How functions are called in C is not part of the spec, but this is the most common way, and what Visual Studio is going to work with.

Except, you don't need to "remove" anything from the stack. You simply increment the stack pointer. It's literally one of the fastest operations you can perform on a CPU.

Besides, Visual Basic will also use the stack under the hood, except, it's going to push a lot more information, requiring a huge overhead on function calls compared to C.

Link to comment
Share on other sites

How functions are called in C is not part of the spec, but this is the most common way, and what Visual Studio is going to work with.

Except, you don't need to "remove" anything from the stack. You simply increment the stack pointer. It's literally one of the fastest operations you can perform on a CPU.

Besides, Visual Basic will also use the stack under the hood, except, it's going to push a lot more information, requiring a huge overhead on function calls compared to C.

Yeah, this author does that up until he gets to the RET; then he starts backing stuff off the stack into the registers. The problem here is that if you can't change the values below the calling stack pointer before the return, you can only access them, which means at most you get a parsable 64-bit number on the return. In VB, at least, you can change the arguments of the procedure unless you use the ByVal modifier, which means you have access to the args before the return. Anyway, I'm sure I've not got all the details right; I see many GPFs in my future, lol.

It looks as if the only way to access the assembly language procedures is through C calls; at least no other alternative is given. I will have to do more research.

This was designed for a Linux C call that's supposed to also work with C++.

Link to comment
Share on other sites
