
Encoding using C#

Hi friends, we discussed encoding and decoding in our previous article, so in this article we are going to look at how to implement encoding using C#.

Let’s quickly summarize encoding/decoding:

Computers don’t understand characters. Computers understand only one language: 0 and 1. These 0s and 1s are electrical signals used to maintain a state in computer memory. That state can be accessed later and transformed into the desired result.

Every character we type or see on a computer is stored somewhere in the form of 0s and 1s. For example, if I type my name “Deepak Gera”, the name is converted into a stream of 0s and 1s by some algorithm, and that stream is then stored somewhere in the computer.

Later, when I try to access my name, the stream is read from its memory location and transformed back into characters using the same algorithm that was used for the original transformation.
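To make this concrete, here is a minimal sketch (assuming UTF-8 as the algorithm, since that is what we use later in this article) that prints the 0/1 stream for “Deepak Gera” and then converts it back:

using System;
using System.Text;

class BitStreamDemo
{
    static void Main()
    {
        // Assuming UTF-8 as the algorithm; any encoding schema would do.
        byte[] bytes = Encoding.UTF8.GetBytes("Deepak Gera");

        // Print each byte as 8 binary digits - the 0/1 stream that gets stored.
        foreach (byte b in bytes)
        {
            Console.Write(Convert.ToString(b, 2).PadLeft(8, '0') + " ");
        }
        Console.WriteLine();

        // Reading it back with the same encoding restores the characters.
        Console.WriteLine(Encoding.UTF8.GetString(bytes)); // Deepak Gera
    }
}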

“The process of transforming characters into a stream of bytes is called Encoding.”

“The process of transforming encoded bytes back into characters is called Decoding.”

Encoding using C#

Create a console application and write the following code in the Program.cs file.

using System;

class Program
{
    static void Main(string[] args)
    {
        string myData = "A";

        // Encoding: characters -> bytes
        byte[] encodedData = Encode(myData);
        Console.WriteLine($"Encoded Data: {BitConverter.ToString(encodedData)}");

        // Decoding: bytes -> characters
        string origData = Decode(encodedData);
        Console.WriteLine($"Original Data: {origData}");
        Console.ReadLine();
    }

    public static byte[] Encode(string text)
    {
        byte[] dataBytes = System.Text.Encoding.UTF8.GetBytes(text);
        return dataBytes;
    }

    public static string Decode(byte[] dataBytes)
    {
        string returnText = System.Text.Encoding.UTF8.GetString(dataBytes);
        return returnText;
    }
}

The above code has two methods: one for encoding in C# and one for decoding.

As you can see, I’ve used the UTF8 encoding schema, so when I debug this code I get 1 byte for the character “A”. The ASCII code for “A” is 65. If I use UTF7 instead, I get the same result, because “A” also takes a single byte in UTF7.
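If you would rather see this in the console than in the debugger, here is a small sketch (it bypasses the Encode helper above and calls Encoding.UTF8 directly) that prints both the byte count and the byte value:

using System;
using System.Text;

class ByteCountDemo
{
    static void Main()
    {
        byte[] utf8Bytes = Encoding.UTF8.GetBytes("A");

        Console.WriteLine(utf8Bytes.Length); // 1 byte
        Console.WriteLine(utf8Bytes[0]);     // 65, the same value as the ASCII code
    }
}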


Let’s check another character, one that takes 2 bytes. The “¢” symbol is outside the ASCII range, so let’s check it with the UTF8 schema. Debugging the same code with “¢” shows that this character takes 2 bytes.

If we encode the same character using the UTF7 schema, it is converted using symbols from the ASCII range, so it takes more bytes; UTF7 is therefore considered less efficient for multi-byte characters.
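Here is a small sketch comparing the two schemas for “¢” (note that Encoding.UTF7 is marked obsolete in .NET 5 and later, so the pragma is only there to silence that warning):

using System;
using System.Text;

class CentSignDemo
{
    static void Main()
    {
        string cent = "\u00A2"; // the "¢" symbol

        byte[] utf8Bytes = Encoding.UTF8.GetBytes(cent);
        Console.WriteLine(BitConverter.ToString(utf8Bytes)); // C2-A2, i.e. 2 bytes

#pragma warning disable SYSLIB0001 // Encoding.UTF7 is obsolete in .NET 5 and later
        byte[] utf7Bytes = Encoding.UTF7.GetBytes(cent);
#pragma warning restore SYSLIB0001
        Console.WriteLine(utf7Bytes.Length); // 5 bytes: "¢" becomes "+AKI-" spelled out in ASCII
    }
}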


In the same way, you can perform encoding in C# using different schemas:

ASCII

byte[] dataBytes = System.Text.Encoding.ASCII.GetBytes(text);

UTF-16 (Little Endian)

byte[] dataBytes = System.Text.Encoding.Unicode.GetBytes(text);

UTF-16 (Big Endian)

byte[] dataBytes = System.Text.Encoding.BigEndianUnicode.GetBytes(text);

In little endian machines, the least significant byte of a multi-byte value is stored first. In big endian machines, the most significant byte is stored first. You can read more about big endian and little endian byte order if you want the details.
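A minimal sketch makes the difference visible; the same character produces the same two bytes, just in the opposite order:

using System;
using System.Text;

class EndiannessDemo
{
    static void Main()
    {
        string text = "A"; // U+0041

        byte[] littleEndian = Encoding.Unicode.GetBytes(text);       // UTF-16 LE
        byte[] bigEndian = Encoding.BigEndianUnicode.GetBytes(text); // UTF-16 BE

        Console.WriteLine(BitConverter.ToString(littleEndian)); // 41-00 (low byte first)
        Console.WriteLine(BitConverter.ToString(bigEndian));    // 00-41 (high byte first)
    }
}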

UTF-32

byte[] dataBytes = System.Text.Encoding.UTF32.GetBytes(text);
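UTF-32 uses a fixed 4 bytes for every code point, so even a plain “A” takes 4 bytes. A small sketch:

using System;
using System.Text;

class Utf32Demo
{
    static void Main()
    {
        byte[] utf32Bytes = Encoding.UTF32.GetBytes("A");

        // UTF-32 is fixed width: 4 bytes per code point.
        Console.WriteLine(BitConverter.ToString(utf32Bytes)); // 41-00-00-00 (little endian by default)
    }
}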

You can try these out yourself and see how the results differ between schemas.
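For example, a small loop like this (the sample string is just a made-up mix of ASCII characters and the “¢” symbol) prints the byte count and the raw bytes for each schema side by side:

using System;
using System.Text;

class SchemaComparison
{
    static void Main()
    {
        // A made-up sample string mixing plain ASCII with a non-ASCII symbol.
        // Note: ASCII cannot represent "¢" and replaces it with "?".
        string sample = "Gera¢";

        var schemas = new (string Name, Encoding Encoding)[]
        {
            ("ASCII", Encoding.ASCII),
            ("UTF-8", Encoding.UTF8),
            ("UTF-16 LE", Encoding.Unicode),
            ("UTF-16 BE", Encoding.BigEndianUnicode),
            ("UTF-32", Encoding.UTF32),
        };

        foreach (var (name, encoding) in schemas)
        {
            byte[] bytes = encoding.GetBytes(sample);
            Console.WriteLine($"{name,-10} {bytes.Length,2} bytes: {BitConverter.ToString(bytes)}");
        }
    }
}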

