Daniel Berlin on Security Insight on SAP security, development stuff… and all the rest

24 Jun 12

Decompress ABAP source code from table REPOSRC

Howdy!
If you ever wondered where SAP stores your report source code, you probably came across table REPOSRC. The source code is stored there in a compressed format, though, so you cannot read it directly.
I've been trying to figure out how this compression works for some time now… fortunately, several people have recently dealt with the SAP DIAG protocol, which got me off the ground.

Algorithm

The DIAG protocol uses a form of the Lempel-Ziv (LZ) compression algorithm, and a bold attempt confirmed that the same is true for the source code compression.

The code stored in REPOSRC-DATA is actually compressed using the LZH algorithm (Lempel-Ziv plus Huffman coding), which is also used by the SAP DB / MaxDB database (thanks to Dennis Yurichev for the idea).

Knowing this, I wrote a decompression tool around a small portion of the MaxDB code, which also takes care of some SAP specialties:

  • The 1st byte of the compressed data seems to be junk (or might have a special meaning!?)
  • The first 2 bytes of the decompressed source are junk, too (?!)
  • Lines are terminated with character code 0xFF, which has to be replaced by a line break
  • The 1st, 3rd, 5th … decompressed byte contains NUL (not sure why!)

The decompressed source code has a fixed line length of 255 characters (blank-padded).
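
The post-processing described above could be sketched roughly as follows. This is a minimal illustration and not the tool's actual code: the function name reposrc_to_text is made up, the byte layout is taken from the observations in the list (two junk bytes, then 2-byte characters with a leading NUL, 0xFF as line terminator), and trimming the blank padding is an extra convenience added here.

```c
#include <stddef.h>

/*
 * Convert decompressed REPOSRC bytes to plain single-byte text.
 * Assumptions (observations from the post, not SAP documentation):
 *   - in[0] and in[1] are junk and are skipped
 *   - every character occupies 2 bytes, the first of which is NUL
 *   - a character byte of 0xFF marks the end of a line
 * Trailing blanks of each line are trimmed (an addition of this sketch).
 * out must hold at least in_len / 2 bytes. Returns the bytes written.
 */
size_t reposrc_to_text(const unsigned char *in, size_t in_len, char *out)
{
    size_t i;
    size_t n = 0;          /* bytes written so far */
    size_t line_start = 0; /* offset in out where the current line begins */

    for (i = 3; i < in_len; i += 2) {
        if (in[i] == 0xFF) {
            /* drop the blank padding of the fixed-length line */
            while (n > line_start && out[n - 1] == ' ')
                n--;
            out[n++] = '\n';
            line_start = n;
        } else {
            out[n++] = (char)in[i];
        }
    }
    return n;
}
```

Only every second byte is kept, so any character outside the single-byte range will not survive this conversion.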

Usage

  1. Download this archive (it contains the decompressor tool compiled for Win32 and an ABAP report to dump the binary source code). The source code is available here; you can easily compile it on Linux/BSD/Unix/Windows using the enclosed build script.
  2. To extract the compressed source code from SAP, use the report "ZS_REPOSRC_DOWNLOAD". It reads the DATA field from table REPOSRC for a given report and stores it in a binary file on your workstation.
  3. Decompress the file on the command line.

PS: Works for Kernel 7.x, no guarantee for older releases.
PS 2: The functionality on non-Unicode systems is unknown… I'll check this later.

Have a nice day!

Comments (40) Trackbacks (0)
  1. Hi all,
    Hartmut Schwab sent me an updated version of the ABAP report to extract the raw source code.
    You can find his version here: GitHub Gist – ZS_REPOSRC_DOWNLOAD-1.0.2-HS.

    Cya!

  2. Hi Daniel,
    your program works fine, congratulations! I have just a quick question: in SAPMSYST we have the “SYST-” modules (e.g. SYST-SIGNM), which are not written to the output file.
    Do you know how we can display the source code inside these modules?

    • Hi Renan,
      usually you can display the code behind a dialog module in SE80, but for SAPMSYST it seems like the source resides in the kernel.
      As far as I can see, there’s no way to get hold of it…

      Regards, Daniel

  3. Hi Daniel,
    I really want to say thank you.
    And your website is really good for me as a SAP ABAPer.
    I used C++ to rebuild the .exe file and it works fine on 32-bit Windows 7, but it does not work on 64-bit Windows 7.
    The error message is: can not open file “file name”. I don’t know why this appears.
    At last, thanks again.

    Regards, Tom

  4. To protect my report I put the string ‘*@#@@[SAP]’ and used this command:

    EXEC SQL.
      UPDATE REPOSRC SET DATA = ''
       WHERE PROGNAME = :program
         AND R3STATE = 'A'
    ENDEXEC.

    to delete REPOSRC-DATA.
    Is there still any way for someone to steal my code or copy my report?

    • Hi Guilherme,
      deleting the source code is definitely the best protection… better than the (meanwhile obsolete) *@#@@[SAP] stuff.

      Regards, Daniel

      • Hi Daniel,
        I cannot agree, sorry. Version management could still betray Guilherme… The trick is as follows: if the deleted program was transportable (i.e. not part of a development package $*) and was released to the next system (QA/PROD) at least once, version management created a version of that program at release time. If you then re-create a program of the same name in DEV (it can even be created as a local object this time), the program and its versions (still sitting in table VRSD) are linked again. You are now in the editor with the generated first line REPORT in front of you: don’t save, just go to version management, and you can view the previous versions again… For deletion to be a safe way of covering your tracks, the program either has to be created as a local object in the first place (no versions are created then) or the version database has to be manipulated (not recommended: version management uses a rather complex set of tables, not a single simple one, and the version database can easily be corrupted).

        Cheers, Michal

  5. Also interesting:
    SAP compression library wrapper for Python -> https://github.com/CoreSecurity/pysap/tree/master/pysapcompress

  6. Hi Daniel.
    Great research, and thank you. Do you have any update on this functionality for non-Unicode systems? I have experienced erratic decompression into unintelligible source when targeting a non-Unicode system.

    Kind regards,
    James

    • Hi James.
      To put it briefly: I’d love to improve the decompressor – but at the moment I simply don’t have any time for programming… and in the near future this is unlikely to change.

      Best regards,
      Daniel

  7. Hello, Daniel.
    Your program is very useful, but if the ABAP code contains non-ASCII characters (for example Russian letters in comments or in literals), the program breaks them. My proposal is to keep UTF-16 as the code page of the output file, i.e. to add a UTF-16 signature (byte order mark) at the beginning of the output file and not to cut out every other byte.
    Sample code is:

    142│ //for (i = 3; i < byte_decomp; i += 2) {
    143│ ret = fwrite("\xFE\xFF",2,1,fout);
    144│ if (ret != 1) {
    145│ printf("Error writing to output file '%s'\n", argv[2]);
    146│ return 11;
    147│ }
    148│ for (i = 2; i < byte_decomp; i += 2) {
    149│ if ( (bout[i+1] == 255) && (bout[i] == 0) ) {
    150│ ret = fwrite("\0\n", 2, 1, fout);
    151│ }
    152│ else {
    153│ ret = fwrite(bout + i, 2, 1, fout);

    instead of yours

    142│ for (i = 3; i < byte_decomp; i += 2) {
    143│ if(bout[i] != 255) {
    144│ ret = fwrite(bout + i, 1, 1, fout);
    145│ }
    146│ else {
    147│ ret = fwrite("\n", 1, 1, fout);

    …and maybe it is not necessary to add an LF at the end of the file?
    If it is still needed, then one more line of code must be changed:

    163│ fwrite("\0\n", 2, 1, fout);

    instead of yours

    157│ fwrite("\n", 1, 1, fout);

    Best regards,
    VKB
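
The idea in this comment could also be sketched as a standalone function, shown here as a hedged illustration and not as a patch to the tool. The name reposrc_to_utf16be is made up, and the sketch assumes the decompressed data is UTF-16 with the high byte first, which is what the leading NUL bytes and the FE FF signature suggest. One detail matters under that assumption: a two-byte line feed has to be written high byte first (00 0A), otherwise it ends up as a different character.

```c
#include <stddef.h>

/*
 * Copy decompressed REPOSRC bytes to UTF-16 big-endian text.
 * Assumptions: in[0] and in[1] are junk; the rest consists of 2-byte
 * units, high byte first; the unit 00 FF terminates a line.
 * out must hold at least in_len + 2 bytes. Returns the bytes written.
 */
size_t reposrc_to_utf16be(const unsigned char *in, size_t in_len,
                          unsigned char *out)
{
    size_t i;
    size_t n = 0;

    out[n++] = 0xFE; /* byte order mark for big-endian UTF-16 */
    out[n++] = 0xFF;
    for (i = 2; i + 1 < in_len; i += 2) {
        if (in[i] == 0x00 && in[i + 1] == 0xFF) {
            out[n++] = 0x00; /* line feed, high byte first */
            out[n++] = 0x0A;
        } else {
            out[n++] = in[i]; /* keep both bytes of the character */
            out[n++] = in[i + 1];
        }
    }
    return n;
}
```

Because both bytes of every character are kept, Cyrillic letters and other non-ASCII characters pass through unchanged.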

  8. Can I have sample source using *@#@@[SAP] please…
    Thanks a lot…

    • Hello Oking.
      Have a look at program SAPMSYST.
      On a test system, you could also create a new report in SE38 and insert the above “magic” string at the very beginning of the source code.

      BR, Daniel

      • Hi Daniel.
        I was wondering if you could please share some tips on how to use *@#@@[SAP].
        I was just playing with it in a sandbox and was unable to activate the code once it was inserted.
        Is there a trick? Insert report?
        Thanks,
        Max

        • Hi Max.
          That’s indeed tricky, because you’re completely locked out, once the protection is in place.
          You could use report RS_REPAIR_SOURCE to insert the string, but anyway you won’t be able to activate the report afterwards…

          Regards, Daniel

  9. Hi Daniel,
    thanks for the tools. Do you have any idea why every second character of the code is missing after decompression? (It looks like this: the 1st, 3rd, 5th… decompressed byte contains NUL.)
    Thanks for the answer 🙂
    Bea

    • Hi Bea,
      the program is a proof-of-concept: it works, but under certain circumstances the results might be… strange…
      I’d really love to investigate your issue and improve the code – but my time is rather limited at the moment.
      Please don’t expect a quick fix.

      Best regards, Daniel

  10. Hi Daniel,
    I’ve tested your decompression tool and, congrats, it works fine. The only “little” problem is that it writes alphabetic characters as end-of-line markers. If I want to upload the file into the SAP editor again, removing all those “C” or “H” characters is a lot of work. Could a different character be used instead? An automated replacement would remove all the end-of-line “C”s and “H”s, but also every “C” and “H” in the source code itself.

    Thanks for that,
    Regards,
    Gunter

  11. Thank you Daniel 🙂 it was very helpful.

  12. Love this site for stuff like this! 🙂

  13. After I have created the dump file with the ABAP report, how do I use the .exe file together with the extracted file? Could you please post a sample of what the command line should look like?
    Many thanks.

  14. Hi Daniel,
    nice job in general!!!

    Regards,
    Jose M. Prieto

  15. Works fine Daniel, thank you.
    The code compiles fine with g++ (MinGW) using the following command line:
    g++ sap-*.cpp lib\*.cpp -o decompress.exe

    Any idea or sample code to do the LZH compression as well?

    Best regards.

    • Hi Edmond,
      you’re welcome!

      Both compression algorithms (LZC and LZH) should work fine in the current version… although the code in REPOSRC seems to use only LZH.

      The method CsObjectInt::CsDecompr (which is used for decompression) decides which algorithm to use. Have a look at lines 179 and 182 in “…/lib/vpa105CsObjInt.cpp”.

      Regards, Daniel

    • Hi Daniel, thank you for this quick reply!
      Yes, I saw this, and I can compress/decompress after a few adjustments to your code, adding some calls to the “CsCompr” method.
      Now: if I decompress an LZH-compressed file (using your code) and then compress the generated file again (using my code with the LZH algorithm), I get a file whose size differs from that of the original compressed one…
      On top of that, if I try to decompress the generated file (using your code), I receive an “Error: Unknown status” message… It sounds like my compression implementation has some shortcomings… 🙂
      Anyway, don’t waste your time on this topic, it’s not important and thank you again for your smart contributions!

      Best regards

      • Hi Edmond,
        the lack of reversibility is probably due to the “SAP Magic” ™ in line 137 ff. of “sap-reposrc-decompressor.cpp”…
        For example, the first two bytes of the compressed code are discarded before decompression starts. If you want to implement compression, you need to put those 2 bytes in front of the result again (they contain “header data” like the compression algorithm).
        Have a look at my comments in lines 137-140… I’d really appreciate it if someone found an explanation for these oddities…

        Regards, Daniel
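
To inspect the “header data” mentioned here, something like the following sketch might help. Everything in it is an assumption, not confirmed by SAP or by this post: the layout (4-byte little-endian uncompressed length, one algorithm/version byte, magic bytes 1F 9D) is how I read the open MaxDB compression sources, and the names cs_header and cs_parse_header are made up. The number of leading junk bytes is a parameter because the post speaks of one junk byte while this reply speaks of two.

```c
#include <stddef.h>

/* Header fields under the ASSUMED layout (unverified, see above). */
struct cs_header {
    unsigned long length;   /* size of the uncompressed data */
    unsigned char alg_byte; /* raw algorithm/version byte, not decoded here */
};

/*
 * Parse the header that follows `skip` leading junk bytes.
 * Returns 0 on success, -1 if the buffer is too short or the
 * magic bytes 1F 9D are not where the assumed layout expects them.
 */
int cs_parse_header(const unsigned char *buf, size_t len, size_t skip,
                    struct cs_header *h)
{
    const unsigned char *p;

    if (len < skip + 7)
        return -1;
    p = buf + skip;
    if (p[5] != 0x1F || p[6] != 0x9D)
        return -1;
    /* assumed little-endian length in the first four bytes */
    h->length = (unsigned long)p[0]
              | ((unsigned long)p[1] << 8)
              | ((unsigned long)p[2] << 16)
              | ((unsigned long)p[3] << 24);
    h->alg_byte = p[4];
    return 0;
}
```

Applied to the prefix FF007A0800121F9D quoted in comment 17 with skip = 1, the magic bytes line up, the algorithm byte is 0x12 and the length comes out as 555,520 bytes. With skip = 0 the check fails.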

  16. Hi Daniel, nice job!

    Can you add a link with the decompressor executable for Win32?
    (Problems to compile with MS Visual C++ !!!)

    Best regards,
    Leandro Mengue

  17. How would I go about converting REPOSRC data obtained with SE16 for use with your tool?
    (e.g. “FF007A0800121F9D…”)

    • Unfortunately this won’t work, as the DATA field is of type RAWSTRING (i.e. a byte string of unlimited length).
      What you see in SE16 is not the original data, but a hexadecimal representation, which is limited to 128 chars.
      Even if you’d convert it back to binary, the compressed data wouldn’t be complete…

      • Hi folks,
        in my case the FUNCTION ‘TYPD_GET_OBJECT’ solved the problem.

        Hugs

        • Hi Otávio.
          This should work somehow… basically that FM does a ‘READ REPORT …’ plus some weird stuff.
          But my intention was to use the encoded data directly from the database and recover the source code from it – and not to use the kernel to decode it.

          Regards, Daniel

