Modern TLS/SSL on 16-bit Windows
Lately, there's been a resurgence of new programs written for retro computers, from a Slack client to a Mastodon client to any number of Wordle clones. But most of these programs, if they connect to the Internet, require a proxy running on a modern computer to handle the SSL/TLS connection that almost all APIs now require. For my Gateway 4DX2-66 running Windows 3.11 for Workgroups, depending on a modern machine for any real Internet use is a sad state of affairs, so I decided to change the status quo.
It's not that Windows 3.1 never supported secure connections; Internet Explorer 2, for instance, supported SSL. But over time, clients and servers have moved to newer versions of the SSL protocol (now called TLS) and newer algorithms, and have dropped support for older versions as vulnerabilities such as POODLE were found in them.
Normally, a program can get modern TLS support by upgrading to a newer TLS library (e.g. OpenSSL), but one of the biggest barriers for Windows 3.1 is that it's a 16-bit OS: TLS libraries today tend to support 32-bit operating systems, and sometimes 16-bit platforms for embedded hardware, but never Windows 3.1 itself.
I was inspired by Yeo Kheng Meng's notes on the same challenges he ran into with his Slack client mentioned above. It seemed possible to modify a decent TLS library and convince it to compile and run in the Windows 3.1 environment, and I wanted to try, to give my Gateway, and other Windows 3.1 computers around the world, the ability to connect to most of the Internet again.
What does success look like?
To connect to most servers today, we need to speak TLS 1.2, since servers have widely deprecated older versions of TLS over the last few years. TLS 1.3, standardized in 2018, is the newest version; not all servers support it yet, so supporting it as well would be a bonus.
| Version | Released | Status |
|---|---|---|
| SSL 2.0 | 1995 | Deprecated in 2011 (RFC 6176) |
| SSL 3.0 | 1996 | Deprecated in 2015 (RFC 7568) |
| TLS 1.0 | 1999 | Deprecated in 2021 (RFC 8996) |
| TLS 1.1 | 2006 | Deprecated in 2021 (RFC 8996) |
| TLS 1.2 | 2008 | In use |
| TLS 1.3 | 2018 | In use |
We also need to support a set of popular cipher suites: the combinations of algorithms that TLS uses under the hood to exchange keys, authenticate the server, and encrypt data. On connection, the TLS client lists the cipher suites it supports in its "Client Hello" message; if the server doesn't support any of them, it rejects the connection with a handshake failure (often reported as "no shared cipher" or "no common ciphers"). There are 37 cipher suites for TLS 1.2, so our client needs to support at least the most common (read: less vulnerable) ones.
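To make the negotiation concrete, here is a minimal sketch of how a server might pick a cipher suite from the list in a Client Hello. It assumes the simplest policy (honor the client's preference order and take the first suite the server also supports); real servers often apply their own preference order instead. The suite codes are the IANA-registered values, while the function name `choose_cipher_suite` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* IANA code points for a few TLS 1.2 cipher suites. */
#define TLS_RSA_WITH_AES_128_CBC_SHA                0x002F
#define TLS_ECDHE_ECDSA_WITH_AES_128_GCM_SHA256     0xC02B
#define TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256       0xC02F
#define TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384       0xC030
#define TLS_ECDHE_RSA_WITH_CHACHA20_POLY1305_SHA256 0xCCA8

/* Hypothetical helper: returns the first suite offered by the client
 * (in the client's preference order) that the server also supports,
 * or -1 if there is no overlap. In that case a real server aborts the
 * handshake with a handshake_failure alert. */
long choose_cipher_suite(const uint16_t *client, size_t n_client,
                         const uint16_t *server, size_t n_server)
{
    size_t i, j;

    for (i = 0; i < n_client; i++) {
        for (j = 0; j < n_server; j++) {
            if (client[i] == server[j])
                return (long)client[i];
        }
    }
    return -1; /* no common cipher suite */
}
```

For example, a client offering ECDHE-ECDSA, then ECDHE-RSA, then plain RSA suites, talking to a server that only has an RSA certificate, would settle on `TLS_ECDHE_RSA_WITH_AES_128_GCM_SHA256`.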
One non-goal of this effort was creating a secure implementation. Given that I'm already throwing security out the window by putting 30-year-old software on the Internet, I decided it was okay to make some tradeoffs to get a working implementation. The biggest are that I'm using a fake (non-cryptographic) random number generator and that I'm not verifying certificates.
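For a sense of what "fake" means, below is a sketch of the kind of non-cryptographic generator that can stand in for a real RNG during testing. The choice of a 32-bit xorshift and the names `insecure_rand_seed` and `insecure_rand_block` are assumptions for illustration; the post doesn't say which generator the port uses. The point is that the output is fully determined by the seed, so anyone who knows or guesses the seed can reproduce the "random" values TLS relies on for key material. wolfSSL's build configuration offers hooks such as `CUSTOM_RAND_GENERATE_BLOCK` for swapping in a custom generator, but whether this port uses that hook isn't stated.

```c
#include <stdint.h>

/* INSECURE, illustration only: a deterministic xorshift32 generator.
 * All names here are hypothetical. Never use this for real keys. */
static uint32_t fake_rng_state = 0x12345678u;

void insecure_rand_seed(uint32_t seed)
{
    /* xorshift must never hold 0, or it outputs 0 forever */
    fake_rng_state = seed ? seed : 1u;
}

static uint32_t xorshift32(void)
{
    uint32_t x = fake_rng_state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    fake_rng_state = x;
    return x;
}

/* Fills output with sz "random" bytes and returns 0 on success
 * (the return convention is an assumption for this sketch). */
int insecure_rand_block(unsigned char *output, unsigned int sz)
{
    unsigned int i;

    for (i = 0; i < sz; i++)
        output[i] = (unsigned char)(xorshift32() & 0xFFu);
    return 0;
}
```

Seeding twice with the same value yields the same byte stream, which is exactly why such a generator is fine for a test harness and unacceptable for real traffic.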
Finding the right TLS library
Given the success criteria above, I looked into a few different TLS libraries that might work: OpenSSL, BearSSL, Mbed TLS, and WolfSSL. WolfSSL stood out among the pack as it had explicit 16-bit compiler support while being fully-featured and well-supported. It also had a wide range of support code for all sorts of hardware with different constraints—including limited memory—giving me a plethora of examples I could learn from.
I also needed a working TCP/IP stack. It's hard to believe now, but Windows 3.x didn't come with TCP/IP built in; users chose from many options, including Trumpet Winsock, a popular shareware TCP/IP stack. For my testing, I went with Microsoft's own implementation, TCP/IP-32, which came as a separate download from Microsoft. Programs talk to it through Winsock (the Windows Sockets API), an older version of the same API you can still use today on Windows 11. Because everything goes through Winsock, the port should also work with any other Winsock implementation.
My plan was to compile WolfSSL into a DLL that any program could use. Because WolfSSL is written in C, I needed a C toolchain that could create 16-bit Windows DLLs. I chose Open Watcom v2, which has phenomenal support for 16-bit Windows programs and DLLs and can cross-compile them from modern hosts, including 64-bit Windows 11.
Unlike with Windle, I made things a bit easier for myself by doing the main development on Windows 11, with a folder shared to a VirtualBox VM running Windows 2000 (which, unlike Windows 11, can run 16-bit programs) for most of the iteration and testing. At intervals, I verified that everything still worked on Windows 3.11 for Workgroups.
My first challenge was figuring out how to build and use DLLs on Windows 3.x. To do so, I needed to understand the difference between far and near pointers.
Nowadays we tend to conceptually think of pointers to memory as absolute addresses; however, when programming for 16-bit Windows, the x86 memory segmentation architecture dating back to the Intel 8088 and 8086 CPUs has to be taken into account.
Disclaimer: This is my understanding after reading up on the subject. I've tried to give a simple explanation without going into too much detail, e.g. the differences between Real and Protected modes. Please reach out if anything here is fundamentally incorrect.
With segmentation, memory is split into segments of up to 64KB each. A segment might hold a program's code (its instructions), its data (global and static local variables), or its stack. An address consists of two parts: the segment the memory lives in, and the offset within that segment. A pointer carrying this full, two-part address is called a far pointer. In Real Mode, the processor turns the pair into a 20-bit physical address by shifting the segment left by 4 bits and adding the offset:
| Component | Bits |
|---|---|
| Segment (shifted left 4 bits) | xxxx xxxx xxxx xxxx 0000 |
| + Offset | 0000 yyyy yyyy yyyy yyyy |
| = Physical address | zzzz zzzz zzzz zzzz zzzz |
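The real-mode arithmetic in the table fits in a few lines of C. In this sketch, the `far_ptr` struct and `linear_address` helper are hypothetical names, and Protected Mode (where the segment is a selector into a descriptor table) is deliberately ignored. One consequence worth noticing is that different segment:offset pairs can name the same byte.

```c
#include <stdint.h>

/* Hypothetical model of a 16-bit far pointer: segment and offset,
 * 16 bits each. */
typedef struct {
    uint16_t segment;
    uint16_t offset;
} far_ptr;

/* Real-mode translation: (segment << 4) + offset, giving a 20-bit
 * address. On the 8086, anything above 0xFFFFF wraps back to 0. */
uint32_t linear_address(far_ptr p)
{
    uint32_t addr = ((uint32_t)p.segment << 4) + p.offset;
    return addr & 0xFFFFFu;
}
```

For example, `1234:0010` and `1235:0000` both resolve to physical address `0x12350`.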
The CPU has a set of segment registers, including the Code Segment (CS), Data Segment (DS), and Stack Segment (SS) registers, which hold the segments the currently running program is using. When passing a pointer to a function inside the same program, you typically pass just the offset of the variable you're referencing, and the segment is implied. This kind of pointer is called a near pointer; at 2 bytes instead of a far pointer's 4, it is used most of the time to save memory.
When calling into a DLL from a program, a few things are different. First, a DLL's instructions live in a different segment from the program's, so the call has to switch the CS register to the DLL's code segment. The C compiler takes care of this for us if we put the non-standard __far keyword in front of the function name, which makes the compiler generate a "far call".
Second, pointers passed into DLL functions have to be far pointers. DLLs are tricky in that while their code is running, the code and data segments are the DLL's own, but the stack segment is still the calling program's. A near pointer to, say, a local variable on the caller's stack would be interpreted relative to the DLL's data segment and point at the wrong memory; a far pointer carries its segment with it, so there is no ambiguity about which segment it points into.
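The mismatch can be modeled with the real-mode arithmetic from the table above. In this sketch, all names and segment values are hypothetical, and Windows 3.x actually runs in Protected Mode where segment values are selectors, but the failure works the same way: a program passes the offset of one of its stack variables to a DLL, the DLL resolves that offset against its own data segment, and it ends up reading a different byte than the caller intended.

```c
#include <stdint.h>

/* Hypothetical segment values during one call into a DLL. */
#define CALLER_SS 0x2000u /* caller's stack segment (still SS inside the DLL) */
#define DLL_DS    0x3000u /* DLL's own data segment (DS inside the DLL) */

/* Real-mode segment:offset to linear address. */
static uint32_t resolve(uint16_t segment, uint16_t offset)
{
    return (((uint32_t)segment << 4) + offset) & 0xFFFFFu;
}

/* Where the caller's stack variable actually lives. */
uint32_t caller_variable(uint16_t offset)
{
    return resolve(CALLER_SS, offset);
}

/* Where DLL code reads if it gets only a near pointer (an offset)
 * and, as usual, treats it as relative to DS. */
uint32_t dll_sees_near(uint16_t offset)
{
    return resolve(DLL_DS, offset);
}

/* Where DLL code reads if it gets a full far pointer. */
uint32_t dll_sees_far(uint16_t segment, uint16_t offset)
{
    return resolve(segment, offset);
}
```

With these values, a caller variable at offset `0x0100` lives at `0x20100`, the near pointer makes the DLL read `0x30100`, and the far pointer `2000:0100` reaches the right place.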
Another non-standard C keyword often used for DLL functions is __pascal, which specifies that a function uses the Pascal calling convention. A calling convention defines how parameters are passed to a function on the stack and who cleans up the stack after the function returns. With pascal, parameters are pushed left-to-right and the callee removes them from the stack on return. The name comes from the Pascal language, which used this convention; 16-bit Windows used it for its API, and 32-bit Windows replaced it with stdcall, which pushes parameters right-to-left but still has the callee clean up.
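The difference in push order is easy to see with a simulated stack. This is a toy model, not real compiler output, and every name in it is hypothetical: it pushes the arguments of a call like `f(1, 2, 3)` onto a downward-growing array the way each convention does. Under cdecl (the C default), arguments are pushed right-to-left, so the first argument ends up on top; under pascal they are pushed left-to-right, so the last argument is on top. Cleanup differs too: a pascal callee pops its own arguments (on x86, with a `RET n` instruction), which saves a few bytes at every call site.

```c
#include <stddef.h>

#define STACK_WORDS 8

/* Toy downward-growing stack: sp starts past the end of the array
 * and each push moves it one slot toward index 0. */
typedef struct {
    int words[STACK_WORDS];
    size_t sp;
} toy_stack;

typedef enum { CONV_CDECL, CONV_PASCAL } convention;

static void push(toy_stack *s, int value)
{
    s->words[--s->sp] = value;
}

/* Pushes args[0..n-1] in the given convention's order and returns
 * the value on top of the stack (the last one pushed), or -1 if n is
 * 0 or does not fit in the toy stack. */
int top_after_call_setup(convention conv, const int *args, size_t n)
{
    toy_stack s;
    size_t i;

    if (n == 0 || n > STACK_WORDS)
        return -1;
    s.sp = STACK_WORDS;
    if (conv == CONV_PASCAL) {
        for (i = 0; i < n; i++)     /* pascal: left-to-right */
            push(&s, args[i]);
    } else {
        for (i = n; i > 0; i--)     /* cdecl: right-to-left */
            push(&s, args[i - 1]);
    }
    return s.words[s.sp];
}
```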
This means all the functions in WolfSSL that will be called from an external program need to be defined as __far __pascal. For example, the function to initialize WolfSSL is defined as follows:
int __far __pascal wolfSSL_Init(void);
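One way to keep such declarations compiling on both 16-bit and modern compilers is to hide the keywords behind a macro. This is a sketch, not how WolfSSL or this port does it: the macro `WIN16_EXPORT` and the function `demo_init` are hypothetical, and the sketch assumes that Open Watcom predefines `__WATCOMC__` and, for 16-bit x86 targets, `_M_I86` (worth checking against the compiler's documentation).

```c
/* Hypothetical macro: expands to the 16-bit Windows DLL keywords only
 * when building with a 16-bit Open Watcom compiler (assumed detection);
 * everywhere else it expands to nothing. */
#if defined(__WATCOMC__) && defined(_M_I86)
#define WIN16_EXPORT __far __pascal
#else
#define WIN16_EXPORT
#endif

/* Declaration and definition share the same annotation. */
int WIN16_EXPORT demo_init(void);

int WIN16_EXPORT demo_init(void)
{
    return 0; /* stand-in for real initialization work */
}
```

Callers simply write `demo_init()`; on the 16-bit build the compiler emits a far call with the Pascal convention, and on a modern compiler it is an ordinary C call.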
Everything I've said above generally applies to calling the Windows API as well; after all, the Windows API is implemented as calls into the DLLs that are part of Windows.
Segment too large
The biggest limitation I ran into with porting WolfSSL to 16-bit Windows was the 64KB maximum size of segments—each Code and Data segment can only be a maximum of 64KB. This is a challenge when you're compiling a TLS library with support for dozens of cipher suites, and various protocols both commonly and rarely used.
Thankfully, we're not limited to 64KB of code or data in total for our DLL. 16-bit compilers have a concept of "memory models", named Small, Medium, Compact, Large, and Huge. The major compilers for the platform, whether from Microsoft, Borland, or Open Watcom, support these models as a compile flag; the model determines whether the compiler uses near or far pointers by default for code and data, and which runtime libraries get linked in. I decided to use the Large memory model, which allows for multiple code segments and multiple data segments in a program or DLL.
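| Model | Code Segments | Data Segments |
|---|---|---|
| Small | One | One |
| Medium | Multiple | One |
| Compact | One | Multiple |
| Large | Multiple | Multiple |
| Huge | Multiple | Multiple (single data objects may exceed 64KB) |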
To save on memory in general, I started by turning off as many compile-time feature flags in WolfSSL as I could without jeopardizing its ability to connect to most modern TLS servers. This includes features used by insecure cipher suites (e.g. MD5, DES, RC4) as well as other features like DTLS and server support I wouldn't be using for a simple TLS client.
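For a sense of what this trimming looks like, wolfSSL can read its compile-time options from a `user_settings.h` header when `WOLFSSL_USER_SETTINGS` is defined. The fragment below is an illustrative sketch in that style, not the configuration used for this port: the option names are real wolfSSL macros, but which ones this build actually sets is not stated in the post.

```c
/* user_settings.h: illustrative trimming for a client-only TLS build
 * (a sketch, not this port's actual settings) */
#ifndef USER_SETTINGS_H
#define USER_SETTINGS_H

/* Drop algorithms used only by weak or obsolete cipher suites. */
#define NO_MD4
#define NO_MD5
#define NO_DES3
#define NO_RC4
#define NO_DSA

/* Drop protocol features a simple client does not need. */
#define NO_OLD_TLS          /* no TLS 1.0/1.1 */
#define NO_WOLFSSL_SERVER   /* client side only */
#define NO_PSK
/* DTLS is opt-in (WOLFSSL_DTLS), so it is simply left undefined. */

/* Keep what modern servers expect from a TLS 1.2 client. */
#define HAVE_TLS_EXTENSIONS
#define HAVE_SNI
#define HAVE_SUPPORTED_CURVES
#define HAVE_ECC
#define HAVE_AESGCM

#endif /* USER_SETTINGS_H */
```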
I quickly found that even that wasn't enough: in the Large model, each object file's code goes into its own code segment, so each object file is limited to 64KB of code. Every time an object file hit that limit, compilation failed with "Segment too large". In particular, WolfSSL's internal.c is a whopping 1.25MB of source (too large for even GitHub to render), and it broke the build every time I enabled a feature I needed.
I was able to get that file down to a size that would fit into two segments by moving code around into two files, internal.c and internal2.c—elegant, I know! This was the most agonizing and time-consuming part of this whole process; I was in a loop of testing the code to see if I had enough features to connect to a server, enabling a feature, seeing "Segment too large", and moving more code while trying not to break the intricate dependencies between functions, or inadvertently changing the meaning of the nested ifdef macros in the code. Eventually, it compiled.
Even with WOLFSSL.DLL compiled, I ran into a strange "Access Denied" error when loading the DLL with the LoadLibrary Windows API call. After much agony, I figured out that I needed to turn off debugging information in the generated code and change the target processor in the compiler options from the default (Intel 8086) to the 80286.
Once everything compiled, I whipped together a quick test program that hits Qualys SSL Labs' browser capability tester and downloads the results to a file.
I could debug what was going on (especially with missing features) by using Wireshark to decode the "Client Hello" message, which is the message that a TLS client sends to the server upon connection advertising its capabilities.
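Wireshark does this decoding for you, but the layout is simple enough to walk by hand. The sketch below (function name hypothetical, minimal bounds checking, and a Client Hello that fits in a single record assumed) skips the 5-byte record header, the 4-byte handshake header, the 2-byte client version, the 32-byte random, and the variable-length session ID, then reports how many cipher suites the client advertises and copies out their codes.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: parses a TLS record holding a Client Hello and
 * copies up to `max` cipher suite codes into `out`. Returns the number
 * of suites the client advertised, or -1 if the buffer is not a
 * Client Hello this sketch can parse. */
int client_hello_cipher_suites(const uint8_t *rec, size_t len,
                               uint16_t *out, size_t max)
{
    size_t pos, sid_len, cs_len, count, i;

    if (len < 5 + 4 + 2 + 32 + 1)
        return -1;
    if (rec[0] != 0x16 || rec[5] != 0x01) /* handshake record, client_hello */
        return -1;

    pos = 5 + 4 + 2 + 32;   /* record hdr, handshake hdr, version, random */
    sid_len = rec[pos];
    pos += 1 + sid_len;     /* skip the session ID */

    if (pos + 2 > len)
        return -1;
    cs_len = ((size_t)rec[pos] << 8) | rec[pos + 1]; /* length in bytes */
    pos += 2;
    if (cs_len % 2 != 0 || pos + cs_len > len)
        return -1;

    count = cs_len / 2;     /* each suite is a 2-byte code */
    for (i = 0; i < count && i < max; i++)
        out[i] = (uint16_t)((rec[pos + 2 * i] << 8) | rec[pos + 2 * i + 1]);
    return (int)count;
}
```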
My initial builds were missing the Server Name Indication (SNI) extension, which lets a client tell the server which host name it wants, so that one IP address can serve multiple TLS websites with different certificates; many servers now require it. With SNI added, and after some more code moving to internal2.c, I was in business.
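The extension itself is small. As a sketch of its wire format from RFC 6066 (the helper name `encode_sni_extension` is hypothetical; in WolfSSL, a client requests SNI with `wolfSSL_UseSNI()` once support is compiled in), these are the bytes that end up in the Client Hello for a host name:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: writes a server_name extension (RFC 6066) for
 * `host` into `out`. Returns the number of bytes written, or 0 if the
 * name is too long or `out` is too small. */
size_t encode_sni_extension(const char *host, uint8_t *out, size_t cap)
{
    size_t name_len = strlen(host);
    size_t list_len, ext_len, total;

    if (name_len > 0xFFFF - 5)
        return 0;
    list_len = 1 + 2 + name_len; /* name_type + name length + name */
    ext_len = 2 + list_len;      /* list length field + list */
    total = 4 + ext_len;         /* extension type + length + data */
    if (total > cap)
        return 0;

    out[0] = 0x00; out[1] = 0x00;                   /* type: server_name (0) */
    out[2] = (uint8_t)(ext_len >> 8);  out[3] = (uint8_t)ext_len;
    out[4] = (uint8_t)(list_len >> 8); out[5] = (uint8_t)list_len;
    out[6] = 0x00;                                  /* name_type: host_name */
    out[7] = (uint8_t)(name_len >> 8); out[8] = (uint8_t)name_len;
    memcpy(out + 9, host, name_len);
    return total;
}
```

For `example.com` (11 bytes), this produces a 20-byte extension: 4 bytes of type and length, then a 16-byte body holding a 14-byte server name list.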
After downloading the Qualys results on Windows 3.11 for Workgroups and opening the resulting file in Internet Explorer 3, voilà:
You can see our build of WolfSSL supports a healthy number (2) of the finest cipher suites, plus a bunch of less secure ones, which is more than enough for most websites to talk to us!
Bonus: TLS 1.3
Supporting TLS 1.2 was enough for most use cases, but WolfSSL has TLS 1.3 built in, and it would be awesome if our Windows 3.1 apps could take advantage of it too. TLS 1.3 makes significant changes to the protocol compared to TLS 1.2, and replaces TLS 1.2's 37 cipher suites with 5 secure ones.
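For reference, the five TLS 1.3 cipher suites defined in RFC 8446 (Appendix B.4) fit in a small lookup table. In TLS 1.3 a suite names only the AEAD cipher and the hash used for key derivation; key exchange and signature algorithms are negotiated separately through extensions. The table and function names below are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

struct tls13_suite {
    uint16_t code;
    const char *name;
};

/* The five cipher suites defined by RFC 8446 (Appendix B.4). */
static const struct tls13_suite tls13_suites[] = {
    { 0x1301, "TLS_AES_128_GCM_SHA256" },
    { 0x1302, "TLS_AES_256_GCM_SHA384" },
    { 0x1303, "TLS_CHACHA20_POLY1305_SHA256" },
    { 0x1304, "TLS_AES_128_CCM_SHA256" },
    { 0x1305, "TLS_AES_128_CCM_8_SHA256" },
};

/* Hypothetical helper: returns the suite name for a TLS 1.3 code,
 * or NULL if the code is not a TLS 1.3 suite. */
const char *tls13_suite_name(uint16_t code)
{
    size_t i;

    for (i = 0; i < sizeof tls13_suites / sizeof tls13_suites[0]; i++) {
        if (tls13_suites[i].code == code)
            return tls13_suites[i].name;
    }
    return NULL;
}
```

A TLS 1.2 code such as `0xC02F` returns `NULL` here, since TLS 1.3 does not reuse the older suites.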
After playing around with the compiler flags, and moving even more code around to avoid "Segment too large" errors, I finally got my build of WolfSSL to compile with TLS 1.3 enabled. However, I ran into an issue where a check decides that received encrypted data is "too large" even though it is only 1 byte long. For now, I've commented the check out, and it seems to work in real-world situations, but I would love to understand why this is happening.
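One general way a tiny length can trip a "too large" check on a 16-bit target is unsigned wraparound. The sketch below illustrates that class of bug only; it is an untested hypothesis, not a diagnosis of this issue. It assumes a length held in a 16-bit unsigned variable and a 16-byte AES-GCM authentication tag, and the function names are hypothetical. Subtracting the tag length from a 1-byte value wraps around to 65521 instead of going negative, and an upper-bound check then rejects it.

```c
#include <stdint.h>

#define AEAD_TAG_LEN        16u        /* AES-GCM tag size in TLS */
#define MAX_TLS13_PLAINTEXT (1u << 14) /* 16384 bytes, RFC 8446 */

/* Hypothetical: computes a plaintext length the way 16-bit code might,
 * storing the result in a 16-bit unsigned variable, so 1 - 16 does not
 * become -15; it wraps modulo 65536. */
uint16_t payload_len_16bit(uint16_t record_len)
{
    return (uint16_t)(record_len - AEAD_TAG_LEN);
}

/* Hypothetical upper-bound check that the wrapped value then fails. */
int payload_too_large(uint16_t record_len)
{
    return payload_len_16bit(record_len) > MAX_TLS13_PLAINTEXT;
}
```

A robust version checks that the length is at least the tag size before subtracting.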
Here's a screenshot of the Qualys page showing TLS 1.3 working, with the TLS_AES_128_GCM_SHA256 cipher suite:
What's the use of a library if you can't use it? Check out WinGPT, an AI Assistant for Windows 3.1, which makes use of the WolfSSL port to connect directly to the OpenAI API servers.
I am providing the modified source code for WolfSSL here under the GNU General Public License (GPL) v2, the license WolfSSL is distributed under. As noted above, this code with my modifications is not secure or reliable, and there is no warranty. You should definitely not use it for anything other than testing for your own entertainment. You can find the full GPL v2 license here.
You will also find that I have not figured out a way to integrate with the build tools (like CMake) that the rest of WolfSSL uses, or made changes in a way that will be easy to integrate back into the main source tree. If someone wants to make 16-bit Windows a supported architecture for WolfSSL, that would be great—I haven't got there yet.
WolfSSL for Windows 16-bit + WinGPT 1.0 source code (14M)
-  The closest I found is Didiet Noor's prior work getting mbed TLS working on Windows NT 3.x and Windows 95, but Windows NT 3.x, like Windows 95, is a 32-bit OS.
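-  This model applies to Real Mode. In Protected Mode, the segment part is a selector that indexes a descriptor table holding each segment's base address, and physical addresses can be wider (24 bits on the 80286). Read the Wikipedia article for more details.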