A harmless example comes from httplib, where an if / elif statement had tests for strings and for unicode strings. The conversion tool turned both of them into tests for strings. The code looked like this:
if isinstance(buf, str):    # regular strings
    # do something
elif isinstance(buf, str):  # unicode strings
    # do something else
In this case, the second branch could be deleted. In other cases, the effects were harmful. If you passed a bytes object as the body argument in an HTTP request (passing form params for a POST request is a common case), the bytes object would be converted via str() to a string.
>>> body = b"key=value"
>>> str(body)
"b'key=value'"
That is, str() uses repr() to convert bytes to a string. That's simply incorrect.
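To make the difference concrete, here is a minimal sketch that contrasts str() on a bytes object with an explicit decode. The variable names and the choice of ASCII as the encoding are assumptions made for illustration; they are not taken from httplib.

```python
body = b"key=value"

# With no encoding argument, str() falls back to repr() for bytes,
# so the result carries the b prefix and the quotes.
wrong = str(body)

# Decoding explicitly yields the text the caller meant to send.
right = body.decode("ascii")

# str() also accepts an encoding, in which case it decodes
# instead of calling repr().
also_right = str(body, "ascii")

print(wrong)
print(right)
```

The repr-style string is still a valid str, so nothing raises an error; the request simply goes out with the wrong payload.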
It will take a long time to sort out all of these problems. We don't have a lot of experience from application developers who are using Python 3.0, so we have to invent solutions as we go along. We're likely to make mistakes or at least make sub-optimal API decisions.
I can think of two things that would help us make progress.
First, we ought to organize a systematic effort to review the standard library. How many of the libraries have plausible tests that exercise strings and bytes? For example, the json library was carefully tested with strings and unicode in Python 2.x. Those have all been converted to strings, so now we have a thorough set of tests for strings and none at all for bytes.
Second, we need to collect a set of best practices for writing libraries that support bytes and unicode. A typical pattern is that bytes get sent on the wire. (Wires, almost by definition, send bytes.) The applications that use the wire usually want to deal with strings, which means they need to have some way to specify an encoding to use when sending to or reading from the wire. We could start by collecting all the patches and bug fixes that have gone into Python 3.1 to fix string and bytes problems with 3.0.
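One possible shape for that pattern is sketched below: strings are encoded at the point where data goes onto the wire, bytes pass through untouched, and the caller names the encoding. The helper names encode_body and decode_body, and the UTF-8 default, are hypothetical choices for this sketch and not part of any standard library API.

```python
def encode_body(body, encoding="utf-8"):
    """Return bytes suitable for the wire (hypothetical helper)."""
    if isinstance(body, bytes):
        # Already bytes: send as-is rather than guessing at a conversion.
        return body
    if isinstance(body, str):
        # Text: encode with the encoding the caller asked for.
        return body.encode(encoding)
    # Refuse anything else instead of silently calling str() on it.
    raise TypeError("body must be str or bytes, not %s" % type(body).__name__)


def decode_body(data, encoding="utf-8"):
    """Turn bytes read from the wire back into a string (hypothetical helper)."""
    return data.decode(encoding)


wire_data = encode_body("key=value")
text = decode_body(wire_data)
```

Raising TypeError for unexpected types is a design choice in this sketch: it surfaces the kind of mistake described above at the call site rather than on the wire.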