PreparedStatement setBytes() #78

danap · 2015-11-30T21:15:30Z

Hello,

Somewhere between Release 3.8.10.2 and current code the PreparedStatement setBytes()
has been changed that results in what appears to be twice the byte size being stored in the
database. I'm not sure if getBytes() is the issue. This results in corrupt BLOB type being
stored.

A cursory look appears it maybe around the changes in the Kaufmann, (character > U+FFFF)
commit with the JDBC3PreparedStatement. I suppose it might also have to do with the native
library. Currently I'm using the Linux/64bit one. I have not tested on Windows or 32 bit machines.

danap.

xerial · 2015-11-30T21:55:14Z

Thanks for reporting. This sounds serious. Could you give me a sample query (and code) to reproduce this issue? Thanks.

danap · 2015-12-01T06:10:54Z

Hello,

The test case code I have created to provide is not duplicating the behavior like the application
code. The only difference that i can determine at this time is the temporary HashMap that is used
to store the bytes for the user before election to dump to a file.

Without changing anything other then the build jars, 3.8.10.1 to the current exempts the
behavior described above. I will not be able to investigate further for a day or two. It may be
a problem with the current code jar build created on my end. The environment is JRE 7.

I would not classify this as a BUG at this time without further work on my end.

danap.

danap · 2015-12-03T18:13:05Z

Hello,

After further analysis, I have determined this is not related to the
PreparedStatements setBytes(). Rather in seems the ResultSet getBytes()
retrieval.

I have created a test case that demonstrates the issue. I do not
believe this would be defined as a BUG, but a new inconsistency introduced
after build 3.8.10.1/2 in retrieving the field data more then once using
getString() then getBytes().

The application uses what has been a general safe getString() to
determine if the content is NULL. If not normally processing takes
place with reading a second time the content of the field with getBytes().
In so doing the returned bytes are 2X the size of the stored BLOB.

I have ran into a similar issue with Oracle, HyperSQL, and MSAccess.
MSAccess only allows ResultSets to be read once, then they become null
I believe. None of the other databases my application supports restrict
or deform the data in retrieving a ResultSet more then once.

The application has been updated to insure this is no longer an
issue, but I would say in the sqlite-jdbc it could snare other users
who may also read ResultSet more then once. I learned long ago what
you don't think the user will do in, will often be done.

Test Case: http://dandymadeproductions.com/temp/
File: SQLite_JDBC.java

Thank you for your work with the sqlite-jdbc driver. By the way the
pull request I had opened for Date, Time, Literals, #33 can be closed.
The current code is no longer causing an issue, the Date changes must
have fixed the problem I was having with my application.

danap.
Dana M. Proctor
MyJSQLView Project Manager.

mkauf · 2015-12-12T13:08:49Z

I agree that this is a change in behavior of sqlite-jdbc. Sorry about that. But keep in mind that JDBC drivers don't need to support getBytes() for strings at all. Look at the table B-6 in the JDBC 4.1 specification ("Use of ResultSet getter Methods to Retrieve JDBC Data Types")

Behavior when reading a string before commit a4cf82d:

getBytes(); => the result is in the database's native encoding (see "PRAGMA encoding")
getString(); getBytes(); => always UTF-8

Behavior when reading a string after commit a4cf82d:

getBytes(); => the result is in the database's native encoding (see "PRAGMA encoding")
getString(); getBytes(); => always UTF-16

Maybe it would make sense to always return UTF-8 when getBytes() is called on a string?

getBytes(); => UTF-8
getString(); getBytes(); => UTF-8

I have created a pull request for this.

danap · 2015-12-12T19:35:43Z

Hello Michael,

The test case was only using strings to simplify the code to demonstrate the
behavior. As I said the code for my application only uses getString() to perform
a preliminary test for NULL content on all data types. I use getBytes() for BLOB
content, and getString(); getBytes(); => UTF-8 would just again produce malformed
data when working with Binary content, images and such.

The issue I think is the ResultSet read, first with getString() then getBytes()
does not seem to be resetting the storage container to reflect the last call
getBytes()?

In each of the databases my application supports, other then the ones mentioned,
for instance PostgreSQL I can before the first call for testing a NULL entry with
getString() and if not NULL then use the correct data type call, BLOB for instance
getBytes() and not have malformed data returned. This is a non-issue for me since
I have modified the code to just read the BLOB data type content once, which it
really should have been doing to begin with for efficiency.

I would not recommend the new pull request => getString(); getBytes(); => UTF-8

danap.

mkauf · 2015-12-13T13:16:25Z

getString(); getBytes(); => UTF-8 would just again produce malformed
data when working with Binary content, images and such

In my proposed pull request, the conversion to UTF-8 is done only for TEXT. Binary data (BLOB) is never modified. SQLite does not perform any conversions that would produce malformed data. See https://sqlite.org/c3ref/column_blob.html

The question is: What character encoding shall getBytes() use for TEXT? I propose to always use UTF-8, because that's what your original code expected. It's also the native format for databases created by sqlite-jdbc.

danap · 2015-12-14T18:32:13Z

Hello Michael,

In my proposed pull request, the conversion to UTF-8 is done only for TEXT. Binary
data (BLOB) is never modified. SQLite does not perform any conversions that would
produce malformed data. See https://sqlite.org/c3ref/column_blob.html

I'm sure it does not since I have already demonstrated with the test case that the
data stored is unmodified. Its the retrieval of the data as defined by the
test case provided.

In the test case I provided there is a method testBlobs() with a commented call
to writeRead_BinaryFile(). It calls for the selection of a file for import to
fill the BLOB field defined by the database. Run it, choose an image for data
and you will see the same result, with a call to getString() then getBytes(),
returns the same failed results. By the way a call to getObject() then getBytes()
does not produce a failure.

As far as BLOB fields I have no idea what a user may wish to place in them
in a database. They may arbitrarily just place UTF-8 characters instead of
data outside that range. All I know is when an inital call to getString()
then getBytes(), returns data that does not seem to match the stored content.

As indicated, somewhere after sqlite-jdbc 3.8.10.1/2 this behavior turned up.
It matters not to me at this point since it no longer effects my application,
MyJSQLView, because I now only do a single getBytes() call for a defined BLOB
field. Works fine! I find it to be nothing more then an oddity to be dealt with
like my other supported databases and their JDBC drivers.

The question is: What character encoding shall getBytes() use for TEXT? I propose to
always use UTF-8, because that's what your original code expected. It's also the
native format for databases created by sqlite-jdbc.

Then that is the answer to your question for a TEXT defined field.

danap.

mkauf · 2015-12-14T22:36:14Z

Hi danap,

thank you for the example program and the explanations. I have been able to reproduce the problem. "getString(); getBytes();" on a BLOB indeed changes the blob's data.

I will try to create a bugfix.

Regards,
Michael

danap · 2015-12-15T19:39:23Z

Hi danap,

thank you for the example program and the explanations. I have been able
to reproduce the problem. "getString(); getBytes();" on a BLOB indeed changes
the blob's data.

I will try to create a bugfix.

Regards,
Michael

Hello Michael,

I started thinking overnight about this issue. I did look at the sqlite-jdbc
code at one point when I had an issue with temporal values.

It appeared to me that the driver does a test on content by a typeof to
determine the type. I realize that even though a table may define a column
to be BLOB anything can go in the column, so the driver needs to figure that
out before processing. This is the issue I never liked about WEAK TYPE scripting
languages and in this case SQLite data storage. It provides a very versatile
ability to store whatever you want in the 'VARIABLE', but processing can be
hazards.

I briefly looked to see if my application could try to do a typeof, but
did not think it was possible or feasible. I rely on the defined Database
Meta Data TYPE & CLASS results. So if a user defines the table TYPE to be
BLOB I have to assume that is the type they will store there, so thereby
use getBtyes(). The use of a TEST, generic getString(), for NULL has been safe
in most of the database to not throw an error for any column, but poor choice.
Since it seems the driver in this case thinks I may wish to get TEXT for the
column and that is what stored there? So appears the issue I guess.

I really think the current code is acceptable, from my limited knowledge
since as I have said seen this type of behavior in a couple of other databases.
I now use a single read of getBytes() test for NULL then process accordingly.

The only possible issue that I can point out is, after reviewing the
SQLite docs this morning, TEXT can be stored as "(UTF-8, UTF-16BE or UTF-16LE)"
Java uses UTF-16 for strings, TEXT, and therefore if a column TYPE is defined
as BLOB and the user puts TEXT there then it may not be limited to UTF-8
only.

Dana M. Proctor
MyJSQLView Project Manager

…unexpected data type conversions. Don't use modified UTF-8, see xerial#61 . Convert Java Strings to "unmodified" UTF-8 and vice versa. Improve out-of-memory handling. Fixes xerial#78

unexpected data type conversions. Don't use modified UTF-8, see xerial#61 Convert Java Strings to "unmodified" UTF-8 and vice versa. Improve out-of-memory handling. Fixes xerial#78

xerial added the bug Something isn't working label Nov 30, 2015

mkauf mentioned this issue Dec 12, 2015

ResultSet.getBytes(): Return strings in UTF-8 #82

Closed

mkauf mentioned this issue Dec 16, 2015

Use UTF-8 functions of SQLite to avoid unexpected data type conversions #84

Merged

gitblit closed this as completed in #84 Jul 6, 2016

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PreparedStatement setBytes() #78

PreparedStatement setBytes() #78

danap commented Nov 30, 2015

xerial commented Nov 30, 2015

danap commented Dec 1, 2015

danap commented Dec 3, 2015

mkauf commented Dec 12, 2015

danap commented Dec 12, 2015

mkauf commented Dec 13, 2015

danap commented Dec 14, 2015

mkauf commented Dec 14, 2015

danap commented Dec 15, 2015

PreparedStatement setBytes() #78

PreparedStatement setBytes() #78

Comments

danap commented Nov 30, 2015

xerial commented Nov 30, 2015

danap commented Dec 1, 2015

danap commented Dec 3, 2015

mkauf commented Dec 12, 2015

danap commented Dec 12, 2015

mkauf commented Dec 13, 2015

danap commented Dec 14, 2015

mkauf commented Dec 14, 2015

danap commented Dec 15, 2015