This repository has been archived by the owner on Jan 22, 2019. It is now read-only.

JsonGenerationException: Unrecognized column when trying write a subset of a POJO's fields #37

Closed
mjball opened this issue Apr 29, 2014 · 10 comments

@mjball

mjball commented Apr 29, 2014

It does not seem possible to specify the columns to write when using CsvMapper to write a POJO. If I have a class with two fields:

class Example {
    String a;
    String b;
}

and a schema which only specifies one of those fields:

CsvSchema schema = CsvSchema.builder()
  .addColumn("a")
  .build();

then

Example example = /* ... */;
new CsvMapper().writer().withSchema(schema).writeValueAsString(example);

throws an exception like

com.fasterxml.jackson.core.JsonGenerationException: Unrecognized column 'b': known columns: ["a"]

But of course I only intended to write one of those two columns! Is there a fix or a writer configuration which can solve this?

@mjball changed the title from "JsonGenerationException: Unrecognized column when trying to specify columns to write" to "JsonGenerationException: Unrecognized column when trying write a subset of a POJO's fields" on Apr 29, 2014
@cowtowncoder
Member

With Jackson the idea is usually to filter out properties at higher level (using @JsonIgnore on property; @JsonIgnoreProperties on class, or using JSON Views or JSON Filters).
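As a rough illustration of the annotation-based filtering mentioned above, the sketch below marks `b` as ignored at class level so that only `a` ever reaches the CSV generator. It assumes Jackson 2.x with `jackson-dataformat-csv` on the classpath; the class name `IgnoreByAnnotationDemo`, the public fields, and the constructor are assumptions added for the example and are not part of the original report.

```java
import com.fasterxml.jackson.annotation.JsonIgnoreProperties;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

public class IgnoreByAnnotationDemo {

    // Assumption: fields are public so that Jackson auto-detects them.
    // The annotation removes "b" before the CSV generator ever sees it.
    @JsonIgnoreProperties({"b"})
    public static class Example {
        public String a;
        public String b;

        public Example(String a, String b) {
            this.a = a;
            this.b = b;
        }
    }

    // Serializes one Example using a schema that declares only column "a".
    public static String write(String a, String b) throws Exception {
        CsvSchema schema = CsvSchema.builder()
                .addColumn("a")
                .build();
        return new CsvMapper()
                .writer(schema)
                .writeValueAsString(new Example(a, b));
    }

    public static void main(String[] args) throws Exception {
        System.out.print(write("x", "y"));
    }
}
```

The drawback of this approach is that the exclusion is fixed on the class, so it cannot vary per schema.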

But I can see how ability to ignore values based on schema would be useful as well.
I'll have to think of how easy this would be to do: problem is that currently schema is mostly just used for mapping names to/from column index, so I don't know if filtering is doable. But the idea makes sense.

@cowtowncoder
Member

Come to think of it, of course it should be possible to add a simple flag in CsvSchema to say "ignore any unknown properties". And/or allow exclusion of specific names, if more granular control is desired. There is some work involved in adding state (to support ignoring structured values), but it should be quite doable.

@curtis628

I have this same exact issue. Any way to accomplish this?

@cowtowncoder
Member

Right now it unfortunately requires suppression by filtering, so that no attempt is made to write b; examples include use of @JsonView, @JsonFilter and such:

http://www.cowtowncoder.com/blog/archives/2011/02/entry_443.html

but I hope to implement something similar to what Avro module has (generator Feature.IGNORE_UNKNOWN).

@cowtowncoder
Member

Created this:

FasterXML/jackson-core#164

for feature. Still need to add support here however.

@rob-baily

In case anyone is interested in some code to use in the meantime, see http://stackoverflow.com/questions/11791353/convert-java-object-to-csv. It uses the filter solution generically, based on the schema. I added a customization on the JacksonAnnotationIntrospector so it can be used without annotating objects, but that is of course optional if you are able and want to annotate.

One thought I have on the proposed solution is that it seems like for serializing it should be the default for CSV to only use the fields in the schema since you are specifying what the output looks like in the schema. Not sure why it would make sense by default to record an error when the object has more fields than are in the CSV schema.

@cowtowncoder
Member

Support for JsonGenerator.Feature.IGNORE_UNKNOWN was added in 2.5.0.
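A minimal sketch of how that feature might be switched on for the case from the original report, assuming Jackson 2.5 or later (2.x API) with `jackson-dataformat-csv` on the classpath. The class name `IgnoreUnknownDemo`, the public fields, and the constructor are assumptions made for the example.

```java
import com.fasterxml.jackson.core.JsonGenerator;
import com.fasterxml.jackson.dataformat.csv.CsvMapper;
import com.fasterxml.jackson.dataformat.csv.CsvSchema;

public class IgnoreUnknownDemo {

    // Assumption: public fields so that Jackson auto-detects both properties.
    public static class Example {
        public String a;
        public String b;

        public Example(String a, String b) {
            this.a = a;
            this.b = b;
        }
    }

    // Writes only the columns named in the schema; "b" is dropped
    // by the generator instead of triggering an exception.
    public static String writeSubset(String a, String b) throws Exception {
        CsvMapper mapper = new CsvMapper();
        // Ask the generator to skip properties the schema does not know about.
        mapper.configure(JsonGenerator.Feature.IGNORE_UNKNOWN, true);

        CsvSchema schema = CsvSchema.builder()
                .addColumn("a")
                .build();
        return mapper.writer(schema).writeValueAsString(new Example(a, b));
    }

    public static void main(String[] args) throws Exception {
        System.out.print(writeSubset("x", "y"));
    }
}
```

Unlike annotation-based filtering, this keeps the POJO untouched and lets each schema decide which columns are written.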

As to why it defaults to an exception: my experience has been that quietly swallowing problems leads to harder-to-diagnose problems, compared to over-eager exceptions. The latter is easily recognized and configuration can be changed. The reverse tends to lead to strange cases where some data just disappears, and is later found to be due to things like naming mismatch.
But this is one area where tastes vary and opinions differ.

@cowtowncoder added this to the 2.5.0 milestone on Apr 20, 2015
@cowtowncoder
Member

Also: for tracking purposes, #50 is the issue that was originally marked as implemented for 2.5.0.

@rob-baily

Thanks for linking these. I did try it and it did not generate an error so that is helpful.

I guess we have a differing opinion (like you said) of what the error should be. I think for a CSV schema it should really be reversed, so that columns in the schema that do not map to the object are flagged as errors, as opposed to fields in the object that are not in the schema raising an error. For example, based on the object above:

  • Object properties: a, b
  • Schema columns: a - ERROR RAISED
  • Schema columns: a, b, c - NO ERROR RAISED

IMHO the second scenario is really the case where an error is being swallowed, since we defined a column that doesn't exist but the code continues on and just ignores it. I could see this happening in cases where a schema might be shared or a property is moved from one object to another. I still think the first condition is not really an error because, like XML, if we define a subset of what is available we are allowed to go on. Otherwise, as new properties are added, it invalidates any previous code that was geared towards a specific list. I can see the issue being that if you always start from the object it is backwards from how schema processing works to define what is in that subset.

In any event I wanted to clarify a little more and see if it made more sense. :) Great module and a nice natural fit for those using Jackson so thank you!

@cowtowncoder
Member

@rob-baily There is also one practical challenge, in that jackson-databind has no knowledge of (CSV) schema (it does pass a FormatSchema as requested, but has no understanding of its possible meaning), so that this information is only really available at low level streaming. This complicates handling in some cases, and makes it impractical (if not impossible) to integrate parts to check for likely incompatibility.

Interesting note on second problem, thanks!

However... it might be possible to add some functionality to validate compatibility ("is this CsvSchema compatible with this POJO?"). This would allow catching "impossible column" style problem(s).
Not sure if it'd make sense; if you think (and esp. if you have other suggestions for it), feel free to file a separate issue.
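To make the "is this schema compatible with this POJO?" idea concrete, here is a purely hypothetical sketch using only the JDK. It is not an existing Jackson API: the class `SchemaCompatibilityCheck` and the method `unmatchedColumns` are invented for illustration, and it assumes property names equal declared field names, which ignores getters, annotations, and naming strategies that real Jackson introspection would honor.

```java
import java.lang.reflect.Field;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class SchemaCompatibilityCheck {

    // Same shape as the POJO from the original report.
    public static class Example {
        String a;
        String b;
    }

    // Returns the column names that match no declared field of the type.
    // Assumption: a property is simply a declared, non-synthetic field.
    public static List<String> unmatchedColumns(Class<?> type, List<String> columns) {
        Set<String> properties = new HashSet<>();
        for (Field field : type.getDeclaredFields()) {
            if (!field.isSynthetic()) {
                properties.add(field.getName());
            }
        }
        List<String> unmatched = new ArrayList<>();
        for (String column : columns) {
            if (!properties.contains(column)) {
                unmatched.add(column);
            }
        }
        return unmatched;
    }

    public static void main(String[] args) {
        // Schema is a subset of the POJO: nothing to report.
        System.out.println(unmatchedColumns(Example.class, Arrays.asList("a")));           // []
        // Schema names a column the POJO cannot supply.
        System.out.println(unmatchedColumns(Example.class, Arrays.asList("a", "b", "c"))); // [c]
    }
}
```

A check like this would flag the "impossible column" case (`c`) while letting a schema that is a subset of the object's properties pass.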
