Strategies for dealing with quote marks from Excel files for Text Import
I recently ran into an issue with some text imports from an Excel Tab Separated Value (TSV) text file, and am wondering if anyone else can clarify or provide some suggestions for workflow.
I've been aware that Microsoft Excel has a very annoying behavior when exporting to a TSV. Specifically, if you are importing data that was stored in the cell with a semicolon or comma, it will "encapsulate" the data by putting plain quote marks at the lead and end of the data string. For example if the value in the column you want to import into the Keyword field is:
boy;water buffalo;stream;farmhand
You will find in the Microsoft Excel exported TSV:
"boy;water buffalo;stream;farmhand"
When matching up the Metadata field with the Content in the columns within the Metadata Import dialog/window, I've noticed that if I use the arrow buttons next to the Start from row # value on the right side, the value in the Content field changes, and I don't see the quotes appear in that field.
So, it appears that FotoStation seems to "know" that these quote marks are not required and ignores them. However, there are other fields which also have/allow multiple values, like Creator (Byline), and the "Creator Contact Info" fields for email address, phone and web URL's which also allow multiple entries and are separated by semicolons. Unfortunately, these field values (when viewed in the Content column) do show these quote marks, and when the text is imported they are included as part of the value in the embedded metadata.
One solution is simply to export a TSV from Google Sheets, as they don't do this to the file when exporting.
I've not been able to find any hidden switch or option within Microsoft Excel to prevent this behavior (if there is, please let me know). I did find the following article, but this seems to be written for Windows users https://excel.tips.net/T002237_Getting_Rid_of_Extra_Quote_Marks_in_Exported_Text_Files.html
If I'm only dealing with one metadata field, and I know that data shouldn't have any quote marks, I can open the resulting TSV in a plain text editor, and do a "Find and Replace" operation to find all " marks, and replace them with <nothing>. However, if it's a larger group of data, and may also include fields that might have actual quote marks in the text, doing the find and replace is going to remove those "actual quotes", as well as the ones that Excel inserted into my data.
One thought I had was to do a Find and Replace in Excel first, and replace all the quote marks in the data (the ones that are supposed to be there) with some other character (perhaps an * or # - something not commonly used). Then export the TSV from Excel. Open that file in a plain text editor, and do a Find and Replace for all the quotes that Excel inserted (for whatever reason). Then do a Find and Replace for all the fake quotes (like the asterisk) and replace them with new quote marks, before saving, and then using that in FotoStation.
I know my solution at present will be to just switch to Google Spreadsheets, or save out the Excel, open in Google Sheets and export TSV.
Though if the programmers are looking for feature requests, it would be great if there was a way to have FotoStation deal with these "extra Excel quote marks" in data columns other than the Keywords field the same way.
Short of that, are there other options to consider for those who are stuck with Excel (especially for very large data sets).
thanks,
David
-
Since FotoStation is using the semi-colon as default separator for items in a bag field, we understand that this becomes a problem after editing the data in Excel. We have now implemented improvements where we trim away the quote marks surrounding the whole text. On export, quotes are now encoded as " this way we avoid removing quotes which should be in the text.When importing from other systems, just replace quotes (") in the original text with " before the import.
Please sign in to leave a comment.
Comments
2 comments