converting csv files
Thread poster: Jennifer Barnett

Jennifer Barnett  Identity Verified
France
Local time: 00:24
Dutch to English
+ ...
Jan 16, 2016

I've a problem with a csv file.

The project came from an agency that uses a web platform for posting jobs and communicating the progress of the various stages. Rates are on the low side!

The csv could not fully be loaded into OmegaT; only the first column content appeared.
I saved a copy as xlsx which did not load at all.
I then saved a copy as html and this loaded OK, but came with a lot of extra characters that were not visible in the csv file. I carefully included all of these in the translation to avoid changing the formatting, which slowed the process down enormously. E.g.:
;;;;;;;;;;
 ;;;;;;;;;
;;;;;;;;;;
 ;;;;;;;;

and after each sentence too. In retrospect, that was probably the only important format sign

The translated file was then saved as csv and also html. The extra characters did not appear on the html file, but the table form has disappeared, leaving plain text, but laid out as the original.
After a few deadline extensions, it was finally sent off on 1 Jan and once more on 5 Jan (again expressing my worries) when the PM mailed to ask for it. Later that day I was also asked to undertake more jobs for the same client, but I declined due to the laborious work (for me) at such a low rate.

I had explained my difficulties to the PM at the outset and was just told not to change any of the formatting characters.

After a week or so, the web platform informed me that my translation had been reviewed and accepted and I was paid. Last night I received the following email.

"Dear Translator,
Due to big issues with the translated file "translation KK" that happened during the translation, we are unable to pay you for your work. The CSV file has been modified, making it unusable for the customer and obliges us to have it translated again, which is a big loss of time and money for us.
I'm sorry, "

So my first question is, what was my problem with the csv file? OmegaT is supposed to accept this kind of file.

Second question, is xlsx the way to go? I also saved the translation as xlsx and that looks the same as the original csv file. (I don't recall why I didn't just send that as well, but I was exhausted by the end.) I just want to check on this before I send it as a hopeful rescue mission.

Third question, is it really so difficult for IT people to reconvert/edit out the superfluous characters, or must they also do it character by character?

Fourth question: is it unreasonable of me to expect that they should have contacted me immediately to sort things out - eg, requesting my TMs to facilitate the second translation?

Fifth question, is this a case of shared responsibility in that I warned of the problem at the outset and, just before I carried out my translation check, I asked if all the extra ',,,,,,::::' etc were really needed in the text? I was just told not to change any format stuff; ie, it seems the PM neither knew nor asked the client.

For the time being, I certainly won't be accepting any more csv files. Sorry if this is too long ...

Any help would be greatly appreciated.

Cheers,
Jennifer


 

Susan Welsh  Identity Verified
United States
Local time: 18:24
Member (2008)
Russian to English
+ ...
Ask the Yahoo group Jan 16, 2016

https://groups.yahoo.com/neo/groups/OmegaT/conversations/messages

More people monitor that group and someone will be able to help you.


 

Didier Briel  Identity Verified
France
Local time: 00:24
Member (2007)
English to French
+ ...
Generic CSV is not a recognised format Jan 16, 2016

Jennifer Barnett wrote:

So my first question is, what was my problem with the csv file? OmegaT is supposed to accept this kind of file.

No.

OmegaT only accepts specific CSV files for localising Magento (the filter is called "Localisation CSV Magento CE").

Second question, is xlsx the way to go?

Probably, or any compatible spreadsheet format: LibreOffice/OpenOffice's .ods or Excel's XML Spreadsheet 2003.

It's hard to be sure without actually seeing the file. When in doubt (with a new, unfamiliar format), one should always try to do a roundtrip with the client, i.e., translate a few sentences and then send the partially translated file back to the client to check that everything is OK.

Didier


 

Dan Lucas  Identity Verified
United Kingdom
Local time: 23:24
Member (2014)
Japanese to English
Confused Jan 16, 2016

I'm not sure exactly what your situation is. You say "the web platform informed me that my translation had been reviewed and accepted and I was paid" then you say that the client says "we are unable to pay you for your work". Were you paid, or not?
Jennifer Barnett wrote:
So my first question is, what was my problem with the csv file? OmegaT is supposed to accept this kind of file.

Very hard to say without looking at it. CSV and other text formats can cause problems, but in this case - two European languages - it shouldn't have been that complex. If the CSV is regular and valid i.e. each line contains an equal number of fields, each separated by commas, there shouldn't (in theory) have been a problem.
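
A minimal Python sketch of that regularity check, using only the standard csv module. The function names and the sample data are made up for illustration, and the default comma delimiter is an assumption that may not match the actual file.

```python
import csv
import io


def field_counts(text, delimiter=","):
    """Return the number of fields in each record of a delimited text."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    return [len(row) for row in reader]


def is_regular(text, delimiter=","):
    """True if every record has the same number of fields."""
    return len(set(field_counts(text, delimiter))) == 1


# Hypothetical sample data, not taken from the real job.
regular = "key 1,source 1,target 1\nkey 2,source 2,\n"
ragged = "key 1,source 1,target 1\nkey 2,source 2\n"

print(is_regular(regular))  # True
print(is_regular(ragged))   # False
```

An empty trailing field still counts as a field, which is why the second record of the first sample has three fields and not two.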

Incidentally, the software I use (Studio 2014) or another solid CAT tool like CafeTran would have allowed you to import a valid CSV file without problems.

I don't fully understand your reference to ";;;;;;;;;;" characters. They may have had something to do with the formatting but, again, not possible to say without an explanation of the file format. If you didn't change or delete them the output should also have been fine.
Second question, is xlsx the way to go?

It seems unlikely to me that this would have made any difference. The client seems to be claiming that the structure of the CSV was materially altered, which might have happened with any format of file.
Third question, is it really so difficult for IT people to reconvert/edit out the superfluous characters, or must they also do it character by character?

It shouldn't be, if the original CSV structure is still in place. Probably half an hour to knock up a Perl or Python script to fix things, depending on what's broken (if anything). But what are the superfluous characters to which you refer and how did they get there?
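
As a rough idea of what such a script might start with, here is a Python sketch that compares the record structure of the original and the translated file and reports where they diverge. It assumes a semicolon-delimited file; the function names and sample rows are invented for illustration and say nothing about what was actually wrong with the delivered file.

```python
import csv
import io


def structure(text, delimiter=";"):
    """Field count of every record, in order."""
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    return [len(row) for row in reader]


def structural_differences(original, translated, delimiter=";"):
    """List (record number, original count, translated count) where they differ.

    A count of None means the record is missing from that file.
    """
    a = structure(original, delimiter)
    b = structure(translated, delimiter)
    diffs = []
    for i in range(max(len(a), len(b))):
        x = a[i] if i < len(a) else None
        y = b[i] if i < len(b) else None
        if x != y:
            diffs.append((i + 1, x, y))
    return diffs


# Hypothetical data: the translation introduced a stray delimiter in record 2.
original = "id;tekst;\n1;Hallo wereld;\n2;Tot ziens;\n"
translated = "id;tekst;\n1;Hello; world;\n2;Goodbye;\n"

print(structural_differences(original, translated))  # [(2, 3, 4)]
```

An empty result would suggest the field layout survived the translation, although it would not prove that the content of each field is intact.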
Fourth question: is it unreasonable of me to expect that they should have contacted me immediately to sort things out?

No, but before asking us you should have demanded and received a detailed technical explanation of what it is that they allege you did wrong. They can't simply decide not to pay, unilaterally.
Fifth question, is this a case of shared responsibility in that I warned of the problem at the outset and just before I carried out my translation check, I asked if all the extra ',,,,,,::::' etc were really needed in the text? Was just told not to change any format stuff;

Did you change the format stuff and/or the ,,,;;; or did you leave them untouched?

Regards
Dan





[Edited at 2016-01-16 18:12 GMT]


 

Milan Condak  Identity Verified
Local time: 00:24
English to Czech
csv file is not supported in OmegaT Jan 16, 2016

Jennifer Barnett wrote:

I've a problem with a csv file.
The csv could not fully be loaded into OmegaT; only the first column content appeared.

Jennifer



You can see the list of filters for supported files:

http://www.condak.cz/p-preklady.php

A csv file is not a supported type of file, although some specific types of csv can be translated.

Milan


 

Jennifer Barnett  Identity Verified
France
Local time: 00:24
Dutch to English
+ ...
TOPIC STARTER
response to your responses Jan 16, 2016

Thank you all for responding.

Didier:
You're right about sending a sample, as this PM clearly did not understand the consequences of my warnings. Thanks for that.

I looked again at the OmegaT Help and, indeed, csv is not listed under the supported file types. I no longer remember the reference that gave the impression that csv was supported, but I thought it was the OmegaT Help.

Dan:
Yes, I was paid. This email came a few days after I was paid.

The ,,,, and ;;;; characters appeared in the OmegaT window in the source text, but not in the target htm document. The source text was a htm version of the original csv file. When the htm target text was then saved as a csv file, the extra characters reappeared in the csv version. I submitted both the htm and the csv versions.

I posted this in preparation for discussing the issue with the agency on Monday.

I did not dare change anything added by the conversion.


In conclusion, I just hope that the xlsx version (no extra characters) will save the day. I'll report back on the result if it turns out to be useful to OmegaT users.

Kind regards,
Jennifer


 

Dan Lucas  Identity Verified
United Kingdom
Local time: 23:24
Member (2014)
Japanese to English
Avoid in future Jan 17, 2016

Jennifer Barnett wrote:
Dan:
Yes, I was paid. This email came a few days after I was paid.

Well, that's the main thing: you've been paid. After all, you completed the translation.
When the htm target text was then saved as a csv file, the extra characters reappeared in the csv version. I submitted both the htm and the csv versions.

I think I misunderstood your earlier post. I suppose that in saving as html OmegaT could have deliberately introduced some extraneous characters to the file rather than just saving the file as plain text with an html extension?

Saving the file when you first downloaded it as an xlsx file would probably have been the right way to go, although if you don't have Excel to hand then it makes it more difficult to check the content of the file.

Probably the lesson is not to take on jobs like these where technical issues loom as large as the act of translation itself. If you have no experience in editing or creating csv files and if your software doesn't support csv natively then you're setting yourself up for a difficult time.

To be honest, it doesn't sound like a worthwhile client to me. Professional clients have a better understanding of the formats they are using, give better instructions in the first place and don't make unilateral attempts to claw back payment. Best of luck anyway.

Regards
Dan


 

Jennifer Barnett  Identity Verified
France
Local time: 00:24
Dutch to English
+ ...
TOPIC STARTER
definitely avoid in future Jan 17, 2016

Thank you Dan.

Obviously the cause of the problem was unknowingly trying to force non-compatible CAT software to do a task it can't. A case of a faulty nut behind the wheel.

I've certainly learnt my lesson as far as csv files go. Never again!

As for the agency, I only took on the job because I currently really need work and they do pay quickly. Other 'simple' projects had gone smoothly. Seems that it's especially the tricky projects that show up the lack of professional support offered by cheapo agencies.

Cheers,
Jennifer


 

Samuel Murray  Identity Verified
Netherlands
Local time: 00:24
Member (2006)
English to Afrikaans
+ ...
Jennifer's post fixed Jan 17, 2016

Jennifer Barnett wrote:
These were all carefully included in the translation to avoid changing the formatting, but which slowed the process down enormously. Eg.:
;;;;;;;;;;
<p> </p>;;;;;;;;;
;;;;;;;;;;
<p style=""text-align: justify;> </p>;;;;;;;;

and <p> after each sentence too.


 

Samuel Murray  Identity Verified
Netherlands
Local time: 00:24
Member (2006)
English to Afrikaans
+ ...
Some more comments Jan 17, 2016

Jennifer Barnett wrote:
So my first question is, what was my problem with the CSV file? OmegaT is supposed to accept this kind of file.


There is no single format called just "CSV". CSV is a very loosely defined format, and different implementations of CSV are all slightly different.

OmegaT can handle one very specific CSV format, namely "Magento CE Locale" CSV. It is likely that your CSV was not a Magento CSV file, but OmegaT would have attempted to load it anyway, if you had a check mark next to "Magento CE Locale" in the File Filters dialog.

CSV is a bad, bad format to use for anything, unless clients that use it take steps to ensure that the files can be read without problems by anyone who must edit those files. It's tempting for a client to think that Excel can open CSV files, but even different versions of Excel interpret CSV files differently.

Second question, is XLSX the way to go?


It all depends on whether your version of Excel can open the client's version of the CSV file in such a way that no information is lost, and on whether Excel can save the final file as the correct type of CSV again.

Unfortunately, if you have accepted the job as a "CSV job", then it is your responsibility to ensure that whatever programs you use to open the CSV file do not break the CSV file. This is why you should not accept CSV jobs if you're not intimately familiar with the CSV format *and* with the client's implementation of it (which, by the way, is almost never the case... so most CSV jobs are gambles anyway).

Third question, is it really so difficult for IT people to reconvert/edit out the superfluous characters ...?


Yes, because they don't speak your language.

And, yes, because those IT people probably don't know the CSV format well enough to "hack it" either, i.e. it's not unlikely that they use some specialised software to edit the CSV, and only compliant CSV files will work on their program.

Fourth question: is it unreasonable of me to expect that they should have contacted me immediately to sort things out -- e.g. requesting my TMs to facilitate the second translation?


It depends on whether they know that you're working with a TM system.

It also depends on whether they trust you to be able to fix the problem, if you hadn't been able to deliver a problem-free version in the first place.

Fifth question, is this a case of shared responsibility in that I warned of the problem at the outset and just before I carried out my translation check, I asked if all the extra ',,,,,,::::' etc were really needed in the text? Was just told not to change any format stuff; ie, it seems the PM didn't know nor asked the [end-]client.


You could try that approach.

Did the client tell you to use OmegaT?

Did the client tell you in which program to edit the CSV file?



[Edited at 2016-01-17 18:44 GMT]


 

Samuel Murray  Identity Verified
Netherlands
Local time: 00:24
Member (2006)
English to Afrikaans
+ ...
@Jennifer and @Dan Jan 17, 2016

Dan Lucas wrote:
If the CSV is regular and valid i.e. each line contains an equal number of fields, each separated by commas, there shouldn't (in theory) have been a problem.


Jennifer, note how Dan defines a "valid" CSV file as one in which each line (called a "record") has an equal number of fields. This is what the RFC 4180 specification of the CSV file format also says, but RFC 4180 is but one specification of CSV (and it's a relative late-comer, too).

See also:
https://en.wikipedia.org/wiki/Comma-separated_values#History
https://en.wikipedia.org/wiki/Delimiter-separated_values

In fact, the many cases of ";;;;;;" in your CSV file lead me to suspect that it was not "comma" delimited but "semi-colon" delimited.
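
One way to test such a suspicion is to try each likely delimiter and see which one splits every record into the same number of fields. The Python sketch below does that with the standard csv module; the function name, the candidate list and the sample text are assumptions for illustration only.

```python
import csv
import io


def guess_delimiter(text, candidates=(",", ";", "\t")):
    """Pick the candidate that splits every record into the same number
    (more than one) of fields. Returns None if no candidate does."""
    best, best_width = None, 1
    for delim in candidates:
        reader = csv.reader(io.StringIO(text, newline=""), delimiter=delim)
        widths = {len(row) for row in reader if row}  # skip blank lines
        if len(widths) == 1:
            width = widths.pop()
            if width > best_width:
                best, best_width = delim, width
    return best


# Hypothetical sample: commas appear inside the text, semicolons separate fields.
sample = "id;text;note\n1;Hello, world;\n2;Goodbye;\n"

print(repr(guess_delimiter(sample)))  # ';'
```

This is only a heuristic: a file that is irregular under every candidate returns None and would need to be inspected by hand.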

Incidentally, the software I use (Studio 2014) or another solid CAT tool like CafeTran would have allowed you to import a valid CSV file without problems.


The key word here is "valid", and the question then becomes "valid according to which specification?".

I remember when I worked at a localisation firm that I had endless trouble reconciling different CSV dialects. I recall one CSV editor whose rule was "if any field is quoted, then all fields must be quoted" and another CSV editor whose rule was "if a field does not require to be quoted, then it must not be quoted". Naturally these two programs produced mutually incompatible CSV files. I also recall dealing with differences in line breaks between CSV files produced on Linux machines versus CSV files produced on Windows machines.
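
The two quoting rules and the two line-ending conventions can be reproduced with Python's standard csv module. This is only an illustration of how the same data serialises differently; the helper name and the sample row are made up, and the editors mentioned above are not modelled here.

```python
import csv
import io


def write_csv(rows, quoting, lineterminator):
    """Serialise rows with the given quoting rule and line ending."""
    buf = io.StringIO(newline="")
    writer = csv.writer(buf, quoting=quoting, lineterminator=lineterminator)
    writer.writerows(rows)
    return buf.getvalue()


rows = [["key 1", "plain text", 'text with "quotes", and a comma']]

# "Quote every field", with Windows-style line endings.
quote_all = write_csv(rows, csv.QUOTE_ALL, "\r\n")
# "Quote only when required", with Unix-style line endings.
quote_min = write_csv(rows, csv.QUOTE_MINIMAL, "\n")

print(repr(quote_all))
print(repr(quote_min))
```

Both outputs parse back to the same row with a tolerant reader, but a program that insists on one convention would reject the other.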

Incidentally, the software I use (Studio 2014) or another solid CAT tool like CafeTran would have allowed you to import a valid CSV file without problems.


Any CAT tool capable of translating a generic CSV file would need some way to define the columns, and OmegaT has no such capability.

Out of interest, Dan, can CafeTran import these two CSV examples appropriately (both of which are valid in terms of RFC 4180)?

key 1,source text 1,target text 1
key 2,source text 2,
key 3,,target text 3

(i.e. after translation, the translation should be in column 3, and column 2 should contain the source text)

key 1,source text 1,source text 2
key 2,source text 3,
key 3,,source text 4

(i.e. after translation, the translation should be in column 2 and 3, and the source text should not be present in the file)

Trados 2015 is capable of translating example 1, but only if you edit the CSV file definition before creating a project. Trados 2015 does not ask the user (e.g. with a preview function) which column contains which type of text. Trados 2015 is entirely incapable of translating example 2, because you can define only one target text column in Trados 2015's CSV file definition.
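
For comparison, handling example 1 outside a CAT tool could look roughly like the Python sketch below. It assumes that column 2 is the source, column 3 is the target, every record has three fields, and that existing target text should be left alone; the function name and the stand-in "translation" (upper-casing) are purely illustrative.

```python
import csv
import io

EXAMPLE_1 = (
    "key 1,source text 1,target text 1\n"
    "key 2,source text 2,\n"
    "key 3,,target text 3\n"
)


def fill_targets(text, translate, source_col=1, target_col=2):
    """Fill empty target cells from the source column; leave everything else untouched."""
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text, newline="")):
        if row[source_col] and not row[target_col]:
            row[target_col] = translate(row[source_col])
        writer.writerow(row)
    return out.getvalue()


# Stand-in "translation" function, purely for illustration.
result = fill_targets(EXAMPLE_1, lambda s: s.upper())
print(result)
```

The column numbers have to be supplied by whoever knows the file, which is exactly the information a generic CSV filter would need to ask for.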

Samuel


[Edited at 2016-01-17 18:48 GMT]


 

Samuel Murray  Identity Verified
Netherlands
Local time: 00:24
Member (2006)
English to Afrikaans
+ ...
@Jennifer Jan 17, 2016

Jennifer Barnett wrote:
The source text was an HTM version of the original CSV file.


Which program did you use to convert the CSV file to HTM format?

When the HTM target text was then saved as a CSV file, the extra characters reappeared in the CSV version.


Which program did you use to convert the HTM file to CSV format again?


[Edited at 2016-01-17 18:26 GMT]


 

Dan Lucas  Identity Verified
United Kingdom
Local time: 23:24
Member (2014)
Japanese to English
Fair points Jan 17, 2016

Samuel Murray wrote:
Jennifer, note how Dan defines a "valid" CSV file as one in which each line (called a "record") has an equal number of fields. This is what the RFC 4180 specification of the CSV file format also says, but RFC 4180 is but one specification of CSV (and it's a relative late-comer, too).

Samuel, thank you for those additional points. In other contexts, when cleaning data for use in other apps, I have regularly dealt with irregular text formats with different numbers of fields and different separators, but that's not relevant to translation or CAT tools.

In translation I have never needed (thank goodness) to tackle anything other than the "RFC 4180" format you mention above, and that only twice, for small jobs.

When I tried out CafeTran I threw an RFC 4180 type file at it and it seemed to work fine, but it was a simple test. Perhaps one of our enthusiastic CT users can comment on that.

If I were faced with a delimited text file with a complex format (possibly like that encountered by Jennifer?) I would probably, if I could, reshape it into a more regular format using R or a combination of Python with pandas, rather than rely on the CAT tool to get it right.
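
A sketch of that kind of reshaping, written with Python's standard csv module rather than pandas so that it runs without extra packages. It assumes a semicolon-delimited input with records of uneven length and pads the short ones; the function name and sample data are invented for illustration.

```python
import csv
import io


def regularise(text, delimiter=";"):
    """Pad short records with empty fields so every record has the same width,
    then write the result as comma-delimited, minimally quoted CSV."""
    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
    width = max((len(r) for r in rows), default=0)
    out = io.StringIO(newline="")
    writer = csv.writer(out, lineterminator="\n")
    for r in rows:
        writer.writerow(r + [""] * (width - len(r)))
    return out.getvalue()


# Hypothetical ragged input: 3, 2 and 4 fields per record.
ragged = "id;text;note\n1;Hello, world\n2;Goodbye;later;extra\n"

print(regularise(ragged))
```

The regular copy would be the one to translate; converting back to the client's original layout is a separate step that depends on knowing that layout.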

But my first instinct in such circumstances would be to walk away.

Regards
Dan


 

Samuel Murray  Identity Verified
Netherlands
Local time: 00:24
Member (2006)
English to Afrikaans
+ ...
Good advice: walk away Jan 18, 2016

Dan Lucas wrote:
But my first instinct in such circumstances would be to walk away.


This is good advice, for CSV. Either walk away or ask the client to send you the text in a different format, e.g. XLS or XLSX. Let the client be in charge of conversions.


 

Jennifer Barnett  Identity Verified
France
Local time: 00:24
Dutch to English
+ ...
TOPIC STARTER
@Samuel Jan 18, 2016

Thank you Samuel for your expert information.
Obviously I should have posted before taking on the job! But of course I would not have taken it at all if I had not mistakenly believed that OmegaT could handle csv files. Just want to make that perfectly clear.

And thanks for fixing my quote: just shows how tricky csv can be.

Samuel: Did the client tell you to use OmegaT?
No. That would be unusual, wouldn't it?

Samuel: Did the client tell you in which program to edit the CSV file?
No. In fact the instruction accompanying the job was, "no instructions from the client."

Samuel: Which program did you use to convert the CSV file to HTM format?
The csv file opened in Excel so that was the file (xls) used for converting into html which was used for the translation. The xls file couldn't be loaded by OmegaT. Yesterday I finally realised this was because it was not xlsx (years since I translated an xls file, so was not up to date - deep sigh). I then saved the original source file (.csv in Excel) as xlsx and this loaded into OmegaT without any apparent extra characters. Perhaps this would have avoided the problem, perhaps not.

Which program did you use to convert the HTM file to CSV format again?
I imported it into Excel then saved it as csv.

[Edited at 2016-01-18 10:10 GMT]


 