Pages in topic:   [1 2] >
How to decompile Encyclopaedia Britannica software program?
Thread poster: Reed James

Reed James
Chile
Local time: 23:57
Member (2005)
Spanish to English
Jan 9, 2014

Hello. I am the owner of the Encyclopaedia Britannica on CD. It is a great application, but I would like to be able to extract all of the articles from the program in order to view them separately and/or index them. Now, each article is an HTML file, so it seems like it's doable. However, I took a look at the data files, and had no clue as to how to find each individual article.

Any suggestions?

Thanks!

Reed



Natalie  Identity Verified
Poland
Local time: 04:57
Member (2002)
English to Russian
+ ...

Moderator of this forum
Violation of the copyright and user's license Jan 9, 2014

Please check the license of your software, I am sure you will find something like this:
"You may not decompile, reverse engineer or disassemble any software..."



Rolf Keller
Germany
Local time: 04:57
English to German
Please clarify your question Jan 10, 2014

Reed D James wrote:

Now, each article is an HTML file, so it seems like it's doable. However, I took a look at the data files, and had no clue as to how to find each individual article.


"The data files" means one HTML file per article? So, you could have Windows' search function search (and index) a folder containing all these files.



Rolf Keller
Germany
Local time: 04:57
English to German
Copyright Jan 10, 2014

Natalie wrote:

Please check the license of your software, I am sure you will find something like this:
"You may not decompile, reverse engineer or disassemble any software..."


That's right. But in some cases local law supersedes contractual regulations.

Here in Germany, decompiling etc. is explicitly allowed (see section 69e UrhG), **IF** the decompiling is necessary in order to make something interoperable with the copyrighted product. So, if that "something" is any other lookup software, you may decompile the copyrighted software in order to gain information on how to make that other lookup software usable.



Reed James
Chile
Local time: 23:57
Member (2005)
Spanish to English
TOPIC STARTER
I want to know where all the individual HTML files are stored Jan 10, 2014

Rolf Keller wrote:

"The data files" means one HTML file per article? So, you could have Windows' search function search (and index) a folder containing all these files.


You would think that there would be a folder with each individual HTML file that could just be copied into another folder. However, that's not the way it works. I'm just asking how to extract this data from the program so I can do that.

I'm not looking to profit from this data, I just want to be able to do what I want with it instead of having to search in the Britannica program.

[Edited at 2014-01-10 11:23 GMT]



Natalie  Identity Verified
Poland
Local time: 04:57
Member (2002)
English to Russian
+ ...

Moderator of this forum
Copyright again Jan 10, 2014

Reed D James wrote:
I'm not looking to profit from this data, I just want to be able to do what I want with it instead of having to search in the Britannica program.


The question is not whether you make a profit; the question is whether decompiling the program is legal in your country and in accordance with the user license you hold. If it is not, this cannot be discussed in this forum.


FarkasAndras
Local time: 04:57
English to Hungarian
+ ...
It is Jan 10, 2014

Natalie wrote:

Reed D James wrote:
I'm not looking to profit from this data, I just want to be able to do what I want with it instead of having to search in the Britannica program.


The question is not whether you make a profit; the question is whether decompiling the program is legal in your country and in accordance with the user license you hold. If it is not, this cannot be discussed in this forum.

Let's assume that it is. Companies put all sorts of ludicrous conditions in the EULA, a lot of which won't stand up to scrutiny or challenge in court in many jurisdictions. Even if there's a "do not decompile" clause in the EULA and it's enforceable in the country in question, decompiling still isn't illegal. It's just an infringement of the contract (the EULA).
It's not our job to police anyone's actions on behalf of Britannica. My approach is that the OP bought the software, it's his to dissect. Of course he is not allowed to distribute the fruits of his labour but he doesn't wish to.

I had a cursory look at the files a couple of years ago, and found that it's not trivial. IIRC I was able to identify the data files but I couldn't extract anything from them. I suspect that it will take some serious hacking skills to extract them. It's not just a big .rar file with HTML or XML files in it or something elementary like that. If you have computer scientist/coder/hacker friends you could ask them to have a look. If you've found the files, you can try and look up the file extension online and try and guess what format it is. You can also post here but I doubt that we can help you.
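
As a rough illustration of "guessing what format it is": many file formats start with a fixed signature (a "magic number"), so reading the first few bytes is often more telling than the file extension. The sketch below is only that, a sketch. The signature table is a small, generic sample, the function names are made up for this example, and nothing here says which format (if any) the Britannica data files actually use.

```python
# A few well-known file signatures ("magic numbers").
# Illustrative and far from exhaustive; not specific to any product.
SIGNATURES = {
    b"PK\x03\x04": "ZIP archive",
    b"Rar!\x1a\x07": "RAR archive",
    b"\x1f\x8b": "gzip stream",
    b"7z\xbc\xaf\x27\x1c": "7-Zip archive",
    b"SQLite format 3\x00": "SQLite database",
    b"%PDF-": "PDF document",
}


def guess_format(header: bytes) -> str:
    """Return a label for the first matching signature, else 'unknown'."""
    for magic, label in SIGNATURES.items():
        if header.startswith(magic):
            return label
    return "unknown"


def guess_file_format(path: str) -> str:
    """Read the first 16 bytes of a file and guess its container format."""
    with open(path, "rb") as handle:
        return guess_format(handle.read(16))
```

If the result is "unknown", the file is probably in a proprietary or compressed layout, which matches the impression that it is not a simple archive.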

By the way, Wikipedia provides dumps of its entire database at no cost, and they are in an open format of course. You can do with them as you please. For instance, you can make glossaries out of them.
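
To give an idea of how approachable those dumps are, here is a minimal Python sketch that lists page titles from a MediaWiki XML export, which could be a first step towards a glossary. The sample data is made up and merely mimics the shape of a dump; the function name `iter_titles` is an assumption for this example, and a real dump would be opened from a (usually compressed) file rather than from an in-memory string.

```python
import io
import xml.etree.ElementTree as ET


def iter_titles(xml_stream):
    """Yield page titles from a MediaWiki XML export, one page at a time."""
    for _event, elem in ET.iterparse(xml_stream, events=("end",)):
        # Tags carry a namespace prefix such as "{http://...}page"; drop it.
        tag = elem.tag.rsplit("}", 1)[-1]
        if tag == "page":
            for child in elem:
                if child.tag.rsplit("}", 1)[-1] == "title":
                    yield child.text
            elem.clear()  # release the page's children; real dumps are very large


# Tiny made-up sample in the same shape as a dump, for demonstration only.
SAMPLE = b"""<mediawiki xmlns="http://www.mediawiki.org/xml/export-0.10/">
  <page><title>Chile</title><revision><text>...</text></revision></page>
  <page><title>Poland</title><revision><text>...</text></revision></page>
</mediawiki>"""

titles = list(iter_titles(io.BytesIO(SAMPLE)))
print(titles)
```

Streaming with `iterparse` instead of loading the whole file matters here, because a full dump will not fit comfortably in memory.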

[Edited at 2014-01-10 12:49 GMT]



Samuel Murray  Identity Verified
Netherlands
Local time: 04:57
Member (2006)
English to Afrikaans
+ ...
Which version? Jan 10, 2014

Reed D James wrote:
I [have] the Encyclopaedia Britannica on CD. ... I would like to be able to extract all of the articles from the program in order to view them separately and/or index them.


Let's assume for the moment that what you want to know is not illegal in your country (we can only find out when we know more about what it is that you're trying to accomplish). If you hadn't used the word "decompile" in your thread title, the copyright police might not have noticed... since what you seem to be asking has nothing to do with the ordinary meaning of the word "decompile".

[From a strictly linguistic point of view, unzipping a zip file is "decompiling" it, but the computer people usually mean something more specific when they say "decompile", which doesn't seem to be what you're trying to do.]

So, which version of Encyclopaedia Britannica do you have? It may be that the different versions have different ways of getting to the content.



Reed James
Chile
Local time: 23:57
Member (2005)
Spanish to English
TOPIC STARTER
2013, the latest version I think Jan 10, 2014

Samuel Murray wrote:

Reed D James wrote:
I [have] the Encyclopaedia Britannica on CD. ... I would like to be able to extract all of the articles from the program in order to view them separately and/or index them.


Let's assume for the moment that what you want to know is not illegal in your country (we can only find out when we know more about what it is that you're trying to accomplish). If you hadn't used the word "decompile" in your thread title, the copyright police might not have noticed... since what you seem to be asking has nothing to do with the ordinary meaning of the word "decompile".

[From a strictly linguistic point of view, unzipping a zip file is "decompiling" it, but the computer people usually mean something more specific when they say "decompile", which doesn't seem to be what you're trying to do.]

So, which version of Encyclopaedia Britannica do you have? It may be that the different versions have different ways of getting to the content.





Anyway, since it's such a touchy subject, I think I'll just figure out a slow way of doing it, i.e. copying and pasting. Thanks for your input.



Michael Joseph Wdowiak Beijer  Identity Verified
United Kingdom
Local time: 03:57
Member (2009)
Dutch to English
+ ...
Use a macro recorder to automate the copy/pasting Jan 10, 2014

If you are going to try copying/pasting, you could also use something like AutoHotkey and an AHK script recorder to do it automatically. If you can break down the steps needed to copy/paste the contents out of the program, it should be possible to create an AutoHotkey script to do it for you.

For example, if all you need to do is:

1. Press the down arrow on your keyboard (or something similar)
2. Ctrl+A
3. Ctrl+C
4. Save the clipboard contents to a separate file with a separate file name (there are AHK scripts for this)
5. Esc
1. Press the down arrow on your keyboard (or something similar)
2. Ctrl+A
3. Ctrl+C
etc.


... you can quite easily automate this.

Then all you have to do is start the script and wait for a few hours.
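
Step 4 (saving each copied text under its own file name) is the part that usually needs a little care, because article titles can contain characters that Windows does not allow in file names. Here is a minimal Python sketch of that one step. It assumes the macro tool hands over the copied text and a title somehow; `safe_filename` and `save_text` are hypothetical helper names, not part of AutoHotkey or any other tool.

```python
import re
from pathlib import Path


def safe_filename(title: str) -> str:
    """Turn an article title into a name that is legal on Windows."""
    cleaned = re.sub(r'[<>:"/\\|?*\x00-\x1f]', "_", title).strip(" .")
    return cleaned or "untitled"


def save_text(title: str, text: str, out_dir: str) -> Path:
    """Write text to out_dir/<title>.txt, adding a counter if the name is taken."""
    folder = Path(out_dir)
    folder.mkdir(parents=True, exist_ok=True)
    base = safe_filename(title)
    target = folder / f"{base}.txt"
    counter = 2
    while target.exists():
        # Two articles can share a title; never overwrite an earlier file.
        target = folder / f"{base} ({counter}).txt"
        counter += 1
    target.write_text(text, encoding="utf-8")
    return target
```

For example, `save_text("Chile", copied_text, "articles")` would write `articles/Chile.txt`, and a second article with the same title would end up in `articles/Chile (2).txt`.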

Michael

http://www.macrocreator.com/
http://www.autohotkey.com/board/topic/79763-macro-creator-v411-automation-tool-recorder-writer/

[Edited at 2014-01-10 13:25 GMT]



Reed James
Chile
Local time: 23:57
Member (2005)
Spanish to English
TOPIC STARTER
It's a little more complicated than that… Jan 10, 2014

Michael Beijer wrote:

If you are going to try copying/pasting, you could also use something like AutoHotkey and an AHK script recorder to do it automatically. If you can break down the steps needed to copy/paste the contents out of the program, it should be possible to create an AutoHotkey script to do it for you.

For example, if all you need to do is:

1. Press the down arrow on your keyboard (or something similar)
2. Ctrl+A
3. Ctrl+C
4. Save the clipboard contents to a separate file with a separate file name (there are AHK scripts for this)
5. Esc
1. Press the down arrow on your keyboard (or something similar)
2. Ctrl+A
3. Ctrl+C
etc.


... you can quite easily automate this.

Then all you have to do is start the script and wait for a few hours.

Michael

http://www.macrocreator.com/
http://www.autohotkey.com/board/topic/79763-macro-creator-v411-automation-tool-recorder-writer/

[Edited at 2014-01-10 13:25 GMT]


Thanks for the tip, Michael. I'm afraid it's a little more complicated than that. You see, there are two columns or sections to this program. You have the article titles on the left, and the article itself on the right. Unfortunately, if you just hit the down arrow in the article title column, the article content in the right column or pane will not refresh; you have to click on the title column for this to happen. So I'm confused as to how to get to each title with its corresponding content without using the mouse, because if I have to program it using mouse clicks, then I really don't know how that's going to work.

BTW: I use Macro Express, a very competent and complete application.



Rolf Keller
Germany
Local time: 04:57
English to German
I'm confused ... Jan 10, 2014

Reed D James wrote:

I want to know where all the individual HTML files are stored


I'm confused. You haven't seen the HTML files yet? How do you know that there are such files?



Samuel Murray  Identity Verified
Netherlands
Local time: 04:57
Member (2006)
English to Afrikaans
+ ...
Okay, an answer Jan 10, 2014

Reed D James wrote:
I [have] the Encyclopaedia Britannica on CD. ... I would like to be able to extract all of the articles from the program in order to view them separately and/or index them. Now, each article is an HTML file...


I don't think every article is an HTML file. It is hypertext, yes, but I see no indication that it is HTML. You can, of course, press Ctrl+S at any time and it will save the current article as an HTML file, but that HTML file will not have any links in it and will lack some of the other features as well.

According to EB's web site, the 2013 DVD contains over 100 000 articles, so it's going to take you a very long time to extract it all. Even if you can extract one article every 10 seconds for 3 hours a day, that is 1 080 articles a day, so it will still take you more than 90 days to do it.
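
The estimate is easy to check with a few lines of Python. The three input figures are the ones from the paragraph above (with "over 100 000" taken as exactly 100 000); the variable names are just for this example. It works out to roughly 93 days, in the same ballpark as the figure above.

```python
ARTICLES = 100_000        # "over 100 000 articles", taken as a lower bound
SECONDS_PER_ARTICLE = 10  # one article every 10 seconds
HOURS_PER_DAY = 3         # three hours of extraction per day

articles_per_day = HOURS_PER_DAY * 3600 / SECONDS_PER_ARTICLE
days_needed = ARTICLES / articles_per_day

print(f"{articles_per_day:.0f} articles per day, {days_needed:.1f} days in total")
# prints "1080 articles per day, 92.6 days in total"
```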

I see your problem with the left column (titles) and the right column (article). There is no keyboard shortcut for loading an article; you have to click it with the mouse. However, I think I have found a way to ensure that you can click the next article every time: if you make the mouse click the first line of the title column and then click the "down" scrollbar at the bottom of the screen once, the title list moves up one title, so you can click the "first" line of the title column again to load the next article, and so on. Fortunately, Ctrl+S works anywhere, so you can press Ctrl+S after you have clicked, and it will save an HTML file. You'll just have to name the HTML files when you save them. And if you've clicked on an article title and press Ctrl+C, it will copy the name of that article, so you can give the "save as" HTML file exactly the same name as the article.
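
Written out as a dry run, the loop described above looks something like the Python sketch below. It does not click or type anything; it only generates the sequence of steps, so the order can be checked before it is rebuilt in a macro tool. The action labels are made up for this example, and the "Enter" step to confirm the save dialog is an assumption about how the dialog behaves.

```python
from typing import Iterator, Tuple

Action = Tuple[str, str]


def article_actions(count: int) -> Iterator[Action]:
    """Yield the UI steps for `count` articles, in the order described above."""
    for _ in range(count):
        yield ("click", "first title row")   # loads the article in the right pane
        yield ("press", "Ctrl+C")            # copies the article title
        yield ("press", "Ctrl+S")            # opens the save dialog for the HTML file
        yield ("type", "<pasted title>")     # name the file after the article
        yield ("press", "Enter")             # confirm the save (assumed)
        yield ("click", "scrollbar down")    # moves the title list up one row


plan = list(article_actions(2))
for step in plan:
    print(step)
```

Each pass ends by scrolling the list one row, which is what keeps the next title under the same mouse position.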

You'll end up with a bunch of HTML files, though, so you'll have to make sure you have an indexing program that can index HTML files.
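
For what it's worth, indexing saved HTML files does not require anything fancy. Below is a minimal Python sketch of a word index over HTML text, using only the standard library. The two "pages" are made-up stand-ins for saved article files, `build_index` and the other names are hypothetical, and the word pattern only handles unaccented letters and digits, so it is a starting point rather than a real indexing program.

```python
import re
from collections import defaultdict
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text content of an HTML document, ignoring the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def html_to_words(html: str) -> set:
    """Return the set of lowercase words found in the document text."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return set(re.findall(r"[a-z0-9]+", " ".join(parser.chunks).lower()))


def build_index(documents: dict) -> dict:
    """Map each word to the sorted list of document names containing it."""
    index = defaultdict(set)
    for name, html in documents.items():
        for word in html_to_words(html):
            index[word].add(name)
    return {word: sorted(names) for word, names in index.items()}


# Two made-up pages stand in for saved article files.
docs = {
    "chile.html": "<html><body><h1>Chile</h1><p>A country in South America.</p></body></html>",
    "poland.html": "<html><body><h1>Poland</h1><p>A country in Europe.</p></body></html>",
}
index = build_index(docs)
print(index["country"])
```

Looking up a word is then a plain dictionary access, e.g. `index.get("europe", [])`.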

Can your macro language do this, or do you want one of us to write it in AutoIt?

I think you'll find that you may THINK that you'll have great benefit from having all the articles in your own database format, but I think ultimately it would be best to simply use the company's own software, or alternatively try to buy a web-based subscription.

However, I took a look at the data files, and had no clue as to how to find each individual article.


I don't think the individual articles are stored as individual files or chunks of extractable data, either on the installation DVD or on your hard drive's installation folder. I doubt if any encyclopedia publisher would be stupid enough to do that.

Samuel


[Edited at 2014-01-10 17:57 GMT]



Alan Halls
Germany
Local time: 04:57
German to English
Legal problem, definitely Jan 11, 2014

For some, possibly a minor point, but if FarkasAndras says:

"My approach is that the OP bought the software, it's his to dissect."

I would have a closer look at the EULA conditions. The general legal situation is that you buy a licence to USE the software. You don't own Microsoft Office, for example, just because you've paid for a licence.

I'm all in favour of open-source software where that is the intention of the people who invent and distribute it. If it is a commercial product, I would tend to leave well alone. I also use EB for my own reference purposes and just leave it running in the background.



Reed James
Chile
Local time: 23:57
Member (2005)
Spanish to English
TOPIC STARTER
That isn't fair Jan 12, 2014

Alan Halls wrote:

For some, possibly a minor point, but if FarkasAndras says:

"My approach is that the OP bought the software, it's his to dissect."

I would have a closer look at the EULA conditions. The general legal situation is that you buy a licence to USE the software. You don't own Microsoft Office, for example, just because you've paid for a licence.

I'm all in favour of open-source software where that is the intention of the people who invent and distribute it. If it is a commercial product, I would tend to leave well alone. I also use EB for my own reference purposes and just leave it running in the background.



The way I see it, if I own a product, then it's mine to do whatever I want with it in the privacy of my own home. If I own a pair of Levi's jeans, and they get old, and I get my scissors out and make a pair of cutoffs out of them, I go ahead and do it. Now, if I were to buy Levi's wholesale, set up a factory and hire people to make cutoffs out of them with industrial machinery, and then sell the cutoffs for my own profit under my own brand, that would be unethical and illegal.

What is the difference between decompiling and/or copying and pasting from a software program and converting a PDF to an editable document to be indexed and searched by the owner? How about buying the set of encyclopedias in print form and making photocopies to take on the road with you? Haven't we all done something like that? What if I had a prodigious memory and I took it upon myself to read and memorize each and every Encyclopaedia Britannica article and then profit immensely from all the knowledge I gained? Isn't that decompiling in a sense?

As for the macros, I found them to be buggy, even when I think the code was legit. Somehow, the computer seized up when the macro told it to save the article, even though the timing was exactly the same as when I did it manually. No matter.

In short, I have closed this discussion, at least on my end. I'm going to go read my Encyclopaedia Britannica instead.

