About The Railfan.net
ERIE / DL&W / E-L RAILROAD MAGAZINE Archive
02-25-14 The project is again in idle mode due to low interest. Steve Mailly sent scans
of the September 1952 issue which has been added today. Thank you Steve!
10-09-12 Just a quick update: the Project is back in gear after a nearly 10-year
hiatus! I have changed the methodology and the newly scanned magazines will allow
copying and pasting text, but I don't guarantee good results because of the way the
OCR process encodes things. Your mileage may vary... - Henry
Thank you to Ron Dukarm, Rich Tubbs, Vince Lee, Paul Tupaczewski and Steve Timko for generously
loaning me issues for scanning that I don't have, and thank you to Steve Mailly for
scanning the September 1952 issue. Most especially, a heartfelt Thank You
to Tammy Priebe for helping me to scan the issues!
Note: Much of this page is horribly out of date; most of it was written back in 2002!
How It All Started
By Henry Priebe
When I first acquired a small collection of Erie Railroad Magazines
I had the idea to put them on the web in some form; I just didn't
know any practical way to go about doing that.
Over the years I have acquired a lot of software and hardware for
digitizing all sorts of things for use on the web. In early 2002
I got wrapped up in a historical map project involving thousands
of PDF files which needed to be combined into volumes of maps. That
started me off working with Adobe Acrobat™.
While going through some of my Erie Magazine issues, after completing
the PDF map project, the little light bulb in my head went off (play
the Tool Time Tim Taylor "OH-NO" sound in your head here) and off to
the flatbed scanner I went.
I had never actually tried to scan anything directly into PDF format,
but I knew it could be done. I had made several abortive attempts at
scanning Penn Central memorabilia into HTML format, but I ended up with
these awful bloated JavaScript nightmare directories of files for
every page and the resulting web pages wouldn't display worth a darn
in older browsers. I frankly wasn't expecting the Erie Magazine page
to come out even halfway decent looking after the OCR phase.
When it finished OCRing the page and displayed the output I was stunned!
Not only did it look very good, but the text was very accurate for the
page I had scanned. I kept going.
After several more hours of scanning and tinkering about I had the first
complete Erie PDF Magazine. I uploaded it to the web server and fired
off an email to the erielack list asking for opinions. I had a bunch more
Eries that I could PDF, but I wasn't going to start converting them all
if nobody cared about looking at them.
Within hours there were several very enthusiastic responses from Erie
fans hoping to see more of them online. It was obvious at that point that
this project was worth doing.
That was April 19, 2002.
Taking on a Life of Its Own
I did six more magazines over that weekend and people kept on reading
them. My fiancée, Tammy, got involved and I taught her how to do the
scanning work on her computer and then I used our home network to copy
them to my machine for the OCR work and graphics processing. It would
take Tam two to three hours to scan an entire issue and then take me
another two to three hours to do the OCR, corrections, graphics work
and uploading. Why did it take longer with both of us working on it?
It turned out that the first magazine I scanned in was one of the
easier issues to process. In addition to dealing with yellowed pages
and poor typesetting alignment by the original publishers, we started
to get picky about the imperfections we would let pass and I started
getting into more involved graphics manipulation to reduce the total
file size of the finished PDF magazines. Some of the early ones were
pretty huge.
Paul Tupaczewski graciously offered to loan us some E-L and Lackawanna
Magazines to broaden the scope somewhat and then Rich Tubbs sent off
his collection, Vince Lee sent the August '46 issue and Jon Liles
sent off a batch. We really have our work cut out for us as several others
have volunteered to loan us several more issues.
About a month into the project we completely changed our production
methodology in order to use more accurate OCR software. Unfortunately
this integrated the scanning and OCR steps so only one of us could do
both of them at any one time. I had been taking more and more time
proofreading, making the OCR corrections and processing the graphics
so Tammy decided she could learn how to do the OCR part of it. That
meant going through another learning curve phase which cost more time,
but the results were definitely worth it.
I got a little fanatical about reducing the overall file size for
each issue, which reached its pinnacle with the April '59 issue
at only 893K for 32 pages. A far cry from the first issues, some of
which were over 8 megs! Despite the fact that the 4/59 issue is very
readable I have since decided to sacrifice some bytes for graphics
quality, especially with the magazine covers. Size does matter: a lot of
people are still using dialup connections and this project has
always been geared toward downloading the PDF from the web server.
(11-24-10 Note: I sorely wish I had kept master full-res image versions of them!)
As my PDF toolbox has increased in size I have cut the production
time for each issue while increasing the output quality. I've learned
a lot of nifty tricks and I think I have found every darned bug in
all the software we use! Compare some of the earlier issues with the
ones done after the end of May; I was amazed at how much better they
look. It currently takes 6 to 8 man-hours to convert a 36-page
magazine into PDF format. (That was then; I could do one in a couple
of hours at higher quality now! - 11-24-10)
Tammy got into a morning routine of scanning pages and I got into
one of finishing them in the afternoons after my regular work was
out of the way. For a few weeks we were really churning out the issues!
Reality Bites
Then our one year old scanner died, we literally wore it out...
Then some of my company web servers started crashing for various reasons,
I built and configured three complete new ones over a 45 day period...
Then it started getting hot, we don't use much A/C at home in the summer...
We just couldn't get back into the groove so we decided to take a
break for the rest of the summer and maybe get a few other things done.
We haven't done that many more magazines since then as quite a few other
priorities keep trumping our hobbies, pesky things such as work, family,
living situations, health issues and other mundane issues.
We have 78 issues completed so far and a few more on hand. Paul T has
another batch of Lackawannas ready to send and several other folks
are ready to ship off other Eries as soon as we are ready. It looks
like this will be a long-term project. We're about 500 man-hours into it
so far...
Update: Availability of time has precluded me from continuing
on this project. It's not completely dead, just hibernating for now. No
matter how hard I try to get things done, it seems like there's always
more to do and more things get shoved onto side tracks. I would like to
restart the project at some time in the future if I can lay my hands on
some good sized collections of company magazines at reasonable prices.
I would even re-do some of the completed issues in higher quality. With
the improved hardware, software and workflow techniques I have now,
it would only take a couple of hours per issue, which would make churning
out a couple of issues a week a piece of cake. - Henry 11-24-10
Notes, Errata, Etc.
While we couldn't have done this without the generous support of
those who have loaned us issues, the biggest thanks goes out to my wife,
Tammy Priebe, for tirelessly scanning issue after issue and
painstakingly bolding by hand name after name in the gossip column of
the Eries, an absolutely maddening process! As she is a genealogist
she felt it was important to get the names in each issue as correct
as possible.
The PDF files are all set up for page by page delivery to web browsers
which have an Acrobat™ plug-in installed. This distributes the
data transfer over time as you browse it page by page which makes it
much more bearable for folks on dialup connections.
We have stopped bolding the names in the gossip column. That will save
about two hours per issue. I cannot get the OCR to do it even close to
reliably and it just isn't worth doing it by hand.
There have been some typos and misspellings in the magazines. I wrestled
with my conscience over whether or not to correct them for the PDF
versions and I haven't been consistent about it. In the beginning I let
them go as the original mistakes which they were. Then I started
correcting them and now I have been leaving them there again, only now
I attach a little PDF comment which can be clicked on for an explanation.
I flag the comments to not print, though; I'd like printouts to be as
close to the originals as possible.
I have tried to keep the railroad photos at a higher quality level than
the social photos; we are doing this for Railfans, after all! Recently
I have started doing the magazine covers at a higher resolution than
article photos so they look nicer. Advertisements get the lowest
quality graphics unless they show something notable.
Some issues have full year calendars in them. I have paid special
attention to those so they will print nicely. If you're like me then
you use old RR calendars for the current year as long as the days
match :)
The typefaces don't match the original magazines perfectly in any issue.
(When we did the conversion all those years ago, internet bandwidth and
storage space were concerns. I now wish we had used Acrobat's image
overlay feature so they all would look perfect even though the file
sizes would be huge. 11-24-10)
The Erie used a funky seriffed typeface which was sort of a cross
between Times and Courier and we have used both to replace it on
occasion. I don't think it really hurts too much and it would be a
HUGE deal to try to make it all match perfectly. Huge like taking
twenty to twenty-four hours per magazine to do!
Right now there isn't a searchable master index to all the issues.
I am keeping an eye out for a practical way to do this.
Individual issues can be searched from within Acrobat using
the little binocular icon. Being able to text search issues is
the main reason we kill ourselves doing the OCR.
I know some folks are using these for genealogy and we've worked
hard to get all the names accurate in the retirements, obits and
gossip columns. If they typoed a name there and I know it I will
correct it. One of the primary reasons we switched software
mid-stream was because the tiny text in the retirement and obit
sections of the Eries would never come out decent and it was
taking an hour or more to proofread and correct those.
Because some issues just aren't very white anymore the photo
quality in those issues has suffered. We had to compromise
with the contrast settings in order to get the text to OCR as
accurately as possible. I wish it was all easier sometimes...
Even though we're hundreds of hours into the project we're
still learning some new methods of improving the conversion
process. Because of that there are little variations from issue
to issue such as having article text half a point size different
or a different typeface being used in a header or advert. I try
to use what looks best at the time for each issue and I think
the consistency has improved, but it still has a ways to go. I'm
not kidding myself that I'll ever do a perfect issue, but I'll
probably keep trying. I'll be an expert at generating PDFs by
the time we're out of issues.
Regretfully, you can't cut-and-paste text from the issues. I have
had problems in the past with people stealing my work, which I give
away, to charge people for it and I have been starting to protect
some things with watermarks and passwords. The PDFs have some
security enabled to prevent this, but you can print them.
Anyone who really wants to copy the info can figure out how to hack
it or just ask me nicely to copy an article to text for them.
If I do find out that someone is using our work for profit or
distributing it without our permission I will seek
legal recourse.
- J. Henry Priebe Jr.
Last Updated Saturday, 21-Feb-2015 10:40:11 EST