Using MS Office To Convert Topps’ PDF Checklists To A Useable Format

Because I track my collection in an Access database, I do a fair amount of copying online checklists in order to get card information into my database.  From playing around with the formatting of these things, I’ve got it down to a science… somewhat.

Even if you don’t use a database, you still might want to take the Topps PDF and convert it into a format which is better suited for whatever it is you want to do with it.

FYI, everything I’m doing is described in terms of MS Office 2010, but you can likely do similar things in other software packages.  I’m not going to elaborate on some of the more detailed Office stuff;  if you have questions just leave a comment and I’ll do my best to explain the details.

…And just to be clear, software to convert PDFs to Word does exist, but I’m not using any of those packages.

Copying PDF info into the clipboard, method #1 – saving the PDF to your computer

When you’ve opened a PDF document in your browser, the simplest way to save it is to right-click on the page, select “Save Page As” and save it as a PDF on your computer. Once it’s saved, you can open it up (assuming you at least have Adobe Reader), hit Ctrl-A to highlight the whole document and then Ctrl-C to copy the whole thing into your clipboard.

The upside of this method is that you’ve always got the original PDF to fall back on.

The downside is that you’ve got a file stored on your computer taking up space… probably not a lot, but it could end up as clutter if you’re not careful.

Copying PDF info into the clipboard, method #2 – copying from your browser

If you want to copy the information without saving the PDF file on your PC, then you want to follow these steps.

One thing tripped me up for a while during this step.  I thought I was copying the entire document, but I later realized that I had missed a large chunk of it.

To copy EVERYTHING, my experience is that you have to do the following:

  • Scroll all the way to the bottom so that all of the text in the PDF has been displayed by your browser.
  • Right-click and click “Select All”.
  • Right-click and click “Copy”.

One way or another, you’ve now got your checklist info in your computer’s clipboard.

Re-Formatting the data using MS Word

At this point, I go to Word and do a Paste/Keep Text Only. Your text would probably look something like this:

TP-1
Byron Buxton
Minnesota Twins®
TP-2
Tyler Austin
New York Yankees®

From here on out, you might want to start breaking the text down so that you’re working with just the base set or a particular insert… otherwise, things get confused and unwieldy. Topps checklist documents touch on each and every parallel and insert, and there’s a crap-ton of information included.

To get things properly separated into individual lines, I do the following:

Bring up the “Find And Replace” box (I do this by hitting Control-H, but there are other ways)

In the “Find What” box, I put

^p

…which Word interprets as being a “paragraph” character.

In the “Replace With” box, I put

^t

…which is Word-ese for a tab character.

I then click on Replace All, and that takes all of your paragraph characters and replaces them with tabs.

You’ve now got a very long single-line document that looks something like this:

TP-1       Byron Buxton    Minnesota Twins®           TP-2       Tyler Austin        New York Yankees®        TP-3       Mason Williams                New York Yankees®        TP-4       Albert Almora    Chicago Cubs®   TP-5       Joey Gallo                Texas Rangers®
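If you'd rather script this step than click through Word, here's a rough Python sketch of the same paragraph-to-tab swap. Python isn't part of my actual workflow, and the function name `paragraphs_to_tabs` is just something made up for this example. Unlike Word's Replace All, it also skips blank lines so you don't end up with a stray tab at the end.

```python
def paragraphs_to_tabs(text):
    """Join every non-blank line of the pasted text into one tab-separated line."""
    # splitlines() copes with both \n and \r\n line endings
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    return "\t".join(lines)


# Sample text as it looks after "Paste/Keep Text Only"
pasted = (
    "TP-1\nByron Buxton\nMinnesota Twins®\n"
    "TP-2\nTyler Austin\nNew York Yankees®\n"
)

single_line = paragraphs_to_tabs(pasted)
print(single_line)
```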

Now I take advantage of the trademark (™) and registered trademark (®) symbols.  Topps ends each line with a team name followed by one of these symbols, and since I have no reason to keep legal characters in a privately-used database, I replace each of those with a paragraph character.

This is done similarly to above, except that we’re going to replace the

®

with a

^p

And click on Replace All.

…Of course, you also do the same with the ™.
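For the script-minded, the two symbol replacements can be sketched in Python as a single split on either character. This is an illustration under my own assumptions, not something I actually run, and `split_on_trademarks` is an invented name. One thing worth knowing: the tab that used to follow each symbol is left dangling at the start of the next record, so the sketch trims it off.

```python
import re


def split_on_trademarks(single_line):
    """Break the long tab-separated line into one record per card."""
    # Either symbol marks the end of a card's team name
    pieces = re.split(r"[®™]", single_line)
    # Trim the leftover tab at each edge and drop the empty piece at the end
    return [piece.strip() for piece in pieces if piece.strip()]


single_line = (
    "TP-1\tByron Buxton\tMinnesota Twins®\t"
    "TP-2\tTyler Austin\tNew York Yankees®\t"
)

records = split_on_trademarks(single_line)
for record in records:
    print(record)
```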

Some of the formatting I do works quite nicely as a Macro – this is essentially a little program that lets you do a series of repetitive tasks all at once.  I won’t get into the details of creating a macro, but you can go into Word help or look up one of the many resources that exist online. I’ve got a macro set up that will replace both of these characters with a ^p with one click of the mouse… OK, fine, several clicks, but it’s still easier.

Getting back to the above example, I prefer to separate the card number prefix (TP) from the card number (1, 2, 3), so I’ll do another Find And Replace and substitute a tab character (^t) for the dash.
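Here's a hedged Python sketch of the prefix/number split, in case you want to script it. The name `split_card_number` is made up for the example. A document-wide replace swaps every dash, including any that show up in a player or team name, so this version only touches the first dash in the card-number field. A card number with no dash at all gets an empty prefix.

```python
def split_card_number(record):
    """Turn 'TP-1<tab>Name<tab>Team' into 'TP<tab>1<tab>Name<tab>Team'."""
    # Peel off the first field (the card number); the rest stays untouched
    card_id, _, rest = record.partition("\t")
    prefix, dash, number = card_id.partition("-")
    if not dash:
        # No dash found: treat the whole field as the number
        prefix, number = "", card_id
    fields = [prefix, number]
    if rest:
        fields.append(rest)
    return "\t".join(fields)


record = "TP-1\tByron Buxton\tMinnesota Twins"
print(split_card_number(record))
```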

Going from Word to Excel

You may be wondering why I use tabs to separate the different parts of the line, when spaces would look pretty similar.  That’s because my next step is to put the information into Excel.  When you copy something with tabs and paste it into an Excel workbook, the tabs are an indicator that the information between the tabs (and paragraph marks) should be put into its own cell.

So when I paste – voila! – I get this:

[Image: Excel example]
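To show how the tabs and paragraph marks map onto rows and cells, here's a small Python sketch that reads the same kind of text with the standard `csv` module. It only mimics what Excel does on paste; it isn't Excel, and the variable names are my own.

```python
import csv
import io

# Two finished records: tabs between fields, a newline after each card
text = (
    "TP\t1\tByron Buxton\tMinnesota Twins\n"
    "TP\t2\tTyler Austin\tNew York Yankees\n"
)

# Each newline starts a new row; each tab starts a new cell in that row
reader = csv.reader(io.StringIO(text), delimiter="\t", quoting=csv.QUOTE_NONE)
rows = [row for row in reader]

for row in rows:
    print(row)
```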

The reason I put it into Excel is that you can import Excel documents directly into Access, where the data becomes an Access table… and then I use SQL to copy the information from that new table into my database tables… but I’m not going to get into that now.
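Just to give a flavor of that last "copy from the new table" step, here's a rough sketch using Python's built-in `sqlite3` module as a stand-in. It is not Access and it is not my actual database: the table names `imported_checklist` and `cards` and all of the column names are invented for this example. The general idea of an INSERT fed by a SELECT is the part that carries over.

```python
import sqlite3

# In-memory SQLite database standing in for the real thing
conn = sqlite3.connect(":memory:")

# Hypothetical table created by the import, with everything stored as text
conn.execute(
    "CREATE TABLE imported_checklist "
    "(prefix TEXT, card_number TEXT, player TEXT, team TEXT)"
)
# Hypothetical permanent table in the collection database
conn.execute(
    "CREATE TABLE cards "
    "(prefix TEXT, card_number INTEGER, player TEXT, team TEXT)"
)

conn.executemany(
    "INSERT INTO imported_checklist VALUES (?, ?, ?, ?)",
    [
        ("TP", "1", "Byron Buxton", "Minnesota Twins"),
        ("TP", "2", "Tyler Austin", "New York Yankees"),
    ],
)

# Copy every imported row across, converting the card number to an integer
conn.execute(
    "INSERT INTO cards (prefix, card_number, player, team) "
    "SELECT prefix, CAST(card_number AS INTEGER), player, team "
    "FROM imported_checklist"
)

copied = conn.execute(
    "SELECT prefix, card_number, player, team FROM cards ORDER BY card_number"
).fetchall()
print(copied)
```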

Like I said, if you have any questions, or if you have feedback along the lines of “Hey, dumb-ass, wouldn’t it be easier to do it this other way?”, please leave a comment!


3 thoughts on “Using MS Office To Convert Topps’ PDF Checklists To A Useable Format”

  1. Perhaps this is in your blog archives, but is there a good way to design an Access interface that makes entering the cards into Access reasonably painless?

    • I have thought about writing about how my database is set up, but it would probably take more time to write and research than I have right now.

      …But it’s good to know that there is interest in such a thing, so I’ll see if I can’t squeeze it in somehow.
