Cardboard box training rig - first book done!

Re: Cardboard box training rig - first book done!

Postby 600dpi » 19 Mar 2012, 09:55

daniel_reetz wrote:That's possible - a clean, binarized (pure black and white) image takes fewer bits to encode than a color or noisy image.


Many thanks for the clarification....
600dpi
 
Posts: 21
Joined: 26 Feb 2012, 05:51
Location: Bristol, UK

Re: Cardboard box training rig - first book done!

Postby 600dpi » 20 Mar 2012, 13:13

Next Stage: Book2

I went for broke and picked a book that is around 365 pages and about as big as my hand held platen can handle.

The "scanning" went well and it took about 35 minutes to do the "Right" then "Left" side pages of the book. But this is where the "Fun" really started!!!!

Before doing this, though, I'd looked at the scrappy pieces of paper on which I'd noted the missing images (where I'd turned two pages at once, or whatever) and the duplicate images. I wrote a quick utility to list the image files in a directory in this format:

( ) img_0001.jpg [ ]
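A minimal sketch of such a listing utility in Python, for anyone wanting to try the same thing. The original utility's language and internals aren't stated in the post, so the function names `list_images` and `format_worklist` are made up for illustration:

```python
from pathlib import Path


def list_images(directory, suffixes=(".jpg", ".jpeg")):
    """Collect the image file names in one directory (non-recursive)."""
    return [
        p.name
        for p in Path(directory).iterdir()
        if p.is_file() and p.suffix.lower() in suffixes
    ]


def format_worklist(names):
    """Return one '( ) name [ ]' line per image, sorted by file name."""
    return [f"( ) {name} [ ]" for name in sorted(names)]
```

Printing `"\n".join(format_worklist(list_images(folder)))` for the folder holding the images would give a list that can be pasted into a word processor and split into columns.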

Pre-scanning and Post-processing Workflow/Issues

So I ran the utility to get a file listing all the JPGs. I loaded this into MS Word, used the "columns" function to get three columns per page, and printed this "Work list" out. I sat down with the first image loaded into IrfanView and clicked through each image in turn. If I spotted duplicates, I decided which was the best and marked ( X ) on the work list against any image selected for deletion. If I spotted that a page was missing, I wrote [ M ] and the page number against the preceding image. I then sat down with the list of deletions and went through all of those.

Back to the scanner to re-do all those missing images (now why this was happening I'm not sure)...

So now I've got additional sequential images. In principle it's easy: knowing the page number, identify the image of the preceding page and give the new image a suffix, something like img_456a.jpg if it needs to come after img_456.jpg, and so on. This is where it all went a bit tits up, since the work list of image names doesn't have page numbers and so forth. In the book I was using there were also large numbers of pages without page numbers, which made identifying them very tricky.
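The suffix scheme can be sketched in a few lines of Python. This is only an illustration of the naming idea, not the tool used here; `retake_names` is a hypothetical helper and it only builds the names, it doesn't rename anything on disk:

```python
import string
from pathlib import Path


def retake_names(preceding, count):
    """Names for `count` re-taken images that should sort straight after `preceding`."""
    if not 1 <= count <= 26:
        raise ValueError("count must be between 1 and 26")
    p = Path(preceding)
    # img_456.jpg -> img_456a.jpg, img_456b.jpg, ...
    return [f"{p.stem}{letter}{p.suffix}" for letter in string.ascii_lowercase[:count]]
```

With a plain character-by-character sort, `img_456a.jpg` falls between `img_456.jpg` and `img_457.jpg`. File managers that use their own "natural" sort order may behave differently, so it's worth checking in whatever viewer is used.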

The bottom line: I found all this phenomenally time consuming. I'd estimate that getting a final, complete set of images takes an additional day's work, possibly more. Last night I had two missing pages, and tracking them down was a PITA.

At this moment in time I see little point in contemplating two cameras, since dropping from 90 to 45 minutes would save a very small percentage of the total time to get to Scan Tailor readiness.

I decided I had a vital list of information missing: a map of the book (analogous to the work list of images above). So I wrote a utility where you enter the title, whether it's the right or left side that's being scanned, the number of introductory (or unnumbered) pages, the first (right or left) numbered page of the book, then the ending pages. When run, it produces a single-column listing (which can be put into columns in MS Word and really needs to be in a monospaced font) similar to:

Title: A Dummies guide to DIY Book Scanning
Side: RIGHT
intro 1 ......................... [ ]
intro 2 ......................... [ ]
page 1 ..................... [ ]
page 3 ..................... [ ]
page n ..................... [ ]
end 1 ......................... [ ]
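A rough Python sketch of how a listing like this could be generated. The parameter names, the explicit last-page argument and the dot-leader width are all assumptions made for the example, not details of the actual utility:

```python
def book_map(title, side, intro_pages, first_page, last_page, end_pages, width=32):
    """Build the lines of a one-sided book map with dot leaders and tick boxes."""
    labels = [f"intro {i}" for i in range(1, intro_pages + 1)]
    # Numbered pages on one side of the book go up in steps of two.
    labels += [f"page {n}" for n in range(first_page, last_page + 1, 2)]
    labels += [f"end {i}" for i in range(1, end_pages + 1)]

    lines = [f"Title: {title}", f"Side: {side.upper()}"]
    for label in labels:
        # Pad with dots so the tick boxes line up in a monospaced font.
        dots = "." * max(3, width - len(label) - 1)
        lines.append(f"{label} {dots} [ ]")
    return lines
```

For example, `book_map("A Dummies guide to DIY Book Scanning", "right", 2, 1, 5, 1)` gives entries for intro 1, intro 2, pages 1, 3 and 5, then end 1.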

The thought was to write descriptions for the intro/ending and other pages that don't have page numbers (to help match up the missing pages). The file names of the pages before the missing pages can also be noted (and perhaps those of the re-taken ones). The plan is that this will greatly speed up this aspect of the procedure.

The other thought was to split the task into two and do all the sorting out for the right hand side of the book before even taking the images of the left hand side. Getting a complete set of 180 images should be much more than twice as fast as doing the same for 360. The "book map" listing can then be used to "sign off" that side before proceeding with the left side.

So going from a 160 page book up to 360 pages has been a real eye-opener!

I would welcome others' thoughts and advice on how to deal efficiently with this aspect of the process....
600dpi
 

Re: Cardboard box training rig - first book done!

Postby dpc » 20 Mar 2012, 15:18

Wow, what a hassle. If it were me, I would try to find out what was the cause of this (CHDK bug/setting or switch bounce) rather than work on an efficient way to determine and reshoot the missing pages and eliminate the duplicates. Have you tried triggering the camera by hand instead of using your remote switch to see if you end up with a similar problem? You may need to add a debounce circuit to your switch.

If you knew the general area where the page numbers were located within each page's image, you could write a program that saved just that clipped area for each page into a separate image and ran that through OCR, then use another program to parse the numerical text output and look for missing/duplicate page numbers. I thought about doing something like this to detect missed/duplicate pages for an automated page turning scanner I was designing, but you shouldn't have to worry about this if you're turning the pages by hand (I would hope).
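A minimal sketch of the checking half of this idea in Python. It assumes the cropping and OCR steps have already produced a list of integer page numbers for one side of the book; those steps are not shown, and `check_pages` is a made-up name:

```python
from collections import Counter


def check_pages(found, first, last, step=2):
    """Compare recognised page numbers against the expected run for one side."""
    counts = Counter(found)
    expected = range(first, last + 1, step)
    missing = [n for n in expected if counts[n] == 0]
    duplicates = sorted(n for n, c in counts.items() if c > 1)
    return missing, duplicates


# Page 5 was skipped and page 3 was shot twice.
print(check_pages([1, 3, 3, 7, 9], first=1, last=9))  # ([5], [3])
```

Pages with no printed number, or numbers the OCR misreads, would show up as false "missing" entries, so the output would still need a human look.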
dpc
 
Posts: 126
Joined: 01 Apr 2011, 18:05
Location: Issaquah, WA

Re: Cardboard box training rig - first book done!

Postby 600dpi » 24 Mar 2012, 14:33

dpc wrote:I would try to find out what was the cause of this (CHDK bug/setting or switch bounce) rather than work on an efficient way to determine and reshoot the missing pages and eliminate the duplicates.

I think I probably need to do both! :D

Process and Procedure Cont...

I didn't like the idea of two separate types of paper Work lists as described above. But the steps involved seemed the right approach. So after further thoughts and a bit of playing about I came up with an application to cut the paper out of the loop (well mostly). This is to be known as "Book Mapper". :lol:

The idea is to use this to speed up getting a good set of images on the SD card prior to copying them to HD for post-processing in Homer etc. You first set the directory of the source image files (which will be on the SD card). You also enter key attributes for the book to create an overview "map":

Image

The "Intro pages" are just counts of all the pages prior to the first numbered page, and similarly for the "Ending pages". You then enter the last numbered page.
It then populates the "Map" list like this:

Image

The logically created "Book Map" and source files are shown in the top two list boxes. You can scroll through the images and identify any duplicates or duff ones for deletion (and "Add to Deletes"). If a match is made between an image file and a book map page, click the "Match" button. These two operations create entries in the lower two lists (Matched Map/Images and Deletion list) and remove the entries from the upper work lists.

Then:
1) The "Delete All" will batch delete all the files in the Deletion list.
2) The "Work list" option will create a list of any pages remaining in the "Map" list (which can be printed off, taken to the scanner for the next cycle etc)...
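The four-list behaviour described above can be modelled roughly as follows. This is a sketch of the bookkeeping only, written in Python with invented class and method names; it says nothing about how the real app is written and it doesn't touch any files:

```python
class BookMapper:
    """Toy model of the two upper work lists and the two lower result lists."""

    def __init__(self, map_pages, image_files):
        self.map_pages = list(map_pages)  # upper list: pages still unmatched
        self.images = list(image_files)   # upper list: images still unreviewed
        self.matched = []                 # lower list: (page, image) pairs
        self.deletes = []                 # lower list: images queued for deletion

    def match(self, page, image):
        """Pair a map entry with an image and take both off the work lists."""
        self.map_pages.remove(page)
        self.images.remove(image)
        self.matched.append((page, image))

    def add_to_deletes(self, image):
        """Queue a duplicate or duff image for the batch delete."""
        self.images.remove(image)
        self.deletes.append(image)

    def work_list(self):
        """Pages still without an image, i.e. what needs re-shooting."""
        return list(self.map_pages)
```

A "Delete All" step would then walk `deletes` and remove each file, and the work list is simply whatever is left in the map once every image has been matched or queued for deletion.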

I haven't used it for real yet and there's tons of stuff left to do (Undelete, Save book map, Load Saved book map, Edit individual book map entries (to add some description or such for pages with no page number) etc etc)....

If anyone has any thoughts on other useful functions, please chime in....

Book 2 Continued.....

This week I struggled with the 386-odd page book whose images I took last weekend. I ran into trouble in Scan Tailor. Basically, it kept bombing, frequently on page one or two! The images themselves were at the same resolution and settings as before, but this book was much bigger. In the end I guess I just needed more experience with ST. I set the first two pages (in colour) to "Mixed" type, I used the arrange by increasing width and height options to find and crop the largest images (I was suspicious that some were too "wide"), and finally I set the output to 300 dpi and not 600. It then ran through to completion. The book is fine, with no duff pages, and less than 9MB in size.

DPC is quite right that I need to do a lot more work with CHDK and the remote and I think I'll add a second light (see above) to the lighting rig..... but the next step is to bash on with Book3 using the program above and note whether it helps or not.
600dpi
 

Re: Cardboard box training rig - first book done!

Postby 600dpi » 24 Mar 2012, 18:11

Book No 3 (309 page paperback)

Did the right hand side of the book, and noticed that saving images was slow (it's an old 2GB Jessops generic card).

So I set the SD card to "Unlock" and loaded it into a USB card reader. I ran this as the first test through "Book Mapper".

The first critical thing was that loading the images in the application was slow, really s-l-o-o-o-w!!! Even so, it was much less fraught to scan through the images, clicking "Match" if there was a match or "Delete" if it was a duplicate or duff one. I learnt which bits to watch out for as the arrays changed. At the end of the process I had 34 duff and duplicate records that got batch deleted and a work list of 30 or so pages that need to be re-done. The "Worklist" option just opens the missing pages in Notepad and saves them as a text file; it was quick to splice in the preceding matched page/image, which gives this:

3 IMG_0009.JPG
Page 5...

13 IMG_0021.JPG
Page 15...

21 IMG_0025.JPG
Page 23...

29 IMG_0028.JPG
Page 31...
Page 33...
Page 35...
Page 37...


43 IMG_0032.JPG
Page 45...
Page 47...

49 IMG_0032.JPG
Page 51...

53 IMG_0036.JPG
Page 55...
Page 57...

63 IMG_0039.JPG
Page 65...

67 IMG_0041.JPG
Page 69...
Page 71...
Page 73...


79 IMG_0045.JPG
Page 81...

83 IMG_0047.JPG
Page 85...

93 IMG_0052.JPG
Page 95...

99 IMG_0055.JPG
Page 101...

119 IMG_0070.JPG
Page 121...

123 IMG_0072.JPG
Page 125...

133 IMG_0077.JPG
Page 135...

163 IMG_0092.JPG
Page 165...

203 IMG_0113.JPG
Page 205...

229 IMG_0131.JPG
Page 241...

249 IMG_0135.JPG
Page 251...

253 IMG_0137.JPG
Page 255...
Page 257...

261 IMG_0139.JPG
Page 263...

265 IMG_0140.JPG
Page 267...

273 IMG_0143.JPG
Page 275...

So once the missing pages have been taken, this list also guides what they need to be renamed to (with an "a", "b" or "c" suffix). It seems that I need to add a "Match list" equivalent of the "worklist" function to the app. Overall though, I was pleased with the way it went!

But did I say it was slow? I had a brief break during which I wondered whether it could be a virus checker running every time I open a JPG. I am also wondering whether this old (6 years+?) SD card is just too old and too slow. When taking the piccies it's noticeable that the camera seems to get slower and slower between shots, so I'm suspecting that something's amiss here? :(
600dpi
 

Re: Cardboard box training rig - first book done!

Postby 600dpi » 25 Mar 2012, 08:29

Book 3 - left hand side pages

This morning I took the photos of the left hand side pages of the 309 page paperback. I tried to give the camera more "thinking time" between shots, but ended up with 36 pages missing :(

But the good news was that I spotted a bug in the "BookMapper" application which meant it was writing temp files to the image source directory (OMG!). These appear to be BMPs, so for each 2.5MB JPG it was writing a 25MB BMP to the SD card. I tweaked the code and these are now written to the program directory. It isn't as fast at scrolling through images as the "film strip" view in Windows Explorer, but it is now down to just a few seconds per image. I also added a "Matched list" function (to export the matched map/entry settings).

The app seems much more responsive and I was able to review the LH images whilst having a coffee...

Here's the "Work list" of missing images:

Intro 2...
Intro 4...
Intro 6...
Page 2...
Page 12...
Page 30...
Page 42...
Page 46...
Page 56...
Page 60...
Page 70...
Page 78...
Page 98...
Page 102...
Page 106...
Page 136...
Page 162...
Page 166...
Page 172...
Page 176...
Page 212...
Page 218...
Page 226...
Page 228...
Page 232...
Page 240...
Page 248...
Page 258...
Page 274...
Page 280...
Page 286...
Page 288...
Page 294...
Page 296...
Page 298...
Page 304...

Here's the start of the "Matched list"....

Intro 1... IMG_0163.JPG
Intro 3... IMG_0165.JPG
Intro 5... IMG_0167.JPG
Page 4... IMG_0169.JPG
Page 6... IMG_0173.JPG
Page 8... IMG_0174.JPG
Page 10... IMG_0175.JPG
Page 14... IMG_0176.JPG
Page 16... IMG_0177.JPG
Page 18... IMG_0178.JPG
Page 20... IMG_0179.JPG
Page 22... IMG_0180.JPG
Page 24... IMG_0181.JPG
Page 26... IMG_0182.JPG
Page 28... IMG_0183.JPG
Page 32... IMG_0184.JPG
Page 34... IMG_0186.JPG
... etc

There were a number of duff images identified and these got the batch delete treatment.

Sweet! :D

Going to go out and enjoy the Spring sunshine this pm, will finish these off later....
600dpi
 

Re: Cardboard box training rig - first book done!

Postby 600dpi » 27 Mar 2012, 08:16

Well, I completed the missing pages of the LH side of the book and renamed the images using the "Film strip" view in Windows Explorer, working to the work list from BookMapper.

I then ran it all through Scan Tailor and did run into problems, possibly due to the width of the pages being close to A4 (although the pages are shorter). I used the natural order, increasing height and increasing width options to sort and crop, but it still bombed. I made sure the covers (in colour) were set to mixed and 300 dpi (with the remainder at 600 dpi), and it still bombed, so I set all pages to 300 dpi and it ran to completion. Then it was option 4 in the Homer batch file to end up with the book, which looks pretty good and is 3.97MB for a 309 page paperback... So Book 3 done.

Stock Take: I'm having problems with slow image handling both when saving in the camera and from Windows. Guessing it's the old 2GB SD card, I picked up a SanDisk 4GB class 4 SDHC card. I used CardTricks to format it to FAT16 and make it bootable, copied the files over and ran through book 4.

Book 4

A 144 page text/procedure book. Ran through the RH pages first, and the camera seemed quicker. I'm using Tv rather than full manual mode. I'm finding that I recognise when pages can have the shutter time reduced (little text, so more reflective?) or increased (more black) to stay within the OK zone. Loading the images into BookMapper and running through the checks found 8 missing pages. I redid these, then went through the LH pages. I took even more care, and when I checked them in BookMapper, there were no missing pages!!!

This is a bit of a landmark for me: the 8 missing pages represent about 5.6% of the total. A huge improvement (and I may have just skipped some pages when turning: who knows?).
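The percentage is simply missing pages divided by total pages. As a quick Python check using the Book 4 figures given above (the helper name `miss_rate` is made up):

```python
def miss_rate(missing, total_pages):
    """Missing pages as a percentage of the whole book."""
    return 100.0 * missing / total_pages


print(f"{miss_rate(8, 144):.1f}%")  # 5.6%
```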
So as I write, these are now running through the final Output stage of Scan Tailor.....
600dpi
 

Re: Cardboard box training rig - first book done!

Postby 600dpi » 29 Mar 2012, 14:03

Haven't done much the last couple of days (other than reading the 4 books completed to date!). It would be easy to get suckered into doing lots of development on the 'BookMapper' utility above, but I decided to have a little stock take. The first thing I must do is add a second halogen spotlight to the lighting rig. The next thing I thought about was increasing the size of the single sided (one camera only folks!) glass platen.

This is what it currently looks like, with triangular ply edges and a ply "fence" opposite the handle. I had to clad the 'fence' in black card. At first I stuck it on the outside, but Scan Tailor's "Split pages" function always saw the fence as the edge of a page, so the black card was also needed on the inside. I'd like to do a full A4 page (which is too big for this platen). In its current form, it works well for paperbacks, hardback novels and smaller text books. If I were to tweak the design further, I'd angle the ply fence and sides outwards at 100 degrees rather than 90 degrees... This is because with maximum sized books, there would be a larger black border in plan view from the camera perspective.

Image

So would this scale up hand held or will I need to consider a vertical support, tower(s) or similar?
600dpi
 

Re: Cardboard box training rig - first book done!

Postby 600dpi » 02 Apr 2012, 11:55

Yesterday I added functionality to the "BookMapper" utility so that I can "refresh" the view of the source image file directory (so that it will display any new images added for re-takes or if missing on the first pass).

I then went out and got one of the cheapest, nastiest, Android tablets I could find (running Gingerbread 2.3 and with no upgrades available). It seems to work quite well but....

The PDF files of the 4 books I've prepared so far are a mixed success: 2 display OK but are relatively slow to load, and 2 are truncated vertically.

Decided to check them out in Adobe Reader on a PC. Looked at the main menu "page size" settings and then, lo and behold:

• All the PDFs that display OK have page sizes around 11 x 8.5 inches or similar (these were user guides for the tablet and some instructions for apps).
• The page sizes of my PDFs are all over the place and at least 50% bigger (the worst one was 25 x 16 or some such). Presumably the Android app is struggling with these.....

So why should this be?

All the initial images are the same size (4000 x 3000). Once I've run them through option 1 in Homer and load them into Scan Tailor the DPI indicates 180 x 180 (but found through trial and error that it needs 180 x 204 to be enabled). Clearly the images are cropped differently, and the camera is zoomed differently for each book. I output from ST at 300 dpi.

• I thought Scan Tailor was adjusting sizes for A4 pages? Or is it just A4 proportions?
• Is this a DPI issue or something else?

I'd welcome some advice and help on this one....
600dpi
 

Re: Cardboard box training rig - first book done!

Postby Heelgrasper » 02 Apr 2012, 18:03

600dpi wrote:
• I thought Scan Tailor was adjusting sizes for A4 pages? Or is it just A4 proportions?
• Is this a DPI issue or something else?

I'd welcome some advice and help on this one....


I'm far from an expert on ScanTailor (I'm sure a true expert will come around at some point), but ScanTailor tries to make the size close to the size of the pages in the original book, and to keep the proportions of the original. If you want, you could make it adjust to a page size of your choosing, but it assumes that you want something as close to the original as possible. That is also why you have to put in the DPI, since it can't know how many pixels in your images equal an inch on a page of the original. That number should always be the same in both directions (like 250x250), since an inch on the book page takes up the same number of pixels whether it's vertical or horizontal (if you take a picture of a square it doesn't look like a rectangle in the picture). If you put in the actual DPI of your scan you should end up with a PDF with pages very close in size to the original book, depending of course on what kind of margins you let ScanTailor put in, and provided you checked that nothing made ScanTailor think some pages were much bigger than others.
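The relationship between pixels, DPI and page size is just a division, so it can be checked with a few lines of Python. The helper names are invented, and the 2400 pixel / 8 inch figures in the second call are made-up illustration values, not measurements from this thread:

```python
def page_size_inches(width_px, height_px, dpi):
    """Physical size implied by an image's pixel dimensions at a given DPI."""
    return width_px / dpi, height_px / dpi


def dpi_from_known_width(width_px, width_inches):
    """Estimate the true DPI from a feature of known physical width."""
    return width_px / width_inches


# An uncropped 4000 x 3000 image tagged as 180 dpi implies a very large "page".
w, h = page_size_inches(4000, 3000, 180)
print(f"{w:.1f} x {h:.1f} inches")  # 22.2 x 16.7 inches

# Hypothetical: a page measured at 8 inches wide that spans 2400 pixels.
print(dpi_from_known_width(2400, 8.0))  # 300.0
```

If the DPI entered is lower than the real capture resolution, the implied page comes out larger than the book, which would be consistent with PDF pages well over 11 x 8.5 inches. Measuring a page with a ruler and counting the pixels it spans in the image is one way to get a realistic figure to enter.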
---
Jakob Øhlenschlæger
Randers, Denmark

The past is a foreign country: they do things differently there
L. P. Hartley
Heelgrasper
 
Posts: 70
Joined: 19 Feb 2012, 21:04
Location: Randers, Denmark
