Data Project 1:
The purpose of this project was to transfer data from the
lab to the SQL Server. It was decided that the most way to transfer the data would be using the bulk copy program (bcp). However in order for bcp to work properly, the data needs to be in the correct format. We decided to write a program in Visual Basic that will do that. Lets look at a sample of text file from the pathology database.
PHASE 1: Study sample file "Oct_97.txt"
TxtLine Error: 1032 C-5509-97 44E0 56E7 0000 0000 0
TxtLine Error: 1032 S-6632-97 44E0 5E81 0000 0000 0
TxtLine Error: 1032 S-7715-97 44E0 5DD5 0000 0000 0
LASTNAME, FIRSTNAME 10/02/97
2911 14593610
C-1234-97
F 1/19/1966
SPECIMEN SOURCE:
CERVICAL/ENDOCERVICAL
DATE SPECIMEN COLLECTED:
10/02/97
STATEMENT OF ADEQUACY:
SATISFACTORY. ENDOCERVICAL CELLS PRESENT.
DIAGNOSIS - BETHESDA SYSTEM:
WITHIN NORMAL LIMITS
CYTO TECH:
F. TECH, CT(ASCP)
PATHOLOGIST:
DOC, M.D.
10/06/97 BSF
PHASE 2: Pseudocode for the program
The above flowchart includes the following 10 steps of designs
1. Open the sample file for input, error file for textline errors and output file
2. While it is not end of file, read one line at a time ignoring empty lines (len=O)
3. If line starts with "TxtLine Error", get the specimen number and resume operation
4. Read the first line of a record, separate the last name, first name and examdate
5. Read the second line of a record, separate the mrn number and the account number
6. Read the third line of a record and take out the specimen number
7. Read the fourth line of a record, separate the gender and date of birth
8. The next section is the actual pathology report which should be combined into one text field
9. Read the last part in the record which is the physician's name
10. Write all this information to an output file separating the fields by tilda "~"
PHASE 3. Write the code for the program
Here is
the GUI (graphic user interface) for the program
The code for this project is
Option Explicit
Option Compare Binary
Private Sub cmd_execute_Click()
'Declaration of variables
Dim st_linestring As String
Dim st_month As String
Dim st_year As String
Dim st_file, st_fileout As String
Dim st_report As String
Dim st_examdate, st_examtype As String
Dim st_lastname, st_firstname As String
Dim st_mrn, st_mrn1, st_mrn3 As String
Dim st_temp, st_temp1, st_temp2, st_temp3 As String
Dim st_temp4, st_temp5 As String
Dim st_accountnum As String
Dim st_gender, st_dob As String
Dim x, y, z, a, b, c, d, e, f, i, j, k, l, m, n, o As Integer
Dim st_MD1, st_MDname As String
Dim st_cpt As String
st_report = ""
st_MD1 = "M.D."
'Opening files for input and output
'here we use combo boxes and case structure to let the user select
'the month and the year for the input file
Select Case cbo_month.ListIndex
Case 0
st_month = "Jan"
Case 1
st_month = "Feb"
Case 2
st_month = "Mar"
Case 3
st_month = "Apr"
Case 4
st_month = "May"
Case 5
st_month = "Jun"
Case 6
st_month = "Jul"
Case 7
st_month = "Aug"
Case 8
st_month = "Sep"
Case 9
st_month = "Oct"
Case 10
st_month = "Nov"
Case Else
st_month = "Dec"
End Select
Select Case cbo_year.ListIndex
Case 0
st_year = "89"
Case 1
st_year = "90"
Case 2
st_year = "91"
Case 3
st_year = "92"
Case 4
st_year = "93"
Case 5
st_year = "94"
Case 6
st_year = "95"
Case 7
st_year = "96"
Case 8
st_year = "97"
Case 9
st_year = "98"
Case Else
End Select
st_file = "C:\" & "kashef\" & st_month & "_" & st_year & "." & "txt"
st_fileout = "C:\" & "kashef\" & st_month & "_" & st_year & "_out" & "." & "txt"
pic_1.Print st_file
pic_1.Print st_fileout
'Opening files for input, output and append
Open st_file For Input As #1
Open "C:\kashef\patho_bcp.txt" For Append As #2
Open "C:\kashef\error.txt" For Append As #3
Open st_fileout For Output As #4
'Begin header block - Initialization of the do-loop and if-then clause
'This part of the program gets the string from the input file
'It then trims the strings(removes spaces) and also checks for null strings.
'If it reads a text line error, it writes the mrn number to the error file
Do
Line Input #1, st_linestring
st_linestring = Trim(st_linestring)
If Len(st_linestring) <> 0 Then 'start of 1st if-then
If Left(st_linestring, 7) = "TxtLine" Then 'start of 2nd if-then
st_temp5 = Mid(st_linestring, 23, 10)
st_temp5 = Trim(st_temp5)
Print #3, st_temp5; "~"
GoTo break
Else
End If 'end of 2nd if-then
'Check to see if the string contains more than nineteen spaces.
'If it does then the program separates the left and right part of the string
'We use several string functions like Left, Right, Mid and Trim to manipulate the strings
'It then prints the information to the output file
z = InStr(st_linestring, " ")
If z <> 0 Then 'start of 3rd if-then
st_temp = st_linestring
st_examdate = Right(st_linestring, 8)
st_examdate = Trim(st_examdate)
b = InStr(st_temp, ",")
st_lastname = Left(st_temp, b - 1)
st_lastname = Trim(st_lastname)
st_firstname = Mid(st_temp, b + 1, (Len(st_temp) - _
Len(st_examdate) - Len(st_lastname) - 1))
st_firstname = Trim(st_firstname)
'new section
'i = i + 1
'Print #2, i; "~";
'pic_1.Print i; "~";
'new section
Print #2, st_lastname; "~";
Print #4, st_lastname; "~";
pic_1.Print st_lastname; "~";
Print #2, st_firstname; "~";
Print #4, st_firstname; "~";
pic_1.Print st_firstname; "~";
Print #2, st_examdate; "~";
Print #4, st_examdate; "~";
pic_1.Print st_examdate; "~";
'Check to see if the first word is a number
'If it is then the program takes out the medical record number and the
'account number. It also makes sure that the mrn# is atleast seven digits
'It then writes the information to the output file and breaks out of the loop
Line Input #1, st_linestring
st_linestring = Trim(st_linestring)
st_mrn1 = Mid(st_linestring, 1, 1)
st_mrn3 = Mid(st_linestring, 2, 1)
'start of 4th if-then
If (st_mrn1 >= Chr(48) And st_mrn1 <= Chr(57)) And _
(st_mrn3 >= Chr(48) And st_mrn3 <= Chr(57)) Then
If Len(st_linestring) <= 9 Then 'start of 5th if-then
st_mrn = st_linestring
If Len(st_mrn) <> 7 Then 'start of 6th if-then
For d = 1 To (7 - Len(st_mrn))
st_mrn = "0" + st_mrn
Next
Else
End If 'end of 6th if-then
st_accountnum = "0"
Else
For c = 1 To Len(st_linestring)
st_temp3 = Mid(st_linestring, c, 2)
If st_temp3 = " " Then 'start of 7th if-then
st_temp4 = Left(st_linestring, c)
st_mrn = Trim(st_temp4)
st_accountnum = Right(st_linestring, 8)
If Len(st_mrn) <> 7 Then 'start of 8th if-then
For d = 1 To (7 - Len(st_mrn))
st_mrn = "0" + st_mrn
Next
Else
End If 'end of 8th if-then
Else
End If 'end of 7th if-then
Next c
End If
'end of 5th if-then
Print #2, st_mrn; "~";
Print #4, st_mrn; "~";
pic_1.Print st_mrn; "~";
Print #2, st_accountnum; "~";
Print #4, st_accountnum; "~";
pic_1.Print st_accountnum; "~";
'GoTo break
Else
End If 'end of 4th if-then
'new section
'After reading the account number, it looks for the examtype number
'It trims the number and then writes it to the output file
Line Input #1, st_linestring
st_examtype = Trim(st_linestring)
Print #2, st_examtype; "~";
Print #4, st_examtype; "~";
pic_1.Print st_examtype; "~";
'Check to see if the first letter of the names is "F" or "M"
'If it is then the program takes out the gender field and the date of birht field
'It then prints the information to the output file and breaks out of the loop
Line Input #1, st_linestring
st_linestring = Trim(st_linestring)
'start of 9th if-then
If (Mid(st_linestring, 2, 1) = " ") And _
(Left(st_linestring, 1) = "F" Or _
Left(st_linestring, 1) = "M") Then
st_gender = Left(st_linestring, 1)
st_dob = Right(st_linestring, 10)
st_dob = Trim(st_dob)
Print #2, st_gender; "~";
Print #4, st_gender; "~";
pic_1.Print st_gender; "~";
Print #2, st_dob; "~";
Print #4, st_dob; "~";
pic_1.Print st_dob; "~";
'GoTo break
Else
'new section
Print #2, "~";
Print #4, "~";
pic_1.Print "~";
Print #2, "~";
Print #4, "~";
pic_1.Print "~";
'new section
End If
'end of 9th if-then
Else
End If
'end of 3rd if-then
'End of the header block
'Begin body block - Report generation section
'This part of the program looks for a specific string like "CLINICAL DIAGNOSIS"
'When it does find it, it concatenates all the strings together to generate the
'report. Once it finds the "______" or "PATHO", it then writes it to the output file
'start of 10th if-then
If Left(st_linestring, 19) = "CLINICAL DIAGNOSIS:" _
Or Left(st_linestring, 16) = "SPECIMEN SOURCE:" _
Or Left(st_linestring, 21) = "PATHOLOGIC DIAGNOSIS:" _
Or Left(st_linestring, 16) = "CLINICAL HISTORY" _
Or Left(st_linestring, 9) = "STATEMENT" Then
Do
st_report = st_report & " " & st_linestring
st_report = Trim(st_report)
Line Input #1, st_linestring
st_linestring = Trim(st_linestring)
'section to added later to deal with multiple pages
If st_linestring = " " Then 'start of 11th if-then
'Picture2.Print Len(st_linestring)
Do While Len(st_linestring) < 2
Line Input #1, st_linestring
st_linestring = Trim(st_linestring)
'Picture2.Print Len(st_linestring)
Loop
'start of 14th if-then
If Right(st_linestring, 1) < Chr(48) Or _
Right(st_linestring, 1) > Chr(57) Then
'same record
For n = 1 To 7
Line Input #1, st_linestring
Next n
Else
End If
'end of 14th if-then
Else
End If 'end of 11th if-then
'new code
Loop Until EOF(1) Or _
Left(st_linestring, 10) = "__________" Or _
Left(st_linestring, 10) = "PATHOLOGIS"
Print #2, st_report; "~";
Print #4, st_report; "~";
st_report = ""
'GoTo break
Else
End If 'end of 10th if-then
'This section deals with the string containing M.D.
'When the string is found, the program takes out the physician's name
'We use the Instr function to check for "M.D."
x = InStr(st_linestring, st_MD1)
j = InStr(st_linestring, "M. D.")
k = InStr(st_linestring, "MD")
l = InStr(st_linestring, "M.D")
m = InStr(st_linestring, "MD.")
'start of 11th if-then
If x <> 0 Or j <> 0 Or k <> 0 Or l <> 0 Or m <> 0 Then
y = InStr(st_linestring, ",")
st_MDname = Left(st_linestring, y - 1)
pic_1.Print st_MDname; "~"
Print #2, st_MDname; "~"
Print #4, st_MDname; "~"
'new code
Do
Line Input #1, st_linestring
st_linestring = Trim(st_linestring)
Loop Until EOF(1) Or _
st_linestring = " " Or _
Left(st_linestring, 7) = "TxtLine"
'start of 12th if-then
If Left(st_linestring, 7) = "TxtLine" Then
st_temp5 = Mid(st_linestring, 23, 10)
st_temp5 = Trim(st_temp5)
Print #3, st_temp5; "~"
GoTo break
Else
End If 'end of 12th if-then
Else
End If 'end of 11th if-then
'End of body block
break:
End If 'end of 1st if-then
Loop Until EOF(1)
'End of block 1
pic_1.Cls
pic_1.Print "All done with file transfer"
Close #4
Close #3
Close #2
Close #1
End Sub
Private Sub
cmd_end_Click()
End
End Sub
Private Sub cmd_tilda_Click()
Dim st_line As String
Dim st_comp As String * 1
Dim st_tilda As String
Dim x, y, z, count As Integer
z = 0
st_tilda = "~"
Open "c:\kashef\patho_bcp.txt" For Input As #1
Open "c:\kashef\tilda.txt" For Output As #2
Do While Not EOF(1)
y = y + 1
Line Input #1, st_line
For x = 1 To Len(st_line)
st_comp = Mid(st_line, x, 1)
If st_comp = st_tilda Then
count = count + 1
End If
Next
'start of 13th if-then
If count <> 10 Then
z = z + 1
pic_1.Print y, count, z
Write #2, y, count, z
pic_1.Print Left(st_line, 20)
Write #2, Left(st_line, 20)
Else
End If
count = 0
Loop
pic_1.Print "Found "; z; " rows out of"; y; " records with incorrect number of tildas"
pic_1.Print "All done with tildas"
Close #1
Close #2
End Sub
Private Sub Form_Load()
txt_1.Visible = False
txt_1.SelStart = 0
txt_1.SelLength = Len(txt_1.Text)
cbo_month.AddItem "Jan", 0
cbo_month.AddItem "Feb", 1
cbo_month.AddItem "Mar", 2
cbo_month.AddItem "Apr", 3
cbo_month.AddItem "May", 4
cbo_month.AddItem "Jun", 5
cbo_month.AddItem "Jul", 6
cbo_month.AddItem "Aug", 7
cbo_month.AddItem "Sep", 8
cbo_month.AddItem "Oct", 9
cbo_month.AddItem "Nov", 10
cbo_month.AddItem "Dec", 11
cbo_year.AddItem "89", 0
cbo_year.AddItem "90", 1
cbo_year.AddItem "91", 2
cbo_year.AddItem "92", 3
cbo_year.AddItem "93", 4
cbo_year.AddItem "94", 5
cbo_year.AddItem "95", 6
cbo_year.AddItem "96", 7
cbo_year.AddItem "97", 8
cbo_year.AddItem "98", 9
End Sub
PHASE 4. Test the program on the sample file.
After the program was finished, we tested it on a few sample files. The output of Oct_97.txt looks like:
LASTNAME~FIRSTNAME~10/02/97~0002911~14593610~C-1234-97~F~1/19/1966~SPECIMEN SOURCE:
CERVICAL/ENDOCERVICAL DATE SPECIMEN COLLECTED: 10/02/97 STATEMENT OF ADEQUACY: SATISFACTORY.
ENDOCERVICAL CELLS PRESENT. DIAGNOSIS - BETHESDA SYSTEM: WITHIN NORMAL LIMITS CYTO TECH: F.
TECH, CT(ASCP)~DOC~
...............................
As you can see, the fields are in the correct position and they are seperated by tildas "~". There are ten
fields in total: last name, first name, exam date, mrn #, account #, speciment #, gender, date of birth,
report and the physician. After the last field, we have a new line character for the end of record.
|